Machine Learning for Scientific Applications

Data taken in a scientific context often forms an image. Whether this is a reconstruction of a physical space or simply a graphical representation of data, machine learning and image processing techniques can be useful to a scientist.

In one context, energy sharing between two detectors forms lines of a constant sum.

In this case it was not possible to calibrate the y-axis directly; only the x-axis and the sum could be accurately calibrated. Instead of iteratively scanning and re-calibrating the data, one can use a clustering algorithm called k-lines means. The slope of these lines is the negative reciprocal of the slope of the calibration.

In this case, the algorithm converged rapidly even for k=3, which is fewer than the k=9 that would be closer to the physically meaningful number of calibration points.

This rapid convergence made for a fast algorithm that can run online for stability monitoring.
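To make the idea concrete, here is a minimal sketch of a k-lines style clustering in Python with NumPy. It is not the code used for the analysis above, just one common way to do it: assign each point to its nearest line, refit each line by total least squares, and repeat. The function names (fit_line, k_lines, line_slope) and the init_labels parameter are my own inventions for illustration, and the random starting assignment can land in a poor local minimum, so treat it as a starting point.

```python
import numpy as np


def fit_line(points):
    """Total-least-squares line through 2D points: (centroid, unit direction)."""
    centroid = points.mean(axis=0)
    # The first right singular vector of the centered points is the principal axis.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[0]


def perpendicular_distance(points, centroid, direction):
    """Perpendicular distance of each point to the line (2D cross product)."""
    diff = points - centroid
    return np.abs(diff[:, 0] * direction[1] - diff[:, 1] * direction[0])


def line_slope(direction):
    """Slope dy/dx of a line direction (infinite for a vertical line)."""
    return direction[1] / direction[0]


def k_lines(points, k, n_iter=50, seed=0, init_labels=None):
    """Alternate between refitting k lines and reassigning points to the nearest line."""
    rng = np.random.default_rng(seed)
    if init_labels is None:
        labels = rng.integers(0, k, size=len(points))
    else:
        labels = np.asarray(init_labels).copy()
    lines = []
    for _ in range(n_iter):
        lines = []
        for j in range(k):
            members = points[labels == j]
            if len(members) < 2:
                # Nearly empty cluster: restart it from two randomly chosen points.
                members = points[rng.choice(len(points), size=2, replace=False)]
            lines.append(fit_line(members))
        dists = np.stack(
            [perpendicular_distance(points, c, d) for c, d in lines], axis=1
        )
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, lines


if __name__ == "__main__":
    # Synthetic stand-in for energy sharing: x + y = E for three made-up sums E.
    rng = np.random.default_rng(1)
    chunks = []
    for total in (1.0, 2.0, 3.0):
        x = rng.uniform(0.0, total, size=200)
        chunks.append(np.column_stack([x, total - x]))
    data = np.vstack(chunks)
    labels, lines = k_lines(data, k=3)
    print([line_slope(d) for _, d in lines])
```

The fitted slopes are what you would feed into the calibration, since the text above relates them to the calibration slope by a negative reciprocal. In practice it helps to seed init_labels from a rough guess (for example, bins of an uncalibrated sum) rather than relying on the random start.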

How do I choose? Part 1. Shopping in the information age

Making a decision with an overabundance of information is the burden of choice. Sometimes you just have too much information: some of it is garbage (spam, etc.), some of it is irrelevant, some of it is in the grey zone (biased, neutral, etc.), and some data are only useful if you have enough statistics (and sometimes the data just aren’t available).

It can be exhausting to try to make all of your decisions based off of the best information available. Most people don’t bother. You just go on the recommendation of someone you trust or you go with your gut.

I want to do better. I’m going to talk about a specific instance: shopping. It takes some effort to shop for a big-ticket item, or for something you want to use frequently or for a long time and you care about how well it works.

This is how I bought a vacuum cleaner the last time. I had well-defined design criteria. I needed a vacuum cleaner that could deal with the long hair from my wife that seems to coat my floor in a mono-layer of fibrous keratin. These fibers love to wrap themselves inside of vacuum brushes and block those brushes from picking up anything else. I wanted a vacuum cleaner that either dealt with hair better or at least made it easy to clean it out of the rollers.

To the google machine!

“Ok google, how can I get you to tell me the things that I care about?” If I google that I get a HuffPo article, and well, that’s not helpful. So I have to think. Well, I care about value. What does that mean? It means I want the best device for the thing that I want at the cheapest price. “Deals with hair” as a search criterion returns a lot of pet hair vacuum cleaners not suitable for household cleaning. So after a couple of hours of digging through reviews for some crowd-based wisdom, we found several choices.

So now we needed to know whether or not those reviews were reliable, which meant seeing the vacuums in person. We found that Sears carried several of them. So we went there, tested them out, and found that we could make do with a Shark NV681. It has a groove down the middle of the rollers so that you can go in with scissors and cut out the hair. It seemed like that was the best we could do.

Back to google, to see where we could find the best price. My wife is better at this part, and we were able to wrangle the best deal at Kohl’s.

Two days later, we had a vacuum cleaner.

I can’t do that for every decision. So I’m working to make a toolkit (data scraper etc.) to make that easier. Part 2 will have a description of the toolkit. But here are the criteria:

Filtering: I have to be able to separate the wheat from the chaff. Sometimes reviews are paid for, and sometimes they don’t really say anything substantive. I need a good representative sample of 1-5 star reviews. Other times I have budgetary or other requirements that mean I can place hard cuts on the search before I even start looking.

Prioritization: Sometimes I care about specifications, sometimes I need the reviews. Sometimes I want a purple one. I need a toolkit that can let me rank how important various aspects of the data are.

Categorization: I would like a solution that can cluster objects of a similar ilk.

VISUALIZATION: Most importantly, I need to be able to intuit what the data says. This is by far the hardest thing to automate because you often have to do the other things first before you can create a method of visualization. Therefore I need something which is…

Interactive and flexible: Having a tool which is independent and just works is a bit of a pipe dream. The system will rely on my dynamic input and intellect in order to build the digest and summary. My brain’s ability to recognize patterns is unparalleled. Even if it weren’t, I wouldn’t be comfortable with the answer unless I came to the conclusion myself.
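To give a feel for the first two criteria, here is a rough sketch in Python of hard cuts followed by a weighted ranking. This is not the toolkit itself, which doesn’t exist yet; the Product fields, the function names hard_cuts and rank, and the weights are all placeholders I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical fields; a real scraper would decide what is actually available.
    name: str
    price: float
    rating: float   # mean star rating, 1-5
    n_reviews: int


def hard_cuts(products, max_price, min_reviews):
    """Filtering: drop anything over budget or with too few reviews to trust."""
    return [
        p for p in products
        if p.price <= max_price and p.n_reviews >= min_reviews
    ]


def rank(products, weights):
    """Prioritization: order products by a weighted sum of rescaled attributes.

    weights maps an attribute name to its importance; use a negative weight
    for attributes where smaller is better, such as price.
    """
    if not products:
        return []

    def rescale(values):
        # Map each attribute onto 0..1 so the weights are comparable.
        lo, hi = min(values), max(values)
        return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in values]

    scores = [0.0] * len(products)
    for attr, weight in weights.items():
        column = rescale([getattr(p, attr) for p in products])
        scores = [s + weight * c for s, c in zip(scores, column)]
    ordered = sorted(zip(scores, products), key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ordered]
```

You would call hard_cuts first with a budget and a minimum review count, then pass the survivors to rank with something like {"rating": 1.0, "price": -0.5}. Changing the weights and re-running is a cheap way to get the interactive part.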

Before part 2 comes: Does anyone out there have ideas? Tell me in the comments how you make decisions.

After some research, it seems like shopping websites really don’t like you to scrape all their data and reviews and try to build something that makes it easier to sift through. They want you to come to their site and stay, and not take the time to do all the comparison shopping, because they might lose your business. This kind of thing would be a drain on their resources without much benefit to them. I guess I understand. So, oh well.

Hello world!

This blog contains lessons learned and interesting things (to me anyway) discovered in the course of exploring the universe.

I have borne witness to things a billion times smaller than a bacterium and seen the darkness beyond the dimmest star.

The world we live in is amazing, and I would like to share it with you.