One Day Build (House Hunting)

We’re in the middle of house hunting in Nashville (which is booming).

So, I wanted a tool that would help me vet addresses as they pop up in my feed from our realtor.
We wanted to be able to walk to some things in our neighborhood, and fortunately that criterion was pretty easy to map out by hand in Inkscape with screenshots from Google Maps. I was able to produce a PNG map that had those regions clearly identified. For those who remember the 2010 floods, waterways are something to be wary of. So I pulled a map from FEMA (maps.nashville.gov) and overlaid it by hand with some transparency, easy enough. I could also have tried to overlay crime maps and offender registries, but this was sufficient for triage.

Now I just needed a converter (mapper) between the x,y in Inkscape and GPS coordinates.
I used PIL (the Python Imaging Library, now maintained as Pillow) to draw on the picture I had created.
The converter looks like this:

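def mapper(lat, long, flip=None):
    # First attempt (Inkscape coordinates, y-axis flipped using the image height):
    # scale = [4528.104575164487, -162821.9551111503, 3633.747495444875, 315924.9870280141]
    # return (long*scale[2] + scale[3], flip - (lat*scale[0] + scale[1]))
    # Second attempt (Pillow pixel coordinates); flip is accepted but not used here.
    scale = [-5228.758169935899, 189234.5555556009, 4041.34431458903, 351362.65809204464]
    return (long*scale[2] + scale[3], lat*scale[0] + scale[1])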

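There are two attempts here because a conversion from GPS coordinates to Inkscape coordinates is unfortunately not the same as one from GPS to Pillow coordinates. The commented-out lines are the first (Inkscape) attempt, which used the image height, flip, to turn the y-axis over.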

I derived this via two “calibration” points on my map and the respective coordinates in pillow.

given = [[36.039065, -86.782672], [36.042890, -86.606493]]  # GPS (lat, long)
res = [[644, 795], [1356, 775]]                              # Pillow (x, y)
a = (given[1][0] - given[0][0]) / (res[1][1] - res[0][1])
b = -a * res[0][1] + given[0][0]
c = (given[1][1] - given[0][1]) / (res[1][0] - res[0][0])
d = -c * res[0][0] + given[0][1]
scale = [1/a, -b/a, 1/c, -d/c]

This finds a slope and an offset for each of the two axes, latitude and longitude, from the two calibration points, and is exact (though potentially off by a little depending on the accuracy of my calibration points). Here a and b (and c and d) map pixels to GPS; scale holds the inverse, which maps GPS to pixels. I should’ve picked better points, because 36.039 is not very different from 36.042. Oh well. In the end it worked.
Then I just hardcoded the values of the variable scale into the function mapper.
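
As a minimal sketch of that calibration, the same two-point fit can be written to solve for the GPS-to-pixel scale directly instead of inverting a, b, c and d afterwards. The helper names two_point_scale and to_pixels are hypothetical and only used for this illustration; the calibration numbers are the ones above.

```python
def two_point_scale(gps, pixels):
    """Return [lat_slope, lat_offset, lon_slope, lon_offset] so that
    y = lat*lat_slope + lat_offset and x = lon*lon_slope + lon_offset."""
    (lat0, lon0), (lat1, lon1) = gps
    (x0, y0), (x1, y1) = pixels
    lat_slope = (y1 - y0) / (lat1 - lat0)  # pixels per degree of latitude
    lon_slope = (x1 - x0) / (lon1 - lon0)  # pixels per degree of longitude
    return [lat_slope, y0 - lat_slope * lat0, lon_slope, x0 - lon_slope * lon0]


def to_pixels(lat, lon, scale):
    """Apply the linear map; same layout as the hardcoded scale list."""
    return (lon * scale[2] + scale[3], lat * scale[0] + scale[1])


given = [[36.039065, -86.782672], [36.042890, -86.606493]]  # GPS (lat, long)
res = [[644, 795], [1356, 775]]                              # Pillow (x, y)
scale = two_point_scale(given, res)

# Both calibration points should land back on their pixel coordinates.
for (lat, lon), (x, y) in zip(given, res):
    px, py = to_pixels(lat, lon, scale)
    print(round(px, 3), round(py, 3), "expected", x, y)
```

A round trip like this is a cheap sanity check before hardcoding the numbers: if a calibration point does not map back to its own pixel, the axes are probably swapped somewhere.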

I have my x,y coordinates from latitude and longitude. Now I want to draw on my map.

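from PIL import Image, ImageDraw

def drawer(coords):
    im = Image.open("pillow.png")
    draw = ImageDraw.Draw(im)
    # im.size[-1] (the image height) was the flip needed by the first, Inkscape-style mapper
    for pair in coords:
        # corners of the dot's bounding box, +/- 0.001 degrees around the point
        vec = [mapper(pair[0]+0.001, pair[1]-0.001),
               mapper(pair[0]-0.001, pair[1]+0.001)]
        print(vec)
        draw.ellipse(vec, fill=100)
    im.show()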

It makes all the points the same color, which makes it difficult to judge multiple new points on a figure, but this was sufficient for my purposes.
The plus and minus 0.001 (in degrees) was found by trial and error to make correctly sized dots on the map.
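
If telling several new points apart ever becomes important, one possible tweak is to cycle through a small palette. This is only a sketch under assumptions: dot_boxes, draw_dots, PALETTE and example_mapper are hypothetical names, the palette is an arbitrary choice, and example_mapper just repeats the constants from mapper so the snippet runs on its own.

```python
from itertools import cycle

PALETTE = ["red", "blue", "green", "orange", "purple"]  # arbitrary choice


def example_mapper(lat, lon):
    # Same constants as mapper above, repeated so this sketch is self-contained.
    scale = [-5228.758169935899, 189234.5555556009, 4041.34431458903, 351362.65809204464]
    return (lon * scale[2] + scale[3], lat * scale[0] + scale[1])


def dot_boxes(coords, to_pixels, half_size=0.001):
    """Return (bounding_box, color) for each (lat, lon), cycling the palette."""
    boxes = []
    for (lat, lon), color in zip(coords, cycle(PALETTE)):
        top_left = to_pixels(lat + half_size, lon - half_size)
        bottom_right = to_pixels(lat - half_size, lon + half_size)
        boxes.append(([top_left, bottom_right], color))
    return boxes


def draw_dots(image, coords, to_pixels):
    """Draw one colored dot per coordinate on an RGB Pillow image."""
    from PIL import ImageDraw  # imported lazily; only this function needs Pillow
    draw = ImageDraw.Draw(image)
    for box, color in dot_boxes(coords, to_pixels):
        draw.ellipse(box, fill=color)
    return image


boxes = dot_boxes([(36.039065, -86.782672), (36.042890, -86.606493)], example_mapper)
for box, color in boxes:
    print(color, box)
```

Keeping the box computation separate from the drawing makes it easy to check the geometry without opening an image.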

The tool was just for me, but my wife also appreciated that we could use it to quickly go through the initial barrage of home listings and weed out the ones that were for sure not going to be of interest.

Not too bad for a few hours of work, and most of that was just deciding on and drawing out the regions of interest.

Machine Learning for Scientific Applications

Data taken in a scientific context often form an image. Whether that image is a reconstruction of a physical space or simply a graphical representation of data, machine learning and image processing techniques can be of some use to a scientist.

In one context, energy sharing between two detectors forms lines of a constant sum.

In this case it was not possible to calibrate the y-axis directly. Only the x-axis and the sum could be accurately calibrated. Instead of recursively scanning and re-calibrating the data, one can use a clustering algorithm called k-lines means. The slope of these lines is the negative reciprocal of the slope of the calibration. 

In this case, convergence was rapid even for k=3, fewer than the k=9 that is closer to the number of calibration points, which is the physically meaningful value.

This allowed for a fast-running algorithm that can run online for stability monitoring.
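
The algorithm itself is not spelled out here, so the following is only one plausible sketch of the idea. It assumes the lines are parallel (lines of constant sum share one slope) and uses a Lloyd-style alternation like k-means: assign each point to its nearest line, then refit one pooled slope and one intercept per line. The function name fit_parallel_lines, the quantile-based initialization, the slope_guess parameter and the synthetic, noise-free data are all assumptions made for illustration.

```python
def fit_parallel_lines(points, k, slope_guess=-1.0, max_iter=100):
    """Fit k parallel lines y = m*x + b_j by alternating assignment and refit."""
    m = slope_guess
    # Initial intercepts: evenly spaced quantiles of the offsets y - m*x.
    offsets = sorted(y - m * x for x, y in points)
    n = len(offsets)
    intercepts = [offsets[min(n - 1, int((j + 0.5) * n / k))] for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest line by vertical distance.
        new_labels = [
            min(range(k), key=lambda j: abs(y - (m * x + intercepts[j])))
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: per-cluster means, then one pooled least-squares slope.
        means = {}
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                means[j] = (sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members))
        sxy = sum((x - means[lab][0]) * (y - means[lab][1])
                  for (x, y), lab in zip(points, labels))
        sxx = sum((x - means[lab][0]) ** 2 for (x, _), lab in zip(points, labels))
        if sxx > 0:
            m = sxy / sxx
        for j, (xbar, ybar) in means.items():
            intercepts[j] = ybar - m * xbar  # empty clusters keep their old intercept
    return m, intercepts, labels


# Synthetic, noise-free example: three parallel lines with slope -0.8.
true_slope, true_intercepts = -0.8, [10.0, 20.0, 30.0]
points = [(x, true_slope * x + b) for b in true_intercepts for x in range(0, 12, 2)]
slope, intercepts, labels = fit_parallel_lines(points, k=3)
gain = -1.0 / slope  # negative reciprocal of the fitted slope
print(slope, sorted(intercepts), gain)
```

Under this model the negative reciprocal of the fitted slope gives the relative y-axis calibration (1.25 for the synthetic data, which was generated with slope -0.8). Real data would be noisy, so the starting slope and the choice of k would matter more than they do here.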

How do I choose? Part 1. Shopping in the information age

Making a decision with an overabundance of information is the burden of choice. Sometimes you just have too much information: some of it is garbage (spam, etc.), some of it is irrelevant, some of it is in the grey zone (biased, neutral, etc.), and some data are only useful if you have enough statistics (and sometimes the data just aren’t available).

It can be exhausting to try to make all of your decisions based off of the best information available. Most people don’t bother. You just go on the recommendation of someone you trust or you go with your gut.

I want to do better. I’m going to talk about a specific instance: shopping. It takes some effort to shop for a big-ticket item, or for something you want to use frequently or for a long time, where you care about how well it works.

This is how I bought a vacuum cleaner the last time. I had well-defined design criteria. I needed a vacuum cleaner that could deal with the long hair from my wife that seems to coat my floor in a monolayer of fibrous keratin. These fibers love to wrap themselves inside of vacuum brushes and block those brushes from picking up anything else. I wanted a vacuum cleaner that either dealt with hair better or at least made it easy to clean it out of the rollers.

To the google machine!

“Ok Google, how can I get you to tell me the things that I care about?” If I google that I get a HuffPo article, and well, that’s not helpful. So I have to think. Well, I care about value. What does that mean? It means I want the best device for the thing that I want at the cheapest price. “Deals with hair” as a search criterion returns a lot of pet hair vacuum cleaners not suitable for household cleaning. So after a couple of hours of digging through reviews for some crowd-based wisdom, we found several choices.

So now we needed to know whether or not those reviews were reliable. So, we wanted to see the vacuums in person. We found that several of them were available at Sears. So we went there, tested them out, and found that we could do with a Shark NV681. It has a groove down the middle of the rollers so that you can go in with scissors and cut out hair. It seems like that was the best we could do.

Back to Google, to see where we could find the best price. My wife is better at this part, and we were able to wrangle the best deal at Kohl’s.

Two days later, we had a vacuum cleaner.

I can’t do that for every decision. So I’m working to make a toolkit (data scraper, etc.) to make that easier. Part 2 will have a description of the toolkit. But here are the criteria:

Filtering: I have to be able to separate the wheat from the chaff. Sometimes reviews are paid for, and sometimes they don’t really say anything substantive. I need a good representative sample of 1-5 star reviews. Other times I have budgetary or other requirements that mean I can place hard cuts on the search before I even start looking.

Prioritization: Sometimes I care about specifications, sometimes I need the reviews. Sometimes I want a purple one. I need a toolkit that can let me rank how important various aspects of the data are.

Categorization: I would like a solution that can cluster objects of a similar ilk.

Visualization: Most importantly, I need to be able to intuit what the data says. This is by far the hardest thing to automate because you often have to do the other things first before you can create a method of visualization. Therefore I need something which is…

Interactive and flexible: Having a tool which is independent and just works is a bit of a pipe dream. The system will rely on having my dynamic input and intellect in order to build the digest and summary. My brain’s ability to recognize patterns is unparalleled. Even if it weren’t, I wouldn’t be comfortable with an answer unless I came to the conclusion myself.
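
To make the first two criteria a little more concrete, here is a toy sketch of a hard cut followed by a weighted ranking. It is not the toolkit itself: the shortlist function, the field names and the three items (with made-up prices and scores) are hypothetical and exist only to show the shape of the idea.

```python
def shortlist(items, max_price, weights):
    """Apply a hard price cut, then rank the survivors by a weighted score."""
    survivors = [item for item in items if item["price"] <= max_price]

    def score(item):
        # Features missing from an item simply contribute nothing.
        return sum(w * item["features"].get(name, 0.0) for name, w in weights.items())

    return sorted(survivors, key=score, reverse=True)


# Made-up example data; "rating" and "easy_to_clean" are scaled to 0..1.
items = [
    {"name": "A", "price": 150, "features": {"rating": 0.9, "easy_to_clean": 0.0}},
    {"name": "B", "price": 250, "features": {"rating": 0.8, "easy_to_clean": 1.0}},
    {"name": "C", "price": 400, "features": {"rating": 1.0, "easy_to_clean": 1.0}},
]
weights = {"rating": 1.0, "easy_to_clean": 2.0}  # how much each aspect matters

ranked = [item["name"] for item in shortlist(items, max_price=300, weights=weights)]
print(ranked)
```

Changing the weights is the interactive part: the ranking is only as good as the priorities fed into it, which is why the human stays in the loop.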

Before part 2 comes: Does anyone out there have ideas? Tell me in the comments how you make decisions.