On the Date of the State of the Union

As I sat at lunch today, I noticed that the State of the Union is tonight. I thought, ‘Wow! That’s late, isn’t it?’ But I wasn’t sure.

So, I did the math. Here is the result:

If we look back at the dates from 1910 onward, tonight’s date is 1.3σ away from that mean. If we look only at recent dates (since 1980), which run later and have a smaller variance, tonight’s date is still only 1.5σ away from the mean. If these fluctuations were random, anything within 3σ would be within expectations. The σ of 6 for this time period is exactly what you would expect for random fluctuation about some average.
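
For anyone who wants to repeat the check, here is a minimal sketch of the arithmetic. It assumes each address is reduced to its day of the year and that σ is the sample standard deviation; the function name `sigma_offset` is made up for illustration, and I have not included the historical dates themselves.

```python
from datetime import date
from statistics import mean, stdev


def sigma_offset(past_dates, current):
    """Return how many standard deviations `current` lies from the mean
    day-of-year of `past_dates` (positive means later than average)."""
    # Reduce every date to its day of the year (Jan 1 -> 1).
    days = [d.timetuple().tm_yday for d in past_dates]
    mu = mean(days)
    sigma = stdev(days)  # sample standard deviation (an assumption)
    return (current.timetuple().tm_yday - mu) / sigma
```

Passing in only the dates since 1980 instead of the full list gives the second number quoted above.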

The only interesting thing I noticed was that the lower average around 1940 seemed to transition, between 1960 and 1980, to a new, later average.

Does anyone know of a reason why?

What makes me happy

I have said for a long time that

“If I’m thinking, I’m happy.”

Today changed my mind a bit. Thinking does make me happy, but I realized something (perhaps very simple) today: that idea may be a sufficient but not a necessary condition.

I had an email this morning from a collaborator in Europe who needed help with a particular tool. I volunteered to help. We’re on Linux systems, and GNU screen is a terminal multiplexer I have used frequently during training or teaching sessions. I can make several windows and collaborators can watch what I do, or I can open an in-terminal editor (read: vim, emacs, nano, etc.) and we can chat. The text file can then be saved for later review. This is a very helpful framework that I have used often.

After a few failed attempts, we figured it out and went on with our days. It was as simple as a configuration file being inappropriate and the associated error message being less than useless because it was distracting. However, the utility and craft of good and bad error messages might be a different post altogether.

The problem presented was a variant of something I had seen before. It was a challenge, but I realized that the challenge wasn’t what I truly enjoyed about my morning. I think I’ve often been of this mindset, but I’ve only just accepted that my desire to make useful contributions may supersede my desire to be challenged.

I have also said for some time that I find myself very naturally making tools that help everyone be successful. This phrase may be too long. I think I will start saying:

‘If I’m helping, I’m happy.’

What do you think?

After I was done, I got to talking to a man named Thomas in the place where I was sitting. He said to me that I am a resource in search of an opportunity. I will be thinking about that phrase as well.

One Day Build (House Hunting)

We’re in the middle of house hunting in Nashville (which is booming).

So, I wanted a tool that would help me vet addresses as they pop up in my feed from our realtor.
We wanted to be able to walk to some things in our neighborhood, and fortunately that criterion was pretty easy to map out by hand in Inkscape with screenshots from Google Maps. I was able to produce a PNG map that had those regions clearly identified. For those who remember the 2010 floods, waterways are something to be wary of. So I pulled a map from FEMA (maps.nashville.gov) and overlaid it with some transparency by hand, easy enough. I could also have tried to overlay crime maps and offender registries, but this was sufficient for triage.

Now I just needed a converter (mapper) between the x,y in Inkscape and GPS coordinates.
I used Pillow (the maintained fork of PIL, the Python Imaging Library) to draw on the picture I had created.
The converter looks like this:

def mapper(lat, long):
    # First attempt, kept for reference:
    # scale = [4528.104575164487, -162821.9551111503, 3633.747495444875, 315924.9870280141]
    # return (long*scale[2]+scale[3], flip-(lat*scale[0]+scale[1]))
    # scale = [y slope, y offset, x slope, x offset]
    scale = [-5228.758169935899, 189234.5555556009, 4041.34431458903, 351362.65809204464]
    return (long*scale[2]+scale[3], lat*scale[0]+scale[1])

There are two attempts here because a conversion from GPS coordinates to Inkscape coordinates is unfortunately not the same as GPS to Pillow.

I derived this via two “calibration” points on my map and the respective coordinates in Pillow.

given = [[36.039065, -86.782672], [36.042890, -86.606493]]
res = [[644, 795], [1356, 775]]
a = (given[1][0] - given[0][0]) / (res[1][1] - res[0][1])
b = -a * res[0][1] + given[0][0]
c = (given[1][1] - given[0][1]) / (res[1][0] - res[0][0])
d = -c * res[0][0] + given[0][1]
scale = [1/a, -b/a, 1/c, -d/c]

This finds a slope and an offset for each of the two axes, latitude and longitude, from the two calibration points (four coordinate values). The fit is exact for those points, though potentially off by a little elsewhere depending on the accuracy of my calibration points. I should’ve picked points better, because 36.039 is not very different from 36.042. Oh well. In the end it worked.
Then I just hardcoded the values of the variable scale into the function mapper.
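
As a sanity check, the calibration can be wrapped in a function and run against the same two points. This is only a sketch: the names `calibrate` and `to_pixels` are made up here, and `to_pixels` does the same job as mapper but takes the scale as an argument instead of hardcoding it.

```python
def calibrate(gps_points, pixel_points):
    """Fit pixel = slope * degrees + offset for each axis from two points.

    gps_points   -- [[lat0, long0], [lat1, long1]]
    pixel_points -- [[x0, y0], [x1, y1]] in Pillow pixel coordinates
    Returns [y slope, y offset, x slope, x offset], the layout mapper uses.
    """
    (lat0, lon0), (lat1, lon1) = gps_points
    (x0, y0), (x1, y1) = pixel_points
    a = (lat1 - lat0) / (y1 - y0)   # degrees of latitude per pixel row
    b = lat0 - a * y0
    c = (lon1 - lon0) / (x1 - x0)   # degrees of longitude per pixel column
    d = lon0 - c * x0
    # Invert lat = a*y + b and long = c*x + d to go from degrees to pixels.
    return [1 / a, -b / a, 1 / c, -d / c]


def to_pixels(lat, lon, scale):
    """Convert a GPS coordinate to (x, y) pixels with a fitted scale."""
    return (lon * scale[2] + scale[3], lat * scale[0] + scale[1])


given = [[36.039065, -86.782672], [36.042890, -86.606493]]
res = [[644, 795], [1356, 775]]
scale = calibrate(given, res)
```

Feeding the two calibration points back through to_pixels should land on (644, 795) and (1356, 775) to within rounding. The same slopes also say that 0.001 degrees is roughly 4 pixels horizontally and 5 pixels vertically on this map.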

I have my x,y coordinates from latitude and longitude. Now I want to draw on my map.

from PIL import Image, ImageDraw

def drawer(coords):
    # Open the hand-made map and get a drawing context for it.
    im = Image.open("pillow.png")
    draw = ImageDraw.Draw(im)
    for pair in coords:
        # Opposite corners of a small box around each (lat, long) pair.
        vec = [mapper(pair[0]+0.001, pair[1]-0.001),
               mapper(pair[0]-0.001, pair[1]+0.001)]
        print(vec)
        draw.ellipse(vec, fill=100)
    im.show()

It makes all the points the same color, which makes it difficult to tell multiple new points apart on one figure, but this was sufficient for my purposes.
The plus and minus 0.001 was found by trial and error to make correctly sized dots on the map.

The tool was just for me, but my wife also appreciated that we could use this to quickly go through the initial barrage of home listings and weed out the listings that were for sure not going to be of interest.

Not too bad for a few hours of work and most of that was just deciding and drawing out the regions of interest.

Machine Learning for Scientific Applications

Data taken in a scientific context often forms an image. Whether this is a reconstruction of a physical space or simply a graphical representation of data, machine learning and image processing techniques can be of some use to a scientist.

In one context, energy sharing between two detectors forms lines of a constant sum.

In this case it was not possible to calibrate the y-axis directly. Only the x-axis and the sum could be accurately calibrated. Instead of recursively scanning and re-calibrating the data, one can use a clustering algorithm called k-lines means. The slope of these lines is the negative reciprocal of the slope of the calibration. 

In this case, convergence was rapid even for k=3, fewer than k=9, which is closer to the physically meaningful number of calibration points.

This fact allowed for a fast running algorithm which can run online for stability monitoring.
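
To make the idea concrete, here is a minimal sketch of clustering points onto k lines. It is not the code I ran. It assumes all the lines share one slope (true for lines of constant sum), needs a rough starting slope from the user, and alternates between a 1-D k-means on the intercepts and a pooled least-squares refit of the slope. The names `kmeans_1d`, `fit_parallel_lines` and `slope_guess` are made up for illustration.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means; centres start at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j]))
                  for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(mean(members) if members else centres[j])
        if new == centres:
            break
        centres = new
    return centres, labels


def fit_parallel_lines(points, k, slope_guess, rounds=20):
    """Fit k parallel lines y = slope*x + intercept_j to (x, y) points."""
    slope = slope_guess
    for _ in range(rounds):
        # Step 1: with the slope fixed, cluster the intercepts y - slope*x.
        residuals = [y - slope * x for x, y in points]
        _, labels = kmeans_1d(residuals, k)
        # Step 2: with the clusters fixed, refit the shared slope.
        num = den = 0.0
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if len(members) < 2:
                continue
            mx = mean(x for x, _ in members)
            my = mean(y for _, y in members)
            num += sum((x - mx) * (y - my) for x, y in members)
            den += sum((x - mx) ** 2 for x, _ in members)
        if den == 0:
            break
        new_slope = num / den
        if abs(new_slope - slope) < 1e-12:
            break
        slope = new_slope
    intercepts, _ = kmeans_1d([y - slope * x for x, y in points], k)
    return slope, sorted(intercepts)
```

Once the shared slope is known, the calibration slope for the y-axis follows as its negative reciprocal, -1/slope.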

How do I choose? Part 1. Shopping in the information age

Making a decision with an overabundance of information is the burden of choice. Sometimes you just have too much information: some of it is garbage (spam, etc.), some of it is irrelevant, some of it is in the grey zone (biased, neutral, etc.), and some data are only useful if you have enough statistics (and sometimes the data just aren’t available).

It can be exhausting to try to make all of your decisions based off of the best information available. Most people don’t bother. You just go on the recommendation of someone you trust or you go with your gut.

I want to do better. I’m going to talk about a specific instance: shopping. It takes some effort to go shopping for a big-ticket item, or for something you want to use frequently or for a long time and whose performance you care about.

This is how I bought a vacuum cleaner the last time. I had well-defined design criteria. I needed a vacuum cleaner that could deal with the long hair from my wife that seems to coat my floor in a mono-layer of fibrous keratin. These fibers love to wrap themselves inside of vacuum brushes and block those brushes from picking up anything else. I wanted a vacuum cleaner that either dealt with hair better or at least made it easy to clean it out of the rollers.

To the Google machine!

“Ok Google, how can I get you to tell me the things that I care about?” If I google that, I get a HuffPo article, and, well, that’s not helpful. So I have to think. Well, I care about value. What does that mean? It means I want the best device for the thing that I want at the cheapest price. “Deals with hair” as a search criterion returns a lot of pet hair vacuum cleaners not suitable for household cleaning. So, after a couple of hours of digging through reviews for some crowd-based wisdom, we found several choices.

So now we needed to know whether or not those reviews were reliable, which meant seeing the vacuums in person. We found that Sears carried several of them. So we went there, tested them out, and found that we could do with a Shark NV681. It has a groove down the middle of the rollers so that you can go in with scissors and cut out hair. It seems like that was the best we could do.

Back to Google, to see where we could find the best price. My wife is better at this part, and we were able to wrangle the best deal at Kohl’s.

Two days later, we had a vacuum cleaner.

I can’t do that for every decision. So I’m working to make a toolkit (data scraper, etc.) to make that easier. Part 2 will have a description of the toolkit. But here are the criteria:

Filtering: I have to be able to separate the wheat from the chaff. Sometimes reviews are paid for, and sometimes they don’t really say anything substantive. I need a good representative sample of 1-5 star reviews. Other times I have budgetary or other requirements that mean I can place hard cuts on the search before I even start looking.

Prioritization: Sometimes I care about specifications, sometimes I need the reviews. Sometimes I want a purple one. I need a toolkit that can let me rank how important various aspects of the data are.

Categorization: I would like a solution that can cluster objects of a similar ilk.

Visualization: Most importantly, I need to be able to intuit what the data says. This is by far the hardest thing to automate because you often have to do the other things first before you can create a method of visualization. Therefore, I need something which is…

Interactive and flexible: Having a tool which is independent and just works is a bit of a pipe dream. The system will rely on having my dynamic input and intellect in order to build the digest and summary. My brain’s ability to recognize patterns is unparalleled. Even if it weren’t, I wouldn’t be comfortable with an answer unless I came to the conclusion myself.
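
As a starting point, the filtering and prioritization criteria above could be sketched like this. It is only a guess at the shape of the thing, not the toolkit itself: the function name `shortlist` and the field names in the usage note are hypothetical, and it assumes every feature has already been scaled to a comparable range.

```python
def shortlist(items, hard_cuts, weights):
    """Drop items that fail any hard cut, then rank the rest.

    items     -- list of dicts describing candidate products
    hard_cuts -- list of functions item -> bool (budget limits, etc.)
    weights   -- dict of feature name -> importance; a missing feature
                 counts as 0
    Returns the surviving items, best score first.
    """
    # Filtering: apply every hard cut before any ranking happens.
    kept = [it for it in items if all(cut(it) for cut in hard_cuts)]

    # Prioritization: a weighted sum says how much each aspect matters.
    def score(it):
        return sum(w * it.get(name, 0.0) for name, w in weights.items())

    return sorted(kept, key=score, reverse=True)
```

For example, a hard cut of `lambda it: it["price"] <= 200` with weights `{"rating": 1.0, "hair_handling": 2.0}` would throw out anything over budget and then favor how well a vacuum deals with hair over its star rating. Changing the weights and re-running is the interactive part.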

Before part 2 comes: Does anyone out there have ideas? Tell me in the comments how you make decisions.

 

Hello world!

This blog contains lessons learned and interesting things (to me anyway) discovered in the course of exploring the universe.

I have borne witness to things a billion times smaller than a bacterium and seen the darkness beyond the dimmest star.

The world we live in is amazing, and I would like to share it with you.