Do you believe this number?

Do you believe this number at face value and without question? “40% of the USA’s coronavirus deaths could have been prevented.”

I hope not. Even if you think you will likely agree with the headline that is all over my news feed today, this statement should give you pause. At the very least, I would ask you to stop and think for just a moment. What does that number mean? How was it calculated? Should I care? Is this a call to any action?

Above all, when you read such articles, please ask: where can I find more information? If you read nothing else from my blog, read the source and try to understand it; I think it has some very interesting food for thought. You have to register on the site, but the article is free.

I first want to critique some of the news coverage. I was specifically disappointed in USA Today, which only named the Lancet Commission report on “Public Policy and Health in the Trump Era.” I thought Newsweek was better because its article quoted the relevant section of the report directly. I was most pleased with ABC’s write-up because they provided a link to the report and the title of their article was less sensationalized.

Let me just walk through how I engaged with this material this evening. As I first read the news articles, these are the things that smelled funny to me.

  1. I was on alert because 40% is a round number, given without any uncertainty. 
  2. ‘Could have been prevented.’ I was very unsatisfied that this phrase was not well defined in several articles. How do they know these deaths were preventable? On its own, that doesn’t make sense. 
  3. I was also wary because the date range over which this data was collected was not specifically mentioned. 

So, I wanted to look at the report more closely. 

The main downside is that the report is a 49-page document. While I genuinely think it has some valuable insight, that is a lot for someone to sit down and read. But even if all you did was look at the first 2 of the 11 charts in this report and understand what they indicate, I think you would have a better understanding of the 40% number and what it really means. You might come to the same conclusion that I did: the importance of this report is not just its critique of recent policies, but that there are deep and ever-worsening health issues for Americans as compared to the rest of the world.

The first figure shows life expectancy in the US compared to the other G7 countries (Japan, Italy, France, Canada, the UK, and Germany) and, spoilers, it isn’t great. Since the mid-1980s, life expectancy in the US has been shorter than in all of these other developed countries; it has stayed at the bottom of the list, and the gap has only widened. 

The second, similar figure shows the number of deaths above the average of the six other G7 countries. This plot shows that since 2014 there has been an excess of roughly 400,000 deaths in this country every year if you compare US health statistics to other parts of the world. It also shows that this is simply the current state of things at the end of a 40-year cultural inheritance of poorer health. 

So after reading this far, I began to understand that the “40% preventable COVID deaths” figure compares our loss of life to other countries and implies a causal claim: had we enacted better policies, similar to those enacted by other countries, more lives would have been saved. 

This claim is suspect at best. Only by actually stopping, thinking for just a moment, and reading this report can one question that conclusion. Consider the alternative: if the US had been as healthy as the rest of the world prior to the pandemic but had a 40% higher death rate during it, we could justifiably blame recent policy, dismissal, and inaction for those deaths. Instead, we have to reconcile the fact that health in the US has been in decline for a long time and talk about how to change it. There are some recommendations in this report. 

The point is, one could dismiss this 40% number as politically prejudicial, one could accept it with grim resignation because it aligns with one’s expectations of the former president’s policies, or one could recognize that we have a problem in the culture of American health and healthcare and consider what can be done.

Please be well and stay safe,

N

 

A little about luck

Four Leaf
Who knew that finding such things was a skill?

 

I have an ability to find four-leaf and higher-multiplicity clovers. This may be somewhat related to my Scotch-Irish heritage but, I think, not necessarily so. I definitely don’t think it has anything to do with luck.

You see, I was taught how to do this, and I would be very interested to hear (comment on this post!) whether my instructions here are enough for you to find your own. The tradition I was taught, by the way, is to keep the four-leaf clovers, but to give away any five- or six-leaf clovers you find. 

My grandmother taught me how to find these beautiful mutants. She passed away last week, and writing this post is a bit of a tribute to her legacy in my life. She showed me my first one and then taught me to hold the pattern of a four-leaf clover in my mind, to scan as many clovers as possible, and to stop only if I saw something that looked similar enough. Yes, one might overlook an oddity in this process. But there is a useful heuristic here: most of the clovers you look at are normal, and the more time you spend looking at them, the more time you waste. 

So, succinctly put, if you would like to try:

  1. Memorize the form of a four-leaf clover.
  2. Scan the clover patches quickly.
  3. Stop if you see anything that looks out of place (most of the time this will be 2 clovers growing close to each other).
  4. If you find one, there are likely others nearby. 

To one who doesn’t know the trick, this will look, after a time, like wondrous luck. You will be walking along in a park and you will simply stop and pick up a treasure that many have never encountered, because they haven’t looked. 

There are many lessons that I see in this, not the least of which is the adage, “Fortune favors the prepared.” But mostly I see the lesson that pattern recognition is a powerful brain mechanism and a primary source of cognitive bias. My brain’s ability to recognize patterns allows me to quickly and efficiently reject irrelevant information and find something valuable, whatever that may be. The flip side is that if you rely on patterns 100% of the time, you may miss out on something special. As with the clover, you may have gone your whole life never having seen one, and you may think this means you ought not to bother trying, because they must be very rare or only lucky people find them. This may not be true.

This is also true in scientific research and data analysis. Typically, if I bring what I know to bear on a problem I’m trying to solve and focus only on what I care about, what I find will be of better quality. Except that ‘what I know’ may not be true all the time, so I should check, and others should check as well. This is, thankfully, the scientific process. I had an illustration of this in my programming exercise today. I wanted to find the number of digits in the binary representation of several numbers. There are two ways I tested in Python (with numpy imported as np):

len(np.base_repr(np.random.randint(0,2**60-1),2))

and

np.floor(np.log2(np.random.randint(0, 2**60 - 1))) + 1

The first does the conversion to binary (base_repr) and then finds the length (len). The second takes the base-2 logarithm, rounds it down (floor), and adds one, since a positive integer n has floor(log2(n)) + 1 binary digits (0 is the special case: its representation ‘0’ has length 1, which the first method handles but the logarithm does not). 

Each of these took about the same amount of time: 0.09 s on average using timeit on my machine. It may be a lot clearer to someone else reading my code that I wanted the length of the binary string if I use the first version, and I might also want the binary string itself later for some reason. So, go with that. While just calculating the answer I am interested in is typically faster, that is not always helpful and might mean I have to redo something later. Each of these approaches might also perform differently in a different context or on a different computer. This is why ‘best-practices’ or ‘one-size-fits-all’ solutions should always be debatable and revisited every once in a while when a new context shows itself. 
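For reference, this is roughly how I compared them with timeit (the repetition count and the absolute times will vary from machine to machine):

import timeit

setup = "import numpy as np"
first = "len(np.base_repr(np.random.randint(0, 2**60 - 1), 2))"
second = "np.floor(np.log2(np.random.randint(0, 2**60 - 1))) + 1"

# run each statement many times and print the total wall time
print(timeit.timeit(first, setup=setup, number=100_000))
print(timeit.timeit(second, setup=setup, number=100_000))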

Be well, stay safe

N

As a side note: I’ve often wondered why there aren’t more four-leaf clovers. Does the added benefit of extra energy collection not outweigh the cost of supporting the extra leaf? Why three? Is this a packing problem, such that the leaves start to overlap beyond that? If you know, or are also curious, drop a comment.

 

Asking the right questions

Have you ever had someone tell you: “Oh. That’s a good question.”

If you haven’t, you should try it. It is quite nice. But which of these do you think are good questions?

1. ”What does this button do?”

If your response to number 1 is to push the button, you might be about to have a bad day. But if you are honestly asking that question of an expert on the operation of said button, it is the best question you could be asking. This is the truth behind Newton’s statement, “If I have seen farther it is because I have stood on the shoulders of giants.” Ask a subject matter expert: good question. Follow it up with questions like “What should I see when I push this button?” and “What do I do if I don’t see that?”

2. “Does this <thing I am doing> even matter?”

This value-based, pragmatic, critical-thinking question is at the heart of all good questions. It keeps you motivated if the answer is yes, and if the answer is no, it helps you spend your time more wisely. Good question. Keep asking, “Is this important to me personally or professionally?” or “Is this important to some stakeholder in the <thing I am doing>?” That stakeholder could be your wife, your employer, etc.

Cost-benefit or risk-reward analysis is also super important here. In science, statistics, and data analysis, one can usually come up with a new angle or question to ask of the data, or keep trying to squeeze out every ounce of precision available. For some data sets this matters. Let’s say there is a relatively straightforward way to be 90% confident in the insight from some data. If you want 99%, it is going to take 10-100x more effort to nail down that last little bit. A good question asks, “Is it worth it?” If you see my earlier post about shopping for a vacuum cleaner, it was worth it in that case. 

3. ”When did you stop beating your wife?”

This is a classic example of a leading question. It may serve its purpose in interrogation or parenting, but it is devastating in data analysis or decision making. If you want honest inquiry into a subject, you have to work very hard to combat biases. This is really challenging, because part of a scientific mindset is to hypothesize first: make a guess and check that it works. That mindset very easily leads to confirmation bias if one is not careful to follow it up with the next question.

4. ”What are the chances I could be wrong?”

Great question. The answer is non-zero. Failure is always an option, but it can be glorious ‘failure’ that leads to new insight, inventions, and, eventually, the right answer. This is the philosophical success of Bayesian reasoning: my confidence can approach but never reach 100%. The trouble is, this mindset needs to be practiced rigorously. It is much more natural to round off likelihoods and think in terms of 0%, 50%, or 100% with no nuance in between. 
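As a toy illustration of that idea (all the numbers here are made up), repeated Bayesian updates with evidence that favors a hypothesis 10-to-1 push the posterior toward, but never onto, 100%:

prior = 0.5
likelihood_ratio = 10.0  # made-up strength of each piece of evidence
odds = prior / (1 - prior)
for n in range(1, 6):
    odds *= likelihood_ratio
    posterior = odds / (1 + odds)
    print(f"after {n} observations: {posterior:.6f}")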

5. ”What is the tallest mountain in Europe?”

This kind of academic or quiz question has its place in the classroom, where subject matter expertise needs to be assessed. Even there, though, this kind of question doesn’t promote creativity or honest inquiry about a subject. In a professional setting, there is also the kind of question a person asks when they already know the answer and just want to show off. Worse, instead of simply making a comment, they ask in a way designed to make the speaker squirm. For my money, this is the worst kind of question.

N

One Day Build – Life Expectancy Comparison a la SQL and Python

I was inspired today to continue learning. Thanks to some folks over at Penny University (pennyuniversity.org), I found a quick little learning opportunity. I am focused this month on learning some more skills in SQL and in JS. A list and map of life expectancy data was posted to our Slack channel (wikipedia.org/wiki/List_of_U.S._states_and_territories_by_life_expectancy) and showed a very interesting level of detail. County and census-tract level analysis had been done for the first time (https://www.pnas.org/content/117/30/17688), and that paper indicated that life expectancy varies at an even smaller, very local scale in the US. So, that was a neat read. But I noticed two of the tables in the Wikipedia article: one that breaks these numbers down by state and another by the 50 largest US cities. My guess was that the largest population centers in the US have a large effect on the data of the states in which they are located. This looked like an opportunity to try to do two things:

  1. Get a statistic for how similar the city data and the state data are for states where the 50 largest cities are located.
  2. Plot the same in Python and hopefully get the same answer.

Half of this build was spent just cleaning up the 5 data sets I grabbed related to this idea, 3 from Wikipedia and 2 from the CDC. I’ll probably keep playing with this, so it was neat to try taking 5 .csv files and dumping them into a MySQL database. I got to remind myself how to regex find-and-replace with back references in gedit, and that was kind of terrible, but with the datasets cleaned up sufficiently I could make a script for building the tables in a new database (no really, this took a while). I learned today about mysqlimport and its --local option, and I learned another way via ‘LOAD DATA INFILE’. I learned the use of ‘FIELDS TERMINATED BY’ because some of my files were tab separated and some were comma separated.

My first gotcha came when I learned about DECIMAL declarations for the numerical fields I wanted to import. DECIMAL is effectively an integer by default (a bare DECIMAL means DECIMAL(10,0)), which was weird to me: it gives you a rounded value until you tell it specifically how many digits there are in total and how many come after the decimal point. 

Then there was some more back and forth deleting and loading the data while checking on the warning messages. This was pretty straightforward. 

Then I had to come up with the interestingly complicated SQL call below:

select avg(d) as avgd
from (select avg(s.LE2018) - avg(c.LE2018) as d
      from LEByState1018 s join LEByCity c on s.State = c.State
      group by s.State) t;

+----------------+
| avgd           |
+----------------+
| 0.037269585161 |
+----------------+
1 row in set (0.01 sec)

I want the states that are common to the two tables. I do that with ‘from table1 alias1 join table2 alias2 on condition’; then I can take the average over the grouping by state. The trickiest part for me was realizing I needed avg(s.LE2018), because each state has only one entry in that table, but the query complains if a selected column is not aggregated within the group. Then I have to make sure to name everything, and it works. I have my answer: 0.037 years, or roughly 13.5 days, is the amount by which large cities differ from their states’ averages, on average.

But what is the breakdown? I want to see this stuff plotted. So I will see what I can do in Python. There was a lovely post at https://plotly.com/python/v3/graph-data-from-mysql-database-in-python/. With the MySQLdb library and pandas I was able to drop my SQL tables into some dataframes with 4 lines of code, easily implemented with no surprises.
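A minimal sketch of that step (the connection details are placeholders from my setup; the table names are the ones in the query above):

import MySQLdb
import pandas as pd

# placeholder host, credentials, and database name
conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="lifeexp")
states = pd.read_sql("SELECT * FROM LEByState1018", conn)
cities = pd.read_sql("SELECT * FROM LEByCity", conn)
conn.close()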

Comparison of 2018 Data for States and 50 largest US Cities for Life Expectancy. wikipedia.org/wiki/List_of_U.S._states_and_territories_by_life_expectancy copied 10-8-2020

 

I made some plots and could see that, indeed, in many cases there is a close correspondence. The abscissa doesn’t mean a whole lot, but you can see the main point: everything trends pretty close to the mean, within about a year and a half either way. The obvious point of note is Virginia Beach having a six-year lower life expectancy than Virginia as a whole. My guess is that there is a skew towards DC, but who knows. If I decide to go deeper on this, I might try to recreate the data from the PNAS paper as a next little challenge. 

Things that ‘shouldn’t’ work

Virtual learning is a challenge for everyone this year. My wife is a teacher and is struggling in that boat. She encountered a scenario which was very frustrating for her and which reminded me of a very important and often overlooked point of problem solving. 

She was on the phone with administrators and technical support for her virtual learning platform for several hours because she couldn’t log in. They tried password resets; support staff tried to reproduce the failure on a different computer and couldn’t (they logged in just fine), while my wife could not log in on a different computer and others could on hers. All to no avail. After trying many things, the third-tier technical support professional finally offered to give her a new username. He said, “This shouldn’t work, but let’s just try it.” Lo and behold, it worked. A spelling error in the database for her email (used as her username) had kept her from logging in, because she had been typing her email spelled correctly. The company and the others trying to help had been copying her misspelled username from the database, which is why it worked for them. Frustrating, to say the least, that so many had not double-checked the spelling of “schoos” instead of “schools” in the email. 

The idea that I want to promote here is not the Sherlock Holmes quote, “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.” (That quote shows up in at least six of Doyle’s books.) I am also not promoting the idea of trying to break the things that you have made in order to test them, although that is also a very good idea. Nor am I saying to do nothing but question what has already been well established; if we do that, we never progress. 

No, rather I want to promote the idea of perpetual self-reproach and self-questioning. To be a good problem solver, one should have a practice of trying things that shouldn’t work. One should try the things one knows in advance should fail, because HOW they fail is so very important. 

This is fundamental to nuclear physics, where we learn so much about nature from watching things decay, explode, collide, and reveal their constituent components. It is vital in life to ask myself, “What are my preconceived notions?” or “Have I been rude or prejudiced today?” or “Is there anything that I can do today to be more considerate of the people around me?” 

I find my faith life helps me here. I have a spiritual practice of knowing I am redeemed in my life. That I am capable of doing some good, because there is one who is good who lives for me and in me. I know that “all have sinned”, but that I am “free from condemnation”, to use some quotes from Romans. So, I can approach myself continually, loving being wrong. I can confidently ask myself, “Am I evil?”, and I can answer, “Yeah, probably.” Then I can take a deep breath, recognize the very spirit of God in every breath and heartbeat, and be better today than I was yesterday without being crippled by shame.

I can learn. I learned today that it may not be unfortunate that style often wins over substance (this thanks to MC in a learning community that I’m a part of called Penny University (pennyuniversity.org)). Maybe style is simply a vital part of substance. 

The point is this: I shouldn’t wait to prove all the other things impossible before I consider the improbable. I should build a life where I question myself and my most cherished ideas. I’m convinced that taking old and well-practiced ideas and converting them into new and better ones is the hardest thing anyone can ever do. When it happens, though, it is a revolution, and new ideas have the power to solve all kinds of problems. 

Be well, Stay safe

N

The wonder of invisible things

Today I will write about one of many aspects of learning and problem solving in general that I’m very passionate about. 

Invisible things make the world go round. I don’t just mean that the basic building blocks of our universe are invisible; as a physicist, you can assume I also mean that. But there is an aspect of problem solving that invokes something that isn’t just unseen by the naked eye, it is truly fictional. I’m talking about cases like dropping a perpendicular line in geometry. In any given geometry problem, there are an infinite number of lines you can draw, and most of them are decidedly unhelpful. However, a particular choice (usually a perpendicular line) that comes with a constraint unlocks the next step and allows you to say something new about the system. An iterative process of this kind bootstraps knowledge together in such a way as to unlock the whole problem. This is wondrous to me. 

It doesn’t stop there. Physics is rife with such problems. The brachistochrone problem was one of many like this for me in graduate school; problem solving in that system was unlocked by a particular choice of coordinates. For other problems the key was the ‘right’ choice of basis in a quantum mechanics course, the ‘right’ one of course being revealed by the fact that it made the problem tractable at all. The common frustration between me and my peers was, ‘How do you know in advance what to choose?’ The response was a disappointing, ‘You can’t always know.’ Progress is made by an arbitrary choice of something abstract, something fictional, or something invisible. 

More recently, I found this with a counting problem. I wanted to index the differences of the elements of a vector with itself: position 0 minus positions (1, 2, 3, …), then position 1 minus positions (2, 3, …), and so on. It is pretty easy to set up two loops over these indices. In Python:

for i in range(0, maxi - 1):
    for j in range(i + 1, maxi):
        pass  # compute the difference between element i and element j here

With maxi being the total length of the vector of interest. This traverses something like the upper triangle of a matrix (above the diagonal). Well, then I wanted a counter in this loop that goes (0, 1, 2, 3, 4, …). It didn’t take too long for me to realize that triangular numbers (n(n+1)/2) were involved in the indexing, but the specific form was a little elusive.

I started from what I could see (the elements of the vector) and counted from the bottom up of a smaller version of that triangular table. I reasoned that if I could count 1, 2, 3 from the bottom up, simple subtraction from the max value counts me down. I also used a good problem-solving step: I reduced my issue to something smaller but equivalent and worked it out on a whiteboard instead of on the larger vectors I was actually interested in. I got to the end, redid it because I had made some mistakes, back-substituted some new variables I had created, and realized that the final form pointed to a much simpler derivation than all the steps I had taken to get there.

Had I, from the beginning, imagined a square matrix with indices i, j (row, column) and imagined subtracting off the part of each row that I skip, I could have almost just written down j + i*maxi - (1/2)(i+1)(i+2). The term j + i*maxi counts through the full square, and (1/2)(i+1)(i+2) is a triangular number counting the skipped entries (the lower triangle, including the diagonal) that get subtracted off. But in truth, that part doesn’t exist. It is invisible. It is fictional, and it absolutely unlocks the solution to the problem at hand.
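A quick brute-force check of that closed form (maxi here is just a small test size):

maxi = 7  # small test size
counter = 0
for i in range(0, maxi - 1):
    for j in range(i + 1, maxi):
        # the running counter should match the closed-form expression
        assert counter == j + i * maxi - (i + 1) * (i + 2) // 2
        counter += 1
print("formula checks out for maxi =", maxi)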

It is incredible to me that I have made a career out of solving problems by looking at invisible things. 

Be well, stay safe

N


One day build – Scheduler

If I have to do anything more than twice, I’m likely to make a script to help. 

Like so many others, we are keeping to ourselves these days, and it was helpful to make a scheduler pop-up to tell me when to move my child on to the next task during the day. Enter my super easy one day build. 

The relevant library was notify2 in Python. It tells my OS to pop up a notification and send me a message.

import notify2
from playsound import playsound as ps
import schedule
import time

Then I set the important variables and initialize the relevant objects. These are a Notification object and a sound file name string. 

notify2.init("MyName")
n = notify2.Notification(None)
n.set_urgency(notify2.URGENCY_CRITICAL)
n.set_timeout(5000)
song_file = "gw151226.mp3" 

It plays a gravitational wave chirp. Quite nice. Then, I define one of several ‘jobs’.

def job1():
    n.update("message!")
    n.show()
    ps(song_file)

where ps is playsound.playsound from the import line. The job is then registered with some variant of a call to the ‘schedule’ library: 

schedule.every().day.at("08:30").do(job1)
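One detail worth spelling out: schedule only fires jobs when run_pending() is polled, so the script needs a loop along these lines (the 30-second sleep is an arbitrary choice on my part):

while True:
    schedule.run_pending()
    time.sleep(30)  # polling interval; any small value works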

And that’s all there is. 

Be well, stay safe, wear a mask. 

N

 

On the Date of State Of The Union

As I sat at lunch today, I noticed that the State of the Union is tonight. I thought, ‘Wow! That’s late, isn’t it?’ But I wasn’t sure.

So, I did the math. Here is the result:

If we look back at the dates since 1910, the current date is 1.3σ away from that mean. Furthermore, for the more recent era (since 1980), with its later dates and smaller variance, tonight is still only 1.5σ away from the mean. If these fluctuations were random, anything within 3σ would be unremarkable, and the σ of 6 for this time period is exactly what you would expect for random fluctuation about some average.
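For anyone who wants to redo the arithmetic, the comparison is just a z-score. The day-of-year values below are placeholders for illustration, not the actual State of the Union dates I used:

import numpy as np

dates = np.array([20, 28, 35, 29, 41, 33, 30, 36])  # placeholder day-of-year values
z = (46 - dates.mean()) / dates.std(ddof=1)          # 46 stands in for this year's date
print(f"{z:.1f} sigma from the mean")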

The only interesting thing I noticed was that the earlier average (around 1940) seemed to shift, between 1960 and 1980, to a new, later average.

Does anyone know of a reason why?

One Day Build (House Hunting)

We’re in the middle of house hunting in Nashville (which is booming).

So, I wanted a tool that would help me vet addresses as they popped up in my feed from our realtor. We wanted to be able to walk to some things in our neighborhood, and fortunately that criterion is a pretty easy thing to map out by hand in Inkscape with screenshots from Google Maps. I was able to produce a PNG map with those regions clearly identified. For those who remember the 2010 floods, waterways are something to be wary of, so I pulled a map from FEMA (maps.nashville.gov) and overlaid it with a transparency by hand, easily enough. I could also have tried to overlay crime maps and offender registries, but this was sufficient for triage.

Now I just needed a converter (mapper) between the x, y coordinates in Inkscape and GPS coordinates. I used PIL (the Python Imaging Library, aka Pillow) to draw on the picture I had created. The converter looks like this:

def mapper(lat, long):
    # first attempt: Inkscape pixel coordinates (needed a y-axis flip)
    # scale = [4528.104575164487, -162821.9551111503, 3633.747495444875, 315924.9870280141]
    # return (long*scale[2] + scale[3], flip - (lat*scale[0] + scale[1]))
    scale = [-5228.758169935899, 189234.5555556009, 4041.34431458903, 351362.65809204464]
    return (long*scale[2] + scale[3], lat*scale[0] + scale[1])

There are two attempts here because the conversion from GPS coordinates to Inkscape coordinates is unfortunately not the same as the conversion from GPS to Pillow coordinates.

I derived this via two “calibration” points on my map and their respective coordinates in Pillow.

given = [[36.039065, -86.782672], [36.042890, -86.606493]]  # (lat, long) of the two calibration points
res = [[644, 795], [1356, 775]]                             # their (x, y) pixel positions in Pillow
a = (given[1][0] - given[0][0]) / (res[1][1] - res[0][1])   # degrees latitude per pixel in y
b = -a * res[0][1] + given[0][0]                            # so that lat = a*y + b
c = (given[1][1] - given[0][1]) / (res[1][0] - res[0][0])   # degrees longitude per pixel in x
d = -c * res[0][0] + given[0][1]                            # so that long = c*x + d
scale = [1/a, -b/a, 1/c, -d/c]                              # inverted: pixels from lat/long

This finds the slope and offset for the two directions, latitude and longitude, based on four numbers, and it is exact (though potentially off by a little depending on the accuracy of my calibration points). I should have picked my points better, because 36.039 is not very different from 36.042. Oh well. In the end it worked. Then I just hardcoded the values of the variable scale into the function mapper.
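As a quick sanity check, feeding the first calibration point back through mapper should land on its pixel coordinates:

print(mapper(36.039065, -86.782672))  # comes out to roughly (644.0, 795.0)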

I have my x,y coordinates from latitude and longitude. Now I want to draw on my map.

from PIL import Image, ImageDraw

def drawer(coords):
    im = Image.open("pillow.png")
    draw = ImageDraw.ImageDraw(im)
    flip = im.size[-1]  # leftover from the Inkscape version of mapper; unused now
    for pair in coords:
        # draw a small box around each (lat, long) point; +/- 0.001 degrees sets the dot size
        vec = [mapper(pair[0] + 0.001, pair[1] - 0.001),
               mapper(pair[0] - 0.001, pair[1] + 0.001)]
        print(vec)
        draw.ellipse(vec, fill=100)
    im.show()

It makes all the points the same color, which makes it difficult to judge multiple new points on one figure, but this was sufficient for my purposes. The plus and minus 0.001 was found by trial and error to make correctly sized dots on the map.
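Vetting a listing is then just a call with its GPS pair (the coordinates below are made up for illustration):

drawer([[36.0400, -86.7800]])  # hypothetical listing location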

The tool was just for me, but my wife also appreciated that we could use it to quickly go through the initial barrage of home listings and weed out the ones that were for sure not going to be of interest.

Not too bad for a few hours of work and most of that was just deciding and drawing out the regions of interest.

Machine Learning for Scientific Applications

Data taken in a scientific context often forms an image. Whether it is a reconstruction of a physical space or simply a graphical representation of data, machine learning and image processing techniques can be of use to a scientist.

In one context, energy sharing between two detectors forms lines of a constant sum.

In this case it was not possible to calibrate the y-axis directly; only the x-axis and the sum could be accurately calibrated. Instead of recursively scanning and re-calibrating the data, one can use a clustering algorithm, k-lines means (a k-means variant that fits lines instead of centroids). The slope of these lines is the negative reciprocal of the slope of the calibration. 
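To give a flavor of what I mean (this is a generic sketch of the k-lines idea, not the analysis code used on the detector data): alternate between assigning each point to its nearest line and refitting each line to its assigned points by total least squares.

import numpy as np

def k_lines(points, k, n_iter=50, seed=0):
    # Cluster 2-D points around k straight lines (a k-means variant for lines).
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)]  # a point on each line
    directions = np.tile([1.0, 0.0], (k, 1))                    # unit direction of each line
    for _ in range(n_iter):
        # squared perpendicular distance of every point to every line
        d = pts[:, None, :] - centers[None, :, :]            # shape (N, k, 2)
        along = np.einsum('nkd,kd->nk', d, directions)       # component along each line
        perp2 = np.einsum('nkd,nkd->nk', d, d) - along ** 2
        labels = perp2.argmin(axis=1)
        # refit each line to its assigned points by total least squares (PCA)
        for i in range(k):
            cluster = pts[labels == i]
            if len(cluster) < 2:
                continue
            centers[i] = cluster.mean(axis=0)
            _, _, vt = np.linalg.svd(cluster - centers[i])
            directions[i] = vt[0]
    return centers, directions, labels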

In this case, the algorithm converged rapidly even for k=3, fewer lines than the k=9 that would be closer to the physically meaningful number of calibration points.

This allowed for a fast-running algorithm that can operate online for stability monitoring.