Get Dirty with Data

By Alfred Sommer

Clearly, number-crunching technology makes it possible to do studies that we could never have done before. However, it is very easy now to push a button and lose a lot of insight in the process.

The whole vitamin A–mortality connection … I wasn’t looking for that. I was looking for why some kids get eye disease. If I had asked a statistician to give me the associations for having vitamin A deficiency, I would have seen associations with diet, pneumonia, measles... and published a nice paper about their correlation coefficients. Instead, I looked at the original data: 15 kids had night blindness on round one, and on round two, only four were still around. Hmmm. What happened to those kids? I looked, and the data told me that they were dead.

You’ve got to get into the raw data and feel it, smell it, touch it, think about it and let it lead you, rather than going in with a preconceived notion and pushing a button. Click, done! Yes, you proved something or no, you didn’t. You may miss the really important thing, which has nothing to do with the question you were originally asking but is buried in the data.

Vitamin A is the perfect example: I’m sure I would have missed it if I hadn’t been so deep in the data. I’m absolutely confident things like that are missed every day, because people don’t get dirty with their data.