

Big Data - Overload
Michael Gibbs

Big Data - Overload (continued)

Even so, the promise of all that data now encourages researchers to go where they might have feared to go before.

Brian Caffo, PhD, associate professor of Biostatistics, recently led a Johns Hopkins team in a competition to use neuroimaging data to predict ADHD diagnoses. The organizers of the ADHD-200 Global Competition gave Caffo’s team, and 20 other academic teams, structural and functional MRI data on 700 children to use in training their image-data-crunching algorithms. Then the teams were asked to use their algorithms to determine which of 200 new subjects had been diagnosed with ADHD.

One key to dealing with today's ultra-large datasets is knowing what to leave out, says biostatistician Brian Caffo.

“With multiple images per subject and multiple processing stages, we ended up handling trillions of bytes of data,” Caffo says. “But the predictive value of the imaging data turned out to be weak.” (In fact, a slightly higher-scoring algorithm devised by a University of Alberta team relied entirely on the handful of non-imaging variables provided, such as IQ, gender and age, and was disqualified by the judges for failing to adhere to the spirit of the competition.)

Knowing what to leave out is definitely a part of the challenge of big datasets, Caffo says.

Bandeen-Roche couldn’t agree more. “Sound statistical thinking is as needed or even more needed than ever to assure that what comes out of these tremendous technological resources are really valuable, valid findings,” she says.

Also needed more than ever, as these big-data challenges increase, are biostatisticians themselves. “The demand these days is always greater than the supply,” says Caffo. “In fact, statistics is often rebranded as something else—sabermetrics [baseball stat analysis] and Web analytics are two examples—in part because our field doesn’t produce enough people to fill the need.”

The intense math training needed and the esoteric lingo (“Granger causality,” “Markov models,” “Pearson’s chi-squared test” and so forth) probably have something to do with it. “We’re also poorly branded,” Caffo says. “Biostatistics is actually one of the most exciting fields to go into right now.”

