
5 Things You Need to Know About AI

Understanding a powerful new technology’s secrets and nuances.

By Brian W. Simpson • Illustration by Gremlin/Getty Images

AI needs supervision.

AI learns very well when you throw all possible data at it, but you can lose control over what it learns.

There are some infamous examples. In 2018, a radiology resident found something surprising about how a deep learning model was diagnosing cardiomegaly (enlarged heart) from chest X-rays. The sickest patients were X-rayed with a portable system because they were too ill to come in for a regular X-ray. The AI learned to associate the word “portable” (included on every X-ray from a portable system) with cardiomegaly.

The presence of the word “portable” played a significant role in the model’s predictions. But, of course, the model has no way of knowing that the word carries no scientifically meaningful information.
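To make the failure mode concrete, here’s a minimal sketch (not from the original study) of one way to probe a model for this kind of shortcut: blank out the region where the burned-in text appears and see how much the prediction moves. Everything here, including toy_model, is a hypothetical stand-in for a real trained classifier.

```python
import numpy as np

def occlusion_check(model, image: np.ndarray, text_region: tuple) -> float:
    """Return how much the prediction changes when the burned-in
    text region is blanked out of the image."""
    r0, r1, c0, c1 = text_region
    masked = image.copy()
    masked[r0:r1, c0:c1] = image.mean()  # erase the text
    return float(model(image) - model(masked))

# Hypothetical stand-in for a trained cardiomegaly classifier that
# (wrongly) keys on bright pixels where the "PORTABLE" marker sits.
def toy_model(img: np.ndarray) -> float:
    return float(img[:20, :60].mean() / 255.0)

xray = np.zeros((224, 224))
xray[:20, :60] = 255.0  # simulated burned-in "PORTABLE" text
drop = occlusion_check(toy_model, xray, (0, 20, 0, 60))
print(f"Prediction changes by {drop:.2f} when the text is hidden")
```

A large drop suggests the model is reading the label, not the anatomy.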

It shows the value of knowing the science and asking, “Why is AI doing this?”

AI and observational data require skepticism. 

We are now in an era of vast amounts of structured electronic health record data, increasing amounts of claims data, and patient-provided and sensor data. I’m wearing my Fitbit right now.

But the data are still incomplete. Obviously, people who don’t have access to health care aren’t represented in health data, so we miss that entire segment of the population. And maybe that’s where much of the need, pathology, and social problems lie, but they’re invisible to AI algorithms.

We have to understand the fundamental limitations of observational research. It’s not experimentation, and clinical trials remain the gold standard. We can’t afford to do a clinical trial on every question we might want to ask, so we have to use observational data. But we must always be skeptical of results and information that emerge from it.

AI algorithms have a shelf life. 

Data may shift or drift. “Data shift” occurs when an algorithm is applied to a new population and performs poorly. For example, does an algorithm trained on populations in California work in Baltimore?

On the other hand, “data drift” refers to how populations change, potentially leading algorithms to perform worse over time. An example of this would be an algorithm developed in the early 2000s that includes smoking in its risk calculation. Because smoking prevalence and consumption among smokers have declined over the past couple of decades, it no longer predicts risk as accurately as before.
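As a rough illustration (using scipy and synthetic numbers, not real cohort data), a simple two-sample test can flag when a feature’s distribution has moved away from what the model was trained on:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic feature distributions: the training-era cohort vs. today's
# patients (all numbers are illustrative, not from any real study).
train_ages = rng.normal(55, 12, size=5000)
today_ages = rng.normal(61, 12, size=5000)

stat, p = ks_2samp(train_ages, today_ages)
print(f"KS statistic = {stat:.3f}, p = {p:.2e}")
if p < 0.01:
    print("Feature distribution has drifted; recheck model calibration.")
```

A flag like this doesn’t say the model is wrong, only that the world has moved and the model deserves another look.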

Continuously monitoring performance and periodically updating algorithms are important to ensure they keep working in the real world. The overall goal is to make sure AI works for people: Does it help populations both now and in the future?

AI works better with people than on its own.

AI enthusiasts say that a person plus AI will be better than either AI alone or a person alone.

Do I buy that? I honestly do. A car with AI safety functions can warn you to brake, and you can take action based on the situation. The two together work better than a driver without those functions, or than a driverless car.

The concern, of course, is that AI is making great leaps. No one would have trusted a driverless car 10 years ago. Now there are plenty of them on the road. They don’t get tired or drive under the influence, but they do occasionally get into accidents.

For now, though, a person plus AI is probably still better than either alone, and I think that lesson applies to most modern AI systems.

AI is opening the door to new opportunities in genomics. 

The challenge for genomics researchers like me is that we don’t have a lot of labeled data because cells are constantly changing. It’s hard, for example, to label a particular cell type when cells are at different stages of replicating and dividing.

Historically, AI models outside of genomics have relied on labeled data for training. But new models don’t need labeled data. Instead, they use the data itself to create labels. This approach, called “self-supervised learning,” can speed discovery. It can help us identify complex cell types more accurately and efficiently. Eventually, self-supervised learning may make it easier for scientists to see patterns that could help us detect earlier stages of Alzheimer’s, for example.
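Here is a minimal sketch of that idea, using PyTorch and synthetic data; the architecture, masking rate, and numbers are all illustrative rather than drawn from any particular genomics model:

```python
import torch
import torch.nn as nn

# Masked self-supervised pretraining: hide a fraction of each cell's
# gene-expression values and train the network to reconstruct them.
# The hidden values serve as labels, so no human annotation is needed.
n_genes, mask_frac = 200, 0.15
model = nn.Sequential(
    nn.Linear(n_genes, 64), nn.ReLU(), nn.Linear(64, n_genes)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

expression = torch.rand(512, n_genes)  # stand-in for real cell data

for step in range(100):
    mask = torch.rand_like(expression) < mask_frac
    corrupted = expression.masked_fill(mask, 0.0)  # hide masked genes
    recon = model(corrupted)
    # Compute the loss only on the hidden entries: the "labels"
    # come from the data itself.
    loss = ((recon - expression)[mask] ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The representations such a model learns can then be reused for downstream tasks like cell-type identification.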

The field of genomics is starting to rapidly adopt these types of AI models, and I am excited to explore the challenges and opportunities that come with them.