Bridging The Gap

Pharmacogenomics and deep learning

At SPARK’s weekly Wednesday night meeting on July 22, Dr. Russ Altman dug into some of his work using machine learning to tackle a problem in clinical care: how to predict, from a person’s genetic makeup, whether they will metabolize a drug properly.

Dr. Altman is the Kenneth Fong Professor of Bioengineering, Genetics, Medicine, Biomedical Data Science and (by courtesy) Computer Science, and past chairman of the Bioengineering Department at Stanford University. His main interests include the application of computing, AI and informatics to medicine, and he’s particularly interested in how those methods apply to drug action at the molecular, cellular, organism and population levels.

Pharmacogenomics is the study of how a person’s genome affects their response to drugs. As pharmacogenetics (the study of individual genes) and pharmacogenomics enter the clinic, they help physicians decide whether a drug should work as expected, whether to change the dose, whether there is an increased chance of toxicity, or whether to use another drug altogether, Dr. Altman said.

Pharmacogenomics is really only useful for common gene variants, Dr. Altman said. On Wednesday night, he presented a problem and a solution his group is working on: rare, unknown variants in a patient’s genes that interfere with drug metabolism, and the use of deep learning to identify these variants. He asked: can we predict a gene’s function from its sequence alone?

“That would allow us to bring clinical pharmacogenetics to patients who have rare variants, whereas today, I have to tell them I’m sorry – I have no idea what your variant means for the dose of this drug.”

For some genes, variants cause the encoded enzyme to metabolize a drug either too poorly or too quickly. For instance, Dr. Altman said, CYP2D6, which encodes a liver enzyme involved in drug metabolism, has 161 observed haplotypes and thousands of unknown variants. CYP2D6 metabolizes drugs including antidepressants, beta blockers, and antipsychotics. Up to 23% of people in the U.S. have a compromised ability to metabolize opioids.

Deep learning methods hold promise for predicting clinically useful phenotypes for novel variations in important genes, Dr. Altman said. Deep learning is a new way to do machine learning that has emerged in the last 5 years. It is built on artificial neural networks, which are loosely modeled on biological ones such as the visual system, where raw data from rods and cones in the retina is integrated through layers of neural cells into higher-level features.

His group examined the UK Biobank database, which holds genomic information for about 500,000 people, to identify the prevalence of rare pharmacogenetic variants in a population. Dr. Altman’s team evaluated 8 key pharmacogenes in UK Biobank exomes, and found that 6.1% of individuals carry one novel deleterious variant, and each person has an average of 12 drugs for which unusual response might be expected.

Dr. Altman also noted that the UK Biobank’s database is predominantly European, while the novel variants observed were highly enriched in non-European populations, meaning there’s a lot of work to do to understand how genes in non-white populations affect drug metabolism.

His group then used transfer learning to train an artificial neural network to recognize CYP2D6 variants. Transfer learning lets you reuse a network that was already trained on one problem by retraining part of it on new data, so it can solve a different problem. Dr. Altman illustrated the idea using cats and dogs: take a neural network that has already been trained on cat data and use dog data to retrain its final layers, so that it can classify pictures of dogs.
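The cats-and-dogs idea can be sketched in a few lines of plain Python. This is a toy illustration only, not the network Dr. Altman’s group built: the TinyNet class, the two made-up labeling rules that stand in for the "cat" and "dog" tasks, and all sizes and learning rates are assumptions chosen for the example. It trains every layer on a plentiful source task, then freezes the hidden layer and refits only the final layer on a small target dataset.

```python
import copy
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One tanh hidden layer feeding a single sigmoid output unit (hypothetical toy model)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def logit_from_hidden(self, h):
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def loss(self, data):
        """Mean binary cross-entropy, computed from logits for numerical stability."""
        total = 0.0
        for x, y in data:
            z = self.logit_from_hidden(self.hidden(x))
            total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
        return total / len(data)

    def train(self, data, epochs, lr, freeze_hidden=False):
        """Full-batch gradient descent; freeze_hidden=True updates only the final layer."""
        n = len(data)
        for _ in range(epochs):
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            g_W1 = [[0.0] * len(row) for row in self.W1]
            g_b1 = [0.0] * len(self.b1)
            for x, y in data:
                h = self.hidden(x)
                err = (sigmoid(self.logit_from_hidden(h)) - y) / n  # d(loss)/d(logit)
                g_b2 += err
                for j, hj in enumerate(h):
                    g_w2[j] += err * hj
                    if freeze_hidden:
                        continue
                    dh = err * self.w2[j] * (1 - hj * hj)  # backprop through tanh
                    g_b1[j] += dh
                    for i, xi in enumerate(x):
                        g_W1[j][i] += dh * xi
            self.w2 = [w - lr * g for w, g in zip(self.w2, g_w2)]
            self.b2 -= lr * g_b2
            if not freeze_hidden:
                self.b1 = [b - lr * g for b, g in zip(self.b1, g_b1)]
                self.W1 = [[w - lr * g for w, g in zip(row, g_row)]
                           for row, g_row in zip(self.W1, g_W1)]


def make_data(rng, n, label_fn):
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(x, 1.0 if label_fn(x) else 0.0) for x in points]


rng = random.Random(0)
source_data = make_data(rng, 200, lambda x: x[0] + x[1] > 0)  # plentiful source ("cat") task
target_data = make_data(rng, 20, lambda x: x[0] - x[1] > 0)   # scarce target ("dog") task

net = TinyNet(n_in=2, n_hidden=4, rng=rng)
net.train(source_data, epochs=200, lr=0.5)  # pretrain every layer on the source task

W1_before = copy.deepcopy(net.W1)
loss_before = net.loss(target_data)
net.train(target_data, epochs=200, lr=0.5, freeze_hidden=True)  # refit the final layer only
loss_after = net.loss(target_data)

print(f"target loss: {loss_before:.3f} -> {loss_after:.3f}")
print("hidden layer unchanged:", net.W1 == W1_before)
```

How much the reused layers help depends on how closely the two tasks are related. The sketch only shows the mechanics of freezing early layers and retraining the last one.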

However, “We do not have enough CYP2D6 gene sequences with known phenotypes to just do a simple straightforward cat training.”

His lab started from activity scores for CYP2D6, which Dr. Altman called “an incredible simple algorithm for predicting phenotype.” They generated 50,000 synthetic CYP2D6 sequences with known CYP2D6 variations embedded in them and estimated each sequence’s activity score. They then trained a model to assign activity scores, and used sparse experimental and database data, plus nucleotide-specific annotation data, to refine the final layers.
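To give a feel for why an activity score counts as a simple algorithm, here is a minimal sketch. It assumes the common convention in which each allele is assigned a value and a person’s score is the sum over their two alleles. The allele values and phenotype cutoffs below are illustrative assumptions, not figures from the talk, and should be checked against current CPIC tables before any real use. The synthetic_examples helper is hypothetical: it labels random diplotypes, whereas the group’s pipeline embedded variants in full sequences.

```python
import random

# Illustrative allele values (assumed for this sketch; verify against current CPIC tables).
ALLELE_VALUE = {
    "*1": 1.0,    # normal function
    "*2": 1.0,    # normal function
    "*10": 0.25,  # decreased function
    "*4": 0.0,    # no function
    "*5": 0.0,    # no function (gene deletion)
    "*1x2": 2.0,  # duplication of a normal-function allele
}


def activity_score(diplotype):
    """Sum the values of the two alleles a person carries."""
    return sum(ALLELE_VALUE[allele] for allele in diplotype)


def phenotype(score):
    """Map a score to a metabolizer class using assumed cutoffs."""
    if score == 0:
        return "poor metabolizer"
    if score <= 1.0:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"


def synthetic_examples(n, seed=0):
    """Hypothetical stand-in for synthetic training data: random diplotypes with score labels."""
    rng = random.Random(seed)
    alleles = sorted(ALLELE_VALUE)
    examples = []
    for _ in range(n):
        diplotype = (rng.choice(alleles), rng.choice(alleles))
        examples.append((diplotype, activity_score(diplotype)))
    return examples


for diplotype in [("*1", "*1"), ("*1", "*4"), ("*4", "*5"), ("*1x2", "*1")]:
    score = activity_score(diplotype)
    print(diplotype, score, phenotype(score))
```

A lookup like this only works for alleles that are already in the table, which is the gap the talk describes: a rare or novel variant has no entry, so a model has to estimate its effect from sequence.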

The team then used the neural network to predict the functions of 71 unknown variants. They also compared the results to a study that characterized 49 new CYP2D6 alleles, and found the model could explain 71% of the variants in the dataset.

For issues such as pharmacogenetics, where there is a deluge of data and a problem to solve, deep learning could prove extremely useful. The algorithm provides “not perfect, but definitely clinically useful information. I could take this and I could make some decisions as a physician even knowing that the performance isn’t perfect,” Dr. Altman said.

“Of course, our hope is with a better dataset, and with better ethnic diversity in that dataset, this will turn out to be quite effective.”

Dr. Altman said his group has ideas to improve the model’s performance, and also wants to build models for genes beyond CYP2D6.

Dr. Altman’s PharmGKB group is involved in the Clinical Pharmacogenetics Implementation Consortium (CPIC), which writes guidelines for physicians on how to use genetic data to dose drugs. Dr. Altman has since opened a consult clinic that advises a patient’s physician on how the patient’s phenotype affects their response to drugs.