Artificial Intelligence and Machine Learning for Predictive Human Toxicology
Developing a new therapeutic today costs in excess of $2 billion and can take more than a decade to reach the patient. Of the drugs that enter Phase 1 clinical trials, approximately 10 percent result in new drug applications (NDAs). These high levels of attrition are due largely to a combination of poor safety profiles and inadequate efficacy. Such poor performance results from our limited understanding of disease etiology and of how drugs perturb cellular pathways, and from an inability to accurately predict off-target drug interactions.
We are addressing the issue of high attrition during drug discovery by developing predictive methods for identifying drugs with a high risk of causing tissue injury [Fraser et al. Chem. Res. Toxicol. 31, 412-430 (2018)]. Specifically, we are building models for drug-induced liver injury (DILI). DILI has been identified as one of the primary reasons for clinical trial failure for several compounds across drug classes and disease indications. Our approach takes advantage of machine learning tools to mine and analyze biomedical data across a range of structured and unstructured data warehouses (Figure AI-1). This work builds on our high-throughput in vitro platform for screening drug toxicity and leverages the high-performance computing capabilities available through the Rensselaer Center for Computational Innovation (CCI) and the Cognitive and Immersive Systems Lab (CISL).
Figure AI-1. Computational/experimental paradigm for drug discovery, which includes cognitive systems for exploring new dimensions of biomedical information [Fraser et al. Chem. Res. Toxicol. 31, 412-430 (2018)].
Adverse drug reactions, particularly those that result in drug-induced liver injury (DILI), are a major cause of drug failure in clinical trials and of drug withdrawals. Hepatotoxicity-mediated drug attrition occurs despite substantial investments of time and money in developing cellular assays, animal models, and computational models to predict its occurrence in humans. Underperformance in predicting the hepatotoxicity of drugs and drug candidates has been attributed to gaps in our understanding of the mechanisms that drive hepatic injury after these compounds perfuse and are metabolized by the liver. These gaps are primarily due to the immense spatial, biological, and genetic complexity of the liver. Though many attempts have been made to recapitulate the liver in vitro, including highly complex body-on-a-chip systems, hepatic organoids, and others, none has achieved a sufficient combination of predictivity and throughput.
To experimentally address this weakness, we have employed a high-throughput, three-dimensional cell culture platform containing two cell types to screen a library of 26 small molecule drugs of various mechanisms of action and modes of toxicity [Bruckner et al. AIChE J. 64, 4331-4340 (2018)]. Correlations of in vitro toxicity to in vivo murine toxicity are substantially improved with primary human hepatocytes vs. HepG2 (Figure AI-2). At a murine LD50 cutoff of 300 mg/kg, the calculated predictivity for primary human hepatocytes is 76%, as compared to a calculated predictivity for HepG2 cells of 54%. These results demonstrate that primary human hepatocytes are highly predictive of in vivo outcomes, and the use of the 3D chip platform enables substantial reduction in the number of hepatocytes required for in vitro toxicology studies.
Figure AI-2. Effect of varying LD50 and IC50 toxic cutoff values on sensitivity, specificity, and overall predictivity. ● : Overall Predictivity, ■ : Specificity, ▲ : Sensitivity. Comparisons used for calculations: primary human hepatocytes in vitro vs. murine literature in vivo, and HepG2 in vitro vs. murine literature in vivo [Bruckner et al. AIChE J. 64, 4331-4340 (2018)].
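The cutoff-based comparison described above can be illustrated with a minimal sketch. It assumes that a compound is called toxic in vitro when its IC50 falls at or below a chosen cutoff, toxic in vivo when its murine LD50 falls at or below a chosen cutoff, and that overall predictivity is the fraction of compounds on which the two calls agree. These definitions, the IC50 cutoff, and every compound value below are hypothetical placeholders for illustration, not data or methods taken from the cited study.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    ic50_um: float      # in vitro IC50 in micromolar (hypothetical value)
    ld50_mg_kg: float   # murine LD50 in mg/kg (hypothetical value)


def cutoff_metrics(compounds, ic50_cutoff_um, ld50_cutoff_mg_kg):
    """Compare in vitro toxic calls against in vivo toxic calls at fixed cutoffs."""
    tp = fp = tn = fn = 0
    for c in compounds:
        predicted_toxic = c.ic50_um <= ic50_cutoff_um        # in vitro call
        observed_toxic = c.ld50_mg_kg <= ld50_cutoff_mg_kg   # in vivo call
        if predicted_toxic and observed_toxic:
            tp += 1
        elif predicted_toxic and not observed_toxic:
            fp += 1
        elif not predicted_toxic and observed_toxic:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "predictivity": (tp + tn) / len(compounds) if compounds else nan,
    }


# Made-up compounds, used only to exercise the function.
EXAMPLE_COMPOUNDS = [
    Compound("cmpd_A", ic50_um=10.0, ld50_mg_kg=100.0),
    Compound("cmpd_B", ic50_um=500.0, ld50_mg_kg=2000.0),
    Compound("cmpd_C", ic50_um=50.0, ld50_mg_kg=1000.0),
    Compound("cmpd_D", ic50_um=800.0, ld50_mg_kg=150.0),
    Compound("cmpd_E", ic50_um=20.0, ld50_mg_kg=250.0),
]

if __name__ == "__main__":
    # 300 mg/kg is the LD50 cutoff mentioned in the text; 100 uM is an assumed IC50 cutoff.
    print(cutoff_metrics(EXAMPLE_COMPOUNDS, ic50_cutoff_um=100.0, ld50_cutoff_mg_kg=300.0))
```

Sweeping either cutoff over a range of values and recomputing the three metrics at each step would produce curves of the kind summarized in Figure AI-2.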
Existing drug toxicity data in the literature, while limited, are not without value. Although these data are too numerous for humans to assess manually and holistically, they are an excellent starting point for a machine learning approach to better predict human outcomes. Along these lines, we are currently developing supervised machine learning models to predict human-relevant DILI based on publicly available data repositories such as ToxCast, which contains a wealth of in vitro data. A major gap remains the availability of human toxicity data. Nonetheless, available electronic health records (EHRs) and clinical data can capture some aspects of human toxicity that can be used to predict the toxicity of drug candidates and environmental chemicals. Natural language processing tools can further expand the available human and in vivo data space by extracting adverse effects found in humans and animals from unstructured biomedical research papers and clinical trial results. Computationally, our goal is to leverage machine learning tools to better probe the links among in vitro, in vivo, cheminformatics, and human toxicity data, to extract correlations between and increase concordance among these datasets, and thereby to develop a robust system that accurately predicts human toxicity relying solely on in vitro and chemogenetics data. Our ongoing approach is depicted in Figure AI-3 and exploits data from various available databases, as well as potential results from electronic health records and clinical trials.
Figure AI-3. Combination of various datasets to advance machine learning techniques, including data analytics and semantics, to explore the link between in vitro, in vivo, and human toxicity.
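As a rough illustration of the supervised learning step, the sketch below trains a logistic regression classifier by batch gradient descent on a toy table of in vitro features with binary DILI labels. The text does not specify a model family, so logistic regression is only a simple stand-in, and the two feature columns, their values, and the labels are hypothetical rather than drawn from ToxCast or any other repository.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive (DILI) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi   # derivative of log loss w.r.t. logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical, normalized readouts from two in vitro assays per compound.
X_TOY = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
# Hypothetical labels: 1 = DILI concern, 0 = no DILI concern.
Y_TOY = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(X_TOY, Y_TOY)
    for x, label in zip(X_TOY, Y_TOY):
        print(x, label, round(predict_proba(w, b, x), 3))
```

In practice the feature table would be assembled by joining assay, cheminformatics, and clinical records on a shared compound identifier, and performance would be judged on held-out compounds rather than on the training set used here.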
James Hendler - Rensselaer Polytechnic Institute