Industrialized Machine Learning and Explainable AI for Late Phase Trials


Terms like “digitalization”, “machine learning (ML)”, and “artificial intelligence (AI)” are more than just buzzwords these days. Databases are analyzed worldwide with modern algorithms, and entire industries are making data-driven decisions at an ever faster pace. In pharma, it is not enough to get the prediction (the what); the model must also explain how it arrived at the prediction (the why). ML models can only be debugged and audited when they can be interpreted, which in turn enables fairness, robustness, and trust. Presently, however, the amount, complexity, variety, and speed of clinical data run the risk of leaving us knowing less about our compounds than regulatory bodies. While the capabilities of ML and AI have received much attention, their role in clinical development has now moved from the theoretical to the practical application stage. Using industrialized ML/AI tools, we can detect clinically relevant, highly complex safety/efficacy signals that are not identifiable via classical approaches that force hypotheses on the data. By deriving the best hypothesis given the data, ML is currently the best available methodology to create holistic mathematical models of complex (biological) systems using all available data and variables, while complementing findings from classical approaches.

We, the Biomarker & Data Insight Group at Bayer, have developed an ML/AI pipeline in R. The pipeline comprises four core modules (data preprocessing, modeling / hyperparameter tuning, higher-order interaction analysis, and reporting) and uses most of the available data of late-phase trials, covering the standard endpoint types (time-to-event, class, and continuous). Each core module is implemented as its own internal R package that integrates several established R packages (e.g. tidyverse, tidymodels, mlr3, iml, R Markdown, Shiny, …). The pipeline is an industrialized, mature, and validated software product with continuous delivery and continuous deployment.
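As an illustration of what the modeling / hyperparameter-tuning module does, the following minimal sketch tunes a random forest classifier with mlr3 and mlr3tuning. The task, learner, search space, and budget here are placeholder assumptions for illustration, not the pipeline's actual configuration.

```r
# Minimal sketch of a modeling / hyperparameter-tuning step with mlr3.
# All choices below (task, learner, search space, budget) are illustrative
# assumptions, not the production pipeline's settings.
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(paradox)

task    <- tsk("sonar")                                  # stand-in binary classification task
learner <- lrn("classif.ranger", predict_type = "prob")  # random forest learner

# Search space for two random-forest hyperparameters
search_space <- ps(
  mtry      = p_int(lower = 2,   upper = 10),
  num.trees = p_int(lower = 100, upper = 500)
)

# Tuning instance: 3-fold CV, AUC as the objective, 10 random evaluations
instance <- ti(
  task         = task,
  learner      = learner,
  resampling   = rsmp("cv", folds = 3),
  measures     = msr("classif.auc"),
  search_space = search_space,
  terminator   = trm("evals", n_evals = 10)
)

tnr("random_search")$optimize(instance)
instance$result_learner_param_vals  # best hyperparameters found
```

In the actual pipeline this step is wrapped in an internal R package and driven by the preprocessing module's output, so that tuning runs identically across endpoint types.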
Something special about this pipeline is the effort we have made to open the “black box” using explainable AI. With these additional tools, we can better understand why a certain variable is relevant for the prediction, reveal the nature of its relationship with the outcome (monotonic or non-monotonic), and make the ML results more understandable and meaningful for clinicians.
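A sketch of this kind of interpretability analysis, using the iml package named above: permutation feature importance answers which variables matter, and a partial-dependence curve shows whether a variable's effect on the outcome is monotonic. The model and dataset below are illustrative assumptions, not the pipeline's actual inputs.

```r
# Sketch of opening the "black box" with iml: feature importance plus a
# partial-dependence curve. Model (random forest) and data (Boston housing)
# are illustrative stand-ins for the pipeline's models and trial data.
library(iml)
library(randomForest)

data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 100)

# Wrap model and data in a model-agnostic predictor object
predictor <- Predictor$new(rf, data = Boston[, -14], y = Boston$medv)

# Which variables drive the prediction (the "why")?
imp <- FeatureImp$new(predictor, loss = "mae")
plot(imp)

# Nature of the relationship: partial dependence of the outcome on one
# variable, revealing a monotonic or non-monotonic shape
pdp <- FeatureEffect$new(predictor, feature = "lstat", method = "pdp")
plot(pdp)
```

Plots like these give clinicians a direct, visual account of each variable's contribution rather than a bare prediction.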

Presented at 2021 Conference