Intermap: An integrative multiomics approach to generating therapeutic target hypotheses

Abstract

This talk presents a multi-layered, integrative multiomics approach to generating therapeutic target hypotheses. Scientists work daily to help pharmaceutical companies identify potent therapeutics. In some scenarios, biologists develop a therapeutic tool without a specific target in mind and then want a list of potential targets for that tool within a given set of delivery parameters. Generating this list means finding targets that have the appropriate molecular biology characteristics, viable mouse models that recapitulate the human disease phenotypes, and pathologies in the tissues of interest. Combing through all of the relevant databases to do so is very difficult to perform manually, because it requires making recursive decisions, at scale, across the current wealth of biological literature and data. Specifically, it requires simultaneous, propagated joins of annotated entity catalogs (genes, knockout mice, diseases, structured vocabulary terms, etc.) and, orthogonally, recursive filtering of the hierarchical associations between those entities and controlled biomedical vocabularies. To streamline and accelerate this process, we used public data repositories (UniProt, National Center for Biotechnology Information, International Mouse Phenotyping Consortium, Online Mendelian Inheritance in Man), ontologies (Gene Ontology, Mammalian Phenotype Ontology, Human Phenotype Ontology), and their multi-species (mouse, human) entity annotations to populate and index a MySQL relational database and a Neo4j graph database with their descriptive and relational properties.
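As an illustration of the recursive filtering over a term hierarchy combined with a join to entity annotations, the following is a minimal sketch only, not the implementation described in this talk. The ontology, the term identifiers, the gene names, and the function names are all hypothetical placeholders, and the data live in plain Python dictionaries rather than in MySQL or Neo4j.

```python
from collections import deque

# Hypothetical toy hierarchy: parent term -> child terms.
# The identifiers are placeholders, not real ontology term IDs.
ONTOLOGY_CHILDREN = {
    "TERM:root": ["TERM:cardio", "TERM:neuro"],
    "TERM:cardio": ["TERM:heart_size"],
    "TERM:heart_size": [],
    "TERM:neuro": [],
}

# Hypothetical annotations: gene name -> set of directly annotated terms.
GENE_ANNOTATIONS = {
    "GeneA": {"TERM:heart_size"},
    "GeneB": {"TERM:neuro"},
    "GeneC": {"TERM:cardio", "TERM:neuro"},
}


def descendants(term, children):
    """Return the term itself plus every term below it in the hierarchy."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against revisiting shared children
                seen.add(child)
                queue.append(child)
    return seen


def genes_under(term, children, annotations):
    """Join step: keep genes annotated to the term or to any of its descendants."""
    wanted = descendants(term, children)
    return sorted(gene for gene, terms in annotations.items() if terms & wanted)


print(genes_under("TERM:cardio", ONTOLOGY_CHILDREN, GENE_ANNOTATIONS))
# prints ['GeneA', 'GeneC']
```

A query for a broad term also returns genes annotated only to its more specific child terms, which is the behavior that is tedious to reproduce by hand across several databases.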
We then built an API (application programming interface) with the plumber package for R. The API dynamically generates optimized SQL and Neo4j Cypher queries, runs them against the MySQL database (via the RMariaDB package for R) and the Neo4j graph database (via the neo4r package for R), fuses the results across the ingested biomedical repository data, and returns them as parseable JSON objects. Finally, we built a user-friendly Shiny app for constructing and submitting queries via the API and parsing its JSON output. The app provides interactive network visualizations of the queries via the visNetwork package for R, in-depth explanations of how the results were generated, and links to external resources for further relevant scientific data. We delivered this app to fellow scientist collaborators via RStudio Connect, enabling these biologists to leverage high-dimensional, multi-species relationships to identify potential targets within milliseconds.
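The idea of generating parameterized SQL and Cypher statements from user-selected filters, then wrapping the results as JSON, can be sketched roughly as follows. This is an assumption-laden illustration in Python rather than the R/plumber code used in this work: the table names (gene, gene_term), node labels (Gene, Term), relationship types (ANNOTATED_TO, IS_A), and all function names are hypothetical, and no database is actually contacted.

```python
import json


def build_sql(term_ids):
    """Build a parameterized SQL statement for a list of term IDs.

    Table and column names are hypothetical placeholders.
    """
    if not term_ids:
        raise ValueError("at least one term id is required")
    # One placeholder per value, so user input never enters the SQL text.
    placeholders = ", ".join("?" for _ in term_ids)
    statement = (
        "SELECT DISTINCT g.symbol FROM gene g "
        "JOIN gene_term gt ON gt.gene_id = g.id "
        f"WHERE gt.term_id IN ({placeholders})"
    )
    return statement, list(term_ids)


def build_cypher(term_id):
    """Build a parameterized Cypher statement that walks the term hierarchy.

    Labels and relationship types are hypothetical placeholders.
    """
    statement = (
        "MATCH (g:Gene)-[:ANNOTATED_TO]->(:Term)-[:IS_A*0..]->"
        "(t:Term {id: $term_id}) "
        "RETURN DISTINCT g.symbol AS symbol"
    )
    return statement, {"term_id": term_id}


def to_response(kind, statement, params, rows):
    """Bundle the executed query and its rows into a JSON string."""
    payload = {
        "query": {"kind": kind, "statement": statement, "params": params},
        "results": rows,
    }
    return json.dumps(payload, sort_keys=True)


sql, sql_params = build_sql(["TERM:1", "TERM:2"])
cypher, cypher_params = build_cypher("TERM:1")
# The rows here are made-up stand-ins for what a database driver would return.
print(to_response("sql", sql, sql_params, [{"symbol": "GeneA"}]))
```

Returning the statement and its parameters alongside the rows lets a front end explain how each result was produced, in the spirit of the in-app explanations described above.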

Publication
Presented at 2021 Conference