Becoming Multilingual


As I stated in my 2018 R/Pharma presentation, “Becoming Bilingual in SAS and R,” I believe in solving problems with different data science tools. This talk is about my team’s efforts to use three of them (SAS, R, and Python) to harmonize data from 10+ clinical studies and build a robust, automated data mart that will eventually integrate biomarker data from clinical studies and real-world data (RWD). (1) The SAS data dictionary and ODS are used first, for two reasons: the ADaM datasets are in sas7bdat format, and the data dictionary and ODS are powerful tools for which R and Python have no well-established equivalent package. (2) R is used for its visualization power, for Shiny, and for RStudio’s reticulate package, which integrates Python into R projects. (3) Python is used for its fuzzywuzzy package and, potentially, the NLTK package. In this project we were particularly impressed by RStudio’s work on seamlessly integrating Python tools into R projects. The project showcases a use case for combining the three programming languages in the clinical data integration space. It also provides a proof of concept (POC) for integrating Kite internal data with external data and RWD. Finally, it is forward-looking in that it prepares us to handle the wearable-device data that innovative technology and precision medicine will bring to the oncology treatment scene.
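
To make the harmonization step concrete, here is a minimal sketch of fuzzy-matching study-specific column names to a standard variable list. It is an illustration only, not the team's implementation: the abstract names the fuzzywuzzy package, while this sketch substitutes Python's standard-library difflib so that it runs without extra installs. The standard names, the study column names, the helper functions, and the 0.6 cutoff are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name, standard_names, cutoff=0.6):
    """Return the most similar standard name, or None if below the cutoff."""
    # Score the candidate against every standard name and keep the best one.
    score, match = max((similarity(name, s), s) for s in standard_names)
    return match if score >= cutoff else None

# Hypothetical standard variable names and one study's column names.
STANDARD = ["USUBJID", "AGE", "SEX", "TRTSDT"]
study_columns = ["usubjid", "AGEYR", "Sex", "visit_note"]

# Map each study column to a standard name; unmatched columns map to None
# so they can be flagged for manual review.
mapping = {col: best_match(col, STANDARD) for col in study_columns}
print(mapping)
```

Columns that map to None would be candidates for manual review rather than automatic renaming. In a mixed-language project like the one described, a helper of this kind could in principle be called from R through reticulate.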

Presented at 2020 Conference