Exploration of large language model-based frameworks to develop R code for output delivery in pharma


Thousands of outputs (tables, graphs, and listings) may need to be generated each year in a pharmaceutical company for filings, external publications, internal readouts, and other activities. Although most of these outputs can be produced by reusing existing code with trial-specific adjustments, the process is still labor-intensive and requires solid knowledge of both the data and the programming. In this proof-of-concept project, we therefore explored the potential of large language model (LLM)-based frameworks to develop R code that produces these outputs from ADaM datasets. GPT-4 Code Interpreter, supplied with uploaded supporting files (template code, a variable dictionary, and function manuals), showed good potential for completing the following tasks in response to a user's natural-language requests: 1) selecting the fit-for-purpose template code; 2) searching the variable dictionary and proposing variables to use; and 3) modifying the template code to filter patients and update the output contents. These results suggest that LLMs are promising assistants for future output generation, with the potential to significantly reduce the labor required and lower the barrier of data and coding knowledge.

Presented at 2023 Conference