Maximizing LLM Potential: How Data Structure and Context Drive Quality Insights

Abstract

Large Language Models (LLMs) promise to revolutionize clinical data analysis, but they struggle with the messy, imperfect datasets common in drug development research. This presentation demonstrates how supplying comprehensive context dramatically improves LLM output quality in clinical applications without extensive data cleaning. By enriching models with data context (field-level metadata, data dictionaries, relational mappings) and use case context (semantic data mapping, study design, domain ontologies), we enable flexible natural language exploration while maintaining quality and consistency. This approach bridges the gap between imperfect human queries and the precise answers needed for pharmaceutical decision-making, revealing insights within complex, real-world datasets.
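
As a rough illustration of what enriching a model with data context and use case context might look like in practice, the sketch below assembles a plain-text prompt from a small data dictionary, a study description, and a table relationship. It is not the presenters' implementation: the `FieldMeta` class, the `build_prompt` function, the field names, and the study description are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """One data-dictionary entry (hypothetical structure)."""
    name: str
    description: str
    unit: str = ""


def build_prompt(question, fields, study_context, joins):
    """Assemble a prompt that places context ahead of the user's question."""
    lines = ["Study context: " + study_context, "", "Data dictionary:"]
    for f in fields:
        # Append the unit only when the dictionary entry provides one.
        unit = f" (unit: {f.unit})" if f.unit else ""
        lines.append(f"- {f.name}: {f.description}{unit}")
    lines += ["", "Table relationships:"]
    lines += [f"- {j}" for j in joins]
    lines += ["", "Question: " + question]
    return "\n".join(lines)


# Illustrative inputs only; none of these values come from the presentation.
fields = [
    FieldMeta("AVAL", "analysis value of the lab test", "mg/dL"),
    FieldMeta("TRT01A", "actual treatment arm"),
]
prompt = build_prompt(
    "Which arm shows the largest mean change?",
    fields,
    "Hypothetical randomized, two-arm study",
    ["lab.USUBJID -> subject.USUBJID"],
)
print(prompt)
```

The resulting string would then be sent to whichever model is in use. Keeping the context in a structured form such as `FieldMeta` means the same dictionary can be reused across many questions without touching the underlying data.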

Type
Publication
Presented at genAI Day 2025
