Routinely collected healthcare databases generated from insurance claims and electronic health records have tremendous potential to provide information on the real-world effectiveness and safety of medical products. However, unmeasured confounding stemming from non-randomized treatment assignment and poorly measured comorbidities remains the greatest obstacle to using these data sources for real-world evidence generation. To reduce unmeasured confounding, data-driven algorithms can leverage the large volume of information in healthcare databases to identify proxy variables for confounders that are either unknown to the investigator or not directly measured in these data sources (proxy confounder adjustment). Evidence has shown that supplementing investigator-specified variables with data-driven proxy confounder adjustment can improve confounding control compared with adjustment for investigator-specified variables alone. Consequently, there has been a recent explosion in the development of data-driven methods for high-dimensional proxy confounder adjustment. In this talk, I will discuss recent advancements in these methods and their implementation within the R computing environment. I will also discuss the challenges of assessing the validity of alternative analytic choices, an assessment needed to tailor analyses to a given study and so improve the validity and robustness of treatment effect estimates in healthcare databases.