Why We Need to Improve Software Engineering in Biostatistics: A Call to Action


Programming is ubiquitous in applied biostatistics, and most statisticians know a programming language such as R, yet software engineering is still neglected as a skill and undervalued as a profession in pharmaceutical statistics. Why is this a problem? Most importantly, we risk making wrong decisions when we rely on code that we wrote ourselves and that no other statistician has reviewed. When we hand over undocumented code to successors or other teams, we cannot be sure that they will be able to use it, let alone maintain it in the future. Whether they can reproduce results we produced earlier is likewise a matter of luck. If we later need to add features to our code without sufficient tests in place, we will undoubtedly introduce bugs and alter the program's behavior without noticing. Finally, if we need to implement new statistical methods for analyses submitted to regulators, we need appropriate software validation pipelines in place, and these demand well-developed and well-tested code. What can we do about it? First and foremost, we must become aware of the problem. Second, we need to take software engineering seriously, starting with education in basic software engineering skills at school, at university, and throughout our working lives. Dedicated software engineering teams within academic institutions and companies can be key to establishing good software engineering practices and can catalyze improvements across research projects. Attractive career paths are important for retaining talent. Finally, collaboration between software developers from different organizations is key to harnessing open-source software efficiently while building trusted solutions. We illustrate this potential with examples of successful projects.

Presented at the 2023 Conference