If you’re going to do good science, release the computer code too
I don’t have any opinion about whether this is or isn’t an indictment of climate change research, but the general point here is absolutely right. All the work in my dissertation depends on a pretty substantial volume of bespoke Matlab code and probably an equally substantial volume of code written by others (AFNI, MVPA toolbox). The developers of AFNI, a very conscientious and dedicated group of programmers at the NIH, recently discovered a bug in it that severely inflates the results of certain statistical comparisons. In a talk my first year of grad school, I reported a fantastic correlation that turned out to be due to a sign error in the analysis code. A large part of my confidence in the validity of my dissertation work is due to the fact that my advisor and I independently coded the analyses in different languages: he used Java, I used Matlab. We worked this way for reasons more to do with stubbornness than conscience, but we both feel pretty good about it now. My volume of scientific programming hasn’t dwindled in my postdoc; I write PsychoPy code to run experiments and Python, R, and Matlab code to analyze them. Psychologists who aren’t comfortable with programming will use E-Prime, Excel, and SPSS. But Microsoft, at least, won’t save them. The problem is well stated, and it extends well beyond climate change.
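To make the independent-reimplementation idea concrete, here is a minimal sketch in Python of how two separately written versions of the same statistic can check each other. It is an illustration only, not the dissertation code: the function names (`pearson_a`, `pearson_b`, `pearson_sign_bug`, `agree`) and the toy data are all made up for the example, and the "bug" is a deliberately planted sign error of the kind described above.

```python
import math
import statistics


def pearson_a(x, y):
    """First implementation: sum of cross-products over root sum of squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def pearson_b(x, y):
    """Second, independently structured implementation: mean product of z-scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)]
    return statistics.fmean(products)


def pearson_sign_bug(x, y):
    """Hypothetical buggy version: one deviation is computed with the wrong sign."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((mx - a) * (b - my) for a, b in zip(x, y))  # planted bug: (mx - a)
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def agree(f, g, x, y, tol=1e-9):
    """Return True when two implementations give the same answer within tol."""
    return math.isclose(f(x, y), g(x, y), abs_tol=tol)


# Toy data, invented for the example.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print("a vs b agree:        ", agree(pearson_a, pearson_b, x, y))
print("a vs buggy agree:    ", agree(pearson_a, pearson_sign_bug, x, y))
```

The two correct versions agree, and the sign error shows up as a disagreement. The check only works to the extent that the two implementations really are independent; if both authors copy the same mistaken formula, they will agree with each other and still be wrong.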
What to do about it is not as clear as the article might suggest. Making data and analysis code public is a clear step forward, but that’s a sizable step away from making it easy for people to verify the claims. To check my code, you’ve got to have AFNI and the MVPA toolbox installed — that’s a big investment! That’s assuming you have access to Matlab, which is a proprietary language. And a lot of my more resource-intensive analyses were done in parallel on a computing cluster with several dozen nodes; those aren’t so easy to check on your PC. Still, a clear step forward.
Once you’ve logged the data and programs, though, how many papers are going to get the same kind of attention as the high-profile results Dr. Ince is talking about? Again, no question that it’s better to check just high-profile results than nothing. But we should be realistic about the fact that most of these programs will not be checked. In cognitive neuroscience, at least, I don’t think there are enough reviewers competent to check them, to say nothing of how time-consuming it would be. (This is more of a problem for bespoke code; packages like AFNI, the MVPA toolbox, PsychoPy, and so on could presumably be dealt with by a specialized accreditation unit of some sort — but who has the incentive or the resources to create such a unit?)
To me, it’s not so clear what happens once a dispute arises over the correctness of the code. Maybe it’s always clear, or always devolves to mathematical arguments; that would be the best case. But is there always a way to establish ground truth? Who is responsible for making the final arbitration? Maybe communities tend to self-organize around these issues and produce a reliable consensus; Dr. Ince implies that was the case for the proof of the four-color theorem. I’m just not sure.
It’s also worth thinking about the new biases an open-code policy would encourage. As one commenter noted, “I do… sympathize with scientists who don’t want to release computer code to bullies who just want to pick it apart to find tiny errors and then blow them out of all proportion, claiming that they undermine the whole body of science behind climate change.” I think the scientific community is apt to separate gold from dross in these situations if mobilized in sufficient numbers, but I’m more worried about it in the context of smaller disputes. How do you set things up so an obstructionist dissenter can’t effectively filibuster someone’s publication by picking at the code? Imagine this from the perspective of an assistant professor who’s choosing between spending all his time rebutting attacks on his code and being forced to retract a paper. These sorts of problems are probably solvable, but open code alone won’t solve them.
Also, is an open-code policy especially fair to quantitative researchers? It doesn’t have to be — scientists should exalt truth over competitive advantage — but my wife, for example, is a developmental biologist who uses biochemical tools that seem to operate on a basis of absolute voodoo. It’s known why they’re supposed to work, but sometimes they behave strangely and no one knows why. No one’s going to do any kind of bug-hunting on her beautiful stain or PCR, but my fMRI results are up for grabs? If I have to spend more time convincing people that my imaging results are real than my friend down the hall has to spend convincing people that his single-unit neurophysiology experiments are real, the world may find itself short on imagers. (Although this would be bad for me, I’ll concede that this may be a good thing in general — there’s a strong case for saying we only want really good people in research that depends on extremely elaborate calculations. Again, it’s not that I think quantitative researchers have any right to shy away from scrutiny of their code — but these new openness requirements, if they ever actually become requirements, are going to have consequences, and the fact that openness seems like a good idea makes the potential consequences that much more worth thinking about.)
The comments section is intermittently interesting — it’s mostly climate change-specific, but there’s some more general-purpose insight there. I was struck by one serious error that no one seems to have caught:
“… if random computer code errors are affecting the reliability of conclusions about global warming, then they will be equally likely to be underestimating as overestimating the effects. So this line of argument does not really help the sceptics case.”
Wrong! The errors that underestimate the effects don’t get published. They’re probably also more apt to get debugged — if your code is producing results you don’t like, you’re going to make absolutely sure it’s right, but you’re much less likely to scrutinize code that seems to be working (from your perspective) fine.
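The argument above can be sketched as a toy simulation in Python. Everything in it is an assumption made for illustration: the true effect is set to zero, a fixed fraction of analyses contain a bug whose effect is symmetric around zero, and the hypothetical researcher debugs (and fixes) the code only when the result comes out lower than expected. The function name `simulate` and all parameter values are invented for the example; nothing here is a model of any real field.

```python
import random
import statistics


def simulate(n_studies=10_000, true_effect=0.0, bug_rate=0.3, seed=0):
    """Compare raw estimates with what gets reported after selective debugging.

    Returns (mean of raw estimates, mean of reported estimates).
    """
    rng = random.Random(seed)
    raw, reported = [], []
    for _ in range(n_studies):
        # Assumption: bugs are symmetric, equally likely to inflate or deflate.
        bug = rng.gauss(0.0, 1.0) if rng.random() < bug_rate else 0.0
        estimate = true_effect + bug
        raw.append(estimate)

        # Assumption: a disappointing result triggers debugging, which removes
        # the bug; a pleasing result is accepted without a second look.
        if estimate < true_effect:
            estimate = true_effect
        reported.append(estimate)
    return statistics.mean(raw), statistics.mean(reported)


raw_mean, reported_mean = simulate()
print(f"mean of raw estimates:      {raw_mean:+.3f}")
print(f"mean of reported estimates: {reported_mean:+.3f}")
```

Under these assumptions the raw errors average out near the true effect, but the reported results do not, because only the errors in one direction get hunted down. The size of the gap depends entirely on the made-up parameters; the point is the direction of the bias, not its magnitude.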