Data Analysis: Honest disagreement about methods may explain irreproducible results
It sounds like an easy question for any half-competent scientist to answer: Do dark-skinned footballers get given red cards more often than light-skinned ones?
But, as Raphael Silberzahn of IESE, a Spanish business school, and Eric Uhlmann of INSEAD, an international one (he works in the branch in Singapore), illustrate in this week’s Nature, it is not. The answer depends on whom you ask, and the methods they use.
Dr Silberzahn and Dr Uhlmann sought their answers from 29 research teams. They gave their volunteers the same wodge of data (covering 2,000 male footballers for a single season in the top divisions of the leagues of England, France, Germany and Spain) and waited to see what would come back.
The consensus was that dark-skinned players were about 1.3 times as likely to be sent off as their light-skinned confrères. But there was a lot of variation. Nine of the research teams found no significant relationship between a player’s skin colour and the likelihood of his receiving a red card. Of the 20 that did find a difference, two groups reported that dark-skinned players were less, rather than more, likely to receive red cards than their paler counterparts (only 89% as likely, to be precise). At the other extreme, another group claimed that dark-skinned players were nearly three times as likely to be sent off.
Dr Uhlmann and Dr Silberzahn are less interested in football than in the way science works. Their study may shed light on a problem that has quite a few scientists worried: the difficulty of reproducing many results published in journals.
Fraud, unconscious bias and the cherry-picking of data have all been blamed at one time or another—and all, no doubt, contribute. But Dr Uhlmann’s and Dr Silberzahn’s work offers another explanation: that even scrupulously honest scientists may disagree about how best to attack a data set. Their 29 volunteer teams used a variety of statistical models (“everything from Bayesian clustering to logistic regression and linear modelling”, since you ask) and made different decisions about which variables within the data set were deemed relevant. (Should a player’s playing position on the field be taken into account? Or the country he was playing in?) It was these decisions, the authors reckon, that explain why different teams came up with different results.
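To see how a single such decision can move the answer, consider a minimal sketch in Python. Every number in it is invented for illustration and none comes from the study's data; the names `Table`, `BY_POSITION`, `odds_ratio`, `pooled` and `mantel_haenszel` are likewise hypothetical. It compares a crude odds ratio, which ignores playing position, with a Mantel-Haenszel odds ratio, which adjusts for it. The Mantel-Haenszel estimator is used here only because it is a simple, standard way to adjust for one variable; the article does not say that any of the 29 teams used it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """2x2 counts of players for one playing position (all numbers invented)."""
    dark_red: int    # dark-skinned players who received a red card
    dark_none: int   # dark-skinned players who did not
    light_red: int   # light-skinned players who received a red card
    light_none: int  # light-skinned players who did not

    @property
    def total(self) -> int:
        return self.dark_red + self.dark_none + self.light_red + self.light_none


def odds_ratio(t: Table) -> float:
    """Odds of a red card for dark-skinned players relative to light-skinned ones."""
    return (t.dark_red * t.light_none) / (t.dark_none * t.light_red)


def pooled(tables: list[Table]) -> Table:
    """Collapse the per-position tables into one, ignoring position."""
    return Table(
        sum(t.dark_red for t in tables),
        sum(t.dark_none for t in tables),
        sum(t.light_red for t in tables),
        sum(t.light_none for t in tables),
    )


def mantel_haenszel(tables: list[Table]) -> float:
    """Mantel-Haenszel odds ratio: a weighted average across positions."""
    numerator = sum(t.dark_red * t.light_none / t.total for t in tables)
    denominator = sum(t.dark_none * t.light_red / t.total for t in tables)
    return numerator / denominator


# Hypothetical data: in this toy example defenders are sent off more often
# than forwards, and dark-skinned players are more often defenders.
BY_POSITION = {
    "defender": Table(dark_red=30, dark_none=270, light_red=10, light_none=90),
    "forward": Table(dark_red=2, dark_none=98, light_red=6, light_none=294),
}

tables = list(BY_POSITION.values())
crude = odds_ratio(pooled(tables))   # position ignored
adjusted = mantel_haenszel(tables)   # position taken into account

print(f"crude odds ratio:    {crude:.2f}")
print(f"adjusted odds ratio: {adjusted:.2f}")
```

With these made-up counts, the analysis that ignores position reports an odds ratio of about 2.1, while the one that adjusts for it reports 1.0, because within each position the two groups are sent off at identical rates. Neither analyst has done anything dishonest; they have simply made different, defensible choices.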
How to get around this is a puzzle. But when important questions are being considered—when science is informing government decisions, for instance—asking several different researchers to do the analysis, and then comparing their results, is probably a good idea.
From The Economist print edition: Science and technology