Peter Norvig talks about the different kinds of errors that occur in experimental design and in the interpretation of results. Media coverage of experimental results is often presented as fact, without any of the caveats or critical analysis. Publication bias is a good example: if only positive results are considered interesting enough to publish, a striking result can reach print even though, measured against all the unpublished negative attempts, it is not statistically significant at all. From the article:
Here is my amazing claim: under the strictest of controls, I have been able, using my sheer force of will, to influence an electronic coin flip (implemented by a random number generator) to come up heads 25 times in a row. The odds against getting 25 heads in a row are 33 million to 1. You might have any number of objections: Is the random number generator partial to heads? No. Is it partial to long runs? No. Am I lying? No. Am I really psychic? No. Is there a trick? Yes. The trick is that I repeated the experiment 100 million times, and only told you about my best result. There were about 50 million times when I got zero heads in a row. At times I did seem lucky/psychic: it only took me 2.3 million tries to get 24 heads in a row, when the odds say it should take 16 million on average. But in the end, I seemed unlucky: I only got 25 in a row, not the expected 26.
Many experiments that claim to beat the odds do it using a version of my trick. And while my purpose was to intentionally deceive, others do it without any malicious intent. It happens at many levels: experimenters don’t complete an experiment if it seems to be going badly, or they fail to write it up for publication (the so-called “file drawer” effect, which has been investigated by many, including my former colleague Jeff Scargle in a very nice paper), or the journal rejects the paper. The whole system has a publication bias for positive results over negative results. So when a published paper proclaims “statistically, this could only happen by chance one in twenty times”, it is quite possible that similar experiments have been performed twenty times, but have not been published.
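Norvig's coin-flip trick is easy to reproduce. Here's a scaled-down sketch (100,000 repetitions instead of his 100 million, so it runs in a moment; the function name is my own) that repeats the "flip until tails" experiment many times and reports only the best run, exactly the selective reporting he describes:

```python
import random

def longest_heads_run(num_experiments, seed=0):
    """Repeat the coin-flip experiment num_experiments times and
    report only the single best result, hiding all the failures.
    With a fair coin, the longest run grows like log2(num_experiments),
    so enough repetitions make any streak look 'psychic'."""
    rng = random.Random(seed)
    best = 0
    for _ in range(num_experiments):
        run = 0
        while rng.random() < 0.5:  # keep flipping while we get heads
            run += 1
        best = max(best, run)
    return best

# 100,000 tries is enough for a run in the mid-teens, even though the
# odds against, say, 15 heads in a row are 2**15-to-1 on any single try.
print(longest_heads_run(100_000))
```

Because log2(100,000) ≈ 16.6, the "best" streak lands in the mid-teens here, just as Norvig's 100 million tries land near log2(10^8) ≈ 26.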
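The "one in twenty" point can be sketched the same way. In this toy model (all names and the 20-labs setup are my own illustration, not anything from the article), twenty labs each test a fair coin, and only results that clear a rough 5% significance cutoff get "published" — the rest go in the file drawer:

```python
import random

def file_drawer_demo(num_labs=20, n_flips=100, seed=1):
    """Each lab flips a fair coin n_flips times. Only results far
    enough from 50/50 to look 'significant' are published; null
    results are silently dropped (the file-drawer effect)."""
    rng = random.Random(seed)
    published = []
    for lab in range(num_labs):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # For 100 fair flips, sd = 5, so |heads - 50| >= 10 is roughly
        # a two-sided 5% cutoff (1.96 standard deviations).
        if abs(heads - 50) >= 10:
            published.append((lab, heads))
    return published

print(file_drawer_demo())
```

With twenty labs and a 5% cutoff, we expect about one "significant" fair-coin result to be published, and a reader who never sees the other nineteen has no way to know the finding is pure chance.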