Statistics Shows Psychology Is Not Science
As Alex Berezow wrote in his piece yesterday, psychology is not a science, and statistics in and of itself is not science either. Then again, lots of useful and worthwhile things are not science, so that is not necessarily a problem. What is a problem is that poor statistical methods and irreproducibility damage not just the validity of any one study or theory, but the rigor and credibility of two-thirds of all studies in psychology.
Alex and I have previously detailed what we believe are the requirements for calling a field of study science: clearly defined terminology, quantifiability, highly controlled conditions, reproducibility, and finally, predictability and testability.
The failure of psychology (and indeed many other so-called social "sciences") to meet these criteria often manifests as an obvious symptom: lousy statistics. Statistics is just a language. Like other languages it can be harnessed to express logical points in a consistent way, or it can demonstrate poorly reasoned ideas in a sloppy way.
Statistical studies in psychology limp off the runway wounded by poor quantifiability, take further damage from imprecise conditions and measurements, and finally crash and burn due to a breakdown of reproducibility.
The strengths of hard sciences often shine through their statistical conclusions (although studies performed in hard science disciplines are certainly not immune to poor practices). Statistics underlies some of the most important and empirically successful chemistry and physics ever discovered.
Albert Einstein once said of thermodynamics, a field whose laws can be derived from the statistics of enormous numbers of particles:
"It is the only physical theory of universal content concerning which I am convinced that, within the framework of applicability of its basic concepts, it will never be overthrown."
If you took high school chemistry, you may remember Boyle's Law, Gay-Lussac's Law, or the Ideal Gas Law. You might have read about the concept of equilibrium, the ever-increasing entropy of the universe, or the "five-sigma significance" of the Higgs boson discovery. All of these rest directly on statistics performed over vast numbers of atoms or particles.
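The "five-sigma" standard is itself a purely statistical statement, and its stringency is easy to compute. As a minimal sketch using only Python's standard library (the function name is mine, not from any physics package), the one-sided probability that pure noise produces a five-sigma excursion is:

```python
import math

def one_sided_p_value(sigma):
    """Probability that pure noise fluctuates at least `sigma`
    standard deviations above the mean (normal approximation)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

# Five sigma: roughly a 1-in-3.5-million chance the "signal" is a fluke
print(one_sided_p_value(5))  # ~2.87e-07
```

That threshold is why a five-sigma result is treated as a discovery rather than a suggestive finding.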
But statistics is problematic when it comes to the social sciences. The first key issue is sample size.
Think of a political opinion poll. Every such poll states a margin of error, and surveys with more respondents have a correspondingly smaller margin of error. Most social research studies use sample sizes in the tens, hundreds, and occasionally thousands. That may sound like a lot, but remember that statistical physics deals with sample sizes so large they defy imagination: one thousand trillion times the total number of stars in the Universe; enough sample atoms that, if each one were a grain of sand, they could build a sandcastle five miles high; more molecules than the number of milliseconds since the Big Bang.
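The relationship pollsters rely on is simple: for a proportion, the 95% margin of error shrinks like one over the square root of the sample size. A minimal sketch (the function name is my own, not from any polling package) shows why poll-sized samples are noisy and mole-sized samples are not:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(100):.1%}")       # 9.8%
print(f"{margin_of_error(1000):.1%}")      # 3.1%
print(f"{margin_of_error(6.022e23):.0e}")  # a mole of "respondents": ~1e-12
```

A hundred-fold increase in sample size buys only a ten-fold reduction in error, which is why the astronomical sample sizes of physics matter so much.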
The next big difference is a bit more subtle: quantifiability.
Working with such variables as awareness, happiness, self-esteem, and other squishy concepts makes quantification hard. This is the sloppy-language problem. Even when these ideas are translated into some more concrete measure (say, how long it takes a test subject to push a button or eat a marshmallow), the validity of that translation is far from crystal clear or rock solid.
Precision of measurement is another big issue. A social science survey may measure ten subjects with a stopwatch for a handful of seconds and produce an error of a second or two. They may ask people to rate things on a 1-10 scale. How sure are you that your "8" is not another person's "6.5"? The sorts of measurements chemists make have no such wiggle room. They ask molecules questions that have exact answers that cannot be fudged. What's your temperature? How much kinetic energy do you possess? A scientist in Texas and a scientist in Alaska and a scientist on the moon and a scientist at the bottom of the sea and a scientist on poor icy demoted dwarf planet Pluto could all measure the same molecule under the same experimental conditions and get the same answer to five decimal places.
Even the supposedly concrete measurements often fall vastly short of the rigor of true science. Photon-counting experiments routinely measure times in the range of nanoseconds. Timing subjects by hand with a stopwatch, accurate at best to a second or so, is quite literally a billion times less precise.
Finally, the issue of reproducibility.
While a study of human sexual practices conducted with 44 undergraduate college students may never be reproduced, the predictions of statistical physics will give you a correct answer to 10 decimal places. What's more, thanks to the enormous sample sizes involved, taking a verifying measurement every minute of every hour of every day for the rest of your life, you'd more likely be struck by lightning than detect any deviation from the theory a single time.
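The "struck by lightning" claim follows from the same square-root law: the relative size of random fluctuations in an average over N independent units falls as 1/√N. A sketch (again, the function name is mine) comparing a 44-subject study to a macroscopic sample of molecules:

```python
import math

def relative_fluctuation(n):
    """Typical relative size of statistical fluctuations
    in an average over n independent units: 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)

print(relative_fluctuation(44))        # ~0.15: about 15% noise
print(relative_fluctuation(6.022e23))  # ~1.3e-12: utterly negligible
```

At Avogadro-scale sample sizes, the noise is a trillion times smaller than the signal, so a measurable deviation from theory essentially never occurs.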
These distinctions only scratch the surface of the vast gulf in rigor and objective truth between the hard sciences and soft, fuzzy social science. Statistics isn't the only problem. While academics may politely demur from judgment, when only 39% of studies chosen from a particular field hold up under scrutiny, the public wises up. They stop trusting individual findings and start ignoring studies altogether. The public may just be right to be skeptical.