Thirty-five years ago a small group of students met in a seminar room in Ann Arbor. These students were planning careers that would lead into the quantitative analysis of public policy, an idea much in vogue at the time. Important public decisions might be made not just on the basis of politics, but on the policy impacts, costs and benefits, and economic efficiency.
The students were taught by leading faculty members at a top research institution, learning about administrative theory, organizational behavior, economics, and research methods.
On this particular occasion the group of first-year students was eager to show their stuff. They wanted to impress their young professor, a man who would later be recognized not only as a first-rate teacher but also as a top scholar in his field.
The professor led the seminar by introducing a series of findings drawn from the social science literature. These were relationships such as the voting patterns of black males, the party identification of former military personnel, and the like.
As he introduced each finding, the professor invited the students to comment, suggesting hypotheses to explain the results. Straining to please, the students had many imaginative suggestions. Their ideas would have filled out many journal articles. They were showing off, and happy to do so. The professor provided some positive feedback for the thoughtful analyses, and ticked off a dozen or so propositions.
At the end of the seminar, the students sat back, satisfied with their performance. The professor congratulated them on their creativity and imagination, and everyone sat up a little taller.
Then the prof dropped the bombshell:
The actual findings were all EXACTLY THE OPPOSITE of what he had stated!
The next day’s assignment was to come back with new hypotheses explaining the actual, opposite findings.
This is an extremely important lesson. Analyzing lots of data, with hundreds of possible relationships, will always yield some statistically significant findings purely by chance. And fertile minds can always invent some logic to explain them after the fact.
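This point is easy to demonstrate with a simulation. The sketch below (an illustration, not anything from the seminar itself) correlates pairs of pure random noise many times and counts how many of those meaningless relationships clear a conventional 5% significance threshold; with 200 tests, roughly ten "discoveries" are expected by chance alone.

```python
import random
import math

random.seed(42)
n_obs, n_tests = 100, 200
# Approximate two-sided 5% critical value for a Pearson correlation
# with n observations: |r| > 1.96 / sqrt(n)
crit = 1.96 / math.sqrt(n_obs)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# Every variable here is pure noise, so every "significant" result
# is a false positive.
false_positives = 0
for _ in range(n_tests):
    x = [random.gauss(0, 1) for _ in range(n_obs)]
    y = [random.gauss(0, 1) for _ in range(n_obs)]
    if abs(pearson(x, y)) > crit:
        false_positives += 1

print(f"{false_positives} of {n_tests} pure-noise relationships "
      f"look statistically significant")
```

A researcher handed only the "significant" rows of such an exercise could, like the students in the seminar, construct a plausible story for every one of them.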
That approach is backwards. Good research begins with theory and hypotheses and then moves to testing.
In one sense it is a shame that Wall Street researchers did not get this kind of training. If one looks carefully at their reports, it is pretty obvious when a researcher is "data mining" and when there is some theory behind the work. A key question is: Which came first?