The following is the full text of a paper by Sir Francis Galton, first published in the March 7, 1907 issue of the scientific journal NATURE. The piece is referenced in the February 18, 2005 episode of WNYC's most excellent RadioLab program. A scan of the paper in its original facsimile is available as a PDF. It seemed a shame that what is possibly the first solid explanation for why Google's ranking algorithm is so capable should be hidden from Google itself.
In these democratic days, any investigation into the trustworthiness and peculiarities of popular judgments is of interest. The material about to be discussed refers to a small matter, but is much to the point.
A weight-judging competition was carried on at the annual show of the West of England Fat Stock and Poultry Exhibition recently held at Plymouth (England). A fat ox having been selected, competitors bought stamped and numbered cards, for 6d. each, on which to inscribe their respective names, addresses, and estimates of what the ox would weigh after it had been slaughtered and “dressed.” Those who guessed most successfully received prizes. About 800 tickets were issued, which were kindly lent me for examination after they had fulfilled their immediate purpose. These afforded excellent material. The judgements were unbiassed by passion and uninfluenced by oratory and the like. The sixpenny fee deterred practical joking, and the hope of a prize and the joy of competition prompted each competitor to do his best. The competitors included butchers and farmers, some of whom were highly expert in judging the weight of cattle; others were probably guided by such information as they might pick up, and by their own fancies. The average competitor was probably as well fitted for making a just estimate of the dressed weight of the ox, as an average voter is of judging the merits of most political issues on which he votes, and the variety among the voters to judge justly was probably much the same in either case.
After weeding thirteen cards out of the collection, as being defective or illegible, there remained 787 for discussion. I arrayed them in order of magnitudes of the estimates, and converted the cwt., quarters, and lbs. in which they were made, into lbs., under which form they will be treated.
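Galton's cards were written in hundredweights, quarters and pounds. Assuming the standard imperial (long) hundredweight of 112 lb., made up of 4 quarters of 28 lb. each, the conversion he describes is a short piece of arithmetic. The helper names below are purely illustrative.

```python
# Imperial avoirdupois units (assumed to be what the 1907 cards used):
# 1 hundredweight (cwt) = 4 quarters = 112 lb; 1 quarter = 28 lb.
LB_PER_QUARTER = 28
LB_PER_CWT = 4 * LB_PER_QUARTER  # 112 lb


def to_lbs(cwt: int, quarters: int, lbs: int) -> int:
    """Convert an estimate written as cwt / quarters / lbs into pounds."""
    return cwt * LB_PER_CWT + quarters * LB_PER_QUARTER + lbs


def from_lbs(total: int) -> tuple[int, int, int]:
    """Split a weight in pounds back into (cwt, quarters, lbs)."""
    cwt, rest = divmod(total, LB_PER_CWT)
    quarters, lbs = divmod(rest, LB_PER_QUARTER)
    return cwt, quarters, lbs


print(from_lbs(1198))       # (10, 2, 22)
print(to_lbs(10, 2, 22))    # 1198
```

So the dressed weight of 1198 lb. would have appeared on a card as 10 cwt. 2 qrs. 22 lbs.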
Distribution of the estimates of the dressed weight of a particular living ox, made by 787 different persons.
| | Degrees of the length of Array 0°-100° | Estimates in lbs. | Centiles: Observed deviates from 1207 lbs. | Centiles: Normal p.e. = 37 | Excess of Observed over Normal |
|---|---|---|---|---|---|
| | 5 | 1074 | -133 | -90 | +43 |
| | 10 | 1109 | -98 | -70 | +28 |
| | 15 | 1126 | -81 | -57 | +24 |
| | 20 | 1148 | -59 | -46 | +13 |
| q1 | 25 | 1162 | -45 | -37 | +8 |
| | 30 | 1174 | -33 | -29 | +4 |
| | 35 | 1181 | -26 | -21 | +5 |
| | 40 | 1188 | -19 | -14 | +5 |
| | 45 | 1197 | -10 | -7 | +3 |
| m | 50 | 1207 | 0 | 0 | 0 |
| | 55 | 1214 | +7 | +7 | 0 |
| | 60 | 1219 | +12 | +14 | -2 |
| | 65 | 1225 | +18 | +21 | -3 |
| | 70 | 1230 | +23 | +29 | -6 |
| q3 | 75 | 1236 | +29 | +37 | -8 |
| | 80 | 1243 | +36 | +46 | -10 |
| | 85 | 1254 | +47 | +57 | -10 |
| | 90 | 1267 | +60 | +70 | -10 |
| | 95 | 1293 | +86 | +90 | -4 |
q1, q3, the first and third quartiles, stand at 25° and 75° respectively. m, the median or middlemost value, stands at 50°. The dressed weight proved to be 1198 lbs.
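As a quick sanity check on the table, the sketch below re-derives the quartiles and the "Observed deviates" column from the estimates column alone. Only the 19 tabulated values are used, since the individual cards aren't reproduced here, and the variable names are illustrative. Note that the 90° row works out to 1267 − 1207 = +60.

```python
# Estimates (lbs.) at every 5th degree of the array, 5° to 95°, from the table.
DEGREES = list(range(5, 100, 5))
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]
TABLE = dict(zip(DEGREES, ESTIMATES))

median = TABLE[50]             # m, the middlemost value
q1, q3 = TABLE[25], TABLE[75]  # first and third quartiles

# "Observed deviates from 1207 lbs.": each estimate minus the median.
deviates = {d: est - median for d, est in TABLE.items()}

print(median, q1, q3)                             # 1207 1162 1236
print(deviates[25], deviates[75], deviates[90])   # -45 29 60
```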
According to the democratic principle of “one vote one value,” the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or high by a majority of the voters (for fuller explanation see One Vote, One Value, NATURE, February 28, p. 414). Now the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.; so the vox populi was in this case 9 lb., or 0.8 per cent. of the whole weight too high. The distribution of the estimates about their middlemost value was of the usual type, so far that they clustered closely in its neighbourhood and became rapidly more sparse as the distance from it increased.
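Galton's "one vote one value" argument can be made concrete: a proposed value is defeated if a majority of the estimates lie strictly on one side of it, and the only proposal that survives is the median. A minimal sketch, assuming the 19 tabulated estimates stand in for the 787 cards; `outvoted` is a hypothetical helper name.

```python
from statistics import median

# The 19 tabulated estimates stand in for the full set of cards (an assumption).
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]
TRUE_WEIGHT = 1198  # dressed weight, lbs.


def outvoted(proposal: float, votes: list[int]) -> bool:
    """True if a strict majority of votes call `proposal` too low or too high."""
    too_low = sum(v > proposal for v in votes)   # voters wanting a higher value
    too_high = sum(v < proposal for v in votes)  # voters wanting a lower value
    return max(too_low, too_high) > len(votes) / 2


vox_populi = median(ESTIMATES)                # middle of the 19 values
error_lbs = vox_populi - TRUE_WEIGHT
error_pct = 100 * error_lbs / TRUE_WEIGHT     # "0.8 per cent." after rounding

print(vox_populi, error_lbs, round(error_pct, 2))        # 1207 9 0.75
print(outvoted(1207, ESTIMATES), outvoted(1198, ESTIMATES))  # False True
```

Even the true weight of 1198 lb. would have lost a straight vote against the median, since more than half the cards lay above it.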
But they were not scattered symmetrically. One quarter of them deviated more than 45 lb. below the middlemost (3.7 per cent.), and another quarter deviated more than 29 lb. above it (2.4 per cent.), therefore the range of the two middle quarters, that is, of the middlemost half, lay within those limits. It would be an equal chance that the estimate written on any card picked at random out of the collection lay within or without those limits. In other words, the “probable error” of a single observation may be reckoned as ½(45 + 29), or 37 lb. (3.1 per cent.). Taking this for the p.e. of the normal curve that is best adapted for comparison with the observed values, the results are obtained which appear in the above table, and graphically in the diagram¹.
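The probable error and the "Normal" column can be reproduced with Python's `statistics.NormalDist`. This is a sketch under two assumptions: the p.e. is half the spread of the middlemost half, as Galton computes it, and each Normal entry is the deviate of a normal curve with that p.e. at the given degree of the array. Under those assumptions the computed normal deviates agree with the table at every degree, and the excess values agree once the Excess column is read as |observed| minus |normal| (taking the 90° deviate as 1267 − 1207 = 60).

```python
from statistics import NormalDist

# Tabulated estimates (lbs.) at 5°, 10°, ..., 95° of the array.
DEGREES = range(5, 100, 5)
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]
MEDIAN = 1207
observed = {d: est - MEDIAN for d, est in zip(DEGREES, ESTIMATES)}

# Probable error: half the spread of the middlemost half, 1/2(45 + 29).
pe = (abs(observed[25]) + observed[75]) / 2

# One p.e. is about 0.6745 standard deviations of a normal curve (the z-score
# of its 75th percentile), so rescale the p.e. to a standard deviation.
sigma = pe / NormalDist().inv_cdf(0.75)

# "Normal p.e. = 37": where a normal curve with this p.e. puts each degree.
normal = {d: round(sigma * NormalDist().inv_cdf(d / 100)) for d in DEGREES}

# "Excess", read here as |observed| - |normal|: how much farther from the
# middle the real estimates lay than the normal curve predicts.
excess = {d: abs(observed[d]) - abs(normal[d]) for d in DEGREES}

print(pe, [round(100 * x / MEDIAN, 1) for x in (45, 29, pe)])  # 37.0 [3.7, 2.4, 3.1]
```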
The abnormality of the distribution of the estimates now becomes manifest, and is of this kind. The competitors may be imagined to have erred normally in the first instance, and then to have magnified all errors that were negative. The lower half of the “observed” curve agrees for a large part of its range with a normal curve having the p.e. = 45, and the upper half with one having its p.e. = 29. I have not sufficient knowledge of the mental methods followed by those who judge weights to offer a useful opinion as to the cause of this curious anomaly. It is partly a psychological question, in answering which the various psychophysical investigations of Fechner and others would have to be taken into account. Also the anomaly may be partly due to the use of a small variety of different methods, or formulae, so that the estimates are not homogeneous in that respect.
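The shape Galton describes, normal with one p.e. below the middle and another above it, is what is now usually called a two-piece (or split) normal distribution. The sketch below, which assumes his lower p.e. of 45 and upper p.e. of 29 and uses a hypothetical helper `split_normal_deviate`, prints the observed deviates next to that two-piece curve.

```python
from statistics import NormalDist

STD = NormalDist()          # standard normal curve
Z75 = STD.inv_cdf(0.75)     # ~0.6745: one p.e. measured in standard deviations


def split_normal_deviate(degree: int, pe_low: float = 45.0, pe_high: float = 29.0) -> int:
    """Deviate at `degree` (0-100) of a two-piece curve that is normal with
    p.e. = pe_low below the middle and p.e. = pe_high above it."""
    z = STD.inv_cdf(degree / 100)
    pe = pe_low if z < 0 else pe_high
    return round(pe / Z75 * z)


# Tabulated estimates (lbs.) at 5°, 10°, ..., 95°; deviates are taken from 1207.
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]

for degree, est in zip(range(5, 100, 5), ESTIMATES):
    print(f"{degree:>2}°  observed {est - 1207:+5d}  two-piece {split_normal_deviate(degree):+5d}")
```

Running it, the two-piece curve tracks the observed deviates closely through the middle of the array and falls short of them in the outermost tails, which fits Galton's "for a large part of its range".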
It appears then, in this particular instance, that the vox populi is correct to within 1 per cent. of the real value, and that the individual estimates are abnormally distributed in such a way that it is an equal chance whether one of them, selected at random, falls within or without the limits of -3.7 per cent. and +2.4 per cent. of their middlemost value.
This result is, I think, more creditable to the trustworthiness of a democratic judgement than might have been expected.
The authorities of the more important cattle shows might do service to statistics if they made a practice of preserving the sets of cards of this description, that they may obtain on future occasions, and loaned them under proper restrictions, as these have been, for statistical discussion. The fact of the cards being numbered makes it possible to ascertain whether any given set is complete.
Francis Galton.
Notes
1 The original article included a diagram that illustrated the normal vs. observed curve. I plan on adding this at some point but have not yet had the time to try to emulate the hand-written diagram with modern plotting tools.
Discuss
Fantastic! I read about this in the introduction to The Wisdom of Crowds, but never bothered to look up the original paper.
Possibly of interest, from the Wisdom reference section:
Looks like they're available at galton.org too. (More PDF. Yay.)
— Norman David Gerre on Saturday, October 28, 2006 at 02:05 AM
Hi Norman. Thanks for bringing that up. I meant to include a link to The Wisdom of Crowds in the main text somewhere. The RadioLab episode referenced includes a 10-15 minute segment with the author, James Surowiecki. He tells the story of Galton’s fat ox with quite some flair.
— Ryan Tomayko on Sunday, October 29, 2006 at 01:29 AM
Note, however, Condorcet’s jury theorem and how it leads, as a corollary, to The idiocy of crowds.
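Condorcet's jury theorem is about yes/no majority votes rather than Galton's numerical median, but the mechanism is easy to sketch. If each voter independently gets the answer right with probability p > 1/2, the chance that the majority is right climbs toward 1 as the group grows; if p < 1/2, the same arithmetic drives it toward 0, which is presumably the corollary meant here. A minimal sketch assuming fully independent voters; `majority_correct` is a hypothetical helper.

```python
from math import comb


def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters, each
    right with probability p, picks the right one of two options."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters so there are no ties")
    needed = n // 2 + 1
    # Binomial tail: sum over every outcome with at least `needed` correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for p in (0.6, 0.4):
    # Accuracy of the majority as the group grows from 1 to 11 to 101 voters.
    print(p, [round(majority_correct(p, n), 3) for n in (1, 11, 101)])
```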
— Aristotle Pagaltzis on Wednesday, December 20, 2006 at 10:43 PM
Interesting. Looks like I have more paths to follow here.
— Ryan Tomayko on Saturday, December 23, 2006 at 08:06 AM
In the text of Galton’s paper as you give it above, Galton states “middlemost estimate expresses the vox populi,” and “the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.” Surowiecki in “The Wisdom of Crowds” states “The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds.” Surowiecki has incorrectly cited Galton’s paper, presumably having read 1197, which is the figure at the 45th centile. You usefully link a facsimile of the original paper, so we can be sure of our facts.
On reading Surowiecki’s book, I found the precision with which the crowd collectively guessed the weight, with a claimed error of 1 in 1198 to be suspiciously excessive. I find the correct version much more plausible.
I wonder if Surowiecki’s claim for the location of the submarine Scorpion similarly undermines itself by doctoring the facts to give a spurious precision.
You may like to see my critique of Surowiecki’s lambasting of medical pathologists, at http://www.e-immunohistochemistry.info/surowiecki.htm
— Paul Bishop on Saturday, May 05, 2007 at 07:13 AM
Hi Paul, the same thing occurred to me when I read the original paper. IIRC, Surowiecki doesn’t mention the weight directly in the RadioLab episode but says that the crowd was one pound off. This is clearly not what Galton originally recorded.
I wouldn’t know. I'm not familiar with Surowiecki’s work beyond the RadioLab episode. Do drop by and let me know if you find out. You've piqued my interest.
— Ryan Tomayko on Monday, May 07, 2007 at 12:22 AM
Ryan, A couple of observations and questions.
First, from what I read in Galton’s writing, 1207 is the median guess: that amount where 50% of the guesses are above and 50% are below. As such, it is not necessarily the average of the guesses. And since Galton mentions that the curve graphing the guesses was “abnormal”, I would infer that the median and the mean are indeed different. I have not been able to determine the actual average of the guesses. (Surowiecki refers to a simple average computed by Galton; I've been unable to find Galton referring to anything other than the median guess of 1207.) Have you seen anyplace where the raw data is captured? It is not impossible that the average of the guesses was 1197… But I sure can’t prove it with the data in the Nature article.
A comment from Paul questions the story of the Scorpion. I happen to have the book cited by Surowiecki (Blind Man’s Bluff). The actual story of the Scorpion strikes me as more complex, and less astonishing, than Surowiecki’s recounting of it. The Scorpion disappeared on a trip from the Mediterranean back to the US. The Navy had some idea of when it disappeared: it knew when The Scorpion had last checked in. Further, there were recordings of a loud undersea sound from about the time of the disappearance. The Navy had recordings of the sound event from 3 different listening locations, so the position of the event could be triangulated. In addition, Craven (the officer whose story is told by Surowiecki) was able to use the recordings to determine that the Scorpion was headed east: 180 degrees off course. At the time, there was a known reason for a submarine reversing course: “hot-running”, where a torpedo has targeted the sub itself. The protocol for a hot running torpedo was for the sub to reverse course. (See the wikipedia entry on the Scorpion for more details, as well as updates to the theory of what caused The Scorpion to go down.)
So, Craven had a clear idea of when and where disaster struck the Scorpion. He also had a clear idea of the direction she was traveling when she went down. (The Navy searched for weeks to the west of the event, assuming that the Scorpion was on course.) Armed with these facts, Craven devised various scenarios of what could have happened, and polled a group of experts in the Navy. One question, for instance, was the slope at which the Scorpion would have sunk. The range of possibilities was from 1 foot drop for every foot of forward progress, to 7 feet of drop per foot of forward motion. The result of the poll, per Blind Man’s Bluff, was “between 3 and 4 feet”. It strikes me that averaging the extremes of the range gives 4 feet, which leaves me underwhelmed by the “wisdom” of the poll results.
After the Navy gave up looking to the west of the event, Craven was able to persuade one of the searchers to look east. The Scorpion was found fairly promptly, within 220 yards of where the “poll” had suggested it would be found.
I conclude that, had the Navy heeded Craven’s advice, to look east, in the first place, they would have found the sub almost immediately. All the fol-de-rol about polling experts and running their answers through Bayes' Theorem was beside the point.
— Charles Meyrick on Tuesday, May 15, 2007 at 02:55 PM
I agree with Charles Meyrick’s comments. I think that Surowiecki has estimated the true mean from the average of the values at 5% intervals. This value is 1197.2!
I have updated my comments at
http://www.e-immunohistochemistry.info/Surowiecki.htm
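Paul's 1197.2 appears to be the plain mean of the 19 estimates tabulated at 5°, 10°, ..., 95°; that is an assumption about his method, and it is not the mean of the 787 cards, since the tails beyond 5° and 95° are left out. The sketch below puts that figure next to Galton's median.

```python
from statistics import mean, median

# The 19 estimates tabulated at 5°, 10°, ..., 95° (not the 787 raw cards).
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]

print(median(ESTIMATES))          # 1207, Galton's middlemost estimate
print(round(mean(ESTIMATES), 1))  # 1197.2, a plain mean of the tabulated centiles
```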
— Paul Bishop on Sunday, May 20, 2007 at 04:12 AM
Paul,
I ran a similar exercise, taking the average of each pair of percentiles, then averaging those averages. Because we don’t have the 0th and 100th percentiles, we only have 18 pairs, so 18 individual averages. The average of those averages that I got was 1197.972.
Of course, this assumes that the guesses in each percentile band are evenly distributed. Given that the overall distribution was (per Galton) abnormal, I don’t know how safe it is to conclude that the distribution within any given percentile band is normal, or should be expected to be normal. It’s a shame that the original raw data appears to be lost.
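Charles's "18 pairs" reads most naturally as adjacent pairs of the 19 tabulated centiles, that is, the 18 five-degree bands from 5°-10° up to 90°-95°. Averaging the band midpoints reproduces his 1197.972. The sketch makes the same assumptions he names: each band holds 5% of the cards, guesses are spread evenly within a band (so the midpoint stands in for the band's average), and the tails outside 5° and 95° are ignored.

```python
from statistics import mean

# The 19 estimates tabulated at 5°, 10°, ..., 95°.
ESTIMATES = [1074, 1109, 1126, 1148, 1162, 1174, 1181, 1188, 1197, 1207,
             1214, 1219, 1225, 1230, 1236, 1243, 1254, 1267, 1293]

# Adjacent centiles bound 18 equal-count bands; the midpoint of each band is
# used as its average, which assumes guesses are spread evenly inside it.
band_midpoints = [(lo + hi) / 2 for lo, hi in zip(ESTIMATES, ESTIMATES[1:])]

print(len(band_midpoints), round(mean(band_midpoints), 3))  # 18 1197.972
```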
Charles Meyrick
— Charles Meyrick on Monday, May 21, 2007 at 10:47 AM