Tuesday, November 11, 2014

John Venn on the Reference Class Problem

A classic statement from John Venn’s The Logic of Chance (1876) about a fundamental problem in philosophy of probability, the reference class problem:
“We must now shift our point of view a little; instead of starting, as in the former chapters, with a determinate series supposed to be given to us, let us assume that the individual only is given, and that the work is imposed upon us of finding out the appropriate series. How are we to set about the task? In the former case our data were of this kind:—Eight out of ten men, aged fifty, will live eleven years more, and we ascertained in what sense, and with what certainty, we could infer that, say, John Smith, aged fifty, would live to sixty-one.

§ 12. Let us then suppose, instead, that John Smith presents himself, how should we in this case set about obtaining a series for him? In other words, how should we collect the appropriate statistics? It should be borne in mind that when we are attempting to make real inferences about things as yet unknown, it is in this form that the problem will practically present itself.

At first sight the answer to this question may seem to be obtained by a very simple process, viz. by counting how many men of the age of John Smith, respectively do and do not live for eleven years. In reality however the process is far from being so simple as it appears. For it must be remembered that each individual thing has not one distinct and appropriate series, to which, and to which alone, it properly belongs. We may indeed be practically in the habit of considering it under such a single aspect, and it may therefore seem to us more familiar when it occupies a place in one series rather than in another; but such a practice is merely customary on our part, not obligatory. It is obvious that every individual thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things. By belonging to any one class it of course becomes at the same time a member of all the higher classes, the genera, of which that class was a species. But, moreover, by virtue of each accidental attribute which it possesses, it becomes a member of a class intersecting, so to say, some of the other classes. John Smith is a consumptive man say, and a native of a northern climate. Being a man he is of course included in the class of vertebrates, also in that of animals, as well as in any higher such classes that there may be. The property of being consumptive refers him to another class, narrower than any of the above; whilst that of being born in a northern climate refers him to a new and distinct class, not conterminous with any of the rest, for there are things born in the north which are not men.

When therefore John Smith presents himself to our notice without, so to say, any particular label attached to him informing us under which of his various aspects he is to be viewed, the process of thus referring him to a class becomes to a great extent arbitrary. If he had been indicated to us by a general name, that, of course, would have been some clue; for the name having a determinate connotation would specify at any rate a fixed group of attributes within which our selection was to be confined. But names and attributes being connected together, we are here supposed to be just as much in ignorance what name he is to be called by, as what group out of all his innumerable attributes is to be taken account of; for to tell us one of these things would be precisely the same in effect as to tell us the other. In saying that it is thus arbitrary under which class he is placed, we mean, of course, that there are no logical grounds of decision; the selection must be determined by some extraneous considerations. Mere inspection of the individual would simply show us that he could equally be referred to an indefinite number of classes, but would in itself give no inducement to prefer, for our special purpose, one of these classes to another.
This variety of classes to which the individual may be referred owing to his possession of a multiplicity of attributes, has an important bearing on the process of inference which was indicated in the earlier sections of this chapter, and which we must now examine in more special reference to our particular subject.” (Venn 1876: 194–195).
What is the problem here?

The problem arises whenever we try to assign a probability to a single case on the basis of statistical frequencies. What, for example, is the probability that some particular person, John Smith, will contract cancer in his lifetime? Is there a fixed, objective, numeric probability that we can give? Some would say: yes. Can’t we just look at how frequently people get cancer in the whole national population of the country where John Smith lives?

Let us imagine that John Smith lives in the UK. According to statistical data, a UK citizen will have at least a 1 in 3 chance of being diagnosed with one of the many forms of cancer during his or her lifetime. The reference class here is the whole UK population, and the probability is the proportion of that population who are diagnosed with cancer.

So is there a fixed, objective, numeric probability of at least 1 in 3 that John Smith will contract cancer during his lifetime? On closer inspection, this does not necessarily seem to be right. Some people have a much greater risk of developing cancer than others, on the basis of genetics and of environmental and lifestyle factors such as smoking, excessive alcohol consumption and exposure to carcinogenic substances.

Immediately, we can identify further, narrower reference classes to which John Smith might belong and which would change the probability of his being diagnosed with cancer.

Let us consider the reference classes one by one, and imagine we know many details about John Smith:
(1) the population of the UK;

(2) the class of people who develop cancer in the population of the UK;

(3) the class of male people who develop cancer in the UK (since John Smith is a man);

(4) the class of non-smoking male people who develop cancer in the UK (since John Smith is a non-smoker);

(5) the class of non-smoking, male people who do not drink alcohol and who develop cancer in the UK (since John Smith does not drink);

(6) the class of non-smoking, male people who do not drink alcohol and who are not exposed to known carcinogenic substances in the workplace and who develop cancer in the UK (since John Smith has a job where he is not exposed to known carcinogenic substances);

(7) the class of non-smoking, male people who do not drink alcohol and who are not exposed to known carcinogenic substances in the workplace and who do regular exercise and who develop cancer in the UK (since John Smith does regular exercise).
If we were to look at the people in class (7), we could calculate a statistical probability for that class.
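To make the narrowing of reference classes concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Person` record, the `narrowing_frequencies` helper and, above all, the twelve records in `PEOPLE`, which are invented and are not real epidemiological data. It simply computes the relative frequency of cancer in each successively narrower class that John Smith belongs to.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Person(NamedTuple):
    male: bool
    smoker: bool
    drinks: bool
    exposed: bool    # exposed to known carcinogens at work
    exercises: bool
    cancer: bool     # diagnosed with cancer at some point


# Invented records, purely illustrative; NOT real data.
PEOPLE = [
    Person(True, True, True, True, False, True),
    Person(True, True, True, False, False, True),
    Person(True, False, True, False, False, True),
    Person(True, False, True, False, True, False),
    Person(True, False, False, True, False, False),
    Person(True, False, False, False, False, False),
    Person(True, False, False, False, True, False),
    Person(True, False, False, False, True, True),
    Person(False, True, True, False, False, False),
    Person(False, False, True, False, True, False),
    Person(False, False, False, False, True, False),
    Person(False, False, False, False, False, False),
]

# Each condition narrows the previous class by one attribute of John Smith.
CONDITIONS: List[Tuple[str, Callable[[Person], bool]]] = [
    ("whole population", lambda p: True),
    ("men", lambda p: p.male),
    ("non-smoking men", lambda p: not p.smoker),
    ("... who do not drink", lambda p: not p.drinks),
    ("... not exposed at work", lambda p: not p.exposed),
    ("... who exercise regularly", lambda p: p.exercises),
]


def narrowing_frequencies(
    people: List[Person],
    conditions: List[Tuple[str, Callable[[Person], bool]]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Apply the conditions cumulatively; return (label, class size, cancer frequency)."""
    results = []
    members = list(people)
    for label, condition in conditions:
        members = [p for p in members if condition(p)]
        # The frequency is undefined (None) if the class has become empty.
        freq = sum(p.cancer for p in members) / len(members) if members else None
        results.append((label, len(members), freq))
    return results


for label, size, freq in narrowing_frequencies(PEOPLE, CONDITIONS):
    shown = "undefined" if freq is None else f"{freq:.2f}"
    print(f"{label:28s} n={size:2d}  frequency={shown}")
```

In this made-up data the frequency changes with each narrowing and the class itself shrinks, so the figure we would quote for John Smith depends on where we choose to stop.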

But the trouble is: this is not even an exhaustive list! Can we really obtain a fixed, objective, numeric probability if we keep adding narrower reference classes and finally get to a point where we stop? Even here there is a crucial problem: the uncertainty of the future. Whatever probability is obtained may well be overturned in the future, so it can hardly be said to be objectively fixed in the way that the probability of rolling a 6 with a fair die is fixed at 1 in 6, with the observed frequency of sixes approaching that value in the long run.
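For contrast, the die case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `running_frequency_of_six`, the seed and the checkpoints are my own choices, and a pseudo-random generator stands in for a fair die.

```python
import random
from typing import Dict

CHECKPOINTS = (10, 100, 1_000, 10_000, 100_000)


def running_frequency_of_six(n_rolls: int, seed: int = 0) -> Dict[int, float]:
    """Simulate rolls of a fair die and record the frequency of sixes at each checkpoint."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = 0
    frequencies = {}
    for roll_number in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if roll_number in CHECKPOINTS:
            frequencies[roll_number] = sixes / roll_number
    return frequencies


for rolls, freq in running_frequency_of_six(100_000).items():
    print(f"after {rolls:>7,d} rolls: frequency of six = {freq:.4f} (1/6 = {1/6:.4f})")
```

Here the reference class is not in doubt: every roll of the same fair die belongs to one well-defined series, which is exactly what we lack in the case of John Smith.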

Suppose a new preventive drug is developed in four years’ time that cuts one’s risk of developing cancer by 60%, and John takes the drug. Suddenly his probability of developing cancer has to be radically revised.

Or suppose John has an undiagnosed genetic disorder, one that modern science cannot yet identify, which makes it 100% certain that he will develop cancer. Here the statistical probability, even in the narrowest reference class, will be wrong and clearly not an objective, numeric probability, because we have missed a fundamental fact about John.

We can quickly see the difficulties that the reference class problem gives rise to, and how it can render statistical probabilities problematic. Of course, I do not want to suggest that statistical probabilities are useless: clearly they are not, and they can be very useful.

But we should be aware of the limitations of statistical probabilities too, and how some may just give us the illusion of objectivity.

Venn, John. 1876. The Logic of Chance: An Essay on the Foundations and Province of the Theory of Probability, with Especial Reference to its Logical Bearings and its Application to Moral and Social Science (2nd rev. edn.). Macmillan, London.

1 comment:

  1. Very nice. You might want to read John Martin's Explanation of Social Action (Oxford University Press, 2011). He makes complementary observations and critiques, and posits field theory as a different way to do social science.