(Axiomatic) Definition of probability and its properties

(Axiomatic) Definition of probability

In the 20th century, the Russian mathematician Andrei Kolmogorov proposed an axiomatic definition of probability, which is the one still in use today.

Consider an experiment with sample space $$\Omega$$. A probability is a function that assigns a number $$P(A)$$ to every event $$A$$ and satisfies the following three properties.

Axiom 1. The probability of any event $$A$$ is positive or zero, namely $$P(A)\geq 0$$. In a certain sense, the probability measures how hard it is for event $$A$$ to happen: the smaller the probability, the less likely the event.
Axiom 2. The probability of the sure event is $$1$$, namely $$P(\Omega)=1$$. Together with the other axioms, this implies that a probability always lies between $$0$$ and $$1$$, both included: probability $$0$$ corresponds to the impossible event, which never happens, and probability $$1$$ to the sure event, which always happens.
Axiom 3. The probability of the union of any collection of pairwise (two by two) incompatible events is the sum of the probabilities of the events. For example, if the events $$A, B, C$$ are pairwise incompatible, then $$P(A\cup B \cup C)=P(A)+P(B)+P(C).$$

Note: In mathematics, an axiom is a statement that is accepted without proof. We call this the axiomatic definition of probability because probability is defined as any function that satisfies these three axioms. A different choice of axioms would give a different notion of probability.
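
The three axioms can be checked mechanically on a small finite example. The following is only a sketch under an assumption that is not part of the text: a fair six-sided die, with every outcome given probability $$\dfrac{1}{6}$$. The names `omega`, `weights`, `prob` and `all_events` are illustrative choices, and axiom 3 is only checked for pairs of events.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: a fair six-sided die, each outcome with probability 1/6.
omega = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return sum((weights[o] for o in event), Fraction(0))

def all_events(space):
    """Every subset of the sample space, i.e. every possible event."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(omega)

# Axiom 1: no event has a negative probability.
axiom1 = all(prob(a) >= 0 for a in events)

# Axiom 2: the sure event has probability 1.
axiom2 = prob(omega) == 1

# Axiom 3 (pairs only): probabilities of incompatible events add up.
axiom3 = all(prob(a | b) == prob(a) + prob(b)
             for a in events for b in events
             if not (a & b))

print(axiom1, axiom2, axiom3)  # True True True
```

Exact fractions are used instead of floating-point numbers so that the equalities hold without rounding errors.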

Main properties of probability

$$P(A)+P(\overline{A})=1$$.

That is, the probabilities of complementary events add up to $$1$$. We will often use this property to calculate the probability of the complementary event: $$P(\overline{A})=1-P(A)$$.

Let's see why. On the one hand, $$A$$ and $$\overline{A}$$ are incompatible; on the other, $$A\cup \overline{A}= \Omega$$, since each one is the opposite of the other. In other words, $$A\cup \overline{A}$$ is the sure event, so by axiom 2 $$P(A \cup \overline{A})=P(\Omega)=1$$. By axiom 3, $$P(A \cup \overline{A})=P(A)+P(\overline{A})$$. Combining both equalities, $$P(A)+P(\overline{A})=1$$.

This property, which turns out to be very useful, can be generalized:

If three or more events are pairwise incompatible and their union is the whole sample space, for example $$A, B, C$$ pairwise incompatible with $$A\cup B \cup C = \Omega$$, then $$P(A)+P(B)+P(C)=1$$ by axioms 2 and 3.

In this case we say that $$A, B, C$$ form a complete system of events. Note that whenever we write $$\Omega$$ as a set of elementary events, we are in fact giving a complete system of events.

As a consequence, $$P(\emptyset)=0$$, that is to say, the probability of the impossible event is $$0$$. Indeed, the opposite of the impossible event is the sure event, so the property above gives $$P(\emptyset)+P(\Omega)=1$$. Since $$P(\Omega)=1$$ by axiom 2, we have $$P(\emptyset)+1=1$$, thus $$P(\emptyset)=0$$.
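
As a quick numerical illustration of the complement rule and its consequences, here is a small sketch. It assumes a fair six-sided die, and the event $$\{1,2\}$$ and the system $$\{1,2\}, \{3,4,5\}, \{6\}$$ are hypothetical choices made only for this example.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    # Assumption for this sketch: a fair die, so P(event) = |event| / 6.
    return Fraction(len(event), len(omega))

# Complement rule: P(A) + P(not A) = 1.
a = frozenset({1, 2})
complement_a = omega - a
complement_sum = prob(a) + prob(complement_a)

# A complete system of events: pairwise incompatible, union equal to omega.
system = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]
system_sum = sum(prob(e) for e in system)

# The impossible event is the complement of the sure event.
p_empty = prob(omega - omega)

print(complement_sum, system_sum, p_empty)  # 1 1 0
```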

If $$A\subset B$$, then $$P(A) \leq P(B)$$.

The notation $$A\subset B$$ reads "the event $$A$$ is included in the event $$B$$", that is to say, every possible result that satisfies $$A$$ also satisfies $$B$$.

This property is quite natural. Suppose that we roll a die and want to compare the probability of $$A =$$"rolling a $$2$$" with that of $$B =$$"rolling an even number". The probability of $$A$$ has to be smaller than or equal to that of $$B$$, since rolling a $$2$$ means rolling an even number. In other words, whenever $$A$$ is satisfied, $$B$$ is also satisfied, so $$A$$ cannot be more likely than $$B$$. Formally, $$B=A\cup (B-A)$$, where $$A$$ and $$B-A$$ are incompatible, so by axioms 3 and 1 $$P(B)=P(A)+P(B-A)\geq P(A)$$. Namely $$P(A) \leq P(B)$$.
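
The die example can be checked numerically. This is a minimal sketch that assumes the die is fair (the text does not say so explicitly); `prob` is an illustrative helper, not part of the original material.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    # Assumption for this sketch: a fair die, so P(event) = |event| / 6.
    return Fraction(len(event), len(omega))

a = frozenset({2})          # "rolling a 2"
b = frozenset({2, 4, 6})    # "rolling an even number"

is_subset = a <= b          # A is included in B
p_a, p_b = prob(a), prob(b)
p_rest = prob(b - a)        # the part of B that lies outside A

print(is_subset, p_a, p_b, p_a + p_rest == p_b)  # True 1/6 1/2 True
```

The last value confirms that $$P(B)$$ equals $$P(A)$$ plus the non-negative quantity $$P(B-A)$$, which is why $$P(A)$$ cannot exceed $$P(B)$$.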

$$P(A\cup B)=P(A)+P(B)-P(A\cap B)$$.

This result, which is very important to remember, follows from a fact of set theory: given two sets $$A$$ and $$B$$, their union can be written as $$A\cup B = (A-B)\cup (A\cap B) \cup (B-A),$$ where the three sets are pairwise incompatible. Then, by axiom 3, $$P(A\cup B)=P(A-B)+P(A\cap B)+P(B-A)$$.

From set theory we also have $$A=(A-B) \cup (A\cap B)$$, where the two sets are incompatible, and therefore, by axiom 3, $$P(A)=P(A-B)+P(A\cap B)$$, that is to say, $$P(A-B)=P(A)-P(A\cap B)$$.

Similarly,

$$B=(B-A) \cup (B\cap A) = (B-A) \cup (A\cap B)$$, so that $$P(B-A)=P(B)-P(A\cap B)$$.

Substituting these probabilities into the equality for $$P(A\cup B)$$, we find

$$P(A\cup B)= P(A-B)+P(A\cap B)+P(B-A) = P(A)-P(A\cap B)+P(A\cap B)+(P(B)-P(A\cap B)) = P(A)+P(B)-P(A\cap B).$$
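
The formula and the decomposition behind it can be verified on a concrete case. The sketch below assumes a fair six-sided die and two hypothetical overlapping events, chosen only for illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    # Assumption for this sketch: a fair die, so P(event) = |event| / 6.
    return Fraction(len(event), len(omega))

# Hypothetical events chosen so that they overlap.
a = frozenset({2, 4, 6})   # "rolling an even number"
b = frozenset({4, 5, 6})   # "rolling a number greater than 3"

# The three pairwise incompatible pieces of the union.
only_a, both, only_b = a - b, a & b, b - a

by_pieces = prob(only_a) + prob(both) + prob(only_b)
by_formula = prob(a) + prob(b) - prob(both)

print(prob(a | b), by_pieces, by_formula)  # 2/3 2/3 2/3
```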

Now, we can solve some problems.

A six-sided die is loaded so that the probability of each face is proportional to the number shown on it.

1 What is the probability of rolling a $$6$$?

In this case the faces are not equally likely, so we cannot simply apply Laplace's rule. According to the statement, the probability of each face is proportional to the number of the face itself. This means that, if we call $$k$$ the (still unknown) probability of face $$1$$, then:

$$P(\{1\})=k, \ P(\{2\})=2k, \ P(\{3\})=3k, \ P(\{4\})=4k, \ P(\{5\})=5k, \ P(\{6\})=6k.$$

Now, since $$\{1\},\{2\},\{3\},\{4\},\{5\},\{6\}$$ form a complete system of events, necessarily

$$$P(\{1\})+P(\{2\})+P(\{3\})+P(\{4\})+P(\{5\})+P(\{6\})=1$$$

Therefore $$$k+2k+3k+4k+5k+6k=1$$$ which is an equation that we can solve: $$$21k=1$$$ thus $$$k=\dfrac{1}{21}$$$

And so, the probability of rolling a $$6$$ is $$P(\{6\})=6k=6\cdot \dfrac{1}{21}=\dfrac{6}{21}=\dfrac{2}{7}.$$

2 What is the probability of rolling an odd number?

The cases favourable to the event $$A =$$"rolling an odd number" are $$\{1\},\{3\},\{5\}$$. Since these are pairwise incompatible events, by axiom 3

$$$P(A)=P(\{1\})+P(\{3\})+P(\{5\})=k+3k+5k=9k=9\cdot \dfrac{1}{21}=\dfrac{9}{21}=\dfrac{3}{7}$$$
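
Both answers of the loaded die problem can be reproduced with exact fractions. This is only a sketch of the computation above; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)

# The probabilities k, 2k, ..., 6k must add up to 1, so k = 1 / (1+2+...+6).
k = Fraction(1, sum(faces))

# Probability of each face: proportional to the number shown on it.
p = {face: face * k for face in faces}

p_six = p[6]
p_odd = sum(p[face] for face in faces if face % 2 == 1)

print(k, p_six, p_odd)  # 1/21 2/7 3/7
```

`Fraction` reduces results automatically, which is why $$\dfrac{6}{21}$$ and $$\dfrac{9}{21}$$ are printed as $$\dfrac{2}{7}$$ and $$\dfrac{3}{7}$$.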

Tomorrow there is an exam. Esther has studied really hard, and her probability of not passing the exam is only $$\dfrac{1}{5}$$.

David has studied less, and his probability of not passing the exam is $$\dfrac{1}{3}$$. We know that the probability that both of them fail the exam is $$\dfrac{1}{8}$$.

What is the probability that at least one of them does not pass the exam?

The first thing that we must do is express the problem in terms of events. We define the events $$A = $$"Esther does not pass the exam", $$B =$$"David does not pass the exam".

From the statement, we know that $$P(A)=\dfrac{1}{5}$$, $$P(B)=\dfrac{1}{3}$$ and $$P(A\cap B)=\dfrac{1}{8}$$.

We might think that if Esther has probability $$\dfrac{1}{5}$$ of not passing the exam, and David $$\dfrac{1}{3}$$ of not passing the exam, then the probability of at least one of them not passing, that is to say $$P(A\cup B)$$, should be $$\dfrac{1}{5} + \dfrac{1}{3} = \dfrac{8}{15}$$, but this is false.

If we compute it this way, we are assuming that the events $$A$$ and $$B$$ are incompatible, that is to say, that they cannot happen simultaneously, whereas the statement says that Esther and David can both fail the exam at the same time.

Therefore, the correct way of calculating this probability is using the formula that we have seen before: $$$P(A\cup B)=P(A)+P(B)-P(A\cap B)$$$

Substituting the values that we know, we get $$$P(A\cup B)=\dfrac{1}{5}+\dfrac{1}{3}-\dfrac{1}{8}=\dfrac{24}{120}+\dfrac{40}{120}-\dfrac{15}{120}=\dfrac{49}{120}$$$

or, what amounts to the same, $$40.8\overline{3}\%$$, that is, approximately $$40.83\%$$.
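
The arithmetic of this problem can be double-checked with exact fractions. The sketch below only reproduces the computation above; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(1, 5)      # Esther does not pass
p_b = Fraction(1, 3)      # David does not pass
p_both = Fraction(1, 8)   # neither of them passes

# Wrong approach: adding the probabilities as if A and B were incompatible.
naive = p_a + p_b

# Correct approach: P(A or B) = P(A) + P(B) - P(A and B).
p_union = p_a + p_b - p_both

print(naive)                       # 8/15
print(p_union)                     # 49/120
print(f"{float(p_union):.2%}")     # 40.83%
```

The naive sum overestimates the answer by exactly $$P(A\cap B)=\dfrac{1}{8}$$, because the case in which both fail is counted twice.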