Probability
All Possible Worlds
Probability theory reasons over the set of all possible worlds: every way the uncertain situation could turn out.
Possible World Instance
A single possible world is written $\omega$; for example, one particular outcome of rolling a die.
Example: possible dice throwing results
Rolling one die yields six possible worlds: the die shows 1, 2, 3, 4, 5, or 6.
Probability of Possible Worlds
$P(\omega)$ denotes the probability that possible world $\omega$ is the one that actually occurs.
Rules
- $0 \le P(\omega) \le 1$: every probability lies between 0 (impossible) and 1 (certain).
- $\sum_{\omega \in \Omega} P(\omega) = 1$: the probabilities of all possible worlds sum to 1.
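For example, assuming two fair dice, there are 36 equally likely possible worlds, and the two rules hold:

$$P(\omega) = \frac{1}{36} \quad \text{for each } \omega, \qquad \sum_{\omega \in \Omega} P(\omega) = 36 \cdot \frac{1}{36} = 1$$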
Unconditional Probability
Unconditional probability is the degree of belief in a proposition in the absence of any other evidence. All the questions that we have asked so far were questions of unconditional probability, because the result of rolling a die is not dependent on previous events.
Conditional Probability
Degree of belief in a proposition given some evidence that has already been revealed.
- $P(a \mid b)$: the probability that $a$ is true given that we already know that $b$ is true
- $a$: what we want the probability of
- $b$: the information that we already know for certain about the world
Example
- $P(\text{rain today} \mid \text{rain yesterday})$: the probability that it is raining today given that it was raining yesterday
Mathematical Relation

$$P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$$
Example
What is the probability that the sum of two dice equals 12, given that the first roll was a 6?
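Using the relation above, and noting that a sum of 12 with a first roll of 6 requires the second die to show a 6 as well:

$$P(\text{sum } 12 \mid \text{first die } 6) = \frac{P(\text{first die } 6 \wedge \text{sum } 12)}{P(\text{first die } 6)} = \frac{1/36}{1/6} = \frac{1}{6}$$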
Random Variable
A variable in probability theory with a domain of possible values it can take on
Example
- Variable: Dice Roll
- Domain: {1,2,3,4,5,6}
Example
- Variable: Flight
- Domain: {on time, delayed, cancelled}
Often, we are interested in the probability with which each value occurs. We represent this using a probability distribution. For example:

- $P(Flight = \text{on time}) = 0.6$
- $P(Flight = \text{delayed}) = 0.3$
- $P(Flight = \text{cancelled}) = 0.1$

A probability distribution can be represented more succinctly as a vector. For example, $\mathbf{P}(Flight) = \langle 0.6, 0.3, 0.1 \rangle$, where the values are read in a fixed, agreed-upon order (here: on time, delayed, cancelled).
Independence
Independence is the knowledge that the occurrence of one event does not affect the probability of the other event.
Examples
Independent Events
When rolling two dice, the result of each die is independent from the other. Rolling a 4 with the first die does not influence the value of the second die that we roll.
Dependent Events
This is opposed to dependent events, like clouds in the morning and rain in the afternoon. If it is cloudy in the morning, it is more likely that it will rain in the afternoon, so these events are dependent.
Mathematical Relation
Independence can be defined mathematically: events $a$ and $b$ are independent if and only if the probability of $a$ and $b$ is equal to the probability of $a$ times the probability of $b$:

$$P(a \wedge b) = P(a)\,P(b)$$
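For example, assuming two fair dice, the result of the first die is independent of the second, so the probability of rolling double sixes is:

$$P(6 \wedge 6) = P(6)\,P(6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$$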
Bayes’ Rule
Bayes’ rule follows from the two ways of writing the joint probability, $P(a \wedge b) = P(b)\,P(a \mid b) = P(a)\,P(b \mid a)$:

$$P(b \mid a) = \frac{P(b)\,P(a \mid b)}{P(a)}$$

Knowing $P(a \mid b)$, in addition to $P(a)$ and $P(b)$, allows us to calculate $P(b \mid a)$. This is helpful, because:
- Knowing the conditional probability of a visible effect given an unknown cause, $P(\text{visible effect} \mid \text{unknown cause})$, allows us to calculate the probability of the unknown cause given the visible effect, $P(\text{unknown cause} \mid \text{visible effect})$.
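As a small illustration (a sketch with illustrative numbers, not values taken from the lecture): if 80% of rainy afternoons are preceded by cloudy mornings, 40% of mornings are cloudy, and 10% of afternoons are rainy, Bayes’ rule gives the probability of afternoon rain given a cloudy morning:

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """Bayes' rule: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return p_effect_given_cause * p_cause / p_effect

# Illustrative values: P(clouds | rain) = 0.8, P(rain) = 0.1, P(clouds) = 0.4
p_rain_given_clouds = bayes(p_effect_given_cause=0.8, p_cause=0.1, p_effect=0.4)
print(p_rain_given_clouds)  # ≈ 0.2
```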
Joint Probability
Joint probability is the likelihood of multiple events all occurring.
Example
Let us consider the following example, concerning the probabilities of clouds in the morning and rain in the afternoon.
| C = cloud | C = ¬cloud |
|-----------|------------|
| 0.4       | 0.6        |

| R = rain | R = ¬rain |
|----------|-----------|
| 0.1      | 0.9       |
Looking at these data, we can’t say whether clouds in the morning are related to the likelihood of rain in the afternoon. To be able to do so, we need to look at the joint probabilities of all the possible outcomes of the two variables. We can represent this in a table as follows:
|            | R = rain | R = ¬rain |
|------------|----------|-----------|
| C = cloud  | 0.08     | 0.32      |
| C = ¬cloud | 0.02     | 0.58      |
- Using joint probabilities, we can deduce conditional probabilities.
- For example, if we are interested in the probability distribution of clouds in the morning given rain in the afternoon: $P(C \mid \text{rain}) = \frac{P(C, \text{rain})}{P(\text{rain})}$
- In words, we divide the joint probability of rain and clouds by the probability of rain.
- A side note: in probability, commas and ∧ are used interchangeably. Thus, $P(C, \text{rain}) = P(C \wedge \text{rain})$.
- It is possible to view $\frac{1}{P(\text{rain})}$ as some constant $\alpha$ by which $P(C, \text{rain})$ is multiplied. Thus, we can rewrite $P(C \mid \text{rain}) = \alpha\, P(C, \text{rain})$, or $\alpha \langle 0.08, 0.02 \rangle$, based on the tables above. Normalizing so the entries sum to 1 gives $\langle 0.8, 0.2 \rangle$.
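A minimal sketch of this normalization in Python (the dictionary layout and variable names are my own, not from the lecture):

```python
# Joint distribution P(C, R) from the table above
joint = {
    ("cloud", "rain"): 0.08, ("cloud", "no rain"): 0.32,
    ("no cloud", "rain"): 0.02, ("no cloud", "no rain"): 0.58,
}

# P(C | R = rain) = alpha * P(C, rain): select the matching entries, then normalize
unnormalized = {c: p for (c, r), p in joint.items() if r == "rain"}
alpha = 1 / sum(unnormalized.values())
conditional = {c: alpha * p for c, p in unnormalized.items()}
print(conditional)  # {'cloud': 0.8, 'no cloud': 0.2} (up to floating-point rounding)
```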
Probability Rules
- Negation: $P(\neg a) = 1 - P(a)$
- Inclusion-Exclusion: $P(a \vee b) = P(a) + P(b) - P(a \wedge b)$
  - Subtracting $P(a \wedge b)$ excludes the double-counted cases.
- Marginalization: $P(a) = P(a, b) + P(a, \neg b)$
  - It allows us to go from joint distributions to individual probabilities.
- Conditioning: $P(a) = P(a \mid b)\,P(b) + P(a \mid \neg b)\,P(\neg b)$
  - Similar to marginalization, but it uses conditional probabilities instead of joint probabilities.
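For example, using the joint table above, marginalization recovers the unconditional probability of a cloudy morning:

$$P(\text{cloud}) = P(\text{cloud}, \text{rain}) + P(\text{cloud}, \neg\text{rain}) = 0.08 + 0.32 = 0.4$$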
Bayesian Network
- A data structure that represents the dependencies among random variables.
- It is one of the most common probabilistic models.
- Bayesian networks have the following properties:
- They are directed graphs.
- Each node on the graph represents a random variable.
- An arrow from X to Y represents that X is a parent of Y. That is, the probability distribution of Y depends on the value of X.
- Each node X has a probability distribution $P(X \mid \text{Parents}(X))$.
- Parents can be thought of as causes, and their children as the corresponding effects; multiplying these distributions together gives the joint probability of all the variables, as shown below.
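In general (a standard property of Bayesian networks rather than something specific to the example that follows), the joint probability factors into a product over the nodes:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \text{Parents}(X_i)\big)$$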
Example
- The graph illustrates the dependencies among the random variables and how the joint probability decomposes along them (the chain rule).
- Walking through the Bayesian network from the top down:
- Rain is the root node in this network. This means that its probability distribution is not reliant on any prior event. In our example, Rain is a random variable that can take the values {none, light, heavy}, with the probability distribution given in the first table below.
- Maintenance, in our example, encodes whether there is train track maintenance, taking the values {yes, no}. Rain is a parent node of Maintenance, which means that the probability distribution of Maintenance is affected by Rain.
- Train is the variable that encodes whether the train is on time or delayed, taking the values {on time, delayed}. Note that Train has arrows pointing to it from both Maintenance and Rain. This means that both are parents of Train, and their values affect the probability distribution of Train.
- Appointment is a random variable that represents whether we attend our appointment, taking the values {attend, miss}. Note that its only parent is Train. This point about Bayesian networks is noteworthy: parents include only direct relations. It is true that maintenance affects whether the train is on time, and whether the train is on time affects whether we attend the appointment. However, in the end, what directly affects our chances of attending the appointment is whether the train came on time, and this is what is represented in the Bayesian network. For example, even if there was heavy rain and track maintenance, as long as the train was on time, those events have no further effect on whether we make it to our appointment.
Rain (root node):

| none | light | heavy |
|------|-------|-------|
| 0.7  | 0.2   | 0.1   |

Maintenance, conditioned on Rain:

| R     | yes | no  |
|-------|-----|-----|
| none  | 0.4 | 0.6 |
| light | 0.2 | 0.8 |
| heavy | 0.1 | 0.9 |

Train, conditioned on Rain and Maintenance:

| R     | M   | on time | delayed |
|-------|-----|---------|---------|
| none  | yes | 0.8     | 0.2     |
| none  | no  | 0.9     | 0.1     |
| light | yes | 0.6     | 0.4     |
| light | no  | 0.7     | 0.3     |
| heavy | yes | 0.4     | 0.6     |
| heavy | no  | 0.5     | 0.5     |

Appointment, conditioned on Train:

| T       | attend | miss |
|---------|--------|------|
| on time | 0.9    | 0.1  |
| delayed | 0.6    | 0.4  |
- To find the probability of missing the appointment under a particular scenario, say light rain, no track maintenance, and a delayed train, we compute the joint probability by multiplying along the network:

  $P(\text{light}, \text{no}, \text{delayed}, \text{miss}) = P(\text{light})\, P(\text{no} \mid \text{light})\, P(\text{delayed} \mid \text{light}, \text{no})\, P(\text{miss} \mid \text{delayed}) = 0.2 \cdot 0.8 \cdot 0.3 \cdot 0.4 = 0.0192$
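A minimal sketch of that calculation in Python, with the conditional probability tables written out as plain dictionaries (the dictionary layout and names are my own, not from the lecture):

```python
# Conditional probability tables of the Bayesian network above
P_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_maintenance = {  # P(M | R)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_train = {  # P(T | R, M)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
P_appointment = {  # P(A | T)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def joint(r, m, t, a):
    """P(R=r, M=m, T=t, A=a) = P(r) * P(m|r) * P(t|r,m) * P(a|t)."""
    return P_rain[r] * P_maintenance[r][m] * P_train[(r, m)][t] * P_appointment[t][a]

print(joint("light", "no", "delayed", "miss"))  # ≈ 0.0192
```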
Inference
- Query $X$: the variable for which we want to compute the probability distribution
- Evidence variables $E$: the variables that have been observed for event $e$
- Hidden variables $Y$: variables that aren't the query and also haven't been observed
  - Loosely similar to hidden layers in an ANN
- Goal: calculate $P(X \mid e)$
- This can be done through marginalization over the hidden variables
Inference by Enumeration
Inference by enumeration is the process of finding the probability distribution of a variable $X$ given observed evidence $e$ by summing the joint probability over all values of the hidden variables $Y$:

$$P(X \mid e) = \alpha P(X, e) = \alpha \sum_{y} P(X, e, y)$$

where $\alpha$ is the normalizing constant that makes the resulting distribution sum to 1.
Check the lecture notes for a coding example.
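Independently of the lecture's code, here is a minimal pure-Python sketch of inference by enumeration over the network above. It reuses the CPT dictionaries and the `joint` function from the earlier sketch, and all names are my own:

```python
from itertools import product

def infer_appointment_given_rain(rain):
    """P(Appointment | Rain = rain), summing out the hidden variables Maintenance and Train."""
    unnormalized = {}
    for a in ["attend", "miss"]:  # query variable: Appointment
        unnormalized[a] = sum(
            joint(rain, m, t, a)  # joint(...) and the CPT dicts are defined in the sketch above
            for m, t in product(["yes", "no"], ["on time", "delayed"])
        )
    alpha = 1 / sum(unnormalized.values())  # alpha = 1 / P(Rain = rain)
    return {a: alpha * p for a, p in unnormalized.items()}

print(infer_appointment_given_rain("light"))  # ≈ {'attend': 0.804, 'miss': 0.196}
```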