The probability of variable being in particular state is written
,
so P(X=x) is the probability that variable X is in
state x. A vector of probabilities over all states (state probability
distribution or vector) of a variable is given by
[28, pp
420].
The probabilities of different statements or events can be combined using
logic operators such as logical-and `
' and logical-or `
'. For
example, Equation 3.1 shows that the probability of X1being in state x1 and X2 being in x2 is 0.25.
The set of probabilities defining the conjunction (logical-and) of all possible combinations of all the states of every variable in the system3.2 is termed the joint probability distribution. Equation 3.2 shows the syntax for expressing the joint probability distribution for a system consisting of the variables X0 through XN. A common alternative and more convenient representation of joint probability that will be used in this report is shown in Equation 3.3 [28, pp 425].
For systems with more than just a few variables and states, these distributions can become very large, as every possible state combination over every variable must be represented. For example, assuming binary (two-state) variables, a system with 10 variables would require 210 =1024 individual probabilities. This number increases dramatically if the variables can take on more states (which they frequently do).
The size and complexity of representing a system of variables can be reduced by taking advantage of the dependence between variables. If a change of state in one variable doesn't effect the probabilities of another, then their entries in the joint probability distribution are effectively redundant. The notion of only using probability distributions of related variables is the basis of conditional probability.
The syntax used for conditional probability is
;
is ``the probability of variable A being in state a given
the state of B is b''. Here A is the query variable, and B=b the
evidence, and there is a relationship between A and B3.3 - a change in the state of B will effect
the probability (posterior) distribution of A. The statement A=a is
referred to as a hypothesis or proposition. The probability
distribution (vector) of A given the evidence B=b is given by
,
and the array of conditional probabilities for all states of Agiven each state of B is
,
and is known as conditional
probability distribution or CPD for the variable A.
Conditional probability can be defined in terms of standard unconditional
probabilities using the product rule, as shown in Equation
3.4. To avoid confusion, conditional probability theory calls
probabilities without conditions, such as
,
unconditional
probabilities, prior probabilities, or just `priors'
[28].
The communicative nature of the `and' operator can be used in rearranging
Equation 3.4 so that
can be found using the value
of
.
The resulting equation 3.6 is known as
Bayes' Rule3.4, which
allows probabilities (beliefs) to be calculated for every node in a BN
(excluding instantiated nodes, as their state is already known), regardless of
where evidence is added and directions of causal links. This way the
probability of a disease given the symptoms can be found (diagnostic
inference), as can the probability of symptoms given the disease (causal
inference).
Once evidence has been added to one or more variables, an inference algorithm [30] is used to update all the probability distributions for each uninstantiated variable. Inference algorithms are based on the application of Bayes' rule, shown in Equation 3.6. The application of an inference algorithm is sometimes referred to as belief updating. The probability distributions resulting from belief updating are the posterior beliefs or just `posteriors' of the system. These can be seen as the `predictions' of the states of unobserved (or unobservable) variables, given the added evidence.
As an example, a BN for diagnosing diseases may have a causal link from the node ``cold'' to ``sneezing''. This encodes the causal relationship ``sneezing is caused by a cold''. A node can have any number of parents; the nodes ``hay-fever'' and ``allergies'' may also point to the child node ``sneezing''.
Belief networks have been found to be useful in many applications related to reasoning and decision making for some time, and as a consequence have been subject to substantial research, which has in turn produced a wide array of available literature [28,31,27, pp. 437].
Besides more efficient representation and computation, an additional benefit of using conditional probability and belief networks over a joint probability distribution is that information is represented in a more understandable and logical manner, making construction and interpretation much simpler.
As shown in figure 3.1, child nodes can also be parent nodes.
Nodes with one or more parents require their conditional probability
distribution to be provided by way of a conditional probability table
(CPT), which specifies the conditional probability of the child node being in
a particular state, given the states of all its parents:
.
A conditioning state is
used to specify the states of all the parents when specifying an entry in the
child node's conditional probability table. Nodes without parents only require
a prior probability distribution,
(Section 3.1.2).
As mentioned earlier, the conditional probability is a summarised form of the joint probability distribution. The relationship between joint and conditional probability is shown in Equation 3.7, where the function Parents(Xi) represents the set of nodes with causal links directed into the node Xi.
The node C in Figure 3.1 would have an associated CPT
specifying the conditional distribution
.
Similarly, the CPTs for
nodes D and E would specify
and
respectively. The nodes
A and B have no parents, so only require the prior probability distributions
and
.
For example, assuming all variables in Figure 3.1 are
binary, where variable A can take on the states a1,a2 and so forth, the
CPT for node C would be shown in Table 3.1. Note that in a
real CPT, the
probability expressions are replaced by
probability values between 0 and 1.
Computer learning-based techniques may also be employed to build belief networks. Using test data, a software program may be able to determine appropriate parameters (conditional and prior probability distributions) for the network. A substantially more complex piece of software may even be able to determine the causal relationships between variables and therefore automatically generate an appropriate network structure [32].
The simplified steps for (manually) building and using a belief network is summarised by the flowchart in Figure 3.2.
Dynamic belief network (DBNs), are an extension of belief networks, specialised to better model temporal environments (such as walking). A DBN is made up of interconnected time slices of smaller (usually structurally identical) static belief networks. The evidence and inferred beliefs of previous time slices are used to predict or estimate beliefs in the current and future (or prediction) time slices.
The conditional probability distributions of nodes with parents in different time slices are defined by a state evolution model, which describes how the system evolves over time. The causal links across slices are sometimes referred to as temporal links.
The conditional probabilities between state and observation nodes are defined using a sensor model (see Figure 3.3). The sensor model embodies the accuracy of the sensor, by providing a conditional probability distribution of the sensor reading given the actual (correct) state of the system.
As new evidence is added to a DBN, new time slices are added. To reduce computational complexity, `old' time slices are commonly removed and their information summarised into prior probabilities of following slices. This produces a `moving window' of slices, all connected to form a DBN. Inference is performed as if the network were a normal BN, although the nature of DBNs usually results in larger and more complex networks, requiring more computation to update. This, and the possible requirement be updated in real-time (depending on the frequency of new evidence), means the inference algorithm must be faster and/or more efficient than may be required for static BNs. As a consequence there has been substantial research effort concentrated on reducing the computational complexity of performing inference on DBNs [33,34,35,36]. Most of these methods use `approximate' updating algorithms.
DBNs are created in a similar manner to normal BNs, except the designer must also take temporal relationships between slices into account. Usually DBNs are specified by supplying some `reference' network, and software is used to expand, contract, and update the network. The implementation steps of a DBN are summarised in figure 3.4.