next up previous contents
Next: System Design Overview Up: No Title Previous: Human Ambulation and Falls

Subsections

   
Belief Network Theory

The belief network3.1 (BN) is a `causal reasoning' tool that has found many uses in a wide variety of applications, and is now the mainstay of the AI research field known as ``uncertain reasoning'' [27,28,29]. BNs are based on the laws of probability, and in particular conditional and Bayesian probability theory. This chapter assumes some background knowledge in probability theory, so only reviews the concepts and syntax required to introduce belief and dynamic belief networks.

   
Probability theory review

   
Basic probability theory

Throughout this report the probability of a statement will be given by $\emph{P}(statement)$, and will take on a value between 0 (statement is always false) and 1 (statement is always true). The probabilistic systems used in this project deal in discrete random variables, which must take on exactly one state from a finite domain of predefined states. The probability of a variable being in each of its states is defined in that variable's probability distribution. As the variable may take on only one of its states at any given time, these distributions always sum to 1.

The probability of variable being in particular state is written $\emph{P}(variable=state)$, so P(X=x) is the probability that variable X is in state x. A vector of probabilities over all states (state probability distribution or vector) of a variable is given by $\emph{P}(variable)$ [28, pp 420].

The probabilities of different statements or events can be combined using logic operators such as logical-and `$\wedge$' and logical-or `$\vee$'. For example, Equation 3.1 shows that the probability of X1being in state x1 and X2 being in x2 is 0.25.


 \begin{displaymath}
\emph{P}(X_1=x_1\wedge{}X_2=x_2)=0.25
\end{displaymath} (3.1)

The set of probabilities defining the conjunction (logical-and) of all possible combinations of all the states of every variable in the system3.2 is termed the joint probability distribution. Equation 3.2 shows the syntax for expressing the joint probability distribution for a system consisting of the variables X0 through XN. A common alternative and more convenient representation of joint probability that will be used in this report is shown in Equation 3.3 [28, pp 425].


  
$\displaystyle \emph{P}(X_0=x_0\wedge{}X_1=x_1\wedge{}\ldots{}\wedge{}X_n=x_N)$     (3.2)
$\displaystyle \emph{P}(X_0=x_0,X_1=x_1,\ldots{},X_n=x_N)$     (3.3)

For systems with more than just a few variables and states, these distributions can become very large, as every possible state combination over every variable must be represented. For example, assuming binary (two-state) variables, a system with 10 variables would require 210 =1024 individual probabilities. This number increases dramatically if the variables can take on more states (which they frequently do).

The size and complexity of representing a system of variables can be reduced by taking advantage of the dependence between variables. If a change of state in one variable doesn't effect the probabilities of another, then their entries in the joint probability distribution are effectively redundant. The notion of only using probability distributions of related variables is the basis of conditional probability.

   
Bayesian and conditional probability

The conditional or posterior probability is used to find the probability distribution of a variable over its states, given some evidence about the state of other variables in the system. Evidence is added to the system by instantiation of variables: `forcing' an evidence variable variable to take on a particular state. When an evidence variable X is instantiated to the state x, its probability distribution is changed so that $\emph{P}(X=x)=1$, which implies $\emph{P}(X\neq x)=0$. An uninstantiated variable (a variable that has received no evidence) that we are interested in observing after evidence has been taken into account is called a query variable. After evidence has been taken into account the posterior belief (the state probability distribution after evidence has been added) of the systems uninstantiated nodes can be found.

The syntax used for conditional probability is $\emph{P}(query\vert evidence)$; $\emph{P}(A=a\vert B=b)$ is ``the probability of variable A being in state a given the state of B is b''. Here A is the query variable, and B=b the evidence, and there is a relationship between A and B3.3 - a change in the state of B will effect the probability (posterior) distribution of A. The statement A=a is referred to as a hypothesis or proposition. The probability distribution (vector) of A given the evidence B=b is given by $\emph{P}(A\vert B=b)$, and the array of conditional probabilities for all states of Agiven each state of B is $\emph{P}(A\vert B)$, and is known as conditional probability distribution or CPD for the variable A.


 
$\displaystyle \emph{P}(A\vert B)$ = $\displaystyle \frac{\emph{P}(A \wedge B)}{\emph{P}(B)}$  
% latex2html id marker 6242
$\displaystyle \therefore \emph{P}(A \wedge B)$ = $\displaystyle \emph{P}(A\vert B)\emph{P}(B)$ (3.4)

Conditional probability can be defined in terms of standard unconditional probabilities using the product rule, as shown in Equation 3.4. To avoid confusion, conditional probability theory calls probabilities without conditions, such as $\emph{P}(A=a)$, unconditional probabilities, prior probabilities, or just `priors' [28].

The communicative nature of the `and' operator can be used in rearranging Equation 3.4 so that $\emph{P}(A\vert B)$ can be found using the value of $\emph{P}(B\vert A)$. The resulting equation 3.6 is known as Bayes' Rule3.4, which allows probabilities (beliefs) to be calculated for every node in a BN (excluding instantiated nodes, as their state is already known), regardless of where evidence is added and directions of causal links. This way the probability of a disease given the symptoms can be found (diagnostic inference), as can the probability of symptoms given the disease (causal inference).


  
$\displaystyle \emph{P}(A \wedge B)$ = $\displaystyle \emph{P}(B \wedge A)$ (3.5)
  = $\displaystyle \emph{P}(A\vert B)\emph{P}(B)$  
  = $\displaystyle \emph{P}(B\vert A)\emph{P}(A)$  
       
% latex2html id marker 6252
$\displaystyle \therefore \emph{P}(A\vert B)$ = $\displaystyle \frac{\emph{P}(B\vert A)\emph{P}(A)}{\emph{P}(B)}$ (3.6)

Once evidence has been added to one or more variables, an inference algorithm [30] is used to update all the probability distributions for each uninstantiated variable. Inference algorithms are based on the application of Bayes' rule, shown in Equation 3.6. The application of an inference algorithm is sometimes referred to as belief updating. The probability distributions resulting from belief updating are the posterior beliefs or just `posteriors' of the system. These can be seen as the `predictions' of the states of unobserved (or unobservable) variables, given the added evidence.

   
Belief networks

Theory

A belief network is a directed acyclic graph (DAG) which encodes the causal relationships between particular variables, represented in the DAG as nodes. Nodes are connected by causal links - represented by arrows - which point from parent nodes (causes) to child nodes (effects), as shown in Figure 3.1.

As an example, a BN for diagnosing diseases may have a causal link from the node ``cold'' to ``sneezing''. This encodes the causal relationship ``sneezing is caused by a cold''. A node can have any number of parents; the nodes ``hay-fever'' and ``allergies'' may also point to the child node ``sneezing''.

Belief networks have been found to be useful in many applications related to reasoning and decision making for some time, and as a consequence have been subject to substantial research, which has in turn produced a wide array of available literature [28,31,27, pp. 437].

Besides more efficient representation and computation, an additional benefit of using conditional probability and belief networks over a joint probability distribution is that information is represented in a more understandable and logical manner, making construction and interpretation much simpler.


  
Figure 3.1: Simple example belief network.
\includegraphics[width=50mm]{figures/simpleBN.eps}

As shown in figure 3.1, child nodes can also be parent nodes. Nodes with one or more parents require their conditional probability distribution to be provided by way of a conditional probability table (CPT), which specifies the conditional probability of the child node being in a particular state, given the states of all its parents: $\emph{P}(child\vert parent_1,parent_2,\ldots,parent_N)$. A conditioning state is used to specify the states of all the parents when specifying an entry in the child node's conditional probability table. Nodes without parents only require a prior probability distribution, $\emph{P}(node)$ (Section 3.1.2).

As mentioned earlier, the conditional probability is a summarised form of the joint probability distribution. The relationship between joint and conditional probability is shown in Equation 3.7, where the function Parents(Xi) represents the set of nodes with causal links directed into the node Xi.


 \begin{displaymath}
\emph{P}(X_1,X_2,\ldots ,X_N) = \prod_{i=1}^{N}\emph{P}(X_i \vert \mathrm{Parents}(X_i))
\end{displaymath} (3.7)

The node C in Figure 3.1 would have an associated CPT specifying the conditional distribution $\emph{P}(C\vert A,B)$. Similarly, the CPTs for nodes D and E would specify $\emph{P}(D\vert C)$ and $\emph{P}(E\vert C)$ respectively. The nodes A and B have no parents, so only require the prior probability distributions $\emph{P}(A)$ and $\emph{P}(B)$. For example, assuming all variables in Figure 3.1 are binary, where variable A can take on the states a1,a2 and so forth, the CPT for node C would be shown in Table 3.1. Note that in a real CPT, the $\emph{P}(\ldots)$ probability expressions are replaced by probability values between 0 and 1.


 
Table 3.1: CPT for example node C
    C
A B c1 c2
a1 b1 $\emph{P}(C=c1 \vert A=a1,B=b1)$ $\emph{P}(C=c2 \vert A=a1,B=b1)$
a1 b2 $\emph{P}(C=c1 \vert A=a1,B=b2)$ $\emph{P}(C=c2 \vert A=a1,B=b2)$
a2 b1 $\emph{P}(C=c1 \vert A=a2,B=b1)$ $\emph{P}(C=c2 \vert A=a2,B=b1)$
a2 b2 $\emph{P}(C=c1 \vert A=a2,B=b2)$ $\emph{P}(C=c2 \vert A=a2,B=b2)$

Belief network construction

Before evidence can be added and beliefs extracted from a BN, it must first be created. The initial design of belief networks generally requires the help of or `expert' in the area being modelled, to specify the correct causal relationships and conditional probabilities.

Computer learning-based techniques may also be employed to build belief networks. Using test data, a software program may be able to determine appropriate parameters (conditional and prior probability distributions) for the network. A substantially more complex piece of software may even be able to determine the causal relationships between variables and therefore automatically generate an appropriate network structure [32].

The simplified steps for (manually) building and using a belief network is summarised by the flowchart in Figure 3.2.


  
Figure 3.2: BN implementation flowchart
\includegraphics{figures/BNflowchart.eps}

Dynamic belief networks


  
Figure 3.3: Generalised DBN model
\includegraphics[width=140mm]{figures/generalDBN.eps}

   
Theory

The `static' belief networks presented in Section 3.2 work with evidence and beliefs from a single instant in time. As a result BNs are not particularly suited to systems that are `evolving' over time. Sometimes it is useful to remember past evidence and beliefs, as they may useful for determining the system's current state or for predicting future beliefs.

Dynamic belief network (DBNs), are an extension of belief networks, specialised to better model temporal environments (such as walking). A DBN is made up of interconnected time slices of smaller (usually structurally identical) static belief networks. The evidence and inferred beliefs of previous time slices are used to predict or estimate beliefs in the current and future (or prediction) time slices.

The conditional probability distributions of nodes with parents in different time slices are defined by a state evolution model, which describes how the system evolves over time. The causal links across slices are sometimes referred to as temporal links.

The conditional probabilities between state and observation nodes are defined using a sensor model (see Figure 3.3). The sensor model embodies the accuracy of the sensor, by providing a conditional probability distribution of the sensor reading given the actual (correct) state of the system.

As new evidence is added to a DBN, new time slices are added. To reduce computational complexity, `old' time slices are commonly removed and their information summarised into prior probabilities of following slices. This produces a `moving window' of slices, all connected to form a DBN. Inference is performed as if the network were a normal BN, although the nature of DBNs usually results in larger and more complex networks, requiring more computation to update. This, and the possible requirement be updated in real-time (depending on the frequency of new evidence), means the inference algorithm must be faster and/or more efficient than may be required for static BNs. As a consequence there has been substantial research effort concentrated on reducing the computational complexity of performing inference on DBNs [33,34,35,36]. Most of these methods use `approximate' updating algorithms.

DBNs are created in a similar manner to normal BNs, except the designer must also take temporal relationships between slices into account. Usually DBNs are specified by supplying some `reference' network, and software is used to expand, contract, and update the network. The implementation steps of a DBN are summarised in figure 3.4.


  
Figure 3.4: DBN implementation flowchart
\includegraphics{figures/DBNflowchart.eps}

Applications

Due to their handling of uncertainty causal relationships, belief networks and dynamic belief networks can be applied to almost any environment which is capable of providing some form of evidence. Most limitations to the applicability of BN theory usually lie with the design and inference of the network models being used. Some applications of these theories include medical diagnosis and decision support [28,37,38, page 457], fault detection in machinery, user modelling [39], pattern recognition and classification [40,41], modelling gene expression [42], expert systems [43], robot navigation [44,45], finance [46] and classification to name a few.

Existing software

There are a large number of existing software systems developed and used to perform uncertain or probabilistic reasoning, which find use in some of the above applications. A few of these systems are described here.

Netica

[47] The Netica API is a software library of functions for implementing belief networks. A Netica application is also available from it creators, Norsys, for creating and viewing belief networks in a graphical interface. Netica uses a fast exact inference algorithm implementing message passing in a join tree or junction tree, which requires the network be `compiled' before updating [48,49]. The Netica software library is used for this project. Although the Netica GUI does provide some functions to `expand time slices', these could not be used to create DBNs.

Pathfinder system

This system was started during the 1980s by the Stanford Medical Computer Science program as a `diagnostic expert system' for diagnosing lymph-node diseases. Four versions of the Pathfinder system were developed, later versions using static belief networks. After consultation between Pathfinder's developers and expert physicians, the system was capable of dealing with more than 60 diseases and 100 symptoms. During trials on actual patients, the Pathfinder achieved 89% accuracy, and more recent versions outperform the physicians [28, page 547].

SNePS

[50] SNePS stands for Schematic Network Programming System. Developed by the State University of New York at Buffalo, it is ``a system for building, using and retrieving information from propositional semantic networks'' [50]. Like BN and DBN theory, SNePS processes information using networks structures, however the network relates information (as opposed to variables which may take on different states) and new information is added to the network in the form new nodes.


next up previous contents
Next: System Design Overview Up: No Title Previous: Human Ambulation and Falls
Daniel J Willis
2000-10-23