Learning Bayesian Networks with R
Susanne G. Bøttcher and Claus Dethlefsen
Abstract
deal is a software package freely available for use with R. It includes several
methods for analysing data using Bayesian networks with variables of discrete
and/or continuous types but restricted to conditionally Gaussian networks.
Construction of priors for network parameters is supported and their param-
eters can be learned from data using conjugate updating. The network score
is used as a metric to learn the structure of the network and forms the basis
of a heuristic search strategy. deal has an interface to Hugin.
1 Introduction
A Bayesian network is a graphical model that encodes the joint probability distri-
bution for a set of random variables. Bayesian networks are treated in e.g. Cowell,
Dawid, Lauritzen, and Spiegelhalter (1999) and have found application within many
fields, see Lauritzen (2003) for a recent overview.
Here we consider Bayesian networks with mixed variables, i.e. the random vari-
ables in a network can be of both discrete and continuous types. A method for
learning the parameters and structure of such Bayesian networks has recently been
described by Bøttcher (2001). We have developed a package called deal, writ-
ten in R (Ihaka and Gentleman, 1996), which provides these methods for learn-
ing Bayesian networks. In particular, the package includes procedures for defin-
ing priors, estimating parameters, calculating network scores, performing heuristic
search as well as simulating data sets with a given dependency structure. The
package can be downloaded from the Comprehensive R Archive Network (CRAN)
http://cran.R-project.org/ and may be used freely for non-commercial pur-
poses.
In Section 2 we define Bayesian networks for mixed variables. To learn a
Bayesian network, the user needs to supply a training data set and represent any
prior knowledge available as a Bayesian network. Section 3 discusses how to specify
a Bayesian network in terms of a directed acyclic graph and the local probability
distributions. deal uses the prior Bayesian network to deduce prior distributions
for all parameters in the model. Then, this is combined with the training data
to yield posterior distributions of the parameters. The parameter learning proce-
dure is treated in Section 4. Section 5 describes how to learn the structure of the
network. A network score is calculated and a search strategy is employed to find
the network with the highest score. This network gives the best representation of
data and we call it the posterior network. Section 6 describes how to transfer the
posterior network to Hugin (http://www.hugin.com). The Hugin graphical user
interface (GUI) can then be used for further inference in the posterior network.
2 Bayesian networks
Let D = (V, E) be a Directed Acyclic Graph (DAG), where V is a finite set of
nodes and E is a finite set of directed edges (arrows) between the nodes. The DAG
defines the structure of the Bayesian network. To each node $v \in V$ in the graph
corresponds a random variable $X_v$. The set of variables associated with the graph
$D$ is then $X = (X_v)_{v \in V}$. Often, we do not distinguish between a variable $X_v$ and
the corresponding node $v$. To each node $v$ with parents $\mathrm{pa}(v)$, a local probability
distribution $p(x_v \mid x_{\mathrm{pa}(v)})$ is attached. The set of local probability distributions for
all variables in the network is $\mathcal{P}$. A Bayesian network for a set of random variables
$X$ is then the pair $(D, \mathcal{P})$.
The possible lack of directed edges in D encodes conditional independencies
between the random variables X through the factorization of the joint probability
distribution,
$$p(x) = \prod_{v \in V} p\left(x_v \mid x_{\mathrm{pa}(v)}\right).$$
Here, we allow Bayesian networks with both discrete and continuous variables,
as treated in Lauritzen (1992), so the set of nodes V is given by V = ∆ ∪ Γ, where
∆ and Γ are the sets of discrete and continuous nodes, respectively. The set of
variables $X$ can then be denoted $X = (X_v)_{v \in V} = (I, Y) = ((I_\delta)_{\delta \in \Delta}, (Y_\gamma)_{\gamma \in \Gamma})$,
where $I$ and $Y$ are the sets of discrete and continuous variables, respectively. For a
discrete variable $\delta$, we let $\mathcal{I}_\delta$ denote its set of levels.
To ensure availability of exact local computation methods, we do not allow
discrete variables to have continuous parents. The joint probability distribution
then factorizes into a discrete part and a mixed part, so
$$p(x) = p(i, y) = \prod_{\delta \in \Delta} p\left(i_\delta \mid i_{\mathrm{pa}(\delta)}\right) \prod_{\gamma \in \Gamma} p\left(y_\gamma \mid i_{\mathrm{pa}(\gamma)}, y_{\mathrm{pa}(\gamma)}\right).$$
3 Specification of a Bayesian network
In deal, a Bayesian network is represented as an object of class network. The
primary attribute of a network is the list of nodes, in the example ksl.nw$nodes.
Each entry in the list is an object of class node representing a node in the graph,
which includes information associated with the node. Several methods for the net-
work class operate by applying an appropriate method for one or more nodes in the
list of nodes.
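As a brief sketch of how this looks in practice (the network() constructor and the
data(ksl) call are taken from the deal documentation; exact behaviour may differ
between versions of the package):

## Sketch: build a network object from a data frame and inspect
## its list of nodes.
library(deal)
data(ksl)               # the study data used in the example (Section 7)
ksl.nw <- network(ksl)  # one 'node' object per variable
names(ksl.nw$nodes)     # the primary attribute: the list of nodes
ksl.nw$nodes[[1]]       # information stored for a single node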
For a continuous node $\gamma$, the local probability distribution is conditionally
Gaussian, so that
$$Y_\gamma \mid i_{\mathrm{pa}(\gamma)}, y_{\mathrm{pa}(\gamma)}, \theta_{\gamma \mid i_{\mathrm{pa}(\gamma)}} \sim \mathcal{N}\left(m_{\gamma \mid i_{\mathrm{pa}(\gamma)}} + y_{\mathrm{pa}(\gamma)} \beta_{\gamma \mid i_{\mathrm{pa}(\gamma)}},\ \sigma^2_{\gamma \mid i_{\mathrm{pa}(\gamma)}}\right).$$
Define $z_{\mathrm{pa}(\gamma)} = (1, y_{\mathrm{pa}(\gamma)})$ and let $\eta_{\gamma \mid i_{\mathrm{pa}(\gamma)}} = (m_{\gamma \mid i_{\mathrm{pa}(\gamma)}}, \beta_{\gamma \mid i_{\mathrm{pa}(\gamma)}})$, where $m_{\gamma \mid i_{\mathrm{pa}(\gamma)}}$
is the intercept and $\beta_{\gamma \mid i_{\mathrm{pa}(\gamma)}}$ is the vector of coefficients. For a continuous variable $\gamma$,
the suggested local probability distribution $\mathcal{N}(z_{\mathrm{pa}(\gamma)} \eta_{\gamma \mid i_{\mathrm{pa}(\gamma)}},\ \sigma^2_{\gamma \mid i_{\mathrm{pa}(\gamma)}})$ is thus a
regression on the continuous parents for each configuration of the discrete parents.
For continuous variables, the joint distribution $\mathcal{N}(M_i, \Sigma_i)$ is determined for each
configuration of the discrete variables by applying a sequential algorithm developed
in Shachter and Kenley (1989).
In deal, we can assess these quantities from the network object.
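A hedged sketch, assuming the ksl.nw object from above; jointprior() is the
package's procedure for deducing the joint prior, and the imaginary database size
of 64 is an illustrative assumption:

## Sketch: deduce the joint prior distribution from the prior network.
## The imaginary database size (64) is an illustrative assumption.
ksl.prior <- jointprior(ksl.nw, 64)
names(ksl.prior)   # components of the deduced joint distribution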
4 Parameter learning
To estimate the parameters in the network, we use the Bayesian approach. We
encode our uncertainty about parameters θ in a prior distribution p(θ), use data d
to update this distribution, and hereby obtain the posterior distribution p(θ|d) by
using Bayes' theorem,
$$p(\theta \mid d) = \frac{p(d \mid \theta)\, p(\theta)}{p(d)}, \qquad \theta \in \Theta. \tag{1}$$
Here Θ is the parameter space, d is a random sample from the probability distri-
bution p(x|θ) and p(d|θ) is the joint probability distribution of d, also called the
likelihood of θ. We refer to this as parameter learning or just learning.
In deal, we assume that the parameters associated with one variable are inde-
pendent of the parameters associated with the other variables and, in addition, that
the parameters are independent for each configuration of the discrete parents, i.e.
$$p(\theta) = \prod_{\delta \in \Delta} \prod_{i_{\mathrm{pa}(\delta)} \in \mathcal{I}_{\mathrm{pa}(\delta)}} p\left(\theta_{\delta \mid i_{\mathrm{pa}(\delta)}}\right) \prod_{\gamma \in \Gamma} \prod_{i_{\mathrm{pa}(\gamma)} \in \mathcal{I}_{\mathrm{pa}(\gamma)}} p\left(\theta_{\gamma \mid i_{\mathrm{pa}(\gamma)}}\right). \tag{2}$$
The local parameter priors are deduced from the prior Bayesian network by the
master prior procedure:
1. From the prior Bayesian network and the size of an imaginary database, a
joint prior distribution for all parameters in the network is deduced.
2. From this joint prior distribution, the marginal distribution of all parameters
in the family consisting of the node and its parents can be determined. We
call this the master prior.
3. The local parameter priors are now determined by conditioning in the master
prior distribution.
This procedure ensures parameter independence. Further, it has the property
that if a node has the same set of parents in two different networks, then the local
parameter prior for this node will be the same in the two networks. Therefore,
we only have to deduce the local parameter prior for a node given the same set of
parents once. This property is called parameter modularity.
In the discrete case, the joint parameter prior is Dirichlet,
$$p(\Psi) \sim \mathcal{D}(\alpha),$$
and for a family $A = \delta \cup \mathrm{pa}(\delta)$ we let $\alpha_A = (\alpha_{i_A})_{i_A \in \mathcal{I}_A}$. Then the marginal distribution of $\Psi_A$ is Dirichlet,
$p(\Psi_A) \sim \mathcal{D}(\alpha_A)$. This is the master prior in the discrete case. The local parameter
priors can now be found by conditioning in these master prior distributions.
We cannot use these distributions to derive priors for other networks, so instead
we use the imaginary database to derive local master priors.
Define the notation
$$\rho_{i_{A\cap\Delta}} = \sum_{j\,:\,j_{A\cap\Delta} = i_{A\cap\Delta}} \rho_j,$$
and similarly for $\nu_{i_{A\cap\Delta}}$ and $\Phi_{i_{A\cap\Delta}}$. For the family $A = \gamma \cup \mathrm{pa}(\gamma)$, the local master
prior is then found as
$$\Sigma_{A\cap\Gamma \mid i_{A\cap\Delta}} \sim \mathcal{IW}\left(\rho_{i_{A\cap\Delta}},\ \tilde{\Phi}_{A\cap\Gamma \mid i_{A\cap\Delta}}\right),$$
$$M_{A\cap\Gamma \mid i_{A\cap\Delta}} \mid \Sigma_{A\cap\Gamma \mid i_{A\cap\Delta}} \sim \mathcal{N}\left(\bar{\mu}_{A\cap\Gamma \mid i_{A\cap\Delta}},\ \frac{1}{\nu_{i_{A\cap\Delta}}}\,\Sigma_{A\cap\Gamma \mid i_{A\cap\Delta}}\right),$$
where
$$\bar{\mu}_{i_{A\cap\Delta}} = \frac{\sum_{j\,:\,j_{A\cap\Delta} = i_{A\cap\Delta}} \mu_j \nu_j}{\nu_{i_{A\cap\Delta}}}$$
and
$$\tilde{\Phi}_{A\cap\Gamma \mid i_{A\cap\Delta}} = \Phi_{i_{A\cap\Delta}} + \sum_{j\,:\,j_{A\cap\Delta} = i_{A\cap\Delta}} \nu_j \left(\mu_j - \bar{\mu}_{i_{A\cap\Delta}}\right)\left(\mu_j - \bar{\mu}_{i_{A\cap\Delta}}\right)^{\top}.$$
Again, the local parameter priors can be found by conditioning in these local master
priors.
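In code, the parameter learning step might look like the following sketch; learn()
is the package's learning procedure, and returning the updated network in the $nw
component is an assumption based on common deal usage:

## Sketch: conjugate updating of all local parameter priors with the
## data, yielding the posterior distributions of the parameters.
ksl.nw <- learn(ksl.nw, ksl, ksl.prior)$nw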
5 Learning the structure
To learn the structure of the network, a network score is calculated for each
candidate DAG, and a search strategy is employed to find the DAG with the highest
score. Note that the network score is a product over terms involving only one node
and its parents. This property is called decomposability. It can be shown that the
network scores for two independence equivalent DAGs are equal. This property is
called likelihood equivalence and it is a property of the master prior procedure.
In deal we use, for computational reasons, the logarithm of the network score.
The log network score contribution of a node is evaluated whenever the node is
learned; the log network score is then updated and stored in the score attribute of
the network.
To compare two DAGs, $D$ and $D^*$, we consider the posterior odds,
$$\frac{p(D \mid d)}{p(D^* \mid d)} = \frac{p(D)}{p(D^*)} \times \frac{p(d \mid D)}{p(d \mid D^*)},$$
where $p(D)/p(D^*)$ is the prior odds and $p(d \mid D)/p(d \mid D^*)$ is the Bayes factor. At the
moment, the only option in deal for specifying a prior distribution over DAGs is to
let all DAGs be equally likely, so the prior odds are always equal to one. Therefore,
we use the Bayes factor for comparing two different DAGs.
In greedy search we compare models that differ only by a single arrow, either
added, removed or reversed. In these cases, the Bayes factor is especially simple,
because of decomposability of the network score.
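As a sketch of the search step (autosearch() is deal's greedy search; the argument
names follow the package documentation and should be checked against the
installed version):

## Sketch: greedy search, starting from the current network and
## adding/removing/reversing single arrows while the score improves.
ksl.search <- autosearch(ksl.nw, ksl, ksl.prior, trace = TRUE)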
To manually assess the network score of a network (e.g. to use as initial network
in a search), use
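a sketch like the following; that learning evaluates the score and stores it in the
score attribute follows the description above, while the $nw component of the
return value is an assumption that may vary between deal versions:

## Sketch: learning evaluates the log network score, which is then
## stored in the network's score attribute.
ksl.nw <- learn(ksl.nw, ksl, ksl.prior)$nw
ksl.nw$score   # the log network score of this structure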
6 Hugin interface
A network object may be written to a file in the Hugin .net language. Hugin
(http://www.hugin.com) is commercial software for inference in Bayesian net-
works. Hugin can learn networks with only discrete variables but cannot learn
purely continuous or mixed networks. deal may therefore be
used for this purpose and the result can then be transferred to Hugin. The pro-
cedure savenet() saves a network to a file. For each node, we use point estimates
of the parameters in the local probability distributions. The readnet() procedure
reads the network structure from a file but does not, however, read the probability
distributions. This is planned to be included in a future version of deal.
7 Example
In this section, we describe the analysis of the ksl data that has been used as
illustration throughout the paper. The data set, included in Badsberg (1995), is
from a study measuring health and social characteristics of representative samples
of Danish 70-year-old people, taken in 1967 and 1984. In total, 1083 cases have
been recorded and each case contains observations on nine different variables, see
Table 1.
Table 1: Variables in the ksl data set. The variables Fev, Kol and BMI are continuous;
the rest are discrete.

Fev    Forced ejection volume (lung function)    continuous
Kol    Cholesterol                               continuous
BMI    Body mass index                           continuous
Hyp    Hypertension                              discrete
Smok   Smoking                                   discrete
Alc    Alcohol consumption                       discrete
Work   Working                                   discrete
Sex    Gender                                    discrete
Year   Survey year (1967 or 1984)                discrete
The purpose of our analysis is to find dependency relations between the vari-
ables. One interest is to determine which variables influence the presence or absence
of hypertension. From a medical viewpoint, it is possible that hypertension is in-
fluenced by some of the continuous variables Fev, Kol and BMI. However, in deal
we do not allow continuous parents of discrete nodes, so we cannot describe such a
relation. A way to overcome this problem is to treat Hyp as a continuous variable,
even though this is obviously not the most natural choice. This is done in the analysis below.
Further, the initial data analysis indicates a transformation of BMI into log(BMI).
With these adjustments, the data set is ready for analysis in deal.
We have no prior knowledge about specific dependency relations, so for simplicity
we use the empty DAG as the prior DAG and let the probability distribution of
the discrete variables be uniform. The assessment of the probability distribution
for the continuous variables is based on data, as described in Section 3.1.
We do not allow arrows into Sex and Year, as none of the other variables can
influence these variables. So we create a ban list which is attached to the network.
The ban list is a matrix with two columns. Each row contains the directed edge
that is not allowed.
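A sketch of this specification; the node indices are assumptions that depend on
the column order of the data frame, and the banlist() replacement function follows
the deal documentation:

## Sketch: ban all arrows into Sex and Year. Each row of the matrix is
## a banned directed edge (from-node index, to-node index). The indices
## for Sex (8) and Year (9) are assumptions about the column order.
ban.into <- function(to, p = 9) cbind(setdiff(1:p, to), to)
mybanlist <- rbind(ban.into(8), ban.into(9))
banlist(ksl.nw) <- mybanlist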
Finally, the parameters in the network are learned and structural learning is
used with the prior DAG as starting point.
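These steps might look as follows (a sketch; heuristic() performs greedy search
with random restarts, and the restart/degree settings and the way the best network
is extracted are assumptions based on the deal documentation):

## Sketch: learn the parameters, then search for the best structure
## with the prior DAG as the starting point.
ksl.nw  <- learn(ksl.nw, ksl, ksl.prior)$nw
ksl.res <- heuristic(ksl.nw, ksl, ksl.prior,
                     restart = 2, degree = 10, trace = TRUE)
thebest <- ksl.res$nw[[1]]   # the highest-scoring network found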
Figure 1: The selected network thebest, with nodes Kol, Year, logBMI, Hyp, Sex,
FEV, Smok, Work and Alc.
The resulting network thebest is shown in Figure 1; it is the network with the
highest network score among the networks visited during the search.
In the result we see for the discrete variables that Alc, Smok and Work depend
directly on Sex and Year. In addition, Smok and Work also depend on Alc. These two
arrows are, however, not causal arrows, as Smok ← Alc → Work in the given DAG
represents the same probability distribution as the relations Smok ← Alc ← Work and
Smok → Alc → Work, i.e. the three DAGs are independence equivalent. Year and
Sex have no parents, as specified in the ban list. For the continuous
variables, all the arrows are causal arrows. We see that Fev depends directly on Year,
Sex and Smok. So given these variables, Fev is conditionally independent of the rest of
the variables. Kol depends directly on Year and Sex, and logBMI depends directly on
Kol and Sex. Given logBMI and Fev, the variable Hyp is conditionally independent of
the rest of the variables. So according to this study, hypertension can be determined
by the body mass index and the lung function forced ejection volume. However, as
Hyp is not continuous by nature, other analyses should be performed with Hyp as a
discrete variable, e.g. a logistic regression with Hyp as the response and the remaining
variables as explanatory variables. Such an analysis indicates that, in addition, Sex and
Smok may influence Hyp but otherwise identifies logBMI as the main predictor.
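Such a check can be run directly in R; a minimal sketch, assuming Hyp is recoded
as a factor (the formula uses all remaining variables and is illustrative, not the
authors' exact model):

## Sketch: logistic regression with Hyp as the response and the
## remaining variables as explanatory variables.
ksl$Hyp <- factor(ksl$Hyp)
hyp.glm <- glm(Hyp ~ ., data = ksl, family = binomial)
summary(hyp.glm)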
Acknowledgements
The work has been supported by Novo Nordisk A/S.
References
J.H. Badsberg. An Environment for Graphical Models. PhD thesis, Aalborg University, 1995.
S.G. Bøttcher. Learning Bayesian networks with mixed variables. In Proceedings of the Eighth International Workshop in Artificial Intelligence and Statistics, 2001.
R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag, New York, 1999.
R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996.
S.L. Lauritzen. Propagation of probabilities, means and variances in mixed graphical association models. Journal of the American Statistical Association, 87(420):1098–1108, 1992.
S.L. Lauritzen. Some modern applications of graphical models. In P.J. Green, N.L. Hjort, and S. Richardson, editors, Highly Structured Stochastic Systems. Oxford University Press, 2003.
R.D. Shachter and C.R. Kenley. Gaussian influence diagrams. Management Science, 35(5):527–550, 1989.
Corresponding author
Claus Dethlefsen
Dept. of Mathematical Sciences
Aalborg University
Fr. Bajers Vej 7G
9220 Aalborg, Denmark
E-mail: [email protected]