sites.poli.usp.brsites.poli.usp.br/pmr/ltd/People/asaheki/docs/File formats.doc  · Web viewThis...

This document describes the file formats used by JavaBayes.

1. Introduction
2. BIF version 0.20
3. BIF version 0.15
4. BIF version 0.10
5. XMLBIF version 0.50
6. XMLBIF version 0.40
7. XMLBIF version 0.30
8. XMLBIF-EVIDENCE 0.50

Author: André Hideaki Saheki


1. Introduction

Loading and saving data in JavaBayes

Data can be loaded and saved locally when JavaBayes runs as an application. Note that applets cannot load or save local data, because browsers forbid it. Both applications and applets can read Bayesian networks over the Internet, which opens the possibility of using JavaBayes to help process and organize the huge amounts of data and knowledge available on the Internet. This section describes in detail the formats that JavaBayes can manipulate. If you are not reading or writing files for JavaBayes, you can skip this section entirely.

All the formats

There are six different formats for storing networks. JavaBayes can read five of them and write two.

The Bayesian Interchange Format version 0.10 (BIF 0.10) is a simple format that has been successfully used to represent a variety of networks. However, BIF 0.10 had certain problems and was replaced by BIF version 0.15. Support for BIF 0.10 has been dropped from the current version of JavaBayes.

BIF 0.20 is an improvement over BIF 0.15, and JavaBayes no longer saves files in BIF 0.10 or 0.15. In the Options menu you can choose between XMLBIF 0.50 and BIF 0.20, and whether probabilities are saved as a single table or as individual entries.

XMLBIF 0.30 is an experimental format based on the XML 1.0 specification. It has been superseded by the XMLBIF 0.40 and 0.50 formats. The best way to understand XMLBIF is to read about BIF 0.20, then read something about XML, and then read the description of XMLBIF 0.50.

Any file extension can be used, but the extension bif is recommended for BIF 0.20, and the extension xml is used for XMLBIF 0.50.

In summary, JavaBayes reads BIF 0.15, BIF 0.20, XMLBIF 0.30, XMLBIF 0.40 and XMLBIF 0.50, and writes BIF 0.20 and XMLBIF 0.50. The preferred and most flexible format is XMLBIF 0.50.

Note that no format supports noisy functions, since JavaBayes does not support them yet. The BIF formats also use the general concept of a property; implementations of the BIF format can define specific properties. JavaBayes handles some properties, such as observed, explanation and credal-set, which are explained later on.

Representing probability values

It is important to understand how the JavaBayes formats handle the specification of probability values. All distributions are specified as arrays of real numbers, and the meaning of the numbers depends on the definition of the distribution. Note that the same representation is used in internal arrays to store and manipulate probability values.


The dog-problem example (the dog-problem.bif network listed in Section 2) is used again to show how probabilities are stored; f stands for family_out, b for bowel_problem and d for dog_out. The distribution p(f) can be specified as follows:

0.15, 0.85

Let's consider a more complicated example. The function p(d|f,b) is given by

0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70

The logic is simple: proceed as if you were filling a table whose indices vary from right to left, so that the rightmost variable changes fastest (in the example above this is just binary counting, because all variables have two values). A more complicated example is a function p(A|B,C), where A has 3 values, B has 2 values and C has 4 values. Its 24 values are listed in this order:

p(A1|B1 C1) p(A1|B1 C2) p(A1|B1 C3) p(A1|B1 C4) p(A1|B2 C1) p(A1|B2 C2) p(A1|B2 C3) p(A1|B2 C4)
p(A2|B1 C1) p(A2|B1 C2) p(A2|B1 C3) p(A2|B1 C4) p(A2|B2 C1) p(A2|B2 C2) p(A2|B2 C3) p(A2|B2 C4)
p(A3|B1 C1) p(A3|B1 C2) p(A3|B1 C3) p(A3|B1 C4) p(A3|B2 C1) p(A3|B2 C2) p(A3|B2 C3) p(A3|B2 C4)
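The counting rule can be made concrete with a short Java sketch (Java because JavaBayes itself is written in Java). The class name BifIndex is hypothetical and not part of JavaBayes; the sketch assumes that values are numbered from 0 and that the variables are passed in declaration order, e.g. A, B, C for p(A|B,C). It computes the position of one probability value with Horner's rule, treating each variable as a digit whose base is its number of values.

```java
/**
 * Hypothetical helper (not a JavaBayes API): position of a value in a
 * BIF-style probability array. The first variable is the most significant
 * digit and the last one varies fastest (indices vary right to left).
 */
final class BifIndex {

    /** cardinalities[i] = number of values of variable i; values[i] is 0-based. */
    static int flatIndex(int[] cardinalities, int[] values) {
        if (cardinalities.length != values.length) {
            throw new IllegalArgumentException("one value per variable is required");
        }
        int index = 0;
        for (int i = 0; i < cardinalities.length; i++) {
            if (values[i] < 0 || values[i] >= cardinalities[i]) {
                throw new IllegalArgumentException("value out of range for variable " + i);
            }
            // Horner's rule: shift the digits seen so far, then add the new one.
            index = index * cardinalities[i] + values[i];
        }
        return index;
    }

    /** Total number of entries: the product of all cardinalities. */
    static int size(int[] cardinalities) {
        int n = 1;
        for (int c : cardinalities) {
            n *= c;
        }
        return n;
    }
}
```

For instance, flatIndex(new int[] {3, 2, 4}, new int[] {2, 0, 1}) locates p(A3|B1 C2) at position 17 of the 24 values, the second entry of the A3 row in the listing above.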

IMPORTANT: There is some redundancy in these values, because every probability distribution must add up to one. Currently the BayesianNetworks package does not attempt to fill in blanks or ensure consistency; the user must provide the data in the correct format (the correct number of values, adding up to one, etc.).

IMPORTANT: BIF 0.20 and XMLBIF 0.50 save probabilities in different orders. Using the same idea of filling a table (one row per configuration of the conditioning variables, one column per value of the first variable), the BIF format reads one column after another, while XMLBIF reads one row at a time. For example, the function p(d|f,b) is written as

0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70

in the BIF 0.20 format and as

0.99, 0.01, 0.90, 0.10, 0.97, 0.03, 0.30, 0.70

in the XMLBIF 0.50 format.
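Because the two orders are transposes of each other, converting between them only needs the number of values of the first variable. The following Java sketch (hypothetical class TableOrder, not a JavaBayes API) converts in both directions and also checks the sum-to-one condition that the BayesianNetworks package leaves to the user; the tolerance parameter is an assumption meant to absorb floating-point rounding.

```java
/** Hypothetical helper: converts CPT arrays between BIF 0.20 and XMLBIF 0.50 order. */
final class TableOrder {

    /** BIF order (first variable varies slowest) to XMLBIF order (it varies fastest). */
    static double[] bifToXmlbif(double[] bif, int childValues) {
        int parentConfigs = parentConfigs(bif.length, childValues);
        double[] xml = new double[bif.length];
        for (int k = 0; k < childValues; k++) {
            for (int j = 0; j < parentConfigs; j++) {
                xml[j * childValues + k] = bif[k * parentConfigs + j];
            }
        }
        return xml;
    }

    /** Inverse conversion: XMLBIF order back to BIF order. */
    static double[] xmlbifToBif(double[] xml, int childValues) {
        int parentConfigs = parentConfigs(xml.length, childValues);
        double[] bif = new double[xml.length];
        for (int k = 0; k < childValues; k++) {
            for (int j = 0; j < parentConfigs; j++) {
                bif[k * parentConfigs + j] = xml[j * childValues + k];
            }
        }
        return bif;
    }

    /** True if each conditional distribution (one per parent configuration) sums to 1 within tol. */
    static boolean distributionsSumToOne(double[] bif, int childValues, double tol) {
        int parentConfigs = parentConfigs(bif.length, childValues);
        for (int j = 0; j < parentConfigs; j++) {
            double sum = 0.0;
            for (int k = 0; k < childValues; k++) {
                sum += bif[k * parentConfigs + j];
            }
            if (Math.abs(sum - 1.0) > tol) {
                return false;
            }
        }
        return true;
    }

    private static int parentConfigs(int length, int childValues) {
        if (childValues <= 0 || length % childValues != 0) {
            throw new IllegalArgumentException("length is not a multiple of the number of values");
        }
        return length / childValues;
    }
}
```

Applied to the BIF 0.20 array for p(d|f,b) with childValues = 2, bifToXmlbif returns 0.99, 0.01, 0.90, 0.10, 0.97, 0.03, 0.30, 0.70, the XMLBIF 0.50 listing above.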

2. BIF version 0.20

The BIF formats use a block syntax, similar to the way C or Java code is written. White space, tabs and newlines are ignored, and C/C++-style comments are adopted. The comma character (",") is also ignored when it occurs between tokens.
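These lexical rules can be sketched as a small Java tokenizer. It is a simplified illustration, not the JavaCC grammar that JavaBayes uses: the class name BifLexer and the set of single-character punctuation tokens are assumptions, quoted words are returned without their quotes (so a quoted word cannot be told apart from a keyword), backslash escapes inside words follow the Character formatting rules of this section, and the rule that arbitrary text may appear between blocks is not handled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified BIF 0.20 lexer: white space and commas separate
 * tokens, line and block comments are skipped, and double-quoted words may
 * contain backslash escapes. Quotes are stripped from words.
 */
final class BifLexer {

    private static final String PUNCTUATION = "{}()[];|=";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c) || c == ',') {
                i++;                                      // separators carry no meaning
            } else if (text.startsWith("//", i)) {
                int end = text.indexOf('\n', i);
                i = (end < 0) ? n : end + 1;              // skip to the end of the line
            } else if (text.startsWith("/*", i)) {
                int end = text.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                i = end + 2;
            } else if (c == '"') {
                StringBuilder word = new StringBuilder();
                i++;                                      // skip the opening quote
                while (true) {
                    if (i >= n) throw new IllegalArgumentException("unterminated word");
                    char d = text.charAt(i++);
                    if (d == '"') break;                  // closing quote
                    if (d == '\\') {                      // keep the escaped character
                        if (i >= n) throw new IllegalArgumentException("dangling backslash");
                        d = text.charAt(i++);
                    }
                    word.append(d);
                }
                tokens.add(word.toString());
            } else if (PUNCTUATION.indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {                                      // keyword or number
                int start = i;
                while (i < n && !endsBareToken(text, i)) i++;
                tokens.add(text.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean endsBareToken(String text, int i) {
        char d = text.charAt(i);
        return Character.isWhitespace(d) || d == ',' || d == '"'
                || PUNCTUATION.indexOf(d) >= 0
                || text.startsWith("//", i) || text.startsWith("/*", i);
    }
}
```

For the input probability ( "Leg" | "Arm" ) { table 0.1, 0.9 0.9 0.1; } the sketch yields the tokens probability, (, Leg, |, Arm, ), {, table, 0.1, 0.9, 0.9, 0.1, ; and }; the comma disappears, as the rule requires.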


The basic unit of information is a block: a piece of text that starts with a keyword and ends with the end of an attribute list (explained below). Arbitrary characters are allowed between blocks. This lets the user insert arbitrarily long comments outside the blocks, and also lets user-specific blocks and commands be placed outside the standard blocks. Other than blocks, BIF 0.20 refers to three entities: words, non-negative integers and non-negative reals. A word is a contiguous sequence of ASCII characters enclosed by double quotes. A non-negative integer is a sequence of numeric characters. A non-negative real is a sequence of numeric characters containing a decimal point, an exponent, or both.

Blocks

A block is a unit of information. The general format of a block is:

block-type block-name {
    attribute-name attribute-value;
    attribute-name attribute-value;
    attribute-name attribute-value;
}

with as many attributes as necessary. The closing semicolon is mandatory after each attribute. There are three kinds of blocks: network, variable and probability blocks.

A network block defines the name of the network and lists its properties. Example:

network "Robot-Planning" {
    property "version 1.1";
    property "author Nobody";
}

Variable blocks define the variables in a network. Example:

variable "Leg" {
    type discrete[2] { "long", "short" };
    property "temporary yes";
}

Probability blocks specify the (conditional) probability tables (CPTs) for these variables, and hence the topology of the network. The variables of the probability distribution are listed right after the keyword probability. Example:

probability ( "Leg" | "Arm" ) {
    table 0.1 0.9 0.9 0.1;
}

The blocks must be placed in the following order:

1. A network declaration block (exactly one, and it must come first).
2. A series of variable declaration blocks and probability definition blocks, possibly intermixed.

Attributes

Several attributes are defined: property, type, table, default and entry (the entry attribute is not associated with any keyword).

The property attribute can appear in all types of blocks. A property is just a string of arbitrary text associated with a block. Examples of properties:

property "size 12";
property "name Trial number ten";

Any text is valid in the string following the keyword property, and it must be enclosed by double quotes. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.

The following attributes are specific to probability blocks (they are discussed later, under Probability Blocks):

- table lists a sequence of non-negative real numbers.
- default lists a sequence of non-negative real numbers.
- entry is not associated with any keyword.

The JavaBayes properties

JavaBayes uses a number of properties to load and save information about Bayesian networks. In the BIF 0.20 format, these properties are used only for informative purposes. The syntax for properties is

property "<text>";

Network, variable and probability blocks may have any number of properties.

Variable Blocks

A variable block is identified by the keyword "variable" followed by the name of the variable. The type attribute is specific to variable blocks; it lists the values of a discrete variable:

type discrete[ number-of-values ] { list-of-values };

The number-of-values token is a non-negative integer that indicates how many different values the variable may assume (the size of list-of-values). The list-of-values is a sequence of words, each one the name of a value of the variable.

Position is another attribute available only for variable blocks. It denotes the position of the node on the screen, in pixels, with coordinates starting from the top-left corner of the screen.

Another attribute for variables is mode. In BIF 0.20 it may take one of four values: nature, decision, utility or explanation. Below is an example showing the position and mode attributes:

variable "node1" { //2 categories
    type discrete[2] { "true" "false" };
    position = (69, 99);
    mode nature;
}

The only values that are meaningful for JavaBayes are nature and explanation. Nature denotes a normal Bayesian network node, while explanation indicates that the variable is explanatory. An explanatory variable is one for which you would like to know which value would produce the highest probability or expectation. It is not necessarily true that you can act on the variable and change it at will; you just want to know which value would be best in the face of evidence. If you ask JavaBayes to produce the "best" configuration for the explanation variables, JavaBayes processes only the variables marked with the explanation mode.

Probability Blocks

Probability blocks define the actual network topology and the conditional probability tables. An example of a standard probability block is:

probability ( "GasGauge" | "Gas", "BatteryPower" ) {
    ("yes" "high") 0.999 0.001;
    ("yes" "low") 0.850 0.150;
    ("yes" "medium") 0.000 1.000;
    ("no" "high") 0.000 1.000;
    ("no" "low") 0.000 1.000;
    ("no" "medium") 0.000 1.000;
}

As explained before, the comma is ignored between tokens, so it does not affect the list of variables given after the keyword probability. The variables, however, must be enclosed in parentheses. The following syntax would also be accepted for each line:

("yes", "high") 0.999 0.001;


The example above uses the entry attribute, which differs from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, followed by a list of values for all the conditioning variables. After the closing parenthesis comes the list of probability values for the first variable (these numbers should add up to 1, but this is not enforced). The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.

In addition to the entry attribute, BIF 0.20 supports the concept of a default entry. The CPT above could therefore be specified equivalently as:

probability ( "GasGauge" | "Gas", "BatteryPower" ) {
    default 0.000 1.000;
    ("yes" "high") 0.999 0.001;
    ("yes" "low") 0.850 0.150;
    ("no" "medium") 0.000 1.000;
}

Note that each number is a separate token, so "," can be used between numbers.

Another way to define a probability distribution is the table attribute. Its body is a sequence of non-negative real numbers in the counting order of the declared variables (if all variables were binary, this would be binary counting with the least significant digit on the right). For the example above, we could simply write:

probability ( "GasGauge" | "Gas", "BatteryPower" ) {
    table 0.999 0.850 0.0 0.0 0.0 0.0 0.001 0.15 1.0 1.0 1.0 1.0;
}

Some subtle rules regulate these declarations:

- If multiple default declarations exist, only the last one is valid.
- If multiple table declarations exist, only the last one is valid.
- A table can contain more elements than necessary to specify a distribution; the excess elements are discarded.
- A table can contain fewer elements than necessary to specify a distribution; it is then padded with zeros.
- Specified entries override conflicting default and table declarations.
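These declaration rules can be summarized in a short Java sketch that turns the last table, the last default and the explicit entries into a single array in BIF order. CptBuilder and its Entry class are hypothetical names; entries are given as 0-based parent value indices rather than names. The text does not say how a conflict between a default and a table is resolved, so the sketch assumes the table takes precedence and the default is used only when no table is present.

```java
import java.util.List;

/** Hypothetical helper: resolves table, default and entry attributes into a BIF-order CPT. */
final class CptBuilder {

    /** One entry attribute: 0-based parent values and the distribution of the first variable. */
    static final class Entry {
        final int[] parentValues;
        final double[] probabilities;

        Entry(int[] parentValues, double[] probabilities) {
            this.parentValues = parentValues;
            this.probabilities = probabilities;
        }
    }

    /** table and defaultDist may be null; pass only the LAST table and default found. */
    static double[] build(int childValues, int[] parentCards,
                          double[] table, double[] defaultDist, List<Entry> entries) {
        int parentConfigs = 1;
        for (int c : parentCards) {
            parentConfigs *= c;
        }
        double[] cpt = new double[childValues * parentConfigs];    // starts as zeros

        if (table != null) {
            // Excess values are discarded; missing values stay zero (padding).
            System.arraycopy(table, 0, cpt, 0, Math.min(table.length, cpt.length));
        } else if (defaultDist != null) {
            // Assumption: the default applies only when no table is given.
            for (int j = 0; j < parentConfigs; j++) {
                for (int k = 0; k < childValues; k++) {
                    cpt[k * parentConfigs + j] = defaultDist[k];
                }
            }
        }
        // Explicit entries override whatever the table or default produced.
        for (Entry e : entries) {
            int j = 0;
            for (int i = 0; i < parentCards.length; i++) {
                j = j * parentCards[i] + e.parentValues[i];          // last parent varies fastest
            }
            for (int k = 0; k < childValues; k++) {
                cpt[k * parentConfigs + j] = e.probabilities[k];
            }
        }
        return cpt;
    }
}
```

With childValues = 2, parent cardinalities {2, 3} (Gas, BatteryPower), the default {0.0, 1.0} and entries for ("yes" "high") and ("yes" "low"), build returns the same 12 numbers as the table version of the GasGauge block.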

Character formatting

All network, variable and category names must be enclosed in double quotes. If a name contains a double quote or a backslash, that character must be escaped with a backslash. All characters are accepted except newlines, tabs and backspaces. For JavaBayes 0.4, the character encoding must be ASCII.
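When writing BIF 0.20, names have to be quoted and escaped according to these rules. A minimal Java sketch, using the hypothetical helper BifNames.quote: it treats a carriage return like a newline and rejects any character above 127 to honor the ASCII requirement of JavaBayes 0.4; both choices are assumptions.

```java
/** Hypothetical helper: quotes and escapes a name for a BIF 0.20 file. */
final class BifNames {

    /**
     * Double quotes and backslashes are escaped with a backslash; newlines,
     * tabs, backspaces and non-ASCII characters are rejected.
     */
    static String quote(String name) {
        StringBuilder out = new StringBuilder(name.length() + 2).append('"');
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '\n' || c == '\r' || c == '\t' || c == '\b' || c > 127) {
                throw new IllegalArgumentException("character not allowed in a BIF name: " + (int) c);
            }
            if (c == '"' || c == '\\') {
                out.append('\\');                 // escape with a backslash
            }
            out.append(c);
        }
        return out.append('"').toString();
    }
}
```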

Implementation

The implementation of BIF 0.20 is based on a set of rules written and compiled with JavaCC.

Examples

Here are some of the available examples:

- dog-problem.bif, a very simple network based on the discussion in Charniak, E., Bayesian Networks without Tears, AI Magazine, 1991.
- elimbel2.bif, a simple network based on the second example in the Elimbel system.
- car-starts.bif, a somewhat large network contributed by Sreekanth Nagarajan, based on the automobile belief network that David Heckerman and Jack Breese presented in the March 1995 issue of Communications of the ACM.
- alarm.bif, the famous Alarm network.

Here is the dog-problem.bif network:

// Bayesian network in BIF format
// File generated by JavaBayes (http://www.cs.cmu.edu/~javabayes)
// Fri Nov 28 14:00:56 GMT-03:00 2003

network "Dog_Problem" { //5 nodes
}

variable "dog_out" { //2 categories
    type discrete[2] { "true" "false" };
    position = (155, 165);
    mode nature;
}

probability ( "dog_out" | "bowel_problem" "family_out" ) { //3 variable(s) and 8 values
    table
        0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7;
}

variable "bowel_problem" { //2 categories
    type discrete[2] { "true" "false" };
    position = (190, 69);
    mode nature;
}

probability ( "bowel_problem" ) { //1 variable(s) and 2 values
    table
        0.01 0.99;
}

variable "family_out" { //2 categories
    type discrete[2] { "true" "false" };
    position = (112, 69);
    mode nature;
}

probability ( "family_out" ) { //1 variable(s) and 2 values
    table
        0.15 0.85;
}

variable "hear_bark" { //2 categories
    type discrete[2] { "true" "false" };
    position = (154, 241);
    mode nature;
}

probability ( "hear_bark" | "dog_out" ) { //2 variable(s) and 4 values
    table
        0.7 0.01 0.3 0.99;
}

variable "light_on" { //2 categories
    type discrete[2] { "true" "false" };
    position = (73, 165);
    mode nature;
}

probability ( "light_on" | "family_out" ) { //2 variable(s) and 4 values
    table
        0.6 0.05 0.4 0.95;
}

3. BIF version 0.15

BIF version 0.15 is very similar to 0.20. The only differences are that there is no "mode" attribute, the "position" attribute is written differently, and a variable is declared explanatory with an optional "explanation" property. The position of the node is written inside the variable block as:

property "position = (x coordinate, y coordinate)";

Below is a sample variable block with the "explanation" property:

variable "hear-bark" { //2 values
    type discrete[2] { "true" "false" };
    property "explanation";
    property "position = (296, 268)";
}
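Converting a BIF 0.15 file to BIF 0.20 requires pulling the coordinates out of the position property. The Java sketch below does this with a regular expression; the class name Bif015Position, the tolerance of extra white space and the restriction to non-negative integer coordinates are assumptions, not part of the format description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads a BIF 0.15 position property and writes the BIF 0.20 form. */
final class Bif015Position {

    // Matches property text such as: position = (296, 268)
    private static final Pattern POSITION =
            Pattern.compile("\\s*position\\s*=\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)\\s*");

    /** Returns {x, y}, or null if the property text is not a position. */
    static int[] parse(String propertyText) {
        Matcher m = POSITION.matcher(propertyText);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Formats the coordinates as a BIF 0.20 position attribute. */
    static String toBif020(int[] xy) {
        return "position = (" + xy[0] + ", " + xy[1] + ");";
    }
}
```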

4. BIF version 0.10

BIF version 0.10 differs from 0.15 in that it does not use double quotes around words. This limits the characters allowed in network, variable and category names to numbers, letters, underscore (_) and dash (-). An additional restriction is that the first character must be a letter.
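The BIF 0.10 naming rule translates directly into a regular expression. This Java sketch (hypothetical class Bif010Names) assumes that "letters" means the ASCII letters A to Z and a to z; it can be used, for example, to check whether a name can be written without quotes.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks the BIF 0.10 naming rule. */
final class Bif010Names {

    // A letter first, then letters, digits, underscores or dashes.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_-]*");

    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }
}
```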

5. XMLBIF version 0.50

The XMLBIF format provides a different perspective on the storage and manipulation of Bayesian networks. Instead of focusing on a readable and simplified description of Bayesian networks, XMLBIF emphasizes ease of distribution over wide-area networks. The XMLBIF format is defined through XML, a dialect of SGML used to specify formats. The advantage of XML is its industry-wide support, and many software developers plan to introduce parsers, search engines and browsers for XML. The power of XML is that it is a standard language for defining formats, and XMLBIF uses XML to reduce to a minimum the burden of distributing graphical models to a large audience.

The XMLBIF format is actually quite similar to BIF 0.15, but it is stated in an XML-compliant manner. Note the similarity of XMLBIF to HTML; this is because both HTML and XML are dialects of SGML. White space, tabs and newlines are ignored outside tags. The XML style of comments and declarations is used to detect text that should be ignored: any character between <! and > is ignored. Note that XML comments must be enclosed by <!-- and -->. The XMLBIF format is defined by a set of XML-compliant tags. Other than XML tags, XMLBIF 0.50 refers to three entities: words, non-negative integers and non-negative reals.


A word is a contiguous sequence of characters whose encoding is defined by the encoding attribute of the XML file. A non-negative integer is a sequence of numeric characters. A non-negative real is a sequence of numeric characters containing a decimal point, an exponent, or both. Note that an XML file starts with the declaration <?xml version="1.0"?>, indicating the XML version. Other attributes can be given in this declaration; for example, <?xml version="1.0" encoding="ISO-8859-1"?> specifies the file encoding. This initial declaration is followed by any XML definitions and statements that define the DTD for the document (the DTD is always optional in XML).

Specification

The DTD for XMLBIF 0.50 is:

<!DOCTYPE BIF [
    <!ELEMENT BIF ( NETWORK )*>
    <!ATTLIST BIF VERSION CDATA #REQUIRED>
    <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
    <!ATTLIST NETWORK TYPE (discrete|continuous|hybrid) "discrete">
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
    <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation|gaussian) "nature">
    <!ELEMENT OUTCOME (#PCDATA)>
    <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
    <!ELEMENT FOR (#PCDATA)>
    <!ELEMENT GIVEN (#PCDATA)>
    <!ELEMENT TABLE (#PCDATA)>
    <!ELEMENT ENTRY (CATEGORY, LIST , MEAN , VARIANCE , REGRESSORS)*>
    <!ELEMENT CATEGORY (#PCDATA)>
    <!ELEMENT LIST (#PCDATA)>
    <!ELEMENT MEAN (#PCDATA)>
    <!ELEMENT VARIANCE (#PCDATA)>
    <!ELEMENT REGRESSORS (#PCDATA)>
    <!ELEMENT PROPERTY (#PCDATA)>
    <!ELEMENT POSITION EMPTY>
    <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>

The root element of an XMLBIF 0.5 file is <BIF>, so the last tag is the closing </BIF>. All the information about the model is contained between these tags. There are three basic units of information: networks, variables and definitions. A network is defined by its name, followed by an optional list of properties, followed by a list of variables and probability densities. The NETWORK tag has an optional TYPE attribute, which states whether the network is discrete (only discrete variables), continuous (only continuous Gaussian variables) or hybrid (both types of variables). If the attribute is absent, the network is assumed to be discrete. For example, a network may be defined as:

<BIF VERSION="0.5">
<NETWORK TYPE="discrete">
    <NAME>Dog-Problem</NAME>
    <PROPERTY>date Sunday, 19 July, 1998</PROPERTY>
    <PROPERTY>author John</PROPERTY>
    <!-- variables and probabilities go here -->
</NETWORK>
</BIF>

The VERSION attribute in the BIF tag is mandatory. Variables are defined by their names, types and properties:

<VARIABLE TYPE="nature">
    <NAME>light-on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="30" Y="30"/>
    <PROPERTY>any text can be used here</PROPERTY>
</VARIABLE>

Conditional probability densities and distributions can be specified in various ways inside the DEFINITION tag. One example for a discrete variable is:

<DEFINITION>
    <FOR>hear-bark</FOR>
    <GIVEN>dog-out</GIVEN>
    <TABLE>0.7 0.3 0.01 0.99</TABLE>
</DEFINITION>

There is no mandatory order of variable and definition blocks.

A property is just a string of arbitrary text associated with a block. Examples of properties:

<PROPERTY>size 12</PROPERTY>
<PROPERTY>comment Trial number ten</PROPERTY>

Any text is valid between the PROPERTY opening and closing tags. The idea is to store information that is specific to a particular system or network in the properties. Any number of PROPERTY tags can appear in a block.

Variables

A variable is defined by a NAME tag and, if it is discrete, by its possible values in OUTCOME tags. The TYPE attribute of the variable is "nature" for discrete variables, "explanation" for explanatory discrete variables and "gaussian" for Gaussian variables. Only discrete variables have OUTCOME tags. TYPE may also be "decision" or "utility", although JavaBayes does not handle these. POSITION is a tag without text, with two attributes: the X and Y coordinates, measured from the top-left point of the drawing area of the network.

The "explanation" TYPE marks a variable whose value is to be estimated. For example, to estimate light-on, declare it as an explanation variable. The meaning of an explanatory variable is that we want to know which of its values would produce the highest probability or expectation. It is not necessarily true that the variable can be acted upon and changed at will; we only want to know which value would be best in the face of evidence. If JavaBayes is asked to produce the "best" configuration for the explanation variables, it processes only the variables whose TYPE is explanation.

There are also properties related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, support for it is minimal.

Definition


The structure of the DEFINITION tag depends on whether the variable is discrete or Gaussian.

Definition of discrete variables

The TABLE tag is specific to the DEFINITION block of discrete variables (note that a definition can be a probability distribution, a set of decision values, a set of utility values or the moments of a Gaussian variable, depending on the TYPE attribute of the variable it refers to). DEFINITION blocks define the actual network topology by specifying conditional probability tables and distributions.

An example of a standard definition block is:

<DEFINITION>
    <FOR>GasGauge</FOR>
    <GIVEN>BatteryPower</GIVEN>
    <TABLE>1.0 0.0 0.2 0.8</TABLE>
</DEFINITION>

for a variable GasGauge that is defined with TYPE equal to "nature" and has the variable BatteryPower as its only parent. The body of the TABLE tag is a sequence of non-negative real numbers in XMLBIF order: one row per parent configuration, with the FOR variable varying fastest within each row and the last GIVEN parent varying fastest across rows (as explained in the Introduction, this is the transpose of the BIF order). If multiple TABLE declarations exist, only the last one is valid. The same definition can be written as separate entries, one for each combination of parents:

<DEFINITION>
    <FOR>GasGauge</FOR>
    <GIVEN>BatteryPower</GIVEN>
    <ENTRY>
        <CATEGORY>true</CATEGORY>
        <LIST>1.0 0.0</LIST>
    </ENTRY>
    <ENTRY>
        <CATEGORY>false</CATEGORY>
        <LIST>0.2 0.8</LIST>
    </ENTRY>
</DEFINITION>

In this form, each ENTRY corresponds to one combination of parent values. It contains one CATEGORY tag for each parent and a single LIST tag holding the probabilities of the variable for that combination of parents. Variables of TYPE "explanation" have exactly the same definition block as "nature" variables.
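Both forms carry the same numbers, so a reader can normalize ENTRY elements into TABLE order. In this Java sketch (hypothetical class XmlbifEntries), the CATEGORY values are assumed to follow the order of the GIVEN tags, and the TABLE layout follows the XMLBIF row order, where the last GIVEN parent varies fastest across rows; parent configurations without an ENTRY are left as zeros.

```java
import java.util.List;

/** Hypothetical helper: flattens XMLBIF ENTRY elements into TABLE order. */
final class XmlbifEntries {

    /** One ENTRY: CATEGORY values (one per GIVEN parent, in GIVEN order) and its LIST. */
    static final class Entry {
        final String[] categories;
        final double[] list;

        Entry(String[] categories, double[] list) {
            this.categories = categories;
            this.list = list;
        }
    }

    /** parentOutcomes.get(i) holds the OUTCOME names of the i-th GIVEN parent. */
    static double[] toTable(int childValues, List<List<String>> parentOutcomes, List<Entry> entries) {
        int parentConfigs = 1;
        for (List<String> outcomes : parentOutcomes) {
            parentConfigs *= outcomes.size();
        }
        double[] table = new double[parentConfigs * childValues];
        for (Entry e : entries) {
            int row = 0;
            for (int i = 0; i < parentOutcomes.size(); i++) {
                int v = parentOutcomes.get(i).indexOf(e.categories[i]);
                if (v < 0) {
                    throw new IllegalArgumentException("unknown category " + e.categories[i]);
                }
                row = row * parentOutcomes.get(i).size() + v;     // last parent varies fastest
            }
            // Each row holds the LIST for one parent configuration.
            System.arraycopy(e.list, 0, table, row * childValues, childValues);
        }
        return table;
    }
}
```

For the GasGauge example, the two ENTRY elements produce 1.0 0.0 0.2 0.8 regardless of the order in which they appear, matching the TABLE form.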

Definition of Gaussian variables without discrete parents

Gaussian variables without discrete parents also have their distribution specified in DEFINITION blocks:

<DEFINITION>
    <FOR>gaussian_node1</FOR>
    <GIVEN>gaussian_parent1</GIVEN>
    <ENTRY>
        <MEAN>1.0</MEAN>
        <VARIANCE>1.0</VARIANCE>
        <REGRESSORS>4.0</REGRESSORS>
    </ENTRY>
</DEFINITION>

Above is a sample definition for a variable gaussian_node1 with a parent gaussian_parent1. Parent-child relationships are presented as for discrete variables, with the variable name in the FOR tag and its parents in GIVEN tags. There is only one ENTRY tag. Inside the ENTRY there are mandatory MEAN and VARIANCE tags, each containing a double value, with the restriction that VARIANCE must be strictly positive. A REGRESSORS tag must be present if the node has parents. The REGRESSORS tag is a list of doubles, one per parent, each giving the coefficient of the linear relationship between the variable and that parent. For example, for a variable with 5 Gaussian parents, the REGRESSORS tag must contain 5 double values.
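The text does not spell out the distribution these tags define. Under the usual linear-Gaussian reading, which is an assumption here, the variable is normally distributed with mean MEAN + sum of REGRESSORS[i] * parent[i] and variance VARIANCE, with the REGRESSORS values taken in the order of the GIVEN tags. A Java sketch of that reading (hypothetical class LinearGaussian):

```java
/** Hypothetical helper: linear-Gaussian interpretation of MEAN, VARIANCE and REGRESSORS. */
final class LinearGaussian {

    /** Conditional mean: MEAN plus the regressor-weighted sum of the Gaussian parents. */
    static double conditionalMean(double mean, double[] regressors, double[] parentValues) {
        if (regressors.length != parentValues.length) {
            throw new IllegalArgumentException("one REGRESSORS value per Gaussian parent is required");
        }
        double mu = mean;
        for (int i = 0; i < regressors.length; i++) {
            mu += regressors[i] * parentValues[i];
        }
        return mu;
    }

    /** Normal density of the node at x; VARIANCE must be strictly positive. */
    static double density(double x, double mean, double variance,
                          double[] regressors, double[] parentValues) {
        if (!(variance > 0.0)) {
            throw new IllegalArgumentException("VARIANCE must be strictly positive");
        }
        double d = x - conditionalMean(mean, regressors, parentValues);
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }
}
```

For the example above, with gaussian_parent1 = 0.5, the conditional mean under this reading is 1.0 + 4.0 * 0.5 = 3.0.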

Definition of Gaussian variables with discrete parents

Gaussian variables with discrete parents differ from the above in the ENTRY block. Suppose there is a Gaussian variable with two Gaussian parents and two discrete binary (true, false) parents. Its definition block would be:

<DEFINITION>
    <FOR>gaussian_node1</FOR>
    <GIVEN>gaussian_parent1</GIVEN>
    <GIVEN>gaussian_parent2</GIVEN>
    <GIVEN>discrete_parent1</GIVEN>
    <GIVEN>discrete_parent2</GIVEN>
    <ENTRY>
        <CATEGORY>true</CATEGORY>
        <CATEGORY>false</CATEGORY>
        <MEAN>1.2</MEAN>
        <VARIANCE>0.5</VARIANCE>
        <REGRESSORS>4.0 2.0</REGRESSORS>
    </ENTRY>
    … Three more entries…
</DEFINITION>

As can be seen above, this structure combines elements of the definition of discrete variables (in the entry style) and of Gaussian variables without discrete parents. There should be one ENTRY for each combination of values of the discrete parents. Inside each ENTRY there are MEAN, VARIANCE and REGRESSORS tags; again, the REGRESSORS tag exists only if there are Gaussian parents.

The order of the GIVEN tags matters only within each group (discrete parents and Gaussian parents). This means that the following order represents the same definition as above:

<GIVEN>gaussian_parent1</GIVEN>
<GIVEN>discrete_parent1</GIVEN>
<GIVEN>gaussian_parent2</GIVEN>
<GIVEN>discrete_parent2</GIVEN>

However, the following order is different from the one presented earlier, because the two Gaussian parents are swapped:

<GIVEN>gaussian_parent2</GIVEN>
<GIVEN>gaussian_parent1</GIVEN>
<GIVEN>discrete_parent1</GIVEN>
<GIVEN>discrete_parent2</GIVEN>
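The ordering rule amounts to comparing the Gaussian and the discrete GIVEN sequences separately. A Java sketch (hypothetical class GivenOrder); which parents are Gaussian must be supplied by the caller, for example from the TYPE attributes of their VARIABLE tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: compares GIVEN orders group by group. */
final class GivenOrder {

    /** Splits GIVEN names into [gaussian, discrete] lists, keeping their relative order. */
    static List<List<String>> groups(List<String> given, Set<String> gaussianParents) {
        List<String> gaussian = new ArrayList<String>();
        List<String> discrete = new ArrayList<String>();
        for (String parent : given) {
            if (gaussianParents.contains(parent)) {
                gaussian.add(parent);
            } else {
                discrete.add(parent);
            }
        }
        List<List<String>> result = new ArrayList<List<String>>();
        result.add(gaussian);
        result.add(discrete);
        return result;
    }

    /** Two GIVEN sequences mean the same thing if both groups appear in the same order. */
    static boolean sameMeaning(List<String> a, List<String> b, Set<String> gaussianParents) {
        return groups(a, gaussianParents).equals(groups(b, gaussianParents));
    }
}
```

With the three orders listed above, sameMeaning reports the interleaved order as equivalent to the original one and the order with the Gaussian parents swapped as different.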


Implementation

XMLBIF is based on the XML 1.0 specification. The loading and saving functions in JavaBayes 0.4 use a validating SAX2 parser. Unless you are using a JRE (Java Runtime Environment) 1.4 or higher, you need to install a SAX2 parser to use the XML formats with JavaBayes. A reference implementation is available at http://www.saxproject.org
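A minimal sketch of reading XMLBIF through the standard SAX2 API (javax.xml.parsers and org.xml.sax, bundled with the JRE since 1.4; the multi-catch syntax used here needs Java 7 or later). It is not the JavaBayes loader: the class name XmlbifVariables is hypothetical, and only VARIABLE names and OUTCOME values are collected. Note that with setValidating(true), DefaultHandler silently ignores recoverable validity errors unless error() is overridden.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader: VARIABLE names and their OUTCOME values, in file order. */
final class XmlbifVariables {

    static Map<String, List<String>> read(String xml, boolean validating) {
        final Map<String, List<String>> variables = new LinkedHashMap<String, List<String>>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inVariable = false;
            private String currentName = null;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
                if (qName.equals("VARIABLE")) {
                    inVariable = true;
                    currentName = null;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);   // text may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                text.setLength(0);
                if (qName.equals("VARIABLE")) {
                    inVariable = false;
                } else if (inVariable && qName.equals("NAME")) {
                    currentName = value;
                    variables.put(currentName, new ArrayList<String>());
                } else if (inVariable && currentName != null && qName.equals("OUTCOME")) {
                    variables.get(currentName).add(value);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);    // validate against the DOCTYPE when true
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read XMLBIF: " + e.getMessage(), e);
        }
        return variables;
    }
}
```

Applied to the dog-problem.xml example with validating set to false, the map would list dog_out, bowel_problem, family_out, hear_bark and light_on, each with the outcomes true and false.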

Examples

Here are some of the available examples:

- dog-problem.xml, a very simple network based on the discussion in Charniak, E., Bayesian Networks without Tears, AI Magazine, 1991.

Here is the dog-problem.xml network:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
  Bayesian network in XMLBIF v0.5 (BayesNet Interchange Format)
  Produced by JavaBayes (http://www.cs.cmu.edu/~javabayes/
-->
<!-- DTD for the XMLBIF 0.5 format -->
<!DOCTYPE BIF [
  <!ELEMENT BIF ( NETWORK )*>
    <!ATTLIST BIF VERSION CDATA #REQUIRED>
  <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
    <!ATTLIST NETWORK TYPE (discrete|continuous|hybrid) "discrete">
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
    <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation|gaussian) "nature">
  <!ELEMENT OUTCOME (#PCDATA)>
  <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
  <!ELEMENT FOR (#PCDATA)>
  <!ELEMENT GIVEN (#PCDATA)>
  <!ELEMENT TABLE (#PCDATA)>
  <!ELEMENT ENTRY (CATEGORY, LIST , MEAN , VARIANCE , REGRESSORS)*>
  <!ELEMENT CATEGORY (#PCDATA)>
  <!ELEMENT LIST (#PCDATA)>
  <!ELEMENT MEAN (#PCDATA)>
  <!ELEMENT VARIANCE (#PCDATA)>
  <!ELEMENT REGRESSORS (#PCDATA)>
  <!ELEMENT PROPERTY (#PCDATA)>
  <!ELEMENT POSITION EMPTY>
    <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>

<BIF VERSION="0.5">
<NETWORK>
  <NAME>Dog_Problem</NAME>

  <VARIABLE TYPE="nature">
    <NAME>dog_out</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="155" Y="165"/>
  </VARIABLE>
  <DEFINITION>
    <FOR>dog_out</FOR>
    <GIVEN>bowel_problem</GIVEN>
    <GIVEN>family_out</GIVEN>
    <TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7</TABLE>
  </DEFINITION>

  <VARIABLE TYPE="nature">
    <NAME>bowel_problem</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="190" Y="69"/>
  </VARIABLE>
  <DEFINITION>
    <FOR>bowel_problem</FOR>
    <TABLE>0.01 0.99</TABLE>
  </DEFINITION>

  <VARIABLE TYPE="nature">
    <NAME>family_out</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="112" Y="69"/>
  </VARIABLE>
  <DEFINITION>
    <FOR>family_out</FOR>
    <TABLE>0.15 0.85</TABLE>
  </DEFINITION>

  <VARIABLE TYPE="nature">
    <NAME>hear_bark</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="154" Y="241"/>
  </VARIABLE>
  <DEFINITION>
    <FOR>hear_bark</FOR>
    <GIVEN>dog_out</GIVEN>
    <TABLE>0.7 0.3 0.01 0.99</TABLE>
  </DEFINITION>

  <VARIABLE TYPE="nature">
    <NAME>light_on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="73" Y="165"/>
  </VARIABLE>
  <DEFINITION>
    <FOR>light_on</FOR>
    <GIVEN>family_out</GIVEN>
    <TABLE>0.6 0.4 0.05 0.95</TABLE>
  </DEFINITION>
</NETWORK>
</BIF>
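The flat TABLE strings in the dog-problem.xml listing are easier to read once they are indexed. The Python sketch below parses a discrete XMLBIF 0.5 network with the standard `xml.etree.ElementTree` module and looks up one conditional probability. The index order is an assumption inferred from the dog_out example rather than stated in this document: the GIVEN variables vary in the order listed (the last GIVEN faster than earlier ones), and the FOR variable varies fastest of all, which is why each consecutive pair of numbers sums to 1. The helpers `load_network` and `probability` are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_network(xml_text):
    """Return (outcomes, cpts) for a discrete XMLBIF network.

    outcomes: variable name -> list of outcome labels, in file order.
    cpts: variable name -> (list of GIVEN names, list of floats from TABLE).
    """
    network = ET.fromstring(xml_text).find("NETWORK")
    outcomes = {
        var.findtext("NAME"): [o.text for o in var.findall("OUTCOME")]
        for var in network.findall("VARIABLE")
    }
    cpts = {}
    for definition in network.findall("DEFINITION"):
        target = definition.findtext("FOR")
        parents = [g.text for g in definition.findall("GIVEN")]
        table = [float(x) for x in definition.findtext("TABLE").split()]
        cpts[target] = (parents, table)
    return outcomes, cpts


def probability(outcomes, cpts, target, value, parent_values):
    """P(target = value | parents as given in parent_values).

    Assumed layout (inferred from dog_out): row-major order over
    (GIVEN_1, ..., GIVEN_k, FOR), so the FOR variable varies fastest.
    """
    parents, table = cpts[target]
    index = 0
    for name in parents + [target]:
        state = value if name == target else parent_values[name]
        labels = outcomes[name]
        index = index * len(labels) + labels.index(state)
    return table[index]


DOG = """<BIF VERSION="0.5"><NETWORK><NAME>Dog_Problem</NAME>
<VARIABLE TYPE="nature"><NAME>dog_out</NAME><OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME></VARIABLE>
<DEFINITION><FOR>dog_out</FOR><GIVEN>bowel_problem</GIVEN><GIVEN>family_out</GIVEN>
<TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7</TABLE></DEFINITION>
<VARIABLE TYPE="nature"><NAME>bowel_problem</NAME><OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME></VARIABLE>
<VARIABLE TYPE="nature"><NAME>family_out</NAME><OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME></VARIABLE>
</NETWORK></BIF>"""

outcomes, cpts = load_network(DOG)
p = probability(outcomes, cpts, "dog_out", "true",
                {"bowel_problem": "false", "family_out": "true"})
print(p)  # 0.9
```

Under this reading, P(dog_out = true | bowel_problem = false, family_out = true) = 0.9, which matches Charniak's description of the dog problem.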

6. XMLBIF version 0.40

XMLBIF version 0.40 is a simplified version of XMLBIF 0.50. It is similar to 0.50 in most respects, the significant difference being that it supports only discrete networks. XMLBIF 0.50 was designed to be backward and forward compatible with 0.40: discrete networks saved in XMLBIF 0.40 can be opened as XMLBIF 0.50, and discrete networks saved in XMLBIF 0.50 can be opened as XMLBIF 0.40. Gaussian and hybrid networks can only be opened with XMLBIF 0.50.

The DTD for XMLBIF 0.40 is posted below:

<!-- DTD for the XMLBIF 0.4 format -->
<!DOCTYPE BIF [
  <!ELEMENT BIF ( NETWORK )*>
    <!ATTLIST BIF VERSION CDATA #REQUIRED>
  <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
    <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation) "nature">
  <!ELEMENT OUTCOME (#PCDATA)>
  <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
  <!ELEMENT FOR (#PCDATA)>
  <!ELEMENT GIVEN (#PCDATA)>
  <!ELEMENT TABLE (#PCDATA)>
  <!ELEMENT ENTRY (CATEGORY*, LIST)>
  <!ELEMENT CATEGORY (#PCDATA)>
  <!ELEMENT LIST (#PCDATA)>
  <!ELEMENT PROPERTY (#PCDATA)>
  <!ELEMENT POSITION EMPTY>
    <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>
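Comparing the 0.50 and 0.40 DTDs, the 0.50 additions are the TYPE attribute on NETWORK, the gaussian variable type, and the MEAN, VARIANCE and REGRESSORS children of ENTRY. The Python sketch below uses only those differences to decide whether a parsed network stays within what 0.40 expects. The function name `fits_xmlbif_040` is hypothetical; following the compatibility statement above, an explicit TYPE="discrete" on NETWORK is treated as acceptable, and the check does not claim to cover everything a 0.40 reader might reject.

```python
import xml.etree.ElementTree as ET

# Elements declared in the XMLBIF 0.50 DTD but absent from the 0.40 DTD.
GAUSSIAN_ONLY_TAGS = {"MEAN", "VARIANCE", "REGRESSORS"}


def fits_xmlbif_040(xml_text):
    """Return True if the network avoids every 0.50-only construct."""
    root = ET.fromstring(xml_text)
    for network in root.iter("NETWORK"):
        # "discrete" is the 0.50 default; continuous and hybrid need 0.50.
        if network.get("TYPE", "discrete") != "discrete":
            return False
    for variable in root.iter("VARIABLE"):
        # The 0.40 DTD does not list "gaussian" as a variable type.
        if variable.get("TYPE", "nature") == "gaussian":
            return False
    return not any(el.tag in GAUSSIAN_ONLY_TAGS for el in root.iter())


print(fits_xmlbif_040('<BIF VERSION="0.5"><NETWORK TYPE="hybrid">'
                      '<NAME>n</NAME></NETWORK></BIF>'))  # False
```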

7. XMLBIF version 0.30

XMLBIF 0.30 was the first XML format used by JavaBayes. Its structure is similar in most respects to that of 0.40 and 0.50. The specific differences are the following.

This version allows observed variables to be indicated with a property. For example, to indicate that the variable light-on is observed with value true (i.e., light-on = true is the evidence):

<VARIABLE TYPE="chance">
  <NAME>light-on</NAME>
  <OUTCOME>true</OUTCOME>
  <OUTCOME>false</OUTCOME>
  <PROPERTY>observed true</PROPERTY>
  <PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>


The same functionality has been moved to a separate file format with XMLBIF 0.40 and 0.50.

The position of a variable is written differently: it is stored inside a PROPERTY tag. Whether a variable is explanatory is also indicated with a property. For example:

<VARIABLE TYPE="chance">
  <NAME>light-on</NAME>
  <OUTCOME>true</OUTCOME>
  <OUTCOME>false</OUTCOME>
  <PROPERTY>explanation</PROPERTY>
  <PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>

The TYPE attribute of a variable can take the values chance, decision and utility. The value “chance” was renamed “nature” in XMLBIF 0.40 and 0.50.
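The 0.30 differences listed above are mechanical enough to script. The Python sketch below rewrites a single XMLBIF 0.30 VARIABLE element in the 0.40/0.50 style: TYPE "chance" becomes "nature", the "position = (x, y)" property becomes a POSITION element, and an "observed VALUE" property is removed and returned so it can be stored separately as evidence. The function name `upgrade_variable_030` is hypothetical, the property syntax is assumed to be exactly as in the examples above, and mapping the "explanation" property to TYPE="explanation" is an assumption based on the 0.40/0.50 DTDs listing explanation as a variable type.

```python
import re
import xml.etree.ElementTree as ET

# Matches the 0.30 position property, e.g. "position = (73, 165)".
POSITION_RE = re.compile(r"position\s*=\s*\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def upgrade_variable_030(variable):
    """Rewrite a 0.30 VARIABLE element in place; return its observed value or None."""
    if variable.get("TYPE") == "chance":
        variable.set("TYPE", "nature")  # "chance" was renamed "nature"
    observed = None
    for prop in list(variable.findall("PROPERTY")):
        text = (prop.text or "").strip()
        match = POSITION_RE.fullmatch(text)
        if match:
            variable.remove(prop)
            ET.SubElement(variable, "POSITION", X=match.group(1), Y=match.group(2))
        elif text.startswith("observed "):
            variable.remove(prop)
            observed = text[len("observed "):].strip()
        elif text == "explanation":
            variable.remove(prop)
            variable.set("TYPE", "explanation")  # assumption, see lead-in
    return observed


var = ET.fromstring(
    '<VARIABLE TYPE="chance"><NAME>light-on</NAME>'
    '<OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME>'
    '<PROPERTY>observed true</PROPERTY>'
    '<PROPERTY>position = (73, 165)</PROPERTY></VARIABLE>')
print(upgrade_variable_030(var))  # true
print(ET.tostring(var, encoding="unicode"))
```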

In previous JavaBayes versions, the XMLBIF parser was implemented with JavaCC. As a result, some networks saved by those versions cannot be opened properly by newer versions of JavaBayes. This typically happens when names in the network contain characters that must be escaped in XML (for example, & or <). In these cases, the files containing the saved networks must be corrected manually.
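The problem described above can be avoided when writing files by escaping names before they are placed inside NAME, FOR or GIVEN elements, and partly repaired in existing files. The Python sketch below uses the standard `xml.sax.saxutils.escape`, which replaces &, < and >. The helpers `name_element` and `fix_bare_ampersands` are hypothetical; the repair function only handles unescaped ampersands, since a bare < cannot be reliably distinguished from markup by a regular expression.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# An "&" that does not start a predefined or numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def name_element(tag, name):
    """Return e.g. <NAME>...</NAME> with the name escaped for XML content."""
    return "<{0}>{1}</{0}>".format(tag, escape(name))


def fix_bare_ampersands(xml_text):
    """Escape ampersands left unescaped by an older writer."""
    return BARE_AMP.sub("&amp;", xml_text)


raw = "rain & wind <strong>"
line = name_element("NAME", raw)
print(line)  # <NAME>rain &amp; wind &lt;strong&gt;</NAME>

# Parsing the escaped text gives back the original name.
assert ET.fromstring(line).text == raw
```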

8. XMLBIF-EVIDENCE 0.50

XMLBIF-EVIDENCE is a companion format to XMLBIF 0.50. Its purpose is to store evidence in a separate file. The DTD for XMLBIF-EVIDENCE is as follows:

<!-- DTD for the XMLBIF-EVIDENCE 0.5 format -->
<!DOCTYPE BIF-EVIDENCE [
  <!ELEMENT BIF-EVIDENCE (DESCRIPTION?, NETWORK*)>
    <!ATTLIST BIF-EVIDENCE VERSION CDATA #REQUIRED>
  <!ELEMENT DESCRIPTION (#PCDATA)>
  <!ELEMENT NETWORK (NAME, EVIDENCE*)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT EVIDENCE (VARIABLENAME, VALUE)>
  <!ELEMENT VARIABLENAME (#PCDATA)>
  <!ELEMENT VALUE (#PCDATA)>
]>
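Following this DTD, an evidence file holds an optional DESCRIPTION and then NETWORK elements, each with a NAME and a list of EVIDENCE entries. The Python sketch below builds such a file for one network with `xml.etree.ElementTree`, prepending the DTD above as an internal subset. The function name `write_evidence` is hypothetical, and the VERSION value "0.5" is assumed by analogy with the DTD comment and XMLBIF 0.5 files.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset copied from the XMLBIF-EVIDENCE 0.5 DTD.
EVIDENCE_DOCTYPE = """<!DOCTYPE BIF-EVIDENCE [
<!ELEMENT BIF-EVIDENCE (DESCRIPTION?, NETWORK*)>
<!ATTLIST BIF-EVIDENCE VERSION CDATA #REQUIRED>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT NETWORK (NAME, EVIDENCE*)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT EVIDENCE (VARIABLENAME, VALUE)>
<!ELEMENT VARIABLENAME (#PCDATA)>
<!ELEMENT VALUE (#PCDATA)>
]>"""


def write_evidence(network_name, evidence, description=None):
    """Serialize {variable name: observed value} as an XMLBIF-EVIDENCE document.

    Strings are written as category names (discrete variables); numbers are
    written as doubles (gaussian variables).
    """
    root = ET.Element("BIF-EVIDENCE", VERSION="0.5")
    if description is not None:
        # The DTD requires DESCRIPTION to precede the NETWORK elements.
        ET.SubElement(root, "DESCRIPTION").text = description
    network = ET.SubElement(root, "NETWORK")
    ET.SubElement(network, "NAME").text = network_name
    for variable, value in evidence.items():
        entry = ET.SubElement(network, "EVIDENCE")
        ET.SubElement(entry, "VARIABLENAME").text = variable
        ET.SubElement(entry, "VALUE").text = (
            value if isinstance(value, str) else repr(float(value)))
    body = ET.tostring(root, encoding="unicode")  # escapes &, < and > in text
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + EVIDENCE_DOCTYPE + "\n" + body


print(write_evidence("Dog_Problem", {"light_on": "true", "hear_bark": "false"}))
```

Only variables that currently have evidence are passed in, matching the rule that only such variables are saved.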

Only variables with evidence are saved in the file. When an evidence file is loaded, every variable without an entry in the file has its evidence retracted.

The DESCRIPTION tag is optional and may contain any text.


The file must contain a NETWORK tag with the network's name and a list of EVIDENCE entries. An empty list retracts all evidence in the network. Each EVIDENCE tag contains two values: the name of the variable (VARIABLENAME) and its observed value (VALUE). For discrete variables, the value is the name of the observed category. For gaussian variables, the value is a double giving the mean value of the variable.

Before loading an evidence file, make sure the corresponding network has been loaded.
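The loading rules above (the network is loaded first, every variable without an entry has its evidence retracted, category names for discrete variables, doubles for gaussian variables) can be sketched in Python as follows. The in-memory representation (a set of variable names, a set of gaussian variable names, and a returned dict of evidence) and the function name `apply_evidence` are hypothetical; this document does not describe JavaBayes's own data structures. Raising an error when no NETWORK matches is a design choice of this sketch.

```python
import xml.etree.ElementTree as ET


def apply_evidence(xml_bytes, network_name, variables, gaussian=frozenset()):
    """Return the evidence that results from loading an evidence file.

    network_name: name of the network that has already been loaded.
    variables: names of all variables in that network.
    gaussian: names of gaussian variables (their VALUE is read as a double).
    Variables missing from the result have no evidence, which models the
    retraction rule; an empty EVIDENCE list therefore retracts everything.
    """
    root = ET.fromstring(xml_bytes)
    for network in root.findall("NETWORK"):
        if (network.findtext("NAME") or "").strip() != network_name:
            continue  # entries for some other network
        evidence = {}
        for entry in network.findall("EVIDENCE"):
            name = (entry.findtext("VARIABLENAME") or "").strip()
            value = (entry.findtext("VALUE") or "").strip()
            if name not in variables:
                raise KeyError(f"unknown variable in evidence file: {name}")
            # Discrete: category name as-is; gaussian: mean as a double.
            evidence[name] = float(value) if name in gaussian else value
        return evidence
    raise ValueError(f"evidence file has no NETWORK named {network_name!r}")


sample = b"""<BIF-EVIDENCE VERSION="0.5"><NETWORK><NAME>Dog_Problem</NAME>
<EVIDENCE><VARIABLENAME>light_on</VARIABLENAME><VALUE>true</VALUE></EVIDENCE>
</NETWORK></BIF-EVIDENCE>"""
print(apply_evidence(sample, "Dog_Problem", {"light_on", "dog_out"}))  # {'light_on': 'true'}
```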