QuantNet - a Database Driven Web-Oriented Knowledge Base
Anton Andriyashin
C.A.S.E. - Center for Applied Statistics and Economics
Chair of Statistics
Humboldt-Universität zu Berlin
http://ise.wiwi.hu-berlin.de
Motivation
What is QuantNet?
QuantNet is a web-oriented framework for a network of algorithms arising from various educational areas inside HU.
• Collection of students’ term projects in a compact form
• Examples, comments and additional course material from the lecturers
• Auxiliary e-book algorithms and programs
• etc.
• Publicly available 24/7 on the Web
A Database Driven Web Server
• The architecture of the web server must be flexible enough to allow any kind of document structuring and formatting
• A list of documents and their respective content groups should be stored in a database
• Each document has matching data to display, which is kept in a raw data format
• The paths to these raw data files are stored in the database as well
• There should be a way to apply any kind of consistent formatting to the raw data files
What Does the Client Get?
The client gets only the result of applying and combining multiple layers of information, which are produced on the web server.
• HTML page with dynamic navigation
• Dynamic elements should not be driven by JavaScript
• Server-side scripting
• ...that allows effective and reliable database querying on the fly
• The raw data format should allow effective style application and transformation to HTML
Technical Overview: Document View
Typical Set Up
Suppose a professor wants to publish some algorithms and related descriptions on the Internet. Should this person or his/her employees have to master web-oriented programming?
• Provide the computer code with some comments to ease understanding
• Add comments with prespecified keywords at the head of the code
• Fill out these comment fields with the related content in natural language
• Let QuantNet take care of the rest
What Does It Look Like?
@Area SFM
@Name Autocorrelation Plots
@Function call SFMacfar2()
@Description plots the autocorrelation function of AR(2) process.
@Revision 1.2
@Author Christian Hafner, 2007-01-06
lag=30 ; lag value
a1=0.5 ; value of alpha_1
a2=0.4 ; value of alpha_2
input=readvalue("alpha1"|"alpha2"|"lag", 0.5|0.4|30)
...
Conversion to a Raw Data Format - XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<quantlet>
  <name>Autocorrelation Plots</name>
  <area>SFM</area>
  <function_call>SFMacfar2()</function_call>
  <desc>
    Description plots the autocorrelation function of
    AR(2) process
  </desc>
  <rev>1.2</rev>
  <author>Christian Hafner, 20070106</author>
</quantlet>
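As a rough illustration of this conversion step, the following Python sketch builds a `<quantlet>` element from header fields that are assumed to be already extracted. The mapping table `KEYWORD_TO_TAG` and the function `fields_to_xml` are hypothetical names chosen for this example; only the tag names come from the XML listing. The standard library escapes characters such as `<` and `&`, which helps keep the result well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from header keyword to XML tag name,
# chosen to match the tags in the XML listing.
KEYWORD_TO_TAG = {
    "Name": "name",
    "Area": "area",
    "Function call": "function_call",
    "Description": "desc",
    "Revision": "rev",
    "Author": "author",
}


def fields_to_xml(fields):
    """Build a <quantlet> document from a dict of header fields."""
    root = ET.Element("quantlet")
    for keyword, tag in KEYWORD_TO_TAG.items():
        if keyword in fields:
            # Special characters in the text are escaped on serialization.
            ET.SubElement(root, tag).text = fields[keyword]
    return ET.tostring(root, encoding="unicode")


fields = {
    "Area": "SFM",
    "Name": "Autocorrelation Plots",
    "Function call": "SFMacfar2()",
    "Revision": "1.2",
}
xml_text = fields_to_xml(fields)
print(xml_text)
```

Fields that are missing from the input are simply skipped, so the same function works for headers that use only some of the keywords.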
Why XML?
XML stands for Extensible Markup Language.
Formally defined languages based on XML include RSS, MathML, GraphML, XHTML, Scalable Vector Graphics, MusicXML and thousands of others.
• Raw data format
• User-defined markup language for data description
• Designed to work with XSLT - Extensible Stylesheet Language Transformations
• XML + XSLT = HTML, PDF, XML, ...
XSLT Example
<xsl:template match="head">
  <div class="box" id="boxContainer">
    <div class="box" id="boxContent">
      <p class="padded_table">
        <xsl:apply-templates/>
      </p>
    </div>
  </div>
</xsl:template>
<xsl:template match="head/*">
  <p class="padded_table">
    <b><xsl:value-of select="@print"/>:</b>
    <xsl:apply-templates/>
  </p>
</xsl:template>
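To make the effect of the two templates more concrete, here is a hand-written Python approximation. It is not an XSLT processor: it only mimics what the templates would produce for a `head` element whose children carry a `print` attribute. The input document and the function name `render_head` are assumptions for illustration, and whitespace is trimmed here, whereas a real XSLT processor would copy it through.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_head(head):
    """Approximate the output of the two templates for a <head> element."""
    rows = []
    for child in head:
        # Second template: label taken from @print, followed by the content.
        label = escape(child.get("print", ""))
        text = escape((child.text or "").strip())
        rows.append(f'<p class="padded_table"><b>{label}:</b>{text}</p>')
    # First template: wrap everything in the two nested boxes.
    return (
        '<div class="box" id="boxContainer">'
        '<div class="box" id="boxContent">'
        f'<p class="padded_table">{"".join(rows)}</p>'
        "</div></div>"
    )


# Hypothetical input: the "print" attribute carries the label to display.
source = '<head><name print="Name">Autocorrelation Plots</name></head>'
html_out = render_head(ET.fromstring(source))
print(html_out)
```

The point of the stylesheet approach is that this mapping lives in one declarative file rather than in procedural code like the above.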
Pit Stop 1: XML + XSLT = HTML
At this point, raw data from an ASCII file can be converted to XML, which, coupled with XSLT, gives the end user a friendly HTML layout.
But how can one obtain XML from ASCII? The answer is string parsing via C/C++.
Basics of String Parsing
Recall that the ASCII files contain predefined keywords such as @description. Each keyword and the content that follows it should be ’cut’ out and stored in the corresponding XML field.
And what about scalability? What if tomorrow one needs not only fields like @name and @description, but also something like @bug history?
• Carefully detach the valuable information between predefined keywords in the ASCII file
• Store each portion of information inside the correct XML tags
• Ensure that the resulting XML document is well-formed
How? Would that be plain C/C++ written from scratch? By no means!
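The scalability concern can be illustrated with a small table-driven sketch. It uses Python regular expressions purely as a stand-in for brevity and is not the tool QuantNet itself relies on. The keyword list, the function name `parse_header` and the sample text are assumptions; a real scanner would also need a rule for where the header ends and the program code begins.

```python
import re

# Keywords the scanner recognizes; supporting a new field such as a
# hypothetical "Bug history" only requires adding it to this list.
KEYWORDS = ["Area", "Name", "Function call", "Description", "Revision", "Author"]


def parse_header(text, keywords=KEYWORDS):
    """Return {keyword: content} for every @keyword found in text."""
    # Longest keywords first so that a longer keyword wins over its prefix.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(r"@(" + alternatives + r")\b")
    matches = list(pattern.finditer(text))
    fields = {}
    for i, m in enumerate(matches):
        # Content runs up to the next keyword, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks so multi-line content becomes one string.
        fields[m.group(1)] = " ".join(text[m.end():end].split())
    return fields


sample = """@Area SFM
@Name Autocorrelation Plots
@Description plots the autocorrelation function
of AR(2) process.
@Revision 1.2
"""
fields = parse_header(sample)
print(fields)
```

Calling `parse_header(text, KEYWORDS + ["Bug history"])` would pick up the new field without any change to the scanning logic.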
Lex - A Way to Generate a Lexical Analyzer
• Lex is a program that generates lexical analyzers (’scanners’ or ’lexers’)
• It is the standard lexical analyzer generator on Unix systems, and is included in the POSIX standard
• Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language
Lex Example
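Goal: read the input stream and extract any integers present
%{
#include <stdio.h>   /* printf */
%}
%%
    /*** Rules section ***/

    /* [0-9]+ matches a string of one or more digits */
[0-9]+  {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }

.|\n    { /* Ignore all other characters, including newlines. */ }
%%
/* Tell the scanner to stop at the end of the input. */
int yywrap(void) { return 1; }

int main(void)
{
    yylex();   /* run the scanner on standard input */
    return 0;
}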
Lex Example Continued
Here is a simple I/O example using the code generated by Lex.
INPUT:
abc123z .!&*2 ghj6
OUTPUT:
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
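For readers without Lex at hand, the same rule can be tried out with a regular expression in Python. This is only an equivalent of the single `[0-9]+` rule, not generated scanner code, and the function name `scan_integers` is made up for the example.

```python
import re


def scan_integers(stream):
    """Return every maximal run of digits, like the [0-9]+ rule."""
    return re.findall(r"[0-9]+", stream)


for number in scan_integers("abc123z .!&*2 ghj6"):
    print(f"Saw an integer: {number}")
```

All other characters are skipped silently, which corresponds to the catch-all rule with an empty action.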
Pit Stop 2: ASCII → Lex → C/C++ → XML + XSLT = HTML
Where are we now?
• The string scanner is generated with Lex
• It is compiled with a C/C++ compiler
• The ASCII file is transformed into valid XML by the resulting C/C++ program
• A scalable XSLT stylesheet must be written separately
• Combining XML and XSLT yields an HTML document
Technical Overview: Web Server View
From a Single Document to Multiple Documents
• QuantNet is supposed to contain multiple documents
• Documents may belong to different areas → a folder structure is necessary
• An end user should easily navigate through different documents → a dynamic navigation system is required
• Scripting should preferably be performed on the server side
• JavaScript should be avoided for compatibility reasons
MySQL + PHP = Database Driven Web Server
A MySQL database could store the following items:
• List of subject areas/folders, e.g. Informatics, Macroeconomics, Statistics and so on
• Individual document names and hierarchy positions, e.g. MVAproj01 belongs to the Statistics group
• Path to the respective XML file with the document/project description
PHP as a scripting language could then be responsible for:
• Global HTML layout of the page
• Tree-like navigation system
• Dynamic processing of mouse clicks etc.
• Possibly some content-driven elements of the page
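The division of labour between database and script can be sketched as follows. This is a minimal illustration in Python with the built-in `sqlite3` module standing in for MySQL and PHP; the table names, column names, file paths and the `render_navigation` function are all hypothetical. Names are assumed to contain only URL-safe and HTML-safe characters.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the schema below is hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE areas (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        area_id INTEGER REFERENCES areas(id),
        xml_path TEXT          -- path to the XML description file
    );
    INSERT INTO areas VALUES (1, 'Statistics'), (2, 'Macroeconomics');
    INSERT INTO documents VALUES
        (1, 'MVAproj01', 1, 'xml/MVAproj01.xml'),
        (2, 'MVAproj02', 1, 'xml/MVAproj02.xml');
""")


def render_navigation(db, open_area=None):
    """Return an HTML list of areas; only the open area shows its documents."""
    lines = ["<ul>"]
    for area_id, area in db.execute("SELECT id, name FROM areas ORDER BY name"):
        lines.append(f'<li><a href="?area={area}">{area}</a>')
        if area == open_area:
            lines.append("<ul>")
            docs = db.execute(
                "SELECT name FROM documents WHERE area_id = ? ORDER BY name",
                (area_id,),
            )
            for (doc,) in docs:
                lines.append(
                    f'<li><a href="?area={area}&amp;doc={doc}">{doc}</a></li>'
                )
            lines.append("</ul>")
        lines.append("</li>")
    lines.append("</ul>")
    return "\n".join(lines)


nav = render_navigation(db, open_area="Statistics")
print(nav)
```

Because the open folder is passed in as a request parameter and the tree is rebuilt on the server for every click, the navigation stays dynamic without any client-side scripting.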
PHP Example
function GetHref(&$Page, &$Name, $Myqsl) {
    // Note: the mysql_* extension used here was removed in PHP 7;
    // mysqli or PDO are its replacements.
    $MyqslTable = $Myqsl["Table"];
    @mysql_connect($Myqsl["Server"], $Myqsl["User"], $Myqsl["Pass"]) <...>;
    mysql_select_db($Myqsl["DB"]);
    $usage = "PublicUse=0 Or PublicUse=1";
    // Escape the name before embedding it in the query string.
    $safeName = mysql_real_escape_string($Name);
    $sql = "SELECT Href FROM $MyqslTable WHERE Name = '$safeName'";
    $href01 = mysql_query($sql);
    if (!$href01) {
        die('Could not query: ' . mysql_error());
    }
    // Row 0, column Href of the result set.
    $result_href = mysql_result($href01, 0, "Href");
    return $result_href;
}
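The same lookup can be written with a parameterized query, which avoids building the SQL string by hand. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the MySQL server; the function name `get_href` and the sample row are assumptions, while the table name `main_table` and the columns `Name` and `Href` follow the PHP listing.

```python
import sqlite3


def get_href(db, name):
    """Return the Href stored for a document name, or None if absent."""
    # The ? placeholder lets the driver quote the value, so a name that
    # contains a quote character cannot change the query.
    row = db.execute(
        "SELECT Href FROM main_table WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical data; sqlite3 stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE main_table (Name TEXT, Href TEXT)")
db.execute("INSERT INTO main_table VALUES ('MVAproj01', 'xml/MVAproj01.xml')")

print(get_href(db, "MVAproj01"))
print(get_href(db, "x' OR '1'='1"))
```

The second call shows that a name crafted to look like SQL is treated as an ordinary string and simply finds no match.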
And What About JavaScript?
One may need JavaScript in the following situations:
• Passing variable values between frames
• But PHP eliminates the need for frames in 99% of cases
• Database driven elements on the page? PHP is again the answer
• Dynamic layout functions like OnMouse()? CSS could be the solution
And what’s wrong with JavaScript? It may be switched off by an end user for security reasons.
The current version of QuantNet does not employ a single JavaScript function.
Pit Stop 3: {ASCII → Lex → C/C++ → XML + XSLT} + {MySQL + PHP} = QuantNet
• ASCII files are collected and sorted into area groups
• The XML representation is derived with the help of Lex
• Document names, locations and hierarchy are stored in a MySQL database
• PHP defines the global HTML markup, the dynamic tree-like navigation and the call of the appropriate XML document
• When an XML document is called, a global XSLT template is applied
• HTML output with active navigation is created
And this is what QuantNet is!
Conclusion: Overview of QuantNet
Scalability and Benefits of QuantNet
• Easy to maintain
• Don’t like the style? Change a single global XSLT stylesheet
• New document arrived? Just add it to the MySQL database
• Want a new subject area? The scripts are scalable; again, just add it to the database
• Want to submit a document online but don’t know much about HTML? Use plain human language in the predefined comment fields, and forget about the rest
• Know HTML perfectly and want a cutting-edge page with rich formatting within QuantNet? No problem: just ask for additional tags like <b>, <link>, <code> etc. to be defined in the XSLT stylesheet
What Are the Costs of QuantNet?
• MySQL is a free database (compare with the price of Oracle)
• No investment in PHP, XML, XSLT or C/C++ software is required
• Lex/Flex is available for free as well
• One just needs to know XML, XSLT, MySQL, PHP, C/C++ and HTML with CSS to implement it!