Bringing Math to LOD
Bringing Math to LOD: A Semantic Publishing Platform Prototype for
Scientific Collections in Mathematics
Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov, Evgeniy Birialtsev
Kazan Federal University, Russia
October 23, 2013

Our Contribution
Our prototype is designed to build a semantic graph of mathematical knowledge objects that
- is extracted from a collection of mathematical scholarly papers, and
- is integrated into the LOD «cloud»

Research Output: IVM Data Set
- LOD representation of 1 330 scholarly publications of the «Izvestiya Vuzov. Matematika» (IVM) journal
- Covers the semantics of:
  - article metadata
  - elements of the logical structure
  - terminology
  - formulas
- Aligned with DBpedia, CORDIS
- More than 850 000 RDF triples
- SPARQL endpoint: http://cll.niimm.ksu.ru:8890/sparql-auth∗
∗ The SPARQL endpoint is secured. Please email the authors for credentials.

Related Work
- Domain-specific languages: OMDoc, MathLang
- Domain models: Cambridge Mathematical Thesaurus, DBpedia (math-related part), ScienceWISE Ontology
- Math-related NLP: mArachna; linguistic modules of arXMLiv

Key Research Contributions
- a thorough ontological model of the mathematical domain
- an ontology-based, language-independent method for extracting logical structure elements from papers
- an ontology-based method for extracting mathematical named entities from texts in Russian
- a method that connects mathematical named entities to symbolic expressions

Ontology of Structural Elements (1)
http://cll.niimm.ksu.ru/ontologies/mocassin
- Covers 15 common structural elements:
- Defines 9 object properties and 4 datatype properties:

Ontology of Structural Elements (2)
http://cll.niimm.ksu.ru/ontologies/mocassin
- 3 cardinality axioms, e.g. Proof ∧ (= 1 proves ProvableStatement†)
- 2 transitivity axioms for the hasPart and dependsOn properties
- DL expressivity: SRIN(D)
† i.e., Claim ∨ Corollary ∨ Lemma ∨ Proposition ∨ Theorem

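What a transitivity axiom on hasPart buys can be illustrated with a small sketch. It is not taken from the prototype: the segment identifiers are hypothetical, and a plain graph walk stands in for whatever reasoner is actually used. Given a few asserted hasPart facts, it derives the additional pairs a transitive property implies.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """Return every (a, c) pair implied by treating the relation as transitive."""
    successors = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)

    closure = set()
    for start in list(successors):
        # Depth-first walk collecting everything reachable from `start`.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure


# Hypothetical asserted hasPart facts (identifiers made up for illustration).
has_part = {
    ("article", "section1"),
    ("section1", "theorem1"),
    ("theorem1", "formula1"),
}

# Pairs that follow only from transitivity, not from the asserted facts.
inferred = transitive_closure(has_part) - has_part
```

Here `inferred` holds the three pairs that were never stated directly, such as the article having the formula as a part. The same reading applies to dependsOn.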
Ontology of Mathematical Concepts (1)
http://cll.niimm.ksu.ru/ontologies/mathematics
- Covers 3 450 mathematical concepts
- Defines commonly used terms as well as terms from the emerging professional vocabulary (e.g. Bitsadze-Samarsky problem)
- Supports Russian/English labels

Ontology of Mathematical Concepts (2)
http://cll.niimm.ksu.ru/ontologies/mathematics
- Includes two taxonomies:
  - taxonomy of mathematical theories‡:
    - number theory, set theory, algebra, analysis, geometry, mathematical logic, discrete mathematics, theory of computation, differential equations, numerical analysis, probability theory and statistics
  - taxonomy of mathematical objects
- Covers common scientific concepts, such as Problem, Method, Statement, Formula etc.
- DL expressivity: ALCHI
‡ covers just a part of the mathematical knowledge

Ontology of Mathematical Concepts (3)
Object properties
- belongsTo/contains, e.g. Barycentric Coordinates belongsTo Metric Geometry
- defines/isDefinedBy, e.g. Christoffel Symbol isDefinedBy Connectedness
- seeAlso, e.g. Chebyshev Iterative Method seeAlso Numerical Solution of Linear Equation Systems

Ontology of Mathematical Concepts (4)
Stats
- 3 450 classes
- 27% of classes are mapped onto DBpedia
- 3 630 subclass-of property instances
- 1 140 other object property instances
- Common facts about the development:
  - lasted for 4 months
  - 7 professional mathematicians participated as domain experts, guided by the authors
  - WebProtege was used as a collaborative tool

NLP Annotation
- Relies on the OntoIntegrator facilities
- Solves some of the conventional linguistic tasks, such as:
  - tokenization
  - sentence splitting (∼ 98% F-measure§)
  - morphological analysis
  - NP extraction (88% precision)
- Special handling of math symbols, abbreviations, and math expressions as parts of NPs
- Currently supports only the Russian language
§ the metrics were evaluated on real math texts with the help of domain experts

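For readers less familiar with the metrics quoted here and on the following slides, this is a minimal sketch of how an F-measure combines precision and recall. The slides do not say which weighting was used, so the balanced case (beta = 1) is an assumption, and the function name is ours.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives the balanced F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only, not figures from the evaluation.
balanced = f_measure(0.5, 0.5)
skewed = f_measure(1.0, 0.5)
```

Because the harmonic mean is pulled towards the smaller value, a method with perfect precision but 50% recall scores about 0.67, not 0.75.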
Mining the Logical Structure
- Supports our ontology of structural elements: elements in real texts are instances of the ontology classes
- Recognizing types of structural elements:
  - A string-similarity-based method gives 89%-100% F-measure depending on the class
- Recognizing semantic relations between them:
  - A decision tree learner gives 61%-95% F-measure depending on the relation

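The idea of recognizing a structural element type by string similarity can be sketched as follows. The slides do not name the similarity measure or the threshold, so this sketch assumes Python's difflib ratio and a cutoff of 0.8. The label list is only the subset of ontology classes mentioned on the earlier slide, and English headings are used purely for illustration.

```python
from difflib import SequenceMatcher

# Subset of structural element classes named in the slides.
CLASS_LABELS = ["Theorem", "Lemma", "Corollary", "Proposition", "Proof", "Claim"]


def classify_heading(heading, labels=CLASS_LABELS, threshold=0.8):
    """Return the label most similar to the heading's first word, or None."""
    words = heading.split()
    if not words:
        return None
    word = words[0].strip(".:,").lower()

    best_label, best_score = None, 0.0
    for label in labels:
        score = SequenceMatcher(None, word, label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score

    # Reject weak matches so unrelated headings stay unclassified.
    return best_label if best_score >= threshold else None


exact = classify_heading("Theorem 2.1.")
misspelled = classify_heading("Theorm 3")
unrelated = classify_heading("Introduction")
```

A similarity threshold tolerates spelling variants and inflected forms without a per-language rule set, which is one plausible reading of why the method is described as language-independent.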
Mathematical Named Entity Extraction
- Supports our ontology of mathematical concepts: assigned NPs are instances of the ontology classes
- Our method employs annotations of the NP structure and Jaccard similarity
- The method gives 86% F-measure with parameters chosen to balance precision and recall

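The Jaccard part of the method can be sketched as follows. This is a simplified illustration, not the authors' implementation: it ignores the NP-structure annotations, compares plain token sets, and uses hypothetical concept identifiers, labels, and a hypothetical threshold.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_noun_phrase(np_tokens, concept_labels, threshold=0.5):
    """Return (concept_id, score) for the best-matching label, or None if too weak."""
    best_id, best_score = None, 0.0
    for concept_id, label in concept_labels.items():
        score = jaccard(np_tokens, label.lower().split())
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


# Hypothetical concept identifiers and labels, for illustration only.
labels = {
    "C1": "finite group",
    "C2": "abelian group",
    "C3": "boundary value problem",
}

exact = link_noun_phrase(["finite", "group"], labels)
partial = link_noun_phrase(["finite", "simple", "group"], labels)
missing = link_noun_phrase(["linear", "operator"], labels)
```

Raising the threshold favours precision and lowering it favours recall, which is the kind of trade-off the reported F-measure depends on.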
Connecting Named Entities to Formulas
- Parsing mathematical expressions
- Detection of variables
- Proximity-based matching of mathematical variables with noun phrases at 68% accuracy

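A minimal sketch of proximity-based matching, assuming token offsets for variables and noun phrases are already available from the annotation step. The distance measure, the `max_distance` cutoff, and the tie-breaking rule are our assumptions, since the slides do not specify them.

```python
def match_variables(variables, noun_phrases, max_distance=5):
    """Map each variable symbol to the text of the nearest noun phrase.

    variables: list of (symbol, token_index)
    noun_phrases: list of (text, start_index, end_index), indices inclusive
    """
    matches = {}
    for symbol, pos in variables:
        best_text, best_dist = None, None
        for text, start, end in noun_phrases:
            if start <= pos <= end:
                dist = 0
            elif pos < start:
                dist = start - pos
            else:
                dist = pos - end
            # Strict comparison: on a tie, the noun phrase listed first wins.
            if best_dist is None or dist < best_dist:
                best_text, best_dist = text, dist
        if best_dist is not None and best_dist <= max_distance:
            matches[symbol] = best_text
    return matches


# Tokens: the(0) group(1) G(2) acts(3) on(4) the(5) set(6) X(7)
variables = [("G", 2), ("X", 7)]
noun_phrases = [("group", 1, 1), ("set", 6, 6)]
matches = match_variables(variables, noun_phrases)
```

Pure proximity breaks down when a variable sits equally far from two noun phrases or refers to a distant one, which is consistent with the modest accuracy reported above.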
Other Supported Features
- Article metadata extraction (title, author names, publication year etc.) according to the AKT Portal schema
- Semi-manual interlinking¶ with existing LOD data sets: DBpedia, CORDIS
- Publishing the extracted data as an LOD-compliant RDF data set
¶ by leveraging the Silk app

Finding DBpedia Entities in Mathematical Formulas
http://cll.niimm.ksu.ru/iswc-demo

Semantic Search of Theoretical Findings
Finding articles with theorems about finite groups

PREFIX moc: <http://cll.niimm.ksu.ru/ontologies/mocassin#>
PREFIX math: <http://cll.niimm.ksu.ru/ontologies/mathematics#>
SELECT ?article WHERE {
  ?article moc:hasSegment ?theorem .
  ?theorem moc:mentions ?entity ;
           a moc:Theorem .
  ?entity a math:E2183
}

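Since the endpoint is secured, the join that this query expresses can be traced on a toy in-memory triple set instead. The sketch below is only an illustration of the graph pattern: the resource names (article1, thm1, and so on) are hypothetical, and a hand-written loop replaces a real SPARQL engine.

```python
# Toy triples using the query's vocabulary; resource names are made up.
TRIPLES = {
    ("article1", "moc:hasSegment", "thm1"),
    ("thm1", "rdf:type", "moc:Theorem"),
    ("thm1", "moc:mentions", "ent1"),
    ("ent1", "rdf:type", "math:E2183"),
    ("article2", "moc:hasSegment", "lem1"),
    ("lem1", "rdf:type", "moc:Lemma"),
    ("lem1", "moc:mentions", "ent1"),
}


def articles_with_theorems_about(triples, concept):
    """Articles having a Theorem segment that mentions an instance of `concept`."""

    def objects(subject, predicate):
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    result = set()
    for article, predicate, segment in triples:
        if predicate != "moc:hasSegment":
            continue
        # ?theorem a moc:Theorem
        if "moc:Theorem" not in objects(segment, "rdf:type"):
            continue
        # ?theorem moc:mentions ?entity . ?entity a <concept>
        entities = objects(segment, "moc:mentions")
        if any(concept in objects(entity, "rdf:type") for entity in entities):
            result.add(article)
    return result


found = articles_with_theorems_about(TRIPLES, "math:E2183")
```

Only article1 is returned. article2 mentions the same entity, but in a lemma, so the type constraint on the segment filters it out.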
Conclusion
- We have developed a holistic approach for mining an LOD representation of scholarly papers in mathematics
- We applied the prototype to a collection of over 1 300 real math papers
- We conducted a thorough evaluation of the proposed methods with the help of domain experts
- We provided several use cases to illustrate the utility of the published data

Future Work
- Integrate all the modules into a full-fledged toolkit
- Add support for English to the NLP module
- Extend our approach to texts in other natural science domains