TMSync: Synchronizing topic maps
http://www.ontopia.net/ © 2006 Ontopia AS
TMSync
Topic map-to-topic map updates
Lars Marius Garshol
CTO, Ontopia
TMRA 2006
2006-10-11
Agenda
• Background
– the problem
– why TMSync is the solution
• TMSync in detail
– what it is
– how it works
• Applications
– what you can do with TMSync
• Conclusion
Background
The problem
Solving it with TMSync
The problem
• Topic Maps hold out a promise as a great technology for data integration
– because of merging, global identifiers, etc
• However, dynamic sources are poorly supported at the moment
– that is, converting once is easy, but staying in sync is hard
• A solution that only supports static integration is near-worthless
– in practice, integrated data is nearly always going to need updating from the source
– building a one-time conversion is easy
– building data integration with update support is hard
– so, suddenly data integration with Topic Maps isn’t so easy, after all
Merging is not the solution
• Merging in Topic Maps is often thought of in terms of <mergeMap>
– this is only useful if you are working from XTM files
– <mergeMap> only has an effect when the XTM file is loaded
– after that, the only way to use the <mergeMap> is to reload from scratch
– reloading from scratch loses all changes...
• Real applications are based on databases
– here <mergeMap> has no effect
What TMSync is
• A simple way to update part of one topic map with part of another
– define which part of the target topic map you want,
– define which part of the source topic map it is the master for, and
– the algorithm does the rest
If the source is not a topic map
• Simply do a normal one-time conversion
– let TMSync do the update for you
• In other words, TMSync reduces the update problem to a conversion problem
[Diagram labels: source.xml, convert.xslt, TMSync]
TMSync in depth
What it is
How it works
TMSync in mathematical terms
• A function that given
– a target topic map,
– a source topic map,
– a topic selector for the target map (a function),
– a characteristic selector for the target map (a function),
– a topic selector for the source map (a function),
– a characteristic selector for the source map (a function),
• produces an updated target map
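One way to write this signature down, as a sketch only. The symbols are assumptions made for illustration and are not taken from the Q-model specification: M is the set of topic maps, T the set of topics, C the set of characteristics, and a characteristic selector is assumed to be a yes/no test on a single characteristic.

```latex
\mathrm{TMSync} : M \times M \times S_T \times S_C \times S_T \times S_C \to M,
\qquad
S_T = M \to \mathcal{P}(T),
\quad
S_C = C \to \{\mathrm{true}, \mathrm{false}\}
```

The arguments are read in the order listed above: target map, source map, target topic selector, target characteristic selector, source topic selector, source characteristic selector.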
Mathematical specification
• Currently based on the Q model [1]
– mainly because this was the only model in existence when I started working
• Will translate to the TMRM
– since this is better-known, and now has a TMDM mapping
[1] Q: A Model for Topic Maps, http://www.ontopia.net/topicmaps/materials/quads.html
The selection process
[Diagram labels: name, name, occurrence, occurrence, occurrence]
The update process
[Diagram labels: name, name, occurrence, occurrence, occurrence; NAME, name, occurrence, bar, occurrence; NAME, bar]
How to configure the algorithm
• How to specify the topics
– use a query
– this gives great flexibility, while keeping the algorithm simple
– it also means that we can efficiently find the set of topics to work on
• How to specify the characteristics
– use a query, again, or
– use a set of types, or
– ...
What the algorithm does
• For each topic in the sync’ed fragment
– remove all sync’ed characteristics not in the source, except associations to non-sync’ed topics
– add all characteristics in the source that are not in the target
– leave the rest alone
• Remove and add topics in the same way
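The steps above can be sketched in Python. This is a toy model under stated assumptions, not Ontopia's implementation or API: a topic map is a dict from topic id to a set of (kind, type, value) tuples, topics in the two maps are matched by id, an association stores the other topic's id as its value, and every name here (tmsync, the selector parameters, the sample data) is hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Toy model (assumption): a characteristic is (kind, type, value);
# for an association, value is the id of the topic at the other end.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]
TopicSelector = Callable[[TopicMap], Set[str]]
CharFilter = Callable[[Characteristic], bool]


def tmsync(target: TopicMap, source: TopicMap,
           select_target_topics: TopicSelector, accept_target_char: CharFilter,
           select_source_topics: TopicSelector, accept_source_char: CharFilter) -> TopicMap:
    """Update the selected fragment of target in place from source."""
    tgt_ids = select_target_topics(target)
    src_ids = select_source_topics(source)
    synced = tgt_ids | src_ids

    # Remove sync'ed topics that the source no longer has.
    # (Simplification: references from untouched topics are not cleaned up.)
    for tid in tgt_ids - src_ids:
        del target[tid]

    for tid in src_ids:
        if tid in target and tid not in tgt_ids:
            continue  # exists in target but outside the fragment: leave alone
        wanted = {c for c in source[tid] if accept_source_char(c)}
        current = target.setdefault(tid, set())  # adds topics new in the source
        for c in list(current):
            if not accept_target_char(c) or c in wanted:
                continue  # not sync'ed, or already up to date
            if c[0] == "association" and c[2] not in synced:
                continue  # keep associations to non-sync'ed topics
            current.discard(c)  # sync'ed characteristic missing from the source
        current |= wanted  # add what the source has and the target lacks
    return target


# Hypothetical sample data.
target = {
    "bergen": {("name", "name", "Bergen kommune"),
               ("occurrence", "homepage", "http://old.example/"),
               ("occurrence", "local-note", "keep me"),
               ("association", "located-in", "norway")},
    "oslo": {("name", "name", "Oslo")},
    "norway": {("name", "name", "Norway")},
}
source = {
    "bergen": {("name", "name", "City of Bergen"),
               ("occurrence", "homepage", "http://new.example/")},
    "stavanger": {("name", "name", "Stavanger")},
}

def cities(tm: TopicMap) -> Set[str]:
    return {tid for tid in tm if tid != "norway"}

result = tmsync(target, source,
                cities, lambda c: c[1] != "local-note",
                cities, lambda c: True)
```

In this sample, "oslo" is removed, "stavanger" is added, and "bergen" gets the source's name and homepage while keeping its local note and its association to the non-sync'ed topic "norway".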
Applications
City of Bergen
US Publisher
The City of Bergen
[Diagram labels: LivsIT; Service, Unit, Person; City of Bergen; LivsIT; Norge.no]
City of Bergen configuration
• On the source side
– query to get all instances of “category” and “keyword”
– accept all characteristics
• On the target side
– query to get all instances of “category” and “keyword”, except those with mark-as-local associations
– accept all characteristics except local search name and mark-as-local
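This configuration could be expressed as four small functions. The sketch below is illustrative only: the real deployment uses queries, while here a topic map is assumed to be a dict from topic id to a set of (kind, type, value) tuples, and the identifiers "instance-of", "mark-as-local" and "local-search-name" are hypothetical stand-ins for whatever the Bergen ontology actually calls them.

```python
from typing import Dict, Set, Tuple

Characteristic = Tuple[str, str, str]  # (kind, type, value), an assumed toy model
TopicMap = Dict[str, Set[Characteristic]]

SYNCED_TYPES = {"category", "keyword"}
LOCAL_ONLY = {"local-search-name", "mark-as-local"}  # hypothetical type ids


def has_type(chars: Set[Characteristic], types: Set[str]) -> bool:
    """True if the topic is an instance of one of the given types."""
    return any(k == "association" and t == "instance-of" and v in types
               for (k, t, v) in chars)


def is_marked_local(chars: Set[Characteristic]) -> bool:
    """True if the topic has a mark-as-local association."""
    return any(k == "association" and t == "mark-as-local" for (k, t, v) in chars)


def select_source_topics(tm: TopicMap) -> Set[str]:
    # Source side: all instances of "category" and "keyword".
    return {tid for tid, chars in tm.items() if has_type(chars, SYNCED_TYPES)}


def select_target_topics(tm: TopicMap) -> Set[str]:
    # Target side: the same, except topics with mark-as-local associations.
    return {tid for tid, chars in tm.items()
            if has_type(chars, SYNCED_TYPES) and not is_marked_local(chars)}


def accept_source_char(c: Characteristic) -> bool:
    return True  # source side accepts all characteristics


def accept_target_char(c: Characteristic) -> bool:
    # Target side protects local search names and mark-as-local from updates.
    return c[1] not in LOCAL_ONLY


# Hypothetical sample topic map.
sample = {
    "schools": {("association", "instance-of", "category"),
                ("name", "name", "Schools")},
    "parking": {("association", "instance-of", "keyword"),
                ("association", "mark-as-local", "bergen")},
    "mayor": {("association", "instance-of", "person")},
}
```

With this sample, the source-side selector picks "schools" and "parking", while the target-side selector skips "parking" because it is marked as local.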
Nameless US publisher
• Use an automated process to classify documents
– documents get reclassified now and then
– output of process is an XTM document
• If documents did not get reclassified, import would be enough
– as it is, they use TMSync
[Diagram labels: classified.xtm, TMSync]
Conclusion
Related work
Further work
Related work
• RDFSync
– algorithm to synchronize two RDF graphs efficiently
– no business case focus
• TM-Views
– one possible way to define fragments for update
• TMRAP
– uses TMSync for the update-topic request
Further work
• Reformulate algorithm to TMRM instead of Q
– this will be done in the paper submitted to the proceedings
• Improve algorithm to handle delta sets
– that is, to only need information about what has changed in the source since the last update
– this should not be very difficult
– may do this for the final paper