Managing the Evolution of Information Systems with Intensional Views and Relational Algebra
Managing the Consistency of (Evolving) Information Systems with Intensional Views and Relational Algebra
applied to a case of IP phone localisation
David Colpaert, Kim Mens & Bernard Lambeau
Presented at BENEVOL 2014, Amsterdam, 28 November 2014
Based on David Colpaert’s Master Thesis in Computer Science at UCL, Belgium, 19 June 2014
MAY CONTAIN TRACES OF FRENCH
1. Plan 2. Introduction 3. Initial Solution 4. Case Study 5. Validation 6. Improved Solution 7. Intensional Views 8. Validation 9. Conclusion
When evolving, migrating or merging databases, how can we detect potential inconsistencies that may exist in the data?
• Data coming from multiple contradictory or incomplete sources
• Preferably via an easy-to-understand graphical user interface
Case study: localisation of IP telephones at a university
Need a generic tool to describe and detect consistency rules
Localise IP telephones at a university in case of emergency calls
• By merging data coming from different sources
• Via automated scripts
• While identifying potential errors in the data
Generic tool for managing IP telephones
Web interface
3 sources for localisation data:
• Via the « deployment » Excel files
• Via the network (IP switches)
• Via the telephone exchange system MX1 and SAP system
[Diagram: the MX1+SAP, Network and Deployment sources feed into a Merging step]
Extract of the deployment file:
UCL number | Building | Room
UA-00001   | SC16     | A 001
One script per source
Merging of the data from these sources
Scripts and merges executed daily
20,000 lines of code
Currently used in production at the university
Multiple errors, inconsistencies and missing information
• In each of the sources individually
• When merging the data
Errors logged in files
• Difficult to manipulate
• Difficult to understand
• Difficult to solve
• Hard to see the «bigger picture»
Examples of some errors and warnings (translated from French):
[ERROR] The room for telephone number 43325 is missing at line 34.
[WARNING] A deployment already exists for UCL number TA-00803 at line 47.
[ERROR] Switch identifier not found: SalleOleffe01281098 (172.31.28.137) for MAC address: 00:08:5d:35:32:cc
[ERROR] The building for MAC address 00:08:5d:35:3c:d2 is missing
[ERROR] The building Logement 456 cannot be converted into a building code because this name is unknown in the table buildings
[ERROR] Telephone number 73999 does not exist in the SAP file
Different kinds of inconsistencies (labels for the six example messages, in order): missing data, inconsistent data, missing data, missing data, incorrect data, missing data.
[Charts: errors in the deployment files; errors in the network data; errors in the MX1+SAP data; errors in the merged data]
Expressing constraints on the data
• Over multiple tables and fields
• While filtering irrelevant entries
Detecting and inspecting inconsistencies
• With respect to these constraints
Simplicity of expression without sacrificing expressiveness
Combine 2 existing ideas:
• Intensional Views
• Relational Algebra Querying with Alf
Make a generic tool for defining and checking constraints over the data
Via an easy-to-use user interface
Validation by applying it to our case
Intensional Views
Originally designed for software (code) quality assurance purposes
Allows expressing and verifying structural source-code regularities
Reuse this idea for expressing and detecting database constraints
Alf (www.try-alf.org)
A database query language based on relational algebra
project, restrict, join, union, intersect, minus, …
« join_on(left_table, right_table, [:mac]) »
Intensional views
Initially intended for software maintenance
Checking constraints on source code
Here: constraints on a database
It is also possible to define additional filters in order to consider only a subset of the data, for instance, only one particular building. This can be useful, for example, when the user is analyzing a very large database with lots of inconsistencies and wants to inspect the inconsistencies for a particular subset of the data only. In our particular example, we didn’t apply any such filters.
Finally, when the user clicks on the ‘Check constraint’ button, three different Alf queries are generated. The first one is a query to find the positive results, i.e. all tuples that satisfy the declared constraint. A second query calculates the mismatches in the source table, i.e. all tuples in the source table that do not satisfy the declared constraint. A third query calculates the mismatches in the target table.
The generated Alf query for the positive results looks somewhat like this:
    restrict(
      join_on(source_table, target_table, common_key),
      eq(:source_table_building, :target_table_building) &
      eq(:source_table_room, :target_table_room))
From this query it can be observed that the two concerned tables are first joined based on their common key, and then the results are restricted to the tuples satisfying all conditions, i.e. that the buildings and rooms must be equal. In reality, the actual generated query (see footnote 2) is a bit more complex than this, to take into account custom mappings (in our example, for instance, there is no common key but the correspondence between MAC addresses and UCL IDs needs to be looked up in an intermediate table), renaming (for instance, when two corresponding fields have a different name in the different tables), and filters (an extra restriction based on the specified filters should be applied).
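To make the join-then-restrict structure of the positive-results query concrete, here is a minimal sketch in plain Python rather than Alf. The helper functions `join_on` and `restrict` are hypothetical stand-ins written for this illustration (they are not the Alf API), the `phone` key plays the role of the common key, and the sample rows are invented.

```python
def join_on(left, right, keys):
    """Pair up rows (dicts) from both tables whose values agree on `keys`."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in keys)
    ]


def restrict(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


# Invented sample data; attribute names mirror those of the query above.
source_table = [
    {"phone": "P1", "source_table_building": "B1", "source_table_room": "R1"},
    {"phone": "P2", "source_table_building": "B1", "source_table_room": "R2"},
]
target_table = [
    {"phone": "P1", "target_table_building": "B1", "target_table_room": "R1"},
    {"phone": "P2", "target_table_building": "B2", "target_table_room": "R2"},
]

# Join first, then keep the tuples whose building and room both agree.
positives = restrict(
    join_on(source_table, target_table, ["phone"]),
    lambda t: t["source_table_building"] == t["target_table_building"]
    and t["source_table_room"] == t["target_table_room"],
)
print([t["phone"] for t in positives])
```

Only phone P1 survives the restriction, since P2 has a different building in the two tables. Custom mappings, renaming and filters are left out of this sketch.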
Each of these generated queries is then executed through Alf. As exemplified by Figure 1, positive results are displayed in the table at the bottom center of the GUI, whereas negative results are shown on the bottom left and right, respectively. (For non-bidirectional relations there will be no table either on the left or on the right.)
In our example, we see that only one phone (the one with MAC address 00:08:5d:00:00:01 and UCL-ID UA00001) satisfies the constraint of having the same location in both sources. For all other phones, we find inconsistencies and they thus end up in the negative results. A negative result means that either the building or room was different in the other table, or that no correspondence whatsoever was found for this phone in the other table.
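The split into positive results and the two sets of negative results can be sketched as follows. This is a plain-Python approximation, not the Alf queries the tool generates: the function `check_constraint`, the field names, and the building and room values are all assumptions made for illustration, and the sketch assumes at most one row per phone in each table.

```python
def check_constraint(source, target, mapping):
    """Return (positives, source_mismatches, target_mismatches)."""
    mac_to_id = {m["mac"]: m["ucl_id"] for m in mapping}
    target_by_id = {t["ucl_id"]: t for t in target}

    positives, source_mismatches = [], []
    for s in source:
        # Look up the corresponding target row via the intermediate table.
        t = target_by_id.get(mac_to_id.get(s["mac"]))
        same_location = t is not None and (
            (s["building"], s["room"]) == (t["building"], t["room"])
        )
        if same_location:
            positives.append((s, t))
        else:
            # Either the location differs or no correspondence was found.
            source_mismatches.append(s)

    matched_ids = {t["ucl_id"] for _, t in positives}
    target_mismatches = [t for t in target if t["ucl_id"] not in matched_ids]
    return positives, source_mismatches, target_mismatches


# Invented rows: one consistent phone, one with a different room,
# and one phone in each table without any correspondence.
network = [
    {"mac": "00:08:5d:00:00:01", "building": "B1", "room": "R1"},
    {"mac": "00:08:5d:00:00:02", "building": "B2", "room": "R3"},
    {"mac": "00:08:5d:00:00:03", "building": "B1", "room": "R9"},
]
deployments = [
    {"ucl_id": "UA00001", "building": "B1", "room": "R1"},
    {"ucl_id": "UA00002", "building": "B2", "room": "R2"},
    {"ucl_id": "UA00004", "building": "B3", "room": "R1"},
]
attributions = [
    {"mac": "00:08:5d:00:00:01", "ucl_id": "UA00001"},
    {"mac": "00:08:5d:00:00:02", "ucl_id": "UA00002"},
]

positives, net_bad, dep_bad = check_constraint(network, deployments, attributions)
print(len(positives), [n["mac"] for n in net_bad], [d["ucl_id"] for d in dep_bad])
```

With this sample data there is a single positive result, while the other two phones of each table end up in that table's negative results, mirroring the two causes of a negative result named above.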
Whereas the presented positive and negative results already provide a lot of useful information about detected (in)consistencies in the data, they are not always easy for the end-user to interpret because they are not shown in the context of the original tables. For this purpose, our tool provides an alternative highlighted view which simply highlights the detected (in)consistencies in the original tables. To open this view it suffices to click on the button ‘Highlighted view’ at the bottom of the intensional view editor.
Footnote 2: More details on the query generation process can be found in [8].
Fig. 2. Inspecting data (in)consistencies with the highlighted view.
Figure 2 illustrates what this highlighted view would look like for our previous example. It displays each of the concerned tables, that is, the source and target tables but also the intermediate table used for defining the key mapping. For each of these tables the tuples are coloured either in red if they correspond to an inconsistency, in green if they correspond to a positive result, or just appear in white if the tuple is not concerned by this particular constraint.
In our example, we see that three tables are concerned: the locations from the network and from deployments, but also the intermediate attribution table which maps MAC addresses to phone IDs. The only positive case appears in green, all others in red. One element in the attributions table appears in white because no element in either the network or deployments table had such MAC address or UCL-ID.
Using the highlighted view we can observe, for instance, that the information for the phone with MAC address 00:08:5d:00:00:02 and UCL-ID UA00002 is inconsistent, since it appears with location SC052–A003 in the network table, whereas it has location SC051–A 002 in the deployments table.
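The colouring rule of the highlighted view could be approximated as below. This is only a sketch of the behaviour described in the text, not the tool's implementation: the function `colour_rows`, the field names and the sample rows are assumptions, and the rule for white rows in the attributions table is inferred from the single example given above.

```python
def colour_rows(network, deployments, attributions, positive_pairs):
    """Assign 'green', 'red' or 'white' to every row of the three tables.

    `positive_pairs` is a set of (mac, ucl_id) pairs that satisfy the
    constraint.
    """
    green_macs = {mac for mac, _ in positive_pairs}
    green_ids = {ucl_id for _, ucl_id in positive_pairs}
    known_macs = {n["mac"] for n in network}
    known_ids = {d["ucl_id"] for d in deployments}

    def colour_attribution(a):
        if (a["mac"], a["ucl_id"]) in positive_pairs:
            return "green"
        # Not concerned: neither side of the mapping occurs in the data.
        if a["mac"] not in known_macs and a["ucl_id"] not in known_ids:
            return "white"
        return "red"

    return {
        "network": ["green" if n["mac"] in green_macs else "red" for n in network],
        "deployments": [
            "green" if d["ucl_id"] in green_ids else "red" for d in deployments
        ],
        "attributions": [colour_attribution(a) for a in attributions],
    }


# Invented sample rows; only the keys matter for the colouring.
network = [{"mac": "00:08:5d:00:00:01"}, {"mac": "00:08:5d:00:00:02"}]
deployments = [{"ucl_id": "UA00001"}, {"ucl_id": "UA00002"}]
attributions = [
    {"mac": "00:08:5d:00:00:01", "ucl_id": "UA00001"},
    {"mac": "00:08:5d:00:00:02", "ucl_id": "UA00002"},
    {"mac": "00:08:5d:00:00:09", "ucl_id": "UA00009"},
]
colours = colour_rows(
    network, deployments, attributions, {("00:08:5d:00:00:01", "UA00001")}
)
print(colours)
```

In this sample the first phone is green in all three tables, the second is red, and the third attribution row stays white because neither its MAC address nor its UCL-ID occurs in the other two tables.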
IV. VALIDATION
As explained above, intensional views allow the end-user to declare high-level constraints between data sources with relative ease, and reported inconsistencies can then be inspected in two different views to help identify the causes of the inconsistencies.
In our actual case study, containing the data for about 6500 phones, many inconsistencies were found, such as missing phones, missing information for a given phone, missing mappings between phone IDs and their MAC address, and unknown buildings. All these inconsistencies can be found with our tool. The number of phones dealt with and the number of inconsistencies discovered were simply too large to be handled manually, which was the prime reason for creating this intensional view tool for analyzing data inconsistencies.
For the constraint declared in the previous section, for example, when applied to the 6500 IP phones in use at the university, comparing locations in network and deployments
If time (and beamer) permit: video of comparing localisation from network and deployment sources
Consistency of localisation data (network vs. deployment)
• 1322 positive results
• 2787/4098 (68%) negatives in the network source
• 2057/3368 (61%) negatives in the deployment source
Identical localisation data in all three sources
• ~1000/7512 (13%) cases
Contradictory data in all three sources
• 104/7512 (1%) cases
Missing attributions in the network source
• 701/4098 (17%) cases
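As a quick arithmetic check, the percentages on this slide can be recomputed from the reported counts. The short Python sketch below uses only the figures listed above; the helper `percent` and the dictionary labels are introduced here for illustration, and the count of 1000 is the approximate value given on the slide.

```python
def percent(part, total):
    """Share of `part` in `total`, rounded to a whole percentage."""
    return round(100 * part / total)


# (count, total) pairs exactly as reported on the slide.
reported = {
    "negatives in the network source": (2787, 4098),
    "negatives in the deployment source": (2057, 3368),
    "identical data in all three sources (approximate count)": (1000, 7512),
    "contradictory data in all three sources": (104, 7512),
    "missing attributions in the network source": (701, 4098),
}

percentages = {label: percent(*counts) for label, counts in reported.items()}
for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the rounded percentages shown on the slide.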
Achieved Objectives:
• New approach to express, verify and visualise data constraints
• Combining intensional views and relational algebra
Possible Improvements:
• Constraints on more than two tables
• Increase expressivity (aggregation, user-defined predicates, deviations, logic queries)
• Ergonomics of the user interface, efficiency improvements, …
Cross-Fertilisation:
• Think out of the box
• Apply old ideas to new domains (here we applied code tools to data)
• (Could DB tools also be applied to code by seeing it as structured data?)