PQLite: Provenance Query Language
PQLite: An Overly Simplistic Query Language for Data Provenance
Michael {Leece, Sevilla}
[email protected]@soe.ucsc.edu
CMPS203 Final Project, University of California, Santa Cruz, Jack Baskin School of Engineering
Overview
• Introduction
• Current Work
• Design and Implementation
• Conclusions
Introduction
Terminology
• Provenance: the history and ancestry of an object [1]
  – Processes
  – Data
• Provenance-Aware Storage System (PASS)
  – Transparent collection of provenance
• PQL: Path Query Language
  – Useful for querying provenance
Figure: Ancestry Graph
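An ancestry graph records, for each process or data object, the objects it was derived from. The Haskell sketch below is our own illustration and not PQLite's code. The names NodeKind, Node, AncestryGraph, parentsOf and ancestorsOf are hypothetical. One-step ancestry is a map lookup. Full ancestry is a depth-first search that keeps a visited set, so a cycle in the recorded graph cannot cause an infinite loop.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical node type: provenance covers both processes and data.
data NodeKind = Process | File
  deriving (Eq, Ord, Show)

data Node = Node { kind :: NodeKind, name :: String }
  deriving (Eq, Ord, Show)

-- Each node maps to the nodes it was derived from (its one-step ancestors).
type AncestryGraph = Map.Map Node [Node]

-- One-step ancestors: the direct parents recorded for a node.
parentsOf :: AncestryGraph -> Node -> [Node]
parentsOf g n = Map.findWithDefault [] n g

-- All ancestors, found by depth-first search. The visited set guarantees
-- termination even if the graph happens to contain a cycle.
ancestorsOf :: AncestryGraph -> Node -> Set.Set Node
ancestorsOf g start = go Set.empty (parentsOf g start)
  where
    go seen [] = seen
    go seen (n : rest)
      | n `Set.member` seen = go seen rest
      | otherwise = go (Set.insert n seen) (parentsOf g n ++ rest)
```

Consider a graph where /file.txt lists /bin/cat as a parent and /bin/cat lists /bin/bash. In that case parentsOf returns only /bin/cat, and ancestorsOf returns both processes.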
Introduction
Applications
• Security
• File System Search
• The Cloud
• New Hierarchical File Systems
• Yan Li's Photo Album
Current Work
PQL Broken
• Obtained PASSv2
• Ran a PQL query on the provenance database
  – Infinite loops
  – {} (empty results)
• "The problem with PQL and Sage is that the implementation… is really slow, and it's perhaps too easy to generate PQL queries that do not return any data."
  – PASS Team
Current Work
PQL Undocumented
Current Work
Overview
Figure: PASSv2 overview (labels: Waldo, Database Dump, PASSv2 Modules, Kernel Space, VFS, Lasagna FS, User Space, App1, App2, BDB, .twig)
Design & Implementation
Use Case
• What we have
    [ P ] 1.0 INODE 4 INODE 12
    [ P ] 1.0 NAME 9 "/file.txt"
    [ P ] 1.0 TYPE 4 "FILE"
    [ P ] 1.0 FREEZETIME 8 TIME 1329510432.493134083
    [ P ] 1.0 FREEZETIME 8 TIME 1329510618.420311721
    [ P ] 1.0 FREEZETIME 8 TIME 1329510676.040716382
    [AP ] 1.1 INPUT 12 --> 2.1
    [AP ] 1.2 INPUT 12 --> 8.1
    [AP ] 1.3 INPUT 12 --> 16.2
    [ PT] 2.0 ARGV 4 [1]"cat"
    [ PT] 2.0 ENV 64 [2]"SHELL=/bin/bash"
        [3]"TERM=xterm"
        [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c364-1329508695.722050-291519720"
        [5]"USER=root"
        [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:"
        [7]"MAIL=/var/mail/root"
        [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        [9]"PWD=/test"
        [10]"LANG=en_US.UTF-8"
        [11]"SHLVL=1"
        [12]"HOME=/root"
        [13]"LOGNAME=root"
        [14]"LESSOPEN=| /usr/bin/lesspipe %s"
        [15]"LESSCLOSE=/usr/bin/lesspipe %s %s"
        [16]"_=/bin/cat"
        [17]"OLDPWD=/"
    [ ] 2.0 EXECTIME 8 TIME 1329510428.104272662
    [ P ] 2.0 TYPE 4 "PROC"
    [ ] 2.0 PID 4 INT 13739
    [ P ] 2.0 NAME 8 "/bin/cat"
    [A ] 2.0 FORKPARENT 12 --> 14762.0
    [ P ] 2.0 FREEZETIME 8 TIME 1329510428.104272662
• What we want
  – A list of the files or processes that are one-step ancestors of "/file.txt"
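The Haskell sketch below shows one way to read the dump lines above, using ReadP from base. It assumes each line holds five parts:
• bracketed flags
• a pnode.version identifier
• an attribute name
• a number we take to be a size field (the slides do not say)
• a value

Values of the form "--> P.V" are treated as references to another object version. Everything else is kept as raw text. Record, Value and parseRecord are hypothetical names, not the project's dump parser.

```haskell
import Data.Char (isDigit, isUpper)
import Text.ParserCombinators.ReadP

-- Hypothetical value of one dump record.
data Value
  = Ref Int Int   -- "--> 2.1": a reference to pnode 2, version 1 (assumed reading)
  | Raw String    -- anything else, kept as unparsed text
  deriving (Eq, Show)

data Record = Record
  { flags   :: String  -- text between the brackets, e.g. "AP "
  , pnode   :: Int
  , version :: Int
  , attr    :: String  -- e.g. "NAME", "INPUT", "FORKPARENT"
  , value   :: Value
  } deriving (Eq, Show)

number :: ReadP Int
number = read <$> munch1 isDigit

-- "P.V" identifier, e.g. "1.0" or "14762.0".
ident :: ReadP (Int, Int)
ident = (,) <$> number <* char '.' <*> number

record :: ReadP Record
record = do
  fl <- between (char '[') (char ']') (munch (/= ']'))
  skipSpaces
  (p, v) <- ident
  skipSpaces
  a <- munch1 (\c -> isUpper c || c == '_')
  skipSpaces
  _ <- number  -- assumed size field; its meaning is not given, so it is ignored
  skipSpaces
  -- Prefer a reference; otherwise keep the rest of the line verbatim.
  val <- (uncurry Ref <$> (string "-->" *> skipSpaces *> ident))
         <++ (Raw <$> munch (const True))
  eof
  return (Record fl p v a val)

parseRecord :: String -> Maybe Record
parseRecord s = case readP_to_S record s of
  [(r, "")] -> Just r
  _         -> Nothing
```

This reading assumes the first number is the pnode and the second the version. Under it, the record [AP ] 1.1 INPUT 12 --> 2.1 becomes a reference from version 1 of /file.txt to version 1 of /bin/cat. That is the kind of edge an ancestry graph would be built from.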
Design & Implementation
Use Case (cont.)
Figure: query pipeline
  – Waldo Database Dump -> Dump Parser -> Ancestry Graph and Label Map
  – Query -> Query Parser -> Abstract Syntax Tree
  – Evaluator -> Response
Query: SELECT r FROM Graph AS r WHERE r.child = "/file.txt"
Label Map: 1 -> file.txt, 2 -> jazz.jpg, 3 -> bacon.txt, …
Abstract Syntax Tree: Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
Response: [(MyNode "/usr/bin/pico" 1,1,[2]),(MyNode "/usr/bin/vi" 2,3,[17,16,15]),(MyNode "/bin/cat" 1,4,[0])]
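The Label Map in the pipeline pairs integer labels with names (1 -> file.txt, 2 -> jazz.jpg, 3 -> bacon.txt). Below is a minimal sketch of building such a map. It assumes labels are handed out in order of first appearance, starting at 1. LabelMap, intern and buildLabelMap are illustrative names. Both directions are kept, so an evaluator could work on integers and still print names in its response.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical two-way label map.
data LabelMap = LabelMap
  { toLabel :: Map.Map String Int  -- name -> label
  , toName  :: Map.Map Int String  -- label -> name
  } deriving Show

emptyLabels :: LabelMap
emptyLabels = LabelMap Map.empty Map.empty

-- Return the label for a name, allocating the next one if the name is new.
intern :: String -> LabelMap -> (Int, LabelMap)
intern n lm@(LabelMap fwd back) =
  case Map.lookup n fwd of
    Just l  -> (l, lm)
    Nothing ->
      let l = Map.size fwd + 1  -- labels start at 1, as on the slide
      in (l, LabelMap (Map.insert n l fwd) (Map.insert l n back))

-- Label every name in order; repeated names keep their first label.
buildLabelMap :: [String] -> LabelMap
buildLabelMap = foldl' (\lm n -> snd (intern n lm)) emptyLabels
```

intern has the shape s -> (a, s), which is exactly a State action, so it could equally be written in the State monad.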
Design & Implementation
Language Specification
Select Statement
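The grammar itself is not reproduced in this transcript. As a hedged sketch, the parser below accepts only the single form used in the use case: SELECT var FROM source AS alias WHERE alias.field = "string". It builds an AST shaped like the one printed on the Use Case slide. The type declarations are our reconstruction, chosen so that the printed value typechecks; the project's real types and grammar may differ. It uses ReadP from base rather than Parsec, so it needs no extra packages.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

-- Hypothetical AST types matching the printed
-- Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
data Query = Select String [From] [Expr] deriving (Eq, Show)
data From  = From [Alias] deriving (Eq, Show)
data Alias = Alias String String deriving (Eq, Show)
data Op    = Eq deriving (Eq, Show)
data Expr
  = Duo Op Expr Expr          -- binary operation
  | PathType String [String]  -- variable plus field path, e.g. r.child
  | Str String                -- string literal
  deriving (Eq, Show)

-- Skip leading whitespace before every token.
token :: ReadP a -> ReadP a
token p = skipSpaces *> p

keyword :: String -> ReadP String
keyword = token . string

identifier :: ReadP String
identifier = token ((:) <$> satisfy isAlpha <*> munch isAlphaNum)

stringLit :: ReadP String
stringLit = token (between (char '"') (char '"') (munch (/= '"')))

-- A variable followed by one or more ".field" steps.
path :: ReadP Expr
path = PathType <$> identifier <*> many1 (char '.' *> munch1 isAlphaNum)

-- Only equality against a string literal, as in the use case.
expr :: ReadP Expr
expr = do
  lhs <- path
  _ <- keyword "="
  rhs <- Str <$> stringLit
  return (Duo Eq lhs rhs)

query :: ReadP Query
query = do
  _ <- keyword "SELECT"
  v <- identifier
  _ <- keyword "FROM"
  src <- identifier
  _ <- keyword "AS"
  alias <- identifier
  _ <- keyword "WHERE"
  cond <- expr
  skipSpaces
  eof
  return (Select v [From [Alias src alias]] [cond])

-- Succeed only on a single complete parse.
parseQuery :: String -> Maybe Query
parseQuery s = case [q | (q, "") <- readP_to_S query s] of
  [q] -> Just q
  _   -> Nothing
```

Applied to the use-case query, parseQuery returns a Just value whose show output is the AST printed on the Use Case slide.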
Design & Implementation
Language Specification
Expression
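Below is a sketch of evaluating a WHERE expression under assumptions of our own:
• each row bound to the query variable is a record of named fields
• a path such as r.child looks a field up through nested records
• Eq compares the two sides structurally

Value, Env, evalExpr, whereClause and edge are hypothetical names. The Expr constructors reuse those from the printed AST. Failures (an unbound variable or a missing field) are reported with Maybe rather than an exception.

```haskell
import qualified Data.Map.Strict as Map

data Op = Eq deriving (Eq, Show)
data Expr = Duo Op Expr Expr | PathType String [String] | Str String
  deriving (Eq, Show)

-- Hypothetical runtime values: strings, booleans, and records of named fields.
data Value = VStr String | VBool Bool | VRec (Map.Map String Value)
  deriving (Eq, Show)

type Env = Map.Map String Value  -- query variable -> bound row

-- Nothing signals an unbound variable or a missing field.
evalExpr :: Env -> Expr -> Maybe Value
evalExpr _ (Str s) = Just (VStr s)
evalExpr env (PathType v fields) = Map.lookup v env >>= walk fields
  where
    -- Follow each field name into a nested record.
    walk [] val = Just val
    walk (f : fs) (VRec m) = Map.lookup f m >>= walk fs
    walk _ _ = Nothing
evalExpr env (Duo Eq a b) = do
  x <- evalExpr env a
  y <- evalExpr env b
  return (VBool (x == y))

-- Keep the rows for which the condition evaluates to true.
whereClause :: String -> Expr -> [Value] -> [Value]
whereClause var cond rows =
  [row | row <- rows, evalExpr (Map.singleton var row) cond == Just (VBool True)]

-- Hypothetical graph edge with "parent" and "child" fields.
edge :: String -> String -> Value
edge p c = VRec (Map.fromList [("parent", VStr p), ("child", VStr c)])
```

Suppose edges are modeled as records with parent and child fields. Then the use-case condition r.child = "/file.txt" keeps exactly the edges into /file.txt, and projecting their parent fields would give the one-step ancestors.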
Conclusions
What We Did Well
• Functional
  – It works (PQLite > PQL)
• Easy to use
  – Intuitive, SQL-like way of querying a provenance graph
  – Returns the information we care about
Conclusions
Lessons Learned
• Infinite recursion in parsing
  – Caused by left recursion in a recursive descent parser
  – Fixed by refining the syntax
• Began coding too soon
• Monads are useful
  – IO, Maybe, State, Parsec
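The left-recursion problem can be illustrated with field paths such as r.child.name. The sketch below uses ReadP from base; Path, name, path and parsePath are hypothetical. The left-recursive rule appears only in a comment, because running it would loop. The refined rule parses a name followed by zero or more ".name" suffixes and folds them into the same left-nested tree. Parsec's many and chainl1 combinators support the same rewrite.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

data Path = Var String | Field Path String deriving (Eq, Show)

name :: ReadP String
name = (:) <$> satisfy isAlpha <*> munch isAlphaNum

-- Left-recursive grammar:   path ::= path "." name | name
-- Transcribed directly, the first alternative calls itself before
-- consuming any input, so the parser recurses forever:
--
--   badPath = (Field <$> badPath <* char '.' <*> name) +++ (Var <$> name)
--
-- Refined grammar:          path ::= name ("." name)*
-- Parse the leading name, then fold the suffixes to the left, which
-- rebuilds the left-nested tree without any left recursion.
path :: ReadP Path
path = do
  v <- name
  fields <- many (char '.' *> name)
  return (foldl Field (Var v) fields)

-- Succeed only on a single parse that consumes the whole input.
parsePath :: String -> Maybe Path
parsePath s = case [p | (p, "") <- readP_to_S path s] of
  [p] -> Just p
  _   -> Nothing
```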
Conclusions
References
1) Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July 2005.
2) Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June 2011.
3) Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.
4) PQL Language Guide and Reference.