PDQ: Proof-driven Querying presentation
![Page 1: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/1.jpg)
A short intro to PDQ: Proof-driven Querying
Michael Benedikt
with Julien Leblay, Efi Tsamoura, and Michael Vanden Boom
![Page 2: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/2.jpg)
Background
DBOnto: Semantics for a better world
• Enable new applications
• Deliver better performance for current data-intensive tasks
• Diminish effort in integrating complex data sources
Exploit semantics of data: within a single source, among distributed sources, across data models
![Page 3: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/3.jpg)
Background
Dimensions of Semantic Data
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 5: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/5.jpg)
Background
Semantic Data Technology
Semantic Web
• RDF data model, description logic constraints
• Inherently incomplete sources
• Certain answer semantics
• Wide range of target implementations
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 6: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/6.jpg)
Background
Semantic Data Technology
Query Optimization with Constraints
• Relational data model and constraints
• Complete information
• Access via lookup indices in sources
• Compile to plan language of DBMS
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 7: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/7.jpg)
Background
Semantic Data Technology
Query Optimization with Constraints via Reformulation
• Relational data model and constraints
• Complete sources
• Compile to query language (e.g. SQL)
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 8: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/8.jpg)
Background
Semantic Data Technology
Query Rewriting with Exact Views
• Relational sources and constraints
• Base data may not be accessible
• Can still look for exact answers to queries
• Compile to query language (e.g. SQL)
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 9: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/9.jpg)
Background
Semantic Data Technology
Federated Querying Over Web-based Sources
• Model sources and constraints relationally
• Complete information on subset of sources
• Distributed sources with mix of access regimes
• Compile to middleware plan
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
![Page 10: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/10.jpg)
Background
Long-term PDQ vision
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
PDQ
![Page 11: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/11.jpg)
Functionality
PDQ: what it is today
Unified framework for:
• Query Optimization/Reformulation with Constraints
• Querying with Materialized Views
• Federated Querying with Complete Information
System for answering queries Q in the presence of semantic relationships and access restrictions on sources
Targets:
• Relational data model and constraints
• Sufficient accessible information assumption: there is sufficient accessible data to obtain the exact answers to the query Q
• Compilation into a “static plan” (reformulation, physical plan, middleware plan)
![Page 12: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/12.jpg)
Functionality
PDQ: what it is
Inputs to the PDQ planner:
• Query Q
• Metadata, including a description D of access to sources and integrity constraints C
• Cost information (e.g. cost function on plans)
Output of the PDQ planner:
• Pbest: plan using the access model described by D, with minimal cost, giving the exact answer to Q for databases satisfying constraints C
PDQ runtime: executes plans on top of Web-based or local data sources
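To make the planner's inputs concrete, here is a minimal sketch of how Q, D, C and the cost information might be represented. Every class name, field name, and the Profinfo/Udirectory schema is a hypothetical illustration, not PDQ's actual API or input format.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Atom:
    """A relational atom; terms starting with '?' are variables (assumed convention)."""
    relation: str
    terms: Tuple[str, ...]


@dataclass(frozen=True)
class AccessMethod:
    """Part of D: a relation can be read only if these positions are given as inputs."""
    relation: str
    input_positions: Tuple[int, ...]  # empty tuple means an unrestricted scan


@dataclass(frozen=True)
class Constraint:
    """Part of C: whenever the body atoms hold, the head atoms hold too."""
    body: Tuple[Atom, ...]
    head: Tuple[Atom, ...]


@dataclass(frozen=True)
class PlanningProblem:
    query: Tuple[Atom, ...]                  # Q, as a conjunction of atoms
    access_methods: Tuple[AccessMethod, ...]  # D
    constraints: Tuple[Constraint, ...]       # C
    cost: Callable[[Sequence[str]], float]    # cost function on plans


# Hypothetical schema: Profinfo needs an employee id as input, Udirectory is freely scannable.
profinfo = Atom("Profinfo", ("?eid", "?onum", "Smith"))
problem = PlanningProblem(
    query=(profinfo,),
    access_methods=(AccessMethod("Profinfo", (0,)), AccessMethod("Udirectory", ())),
    constraints=(
        Constraint(
            body=(Atom("Profinfo", ("?eid", "?onum", "?lname")),),
            head=(Atom("Udirectory", ("?eid", "?lname")),),
        ),
    ),
    cost=lambda plan: float(len(plan)),  # placeholder: one unit per access command
)
```

The planner would take such a problem and return a plan (Pbest in the slide); the runtime would then execute that plan against the sources.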
![Page 13: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/13.jpg)
Under the hood
PDQ: how it works (sort of)
Key observation: Under the sufficient accessible information assumption on Q, C, D there is always a “static plan” (e.g. relational algebra query) PQ that can be run to answer Q.
We can find such a PQ by looking for a “proof that there is sufficient information to answer Q”.
• First main component: procedures to turn “proofs of answerability” into plans
  • Proof-to-plan procedure works for extremely rich class of integrity constraints
  • Adaptable to different target implementations (SQL query, physical plan, distributed plan…)
• These “proof-to-plan” procedures are coupled with a reasoning system for finding the proofs of answerability.
  • Plug-in architecture: Chase procedure, Tableau-based FO theorem-prover, …
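The idea of reading a plan off a proof can be illustrated with a toy sketch. It assumes a heavily simplified setting: constraints are single-atom rules without existential variables, the "proof" is a chase of the query's canonical facts followed by a greedy pass that marks facts as exposed once their access inputs are known, and the "plan" is the list of accesses in the order they fired. The function names and the Profinfo/Udirectory schema are hypothetical; the actual PDQ procedures handle a far richer class of constraints.

```python
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, Tuple[str, ...]]    # (relation, values)
Rule = Tuple[Fact, Fact]              # body atom implies head atom; '?x' marks a variable
Access = Tuple[str, Tuple[int, ...]]  # (relation, positions that must be supplied)


def match(pattern: Fact, fact: Fact) -> Optional[Dict[str, str]]:
    """Return a binding of the pattern's variables onto the fact, or None."""
    if pattern[0] != fact[0] or len(pattern[1]) != len(fact[1]):
        return None
    binding: Dict[str, str] = {}
    for term, value in zip(pattern[1], fact[1]):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def chase(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Saturate the facts under the rules (head variables must occur in the body)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for fact in list(facts):
                binding = match(body, fact)
                if binding is None:
                    continue
                derived = (head[0], tuple(binding.get(t, t) for t in head[1]))
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


def proof_to_plan(query_facts: Set[Fact], rules: List[Rule],
                  accesses: List[Access], constants: Set[str]) -> Optional[List[Access]]:
    """Return accesses in an order that exposes every query fact, or None if stuck."""
    facts = chase(query_facts, rules)
    known = set(constants)       # values we may pass as access inputs
    exposed: Set[Fact] = set()   # facts shown to be retrievable
    plan: List[Access] = []
    progress = True
    while progress and not query_facts <= exposed:
        progress = False
        for rel, inputs in accesses:
            for fact in sorted(facts - exposed):
                if fact[0] == rel and all(fact[1][i] in known for i in inputs):
                    exposed.add(fact)
                    known.update(fact[1])
                    plan.append((rel, inputs))
                    progress = True
    return plan if query_facts <= exposed else None


# Canonical facts of a hypothetical query: "find Profinfo rows with last name Smith".
query = {("Profinfo", ("e0", "o0", "Smith"))}
rules = [(("Profinfo", ("?e", "?o", "?l")), ("Udirectory", ("?e", "?l")))]
accesses = [("Profinfo", (0,)), ("Udirectory", ())]

plan = proof_to_plan(query, rules, accesses, {"Smith"})
no_plan = proof_to_plan(query, [], accesses, {"Smith"})
```

With the constraint, the sketch scans Udirectory first to learn employee ids and then looks up Profinfo; without the constraint no proof exists and no plan is returned, which mirrors the role the integrity constraints play in answerability.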
![Page 14: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/14.jpg)
Under the hood
PDQ: how it works in a bit more detail
Inputs to the PDQ planner:
• Query Q
• Metadata, including a description D of access to sources and integrity constraints C
• Cost information (e.g. cost function on plans)
Components of the PDQ planner:
• Reasoning system for finding “proofs of answerability”
• Proof-to-Plan conversion
![Page 15: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/15.jpg)
Under the hood
PDQ: how it works, still more
We can find a static plan PQ getting the exact answer to Q by looking for a “proof that Q is answerable” and then applying a proof-to-plan procedure.
Last component, the search strategy: we can find a good PQ by searching for a proof that
1. witnesses that Q is answerable
2. generates a low-cost plan
Search is directed by proof goal and cost.
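A search directed by both goal and cost can be sketched as a small branch-and-bound over sequences of accesses. This is an illustration under stated assumptions, not PDQ's search algorithm: each access is described only by the variables it needs, the variables it yields, and a fixed additive cost, and all names and cost figures are made up.

```python
from typing import FrozenSet, List, Optional, Tuple

# (name, variables needed as input, variables produced, cost); all values hypothetical
Access = Tuple[str, FrozenSet[str], FrozenSet[str], float]


def cheapest_plan(goal: FrozenSet[str],
                  accesses: List[Access]) -> Tuple[Optional[List[str]], float]:
    """Depth-first search over access sequences, pruning branches costlier than the best so far."""
    best_cost = float("inf")
    best_plan: Optional[List[str]] = None

    def search(known: FrozenSet[str], plan: List[str], cost: float) -> None:
        nonlocal best_cost, best_plan
        if cost >= best_cost:
            return  # cost-directed pruning
        if goal <= known:
            best_cost, best_plan = cost, list(plan)  # goal reached more cheaply
            return
        for name, needs, gives, access_cost in accesses:
            # Skip accesses already used, not yet executable, or yielding nothing new.
            if name in plan or not needs <= known or gives <= known:
                continue
            plan.append(name)
            search(known | gives, plan, cost + access_cost)
            plan.pop()

    search(frozenset(), [], 0.0)
    return best_plan, best_cost


accesses = [
    ("udirectory_scan", frozenset(), frozenset({"eid", "lname"}), 10.0),
    ("profinfo_lookup", frozenset({"eid"}), frozenset({"onum", "lname"}), 1.0),
    ("profinfo_dump", frozenset(), frozenset({"eid", "onum", "lname"}), 50.0),
]
plan, cost = cheapest_plan(frozenset({"eid", "onum"}), accesses)
```

The first complete plan found sets a cost bound, and every later partial plan that already exceeds the bound is abandoned, so the goal test and the cost estimate jointly steer the search.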
![Page 16: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/16.jpg)
Under the hood
PDQ architecture
![Page 17: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/17.jpg)
Status
PDQ today and tomorrow
• Theoretical basis given in PODS 2014 paper
• Demonstration implemented over web services in VLDB 2014
• Implementation generates SQL reformulation over relational sources (run on top of Postgres)
Moving forward:
• Pilot project beginning Oct 2014 to explore “native implementation” of PDQ on top of the plan language of the LogicBlox DBMS
• Large EPSRC-funded project 2015-2020 to explore diverse uses of PDQ
![Page 18: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/18.jpg)
Status
PDQ today and tomorrow
Completeness of Sources / Source Access Model
Target Implementation
Data model for queries and constraints
PDQ 2014
PDQ 2020
![Page 19: PDQ: Proof-driven Querying presentation](https://reader036.fdocuments.us/reader036/viewer/2022062514/55837a93d8b42ac6268b4ca5/html5/thumbnails/19.jpg)
PDQ: Next Steps
Next Steps
• More info at http://cs.ox.ac.uk/pdq
• See the demo!