Stardog talk-dc-march-17
![Page 1: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/1.jpg)
Kendall Clark, CEO
Clark & Parsia, LLC
Thursday, March 17, 2011
![Page 2: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/2.jpg)
About C&P
• We build semantic technology infrastructure and enterprise solutions
• Pellet, the leading OWL reasoner
• POPS Expertise Location system
• Bootstrapped since 2005
• Offices in DC and Cambridge, MA
• Government & enterprise customers
• First talk ever was at LOC in 2005 :)
![Page 3: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/3.jpg)
![Page 4: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/4.jpg)
TLDR?
• Java RDF database (“quad store”) (no native code)
• Freemium model:
• enterprise & community editions
• OEM
• Performance for complex SPARQL queries
• Best available reasoning support
![Page 5: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/5.jpg)
NoSQL and SemWeb
• SemWeb is schemaless and schema-rich
• As agile as NoSQL stores
• More expressive than SQL
• Standards based
• Graph DBs are all ad hoc
• Query Language and, you know, joins
• Do you really want to write map-reduce programs...only?! We sure don’t...!
![Page 6: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/6.jpg)
Why another RDF DB?
• We’re scratching our itch for fast query for integration & decision support apps
• aimed at db-reasoner “tweener” space
• operationally agile
• There’s a hole in the market; or: markets are normal distributions (probably)
• Gives us a complete semantic application platform
![Page 7: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/7.jpg)
Commercial Market
• 6 products
• Technically homogeneous:
• Sagan-like scale obsession
• Mostly ad hoc reasoning
• Weak perf on complex queries
• Ho-hum feature sets & integrations
• See http://bit.ly/92P8eN for more
![Page 8: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/8.jpg)
Stardog 1.0: Overview
• Fast
• Lightweight
• Rich API support
• Logical & statistical inference
• Transactions
• Full-text search
• Graph algorithms and path language
• awesome mascot!
![Page 9: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/9.jpg)
Fast? No, Really Fast!
• First design goal in Stardog is performance of complex SPARQL query eval on single machine in the default configuration
• Next, total queries per second
• In-memory mode available, when needed
• Early testing is promising: fastest RDF DB on SP2B benchmark. Often several times faster.
![Page 10: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/10.jpg)
Performance
• Do your own testing; the only queries that matter are yours; don’t trust, test.
• It’s not ready till it’s very, very fast.
• Flatten the RDF performance tax
• About 256 GB for ~2B triples in main-memory mode, i.e., $20k Dell box.
• When in doubt: Add. More. RAM.
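As a rough back-of-the-envelope check on the sizing figure above, the following sketch divides memory by triple count. It assumes decimal gigabytes (10^9 bytes) and exactly 2 billion triples; the slide specifies neither, and the names are illustrative only.

```python
# Back-of-the-envelope sizing from the slide's figures.
# Assumptions (not from the talk): GB means 10**9 bytes, "~2B" means exactly 2e9 triples.
MEMORY_BYTES = 256 * 10**9
TRIPLES = 2 * 10**9


def bytes_per_triple(memory_bytes: int, triples: int) -> float:
    """Average main-memory footprint per triple, indexes and overhead included."""
    return memory_bytes / triples


print(f"{bytes_per_triple(MEMORY_BYTES, TRIPLES):.0f} bytes per triple")
```

Under those assumptions the figure works out to roughly 128 bytes per triple.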
![Page 11: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/11.jpg)
Scalability
• Stardog 1.0: scale up
• Disk-based joins for very large intermediate structures
• Triples compression
• Ideally efficient on-disk indices
• Stardog 2.0: scale out (shared-disk cluster)
• We think it’s easier to scale a fast DB than to speed up a scalable one...
![Page 12: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/12.jpg)
Lightweight
• ~34 KLOC for core system, ~10 KLOC of tests (1034 unit tests)
• Trivially simple installation:
• copy JAR & restart servlet container
• If you’ve ever used Sesame...
• May run: embedded, client-server; main memory or disk-backed modes; any combination of these
![Page 13: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/13.jpg)
Interfaces
• SNARL (Stardog Native API for RDF Language)
• Avro RPC—esp. the low-level TCP transport (coming soon...)—for Java & non-Java
• Sesame & Jena
• SPARQL Protocol (HTTP)
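For the SPARQL Protocol interface, a minimal sketch of what a client request could look like: in the W3C protocol, a query sent over HTTP GET travels in the `query` parameter. The endpoint URL below is hypothetical, since the talk does not give Stardog's URL layout, and no network call is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the talk does not give Stardog's actual URL layout.
ENDPOINT = "http://localhost:8080/example-db/query"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Any HTTP client can then fetch that URL, typically with an `Accept` header naming the desired result format.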
![Page 14: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/14.jpg)
Logical Inference
1. OWL 2 QL, EL, and RL “query-time” reasoning
• No materialization (so: fast bulk loading)
• reasoning enabled per-query
2. OWL 2 DL reasoning via Pellet 3.0
• in-memory, schema reasoning
3. Integrity Constraint Validation via OWL 2
4. user-defined & SWRL rules
![Page 15: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/15.jpg)
OWL validation of RDF
• Use OWL ontologies to validate RDF instance data in Stardog.
• May be used as a guard to database modifications (so, if resulting data is invalid, transaction fails).
• W3C Member Submission to formalize this approach; stay tuned for details.
• See http://clarkparsia.com/pellet/icv/ for details
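The guard idea can be illustrated with a toy model. This is only a sketch, not Stardog's or Pellet's implementation: the database is a plain set of triples, and the single constraint (every `ex:Employee` must have an `ex:name`) is a hypothetical example checked closed-world.

```python
# Toy model: a database is a set of (subject, predicate, object) triples.
# The constraint below is a hypothetical example, checked closed-world:
# every instance of ex:Employee must have at least one ex:name.
RDF_TYPE = "rdf:type"


def violations(triples):
    """Return the employees that lack a name in the given triple set."""
    employees = {s for (s, p, o) in triples if p == RDF_TYPE and o == "ex:Employee"}
    named = {s for (s, p, o) in triples if p == "ex:name"}
    return sorted(employees - named)


def guarded_add(db, additions):
    """Apply additions only if the resulting data is valid; otherwise reject them."""
    candidate = db | set(additions)
    bad = violations(candidate)
    if bad:
        raise ValueError(f"transaction rejected, invalid individuals: {bad}")
    return candidate


db = {("ex:alice", RDF_TYPE, "ex:Employee"), ("ex:alice", "ex:name", "Alice")}
db = guarded_add(db, [("ex:bob", RDF_TYPE, "ex:Employee"), ("ex:bob", "ex:name", "Bob")])

try:
    guarded_add(db, [("ex:carol", RDF_TYPE, "ex:Employee")])  # no name: rejected
except ValueError as err:
    print(err)
```

The rejected call leaves `db` untouched, which mirrors the slide's point that an invalid result makes the whole transaction fail.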
![Page 16: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/16.jpg)
OWL 2 Support
• Stardog 1.0: query-time, query rewriting reasoner for SPARQL entailment regimes
• It will support all of OWL 2 QL, EL, and RL, with exceptions:
• limited support for datatypes reasoning
• i.e., won’t support user-defined datatypes
• will depend on customer demand
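To make "query rewriting" concrete, here is a deliberately tiny sketch of the general idea for subclass reasoning only: a query for instances of a class is expanded into a UNION over that class and its subclasses, so no inferred triples need to be stored. The schema and function names are hypothetical, and this is not a description of Stardog's actual rewriting algorithm.

```python
# Hypothetical schema: child class -> parent class (rdfs:subClassOf).
SUBCLASS_OF = {"ex:Manager": "ex:Employee", "ex:Employee": "ex:Person"}


def subclasses_of(cls):
    """Return cls plus every class that is a direct or indirect subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return sorted(result)


def rewrite_type_query(cls):
    """Expand 'instances of cls' into a UNION over cls and its subclasses."""
    branches = [f"{{ ?x a {c} }}" for c in subclasses_of(cls)]
    return "SELECT ?x WHERE { " + " UNION ".join(branches) + " }"


print(rewrite_type_query("ex:Person"))
```

Because the expansion happens when the query runs, data can be bulk loaded without a materialization step, and reasoning can be switched on or off per query.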
![Page 17: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/17.jpg)
Statistical Inference
• Corleone is a machine learning system for RDF and OWL
• Optimized for Stardog
• Multiple classifier & cluster algorithms
• Clusters (similarity) and classifies (predicts) by RDF class & individual
• Machine learning must still be tuned; no magic bullets
![Page 18: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/18.jpg)
Transactions
• Supports optional ACID transactions on database mutations
• 2-phase commit based on Java Transaction API
• Tx’d writes 2x to 8x slower, depending on lots of variables
• Writes may be asynchronous & queued
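For readers unfamiliar with two-phase commit, the protocol shape can be sketched in a few lines. This toy version does not use the Java Transaction API and says nothing about how Stardog wires it up; the participant names are hypothetical.

```python
class Participant:
    """Toy resource that votes in phase 1 and applies or discards in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes; otherwise roll back all."""
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # phase 2: abort
        p.rollback()
    return False


print(two_phase_commit([Participant("triple-index"), Participant("text-index")]))
```

The extra round of coordination is one plausible reason transactional writes cost more than plain ones.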
![Page 19: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/19.jpg)
Search
• Indexes RDF individuals and literals
• Results are 2-tuples (url|value, score)
• Based on Lucene: very fast, very scalable
• Can use 1 of 6 algorithms to partition RDF individuals from a graph
• via SPARQL DESCRIBE hook
• Will be integrated with SPARQL syntax...
![Page 20: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/20.jpg)
RDF as Graph
• SPARQL isn’t ideal for every use case
• Graph algorithm processing on RDF purely as a graph
• Stardog supports Gremlin, the ad hoc standard for graph database query languages
• Gremlin makes graph algorithms easy to write
• More optimized Gremlin support for 1.0
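As an illustration of treating RDF purely as a graph, the sketch below runs a breadth-first shortest-path search over triples, using subjects and objects as nodes and ignoring predicates. It is plain Python rather than Gremlin, and the data is made up for the example.

```python
from collections import deque

# Toy data: triples treated purely as directed, labeled edges.
TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:worksWith", "ex:d"),
    ("ex:d", "ex:knows", "ex:c"),
    ("ex:c", "ex:knows", "ex:e"),
]


def shortest_path(triples, start, goal):
    """Breadth-first search over subject -> object edges, ignoring predicates."""
    neighbors = {}
    for s, _p, o in triples:
        neighbors.setdefault(s, []).append(o)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is unreachable from start


print(shortest_path(TRIPLES, "ex:a", "ex:e"))
```

A path question like this is awkward to express in SPARQL 1.0, which is the kind of use case the slide has in mind.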
![Page 21: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/21.jpg)
Architecture
[Diagram: component stack; box labels as extracted]
Implementations
Sesame, Jena, Empire
HTTP API, Native API, Avro API
Stardog API
SPI Runtime
Transactions
Stardog RDF
Stardog Core
Query
Exec
Optimizer
Plan Filter API
Query Rewriting/Reasoning
Index API SPI
CP Util, IO Util, Stardog Util, Sesame Ext
Plan API
![Page 22: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/22.jpg)
Status
• Stardog 0.4.6 alpha release to alpha testers on 15 March 2011
• It feels damn good to ship code, even if it’s just an alpha! :)
• Weekly updates till beta period starts, then bimonthly updates till 1.0 release
![Page 23: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/23.jpg)
The Private Beta
• Doin’ it old school: private beta, invitation only
• Helps us keep commercial focus
• ~1 April to 30 May
• [email protected] if you’re interested: give name, org, area of interest, etc.
• Rolling releases, new features, bug fixes, etc
• ~90 organizations signed up for beta so far
![Page 24: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/24.jpg)
Roadmap
• 1.0 in mid-Summer
• SPARQL 1.1, MRMW
• stored procedures in any JVM lang
• Shiro-based security layer
• native OWL 2 RL reasoner
• provenance API
• graph algorithms & an RDF path language
• performance improvements continuously
![Page 25: Stardog talk-dc-march-17](https://reader033.fdocuments.us/reader033/viewer/2022060108/554e210db4c9056b798b4d56/html5/thumbnails/25.jpg)
Thanks! Questions?
• http://stardog.com/
• http://clarkparsia.com/
• http://twitter.com/candp
• http://twitter.com/stardog_db