MySql to HBase in 5 Steps

Post on 28-Aug-2014


Converting MySql or Oracle databases to Apache HBase with on-line examples using the popular Wordnet dictionary


Scott Cinnamond – TerraMeta Software Inc.

http://cloudgraph.org

CloudGraph ®

What is Wordnet®?

• Large, complex lexical (MySql) database of English.

• Nouns, verbs, adjectives and adverbs grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

• Synsets are interlinked by means of conceptual-semantic and lexical relations.

HBase Conversion Steps

http://wordnet.cloudgraph.org

1) Model Creation: reverse engineer Wordnet DB into UML®

2) Code Generation: provision persistence and query-DSL java code

3) HBase™ Table Mapping: map data graphs and row keys to table(s)

4) Data Migration: MySql to HBase

5) Services / App Creation: build services, web app

1.) Model Creation

Reverse engineer Wordnet DB into PlasmaSDO™ UML® Model

• Capture entities, properties, data types, associations, enumerations, comments as UML

• Why UML? Popular standards-based format. Editable, viewable using standard tools. Supports enterprise governance processes

• How? Maven build with plasma-maven-plugin RDB tool (goal:RDB, action:reverse, dialect:mysql)

• Download working example at https://github.com/cloudgraph/wordnet

Generated Wordnet Model (core subset of 30 total entities and enumerations)

2.) Code Generation

Provision SDO persistence and query DSL java code

• Generate Java API based on Wordnet UML Model

• Why? Reuse across RDB, HBase and other CloudGraph services. Compile-time checking for queries and all persistence logic

• How? Maven build with plasma-maven-plugin SDO and DSL tools

• See generated API Javadocs on-line at http://wordnet.cloudgraph.org

3.) HBase™ Table Mapping

Map data graphs and row keys to HBase™ table(s)

• Configure delimited, hashed, salted, formatted, composite row keys with (xpath) paths into target data graphs

• Map data graph roots to HBase tables

• Why? Automates row-key creation via data extraction processing from anywhere in your data graphs

• How? CloudGraph Configuration XML. See https://github.com/cloudgraph/wordnet
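
The row-key options listed above (delimited, hashed, salted, formatted, composite) can be illustrated with a small plain-Java sketch. This is not CloudGraph's implementation or configuration format; the class `RowKeySketch`, the field choice (lemma, part of speech, synset id), the `|` delimiter and the 16 salt buckets are all assumptions made for illustration, and the field values are assumed to have already been extracted from the data graph.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical illustration only; not CloudGraph's row-key implementation.
final class RowKeySketch {
    static final char DELIM = '|';        // assumed field delimiter
    static final int SALT_BUCKETS = 16;   // assumed number of salt buckets

    // Salted: a stable two-digit bucket derived from the value, so that
    // otherwise adjacent keys are spread over several key ranges.
    static String salt(String value) {
        int bucket = Math.floorMod(value.hashCode(), SALT_BUCKETS);
        return String.format(Locale.ROOT, "%02d", bucket);
    }

    // Hashed: a long or variable-length field replaced by a short fixed-width
    // hash (here the first 4 bytes of its MD5 digest, as hex).
    static String hashed(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format(Locale.ROOT, "%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Formatted: a numeric id zero-padded to a fixed width so keys sort correctly.
    static String formatted(long id) {
        return String.format(Locale.ROOT, "%010d", id);
    }

    // Composite and delimited: salt, lower-cased lemma, part of speech, padded id.
    static String compositeKey(String lemma, String pos, long synsetId) {
        String word = lemma.toLowerCase(Locale.ROOT);
        return salt(word) + DELIM + word + DELIM + pos + DELIM + formatted(synsetId);
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("Dog", "n", 12345L));
        System.out.println(hashed("dog"));
    }
}
```

In this sketch the salt is computed from the lemma itself, so a reader who knows the lemma can recompute the bucket and address the row directly.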

4.) Data Migration

MySql to HBase

• Create a standalone RDB-to-HBase migration app that uses the generated persistence and DSL query API to incrementally call the CloudGraph HBase and RDB services

• Why? Wordnet data is large and highly connected, so must be incrementally extracted/inserted and linked
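
One way to picture "incrementally extracted/inserted and linked" is a two-pass, batched copy: first insert every node in small batches, then add the links once all link targets exist. The sketch below is an assumption about how such a migration could be organized, not the migration app from the example repository; it uses in-memory collections in place of the MySql source and HBase target, and `MigrationSketch`, `SourceRow` and `TargetNode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; in-memory stand-ins for the RDB source and HBase target.
final class MigrationSketch {
    // A source row: id, payload, and the ids of related rows (foreign keys).
    static final class SourceRow {
        final long id;
        final String lemma;
        final List<Long> relatedIds;
        SourceRow(long id, String lemma, Long... relatedIds) {
            this.id = id;
            this.lemma = lemma;
            this.relatedIds = Arrays.asList(relatedIds);
        }
    }

    // A migrated node with its resolved links.
    static final class TargetNode {
        final String lemma;
        final List<Long> links = new ArrayList<>();
        TargetNode(String lemma) { this.lemma = lemma; }
    }

    static Map<Long, TargetNode> migrate(List<SourceRow> source, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Map<Long, TargetNode> target = new LinkedHashMap<>();

        // Pass 1: extract and insert nodes one batch at a time, without links.
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            for (SourceRow row : source.subList(from, to)) {
                target.put(row.id, new TargetNode(row.lemma));
            }
        }

        // Pass 2: link one batch at a time; cyclic references are safe
        // because every node already exists in the target.
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            for (SourceRow row : source.subList(from, to)) {
                for (Long related : row.relatedIds) {
                    if (target.containsKey(related)) {
                        target.get(row.id).links.add(related);
                    }
                }
            }
        }
        return target;
    }

    public static void main(String[] args) {
        List<SourceRow> rows = Arrays.asList(
                new SourceRow(1, "dog", 2L),
                new SourceRow(2, "canine", 3L),
                new SourceRow(3, "carnivore", 1L)); // closes a cycle back to row 1
        Map<Long, TargetNode> migrated = migrate(rows, 2);
        System.out.println(migrated.get(3L).links);
    }
}
```

Separating insertion from linking keeps each batch small while still tolerating the recursive references that make a single-pass copy awkward.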

5.) Services / App Creation

Build services, web app

• Build simple pojo services using persistence and DSL query API

• Encapsulate Wordnet business logic

• Add adapter/wrapper structures

• Call services from web app

Web App

http://wordnet.cloudgraph.org

• Auto-complete field triggers CloudGraph HBase to use the HBase fuzzy row filter API

• Find button returns all semantic and lexical relations for the selected word, including descriptions and example sentences

• Resulting relation graphs typically contain more than 100 nodes and return in less than 200 milliseconds
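
The fuzzy row filter idea behind the auto-complete field can be sketched without an HBase cluster. HBase's `FuzzyRowFilter` is given a row-key pattern plus a mask in which 0 marks a byte position that must match and 1 marks a position where any byte is accepted. The plain-Java sketch below re-implements only that matching rule over an in-memory list; it does not call HBase, and the row-key layout (two salt characters, a `|` delimiter, then the word) and the names `FuzzyMatchSketch` and `complete` are assumptions for illustration, not the layout used by the Wordnet example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fuzzy row-key matching; no HBase calls are made.
final class FuzzyMatchSketch {
    // mask[i] == 0: rowKey[i] must equal pattern[i]; mask[i] == 1: any byte is accepted.
    // Only the leading pattern.length bytes of the row key are compared (a simplification).
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Auto-complete: the two salt bytes are unknown, the delimiter and typed text are fixed.
    static List<String> complete(List<String> rowKeys, String typed) {
        byte[] pattern = ("??|" + typed).getBytes(StandardCharsets.UTF_8);
        byte[] mask = new byte[pattern.length]; // all zeros: every position fixed
        mask[0] = 1;                            // salt byte 1: any value
        mask[1] = 1;                            // salt byte 2: any value
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(key.getBytes(StandardCharsets.UTF_8), pattern, mask)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("03|dog|n", "11|dogma|n", "07|cat|n");
        System.out.println(complete(keys, "dog"));
    }
}
```

Against a real table the filter would be evaluated server-side during a scan rather than over a client-side list as here.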

Conclusions

• Complex, highly recursive RDB models can be easily converted and leveraged in HBase and future CloudGraph services

• Large lexical data graphs can be returned in a single query

• Data migration is difficult given the complex recursive model

Resources

• Download the complete CloudGraph Wordnet example: https://github.com/cloudgraph/wordnet

• Run the example online: http://wordnet.cloudgraph.org

• Project details, contact information: http://cloudgraph.org

• Beta Source Repo: https://github.com/terrameta/cloudgraph

• Production Source Repo (under construction): https://github.com/cloudgraph

Status / Legal

• Project Status

– CloudGraph ® is currently under private beta testing

• Licensing

– CloudGraph ® 0.5.5 Community Edition (CE) is open source licensed under version 2 of the GNU General Public License

• Trademarks

– WordNet ® is a registered trademark of Princeton University

– Apache HBase™ is a trademark of Apache Software Foundation

– CloudGraph ® is a trademark of TerraMeta Software LLC, TerraMeta Software Inc.