Posted on 15-Feb-2016
NoSQL Databases: Oracle Berkeley DB
Contents
• A brief intro to NoSQL
• About Berkeley DB
• About our application
What is NoSQL?
• Stands for "Not Only SQL"
• A class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do they use concepts such as joins, GROUP BY and ORDER BY
• Many NoSQL offerings relax one or more of the ACID properties
What is NoSQL?
• Next-generation databases
• Characteristics:
– Large data volumes
– Non-relational
– Distributed
– Open source
– Scalable replication and distribution
CAP Theorem
History of NoSQL
• The term NoSQL was introduced by Carlo Strozzi in 1998 to name his lightweight, file-based relational database, which did not expose an SQL interface.
• It was re-introduced in 2009 by Eric Evans, for an event organized to discuss open-source distributed databases.
Why NoSQL Databases?
• Bigness
• Massive write performance
• Fast key-value access
• Flexible schema and flexible data types
• No single point of failure
• Programming ease of use
Scaling to size vs complexity.
Berkeley DB - Introduction
• An open-source, embedded, transactional data management system.
• A key/value store.
• Runs on everything from cell phones to large servers.
• Distributed as a library that is linked directly into an application.
• Offers high reliability and high performance.
Berkeley DB Product Family Architecture
Berkeley DB: The Design Philosophy
• Provide mechanisms without specifying policies.
• For example, Berkeley DB is abstracted as a store of <key, value> pairs.
– Both keys and values are opaque byte strings.
– Berkeley DB has no schema.
– The application that embeds Berkeley DB is responsible for imposing its own schema on the data.
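To make the "opaque byte strings" point concrete, here is a minimal plain-Java sketch; it does not call Berkeley DB at all. A `TreeMap` keyed by `byte[]` stands in for the store, and a hypothetical `Employee` record plays the application's data. The `encode`/`decode` pair is where the application imposes its own schema, since the store only ever sees bytes. All class and method names here are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class OpaqueStoreDemo {
    // Hypothetical application record; the store never sees this type.
    record Employee(int id, String name, double salary) {}

    // The "schema" lives in the application: field order and types are fixed here.
    static byte[] encode(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(e.id());
            out.writeUTF(e.name());
            out.writeDouble(e.salary());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Decoding must read the fields back in exactly the order encode() wrote them.
    static Employee decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Employee(in.readInt(), in.readUTF(), in.readDouble());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Stand-in store: keys and values are raw byte arrays,
    // with keys ordered by unsigned lexicographic comparison of their bytes.
    static NavigableMap<byte[], byte[]> newStore() {
        Comparator<byte[]> order = Arrays::compareUnsigned;
        return new TreeMap<>(order);
    }

    static byte[] key(String k) {
        return k.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> store = newStore();
        store.put(key("emp:7"), encode(new Employee(7, "Ada", 1234.5)));
        Employee back = decode(store.get(key("emp:7")));
        System.out.println(back);
    }
}
```

Because the store compares keys byte by byte, a fresh `byte[]` with the same contents finds the stored entry, which a plain `HashMap<byte[], byte[]>` (identity-based array equality) would not.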
Data Access Services
• Indexing methods:
– B-tree
– Hash
– Queue
– Recno (a record-number-based index)
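As a rough intuition for these four access methods, the sketch below uses in-memory Java collections with similar access patterns. These are analogies assumed for illustration only, not how Berkeley DB implements its B-tree, Hash, Queue or Recno methods; for instance, Berkeley DB's Queue method stores fixed-length records, which an `ArrayDeque` does not model. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class AccessMethodAnalogs {
    // B-tree analog: keys stay sorted, so an inclusive range scan is cheap.
    static List<String> btreeRange(Map<String, String> data, String from, String to) {
        NavigableMap<String, String> sorted = new TreeMap<>(data);
        return new ArrayList<>(sorted.subMap(from, true, to, true).keySet());
    }

    // Queue analog: records are appended at the tail and consumed from the head.
    static List<String> drainFifo(List<String> arrivals) {
        Deque<String> queue = new ArrayDeque<>(arrivals);
        List<String> consumed = new ArrayList<>();
        while (!queue.isEmpty()) {
            consumed.add(queue.pollFirst());
        }
        return consumed;
    }

    // Recno analog: records are addressed by a logical, 1-based record number.
    static String recnoGet(List<String> records, int recno) {
        return records.get(recno - 1);
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>(Map.of("carol", "3", "alice", "1", "bob", "2"));
        System.out.println(btreeRange(users, "alice", "bob"));                // [alice, bob]
        // Hash analog: exact-match lookup only; iteration order is unspecified.
        System.out.println(users.get("bob"));                                 // 2
        System.out.println(drainFifo(List.of("job-1", "job-2")));             // [job-1, job-2]
        System.out.println(recnoGet(List.of("first line", "second line"), 2)); // second line
    }
}
```

The practical takeaway is the trade-off: ordered structures support range queries, hashed ones only exact matches, and the record-number methods suit append-heavy or line-oriented data.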
Advantages of <key, value> pairs
• An application is free to store data in whatever form is most natural to it:
– Objects (like structures in the C language)
– Rows, as in Oracle or SQL Server
– Columns, as in C-Store
• Different data formats can be stored in the same database.
Data Management Services
• Concurrency
• Transactions
• Recovery
Berkeley DB Applications
• Lightweight Directory Access Protocol (LDAP) servers
• Mail servers
• Managing access control lists
• Storing user keys in a public-key infrastructure
• Recording machine-to-network address mappings in address servers
Berkeley DB for Computationally Intensive Algorithms
• Algorithms that repeatedly execute a computationally intensive operation
– E.g. factorial
• A cache of already computed results is useful
– Cache = set of <key, value> pairs of the form <n, factorial(n)>
• Advantages:
– Avoids re-computing results for the same input (even across different executions)
– If the process crashes, it can be restarted and quickly resume from the point where it stopped
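A minimal sketch of the <n, factorial(n)> cache in plain Java, assuming a hypothetical text file (one `n=value` line per entry) as the persistent store instead of Berkeley DB. It keeps a `TreeMap` in memory, appends each new result to the file and reloads the file on startup, so results survive across executions and a new computation resumes from the largest cached k ≤ n. This is the "in-memory map plus manual save" approach: unlike Berkeley DB it has no transactions or logging, so a crash in the middle of a write could leave a truncated last line that this sketch would not detect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.TreeMap;

public class FactorialCache {
    private final Path file; // hypothetical cache file, one "n=factorial(n)" line per entry
    private final TreeMap<Integer, BigInteger> cache = new TreeMap<>();

    FactorialCache(Path file) {
        this.file = file;
        load();
    }

    // Reload results computed by earlier executions, if the file exists.
    private void load() {
        if (!Files.exists(file)) return;
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    cache.put(Integer.parseInt(line.substring(0, eq)),
                              new BigInteger(line.substring(eq + 1)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Manual save: append one entry; no atomicity or crash protection.
    private void append(int n, BigInteger value) {
        try {
            Files.writeString(file, n + "=" + value + System.lineSeparator(), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    BigInteger factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        BigInteger hit = cache.get(n);
        if (hit != null) return hit;
        // Resume from the largest cached k <= n instead of starting again at 1.
        Map.Entry<Integer, BigInteger> floor = cache.floorEntry(n);
        int k = (floor == null) ? 0 : floor.getKey();
        BigInteger acc = (floor == null) ? BigInteger.ONE : floor.getValue();
        for (int i = k + 1; i <= n; i++) {
            acc = acc.multiply(BigInteger.valueOf(i));
        }
        cache.put(n, acc);
        append(n, acc);
        return acc;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FactorialCache cache = new FactorialCache(Path.of("factorial-cache.txt"));
        System.out.println(cache.factorial(20)); // 2432902008176640000
    }
}
```

Running `main` twice shows the point of the cache: the second run finds 20! in the file and skips the multiplication loop entirely.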
• In-memory map
– Simple
– Very efficient (because it lives entirely in memory)
– Needs a considerable amount of memory
– No fault tolerance (data must be saved to a file manually)
• Relational databases
– ACID properties may not be necessary
– Do not scale easily to big data
– Slow
• NoSQL databases (Berkeley DB)
– Fast key-value access
– Flexible schema and flexible data types
– Ease of use
– Fault tolerance
Berkeleydb.java
• Open the environment:
– The EnvironmentConfig class specifies the environment configuration parameters
• Open the class catalog:
– Class catalog: a specialized database that stores the Java class descriptions of all serialized objects stored in the database
• Create the Database and StoredClassCatalog objects
• Open the database
• Close the environment, class catalog and databases
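The class catalog exists because of how Java serialization works. As we understand the Berkeley DB Java Edition design, a StoredClassCatalog keeps class descriptions in their own database so that each serialized record does not have to repeat them. The plain `java.io` sketch below makes no Berkeley DB calls (the `Employee` class and helper names are hypothetical); it only shows the overhead being avoided. Serializing records into separate streams repeats the class descriptor in every record, while a single stream writes it once and refers back to it by handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClassCatalogMotivation {
    // Hypothetical value class that would be stored as a database record.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // One stream per record: each record carries its own stream header and class descriptor.
    static int bytesPerRecordSeparately(Object... records) {
        int total = 0;
        for (Object r : records) {
            total += bytesInSharedStream(r);
        }
        return total;
    }

    // One stream for all records: the class descriptor is written once,
    // and later objects of the same class refer back to it.
    static int bytesInSharedStream(Object... records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object r : records) {
                out.writeObject(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Employee a = new Employee("Ada", 36);
        Employee b = new Employee("Grace", 45);
        System.out.println("separate records: " + bytesPerRecordSeparately(a, b) + " bytes");
        System.out.println("shared stream:    " + bytesInSharedStream(a, b) + " bytes");
    }
}
```

Independent database records behave like the "separate" case, which is why moving class descriptions into a shared catalog saves space on every record.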
DBViews.java
Factorial.java
Factorial (Berkeley DB) – Memory Usage
Factorial (MySQL) – Memory Usage
Factorial (HashMap) – Memory Usage