Big Data Management and NoSQL Databases · NoSQL Databases Five Challenges 4. Analytics and...

Big Data Management and NoSQL Databases Lecture 12 PD Dr. Andreas Behrend [email protected] Acknowledgements I am indebted to Prof. Dr.-Ing. Sebastian Michel, Prof. Johan Gamper, and Dr. Holubova for providing me slides.



  • What is Big Data?

    buzzword? bubble? gold rush? revolution?

    “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it,

    everyone thinks everyone else is doing it, so everyone claims they are doing it.”

    Dan Ariely

  • What is Big Data?

    No standard definition

    First occurrence of the term: High Performance Computing (HPC)

    Gartner: “Big Data” is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.

    3 (4, 5) Vs of Big Data: Volume, Variety, Velocity

  • What is Big Data?

    IBM: Depending on the industry and organization, Big Data encompasses information from internal and external sources such as transactions, social media, enterprise content, sensors, and mobile devices. Companies can leverage data to adapt their products and services to better meet customer needs, optimize operations and infrastructure, and find new sources of revenue.

    http://www.ibmbigdatahub.com/

    Social media and networks (all of us are generating data)

    Scientific instruments (collecting all sorts of data)

    Mobile devices (tracking all objects all the time)

    Sensor technology and networks (measuring all kinds of data)


  • Big Data Characteristics: Volume (Scale)

    http://www.ibmbigdatahub.com/

    Data volume is increasing exponentially, not linearly

    (Figure: data volume scale with the labels 10^9, 10^12, 10^18, 10^21)

  • Big Data Characteristics: Variety (Complexity)

    http://www.ibmbigdatahub.com/

    Various formats, types, and structures (from semi-structured XML to unstructured multimedia)

    Static data vs. streaming data

  • Big Data Characteristics: Velocity (Speed)

    http://www.ibmbigdatahub.com/

    Data is being generated fast and needs to be processed fast

    Online data analytics

  • Big Data Characteristics: Veracity (Uncertainty)

    http://www.ibmbigdatahub.com/

    Uncertainty due to inconsistency, incompleteness, latency, ambiguities, or approximations.

  • Some Numbers as of 2015

    Estimated Size of Data
    • Google: 15 000 PB (= 15 Exabytes)
    • Facebook: 300 PB
    • eBay: 90 PB
    • Spotify: 10 PB

    Data Processed per Day
    • Google: 100 PB
    • eBay: 100 PB
    • NSA: 29 PB
    • Facebook: 600 TB
    • Twitter: 100 TB
    • Spotify: 2.2 TB

    Units
    • MB (Megabyte) = 10^6 Bytes
    • GB (Gigabyte) = 10^9 Bytes
    • TB (Terabyte) = 10^12 Bytes
    • PB (Petabyte) = 10^15 Bytes
    • EB (Exabyte) = 10^18 Bytes

  • What Does the Data Look Like?

    • Not necessarily what you are used to from database lectures: usually not nicely structured (BCNF or 3NF) relations with known schema information.

    • But:
    – Twitter Tweets
    – Server Access Logs
    – Web Pages
    – Web Graph
    – Huge CSV files in general (e.g., holding a “relation”)

  • {"created_at":"Wed Jan 21 15:21:04 +0000 2015","id":557920823764586496,"id_str":"557920823764586496","text":"#T ulsaAirport #Oklahoma Jan 21 08:53 Temperature 37\u00b0F clouds Wind NW 7 km\/h Humidity 85% .. http:\/\/t.co\ /SnC8ST3gQC","source":"\u003ca href=\"http:\/\/www.woweather.com\/USA\/TulsaIAP.htm\" rel=\"nofollow\"\u003eupd ate weather tulsa\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":nu ll,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":255167 921,"id_str":"255167921","name":"Weather Tulsa","screen_name":"wo_tulsa","location":"Tulsa","url":"http:\/\/itu nes.apple.com\/app\/weatheronline\/id299504833?mt=8","description":"Weather Tulsa\n\nhttp:\/\/www.woweather.com \/USA\/Tulsa.htm","protected":false,"verified":false,"followers_count":111,"friends_count":60,"listed_count":5, "favourites_count":0,"statuses_count":33805,"created_at":"Sun Feb 20 20:31:42 +0000 2011","utc_offset":7200,"ti me_zone":"Athens","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_b ackground_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1 \/bg.pn g","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_ back ground_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_ color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/ \ /pbs.twimg.com\/profile_images\/1249942071\/WO-20px- linien_normal.png","profile_image_url_https":"https:\/\/pbs .twimg.com\/profile_images\/1249942071\/WO- 20px-linien_normal.png","default_profile":true,"default_profile_imag e":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place 
":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"TulsaAirport", "indices":[0,13]},{"text":"Oklahoma","indices":[14,23]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/SnC8ST3gQC","expa nded_url":"http:\/\/bit.ly\/188eNcw","display_url":"bit.ly\/188eNcw","indices":[93,115]}],"user_mentions":[],"sym bols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"e n","timestamp_ms":"1421853664710"} {"created_at":"Wed Jan 21 15:21:04 +0000 2015","id":557920823877464064,"id_str":"557920823877464064","text":"An ime episode updated: Kyoukai no

    How to store or analyse such Data?
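
One common first step with such data is to treat the stream as one JSON object per line and pull out only the fields of interest. The following is a minimal sketch, not the lecture's method: the input line is a heavily shortened, hand-cleaned excerpt of the tweet above, and the helper name `extract` is hypothetical.

```python
import json

# Assumption: one JSON object per line; this line is a shortened, hand-cleaned
# excerpt of the tweet shown on the slide, not the full record.
raw_lines = [
    '{"created_at": "Wed Jan 21 15:21:04 +0000 2015", "id": 557920823764586496, '
    '"text": "#TulsaAirport #Oklahoma Jan 21 08:53 Temperature 37\\u00b0F", '
    '"user": {"screen_name": "wo_tulsa", "followers_count": 111}, '
    '"entities": {"hashtags": [{"text": "TulsaAirport"}, {"text": "Oklahoma"}]}}',
]


def extract(line):
    """Parse one tweet and keep a few fields; missing optional parts are tolerated."""
    tweet = json.loads(line)
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "id": tweet["id"],
        "user": tweet["user"]["screen_name"],
        "hashtags": [tag["text"] for tag in hashtags],
    }


records = [extract(line) for line in raw_lines]
print(records)
```

The nesting (`user`, `entities.hashtags`) is why such records do not map directly onto flat relations with a fixed schema.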


  • Processing Big Data

    OLTP: Online Transaction Processing (DBMSs)
    – Database applications
    – Storing, querying, multiuser access

    OLAP: Online Analytical Processing (Data Warehousing)
    – Answer multi-dimensional analytical queries
    – Financial/marketing reporting, budgeting, forecasting, …

    RTAP: Real-Time Analytic Processing (Big Data Architecture & Technology)
    – Data gathered and processed in a real-time streaming fashion
    – Real-time data queried and presented in an online fashion
    – Real-time and history data combined and mined interactively

  • Key Big Data-Related Technologies

    • Distributed file systems
    • NoSQL databases
    • Grid computing, cloud computing
    • MapReduce and other new paradigms
    • Large scale machine learning

    http://e-theses.imtlucca.it/34/

  • Relational Database Management Systems (RDBMSs)

    • Predominant technology for storing structured data
    – Established query languages, e.g. SQL, RA
    • Often thought of as the only alternative for data storage
    – Persistence, concurrency control, consistency control, …
    • Alternatives: object databases or XML stores
    – Never gained the same adoption and market share

  • Why Distributed File Systems?

    • Assume you have 10 TB of data on disk
    • Now, do some analysis of it
    • With a 100 MB/s disk, reading alone takes
    – 100 000 seconds
    – about 1 667 minutes
    – about 28 hours
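
The arithmetic behind these numbers can be checked with a short sketch. It uses the decimal units from the slides (TB = 10^12 bytes, MB = 10^6 bytes); the `disks` parameter and the 100-disk case are assumptions added here to illustrate why spreading data over many machines helps, and they ignore network and coordination overhead.

```python
TB = 10**12  # bytes, decimal units as used in this lecture
MB = 10**6   # bytes


def read_time_seconds(data_bytes, throughput_bytes_per_s, disks=1):
    """Time to read all data when it is spread evenly over `disks` disks read in parallel."""
    return data_bytes / (throughput_bytes_per_s * disks)


seconds = read_time_seconds(10 * TB, 100 * MB)
minutes = seconds / 60
hours = seconds / 3600

# Hypothetical scale-out case: the same data spread over 100 disks.
parallel_seconds = read_time_seconds(10 * TB, 100 * MB, disks=100)

print(f"1 disk: {seconds:.0f} s = {minutes:.0f} min = {hours:.1f} h")
print(f"100 disks: {parallel_seconds:.0f} s")
```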

  • Need to do something about it

    http://flickr.com/photos/jurvetson/157722937/

    http://www.google.com/about/datacenter


  • Scale-up vs Scale-out Scale-Up (vertical scaling):

    More RAM

    More CPU

    More HDD

    Scale-Out (horizontal scaling):

    Same Hardware

    Connected by network

  • Data Centers

    source: http://www.google.com/about/datacenters/inside/index.html


  • Hardware Failures

    • Lots of machines (commodity hardware): failure is not an exception but very common
    • P[machine fails today] = 1/365
    • n machines: P[failure of at least 1 machine] = 1-(1-P[machine fails today])^n
    – for n=1: 0.0027
    – for n=10: 0.02706
    – for n=100: 0.239
    – for n=1000: 0.9356
    – for n=10 000: ~ 1.0

    source: google.com
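
The table can be reproduced with a few lines. This sketch assumes, as the formula on the slide implicitly does, that machines fail independently of each other; the function name is hypothetical.

```python
def p_any_failure(n, p_single=1 / 365):
    """Probability that at least one of n independent machines fails today."""
    return 1 - (1 - p_single) ** n


table = {n: p_any_failure(n) for n in (1, 10, 100, 1000, 10_000)}

for n, p in table.items():
    print(f"n={n:>6}: {p:.5f}")
```

The slide values appear to be truncated rather than rounded, so the printed digits can differ in the last place.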

  • Fallacies of Distributed Computing

    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn't change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous

    source: Peter Deutsch and others at Sun

  • Failure Handling & Recovery

    • Hardware failures happen virtually at any time

    • Algorithms/Infrastructures have to compensate

    • Issues in distributed computing:

    • Replication of data • Logging of state • Redundancy in task execution

  • „NoSQL“

    1998: first used for a relational database that omitted the use of SQL (Carlo Strozzi)

    2009: used for conferences of advocates of non-relational databases (Eric Evans, blogger and developer at Rackspace)

    NoSQL movement = “the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for”

  • „NoSQL“

    Not „no to SQL“
    – Another option, not the only one

    Not „not only SQL“
    – Oracle DB or PostgreSQL would fit the definition

    „Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent (BASE, not ACID), a huge data amount, and more“

    http://nosql-database.org/

  • The End of Relational Databases?

    Relational databases will not disappear
    – Compelling arguments for most projects
    – Familiarity, stability, feature set, and available support

    We should see relational databases as one option for data storage
    – Polyglot persistence: using different data stores in different circumstances
    – Search for optimal storage for a particular application

  • Motivation for NoSQL Databases

    Huge amounts of data are now handled in real time

    Both data and use cases are getting more and more dynamic

    Social networks (relying on graph data) have gained impressive momentum
    – Special type of NoSQL databases: graph databases

    Full-texts have always been treated shabbily by RDBMS

  • Example: Facebook

    http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/

    Statistics from 2010
    – 500 million users
    – 570 billion page views per month
    – 3 billion photos uploaded per month
    – 1.2 million photos served per second
    – 25 billion pieces of content (updates, comments) shared every month
    – 50 million server-side operations per second

    2008: 10,000 servers
    2009: 30,000 servers
    …

    → One RDBMS may not be enough to keep this going!

    And even newer numbers: https://research.facebook.com/blog/facebook-s-top-open-data-problems/

  • Example: Facebook Architecture from 2010

    Cassandra
    – NoSQL distributed storage system with no single point of failure
    – Used for searching your inbox messages

    Hadoop/Hive
    – An open source MapReduce implementation
    – Makes it possible to perform calculations on massive amounts of data
    – Hive allows SQL queries to be run against Hadoop

  • Example: Facebook Architecture from 2010 and later

    Memcached
    – Distributed memory caching system
    – Caching layer between the web servers and MySQL servers, since database access is relatively slow

    HBase
    – Hadoop database, used for e-mails, instant messaging and SMS
    – Has recently replaced MySQL, Cassandra and a few others
    – Built on Google’s BigTable model

  • NoSQL Databases: Five Advantages

    1. Elastic scaling
    – “Classical” database administrators scale up: buy bigger servers as database load increases
    – Scaling out: distributing the database across multiple hosts as load increases

    2. Big Data
    – Volumes of data that are being stored have increased massively
    – Opens new dimensions that cannot be handled with RDBMS

    http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772

  • NoSQL Databases: Five Advantages

    3. Goodbye DBAs (see you later?)
    – Automatic repair, distribution, tuning, … vs. expensive, highly trained DBAs of RDBMS

    4. Economics
    – Based on cheap commodity servers → lower cost per transaction/second

    5. Flexible Data Models
    – Non-existing/relaxed data schema → structural changes cause no overhead

  • NoSQL Databases: Five Challenges

    1. Maturity
    – Still in pre-production phase
    – Key features yet to be implemented

    2. Support
    – Mostly open source, often the result of start-ups
    – Enables fast development
    – Limited resources or credibility

    3. Administration
    – Require a lot of skill to install and effort to maintain

  • NoSQL Databases: Five Challenges

    4. Analytics and Business Intelligence
    – Focused on web app scenarios
    – Modern Web 2.0 applications
    – Insert-read-update-delete
    – Limited ad-hoc querying
    – Even a simple query requires significant programming expertise

    5. Expertise
    – Few NoSQL experts are available in the market

  • Data Assumptions

    RDBMS                                  | NoSQL
    integrity is mission-critical          | OK as long as most data is correct
    data format consistent, well-defined   | data format unknown or inconsistent
    data is of long-term value             | data is expected to be replaced
    data updates are frequent              | write-once, read multiple (no updates, or at least not often)
    predictable, linear growth             | unpredictable growth (exponential)
    non-programmers writing queries        | only programmers writing queries
    regular backup                         | replication
    access through master server           | sharding across multiple nodes

  • NoSQL Data Model: Aggregates

    Data model = the model by which the database organizes data

    Each NoSQL solution has a different model
    – Key-value, document, column-family, graph
    – The first three are aggregate-oriented

    Aggregate
    – A data unit with a complex structure
    – Not just a set of tuples like in RDBMS
    – Domain-Driven Design: “an aggregate is a collection of related objects that we wish to treat as a unit”
    – A unit for data manipulation and management of consistency
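
To make the contrast concrete, the sketch below stores the same hypothetical order once as a single aggregate (a nested structure) and once as flat tuples spread over three relations. All names and values are invented for illustration.

```python
# One aggregate: everything about the order travels together as a unit.
order_aggregate = {
    "order_id": 99,
    "customer": {"name": "Ann", "city": "Bonn"},
    "line_items": [
        {"product": "book", "qty": 2, "price": 10.0},
        {"product": "pen", "qty": 5, "price": 1.5},
    ],
}

# Relational view of the same data: flat tuples in separate relations.
orders = [(99, 1)]                      # (order_id, customer_id)
customers = [(1, "Ann", "Bonn")]        # (customer_id, name, city)
line_items = [(99, "book", 2, 10.0),    # (order_id, product, qty, price)
              (99, "pen", 5, 1.5)]


def order_total(aggregate):
    """Works on a single aggregate, no lookups elsewhere are needed."""
    return sum(item["qty"] * item["price"] for item in aggregate["line_items"])


def assemble(order_id):
    """Rebuild the aggregate from the relations (the work a join would do)."""
    customer_id = next(c for (o, c) in orders if o == order_id)
    name, city = next((n, ci) for (cid, n, ci) in customers if cid == customer_id)
    items = [{"product": p, "qty": q, "price": pr}
             for (o, p, q, pr) in line_items if o == order_id]
    return {"order_id": order_id,
            "customer": {"name": name, "city": city},
            "line_items": items}


print(order_total(order_aggregate))
```

The relational form keeps the data open to other access paths (for example, all orders containing a given product), while the aggregate form fixes one access path and makes it cheap.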

  • NoSQL Data Model: Aggregates – aggregate-ignorant

    There is no universal strategy for drawing aggregate boundaries
    – It depends on how we manipulate the data

    RDBMS and graph databases are aggregate-ignorant
    – It is not a bad thing, it is a feature
    – Allows us to easily look at the data in different ways
    – Better choice when we do not have a primary structure for manipulating data

  • NoSQL Data Model: Aggregates – aggregate-oriented

    Aggregate orientation
    – Aggregates give the database information about which bits of data will be manipulated together
    – Which should live on the same node

    Helps greatly with running on a cluster
    – We need to minimize the number of nodes we need to query when we are gathering data

    Consequence for transactions
    – NoSQL databases support atomic manipulation of a single aggregate at a time
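
One simple way to keep each aggregate on exactly one node is to hash its key and use the result to pick the node. This is a minimal sketch under stated assumptions: the node names are hypothetical, the "cluster" is a set of in-process dictionaries, and plain modulo placement is used for brevity (real systems typically use consistent hashing or range partitioning so that adding a node does not move most keys).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]      # hypothetical cluster members
cluster = {node: {} for node in NODES}      # each node holds its own key -> aggregate map


def node_for(key, nodes=NODES):
    """Deterministically map an aggregate key to one node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def put(key, aggregate):
    """The whole aggregate is written to a single node in one step."""
    cluster[node_for(key)][key] = aggregate


def get(key):
    """Only one node has to be contacted to read the aggregate."""
    return cluster[node_for(key)].get(key)


put("order:99", {"customer": "Ann", "total": 27.5})
print(node_for("order:99"), get("order:99"))
```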

  • NoSQL Databases: Materialized Views

    Disadvantage: the aggregated structure is given, other types of aggregations cannot be done easily
    – RDBMSs lack an aggregate structure → they support accessing data in different ways (using views)

    Solution: materialized views
    – Pre-computed and cached queries

    Strategies:
    – Update the materialized view when we update the base data (suited to views that are read more often than they are written)
    – Run batch jobs to update the materialized views at regular intervals
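
Both strategies can be sketched in a few lines. The example keeps a hypothetical view "quantity sold per product" next to a store of order aggregates; `put_order` updates the view eagerly on every write, while `rebuild_view` recomputes it from scratch the way a periodic batch job would. Names and data are assumptions for illustration.

```python
from collections import defaultdict

orders = {}                             # base data: order_id -> order aggregate
sales_by_product = defaultdict(int)     # materialized view: product -> quantity sold


def _apply(order, sign):
    """Add (sign=+1) or remove (sign=-1) one order's contribution to the view."""
    for item in order["line_items"]:
        product = item["product"]
        sales_by_product[product] += sign * item["qty"]
        if sales_by_product[product] == 0:
            del sales_by_product[product]


def put_order(order_id, order):
    """Strategy 1: update the view together with the base data."""
    if order_id in orders:
        _apply(orders[order_id], -1)    # undo the old version first
    orders[order_id] = order
    _apply(order, +1)


def rebuild_view():
    """Strategy 2: recompute the whole view, e.g. from a batch job."""
    view = defaultdict(int)
    for order in orders.values():
        for item in order["line_items"]:
            view[item["product"]] += item["qty"]
    return view


put_order(1, {"line_items": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}]})
put_order(2, {"line_items": [{"product": "book", "qty": 1}]})
put_order(1, {"line_items": [{"product": "book", "qty": 5}]})   # overwrite order 1
print(dict(sales_by_product))
```

Eager maintenance makes every write more expensive but keeps reads fresh; the batch variant keeps writes cheap at the price of a view that can be stale between runs.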

  • NoSQL Databases: Schemalessness

    When we want to store data in an RDBMS, we need to define a schema

    Advocates of schemalessness rejoice in freedom and flexibility
    – Allows us to easily change the data storage as we learn more about the project
    – Easier to deal with non-uniform data

    Fact: there is usually an implicit schema present
    – The program working with the data must know its structure
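
The implicit schema shows up as soon as code reads the data. In this sketch the database would accept all three hypothetical documents without complaint, but the reading function still has to know every shape that was ever written; the field names are invented for illustration.

```python
# Three documents in the same collection, written at different stages of a project.
documents = [
    {"name": "Ann", "email": "ann@example.org"},                        # original shape
    {"name": "Bob", "emails": ["bob@example.org", "b@example.org"]},    # later shape
    {"name": "Cy"},                                                     # field missing
]


def emails_of(doc):
    """The reader encodes the schema: it must handle every variant in the data."""
    if "emails" in doc:
        return list(doc["emails"])
    if "email" in doc:
        return [doc["email"]]
    return []


for doc in documents:
    print(doc["name"], emails_of(doc))
```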

  • Types of NoSQL Databases

    Core:
    – Key-value stores (databases)
    – Document databases
    – Column-family (column-oriented/columnar) stores (in contrast to relational columnar DBs like MonetDB)
    – Graph databases

    Non-core:
    – Object databases
    – XML databases
    – …

    http://nosql-database.org/

  • Key-value store: Basic characteristics

    The simplest NoSQL data stores

    A simple hash table (map), primarily used when all access to the database is via primary key

    Like a table in an RDBMS with two columns, such as ID and NAME
    – ID column being the key
    – NAME column storing the value: a BLOB that the data store just stores

    Basic operations:
    – Get the value for the key
    – Put a value for a key
    – Delete a key from the data store

    Simple → great performance, easily scaled
    Simple → not for complex queries, aggregation needs
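
The three basic operations fit in a very small interface. The following in-memory sketch is not the API of any particular product; the class name and the session key are hypothetical. The value is stored as opaque bytes, so serialization is the caller's job.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store: the value is an opaque blob of bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)      # None if the key does not exist

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how to encode the value.
session = {"user_id": 2, "theme": "dark"}
store.put("session:abc123", json.dumps(session).encode("utf-8"))

loaded = json.loads(store.get("session:abc123").decode("utf-8"))
store.delete("session:abc123")
missing = store.get("session:abc123")
print(loaded, missing)
```

Because the store never looks inside the value, a question such as "which sessions use the dark theme" would require reading and decoding every value on the client.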

  • Key-Value Stores

    • Data model: (key) -> value
    • Interface: CRUD (Create, Read, Update, Delete)

    Key              | Value (an opaque blob)
    users:2:friends  | {23, 76, 233, 11}
    users:2:inbox    | [234, 3466, 86, 55]
    users:2:settings | Theme → "dark", cookies → "false"

  • Key-value store Representatives MemcachedDB

    not open-source

    Project Voldemort

    open-source version

  • Key-value store: Suitable Use Cases

    Storing Session Information
    – Every web session is assigned a unique session_id value
    – Everything about the session can be stored by a single PUT request or retrieved using a single GET
    – Fast, everything is stored in a single object

    User Profiles, Preferences
    – Every user has a unique user_id, user_name + preferences such as language, colour, time zone, which products the user has access to, …
    – As in the previous case: fast, single object, single GET/PUT

    Shopping Cart Data
    – Similar to the previous cases

  • Key-value store: When Not to Use

    Relationships among Data
    – Relationships between different sets of data
    – Some key-value stores provide link-walking features, but this is not usual

    Multioperation Transactions
    – Saving multiple keys
    – Failure to save any one of them → revert or roll back the rest of the operations

    Query by Data
    – Search the keys based on something found in the value part

    Operations by Sets
    – Operations are limited to one key at a time
    – No way to operate upon multiple keys at the same time

  • Column-Family Stores: Basic Characteristics

    Also “columnar” or “column-oriented”

    Column families = rows that have many columns associated with a row key

    Column families are groups of related data that is often accessed together
    – e.g., for a customer we access all profile information at the same time, but not orders

  • Wide-Column Stores

    Data model: (rowkey, column, timestamp) -> value
    Interface: CRUD, Scan

    Examples: Cassandra (AP), Google BigTable (CP), HBase (CP)

    Row Key     | Column  | Versions (timestamped)
    com.cnn.www | crawled | …
    com.cnn.www | content | "…", "…", "…"
    com.cnn.www | title   | "CNN"
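
The data model (rowkey, column, timestamp) -> value with a CRUD-plus-scan interface can be sketched with nested dictionaries. This is an illustration only, not the API of BigTable, HBase, or Cassandra; the class, its methods, and the stored values are hypothetical.

```python
class WideColumnTable:
    """In-memory sketch of (rowkey, column, timestamp) -> value with versioned cells."""

    def __init__(self):
        self._cells = {}    # (rowkey, column) -> {timestamp: value}

    def put(self, rowkey, column, timestamp, value):
        self._cells.setdefault((rowkey, column), {})[timestamp] = value

    def get(self, rowkey, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((rowkey, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, end_row):
        """Yield (rowkey, column, latest value) for start_row <= rowkey < end_row, in key order."""
        for rowkey, column in sorted(self._cells):
            if start_row <= rowkey < end_row:
                yield rowkey, column, self.get(rowkey, column)


table = WideColumnTable()
table.put("com.cnn.www", "content", 1, "<html>v1</html>")
table.put("com.cnn.www", "content", 2, "<html>v2</html>")
table.put("com.cnn.www", "content", 3, "<html>v3</html>")
table.put("com.cnn.www", "title", 1, "CNN")

print(table.get("com.cnn.www", "content"))
print(table.get("com.cnn.www", "content", timestamp=2))
print(list(table.scan("com.cnn", "com.cnn.z")))
```

Keeping row keys sorted is what makes range scans possible, which a plain hash-based key-value store does not offer.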

  • Column-Family Stores Representatives

    Google’s BigTable

  • Column-Family Stores: Suitable Use Cases

    Event Logging
    – Ability to store any data structures → good choice to store event information

    Content Management Systems, Blogging Platforms
    – We can store blog entries with tags, categories, links, and trackbacks in different columns
    – Comments can be either stored in the same row or moved to a different keyspace
    – Blog users and the actual blogs can be put into different column families

  • Column-Family Stores: When Not to Use

    Systems that Require ACID Transactions
    – Column-family stores are not just a special kind of RDBMSs with a variable set of columns!

    Aggregation of the Data Using Queries (such as SUM or AVG)
    – Has to be done on the client side

    For Early Prototypes
    – We are not sure how the query patterns may change
    – As the query patterns change, we have to change the column family design

  • Document Databases: Basic Characteristics

    Documents are the main concept
    – Stored and retrieved
    – XML, JSON, …

    Documents are
    – Self-describing
    – Hierarchical tree data structures
    – Can consist of maps, collections (lists, sets, …), scalar values, nested documents, …

    Documents in a collection are expected to be similar
    – Their schema can differ

    Document databases store documents in the value part of the key-value store
    – Key-value stores where the value is examinable

  • Document Stores

    Data model: (collection, key) -> document
    Interface: CRUD, Queries, Map-Reduce

    Examples: CouchDB (AP), Amazon SimpleDB (AP), MongoDB (CP)

    ID/Key: order-12338

    JSON Document:
    { order-id: 23, customer: { name : "Felix Gessert", age : 25 } line-items : [ {product-name : "x", …} , …]
    }
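
What distinguishes a document store from a plain key-value store is that the value can be examined, so queries can refer to fields inside the document. The sketch below models a collection as a dictionary and implements a tiny field-path query; it is not the query API of CouchDB, SimpleDB, or MongoDB. The second order and the `find` helper are hypothetical.

```python
_MISSING = object()     # marker for "this path does not exist in the document"

collection = {
    "order-12338": {
        "order-id": 23,
        "customer": {"name": "Felix Gessert", "age": 25},
        "line-items": [{"product-name": "x"}],
    },
    # Hypothetical second document with a slightly different structure.
    "order-12339": {
        "order-id": 24,
        "customer": {"name": "Ann"},
        "line-items": [],
    },
}


def lookup(document, path):
    """Follow a dotted path such as 'customer.name' through nested maps."""
    node = document
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def find(docs, path, value):
    """Return the keys of all documents whose field at `path` equals `value`."""
    return [key for key, doc in docs.items() if lookup(doc, path) == value]


print(find(collection, "customer.name", "Felix Gessert"))
print(find(collection, "customer.age", 25))
```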

  • Document Databases Representatives

    Lotus Notes Storage Facility

  • Document Databases: Suitable Use Cases

    Event Logging
    – Many different applications want to log events
    – Type of data being captured keeps changing
    – Events can be sharded (i.e. divided) by the name of the application or type of event

    Content Management Systems, Blogging Platforms
    – Managing user comments, user registrations, profiles, web-facing documents, …

    Web Analytics or Real-Time Analytics
    – Parts of the document can be updated
    – New metrics can be easily added without schema changes, e.g. adding a member of a list, set, …

    E-Commerce Applications
    – Flexible schema for products and orders
    – Evolving data models without expensive data migration

  • Document Databases: When Not to Use

    Complex Transactions Spanning Different Operations
    – Atomic cross-document operations
    – Some document databases do support them (e.g., RavenDB)

    Queries against Varying Aggregate Structure
    – Design of aggregate is constantly changing → we need to save the aggregates at the lowest level of granularity, i.e. to normalize the data

  • Graph Databases: Basic Characteristics

    To store entities and relationships between these entities
    – Node is an instance of an object
    – Nodes have properties, e.g., name
    – Edges have directional significance
    – Edges have types, e.g., likes, friend, …

    Nodes are organized by relationships
    – Allow us to find interesting patterns
    – e.g., “Get all nodes employed by Big Co that like NoSQL Distilled”
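
The example pattern can be expressed over a list of typed, directed edges. This is a plain-Python sketch, not a graph database query language; the people and the edge data are invented for illustration.

```python
# Directed, typed edges: (source node, edge type, target node). All data is hypothetical.
edges = [
    ("Anna", "employed_by", "Big Co"),
    ("Barbara", "employed_by", "Big Co"),
    ("Carl", "employed_by", "Small Co"),
    ("Anna", "likes", "NoSQL Distilled"),
    ("Carl", "likes", "NoSQL Distilled"),
    ("Anna", "friend", "Barbara"),
]


def sources(edge_type, target):
    """All nodes that have an edge of the given type pointing at `target`."""
    return {src for (src, etype, dst) in edges if etype == edge_type and dst == target}


# "Get all nodes employed by Big Co that like NoSQL Distilled"
result = sources("employed_by", "Big Co") & sources("likes", "NoSQL Distilled")
print(result)
```

A graph database persists these edges and indexes them per node, so such a pattern is answered by following stored relationships instead of scanning all edges as this sketch does.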


  • Graph Databases: RDBMS vs. Graph Databases

    When we store a graph-like structure in an RDBMS, it is for a single type of relationship
    – “Who is my manager”

    Adding another relationship usually means a lot of schema changes

    In an RDBMS we model the graph beforehand based on the traversal we want
    – If the traversal changes, the data will have to change
    – In graph databases the relationship is not calculated at query time but persisted

  • Graph Databases Representatives

    FlockDB

  • Graph Databases: Suitable Use Cases

    Connected Data
    – Social networks
    – Any link-rich domain is well suited for graph databases

    Routing, Dispatch, and Location-Based Services
    – Node = location or address that has a delivery
    – Graph = nodes where a delivery has to be made
    – Relationships = distance

    Recommendation Engines
    – “your friends also bought this product”
    – “when invoicing this item, these other items are usually invoiced”

  • Graph Databases: When Not to Use

    When we want to update all or a subset of entities
    – Changing a property on all the nodes is not a straightforward operation
    – e.g., analytics solution where all entities may need to be updated with a changed property

    Some graph databases may be unable to handle lots of data
    – Distribution of a graph is difficult or impossible

  • NoSQL Data Model: Aggregates and NoSQL databases

    Key-value database
    – Aggregate = some big blob of mostly meaningless bits, but we can store anything
    – We can only access an aggregate by lookup based on its key

    Document database
    – Allows us to see a structure in the aggregate, but we are limited by the structure when storing (similarity)
    – We can submit queries to the database based on the fields in the aggregate

  • NoSQL Data Model: Aggregates and NoSQL databases

    Column-family stores
    – A two-level aggregate structure
    – The first key is a row identifier, picking up the aggregate of interest
    – The second-level values are referred to as columns

    Ways to think about how the data is structured:
    – Row-oriented: each row is an aggregate with column families representing useful chunks of data (profile, order history)
    – Column-oriented: each column family defines a record type (e.g., customer profiles) with rows for each of the records; a row is the join of records in all column families

  • History

    (Timeline figure covering 2003 to 2013; the placement of each system on the time axis was lost in extraction.)

    Systems shown: Google File System, MapReduce, CouchDB, MongoDB, Dynamo, Cassandra, Riak, MegaStore, F1, Redis, HyperDex, Spanner, CouchBase, Dremel, Hadoop & HDFS, HBase, BigTable

  • References

    http://nosql-database.org/

    Pramod J. Sadalage – Martin Fowler: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence

    Eric Redmond – Jim R. Wilson: Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

    Sherif Sakr – Eric Pardede: Graph Data Management: Techniques and Applications

    Shashank Tiwari: Professional NoSQL
