State of the Art of NoSQL Technologies, Sergio Rodríguez
NoSQL
[ { "name": "Sergio Rodríguez de Guzmán", "email": "[email protected]", "department": "training" } ]
BUSINESS LINES
IT TRAINING, PRODUCT DEVELOPMENT, CONSULTING, IDENTITY AND ACCESS MANAGEMENT, BIG DATA
Official IT training center.
Agreements with Cloudera, Oracle, MongoDB, Apple, IBM, VMware, Microsoft, Red Hat, Cisco…
On-site, virtual, online
Customization per project
Consulting, integration and support.
Technologies: Oracle, Microsoft, Red Hat, ForgeRock, OWS2, open source…
24x7 support, turnkey projects
Big Data experts since 2011
Agreements with Cloudera, Oracle, MongoDB
Predefined service packages for ETL, architecture, security, descriptive analytics and production certification
Identity and access management solutions.
Technologies: Oracle, Microsoft, Red Hat, ForgeRock, OWS2, open source…
24x7 support, turnkey projects
MDM projects
Security and audit solutions for the data center
Multi-vendor patch management and updates for the data center
Mobile application development
Timeline (1980, 1990, 2000, 2010): Rise of Relational
Persistence, Integration, SQL, Transactions, Reporting
IMPEDANCE MISMATCH
Timeline (1980, 1990, 2000, 2010): Rise of Object Databases
Billing
Inventory
Integration Database
Timeline (1980, 1990, 2000, 2010): Relational Dominance
Lots of Traffic
SQL SQL
BigTable
Dynamo
“NoSQL”
Johan Oskarsson
London
San Francisco
#nosql
Dynomite
Characteristics of NoSQL
Non-relational
Open Source
Schema-less
Cluster-friendly
21st Century Web
DATA MODEL
DOCUMENT
GRAPH
COLUMN
KEY-VALUE
NoSQL
KEY-VALUE
10025
10026
10043
10048
DOCUMENT
{ "id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"] }
{ "id": 1, "name": "A blue door", "price": 14.50, "tags": ["home", "blue"], "discount": true }
No schema
anOrder["price"] * anOrder["quantity"]
Implicit schema
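A minimal sketch of the implicit schema point: a schemaless store accepts any document, yet an expression like anOrder["price"] * anOrder["quantity"] still assumes both fields exist. The sample orders and the `order_total` helper are hypothetical and only illustrate where that assumption ends up living.

```python
# Hypothetical sample documents; a schemaless store would accept both as-is.
orders = [
    {"id": 1, "price": 12.50, "quantity": 2},
    {"id": 2, "price": 14.50},  # no "quantity" field: nothing stops this write
]


def order_total(an_order):
    """Compute price * quantity, checking the fields the code silently relies on."""
    missing = [field for field in ("price", "quantity") if field not in an_order]
    if missing:
        # The schema lives in the application code, so the code has to check it.
        raise KeyError("order %s lacks fields: %s" % (an_order.get("id"), missing))
    return an_order["price"] * an_order["quantity"]


totals = {}
for order in orders:
    try:
        totals[order["id"]] = order_total(order)
    except KeyError:
        totals[order["id"]] = None  # flag documents that break the implicit schema

print(totals)
```

The database never rejects the second order; only the reading code notices that a field is missing.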
customer_id: 7231
metadata, key
Key-Value, Document
Aggregate-Oriented
Aggregate
Order
Line Item
Value == Aggregate
Document == Aggregate
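The aggregate idea can be sketched with plain Python structures. This is an illustration under assumptions: the order contents, the `line_items` field, and the in-memory "stores" are hypothetical, and only the key 10025 and customer_id 7231 come from the slides.

```python
import json

# Hypothetical order aggregate: the order and its line items travel together.
order = {
    "id": 10025,
    "customer_id": 7231,
    "line_items": [
        {"product": "A green door", "price": 12.50, "quantity": 2},
        {"product": "A blue door", "price": 14.50, "quantity": 1},
    ],
}

# Key-value view: the value is an opaque blob, reachable only through its key.
kv_store = {10025: json.dumps(order)}
fetched = json.loads(kv_store[10025])

# Document view: the store can look inside the aggregate and filter on fields.
doc_store = [order]
by_customer = [doc for doc in doc_store if doc["customer_id"] == 7231]

order_value = sum(item["price"] * item["quantity"] for item in fetched["line_items"])
print(order_value, len(by_customer))
```

In both views the whole order is read and written as one unit; the difference is whether the store can see inside it.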
COLUMN-FAMILY
row key: 1234
column family: profile (column key, column value)
  name "sergio"
  billingAddress data…
  payment data…
column family: orders
  OR1001 data…
  OR1002 data…
  OR1003 data…
  OR1004 data…
Order1001
aggregate
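One way to picture the column-family layout is as a nested map: row key, then column family, then column key, then column value. The sketch below uses the row from the slide; the "data..." strings stand in for the elided values, and the `get_column` and `get_family` helpers are hypothetical names.

```python
# Row key -> column family -> column key -> column value.
# The "data..." strings are placeholders for values the slide does not show.
rows = {
    "1234": {
        "profile": {
            "name": "sergio",
            "billingAddress": "data...",
            "payment": "data...",
        },
        "orders": {
            "OR1001": "data...",
            "OR1002": "data...",
            "OR1003": "data...",
            "OR1004": "data...",
        },
    }
}


def get_column(row_key, family, column_key):
    """Read a single column value."""
    return rows[row_key][family][column_key]


def get_family(row_key, family):
    """Read one column family of a row without touching the others."""
    return rows[row_key][family]


name = get_column("1234", "profile", "name")
order_keys = sorted(get_family("1234", "orders"))
print(name, order_keys)
```

Reading only the "orders" family leaves the "profile" columns untouched, which is the access pattern this model is built around.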
Product     Revenue   Prior revenue
321291233   3083      7043
343412758   5032      4782
131494408   2198      3187
…           …         …
Order
Line Item Product1
DOCUMENT
GRAPH
COLUMN
KEY-VALUE
NoSQL
Aggregate-Oriented: Document, Column-Family, Key-value
Graph
Graph
Nodes: BigCo, Sergio, Lucia, Rocio, Rosana
Relationships: employee_of, employee_of, friend, friend, friend
START rocio = node:nodeIndex(name = "Rocio")
MATCH (rocio)-[:FRIEND]->(friend_node)
RETURN friend_node.name, friend_node.location
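The traversal in the query can be mimicked over a plain edge list. The slide names the nodes and the relationship types, but which node connects to which is an assumption made for this sketch, and `neighbours` is a hypothetical helper.

```python
# Hypothetical edges: node names and relationship types come from the slide,
# the concrete connections are assumed for illustration.
edges = [
    ("Sergio", "employee_of", "BigCo"),
    ("Lucia", "employee_of", "BigCo"),
    ("Rocio", "friend", "Sergio"),
    ("Rocio", "friend", "Rosana"),
    ("Sergio", "friend", "Lucia"),
]


def neighbours(node, relation):
    """Follow outgoing edges of one relationship type from a node."""
    return [target for source, rel, target in edges
            if source == node and rel == relation]


# Same question as the query: who are Rocio's friends?
friends_of_rocio = neighbours("Rocio", "friend")

# One hop further: friends of Rocio's friends.
friends_of_friends = sorted(
    {fof for friend in friends_of_rocio for fof in neighbours(friend, "friend")}
)
print(friends_of_rocio, friends_of_friends)
```

Each extra hop here is another pass over the edge list, whereas a graph database is designed to follow relationships directly.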
Aggregate-Oriented: Document, Column-Family, Key-value
Graph
Schemaless
NOSQL AND CONSISTENCY
RDBMS == ACID
NoSQL == BASE
Aggregate-Oriented: Document, Column-Family, Key-value
Graph
ACID
Browser, Server, Database
Get, Get
Post, Post
Offline Lock
Version Stamp: v101, v101, v101, v102, v101
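The version stamp slide can be read as an optimistic offline lock: each client remembers the version it read, and a write succeeds only if that version is still current. The sketch below is one possible reading of the v101/v102 sequence; `VersionedStore` and `StaleVersionError` are hypothetical names, not any product's API.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """In-memory store that keeps a version stamp next to every value."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})  # key -> (version, value)

    def get(self, key):
        return self._data[key]

    def put(self, key, value, expected_version):
        current_version, _ = self._data[key]
        if current_version != expected_version:
            # Someone else updated the record after this client read it.
            raise StaleVersionError(
                "expected v%d, found v%d" % (expected_version, current_version)
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore({"order": (101, {"status": "open"})})

# Two clients both read v101.
version_a, _ = store.get("order")
version_b, _ = store.get("order")

# The first write wins and bumps the stamp to v102.
new_version = store.put("order", {"status": "paid"}, expected_version=version_a)

# The second write still carries v101 and is rejected.
try:
    store.put("order", {"status": "cancelled"}, expected_version=version_b)
    conflict = False
except StaleVersionError:
    conflict = True

print(new_version, conflict)
```

The rejected client has to re-read the record and decide how to merge or retry; no lock is held between the read and the write.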
Consistency
Logical
Replication
Lucia, Sergio
Consistency
Availability
CAP Theorem
Consistency
Availability
Partition Tolerance
Pick any 2
Partition: Consistency OR Availability
Consistency
Response Time
Safety
Liveness
Relaxing Durability
Eventual Consistency
Quorums
Read-Your-Writes Consistency
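The slide only names quorums. The usual rule of thumb, supplied here as background rather than taken from the deck, is that with N replicas a read of R nodes is guaranteed to see the latest write acknowledged by W nodes when R + W > N. The brute-force check below illustrates that for N = 3; the function name is hypothetical.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


n = 3
results = {
    (w, r): quorums_overlap(n, w, r)
    for w in range(1, n + 1)
    for r in range(1, n + 1)
}

for (w, r), overlaps in sorted(results.items()):
    print("N=%d W=%d R=%d overlap=%s" % (n, w, r, overlaps))
```

Lowering W or R buys response time at the cost of this guarantee, which is the consistency versus response time trade-off named above.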
WHEN AND WHY TO USE NOSQL?
easier development
large scale data
NoSQL
Billing
Inventory
Integration Database
Billing
Inventory
Application Database
Web service
API
Timeline (1980, 1990, 2000, 2010): NoSQL?
Timeline (1980, 1990, 2000, 2010): Polyglot Persistence
Speculative Retailers Web Application
User sessions: Redis
Financial Data: RDBMS
Shopping Cart: Riak
Recommendation: Neo4J
Product Catalog: MongoDB
Reporting: RDBMS
Analytics: Cassandra
User activity logs: Cassandra
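As a rough sketch, the polyglot persistence mapping for the speculative retailer can be written down as a routing table. The pairs come from the slide; the `STORE_FOR` and `kinds_by_store` names are hypothetical.

```python
# Data kind -> store, as listed for the speculative retailer.
STORE_FOR = {
    "User sessions": "Redis",
    "Financial Data": "RDBMS",
    "Shopping Cart": "Riak",
    "Recommendation": "Neo4J",
    "Product Catalog": "MongoDB",
    "Reporting": "RDBMS",
    "Analytics": "Cassandra",
    "User activity logs": "Cassandra",
}


def kinds_by_store(mapping):
    """Invert the table to see how many systems have to be operated."""
    grouped = {}
    for kind, store in mapping.items():
        grouped.setdefault(store, []).append(kind)
    return {store: sorted(kinds) for store, kinds in grouped.items()}


by_store = kinds_by_store(STORE_FOR)
print(len(by_store), "stores:", sorted(by_store))
```

Eight kinds of data end up in six different systems, each of which has to be deployed, monitored and learned.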
Problems
Organizational Change
Immaturity
Eventual Consistency
Decisions
Strategic and (Rapid time to market and/or Data intensive)
Possible Use Cases
• Use A NoSQL Database For A Particular Application Feature
• Use A NoSQL Database For Speedy Batch Processing
• Use A NoSQL Database For Distributed Logging
• Use A NoSQL Database For Large Tables
• Use An RDBMS For Reporting
What's The Catch?
• Difficult For Data In Different Databases To Interact
• You Now Have To Decide Where To Store Data
• Increased Application And Deployment Complexity
• Additional Administrative Responsibilities
• Training
APIS
Java
NoSQL
Python
JavaScript
Ektorp
Jrelax
CouchDB4J
Who Is Actually Doing This?
Twitter
• Vertically and horizontally partitioned MySQL
• Several layers of aggressive caching, all application managed
• Schema changes impossible, resulting in the use of bitfields and piggyback tables
• Hardware intensive
• Error prone
• Hitting MySQL limits
• Already eventually consistent
FlockDB
Twitter
• Migrating from MySQL to Cassandra as their main online data store
• Hadoop/HBase used for people search feature
• FlockDB used to manage the social graph
• Hadoop for analytics
• “As with all NoSQL systems, strengths in different situations” - Kevin Weil, Analytics Lead, Twitter
http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010
Twitter
• Increased availability
• The ability to support new features
• The ability to analyze their massive amount of data in a reasonable amount of time
http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010
NoSQL Job Trends
NoSQL Job Growth by Project
NoSQL Job Growth by Project (Relative)
NOSQL + BIG DATA SIMPLE SAMPLE
Grokking Twitter
Step by Step
• Use/Install Hadoop NoSQL Plugin
• Import tweets from Twitter
• Write mapper in Java/Python
• Write reducer in Java/Python
• Call myself a data scientist
Grokking Twitter
curl --get 'https://stream.twitter.com/1.1/statuses/sample.json' \
  --header 'Authorization: OAuth oauth_consumer_key="OsITqnRiCTmkQcv4dtPPj3mnq", oauth_nonce="d41d45177ab9b450f7d1cb82b0d37328", oauth_signature="bOpdpvFNxPuqrlUV4nBhiyyGWbA%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1411988258", oauth_token="295079318-AbQu8sOPCaxXebjwDnDhOUjMST8bgs60JajOffMn", oauth_version="1.0"' \
  --verbose | mongoimport -d test -c live
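Map Hashtags in Python
#!/usr/bin/env python

import sys
sys.path.append(".")

from pynosql_hadoop import BSONMapper

def mapper(documents):
    # Emit one {'_id': hashtag, 'count': 1} record per hashtag occurrence.
    for doc in documents:
        # Stream messages such as delete notices carry no 'entities' field.
        for hashtag in doc.get('entities', {}).get('hashtags', []):
            yield {'_id': hashtag['text'], 'count': 1}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping."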
Reduce Hashtags in Python
#!/usr/bin/env python

import sys
sys.path.append(".")

from pynosql_hadoop import BSONReducer

def reducer(key, values):
    # Sum the counts emitted by the mapper for one hashtag.
    print >> sys.stderr, "Hashtag %s" % key.encode('utf8')
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key.encode('utf8'), 'count': _count}

BSONReducer(reducer)
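To see what the mapper and reducer above compute without a Hadoop cluster, the same logic can be run locally in plain Python 3. This is a sketch under assumptions: the sample tweets are hypothetical, `run_local` is a stand-in for the shuffle step that Hadoop streaming performs, and the `pynosql_hadoop` wrappers are left out.

```python
from collections import defaultdict

# Hypothetical tweets, trimmed to the fields the mapper reads.
tweets = [
    {"entities": {"hashtags": [{"text": "android"}, {"text": "gameinsight"}]}},
    {"entities": {"hashtags": [{"text": "android"}]}},
    {"entities": {"hashtags": []}},
]


def mapper(documents):
    """Emit one record with count 1 per hashtag occurrence."""
    for doc in documents:
        for hashtag in doc["entities"]["hashtags"]:
            yield {"_id": hashtag["text"], "count": 1}


def reducer(key, values):
    """Sum the counts emitted for one hashtag."""
    return {"_id": key, "count": sum(v["count"] for v in values)}


def run_local(documents):
    """Group mapper output by key (the shuffle), reduce, and sort by count."""
    groups = defaultdict(list)
    for record in mapper(documents):
        groups[record["_id"]].append(record)
    reduced = [reducer(key, values) for key, values in groups.items()]
    return sorted(reduced, key=lambda d: (-d["count"], d["_id"]))


popular = run_local(tweets)
print(popular)
```

The final sort plays the role of the `sort({'count': -1})` query on the output collection.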
All Together
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars /usr/lib/hadoop/lib/nosql-hadoop.jar,/usr/lib/hadoop/lib/nosql-hadoop-streaming-1.4.0-SNAPSHOT.jar \
    -mapper /tmp/twit_hashtag_map.py \
    -reducer /tmp/twit_hashtag_reduce.py \
    -jobconf nosql.input.uri=nosqldb://127.0.0.1/test.live \
    -inputformat com.nosqldb.hadoop.mapred.NoSQLInputFormat \
    -jobconf nosql.output.uri=nosqldb://127.0.0.1/test.twit_reduction \
    -outputformat com.nosqldb.hadoop.mapred.NoSQLOutputFormat \
    -io nosqldb \
    -input /tmp/in \
    -output /tmp/out \
    -file /tmp/twit_hashtag_map.py \
    -file /tmp/twit_hashtag_reduce.py
Popular Hashtags
db.twit_hashtags.find().sort( {'count' : -1 })
{ "_id" : "gameinsight", "count" : 1367 }
{ "_id" : "رتويت", "count" : 1135 }
{ "_id" : "넌감동이야", "count" : 796 }
{ "_id" : "비투비", "count" : 778 }
{ "_id" : "ضع_معلومة_غريبة_عنك", "count" : 768 }
{ "_id" : "ريتويت", "count" : 757 }
{ "_id" : "ماهي_وظيفتك_في_قروبات_الواتساب", "count" : 748 }
{ "_id" : "androidgames", "count" : 706 }
{ "_id" : "android", "count" : 683 }
{ "_id" : "الفريدي_احسن_من_الثنيان", "count" : 680 }