Elasticsearch : petit déjeuner du 13 mars 2014
-
Upload
alter-way -
Category
Technology
-
view
110 -
download
4
description
Transcript of Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch in 1 slide
• More than 6 million downloads
• 450,000 New Downloads per Month
• 1000s of Mission Critical Implementations
• Top Investors: Benchmark Capital, Index
Ventures
• Seasoned Executive Team
– Founded by Creator of Elasticsearch
– Seasoned Executives from SpringSource
Big Data in Todayʼ’s Business and Technology
Environment : some significant figures
• 2.7 Zetabytes of data exist in the digital universe today. (=1 billion Terabytes)
• 235 Terabytes of data has been collected by the U.S. Library of Congress in April
2011.
• Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.
• Akamai analyzes 75 million events per day to better target advertisements.
• Walmart handles more than 1 million customer transactions every hour, which is
imported into databases estimated to contain more than 2.5 petabytes of data.
• The largest AT&T database boasts titles including the largest volume of data in one
unique database (312 terabytes) and the second largest number of rows in a unique
database (1.9 trillion), which comprises AT&Tʼ’s extensive calling records.
• Hadoop :
– 94% of Hadoop users perform analytics on large volumes of data not possible
before
– 88% analyze data in greater detail;
– while 82% can now retain more of their data.
The Rapid Growth of Unstructured Data
• YouTube users upload 48 hours of new video every minute of the
day.
• 500+ new websites are created every minute of the day.
• Brands and organizations on Facebook receive 34,722 Likes every
minute of the day.
• 100 terabytes of data uploaded daily to Facebook.
• According to Twitterʼ’s own research in early 2012, it sees roughly
175 million tweets every day, and has more than 465 million
accounts.
• 30 Billion pieces of content shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009. .
Big Data & Real Business Issues
• 25+ % of decision‐makers surveyed predict that data volumes in their
companies will rise by more than 60% by the end of 2014, with the
average of all respondents anticipating a growth of no less than 42 %.
• 40% projected growth in global data generated per year vs. 5% growth in
global IT spending.
• According to estimates, the volume of business data worldwide, across all
companies, doubles every 1.2 years.
– Poor data can cost businesses 20%–35% of their operating revenue.
– Bad data or poor data quality costs US businesses $600 billion annually.
• 75+ % of decision-makers surveyed anticipate significant impacts in the
domain of storage systems as a result of the “Big Data” phenomenon.
• We anticipate a new challenge : to be able to Search and Analyse all
those datas … in real time !
StartUp
search = like % ?SELECT doc.*, pays.* FROM doc, pays WHERE doc.pays_code = pays.code AND doc.date_doc > to_date('2011-12', 'yyyy-mm') AND doc.date_doc < to_date('2012-01', 'yyyy-mm') AND lower(pays.libelle) = 'france' AND lower(doc.commentaire) LIKE ‘%produit%' AND lower(doc.commentaire) LIKE ‘%david%';
Start…
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.tar.gz!$ tar -xf elasticsearch-1.0.0.tar.gz!$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Ghost Maker] {1.0.0}[5645]: initializing
… and play!$ curl -XPUT localhost:9200/sessions/session/1 -d '{! "title" : "Elasticsearch",! "subtitle" : "Make sense of your (BIG) data !",! "date" : "2014-02-13T16:30:00",! "tags" : [ "elasticsearch", "elosi", "cloud" ],! "speaker" : [{! "first_name" : "David", ! "last_name" : "Pilato" ! }]!}'
Search!$ curl -XPOST http://localhost:9200/sessions/session/_search -d' { "query": { "multi_match": { "query": "elasticsearch elosi david", "fields": [ "title^2", "speaker.first_name" ] } }, "post_filter": { "range": { "date": { "from": "2014-02-13", "to": "2014-02-14" } } } }'
$ curl -XPOST http://localhost:9200/sessions/session/_search -d' { "query": { ... }, "aggs": { "by_date": { "date_histogram": { "field": "date", "interval": "day", "format" : "dd/MM/yyyy" } } } }'
"by_date": [ { "key_as_string": "11/02/2014", "doc_count": 1 }, { "key_as_string": "12/02/2014", "doc_count": 2 }, { "key_as_string": "13/02/2014", "doc_count": 3 } ]
Compute!
{ "name":"Pilato David", "dateOfBirth":"1971-12-26", "gender":"male", "marketing":{ "fashion":334, "music":3363, "hifi":2351 }, "address":{ "country":"France", "countrycode":"FR", "city":"Paris" } }
Let’s make sense of …• logs
• github
• marketing data
• ...
• your data
• your big data
démo#mstechdays #elasticsearch StartUp
MAKE SENSE OF YOUR (BIG) DATA!
let’s inject some marketing documents…
StartUp
Distributed indices node 1
orders
products
1 2
3 4
1 2
$ curl -XPUT localhost:9200/orders -d '{! "settings.index.number_of_shards" : 4,! "settings.index.number_of_replicas" : 1!}'
$ curl -XPUT localhost:9200/products -d '{! "settings.index.number_of_shards" : 2,! "settings.index.number_of_replicas" : 0!}'
StartUp
Distributed indices node 1
orders
products
1 2
3 4
1 2
node 2
$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Armageddon] {1.0.0}[5645]: initializing [INFO ][cluster.service][Armageddon] detected_master [Ghost Maker]
StartUp
node 3
Distributed indices node 1
orders
products
1
4
1
node 2
orders
products
2
3
2
2
3
1
4
$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Karnak] {1.0.0}[5645]: initializing [INFO ][cluster.service][Karnak] detected_master [Ghost Maker]
StartUp
node 3
products
orders
Distributed indices node 1
orders
products
1
4
1
node 2
orders
products
2
33
2
2
3
1
4 3
1
4
Elasticsearch 1.0 : une
solution prête pour la
production !
Revolutionizing Data Search
and Analytics
Purpose of Elasticsearch
• Organize data and make it easily accessible
– Through powerful search and analytics
– Easily consumable (even for non-data scientists)
– Elegantly handles extremely large data volumes
– Delivers results in real time
• Technology stack agnostic
• Used across all market verticals
Features of Elasticsearch
• Structured & unstructured search
• Advanced analytics capabilities
• Unmatched performance
• Real-time results
• Highly scalable
• User friendly installation and maintenance
User: Fog Creek Software
Searches 40,000,000,000 (40 billion) lines of code
in real-time using Elasticsearch
Example: Email ArchivingEmail Archiving of 2 Petabytes of data across 100’s of servers
Big data, structured and unstructured
Unprecedented Uptake
Elasticsearch has more than 6 Million downloads
… and 450,000 more each month
Cumulative
User Raves
Chris Cowan @uhduh
I’m in love with @elasticsearch! I want to use it for everything right now!
Alain Richardt @alaincxs
Moving ffrom #solr to # Elasticsearch is like upgrading from a Reliant Robin to a McLaren
F1
Pete Connolly @peteconnolly
Two really useful and productive days of training from @kimchy and @uboness all
about #elasticsearch. Best training course in years
Cyril Lacôte @clacote
#ElasticSearch is the s*&t. Amazingly simple and powerful. Open source is awesome.
That's made my day.
Logan Lowell @fractaloop
Tweaking @elasticsearch for huge indexes can be fun. I'm very glad the IRC channel is
so helpful too.
Product Offerings:Support Throughout Your Project
1. Core Elasticsearch Training
2. Development and Production Support
1: Training
Core Elasticsearch Training
• Two day classroom training
• Delivered by Elasticsearch developers
1. Worldwide Public Courses
2. Onsite Training Course
Resources
• www.elasticsearch.com
• www.elasticsearch.org
• User Groups:
http://www.elasticsearch.org/community/forum/
• Contact:
Richard Maurer
Territory Manager
• Faceting
• Fuzzy Search
• Speed
• Auto Completion
• Geo Search
• Log Analysis
Ecommerce Enhanced Search
Engine
• REST based
• Memory and I/O efficient
• Adaptive I/O
• Map/Reduce API support
• Pig support
• Hive support
• = elasticsearch-hadoop
Combining Hadoop & ElasticSearch
Conclusion
• Il est temps de révolutionner la façon dont vous valorisez
vos données : offrez Elasticsearch à vos applicatifs !
• La stack ELK (Elasticsearch, Logstash, Kibana) en
version 1.0 est prête pour la production !
• Faites vous accompagner pour bénéficier des bonnes
pratiques et du support à tous les stades de votre projet :
conception, développement, production