OWF12/BIG DATA OWF OpenSearchServer light

Le moteur de recherche, inspirateur technologique

du Big Data ?

Emmanuel Keller, CEO OpenSearchServer

Ainsi naquît Google…

Avril 1998 hEp://infolab.stanford.edu/pub/papers/google.pdf

Google 1.0

24 millions de pages « It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. » PageRank « Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstaFon. »

Google 1.0

« We intend to speed up Google considerably through distribuFon and hardware, soHware, and algorithmic improvements »

1997 -‐ IBM Deskstar 16GP

Google 1.0

Rappel

Informa5que n.f.

Science du traitement automaFque et raFonnel de l'informaFon meLant en œuvre des matériels et des logiciels.

L’art de jongler…

…c’est pas nouveau

•  Bayer, Rudolf; McCreight, E. (July 1970), Organiza\on and Maintenance of Large Ordered Indices, Mathema\cal

•  Bayer, Rudolf (1971), "Binary B-‐Trees for Virtual Memory »

La structure en arbre

Réduire le nombre de lectures physiques sur le disque dur

Une forêt d’arbres

•  Un système de fichiers (FAT, NTFS, EXT3, ZFS) est avant tout une structure en arbre.

•  Les bases de données basent leurs index sur la taille des blocs du disque dur

•  La mémoire virtuelle

Quand au calcul

•  Les fondeurs peinent à suivre la loi de Moore: la technologie du silicium aEeint ses limites

•  Les cœurs mul\ples compliquent les développements: obliga\on d’intégrer le parallélisme

Distributed compu\ng

Google 2.0 MapReduce: « Simplified Data Processing on Large Clusters » By Jeffrey Dean and Sanjay Ghemawat (Google Inc.) San Francisco, CA, December, 2004

hEp://research.google.com/archive/mapreduce.html

Hadoop: Implémenta\on open source sous licence Apache 2.0

Map Reduce

Source: hEp://www.gridgainsystems.com

MAP Répar\r la charge sur plusieurs nœuds (WORK) Chaque nœud travaille sur une par\e des données REDUCE Le résultat de chaque nœud est consolidé pour cons\tuer le résultat final

No SQL Database

Redis, HBase, Cassandra, Mongo DB, MemCache DB, Berkeley DB, Big Table,…

•  Une approche simplifiée de l’accès aux données •  Principe Clé / Valeur •  Se « cloudise » très bien

Vers un nouveau paradigme •  Applica\ons (Pentaho) •  JAVA, C / C++ ? L •  API, API, API, API… •  Système de fichiers – XtreemFS – Sector/Sphere – Oracle Clustered File System (GPL !!!!)

– Disques SSD

Merci pour votre aEen\on

ekeller@open-‐search-‐server.com

OWF12/BIG DATA OWF OpenSearchServer light

Documents

Transcript of OWF12/BIG DATA OWF OpenSearchServer light

OWF Catalog - Copy

OWF12/PAUG Conf Days Android system development, maxime ripard, free electrons

Owf12 open forges summit adms.sw

OWF12/BIG DATA Presentation big data owf ysance

OWF12/Java Moussine pouchkine Girard

OWF12/Java Sacha labourey

OWF12/Open source Web Applications on the cloudNuxeo, cloud and saas

OWF: Xen - Open Source Hypervisor Designed for Clouds

HESSELØ OWF – PRELIMINARY GEOTECHNICAL SITE …

OWF12/Java Building an Open M2M community

Owf 2013 rii veri t fontaine speaker4

OWF12/Alert project workshop

Barrow OWF Design & Build of UKs 1st OWF Substation

OWF12/Foss for Humanity Introducing OpenMRS

Soil Impedance of Monpiles for OWF

Ozone Widget Framework (OWF) Widget Development Capabilitiesdataintelligencellc.com/uploads/DI-OWF-Capabilities - 4-28-2014.pdf · The majority of DoD Cloud initiatives include the

Owf 2013 rii moose speaker 2

Ozone Widget Framework (OWF) Widget Development … - 4-28... · What is Ozone Widget Framework (OWF) ? OWF is web-application framework developed by NSA, that has evolved to Free

Owf12 open forges summit scott wilson, oss watch

OWF12/Java Ian robinson