Big Data Architectures and Technologies

CGW 2014, Cracow


KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association

Research Group Cloud Computing - Steinbuch Centre for Computing

www.kit.edu

Big Data Architectures and Technologies

Marcel Kunze


Karlsruhe Institute of Technology (KIT)

Largest European scientific institution

Main topics: Energy, Nanotechnology, Astrophysics, Engineering

Mission: Research, Education, Innovation

www.kit.edu

M. Kunze | Big Data Architectures and Technologies | CGW 2014, Cracow


Agenda

What is “Big Data”?

Big Data Management

Big Data Toolbox

R&D Projects


What is “Big Data”?


Big Data Volume


Source: „Big-Data im Praxiseinsatz, Leitfaden", BITKOM 2012

Million Peta Bytes = Zetta Bytes

Science no longer seems to be the main driver!


Large Scale Data Facility (LSDF)

RAID

HDFS

LTO Tape


The Data Center as a Computer

Apple data center in Maiden, North Carolina: Exa-Scale data center (iCloud)

Comparison: SCC@KIT, LSDF, LHC Tier-1


Big Data Drivers


Source: „Big-Data im Praxiseinsatz, Leitfaden", BITKOM 2012


Big Data Dimensions (4V)

Volume

• Number of Data Sets

• Number of Files

• Number of Objects

• Tera, Peta, Exa, Zetta, Yotta Bytes (10^24 Bytes)

• Brontobyte (10^27 Bytes)

• Geopbyte (10^30 Bytes)

Variety

• Data Sources (Biz, Web)

• Structured / unstructured

• Poly-structured

• Text, Video, Pictures, Tweets, Blogs, Meters

• Machine Communication

Velocity

• Internet of Things

• Data Generation in (near) real-time

• Milliseconds, Seconds, Hours

Veracity

• Digital certificates

• Data quality

• Uncertainty quantification

• Reference data
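
The volume scale can be made concrete with a short conversion helper. This is only a minimal sketch: the table `UNIT_EXPONENTS` and the function `convert` are illustrative names, the units are read as decimal powers of ten (10^24 for yotta, 10^27 and 10^30 for the last two), and brontobyte and geopbyte are informal names rather than standardized prefixes.

```python
# Decimal exponents of the byte units named on the slide.
# Brontobyte and Geopbyte are informal, non-standard names.
UNIT_EXPONENTS = {
    "Terabyte": 12,
    "Petabyte": 15,
    "Exabyte": 18,
    "Zettabyte": 21,
    "Yottabyte": 24,
    "Brontobyte": 27,
    "Geopbyte": 30,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal byte units."""
    shift = UNIT_EXPONENTS[from_unit] - UNIT_EXPONENTS[to_unit]
    return value * 10 ** shift


# One zettabyte expressed in petabytes ("million petabytes").
print(convert(1, "Zettabyte", "Petabyte"))
```

The same helper reproduces the earlier statement that a million petabytes make one zettabyte.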


The 5th Dimension: Value

A major new trend in information processing will be the trading of original and enriched data, effectively creating an information economy

Data mining

Descriptive analytics (Past)

Predictive analytics (Future)

Prescriptive analytics (Actionable insight)

Correlation of data

Intelligence of patterns, relations, etc.


“When hardware became commoditized, software was valuable. Now that software is being commoditized, data is valuable.” (Tim O’Reilly)

“The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis?” (A. Croll, Mashable, 2011)


Added Value in the Data Economy

Value chain (stage, activity):

Acquisition, Digitization

Integration, Quality Management

Aggregation, Market Place

Products, Services

Visualization, Interpretation

Big Data Technologies and Cloud Infrastructure


Example: Weather data - heterogeneous data/simulation sources – forecast/probability – map/app


Big Data Management


Data Management


How can we manage billions of files/objects?

How can we manage exabytes of data?

Data security

Data usage control

Data movement

Data availability

Data preservation

Data publication


Data Security

There is no absolute security, neither in the cloud nor on your own premises. Spies are everywhere…

Risk management:

Risk = probability of disaster * cost of disaster

Given a lower cost, a higher probability of data loss could be tolerable

A web server and clickstream analyses might be safely migrated into a public cloud
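
The risk formula above can be tried out with a few lines of code. This is a minimal sketch: the scenario names, probabilities and costs are invented for illustration and do not come from the talk.

```python
def risk(probability, cost):
    """Risk = probability of disaster * cost of disaster."""
    return probability * cost


# Hypothetical scenarios: (yearly probability of data loss, cost of the loss).
scenarios = {
    "clickstream logs in a public cloud": (0.05, 10_000),
    "customer records on premises": (0.01, 2_000_000),
}

# Rank the scenarios by risk, lowest first.
for name, (p, c) in sorted(scenarios.items(), key=lambda kv: risk(*kv[1])):
    print(f"{name}: risk = {risk(p, c):,.0f}")
```

With these assumed numbers the clickstream data carries the higher loss probability but the much lower risk, which is the argument for moving it to a public cloud first.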

Strong encryption helps to treat compliance and legal issues. Control of:

Keys

Algorithms

Data


“Encryption helps…”


Data Usage Control

Serious cloud providers are offering corresponding services that even comply with the highest legal standards

Regional concepts to define the data location

Trusted data stores with encryption

Identity and access management

Data usage control via transactional trail logging

Who had what access to which resources, and when

Source: Amazon CloudTrail
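
The idea of a transactional trail (who, what, which resource, when) can be sketched as an append-only list of events. This is a hypothetical illustration, not the record format of Amazon CloudTrail; the class `AuditEvent`, the helpers `record` and `accesses_by`, and the sample users are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    who: str        # identity that performed the action
    what: str       # action that was performed
    resource: str   # resource that was touched
    when: datetime  # UTC timestamp of the action


trail = []  # append-only in-memory trail (a real system would persist it)


def record(who, what, resource):
    """Append one immutable event to the trail and return it."""
    event = AuditEvent(who, what, resource, datetime.now(timezone.utc))
    trail.append(event)
    return event


def accesses_by(user):
    """Return all events caused by the given user."""
    return [event for event in trail if event.who == user]


record("alice", "read", "dataset-42")
record("bob", "delete", "dataset-42")
print([(e.what, e.resource) for e in accesses_by("alice")])
```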


Data Movement

Globus Online: Fully managed and automated Big Data file transfer

SaaS: Separation of control plane and data plane (Third party transfer)


Source: https://www.globus.org/


Data Publication

What does it mean to publish?

Data is:

Identified

Described

Curated

Verifiable

Accessible

Preserved
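
Some of these properties can be shown on a toy record: an identifier, a description, free-form metadata and a checksum that makes the data verifiable. This is a sketch under stated assumptions; `PublishedDataset`, `publish`, `verify` and the identifier string are hypothetical and do not reflect any particular publication service.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PublishedDataset:
    identifier: str    # "Identified"
    description: str   # "Described"
    sha256: str        # "Verifiable": checksum of the payload
    metadata: dict = field(default_factory=dict)


def publish(identifier, description, payload, **metadata):
    """Create a publication record for a payload given as bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    return PublishedDataset(identifier, description, digest, metadata)


def verify(dataset, payload):
    """Check that a payload still matches the published checksum."""
    return hashlib.sha256(payload).hexdigest() == dataset.sha256


dataset = publish("example-id-001", "demo data", b"abc", domain="climate")
print(verify(dataset, b"abc"), verify(dataset, b"abd"))
```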


Data Discovery

What does it mean to discover?

Data can be:

Searched

Browsed

Accessed


Globus Data Publication Services

SaaS for publishing Big Data

Bring your own storage

Extensible metadata

Publication and curation workflows

Public and restricted collections

Rich discovery model

Architecture:

Communities create collections of datasets

Describe the metadata

Define policies for data access

Define processes for publication


Source: https://www.globus.org/data-publication


Supply Domain-Specific Metadata


Source: https://www.globus.org/data-publication


Search within and across Collections


Globus may be a pathfinder project to create open data markets

Source: https://www.globus.org/data-publication


Globus Genomics


Source: https://www.globus.org/genomics/


Big Data Toolbox


Google Innovations over the last twelve Years

[Figure: timeline with the years 2002, 2004, 2006, 2008, 2010, 2012, 2013, showing GFS, MapReduce, Big Table, Dremel, Colossus (GFS2), Spanner (Big Table 2) and Compute Engine; annotation: NewSQL]


Google Cloud Platform


Storage: Cloud Storage, Cloud SQL, Cloud Datastore

Compute: App Engine (PaaS), Compute Engine (IaaS)

Services: BigQuery, Cloud Endpoints

BigQuery: Big Data analysis tool

Analyze terabytes of data in seconds

Data imported in bulk or using streaming

Supports CSV and JSON

Browser tools, command line tool, or REST-API


Google Innovations over the last twelve Years

[Figure: the same timeline (2002 to 2013) with GFS, MapReduce, Big Table, Dremel, Colossus (GFS2), Spanner (Big Table 2), Compute Engine and NewSQL, with the added label “Open source:”]


Hadoop

Hadoop is a Big Data ecosystem that comprises:

Hadoop core utilities

HBase: A scalable, distributed database for large tables.

HDFS: A distributed file system.

Hive: A data warehouse for data summarization and ad hoc querying.

MapReduce: distributed processing on compute clusters.

Oozie: Workflow management.

Pig: A high-level data-flow language for parallel computation.

Spark: Ultra-fast in-memory computing.

ZooKeeper: coordination service for distributed applications.

And much more …
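
The MapReduce programming model from the list above can be illustrated with the classic word count. This is a single-process sketch in plain Python, not the Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` are illustrative, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big value"])
print(counts)
```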


From Hadoop to the Enterprise Data Hub

Open source, Scalable, Flexible, Cost effective

Managed ✖ Open Architecture ✖ Secure

CLOUDERA’S ENTERPRISE DATA HUB

3RD PARTY APPS

BATCH PROCESSING: MAPREDUCE

ANALYTIC SQL: IMPALA

SEARCH ENGINE: SOLR

MACHINE LEARNING: SPARK

STREAM PROCESSING: SPARK STREAMING

WORKLOAD MANAGEMENT: YARN

Unified scale-out storage for any type of data (UNIFIED, ELASTIC, RESILIENT, SECURE: SENTRY)

FILESYSTEM: HDFS

ONLINE NOSQL: HBASE

DATA MANAGEMENT: CLOUDERA NAVIGATOR

SYSTEM MANAGEMENT: CLOUDERA MANAGER

http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise.html


How do we conveniently treat Machine Data?

Sources and their fields:

Order System: Customer ID, Order ID, Product ID

Middleware Errors: Order ID, Customer ID

Callcenter IVR: Customer ID, Time Waiting On Hold

Twitter: Twitter ID, Customer’s Tweet, Company’s Name

Variety: Data are “poly-structured” and live in lots of formats
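
Correlating such poly-structured records usually means joining them on a shared key such as the customer ID. The following is a minimal sketch with invented sample records; the field names, the function `build_customer_view` and the lookup table from Twitter ID to customer ID are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records from three of the sources.
orders = [{"customer_id": 7, "order_id": 1001, "product_id": "A12"}]
ivr_calls = [{"customer_id": 7, "seconds_on_hold": 340}]
tweets = [{"twitter_id": "@jdoe", "text": "On hold forever..."}]
twitter_to_customer = {"@jdoe": 7}  # assumed mapping between the two IDs


def build_customer_view(orders, ivr_calls, tweets, twitter_to_customer):
    """Merge records from all sources into one view per customer."""
    view = defaultdict(lambda: {"orders": [], "hold_seconds": [], "tweets": []})
    for order in orders:
        view[order["customer_id"]]["orders"].append(order["order_id"])
    for call in ivr_calls:
        view[call["customer_id"]]["hold_seconds"].append(call["seconds_on_hold"])
    for tweet in tweets:
        customer_id = twitter_to_customer.get(tweet["twitter_id"])
        if customer_id is not None:  # skip tweets we cannot attribute
            view[customer_id]["tweets"].append(tweet["text"])
    return dict(view)


view = build_customer_view(orders, ivr_calls, tweets, twitter_to_customer)
print(view[7])
```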


The Data Warehouse

[Figure: machine data and operational intelligence]

Labels shown: HA Indexes and Storage, Search and Analytics, Proactive Monitoring, Operational Visibility, Real Time Business, Commodity Servers

Machine data sources shown: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID


Big Data Analytics in the Cloud: Amazon Redshift

Source: Amazon Redshift and DynamoDB

Fully managed, massively parallel relational data warehouse

Takes care of cluster management and distribution of data

Optimized for complex queries across many large tables

Use standard SQL & standard BI tools

Can be combined with Hadoop on-demand (Elastic MapReduce)


Microsoft Analytics Platform System

Bringing together Hadoop with the data warehouse (Windows Azure)

New: Microsoft machine learning service (Azure ML)


Research and Development


Ingredients of a Big Data Project

Technology

Data preparation

Scalable processing

Scalable platform

Mathematical analysis methods

Machine learning

Statistics

Optimization

Toolset

Natural Language Processing

Image processing

Visualization

Application

Real-world analysis problem


Cloud Computing


Smart Data Innovation Lab (SDIL)


www.sdil.de

Data sources from industry, research, public sector and Internet

Smart Data Innovation Lab: Research and Development, Big Data Technology

Knowledge (Analytics, Forecast)

Cooperation between industry and science to spur innovation

Pilot R&D projects on dedicated Big Data infrastructure


Architecture

IT and applied research on an open system + data sources from industry, research, public sector and Internet

Components shown: Open Source Repository, Domain specific Tools, Big Data Processing, HANA, Terracotta, Azure …, IT Research, Applied Science

Data sources: Central Storage, Distributed Data Sources, Catalog + Broker


Application Area


Operations – Platform & Tools

Data Innovation Communities

Energy, Smart Cities, Industry 4.0, Medicine

Working Group Data Curation

Working Group Law

Cross Topic Communities


Partners


Research and Development Areas


Applications

• Industry 4.0

• Logistics

• Smart Grids

• Smart City

• Personalized Medicine

Methods

• Data mining

• Machine Learning

• Statistical Analysis

• Predictive Analytics

• Tools

Storage

• Data Warehouses

• NoSQL Databases

• Column Stores

• In memory DBs

Processing

• Hadoop Engines

• Real time Analytics

• Software Defined Data Center

Representation

• Dashboards

• Visualization

• Rich Clients

• Collaboration Platforms


Predictive Analytics


Source: blue yonder

Descriptive


Prescriptive Analytics

From raw data to decision processes:

Process steps: Collection, Preparation, Integration, Analysis, Synthesis, Decision

Levels: Data, Information, Knowledge, Actionable Insight

Data analysis takes time and often works with offline data only

Is it possible to improve the process and react in real time?


Blue Yonder forward demand Architecture

[Figure: architecture diagram; recoverable labels follow]

Inputs: Historical Transaction Data, Real-Time Feed, Batch Input, Stream-Based Input

NeuroBayes System: High Performance Data Transformation, Trainer, Expert, Simulator (industry-specific), Expertise (industry-specific models)

Output: Predictions, Industry-specific Output

Operational System (Environment), Web UI & Services

Machine learning utilizing modern in-memory database technology

Direct integration into business processes (not just simple data-mining)


Future: Algorithm in Hardware


NeuroBayes machine learning algorithm on FPGA

Field Programmable Gate Array: (XILINX Virtex6 VLX75T)

Clock frequency: 250 MHz

Approx. 1 decision per clock cycle (fully pipelined architecture)

250 million decisions per second

Throughput: 100 Gbit/s

Interesting for real-time investigation of online streaming data
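
The quoted figures can be cross-checked with simple arithmetic. The sketch below recomputes the decision rate from the clock frequency and derives the data volume per decision; that last number is not stated on the slide and rests on the assumption that the 100 Gbit/s throughput is the input data rate at full pipeline utilization.

```python
clock_hz = 250_000_000                 # 250 MHz clock frequency
decisions_per_cycle = 1                # approx. one decision per clock cycle
throughput_bits_per_s = 100_000_000_000  # 100 Gbit/s

# Decision rate of the fully pipelined design.
decisions_per_second = clock_hz * decisions_per_cycle

# Derived value (assumption: throughput = input data consumed per second).
bits_per_decision = throughput_bits_per_s / decisions_per_second

print(decisions_per_second)   # decisions per second
print(bits_per_decision)      # input bits available per decision
```

Under that assumption each decision would consume about 400 bits (50 bytes) of input.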


Summary

Big Data depends on scalable dynamic models (Cloud is essential)

Big Data is interdisciplinary: Computer Science, Mathematics, …

Hadoop may assume the role of the data hub

Big data is more about value than about volume


KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association

Research Group Cloud Computing - Steinbuch Centre for Computing

www.kit.edu

Contact: [email protected]

Dr. Marcel Kunze

Karlsruhe Institute of Technology (KIT)

Forschungsgruppe Cloud Computing

Steinbuch Centre for Computing

Hermann-von-Helmholtz-Platz 1

D-76344 Eggenstein-Leopoldshafen