Top 10 Big Data Tools

  • date post

    21-Oct-2014
  • Category

    Technology


description

Happiest Minds enables organizations to conceptualize and drive a well-thought-out big data program across multiple domains and focus areas, helping them achieve the twin objectives of maximizing revenue and increasing operational efficiency. For more details, click here - http://www.happiestminds.com/big-data/ Also read our blog on Big Data at - http://www.happiestminds.com/blogs/what-do-you-do-with-your-gold-mine-of-insights/

Transcript of Top 10 Big Data Tools

Page 1: Top 10 Big Data Tools
Page 2: Top 10 Big Data Tools

SO WHAT DO YOU DO WITH YOUR GOLD MINE OF INSIGHTS?

Happiest Minds presents TOP 10 open source technologies that are the best in the market to harness, analyze and make the most sense out of Big Data.

We are in an ever-expanding marketplace, with shorter product lifecycles, evolving customer behavior and an economy that travels at the speed of light.

And information (which we now have more than enough access to) has gone on to be more about analytics and business relevance.

Page 3: Top 10 Big Data Tools

#1

You simply can't talk about big data without mentioning Hadoop

The Apache distributed data processing software is so pervasive that sometimes the terms "Hadoop" and "big data" get used synonymously

Hadoop is known for its ability to process extremely large data sets in both structured and unstructured formats, reliably replicating chunks of data to nodes in the cluster and making them available locally on the processing machine

The Apache Software Foundation also sponsors a number of related projects that extend the capabilities of Hadoop
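To make the replication idea concrete, here is a minimal Python sketch. It is not Hadoop's API or its actual placement policy: the function names, the round-robin placement and the tiny block size are all assumptions chosen for illustration. It only shows how splitting data into blocks and copying each block to several nodes lets a node work on data it already holds.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign every block to `replication` distinct nodes.

    Round-robin placement is an assumption made for this sketch; a real
    cluster uses a more elaborate, topology-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def local_blocks(placement: dict[int, list[str]], node: str) -> list[int]:
    """Blocks a node can process without fetching data from another node."""
    return [block_id for block_id, holders in placement.items() if node in holders]
```

With four hypothetical nodes and a replication factor of 2, each block lives on two machines, so losing one node does not lose the block, and `local_blocks` tells a node which pieces it can process locally.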

Page 4: Top 10 Big Data Tools

If Hadoop is the big data mahout, MapReduce happens to be its lifeline

A programming model and software framework for writing applications, MapReduce works to rapidly process vast amounts of data in parallel on large clusters of compute nodes

Widely used by Hadoop, as well as many other data processing applications
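The programming model can be sketched in a few lines of plain Python. This is a single-process word count, not Hadoop's MapReduce API: the function names are illustrative assumptions, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both steps can be spread over many machines, which is what makes the model suitable for large clusters.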

#2

MapReduce was originally developed by Google!

Page 5: Top 10 Big Data Tools

GridGain is Java-based middleware for faster in-memory processing of Big Data in real time

GridGain is compatible with the Hadoop Distributed File System

Runs on Windows, Linux or Mac OS X

#3

GridGain offers an alternative to MapReduce

Page 6: Top 10 Big Data Tools

Developed by LexisNexis Risk Solutions, HPCC is short for "high performance computing cluster"

HPCC Systems delivers data processing on a single platform, with a single architecture and a single programming language

Both free community versions and paid enterprise versions are available

#4

HPCC claims to offer superior performance to Hadoop

Page 7: Top 10 Big Data Tools

Storm differs from other tools in being a distributed, real-time, fault-tolerant processing system, unlike the batch processing systems of Hadoop

With its real-time computation capabilities, it is fast and highly scalable, and is often described as the "Hadoop of real-time"

It is fault-tolerant and works with nearly all programming languages, though Java is typically used
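The difference between stream and batch processing can be illustrated with Python generators. This sketch does not use Storm's API; the names `sentence_spout`, `split_bolt` and `count_bolt` are assumptions that only borrow Storm's vocabulary of sources ("spouts") and processing steps ("bolts"). Each item flows through the whole pipeline as soon as it arrives, and an updated result is emitted immediately instead of after the full data set has been read.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Source of the stream; in a real system this would be unbounded."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Processing step: split each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Processing step: emit a running (word, count) update per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]
```

Chaining the three steps, `count_bolt(split_bolt(sentence_spout(source)))`, yields an updated count after every word. Distribution and fault tolerance, which are the hard parts a real system handles, are left out of this sketch.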

#5

Originally open-sourced by Twitter, Storm is now part of the Apache family

Page 8: Top 10 Big Data Tools

Cassandra is a highly scalable NoSQL database for managing massive amounts of data across multiple data centers and the cloud

Used by many organizations with large, active datasets, including Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco and Digg

Commercial support and services are available through third-party vendors

#6

Originally developed by Facebook, it is now managed by the Apache Foundation

Page 9: Top 10 Big Data Tools

HBase is the non-relational data store for Hadoop

Being a column-oriented database management system, HBase is well suited for sparse data sets and is written in Java

Applications can access it through gateway APIs such as Avro, REST and Thrift

Features include linear and modular scalability, strictly consistent reads and writes, automatic failover support and much more
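Why a column-oriented layout suits sparse data can be shown with a toy table in Python. This is not the HBase client API: the `SparseTable` class and its methods are hypothetical, and only the "family:qualifier" column naming is borrowed from HBase. The point is that a cell which was never written takes up no space at all.

```python
class SparseTable:
    """Toy row store in which each row keeps only the cells that were written."""

    def __init__(self):
        # row key -> {"family:qualifier": value}
        self._rows = {}

    def put(self, row, column, value):
        """Write one cell; other columns of the row stay absent."""
        self._rows.setdefault(row, {})[column] = value

    def get(self, row, column, default=None):
        """Read one cell, returning `default` if it was never written."""
        return self._rows.get(row, {}).get(column, default)

    def stored_cells(self):
        """Number of cells actually held in memory."""
        return sum(len(columns) for columns in self._rows.values())
```

Two rows that each fill a different column store two cells in total, whereas a fixed-schema table with the same two columns would reserve four.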

#7

Developed as part of the Apache Hadoop project, HBase runs on top of the Hadoop Distributed File System

Page 10: Top 10 Big Data Tools

MongoDB was originally developed by 10gen and designed to support humongous databases

It's a NoSQL database written in C++ with document-oriented storage, full index support, replication and high availability and scales horizontally without compromising functionality

Commercial support is available through 10gen (since renamed MongoDB, Inc.)

#8

The name MongoDB comes from the term ‘humongous’, and it is the most popular NoSQL database system

Page 11: Top 10 Big Data Tools

Neo4j boasts performance improvements of up to 1000x or more versus relational databases

Stores data structured in graphs instead of tables and is a disk-based, fully transactional Java engine

Organizations can purchase advanced and enterprise versions from Neo Technology
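What storing data in graphs instead of tables means for a query can be sketched in Python. This is not Neo4j's API or its query language: the adjacency dictionary and the `friends_of_friends` function are assumptions for illustration. A relationship query becomes a walk along edges from a starting node, rather than a join over whole tables.

```python
def friends_of_friends(graph, person):
    """Return people two hops from `person`, excluding direct friends.

    `graph` maps each person to the set of people they are connected to.
    """
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each edge out of the friend; no table scan is involved.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}
```

The work done depends on how many edges surround the starting node, not on the total size of the data set, which is the usual argument for graph storage on heavily connected data.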

#9

Developed by Neo Technology, this is the world’s leading graph database

Page 12: Top 10 Big Data Tools

CouchDB stores data in JSON documents that can be accessed over the web or queried using JavaScript

Offers distributed scaling with fault-tolerant storage

Key features include on-the-fly document transformation, real-time change notifications and an easy-to-use web administration console
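The document-plus-query idea can be sketched in Python. CouchDB itself exposes documents over HTTP and defines its queries as JavaScript map functions; the code below does neither. `load_docs`, `build_view` and `by_tag` are hypothetical names, and the sketch only mimics the shape of the approach: a map function is run over every JSON document, and the key/value rows it emits are kept sorted by key.

```python
import json


def load_docs(json_strings):
    """Parse JSON text into documents (plain dictionaries)."""
    return [json.loads(text) for text in json_strings]


def by_tag(doc):
    """Example map function: emit one (tag, title) row per tag."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


def build_view(docs, map_fn):
    """Apply `map_fn` to every document and return its rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    return sorted(rows, key=lambda row: row["key"])
```

Documents do not need a shared schema: a document without a `tags` field simply contributes no rows to this view.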

#10

Another one from the Apache Foundation, CouchDB is completely made for the web