Dell Blueprint for Big Data and Analytics November 2015 Reference Architectures and Engineered Solutions

Dell Blueprint for Big Data and Analytics

November 2015

Reference Architectures and Engineered Solutions

2 Dell Blueprints

Big Data and Analytics Blueprint Portfolio

Consulting and Deployment:

Custom – see services details for each offer

Training: R730, SC4020

ProSupport Plus

Training Credits: SQL, MS Analytics

ProSupport Plus

Dell Software

Suite

Statistica Data Analytics Suite

Dell Boomi Integration Tools

Dell Toad Data Management

Dell SharePlex Replication Connector for Hadoop

Reference Architectures

Dell | Cloudera Apache Hadoop Solution: Start with up to 15 nodes, scales to 45 nodes, scales to 45+ nodes

Dell SQL DWFT: Start with R730/PS6210S at 17 TB, scales on R730xd to 21 TB, scales on R730/PS6210S to 26 TB, scales on R730/SC4020 to 55 TB

Dell | Cloudera | Syncsort Data Warehouse Optimization for ETL Offload RA (June 19, 2015)

Engineered Solutions

Dell QuickStart for Cloudera Hadoop: 5 nodes

Dell In-Memory Appliance for Cloudera Enterprise: Start with 8 nodes, scales to 16 nodes, scales from 24 to 48 nodes

Microsoft APS Appliance: PDW with 3 nodes, scales PDW + Hadoop to 6 nodes, scales PDW + Hadoop from 9 to 54 nodes

SAP HANA Appliance: Single-server configurations scale from 128 GB to 1.5 TB RAM;

scale-out cluster configurations scale from 2 to 16 TB RAM (up to 24 TB with R930, due September 2015)

ES Implementations:

Deployment: APS JumpStart

SERVICES

RA Implementations:

Engage your Big Data Overlay Sales Team

3 Dell Blueprints

Reference Architectures: Data Warehouse Fast Track (DWFT) for SQL Server 2014, in 17 TB, 21 TB, 26 TB and 55 TB configurations

Solution benefits
• Integrated, balanced and verified reference architectures jointly engineered with Microsoft.
• Capacity ranging from 17 TB to 55 TB.
• Internal storage or SAN storage.
• iSCSI and Fibre Channel networking.
• Dell’s 13G server platform and all-flash Dell storage arrays.
• Feature-rich SAN storage.

Dell differentiation
• Faster deployments: pre-configured, Dell-led solution.
• Reduced risk: out-of-the-box offerings.
• DWFT-validated RA: optimized data warehouse performance that avoids over-provisioning of hardware resources.
• Single point of contact and accountability for purchases, services and support, backed by deep expertise built over 25 years.

Link to DWFT RAs

Dell PowerEdge servers: R730/R730XD

Dell storage: PS6210S/SC4020

Dell networking switches: S4810

MS Windows Server 2012 R2

Dell OpenManage / iDRAC / Lifecycle Controller

MS SQL Server 2014
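The four DWFT capacity points (17, 21, 26 and 55 TB) and their server/storage pairings, as listed in the portfolio overview, can be captured as data for a quick first-pass lookup. This is a minimal sketch, not a Dell sizing tool: the `DwftConfig` type and `smallest_config` helper are hypothetical, the storage for the 21 TB R730xd configuration is left unset because the slides do not name it, and real DWFT sizing would also weigh query throughput, not just capacity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DwftConfig:
    capacity_tb: int          # capacity as listed on the slides
    server: str               # PowerEdge model
    storage: Optional[str]    # storage array; None where the slides do not name one


# The four DWFT configurations listed in this blueprint.
DWFT_CONFIGS = (
    DwftConfig(17, "R730", "PS6210S"),
    DwftConfig(21, "R730xd", None),  # storage not stated; presumably internal drives
    DwftConfig(26, "R730", "PS6210S"),
    DwftConfig(55, "R730", "SC4020"),
)


def smallest_config(required_tb: float) -> Optional[DwftConfig]:
    """Return the smallest listed configuration whose capacity covers required_tb, or None."""
    for cfg in sorted(DWFT_CONFIGS, key=lambda c: c.capacity_tb):
        if cfg.capacity_tb >= required_tb:
            return cfg
    return None  # larger than any listed configuration


if __name__ == "__main__":
    print(smallest_config(20))  # the 21 TB R730xd configuration
```

Returning `None` above 55 TB, rather than guessing, keeps the helper limited to what the blueprint actually lists.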

4 Dell Blueprints

Dell | Cloudera Apache Hadoop Solution Reference Architecture

Flexible and scalable solution that simplifies Apache Hadoop

Minimize complexity through an engineered, validated solution based on extensive customer experience

• Scale-out hardware architecture: PowerEdge R730 and R730xd servers with high-performance Dell S-Series networking.

• Based on Cloudera Enterprise (Apache Hadoop) and Red Hat Enterprise Linux.

• Comprehensive and collaborative service and support for the entire solution through its complete lifecycle.

The Dell difference

• Achieve flexibility with a reference architecture approach that allows choice and provides guidance.

• Detailed reference architecture documentation.

• Deployment guidelines detail best practices based on extensive experience with production deployments.

• Increased efficiency: PowerEdge servers are feature- and power-optimized to lower TCO while saving space and energy.

Link to Dell | Cloudera Apache Hadoop RA

Link to the Solution Brief

Store, process and analyze all your data

Dell PowerEdge servers

Dell networking switches

Dell Statistica

Dell services

Cloudera Enterprise

Open

Governed

Managed

Secure

Apache Hadoop

5 Dell Blueprints

Dell | Cloudera | Syncsort Data Warehouse Optimization for ETL Offload Reference Architecture

The first and only reference architecture for ETL offload with Hadoop

Scalable ETL with the flexibility of a reference architecture

• Scale-out hardware architecture: PowerEdge R730 and R730xd servers with high-performance Dell S-Series networking.

• Tight integration between Dell, Cloudera and Syncsort provides ease of deployment and maintenance, with no performance impact or hurdles down the road.

• Close the Skills Gap by eliminating the need to develop expertise on MapReduce, Pig, Hive, and Sqoop.

• Fast Track Projects with automated conversion of legacy SQL scripts into efficient ETL processes in Hadoop without any coding.

• Comprehensive and collaborative service and support for the entire solution through its complete lifecycle.
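To make the skills-gap point concrete, here is what a single SQL aggregation looks like when hand-written as map, shuffle and reduce steps, the kind of code that automated conversion is meant to spare developers from writing. This is an illustrative pure-Python sketch of the MapReduce pattern, assuming a hypothetical `sales` table with `region` and `amount` columns; it is not Syncsort output and does not use any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical legacy SQL being offloaded:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;


def map_phase(rows: Iterable[dict]) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per row: the GROUP BY column and the column to sum."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle_phase(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Collect every value under its key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the SQL aggregate (SUM) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(rows: Iterable[dict]) -> Dict[str, float]:
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    sales = [
        {"region": "east", "amount": 10.0},
        {"region": "west", "amount": 7.0},
        {"region": "east", "amount": 5.0},
    ]
    print(group_by_sum(sales))  # {'east': 15.0, 'west': 7.0}
```

On a cluster, the map and reduce functions would run in parallel across nodes and the shuffle would move data over the network; the single-process version only shows the shape of the logic.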

The Dell difference

• Faster time to value through an optimized solution jointly designed by three market leaders.

• Detailed reference architecture documentation.

• Deployment guidelines detail best practices based on extensive experience with production deployments.

Link to Dell | Cloudera | Syncsort DWO – ETL Offload RA

Link to the ETL Offload Solution Brief

6 Dell Blueprints

Dell In-Memory Appliance for Cloudera Enterprise

Big Data appliance optimized for in-memory analytics

Reference architecture scalability with the simplicity of an appliance

• Scale-out hardware architecture with predefined configurations, scalable in 4-node increments.

• Delivered assembled and ready to install, with minimal site integration requirements.

• Delivered with Cloudera Enterprise, Apache Spark, and Cloudera Impala ready to run.

• Optimized for interactive in-memory analytics, including analysis of streaming data from connected devices and embedded sensors.

• Comprehensive and collaborative service and support for the entire solution through its complete lifecycle.

The Dell difference

• Based on the established Dell Cloudera Reference Architecture.

• Faster time to value with a pre-configured, turnkey data platform.

• Increased efficiency: PowerEdge servers are feature- and power-optimized to lower TCO while saving space and energy.

Starter Configuration

8-node cluster
• PowerEdge R730: 4 infrastructure nodes with ProSupport
• PowerEdge R730XD: 4 data nodes with ProSupport
• Cloudera Enterprise
• Dell Networking using S4048-ON and S3048-ON switches
• Dell Rack 42U
• ~176 TB raw disk space

Mid-Size Configuration

16-node cluster
• PowerEdge R730: 4 infrastructure nodes with ProSupport
• PowerEdge R730XD: 12 data nodes with ProSupport
• Cloudera Enterprise
• Dell Networking using S4048-ON and S3048-ON switches
• Dell Rack 42U
• ~528 TB raw disk space

Small Enterprise Configuration

24-node cluster
• PowerEdge R730: 4 infrastructure nodes with ProSupport
• PowerEdge R730XD: 20 data nodes with ProSupport
• Cloudera Enterprise
• Dell Networking using S4048-ON and S3048-ON switches
• Dell Rack 42U
• ~880 TB raw disk space
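The three listed configurations are internally consistent: each keeps 4 infrastructure nodes, and raw capacity divided by data nodes is 44 TB in every case (176/4, 528/12, 880/20). A minimal sketch of that arithmetic follows. It assumes the 4 infrastructure nodes stay fixed at every size and that sizes between and beyond the listed ones follow the same 4-node increments up to the 48 nodes named in the portfolio overview. The function name is hypothetical, and results for unlisted sizes are extrapolations, not Dell specifications.

```python
INFRA_NODES = 4               # every listed configuration has 4 R730 infrastructure nodes
RAW_TB_PER_DATA_NODE = 44     # derived: 176/4 = 528/12 = 880/20 = 44 TB raw per data node
NODE_INCREMENT = 4            # "scalable in 4-node increments"
MIN_NODES, MAX_NODES = 8, 48  # starter size; upper bound from the portfolio overview


def raw_capacity_tb(total_nodes: int) -> int:
    """Estimate raw disk capacity for a cluster (assumes 4 infrastructure nodes at every size)."""
    if not MIN_NODES <= total_nodes <= MAX_NODES or total_nodes % NODE_INCREMENT:
        raise ValueError(f"{total_nodes} is not a supported size (8-48 nodes in steps of 4)")
    data_nodes = total_nodes - INFRA_NODES
    return data_nodes * RAW_TB_PER_DATA_NODE


if __name__ == "__main__":
    for size in (8, 16, 24):
        print(size, raw_capacity_tb(size))
```

Running it prints 176, 528 and 880 TB for the 8-, 16- and 24-node sizes, matching the configurations above.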

Spec Sheet for the Dell In-Memory Appliance for Cloudera Enterprise

Link to the Solution Brief

7 Dell Blueprints

Dell QuickStart for Cloudera Hadoop

Cost-effective, all-in-one starter bundle for testing and building a Hadoop proof of concept

Easy, affordable, flexible Hadoop starting point

• Five-node Hadoop cluster with PowerEdge R730 and R730xd servers and Dell networking; Cloudera Enterprise Basic Edition and Red Hat Enterprise Linux included.

• Easy: Dell QuickStart for Cloudera Hadoop includes all hardware, software, networking, training and services.

• Affordable: Build a full Hadoop proof of concept for under $150K.

• Flexible: Build a proof of concept that can later be upgraded to a full production cluster.

The Dell difference

• Upgradeable to the full Dell Cloudera Reference Architecture.

• Initial services jumpstart included.

Dell switch

Dell R730: 2x infrastructure nodes

Dell R730XD: 3x data nodes

Link to Datasheet for the Dell QuickStart for Cloudera Hadoop

8 Dell Blueprints

Engineered Solutions for SAP HANA

Modular, complete in-memory appliance for real-time data analytics

Link to Solution Brief

Link to Tech Sheet

Single server configuration

Scale-out cluster configuration

Engineered solutions SAP HANA detail

• PowerEdge R930 node for all configurations: 4U/4-socket Intel E7-8890v3 (R930 certified for scale-out in September).

• Delivered fully configured with either SLES or RHEL and ready for SAP HANA license keys to be applied.

• Deployment services included in appliance SKU; full SAP HANA transformation consulting and managed services available.

• 38% faster than the next competitor in the SAP BW-EML 1B record benchmark (www.sap.com/benchmark).

The Dell difference

Key differentiation:

• Common node platform, from the smallest to the largest, simplifies the management and maintenance of your system.

• Modular scalability lets you grow your scale-out system without disruption or “rip and replace”.

• Single vendor for every aspect of the solution, end to end.

9 Dell Blueprints

Microsoft Analytics Platform System by Dell

• Integrated compute, storage, networking and software appliance for high-performance database workloads.

• Microsoft APS software aggregates, stores and queries relational (SQL) and non-relational (Hadoop) data within the solution.

• Includes Jumpstart services (3 weeks) for customer training and architecture design.

The Dell difference

• Massively parallel processing (MPP) appliance delivering up to 100x improvement over SMP database workloads.

• Highly scalable: from 3 to 54 nodes across up to 6 racks, and from 21 TB to 6 PB, with scale-out expansion 3 nodes at a time.

• White-glove delivery and installation: delivered as a fully built appliance with software installed and configured (EDT) for the customer, with training services (GICS).

Link to APS Solution Brief

Link to Jumpstart Services

Engineered Solution: Microsoft Analytics Platform System by Dell

Real-time management of relational (SQL) and non-relational (Hadoop) data

x2 | SX6036 InfiniBand switches
x2 | N3048 Ethernet switches
x2 | R630 management nodes
x2 | R630 nodes added when HDInsight is included in the first rack (optional)

3rd Scale Unit for 9 nodes (optional): x3 | R630 compute nodes, x2 | MD3060e JBODs (102 drives / 18 spare)

2nd Scale Unit for 6 nodes (optional): x3 | R630 compute nodes, x2 | MD3060e JBODs (102 drives / 18 spare)

Base unit for 3 nodes: x3 | R630 compute nodes, x2 | MD3060e JBODs (102 drives / 18 spare)

Scales from 3 nodes to 54 nodes across 6 racks (up to 6PB)
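The rack layout implies simple scaling arithmetic: each scale unit adds 3 R630 compute nodes and 2 MD3060e JBODs (102 active drives plus 18 spares), three scale units fill a rack (9 nodes), and 6 racks give the 54-node maximum. The sketch below derives those counts for a given node total. It is an illustrative calculation with hypothetical names, and it deliberately ignores the switches, management nodes and optional HDInsight nodes in the first rack.

```python
import math
from dataclasses import dataclass

NODES_PER_SCALE_UNIT = 3            # "scale-out expansion 3 nodes at a time"
SCALE_UNITS_PER_RACK = 3            # base unit + 2nd + 3rd scale unit = 9 nodes per rack
MAX_RACKS = 6                       # "up to 6 racks"
JBODS_PER_SCALE_UNIT = 2            # x2 MD3060e per scale unit
ACTIVE_DRIVES_PER_SCALE_UNIT = 102
SPARE_DRIVES_PER_SCALE_UNIT = 18


@dataclass(frozen=True)
class ApsLayout:
    compute_nodes: int
    scale_units: int
    racks: int
    jbods: int
    active_drives: int
    spare_drives: int


def aps_layout(compute_nodes: int) -> ApsLayout:
    """Derive scale units, racks and drive counts for a number of R630 compute nodes.

    Switches, management nodes and optional HDInsight nodes are not counted.
    """
    max_nodes = NODES_PER_SCALE_UNIT * SCALE_UNITS_PER_RACK * MAX_RACKS  # 54
    if not NODES_PER_SCALE_UNIT <= compute_nodes <= max_nodes or compute_nodes % NODES_PER_SCALE_UNIT:
        raise ValueError(f"compute nodes must be a multiple of 3 between 3 and {max_nodes}")
    units = compute_nodes // NODES_PER_SCALE_UNIT
    return ApsLayout(
        compute_nodes=compute_nodes,
        scale_units=units,
        racks=math.ceil(units / SCALE_UNITS_PER_RACK),
        jbods=units * JBODS_PER_SCALE_UNIT,
        active_drives=units * ACTIVE_DRIVES_PER_SCALE_UNIT,
        spare_drives=units * SPARE_DRIVES_PER_SCALE_UNIT,
    )


if __name__ == "__main__":
    print(aps_layout(9))         # one full rack: 3 scale units, 6 JBODs
    print(aps_layout(54).racks)  # 6
```

A 12-node system, for example, needs a second rack for its fourth scale unit.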

10 Dell Blueprints

Software: Dell SharePlex

Database replication and migration

Dell SharePlex

• Database replication (Oracle to Oracle) and near real-time data integration.
• SharePlex saves DBAs more than five hours a day by automating replication, while also increasing accuracy.
• Only SharePlex provides data compare and repair, in-flight data integrity, and monitoring and alerting in one affordable solution.
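Data compare and repair means, in essence: find the rows where source and target disagree, then generate the operations that bring the target back in line. The following toy sketch shows that idea on in-memory snapshots keyed by primary key. It is a hypothetical illustration of the concept only; the function names are invented, it is not how SharePlex implements compare and repair, and a real tool works against live databases rather than Python dictionaries.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
RepairOp = Tuple[str, Any, Optional[Row]]  # (operation, primary key, row or None)


def compare(source: Dict[Any, Row], target: Dict[Any, Row]) -> List[RepairOp]:
    """List the operations that would make target match source, keyed by primary key."""
    ops: List[RepairOp] = []
    for key, row in source.items():
        if key not in target:
            ops.append(("INSERT", key, row))   # row missing on the target
        elif target[key] != row:
            ops.append(("UPDATE", key, row))   # row differs: source wins
    for key in target:
        if key not in source:
            ops.append(("DELETE", key, None))  # extra row on the target
    return ops


def repair(target: Dict[Any, Row], ops: List[RepairOp]) -> Dict[Any, Row]:
    """Return a copy of target with the repair operations applied."""
    repaired = dict(target)
    for op, key, row in ops:
        if op == "DELETE":
            del repaired[key]
        else:
            repaired[key] = row
    return repaired


if __name__ == "__main__":
    src = {1: {"name": "a"}, 2: {"name": "b"}}
    tgt = {1: {"name": "a"}, 2: {"name": "x"}, 3: {"name": "c"}}
    ops = compare(src, tgt)
    print(ops)  # [('UPDATE', 2, {'name': 'b'}), ('DELETE', 3, None)]
    assert repair(tgt, ops) == src
```

Treating the source as authoritative keeps the repair step one-directional, which matches the one-way Oracle-to-target replication described on this slide.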

System requirements

Platform: UNIX®, Linux®, Windows®
Memory: SharePlex processes are 64-bit and can exceed 4 GB; per-process memory of 256 MB or more.
Additional software: SQL*Plus®
Source environments: Oracle
Target environments: Oracle, Microsoft SQL Server, SAP ASE, Hadoop, Java Message Service (JMS), File
See the platform-specific pre-installation checklist in the installation guide for additional system and database requirements.
For replication, migration or data integration from Oracle to Hadoop, go to SharePlex Connector for Hadoop.

Dell PowerEdge servers

Source databases: Oracle

Target databases: Oracle, SQL Server, Hadoop, SAP ASE, more…

SharePlex database replication

On-premises / Remote / In-the-Cloud

Oracle EBS / PeopleSoft / Siebel / SAP / more…

CRM / Finance / HR / Web Apps / BI

Dell network switches

Dell storage

Link to Datasheet for Dell SharePlex

11 Dell Blueprints

Software: Dell Statistica

Predictive analytics platform

Dell Statistica

Dell Statistica is an advanced analytics platform that enables organizations to transform unstructured, semi-structured and structured data into actionable business decisions. Statistica excels at building predictive models that forecast future outcomes.
• Challenger in the Gartner Magic Quadrant for Advanced Analytics Platforms.
• Professional services available for standing up models and driving better reports.

System requirements

Compatible with Windows® XP, Windows Server® 2003 and 2008, Windows Vista®, and Windows 7 and 8.
Client requirements: Windows XP (Windows 7 or above recommended); 512 MB RAM (1 GB recommended); 500 MHz processor (2.0 GHz, 64-bit, dual-core recommended).
Server requirements: Windows Server 2008 R2 or later; 2 GB RAM (8 GB recommended); 1.0 GHz processor (2.0 GHz, 64-bit, dual-core recommended); 2.5 GB disk space; 100 Mb/s or faster network bandwidth.
For complete system requirements, please visit statsoft.com/Products/Licensing.
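The client and server minimums above lend themselves to a quick pre-installation check. The sketch below encodes the RAM, CPU clock, disk and network figures from this slide (treating 1 GB as 1024 MB) and classifies a host as meeting the recommended tier, the minimum tier, or neither. The `Host` type and `classify` function are hypothetical, and the check ignores the operating-system, 64-bit and dual-core criteria, so it is a rough screen rather than a substitute for the vendor's published requirements. The Toad slide that follows lists the same figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    ram_mb: int
    cpu_mhz: int
    disk_gb: float = 0.0
    network_mbps: int = 0


# Figures from the system requirements above (1 GB taken as 1024 MB).
# OS version, 64-bit and dual-core criteria are NOT checked.
REQUIREMENTS = {
    "client": {"minimum": Host(512, 500), "recommended": Host(1024, 2000)},
    "server": {
        "minimum": Host(2048, 1000, 2.5, 100),
        "recommended": Host(8192, 2000, 2.5, 100),
    },
}


def meets(host: Host, req: Host) -> bool:
    """True if every resource on host is at least the required amount."""
    return (
        host.ram_mb >= req.ram_mb
        and host.cpu_mhz >= req.cpu_mhz
        and host.disk_gb >= req.disk_gb
        and host.network_mbps >= req.network_mbps
    )


def classify(host: Host, role: str) -> str:
    """Return 'recommended', 'minimum' or 'below minimum' for role 'client' or 'server'."""
    tiers = REQUIREMENTS[role]
    if meets(host, tiers["recommended"]):
        return "recommended"
    if meets(host, tiers["minimum"]):
        return "minimum"
    return "below minimum"


if __name__ == "__main__":
    print(classify(Host(ram_mb=4096, cpu_mhz=2000, disk_gb=10, network_mbps=1000), "server"))
```

A 4 GB server clears the 2 GB minimum but not the 8 GB recommendation, so the example prints `minimum`.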

Dell Statistica

Analytics

Multiple users

Dell PowerEdge servers

Dell — Single user

Link to Datasheet for Dell Statistica

12 Dell Blueprints

Software: Dell Toad Data Point and Toad Intelligence Central

Data prep and cleansing

Dell Toad Data Point and Toad Intelligence Central

Toad Data Point is a cross-platform query, data integration and preparation tool that simplifies data access, analysis and provisioning for data management professionals. It is built specifically for data analysts, providing nearly limitless data connectivity, desktop data integration, visual query building and workflow automation.
• Improved data access
• Desktop data integration
• Data preparation
• Improved productivity
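A typical desktop data-preparation task of the kind described above is to cleanse two sources, remove duplicates and join them. The following plain-Python sketch shows those steps on hypothetical CRM and billing extracts keyed by email address. The field names and functions are illustrative assumptions and are not Toad Data Point's interface.

```python
from typing import Dict, List


def clean_email(value: str) -> str:
    """Normalize an email key: trim surrounding whitespace and lower-case it."""
    return value.strip().lower()


def prepare(crm_rows: List[dict], billing_rows: List[dict]) -> List[dict]:
    """Cleanse and de-duplicate CRM rows, then left-join billing totals by email."""
    customers: Dict[str, dict] = {}
    for row in crm_rows:
        email = clean_email(row["email"])
        if email and email not in customers:  # skip blanks; first occurrence wins
            customers[email] = {"email": email, "name": row["name"].strip()}

    totals: Dict[str, float] = {}
    for row in billing_rows:
        email = clean_email(row["email"])
        totals[email] = totals.get(email, 0) + row["amount"]

    # Left join: every cleansed customer appears, with 0 when there is no billing data.
    return [{**cust, "total": totals.get(email, 0)} for email, cust in sorted(customers.items())]


if __name__ == "__main__":
    crm = [
        {"email": " Ann@Example.com ", "name": "Ann "},
        {"email": "ann@example.com", "name": "Ann B"},  # duplicate after cleansing
        {"email": "bob@example.com", "name": "Bob"},
    ]
    billing = [
        {"email": "ANN@example.com", "amount": 30},
        {"email": "ann@example.com", "amount": 12},
    ]
    for record in prepare(crm, billing):
        print(record)
```

Normalizing the join key before de-duplicating is the important ordering choice: done the other way round, " Ann@Example.com " and "ann@example.com" would survive as two customers.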

System requirements

Compatible with Windows® XP, Windows Server® 2003 and 2008, Windows Vista®, and Windows 7 and 8.
Client requirements: Windows XP (Windows 7 or above recommended); 512 MB RAM (1 GB recommended); 500 MHz processor (2.0 GHz, 64-bit, dual-core recommended).
Server requirements: Windows Server 2008 R2 or later; 2 GB RAM (8 GB recommended); 1.0 GHz processor (2.0 GHz, 64-bit, dual-core recommended); 2.5 GB disk space; 100 Mb/s or faster network bandwidth.
For complete system requirements, please visit statsoft.com/Products/Licensing-Options/System-Requirements.

Dell PowerEdge servers

User

Toad Intelligence Central

Data sources

Toad Data Point

Dell Statistica

Analytics

Link to Datasheet for Dell Toad Data Point

Link to Datasheet for Dell Toad Intelligence Central

13 Dell Blueprints

Software: Dell Boomi

Cloud-based data integration platform

Dell Boomi

The Dell Boomi AtomSphere platform was designed and implemented from the ground up as an elastic, multi-tenant, hosted platform. It is not a retrofit of a traditional software solution in which multi-tenancy is achieved through multiple installation instances. Dell Boomi has a proven tenant-isolation implementation that achieves isolation at the process, data and management levels by:

• Assigning a unique identifier to each account and tagging all objects associated with the account with this ID.

• Using roles and permissions to control access to account objects and management functions.

• Encapsulating all integration workflow, transformation rules, business logic validations and connector operations as metadata bound to a specific customer account.

• Deploying workflow configuration metadata to an Atom, which acts on it to perform the execution of an integration process.
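The four isolation mechanisms listed above (per-account identifiers, roles and permissions, account-bound metadata, and deployment to an Atom) can be modeled in a few classes. This is a toy sketch under stated assumptions: `TenantStore`, `IntegrationProcess`, the permission names and the `Atom` stand-in are all hypothetical, and nothing here reflects the actual AtomSphere implementation or API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class IntegrationProcess:
    account_id: str  # every object is tagged with its owning account's ID
    name: str
    metadata: dict   # workflow, transformation rules, validations, connector operations


@dataclass
class Atom:
    """Hypothetical runtime stand-in: receives deployed metadata it would then execute."""
    deployed: List[IntegrationProcess] = field(default_factory=list)

    def receive(self, process: IntegrationProcess) -> None:
        self.deployed.append(process)


class TenantStore:
    def __init__(self) -> None:
        self._objects: List[IntegrationProcess] = []
        self._perms: Dict[Tuple[str, str], Set[str]] = {}  # (account_id, user) -> permissions

    def create_account(self) -> str:
        return str(uuid.uuid4())  # unique identifier per account

    def grant(self, account_id: str, user: str, *perms: str) -> None:
        self._perms.setdefault((account_id, user), set()).update(perms)

    def _require(self, account_id: str, user: str, perm: str) -> None:
        if perm not in self._perms.get((account_id, user), set()):
            raise PermissionError(f"{user} lacks '{perm}' on account {account_id}")

    def save(self, account_id: str, user: str, name: str, metadata: dict) -> None:
        self._require(account_id, user, "write")
        self._objects.append(IntegrationProcess(account_id, name, metadata))

    def list_processes(self, account_id: str, user: str) -> List[IntegrationProcess]:
        self._require(account_id, user, "read")
        # Only objects tagged with this account's ID are ever visible.
        return [p for p in self._objects if p.account_id == account_id]

    def deploy(self, account_id: str, user: str, name: str, atom: Atom) -> None:
        self._require(account_id, user, "deploy")
        for p in self._objects:
            if p.account_id == account_id and p.name == name:
                atom.receive(p)
                return
        raise KeyError(name)  # names are only resolved within the caller's account
```

With two accounts created, a user granted rights on one account cannot list or deploy the other account's processes, because every lookup checks permissions first and then filters on the account ID.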

Dell Boomi

Application Application

Link to Datasheet for Dell Boomi
