Cisco Data Intelligence Platform



Solution overview
Cisco public

Overview

The digital disruption caused by unprecedented growth in data is forcing organizations to redefine business processes to compete more effectively. Data scientists are constantly searching for newer techniques and methodologies that can unlock the value of data and distill this data further to identify additional insights that could transform productivity and provide business differentiation.

One area that seems extremely promising for extracting value from this data glut, and has seen tremendous development recently, is Artificial Intelligence/Machine Learning (AI/ML).

Data scientists need to be able to operate on both a data lake and an AI/ML platform without worrying about the underlying infrastructure, thus placing stringent demands on their IT organization, which needs to grow to cloud scale while reducing the TCO of this infrastructure without affecting utilization.

Cisco Data Intelligence Platform (CDIP) is a cloud-scale architecture that brings together big data, AI/compute farms, and storage tiers to work as a single entity, while each can still scale independently to address the IT issues in the modern data center. CDIP combines a fully scalable infrastructure with centralized management and a fully supported software stack (in partnership with industry leaders) for each of the three independently scalable components of the architecture: the data lake, AI/ML technologies, and object stores.

Cisco Data Intelligence Platform
Big data meets AI: Modernizing data lakes

Benefits

• Perform extremely fast data ingestion and data engineering at the data-lake level

• Use various types of AI frameworks and compute (including GPUs, CPUs, and FPGAs) to work on the data for advanced analytics with an AI/compute farm

• Gradually retire data as needed to a storage-dense system with a lower cost-per-terabyte ratio for better TCO, thanks to storage tiering

• Seamlessly scale the architecture to thousands of nodes with single-pane-of-glass management using Cisco Application Centric Infrastructure (Cisco ACI™) and Cisco Intersight™

© 2019 Cisco and/or its affiliates. All rights reserved.


Data scientist vs. IT landscape

While data scientists want to use the latest advancements in AI/ML software and hardware technologies on their data sets, the IT team is busy building and administering the organization’s data lake. This has led to architecturally siloed implementations. When data that has been ingested, worked on, and processed in a data lake needs to be used further by AI/ML frameworks, it often leaves the platform and has to be onboarded onto a different platform for processing. This would be fine if the demand were seen on only a small percentage of workloads. However, AI/ML workloads are increasingly used to process data inside a data lake. For instance, data lakes in customer environments are seeing a deluge of data from new use cases, such as IoT, autonomous driving, smart cities, genomics, and financials, all of which are seeing more and more demand for AI/ML processing of this data.

IT is demanding newer solutions to enable data scientists to operate on both a data lake and an AI/ML platform (or a compute farm) without worrying about the underlying infrastructure. IT also needs this infrastructure to seamlessly grow to cloud scale while reducing the TCO and without affecting utilization. These demands drive the need to plan the data lake along with the AI/ML platform in a systemic fashion.

Seeing this increasing demand by IT, and envisioning the solution as a natural extension of the data lake, we are pleased to introduce the Cisco Data Intelligence Platform.

The Cisco Data Intelligence Platform

The Cisco Data Intelligence Platform (CDIP) brings big data, AI/compute farms, and tiered storage together as a single entity, but with each element still able to be scaled independently.

The Cisco Data Intelligence Platform can be deployed in various ways, as seen in Figure 1:

• With Cloudera Data Science Workbench (powered by Kubernetes) and tiered storage with Hadoop

• With Hortonworks with Apache Hadoop 3.1 and Data Science Workbench (powered by Kubernetes) and tiered storage with Hadoop

Figure 1. Cisco Data Intelligence Platform

[Figure: Cisco Data Intelligence Platform (CDIP): Unlock intelligence, performance, and simplicity at scale for large data sets. Diagram labels: AI/compute farm; Data lake - Hadoop; Data anywhere (HDFS, object); Cloudera, Hortonworks, MapR; Hadoop HDFS, Hive, Apache Spark, Apache HBase; Scality, Cloudian, Ceph, SwiftStack; hot, warm, cold; frameworks; compute apps; Kubernetes, Hadoop YARN, Docker; Intel Xeon, AMD EPYC, NVIDIA Tesla; TensorFlow, Python, MXNet, Caffe 2, PyTorch.]

• Big data meets AI: Hadoop 3.0 enables AI workloads to run natively with GPU and container resources

• Eliminate architectural silos: Data-intensive workloads, compute-intensive workloads, and storage systems work closely together

• Cloud-scale architecture: Brings together big data, AI, and object storage to scale to thousands of nodes and hundreds of petabytes

• Automation: Solution management and deployment automation with Cisco Intersight


AI/compute farm - Compute-intensive workloads

Today, deep learning is fueling many scenarios, including autonomous driving, computer vision, health care (cancer diagnosis, drug discovery, etc.), speech and image recognition, and video analytics. But processing these kinds of analytics for a very large data set, with millions of simulations and several thousand containers to achieve high precision, requires a lot of computation. These new use cases and applications need a great deal of computing power (often expressed in teraflops, or TFLOPS, where one teraflop is one trillion floating-point operations per second), and GPUs along with CPUs are required to power the AI/ML algorithms.
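As a rough illustration of what a TFLOPS figure means for job duration, the sketch below divides a total operation count by a sustained rate. The function name, the workload size, and the two rates are assumptions chosen only for the arithmetic; they are not measurements of any Cisco system or GPU.

```python
def compute_time_hours(total_flop: float, sustained_tflops: float) -> float:
    """Hours needed to execute total_flop operations at a sustained rate.

    One TFLOPS is 10**12 floating-point operations per second.
    """
    if sustained_tflops <= 0:
        raise ValueError("sustained_tflops must be positive")
    flop_per_second = sustained_tflops * 1e12
    seconds = total_flop / flop_per_second
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical workload: 10**18 floating-point operations in total.
    workload_flop = 1e18
    # Hypothetical sustained rates for a small and a large compute pool.
    for label, tflops in [("1 TFLOPS", 1.0), ("100 TFLOPS", 100.0)]:
        hours = compute_time_hours(workload_flop, tflops)
        print(f"{label}: {hours:.1f} hours")
```

Sustained throughput on real workloads is usually well below a device's peak rating, so an estimate like this is best read as a lower bound on run time.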

Furthermore, with containerization, organizations can now manage computing resources elastically (on demand), deploy applications in microservices architectures, and run multiple versions of applications for AI/ML and deep-learning workloads.

The Cisco Data Intelligence Platform integrates an AI/compute farm with Hadoop clusters, enabling organizations to easily run AI/ML containerized workloads in the computing farm while accessing data on the Hadoop Distributed File System (HDFS). The computing farm provides a large pool of memory, CPU, and GPU resources to the Hadoop cluster. The computing farm enables logical separation between data and computing, thus allowing massive linear scaling without disruption.

Data-intensive workloads - Hadoop data lake

For data-intensive workloads, the Cisco Data Intelligence Platform is built on Hadoop. Hadoop enables data engineering, providing very fast ingestion of data and Extract, Transform, and Load (ETL) processing. In a data-intensive workload, computing moves to the data to enable faster, distributed processing of the data.

Building a data pipeline that receives data flows from different data sources at higher velocities, performs ETL on this data so that it lands in HDFS, and then makes it available for the serving layer either for real-time streaming or batch processing, is an extremely I/O intensive operation.

The data lake in the Cisco Data Intelligence Platform is designed with servers that support high I/O and network bandwidth with little or no network oversubscription to help prevent bottlenecks even when the network is scaled out to thousands of servers.
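Network oversubscription is commonly expressed as the ratio of total server-facing bandwidth to total uplink bandwidth on a switch, where 1:1 means no oversubscription. The sketch below computes that ratio. The function name, port counts, and link speeds are illustrative assumptions, not CDIP specifications.

```python
def oversubscription_ratio(servers: int, nic_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth.

    A value of 1.0 means the uplinks can carry everything the servers can
    send at once; larger values mean the uplinks are oversubscribed.
    """
    downstream_gbps = servers * nic_gbps
    upstream_gbps = uplinks * uplink_gbps
    if upstream_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downstream_gbps / upstream_gbps


if __name__ == "__main__":
    # Hypothetical rack: 16 servers with one 40-Gbps link each.
    balanced = oversubscription_ratio(16, 40.0, uplinks=16, uplink_gbps=40.0)
    constrained = oversubscription_ratio(16, 40.0, uplinks=4, uplink_gbps=40.0)
    print(f"16 uplinks: {balanced:.1f}:1")
    print(f"4 uplinks:  {constrained:.1f}:1")
```

For I/O-intensive pipelines like the one described above, a ratio close to 1:1 keeps the network from becoming the bottleneck as more servers are added.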

In the Cisco Data Intelligence Platform, the AI/compute farm supports computation-intensive workloads in a multitenant containerized environment in a Kubernetes cluster.


Cisco has partnered with industry-leading object-storage vendors and has published multiple Cisco Validated Designs with SwiftStack, Scality, and Cloudian on Cisco UCS S3260 servers.

Data anywhere - Massive storage

New data is accessed more frequently than older data, so the frequency of read operations on a given data set naturally decreases as the data ages. New data is deemed “hot,” and old data (data that has already been analyzed and may not be part of 95 percent of the queries or workloads) is deemed “cold” or archival. Data in between is “warm.” As enterprises collect increasing volumes of data of all three types, they have a growing need to retire data to more cost-effective storage with a better cost-per-terabyte ratio. Below are examples of hot, warm, and cold data tiers:

• The hot-data tier delivers a storage tier that consists of Cisco UCS® C240 M5 Rack Servers to store data sets that require high-speed storage access

• The warm-data tier uses Cisco UCS S3260 M5 Storage Servers, providing high-capacity storage within HDFS clusters by defining storage types and policies in HDFS. Each Cisco UCS S3260 has two nodes, and each node supports up to 100 TB of HDFS data

• The cold-data tier is provided through the object store, which provides only one copy of the data, with processing on this data performed outside the storage unit

Cisco UCS S3260 Storage Servers provide a high-capacity cold storage tier. Software-Defined Storage (SDS) with Cisco UCS S3260 brings together the simplicity and agility of the cloud and the cost benefits of industry-standard servers. It offers an excellent S3-compatible object-storage platform that is highly scalable and optimized for capacity and I/O performance.
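The tiering idea above can be sketched as a small age-based classifier. This is a minimal illustration under stated assumptions: the age thresholds and helper names are hypothetical, and mapping the hot and warm tiers to the HDFS built-in `HOT` and `WARM` storage policies is an assumption about how the "storage types and policies in HDFS" might be configured. The code only builds the `hdfs storagepolicies` command string and does not run it. The capacity helper uses the figures from the text (two nodes per Cisco UCS S3260, up to 100 TB of HDFS data per node).

```python
import math
import shlex
from typing import Optional

# Hypothetical age thresholds; the source does not define them.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 180

# From the text: two nodes per Cisco UCS S3260, up to 100 TB of HDFS data each.
S3260_NODES_PER_CHASSIS = 2
S3260_MAX_TB_PER_NODE = 100

# Assumed mapping of tiers to HDFS built-in storage policy names.
# The cold tier lives in the object store, so it has no HDFS policy here.
HDFS_POLICY_FOR_TIER = {"hot": "HOT", "warm": "WARM"}


def choose_tier(age_days: int) -> str:
    """Classify a data set as hot, warm, or cold by its age in days."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"


def storage_policy_command(path: str, tier: str) -> Optional[str]:
    """Build (not run) the HDFS command that sets a path's storage policy.

    Returns None for the cold tier, which is retired to the object store.
    """
    policy = HDFS_POLICY_FOR_TIER.get(tier)
    if policy is None:
        return None
    return ("hdfs storagepolicies -setStoragePolicy "
            f"-path {shlex.quote(path)} -policy {policy}")


def s3260_chassis_needed(warm_data_tb: float) -> int:
    """Minimum S3260 chassis count for a given amount of warm HDFS data."""
    if warm_data_tb < 0:
        raise ValueError("warm_data_tb must not be negative")
    tb_per_chassis = S3260_NODES_PER_CHASSIS * S3260_MAX_TB_PER_NODE
    return math.ceil(warm_data_tb / tb_per_chassis)


if __name__ == "__main__":
    # Hypothetical paths and ages, for illustration only.
    for path, age in [("/data/events/new", 3), ("/data/events/older", 90),
                      ("/data/events/archive", 400)]:
        tier = choose_tier(age)
        print(path, tier, storage_policy_command(path, tier))
    print("Chassis for 450 TB of warm data:", s3260_chassis_needed(450))
```

The chassis count treats the 100 TB per node as the limit on stored HDFS data; whether that figure is before or after replication is not stated in the text, so sizing for a real cluster would need to account for the replication factor separately.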


Scalable infrastructure

The CDIP architecture can start from a single rack and scale to thousands of nodes with single-pane-of-glass management through Cisco Intersight and Cisco Application Centric Infrastructure (ACI), as seen in Figure 2. This cloud-scale architecture enables you to run your AI workloads (for example, TensorFlow and PyTorch) natively on a data lake, bringing synergy across a Hadoop, AI, and data anywhere (object/cold tier) strategy.

Figure 2. Cloud-scale architecture

[Figure: Cisco Data Intelligence Platform. Can start small, with a single rack, and can scale to thousands. Diagram labels: data lake/data-intensive workload (big data cluster); AI/compute farm; data anywhere/tiered storage; Cisco UCS C240 M5, Cisco UCS C480 ML M5, Cisco UCS S3260; Cisco UCS FI-6332; 2x40 Gbps; APIC; Cisco ACI™ fabric.]

If you’ve already deployed a big data solution, now is the time to modernize your infrastructure. You can take advantage of Intel®’s new Cascade Lake processors in our systems, newer configurations with all-flash and NVMe drives for better TCO, increased port density from consolidating your UCS domain into a single fourth-generation Fabric Interconnect, and FPGA-based compression on Hadoop. You’ll get your infrastructure humming and keep your analytics team happy.

Figure 3. Modernized Hadoop Infrastructure

[Figure: Modernized Hadoop Infrastructure. Callouts: superior performance; cost parity with HDD; better TCO; high performance; Cascade Lake. Configurations shown: all HDD (1.9 PB), Cisco UCS C240 M5; all-flash (1.5 PB), Cisco UCS C4200; NVMe/HDD (1.9 PB), Cisco UCS C240 M5. Also labeled: Cisco UCS FI-6332.]


The Cisco advantage

When building an infrastructure to enable this modernized architecture that can scale to thousands of nodes, operational efficiency can’t be an afterthought. Cisco delivers seamless operation at this scale with:

• Cisco UCS with Intersight and Cisco ACI, which can enable this cloud-scale architecture to be deployed and managed with ease

• Infrastructure automation of Cisco UCS servers with service profiles and Cisco data center network automation utilizing application profiles with Cisco ACI

• Centralized management, deep telemetry, and simplified, granular troubleshooting capabilities

• Multitenancy for application workloads, including containers and microservices, with the right level of security and SLA for each workload

Use cases

Finance
• Detect fraud
• Create competitive advantages
• Grow without increased cost

Healthcare
• Remove distractions
• Simplify diagnoses
• Predict challenges

Media and entertainment
• Classify content
• Automate special effects
• Offer augmented reality

Security and defense
• Protect from malicious bots
• Improve video analysis
• Provide intelligent automation

Retail
• Improve ad targeting
• Make accurate recommendations
• Provide compelling offers

Marketing
• Improve product design
• Optimize warranties
• Provide predictive maintenance

Find out more about CDIP

Read extended solution overview.

Contact your sales representative or partner.

© 2019 Cisco and/or its affiliates. All rights reserved. Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third-party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R) C22-742735-00 09/19