Conference Theme - Hitachi Solutions America · Introduction. Structured Data. Unstructured Data....
/ 2Think Digital Customer Conference 2019
Conference Theme
It can be challenging to understand what exactly Digital Transformation means for your business.

We want to help you take a step back and reconsider how you run your business, and perhaps even how you go to market, in this new world we’re already living in.

We understand it can be challenging to think strategically rather than tactically about specific products and tools. We want to help you overcome this challenge, so that you don’t underutilize the power of the solutions Hitachi Solutions has to offer.

It’s time to rethink our approach. It’s time to Think Digital.

Looking to solve a problem? Think Digital.
Streamline your workload? Think Digital.
Extend your reach? Better communicate? Increase sales? Think Digital.

And we’re going to help you do just that, starting with this conference.
DIGITAL TRANSFORMATION CAN BE AN INTIMIDATING CONCEPT
Off to the Big Data Race: Performance, Speed and Storage
Breakout Track 3: Analytics and AI
Orlando Gonzalez
Director, Data & Analytics
Email: [email protected]
#HSCCATLANTA19
CONTENTS
01 Introduction
02 Structured Data
03 Unstructured Data
04 Modern Data Warehouse
05 Azure Data Services
06 Business Use Case
07 Azure Data Lake
08 HDInsight
09 Azure SQL DW
10 Databricks
Introduction
• How do I manage various data types?
• Which data service should you use?
• Correlation of data?
• Does all your data have value?
• Can you afford to keep everything?
• Who has a deep understanding of your data?
Structured Data
• Organization (tables, rows, columns)
• Standard data mining techniques
• Ongoing administration and maintenance
• SQL Server
• Relational database
• Pre-defined schemas for structure
• Upfront preparation and architecture required
• Changes in data type (numeric/text) require a schema change
• Transaction security, keys, locks, views
Structured Data
• Integrity
• Relational
• Columns have known data types
• Data/log files
• Fixed partition sizes
• Concurrency & locks
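The points above about pre-defined schemas, known column types, keys, and schema changes can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the orders table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in here; the slides name SQL Server.
conn = sqlite3.connect(":memory:")

# Schema on write: structure, types, and keys are declared before any data arrives.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (typeof(amount) IN ('integer', 'real'))
    )
    """
)


def try_insert(row):
    """Return True if the row fits the declared schema, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((1, "Acme", 10.5))       # fits the schema
bad_type = try_insert((2, "Beta", "ten"))      # text in a numeric column
duplicate_key = try_insert((1, "Gamma", 3.0))  # primary key already used

# Adding a new attribute is an explicit schema change.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'")
columns = [info[1] for info in conn.execute("PRAGMA table_info(orders)")]
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Only the first row is stored. The other two are rejected at write time, which is the integrity guarantee a relational store trades against flexibility.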
Unstructured Data
• Complex structured data
• Traditional database not needed for data management
• Impose schema on read
• Store data in its native format
• The business users decide which data to interpret
• Scalable
• Less administration & maintenance
Unstructured Data
• No pre-defined data model
• All data types
• Structured information in different ways
• Large scale data mining
• Supports images, audio, video, email body text
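"Impose schema on read" and "store data in its native format" can be illustrated with a small sketch. It assumes raw JSON lines as the native format. The sample records and the read_with_schema helper are hypothetical and are not part of any Azure service.

```python
import json

# Raw events kept exactly as they arrived; no schema was enforced on write.
# The records below are made-up sample data.
raw_lines = [
    '{"device": "pump-1", "temp_c": 71.5, "note": "ok"}',
    '{"device": "pump-2", "temp_c": "68", "operator": "jlee"}',
    '{"device": "pump-3", "image": "frame_0042.jpg"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading.

    Fields missing from a record become None; extra fields are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers interpret the same stored data differently.
temperature_view = read_with_schema(raw_lines, {"device": str, "temp_c": float})
operator_view = read_with_schema(raw_lines, {"device": str, "operator": str})
```

The stored lines never change. Each reader decides which fields matter and how to type them, which is what lets business users choose which data to interpret.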
Traditional Data Warehouse
• Single source, ERP or Transactional
• Simple data model
• Relational data sources
• Standard costing model
• Optimized design for analytical queries
• Traditional ETL design & data movement
• User maintained (Server, DB, Tuning)
Modern Data Warehouse
• Integrated MPP Data Platform
• Near Real-Time
• All Structured Data Types
• Scalability & Performance
• Supporting all levels of Analytics
• No single solution for a data estate
• All data may have value
• Updates to source systems or changing data types

Azure data services:
• Database
• Data Warehouse
• Databricks
• Data Cube
• Data Lake
• Data Lake Analytics
• Data Catalog
• Cosmos DB
• Blob
• Event Hub
• Stream Analytics
• HDInsight
• Analysis Services
Modern Data Warehouse - Azure
Business Cases
• What are you solving for the business?
• Do not jump directly to technology architecture
• Products support business need and usage
• Lead >> Lag >> Match (think about scale & business comfort)
• Data trends in your industry
• Performance consideration: use the tools the way they were designed
• Integrate multiple data sources
Azure Data Lake Topology
(Diagram labels: Compute requirement, U-SQL, ADLS, WASB)
Azure HDInsight Topology
(Diagram labels: HDInsight cluster with head node, back-up, and data node; Azure Data Lake Storage; Azure Storage Blob; domain credentials; AAD tenant; Azure VNET to VNET peering)
Azure SQL Data Warehouse
(Diagram labels: Compute, shown as three nodes, each with Cores, Memory, and an NVMe SSD holding Cache and TempDB; Remote storage with Data and Log; Snapshot backups)
Azure Databricks Topology
(Diagram labels: Azure Resource Manager with Storage, Compute, and Network; Microsoft.Databricks RP; Azure Databricks Workspace; VNet containing four VMs; Blob Storage; Clusters; DBFS)
Modern Data Warehousing
Decision points and trade-offs, but not necessarily one versus the other...

| | HDInsight Hive | HDInsight Spark | Azure Data Lake | Azure Databricks | Azure SQL DW |
| Volume | Petabytes | Petabytes | Petabytes | Petabytes | Terabytes |
| Security | AAD, ADLS / Apache Ranger | AAD, ADLS | AAD, ADLS | AAD, ADLS, ADB role-based access | TDE, Threat Detect, CA, AAD |
| Languages | HiveQL | SparkSQL, HiveQL, Scala, Java, Python, R | U-SQL, R, Python | PySpark, SparkR, sparklyR, Scala, SparkSQL | T-SQL |
| Extensibility | Yes, .NET/SerDe | Yes, maven/PyPi | Yes, .NET | Yes, 3rd party libs | Polybase |
| External Sources | ORC, CSV, Parquet + others | Parquet, JSON, Hive + others | Text, CSV, TSV, Custom | CSV, JSON, Parquet + Many Sources | Text, Hive RCFile, Hive ORC, Parquet |
| Admin | Medium-High | Medium-High | Low | Low | Low-Medium |
| Cost Model | Nodes & VM | Nodes & VM | Units/Jobs | VM, DBU | DWU, cDWU |
| Schema Definition | Schema on Read | Schema on Read | Schema on Read | Schema on Read | Schema on Write |
| Storage | Blob or ADLS | Blob or ADLS | ADLS | Blob or ADLS | Internal Storage |
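One way to work with these decision points is to treat the comparison as data and filter it by requirement. The sketch below transcribes three of the rows (languages, admin effort, schema definition) from the table. The SERVICES dictionary layout and the shortlist helper are hypothetical and only illustrate the idea. Matching is by the exact names used in the table.

```python
# Values transcribed from the comparison table; the structure and helper are illustrative.
SERVICES = {
    "HDInsight Hive": {
        "languages": {"HiveQL"},
        "admin": "Medium-High",
        "schema": "Schema on Read",
    },
    "HDInsight Spark": {
        "languages": {"SparkSQL", "HiveQL", "Scala", "Java", "Python", "R"},
        "admin": "Medium-High",
        "schema": "Schema on Read",
    },
    "Azure Data Lake": {
        "languages": {"U-SQL", "R", "Python"},
        "admin": "Low",
        "schema": "Schema on Read",
    },
    "Azure Databricks": {
        "languages": {"PySpark", "SparkR", "sparklyR", "Scala", "SparkSQL"},
        "admin": "Low",
        "schema": "Schema on Read",
    },
    "Azure SQL DW": {
        "languages": {"T-SQL"},
        "admin": "Low-Medium",
        "schema": "Schema on Write",
    },
}


def shortlist(language=None, schema=None):
    """Return the services that match every requirement that was given."""
    names = []
    for name, traits in SERVICES.items():
        if language is not None and language not in traits["languages"]:
            continue
        if schema is not None and traits["schema"] != schema:
            continue
        names.append(name)
    return names


scala_options = shortlist(language="Scala")
schema_on_write = shortlist(schema="Schema on Write")
hiveql_on_read = shortlist(language="HiveQL", schema="Schema on Read")
```

A filter like this usually returns more than one service, which matches the point of the slide: the choice is a set of trade-offs, not one product versus another.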
Next Steps
Questions?
• LinkedIn: https://www.linkedin.com/in/orlando-gonzalez
• Websites:
• https://www.capaxglobal.com/
• https://us.hitachi-solutions.com/
• https://azure.microsoft.com/en-us/solutions/data-warehouse/
Modern Data Warehouse: ETL
Data processing with Azure Databricks
(Diagram, reconstructed from its labels)
• Sources: logs, files, and media (unstructured); business and custom apps (structured)
• Orchestration (Azure Data Factory): load flat files into data lake on a schedule
• Ingest storage (Azure Storage / Data Lake Store)
• Data processing (Azure Databricks): read data from files using DBFS
• Serving storage (Azure SQL DW): load processed data into tables optimized for analytics
• Consumers: dashboards, applications
• Transactional storage (SQL DB): applications manage their transactional data directly
• Orchestration (Azure Data Factory): extract and transform relational data, load into DBFS
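The flow in this diagram (ingest flat files, process them, load the result into serving storage, with an orchestrator running the steps in order) can be sketched locally. This is a toy stand-in under stated assumptions, not Azure code. A temporary directory plays the data lake, plain Python plays the processing engine, and an in-memory sqlite3 database plays the serving store. The file name, table name, and sample rows are hypothetical.

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path


def ingest(landing_dir):
    """Ingest step: drop a flat file into the landing area (the stand-in data lake)."""
    path = Path(landing_dir) / "sales.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["east", "100.0"], ["west", "40.5"], ["east", "25.0"]])
    return path


def process(path):
    """Processing step: read the raw file and aggregate it per region."""
    totals = defaultdict(float)
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["amount"])
    return dict(totals)


def load(totals, conn):
    """Serving step: load processed data into a table shaped for analytical queries."""
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


def run_pipeline():
    """Orchestration: run the steps in order, as a scheduler would."""
    conn = sqlite3.connect(":memory:")
    with tempfile.TemporaryDirectory() as landing_dir:
        load(process(ingest(landing_dir)), conn)
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


result = run_pipeline()
```

Keeping each stage as its own function mirrors the separation in the diagram: storage, processing, and serving can each be swapped or scaled without rewriting the others.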