Post on 19-Nov-2014
04/08/2023
An Active and Hybrid Storage System for Data-intensive Applications
Ph.D Candidate: Zhiyang Ding
Defense Committee Members:
Dr. Xiao Qin
Dr. Kai H. Chang
Dr. David A. Umphress
University Reader:
Prof. Wei Wang, Chair of the Art Design Dept.
Cluster Computing
• Large-scale Data Processing is everywhere.
Motivation
• Traditional Storage Nodes on the Cluster
[Figure labels: Client, Internet, Head Node, Network switch, Compute Nodes, Storage Node (or Storage Area Network)]
Motivation
• What’s next?
• More “Active”.
[Figure labels: Client, Internet, Head Node, Network switch, Compute Nodes, Storage Node; arrows: Computation Offload, I/O Request, Raw Data, Pre-processed Data]
About the Active Storage
• pp-mpiBlast: How to deploy Active Storage?
• McSD: A Smart Disk Model
• HcDD: Hybrid Disk for Active Storage
[Figure label: Storage Node]
McSD: A Multicore Active Storage Device
• I/O Wall Problem: CPU-I/O Gap
  – Limited I/O Bandwidth
  – CPU Waiting and Dissipating Power
• How to
  – Bridge the CPU-I/O Gap
  – Reduce I/O Traffic
Why McSD?
• “Active”:
  – Leveraging the Processing Power of Storage Devices
• Benefits:
  – Offloading Data-intensive Computation
  – Reducing I/O Traffic
  – Pipeline Parallel Programming
Contributions
• Design a prototype of a multicore active storage
• Design a pre-assembled processing module
• Extend a shared-memory MapReduce system
• Emulate the whole system on a real testbed
Background: Active Disks
• Traditional Smart/Active Disks
  – On-board: Embedding a processor into the hard disk
  – Various Research Models
    • e.g., active disk, smart disk, IDISK, SmartSTOR, etc.
• However, the “active disk” has not been adopted by hardware vendors
  – Improved attachment technologies
  – I/O Bound Workloads
  – Cost of the System
  – Reliability
Background: Parallel Processing
• Multi-core Processors or Multi-processors
  – A 45% increase in transistors yields a 20% increase in processing power
• MapReduce: a Parallel Programming Model
  – MapReduce by Google
  – Hadoop, Mars, Phoenix, etc.
• Multicore and Shared-memory Parallel Processing
Design: System Overview
[Figure: Design of an Active Storage; component labels: Multicore and Shared-memory Parallel Processing, Communication Mechanism, Hybrid Storage Disks, Pipeline Parallel Processing]
Design and Implementation
• Computation Mechanism
  – Pre-assembled Processing Model
  – smartFAM
• Extend the Shared-Memory MapReduce by Partitioning
Pre-assembled Processing Modules
• Pre-assembled Processing Modules
  – Meet the nature of embedded services
  – Reduce Complexity and Cost
  – Provide Services
    • E.g., multi-version antivirus service, pre-processing for data-intensive apps, de-duplication, etc.
• How to invoke services?
smartFAM
• smartFAM = Smart File Alteration Monitor
  – Invokes the pre-assembled processing modules or functions by monitoring changes to the system log file.
• Two Components:
  – an inotify function: a Linux system function
  – a trigger daemon
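The trigger side of smartFAM can be sketched as follows. This is a minimal illustration, not the author's implementation: it reads newly appended log lines by byte offset instead of calling inotify, and the log-line format (`INVOKE <module> <argument>`), the module names, and every function name are hypothetical.

```python
# Hypothetical registry mapping a module name found in the log
# to a pre-assembled processing module (both entries are illustrative).
def run_wordcount(arg):
    return f"wordcount started on {arg}"


def run_dedup(arg):
    return f"dedup started on {arg}"


MODULES = {"wordcount": run_wordcount, "dedup": run_dedup}


def read_new_lines(log_path, offset):
    """Return the lines appended after byte `offset` and the new offset."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    return data.decode("utf-8").splitlines(), offset + len(data)


def dispatch(line):
    """Invoke a module for a line of the assumed form 'INVOKE <module> <arg>'."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "INVOKE" and parts[1] in MODULES:
        return MODULES[parts[1]](parts[2])
    return None  # any other log line is ignored


def trigger_once(log_path, offset):
    """One pass of the trigger daemon: dispatch every new log line."""
    lines, offset = read_new_lines(log_path, offset)
    results = [r for r in map(dispatch, lines) if r is not None]
    return results, offset
```

In a real deployment the daemon would block on an inotify modification event for the log file and call something like `trigger_once` each time the event fires, rather than being called manually.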
Design and Implementation
[Figure: workflow diagram with numbered steps 1, 2, 3]
Extend the Phoenix: A Shared-memory MapReduce Model
• Extend the Phoenix MapReduce Programming Model by partitioning and merging
  – New API: partition_input
  – New Functions:
    • partition (provided by the new API)
    • merge (developed by the user)
• Example:
  – wordcount [data-file][partition-size][]
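The partition-and-merge idea can be illustrated with a small word count. This is a sketch in Python rather than the C API of Phoenix; only the name partition_input comes from the slide, and its signature here, along with `word_count`, `merge`, and `partitioned_word_count`, is an assumption made for illustration.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Split text into chunks of roughly partition_size characters.

    Each cut is moved forward to the next whitespace so no word is split.
    """
    if partition_size < 1:
        raise ValueError("partition_size must be positive")
    chunks, start = [], 0
    while start < len(text):
        end = min(start + partition_size, len(text))
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append(text[start:end])
        start = end
    return chunks


def word_count(chunk):
    """Per-partition job: count the words in one chunk."""
    return Counter(chunk.split())


def merge(partials):
    """User-supplied merge: add up the per-partition counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def partitioned_word_count(text, partition_size):
    return merge(word_count(c) for c in partition_input(text, partition_size))
```

Because each partition is counted independently, only one partition's intermediate data has to be held at a time, and the merged result matches an unpartitioned count.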
Pipeline Processing
Evaluation Environment
• Testbed
• Benchmarks
  – Word Count
  – String Match
  – Matrix Multiplication
• Individual Node Performance
• System Performance
Individual Node Performance
                 Word Count (seconds)    String Match (seconds)
                 1 GB      1.25 GB       1 GB      1.25 GB
w/ Partition     40.60     50.91         17.76     20.61
w/o Partition    85.74     139.54        17.62     21.00
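The effect of partitioning can be read off the table as a ratio of run times. The short calculation below uses only the numbers in the table; the dictionary layout and the function name `speedup` are illustrative.

```python
# (benchmark, input size) -> (seconds with partition, seconds without partition)
TIMES = {
    ("Word Count", "1 GB"): (40.60, 85.74),
    ("Word Count", "1.25 GB"): (50.91, 139.54),
    ("String Match", "1 GB"): (17.76, 17.62),
    ("String Match", "1.25 GB"): (20.61, 21.00),
}


def speedup(with_partition, without_partition):
    """Speedup of the partitioned run relative to the unpartitioned run."""
    return without_partition / with_partition


SPEEDUPS = {key: round(speedup(*times), 2) for key, times in TIMES.items()}

for (benchmark, size), value in SPEEDUPS.items():
    print(f"{benchmark:12s} {size:8s} {value:.2f}x")
```

Word Count gains about 2.11x at 1 GB and 2.74x at 1.25 GB, while String Match stays close to 1x (0.99x and 1.02x), so partitioning helps one benchmark and leaves the other essentially unchanged.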
System Evaluation
Matrix-Multiplication and Word-Count (Speedups)
Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
500 MB             1.47 X               2.15 X                   0.99 X
750 MB             1.45 X               2.09 X                   1.04 X
1 GB               7.62 X               2.14 X                   6.07 X
1.25 GB            19.01 X              2.50 X                   15.39 X
Summary
• It can improve system performance by offloading data-intensive computation
• McSD is a promising active storage model with
  – Pre-assembled processing modules
  – Parallel data processing
  – Better performance in the evaluation
About the Active Storage
• pp-mpiBlast: How to deploy Active Storage?
• McSD: A Smart Disk Model
• HcDD: Hybrid Disk for Active Storage
[Figure label: Storage Node]
Apply Active Storages to a Cluster
• So far, we know the potential of Active Storages
• Challenge: How to coordinate active storage nodes with computing nodes?
• Propose a Pipeline-parallel Processing pattern
Contributions
• Propose a pipeline-parallel processing framework to “connect” an Active Storage node with computing nodes.
• Evaluate the framework using both an analytic model and a real implementation.
• Case Study: Extend an existing bioinformatics application based on the framework.
Background: Active Storage
[Figure labels: SSD, Mass Storage, Active Storage Node, SSD, Memory, Buff Disks, Processor, Computation, Bridge?]
Background: Bioinformatics App
• BLAST*: Basic Local Alignment Search Tool
  – Comparing primary biological sequence information
• mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST.
  – Format raw data files
  – Run a parallel BLAST function
* http://blast.ncbi.nlm.nih.gov/
** http://www.mpiblast.org/
Pipeline-parallel Design
• Offload the raw-data formatting task to where the data is stored.
• Intra-application Pipeline-parallel Processing by “partition” and “merge”.
• pp-mpiBlast, a case study.
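The two-stage pipeline described above can be sketched with two threads and a bounded queue. This is an illustration under stated assumptions, not pp-mpiBlast itself: `format_partition` and `search_partition` are hypothetical stand-ins for the formatting step on the active storage node and the search step on the computing nodes, and the threads stand in for separate machines.

```python
import queue
import threading


def format_partition(partition):
    """Stand-in for the formatting step offloaded to the active storage node."""
    return [record.strip().upper() for record in partition]


def search_partition(formatted, query):
    """Stand-in for the parallel search step on the computing nodes."""
    return [record for record in formatted if query in record]


def run_pipeline(partitions, query):
    """Format partition i+1 while partition i is being searched, then merge."""
    handoff = queue.Queue(maxsize=1)  # one formatted partition in flight
    sub_outputs = []

    def storage_stage():
        for partition in partitions:
            handoff.put(format_partition(partition))
        handoff.put(None)  # sentinel: no more partitions

    def compute_stage():
        while True:
            formatted = handoff.get()
            if formatted is None:
                break
            sub_outputs.append(search_partition(formatted, query))

    stages = [threading.Thread(target=storage_stage),
              threading.Thread(target=compute_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()

    # Merge step: concatenate the sub-outputs in partition order.
    return [hit for sub_output in sub_outputs for hit in sub_output]
```

The bounded queue is the design choice that creates the overlap: the storage stage can prepare the next partition while the compute stage is still working on the current one, but it cannot run arbitrarily far ahead.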
Pipelining Workflow
[Figure: Active Storage Node and Computing Nodes. Stages: Partition, FormatDB, mpiBlast, Merge. Data flow: Raw Input File, Partitions 1..n, Intermediates 1..n, Sub-outputs 1..n, Output File. Timeline annotations: 1, (n-1) times, n.]
Analytic Model
• Three Critical Measures
Evaluation Environment
           Computing Nodes Configuration    Active Storage Configuration
CPU        Intel Xeon X3430                 Intel Core 2 Q9400
Memory     2 GB DDR3 (PC3-10600)
OS         Ubuntu 9.04 Jaunty Jackalope, 32-bit version
Kernel     2.6.28-15-generic
Network    Gigabit LAN

Our Testbed              Comparison Testbeds
“Pipeline-parallel”      “12-node Cluster”     “13-node Cluster”
12 Computing Nodes       12 Computing Nodes    13 Computing Nodes
1 Active Storage Node    1 Storage Node        1 Storage Node
Pipeline-parallel Design
Results: Compared With 12-node System
Results: Compared With 13-node System
Speedups Trends: Partition Size
Summary
• We proposed a pipeline-parallel processing mechanism for deploying an Active Storage Node.
• As a case study, we extended a classic bioinformatics application to follow the pipeline-parallel style.
About the Active Storage
• pp-mpiBlast: How to deploy Active Storage?
• McSD: A Smart Disk Model
• HcDD: Hybrid Disk for Active Storage
[Figure label: Storage Node]
What’s Hybrid?
A Hybrid Combination of a Gas Engine and an Electric Motor
Power Efficiency
Hybrid Disk Drives
• A Hybrid Combination of Two Types of Storage Devices: HDD and SSD
  – HDD: Magnetic Hard Disk
  – SSD: Solid State Disk, built from NAND-based flash memory
What are their roles?
Motivation
• In a hybrid storage system, using SSDs as the buffer can boost performance.
• However, SSDs suffer from reliability issues.
Limitations Related to SSDs
• Flash Memory:
  – Each block consists of 32, 64, or 128 pages.
  – Each page is typically 512, 2,048, or 4,096 bytes.
• “Erase-before-write” at block level.
• Lifespan is 10,000 Program/Erase cycles.
  – E.g., an 80 GB MLC SSD lasts only 106 days if the write rate is 30 MB/s.*
• Rethink their roles?
* Based on the SSD lifespan calculator provided by Virident.com
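The lifespan figure can be approximated with a back-of-the-envelope calculation. The sketch below is not the Virident calculator: it assumes decimal units (1 GB = 1,000 MB), perfect wear leveling, and a single write-amplification factor, and the function name and parameters are illustrative.

```python
SECONDS_PER_DAY = 86_400


def ssd_lifespan_days(capacity_gb, pe_cycles, write_mb_per_s,
                      write_amplification=1.0):
    """Days until every block has used up its program/erase cycles.

    Assumes perfect wear leveling, so the drive absorbs
    capacity * pe_cycles of flash writes in total; each host write
    costs write_amplification times as many flash writes.
    """
    flash_writes_mb = capacity_gb * 1000 * pe_cycles
    host_writes_mb = flash_writes_mb / write_amplification
    return host_writes_mb / write_mb_per_s / SECONDS_PER_DAY


# Ideal case for the slide's example: 80 GB, 10,000 cycles, 30 MB/s.
ideal_days = ssd_lifespan_days(80, 10_000, 30)

# With an assumed write-amplification factor of 2.9.
amplified_days = ssd_lifespan_days(80, 10_000, 30, write_amplification=2.9)
```

Under these assumptions the ideal bound is about 309 days, and a write-amplification factor near 2.9 brings it down to roughly 106 days. Whether the cited calculator arrives at its figure this way is not stated in the slides; either way, sustained writes wear out the SSD within months.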
Contributions
• Hybrid Combination of HDD and SSD disks
• De-duplication Service using HDDs as a Write Buffer
• Internal-parallel Processing in SSD
• Simulation of the Whole System For Evaluation
Hybrid Disk Configuration
[Figure labels: I/O Requests, Read Requests, Data of Write Requests, HDD, De-duplication, Dedicated Processor, Pre-processing, Deduplicated data, Pre-processed Data, SSD]
HcDD Architecture
Deduplication Design
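The slides do not spell out the deduplication design, so the following is only a generic sketch of the idea of buffering writes and deduplicating them before they reach the SSD. Fixed-size chunking, SHA-256 fingerprints, and the class and method names are all assumptions; the in-memory list and dictionary stand in for the HDD write buffer and the SSD.

```python
import hashlib


class DedupWriteBuffer:
    """Buffer writes, then store each distinct chunk only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.buffer = []  # stand-in for the HDD write buffer
        self.store = {}   # fingerprint -> chunk, stand-in for the SSD
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Accept a write request; nothing reaches the SSD yet."""
        self.buffer.append((name, data))

    def flush(self):
        """Deduplicate buffered data and return the bytes written to the SSD."""
        written = 0
        for name, data in self.buffer:
            recipe = []
            for i in range(0, len(data), self.chunk_size):
                chunk = data[i:i + self.chunk_size]
                fingerprint = hashlib.sha256(chunk).hexdigest()
                if fingerprint not in self.store:
                    self.store[fingerprint] = chunk
                    written += len(chunk)
                recipe.append(fingerprint)
            self.files[name] = recipe
        self.buffer.clear()
        return written

    def read(self, name):
        """Reassemble a file from its stored chunks."""
        return b"".join(self.store[fp] for fp in self.files[name])
```

Every duplicate chunk dropped in `flush` is a write the SSD never sees, which is how deduplication in front of the SSD relates to the program/erase-cycle limit discussed earlier.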
Internal Parallel Processing
Evaluation
Internal Parallelism Evaluation: Single Node
Single Node: Dedup Ratio
System Performance Evaluation
System Performance Evaluation
Summary
Conclusion
• pp-mpiBlast: How to deploy Active Storage?
• McSD: A Smart Disk Model
• HcDD: Hybrid Disk for Active Storage
[Figure label: Storage Node]
Future Work
Many Thanks! And Questions?