An Introduction to Sector/Sphere
Yunhong Gu Univ. of Illinois at Chicago and VeryCloud LLC
@CHUG, June 22, 2010
What is Sector/Sphere?
Sector: Distributed File System
Sphere: Simplified Parallel Data Processing Framework
Goal: handling big data on commodity clusters
Open source software, BSD license, written in C++
Started in 2006; current version 2.3
http://sector.sf.net
Motivation: Data Locality
[Figure labels: Storage, Compute, Data]
Super-computer model: expensive, data IO bottleneck
Sector/Sphere model: inexpensive, parallel data IO, data locality
Motivation: Simplified Programming
Parallel/distributed programming with MPI, etc.: flexible and powerful, but application development is very complicated.
Sector/Sphere model (cloud model): the cluster is presented to the developer as a single entity, with a simplified programming interface. Limited to certain data-parallel applications.
Motivation: Global-scale System
Systems for single data centers: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.
[Figure: Sector/Sphere spanning three data centers, with data providers (US and Europe locations) uploading data, a data user (US location) running processing, and data readers (Asia location)]
Sector Distributed File System
DFS designed to work on commodity hardware: racks of computers with internal hard disks and high-speed network connections
File-system-level fault tolerance via replication
Supports wide area networks
Can be used for data collection and distribution
Not POSIX-compatible yet
Sector Distributed File System
[Figure: Sector architecture]
Security Server: user accounts, data protection, system security
Masters: metadata, scheduling, service provider
Clients: system access tools, application programming interfaces
Slaves: storage and processing
Links labeled SSL; data path labeled UDT, encryption optional
Security Server
User accounts, permissions, IP access control lists
Uses independent accounts, but can connect to an existing account database via a simple “driver”, e.g., Linux accounts, LDAP, etc.
Single security server; the system continues to run when the security server is down, but new users cannot log in
Master Servers
Maintain file system metadata
  Metadata is a customizable module; currently there are two implementations, one in-memory and one on disk
Authenticate users, slaves, and other masters (via security server)
Maintain and manage file replication, data IO, and data processing requests
  Topology aware
Multiple active masters can dynamically join and leave; load balancing between masters
Slave Nodes
Store Sector files
  Sector file is not split into blocks
  One Sector file is stored on the “native” file system (e.g., EXT, XFS, etc.) of one or more slave nodes
Process Sector data
  Data is processed on the same storage node, or nearest storage node possible
  Input and output are Sector files
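The "same storage node, or nearest storage node possible" rule can be illustrated with a small sketch. This is not Sector's scheduler: the `SlaveNode` struct, the rack-based `topologyDistance`, and `pickProcessingNode` are hypothetical names invented here, and a rack index is assumed to be the only topology information available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a slave node; the rack index stands in for
// whatever topology information a real master would keep.
struct SlaveNode {
    std::string address;
    int rack;
    bool busy;
};

// Toy distance: 0 on the same node, 1 within a rack, 2 across racks.
int topologyDistance(const SlaveNode& a, const SlaveNode& b) {
    if (a.address == b.address) return 0;
    return a.rack == b.rack ? 1 : 2;
}

// Pick the idle node closest to any node holding a replica of the file.
// Returns an index into `nodes`, or -1 if every node is busy.
int pickProcessingNode(const std::vector<SlaveNode>& nodes,
                       const std::vector<int>& replicaHolders) {
    assert(!replicaHolders.empty());
    int best = -1;
    int bestDist = 3;  // larger than any topologyDistance() result
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].busy) continue;
        for (int h : replicaHolders) {
            int d = topologyDistance(nodes[i], nodes[h]);
            if (d < bestDist) {
                bestDist = d;
                best = static_cast<int>(i);
            }
        }
    }
    return best;
}
```

With an idle replica holder the function returns that holder (distance 0); when the holder is busy it falls back to an idle node in the same rack before considering other racks.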
Clients
Sector file system client API
  Access Sector files in applications using the C++ API
Sector system tools
  File system access tools
FUSE
  Mount Sector file system as a local directory
Sphere programming API
  Develop parallel data processing applications to process Sector data with a set of simple APIs
The client communicates directly with slaves for data IO, via UDT
UDT: UDP-based Data Transfer
http://udt.sf.net
Open source UDP-based data transfer protocol
  With reliability control and congestion control
Fast, firewall friendly, easy to use
Already used in many commercial and research systems for large data transfer
Application-aware File System
Files are not split into blocks
  Users are responsible for using properly sized files
Directory and File Family
  Sector will keep related files together during upload and replication
In-memory object
Sphere: Simplified Data Processing
Data parallel applications
Data is processed where it resides, or on the nearest possible node (locality)
The same user-defined functions (UDFs) are applied to all elements (records, blocks, files, or directories)
Processing output can be written to Sector files or sent back to the client
Transparent load balancing and fault tolerance
Sphere: Simplified Data Processing
for each file F in (SDSS datasets)
  for each image I in F
    findBrownDwarf(I, …);
[Figure: the Sphere client application splits the input stream into segments (n, n+1, ..., n+m), locates and schedules SPEs to process them, and collects the results into the output stream]
SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc.run(sdss, "findBrownDwarf", …);

findBrownDwarf(char* image, int isize, char* result, int rsize);
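As a rough illustration of the model on this slide, the sketch below applies one function to every record of an in-memory "stream", dealing records out to a fixed number of simulated SPEs. It does not use the Sphere API: `Udf`, `recordLength`, and `runOnStream` are hypothetical names, the UDF shape only mirrors the `findBrownDwarf` declaration (with a `const` input and an assumed `int` return value meaning bytes written), and the engines run sequentially rather than on separate nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Same shape as the findBrownDwarf declaration on the slide: read `isize`
// bytes from `record`, write at most `rsize` bytes to `result`.
// The int return value (bytes written, or -1) is an assumption of this sketch.
using Udf = int (*)(const char* record, int isize, char* result, int rsize);

// Hypothetical stand-in UDF: reports the record length as decimal text.
int recordLength(const char* /*record*/, int isize, char* result, int rsize) {
    std::string text = std::to_string(isize);
    int n = static_cast<int>(text.size());
    if (n > rsize) return -1;  // result buffer too small
    std::memcpy(result, text.data(), static_cast<std::size_t>(n));
    return n;
}

// Simulate `speCount` processing engines: records are dealt out round-robin,
// every engine applies the same UDF, and results are collected in input order.
std::vector<std::string> runOnStream(const std::vector<std::string>& records,
                                     Udf udf, int speCount) {
    assert(speCount > 0);
    std::vector<std::string> results(records.size());
    for (int spe = 0; spe < speCount; ++spe) {
        for (std::size_t i = static_cast<std::size_t>(spe); i < records.size();
             i += static_cast<std::size_t>(speCount)) {
            char buffer[64];
            int n = udf(records[i].data(), static_cast<int>(records[i].size()),
                        buffer, static_cast<int>(sizeof buffer));
            if (n >= 0) results[i].assign(buffer, static_cast<std::size_t>(n));
        }
    }
    return results;
}
```

Calling `runOnStream(records, recordLength, 2)` returns one result string per input record, regardless of how many engines were simulated.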
Sphere: Data Movement
Slave -> Slave Local
Slave -> Slaves (Hash/Buckets)
  Each output record is assigned an ID; all records with the same ID are sent to the same “bucket” file
Slave -> Client
[Figure: Stage 1 (Shuffling): SPEs read segments n, n+1, ..., n+m of the input stream and write buckets 0, 1, ..., b of the intermediate stream. Stage 2 (Sorting): SPEs process buckets 0, 1, ..., b into the output stream.]
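The bucket mechanism and the two stages (shuffling, then sorting) can be sketched with in-memory vectors in place of bucket files. The names `bucketId`, `shuffleToBuckets`, and `sortBuckets` are hypothetical, and the ID rule (range partitioning on the first byte) is just one assumption that makes the concatenated buckets come out globally sorted; Sphere lets the UDF choose the ID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bucket-ID rule: range-partition on the first byte, so every
// key in bucket k sorts before every key in bucket k+1.
int bucketId(const std::string& record, int bucketCount) {
    int first = record.empty() ? 0 : static_cast<unsigned char>(record[0]);
    return first * bucketCount / 256;
}

// Stage 1 (shuffling): send every record to the bucket named by its ID.
std::vector<std::vector<std::string>> shuffleToBuckets(
        const std::vector<std::string>& records, int bucketCount) {
    assert(bucketCount > 0);
    std::vector<std::vector<std::string>> buckets(
        static_cast<std::size_t>(bucketCount));
    for (const std::string& r : records) {
        buckets[static_cast<std::size_t>(bucketId(r, bucketCount))].push_back(r);
    }
    return buckets;
}

// Stage 2 (sorting): sort each bucket independently; concatenating the
// buckets in ID order then yields a globally sorted sequence.
std::vector<std::string> sortBuckets(std::vector<std::vector<std::string>> buckets) {
    std::vector<std::string> out;
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());
        out.insert(out.end(), b.begin(), b.end());
    }
    return out;
}
```

Because each bucket is sorted on its own, stage 2 needs no coordination between buckets, which is what allows it to run on separate SPEs.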
What does a Sphere program look like?
A client application
  Specify input, output, and name of UDF
  Inputs and outputs are usually Sector directories or collections of files
  May have multiple rounds of computation if necessary (iterative/combinative processing)
A UDF
  A C++ function following the Sphere specification (parameters and return value)
  Compiled into a dynamic library
Sphere/UDF vs. MapReduce
Map = UDF
MapReduce = 2x UDF
  The first UDF generates bucket files and the second processes the bucket files.
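To make "MapReduce = 2x UDF" concrete, here is a word count written as two plain functions, one per stage. This is a sketch under stated assumptions, not Sphere code: `emitWords`, `countBucket`, and `wordCount` are hypothetical names, buckets are in-memory vectors rather than Sector bucket files, and `std::hash` stands in for whatever bucket-ID rule a real UDF would use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Bucket = std::vector<std::string>;

// First UDF (the "map" side): split a line into words and place each word
// in the bucket chosen by its hash, so equal words always meet in one bucket.
void emitWords(const std::string& line, std::vector<Bucket>& buckets) {
    assert(!buckets.empty());
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        std::size_t id = std::hash<std::string>{}(word) % buckets.size();
        buckets[id].push_back(word);
    }
}

// Second UDF (the "reduce" side): count the occurrences inside one bucket.
std::map<std::string, int> countBucket(const Bucket& bucket) {
    std::map<std::string, int> counts;
    for (const std::string& w : bucket) ++counts[w];
    return counts;
}

// Drive the two stages over in-memory data in place of Sector files.
std::map<std::string, int> wordCount(const std::vector<std::string>& lines,
                                     std::size_t bucketCount) {
    std::vector<Bucket> buckets(bucketCount);
    for (const std::string& line : lines) emitWords(line, buckets);
    std::map<std::string, int> total;
    for (const Bucket& b : buckets) {
        for (const auto& kv : countBucket(b)) total[kv.first] += kv.second;
    }
    return total;
}
```

Since a given word always lands in a single bucket, the per-bucket counts never overlap, and the second stage can process buckets independently.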
Sphere/UDF vs. MapReduce
Sphere is more flexible and efficient
  UDFs can be applied directly to records, blocks, files, and even directories
  Supports multiple inputs/outputs with better data locality, including certain legacy applications that process files and directories
  Native binary data support with permanent index files
  Sorting is required by Reduce, but it is optional in Sphere
  Output locality allows Sphere to combine multiple operations more efficiently
Sphere Benchmarks
Terasort: sort 1TB data over distributed servers
Malstone: detect malware websites from billions of transactions
Graph processing: analyze very large social networks with billions of vertices (BFS and enumerating cliques)
Genome pipeline: analyze genome sequences
Satellite image processing: compare satellite images taken at different times, for disaster relief
Sphere is about 2 – 4 times faster than Hadoop
Open Cloud Testbed
15 Racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2)
10Gb/s inter-site connection on CiscoWave
1 - 2Gb/s inter-rack connection
Two dual-core AMD CPUs, 8 - 16GB RAM, 1-4TB RAID-0 disk
Development Status
Current version 2.3; all core functions are ready, and work continues on improving code quality and the details of certain modules.
Partly funded by NSF for NCDM/UIC
Commercial support via VeryCloud LLC
Next step: support column-based data tables (similar to BigTable)
Open source contributors are welcome