Extreme computing: Databases and MapReduce
Stratis D. Viglas
School of Informatics, University of Edinburgh
Stratis D. Viglas www.inf.ed.ac.uk
Databases and MapReduce BigTable
Outline
Databases and MapReduce
• Overview
• Relational databases
• Relational data processing on Hadoop MR
• BigTable
• Hive and Pig
A different data model
• BigTable’s data model is not relational
• A table is “a sparse, distributed, persistent multidimensional sorted map”
• The map is indexed by a triplet
• (row:string, column:string, time:int64)
• row and column are keys, time is a timestamp
• Bigtables are mutable at the row level
• Support for insertions, deletions, lookups
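The data model above can be illustrated with a small in-memory sketch. It is not BigTable's implementation: the class `SparseTable` and its method names are hypothetical, and the only thing it aims to show is a sparse map indexed by (row, column, time) with row-level mutation.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory stand-in for a BigTable-style map (hypothetical class)."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; cells that were never written take no space.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def delete_row(self, row):
        """Row-level mutation: drop every cell that belongs to the row."""
        for key in [k for k in self._cells if k[0] == row]:
            del self._cells[key]


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 5, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

A lookup without a timestamp returns the latest version of a cell, while passing a timestamp reads the cell as it was at that time.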
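Rows and columns in more detail
[Figure: the row com.cnn.www of an example table. The contents: column holds three timestamped versions of the page ("<html>...") at t3, t5 and t6; the anchor:cnnsi.com column holds "CNN" at t9; the anchor:my.look.ca column holds "CNN.com" at t8.]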
• Rows are maintained in sorted lexicographic order
• Applications can exploit this property for efficient row scans
• Row ranges dynamically partitioned into tablets
• Columns grouped into column families
• Column key = family:qualifier
• Column families provide locality hints
• Unbounded number of columns per table
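As a rough illustration of why sorted row keys help, the sketch below scans a row range with binary search and splits a column key into family and qualifier. The row keys and the helper names `scan` and `split_column_key` are made up for the example.

```python
import bisect

# Row keys kept in sorted lexicographic order (reversed domain names, like com.cnn.www).
rows = sorted([
    "com.cnn.www", "com.cnn.money", "org.wikipedia.en",
    "com.bbc.www", "com.cnn.edition",
])


def scan(sorted_rows, start, end):
    """Return the row keys in [start, end) via binary search, without a full pass."""
    lo = bisect.bisect_left(sorted_rows, start)
    hi = bisect.bisect_left(sorted_rows, end)
    return sorted_rows[lo:hi]


def split_column_key(column_key):
    """Split 'family:qualifier' into its two parts; the qualifier may be empty."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


# '/' is the ASCII character right after '.', so this range covers every com.cnn.* row.
cnn_rows = scan(rows, "com.cnn.", "com.cnn/")
```

Because all pages of one domain share a key prefix, they sit next to each other and a single contiguous scan retrieves them.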
Building blocks: SSTable
• The smallest and most basic building block
• Persistent immutable map from keys to values
• Stored in GFS
• Sequence of disk blocks with a (persistent) index for lookup
• Memory-mapped for fast operation
• Two supported operations
• Given a key, look up the value associated with it
• Iterate over key/value pairs within a given key range
[Figure: an SSTable drawn as a sequence of 64 kB blocks plus an index.]
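The two SSTable operations can be sketched as follows. This is a toy under stated assumptions: the class is in-memory only, `block_size` counts entries rather than 64 kB of bytes, and the index simply stores the first key of each block.

```python
import bisect


class SSTable:
    """Toy immutable sorted map split into fixed-size blocks with a sparse index."""

    def __init__(self, items, block_size=2):
        # block_size counts entries here; the slide's blocks are 64 kB of bytes.
        pairs = sorted(dict(items).items())
        self._blocks = [
            tuple(pairs[i:i + block_size]) for i in range(0, len(pairs), block_size)
        ]
        # Index: first key of each block, enough to pick the one block a key can live in.
        self._index = [block[0][0] for block in self._blocks]

    def get(self, key):
        """Point lookup: binary search the index, then search a single block."""
        pos = bisect.bisect_right(self._index, key) - 1
        if pos < 0:
            return None
        return dict(self._blocks[pos]).get(key)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        first = max(bisect.bisect_right(self._index, start) - 1, 0)
        for block in self._blocks[first:]:
            for key, value in block:
                if key >= end:
                    return
                if key >= start:
                    yield key, value


sst = SSTable({"apple": 1, "aardvark": 2, "boat": 3, "ant": 4, "axe": 5})
```

Since the table never changes after construction, the index can be built once and readers need no locking.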
Building blocks: Tablets and Tables
• Dynamically partitioned range of rows
• Built from multiple SSTables
[Figure: a tablet (start: aardvark, end: apple) built from two SSTables, each drawn as a sequence of 64 kB blocks plus an index.]
• Multiple tablets make up a table
• SSTables can be shared between tablets
[Figure: a table made of two tablets, one covering aardvark to apple and one covering applepie to boat, drawn over four SSTables.]
Notes on the architecture
• Similar to GFS
• Single master server, multiple tablet servers
• BigTable master
• Assigns tablets to tablet servers
• Detects addition and expiration of tablet servers
• Balances tablet server load
• Handles garbage collection
• Handles schema evolution
• BigTable tablet servers
• Each tablet server manages a set of tablets
• Typically between ten and a thousand tablets
• Each 100-200 MB by default
• Handles read and write requests to the tablets
• Splits tablets when they grow too large
Location dereferencing
[Figure: tablet location hierarchy. Labels: Chubby file, master file, root tablet (1st metadata level), other metadata tablets, user tables 1 to n.]
• Chubby: replicated, persistent lock service; maintains tablet server locations
• Root tablet: root of the metadata tree
• At most three levels in the metadata hierarchy
• B-tree-like structure, indexed by table identifier and end row
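Indexing by table identifier and end row means a lookup only needs the first entry whose end row is not smaller than the requested row. The sketch below shows a single level of such a lookup; the `METADATA` contents, the server names and the sentinel end row are all invented for illustration, and a real lookup would repeat this step at each metadata level.

```python
import bisect

# Hypothetical metadata: sorted (table_id, end_row) keys, each mapped to a tablet server.
METADATA = [
    (("users", "apple"), "server-a"),      # rows up to and including "apple"
    (("users", "boat"), "server-b"),       # rows after "apple" up to "boat"
    (("users", "\uffff"), "server-c"),     # last tablet: sentinel end row
    (("webtable", "\uffff"), "server-d"),
]
KEYS = [key for key, _ in METADATA]


def locate(table_id, row):
    """Find the tablet whose end row is the smallest one >= row, within table_id."""
    pos = bisect.bisect_left(KEYS, (table_id, row))
    if pos == len(KEYS) or KEYS[pos][0] != table_id:
        raise KeyError(f"no tablet for {table_id!r}/{row!r}")
    return METADATA[pos][1]
```

Storing only the end row is enough because each tablet starts right after the end row of the tablet before it.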
Tablet assignment
• Master keeps track of
• Set of live tablet servers
• Assignment of tablets to tablet servers
• Unassigned tablets
• Each tablet is assigned to one tablet server at a time
• Tablet server maintains an exclusive lock on a file in Chubby
• Master monitors tablet servers and handles assignment
• Changes to tablet structure
• Table creation/deletion (master initiated)
• Tablet merging (master initiated)
• Tablet splitting (tablet server initiated)
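The master's bookkeeping can be pictured with a minimal sketch. Everything here is hypothetical: the `Master` class, its method names and the least-loaded placement rule are assumptions for illustration, and lock handling through Chubby is reduced to an explicit `server_expired` call.

```python
class Master:
    """Toy bookkeeping for tablet assignment (hypothetical class, no real Chubby)."""

    def __init__(self):
        self.live_servers = set()
        self.assignment = {}    # tablet -> server; one server per tablet at a time
        self.unassigned = set()

    def server_joined(self, server):
        self.live_servers.add(server)
        self.assign_pending()

    def server_expired(self, server):
        # A server that lost its lock no longer serves; its tablets need a new home.
        self.live_servers.discard(server)
        for tablet, owner in list(self.assignment.items()):
            if owner == server:
                del self.assignment[tablet]
                self.unassigned.add(tablet)
        self.assign_pending()

    def add_tablet(self, tablet):
        self.unassigned.add(tablet)
        self.assign_pending()

    def load(self, server):
        return sum(1 for owner in self.assignment.values() if owner == server)

    def assign_pending(self):
        # Assumed policy: give each unassigned tablet to the least-loaded live server.
        for tablet in sorted(self.unassigned):
            if not self.live_servers:
                return
            target = min(sorted(self.live_servers), key=self.load)
            self.assignment[tablet] = target
            self.unassigned.discard(tablet)
```

Tablets stay in the unassigned set while no server is alive, which matches the three pieces of state the master tracks.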
Tablet serving and I/O flow
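[Figure: tablet serving. The memtable lives in memory; the tablet log and the SSTables live in GFS. Writes go to the tablet log and the memtable; reads draw on the memtable and the SSTables.]
• Write operations are logged (in redo records)
• Recent updates are kept sorted in main memory
• The memtable and SSTables are merged to serve a read request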
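A minimal sketch of this I/O flow, assuming plain Python containers in place of GFS files: the `Tablet` class and its methods are hypothetical, and the memtable is an ordinary dict because only point reads are shown (a real memtable keeps its keys sorted).

```python
class Tablet:
    """Toy tablet: an append-only log, an in-memory memtable, and immutable SSTables."""

    def __init__(self):
        self.log = []        # stands in for the tablet log on GFS
        self.memtable = {}   # recent updates, held in memory
        self.sstables = []   # oldest first; each one is treated as read-only

    def write(self, key, value):
        self.log.append((key, value))  # log the redo record first
        self.memtable[key] = value     # then apply the update in memory

    def read(self, key):
        # Merged view: the newest source that knows the key wins.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def recover(self):
        """Rebuild the memtable after a crash by replaying the redo records."""
        self.memtable = {}
        for key, value in self.log:
            self.memtable[key] = value
```

Logging before applying the update is what makes the in-memory state recoverable: losing the memtable loses nothing that the log cannot replay.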
Tablet management
• Minor compaction
• Converts the memtable into an SSTable
• Reduces memory footprint and log traffic on restart
• Merging compaction
• Reads the contents of a few SSTables and the memtable, and writes out a new SSTable
• Reduces number of SSTables
• Major compaction
• Merging compaction that results in only one SSTable
• No deletion records, only live data
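The three compactions can be sketched with dictionaries standing in for the memtable and the SSTables. The function names and the `TOMBSTONE` marker for deletion records are assumptions made for this example; each function returns the new (memtable, SSTable list) pair instead of mutating its inputs.

```python
TOMBSTONE = object()  # hypothetical marker for a deletion record


def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new SSTable and start with an empty memtable."""
    return {}, sstables + [dict(sorted(memtable.items()))]


def merging_compaction(memtable, sstables, count):
    """Merge the memtable and the `count` newest SSTables into one new SSTable."""
    if not 1 <= count <= len(sstables):
        raise ValueError("count must be between 1 and the number of SSTables")
    keep, merge = sstables[:-count], sstables[-count:]
    merged = {}
    for source in merge + [memtable]:  # oldest to newest, so later writes win
        merged.update(source)
    return {}, keep + [dict(sorted(merged.items()))]


def major_compaction(memtable, sstables):
    """Merge everything into a single SSTable that holds only live data."""
    merged = {}
    for source in sstables + [memtable]:
        merged.update(source)
    live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
    return {}, [live]
```

Only the major compaction may drop deletion records, since an older SSTable left out of a partial merge could still hold the deleted value.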
Databases and MapReduce Hive and Pig
High-level data processing
• Hive: data warehousing application in Hadoop
• Query language is HQL, variant of SQL
• Tables stored on HDFS as flat files
• Developed by Facebook, now open source
• Pig: large-scale data processing system
• Scripts are written in Pig Latin, a dataflow language
• Developed by Yahoo!, now open source
• Roughly 1/3 of all Yahoo! internal jobs
• Common idea
• Provide higher-level language to facilitate large-data processing
• Higher-level language is compiled to Hadoop jobs
Hive: background and components
• Started at Facebook [1]
• Data was collected by nightly cron jobs into Oracle DB
• Extract-transform-load (ETL) via hand-coded Python
• Grew from 10s of GBs (2006) to 1 TB/day of new data (2007), now 10x that
• Shell: allows interactive queries
• Driver: session handles, fetch, execute
• Compiler: parse, plan, optimize
• Execution engine: DAG of stages (MR, HDFS, metadata processing)
• Metastore: schema, location in HDFS, SerDe
[1] It had to be good for something apart from wasting my PhD students’ time.
Logical and physical models
• Tables
• Typed columns (int, float, string, boolean)
• Also: list, map
• Partitions
• For example, range-partition tables by date
• Buckets
• Hash partitions within ranges (useful for sampling, join optimization)
• Metastore
• Database: namespace containing a set of tables
• Holds table definitions (column types, physical layout)
• Holds partitioning information
• Can be stored in Derby, MySQL, and many other relational databases
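Partitions and buckets can be pictured as two placement steps per record: the partition column picks a range (for example a date) and a hash of the bucketing column picks a bucket inside it. The sketch below is an assumption-laden illustration, not Hive's code: the column names, the bucket count and the use of CRC32 as the hash are all made up.

```python
import zlib

NUM_BUCKETS = 4  # assumed bucket count for the illustration


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Hash partition: a stable hash of the bucketing column, modulo the bucket count."""
    # CRC32 is a stand-in chosen for determinism; it is not Hive's hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def place(record, partition_column="date", bucket_column="user"):
    """Return (partition value, bucket number) for one record."""
    return record[partition_column], bucket_of(record[bucket_column])


def sample(records, bucket=0, bucket_column="user"):
    """Sampling by reading one bucket: roughly 1/NUM_BUCKETS of the distinct keys."""
    return [r for r in records if bucket_of(r[bucket_column]) == bucket]
```

Because every record with the same key lands in the same bucket, two tables bucketed on the join key can be joined bucket by bucket.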
Hive processing
• Hive uses HQL, a declarative query language close to SQL
• HQL statements are translated into a syntax tree
• Syntax tree is compiled into an execution plan of MapReduce jobs, executed by Hadoop
SELECT s.word, s.freq, k.freq
FROM shakespeare s JOIN bible k ON (s.word = k.word)
WHERE s.freq >= 1 AND k.freq >= 1
ORDER BY s.freq DESC
LIMIT 10;
[Figure: the HQL query is turned into an abstract syntax tree, which is compiled into a MapReduce plan made of several map and reduce stages.]
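To make the idea of a query compiled into MapReduce jobs concrete, the sketch below runs the example query as two hand-written jobs in a single process: a reduce-side join, then a sort with a limit. This is one plausible plan under simplifying assumptions, not the plan Hive actually produces; `run_job`, the tagging of records with "s" and "k", and the sample word counts are all invented for the example.

```python
from collections import defaultdict
from itertools import product


def run_job(records, mapper, reducer):
    """Minimal single-process MapReduce: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: reduce-side join on word, with the WHERE filter applied in the mapper.
def join_map(record):
    source, word, freq = record
    if freq >= 1:
        yield word, (source, freq)


def join_reduce(word, values):
    s_freqs = [f for src, f in values if src == "s"]
    k_freqs = [f for src, f in values if src == "k"]
    for s_freq, k_freq in product(s_freqs, k_freqs):
        yield word, s_freq, k_freq


# Job 2: ORDER BY s.freq DESC LIMIT n, using a single reduce group.
def sort_map(row):
    yield None, row


def make_top_n(n):
    def top_n(_, rows):
        return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    return top_n


shakespeare = [("s", "the", 25), ("s", "king", 7), ("s", "thou", 12), ("s", "nay", 0)]
bible = [("k", "the", 60), ("k", "king", 9), ("k", "lord", 30)]
joined = run_job(shakespeare + bible, join_map, join_reduce)
top = run_job(joined, sort_map, make_top_n(10))
```

Words that appear in only one input never produce output in the join reducer, which is exactly the inner-join semantics of the query.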
Pig and Pig Latin
• Similar idea to Hive, but more tailored towards efficiency and a DB-like setting
• Script interface to deploy MapReduce jobs
• Maintains schema and performs type checking
• Rudimentary optimiser to translate Pig scripts into an efficient physical dataflow
• Sequence of one or more MapReduce jobs
• Exploit heuristics and cost model to reduce intermediate data
• Dataflow is scheduled and executed
• Runtime tracks job progress and any errors
Example Pig Latin script
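-- Load the visit log; Canonicalize is a user-defined function assumed to be registered elsewhere.
Visits = load '/data/visits' as (user, url, time);
Visits = foreach Visits generate user, Canonicalize(url) as url, time;
Pages = load '/data/pages' as (url, pagerank);
-- Attach a pagerank to every visit, then average it per user.
VP = join Visits by url, Pages by url;
UserVisits = group VP by user;
UserPageranks = foreach UserVisits generate group as user, AVG(VP.pagerank) as avgpr;
-- Keep users whose average pagerank exceeds 0.5.
GoodUsers = filter UserPageranks by avgpr > 0.5;
store GoodUsers into '/data/good_users';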
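For readers more familiar with Python than Pig Latin, the same dataflow can be traced step by step on toy data. This is only a single-machine sketch: the `canonicalize` function is a hypothetical stand-in for the script's Canonicalize function, and the visits and pageranks are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonicalize(url):
    """Hypothetical stand-in for Canonicalize: lower-case host, drop query and trailing slash."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


visits = [
    ("amy", "http://CNN.com/", 8),
    ("amy", "http://bbc.co.uk/news?x=1", 9),
    ("bob", "http://spam.example/", 10),
]
pages = [("cnn.com", 0.9), ("bbc.co.uk/news", 0.7), ("spam.example", 0.1)]

# foreach ... generate: canonicalise every visited URL.
visits = [(user, canonicalize(url), time) for user, url, time in visits]

# join Visits by url, Pages by url (hash join, building on the pages relation).
pagerank = dict(pages)
vp = [(user, url, time, pagerank[url]) for user, url, time in visits if url in pagerank]

# group VP by user, then AVG(VP.pagerank) per group.
by_user = defaultdict(list)
for user, _, _, rank in vp:
    by_user[user].append(rank)
user_pageranks = {user: sum(ranks) / len(ranks) for user, ranks in by_user.items()}

# filter UserPageranks by avgpr > 0.5.
good_users = {user: avg for user, avg in user_pageranks.items() if avg > 0.5}
```

Each Python statement corresponds to one Pig Latin statement, which is the sense in which Pig Latin is a dataflow language.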
Java vs. Pig Latin
[Figure: two bar charts comparing Hadoop and Pig. Left: lines of code (axis up to 180). Right: development time in minutes (axis up to 300).]
• Performance on par with raw Hadoop
• But with 1/20 of the lines of code
• And with 1/16 of the development time