Babu Netezza

Data-centric computing with Netezza Architecture

DISC reading group, September 24, 2007

High Level Points

• Supercomputer use model today:

– Compile, submit, wait

– Does a poor job of taking advantage of human insight available in interactive models

• Large datasets can be interactively processed using Netezza

What is Netezza?

• Essentially: A big, fast SQL database

What is Netezza?

• Frontend provides SQL interface

• Backend is a large rack of specialized blades

Custom Backend Blades

• Commodity CPU, NIC, disk

• Custom FPGA replaces disk interface

– Can do basic filtering in hardware, i.e., stream processing before data hits main memory

Division of Data

• Database distributed across multiple (100+) SPUs

• Each SPU controls, manages its slice of DB

• No info on data management, replication, etc.
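Since the slides give no detail on how rows are assigned to SPUs, the following is only a minimal sketch under the assumption of hash distribution on a key column. The names `spu_for`, `distribute`, and `paper_id` are hypothetical; the count of 108 comes from the blade count quoted under Issues.

```python
from collections import defaultdict
from zlib import crc32

NUM_SPUS = 108  # blade count quoted in these slides


def spu_for(key: str, num_spus: int = NUM_SPUS) -> int:
    """Map a distribution key to an SPU index (ASSUMED hash scheme)."""
    return crc32(key.encode("utf-8")) % num_spus


def distribute(rows, key_field, num_spus=NUM_SPUS):
    """Split rows into per-SPU slices; each SPU would own one slice."""
    spu_slices = defaultdict(list)
    for row in rows:
        spu_slices[spu_for(str(row[key_field]), num_spus)].append(row)
    return spu_slices


# Hypothetical table of papers, keyed on paper_id.
rows = [{"paper_id": i, "year": 1990 + i % 15} for i in range(10_000)]
spu_slices = distribute(rows, "paper_id")
```

Every row lands in exactly one slice, so a scan can run on all SPUs at once with no coordination until results are merged.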

Division of Labor

• SPU FPGA handles basic filtering tasks

• SPU CPU handles record-level processing: filtering, parsing, projecting, logging, etc.

• SPU CPU handles most operations on intermediate results: sorts, joins, aggregates

• Frontend CPU handles remaining operations

>>> Processing close to disk

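The three-stage split can be sketched in software as follows. This is an illustration only: the function names, the record layout, and the choice of a grouped count as the query are assumptions, and the real FPGA stage is hardware, not a Python generator.

```python
from collections import Counter


def fpga_filter(disk_stream, predicate):
    """Stage 1 (stand-in for the FPGA): drop records while streaming,
    so rejected records never reach the later stages."""
    for record in disk_stream:
        if predicate(record):
            yield record


def spu_cpu_aggregate(records, group_field):
    """Stage 2 (SPU CPU): project one field and build a partial count."""
    partial = Counter()
    for record in records:
        partial[record[group_field]] += 1
    return partial


def frontend_merge(partials):
    """Stage 3 (frontend CPU): combine the per-SPU partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Four hypothetical SPU slices of 100 records each.
slices = [
    [{"year": 2000 + (i + s) % 3, "cites": i % 5} for i in range(100)]
    for s in range(4)
]
partials = [
    spu_cpu_aggregate(fpga_filter(iter(sl), lambda r: r["cites"] >= 3), "year")
    for sl in slices
]
result = frontend_merge(partials)
```

Only the small per-SPU counters travel to the frontend; the bulk of the data is discarded or reduced next to the disk it was read from.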
What can this be used for?

• Paper gives 3 examples:

– Citation graph processing

– Search for particular structure in electrical netlist

– Word meaning disambiguation through search of ontology

Citation graph example

• Look through large, sparse graph (16 million nodes, 388 million edges)

• Find both strong (direct edge) and weak couplings (e.g., two papers cite the same work)

• Essentially same code for workstation and Netezza – no need to expose parallel architecture

• Workstation did not finish (DNF) on the full graph; 80-100x speedup on smaller tests
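The two kinds of coupling map naturally onto SQL. The sketch below uses Python's built-in `sqlite3` as a stand-in for the SQL frontend; the table name `cites`, its columns, and the toy edges are hypothetical, and the paper's actual queries are not shown in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cites (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO cites VALUES (?, ?)",
    [(1, 3), (2, 3), (1, 4), (2, 4), (5, 3), (6, 7)],
)

# Strong coupling: a direct citation edge.
strong = conn.execute(
    "SELECT src, dst FROM cites ORDER BY src, dst"
).fetchall()

# Weak coupling: two different papers that cite the same work.
# The self-join pairs up edges sharing a destination; a.src < b.src
# keeps each unordered pair once.
weak = conn.execute(
    """
    SELECT a.src, b.src, COUNT(*) AS shared
    FROM cites AS a
    JOIN cites AS b ON a.dst = b.dst AND a.src < b.src
    GROUP BY a.src, b.src
    ORDER BY shared DESC, a.src, b.src
    """
).fetchall()
```

Nothing in the query mentions how the table is partitioned, which is the point made above: the same SQL runs on a workstation database and on the parallel backend.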

IC netlist example

• Flattened netlist of 3.5 million transistors, 10 million wires

• Search for AND structure

IC example results

• Combinatorial explosion makes directly joining all possibilities for each element impossible

• Can constrain better using fanouts of signals internal to the circuit

• Individual SQL queries for finding possible matches for the individual transistors took under 10 seconds

• Found all uses of the AND macro, as well as many other (1300+) identical structures generated through other means
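A small sketch of the fanout idea, again using `sqlite3` as a stand-in. It matches only one fragment of a gate (two NMOS transistors in series) rather than a full AND, and the schema (`transistors` with `id`, `kind`, `gate`, `source`, `drain`) and net names are hypothetical. The constraint assumed here is that the net between the two series transistors is internal, so it touches exactly two terminals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transistors (id TEXT, kind TEXT, gate TEXT, source TEXT, drain TEXT)"
)
conn.executemany(
    "INSERT INTO transistors VALUES (?, ?, ?, ?, ?)",
    [
        ("n1", "nmos", "a", "mid1", "y1"),   # series stack 1, top
        ("n2", "nmos", "b", "gnd", "mid1"),  # series stack 1, bottom
        ("n3", "nmos", "c", "mid2", "y2"),   # series stack 2, top
        ("n4", "nmos", "d", "gnd", "mid2"),  # series stack 2, bottom
        ("n5", "nmos", "e", "gnd", "y3"),    # lone pull-down
        ("n6", "nmos", "f", "busy", "y4"),   # decoy: net "busy" also
        ("n7", "nmos", "g", "gnd", "busy"),  # drives a gate, so it is
        ("n8", "nmos", "busy", "gnd", "y5"), # not an internal net
    ],
)

SERIES_PAIRS = """
    SELECT t1.id, t2.id, t1.source
    FROM transistors AS t1
    JOIN transistors AS t2 ON t1.source = t2.drain AND t1.id <> t2.id
    WHERE t1.kind = 'nmos' AND t2.kind = 'nmos'
"""

# Unconstrained join: every series pair is a candidate.
candidates = conn.execute(SERIES_PAIRS + " ORDER BY t1.id").fetchall()

# Constrained join: the shared net must touch exactly two terminals.
constrained = conn.execute(
    """
    WITH fanout AS (
        SELECT net, COUNT(*) AS n FROM (
            SELECT gate AS net FROM transistors
            UNION ALL SELECT source FROM transistors
            UNION ALL SELECT drain FROM transistors
        ) GROUP BY net
    )
    """
    + SERIES_PAIRS
    + " AND t1.source IN (SELECT net FROM fanout WHERE n = 2) ORDER BY t1.id"
).fetchall()
```

On this toy netlist the fanout test prunes one of three candidates; on millions of transistors, pruning early is what keeps the later joins from exploding.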

Ontology example

• Expand out all possible interpretations of a phrase

• Ontology specifies lexical elements, IS-A relations, concepts, and constraints on concepts

• Goal is to search the space, expand concepts to find all matches to given phrase

Ontology results

• Partially unfolded ontology

– Greatly expands database size, but reduces iterations / recursions

• Recoded ontology triples as integers

• 5.58 sec. vs. 262 sec.

• Can pipeline multiple queries
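The two optimizations, integer recoding and unfolding, can be illustrated on a toy IS-A hierarchy. This is a sketch under assumptions: the terms, the table names `is_a` and `is_a_unfolded`, and the use of a full transitive closure (the slides say only "partially unfolded") are all hypothetical, with `sqlite3` standing in for the real backend.

```python
import sqlite3

triples = [
    ("beagle", "IS-A", "dog"),
    ("dog", "IS-A", "mammal"),
    ("mammal", "IS-A", "animal"),
    ("cat", "IS-A", "mammal"),
]

# Recode every distinct term as an integer id.
terms = sorted({t for s, _, o in triples for t in (s, o)})
term_id = {term: i for i, term in enumerate(terms)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE is_a (child INTEGER, parent INTEGER)")
conn.executemany(
    "INSERT INTO is_a VALUES (?, ?)",
    [(term_id[s], term_id[o]) for s, p, o in triples if p == "IS-A"],
)

# Unfold once, ahead of time: compute every (child, ancestor) pair.
closure_rows = conn.execute(
    """
    WITH RECURSIVE closure(child, ancestor) AS (
        SELECT child, parent FROM is_a
        UNION
        SELECT c.child, e.parent
        FROM closure AS c JOIN is_a AS e ON c.ancestor = e.child
    )
    SELECT child, ancestor FROM closure
    """
).fetchall()
conn.execute("CREATE TABLE is_a_unfolded (child INTEGER, ancestor INTEGER)")
conn.executemany("INSERT INTO is_a_unfolded VALUES (?, ?)", closure_rows)


def ancestors(term):
    """Single non-recursive lookup against the unfolded table."""
    rows = conn.execute(
        "SELECT ancestor FROM is_a_unfolded WHERE child = ?", (term_id[term],)
    ).fetchall()
    return sorted(terms[a] for (a,) in rows)
```

The unfolded table is larger than the original (8 rows against 4 here), but each query becomes one integer-keyed lookup instead of a chain of recursive steps.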

Issues

• Works if you can reduce your problem to SQL queries

• All of the problems were based on graph expansion / exploration – how about other domains?

• Issues of database partitioning? How does arbitrary slicing across 108 blades affect performance / scalability, esp. for non-sparse problems?

• Strawman comparison to workstation class machine: how does a traditional DB server / storage cluster compare?
