The Mercury System: Embedding Computation into Disk Drives
![Page 1: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/1.jpg)
The Mercury System: Embedding Computation into Disk Drives
Roger Chamberlain, Ron Cytron, Mark Franklin, Ron Indeck
Center for Security Technologies
Washington University in St. Louis
![Page 2: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/2.jpg)
Enabling Technology: Disk Drives
Magnetic disk storage areal density vs. year of IBM product introduction (From D. A. Thompson)
~10,000,000x increase in 45 years!
(over 50% per year)
[Chart: areal density (Mb/in²) on a log scale from 0.001 to 100,000, plotted against year, 1960 to 2010]
![Page 3: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/3.jpg)
Cost per Megabyte
Cost decreasing 3% per week!
Price history of hard disk product vs. year of product introduction (From D. A. Thompson)
[Chart: price per megabyte (dollars) on a log scale from 0.01 to 1,000, plotted against year, 1980 to 2000]
![Page 4: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/4.jpg)
Massive Data
• Storage industry shipped 4,000,000,000,000,000,000 Bytes last year
• MasterCard recently installed a 200 TByte data warehouse in St. Louis
• US intelligence services collect data equaling the printed collection of the US Library of Congress every day!
![Page 5: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/5.jpg)
Enabling Technology: Reconfigurable Hardware
• Field Programmable Gate Arrays (FPGAs) provide custom logic function capability
• Operate at hardware speeds
• Can be altered (reconfigured) in the field to meet specific application needs
[Diagram labels: programmable logic, programmable interconnect]
![Page 6: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/6.jpg)
What are we doing?
Within the Center, we are combining the capabilities of these two enabling technologies to build extremely fast data search engines.
We do this by moving the search closer to the data, and performing it in hardware rather than software.
![Page 7: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/7.jpg)
Important Application: Intelligence Data
• Lots of data
– Public (e.g., web pages)
– Clandestine (e.g., via national technical means)
• Growing constantly
• Many perturbations of individual words
– Tzar, Tsar, Czar, …
• Query and field types aren’t known a priori
![Page 8: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/8.jpg)
Finding a needel in a haystack
• Text can contain errors
• Often seek an approximate match, e.g. needle
• No match? Try 2-transpositions
enedle, needle, nedele, neelde, needel
• No match? Try 1-deletions
eedle, nedle, nedle, neele, neede, needl
• No match? Try insertions, larger edits, …
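The escalation on this slide can be sketched in software as follows. This is a minimal illustration, not the Mercury hardware implementation; the function names (`adjacent_transpositions`, `single_deletions`, `approximate_find`) are hypothetical and chosen only for this example.

```python
def adjacent_transpositions(word):
    """Swap each pair of neighboring characters (the slide's 2-transpositions)."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]


def single_deletions(word):
    """Drop one character at a time (the slide's 1-deletions)."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def candidate_queries(word):
    """Yield the exact word first, then progressively looser variants."""
    yield word
    yield from adjacent_transpositions(word)
    yield from single_deletions(word)


def approximate_find(text, word):
    """Return (variant, offset) for the first variant found in text, else None."""
    for candidate in candidate_queries(word):
        pos = text.find(candidate)
        if pos != -1:
            return candidate, pos
    return None


print(adjacent_transpositions("needle"))
print(single_deletions("needle"))
print(approximate_find("a needel in a haystack", "needle"))
```

The two lists printed for "needle" reproduce the variants shown on the slide, duplicates included. Each variant requires another pass over the text, which is one reason a whole-database approximate search is expensive in software.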
![Page 9: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/9.jpg)
Genome Application
• Genome maps being expanded daily
– 80,000 genes, 3 billion base pairs (A,C,G,T)
• Look for matches
– Identify function
– Disease: understand, diagnose, detect, therapy
– Biofuels, warfare, toxic waste
– Understand evolution
– Forensics, organ donors, authentication
– More effective crops, disease resistance
![Page 10: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/10.jpg)
DNA String Matching
• Looking for CACGTTAGT…TAGC
• Interested in matches and near matches
• Search human genome, other gene oceans
– Need to search entire data sets
![Page 11: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/11.jpg)
Bio Computation Problem
[Diagram: *BIG* genome databases; a DNA pattern is compared against a DNA sequence (letters shown: A C G T G and T A C A G); Match?]
![Page 12: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/12.jpg)
Image Database Applications
• Challenging database
• Unstructured
• Massive data sets
• Don’t know what we need to look for in each picture
![Page 13: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/13.jpg)
Object Recognition
• Face recognition
• Match template with image
• Template database must be searched
• Strict time constraints for matching and overall search
![Page 14: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/14.jpg)
Washington University Campus
![Page 15: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/15.jpg)
Satellite Data
• Low orbit fly-over every 90 minutes
• Look for differences in images
– Large objects
– Troops
– Changes to landscape
• Flag, transmit these differences immediately
![Page 16: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/16.jpg)
How do we find what we’re looking for most effectively?!
![Page 17: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/17.jpg)
Conventional Structured Database
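| Did | Document |
|---|---|
| 1 | Agent James Bond |
| 2 | Agent mobile computer |
| 3 | James Madison movie |
| 4 | James Bond movie |

| Word | Inverted list - pointers |
|---|---|
| agent | <1,2> |
| Bond | <1,4> |
| computer | <2> |
| James | <1,3,4> |
| Madison | <3> |
| mobile | <2> |
| movie | <3,4> |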
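A minimal sketch of how such an inverted list can be built and queried, using the four documents from this slide. The helper names `build_inverted_index` and `lookup` are hypothetical, and lowercasing the words is an assumption made for this example.

```python
from collections import defaultdict

# The four documents from the slide, keyed by document id (Did).
documents = {
    1: "Agent James Bond",
    2: "Agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}


def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for word in set(docs[doc_id].lower().split()):
            index[word].append(doc_id)
    return dict(index)


def lookup(index, *words):
    """Return ids of documents that contain every query word."""
    postings = [set(index.get(w.lower(), ())) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(documents)
print(index["james"])
print(lookup(index, "James", "Bond"))
```

A lookup touches only the posting lists for the query words, but the index has to exist before the query arrives and must be updated whenever the documents change.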
![Page 18: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/18.jpg)
Challenges in Searching These Massive Databases
• If we know what we will be looking for
– Need to build index beforehand
– Maintain index as it changes
• If we don’t know what we want a priori
– Need to search the whole database!
![Page 19: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/19.jpg)
Conventional Search
[Diagram: processor, memory, I/O bus, and two hard drives]
![Page 20: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/20.jpg)
Conventional Search
[Diagram: processor, memory, I/O bus, and two hard drives; a query is issued: find …]
![Page 21: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/21.jpg)
Conventional Search
[Diagram: processor, memory, I/O bus, and two hard drives; drive contents are checked item by item: no, no, no, yes, no …]
![Page 22: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/22.jpg)
Conventional Search
[Diagram: processor, memory, I/O bus, and two hard drives; drive contents are checked item by item: no, no, yes, no, no …]
![Page 23: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/23.jpg)
Conventional Approach
![Page 24: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/24.jpg)
WUSTL’s Approach
![Page 25: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/25.jpg)
Streaming Approach
[Diagram: processor, memory, memory bus, I/O bus, and two hard drives, each with a reconfigurable hardware search engine]
![Page 26: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/26.jpg)
Streaming Approach
[Diagram: processor, memory, memory bus, I/O bus, and two hard drives, each with a reconfigurable hardware search engine; a query is issued: find …]
![Page 27: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/27.jpg)
Streaming Approach
[Diagram: processor, memory, memory bus, I/O bus, and two hard drives, each with a reconfigurable hardware search engine; the query appears at both search engines: find …, find …]
![Page 28: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/28.jpg)
Streaming Approach
[Diagram: processor, memory, memory bus, I/O bus, and two hard drives, each with a reconfigurable hardware search engine; the two search engines produce separate result streams: no, no, no, yes, no … and no, no, yes, no, no …]
![Page 29: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/29.jpg)
Search Engine in Context
[Diagram labels: processor, cache, main memory, memory bus, bridge, I/O bus, disk controller, search engine (data shift register, reconfigurable logic, "P"), disk data, to disk controller]
![Page 30: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/30.jpg)
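Reconfigurable Hardware for Text Searches
[Diagram, no match: disk data streams through the data shift register (A T C G G T) and is compared with the compare register (A A T C G G). The fine-grain comparison flags mismatches (x x x x), so the word-level comparison (AND) does not raise the match signal (x).]
[Diagram, match: the data shift register holds A A T C G G, equal to the compare register (A A T C G G). Every fine-grain comparison succeeds, so the word-level comparison (AND) raises the match signal.]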
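The shift-register comparison in these diagrams can be modeled in software as a rough sketch. This is only an analogy for the hardware, which compares all positions at once; the names `fine_grain_compare` and `stream_search` are hypothetical, and the pattern is assumed to be non-empty.

```python
from collections import deque


def fine_grain_compare(window, pattern):
    """Compare each register position with the compare register, one bit each."""
    return [d == p for d, p in zip(window, pattern)]


def stream_search(disk_data, pattern):
    """Shift disk data through a register and yield the start offset of each match.

    Assumes a non-empty pattern.
    """
    shift_register = deque(maxlen=len(pattern))
    for offset, symbol in enumerate(disk_data):
        shift_register.append(symbol)  # oldest symbol falls off the other end
        if len(shift_register) < len(pattern):
            continue  # register not yet full
        bits = fine_grain_compare(shift_register, pattern)
        if all(bits):  # word-level comparison: AND of the fine-grain bits
            yield offset - len(pattern) + 1


print(fine_grain_compare("ATCGGT", "AATCGG"))
print(list(stream_search("ATCGGTAATCGG", "AATCGG")))
```

The first call reproduces the mismatch frame: four of the six positions differ, so the AND is false. The second call streams both frames back to back and reports a match once the register holds A A T C G G.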
![Page 31: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/31.jpg)
Sources of Performance Gains
1. Disk Search Parallelism: Each engine searches in parallel across a disk or disk surface
2. System Parallelism: Searching is off-loaded to search engines and main processor can perform other tasks
3. Reduced data movement overhead: Disk data moves principally to search engine, not successively over system bus, memory bus, to cache, etc.
4. Hardware logic for searching: Searching, matching, and query operations are performed on streaming data in hardware rather than in software
5. Specialized hardware logic tailored to queries: Reconfigurable hardware permits matching the query logic to the search engine logic and preserves flexibility
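Points 1 to 3 can be illustrated with a small software model. It is a sketch under stated assumptions, not the Mercury design: each drive is modeled as a list of records, `search_engine` is a hypothetical stand-in for the per-drive hardware, and Python threads only mimic the structure of the parallelism rather than its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def search_engine(drive_id, records, query):
    """Hypothetical per-drive engine: scan every record locally, return only hits."""
    return [(drive_id, i) for i, record in enumerate(records) if query in record]


def streaming_search(drives, query):
    """Run one engine per drive; the host receives match locations, not raw data."""
    with ThreadPoolExecutor(max_workers=len(drives)) as pool:
        futures = [pool.submit(search_engine, drive_id, records, query)
                   for drive_id, records in enumerate(drives)]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return hits


# Two modeled drives, two records each (illustrative data only).
drives = [
    ["agent james bond", "agent mobile computer"],
    ["james madison movie", "james bond movie"],
]
hits = streaming_search(drives, "bond")
print(hits)
```

In this model the host handles two small tuples instead of four full records, which is the data-movement saving described in point 3.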
![Page 32: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/32.jpg)
Technical Status
• Prototype operational
• External to an ATA/100 drive; performance is currently disk-limited
– SCSI-based RAID system under development
• 3 applications functional
– Exact text search
– Approximate text search (agrep)
– Biosequence search (Smith-Waterman)
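For readers unfamiliar with Smith-Waterman, the following is a minimal software sketch of its local-alignment score using a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration; the slides do not state the parameters or the structure of the hardware version.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TACAG", "GGTACAGGG"))
print(smith_waterman_score("ACGT", "ACT"))
```

The work grows with the product of the two sequence lengths, so scanning a pattern against an entire genome database is costly in software.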
![Page 33: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/33.jpg)
Performance
Speedup relative to 1 GHz processor
| Application | Disk-limited speedup | Logic-limited speedup |
|---|---|---|
| Exact text search | 1.1 | 14 |
| Approx. text search | 12 | 31 |
| Biosequence search | 50 | 125 |
![Page 34: The Mercury System: Embedding Computation into Disk Drives](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813484550346895d9b632a/html5/thumbnails/34.jpg)
Summary
• Fast, inexpensive searches for large and changing databases
• Approximate searches supported
• Up to 100 times faster than standard database searches
• Performance is scalable and uses conventional disk drives
• Data Search Systems, Inc. is actively commercializing the technology