RSSI 2007 FPGA Acceleration For Production Use
RSSI 2007
FPGA Acceleration For
Production Use
Matthias Fouquet-Lapar
Principal Engineer
Multi-Paradigm Computing
Slide 2
Challenges for FPGAs in Production Use
• FPGAs remain (b)leading edge technology to the majority of customers
• High Level Language Tools make it easier to program FPGAs, but it remains a major effort :
– Flat Application profiles in many areas
– Majority of legacy applications written in Fortran
– Significant investment for a customer with uncertain result
• System Integration (remember we talk HPC, not PC !)
– Job Scheduling / Resource Allocation
– Accounting
– Scalability
– RAS
Slide 3
And HPC typically looks like this :
Slide 4
Challenges for FPGAs in Production Use
• Lots of different options :
– Continuing fast evolution of top-end micro-processors
– Multi-core & many-core micro-processors
– GPGPUs
– Cell
– Dedicated FP accelerators
• Many past performance claims of “orders of magnitude”
speedups really talked about one specific accelerated loop –
the overall benefit to the customer was probably < 10
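The gap between a loop-level speedup and the overall benefit follows from Amdahl's law. The sketch below is illustrative only: the 90% runtime fraction and the 100X loop speedup are hypothetical numbers, not measurements from this talk.

```python
def overall_speedup(accelerated_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only a fraction
    of the original runtime is accelerated by loop_speedup."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if loop_speedup <= 0.0:
        raise ValueError("loop_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / loop_speedup)


# Hypothetical profile: one loop takes 90% of the runtime and runs 100X faster.
print(f"{overall_speedup(0.90, 100.0):.2f}")  # 9.17
```

With 10% of the runtime left untouched, the overall speedup stays below 10X no matter how fast the accelerated loop becomes.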
Slide 5
Challenges for FPGAs in Production Use
• The (in)famous “X” factor
– Any implementation has to compare itself to leading edge micro-processors,
not to processor generations that are years old
– This has had (and continues to have) a very negative impact on the entire
FPGA community
• Lack of Standardization
– SGI recognizes the enormous value Intel’s Quick Assist technology is
bringing to the table
– We are fully engaged in sharing years of experience developing accelerated
solutions as well as 25+ years of experience in HPC
Slide 6
Driving scalability in HPC
• In-memory databases become more important for accelerated
computing
– Largest Memory Configuration installed to date : 40 TeraBytes
• Continuing FPGA scaling
– Scaling of a 4 FPGA BLAST-N Benchmark to 8 FPGAs showed a
linear speedup
– Largest FPGA Configuration tested to date included 30 FPGAs
– Next milestone : 128 FPGAs in Single System Image
Slide 7
Additional Support of X86 based ICE
• Strategic Relationship with Intel working on Quick Assist
• Implementation of an FSB solution for new Application
Spaces
• Focus remains on delivering application and solutions to
customers using 25+ years of in-depth application
experience in HPC
Slide 8
SGI’s Application Focus
• Government
– Classified
• Life Sciences
– Bio-Informatics
– Genomics
– Chem Informatics
• Data Management
– Encryption
– Content Analysis, Search and Filtering
– Image Processing
Slide 9
SGI and Life Sciences Application Focus
• Bio-Informatics and Genomic research have an exponential
growth rate
Slide 10
SGI’s history in life science
• Over 20 years experience providing high performance
solutions for life sciences applications with a dedicated
worldwide team of experts in bioinformatics and
computational chemistry
• Other life-science partnerships include
– Gaussian
– Schrodinger
– SCM
– Several open source applications
Slide 11
Partnership : Creating a Life Science
Appliance
• SGI and Mitrionics share a common vision :
Provide turn-key solutions and appliances to facilitate
easier adoption of accelerator technologies for
academic and industrial partners
• Focus on application areas which have dramatically
increasing computing demands and which are
addressable with today’s FPGA technology
Slide 12
Our Vision
• Open Systems Scalable Infrastructure
• Open Source
• Create a Bio-Informatics Community adding to the existing application
stack :
– IBLAST (Interactive BLAST)
– Smith-Waterman
– Clustal W
– Needleman-Wunsch
• Industrial quality grade of the implementation
• Workshops to facilitate and enable development
• “Full Care Support Option” for customers who want to run out of the box
Slide 13
Creating a BLAST-N solution
• From a technical perspective these applications are a good fit for FPGAs
– Integer (actual character/bit sized operands)
– High degree of parallelism
– “Hot spots” in the application profile
– Earlier black box implementations (using ASICs or FPGAs) from companies
such as Paracel or Time-Logic have shown the potential, but
• Limited acceptance by customers since there was no way of changing the black-box (taking
the “Programmable” out of FPGAs)
• Limited I/O Bandwidth, Limited system Integration
• Expensive
• too specialized
• non-general purpose
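To illustrate why bit-sized integer operands suit this kind of hardware, here is a minimal sketch of packing nucleotides into 2 bits each and comparing sequences with XOR. The encoding table and function names are assumptions for illustration; the talk does not describe the encoding used in the RASC implementation.

```python
# Hypothetical 2-bit encoding of the four nucleotides.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> int:
    """Pack a nucleotide string into an integer, 2 bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover the nucleotide string from its packed form."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


def matching_positions(a: int, b: int, length: int) -> int:
    """Count positions where two packed sequences of equal length agree.
    XOR leaves a zero 2-bit field wherever the bases are identical."""
    diff = a ^ b
    return sum(1 for i in range(length) if (diff >> (2 * i)) & 3 == 0)


print(pack("ACGT"))                                  # 27
print(matching_positions(pack("ACGT"), pack("ACGA"), 4))  # 3
```

Every position can be compared independently, which is the kind of narrow, highly parallel integer work the bullets above describe.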
Slide 14
Design goals for BLAST-N : Easy Integration into
existing customer workflows
• NCBI BLAST is the de-facto standard for the majority of
customers
• A RASC appliance should be able to plug into existing
workflows
– Consistent results with NCBI BLAST
– Coherent parameter set with NCBI BLAST
– Option to run either the standard CPU version or the FPGA accelerated
version
– Automatic fallback to CPU versions if all FPGAs are busy
– Work “out of the box”
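The automatic fallback goal can be sketched as a non-blocking attempt to reserve an FPGA, with the CPU path taken when none is free. This is a sketch under assumptions: `FpgaPool`, `run_blastn`, and the two search callables are hypothetical names, not part of the RASC software or NCBI BLAST.

```python
import threading


class FpgaPool:
    """Hypothetical tracker of free FPGA slots (not a RASC API)."""

    def __init__(self, n_fpgas: int):
        self._free = threading.Semaphore(n_fpgas)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False at once if every FPGA is busy.
        return self._free.acquire(blocking=False)

    def release(self) -> None:
        self._free.release()


def run_blastn(query, pool, fpga_search, cpu_search):
    """Run the accelerated search if an FPGA is free, else fall back to CPU.
    Returns (backend_name, result)."""
    if pool.try_acquire():
        try:
            return "fpga", fpga_search(query)
        finally:
            pool.release()
    return "cpu", cpu_search(query)


# Usage with stand-in search functions.
pool = FpgaPool(n_fpgas=1)
backend, _ = run_blastn("ACGT", pool, lambda q: q, lambda q: q)
print(backend)  # fpga
```

Because both paths return the same kind of result, the caller's workflow does not need to know which backend ran, in line with the consistent-results goal above.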
Slide 15
… but a customer expects more than a simple
replacement
• Order of magnitude of speedup compared to current
implementations
• Scalable solution – the system can grow with steadily
increasing demands
• Not a black-box solution :
– Open System Architecture
– Open Source Software
• In addition :
– Very significant savings in terms of infrastructure (power, cooling) :
green computing
– Reduction of foot-print (machine room size)
Slide 16
So what did we achieve with BLAST 1.0 ?
released on 15-Jun-2007 on sourceforge.net
• Test case :
– 500bp query AB000401 (Mycoplasma capricolum rpmH and dnaA genes,
partial cds) from the EMBL database
– Database set includes the NCBI BLAST benchmark suite’s nucleotide
database (benchmark.nt), the Mouse EST database, and the
Nonredundant Nucleotide (NT) database
• Results :
– Using the NCBI BLAST benchmark suite for BLASTN a single 500 bp
query ran slower than the CPU implementation
– Merging 64 x 500bp queries speedup : 10X – 28X
– Merging 256 x 500bp queries speedup : 12X – 56X
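The results above suggest that the accelerator pays off once many queries are submitted together. The talk does not say how merging is implemented; the sketch below assumes it means grouping individual FASTA records into multi-FASTA batches, and `parse_fasta` and `merge_queries` are hypothetical helper names.

```python
from itertools import islice


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_queries(records, batch_size):
    """Group (header, sequence) records into multi-FASTA strings
    holding at most batch_size queries each."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield "".join(f">{h}\n{s}\n" for h, s in batch)


# 130 stand-in queries merged 64 at a time give batches of 64, 64 and 2.
records = [(f"q{i}", "ACGT") for i in range(130)]
print([b.count(">") for b in merge_queries(records, 64)])  # [64, 64, 2]
```

Each batch can then be handed to the search as a single input, so the per-run setup cost is shared by all queries in the batch.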
Slide 17
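Benchmark Results compared to 4 core
Opteron 8820 SE server
[Chart : Speedup and Throughput (queries/min), 4-core Opteron 8820 SE server vs. SGI RASC Appliance for Bioinformatics with 4 FPGAs]
• 6 Large Queries averaging 115,000+ bp from the Drosophila Genome against the
GenBank Mouse EST Database with 2.1B bp
– 4-core Opteron 8820 SE server : 0.74 queries/min
– SGI RASC Appliance for Bioinformatics with 4 FPGAs : 13.33 queries/min
• Production Run of 3,534 Short Queries (25bp) against a human genome
database with 4B bp
– 4-core Opteron 8820 SE server : 25 queries/min
– SGI RASC Appliance for Bioinformatics with 4 FPGAs : 1,490 queries/min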
Slide 18
Factoring in Green Computing
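[Chart : Results Relative to Opteron Server (Speedup), 4-core Opteron 8820 SE server vs. SGI RASC Appliance for Bioinformatics with 4 FPGAs]
• Performance (Queries/Min)
– SGI RASC Appliance for Bioinformatics with 4 FPGAs : 1,490
• Power (Watts)
– 4-core Opteron 8820 SE server : 726
– SGI RASC Appliance for Bioinformatics with 4 FPGAs : 696.25
• Queries/KWHr
– 4-core Opteron 8820 SE server : 2,049
– SGI RASC Appliance for Bioinformatics with 4 FPGAs : 128,501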
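The Queries/KWHr metric can be checked with simple arithmetic. The sketch below assumes the metric is hourly throughput divided by power draw in kilowatts, and that the 726 W and 2,049 queries/KWHr figures on this slide both belong to the Opteron server; the function names are hypothetical.

```python
def queries_per_kwh(queries_per_min: float, watts: float) -> float:
    """Queries completed per kilowatt-hour of energy consumed."""
    return queries_per_min * 60.0 / (watts / 1000.0)


def implied_queries_per_min(q_per_kwh: float, watts: float) -> float:
    """Invert the metric: the throughput implied by an efficiency figure."""
    return q_per_kwh * (watts / 1000.0) / 60.0


# Assumed pairing: 2,049 queries/KWHr at 726 W for the Opteron server.
print(round(implied_queries_per_min(2049, 726), 2))  # 24.79

# Ratio of the two Queries/KWHr figures shown on the slide.
print(round(128501 / 2049, 2))  # 62.71
```

Under these assumptions the implied Opteron throughput is about 25 queries/min, and the efficiency ratio of roughly 63 falls within the 59 to 63 range of the chart axis.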
Slide 19
Running multiple instances
• Executing multiple instances, very good scalability
Slide 20
Results
• Wall-Clock throughput improvements for real test cases between 10X – 60X compared to Opteron 2.8 GHz
• Results are consistent with NCBI’s BLAST-N implementation (< 0.3 error rate)
• Power Consumption per query 90% - 95% less than CPU implementation
• Clear Price / Performance advantage over top-end quad-core micro-processor implementations (leaving out power & cooling cost)
• Attractive complete bundle including Hardware, System Software and the BLAST-N application for less than $40K
Slide 21
No assembly required
- and you don’t need batteries
SGI RASC Appliance for Bioinformatics
8.25” high chassis for standard 19” racks
Slide 22
Who is using this ?
• Customers and Beta-Test Installations
– National Cancer Institute / US
– Chinese National Human Genome Center / Shanghai China
– Merck / Germany
– Universite de Laval / Canada
– University of Arizona / US
Thank You