Elsip
Adam Edström CEO
Bengt Edlund Sales Director
December 2012
© Elsip 2012. Elsip non-confidential
Needs and Solutions
Our Sweet Spot
Software Defined Data Management
for Many-core SoC Designs
Why we're needed
Many-core
Reconfigurability
Complexity
Many-core instead of MHz
•Clock frequencies don’t rise anymore.
Figure shows the clock frequencies of processors presented at ISSCC between 1993 and 2011. After a long period of steady increase, the top frequency has leveled off since 2005 at 2-3 GHz. Source: ISSCC, 2011.
Entering the Many-core Era
More parallelism is the only way to higher performance.
•Sequential programs limit multithreading to ~10 instructions per cycle.
•Higher degrees of parallelism have to be extracted at the process and application levels.
=> Hundreds of cores in a few years!
Microprocessor Challenges
Figure: microprocessor performance, 1980-2020 (log scale, 1 to 100,000,000). Growth phases: early days ~20%/year; single-core "tornado" ~50%/year; multi-core ~20%/year; saturation ~10%/year; many-core ~25%/year, with a further 10-100x improvement expected from 3D SoC memory, for more than 100x in total. Challenges shift from serial to parallel: memory bandwidth, programming, scaling. Source: Makimoto, extended.
Reconfigurability
Products are increasingly defined as flexible platforms. Standardization is pushed by the fact that future products will include more embedded processing, more communication and more interconnect.
=> Heterogeneous IC architectures, with flexible reconfigurable processing cores, and interface components configurable for standardized communication and interaction protocols.
Design Complexity
Figure: computational energy efficiency (operations/sec/Joule) versus architecture complexity, from low to high: standard single-core CPU with cache and main memory; multi-core with distributed shared cache and common main memory; homogeneous many-core with distributed shared cache and main memory; heterogeneous many-core with distributed shared cache and main memory; 3D stacked die. Elsip's target market is the high-complexity, high-efficiency end of this range.
Memory architecture matters
Beyond a certain level of parallelization, any gain in computation time is offset by the overhead of memory access and synchronization. For the matrix and FFT operations this means that performance in a 64-node central-memory architecture is in fact lower than on 16 nodes. The performance advantage of distributed shared memory (DSM) increases with the number of cores.
Performance of multi-core architectures with centralized and distributed memory organization. Both use a cache, so the observed difference is only due to the delay in accessing uncached data. Source: Elsip.
Distributed memory needs
A distributed memory architecture needs a data management mechanism supporting:
•Distributed memory access
•Flexible private/shared memory space management
•Synchronization for memory consistency
•Virtual address space management
•Scalability
•Flexibility
•Transaction ordering (memory consistency)
•Data movement (DMA) functions
•Message passing
•Cache coherence
Elsip's DME – Data Management Engine – is a microprogrammable IP block for on-chip data management. Microprograms in the DME realize the different data management functions. The microprograms can also be downloaded dynamically, giving applications flexibility to adapt the DME to specific needs.
For higher-performance and/or power-critical applications, the DME can be hard-coded (replaced by a state machine).
=> The DME is a software defined MPSoC data management IP block
Introducing DME
Applications
The DME is useful for many-core SoCs in:
•Video, signal and network processing
•Cloud computing
•Industrial automation
•Set-top boxes
•Scientific computing
•Solid state disks
•High-end personal mobile devices
•Other high-end embedded applications
Video and data packet processors are drivers for faster memory access today:
•Graphics
•Mobile video
•Network processors
•FPGA
(David McCann, Senior Director, GF)
SSD
Memory
Portable
The DME provides
•Programmability => the DME can be optimized for any particular application, lowering design risk and allowing late design changes without need for a re-spin.
•Customization => different hardware versions can be generated for different platform instances.
•Dynamic programmability => facilitates use of customized functions in different parts or phases of an application.
•Efficiency => speed and power on par with custom hardware.
•Separation => offloads the computing cores, giving a higher degree of parallelism.
The DME complies with several standard interfaces, e.g. AHB, APB and AXI, with configurable data bus widths.
DME Features
Note: Perceived value is based on early customer input, and is application dependent.
DME Products
The DME architecture
Application example: SSD Node
•Interface: PCIe
•CPU for flash write-read-remove scheduling and buffer management
•Power budget: 13 W for the SSD board, 5 W for the MCU
Application example: SSD Node
Where does the DME fit?
The CPU needs complex functionality and perhaps an OS; the DME is not a good candidate to replace the CPU.
Depending on the precise functionality, the DME could be optimized for buffer management.
The DME could implement the FTL (Flash Translation Layer)
Application example 2: SSD Array Design
Star-ring topology instead of tree: from the rack perspective it is a star topology, while intra-cluster and inter-cluster nodes are rings.
Application example 2: SSD Array Design
DME + Elsip in-house switch can be optimized for managing large SSD arrays.
(Diagram: an array of DME + switch nodes.)
Evaluating the DME
For evaluation of the DME, Elsip offers:
•Introduction booklet
•DME Application Development Package, with API libraries
•C++ model
•Compiled IP model
•User manual
•Demonstrator
•On-site and off-site support
The founders
Axel Jantsch, CTO. Professor, KTH Electronic Systems, since 2002. 20+ years of research, primarily within NoC and SoC. 200+ scientific papers published. Visiting professor at Fudan University, PRC, and the University of Cantabria, Spain.
Ahmed Hemani, Professor, KTH, focusing on high-level system integration, design automation, NoC, asynchronous circuits and configurable systems. Industrial experience from NSC, NXP/Philips, ABB, Ericsson, Newlogic, Synthesia and Spirea (co-founder).
Zhonghai Lu, Professor, KTH, expert in SoC and NoC. Reviewer for 14 international journals. Principal investigator on an Intel project concerning future many-core processor chip frameworks.
Management Team
Adam Edström, CEO. 20+ years as editor and editor-in-chief of Elektroniktidningen, Sweden's major electronics publication. Visiting editor at Fortune Magazine in NYC. VP International affairs at SICS, Swedish Institute of Computer Science. Founded three companies prior to Elsip.
Bengt Edlund, Sales Director. 30+ years of semiconductor sales, marketing and new technology business development at National Semiconductor and Hewlett Packard. Served as European director of business development, marketing and global sales.
Some ELSIP Milestones
•Founded by professors Axel Jantsch, Ahmed Hemani and Zhonghai Lu at the Royal Institute of Technology (KTH) in Stockholm, 2011
•Received initial funding from Vinnova
•Commercial launch when Adam Edström (CEO) and Bengt Edlund (Sales Director) joined the company, September 2012
•Established subsidiary Memcom in Shanghai, PRC, March 2012, with Zhonghai Lu as CTO and Zhuo Zou as CEO; received initial funding from the Wuxi government
•Cooperation with the Fudan-Wuxi Institute, Shanghai, PRC
•Selected by SICS, the Swedish Institute of Computer Science, as a member of the SICS Startup Accelerator
Roadmap
Looking into the future, other IP we are working on includes:
Circuit-switched NoC (faster than today’s NoC for telecom/datacom applications)
CGRA - Coarse-Grained Reconfigurable Architecture (reconfigurable at the bus level, better silicon utilization than FPGA)
Contact
Sales Director Bengt Edlund
Mail: [email protected]
Phone: +46 708 722 800

CEO Adam Edström
Mail: [email protected]
Phone: +46 702 579 734

Address: c/o SICS, PO Box 1263, SE-164 29 Kista, Sweden
Thank you!