Architecture of Digital Integrated Systems -...
![Page 1: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/1.jpg)
Architecture of Digital Integrated Systems
Course Presentation
Davide Bertozzi, University of Ferrara
![Page 2: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/2.jpg)
Most of the course material will be in English
![Page 3: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/3.jpg)
Course Information
Instructor: Davide Bertozzi
Assistant Professor
Email: [email protected]
Phone: +390532974832
Teaching assistant, responsible for laboratory experiences: Meriem Turki
PhD student
Email: [email protected]
Office: no. 338 (third floor, Engineering Department)
![Page 4: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/4.jpg)
Course schedule
MONDAY 16.30-19.00 - Room 20
WEDNESDAY 16.30-19.00 - Room 9 OR Informatics Lab (Small)
All lab experiences will be taught in English
Office hours: by appointment (email reservation, or after lectures)
![Page 5: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/5.jpg)
The Exam
Roughly one third of the course will be in the lab (or related to it)
Expertise in the (C++-derived) SystemC hardware description language (HDL)
Exam split into 2 parts:
- Oral exam (25 points): 3 questions
- Course project (5 points): hands-on final project assignment showing off SystemC programming skills
Exams are by appointment, and requests should be emailed to me at least one week in advance
![Page 6: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/6.jpg)
Course material
Course website: http://mpsoc.unife.it/~arch-dig/
- Slides (at least 1 hour before lessons)
- News, course information
No single course book is available, since the topic of this course is fast evolving: it is at the frontier of research
Specific book chapters, papers, ... will be suggested on a topic-by-topic basis
Attending the lectures and taking notes is the best way to enjoy the course!
![Page 7: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/7.jpg)
Useful books
1. T. Groetker, S. Liao, G. Martin, S. Swan; System Design with SystemC, Kluwer Academic Publishers, 2002. Topic: SystemC hardware description language
2. J. Flich, D. Bertozzi; Designing Network-on-Chip Architectures in the Nanoscale Era, CRC Press, 2011. Topic: Networks-on-Chip
3. William James Dally, Brian Patrick Towles; Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004. Topic: interconnection networks
4. J.M. Rabaey, A. Chandrakasan, B. Nikolic; Digital Integrated Circuits - A Design Perspective (second edition), Prentice Hall. Topic: design methodologies; timing issues in digital circuit design
5. David A. Patterson, John L. Hennessy; Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2004. Topic: microprocessor architecture
6. D. Culler, P. Singh, A. Gupta; Parallel Computer Architecture: a Hardware/Software Approach, Morgan Kaufmann, 2004. Topic: design issues of multicore processors
![Page 8: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/8.jpg)
Lab Schedule
Indicative dates for lab experiences
- March 21st from 16:30 to 18:30
- March 28th from 16:30 to 18:30
- April 11th from 16:30 to 18:30
- April 18th from 16:30 to 18:30
- May 9th from 16:30 to 18:30
- May 16th from 16:30 to 18:30
- May 23rd from 16:30 to 18:30 (Final Project Assignment)
These dates may change based on my work commitments. They should be considered as indicative.
![Page 9: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/9.jpg)
The course at a glance
- Architecture
- Technology
- Synthesis flow
- SystemC Hardware Description Language (HDL)
- Technology-aware design
![Page 10: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/10.jpg)
Why take this course?
Breaking the abstraction layers and knowing what is underneath enables you to solve problems and design better future systems
Cooperation between multiple components and layers can enable more effective solutions and systems
[Figure: the system stack. Vertical integration spans the abstraction layers: Application and Libraries, Language Runtime, Operating System, Hypervisor, Netlist of logic gates, Circuits, Layout, Transistors. Horizontal integration spans the hardware components: Microprocessor core, Accelerators, Bus, Memory, I/O, Off-chip memory.]
![Page 11: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/11.jpg)
I am ready!
When I push a button on the touchscreen, the smartphone recovers from sleep mode and starts working
Battery duration (iPhone 5s)

| Operation | Battery duration |
|---|---|
| Standby | 250 hrs |
| 3G talk time | 10 hrs |
| 3G browsing | 8 hrs |
| LTE browsing | 10 hrs |
| Wi-Fi browsing | 10 hrs |
| Video | 10 hrs |
| Music | 40 hrs |

Different kinds of workloads stress the smartphone to different extents
This has direct implications for battery duration
Let us start from our daily experience
Who does the actual computing inside the smartphone?
![Page 12: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/12.jpg)
Electronic board
Opening the smartphone
![Page 13: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/13.jpg)
Touch Screen Controller
Antennas and controllers
Electronic Board – Face up
![Page 14: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/14.jpg)
Application Processor
Non-Volatile Memory
Gyroscope & Accelerometer
LTE/GSM Modem
NFC Controller
Image Processor
Electronic Board – Face down
Application Processor
The «brain» of the smartphone is its «Application Processor»: Snapdragon (Qualcomm), Exynos (Samsung), Helio (MediaTek), OMAP (Texas Instruments), Kirin (HiSilicon), Tegra (Nvidia), Ax (Apple), ...
![Page 15: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/15.jpg)
What is a microprocessor?
Programmer: «Execute!»
Microprocessor: «Yes Sir!»
Programmer: «Send an email!»
Microprocessor: «Yes Sir!»
Microprocessors are not able to understand and process such «abstract» commands!
From now on, the terms «application processor» and «microprocessor» will be used interchangeably, although other kinds of microprocessors do exist (e.g., for power management, wireless control, display control, etc.).
![Page 16: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/16.jpg)
Machine Language
Talking to a microprocessor goes through «electrical signals»
The basic thing a microprocessor does is to detect the following conditions:
- Presence of signal (symbol «1»)
- Lack of signal (symbol «0»)
Fundamentals of digital processing: binary numbers are used to communicate both instructions and data to a microprocessor
Microprocessors speak a language whose alphabet consists of 2 letters, «0» and «1» (the Italian language has 21 letters). As a result, machine language consists of binary numbers:
Programmer to microprocessor: 1000110010100000!
With this language, what kind of orders can I give to a microprocessor?
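To make the idea of a binary instruction concrete, the sketch below splits the bit string from the slide into fields. The slide does not say which instruction set the string belongs to; purely as an assumption for illustration, it is treated here as the upper 16 bits of a 32-bit MIPS-style word (6-bit opcode, then two 5-bit register numbers). The function name `decode_upper_half` is hypothetical.

```python
def decode_upper_half(bits):
    """Split 16 bits into MIPS-style fields: opcode (6), rs (5), rt (5).

    Assumption: the string is the upper half of a 32-bit MIPS-like word.
    The slide itself does not name an instruction set.
    """
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 16 binary digits")
    return {
        "opcode": int(bits[0:6], 2),   # which operation to perform
        "rs": int(bits[6:11], 2),      # first register number
        "rt": int(bits[11:16], 2),     # second register number
    }


fields = decode_upper_half("1000110010100000")
print(fields)
```

Under that assumption the opcode field is 35, which in MIPS is the «load word» instruction: even a memory read is just a pattern of zeros and ones.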
![Page 17: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/17.jpg)
A microprocessor can only execute simple «low-level» arithmetic-logic instructions
- E.g., sum/subtract/multiply/divide two integers
- E.g., carry out the logic AND/OR/EXOR of two bits or sets of bits
Applications consist of «complex» (or abstract) operations, which take for granted the human mind's capability to think/abstract/plan/structure:
- Start a phone call!
- Play back a video!
- Send an email!
Abstractions hide details, but make it possible to cope with problem complexity
ABSTRACTION GAP: several intermediate HW/SW layers are needed to interpret and translate high-level operations into the basic operations a microprocessor can do.
Moving to a «lower abstraction layer» increases the informative content
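As a small illustration of crossing the abstraction gap, the sketch below lowers one high-level statement (`total = x + y + z`) into a sequence of primitive load/add/store instructions and runs them on a toy register machine. The machine, its three instructions, and the register names are hypothetical, invented only for this example.

```python
def run(program, memory):
    """Execute primitive instructions on a toy register machine (hypothetical)."""
    registers = {}
    for instruction in program:
        op = instruction[0]
        if op == "load":          # load rd, address
            _, rd, address = instruction
            registers[rd] = memory[address]
        elif op == "add":         # add rd, ra, rb
            _, rd, ra, rb = instruction
            registers[rd] = registers[ra] + registers[rb]
        elif op == "store":       # store rs, address
            _, rs, address = instruction
            memory[address] = registers[rs]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# High-level operation: total = x + y + z
memory = {"x": 2, "y": 3, "z": 4, "total": 0}
program = [
    ("load", "r1", "x"),
    ("load", "r2", "y"),
    ("add", "r3", "r1", "r2"),
    ("load", "r4", "z"),
    ("add", "r3", "r3", "r4"),
    ("store", "r3", "total"),
]
result = run(program, memory)
print(result["total"])
```

One line at the high level becomes six instructions at the lower level, which is the sense in which a lower abstraction layer carries more information.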
![Page 18: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/18.jpg)
Application Processor: an Example
SAMSUNG EXYNOS 7420 OCTA-CORE
A microprocessor does NOT consist of just a single CORE (computation unit), but rather of a (more or less regular) network of cores.
ANALOGY
![Page 19: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/19.jpg)
Once upon a time (~50 years ago) it was not like that…
Microprocessors used to be «Monolithic processor cores»
Hardware capable of executing arithmetic and logic instructions
Options of this hardware:
- Processing speed (or clock frequency)
- Instruction throughput
- Instruction-level parallelism
- Out-of-order execution capability
- Branch prediction strategy
- Memory hierarchy and access speed
- Virtual memory
- ...
![Page 20: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/20.jpg)
Trend to integrate more and more functions (computation units, controllers, modems, memory macros, etc.) on the same silicon die, thus building up «Systems-on-Chip»:
- lower power, better performance, smaller size
Today, all application processors are «systems-on-chip»
What slows down this trend: Technology, Cost, Reliability
The long-term ideal asymptotic trend consists of the «smartphone-on-a-chip»
The «system-on-chip (SoC)» revolution
![Page 21: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/21.jpg)
«Integrated» Application Processors
Microprocessors in the «system-on-chip (SoC)» era
Memory
Peripheral units and I/O
Hardware capable of executing arithmetic and logic instructions
![Page 22: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/22.jpg)
SoC for wireless applications
- Computation unit from a third-party vendor (implementation made by system integrator: SOFT MACRO)
- Computation unit from a third-party vendor (layout defined by the vendor as well: HARD MACRO)
- Data Memory and Instruction Memory (from third-party vendor). Their layout comes from the vendor as well: HARD MACRO
A typical «SoC» consists of pre-designed and pre-verified blocks, which can be made in-house or bought from «third party» vendors (against the payment of royalties)
New terminology coming up:
- Platform-based design
- Design reuse
- System integration task
Systems-on-chip: a different way of designing systems... and of doing business!
![Page 23: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/23.jpg)
«ARM» PROCESSORS
You may have heard that «most smartphone processors» are ARM processors...
... but when talking about application processors, ARM was not mentioned: Qualcomm, MediaTek, Nvidia, HiSilicon, Apple, ...??!?!
Generic application processors with «ARM core inside».
ARM core
In turn, ARM processors are systems-on-chip….
Application processors are Systems-on-chip!
![Page 24: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/24.jpg)
Evolution: the Memory Hierarchy
1st level memory
2nd level memory
Peripheral units and I/O
Hardware capable of executing arithmetic and logic instructions
Microprocessors in the «system-on-chip (SoC)» era
![Page 25: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/25.jpg)
Memory: the Problem
We would all like knowledge to be accessible in a single book!
Since that is not possible, it follows that every time we need to select...
... the book containing the needed information at any given point in time!
![Page 26: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/26.jpg)
Working Set
• We may identify a set of books containing all the information that we normally need (except for specific cases!) = Working set.
• What are the «habits» of the microprocessor «reader», so that we can build the working set?
![Page 27: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/27.jpg)
The Problem
The microprocessor would like to have infinite memory with extremely fast access times... But this is not feasible in practice:
- Fast memories are small. Large memories are slow.
- The amount of memory that can fit on a chip is limited!
- Fast memories are also very expensive, so they have to be small.
![Page 28: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/28.jpg)
Key Insight
The microprocessor has a distinctive feature: it tends to reuse data and instructions that have been accessed recently, or that are close to the recently-accessed ones.
TEMPORAL LOCALITY
Recently-accessed elements are likely to be accessed again soon
SPATIAL LOCALITY
When accessing an element, the elements nearby are likely to be accessed soon
![Page 29: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/29.jpg)
Example
Spatial locality: access to the elements A[i] in sequence
Temporal locality: at each iteration, the «sum» instruction is used
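The loop this slide refers to is not preserved in the transcript. A minimal sketch of such a loop, assuming an array `A` and an accumulator `total` (both names are assumptions), could look as follows. The `touched` list is added only to make the access pattern visible.

```python
def sum_array(A):
    """Sum the elements of A and record the order in which they are read."""
    total = 0
    touched = []                 # indices of A in access order (for illustration)
    for i in range(len(A)):
        total = total + A[i]     # same «sum» instruction at every iteration: temporal locality
        touched.append(i)        # consecutive indices 0, 1, 2, ...: spatial locality
    return total, touched


total, touched = sum_array([5, 1, 4, 2])
print(total, touched)
```

The indices are visited one after the other, so neighbouring memory locations are used in sequence, while the loop body (and the `total` variable) is reused at every iteration.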
How to exploit this?
![Page 30: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/30.jpg)
Memory Hierarchy
The approach is to store in a small and fast memory «close to» the processor (i.e., a cache) the data/instructions that I am currently accessing, in addition to the «nearby» ones
Memory close to the processor: low latency, high throughput, small size, high cost
By implementing the memory system as a «hierarchy», the microprocessor is given the illusion of having a memory as large as the last-level one and as fast as the first-level one
Working set
Whether the working set is «good» or not depends on the number of MISSes in the first-level memory
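One common textbook way to quantify the effect of misses is the average memory access time (AMAT): hit time plus miss rate times miss penalty. The metric is not named on the slide, and the latencies used below are illustrative assumptions, not measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only (in clock cycles): 1-cycle first-level hit,
# 100-cycle penalty when the data must come from the next level.
good_working_set = amat(hit_time=1, miss_rate=0.02, miss_penalty=100)
poor_working_set = amat(hit_time=1, miss_rate=0.20, miss_penalty=100)
print(good_working_set, poor_working_set)
```

With these assumed numbers, raising the first-level miss rate from 2% to 20% makes the average access several times slower, which is why the quality of the working set matters so much.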
![Page 31: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/31.jpg)
How does it work? Hit and Miss
I am searching for a data item/instruction. Do you have it?
- YES (HIT!)
- NO (MISS!): transfer from the lower level
Registers
L1 Cache
L2 Cache
The philosophy is as follows: fast access to data/instructions that are most commonly used or which can be foreseen to be accessed in the near future. For the exceptions, a temporary performance slowdown has to be accounted for.
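The hit/miss mechanism can be sketched with a very small cache model. The direct-mapped organization, the cache size, and the class name `DirectMappedCache` are assumptions chosen for simplicity; the slide does not specify how the cache is organized.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses (hypothetical model)."""

    def __init__(self, num_lines=4, block_size=4):
        self.num_lines = num_lines
        self.block_size = block_size
        self.blocks = [None] * num_lines   # block number currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.block_size     # which memory block holds the address
        index = block % self.num_lines         # the single line that block may occupy
        if self.blocks[index] == block:
            self.hits += 1                     # HIT: the block is already in the cache
            return True
        self.blocks[index] = block             # MISS: transfer the block from the lower level
        self.misses += 1
        return False


cache = DirectMappedCache()
for _ in range(2):                 # walk the same 16 addresses twice
    for address in range(16):
        cache.access(address)
print(cache.hits, cache.misses)
```

In this run only the first access to each block misses: the following accesses to the same block hit thanks to spatial locality, and the whole second pass hits thanks to temporal locality.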
![Page 33: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/33.jpg)
Trend: increasing chip size!
Growing to Super-cores!
1st-level memory
2nd-level memory
Peripheral unit and/or I/O port
Chip Size
Goal: meet the growing user expectations for advanced software services
![Page 34: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/34.jpg)
Trend: increasing chip size!
[Figure: as chip size grows, a next-generation processor core is surrounded by 1st-, 2nd- and 3rd-level memories and by several peripheral units and I/O.]
- Integration of tens or hundreds of «cores»
- More memory levels, or same levels with more memory
- Higher performance, lower cost, etc.
![Page 35: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/35.jpg)
Opposite trend: chip size reduction under the pressure of technology scaling
[Figure: the same super-core (next-generation processor core, 1st- to 3rd-level memories, peripheral units and I/O) shrinking in chip size across the 90 nm, 65 nm and 45 nm process nodes.]
Today we are heading below 14 nm process nodes!
Below 10 nm, fundamental physical issues come to the forefront
![Page 36: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/36.jpg)
Trend stabilization: constant chip size
Microprocessor area has today stabilized (rough numbers provided):
- 140 mm² for desktop computers
- 260 mm² for high-performance computing (e.g., scientific computing)
- 70-100 mm² for “embedded” microprocessors
Area split into LOGIC, MEMORY AND INTEGRATION OVERHEAD
There are limiting factors for chip area:
- power consumption
- manufacturing cost
- chip-wide transmission delay
- design cost
![Page 37: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/37.jpg)
Then, around 2000, an epoch-making paradigm shift occurred…
![Page 38: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/38.jpg)
SOMETHING WENT WRONG!
Designers realized that at each new generation of microprocessors, the cost to achieve a predefined performance increase skyrockets (if the increase is achievable at all)
Pollack’s rule: at a given feature size (process node), a new microprocessor generation takes 2-3x the area of the old one, while the performance speedup is only 1.4-1.6x
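Pollack's rule is often summarized as single-core performance growing roughly with the square root of the core area. The slide gives only the numeric ranges, so the square-root form below is the commonly quoted formulation, used here as an assumption; the function name `pollack_speedup` is hypothetical.

```python
import math


def pollack_speedup(area_ratio):
    """Estimated single-core speedup for a core that is `area_ratio` times larger.

    Assumes the square-root form of Pollack's rule (not stated on the slide).
    """
    return math.sqrt(area_ratio)


for area_ratio in (2, 3, 4):
    print(f"{area_ratio}x area -> about {pollack_speedup(area_ratio):.2f}x performance")
```

Under this assumption, doubling the area buys roughly the 1.4x quoted on the slide, and the return keeps diminishing as the core grows.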
![Page 39: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/39.jpg)
DECREASING MARGINAL UTILIZATION
![Page 40: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/40.jpg)
WHAT EXACTLY WENT WRONG?
1. Few applications can expose more than 2 parallel instructions/cycle
   - The main limiter of the instruction throughput is the presence of dependencies in the instruction flow, which microarchitecture/compiler designers cannot completely get rid of
2. Sometimes, although instruction parallelism is there, the compiler and/or the hardware are not able to extract it
   - E.g., potentially parallel instructions that are thousands of cycles apart
3. Memory access latencies limit the utilization rate of the processor
   - Memory latency cannot be completely hidden
4. Beyond a 150 W power envelope, cooling the chip is no longer economically convenient
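The first point above, dependencies limiting instruction throughput, can be illustrated with a small sketch. It estimates the best achievable instructions per cycle of a short instruction sequence as (number of instructions) / (length of the longest dependency chain), assuming unit-latency instructions, unlimited hardware, and each result name written only once. The function name `ideal_ilp` and the instruction encoding are assumptions for this example.

```python
def ideal_ilp(instructions):
    """Upper bound on instructions per cycle imposed by data dependencies.

    `instructions` is a list of (destination, sources) pairs in program order.
    Assumes unit latency, unlimited execution units, and single assignment.
    """
    depth = {}       # earliest cycle in which each result is available
    longest = 0
    for destination, sources in instructions:
        # An instruction can start only after all of its inputs are ready.
        ready = 1 + max((depth.get(s, 0) for s in sources), default=0)
        depth[destination] = ready
        longest = max(longest, ready)
    return len(instructions) / longest


chain = [("a", ["x", "y"]), ("b", ["a", "z"]), ("c", ["b", "w"])]        # each needs the previous result
independent = [("a", ["x", "y"]), ("b", ["z", "w"]), ("c", ["u", "v"])]  # no dependencies
mixed = [("a", ["x", "y"]), ("b", ["z", "w"]), ("c", ["a", "b"])]        # c waits for a and b

print(ideal_ilp(chain), ideal_ilp(independent), ideal_ilp(mixed))
```

Even with unlimited hardware, the dependent chain cannot exceed one instruction per cycle, which is why wider cores show diminishing returns on typical code.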
![Page 41: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/41.jpg)
5. Beyond a given clock frequency, there are concerns:
   - Generating a well-formed clock pulse becomes challenging
   - Within the clock cycle, there is an inactive time that does not scale
6. Although the processor is fast in processing data, data communication is overly slow and costly!
   - The communication bottleneck becomes more severe as technology scales down!
Equivalent to:
WHAT EXACTLY WENT WRONG?
![Page 42: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/42.jpg)
[Figure: a large core with 1st-level and 2nd-level memory, compared with four small cores C1-C4. Relative to the large core, each small core has Power = 1/4 and Performance = 1/2.]
Multi-core: power efficient, plus better power and thermal management
A NEW ERA: MULTI-CORE COMPUTING
Computation parallelism represents a more efficient and scalable way of delivering computing performance and power management!
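The numbers in the figure can be turned into a quick back-of-the-envelope calculation. The sketch below normalizes the large core to power 1 and performance 1, takes the small-core figures from the slide (power 1/4, performance 1/2), and adds a `parallel_fraction` parameter in the style of Amdahl's law. That parameter, and the pessimistic choice of counting all cores as powered all the time, are assumptions that are not on the slide.

```python
def multicore_vs_large_core(n_cores=4, core_power=0.25, core_perf=0.5,
                            parallel_fraction=1.0):
    """Return (performance, power) of n small cores, relative to one large core.

    The work takes 1 time unit on the large core. The serial part runs on one
    small core, the parallel part is split evenly across all small cores.
    """
    serial_time = (1.0 - parallel_fraction) / core_perf
    parallel_time = parallel_fraction / (core_perf * n_cores)
    performance = 1.0 / (serial_time + parallel_time)
    power = n_cores * core_power          # all cores assumed powered all the time
    return performance, power


ideal = multicore_vs_large_core()                           # perfectly parallel work
half_serial = multicore_vs_large_core(parallel_fraction=0.5)
print(ideal, half_serial)
```

For perfectly parallel work, four small cores deliver twice the performance of the large core in the same power budget. When half of the work is serial, the same four cores end up slower than the large core, which is why the match between hardware and software parallelism matters.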
![Page 44: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/44.jpg)
Multi-core computing
[Figure: number of cores per chip (1 to 512, doubling at each tick) versus year (1970 to 20??). Single-core processors: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon. Multi-core chips: Raw, Power4, Opteron, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Opteron 4P, Xeon MP, Ambric AM2045. Inset: the large core versus four small cores comparison (small core: Power = 1/4, Performance = 1/2; multi-core: power efficient, better power and thermal management).]
![Page 45: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/45.jpg)
Multi-Core Architectures
Microprocessors in the era of parallelism
[Figure: replicated hardware capable of executing arithmetic and logic instructions (eight cores, each with its own 1st-level memory), a shared L2 memory, a shared L3 memory, and several peripheral units and I/O.]
![Page 46: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/46.jpg)
Power Management
The processor is split into voltage and frequency islands, OR per-core activation is used
[Figure: multi-core processor with per-core 1st-level memories, a 3rd-level memory, and peripheral units and/or I/O ports; two of the cores are switched OFF.]
Each core (or «cluster» of cores) can be operated at different voltage and frequency settings, or selectively powered off
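Why voltage and frequency settings matter can be seen from the usual first-order model of dynamic power in CMOS logic, P = C·V²·f. The slide does not give this formula; it is the standard approximation, and it ignores leakage power. The function names below are hypothetical, and all values are normalized rather than taken from a real chip.

```python
def dynamic_power(c_eff, voltage, frequency):
    """First-order CMOS dynamic power: effective capacitance * V^2 * f."""
    return c_eff * voltage ** 2 * frequency


def relative_power(voltage, frequency, ref_voltage=1.0, ref_frequency=1.0):
    """Dynamic power of an operating point relative to a reference point."""
    return (dynamic_power(1.0, voltage, frequency)
            / dynamic_power(1.0, ref_voltage, ref_frequency))


# Normalized example: run an island at half the voltage and half the frequency.
print(relative_power(0.5, 0.5))
# A powered-off island draws no dynamic power in this model.
print(relative_power(0.0, 0.0))
```

Because the voltage enters squared, lowering voltage together with frequency saves far more power than lowering frequency alone, which is the motivation for per-island settings.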
![Page 47: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/47.jpg)
Intel Single-Chip Cloud Computer (SCC)
- 48 cores structured into 2-core clusters
- 24 frequency islands
- 8 voltage islands
- 15 speed settings from 100 to 800 MHz
- 7 voltage levels from 0.7 V to 1.3 V in steps of 0.1 V
Case Study: an Industrial Research Prototype
Power Management
![Page 48: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/48.jpg)
From Multicores to Manycores
There are applications where the hardware parallelism is perfectly matched to the software parallelism (this is not always the case!!)
- E.g., graphics
Single instruction («sum with 6») applied to multiple data (SIMD: Single Instruction Multiple Data)
Implementation
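The «sum with 6» example can be mimicked in plain Python. The sketch below is only a model of the idea: one «vector instruction» is counted for every group of `lanes` elements that are processed together. The lane width of 4 and the function name `simd_add` are assumptions, and real SIMD hardware performs the per-lane additions in parallel rather than in a loop.

```python
def simd_add(data, constant, lanes=4):
    """Add `constant` to every element, `lanes` elements per vector instruction.

    Returns the results and the number of vector instructions issued.
    """
    result = []
    instructions = 0
    for start in range(0, len(data), lanes):
        chunk = data[start:start + lanes]           # elements handled by one instruction
        result.extend(x + constant for x in chunk)  # same operation on every lane
        instructions += 1
    return result, instructions


values, issued = simd_add([1, 2, 3, 4, 5, 6, 7, 8], 6)
print(values, issued)
```

Eight additions are covered by two vector instructions here, instead of eight scalar ones.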
![Page 49: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/49.jpg)
Graphics Processing Unit
Server of/with GPUs
Optimized for SIMD workloads
![Page 50: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/50.jpg)
NVIDIA TITAN V GPU
5120 cores
640 additional cores to support artificial intelligence
320 further cores («texture» units)
21.1 billion transistors
Maximum clock frequency: 1.5 GHz
12 GB of memory
12 nm technology
110 TFLOPS of compute performance
Target applications: deep learning, supercomputing, financial services, high-end gaming, big data applications
TDP: 250 W
Price: roughly 3000 dollars
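Two derived figures of merit follow directly from the numbers above. This is back-of-the-envelope arithmetic on the quoted peak values (peak TFLOPS, TDP, approximate price), not a measurement; the variable names are illustrative.

```python
# Figures quoted on the slide for the NVIDIA TITAN V.
peak_tflops = 110.0   # peak compute performance
tdp_watts = 250.0     # thermal design power
price_dollars = 3000.0  # approximate price

tflops_per_watt = peak_tflops / tdp_watts
dollars_per_tflops = price_dollars / peak_tflops

print(f"{tflops_per_watt:.2f} TFLOPS/W, about {dollars_per_tflops:.0f} dollars per TFLOPS")
```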
![Page 51: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/51.jpg)
LATEST NEWS FROM THE WORLD OF MICROPROCESSORS
![Page 52: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/52.jpg)
Heterogeneity
[Figure: System-on-Chip combining a low-performance, low-power multicore with a high-performance, high-power multicore (eight 1st-level memories in total). The OPERATING SYSTEM SCHEDULER switches the multicores ON and OFF.]
![Page 53: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/53.jpg)
«Big.LITTLE» ARM architecture: combination of high-end ARM A57 with low-end ARM A53
SAMSUNG EXYNOS 7420 OCTA-CORE
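The idea of an operating-system scheduler that switches between a low-power and a high-performance cluster can be sketched with a toy policy. Everything here is hypothetical (the `Cluster` class, the `schedule` function, the 0.6 load threshold); real schedulers for these platforms are considerably more elaborate and may keep both clusters active at the same time.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    powered_on: bool = False


def schedule(load, little, big, threshold=0.6):
    """Toy policy: run on the big cluster only when the normalized load
    (0.0 to 1.0) exceeds the threshold; power the other cluster off."""
    use_big = load > threshold
    big.powered_on = use_big
    little.powered_on = not use_big
    return big if use_big else little


little = Cluster("A53 (low-power)")
big = Cluster("A57 (high-performance)")
for load in (0.1, 0.9, 0.3):
    chosen = schedule(load, little, big)
    print(f"load={load:.1f} -> {chosen.name}")
```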
![Page 54: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/54.jpg)
![Page 55: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/55.jpg)
The Accelerator Store
[Figure: System-on-Chip in which the low-performance, low-power multicore and the high-performance, high-power multicore (eight 1st-level memories in total) form the HOST PROCESSOR, next to a set of Hardware Accelerators: image processing, video playback, FFT, neuromorphic accelerator.]
![Page 56: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/56.jpg)
The Accelerator Store
[Figure: System-on-Chip in which the low-performance, low-power multicore and the high-performance, high-power multicore (eight 1st-level memories in total) form the HOST PROCESSOR, next to a Hardware Accelerator for graphics (embedded GPU).]
![Page 57: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/57.jpg)
The Accelerator Store
Re-programmable accelerators
Specialized accelerators
![Page 58: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/58.jpg)
Example: Huawei Kirin 970
«Big.LITTLE» host processor
4-core A73 (2.4 GHz)
4-core A53 (1.8 GHz)
Embedded GPU with 12 cores
5.5 billion transistors
Area: 1 cm²
1 accelerator for machine learning (25x performance, 50x energy efficiency)
2005 classified images/minute (Samsung Galaxy S8: 95, iPhone 7 Plus: 487)
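The image-classification throughput figures can be turned into relative speedups with a short calculation. The numbers are the ones quoted above; the dictionary and variable names are illustrative.

```python
# Classified images per minute, as quoted on the slide.
throughput = {
    "Huawei Kirin 970": 2005,
    "Samsung Galaxy S8": 95,
    "iPhone 7 Plus": 487,
}

reference = throughput["Huawei Kirin 970"]
speedups = {
    name: reference / value
    for name, value in throughput.items()
    if name != "Huawei Kirin 970"
}

for name, factor in speedups.items():
    print(f"Kirin 970 vs {name}: about {factor:.1f}x")
```

With these figures the machine-learning accelerator gives roughly a 21x advantage over the Galaxy S8 and roughly a 4x advantage over the iPhone 7 Plus on this workload.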
![Page 59: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/59.jpg)
THE FUTURE
![Page 60: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/60.jpg)
Deep Learning
![Page 61: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/61.jpg)
Hardware for Deep Learning
![Page 62: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/62.jpg)
UNIFE is playing the game!
Collaboration with Fabrizio Riguzzi, Department of Informatics
Dynamically Reconfigurable DNN
Courtesy of Rice University
![Page 63: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/63.jpg)
Brain-inspired Computing
Courtesy of D.Querlioz
![Page 64: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/64.jpg)
Brain-Inspired Computing: Why?
Courtesy of D.Querlioz
![Page 65: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/65.jpg)
We are playing the game right now!
![Page 66: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/66.jpg)
UNIFE is playing the game
Collaboration with Michele Favalli, Columbia University and AMD
Enabling Asynchronous Interconnect Technology
![Page 67: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/67.jpg)
Integrated Photonics for Communication
(Typically) off-chip laser source
Optical signal carried to the chip via optical fiber
Tapered input
Silicon waveguide
Optical OOK modulation
Silicon waveguide
![Page 68: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/68.jpg)
Integrated Photonics for Communication
Optical OOK modulation
Silicon waveguide
Photonic Switching
Photodetector
Transimpedance amplifier
Digital Comparator
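The chain from OOK (on-off keying) modulation to the digital comparator can be modelled in a few lines. This is a noise-free sketch under assumed values: the 1 mW optical power, 3 dB waveguide loss, 0.8 A/W responsivity, 5 kΩ transimpedance gain and 1 V threshold are all hypothetical, as are the function names.

```python
def ook_modulate(bits, p_on_mw=1.0):
    """On-off keying: light on for a 1, light off for a 0 (power in mW)."""
    return [p_on_mw if b else 0.0 for b in bits]


def waveguide(powers_mw, loss_db=3.0):
    """Attenuate the optical power by a fixed loss expressed in dB."""
    factor = 10 ** (-loss_db / 10)
    return [p * factor for p in powers_mw]


def receive(powers_mw, responsivity_a_per_w=0.8, tia_gain_ohm=5000.0,
            threshold_v=1.0):
    """Photodetector -> transimpedance amplifier -> digital comparator."""
    bits = []
    for p in powers_mw:
        current_a = responsivity_a_per_w * p * 1e-3  # photodetector: W to A
        voltage_v = tia_gain_ohm * current_a         # TIA: A to V
        bits.append(1 if voltage_v > threshold_v else 0)  # comparator
    return bits


sent = [1, 0, 1, 1, 0, 0, 1]
received = receive(waveguide(ook_modulate(sent)))
print(sent == received)
```

With these values a transmitted 1 arrives as about 2 V at the comparator input and a 0 as 0 V, so a 1 V threshold separates the two levels cleanly. A real link would add noise, a finite extinction ratio and bandwidth limits.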
![Page 69: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/69.jpg)
3-D Integration
![Page 70: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/70.jpg)
UNIFE is playing the game
[Figure labels: Automatic Place & Route Tool; Full-custom; Irregular pattern; Regular pattern; Optical ring(s); Automatic topology synthesis framework]
Collaboration with Maddalena Nonato, Marco Gavanelli and TU Munich (Germany)
![Page 71: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/71.jpg)
UNIFE is playing the game
Fabrication of photonic integrated circuits and measurements on them
Collaboration with Gaetano Bellanca and Inphotec Pisa
![Page 72: Architecture of Digital Integrated Systems - mpsoc.unife.itmpsoc.unife.it/~arch-dig/slides2016/Course2018.pdfmore effective solutions and systems 10 ... 7420 OCTA-CORE ... We all would](https://reader035.fdocuments.us/reader035/viewer/2022062317/5afb55c97f8b9ae92b8ef0f5/html5/thumbnails/72.jpg)
UNIFE is playing the game
[Figure: block diagram of the interface driving an optical network-on-chip (ONOC). Transmitter (TX) labels: DC-FIFOs, arbiters, credit counters, VC decoder, multiplexers/demultiplexers, 32x1 binary tree serializer, driver, PLL with divide-by-2 clock chain (clk1 to clk5). Receiver (RX) labels: photodetector (PD), transimpedance amplifier (TIA), 1x32 binary tree deserializer, VC decoder, DC-FIFOs, credits returned to the transmitter. Technology labels: CMOS 40nm, ECL 130nm. Design options: fully CMOS, hybrid.]
HOW TO DRIVE AN OPTICAL NETWORK?
Collaboration with IHP Microelectronics (Germany)