Computer Architecture 2011 – peripherals
Computer Architecture
Peripherals
By Dan Tsafrir, 6/6/2011
Presentation based on slides by Lihu Rappoport
MEMORY: REMINDER
Not so long ago…
[Chart: CPU vs. DRAM performance over time, 1980–2000, log scale.
DRAM improved ~9% per year (2x in 10 years); CPUs improved ~60% per year (2x in 1.5 years).
The gap grew ~50% per year.]
Not so long ago…

In 1994, in their paper “Hitting the Memory Wall: Implications of the Obvious”, William Wulf & Sally McKee said:
“We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed – each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs.
The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one.”
More recently (2008)…

[Chart: “The memory wall in the multicore era”: performance in seconds (lower = slower) as a function of the number of processor cores, for a conventional architecture.]
Memory Trade-Offs

· Large (dense) memories are slow
· Fast memories are small, expensive, and consume a lot of power
· Goal: give the processor the feeling that it has a memory which is large (dense), fast, low-power, and cheap
· Solution: a hierarchy of memories

CPU → L1 cache → L2 cache → L3 cache → Memory (DRAM)
Speed: fastest → slowest; Size: smallest → biggest; Cost: highest → lowest; Power: highest → lowest
Typical levels in the memory hierarchy

Response time   Size          Memory level
≈ 0.5 ns        ≈ 100 bytes   CPU registers
≈ 1 ns          ≈ 64 KB       L1 cache
≈ 15 ns         ≈ 1–4 MB      L2 cache
≈ 150 ns        ≈ 1–4 GB      Main memory (DRAM)
≈ 15 ms         ≈ 1–2 TB      Hard disk (SATA)
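The payoff of such a hierarchy is that, if most accesses hit the fast levels, the average access time stays close to theirs. A minimal sketch of that arithmetic, using the latencies from the table above and assumed, purely illustrative hit rates (95% for L1, 80% for L2):

```python
# Average memory access time (AMAT): the hit time of a level plus the
# miss-rate-weighted cost of going to the next level down.
# Latencies follow the table above; the hit rates are assumptions.
def amat(hit_time_ns, miss_rate, next_level_ns):
    return hit_time_ns + miss_rate * next_level_ns

dram = 150.0                   # main memory, ~150 ns
l2 = amat(15.0, 0.20, dram)    # L2: 15 ns hit, assumed 20% miss -> 45 ns
l1 = amat(1.0, 0.05, l2)       # L1: 1 ns hit, assumed 5% miss -> 3.25 ns
print(l1)
```

Even though DRAM is 150x slower than L1, the effective access time under these assumptions is only ~3.25 ns.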
DRAM & SRAM
DRAM basics

· DRAM = dynamic random-access memory
– Random access = access cost is the same for every location (well, not really)
· The CPU thinks of DRAM as one-dimensional
– Simpler
· But DRAM is actually arranged as a 2-D grid
– Need row & column addresses to access it
– Given a “1-D address”, the DRAM interface splits it into row & column parts
– Some time must elapse between the row & column accesses (10s of ns)
DRAM basics

· Why 2-D? Why delayed row & column accesses?
– Every address bit requires a physical pin
– DRAMs are large (GBs nowadays) => many pins => more expensive
· A DRAM array has
– A row decoder: extracts the row number from the memory address
– A column decoder: extracts the column number from the memory address
– Sense amplifiers: hold the row when it is (1) written to, (2) read from, or (3) refreshed (see next slide)
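The pin-count argument can be made concrete: multiplexing the row and column halves of the address over the same pins roughly halves the number of address pins. A sketch with an assumed 1 Gbit array (the size is illustrative):

```python
# Address pins needed for a hypothetical 1 Gbit (2**30-cell) DRAM.
address_bits = 30                      # 2**30 addressable cells

# 1-D addressing: the whole address is presented at once.
flat_pins = address_bits               # 30 address pins

# 2-D addressing: a square 2**15 x 2**15 array; the row half and the
# column half are sent one after the other over the same pins.
row_bits = col_bits = address_bits // 2
muxed_pins = max(row_bits, col_bits)   # 15 address pins

print(flat_pins, muxed_pins)
```

The saving comes at a cost: the row and column phases must be serialized, which is exactly the delay between row and column accesses mentioned above.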
DRAM basics

· One transistor-capacitor pair per bit
· Capacitors leak => each row must be refreshed every few ms
– DRAM spends ~1% of its time refreshing
– “Opening” a row = fetching it into the sense amplifiers = refreshing it
· Is it worth making the DRAM array a rectangle (rather than a square)?
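The “~1% of time refreshing” figure is easy to sanity-check with assumed, typical-looking parameters (none of these numbers come from the slide):

```python
# Rough estimate of the fraction of time a DRAM bank spends refreshing.
# All parameters are illustrative assumptions.
rows_per_bank = 8192          # rows that must each be refreshed
refresh_window_ms = 64.0      # every row refreshed once per 64 ms
row_refresh_ns = 60.0         # time to open + precharge one row

busy_ns = rows_per_bank * row_refresh_ns
window_ns = refresh_window_ms * 1e6
overhead = busy_ns / window_ns
print(f"{overhead:.2%}")      # on the order of 1%
```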
x1 DRAM

[Diagram: a memory array of rows × columns, with a row decoder selecting a row, sense amplifiers holding it, a column decoder selecting within it, and data in/out buffers; one bit is output per access.]
DRAM banks

· Each DRAM memory array outputs one bit
· DRAMs use multiple arrays to output multiple bits at a time
– xN indicates a DRAM with N memory arrays
– Typical today: x16, x32
· Each collection of N arrays forms a DRAM bank
– Each bank can be read/written independently
x4 DRAM

[Diagram: four memory arrays side by side, each with its own row decoder, sense amplifiers, column decoder, and data in/out buffers; the same row/column address drives all four, so each access outputs four bits, one per array.]
Ranks & DIMMs

· DIMM: (dual in-line) memory module, the unit we connect to the motherboard
· Bandwidth is increased by delivering data from multiple banks
– The bandwidth of one bank is limited => put multiple banks on a DIMM
– The bus has a higher clock frequency than any one DRAM
– The bus controller switches between banks to achieve a high data rate
· Capacity is increased by utilizing multiple ranks
– Each rank is an independent set of banks that can be accessed at the full data bit-width: 64 bits for non-ECC, 72 for ECC (error-correcting code)
– Ranks cannot be accessed simultaneously, as they share the same data path
Ranks & DIMMs

[Photo: a 1 GB 2Rx8 DIMM, i.e., 2 ranks of x8 DRAM chips]
Modern DRAM organization

· A system has multiple DIMMs
· Each DIMM has multiple DRAM banks, arranged in one or more ranks
· Each bank has multiple DRAM arrays
· Concurrency across banks increases memory bandwidth
Memory controller

[Diagram: a memory controller connected to two DIMMs; both share the address/command bus and the data bus, while separate chip-select signals (chip select 1, chip select 2) pick which one responds.]
Memory controller

· Functionality: executes the processor’s memory requests
· In earlier systems: a separate, off-processor chip
· In modern systems: integrated on-chip with the processor
· Interconnect with the processor: typically a bus, but can be point-to-point or through a crossbar
Lifetime of a memory access

1. Processor orders & queues memory requests
2. Request(s) sent to the memory controller
3. Controller queues & orders the requests
4. For each request in the queue, when the time is right:
   1. Controller waits until the requested DRAM is ready
   2. Controller breaks the address bits into rank, bank, row, and column fields
   3. Controller sends a chip-select signal to select the rank
   4. The selected bank is pre-charged to activate the selected row
   5. The row is activated within the selected DRAM bank, using the RAS (row-address strobe) signal
   6. The (entire) row is sent to the sense amplifiers
   7. The desired column is selected, using the CAS (column-address strobe) signal
   8. The data is sent back
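The controller’s job of breaking the address into rank/bank/row/column fields amounts to slicing bit ranges out of the physical address. A sketch with a hypothetical bit layout (real controllers choose field widths and positions to maximize bank concurrency; these values are made up):

```python
# Decompose a physical address into rank/bank/row/column fields.
# The field widths and their order are a hypothetical example,
# not any real controller's mapping.
COL_BITS, ROW_BITS, BANK_BITS, RANK_BITS = 10, 14, 3, 1

def decode(addr):
    col  = addr & ((1 << COL_BITS) - 1);  addr >>= COL_BITS
    row  = addr & ((1 << ROW_BITS) - 1);  addr >>= ROW_BITS
    bank = addr & ((1 << BANK_BITS) - 1); addr >>= BANK_BITS
    rank = addr & ((1 << RANK_BITS) - 1)
    return rank, bank, row, col

print(decode(0x12345678))
```

Placing the column bits lowest means consecutive addresses fall in the same row, so sequential accesses can reuse an already-open row.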
Basic DRAM array

· Timing (2 phases):
– Decode the row address + assert RAS#
– Wait for the “RAS-to-CAS delay”
– Decode the column address + assert CAS#
– Transfer DATA

[Diagram: the memory address bus feeds a row latch + row-address decoder (clocked by RAS#) and a column latch + column-address decoder (clocked by CAS#), which together select a cell in the memory array and drive the Data pins.]
DRAM timing

· CAS latency
– Number of clock cycles needed to access a specific column of data
– Measured from the moment the memory controller issues a column address in the currently open row until the data is read out of memory
· RAS-to-CAS delay
– Number of cycles between the row access and the column access
· Row pre-charge time
– Number of cycles needed to close the opened row and open the next row
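These three parameters dominate a worst-case random access: close the old row, open the new one, then read the column. A rough illustration for a hypothetical DDR3-1600 part with 11-11-11 timings (the part and its timings are assumptions, not from the slide):

```python
# Worst-case vs. best-case access latency from the three timing
# parameters above. Hypothetical DDR3-1600, CL-tRCD-tRP = 11-11-11.
clock_mhz = 800.0              # DDR3-1600 command clock
cycle_ns = 1000.0 / clock_mhz  # 1.25 ns per cycle

tRP, tRCD, CL = 11, 11, 11     # precharge, RAS-to-CAS, CAS latency (cycles)

miss_ns = (tRP + tRCD + CL) * cycle_ns  # wrong row open: close + open + read
hit_ns = CL * cycle_ns                  # right row already open: just read
print(miss_ns, hit_ns)
```

Under these assumptions a row miss costs ~41 ns versus ~14 ns for a row hit, which is why controllers try hard to schedule requests to already-open rows.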
Addressing sequence

· Access sequence:
– Put the row address on the address bus (A[0:7]) and assert RAS#
– Wait for the RAS#-to-CAS# delay (tRCD)
– Put the column address on the address bus and assert CAS#
– After the CAS latency, DATA is transferred
– Pre-charge (precharge delay) before the next row can be opened

[Timing diagram: RAS#, CAS#, A[0:7], and Data waveforms; Row i appears on the address lines, then after the RAS/CAS delay Col n, then Data n after the CAS latency. The whole sequence is the access time, followed by the precharge delay before Row j can be issued.]
Improved DRAM Schemes

· Paged-mode DRAM
– Multiple accesses to different columns of the same row (spatial locality)
– Saves the time it takes to bring in a new row (but might be unfair)
· Extended Data Out RAM (EDO RAM)
– A data-output latch allows the next column address to be driven in parallel with the current column’s data

[Timing diagrams: paged mode shows one RAS# assertion (Row) followed by multiple CAS# pulses (Col n, Col n+1, Col n+2), each returning Data n, n+1, n+2; EDO shows the same sequence, but the output latch holds each datum through the next CAS#, overlapping the transfers.]
Improved DRAM Schemes (cont.)

· Burst DRAM
– Generates consecutive column addresses by itself

[Timing diagram: one RAS# assertion (Row) and a single CAS# pulse (Col n) yield a burst of Data n, n+1, n+2.]
Synchronous DRAM (SDRAM)

· Asynchrony in plain DRAM
– Due to RAS & CAS arriving at arbitrary times
· Synchronous DRAM
– Uses a clock to deliver requests at regular intervals
– More predictable DRAM timing => less skew => faster turnaround
· SDRAMs support burst-mode access
– Initial performance was similar to BEDO (= burst + EDO)
– Clock scaling enabled higher transfer rates later
– => DDR SDRAM => DDR2 => DDR3
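The rate scaling can be quantified: peak bandwidth is transfers per second times bus width. A sketch using the standard DDR3-1600 numbers (DDR moves data on both clock edges, so an 800 MHz clock gives 1600 MT/s):

```python
# Peak transfer rate of a DIMM channel: transfers/sec x bus width.
# Standard DDR3-1600 figures; 64-bit (8-byte) non-ECC data path.
transfers_per_sec = 1600e6    # 1600 MT/s (800 MHz clock, both edges)
bus_bytes = 8                 # 64-bit data path
peak_gb_s = transfers_per_sec * bus_bytes / 1e9
print(peak_gb_s)              # 12.8 GB/s per channel
```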
DRAM vs. SRAM

(Random access = access time the same for all locations)

                 DRAM (dynamic RAM)        SRAM (static RAM)
Refresh          Yes (~1% of time)         No
Address          Multiplexed: row + col    Not multiplexed
Random access    Not really…               Yes
Density          High (1 transistor/bit)   Low (6 transistors/bit)
Power            Low                       High
Speed            Slow                      Fast
Price/bit        Low                       High
Typical usage    Main memory               Cache