ARM1156T2-S Architecture

  • 8/8/2019 ARM1156T2 S Architecture

    SUNPLUS

    Technology for Easy Living

    ARM1156T2-S Architecture

    Introduction

    Leon

    Nov. 14, 2006

  • Outline

    ARM1156T2-S Components

    Pipelines

    Prefetch Unit

    Coprocessor

    Cache and TCM

    Memory Protection Unit

  • ARM1156T2-S Components

    ALU / Shifter

    Multiplier

  • Pipeline

    Fe1: instruction fetch; the address is issued to memory.

    Fe2: memory returns data to the core.

    Fe3: branch prediction.

  • Prefetch Unit

    Instructions are fetched from:

    Instruction Tightly-Coupled Memory (ITCM)

    Instruction cache

    External memory

    Branch prediction:

    Branch predictor. If it is disabled, conditional execution waits until the Execute stage to determine the branch, which stalls the pipeline.

    Pattern History Table with 256 entries

    Three-entry circular hardware Return Stack for predicted returns

  • Branch Predictor

    Uses a global history prediction scheme with two Pattern History Tables of 256 entries each.

    Global History register (N bits): the history is based on the taken/not-taken outcome of ALL branches, say, the 12 previous branches (for example 101010000001).

    Index: a logic function combines the target address with the history to select an entry in the Pattern History Table.

    Each Pattern History Table entry is a two-bit counter. For example, when the counter reaches 10 (binary), the prediction is taken.
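
    The scheme above can be sketched in a few lines of Python. This is only an illustration, not the ARM1156T2-S implementation: it uses a single table rather than two, it assumes XOR as the "logic function" that combines address and history, and the class name, the 8-bit history length and the initial counter value are all invented for the example.

```python
class GlobalHistoryPredictor:
    """Toy global-history predictor with one table of two-bit counters."""

    def __init__(self, entries=256, history_bits=8):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.history = 0              # taken/not-taken outcomes of ALL branches
        self.table = [1] * entries    # counters start at 01 (weakly not taken)

    def _index(self, address):
        # Assumed combining function: XOR of the word address and the history.
        return ((address >> 2) ^ self.history) % self.entries

    def predict(self, address):
        # Counter values 10b and 11b predict taken.
        return self.table[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift this branch's outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

    A branch that is always taken is mispredicted at first, then predicted taken once the history register has filled with ones and the selected counter has reached 10 or above.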

  • Enable/Disable Branch Predictor

    Enable: the Z bit of the CP15 c1 Control Register and the DB bits of the c1 Auxiliary Control Register are set to 1.

    Disable: the Z bit of the CP15 c1 Control Register and the DB bits of the c1 Auxiliary Control Register are set to 0.

    If disabled, conditional branches are predicted not taken.

  • Branch Return Stack

    When a procedure call instruction is predicted as taken, the PFU pushes the return address onto the Return Stack.

    Procedure call instructions:

    ARM: BL immediate (conditional)

    ARM: BLX immediate (unconditional)

    Thumb: unconditional BL immediate and BLX immediate

    The PFU fetches from the Return Stack when it detects one of these unconditional instructions:

    ARM: MOV pc, r14

    ARM and Thumb-2: LDR pc; LDM r13, {..pc..}; BX r14

    Thumb: POP
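
    A minimal model of a three-entry circular return stack follows. The class and method names are hypothetical, and the overwrite-oldest behavior when the stack is full is an assumption drawn from the word "circular", not from a hardware description.

```python
class ReturnStack:
    """Three-entry circular return stack; the oldest entry is overwritten when full."""

    def __init__(self, depth=3):
        self.slots = [None] * depth
        self.top = 0      # index of the next slot to write
        self.count = 0    # number of valid entries

    def push(self, return_address):
        # Called when a procedure call (BL/BLX) is predicted taken.
        self.slots[self.top] = return_address
        self.top = (self.top + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))

    def pop(self):
        # Called when a return instruction is detected.
        if self.count == 0:
            return None   # no prediction available
        self.top = (self.top - 1) % len(self.slots)
        self.count -= 1
        return self.slots[self.top]
```

    With four nested calls, the fourth push overwrites the first return address, so only the three innermost returns can be predicted.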

  • Coprocessor

    CP0 to CP15:

    CP10: VFP control

    CP11: VFP control

    CP14: Debug

    CP15: System control; configuration of cache, TCM, MMU, MPU

    Uses a load-store architecture to perform coprocessor internal operations and to save/load internal register data to/from memory and to/from ARM core registers

  • Unified Instruction and Data Cache

    The portions of the cache used for instructions and for data can be adjusted flexibly

  • Harvard Architecture

    Instruction fetch and data access in a single clock cycle

  • Cache Characteristics

    Set associativity: one-way for a 1KB cache, two-way for a 2KB cache, four-way for other cache sizes

    Cache line size: 32 bytes

    Cache way size: maximum 16KB, minimum 1KB

    Cache lines within a set hold unique values
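
    The figures above determine the cache geometry by simple arithmetic. The helper below is a sketch with a hypothetical function name; it assumes the associativity rule reads as "one-way at 1KB, two-way at 2KB, four-way otherwise".

```python
LINE_BYTES = 32   # cache line size from the slide

def cache_geometry(cache_bytes):
    """Return (ways, way_bytes, sets) for a total cache size in bytes."""
    if cache_bytes == 1024:
        ways = 1
    elif cache_bytes == 2048:
        ways = 2
    else:
        ways = 4
    way_bytes = cache_bytes // ways
    sets = way_bytes // LINE_BYTES   # one line per set in each way
    return ways, way_bytes, sets
```

    Under these assumptions a 64KB cache has 16KB ways (the stated maximum) and 512 sets, while 1KB, 2KB and 4KB caches all have 1KB ways (the stated minimum) and 32 sets.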

  • Cache Organization

    [Figure: lookup in a 4KB, 4-way cache. The address is split into Tag (bits 31:10), Set index (bits 9:4) and Data index (bits 3:0). Each way holds 64 entries of 4 words; the tags of the four ways are compared (=) in parallel and a MUX selects the hit data.]

  • Cache Organization (Cont.)

    [Figure: the same 4KB, 4-way cache with a 22-bit Tag and a 4-bit Data index. Each of the four ways spans offsets 0x000 to 0x3FF. The addresses 0x00000224, 0x00000624 and 0x00000A24 all map to offset 0x224 within a way.]
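
    The address split in the figure can be reproduced as follows. This sketch follows the figure's 4KB, 4-way example with 4-word (16-byte) lines and 64 entries per way, which differs from the 32-byte line size listed under Cache Characteristics; the function name and constants are invented for illustration.

```python
LINE_BYTES = 16      # 4 words per line, as in the figure
SETS_PER_WAY = 64    # 64 entries per way
OFFSET_BITS = 4      # log2(LINE_BYTES)
INDEX_BITS = 6       # log2(SETS_PER_WAY)

def split_address(address):
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    byte_offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (SETS_PER_WAY - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, byte_offset
```

    The three example addresses share set index 0x22 and byte offset 4 and differ only in the tag (0, 1 and 2), so they compete for the four ways of one set.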

  • Write Buffer

    FIFO built from fast memory

    Holds writes to external memory

    Nonblocking cache

    If a read access has the same address as an entry in the Write Buffer, the read is blocked until the writes drain to main memory
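
    The read-after-write hazard rule can be modeled with a small FIFO. This is a behavioral sketch only: the class name, the dictionary standing in for main memory and the drain-on-demand loop are assumptions made for the example.

```python
from collections import deque

class WriteBuffer:
    def __init__(self):
        self.fifo = deque()   # pending (address, value) writes, oldest first
        self.memory = {}      # stands in for external memory

    def write(self, address, value):
        # The core continues immediately; the write waits in the FIFO.
        self.fifo.append((address, value))

    def drain_one(self):
        if self.fifo:
            address, value = self.fifo.popleft()
            self.memory[address] = value

    def read(self, address):
        # A read that matches a pending write is held until that write drains.
        while any(a == address for a, _ in self.fifo):
            self.drain_one()
        return self.memory.get(address)
```

    Writes drain in FIFO order, so a read of an address queued second forces the first two writes out and leaves later ones pending.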

  • Cache Policy

    Cache line allocation policy: read allocation, or read-write allocation

    The ARM1156T2-S supports only read allocation

    Write policy, controlled by the memory attributes (C and B bits): write-through, or write-back (sets the dirty bit)

    Cache line replacement policy, selected in the CP15 c1 Control Register; on a miss, a victim is selected:

    Round-robin

    Pseudorandom

    Least recently used (not supported by ARM)

  • Invalidate and Clean

    Invalidate: clear the valid bit in the affected cache line; also called flush

    Clean: write the cache lines whose dirty bit is set to main memory and clear the dirty bit

    The instruction cache needs no clean operation
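
    The difference between the two operations is easiest to see on a single cache line. The model below is a sketch with hypothetical names; a dictionary stands in for main memory.

```python
class CacheLine:
    def __init__(self, address, data):
        self.address, self.data = address, data
        self.valid, self.dirty = True, False

def write_hit(line, data):
    # Write-back behavior: update the line only and mark it dirty.
    line.data = data
    line.dirty = True

def clean(line, memory):
    # Write a dirty line back to main memory and clear the dirty bit.
    if line.valid and line.dirty:
        memory[line.address] = line.data
        line.dirty = False

def invalidate(line):
    # Clear the valid bit; any unwritten dirty data is lost.
    line.valid = False
```

    Invalidating a dirty line without cleaning it first discards the newer data, which is why a data cache is cleaned and then invalidated, while an instruction cache, never being dirty, only needs invalidating.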

  • Cache Lockdown

    Avoids the miss penalty; lockable at the granularity of a cache way

    Candidates are critical code or data:

    Vector interrupt ISRs

    Algorithms used extensively

    Variables referenced intensively

    If the cache is flushed, the lockdown must be rerun to restore the locked contents

  • Cache Miss Handling

    If all ways are locked when a cache miss occurs:

    ARM architecture: Unpredictable behavior

    ARM1156T2-S: evicts the cache line in Way 0 as if Way 0 were not locked

    If the cache is disabled and a read or write occurs in an address range held in the cache, this unexpected hit is ignored
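
    Victim selection with locked ways might look like the sketch below. It assumes a round-robin pointer that skips locked ways; only the fall-back to Way 0 when every way is locked comes from the slide, and the function name and arguments are hypothetical.

```python
def choose_victim(locked, next_way):
    """Pick the way to evict on a miss.

    locked:   list of booleans, one per way (True means locked down)
    next_way: current round-robin pointer
    """
    ways = len(locked)
    for step in range(ways):
        candidate = (next_way + step) % ways
        if not locked[candidate]:
            return candidate
    # All ways locked: treat Way 0 as if it were not locked.
    return 0
```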

  • Tightly-Coupled Memory (TCM)

    Low-latency memory; part of the physical memory map, occupying a contiguous memory space

    Holds critical routines and data:

    Interrupt service routines

    Critical tasks

    Interrupt stacks

    Intensively referenced data

    Supported TCM size: maximum 256KB, minimum 4KB

    TCM information: CP15 c0 TCM Status Register

    TCM and cache are independent; the ITCM and DTCM regions cannot overlap

  • TCM (Cont.)

    The TCM region overrides the memory type attributes of the MPU, and all addresses within the TCM space are treated as Normal, Non-Shared memory

    If the peripheral port region overlaps the TCM, the overlap is treated as Device, Non-Shared, and TCM; accesses to the region are routed to the TCM, not to the peripheral port

    Configurable variables: base address, size

  • TCM Access vs. Cache Access

  • Memory Protection Unit

    Supports up to 16 regions

    Configuration options: region base address, region size, region attributes, region access permissions

    If the MPU is disabled, no access permissions are checked

  • MPU (Cont.)

    Region base address: must lie on a region-sized boundary; otherwise the behavior is Unpredictable

    Region size: 32 bytes to 4GB

    Region attributes: memory type (Strongly Ordered, Device, or Normal); Shared/Non-Shared; Non-Cacheable, Write-through Cacheable, or Write-back Cacheable

    Access permissions: User and Privileged mode, read/write
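
    The base-address and size rules can be checked before a region is programmed. The helper below is a sketch: the function name is hypothetical, and the requirement that the size be a power of two is an assumption added here, since the slide states only the 32-byte to 4GB range.

```python
MIN_REGION = 32          # bytes
MAX_REGION = 1 << 32     # 4GB

def is_valid_region(base, size):
    """True if size is within range and base sits on a size-aligned boundary."""
    if size <= 0:
        return False
    power_of_two = (size & (size - 1)) == 0   # assumption, see lead-in
    in_range = MIN_REGION <= size <= MAX_REGION
    aligned = base % size == 0
    return power_of_two and in_range and aligned
```

    For example, a 4KB region may start at 0x3000, but a 16KB region may not, because 0x3000 is not a multiple of 16KB.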

  • Overlap Examples

    Region 1: base is 0x0000; Privileged mode full access, User mode read-only

    Region 2: base is 0x3000; User mode full access only

  • Overlap Examples (2)

    Region 1: base is 0x0; full access by both modes

    Region 2: base is 0x0; no access

  • Memory Map at Reset

    [Figure: memory map at reset; the surviving labels are "1 Giga" and "2 Giga".]

  • MPU Enable

    Before enabling the MPU: set up at least one memory region, clean and invalidate the data cache, and invalidate the instruction cache.

    An address is generated by the Load Store Unit or the Prefetch Unit:

    If it matches no configured memory region, a background fault is generated and the Fault Status Register is filled in. Fault types: alignment fault, background fault, permission fault

    If it matches a memory region: without permission, a memory abort is raised; otherwise the region determines whether the access is cached, uncached, or shared

    If several regions match, the highest-priority memory region is applied
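
    The lookup described above can be sketched as a small function. Several details are assumptions made for illustration only: the region sizes (16KB and 4KB), the rule that a higher region number means higher priority, the Privileged permission of Region 2, and all names in the code.

```python
from dataclasses import dataclass

@dataclass
class Region:
    number: int
    base: int
    size: int
    user_access: str         # "none", "ro" or "rw"
    privileged_access: str   # "none", "ro" or "rw"

def check_access(regions, address, privileged, is_write):
    hits = [r for r in regions if r.base <= address < r.base + r.size]
    if not hits:
        return "background fault"
    # Assumed priority rule: the highest-numbered matching region wins.
    region = max(hits, key=lambda r: r.number)
    access = region.privileged_access if privileged else region.user_access
    allowed = access == "rw" or (access == "ro" and not is_write)
    return "ok" if allowed else "permission fault"

# Overlap example: Region 2 sits on top of part of Region 1.
regions = [
    Region(1, 0x0000, 0x4000, user_access="ro", privileged_access="rw"),
    Region(2, 0x3000, 0x1000, user_access="rw", privileged_access="rw"),
]
```

    With these regions, a User-mode write to 0x1000 faults because only Region 1 matches, while the same write to 0x3010 succeeds because Region 2 takes priority.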

  • MPU Disable

    Before disabling the MPU: set up at least one memory region, clean and invalidate the data cache, and invalidate the instruction cache.

    No access permission checks and no abort generation; the default memory map is used

    Instruction and data prefetch operations work as normal

    Accesses to the TCM work as normal

  • Memory Attributes and Types

    Mutually exclusive type attributes: Strongly Ordered, Device, Normal

    Shared: accessed by multiple processors

    Non-Shared: accessed by a single processor

    The S (Shared) bit of the c6 Region Control Register applies only to Normal memory, not to Device or Strongly Ordered memory

  • Strongly Ordered Memory

    An access to memory marked as Strongly Ordered acts as a memory barrier to all other explicit accesses from that processor

    An address marked as Strongly Ordered is Noncacheable and Shared

  • Memory Barriers

    A class of instructions that cause a CPU to enforce an ordering constraint on memory operations issued before and after the barrier instruction.

    Performance optimizations can result in out-of-order execution, for example of loads and stores

    Memory operation reordering normally goes unnoticed within a single task, but causes unpredictable behaviour across multiple tasks and in device drivers unless carefully controlled

  • Memory Barriers (Cont.)

    CP15 c7 operations:

    Data Memory Barrier: ensures that all explicit memory transactions occurring in program order before this instruction are completed

    Drain Write Buffer

    Flush Prefetch Buffer

    Invalidate the I-Cache

    Clean D-Cache

  • Normal

    Cacheable write-through, Cacheable write-back, or Noncacheable

    Shared or Non-Shared; the ARM1156T2-S does not cache shareable locations

    A memory region covered by the TCM is always Non-Shared. Marking it as Shared results in Unpredictable behavior

  • Device

    A region with the Device attribute is not held in a cache

    In the ARM1156T2-S processor, the Non-Shared Device attribute is assigned to the peripheral port and the Shared Device attribute is assigned to the system bus

  • Access Permission

    The access permissions are determined by the AP[2:0] bits in the CP15 c6 Data Access Permission Registers.
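
    The slide's table of AP encodings did not survive. The sketch below decodes AP[2:0] using the encoding commonly documented for ARMv6 protected-memory processors; it is an assumption recalled for illustration and should be checked against the ARM1156T2-S Technical Reference Manual. The names are hypothetical.

```python
# (privileged, user) permissions per AP[2:0] value. Assumed encoding, not from the slide.
AP_ENCODING = {
    0b000: ("no access", "no access"),
    0b001: ("read/write", "no access"),
    0b010: ("read/write", "read-only"),
    0b011: ("read/write", "read/write"),
    0b101: ("read-only", "no access"),
    0b110: ("read-only", "read-only"),
}

def decode_ap(ap):
    """Return (privileged, user) permissions, or None for encodings not listed."""
    return AP_ENCODING.get(ap & 0b111)
```

    Under this assumed encoding, Region 1 of the first overlap example (Privileged full access, User read-only) would use AP = 0b010.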
