Page 1:

Concurrent Autonomous Self-Test for Uncore Components in SoCs

Yanjing Li, Stanford University

Onur Mutlu, Carnegie Mellon University

Donald S. Gardner, Intel Corporation

Subhasish Mitra, Stanford University

Page 2:

Overcoming CMOS Reliability Challenges

[Figure: bathtub curve of failure rate versus time, spanning early-life failures, lifetime, and circuit aging]

Early-life failures: burn-in difficult

Circuit aging: guardbands expensive

On-line self-test and diagnostics

Soft errors: Built-In Soft Error Resilience (BISER)

Page 3:

Uncore Components Significant in SoCs

[Figure: chip images of the Cisco Network Processing Engine, NVIDIA Tegra, and IBM Power 7, with the uncore components marked on each]

© techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org

Uncore examples

Controllers for cache & DRAM

Crossbar

I/O interfaces

Page 4:

Robust Uncore Essential

OpenSPARC T2 SoC: 8 cores, 64 threads (© opensparc.net)

Uncore: 12%. New on-line self-test for uncore

Processor cores: 12%. CASP for processor cores [Li DATE 08, ICCAD 09]

Memories: 76%. ECC, memory BIST & repair for memories

Page 5:

Challenge 1: High Test Coverage

                CASP       Logic BIST   Roving Emulation
Coverage        High       ?            Depends
Cost            Low        High         High
Design effort   Moderate   High         High

CASP: Concurrent, Autonomous, Stored Patterns

High-coverage patterns off-chip FLASH

System-level on-line test access

FLASH cheap, test compression pervasive

Page 6:

Challenge 2: Power, Performance, Area Costs

Stall-and-test inadequate: 4-core Intel® Core™ i7 system results (© intel.com)

[Figure: four cores send requests through the caches and interconnects to the DRAM controller while it is under on-line self-test]

Requests from multiple cores

Multiple cores stall

Unresponsiveness or system hang

Page 7:

Naïve Approaches Inadequate for Uncore

Stall-and-test

Unresponsiveness or complete hang

Spare unit for each uncore type

12% area overhead*

Small area cost

Small performance impact

Uncore CASP: new techniques required

* OpenSPARC T2 design

Page 8:

New Uncore On-line Self-Test Principles

I. Resource reallocation and sharing (RRS)

II. No-performance-impact testing

III. Smart backup

< 1% area impact, < 3% performance impact

©opensparc.net

OpenSPARC T2 SoC

Page 9:

I. Resource Reallocation and Sharing (RRS)

Components with “similar” functionality in SoCs

Temporary reallocation and sharing

Small performance hit without replication

[Figure: OpenSPARC T2 (© opensparc.net) with two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; one component is under on-line self-test]

1. Stall and drain requests

2. Transfer dirty lines

3. Invalidate

4. Reroute

Page 10:

II. No-Performance-Impact Testing

[Figure: OpenSPARC T2 (© opensparc.net) with two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller. One component is under on-line self-test with RRS; another is marked IDLE]

Implication-relations among SoC components

Component(s) tested when idle

During test of another component

Page 11:

III. Smart Backup

Operations with different requirements

Backup unit for performance-critical operations

Absolute minimal additional hardware

[Figure: OpenSPARC T2 I/O interface operations (DMA for network, DMA for disks, programmed I/O), each either supported in the smart backup or stalled / handled slowly via programmed I/O]

Page 12:

Application Performance Impact

Memory-centric

[Figure: execution time impact on PARSEC benchmarks]

1.5% performance impact

I/O-centric on 4-core Intel system

Uncore CASP emulated on 4-core Intel® Core™ i7 (© intel.com)

Disk access: 3% impact

No visible unresponsiveness

Page 13:

Area and Power Impact

CASP controller (< 0.01% area)

Off-chip FLASH (200 MB)

On-chip buffer (8 KB)

Uncore on-line self-test principles applied

© opensparc.net

Minimal area impact: < 1%

Minimal power impact: < 1%

Page 14:

Test Results for Uncore Components

200 MB off-chip FLASH

10X test compression

7 ms – 300 ms test time per component

             Total pattern count   Test coverage
Stuck-at     5,577                 99.2% - 99.9%
Transition   11,049                92.8% - 97.8%

Inexpensive FLASH

Thorough on-line self-test
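
To make the coverage column concrete, here is a minimal sketch of how a stuck-at coverage figure can be computed. It is not the authors' tool flow: the three-input circuit y = (a AND b) OR c, the net names, and the function names are all hypothetical, and coverage is taken simply as detected faults divided by all single stuck-at faults, which may differ from the exact definition behind the slide's numbers.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical toy circuit: y = (a AND b) OR c, with internal net n = a AND b.
NETS = ["a", "b", "c", "n", "y"]


def simulate(a: int, b: int, c: int,
             fault: Optional[Tuple[str, int]] = None) -> int:
    """Evaluate the circuit, optionally with one net stuck at 0 or 1."""
    def force(net: str, value: int) -> int:
        # Return the stuck value if this net carries the injected fault.
        return fault[1] if fault and fault[0] == net else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n = force("n", a & b)
    return force("y", n | c)


def stuck_at_coverage(patterns: List[Tuple[int, int, int]]) -> float:
    """Fraction of single stuck-at faults detected by at least one pattern."""
    faults = list(product(NETS, (0, 1)))  # every net stuck at 0 and at 1
    detected = sum(
        any(simulate(*p) != simulate(*p, fault=f) for p in patterns)
        for f in faults
    )
    return detected / len(faults)
```

With the single pattern (1, 1, 0) only the four stuck-at-0 faults on a, b, n, and y change the output, so coverage is 0.4; adding (0, 1, 0), (1, 0, 0), and (0, 0, 1) detects all ten faults in this toy circuit.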

Page 15:

Uncore CASP vs. Existing Techniques

                     Logic BIST                       Concurrent BIST          Uncore CASP
                                                      [Saluja IEEE TCAD 88]    [This work]
Coverage             High with high costs             Depends                  High
Area cost            High                             High costs possible      Low
Performance impact   Low with our uncore principles   Low                      Low

Design complexity: Moderate

Page 16:

CASP Applicable for Other SoCs

[Figure: chip images of the Cisco Network Processing Engine, NVIDIA Tegra, and IBM Power 7, annotated with the applicable techniques]

I. RRS

II. No-performance-impact testing

III. Smart backup

IV. Core CASP

© techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org

Page 17:

Conclusions

CASP: adaptive on-line self-test & diagnostics

3 new principles for uncore CASP

I. Resource reallocation and sharing (RRS)

II. No-performance-impact testing

III. Smart backup

Effective and practical

High test coverage

1% power, 3% performance, 1% area

Page 18:

Backup Slides

Page 19:

CASP on Actual Intel® Core™ i7 System

Intel Research collaboration

Quad-core Intel® Core™ i7 (3.2 GHz)

Thermoelectric temperature controller

Debug tool

Unique real-life experiment

Development of adaptive self-diagnostics

Debug Tool Adapter

Temperature Controller

Page 20:

CASP Flow

SoC with CASP controller (multi-core SoC proliferation)

Inexpensive off-chip FLASH (non-volatile storage technology)

Scan chain

1. Select uncore or core component

2. Isolate

3. Apply / analyze high-quality test patterns (test compression, at-speed test…)

4. Resume operation
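
The four-step flow can be sketched in software as follows. This is only an illustration under stated assumptions, not the CASP controller's actual implementation: `Component`, `CaspController`, the `flash` dictionary standing in for off-chip FLASH, and the choice to leave a failing component isolated are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """Hypothetical component; `logic` stands in for the scanned hardware."""
    name: str
    logic: Callable[[int], int]
    isolated: bool = False


@dataclass
class CaspController:
    # Stand-in for off-chip FLASH: (pattern, expected response) pairs per component.
    flash: Dict[str, List[Tuple[int, int]]]
    log: List[str] = field(default_factory=list)

    def test(self, comp: Component) -> bool:
        self.log.append(f"select {comp.name}")      # 1. select component
        comp.isolated = True                        # 2. isolate it from the system
        self.log.append(f"isolate {comp.name}")
        passed = all(comp.logic(pattern) == expected   # 3. apply / analyze patterns
                     for pattern, expected in self.flash[comp.name])
        self.log.append(f"apply {comp.name}: {'pass' if passed else 'fail'}")
        if passed:                                  # 4. resume operation
            comp.isolated = False
            self.log.append(f"resume {comp.name}")
        # Assumption: a failing component stays isolated for later diagnosis.
        return passed
```

For example, with `flash={"l2_bank0": [(0x00, 0xFF), (0xFF, 0x00)]}`, a component whose logic is `lambda x: x ^ 0xFF` passes and resumes, while one whose lowest output bit is stuck at 1 fails on the second pattern and remains isolated.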

Page 21:

RRS Example: L2 Cache Banks

[Figure: L2 bank 0 (under test) and L2 bank 1 (helper), each with controller, data, tag, etc., connected to the crossbar and to DRAM controller 0]

1. Stall cache controller

2. Drain outstanding requests

3a. Invalidate clean blocks; invalidate directory; invalidate L1

3b. Transfer necessary states (dirty blocks)

Write back to main memory if necessary

4. Route packets with destination {bank 0, bank 1} to bank 1
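
The RRS steps for an L2 bank can be sketched as below. This is a simplified model, not the OpenSPARC T2 hardware behaviour: `Line`, `Bank`, `enter_rrs`, and the routing dictionary are hypothetical, and the sketch always writes dirty blocks back to main memory, which is only one of the options step 3b allows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Line:
    data: int
    dirty: bool


@dataclass
class Bank:
    """Hypothetical L2 bank: cached lines plus a queue of outstanding writes."""
    lines: Dict[int, Line] = field(default_factory=dict)          # address -> line
    pending: List[Tuple[int, int]] = field(default_factory=list)  # (address, data)
    stalled: bool = False


def enter_rrs(under_test: Bank, memory: Dict[int, int],
              route: Dict[int, int], test_id: int, helper_id: int) -> None:
    """Prepare `under_test` for self-test and hand its traffic to the helper."""
    under_test.stalled = True                      # 1. stall cache controller
    for addr, data in under_test.pending:          # 2. drain outstanding requests
        under_test.lines[addr] = Line(data, dirty=True)
    under_test.pending.clear()
    for addr, line in under_test.lines.items():    # 3b. save dirty blocks
        if line.dirty:
            memory[addr] = line.data               #     (write-back simplification)
    under_test.lines.clear()                       # 3a. invalidate what remains
    route[test_id] = helper_id                     # 4. reroute packets to helper
```

After `enter_rrs(bank0, memory, route, test_id=0, helper_id=1)`, bank 0 holds no state, main memory holds every dirty value, and `route` sends packets for both banks to bank 1, so bank 0 can be scanned without losing data.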

Page 22:

No-Performance-Impact Testing Example: CCX (Crossbar)

[Figure: 8 cores, 64 threads; CCX multiplexers and arbitration logic blocks 0 through 7 in front of L2 bank 0 through L2 bank 7, with separate scan chains]

Packets reallocated to helper

Test at the same time
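
One way to picture the idea is as a scheduling step: when testing one component implies that another is idle, both go into the same test session. The sketch below is illustrative only; the `IDLE_WHEN_TESTING` table, the component names, and `build_sessions` are hypothetical and not taken from the design.

```python
from typing import Dict, List, Set

# Hypothetical implication relations: testing the key leaves the listed
# components idle, because their packets were already reallocated to a helper.
IDLE_WHEN_TESTING: Dict[str, List[str]] = {
    "l2_bank0": ["ccx_block0"],
    "l2_bank7": ["ccx_block7"],
}


def build_sessions(to_test: List[str]) -> List[Set[str]]:
    """Group components into sessions; implied-idle components ride along."""
    sessions: List[Set[str]] = []
    covered: Set[str] = set()
    for comp in to_test:
        if comp in covered:
            continue  # already tested alongside another component
        session = {comp}
        for idle in IDLE_WHEN_TESTING.get(comp, []):
            if idle in to_test and idle not in covered:
                session.add(idle)  # tested at the same time, no extra impact
        covered |= session
        sessions.append(session)
    return sessions
```

Calling `build_sessions(["l2_bank0", "ccx_block0", "l2_bank7", "ccx_block7", "ncu"])` yields three sessions instead of five. The sketch assumes the implying component appears before its idle partner in the input list.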

Page 23:

Smart Backup Example: Non-Cachable Unit

[Figure: the original unit (under test) lists PIO, boot ROM interface, interrupt status table, interrupt processing, and config. status register interface; the backup lists PIO and interrupt processing; a MUX selects between their outputs]

1. Stall

2. Drain outstanding requests

3. Turn on, reset

4. Transfer states

5. Select outputs from backup

Minimize area costs at acceptable performance impact
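
The trade-off can be sketched as a dispatcher: while the original unit is under test, only the operations the backup implements are served, and the rest wait. This is a hypothetical model, not the NCU design; `SmartBackupUnit`, the operation names, and the replay-on-resume policy are assumptions.

```python
from typing import List, Set, Tuple


class SmartBackupUnit:
    """Hypothetical unit whose backup implements only a subset of operations."""

    def __init__(self, backup_ops: Set[str]) -> None:
        self.backup_ops = backup_ops
        self.under_test = False
        self.deferred: List[str] = []

    def handle(self, op: str) -> Tuple[str, str]:
        if not self.under_test:
            return ("original", op)
        if op in self.backup_ops:
            return ("backup", op)      # performance-critical: served by backup
        self.deferred.append(op)       # everything else waits for the original
        return ("stalled", op)

    def finish_test(self) -> List[Tuple[str, str]]:
        """Resume the original unit and replay the operations that were stalled."""
        self.under_test = False
        replay, self.deferred = self.deferred, []
        return [("original", op) for op in replay]
```

With `SmartBackupUnit({"pio", "interrupt"})` under test, a `"pio"` request is served by the backup while a `"boot_rom"` request is stalled and replayed on the original after `finish_test()`. Keeping the backup's operation set small is what keeps the added area small.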

Page 24:

Naïve Approaches Inadequate for Uncore

Simple stall-and-test technique

[Figure: OS timer interrupt handlers on core 1 through core i issue requests to DRAM; the DRAM controller is under test, so the handlers stall]

Demonstration on actual 4-core Intel® Core™ i7 system

Infrequent test: noticeable unresponsiveness

Frequent test: system hang

Identical backup units: 12% area overhead

Page 25:

Performance Impact

Simulated Latency Overhead (PARSEC Benchmark Suite)

Tool: GEMS simulator (modified for RRS)

Workload: PARSEC benchmark suite

4 threads on 4 cores, CASP runs 1 sec. every 10 sec.
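
As a rough guide to reading this setup, the test schedule above gives a 10% duty cycle. The sketch below is a first-order estimate only, assuming the slowdown applies solely while a test is running; `average_overhead` and the 20% slowdown used in the usage note are hypothetical, not measured values.

```python
def average_overhead(test_seconds: float, period_seconds: float,
                     slowdown_during_test: float) -> float:
    """Average fractional slowdown when the slowdown applies only during test."""
    if not 0 < test_seconds <= period_seconds:
        raise ValueError("test_seconds must be in (0, period_seconds]")
    duty_cycle = test_seconds / period_seconds   # fraction of time under test
    return duty_cycle * slowdown_during_test
```

For instance, `average_overhead(1, 10, 0.20)` is about 0.02: a hypothetical 20% slowdown while a component is under test would average out to roughly 2% over the whole run.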

Page 26:

III. Smart Backup

Operations with different requirements

Backup unit for performance-critical operations

Absolute minimal additional hardware

[Figure: OpenSPARC T2 I/O interface operations (DMA for network, DMA for disks, programmed I/O), each either supported in the smart backup or stalled / handled slowly via programmed I/O]

[Figure: OpenSPARC T2 Ethernet port interface (network interface) operations (Layer 2 packet processing, Layers 3 and 4 acceleration), each either supported in the smart backup or handled via OS orchestration]