CrashDump 2010-04-20 1 Microsoft Corporation. Agenda Problem Unexplainable field phenomena New...

Post on 17-Jan-2016


Microsoft Corporation 1

CrashDump

2010-04-20


Agenda

• Problem
  • Unexplainable field phenomena
• New Developments
  • in CrashDump
• Solution
  • How to get to ‘The Why’?
• Opportunity
  • New work and timelines


Unexplainable Field Phenomena

• All of these devices worked normally after reboot.
• No defects were found by file system scan.


RE Break 7729 0 x86fre fbl_core1 100318-2018 ps-ps Timeout during installing break in webio dll.msg

RE I O Stress Failure # 60589.msg

RE Bugcheck KERNEL_DATA_INPAGE_ERROR (7a).msg


Unexplainable Field Phenomena

• Microsoft has been tracking these for years
  • Things aren’t getting better
  • Customers expect a solution from us
  • We have nothing to give them
• There are 2 theories
  • The device has a flaw
  • The device has been mishandled


Theory #1: The device has a flaw

• Goal: Address the flaw.
• Assumption:
  • ATA devices are sophisticated enough to perform their own internal ‘crashdump’.
  • Microsoft is not able to address these issues alone.
• Digression: Microsoft attempts to do this with the millions of crash reports it receives every day.
  • In general, user-mode crashes are available to partners from http://winqual.microsoft.com through different portals.
  • Partners with kernel-mode drivers can download ~50 randomly selected CABs for a given bucket through the WER portal.
  • Partners only receive external mini dumps. Full dumps and internal crashes may only be given out by selected groups.
  • Kernel-mode crashes are typically driver issues that cause Blue Screens of Death or reset the machine. Analysis of the data has found that device failure is a significant source of perceived “driver issues”.


Agenda

• Problem
  • Unexplainable field phenomena
• New Developments
  • in CrashDump
• Solution
  • How to get to ‘The Why’?
• Opportunity
  • New work and timelines


Windows devices customer experience data flow

[Diagram labels: End user; IHV; Cloud Services (OCA, SQM, RAC); MS info; Vendor info; Improved reliability of Windows storage experience]


Response Example

OCA process and workflow

• Crash occurs on the client
• WER client collects crash data
• Microsoft shares data with software developers
• Software developers troubleshoot
• Software developers respond to Microsoft and Customer


OCA’s Expanding Focus

[Diagram labels: MSFT; +ISVs; +Drivers; +Devices]


Theory #2: The device has been mishandled

• Goal: Enable proper device handling.
• Assumptions:
  • The device has background scan information about internal issues, error handling, and the results of attempted corrections.
  • This background scan information would be useful to manufacturers if there were a method for delivering it from actively deployed systems.
  • Background scanning can result in actionable requests from devices, improving robustness and raising handling issues to the user’s attention.


Agenda

• Problem
  • Unexplainable field phenomena
• New Developments
  • in CrashDump
• Solution
  • How to get to ‘The Why’?
• Opportunity
  • New work and timelines


How to get to ‘The Why’

• How to transport?
  • Time limited
  • Size negotiation
  • Security
• When to transport?
  • Host triggers
  • Device triggers
  • Dump persistence and recycling
• What to transport?
  • Bucketization
  • Device CrashDump (flavors?)
  • Background scan info
  • Does DSM affect collection content?
• How big are we willing to let this feature become?
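The "time limited" and "size negotiation" transport questions above can be made concrete with a small host-side sketch. Nothing here comes from the slides or from any ATA specification: `DumpOffer`, `negotiate_chunk_size`, `fetch_dump`, and the `read_chunk` callback are hypothetical names, and the policy (agree on the smaller chunk limit, stop when a time budget expires, keep the partial result) is only one possible answer to the open questions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DumpOffer:
    """Hypothetical description of a dump the device is holding."""
    total_bytes: int       # size of the stored dump
    max_chunk_bytes: int   # largest chunk the device returns per request


def negotiate_chunk_size(offer: DumpOffer, host_max_chunk: int) -> int:
    """Size negotiation: both sides settle on the smaller of their limits."""
    return min(offer.max_chunk_bytes, host_max_chunk)


def fetch_dump(
    offer: DumpOffer,
    read_chunk: Callable[[int, int], bytes],  # (offset, length) -> data
    host_max_chunk: int,
    time_budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[bytes, bool]:
    """Read the dump chunk by chunk until done or the time budget expires.

    Returns (data, complete). A partial result is kept so that a later
    attempt could resume at offset len(data).
    """
    chunk = negotiate_chunk_size(offer, host_max_chunk)
    deadline = clock() + time_budget_s
    data = bytearray()
    while len(data) < offer.total_bytes and clock() < deadline:
        want = min(chunk, offer.total_bytes - len(data))
        data += read_chunk(len(data), want)
    return bytes(data), len(data) == offer.total_bytes
```

In this sketch an incomplete transfer is reported, not discarded, which ties into the "dump persistence and recycling" question: the device would need to keep the dump until the host has read all of it.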


Background Scan Coordination Components

• Idle time notification
• Power event notification
• Background Scan vs. Power policy precedence
• Host/Device Event synchronization (TimeStamped)
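One way to picture timestamped host/device event synchronization is as a merge of two time-ordered logs onto a single clock. The sketch below is illustrative only: the `Event` record, the event names, and the fixed `device_clock_offset_s` correction are all assumptions. The slides do not say how device time would be related to host time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds; device events use the device's own clock
    source: str       # "host" or "device"
    name: str         # hypothetical event name


def merge_timelines(
    host_events: Iterable[Event],
    device_events: Iterable[Event],
    device_clock_offset_s: float = 0.0,
) -> Iterator[Event]:
    """Merge two logs, each already sorted by time, into one timeline.

    Device timestamps are shifted by a known offset so that both logs
    are expressed on the host clock before merging.
    """
    shifted = (
        e._replace(timestamp=e.timestamp + device_clock_offset_s)
        for e in device_events
    )
    return heapq.merge(host_events, shifted, key=lambda e: e.timestamp)


host_log = [Event(10.0, "host", "idle_start"), Event(30.0, "host", "power_suspend")]
device_log = [Event(2.0, "device", "scan_start"), Event(12.0, "device", "scan_error")]

for event in merge_timelines(host_log, device_log, device_clock_offset_s=10.0):
    print(f"{event.timestamp:6.1f}  {event.source:6}  {event.name}")
```

With a combined timeline like this, a host could see whether a background scan event fell inside an idle window or ran up against a power transition, which is the kind of ordering the precedence question depends on.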


Background Scan Coordination Considerations


Agenda

• Problem
  • Unexplainable field phenomena
• New Developments
  • in CrashDump
• Solution
  • How to get to ‘The Why’?
• Opportunity
  • New work and timelines


New work and Timelines

• Call for feedback, now.
• Proposal for T13 in June
• Approval in August
