Tolerating Memory Leaks

75
Tolerating Memory Leaks Michael D. Bond Kathryn S. McKinley

description

Tolerating Memory Leaks. Michael D. Bond Kathryn S. McKinley. Bugs in Deployed Software. Deployed software fails Different environment and inputs  different behaviors Greater complexity & reliance. Bugs in Deployed Software. Deployed software fails - PowerPoint PPT Presentation

Transcript of Tolerating Memory Leaks

Tolerating Memory Leaks

Tolerating Memory LeaksMichael D. Bond Kathryn S. McKinley

Bugs in Deployed SoftwareDeployed software failsDifferent environment and inputs different behaviorsGreater complexity & reliance

Bugs in Deployed SoftwareDeployed software failsDifferent environment and inputs different behaviorsGreater complexity & reliance

Memory leaks are a real problemFixing leaks is hardMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themLiveReachableDeadMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themLiveReachableDeadMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themLiveReachableDeadMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themLiveReachableDeadMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themSlow & crash real programsLiveDeadMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themSlow & crash real programsUnacceptable for some applicationsMemory Leaks in Deployed SystemsMemory leaks are a real problemManaged languages do not eliminate themSlow & crash real programsUnacceptable for some applications

Fixing leaks is hardLeaks take time to materializeFailure far from cause

Examplehttp://www.codeproject.com/KB/showcase/IfOnlyWedUsedANTSProfiler.aspxDriverless truck10,000 lines of C#Leak: past obstacles remained reachableNo immediate symptomsThis problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles.Quick fix: after 40 minutes, stop & rebootEnvironment sensitiveMore obstacles in deployment: failed in 28 minutes

Examplehttp://www.codeproject.com/KB/showcase/IfOnlyWedUsedANTSProfiler.aspxDriverless truck10,000 lines of C#Leak: past obstacles remained reachableNo immediate symptomsThis problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles.Quick fix: restart after 40 minutesEnvironment sensitiveMore obstacles in deploymentFailed in 28 minutes

Examplehttp://www.codeproject.com/KB/showcase/IfOnlyWedUsedANTSProfiler.aspxDriverless truck10,000 lines of C#Leak: past obstacles remained reachableNo immediate symptomsThis problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles.Quick fix: restart after 40 minutesEnvironment sensitiveMore obstacles in deploymentFailed in 28 minutes

ExampleDriverless truck10,000 lines of C#Leak: past obstacles remained reachableNo immediate symptomsThis problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles.Quick fix: restart after 40 minutesEnvironment sensitiveMore obstacles in deploymentUnresponsive after 28 minuteshttp://www.codeproject.com/KB/showcase/IfOnlyWedUsedANTSProfiler.aspxUncertainty in Deployed SoftwareUnknown leaks; unexpected failuresOnline leak diagnosis helpsToo late to help failing systemsUncertainty in Deployed SoftwareUnknown leaks; unexpected failuresOnline leak diagnosis helpsToo late to help failing systemsAlso tolerate leaksUncertainty in Deployed SoftwareUnknown leaks; unexpected failuresOnline leak diagnosis helpsToo late to help failing systemsAlso tolerate leaksIllusion of fixUncertainty in Deployed SoftwareUnknown leaks; unexpected failuresOnline leak diagnosis helpsToo late to help failing systemsAlso tolerate leaksIllusion of fixEliminate bad effectsDont slowDont crashUncertainty in Deployed SoftwareUnknown leaks; unexpected failuresOnline leak diagnosis helpsToo late to help failing systemsAlso tolerate leaksIllusion of fixEliminate bad effectsPreserve semanticsDont slowDont crashDefer OOM errorsPredicting the FutureDead objects not used againHighly stale objects likely leakedLiveReachableDeadPredicting the FutureDead objects not used againHighly stale objects likely leaked

[Chilimbi & Hauswirth 04][Qin et al. 05][Bond & McKinley 06]LiveReachableDeadTolerating Leaks with MeltMove highly stale objects to diskMuch larger than memoryTime & space proportional to live memoryPreserve semantics

Stale objectsIn-use objectsStale objectsSounds like Paging!

Stale objectsIn-use objectsStale objectsSounds like Paging!Paging insufficient for managed languagesNeed object granularity

GCs working set is all reachable objectsSounds like Paging!Paging insufficient for managed languagesNeed object granularity

GCs working set is all reachable objects

Bookmarking collection [Hertz et al. 05]Challenge #1: How does Melt identify stale objects?rootsAEBCFD

rootsAEBCFDGC:for all fields a.f a.f |= 0x1;

Challenge #1: How does Melt identify stale objects?rootsGC:for all fields a.f a.f |= 0x1;

AEBCFDChallenge #1: How does Melt identify stale objects?rootsGC:for all fields a.f a.f |= 0x1;

Application:b = a.f;if (b & 0x1) { b &= ~0x1; a.f = b; [atomic]}

AEBCFDChallenge #1: How does Melt identify stale objects?rootsAEBCFDGC:for all fields a.f a.f |= 0x1;

Application:b = a.f;if (b & 0x1) { b &= ~0x1; a.f = b; [atomic]}

Add 6% to application timeChallenge #1: How does Melt identify stale objects?rootsGC:for all fields a.f a.f |= 0x1;

Application:b = a.f;if (b & 0x1) { b &= ~0x1; a.f = b; [atomic]}

AEFDCBChallenge #1: How does Melt identify stale objects?rootsGC:for all fields a.f a.f |= 0x1;

Application:b = a.f;if (b & 0x1) { b &= ~0x1; a.f = b; [atomic]}

AEFDBCChallenge #1: How does Melt identify stale objects?Stale Spacestale space

roots

AEFDBCin-use spaceHeap nearly full move stale objects to diskStale Spacerootsin-use spacestale spaceAEBFCD

Challenge #2rootsin-use spacestale spaceAEBFCD

How does Melt maintain pointers?Stub-Scion Pairsrootsin-use spacestale spaceAEBFCD

BstubBscionscionspaceStub-Scion Pairsrootsin-use spacestale spaceAEBFCD

BstubBscionscionspaceB BscionsciontableStub-Scion Pairsrootsin-use spacestale spaceAEBFCD

BstubBscionscionspaceB Bscionsciontable?Scion-Referenced Object Becomes Stalerootsin-use spacestale spacescionspaceB BscionsciontableAEFCDBstubBscionB

Scion-Referenced Object Becomes Stalerootsin-use spacestale spacescionspacesciontableAEFCD

BstubBrootsin-use spacestale spacescionspacesciontableAEFCD

BstubBChallenge #3What if program accesses highly stale object?Application Accesses Stale Objectrootsin-use spacestale spacescionspacesciontableAEFCD

BstubB b = a.f; if (b & 0x1) { b &= ~0x1; if (inStaleSpace(b)) b = activate(b); a.f = b; [atomic] }Application Accesses Stale Objectrootsin-use spacestale spacescionspaceC CscionsciontableEFCstubDABstubBCCscion

Application Accesses Stale Objectrootsin-use spacestale spacescionspaceC CscionsciontableFCstubDABstubBCscion

ECApplication Accesses Stale Objectrootsin-use spacestale spacescionspaceC CscionsciontableFCstubDABstubBCscion

ECImplementationIntegrated into Jikes RVM 2.9.2Works with any tracing collectorEvaluation uses generational copying collectorImplementationIntegrated into Jikes RVM 2.9.2Works with any tracing collectorEvaluation uses generational copying collector64-bit120 GB

32-bit2 GB

ImplementationIntegrated into Jikes RVM 2.9.2Works with any tracing collectorEvaluation uses generational copying collector64-bit120 GB

mappingstub32-bit2 GBPerformance EvaluationMethodologyDaCapo, SPECjbb2000, SPECjvm98Dual-core Pentium 4Deterministic execution (replay)Results6% overhead (read barriers)Stress test: still 6% overheadSpeedups in tight heaps (reduced GC workload)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Tolerating LeaksLeakMelts effectEclipse DiffTolerates until 24-hr limit (1,000X longer)Eclipse Copy-PasteTolerates until 24-hr limit (194X longer)JbbModTolerates until 20-hr crash (19X longer)ListLeakTolerates until disk full (200X longer)SwapLeakTolerates until disk full (1,000X longer)MySQLSome highly stale but in-use (74X longer)Delaunay MeshShort-runningDualLeakHeap growth is in-use (2X longer)SPECjbb2000Heap growth is mostly in-use (2X longer)Mckoi DatabaseThread leak: extra support needed (2X longer)Eclipse Diff: Reachable Memory

Eclipse Diff: Reachable Memory

Eclipse Diff: Performance

Eclipse Diff: Performance

Eclipse Diff: Performance

Managed [LeakSurvivor, Tang et al. 08] [Panacea, Goldstein et al. 07, Breitgand et al. 07]Dont guarantee time & space proportional to live memory

Native [Cyclic memory allocation, Nguyen & Rinard 07] [Plug, Novark et al. 08]Different challenges & opportunitiesLess coverage or change semantics

Orthogonal persistence & distributed GCBarriers, swizzling, object faulting, stub-scion pairs

Related WorkConclusionFinding bugs before deployment is hard

ConclusionFinding bugs before deployment is hard

Online diagnosis helps developersHelp users in meantime

Tolerate leaks with Melt: illusion of fix

Stale objectsConclusionFinding bugs before deployment is hard

Online diagnosis helps developersHelp users in meantime

Tolerate leaks with Melt: illusion of fixTime & space proportional to live memoryPreserve semanticsConclusionFinding bugs before deployment is hard

Online diagnosis helps developersHelp users in meantime

Tolerate leaks with Melt: illusion of fixTime & space proportional to live memoryPreserve semanticsBuys developers time to fix leaksThank you!BackupTriggering MeltINACTIVEMARK STALEMOVE & MARK STALEWAITHeap not nearly fullHeap full ornearly fullHeap full ornearly fullStartExpected heapfullnessHeap notnearly fullAftermarkingUnexpectedheap fullnessBackConclusionFinding bugs before deployment is hard

Online diagnosis helps developersTo help users in meantime, tolerate bugs

Tolerate leaks with Melt: illusion of fix

Stale objectsRelated Work: Tolerating BugsNondeterministic errors [Atom-Aid] [DieHard] [Grace] [Rx]Memory corruption: perturb layoutConcurrency bugs: perturb schedulingGeneral bugsIgnore failing operations [FOC]Need higher level, more proactive approaches

Melts GC OverheadBack