
Security-Driven Heuristics and A Fast Genetic Algorithm for Trusted Grid Job Scheduling

Shanshan Song, Ricky Kwok, and Kai Hwang

University of Southern California, Los Angeles, CA 90089 USA

Presented by Shanshan Song at the IEEE IPDPS’05, Denver, Colorado, April 6, 2005

The work was supported by the NSF ITR Grant 0325409

http://GridSec.usc.edu

Presentation Outline:

  • Motivations
  • The System Model
  • Three security-driven scheduling strategies
      – To bind security to existing time-driven heuristics for parallel job scheduling
  • A New Space-Time Genetic Algorithm (STGA)
  • Performance Metrics and Workloads
  • NAS and PSA Benchmark Results
  • Conclusions


Motivations

  • Highly shared Grid resources create severe insecurity problems and privacy concerns.
  • Most schedulers ignore the ‘risk’ factor when scheduling a large number of jobs in a risky real-life Grid environment.


[Figure: Parallel Job Scheduling Scenario in Risky Computational Grids. Labels in the diagram: Deterministic, Adaptive, Historical database, Security-Driven Model. Legend: high-secure vs. low-secure site; high-demand vs. low-demand job.]


[Figure: three panels, (a) Secure, (b) Risky, (c) f-Risky, with a historical database.]

Captions in the figure:

  • “The bad thing always could happen” (Murphy’s Law)
  • “We are scared, let us just wait.”
  • “We don’t care, just do it. I am courageous, not a kid anymore …”
  • “I calculate too, maybe I am lucky …”
  • “I run a calculated risk, but wait a while …”


Three Scheduling Modes

  • Secure mode – Allocate jobs only to those Grid sites with security level exceeding the job requirement (SD < SL), where SD is the job's security demand and SL is the site's security level
  • Risky mode – Allocate jobs to any available Grid sites without checking the risk level or the job demand
  • f-risky mode – Allocate jobs to those Grid sites taking at most f risk. E.g.: f = 0.5 (50%)

Risk Scale (0 to f to 100%):

  Secure: P(fail) = 0      f-Risky: P(fail) ≤ f      Risky: P(fail) ≤ 100%

The Failure Model:

\[
P(\mathrm{fail}) =
\begin{cases}
0 & \text{if } SD \le SL \\
1 - e^{-\lambda (SD - SL)} & \text{if } SD > SL
\end{cases}
\]
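
The failure model and the three modes can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the exponential form P(fail) = 1 − e^{−λ(SD − SL)} for SD > SL, treats SD = SL as safe, and uses a placeholder rate λ = 1.0. The function names `p_fail` and `admissible` are hypothetical.

```python
import math

def p_fail(sd: float, sl: float, lam: float = 1.0) -> float:
    """Failure probability of a job with security demand sd on a site
    with security level sl. lam is an assumed rate parameter."""
    if sd <= sl:
        return 0.0
    return 1.0 - math.exp(-lam * (sd - sl))

def admissible(sd: float, sl: float, mode: str,
               f: float = 0.5, lam: float = 1.0) -> bool:
    """Whether a site may host the job under the given scheduling mode."""
    if mode == "secure":      # never take any risk
        return sd <= sl
    if mode == "risky":       # accept any site
        return True
    if mode == "f-risky":     # accept sites whose risk is at most f
        return p_fail(sd, sl, lam) <= f
    raise ValueError(f"unknown mode: {mode}")

# A demanding job (SD = 0.9) on a weakly protected site (SL = 0.4).
risk = p_fail(0.9, 0.4)
choices = {m: admissible(0.9, 0.4, m) for m in ("secure", "f-risky", "risky")}
```

With these assumed numbers the risk is 1 − e^{−0.5}, roughly 0.39, so the site is rejected in secure mode but accepted in both the 0.5-risky and the risky mode.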


Scheduling Heuristics under Three Risky Modes

  • Min-Min heuristic: For each job, the resource site that gives the earliest expected completion time is determined first. The job that has the minimum earliest expected completion time is determined and then assigned to the corresponding site.
  • Sufferage heuristic: The Sufferage heuristic is based on the idea that better mappings can be generated by assigning a site to a job that would “suffer” most in terms of expected completion time if that particular site is not assigned to it.
  • Heuristic operational modes: Secure, f-Risky, Risky


Genetic Algorithm (GA)

  • Genetic Algorithm (GA) is a popular technique used for searching large solution spaces
  • It is powerful for generating good solutions
  • It is not widely deployed because of its long computation time

[Figure: Traditional GA vs. STGA in terms of Number of Evolution Iterations. Axes: Solution Quality versus Number of Evolution Iterations. Curves: GA, STGA. Marked points: “Generate random initial population”, “STGA starting point”, “Good solution is found”.]


How does STGA Work?

STGA: Space-Time Genetic Algorithm

[Figure: STGA workflow. Elements shown: a lookup table with Input and Solution columns, e.g. (%%, **, ###) paired with (423…56) and (%%, ****, ###) paired with (368…89); one batch of jobs (%%%, ***, ####); an initial population that includes randomly generated solutions; the GA; the final solution. Solution strings in the diagram include (456 … 34), (167 … 89), and (123 … 786).]
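
The idea of seeding the GA from a lookup table can be sketched as follows. This is an assumption-laden illustration rather than the STGA implementation: the batch descriptor (one number per job), the squared-distance `similarity` measure, the table layout as (descriptor, solution) pairs, and the function `seed_population` are all hypothetical.

```python
import random

def similarity(a, b):
    """Hypothetical similarity: negative squared distance between
    two batch descriptors of equal length."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def seed_population(batch, table, n_sites, pop_size, n_seeds, rng):
    """Initial population = stored solutions of the most similar past
    batches, topped up with randomly generated solutions."""
    ranked = sorted(table, key=lambda entry: similarity(batch, entry[0]),
                    reverse=True)
    population = [list(solution) for _, solution in ranked[:n_seeds]]
    n_jobs = len(batch)
    while len(population) < pop_size:
        population.append([rng.randrange(n_sites) for _ in range(n_jobs)])
    return population[:pop_size]

# Two past batches with the site chosen for each of their three jobs.
table = [((4.0, 2.0, 3.0), [0, 1, 1]),
         ((9.0, 9.0, 1.0), [2, 0, 1])]
population = seed_population((4.0, 2.5, 3.0), table, n_sites=3,
                             pop_size=4, n_seeds=1, rng=random.Random(0))
```

The first chromosome is the stored solution of the closest past batch; the remaining three are random. Starting the GA from such a population is what the slide's figure contrasts with a purely random start.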


STGA Convergence Time

[Figure: makespan (seconds, axis from 150000 to 170000) versus number of iterations in STGA (0 to 200), PSA workload, N = 1000. Annotation: “Converge at 50 iterations, FAST!!!”]


Performance Metrics and Workloads

  • Performance Metrics
      – Makespan, slowdown ratio, and average response time
      – Site utilization
      – Number of failed jobs & number of risk-taking jobs
  • Numerical Aerodynamic Simulation (NAS) Workload
      – A package contains three months' worth of sanitized accounting records for the 128-node iPSC/860 located in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center.
  • Parameter Sweep Application (PSA) Workload
      – Contains a set of independent tasks
      – Each task has some input files for different parameters


Performance Results (Makespan)

  • NAS trace workload (16000 jobs, 12 sites)
  • Job arrival rate and workload are from trace data
  • STGA evolution iterations: 100 (GA: 1000 iterations)


Performance Results (Response Time)

  • NAS trace workload (16000 jobs, 12 sites)
  • Job arrival rate and workload are from trace data


Performance Results (Utilization)

  • NAS trace workload (16000 jobs, 12 sites)
  • Job arrival rate and workload are from trace data


Scalability Analysis

  • The scalability analysis is conducted on the number of simulated jobs (PSA workload)
  • N = 1000, 2000, 5000, and 10000


Conclusions

  • The security binding technique can be applied to improve any time-driven heuristic for online scheduling of parallel jobs in an open, risky Grid computing environment.
  • The new STGA algorithm works by swiftly generating good scheduling solutions based on prior job execution experience on Grid platforms. Both NAS and PSA benchmark results show the superiority of STGA over the heuristic algorithms applied.


Min-Min and Sufferage Heuristics

  • Min-Min heuristic: For each job, the resource site that gives the earliest expected completion time is determined first. The job that has the minimum earliest expected completion time is determined and then assigned to the corresponding site.
  • Sufferage heuristic: The Sufferage heuristic is based on the idea that better mappings can be generated by assigning a site to a job that would “suffer” most in terms of expected completion time if that particular site is not assigned to it.

Expected Time to Complete Matrix (shown once for each heuristic on the slide):

                  Job1   Job2   Job3
  Site1              3      5      7
  Site2              2      4      3
  Site3              6      9     10
  Suffer value       1      1      4
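
The first decision of each heuristic on the slide's matrix can be reproduced with a short Python sketch. It covers only that first pick (it does not update site ready times for later rounds), and the helper names are illustrative rather than taken from the talk.

```python
# Expected time to complete, ETC[site][job], copied from the slide.
ETC = {
    "Site1": {"Job1": 3, "Job2": 5, "Job3": 7},
    "Site2": {"Job1": 2, "Job2": 4, "Job3": 3},
    "Site3": {"Job1": 6, "Job2": 9, "Job3": 10},
}
JOBS = ["Job1", "Job2", "Job3"]

def best_site(job):
    """Site giving the earliest expected completion time for the job."""
    return min(ETC, key=lambda site: ETC[site][job])

def sufferage(job):
    """Gap between the second-best and best completion time of the job."""
    times = sorted(ETC[site][job] for site in ETC)
    return times[1] - times[0]

def min_min_pick():
    """Min-Min: the job with the smallest best completion time goes first."""
    job = min(JOBS, key=lambda j: ETC[best_site(j)][j])
    return job, best_site(job)

def sufferage_pick():
    """Sufferage: the job that would suffer most goes first."""
    job = max(JOBS, key=sufferage)
    return job, best_site(job)

suffer_values = [sufferage(j) for j in JOBS]   # [1, 1, 4]
first_min_min = min_min_pick()                 # ("Job1", "Site2")
first_sufferage = sufferage_pick()             # ("Job3", "Site2")
```

All three jobs prefer Site2. Min-Min gives it to Job1, whose best time (2) is the smallest, while Sufferage gives it to Job3, which would lose 4 time units if pushed to its second-best site.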


Genetic Algorithm Overview

  • Genetic Algorithms (GAs) are a popular technique used for searching large solution spaces
  • ‘Selection’, ‘crossover’, and ‘mutation’ operations
      – Selection: keep good solutions
      – Crossover: global optimization
      – Mutation: local jumping

[Figure: a population of four bit-string chromosomes, with the value shown under each chromosome, at each stage of one generation. Initial population: 0.3, 0.6, 0.9, 0.6. After selection: 0.9, 0.6, 0.9, 0.6. After crossover: 1.0, 0.4, 0.9, 0.6. After mutation: 1.0, 0.4, 0.8, 0.6.]
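
One generation of selection, crossover, and mutation on bit strings can be sketched as below. The specific operator choices (fitness-proportional selection, one-point crossover, bit-flip mutation), the mutation rate, and the fraction-of-ones fitness are assumptions for illustration; the slide does not say which variants were used. The sketch also assumes an even population size and at least one chromosome with non-zero fitness.

```python
import random

def next_generation(population, fitness, rng, p_mut=0.05):
    """Apply selection, crossover, and mutation once."""
    # Selection: fitter chromosomes are more likely to be kept.
    weights = [fitness(c) for c in population]
    parents = rng.choices(population, weights=weights, k=len(population))

    # Crossover: consecutive parents swap tails at a random cut point.
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        cut = rng.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])
        children.append(b[:cut] + a[cut:])

    # Mutation: each bit flips with a small probability.
    return [[bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            for child in children]

def ones_fraction(chromosome):
    """Toy fitness: fraction of bits set to 1."""
    return sum(chromosome) / len(chromosome)

start = [[0, 1, 0, 1, 0], [1, 1, 0, 0, 1], [0, 0, 0, 1, 0], [1, 0, 0, 1, 0]]
after = next_generation(start, ones_fraction, random.Random(1))
```

The result has the same size and chromosome length as the input; which bits change depends on the random seed.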


How does GA apply to job scheduling?

  • What we have:
      – A set of resource sites
      – A number of jobs
  • Solution to generate:
      – Job and site mapping

One solution (chromosome in GA):

  Job1    Job2    Job3    Job4    Job5
  site4   site3   site5   site2   site2

Initial population (size=200):

  Job1    Job2    Job3    Job4    Job5
  site4   site3   site5   site2   site2
  site3   site1   site5   site3   site2
  site1   site3   site4   site2   site6
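
The job-to-site encoding can be illustrated with a small Python sketch in which a chromosome is a list holding one site index per job. The makespan function and the execution-time table `etc` are hypothetical: the slide gives no timing data, and this simplified makespan ignores job arrival times and site ready times.

```python
import random

def makespan(chromosome, etc):
    """Largest total load on any site, where chromosome[j] is the
    zero-based site index assigned to job j and etc[j][s] is the
    assumed run time of job j on site s."""
    load = {}
    for job, site in enumerate(chromosome):
        load[site] = load.get(site, 0) + etc[job][site]
    return max(load.values())

def random_population(n_jobs, n_sites, size, rng):
    """Randomly generated job-to-site mappings."""
    return [[rng.randrange(n_sites) for _ in range(n_jobs)]
            for _ in range(size)]

# The slide's example chromosome (site4, site3, site5, site2, site2),
# written with zero-based site indices.
chromosome = [3, 2, 4, 1, 1]

# Made-up run times: job j takes j + 1 time units on each of 6 sites.
etc = [[job + 1] * 6 for job in range(5)]

span = makespan(chromosome, etc)          # site2 runs Job4 and Job5: 4 + 5
population = random_population(5, 6, 200, random.Random(0))
```

With these made-up run times the makespan is 9, set by the site that hosts both Job4 and Job5. A GA would use such a value (lower is better) to rank the 200 chromosomes of the initial population.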