A Class to Manage Large Ensembles and Batch Execution in Python

PyCon Canada

Andre R. Erler

November 12th, 2016

Andre R. Erler ([email protected]) Large Ensembles and Batch Execution with Python


Outline

Introduction
    Science is Repetitive
    What I do

Batch Execution using an Ensemble Class
    The Ensemble Class
    A Helper Class

Argument Expansion
    Outer Product Implementation

Summary & Conclusion


Science is Repetitive

To reach conclusive results, scientific experiments usually have to be repeated many times, either to establish statistical significance or to test a range of parameter values for optimization.

Experiments are planned and conducted in large batches, so-called ensembles.

Automation

It is therefore desirable to automate the most repetitive tasks and to create tools for this purpose.


Coupling Climate Models with Hydrologic Models

I run climate and hydrologic models to study the impact of climate change on water resources and to generate projections of future hydro-climate.

[Figure: Surface temperature in a global and a nested regional climate model]

[Figure: Athabasca River watershed: groundwater depth (top) and surface water depth (bottom)]


High Performance Computing

- High-resolution climate simulations:
    - 4 days on 128 cores and 300 GB of storage per model year
    - 36 ensemble members, 15 years each
- Surface-subsurface hydrologic simulations:
    - 1 day on 2 cores per model year
    - also 15 years each, 100+ ensemble members


Motivation: Batch Processing

- In Computational Sciences repetitive tasks can be automated/scripted

Python is an Ideal Scripting Language

    ensemble = [...]  # a list of objects ("members")

    # for loop iterating over list
    tmp = []  # store results
    for member in ensemble:  # iterate over list
        result = member.operation(*args, **kwargs)
        tmp.append(result)
    ensemble = tmp

    # list comprehension is already much shorter!
    ensemble = [m.operation(*args, **kwargs) for m in ensemble]

Boilerplate Code

Python simplifies scripting a lot, but we still have a lot of boilerplate code! This can be simplified further.


Motivation: Batch Processing

- In Computational Sciences repetitive tasks can be automated/scripted

The Ensemble Class

- Emulate a container type
- Redirect method calls to ensemble members

An Ideal Use-case Example

    ensemble = Ensemble(*[...])  # create Ensemble object

    # apply member methods to entire ensemble
    ensemble = ensemble.operation_1(*args, **kwargs)
    ...
    ensemble = ensemble.operation_N(*args, **kwargs)

    member_N = ensemble[n]  # access elements by index
    member_key = ensemble[key]  # ... or by name/key
    ...


The Ensemble Class

Implementation Snippet

    class Ensemble(object):

        _members = None  # members
        ...

        def __getitem__(self, i):
            # get individual members
            if isinstance(i, int):
                # access like list/tuple
                return self._members[i]
            elif isinstance(i, str):
                ...

        def __iter__(self):
            # iterate over members
            mm = self._members
            return mm.__iter__()

        ...

Emulating the Python Container Type:

1. Support several built-in methods, such as __len__, __contains__, __iter__
2. Item access and assignment like list or dict using __getitem__ and __setitem__

Return Values

Calls to member methods return a new container or Ensemble with the results.
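The container emulation listed above can be sketched as a small runnable class. This is a minimal illustration, not the original implementation: the `_index` helper, the `Run` member class, and the assumption that members carry a `name` attribute used as the string key are all hypothetical.

```python
class Ensemble(object):
    """Minimal container sketch (illustrative, not the original class)."""

    def __init__(self, *members):
        self._members = list(members)

    def __len__(self):
        return len(self._members)

    def __contains__(self, member):
        return member in self._members

    def __iter__(self):
        return iter(self._members)

    def _index(self, key):
        # Translate an integer index or a member name into a list position.
        if isinstance(key, int):
            return key
        if isinstance(key, str):
            for pos, member in enumerate(self._members):
                # Assumption: members carry a 'name' attribute used as key.
                if getattr(member, 'name', None) == key:
                    return pos
            raise KeyError(key)
        raise TypeError('index must be int or str')

    def __getitem__(self, key):
        return self._members[self._index(key)]

    def __setitem__(self, key, member):
        self._members[self._index(key)] = member


class Run(object):
    """Hypothetical ensemble member, identified by name."""

    def __init__(self, name):
        self.name = name


ens = Ensemble(Run('ctrl'), Run('warm'))
ens['warm'] = Run('warm-v2')  # replace a member by name
print(len(ens), ens[0].name, ens[1].name)
```

Routing both `__getitem__` and `__setitem__` through one lookup helper keeps index and name access consistent; a real implementation might also support slices.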


The Ensemble Class

Implementation of Method Redirection:

1. Redirect calls to member methods/attributes by overloading __getattr__
2. Execute call on all Ensemble members
3. Return a new container or Ensemble with results

Ensemble Wrapper

Methods require the helper class EnsWrap to apply arguments.

Implementation Snippet

    class Ensemble(object):

        _members = None  # members
        ...

        def __getattr__(self, attr):
            # check if callable
            mem0 = self._members[0]
            # assuming homogeneity...
            f = getattr(mem0, attr)
            if callable(f):
                # return Ensemble Wrapper
                v = EnsWrap(self, attr)
            else:
                # just return values
                v = [getattr(m, attr) for m in self._members]
            return v
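To see the redirection work end to end, the following sketch combines a stripped-down `Ensemble` with a minimal `EnsWrap`. It assumes a homogeneous ensemble, as the snippet above does; the `Sim` member class and its `scale` method are hypothetical, and the guard for underscore-prefixed names is an addition that avoids accidental recursion on private attributes.

```python
class EnsWrap(object):
    """Callable that applies one member method to every ensemble member."""

    def __init__(self, ensemble, attr):
        self._ensemble = ensemble
        self._attr = attr

    def __call__(self, *args, **kwargs):
        results = [getattr(m, self._attr)(*args, **kwargs)
                   for m in self._ensemble]
        return Ensemble(*results)  # new Ensemble holding the results


class Ensemble(object):
    def __init__(self, *members):
        self._members = list(members)

    def __iter__(self):
        return iter(self._members)

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup on the Ensemble fails.
        if attr.startswith('_'):
            raise AttributeError(attr)
        first = self._members[0]  # assuming homogeneity
        if callable(getattr(first, attr)):
            return EnsWrap(self, attr)
        return [getattr(m, attr) for m in self._members]


class Sim(object):
    """Hypothetical member: holds a value and returns scaled copies."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def scale(self, factor=1):
        return Sim(self.name, self.value * factor)


ens = Ensemble(Sim('a', 1), Sim('b', 2))
scaled = ens.scale(factor=10)  # method call, redirected to every member
print(scaled.value)            # plain attribute, gathered into a list
```

Because `__getattr__` is consulted only after normal lookup fails, methods defined on `Ensemble` itself always take precedence over member methods of the same name.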


A Helper Class

Implementation Snippet

    class EnsWrap(object):

        ...

        def __init__(self, ens, attr):
            self._ensemble = ens  # members
            self._attr = attr  # member method

        def __call__(self, **kwargs):
            # iterate over members
            new = Ensemble()
            for m in self._ensemble:
                f = getattr(m, self._attr)
                # execute member method
                new.append(f(**kwargs))
            # return new ensemble
            return new

        ...

Implementation of the Ensemble Wrapper:

1. Initialize with ensemble members and the called attribute/method
2. Use the __call__ method to execute the member method with arguments

Parallelization

Simple parallelization using multiprocessing.Pool's apply_async can be applied.
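A rough sketch of how `apply_async` could be used to dispatch member calls concurrently is given below. To keep the example runnable without pickling concerns it uses `multiprocessing.pool.ThreadPool`, which offers the same `apply_async` interface as `multiprocessing.Pool`; switching to a process pool is an assumption that would additionally require picklable members. The `call_members` function and the `Job` class are hypothetical names.

```python
from multiprocessing.pool import ThreadPool


def call_members(members, attr, kwargs, workers=2):
    """Call method `attr` on each member concurrently; results keep member order."""
    pool = ThreadPool(processes=workers)
    try:
        # Submit one asynchronous call per member ...
        pending = [pool.apply_async(getattr(m, attr), kwds=kwargs)
                   for m in members]
        # ... then collect; get() blocks and re-raises worker exceptions.
        return [p.get() for p in pending]
    finally:
        pool.close()
        pool.join()


class Job(object):
    """Hypothetical member with one method to execute."""

    def __init__(self, offset):
        self.offset = offset

    def run(self, n=0):
        return self.offset + sum(range(n))


results = call_members([Job(0), Job(100)], 'run', {'n': 4})
print(results)  # [6, 106]
```

Collecting results in submission order means the output list lines up with the ensemble members, regardless of which call finishes first.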


How can we use Ensembles with Argument Lists?

A Trivial Case

    # this defeats the purpose
    members = [member.operation(arg1=arg) for arg in arg_list]
    Ensemble(*members)  # initialize new ensemble

    # a better solution: pass list directly
    ensemble.operation(arg1=arg_list, inner_list=['arg1'])

Argument lists can easily be implemented in the __call__ method of the ensemble wrapper EnsWrap by creating a list of arguments for each member:

    # construct argument list
    args_list = expandArgList(**kwargs)
    # loop over lists
    ens = self._ensemble
    for m, args in zip(ens, args_list):
        f = getattr(m, self._attr)
        # execute member method with args
        new.append(f(**args))
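One way the per-member argument sets for the `inner_list` case might be built is sketched here. The function name `expand_inner` and its signature are hypothetical and are not the `expandArgList` used above; the sketch only illustrates the zip-like pairing of list entries, with all other keyword arguments passed unchanged to every member.

```python
def expand_inner(kwargs, inner_list):
    """Build one keyword dict per position of the lists named in inner_list."""
    lengths = {len(kwargs[name]) for name in inner_list}
    if len(lengths) > 1:
        raise ValueError('inner_list arguments must have equal length')
    count = lengths.pop() if lengths else 1
    # Arguments not named in inner_list are shared by every member.
    fixed = {k: v for k, v in kwargs.items() if k not in inner_list}
    args_list = []
    for i in range(count):
        args = dict(fixed)
        for name in inner_list:
            args[name] = kwargs[name][i]  # i-th entry goes to the i-th member
        args_list.append(args)
    return args_list


args_list = expand_inner({'arg1': [1, 2, 3], 'mode': 'fast'}, ['arg1'])
for args in args_list:
    print(args)
```

Each resulting dict can be passed to one member as `f(**args)`, which matches the `zip(ens, args_list)` loop in the slide.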


How can we use Ensembles with Argument Lists?

A More Complex Case: the Outer Product List

    # again, this defeats the purpose
    arg_list = []
    for arg1 in arg_list1:  # construct arg_list from two lists
        for arg2 in arg_list2:  # i.e. all possible combinations
            arg_list.append(dict(arg1=arg1, arg2=arg2))

    # apply list to ensemble
    ensemble.operation(arg1=arg_list, inner_list=['arg1'])

    # a better solution is to expand the lists internally
    ensemble.operation(arg1=arg_list1, arg2=arg_list2,
                       outer_list=['arg1', 'arg2'])

The outer product expansion of multiple argument lists creates argument lists with all possible combinations of arguments. Inner product expansion works like Python's zip function. For example, two lists of three values each yield nine argument sets under outer product expansion, but only three under inner product expansion.
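The outer product can also be expressed with the standard library's `itertools.product`. The sketch below is an alternative to the recursive approach presented in the talk, not the author's implementation; `expand_outer` is a hypothetical name.

```python
from itertools import product


def expand_outer(kwargs, outer_list):
    """Return one keyword dict per combination of the lists named in outer_list."""
    # Arguments not named in outer_list are copied into every combination.
    fixed = {k: v for k, v in kwargs.items() if k not in outer_list}
    args_list = []
    for combo in product(*(kwargs[name] for name in outer_list)):
        args = dict(fixed)
        args.update(zip(outer_list, combo))
        args_list.append(args)
    return args_list


combos = expand_outer({'arg1': [1, 2], 'arg2': ['a', 'b'], 'mode': 'fast'},
                      ['arg1', 'arg2'])
print(len(combos))  # 4
print(combos[0])
```

Here the last name in `outer_list` varies fastest, following the ordering of `itertools.product`.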


Argument Expansion via Outer Product

Recursive Implementation of Outer Product:

1. Separate expansion arguments from others
2. Recursively expand argument list
3. Generate argument set for each ensemble member

Decorator Class

Argument expansion is most useful as a decorator class.

Implementation of Recursion

    def expandArgsList(args_list, exp_args, kwargs):
        # check recursion condition
        if len(exp_args) > 0:
            # expand arguments
            now_arg = exp_args.pop(0)
            new_list = []  # new arg list
            for narg in kwargs[now_arg]:
                for arg_list in args_list:
                    # extend a copy, so that combinations do not share one list
                    new_list.append(arg_list + [narg])
            # next recursion level
            args_list = expandArgsList(new_list, exp_args, kwargs)
        ...
        # terminate: return arg lists
        return args_list
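The decorator idea mentioned above could look roughly like the following. This is a sketch under assumptions: the class name `ExpandArgs`, the `outer_list` keyword handling, and the `run` example function are hypothetical, and `itertools.product` stands in for the recursive expansion.

```python
from functools import update_wrapper
from itertools import product


class ExpandArgs(object):
    """Decorator class: call the wrapped function once per argument combination."""

    def __init__(self, func):
        self._func = func
        update_wrapper(self, func)  # keep the wrapped function's name and docstring

    def __call__(self, outer_list=(), **kwargs):
        names = list(outer_list)
        # Arguments not named in outer_list are passed unchanged to every call.
        fixed = {k: v for k, v in kwargs.items() if k not in names}
        results = []
        for combo in product(*(kwargs[name] for name in names)):
            call_kwargs = dict(fixed)
            call_kwargs.update(zip(names, combo))
            results.append(self._func(**call_kwargs))
        return results


@ExpandArgs
def run(dx, dt, label='test'):
    """Hypothetical experiment: just report its parameters."""
    return (dx, dt, label)


runs = run(dx=[1, 2], dt=[10, 20], label='demo', outer_list=['dx', 'dt'])
print(runs)
```

With an empty `outer_list` the wrapped function is called exactly once with the given keyword arguments, so decorated functions remain usable for single runs.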


Summary & Conclusion

The Ensemble Class
- Functions like a container type and redirects calls to (parallelized) member methods

Argument Expansion
- Systematic expansion of argument lists from inner or outer product (with decorator)

Sprint Project: Publish Ensemble Class

Create a stand-alone module with the Ensemble class and the argument expansion code for others to use, and add support for array-like item access/assignment.


Thank You! ∼ Questions?
