13/05/2013 Gijs Dekkers Gaëtan de Menten Raphaël Desmet
plan.be
13/05/2013
Gijs Dekkers, Gaëtan de Menten, Raphaël Desmet
LIAM2: Introduction and demo model
Presentation for the Prévoyance division (pôle Prévoyance) of the Caisse de Dépôt et de Gestion
Rabat, Morocco.

• Tool for the development of dynamic microsimulation models with dynamic cross-sectional ageing.
• Not a microsimulation model in itself (unlike MIDAS)
• Simulation framework that allows for comprehensive modelling and various simulation techniques
• Prospective / retrospective simulation
• Work in progress …
    • Immigration
    • Weights
    • More sophisticated regressions and simulation techniques
    • Speed optimisation
• You get it for free!
Introduction to Liam2

• Check http://liam2.plan.be
• This website contains
    • The LIAM 2 executable.
    • A synthetic dataset of 20,200 individuals grouped in 14,700 households, in HDF5 format.
    • A small model containing
        • Fertility and mortality (aligned)
        • Educational attainment level
        • Some labour market characteristics
    • Documentation
        • A LIAM 2 user guide
    • A ready-to-use “bundle” of Notepad++ integrated with LIAM 2 and the synthetic dataset.
How to get it.

• Written in Python
    • High-level open source language
    • Efficient libraries, mostly written in C
• Input
    • Model description: text file (YAML)
    • Alignment: CSV files
• Internal data engine: HDF5 file format and library for storing scientific data (meteorology, astronomy, …)
• Output
    • HDF5 file, CSV file on demand
    • Interactive console
Overview

• Declare entities (=data)
    • What is modelled? (person, household, enterprise, …)
    • Entity characteristics
        • fields:
            • what do we know about an individual?
            • what do we want to know?
            • How can we store the data?
                • Flag: boolean (eg. alive/dead, male/female)
                • Discrete/category: integer (eg. single/married/divorced/…)
                • Continuous/value: float (eg. income)
        • links: interaction between entities
            • same kind: who is the mother?
            • different kinds: in what household does the person live?
• Globals (=data)
    • External time series
    • Eg. macroeconomic context
Model definition: the simulation file

• Simulation (=model)
    • processes:
        • What happens to the entities in their lives?
        • In what order?
    • input: Which input file to use?
    • output: Where is the output?
    • start_period: What is the first simulated period?
    • periods: How many periods do we want to simulate?
Model definition: the simulation file

entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
        processes:
            age: age + 1
            isfemale: gender == True

simulation:
    processes:
        - person: [age, isfemale]
    input:
        file: base.h5
    output:
        file: output.h5
    start_period: 2002  # first simulated period
    periods: 20
Simple simulation file
Liam2 bundled with Notepad++
model/YAML
Interactive console

• Simulation file (YAML format, yml extension, highlighting)
    • indentation (grouping, levels)
    • colon, dash, brackets, double quotes, quotes, ...
    • comment (#)
• Console
    • run: F6
    • import: F5
    • output
    • interactive (history)
Liam2 bundled with Notepad++

• First simulation
    • simple entity
    • simple functions
    • first run
    • some output
LIAM 2 – demo model
basic simulation setup
demo01.yml

• Description of the data: entities
    • fields:
        • name
        • type = bool (boolean), int (integer) or float
        • initialdata (data from input or new data)
• The model definition: processes
    • model definition (transformation, regressions, alignment, ...)
• Order of the processes: simulation
    • database (input, output)
    • what processes and when? (model order)
    • start_period, number of periods
Basic setup

entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
        processes:
            age: age + 1

simulation:
    processes:
        - person: [age]
    input:
        file: base.h5
    output:
        file: output.h5
    start_period: 2002  # first simulated period
    periods: 20
Basic simulation file

entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            # fields not present in input
            - agegroup: {type: int, initialdata: false}
        processes:
            age: age + 1
            agegroup: 5 * trunc(age / 5)

simulation:
    processes:
        - person: [age, agegroup]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    start_period: 2002
    periods: 2
Simple simulation (to run the file, press F6)
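To see what the two processes compute, they can be mimicked in plain Python. This is only an illustrative sketch, not LIAM2 code: the helper name `age_group` and the sample ages are assumptions made for the example.

```python
import math

def age_group(age: int) -> int:
    """Lower bound of the 5-year band, mirroring `5 * trunc(age / 5)`."""
    return 5 * math.trunc(age / 5)

# Hypothetical ages as they could be read from the input file (period 2001)
ages_2001 = [0, 4, 5, 17, 49, 63]

# Process 1 (age: age + 1) runs first
ages_2002 = [age + 1 for age in ages_2001]
# Process 2 (agegroup: 5 * trunc(age / 5)) then uses the updated age
agegroups_2002 = [age_group(age) for age in ages_2002]

print(list(zip(ages_2002, agegroups_2002)))
```

Because the processes run in the order listed under `simulation`, the age group is derived from the age after ageing.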

Using simulation file: 'C:\usr\Liam2Suite\Synthetic\demo00.yml'
reading data from C:\usr\Liam2Suite\Synthetic\simple2001.h5 ...
person ...
…
period 2002
- loading input data
  * person ... done (0 ms elapsed).
  -> 20200 individuals
- 1/2 age ... done (2 ms elapsed).
- 2/2 agegroup ... done (3 ms elapsed).
- storing period data
  * person ... done (2 ms elapsed).
  -> 20200 individuals
period 2002 done (0.01 second elapsed).
…
period 2003
…
top 10 processes:
 - agegroup: 0.01 second
 - age: 3 ms
total for top 10 processes: 0.01 second
Console output

• Internal format = HDF5 file
• Write to the console
    • show(expr1[, expr2, …]): evaluates the expressions and shows the result
    • dump(expr1[, expr2, …, filter, missing, header]): produces a table with the expressions given as arguments, evaluated over many (possibly all) individuals of the dataset
• Write to CSV files
    • csv(expr1[, expr2, …, suffix, fname, mode]): writes values to a CSV file
• Pivot tables:
    • groupby(expr1[, expr2, …, filter=None, percent=False])
Output
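As a rough picture of what a pivot table such as groupby(agegroup, gender) reports, the sketch below counts persons per combination of values in plain Python. It is not the LIAM2 implementation: the sample data are invented, and treating percent=True as "share of the total" is an assumption.

```python
from collections import Counter

# Hypothetical (agegroup, gender) pairs, one per person
persons = [(0, True), (0, False), (5, True), (5, True), (10, False)]

def groupby(rows, percent=False):
    """Count persons per combination of values, optionally as % of the total."""
    counts = Counter(rows)
    if not percent:
        return dict(counts)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}

counts = groupby(persons)
shares = groupby(persons, percent=True)
for key in sorted(counts):
    print(key, counts[key], shares[key])
```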

• Expressions
    • Arithmetic operators: +, -, *, /, ** (exponent), % (modulo)
    • Comparison operators: <, <=, ==, !=, >=, >
    • Boolean operators: and, or, not
    • Conditional expressions: if(condition, expression_if_true, expression_if_false)
• Mathematical functions
    • abs, log, exp, round, trunc, ...
• Aggregate functions
    • grpcount, grpsum, grpavg, grpstd, grpmax, grpmin
• Temporal functions
    • lag, value_for_period, duration, tavg, tsum
• Random functions
    • uniform, normal, randint
Some functions

entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            # fields not present in input
            - agegroup: {type: int, initialdata: false}
        processes:
            age: age + 1
            agegroup: if(age < 50, 5 * trunc(age / 5), 10 * trunc(age / 10))
            # produces 2 csv files (one per period): "person_20xx.csv"
            # default name for csv-file = {entity}_{period}.csv
            dump_info: csv(dump(id, age, gender))
            show_demography: show(groupby(agegroup, gender))
Simple simulation (to run the file, press F6)

simulation:
    processes:
        - person: [age, agegroup, dump_info, show_demography]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    # first simulated period
    start_period: 2002
    periods: 2
…

Welcome to LIAM interactive console.
 help: print this help
 q[uit] or exit: quit the program
 entity [name]: set the current entity (this is required before any query)
 period [period]: set the current period (if not set, uses the last period simulated)
 fields [entity]: list the fields of that entity (or the current entity)
 show is implicit on all commands
>>> period 2002
current period set to 2002
>>> entity person
current entity set to person
>>> grpcount(gender)
10100
>>> grpcount(not gender)
10100
Interactive console

• All output functions can be used both during the simulation and in the interactive console
• Some examples - show
    • show(groupby(age, gender, filter=age<=10))
    • show(grpcount(age >= 18))
    • show(grpcount(not dead), grpavg(age, filter=not dead))
    • show("Count:", grpcount(), "\nAverage age:", grpavg(age), "\nAge std dev:", grpstd(age))
• Some examples - csv
    • csv(grpavg(age))
    • csv(period, grpavg(age), fname='avg_income.csv', mode='a')
• Some examples - groupby
    • groupby(trunc(age/10), gender)
    • groupby(trunc(age/10), gender, percent=True)
Remarks
links, init, procedures, choice
demo02.yml

• second entity (eg. household)
• links: interaction between entities (eg. persons, households)
    • one2many (one household has many persons)
Links: model interaction
person:
    fields:
        # period and id are implicit
        - age: int
        - gender: bool
        ...
        - hh_id: int

household:
    fields:
        # period and id are implicit
        - nb_persons: int
        - nb_children: int
    links:
        persons: {type: one2many, target: person, field: hh_id}

entities:
    household:
        fields:
            # period and id are implicit
            - nb_persons: {type: int, initialdata: false}
            - nb_children: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            household_composition:
                - nb_persons: countlink(persons)
                - nb_children: countlink(persons, age < 18)

To use information stored in the linked entities, you have to use aggregate functions:
• countlink (eg. countlink(persons) gives the number of persons in the household)
• sumlink (eg. sumlink(persons, income) sums the incomes of all members of a household)
• avglink (eg. avglink(persons, age) gives the average age of the members of a household)
• minlink, maxlink (eg. minlink(persons, age) gives the age of the youngest member of the household)
Use the links: aggregate functions
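The idea behind these link aggregates can be sketched in plain Python: each person carries the id of its household in hh_id, and the aggregate scans the persons pointing at a given household. The data, the `linked` helper and the function signatures below are assumptions for illustration and do not reproduce how LIAM2 computes them internally.

```python
from statistics import mean

# Hypothetical persons; hh_id is the field behind the one2many link
persons = [
    {"id": 0, "hh_id": 0, "age": 41, "income": 2100.0},
    {"id": 1, "hh_id": 0, "age": 39, "income": 1800.0},
    {"id": 2, "hh_id": 0, "age": 12, "income": 0.0},
    {"id": 3, "hh_id": 1, "age": 67, "income": 1500.0},
]

def linked(hh_id, condition=lambda p: True):
    """Persons whose hh_id points at the household, optionally filtered."""
    return [p for p in persons if p["hh_id"] == hh_id and condition(p)]

def countlink(hh_id, condition=lambda p: True):
    return len(linked(hh_id, condition))

def sumlink(hh_id, field):
    return sum(p[field] for p in linked(hh_id))

def avglink(hh_id, field):
    return mean(p[field] for p in linked(hh_id))

def minlink(hh_id, field):
    return min(p[field] for p in linked(hh_id))

nb_persons = countlink(0)                             # all members
nb_children = countlink(0, lambda p: p["age"] < 18)   # members under 18
print(nb_persons, nb_children, sumlink(0, "income"), minlink(0, "age"))
```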

entities:
    person:
        fields:
            - age: int
            - gender: bool
            # link fields
            - hh_id: int
        links:
            household: {type: many2one, target: household, field: hh_id}

many2one: links an item of the entity to one other item, either in the same entity (eg. a person to their mother) or in another entity (eg. a person to their household).
To access the value of a field of a linked item, use:
    link_name.field_name

        processes:
            # produces "person_20xx_info.csv"
            dump_info: csv(dump(id, age, gender, household.nb_persons), suffix='info')
many2one and the “.”-function
id   age  gender  hh_id  household.nb_persons
0    1    TRUE    0      5
1    1    TRUE    0      5
2    1    FALSE   0      5
3    1    FALSE   0      5
4    2    TRUE    2      3
5    2    FALSE   2      3
6    3    TRUE    4      3
7    3    FALSE   4      3
8    4    TRUE    6      3
9    4    FALSE   6      3
10   6    TRUE    12     3

person:
    fields:
        # period and id are implicit
        - age: int
        - gender: bool
        # link fields
        - mother_id: int
        - partner_id: int
        - hh_id: int
    links:
        mother: {type: many2one, target: person, field: mother_id}
        partner: {type: many2one, target: person, field: partner_id}
        household: {type: many2one, target: household, field: hh_id}
        children: {type: one2many, target: person, field: mother_id}

Some examples:
    mother.age
    mother.mother.age
    age - partner.age
many2one and the “.”-function
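A many2one link is essentially an id stored on the item, and the "." notation follows that id to the target item. The sketch below illustrates this with a plain dictionary. The `follow` helper, the sample persons, the use of -1 for UNSET and returning None for a missing link are all assumptions for the example, not LIAM2 behaviour.

```python
UNSET = -1  # assumed sentinel for "no link"

# Hypothetical persons indexed by id
persons = {
    0: {"age": 70, "mother_id": UNSET, "partner_id": UNSET},
    1: {"age": 45, "mother_id": 0, "partner_id": 2},
    2: {"age": 48, "mother_id": UNSET, "partner_id": 1},
    3: {"age": 20, "mother_id": 1, "partner_id": UNSET},
}

def follow(person_id, *path):
    """Follow the links in path[:-1], then return the field path[-1].

    Returns None as soon as one of the links is unset.
    """
    current = person_id
    for link in path[:-1]:
        current = persons[current][link + "_id"]
        if current == UNSET:
            return None
    return persons[current][path[-1]]

print(follow(3, "mother", "age"))              # mother.age
print(follow(3, "mother", "mother", "age"))    # mother.mother.age
print(persons[1]["age"] - follow(1, "partner", "age"))  # age - partner.age
```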

simulation:
    init:
        - household: [init_region_id, household_composition]
    processes:
        - household: [household_composition]
        - person: [ageing, dump_info]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    # first simulated period
    start_period: 2002
    periods: 2

init: executes the processes in start_period - 1 (here 2001) to initialise the household variables
processes: executes in 2002, 2003
Simulation: init - processes
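The timing of init versus processes can be spelled out with a small Python sketch. The `schedule` function is a hypothetical helper written for this illustration, not part of LIAM2. It lists which process would run in which period for the settings shown on this slide.

```python
def schedule(start_period, periods, init, processes):
    """List (period, process) pairs in execution order."""
    # init processes run once, in the period just before start_period
    steps = [(start_period - 1, name) for name in init]
    # regular processes run in every simulated period
    for period in range(start_period, start_period + periods):
        steps.extend((period, name) for name in processes)
    return steps

steps = schedule(
    start_period=2002,
    periods=2,
    init=["init_region_id", "household_composition"],
    processes=["household_composition", "ageing", "dump_info"],
)
for period, name in steps:
    print(period, name)
```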

processes:
    ageing:
        - age: age + 1
        - juniors: 5 * trunc(age / 5)
        - plus50: 10 * trunc(age / 10)
        - agegroup: if(age < 50, juniors, plus50)
    dump_info: csv(dump(id, age, gender, hh_id, household.nb_persons,
                        mother.age, partner.age),
                   suffix='info')
    show_demography: show(groupby(agegroup, gender))

• procedures
    • single process (ex. dump_info)
    • multi process (ex. ageing)
• local variables
    • temporary: only available in the ageing procedure
    • not stored (ex. juniors, plus50 in the ageing procedure)
Simulation: procedures – local variables
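A procedure with local variables behaves much like a function with local variables in a general-purpose language. The Python sketch below is an analogy only (the dictionary-based person is an assumption): juniors and plus50 exist while the procedure runs and are never stored on the person.

```python
def ageing(person):
    """Mimic the ageing procedure for one person (a plain dict)."""
    person["age"] = person["age"] + 1
    # Local variables: only visible inside this function, never stored
    juniors = 5 * (person["age"] // 5)
    plus50 = 10 * (person["age"] // 10)
    person["agegroup"] = juniors if person["age"] < 50 else plus50

person = {"age": 52}
ageing(person)
print(person)
```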

entities:
    household:
        fields:
            # period and id are implicit
            - nb_persons: {type: int, initialdata: false}
            - nb_children: {type: int, initialdata: false}
            - region_id: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            init_region_id:
                - region_id: choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4])

• choice
    • region_id: 10% chance to get 0, 20% for 1, 30% for 2 and 40% for 3
    • beware: the probabilities must sum to 100%
Stochastic changes I: probabilistic simulation
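The behaviour described for choice can be imitated with Python's standard library. This sketch is an assumption-laden stand-in, not the LIAM2 function: it draws one value per call with `random.choices`, rejects probabilities that do not sum to 1, and uses a fixed seed so that repeated runs give the same draws.

```python
import random

def choice(values, probabilities, rng):
    """Draw one value with the given probabilities (which must sum to 1)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return rng.choices(values, weights=probabilities, k=1)[0]

rng = random.Random(42)  # fixed seed: identical draws on every run
draws = [choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4], rng) for _ in range(10000)]

# Observed share of each region; should be close to 10%, 20%, 30%, 40%
shares = {value: draws.count(value) / len(draws) for value in range(4)}
print(shares)
```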
regressions, macros, new, remove
demo03.yml

• Logit:
    • logit_regr(expr, filter=None, align=percentage)
    • logit_regr(expr, filter=None, align='filename.csv')
• Alignment:
    • align(expr, [take=take_filter,] [leave=leave_filter,] fname='filename.csv')
• Continuous (expr + normal(0, 1) * mult + error_var):
    • cont_regr(expr, filter, mult, error_var)
• Clipped continuous (always positive):
    • clip_regr(expr, filter, mult, error_var)
• Log continuous (exponential of continuous):
    • log_regr(expr, filter, mult, error_var)
Stochastic changes II: behavioural equations
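The three continuous variants can be written out for a single individual in a few lines of Python, following the formula given on the slide. This is a sketch under assumptions: the filter argument is left out, the default values are invented, and clipping at zero is one possible reading of "always positive".

```python
import math
import random

def cont_regr(expr, mult=0.0, error_var=0.0, rng=random):
    """expr + normal(0, 1) * mult + error_var, as written on the slide."""
    return expr + rng.gauss(0.0, 1.0) * mult + error_var

def clip_regr(expr, mult=0.0, error_var=0.0, rng=random):
    """Continuous result with negative values clipped to zero (assumed rule)."""
    return max(cont_regr(expr, mult, error_var, rng), 0.0)

def log_regr(expr, mult=0.0, error_var=0.0, rng=random):
    """Exponential of the continuous result."""
    return math.exp(cont_regr(expr, mult, error_var, rng))

rng = random.Random(2013)
print(cont_regr(2.0, mult=0.5, rng=rng))
print(clip_regr(-3.0, mult=0.5, rng=rng))
print(log_regr(0.0, rng=rng))
```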

processes:
    ageing:
        - age: age + 1
    birth:
        - to_give_birth: logit_regr(0.0,
                                    filter=not gender and (age >= 15) and (age <= 50),
                                    align='al_p_birth.csv')

• logit_regr(expr, filter, align)
    • expr: the regression expression (here the constant 0.0)
    • filter: select individuals from entity
    • align: apply alignment using al_p_birth.csv
logit + align example
age   period
      2002      2003      2004      2005      2006      2007
15    0.000678  0.000781  0.000723  0.000887  0.000727  0.000518
16    0.001857  0.001669  0.001765  0.001756  0.001819  0.001484
17    0.005945  0.005571  0.005637  0.00562   0.005594  0.003394
18    0.011406  0.011575  0.010119  0.010185  0.010387  0.006493
19    0.019842  0.022631  0.020889  0.020496  0.018818  0.01192
20    0.031668  0.029328  0.029358  0.029875  0.02996   0.020262
21    0.039381  0.041242  0.040153  0.039742  0.040004  0.032043
22    0.050043  0.049698  0.050599  0.049155  0.049004  0.047723
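One common way to read such an alignment table is: within each age group and period, select as many individuals as the target proportion requires, taking those with the highest scores first. The Python sketch below follows that reading for a single cell of the table (age 20, period 2002). The function, the group size of 1000 and the rounding rule are assumptions, and the actual LIAM2 procedure may differ in its details.

```python
import random

def align(scores, proportion):
    """Flag the highest-scoring individuals so that the flagged share
    of the group matches `proportion` (rounded to whole individuals)."""
    n_needed = round(len(scores) * proportion)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n_needed])
    return [i in selected for i in range(len(scores))]

rng = random.Random(1)
# Hypothetical group: 1000 women aged 20 in 2002, each with a random score
scores = [rng.random() for _ in range(1000)]
to_give_birth = align(scores, 0.031668)  # target for age 20, period 2002

print(sum(to_give_birth))
```

With a constant expression such as 0.0, every individual has the same underlying probability, so the selection within a group is driven by the random component of the score.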

macros: easier to read and maintain

Without macros:
processes:
    ageing:
        - age: age + 1
    birth:
        - to_give_birth: logit_regr(0.0,
                                    filter=not gender and (age >= 15) and (age <= 50),
                                    align='al_p_birth.csv')

With macros:
person:
    fields:
        - age: int
        . . .
    macros:
        MALE: True
        FEMALE: False
        ISMALE: gender
        ISFEMALE: not gender
    processes:
        ageing:
            - age: age + 1
        birth:
            - to_give_birth: logit_regr(0.0,
                                        filter=ISFEMALE and (age >= 15) and (age <= 50),
                                        align='al_p_birth.csv')

• macros
    • defined on entity level
    • re-evaluated on each execution

birth:
    - to_give_birth: logit_regr(0.0,
                                filter=ISFEMALE and (age >= 15) and (age <= 50),
                                align='al_p_birth.csv')
    - new('person', filter=to_give_birth,
          mother_id=id,
          hh_id=hh_id,
          age=0,
          partner_id=UNSET,
          civilstate=SINGLE,
          gender=choice([MALE, FEMALE], [0.51, 0.49]))

• new
    • entity name: what (same or other, eg. household on marriage)
    • filter: who
    • set initial values to a selection of variables
Life cycle functions – new – create new entities

death:
    - dead: if(ISMALE,
               logit_regr(0.0, align='al_p_dead_m.csv'),
               logit_regr(0.0, align='al_p_dead_f.csv'))
    - civilstate: if(partner.dead, WIDOW, civilstate)
    - partner_id: if(partner.dead, UNSET, partner_id)
    - show('Avg age of dead men', grpavg(age, filter=dead and ISMALE))
    - show('Avg age of dead women', grpavg(age, filter=dead and ISFEMALE))
    - show('Widows', grpsum(ISWIDOW))
    - remove(dead)

• remove
    • filter: who has to be removed?
    • Item is removed from the entity set
        • No data is available for that period and later
        • Historical data is still accessible
    • Links must be cleaned manually if necessary
Life cycle functions – remove – remove entities
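The reason links have to be cleaned by hand is that a partner_id pointing at a removed person would otherwise dangle. The Python sketch below walks through the same order of operations as the death procedure on a toy population. The data, the `death` function and the -1 value for UNSET are assumptions for illustration only.

```python
UNSET = -1  # assumed sentinel for "no link"

# Hypothetical persons indexed by id
persons = {
    1: {"partner_id": 2, "civilstate": "MARRIED", "dead": True},
    2: {"partner_id": 1, "civilstate": "MARRIED", "dead": False},
    3: {"partner_id": UNSET, "civilstate": "SINGLE", "dead": False},
}

def death(persons):
    """Clean the links to dead partners, then remove the dead."""
    dead_ids = {pid for pid, p in persons.items() if p["dead"]}
    for p in persons.values():
        if p["partner_id"] in dead_ids:   # partner.dead
            p["civilstate"] = "WIDOW"
            p["partner_id"] = UNSET       # clean the link before removal
    # remove(dead): only the survivors remain in the entity set
    return {pid: p for pid, p in persons.items() if pid not in dead_ids}

survivors = death(persons)
print(survivors)
```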

entities:
    household:
        fields:
            - nb_persons: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            household_composition:
                - nb_persons: countlink(persons)
                - nb_children: countlink(persons, age < 18)
            clean_empty: remove(nb_persons == 0)
. . .
simulation:
    processes:
        - person: [list of processes]
        - household: [household_composition, clean_empty]
Remove empty households

• show and dump functions
    • skip_shows: if set to True, disables all show() calls
• interactive console
    • period
    • entity
    • output: aggregate, groupby functions
• breakpoint
    • breakpoint()
    • breakpoint(2021)
    • step (or s)
    • resume (or r)
• random_seed
    • fix the random seed if you want several runs of a simulation to use the same random numbers
Debugging possibilities
matching, change links
demo04.yml

• matches individuals from subset 1 with individuals from subset 2
    • Give each individual in subset 1 a particular order (orderby)
    • Compute the score of all (unmatched) individuals in subset 2
    • Take the best score

matching(set1filter=boolean_expr,
         set2filter=boolean_expr,
         orderby=difficult_match,
         score=coef1 * field1 + coef2 * other.field2 + ...)
Matching - aka Marriage market
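The three steps listed above amount to a greedy matching, which can be sketched in Python as follows. This is an illustration, not the LIAM2 implementation: the sample persons are invented, the score used here (closeness in age) is a made-up stand-in for a real score expression, and processing subset 1 in decreasing orderby value is an assumption.

```python
def matching(set1, set2, orderby, score):
    """Greedy matching; returns {id in set1: id in set2}."""
    pairs = {}
    available = list(set2)
    # Step 1: order subset 1 (assumed: hardest to match first)
    for person in sorted(set1, key=orderby, reverse=True):
        if not available:
            break
        # Step 2: score every still unmatched individual of subset 2
        # Step 3: take the best score
        best = max(available, key=lambda other: score(person, other))
        pairs[person["id"]] = best["id"]
        available.remove(best)
    return pairs

# Hypothetical candidates
women = [{"id": 1, "age": 25}, {"id": 2, "age": 60}]
men = [{"id": 10, "age": 27}, {"id": 11, "age": 58}, {"id": 12, "age": 40}]
avg_age_men = sum(m["age"] for m in men) / len(men)

pairs = matching(
    women, men,
    orderby=lambda w: abs(w["age"] - avg_age_men),   # like difficult_match
    score=lambda w, m: -abs(w["age"] - m["age"]),    # hypothetical score
)
print(pairs)
```

Because each choice is made in turn and never revisited, the result depends on the processing order, which is why the order of subset 1 matters.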

marriage:
    - in_couple: ISMARRIED
    - to_couple: if((age >= 18) and (age <= 90) and not in_couple,
                    if(ISMALE,
                       logit_regr(0.0, align='al_p_mmkt_m.csv'),
                       logit_regr(0.0, align='al_p_mmkt_f.csv')),
                    False)
    - difficult_match: if(to_couple and ISFEMALE,
                          abs(age - grpavg(age, filter=to_couple and ISMALE)),
                          nan)
    - partner_id: if(to_couple,
                     matching(set1filter=ISFEMALE, set2filter=ISMALE,
                              score=- 0.4893 * other.age
                                    + 0.0131 * other.age ** 2
                                    ...
                              orderby=difficult_match),
                     partner_id)
    - justcoupled: to_couple and (partner_id != UNSET)
    - civilstate: if(justcoupled, MARRIED, civilstate)
Marriage

marriage:
    - in_couple: ISMARRIED
    ...
    - civilstate: if(justcoupled, MARRIED, civilstate)
    - newhousehold: new('household', filter=justcoupled and ISFEMALE,
                        region_id=choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4]))
    - hh_id: if(justcoupled,
                if(ISMALE, partner.newhousehold, newhousehold),
                hh_id)
    - csv(dump(id, age, gender, partner.id, partner.age, partner.gender, hh_id,
               filter=justcoupled),
          suffix='new_couples')

• new link
• change the value of the linked field
New links, change links
break links, lag
demo05.yml

divorce:
    - agediff: if(ISFEMALE and ISMARRIED, age - partner.age, 0)
    # select females to divorce
    - divorce: logit_regr(0.6713593 * household.nb_children
                          - 0.0785202 * dur_in_couple
                          + 0.1429621 * agediff
                          - 0.0088308 * agediff ** 2
                          - 4.546278,
                          filter=ISFEMALE and ISMARRIED and (dur_in_couple > 0),
                          align='al_p_divorce.csv')
    # break link to partner
    - to_divorce: divorce or partner.divorce
    - partner_id: if(to_divorce, UNSET, partner_id)
    - civilstate: if(to_divorce, DIVORCED, civilstate)
    - dur_in_couple: if(to_divorce, 0, dur_in_couple)
    # move out males
    - hh_id: if(ISMALE and to_divorce,
                new('household', region_id=household.region_id),
                hh_id)
Remove links
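The expression passed to logit_regr in the divorce procedure is a linear index. In a logit model, such an index is turned into a probability with the logistic function. The Python sketch below evaluates it for one woman using the coefficients from the slide. The function names are assumptions, and the alignment step, which then decides how many individuals are actually selected, is not reproduced here.

```python
import math

def logistic(x):
    """Map a linear index to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def divorce_probability(nb_children, dur_in_couple, agediff):
    """Logistic transform of the linear index used in the divorce procedure."""
    index = (0.6713593 * nb_children
             - 0.0785202 * dur_in_couple
             + 0.1429621 * agediff
             - 0.0088308 * agediff ** 2
             - 4.546278)
    return logistic(index)

# Hypothetical cases: same couple after 1 year and after 10 years
print(divorce_probability(nb_children=0, dur_in_couple=1, agediff=0))
print(divorce_probability(nb_children=0, dur_in_couple=10, agediff=0))
```

The negative coefficient on dur_in_couple means that, other things equal, longer-lasting couples get a lower score.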
globals, regr + align
demo06.yml

ineducation:
    # unemployed if graduated
    - workstate: if(ISSTUDENT and
                    (((age >= 16) and IS_LOWER_SECONDARY_EDU) or
                     ((age >= 19) and IS_UPPER_SECONDARY_EDU) or
                     ((age >= 24) and IS_TERTIARY_EDU)),
                    UNEMPLOYED,
                    workstate)
    - show('num students', grpsum(ISSTUDENT))
1. Graduate people

globals:
    periodic:
        - WEMRA: float
2. Retire people
# retire
- workstate: if(ISMALE,
                if((age >= 65), RETIRED, workstate),
                if((age >= WEMRA), RETIRED, workstate))

• globals
    • variables that do not relate to any particular entity
    • periodic globals can have a different value for each period
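A periodic global can be pictured as a table with one value per period, looked up with the current period whenever the expression is evaluated. The Python sketch below applies the retirement rule above with such a lookup. The WEMRA values and the `retire` helper are hypothetical, since the slides do not give the series.

```python
# Hypothetical values: the slides do not give the WEMRA series
WEMRA = {2002: 62.0, 2003: 63.0, 2004: 63.0}

def retire(age, is_male, workstate, period):
    """Apply the retirement rule for one person in the given period."""
    # Men retire at 65; women at the period-specific WEMRA threshold
    threshold = 65 if is_male else WEMRA[period]
    return "RETIRED" if age >= threshold else workstate

print(retire(62, False, "INWORK", 2002))
print(retire(62, False, "INWORK", 2003))
```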

inwork:
    - work_score: UNSET
    # men
    - work_score: if(ISMALE and (age > 15) and (age < 65) and ISINWORK,
                     logit_score(-0.196599 * age + 0.0086552 * age ** 2
                                 - 0.000988 * age ** 3
                                 + 0.1892796 * ISMARRIED + 3.554612),
                     work_score)
    - work_score: if(ISMALE and (age > 15) and (age < 50) and ISUNEMPLOYED,
                     logit_score(0.9780908 * age - 0.0261765 * age ** 2
                                 + 0.000199 * age ** 3 - 12.39108),
                     work_score)
    # women
    …
    # align on Number of Workers / Population by age class
    - work: if((age > 15) and (age < 65),
               if(ISMALE,
                  align(work_score, leave=ISSTUDENT or ISRETIRED,
                        fname='al_p_inwork_m.csv'),
                  align(work_score, leave=ISSTUDENT or ISRETIRED,
                        fname='al_p_inwork_f.csv')),
               False)
    - workstate: if(work, INWORK, workstate)
    - workstate: if(not work and lag(ISINWORK), -1, workstate)
3. Pick people … to work in 2002

unemp_process:
    - unemp_score: -1
    - unemp_condition: (age > 15) and (age < 65) and not ISINWORK
    # Probability of being unemployed when previously unemployed
    - unemp_score: if(unemp_condition and lag(ISUNEMPLOYED),
                      logit_score(- 0.1988979 * age + 0.0026222 * age ** 2 + ...),
                      unemp_score)
    # Probability of being unemployed when previously in work
    - unemp_score: if(unemp_condition and lag(ISINWORK),
                      logit_score(0.1396404 * age - 0.0024024 * age ** 2 + ...),
                      unemp_score)
    # Alignment of unemployment based on those not selected by inwork
    # [Number of new unemployed / (Population - Number of Workers)] by age
    # The condition below must correspond to the denominator above
    - unemp: if((age > 15) and (age < 65) and not ISINWORK,
                align(unemp_score, leave=ISSTUDENT or ISRETIRED,
                      fname='al_p_unemployed.csv'),
                False)
    - workstate: if(unemp, UNEMPLOYED, workstate)
    - workstate: if((workstate == -1) and not unemp, OTHERINACTIVE, workstate)
4. Pick people … to be unemployed in 2002 + 5. Remain …
import data
demo_import.yml

# this is an "import" file. To use it press F5 in liam2 environment, or run
# the following command in a console:
# INSTALL_PATH\liam2 import demo_import.yml
output: simple2001.h5

entities:
    person:
        path: input\person.csv
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            - ...

    household:
        path: input\household.csv
        # if fields are not specified, they are all imported
Import data (to run the file, press F5)

globals:
    periodic:
        path: input\globals_transposed.csv
        transposed: true

entities:
    person:
        path: input\person.csv
        fields:
            - age: int
            - gender: bool
        # if you want to keep your csv files intact but you use different names
        # in your simulation than in the csv files, you can specify name changes
        # here. The format is: "newname: oldname"
        oldnames:
            gender: male
        # if you want to invert the value of some boolean fields (True -> False
        # and False -> True), add them to the "invert" list below.
        invert: [gender]
Optional
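The effect of the oldnames and invert options can be pictured with a few lines of Python using only the standard csv module. This is a sketch of the idea, not of the LIAM2 importer: the CSV content and the `import_rows` helper are invented for the example, and only the inverted field is converted to a boolean.

```python
import csv
import io

# Hypothetical content of input\person.csv, with a "male" column
raw = "id,period,age,male\n0,2001,34,True\n1,2001,29,False\n"

oldnames = {"gender": "male"}   # format "newname: oldname"
invert = ["gender"]             # boolean fields whose value is flipped

def import_rows(text, oldnames, invert):
    """Read CSV rows, rename columns, then invert the listed boolean fields."""
    rename = {old: new for new, old in oldnames.items()}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {rename.get(name, name): value for name, value in row.items()}
        for field in invert:
            row[field] = not (row[field] == "True")
        rows.append(row)
    return rows

rows = import_rows(raw, oldnames, invert)
print(rows)
```

As in the import file above, the invert list refers to the new field name, so the renaming is applied first.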