Megatrend and Intervention
Impact Analyzer for Jobs
Toomas Kirt
On behalf of Estonian hackathon team
Outline
Background
Data
Tools
Solution
Toomas Kirt 15 June, 2017
Background
The European Big Data Hackathon took place in Brussels
from 13 to 15 March 2017 and was organised by the
European Commission (Eurostat).
The policy question for the hackathon: How would you
support the design of policies for reducing mismatch
between jobs and skills at regional level in the EU through
the use of data?
Contributions
Innar Liiv – key ideas and presentation
Rain Öpik – PostgreSQL and programming the visualization tool
Toomas Kirt – Hadoop and organization
Sources of this presentation
https://github.com/rainopik/eubdhack-megatrend
https://rainopik.github.io/eubdhack-megatrend/
Motivation – changes in the job market
Across the OECD countries, on average 9% of jobs are
automatable (Arntz, Gregory and Zierahn 2016).
Not all jobs are lost
Lerman and Schmidt (2005) found that between the appearance
of the first personal computers in the mid-seventies and 1983,
computer industry jobs in the United States grew almost
80 percent, while total U.S. manufacturing employment
increased by only 4 percent.
But the new jobs need new skills.
https://www.wired.com/2012/12/ff-robots-will-take-our-jobs/
The main questions
What is the impact of a megatrend or an intervention
on the labour market?
Which parts of the labour market, and in which countries,
are most vulnerable to an approaching megatrend or a
planned intervention?
The main contributions
Developing a method to represent the complex internal
structure of the labour market from the perspective of
occupations sharing skills.
Developing and presenting the prototype.
Solutions - Graph tools
The Real Difference
Between Google And
Apple
https://www.fastcodesign.com/3068474/the-real-difference-between-google-and-apple
Datasets
EURES CV and job vacancy dataset (see
http://eures.europa.eu/);
ESCO RDF, converted to relational structure suitable for
SQL (European Commission 2013);
List of Jobs Susceptible to Automation / Computerization
(Benedikt and Osborne 2017);
Occupation classifications mapping table from Occupation
classifications crosswalks - from O*NET-SOC to ISCO
(Wojciech, Autor and Acemoglu 2016).
EURES
To develop our prototype, we used CV and job vacancy
data from the EURES portal together with the ESCO
datasets. ESCO is the multilingual classification of
European Skills, Competences, Qualifications and
Occupations. The EURES data consist of two datasets:
one on curricula vitae (4.7 million lines) posted by
jobseekers and another on job vacancies (35 million
lines) published by potential employers.
ESCO database
The ESCO system provides occupational profiles showing the
relationships between occupations, skills, competences
and qualifications (European Commission 2013). The ESCO
dataset provided 65814 relationships between skills and
occupations; it contained 619 ISCO and 2950 ESCO
occupations.
https://www.slideshare.net/lod2project/esco-a-tool-to-facilitate-online-skills-matching-throughout-europe-2612011-brussels-belgium
Occupation classifications mapping
To demonstrate the visualization of the automation
megatrend on the labour market, the list of jobs
susceptible to automation was extracted from scientific
articles, and the O*NET-SOC standard was linked to ISCO
in order to connect this job data with the datasets
provided by the European Big Data Hackathon.
Tools
The data processing pipeline
The occupation graph was built with PostgreSQL. Data was stored in two
denormalized tables: g_link, linking similar occupations together, and g_node,
annotating occupations with supply and demand data.
The number of unique job seekers and vacancies per occupation and country
was counted with Hive.
The ESCO classification was originally provided in RDF format, as a list of
semantic triples in subject-predicate-object form, and was converted to a
relational structure suitable for SQL.
The visualizer was designed to work without a server, so all the data was
converted to CSV files.
https://github.com/rainopik/eubdhack-megatrend
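The conversion from subject-predicate-object triples to relational tables can be sketched roughly as follows. This is only an illustration, not the team's actual conversion script: the predicate names (ex:label, ex:requiresSkill), the identifiers and the sample rows are hypothetical placeholders, not real ESCO URIs.

```python
# Hypothetical triples; predicate names and ids are placeholders, not real ESCO URIs.
triples = [
    ("occ:1", "ex:label", "truck driver"),
    ("occ:1", "ex:requiresSkill", "skill:10"),
    ("occ:1", "ex:requiresSkill", "skill:11"),
    ("occ:2", "ex:label", "bus driver"),
    ("occ:2", "ex:requiresSkill", "skill:10"),
    ("skill:10", "ex:label", "drive vehicles"),
    ("skill:11", "ex:label", "load cargo"),
]


def triples_to_tables(triples):
    """Split triples into a label table and an occupation-skill relation table."""
    labels = {}            # entity id -> human-readable label
    occupation_skill = []  # rows of (occupation id, skill id)
    for subject, predicate, obj in triples:
        if predicate == "ex:label":
            labels[subject] = obj
        elif predicate == "ex:requiresSkill":
            occupation_skill.append((subject, obj))
    return labels, sorted(occupation_skill)


labels, occupation_skill = triples_to_tables(triples)
```

Each row of occupation_skill corresponds to one row of a relational table that can then be loaded into an SQL database and joined against the labels.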
Occupation graph
A graph is defined by two entities:
Node - denotes an ESCO occupation. Each occupation
may have additional data attributes attached to it.
Link - two nodes (occupations) are connected when they
are similar to each other.
https://github.com/rainopik/eubdhack-megatrend
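As a rough illustration of the two entities, the sketch below models nodes and links as small records and serializes them to CSV text of the kind a serverless visualizer could load. The field names, identifiers and numbers are assumptions made for this example; the actual columns of g_node and g_link are not given in the slides.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Node:
    occupation_id: str  # placeholder id for an ESCO occupation
    label: str
    job_seekers: int    # supply attribute (assumed field name)
    vacancies: int      # demand attribute (assumed field name)


@dataclass
class Link:
    source: str         # occupation the similarity was computed for
    target: str         # one of its most similar occupations
    similarity: float


def to_csv(rows, row_type):
    """Serialize a list of records to CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(row_type)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Made-up sample data.
nodes = [Node("occ:1", "truck driver", 120, 300), Node("occ:2", "bus driver", 80, 50)]
links = [Link("occ:1", "occ:2", 0.63)]
node_csv = to_csv(nodes, Node)
link_csv = to_csv(links, Link)
```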
Occupation similarity
The similarity between two occupations is the number of
distinct skills required by both occupations (22 in this
example) divided by the number of distinct skills required
by the first occupation (35). This proportion of matching
skills is used as the similarity measure between the two
occupations.
Only the 3 most similar occupations were kept for every
occupation.
https://github.com/rainopik/eubdhack-megatrend
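The similarity measure and the top-3 selection can be sketched as below. The skill sets are synthetic and chosen only to reproduce the 22 out of 35 example; dropping zero-similarity pairs and breaking ties by name are assumptions of this sketch, not details stated in the slides.

```python
def similarity(skills_first, skills_second):
    """Share of the first occupation's distinct skills also required by the second."""
    first, second = set(skills_first), set(skills_second)
    if not first:
        return 0.0
    return len(first & second) / len(first)


def top_links(occupation_skills, k=3):
    """For every occupation, keep links to its k most similar occupations."""
    links = []
    for a, skills_a in occupation_skills.items():
        scored = [
            (similarity(skills_a, skills_b), b)
            for b, skills_b in occupation_skills.items()
            if b != a
        ]
        # Highest similarity first; ties broken by name (assumption of this sketch).
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Pairs without any shared skill are dropped (assumption of this sketch).
        links.extend((a, b, score) for score, b in scored[:k] if score > 0)
    return links


# Synthetic skill ids: 35 skills for the first occupation, 22 of them shared.
first = set(range(35))
second = set(range(13, 50))
example = similarity(first, second)  # 22 / 35

occupations = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}, "d": {1, 5}, "e": {9}}
links = top_links(occupations)
```

Note that the measure is not symmetric: dividing by the skill count of the first occupation means similarity(first, second) and similarity(second, first) generally differ.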
Annotating occupations with supply and demand data
Each node in the occupation graph denotes an ESCO
occupation.
We want to show how each occupation will be affected by
automation or computerization, but the list of Jobs
Susceptible to Automation originally uses SOC occupation
codes.
Mapping ISCO to SOC is one-to-many, which means that
some ISCO occupations (e.g. 8332 - Heavy truck and lorry
drivers) are associated with several SOC occupations
(53-3031 - Driver/Sales Workers and 53-3032 - Heavy and
Tractor-Trailer Truck Drivers) that may have differing
probabilities of automation (0.98 and 0.79 respectively).
To resolve this ambiguity, two probabilities were
calculated for each ISCO occupation: the maximum and
the average.
https://github.com/rainopik/eubdhack-megatrend
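The maximum and average aggregation for a one-to-many mapping can be illustrated with the truck driver example from the text. This is a minimal sketch: the crosswalk is reduced to the two SOC occupations mentioned above, keyed by their titles, and the data structures are assumptions of the example rather than the prototype's actual tables.

```python
from collections import defaultdict
from statistics import mean

# ISCO code -> SOC occupation title, limited to the example in the text.
crosswalk = [
    ("8332", "Driver/Sales Workers"),
    ("8332", "Heavy and Tractor-Trailer Truck Drivers"),
]

# Automation probabilities quoted in the text.
automation_probability = {
    "Driver/Sales Workers": 0.98,
    "Heavy and Tractor-Trailer Truck Drivers": 0.79,
}


def aggregate(crosswalk, probability):
    """Collect the SOC probabilities per ISCO code and reduce them to max and average."""
    grouped = defaultdict(list)
    for isco, soc in crosswalk:
        if soc in probability:  # skip SOC occupations without a probability
            grouped[isco].append(probability[soc])
    return {
        isco: {"max": max(values), "avg": mean(values)}
        for isco, values in grouped.items()
    }


result = aggregate(crosswalk, automation_probability)
# For ISCO 8332 the maximum is 0.98 and the average is (0.98 + 0.79) / 2 = 0.885.
```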
Graph visualization
The graph visualizer is built with d3.js.
Because the real-time calculation of the graph layout
(the position of every node) with d3.js may be slow for
graphs with non-trivial structure, the positions of the
graph nodes were pre-calculated for the occupation graph
with 2950 nodes and 8838 links.
https://d3js.org/
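To illustrate what pre-calculating node positions means, the sketch below runs a basic Fruchterman-Reingold spring layout offline and returns fixed coordinates that a visualizer could load instead of simulating forces in the browser. This is a swapped-in textbook algorithm, not d3.js's force simulation and not the team's actual procedure; the parameters (iterations, seed, temperature) and the sample graph are assumptions.

```python
import math
import random


def precompute_layout(node_ids, links, iterations=200, seed=42):
    """Return {node_id: (x, y)} from a basic Fruchterman-Reingold layout."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in node_ids}
    k = 1.0 / math.sqrt(len(node_ids))  # preferred distance between nodes
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in node_ids}
        # Every pair of nodes repels each other.
        for i, a in enumerate(node_ids):
            for b in node_ids[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Linked nodes attract each other.
        for a, b in links:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Limit each move by a "temperature" that cools towards zero.
        temperature = 0.1 * (1.0 - step / iterations)
        for n in node_ids:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return {n: (round(p[0], 4), round(p[1], 4)) for n, p in pos.items()}


# Made-up sample graph.
node_ids = ["occ:1", "occ:2", "occ:3", "occ:4"]
links = [("occ:1", "occ:2"), ("occ:2", "occ:3")]
layout = precompute_layout(node_ids, links)
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason to run such a computation once in advance rather than on every page load.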
Solution
The source code of the prototype is released as open source at
https://github.com/rainopik/eubdhack-megatrend
The prototype is available online at
https://rainopik.github.io/eubdhack-megatrend/
Google Chrome is the preferred browser.
A close-up of the occupation graph
https://rainopik.github.io/eubdhack-megatrend/
Demand & supply imbalance
The default mode (Show imbalance unchecked) calculates the
saturation (“brightness” of the red colour) of the left
and the right half of the node on the same scale.
Enabling the Show imbalance mode normalizes both colours on
the same scale. This visualizes imbalance: when the left
half of the node is brighter red than the right, this job
has unsatisfied demand. Conversely, a brighter right half
marks jobs with an excess of job seekers.
https://rainopik.github.io/eubdhack-megatrend/
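Normalizing both halves of a node on one shared scale, as in the Show imbalance mode, might look roughly like the sketch below. It assumes that the left half encodes demand (vacancies) and the right half supply (job seekers), and that saturation scales linearly with the count relative to the largest value in the graph. Both are assumptions of this example; the prototype's actual colour scale may differ.

```python
def shared_scale_saturation(nodes):
    """Map {occupation: (vacancies, job_seekers)} to (left, right) saturations in [0, 1].

    Both halves are divided by the same maximum, so they are directly comparable.
    """
    peak = max(max(demand, supply) for demand, supply in nodes.values())
    if peak == 0:
        return {name: (0.0, 0.0) for name in nodes}
    return {
        name: (demand / peak, supply / peak)
        for name, (demand, supply) in nodes.items()
    }


def imbalance(left, right):
    """Interpret which half of the node is brighter."""
    if left > right:
        return "unsatisfied demand"
    if right > left:
        return "excess of job seekers"
    return "balanced"


# Made-up counts: (vacancies, job seekers) per occupation.
counts = {"truck driver": (300, 120), "bus driver": (50, 80)}
saturation = shared_scale_saturation(counts)
status = {name: imbalance(left, right) for name, (left, right) in saturation.items()}
```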
Conclusions
Changes in society in the information age and the creation of
huge quantities of data are also creating challenges for national
statistical offices.
First attempts are being made to use big data sources and to
generate new types of statistics.
With our tool we provide a new way to foresee changes in the
labour market.
To reduce the negative impact of these changes, we need new
tools and data sources so that we can react to them accurately
and in a timely manner.
THANK YOU!