Construction of Micro Genealogy as “Social DNA Sequencing ...
Dealing with Complexity in Society:
From Plurality of Data to Synthetic Indicators
September 17th and 18th, 2015
Ji-Ping Lin (corresponding author)
Research Center for Humanities and Social Sciences, Academia Sinica
128, Sec. 2, Academia Rd., Nankang
115 Taipei, Taiwan
E-mail: [email protected]
Construction of Micro Genealogy as “Social DNA
Sequencing” for The Study of Social Assimilation and
Integration: An Approach Using High Performance
Computing (HPC) Applied to Cumulated Micro Data Sets of
Taiwan Indigenous Peoples
Opening Session
OUTLINE
1. Background
2. Objectives
3. Methods
3.1 Conceptualization of “Social DNA”
3.2 Definition of indicators
3.3 Data
3.4 Computing methodology
4. Results
5. Conclusion and Discussion
Ji-Ping Lin Dealing with Complexity in society
1. Background
Taiwan Indigenous Peoples (TIPs) are a branch of the Polynesian-Malaysian (or Austronesian) ethnic groups in genetic and linguistic terms. Their ancestors had been living in Taiwan for 8,000 years before the influx of Chinese immigrants in the 17th century.
Fig 1, Geographic Distribution of the Austronesians
Source: http://www.taiwandna.com/AborigineAustronesia.jpg
1. Background (cont’d)
Various aspects of TIPs, such as their linguistic systems and cultural infrastructure, do not support “traditional wisdom”, e.g.:
1) Law of Geographic Proximity
2) Zipf’s Power Law
For example, the Formosan languages are a branch of the Austronesian language family and are unrelated to the Tibetan-Han (Sino-Tibetan) language family.
[Figure panels: Tibetan-Han languages; Austronesian languages]
Source: http://historum.com/asian-history/77013-sino-tibetan-languages.html
Source: https://en.wikipedia.org/wiki/Austronesian_languages
1. Background (cont’d)
A Look at TIPs (Taiwan Indigenous Peoples)
[Photos: Sakizaya, Rukai, Seediq, Amis, Tsou, Kavalan, Bunun, Paiwan]
Source: http://thetaiwanphotographer.com/
1. Background (cont’d)
A Look at TIPs (Taiwan Indigenous Peoples)
[Photos: Dao (Yami), Saisiyat, Truku, Puyuma, Thao]
Source: http://thetaiwanphotographer.com/
1. Background (cont’d)
TIPs Population Spatial Distribution:
[Map: spatial distribution of the TIPs population]
1. Background (cont’d)
Based on the author’s previous co-authored studies on the internal migration of TIPs, TIPs are characterized by four features in terms of population distribution and migration:
1. a geographically segregated population distribution;
2. a high propensity to migrate, mostly from rural to urban areas;
3. the periphery of metropolitan areas serving as the main destination choice for TIPs rural-to-urban migrants;
4. a weak ability of TIPs migrants to make onward migration; once repeat migration occurs, most choose return migration (see Map 1).
Source: 2000 Taiwan Population Census
2. Objectives
To propose formal definitions of, and to compute, two group-level synthetic indicators that quantify the level of social integration and social identity, based on (1) an individual inter- & intra-ethnic marriage indicator derived from each person’s marriage match by ethnicity and (2) individual patriarchy & matriarchy indicators derived from the constructed micro genealogy.
3. Methods
3.1 Conceptualization of “Social DNA”
Constructing genealogy as “Social DNA Sequencing”
e.g. a piece of “Social DNA”
3. Methods (cont’d)
3.2 Definition of indicators
Intra-ethnic Marriage Pattern as Indicator of Integration
Definition of IEMI & EMSI:
1. Individual level: for any given pair of spouses, the intra-ethnic marriage indicator IEMI = 1 if they share the same ethnicity, otherwise IEMI = 0;
2. Group level: for a given ethnicity, the ethnic marriage similarity indicator (EMSI) is defined as the mean of all IEMIs over all spouses in the given ethnicity.
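The two definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's Delphi implementation: the function names `iemi` and `emsi`, the input layout (one `(own ethnicity, spouse ethnicity)` pair per married person), and the toy records, including the label "Han" for a non-Indigenous spouse, are all hypothetical.

```python
from collections import defaultdict


def iemi(own_ethnicity: str, spouse_ethnicity: str) -> int:
    """Individual level: 1 if both spouses share the same ethnicity, else 0."""
    return 1 if own_ethnicity == spouse_ethnicity else 0


def emsi(couples):
    """Group level: mean IEMI over all married persons in each ethnicity.

    `couples` is an iterable of (own_ethnicity, spouse_ethnicity) pairs,
    one pair per married person (an assumed input layout).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for own, spouse in couples:
        totals[own] += iemi(own, spouse)
        counts[own] += 1
    return {group: totals[group] / counts[group] for group in counts}


# Toy records, invented for illustration only (not the study's data).
sample = [
    ("Amis", "Amis"),
    ("Amis", "Han"),
    ("Amis", "Atayal"),
    ("Amis", "Amis"),
    ("Atayal", "Han"),
    ("Atayal", "Atayal"),
]
result = emsi(sample)
print(result)
```

With these toy records, two of the four Amis spouses and one of the two Atayal spouses married within their own group, so both groups get an EMSI of 0.5.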
3. Methods (cont’d)
3.2 Definition of indicators (cont’d)
Identity of Matriarchy & Patriarchy
1. Definition of MI (Matriarchy Indicator): a child’s MI = 1 if the child’s registered ethnicity = mother’s ethnicity, otherwise MI = 0;
2. Definition of PI (Patriarchy Indicator): a child’s PI = 1 if the child’s registered ethnicity = father’s ethnicity, otherwise PI = 0;
3. Definition of group MI & PI: for a given ethnic group, its ethnic MI and ethnic PI are defined respectively as the means of all individual MIs and PIs.
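A minimal Python sketch of the MI and PI definitions follows. It rests on assumptions the slides do not state: children are grouped by their own registered ethnicity, a missing parental ethnicity (`None`) counts as a non-match, and the record keys (`ethnicity`, `mother`, `father`) and toy values are hypothetical.

```python
def mi(child_ethnicity, mother_ethnicity) -> int:
    """Matriarchy Indicator: 1 if the child's registered ethnicity equals the mother's."""
    return int(child_ethnicity == mother_ethnicity)


def pi(child_ethnicity, father_ethnicity) -> int:
    """Patriarchy Indicator: 1 if the child's registered ethnicity equals the father's."""
    return int(child_ethnicity == father_ethnicity)


def group_mi_pi(children):
    """Return {ethnicity: (group MI, group PI)} as means of the individual indicators.

    Grouping by the child's registered ethnicity is an assumption of this sketch.
    """
    by_group = {}
    for child in children:
        by_group.setdefault(child["ethnicity"], []).append(child)
    out = {}
    for group, members in by_group.items():
        n = len(members)
        group_mi = sum(mi(c["ethnicity"], c["mother"]) for c in members) / n
        group_pi = sum(pi(c["ethnicity"], c["father"]) for c in members) / n
        out[group] = (group_mi, group_pi)
    return out


# Toy records, invented for illustration only.
children = [
    {"ethnicity": "Bunun", "mother": "Bunun", "father": "Han"},
    {"ethnicity": "Bunun", "mother": "Bunun", "father": "Bunun"},
    {"ethnicity": "Bunun", "mother": "Amis", "father": "Bunun"},
    {"ethnicity": "Bunun", "mother": "Bunun", "father": None},
]
indicators = group_mi_pi(children)
print(indicators)
```

In this toy group, three of four children share the mother's ethnicity and two of four share the father's, so the group MI is 0.75 and the group PI is 0.5. MI and PI need not sum to 1, because a child can match both parents or neither.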
3. Methods (cont’d)
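3.3 Data
Household registration data
Fields: Household ID, time of data creation, PIN, name, spouse name, parents’ names, education, age, marital status, address, birth place, mobility…
[Figure: Indigenous Peoples Basic Living Development Database: integration and dynamic structure of population and administrative data. At each time point (t1, t2, t3), the Indigenous household registration data (populations P_t1, P_t2, P_t3) are linked to administrative data on education, labor and employment, income, housing, health insurance, and medical care. Between time points, the population changes through death or international out-migration and through international in-migration.]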
3. Methods (cont’d)
3.4 Computing methodology
Genealogy: Construction of Micro Kinship & Friendship Network
Recursive build-up process (see source code)
[Diagram of the recursively built network; node labels: spouse, father, mother, spouse’s father, spouse’s mother, friendship]
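A recursive build-up of a kinship network can be sketched in Python as below. The author's actual source code is in Object Pascal and is not reproduced in this transcript, so everything here is an assumption made for illustration: the `Person` record, the `build_genealogy` function, the `depth` limit, and the toy registry. Friendship links, which appear in the diagram, are left out because the slides do not say how they are derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pin: str
    father: Optional[str] = None   # PIN of the father, if matched
    mother: Optional[str] = None   # PIN of the mother, if matched
    spouse: Optional[str] = None   # PIN of the spouse, if matched


def build_genealogy(pin, registry, depth, edges=None, seen=None):
    """Collect (pin, relation, relative_pin) edges reachable within `depth` steps.

    `seen` remembers the largest remaining depth at which each person was
    expanded, so mutual links (e.g. spouse <-> spouse) cannot loop forever.
    """
    if edges is None:
        edges = set()
    if seen is None:
        seen = {}
    if depth <= 0 or pin not in registry or seen.get(pin, 0) >= depth:
        return edges
    seen[pin] = depth
    person = registry[pin]
    for relation in ("father", "mother", "spouse"):
        relative = getattr(person, relation)
        if relative is None:
            continue
        edges.add((pin, relation, relative))
        # Expand the relative in the same way, one step deeper.
        build_genealogy(relative, registry, depth - 1, edges, seen)
    return edges


# Toy registry with hypothetical PINs.
registry = {
    "A": Person("A", father="F", mother="M", spouse="S"),
    "S": Person("S", father="SF", mother="SM", spouse="A"),
    "F": Person("F", father="GF"),
    "M": Person("M"),
}
one_step = build_genealogy("A", registry, depth=1)
two_steps = build_genealogy("A", registry, depth=2)
print(sorted(two_steps))
```

With `depth=1` only A's own father, mother, and spouse links are collected; with `depth=2` the spouse's parents and the paternal grandfather are added as well.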
3. Methods (cont’d)
3.4 Computing methodology (cont’d)
Record matching process
1. Load the pooled data bank into memory (n = 6.2 million).
2. Sort the pooled data bank by gender, family name, given name, and ethnicity, and construct an index file.
3. Load the master data into memory (n = 530 thousand).
4. Retrieve given and family names from the master data to quickly match micro genealogy information via the index file (n = 530 thousand).
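The sort-then-index matching described in the steps above can be sketched in Python with a sorted key list and binary search. This is a hedged illustration, not the author's implementation: the record layout, the function names `build_index` and `lookup`, and the toy names are assumptions, and the in-memory list stands in for the "index file".

```python
from bisect import bisect_left


def build_index(pooled):
    """Sort pooled records by (gender, family name, given name, ethnicity).

    Returns two parallel lists: the sorted keys and the matching PINs.
    """
    keyed = sorted(
        ((r["gender"], r["family_name"], r["given_name"], r["ethnicity"]), r["pin"])
        for r in pooled
    )
    keys = [key for key, _ in keyed]
    pins = [pin for _, pin in keyed]
    return keys, pins


def lookup(keys, pins, gender, family_name, given_name):
    """Return the PINs of all pooled records with this gender and name."""
    prefix = (gender, family_name, given_name)
    # A 3-tuple sorts just before every 4-tuple that starts with it,
    # so bisect_left lands on the first candidate record.
    i = bisect_left(keys, prefix)
    matches = []
    while i < len(keys) and keys[i][:3] == prefix:
        matches.append(pins[i])
        i += 1
    return matches


# Toy pooled data with hypothetical PINs and names.
pooled = [
    {"pin": "P1", "gender": "M", "family_name": "Lin", "given_name": "Wei", "ethnicity": "Amis"},
    {"pin": "P2", "gender": "F", "family_name": "Chen", "given_name": "Mei", "ethnicity": "Bunun"},
    {"pin": "P3", "gender": "M", "family_name": "Lin", "given_name": "Wei", "ethnicity": "Paiwan"},
    {"pin": "P4", "gender": "M", "family_name": "Lin", "given_name": "Hao", "ethnicity": "Amis"},
]
keys, pins = build_index(pooled)
father_candidates = lookup(keys, pins, "M", "Lin", "Wei")
print(father_candidates)
```

Each lookup costs a binary search rather than a scan of the whole pooled data bank, which is what makes matching every master record against millions of pooled records tractable. Name-based matching can return several candidates (here two men named Lin Wei); the slides do not say how such ties are resolved.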
3. Methods (cont’d)
3.4 Computing methodology (cont’d)
Manipulation of digital hardware: in-memory computing is used for the genealogy computation, with the digital hardware overclocked: (1) CPUs, (2) I/O bus, and (3) DRAM.
[Figure panels: DRAM overclocking; I/O bus overclocking; CPU overclocking]
3. Methods (cont’d)
3.4 Computing methodology (cont’d)
Why in-memory computing? To achieve the high performance computing needed to decode the complexity of intertwined micro social networks.
3. Methods (cont’d)
3.4 Computing methodology (cont’d)
Digital hardware infrastructure for the study: Supermicro A7X9-7f motherboard + dual Intel Xeon E5-2680v2 + 256GB ECC DDR3 1600 + 80GB RAM disk + RAID0 of 2*1TB SATA3 Micron Crucial MX200 SSD + nVidia GTX Titan…
3. Methods (cont’d)
3.4 Computing methodology (cont’d)
OS & programming language: Windows 8 Enterprise (x64) + 64-bit Object Pascal, coded in RAD Studio Delphi
[Image: Philae on comet 67P]
4. Results
4.1 Intra-ethnic Marriage Pattern as Indicator of Integration
Ethnic Marriage Similarity Indicator (EMSI)
Ethnicity            Mean   Var
All TIPs             0.15   0.13
Amis                 0.24   0.18
Atayal               0.11   0.10
Paiwan               0.12   0.11
Bunun                0.16   0.13
Rukai                0.07   0.06
Puyuma               0.02   0.02
Tsou                 0.09   0.08
Saysiyat             0.10   0.09
Tao                  0.04   0.04
Thao                 0.03   0.03
Kavalan              0.00   0.00
Taroko               0.06   0.05
Sakizaya             0.00   0.00
Sediq                0.01   0.01
Undocumented Indi.   0.04   0.04
The ethnic marriage similarity indicator (EMSI) is defined as the mean of all IEMIs over all spouses in the given ethnicity; integration declines as EMSI increases.
In terms of the extent of integration of TIPs with the Taiwan population system, ethnic population size is negatively associated with integration.
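A short derivation may help in reading the Var column. It assumes, which the slides do not state explicitly, that Var is the variance of the individual 0/1 indicators within each group. Writing $x_i \in \{0, 1\}$ for the individual IEMIs of a group of $n$ spouses and $p$ for their mean (the EMSI):

```latex
\begin{aligned}
\operatorname{Var}(x)
  &= \frac{1}{n}\sum_{i=1}^{n} (x_i - p)^2
   = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - p^2 \\
  &= p - p^2 \qquad \text{(since } x_i^2 = x_i \text{ for } x_i \in \{0,1\}\text{)} \\
  &= p\,(1 - p)
\end{aligned}
```

Under that assumption the variance carries no information beyond the mean. For example, the All TIPs mean of 0.15 gives $0.15 \times 0.85 \approx 0.13$ and the Amis mean of 0.24 gives $0.24 \times 0.76 \approx 0.18$, both matching the table.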
4. Results (cont’d)
4.2 Identity of Matriarchy & Patriarchy
             Ethnic Matriarchy Indicator   Ethnic Patriarchy Indicator
Ethnicity    Mean   Var                    Mean   Var
All TIPs     0.59   0.24                   0.55   0.25
Amis         0.61   0.24                   0.57   0.25
Atayal       0.56   0.25                   0.53   0.25
Paiwan       0.56   0.25                   0.53   0.25
Bunun        0.65   0.23                   0.56   0.25
Rukai        0.59   0.24                   0.55   0.25
Puyuma       0.45   0.25                   0.57   0.24
Tsou         0.64   0.23                   0.48   0.25
Saysiyat     0.54   0.25                   0.62   0.24
Tao          0.41   0.24                   0.66   0.22
Thao         0.38   0.24                   0.62   0.24
Kavalan      0.53   0.26                   0.47   0.26
Taroko       0.53   0.25                   0.51   0.25
Sakizaya     1.00   0.00                   0.00   0.00
Sediq        0.71   0.21                   0.31   0.22
In terms of ethnic identity, TIPs’ matriarchy identity tends to slightly outweigh patriarchy identity.
This finding fits general wisdom and TIPs’ cultural tradition.
5. Conclusion and Discussion
1. With the gradual availability of massive micro data and the decline of digital hardware costs, computation for social complexity, such as the construction of micro genealogy, becomes feasible;
2. But the computing issues are challenging, and the total cost of computing is still high in terms of time;
3. The emerging data science, which integrates multi-disciplinary skills and knowledge (“hacking skills”, “advanced math/stat”, and “domain knowledge”), is crucial to overcoming such constraints.