Multiperspective visualization of genealogy data

Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University (Linköpings universitet)
SE-601 74 Norrköping, Sweden

LiU-ITN-TEK-A--18/023--SE

Anna Georgelis

2018-06-14

Master's thesis (examensarbete) carried out in Media Technology at the Institute of Technology, Linköping University

Anna Georgelis

Supervisor: Katerina Vrotsou
Examiner: Camilla Forsell

Norrköping, 2018-06-14

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Anna Georgelis

LINKÖPING UNIVERSITY

MASTER THESIS

Multiperspective Visualization of Genealogy Data

Author: Anna GEORGELIS

Supervisor: Katerina VROTSOU

June 19, 2018

Abstract

This thesis presents and discusses the implementation of a web application developed as a Master’s degree project at Linköping University. The application is a tool offering a multiperspective visualization of genealogy data that genealogists can use to analyze their collected family tree data, and also to find data that may be wrong. Data stored in a GEDCom file is processed and stored in a database. Using D3.js, the data is then visualized in three different types of representations: an ancestor tree, a sunburst chart and a lifeline representation, all interacting with each other. The work concludes that by using different types of visualizations to present the same data, it is possible to create a genealogy application where new kinds of insights about the data can be gained.

Acknowledgements

First of all I want to thank my supervisor Katerina Vrotsou and my examiner Camilla Forsell for all their help, discussions and feedback during this thesis work. I also want to thank Per Filipsson and Hjalmar Granberg for contributing with interesting discussions, knowledge, ideas and feedback from a genealogical point of view.

To my family I want to say a big thank you for their continuous support through my studies. And last but not least, thanks to you Bassam for always being there for me!

Norrköping, June 2018
Anna Georgelis

Contents

Abstract
Acknowledgements
Glossary

1 Introduction
    1.1 Aim
    1.2 Problem Description
    1.3 Research Questions
    1.4 Limitations
    1.5 Report Structure

2 Background
    2.1 Genealogy
    2.2 GEDCom
    2.3 Related Work
        2.3.1 Tree Drawing Algorithms
        2.3.2 Time Focused Visualization In Genealogy

3 Method
    3.1 Implementation
    3.2 Data Parsing
    3.3 Data Processing
    3.4 Visualization
        3.4.1 Ancestor Tree
        3.4.2 Sunburst
        3.4.3 Lifeline Representation

4 Results
    4.1 Ancestor Tree
    4.2 Sunburst
    4.3 Lifelines
    4.4 Colors and Interaction
    4.5 Evaluation Meeting

5 Discussion
    5.1 Ancestor Tree
    5.2 Sunburst
    5.3 Lifeline Representation
    5.4 Colors and Interaction
    5.5 Implementation

6 Conclusion & Future work
    6.1 Research Questions
    6.2 Future Work

List of Figures

2.1 A simplified GEDCom file.
2.2 An individual record from the sample GEDCom file.
2.3 A family record from the sample GEDCom file.
2.4 The same data set drawn in different formations of a binary tree. The numbers indicate the level of the nodes (where level 0 is the root).
2.5 Genealogy data visualized in different formations.
2.6 The same sample tree, created with the two algorithms presented by Wetherell and Shannon [1].
2.7 A sample tree drawn by different algorithms.
2.8 A lifeline visualization created with TimeNets [2].
2.9 A pedigree chart created with Genelines [3].
2.10 A full descendants chart created with Genelines [3].
2.11 A family group chart created with Genelines [3].
2.12 A direct line chart created with Genelines [3].
2.13 A fan chart created with Genelines [3].
2.14 An example of an ancestor tree created by Mukaliyev [4].
2.15 An example of a descendants tree created by Mukaliyev [4].
2.16 An example of a visualization created by Mukaliyev. Person 1 is the central person, and persons 2, 3 and 4 are the ancestors fulfilling the rule of having an only upgoing path of birth lines to the central person [4].
2.17 A lifeline where personal events have been added [4].

3.1 The three types of collections stored in the database.
3.2 An example of a hierarchical array, written in pseudocode.
3.3 Two versions of the same sample tree.
3.4 The drawing process of the nodes.
3.5 A comparison of the two sizing techniques. As can be seen, the red node in the left image is much smaller than the same node colored in red in the right image.

4.1 The final result of the application.
4.2 An ancestor tree created with the application.
4.3 A pop-up window appears when hovering a person in the tree.
4.4 A sunburst chart created with the application.
4.5 The sunburst chart when hovering a person.
4.6 Multiple paths of ancestors can be highlighted.
4.7 If a node is missing data, it will be drawn in red.
4.8 A lifeline representation created with the application.
4.9 The lifelines sorted and colored based on their sex.
4.10 The lifeline representation when a line is being hovered.
4.11 The paths of ancestors highlighted in the sunburst chart, drawn in different colors.
4.12 The lines represent the average age of each century.
4.13 The default colors used in the application.
4.14 The colors used in special cases.
4.15 The three visualizations of the application interacting with each other.
4.16 The first version of the ancestor tree.
4.17 The different looks of the sunburst chart.

5.1 Two ancestor trees with their own characteristic shapes.
5.2 One of the ancestors has four parents according to the tree, due to an error in the database.
5.3 An error in the database found by the lifeline representation.

6.1 An example of how a descendants tree could look when hovering one of the ancestors.
6.2 A sample tree drawn along a timeline.

Glossary

Central person: The root person on whom the visualizations are based

Lifeline: A line that represents a person’s life, where the length of the line corresponds to the lifespan of that person

Chapter 1

Introduction

Genealogy is a hobby whose popularity has grown rapidly in recent years. In America it is the second most popular hobby [5]. But what is it that makes genealogy so interesting? Is it finding kinships with people you did not even know existed? Getting to know ancestors who lived hundreds of years ago? Learning about your name and how that name has developed through the generations? Or is it about gaining insights into your family and how different historical events have affected your present life? There are many possible reasons that may attract new groups of genealogy practitioners.

One major reason for the growing interest is the internet. With the internet it is not only easier to collect information, keep in contact and share experiences with other genealogists; it is also easier to get a visual representation of the collected data. There is a lot of genealogy software that helps you create visualizations. Common to most existing genealogy software is the use of GEDCom (section 2.2), a file format in which genealogy data is stored.

1.1 Aim

Genealogy data is usually visualized with so-called family trees. However, these trees are often difficult to navigate, and do not present much more information than the hierarchical relationships between people and their birth and death dates. In a GEDCom file it is possible to store different types of data (for instance temporal or geographical data) that get lost in the traditional visualizations. Tools presenting this kind of data would lead to a more vivid genealogy experience.

The aim of this thesis work is to examine how different visualization techniques could be used to enable intuitive navigation through a GEDCom-based family tree, and how a multiperspective labeling of historical individuals can be used to gain new insights from genealogical visualizations.

1.2 Problem Description

At present, two problems are mainly discussed regarding the visualization of genealogy data. One is the difficulty of processing large data sets. A family tree containing a large number of people will be cluttered and hard to interpret and navigate. The other problem concerns visualizing temporal attributes. Currently, there are many visualizations available that show hierarchical relations, but examples that include temporal data are sparse and usually limited to including the temporal information in the text labels.

After discussion with two experienced genealogists, it also emerged that a problem with genealogy today is the difficulty of getting an overview of where in one’s family tree data is missing, and what data may be wrong.

These problems lead to the research questions that this work will try to answer.

1.3 Research Questions

• When visualizing a large data set, how can a family tree be visualized to simplify the navigation in the tree?

• How can genealogy data be visualized from a temporal point of view?

• How can genealogy data be visualized to clearly indicate where there is missing data?

1.4 Limitations

The application developed in this work will only work as a tool to visualize and analyze the genealogy data stored in a GEDCom file. No functionality for adding new persons to the database, or for modifying the already stored data, will therefore be implemented.

Due to time constraints, the application will focus on visualizing only the ancestors of a central person.

1.5 Report Structure

The structure of this report is as follows:

• Chapter 2 - Background: This chapter will initially present some history about genealogy. Then, the GEDCom format will be described, followed by related work regarding methods for creating a tidy drawing of a tree, and for visualizing genealogy data from a temporal point of view.

• Chapter 3 - Method: In this chapter the different methods used for creating the application will be presented.

• Chapter 4 - Results: The final application will be presented in this chapter.

• Chapter 5 - Discussion: In this chapter, the methods used, as well as the final result, will be discussed.

• Chapter 6 - Conclusion & Future work: A conclusion based on the aim and research questions of this work will be presented in this chapter. Also, possibilities for extensions and future work will be included here.

Chapter 2

Background

2.1 Genealogy

Genealogy, defined as the study of families and their origin [6], is a tool for proving kinship between people and is used for several purposes. Some people use genealogy to find living relatives, some use it for medical reasons (to find genetic diseases within the family), and some use it just as a hobby in order to create a family tree that ranges as far back as possible. The interest in genealogy has grown dramatically in recent years, making it the second most popular hobby in the United States [5]. However, the view of genealogy has not always been as positive. As a result of the American Revolution, politics in the country became unstable, and the usually huge respect for ancestors decreased. For many, genealogy was seen as something elitist. About 100 years later, after the Civil War, the United States began to regain its stability and prosperity. This resulted in a nation attracting a wave of immigrants, many of whom were met with hostility. Nativism spread throughout the country. During this time, genealogy became a tool for ideologies rooted in concepts such as heredity and race [7].

The publication of the book Roots: The Saga of an American Family [8] by Alex Haley in 1976 is said to be the start of the great interest in genealogy. The book follows the life of Kunta Kinte, an African youth abducted and sold into slavery in North America, and the lives of his descendants. The book convinced people around the world that each family has its own important story to tell. Regardless of origin, race or prosperity, it became respectable to search for one’s ancestors [7].

With the internet, genealogy became more accessible. The internet facilitated the process of collecting data, and it also became easier both to keep in contact with other genealogists and to create family trees. This made the whole genealogy process easier, and is one of the main reasons why the interest in genealogy has grown so much in recent years [5]. Today there are several genealogy software packages in which users can store their family information as well as visualize the data (commonly with a family tree). Many of these packages have in common that they use the GEDCom file structure (section 2.2), a data representation for genealogy data, which enables exchanging data between them.

2.2 GEDCom

GEDCom, short for GEnealogical Data Communication, is a data representation for genealogy data, developed and presented by The Church of Jesus Christ of Latter-day Saints [9]. The GEDCom data format is used to exchange genealogical data between different genealogy software packages.

FIGURE 2.1: A simplified GEDCom file.

A GEDCom file usually consists of three parts: the header, the individual records part and the family records part. Figure 2.1 shows a sample from a GEDCom file. The header includes basic information about the file, for example the name of the software source (SOUR), the GEDCom version (5.5) and the character encoding (ASCII).

Both the individual records part and the family records part contain a stream of related records. Every record is described by a series of hierarchically organized tagged lines, where every line consists of a level number, a tag and an optional value. A line with level number 0 always indicates the first line of a new record. Every line can have subordinate lines, whose level numbers reflect their hierarchical relationship: a subordinate line always has a level number one higher than the line it belongs to. Figure 2.2 presents an example of an individual record from a GEDCom file.

The first line of the individual record states that this is a record describing a new individual (INDI) that, in the example of figure 2.2, has been given the identification number "I1". The second line has the level number 1, the tag NAME and the value John /Doe/. Thus, the individual I1 has the name John Doe.

Lines 3-5 in the example describe the birth (BIRT) of John Doe. The lines "2 DATE 01 JAN 1950" and "2 PLAC Stockholm" both have level 2, which means that they are subordinate lines of "1 BIRT". The tag PLAC describes a place and DATE is the date of an event. Thus, John Doe was born on January 1, 1950, in Stockholm. Accordingly, lines 6-8 describe the death (DEAT) of John Doe.

The last line, 1 FAMS @F1@, states that John Doe belongs to the family with the identification number "F1". An individual can be linked to a family either by the tag "FAMS" or the tag "FAMC", where FAMS means that the individual is one of the spouses or parents of the family and FAMC that the individual is a child of the family. Since a person can be both a parent and a child, it is possible to be linked to more than one family.

A family record example can be seen in figure 2.3. A family record contains information about who belongs to the family. A mother (WIFE), a father (HUSB) and children (CHIL) are connected to the family by their identification numbers. Every family record is also given its own identification number. In addition to the information presented in the figure, a family record can for instance also contain information about the marriage between the parents.

Every GEDCom file ends with the line 0 TRLR. The GEDCom Standard [9] defines all tags that can be used in a GEDCom file.
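
The line structure described in this section can be illustrated with a short Python sketch. It is not the parser used in the application, only a minimal example that assumes the line format described above. The sample is assembled from the lines quoted in the text; the header and the DEAT lines are left out, and the family record with the identifiers I2 and I3 is hypothetical. The names Node, parse_line and parse_records are made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Abridged sample. The INDI lines are the ones quoted in the text;
# the FAM record and the ids I2 and I3 are hypothetical.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 01 JAN 1950
2 PLAC Stockholm
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None      # record id such as "I1" (level-0 lines only)
    children: list = field(default_factory=list)

    def child(self, tag):
        """Return the first subordinate line with the given tag, or None."""
        return next((c for c in self.children if c.tag == tag), None)


def parse_line(line):
    """Split one line into level number, optional record id, tag and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if parts[1].startswith("@"):    # "0 @I1@ INDI": the id precedes the tag
        rest = parts[2].split(" ", 1)
        value = rest[1] if len(rest) > 1 else ""
        return Node(level, rest[0], value, parts[1].strip("@"))
    return Node(level, parts[1], parts[2] if len(parts) > 2 else "")


def parse_records(text):
    """Group lines into records; a level-0 line always starts a new record."""
    records, stack = [], []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        node = parse_line(raw)
        # Drop lines that are not above this one in the hierarchy.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            records.append(node)
        stack.append(node)
    return records


records = parse_records(SAMPLE)
john = records[0]
print(john.xref, john.child("NAME").value)      # I1 John /Doe/
print(john.child("BIRT").child("DATE").value)   # 01 JAN 1950
```

Each level number decides where a line is attached: a line is added beneath the most recent line with a lower level number, which is how "2 DATE" ends up under "1 BIRT".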

FIGURE 2.2: An individual record from the sample GEDCom file.

FIGURE 2.3: A family record from the sample GEDCom file.

2.3 Related Work

This section starts with an overview of how a binary tree can be used for visualizing genealogy data. Tree drawing algorithms are then presented, followed by a review of time-focused visualization approaches used in genealogy.

2.3.1 Tree Drawing Algorithms

Probably the most common way to visualize genealogy data is with a family tree. Trees are in general a typical data structure for presenting hierarchical data. A tree is built of nodes and lines connecting the nodes. The nodes represent the objects in the data set, and the lines show the hierarchical relationships between the objects [10]. In many cases, a specific node is the root of the tree; the tree is then called a rooted tree. When drawing a rooted tree, some aesthetic rules are often adopted. Placing nodes at the same level along the same horizontal line is one example of such a rule [11]. A common form of rooted tree is the binary tree, in which each node has at most two child nodes. Figure 2.4a shows a classic formation of a binary tree. In this figure the aesthetic rule of placing nodes at the same level along the same horizontal line is adopted. This kind of binary tree is common when visualizing genealogy data. Figure 2.5a shows an example of how a binary tree can be used to create an ancestor tree.

A binary tree can be drawn in other formations than the one presented in figure 2.4a. One example is the H-tree, figure 2.4b. Compared to the “classical” layout of a binary tree, the H-tree uses more of the drawing surface, which Tuttle, Nonato and Silva [12] consider an advantage when visualizing genealogy data. Figure 2.5b shows how Tuttle, Nonato and Silva use the H-tree to create a pedigree chart. The central person (the root node) is placed in the middle. The parents of the central person are then placed on each side of the central person, in either the vertical or the horizontal direction. The next generation of ancestors (the grandparents of the central person) is placed in the same manner, on each side of their respective children. The placement direction alternates between horizontal and vertical from one generation to the next. This process continues until all persons in the data set are drawn. The nodes then form a fractal pattern, similar in shape to the letter H.
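
The placement rule of the H-tree can be sketched in a few lines of Python. This is an illustration and not the implementation by Tuttle, Nonato and Silva. Three things are assumptions made here: the Ahnentafel-style numbering (the parents of person n are 2n and 2n+1), the choice to place the first generation of parents horizontally, and the shrink factor of 1/sqrt(2) per generation, which is chosen so that branches of later generations stay apart.

```python
def h_tree_positions(generations, step=1.0, shrink=0.5 ** 0.5):
    """Return {person_number: (x, y)} for an H-tree pedigree layout.

    Person 1 is the central person; the parents of person n are numbered
    2n and 2n+1 (an assumption made for this sketch).
    """
    positions = {1: (0.0, 0.0)}                 # central person in the middle
    for n in range(1, 2 ** (generations - 1)):
        x, y = positions[n]
        depth = n.bit_length() - 1              # generation of person n (0 = root)
        offset = step * shrink ** depth         # assumed shrink per generation
        if depth % 2 == 0:                      # parents to the left and right
            positions[2 * n] = (x - offset, y)
            positions[2 * n + 1] = (x + offset, y)
        else:                                   # parents above and below
            positions[2 * n] = (x, y - offset)
            positions[2 * n + 1] = (x, y + offset)
    return positions


layout = h_tree_positions(3)
print(layout[2], layout[3])     # parents of the central person
print(layout[4], layout[5])     # parents of person 2
```

With three generations, persons 2 and 3 end up to the left and right of the central person, and persons 4 and 5 end up on either side of person 2 in the vertical direction, which gives the first stroke of the H shape.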

Radial layouts can also be used when visualizing hierarchical data. Instead of being drawn along horizontal lines, the nodes are drawn along concentric circles with the root placed in the middle. Nodes at the same level are placed on the same concentric circle. This can be seen in figure 2.4c. Radial layouts have also been used when visualizing genealogy data. An example is shown in figure 2.5c.
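
A radial placement of a full binary ancestor tree can be sketched as follows. The sketch assumes the same hypothetical numbering as a pedigree list (the parents of person n are 2n and 2n+1) and spreads each generation evenly around its circle; the function name and the ring_spacing parameter are made up for the illustration and do not come from any of the cited tools.

```python
import math


def radial_positions(generations, ring_spacing=1.0):
    """Return {person_number: (x, y)} with generation k on circle number k."""
    positions = {}
    for n in range(1, 2 ** generations):
        depth = n.bit_length() - 1          # generation = index of the circle
        slot = n - 2 ** depth               # position within the generation
        angle = 2 * math.pi * (slot + 0.5) / 2 ** depth
        radius = depth * ring_spacing       # the root has radius 0 (the middle)
        positions[n] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


layout = radial_positions(3)
for number, (x, y) in sorted(layout.items()):
    print(number, round(x, 2), round(y, 2))
```

Because each generation has twice as many slots as the previous one, the two parents of a person fall on either side of that person's own angle, one circle further out. This is the same idea that a sunburst chart uses, with arcs instead of points.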

(a) A classical formation. (b) An H-tree formation. (c) A radial layout.

FIGURE 2.4: The same data set drawn in different formations of a binary tree. The numbers indicate the level of the nodes (where level 0 is the root).

(a) An ancestor tree created by Progeny Genealogy [13]. (b) Genealogy data visualized as an H-tree by Claurissa Tuttle [12]. (c) Genealogy data visualized with a sunburst chart, by Progeny Genealogy [14].

FIGURE 2.5: Genealogy data visualized in different formations.

Wetherell and Shannon discuss the difficulties of creating a tidy drawing of a tree, and present two algorithms in an attempt to solve these problems [1]. These algorithms are designed for binary trees, but can be modified to draw an arbitrary tree. According to Wetherell and Shannon, a tidy tree needs to fulfill both physical and aesthetic requirements. Since the drawing surface is usually bounded in one or two dimensions, the tree cannot use unlimited space and should therefore not be too big. Because of this, they implemented a physical limit in their algorithms that tries to draw the tree with as small a width as possible. The aesthetic requirements that a tidy tree needs to fulfill according to Wetherell and Shannon are as follows:

• Nodes of a tree at the same height should lie along a straight line, and the straight lines defining the levels should be parallel.

• In a binary tree, each left son should be positioned left of its father and each right son right of its father.

• A parent should be centered over its children.

Figure 2.6 shows two drawings of the same sample tree, one for each of the resulting algorithms. The left tree, figure 2.6a, fulfills all of the aesthetic requirements, but it does not have the smallest width possible. Because of this, Wetherell and Shannon modified the algorithm, resulting in the right tree, figure 2.6b. The difference between the algorithms is that in the modified algorithm, subtrees are folded beneath their ancestors where possible. This results in a narrower tree. On the other hand, the tree no longer fulfills the last aesthetic rule. As can be seen in figure 2.6b, the parent of node A is not centered over its children, which gives a less tidy tree from an aesthetic point of view.

(a) A sample tree fulfilling the aesthetic rules defined by Wetherell and Shannon. (b) A narrower version of the same sample tree, created with the modified algorithm.

FIGURE 2.6: The same sample tree, created with the two algorithms presented by Wetherell and Shannon [1].

The problems occurring when drawing a tree with the algorithms presented by Wetherell and Shannon are discussed by Reingold and Tilford [15]. They conclude that the problems arise because, when a tree is drawn with these algorithms, the shape of a subtree is affected by the positions of the nodes outside that subtree. As a result, a symmetric tree may be drawn asymmetrically; that is, a tree and its reflection will not always produce mirror-image drawings. This is not desirable behaviour, and to prevent it, Reingold and Tilford formulate a fourth aesthetic rule that should be fulfilled when drawing a tidy tree:

• A tree and its mirror image should produce drawings that are reflections of one another; moreover, a subtree should be drawn the same way regardless of where it occurs in the tree.

As an improvement to the Wetherell and Shannon algorithms, they present a new algorithm that fulfills the fourth aesthetic rule as well as the three former rules. Figure 2.7 shows a sample tree drawn by the three different algorithms.

Even though the algorithm presented by Reingold and Tilford produces tidy and pleasing drawings, the trees are not drawn with the smallest width possible. However, Reingold and Tilford consider the fourth aesthetic rule to be of greater importance than minimum width, since the aesthetic rule aids human perception.
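
To make the four aesthetic rules concrete, the following Python sketch lays out a binary tree in a deliberately naive way and then checks the mirror rule. It is neither the Wetherell and Shannon algorithms nor the Reingold and Tilford algorithm; it makes no attempt to reduce the width of the drawing, so it is much wider than either of them would produce. The class Tree and the functions layout, mirror, coords and is_mirror_consistent are hypothetical names. One simplification should be noted: a node with a single child is centered between that child and an empty reserved slot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None
    x: float = 0.0      # horizontal position, filled in by layout()
    y: int = 0          # level, filled in by layout()


def layout(root):
    """Assign x and y to every node and return the number of slots used."""
    slots = 0

    def take_slot():
        nonlocal slots
        slots += 1
        return float(slots - 1)

    def place(node, depth):
        node.y = depth                      # rule 1: one straight line per level
        if node.left is None and node.right is None:
            node.x = take_slot()            # leaves take consecutive slots
        else:
            # A missing child still reserves a slot, which keeps a left child
            # left of its parent and a right child right of it (rule 2).
            lx = place(node.left, depth + 1) if node.left is not None else take_slot()
            rx = place(node.right, depth + 1) if node.right is not None else take_slot()
            node.x = (lx + rx) / 2          # rule 3: parent centered
        return node.x

    place(root, 0)
    return slots


def mirror(node):
    """Return a new tree with left and right swapped at every node."""
    if node is None:
        return None
    return Tree(mirror(node.right), mirror(node.left))


def coords(node):
    """Collect the (x, y) pairs of all nodes."""
    if node is None:
        return []
    return [(node.x, node.y)] + coords(node.left) + coords(node.right)


def is_mirror_consistent(root):
    """Rule 4: the drawing of the mirrored tree is the reflected drawing."""
    flipped = mirror(root)
    width = layout(root) - 1
    layout(flipped)
    reflected = sorted((width - x, y) for x, y in coords(root))
    return reflected == sorted(coords(flipped))


sample = Tree(Tree(Tree()), Tree(Tree(), Tree()))
print(layout(sample), sample.x, sample.left.x, sample.right.x)  # 4 1.5 0.5 2.5
print(is_mirror_consistent(sample))                             # True
```

The sketch satisfies the mirror rule only because every subtree is drawn independently of its surroundings. The difficulty addressed by the published algorithms is keeping that property while also pushing subtrees closer together.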

(a) Drawn by the first algorithm presented by Wetherell and Shannon [15]. (b) Drawn by the modified algorithm presented by Wetherell and Shannon [15]. (c) Drawn by the algorithm presented by Reingold and Tilford [15].

FIGURE 2.7: A sample tree drawn by different algorithms.

2.3.2 Time Focused Visualization In Genealogy

TimeNets, proposed by Nam Wook Kim et al. [2], is a visualization tool in which hierarchical relationships are combined with a timeline to present a family tree in a temporal context. In addition, TimeNets includes a technique for processing large data sets. In this technique, every individual in the data set is assigned a degree-of-interest. Based on this degree-of-interest, the individual is either included in or excluded from the visualization.

The visualizations in TimeNets are based on a horizontal timeline on which time runs from left to right, figure 2.8. Each individual is represented with a lifeline. These lines are placed along the timeline so that the left end is at the time of the person’s birth and the right end is at the time of the person’s death. The vertical axis is used to visualize different kinds of relationships between individuals. A marriage between two persons is represented by their lifelines converging. Likewise, a divorce is represented by two lifelines diverging.

FIGURE 2.8: A lifeline visualization created with TimeNets [2].

To indicate a parent-child relationship, TimeNets uses a drop line that connects the parents’ lifelines with the child’s lifeline. These lines are dotted and transparent in order to prevent the visualization from getting too cluttered. Where a parent has more than one child, the children are ordered vertically based on their birth dates. The youngest child is placed closest to the parents’ lines, and the oldest farthest away. Sorting the lines like this minimizes the number of intersecting lines, which also results in a less cluttered visualization.
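
The two placement ideas described so far, the horizontal extent of a lifeline and the vertical ordering of siblings, can be sketched in Python. This is not code from TimeNets; the Person class, the pixel scale and the example names and years are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    birth: int      # birth year
    death: int      # death year


def lifeline_extent(person, start_year, px_per_year=2.0):
    """Return (x_left, x_right) of a lifeline on a left-to-right timeline."""
    return ((person.birth - start_year) * px_per_year,
            (person.death - start_year) * px_per_year)


def order_children(children):
    """Order siblings so that the youngest comes first (closest to the parents)."""
    return sorted(children, key=lambda child: child.birth, reverse=True)


# Hypothetical siblings, listed in arbitrary order.
siblings = [
    Person("Eldest", 1920, 1990),
    Person("Youngest", 1930, 2000),
    Person("Middle", 1925, 1980),
]
for child in order_children(siblings):
    print(child.name, lifeline_extent(child, start_year=1900))
```

The left end of each line depends only on the birth year and the right end only on the death year, so the length of the line corresponds to the lifespan of the person.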

As a default, the lifelines will be colored based on sex, blue for male and red forfemale. However, this can be changed to make the colors represent other data, e.g.an individual’s geographical settlement. The colors will then represent different geo-graphical locations, and the line will shift colors depending on when the representedperson moved to another location. Another example of alternative color coding is todetect an individual’s various diseases, when they occurred and how long they kept.

As mentioned in section 1.2, one of the main problems when visualizing genealogy data is the difficulty of processing large data sets. When the visualization includes a large number of people it easily becomes cluttered and hard to interpret. To mitigate this, TimeNets uses a degree-of-interest (from now on called DOI) technique. A DOI is calculated for each individual and compared to a threshold value. If the DOI of an individual is larger than the threshold, he or she is included in the visualization, otherwise not. The DOI of all individuals in the data set is calculated relative to the central person. The central person gets the highest DOI. The DOI then decreases for each individual with the distance to the central person. For people related to the central person, the DOI decreases linearly, while for marital relationships, it decreases more slowly.
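The DOI filtering described above can be sketched as follows. This is a minimal illustration and not the TimeNets implementation: the decay constants, the graph format and the function names `computeDoi` and `visiblePersons` are assumptions made for this example. Only the general idea comes from the description above, namely that the central person gets the highest DOI, that the DOI decreases with distance, that it decreases more slowly across marriages, and that it is compared to a threshold.

```javascript
// Hypothetical decay values; the values used by TimeNets are not given here.
const BLOOD_DECAY = 1.0;    // assumed decrease per parent/child step
const MARRIAGE_DECAY = 0.5; // assumed smaller decrease per marriage step

// graph: { personId: [{ to: personId, type: "blood" | "marriage" }, ...] }
function computeDoi(graph, centralId, maxDoi = 10) {
  const doi = { [centralId]: maxDoi }; // the central person gets the highest DOI
  const queue = [centralId];
  while (queue.length > 0) {
    // Always continue from the person with the highest DOI found so far.
    queue.sort((a, b) => doi[b] - doi[a]);
    const current = queue.shift();
    for (const edge of graph[current] || []) {
      const decay = edge.type === "marriage" ? MARRIAGE_DECAY : BLOOD_DECAY;
      const candidate = doi[current] - decay;
      // Keep the best (highest) DOI reachable for each person.
      if (!(edge.to in doi) || candidate > doi[edge.to]) {
        doi[edge.to] = candidate;
        queue.push(edge.to);
      }
    }
  }
  return doi;
}

// Persons whose DOI is larger than the threshold are included in the drawing.
function visiblePersons(doi, threshold) {
  return Object.keys(doi).filter((id) => doi[id] > threshold);
}
```

With these assumed constants, a spouse of the central person ends up with a higher DOI than a parent, which mirrors the slower decrease for marital relationships.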

Unlike a classical, static layout of a family tree, the TimeNets approach offers a tool for analyzing the data and gaining new kinds of insights. By presenting the temporal data in a clear way, the user will “get to know” the family and their history better. Another strength of the visualizations created by TimeNets is the ability to easily present divorces and remarriages. Since these are very common in today’s society, tools that support a clear presentation of divorces and remarriages are highly relevant.

Despite its strengths, TimeNets is not an optimal tool in all situations. It works well when analyzing only a few closely related individuals, as in the example presented in figure 2.8. In other situations, where the aim may be to get an overview of the whole family, or to compare two individuals placed far from each other due to differences in birth years, it works less well.

Genelines [3] is another software tool that visualizes genealogy data along a timeline. Genelines can create five different kinds of visualizations presenting hierarchical relationships: a pedigree chart, a full descendant chart, a family group chart, a direct line chart and a fan chart. The first four visualizations have in common that every individual, just like in TimeNets, is represented by a lifeline.

Each visualization has its own main task. The pedigree chart, figure 2.9, draws the central person together with his or her ancestors. As in TimeNets, parents are paired with their child using a dashed vertical line. In contrast, instead of placing the child beneath its parents, Genelines places the child between its parents. The colors and the placement of the lifelines indicate whether the individual is a maternal or paternal ancestor. A maternal ancestor is drawn in red beneath the central person, while a paternal ancestor is drawn in blue above the central person. When there is no information about an individual’s birth or death date, Genelines estimates that date. An estimated date is represented by the lifeline not being completely filled with color, either at the beginning or at the end of the line, depending on whether the birth or the death date is missing.

FIGURE 2.9: A pedigree chart created with Genelines [3].

The full descendant chart presents all descendants of the central person. The central person is drawn at the top together with his or her spouse, colored in green, figure 2.10. Every child of the central person is drawn, together with their respective descendants, in different colors.

FIGURE 2.10: A full descendant chart created with Genelines [3].

The family group chart visualizes the lifelines of the central person and his or her children, spouse and parents, figure 2.11. In this visualization the lifelines are colored blue for men and red for women. Historical events that are important to the family can also be added to the family group chart. They are visualized as vertical bands.

The direct line chart connects a person with a chosen descendant or ancestor by showing only those people who form the path between the two, figure 2.12. As with the family group chart, historical events can be added to the direct line chart.

FIGURE 2.11: A family group chart created with Genelines [3].

FIGURE 2.12: A direct line chart created with Genelines [3].

Finally, it is possible to create a fan chart. Similar to the other visualizations, the fan chart is based on a timeline. The central person is drawn in green in the middle, paternal ancestors are drawn on the left side and maternal ancestors on the right side of the central person.

One advantage of Genelines is that it can create different visualizations of the same data, and each chart gives a different type of insight. Hence, Genelines is a broad tool that covers several different needs. However, the different charts are neither linked together nor visualized at the same time. The user could gain even more kinds of insights if the charts were linked together and interacted with each other.

FIGURE 2.13: A fan chart created with Genelines [3].

Mukaliyev presents a new method for visualizing genealogy data [4]. The method is an attempt to combine methods for handling large data sets with methods for visualizing data against a timeline.

Like TimeNets and Genelines, Mukaliyev uses lifelines placed along a horizontal axis to represent individuals. The proposed solution consists of two merged trees, an ancestor tree and a descendant tree. For the ancestor tree, the same technique as in Genelines is used: each individual’s lifeline is placed vertically between its parents’ lifelines, figure 2.14. The descendant tree, on the other hand, uses the same technique as TimeNets, where a child’s lifeline is placed underneath both parents’ lines, figure 2.15.

FIGURE 2.14: An example of an ancestor tree created by Mukaliyev [4].

FIGURE 2.15: An example of a descendant tree created by Mukaliyev [4].

A visualization where every individual in the data set is included will be hard to understand, especially when it comes to large data sets. It would also be hard to avoid intersecting lines, which make the visualization even harder to interpret. To avoid this, Mukaliyev chooses to draw only a part of the data, and lets the user affect which part is shown. Before any tree is drawn, the user chooses which person in the data should be the central person. The ancestor tree is drawn first, containing all ancestors of the central person. All ancestors have their own descendant tree, but in order to avoid a cluttered visualization not all descendant trees can be drawn. Therefore, Mukaliyev introduced a rule saying that descendants are drawn only for an ancestor whose path of birth lines to the central person goes only upwards. This rule is further explained in figure 2.16. Person 1 is the central person, and persons 2, 3 and 4 are the ancestors fulfilling the rule of having an only upgoing path of birth lines to the central person. Thus, a descendant tree will only be drawn for persons 2, 3 and 4. As a result of this rule, no birth line intersects any lifeline.

FIGURE 2.16: An example of a visualization created by Mukaliyev. Person 1 is the central person, and persons 2, 3 and 4 are the ancestors fulfilling the rule of having an only upgoing path of birth lines to the central person [4].

However, this does not mean that a descendant tree cannot be drawn for any of the other ancestors. By clicking on any ancestor, the tree will be redrawn, but with the greatest ancestor of the chosen ancestor as the lowermost person in the visualization. Thus, a different set of ancestors will now fulfill the rule. By default, the greatest paternal ancestor is the lowermost ancestor in the visualization.

If the user clicks on a lifeline that does not belong to an ancestor of the central person, the person whose lifeline was clicked becomes the new central person. Both the ancestor tree and the descendant tree are then redrawn based on the new central person. Mukaliyev thus gives the user great influence over what data is drawn.

Even though only some parts of the data are drawn, the visualization can still become cluttered and contain a large number of lines. Because of this, Mukaliyev introduced techniques for zooming and panning. Zooming only works vertically, which means that the user can increase the vertical distance between the lifelines. The user can also drag the visualization around without zooming and thus affect which part of the visualization is visible.

The colors of the lifelines depend on gender, blue for male and red for female. Personal events, such as diseases, can be added to an individual. This is visualized by coloring the part of the lifeline corresponding to when the event took place in another color. At both ends of the event, a dot is drawn in the same color (see figure 2.17). This helps the user keep track of when different events start and end in cases where events overlap in time.

FIGURE 2.17: A lifeline where personal events have been added [4].

One of the strengths of Mukaliyev’s approach is the set of rules deciding the placement of the lines and which descendants are drawn. Thanks to these rules, the number of intersecting lines is minimized, which is of great importance for keeping the visualization from becoming too cluttered. Another strength is that Mukaliyev gives the user great influence over which data is drawn. Since the user can change which data is drawn, it does not matter that all data cannot be drawn at the same time.

Like the TimeNets approach, Mukaliyev’s approach is not suitable in situations where the user wants to compare two individuals placed far from each other or wants to get an overview of the whole family. When visualizing a large data set, there is a risk that the resulting visualization becomes cluttered due to the number of lines.

Chapter 3

Method

This chapter describes the approach and the different techniques used to create a tool for visualizing genealogy data.

Two experienced genealogists were involved during this thesis work. Before any of the implementation was done, regular meetings were held in order to discuss some difficulties regarding genealogy today, as well as different techniques, representations, functionalities and layouts. After a first version of the application was implemented, an evaluation meeting was held where the result was discussed and the genealogists gave their feedback on what could be improved.

The result of this thesis is a web application built using the MEAN stack. MEAN stands for MongoDB [16] (a NoSQL database that stores its data in a JSON-like format), Express [17] (a flexible back end web application framework that runs on top of Node.js), AngularJS [18] (a front end web application framework) and Node.js [19] (an asynchronous, event-driven JavaScript runtime), which are all JavaScript-based technologies. The MEAN stack was used due to its strengths in building fast, efficient and well structured applications that are easy to maintain. On top of the MEAN stack, D3.js [20], a JavaScript library for creating powerful visualizations, was used.

3.1 Implementation

A database was created in the application for holding the GEDCom data. The database consists of three different collections: one containing all individuals, one containing all families, and one containing all events. In order to simplify the handling and structuring of the objects in the database, Mongoose [21] was used. Mongoose is an object modeling tool for MongoDB. With Mongoose a Schema is created for each collection, defining what information is stored in every object of the collection [22]. The Schemas created in the application are called Indi, Family and Event, and are defined according to figure 3.1. In the Event collection, an object stores all events related to a specific person. In this application, the only event data saved concerns births, deaths and marriages, since this is the only temporal data needed for the implemented functionality (presented in section 4.3). The Event collection can be expanded with information about other events as well as information about where a specific event took place, since this is data that can be stored in a GEDCom file.

(a) A schema defining an object in the Indi collection.

(b) A schema defining an object in the Family collection.

(c) A schema defining an object in the Event collection.

FIGURE 3.1: The three types of collections stored in the database.

Node.js and MongoDB are asynchronous. The difference between synchronous and asynchronous programming is that with synchronous programming, the code is executed in the same order in which it is written, from top to bottom. When a function is executed, the program waits for it to finish before moving on to the next part of the code. Asynchronous code, on the other hand, does not have to wait for a function to run to completion; other statements can be executed while the function is still waiting for its result. The flow is therefore no longer strictly sequential from top to bottom. The advantage of writing asynchronous code is that the program gets faster and more efficient. Operations that are independent of each other can be in progress at the same time, which saves time. [23] However, when several asynchronous functions depend on each other’s results, this may cause difficulties. Asynchronous functions use callbacks to define what will happen when the function completes. Combining multiple asynchronous functions leads to nested callbacks, which are difficult to handle. The utility module async [24] was used in this project in order to simplify the handling of the asynchronous callbacks. The method async.waterfall was used to control the flow in the code. async.waterfall executes an array of asynchronous functions sequentially, and each function passes its result to the next function in the array [25].
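The waterfall pattern can be illustrated with a small hand-written stand-in. The function `waterfall` below is not the async library's code, only a sketch of the behaviour described above, and the three example steps (looking up a person, then a family, then formatting a text) are hypothetical.

```javascript
// Minimal stand-in for the waterfall pattern (not the async library itself).
// Each task receives the results of the previous task, followed by a
// callback of the form callback(err, ...results).
function waterfall(tasks, done) {
  let index = 0;
  function next(err, ...results) {
    // Stop at the first error, or when all tasks have run.
    if (err || index === tasks.length) {
      return done(err || null, ...results);
    }
    const task = tasks[index++];
    task(...results, next);
  }
  next(null);
}

// Hypothetical steps; setTimeout stands in for asynchronous database calls.
waterfall(
  [
    (cb) => setTimeout(() => cb(null, { id: "I1", famc: "F1" }), 10),
    (person, cb) => setTimeout(() => cb(null, person, { id: person.famc }), 10),
    (person, family, cb) => cb(null, `${person.id} is a child in ${family.id}`),
  ],
  (err, summary) => {
    if (err) throw err;
    console.log(summary);
  }
);
```

Compared with nesting the three callbacks inside each other, the steps are listed at the same level and in the order in which they run.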

Express is a web application framework for Node.js, which contributes to faster and better structured code. In this application it was used for setting up a local web server and for simplifying the routing. When receiving an incoming request in the form of a URL from the client (in this case from AngularJS), Express routes the request to a certain part of the code. That code is then executed, and a response is sent back to the client. [26]

3.2 Data Parsing

The data in the GEDCom file needed to be parsed in order to be saved in the database. This was done with a JavaScript script that reads the file line by line and processes its data. Since GEDCom files often contain data about a large number of people, they become very large, which can cause problems for the application trying to read them. The application only has a limited amount of memory allocated, and each line read uses some of that memory. After reading a large number of lines, the application will run out of memory and stop working. To avoid this, the proposed application divides the submitted GEDCom file into smaller parts, reads one part, processes its data, and then discards it before continuing with the next part. This means that memory is only allocated for one part at a time; hence, the application won’t run out of memory.
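Reading the file part by part can be sketched with a generator. The sketch assumes that a part is one GEDCom record, that is, a level-0 line together with the lines below it; how the application actually divides the file is not stated here. The names `readRecords` and `countRecordTypes` are hypothetical.

```javascript
// Assumption: a "part" is one GEDCom record, i.e. a level-0 line plus the
// lines that follow it up to the next level-0 line.
function* readRecords(lines) {
  let current = [];
  for (const line of lines) {
    if (line.startsWith("0 ") && current.length > 0) {
      yield current; // hand over the finished record
      current = [];  // and let go of it before reading on
    }
    if (line.trim() !== "") current.push(line);
  }
  if (current.length > 0) yield current;
}

// Processes one record at a time; only a small summary is kept in memory.
function countRecordTypes(lines) {
  const counts = {};
  for (const record of readRecords(lines)) {
    const type = record[0].split(" ").pop(); // e.g. "INDI", "FAM", "HEAD"
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}
```

Here `lines` is any iterable of text lines. In Node.js the lines could instead come from the `readline` module reading a file stream, in which case the loops would need the `for await` form.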

The data was processed in order to be saved in the three different collections defined in figure 3.1. As described in section 2.2, each individual and each family has a unique identification number that is used to connect individuals to a specific family or event. Each person can be connected to multiple families, either as a parent (tag = FAMS) or as a child (tag = FAMC).
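As an illustration of how the FAMS and FAMC tags connect a person to families, the following sketch extracts these links from the lines of one individual record. The function name and the field names `parentIn` and `childIn` are hypothetical and do not reproduce the Indi schema of figure 3.1.

```javascript
// record: the lines of one individual, for example
// ["0 @I1@ INDI", "1 NAME Anna /Test/", "1 FAMC @F1@"]
function parseIndividualLinks(record) {
  const [, id] = record[0].split(" "); // "@I1@"
  const person = { id, parentIn: [], childIn: [] };
  for (const line of record.slice(1)) {
    const [level, tag, value] = line.trim().split(/\s+/);
    if (level !== "1") continue; // the family links sit at level 1
    if (tag === "FAMS") person.parentIn.push(value); // family where the person is a parent
    if (tag === "FAMC") person.childIn.push(value);  // family where the person is a child
  }
  return person;
}
```

Both fields are arrays because, as stated above, a person can be connected to multiple families.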

3.3 Data Processing

In order for the data stored in the database to be visualized, it first needs to be processed. Not all persons in the data set are drawn simultaneously, and therefore the application needs to know which data to draw. At present, due to time constraints, the application is focused on visualizing only the ancestors of the central person (section 6.2 discusses how the application could be extended to visualize descendants as well). Based on the central person, an array is created containing all the ancestors of that person. The visualizations created by the application are based on the data stored in that array. The user can change the central person (this is described in section 4.1). A new array is then created before the visualizations are redrawn.

The array is created using a recursive function. The function works as described below:

1. Add the central person to the array

2. Examine if the person at the current position of the array has a mother and/or a father stored in the database

3. If yes, add them to the array

4. If this is the last position of the array, return and save the array. Otherwise, continue to the next position of the array, then repeat steps 2-4.

By repeating these steps, all ancestors will be added to the array. Hence, the application will know which data to visualize.
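The four steps can be sketched as a small recursive function. The object `parentsOf` is a hypothetical in-memory lookup that stands in for the queries against the database, and the function name `collectAncestors` is likewise an assumption.

```javascript
// parentsOf[id] = { father: id or null, mother: id or null }
// (hypothetical in-memory stand-in for the database lookups)
function collectAncestors(centralId, parentsOf) {
  const ancestors = [centralId]; // step 1: start with the central person
  function visit(position) {
    // step 2: does the person at this position have parents stored?
    const { father, mother } = parentsOf[ancestors[position]] || {};
    // step 3: if so, add them to the array
    if (father) ancestors.push(father);
    if (mother) ancestors.push(mother);
    // step 4: stop at the last position, otherwise continue with the next
    if (position === ancestors.length - 1) return ancestors;
    return visit(position + 1);
  }
  return visit(0);
}
```

The result is a flat array ordered generation by generation. An ancestor who can be reached through two different lines would appear twice in this sketch.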

To be able to create the array, the collection called Family in the database (figure 3.1b) is used, because the necessary hierarchical relationships are stored in this collection. After being created, the array was restructured in order to represent the correct hierarchy. Figure 3.2 shows a sample array that illustrates the structure of the hierarchical array.

FIGURE 3.2: An example of a hierarchical array, written in pseudocode.

3.4 Visualization

Just like Genelines, the application in this work consists of different types of visualizations. A difference, though, is that in this application all visualizations are visible at the same time. The visualizations are also linked with each other, allowing interaction between them through brushing and selections. The representations created are an ancestor tree, a sunburst chart, and a bar-chart-like representation where each individual is presented as a lifeline ordered and aligned according to various temporal events.

The ancestor tree, or more accurately the tree as a data structure in general, has, as discussed earlier, difficulties in drawing large data sets in a neat and understandable way. Despite this, an ancestor tree is still used in this work. This is because genealogy data represented as a tree is widely recognized, which also helps the user to understand the other, less common, visualizations.

To create the visualizations, D3.js is used. D3.js makes use of SVG (Scalable Vector Graphics), a format in which vector-based graphics are defined. With SVG, different kinds of objects can be created, like lines, graphs and figures of different complexity. An advantage of SVG objects is that they maintain the same high quality regardless of resolution and size. It is thus possible to zoom in on an object without any quality loss. [27] SVG objects can be styled, which is done using CSS, as well as transformed. They also support animation. Different event handlers can be applied to an SVG object. [28] Examples of event handlers used in this application are "onmouseover" and "onclick", which handle what happens when the user hovers over an object and when the user clicks on an object.

3.4.1 Ancestor Tree

The ancestor tree is a binary tree with the root placed at the bottom. One of the aims of the ancestor tree is that the user should get a clear overview of the whole tree, thus all ancestors should be visible at the same time. The user should not have to pan in order to see all ancestors. For all ancestors to fit in the same view the tree cannot be too big, and some rules for the node positioning need to be adopted. Wetherell and Shannon [1] present three aesthetic rules and one physical limit that should be adopted when drawing a tidy tree, see section 2.3. They implemented two algorithms following these rules differently. The resulting trees from their algorithms can be seen in figure 2.6. The first tree fulfills the aesthetic rules, but not the physical limit since it is wider than the smallest possible width. The second tree, on the other hand, has the smallest width possible, but does not fulfill the third aesthetic rule. This work assumes that the first tree is the better choice when visualizing genealogy data. This is because it is important that the hierarchical relationships are clear, and that the user has no problem understanding them. A tree that is more aesthetically appealing makes this easier. With the additional aesthetic rule presented by Reingold and Tilford, the tree will be even more aesthetically appealing, which further eases the understanding of the tree.

The aesthetic rules described by Wetherell and Shannon are applied in this work, but with a modification to the second rule. The second rule says that every left child should always be positioned to the left of its parent, and every right child to the right of its parent. According to the modified rule, this only applies when the parent node has two children. If the parent node has only one child, the two are instead placed straight above each other, because this results in a narrower tree. This is clarified in figure 3.3. The fourth aesthetic rule is also applied in this work. The aesthetic rules adopted when drawing the ancestor tree are thus:

• Nodes of a tree at the same height should lie along a straight line, and the straight lines defining the levels should be parallel.

• In a binary tree, each left child should be positioned left of its parent and each right child right of its parent, except in those cases where the parent has only one child. The child should then be positioned straight above its parent.

• A parent should be centered beneath its children.

• A tree and its mirror image should produce drawings that are reflections of one another.

The physical limit saying that the tree should be drawn with the smallest width possible will also be applied, as long as it does not break any of the aesthetic rules.
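To make the adopted rules concrete, the following simplified sketch assigns positions that satisfy the four aesthetic rules. It is not the algorithm used in the application and does not aim for the smallest possible width: leaves simply fill horizontal slots from left to right. The node format and the name `layoutTree` are assumptions.

```javascript
// node: { id, left: node or null, right: node or null }
// In the ancestor tree, left would be the father and right the mother.
function layoutTree(root) {
  const positions = {};
  let nextSlot = 0; // next free horizontal slot for a leaf
  function place(node, depth) {
    const children = [node.left, node.right].filter(Boolean);
    let x;
    if (children.length === 0) {
      x = nextSlot++;                    // leaves fill the slots left to right
    } else if (children.length === 1) {
      x = place(children[0], depth + 1); // modified rule 2: in line with the only child
    } else {
      const xLeft = place(node.left, depth + 1);
      const xRight = place(node.right, depth + 1);
      x = (xLeft + xRight) / 2;          // rule 3: parent centered between its children
    }
    positions[node.id] = { x, y: depth }; // rule 1: y depends only on the level
    return x;
  }
  place(root, 0);
  return positions;
}
```

Since the left subtree is always placed before the right one, a left child ends up to the left of its parent and a right child to the right, and mirroring the tree mirrors the drawing.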

(a) A sample tree drawn according to therules defined by Wetherell and Shannon.

(b) A narrower version of the same sample tree, due to the modified rule.

FIGURE 3.3: Two versions of the same sample tree.

In figure 3.4 the drawing process of the nodes is described. They are drawn in the order defined in figure 3.4a. The algorithm deciding which node will be drawn next is recursive, and always checks whether the node just drawn has any child node. If it does, that child node is drawn next, before any remaining child of the previously drawn node. This is repeated recursively, which results in the nodes being drawn in the described order.

Each time a new node is drawn, the position of each node is examined and updated when needed. This is to ensure that the aesthetic rules are always fulfilled. The vertical position of a node is fixed and depends on the level of the node. The horizontal position is the one that may change during the drawing procedure. Figure 3.4b shows an example of how the nodes change position each time a new node is added.

(a) The order in which the nodes are drawn.

(b) The position changes when drawing the first six nodes.

FIGURE 3.4: The drawing process of the nodes.

3.4.2 Sunburst

The fan chart drawn by Genelines, figure 2.13, is drawn along a horizontal timeline. An advantage of this is that it is easy to compare the difference in birth years between people of the same generation. On the other hand, the fan chart can be a bit misleading, as it is easy to interpret the width of a node as the total age of the person represented by that node. This is not a correct interpretation, since the width of the node depends on the birth year of its child node, and not on the person’s death date. The width will therefore not represent the entire life span of the person. With this in mind, a sunburst diagram that is not drawn along a timeline has been chosen for this application. The placement of the nodes only represents the hierarchical relations. Nodes belonging to the same generation are drawn along the same concentric circle. The order in which the nodes are drawn is the same as described in section 3.4.1.

Nodes along the same concentric circle will not necessarily have the same size. The size of a node depends on the total number of nodes in the branch that the node is a part of. When comparing two nodes at the same level, where one belongs to a branch containing, for example, five nodes and the other belongs to a branch containing fifteen nodes, the latter will be the larger one. Sizing the nodes like this is advantageous when one or more of the branches are very long. In each branch, the size of the nodes decreases with every generation. In a very long branch, the node at the end becomes very small and, in the worst case, hard for the user to see, as illustrated in figure 3.5. Determining the size of a node based on the size of its branch reduces the number of such cases. Another reason for using this technique is that it makes it more apparent where in the data set more data has been collected. The user then gets an indication of in which part of the family he or she should collect more data.
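One way to express this sizing is to divide the angular span of a node among its children in proportion to the number of nodes in each child's branch. The sketch below is written under that assumption and is not taken from the application; the node format and the names `countNodes` and `assignAngles` are hypothetical.

```javascript
// node: { id, children: [node, ...] } (children may be omitted for a leaf)
function countNodes(node) {
  return 1 + (node.children || []).reduce((sum, c) => sum + countNodes(c), 0);
}

// Gives every node an angular interval [start, end) and a depth (its circle).
function assignAngles(node, start = 0, end = 2 * Math.PI, depth = 0, out = {}) {
  out[node.id] = { start, end, depth };
  const children = node.children || [];
  const total = children.reduce((sum, c) => sum + countNodes(c), 0);
  let angle = start;
  for (const child of children) {
    // A child gets a share of the parent's span that is proportional
    // to the number of nodes in its own branch.
    const span = (end - start) * (countNodes(child) / total);
    assignAngles(child, angle, angle + span, depth + 1, out);
    angle += span;
  }
  return out;
}
```

With this division, a parent whose known branch holds three persons receives three times the span of a parent about whom nothing further is known, so long branches keep more room for their outermost nodes.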

(a) A sunburst chart where all nodes at the same concentric circle have the same size.

(b) A sunburst chart where the size of a node depends on the number of nodes creating the branch.

FIGURE 3.5: A comparison of the two sizing techniques. As can be seen, the red node in the left image is much smaller than the same node colored in red in the right image.

3.4.3 Lifeline Representation

A bar-chart-like representation is used to visualize the individuals as lifelines. Nam Wook Kim et al. [2], Daniyar Mukaliyev [4], and Genelines [3] all use lifelines in their works to visualize genealogy data. Unlike those representations, though, no hierarchical relationships are presented in this lifeline representation. The lifelines are sorted vertically according to a user-selected variable, such as birth year or sex. A vertical axis is included in the representation, which represents the time of an arbitrary event (such as getting married, or having one’s first child), selected by the user from a drop-down menu, see figure 4.8. All lifelines are then horizontally aligned along this axis with respect to the chosen event. As the lifeline represents the life span of a person, each year of that person’s life corresponds to a certain point on the lifeline. If the vertical axis is set to represent the time of getting married, the lifeline is horizontally positioned so that it intersects the axis at the point corresponding to the age at which the represented person got married. Each lifeline is positioned as described; hence, it is possible to compare the ages of the visualized persons at different life events.
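The alignment amounts to expressing each lifeline in years relative to the chosen event. The following sketch shows the arithmetic; the person format and the name `alignLifeline` are assumptions made for illustration.

```javascript
// person: { id, birthYear, deathYear, events: { marriage: year, firstChild: year } }
// Returns offsets in years relative to the vertical axis (0 = the chosen event).
function alignLifeline(person, eventName) {
  const eventYear = person.events[eventName];
  if (eventYear === undefined) return null; // without the event the line cannot be placed
  return {
    id: person.id,
    start: person.birthYear - eventYear, // negative: years lived before the event
    end: person.deathYear - eventYear,   // positive: years lived after the event
    ageAtEvent: eventYear - person.birthYear,
  };
}
```

A person born in 1850 who married in 1875 and died in 1920 gets a lifeline from -25 to 45, so the line crosses the axis at the point corresponding to the age of 25.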

The lifeline representation is designed like this because of the assumption that additional insights and correlations can be found by comparing the lives of the ancestors, instead of presenting their hierarchical relationships. Also, the hierarchical relationships are already visualized in the ancestor tree and the sunburst chart.

Chapter 4

Results

This work has resulted in the implementation of a web application for visualizing different aspects of genealogy data stored in a GEDCom file. Figure 4.1 presents the final result, consisting of three different representations: an ancestor tree, a sunburst chart and a lifeline representation, all interacting with each other. In this chapter, the result is presented in detail.

FIGURE 4.1: The final result of the application.

4.1 Ancestor Tree

The ancestor tree can be seen in figure 4.2. Each person is drawn as a small dot, and the dots are linked together with curved lines representing the hierarchical relationships. At each branching, the male ancestor is placed to the left of the parent node, and the female ancestor is placed to the right. When the parent node has only one child node, the child node is placed straight above its parent node according to the second aesthetic rule (section 3.4.1); thus, the user cannot tell from the position whether the ancestor is male or female. To make sure that the user can always determine whether a person is male or female, the person’s gender is presented as a symbol in a pop-up window that appears when hovering over that person. The pop-up window also contains the person’s name and birth year, as well as a number in parentheses giving the number of generations between the hovered person and the central person, figure 4.3 (since the data in the application is real, no names are presented in the figures of this report; the individual identification numbers are used instead). When a person is hovered, the path between that person and the central person is highlighted, while the rest of the tree becomes transparent so that the user can focus on the hovered person. Multiple colors are used to draw the nodes. The colors are further discussed in section 4.4.

By double-clicking on one of the nodes in the tree, the corresponding ancestor becomes the new central person. The ancestor tree, as well as the sunburst chart and the lifeline representation, is then redrawn based on the new central person.

FIGURE 4.2: An ancestor tree created with the application.

FIGURE 4.3: A pop-up window appears when hovering a person in the tree.

4.2 Sunburst

The sunburst chart, similar to the ancestor tree, presents all the ancestors and their hierarchical relationships, but in a radial layout, figure 4.4. Each person is drawn as an arc, a part of a circle. Persons who are part of the same circle belong to the same generation. The root, which represents the central person, is placed in the middle, and for every generation a new circle is built around the root. Just like in the tree representation, the male ancestor is placed to the left and the female ancestor to the right. The colors used in the sunburst are the same as those used in the ancestor tree (described in section 4.4).

As with the ancestor tree, the user can get additional information about a person by hovering over it. An identical pop-up window then appears, containing the person’s gender, name and birth year, as well as the number of generations to the central person, figure 4.5. When the user hovers over a person, the path of ancestors connecting that person to the central person is highlighted. This is done by making the rest of the sunburst transparent.

The user can also choose to click on a specific person. The path of ancestors will then remain highlighted. The user can highlight one or more paths like this, figure 4.6. By selecting the checkbox “Draw selected data” in the lifeline representation (section 4.3), the highlighted ancestors will be the only ancestors drawn as lifelines. This is further clarified in the next section.

In the sunburst chart it is also possible to show which ancestors, according to the current settings in the lifeline representation, are missing data needed for them to be drawn as lifelines. This is done by clicking the radio button “Show missing data”. The persons missing data will then be drawn in red instead of their default color, figure 4.7.
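The “Show missing data” check can be illustrated with a small sketch. It assumes that a lifeline needs a birth date, a death date and a date for the currently selected event; the field names and the sample persons are hypothetical and not taken from the application's database.

```python
REQUIRED_BASE = ("birth", "death")


def missing_fields(person, event):
    """List the date fields a person lacks for the chosen event alignment."""
    required = REQUIRED_BASE + (event,)
    return [field for field in required if person.get(field) is None]


def sunburst_color(person, event, default_color):
    # Red marks persons that cannot be drawn as lifelines under the current setting.
    return "red" if missing_fields(person, event) else default_color


# Hypothetical sample persons (years only, for brevity).
people = [
    {"id": "I1", "birth": 1850, "death": 1910, "marriage": 1875, "first_child": 1876},
    {"id": "I2", "birth": 1820, "death": None, "marriage": 1848, "first_child": 1850},
    {"id": "I3", "birth": 1790, "death": 1860, "marriage": None, "first_child": 1815},
]
```

Because the check depends on the selected event, a person such as I3 in the sample would be marked red for “Marriage” but not for “Birth of first child”.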

FIGURE 4.4: A sunburst chart created with the application.

FIGURE 4.5: The sunburst chart when hovering over a person.

FIGURE 4.6: Multiple paths of ancestors can be highlighted.

FIGURE 4.7: If a node is missing data, it will be drawn in red.

4.3 Lifelines

In figure 4.8, the lifeline representation can be seen, where each lifeline represents a person. In the example shown in the figure, the vertical axis is set to represent the time when the visualized ancestors had their first child. The user can change which event the vertical axis represents by choosing another option in the drop-down menu. Currently, the available options are “Marriage” and “Birth of first child”. When the user switches between these options, the lines are realigned around the vertical axis with respect to the chosen event.

The user can also define how the lines are sorted vertically. By default, they are sorted by birth date, with the youngest person placed at the top. It is also possible to sort the lines by birth date with the oldest person placed at the top. The user can also choose to sort the lines based on gender. This makes possible differences between men and women clearer, for instance in their age when having their first child. When the lines are sorted based on gender, the men are placed at the top and the women at the bottom.
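The alignment and sorting described above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: dates are reduced to years, the field names are hypothetical, and the secondary ordering by birth year within each gender is an assumption of the sketch.

```python
def align(person, event):
    """Shift a lifeline so that the chosen event sits at x = 0 (in years)."""
    t = person[event]
    return {
        "id": person["id"],
        "start": person["birth"] - t,   # negative: years lived before the event
        "end": person["death"] - t,     # positive: years lived after the event
        "age_at_event": t - person["birth"],
    }


def sort_key(mode):
    if mode == "youngest_first":        # default: latest birth year at the top
        return lambda p: -p["birth"]
    if mode == "oldest_first":
        return lambda p: p["birth"]
    if mode == "gender":                # men at the top, women at the bottom
        return lambda p: (p["sex"] != "M", p["birth"])
    raise ValueError(mode)


# Hypothetical sample persons.
people = [
    {"id": "I1", "sex": "M", "birth": 1850, "death": 1910, "marriage": 1875, "first_child": 1876},
    {"id": "I2", "sex": "F", "birth": 1853, "death": 1920, "marriage": 1875, "first_child": 1876},
    {"id": "I3", "sex": "M", "birth": 1820, "death": 1880, "marriage": 1848, "first_child": 1850},
]
rows = [align(p, "first_child") for p in sorted(people, key=sort_key("gender"))]
```

Switching the event only changes the second argument of `align`, which mirrors how the lines are realigned when another option is chosen in the drop-down menu.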

By default, the lifelines are colored with the same colors as the ancestor tree and the sunburst chart. However, the user can change the coloring in order to present the gender instead, figure 4.9. The lifeline of a male ancestor will then be colored blue, and the line of a female ancestor will be colored red. This also clarifies the possible differences between men and women.

FIGURE 4.8: A lifeline representation created with the application.

FIGURE 4.9: The lifelines sorted and colored based on their sex.

For more information about who a particular lifeline represents, the user can hover over that line, figure 4.10. The rest of the lines then become transparent, while the hovered line retains its color. A pop-up window also appears, as with the ancestor tree and the sunburst chart, presenting the name and birth year of the person, as well as the age of the person at the time of the represented event.

FIGURE 4.10: The lifeline representation when a line is being hovered.

When the data set visualized in the application is too big, not all ancestors fit in the lifeline representation at the same time, since the drawing surface is too small. To slightly reduce the data set, the persons who do not have all the needed information stored in the database (e.g. birth or death date, or a date for the event represented by the vertical axis) are not drawn. Moreover, functionality for scrolling in the lifeline representation has been implemented. Thus, even though all lifelines cannot be seen at the same time, the user can scroll through the lines in order to see all of them.

In order to give the user greater control over which ancestors are visualized in the lifeline representation, functionality for selecting ancestors in the sunburst has been implemented. This is done, as described in section 4.2, by highlighting one or more paths of ancestors in the sunburst chart. The lines will then be drawn in colors other than the default colors. The paths of chosen ancestors alternate between two different shades of blue, figure 4.11. This distinguishes which lines belong to the same path, so that they can be compared against each other. The sunburst chart will still be colored with the default colors.

It is also possible to set the lifeline representation to present the average age at a particular event. Each line in the visualization then represents a specific century, where the length of each line corresponds to the average life length in that century. The average age at the particular event is then calculated for each century, figure 4.12. Here as well, the lines alternate between two different shades of blue so that the user can distinguish them from each other. When the user hovers over a line, a pop-up window appears, containing the average age for that specific century, as well as which century the line represents.
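The per-century averages can be sketched as a simple grouping step. The sketch assumes that a person is assigned to a century by birth year and that incomplete persons are skipped; neither detail is stated in the text, and the field names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def averages_by_century(people, event):
    """Average life length and average age at `event`, grouped by century of birth."""
    groups = defaultdict(list)
    for p in people:
        if None in (p.get("birth"), p.get("death"), p.get(event)):
            continue  # assumption: persons with incomplete data are left out
        groups[p["birth"] // 100 * 100].append(p)  # 1790 -> 1700, 1850 -> 1800
    return {
        century: {
            "avg_lifespan": mean(p["death"] - p["birth"] for p in members),
            "avg_age_at_event": mean(p[event] - p["birth"] for p in members),
        }
        for century, members in sorted(groups.items())
    }


# Hypothetical sample persons.
people = [
    {"birth": 1710, "death": 1770, "marriage": 1735},
    {"birth": 1790, "death": 1840, "marriage": 1817},
    {"birth": 1850, "death": 1920, "marriage": 1874},
    {"birth": 1860, "death": None, "marriage": 1880},
]
result = averages_by_century(people, "marriage")
```

Each entry of `result` then corresponds to one line in the averaged view: the line length comes from `avg_lifespan` and the alignment from `avg_age_at_event`.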

FIGURE 4.11: The paths of ancestors highlighted in the sunburst chart, drawn in different colors.

FIGURE 4.12: The lines represent the average age of each century.

4.4 Colors and Interaction

The colors used by default to draw all ancestors in the different representations can be seen in figure 4.13. Four base colors are used, each alternating between five different shades. As can be seen in figure 4.2 and figure 4.4, the four different colors clearly divide the graphs into four branches, one for each grandparent. For each color, the different shades alternate between the generations. All persons belonging to the same generation will thus be drawn with the same shade.
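The default coloring rule can be expressed compactly. The following sketch assumes that each ancestor is identified by the steps leading to it from the central person ('F' for father, 'M' for mother). The palette names, the neutral color for the central person and the parents, and the exact mapping from generation to shade are assumptions of the sketch, not details given in the text.

```python
BRANCHES = {"FF": "blue", "FM": "green", "MF": "purple", "MM": "orange"}  # hypothetical palette
SHADES = 5


def node_color(path):
    """path: steps from the central person, 'F' = father, 'M' = mother (e.g. 'FMF')."""
    generation = len(path)
    shade = generation % SHADES        # same shade for everyone in a generation
    if generation < 2:
        return ("neutral", shade)      # assumption: no branch color before the grandparents
    return (BRANCHES[path[:2]], shade)
```

For example, `node_color("MFM")` gives the base color of the maternal grandfather's branch with the shade of the third generation, and the shade repeats after five generations.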

In those cases where the lines are drawn in colors other than the default, the colors presented in figure 4.14 are used. The colors are selected to be easy to distinguish from each other. This is to make sure that different groupings, like the one in figure 4.11, will be clearly understood.

FIGURE 4.13: The default colors used in the application.

FIGURE 4.14: The colors used in special cases.

In figure 4.15 an example is shown where the different representations interact with each other. When the user hovers over a person in any of the representations, that person will, as described earlier, be highlighted. This does not only happen in the current representation: the hovered person is highlighted in the rest of the representations as well.

FIGURE 4.15: The three visualizations of the application interacting with each other.

4.5 Evaluation Meeting

After implementing all three representations, a meeting was held with the two genealogists mentioned earlier (section 1.2). The purpose of the meeting was to evaluate the different representations. The feedback from the genealogists led to some changes to the result.

In figure 4.16, the first version of the ancestor tree can be seen. It did not contain any colors; the persons were drawn as black dots, and the information appearing when hovering over a person contained only the name and birth year of that person. The changes discussed were that the nodes in the ancestor tree should be colored with the same colors as the sunburst chart and the lifeline representation, since that creates a better connection between all representations. It was also suggested that when hovering over a person, there should be a number presenting the number of generations between the hovered person and the central person. This is because in a large tree, it can be difficult to keep track of the hierarchical relationships between people.

The look of the sunburst chart was also changed as a result of the evaluation meeting. The different stages can be seen in figure 4.17. At first, only one color was used, gradually changing to a darker shade for every generation, figure 4.17a. During the meeting it was discussed that the color should instead alternate between five shades, repeated for every fifth generation. This resulted in the sunburst chart presented in figure 4.17b. It was also discussed that instead of using only one color, four different colors should be used, one for each grandparent, which would lead to a clearer distinction between the different parts of the family. This resulted in the final sunburst chart, figure 4.17c.

FIGURE 4.16: The first version of the ancestor tree.

(a) The sunburst chart drawn with only one color, gradually changing to a darker shade.

(b) The sunburst chart drawn with only one color, alternating between five shades.

(c) The final look of the sunburst chart, using four different colors alternating between five shades each.

FIGURE 4.17: The different looks of the sunburst chart.

The appearance of the lifelines has not changed as a result of the meeting, other than the color changes described above. However, there was a suggestion that it should be visible which lifelines do not have all the necessary temporal data stored in the database. Instead of making that visible in the lifeline representation, this suggestion has been applied to the sunburst chart, where, as presented in figure 4.7, the user can see the persons with missing data drawn in red. The reason for presenting this in the sunburst chart instead of the lifeline representation is that in the sunburst chart, the nodes do not need any temporal data in order to be drawn in their correct positions. In the lifeline representation, however, the position of each line is calculated based on its temporal data. If temporal data is missing, the correct position of the lifeline cannot be calculated, and the lifeline will not be drawn. It is therefore clearer to present which nodes are missing data in the sunburst chart.

There was also a request that the user should be able to select a group of people to be drawn as lifelines instead of always showing all the individuals. This was implemented in the sunburst chart as well, as mentioned in section 4.2, making it possible for the user to highlight paths of ancestors and then visualize them as lifelines.

Chapter 5

Discussion

In this chapter, the decisions taken and the methods used in this work are discussed. The final result, as well as suggestions on what may be improved, is also discussed.

5.1 Ancestor Tree

The aim of the ancestor tree was to offer the user a representation of the genealogy data that would be easy to recognize and understand. As mentioned earlier, the classical layout of a tree (section 2.3.1) is the most used method for visualizing genealogy data. This makes that layout a suitable choice for creating a recognizable visualization. Compared to the H-tree, for example, it is much easier for the user to understand the connection between two ancestors in the tree with the chosen layout. In the H-tree, the path of ancestors connecting the two people includes several direction changes, and even with the path highlighted, it would still be easy to get confused along the path.

From the evaluation meeting with the two genealogists it appeared that the ancestor tree visualizes the data in a clear and comprehensive manner. The tree differs from other family tree representations in that all ancestors are visualized at the same time. Hence, the user does not have to drag the tree around in order to see all ancestors. This contributes to the tree being easy to understand.

Another advantage of the resulting ancestor tree, according to the genealogists, is that it works as a "fingerprint" of the visualized family. Each ancestor tree has its own characteristic shape, and each time the application is used, the user will immediately recognize the tree. By using this application regularly, the user will learn where in the tree a certain ancestor is placed, which helps the user not to get lost in the tree. The “fingerprint” idea will also make the genealogy experience more fun. The user gets a visual “portrait” of his or her family, both to present to the family and to compare with the portraits of other genealogists. In figure 5.1 two different trees are drawn, both with their own characteristic "fingerprint".

As mentioned in section 3.4.1, four aesthetic rules were applied to the drawing process of the tree. These rules contribute in different ways to a good structure of the tree, where the hierarchical relationships are presented in a clear way, and from which the user can draw different kinds of conclusions. For instance, the first rule, saying that all nodes belonging to the same level should be placed along the same horizontal line, contributes not only to a better structure of the tree but also works as an indicator of where in the data set the user should collect more data, which is what one of the main research questions of this work is about (section 1.3). As can be seen in figure 4.2, some of the blue branches are much higher than, for example, the purple branches. This means that the blue branch contains more generations of ancestors; hence, more data has been collected in that part of the family. Based on this, the user can decide to prioritize collecting data in other parts of the family.

(a) An example of an ancestor tree.

(b) Another example of an ancestor tree.

FIGURE 5.1: Two ancestor trees with their own characteristic shapes.

The ancestor tree can also be used to detect incorrect data in the database. One example is shown in figure 5.2. The tree shows that one of the ancestors has four parents, which cannot be correct. Because the tree presents all the ancestors at the same time, this kind of incorrect data is easy to detect. In a tree where the user would have to drag the tree around in order to view all ancestors, it would be easier to miss that mistake.
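The same kind of error could in principle also be caught by a simple consistency check on the stored data. The sketch below is an addition for illustration, not a feature of the application; the representation of relationships as (child, parent) pairs and the identifiers are hypothetical.

```python
from collections import defaultdict


def persons_with_too_many_parents(parent_links, limit=2):
    """parent_links: iterable of (child_id, parent_id) pairs."""
    parents = defaultdict(set)
    for child, parent in parent_links:
        parents[child].add(parent)
    return sorted(child for child, ps in parents.items() if len(ps) > limit)


# Hypothetical data: I5 is recorded with four parents.
links = [
    ("I5", "I10"), ("I5", "I11"), ("I5", "I12"), ("I5", "I13"),
    ("I2", "I4"), ("I2", "I5"),
]
suspicious = persons_with_too_many_parents(links)
```

Such a check complements the visual inspection: the tree makes the anomaly visible, while the check lists the affected identifiers directly.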

The tree shown in figure 4.2 consists of 439 ancestors, where the earliest ancestor was born in 1350. The fact that they all fit in the drawing surface at once is a good result. Still, suppose that the number of ancestors to be drawn were double the number of ancestors in the figure. In this case they would probably not fit at the same time, and functionality to handle these situations should be implemented. The degree-of-interest (DOI) technique presented by Nam Wook Kim et al. [2], where each person is assigned a degree of interest in order for the application to prioritize which persons to draw, could be implemented in this application in order to solve this problem. The DOI functionality could then be extended so that the user can affect it. When calculating the DOI for each person, Nam Wook Kim et al. let the central person get the highest DOI, and for each person in the data set the value decreases in relation to their distance to the central person. For people related to the central person, the DOI decreases linearly, while for marital relationships, it decreases more slowly. For the user to be able to affect the DOI functionality, it could instead be implemented so that the user decides how the DOI should decrease for each ancestor. If the user is interested in analyzing only the paternal ancestors, he or she can set the application to prioritize them. The DOI would then decrease more slowly for the paternal ancestors than for the maternal ancestors. If the user is instead interested in viewing as many generations as possible, the DOI would be calculated to fit that request. The user would then have a great impact on what data is visualized.
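A user-adjustable DOI of the kind proposed above could look roughly as follows. This is a sketch of the proposal only, not the formula used by Nam Wook Kim et al.; the linear decay, the per-side rates, the maximum value and the path encoding ('F' for father, 'M' for mother) are all assumptions made for illustration.

```python
def doi(path, rate=None, max_doi=10.0):
    """Degree of interest for an ancestor, given its 'F'/'M' steps from the central person."""
    rate = rate or {"F": 1.0, "M": 1.0}
    if not path:
        return max_doi                          # the central person has the highest DOI
    return max_doi - rate[path[0]] * len(path)  # linear decay, per side of the family


def select_visible(paths, capacity, rate=None):
    """Keep the `capacity` ancestors with the highest DOI (ties broken by path)."""
    ranked = sorted(paths, key=lambda p: (-doi(p, rate), p))
    return ranked[:capacity]


paths = ["", "F", "M", "FF", "FM", "MF", "MM"]
paternal_focus = {"F": 0.5, "M": 1.5}  # DOI decays more slowly on the paternal side
```

With the default rates the selection fills generation by generation, while `paternal_focus` lets the paternal grandparents outrank the mother when space is limited.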

FIGURE 5.2: One of the ancestors has four parents according to the tree, due to an error in the database.

5.2 Sunburst

Compared to the ancestor tree, the sunburst visualizes the same data more compactly: it does not need the same amount of space in order to draw the same number of ancestors. Because of this, the sunburst works as a good complement to the ancestor tree. In cases when not all ancestors can be visualized at once in the tree, as described in section 5.1, they can still be included in the sunburst chart since it uses less space.

As mentioned earlier, due to time constraints, no descendants of the central person are visualized in the application. If the descendants were visualized as well, even more data would be excluded from the tree representation due to the lack of space. In that case, two sunburst charts could be drawn, one for the ancestors and one for the descendants of the central person. The sunburst representation would then have an even more important role as a complement to the tree.

Another good use of the sunburst chart is to present which of the ancestors do not have all the data stored in the database that is needed in order to be drawn as a lifeline. By coloring them in red, the user gets a clear indication of which ancestors need more data to be added to the GEDCom file.

The method for deciding the size of each node in the sunburst chart was discussed during the evaluation meeting. With the method used in the application, a node that is placed further away from the root than another node may be drawn larger. One of the genealogists mentioned that this could be misleading, since the user could get the impression that the node further away is of greater importance than the other node. There was a suggestion that all nodes in the same generation should instead be equal in size. The advantages and disadvantages of the suggested method and of the method used in the application were discussed. After trying both methods, it was decided that the method used in the final result is more suitable for this application, based on the motivation in section 3.4.2.

5.3 Lifeline Representation

Nam Wook Kim et al. [2], Daniyar Mukaliyev [4], and Genelines [3] all use lifelines in their works to present genealogy data. In those representations, the hierarchical relationships between the people are the main focus, whereas in the lifeline representation in this work the comparison between the lines is the main focus. For instance, comparing two people four generations apart with the TimeNets representation will be difficult, since their lines will be placed far apart due to their birth dates. In the presented lifeline representation, the comparison between people is easy regardless of which generations they belong to. During the evaluation meeting, the genealogists gave the feedback that the lifeline representation presents a new approach to visualizing genealogy data. Compared to the most common genealogy visualizations, new types of insights can be gained with this representation. The lifeline representation is therefore considered to have an important role in the application, as the aim of this work is partly to create a visualization where users gain new kinds of insight from their collected data.

Thanks to the different sorting settings, different correlations can be found. For instance, by vertically sorting the lifelines based on the persons’ birth year, and horizontally aligning them based on their age at marriage, potential patterns can be seen, telling the user for example how the age at marriage has changed throughout the generations. Potential differences between male and female ancestors regarding their age when getting married or having their first child can be clarified by sorting the lines based on sex instead.

An example of another correlation that was found during this work thanks to the lifeline representation is that a large part of the family visualized at the time had their first child the year after they got married. This was found just by switching which event the axis represented. Being able to easily find correlations like this makes the genealogy process more fun and vivid. However, the only events the user can currently choose to visualize are “Marriage” and “Birth of first child”. In order to enable more and different types of insights and correlations to be found with the lifeline representation, more types of events stored in the GEDCom file should be added to the application.

The lifeline representation also works as a tool to find errors in the database. In figure 5.3, one example can be seen. According to the visualization, the hovered person was just one year old when having her first child. This tells the user to correct the data stored in the GEDCom file.

FIGURE 5.3: An error in the database found by the lifeline representation.
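An error like the one in figure 5.3 could also be flagged automatically. The sketch below is an illustrative addition, not part of the application; the plausibility thresholds, the field names and the sample persons are hypothetical and would need to be chosen by the genealogist.

```python
def implausible_event_ages(people, event, min_age=12, max_age=70):
    """Return (id, age) for persons whose age at `event` falls outside the given range."""
    flagged = []
    for p in people:
        if p.get("birth") is None or p.get(event) is None:
            continue  # nothing to check without both dates
        age = p[event] - p["birth"]
        if not min_age <= age <= max_age:
            flagged.append((p["id"], age))
    return flagged


# Hypothetical sample persons (years only).
people = [
    {"id": "I7", "birth": 1850, "first_child": 1851},
    {"id": "I8", "birth": 1820, "first_child": 1845},
    {"id": "I9", "birth": 1800, "first_child": None},
]
flagged = implausible_event_ages(people, "first_child")
```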

In the final result, as described in section 4.2, the user can select which persons are drawn as lifelines by highlighting one or more paths of ancestors. A suggested improvement, in order to make the application more flexible, is to allow the user to select separate individuals as well, and not only a complete path of ancestors. In some cases the user may need to compare two persons placed in completely different parts of the sunburst. A lifeline should then not be drawn for each person belonging to the paths connecting the selected persons to the central person, since that would just complicate the comparison between the two specific persons.

5.4 Colors and Interaction

In order to create a unified application, the three different visualizations are linked with each other. For example, when a person is hovered over in one of the visualizations, the same person is highlighted in the other visualizations as well. This also helps the user to understand the different visualizations. For instance, a user who understands exactly how the ancestor tree is structured and how the hierarchical relationships are presented, but does not fully understand the sunburst chart, can hover over a person in the sunburst chart and see the placement of that person in the ancestor tree, which makes the sunburst chart easier to understand.

When hovering over a person, an information box also appears, presenting some personal information. By presenting personal information like this, instead of writing it directly in the nodes, the nodes can be drawn much smaller. In the ancestor tree, for instance, each person is drawn as a very small dot, which means that more persons fit in the drawing surface. Using the hovering technique to present personal information therefore helps the visualization to become tidier and to present more data in the same surface.

Four different base colors were used in the visualizations in order to create four distinct branches. This makes it easier for the user to understand which part of the family a certain ancestor belongs to. Used like this, the four colors also work as an indicator telling the user in which parts of the family he or she needs to collect more data. In both figure 4.2 and figure 4.4 it is clear that the blue part of each visualization consists of more ancestors than, for instance, the purple part.

5.5 Implementation

During this work, several difficulties were encountered regarding parsing the data from the GEDCom file to MongoDB. Code that worked for some GEDCom files did not work for others, and when trying to load large files the application crashed. As mentioned in section 3.1, this problem was solved by dividing the GEDCom files into smaller parts when reading them, and processing the data from one part at a time. However, much time was spent before understanding that the reason why the application crashed was the size of the GEDCom files. Therefore, a lot of unnecessary time was spent trying to solve the same problem. There are many already implemented GEDCom parsers that could have been used in this application in order to save time, but the decision was made not to use any of them. This is because implementing your own parser gives full control over how the parsed data is structured and which data is saved. A GEDCom file contains a lot of information not used in this application. With an existing GEDCom parser, a lot of data would be stored unnecessarily in the database, and the structure of the data would not be adjusted for this application, which would make the rest of the implementation more difficult. Even knowing the difficulties experienced, the same decision regarding writing your own parser or using an existing one would be made today. The ability to customize the structure of the data based on the specific conditions of the application is considered more advantageous than the time saved by using an already existing parser. Also, the reason for the amount of time spent on the problem is the lack of experience in solving this kind of task. Someone more experienced may not have encountered the same difficulties, and therefore it is considered better not to use an already existing parser in this specific situation.
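The idea of reading a GEDCom file in parts and keeping only the needed fields can be sketched as below. This is not the parser written for the application; it is a minimal Python illustration that assumes the standard GEDCom line layout (level, optional cross-reference id, tag, optional value). The batch size, the selected fields, the function names and the sample names are invented for the example.

```python
import io


def records(lines):
    """Group GEDCom lines into records; a new record starts at every level-0 line."""
    current = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("0 ") and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current


def batches(lines, size):
    """Yield lists of at most `size` records, so only one batch is held at a time."""
    batch = []
    for rec in records(lines):
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_person(rec):
    """Keep only the fields the visualizations need; None for non-INDI records."""
    head = rec[0].split(" ", 2)
    if len(head) < 3 or head[2] != "INDI":
        return None
    person = {"id": head[1].strip("@")}
    context = None
    for line in rec[1:]:
        parts = line.split(" ", 2)
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "1":
            context = tag
            if tag in ("NAME", "SEX"):
                person[tag.lower()] = value
        elif level == "2" and tag == "DATE" and context in ("BIRT", "DEAT"):
            person[context.lower()] = value
    return person


# Invented sample file; a real file would be opened and iterated line by line.
sample = io.StringIO("""0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME Anna /Example/
1 SEX F
1 BIRT
2 DATE 3 MAR 1850
0 @I2@ INDI
1 NAME Karl /Example/
1 SEX M
0 TRLR
""")

people = [p for batch in batches(sample, size=2)
          for p in map(to_person, batch) if p]
```

Because `records` and `batches` are generators, the whole file never has to be held in memory, and each batch could be written to the database before the next one is read.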

Other difficulties encountered during the implementation concern asynchronous programming. Because of a lack of previous experience with asynchronous programming, the flow in which asynchronous code is executed was not fully understood. This affected the implementation time as well, since a lot of time was spent trying to understand how the code should be structured. With the help of the utility module async, it later became easier to understand and implement the asynchronous functions.

D3.js has worked very well as a tool for visualizing the different representations in the application. There have not been any difficulties when implementing the different types of visualizations, and with D3.js it has also been easy to handle what happens when the user hovers over an object or clicks on it. The use of SVG (Scalable Vector Graphics) means that the objects can be drawn in any resolution or size without any quality loss, which is important in order to be able to control the look of the graphs in a flexible manner. D3.js is therefore a powerful and adaptable tool for creating this kind of application.

Chapter 6

Conclusion & Future work

With the aim of examining how a multiperspective visualization could be used in order to gain new insights from genealogy data stored in a GEDCom file, three research questions were stated, section 1.3. In this chapter those questions are answered, followed by some conclusions. This chapter also discusses some future work for this thesis.

6.1 Research Questions

• When visualizing a large data set, how can a family tree be visualized to simplify the navigation in the tree?

A tree visualizing a large data set, where the user has to pan the view in order to see all data presented, is difficult to navigate. It is difficult for the user to get a clear overview of the data, which also complicates the understanding of the connections between the persons in the tree. To solve this problem, this work presents a family tree where all visualized persons, for a data set of considerable size, can be seen at once. A hovering technique makes it even easier for the user not to get lost in the tree when trying to understand the connection between two of the visualized persons. By highlighting the path between the central person and any chosen person, their relationship is clarified. Another method that helps the understanding of the tree is to let the colors alternate between different shades. Thanks to the different shades of the colors, the user can more easily keep track of the number of generations between two people in the tree. To summarize, by visualizing the data in a comprehensive manner, where all persons can be seen at the same time, and by using different supporting techniques (such as hovering and using different colors), it is possible to draw a tree that is easy to navigate, even when visualizing a large data set.

• How can genealogy data be visualized from a temporal point of view?

GEDCom files contain a lot of temporal data, usually not visualized other than with text labels. Depending on the aim of the visualization, temporal data can be visualized differently. Nam Wook Kim et al. [2], Daniyar Mukaliyev [4], and Genelines [3] all present their genealogy data along a timeline. They also have in common that they present a person as a lifeline. Their different approaches to visualizing temporal data work well in visualizations where the hierarchical relationships are in focus. In this work, a different approach to the use of lifelines is presented. As mentioned, the aim of this work is to examine how different visualization techniques could be used in order to gain new kinds of insights. The focus of the lifeline representation is therefore to compare the lives of the visualized ancestors, which, as described in section 3.4.3, enables different kinds of insights and correlations to be found. In conclusion, genealogy data can be visualized in multiple ways in order to present temporal data, depending on the aim of the visualization.

• How can genealogy data be visualized to clearly indicate where there is missing data?

In this work, several techniques have been proposed to clearly indicate where more data needs to be collected, that is, in which parts of the family less data has been collected than in other parts. By dividing the family into four parts using four different colors, the user clearly sees the difference in how many persons each part consists of. The shape of the ancestor tree and the sunburst chart also gives the user an indication of which parts contain more data. Since both the ancestor tree and the sunburst chart visualize all persons at the same time, it is easy to compare the length of the different branches. Also, by sizing the nodes in the sunburst chart as described in section 3.4.2, it becomes clearer which parts contain more data. With the presented application it is also possible to see which persons are missing some personal information in the database. By coloring them in red in the sunburst chart, the user gets an indication that their data needs to be adjusted.

One conclusion that can be drawn from this work is that, by using different kinds of visualizations that can interact with each other, thus creating a multiperspective visualization, it is possible to create an application from which new kinds of insights and correlations can be found. With one or more of the visualizations presenting all visualized data at the same time, the application can also be used as a tool for learning in which part of the family more data needs to be collected. It is also concluded that the interaction between the user and the application, as well as the interaction between the different visualizations, is important in order to provide additional understanding. In this thesis work, the use of colors has also been shown to play an important role when creating a visualization that is easy to understand.

6.2 Future Work

The main change that should be made in future work on this application is to implement support for visualizing descendants, rather than limiting the application to visualizing only ancestors. If all descendants were drawn in the ancestor tree in the same manner as the ancestors, the tree would probably become too large to fit in the drawing surface. One proposal is to visualize only the ancestors by default, just like in the current representation. Then, when the user clicks on one of the ancestors, a descendant tree would be drawn, containing only the descendants of that ancestor. It would be drawn inside a box placed above the ancestor tree. The rest of the tree would then be drawn transparent so that it does not draw the user's attention. To make the descendant tree disappear, the user would only have to click on the same ancestor node again. An example of how a descendant tree could look is shown in figure 6.1.
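A minimal sketch of the proposed click behavior is given here, assuming that the parent-child relations are available as a map from a person's id to the ids of that person's children. The map `childrenOf` and both function names are hypothetical and only serve to illustrate the idea.

```javascript
// Collects the ids of all descendants of a person, given a hypothetical
// map from person id to an array of child ids.
function collectDescendants(personId, childrenOf) {
  const result = [];
  const seen = new Set();
  const stack = [...(childrenOf[personId] || [])];
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id)) continue; // guard against duplicate links
    seen.add(id);
    result.push(id);
    stack.push(...(childrenOf[id] || []));
  }
  return result;
}

// Clicking a new ancestor selects it; clicking the already selected
// ancestor clears the selection, which hides the descendant tree again.
function toggleSelection(currentId, clickedId) {
  return currentId === clickedId ? null : clickedId;
}
```

The id returned by `toggleSelection` would decide whether the box with the descendant tree is shown, and `collectDescendants` would supply the persons to draw in it.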

All descendants of the central person could also be visualized as a sunburst chart, just like the current ancestor chart. The central person would thus be placed in the middle of the sunburst, and the descendants placed along concentric circles. The lifeline representation would work in much the same way as in the current application, with the only difference being that descendants would be drawn as well. More settings could possibly be implemented for the lifeline representation, for instance letting the user sort the lines based on whether they represent an ancestor or a descendant.

FIGURE 6.1: An example of how a descendant tree could look when hovering over one of the ancestors.

Another functionality that could be implemented in future work on this application is a timeline along the ancestor tree. The timeline would be placed vertically along the ancestor tree, and the height of a node would no longer depend on which level the node belongs to, but on the birth year of the person represented by that node. A simplified example of how this could look can be seen in figure 6.2. With a timeline like this, the user could compare the birth dates of people belonging to the same generation. Just because two persons belong to the same generation, it does not mean that they were born around the same year; sometimes the birth years differ by decades. By being able to see this kind of difference, the user would gain more types of insights into his or her family history. To make sure that the user would still have an ancestor tree that is easy to understand, a toggle could be implemented, offering the user the possibility to switch between the usual ancestor tree and the ancestor tree drawn along a timeline.

FIGURE 6.2: A sample tree drawn along a timeline.
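The timeline placement proposed above amounts to replacing the level-based vertical position of a node with a linear mapping from birth year to pixels. The following is a minimal sketch under stated assumptions: the field name `birthYear`, the function names, and the choice to place the earliest year at the top of the drawing surface are all hypothetical.

```javascript
// Builds a linear mapping from birth year to a vertical pixel position.
// Assumption: the earliest year is placed at the top (y = 0) and the
// latest year at the bottom (y = heightPx).
function makeYearScale(minYear, maxYear, heightPx) {
  const span = maxYear - minYear;
  return (year) =>
    span === 0 ? heightPx / 2 : ((year - minYear) / span) * heightPx;
}

// Assigns a y coordinate to every node with a known birth year.
// Nodes without a birth year get y = null and would need a fallback,
// for example the usual level-based position.
function layoutByBirthYear(nodes, heightPx) {
  const years = nodes
    .map((node) => node.birthYear)
    .filter((year) => Number.isFinite(year));
  if (years.length === 0) {
    return nodes.map((node) => ({ ...node, y: null }));
  }
  const scale = makeYearScale(Math.min(...years), Math.max(...years), heightPx);
  return nodes.map((node) => ({
    ...node,
    y: Number.isFinite(node.birthYear) ? scale(node.birthYear) : null,
  }));
}
```

With such a mapping, two persons of the same generation born decades apart would end up at visibly different heights, which is the kind of difference the timeline is meant to reveal. The proposed toggle would then only need to switch between this position and the level-based one.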


Bibliography

[1] Wetherell C, Shannon A. Tidy drawings of trees. IEEE Transactions on Software Engineering SE-5(5):514-520, 1979.

[2] Wook Kim N, Card S K, Heer J. Tracing genealogical data with timenets. In Proceedings of AVI 2010, Advanced Visual Interfaces, 2011.

[3] Progeny Genealogy. Genelines. URL https://progenygenealogy.com/products/timeline-charts. Accessed 2017-09-10.

[4] Mukaliyev D. Visualizing large genealogies with timelines. Joensuu, University of Eastern Finland, 2015.

[5] Dennis Szucs L, Hargreaves Luebking S. The Source: A Guidebook to American Genealogy, 3rd edition. Ancestry, 2006.

[6] Nationalencyklopedin. Genealogi. URL http://www.ne.se/uppslagsverk/encyklopedi/lång/genealogi. Accessed 2018-01-10.

[7] Shown Mills E. Genealogy in the 'information age': History's new frontier? National Genealogical Society Quarterly 91, 2003.

[8] Haley A. Roots: The Saga of an American Family. Dell Publishing Company, 1977.

[9] The Church of Jesus Christ of Latter-day Saints. The GEDCOM standard release 5.5. 1996.

[10] Tamassia R. Handbook of Graph Drawing and Visualization. Chapman & Hall/CRC, 2013.

[11] Battista G D, Eades P, Tamassia R, Tollis I G. Algorithms for drawing graphs: An annotated bibliography. Computational Geometry: Theory and Applications 4(5):235-282, 1994.

[12] Tuttle C, Nonato L G, Silva C. Pedvis: A structured, space-efficient technique for pedigree visualization. IEEE Transactions on Visualization and Computer Graphics 16(6):1063-1072, 2010.

[13] Progeny Genealogy. Ancestor chart. URL https://progenygenealogy.com/Products/Family-Tree-Charts/Sample-Charts/Ancestors. Accessed 2018-01-26.

[14] Progeny Genealogy. Ancestor fan. URL http://progenygenealogy.com/Portals/0/images/charts/CCFTM-360-Fan-plain.PDF. Accessed 2018-01-26.

[15] Reingold E M, Tilford S J. Tidier drawings of trees. IEEE Transactions on Software Engineering SE-7(2):223-228, 1981.

[16] MongoDB. URL https://www.mongodb.com/.


[17] Express. URL https://expressjs.com/.

[18] AngularJS. URL https://angularjs.org/.

[19] Node.js. URL https://nodejs.org/.

[20] D3.js. URL https://d3js.org/.

[21] Mongoose. URL http://mongoosejs.com/.

[22] Mongoose. Mongoose documentation. URL http://mongoosejs.com/docs/guide.html. Accessed 2018-01-22.

[23] Pasquali S. Mastering Node.js. Packt Publishing, 2013.

[24] Async. Async documentation. URL https://caolan.github.io/async/docs.html. Accessed 2018-01-22.

[25] Async. Async waterfall. URL https://caolan.github.io/async/docs.html#waterfall. Accessed 2018-01-22.

[26] Holmes S. Getting MEAN with Mongo, Express, Angular, and Node. Manning Publications Co., 2016.

[27] W3Schools. SVG Intro. URL https://www.w3schools.com/graphics/svg_intro.asp. Accessed 2018-01-23.

[28] W3C. About SVG. URL https://www.w3.org/Graphics/SVG/About.html. Accessed 2018-01-23.