LiU-ITN-TEK-A--18/034--SE

Interactive out-of-core rendering and filtering of one billion stars measured by the ESA Gaia mission

Master's thesis in Media Technology, carried out at the Institute of Technology at Linköping University

Adam Alsegård

Supervisor: Emil Axelsson
Examiner: Anders Ynnerman

Department of Science and Technology, Linköping University
SE-601 74 Norrköping, Sweden

Norrköping, 2018-06-19


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Adam Alsegård

Linköping University | Department of Science and Technology

Master Thesis | Media Technology and Engineering

Spring 2018 | LiU-ITN-TEK-A--18/034--SE

Interactive out-of-core rendering and filtering of one billion stars measured by the ESA Gaia mission

Adam Alsegård

Supervisor: Emil Axelsson

Examiner: Anders Ynnerman

Linköping University

SE-601 74 Norrköping, Sweden

013–28 10 00, www.liu.se

Abstract

The purpose of this thesis was to visualize the 1.7 billion stars released by the European Space Agency, as the second data release (DR2) of their Gaia mission, in the open source software OpenSpace at interactive framerates, and to be able to filter the data in real-time. An additional implementation goal was to streamline the data pipeline so that astronomers could use OpenSpace as a visualization tool in their research.

An out-of-core rendering technique has been implemented where the data is streamed from disk during runtime. To be able to stream the data it first has to be read, sorted into an octree structure and then stored as binary files in a preprocess. The results of this report show that the entire DR2 dataset can be read from multiple files in a folder and stored as binary values in about seven hours. This step determines which values the user will be able to filter by and only has to be done once for a specific dataset. An octree can then be created in about 5 to 60 minutes, where the user can define whether the stars should be filtered by any of the previously stored values. Only values used in the rendering are stored in the octree. If the created octree fits in the computer's working memory, the entire octree is loaded asynchronously on start-up; otherwise only a binary file with the structure of the octree is read during start-up, while the actual star data is streamed from disk during runtime.

When the data has been loaded it is streamed to the GPU. Only stars that are visible are uploaded, and the application also keeps track of which nodes have already been uploaded to eliminate redundant updates. The inner nodes of the octree store the brightest stars of all their descendants as a level-of-detail cache that can be used when the nodes are small enough in screen space.

The previous star rendering in OpenSpace has been improved by dividing the rendering phase into two passes. The first pass renders into a framebuffer object while the second pass performs a tone-mapping of the values. The rendering can be done either with billboard instancing or point splatting; the latter is generally the faster alternative. The user can also switch between using VBOs or SSBOs when updating the buffers. The latter is faster but requires OpenGL 4.3, which Apple products do not currently support.

The rendering runs at interactive framerates on both flat and curved screens, such as domes/planetariums. The user can also switch dataset during rendering, as well as render technique, buffer objects, color settings and many other properties. It is also possible to turn time on and see the stars move with their calculated space velocity, or transverse velocity if the star lacks radial velocity measurements. The calculations omit the gravitational rotation.

The purpose of the thesis has been fulfilled, as it is possible to fly through the entire DR2 dataset on a moderate desktop computer and filter the data in real-time. However, the main contribution of the project may be that the groundwork has been laid in OpenSpace for astronomers to actually use it as a tool when visualizing their own datasets, and for continuing to explore the coming Gaia releases.

Keywords: out-of-core rendering, large-scale visualization, hierarchical octree, GPU streaming, real-time filtering.

Acknowledgments

This thesis would not have been the same without the help and enthusiasm of a number of people, to whom I would like to express my gratitude.

First and foremost I have to thank my supervisor Emil Axelsson for all of our technical discussions, for letting me exploit your illustration skills, for being my travel companion and especially for the last few weeks of the project when we worked hard to get the rendering to work in a dome. I owe you a lot!

Secondly I would like to thank Anders Ynnerman for proposing the project in the first place and for enabling me to go and meet astronomers both in Vienna and in New York.

I would also like to thank Jacqueline Faherty and Brian Abbott at the American Museum of Natural History (AMNH) for inviting me to New York, for your willingness to use OpenSpace for the Gaia Sprint event at AMNH and for being fantastically enthusiastic and generous hosts. João Alves and Torsten Möller at the University of Vienna for your initial input to the project and for sharing your enthusiasm for Gaia and for the possibility of getting a new visualization tool to work with. Marie Rådbo for teaching me about space in general and for cheering me on. Alexander Bock and the rest of the OpenSpace developer team for helping me prepare for the event at AMNH and for letting me be a part of the team. Patrik Ljung, Karljohan Lundin Palmerius, Kristofer Krus and everybody else at the Visualization Center C in Norrköping for your input and for treating me as your colleague for the last five months.

Finally I would like to thank my partner Rebecca and all the rest of my friends and family who have supported me during this time. I love you all!

June 15, 2018, Norrköping

Adam Alsegård

Contents

Acronyms ii

List of Figures v

List of Tables vii

1 Introduction 1
1.1 Background 1
1.1.1 OpenSpace 1
1.1.2 The Gaia mission 2
1.2 Objective 2
1.2.1 Implementation goals 2
1.2.2 Research questions 3
1.3 Limitations 3
1.4 Delimitations 4

2 Related work 5
2.1 Large-scale simulations 5
2.2 Visualization techniques 5
2.3 Visualization software 7

3 Visualize the Gaia mission 9
3.1 The mission 9
3.2 The spacecraft 10
3.3 The orbit 11
3.4 The releases 11
3.4.1 Conversions 12
3.4.2 Calculating velocity 13

4 Render 1.7 billion stars 15
4.1 System overview 15
4.2 Read the data 15
4.2.1 Read a single file 16
4.2.2 Read multiple files 17
4.2.3 Pre-process the data 17
4.3 Construct an Octree 18
4.3.1 Octree structure 19
4.3.2 Offline filtering 19
4.4 Data streaming 19
4.4.1 Update the buffers 19
4.4.2 VBO and SSBO 20
4.4.3 Removing chunks from the buffer 22
4.4.4 Rebuilding the buffer 22
4.4.5 Stream from files 22
4.5 Render techniques 23
4.5.1 Render modes 24
4.5.2 Billboard instancing 25
4.5.3 Point splatting 25
4.5.4 Real-time filtering 26
4.5.5 Render in a dome 26

5 Results 28
5.1 Reading from multiple files 28
5.2 Construction of the octree 29
5.3 Rendering 30
5.4 NYC Gaia Sprint 2018 31

6 Analysis and Discussion 34
6.1 Method 34
6.2 Implementation 35
6.3 Results 35
6.4 Software comparison 38
6.5 Source criticism 39
6.6 The work in a wider context 39

7 Conclusion 40
7.1 Future work 42

Acronyms

AABB Axis-Aligned Bounding Box. 19, 35

AMNH The American Museum of Natural History. 1, 7, 15, 31, 38, 40, 41

AU Astronomical Unit. 9, 13

CCA Center for Computational Astrophysics. 31

CPU Central Processing Unit. 3, 5, 6, 22, 28, 41

CSV Comma-Separated Value. 15, 17, 34, 35

Dec Declination. 11, 12, 17, 19

DPAC Gaia Data Processing and Analysis Consortium. 11

DR1 Gaia Data Release 1. 2, 11, 15, 17, 34

DR2 Gaia Data Release 2. 2, 3, 8, 11, 12, 13, 14, 15, 17, 19, 28, 34, 37, 41

ESA European Space Agency. 1, 2, 9, 16

FBO Framebuffer Object. 25, 30, 35, 36

FITS Flexible Image Transport System. 15, 16, 17, 28, 34, 40

FPS Frames Per Second. 37

GPU Graphics Processing Unit. 3, 5, 6, 15, 17, 18, 19, 20, 22, 23, 24, 28, 34, 41

ICRS International Celestial Reference System. 11, 12

L2 Second Lagrange point. 10, 11

LiU Linköping University. 1

LOD Level-Of-Detail. 6, 18, 20, 21, 23, 29, 30, 36, 37, 42

MPI Message Passing Interface. 6

NASA National Aeronautics and Space Administration. 1

NYU New York University. 1


RA Right Ascension. 11, 12, 17, 19

RAM Random-Access Memory. 3, 5, 6, 15, 17, 18, 19, 22, 28, 30, 36, 41, 42

SCI University of Utah Scientific Computing and Imaging Institute. 1

SGCT Simple Graphics Cluster Toolkit. 26

SSBO Shader Storage Buffer Object. 20, 21, 22, 24, 30, 34, 36, 37, 41

TGAS Tycho-Gaia Astrometric Solution. 2, 7, 15, 34

UBO Uniform Buffer Object. 20

VBO Vertex Buffer Object. 20, 21, 22, 24, 30, 36, 37, 41

List of Figures

2.1 The TGAS dataset of 2 million stars visualized in 3D with TOPCAT. 7
2.2 The TGAS dataset of 2 million stars mapped as the night sky with TOPCAT. 7
3.1 Illustration of how a parallax angle is determined. 9
3.2 An artist's rendition of the Gaia spacecraft for the Paris Air Show 2013. 10
3.3 The Gaia spacecraft model rendered in OpenSpace. The model has been rotated before this screenshot so that the sun will brighten up the instrument. 10
3.4 Trail lines of the Gaia spacecraft rendered in OpenSpace. The shown trajectory is with respect to Earth's position in space. 11
3.5 Illustration of how the space velocity vector can be broken up into a transverse velocity and a radial velocity. 14
4.1 Illustration of the data pipeline when reading a dataset from multiple files, such as the full DR2. 16
4.2 Illustration of the data pipeline when reading a dataset from a single file. 16
4.3 Illustration of how the size of the octree depends on the maximum number of stars in each node and the initial extent of the octree. 18
4.4 Illustration of which nodes are eligible for streaming to the GPU as the camera rotates. The red (striped) nodes are already uploaded to the GPU and will not be updated. The blue (clear) nodes are no longer visible and will be removed from the GPU, with their indices returned to the index stack on the next render call. The green (circle) nodes become visible and will be uploaded to the GPU. If the node with buffer index 88 is smaller in screen space than a set threshold it will return its LOD cache instead of traversing any further. 20
4.5 Illustration of how the SSBO buffers are updated in a single draw call. The traversal adds a node with buffer index 2 and removes the node with index 3. First the index buffer is updated linearly. The index buffer keeps track of the accumulated sum of stars in the data buffer. The numbers of new and removed stars are added and propagated through the remaining buffer. Thereafter the data buffer is updated with the actual new data. The buffer index of the node is used to determine where in the buffer the data should be written. 21
4.6 Illustration of which nodes will be fetched around the camera initially. The cyan nodes are children of the neighboring nodes on the same level as the inner node (red) that contains the camera. The blue nodes are children of neighboring nodes of the second parent, while orange nodes are the same for the third layer of neighboring parent nodes. By default this means that 632 nodes will be fetched around the camera, with the possibility for the user to add more layers. 23
4.7 The TGAS subset rendered as Static. Here all stars are assumed to have the same luminosity. 24
4.8 The TGAS subset rendered as Color. The luminosity is calculated from the stars' absolute magnitude. 24
4.9 Extreme over-exposure effect while rendering billboards. 25
4.10 Stars rendered as billboards with an excessive initial size and close-by stars boosted even further. 25
4.11 Stars rendered as points with a filter size of 19 and a sigma of 2.0, which gives an excessive effect. 25
4.12 The radial velocity subset with 7.2 million stars rendered in Static mode with all stars enabled. 26
4.13 The radial velocity subset rendered in Static mode with all stars without parallax filtered away. 26
4.14 Darkening effect when rendering in fish-eye mode (i.e. on a curved screen). 26
4.15 Illustration of how the scale factors for σ_major and σ_minor are calculated. 27
5.1 Render statistics with 1.1 million stars visible and a screen resolution of 1280x720. 31
5.2 Render statistics for the radial velocity dataset with a screen resolution of 1920x1200. 32
5.3 Performance for different filter sizes with and without scaling of the filter enabled. 2 million stars visible on a flat screen with 1920x1200 resolution. 33
5.4 Photograph taken at the Gaia Sprint show at AMNH. 33
6.1 Block structure that can occur when storing the brightest stars as LOD cache. 37
6.2 Highlighting the visual artifact when rendering the full DR2 dataset of 1.7 billion stars. 37
6.3 Gaia Sky running their largest dataset with a background image of the Milky Way, with maximum star brightness used. 38
6.4 Gaia Sky running their largest dataset without a background image of the Milky Way, with maximum star brightness used. 38
6.5 61 million visible stars rendered in OpenSpace with a background image of the Milky Way. 38
6.6 61 million visible stars rendered in OpenSpace without a background image of the Milky Way. 38

List of Tables

5.1 Statistics while running ReadFitsTask for the full DR2 with 1.7 billion stars. 28
5.2 Statistics while running ReadFitsTask for a random subset from DR2 with 42.9 million stars. 29
5.3 Statistics while running ReadFitsTask for the radial velocity subset from DR2 with 7.2 million stars. 29
5.4 Statistics while running ConstructOctreeTask for three different DR2 datasets. 29
5.5 Statistics while running ConstructOctreeTask for DR2, filtering both bright and dim stars with a parallax error less than 0.9. 30
5.6 Performance while rendering a subset with 618 million stars on a flat screen with 1920x1200 resolution. 31
5.7 Statistics while rendering different large datasets on a flat 1920x1200 screen. 32

Chapter 1

Introduction

Humanity's drive to explore goes back to the beginning of our civilization. It began with exploring our closest environment but expanded quickly, and once most of the Earth had been discovered we turned our gaze to the stars. While there is still a lot we do not understand about our own planet, there is even more knowledge hidden in space.

This thesis is part of that drive to explore. The main focus of the project is to work with a dataset of about 1.7 billion stars released by the European Space Agency (ESA) as part of their Gaia mission, and to develop tools that astronomers around the world can use to explore this dataset. With help from these tools the astronomers will hopefully be able to discover new knowledge for us all to share.

This report will discuss which optimization techniques can be used to work with and render a dataset of that magnitude in real-time. The techniques were implemented in the open source software OpenSpace1.

1.1 Background

This thesis project stems from a collaboration between professors at Linköping University and at the University of Vienna. The scientists in Vienna already worked with data from ESA's Gaia mission but were interested in better visualization tools to explore the dataset. At the same time, a group at Linköping University had been working for a few years on a software package to visualize the cosmos, but had not yet incorporated data from the Gaia mission. That is where this thesis comes into play.

1.1.1 OpenSpace

The visualization software developed partly at Linköping University is called OpenSpace: an open source interactive data visualization software designed to visualize the entire known universe and portray humanity's ongoing efforts to investigate the cosmos [11]. It is designed to support both personal computers and domes/planetariums that use a cluster of computers and projectors. The aim is both to serve as a platform for scientists chasing new discoveries and for museums working with public outreach. The software is implemented in C++17 and OpenGL 3.3 and above. OpenSpace supports multiple operating systems and interactive presentations of dynamic data, and enables simultaneous connections across the globe. The development is a collaboration between Linköping University (LiU), The American Museum of Natural History (AMNH), the National Aeronautics and Space Administration (NASA), New York University (NYU) and the University of Utah Scientific Computing and Imaging Institute (SCI).

1http://openspaceproject.com/

1.1.2 The Gaia mission

On December 19, 2013, ESA launched the Gaia instrument into space with the objective to "measure the position, distances, space motions and many physical characteristics of some one billion stars in our Galaxy and beyond" [1]. About two years later, in September 2016, ESA released Gaia Data Release 1 (DR1) to the public, with 1.1 billion point sources based on observations during the first 14 months of the mission. Out of those sources "only" 2 million counted as part of the Tycho-Gaia Astrometric Solution (TGAS), or were referred to as primary sources. The rest were merely seen as placeholders until the next release, either because they lacked a number of parameters including parallax or because of high uncertainties in the measurements.

On April 25, 2018 the Gaia Data Release 2 (DR2) went public. In this update 1.7 billion point sources have their measured galactic position in space and G-band magnitude. A large portion also have photometry, parallaxes and proper motions, as well as radial velocities for about 7.2 million stars2. A third release will happen in 2020, before the final release of the catalogue at the end of 2022. A more in-depth explanation of the Gaia mission and its releases can be found in Chapter 3.

2https://www.cosmos.esa.int/web/gaia/dr2

1.2 Objective

The work of this thesis has two main objectives regarding the Gaia mission. The first is to enable interaction with the Gaia DR2 dataset for public outreach purposes, and the second is to develop tools that scientists can use in their research while exploring the same dataset. Each objective can be divided into a few sub-goals.

1.2.1 Implementation goals

The following are the implementation goals for the public outreach objective:

• Visualize the Gaia mission and how the instruments measure the stars.

• Display the stars at their measured position in 3D space and render them physically correctly with regard to brightness and size.

• Be able to switch rendering technique during runtime so the full extent of the measurements can be appreciated.

• Put the new dataset into context with previous knowledge, such as the constellations, and integrate it with the rest of the OpenSpace software.

The implementation goals for the research-focused objective are as follows:

• Be able to load the full DR2 dataset, or a chosen subset, and render it in 3D at interactive framerates.

• Be able to re-load a different subset during runtime. The subset could have been created in a third-party software.

• Be able to filter which stars to render by their spatial information, by photometric properties or by the estimated error in the measurements.

• Render stars differently depending on magnitude and photometry to make them easier to classify.

• Read velocity where possible and be able to "turn back time" to see how the stars move in space.

1.2.2 Research questions

Part of the implementation process is to figure out what techniques to use. The main question this thesis poses is: What combination of optimization techniques can enable real-time rendering of one billion stars?

As this is a fairly big question it can be divided into several smaller questions within different areas. For example, as the DR2 dataset is much too large to store in a computer's working memory, or Random-Access Memory (RAM), some sort of streaming technique from disk or network has to be researched and implemented. An optimization technique for uploading the data to the Graphics Processing Unit (GPU) then has to be implemented as well, as the memory on the GPU is even smaller than the RAM available to the Central Processing Unit (CPU). Finally there is the issue of rendering the particles to the screen. Taking these insights into consideration, the research questions become:

• What optimization technique will enable streaming of data dynamically to the CPU during runtime?

• How should the data be structured to limit the uploading of data to the GPU?

• What rendering technique is optimal for rendering as many stars as possible?

• What rendering technique is optimal for improving the readability when exploring the dataset?

• What tools can OpenSpace offer to astronomers that are not already available?

1.3 Limitations

The limitations this project faces are mostly connected to OpenSpace and how it is used. As OpenSpace is used in planetariums around the world, the techniques must work on a cluster of computers, most with only a single GPU per computer and possibly a limited amount of available RAM. Thus solutions for supercomputers are not feasible for this project. However, if the user has better hardware the algorithms should be able to scale accordingly.

OpenSpace also supports Windows, Linux and Mac OS, with multiple users on each operating system. Thus, to the extent that it is possible, the implementation should strive to work on all operating systems. This is most notable for Mac OS, which so far only supports OpenGL versions up to 4.1 [6], while Windows and Linux support up to 4.6, which was released in July 2017 and is still the latest version as of June 2018 [19]. To run OpenSpace users are required to have at least OpenGL 3.3, but several modules require a later version to work properly.
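As an illustration of how such a constraint can be handled at runtime (a minimal sketch, not OpenSpace's actual code; the helper name is hypothetical and it assumes a loader such as GLEW has already initialized an OpenGL context), the context version can be queried to decide whether a feature such as SSBOs (OpenGL 4.3+) is available:

```cpp
#include <GL/glew.h> // assumes GLEW (or a similar loader) has set up the context

// Hypothetical helper: returns true if the current OpenGL context is at
// least version 4.3, i.e. Shader Storage Buffer Objects can be used;
// on e.g. Mac OS (capped at 4.1) this falls back to false.
bool supportsSSBO() {
    GLint major = 0;
    GLint minor = 0;
    glGetIntegerv(GL_MAJOR_VERSION, &major); // available since OpenGL 3.0
    glGetIntegerv(GL_MINOR_VERSION, &minor);
    return (major > 4) || (major == 4 && minor >= 3);
}
```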


1.4 Delimitations

There are many aspects of the Gaia DR2 dataset that this thesis will not take into consideration. The release contains information about a lot of different celestial objects, such as exoplanets, asteroids, quasars, supernovae and variable stars, that will not be mentioned further in this report. Only stars will be visualized, and only a subset of the information about each star will be presented. Up to 95 values were released for each star, but only the 24 values used for either rendering or filtering (see Sections 4.5 and 4.3.2) will be explained in this report.

Chapter 2

Related work

As computing hardware capabilities grow, so does the amount of scientific data that such hardware is able to produce. To visualize these larger amounts of data, new techniques have to be developed that can handle the increased requirements in memory and bandwidth. This chapter will briefly go through a few examples of recent large-scale simulations and novel algorithms developed to visualize the simulated data, as well as visualization software that can be used to explore huge datasets in real-time. Even though the following sections mainly discuss astronomical data, the presented techniques should work for other kinds of large particle datasets as well, such as molecular structures.

2.1 Large-scale simulations

Gaia is not the only instrument that generates large quantities of data. Most of the others are, however, based on simulations instead of real measurements. In astrophysics these are often referred to as N-body simulations, as they try to solve how n bodies interact with each other gravitationally. These simulations have grown rapidly both in size and fidelity in the recent past, due to improvements in hardware and algorithms as well as in observation techniques that can validate the results. A few recent examples of N-body simulations are the Millennium Run in 2005 consisting of 10 billion particles [13], a 6 billion year simulation of the Milky Way on a supercomputer with 51 billion particles in 2014 [9], the Millennium XXL project in 2010 that simulated over 300 billion particles for more than 13 billion years [5], the Q Continuum simulation in 2015 that harnessed the power of GPUs in supercomputers and simulated 550 billion particles [17], and the Dark Sky simulations that had a first data release in 2014 with 1.07 trillion particles [31].

The growth of these simulations is outpacing Moore's law, which so far has been a good indicator of the growth of hardware capabilities. Besides position, these simulations often produce velocity and other properties per star, not unlike the measurements provided by Gaia. This increase of data poses a challenge for all visualization tools, and new algorithms for data management and rendering have to be developed to be able to work with such huge datasets.

2.2 Visualization techniques

The goal of most visualization tools is to achieve good enough framerates that the user can interact with the data in real-time. To do that, one has to optimize the bottlenecks in the rendering pipeline, which currently usually are reading large amounts of data from disk, streaming data from the CPU to the GPU, and inefficient rendering techniques.

Because the memory footprint of the particle data often exceeds the amount of available RAM in a single computer, one either has to divide the data into numerous files, require a computer with large enough RAM to fit the whole dataset, compute the renderings offline, or make use of a cluster of machines to achieve interactive framerates.

After the particle data has been read, it should be reorganized into an ordered data structure as a preprocess, to optimize the disk operations during runtime. The most popular structure for particle data in recent research is an octree ([27], [26], [29] etc.). The octrees are mainly used to decide what data to load into memory, or what to stream to the GPU. Another way to reduce how much data to stream is to use Level-Of-Detail (LOD), i.e. to store several layers of data of increasing complexity. During rendering, one of the levels with less complexity can then be fetched if the node is far away, which reduces the amount of data to stream.
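To make this concrete, the following is a minimal sketch (not code from any of the cited papers; all names are hypothetical) of an octree node with an LOD cache, and a traversal that falls back to the cache for nodes that are small on screen:

```cpp
#include <array>
#include <memory>
#include <vector>

// Hypothetical sketch of an octree node used for LOD-based streaming.
// Each inner node caches a simplified subset of its descendants' particles
// so that distant nodes can be drawn without descending further.
struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children; // all null in leaves
    std::vector<float> particleData; // leaf data, or the LOD cache in inner nodes
    float centerX, centerY, centerZ; // center of the axis-aligned bounding box
    float halfDimension;             // half the side length of the bounding box
    bool isLeaf() const { return children[0] == nullptr; }
};

// Appends either a node's LOD cache or its children's data, depending on
// the node's projected size in screen space.
void collectVisibleData(const OctreeNode& node, float screenSpaceSize,
                        float lodThreshold, std::vector<float>& out) {
    if (node.isLeaf() || screenSpaceSize < lodThreshold) {
        out.insert(out.end(), node.particleData.begin(), node.particleData.end());
        return;
    }
    for (const auto& child : node.children) {
        if (!child) continue;
        // A full implementation would frustum-cull each child and compute its
        // own screen-space size here; halving is a simplistic stand-in.
        collectVisibleData(*child, screenSpaceSize * 0.5f, lodThreshold, out);
    }
}
```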

The last area that researchers are focusing on improving is rendering techniques. Some of the most common techniques today are direct rendering of particles, volume rendering, distributed rendering and CPU-based ray-casting. Direct rendering is also known as point splatting [18], which often makes use of geometry instancing in geometry shaders (also sometimes referred to as billboarding or point sprites) to reduce the complexity of the data to upload to the GPU.

One group that made use of a computer cluster was Rizzi et al. [27], who were able to run a dataset of 32 billion particles using a cluster of 128 GPUs. They used a hierarchical octree structure and a distance-based LOD, and made use of a Message Passing Interface (MPI) to read the data in parallel before rendering it with a parallel point sprite algorithm.

If one instead has large amounts of RAM, one can use the technique presented by Wald et al. [34] to render the dataset on a single CPU. They used a purely CPU-based ray-tracing algorithm to render a dataset of a billion particles at interactive framerates, using a 72-core CPU with 3 TB of RAM. The data structure was a balanced k-d tree and, in contrast to most other implementations, no LOD was used, yet the approach was still competitive with many GPU-based techniques.

If one does not have access to a supercomputer or special hardware with terabytes of RAM, as is the case for this thesis project, there are still a couple of techniques available. One makes use of a CUDA-accelerated wavelet compression to further reduce bandwidth requirements when streaming from disk, and manages to render the 10 billion particles simulated by the Millennium Run with point splatting [26]. Another combines LOD and billboarding to render a dataset of 10 billion particles of molecular data on a single GPU [20]. Another example of rendering the Millennium Run is Fraedrich et al. [13], who used an adaptive octree structure and continuous LOD, and saved subsets of the octree to files. All nodes with the same parent (i.e. a subtree) were packed together in a file, and then only files that were visible were loaded during a fly-through.

There are several other examples of implementations where parts of the octree were saved to files and streamed dynamically during rendering, such as Lukac [21], who claims to be able to render up to 10 billion particles in HD resolution on a single desktop with only 4 GB of RAM. However, almost 90% of the particles were culled during rendering.

Whereas most of the mentioned projects have used point splatting or ray-casting, there have also been developments in the field of volume rendering. Scherzinger et al. [30] came up with a novel merger-tree approach that combined volume ray-casting of volumetric resamplings with direct visualization of halo overlays and their evolution over time. The paper won the IEEE Scientific Visualization Contest in 2015 and was implemented in the framework Voreen.

Finally, there is a novel hybrid method presented by Schatz et al. [29] that renders a dataset of a trillion particles from the Dark Sky simulation on a single computer. They make use of a dual-GPU configuration that splits the data depending on type: particles are rendered with geometry instancing on one GPU for details, while volume-based ray-casting of a density volume on the other GPU provides context. Their approach makes use of an octree structure which stores all leaf nodes as files and then only loads nodes in the closest vicinity of the camera. By limiting the streaming of data they only need about 5 GB of RAM for the particle data during rendering.

2.3 Visualization software

There is a wide range of existing visualization software packages focusing on astronomy. While astronomers often are more comfortable with low-level tools like TOPCAT1 and GlueViz2, which are great for selecting subsets and flexible linked views, several higher-level tools exist as well. To give some context of what astronomers are used to looking at, Figure 2.1 shows the TGAS dataset of 2 million stars being visualized in TOPCAT, while Figure 2.2 shows the same dataset being mapped as the night sky on a sphere. Together they illustrate the difficulty of getting a sense of scale and structure, which is why most astronomers so far work with subsets of 100,000 stars or less.

Figure 2.1: The TGAS dataset of 2 million stars visualized in 3D with TOPCAT.

Figure 2.2: The TGAS dataset of 2 million stars mapped as the night sky with TOPCAT.

Another type of software is the commercial, planetarium-focused kind, such as Uniview3 and Digistar4 (developed by Sciss and Evans & Sutherland respectively). They incorporate a lot of visualizations of the cosmos, but also produce other content for planetariums as well as full dome solutions.

Then there are the open source software packages, all with different strengths and focuses. ParaView5 can, for example, visualize star particle data in 3D but is essentially focused on distributed memory computing and scaling up to bigger clusters or supercomputers. It can visualize large datasets regardless of scientific area, but therefore also does not contain as many features for the cosmos.

Partiview6 and Celestia7, on the other hand, have similar features as OpenSpace regarding the known cosmos. Partiview contains much of the same regarding the outer universe, while Celestia is more focused on the solar system. Partiview was developed by Brian Abbott at AMNH, who in turn created the Digital Universe catalogue that both Partiview and OpenSpace use. OpenSpace does, however, have quite a few advantages in terms of globe browsing [10] and visualization of specific missions. Neither Partiview nor Celestia has made any effort to incorporate data from the Gaia mission as of yet.

1http://www.star.bris.ac.uk/~mbt/topcat/
2http://www.glueviz.org/en/stable/
3http://sciss.se/uniview
4https://www.es.com/digistar/
5https://www.paraview.org/
6http://virdir.ncsa.illinois.edu/partiview/
7https://celestia.space/

The software that this thesis project has most in common with is Gaia Sky8, which is promoted by the official Gaia website. Gaia Sky is a real-time 3D visualization tool that focuses on the Gaia mission and its data. The project started at the end of 2014, about the same time as OpenSpace, and has since then been in continuous development, with version 2.0 released on the same day as DR2, with several subsets of the data already preprocessed and ready for download.

Gaia Sky does have a couple of features that OpenSpace lacks, such as screen-space picking of objects and showing information about the picked objects in the user interface. Several features that were requested by the astronomers in Vienna, such as selection of subsets by region and measuring distances between objects, are still absent from both OpenSpace and Gaia Sky. Much like OpenSpace, it feels like Gaia Sky's main objective is public outreach rather than use as a research tool. A more in-depth comparison with Gaia Sky can be found in Section 6.4.

8https://zah.uni-heidelberg.de/institutes/ari/gaia/outreach/gaiasky/

Chapter 3

Visualize the Gaia mission

The first implementation goal of this thesis was to visualize the Gaia mission. The main objective of the (still ongoing) mission is to measure about 1% of the stars in our Milky Way. To reach that goal, a spacecraft has been placed in an orbit around the Sun where it keeps pace with the Earth. The following sections describe the mission, spacecraft and orbit, as well as explain what data has been released and how it must be converted before it can be used in OpenSpace.

3.1 The mission

Figure 3.1: Illustration of how a parallax angle is determined. Source: Srain @ Wikipedia, PD [32]

Astrometry, the discipline of accurately measuring the positions of celestial objects, has a long history. The earliest star catalogue dates back to 190 BC and contained at least 850 stars and their positions. The discipline had an obvious surge after the invention of the telescope in the 17th century, but it took until the 19th century before astronomers were able to figure out how to accurately measure the distance to the stars. The method they came up with was to use the parallax angle.

The method is similar to holding a finger in front of your face, closing one eye at a time and observing how the finger moves in contrast to a static background. When measuring stars, the orbit of the Earth is used instead of the distance between your eyes. A photo is taken every six months in the same direction, and the observed displacement of close stars is used to determine their parallax angle. That angle is in turn used together with the known distance between the Sun and the Earth (1 Astronomical Unit (AU), or 150 million kilometers) to determine the distance to that star using simple trigonometry. Figure 3.1 illustrates the method.

However, observing the parallaxes from Earth proved to be quite difficult because of disturbances in the atmosphere, and by the mid-1990s only about 8000 stars had accurate parallaxes. That changed in 1997 when the findings of ESA's Hipparcos satellite (or High Precision Parallax Collecting Satellite) were released, which had measured parallaxes with high precision for 117,955 objects [3].

Gaia (or Global Astrometric Interferometer for Astrophysics) is meant to be the successor of Hipparcos, and the science project started in 2000. The construction of the instrument was approved in 2006 and it was launched on December 19, 2013. The mission started with four weeks of ecliptic-pole scanning and subsequently transferred into full-sky scanning. The mission is to measure different properties regarding positions (astrometry), flux/intensity (photometry) and electromagnetic radiation (spectrometry) of about one percent (or two billion) of all stars in the Milky Way with high precision. Gaia will also be able to discover new asteroids, comets, exoplanets, brown dwarfs, white dwarfs, supernovae and quasars, but the main objective is to clarify the origin and history of our galaxy [14].

3.2 The spacecraft

The Gaia spacecraft is comprised of three major functional modules: the payload module, the mechanical service module and the electrical service module. To simplify, one can say that the payload module is constructed to capture and process the data, the mechanical service module controls the navigation system and operates the instruments, while the electrical service module controls the power and communications with the Earth.

The payload module carries two identical telescopes pointing in different directions. Three instruments are connected to these telescopes. The first is an astrometric instrument that measures stellar positions on the sky. By combining several measurements of the same star it is also possible to deduce its parallax, distance and velocity across the sky.

The second is a photometric instrument that provides colour information by generating two low-resolution spectra, one red and one blue. These are used to determine properties such as mass, temperature and chemical composition. The third instrument is a radial velocity spectrometer that calculates the velocity in depth by measuring Doppler shifts of absorption lines in a high-resolution spectrum in a specific wavelength range. How the instruments work will not be further explained or visualized in this thesis project. More information can instead be found on Gaia's official website1.

However, the Gaia spacecraft itself is visualized in OpenSpace. An artist's rendition of the spacecraft can be seen in Figure 3.2. An open-source model of the spacecraft (produced by the University of Heidelberg) was imported into OpenSpace and is rendered at its correct position in space with respect to a specific time and date. A render of the model in OpenSpace is shown in Figure 3.3.

Figure 3.2: An artist's rendition of the Gaia spacecraft for the Paris Air Show 2013. Source: Pline @ Wikipedia by CC BY-SA 3.0 [25]

Figure 3.3: The Gaia spacecraft model rendered in OpenSpace. The model has been rotated before this screenshot so that the sun will brighten up the instrument.

1https://www.cosmos.esa.int/web/gaia/spacecraft-instruments


3.3 The orbit

A few weeks after the launch at the end of 2013, the Gaia instrument arrived at its operation point: the Second Lagrange point (L2) of the Sun-Earth-Moon system, which is about 1.5 million km from the Earth. Lagrange points are positions in an orbital configuration of large bodies where the gravitational forces of the larger objects will maintain a smaller object's position relative to them. Or, as Mignard puts it, "The region around L2 is a gravitational saddle point, where spacecraft can be maintained at roughly constant distance from the Earth for several years by small and cheap manoeuvres." [22].

At L2, Gaia keeps pace with the Earth's orbit while enjoying a less obstructed view of the cosmos than an orbit around the Earth would provide. However, in a circular radius around L2 the Sun is always eclipsed by the Earth, and thus the solar panels on Gaia would not receive enough sunlight. Gaia was therefore placed in a large Lissajous orbit around L2, ensuring that it stays away from the eclipse zone for at least six years [22], long enough to complete its mission.

Figure 3.4: Trail lines of the Gaia spacecraft rendered in OpenSpace. The shown trajectory is with respect to Earth's position in space.

The orbit is visualized in OpenSpace by rendering trail lines of its accurate journey to, and orbit around, the L2 point with respect to Earth's position. Figure 3.4 shows Gaia's trajectory from launch up until 22 May 2018. Data of the trajectory was obtained from the HORIZONS Web-Interface2, hosted by the Jet Propulsion Laboratory at the California Institute of Technology. To keep fidelity, the model in OpenSpace has been rotated so that its sun-shield is always facing the Sun. A correct rotation around its own axis is not yet in place, however, as no real data of the rotation could be found.

OpenSpace already had techniques implemented to render trail lines, but a new translation interface was implemented for this project to read the text format exported from Horizons. This was used both to place the instrument at its correct position and to show the trail up until the end of the nominal mission. The position for past dates is based on measurements; for future dates it is an approximation. If the user uses the menu in OpenSpace to turn back time (or set it to a future date) where no position data exists, the model will simply remain at the last known position.

One drawback of using Horizons is that the data is static: it is only accurate up to the date it was generated. Many other satellites have their data released in so-called SPICE kernels. If a kernel file is updated, that change will be read on the next start-up and no manual update to the data has to be made. However, Gaia has not been released as a SPICE kernel as of yet [2], but the implementation should be updated to use it when or if it is released.

2https://ssd.jpl.nasa.gov/horizons.cgi

3.4 The releases

The measurements from the Gaia spacecraft will be released in four different batches. DR1 was released on September 14, 2016, DR2 was released on April 25, 2018, the third release will happen in late 2020, and the final release for the nominal mission will be at the end of 2022.

In short, each release will have measurements of more stars and better measurements of previously released stars. DR2, which is the release this project focuses on, contains about 1.7 billion point sources, with about 1.3 billion of them having measurements of parallax and proper motion in addition to their position on the night sky. Proper motion tells us the transverse velocity of the star across the sky. The position and proper motion of non-solar system objects are expressed in the International Celestial Reference System (ICRS) in terms of the equatorial angles Right Ascension (RA) and Declination (Dec). However, OpenSpace uses a Galactic Coordinate System3 with angles in galactic latitude and longitude. A conversion of the positions had already been done by the Gaia Data Processing and Analysis Consortium (DPAC), and both equatorial and galactic angles were released in DR2 [12]. The measurements of proper motion, however, had not been converted, and thus a conversion had to be done before proper motion could be used to calculate the velocity.

3.4.1 Conversions

The conversions used are the same as in the documentation for DR2, which makes use of a simple matrix multiplication system [4]. A point in the ICRS and the Galactic Coordinate System can be expressed as the vectors

$$\mathbf{r}_{\mathrm{ICRS}} = \begin{bmatrix} X_{\mathrm{ICRS}} \\ Y_{\mathrm{ICRS}} \\ Z_{\mathrm{ICRS}} \end{bmatrix} = \begin{bmatrix} d\cos\alpha\cos\beta \\ d\sin\alpha\cos\beta \\ d\sin\beta \end{bmatrix} \tag{3.1}$$

and

$$\mathbf{r}_{\mathrm{Gal}} = \begin{bmatrix} X_{\mathrm{Gal}} \\ Y_{\mathrm{Gal}} \\ Z_{\mathrm{Gal}} \end{bmatrix} = \begin{bmatrix} d\cos l\cos b \\ d\sin l\cos b \\ d\sin b \end{bmatrix} \tag{3.2}$$

where $d$ is the distance to the star, $\alpha$ and $\beta$ are the equatorial angles RA and Dec, while $b$ and $l$ are galactic latitude and longitude. Distances in space are often expressed in parsec [pc]. The reason for this is tied to how parallax angles are expressed. When using very small angles, such as the parallax angle to a distant star, 360 degrees are simply not enough to express them. Instead, every degree is divided into 60 arcminutes, and each arcminute in turn into 60 arcseconds. If the parallax angle of a star is exactly one arcsecond, then the distance to that star is one parsec (or $3.08 \times 10^{16}$ meters). The relationship is given by $d = 1/p$, where $p$ is the parallax angle; for example, a star with a parallax of one milliarcsecond lies at a distance of one kiloparsec. The angles in DR2 are given in milliarcseconds, and as such the distances are expressed in kiloparsecs. The conversion from ICRS to Galactic coordinates is then obtained by

$$\mathbf{r}_{\mathrm{Gal}} = A'_{G}\,\mathbf{r}_{\mathrm{ICRS}} \tag{3.3}$$

where

$$A'_{G} = R_z(-l_{\Omega})\,R_x(90^{\circ} - \delta_G)\,R_z(\alpha_G + 90^{\circ}) \tag{3.4}$$

$$= \begin{bmatrix} -0.0548755604162154 & -0.8734370902348850 & -0.4838350155487132 \\ +0.4941094278755837 & -0.4448296299600112 & +0.7469822444972189 \\ -0.8676661490190047 & -0.1980763734312015 & +0.4559837761750669 \end{bmatrix} \tag{3.5}$$

is a fixed orthogonal matrix that represents the rotation around the three axes. The proper motion angles pmra and pmdec can be expressed as components $(\mu_{\alpha\star}, \mu_{\beta})$, with the corresponding values in Galactic angles being $(\mu_{l\star}, \mu_{b})$, where $\mu_{\alpha\star} = \mu_{\alpha}\cos\beta$ and $\mu_{l\star} = \mu_{l}\cos b$.

For the conversion of the proper motion angles, four auxiliary vectors are required:

$$\mathbf{p}_{\mathrm{ICRS}} = \begin{bmatrix} -\sin\alpha \\ \cos\alpha \\ 0 \end{bmatrix}, \quad \mathbf{q}_{\mathrm{ICRS}} = \begin{bmatrix} -\cos\alpha\sin\beta \\ -\sin\alpha\sin\beta \\ \cos\beta \end{bmatrix} \tag{3.6}$$

and

$$\mathbf{p}_{\mathrm{Gal}} = \begin{bmatrix} -\sin l \\ \cos l \\ 0 \end{bmatrix}, \quad \mathbf{q}_{\mathrm{Gal}} = \begin{bmatrix} -\cos l\sin b \\ -\sin l\sin b \\ \cos b \end{bmatrix} \tag{3.7}$$

which represent unit vectors in the directions of increasing $\alpha$ and $\beta$ (or $l$ and $b$). The Cartesian components of the proper motion vectors can then be expressed as

$$\boldsymbol{\mu}_{\mathrm{ICRS}} = \mathbf{p}_{\mathrm{ICRS}}\,\mu_{\alpha\star} + \mathbf{q}_{\mathrm{ICRS}}\,\mu_{\beta} \tag{3.8}$$

and

$$\boldsymbol{\mu}_{\mathrm{Gal}} = \mathbf{p}_{\mathrm{Gal}}\,\mu_{l\star} + \mathbf{q}_{\mathrm{Gal}}\,\mu_{b} \tag{3.9}$$

with the conversion being

$$\boldsymbol{\mu}_{\mathrm{Gal}} = A'_{G}\,\boldsymbol{\mu}_{\mathrm{ICRS}} \tag{3.10}$$

where $A'_{G}$ is the same as in Eq. 3.5. This means that even though a conversion of the positions had already taken place in DR2, OpenSpace can now read angles for both position and proper motion and convert them to the Galactic Coordinate System if need be.

3http://astronomy.swin.edu.au/cosmos/G/Galactic+Coordinate+System
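To make the position conversion concrete, the following minimal sketch (hypothetical names, not OpenSpace's actual code) applies Eq. 3.1, 3.3 and 3.5 to compute a Cartesian galactic position from the equatorial angles and a parallax:

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Fixed orthogonal rotation matrix A'_G from Eq. 3.5 (row-major).
constexpr double AG[3][3] = {
    {-0.0548755604162154, -0.8734370902348850, -0.4838350155487132},
    {+0.4941094278755837, -0.4448296299600112, +0.7469822444972189},
    {-0.8676661490190047, -0.1980763734312015, +0.4559837761750669}
};

// Minimal sketch of Eq. 3.1 and 3.3. Angles are in radians; the distance is
// derived from the parallax as d = 1/p (p in arcsec gives d in parsec).
Vec3 icrsToGalactic(double ra, double dec, double parallaxArcsec) {
    const double d = 1.0 / parallaxArcsec; // distance in parsec
    // Eq. 3.1: Cartesian position in the ICRS.
    const Vec3 rIcrs = { d * std::cos(ra) * std::cos(dec),
                         d * std::sin(ra) * std::cos(dec),
                         d * std::sin(dec) };
    // Eq. 3.3: rotate into galactic coordinates, r_Gal = A'_G * r_ICRS.
    Vec3 rGal{};
    for (int i = 0; i < 3; ++i) {
        rGal[i] = AG[i][0] * rIcrs[0] + AG[i][1] * rIcrs[1] + AG[i][2] * rIcrs[2];
    }
    return rGal;
}
```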

3.4.2 Calculating velocity

One of the implementation goals of the project was to read velocity where possible and get the stars to move. To get the space velocity, two vectors are needed: a transverse velocity vector and a radial velocity vector (see Figure 3.5). To calculate the transverse velocity you need the proper motion and the distance to the star. Proper motion is, as mentioned, the transverse motion across the sky, and is expressed in milliarcseconds per year in DR2. To convert it to m/s, one can use the same relationship as when calculating the distance from the parallax angle: an angle of $\mu$ arcsec at a distance of $r$ pc corresponds to a separation of $r\mu$ AU [23]. In our case the proper motion parameters are the angles and the distance is obtained from the parallax. To convert the separation/transverse velocity from AU/year to m/s, one can use the following relation:

$$1\,\mathrm{AU/year} = \frac{\sim 1.5 \times 10^{11}\,\mathrm{m}}{\sim 3 \times 10^{7}\,\mathrm{s}} \approx 4.74 \times 10^{3}\,\mathrm{m/s} \tag{3.11}$$

Figure 3.5: Illustration of how the space velocity vector can be broken up into a transverse velocity and a radial velocity. Source: Brews ohare @ Wikipedia by CC BY-SA 3.0 [24]

The last parameter needed to calculate the space velocity is the radial velocity, which is the velocity with which stars move towards or away from the Sun. In DR2, around 7.2 million stars were released with a radial velocity (in km/s). The radial velocity vector is calculated by using Eq. 3.2 with the radial velocity as distance. The space velocity vector is then obtained by combining the two velocity vectors. This vector can later be used to simulate the movement of the stars. It is, however, only an instantaneous velocity vector. To obtain the true motion one has to incorporate the gravity of nearby stars and the rotation around the galaxy's center, which has not yet been implemented in OpenSpace.
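As a worked illustration of this calculation (a hedged sketch with hypothetical names, not OpenSpace's actual implementation; it assumes DR2 units of mas/year for proper motion, mas for parallax and km/s for radial velocity), the space velocity could be assembled as follows:

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Sketch of combining proper motion, parallax and radial velocity into a
// space velocity in m/s, following Eq. 3.2, 3.7, 3.9 and 3.11.
// Galactic angles l and b are in radians; pmlStar = mu_l* and pmb = mu_b
// are in mas/year; parallax in mas; radial velocity in km/s.
Vec3 spaceVelocity(double l, double b, double pmlStar, double pmb,
                   double parallaxMas, double radialVelocityKmS) {
    // r*mu AU/year with r = 1000/p [pc] and mu in mas/year gives mu/p AU/year;
    // 1 AU/year ~ 4.74e3 m/s (Eq. 3.11).
    const double masYearToMs = 4.74e3 / parallaxMas;
    // Unit vectors p_Gal and q_Gal (Eq. 3.7) for the transverse component.
    const Vec3 pGal = { -std::sin(l), std::cos(l), 0.0 };
    const Vec3 qGal = { -std::cos(l) * std::sin(b),
                        -std::sin(l) * std::sin(b),
                         std::cos(b) };
    // Radial unit vector, i.e. Eq. 3.2 with distance 1.
    const Vec3 rHat = { std::cos(l) * std::cos(b),
                        std::sin(l) * std::cos(b),
                        std::sin(b) };
    const double vr = radialVelocityKmS * 1e3; // km/s -> m/s
    Vec3 v{};
    for (int i = 0; i < 3; ++i) {
        v[i] = (pGal[i] * pmlStar + qGal[i] * pmb) * masYearToMs + rHat[i] * vr;
    }
    return v;
}
```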

Chapter 4

Render 1.7 billion stars

The main part of this project was to import and display the stars released in DR2. The full release contains about 1.2 TB of raw data, which is too much for most computers to handle. Therefore an out-of-core rendering technique had to be implemented. The research presented in Section 2.2 concluded that the main bottlenecks when rendering large datasets usually are the I/O operations, transferring the data to the GPU, and too many or too inefficient shader calls. This chapter will present how these bottlenecks have been optimized in OpenSpace.

4.1 System overview

As nothing previously had been implemented in the OpenSpace pipeline to handle such large particle

datasets most of the data pipeline had to be implemented from scratch. This section will give a short

overview of the pipeline and then let the following sections describe each step in more detail.

First off there is a difference in the entire pipeline depending on if the dataset is stored in one file or

in several files. If the dataset is stored in one file we assume that it can fit in RAM. This assumption

stems from the subsets that astronomers produced for this project. All the subsets were relatively

small and stored in a single file. According to the astronomers this was how they were used to working.

Therefore, if a single file is read initially then the "single file format" will be kept through the entire

pipeline, even if files are produced in intermediate steps, which in turn means that the dataset cannot

be streamed from disk during render and thus the entire dataset has to fit in RAM.

Before OpenSpace can start processing the data the file(s) have to be in a format OpenSpace can read,

then the steps are basically to read the raw data, sort the stars into an octree structure, upload stars that

are visible to the GPU and finally render them to the screen. The steps can be broken up into separate tasks, as shown in Figure 4.1, or the whole or parts of the process can be done during start-up for

smaller single file datasets, as illustrated by Figure 4.2.

4.2 Read the data

There are multiple formats that can be used to store star data on disk, for example Flexible Image

Transport System (FITS) 1, SPECK, VOTable and Comma-Separated Value (CSV). DR1 and the

TGAS subset were both released in CSV, FITS as well as VOTable. The University of Vienna also

uses the FITS format while AMNH uses SPECK which implied that OpenSpace had to be able to

support at least both those formats.

1https://heasarc.gsfc.nasa.gov/docs/heasarc/fits.html


Figure 4.1: Illustration of the data pipeline when reading a dataset from multiple files,

such as the full DR2.

Figure 4.2: Illustration of the data pipeline when reading a dataset from a single file.

Because DR2 had not been released by the start of this thesis a single FITS file with the TGAS subset

was used during the majority of the implementation period. The full DR2 was later released as 61,234

separate files. Thus both reading of a single file and of multiple files had to be implemented. The

following sections will describe the difference in the techniques.

4.2.1 Read a single file

Reading of single SPECK files had already been implemented in OpenSpace but reading FITS tables

had not. FITS files can contain either image data or table data. In the case of Gaia it stores tabular

data for all the stars. A FITS file reader module was therefore implemented in OpenSpace where the

user can define which file to read from along with which columns and rows. For the I/O operations

the module uses CCfits2 which in turn builds upon cfitsio3.
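As an illustration, reading a few Gaia columns with CCfits could look like the sketch below (assuming the star table is stored in the first extension HDU; the column names follow the Gaia archive):

#include <CCfits/CCfits>
#include <string>
#include <vector>

// Minimal sketch of reading Gaia columns from a FITS table with CCfits.
void readGaiaColumns(const std::string& path)
{
    CCfits::FITS file(path, CCfits::Read);
    CCfits::ExtHDU& table = file.extension(1);

    long nRows = table.rows();
    std::vector<double> ra, dec, parallax;

    // CCfits reads one column at a time, which is why the values later have
    // to be re-ordered per star before being written to the binary file.
    table.column("ra").read(ra, 1, nRows);
    table.column("dec").read(dec, 1, nRows);
    table.column("parallax").read(parallax, 1, nRows);
}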

The file can either be read on start-up or in another process as a preprocess. This is called a TaskRunner

in OpenSpace and can be run independently from the main process. The reason for implementing the

reading as a task is that reading an ASCII file such as SPECK or FITS can take quite a long time,

especially if the file is big. A ReadSpeckTask and a ReadFitsTask were therefore implemented that

read a single text file and output a binary file with only the star data we are interested in as it is much

faster to read a binary file during start-up.

2 https://heasarc.gsfc.nasa.gov/fitsio/CCfits/
3 https://heasarc.gsfc.nasa.gov/fitsio/fitsio.html

To be fair, FITS files can store tables in binary as well but the files released by ESA were stored in

ASCII. However, even if the tables had been binary it would still be faster to read a preprocessed

file because then the values would already be ordered by star. SPECK files are ordered in row-major

fashion, which means that one row contains data for one star, so the stars can be read sequentially.

CCfits on the other hand reads the table by column, which requires more memory and additional loops

to order the data by star and store it in the correct binary order.

The ReadSpeckTask only takes paths to the input and output files as arguments while ReadFitsTask

also accepts optional parameters for the first and last row to read as well as which additional columns

to read. The columns needed for default rendering and filtering will always be read but the user can

define additional filter parameters. Reading additional columns will slow down the process tremendously, however, so it is actually preferable to add new columns directly in the code instead.

4.2.2 Read multiple files

The same ReadFitsTask can be used to read multiple FITS files from a folder. This is required if the

dataset is too large for the RAM in the computer. The reason why only FITS is supported is that it

was the fastest format to read a single file out of those that DR1 was released in. As it later turned

out the DR2 dataset was only released in compressed CSV files initially. To read the DR2 dataset all

61,234 files first have to be downloaded from the Gaia archive, unzipped and then converted to FITS.

The conversion was done with a Python script using Astropy4. All the files in the specified folder are

thereafter read in multiple threads. However, CCfits prevents the usage of more than one I/O driver at

a time so what is actually threaded is the processing of the data and the writing to binary files.

To avoid using too much RAM in the next step the data is split into eight initial octants, defined by the main Cartesian axes. The writing is done in batches, so when the number of values in an octant

exceeds a pre-defined threshold the values in that octant are appended to the corresponding file. If

more values are stored per star, or if the available RAM is too low to handle eight files, it is possible to divide the data into 8^n binary files instead by going down more levels in the octree.
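The octant assignment itself is a sign test on the Cartesian coordinates, as sketched below (a minimal sketch with a hypothetical name):

#include <cstddef>

// Hypothetical sketch: assign a star to one of the eight initial octants,
// defined by the signs of its Cartesian coordinates. The returned index
// selects which octant file the values are appended to.
std::size_t octantIndex(double x, double y, double z)
{
    std::size_t index = 0;
    if (x < 0.0) index += 1;
    if (y < 0.0) index += 2;
    if (z < 0.0) index += 4;
    return index; // 0-7
}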

When reading from a folder the ReadFitsTask also takes an additional optional parameter which defines how many threads to use for the reading.

4.2.3 Pre-process the data

The most fundamental thing you need to render anything is the object’s position in 3D space. In our

case we are also interested in the velocity of a star and some parameters for how the star should

look. This amounts to eight parameters in the end: [x, y, z] position, [x, y, z] velocity, magnitude and

color. How these values are used in the rendering is later explained in Section 4.5. Because one

implementation goal was to filter the data a couple of parameters for filtering may also be of interest.

How the filtering works will be explained in Section 4.3.2 and 4.5.4.

To be able to store the eight basic rendering parameters all measurements needed for calculating them

have to be read from the file(s). In some cases the parameters may already be calculated correctly, but

if the data for example is read directly from DR2 they have to be calculated from RA, Dec, parallax,

proper motion and radial velocity. For these calculations the equations presented in Section 3.4 were

used.

If a star is missing a measurement it will be set to a default value. For example, DR2 contains 1.7

billion stars but only 1.3 billion of them had any parallax angle. Thus the distance for those stars was set to a user-defined constant, which could easily be filtered away later.

4 http://docs.astropy.org/en/stable/io/unified.html

4.3 Construct an Octree

Once the data have been read it needs to be structured in a way that can optimize how much data is

streamed from disk and/or streamed to the GPU. Schatz et al. [29] and several others ([27], [26], [33],

[13]) suggest that a version of an octree structure is the best way to go. In this work the subdivision

of the octree is based on the spatial information of the stars. When the number of stars in a leaf node

exceeds a user-defined constant that node is subdivided into eight new leaf nodes of equal size and

redefines itself as an inner node (parent). Figure 4.3 illustrates how a 2D representation of the octree

might subdivide.

Figure 4.3: Illustration of how the size of the octree

depends on the maximum number of stars in each

node and the initial extent of the octree.

The depth of the tree depends on the initial extent of the tree and how many stars are stored in each node. If the initial extent is small it will generate a shallow tree, which is preferable as it will speed up the traversals. However, the stars that fall outside of the initial extent still have to be stored in the octree. If the outermost node is not able to contain all the outliers the memory stack will overflow and the process will crash during the construction. A higher number of stars per node will also generate fewer total nodes, which is both faster to construct and to traverse, but may require more data to be streamed to the GPU later on as the frustum culling will get coarser. Bigger nodes also imply that fewer nodes can fit in the buffer in the next step. If many of the nodes in the tree are underutilized it can be quite bad for the performance as well, due to how the buffers are updated. When constructing the octree the inner nodes will keep a copy of the brightest stars in all their descendants as a LOD cache that will be used later when traversing the octree. Therefore a higher number of stars per node also means that more data will be duplicated. The user has to find a balance between the two properties to make sure that both the construction time and render performance are at acceptable levels. A little bit of trial-and-error is necessary as there is no general guideline for how to set the properties; it depends heavily on the characteristics of the specific dataset.

The construction of the octree can either be done during start-up or in a TaskRunner. The ConstructOctreeTask can process either a single binary file or the eight binary octant files produced by reading from multiple files. The process in the two versions is similar: the stars are read one-by-one from the binary file, checked against the filters and, if they pass all of them, inserted into the octree. The difference is that the single file version saves the octree in a single binary file while the multiple file version stores the structure of the octree, without any data, in a binary index file and then stores the data of the nodes in one file per node. The node files are named after their position index in the octree, which is based on Morton order (or Z-order curve) [8]. This will be important later when accessing neighboring nodes (Section 4.4.5) as it preserves data locality.
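With Morton order every level of the octree contributes one octal digit to a node's index, which can be sketched as below (a hypothetical helper; the actual naming scheme in OpenSpace may differ):

#include <cstdint>

// Hypothetical sketch of Morton (Z-order) indexing: the index of a child is
// the parent's index with the child's octant (0-7) appended as one more octal
// digit, so spatially close nodes get nearby indices and data locality is
// preserved in the node files.
std::uint64_t childMortonIndex(std::uint64_t parentIndex, int octant)
{
    return parentIndex * 8 + static_cast<std::uint64_t>(octant);
}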

When constructing the octree from several files only one octant is processed at a time. It would be possible to read several octants at the same time in different threads but that would require more memory, which the computer used for this project did not possess. It would be possible to add additional threading if the hardware can handle it. As of now one branch of the octree is constructed at a time; after all stars have been inserted the branch is written to multiple files, after which all data is cleared and the memory deallocated. The user can choose to perform the writing in a different thread, as writing thousands of files otherwise slows down the process. This makes the construction require more RAM as the reading of the next octant begins before the data has been cleared from the previous branch. When all octants have been read and all nodes have been stored in separate files the structure of the octree is saved to a binary index file. The structure tells us all we need to know to be able to load the files later, namely the number of stars stored in each node and whether it is a leaf or not.

4.3.1 Octree structure

The octree used in the implementation is a pointer-based octree with every node represented as an Axis-Aligned Bounding Box (AABB) with a Vec3 center and a floating point half-dimension. Every node has a pointer to each of its eight children but none to its parent. Every node also keeps track of how many stars it contains, whether it is a leaf, whether it is loaded into RAM, whether it has any descendants that are loaded into RAM, its index in the streaming buffer, its position in the octree and containers with render values for all its stars.
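A condensed sketch of such a node could look as follows (hypothetical names; the actual OpenSpace class contains more bookkeeping):

#include <cstdint>
#include <glm/glm.hpp>
#include <memory>
#include <vector>

// Hypothetical sketch of the pointer-based octree node described above.
struct OctreeNode {
    glm::vec3 center;                        // AABB center
    float halfDimension;                     // AABB half-size

    std::unique_ptr<OctreeNode> children[8]; // no pointer to the parent

    int numStars = 0;                        // stars stored in this node
    bool isLeaf = true;
    bool isLoaded = false;                   // node data resident in RAM
    bool hasLoadedDescendant = false;
    long bufferIndex = -1;                   // chunk slot in the GPU buffer
    std::uint64_t octreePosition = 0;        // Morton index, used for file names

    // Render values, split per render mode (cf. Section 4.5.1).
    std::vector<float> positionData;
    std::vector<float> colorData;            // magnitude and color
    std::vector<float> velocityData;
};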

4.3.2 Offline filtering

During the construction of the octree the stars can be filtered by a number of parameters. The default

filter parameters from DR2 that are used are position (x, y, z), velocity (x, y, z), magnitude (mean

band of photometry G, Bp and Rp), color (the difference between two photometric bands, i.e. Bp-Rp,

Bp-G and G-Rp), RA, Dec, parallax, proper motion for RA and Dec, radial velocity and errors for the

last six values. It is possible in theory to add all the released values and filter by everything if you

have the hardware for it but to limit the file sizes these were the 24 values chosen. They are also the

filter values suggested by the astronomers in Vienna.

The user can set min and max values for each of these parameters as input. If both min and max are

set to the same value then all stars with that specific value are filtered away. It is possible to set the

minimum to minus infinity and the maximum to positive infinity.
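The rule can be summarized in code as below (a minimal sketch with hypothetical names):

#include <limits>

// Hypothetical sketch of the offline filter rule: a [min, max] range keeps a
// star if the value lies inside it, except that min == max means "filter away
// exactly that value" (used e.g. to remove stars set to a default distance).
struct FilterRange {
    double min = -std::numeric_limits<double>::infinity();
    double max = std::numeric_limits<double>::infinity();

    bool keeps(double value) const {
        if (min == max) {
            return value != min; // equal thresholds exclude that exact value
        }
        return value >= min && value <= max;
    }
};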

4.4 Data streaming

The next step in the pipeline is to find out which nodes are eligible for rendering and load them

into RAM, if they are not already loaded, and then stream their data to the GPU. This section will first

explain the algorithms used if the entire octree can fit in memory and then expand those techniques to

when streaming from files.

4.4.1 Update the buffers

If the entire dataset can fit in memory it will be asynchronously loaded into RAM on start-up, without locking the application, regardless of whether it is stored in a single file or in multiple files. When the data is loaded into the working memory we need to find out which nodes should be streamed to the GPU. During each render call the octree is traversed in pre-order fashion. The node's AABB is used to determine if a node intersects the view frustum or not. If it does not, there is no need to keep going down that branch. Another culling technique had already been implemented in OpenSpace by Bock et al. [7] which in theory removes the near and far planes and enables seamless space travel over 60 orders of magnitude. However, even though that technique culls objects that are not visible it kicks in later in the OpenSpace pipeline, and to be able to optimize the streaming we need to determine what nodes to upload at an earlier stage.

If the node is visible and is a leaf, the data in that node should be uploaded. If the node is visible but is an inner node we keep the traversal going, unless the node is smaller in screen space than a user-defined threshold. In that case the stored LOD cache should be uploaded instead and the traversal stops going down that branch.
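In pseudo-C++ the traversal could be sketched as follows (a sketch only, with hypothetical helpers supplied by the caller; OctreeNode refers to the sketch in Section 4.3.1):

#include <functional>

// Hypothetical sketch of the pre-order traversal described above.
void traverse(OctreeNode& node,
              const std::function<bool(const OctreeNode&)>& intersectsFrustum,
              const std::function<float(const OctreeNode&)>& screenSpaceSize,
              const std::function<void(OctreeNode&)>& uploadChunk,
              float lodThreshold)
{
    if (!intersectsFrustum(node)) {
        return; // the whole branch is outside the view frustum
    }
    if (node.isLeaf || screenSpaceSize(node) < lodThreshold) {
        uploadChunk(node); // leaf data, or the inner node's LOD cache
        return;
    }
    for (auto& child : node.children) {
        if (child) {
            traverse(*child, intersectsFrustum, screenSpaceSize,
                     uploadChunk, lodThreshold);
        }
    }
}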

However, to limit the streaming to the GPU we only want to update nodes that are not already uploaded. An index stack therefore keeps track of all the free spots in the buffer. The maximum size of the stack is determined by how much dedicated video memory the GPU has. The user can also limit the maximum usage with a max-percent property.

Each node then has a buffer index with that node's placement in the buffer, if uploaded. When a node should be uploaded it first checks if its index is the default value; if it is not, the node already exists in the buffer. Otherwise it acquires the top index from the stack, if the stack is not empty, and the data of that node is then inserted at that acquired position in the buffer. Figure 4.4 illustrates how nodes in a similarly structured quadtree might be updated.

Figure 4.4: Illustration of which nodes are eligible for streaming to the GPU as the camera rotates. The red (striped) nodes are already uploaded to the GPU and will not be updated. The blue (clear) nodes are no longer visible and will be removed from the GPU, with their indices being returned to the index stack on the next render call. The green (circle) nodes become visible and will be uploaded to the GPU. If the node with buffer index 88 is smaller in screen space than a set threshold it will return its LOD cache instead of traversing any further.
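The index bookkeeping can be sketched as below (hypothetical names, reusing the OctreeNode sketch from Section 4.3.1):

#include <stack>

// Hypothetical sketch of the free-spot bookkeeping: the stack holds all free
// chunk slots in the GPU buffer and a node keeps its slot until it is removed.
struct ChunkAllocator {
    std::stack<long> freeSpots; // pre-filled with 0 .. maxChunks-1

    // Returns the node's existing slot, or claims a new one if available.
    long acquire(OctreeNode& node) {
        if (node.bufferIndex != -1) {
            return node.bufferIndex; // already uploaded
        }
        if (freeSpots.empty()) {
            return -1; // buffer full, the node cannot be uploaded this frame
        }
        node.bufferIndex = freeSpots.top();
        freeSpots.pop();
        return node.bufferIndex;
    }

    // Called when a node should no longer be rendered (Section 4.4.3).
    void release(OctreeNode& node) {
        if (node.bufferIndex != -1) {
            freeSpots.push(node.bufferIndex);
            node.bufferIndex = -1; // back to the default value
        }
    }
};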

4.4.2 VBO and SSBO

Two different techniques for updating the buffers have been implemented. One using the standard

Vertex Buffer Object (VBO) and one using the newer Shader Storage Buffer Object (SSBO), which

requires OpenGL 4.3. When using VBOs the render values for the stars are sent as fixed sized at-

tributes while with SSBOs they are sent as variable sized arrays. The main reason why SSBO is used

CHAPTER 4. RENDER 1.7 BILLION STARS 21

instead of the similar Uniform Buffer Object (UBO) is that the size of the latter is limited to 64 KB

or even 16 KB for some GPUs whereas SSBOs guarantees at least 128 MB but is most of the times

only limited on the available video memory on the GPU.

Because the number of stars is different in each leaf node and we cannot calculate the number before a

node should be uploaded we assume that all nodes are of equal size. That way we can easily calculate

the offset for a certain node depending on its index. The data of one node will hereafter be referred to

as a "chunk".

The main difference between VBO and SSBO is that VBOs are unaware of how many stars there are in each chunk. They assume that all chunks are filled and call the vertex shader MaxStarsPerNode × NumberOfChunks times each frame. This also means that we have to fill up the chunks with zeros so that old values are not accidentally rendered.

With SSBOs on the other hand we can keep track of the exact number of stars in each chunk in an

index buffer and send that to the shader as an additional variable-sized array. The shader will then

use a binary-search-based algorithm to find the correct index of the star to process. This means that

the shaders are only called for the exact number of stars that will be rendered and that there is no

need to upload any extra zeros to overwrite old values. Figure 4.5 illustrates how the beginning of the

SSBO buffers would be updated for the camera movement illustrated in Figure 4.4. The index buffer

is updated with a single glBufferData call and will thus not increase the bandwidth significantly.
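The shader-side lookup can be illustrated in C++ as follows (a sketch of the logic only; in OpenSpace it lives in GLSL, and the names are hypothetical). The index buffer stores the accumulated star counts, so a binary search maps a global vertex index to its chunk:

#include <vector>

// Hypothetical C++ sketch of the binary-search-based lookup: accumulated[i]
// holds the total number of stars in chunks 0..i, so the chunk containing
// global star index gid is the first entry greater than gid.
int chunkForStar(const std::vector<int>& accumulated, int gid)
{
    int lo = 0;
    int hi = static_cast<int>(accumulated.size()) - 1;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (accumulated[mid] > gid) {
            hi = mid;      // gid falls in this chunk or an earlier one
        } else {
            lo = mid + 1;  // gid falls in a later chunk
        }
    }
    return lo; // local star index = gid - accumulated[lo - 1] for lo > 0
}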

Figure 4.5: Illustration of how the SSBO buffers are updated in a single draw call. The traversal adds a

node with buffer index 2 and removes the node with index 3. First the index buffer is updated linearly.

The index buffer keeps track of the accumulated sum of stars in the data buffer. The numbers with

new and removed stars are added and propagated through the remaining buffer. Thereafter the data

buffer is updated with the actual new data. The buffer index of the node is used to determine where in

the buffer the data should be written.

There are two ways to update parts of a buffer in OpenGL: glMapBufferRange and glBufferSubData. Both were implemented in OpenSpace with a technique called buffer re-specification, or buffer orphaning, which avoids synchronization stalls between render calls and in turn improves the performance. Both methods worked fine but it turned out that glBufferSubData was the faster option and was therefore the only one kept in the long run.
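Buffer orphaning is a standard OpenGL idiom; a minimal sketch, shown here for a buffer that is rewritten in full (such as the index buffer), could look as follows (assuming an OpenGL 4.3 context and a function loader):

#include <glad/glad.h> // any OpenGL function loader works

// Minimal sketch of buffer re-specification ("orphaning"): re-allocating the
// store with a null pointer lets the driver hand back fresh memory instead of
// stalling on draw calls that still read the old contents. Note that this
// discards the previous contents, so it fits buffers that are fully rewritten.
void respecifyAndUpload(GLuint buffer, GLsizeiptr size, const void* data)
{
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
    glBufferData(GL_SHADER_STORAGE_BUFFER, size, nullptr, GL_STREAM_DRAW); // orphan
    glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, size, data);              // refill
}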


4.4.3 Removing chunks from the buffer

To avoid unnecessary memory usage you also need to remove nodes that should no longer be rendered.

This happens when a node is no longer visible or if a node is switching to or from LOD cache. If a

node is not visible then it and all its descendants will be removed from the buffer, if they existed. If

an inner node should return its LOD cache all its descendants will be removed. If an inner node is

visible and big, we should remove the LOD data of that node and traverse its children instead.

When a node is removed its index is reclaimed by the index stack and the buffer index of the node is

set to default again. The traversal will return an empty vector for that index and the buffer will either

overwrite the previous values with zeros (if using VBOs) or propagate the change in the index buffer

(if using SSBOs) as illustrated with the removal of node 3 in Figure 4.5.

4.4.4 Rebuilding the buffer

After running OpenSpace for a while it is possible that the biggest index used in the buffer is disproportionate to the number of chunks rendered. This can happen if the user has zoomed out to look at a large number of stars from "the outside" only to zoom in to a smaller group of stars again. This disproportion can lead to a decrease in performance, most noticeable when using VBOs because the shaders will be called for a lot of non-existing stars, but it can also have a negative effect on the performance when using SSBOs as the propagation of changes in the index buffer gets longer than it needs to be.

To avoid such occasions a rebuild of the buffer will happen if the biggest chunk index in use is bigger than 4/5 of the maximum stack size while the number of free spots in the stack is bigger than 5/6 of the maximum stack size. This rebuild will rearrange all visible nodes to lower indices and overwrite old values. Such a rebuild will also happen on any dynamic change, such as when switching dataset, render mode or render technique during runtime.
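Expressed in code the trigger is a single condition (a minimal sketch with hypothetical names):

// Hypothetical sketch of the rebuild trigger from Section 4.4.4: a high
// watermark in the buffer combined with a mostly-empty index stack means the
// chunks in use are spread out and should be compacted to lower indices.
bool shouldRebuildBuffer(long biggestIndexInUse, long freeSpots, long maxStackSize)
{
    return biggestIndexInUse > 4 * maxStackSize / 5 &&
           freeSpots > 5 * maxStackSize / 6;
}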

4.4.5 Stream from files

If the dataset is too big to fit in RAM it needs to be loaded dynamically during rendering. To be able

to do that there are two major additions from the previously mentioned techniques, namely finding

out which nodes to fetch and which to remove from memory.

While Schatz et al. [29] used a technique to fetch all children of the 27 closest parent nodes to the camera they only uploaded the 27 closest leaf nodes to the GPU, out of which more than half should not even be visible. Figure 4.6 illustrates which nodes are fetched by default for this project. My implementation similarly finds the 27 closest neighbors (cyan) to the parent node that the camera resides in (red), but the algorithm can also find neighbors on higher levels. By default the 27 nodes are joined by the 26 closest neighbor nodes to the second layer parent (blue) as well as the 26 neighbor nodes to the third layer parent (orange), and all their children will be fetched. This means that by default at least 79 × 8 = 632 nodes around the camera will be fetched on start-up, but only the ones that are visible will be streamed to the GPU. The user can control how many additional surrounding levels of nodes should be fetched.

The neighbors are found with a similar technique to that presented by Samet [28] but makes use of

the Morton code of the nodes to find the correct file. If a neighbor does not exist on the same level the

closest ancestor will be returned instead. If no such ancestor exists, or if the only common ancestor is the root node, then the find function will return false.

When the camera moves to a different parent node new neighbors will be found and the corresponding

data will be fetched from disk asynchronously.

Figure 4.6: Illustration of which nodes will be fetched around the camera initially. The cyan nodes are children of the neighboring nodes on the same level as the inner node (red) that contains the camera. The blue nodes are children of neighboring nodes of the second parent while orange nodes are the same for the third layer of neighboring parent nodes. By default this means that 632 nodes will be fetched around the camera, with the possibility for the user to add more layers.

Similar to another rendering technique in OpenSpace developed by Bladin et al. [10] the application keeps track of the least-recently fetched nodes as well

as how much of the CPU memory budget that is in use. Just as with the streaming budget the user can

define what max percentage to use for the octree out of the computer’s total RAM. When the RAM

usage exceeds a threshold of the set budget the data of least-recently fetched nodes will start to be

removed asynchronously in a separate thread similar to the algorithm presented by Reich et al. [26].

When updating the buffers the only difference from when reading from a single file is that sometimes we want to upload data from an inner node even if it does not pass the usual requirement for sending its LOD cache. One such occasion is if an inner node is loaded but none of its descendants are. To be able to handle that, the nodes keep track of whether any of their descendants are loaded. If those chunks were not uploaded then only loaded leaf nodes would be visible and the result would look abnormally sparse, as only the absolute closest stars would be rendered. If all descendants of the neighboring nodes were always fetched then this would not be a problem. However, to avoid loading too many files at once the condition above was introduced, to be able to browse a big dataset even in the higher levels of the octree without having to load too many files initially.

4.5 Render techniques

The last step of the pipeline is to actually render the stars on the GPU. The previous way of rendering

stars in OpenSpace was direct rendering by using billboard instancing. For this thesis that technique

was extended by a tone-mapping technique that improved the performance as well as using the stars’

luminosity and distance to the observer to determine their physically correct size and brightness. The strength of billboards is that it is easy to change the size of the stars without much effect on the performance. It is also quite fast when rendering sparse datasets, but with larger datasets the performance decreases quickly. Therefore a point splatting technique with a similar tone-mapping was also developed, as well as real-time switching between the two techniques. This section will describe the differences between these techniques in more detail.

4.5.1 Render modes

As previously mentioned there are eight render values per star stored in the octree. These are divided into three different render modes: Static uses only the position of the stars, Color uses position together with magnitude and color, while Motion also adds velocity and thus uses all eight values. The reason behind this is that the user should not be required to upload data to the GPU that is not used. Not all subsets will have correct velocity, for example. This also has consequences for the octree, which stores the data for position, magnitude plus color, and velocity in three separate containers, and for how the data is streamed to the GPU, as both VBO and SSBO have to be able to switch between the render modes during runtime.

The difference in appearance is most noticeable between Static and the others because for Static we assume the luminosity is the same for all stars, while for Color and Motion the equation is derived from

$$\mathit{Magnitude}_{sun} - \mathit{Magnitude}_{star} = 2.5 \log_{10}\!\left(\mathit{Luminosity}_{star} / \mathit{Luminosity}_{sun}\right) \qquad (4.1)$$

which gives Luminosity_star = 10^(1.89 − 0.4 · Magnitude_star) when using the default scientific values of Luminosity_sun = 1.0 and Magnitude_sun = 4.72 [16]. Static also lacks any color while the others get their color from a look-up table depending on the B-V, Bp-Rp, Bp-G or G-Rp photometric bands. Figures 4.7 and 4.8 display the apparent difference. Regardless of render mode the distance between the camera and the star is used to dim each star by applying the inverse-square law.
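As an illustration, the per-star brightness in Color and Motion mode could be computed as below (a minimal sketch with hypothetical names, following Eq 4.1 and the inverse-square dimming):

#include <cmath>

// Hypothetical sketch: Eq 4.1 rewritten as L = 10^(1.89 - 0.4 * M), dimmed by
// the inverse-square law using the camera-to-star distance.
double apparentBrightness(double absMagnitude, double distanceToCamera)
{
    double luminosity = std::pow(10.0, 1.89 - 0.4 * absMagnitude);
    return luminosity / (distanceToCamera * distanceToCamera);
}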

Figure 4.7: The TGAS subset rendered as

Static. Here all stars are assumed to have the

same luminosity.

Figure 4.8: The TGAS subset rendered as

Color. The luminosity is calculated from the

stars’ absolute magnitude.


4.5.2 Billboard instancing

Figure 4.9: Extreme over-exposing effect while rendering billboards.

The usual way to render stars with billboards is to generate a screen-aligned square in the geometry shader and then use a point spread texture in the fragment shader to determine the contribution of a star to each pixel. This approach was already implemented in OpenSpace but was much too slow to be able to handle rendering of large amounts of stars. Therefore it was extended to a two-pass render, with the first render pass storing the accumulated luminosity values of each pixel in a Framebuffer Object (FBO) and the second pass tone-mapping the values to avoid an over-exposing effect (as seen in Figure 4.9).

There are also a couple of properties that the user can control at runtime to highlight different aspects of the dataset. For example, a luminosity multiplier can be used to compensate for traveling further out, or if the user wants to show the extent of the measurements. A "nearby boost" property can also be used to increase the size of close stars, which is especially effective when rendering in stereo. Figure 4.10 shows an example of billboard rendering with an excessive initial size and close-by stars boosted even further.

4.5.3 Point splatting

One of the reasons billboards are slower for dense datasets is that when the geometry of many stars

overlap the alpha blending in the OpenGL pipeline takes a long time to complete. To improve this

a point splatting technique was implemented where only the closest pixel in the FBO will store the

accumulated luminosity of the visible stars. Instead of spreading them out in the geometry shader in

the first render-pass the brighter stars will be spread out in the second pass with a convolutional bloom

filter. A circular Gaussian function was initially used as a sort of procedural point spread function.

Only pixels with a function weight higher than a set threshold will be sampled and contribute to

the final value of the pixel. The user can change the filter size and the σ_radius value of the distribution as properties during runtime. Figure 4.11 shows stars rendered as points with the filter size set to 19 and

sigma to 2.0, which gives unrealistically big stars.

Figure 4.10: Stars rendered as billboards with

an excessive initial size and close-by stars

boosted even further.

Figure 4.11: Stars rendered as points with a

filter size of 19 and sigma of 2.0 which gives

an excessive effect.


4.5.4 Real-time filtering

As an additional feature it is possible to filter the stars in real-time. The filtering is then done in the

vertex shader, but as the vertex shader cannot actually discard any vertices the main function instead

returns early and the star is discarded as soon as possible in the geometry shader. The user can filter

by position, magnitude, color, velocity and/or distance in a similar fashion as with the offline filtering

(Section 4.3.2). The user can set min and max thresholds with two sliders per filter value. If both

thresholds are set to the same value then all stars with the corresponding parameter equal to that value

will be filtered away, demonstrated by Figure 4.12 and Figure 4.13. In the latter all stars without any

measured parallax have been filtered away whereas in the former they are all set to the same distance.

The filtering works the same regardless of render mode and technique.

Figure 4.12: The radial velocity subset

with 7.2 million stars rendered in Static

mode with all stars enabled.

Figure 4.13: The radial velocity subset

rendered in Static mode with all stars

without parallax filtered away.

4.5.5 Render in a dome

Figure 4.14: Darkening effect when rendering in fish-eye mode (i.e. on a curved screen).

Rendering on a flat screen can be quite different from rendering in a dome. OpenSpace makes use of the Simple Graphics Cluster Toolkit (SGCT)5 to sync between multiple projectors and warp the images correctly onto any type of screen. When using a screen space method, such as the bloom filter used for point splatting, one has to compensate for the transformations that are performed by SGCT after the frame has left OpenSpace to be able to render in a dome. When projected onto a curved screen the stars should look round as before, but for them to appear round they have to be rendered as scaled ellipses on the pre-transformed flat image. Otherwise they will appear darker in the transitions between the projectors, as illustrated by the fish-eye render in Figure 4.14.

5 https://c-student.itn.liu.se/wiki/develop:sgct:sgct/

This means that instead of a circular Gaussian distribution function that only scales in σ_radius, we should use a two-dimensional Gaussian function that can scale in two directions, σ_major and σ_minor. That 2D function is given by


$$f(x, y) = A \exp\!\left(-\left(a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2\right)\right) \qquad (4.2)$$

where

$$a = \frac{\cos^2\theta}{2\sigma_{major}^2} + \frac{\sin^2\theta}{2\sigma_{minor}^2}, \qquad (4.3)$$

$$b = -\frac{\sin 2\theta}{4\sigma_{major}^2} + \frac{\sin 2\theta}{4\sigma_{minor}^2}, \qquad (4.4)$$

$$c = \frac{\sin^2\theta}{2\sigma_{major}^2} + \frac{\cos^2\theta}{2\sigma_{minor}^2} \qquad (4.5)$$

and A is the accumulated intensity of previously sampled pixels. θ is the angle to a pixel from the apparent aspect center of the camera. The scale factors for σ_major and σ_minor are calculated according to Figure 4.15, where σ_major is represented by c, σ_minor is represented by b and θ is equal to β, which means that the scale factors are 1/cos²θ and 1/cos θ respectively. The equations are based on the assumption that e ∼ f when the α angle is very small, as is the case for a single pixel.

Figure 4.15: Illustration of how the scale factors for σ_major and σ_minor are calculated.

The filter size and pixel contribution are then scaled by a factor depending on the aspect ratio and field-

of-view of the camera. You also have to consider the difference in size between different projectors

which is why they are also scaled up or down depending on their relation to a default screen size.
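A sketch of the resulting weight function is given below (C++ for illustration; in OpenSpace this logic lives in the screen-space shader, and the scale handling is simplified to the two factors from Figure 4.15):

#include <cmath>

// Hypothetical sketch of the anisotropic Gaussian weight from Eqs 4.2-4.5,
// with the dome-compensation scale factors 1/cos^2(theta) and 1/cos(theta)
// applied to the major and minor axes respectively.
double ellipticalGaussianWeight(double dx, double dy, double theta, double sigma)
{
    double cosT = std::cos(theta);
    double sinT = std::sin(theta);
    double sigmaMajor = sigma / (cosT * cosT);
    double sigmaMinor = sigma / cosT;

    double a = cosT * cosT / (2.0 * sigmaMajor * sigmaMajor)
             + sinT * sinT / (2.0 * sigmaMinor * sigmaMinor);
    double b = -std::sin(2.0 * theta) / (4.0 * sigmaMajor * sigmaMajor)
             +  std::sin(2.0 * theta) / (4.0 * sigmaMinor * sigmaMinor);
    double c = sinT * sinT / (2.0 * sigmaMajor * sigmaMajor)
             + cosT * cosT / (2.0 * sigmaMinor * sigmaMinor);

    // Eq 4.2 without the accumulated intensity A, which scales the result.
    return std::exp(-(a * dx * dx + 2.0 * b * dx * dy + c * dy * dy));
}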

A similar darkening effect will happen when rendering billboards if they are facing the screen instead

of the camera. This can be fixed by rotating all billboards so they are always camera-facing in the

geometry shader in the first render pass. This has also been implemented in OpenSpace.

Chapter 5

Results

All the following measurements were performed on a machine with an Intel Xeon E5-1620, 3.60GHz

CPU with 4 cores (8 logical processors), 40 GB RAM and a Nvidia GeForce GTX 1070 Ti GPU with

8 GB of dedicated memory. When reading files during start-up or streaming data from disk during

runtime the corresponding files were stored on a 425 GB SSD while for the pre-processing tasks the

files were read from and written to a 3 TB HDD.

5.1 Reading from multiple files

Table 5.1 shows the statistics when reading the entire DR2 dataset of 1.7 billion stars. The dataset was

originally stored in 61,234 FITS files ranging from 0.6 to 86 MB in size. The table shows the number

of values read from the original files over the number of values written to the binary files as well as

the maximum number of values an octant would store in RAM before writing the values to disk. It

also shows the range in size of the final files, the number of threads used for the reading, the total time

it took for the task to finish and the calculated speed-up.

R/W values MaxSizeBeforeWrite Size range (MB) Threads HH:MM:SS Speed-up

18/24 48,000,000 12,103 - 30,723 1 07:29:13

” ” ” 2 07:18:36 2.4%

” ” ” 4 07:00:41 6.8%

” ” ” 8 07:07:05 5.2%

” ” ” 16 07:20:02 2.0%

” 9,000,000 ” 1 07:31:31

” ” ” 8 07:20:46

18/18 48,000,000 9,325 - 23,042 ” 08:02:58

18/8 ” 4,024 - 10,241 ” 08:11:13

8/8 ” ” ” 05:11:25

Table 5.1: Statistics while running ReadFitsTask for the full DR2 with 1.7 billion stars.

Table 5.2 shows the same statistics while reading a random subset of 42.9 million stars, with an additional metric for the maximum amount of RAM that was used, if any was recorded. The subset

consists of 3,045 files with sizes between 0.7 and 32.8 MB. The memory metric was read directly

from Windows Task Manager (WTM) which most often makes an overestimate of the requirements.

When compared to another memory profiler WTM displayed up to 50% more memory consumption

(1070 MB versus 700 MB). The reason why no other profiler was used was simply because it took

too long to run a single test.


R/W values MaxSizeBfWrite Size range (MB) Threads MM:SS RAM (MB)

18/24 48,000,000 86 - 1,327 1 14:19

” ” ” 2 13:23

” ” ” 4 13:14

” ” ” 8 12:21 1070

” ” ” 16 14:34

” 9,000,000 ” 8 12:30 220

” ” ” 16 12:29 310

Table 5.2: Statistics while running ReadFitsTask for a random subset from DR2 with 42.9 million stars.

Table 5.3 then shows the same statistics for the official subset of 7.2 million stars that all had measurements of radial velocity. The main difference from the other datasets is that this subset was stored in only 8 files, with sizes of 628.5 MB (7 files) or 141.2 MB.

R/W values MaxSizeBfWrite Size range (MB) Threads MM:SS Speed-up

18/24 48,000,000 60 - 119 1 1:58

” ” ” 2 1:28 34.1%

” ” ” 4 1:29 32.6%

” ” ” 8 1:28 34.1%

Table 5.3: Statistics while running ReadFitsTask for the radial velocity subset from DR2 with 7.2

million stars.

5.2 Construction of the octree

The following tables show statistics from running the ConstructOctreeTask on different datasets. The

tables will highlight different things but all have in common that they read 24 and store 8 values per

star in the octree.

Table 5.4 compares the time when constructing an octree from different datasets. The table shows

which dataset it is, what the maximum number of Stars Per Node is, the initial Distance, the number

of created files, the number of leaf nodes and inner nodes, the depth, if any filtering was applied, how

many stars that were stored in the octree and how long it took to complete the task.

The filtering property is defined as the parallax error for bright stars/dim stars. Bright stars are defined

as G magnitude < 13 (the magnitude scale is inverse, so lower values are brighter). If a dataset is filtered by 90/50 that means that bright stars can have a parallax error up to 0.9 while dim stars are filtered away above 0.5.

Dataset kSPN/Dist #files Leaf/Inner Depth Filter #stars MM:SS

full 150/250 29,793 26,069/3,723 14 no 1,692,880,791 30:30

full 50/1000 145,325 127,170/18,166 17 no 1,692,880,791 1:52:45

full 20/200 1,905 1,667/237 12 90/0 7,196,311 5:57

subset 10/200 18,501 17,599/2,513 14 no 42,871,653 1:37

rv 10/20 3,033 2,654/378 9 no 7,224,631 0:12

Table 5.4: Statistics while running ConstructOctreeTask for three different DR2 datasets.

Table 5.5 shows the difference in memory consumption and time spent when running ConstructOctreeTask for the same dataset and filtering but with differences in the number of threaded writes

for one level then one thread per branch would be started. If two levels were threaded then eight new

threads would take care of each sub-branch as well. On the other end when subdividing a leaf node

it will sort all its stars by their magnitude and keep them as LOD cache. As new stars are inserted

into its descendants they will be checked against the dimmest star in the cache and stored if they are

brighter. If re-sorting is enabled it means that the size of the LOD cache is allowed to grow to twice

or thrice the size of the original max stars per node before resorting the cache, which subsequently

will raise the bar for new stars to be stored in the cache.

kSPN/Dist Filter #stars Thr. write Re-sort RAM (MB) MM:SS

150/250 90/90 1,104,289,901 x1 x2 <21,300 30:49

” ” ” x2 x2 30:52

” ” ” x2 x3 <19,800 30:30

” ” ” no x2 <39,000 42:21

” ” ” no no MAX -

Table 5.5: Statistics while running ConstructOctreeTask for DR2 and filter both bright and dim stars

with a parallax error less than 0.9.

5.3 Rendering

There are several things to consider when it comes to the rendering. There is a difference if we

are rendering a small dataset that fits in RAM or a large dataset that has to be streamed from disk.

Then there are two different buffer techniques (VBO and SSBO), two different render techniques

(billboard instancing and point splatting) and three render modes (Static, Color and Motion). Finally

the performance depends on the resolution, camera motion, camera aspect transforms (i.e. the need

for scaling), the size of the octree, the size of the stars and how many stars are overlapping. The next few figures and tables will break these factors down into separate parts.

The radial velocity dataset with 7.2 million stars has a size of 335 MB after the preprocess and can

fit in memory on all modern laptop and desktop computers which is why it will serve as a measuring

rod for the first test. Figure 5.1 and 5.2 display the performance in render time (ms) for all render

modes and render techniques on two different resolutions. The SSBO_noFBO bars represent the old

rendering technique, when rendering billboards without using a FBO, but are using the new SSBO

buffer objects. The point rendering used a filter size of 5 while the billboards have an initial size of 1

with a close-up boost at 200. Most measurements are taken when using a static camera close to the

Sun with 1.1 million stars visible. That is true for all but the last two measurements in Figure 5.2,

where the first is looking at the entire dataset (7.2 million) with a static camera and the second is

spinning with constant speed at a distance of 20 pc from the Sun with at least 2 million stars visible

at any given time. No LOD optimization was used for these measurements and the octree was built

with a maximum of 10,000 stars in a node and an initial extent of 20 kpc. The displayed time is the

average time it takes to render a frame after the time has stabilized (i.e. after all nodes are updated),

except for when spinning the camera when it is the mean render time of one full rotation cycle.

Figure 5.3 shows the performance when rendering a static camera with the same radial velocity dataset

in Color mode on a 1920x1200 flat screen, 2 million stars visible. Point_SSBO is used and the only

things that are changed are the size of the screen space filter and if any scaling is performed on the

filter or not.

Then there are the datasets that cannot fit in RAM. Table 5.6 shows performance for the different

render techniques on a dataset with 618 million stars (50/50 filter) while Table 5.7 illustrates the amount of stars that can be rendered at once in OpenSpace. All rendering was done in Color mode on a flat 1920x1200 screen. The filter size was set to 5 with filter scaling enabled. The #stars property displays how many stars are actually visible on screen. When spinning the camera the metrics show the range between minimum and maximum values.

Figure 5.1: Render statistics with 1.1 million stars visible and a screen resolution of 1280x720.

Method #stars Placement Render time (ms)

Point_SSBO 11M inside 23.7

” 11-14M spinning (100pc) 26.3-29.2

”(no scaling) ” ” 24.7-28.2

Point_VBO 11M inside 24.3

” 11-14M spinning (100pc) 32-34

Billboard_SSBO 11M inside 42.3

” 11-14M spinning (100pc) 49-57

Billboard_VBO 11M inside 45.9

” 11-14M spinning (100pc) 78-90

Table 5.6: Performance while rendering a subset with 618 million stars on a flat screen with

1920x1200 resolution.

5.4 NYC Gaia Sprint 2018

On June 4 through 8 an event called the Gaia Sprint 2018 was held in New York City at the Center

for Computational Astrophysics (CCA). About 90 astronomers worked together for a week to try to

find new knowledge in Gaia DR2. As a part of that sprint an external event was planned at AMNH to

show parts of the release in the Hayden Planetarium for the participants of the sprint.

I was invited to be a part of this event and after the organizers had seen the capabilities of OpenSpace

they decided to use it for the entire show. About 17 new subsets were processed, some of which had

been prepared and given to us only a few days before the show.


Figure 5.2: Render statistics for the radial velocity dataset with a screen resolution of 1920x1200.

Dataset #stars Placement Performance (fps / ms)

618M 36M inside / 43.7

” 46M - / 52.8

” 51M - / 52.9

” 61M - / 61.3

1.1B 15M inside / 39

” 82M - / 91

” 100M - 11 / 107

1.7B 33M outside / 30.4

” 63M ” 18 / 55.4

” 66M inside 17 / 66

” 50-70M spinning (20pc) 13-16 / 58-90

Table 5.7: Statistics while rendering different large datasets on a flat 1920x1200 screen.

To be able to show different abundance data with color maps, three of the smaller datasets were rendered with the old technique but all the others were rendered with the technique presented in this report.

The show had over 100 attendees and apart from showing the smaller subsets given to us we also

turned time on for the entire radial velocity dataset with 7.2 million stars as well as rendering a

dataset of 920 million stars in the dome. Figure 5.4 is a photograph taken at the end of the show.


Figure 5.3: Performance for different filter sizes with and without scaling of the filter enabled. 2

million stars visible on a flat screen with 1920x1200 resolution.

Figure 5.4: Photograph taken at the Gaia Sprint show at AMNH.

Source: Photo credit Matt Shanley @ AMNH

Chapter 6

Analysis and Discussion

The results show that the performance has been improved in several departments. Due to the late

release of the actual dataset more work was put into the buffer updates and rendering techniques than

it may have been otherwise. This meant both a better performance for small to medium datasets but also that the streaming itself may not have been given enough implementation time to fulfill its potential.

The following sections will continue to analyze the results and discuss possible improvements.

6.1 Method

One challenge of the project has been that the real data was not released until the supposed end of the

implementation phase. Therefore a smaller dataset, the TGAS subset of 2 million stars, was used as

a reference for most of the implementation while the end goal was to use a dataset that was 1,000 times bigger. Too late I was made aware that DR1 had been released with 1.1 billion stars as well

but with the majority lacking parallax measurements. That dataset could have been used to test the

streaming of files in an earlier phase.

Another mistake I made was to assume that DR2 would be released as FITS files, as DR1 and the

TGAS subsets had been, while it was actually only released as CSV files. The consequence of this mistake was that it took over a week after the release before I could actually process the data as I had prepared to. Then it turned out that CCfits could not have more than one I/O driver at a time, which means that a parallelized CSV reader might actually have been faster in the long run. It is difficult

to say for sure before it has been implemented and tested, which I did not have time for in the scope

of this thesis. However, even though the end goal of rendering the full dataset was pushed back by

starting small that route was still useful as it meant that I focused on optimizing the algorithms for

buffer updates and rendering more than I maybe would have otherwise, which later had a positive

impact on the performance of both small and large datasets. Most astronomers want to work with relatively small datasets in the end anyway.

One thing that I might have spent too much time on was the refactoring of old code, to keep the

backward compatibility of things I had previously implemented. Some of the techniques that were

important in the beginning, such as the different render modes, might not be as interesting for the end

user when everything else is in place. At the same time it gives more flexibility to the application.

This thesis project has been more implementation-focused than research-focused, which was great

for the implementation goals but maybe less so when comparing to other visualization research. Few

of the techniques are brand new even though the combination of them might be. The techniques that

may actually be new are using SSBOs with binary search to limit the shader calls, keeping track of which nodes have already been uploaded to the GPU, and compensating the rendering for a curved screen when using a screen-space filter on points.

The reason why it was very implementation-focused was that the entire pipeline had to be implemented before any large quantity of stars could be rendered at all. Because of that it would not have been feasible to concentrate the study on a narrow part of the pipeline. With the pipeline foundation now in place another student could basically continue my work and do a whole new thesis with focus on improving the streaming of files, the rendering aspects and/or comparing other techniques to the ones that are already implemented.

6.2 Implementation

In hindsight there are a couple of changes that might have improved the performance even more but

because of other priorities and the time constraint there has been no comparison to any of these.

For example, one of the more costly operations in a shader is the texture look-up. Therefore it would

probably be better to use procedural functions when rendering billboards instead of the point spread

texture and color look-up table. A similar function as the one used in the bloom filter might have

worked well.

There are also some fundamental issues with the octree. The octree used is a pointer-based octree

which has advantages in flexibility and more reader-friendly code. However, it was not really utilized

as I first had planned which means that a block representation or an implicit node representation [15]

would have taken up less space and probably would have been faster in the traversals. That would

bring up other problems when using multiple threads or when finding neighbor nodes for example, as

well as making the code less readable.

It would also have been interesting to try an adaptive octree structure. One could for example subdivide a node where it had the highest concentration of stars. That would require more memory as more

information about the AABB would need to be stored. It also has an intrinsic flaw because the stars

may be ordered in the binary files by when they were measured by Gaia, which means that where the

highest concentration is when a node is subdivided is not always where the highest concentration will

be after all stars have been read. If the number of nodes, and therefore the number of total files, could

be reduced it may still be worth it. It may also be worth looking into more bottom-up construction

techniques as a comparison.

Another improvement would be to use anti-aliasing when rendering points. Right now the point rendering can have a "jitter" effect when moving the camera slowly, especially in a dome, as one can

see the stars move from being based in one pixel to another. This could be reduced by rendering to

a higher-resolution FBO but the intrinsic problem would still be the same. Another idea is to divide

the luminosity of a star into the closest four pixels with a weight depending on the exact screen space

location of the star. That way it may be possible to have a smooth movement even for small distances.

6.3 Results

The presented results divided the reading from disk, the construction of the octree and the rendering into separate parts. No measurements for the single file pipeline have been made as it was insignificant

compared to the bigger datasets. The results show that it is quite easy to generate multiple subsets

once the full dataset has been read. When rendering there are also several techniques to choose from

depending on what the users want to highlight.


ReadFitsTask

As can be seen in Tables 5.1 and 5.2 the speed-ups when threading the ReadFitsTask are marginal

when the file sizes of the raw data are relatively small. This implies that the disk reads are the main

bottleneck and not the processing of the stars. According to Table 5.3 the speed-up has a relatively

greater impact for larger files. A parallelized CSV reader may still perform better as it does not have the same inherent problem with only one I/O driver. However, as this process is meant to be done only

once for any dataset it may not be a prioritized change.

When comparing to other projects the preprocessing time is not bad. Reich et al. [26] used 35 hours to preprocess one timestep of the Millennium Run (225 GB) while Fraedrich et al. [13] used 5 hours and 20 minutes for the same dataset. In comparison, my 7 hours for 1.2 TB of data is quite affordable.

The property for max values before writing determines the max usage of RAM more than the total

time of the process. One surprising thing in Table 5.1 is that it takes longer to read 18 values than

24 values from the original files. That could depend on the order the columns were read in, or on

some strange bug. Most of these timings were only run once, which really is a weak measurement as the results sometimes can differ quite a lot between runs depending on whether the computer is occupied with something else simultaneously.

ConstructOctreeTask

The biggest takeaway from constructing the octree is that you need to re-sort the LOD cache at some point, otherwise the RAM usage can hit its maximum and lock the entire computer, as illustrated in Table 5.5. The process also gets significantly faster when using threaded writes; however, one level of threading is enough. If

also gets significantly faster when using threaded writes, however one level of threading is enough. If

the computer has enough RAM it would be interesting to see how fast the construction can get if all

eight binary files were read simultaneously. The computer in this project did not have enough memory

budget to try such a thing.

The biggest impact on how long it took to generate the octree was how many files should be written to disk. The number of files depends upon the size of the initial dataset, the max number of

stars per node, the initial distance and if any filtering was used or not. When most stars were filtered

away the construction decreased to around 6 minutes while it increased to almost 2 hours when writing

145,000 files as in Table 5.4.

Rendering

In the results it is apparent that SSBOs are faster than VBOs and that points are generally faster than billboards. It is interesting to notice, however, that billboards get faster on a bigger screen when rendering a small dataset. It is partly because the filter sizes of the point rendering get scaled more but foremost because the stars are quite scattered. Billboards perform much worse when many stars overlap, as illustrated by Figure 5.2 when looking at the full dataset. When spinning the camera the difference between the buffer techniques also gets more significant. The difference scales with the rotation speed as SSBOs can handle constant updates much faster than VBOs. Both figures show that a two-pass render is much faster than not using any FBO. If the single-pass method had not been extended to use SSBOs it would probably finish dead last in all measurements. Right now it is quite fast with updates but cannot handle large datasets at all.

As can be seen in Figure 5.3 the scaling of the screen space filter affects the performance quite a lot.

Scaling the filter is a must when rendering on a curved screen but if one is using a flat screen and likes

to have all stars equally round and as a bonus get a better performance there is really no need to use

any filter scaling. That would also mean a much better performance on higher-resolution screens as well. Manual scaling of billboards also affects the performance, but not as much as long as the stars do not overlap too much.

Finally there are the big datasets. Points are definitely faster and SSBOs are still preferred over VBOs,

especially when the camera moves around as in Table 5.6. This is due to fewer draw calls and to less data having to be written to the buffers, as VBOs have to fill up all the free space in the chunks with

zeros. It is interesting that it is actually faster to do a binary search for every star in every frame than

to update the buffers too often. The drawback is that SSBOs do not work for Apple products so it is

important to keep both methods up to date.
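
The exact lookup is not reproduced here, but the idea can be sketched as follows with hypothetical names: given a sorted list of chunk start offsets, each star's chunk is found in logarithmic time instead of rewriting the buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: chunkStartOffsets is assumed non-empty, sorted and
// starting at 0. Each lookup is O(log n), which the results suggest is cheaper
// per frame than updating the buffers too often.
std::size_t chunkForStar(const std::vector<std::size_t>& chunkStartOffsets,
                         std::size_t starIndex) {
    auto it = std::upper_bound(chunkStartOffsets.begin(), chunkStartOffsets.end(),
                               starIndex);
    return static_cast<std::size_t>(it - chunkStartOffsets.begin()) - 1;
}
```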

Figure 6.1: Block structure that can occur when storing the brightest stars as LOD cache.

In Table 5.7 we can see that it is possible to render up to 100 million stars on screen while keeping the frame rate above at least 10 frames per second (FPS). It is also possible to run the complete DR2 dataset of almost 1.7 billion stars and fly from outside the Milky Way to the Sun, with only a few drops in performance when new nodes start to load. It is, however, worth noting that the result may not look as expected until the camera is closer to home. Figure 6.1 illustrates the block structure that comes from storing the brightest stars as LOD cache. In most nodes the brightest stars will be those closer to the measuring instrument, because better measurements can be obtained for them, which in turn means that all higher levels will store the same stars closest to Earth. A few ideas for improving this are mentioned in future work.
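
The cache policy itself can be sketched as follows; this is a simplified stand-in for the actual construction code, assuming lower magnitude means brighter.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Star {
    float magnitude; // lower value = brighter star
    // position, color, velocity ...
};

// Simplified LOD-cache policy: an inner node caches the n brightest stars of
// its subtree. Because nearby stars tend to have the brightest measurements,
// every level caches much the same near-Earth stars, which produces the block
// structure in Figure 6.1.
std::vector<Star> brightestStars(std::vector<Star> stars, std::size_t n) {
    n = std::min(n, stars.size());
    std::partial_sort(stars.begin(),
                      stars.begin() + static_cast<std::ptrdiff_t>(n),
                      stars.end(),
                      [](const Star& a, const Star& b) {
                          return a.magnitude < b.magnitude;
                      });
    stars.resize(n);
    return stars;
}
```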

Most scientists are not interested in rendering the full release anyway, as many of the stars are missing measurements or have large estimated errors. Figure 6.2 highlights some of the artifacts that are visible when rendering the full DR2 dataset from a viewpoint close to the Sun. These artifacts will diminish with each coming Gaia release, as the measurement errors will shrink for every release.

Figure 6.2: Highlighting the visual artifacts when rendering the full DR2 dataset of 1.7 billion stars.


6.4 Software comparison

As mentioned in Section 2.3, the most similar software on the market is Gaia Sky, which has focused on visualizing the Gaia data since the launch of the project. A second release of Gaia Sky went public on the same day as DR2, with six different prepared subsets of the data ready for download.

Gaia Sky has several features that do not yet exist in OpenSpace. It already has good screen-space selection, which is why I did not try to implement anything similar in OpenSpace; it also runs very smoothly regardless of the size of the dataset and is easy to install and run.

Gaia Sky does not display the number of frames per second or the number of stars rendered, so it is difficult to compare numbers, but my analysis is that OpenSpace can render more stars simultaneously and has smoother navigation than Gaia Sky. Figures 6.3 and 6.4 show the largest dataset released for Gaia Sky, with 601 million stars, with and without the background image of the Milky Way. The settings have been adjusted to boost the brightness of the stars to the maximum. Figures 6.5 and 6.6 show the same view for a dataset of about 618 million stars in OpenSpace, of which 61 million stars are actually rendered on screen. To be fair, the settings have been adjusted to boost dimmer stars without saturating the brighter ones (i.e., not the realistic rendering Gaia Sky is aiming for). It is, however, possible to change those settings at runtime to highlight the properties of interest, and the displayed brightness is far from the maximum. As a side note, the color values are Bp-Rp, but the color look-up table is normalized for the older B-V values, which may make the colors somewhat exaggerated in OpenSpace. This is a known problem, and researchers at AMNH are working on a correct look-up table for the new ranges.

Figure 6.3: Gaia Sky running their largest dataset with a background image of the Milky Way, with maximum star brightness used.

Figure 6.4: Gaia Sky running their largest dataset without a background image of the Milky Way, with maximum star brightness used.

Figure 6.5: 61 million visible stars rendered in OpenSpace with a background image of the Milky Way.

Figure 6.6: 61 million visible stars rendered in OpenSpace without a background image of the Milky Way.


6.5 Source criticism

The majority of the theory is based on quite recent technical papers, while many of the implementation techniques were inspired by online sources. All the included papers have already been cited numerous times, and the online sources used have all been checked for credibility. However, all of the cited papers focus on large-scale astrophysical visualization; there may be additional relevant papers in other fields that I did not find in my searches. It would, for example, be interesting to try a large-scale molecular approach to particle rendering, as that field works more with different representations of the data. Such an implementation could be effective for public outreach, but as astronomers are more interested in seeing the real data, it may not be as relevant when developing a research tool.

6.6 The work in a wider context

As the software is open source and the Gaia data is openly released, anybody can download the software and use it to visualize the cosmos with the released data. A couple of the prepared datasets are also uploaded online and can be downloaded from within OpenSpace. With a moderate desktop computer, any user can fly through the full dataset and look at the stars where they actually exist. One important thing to remember is the uncertainty of the measurements, which is not yet visualized in OpenSpace; otherwise it is easy to be confused by the strange artifacts that may appear (as shown in Figure 6.2). It is possible to filter away stars with high uncertainties, but the errors are also a good way to show how exact (or inexact) real research actually is.

I would argue that the thesis project has no negative impact on any particular group. For users with color blindness, the color table can be changed to a more appropriate range at runtime if needed. If anything, the application provides more access to current research and real data than is common today.

Chapter 7

Conclusion

At the start of the project, nine implementation goals were set up and five research questions were asked. Now that the project has finished, I can conclude that all of the goals have more or less been fulfilled and that all of the questions have been answered. Based on the results of the project and the response from the attendees at the AMNH event, I consider this thesis successful.

The implementation goals were divided into two groups depending on the objective: research-focused or public outreach. The following sections briefly summarize how the different goals were achieved.

Implementation goals for the public outreach objective

The first goal was to visualize the Gaia mission and how its instruments measure the stars. As explained in Section 3.2, this has been visualized in OpenSpace with an open-source model of the spacecraft, rendered together with trail lines of the accurate trajectory of the entire mission. The model is rotated away from the Earth, as it should be, but does not currently rotate around its own axis. Exactly how the instruments on the spacecraft measure the stars is not visualized either, which makes this the sole goal that was only partly fulfilled.

The next goal concerned rendering the stars physically correctly at their measured positions in 3D. The real position can be calculated from the released values or, if the user has already processed the data, it is possible to render custom subsets. The brightness and size of the stars depend on their apparent magnitude, which is calculated from the absolute magnitude and dimmed by the inverse-square law (see the sketch below). The stars can be manually brightened, and different aspects of the data can be highlighted through a number of properties. If one wants to show the full extent of the dataset, for example, it is easier to do so with the Static render mode, as all stars are then assumed to have the same magnitude. With Color and Motion, the measured magnitude is used instead, which gives a more realistic rendering. It is possible to switch render mode at runtime, as one implementation goal requested.
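
The relation used is the standard magnitude formula; a worked sketch with distance in parsec:

```cpp
#include <cmath>

// m = M + 5 * (log10(d) - 1), with d in parsec. The 5 * log10(d / 10 pc) term
// is the inverse-square dimming expressed on the logarithmic magnitude scale.
double apparentMagnitude(double absoluteMagnitude, double distanceParsec) {
    return absoluteMagnitude + 5.0 * (std::log10(distanceParsec) - 1.0);
}
```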

It is also possible to render the new stars together with prior knowledge, such as other galaxies, exoplanets, quasars and the constellations, to provide context for what the user is seeing. The user can also render the old Hipparcos stars simultaneously with the stars measured by Gaia to compare the datasets.

Implementation goals for the research-focused objective

While it is interesting to show the data to the public, researchers want to interact more with the data and use their own subsets. As shown in the results, it is possible to render both small and big datasets at interactive frame rates. With the implemented streaming technique it does not really matter how big the dataset is, as long as the total number of nodes is kept at a reasonable level to keep down the traversal time of the octree. It is also possible to change to another dataset of the same format at runtime. The accepted formats are FITS, SPECK, raw binary, binary octree and streaming octree from disk. If OpenSpace was started with a FITS file, it is possible to switch to other FITS files at runtime. The other subsets may have been created in third-party software, as requested. However, to switch between a binary octree and streaming from disk, two different render objects have to be created in the scene before starting OpenSpace.

This makes it easy for astronomers to load their favorite subsets or to create a new one within the OpenSpace pipeline. After reading the raw data of the full DR2 release, which takes about 7 hours and only has to be done once, creating a new subset usually takes less than 30 minutes. When creating a new subset, the user can filter stars before they are inserted into the octree. If the user instead wants to show the difference live, filtering can also be applied at runtime. For the offline filtering one could in theory filter by all values in the release, given enough RAM to handle the file size. The default is 24 values, which include most photometric properties and their estimated errors. At runtime the user can only filter by the eight render values plus the distance from the Sun.

By default a "realistic" rendering approach is applied, but if the objective is to highlight differences in the dataset without filtering, the user can change the displayed size, brightness and/or color of the stars to make them easier to classify. It is also possible to make all stars with a proper-motion measurement move. If radial velocity exists as well, the star moves with its combined space velocity; otherwise it moves with its transverse velocity. However, even the combined space velocity is only realistic for a brief span of time, because galactic rotation is omitted from the calculations.
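
The standard conversions behind this behavior, where the factor 4.74 turns a proper motion in arcseconds per year at a distance in parsec into kilometers per second:

```cpp
#include <cmath>

// Transverse velocity from proper motion: v_t [km/s] = 4.74 * mu [arcsec/yr] * d [pc].
double transverseVelocityKms(double properMotionArcsecPerYear, double distanceParsec) {
    return 4.74 * properMotionArcsecPerYear * distanceParsec;
}

// With a radial velocity available, the two perpendicular components combine
// into the star's space velocity by Pythagoras.
double spaceVelocityKms(double transverseKms, double radialKms) {
    return std::hypot(transverseKms, radialKms);
}
```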

Research questions

An out-of-core rendering technique enabled streaming data dynamically to the CPU by loading the stars closest to the camera asynchronously at runtime. The application keeps track of how much RAM is still available, as well as which nodes should be unloaded when the memory budget starts to decline.
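
A minimal sketch of such a budget. Least-recently-used eviction is an assumption here, as the exact eviction order is not spelled out in this summary; the idea of unloading nodes once the budget declines is the same.

```cpp
#include <cstddef>
#include <list>

struct NodeHandle {
    std::size_t bytes; // RAM held by this loaded node's star data
    // id or pointer to the octree node ...
};

// Tracks how much of the RAM budget is used and evicts the least recently
// used nodes once the budget is exceeded.
class StreamingBudget {
public:
    explicit StreamingBudget(std::size_t budgetBytes) : budget_(budgetBytes) {}

    void onNodeLoaded(const NodeHandle& node) {
        used_ += node.bytes;
        loaded_.push_front(node); // front = most recently used
        while (used_ > budget_ && !loaded_.empty()) {
            used_ -= loaded_.back().bytes; // unload least recently used node
            loaded_.pop_back();
        }
    }

private:
    std::size_t budget_;
    std::size_t used_ = 0;
    std::list<NodeHandle> loaded_;
};
```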

Current research suggested that the best way to structure the data is a variant of an octree. I found that a pointer-based octree gives great flexibility when deciding which nodes to upload to the GPU while keeping track of which nodes are already uploaded. The spatially structured grid also helps determine which nodes should be loaded from disk while streaming.
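
A minimal sketch of such a node; the field names are hypothetical and not the actual OpenSpace structure.

```cpp
#include <array>
#include <memory>
#include <vector>

// Pointer-based octree node: explicit child pointers make it easy to decide
// per node what to stream from disk or upload to the GPU, while the flags
// remember the current state of each node.
struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children; // nullptr for leaves
    std::vector<float> starData; // packed render values for this node
    bool isLoaded = false;       // star data streamed from disk into RAM
    bool isOnGpu = false;        // star data uploaded to a buffer object
};
```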

The rendering results (Section 5.3) show that the best performance for big datasets is achieved with point splatting and SSBOs as buffer objects. Therefore, if the objective is to render as many stars as possible, that is the technique to use. If the objective is instead to improve readability, then billboard instancing together with SSBOs may be the best alternative, because billboards can be radically enlarged more easily than points without a drastic performance drop. As mentioned before, the best way to classify stars without filtering is to increase their size or change their color. If the dataset is relatively big, point splatting is still the better alternative due to the performance gain. On Apple products SSBOs are not available and VBOs have to be used instead; this unfortunately cannot be improved until Apple updates its OpenGL support.

The final question asked what tools OpenSpace can offer astronomers that are not already available. The initial answer is that OpenSpace can render more stars on screen than any other available software. That is not something I can prove, as most papers only state the original size of the dataset they are running, not how many stars or particles are actually rendered on screen at a given time; many papers do not even include pictures of their final results. Of those that publish performance numbers, this implementation is right at the top with regard to the hardware the measurements were run on. To the best of my knowledge, the AMNH event was also the first time a billion (or 0.92 billion) stars/particles were run publicly on a dome cluster.

However, the main contribution of this thesis may be that OpenSpace can now offer a streamlined data pipeline, where users can easily insert their own datasets and show them together with previous knowledge of the universe. With the many menu properties, the user can also tweak the rendering until the performance is up to standard or the stars look as desired.

7.1 Future work

Even though a lot has been implemented in this project, there are still improvements to be made and new functionality to add. For example, the LOD cache needs to be improved. Right now the brightest stars are stored as duplicates in the inner nodes as LOD cache, which means that OpenSpace requires more storage and more RAM when constructing the octree. Avoiding duplicates would, however, require re-sorting the stars after every new insertion, which would bring the total construction time from about half an hour up to probably more than a day. An even better approach would be to calculate the total luminosity of the stars in all descendants and store only a cluster representation in the inner nodes (sketched below); that way the transitions between levels would also be less apparent. Such a solution would help show the entire dataset to the public, but many astronomers would likely not be interested in such a simplification of the data.
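
A sketch of the magnitude arithmetic such a cluster representation would need: magnitudes are logarithmic, so they must be combined via fluxes rather than averaged.

```cpp
#include <cmath>
#include <vector>

// One representative magnitude for all descendant stars: convert each
// magnitude to a relative flux, sum the fluxes, and convert back.
double combinedMagnitude(const std::vector<double>& descendantMagnitudes) {
    double fluxSum = 0.0;
    for (double m : descendantMagnitudes) {
        fluxSum += std::pow(10.0, -0.4 * m); // magnitude -> relative flux
    }
    return -2.5 * std::log10(fluxSum);       // total flux -> magnitude
}
```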

The next step would be to implement selection, preferably both by screen-space picking and by selecting a 3D region to get information about the selected stars. I did not focus on this at all because Gaia Sky already does screen-space picking quite well, but it is a good feature that OpenSpace should have. Once selection is implemented, it also becomes possible to measure things, which the astronomers in Vienna requested.

Another idea is to combine the two rendering techniques so that billboards are used for close stars while points are used for stars in the background. This would make it possible to increase the size of close stars drastically without losing too much performance. Such an arrangement would be preferable foremost when rendering in stereo.

Integrating more third-party tools, such as GlueViz, TopCAT and Virtual Observatory, would also help attract more astronomers to OpenSpace, as would rendering uncertainties, displaying more metadata about the stars and letting the user decide which values to use during rendering. It would, for example, be great if the user could switch between different colors and magnitudes at runtime.

A personal favorite of mine would be to implement a sequence where the stars are added to the night sky as the Gaia instrument rotates around its own axis. A similar sequence already exists in OpenSpace for the New Horizons mission, when the instrument takes images of Pluto, but it may be somewhat more difficult to implement for Gaia, especially because the measurements of all stars are based on several different observations over a long time period.

Another important addition would be to incorporate galactic rotation and gravity into the velocities; right now the velocities are only instantaneous vectors, which is quite unrealistic. One astronomer also asked for trail lines for the stars. Neither addition should be too difficult to incorporate into the current pipeline, and both would increase the value of OpenSpace as a research tool.
