cl02_data_abstraction.pdf


Transcript of cl02_data_abstraction.pdf

  • Data abstraction: What?

  • [Figure: nested What? / Why? / How? diagram]

  • Dataset types

    Trees and networks: items (nodes), links, attributes

    Tables: items, attributes

    Fields: grids, positions, attributes

    Geometry: items, positions

    Clusters, sets, lists: items

    Static vs. dynamic

  • Attribute types

    Categorical

    Ordered: ordinal, quantitative

    Sequential, diverging, cyclic

  • Data semantics

    The semantics of the data is its real-world meaning. The type of the data is its structural or mathematical interpretation:

    At the data level: is it an item, a link or an attribute?

    At the dataset level: how are these data types combined to create a table, a tree or a field?

    At the attribute level: what kind of mathematical operations are meaningful?

    Sometimes semantics can be inferred from the syntax of the data file or from the names of the variables included, but more often they must be provided explicitly along with the dataset so that it can be interpreted properly: the metadata.
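
    As a rough illustration of why metadata matters at the attribute level, the sketch below attaches an attribute type to each attribute name and uses it to decide whether an operation is meaningful. The attribute names, the `MEANINGFUL_OPS` table and the `is_meaningful` helper are all hypothetical choices made for this example, not part of any library.

```python
# Hypothetical metadata shipped with a dataset: attribute name -> attribute type.
metadata = {
    "fruit": "categorical",
    "shirt_size": "ordinal",
    "temperature": "quantitative",
}

# Assumed mapping from attribute type to the operations that make sense for it.
# Each level keeps the operations of the previous one and adds more.
MEANINGFUL_OPS = {
    "categorical": {"equals"},
    "ordinal": {"equals", "compare", "median"},
    "quantitative": {"equals", "compare", "median", "mean", "difference"},
}


def is_meaningful(attribute: str, operation: str) -> bool:
    """Return True if the operation is meaningful for the attribute's type."""
    return operation in MEANINGFUL_OPS[metadata[attribute]]


if __name__ == "__main__":
    print(is_meaningful("temperature", "mean"))  # averaging a quantity
    print(is_meaningful("shirt_size", "mean"))   # averaging S/M/L labels
```

    Without the metadata, a file storing shirt sizes as 1, 2, 3 would look quantitative, and nothing would stop us from averaging them.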

  • Basic data types

    Attributes, items, links, grids, positions

  • Basic data types: attributes

    An attribute is a specific property that can be measured, observed or recorded.

    Synonyms: variable, dimension.

    Examples: salary, price, number of sales, number of neurons, temperature.

  • Basic data types: items and links

    An item is an individual entity that is discrete. Examples: a row in a simple table or a node in a network. A link is a relationship between items, usually within a network.

  • Basic data types: grids and positions

    A grid defines a strategy for sampling continuous data in terms of geometric and topological relationships between its cells.

    A position is spatial data, typically in 2D or 3D.

    Examples: latitude and longitude coordinates that determine a location on the Earth's surface; three numbers that define a location within the volume measured by a medical scanner.

  • Dataset types

    A dataset is a collection of information that is the subject of analysis. The four basic dataset types are:

    Tables, networks, fields, geometry

    In practice, real-world datasets are often complex combinations of these four basic types.

    Other ways of grouping items are clusters, sets and lists.

    In addition, a dataset may be fully available immediately from a static file, or it may be dynamic data processed gradually in the form of a stream.

  • Dataset types: tables

    A table is a set of items structured in rows and columns.

    Each row maps to an item.

    Each column maps to an attribute.

    Each cell is defined by a pair (row, column) and contains a value for that pair.
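
    The row/column/cell vocabulary above can be sketched in a few lines of Python. The column names and values here are invented for illustration, and `cell` and `attribute` are hypothetical helpers rather than functions from an existing package.

```python
# Column names: one per attribute (hypothetical example data).
columns = ["name", "salary", "temperature"]

# Each inner list is one row, i.e. one item.
table = [
    ["Ann", 1200, 21.5],
    ["Bob", 1500, 19.0],
    ["Eve", 1350, 22.0],
]


def cell(table, columns, row, column):
    """Value stored at the (row, column) pair; column is given by name."""
    return table[row][columns.index(column)]


def attribute(table, columns, column):
    """All values of one attribute, one per item."""
    position = columns.index(column)
    return [item[position] for item in table]


if __name__ == "__main__":
    print(cell(table, columns, 1, "salary"))
    print(attribute(table, columns, "name"))
```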

  • Dataset types: tables

    Point Position X  Point Position Y  Point Position Z  Unit  Name  Time  Object                Group          Index
    150.2389984       100.3249969       122.8769989       um    A     1     Measurement Points 4  Surpass Scene  1
    150.2689972       100.3529968       122.9680023       um    B     1     Measurement Points 4  Surpass Scene  2
    150.8320007       99.69300079       123.0920029       um    C     1     Measurement Points 4  Surpass Scene  3
    150.8470001       99.71900177       122.9970016       um    D     1     Measurement Points 4  Surpass Scene  4
    152.9389954       99.30999756       123.8359985       um    E     1     Measurement Points 4  Surpass Scene  5
    152.9309998       99.3239975        123.9349976       um    F     1     Measurement Points 4  Surpass Scene  6
    153.3959961       97.98100281       121.0680008       um    G     1     Measurement Points 4  Surpass Scene  7
    153.4340057       98.01799774       121.1529999       um    H     1     Measurement Points 4  Surpass Scene  8
    153.9600067       97.41999817       120.9160004       um    I     1     Measurement Points 4  Surpass Scene  9
    153.9940033       97.45899963       121.0019989       um    J     1     Measurement Points 4  Surpass Scene  10
    154.5910034       96.68399811       121.8769989       um    K     1     Measurement Points 4  Surpass Scene  11
    154.6000061       96.68699646       121.9769974       um    L     1     Measurement Points 4  Surpass Scene  12
    154.9589996       100.0250015       119.9179993       um    O     1     Measurement Points 4  Surpass Scene  13
    154.947998        100.0240021       120.0169983       um    P     1     Measurement Points 4  Surpass Scene  14
    154.822998        101.8209991       122.7170029       um    O     1     Measurement Points 4  Surpass Scene  15
    154.7890015       101.8010025       122.8089981       um    P     1     Measurement Points 4  Surpass Scene  16
    155.8009949       100.2139969       120.6389999       um    Q     1     Measurement Points 4  Surpass Scene  17

    Each column is an attribute; each row is an item.

  • Dataset types: networks and trees

    Networks (graphs) are used to define relationships between two or more items.

    A network item is called a node (vertex).

    A link (edge) is a relationship between two items.

    Both items and links can have associated attributes.

    A tree is a specific kind of network with a hierarchical structure. Trees are acyclic networks.
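
    To make the difference between a general network and a tree concrete, here is a minimal sketch that stores a network as a list of nodes plus a list of links and checks whether it is a tree. It assumes undirected links and takes "tree" to mean connected and acyclic; `is_tree` is a hypothetical helper written for this example.

```python
from collections import deque


def is_tree(nodes, links):
    """True if the undirected network is connected and has no cycles."""
    # A connected acyclic network with n nodes has exactly n - 1 links.
    if len(links) != len(nodes) - 1:
        return False

    # Build an adjacency list from the links.
    adjacency = {node: [] for node in nodes}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Breadth-first search from any node: a tree reaches every node.
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    print(is_tree(nodes, [("a", "b"), ("a", "c"), ("c", "d")]))
    print(is_tree(nodes, [("a", "b"), ("b", "c"), ("c", "a")]))
```

    The second call has the right number of links but contains a cycle and leaves node d unreachable, so it is a network but not a tree.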

  • Dataset types: fields

    Datasets based on fields discretize a continuous domain into cells.

    Cells can also hold attributes.

    We must deal with two mathematical problems associated with continuous domains: sampling and interpolation.

    Types: spatial fields, grids.
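
    Sampling and interpolation can be illustrated with a one-dimensional field. The sketch below samples a continuous function at regular intervals and then estimates the value between samples by linear interpolation. Linear interpolation is just one possible choice, picked here for simplicity, and both function names are hypothetical.

```python
import math


def sample(f, start, step, count):
    """Sample the continuous function f at count regularly spaced positions."""
    return [f(start + i * step) for i in range(count)]


def interpolate(samples, start, step, x):
    """Linearly interpolate the sampled field at position x.

    Assumes at least two samples; x is clamped to the sampled range's cells.
    """
    t = (x - start) / step
    # Index of the sample to the left of x, kept inside the valid range.
    i = min(max(math.floor(t), 0), len(samples) - 2)
    fraction = t - i
    return samples[i] * (1 - fraction) + samples[i + 1] * fraction


if __name__ == "__main__":
    samples = sample(lambda x: x * x, 0.0, 1.0, 5)  # x squared at 0, 1, 2, 3, 4
    print(interpolate(samples, 0.0, 1.0, 2.5))      # estimate between samples
    print(2.5 * 2.5)                                # value of the real function
```

    The estimate differs from the true value because the samples do not capture what the function does between them: this is the cost of discretizing a continuous domain.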

  • Dataset types: fields

    Spatial fields

    Continuous data are often found as spatial fields, where the cell structure of the field is based on sampling at spatial positions.

    Most of these datasets appear in the context of tasks where the goal is to understand their spatial structure, mainly their shape.

    Scientific visualization (scivis) vs. information visualization (infovis).

    [Figure: aerodynamic analysis of an Aston Martin car by the English company TotalSim.]

  • Dataset types: fields

    Grids

    If data are sampled at regular intervals, the cells define a uniform grid. In this case there is no need to store either the geometry or the topology of the grid explicitly.

    A rectilinear grid results from non-uniform sampling.

    Structured grids allow the representation of specific geometric shapes.

    Unstructured grids are more flexible, but they demand more storage because both geometry and topology must be stored explicitly.
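
    The storage argument for uniform grids can be sketched as follows: with an origin and a spacing per axis, the position and the neighbours of any cell are computed from its index instead of being stored. Both helpers and their parameter names are hypothetical and written only for this illustration.

```python
def uniform_cell_position(index, origin, spacing):
    """Geometry of a uniform grid: position computed from the cell index."""
    return tuple(o + i * s for i, o, s in zip(index, origin, spacing))


def uniform_cell_neighbours(index, shape):
    """Topology of a uniform grid: neighbours computed from the cell index."""
    result = []
    for axis in range(len(index)):
        for delta in (-1, 1):
            candidate = list(index)
            candidate[axis] += delta
            # Keep only neighbours that fall inside the grid.
            if 0 <= candidate[axis] < shape[axis]:
                result.append(tuple(candidate))
    return result


if __name__ == "__main__":
    print(uniform_cell_position((2, 3), (0.0, 0.0), (0.5, 0.25)))
    print(uniform_cell_neighbours((0, 0), (3, 3)))
```

    An unstructured grid has no such formula, so it would need an explicit list of cell positions and an explicit list of which cells are connected.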

  • Dataset types: geometric data

    A geometric dataset describes the shape of items with explicit spatial positions.

    Items can be points, one-dimensional lines or curves, two-dimensional surfaces or regions, or three-dimensional volumes.

    Spatial data often incorporate hierarchical structures with different levels of detail.

    These datasets do not necessarily have attributes, in contrast to the other three basic dataset types.

  • Dataset types: other combinations

    A set is just a collection of items.

    A list is a group of ordered items.

    A cluster is a group of items whose attribute values are similar in some way.

    A path in a network is an ordered set of segments formed by the links that connect the nodes.

    A compound network is a network with an associated tree: all of the nodes in the network are the leaves of the tree, and the interior nodes of the tree provide a hierarchical structure for the nodes that is different from the network links between them.

    In practice, we can find complex combinations of the basic datasets, but we must always specify the data abstraction required to answer the question: What data do you want to see?
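
    A compound network can be sketched as a list of network links plus a separate parent mapping for the tree. The node names, the grouping and the two helpers below are hypothetical; the point is only that the tree hierarchy and the network links are stored independently.

```python
# Network links between the leaf nodes (hypothetical example data).
network_links = [("a", "b"), ("b", "c"), ("c", "d")]

# Associated tree, stored as child -> parent. Interior nodes g1, g2 and root
# group the network nodes without being part of the network itself.
tree_parent = {
    "a": "g1", "b": "g1",
    "c": "g2", "d": "g2",
    "g1": "root", "g2": "root",
}


def leaves(parent):
    """Nodes of the tree that are never the parent of another node."""
    interior = set(parent.values())
    return {node for node in parent if node not in interior}


def links_between_groups(links, parent):
    """Network links whose endpoints sit under different tree parents."""
    return [(a, b) for a, b in links if parent[a] != parent[b]]


if __name__ == "__main__":
    print(sorted(leaves(tree_parent)))
    print(links_between_groups(network_links, tree_parent))
```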

