6-1 Chapter 6: Data Types Introduction Primitive Data Types Character String Types User-Defined...


6-1

Chapter 6: Data Types

• Introduction
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• Associative Arrays
• Record Types
• Union Types
• Pointer Types

6-2

Introduction

• A data type defines a collection of data objects and a set of predefined operations on those objects

• Almost all programming languages provide a set of primitive data types

• Primitive data types: Those not defined in terms of other data types

• Some primitive data types are merely reflections of the hardware
  – character: a byte
  – integer: a word
  – real number: two contiguous words

• Others require only a little non-hardware support

6-3

Primitive Data Types

• Integer
  – Almost always an exact reflection of the hardware, so the mapping is trivial
  – There are many different integer types, e.g., short, int, long
• Floating Point
  – Models real numbers, but only as approximations
  – Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)
• Decimal
  – For business applications (money)
    • Essential to COBOL
    • C# offers a decimal data type
  – Stores a fixed number of decimal digits, e.g., 25.2
  – Advantage: accuracy
  – Disadvantage: limited range (no exponents)
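
The accuracy trade-off between floating-point and decimal types can be sketched in Python, which is used here only for illustration and is not one of the languages discussed on the slides. The standard `decimal` module stands in for a built-in decimal type; the variable names are hypothetical.

```python
from decimal import Decimal

# Binary floating point only approximates most decimal fractions,
# so this comparison fails even though it looks exact on paper.
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# A decimal type stores decimal digits, so the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

print(float_is_exact, decimal_is_exact)
```

Note that Python's `Decimal` is not a fixed-width type, so this sketch shows the accuracy advantage but not the limited-range disadvantage.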

6-4

Primitive Data Types (cont)

• Boolean
  – Range of values: two elements, one for “true” and one for “false”
  – Could be implemented as bits, but often as bytes
  – Advantage: readability
• Character
  – Stored as numeric codings
  – Most commonly used coding: ASCII
  – An alternative, 16-bit coding: Unicode
    • Includes characters from most natural languages
    • Supported in Java, Perl, C#
• Character String Types
  – Values are sequences of characters
  – Design issues:
    • Is it a primitive type or just a special kind of array?
    • Should the length of strings be static or dynamic?

6-5

Character String Type in Certain Languages

• C
  – Not primitive
  – Uses char arrays, e.g., char str[] = "apples";
  – A library of functions (in string.h) provides the operations
• SNOBOL4 (a string manipulation language)
  – Primitive
  – Many operations, including elaborate pattern matching
• Java and C++
  – Primitive via the String class

6-6

Character String Length Options

• Static:
  – COBOL, Java’s String class
  – Needs a compile-time descriptor
• Limited dynamic length:
  – In C-based languages, a special character is used to indicate the end of a string’s characters, rather than maintaining the length
• Dynamic (no maximum):
  – SNOBOL4, Perl, JavaScript
  – Needs a run-time descriptor
• Ada supports all three string length options

6-7

User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers

• Examples of primitive ordinal types in Java
  – integer
  – char
  – Boolean
• Other programming languages support
  – enumeration
  – subrange

6-8

Enumeration Types

• An enumeration type consists of a finite sequence of named constants.

• C# example
    enum days {mon, tue, wed, thu, fri, sat, sun};

• Design issues
  – Is an enumeration constant allowed to appear in more than one type definition?
    • Ada: yes, resolving the overloading by the context
    • Others: no
  – Are enumeration values coerced to integer?
    • Pascal, C, C++: yes
    • Ada, C#, and Java 5.0: no (see the next page for the advantage)

6-9

Evaluation of Enumerated Type

• Aid to readability
  – e.g., no need to code a color as a number; it can be represented directly:
      enum colors {red, blue, green};
      colors mycolor = blue, yourcolor = red;
• Aid to reliability
  – The compiler can check:
    • operations (don’t allow colors to be added)
    • that no enumeration variable is assigned a value outside its defined range
  – C++ checks whether a value is in the range of the internal representation, but this is not always effective.
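
The reliability checks listed above can be sketched with Python's standard `enum` module. This is an illustration only: Python performs these checks at run time rather than at compile time, and the `Color` class is a hypothetical stand-in for the `colors` type on the slide.

```python
from enum import Enum

class Color(Enum):
    RED = 0
    BLUE = 1
    GREEN = 2

my_color = Color.BLUE

# Enumeration values are not coerced to integers: BLUE is not equal to 1.
equals_int = (my_color == 1)

# Arithmetic on enumeration values is rejected.
try:
    Color.RED + Color.BLUE
    addition_allowed = True
except TypeError:
    addition_allowed = False

# A value outside the defined range cannot be converted to the type.
try:
    Color(7)
    out_of_range_allowed = True
except ValueError:
    out_of_range_allowed = False

print(equals_int, addition_allowed, out_of_range_allowed)
```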

6-10

Subrange Types

• An ordered contiguous subsequence of an ordinal type
  – Example: 12..18 is a subrange of integer type
• Ada’s design
    type Days is (mon, tue, wed, thu, fri, sat, sun);
    subtype Weekdays is Days range mon..fri;
    subtype Index is Integer range 1..100;
    Day1: Days;
    Day2: Weekdays;
    Day2 := Day1;  -- legal unless Day1 holds sat or sun

6-11

Subrange Evaluation

• Aid to readability
  – Makes it clear to readers that variables of a subrange type can store only a certain range of values
• Reliability
  – Assigning a value outside the specified range to a subrange variable is detected as an error
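
The range check can be sketched in Python, which has no built-in subrange types. The `Subrange` class below is a hypothetical helper written for this illustration; it mimics the run-time check a language such as Ada performs on assignment.

```python
class Subrange:
    """Hypothetical helper: holds an integer restricted to low..high."""

    def __init__(self, low, high, value):
        self.low = low
        self.high = high
        self.assign(value)

    def assign(self, value):
        # Reject any value outside the declared range.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        self.value = value

# Mirrors the Ada declaration: subtype Index is Integer range 1..100;
index = Subrange(1, 100, 42)

try:
    index.assign(101)      # out of range, so this is reported as an error
    detected = False
except ValueError:
    detected = True

print(index.value, detected)
```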

6-12

Array Types

• An array
  – holds a sequence of elements of the same type;
  – identifies an individual element by its position in the aggregate, relative to the first element;
  – supports random access to its elements.
• Design issues:
  – What types are legal for subscripts?
  – Are subscripting expressions in element references range checked?
  – When are subscript ranges bound?
  – When does allocation take place?
  – Can array objects be initialized?

6-13

Array Indexing

• Indexing (or subscripting) is a mapping from indices to elements
    array_name (index_value_list) → an element
• Array index (subscript) types
  – FORTRAN, C, Java: integer only
  – Pascal: any ordinal type (integer, Boolean, char, enumeration)
  – Ada: integer or enumeration (includes Boolean and char)
• Index range checking
  – C, C++, Perl, and Fortran: no
  – Java, ML, C#: yes, more reliable

6-14

Subscript Binding and Array Categories

• Static:
  – subscript ranges are statically bound and storage allocation is static (before run time)
  – Advantage: efficiency (no dynamic allocation)
  – C and C++ arrays that include the static modifier
• Fixed stack-dynamic:
  – subscript ranges are statically bound, but the allocation is done at declaration time
  – Advantage: space efficiency
  – C and C++ arrays without the static modifier

6-15

Subscript Binding and Array Categories (continued)

• Stack-dynamic:
  – subscript ranges are dynamically bound and the storage allocation is dynamic (done at run time)
  – Advantage: flexibility (the size of an array need not be known until the array is to be used)
  – Ada arrays can be stack-dynamic, e.g.,
      list: array(1..LenVar) of integer;
• Fixed heap-dynamic:
  – similar to fixed stack-dynamic: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated from the heap, not the stack)
  – C++: by the operation new

6-16

Subscript Binding and Array Categories (continued)

• Heap-dynamic:
  – binding of subscript ranges and storage allocation is dynamic and can change any number of times
  – Advantage: flexibility (arrays can grow or shrink during program execution)
  – Perl and JavaScript

6-17

Array Initialization

• Some languages allow initialization at the time of storage allocation
  – C, C++, Java, C#
      int list [] = {4, 5, 7, 83};
  – Character strings in C and C++
      char name [] = "freddie";
  – Arrays of strings in C and C++
      char *names [] = {"Bob", "Jake", "Joe"};
  – Java initialization of String objects
      String[] names = {"Bob", "Jake", "Joe"};

6-18

Array Operations

• APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements)

• Ada allows array assignment and also catenation

• Fortran provides elemental operations, so called because they operate between pairs of array elements
  – For example, the + operator between two arrays results in an array of the sums of the element pairs of the two arrays
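
What an elemental + computes can be sketched in Python, used here only for illustration. The function `elemental_add` is a hypothetical name; Fortran provides this behavior through the built-in + operator on arrays, not through a function like this one.

```python
def elemental_add(a, b):
    """Add two equal-length arrays element by element."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    # Each result element is the sum of one pair of corresponding elements.
    return [x + y for x, y in zip(a, b)]

sums = elemental_add([1, 2, 3], [10, 20, 30])
print(sums)
```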

6-19

Implementation of Arrays

• Elements of the same array appear in consecutive locations.

• For single-dimensioned arrays:
  – address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
                     = (k * element_size) + (address(list[lower_bound]) - (lower_bound * element_size))
  – The term address(list[lower_bound]) - (lower_bound * element_size) can be precomputed as soon as the array is declared, so each element can be accessed in constant time at run time, which provides random access.

• Accessing multi-dimensioned arrays
  – Row major order (by rows): used in most languages
  – Column major order (by columns): used in Fortran
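
The single-dimensioned address computation above can be checked with a small Python sketch. The function name and the sample addresses are hypothetical; the point is only that the part independent of k is computed once.

```python
def element_address(base_address, lower_bound, element_size, k):
    """Address of list[k]; base_address is the address of list[lower_bound]."""
    # This part does not depend on k, so a compiler can precompute it.
    constant_part = base_address - lower_bound * element_size
    return k * element_size + constant_part

# Hypothetical array: first element at address 1000,
# subscripts start at 1, and each element occupies 4 bytes.
addr = element_address(1000, 1, 4, 3)
print(addr)
```

With these values, the third element sits two elements past the start of the array, so both forms of the formula give 1008.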

6-20

Locating an Element in a Multi-dimensioned Array

• General format for row major order:
    address(list[i,j]) = address(list[row_lb,col_lb]) + (((i - row_lb) * n) + (j - col_lb)) * element_size
  where n is the number of elements in a row
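
The row major formula can be exercised with a short Python sketch. The function name and sample values are hypothetical, and n is taken to be the number of elements in a row.

```python
def row_major_address(base_address, row_lb, col_lb, n, element_size, i, j):
    """Address of list[i,j]; base_address is the address of list[row_lb,col_lb]."""
    # Elements stored before list[i,j]: all full rows above it,
    # plus the elements to its left in its own row.
    elements_before = (i - row_lb) * n + (j - col_lb)
    return base_address + elements_before * element_size

# Hypothetical array: 5 elements per row, both lower bounds 1,
# 4-byte elements, first element at address 2000.
addr = row_major_address(2000, 1, 1, 5, 4, 3, 2)
print(addr)
```

Here two full rows (10 elements) and one element in the third row precede list[3,2], so the offset is 11 elements, or 44 bytes.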

6-21

Associative Arrays

• An associative array
  – an unordered collection of data elements that are indexed by an equal number of values called keys
  – implemented with hash functions
• Associative arrays in Perl
  – Names begin with %; literals are delimited by parentheses
      %hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65);
  – Subscripting is done using braces and keys
  – Adding a new element
      $hi_temps{"Thu"} = 83;
  – Elements can be removed with delete
      delete $hi_temps{"Tue"};
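
For comparison, the same operations can be sketched with Python's built-in dictionary, which is also hash-based. This is an illustrative translation of the Perl example above, not part of the original slides.

```python
# Create the associative array with three key/value pairs.
hi_temps = {"Mon": 77, "Tue": 79, "Wed": 65}

# Add a new element by assigning to a new key.
hi_temps["Thu"] = 83

# Remove an element by key.
del hi_temps["Tue"]

# Look up a value by its key.
wed_temp = hi_temps["Wed"]
remaining_days = sorted(hi_temps)

print(wed_temp, remaining_days)
```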

6-22

Record Types

• A record
  – a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names
  – allows variables relevant to an object to be grouped together and treated as a unit
• Design issues:
  – What is the syntactic form of references to a record element (field)?
  – Are elliptical references allowed?

6-23

In COBOL

• Uses level numbers to show nested records
    01 EMP-REC.
        02 EMP-NAME.
            05 FIRST PIC X(20).
            05 MID   PIC X(10).
            05 LAST  PIC X(20).
        02 HOURLY-RATE PIC 99V99.
• Record field references
    field_name OF record_name_1 OF ... OF record_name_n
• Elliptical references
  – Record names may be left out as long as the reference is unambiguous
  – Example: FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC all refer to the same field

6-24

Other languages (Ada for example)

• Simply nest record declarations inside record declarations
    type Emp_Name_Type is record
        First: String (1..20);
        Mid: String (1..10);
        Last: String (1..20);
    end record;
    type Emp_Record_Type is record
        Emp_Name: Emp_Name_Type;
        Hourly_Rate: Float;
    end record;
    Emp_Rec: Emp_Record_Type;
• Record field references (dot notation)
    record_name_1.record_name_2. ... .record_name_n.field_name
    e.g., Emp_Rec.Emp_Name.Mid

6-25

Operations on Records

• Assignment is very common if the types are identical
  – e.g., x := y. The value of each field of y is assigned to the corresponding field of x.
• COBOL provides MOVE CORRESPONDING for records of different types
  – Copies a field of the source record to the target record only if the target record has a field with the same name
  – e.g., MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD
• Ada allows record comparison for equality and inequality.
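
These record operations can be sketched with Python dataclasses. The record types, field names, and the `move_corresponding` function are hypothetical and only imitate the behavior described above; they are not COBOL's or Ada's actual mechanisms.

```python
from dataclasses import dataclass, fields, replace

@dataclass
class InputRecord:
    name: str
    hours: int
    badge: int

@dataclass
class OutputRecord:
    name: str
    hours: int
    pay: float

def move_corresponding(source, target):
    """Copy each source field into the target field of the same name, if any."""
    target_names = {f.name for f in fields(target)}
    for f in fields(source):
        if f.name in target_names:
            setattr(target, f.name, getattr(source, f.name))

src = InputRecord(name="Ann", hours=40, badge=7)
dst = OutputRecord(name="", hours=0, pay=12.5)
move_corresponding(src, dst)   # copies name and hours; badge has no match

# Field-by-field copy of a record, followed by comparison for equality.
copy_of_src = replace(src)
records_equal = (copy_of_src == src)

print(dst, records_equal)
```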

6-26

Evaluation and Comparison to Arrays

• The design of records is straightforward and safe, since the field names are static.
• All elements of an array have the same type (homogeneous component types), whereas the fields of a record may have different types (heterogeneous component types).
• Records are more flexible in the choice of field types; arrays are more flexible in element selection, because subscripts can be computed at run time.

• The layout of both is determined at compile time.

6-27

Implementation of Record Type

An offset address, relative to the beginning of the record, is associated with each field, since the sizes of the fields are not necessarily the same.

6-28

Union Types

• A union
  – a type whose variables are allowed to store values of different types at different times during execution
  – represents objects that have some, but not all, properties in common
• Type checking of unions requires that each union include a type indicator called a discriminant
  – Supported by Ada (see the next two pages)
• Fortran, Pascal, C, and C++ provide union constructs without type checking, which makes them potentially unsafe (see page 31)
• Java and C# do not support unions
  – Reflective of growing concerns for safety in programming languages

6-29

Ada Union Types

    type Shape is (Circle, Triangle, Rectangle);
    type Colors is (Red, Green, Blue);
    type Figure (Form: Shape) is record
        Filled: Boolean;
        Color: Colors;
        case Form is
            when Circle => Diameter: Float;
            when Triangle =>
                Leftside, Rightside: Integer;
                Angle: Float;
            when Rectangle => Side1, Side2: Integer;
        end case;
    end record;

6-30

Ada Union Type Illustrated

[Figure: a discriminated union of three shape variables]

6-31

A Union example in Pascal

    type kind = 1..2;
         t = record
               case kind of
                 1: (i: integer);
                 2: (r: real)
             end;
    var x: t;

    x.r := 1.0;
    writeln(x.i)

=> produces machine-dependent output such as 16512
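
The effect of reading a real through an integer field can be reproduced in Python with the standard `struct` module. This is a sketch under stated assumptions: it uses a 32-bit IEEE 754 float and a 32-bit little-endian integer, which is not the machine that produced the value on the slide, so the number differs.

```python
import struct

# Store 1.0 as a 32-bit float, like the assignment x.r := 1.0.
raw_bytes = struct.pack("<f", 1.0)

# Read the same four bytes back as a 32-bit integer, like reading x.i.
as_integer = struct.unpack("<i", raw_bytes)[0]

print(as_integer)        # the float's bit pattern, not the value 1
print(hex(as_integer))
```

The result is the bit pattern of the float interpreted as an integer, which is why an unchecked union is considered unsafe.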

6-32

Pointer Type

• A pointer type variable has a range of values that consists of memory addresses and a special value, nil
• Provides the power of indirect addressing
• Provides a way to manage dynamic memory
• A pointer can be used to access a location in the area where storage is dynamically created (usually called a heap)

6-33

Pointer Operations

• Two fundamental operations: assignment and dereferencing
• Assignment is used to set a pointer variable’s value to some useful address
• Dereferencing yields the value stored at the location represented by the pointer’s value
  – C++ uses an explicit operation via *
  – j = *ptr sets j to the value located at ptr

6-34

Implementation of dynamic data (in Pascal)

• Uses fixed-size cells, that is, records and pointers, e.g.,
    type link = ^cell;
         cell = record
                  info : integer;
                  next : link
                end;
• The following statements add a new cell to the front of a linked list; repeating them builds the list:
    new(p); p^.info := i; p^.next := front; front := p;
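
The same list-building steps can be sketched in Python, where object references play the role of the Pascal pointers. The `Cell` class mirrors the record above; the loop values are hypothetical.

```python
class Cell:
    """A fixed-size cell with an info field and a link to the next cell."""

    def __init__(self, info, next_cell):
        self.info = info
        self.next = next_cell

front = None                 # empty list (nil)
for i in [1, 2, 3]:
    p = Cell(i, front)       # new(p); p^.info := i; p^.next := front
    front = p                # front := p

# Walk the list from the front and collect the stored values.
values = []
node = front
while node is not None:
    values.append(node.info)
    node = node.next

print(values)
```

Because each new cell is linked in at the front, the list holds the values in the reverse of insertion order.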

6-35

Problems with Pointers

• Dangling pointers (dangerous)
  – A pointer points to a heap-dynamic variable that has been de-allocated
  – They are unsafe because the result of dereferencing a dangling pointer is unpredictable.
• Lost heap-dynamic variables
  – An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)
  – Might be caused by improper assignment to pointers:
    • Pointer p1 is set to point to a newly created heap-dynamic variable
    • Pointer p1 is later set to point to another newly created heap-dynamic variable, leaving the first one inaccessible
  – Can degrade the performance of a running program.

6-36

Pointers in Ada

• The dangling-pointer problem is alleviated, because dynamic objects can be automatically de-allocated at the end of the scope of the pointer's type, which dramatically lessens the need for explicit deallocation.

• The lost heap-dynamic variable problem is not eliminated by Ada

6-37

Pointers in C and C++

• Extremely flexible but must be used with care
• Pointers can point anywhere in memory, regardless of where it was allocated, which is dangerous.
• Pointer arithmetic is possible (also see the next page)
    float stuff[100];
    float *p;
    p = stuff;
  – *(p+5) is equivalent to stuff[5] and p[5]
  – *(p+i) is equivalent to stuff[i] and p[i]
• Explicit dereferencing (*) and address-of (&) operators
    int *ptr, count, init;
    ptr = &init;
    count = *ptr;

6-38

Pointer Arithmetic in C and C++

• Operations on pointers in C allow successive characters in a string to be accessed.
• To copy a string from a buffer into the pool:
    p = start[next];
    q = buffer;
    for (;;) {
        *p = *q;
        if (*p == EOS) break;
        p++;
        q++;
    }

6-39

Heap Management

• A very complex run-time process
• Two approaches to reclaiming garbage
  – Reference counters (eager approach):
    • maintain a counter in every cell that stores the number of pointers currently pointing at the cell
    • Advantage: reclamation is gradual
    • Disadvantages: space required, execution time required, complications for cells connected circularly
  – Garbage collection (lazy approach):
    • reclamation occurs when the list of available space becomes empty
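
The eager approach can be sketched as a small Python simulation. The `Cell` class, the `free_list`, and the two helper functions are hypothetical; they model the bookkeeping described above and do not show how any real run-time system implements it.

```python
class Cell:
    """Hypothetical heap cell with an explicit reference counter."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 0

free_list = []   # names of cells that have been reclaimed

def add_reference(cell):
    # A pointer has been set to point at this cell.
    cell.ref_count += 1

def drop_reference(cell):
    # A pointer to this cell has been reassigned or destroyed.
    cell.ref_count -= 1
    if cell.ref_count == 0:
        free_list.append(cell.name)   # reclaimed immediately (eager)

a = Cell("a")
add_reference(a)      # p1 points at a
add_reference(a)      # p2 points at a
drop_reference(a)     # p1 is reassigned; one pointer remains
still_live = (a.ref_count == 1 and free_list == [])
drop_reference(a)     # p2 is reassigned; the cell is now garbage

print(still_live, free_list)
```

Two cells that point at each other would each keep a count of at least one, which is the complication with circular structures mentioned above.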

6-40

Garbage Collection

• Every heap cell has an extra bit used by collection algorithm

• All cells initially set to garbage

• All pointers traced into heap, and reachable cells marked as not garbage

• All garbage cells returned to list of available cells

• Disadvantage: when you need it most, it works worst (it takes the most time when the program needs most of the cells in the heap)
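
The marking steps above can be sketched as a Python simulation. The heap is modeled as a hypothetical dictionary from cell names to the names each cell points at; a real collector works on memory cells and a mark bit rather than on names.

```python
def mark_sweep(heap, roots):
    """Return the garbage cells; roots are the program's own pointers."""
    # Step 1: every cell starts out marked as garbage.
    is_garbage = {cell: True for cell in heap}

    # Step 2: trace every pointer into the heap and mark reachable cells.
    pending = list(roots)
    while pending:
        cell = pending.pop()
        if is_garbage[cell]:
            is_garbage[cell] = False
            pending.extend(heap[cell])

    # Step 3: cells still marked as garbage go back to the available list.
    return sorted(cell for cell, garbage in is_garbage.items() if garbage)

# Cell a points at b; cells c and d point only at each other.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
reclaimed = mark_sweep(heap, roots=["a"])

print(reclaimed)
```

In this sketch the circularly connected cells c and d are reclaimed, because nothing reachable from the roots points at them.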

6-41

Summary

• The data types of a language are a large part of what determines that language’s style and usefulness

• The primitive data types of most imperative languages include numeric, character, and Boolean types

• The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs

• Arrays and records are included in most languages

• Pointers are used for addressing flexibility and to control dynamic storage management