
Post on 11-Jan-2016


Type Systems and Structures

Prepared by

Manuel E. Bermúdez, Ph.D., Associate Professor, University of Florida

Programming Language Principles, Lecture 22

Data Types

• Most PLs have them.
• Two purposes:

• Provide context for operations, e.g. a+b (ints or floats). In Java, Widget x = new Widget(); allocates memory for an object of type Widget and invokes a constructor.

• Limit the semantically legal operations, e.g. n + "x".

Type equivalence and compatibility.

• At the hardware level, bits have no type.
• In a PL, types are needed
  • to associate with values,
  • to resolve contextual issues, and
  • to check for illegal operations.

Type System

1. Mechanism for defining types and associating them with PL constructs.

2. Rules for determining type equivalence, compatibility. Type inferencing rules are used to determine the type of an expression from its parts, and from its context.

Type Systems

• Distinction between "type of expression" and "type of object" important only in PLs with polymorphism.

• Subroutines have a type in some languages (RPAL: lambda-closure), if they need to be passed as parameters, stored, or returned from functions.

Type Systems (cont’d)

• Type checking: the process of enforcing type compatibility rules. A "type clash" occurs when a rule is violated.

• Strongly typed language: the language enforces that operations are applied only to objects of the intended types. Example: C is not very strongly typed; "while (*p++) { ... }" is used to traverse an array.

Type Systems (cont’d)

• Statically typed language: strongly typed, with enforcement occurring at compile time. Examples: ANSI C (more so than classic C), Pascal (almost; untagged variant records are the loophole).

• Some (few) languages are completely untyped: Bliss, assembly language.

Type Systems (cont’d)

• Dynamic (run-time) type checking: RPAL, Lisp, Scheme, Smalltalk.

• Other languages (ML, Miranda, Haskell) are polymorphic, but use significant type inference at compile time.

Type Definitions

• Early on (Fortran, Algol, BASIC) available types were few and non-extensible.

• Many languages distinguish:
  • type declaration (introduces a name and its scope)
  • type definition (describes the object or type itself).

Type Definitions (cont’d)

• Three approaches to describe types:

1. Denotational:
  • A type is a set of values (domain).
  • An object has a type if its value is in the set.
2. Constructive:
  • A type is either atomic (int, float, bool, etc.) or is built (constructed) from atomic types, e.g. arrays, records, sets, etc.
3. Abstraction:
  • A type is an interface: a set of operations upon certain objects.

Classification of Types

• Scalar types (including the discrete, a.k.a. ordinal, types):
  • The terminology varies (bool, logical, truthvalue).
  • Scalars sometimes come in several widths (short, int, long in C; float and double, too).

• Integers sometimes come "signed" and "unsigned."

Classification of Types (cont’d)

• Characters sometimes come in different sizes (char and the "wide" wchar_t in C, the latter accommodating Unicode).

• Sometimes "complex" and "rational" are provided.

• COBOL and PL/1 provide a "decimal" type. Example (PL/1):

FOR I=0 TO 32/2 ...

Classification of Types (cont’d)

Enumerations:
• Pascal: type day = (yesterday, today, tomorrow);
• A newly defined type, so:

var d: day; for d := today to tomorrow do ...

• Can also be used to index arrays: var profits: array[day] of real;

In Pascal, enumeration is a full-fledged type.

Classification of Types (cont’d)

• C: enum day { yesterday, today, tomorrow }; is roughly equivalent to:

typedef int day; const day yesterday = 0, today = 1, tomorrow = 2;

Classification of Types (cont’d)

• Subrange types.
  • Values are a contiguous subset of the base type values. The range imposes a type constraint.

Pascal:

type water_temp = 32 .. 212;

Classification of Types (cont’d)

• Composite Types.
  • Records. A heterogeneous collection of fields.
  • Variant records. Only one of the fields is valid at any given time: a union of the fields, vs. the Cartesian product of a record.

• Arrays. Mapping from indices to data fields.

Classification of Types (cont’d)

• Sets. Collections of distinct elements, from a base type.

• Pointers. l-values. Used to implement recursive data types: an object of type T contains references to other objects of type T.

• Lists. Length varies at run-time, unlike (most) arrays.

• Files. Hold a current position.

Orthogonality

• Pascal: variant fields required to follow non-variant ones.

• Most PLs provide limited ability to specify literal values of composite types.

• Example: In C, int x[] = {3,2,1}; is an initializer, allowed only in declarations, not in assignments.

• In Ada, use aggregates to assign composite values.

Type Equivalence

• Structural equivalence: Two types are equivalent if they contain the same components.

• Varies from one language to another.

Type Equivalence (cont’d)

Example:

type r1 = record a,b: integer; end;

type r2 = record b: integer;

a: integer; end;

var v1: r1; v2: r2; v1 := v2;

• Are these types compatible?
• What if a and b are reversed?

• In most languages, no. In ML, yes.

Type Equivalence (cont’d)

• Name equivalence: based on type definitions: usually same name.

• Assumption: named types are intended to be different.

• Alias types: definition of one type is the name of another.

• Question: Should aliased types be the same type?

Type Equivalence (cont’d)

• In Modula-2:

TYPE stack_element = INTEGER; MODULE stack; IMPORT stack_element; EXPORT push, pop;

PROCEDURE push (e: stack_element); PROCEDURE pop (): stack_element;

• Stack module cannot be reused for other types.

Type Equivalence (cont’d)

• However,

TYPE celsius = REAL; fahrenh = REAL; VAR c: celsius; f: fahrenh;

f := c; (* should probably be an error *)

Type Equivalence (cont’d)

• Strict name equivalence: aliased types are not equivalent.
  • type a = b is considered both a declaration and a definition.

• Loose name equivalence: aliased types are equivalent.
  • type a = b is considered only a declaration; a and b share the definition.

Type Equivalence (cont’d)

In Ada: compromise, allows programmer to indicate:

• alias is a subtype (compatible with base type)

subtype stack_element is integer;

Type Equivalence (cont’d)

• Or, that an alias is a derived type (not compatible):

subtype stack_element is integer; type celsius is new Float; type fahrenh is new Float;

• Now the stack is reusable, and celsius is not compatible with fahrenh.

Type Conversion and Casts

• Many contexts in which types are expected:
  • assignments, unary and binary operators, parameters.

• If types are different, programmer must convert the type (conversion or casting).

Three Situations

1. Types are structurally equivalent, but language requires name equivalence. Conversion is trivial.

• Example (in C):

typedef int number; typedef int quantity; number n; quantity m; n = m;

Three Situations (cont’d)

2. Different sets of values, but same representation.

• Example: subrange 3..7 of int.
• Generate run-time code to check for appropriate values (range check).

Three Situations (cont’d)

3. Different representations.

• Example (in C):

int n; float x; n = x;

• Generate code to perform conversion at run-time.

Type Conversions

• Ada: name of type used as a pseudofunction:

• Example: n := integer(r);

• C, C++, Java: Name of type used as prefix operator, in parentheses.

• Example: n = (int) r;

Type Conversions (cont’d)

• If conversion not supported in the language, convert to pointer, cast, and dereference (ack!):

r = *((float *) &n);

• Re-interpret bits in n as a float.

Type Conversions (cont’d)

• OK in C, as long as
  • n has an address (won't work with expressions),
  • n and r occupy the same amount of storage,
  • the programmer doesn't expect run-time overflow checks!

Type Compatibility and Coercions

• Coercion: implicit conversion.

• Rules vary greatly from one language to another.

Type Compatibility and Coercions (cont’d)

• Ada: Types T and S are compatible (coercible) if
  1. T and S are equivalent,
  2. one is a subtype of the other (or both are subtypes of the same base type), or
  3. both are arrays (same number of elements, and same element type).

• Pascal: same as Ada, but allows coercion from integer to real.

Type Compatibility and Coercions (cont’d)

• C: Many coercions allowed. General idea: convert to narrowest type that will accommodate both types.

1. Promote char (or short int) to int, guaranteeing neither is char or short.

2. If one operand is a floating type, convert the narrower one:

float -> double -> long double

Type Compatibility and Coercions (cont’d)

• Note: this accommodates mixtures of integer and floating types.

3. If neither type is a floating type, convert the narrower one:

int -> unsigned int -> long int -> unsigned long int

Examples

char c;          /* signed or unsigned: implementation-defined */
short int s;
unsigned int u;
int i;
long int l;
unsigned long int ul;
float f;
double d;
long double ld;

Examples (cont’d)

i + c;   /* c converted to int */
i + s;   /* s converted to int */
u + i;   /* i converted to unsigned int */
l + u;   /* u converted to long int (when long int can represent every unsigned int value; otherwise both become unsigned long int) */
ul + l;  /* l converted to unsigned long int */
f + ul;  /* ul converted to float */
d + f;   /* f converted to double */
ld + d;  /* d converted to long double */

Type Compatibility and Coercions (cont’d)

• Conversion during assignment:
  • the usual arithmetic conversions don't apply;
  • simply convert from the type on the right to the type on the left.

Examples

s = l;   /* l's low-order bits -> signed number */

s = ul;  /* ditto */
l = s;   /* s sign-extended to longer length */
ul = s;  /* ditto; a negative s sets ul's high-order bits */
s = c;   /* c extended (signed or not) to */
         /* s's length, interpreted as signed */
f = l;   /* l converted to float, precision may be lost */
d = f;   /* f converted, no precision lost */
f = d;   /* d converted, precision lost; */
         /* result undefined if d is outside float's range */

Type Inference

• Usually easy.
• Type of assignment is type of left-hand side.
• Type of operation is (common) type of operands.

Type Inference (cont’d)

• Not always easy.

Pascal: type A = 0 .. 20; B = 10 .. 20; var a: A; b: B;

• What is the type of a+b ? In Pascal, it's the base type (integer).

Type Inference (cont’d)

• Ada:
  • The type of the result would be an anonymous type 10..40.
  • The compiler would generate run-time checks for values out of bounds.
  • Curbing unnecessary run-time checks is a major problem.

Type Inference (cont’d)

• Pascal allows operations on sets:

var A: set of 1..10; B: set of 10..20; C: set of 1..15; i: 1..30;

C := A + B * [1..5,i];

• The type of the expression is set of integer (the base type). Range check is required when assigning to C.

Type Inference (cont’d)

• Type safety in Java

Type Inference in ML

• Programmer can declare types, but if not, ML infers them, using unification (more later in Prolog).

Type Inference in ML (cont’d)

• ML infers the return type of "fib":
1. i+1 implies i is of type int.
2. i=n implies n is of type int.
3. fib_helper(0,1,0) implies f1, f2 of type int, and confirms (doesn't contradict) i is of type int.
4. fib_helper returning f2 implies fib_helper returns int.
5. fib returning fib_helper(0,1,0) implies fib returns int.

Type Inference in ML (cont’d)

• ML checks type consistency: no contradictions or ambiguities.

• By inferring types, ML allows polymorphism:

fun compare (x,p,q) =
  if x = p then
    if x = q then "all three match" else "first two match"
  else if x = q then "second two match"
  else "none match";

Type Inference in ML (cont’d)

• The type of compare is not specified. Type inference yields any type for which '=' is legal (many of them!).

• The result is a polymorphic 'compare' function.
• It's possible to underspecify the type:

fun square (x) = x * x; (* int or real? *)

fun square (x:int) = x * x; (* ambiguity gone *)

Records (structs) and Variants (unions)

• In Pascal,

Records (structs) and Variants (unions)

• Representation in Pascal:

Records (structs) and Variants (unions, cont’d)

• Usage:

var copper: element; copper.name := 'Cu';

• A record can be "packed", eliminating the holes, but this forces the compiler to generate code that accesses fields using multi-instruction sequences (less efficient).

Records (structs) and Variants (unions, cont’d)

• Packed Representation

Records (structs) and Variants (unions, cont’d)

• Usage:

element copper; strcpy(copper.name,"Cu");

Records (structs) and Variants (unions, cont’d)

• Most languages allow assignment of one record to another, but if not, a "block_copy" routine can solve the problem.

• Most languages don't allow equality comparison. A "block_compare" routine might have problems with garbage in the holes.

Records (structs) and Variants (unions, cont’d)

• Compilers often rearrange fields to reduce space:

Pascal with Statements

• Introduce a nested scope, in which record fields are visible without record name.

• Useful for deeply nested structures.
• Example:

with copper do begin name := 'Cu'; atomic_number := 29; atomic_weight := 63.546; metallic := true; end;

Pascal with Statements (cont’d)

• Problems with Pascal's with statement:

1. Can only manipulate fields of ONE record, not two. Not a shortcut for copying fields from one record to another.

2. Local names that match a field name become inaccessible.

3. Can be difficult to read, especially in long or deeply nested with statements.

Pascal with Statements (cont’d)

• Modula-3 allows aliases for complicated expressions:

WITH e = copper DO e.name := 'Cu'; e.atomic_number := 29; e.atomic_weight := 63.546; e.metallic := true; END;

Pascal with Statements (cont’d)

• Can access more than one record at a time:

WITH e = copper, f = iron DO e.metallic := f.metallic; END;

Pascal with Statements (cont’d)

• In Modula-3, the with statement goes further:

WITH d = (...) DO IF d # 0 THEN val := n/d ELSE val := 0 END END;

Pascal with Statements (cont’d)

• C gets around this using the conditional expression:

{ double d = (...); val = (d ? n/d : 0); }

Pascal with Statements (cont’d)

• C has no need for a with statement, just use pointers:

element *e = ...; element *f = ...;
strcpy(e->name, f->name);   /* arrays cannot be assigned with = */
e->atomic_number = f->atomic_number;
e->atomic_weight = f->atomic_weight;
e->metallic = f->metallic;

Variant Records

• Choice between alternative fields.

• Only one is valid at any given time.

Example (Pascal)

Example (Pascal, cont’d)

• "naturally_occuring" is the "tag", which indicates whether the element contains
  1. a source and a prevalence, or
  2. a half_life.

Example (Pascal, cont’d)

In C

Variant Records (cont’d)

• Unions are not integrated with structs, so there are additional names:

element e; e.extra_fields.natural_info.source = 3; e.extra_fields.half_life = 3.5;

Variant Records (cont’d)

• In general, type safety is compromised:

type tag = (is_int, is_real, is_bool); var irb: record case which: tag of is_int: (i:integer); is_real: (r:real); is_bool: (b:Boolean); end;

Variant Records (cont’d)

• Usage: irb.which := is_real; irb.r := 3.0; irb.i := 7; (* run-time error *)

Variant Records (cont’d)

• Changing the tag field should make all other fields in the variant uninitialized, but it's very expensive to keep track of at run-time. Most compilers won't catch this:

irb.which := is_real; irb.r := 3.0; irb.which := is_int; writeln(irb.i); (* uninitialized, or worse, shares space with irb.r *)

Variant Records (cont’d)

• Worse yet, the tag field is optional:

type tag = (is_int, is_real, is_bool); var irb: record case tag of (* 'which' field is gone ! *) is_int: (i:integer); is_real: (r:real); is_bool: (b:Boolean); end;

Variant Records (cont’d)

• No way to catch

irb.r := 3.0; writeln(irb.i);

• Designers of Modula-3 dropped variant records for these safety reasons.

• Similarly, designers of Java dropped the union of C and C++.

Variants in Ada

• Must have a tag (discriminant).
• If the tag changes, all fields in the variant must be changed, by
  • assigning a whole record (A := B;), or
  • assigning an aggregate.

Example (with discriminant default value)

Variants in Ada (cont’d)

• Declaration can use the default:

copper: element;

• Declaration can override the default:

plutonium: element (false); americium: element (naturally_occuring => false);

Variants in Ada (cont’d)

• The type declaration may:

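• provide a default for the discriminant; a variable declared without a discriminant value is then unconstrained, and its tag may be changed (by assigning a whole record or an aggregate).

• not provide a default; then every variable declaration must supply one, the variable is constrained, and its tag cannot be changed.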

Variants in Ada (cont’d)

• In short, discriminants are never uninitialized.

• In Ada, variants are required to appear at the end of the record.

• The compiler assigns a constant address to every field.

Variants in Modula-2

• In Modula-2, this restriction is dropped. Usually, a fixed address is assigned to each field, leaving holes where variants differ in size.

Variants in Modula-2 (cont’d)
