Characters, Strings, Basic Data Structures UC Santa Cruz CMPS 10 – Introduction to Computer Science www.soe.ucsc.edu/classes/cmps010/Spring11 [email protected] 6 April 2011

UC SANTA CRUZ

Class website

• http://www.soe.ucsc.edu/classes/cmps010/Spring11/
• Please write this down and bookmark it
• Holds:
    • Syllabus (including homework due dates)
    • Homework assignment descriptions
    • Description of course readings
    • Links to class lecture notes
• This class will have a final exam. Please plan on this.
    • The final exam is scheduled for Tuesday, June 7, 8am-11am

Tutoring available

• Learning Support Services (LSS) has tutoring available for students in CMPS 10
    • Students meet in small groups, led by a tutor
    • Students are eligible for up to one hour of tutoring per week per course, and may sign up for tutoring at https://eop.sa.ucsc.edu/OTSS/tutorsignup/ beginning April 5th at 10:00am
• Brett Care ([email protected]) is the tutor LSS has hired for CMPS 10

Abstraction and Models

• Converting the real world into data:
    • Create a model of the real world
    • Represent that model in data
• How do you model the real world?
    • Involves a process called abstraction
• Abstraction
    • Prerequisite: know your problem or application
    • Focus on aspects of the real world that are important to the problem
        • Add those elements to your model
    • Omit elements of the real world that aren’t relevant
    • Implies: the same real-world scenario can be modeled in many ways, depending on the problem at hand

physical world → (abstraction) → model → (representation) → data (inside computer)

Representing models as data

• Most models can be represented using:
    • Basic data types
        • Integers
        • Floating point
        • Boolean
        • Characters
        • Strings
    • Basic data structures
        • Arrays
        • Lists
        • Stacks/Queues
        • Trees
        • Graphs

Boolean

• A boolean data type represents true or false
    • This is represented as a 1 (true) or a 0 (false)
• How much space does a boolean require? It varies.
    • The minimum required space is 1 bit
    • However, typically a boolean is stored in an entire byte
        • 8 bits, as in the C# language
        • Only one bit is used: 00000001 = true, 00000000 = false
    • … or in an integer
        • 16 or 32 bits, as in the C language, which lacked a standard boolean type before C99
        • 0000000000000001 = true (16 bits)
        • 0000000000000000 = false (16 bits)
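The two storage choices on the Boolean slide can be sketched in Python. This is only an illustration: it uses the standard `struct` module to produce the raw bytes, and the helper name `as_bits` is made up for this example.

```python
import struct

# One boolean stored in a whole byte (struct's "?" format is a 1-byte bool).
true_byte = struct.pack("?", True)
false_byte = struct.pack("?", False)

# The same values stored in a 16-bit integer ("<h" = little-endian 16-bit int).
true_int16 = struct.pack("<h", 1)
false_int16 = struct.pack("<h", 0)


def as_bits(raw: bytes) -> str:
    """Render the numeric value held in raw as a fixed-width string of bits."""
    width = len(raw) * 8
    return format(int.from_bytes(raw, "little"), "0{}b".format(width))


print(as_bits(true_byte))    # 00000001
print(as_bits(false_byte))   # 00000000
print(as_bits(true_int16))   # 0000000000000001
print(as_bits(false_int16))  # 0000000000000000
```

In every case only the lowest bit carries information; the remaining bits are padding.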

Character

• A single letter, number, punctuation mark, symbol, etc.
• Historically, in the US, characters were represented using the US-ASCII code (uses 7 bits of an 8-bit byte)
• This was superseded by ISO/IEC 8859
    • Provided support for special characters used in specific languages, along with accented characters
    • Examples: ß (German), ñ (Spanish), å (Swedish and other Nordic languages) and ő (Hungarian)
    • But it didn’t handle representation of ideographic languages
        • Led to development of many standards for this specific purpose, and for other languages not covered by ISO 8859
• Historically, a character of storage meant one byte (8 bits)
    • This still holds true in many discussions today

UNICODE

• Today, the UNICODE standard is rapidly becoming the standard
    • Aims to represent every character in every written human language
    • Contains more than 109,000 characters covering 93 scripts
    • Initial idea comes from Joe Becker and Mark Davis in 1987
    • Today, maintained by the UNICODE consortium
• Each character has a unique numeric identifier (a code point), which fits in 32 bits
    • But 32 bits per character is a lot of space
    • So, there are multiple encodings
        • UTF-8: most popular, maximizes backward compatibility with US-ASCII; 8 bits (1 byte) per US-ASCII character, more bytes for other scripts (variable width)
        • UTF-16: characters in the most common scripts use 16 bits, less common ones use 32 bits (variable width)
        • UTF-32/UCS4: each character uses 32 bits (4 bytes)
• One of the great unsung achievements of computer science
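As a rough illustration of the variable-width encodings listed above, the sketch below counts how many bytes a few sample characters need in each encoding. The sample characters and the helper name `encoded_sizes` are chosen for this example only.

```python
# Sample characters, written as escapes so the file itself stays plain ASCII.
samples = [
    "A",           # U+0041, a US-ASCII letter
    "\u00f1",      # U+00F1, n with tilde (Spanish)
    "\u4e2d",      # U+4E2D, a Chinese ideograph
    "\U0001F600",  # U+1F600, an emoji outside the 16-bit range
]


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants avoid counting a leading byte-order mark.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in samples:
    print("U+{:04X}".format(ord(ch)), encoded_sizes(ch))
```

The ASCII letter takes one byte in UTF-8, while every character takes four bytes in UTF-32, which is the space trade-off the slide describes.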

Strings

• A string is a sequence of characters
    • “Hello, world!” is the most famous string.
• Two main ways to represent:
    • A sequence of characters, ended with a 0 (null character)

        H e l l o ,   w o r l d ! \0    (\0 is the null character)

    • A length, and then that many characters

        13 H e l l o ,   w o r l d !

• Each character is represented according to some character encoding (UTF-8, UTF-16, US-ASCII, etc.)
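The two string representations can be sketched with Python `bytes`. This is a minimal illustration, assuming US-ASCII encoding and a one-byte length field; the function names are made up for this example.

```python
text = "Hello, world!"
encoded = text.encode("ascii")  # one byte per character in US-ASCII

# Representation 1: the characters followed by a null character (value 0).
null_terminated = encoded + b"\x00"

# Representation 2: a length (here a single byte), then that many characters.
length_prefixed = bytes([len(encoded)]) + encoded


def read_null_terminated(raw: bytes) -> str:
    """Scan forward to the null character to find where the string ends."""
    end = raw.index(b"\x00")
    return raw[:end].decode("ascii")


def read_length_prefixed(raw: bytes) -> str:
    """Read the length first, then take exactly that many bytes."""
    length = raw[0]
    return raw[1:1 + length].decode("ascii")


print(len(encoded))                           # 13
print(read_null_terminated(null_terminated))  # Hello, world!
print(read_length_prefixed(length_prefixed))  # Hello, world!
```

Both forms use 14 bytes here; the difference is whether the reader must scan for the terminator or can read the length up front.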

Basic Data Structures

Modeling and sets

• When modeling the real world, there is a need to model sets of things
    • Can also think of this as a group of things, collection of things, etc.
    • Examples:
        • The temperature at my house measured every hour over a day
        • All of the songs in my music collection
        • All of the houses on a street
        • All of the people in my family
        • People standing in line at a restaurant
• Frequently, these sets have a natural order
    • Temperature over a day:
        • First temperature reading at hour 0, then the second at hour 1, etc.
    • Houses on a street:
        • Order by house number

Representing sets

• Many different data structures have been developed to represent sets
    • Array
        • A set with fixed length
        • Elements can be added anywhere
        • Can go directly to any element
    • List
        • A set with variable length
        • Elements can be added anywhere
        • Need to search list for specific element
    • Stack/queue
        • A set with variable length
        • Elements can be added only at beginning (stack) or end (queue)
        • Can only retrieve element from beginning (stack/queue)

Arrays

• Used to represent sets of fixed length
    • Also represent mathematical vectors and matrices of fixed size
    • Once set, the size of an array cannot change (biggest limitation)
    • But this limitation permits fast lookup of values (biggest strength)
• How do they work (1-dimensional)
    • Given an integer index, can:
        • Retrieve an element of the array: value = array[index]
        • Set an element of the array: array[index] = value
• Can have an array composed of any basic data type
    • Array of integers, array of floats, array of strings, etc.

Array example

• Consider a set of temperature values at a location, with temperature readings taken once every hour
    • Have a total of 24 readings each day, and this won’t change
• Model for one day of readings
    • A set of 24 ordered temperature readings
• Representation
    • Use an array to represent the ordered set of 24 readings
    • Use a floating point number to represent each temperature reading

        temperature is array[24] of float

Array Example: 24 hours of temperature

• Typical use:
    • temperature[0] = 52.5
        • Sets the temperature value for hour 0 to 52.5
    • noon_temp = temperature[12]
        • The variable noon_temp takes the value of the temperature array at hour 12 (noon), 68.2

     0: 52.5     6: 49.8    12: 68.2    18: 68.3
     1: 52.0     7: 51.6    13: 70.4    19: 61.8
     2: 51.7     8: 55.7    14: 72.5    20: 58.0
     3: 51.2     9: 57.2    15: 72.9    21: 56.4
     4: 50.8    10: 61.4    16: 72.1    22: 54.3
     5: 50.1    11: 65.8    17: 70.3    23: 52.6
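The temperature array can be sketched in Python using the standard `array` module, which stores values of a single basic type. One caveat: unlike the fixed-size array described in the slides, a Python `array` can be resized, so the fixed length of 24 here is only a convention the example sticks to.

```python
from array import array

# The 24 hourly readings from the slide, indexed by hour (0 to 23).
readings = [
    52.5, 52.0, 51.7, 51.2, 50.8, 50.1,
    49.8, 51.6, 55.7, 57.2, 61.4, 65.8,
    68.2, 70.4, 72.5, 72.9, 72.1, 70.3,
    68.3, 61.8, 58.0, 56.4, 54.3, 52.6,
]

# "d" means each element is a double-precision floating point number.
temperature = array("d", readings)

temperature[0] = 52.5         # set the value for hour 0
noon_temp = temperature[12]   # retrieve the value for hour 12 (noon)

print(len(temperature))  # 24
print(noon_temp)         # 68.2
```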

2-dimensional arrays

• It is also possible to have arrays of 2 or more dimensions
• Represents tabular data, or matrices
• In this case, have indices for the row and column of the data
• Example:
    • Temperature readings for a year

        temperature is array[365][24] of float

    • Noon_Jan_First_temp = temperature[0][12]
        • The temperature on January 1, at noon
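A 2-dimensional array of this shape can be approximated in Python with a list of rows. This is a sketch only: the constant names are made up, and the single reading stored below reuses the noon value from the one-day example purely for illustration.

```python
DAYS, HOURS = 365, 24

# One row per day, one column per hour, every reading initialised to 0.0.
# The list comprehension creates a separate row object for each day.
temperature = [[0.0] * HOURS for _ in range(DAYS)]

# Row index 0 is January 1; column index 12 is noon.
temperature[0][12] = 68.2
noon_jan_first_temp = temperature[0][12]

print(noon_jan_first_temp)  # 68.2
```

Building the rows with a comprehension matters: writing `[[0.0] * HOURS] * DAYS` would make all 365 rows refer to the same list.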

Array: pros and cons

• Pros
    • Permits fast access to elements of the array
    • Array notation maps well to certain kinds of problems (mathematical matrices)
• Cons
    • Array size is fixed, and cannot grow
    • In many situations, the amount of data is unknowable in advance
        • Example: your music collection. Can you predict how many songs you’ll acquire over your lifetime?
        • For this situation, it would be better to have a representation that can grow or shrink over time

List

• Used to represent sets of variable length
    • It is possible to change the length of a list by adding and removing members
    • Lists are slower than arrays for looking up members
• How do they work
    • List.Add(element)
        • Adds element to end of the list
    • List.Remove(element)
        • Searches list, and removes the first member that matches element
    • List.Insert(position, element)
        • Adds element at the specified position in the list
• Can have a list of any basic data type
    • List of integers, list of floats, list of strings, etc.

List example

• Consider a list of the titles of songs you own
    • This list will grow over time… and may shrink
        • Maybe you delete the Miley Cyrus in your collection?
• Model
    • A set of song titles
• Representation
    • Use a list to represent the set and a string to represent the title

        Songtitles is List of string

    • Songtitles is a list of strings

List example (continued)

• Start with this list, called Songlist
    0, “Poker face”
    1, “Video killed the radio star”
    2, “Rock star”
• Add a song to the list: Songlist.Add(“Beat it”)
    0, “Poker face”
    1, “Video killed the radio star”
    2, “Rock star”
    3, “Beat it”
• Songlist.Remove(“Rock star”)
    0, “Poker face”
    1, “Video killed the radio star”
    2, “Beat it”
• Songlist.Retrieve(1)
    • Gives the value “Video killed the radio star”
• Songlist.Insert(1, “Let’s Go”)
    0, “Poker face”
    1, “Let’s Go”
    2, “Video killed the radio star”
    3, “Beat it”
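The same sequence of list operations can be reproduced with Python's built-in `list`. The mapping of the slide's Add/Remove/Retrieve/Insert onto `append`, `remove`, indexing and `insert` is an assumption made for this sketch.

```python
songlist = ["Poker face", "Video killed the radio star", "Rock star"]

# Add: append to the end of the list.
songlist.append("Beat it")

# Remove: search the list and delete the first matching element.
songlist.remove("Rock star")

# Retrieve: look up the element at position 1.
retrieved = songlist[1]

# Insert: place a new element at position 1, shifting later ones down.
songlist.insert(1, "Let's Go")

print(retrieved)  # Video killed the radio star
for position, title in enumerate(songlist):
    print(position, title)
```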

Linked List

• The typical implementation of a list is as a linked list
• Each element holds a pointer to the next element in the list
• Can also have each element point to both the next and the previous element in the list (permits fast “previous item” capability)

A singly linked list (en.wikipedia.org/wiki/Linked_list)

    0: 12 → 1: 99 → 2: 37

A doubly linked list (en.wikipedia.org/wiki/Linked_list)

    0: 12 ⇄ 1: 99 ⇄ 2: 37
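A singly linked list holding the values 12, 99 and 37 might be sketched as follows. The class and function names (`Node`, `build`, `get`, `to_list`) are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One element of a singly linked list."""
    value: int
    next: Optional["Node"] = None  # pointer to the next element, None at the end


def build(values: List[int]) -> Optional[Node]:
    """Link the values together and return the first node (the head)."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def get(head: Optional[Node], index: int) -> int:
    """Walk the list from the head; there is no way to jump straight to index."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError("index past end of list")
        node = node.next
    if node is None:
        raise IndexError("index past end of list")
    return node.value


def to_list(head: Optional[Node]) -> List[int]:
    """Collect every value by following the next pointers."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


head = build([12, 99, 37])
print(to_list(head))  # [12, 99, 37]
print(get(head, 2))   # 37
```

Reaching position 2 requires following two pointers, which is why lookups in a linked list are slower than in an array.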

History of Linked List

• Linked lists emerged early in computing
    • 1955-1956 by Allen Newell, Cliff Shaw, and Herbert Simon while developing the language IPL
• In 1958, the language LISP (List Processor) was developed at MIT by John McCarthy
    • Made lists (implemented as linked lists) a fundamental part of the language
• Today, most major programming languages provide a built-in list data type, often with multiple variations

List: Pros and Cons

• Pros
    • Can handle a list of any length
• Cons
    • Slower access than with arrays
    • Slower to add elements into a list
    • Notation for accessing elements not as convenient as arrays
        • Some languages allow use of array notation (e.g., list[index]) with lists

Stack and Queue

• Stacks and queues are used to represent ordered sets where elements are added and removed only at the ends
    • With a stack, you access the most recently added element (the “top”)
    • With a queue, you access the least recently added element (the “front”)
    • Removing that element is called “pop” for stacks, “dequeue” for queues
• With a stack, elements can only be added (push) to the top
• With a queue, elements can only be added (enqueue) to the end (the “bottom”)

Figures: Stack (en.wikipedia.org/wiki/Stack_(data_structure)), Queue (en.wikipedia.org/wiki/Queue_(data_structure))

Stack example

• Consider a list of web page addresses (URLs) you have visited in your browser
    • If you hit the “back” button, you would like to go to the last page you visited
• Model
    • A set of ordered web page URLs
• Representation
    • Use a stack to represent the set, and a string to represent each URL

        Visited_pages is stack of string

    • Visited_pages is a stack data structure where each element is a string
    • The Visited_pages structure ensures that the last element added is the first element removed (last-in, first-out, LIFO)

Stack example (continued)

• Assume the starting history is as follows, and you’re at the page www.engadget.com
    0: www.ucsc.edu
    1: games.soe.ucsc.edu
    2: www.acm.org
    • www.engadget.com isn’t added until you go to a new page, at which point it becomes history
• Now, you browse over to Slashdot (www.slashdot.com)
    • Visited_pages.push(“www.engadget.com”)
    0: www.engadget.com
    1: www.ucsc.edu
    2: games.soe.ucsc.edu
    3: www.acm.org
• Then, you decide to hit the back button
    • Visited_pages.pop()
    • Returns: www.engadget.com
    0: www.ucsc.edu
    1: games.soe.ucsc.edu
    2: www.acm.org
    • The browser reloads the returned page (www.engadget.com)
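The browser history example can be sketched with a Python `list` used as a stack. One difference from the slide: the slide draws the top of the stack at position 0, whereas a Python list pushes and pops most efficiently at its end, so the top is the last item here.

```python
# Bottom of the stack first; the top (most recent page) is the last item.
visited_pages = ["www.acm.org", "games.soe.ucsc.edu", "www.ucsc.edu"]

# Leaving www.engadget.com for a new page pushes it onto the history.
visited_pages.append("www.engadget.com")  # push

# Hitting the back button pops the most recently added page.
previous_page = visited_pages.pop()       # pop

print(previous_page)      # www.engadget.com
print(visited_pages[-1])  # www.ucsc.edu is the new top
```

The page pushed last is the page popped first, which is the last-in, first-out behaviour described above.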

Queue example

• Consider people waiting in line at the coffee cart
    • It might be nice if you could just give your name, and they would call you when it’s your turn
• Model
    • An ordered set of customer names
• Representation
    • Use a queue to represent the set, and a string to represent the customer name

        Coffee_line is queue of string

    • Coffee_line is a queue data structure where each element is a string
    • The queue data structure ensures no one will cut in line (first in line, first served; or first-in, first-out, FIFO)

Queue example (continued)

• Assume the line is currently:
    0: “Ada Lovelace”
    1: “Charles Babbage”
    2: “Grace Hopper”
• A new person, Alan Turing, comes to the end of the line
    • Coffee_line.enqueue(“Alan Turing”)
    • Line is now (Turing is added to the end of the queue):
    0: “Ada Lovelace”
    1: “Charles Babbage”
    2: “Grace Hopper”
    3: “Alan Turing”
• Now the next person in line is served
    • Coffee_line.dequeue()
    • Returns “Ada Lovelace”
    • Line is now (Ada came from the front of the queue):
    0: “Charles Babbage”
    1: “Grace Hopper”
    2: “Alan Turing”
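The coffee line can be sketched with `collections.deque` from the Python standard library, which supports efficient additions at one end and removals at the other. Mapping enqueue to `append` and dequeue to `popleft` is a choice made for this illustration.

```python
from collections import deque

coffee_line = deque(["Ada Lovelace", "Charles Babbage", "Grace Hopper"])

# Enqueue: a new customer joins at the end of the line.
coffee_line.append("Alan Turing")

# Dequeue: the customer at the front of the line is served next.
served = coffee_line.popleft()

print(served)             # Ada Lovelace
print(list(coffee_line))  # ['Charles Babbage', 'Grace Hopper', 'Alan Turing']
```

The first person to join is the first to be served, which is the first-in, first-out behaviour described above.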