Characters, Strings, Basic Data Structures
UC Santa Cruz
CMPS 10 – Introduction to Computer Science
www.soe.ucsc.edu/classes/cmps010/
[email protected]
April 2011
UC SANTA CRUZ
Class website
http://www.soe.ucsc.edu/classes/cmps010/Spring11/
Please write this down, and bookmark it
Holds:
  Syllabus (including homework due dates)
  Homework assignment descriptions
  Description of course readings
  Links to class lecture notes
The final exam is scheduled for Tuesday, June 7, 8am-11am
  This class will have a final exam. Please plan on this.
Tutoring available
Learning Support Services (LSS)
  Has tutoring available for students in CMPS 10
  Students meet in small groups, led by a tutor
  Students are eligible for up to one hour of tutoring per week per course, and may sign up for tutoring at https://eop.sa.ucsc.edu/OTSS/tutorsignup/ beginning April 5th at 10:00am.
  Brett Care - [email protected] is the tutor for CMPS 10 that LSS has hired
Abstraction and Models
Converting the real world into data:
  Create a model of the real world
  Represent that model in data
How do you model the real world?
  Involves a process called abstraction
Abstraction
  Prerequisite: know your problem or application
  Focus on aspects of the real world that are important to the problem
    Add those elements to your model
  Omit elements of the real world that aren’t relevant
  Implies: the same real world scenario can be modeled in many ways, depending on the problem at hand
[Figure: physical world → (abstraction) → model → (representation) → data (inside computer)]
Representing models as data
Most models can be represented using:
  Basic data types
    Integers
    Floating point
    Boolean
    Characters
    Strings
  Basic data structures
    Arrays
    Lists
    Stacks/Queues
    Trees
    Graphs
Boolean
A boolean data type represents true or false
  This is represented as a 1 (true) or a 0 (false)
How much space does a boolean require? It varies.
  The minimum required space is 1 bit
  However, typically a boolean is stored in an entire byte
    8 bits, as in the C# language
    Only one bit is used: 00000001 = true, 00000000 = false
  … or in an integer
    16 or 32 bits, as in older versions of the C language (before C99), which lacked a standard boolean type
    0000000000000001 = true (16 bits)
    0000000000000000 = false (16 bits)
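The bit patterns above can be reproduced with a short sketch. It uses Python's standard `struct` module to lay a boolean out in one byte and in a 16-bit integer; the helper `bits` and the variable names are illustrative assumptions, and the sketch shows only how `struct` packs the values, not how any particular language stores booleans internally.

```python
import struct

# Pack True/False into a single byte ("?" is struct's one-byte boolean format).
true_byte = struct.pack("?", True)
false_byte = struct.pack("?", False)

# Pack 1 and 0 into 16-bit unsigned integers, most significant byte first.
true_16 = struct.pack(">H", 1)
false_16 = struct.pack(">H", 0)


def bits(data: bytes) -> str:
    """Return the given bytes as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in data)


print(bits(true_byte))   # 00000001
print(bits(false_byte))  # 00000000
print(bits(true_16))     # 0000000000000001
print(bits(false_16))    # 0000000000000000
```

In every case only the lowest bit carries information; the remaining bits are padding.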
Character
A single letter, number, punctuation, symbol, etc.
Historically, in the US, characters were represented using the US-ASCII code (uses 7 bits of an 8-bit byte)
This was superseded by ISO/IEC 8859
  Provided support for special characters used in specific languages, along with accented characters
  Examples: ß (German), ñ (Spanish), å (Swedish and other Nordic languages) and ő (Hungarian)
  But it didn’t handle representation of ideographic languages
    Led to development of many standards for this specific purpose, and for other languages not covered by ISO/IEC 8859
Historically, a character of storage meant one byte (8 bits)
  This still holds true in many discussions today
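As a rough illustration of these one-byte encodings, the sketch below uses Python's built-in codecs. The codec names `ascii`, `iso-8859-1` and `iso-8859-2` are real Python codec names; the variable names are assumptions made for the example.

```python
# US-ASCII: every character maps to a number below 128 (fits in 7 bits).
ascii_code = ord("A")
print(ascii_code)                      # 65
print("A".encode("ascii"))             # b'A'

# ISO/IEC 8859-1 (Latin-1) adds accented Western European characters,
# still using exactly one byte per character.
latin1_n = "\u00f1".encode("iso-8859-1")   # "\u00f1" is ñ
print(len(latin1_n))                   # 1

# ő ("\u0151") is not part of ISO/IEC 8859-1, so encoding it fails there.
try:
    "\u0151".encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# Part 2 of the standard (ISO/IEC 8859-2) does include it, again as one byte.
latin2_o = "\u0151".encode("iso-8859-2")
print(in_latin1, len(latin2_o))        # False 1
```

The failed encoding shows why several parts of ISO/IEC 8859 exist: each part covers a different group of languages within the same 256 byte values.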
UNICODE
Today, the UNICODE standard is rapidly becoming the standard
  Can represent every character in every human language with an alphabet
  Contains more than 109,000 characters covering 93 scripts
  Initial idea comes from Joe Becker and Mark Davis in 1987
  Today, maintained by the UNICODE Consortium
Each character has a unique numeric identifier (a code point) that fits in 32 bits
  But 32 bits per character is a lot of space
  So, there are multiple encodings
    UTF-8: most popular, maximizes backward compatibility with US-ASCII; 8 bits (1 byte) for US-ASCII characters, more bytes for other scripts (variable width)
    UTF-16: most common scripts use 16 bits, less common ones use 32 bits (variable width)
    UTF-32/UCS4: each character uses 32 bits (4 bytes)
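To make the width differences concrete, here is a minimal sketch that counts the bytes each encoding needs for a few sample characters. The sample characters and the helper `encoded_lengths` are assumptions chosen for illustration; the `-le` codec variants are used so that no byte-order mark is added to the counts.

```python
# Sample characters: A, ñ, € and an emoji (U+1F600), written as escapes.
samples = ["A", "\u00f1", "\u20ac", "\U0001F600"]


def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes used by each encoding for one character."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
# U+0041 {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# U+00F1 {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
# U+20AC {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# U+1F600 {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

UTF-8 stays at one byte for US-ASCII and grows as needed, UTF-16 uses two bytes for common characters and four for rarer ones, and UTF-32 always uses four.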
One of the great unsung achievements of computer science
Strings
A string is a sequence of characters
  “Hello, world!” is the most famous string.
Two main ways to represent:
  A sequence of characters, ended with a 0 (null character)
    H e l l o ,   W o r l d ! \0   (\0 = null character)
  A length, and then that many characters
    13 H e l l o ,   W o r l d !
Each character is represented according to some character encoding (UTF-8, UTF-16, US-ASCII, etc.)
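The two representations can be sketched with raw bytes. This is an illustration under stated assumptions: the text is encoded as US-ASCII, the length is stored in a single leading byte (so at most 255 characters), and the function names are hypothetical.

```python
text = "Hello, World!"
encoded = text.encode("ascii")            # one byte per character

# Representation 1: the characters followed by a null (zero) byte.
null_terminated = encoded + b"\x00"

# Representation 2: a one-byte length followed by that many characters.
length_prefixed = bytes([len(encoded)]) + encoded


def read_null_terminated(data: bytes) -> str:
    """Read characters up to (not including) the first zero byte."""
    end = data.index(0)
    return data[:end].decode("ascii")


def read_length_prefixed(data: bytes) -> str:
    """Read the length from the first byte, then that many characters."""
    length = data[0]
    return data[1:1 + length].decode("ascii")


print(length_prefixed[0])                     # 13
print(read_null_terminated(null_terminated))  # Hello, World!
print(read_length_prefixed(length_prefixed))  # Hello, World!
```

Both forms take 14 bytes here. The null-terminated form must be scanned to find its length, while the length-prefixed form knows its length immediately but is limited by the size of the length field.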
Modeling and sets
When modeling the real world, there is a need to model sets of things
  Can also think of this as a group of things, collection of things, etc.
  Examples:
    The temperature at my house measured every hour over a day
    All of the songs in my music collection
    All of the houses on a street
    All of the people in my family
    People standing in line at a restaurant
Frequently, these sets have a natural order
  Temperature over a day:
    First temperature reading at hour 0, then the second at hour 1, etc.
  Houses on a street:
    Order by house number
Representing sets
Many different data structures have been developed to represent sets
  Array
    A set with fixed length
    Elements can be added anywhere
    Can go directly to any element
  Lists
    A set with variable length
    Elements can be added anywhere
    Need to search list for specific element
  Stack/queue
    A set with variable length
    Elements can be added only at beginning (stack) or end (queue)
    Can only retrieve element from beginning (stack/queue)
Arrays
Used to represent sets of fixed length
  Also represents mathematical vectors and matrices of fixed size
  Once set, cannot change the size of an array (biggest limitation)
  But, this limitation permits fast lookup of values (biggest strength)
How do they work (1-dimensional)?
  Given an integer index, can:
    Retrieve an element of the array: array[index] → value
    Set an element of the array: value → array[index]
Can have an array composed of any basic data type
  Array of integers, array of floats, array of strings, etc.
Array example
Consider a set of temperature values at a location, with temperature readings taken once every hour
  Have a total of 24 readings each day, and this won’t change
Model for one day of readings
  A set of 24 ordered temperature readings
Representation
  Use an array to represent the ordered set of 24 readings
  Use a floating point number to represent each temperature reading
  temperature is array[24] of float
Array Example: 24 hours of temperature
Typical use:
  temperature[0] = 52.5
    Sets the temperature value for hour 0 to 52.5
  noon_temp = temperature[12]
    The variable noon_temp takes the value of the temperature array at hour 12 (noon), 68.2
Contents of the temperature array (index: value):
   0: 52.5     6: 49.8    12: 68.2    18: 68.3
   1: 52.0     7: 51.6    13: 70.4    19: 61.8
   2: 51.7     8: 55.7    14: 72.5    20: 58.0
   3: 51.2     9: 57.2    15: 72.9    21: 56.4
   4: 50.8    10: 61.4    16: 72.1    22: 54.3
   5: 50.1    11: 65.8    17: 70.3    23: 52.6
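The temperature array can be sketched in Python with the standard `array` module, which stores values of a single basic type. One caveat, stated as an assumption of this sketch: Python's `array` can grow, so the fixed length of 24 is kept here only by convention, unlike the fixed-size array the slides describe.

```python
from array import array

# "temperature is array[24] of float": 24 double-precision slots, all 0.0.
temperature = array("d", [0.0] * 24)

# The 24 hourly readings from the slide, in hour order (0 through 23).
readings = [
    52.5, 52.0, 51.7, 51.2, 50.8, 50.1,
    49.8, 51.6, 55.7, 57.2, 61.4, 65.8,
    68.2, 70.4, 72.5, 72.9, 72.1, 70.3,
    68.3, 61.8, 58.0, 56.4, 54.3, 52.6,
]

# Set each element by index: temperature[hour] = value.
for hour, value in enumerate(readings):
    temperature[hour] = value

# Retrieve an element directly by index, with no searching.
noon_temp = temperature[12]
print(noon_temp)  # 68.2
```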
2-dimensional arrays
It is also possible to have arrays with two or more dimensions
  Represents tabular data, or matrices
  In this case, have indices for the row and column of the data
Example: Temperature readings for a year
  temperature is array[365][24] of float
  noon_jan_first_temp = temperature[0][12]
    The temperature on January 1, at noon
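A two-dimensional array can be approximated in Python as a list of rows, as in the sketch below. The constants `DAYS` and `HOURS` and the stored reading of 68.2 are assumptions made for illustration; the slides do not give data for a full year.

```python
DAYS, HOURS = 365, 24

# One row per day, one column per hour, every reading initialised to 0.0.
# The list comprehension creates a separate row object for each day.
temperature = [[0.0] * HOURS for _ in range(DAYS)]

# Row index 0 is January 1; column index 12 is noon.
temperature[0][12] = 68.2

noon_jan_first_temp = temperature[0][12]
print(noon_jan_first_temp)   # 68.2
print(temperature[1][12])    # 0.0 (January 2 is a different row)
```

Writing `[[0.0] * HOURS] * DAYS` instead would repeat the same row object 365 times, so setting one day's reading would appear to change every day.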
Array: pros and cons
Pros
  Permits fast access to elements of the array
  Array notation maps well to certain kinds of problems (mathematical matrices)
Cons
  Array size is fixed, and cannot grow
    In many situations, the amount of data is unknowable in advance
  Example: Your music collection.
    Can you predict how many songs you’ll acquire over your lifetime?
    For this situation, would be better to have a representation that can grow or shrink over time
List
Used to represent sets of variable length
  It is possible to change the length of a list by adding and removing members
  Are slower than arrays for looking up members
How do they work?
  List.add(element)
    Adds element to end of the list
  List.remove(element)
    Searches list, and removes first one that matches element
  List.insert(position, element)
    Adds element at specified position in list
Can have a list of any basic data type
  List of integers, list of floats, list of strings, etc.
List example
Consider a list of the titles of songs you own
  This list will grow over time… and may shrink
    Maybe you delete the Miley Cyrus in your collection?
Model
  A set of song titles
Representation
  Use a list to represent the set and a string to represent the title
  Songtitles is List of string
    Songtitles is a list of strings
List example
Start with this list, called Songlist
  0, “Poker face”
  1, “Video killed the radio star”
  2, “Rock star”
Add a song to the list
  Songlist.Add(“Beat it”)
  0, “Poker face”
  1, “Video killed the radio star”
  2, “Rock star”
  3, “Beat it”
Songlist.Remove(“Rock star”)
  0, “Poker face”
  1, “Video killed the radio star”
  2, “Beat it”
Songlist.Retrieve(1)
  Gives the value, “Video killed the radio star”
Songlist.Insert(1, “Let’s Go”)
  0, “Poker face”
  1, “Let’s Go”
  2, “Video killed the radio star”
  3, “Beat it”
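The same sequence of operations can be tried with Python's built-in list, as a rough equivalent. The mapping of the slide's operations onto Python is an assumption of this sketch: Add becomes `append`, Remove becomes `remove`, Retrieve becomes indexing, and Insert becomes `insert`.

```python
songlist = ["Poker face", "Video killed the radio star", "Rock star"]

# Add: appends to the end of the list.
songlist.append("Beat it")
after_add = list(songlist)

# Remove: searches the list and removes the first matching element.
songlist.remove("Rock star")
after_remove = list(songlist)

# Retrieve: look up the element at position 1.
retrieved = songlist[1]

# Insert: places the element at position 1 and shifts later elements down.
songlist.insert(1, "Let's Go")

print(after_add)
print(after_remove)
print(retrieved)   # Video killed the radio star
print(songlist)
```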
Linked List
A common implementation of a list is a linked list
  Each element holds a pointer to the next element in the list
  Can also have each element point to both the next and the previous element in the list (permits fast “previous item” capability)
A singly linked list (en.wikipedia.org/wiki/Linked_list)
  0: 12, 1: 99, 2: 37
A doubly linked list (en.wikipedia.org/wiki/Linked_list)
  0: 12, 1: 99, 2: 37
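A minimal sketch of a singly linked list holding the values 12, 99 and 37 follows. The class and method names (`Node`, `LinkedList`, `add`, `retrieve`, `to_list`) are hypothetical and chosen to echo the list operations in these slides; this is one possible implementation, not a standard library type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element: a value plus a pointer to the next element (or None)."""
    value: int
    next: Optional["Node"] = None


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def add(self, value: int) -> None:
        """Add a value at the end by walking to the last node."""
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def retrieve(self, position: int) -> int:
        """Follow the pointers from the head, one hop per position."""
        current = self.head
        for _ in range(position):
            if current is None:
                raise IndexError("position out of range")
            current = current.next
        if current is None:
            raise IndexError("position out of range")
        return current.value

    def to_list(self) -> list:
        """Collect all values in order, for display."""
        values, current = [], self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values


numbers = LinkedList()
for n in (12, 99, 37):
    numbers.add(n)

print(numbers.to_list())    # [12, 99, 37]
print(numbers.retrieve(2))  # 37
```

Reaching position 2 requires following two pointers from the head, which is why lookups in a linked list are slower than array indexing.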
History of Linked List
Linked lists emerged early in computing
  1955-1956 by Allen Newell, Cliff Shaw, Herbert Simon while developing language IPL
  In 1958, the language LISP (List Processor) was developed at MIT by John McCarthy
    Made lists (implemented as linked lists) a fundamental part of the language
Today, most major programming languages provide a built-in list data type, often with multiple variations
List: Pros and Cons
Pros
  Can handle a list of any length
Cons
  Slower access than with arrays
  Slower to add elements into a list
  Notation for accessing elements not as convenient as arrays
    Some languages allow use of array notation (e.g., list[index]) with lists
Stack and Queue
Stacks and queues are used to represent ordered sets where you typically access only the element at one end (the “top”)
  For a stack, the top is the most recently added element; for a queue, it is the element that has been waiting longest
  Accessing the top element is called “pop” for stacks, “dequeue” for queues
With a stack, the elements can only be added (push) to the top
With a queue, the elements can only be added (enqueue) to the bottom
[Figure: Stack (en.wikipedia.org/wiki/Stack_(data_structure))]
[Figure: Queue (en.wikipedia.org/wiki/Queue_(data_structure))]
Stack example
Consider a list of web page addresses (URLs) you have visited in your browser
  If you hit the “back” button, you would like to go to the last page you visited
Model
  A set of ordered web page URLs
Representation
  Use a stack to represent the set, and a string to represent each URL
  Visited_pages is stack of string
    Visited_pages is a stack data structure where each element is a string
    The Visited_pages structure ensures that the last element added is the first element removed (last-in, first-out, LIFO)
Stack example
Assume the starting history is as follows and you’re at the page www.engadget.com
  0: www.ucsc.edu
  1: games.soe.ucsc.edu
  2: www.acm.org
  (www.engadget.com isn’t added until you go to a new page, and it becomes history)
Now, you browse over to Slashdot (www.slashdot.com)
  Visited_pages.push(“www.engadget.com”)
  0: www.engadget.com
  1: www.ucsc.edu
  2: games.soe.ucsc.edu
  3: www.acm.org
Then, you decide to hit the back button
  Visited_pages.pop()
  Returns: www.engadget.com
  0: www.ucsc.edu
  1: games.soe.ucsc.edu
  2: www.acm.org
  Browser reloads this page
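The browsing history can be sketched with a Python list used as a stack. Two assumptions of this sketch: `append` plays the role of push, and the top of the stack is the end of the Python list, whereas the slide numbers the top as position 0. The variable `current_page` is a hypothetical name for the page being displayed.

```python
# History stack; the last item in the Python list is the top of the stack.
visited_pages = ["www.acm.org", "games.soe.ucsc.edu", "www.ucsc.edu"]
current_page = "www.engadget.com"

# Browsing to a new page pushes the page you are leaving onto the stack.
visited_pages.append(current_page)        # push
current_page = "www.slashdot.com"
top_after_push = visited_pages[-1]

# The back button pops the most recently added page (last in, first out).
current_page = visited_pages.pop()        # pop

print(top_after_push)   # www.engadget.com
print(current_page)     # www.engadget.com
print(visited_pages)    # ['www.acm.org', 'games.soe.ucsc.edu', 'www.ucsc.edu']
```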
Queue example
Consider people waiting in line at the coffee cart
  It might be nice if you could just give your name, and they would call you when it’s your turn
Model
  An ordered set of customer names.
Representation
  Use a queue to represent the set, and a string to represent the customer name
  Coffee_line is queue of string
    Coffee_line is a queue data structure where each element is a string
    The queue data structure ensures no one will cut in line (first in line, first served; or first-in, first-out, FIFO)
Queue example
Assume the line is currently:
  0: “Ada Lovelace”
  1: “Charles Babbage”
  2: “Grace Hopper”
A new person, Alan Turing, comes to the end of the line
  Coffee_line.enqueue(“Alan Turing”)
  Line is now (Turing is added to the end of the queue)
  0: “Ada Lovelace”
  1: “Charles Babbage”
  2: “Grace Hopper”
  3: “Alan Turing”
Now the next person in line is served
  Coffee_line.dequeue()
  Returns “Ada Lovelace”
  Line is now (Ada was removed from the front of the queue)
  0: “Charles Babbage”
  1: “Grace Hopper”
  2: “Alan Turing”
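The coffee line can be sketched with `collections.deque` from Python's standard library. The mapping is an assumption of this sketch: `append` stands in for enqueue and `popleft` for dequeue.

```python
from collections import deque

# The front of the line is the left end of the deque.
coffee_line = deque(["Ada Lovelace", "Charles Babbage", "Grace Hopper"])

# Enqueue: a new customer joins at the end of the line.
coffee_line.append("Alan Turing")
line_after_enqueue = list(coffee_line)

# Dequeue: the customer at the front is served first (first in, first out).
served = coffee_line.popleft()

print(line_after_enqueue)
print(served)              # Ada Lovelace
print(list(coffee_line))   # ['Charles Babbage', 'Grace Hopper', 'Alan Turing']
```

A deque is used instead of a plain list because removing from the front of a Python list shifts every remaining element, while `popleft` on a deque does not.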