Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer...

24
112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical overview What is "Computer Science" / "Informatics" ? "Computer Science" – science about a tool? better names would be: "science of computing" or "data processing science" (focuses on activity instead of tool) "Informatics": continental-European for "computer science" - French: "Informatique" (since 1960s) - German: "Informatik" Definition: "Science of the systematical processing of information, especially the automatic processing by use of digital computers". Latin "informare": to give structure to something; to educate; to picture Information: independent fundamental entity of the world besides matter and energy depends on previous knowledge of the receiver of the information various approaches to quantify it we can consider information simply as "interpreted data".

Transcript of Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer...

Page 1: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

112

Part II: Computer Science Essentials 1. Introduction to computer science

Fundamental notions, systematical overview What is "Computer Science" / "Informatics" ?

"Computer Science" – science about a tool?

better names would be: "science of computing" or "data processing science" (focuses on activity instead of tool)

"Informatics": continental-European for "computer science" - French: "Informatique" (since 1960s) - German: "Informatik"

Definition: "Science of the systematical processing of information, especially the automatic processing by use of digital computers".

Latin "informare": to give structure to something; to educate; to picture

Information: • independent fundamental entity of the world

besides matter and energy • depends on previous knowledge of the receiver

of the information • various approaches to quantify it • we can consider information simply as

"interpreted data".

Page 2: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

113

Data: represented information (e.g. text in a book, magnetic patterns on a harddisk, ...) But: Hermeneutics – "the art of interpretation" – is not part of informatics, despite its name. Social and cultural aspects of information are largely ignored. "Computer": comes from "to compute" = "to calculate". "Algorithm": The word comes from the Persian textbook writer Abu Ja'far Mohammed ibn Mûsâ al-Khowârizmî ( = "father of Ja'far Mohammed, son of Moses, coming from Khowârizm" – a town in Usbekistan, today called Khiva.)

Al-Khowârizmî lived in Bhagdad, in the "House of Wisdom" wrote book about calculation: "Kitab al jabr w'al-muqabala" (= "rules of reconstitution and reduction") – here the word "algebra" comes from!

Modern meaning of "algorithm":

Finite set of rules which specify a sequence of operations in order to solve a certain problem, with the following properties:

Page 3: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

114

1. Termination: An algorithm must come to an end after a finite number of steps.

2. Definitness: Each step must be defined precisely.

3. Input: An algorithm can need input values (e.g. numbers).

4. Output: An algorithm must give one or more output values.

5. Feasibility: An algorithm must be feasible; e.g., no intermediate step must depend on the solution of some still unsolved mathematical problem.

(after Knuth 1973)

"Programme" (in American English: "program"):

Version of an algorithm which can be read, interpreted and carried out by a computer.

Programming languages were designed to write precise programmes (more precise than possible in our natural language!) suitable for computers. Some notes concerning the history of programming:

Page 4: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

115

Early phases of computer history: Hardware (= the machines) was in focus (reason for the name "computer science") Later: Software (= programmes) increasingly important, increasingly expensive in comparison to hardware.

Later: so-called "high-level languages" - more abstraction - better readability for humans - trying to integrate traditional mathematical notations - platform-independent (not specific to certain machine)

FORTRAN (1954), COBOL (1958), LISP (1960), Pascal (1971), C (1971), C++ (extension of C, 1992), Java (1995), XL (2008) ...

(later more about programming)

Page 5: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

116

Subject areas of computer science

Page 6: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

117

Page 7: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

118

How does one organize the development process of large software systems? ("Software engineering")

Page 8: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

119

Informatics in the social context: What ethical questions arise from the use of computers, and how can they be answered? (data security, privacy questions, computer viruses, hackers, violence-promoting games, software piracy, ownership of software and ideas, the open-source idea, use of information technology for warfare, for crime, for sexual exploitation, for terrorism...) How does the use of computers influence our way of thinking (about the world, about humans, about the mind, about personal relationships of people...)? How can computers, the Web and the "Web 2.0" (Facebook, Twitter, Wikipedia etc.) be used to improve education / autonomy of people / human rights / political participation... ?

Page 9: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

120

2. Representation and measurement of information In digital computers and media, all data are represented by combinations of only 2 elementary states: 0 and 1 (can be "charged" / "not charged", "on" / "off", "magnetized" / "not magnetized", "open" / "closed", "high current" / "low current", "plus" / "minus" etc.) The smallest amount of information is thus the bit (binary digit). It expresses which of two alternatives is the case. The alternatives are often written 0 and 1, or (sometimes) 0 and L. n bits: represent one out of 2n alternatives. Codes To represent information in a computer, we must encode all with the two symbols 0 and 1 ! What is a code ?

Code (1): A mapping f: A → B from a set A of elements to be stored or transferred to a set B used for storage or transfer.

Code (2): The set B from definition (1).

Page 10: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

121

Example:

For digital computers, we need binary codes: B is a set of combinations of 0 and 1.

Page 11: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

122

Examples:

Multiples of bits Bits seldom occur as singles. Certain multiples of bits are used as units for information (storage) capacity. 1 Byte: 8 bits (can represent 1 of 28 = 256 alternatives).

Example: one of the integer numbers between –128 and +127. 1 Halfbyte: 4 bits.

Page 12: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

123

Typically, memory stores are built for multiples of bytes. Prefixes: kilo, mega, giga, tera, peta, exa

- used in physics for the factors 103, 106, 109, 1012, 1015,1018 - in computer science often used for the factors 210, 220, 230, 240, 250, 260, which are slightly larger abbre-viation

meaning factor

KB MB GB TB PB EB

Kilobytes Megabytes Gigabytes Terabytes Petabytes Exabytes

210 = 1024 220 = 1,048,576 230 = 1,073,741,824 240 = 1,099,511,627,776 250 = 1,125,899,906,842,624 260 = 1,152,921,504,606,846,976

Representation of numbers in the computer For positive integers, basically the binary number system is used (cf. Part I, Chapter 6). But: Numbers are usually stored in sections of memory of fixed size (for reasons of organization of memory access in the computer).

Page 13: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

124

Integer representation in finite cells ("words" with fixed length):

Example for the "Two's complement": 8-bit two's complement representation of –77 1. Represent +77 as a binary number: 1001101 2. Invert all bits, including the leading 0s: ...1110110010

Page 14: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

125

3. Add 1: ...1110110011 4. Use only the lowest (= rightmost) 8 bits: 10110011 Notice: For 16-bit cells, the result would be 1111111110110011.

Properties of the two's complement:

Page 15: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

126

Floating-point representations

Built analogously to the "scientific representation" of numbers in the form m * 10e

- but using the binary system:

Page 16: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

127

Digital representation of text based on representation of letters - depending on the alphabet: certain number of bits necessary - for 26 letters: at least 5 bits necessary (24 = 16 < 26, 25 = 32 > 26) - but also encoding of digits, special signs, upper- and lower-case letters... desirable traditional 7-bit code: ASCII (= American Standard Code for Information Interchange) ISO-646 norm later extended to 8-bit code examples: 00110000 = hex 30 = 4810 = digit 0

Page 17: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

128

00110001 = hex 31 = 4910 = digit 1 ... 00111010 = hex 3A = 5810 = ':' ... 01000001 = hex 41 = 6510 = 'A' 01000010 = hex 42 = 6610 = 'B' ... 01100001 = hex 61 = 9710 = 'a' ... ASCII Table:

Non-printable characters Dez Okt Hex Char Code Remark

0 000 0x00 Ctrl-@ NUL Null prompt

1 001 0x01 Ctrl-A SOH Start of heading

2 002 0x02 Ctrl-B STX Start of text

3 003 0x03 Ctrl-C ETX End of Text

4 004 0x04 Ctrl-D EOT End of transmission

5 005 0x05 Ctrl-E ENQ Enquiry

6 006 0x06 Ctrl-F ACK Acknowledge

7 007 0x07 Ctrl-G BEL Bell

8 010 0x08 Ctrl-H BS Backspace

9 011 0x09 Ctrl-I HT Horizontal tab

10 012 0x0A Ctrl-J LF Line feed

11 013 0x0B Ctrl-K VT Vertical tab

12 014 0x0C Ctrl-L FF Form feed

NP New page

13 015 0x0D Ctrl-M CR Carriage return

14 016 0x0E Ctrl-N SO Shift out

15 017 0x0F Ctrl-O SI Shift in

16 020 0x10 Ctrl-P DLE Data link escape

17 021 0x11 Ctrl-Q DC1 X-ON

18 022 0x12 Ctrl-R DC2

Printable characters Dez Okt Hex Char Remark

32 040 0x20 blank

33 041 0x21 ! exclamation mark

34 042 0x22 " quotation mark

35 043 0x23 #

36 044 0x24 $ Dollar character

37 045 0x25 %

38 046 0x26 &

39 047 0x27 ' apostroph

40 050 0x28 (

41 051 0x29 )

42 052 0x2A * asterisk

43 053 0x2B + plus sign

44 054 0x2C , comma

45 055 0x2D - minus sign

46 056 0x2E . dot

47 057 0x2F / slash

48 060 0x30 0

49 061 0x31 1

50 062 0x32 2

51 063 0x33 3

Page 18: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

129

19 023 0x13 Ctrl-S DC3 X-Off

20 024 0x14 Ctrl-T DC4

21 025 0x15 Ctrl-U NAK No achnowledge

22 026 0x16 Ctrl-V SYN Synchronous idle

23 027 0x17 Ctrl-W ETB End transmission blocks

24 030 0x18 Ctrl-X CAN Cancel

25 031 0x19 Ctrl-Y EM End of medium

26 032 0x1A Ctrl-Z SUB Substitute

27 033 0x1B Ctrl-[ ESC Escape

28 034 0x1C Ctrl-\ FS File separator

29 035 0x1D Ctrl-] GS Group separator

30 036 0x1E Ctrl-^ RS Record separator

31 027 0x1F Ctrl-_ US Unit separator

127 0177 0x7F DEL Delete or rubout

52 064 0x34 4

53 065 0x35 5

54 066 0x36 6

55 067 0x37 7

56 070 0x38 8

57 071 0x39 9

58 072 0x3A : colon

59 073 0x3B ; semicolon

60 074 0x3C < less than

61 075 0x3D = euqality character

62 076 0x3E > greater than

63 077 0x3F ? interrogation mark

64 0100 0x40 @ at

65 0101 0x41 A

66 0102 0x42 B

67 0103 0x43 C

68 0104 0x44 D

69 0105 0x45 E

70 0106 0x46 F

71 0107 0x47 G

72 0110 0x48 H

73 0111 0x49 I

74 0112 0x4A J

75 0113 0x4B K

76 0114 0x4C L

77 0115 0x4D M

78 0116 0x4E N

79 0117 0x4F O

80 0120 0x50 P

81 0121 0x51 Q

82 0122 0x52 R

83 0123 0x53 S

84 0124 0x54 T

85 0125 0x55 U

86 0126 0x56 V

87 0127 0x57 W

88 0130 0x58 X

89 0131 0x59 Y

90 0132 0x5A Z

91 0133 0x5B [

92 0134 0x5C \ backslash

93 0135 0x5D ]

94 0136 0x5E ^ caret

95 0137 0x5F _ low line

96 0140 0x60 ` back quote

97 0141 0x61 a

98 0142 0x62 b

99 0143 0x63 c

100 0144 0x64 d

101 0145 0x65 e

102 0146 0x66 f

Page 19: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

130

103 0147 0x67 g

104 0150 0x68 h

105 0151 0x69 i

106 0152 0x6A j

107 0153 0x6B k

108 0154 0x6C l

109 0155 0x6D m

110 0156 0x6E n

111 0157 0x6F o

112 0160 0x70 p

113 0161 0x71 q

114 0162 0x72 r

115 0163 0x73 s

116 0164 0x74 t

117 0165 0x75 u

118 0166 0x76 v

119 0167 0x77 w

120 0170 0x78 x

121 0171 0x79 y

122 0172 0x7A z

123 0173 0x7B {

124 0174 0x7C |

125 0175 0x7D }

126 0176 0x7E ~

ASCII not sufficient for alphabets of the non-Anglo-american world (not even for European alphabets with ä, ö, ü, ß, é, ø, ñ, å...) Unicode: 2 byte (= 16 bit) code for multilingual text processing - can represent 65536 characters amongst them: 27786 Chinese-Japanese-Korean characters 11172 Hangul characters (Korean) ancient Nordic runes Tibetan characters Cherokee characters ... complete list see http://www.unicode.org/charts/

Page 20: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

131

Unicode "Escape sequence" (to utilise it in the pro-gramming language Java): e.g., \u0041 = 'A' (0041 = hexadecimal representation) Some characters occur more frequently in texts than others: better use variable-length code UTF-8: Universal Transformation Format Characters encoded with variable number of bytes ⇒ for texts with many ASCII characters (like on many web pages) shorter as Unicode Strings (or words): sequences of characters encoded by sequences of the corresponding code words Digital representation of pictures Gray levels: encode each gray level by a number from a fixed interval (e.g. 0, 1, ..., 255: 8-bit representation – 0 = black, 255 = white) Colours: several colour models possible the most frequently used one: RGB model (red / green / blue: primary colours for additive colour composition) Each colour from a certain range ("gamut") can be mixed from these primary colours

examples with 8-bit intensities:

Page 21: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

132

black (0, 0, 0) white (255, 255, 255) medium gray (127, 127, 127) red (255, 0, 0) green (0, 255, 0) blue (0, 0, 255) light blue (127, 127, 255) yellow (255, 255, 0)

Pictures: typically represented as raster images – rectangular array (matrix) of pixels, each pixel represented by its 3 colour values. Representation of text documents (book pages, web pages...) Level of representation is important. (1) Is there text on the page? – One bit. (2) What is the text on the page? – Representation of letter sequence (e.g., string of ASCII characters). (3) What is the exact layout of the text on the page? – "formatted text" - use special characters for formatting, or - represent the page by a rasterized black-and-white image.

Page 22: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

133

Text documents with graphical elements:

- represent all as a single raster image, or

- use combined representation: several data files, one for the text, the other for the pictorial parts → HTML web pages are built like this

file <name>.html or <name>.htm contains text, layout information and links to other pages files <name>.gif or <name>.jpg or <name>.png contain images Messages and redundancy

Page 23: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

134

Other example: ISBN code (International Standard Book Number) – last character is a parity character

Entropy and quantification of information

Shannon's information theory: Information as a measurable, statistical property of signals How can we measure information and redundancy of characters in a message?

Assumption: N-character alphabet { x1, x2, ..., xN }

Number of bits per character: H0 = log2 N

(Remember: log2 N = (log N)/(log 2) )

Information content of a single character xi : )(1

log2ixp

Page 24: Part II: Computer Science Essentials - Fakultät für ...wkurth/csm16_v06.pdf112 Part II: Computer Science Essentials 1. Introduction to computer science Fundamental notions, systematical

135

Here, p(xi) is the probability of xi . Entropy = average value of information content of all characters

= H = ∑=

∗N

k kk xp

xp1

2 )(1

log)(

Binary encoding needs at least, on average, H bits per character. Redundancy: R = H0 – H.