Too much information running through my brain.. We live in the information age. Knowledge comes from...

I Need More Data

Too much information running through my brain.

What is 'data'?

• We live in the information age.
• Knowledge comes from careful investigation of information.
• Information is represented/encoded as data.

– What information is represented by an abacus? How?
– What information is represented on a DVD? How?
– What information is encoded on a credit card? How?

DATA: The quantities, characters, or symbols on which operations are performed by a computer.

How can real-world information become data?

• How can a picture or a sound or a temperature reading become data?

• Data comes in two types:
– Continuous: can take any value within a range (infinitely many possible values)
– Discrete: can take only a limited set of separate values (points/choices)
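A temperature reading is continuous, but a computer can only store one of a limited set of values. The Python sketch below shows one common way to make that conversion (quantization). The function name `quantize`, the sensor range of -40 to 60 °C, and the 8-bit size are illustrative assumptions, not part of the slides.

```python
def quantize(reading: float, low: float, high: float, bits: int) -> int:
    """Map a continuous reading in [low, high] to one of 2**bits discrete levels."""
    levels = 2 ** bits
    # Clamp so readings outside the sensor range land on the first or last level.
    reading = min(max(reading, low), high)
    # Scale to 0 .. levels - 1 and round to the nearest whole level.
    return round((reading - low) / (high - low) * (levels - 1))


# Assumed sensor range of -40 to 60 degrees Celsius, stored in 8 bits (256 levels).
level = quantize(21.7, -40.0, 60.0, 8)
print(level, format(level, "08b"))  # 157 10011101
```

Any reading between two levels is rounded to the nearer one, so a little information is lost; using more bits makes the levels closer together.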

Continuous or Discrete?

• How long might it take a light bulb to burn out?

• What was your ACT score?
• How tall are you?
• How many books did you read this year?
• How much water did you drink this week?
• How many gen-ed courses have you taken at UW-L?

Continuous/Discrete

• In electronics, signals are known as either
– analog (meaning a continuous signal), or
– digital (meaning a discrete signal)

Analog or Digital?

Why are computers digital?

• Information needs to be encoded in a form that can be processed.
– Electrical signals can be processed.
– Even analog signals can be processed, but digital signals are simpler to work with.
• In computers, there are two discrete (digital) signals: on and off. It's easy to tell whether an electrical signal is on or off:
– Electric fence
– Electric socket
– Light bulb

On or Off

http://www.flickr.com/photos/tudor/31803307/sizes/o/in/photostream/
http://www.flickr.com/photos/my-other-eye/5300224495/sizes/z/in/photostream/

What is a bit?

• Bit: short for "binary digit". A bit is the smallest (atomic) unit of computer data.
– A bit is either ON or OFF.
– You can think of a bit as an extremely small battery that can be quickly charged and discharged. When charged, the bit is ON; when discharged, the bit is OFF. This is roughly how a cell of a computer's main memory (DRAM) works: a tiny capacitor holds the charge, and a transistor acts as the switch that charges, discharges, or reads it.
– Mathematically speaking, a bit is usually understood as the value 0 when OFF and the value 1 when ON.
– Since there are only two values, a bit is known as a 'binary' digit.


Bit Patterns

• Suppose you have two bits in a sequence. How many different patterns (sequences) could there be?

00 01 10 11

Bit Patterns

• Suppose you have three bits in a sequence. How many different patterns (sequences) could there be?

000 001 010 011 100 101 110 111

Bit Patterns

• Suppose you have four bits in a sequence. How many patterns could there be?

• Suppose you have N bits in a sequence. How many patterns could there be?

• With more bits you can store more information.
– Each additional bit doubles the number of possible patterns.

# Bits   # of Patterns
1        2
2        4
3        8
4        16
5        32
N        2ᴺ
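The doubling rule in the table can be checked by listing every pattern directly. This Python sketch uses the standard-library `itertools.product`; the helper name `bit_patterns` is illustrative.

```python
from itertools import product


def bit_patterns(n: int) -> list[str]:
    """Every distinct sequence of n bits, listed in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_patterns(2))  # ['00', '01', '10', '11']
for n in range(1, 6):
    # One more bit doubles the number of patterns: the count is always 2**n.
    print(n, len(bit_patterns(n)))
```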

How is data capacity measured?

• One bit is too small to use as a unit of measurement.
– Nobody says: "I've got a 10 gigabit iPod."

• Measures of data capacity are based on the byte.
– 1 byte = 8 bits
– 1 byte can have 256 (2⁸) different patterns
– 1 byte is big enough to represent many kinds of things

Prefix     Symbol  Base 2  Decimal
Kilobyte   K       2¹⁰     1,024
Megabyte   M       2²⁰     1,048,576
Gigabyte   G       2³⁰     1,073,741,824 (≈1,000,000,000)
Terabyte   T       2⁴⁰     1,099,511,627,776 (≈1,000,000,000,000)
Petabyte   P       2⁵⁰     1,125,899,906,842,624 (≈1,000,000,000,000,000)
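The table's base-2 values can be checked, and a raw byte count turned into a readable size, with the Python sketch below. It assumes the base-2 meaning used in the table (1 KB = 1,024 bytes); drive manufacturers often use the decimal meaning instead (1 KB = 1,000 bytes). The names `UNITS` and `human_readable` are illustrative.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count with the base-2 prefixes from the table (1 KB = 1,024 bytes)."""
    value = float(num_bytes)
    index = 0
    # Divide by 1,024 until the value is below 1,024 or there is no larger prefix left.
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {UNITS[index]}"


for power in (10, 20, 30, 40, 50):
    print(f"2**{power} = {2 ** power:,} bytes = {human_readable(2 ** power)}")
```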

What we have learned

• A string of bits can represent various things.
• The length of a bit string controls the number of things that can be represented.

What is the shortest bit string for representing 100 different special symbols?
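The question above asks for the smallest n with 2ⁿ ≥ 100. A minimal Python sketch of that search follows; the function name `bits_needed` is illustrative, and it uses the built-in `int.bit_length` rather than a floating-point logarithm.

```python
def bits_needed(count: int) -> int:
    """Return the smallest n with 2**n >= count (the fewest bits that give `count` patterns)."""
    if count < 1:
        raise ValueError("need at least one symbol")
    # (count - 1).bit_length() equals ceil(log2(count)) but avoids floating-point rounding.
    return (count - 1).bit_length()


n = bits_needed(100)
print(n, 2 ** (n - 1), 2 ** n)  # 7 64 128: 6 bits give only 64 patterns, 7 bits give 128
```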

How much data to encode…

• How much data capacity do you need to encode:
– The complete works of William Shakespeare?
– One 4-minute pop song (MP3)?
– One digital picture (JPEG)?
– One feature-length movie (DVD)?
– All of Wikipedia (as of Jan 2010)?
– The entire U.S. Library of Congress (as of Apr 2011)?
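As one worked example for the MP3 question, the Python sketch below estimates storage from a bit rate (bits per second × seconds ÷ 8). The 128 kbit/s bit rate is an assumption, since MP3 files are encoded at many different rates; the CD-quality parameters (44,100 samples per second, 16 bits, 2 channels) are included only for comparison. Sizes here are decimal megabytes (1 MB = 1,000,000 bytes), and `audio_bytes` is an illustrative name.

```python
def audio_bytes(seconds: float, bits_per_second: int) -> float:
    """Storage for audio recorded at a constant bit rate: bits per second x seconds / 8."""
    return seconds * bits_per_second / 8


song = 4 * 60  # a 4-minute song, in seconds

# Assumed 128 kbit/s MP3 (a common constant bit rate; real files vary).
mp3 = audio_bytes(song, 128_000)

# Uncompressed CD-quality audio: 44,100 samples/s x 16 bits x 2 channels.
cd = audio_bytes(song, 44_100 * 16 * 2)

print(f"MP3: {mp3 / 1e6:.2f} MB, uncompressed: {cd / 1e6:.2f} MB")  # MP3: 3.84 MB, uncompressed: 42.34 MB
```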

Digital Numbers

• All digital data is a sequence of bits.
• How can we represent an integer as a sequence of bits?
• Consider the decimal number 515.
– A sequence of digits
– Digits are one of: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
– The meaning of a digit depends on its position: a power of 10
– 515 = 5×10² + 1×10¹ + 5×10⁰
• Consider the binary number 101.
– Binary uses a base-2 (radix 2) system rather than base 10
– Digits are one of: 0, 1
– The meaning of a digit depends on its position: a power of 2
– 101 = 1×2² + 0×2¹ + 1×2⁰ = 4 + 0 + 1 = 5
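Both examples apply the same positional rule. Writing the digits of a k-digit number in base b as d_{k-1} … d_1 d_0, with d_0 the rightmost digit (notation introduced here, not from the slides), the value is:

```latex
N = \sum_{i=0}^{k-1} d_i \, b^{i}, \qquad d_i \in \{0, 1, \dots, b-1\}

515_{10} = 5 \cdot 10^{2} + 1 \cdot 10^{1} + 5 \cdot 10^{0} = 500 + 10 + 5

101_{2} = 1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} = 4 + 0 + 1 = 5
```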

Convert binary to decimal

• 101110 =
– 1×2⁵ + 0×2⁴ + 1×2³ + 1×2² + 1×2¹ + 0×2⁰
– 1×32 + 0×16 + 1×8 + 1×4 + 1×2 + 0×1
– 32 + 0 + 8 + 4 + 2 + 0
– 46

• 110001 =
– 1×2⁵ + 1×2⁴ + 0×2³ + 0×2² + 0×2¹ + 1×2⁰
– 1×32 + 1×16 + 0×8 + 0×4 + 0×2 + 1×1
– 32 + 16 + 0 + 0 + 0 + 1
– 49
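The same conversions can be automated. This Python sketch reads the digits left to right and doubles the running total at each step (Horner's method), which gives the same result as summing powers of 2 as above; `binary_to_decimal` is an illustrative name.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to its decimal value, one digit at a time."""
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        # Moving one place left doubles every earlier digit's weight, then the new bit is added.
        value = value * 2 + int(bit)
    return value


print(binary_to_decimal("101110"))  # 46
print(binary_to_decimal("110001"))  # 49
```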

What's the base?

• It's easy to get confused and not be sure of what base a number is written in. For example, is 111:
– One hundred eleven?
– Seven?

• A subscript can be used to specify the base whenever it is unclear.
– 111₂ is equal to seven.
– 111₁₀ is equal to one hundred eleven.
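Python's built-in `int(text, base)` plays the role of the subscript: the same digits give different values depending on the base. The wrapper name `read_in_base` is illustrative.

```python
def read_in_base(digits: str, base: int) -> int:
    """Interpret `digits` as a numeral written in `base`, the way a subscript would."""
    # The built-in int(text, base) works for bases 2 through 36.
    return int(digits, base)


for base in (2, 8, 10, 16):
    print(f"111 in base {base} = {read_in_base('111', base)}")
# 111 in base 2 = 7, base 8 = 73, base 10 = 111, base 16 = 273
```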

Biggest Binary Number

• What is the biggest number you can have with
– Two bits?
– Three bits?
– Four bits?
– Five bits?
– N bits?
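Treating the bits as a non-negative (unsigned) integer, which is an assumption since the slide does not discuss negative numbers, the biggest value sets every bit to 1. The Python sketch below computes it as 2ᴺ − 1; the name `largest_unsigned` is illustrative.

```python
def largest_unsigned(n: int) -> int:
    """Largest value of an n-bit unsigned binary number: all n bits set to 1."""
    # 1 << n is 2**n (a 1 followed by n zeros); subtracting 1 leaves n ones.
    return (1 << n) - 1


for n in (2, 3, 4, 5):
    print(n, "bits:", "1" * n, "=", largest_unsigned(n))
# 2 bits: 11 = 3, 3 bits: 111 = 7, 4 bits: 1111 = 15, 5 bits: 11111 = 31
```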

What about real numbers?

• While we can represent an integer as a sequence of bits, is it possible to represent a real number such as 2.31 or 2.125?

• In base 10, the value 2.125 means:
– 2×10⁰ + 1×10⁻¹ + 2×10⁻² + 5×10⁻³

• In base 2, the value 1.101 means:
– 1×2⁰ + 1×2⁻¹ + 0×2⁻² + 1×2⁻³
– 1×1 + 1×(1/2) + 0×(1/4) + 1×(1/8)
– 1 + 0.5 + 0 + 0.125
– 1.625
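The expansion of 1.101 can be sketched in Python with exact fractions from the standard-library `fractions` module, so no rounding hides the result. The function name is illustrative, and it assumes well-formed input of 0s and 1s with at most one point.

```python
from fractions import Fraction


def binary_fraction_to_decimal(text: str) -> Fraction:
    """Exact value of a binary numeral with an optional point, such as "1.101"."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    # Each digit after the point is worth half as much as the one before: 1/2, 1/4, 1/8, ...
    for position, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 2 ** position)
    return value


print(float(binary_fraction_to_decimal("1.101")))  # 1.625
```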

What about real numbers?

• Consider the value 1/3. How many decimal digits does it take to represent it exactly?
– 1/3 = 0.33333333333333333333333... (the 3s never end)

• Consider the value 1/5. How many decimal digits does it take to represent it exactly? How many binary digits?
– 1/5 = 0.2₁₀
– 1/5 = 0.001100110011...₂ (the pattern 0011 repeats forever)

• Since some real numbers would need an infinite number of bits, and a computer stores only a fixed number of bits per value, computers must round such numbers, so arithmetic on real numbers can be imprecise.
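The repeating binary digits of 1/5 can be generated by repeated doubling: multiply by 2, take the integer part as the next digit, and keep the fractional part. This is a standard technique rather than something shown on the slides; the Python sketch below uses exact fractions so the digits are true, and the last two lines show the consequence for ordinary 64-bit floats. The name `binary_digits` is illustrative.

```python
import math
from fractions import Fraction


def binary_digits(x: Fraction, count: int) -> str:
    """First `count` binary digits after the point of 0 <= x < 1, found by repeated doubling."""
    digits = []
    for _ in range(count):
        x *= 2
        digit = math.floor(x)  # the integer part (0 or 1) is the next binary digit
        digits.append(str(digit))
        x -= digit             # keep only the fractional part for the next step
    return "0." + "".join(digits)


print(binary_digits(Fraction(1, 5), 16))  # 0.0011001100110011
# A float keeps only a fixed number of binary digits, so it stores a value close to 1/5, not 1/5 itself:
print(Fraction(0.2) == Fraction(1, 5))    # False
print(0.1 + 0.2 == 0.3)                   # False
```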

What about text?

• Can text be represented as a sequence of binary digits (bits)?
• Text is made of written symbols (also known as characters).
• Each character can be associated with an integer.

Character  Decimal  Binary
A          0        00000000
B          1        00000001
C          2        00000010
D          3        00000011
E          4        00000100
F          5        00000101
…          …        …
Z          25       00011001
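The table's scheme (A = 0 through Z = 25, stored in 8 bits) can be sketched in Python as below. This is the slide's simplified teaching code, not ASCII; the function name `encode_toy` is illustrative, and it accepts only uppercase A to Z.

```python
import string


def encode_toy(text: str) -> str:
    """Encode uppercase letters with the table's scheme: A=0, B=1, ..., Z=25, 8 bits each."""
    codes = []
    for ch in text:
        index = string.ascii_uppercase.index(ch)  # raises ValueError for anything but A-Z
        codes.append(format(index, "08b"))
    return " ".join(codes)


print(encode_toy("CAB"))  # 00000010 00000000 00000001
```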

What about text?

• The numbers associated with characters can then be stored as bits.

• About how many unique numbers are required for English text? (Asked another way: how many unique characters did William Shakespeare ever use?)
– One byte has enough capacity to store an English character.
• About how many unique numbers are required for Chinese text?
– Two bytes are enough for most languages: 电脑 ("computer")
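The two-byte claim can be checked in Python. The sketch below assumes the Unicode encodings UTF-16 and UTF-8, which the slides do not name: each of the two characters has a code point below 2¹⁶, so it fits in two bytes, but the number of bytes actually stored depends on the encoding chosen.

```python
word = "电脑"  # "computer" in Chinese, written with two characters

for ch in word:
    # Each code point is below 2**16, so it fits in two bytes.
    print(ch, hex(ord(ch)), ord(ch) < 2 ** 16)

# How many bytes are actually stored depends on the encoding:
print(len(word.encode("utf-16-be")))  # 4: two bytes per character
print(len(word.encode("utf-8")))      # 6: three bytes per character in UTF-8
```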

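ASCII Table

Most computers configured for English use the ASCII table, which associates the numbers 0 to 127 with English letters, digits, punctuation, and control codes. (Modern systems usually store text as Unicode, for example UTF-8, whose first 128 codes match ASCII.)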
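A short Python sketch shows the ASCII codes behind a string, as 8-bit binary. The helper name `ascii_bits` is illustrative; `str.encode("ascii")` raises an error for any character outside the 128 ASCII codes.

```python
def ascii_bits(text: str) -> list[str]:
    """8-bit binary ASCII code for each character; fails for non-ASCII input."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


for ch, bits in zip("Hi!", ascii_bits("Hi!")):
    print(ch, ord(ch), bits)
# H 72 01001000
# i 105 01101001
# ! 33 00100001
```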

What about colors?

• How might a computer store a 'color'?
• What are the primary colors of pigment?
– Cyan, magenta, yellow
• What are the primary colors of light?
– Red, green, blue

RGB Color Model

• RGB color model
– Uses red, green, and blue as the primary colors.
– A wide range of colors can be represented by combining different amounts of these three primaries.
• Consider a flashlight that has a slider that chooses the strength of light emitted.
– Setting the slider to 0 turns the flashlight completely off.
– Setting the slider to 255 makes the flashlight generate as much light as it is capable of generating.
• Consider three such flashlights.
– Each light emits purely red, green, or blue light. If all three flashlights are aimed at the same spot on a white wall, a wide range of colors can be projected onto the wall by adjusting the slider values on the three lights in different ways.
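A slider from 0 to 255 has 256 settings, exactly one byte, so three sliders fit in 24 bits. The Python sketch below packs them into one number with red in the highest byte; that ordering is an assumption (it matches common hex color codes such as #FFA500), and the function names are illustrative.

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 0-255 slider values (one byte each) into a single 24-bit number."""
    for level in (red, green, blue):
        if not 0 <= level <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(color: int) -> tuple[int, int, int]:
    """Recover the three slider values from a packed 24-bit color."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


orange = pack_rgb(255, 165, 0)
print(f"#{orange:06X}", format(orange, "024b"))  # #FFA500 111111111010010100000000
print(unpack_rgb(orange))                         # (255, 165, 0)
```

With 8 bits per channel there are 2²⁴ = 16,777,216 possible combinations.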

What about pictures?

• Could you encode an image as a sequence of bits?
– Starting from the upper-left pixel, scan the image left-to-right, top-to-bottom.
– Record the color of each pixel that you encounter.

• How many bits would be required for a
– 100x100 image?
– 1024x768 image?

• A typical 1024x768 JPG file is much smaller than the raw pixel data. How?
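The answer depends on how many bits each pixel uses. The Python sketch below assumes 24 bits per pixel (one byte per RGB channel, as in the flashlight model); that figure and the name `raw_image_bits` are assumptions for illustration.

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Bits needed to record every pixel of an uncompressed image."""
    return width * height * bits_per_pixel


for width, height in ((100, 100), (1024, 768)):
    bits = raw_image_bits(width, height, 24)  # assumed: 24-bit RGB, one byte per channel
    print(f"{width}x{height}: {bits:,} bits = {bits // 8:,} bytes")
# 100x100: 240,000 bits = 30,000 bytes
# 1024x768: 18,874,368 bits = 2,359,296 bytes
```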

What about pictures?

• There are many different ways to encode the same information. Some ways use more bits than others.

• Consider a black & white 8x8 image.
– Use 0 for white and 1 for black.
– This is known as 'raw' or 'bitmap' format.

0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 0
1 0 1 0 0 1 0 1
1 0 0 0 0 0 0 1
1 0 1 0 0 1 0 1
1 0 0 1 1 0 0 1
0 1 0 0 0 0 1 0
0 0 1 1 1 1 0 0

What about pictures?

• Run length encoding is another way to encode images.
– A 'run' is a sequence of successive like-colored pixels.
– Store the lengths of these runs for each row, starting with white (a row that begins with black starts with a white run of length 0).

Raw               Run Length
0 0 1 1 1 1 0 0   2,4,2
0 1 0 0 0 0 1 0   1,1,4,1,1
1 0 1 0 0 1 0 1   0,1,1,1,2,1,1,1
1 0 0 0 0 0 0 1   0,1,6,1
1 0 1 0 0 1 0 1   0,1,1,1,2,1,1,1
1 0 0 1 1 0 0 1   0,1,2,2,2,1
0 1 0 0 0 0 1 0   1,1,4,1,1
0 0 1 1 1 1 0 0   2,4,2

Can you think of numbers in the Run Length code above that are not needed?
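The per-row scheme can be sketched in Python as below, assuming pixels are only 0 (white) or 1 (black) and every row starts with a white run. The function names are illustrative. Decoding the runs gives back exactly the original pixels, so this encoding is lossless.

```python
def rle_row(row: list[int]) -> list[int]:
    """Run lengths for one row (0 = white, 1 = black), always starting with a white run."""
    runs = []
    current, length = 0, 0  # the first run is white and may have length 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)         # close the finished run
            current, length = pixel, 1  # start a run of the other color
    runs.append(length)
    return runs


def unrle_row(runs: list[int]) -> list[int]:
    """Rebuild a row of pixels by alternating white and black runs."""
    row = []
    for i, length in enumerate(runs):
        row.extend([i % 2] * length)  # even positions are white (0), odd are black (1)
    return row


smiley = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]

for row in smiley:
    runs = rle_row(row)
    assert unrle_row(runs) == row  # decoding gives back exactly the same pixels
    print(runs)
```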

Consider another way to store images.

Raw pixels (one row per line, 0 = white, 1 = black):
01111100
01000010
01000010
01111100
01001000
01000100
01000010
00000000
a total of ____ numbers

Run Length Encoding (one row per line, starting with white):
1, 5, 2
1, 1, 4, 1, 1
1, 1, 4, 1, 1
1, 5, 2
1, 1, 2, 1, 3
1, 1, 3, 1, 2
1, 1, 4, 1, 1
8
a total of _____ numbers

Compression: a way to represent data in a more compact form.

Compression

• When data is compressed, information is encoded using fewer bits.
– This speeds up transmission.
– It reduces storage cost (smaller drives).
– It may increase processing (data must be decompressed to view or process it).
• For pictures, there are two types:
– Lossless: no information is lost.
– Lossy: some information may be lost.
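The difference between the two types can be sketched in Python. The lossless half uses the standard-library `zlib` module (DEFLATE, the same family of compression PNG uses). The lossy half is a deliberately simple toy that keeps only the top 4 bits of each 8-bit value; it is not how JPEG works, and the sample pixel values are made up for illustration.

```python
import zlib

# Lossless: zlib gives back exactly the bytes that went in.
original = bytes([0, 0, 255, 255, 255, 255, 0, 0] * 512)  # a repetitive 4,096-byte "image"
packed = zlib.compress(original)
assert zlib.decompress(packed) == original
print(len(original), "->", len(packed), "bytes")

# Lossy (toy example, not JPEG): keep only the top 4 bits of each 8-bit value.
pixels = [200, 201, 203, 90, 95]
reduced = [p >> 4 for p in pixels]    # 16 levels instead of 256: half the bits per pixel
restored = [r << 4 for r in reduced]  # best guess when decompressing
print(restored)                       # [192, 192, 192, 80, 80]: the original values are gone
```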

Lossy (jpg) and Lossless (png)