I Need More Data
Too much information running through my brain.
What is 'data'?
• We live in the information age.
• Knowledge comes from careful investigation of information.
• Information is represented/encoded as data.
  – What information is represented by an abacus? How?
  – What information is represented on a DVD? How?
  – What information is encoded on a credit card? How?
DATA: The quantities, characters, or symbols on which operations are performed by a computer.
How can real-world information become data?
• How can a picture or a sound or a temperature reading become data?
• Data comes in two types:
  – Continuous: infinitely variable points
  – Discrete: finite number of points/choices
Continuous or Discrete?
• How long might it take a light bulb to burn out?
• What was your ACT score?
• How tall are you?
• How many books did you read this year?
• How much water did you drink this week?
• How many gen-ed courses have you taken at UW-L?
Continuous/Discrete
• In electronics, signals are known as either
  – analog (meaning a continuous signal)
  – digital (meaning a discrete signal)
Analog or Digital?
Why are computers digital?
• Information needs to be encoded in such a way as to be processed.
  – Electrical signals can be processed.
  – Even analog signals can be processed, but digital is simpler.
• In computers, there are two discrete (digital) signals: on and off. It's easy to tell if an electrical signal is on or off:
  – Electric fence
  – Electric socket
  – Light bulb
On or Off
http://www.flickr.com/photos/tudor/31803307/sizes/o/in/photostream/
http://www.flickr.com/photos/my-other-eye/5300224495/sizes/z/in/photostream/
What is a bit?
• Bit: short for "binary digit". A bit is the representation used for the smallest (atomic) amount of computer data.
  – A bit is either ON or OFF.
  – You can think of a bit as an extremely small battery that can be quickly charged and discharged. When charged, the bit is ON. When discharged, the bit is OFF. This is essentially what a single transistor is.
  – Mathematically speaking, a bit is usually understood as the value 0 when OFF and the value 1 when ON.
  – Since there are only two values, a bit is known as a 'binary' digit.
0 1
Bit Patterns
• What if you had two bits in a sequence? How many different patterns (sequences) could there be?
0 0
0 1
1 0
1 1
Bit Patterns
• What if you had three bits in a sequence? How many different patterns (sequences) could there be?
0 0 0
0 1 0
0 1 1
0 0 1
Bit Patterns
• What if you had four bits in a sequence? How many patterns could there be?
• What if you had N bits in a sequence? How many patterns could there be?
• With more bits you can store more information.
  – One more bit doubles the number of patterns.
# Bits   # of Patterns
1        2
2        4
3        8
4        16
5        32
N        2^N
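The doubling in the table above can be checked by listing the patterns directly. This is a minimal sketch; the function name `bit_patterns` is made up for illustration and is not part of the original slides.

```python
from itertools import product

def bit_patterns(n):
    """Return every pattern of n bits as a string, in counting order."""
    # product("01", repeat=n) yields all n-length tuples drawn from '0' and '1'.
    return ["".join(bits) for bits in product("01", repeat=n)]

# Count the patterns for 1 to 5 bits; each extra bit doubles the count.
for n in range(1, 6):
    print(n, len(bit_patterns(n)))

print(bit_patterns(2))  # ['00', '01', '10', '11']
```

Each count printed by the loop should equal 2**n, matching the last row of the table.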
How is data capacity measured?
• One bit is too small to use as a measurement.
  – Nobody says: "I've got a 10 Gigabit iPod"
• Measures of data capacity are based on a byte.
  – 1 byte = 8 bits
  – 1 byte can have 256 different patterns
  – 1 byte is big enough to represent many kinds of things
Unit       Symbol   Base 2   Decimal (bytes)
Kilobyte   K        2^10     1,024
Megabyte   M        2^20     1,048,576
Gigabyte   G        2^30     ≈1,000,000,000
Terabyte   T        2^40     ≈1,000,000,000,000
Petabyte   P        2^50     ≈1,000,000,000,000,000
What we have learned
• A string of bits can represent various things
• The length of a bit string controls the number of things that can be represented
What is the shortest bit string for representing 100 different special symbols?
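One way to reason about this question: N bits give 2^N patterns, so we need the smallest N with 2^N at least as large as the number of symbols. The sketch below assumes every symbol gets a fixed-length pattern; `bits_needed` is a hypothetical helper name.

```python
def bits_needed(symbol_count):
    """Smallest number of bits whose patterns can label symbol_count symbols."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    # (symbol_count - 1).bit_length() is the smallest N with 2**N >= symbol_count.
    # We assume at least one bit is used even for a single symbol.
    return max(1, (symbol_count - 1).bit_length())

print(bits_needed(100))  # 7, because 2**6 = 64 < 100 <= 128 = 2**7
print(bits_needed(256))  # 8, exactly one byte
```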
How much data to encode…
• How much data capacity do you need to encode:
  – The complete works of William Shakespeare?
  – One 4 minute pop song (MP3)?
  – One digital picture (JPEG)?
  – One feature length movie (DVD)?
  – All of Wikipedia (As of Jan 2010)?
  – The entire U.S. Library of Congress (As of Apr 2011)?
Digital NUMBERS
• All digital data is a sequence of bits.
• How can we represent an integer number as a sequence of bits?
• Consider the decimal number 515.
  – A sequence of digits
  – Digits are one of: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  – Meaning of a digit depends on position: power of 10
  – 515 = 5×10^2 + 1×10^1 + 5×10^0
• Consider the binary number 101.
  – Binary uses a base-2 (or radix 2) system rather than base 10
  – Digits are one of: 0, 1
  – Meaning of a digit depends on position: power of 2
  – 101 = 1×2^2 + 0×2^1 + 1×2^0
Convert binary to decimal
• 101110 =
  – 1×2^5 + 0×2^4 + 1×2^3 + 1×2^2 + 1×2^1 + 0×2^0
  – 1×32 + 0×16 + 1×8 + 1×4 + 1×2 + 0×1
  – 32 + 0 + 8 + 4 + 2 + 0
  – 46
• 110001 =
  – 1×2^5 + 1×2^4 + 0×2^3 + 0×2^2 + 0×2^1 + 1×2^0
  – 1×32 + 1×16 + 0×8 + 0×4 + 0×2 + 1×1
  – 32 + 16 + 0 + 0 + 0 + 1
  – 49
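The positional expansion used in these two conversions can be written as a short loop. This is an illustrative sketch, not code from the slides; `binary_to_decimal` is a made-up name, and Python's built-in `int(text, 2)` does the same job.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its integer value."""
    total = 0
    # Walk from the rightmost digit, which has position 0 (weight 2**0).
    for position, digit in enumerate(reversed(bits)):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("101110"))  # 46
print(binary_to_decimal("110001"))  # 49
```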
What's the base?
• It's easy to get confused and not be sure of what base a number is written in. For example, is 111:
  – One hundred eleven?
  – Five?
• A subscript can be used to specify the base whenever it is unclear.
  – 111_2 is equal to five
  – 111_10 is equal to one hundred eleven.
Biggest Binary Number
• What is the biggest number you can have with
  – Two bits?
  – Three bits?
  – Four bits?
  – Five bits?
  – N bits?
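A quick way to explore this: the biggest value is the pattern made of all 1s. The sketch below assumes the bits encode a non-negative integer (no sign bit); `largest_value` is a hypothetical helper name.

```python
def largest_value(n):
    """Largest non-negative integer that fits in n bits (the all-ones pattern)."""
    if n < 1:
        raise ValueError("need at least one bit")
    return int("1" * n, 2)

for n in range(2, 6):
    # The all-ones pattern is always one less than the number of patterns.
    print(n, "1" * n, largest_value(n), 2 ** n - 1)
```

The last two columns agree because N bits give 2^N patterns and counting starts at zero.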
What about real numbers?
• While we can represent an integer as a sequence of bits, is it possible to represent a real number such as 2.31 or 2.125?
• In base 10, the value 2.125 means:
  – 2×10^0 + 1×10^-1 + 2×10^-2 + 5×10^-3
• In base 2, the value 1.101 means:
  – 1×2^0 + 1×2^-1 + 0×2^-2 + 1×2^-3
  – 1×1 + 1×(1/2) + 0×(1/4) + 1×(1/8)
  – 1 + .5 + 0 + .125
  – 1.625
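The same expansion works in code: digits to the right of the point get negative powers of 2. A minimal sketch, assuming the input is a string such as "1.101"; the name `binary_fraction_to_decimal` is made up for illustration.

```python
def binary_fraction_to_decimal(text):
    """Evaluate a binary numeral with an optional fractional part, e.g. '1.101'."""
    whole, _, fraction = text.partition(".")
    value = int(whole, 2) if whole else 0
    # The i-th digit after the point has weight 2**-i: 1/2, 1/4, 1/8, ...
    for i, digit in enumerate(fraction, start=1):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        value += int(digit) * 2 ** -i
    return value

print(binary_fraction_to_decimal("1.101"))  # 1.625
```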
What about real numbers?
• Consider the value 1/3. How many decimal digits does it take to represent it exactly as a real number?
  – 1/3 = 0.33333333333333333333333...
• Consider the value 1/5. How many decimal digits does it take to represent it exactly as a real number? How many binary digits?
  – 1/5 = 0.2_10
  – 1/5 = 0.001100110011..._2
• Since some real numbers would need an infinite number of bits to store exactly, computers can be imprecise.
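To see why 1/5 causes trouble, we can generate its binary digits one at a time by repeated doubling. This sketch uses Python's exact `Fraction` type; `binary_expansion` is a hypothetical helper, not something from the slides.

```python
from fractions import Fraction

def binary_expansion(value, digits):
    """First `digits` binary digits after the point, for 0 <= value < 1."""
    out = []
    for _ in range(digits):
        value *= 2
        bit = int(value)      # 1 if doubling pushed the value to 1 or more
        out.append(str(bit))
        value -= bit          # keep only the fractional part
    return "0." + "".join(out)

print(binary_expansion(Fraction(1, 5), 16))  # 0.0011001100110011
print(binary_expansion(Fraction(5, 8), 3))   # 0.101

# The float 0.2 is only the nearest value that fits in a fixed number of bits.
print(Fraction(0.2) == Fraction(1, 5))  # False
print(0.1 + 0.2 == 0.3)                 # False
```

The pattern 0011 repeats forever for 1/5, while 5/8 terminates after three digits.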
What about text?
• Can text be represented as a sequence of binary digits (bits)?
• Text is made of pictures (also known as symbols or characters).
• Each character can be associated with an integer number.

Character   Decimal   Binary
A           0         00000000
B           1         00000001
C           2         00000010
D           3         00000011
E           4         00000100
F           5         00000101
…           …         …
Z           25        00011001
What about text?
• The numbers associated with a character can be stored as a sequence of bits.
• About how many unique numbers are required for English text? (asked another way, how many unique characters did William Shakespeare ever use?)
  – One byte has enough capacity to store an English character.
• About how many unique numbers are required for Chinese text?
  – Two bytes is enough for most languages: 电脑
ASCII Table
Most computers that are configured for English writers use the ASCII table. This table associates numbers with English text.
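Python can show the numbers behind characters. Note that real ASCII assigns 65 to 'A', unlike the simplified A = 0 table shown earlier. In this sketch, `text_to_bits` is a made-up helper, and UTF-16 is used as one example of a two-byte encoding; the slides do not name a specific one.

```python
def text_to_bits(text):
    """Encode text as ASCII and show each byte as an 8-bit pattern."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

print(ord("A"))             # 65
print(text_to_bits("Hi!"))  # ['01001000', '01101001', '00100001']

# Assumption: UTF-16 (big-endian) as an example two-byte-per-character encoding.
chinese = "电脑".encode("utf-16-be")
print(len(chinese))         # 4 bytes for 2 characters
```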
What about colors?
• How might a computer store a 'color'?
• What are the primary colors of pigment?
  – Cyan, magenta, yellow
• What are the primary colors of light?
  – Red, green, blue
RGB Color Model
• RGB color model
  – Uses red, green, and blue as the primary colors.
  – Any color can be represented by combining different amounts of these three primaries.
• Consider a flashlight that has a slider that chooses the strength of light emitted.
  – Setting the slider to zero, the flashlight is turned completely off.
  – Setting the slider to 255, the flashlight generates as much light as it is capable of generating.
• Consider three such flashlights.
  – Each light emits purely red, green, or blue light. If all three flashlights are aimed at the same spot on a white wall, any color can be projected onto the wall by adjusting the slider values on the three lights in different ways.
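Since each slider runs from 0 to 255, each one fits in exactly one byte, so a color can be stored in three bytes. The sketch below packs the three values into one 24-bit number; the red-green-blue byte order and the names `pack_rgb` and `unpack_rgb` are assumptions for illustration.

```python
def pack_rgb(red, green, blue):
    """Pack three 0-255 slider values into one 24-bit integer (assumed R, G, B order)."""
    for level in (red, green, blue):
        if not 0 <= level <= 255:
            raise ValueError("each slider must be between 0 and 255")
    return (red << 16) | (green << 8) | blue

def unpack_rgb(value):
    """Recover the three slider values from a packed 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

yellow = pack_rgb(255, 255, 0)      # red and green lights on full, blue off
print(format(yellow, "024b"))       # 111111111111111100000000
print(unpack_rgb(yellow))           # (255, 255, 0)
```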
What about pictures?
• Could you encode an image as a sequence of bits?
  – Starting from the upper-left pixel, scan the image left-to-right, top-to-bottom
  – Record each pixel that you encounter.
• How many bits would be required for a
  – 100x100 image?
  – 1024x768 image?
• Most JPG files of 1024x768 are about 3-4 Meg. How?
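The raw size is just pixels times bits per pixel. The sketch below assumes 24 bits per pixel (one byte each for red, green, and blue), which the slides do not state explicitly; `raw_image_bits` is a hypothetical helper name.

```python
def raw_image_bits(width, height, bits_per_pixel=24):
    """Bits needed to record every pixel with no compression.

    bits_per_pixel=24 is an assumption: one byte per RGB channel.
    """
    return width * height * bits_per_pixel

for width, height in [(100, 100), (1024, 768)]:
    bits = raw_image_bits(width, height)
    print(f"{width}x{height}: {bits:,} bits = {bits // 8:,} bytes")
```

Under that assumption, a 100x100 image needs 240,000 bits (30,000 bytes) and a 1024x768 image needs 18,874,368 bits (2,359,296 bytes).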
What about pictures?
• There are many different ways to encode the same information. Some ways use more bits than others.
• Consider a black & white 8x8 image.
  – Use 0 for white and 1 for black
  – This is known as 'raw' or 'bitmap' format
0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 0
1 0 1 0 0 1 0 1
1 0 0 0 0 0 0 1
1 0 1 0 0 1 0 1
1 0 0 1 1 0 0 1
0 1 0 0 0 0 1 0
0 0 1 1 1 1 0 0
What about pictures?
• Run length encoding is another way to encode images
  – A 'run' is a sequence of successive like-colored pixels
  – Store the lengths of these runs for each row, starting with white
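Raw                Run Length
0 0 1 1 1 1 0 0    2,4,2
0 1 0 0 0 0 1 0    1,1,4,1,1
1 0 1 0 0 1 0 1    0,1,1,1,2,1,1,1
1 0 0 0 0 0 0 1    0,1,6,1
1 0 1 0 0 1 0 1    0,1,1,1,2,1,1,1
1 0 0 1 1 0 0 1    0,1,2,2,2,1
0 1 0 0 0 0 1 0    1,1,4,1,1
0 0 1 1 1 1 0 0    2,4,2
(A row that begins with a black pixel starts with a white run of length 0, so that every row can be decoded the same way.)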
Can you think of numbers in the Run Length code above that are not needed?
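The row-by-row scheme described above can be sketched in a few lines. This version follows the "starting with white" rule strictly, so a row that begins with a black pixel gets a leading run of 0; that choice, and the names `rle_encode_row` and `rle_decode_row`, are assumptions for illustration.

```python
def rle_encode_row(pixels):
    """Run lengths for one row of 0 (white) / 1 (black) pixels, starting with white."""
    runs = []
    current = 0   # every row is assumed to start with a white run
    length = 0
    for pixel in pixels:
        if pixel == current:
            length += 1
        else:
            runs.append(length)   # close the finished run (may be 0 at row start)
            current = pixel
            length = 1
    runs.append(length)
    return runs

def rle_decode_row(runs):
    """Rebuild the pixels from run lengths, alternating white, black, white, ..."""
    pixels = []
    color = 0
    for length in runs:
        pixels.extend([color] * length)
        color = 1 - color
    return pixels

print(rle_encode_row([0, 0, 1, 1, 1, 1, 0, 0]))  # [2, 4, 2]
print(rle_encode_row([1, 0, 0, 0, 0, 0, 0, 1]))  # [0, 1, 6, 1]
```

Because every row is known to be 8 pixels wide, the decoder could work out the final run of each row on its own, which is one hint toward the question above.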
Consider another way to store images.
Raw pixels         Run Length Encoding
0 1 1 1 1 1 0 0    1, 5, 2
0 1 0 0 0 0 1 0    1, 1, 4, 1, 1
0 1 0 0 0 0 1 0    1, 1, 4, 1, 1
0 1 1 1 1 1 0 0    1, 5, 2
0 1 0 0 1 0 0 0    1, 1, 2, 1, 3
0 1 0 0 0 1 0 0    1, 1, 3, 1, 2
0 1 0 0 0 0 1 0    1, 1, 4, 1, 1
0 0 0 0 0 0 0 0    8
a total of ____ numbers    a total of ____ numbers
compression - a way to represent data in more compact form
Compression
• When data is compressed, information is encoded using fewer bits.
  – This speeds transmission
  – Reduces storage cost (smaller drives)
  – May increase processing (must un-compress to view/process)
• For pictures, there are two types:
  – Lossless: No information is lost
  – Lossy: Information may be lost
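The lossless case can be demonstrated with Python's standard `zlib` module: the data shrinks, and decompressing gives back exactly the original bytes. This is only an illustration of lossless compression in general, not the method used by PNG or JPEG files; the sample data and the name `round_trip` are made up.

```python
import zlib

def round_trip(data):
    """Compress and decompress data; return (compressed size, restored bytes)."""
    packed = zlib.compress(data)
    restored = zlib.decompress(packed)
    return len(packed), restored

# Made-up sample: one 8-pixel row repeated 100 times, one byte per pixel.
raw = bytes([0, 0, 1, 1, 1, 1, 0, 0] * 100)
packed_size, restored = round_trip(raw)

print(len(raw), packed_size)   # the compressed size is much smaller
print(restored == raw)         # True: no information was lost
```

A lossy method would fail the last check, since it trades exact recovery for a smaller size.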
Lossy (jpg) and Lossless (png)