Multimedia Systems NOTES

UNIT-1INTRODUCTION TO MULTIMEDIAMultimedia System: One way of defining multimedia can be found in the meaning of composed world multi-many, much multiple.

Medium: An intervening substance through which some thing is transmitted or carried on.Computer System medium:

1. Text2. Image3. Sound4. Video

Representation Dimension of media:Media are divided into two types in respect to time in their representation space:

1. Time independent (discrete): Information is expressed only in its individual value. E.g.: text, image, etc.

2. Time dependent (continuous): Information is expressed not only it’s individual value, but also by the time of its occurrences. E.g.: sound and video.

Multimedia system is defined by computer controlled, integrated production, manipulation, presentation, storage and communication of independent information, which is encoded at least through a continuous and discrete media.

Classification of Media:1. The perception media2. The representation Media3. The Presentation Media4. The storage media5. The transmission media6. The information Exchange media

Perception media: Perception media help human to sense their environment. The central question is how human perceive information in a computer environment. The answer is through seeing and hearing. Seeing: For the perception of information through seeing the usual such as text, image and video are used.Hearing: For the perception of information through hearing media such as music noise and speech are used.

Representation media: Representation media are defined by internal computer representation of information. The central question is how the computer information is coded? The answer is that various format are used to represent media information in computer.

i. Text, character is coded in ASCII codeii. Graphics are coded according to CEPT or CAPTAIN video text standard.

1

iii. Image can be coded as JPEG formativ. Audio video sequence can be coded in different TV standard format(PAL,

NTSC,SECAM and stored in the computer in MPEG format)

Presentation Media: Presentation media refer to the tools and devices for the input and output of the information. The central question is, through which the information is delivered by the computer and is introduced to the computer.Output media: paper, screen and speaker are the output media.Input Media: Keyboard, mouse, camera, microphone are the input media.

Storage media: Storage Media refer to the data carrier which enables storage of information. The central question is, how will information be stored? The answer is hard disk, CD-ROM, etc.

Transmission media: Transmission Media are the different information carrier that enables continuous data transmission. The central question is, over which information will be transmitted? The answer is co-axial cable, fiber optics as well as free air.

Information exchange media: Information exchange media includes all information carrier for transmission, i.e. all storage and transmission media. The central question is, which information carrier will be used for information exchange between different places? The answer is combine uses of storage and transmission media. E.g. Electronic mailing system.

Usages of Multimedia Application:1. Education2. Training3. Entertainment4. Advertisement5. Presentation6. Business Communication7. Web page Design

Traditional data streams characteristics: A sequence of individual packets transmitted in time dependent fashion is called data stream. The data stream will be used as a synonym data flow.

Transmission of information carrying different media leads to data stream with very different features. The attributes of synchronous, asynchronous and isochronous data transmission conform the field of computer communication and switching.

i. Asynchronous Transmission mode: The asynchronous transmission mode provides for communication with no

timely restriction. Packets reach the receivers as fast as possible. All information of discrete media can be transmitted as asynchronous data

stream. If an asynchronous mode is chosen for transmission of continuous

2

media, additional technique most be applied to provide the time restriction. E.g.: Ethernet, protocol of worldwide internet for e-mail transmission.

ii. Synchronous Transmission mode: The synchronous define the maximum end to end delay for each packet of the data stream. This upper bound will never be violated. Moreover, a packet can reach the receiver at any arbitrary earlier time. So most of the time the receiver has to hold the packet temporarily. A packet has a start frame and the end frame. Start frame is used to tell the receiving station that a new packet of characters is arriving and use d to synchronize the receiving station's internal clock. The end frame is used to indicate the end of packet.

iii. Isochronous Transmission mode:Isochronous transmission Mode defines maximum end to end delay as well as minimum end to end delay. This means the delay jitter for individual packet is bounded. Isochronous transmission mode minimizes the overhead of the receiver. Storage of packet at receiver is reduced.

Properties of Multimedia System:The uses of term multimedia are not every arbitrary combination of media. Justify.

1. Combination of media: A simple text processing program with in corporate image is often called a multimedia application. Because two media are processed through one program. But one should talk multimedia only when both continuous and discrete media are utilized. So text processing program with incorporated images is not a multimedia application.

2. Computer support integrated-> computer is idle tools for multimedia application

3. Independence: An important aspect of different media is their level of independence from each other. In general there is a request for independence of different media but multimedia may requires several level of independence. E.g. A computer controlled video recorder stores audio and video information's. There is inherently tight connection between two types of media. Both media are coupled together through common storage medium of tape. On the other hand for the purpose of presentation the combination of DAT (digital audio tape recorder) signals and computer available text satisfies the request for media independence.

Global structure of Multimedia System:1. Application domain2. System domain3. Device domain

3

Application domain: provides functions to the user to develop and present multimedia projects. This includes software tools, and multimedia projects development methodology.

System Domain: including all supports for using the function of the device domain, e.g. operating system, communication systems (networking) and database systems.

Device domain: basic concepts and skill for processing various multimedia elements and for handling physical device.

4

Sound and Audio:

Sound: Sound is a physical phenomenon produce by vibration of matter such as violin string or block of wood. As a matter vibrates pressure vibration are created in the air surrounding it. The alternation of high and low pressure is propagated through air in a wave like motion. When a wave reaches a human air a sound is heard.

impBasic Concept of sound:

Amplitude

Time

Air pressure

On period

Sound is produced by vibration of matter. During the vibration the pressure vibration are created in the air surrounding it. The pattern of oscillation is called wave form.The wave form repeats the same shape at regular interval and this portion is called period. Sound wave is never perfect, smooth or uniformly periodic.Frequency of the sound is the reciprocal value of period. It represents the number of period in a second and is measured in Hz cycles/sec.

Note: 1 KHz = 1000 HzThe frequency range is divided into 0 to 20 Hz Infra Sound20Hz-20 KHz Human hearing frequency range.

Multimedia system typically makes use of sound only within the frequency range of human hearing. We will call sound in human hearing range audio and wave is called acoustics signals.

Amplitude: a sound has an amplitude property subjectively heard as loudness. The amplitude of a sound is the measure of the displacement of air pressure waveform from its mean.

Computer Representation of Sound: Diagram goes here…..Sampling: Sound wave form the smooth, continuous is not directly represented in the computer. The computer measures the amplitude of the wave form in the regular time interval

5

to produce the series the numbers. Each of this measurement is called sample. This process is called sampling.

Sampling rate: the rate at which a continuous wave form is sampled is called sampling rate. Like frequency, sampling rate are measured in Hz. For loss less digitization the sampling rate should be at least twice of the maximum frequency response.

Quantization: Diagram goes here…Just as a wave form is sampled at discrete times the value of sample is also discrete. The quantization of the sample value depends on the number of bits used in measuring the height of the wave form. The lower quantization lower quality of sound, higher quantization higher quality of sound.

Music: The relationship between music and computer has become more and more important, especially considering the development of MIDI (Musical Instrument and Digital Interface) and its important contribution in the music industry. MIDI interface between musical instrument and computer.

MIDI Basic Concept:MIDI is the standard that manufacturer of electronic musical instrument have agreed upon. It is a set of specification, they use in building their instrument. So that the instrument of different manufacturers can communicate without difficulty. MIDI is defined in 1983.

MIDI Interface Components: MIDI interface has two different components:

i. Hardware: Hardware connects the equipment. It specifies the physical connection between musical instruments. A MIDI Port is built in to the instrument and it specifies MIDI cable and deals with electronic signals that are sent over the cable.

ii. A data format:Data format encodes the information travelling through the hardware. MIDI doesn't transmit an audio signal. For this propose MIDI data format is used. The MIDI encoding includes, besides the instrument specification, the notion of the beginning and ending node, basic frequency and sound volume. MIDI data format is digital. The data are grouped into MIDI message. Each MIDI message can communicate one musical event between instruments. Note: If a musical instrument satisfies both components of the MIDI standard, the instrument is called a MIDI device which can communicate with other MIDI device through channels. MIDI standard specifies 16 channels and identifies 128 instruments. Each MIDI device is mapped to channel.

MIDI Reception Mode:

6

Mode 1: Omni on / polyMode 2: Omni on / monoMode 3: Omni off / polyMode 4: Omni off / mono

To tune a MIDI device to one or more channels, the device most be set to one of the MIDI reception mode. There are four MIDI reception modes which are given above.The first half of the mode name specifies how MIDI device monitors the incoming MIDI channels. If Omni is turned on, the MIDI device monitors all the MIDI channels and responds to all channel messages, no matter which channels they are transmitted on. If the Omni is turned off, the MIDI device respond only to channel message, the device is set to receive.The second half of the mode name tells the MIDI device how to play nodes coming in over the MIDI cable. If the option Poly is set the device can play several nodes at a time. If the node is set to mono the device play only one node at a time like a monophonic synthesizer.

MIDI Devices: Synthesizer

- Sound generator- Microprocessor- Keyboard- Control panel- Auxiliary Controller- Memory

Sequencers

Midi synthesizers are component that generates sound based on input MIDI software message. It is the heart of any MIDI system. A typical synthesizer looks like simple piano keyboard with a panel full of buttons. Most synthesizers have following components:

1. Sound Generator: Sound generator do the actual work of synthesizing. The propose of the rest of synthesizer is to control the sound generator. The principal propose of the generator is to produce a audio signals that becomes sound when fed into loud speaker. By changing the voltage oscillation of the sound signal, a sound generator changes the quality of the sound, i.e. pitch, loudness.

2. Microprocessor (Controller): It communicate with keyboard to know what notes a musician is playing and what command the musician want to send. Microprocessor then specifies note to sound generators. In other words microprocessor sends and receives MIDI message.

7

3. Keyboard: Keyboard provides the musician direct control of the synthesizer, pressing the keys on the keyboard. Which instruct the microprocessor what notes to be played and how long to play them?

4. Control Panel: It controls those functions that are not directly concerned with notes and duration. The panel controls include a slider that sets the overall volume of synthesizer a button that turns the synthesizer on, off and menu.

5. Auxiliary Controller: Auxiliary controller are available to give more control over the note played on the keyboard. It concern with notes and duration. Two common variables on synthesizer are pitch band and modulation.

6. Memory: Synthesizer memory is used to store patches for the sound generators and setting of the control panel.

Sequencers: It is an electronic device in cooperating with both hardware and software, which is used as storage server for generated MIDI data. A sequencer may be computer. Sequencer transforms the note into MIDI message.

MIDI Message: 1. Channel Message

i. Channel Voice Messageii. Channel mode Message

2. System Messagei. System Real-time messageii. System Common Messageiii. System exclusive Message

MIDI Message: It transmits information between the MIDI devices and determines what kinds of musical events can be passed from device to device. The format of the MIDI message consists of the status byte (first byte of any MIDI message) which describe the kind of message and data byte. MIDI message are divided into following types:

1. Channel Message: It go only to specified devices. There are two types of channel message.i. Channel Voice Message: Channel Voice message send actual performance data

between MIDI devices describing keyboard action, controller action and control panel changes. They describe music by defining pitch, note on, note off channel pressure, etc.

ii. Channel Mode Message: Channel Mode message determine the way that a receiving MIDI device respond to channel voice message. It deals with how to play notes coming in over MIDI cables. Channel mode message includes Omni On, Omni Off, note off, note on, etc.

2. System message: System message go to all devices in a MIDI system because no channel number are specified. There are three types of system message.i. System Real-time Message: These are very short and simple, consisting of only

one byte. They carry extra data with them. These messages synchronize the

8

timing MIDI devices in performance. To avoid delay these message are sent in the middle of other message if necessary.

ii. System Common Message: System common message are commands that prepares sequencers and synthesizer to play song. E.g. Song selected, find the common starting place in the song.

iii. System Exclusive message: System exclusive message allow MIDI manufacturers to create customized MIDI message to send between the MIDI devices.

MIDI Software: The software application generally falls into four major categories.1. Music recording and performance application2. Musical notation and printing application3. Synthesizer Patch editor and library patch 4. Music education application

MIDI and SMPTE Timing StandardMIDI reproduces traditional note length using MIDI clock, which are represented through timing clock message. Using MIDI clock a receiver can synchronize with the clock cycle of sender. To keep the standard timing reference the MIDI specification state 24 MIDI clock = 1 quarter note.As an alternative the SMPTE timing standard (society of Motion Picture and Television Engineer) can be used. The SMPTE timing standard was originally developed by NASA as a way to make incoming data from different tracking stations so that the receiving computer could fill what time each piece of data was created. SMPTE format consists of hour: minutes: second: frames: bits-30 frames per second, SMPTE uses a 24 hour clock from 0 to 23 before recycling.

Speech: speech can be perceived understood, and generated by human and also by machines. A human adjust himself/ herself very efficiently to different speakers and their speech habit. The human brain can recognize the very fine line between speech and noise.Speech signals have two properties which can be used in speech processing.

1. Voice speech signals show during certain time intervals almost periodic behavior. Therefore we can consider these signals as quasi-stationary signals for around 30ms

9

2. The spectrum of the audio signals shows characteristics maxima, which are mostly 3-5 frequency bands. These maxima called format.

Speech generation:1. Speech output system could transfer text into speech automatically without any lengthy

processing.2. Generated speech most be understandable and most sound natural.

Types of Speech:The important requirement of speech generation is the generation of the real time signals. The easiest method for speech generation is to use pre-coded speech and play it back in the timely fashion.

Time-dependent sound concatenation: CRUM

Fig: Phone sound concatenation

Fig: Di-phone concatenation

10

A

r

MK

KRrA AM

KrAM

Sentence part

Word

Syllable

Consonants Vowels Consonants

Fig: Word sound concatenation

Speech generation can also be performed by sound concatenation in a timely fashion. Individual speech unit are composed like building blocks, where the composition can occur at different levels. In the simplest case, the individual phones are understood as speech units. In the above given example individual phone of the word CRUM are shown. It is possible with just a few phones to create from an unlimited vocabulary. The phone sound concatenation shows the problem during the transition between individual phones. This problem is called co articulation which is mutual sound effect. To solve these problems, Di-phone sound concatenation is used. Two phones can constitute a Di-phone. In the above figure, Di-phone of word CRUM is shown. The transition problem is not solved sufficiently on this level. To make the transition problem easier, syllabus can be created. The speech is generated through set of syllabus. The above given figure word sound concatenation and syllabus sound shows the syllabus sound of word CRUM. The best pronunciation of word is achieved is storage of whole word. This leads towards synthesize the speech sequence.

Note: Co articulation ProblemThe transition between individual sound units create an essential problem called co articulation which is the mutual sound influence throughout the several sound.

11

Frequency dependent sound concatenation: Speech generation can also be based on frequency dependent sound concatenation. E.g. through formant synthesizing. Formants are the frequency maxima in the spectrum of the speech. Formants synthesize simulate the vocal track through filter. This characteristics value is filters middle frequency and their bandwidth. The method used for sound synthesize in order to simulate human speech is called the linear predictive coding (LPC) method. Using speech synthesize a text can be transformed into acoustics signals. The typical component of this system is given in the figure below:

12

Transcription Synthesis

Letter-to-phone& dictionary Exceptions

Sound transfer

In the first step the transcription is performed where text is translated into sound script. Most transcription methods work with later to phone rules. A dictionary of exception stored in the library.In the second step the sound script is translated into a speech signals. Besides the problem of co-articulation ambiguous pronunciation most be considered.

Speech analysis:

Speech analysis deals with research area shown in the above figure.Human speeches have certain characteristics determined by speaker. So speech analysis can server to analyze who is speaking. i.e. to recognize a speaker, for his identification and verification. The computer identifies and verifies the speaker using an acoustic finger print is digitally stored. It is digitally stored speech probe (certain statement of the speaker).Another main part of speech analysis is to analyze what has been said. To recognize and understand the speech itself.Another area of speech analysis tries to research speech pattern with respect to how the statement was said. E.g. a speaker sentence sounds differently if a person is angry or happy.

13

Speech Analysis

Who

Understanding IdentificationVerification

HowWhat

Recognition

Speech Recognition System and Understanding:

The system which provides the recognition and understanding of speech signal applies this principle several times as follows:

1. In the first step, the principle is applied to a sound pattern and/or word model. An acoustic and phonetic analysis is performed.

2. In the second step, certain speech units go through syntactical analysis. In this step the errors in the previous step can be recognized.

3. The third step deals with semantics of the previously recognized language. Here the decision error of previous step can be recognized and corrected.

There are still many problems into speech recognition and understanding research.1. Specific problem is presented by Room Acoustic with existed environment noise.2. Word boundaries must be determined. Very often neighboring words follows into one

another.3. For the comparison of the speech elements to the existing pattern, time normalization is

necessary. The same word can be spoken quickly or slowly.

Speech Transmission

The area of speech transmission deals with efficient coding of the speech signal to allow speech/sound transmission at low transmission rates over networks.

Its goal is to provide the receiver with same SPEECH/Sound quality as generated by the sender side.

Signal Form CodingThis coding considers no speech-specific properties and parameters.

Here the goal is to achieve most efficient coding of the Audio signal.

The data rate at a PCM-coded Stero-audio signal with CD-quality requirement is

Rate=2*( 44100/s) * ((16bit)/(8bit /byte)) =1,76,400 bytes/s = 1,411,200 bits/s

14

Source Coding:Parameterized systems works with source coding algorithms. Here the specific

speech characteristics are used for data rate reduction.

Eg: Channel Vo-Coder is an example of paramterized system.

IMAGES AND GRAPHICS

An image is a spatial representation of an object, a two-dimensional or 3d scene or another image.

It can be real or virtual.15

BASIC CONCEPTS:

An image can be thought as a function with resulting values of the light intensity at each point over a planar region.

For digital computer operations ,this function needs to be sampled at discrete intervals.

DIGITAL IMAGE REPRESENTATION.

A digital image is represented by a matrix of numeric values each representing a quanitized intensity value.

If I is 2-D matrix , then I(r,c) is the intensity value at the position corresponding to row r and column c of the matrix.

The point at which an image is sampled is called as PICTURE ELEMENTS (PIXELS).

The pixel values of intensity images are called gray scale levels( encodes “color” for the image).

Intensity at each pixel is represented by an integer and is determined from the continuous image around pixel location.

If there are two Intensity Values , eg: Black and white , it is represented as 0 and 1 , such images are called BINARY-VALUED IMAGES.

If 8-bit integers are used to store each pixel value, the gray level ranges from 0(black) to 255(white).

Image Format:There are different kinds of image format, some of them are captured image format and the format when image are stored, (i.e., stored image format.)

- Captured image format- Stored image format

Captured Image format:

The image format is specified by the two main parameters: spatial resolution which is specified as pixels X pixels and color encoding, which is specified by bits per pixels. Both parameter values depend on hardware and software for input and output of images.

16

Eg: For image capturing on SPARC station, Video Pix Card & its S/W is used. The Spatial Resolution is 320*240 Pixels.

The Color can be Encoded witg 1-bit(Binary image format) , 8-bit(color or grayscale) or 24-bit(color-RGB).

Resolution: 1024 X 768 pixelsColor can be encoded with one bit binary image.

Eight bit for color, grey scale Twenty four bit for color RGB

Stored image format: When we store an image we are storing a two dimensional array values in which each

value represent the data associated with pixel in image. For bitmap this value is binary digit. For color image this value may be collection of Three numbers representing the intensities of red, green and blue components

of the color at that pixel. Some current image file format for storing image include JPEG (Joint

photographic expert group), JPG (joint photographic group). A single number that is an index to a table of color triples. If there is space we can store image in the form of RGB Triples.

Some of the image formats are:

GIF - ( graphical Interchange Format)

JPEG

TIFF( Tagged Image File format)

Bitmap, IRIS, Post script etc.

RIFF( Resource Interchange File Format) -_ used for Bitmaps, vector drawings, animation , video etc…

GRAPHICS FORMAT:17

Graphics image formats are specified through GRAPHICS PRIMITIVES and their attributes.

The lines, rectangles,circles and ellises , text strings specify 2D in graphical image.

Polyhedron, specifiying 3D.

Graphics primitives and their attributes represent a higher level of an image representation.(i.e., graphical images are not represented by pixel matrix.)

The advantage of the higher level primitives is the reduction of data to store per graphical image.

The dis-advantage is the additional conversion step from the graphical primitives and their attributes to its pixel representation.

Graphics package like SRGP(simple Raster Graphics Package) provide such a conversion to take graphics primitives and attributes and generate either a Bitmap or Pixmap.

A Bitmap is an array of pixel values that map one by one to pixels on the screen.

COMPUTER IMAGE PROCESSING:

i) IMAGE SYNTHESIS :

User interfaces Office Automation and Electronic Publishing Simulation and Animation for scientific visualization and Entertainment

Dynamic Graphics:a) motion graphicsb) update graphics

*The framework of Interactive Graphics System Graphics hardware -----Input Graphics hardware -----Output

Dithering

18

ii) IMAGE ANALYSIS

IMAGE RECOGNITION

IMAGE RECOGNITION STEPS

CONDITIONINGLABELLINGGROUPINGEXTRACTINGMATCHING

III) IMAGE TRANSMISSION

Raw image transmission Compressed image data transmission Symbolic image data transmission

(Extra notes on digital image)

Digital image is represented by two dimensional matrix e.g. I[x] [y]When I is the two dimensional matrix then, I [x][y] is the intensity value at the position corresponding to the row x and column y of the matrix.

Pixel:

Image Analysis: Image analysis is concerned with techniques for extracting description from the image (shape, position, any distance, color) that are necessary for object recognition and classification.

Digital Image processing: The field of digital image processing refers to the processing of digital image by means of digital computer.

Fundamental step of digital image processing:

19

1. Image acquisition is the first process in the digital image processing. Generally image acquisition stage involves pre-processing such as scale.

2. Image Enhancement: It is the simplest area of digital image processing. The idea behind the enhancement technique is to bring out details that is hidden or not clearly seen or highlight the certain feature of the interest of the image. E.g. Increase the contrast of an image. It is human subjective area of image processing. No mathematical model or probabilistic models are used. Input and output of this step is an image.

3. Image Restoration: Image restoration is the area that also deals with improving the appearance of the object (image) how ever enhancement, which is subjective, image restoration, is objective in the sense that restoration technique tends to be based on mathematical or probabilistic model of image processing. Output of this process is generally image.

4. Image compression: Image compression deals with the technology or technique for reducing the storage required to save the image or the bandwidth required to transmit it. Image compression is familiar to the most of the user of computer in the form of the file extension such as jpg extension. Output of this process is image.

5. Morphological process: morphological processing deals with tools for extracting components (attributes) that are useful in the representation and description of the shape. E.g. shape detection, contour detection, line detection, etc. output of this processing is generally attributes.

6. Segmentation: Is the process of partitioning of an image into its component or object. Segmentation is most difficult task in the digital image processing. Rough segmentation

20

process brings the process long way towards the successful solution. The more accurate segmentation the more likely recognition is to be succeeded. The output of this process is also attributes.

7. Representation and description (Labeling): It almost always follow the output of segmentation state, which usually is a raw pixel data, consisting either boundary of the region (the set of pixels separating one image segment from another) or all the points in the region itself. The decision that must be made is weather the data should be represented as boundary or complete region. When focus is on internal property complete region is suitable. When focus is in external shape boundary is suitable. Choosing representation is only part of solution for transforming raw data into a form suitable for subsequent computer processing.

Description is also called feature selection.

8. Recognition (matching): Recognition is the process that assigns a label in object based on its descriptions. It is the process of classify the object according to the attributes extracted.

Unit-II

Video and Animation

Video Signal Representation

• In conventional TV sets or monitors, the video signal is displayed using a CRT (Cathode Ray Tube).

• An electron beam sweeps the screen from top to bottom beam carrying

the corresponding pattern information, such as intensity in a viewed scene.

Video signal representation includes three aspects:

21

• the visual representation,

• transmission, and

• digitalization

visual representation:The main objective is to offer the viewer a sense of presence in the scene &

participation in the events portrayed.

To meet this, televised image should convey spatial & temporal content of the scence.

Important measures include:1.Vertical detail and viewing distance:

• The geometry of the TV image depends on the ratio of the picture width W to height H, generally referred to as aspect ratio. The conventional aspect ratio is 4/3.

2. Horizontal detail and picture width

22

The picture width for conventional TV is 4/3 of the picture height.

3. Total Detail Content of the Image

• vertical resolution = number of pixels in the picture height

• Number of pixels in width of picture = vertical resolution x aspect ratio

4. Perception of depth

• Perception of depth depends primarily on the angular separation

of the images received by the two eyes of the viewer.

• In flat screen of TV, a considerable degree of depth is inferred from the

perspective appearance of the object matter.

5.Luminance and Chrominance

Color vision is achieved through three signals, proportional to the relative

intensities of Red, Green and Blue light (RGB) in each portion of the

screen.

• However, during the transmission of the signals from camera to the receiver (display), a different color encoding that uses luminance and two chrominance signals are used.

6.Temporal aspects of illumination(Lighting):

In contrast to continuous pressure waves of an acoustic signal, a discrete

sequence of individual pictures can be perceived as a continuous sequence.23

To represent visual reality, two conditions must be met:

• Rate of repetition of images must be high enough to guarantee smooth

motion from frame to frame.

• The rate must be high enough so that persistence of vision extends

over intervals between discrete flashes.

7.Continuity of Motion:

• We perceive continuity of motion at any frame rate faster than 15 frames

per second.

• Smooth video motion is achieved at 30 frames per second.

Movies use 24 frames per second and have a jerkiness especially when large

objects are moving fast and close to the viewer,

The NTSC (National Television Systems Committee) standard for

motion video signal specified frame rates of 30/sec, as compared to 25/sec

for the European PAL system.

24

Video Transmission...

The oldest standard for transmission and reception of video signals is the

NTSC (National Television Systems Committee).

• To encode color, a video signal is a composite of three signals.

• For transmission purposes, a video signal consists of one luminance and

two chrominance signals.

Approaches to Color Encoding:

RGB Signal:

Consists of separate signals for red, green and blue colors. Other colors

can be coded as combination of these primary colors.

YUV Signal:

Since human perception is more sensitive to brightness than any

chrominance information, therefore a coding technique can distinguish

between luminance and chrominance.

• Instead of three separate colors, the brightness information (luminance

Y) is separated from the color information (two chrominance

25

channels U and V).

• The luminance component must always be transmitted because of the

compatibility requirements for black and white video.

The component division for YUV signal is:

Y=0.30 R + 0.59 G + 0.11 B

U= (B-Y) x 0.493

V = (R-Y) x 0.877

Any error in the resolution of the luminance Y is more important than

in the chrominance values U and V.

Therefore, the luminance values are coded using higher bandwidth than

chrominance value.

YUV is used in PAL and SECAM systems.

YIQ signal:

Is a coding similar to YUV signal and is the basis for NTSC format.

Y =0.30R + 0.59G + 0.11B

I = 0.60R - 0.28G - 0.32B

Q = 0.21R - 0.52G + 0.31B

Human eye is most sensitive to Y, next to I, next to Q.

Therefore, NTSC allocates 4 MHz of the bandwidth to Y, 1.5 MHz to I, and

0.6 MHz to Q.

26

Digitization:

• Digitization consists of sampling the gray (color) level in the picture at M

x N array of points. Then quantized samples are converted to bit streams.

• The next step in creation of digital motion video is to digitize pictures

in time and get a sequence of digital images per second that approximates

analog motion video.

27

Computer Video FormatsThe computer video format depends on the input and output devices for the motion video medium.

The output of the digitalized motion video depends on the display device.

The commonly used device is raster displays.

The video controller displays the image stored in the frame buffer, accessing the memory

through a separate Access port as often as

Raster scan rate dictates.

Here the important task is Constant refresh of the display.

28

For presentation of different colors on the screen, the system works with a Color Look Up Table(CLUT or lut).

Computer Video Controller Standards are:

• Color Graphics Adapter (CGA) has a resolution of 320x200 pixels with simultaneous presentation of 4 colors. The storage capacity per image is therefore:

320 * 200 pixels *2bits/pixel =128000 bits = 16,000 bytes

• The Enhanced Graphic Adapter

(EGA) supports a resolution of 640x350 pixels with

16-color presentation resulting in a storage capacity of 112,000 bytes per image.

640 * 350 pixels *(4bits/pixel) = 112,000 bytes.

• The Video Graphic Array (VGA)

works mostly with a resolution of 640x480 pixels and can display 256 colors simultaneously. The monitor is controlled through an RGB output. The storage capacity

per image is then 307,200 bytes.

29

640 X 480 pixels X ( 8 bits /pixel ) = 307,200 bytes

The Extended Graphics Array ( XGA)

Supports a resolution of 640 X 480 pixels and 65,000 different colors. With the resolution of 1024 X 786 pixles, 256 colors can be presented.

In this case the storage capacity per image is 8514/A adapter.

• The Super VGA (SVGA) offers resolutions up to 1024x768 pixels and color formats up to 24 bits per

pixel. The storage capacity per image is then 2,359,296 bytes.

1024 X 768 pixels X ( 24 bits/pixel = 2,359,256 bytes

Low-cost SVGA video adapters with video accelerator chips overcome the speed penalty of using a higher

resolution.

Color TV Systems

Conventional television systems:

30

The conventional television systems include:

• NTSC(National Television Systems Committee):

Developed by RCA in US:

It is the oldest and most widely used television standard.

The color carrier is used with approx. 4.429MHz .

It Uses a quadrature amplitude modulation, motion frequency (scan rate) of 30 Hz, total channel bandwidth of 6 MHz and 525 lines per picture. (Used in North and

South America and Japan)

For NTSC television , 4.2MHz for luminance and 1.5 MHz for each of the two chrominance channels.

However for Home Tv’s and VCR’s only 0.5 MHZ for chrominance channels.

• SECAM (Sequential Color avec Memoire): is a standard used in France and Eastern Europe,uses

frequency modulation, it uses motion frequency of 25 Hz, and each picture has 625 lines per picture.

31

• PAL: (Phase Alternating Line) was developed by Telefunken in 1963. Is used in Western Europe. Uses quadrature amplitude modulation, uses a total

bandwidth of 8MHz and 625 lines per picture.

• HDTV: Started with the MUSE system in Japan. In North America the efforts have been toward compatibility with NTSC.

High Definition TV

In 1995, the Advanced Television Systems Committee proposed the

ATSC Digital Television Standard that covers HDTV.

The standard recommends an aspect ratio of 16/9, 1080 vertical lines, 1920 horizontal pixels, and

interlaced scan rate of 60. The HDTV format will be a full digital Solution.

Computer-Based Animation

• Introduction• Producing an Animation• Specifications of Animations• Methods of Controlling Animations

32

• Display of Animations• Transmission of Animations

Introduction:

An animation covers all changes that have a visual effect . visual effects can be of different nature.

They might include time-varying positions (Motion Dynamics),shape,color,structure and texture of an object(update Dynamics) and changes in lighting,camera, position , orientation and focus.

• Computer-based animations are produced, edited and generated with the help of graphical tools to create visual effects

– Multimedia API’s• Java3D

– Constructs and renders 3D graphics– Provides a basic set of object primitives (cube,

splines,...etc.)– An abstraction layer built on top of DirectX or

OpenGL33

• DirectX– Windows API that supports video, images, audio,

and 3D animation– Most widely used for Windows-based animations

(video games)• OpenGL

– Most popular 3D API in use today– Highly portable

Basic concepts( Producing an Animation):

• Input Process– Drawings must be digitized or generated

• Digitizing photos or drawings may require post-processing (filtered) in order to remove any glitches(defects)

34

• Composition Stage– Individual frames in a completed animation are generated by us-

ing image composition techniques to combine foreground and background elements

– Trailer film(pencil test) is generated from placing low-resolution digitized frames in a grid.

• InBetween Process– Interpolation methods are used to animate the movement from

one position to another.• Linear interpolation (lerping) is the simplest but the most

limited– E.g. the interpolation of animating throwing a ball us-

ing three points

• Splines can be used to smoothly vary different parameters as a function of time, yet the problem is not completely solved (very complex)

• Changing Colors• Uses the Color LookUp Table (CLUT) or (LUT) of the graphics

memory and the double buffering method35

• Two parts of a frame are stored in different areas of graphic memory.

• The graphic memory is divided into two fields, each having half as many bits per pixel.

• The animation is generated by manipulating the CLUT

Animation Languages(Specification of Animations)

• Formal specifications that describe animations can be divided into three categories:

– Linear-List Notations– High-Level Programming Language Notations– Graphical Languages–

Linear List Notations:

• Each event is described by a beginning frame number, an end frame number and an action event that is to be performed.

– Action events may accept input parameters36

• For example 42, 53, B, ROTATE “PALM”, 1, 30

– This instruction means......• SCEne Format (Scefo) specification can be considered a superset of

linear sets including groups and object hierarchies as well as transfor-mation abstractions using high-level languages constructs.

(General purpose Languages) High-Level Programming Languages Notations:

• Values of variables can be used as parameters for animation routines.• For example, ASAS is a LISP extension that includes primitives such

as vectors, colors, polygons, surfaces, groups, points of view, sub-worlds, and lighting aspects in addition to geometrical transformations operating on objects

– For example(grasp my-cube); cube is current object

(cw 0.05); small clock-wise rotation

(grasp camera); camera is current object

(right panning-speed); Move it to the right

Graphical Languages

• Graphical actions cannot be easily described by or understood from textual scripts.

• Hence, graphical animation languages describe animations in a visual manner.

37

• GENESYS, DIAL and S-Dynamics System are examples of such sys-tems.

Methods of Controlling Animations:

Controlling animation is independent of the language used for describing it. It can use different techniques.

• Full Explicit Control (Explicitly Declared)• Procedural Control• Constraint-Based Systems• Tracking Live Action (Analyzing Live Action-Based )• Kinematics and Dynamics

Full Explicit Control (Explicitly Declared):

• All events that could occur in an animation are declared. This can be done at the

– object level by specifying simple transformations (translations, rotations, scaling) to objects

38

– frame level by specifying key frames and methods for interpolat-ing(insert) between them.

Procedural Control:

• Based on communication among different objects whereby each object obtains knowledge about the static/dynamic properties of other objects.

– Can be used to ensure consistency• For example ....

Constraint-Based Systems:

• Many objects movements in the real world are determined by other ob-jects which they come in contact with

– E.g. presence of strong wind or fast moving large objects• Instead of explicit declaration, constraints based on the environment

can be used to control objects’ motion.• Example Systems: Sketchpad and ThingLab.

Tracking Live Action (Analyzing Live Action-Based):

• Control is achieved by examining the motions of objects in the real world.

– Rotoscoping: is a technique where animators trace live action movement, frame by frame, for use in animated films.

• Originally, pre-recorded live-film images were projected onto a frosted glass panel and redrawn by an animator.

– This projection equipment is called a Rotoscope.• Another way is to attach indicators to key points on the body of a hu-

man actor.

39

– For example the data glove [gesture language for hearing-im-paired people.

Kinematics and Dynamics:

• Kinematics refer to the position and velocity of points– “The cube is at the origin at time t = 0. Thereafter, it moves with

constant acceleration in the direction (1 meter, 1 meter, 5 meters)”

• Dynamics takes into account the physical laws that govern kinematics– Newton laws for the movement of large objects– Euler-Lagrange equations for fluids– A particle moves with an acceleration proportional to the forces

acting on it.– For example: “At time t = 0, the cube is at position (0 meter, 100

meter, 0 meter). The cube has a mass of 100 grams. The force of gravity acts on the cube.”

40

Display of Animation:

• To display animations with raster systems, the animated objects must be scan-converted and stored as pixmap in the frame buffer.

– Scan conversion must be done at least 10 times per second to en-sure smooth visual effects.

• The actual scan-conversion must take a small portion of 10 times/second in order to avoid distracting ghost effect

• Double buffering is used to avoid the ghost effect

Transmission of Animation:

• Two forms of transmission– Symbolic representation of an animation is transmitted together

with the operations performed on the object.• The receiver displays the animation.

– Transmission is fast since text is much smaller than pixmaps

– Display is slow since the pixmap has to be generated from their descriptions.

– The pixmap representations are transmitted and displayed• Transmission time is longer.• Display is faster.

41

Data CompressionOn compressed graphics, audio and video data require considerable storage capacity which in the case of uncompressed video is not often feasible in today's CD technology. The same is for multimedia communication. The data transfer of uncompressed video data over digital network requires very high bandwidth to be provided for a single point to point communication. To provide visible and cost effective solution, most multimedia system handle compressed digital and audio stream data.

Coding requirement: Storage size and bandwidth minimization

1. Entropy Coding: Run length coding Huffman coding Arithmetic coding

2. Source coding: Prediction coding

- DPCM- DM

Transformation- FFT (Fast Fourier Transformation)- DCT (Discrete Cosine Transform)

i. Layered Coding Bit position Sub sampling Sub-band coding

ii. Vector Quantization3. Hybrid coding

- JPEG- MPEG- H.261- DVI- PLV

Entropy Coding: It is used regardless of media specific characteristics. The data stream to be compressed is considered to be a simple digital sequence and the semantics of the data is ignored. Entropy coding is an example of the lossless encoding as the decompression process regenerates the data completely. E.g. are Run length coding, Huffman coding, arithmetic coding, etc.Source coding: Source coding take into an account the semantics of the data. The degree of compression that can be reached by source coding depend on the data contain. In the case of source coding a one way relation between the original data stream and encoded data streams exists. The data streams are similar but not identical. E.g.: delta pulse code modulation, delta modulation, etc.

Hybrid Coding: It is the compression technique. That is the well known algorithm and transformation technique. E.g.: JPEG coding, MPEG coding, etc.

42

UncompressedPicture Compressed data

Major steps of data compression:

fig: Major steps of data compression

1. Picture preparation: this includes analog to digital conversion and generating a appropriate digital representation of the information. An image is divided into block of 8 X 8 pixels and represented by fixed number of bits/pixel

2. Picture processing: Processing is the actually the first step of compression process which makes use of sophisticated algorithm of transformation from time to frequency domain. It can be performed using DCD(Discrete Cosine transform).

3. Quantization Process: Quantization process the result of the previous step. It specifies the mapping of the real number into integer. This can be considered as the equivalent of µ-law and A- law which apply the audio data. Quantization is performed using the different number of bits per coefficient.

4. Entropy Coding: Entropy coding is usually last step of data compression. It compresses a sequence of digital stream without loss. For e.g. a sequence of zero's in the data stream can be compressed by specifying the number of occurrence followed by zero it.

The processing and quantization cab be repeated iteratively several times in feedback loop. The term spatial domain (time domain) refer to the image plane itself and approaches in this category are based on discrete manipulation of pixel in an image.Frequency domain processing technique is based on modifying the Fourier transform of an image.

Some basic compression Techniques:1. Runlength CodingSample image, audio and videos data stream often contain sequence of same bytes by replacing these repeated byte sequence with the number of occurrence, a substantial reduction of data can be achieved this is called Run length coding. Which is indicated by special flag called? Which does not occur as part of data stream? If a byte occur at least four consecutive times, the number of occurrence is counted. The compressed data contain these byte followed by the special flag and number of its occurrence. It converts byte between four and 259 to into three byte.Uncompressed data: A B C C C C C C C D D D E E E ECompressed data: A B C! 7 D D D E! 4

Run length encoding is the generalization of the zero suppression which assumes that just one symbol appears particularly often in sequence. The blank in the text such as symbol or pair of blanks are ignored. Starting with a sequence of three blanks, they are replaced by an M-byte and a byte that specifies the number of blanks of this sequence. Sequence of three to a maximum 258 bytes can be reduced to 2 bytes.Diatomic encoding the variation of run length coding based on combination of two bytes. This technique determines most frequently occurring pair of bytes.

43

Picture

preparation

Picture

Processing

Quantization Entropy

Coding

2. Huffman Coding:Huffman coding is one type of entropy coding where given character most be encoded together with the probability of their occurrence. The Huffman Coding Algorithm determines the optimal code using the minimum number of bits. The length (number of bits) of the coded character will be differing. To determine Huffman code, it is useful to construct a binary tree. The leaves (nodes) of the tree represent the characters that are to be encoded. Every nodes contains the occurrence of probability 0 and 1 are assigned to the branches of the tree. Every character has associated weight equal to number of times the character occurs in a data stream. (page no: 124)

P(A) = 0.16P(B) = 0.51P(C) = 0.09P(D) = 0.13P(E) = 0.11

Gray Level Probabilitya1 0.4a2 0.3a3 0.1a4 0.1

44

a5 0.06a6 0.04

3. Arithmetic Coding: Arithmetic coding is one type of the Entropy coding between the code symbol and code word doesn't exist because it doesn't encode each symbol separately. Each symbol is instead coded by considering the period data. Therefore coded data stream most always be read form beginning. Random access is not available. In practice the average compression rate achieved by arithmetic and Hoffman coding are similar. 4. Transformation Encoding: Data is transformed into another mathematical domain suitable for compression. The inverse transformation most exists and is known to the encoding process. The most widely known example is the Fourier Transformation which transforms data from the time into frequency domain. The most effective transformation for image compression is discrete cosine transformation and fast Fourier transformation.

Chapter 7Multimedia Operating system

Multimedia operating system is the system software that handles multimedia data and multimedia devices.

Important issues in Multimedia Operating System:1. Appropriate scheduling method should be applied: In contrast to the traditional real time

operating system, multimedia operating systems also have to consider task without hard timing restriction under the aspects of fairness.

2. Communication and synchronization between processes must meet the restriction of the real-time requirement and timing relations between different Medias.

3. In multimedia system, memory management has to provide access to data within a guaranteed timing delay and efficient data manipulation function.

4. Data management is a important component in multimedia system, however database management system abstracts the details of storing data on the secondary media storage. Therefore database management should depend on file management services provided by multimedia operating system.

Real-time system:Real time process is the process which delivers the result of processing in a given time. Main characteristics of real time system are the correctness of computation and fixed response time. Deadline represent the latest acceptable time for the presentation of the processing result. Real time system has both hard and soft deadline. Soft deadline is the type of deadline which in some cases is missed and may be tolerated as long as.

45

Hard deadline should never be violated. Hard deadline violation is the system failure.

Characteristics of Real time system:1. Predictably fast response to time critical event and accurate timing information.2. High degree of schedulability :

Schedulability refers to the degree of resource utilization at which or below which deadline of each time critical task can be taken into account. Under system overload, processing of the critical task most be done.

3. Management of manufacturing process and control of the military system are the application area of real time system.

Real-time and Multimedia System:1. A piece of music most be played back at a constant speed.2. To fulfill the timing requirement of the continuous media, the operating system most use real

time scheduling techniques.3. The real-time requirements of traditional real-time scheduling techniques and control system in

application areas such as factory automation, air craft piloting have high demand of security and fault tolerance.

4. The requirement desire from this demand somehow differentiates real time scheduling efforts applied to continuous media.

5. Multimedia system uses the different scenario then traditional real time operating system in real time requirements.

The fault tolerance requirement of multimedia system is usually less strict then those of real time system:

- Short time failure of continuous media system will not directly lead to the destruction the technical equipment or threat to human life.

- For many multimedia system applications missing a deadline is not a server failure. If the video frame is not available, it is omitted, the viewer hardly notices.

- Schedulability considerations for periodic task are much easier. Multimedia data processing operations are periodic.

- The bandwidth demand for continuous media is not always that strict.

Fig: Process State.

46

Running

BlockedReady

3

32

1

4

Phase of resource Reservations and Management:1. Schedulability test2. Quality of service calculation3. Resource reservation4. Resource scheduling

Real-time Scheduling algorithm:1. Earliest Deadline first2. Rate Monotonic Algorithm3. Shortest Job First

Behavior of Periodic Task:

Periodic task

Time constraints of the periodic task T are characterized by the following parameter (S, e, d, p)S: starting pointE: processing time of TP: Period of TR: rate of T=1/pWhere0 ≤ e≤ d ≤ p

Where starting point S is the first time when the periodic task requires processing. Afterward it requires processing in every period with the processing time "e". A S+ (k-1)*p is the time for the task T is ready for k processing. The processing of periodic task a period k must be finished at S+ (k-1)*p+d. For continuous media task it is assumed that the deadline of the period k-1 is the ready time for period k. This is known as congestion avoiding deadline. The deadline for each message with the period of respected periodic task p.

47

d

p

e

s

Resource: A resource is a system entity required by the task for manipulating task. Each resource has a set of …characteristics and classified using the following scheme:

1. Resource can be active:Active resource is those resources which often exclusively used in one process. It cannot be shared at a fraction of time. e.g.: CPU,

2. Resources can be passive:Passive resources are those resources which can be shared among the different processes at a fraction of time. There is always scarcity of resources in the computer technology. So managing the resources is the important task of Operating System.

Fig: Windows of insufficient resources.

Multimedia system with indicated audio and video processing is at the limit of their capacity, even with data compression and utilization of new technology. Current computer do not allow processing of data according to their deadlines without any resource reservation and real time process management. Processing in this context refer to any kind of manipulation and communication of data. This stage of development is known as the window of insufficient resources which is shown in the above figure.

Computer TechnologyCommunication Architecture:

1. Hybrid Communication

48

Hardware Resources

Insufficient

Resource

Sufficient Score Resources

Abundant

Resources

Interactive Video

High Quality

Audio

Network File

Access1980 1990 2000

Requirement

Integrated device control Integrated-transmission control Integrated transmission

2. Digital System

49

Multimedia Systems NOTES

Documents

Transcript of Multimedia Systems NOTES