Final Exam Multimedia Communications


M. Arif Nugroho 2101120003

1. H.264

a. H.264 Basic Process

H.264 is divided into two basic processes: encoding and decoding. The encoder carries out prediction, transform, and encoding steps to produce a compressed H.264 bitstream. An H.264 video decoder carries out the complementary processes of decoding, inverse transform, and reconstruction to produce a decoded video sequence.

Figure 1.1 H.264 encoder process

Figure 1.2 H.264 decoder process.

Encoding Process

Prediction

The encoder forms a prediction of the current macroblock based on previously coded data, either from the current frame using intra prediction or from other frames that have already been coded and transmitted using inter prediction. The encoder subtracts the prediction from the current macroblock to form a residual. The prediction methods supported by H.264 are more flexible than those in previous standards, enabling accurate predictions and hence efficient video compression.

Intra prediction uses 16 × 16 and 4 × 4 block sizes to predict the macroblock from surrounding, previously coded pixels within the same frame (Figure 1.4). The values of the previously coded neighbouring pixels are extrapolated to form a prediction of the current macroblock. Figure 1.5 shows an example: a 16 × 16 prediction block, an approximation of the original macroblock, is formed. Subtracting the prediction from the original macroblock produces a residual block (also containing 16 × 16 samples).

Inter prediction uses a range of block sizes from 16 × 16 down to 4 × 4 to predict pixels in the current frame from similar regions in previously coded frames (Figure 1.6). These previously coded frames may occur before or after the current frame in display order. In the example shown in Figure 1.6, macroblock 1 (MB1) in the current frame is predicted from a 16 × 16 region in the most recent 'past' frame. MB2 is predicted from two previously coded frames: the upper 8 × 16 block of samples, a 'partition', is predicted from a past frame and the lower 8 × 16 partition is predicted from a future frame.
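The prediction-and-subtract step can be sketched in a few lines. This is an illustrative example of DC intra prediction (one of the H.264 intra modes), shown here for a hypothetical 4 × 4 block with made-up pixel values; the function name and data are assumptions, not the standard's exact procedure.

```python
import numpy as np

# Hypothetical illustration: DC intra prediction for a 4x4 block.
# The prediction is the mean of the previously coded neighbouring
# pixels (row above and column to the left); subtracting it from
# the current block yields the residual that is then transformed.
def dc_intra_predict(above, left, block):
    dc = int(round((above.sum() + left.sum()) / (len(above) + len(left))))
    prediction = np.full(block.shape, dc, dtype=np.int32)
    residual = block.astype(np.int32) - prediction
    return prediction, residual

above = np.array([100, 102, 104, 106])   # pixels above the block
left = np.array([98, 100, 102, 104])     # pixels to the left
block = np.array([[101, 103, 105, 107],
                  [100, 102, 104, 106],
                  [99, 101, 103, 105],
                  [98, 100, 102, 104]])

prediction, residual = dc_intra_predict(above, left, block)
print(residual)  # small residual values are cheap to encode
```

Because neighbouring pixels are similar to the block, the residual values are small, which is exactly what makes prediction worthwhile before the transform stage.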

Figure 1.3 Flow Diagram

Figure 1.4 Intra Prediction

Figure 1.5 Original Macroblock, intra prediction, Residual MB


Figure 1.6 Inter Prediction

Transform and Quantization

A block of residual samples is transformed using a 4 × 4 or 8 × 8 integer transform, an approximation of the Discrete Cosine Transform (DCT). The transform outputs a set of coefficients, each of which is a weighting value for a standard basis pattern. When combined, the weighted basis patterns re-create the block of residual samples. The output of the transform, a block of transform coefficients, is quantized: each coefficient is divided by an integer value. Quantization reduces the precision of the transform coefficients according to a quantization parameter (QP). For example, the original coefficient values in Figure 1.7 are divided by a quantization step size and rounded to the nearest integer. Typically, the result is a block in which most or all of the coefficients are zero, with a few non-zero coefficients. Setting QP to a high value means that more coefficients are set to zero, resulting in high compression at the expense of poor decoded image quality. Setting QP to a low value means that more non-zero coefficients remain after quantization, resulting in better image quality at the decoder but also in lower compression.
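The divide-and-round behaviour described above can be demonstrated numerically. The coefficient values and step sizes below are invented for illustration (they are not taken from Figure 1.7), but the effect of a larger step, standing in for a higher QP, is the same: more coefficients collapse to zero.

```python
import numpy as np

# Hedged sketch of the quantization step: each transform coefficient
# is divided by a step size derived from QP and rounded, so small
# coefficients collapse to zero and compression increases with QP.
coeffs = np.array([[ 312,  45,  -8,   2],
                   [  40, -12,   3,  -1],
                   [  -7,   4,  -1,   0],
                   [   2,  -1,   0,   0]])

def quantize(coeffs, step):
    return np.round(coeffs / step).astype(int)

def rescale(quantized, step):
    return quantized * step        # decoder-side multiplication

low_qp = quantize(coeffs, 4)       # many non-zero values survive
high_qp = quantize(coeffs, 32)     # most coefficients become zero
print(np.count_nonzero(low_qp), np.count_nonzero(high_qp))
```

Note that `rescale(quantize(x, step), step)` does not recover `x` exactly: the rounding loses precision, which is why quantization is the lossy part of the codec.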

Figure 1.7 Quantization Example

Bitstream Encoding

The video coding process produces a number of values that must be encoded to form the compressed bitstream. These values include:

o Quantized transform coefficients.
o Information to enable the decoder to re-create the prediction.
o Information about the structure of the compressed data and the compression tools used during encoding.
o Information about the complete video sequence.

These values and parameters (syntax elements) are converted into binary codes using variable-length coding and/or arithmetic coding. Each of these encoding methods produces an efficient, compact binary representation of the information. The encoded bitstream can then be stored or transmitted.
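One of the variable-length codes H.264 uses for many of its syntax elements is the unsigned Exp-Golomb code, which can be sketched very compactly. The function below is a simplified illustration, not a full bitstream writer.

```python
# Sketch of unsigned Exp-Golomb coding, one of the variable-length
# codes H.264 uses for syntax elements: a value v is written as a
# run of leading zeros followed by the binary form of v + 1, so
# small (frequent) values get the shortest codewords.
def exp_golomb(v):
    code = bin(v + 1)[2:]                # binary representation of v + 1
    return "0" * (len(code) - 1) + code  # prefix with len-1 zeros

for v in range(5):
    print(v, exp_golomb(v))
# 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', 4 -> '00101'
```

The code is prefix-free, so the decoder can parse codewords from the bitstream without any separators: it counts leading zeros to learn each codeword's length.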


Decoder Process

Bitstream Decoding

A video decoder receives the compressed H.264 bitstream, decodes each of the syntax elements and extracts the information described above, i.e. quantized transform coefficients, prediction information, etc. This information is then used to reverse the coding process and recreate a sequence of video images.

Rescaling and Inverse Transform

The quantized transform coefficients are re-scaled: each coefficient is multiplied by an integer value to restore its original scale. In the example of Figure 1.8, the quantized coefficients are each multiplied by a step size of 8. The rescaled coefficients are similar but not identical to the originals (Figure 1.7).

Figure 1.8 Rescaling example

Figure 1.9 Inverse transform : combining weighted basis patterns to create a 4 x 4 image block.

An inverse transform combines the standard basis patterns, weighted by the re-scaled coefficients, to re-create each block of residual data. Figure 1.9 shows how the inverse DCT or integer transform creates an image block by weighting each basis pattern according to a coefficient value and combining the weighted basis patterns. These blocks are combined together to form a residual macroblock.
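The "weighted basis patterns" idea can be made concrete with a small sketch. This uses the floating-point DCT-II basis rather than H.264's exact integer transform, so it illustrates the principle, not the standard's arithmetic; the coefficient value is invented.

```python
import numpy as np

# Illustrative sketch (not the exact H.264 integer transform): a 4x4
# inverse DCT expressed as combining basis patterns, each weighted by
# its coefficient, to re-create the residual block.
N = 4
# Orthonormal DCT-II basis vectors
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

def inverse_transform(coeffs):
    block = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            basis = np.outer(C[u], C[v])      # one 4x4 basis pattern
            block += coeffs[u, v] * basis     # weight and accumulate
    return block

coeffs = np.zeros((N, N))
coeffs[0, 0] = 40.0                            # DC coefficient only
print(inverse_transform(coeffs))               # flat block of 10s
```

With only the DC coefficient set, the reconstruction is a flat block, which matches the intuition that the DC basis pattern is constant and higher-frequency patterns add detail.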


Reconstruction

For each macroblock, the decoder forms a prediction identical to the one created by the encoder, using inter prediction from previously decoded frames or intra prediction from previously decoded samples in the current frame. The decoder adds the prediction to the decoded residual to reconstruct a decoded macroblock, which can then be displayed as part of a video frame (Figure 1.10).
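The reconstruction step itself is a simple addition with clipping, as the following minimal sketch shows (the pixel values are invented; clipping to the 8-bit range is assumed, as is usual for 8-bit video).

```python
import numpy as np

# Sketch of the reconstruction step: the decoded residual is added
# back to the prediction, and the result is clipped to the valid
# 8-bit pixel range before display.
def reconstruct(prediction, residual):
    return np.clip(prediction.astype(int) + residual, 0, 255).astype(np.uint8)

prediction = np.full((4, 4), 102, dtype=np.uint8)   # decoder's prediction
residual = np.array([[-1, 1, 3, 5],
                     [-2, 0, 2, 4],
                     [-3, -1, 1, 3],
                     [-4, -2, 0, 2]])               # decoded residual
print(reconstruct(prediction, residual))
```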

Figure 1.10 Reconstruction flow diagram

2. a. Advantages and disadvantages of TCP/UDP video streaming.

UDP

Disadvantages:

Complex error handling mechanism: UDP is an unreliable protocol, so packets may be lost during transit. To offer good-quality video, these losses have to be mitigated; retransmission, forward error correction and error concealment are techniques which may be used.

Network unfriendliness: UDP transmission is not elastic and hence not TCP-friendly. As a result, it either takes unfairly too much bandwidth or leads to high packet loss in the presence of fluctuating bandwidth.

Unselective data loss: in a video stream, some frames and some data fields are more important than others and need to be protected. Since wireless errors can occur at any time, these important data may be lost, leading to degradation in quality. If those more important frames or data fields could be selectively protected, better video quality would be achieved.

Firewall penetration: although some protocols make use of UDP (STUN, SIP, RTP), applications using UDP experience more firewall penetration problems than TCP.

Advantages: the UDP protocol is fast, and no error checking is required.

TCP

Disadvantages:

High complexity of the TCP protocol.

Advantages:


Reliable transmission: TCP is a reliable protocol and hence effectively addresses the synchronization and retransmission problems mentioned above. There is no need for the complex error concealment and resilience mechanisms which would otherwise have to be implemented in the client and proxy.

Network fairness: TCP is intrinsically friendly, sharing network resources with other data traffic/flows in the presence of congestion. There is no need to implement other mechanisms to achieve fairness. It also adapts its transmission rate according to the available network bandwidth, thereby allowing video applications to make full use of the bandwidth.

Ease of deployment: using TCP in applications is easy, and TCP applications more readily penetrate firewalls.

b. UDP uses a checksum for error detection while transporting data packets over a network. The sender's UDP performs the one's complement of the sum of all the 16-bit fields in the UDP header. The checksum is also calculated over a few of the fields in the IP header in addition to the UDP header. The computed result is stored in the checksum field of the UDP header; if the computed checksum is zero, this field must be set to 0xFFFF. The destination computer passes an incoming IP datagram to UDP if the value in the type field of the IP header indicates UDP. When UDP receives the datagram from IP, it examines the UDP checksum: all 16-bit fields in the UDP header are added together, including the checksum. If this sum equals 1111111111111111, the datagram has no errors. If one of the bits in the computed sum is zero, it indicates that the datagram was inadvertently altered during transmission and thus has errors. If the checksum field in the UDP header is zero, it means that the sender did not calculate a checksum and the field can be ignored. If the checksum is valid and nonzero, UDP at the destination computer examines the destination port number and, if an application is bound to that port, the datagram is transferred to an application message queue to buffer the incoming datagrams before transferring them to the application. If the checksum is not valid, the destination computer discards the UDP datagram.
Example: a UDP header has three fields that contain the following 16-bit values: 0110011001100101, 0101010101010110, and 0000111100001111. The checksum can be calculated as follows. The first two 16-bit values are added:

0110011001100101 + 0101010101010110 = 1011101110111011

Adding the third 16-bit value to this sum gives:

1011101110111011 + 0000111100001111 = 1100101011001010

The 1's complement of the sum 1100101011001010 is 0011010100110101, so the checksum computed by the sender's UDP is 0011010100110101. At the destination computer, the values of all four 16-bit fields (source and destination ports, length and checksum) are added. If no errors were introduced in the datagram, the sum at the receiver will be 1111111111111111; if one of the bits is a zero, an error has been detected.
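The worked example above can be reproduced programmatically. This sketch applies the one's-complement sum (with end-around carry, which this particular example never triggers) to the same three 16-bit values.

```python
# Sketch of the one's-complement checksum from the example above,
# using the same three 16-bit field values.
def ones_complement_sum(words):
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold in end-around carry
    return total

fields = [0b0110011001100101, 0b0101010101010110, 0b0000111100001111]
checksum = ~ones_complement_sum(fields) & 0xFFFF
print(format(checksum, '016b'))  # 0011010100110101

# Receiver side: adding all fields including the checksum gives all ones.
assert format(ones_complement_sum(fields + [checksum]), '016b') == '1' * 16
```

The final assertion is exactly the receiver's validity test from the text: the sum of every 16-bit field plus the checksum must be 1111111111111111.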

3. a. JPEG/MPEG-1/2 use fixed-size block matching (FSBM), while MPEG-4 uses variable-size block matching (VSBM).

b. VSBM is an improved version of FSBM that varies the size of blocks to more accurately match moving areas. This method was proposed by Chan, Yu, and Constantinides. VSBM is a scheme that starts with relatively large blocks, which are then repeatedly divided; this is a so-called top-down approach. If the best matching error for a block is above some threshold, the block is divided into four smaller blocks, until the maximum number of blocks or locally minimum errors are obtained. The application of such top-down methods may generate block structures for an image that match real moving objects, but it seems that an approach which more directly seeks out areas of uniform motion might be more effective. For the same number of blocks per frame as FSBM, the VSBM method results in a smaller mean square error (MSE), or better prediction. More significantly, for a similar level of MSE as FSBM, the VSBM technique can represent the inherent motion using fewer blocks, and thus a reduced number of motion vectors.
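The top-down splitting rule can be sketched as a small recursion. For simplicity this compares each block against the collocated block in a reference frame (a stand-in for a real motion search), and the threshold, block sizes and test data are all invented for illustration.

```python
import numpy as np

# Hedged sketch of top-down variable-size block matching: start with a
# large block and recursively split it into four quadrants whenever the
# best matching error (here, MSE against the collocated reference block,
# a stand-in for a real motion search) exceeds a threshold.
def split_blocks(cur, ref, x, y, size, threshold, min_size, out):
    block = cur[y:y + size, x:x + size]
    match = ref[y:y + size, x:x + size]
    mse = np.mean((block.astype(float) - match) ** 2)
    if mse <= threshold or size <= min_size:
        out.append((x, y, size))               # accept this block as-is
    else:
        h = size // 2                          # split into four quadrants
        for dy in (0, h):
            for dx in (0, h):
                split_blocks(cur, ref, x + dx, y + dy, h,
                             threshold, min_size, out)
    return out

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (16, 16))
cur = ref.copy()
cur[:8, :8] += 50                               # only one quadrant changed
blocks = split_blocks(cur, ref, 0, 0, 16, 1.0, 4, [])
print(blocks)
```

Only the changed quadrant keeps splitting down to the minimum block size; the unchanged quadrants are accepted as large blocks, which is the mechanism that lets VSBM spend motion vectors only where the motion actually is.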

4. a. Chroma subsampling benefits: chroma subsampling is a technique used to reduce bandwidth in many video systems. Since the human visual system is not very sensitive to colour, colour resolution can be reduced to lower the bandwidth; video systems do this via chroma subsampling.

b. Artefacts found in MPEG-encoded video:

Aliasing: occurs when a signal being sampled contains frequencies that are too high to be successfully digitized at a given sampling frequency. When sampled, these high frequencies fold back on top of the lower frequencies, producing distortion. In most methods of video digitizing, this will produce pronounced vertical lines in the picture. This problem can be reduced by applying a low-pass filter to the video signal before it is digitized, to remove the unwanted high-frequency components.
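The fold-back effect is easy to demonstrate with one-dimensional samples; the frequencies below are chosen arbitrarily for illustration.

```python
import numpy as np

# Sketch of aliasing: a 700 Hz sine sampled at 1000 Hz (below its
# Nyquist rate of 1400 Hz) produces exactly the same samples as a
# folded -300 Hz sine (700 - 1000), i.e. the high frequency folds
# back on top of a lower one and the two become indistinguishable.
fs = 1000.0
t = np.arange(0, 0.01, 1 / fs)
high = np.sin(2 * np.pi * 700 * t)
alias = np.sin(2 * np.pi * -300 * t)   # folded frequency: 700 - 1000
print(np.allclose(high, alias))        # True
```

This is why the low-pass (anti-aliasing) filter must be applied *before* sampling: once the samples are taken, the 700 Hz and 300 Hz components cannot be told apart.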

Quantisation noise: this form of distortion occurs because, when digitized, the continuously variable analogue waveform must be quantized into a fixed, finite number of levels. It is the coarseness of these levels that causes quantisation noise. A 24-bit colour picture suffers from virtually no quantisation noise, since the number of available colours is so high (16.7 million). Reasonable results can be obtained from an 8-bits-per-pixel picture, especially if the picture is greyscale rather than colour.
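The link between level coarseness and error can be shown by re-quantising a simple greyscale ramp; the level counts below are illustrative.

```python
import numpy as np

# Sketch of quantisation noise: re-quantising an 8-bit greyscale ramp
# to fewer levels. The maximum error is half a quantisation step, so
# the coarser the levels, the larger the noise.
def requantize(pixels, levels):
    step = 256 / levels
    return np.floor(pixels / step) * step + step / 2  # mid-level value

ramp = np.arange(256, dtype=float)
err_16 = np.abs(requantize(ramp, 16) - ramp).max()    # 16 levels: coarse
err_128 = np.abs(requantize(ramp, 128) - ramp).max()  # 128 levels: fine
print(err_16, err_128)   # coarser quantisation -> larger error
```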

Overload: like quantisation noise, overload is related to the finite number of levels that the signal can take. If a signal is digitized that is too high in amplitude, then the picture will appear bleached. For example, if the signal level of a greyscale image is too high for the conversion process to cope with, then all levels above the maximum will be converted to white, causing the washed-out appearance.

Video signal degradation: video in digital form degrades far less gracefully than its analogue counterpart. While digital information may in theory be duplicated an infinite number of times without any degradation, once degradation does occur it is very noticeable. Due to the compression techniques used, a single bit error in the data stream could, for example, cause a large block of pixels to be displayed in a completely different colour to that intended.

Gibbs effect: this is most noticeable around artificial objects such as plain-coloured large text and geometric shapes such as squares. It shows up as a blurring or haze around the object, where the sudden transition is made from the artificial object to the background. It is caused by the discrete cosine transform used to encode chrominance and luminance information. This phenomenon is also apparent around more natural shapes like a human figure.

Blockiness: when video footage involving high-speed motion is digitized, the individual 8 × 8 blocks that make up the picture become more pronounced.

5. PRISM Codec Improvements. There are newly proposed processes in PRISM decoding and encoding:

a. Syndrome coding: syndrome coding is introduced to make it possible to exploit the temporal redundancy at the decoder. This module can achieve compression without knowledge of the exact difference between the current video block and the best predictor created at the decoder based on the previously decoded frame. The compression is achieved through the careful selection of the quantized DCT coefficient bitplanes that are transmitted to the decoder. So, the syndrome encoder selects the bitplanes that will be transmitted to the decoder and those that will be estimated based on good temporal redundancy.

b. Block classification: the number of bitplanes coded for a given DCT coefficient should increase with the increase in the difference between the current block and the best predictor created at the decoder based on the previously decoded frame; this difference corresponds to the correlation noise. However, as the encoder cannot rely on motion estimation, the correlation noise has to be predicted somehow, based on the available information from the previous frame. The approach is to classify each block to be coded into one of 16 correlation classes, [0;15], relying on the difference between the current block and the collocated one in the previous frame, measured through the MSE. Blocks classified as 0 are skipped (no bits are transmitted, since there is very high correlation) and those classified as 15 are intra-encoded, since it is considered that there is not enough temporal redundancy to be exploited.
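The classification step can be sketched as mapping an MSE to one of the 16 classes. The thresholds below are entirely made up (the text does not specify them); only the structure — MSE against the collocated block, class 0 meaning skip and class 15 meaning intra-code — follows the description.

```python
import numpy as np

# Hedged sketch of PRISM-style block classification: the MSE between
# the current block and the collocated block in the previous frame is
# mapped to one of 16 classes via illustrative (made-up) thresholds.
THRESHOLDS = [2 ** k for k in range(15)]    # 15 hypothetical class bounds

def classify(current, collocated):
    mse = np.mean((current.astype(float) - collocated) ** 2)
    cls = int(np.searchsorted(THRESHOLDS, mse, side='right'))
    return cls                               # 0 = skip, 15 = intra-code

prev = np.zeros((8, 8))
print(classify(prev, prev))                  # identical block -> class 0
print(classify(prev + 200, prev))            # huge change -> class 15
```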

c. Cyclic redundancy coding: the decoder side information is constituted by multiple video block candidates, corresponding to motion-compensated blocks from the previously decoded frame. While any of the side information candidates can help disambiguate the information received from the syndrome encoder, not all can do it correctly, because some of them are too different from the original block. To detect that a block has been successfully decoded, a 16-bit CRC, playing the role of a block hash/signature, is transmitted to the decoder.
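The candidate-selection role of the CRC can be sketched as follows. Here `zlib.crc32` stands in for the 16-bit CRC the codec actually transmits (the text does not give the polynomial), and the byte values are invented.

```python
import zlib

# Sketch of the CRC check used to pick the correct side-information
# candidate: each decoding attempt is verified against the transmitted
# block hash; only the candidate matching the encoder's block passes.
def block_hash(block: bytes) -> int:
    return zlib.crc32(block)    # stand-in for the codec's 16-bit CRC

original = bytes([10, 20, 30, 40])          # block as seen at the encoder
transmitted_crc = block_hash(original)

candidates = [bytes([9, 21, 30, 40]),       # candidate predictors yield
              bytes([10, 20, 30, 40]),      # different decoded blocks;
              bytes([10, 20, 31, 40])]      # only one matches the CRC
decoded = next(c for c in candidates if block_hash(c) == transmitted_crc)
print(decoded == original)                  # True
```

The decoder simply tries candidates until the hash matches, which is how it resolves the ambiguity the syndrome bits alone leave open.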
