2D Discrete Wavelet Transform PDF


Summary

This document provides lecture notes on the 2D Discrete Wavelet Transform and image compression standards (JPEG and JPEG2000), followed by basic video compression techniques (motion compensation, H.261, H.263) and the MPEG family of standards (MPEG-1, MPEG-2, MPEG-4/H.264, MPEG-7, MPEG-21).

Full Transcript


2D Discrete Wavelet Transform

For an N x N input image, the 2D DWT proceeds as follows:
-- Convolve each row of the image with h0[n] and h1[n], discard the odd-numbered columns of the resulting arrays, and concatenate them to form a transformed row.
-- After all rows have been transformed, convolve each column of the result with h0[n] and h1[n]. Again discard the odd-numbered rows and concatenate the result.

After the above two steps, one stage of the DWT is complete. The transformed image now contains four subbands, LL, HL, LH, and HH, standing for low-low, high-low, etc. The LL subband can be further decomposed to yield yet another level of decomposition. This process can be continued until the desired number of decomposition levels is reached. (A small code sketch of one stage appears after the quantization discussion below.)

NET 4007 Multimedia Networking: Image Compression Standards

The JPEG Standard

JPEG is an image compression standard that was developed by the "Joint Photographic Experts Group". JPEG was formally accepted as an international standard in 1992.

JPEG is a lossy image compression method. It employs a transform coding method using the DCT (Discrete Cosine Transform). An image is a function of i and j (or conventionally x and y) in the spatial domain. The 2D DCT is used as one step in JPEG in order to yield a frequency response F(u, v) in the spatial frequency domain, indexed by two integers u and v.

Observations for JPEG Image Compression

The effectiveness of the DCT transform coding method in JPEG relies on three major observations:

Observation 1: Useful image contents change relatively slowly across the image, i.e., it is unusual for intensity values to vary widely several times in a small area, for example, within an 8 x 8 image block.
-- Much of the information in an image is repeated, hence "spatial redundancy".

Observation 2: Psychophysical experiments suggest that humans are much less likely to notice the loss of very high spatial frequency components than the loss of lower frequency components.
-- The spatial redundancy can be reduced by largely reducing the high spatial frequency contents.

Observation 3: Visual acuity (accuracy in distinguishing closely spaced lines) is much greater for gray ("black and white") than for color.
-- Chroma subsampling (4:2:0) is used in JPEG.

Block Diagram for JPEG Encoder

[Figure: block diagram of the JPEG encoder]

Main Steps in JPEG Image Compression

1. Transform RGB to YIQ or YUV and subsample color.
2. DCT on image blocks.
3. Quantization.
4. Zig-zag ordering and run-length encoding.
5. Entropy coding.

DCT on Image Blocks

Each image is divided into 8 x 8 blocks. The 2D DCT is applied to each block image f(i, j), with the output being the DCT coefficients F(u, v) for each block. Using blocks, however, has the effect of isolating each block from its neighboring context. This is why JPEG images look choppy ("blocky") when a high compression ratio is specified by the user.
http://www.sfu.ca/~cjenning/toybox/hjpeg/index.html

Quantization

F(u, v) represents a DCT coefficient, Q(u, v) is a "quantization matrix" entry, and the quantized DCT coefficient, which JPEG will use in the succeeding entropy coding, is

    F^(u, v) = round( F(u, v) / Q(u, v) )

The quantization step is the main source of loss in JPEG compression. We can handily change the compression ratio simply by multiplicatively scaling the numbers in the Q(u, v) matrix.
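Here is the sketch of one 2D DWT stage promised at the start of this section, as a minimal Python/numpy illustration of the row/column filtering-and-downsampling procedure. The Haar filter pair and the function name are our own illustrative choices, not from the notes.

```python
import numpy as np

def dwt_2d_one_stage(image, h0, h1):
    """One stage of the 2D DWT as described above: filter and downsample
    the rows, then the columns. h0/h1 are the lowpass/highpass analysis
    filters; the Haar pair is used in the example call below."""
    def analyze(rows):
        # convolve each row with h0 and h1, keep the even-numbered samples
        # (discarding the odd-numbered ones), and concatenate the halves
        low = np.array([np.convolve(r, h0, mode="same")[::2] for r in rows])
        high = np.array([np.convolve(r, h1, mode="same")[::2] for r in rows])
        return np.hstack([low, high])

    tmp = analyze(image)        # transform every row
    out = analyze(tmp.T).T      # then every column
    n = image.shape[0] // 2     # subbands: LL, HL (top), LH, HH (bottom)
    return out[:n, :n], out[:n, n:], out[n:, :n], out[n:, n:]

# Example with the Haar filter pair on a random 8 x 8 "image":
s = 1 / np.sqrt(2)
LL, HL, LH, HH = dwt_2d_one_stage(np.random.rand(8, 8), [s, s], [s, -s])
```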
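To make the quantization formula above concrete, here is a small Python sketch. The 8 x 8 table shown is the widely published default JPEG luminance quantization table; the single multiplicative `scale` parameter is a simplified stand-in for the quality-factor-to-scaling mapping that real encoders use.

```python
import numpy as np

# Default JPEG luminance quantization table.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F, scale=1.0):
    """Quantize an 8x8 block of DCT coefficients:
    F^(u, v) = round(F(u, v) / (scale * Q(u, v))).
    Larger `scale` -> coarser quantization -> more compression, more loss."""
    return np.round(F / (scale * Q)).astype(int)

def dequantize(F_hat, scale=1.0):
    """Approximate reconstruction performed by the decoder."""
    return F_hat * (scale * Q)
```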
In fact, the quality factor, a user choice offered in every JPEG implementation, is essentially linearly tied to the scaling factor.

Quantization Tables

The entries of Q(u, v) tend to have larger values towards the lower right corner. This aims to introduce more loss at the higher spatial frequencies -- a practice supported by Observations 1 and 2. The default Q(u, v) values were obtained from psychophysical studies, with the goal of maximizing the compression ratio while minimizing perceptual losses in JPEG images.

[Tables: default JPEG quantization tables]

Run-length Coding (RLC) on AC Coefficients

RLC aims to turn the AC coefficient values into sets {#-zeros-to-skip, next non-zero value}. To make it most likely to hit a long run of zeros, a zig-zag scan is used to turn the 8 x 8 matrix into a 64-vector. (A combined sketch of the zig-zag scan, RLC and DPCM appears at the end of this image-compression section.)

DPCM on DC Coefficients

The DC coefficients are coded separately from the AC ones, using Differential Pulse Code Modulation (DPCM). If the DC coefficients for the first five image blocks are 150, 155, 149, 152, 144, then DPCM produces 150, 5, -6, 3, -8, assuming d_0 = DC_0 and d_i = DC_i - DC_{i-1} for i >= 1.

Four Commonly Used JPEG Modes

Sequential Mode -- the default JPEG mode, implicitly assumed in the discussions so far. Each grayscale image or color image component is encoded in a single left-to-right, top-to-bottom scan.
Progressive Mode.
Hierarchical Mode.
Lossless Mode -- discussed in Chapter 7; to be replaced by JPEG-LS.

The JPEG2000 Standard

JPEG is no doubt the most successful and popular image format to date. The main reason for its success is the quality of its output for a relatively good compression ratio.

Design goals for JPEG2000:
-- To provide a better rate-distortion tradeoff and improved subjective image quality.
-- To provide additional functionalities lacking in the current JPEG standard.

JPEG2000 Image Compression

JPEG2000 uses the Embedded Block Coding with Optimized Truncation (EBCOT) algorithm, which partitions each subband LL, LH, HL, HH produced by the wavelet transform into small blocks called "code blocks". A separate scalable bitstream is generated for each code block ➔ improved error resilience.

Main steps of JPEG2000 image compression:
-- Embedded block coding and bitstream generation.
-- Post-compression rate-distortion (PCRD) optimization.
-- Layer formation and representation.

Layer Formation and Representation

JPEG2000 offers both resolution and quality scalability through the use of a layered bitstream organization and a two-tiered coding strategy. The first tier produces the embedded block bitstreams, while the second tier encodes the block contributions to each quality layer.

Region of Interest Coding in JPEG2000

Goal: particular regions of the image may contain important information and thus should be coded with better quality than others. ROI coding is usually implemented using the MAXSHIFT method, which scales up the coefficients within the ROI so that they are placed into higher bit-planes. During the embedded coding process, the resulting bits are placed in front of the non-ROI part of the image. Therefore, given a reduced bit-rate, the ROI will be decoded and refined before the rest of the image.

Problems with JPEG2000

Higher computational demands. Higher memory demands.
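Here is the combined sketch promised above, tying together the zig-zag scan, the {#-zeros-to-skip, next non-zero value} run-length pairs, and DPCM of the DC coefficients. The helper names are ours, and the pair format is the simplified one from the slides rather than JPEG's exact run/size/amplitude symbols.

```python
def zigzag_indices(n=8):
    """(row, col) pairs in zig-zag order: walk the anti-diagonals,
    alternating direction, exactly as in the JPEG scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length_ac(block):
    """Zig-zag scan an 8x8 block (skipping the DC term) and emit
    (zeros-to-skip, next-nonzero-value) pairs, ending with (0, 0) as an
    end-of-block marker."""
    zz = [block[r][c] for r, c in zigzag_indices(8)][1:]  # AC terms only
    pairs, run = [], 0
    for v in zz:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end of block
    return pairs

def dpcm_dc(dc_values):
    """DPCM the DC coefficients: d0 = DC0, di = DCi - DC(i-1)."""
    return [dc_values[0]] + [b - a for a, b in zip(dc_values, dc_values[1:])]

# The slides' example: DC coefficients 150, 155, 149, 152, 144
# -> DPCM output 150, 5, -6, 3, -8.
assert dpcm_dc([150, 155, 149, 152, 144]) == [150, 5, -6, 3, -8]
```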
NET 4007 Multimedia Networking: Basic Video Compression Techniques

Introduction to Video Compression

A video consists of a time-ordered sequence of frames, i.e., images. Continuous motion is produced at a frame rate of 15 fps or higher. Traditional movies run at 24 fps; the NTSC TV standard in North America uses about 30 fps. An obvious solution to video compression would be predictive coding based on previous frames. Compression proceeds by subtracting images: subtract in time order and code the residual error. It can be done even better by searching for just the right parts of the image to subtract from the previous frame.

Video Compression with Motion Compensation

Consecutive frames in a video are similar -- temporal redundancy exists. Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. The difference between the current frame and other frame(s) in the sequence is coded -- small values and low entropy, good for compression.

Steps of video compression based on Motion Compensation (MC):
-- Motion Estimation (motion vector search).
-- MC-based Prediction.
-- Derivation of the prediction error, i.e., the difference.

Motion Compensation

Each image is divided into macroblocks of size N x N.
-- By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
Motion compensation is performed at the macroblock level.
-- The current image frame is referred to as the Target frame.
-- A match is sought between the macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)).
-- The displacement of the reference macroblock to the target macroblock is called the motion vector MV.
-- The figure referred to in the slides shows the case of forward prediction, in which the Reference frame is taken to be a previous frame.

The MV search is usually limited to a small immediate neighborhood -- both horizontal and vertical displacements in the range [-p, p]. This makes a search window of size (2p + 1) x (2p + 1). The quality of a match for a candidate displacement (i, j) is measured by the mean absolute difference (MAD):

    MAD(i, j) = (1 / N^2) * sum_{k=0..N-1} sum_{l=0..N-1} | C(x + k, y + l) - R(x + i + k, y + j + l) |

where C is the Target frame, R is the Reference frame, and (x, y) is the position of the target macroblock.

Sequential Search

Sequential search: sequentially search the whole (2p + 1) x (2p + 1) window in the Reference frame (also referred to as Full search).
-- The target macroblock is compared, pixel by pixel, with a macroblock centered at each of the positions within the search window, and the respective MAD is derived using the above equation (see the sketch after the logarithmic-search discussion below).
-- The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.
-- The sequential search method is very costly -- assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost of obtaining a motion vector for a single macroblock is (2p + 1) x (2p + 1) x N^2 x 3 ➔ O(p^2 N^2).

2D Logarithmic Search

Logarithmic search: a cheaper version, which is suboptimal but still usually effective.
-- As illustrated in the accompanying figure, initially only nine locations in the search window are used as seeds for a MAD-based search; they are marked as '1'.
-- After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step size ("offset") is reduced to half.
-- In the next iteration, the nine new locations are marked as '2', and so on.
-- The 2D logarithmic search saves substantially compared to sequential search: O(log p * N^2) vs. O(p^2 N^2).
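A direct sketch of the MAD-based sequential (full) search just described, in Python/numpy. For simplicity, macroblocks are anchored at their top-left corner rather than centered, and displacements falling outside the reference frame are skipped; the function names are ours.

```python
import numpy as np

def mad(target, ref, x, y, i, j, N):
    """Mean absolute difference between the N x N target macroblock at
    (x, y) and the reference macroblock displaced by (i, j)."""
    t = target[y:y+N, x:x+N].astype(float)
    r = ref[y+j:y+j+N, x+i:x+i+N].astype(float)
    return np.abs(t - r).mean()

def full_search(target, ref, x, y, N=16, p=15):
    """Sequentially evaluate all (2p+1) x (2p+1) displacements and return
    the motion vector (u, v) minimizing MAD -- O(p^2 N^2) per macroblock."""
    H, W = ref.shape
    best, mv = float("inf"), (0, 0)
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            # skip displacements that fall outside the reference frame
            if not (0 <= x + i and x + i + N <= W and
                    0 <= y + j and y + j + N <= H):
                continue
            d = mad(target, ref, x, y, i, j, N)
            if d < best:
                best, mv = d, (i, j)
    return mv, best
```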
Hierarchical Search

The search can benefit from a hierarchical (multi-resolution) approach, in which an initial estimate of the motion vector is obtained from images with significantly reduced resolution. A typical arrangement is a three-level hierarchical search in which the original image is at Level 0, the images at Levels 1 and 2 are obtained by down-sampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2. Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.

Given the estimated motion vector (u_k, v_k) at Level k, a 3 x 3 neighborhood centered at (2u_k, 2v_k) at Level k - 1 is searched for the refined motion vector. The refinement is such that at Level k - 1 the motion vector (u_{k-1}, v_{k-1}) satisfies:

    2u_k - 1 <= u_{k-1} <= 2u_k + 1,   2v_k - 1 <= v_{k-1} <= 2v_k + 1

Comparison of Computational Cost

[Table: computational cost of the sequential, 2D logarithmic, and hierarchical searches]

H.261

H.261: an early (1990) digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
-- The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
-- The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30.
-- The delay of the video encoder is required to be less than 150 ms, so that the video can be used for real-time bi-directional video conferencing.

H.261 Frame Sequence

Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
-- I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence "Intra".
-- P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed -- not just from a previous I-frame).
-- Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
-- To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.

Rate Control: Problem

H.261 is typically used to send data over a constant bit-rate channel, such as ISDN (e.g., 384 kbps). The encoder output bit rate varies depending on the amount of movement in the scene (a typical H.261-encoded video sequence shows large bit-rate variation over time). Therefore, a rate control mechanism is required to map this varying bit rate onto the constant bit-rate channel.

Rate Control: Solution

The encoded bitstream is buffered, and the buffer is emptied at the constant bit rate of the channel. An increase in scene activity will result in the buffer filling up; the quantization step size in the encoder is then increased, which increases the compression factor and reduces the output bit rate. If the buffer starts to empty, the quantization step size is reduced, which reduces compression and increases the output bit rate. The compression, and the quality, can therefore vary considerably with the amount of motion in the scene: relatively "static" scenes lead to low compression and high quality; "active" scenes lead to high compression and lower quality.
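The feedback loop just described can be illustrated with a toy simulation; every constant and the bits-versus-quantizer model below are invented for illustration and are not taken from H.261.

```python
def simulate_rate_control(activity, channel_rate=12800, buf_limit=100_000):
    """Toy rate-control loop. `activity` is a per-frame measure of scene
    activity; bits produced per frame are modeled as activity / qp.
    The buffer drains at the constant channel rate; the quantizer step
    size qp rises when the buffer fills and falls when it empties.
    All constants and the qp model are illustrative only."""
    level, qp, trace = 0, 8, []
    for a in activity:
        produced = a / qp                 # coarser quantization -> fewer bits
        level = max(0, level + produced - channel_rate)
        if level > 0.8 * buf_limit and qp < 31:
            qp += 1   # buffer filling: compress harder, quality drops
        elif level < 0.2 * buf_limit and qp > 1:
            qp -= 1   # buffer emptying: compress less, quality rises
        trace.append((produced, level, qp))
    return trace
```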
[Figure: rate-control loop -- source sequence → video encoder → buffer → channel, with rate-control feedback from the buffer to the encoder]

H.263

H.263 is an improved video coding standard designed by the ITU-T in 1995/1996 for video conferencing and other audiovisual services. The codec was first designed to be utilized in H.324-based systems (videoconferencing and videotelephony over the PSTN and other circuit-switched networks), but has since found use in H.323 (RTP/IP-based videoconferencing), H.320 (ISDN-based videoconferencing), RTSP (streaming media) and SIP (Internet conferencing) solutions as well. It aims at low bit-rate communications, at bit-rates of less than 64 kbps -- as low as 20 to 24 kbps. Most Flash video content (as used on sites such as YouTube, Google Video, MySpace, etc.) is encoded in this format.

Half-Pixel Precision

In order to reduce the prediction error, half-pixel precision is supported in H.263, vs. full-pixel precision only in H.261.
-- The default range for both the horizontal and vertical components u and v of MV(u, v) is now [-16, 15.5].
-- The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method (see the code sketch further below).

Optional H.263 Coding Modes

H.263 specifies many negotiable coding options in its various Annexes. Four of the common options are as follows.

Unrestricted motion vector mode
-- The pixels referenced are no longer restricted to be within the boundary of the image.
-- When the motion vector points outside the image boundary, the value of the boundary pixel that is geometrically closest to the referenced pixel is used.
-- The maximum range of motion vectors is [-31.5, 31.5].

Syntax-based arithmetic coding mode
-- As in H.261, variable-length coding (VLC) is used in H.263 as the default coding method for the DCT coefficients.
-- Similar to H.261, the syntax of H.263 is structured as a hierarchy of four layers, each coded using a combination of fixed-length and variable-length codes.

Advanced prediction mode
-- Four motion vectors (one from each of the 8 x 8 blocks) are generated, drawing on the neighboring blocks: left, right, above and below.

PB-frames mode
-- Similar to MPEG, H.263 can use B-frames, which are predicted bidirectionally from both the previous frame and the future frame.

NET 4007 Multimedia Networking: MPEG

MPEG Overview

MPEG: Moving Pictures Experts Group, established in 1988 for the development of digital video. (Reference MPEG page: http://www.mpeg.org)

It is appropriately recognized that proprietary interests need to be maintained within the family of MPEG standards. This is accomplished by defining only a compressed bitstream, which implicitly defines the decoder. The compression algorithms, and thus the encoders, are completely up to the manufacturers.

[Figure: scope of the standard -- Source → Pre-Processing → Encoding → Decoding → Post-Processing → Destination, with only the bitstream/decoder inside the scope of the standard]

MPEG-1

MPEG-1 was approved by MPEG in Nov. 1991 for the coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps. Common storage media include CDs and VCDs. MPEG-1 adopts the SIF (Source Input Format), derived from the CCIR601 digital TV format. MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
-- 352 x 240 for NTSC video at 30 fps.
-- 352 x 288 for PAL video at 25 fps.
-- It uses 4:2:0 chroma subsampling.
The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts: 11172-1 Systems, 11172-2 Video, 11172-3 Audio, 11172-4 Conformance, and 11172-5 Software.
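Before continuing with MPEG-1, here is the half-pixel interpolation sketch promised in the H.263 discussion above: a, b, c, d are four neighboring full-pixel values, and the +1/+2 offsets implement rounding in integer arithmetic (one common convention; the standard's exact rounding control is omitted here).

```python
def half_pixel(a, b, c, d):
    """Bilinear interpolation for half-pixel positions among four
    neighboring full pixels: a, b on the top row; c, d on the bottom."""
    h = (a + b + 1) // 2               # half-pixel between a and b
    v = (a + c + 1) // 2               # half-pixel between a and c
    center = (a + b + c + d + 2) // 4  # half-pixel at the center
    return h, v, center
```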
Motion Compensation in MPEG-1

Motion Compensation (MC) based video encoding in H.261 works as follows:
-- In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame -- the prediction.
-- Prediction error: the difference between the MB and its matching MB, sent to the DCT and its subsequent encoding steps.
-- The prediction is from a previous frame -- forward prediction.

Bidirectional Search

An MB containing part of a ball in the Target frame may not find a good matching MB in the previous frame, because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.

Motion Compensation in MPEG-1

MPEG introduces a third frame type -- B-frames -- and its accompanying bidirectional motion compensation (see the sketch after the interlaced-video discussion below).
-- Each MB from a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction.
-- If matching in both directions is successful, then two MVs are sent, and the two corresponding matching MBs are averaged (indicated by '%' in the figure) before being compared to the Target MB for generating the prediction error.
-- If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB are used, from either the forward or the backward prediction.

[Figures: MPEG frame sequence; MPEG-1 video bitstream]

MPEG-2

Unlike MPEG-1, which is basically a standard for storing and playing video on CD at a low bit-rate (1.5 Mbps), MPEG-2 is for higher-quality video at a bit-rate of more than 4 Mbps. It was initially developed as a standard for digital broadcast TV. MPEG-2 was approved by MPEG in Nov. 1994 and has gained wide acceptance beyond digital TV; applications include digital video discs or digital versatile discs (DVDs).

MPEG-2 defines seven profiles aimed at different applications (e.g., low-delay videoconferencing, scalable video, HDTV): Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, and Multiview (stereo video).
-- Within each profile, up to four levels are defined.
-- The DVD video specification allows only four display resolutions: 720 x 480, 704 x 480, 352 x 480, and 352 x 240 -- a restricted form of the MPEG-2 Main profile at the Main and Low levels.

Supporting Interlaced Video

MPEG-1 supports only non-interlaced (progressive) video. MPEG-2 must support interlaced video, since this is one of the options for digital broadcast TV and HDTV. In interlaced video, each frame consists of two fields, referred to as the top field and the bottom field. In a Frame-picture, all scanlines from both fields are interleaved to form a single frame, which is then divided into 16 x 16 macroblocks and coded using motion compensation. If each field is treated as a separate picture, it is called a Field-picture.
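Here is the B-frame prediction sketch referred to above: if both a forward and a backward match exist, the two reference macroblocks are averaged before the prediction error is formed; otherwise the single available match is used. Array types and names are illustrative.

```python
import numpy as np

def b_frame_prediction(target_mb, fwd_mb=None, bwd_mb=None):
    """Form the B-frame prediction and its error.
    If both matches exist, average them (the '%' operation in the
    slides' figure); otherwise use the single available reference MB."""
    if fwd_mb is not None and bwd_mb is not None:
        pred = (fwd_mb.astype(float) + bwd_mb.astype(float)) / 2
    elif fwd_mb is not None:
        pred = fwd_mb.astype(float)
    elif bwd_mb is not None:
        pred = bwd_mb.astype(float)
    else:
        raise ValueError("no acceptable match in either reference frame")
    return pred, target_mb.astype(float) - pred  # prediction, error
```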
Five Modes of Prediction

MPEG-2 defines Frame Prediction and Field Prediction, with five prediction modes:

1. Frame prediction for Frame-pictures: identical to the MPEG-1 MC-based prediction methods, in both P-frames and B-frames.

2. Field prediction for Field-pictures: a macroblock size of 16 x 16 from Field-pictures is used.

3. Field prediction for Frame-pictures: the top field and bottom field of a Frame-picture are treated separately. Each 16 x 16 macroblock (MB) from the target Frame-picture is split into two 16 x 8 parts, each coming from one field. Field prediction is carried out for these 16 x 8 parts.

4. 16 x 8 MC for Field-pictures: each 16 x 16 macroblock (MB) from the target Field-picture is split into top and bottom 16 x 8 halves, and field prediction is performed on each half. This generates two motion vectors for each 16 x 16 MB in a P-Field-picture, and up to four motion vectors for each MB in a B-Field-picture. This mode is good for finer MC when motion is rapid and irregular.

5. Dual-Prime for P-pictures: first, field prediction from each previous field with the same parity (top or bottom) is made. Each motion vector mv is then used to derive a calculated motion vector cv in the field with the opposite parity, taking into account the temporal scaling and vertical shift between lines in the top and bottom fields. For each MB, the pair mv and cv yields two preliminary predictions; their prediction errors are averaged and used as the final prediction error. This mode mimics B-picture prediction for P-pictures without adopting backward prediction. This is the only mode that can be used for either Frame-pictures or Field-pictures.

MPEG-2 Scalabilities

As in JPEG2000, scalability is also an important issue for MPEG-2. Scalable coding is especially useful for MPEG-2 video transmitted over networks with the following characteristics:
-- Networks with very different bit-rates.
-- Networks with variable bit-rate (VBR) channels.
-- Networks with noisy connections.

In MPEG-2 scalable coding, a base layer and one or more enhancement layers can be defined -- also known as layered coding.
-- The base layer can be independently encoded, transmitted and decoded, to obtain basic video quality.
-- The encoding and decoding of an enhancement layer depends on the base layer or the previous enhancement layer.

MPEG-2 supports the following scalabilities:
1. SNR Scalability -- the enhancement layer provides higher SNR.
2. Spatial Scalability -- the enhancement layer provides higher spatial resolution.
3. Temporal Scalability -- the enhancement layer facilitates a higher frame rate.
4. Hybrid Scalability -- a combination of any two of the above three scalabilities.
5. Data Partitioning -- quantized DCT coefficients are split into partitions.

SNR Scalability

SNR scalability refers to enhancement/refinement over the base layer to improve the signal-to-noise ratio (SNR). The MPEG-2 SNR-scalable encoder generates output bitstreams Bits_base and Bits_enhance at two layers:
1. At the Base Layer, a coarse quantization of the DCT coefficients is employed, which results in fewer bits and relatively low-quality video.
2. The coarsely quantized DCT coefficients are then inversely quantized (Q^-1) and fed to the Enhancement Layer, to be compared with the original DCT coefficients.
3. Their difference is finely quantized to generate a DCT coefficient refinement, which, after VLC, becomes the bitstream Bits_enhance.
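A small numeric sketch of this two-layer scheme; the step sizes below are invented for illustration and are not the standard's.

```python
import numpy as np

def snr_scalable_quantize(F, q_base=32.0, q_enh=4.0):
    """Two-layer SNR-scalable quantization of DCT coefficients F.
    q_base/q_enh are illustrative step sizes, not from MPEG-2."""
    base = np.round(F / q_base)          # coarse: fewer bits, lower quality
    reconstructed = base * q_base        # inverse quantization (Q^-1)
    refinement = np.round((F - reconstructed) / q_enh)  # finely quantized diff
    return base, refinement              # -> Bits_base, Bits_enhance

# Decoder: base alone gives coarse video; adding the refinement improves SNR:
#   F_approx = base * q_base + refinement * q_enh
```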
Spatial Scalability

The base layer is designed to generate a bitstream of reduced-resolution pictures; when combined with the enhancement layer, pictures at the original resolution are produced. The Base and Enhancement layers for MPEG-2 spatial scalability are not as tightly coupled as in SNR scalability. The original video data is spatially decimated by a factor of 2 and sent to the base layer encoder. The predicted MB from the base layer is spatially interpolated up to resolution 16 x 16 and then combined with the normal, predicted MB to form the MB of the enhancement layer.

Temporal Scalability

Temporally scalable coding has both the base and enhancement layers of video at a reduced temporal rate (frame rate). The input video is temporally demultiplexed into two pieces, each carrying half of the original frame rate. The Base Layer encoder carries out the normal single-layer coding procedure on its own input video and yields the output bitstream Bits_base.

The prediction of matching MBs at the Enhancement Layer can be obtained in two ways:
1. Interlayer motion-compensated (MC) prediction.
2. Combined MC prediction and interlayer MC prediction.

Hybrid Scalability

Any two of the above three scalabilities can be combined to form a hybrid scalability:
1. Spatial and Temporal Hybrid Scalability.
2. SNR and Spatial Hybrid Scalability.
3. SNR and Temporal Hybrid Scalability.
Usually, a three-layer hybrid coder is adopted, consisting of the Base Layer, Enhancement Layer 1, and Enhancement Layer 2.

Data Partitioning

The base partition contains lower-frequency DCT coefficients; the enhancement partition contains high-frequency DCT coefficients. Strictly speaking, data partitioning is not layered coding, since a single stream of video data is simply divided up, and there is no further dependence on the base partition in generating the enhancement partition. It is useful for transmission over noisy channels and for progressive transmission.

Other Major Differences from MPEG-1

-- Better resilience to bit errors: in addition to the Program Stream, a Transport Stream is added to MPEG-2 bitstreams.
-- Support of 4:2:2 and 4:4:4 chroma subsampling, to increase color quality.
-- Nonlinear quantization.
-- More restricted slice structure: MPEG-2 slices must start and end in the same macroblock row. In other words, the left edge of a picture always starts a new slice, and the longest slice in MPEG-2 can have only one row of macroblocks.
-- More flexible video formats: it supports various picture resolutions as defined by DVD, ATV and HDTV.

Overview of MPEG-4

-- MPEG-1 and MPEG-2 employ frame-based coding techniques; their main concern is a high compression ratio and satisfactory video quality under such compression.
-- MPEG-4 is a newer standard. Besides compression, it pays great attention to user interactivity. Version 1 was approved in early 1999; Version 2 in 2000.
-- MPEG-4 departs from its predecessors in adopting a new object-based coding.
-- It offers a higher compression ratio and is also beneficial for digital video composition, manipulation, indexing, and retrieval.
-- MPEG-4 videos can be composed and manipulated by simple operations on the visual objects.
-- The bit-rate for MPEG-4 video now covers a large range, from 5 kbps to 10 Mbps.
Composition and Manipulation of MPEG-4 Videos

[Figure: composition and manipulation of MPEG-4 videos by operations on visual objects]

Overview of MPEG-4

MPEG-4 is a new standard for:
-- Composing media objects to create desirable audiovisual scenes.
-- Multiplexing and synchronizing the bitstreams for these media data entities, so that they can be transmitted with guaranteed Quality of Service (QoS).
-- Interacting with the audiovisual scene at the receiving end -- it provides a toolbox of advanced coding modules and algorithms for audio and video compression.

[Figures: comparison of interactivities in MPEG standards; block-based vs. object-based coding]

VOP-based Coding

MPEG-4 VOP-based (Video Object Plane) coding also employs the motion compensation technique:
-- An intra-frame coded VOP is called an I-VOP.
-- Inter-frame coded VOPs are called P-VOPs if only forward prediction is employed, or B-VOPs if bidirectional predictions are employed.
-- The new difficulty for VOPs: they may have arbitrary shapes, so shape information must be coded in addition to the texture of the VOP.
Note: texture here actually refers to the visual content, that is, the gray-level (or chroma) values of the pixels in the VOP.

VOP-based Motion Compensation (MC)

MC-based VOP coding in MPEG-4 again involves three steps:
-- Motion Estimation.
-- MC-based Prediction.
-- Coding of the prediction error.
Only pixels within the VOP of the current (Target) VOP are considered for matching in MC. To facilitate MC, each VOP is divided into many macroblocks (MBs). MBs are by default 16 x 16 in luminance images and 8 x 8 in chrominance images.

MPEG-4 defines a rectangular bounding box for each VOP (see Fig. 12.5 for details). The macroblocks that are entirely within the VOP are referred to as Interior Macroblocks; the macroblocks that straddle the boundary of the VOP are called Boundary Macroblocks. To help match every pixel in the target VOP and meet the mandatory requirement of rectangular blocks in transform coding (e.g., DCT), a pre-processing step of padding is applied to the Reference VOPs prior to motion estimation (a simplified code sketch appears a little further below). Note: padding only takes place in the Reference VOPs.

Sprite Coding

A sprite is a graphic image that can freely move around within a larger graphic image or a set of images. To separate the foreground object from the background, we introduce the notion of a sprite panorama: a still image that describes the static background over a sequence of video frames. The large panoramic sprite image can be encoded and sent to the decoder only once, at the beginning of the video sequence. When the decoder receives separately coded foreground objects and parameters describing the camera movements thus far, it can reconstruct the scene in an efficient manner. Fig. 12.10 shows a sprite that is a panoramic image stitched from a sequence of video frames.

Synthetic Object Coding

The number of objects in videos that are created by computer graphics and animation software is increasing. These are called synthetic objects, and they can often be presented together with natural objects. 2-D mesh-based and 3-D model-based coding and animation methods are used for synthetic objects.
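Here is the simplified padding sketch promised above, assuming a binary alpha mask marks which pixels belong to the VOP; MPEG-4's actual repetitive padding rules are more detailed than this illustration.

```python
import numpy as np

def pad_reference_vop(pixels, mask):
    """Simplified repetitive padding of a Reference VOP's bounding box.
    `mask` is True where a pixel belongs to the VOP. Rows containing VOP
    pixels are padded by replicating the nearest VOP pixel horizontally;
    rows with no VOP pixels are then copied from the nearest padded row.
    Assumes the VOP is non-empty."""
    out = pixels.astype(float).copy()
    has_vop = mask.any(axis=1)
    for r in np.flatnonzero(has_vop):          # horizontal padding
        cols = np.flatnonzero(mask[r])
        for c in np.flatnonzero(~mask[r]):
            out[r, c] = out[r, cols[np.abs(cols - c).argmin()]]
    vop_rows = np.flatnonzero(has_vop)
    for r in np.flatnonzero(~has_vop):         # vertical padding
        out[r] = out[vop_rows[np.abs(vop_rows - r).argmin()]]
    return out
```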
MPEG-4 Object Types, Profiles and Levels

The standardization of Profiles and Levels in MPEG-4 serves two main purposes:
-- Ensuring interoperability between implementations.
-- Allowing testing of conformance to the standard.
MPEG-4 not only specifies Visual profiles and Audio profiles, but also Graphics profiles, Scene description profiles, and one Object descriptor profile in its Systems part. The Object type is introduced to define the tools needed to create video objects and how they can be combined in a scene.

MPEG-4 Part 10 / H.264

The H.264 video compression standard, formerly known as 'H.26L', is being developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG. Preliminary studies using software based on this new standard suggest that H.264 offers up to 30-50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 Advanced Simple profile. The outcome of this work is actually two identical standards: ISO MPEG-4 Part 10 and ITU-T H.264. H.264 is currently one of the leading candidates to carry High Definition TV (HDTV) video content in many potential applications.

MPEG-7

The main objective of MPEG-7 is to serve the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries. It is not a standard that deals with the actual encoding of moving pictures and audio, like MPEG-1, 2 and 4; it is intended to provide complementary functionality to the previous MPEG standards, representing information about the content, not the content itself. It uses XML to store metadata, and can be attached to timecode in order to tag particular events. MPEG-7 became an International Standard in September 2001, with the formal name Multimedia Content Description Interface.

An MPEG-7 architecture requirement is that the description must be separate from the audiovisual content; on the other hand, there must be a relation between the content and the description. MPEG-7 uses the following tools:
-- Descriptor (D): a representation of a feature, defined syntactically and semantically.
-- Multimedia Description Schemes (DS): specify the structure and semantics of the relations between their components.
-- Description Definition Language (DDL): based on XML, used to define the structural relations between descriptors.
-- System tools: deal with binarization, synchronization, transport and storage of descriptors.

MPEG-x resources: http://www.m4if.org/resources.php#Section40

MPEG-21

MPEG-21: Multimedia Framework is the newest standard under development. The vision for MPEG-21 is to define a normative open framework for multimedia delivery and consumption, for use by all the players in the delivery and consumption chain. This open framework will provide content creators, producers, distributors and service providers with equal opportunities in the MPEG-21-enabled open market.

MPEG-21 is based on two essential concepts:
-- The definition of a fundamental unit of distribution and transaction, the Digital Item.
-- The concept of users interacting with Digital Items.

Digital Items can be considered the kernel of the Multimedia Framework, and users can be considered the entities that interact with Digital Items inside the Multimedia Framework.
Therefore, we could say that the main objective of MPEG-21 is to define the technology needed to support users in exchanging, accessing, consuming, trading or manipulating Digital Items in an efficient and transparent way.

The seven key elements in MPEG-21 are:
-- Digital item declaration: to establish a uniform and flexible abstraction and interoperable schema for declaring Digital Items.
-- Digital item identification and description: to establish a framework for standardized identification and description of Digital Items, regardless of their origin, type or granularity.
-- Content management and usage: to provide an interface and protocol that facilitate the management and usage (searching, caching, archiving, distributing, etc.) of content.
-- Intellectual property management and protection (IPMP): to enable content to be reliably managed and protected.
-- Terminals and networks: to provide interoperable and transparent access to content with Quality of Service (QoS) across a wide range of networks and terminals.
-- Content representation: to represent content in a way adequate for pursuing the objective of MPEG-21, namely "content anytime anywhere".
-- Event reporting: to establish metrics and interfaces for reporting events (user interactions), so as to understand performance and alternatives.
