EENG3010 -- MPEG

MPEG

Before we got into MPEG this week, we reviewed analog TV video standards, which we also covered a little when we did PC Video. In Turkey and most of Europe we use PAL. France uses SECAM. The US and Japan use NTSC (National Television System Committee). PAL is the newest technology, SECAM is the middle one in age, and NTSC is the oldest. These are all interlaced signals: all of these broadcast techniques interlace odd and even lines. That is, in one field the odd lines are sent, and in the next field the even lines are sent. PAL, for example, is 25 frames per second. Engineers discovered that human eyes can easily detect flicker when there are only 25 frames per second. They figured that if 50 half-frames (fields) are sent instead, with alternating lines, perceived image quality goes up. In the case of NTSC, the frame/field rate is 30/60. These frequencies are not coincidental. The higher numbers are the same as the frequency of the mains supply. In the old days, since circuits were purely analog, it was quite difficult to generate a frequency which was totally out of sync with the mains supply.
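
The odd/even line interlacing described above can be sketched in a few lines of Python. This is only an illustration: a frame is modeled as a plain list of scan lines, and the function names `split_fields` and `weave` are made up for this example.

```python
def split_fields(frame):
    """Split a full frame (a list of scan lines) into two fields.

    Lines are numbered from 1, so the odd field holds lines 1, 3, 5, ...
    and the even field holds lines 2, 4, 6, ...
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild a full frame by interleaving the two fields line by line."""
    frame = []
    for i in range(len(odd_field) + len(even_field)):
        source = odd_field if i % 2 == 0 else even_field
        frame.append(source[i // 2])
    return frame


frame = [f"line {n}" for n in range(1, 7)]
odd, even = split_fields(frame)
print(odd)   # ['line 1', 'line 3', 'line 5']
print(even)  # ['line 2', 'line 4', 'line 6']
print(weave(odd, even) == frame)  # True
```

With PAL, each of the two fields would be sent in 1/50 of a second, so a full frame still takes 1/25 of a second.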

In the earlier lectures of this week, I spent some time showing which countries use which of NTSC, SECAM, and PAL. I showed this map. And I said this is not really a technological map. It is a political map. Some other standards have also spread around the world along political fault lines instead of technological ones. Here in this map, you may see the political influence areas of the US such as Central America, some of South America, Canada, and parts of the Far East (especially the Philippines). You may tell what parts of Africa are the so-called French Africa. You can tell which former Soviet republics are trying to break away from Russian influence. It seems like Ukraine and the three small Baltic states (Lithuania, Latvia, Estonia) are trying to switch from Russia's SECAM (actually from France) to PAL. In this lecture, we also reviewed CRT basics, especially when we talked about interlaced rasterized video signals.

MPEG does not mean Motion Pictures Expert Group. It stands for Moving Picture Experts Group. MPEG is not just a video compression technique. MPEG is a video transport standard. And it obviously deals with transporting and compressing audio. Remember MP3. Since MPEG also carries audio, it also needs to deal with synchronizing audio with video. This is most likely done by putting timestamp info in both streams. I guess when such packets are lost, the A and the V can get out of sync. I have seen this happen in my satellite receiver. Although the speaker speaks Turkish, his lip movements sometimes do not make sense because there is a slight delay between the lip movements and the audio. When I power the equipment off and then on, the problem gets fixed. MPEG-2 proposed a new audio compression standard called AAC. Speaking of which, AC-3 is also an audio compression standard, which originally comes from Dolby and deals with 5.1 surround sound. AC-3 has nothing to do with AC-97.
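
The timestamp idea mentioned above for keeping audio and video in sync can be sketched as follows. This is a toy model and not the actual MPEG packet format: each unit is just a hypothetical `(timestamp_ms, label)` tuple, and the 24 ms audio frame duration is an assumption chosen for illustration.

```python
import heapq


def presentation_order(video_units, audio_units):
    """Merge two streams of (timestamp_ms, label) tuples into one timeline.

    Each input stream must already be sorted by timestamp.
    """
    return list(heapq.merge(video_units, audio_units))


# 25 frames per second means one video frame every 40 ms.
video = [(0, "V0"), (40, "V1"), (80, "V2")]
# Hypothetical audio frames, 24 ms apart.
audio = [(0, "A0"), (24, "A1"), (48, "A2"), (72, "A3")]

for timestamp_ms, label in presentation_order(video, audio):
    print(timestamp_ms, label)
```

Because every unit carries its own timestamp, a player that loses a packet can still line up the remaining audio and video against the same clock instead of drifting apart.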

Versions of MPEG are MPEG-1, -2, and -4. These are the ones that are currently being used. There is no MPEG-3. There are newer MPEG versions coming up but they are not yet being used. We said MPEG is a transport protocol. Speaking of that, it is not a physical layer protocol. Hence, it does not deal with how to modulate signals and send them over physical media such as coax cables and air (from a satellite or TV transmitter). The DVB and ATSC standards are what deal with that. DVB is the more popular one by far. Most satellite receivers (i.e. decoders) have DVB-MPEG printed on them.

MPEG-1 is used in VCDs. MPEG-2 is used in DVDs, digital cable TV, satellite TV, and over-the-air HDTV. MPEG-4 is used in DivX and cell phones.

Video is a 3D signal. Audio is 1D assuming mono. Even stereo is two 1D signals except for special circumstances. Pictures are 2D signals. Video has the time axis in addition to x and y. A static picture compression method such as JPEG uses the correlation in x and y. MPEG also uses the correlation along the time axis, and relies on it much more than on x and y. Here is the rough outline of MPEG video compression:

Segmentation: Divide a frame into macroblocks.
Motion estimation: Move each macroblock around in the previous frame and find its best-matching location by looking at correlation numbers. Instead of the block itself, send the motion vector that points to the matching block in the previous frame. See slide 5 of the tutorial link at the bottom of this page.
Residue: Take the difference between the predicted macroblock and the actual macroblock.
Transformation: Take the DCT (Discrete Cosine Transform) of the difference. That is some sort of a frequency domain representation.
Coding: Code the frequency representation of the difference using standard data compression technology. MPEG-1 and -2 use run-length coding followed by Huffman coding. MPEG-4 also has the option to use arithmetic coding (AC).
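
The motion estimation and residue steps above can be sketched with a brute-force block search. This is a minimal illustration, not how a real encoder is written: frames are small lists of lists, the function names are made up, and the match is scored with the sum of absolute differences (SAD), a simple stand-in for the correlation measure mentioned above.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion(prev, cur, top, left, size, search):
    """Find the offset (dy, dx) into prev that best matches a block of cur."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best_cost, best_vector = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(prev, y, x, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector


def residual(prev, cur, top, left, size, vector):
    """Actual block minus its motion-compensated prediction from prev."""
    dy, dx = vector
    actual = get_block(cur, top, left, size)
    predicted = get_block(prev, top + dy, left + dx, size)
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


# Toy 6x6 frames: a bright 2x2 patch moves one pixel to the right.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (1, 2):
        prev[r][c] = 200
    for c in (2, 3):
        cur[r][c] = 200

mv = estimate_motion(prev, cur, top=2, left=2, size=2, search=2)
print(mv)                                # (0, -1)
print(residual(prev, cur, 2, 2, 2, mv))  # [[0, 0], [0, 0]]
```

In this toy case the prediction is perfect, so the residue is all zeros and only the motion vector carries information. In real video the residue is small but nonzero, and that is what goes on to the DCT and coding steps.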

MPEG in fact standardizes the decoder -- not the encoder. As long as the encoder does not confuse the decoder, it is free to compress in any way it wants.

One may ask why we use the DCT above, or in other words why we switch to the frequency domain. One possible explanation is as follows. What separates real-life pictures from cartoon-like pictures is the texture of objects. You may be familiar with this term from your Computer Graphics class. Real-life objects do not have pure colors. Rather, they have patterns like fabrics. So the colors of neighboring pixels may fluctuate, and they are better represented by sinusoidal waveforms.
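
The texture argument above can be tried out with a small pure-Python DCT. This is a sketch under assumptions: it uses the orthonormal DCT-II on a made-up 8x8 block whose brightness fluctuates as a striped pattern, and the helper names `dct_1d` and `dct_2d` are chosen for this example only.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """2D DCT of a square block: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Hypothetical 8x8 block: vertical stripes whose brightness varies smoothly,
# a bit like a woven fabric.
block = [[128 + 50 * math.cos(math.pi * (x + 0.5) * 2 / 8) for x in range(8)]
         for _ in range(8)]
coeffs = dct_2d(block)

# Count coefficients that are not negligible.
significant = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
print(significant, "of 64 coefficients are non-negligible")
```

All 64 pixel values in this block fluctuate, yet in the frequency domain only the average brightness term and a single horizontal-frequency term are non-negligible. Real textures are not this clean, but the same effect is why the DCT output is cheap to code.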

I then talked about the three types of frames: I, P, and B frames. It goes like this: I B B P B B P ... Initially we send an I frame. You may think of it as an Initial Frame. However, I is for Intra-coded. This frame is coded "within itself". It only uses x and y correlation. Then, we compute the next P frame (Predicted Frame). P frames are computed using the previous I or P frames. B frames are in between and they are Bidirectionally predicted. Once we have an I and a P frame or 2 consecutive P frames, it is easier to predict the frames in between. In reality the decoder computes the frames in the following order: I P B B. However, they are displayed as follows: I B B P. The decoder can always buffer a few frames and then display them. You can think of the I frame as a JPEG picture. In fact similar intra-coding methods are used. There is a maximum time after which we have to send a new I-frame. We can also send a new I-frame whenever there is a "scene change". Since there is not much correlation between consecutive frames at the time of a scene change, it is cheaper to send a brand new frame.
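
The difference between display order and decode order described above can be sketched as a simple reordering. This is an illustration only: frames are represented by made-up labels such as "B1", and the function name `decode_order` is hypothetical.

```python
def decode_order(display_order):
    """Reorder frame labels from display order to decode order.

    Each run of B frames is moved after the next I or P frame, because the
    decoder needs both surrounding reference frames before it can build them.
    """
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)  # hold until the next reference arrives
        else:
            out.append(frame)        # I or P frame goes out first
            out.extend(pending_b)    # then the B frames that depend on it
            pending_b = []
    # Assumption for this sketch: trailing B frames with no later reference
    # are simply appended at the end.
    return out + pending_b


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(decode_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The number in each label is the display position, so the output shows directly how far the decoder has to look ahead and how many frames it must buffer.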

MPEG-4 is quite an evolution over MPEG-1 and -2. For low and standard video qualities, it achieves a lot more compression. It also offers flexibility in compression: you can plug in any compatible compressor. MPEG-2 achieves standard analog TV quality pictures at 5-6 Mbps. MPEG-4 achieves the same quality at around 1.5 Mbps. We did the exercise of finding the bitrate of uncompressed video at NTSC quality. NTSC quality is 720x480 pixels per full frame and 30 frames per second. At a color resolution of 16 bits per pixel, we get a rate of about 166 Mbps (720 x 480 x 30 x 16 = 165,888,000 bits per second). A 90-minute uncompressed movie then takes up about 112 GB. That fills a big hard drive completely.
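
The uncompressed bitrate exercise above is easy to redo in Python. The numbers are the ones from the paragraph; the variable names are just for this sketch, and 1 GB is taken as 10^9 bytes.

```python
width, height = 720, 480        # pixels per full frame
frames_per_second = 30
bits_per_pixel = 16

bits_per_second = width * height * frames_per_second * bits_per_pixel
print(f"{bits_per_second / 1e6:.1f} Mbps")  # 165.9 Mbps

movie_seconds = 90 * 60                     # a 90-minute movie
movie_bytes = bits_per_second * movie_seconds // 8
print(f"{movie_bytes / 1e9:.1f} GB")        # 112.0 GB

# Compression ratio needed to reach the MPEG-2 figure of about 5.5 Mbps.
print(f"about {bits_per_second / 5.5e6:.0f} to 1")
```

The last line gives a feel for how much work the compressor has to do to bring the raw stream down to broadcast rates.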

MPEG-4 offers its original compressor plus H.264 from the ITU. H.264 is now a joint standard with ISO and is also known as MPEG-4 Part 10. MPEG-4 also supports graphics and animation. MPEG-4 does not use a fixed macroblock size. Where there is more detail, finer macroblocks can be used. Obviously, the macroblock layout info has to be sent from the encoder to the decoder. This info can be represented as a binary tree. We did an example in class on this which resembled the following:

Speaking of quantization, we reviewed sampling and quantization this week. We had talked about it on several occasions before -- most notably when we talked about MP3. See below:

Reading materials:
    See some snapshots from the classroom board.
    MPEG tutorial from Internet: Pay attention to slides 5, 15-29.
    MPEG2 tutorial from Internet.
    Divx.
    DVD.
    MPEG.org.
    MPEG-4.