- MPEG-2 Technical (and sometimes political) Frequently Asked Questions
- (FAQ) list.
- Copyright 1994 by Chad Fogg (cfogg@netcom.com)
- Draft 3.3 (May 10, 1994)
-
- 1. MPEG is a DCT based scheme, right?
- 2. What does the MPEG video syntax feature that codes video efficiently?
- 3. What does the syntax provide for error robustness?
- 4. What is the significance of each layer in MPEG video ?
- 5. How does the syntax facilitate parallelism?
- 6. I hear the encoder is not part of the standard?
- 7. Are some encoders better than others?
- 8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
- 9. What are Constrained Parameters Bitstreams (CPB) for video?
- 10. Why is Constrained Parameters so important?
- 11. Who uses constrained parameters bitstreams?
- 12. Are there ways of circumventing constrained parameters bitstreams for SIF
- class applications and decoders ?
- 13. Are there any other conformance points like CPB for MPEG-1?
- 14. What frame rates are permitted in MPEG?
- 15. Special prediction switches for MPEG-2
- 16. What is MPEG-2 Video Main Profile and Main Level?
- 17. Does anybody actually use the scalability modes?
- 18. What's the difference between Field and Frame pictures?
- 19. What do B-pictures buy you?
- 20. Why do some people hate B-frames?
- 21. Why was the 16x16 area chosen?
- 22. Why was the 8x8 DCT size chosen?
- 23. What is motion compensated prediction, and why is it a pain?
- 24. What are the various prediction modes in MPEG-2?
- 24.1 Frame:
- 24.2 Field predictions in frame-coded pictures:
- 24.3 Field predictions in field-coded pictures:
- 24.4 16x8 predictions in field-coded pictures:
- 24.5 Dual Prime prediction in frame and field-coded pictures
- 24.6 Field and frame organized macroblocks:
- 25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
- 26. What is the reasoning behind MPEG syntax symbols?
- 27. Why bother to research compressed video when there is a standard?
- 28. Where can I get a copy of the latest MPEG-2 draft?
- 29. What are the latest working drafts of MPEG-2 ?
- 30. What is the latest version of the MPEG-1 documents?
- 31. What is the evolution of ISO standard documents?
- 32. Where is a good introductory paper to MPEG?
- 33. What are some journals on related MPEG topics ?
- 34. Is there a book on MPEG video?
- 35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
- 36. What happened to MPEG-3?
- 37. What is MPEG-4?
- 38. What are the scaleable modes of MPEG-2?
- 39. Why MPEG-2? Wasn't MPEG-1 enough?
- 40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
- 41. How do MPEG and JPEG differ?
- 42. How do MPEG and H.261 differ?
- 43. Is H.261 the de facto teleconferencing standard?
- 44. What is the TM rate control and adaptive quantization technique ?
- 45. How does the TM work?
- 46. What is a good motion estimation method, then?
- 47. Is exhaustive search "optimal" ?
- 48. What are some advanced encoding methods?
- 49. Is so-and-so really MPEG compliant ?
- 50. What are the tell-tale MPEG artifacts?
- 51. Where are the weak points of MPEG video ?
- 52. What are some myths about MPEG?
- 53. What is the color space of MPEG?
- 54. Don't you mean 4:1:1 ?
- 55. Why did MPEG choose 4:2:0 ? Isn't 4:2:2 the standard for TV?
- 56. What is the precision of MPEG samples?
- 57. What is all the fuss with cositing of chroma components?
- 58. How would you explain MPEG to the data compression expert?
- 59. How does MPEG video really compare to TV, VHS, laserdisc ?
- 60. What are the typical MPEG-2 bitrates and picture quality?
- 61. At what bitrates is MPEG-2 video optimal?
- 62. Why does film perform so well with MPEG ?
- 63. What is the best compression ratio for MPEG ?
- 64. Can MPEG be used to code still frames?
- 65. Is there an MPEG file format?
- 66. What are some pre-processing enhancements ?
- 67. Why use these "advanced" pre-filtering techniques?
- 68. What about post-processing enhancements?
- 69. Can motion vectors be used to measure object velocity?
- 70. How do you code interlaced video with MPEG-1 syntax?
- 71. Is MPEG patented?
- 72. How many cable box alliances are there?
- 73. Will there be an MPEG video tape format?
- 74. Where will we see MPEG in everyday life?
- 75. What is the best compression ratio for MPEG ?
- 76. Is there a MPEG CD-ROM format?
-
-
- 1. MPEG is a DCT based scheme, right?
-
- The DCT and Huffman algorithms receive the most press coverage (e.g. MPEG is
- a DCT based scheme with Huffman coding), but are in fact fairly
- insignificant. The variety of coding modes signaled to the decoder as
- context-dependent side information are chiefly responsible for the efficiency
- of the MPEG syntax.
-
- 2. What does the MPEG video syntax feature that codes video efficiently?
-
- A. Here are some of the statistical conditions and their syntax counterparts.
-
- Occlusion: forward, backwards, or bi-directional temporal prediction in B
- pictures.
-
- Smooth optical flow fields: variable length coding of 1-D prediction errors
- for motion vectors.
-
- Spatial correlation beyond 8x8 sample block boundaries: 1-D prediction of
- DC coefficients in consecutive group intra-coded macroblocks.
-
- High temporal correlation: variable on/off coding of prediction error at
- the macroblock (no-coding) or individual block (coded block pattern) level.
-
- Temporal de-correlation: forward, backwards, or bidirectional prediction.
-
- Content dependent quality: locally adaptive quantization
-
- Temporal prediction accuracy: "half-pel" sample accuracy.
-
- Highly locally correlated signals in refresh pictures (I pictures) and in
- prediction errors: DCT.
-
- Subjective coding: location-dependent quantization of DCT coefficients.
-
-
- 3. What does the syntax provide for error robustness?
-
- 1. Byte-aligned start codes in the coded bitstream.
- 2. End of block codes in coded blocks.
- 3. Slices.
- 4. slice_vertical_position embedded as sub-field within slice start codes.
- 5. slices commencing at regular locations in picture (MPEG-2)
-
- 4. What is the significance of each layer in MPEG video ?
-
- Sequence:
- Set of pictures sharing same sampling dimensions, bit rate,
- chromaticity (MPEG-1), quantization matrices (MPEG-1 only).
-
- Group of Pictures:
- Random access point giving SMPTE time code within sequence.
- Guaranteed to start with an I picture.
-
- Picture:
- Samples of a common plane -- "captured" from the same time instant.
-
- Slice:
- Error resynchronization unit of macroblocks.
- At the commencement of a slice, all inter-macroblock coding
- dependencies are reset. Likewise, all macroblocks within a common slice
- can be dependently coded.
-
- Macroblock:
- Least common multiple of Y, Cb, Cr 8x8 blocks in 4:2:0 sampling
- structure.
- For MPEG-1, the smallest granularity of temporal prediction.
-
- Block:
- Smallest granularity of spatial decorrelation.
-
- 5. How does the syntax facilitate parallelism?
-
- A. For MPEG-1, slices may consist of an arbitrary number of macroblocks.
- The coded bitstream must first be mapped into fixed-length elements before
- true parallelism in a decoder application can be exploited. Further, since
- macroblocks have coding dependencies on previous macroblocks within the same
- slice, the data hierarchy must be pre-processed down to the layer of DC DCT
- coefficients. After this, blocks may be independently inverse quantized and
- inverse transformed, temporally predicted, and reconstructed to buffer memory.
- Parallelism is usually more of a concern for encoders. Macroblock motion
- estimation and some rate control stages can be processed independently. An
- encoder also has the freedom to choose the slice structure.
-
-
- 6. I hear the encoder is not part of the standard?
-
- A. The encoder rests just outside the normative scope of the standard,
- as long as the bitstreams it produces are compliant. The decoder,
- however, is almost deterministic: a given bitstream should
- reconstruct to a unique set of pictures. Statistically speaking, an
- occasional error of a Least Significant Bit is permitted as a
- result of the fact that the IDCT function is the only non-normative
- stage in the decoder (the designer is free to choose among many DCT
- algorithms and implementations). The IEEE 1180 test referenced in
- Annex A of the MPEG-1 and MPEG-2 specifications spells out the
- statistical mismatch tolerance between the Reference IDCT, which
- uses 64-bit floating point accuracy, and the Test IDCT.
-
- 7. Are some encoders better than others?
-
- A. Yes. For example, the range over which the encoder searches for a motion
- compensated prediction macroblock has a great influence on final picture
- quality. At a certain point a very large range can actually become
- detrimental (it may encourage large differential motion vectors).
- Practical ranges are usually between +/- 15 and +/- 32. As the
- range doubles, for instance, the search area quadruples.
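-
- The quadrupling can be checked with a few lines of arithmetic. This is only
- a sketch: it assumes an exhaustive search over integer-pel positions, and the
- function name search_candidates is made up for illustration.
-
```python
def search_candidates(search_range: int) -> int:
    """Count integer-pel candidate positions in a +/- search_range window."""
    side = 2 * search_range + 1   # positions per axis, including zero displacement
    return side * side


for r in (15, 32):
    print(f"+/- {r}: {search_candidates(r)} candidate positions")
```
-
- Going from +/- 16 to +/- 32 raises the count from 1089 to 4225, a factor of
- roughly four.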
-
- 8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
-
- A. Yes. The MPEG-1 syntax permits sampling dimensions as high as 4095 x
- 4095 x 60 frames per second. The MPEG most people think of as "MPEG-1"
- is actually a kind of subset known as Constrained Parameters
- bitstream (CPB).
-
- 9. What are Constrained Parameters Bitstreams (CPB) for video?
-
- A. MPEG-1 CPB are a limited set of sampling and bitrate parameters
- designed to normalize decoder computational complexity, buffer size, and
- memory bandwidth while still addressing the widest possible range of
- applications. The parameter limits were intentionally designed so that a
- decoder implementation would need only 4 Megabits of DRAM.
-
- Parameter        Limit
- --------------   ---------------------------
- pixels/line      704
- lines/picture    480 or 576
- pixels*lines     352*240 or 352*288
- picture rate     30 Hz
- bit rate         1.862 million bits/sec
- buffer size      40 Kilobytes (327,680 bits)
-
- The sampling limits of CPB are bounded at the ever popular SIF rate:
- 396 macroblocks (101,376 pixels) per picture if the picture rate is
- less than or equal to 25 Hz, and 330 macroblocks (84,480 pixels) per
- picture if the picture rate is 30 Hz. The MPEG nomenclature loosely
- defines a "pixel" or "pel" as a unit vector containing a complete
- luminance sample and one fractional (0.25 in 4:2:0 format) sample from
- each of the two chrominance (Cb and Cr) channels. Thus, the
- corresponding bandwidth figure can be computed as:
-
- 352 samples/line x 240 lines/picture x 30 pictures/sec x 1.5 samples/pixel
-
- or 3.8 Ms/s (million samples/sec) including chroma, but not including
- blanking intervals. Since most decoders are capable of sustaining
- VLC decoding at a faster rate than 1.8 Mbit/sec, the coded video bitrate
- has become the most often waived parameter of CPB. An encoder which
- intelligently employs the syntax tools should achieve SIF quality saturation
- at about 2 Mbit/sec, whereas an encoder producing streams containing
- only I (Intra) pictures might require as much as 4 Mbit/sec to achieve the
- same video quality.
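-
- The bandwidth figure above is plain multiplication. As a small sketch (the
- function name sample_rate is hypothetical), with 1.5 samples per pixel for
- 4:2:0:
-
```python
def sample_rate(width, height, picture_rate, samples_per_pixel=1.5):
    """Samples per second, chroma included, blanking excluded."""
    return width * height * picture_rate * samples_per_pixel


rate = sample_rate(352, 240, 30)
print(f"{rate / 1e6:.1f} Ms/s")
```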
-
- 10. Why is Constrained Parameters so important?
-
- A. It is an optimum point that allows (just barely) cost effective VLSI
- implementations in 1992 technology (0.8 microns). It also implies a
- nominal guarantee of interoperability for decoders and encoders. Since
- CPB is a canonical conformance point, MPEG devices which are not capable
- of meeting SIF rates are usually not considered to be true MPEG.
-
- 11. Who uses constrained parameters bitstreams?
-
- A. Applications which are focused on CPB are Compact Disc (White Book or
- CD-I) and computer video applications. Set-top TV decoders fall into a higher
- sampling rate category known as CCIR 601 or Broadcast rate.
-
-
- 12. Are there ways of circumventing constrained parameters bitstreams for SIF
- class applications and decoders ?
-
- A. Yes, some. Remember that CPB limits pictures by macroblock count.
- 416 x 240 x 24 Hz sampling rates are still within the constraints, but this
- would only be of benefit in NTSC (240 lines/field) displays. Deviating from
- 352 samples/line could throw off many decoder implementations which possess
- limited horizontal sample rate conversion abilities. Some decoders do in fact
- include a few rate conversion modes, with a filter usually implemented via
- binary taps (shifts and adds). Likewise, the target sample rates are usually
- limited to a few ratios (e.g. 640, 540, 480 pixels/line, etc.). Future MPEG
- decoders will likely include on-chip arbitrary sample rate converters,
- perhaps capable of operating in the vertical direction (although there is
- little need of this in applications using standard TV monitors, with the
- possible exception of windowing in cable box graphical user interfaces).
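-
- A rough way to test whether a picture size stays inside the macroblock
- limits is sketched below. It is an illustration only. The per-second limit
- of 396 x 25 macroblocks is an assumption that generalizes the two cases
- quoted in question 9 (396 at 25 Hz, 330 at 30 Hz). The pixels/line,
- lines/picture, bit rate and buffer limits are not checked, and the function
- names are made up.
-
```python
MAX_MB_PER_PICTURE = 396        # 352 x 288 worth of 16x16 macroblocks
MAX_MB_PER_SECOND = 396 * 25    # assumed; the same value as 330 * 30


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover the picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def fits_cpb_sampling(width: int, height: int, picture_rate: int) -> bool:
    """True if the macroblock count and macroblock rate are within the limits."""
    count = macroblocks(width, height)
    return (count <= MAX_MB_PER_PICTURE
            and count * picture_rate <= MAX_MB_PER_SECOND)


for size in ((352, 240, 30), (352, 288, 25), (416, 240, 24), (352, 288, 30)):
    print(size, macroblocks(size[0], size[1]), fits_cpb_sampling(*size))
```
-
- Under these assumptions 416 x 240 x 24 Hz needs 390 macroblocks per picture
- and passes, while 352 x 288 at 30 Hz exceeds the per-second limit.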
-
- 13. Are there any other conformance points like CPB for MPEG-1?
- A. Undocumented ones, yes. A second generation of decoder chips emerged on
- the market about 1 year after the first wave of SIF-class decoders. Both
- LSI Logic and SGS-Thomson introduced CCIR 601 class MPEG-1 decoders to fill in
- the gap between canonical MPEG-1 and the emergence of MPEG-2. Under
- non-disclosure agreement, C-Cube had the CL-950.
-
- 14. What frame rates are permitted in MPEG?
- A. A limited set is available for the choosing in MPEG-1, although "tricks"
- could be played with Systems-layer Time Stamps to convey non-standard rates.
- The set is: 23.976 Hz (3-2 pulldown NTSC), 24 Hz (Film), 25 Hz (PAL/SECAM or
- 625/50 video), 29.97 Hz (NTSC), 30 Hz (drop-frame NTSC or component 525/60), 50
- Hz (double-rate PAL), 59.94 Hz (double-rate NTSC), and 60 Hz (double-rate
- drop-frame NTSC/component 525/60 video).
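-
- The NTSC-derived rates are usually written as exact ratios over 1001. The
- table below is a sketch that lists the rates in the order given above. The
- rational forms are an assumption (the conventional NTSC values), not
- something stated in this FAQ.
-
```python
from fractions import Fraction

# Picture rates in the order listed in the text; /1001 forms are assumed.
PICTURE_RATES = [
    Fraction(24000, 1001),  # 23.976 Hz
    Fraction(24),
    Fraction(25),
    Fraction(30000, 1001),  # 29.97 Hz
    Fraction(30),
    Fraction(50),
    Fraction(60000, 1001),  # 59.94 Hz
    Fraction(60),
]

for rate in PICTURE_RATES:
    print(f"{float(rate):7.3f} Hz  ({rate.numerator}/{rate.denominator})")
```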
-
- 15. Special prediction switches for MPEG-2
-
-
-                  MPEG-2 sequence
-                 /               \
-        progressive           interlaced sequence
-         sequence              /               \
-                       Field picture       Frame picture
-                                            /          \
-                          Frame or field pred.     Frame MB prediction only
-                            /         \
-                      Field dct     Frame dct
-
- 16. What is MPEG-2 Video Main Profile and Main Level?
-
- A. MPEG-2 Video Main Profile and Main Level is analogous to MPEG-1's CPB, with
- sampling limits at CCIR 601 parameters (720 x 480 x 30 Hz). Profiles limit
- syntax (i.e. algorithms), whereas Levels place limits on coding parameters
- (sample rates, frame dimensions, coded bitrates, etc.). Together, Video Main
- Profile and Main Level (abbreviated as MP@ML) normalize complexity within
- feasible limits of 1994 VLSI technology (0.5 micron), yet still meet the
- needs of the majority of application users. MP@ML is the conformance point for
- most cable and satellite systems.
-
- Profiles
- ======
- Simple: I and P pictures only. 4:2:0 sampling ratio. 8, 9, or 10 bits DC
- precision.
- Main: I, P, and B pictures. Dual Prime only in sequences without
- B-pictures. 4:2:0 sampling ratio. 8, 9, or 10 bits sample precision.
- SNR profile:
- Spatial profile:
- High: 8, 9, 10, or 11 bits sample precision. 4:2:2 and 4:4:4 sampling ratio.
-
-
- Level
- ====
- Simple: SIF video rate (3.041280 MHz), 4 Mbit/sec, 0.489472 Mbit VBV
- buffer, 64 vertical in frame, 32 vertical in field, 1:7 f_code horizontal.
-
- Main: CCIR 601 video rate (10.368 MHz), 15 Mbit/sec, 1.835008 Mbit VBV
- buffer, 128 vertical in frame, 64 vertical in field, 1:8 f_code horizontal.
-
- High 1440: 1440 x 1152 x 30 Hz (47.0016 MHz), 60 Mbit/sec, 7.340032 Mbit
- VBV buffer, 128 vertical in frame, 1:9 f_code horizontal.
-
- High: 1920 x 1152 x 30 Hz (62.6688 MHz), 80 Mbit/sec, 9.787392 Mbit VBV
- buffer, 1:9 f_code horizontal.
-
- 17. Does anybody actually use the scalability modes?
-
- A. At this time, scalability has found itself a limited number of
- applications, although research is definitely underway for its use in HDTV.
- Experiments have been demonstrated in Europe where, for example, PAL-rate
- video (720 x 576 x 25 fps) is embedded in the same stream as HDTV rate video
- (1440 x 1152 x 25 fps). The Nov. 1992 VADIS experiment divided the base layer
- (PAL) and enhancement into 4 and 16 Mbit/sec channels, respectively. The U.S.
- Grand Alliance favors HDTV simulcasting (separate NTSC analog and digital
- HDTV broadcasts). Temporal scalability is the pet scalability mode as the
- possible future solution for coding 60 Hz progressive sequences while
- maintaining backwards compatibility with early-wave equipment (e.g. 1920 x
- 1080 x 30 Hz displays). To elaborate, the first wave receivers of the late
- 1990's would be limited to 60 Hz interlaced/30 Hz progressive HDTV
- decoders. Essentially, 60 interlaced fields per second would be coded in,
- for example, a 16 Mbit/sec stream in 1996, and when VLSI processes shift another
- thousand or so angstroms down the wavelength scale, an 8 Mbit/sec enhancement
- layer containing the coded "high pass" between 60 Hz progressive and 60 Hz
- interlaced would be simulcast or multiplexed. Several corporate mouths
- have been known to water at the mention of charging the quality conscious
- subscriber an extra fee for the enhancement layer.
-
- 18. What's the difference between Field and Frame pictures?
- A. A frame-coded picture consists of samples from both even and odd fields.
- A frame picture is coded in progressive order (an even line, then an odd line,
- etc.) and, in the case of MPEG-2, may optionally switch between field and
- frame order on a macroblock basis. The Display Process, which is *almost*
- completely outside the scope of the MPEG specification, can choose to
- re-interlace the picture by displaying the odd and even lines at different times
- (16 milliseconds apart for 60 Hz displays). In fact, most pictures,
- regardless of whether they were coded as a Field or Frame, end up being
- displayed interlaced due to the fact that most TV sets are interlaced.
-
- 19. What do B-pictures buy you?
-
- A. Since bi-directional macroblock predictions are an average of two
- macroblock areas, noise is reduced at low bit rates (like a 3-D filter, if
- you will). At nominal MPEG-1 video (352 x 240 x 30, 1.15 Mbit/sec) rates, it
- is said that B-frames improve SNR by as much as 2 dB. (0.5 dB gain is
- usually considered worth-while in MPEG). However, at higher bit rates,
- B-frames become less useful since they inherently do not contribute to the
- progressive refinement of an image sequence (i.e. not used as prediction by
- subsequent coded frames). Regardless, B-frames are still politically
- controversial.
-
- B pictures are interpolative in two ways: 1. predictions in the
- bi-directional macroblocks are an average from block areas of two pictures 2. B
- pictures fill in or interpolate the 3-D video signal over a 33 or 40
- millisecond picture period without contributing to the overall signal quality
- beyond that immediate point in time. In other words, a B picture,
- regardless of its internal make-up of macroblock types, has a life limited to
- its immediate self. As mentioned before, its energy does not propagate into
- other frames. In a sense, bits spent on B pictures are wasted.
-
-
- 20. Why do some people hate B-frames?
-
- A. Computational complexity, bandwidth, delay, and picture buffer size are
- the four B-frame Pet Peeves. Computational complexity in the decoder is
- increased since some macroblock modes require averaging between two
- macroblocks.
-
- Worst case, memory bandwidth is increased an extra 15.2 MByte/s (4:2:0 601
- rates, not including any half pel or page-mode overhead) for this extra
- prediction. An extra picture buffer is needed to store the future prediction
- reference (bi-directionality). Finally, extra delay is introduced in
- encoding since the frame used for backwards prediction needs to be
- transmitted to the decoder before the intermediate B-pictures can be decoded
- and displayed.
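-
- The 15.2 MByte/s figure can be reproduced with simple arithmetic. The sketch
- below assumes 704 active pixels per line, 480 lines, 30 pictures per second
- and 8-bit samples. None of these is stated explicitly above, but together
- they give the quoted number (720 pixels per line would give about 15.6).
- The function name is made up.
-
```python
def prediction_bandwidth(width, height, picture_rate, bytes_per_pixel=1.5):
    """Bytes per second read to fetch one extra full prediction per picture."""
    return width * height * picture_rate * bytes_per_pixel


extra = prediction_bandwidth(704, 480, 30)   # assumed 4:2:0 "601" dimensions
print(f"{extra / 1e6:.1f} MByte/s")
```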
-
- Cable television (e.g. -- more like i.e.-- General Instruments) has been
- particularly averse to B-frames since, for CCIR 601 rate video, the extra
- picture buffer pushes the decoder DRAM memory requirements past the magic
- 8-Mbit (1 Mbyte) threshold into the evil realm of 16 Mbits (2 Mbyte)....
- although 8-Mbits is fine for 352 x 480 B picture sequences. However, cable
- often forgets that DRAM does not come in convenient high-volume (low cost)
- 8-Mbit packages as does the friendly 4-Mbit and 16-Mbit. In a few years, the
- cost difference between 16 Mbit and 8 Mbit will become insignificant compared
- to the bandwidth savings gained through higher compression. For the time
- being, some cable boxes will start with 8-Mbit and allow future drop-in
- upgrades to the full 16-Mbit.
-
- 21. Why was the 16x16 area chosen?
-
- A. The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8
- blocks, given the normative 4:2:0 chroma ratio. Starting with medium
- size images, the 16x16 area provides a good balance between side
- information overhead & complexity and motion compensated prediction
- accuracy. In gist, 16x16 seemed like a good trade-off.
-
- 22. Why was the 8x8 DCT size chosen?
- A. Experiments showed little improvements with larger sizes vs. the
- increased complexity. A fast DCT algorithm will require roughly double
- the arithmetic operations per sample when the transform point size is
- doubled. The best compaction efficiency has been demonstrated using
- locally adaptive block sizes (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4) [See
- Baker and Sullivan]. Naturally, this introduces additional side information
- overhead and forces the decoder to implement programmable or hardwired
- recursive DCT algorithms. If the DCT size becomes too large, then more edges
- (local discontinuities) and the like become absorbed into the transform
- block, resulting in wider propagation of Gibbs (ringing) and other phenomena.
- Finally, with larger transform sizes, the DC term is even more critically
- sensitive to quantization noise.
-
- 23. What is motion compensated prediction, and why is it a pain?
-
- A. MCP in the decoder can be thought of as having four stages:
-
- 1. Motion vector computation
- 2. Prediction retrieval
- various predictions are 16x16, 16x8, 8x4, 8x8 plus any half-pel
- overhead (e.g. 17x16, 17x17, etc).
- 3. Filtering
- 3.1 Forming half-pel predictions through bi-linear interpolation.
- 3.2 Averaging two predictions together (B macroblocks, Dual Prime)
- 4. Combination and ordering
- 4.1 combining 1 or 2 predictions from stage three into upper and
- lower halves (16 x 8, field in frame)
- 4.2 interleaving or grouping together odd and even lines in frame
- picture predictions.
-
- The final, combined prediction is always a 16x16 block of luminance and
- 8x8 block of chrominance, just like we experience in MPEG-1.
-
- A single motion vector can be associated with each source, hence a macroblock
- can have as many as 4 motion vectors.
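-
- Stages 2 and 3 can be sketched in a few lines. This is an illustration, not
- the normative process: it assumes non-negative half-pel coordinates that stay
- inside the reference picture, and it assumes halves are rounded up, which is
- my reading of the rounding convention. The function names are hypothetical.
-
```python
def half_pel_predict(ref, x2, y2, width, height):
    """Fetch a width x height prediction at half-pel position (x2/2, y2/2).

    ref is a list of rows of integer samples. When a coordinate is odd, one
    extra column or row is read, which is where 17x16 and 17x17 come from.
    """
    x, half_x = divmod(x2, 2)
    y, half_y = divmod(y2, 2)
    pred = []
    for j in range(height):
        row = []
        for i in range(width):
            a = ref[y + j][x + i]
            b = ref[y + j][x + i + half_x]
            c = ref[y + j + half_y][x + i]
            d = ref[y + j + half_y][x + i + half_x]
            row.append((a + b + c + d + 2) // 4)   # bi-linear, rounded
        pred.append(row)
    return pred


def average_predictions(p0, p1):
    """Average two predictions (B macroblocks, Dual Prime), halves rounded up."""
    return [[(u + v + 1) // 2 for u, v in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


ref = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
print(half_pel_predict(ref, 1, 0, 2, 2))   # horizontal half-pel only
print(half_pel_predict(ref, 1, 1, 2, 2))   # half-pel in both directions
```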
-
- 24. What are the various prediction modes in MPEG-2?
-
- 24.1 Frame:
- Predictions are formed from a 16 x 16 pixel area in a previously
- reconstructed frame. Identical to MPEG-1. There can be only one source in
- forward or backward predicted macroblocks, and two sources in bi-directional
- macroblocks. The prediction frame itself may have been coded as either a
- frame or two fields, however once a frame is reconstructed, it is simply a
- frame as far as future predictions are concerned.
-
- 24.2 Field predictions in frame-coded pictures:
-
- Separate predictions are formed for the top (8 lines from field 1) and bottom
- (8 lines from field 2) portions of the macroblock. A total of two motion
- vectors in forward or backward predictions, four in bi-directional.
-
- 24.3 Field predictions in field-coded pictures:
-
- Predictions are formed from the two most recently decoded fields. Prediction
- sizes are 16x16, however the 16 lines have a corresponding projection onto a
- 16x32 pixel area of a frame. One motion vector for forward or backward
- predictions, and two for bi-directional.
-
- 24.4 16x8 predictions in field-coded pictures:
-
- Like field macroblocks in frame-coded pictures, the upper and lower 8 lines
- in this macroblock mode can have different predictions (hence two motion
- vectors). This mode compensates for the reduced temporal prediction
- precision of field picture macroblocks (a result of the fact that fields
- inherently possess half the number of lines that frames do). The field
- prediction area projected onto a frame is restored to 16 lines. 2 motion
- vectors for backwards or forwards, 4 for bi-directional.
-
- 24.5 Dual Prime prediction in frame and field-coded pictures
-
- Predictions for the current macroblock are formed from the average of two 16
- x 8 line areas from the two most recently decoded fields. Dual Prime was
- devised as an alternative for B pictures in low delay applications, but still
- offers many of the signal quality benefits of B-pictures. Dual Prime requires
- one less prediction picture buffer, but still retains the same instantaneous
- prediction bandwidth of a B picture system. As an alternative to coding
- separate motion vectors for each of the upper and lower 16x8 areas, a full
- motion vector is sent for the first area, and a +1, 0, or -1 differential
- vector (variable length coded) is specified for the second prediction area.
- A macroblock will have a total of two full motion vectors and two
- differential vectors in frame-coded
- pictures. Due to the prediction bandwidth overhead, Main Profile restricts
- the use of Dual Prime prediction to P picture sequences only. High Profile
- permits use of Dual Prime in B pictures.
-
- 24.6 Field and frame organized macroblocks:
-
- Originally intended as a cheaper means of achieving field-decorrelation in
- frame-coded pictures without the fussy overhead of separate field prediction
- estimates, the dct coefficients (quantized prediction error for a given
- macroblock) may be organized into either a field or frame pattern.
- Essentially this means that the prediction error for the combined 16x16
- macroblock may be grouped into field or frame blocks. A bit in the macroblock
- header (dct_type) indicates whether the upper and lower portions of the
- macroblock are to be interleaved (frame organized) or remain separated (field
- organized).
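-
- Seen from the encoder side, the two organizations differ only in which lines
- go into which 8x8 luminance block. The sketch below shows the idea for the
- 16x16 luminance area only. Chroma is not handled, and the function name and
- the field_organized flag are made up for illustration.
-
```python
def luminance_blocks(mb, field_organized):
    """Split a 16x16 luminance macroblock (16 rows of 16) into four 8x8 blocks.

    Frame organized: the upper blocks hold lines 0-7, the lower lines 8-15.
    Field organized: the upper blocks hold the even lines (one field) and
    the lower blocks hold the odd lines (the other field).
    """
    rows = mb[0::2] + mb[1::2] if field_organized else mb
    upper, lower = rows[:8], rows[8:]
    return [
        [r[:8] for r in upper], [r[8:] for r in upper],
        [r[:8] for r in lower], [r[8:] for r in lower],
    ]


# Each sample carries its own line number so the grouping is visible.
mb = [[line] * 16 for line in range(16)]
frame = luminance_blocks(mb, field_organized=False)
field = luminance_blocks(mb, field_organized=True)
print([row[0] for row in frame[0]])   # lines in the first frame-organized block
print([row[0] for row in field[0]])   # lines in the first field-organized block
```
-
- A decoder performs the inverse mapping when dct_type signals field
- organization.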
-
- 25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
-
- A. All MPEG-2 bitstreams must contain specific extension headers that
- *immediately* follow MPEG-1 headers. At the highest layer, for example,
- the MPEG-1 style sequence_header() is followed by sequence_extension()
- exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles.
- For example, sequence_scalable_extension() is not allowed in Main Profile
- bitstreams.
-
- A simple program need only scan the coded bitstream for byte-aligned start
- codes to determine whether the stream is MPEG-1 or MPEG-2.
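-
- Such a scanner might look like the sketch below. The start code values
- (0xB3 for the sequence header, 0xB5 for extensions) come from the standards'
- start code tables rather than from this FAQ, and the function names are made
- up. It checks only the start code that directly follows the first sequence
- header, so treat the result as a heuristic.
-
```python
SEQUENCE_HEADER_CODE = 0xB3   # assumed from the standard's start code table
EXTENSION_START_CODE = 0xB5   # assumed from the standard's start code table
PREFIX = b"\x00\x00\x01"


def start_codes(data: bytes):
    """Yield (offset, code byte) for each byte-aligned 00 00 01 xx start code."""
    pos = data.find(PREFIX)
    while pos != -1 and pos + 3 < len(data):
        yield pos, data[pos + 3]
        pos = data.find(PREFIX, pos + 3)


def guess_mpeg_version(data: bytes):
    """Return 2 if an extension follows the first sequence header, 1 if some
    other start code follows it, and None if that cannot be determined."""
    codes = [code for _, code in start_codes(data)]
    for current, following in zip(codes, codes[1:]):
        if current == SEQUENCE_HEADER_CODE:
            return 2 if following == EXTENSION_START_CODE else 1
    return None


# Tiny synthetic buffers, not real bitstreams.
print(guess_mpeg_version(b"\x00\x00\x01\xb3\x12\x34\x00\x00\x01\xb8\x00"))
print(guess_mpeg_version(b"\x00\x00\x01\xb3\x12\x34\x00\x00\x01\xb5\x14"))
```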
-
-
- 26. What is the reasoning behind MPEG syntax symbols?
-
- A. Here are some of the Whys and Wherefores of MPEG symbols:
-
- Start codes
- These 32-bit byte-aligned codes provide a mechanism for cheaply
- searching coded bitstreams for commencement of various layers of video
- without having to actually parse variable-length codes or perform any
- decoder arithmetic. Start codes also provide a mechanism for
- resynchronization in the presence of bit errors.
-
- Coded block pattern (CBP -- not to be confused with Constrained Parameters!)
- When the frame prediction is particularly good, the
- displaced frame difference (DFD, or prediction error) tends to be small,
- often with entire block energy being reduced to zero after quantization.
- This usually happens only at low bit rates. Coded block patterns
- prevent the need for transmitting EOB symbols in those zero coded
- blocks.
-
- DCT_coefficient_first
- Each intra coded block has a DC coefficient. With coded block patterns
- signaling all possible combinations of all-zero valued blocks, the
- dct_coef_first mechanism assigns a different meaning to the VLC codeword
- that would otherwise represent EOB as the first coefficient.
-
-
- End of Block:
- Saves unnecessary run-length codes. At optimal bitrates, there tend to
- be few AC coefficients concentrated in the early stages of the zig-zag
- vector. In MPEG-1, the 2-bit length of EOB implies that there is an
- average of only 3 or 4 non-zero AC coefficients per block. In MPEG-2
- Intra (I) pictures, with a 4-bit EOB code, this number is between 9 and
- 16 coefficients. Since EOB is required for all coded blocks, its absence
- can signal that a syntax error has occurred in the bitstream.
-
-
- Macroblock stuffing
- A genuine pain for VLSI implementations, macroblock stuffing was
- introduced to maintain smoother, constant bitrate control in MPEG-1.
- However, with normalized complexity measures and buffer management
- performed a priori (pre-frame, pre-slice, and pre-macroblock) in the
- MPEG-2 encoder test model, the need for such localized smoothing
- evaporated. Stuffing can be achieved through virtually unlimited slice
- start code padding if required. A good rule of thumb: if you find
- yourself often using stuffing more than once per slice, you probably
- don't have a very good rate control algorithm. Anyway, macroblock
- stuffing is now illegal in MPEG-2, so don't start using it if you
- haven't already.
-
-
- MPEG's modified Huffman VLC tables
- The VLC tables in MPEG are not Huffman tables in the true sense of
- Huffman coding, but are more like the tables used in Group 3 fax. They
- are entropy constrained, that is, non-downloadable and optimized for a
- limited range of bit rates (sweet spots). With the exception of a few
- codewords, the larger tables were carried over from the H.261 standard
- of 1990. MPEG-2 added an "Intra table". Note that the dct_coefficient
- tables assume positive/negative coefficient pmf symmetry.
-
-
- 27. Why bother to research compressed video when there is a standard?
- A. Despite the worldwide standard, many areas remain open for research:
- advanced encoding and pre-processing, motion estimation, macroblock
- decision models, rate control and buffer management in editing
- environments, etc. There's practically no end to it.
-
- 28. Where can I get a copy of the latest MPEG-2 draft?
-
- A. Contact your national standards body (e.g. ANSI Sales in NYC for the
- U.S., British Standards Institute in the UK, etc.). A number of private
- organizations offer ISO documents.
-
- 29. What are the latest working drafts of MPEG-2 ?
- A. MPEG-2 has reached the voting-document stage as a Draft International
- Standard for:
-
- Information Technology -- Generic Coding of Moving Pictures and
- Associated Audio. Recommendation H.262, ISO/IEC Draft International Standard
- 13818-2. [produced March 25, 1994, not yet approved by voting process].
-
- Systems is Part 1, Video is Part 2, and Audio is Part 3. A committee draft for
- Conformance (Part 4) is expected in November 1994, as well as the Technical
- Report on Software Simulation (Part 5).
-
- 30. What is the latest version of the MPEG-1 documents?
-
- A. Systems (ISO/IEC IS 11172-1), Video (ISO/IEC IS 11172-2), and Audio
- (ISO/IEC IS 11172-3) have reached the final document stage. Part 4,
- Conformance Testing, is currently at the DIS stage.
-
-
- 31. What is the evolution of ISO standard documents?
-
- A. In chronological order:
-
- ISO/Committee notation                  Author's notation
- --------------------------------------- -------------------------
- Problem (unofficial first stage)        Barroom Witticism
- New work Item (NI)                      Napkin Item
- New Proposal (NP)                       Need Permission
- Working Draft (WD)                      We're Drunk
- Committee Draft (CD)                    Calendar Deadlock
- Draft International Standard (DIS)      Doesn't Include Substance
- International Standard (IS)             Induced patent Statements
-
- 32. Where is a good introductory paper to MPEG?
-
- A. Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia
- Applications," Communications of the ACM, April 1991, Vol.34, No.4, pp. 47-58
-
- 33. What are some journals on related MPEG topics ?
- A.
-
- IEEE Transactions on Consumer Electronics
- IEEE Transactions on Broadcasting
- IEEE Transactions on Circuits and Systems for Video Technology
- Advanced Electronic Imaging
- Electronic Engineering Times (EE Times -- more tabloid coverage. Unfortunate
- columns by Richard Doherty)
- IEEE Int'l Conference on Acoustics, Speech, and Signal Processing
- (ICASSP)
- International Broadcasting Convention (IBC)
- Society of Motion Pictures and Television Engineers (SMPTE)
- SPIE conference on Visual Communications and Image Processing
- SPIE conference on Video Compression for Personal Computers
- IEEE Multimedia [first edition Spring 1994]
-
-
- 34. Is there a book on MPEG video?
-
- A. Yes, there will be a book published sometime in 1994 by the same authors
- who brought you the JPEG book (Bill Pennebaker, Joan Mitchell). Didier Le
- Gall will be an additional co-author, and will ensure digressions into, e.g.
- arithmetic coding aspects, are kept to a minimum :-)
-
- 35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
-
- A. Committee insiders most often use the Arabic notation with the hyphen,
- e.g. MPEG-2. Only the most retentive use the official designation: Phase 2.
- In fact, M.P.E.G. itself is a nickname. The official title is: ISO/IEC JTC1
- SC29 WG11. The militaristic lingo has so far managed to keep the enemy
- (DVI) confused and out of the picture.
-
- ISO: International Organization for Standardization
- IEC: International Electrotechnical Commission
- JTC1: Joint Technical Committee 1
- SC29: Sub-committee 29
- WG11: Work Group 11 (moving pictures with... uh, audio)
-
- 36. What happened to MPEG-3?
-
- A. MPEG-3 was to have targeted HDTV applications with sampling dimensions up
- to 1920 x 1080 x 30 Hz and coded bitrates between 20 and 40 Mbit/sec. It was
- later discovered that with some (compatible) fine tuning, MPEG-2 and MPEG-1
- syntax worked very well for HDTV rate video. The key is to maintain an
- optimal balance between sample rate and coded bit rate.
-
- Also, the standardization window for HDTV was rapidly closing. Europe and
- the United States were on the brink of committing to analog-digital
- subnyquist hybrid algorithms (D-MAC, MUSE, et al). European all-digital
- projects such as HD-DIVINE and VADIS demonstrated better picture quality with
- respect to bandwidth using the MPEG syntax. In the United States, the
- Sarnoff/NBC/Philips/Thomson HDTV consortium had used MPEG-1 syntax from the
- beginning of its all-digital proposal, and with the exception of motion
- artifacts (due to limited search range in the encoder), was deemed to have
- the best picture quality of all three digital proponents. HDTV is now part of
- the MPEG-2 High-1440 Level and High Level toolkit.
-
- 37. What is MPEG-4?
- A. MPEG-4 targets the Very Low Bitrate applications defined loosely as
- having sampling dimensions up to 176 x 144 x 10 Hz and coded bit rates
- between 4800 and 64,000 bits/sec. This new standard would be used, for
- example, in low bit rate videophones over analog telephone lines.
-
- This effort is in the very early stages. Morphology, fractals, model
- based, and anal retentive block transform coding are all in the
- offing. MPEG-4 is now in the application identification phase.
-
- Scaleable modes of MPEG-2
-
- 38. What are the scaleable modes of MPEG-2?
- A. Scaleable video is permitted only in the High Profiles.
-
- Currently, there are four scaleable modes in the MPEG-2 toolkit. These modes
- break MPEG-2 video into different layers (base, middle, and high layers)
- mostly for purposes of prioritizing video data. For example, the high
- priority channel (bitstream) can be coded with a combination of extra error
- correction information and/or increased signal strength (i.e. higher Carrier-
- to-Noise ratio or lower Bit Error Rate) than the lower priority channel. For
- example, in HDTV, the high priority bitstream (720 x 480) can be decoded
- under noise conditions where the lower priority (1440 x 960) cannot. This is
- part of the "graceful degradation" concept. Breaking a video signal into two
- streams (base and enhancements) has a penalty, however: usually less than
- 1.5 dB.
-
- Another purpose of scalability is complexity division. A standard TV set need
- only decode the 720 x 480 channel, thus requiring a less expensive decoder
- processor than a TV set wishing to display 1440 x 960. This is known as
- simulcasting.
-
- A brief summary of the MPEG-2 video scalability modes:
-
- Spatial Scalability-- Useful in simulcasting, and for feasible software
- decoding of the lower resolution, base layer. This spatial domain
- method codes a base layer at lower sampling dimensions (i.e.
- "resolution") than the upper layers. The upsampled reconstructed lower
- (base) layers are then used as prediction for the higher layers.
-
- Data Partitioning-- Similar to JPEG's frequency progressive mode, only
- the slice layer indicates the maximum number of block transform
- coefficients contained in the particular bitstream (known as the
- "priority break point"). Data partitioning is a frequency domain method
- that breaks the block of 64 quantized transform coefficients into two
- bitstreams. The first, higher priority bitstream contains the more
- critical lower frequency coefficients and side information (such as DC
- values, motion vectors). The second, lower priority bitstream carries
- higher frequency AC data.
-
- SNR Scalability-- Similar to the point transform in JPEG, SNR
- scalability is a spatial domain method where channels are coded at
- identical sample rates, but with differing picture quality (achieved through
- quantization step sizes). The higher priority bitstream contains base
- layer data that can be added to a lower priority refinement layer to
- construct a higher quality picture.
-
- Temporal Scalability-- A temporal domain method useful in, e.g.,
- stereoscopic video. The first, higher priority bitstream codes video
- at a lower frame rate, and the intermediate frames can be coded in a
- second bitstream using the first bitstream reconstruction as prediction.
- In stereoscopic vision, for example, the left video channel can be
- predicted from the right channel.
-
- Other scalability modes were experimented with in MPEG-2 video (such as
- Frequency Scalability), but were eventually dropped in favor of methods
- that demonstrated comparable or better picture quality with greater
- simplicity.
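-
- As a rough illustration of Data Partitioning, the sketch below splits one
- block of 64 scan-ordered quantized coefficients at a priority break point and
- shows what a decoder could rebuild if the low priority partition is lost.
- The function names and the plain-list representation are assumptions made for
- illustration only; the real syntax partitions coded bitstream elements, not
- Python lists.
-

```python
def partition_block(coeffs, priority_break_point):
    """Split 64 scan-ordered coefficients into (high, low) priority parts."""
    if len(coeffs) != 64:
        raise ValueError("expected 64 coefficients")
    if not 1 <= priority_break_point <= 64:
        raise ValueError("priority break point must be in 1..64")
    high = list(coeffs[:priority_break_point])   # DC and low frequencies
    low = list(coeffs[priority_break_point:])    # higher frequency AC data
    return high, low


def merge_partitions(high, low=None):
    """Rebuild a block; a missing low priority part is replaced by zeros."""
    if low is None:
        low = [0] * (64 - len(high))
    return list(high) + list(low)


# Hypothetical block: a few non-zero low frequency values, then zeros.
block = [52, -7, 3, 0, 2, 0, 0, 1] + [0] * 56
high, low = partition_block(block, 4)
full = merge_partitions(high, low)    # both partitions received
degraded = merge_partitions(high)     # low priority partition lost
```

-
- With both partitions the block is recovered exactly; with only the high
- priority partition the higher frequency detail is simply dropped.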
-
-
- 39. Why MPEG-2? Wasn't MPEG-1 enough?
-
- A. MPEG-1 was optimized for CD-ROM or applications at about 1.5
- Mbit/sec. Video was strictly non-interlaced (i.e. progressive). The
- international cooperation worked well enough for MPEG-1 that the committee
- began to address applications at broadcast TV sample rates using the
- CCIR 601 recommendation (720 samples/line by 480 lines per frame by 30
- frames per second or about 15.2 million samples/sec including chroma) as
- the reference.
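-
- The sample rate quoted above can be checked with a few lines of arithmetic.
- The sketch assumes 4:2:0 chroma (1.5 samples per pixel) and an exact 30 Hz
- frame rate; neither is stated in the paragraph, so treat both as assumptions.
- Under them, 720 samples/line works out to about 15.6 million samples/sec,
- while the 15.2 million figure matches 704 samples/line.
-

```python
# Samples per pixel for each chroma format (luma plus both chroma planes).
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}


def samples_per_second(width, height, frame_rate, chroma_format="4:2:0"):
    """Total luma plus chroma samples per second."""
    return width * height * frame_rate * SAMPLES_PER_PIXEL[chroma_format]


rate_720 = samples_per_second(720, 480, 30)
rate_704 = samples_per_second(704, 480, 30)
print(f"720 x 480 x 30 Hz: {rate_720 / 1e6:.2f} million samples/sec")
print(f"704 x 480 x 30 Hz: {rate_704 / 1e6:.2f} million samples/sec")
```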
-
- Unfortunately, today's TV scanning pattern is interlaced. This
- introduces a duality in block coding: do local redundancy areas (blocks)
- exist exclusively in a field or a frame (or a particle or wave)? The
- answer of course is that some blocks are one or the other at different
- times, depending on motion activity. The additional man years of
- experimentation and implementation between MPEG-1 and MPEG-2 improved
- the method of block-based transform coding.
-
-
- 40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
- A. Here is a brief summary:
-
- Sequence layer:
- More aspect ratios. A minor, yet necessary part of the syntax.
-
- Horizontal and vertical dimensions are now required to be a multiple of
- 16 in frame coded pictures, and the vertical dimension must be a
- multiple of 32 in field coded pictures.
-
- 4:2:2 and 4:4:4 macroblocks were added in the Next profiles.
-
- Syntax can now signal frame sizes as large as 16383 x 16383.
-
- Syntax signals source video type (NTSC, PAL, SECAM, MAC, component) to
- help post-processing and display.
-
- Source video color primaries (709, 170M, 240M, D65, etc.) and
- opto-electronic transfer characteristics (709, 624-4M, 170M etc.) can be
- indicated.
-
- Four scaleable modes [see scalability discussion]
-
- Picture layer:
- All MPEG-2 motion vectors are specified to a half-pel sample grid.
-
- DC precision can be user-selected as 8, 9, 10, or 11 bits.
-
- New scalar quantization matrices may be downloaded once per picture. In High
- profile, separate chrominance matrices now exist (Y and C no longer have to
- share)
-
- Concealment motion vectors were added to I-pictures in order to increase
- robustness from bit errors. I pictures are the most critical and sensitive
- picture in a group of pictures.
-
- A non-linear macroblock quantization factor providing a wider dynamic
- range, from 0.5 to 56, than the linear MPEG-1 (1 to 31) range. Both are
- sent as 5-bit FLC side information in the macroblock and slice
- headers.
-
- New Intra-VLC table for dct_coefficient_next (AC run-level events) that
- is a better match for the histogram of Intra-coded pictures. EOB is 4
- bits. The old table, dct_coef_next, is reserved for use in non-intra
- pictures (P, B), although the new table can be used for Intra-coded
- macroblocks in P and B pictures as well.
-
- Alternate scanning pattern that (supposedly) improves entropy coding
- performance over the original Zig-Zag scan used in H.261, JPEG, and MPEG-1.
- The extra scanning pattern is geared towards interlaced video.
-
- Syntax to signal an irregular 3:2 pulldown process (repeat_first_field flag)
-
- Progressive and interlaced frame coding
-
- Syntax to indicate source composite video characteristics useful in post-
- processing operations. (v-axis, field sequence, sub_carrier, phase,
- burst_amplitude, etc.)
-
- Pan & scanning syntax that tells decoder how to, for example, window a
- 4:3 image within a wider 16:9 aspect ratio coded image. Vertical pan
- offset has 1/16th pixel accuracy.
-
- Macroblock layer:
- Macroblock stuffing is now illegal in MPEG-2 (hurray!!). If stuffing is
- really needed, the encoder can pad slice start codes.
-
- Two organizations for macroblock coefficients (interlaced and progressive)
- signaled by dct_type flag.
-
- Now only one run-level escape code (24 bits) instead of the single
- (20 bits) and double escape (28 bits) in MPEG-1.
-
- Improved mismatch control in quantization over the original oddification
- method in MPEG-1. Now specifies adding or subtracting one to the 63rd
- AC coefficient depending on parity of the summed coefficients. MPEG-2
- mismatch control is performed on the transform coefficients, whereas in
- MPEG-1, it is applied to the quantized transform coefficients.
-
- Many additional prediction modes (16x8 MC, field MC, Dual Prime) and,
- correspondingly, macroblock modes.
-
- Overall, MPEG-2's greatest compression improvements over MPEG-1 are:
- prediction modes, Intra VLC table, DC precision, non-linear macroblock
- quantization. Implementation improvements: macroblock stuffing was
- eliminated.
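-
- The MPEG-2 mismatch control rule summarized above (look at the parity of the
- summed coefficients, then add or subtract one at the last coefficient) can be
- sketched as follows. The function name and the 8x8 list-of-lists layout are
- assumptions for illustration; the normative wording lives in the standard
- and should be consulted for a real decoder.
-

```python
def mpeg2_mismatch_control(block):
    """Return a copy of an 8x8 block of reconstructed integer coefficients
    whose sum has been forced to an odd value."""
    out = [row[:] for row in block]
    total = sum(sum(row) for row in out)
    if total % 2 == 0:
        # Even sum: nudge the last (highest frequency) coefficient by one.
        if out[7][7] % 2 != 0:
            out[7][7] -= 1   # odd value: subtract one
        else:
            out[7][7] += 1   # even value: add one
    return out


# Hypothetical blocks: one with an even sum, one with an odd sum.
even_block = [[0] * 8 for _ in range(8)]
even_block[0][0] = 4
odd_block = [[0] * 8 for _ in range(8)]
odd_block[0][0] = 3

fixed_even = mpeg2_mismatch_control(even_block)
fixed_odd = mpeg2_mismatch_control(odd_block)
```

-
- Only blocks whose coefficient sum is even are touched, and the correction
- lands on the coefficient that is least visible after the IDCT.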
-
- 41. How do MPEG and JPEG differ?
-
- A. The most fundamental difference is MPEG's use of block-based motion
- compensated prediction (MCP)---a method falling into the general category of
- temporal DPCM.
-
- The second most fundamental difference is in the target application.
- JPEG adopts a general purpose philosophy: independence from color space
- (up to 255 components per frame) and quantization tables for each
- component. Extended modes in JPEG include two sample precisions (8 and
- 12 bit sample accuracy), combinations of frequency progressive, spatial
- hierarchically progressive, and amplitude (point transform) progressive
- scanning modes. Further color independence is made possible thanks to
- downloadable Huffman tables (up to one for each component.)
-
- Since MPEG is targeted for a set of specific applications, there is only
- one color space (4:2:0 YCbCr), one sample precision (8 bits), and one
- scanning mode (sequential). Luminance and chrominance share quantization
- and VLC tables. MPEG adds adaptive quantization at the macroblock (16 x
- 16 pixel area) layer. This permits both smoother bit rate control and
- more perceptually uniform quantization throughout the picture and image
- sequence. However, adaptive quantization is part of the Enhanced JPEG
- charter (ISO/IEC 10918-3) currently in verification stage. MPEG variable
- length coding tables are non-downloadable, and are therefore optimized
- for a limited range of compression ratios appropriate for the target
- applications.
-
- The local spatial decorrelation methods in MPEG and JPEG are very
- similar. Picture data is block transform coded with the two-dimensional
- orthonormal 8x8 DCT, with asymmetric basis vectors about time (aka
- DCT-II). The resulting 63 AC transform coefficients are mapped in a zig-zag
- pattern (or alternative scan pattern in MPEG-2) to statistically
- increase the runs of zeros. Coefficients of the vector are then
- uniformly scalar quantized, run-length coded, and finally the run-length
- symbols are variable length coded using a canonical (JPEG) or modified
- Huffman (MPEG) scheme. Global frame redundancy is reduced by 1-D DPCM
- of the block DC coefficients, followed by quantization and variable
- length entropy coding of the quantized DC coefficient.
-
- Frame
-   --(MCP)--> 8x8 spatial block
-   --(DCT)--> 8x8 frequency block
-   --(ZZ)---> Zig-zag scan
-   --(Q)----> quantization
-   --(RLC)--> run-length coding
-   --(VLC)--> variable length coding.
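-
- The scan and run-length stages of this chain can be sketched as follows. This
- is a minimal illustration, assuming the block has already been transformed
- and quantized; the function names are made up for the example, and the
- variable length coding of the resulting tokens is left out.
-

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col picks one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        if s % 2 == 0:
            diagonal.reverse()          # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def run_level_pairs(ac_coeffs):
    """Turn scanned AC coefficients into (run of zeros, level) tokens.
    Trailing zeros produce no token; an end-of-block code would cover them."""
    pairs, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs


# Hypothetical quantized block: DC plus two non-zero AC coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 40, 5, -2
scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_level_pairs(scanned[1:])    # the DC term is coded separately
```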
-
- The similarities have made it possible for the development of hard-wired
- silicon that can code both standards. Even some highly microcoded
- architectures employing hardwired instruction primitives or functional
- blocks benefit from JPEG/MPEG similarities. There are many additional
- yet minor differences. They include:
-
- 1. In addition to the 8-bit mode, DCT and quantization precision
- in MPEG has a 9-bit and 12-bit mode, respectively, exclusively in non-
- intra coded macroblocks. A 1-bit expansion takes place in the
- macroblock difference operation.
-
- 2. Mismatch control in MPEG-1 forces quantized coefficients to
- become odd values (oddification). JPEG does not employ any mismatch
- mechanism.
-
- 3. JPEG run-length coding produces run-size tokens (run of zeros,
- non-zero coefficient magnitude) whereas MPEG produces fully concatenated
- run-level tokens that do not require magnitude differential bits.
-
- 4. DC values in MPEG-1 are limited to 8-bit precision (a constant
- stepsize of 8), whereas JPEG DC precision can occupy all possible 11-
- bits. MPEG-2, however, re-introduced extra DC precision critical even
- at high compression ratios.
-
-
- Difference between MPEG and H.261
-
- 42. How do MPEG and H.261 differ?
-
- A. H.261, also known as Px64, was targeted for teleconferencing
- applications where motion is naturally more limited. Motion vectors are
- restricted to a range of +/- 15 pixel unit displacements. Prediction
- accuracy is reduced since H.261 motion vectors are specified to only
- integer-pel accuracy. Other quality syntactic differences include: no
- B-pictures, inferior mismatch control.
-
- 43. Is H.261 the de facto teleconferencing standard?
-
- A. Not exactly. To date, about seventy percent of the industrial
- teleconferencing hardware market is controlled by PictureTel of Mass.
- The second largest market controller is Compression Labs of Silicon
- Valley. PictureTel hardware includes compatibility with H.261 as a
- lowest common denominator, but when in communication with other
- PictureTel hardware, it can switch to a mode superior at low bit rates
- (less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is
- done at two-times switched 56 channel (~P = 2) bandwidth. ISDN is still
- expensive. In each direction, video and audio are coded at an aggregate
- rate of 112 kbits/sec (2*56 kbits/sec). The PictureTel proprietary
- compression algorithm is acknowledged to be a combination of spatial
- pyramid, lattice vector quantizer, and an unidentified entropy coding
- method. Motion compensation is considerably more refined and
- sophisticated than the 16x16 integer-pel block method specified in
- H.261.
-
- The Compression Labs proprietary algorithm also offers significant
- improvement over H.261 when linked to other CLI hardware. Local
- decorrelation is based on a DCT-VQ hybrid.
-
- Currently, ITU-TS (International Telecommunication Union,
- Telecommunication Standardization Sector), formerly CCITT, is quietly
- defining an improvement to H.261 with the participation of industry vendors.
-
-
- Rate control
-
- 44. What is the TM rate control and adaptive quantization technique ?
-
- A. The Test model (MPEG-2) and Simulation Model (MPEG-1) were not, by
- any stretch of the imagination, meant to epitomize state-of-the-art
- encoding quality. They were, however, designed to exercise the syntax,
- verify proposals, and test the *relative* compression performance of
- proposals in a timely manner that could be duplicated by
- co-experimenters. Without simplicity, there would no doubt have been
- endless debates over model interpretation. Regardless of all else, more
- advanced techniques would probably trespass into proprietary territory.
-
- The final test model for MPEG-2 is TM version 5b, aka TM version 6. The
- final MPEG-1 simulation model is version 3. The MPEG-2 TM rate control
- method offers a dramatic improvement over the SM method. TM adds more
- accurate estimation of macroblock complexity through use of limited a
- priori information. Macroblock quantization adjustments are computed on
- a macroblock basis, instead of once-per-slice.
-
- 45. How does the TM work?
- A. Rate control and adaptive quantization are divided into three steps:
-
- Step One: Bit Allocation
-
- In Complexity Estimation, the global complexity measures assign
- relative weights to each picture type (I,P,B). These weights (Xi, Xp,
- Xb) are reflected by the typical coded frame size of I, P, and B
- pictures (see typical frame size discussion). I pictures are usually
- assigned the largest weight since they have the greatest stability
- factor in an image sequence. B pictures are assigned the smallest
- weight since B picture energy does not propagate into other pictures and B
- pictures are usually highly correlated with neighboring P and I pictures.
-
- The bit target for a frame is based on the frame type, the remaining number
- of bits left in the Group of Pictures (GOP) allocation, and the immediate
- statistical history of previously coded pictures.
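-
- A minimal sketch of this bit allocation step follows. The constants K_P and
- K_B and the form of the denominators follow the usual description of Test
- Model 5; they are quoted from memory here and should be checked against the
- Test Model document before being relied on. The function name, argument
- names, and the example numbers are all hypothetical.
-

```python
K_P, K_B = 1.0, 1.4   # assumed TM5 weighting constants for P and B pictures


def picture_target(ptype, gop_bits, n_p, n_b, x, bit_rate, picture_rate):
    """Bit target for the next picture of type 'I', 'P' or 'B'.

    gop_bits : bits remaining in the current GOP allocation
    n_p, n_b : P and B pictures remaining in the GOP (including this one)
    x        : dict of global complexity measures {'I': Xi, 'P': Xp, 'B': Xb}
    """
    x_i, x_p, x_b = x["I"], x["P"], x["B"]
    if ptype == "I":
        denom = 1.0 + n_p * x_p / (x_i * K_P) + n_b * x_b / (x_i * K_B)
    elif ptype == "P":
        denom = n_p + n_b * K_P * x_b / (K_B * x_p)
    elif ptype == "B":
        denom = n_b + n_p * K_B * x_p / (K_P * x_b)
    else:
        raise ValueError("picture type must be 'I', 'P' or 'B'")
    floor = bit_rate / (8.0 * picture_rate)   # never starve a picture completely
    return max(gop_bits / denom, floor)


# Hypothetical 12-picture GOP (1 I, 3 P, 8 B) at 4 Mbit/sec and 30 Hz.
weights = {"I": 160.0, "P": 60.0, "B": 42.0}
gop_bits = 4_000_000 * 12 / 30
target_i = picture_target("I", gop_bits, 3, 8, weights, 4_000_000, 30)
target_p = picture_target("P", gop_bits, 3, 8, weights, 4_000_000, 30)
target_b = picture_target("B", gop_bits, 3, 8, weights, 4_000_000, 30)
```

-
- After each picture is coded, its complexity measure would be refreshed from
- the bits actually spent and the average quantizer used, which is how the
- "immediate statistical history" enters the next target.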
-
- Step Two: Rate Control
-
- Rate control attempts to adjust bit allocation if there is significant
- difference between the target bits (anticipated bits) and actual coded
- bits for a block of data. If the virtual buffer begins to overflow, the
- macroblock quantization step size is increased, resulting in a smaller
- yield of coded bits in subsequent macroblocks. Likewise, if underflow
- begins, the step size is decreased. The Test Model assumes that the
- target picture has a spatially uniform distribution of bits. This is a safe
- approximation since spatial activity and perceived quantization noise
- are almost inversely proportional. Of course, the user is free to
- design a custom distribution, perhaps targeting more bits in areas that
- contain text, for example.
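-
- The virtual buffer feedback described in this step can be sketched as below.
- The reaction parameter (2 * bit_rate / picture_rate) and the factor of 31
- follow the usual description of Test Model 5 and are quoted from memory, so
- treat them as assumptions; the function and argument names are hypothetical.
- The result is left unclipped; limiting it to the legal quantizer range is
- left to the caller.
-

```python
def reference_quantizer(d0, bits_so_far, picture_target, mbs_coded, mb_count,
                        bit_rate, picture_rate):
    """Reference quantizer for the next macroblock.

    d0             : virtual buffer fullness at the start of the picture
    bits_so_far    : bits actually produced by the macroblocks coded so far
    picture_target : bit target for the whole picture
    mbs_coded      : number of macroblocks already coded in this picture
    mb_count       : total macroblocks in the picture
    """
    # Uniform spatial distribution: each macroblock is expected to use an
    # equal share of the picture target.
    expected = picture_target * mbs_coded / mb_count
    fullness = d0 + bits_so_far - expected
    reaction = 2.0 * bit_rate / picture_rate
    return fullness * 31.0 / reaction


# Hypothetical picture: 1350 macroblocks, 330,000 bit target, halfway through.
bit_rate, picture_rate = 4_000_000, 30
reaction = 2.0 * bit_rate / picture_rate
d0 = 10.0 * reaction / 31.0             # start at a quantizer of about 10
q_on_target = reference_quantizer(d0, 165_000, 330_000, 675, 1350,
                                  bit_rate, picture_rate)
q_over = reference_quantizer(d0, 200_000, 330_000, 675, 1350,
                             bit_rate, picture_rate)
q_under = reference_quantizer(d0, 130_000, 330_000, 675, 1350,
                              bit_rate, picture_rate)
```

-
- Spending more bits than expected raises the quantizer for the following
- macroblocks; spending fewer lowers it.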
-
-
- Step Three: Adaptive Quantization
-
- The final step modulates the macroblock quantization step size obtained in
- Step 2 by a local activity measure. The activity measure itself is normalized
- against the most recently coded picture of the same type (I, P, or B). The
- activity for a macroblock is chosen as the minimum among the four 8x8 block
- luminance variances. Choosing the minimum block is part of the concept that
- a macroblock is no better than the block of highest visible distortion
- (weakest link in the chain).
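-
- A sketch of the activity measure and the modulation step follows. The
- normalization formula, the "+ 1" offset, and the 1..31 clipping range follow
- the usual description of Test Model 5 and are quoted from memory, so treat
- them as assumptions. Blocks are passed as flat lists of 64 luminance samples,
- and all names are hypothetical.
-

```python
def block_variance(pixels):
    """Variance of one block of luminance samples (flat list)."""
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n


def macroblock_activity(luma_blocks):
    """Activity = 1 + the smallest variance among the luminance blocks."""
    return 1.0 + min(block_variance(b) for b in luma_blocks)


def modulated_quantizer(q_ref, activity, avg_activity):
    """Scale the reference quantizer by the normalized activity.
    The normalized activity stays between 0.5 and 2.0."""
    n_act = (2.0 * activity + avg_activity) / (activity + 2.0 * avg_activity)
    return min(max(round(q_ref * n_act), 1), 31)


# Hypothetical macroblock: one flat block among three busy ones.
flat = [128] * 64
busy = [0, 255] * 32
activity = macroblock_activity([busy, flat, busy, busy])
q_flat = modulated_quantizer(10, activity, 100.0)   # smooth area: finer steps
q_busy = modulated_quantizer(10, 400.0, 100.0)      # busy area: coarser steps
```

-
- A single flat block is enough to pull the whole macroblock toward a finer
- quantizer, which is the "weakest link" idea in code form.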
-
- 46. What is a good motion estimation method, then?
-
- A. When shopping for motion vectors, the three basic characteristics
- are: Search range, search pattern, and matching criteria. Search
- pattern has the greatest impact on finding the best vector. Hierarchical
- search patterns first find the best match between downsampled images of
- the reference and target pictures and then refine the vector through
- progressively higher resolutions. When compared to other fast methods,
- hierarchical patterns are less likely to be confused by extremely local
- distortion minimums as being a best match. Also note that subsampled search
- and hierarchical search are not synonymous.
-
- Q. Is there a limit to the length of motion vectors?
-
- The search area is unlimited, but the reconstructed motion vectors must
- not:
-
- a. point beyond the picture boundaries: (1 <= MV_x <= luminancewidth -
- 16) and (1 <= MV_y <= luminanceheight - 16). (The - 16 is due to the
- fact that the motion vector origin is the upper left hand corner of a
- macroblock.)
-
- b. In Constrained Parameters MPEG-1, the motion vector is limited to a
- range of [-64,+63.5] luminance samples with half-pel accuracy, and [-
- 128,+127.5] with integer pel accuracy. Break the constrained parameters
- rules and your video sequence will not likely display on many hardware
- devices.
-
- c. In MPEG-2 Video Main Profile at Main Level, the motion vectors are
- always on a half-pel co-ordinate grid, and the vertical range is
- restricted to [-64, +63.5], and the horizontal limit is [-256,+255.5].
-
- d. in MPEG-1, the syntactic limit of the motion vector is [-1024,+1023]
- integer pel, horizontal and vertical.
-
- e. in MPEG-2, the syntactic limit of the motion vector is [-2048,+2047.5]
- horizontal, [-1024,+1023.5] vertical.
-
-
- 47. Is exhaustive search "optimal" ?
-
- A. Definitely not in the context of block-based MCP video. Since one
- motion vector represents the prediction of 256 pixels, divergent pixels
- within the macroblock are misrepresented by the "global" vector. This
- leads back to the general philosophy of block-based coding as an
- approximation technique. In an ICASSP '93 paper, Sullivan discusses ways in
- which block-based prediction schemes can solve part of this problem.
-
- Exhaustive search may find blocks with the least distortion (displaced frame
- difference) but will not produce motion vectors with the lowest entropy.
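-
- For reference, an exhaustive integer-pel search can be sketched as below. It
- uses the sum of absolute differences (SAD) as the matching criterion, which
- is an assumption; the text does not name one. Pictures are plain 2-D lists
- and all names are hypothetical. Note that the search minimizes distortion
- only and takes no account of how many bits the winning vector costs to code.
-

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, cx, cy, size=16, search=7):
    """Try every integer displacement within +/- search samples.
    Returns (best SAD, dx, dy); (dx, dy) points from the block position in the
    current picture to the matching block in the reference picture."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue                 # candidate falls outside the picture
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic test: the current picture is the reference shifted by (2, 1).
ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur = [[ref[y + 1][x + 2] for x in range(12)] for y in range(12)]
best = full_search(cur, ref, 4, 4, size=4, search=2)
```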
-
- 48. What are some advanced encoding methods?
-
- Quantizer feedback: determine the dependent quantization stepsize by
- modeling quantization error propagating over multiple pictures. [Uz/et
- al ICASSP 93, Ortega/Vetterli/et al ICASSP 93]
-
- Smoothness constraint placed on local activity measures: immediate blocks
- outside target macroblock are considered when selecting macroblock
- quantization stepsize. [Thomson/Savitier patent]
-
- Horizontal variance: measure variance between columns of pixels in addition
- to the traditional measure of variance along rows (lines) when making
- field/frame macroblock prediction decision.
-
- DFD energy: examine DFD energy/variance when making Intra/Non-intra
- macroblock decision.
-
- Activity measures: use total bits from a first-pass encoding of a picture or
- macroblock as a measure of the activity. Coded bits is a more accurate
- reflection of local complexity than variance. [Thomson/Savitier patent]
-
- Motion vector cost: this is true for any syntax elements, really. Signaling
- a macroblock quantization factor or a large motion vector differential can
- cost more than making up the difference with extra quantized DFD (prediction
- error) bits. The optimum can be found with some Lagrangian operator. In
- summary, in any compression system with side information, there is an optimum
- point between signaling overhead (e.g. prediction) and prediction error.
-
- Liberal Interpretations of the Forward DCT:
- Borrowing from the concept that the DCT is simply a filter bank, a
- technique that seems to be gaining popularity is basis vector shaping.
- Usually this is combined with the quantization stage since the two are
- tied closely together in a rate-distortion sense. The idea is to use
- the basis vector shaping as a cheap alternative to pre-filtering by
- combining the more desirable data adaptive properties of pre-filtering/
- pre-processing into the transformation process... yet still reconstruct
- a picture in the decoder using the standard IDCT that looks reasonably
- like the source. Some more clever schemes will apply a form of windowing.
- [Warning: watch out for eigenimage/basis vector orthogonality.]
-
- Frequency-domain enhancements:
- Enhancements are applied after the DCT (and possibly quantization) stage
- to the transform coefficients. This borrows from the concept: if you
- don't like the (quantized) transformed results, simply reshape them into
- something you do like. Suppressing isolated small amplitudes is popular.
-
- Temporal spreading of quantization error:
- This method is similar to the original intent behind color subcarrier
- phase alternation by field in the NTSC, PAL, and SECAM analog TV
- standards: for stationary areas, noise does not "hang" in one location,
- but dances about the image over time to give a more uniform effect.
- Distribution makes it more difficult for the eye to "catch on" to
- trouble spots (due to the latent temporal response curve of human
- vision). Simple encoder models tend to do this naturally but will not
- solve all situations.
-
-
- Look-ahead and adaptive frame cycle structures: analyze picture activity
- several pictures into the future, looking for scene changes or motion
- statistics.
-
- It is easy to spot encoders that do not employ any advanced encoding
- techniques: reconstructed video usually contains ringing around edges,
- color bleeding, and lots of noise.
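-
- The motion vector cost item in the list above amounts to minimizing a
- Lagrangian cost J = D + lambda * R over the candidate coding choices. A
- minimal sketch, with made-up candidate labels and numbers, and with the
- choice of lambda left entirely to the encoder designer:
-

```python
def best_candidate(candidates, lam):
    """Pick the candidate with the lowest cost D + lam * R.
    candidates: iterable of (label, distortion, side_info_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical choices for one macroblock: a cheap zero vector with more
# prediction error, or an expensive long vector with slightly less.
candidates = [
    ("zero vector", 1200, 2),
    ("long vector", 1100, 20),
]
pick_tight = best_candidate(candidates, lam=10.0)   # bits are expensive
pick_loose = best_candidate(candidates, lam=1.0)    # bits are cheap
```

-
- With a large lambda the cheap vector wins even though its distortion is
- higher; with a small lambda the better match is worth its extra bits.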
-
- 49. Is so-and-so really MPEG compliant ?
-
- A. At the very least, there are two areas of conformance/compliance in
- MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically
- speaking, video bitstreams consisting entirely of I-frames (such as
- those generated by Xing software) are syntactically compliant with the
- MPEG specification. The I-frame sequence is simply a subset of the full
- syntax. Compliant bitstreams must obey the range limits (e.g. motion
- vectors limited to +/-128, frame sizes, frame rates, etc.) and syntax
- rules (e.g. all slices must commence and terminate with a non-skipped
- macroblock, no gaps between slices, etc.).
-
- Decoders, however, cannot escape true conformance. For example, a
- decoder that cannot decode P or B frames is *not* legal MPEG.
- Likewise, full arithmetic precision must be obeyed before any decoder
- can be called "MPEG compliant." The IDCT, inverse quantizer, and
- motion compensated predictor must meet the specification requirements...
- which are fairly rigid (e.g. no more than 1 least significant bit of
- error between reference and test decoders). Real-time conformance is
- more complicated to measure than arithmetic precision, but it is
- reasonable to expect that decoders that skip frames on reasonable
- bitstreams are not likely to be considered compliant.
-
- Artifacts
-
- 50. What are the tell-tale MPEG artifacts?
-
- A. If the encoder did its job properly, and the user specified a proper
- balance between sample rate and bitrate, there shouldn't be any visible
- artifacts. However, in sub-optimal systems, you can look for:
-
- Gibbs phenomenon/Ringing/Aliasing (too few AC bits, not enough
- pre-processing)
-
- Blockiness (not considering your neighbors before quantizing)
-
- Posterization (too few DC bits)
-
- Checkerboards (DCT eigenimages as a result of too few AC coefficients)
-
- Colorbleeding (not considering color in encoder cost model, not
- subtracting color at edges of objects, etc.)
-
- 51. Where are the weak points of MPEG video ?
- A.
- Texture patterns (rapidly alternating lines)
- sharp edges (especially text)
- [installment 3]
-
-
- 52. What are some myths about MPEG?
- A. There are a few major myths that I am aware of:
-
- 1. Block displacements: macroblock predictions are formed out of
- arbitrary 16x16 (or 16x8/16x16 in MPEG-2) areas from previously
- reconstructed pictures. Many people believe that the prediction
- macroblocks have boundaries that fall on macroblock grid boundaries (pixel
- 0, 16, 32, 48... line 0, 16, 32, 48... etc.). In fact, motion vectors
- represent relative translations with respect to the target
- reconstruction macroblock coordinates. The motion vectors can point to
- half pixel coordinates, requiring the prediction macroblock to be
- formed via bi-linear interpolation of pixels.
-
-
- 2. Displaced frame (macroblock) difference construction: the prediction
- error formed as the difference between the prediction macroblock and
- source macroblock is coded much like an Intra macroblock. The
- prediction may come from different locations (as in bi-directional
- prediction--or in MPEG-2--16x8, field-in-frame, and Dual Prime), but the
- DFD is always coded as a 16x16 unit.
-
- 3. Compression ratios
-
- You hear 200:1 and 100:1 in the media. Utter rubbish. The true range
- is between 16:1 and 40:1. Spreading misinformation about compression
- ratios in public will catch the attention of the infamous MPEG Police.
- They say mild-mannered Michael Barnsley will snap, without warning, into
- violent rage if he doesn't get the upper bunk bed.
-
- 4. Picture coding types all consist of the same macroblocks
-
- Macroblocks within I pictures are strictly intra-coded. Macroblocks
- within P pictures can be either predicted or intra-coded, and within B
- pictures they can be bi-directional, forward, backward, or intra. Additional
- macroblock mode switches include: predicted with no motion
- compensation, modified macroblock quantization, coding of prediction error or
- not. The switches are concatenated into the macroblock_type side information
- and variable length coded in the macroblock header.
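-
- The half pixel interpolation mentioned under myth 1 can be sketched as below.
- Coordinates are given in half-pel units, so odd values fall between samples.
- The function name and the 2-D list layout are assumptions for illustration,
- and no picture boundary handling is attempted.
-

```python
def half_pel_sample(ref, x2, y2):
    """Fetch one prediction sample at half-pel coordinates (x2, y2).
    x2 and y2 are non-negative integers in units of half a sample."""
    x, y = x2 >> 1, y2 >> 1          # integer sample position
    hx, hy = x2 & 1, y2 & 1          # half-pel flags
    a = ref[y][x]
    if not hx and not hy:
        return a                                      # on the sample grid
    if hx and not hy:
        return (a + ref[y][x + 1] + 1) >> 1           # between two columns
    if hy and not hx:
        return (a + ref[y + 1][x] + 1) >> 1           # between two lines
    return (a + ref[y][x + 1] + ref[y + 1][x]
            + ref[y + 1][x + 1] + 2) >> 2             # centre of four samples


# Hypothetical 2 x 2 patch of reconstructed samples.
patch = [[10, 20],
         [30, 41]]
on_grid = half_pel_sample(patch, 0, 0)
horizontal = half_pel_sample(patch, 1, 0)
vertical = half_pel_sample(patch, 0, 1)
centre = half_pel_sample(patch, 1, 1)
```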
-
- 53. What is the color space of MPEG?
-
- MPEG strictly specifies the YCbCr color space, not YUV or YIQ or YPbPr
- or YDrDb or any other color difference variations. Regardless of any
- bitstream parameters, MPEG-1 and MPEG-2 Video Main Profile specify 4:2:0
- chroma ratio, where the color difference channels (Cb, Cr) have half the
- resolution or sample grid density in both the horizontal and vertical
- direction with respect to luminance.
-
- MPEG-2 High Profile includes an option for 4:2:2 and 4:4:4 coding.
- Applications for this are likely to be broadcasting and contribution
- equipment.
-
- 54. Don't you mean 4:1:1 ?
-
- A. No, here is a table of ratios:
-
-
-         CCIR 601 (60 Hz) image    Chroma sub-sampling factors
- format  Y          Cb, Cr         Vertical    Horizontal
- ------  ---------  ---------      --------    ----------
- 4:4:4   720 x 480  720 x 480      none        none
- 4:2:2   720 x 480  360 x 480      none        2:1
- 4:2:0   720 x 480  360 x 240      2:1         2:1
- 4:1:1   720 x 480  180 x 480      none        4:1
- 4:1:0   720 x 480  180 x 120      4:1         4:1
-
- 3:2:2, 3:1:1, and 3:1:0 are less common variations.
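-
- The chroma dimensions follow directly from the vertical and horizontal
- sub-sampling factors. A small sketch, with a hypothetical lookup table and
- function name, and assuming luminance dimensions that divide evenly:
-

```python
# format -> (horizontal factor, vertical factor) applied to Cb and Cr
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
    "4:1:0": (4, 4),
}


def chroma_size(fmt, luma_width=720, luma_height=480):
    """Width and height of each chroma plane for a given format."""
    h_factor, v_factor = SUBSAMPLING[fmt]
    return luma_width // h_factor, luma_height // v_factor


for fmt in SUBSAMPLING:
    width, height = chroma_size(fmt)
    print(f"{fmt}  Y 720 x 480  Cb, Cr {width} x {height}")
```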
-
- 55. Why did MPEG choose 4:2:0 ? Isn't 4:2:2 the standard for TV?
-
- A. At least four reasons I can think of:
-
- 1. 4:2:0 picture memory requirements are 25% less than the size of 4:2:2
- pictures. MPEG-1 decoders are able to snugly fit all 3 SIF pictures (1
- reconstruction & display, 2 prediction) into 512 KBytes of buffer space.
- CCIR 601 is a tighter fit into 2 Mbytes.
-
- 2. The subjective difference between 4:2:0 and 4:2:2 is minimal, when
- considering consumer display equipment and distribution compression ratios.
-
- 3. Vertical decimation increases compression efficiency by reducing syntax
- overhead posed in an 8 block (4:2:0) macroblock structure.
-
- 4. You're compressing the hell out of the video signal, so what possible
- difference can the 0:0:2 high-pass make?
-
- Interlacing and the 62 microsecond gap between successively scanned lines
- introduces some discontinuities, but most of this can be alleviated through
- pre-processing.
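-
- The picture memory figures in reason 1 can be checked with simple
- arithmetic. The sketch assumes 8-bit samples, 352 x 240 for SIF, 720 x 480
- for CCIR 601, and binary kilobytes and megabytes; the names are
- hypothetical, and whatever is left over is what remains for the coded
- bitstream buffer and other decoder needs.
-

```python
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}


def picture_bytes(width, height, chroma_format="4:2:0"):
    """Bytes needed to store one picture with 8-bit samples."""
    return int(width * height * SAMPLES_PER_PIXEL[chroma_format])


sif_store = 3 * picture_bytes(352, 240)     # three SIF pictures
ccir_store = 3 * picture_bytes(720, 480)    # three CCIR 601 pictures
sif_spare = 512 * 1024 - sif_store
ccir_spare = 2 * 1024 * 1024 - ccir_store
ratio = picture_bytes(720, 480, "4:2:0") / picture_bytes(720, 480, "4:2:2")

print(f"3 SIF pictures:      {sif_store} bytes, {sif_spare} spare of 512 KBytes")
print(f"3 CCIR 601 pictures: {ccir_store} bytes, {ccir_spare} spare of 2 Mbytes")
print(f"4:2:0 / 4:2:2 picture size ratio: {ratio}")
```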
-
- 56. What is the precision of MPEG samples?
-
- A. By definition, MPEG samples have no more and no less than 8-bits uniform
- sample precision (256 quantization levels). For luminance (which is
- unsigned) data, black corresponds to level 0, white is level 255. However, in
- CCIR Recommendation 601 luminance, levels 0 through 15 and 236 through 255
- are reserved for blanking signal excursions. MPEG currently has no such
- clipped excursion restrictions, although decoders might take care to ensure
- active samples do not exceed these limits. With three color components per
- pixel, the total combination is roughly 16.8 million colors (i.e. 24-bits).
-
- 57. What is all the fuss with cositing of chroma components?
-
- A. It is moderately important to properly co-site chroma samples,
- otherwise a sort of chroma shifting effect (exhibited as a halo) may result
- when the reconstructed video is displayed. In MPEG-1 video, the chroma
- samples are exactly centered between the 4 luminance samples (Fig. 1). To
- maintain compatibility with the CCIR 601 horizontal chroma locations and
- simplify implementation (eliminate need for phase shift), MPEG-2 chroma
- samples are arranged as per Fig. 2.
-
- Y   Y   Y   Y       Y   Y   Y   Y       YC  Y   YC  Y
-   C       C         C       C
- Y   Y   Y   Y       Y   Y   Y   Y       YC  Y   YC  Y
-
- Y   Y   Y   Y       Y   Y   Y   Y       YC  Y   YC  Y
-   C       C         C       C
- Y   Y   Y   Y       Y   Y   Y   Y       YC  Y   YC  Y
-
- Fig.1 MPEG-1        Fig.2 MPEG-2        Fig.3 MPEG-2 and
- 4:2:0 organization  4:2:0 organization  CCIR Rec. 601
-                                         4:2:2 organization
-
- MPEG for the data compression expert
-
- 58. How would you explain MPEG to the data compression expert?
-
- A. MPEG video is a block-based video scheme.
-
-
- 59. How does MPEG video really compare to TV, VHS, laserdisc ?
- A. VHS picture quality can be achieved for source film video at about 1
- million bits per second (with proprietary encoding methods). It is very
- difficult to objectively compare MPEG to VHS. The response curve of
- VHS places -3 dB at around 2 MHz of analog luminance bandwidth
- (equivalent to 200 samples/line). VHS chroma is considerably less dense
- in the horizontal direction than MPEG source video (compare 80
- samples/line to 176!). From a sampling density perspective, VHS is
- superior only in the vertical direction (480 luminance lines compared to
- 240), but when taking into account such things as (supposedly) interfield
- magnetic tape crosstalk and the TV monitor Kell factor, the perceptual
- vertical
- advantage is not all that significant. VHS is prone to such inconveniences
- as timing errors (an annoyance addressed by time base correctors), whereas
- digital video is fully discretized. Pre-recorded VHS is typically recorded at
- very high duplication speeds (5 to 15 times real time playback speed),
- opening up additional avenues for artifacts. In gist, MPEG-1 at its nominal
- parameters can match VHS's sexy low-pass-filtered look.
-
- With careful coding schemes, broadcast NTSC quality can be approximated at
- about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec. Of course, sports
- sequences with complex spatial-temporal activity should be treated with bit
- rates more like 5 and 6 Mbit/sec, respectively. Laserdisc is a tough one to
- compare. Laserdiscs are encoded with composite video (NTSC or PAL).
- Manufacturers of laser disc players make claims of up to 425 TVL (or 567
- samples/line) response. Thus it could be said the laserdisc has a 567 x 480 x
- 30 Hz "potential resolution". The carrier-to-noise ratio is typically better
- than 48 dB. Timing is excellent. Yet some of the clean characteristics of
- laserdisc can be achieved with MPEG-1 at 1.15 Mbit/sec (SIF rates),
- especially for those areas of medium detail (low spatial activity) in the
- presence of uniform motion. This may be why some people say MPEG-1 video at
- 1.15 Mbit/sec looks almost as good as Laserdisc or Super VHS at times.
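-
- The 425 TVL to 567 samples/line conversion quoted above is just an
- aspect ratio correction: TV lines are counted across a width equal to
- the picture height, so a 4:3 picture holds 4/3 as many across its full
- width. A small sketch (the function name is mine):

```python
from fractions import Fraction


def tvl_to_samples_per_line(tvl, aspect=Fraction(4, 3)):
    """Convert TV lines (per picture height) to samples per full line."""
    return round(tvl * aspect)


print(tvl_to_samples_per_line(425))
```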
-
- 60. What are the typical MPEG-2 bitrates and picture quality?
-
- Typical coded picture sizes (bits per picture):
-
-                            Picture type
-                      I         P         B       Average
- MPEG-1 SIF
- @ 1.15 Mbit/sec    150,000    50,000    20,000    38,000
-
- MPEG-2 601
- @ 4.00 Mbit/sec    400,000   200,000    80,000   130,000
-
- Note: parameters assume Test Model for encoding, I frame distance of 15 (N =
- 15), and a P frame distance of 3 (M = 3).
-
- Of course, among differing source material, scene changes, and use of
- advanced encoder models... these numbers can be significantly different.
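-
- As a sanity check, the average column can be roughly reproduced from
- the I, P, and B sizes and the N = 15, M = 3 structure, which gives 1 I,
- 4 P, and 10 B pictures per group. This is a back-of-envelope sketch with
- made-up function names, not how the Test Model allocates bits:

```python
def gop_counts(n, m):
    """Pictures per group: one I, a P at every m-th position, B elsewhere."""
    i_count = 1
    p_count = n // m - 1
    b_count = n - i_count - p_count
    return i_count, p_count, b_count


def average_bits(i_bits, p_bits, b_bits, n=15, m=3):
    """Mean coded picture size over one group of n pictures."""
    i_count, p_count, b_count = gop_counts(n, m)
    total = i_count * i_bits + p_count * p_bits + b_count * b_bits
    return total / n


print(gop_counts(15, 3))
print(round(average_bits(150_000, 50_000, 20_000)))   # MPEG-1 SIF row
print(round(average_bits(400_000, 200_000, 80_000)))  # MPEG-2 601 row
```

- Both results land close to bit rate divided by 30 pictures per second,
- which is where the rounded averages in the table come from.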
-
- 61. At what bitrates is MPEG-2 video optimal?
- A. The Test subgroup has defined a few examples:
-
- "Sweet spot" sampling dimensions and bit rates for MPEG-2:
-
- Dimensions      Coded rate   Comments
- -------------   ----------   -------------------------------------------
- 352x480x24 Hz   2 Mbit/sec   Half horizontal 601. Looks almost NTSC
- (progressive)                broadcast quality, and is a good (better)
-                              substitute for VHS. Intended for film src.
-
- 544x480x30 Hz   4 Mbit/sec   PAL broadcast quality (nearly full capture
- (interlaced)                 of 5.4 MHz luminance carrier). Also
-                              4:3 image dimensions windowed within 720
-                              sample/line 16:9 aspect ratio via pan&scan.
-
- 704x480x30 Hz   6 Mbit/sec   Full CCIR 601 sampling dimensions.
- (interlaced)
-
- [these numbers subject to change at whim of MPEG Test subgroup]
-
-
-
- 62. Why does film perform so well with MPEG ?
- A. Several reasons, really:
-
- 1) The frame rate is 24 Hz (instead of 30 Hz), which is a savings of
- some 20%.
- 2) The film source video is inherently progressive. Hence no fussy
- interlaced spectral frequencies.
- 3) The pre-digital source was severely oversampled (compare 352 x 240
- SIF to 35 millimeter film at, say, 3000 x 2000 samples). This can
- result in a very high quality signal, whereas most video cameras do
- not oversample, especially in the vertical direction.
- 4) Finally, the spatial and temporal modulation transfer function (MTF)
- characteristics (motion blur, etc.) of film are more amenable to
- the transform and quantization methods of MPEG.
-
- 63. What is the best compression ratio for MPEG ?
-
- A. The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel
- inter. Experimentation has shown that intra frame coding with the
- familiar DCT-Quantization-Huffman hybrid algorithm achieves optimal
- performance at about an average of 1.2 bits/sample or about 6:1
- compression ratio. Below this point, artifacts become noticeable.
-
- 64. Can MPEG be used to code still frames?
-
- A. Yes. There are, of course, advantages and disadvantages to using
- MPEG over JPEG:
-
- Disadvantages:
-
- 1. MPEG has only one color space
- 2. MPEG-1 and MPEG-2 Main Profile luma and chroma share quantization
- and VLC tables
- 3. MPEG-1 is syntactically limited to 4k x 4k images, and 16k x 16k for
- MPEG-2.
-
- Advantages:
-
- 1. MPEG possesses adaptive quantization
-
- 2. With its limited still image syntax, MPEG averts any temptation to use
- unnecessary, expensive, and academic encoding methods that have little
- impact on the overall picture quality (you know who you are).
-
- Philips' CD-I spec. has a requirement for a MPEG still frame mode, with
- double SIF image resolution. This is technically feasible mostly thanks to
- the fact that only one picture buffer is needed to decode a still image
- instead of three buffers.
-
- 65. Is there an MPEG file format?
-
- A. Not exactly. The necessary signal elements that indicate image size,
- picture rate, aspect ratio, etc. are already contained within the sequence
- layer of the MPEG video stream. The White Book format for Karaoke and CD-I
- movies specifies a range of (time-division) multiplexing strategies for
- audio and video bitstreams. A directory format listing scenes and their
- locations on the disc is associated with the White Book specification.
-
- 66. What are some pre-processing enhancements ?
-
- Adaptive de-interlacing:
-
- This method maps interlaced video from a higher sampling rate (e.g. 720 x 480)
- into a lower rate, progressive format (352 x 240). The most basic algorithm
- measures the correlation between two immediate macroblock fields, and if the
- correlation is high enough, uses an average of both fields to form a frame
- macroblock. Otherwise, a field area from one field (usually of the same
- parity) is selected. More clever algorithms are much more complex than this,
- and may involve median filtering, and multirate/multidimensional tools.
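-
- A minimal sketch of the "most basic algorithm" above, for one macroblock
- area. Several things here are my own assumptions: mean absolute
- difference stands in for the correlation measure, the threshold value is
- arbitrary, the top field is the one kept when the fields disagree, and
- the horizontal decimation from 720 to 352 samples is left out entirely.

```python
def mean_abs_diff(top, bottom):
    """Average absolute difference between co-located field samples."""
    total = sum(abs(a - b)
                for row_t, row_b in zip(top, bottom)
                for a, b in zip(row_t, row_b))
    count = sum(len(row) for row in top)
    return total / count


def deinterlace_block(top, bottom, threshold=8.0):
    """Merge two field blocks (lists of rows) into one progressive block.

    Similar fields (little motion) are averaged with rounding; otherwise
    only the top field is kept to avoid blending two moments in time.
    """
    if mean_abs_diff(top, bottom) <= threshold:
        return [[(a + b + 1) // 2 for a, b in zip(row_t, row_b)]
                for row_t, row_b in zip(top, bottom)]
    return [row[:] for row in top]


still = deinterlace_block([[100, 102], [104, 106]],
                          [[102, 102], [104, 108]])
moving = deinterlace_block([[100, 102], [104, 106]],
                           [[0, 0], [0, 0]])
print(still)
print(moving)
```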
-
- Pre-anti-aliasing and Pre-blockiness reduction:
- A common method in still image coding is to pre-smooth the image before
- encoding. For example, if pre-analysis of a frame indicates that serious
- artifacts will arise if the picture were to be coded in the current condition
- (i.e. below the sweet spot), a pre-anti-aliasing filter can be applied. This
- can be as simple as having a smoothing severity proportional to the image
- activity. The pre-filter can be global (same smoothing factor for whole
- image or sequence) or locally adaptive. More complex methods will again use
- multirate/multidimensional methods.
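-
- To make "smoothing severity proportional to the image activity" a bit
- more concrete, here is a one-dimensional sketch. The activity measure
- (mean absolute deviation), the [1 2 1]/4 low-pass kernel, and the gain
- constant are all illustrative choices of mine, not anything MPEG
- specifies.

```python
def activity(row):
    """Mean absolute deviation of the samples from their average."""
    mean = sum(row) / len(row)
    return sum(abs(x - mean) for x in row) / len(row)


def presmooth_row(row, gain=0.01):
    """Blend each sample with a [1 2 1]/4 low-pass of its neighbours.

    The blend weight grows with activity and is capped at 1.0, so flat
    rows pass through untouched and busy rows are smoothed the most.
    """
    alpha = min(1.0, gain * activity(row))
    out = []
    for i, x in enumerate(row):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, len(row) - 1)]
        low = (left + 2 * x + right) / 4
        out.append(round((1 - alpha) * x + alpha * low))
    return out


print(presmooth_row([50, 50, 50, 50]))
print(presmooth_row([0, 100, 0, 100], gain=1.0))
```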
-
- One straightforward concept from multidimensional/multirate pre-processing is
- to apply source video whose resolution (sampling density) is greater than
- the target source and reconstruction sample rates. This follows the basic
- principles of oversampling, as found in A/D converters.
-
- These filters emphasize the fact that most information content is contained
- in the lower harmonics of a picture anyway. VHS is hardly considered to be a
- sharp cut-off medium, tragically implying that the "320 x 480 potential" of
- VHS is never truly realized.
-
- 67. Why use these "advanced" pre-filtering techniques?
-
- A. Think of the DCT and quantizer as an A/D converter. Think of the DCT/Q
- pre-filter as the required anti-alias prefilter found before every A/D. The
- big difference of course is that the DCT quantizer assigns a varying number
- of bits per transform coefficient. Judging by the normalized activity
- measured in the pre-analysis stage of video encoding (assuming you even have
- a pre-analysis stage), and the target buffer size status, you have a fairly
- good idea of how many bits can be spared for the target macroblock, for
- example.
-
- Other pre-filtering techniques mostly take into account: texture patterns,
- masking, edges, and motion activity. Many additional advanced techniques can
- be applied at different immediate layers of video encoding (picture, slice,
- macroblock, block, etc.).
-
-
- 68. What about post-processing enhancements?
-
- Some research has been carried out in this area. Non-linear interpolation
- methods have been published by Wu and Gersho (e.g. ICASSP 93), convex hull
- projections for MAP (Severinson, ICASSP 93), and others. Post-processing
- unfortunately defies the spirit of MPEG conformance. Decoders should produce
- similar reconstructions. Enhancements should ideally be done during the
- pre-processing and encoding stages.
-
- 69. Can motion vectors be used to measure object velocity?
-
- A. Motion vector information cannot be reliably used as a means of
- determining object velocity unless the encoder model specifically set
- out to do so. First, encoder models that optimize picture quality generate
- vectors that typically minimize prediction error and, consequently,
- the vectors often do not represent true object translation. Standards
- converters that resample one frame rate to another (as in NTSC to PAL)
- use different methods (motion vector field estimation, edge detection,
- et al.) that are not concerned with optimizing ratios such as SNR vs
- bitrate. Secondly, motion vectors are not transmitted for all macroblocks
- anyway.
-
- 70. How do you code interlaced video with MPEG-1 syntax?
- A. Two methods can be applied to interlaced video that maintain
- syntactic compatibility with MPEG-1 (which was originally designed for
- progressive frames only). In the field concatenation method, the
- encoder model can carefully construct predictions and prediction errors
- that realize good compression but maintain field integrity (distinction
- between adjacent fields of opposite parity). Some pre-processing
- techniques can also be applied to the interlaced source video that
- would, e.g., lessen sharp vertical frequencies.
-
- This technique is not efficient of course. On the other hand, if the
- original source was progressive (e.g. film), then it is more trivial to
- convert the interlaced source to a progressive format before encoding.
- (MPEG-2 would then only offer superior performance through greater DC
- block precision, non-linear mquant, intra VLC, etc.) Reconstructed
- frames are re-interlaced in the decoder Display process.
-
- The second syntactically compatible method codes fields as separate pictures.
- This approach has been acknowledged not to work as well.
-
- 71. Is MPEG patented?
- A. Yes and no. Many encoding methods are patented. Approximately 11
- blocking patents (that is, patents that are general enough to be
- unavoidable in any implementation) have been recently identified.
-
- A patent pool is being formed within MPEG where a single royalty fee would be
- split among the 31 patent-holding companies.
-
- 72. How many cable box alliances are there?
-
- A. Many. To start with:
-
- Scientific Atlanta (SA), Kaleida, and Motorola:
- SA will build the box, Motorola the chips, and Kaleida the
- O/S and user interface (using ScriptX of course).
-
- Silicon Graphics (SGI), Scientific Atlanta, and Toshiba
- For the Time Warner's Orlando trial, SGI will provide the
- RISC (MIPS R4000) and software, SA will do the box again,
- and Toshiba will provide the chips.
-
- General Instruments (GI) and Microsoft:
- GI will make the box and Intel will supply the special low-cost
- 386SL processor on which a 1MB flash EPROM executable core
- of Microsoft windows and DOS will run. Microsoft will develop the
- user interface.
-
- Hewlett Packard (HP):
- HP will manufacture and/or design low cost, open architecture set-top
- decoder boxes (not a part of the Eon wireless deal). The CPU will
- explicitly not use an 80x86 based processor.
-
-
- CLI and Philips:
- Compression Labs will provide the encoder technology and Philips
- will provide the decoder technology for an ADSL system whose
- transport structure will be put together by Broadband Technologies.
-
- ["These alliances subject to change at the whim of PR departments
- and market forces."]
-
- 73. Will there be an MPEG video tape format?
-
- A. Not exactly. A consortium of international companies is
- co-developing a consumer digital video 6 millimeter wide, metal particle
- tape format. Due to the initial high cost of MPEG encoders, a JPEG-like
- compression method will be used for inexpensive encoding of typical
- consumer source video (broadcast PAL, NTSC). The natural consequence of
- still image methods is less efficient use of bandwidth: 25 Mbit/sec for
- the same subjective real-time playback quality achieved at 6 Mbit/sec
- possible with MPEG-2. A second bit rate mode, 50 Mbit/sec, is
- designated for HDTV.
-
- Pre-coded digital video from, e.g., broadcast sources will be directly
- recorded to tape and "passed-through" as a coded bitstream to the video
- decompression box upon tape playback. Assuming linear tape speed is
- to be proportional to bit rate, the recording time of a pre-compressed
- MPEG-2 program at the upper limit of 5 Mbit/sec for broadcast quality
- video would be over 20 hours. Channel coding
- schemes (error correction, convolution coding, etc.), however, will
- most likely be optimized for the tape medium and therefore may differ
- from the channel methods for cable, terrestrial, and satellite. (A
- Zenith-Goldstar S-VHS based experiment did, however, directly record the
- 4-VSB broadcast baseband signal of the old Zenith/AT&T HDTV proposal).
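-
- The "over 20 hours" figure is simple scaling. Taking the 4h30min
- standard cassette time at 25 Mbit/sec from the spec summary below, and
- assuming (as the text does) that tape speed scales linearly with bit
- rate and that channel coding overhead stays the same:

```python
def recording_hours(native_hours, native_mbps, stream_mbps):
    """Playing time when tape consumption is proportional to bit rate."""
    return native_hours * native_mbps / stream_mbps


print(recording_hours(4.5, 25, 5))    # standard cassette, 5 Mbit/sec MPEG-2
print(recording_hours(1.0, 25, 5))    # small cassette, 5 Mbit/sec MPEG-2
```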
-
- More specs: (Summarized from EE Times July 5, 1993 article)
-
- tape width: 6.35 mm
- Audio: two channel 48 kHz 16-bit audio, or 4 channel at 32 kHz at 12-bit
- Tape format: metal evaporated tape, 13.5 microns thick
-
-            Cassette dimensions (mm)    Recording times
- Size       Width   Height   Depth      525/625 (25 Mb/s)   HDTV (50 Mb/s)
- --------   -----   ------   -----      -----------------   --------------
- Standard   125     78       14.6       4h30min             2h15min
- Small      66      48       12.2       1 hour              30min
-
- Linear tape speeds: 18.812 mm/s (60Hz), 18.831 mm/s (50 Hz)
- Video compression: DCT based
-
- Participants: Matsushita, Sony, Philips, Thomson, Hitachi, Mitsubishi,
- Sanyo, Sharp, Toshiba, JVC.
-
- MPEG in everyday life
-
- 74. Where will we see MPEG in everyday life?
- A. Just about wherever you see video today.
-
- DBS (Direct Broadcast Satellite)
- The Hughes/USSB DBS service will use MPEG-2 video and audio. Thomson
- has exclusive rights to manufacture the decoding boxes for the first 18
- months of operation. Hughes/USSB DBS will begin its U.S. service in
- April 1994. Two satellites at 101 degrees West will share the power
- requirements of 120 Watts per 27 MHz transponder over a total of 32
- transponders. Multi source channel rate control methods will be
- employed to optimally allocate bits between several programs normalized
- to one 22 Mbit/sec data carrier. Bit allocation adapts to instantaneous
- co-channel spatial and co-channel temporal activity. An average of 150
- channels are planned with the addition of a second set of satellites
- augmenting the power level of each transponder to 240 Watts. The coded
- throughput of each transponder will increase to 30 Mbit/sec.
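-
- The actual multi-source rate control used by the service is not
- described here, so the following is only a toy illustration of the idea:
- each program on a carrier gets a guaranteed floor rate, and the rest of
- the 22 Mbit/sec is shared in proportion to some per-program activity
- measure. The floor value and the activity numbers are invented for the
- example.

```python
def allocate_bits(activities, carrier_bps=22_000_000, floor_bps=1_000_000):
    """Split one carrier among programs in proportion to their activity."""
    if not activities:
        return []
    spare = carrier_bps - floor_bps * len(activities)
    if spare < 0:
        raise ValueError("too many programs for the guaranteed floor rate")
    total = sum(activities)
    if total == 0:
        # No measurable activity anywhere: share the carrier equally.
        return [carrier_bps / len(activities)] * len(activities)
    return [floor_bps + spare * a / total for a in activities]


rates = allocate_bits([1.0, 1.0, 2.0])
print([round(r) for r in rates])
```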
-
-
- CATV (Cable Television)
- Despite conflicting options, the cable industry has more or less
- settled on MPEG-2 video. Audio is less than settled. For example,
- General Instruments (the largest U.S. consumer cable set-top box
- manufacturer) has announced the planned exclusive use of Dolby AC-3.
- The General Instruments DigiCipher I video syntax is similar to MPEG-2
- syntax, but employs smaller macroblock predictions and no B-frames. The
- DigiCipher II specification will include modes to support both the GI
- and full MPEG-2 Video Main Profile syntax. DigiCipher I services such
- as HBO will upgrade to DigiCipher II in 1994.
-
- HDTV
- The U.S. Grand Alliance, a consortium of companies that formerly competed
- to win the U.S. terrestrial HDTV standard, has already agreed to
- use the MPEG-2 Video and Systems syntax, including B-pictures. Both
- interlaced (1920 x 1080 x 30 Hz) and progressive (1280 x 720 x 60 Hz)
- modes will be supported. The Alliance has also settled upon a modulation
- method (VSB), convolution coding (Viterbi), and error correction
- (Reed-Solomon) specification.
-
- In September 1993, a consortium of 85 European companies signed an
- agreement to fund a project known as Digital Video Broadcasting (DVB) which
- will develop a standard for cable and terrestrial transmission by the
- end of 1994. The scheme will use MPEG-2. This consortium has put the
- final nail in the coffin of the D-MAC scheme for gradual migration
- towards an all-digital, HDTV consumer transmission standard. The only
- remaining analog or digital-analog hybrid system left in the world is
- NHK's MUSE (which will probably be axed in a few years as soon as it appears
- to be the politically secure thing to do).
-
- 75. What is the best compression ratio for MPEG ?
- A. The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel
- inter. Experimentation has shown that intra frame coding with the
- familiar DCT-Quantization-Entropy hybrid algorithm achieves optimal
- performance at about an average of 1.2 bits/sample or about 6:1
- compression ratio. Below this point, artifacts become noticeable.
-
-
- 76. Is there a MPEG CD-ROM format?
- A. Yes, a consortium of international companies (Matsushita, Philips,
- Sony, JVC, et al) have agreed upon a specification for MPEG video and
- audio. Two-hour-long movies are stored on two 650 MByte compact discs. The
- video rate is 1.15 Mbit/sec; the audio rate is either 128 kbit/sec or
- 192 kbit/sec, Layer I or Layer II (this seems to contradict the Philips
- 224 kbit/s audio spec?). Although the Video, Systems, and Audio syntax are
- identical, the CD-I movie format and the White Book format are not
- compatible.
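-
- The two-disc figure is easy to verify. This rough sketch takes the
- 192 kbit/sec audio option, counts 1 MByte as 10^6 bytes, and ignores
- system layer multiplexing and disc sector overhead, so treat it as an
- estimate only:

```python
def movie_bytes(seconds, video_bps=1_150_000, audio_bps=192_000):
    """Total payload in bytes for a movie of the given length."""
    return seconds * (video_bps + audio_bps) // 8


DISC_BYTES = 650_000_000

size = movie_bytes(2 * 3600)
discs = -(-size // DISC_BYTES)  # ceiling division

print(size, "bytes ->", discs, "discs")
```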
-
- Researchers are busy experimenting with denser and faster rate CD
- formats, perhaps using green or blue laser wavelengths. One demonstration
- stretched the pit and track density to its limits, improving areal density by
- almost 2 fold.
-
-
-