A shorter version of this article appears in the May/June 2008 issue of the SMPTE journal.
Chroma subsampling is a widely used technique to reduce bandwidth in many video systems. Since the human visual system is not very sensitive to color, color resolution can be reduced to lower bandwidth. Video systems do this via chroma subsampling. Unfortunately, chroma subsampling is not visually lossless in all situations. This article will examine sources of chroma subsampling artifacts. It will also show what chroma subsampling would look like if these problems were solved. It introduces a (new?) technique for extracting higher quality from existing chroma subsampled signals by minimizing out of gamut colors.
*Examples throughout this article are for 4X horizontal subsampling, which corresponds to the 4:1:1 chroma subsampling scheme used in NTSC DV. The chroma subsampling artifacts presented in this article generalize to other schemes, including 4:2:2 signals used in studio video (2X horizontal subsampling). Please refer to original standards documents for information on how chroma subsampling should be done (the examples in this article may not correspond to the NTSC DV standard; I have not read the original standard and have read conflicting second-hand reports on it).
The video signal is divided into luma and chroma, where luma approximates the ‘black and white’ portion of a signal while chroma approximates color. Luma (Y’) is formed by the formula:
Y’ = rR’ + gG’ + bB’
where the lowercase letters represent the luma coefficients. Note that the luma coefficients are different between ITU-R Rec. BT.601, 709, and SMPTE 240M. For Rec. 709 video, the formula is as follows:
Rec. 709 Luma (Y’) = 0.2126 R’ + 0.7152 G’ + 0.0722 B’
Color information is carried via color difference components B’ – Y’ and R’ – Y’. Their formulae are self-evident:
B’ – Y’ = B’ – Y’
R’ – Y’ = R’ – Y’
These color difference components may have scale factors and offsets applied to them so that they can be stored or carried over video interfaces. These scale factors and offsets are reversed upon decoding.
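The conversion described above can be sketched in a few lines of Python. This is a minimal illustration only: it uses the Rec. 709 luma coefficients quoted earlier, works on values in a 0-1 range, and applies no scale factors or offsets. The function names `rgb_to_ydiff` and `ydiff_to_rgb` are made up for this sketch.

```python
# Rec. 709 luma coefficients, as given in the formula above.
KR, KG, KB = 0.2126, 0.7152, 0.0722


def rgb_to_ydiff(r, g, b):
    """Convert gamma-corrected R'G'B' (0-1) to (Y', B'-Y', R'-Y')."""
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y


def ydiff_to_rgb(y, by, ry):
    """Recover R'G'B' from luma and the two color difference components."""
    r = y + ry
    b = y + by
    # G' is whatever is left of luma once the red and blue parts are removed.
    g = (y - KR * r - KB * b) / KG
    return r, g, b


if __name__ == "__main__":
    y, by, ry = rgb_to_ydiff(1.0, 0.0, 0.0)  # fully saturated red
    print("Y', B'-Y', R'-Y':", y, by, ry)
    print("round trip:", ydiff_to_rgb(y, by, ry))
```

Note that G’ needs no color difference component of its own, since it can be recovered from Y’, R’ and B’.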
With the signal converted into luma and chroma, the chroma portion can be subsampled (i.e. reduced in resolution) to reduce bandwidth.
In Figure 1, luma is visualized by blanking the color difference components with neutral values. Similarly, the color difference components can be visualized by blanking the luma values. For the majority of real world images, chroma subsampling is visually lossless.
|Original||Luma (Y’) channel only||Color difference components||Subsampled image|
One area where chroma subsampling is not visually lossless is where it creates colors that cannot be reproduced by the display device (i.e. outside its gamut). Suppose the image consisted of red lines on a black background.
Figure 2. Calculation of resulting R’G’B’ values.
*Rec. 709 luma coefficients are used, with values scaled for a 0-1 range.
If there are alternating red and black lines, all the reconstructed values will have the same chroma value (assuming typical chroma resampling). The problem is that chroma is reconstructed onto black pixels (pixels where Y’ is at black level), as seen in Figure 2. Logically, we know that black pixels should emit no red light, no green light, and no blue light. However, the reconstructed ‘black’ pixels have positive red, negative green, and negative blue values. Real world monitors cannot emit negative light!
The negative values are effectively clipped to zero by the monitor. So, the resulting ‘black’ pixel is a reddish one that emits red light (and no green or blue light). This is clearly erroneous! A side effect of clipping is that the resulting red pixel has an effective luma value that is greater than zero. It is brighter than it should be. Similarly, the same problem occurs for white pixels. Chroma reconstructed onto white pixels can cause the red/green/blue channels to go too high and clip or distort.
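The red-and-black case can be reproduced numerically. The sketch below assumes the simplest possible resampling (chroma averaged over each group of four pixels and reused unchanged for every pixel in the group), which may not match any particular DV implementation. The helper names are hypothetical.

```python
KR, KG, KB = 0.2126, 0.7152, 0.0722  # Rec. 709 luma coefficients


def to_ydiff(r, g, b):
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y


def to_rgb(y, by, ry):
    r, b = y + ry, y + by
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def subsample_roundtrip(pixels, factor=4):
    """Average chroma over each group of `factor` pixels; keep luma per pixel."""
    out = []
    for start in range(0, len(pixels), factor):
        group = [to_ydiff(*p) for p in pixels[start:start + factor]]
        by = sum(c[1] for c in group) / len(group)
        ry = sum(c[2] for c in group) / len(group)
        out.extend(to_rgb(c[0], by, ry) for c in group)
    return out


def clip(rgb):
    """What a display effectively does with out-of-range values."""
    return tuple(min(1.0, max(0.0, v)) for v in rgb)


RED, BLACK = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
result = subsample_roundtrip([RED, BLACK, RED, BLACK])

black_px = result[1]            # roughly (0.39, -0.11, -0.11)
shown = clip(black_px)          # negative green and blue are clipped to zero
print("reconstructed 'black':", black_px)
print("as displayed:", shown, "luma:", to_ydiff(*shown)[0])
```

The displayed 'black' pixel ends up with a luma above zero, which is the brightening described above.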
The second problem with chroma subsampling is that it does not maintain constant luminance. The luma values used in chroma subsampling are an engineering shortcut used to approximate luminance, calculated with the following shortcut formula.
Rec. 709 Luma (Y’) = 0.2126 R’ + 0.7152 G’ + 0.0722 B’
To calculate luminance instead of luma, linear light processing is necessary. It is desirable as it can ensure that the number of photons of light emitted by the monitor stays roughly the same. To do this, the video signal can be converted into a linear light signal by removing its gamma correction. The calculations are then performed on the linear light signal and then gamma correction is added back in.
Gamma correction can be removed by applying the inverse of the Rec. 709 transfer function according to the following formula, where L is the linear-light value and E’ is the (non-linear) gamma-corrected component:
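As commonly quoted from Rec. 709, the inverse transfer function takes the following form. The constants here are reproduced from memory of the published standard and should be checked against it before being relied upon.

```latex
L =
\begin{cases}
  \dfrac{E'}{4.5}, & 0 \le E' < 0.081 \\[2ex]
  \left( \dfrac{E' + 0.099}{1.099} \right)^{1/0.45}, & 0.081 \le E' \le 1
\end{cases}
```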
Then calculate luminance
Rec. 709 Luminance (Y) = 0.2126 R + 0.7152 G + 0.0722 B
Gamma correction can then be added back in (the formula is the inverse of the previous formula).
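A minimal Python sketch of these three steps follows. It assumes the commonly quoted Rec. 709 transfer function constants (4.5, 0.018/0.081, 1.099, 0.099, 0.45) and 0-1 component values. The function names are illustrative and not taken from any library.

```python
KR, KG, KB = 0.2126, 0.7152, 0.0722  # Rec. 709 coefficients


def to_linear(e):
    """Remove Rec. 709 gamma correction from a 0-1 component E'."""
    return e / 4.5 if e < 0.081 else ((e + 0.099) / 1.099) ** (1 / 0.45)


def to_gamma(l):
    """Re-apply Rec. 709 gamma correction to a 0-1 linear-light value L."""
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099


def luma(r, g, b):
    """Shortcut: weighted sum of the gamma-corrected components."""
    return KR * r + KG * g + KB * b


def luminance(r, g, b):
    """Weighted sum of the linear-light components."""
    return KR * to_linear(r) + KG * to_linear(g) + KB * to_linear(b)


if __name__ == "__main__":
    for name, rgb in (("grey", (0.5, 0.5, 0.5)), ("red", (1.0, 0.0, 0.0))):
        print(name, "luma:", luma(*rgb),
              "gamma-corrected luminance:", to_gamma(luminance(*rgb)))
```

For neutral greys the two quantities agree. For saturated colors such as pure red they diverge noticeably, which is where luma stops being a good approximation of luminance.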
In typical gamma-corrected processing, errors in chroma will ‘bleed’ into the luminance channel. Not enough chroma will cause a drop in luminance, causing dark bands to appear (see Figure 3). Similarly, too much chroma will cause a rise in luminance. This effect is proportional to chroma strength; it is worst where there are fully saturated colors. In practical situations, real world footage tends not to contain highly saturated colors so these errors usually do not appear.
Linear light processing solves this problem of chroma errors bleeding into the luminance channels, getting rid of the dark bands. For this to happen, linear light processing has to be used both in (1) forming luminance and (2) in re-sampling/re-scaling the chroma.
Figure 3. Comparison of typical processing (with gamma-corrected values) versus linear light processing.
|Original||Typical processing||Linear light processing|
Better results can be achieved by using the luma information to aid in reconstructing/upsampling chroma information. We can distribute the chroma in a way that minimizes out-of-gamut colors. I call this in-range chroma reconstruction.
Picture chroma as a liquid being poured into glasses of different heights. Let the height of the glass represent the most chroma a particular pixel can hold. If too much chroma is in the glass, it will overflow and an out-of-gamut value will result. In typical chroma subsampling, the same amount of chroma is poured into each glass. If the glasses are of different heights (e.g. black pixels essentially have no height) then overflow can occur. One algorithm for avoiding this problem is to distribute/pour the chroma in proportion to the height of each glass. I refer to this algorithm as the “proportion” method. A second possible algorithm is to collect any spilled chroma and to re-pour it into the remaining unfilled glasses. I refer to this as the “spill” method.
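The pouring analogy can be sketched in Python. This is a deliberately simplified illustration and not the downloadable source code: chroma is treated as a single non-negative quantity per group of pixels, `heights` is the per-pixel capacity, and the function names and the `max_passes` parameter are made up for the sketch.

```python
def pour_typical(total, heights):
    """Same amount of chroma in every glass, ignoring capacity."""
    return [total / len(heights)] * len(heights)


def pour_proportion(total, heights):
    """Distribute chroma in proportion to each glass's height (one pass)."""
    capacity = sum(heights)
    if capacity == 0:
        return [0.0] * len(heights)
    return [total * h / capacity for h in heights]


def pour_spill(total, heights, max_passes=8):
    """Pour equally, then re-pour whatever overflows into unfilled glasses."""
    levels = [0.0] * len(heights)
    remaining = total
    for _ in range(max_passes):
        open_idx = [i for i, h in enumerate(heights) if levels[i] < h]
        if remaining <= 1e-12 or not open_idx:
            break  # nothing left to pour, or every glass is already full
        share = remaining / len(open_idx)
        remaining = 0.0
        for i in open_idx:
            poured = min(share, heights[i] - levels[i])
            levels[i] += poured
            remaining += share - poured  # collect the overflow
    return levels


if __name__ == "__main__":
    heights = [0.0, 0.8, 0.0, 0.8]   # black pixels can hold no chroma
    total = 1.6                      # chroma to distribute over the group
    print("typical:   ", pour_typical(total, heights))
    print("proportion:", pour_proportion(total, heights))
    print("spill:     ", pour_spill(total, heights))
```

In this example the typical method puts chroma on the two black pixels, while both alternative methods place all of it on the pixels that can hold it.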
Figure 4. Diagram of chroma distribution methods
|Typical||“Proportion” method||“Spill” method (first pass, second pass)|
To determine the maximum chroma a pixel can hold, visualize the R’G’B’ gamut plotted in Y’, B’-Y’, and R’-Y’ coordinates per Figure 5. Each B’-Y’ R’-Y’ pair corresponds to a particular “color”/hue and lies along a triangular slice within the R’G’B’ gamut. This triangle has corners at white, black, and some fully saturated/pure R’G’B’ color as shown in Figure 5. The height h in the figure represents the maximum chroma possible for a given Y’ value. This height h also corresponds to the height of the glasses in the chroma pouring analogy.
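One way to compute such a height is to ask how far a given chroma direction can be scaled before any of R’, G’ or B’ leaves the legal range. The sketch below does this for Rec. 709 coefficients. The function name `max_chroma_scale` and the `lo`/`hi` range parameters are assumptions made for illustration, and the result is expressed as a scale factor on the supplied (B’-Y’, R’-Y’) pair.

```python
KR, KG, KB = 0.2126, 0.7152, 0.0722  # Rec. 709 luma coefficients


def max_chroma_scale(y, by, ry, lo=0.0, hi=1.0):
    """Largest s >= 0 for which (Y', s*(B'-Y'), s*(R'-Y')) decodes to
    R'G'B' values inside [lo, hi]. Returns infinity for neutral chroma."""
    gy = -(KR * ry + KB * by) / KG       # implied G'-Y' for this hue
    s_max = float("inf")
    for diff in (ry, gy, by):            # one constraint per channel
        if diff > 0:
            s_max = min(s_max, (hi - y) / diff)   # channel hits the ceiling
        elif diff < 0:
            s_max = min(s_max, (lo - y) / diff)   # channel hits the floor
    return max(s_max, 0.0)


if __name__ == "__main__":
    red_by, red_ry = -0.2126, 0.7874     # chroma of fully saturated red
    for y in (0.0, 0.1063, 0.2126, 0.6, 1.0):
        print(y, max_chroma_scale(y, red_by, red_ry))
```

For the red hue the scale is zero at black, rises linearly to 1 at the luma of pure red (0.2126), and falls back to zero at white, which traces out the triangular slice described above.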
Results of in-range chroma reconstruction can be seen in Figure 6.
Figure 6. Comparison between typical chroma subsampling and in-range chroma reconstruction.
For red text on a white (or black) background, both the proportion and spill methods can achieve excellent results. For the darker red text on a grey-ish background, it is possible to see the differences between the two algorithms. The proportion method can exhibit some erroneous ‘hotspots’ of concentrated chroma, one of which can be seen near the center of the large A in the text in the dim test pattern. The spill method is not prone to such artifacts. However, it is slower since it requires a few passes re-pouring the spilled chroma instead of the single pass of the proportion method.
An underlying assumption behind in-range chroma reconstruction is that the image lies entirely within the R’G’B’ gamut and does not contain out of gamut colors. This can be a bad assumption for signals in a production environment. For analog material dubbed to digital, analog black level may be incorrect. For digitally originated material, many cameras will record information above white level in the “superwhite” region. As well, all sources have noise that can push legal R’G’B’ signals out of range.
If we simply apply the spill method, anomalous chroma can occur on highlight areas (not shown). Out-of-range colors can be accommodated by changing how the heights of the glasses are determined. Recall that the original function was derived from a triangle-shaped slice of the R’G’B’ cube. A conservative method is to move the corners of this triangle-shaped function to cover the out-of-range values. Unfortunately, doing so weakens the performance of in-range chroma reconstruction.
The top two rows of Figure 7 show a noise-free image subsampled via the different methods. Conservative in-range reconstruction does weaken the effectiveness of the technique. The blue text on a black background is clearly blurry and no longer improved compared to typical reconstruction.
The bottom two rows show why this may be desirable. Straightforward in-range reconstruction has a speckling effect due to the noise (see Figure 8 for an enlargement).
Figure 8. Enlargement showing speckling effect
These problems do not occur if it is not possible to define luma values outside legal range (i.e. if there is no allowance for superwhites or superblacks).
One last issue with chroma subsampling is that there is a mishmash of different resampling schemes in use. Which scheme is used makes a difference in visual quality. When downsampling, any scheme has trade-offs between:
Every re-sampling scheme suffers from at least one of these problems or some combination of all three. The three problems can be visualized as corners on a triangle, where improving/moving along one dimension will make either or both of the other problems worse. It is impossible to solve for all three problems at once.
However, these three forms of image impairment do not tell the whole story. Image processing in the human brain also plays a role in what looks best. Some subjective evaluation is necessary.
Figure 9 shows a test pattern run through different re-sampling schemes. Each scheme actually consists of two (possibly different) schemes, one for downsampling and one for upsampling. Four common pairings are shown in Figure 9.
Figure 9. Resampling schemes compared
|Multi-tap FIR||Nearest neighbour|
|Downsampling method:||Multi-tap FIR||Nearest neighbour / point sampling|
|Upsampling method:||Tent/triangle (*sin x/x correction may be appropriate, but was not applied)
In my opinion, the worst looking schemes by far are the nearest neighbour and box resampling schemes. The nearest neighbour scheme exhibits high amounts of aliasing and is also vulnerable to a form of aliasing I call gap aliasing. Image detail that falls in the gaps between the sampled points is discarded and completely ignored. Gap aliasing can be seen in alternating red and black lines in the test pattern. The chroma for some sets of lines completely disappears! On top of the aliasing artifacts, the nearest neighbour and box resampling schemes suffer a boxy appearance from box upsampling.
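Gap aliasing is easy to demonstrate on a one-dimensional row of chroma values. The sketch below uses illustrative function names and 4X subsampling. It compares point sampling with box averaging on alternating colored and black lines, where the colored lines happen to fall between the sampled positions.

```python
def downsample_point(samples, factor=4):
    """Nearest neighbour / point sampling: keep every `factor`-th sample."""
    return samples[::factor]


def downsample_box(samples, factor=4):
    """Box filter: average each complete group of `factor` samples."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


# Chroma of alternating black (0.0) and colored (1.0) lines. Every sampled
# position (0, 4, 8, 12) lands on a black line.
chroma = [0.0, 1.0] * 8

print(downsample_point(chroma))  # [0.0, 0.0, 0.0, 0.0]: the color vanishes
print(downsample_box(chroma))    # [0.5, 0.5, 0.5, 0.5]: color kept, diluted
```

Shifting the pattern by one pixel would make point sampling return full chroma everywhere instead, so the result depends entirely on where the detail happens to fall.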
For good quality chroma subsampling, the tent/triangle or multi-tap FIR schemes should be used. Between these two schemes, the multi-tap FIR scheme is sharper and exhibits less aliasing at the expense of ringing artifacts. Rec. 601 filtering requirements and Rec. 709 filtering guidelines establish standards for filter performance. A multi-tap FIR filter is necessary to meet those standards. This type of filter can have much better performance over multiple generations than tent/triangle resampling. However, such filters are rarely implemented (especially in commodity desktop-based systems) since they are computationally expensive.
In practice, box resampling is very commonly used for 4:1:1 DV despite its poor visual performance. Worse yet, using different resampling schemes for downsampling and upsampling means that different methods may be inappropriately used. This is a problem if box resampling is mixed with the tent/triangle (or multi-tap FIR) scheme, as shown in Figure 10.
In box resampling, the chroma center lies between luma pixels. I refer to this as interstitial siting. In the other schemes, the chroma center lies on top of a luma pixel. I refer to this as co-siting. The center of co-sited chroma lies to the left of interstitial chroma (by 1.5 pixels for 4:1:1, 0.5 pixels for 4:2:2). While standards for various video formats (e.g. 4:2:2 SDI, DV and its variants, MPEG-2, etc.) specify chroma siting, these standards are not always followed. Mixing the schemes can result in the chroma being shifted in relation to the luma as shown in Figure 10. If chroma is downsampled using point sampling (i.e. nearest neighbour scheme) and upsampled with the tent/triangle scheme, the chroma center will not be shifted but high amounts of aliasing will result. Alternatively, using the nearest neighbour scheme for both up- and downsampling inherently results in chroma shifting (see Figure 9).
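The 1.5 and 0.5 pixel figures follow from where the weight of each downsampling kernel is centered. The sketch below computes that center for box and tent kernels. The kernel weights shown are assumed textbook shapes, not taken from any standard, and `centroid` is a hypothetical helper.

```python
def centroid(weights, first_position):
    """Weighted center of a filter kernel, in luma pixel positions.
    `first_position` is the position of the first tap relative to the
    first luma pixel of the chroma group."""
    total = sum(weights)
    return sum(w * (first_position + i) for i, w in enumerate(weights)) / total


# 4:1:1 (4X): a box covers pixels 0..3, a tent is centered on pixel 0.
box_4x = centroid([1, 1, 1, 1], 0)
tent_4x = centroid([1, 2, 3, 4, 3, 2, 1], -3)

# 4:2:2 (2X): a box covers pixels 0..1, a tent is centered on pixel 0.
box_2x = centroid([1, 1], 0)
tent_2x = centroid([1, 2, 1], -1)

print("4:1:1 shift:", box_4x - tent_4x)  # 1.5 pixels
print("4:2:2 shift:", box_2x - tent_2x)  # 0.5 pixels
```

If one kernel is used for downsampling and a differently centered one for upsampling, this difference shows up as a horizontal chroma shift.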
|Downsampling||Tent/triangle||Point / nearest
In non-linear editing, there is a minor advantage to using box and nearest neighbour resampling since they effectively pass the chroma straight through. Unlike the tent/triangle and multi-tap FIR schemes, there is no generation loss. This generation loss can be an issue with cross dissolves. Suppose the tent/triangle scheme were used. If there is a cross-dissolve between two clips, most NLEs will only recompress the cross dissolved section. The cross dissolve will encounter generation loss, while the material around it will not. At the start of the cross dissolve, there can be a noticeable jump between 1st generation material and 2nd generation material.
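The pass-through property can be checked with a small sketch. It assumes simple textbook kernels: replication and averaging for box, linear interpolation and a triangle-weighted average with clamped edges for tent. Real codecs and NLEs may use different filters, and the function names are illustrative.

```python
def box_down(x, f=4):
    return [sum(x[i:i + f]) / f for i in range(0, len(x), f)]


def box_up(c, f=4):
    return [v for v in c for _ in range(f)]


def tent_up(c, f=4):
    """Linear interpolation between co-sited samples; the last value is held."""
    out = []
    for k, v in enumerate(c):
        nxt = c[k + 1] if k + 1 < len(c) else v
        out.extend(v + (nxt - v) * j / f for j in range(f))
    return out


def tent_down(x, f=4):
    """Triangle-weighted average centered on every f-th pixel (edges clamped)."""
    out = []
    for center in range(0, len(x), f):
        acc = 0.0
        for off in range(-(f - 1), f):
            pos = min(max(center + off, 0), len(x) - 1)
            acc += (f - abs(off)) * x[pos]
        out.append(acc / (f * f))   # the weights sum to f squared
    return out


first_gen = [0.0, 1.0, 0.0, 1.0]            # already-subsampled chroma
box_second_gen = box_down(box_up(first_gen))
tent_second_gen = tent_down(tent_up(first_gen))

print("box: ", box_second_gen)    # identical to first_gen
print("tent:", tent_second_gen)   # detail is softened on every generation
```

Under these assumptions box resampling returns the first-generation chroma exactly, while the tent round trip changes it, which is the jump that becomes visible at the edges of a re-rendered dissolve.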
This problem could be solved by recompressing (and re-applying chroma subsampling) on all the material involved in the dissolve. This would mean that adding a 1-second cross-dissolve to an hour-long clip requires that the entire clip be recompressed! Theoretically, this is not a problem if the NLE were able to recompress the footage and output real-time without needing to render. However, not all NLEs are capable of this or are designed for this. In practice, most desktop-based NLEs use box or nearest neighbour resampling for chroma subsampled formats (e.g. 4:2:2 SDI, DV, MPEG-2).
In my opinion, this approach is a 'greedy' approach that can backfire. Passing chroma through only works for video that has already been subsampled. It does not work on titles, CG elements, many filters and image processing tasks, still images (or still image sequences), or when up/downsampling material. These situations will result in the inappropriate mixing of resampling schemes as shown in Figure 10.
The ideal (though not necessarily practical) solution to this dilemma is to simply avoid it. Performing acquisition and post in a non-subsampled format (e.g. 4:4:4 R’G’B’) avoids the generation loss issues.
For ideal quality, linear light processing and in-range chroma reconstruction should be used. Determining what the ideal resampling scheme is should be done subjectively. Figure 11 shows chroma subsampling done with linear light processing, the proportion method for chroma reconstruction, no illegal luma values allowed, and different resampling schemes.
Figure 11. Comparison of resampling schemes in ideal chroma/color subsampling (images enlarged 2X)
|Box resampling.||Tent/triangle resampling.||Multi-tap FIR resampling.|
(images normal size)
|Box resampling.||Tent/triangle resampling.||Multi-tap FIR resampling.|
In my opinion, the tent/triangle scheme looks the best. The box scheme has a somewhat boxy appearance to it while the multi-tap FIR scheme has objectionable ringing artifacts.
Unfortunately, 4X horizontal subsampling is too much to be visually lossless even when done ideally. In all instances the red text appears noticeably blurry against the grey background. Nonetheless, the examples do show that chroma/color subsampling is capable of higher quality.
Linear light processing of chroma, while not compatible with existing systems, may (or may not) be useful in future compression schemes for delivering content. I do not know whether the minor improvement in quality is worth the added complexity.
In-range chroma reconstruction is potentially useful when converting 4:2:2 material to 4:4:4 R’G’B’ (e.g. many image processing tasks require this) and when upconverting subsampled SD signals to HD.
In postproduction, chroma quality can be improved by avoiding inappropriate mixing of chroma siting and resampling schemes.
In practice, chroma subsampling artifacts for 4:2:2 and progressive 4:2:0 formats are rarely noticed even where the subsampling is poorly implemented (e.g. with nearest neighbour or box resampling). In particular, 4:2:2 is commonly referred to (and sometimes marketed) as “visually lossless”, even though it is not actually visually lossless in all circumstances (e.g. red text on a black background). But while chroma subsampling is not entirely visually lossless, it seems to be good enough that many people do not notice the difference.
On the other hand, 4:1:1 and interlaced 4:2:0 formats can be problematic as they effectively subsample the chroma by 4X in one direction (interlaced 4:2:0 effectively subsamples 4X vertically since each interlaced field is subsampled individually). As Figure 11 shows, 4X subsampling is too much even if current chroma subsampling problems are fixed. In current practice, end viewers do notice the artifacts. See the discussion of the “interlaced chroma problem” in Don Munsil and Stacey Spears’ article “The Chroma Upsampling Error and The 4:2:0 Interlaced Chroma Problem.” http://www.hometheaterhifi.com/volume_8_2/dvd-benchmark-special-report-chroma-bug-4-2001.html
In production, saturated colors in titles can be objectionable when working with 4:1:1 DV. Moving away from interlacing removes the need for the 4:1:1 and interlaced 4:2:0 formats. This allows the more sensible progressive 4:2:0 formats to be used and allows for higher quality.
 See Recommendation ITU-R BT.709-5. “Parameter values for the HDTV* standards for production and international programme exchange.” This document can be downloaded by taking advantage of ITU’s 3 free downloads promotion.
 Per SMPTE EG 28, luma and luminance have different meanings. See Poynton, Charles. “YUV and luminance considered harmful: A plea for precise terminology in video.” http://poynton.com/PDFs/YUV_and_luminance_harmful.pdf
 The source code is available for download. Please feel free to contact me if anything in the source code is unclear.
 For the multi-tap FIR filter, I have tried to keep the filter characteristics in the spirit of ITU-R Rec. 601. I have used the template guidelines with the passband frequency divided by 2 (since Rec. 601 defines filter performance for 2X subsampling and not 4X).
 For an overview of various standards in regards to chroma siting, see Poynton, Charles. “Merging computing with studio video: Converting between R’G’B’ and 4:2:2.” http://www.poynton.com/PDFs/Merging_RGB_and_422.pdf
 For comparisons of different common production codecs (including 4:2:2 codecs), see Marco Solorio's codecs.onerivermedia.com