VP9 is a bit of a paradox: it offers compression well above today’s industry standard for internet video streaming (H.264 – usually created using the opensource encoder x264), and playback is widely supported by today’s generation of mobile devices (Android) and browsers (Chrome, Edge, Opera, Firefox). Yet many companies and people are wary of using VP9. I’veblogged about the benefits of VP9 (using Google’s encoder, libvpx) before, and I keep hearing some common responses: libvpx is slow, libvpx is blurry, libvpx is optimized for PSNR, libvpx doesn’t look visually better compared to x264 encodes (or more extreme: x264 looks much better!), libvpx doesn’t adhere to target rates. Really, most of what I hear is not so much VP9, but more about libvpx. But this is a significant issue, because libvpx is the only VP9 software encoder available.
To fix this, we wrote an entirely new VP9 encoder, called Eve (“Efficient Video Encoder”). For those too lazy to read the whole post: this VP9 encoder offers 5-10% better compression rates (for broadcast-quality source files) compared to libvpx, while being 10-20% faster at the same time. Compared to x264, it offers 15-20% better compression rates, but is ~5x slower. Its target rate adherence is far superior to libvpx and comparable to x264. Most importantly, these improvements aren’t just in metrics: the resulting files look visually much better than those generated (at the same bitrate) by libvpx and x264. Don’t believe it? Read on!
As source material for these tests, I used the “4k” test clips from Xiph . These are broadcast-quality source files at 4k resolution (YUV 4:2:0, 4096×2160, 10 bit/component, 60fps). For these tests, since I have limited resources, I downsampled them to 360p (640×360, 8 bit/component, 30 fps) or 720p (1280×720, 8 bit/component, 30 fps) before encoding them.
I did two types of tests: 1-pass CRF (where you set a quality target) and 2-pass VBR (where you set an average bitrate target). For both tests, I measured objective quality (PSNR), effective bitrate and encoding time. For 2-pass VBR, I also measured target bitrate adherence (i.e. difference between actual and target file size). Lastly, I looked at visual quality.
I encoded the 360p test set using recommended 1-pass CRF settings for each encoder. First, let’s look at the PSNR metrics. The table shows bitrate improvement between Eve and libvpx/x264, i.e. “how many percent less (or more) bits does Eve need to accomplish the same PSNR value”. For example, a bitrate improvement of 10% for one clip means that it needs, on average (BD-RATE) over the bitrate spectrum in the graph for that clip, 10% less bits (e.g. 9 bits for Eve instead of 10 bits for the other encoder) to accomplish the same quality (PSNR). The average across all clips in this test set is -12.6% versus libvpx, which means that Eve needs, on average, 12.6% less bits than libvpx to accomplish the same quality (PSNR). Compared to x264, Eve needs 14.1% less bits to accomplish the same quality.
Some people object to using PSNR as a quality metric, so I measured the same files using SSIM as a metric. The results are not fundamentally different: Eve is 8.9% better than libvpx, and 22.5% better than x264. x264 looks a little worse in these tests than in the PSNR tests, and that’s primarily because x264 does significant metric-specific optimizations, which don’t (yet) exist in libvpx or Eve. However, more importantly, this shows that Eve’s quality improvement is independent of the specific metric used.
Lastly, I looked at encoding time. Average encoding time for each encoder depends somewhat on the target quality point. For most bitrate targets, Eve is quite a bit faster than libvpx. Overall, for an average (across all CRF values and test sequences) encoding time of about 1.28 sec/frame, Eve is 0.30 sec/frame faster than libvpx (1.58 sec/frame). At 0.25 sec/frame, x264 is ~5x faster, which is not surprising, since H.264 is a far simpler codec, and x264 a much more mature encoder.
|CRF; 360p||PSNR, Eve vs.||SSIM, Eve vs.||Encoding time (sec/frame)|
I encoded the same 360p sequences again, but instead of specifying a target CRF value, I specified a target bitrate using otherwise recommended settings for each encoder (Eve, vpxenc, x264), and used target bitrate adherence as an additional metric. Again, let’s first look at the objective quality metrics: the table shows results that are not fundamentally different from the CRF results: Eve requires 7.7% less bitrate than libvpx to accomplish the same quality in PSNR. Results for SSIM are not much different: Eve requires 6.6% less bitrate than libvpx to accomplish the same quality. Compared to x264, Eve requires 15.9% (PSNR) or 24.5% (SSIM) less bits to accomplish the same quality.
For an average encoding time of around 1.26 sec/frame, Eve is approximately 0.31 sec/frame faster than libvpx (1.57 sec/frame), which is similar to the CRF results. At 0.20 sec/frame, x264 is again several times faster than either Eve or libvpx, for the same reasons as explained in the CRF section.
|VBR; 360p||PSNR, Eve vs.||SSIM, Eve vs.||Encoding time (sec/frame)|
In terms of target bitrate adherence, Eve and x264 adhere to the target rate much more closely than libvpx does. Expressed as average absolute rate drift, where rate drift is target / actual – 1.0, Eve misses the target rate on average by 2.66%. x264 is almost as good, missing the target rate by 3.83% at default settings. Libvpx is several times farther off, with an average absolute rate drift of 9.48%, which confirms libvpx’ rate adherence concerns I’ve heard from others. Each encoder has options to curtail the rate drift, but enabling this option costs quality. If I curtail libvpx’ rate drift to the same range as x264/Eve (commandline options:
--undershoot-pct=2 --overshoot-pct=2 ; table below: RRD), it loses another 3.6% in quality, at which point Eve requires 11.3% less bitrate to accomplish the same quality, with a rate drift of 3.33% for libvpx.
|VBR; 360p||PSNR, Eve vs.||Absolute rate drift|
|libvpx||libvpx (RRD)||x264||Eve||libvpx||libvpx (RRD)||x264|
Most people in the US watch video at resolutions much higher than 360p nowadays, so I repeated the VBR tests at 720p to ensure consistency of the results at higher resolutions. Compared to libvpx, Eve needs 5.5% less bits to accomplish the same quality. Compared to x264, Eve needs 20.4% less bits. At 5.09 sec/frame versus 5.52 sec/frame, Eve is 0.43 sec/frame faster than libvpx, with the strongest gains at the low-to-middle bitrate spectrum. At 0.76 sec/frame, x264 is several times faster than either. In terms of bitrate adherence, Eve misses the target rate by 1.82% on average, and x264 by 1.65%. libvpx, at 8.88%, is several times worse. To curtail libvpx’ rate drift to the same range as Eve/x264 (using
--undershoot-pct=2 --overshoot-pct=2 ), libvpx loses another 2.9%, becoming 8.4% worse than Eve at an average absolute rate drift of 1.50%. Overall, these results are mostly consistent with the 360p results.
|VBR; 720p||PSNR, Eve vs.||Encoding time (sec/frame)||Absolute rate drift (%)|
|libvpx||libvpx (RRD)||x264||Eve||libvpx||x264||Eve||libvpx||libvpx (RRD)||x264|
The most-frequent concern I’ve heard about libvpx concerns visual quality. It usually goes like this: “the metrics for libvpx are better, but x264 _looks_ better!” (Or, at the very least, “libvpx does not look better!”) So, let’s try to look at some of these (equal bitrate/filesize) videos and decide whether we can see actual visual differences. When doing visual comparisons, it should be obvious why effective rate targeting is important, because visually comparing two files of significantly different size is quite meaningless.
For this comparison, I picked three files: one where Eve is far ahead of libvpx (BarScene), one where the two perform relatively equally (BoxingPractice), and one which represents roughly the median across the files in this test set (SquareAndTimelapse). In each case, the difference between Eve and x264 is close to the median. For target rate, I picked values around 200-1000kbps, with visual optimizations (i.e. no
--tune=psnr ). Overall, this gives reasonable visual quality and is typical for internet video streaming at this resolution, but at the same time allows easy distinction of visual artifacts between encoders. For higher resolution, you’d use higher bitrates, but the types visual artifacts would not change substantially.
First, BarScene: I encoded the file at 200kbps and picked frame 217 of each encoded file. The coded frame size is 889 bytes (Eve), 1020 bytes (libvpx) and 862 bytes (x264),with total file size of 505kB (Eve), 500kB (libvpx) and 507kB (x264). Full-sized images are clickable. In the close-ups, we see various artifacts:
- Bartender’s face: x264 makes the man’s nose and forehead look like a zombie, because of high-frequency noise at sharp edges. Libvpx has the opposite artifact: it is blurry, which is the most-often heard complaint about this encoder.
- Bartender’s shirt and girl’s sweater: libvpx blurs out most texture in the clothing. x264, on the other hand, has high-frequency noise around the buttons on the bartender’s shirt. Both x264 as well as libvpx manage to make the lemon in the glass disappear.
- Patron’s faces: libvpx is again blurry. x264 is also more blurry than it typically is.
- Bar area: x264 hides the finger of the left-hand (top/right, holding the menu), and adds a dark scar (instead of a faint shadow) to the thumb on the right hand. libvpx changes the color of the drink from orange to yellow, makes straws disappear, and is – surprise! – blurry.
Second, let’s look at SquareAndTimelapse. I encoded the file at 1 mbps and selected frame 101 of each encoded file. The coded frame sizes are 2651 bytes (Eve), 2401 bytes (libvpx) and 3721 bytes (x264), with total file size of 1.27 MB (Eve), 1.30 MB (libvpx) and 1.24 MB (x264). Full-sized images are clickable. In the close-ups, we can again compare visual artifacts:
- Man in black coat and woman in pink sweater: x264 turned the woman’s face green’ish. On the other hand, it maintains most texture in the black coat. Eve maintains almost as much detail in the coat, but libvpx blurs it quite significantly. Libvpx also bleeds the red color from the man’s t-shirt into the hair of the woman in front of him (mid/bottom).
- Man in blue t-shirt and woman in white shirt: libvpx blurs the bottom of the man’s t-shirt, particularly the red portion, which is barely visible anymore. x264, on the other hand, blurs away the woman’s face quite significantly (e.g. her mouth disappears). x264 also again suffers from coloring artifacts in the top/left girl’s neck (which turns gray) and the woman in the bottom/right (whose face turns blue). Also with x264, we again see significant high-frequency artifacts in what used to be a shoulderbag in the person to the top/right.
- Red backpack: libvpx combines two recurring artifacts here – blur and color bleed – at the bottom/right edge of the backpack, where the red backpack bleeds into neighbouring objects. x264 does the opposite, and replaces the red color in the bottom/right corner of the backpack with a green patch that seems to come out of nowhere.
Lastly, let’s look at BoxingPractice. I encoded the file at 1 mbps and selected frame 86 of each encoded file. The coded frame sizes are 3060 (Eve), 2785 (libvpx) and 2171 bytes (x264), with total file size of 509 kB (Eve), 481 kB (libvpx) and 513 kB (x264). Full-sized images are clickable. In the close-ups, we can again compare visual artifacts:
- Man with red gloves: in x264, we see the boxing glove color bleeding through into the man’s face. The high frequency noise is also abundantly present, particularly around his left hand’s boxing glove. And although all three encoders suffer significantly from blurring artifacts, libvpx is still by far the worst.
- Man with blue gloves: the x264 file shows more high-frequency noise artifacts on the right shoulder area, and a bright red patch coming out of nothing on the left. And libvpx is this time much more blurry than either of the other two encodes, and also loses the red spot on the base of the glove. The man’s facial color is not well maintained by any of the encoders, unfortunately.
- Foreground boxer: x264 has more high-frequency noise artifact just under the man’s nose. Libvpx, on the other hand, is once again blurry, and loses significantly more color in the man’s face.
Overall, we start seeing a pattern in these artifacts: at comparable file sizes and frame sizes, compared to Eve, libvpx is blurry, and x264 suffers from high-frequency noise artifacts at sharp edges and has issues with skin textures. Both x264/libvpx also have significantly more color-artifacts compared to Eve: x264 tends to lose color and libvpx often bleeds colors. Eve – although obviously not perfect – looks visually much more pleasing, at the same frame size and file size.
Eve is a world-class VP9 encoder that fixes some of the key issues people have complained about with libvpx. Here, I tested the encoder at 360p and 720p using broadcast-style settings, where one encoded file is streamed many, many times, and therefore slow encoding times (1 sec/frame) are acceptable. At these tested CRF/VBR settings, Eve:
- provides better quality metrics than libvpx (5-10% bitrate reduction) and x264 (~15-20% bitrate reduction)
- provides better visual results than libvpx/x264
- is faster than libvpx (10-20%), but slower than x264 (~5x)
- has better target rate adherence than libvpx, and has comparable target rate adherence with x264. To get libvpx at the same target rate adherence, it loses another ~2-3% in quality metrics compared to Eve.
At Two Orioles, we are working to further improve Eve’s quality and speed every day, and lots of work can still be done (e.g.: faster encoding modes). At the same time, we would love to help you use VP9 for internet video streaming. Do you stream lots of video, and are you interested in trying out VP9 or improving your VP9 pipeline using Eve? Contact us , or see our website for more information.