The past two years have seen significant advancements in video and image compression, particularly with the maturation of the SVT-AV1 video encoder and improvements to AVIF image compression. These developments, coupled with faster and more accessible developer tools, have made it easier to produce high-quality compressed media. I have a lot of optimism for the future with AV2 and the potential for further community-driven innovation in open-source compression technology.
When you work on anything, I think it is worth considering why you are working on it, beyond whether it provides instant gratification. Thinking about what the core purpose of video and image compression might be, it is easy to land on a simple conclusion: smaller images and (especially) videos make the internet cheaper. Video makes up the vast majority of internet traffic, so small gains add up. However, this prompts a further question: what might the end game be for compression? For the average person, what would compression technology that is indistinguishable from magic actually deliver?
I believe this final goal is beautiful multimedia accessible anywhere. In other words, an artist's vision is not served half-baked to any sub-section of their audience. Compression doesn't just make the internet cheaper, it also makes it more beautiful.
The most revolutionary compression technology has been able to unlock new experiences for people creating and experiencing art. If you look at the history of video, this is illustrated by certain major leaps made over time. Every breakthrough represents a widening of the pipeline between artist and audience. Let's look at the timeline.
– Video begins as something you experience at a theater; people travel to see certain videos at specific times, and then maybe never again. Video doesn't look that great, and for a while, it doesn't have sound or color.
– Analog broadcast television lets people see certain videos at specific times anywhere that has the right equipment. At some point, video gets color.
– VCRs enable time-shifted viewing; you can see certain videos at almost any time.
– Digital optical discs improve upon the last step by providing better looking videos.
– The world of digital internet video emerges through peer-to-peer downloads. Now you can see pretty much any video anywhere after waiting for a download. Regardless of your internet connection, a smaller file means a faster download.
– Streaming platforms make on-demand web video real; now you can watch pretty much any video anywhere with a good internet connection almost instantly.
– Smartphones proliferate; smarter compression and streaming technology means we can do what we did in the last step on most internet connections, allowing people to access virtually any video almost anywhere on Earth.
– Today, beautiful video is no longer exclusive to Blu-ray; high-end streaming services provide access to high-fidelity video anywhere with decent internet. Additionally, live video platforms let anyone show lots of people a window into their life from almost anywhere.
Through the combined efforts of better internet and better compression, we could see more breakthroughs in the future. Emerging immersive formats (360° video, VR/AR) and cloud gaming are opportunities for more innovation, and could unlock compelling art with enough accessibility.
For now, it is worth acknowledging that we've come a really long way.
The past two years have been incredibly consequential for video. I would argue this is the strongest the ecosystem has been in over a decade from a technological standpoint. The best open-source video encoder in the world is beginning to show signs of maturity, web-first image compression had its first major breakthrough since JPEG, and life has become a lot easier for developers. Let's walk through each of these stories.
SVT-AV1 is the best open-source video encoder in the world by many metrics. Despite this, it is still not the most mature (that title goes to x264). For SVT-AV1 to fully supersede x264, it needs to always be better; it cannot just be mostly better. Metrics say we are already there, but subjective testing tells a different story.
One year ago (August 2024), the predominant narrative around AV1 was that it was "blurry". Time and energy were invested in swaying this perception, but it ultimately still held. To address this, I had started the SVT-AV1-PSY project earlier that year with the stated goal of building and testing research-grade features for subjective quality based on user feedback.
In August 2025, this perception has mostly been shattered. A number of the aforementioned research-grade features have been upgraded to production-grade, with more progress happening every day. Additionally, the number of features has ballooned; SVT-AV1-PSY saw a number of additional releases, and as the project was put to rest, a few new forks emerged to perpetuate the effort.
Utilities for using video encoders have improved dramatically as well; the popular chunked encoding script called Av1an saw a development resurgence to reinvigorate some older features, and other tools with similar functionality arrived on the scene to provide more options to users.
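To make the idea of chunked encoding concrete, here is a minimal sketch of the planning step: splitting a video's frame range into independent chunks at scene cuts so that each chunk can be handed to a separate encoder process. This is not Av1an's actual algorithm; the function name `plan_chunks`, the `max_len` parameter, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def plan_chunks(
    total_frames: int,
    scene_cuts: List[int],
    max_len: int = 240,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges covering the whole video.

    Hypothetical helper: chunks begin at scene cuts, and any scene longer
    than max_len frames is split further so no single chunk dominates
    the total encode time.
    """
    # Boundaries are frame 0, every valid scene cut, and the end of the video.
    bounds = sorted(
        {0, total_frames, *(c for c in scene_cuts if 0 < c < total_frames)}
    )
    chunks: List[Tuple[int, int]] = []
    for start, end in zip(bounds, bounds[1:]):
        pos = start
        while pos < end:
            nxt = min(pos + max_len, end)
            chunks.append((pos, nxt))
            pos = nxt
    return chunks


if __name__ == "__main__":
    # Illustrative values only: a 1000-frame clip with two detected cuts.
    for start, end in plan_chunks(1000, [100, 400]):
        print(f"encode frames [{start}, {end})")
```

Each range could then be encoded in parallel and the results concatenated in order. Cutting at scene changes is a natural choice because encoders typically start a new keyframe there anyway.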
The implications of these dramatic improvements are numerous. For enthusiasts, it is now much harder to accidentally produce bad-looking AV1 video. SVT-AV1 is arguably more mature than x265, leaving x264 as the final contender. Considering it takes the better part of a decade to develop a robust video encoder, the fact that so much is in motion right now is exciting news.
In August 2024, the image compression story was looking unfortunate compared to video. WebP failed to deliver exceptional compression improvements over modern JPEG encoders, and AVIF was looking promising but underwhelming. It suffered from the same "blurriness" as AV1, but it was far more severe in a world where modern image encoders tend to be well-optimized for human perception. The best standard by far was JPEG XL, which had been removed from Chrome, effectively killing any chance of ubiquitous use of the format on the web.
While all of this held true, in August Julio Barba and I were working on image-focused enhancements for SVT-AV1-PSY. We announced our results publicly, and while they were promising, the implementation was limited to our community-supported video encoder.
In 2025, Julio has worked diligently with Google to bring our work to libaom, the reference implementation of AV1. Due to its more complete feature set, it already has a number of image-specific performance considerations, so it was a perfect fit. The new improvements are wrapped into Tune IQ in libaom, and websites like The Guardian are already benefiting from its vastly improved consistency and compression gains. Before Tune IQ, AVIF would occasionally lose to JPEG; this is no longer true.
In August 2024, everyone's favorite metric was SSIMULACRA2. It is incredibly accurate to human visual perception, and helped guide certain decisions made during SVT-AV1-PSY development. It had a couple of issues, though.
– ssimulacra2_rs is not fast
Enter Vship, an SSIMULACRA2 and Butteraugli implementation that uses the GPU. Better developer tools enable new development paradigms, and Vship's 10–100x speed improvement opened the doors for encoder testing frameworks like PSY-EX `metrics` that allow for streamlined encoder benchmarking automation. My blog post on comparing video encoders would have been a lot longer and more complex in 2024, but now testing an encoder's convex hull is as simple as running a single Bash script to call PSY-EX metrics tooling powered by Vship.
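For readers unfamiliar with the term, an encoder's convex hull here refers to the best achievable quality at each bitrate across a set of test encodes. The sketch below shows one way to compute it from a list of (bitrate, score) measurements. It is not the PSY-EX implementation; the function names and the sample numbers are hypothetical, and the scores merely stand in for a metric such as SSIMULACRA2.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, quality_score)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_quality_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull of rate-quality measurements.

    Step 1 drops dominated encodes (more bits without more quality).
    Step 2 keeps only points on the upper (concave) envelope of the rest.
    """
    # Sort by bitrate ascending; at equal bitrate, best quality first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best = float("-inf")
    for p in ordered:
        if p[1] > best:
            front.append(p)
            best = p[1]

    hull: List[Point] = []
    for p in front:
        # Pop the previous point while it lies on or below the line
        # from its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    # Illustrative measurements only, not real benchmark data.
    samples = [(500, 60), (1000, 70), (1200, 65),
               (1500, 72), (2000, 80), (3000, 82)]
    for bitrate, score in rate_quality_hull(samples):
        print(f"{bitrate} kbps -> score {score}")
```

Comparing two encoders then reduces to comparing their hulls: whichever hull sits higher at a given bitrate delivers better quality for the same number of bits.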
I have a lot of hope that AV2 will rally intense community efforts in open source like AV1 did. There is a lot of performance left on the table with a standard as complex as AV2, and I hope in the next decade it will be properly realized.
For now, I think we should celebrate what has happened so far.
AV1 currently features:
In the past year, we've seen:
I'm excited for what the next year of development has to offer.