QOI Encoding in Zig

Mar 13, 2024

Back Home

Note: the code for this project on GitHub has changed, but all of my commentary here remains valid.

Why?

I started writing C as my first programming language in October. While I have other projects labelled under the programming tag on this site, a friend was the lead developer on both, and I was just becoming familiar with Python; I never truly learned the concepts until later, as I hadn't needed to until I took over development for those projects. I had dabbled with Bash scripts, but other than that, I have had very limited programming exposure until recently, though I've had extensive exposure to technology. Becoming familiar with C quickly led me to discover Zig in November. Around the same time, I stumbled upon the QOI (Quite OK Image Format) lossless image format built around speed and simplicity. The simplicity part was what caught my eye; I'm a new programmer, and writing a QOI encoder would simultaneously allow me to get down and dirty with manipulating data on a low level without getting into the weeds of a more complicated spec like that of PNG.

Here's the project link, before we get too carried away: GitHub Link

As soon as I got started with Zig, I already felt it was cool; Rust felt (and continues to feel) intimidating as I was becoming acquainted with C. "Higher-level" languages don't interest me very much.

Full disclosure: I'm not going to go through everything here, and I may get some terminology details wrong. Please reach out to correct me if you notice anything.

Zig

Zig is a programming language designed to be the modern replacement for C.

I won't go over every advantage Zig brings to the table over other languages, but according to Zig's official site, Zig has some unique advantages over languages like C++, D, and Rust:

No hidden control flow or allocations: Zig code is explicit about function calls and memory allocation, so readability is improved and the programmer has more control.
First-class support for no standard library: Zig can be used without any standard library, making it friendly for bare-metal and high-performance development.
A portable language for libraries: Zig aims to be a new portable language for libraries, interacting well with C code while introducing safety and better language design.
A package manager and build system: Zig comes with a capable build system and package manager, useful even for existing C/C++ projects.
Simplicity: Zig avoids unclear features like macros and operator overloading found in other languages.
Tooling: Zig provides a comprehensive toolchain with cross-compilation (which is very cool), C/C++ support, resource compiler, and more, packaged in a single archive.

Zig positions itself as a simpler, more explicit, more portable language than C++, D, and Rust while offering more powerful tooling, which is all fine and dandy, but I'm not super familiar with Rust, D, or C++. Compared to C, Zig's additional safety features like safety-checked undefined behavior and the defer keyword to prevent issues one could normally have with free() in C are what make Zig special and interesting to me. Its focus on simplicity and safety makes it a compelling choice for systems programming tasks, including working with multimedia codecs.

QOI Overview

QOI is very cool because it is extremely simple while still being relatively efficient for a lossless image codec.

A QOI file consists of a 14-byte header, followed by any number of data “chunks” and an 8-byte end marker. (spec)

QOI supports two colorspaces: Linear RGB & sRGB with linear alpha. These do not affect the way pixels are encoded. Pixels (well, "chunks" according to the spec, but I find it more intuitive to call them pixels as that is how they are interpreted and represented on a screen) have either three or four channels; they are either RGB or RGBA. The QOI philosophy is that everything is byte-aligned, making it trivial to stream bytes to a decoder one at a time and have the decoder produce a usable output.

QOI's lossless compression takes place via six main pixel types:

SPEC_NAME ("Name I Made Up"): Description from spec. I'm only making up a name so I can reference it elsewhere more easily.

QOI_OP_RGB ("RGB Pixels"): Full RGB pixel value using 8 bits for each of the red, green, and blue channels. The alpha channel is 255 in RGB images, and remains unchanged from the previous pixel.
QOI_OP_RGBA ("RGBA Pixels"): Full RGBA pixel value using 8 bits for each of the red, green, blue, & alpha channels.
QOI_OP_DIFF ("Diff Pixels"): The difference between the current pixel and the previous pixel for the red, green, and blue channels are stored using 2 bits for each channel. The differences are stored with a bias of 2 and wrap (so 1 minus 2 would be 255). The alpha channel remains unchanged.
QOI_OP_LUMA ("Luma Pixels"): These pixels encode the green channel difference from the previous pixel using 6 bits, and then encode the red and blue channel differences relative to the green channel difference using 4 bits each. This allows for more efficient encoding of small color changes. The alpha channel remains unchanged.
QOI_OP_RUN ("Run Pixels"): These are the simplest, encoding a run-length of pixels that are identical to the previous pixel. The run length is stored using 6 bits with a bias of -1, allowing for runs of 1 to 62 pixels.
QOI_OP_INDEX ("Hashed Pixels"): These are stored by referencing a previously seen pixel value from a rolling array of 64 recent pixel values by using a simple hash on each pixel as it is identified. If another pixel matches a previously seen hash value in the array, the index of the referenced pixel is stored.

That's pretty much it! The entire spec is one page if you'd like a bit more detail. Amazingly, QOI trades blows with libpng's PNG encoding while being much faster on average with the reference encoder given the test corpus the author used in that benchmark.

Explaining the Codebase

Now that I've done the requisite amount of shilling, it is time to get into the real content. I want this to be a thorough walk-through of my decisions writing this encoding implementation, so you can take inspiration from my work on your own if you are also new to Zig. Again, the source code for my project can be found at this git repository.

The program starts by importing the necessary standard library modules and defining the necessary data structures. The QoiEnum enum, for instance, defines the different opcodes used in the QOI format, such as QOI_OP_RGB and QOI_OP_RGBA:

const QoiEnum = enum(u8) {

    QOI_OP_RGB = 0xFE,
    QOI_OP_RGBA = 0xFF,
    QOI_OP_INDEX = 0x00,
    QOI_OP_DIFF = 0x40,
    QOI_OP_LUMA = 0x80,
    QOI_OP_RUN = 0xC0,

};

These are fundamental to the QOI format's compression techniques, which we'll explore in a bit.

Next, we have the QoiPixel union, which represents a single pixel's color values. By using a union, we can access the pixel's channels (red, green, blue, alpha) individually through the channels array or the vals struct. We can also access them as a single unsigned 32-bit integer through concatenated_pixel_values:

const QoiPixel = extern union {
    vals: extern struct {
        red: u8,
        green: u8,
        blue: u8,
        alpha: u8,
    },
    channels: [4]u8,
    concatenated_pixel_values: u32,
};

Important to note here is that the extern keyword defines the union (and the enclosed struct) as C ABI compatible data structures. In Zig's standard unions without the extern keyword, trying to access inactive fields of the union is safety-checked undefined behavior. Your code will compile successfully, but if you compiled in Safe mode (which you probably did by default with zig build) you'll get some runtime errors accompanied by a stack trace. I enjoy the fact that Zig allows you to use extern unions to essentially get different "viewpoints" on how to access the underlying data in memory while maintaining a different native paradigm; it is a testament to the language's ability to espouse flexibility while still providing a feeling of familiarity.

The QoiDesc struct holds essential information about the image being encoded, including its dimensions, color channels, and colorspace, in order to write the header for the encoded file. This struct provides methods for assigning our instance of QoiDesc & writing the QOI header to the output file:

const QoiDesc = struct {
    width: u32 = 0,
    height: u32 = 0,
    channels: u8 = 0,
    colorspace: u8 = 0,

    fn qoiSetEverything(w: u32, h: u32, ch: u8, c: u8) QoiDesc {
        return QoiDesc{ .width = w, .height = h, .channels = ch, .colorspace = c };
    }

    fn writeQoiHeader(self: QoiDesc, dest: *[14]u8) void {
        // ... (implementation omitted for brevity)
    }
};

Struct methods in Zig are not super different from normal functions. According to Zig's website:

Struct methods are not special, they are only namespaced functions that you can call with dot syntax. (1)

Using these within the scope of struct definitions is cool because it makes the codebase nice and neat. You can also call methods with dot syntax, which feels tidy:

// Declare variable `desc` providing width, height, channels, colorspace.
// We can call this method using dot syntax on the type:
    var desc = QoiDesc.qoiSetEverything(width, height, channels, colorspace);
// Now, we use dot syntax on the `desc` variable to call the
// `writeQoiHeader` method, where `desc` is passed to the method as `self`:
    desc.writeQoiHeader(qoi_file[0..14]);

You don't need to do foo->bar like in C, either; Zig will do that automatically if you just do foo.bar. I like this!

The QoiEnc struct encapsulates the encoding logic. It maintains a pixel buffer, a run counter, and various other state variables necessary for the encoding process. The qoiEncInit method initializes the encoder with the image description & an output buffer:

const QoiEnc = struct {
    buffer: [64]QoiPixel,
    prev_pixel: QoiPixel,

    pixel_offset: usize,
    len: usize,

    data: [*]u8,
    offset: [*]u8,

    run: u8,
    pad: u24,

    fn qoiEncInit(self: *QoiEnc, desc: QoiDesc, data: [*]u8) !void {
        // ... (implementation omitted for brevity)
    }

    // ... (other methods omitted for brevity, but there are a bunch)
};

Now, remember the types of pixels allowed in QOI that we enumerated earlier? The QoiEnc struct provides separate methods for encoding each of these, including qoiEncRun, qoiEncLuma, qoiEncIndex, qoiEncFullColor, and qoiEncDifference.

Let's take a closer look at the qoiEncRun method, which handles run-length encoding:

fn qoiEncRun(self: *QoiEnc) void {
    const tag: u8 = @intFromEnum(QoiEnum.QOI_OP_RUN) | (self.run - 1);
    self.run = 0;

    self.offset[0] = tag;
    self.offset += 1;
}

This is a "Run pixel" which is the simplest kind of pixel conceptually within QOI, in my opinion.

This method writes a tag byte to the output buffer, indicating a run of identical pixels. The tag byte consists of the QOI_OP_RUN opcode combined with the run length minus one via a logical or.

The qoiEncLuma method, on the other hand, handles encoding "Luma Pixels":

fn qoiEncLuma(self: *QoiEnc, green_diff: i8, dr_dg: i8, db_dg: i8) void {
    const green_diff_biased: u8 = @intCast(green_diff + 32);
    const dr_dg_biased: u8 = @intCast(dr_dg + 8);
    const db_dg_biased: u8 = @intCast(db_dg + 8);

    const tags = [2]u8{ @intFromEnum(QoiEnum.QOI_OP_LUMA) | green_diff_biased, dr_dg_biased << 4 | db_dg_biased };

    for (tags, 0..) |tag, i| self.offset[i] = tag;
    self.offset += tags.len;
}

This method calculates the differences between the red, green, and blue channels of the current pixel and the previous pixel. It then encodes these differences using a luma tag and bias values, as we already explained when we talked about QOI.

Encoding "Diff Pixels" with qoiEncDifference is a similar story:

    fn qoiEncDifference(enc: *QoiEnc, red_diff: i32, green_diff: i32, blue_diff: i32) void {
        const green_diff_biased: u8 = @intCast(green_diff + 2);
        const red_diff_biased: u8 = @intCast(red_diff + 2);
        const blue_diff_biased: u8 = @intCast(blue_diff + 2);

        const tag: u8 =
            @intFromEnum(QoiEnum.QOI_OP_DIFF) |
            red_diff_biased << 4 |
            green_diff_biased << 2 |
            blue_diff_biased;

        enc.offset[0] = tag;

        enc.offset += 1;
    }

The bit manipulations for properly writing to enc.offset felt a bit intimidating, and to be honest, they still do & probably always will. Thinking about the actual structure of the pixels being written to the file, it begins to make a bit more sense thinking about the fact that the pixels are stored in order of red's difference, then green's difference, then blue's difference, with a tag preceding them; the entire pixel is a single byte, the differences are just being ordered within that byte.

Now, let's look at encoding "RGB Pixels" & "RGBA Pixels" with qoiEncFullColor:

    fn qoiEncFullColor(enc: *QoiEnc, px: QoiPixel, channels: u8) void {
        var s: u3 = 0;
        if (channels > 3) s = 5 else s = 4;
        const tags: [5]u8 = if (channels > 3) .{
            @intFromEnum(QoiEnum.QOI_OP_RGBA),
            px.vals.red,
            px.vals.green,
            px.vals.blue,
            px.vals.alpha,
        } else .{
            @intFromEnum(QoiEnum.QOI_OP_RGB),
            px.vals.red,
            px.vals.green,
            px.vals.blue,
            undefined,
        };

        for (tags[0..s], 0..s) |tag, i| enc.offset[i] = tag;
        enc.offset += s;
    }

Here, we have some trademark "Zigginess". Let's walk through it:

The method takes a pointer to a QoiEnc, which is essentially "self"
A QoiPixel, and
the number of channels. Within the method:
We declare a variable s which is a mere 3 bits
We're making tags an array of 5 unsigned 8-bit integers, and assigning different values to the array depending on the number of channels.

Let's stop here for a moment. As you can see, these if expressions are being used more like ternary expressions would be in C. The assignment to tags is being done via an anonymous list literal. These confused me greatly when first learning Zig, but it makes a bit more sense looking at a more concise example:

// Initialize an array with array literal syntax. Compiler infers size
    const arr_a = [_]i6{6, 7, 8};
// Initialize an array with an anonymous list literal. Compiler cannot
// infer size, as it isn't aware of the anonymous list literal
    const arr_b: [3]i6 = .{6, 7, 8};

Now we can observe the for loop at the end of the function block. I like the way Zig handles for loops, essentially leaving traditional C-style i++ iteration for while loops. What we're essentially doing in this loop is slicing tags from index 0 to the index before s.

A slice is a pointer and a length. The difference between an array and a slice is that the array's length is part of the type and known at compile-time, whereas the slice's length is known at runtime. Both can be accessed with the len field. (2)

We're also keeping track of 0 through the integer just preceding s with 0..s. The values are captured with |tag, i|, where we use i to assign values with tag to elements in enc.offset. Finally, we add s to enc.offset, as s represents the number of bytes we need, and enc.offset is a u8 multi-pointer.

This felt pretty cool, to have for loops and if statements so gracefully doing my bidding based on little s.

How about we take a look at encoding "Hashed Pixels" next, now that we're getting the hang of things?

    fn qoiEncIndex(enc: *QoiEnc, index_pos: u8) void {
        const tag: u8 = @intFromEnum(QoiEnum.QOI_OP_INDEX) | index_pos;
        enc.offset[0] = tag;
        enc.offset += 1;
    }

This is pretty straightforward. We calculate the hash itself outside of the function with const index_pos: u6 = @truncate(cur_pixel.vals.red *% 3 +% cur_pixel.vals.green *% 5 +% cur_pixel.vals.blue *% 7 +% cur_pixel.vals.alpha *% 11);. The *% & +% just mean wrapping addition/multiplication, which we understand from the QOI pixel descriptions. We need our numbers to wrap around. The built-in @truncate helps too, as it ensures we don't overflow a u6; it truncates bits from an integer type resulting in a smaller integer type, so it is taking the low 6 bits of our hash result because there are only 64 entries in our index table. The enc.offset is incremented by 1, because the pixel is a single byte.

That's all the pixels!

Now, the qoiEncodeChunk function is where the magic happens. It iterates over the input pixel data, compares each pixel with the previous one, and chooses the optimal encoding operation based on the differences between the pixels. The encoded data is then written to the output buffer using the respective encoding methods:

fn qoiEncodeChunk(desc: *QoiDesc, enc: *QoiEnc, qoi_pixel_bytes: [*]u8) void {
    var cur_pixel: QoiPixel = undefined;

    // ... (omitted for brevity. this is just deciding "alpha or no alpha?"
    // and writing relevant data to the current pixel.)

    const index_pos: u6 = @truncate(cur_pixel.vals.red *% 3 +% cur_pixel.vals.green *% 5 +% cur_pixel.vals.blue *% 7 +% cur_pixel.vals.alpha *% 11);

    if (qoiComparePixel(cur_pixel, enc.prev_pixel, desc.channels)) {
        // Handle "run pixel" encoding
        // ...
    } else {
        if (enc.run > 0) {
            enc.qoiEncRun();
        }
        if (qoiComparePixel(enc.buffer[index_pos], cur_pixel, 4)) {
            // Handle "hash pixel" encoding
            // ...
        } else {
            enc.buffer[index_pos] = cur_pixel;

            // Handle various encoding operations based on pixel differences
            // ...
        }
    }

    enc.prev_pixel = cur_pixel;
    enc.pixel_offset += 1;

    // Handle padding at the end of the image once it is time to do so
    // ...
}

You can see the index_pos hash calculation in there, where it happens, and what triggers it. Hopefully, everything else is clear, too. Alongside the others, this function demonstrates Zig's ability to work with low-level data structures and perform bit-level operations.

Main Function

Now that we understand everything we can do to encode a QOI image, let's look at the main function.

This main function serves as a great example of Zig's clean and readable syntax as well as QOI's simplicity. It parses command-line arguments, handles file I/O, and orchestrates the encoding process while staying concise. Error handling is easy in Zig, with the error union !void ensuring any errors are dealt with as they are returned. I'd like to expand on error unions more, but I'll save that for another article.

pub fn main() !void {
    // Get allocator
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer _ = gpa.deinit();

    // Parse args into string array (error union needs 'try')
    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    if (std.mem.eql(u8, args[1], "-h") or
        std.mem.eql(u8, args[1], "--help") or
        args.len < 4 or
        args.len > 4 or
        args[1].len < 1)
    {
        _ = try printHelp();
        return;
    }

    var width: u32 = undefined;
    var height: u32 = undefined;
    var channels: u8 = undefined;
    const colorspace: u8 = try parseInt(u8, args[3], 10);

    print("Opening {s} ... ", .{args[1]});

    const file = try std.fs.cwd().openFile(args[1], .{ .mode = .read_only });
    defer file.close();

    const bytes_read = try file.readToEndAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(bytes_read);
    var offset: usize = 0;

    if (std.mem.eql(u8, bytes_read[0..2], "P7")) {
        print("file is a PAM\n", .{});
        offset = try parsePamHeader(bytes_read[0..72], &width, &height, &channels);
    } else {
        print("\n\x1b[31mInvalid Input: Input file does not appear to be a compatible PAM.\x1b[0m\n", .{});
        print("If your PAM input contains comments in the header, please strip them.\n", .{});
        return error.InvalidInput;
    }

    const image_size: usize = width * height * channels;
    if (image_size > bytes_read.len - offset) {
        print("\x1b[31mInvalid Input: Image size is larger than the file size.\x1b[0m\n", .{});
        return error.InvalidInput;
    }

    var desc = QoiDesc.qoiSetEverything(width, height, channels, colorspace);

    const qoi_file_size = @as(usize, desc.width) * @as(usize, desc.height) * (@as(usize, desc.channels) + 1) + 14 + 8 + @sizeOf(usize);
    var qoi_file = try allocator.alloc(u8, qoi_file_size);
    defer allocator.free(qoi_file);

    print("Writing {s} ... ", .{args[2]});

    desc.writeQoiHeader(qoi_file[0..14]);

    var pixel_seek: [*]u8 = bytes_read[offset..].ptr;
    var enc: QoiEnc = undefined;

    try enc.qoiEncInit(desc, qoi_file.ptr);

    while (!(enc.pixel_offset >= enc.len)) {
        qoiEncodeChunk(&desc, &enc, pixel_seek);
        pixel_seek += desc.channels;
    }

    const used_len = @intFromPtr(enc.offset) - @intFromPtr(enc.data);

    const outfile = try std.fs.cwd().createFile(args[2], .{ .truncate = true });
    defer outfile.close();
    _ = try outfile.writeAll(enc.data[0..used_len]);
    print("\x1b[32mSuccess!\x1b[0m\n\tOriginal:\t{d} bytes\n\tCompressed:\t{d} bytes ", .{ image_size + offset, used_len });
    if ((image_size + offset) > used_len) {
        const used_len_flt: f64 = @floatFromInt(used_len);
        const image_size_flt: f64 = @floatFromInt(image_size + offset);
        const percent_dec: f64 = 100.0 - ((used_len_flt / image_size_flt) * 100.0);
        print("(\x1b[33m{d:.2}%\x1b[0m smaller)\n", .{percent_dec});
    } else {
        print("\n", .{});
    }
}

That's it, the entire main function. It could probably be made even more concise with some code golfing, but for now, I'm happy with how readable it is even without comments.

Coming from any other low-level language background, I think this function is relatively easy to parse. This is a big argument for Zig, in my opinion - transitioning C projects is not a massive burden if you have interoperability as well as a codebase that feels familiar to long-time C developers. Even as a relatively inexperienced C developer, Zig was trivial to pick up.

Conclusion

Exploring QOI encoding in Zig has been an incredibly rewarding ordeal. Not only did this experience allow me to dive into the intricacies of simple lossless image compression, but it also showcased to me the power and versatility of Zig.

Full disclosure, I may have gotten some details incorrect here, so corrections may be imminent. I'll add a note to the very top of this page if I make any larger corrections to the article.

If you're a programmer seeking a modern systems programming language, I highly recommend giving Zig a try. If you have any questions, comments, or concerns, feel free to reach out to me via my socials on this site's homepage. Thanks for reading!

Thank you to Cancername for their help with the QOI encoding implementation!

Freestanding QOI Encoder in Zig: GitHub Link (no code comments yet, coming soon).

Mastodon | Matrix | The Codec Wiki

Why?

Zig

QOI Overview

Explaining the Codebase

Main Function

Conclusion

Sponsor Me on GitHub Sponsors