Building a Redis Clone in Zig—Part 1


I’ve been writing web applications for years, mostly high-level stuff with frameworks that abstract away the messy details. You know the type: ORMs, HTTP servers, JSON serialization—all the conveniences of modern backend development. But I’ve always been curious about what happens beneath those abstractions. How does a database actually work? What does efficient memory management look like? So a few months ago, I decided to learn Zig by rebuilding Redis from scratch.

Redis is deceptively simple from the outside: it’s an in-memory key-value store with a straightforward text protocol. But underneath, it’s a masterclass in systems programming—memory management, data structures, network I/O, and concurrency all wrapped up in one project. Perfect for someone trying to bridge the gap between web development and systems programming. Also, almost every company I’ve worked for used it in some way.

The whole promise of Zig is explicit control—memory, error handling, and resources. There’s no garbage collector to save you, no exceptions to bubble up unhandled. If you want to allocate memory, you have to say exactly how, and more importantly, you have to free it yourself. This is precisely what Redis needs: tight control over memory for performance and predictability.

Redis uses RESP (REdis Serialization Protocol), a simple text-based format. A command like SET mykey myvalue arrives as:

*3\r\n
$3\r\n
SET\r\n
$5\r\n
mykey\r\n
$7\r\n
myvalue\r\n

The *3 means “array of 3 elements.” Each $N is a bulk string of N bytes. It’s verbose but easy to parse.

Here’s a simplified parser:

fn parseCommand(allocator: std.mem.Allocator, reader: anytype) ![][]const u8 {
    var args = std.ArrayList([]const u8).init(allocator);
    errdefer args.deinit();

    // Read first byte to determine type
    const type_byte = try reader.readByte();
    if (type_byte != '*') return error.InvalidProtocol;

    // Read number of arguments
    const count = try readInteger(reader);

    for (0..count) |_| {
        // Each arg starts with $
        const dollar = try reader.readByte();
        if (dollar != '$') return error.InvalidProtocol;

        // Length of this arg
        const len = try readInteger(reader);

        // Read the actual bytes; readNoEof errors if the stream ends early
        const arg = try allocator.alloc(u8, len);
        try reader.readNoEof(arg);
        try args.append(arg);

        // Consume \r\n
        _ = try reader.readByte();
        _ = try reader.readByte();
    }
    return args.toOwnedSlice();
}
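The parser leans on a readInteger helper that isn't shown above. A minimal sketch, assuming lengths arrive as CRLF-terminated decimal digits, might look like this:

fn readInteger(reader: anytype) !usize {
    var buf: [32]u8 = undefined;
    var i: usize = 0;
    while (true) {
        const byte = try reader.readByte();
        if (byte == '\r') {
            _ = try reader.readByte(); // consume the trailing '\n'
            break;
        }
        if (i >= buf.len) return error.InvalidProtocol;
        buf[i] = byte;
        i += 1;
    }
    return std.fmt.parseInt(usize, buf[0..i], 10);
}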

Coming from garbage-collected languages, I found the hardest adjustment was thinking about ownership. In Zig, when you allocate memory, you own it. You’re responsible for freeing it. There’s no runtime to clean up after you.

Here’s a simple key-value store:

const Store = struct {
    map: std.StringHashMap([]const u8),
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .map = std.StringHashMap([]const u8).init(allocator),
            .allocator = allocator,
        };
    }

    pub fn set(self: *Store, key: []const u8, value: []const u8) !void {
        // Duplicate the value so we own it
        const owned_value = try self.allocator.dupe(u8, value);
        errdefer self.allocator.free(owned_value);

        if (self.map.getEntry(key)) |entry| {
            // Key already exists and is owned by the map:
            // free the old value and swap in the new one
            self.allocator.free(entry.value_ptr.*);
            entry.value_ptr.* = owned_value;
        } else {
            // New key: duplicate it so the map owns its own copy
            const owned_key = try self.allocator.dupe(u8, key);
            errdefer self.allocator.free(owned_key);
            try self.map.put(owned_key, owned_value);
        }
    }

    pub fn deinit(self: *Store) void {
        var iter = self.map.iterator();
        while (iter.next()) |entry| {
            self.allocator.free(entry.key_ptr.*);
            self.allocator.free(entry.value_ptr.*);
        }
        self.map.deinit();
    }
};
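Wiring it up looks something like the following (a hypothetical usage sketch; in debug builds, GeneralPurposeAllocator reports any leaked allocations on deinit, which is a great safety net while learning ownership):

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit(); // reports leaks in debug builds

    var store = Store.init(gpa.allocator());
    defer store.deinit();

    try store.set("user:42", "alice");
    try store.set("user:42", "bob"); // overwrite: the old value is freed

    if (store.map.get("user:42")) |value| {
        std.debug.print("user:42 = {s}\n", .{value});
    }
}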

This is a great start, but there are some improvements to be made. Most of the keys and values in a typical caching system are small, so SSO (Small String Optimization) can help reduce allocations.

pub const ShortString = struct {
    data: [23]u8, // Inline storage
    len: u8, // Actual length

    pub fn fromSlice(str: []const u8) ShortString {
        std.debug.assert(str.len <= 23); // caller must check the length first
        var ss: ShortString = .{ .data = undefined, .len = @intCast(str.len) };
        @memcpy(ss.data[0..str.len], str);
        // Zero remaining bytes for consistent hashing
        @memset(ss.data[str.len..], 0);
        return ss;
    }

    pub fn asSlice(self: *const ShortString) []const u8 {
        return self.data[0..self.len];
    }
};

This ShortString struct can hold strings up to 23 bytes inline, avoiding heap allocations for small keys and values. At 24 bytes total (23 data bytes plus a one-byte length), two instances fit comfortably in a typical 64-byte CPU cache line. I still need to evaluate the performance difference between 23 bytes and 15 bytes, but this is a good starting point.
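A quick sanity check makes that size assumption explicit (it holds here because both fields have one-byte alignment, so there's no padding):

const std = @import("std");

comptime {
    // 23 inline bytes + 1 length byte = 24 bytes, no padding
    std.debug.assert(@sizeOf(ShortString) == 24);
}

test "ShortString round trip" {
    const ss = ShortString.fromSlice("session:abc");
    try std.testing.expectEqualStrings("session:abc", ss.asSlice());
}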

Zig’s standard library provides a StringHashMap, which is a good starting point. However, we need to ensure that our keys and values are properly managed in terms of memory ownership. For that, we can use StringHashMapUnmanaged, which stores no allocator of its own: every call that might allocate takes the allocator explicitly, keeping the struct smaller and making each allocation site visible.
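Here's a minimal sketch of the unmanaged pattern (using std.testing.allocator, which fails the test if anything leaks):

const std = @import("std");

test "unmanaged map takes an allocator per call" {
    const allocator = std.testing.allocator;

    var map: std.StringHashMapUnmanaged([]const u8) = .{};
    defer map.deinit(allocator); // even deinit needs the allocator

    try map.put(allocator, "greeting", "hello");
    try std.testing.expectEqualStrings("hello", map.get("greeting").?);
}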

To go further, we can create an optimized hash map tailored for our use case. We must be able to store integers, short strings, larger strings, and later, more complex types like lists and hashes. Let’s take a look at how we can define this:

pub const ValueType = enum(u8) {
    string = 0,
    int,
    list,
    short_string,
};

pub const ZedisValue = union(ValueType) {
    string: []const u8,
    int: i64,
    list: ZedisList,
    // Shown earlier
    short_string: ShortString,
};

pub const ZedisObject = struct {
    value: ZedisValue,
    expiration_time: i64,
    // Plan to add more fields in the future
};

const OptimizedHashMap = std.ArrayHashMapUnmanaged([]const u8, ZedisObject, StringContext, true);
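The StringContext above supplies the hash and equality functions the array hash map needs. The standard library ships a suitable one (std.array_hash_map.StringContext); a hand-rolled equivalent is a short sketch like this:

const std = @import("std");

// ArrayHashMap contexts return a u32 hash and take an extra b_index
// parameter in eql (unused here since we compare by content)
const StringContext = struct {
    pub fn hash(_: StringContext, key: []const u8) u32 {
        return @truncate(std.hash.Wyhash.hash(0, key));
    }
    pub fn eql(_: StringContext, a: []const u8, b: []const u8, _: usize) bool {
        return std.mem.eql(u8, a, b);
    }
};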

This will improve performance by reducing allocations for small strings and integers, which are common in Redis workloads. Additionally, we can process KEYS and SCAN commands more efficiently by iterating over the hash map directly, since the underlying structure is an array.
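A hypothetical sketch of a KEYS handler over the OptimizedHashMap above: because the map is array-backed, its keys live in one contiguous slice, so a full scan is cache-friendly.

// Emit all keys as a RESP array of bulk strings
fn keysCommand(map: *const OptimizedHashMap, writer: anytype) !void {
    try writer.print("*{d}\r\n", .{map.count()});
    for (map.keys()) |key| {
        try writer.print("${d}\r\n{s}\r\n", .{ key.len, key });
    }
}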

For the allocations themselves, Zig’s std.heap.MemoryPool is a great fit. Karl Seguin has already written a fantastic article about it; for a more in-depth understanding, see that. In summary, a pool keeps track of previously destroyed objects so that the same address in memory can be used again, making it an excellent choice for objects that are allocated and released frequently.

pub const Store = struct {
    base_allocator: std.mem.Allocator,
    allocator: std.mem.Allocator,
    // Cache hash map
    map: OptimizedHashMap,

    // Hybrid memory pools for different string sizes
    small_pool: std.heap.MemoryPool([32]u8), // 32 bytes - for keys
    medium_pool: std.heap.MemoryPool([128]u8), // 128 bytes - for small values
    large_pool: std.heap.MemoryPool([512]u8), // 512 bytes - for medium values

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .base_allocator = allocator,
            .allocator = allocator,
            .map = .{},
            .small_pool = std.heap.MemoryPool([32]u8).init(allocator),
            .medium_pool = std.heap.MemoryPool([128]u8).init(allocator),
            .large_pool = std.heap.MemoryPool([512]u8).init(allocator),
        };
    }
};

This pool-backed store performs especially well with caching workloads like rate limiters, session stores, and user metrics. With this optimization, the string allocation behind a SET becomes an O(1) free-list pop instead of a trip through the general-purpose allocator; just beware that when a pool has no free slot available, it falls back to its backing allocator, which may in turn make a syscall (such as mmap) to get more memory from the OS.
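A hypothetical helper illustrating the fast path, assuming the small pool from the Store above handles keys up to 32 bytes:

// Pool fast path for short keys; larger keys fall back to the allocator
fn allocKey(self: *Store, key: []const u8) ![]u8 {
    if (key.len <= 32) {
        // O(1): pops a previously freed slot or bumps the pool's arena
        const buf = try self.small_pool.create();
        const dest = buf[0..key.len];
        @memcpy(dest, key);
        return dest;
    }
    // Slow path: regular allocation (may reach the OS for more memory)
    return try self.allocator.dupe(u8, key);
}

One caveat of this design: to free a pooled key later, you have to hand the pool back the full *[32]u8 slot rather than the shorter slice, so the store needs to remember which pool (if any) a string came from.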

You can follow the full source at Zedis on GitHub.


