Building a Redis Clone in Zig—Part 1


I’ve been writing web applications for years, mostly high-level stuff with frameworks that abstract away the messy details. You know the type: ORMs, HTTP servers, JSON serialization—all the conveniences of modern backend development. But I’ve always been curious about what happens beneath those abstractions. How does a database actually work? What does efficient memory management look like? So a few months ago, I decided to learn Zig by rebuilding Redis from scratch.

Redis is deceptively simple from the outside: it’s an in-memory key-value store with a straightforward text protocol. But underneath, it’s a masterclass in systems programming—memory management, data structures, network I/O, and concurrency all wrapped up in one project. Perfect for someone trying to bridge the gap between web development and systems programming. Also, almost every company I’ve worked for used it in some way.

The whole promise of Zig is explicit control—memory, error handling, and resources. There’s no garbage collector to save you, no exceptions to bubble up unhandled. If you want to allocate memory, you have to say exactly how, and more importantly, you have to free it yourself. This is precisely what Redis needs: tight control over memory for performance and predictability.

Redis uses RESP (REdis Serialization Protocol), a simple text-based format. A command like SET mykey myvalue arrives as:

*3\r\n
$3\r\n
SET\r\n
$5\r\n
mykey\r\n
$7\r\n
myvalue\r\n

The *3 means “array of 3 elements.” Each $N is a bulk string of N bytes. It’s verbose but easy to parse.

Here’s a simplified parser:

fn parseCommand(allocator: std.mem.Allocator, reader: anytype) ![][]const u8 {
    var args = std.ArrayList([]const u8).init(allocator);
    errdefer args.deinit();

    // Read first byte to determine type
    const type_byte = try reader.readByte();
    if (type_byte != '*') return error.InvalidProtocol;

    // Read number of arguments
    const count = try readInteger(reader);

    for (0..count) |_| {
        // Each arg starts with $
        const dollar = try reader.readByte();
        if (dollar != '$') return error.InvalidProtocol;

        // Length of this arg
        const len = try readInteger(reader);

        // Read the actual bytes; readNoEof errors if the stream ends early
        const arg = try allocator.alloc(u8, len);
        try reader.readNoEof(arg);
        try args.append(arg);

        // Consume \r\n
        _ = try reader.readByte();
        _ = try reader.readByte();
    }
    return args.toOwnedSlice();
}
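The parser leans on a readInteger helper that isn't shown above. A minimal sketch, assuming lengths arrive as CRLF-terminated decimal digits, might look like this:

fn readInteger(reader: anytype) !usize {
    var buf: [32]u8 = undefined;
    var i: usize = 0;
    while (true) {
        const byte = try reader.readByte();
        if (byte == '\r') {
            _ = try reader.readByte(); // consume the trailing '\n'
            break;
        }
        if (i >= buf.len) return error.InvalidProtocol;
        buf[i] = byte;
        i += 1;
    }
    return std.fmt.parseInt(usize, buf[0..i], 10);
}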

Coming from garbage-collected languages, I found the hardest adjustment was thinking about ownership. In Zig, when you allocate memory, you own it. You’re responsible for freeing it. There’s no runtime to clean up after you.

Here’s a simple key-value store:

const Store = struct {
    map: std.StringHashMap([]const u8),
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .map = std.StringHashMap([]const u8).init(allocator),
            .allocator = allocator,
        };
    }

    pub fn set(self: *Store, key: []const u8, value: []const u8) !void {
        // Duplicate the value so we own it
        const owned_value = try self.allocator.dupe(u8, value);
        errdefer self.allocator.free(owned_value);

        if (self.map.getEntry(key)) |entry| {
            // Key already exists and is owned by the map:
            // free the old value and swap in the new one
            self.allocator.free(entry.value_ptr.*);
            entry.value_ptr.* = owned_value;
        } else {
            // New key: duplicate it so the map owns its own copy
            const owned_key = try self.allocator.dupe(u8, key);
            errdefer self.allocator.free(owned_key);
            try self.map.put(owned_key, owned_value);
        }
    }

    pub fn deinit(self: *Store) void {
        var iter = self.map.iterator();
        while (iter.next()) |entry| {
            self.allocator.free(entry.key_ptr.*);
            self.allocator.free(entry.value_ptr.*);
        }
        self.map.deinit();
    }
};
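Wiring it up looks something like the following (a hypothetical usage sketch; in debug builds, GeneralPurposeAllocator reports any leaked allocations on deinit, which is a great safety net while learning ownership):

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit(); // reports leaks in debug builds

    var store = Store.init(gpa.allocator());
    defer store.deinit();

    try store.set("user:42", "alice");
    try store.set("user:42", "bob"); // overwrite: the old value is freed

    if (store.map.get("user:42")) |value| {
        std.debug.print("user:42 = {s}\n", .{value});
    }
}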

This is a great start, but there are some improvements to be made. Most of the keys and values in a typical caching system are small, so SSO (Small String Optimization) can help reduce allocations.

pub const ShortString = struct {
    data: [23]u8, // Inline storage
    len: u8, // Actual length

    pub fn fromSlice(str: []const u8) ShortString {
        std.debug.assert(str.len <= 23); // caller must check the length first
        var ss: ShortString = .{ .data = undefined, .len = @intCast(str.len) };
        @memcpy(ss.data[0..str.len], str);
        // Zero remaining bytes for consistent hashing
        @memset(ss.data[str.len..], 0);
        return ss;
    }

    pub fn asSlice(self: *const ShortString) []const u8 {
        return self.data[0..self.len];
    }
};

This ShortString struct can hold strings up to 23 bytes inline, avoiding heap allocations for small keys and values. At 24 bytes total (23 data bytes plus a one-byte length), two instances fit comfortably in a typical 64-byte CPU cache line. I still need to evaluate the performance difference between 23 bytes and 15 bytes, but this is a good starting point.
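A quick sanity check makes that size assumption explicit (it holds here because both fields have one-byte alignment, so there's no padding):

const std = @import("std");

comptime {
    // 23 inline bytes + 1 length byte = 24 bytes, no padding
    std.debug.assert(@sizeOf(ShortString) == 24);
}

test "ShortString round trip" {
    const ss = ShortString.fromSlice("session:abc");
    try std.testing.expectEqualStrings("session:abc", ss.asSlice());
}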

Zig’s standard library provides a StringHashMap, which is a good starting point. However, we need to ensure that our keys and values are properly managed in terms of memory ownership. For that, we can use StringHashMapUnmanaged, which stores no allocator of its own: every call that might allocate takes the allocator explicitly, keeping the struct smaller and making each allocation site visible.
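Here's a minimal sketch of the unmanaged pattern (using std.testing.allocator, which fails the test if anything leaks):

const std = @import("std");

test "unmanaged map takes an allocator per call" {
    const allocator = std.testing.allocator;

    var map: std.StringHashMapUnmanaged([]const u8) = .{};
    defer map.deinit(allocator); // even deinit needs the allocator

    try map.put(allocator, "greeting", "hello");
    try std.testing.expectEqualStrings("hello", map.get("greeting").?);
}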

To go further, we can create an optimized hash map tailored for our use case. We must be able to store integers, short strings, larger strings, and later, more complex types like lists and hashes. Let’s take a look at how we can define this:

pub const ValueType = enum(u8) {
    string = 0,
    int,
    list,
    short_string,
};

pub const ZedisValue = union(ValueType) {
    string: []const u8,
    int: i64,
    list: ZedisList,
    // Shown earlier
    short_string: ShortString,
};

pub const ZedisObject = struct {
    value: ZedisValue,
    expiration_time: i64,
    // Plan to add more fields in the future
};

const OptimizedHashMap = std.ArrayHashMapUnmanaged([]const u8, ZedisObject, StringContext, true);
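The StringContext above supplies the hash and equality functions the array hash map needs. The standard library ships a suitable one (std.array_hash_map.StringContext); a hand-rolled equivalent is a short sketch like this:

const std = @import("std");

// ArrayHashMap contexts return a u32 hash and take an extra b_index
// parameter in eql (unused here since we compare by content)
const StringContext = struct {
    pub fn hash(_: StringContext, key: []const u8) u32 {
        return @truncate(std.hash.Wyhash.hash(0, key));
    }
    pub fn eql(_: StringContext, a: []const u8, b: []const u8, _: usize) bool {
        return std.mem.eql(u8, a, b);
    }
};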

This will improve performance by reducing allocations for small strings and integers, which are common in Redis workloads. Additionally, we can process KEYS and SCAN commands more efficiently by iterating over the hash map directly, since the underlying structure is an array.
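A hypothetical sketch of a KEYS handler over the OptimizedHashMap above: because the map is array-backed, its keys live in one contiguous slice, so a full scan is cache-friendly.

// Emit all keys as a RESP array of bulk strings
fn keysCommand(map: *const OptimizedHashMap, writer: anytype) !void {
    try writer.print("*{d}\r\n", .{map.count()});
    for (map.keys()) |key| {
        try writer.print("${d}\r\n{s}\r\n", .{ key.len, key });
    }
}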

For the allocations themselves, Zig’s std.heap.MemoryPool is a great fit. Karl Seguin has already written a fantastic article about it; for a more in-depth understanding, see that. In summary, a pool keeps track of previously destroyed objects so that the same address in memory can be used again, making it an excellent choice for objects that are allocated and released frequently.

pub const Store = struct {
    base_allocator: std.mem.Allocator,
    allocator: std.mem.Allocator,
    // Cache hash map
    map: OptimizedHashMap,

    // Hybrid memory pools for different string sizes
    small_pool: std.heap.MemoryPool([32]u8), // 32 bytes - for keys
    medium_pool: std.heap.MemoryPool([128]u8), // 128 bytes - for small values
    large_pool: std.heap.MemoryPool([512]u8), // 512 bytes - for medium values

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .base_allocator = allocator,
            .allocator = allocator,
            .map = .{},
            .small_pool = std.heap.MemoryPool([32]u8).init(allocator),
            .medium_pool = std.heap.MemoryPool([128]u8).init(allocator),
            .large_pool = std.heap.MemoryPool([512]u8).init(allocator),
        };
    }
};

This pool-backed store performs especially well with caching workloads like rate limiters, session stores, and user metrics. With this optimization, the string allocation behind a SET becomes an O(1) free-list pop instead of a trip through the general-purpose allocator; just beware that when a pool has no free slot available, it falls back to its backing allocator, which may in turn make a syscall (such as mmap) to get more memory from the OS.
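A hypothetical helper illustrating the fast path, assuming the small pool from the Store above handles keys up to 32 bytes:

// Pool fast path for short keys; larger keys fall back to the allocator
fn allocKey(self: *Store, key: []const u8) ![]u8 {
    if (key.len <= 32) {
        // O(1): pops a previously freed slot or bumps the pool's arena
        const buf = try self.small_pool.create();
        const dest = buf[0..key.len];
        @memcpy(dest, key);
        return dest;
    }
    // Slow path: regular allocation (may reach the OS for more memory)
    return try self.allocator.dupe(u8, key);
}

One caveat of this design: to free a pooled key later, you have to hand the pool back the full *[32]u8 slot rather than the shorter slice, so the store needs to remember which pool (if any) a string came from.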

You can follow the full source at Zedis on GitHub.


