PostgreSQL JSONB Size Limits to Prevent TOAST Slicing


In my previous post, Embedding Into JSONB Still Feels Like a JOIN for Large Documents, I examined how large JSONB documents behave in PostgreSQL once they exceed about 2 KB. Crossing that threshold compromises the data locality that the document model aims for. It is worth noting, however, that this limit can be raised to roughly 32 KB.



Details about TOAST thresholds

PostgreSQL stores table rows in fixed-size pages (BLCKSZ), 8 KB by default but configurable up to 32 KB at compile time. If a row exceeds TOAST_TUPLE_THRESHOLD, about a quarter of the page size (≈ 2 KB for 8 KB pages), the TOAST mechanism kicks in: PostgreSQL first tries to shrink the row below TOAST_TUPLE_TARGET (which defaults to the threshold but is tunable per table with ALTER TABLE … SET (toast_tuple_target = …)) by compressing or moving large TOASTable columns out of line.

If the value remains too large, it is written to a separate TOAST table in chunks (up to TOAST_MAX_CHUNK_SIZE, also about a quarter of the page size), with each chunk stored as a separate row and retrieved via an index on (chunk_id, chunk_seq).
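
To see this for yourself, the TOAST relation attached to a table can be located through pg_class. A minimal sketch, using the orders table from later in this post (the OID in the TOAST table's name is cluster-specific):

-- Find the TOAST table backing "orders"; its chunks are indexed on (chunk_id, chunk_seq)
SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
 WHERE relname = 'orders';
-- Returns something like pg_toast.pg_toast_16384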

TOAST_TUPLE_THRESHOLD and TOAST_TUPLE_TARGET can be adjusted at compile time, while changing TOAST_MAX_CHUNK_SIZE or the page size also requires running initdb to create a new cluster.

Storage strategies (PLAIN, MAIN, EXTERNAL, EXTENDED) and compression methods (pglz, lz4, etc.) are configurable in SQL. For large JSONB fields, exceeding the threshold incurs the cost of compression and indexed fetching, akin to an implicit join.
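
For example, assuming the orders table with its jsonb column items used later in this post, strategy and compression can be changed per column (a sketch, not something this post relies on):

-- EXTERNAL: move the value out of line without compression
ALTER TABLE orders ALTER COLUMN items SET STORAGE EXTERNAL;
-- Pick the compression method applied to newly stored values
ALTER TABLE orders ALTER COLUMN items SET COMPRESSION lz4;
-- Check how an existing value was compressed (NULL if stored uncompressed)
SELECT pg_column_compression(items) FROM orders WHERE ord_id = 42;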

To fit the largest possible document inline in PostgreSQL, we must use the largest block size available, 32 KB. This requires compiling PostgreSQL with a non-default block size and initializing a new cluster. While that is not feasible on a managed service, we can do it in a lab.



32 KB BLCKSZ PostgreSQL lab

I used the official Dockerfile and added --with-blocksize=32 to the configure command:

diff --git a/18/alpine3.22/Dockerfile b/18/alpine3.22/Dockerfile
index 0a8c650..8ed3d26 100644
--- a/18/alpine3.22/Dockerfile
+++ b/18/alpine3.22/Dockerfile
@@ -148,6 +148,7 @@ RUN set -eux; \
                --with-llvm \
                --with-lz4 \
                --with-zstd \
+               --with-blocksize=32 \
        ; \
        make -j "$(nproc)" world-bin; \
        make install-world-bin; \

I built the image and started a container:

sudo docker build -t pg18-blksz32 .

sudo docker rm -f pg18-blksz32
# Start the container, give initdb a few seconds, then connect with psql.
# The $( ... ) substitution captures the container ID printed by `docker run -d`.
sudo docker exec -it -e PGUSER=postgres $(
 sudo docker run -d --name pg18-blksz32 -e POSTGRES_PASSWORD=x pg18-blksz32
 sleep 5
) psql
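
Once connected, a quick sanity check confirms the non-default page size compiled into this build; show block_size should report 32768 here (8192 on a default build):

-- Page size of this cluster, in bytes (expected: 32768 in this lab)
show block_size;
-- The WAL block size is a separate compile-time setting and keeps its default
show wal_block_size;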



Increase toast_tuple_target

I created the same table as in the previous post, with a single difference: I set the TOAST target to its maximum at the table level, which is the block size minus the page header and line-pointer overhead (32768 − 32 = 32736 bytes):

ALTER TABLE orders SET (toast_tuple_target = 32736);

Another option would have been to disable external storage at the column level: with STORAGE MAIN the value is compressed but kept inline, and is only moved out of line if it still doesn't fit in the block.

ALTER TABLE orders ALTER COLUMN items SET STORAGE MAIN;
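
Either way, the resulting settings can be verified in the catalogs. A small sketch against the same table:

-- Table-level storage parameters, e.g. {toast_tuple_target=32736}
SELECT reloptions FROM pg_class WHERE relname = 'orders';
-- Per-column strategy: p=plain, m=main, e=external, x=extended
SELECT attname, attstorage
  FROM pg_attribute
 WHERE attrelid = 'orders'::regclass AND attnum > 0;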



Explain (analyze, buffers, serialize)

After loading the document with the UPDATE statement from the previous post, I checked how many pages are read to fetch one document:

postgres=# explain (analyze, buffers, serialize text, verbose, costs off)
select *
 from orders
 where ord_id = 42
;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.orders (actual time=0.047..0.048 rows=1.00 loops=1)
   Output: ord_id, ord_dat, items
   Recheck Cond: (orders.ord_id = 42)
   Heap Blocks: exact=2
   Buffers: shared hit=4
   ->  Bitmap Index Scan on orders_pkey (actual time=0.034..0.035 rows=2.00 loops=1)
         Index Cond: (orders.ord_id = 42)
         Index Searches: 1
         Buffers: shared hit=2
 Planning Time: 0.067 ms
 Serialization: time=0.036 ms  output=20kB  format=text
 Execution Time: 0.169 ms

The bitmap index scan reads two pages and the heap fetch touches two blocks, for four buffer hits in total. Because the document fits entirely within a single 32 KB page, no additional buffers are needed during the Serialization phase, unlike the six buffer hits required in the database with an 8 KB block size.
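
To double-check that the value is indeed stored inline, pg_column_toast_chunk_id() (used again below) returns NULL when a datum has not been moved to the TOAST table. A quick sketch on the same row:

-- A NULL chunk id means the jsonb value sits inline in the heap tuple
SELECT ord_id,
       pg_column_size(items)           AS stored_size,
       pg_column_toast_chunk_id(items) AS toast_chunk_id
  FROM orders
 WHERE ord_id = 42;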



Larger documents

If the document size increases above the block size, it will be TOASTed:


postgres=# update orders 
 set items = items || items 
 where ord_id=42
;
UPDATE 1

postgres=# SELECT o.ord_id, o.ord_dat, t.chunk_id, t.chunk_seq, t.ctid,
       pg_size_pretty(length(t.chunk_data)::bigint) AS chunk_size
FROM orders o
JOIN pg_toast.pg_toast_16384 t
  ON t.chunk_id = pg_column_toast_chunk_id(o.items)
WHERE o.ord_id = 42
ORDER BY t.chunk_seq
;
 ord_id |            ord_dat            | chunk_id | chunk_seq | ctid  | chunk_size
--------+-------------------------------+----------+-----------+-------+------------
     42 | 2025-08-27 17:21:30.677401+00 |   116406 |         0 | (0,1) | 8140 bytes
     42 | 2025-08-27 17:21:30.677401+00 |   116406 |         1 | (0,2) | 8140 bytes
     42 | 2025-08-27 17:21:30.677401+00 |   116406 |         2 | (0,3) | 8140 bytes
     42 | 2025-08-27 17:21:30.677401+00 |   116406 |         3 | (0,4) | 8140 bytes
     42 | 2025-08-27 17:21:30.677401+00 |   116406 |         4 | (1,1) | 7524 bytes
(5 rows)

When stored inline, the document was 20 KB (output=20kB) and held as a single datum. Now that it has doubled to 40 KB, it is split into five chunks, because the maximum TOAST chunk size is about a quarter of the block size. That alone doesn't matter much: the chunks sit on the same or contiguous pages, as the ctid column shows, and since the size has doubled it makes sense to read two heap blocks instead of one. However, those reads come in addition to the main tuple, and the real cost is the overhead of traversing the TOAST index to find those pages. There are 6 pages read in total:

postgres=# explain (analyze, buffers, serialize text, verbose, costs off)
select *
 from orders
 where ord_id = 42
;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.orders (actual time=0.028..0.028 rows=1.00 loops=1)
   Output: ord_id, ord_dat, items
   Recheck Cond: (orders.ord_id = 42)
   Heap Blocks: exact=1
   Buffers: shared hit=3
   ->  Bitmap Index Scan on orders_pkey (actual time=0.016..0.016 rows=1.00 loops=1)
         Index Cond: (orders.ord_id = 42)
         Index Searches: 1
         Buffers: shared hit=2
 Planning Time: 0.109 ms
 Serialization: time=0.143 ms  output=40kB  format=text
   Buffers: shared hit=3
 Execution Time: 0.238 ms

Compared to the earlier example on an 8 KB block size database, fewer pages are read in total because the indexes benefit from the larger block size and need fewer branch levels:

postgres=# create extension if not exists pageinspect;
CREATE EXTENSION

postgres=# select btpo_level from bt_page_stats('public.orders_pkey', (
 select root from bt_metap('public.orders_pkey'))
);

 btpo_level
------------
          1
(1 row)

postgres=# select btpo_level from bt_page_stats('pg_toast.pg_toast_16384_index', (
 select root from bt_metap('pg_toast.pg_toast_16384_index'))
);

 btpo_level
------------
          1
(1 row)
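
To relate these buffer counts to on-disk size, the number of 32 KB pages in the main table, the TOAST table, and their indexes can also be checked directly (a sketch; pg_toast_16384 is the cluster-specific name seen above):

-- Pages per relation (block size is 32768 bytes in this lab)
SELECT relname,
       pg_relation_size(oid) / current_setting('block_size')::int AS pages
  FROM pg_class
 WHERE relname IN ('orders', 'orders_pkey', 'pg_toast_16384', 'pg_toast_16384_index');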

The main advantage of a document database is to offer a data model that natively matches application objects, without the complexity of object–relational mapping (ORM), and to preserve that logical model down to the physical layout. This reduces random I/O, improves cache efficiency, enables transparent sharding, and increases transactional scalability.



Conclusion

In MongoDB, documents are stored in a single block, with size ranging from 32 KB to 16 MB (the maximum BSON size), which ensures strong data locality. Each document typically corresponds to a domain-driven design (DDD) aggregate and can embed large fields, including text, arrays, vectors, extended references, and one-to-many relationships. Common document sizes ranging from 32 KB to 256 KB align with WiredTiger’s minimum I/O size, cloud storage maximum I/O size, and modern CPU cache sizes.

JSONB in PostgreSQL allows some locality for small documents. However, when a document exceeds about 2 KB, it moves to TOAST storage, breaking locality and adding overhead due to an extra index traversal. Increasing the page size to 32 KB allows larger documents to remain in-line, but this still falls short of the sizes typical in document databases. Moreover, achieving these sizes often requires non‑default builds and self-managed deployments, which are usually unavailable in managed services.

PostgreSQL’s TOAST mechanism allows for the storage of very large JSONB values, but it comes with a performance trade-off, impacting locality and access speed in favor of accommodating oversized data in a fixed block size storage engine. In contrast, MongoDB is designed to maintain whole-document locality up to a significantly larger size limit, and preserves the logical model of an application in its physical storage. While both databases can store JSON, they fulfill different purposes: MongoDB is optimized for an application-centric document model preserved down to the storage layer, whereas PostgreSQL is tailored for a database-centric normalized model, enhancing it with JSONB either for small additional attributes or for storing large documents that are accessed infrequently.


