Apache Iceberg Dev Mailing List – Weekly Digest (Aug 9 – 15, 2025)




SparkTable refreshEagerly clarification

The week began with a question about the refreshEagerly option on SparkTable. Limin Ma asked whether enabling refreshEagerly would automatically fetch changes from the remote catalog and update Spark’s schema. Szehon Ho clarified that refreshEagerly only refreshes the local table metadata; it does not fetch remote changes. To ensure that Spark sees the latest snapshot, callers should call table.refresh(), which forces a remote fetch. There is no concurrency control around refreshEagerly(), so if multiple threads call the method, only one refresh actually occurs and the others may see stale snapshots. This exchange is captured in the mailing‑list thread “[QUESTION] SparkTable refreshEagerly”.



Community‑driven events and meet‑ups

Several community events were announced or updated during the week:

  • Meet‑ups in Tel Aviv and Japan. Yossi Reitblat announced that the Tel Aviv community meet‑up will return on September 10 2025. The event will be a half‑day in‑person meeting at Monday.com; registration and a call for proposals are linked in the announcement thread. Yuya Ebihara later shared that the third Japanese community meet‑up will take place on September 22 2025 and provided a registration link in a short message thread.
  • Enabling more meet‑ups. Kevin Liu thanked Danica Fine for improving the project’s Community page and reorganizing the navigation bar so the Community page is top‑level. He encouraged organizers to add local events and noted that the improvements are already live at iceberg.apache.org/community in a reply to the “[DISCUSS] Enabling more Meetups” thread.
  • V4 single‑file commit sync invitation. Amogh Jahagirdar scheduled a recurring community sync starting August 19 2025 (every two weeks on Tuesdays at 9 AM PT) to discuss the “V4 single‑file commits” proposal. The invitation with the Google Meet link and calendar details is posted in the “[Invitation: V4 Single File Commits Sync]” thread.
  • Column statistics sync. Eduard Tudenhöfner added a calendar event for August 19 2025 at 9 AM PT to discuss details of the V4 column statistics proposal. The scheduling note appears in the “[DISCUSS] v4 – Improved column statistics” thread.



Discussion: Analytics Accelerator Library for Amazon S3

Kevin Liu and Michael Stubbs proposed adopting the Analytics Accelerator Library as the default input stream for Amazon S3. They suggested discussing the topic in the weekly community sync and, if there is enough interest, holding an ad‑hoc FileIO sync to dig deeper. Michael Stubbs summarized the proposal and shared a draft document in the “[Discuss] Analytics Accelerator Library for Amazon S3 as default S3 Input Stream” thread. Subsequent replies indicated there was interest in a special session; details were promised in a follow‑up message.



Discussion: RCK error messages should be standardized

André Anastácio raised concerns about the REST Compatibility Kit (RCK) tests relying on exact error messages. He noted that different clients (Java, Rust, Python and Go) produce slightly different messages, making it hard to write cross‑language tests. The thread “[DISCUSS] RCK and Iceberg Clients – Should We Standardize Error Messages?” (https://lists.apache.org/thread/rvxwxv1jzkxmhfb7ojzym08l44s7fqzp) gathered several opinions. Daniel Weeks suggested pushing for some standardization while allowing flexibility, and Steve Loughran proposed using numeric error codes rather than matching full strings.
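
To make the "codes rather than strings" suggestion concrete, here is a minimal Python sketch. It assumes error payloads shaped like the REST catalog spec's error model (an `error` object carrying `message`, `type` and `code`); the two sample payloads and the helpers `error_key` and `same_error` are hypothetical and are not taken from the thread or from the RCK itself.

```python
import json


def error_key(body: str) -> tuple:
    """Return the parts of an error payload that are stable across clients."""
    err = json.loads(body)["error"]
    # Deliberately ignore err["message"]: its wording differs per implementation.
    return (err["code"], err["type"])


def same_error(body_a: str, body_b: str) -> bool:
    """Treat two error payloads as equivalent if code and type agree."""
    return error_key(body_a) == error_key(body_b)


# Hypothetical payloads from two implementations reporting the same failure.
impl_a = json.dumps({"error": {
    "message": "Table does not exist: db.events",
    "type": "NoSuchTableException",
    "code": 404,
}})
impl_b = json.dumps({"error": {
    "message": "table db.events not found",
    "type": "NoSuchTableException",
    "code": 404,
}})

messages_match = json.loads(impl_a)["error"]["message"] == json.loads(impl_b)["error"]["message"]
errors_match = same_error(impl_a, impl_b)
```

A test written this way passes for both implementations even though their message strings differ, which is the kind of flexibility the thread was looking for.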



Proposal: mark HTTP 503 as non‑retryable for updateTable

Prashant Singh proposed changing the REST catalog specification so that HTTP 503 responses during updateTable operations are treated the same as HTTP 500/502/504—signaling commit state unknown and preventing automatic retries. He argued that proxies like Envoy can return 503 after a commit is processed, so retrying could corrupt tables. Dennis Huo responded in the “[DISCUSS] Mark 503 error code as non‑retryable for updateTable” thread that, since low‑level retries for read‑only methods have been merged, there is less need to retry commits. He supported configuring 503 as non‑retryable and suggested using the Retry‑After header to decide when a retry is safe.
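
The proposed classification can be sketched in a few lines of Python. This is an illustration of the idea as summarized above, not the REST client's actual code; the enum, the function name `classify_update_table` and the `treat_503_as_unknown` flag are all hypothetical.

```python
from enum import Enum


class CommitOutcome(Enum):
    SUCCESS = "success"
    RETRYABLE = "retryable"          # safe to resend the request automatically
    STATE_UNKNOWN = "state_unknown"  # the commit may or may not have been applied
    FAILED = "failed"                # definite failure, nothing was committed


# Codes the digest lists as already signaling an unknown commit state.
UNKNOWN_STATE_CODES = frozenset({500, 502, 504})


def classify_update_table(status: int, treat_503_as_unknown: bool = True) -> CommitOutcome:
    """Classify an updateTable response; the flag toggles the proposed 503 handling."""
    if 200 <= status < 300:
        return CommitOutcome.SUCCESS
    if status in UNKNOWN_STATE_CODES:
        return CommitOutcome.STATE_UNKNOWN
    if status == 503:
        # Proposal: a proxy may return 503 after the commit was processed,
        # so a blind retry is unsafe and the state must be treated as unknown.
        return CommitOutcome.STATE_UNKNOWN if treat_503_as_unknown else CommitOutcome.RETRYABLE
    return CommitOutcome.FAILED
```

With the flag on, `classify_update_table(503)` lands in the same bucket as 500, 502 and 504, so a caller would verify the table state before attempting the commit again.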



Question: TableScan API and metadataLocation in REST catalog

Limin Ma asked why the REST catalog’s TableScan API returns data files without a metadataLocation property even though local scans include it. Ryan Blue explained that metadataLocation is only a hint used for writes; by the time a commit completes, the manifest may change, so server‑side planning omits it. He added that server‑side planning is intended for simple clients that cannot read metadata themselves and that hints are therefore not included. This exchange is recorded in the “[QUESTION] Rest catalog TableScan API response’s data_file has no ‘metadataLocation’” thread.



V4 metadata proposals

The community continued to discuss proposals for Iceberg format V4, which aims to reduce the number of files written during commits and improve metadata:

  • One‑file commits. Russell Spitzer, Steven Wu and Yi Fang shared high‑level concepts for “one‑file commits,” replacing manifest lists with root manifests and using manifest delete vectors. Amogh Jahagirdar responded and noted that his team merged their adaptive metadata tree proposal with Russell’s ideas, planning to use a combined document as the single source of truth. He highlighted open questions such as whether manifest delete vectors should be inline or external, how to infer file‑level changes from the root manifest, and whether partition tuples can be replaced by column statistics. The initial discussion appears in the “[DISCUSS] v4 – One file commits” thread, and further updates were promised during the recurring V4 sync.
  • Improved column statistics. The “[DISCUSS] v4 – Improved column statistics” thread continued to refine proposals for assigning field IDs to statistics and reserving ID ranges. Contributors debated how to handle reserved columns, whether writers should share a single stats space, and what bounds to place on column stats. Eduard Tudenhöfner scheduled a sync on August 19 to resolve outstanding questions.
  • FileFormat API. Péter Váry summarized the FileFormat API’s status following a community sync. Two questions require feedback: (1) dropping support for position-delete files that store deleted row data (because no current implementation uses them); and (2) deprecating format‑specific readers/writers in favor of unified InternalData and FileFormat APIs. He asked users to share any use cases that rely on the old behaviors in the “[DISCUSS] FileFormat API proposal” thread.



Miscellaneous questions and discussions

  • Type promotion in Parquet writes. Nicolae Vartolomei asked how type promotion (e.g., int→long or float→double) works when writing Parquet files. He wondered whether new files could be written with the promoted type while older files use the original type and whether writers can omit columns that contain all nulls. The unanswered question is posted in the “[QUESTION] What type promotion actually means” thread.
  • Commit conflicts in REPLACE TABLE. Guy Gadon reopened a discussion on allowing commit conflicts when replacing a table. He argued that the current behavior ignores potential conflicts and can revive expired snapshots or override table properties. Ryan Blue responded that transactions should retry against the latest metadata rather than blindly replacing the metadata.json file. The conversation is in the “[DISCUSS] Allow Commit Conflicts in REPLACE TABLE transactions” thread.
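
The "retry against the latest metadata" point from the REPLACE TABLE discussion can be illustrated with a toy optimistic-concurrency loop in Python. Everything here (`ToyCatalog`, `CommitConflict`, `commit_with_retry`, the version counter) is a hypothetical stand-in and not Iceberg's implementation; the sketch only shows why re-reading the current state before each attempt avoids overwriting a concurrent change.

```python
class CommitConflict(Exception):
    """Raised when a commit is based on a version that is no longer current."""


class ToyCatalog:
    """In-memory stand-in for a catalog holding one table's properties."""

    def __init__(self, properties):
        self.version = 0
        self.properties = dict(properties)

    def load(self):
        return self.version, dict(self.properties)

    def commit(self, base_version, new_properties):
        # Compare-and-swap: accept the change only if the base is still current.
        if base_version != self.version:
            raise CommitConflict(f"base {base_version} != current {self.version}")
        self.properties = dict(new_properties)
        self.version += 1


def commit_with_retry(catalog, apply_change, max_attempts=3):
    """Re-read the latest state and re-apply the change on every attempt."""
    for _ in range(max_attempts):
        base_version, current = catalog.load()
        try:
            updated = apply_change(current)
            catalog.commit(base_version, updated)
            return updated
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


catalog = ToyCatalog({"owner": "a"})
base_version, stale = catalog.load()
# A concurrent writer lands first and adds a property.
catalog.commit(base_version, {**stale, "retention": "7d"})

# Committing from the stale base is rejected instead of silently overwriting.
try:
    catalog.commit(base_version, {**stale, "owner": "b"})
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True

# Retrying against the latest state keeps the concurrent writer's change.
final = commit_with_retry(catalog, lambda props: {**props, "owner": "b"})
```

In this toy model the blind write from the stale base would have dropped the `retention` property, which mirrors the concern about REPLACE TABLE overriding table properties.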



Closing thoughts

Last week’s dev‑list conversations showcased a community balancing new features (V4 single‑file commits, improved statistics, FileFormat API) with practical concerns such as error handling, retries and scan planning. There was strong interest in coordinating meet‑ups and syncing on major proposals. Some questions, such as how type promotion applies to Parquet writes, remain open, but the active participation reflects a healthy, collaborative community.


