Secure Configuration Without Exposing Sensitive Information
As this year's OSPP event draws to a close, developers from the Apache SeaTunnel community have also reaped the rewards of their long development journey.
Last week, we shared the story of Dong Jiaxin and her Flink Engine CDC Schema Evolution project in the article OSPP Project Outcome: Supporting Flink Engine CDC Source Schema Evolution.
Today, let's take a look at another contributor, Wu Tianyu, a Software Engineering master's student from Shanghai Jiao Tong University, and see how he completed his development task during this program!
Personal Introduction
Wu Tianyu (GitHub ID: wtybxqm) is a master's student in Software Engineering at Shanghai Jiao Tong University. His current research focuses on pervasive computing and human-computer interaction, and he enjoys sports and gaming in his spare time.
During his graduate studies, he has published a CCF-A level paper and received the National Scholarship.
Project Introduction
- Apache SeaTunnel Supports Metalake Development
During the 2025 Summer of Open Source program, I participated in the Apache SeaTunnel project. The goal of my work was to solve the problem of sensitive information exposure in task configurations.
Project Background
In traditional data processing tools, sensitive information such as the usernames and passwords for data sources is often written directly into scripts.
This approach not only poses serious security risks but also makes maintenance cumbersome: when data source information needs to be updated, users must manually edit every related script, which is both inefficient and error-prone.
To address this, I designed and implemented Metalake support, which allows users to dynamically retrieve data source information via a unique sourceId.
This approach prevents sensitive information from being exposed, while also simplifying the management of data source configurations.
Implementation Approach
To achieve this goal, I first integrated Metalake-related configurations into SeaTunnel's configuration files, ensuring that when a task starts, it can automatically load relevant Metalake settings.
Users can configure Metalake in the seatunnel-env.sh file or in the env section of their task configuration. When the task starts, it automatically reads and applies these settings, ensuring flexibility and extensibility in configuration.
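As a rough illustration of the two configuration entry points described above, the following Python sketch merges settings exported by an environment script with settings from a task's env section. The key names (`metalake_enabled`, `metalake_type`, `metalake_url`) and the rule that the task's env section overrides the environment are assumptions made for this example; the article does not specify either, and this is not SeaTunnel's actual loader.

```python
import os

# Hypothetical key names; the article does not list the real ones.
METALAKE_KEYS = ("metalake_enabled", "metalake_type", "metalake_url")


def load_metalake_settings(task_env, environ=None):
    """Collect Metalake settings from the environment and a task's env section.

    Assumption: a value in the task's env section overrides the value
    exported by the environment script.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for key in METALAKE_KEYS:
        env_value = environ.get(key.upper())  # e.g. METALAKE_URL
        if env_value is not None:
            settings[key] = env_value
        if key in task_env:
            settings[key] = task_env[key]
    return settings


# Example: the URL comes from the task, the enabled flag from the environment.
exported = {"METALAKE_ENABLED": "true", "METALAKE_URL": "http://localhost:8090"}
task_env = {"metalake_url": "http://metalake.internal:8090"}
settings = load_metalake_settings(task_env, exported)
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.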
Design Overview
Next, I modified the source and sink configurations to add a new field, sourceId, which serves as the unique identifier for querying data source information from Metalake.
In the task script, users only need to specify a sourceId. The system will then dynamically fetch the corresponding data source details from Metalake and replace the placeholders in the configuration.
As a result, sensitive data such as usernames and passwords are no longer hardcoded into configuration files; they are securely loaded at runtime via the Metalake interface, significantly improving system security and preventing information leakage.
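The lookup-and-replace step can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `${name}` placeholder syntax, the `resolve_config` function, and the `fetch_source` callable (a stand-in for the Metalake query) are all hypothetical, since the article does not describe the real placeholder format or API.

```python
import re

# Hypothetical placeholder syntax: ${name}. The real format is not given in the article.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_config(config, fetch_source):
    """Return a copy of a source/sink config with placeholders filled in.

    fetch_source is any callable that maps a sourceId to a dict of
    connection properties (a stand-in for the Metalake lookup).
    """
    if "sourceId" not in config:
        return dict(config)  # nothing to resolve
    properties = fetch_source(config["sourceId"])

    def substitute(value):
        if not isinstance(value, str):
            return value  # leave numbers, booleans, etc. untouched
        # Raises KeyError if the catalog lacks a referenced property.
        return PLACEHOLDER.sub(lambda m: str(properties[m.group(1)]), value)

    return {key: substitute(value) for key, value in config.items()}


# Example: credentials live in the catalog, not in the task script.
catalog = {"mysql_orders": {"user": "etl", "password": "s3cret"}}
task = {"sourceId": "mysql_orders", "user": "${user}", "password": "${password}", "fetch_size": 500}
resolved = resolve_config(task, catalog.__getitem__)
```

In this sketch the task script only ever contains the sourceId and placeholders; the resolved values exist in memory at runtime and are never written back to the file.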
During the implementation, I also considered system extensibility, so I designed a plugin-based interface.
This design allows the system not only to integrate with Apache Gravitino, but also to support other data catalog services such as UnityCatalog or DataHub.
This ensures the system remains flexible and adaptable, allowing users to choose their preferred data source management tools.
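One common way to structure such a plugin-based design is an abstract client interface plus a registry keyed by catalog type. The Python sketch below shows that general pattern only; `MetalakeClient`, `register`, `create_client`, and the in-memory client are hypothetical names invented for illustration and do not reflect SeaTunnel's real classes or any catalog's API.

```python
from abc import ABC, abstractmethod


class MetalakeClient(ABC):
    """Hypothetical plugin interface: one implementation per catalog service."""

    @abstractmethod
    def get_source_properties(self, source_id: str) -> dict:
        """Return the connection properties stored under source_id."""


# Registry mapping a configured type name to its client class.
_CLIENTS = {}


def register(type_name):
    """Class decorator that makes a client selectable by type name."""
    def decorator(cls):
        _CLIENTS[type_name.lower()] = cls
        return cls
    return decorator


@register("in-memory")
class InMemoryClient(MetalakeClient):
    """Test double; a real plugin would call the catalog service instead."""

    def __init__(self, url, sources=None):
        self.url = url
        self._sources = dict(sources or {})

    def get_source_properties(self, source_id):
        return dict(self._sources[source_id])


def create_client(type_name, url, **kwargs):
    """Instantiate the client registered for type_name (case-insensitive)."""
    try:
        cls = _CLIENTS[type_name.lower()]
    except KeyError:
        raise ValueError(f"unsupported metalake type: {type_name!r}") from None
    return cls(url, **kwargs)
```

With this shape, supporting another catalog means adding one registered class; the code that resolves a sourceId depends only on the interface.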
Although I encountered several challenges, especially at the beginning, when I was still getting familiar with the SeaTunnel codebase, I gradually overcame them through consistent communication with my mentors and persistent learning.
The hardest part was testing. There were many test cases, and due to network instability, some tests were flaky and required multiple retries to pass. This tested my patience and attention to detail.
After continuous effort, I managed to resolve these issues, successfully completed all tasks, and contributed both code and documentation to the open-source community.
Ultimately, this project not only enhanced SeaTunnel's security but also improved its data source configuration flexibility and manageability.
It effectively addressed user pain points around sensitive information exposure and maintenance complexity, while laying a solid foundation for future integration with additional data source management systems.
To better understand the developers' experiences and reflections during the Summer of Open Source program, the Apache SeaTunnel community conducted a brief interview with Wu Tianyu.
Here's the full interview transcript:
Q1: Among all available projects, why did you choose SeaTunnel?
A: First, because SeaTunnel is part of the Apache community, which has always had an excellent reputation. Participating in such a project was a valuable opportunity for me.
Secondly, SeaTunnel's focus overlaps with my research interests and aligns well with the technology stack I'm familiar with. That's why I chose it.
Q2: How does the SeaTunnel project relate to your academic studies?
A: The technologies used in SeaTunnel closely match what I learned during my undergraduate studies, especially in Java development and testing.
Participating in this project not only helped me reinforce and review my prior knowledge but also deepened my understanding by combining theory with practical development experience.
Q3: How has this project influenced your academic or career development? What are your reflections on that?
A: Working on the SeaTunnel project greatly improved my hands-on development skills and deepened my understanding of data processing and integration.
It taught me how to handle complex data flow and scheduling problems in real-world systems, and how to overcome technical challenges in actual development.
These experiences will have a long-lasting impact on my academic research and professional career.
Q4: What was the biggest challenge you encountered, and how did you overcome it?
A: The biggest challenge was testing.
I needed to use Gravitino to provide data source information, but the current version of Gravitino's Docker image had runtime issues that caused tests to fail.
I solved this by directly downloading and installing Gravitino inside the test container, allowing the tests to run smoothly.
Q5: How long have you been involved in open source? Do you enjoy it? What changes has open source brought you?
A: This was my first time participating in an open-source community, and I truly enjoyed the experience.
Open source not only improved my technical skills but also helped me appreciate the importance of collaboration and communication within a community, both of which have been invaluable for my personal growth.
Q6: What was your first impression of the SeaTunnel community? What do you hope to gain from it? Do you have any suggestions for improvement?
A: My first impression of the SeaTunnel community was that it's open and inclusive. Everyone is willing to share their experiences and help newcomers.
By joining the community, I hope to improve my technical abilities, meet more talented developers, and engage in deeper technical discussions.
As for suggestions, I think the community could improve its documentation, especially around technical details, to help new contributors get started more easily.
Q7: Will you continue contributing to SeaTunnel in the future?
A: Yes, I plan to remain active in the SeaTunnel community and continue contributing to its development and maintenance.
The open-source community has given me many opportunities, and I hope that through my efforts, I can help more developers and users benefit from SeaTunnel.
Summary
Through his work on Metalake support, Wu Tianyu not only strengthened SeaTunnel's data security but also enhanced its flexibility for enterprise use cases.
This contribution marks another important step in making SeaTunnel a more secure, scalable, and developer-friendly data integration platform.



