Hey Devs 👋,
If you’re exploring modern data engineering stacks or curious about mixing languages in one pipeline – this post is for you!
I wanted to try out something lightweight but real:
Using Python for data prep and Go for high-speed ingestion into ClickHouse.
Here’s what I built, how it works, and what I learned 👇
📦 What This Project Does
This is a beginner-friendly, containerized mini-project that shows how a polyglot pipeline can work:
🐍 Python — generates and prepares sample data
📁 Converts the data into a Parquet file
⚡ Go — reads the Parquet file and inserts into ClickHouse
🐳 ClickHouse runs locally via Docker Compose; the Python and Go steps run directly on your machine
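The Python producer and the Go consumer only meet at the Parquet file, so both sides have to agree on column names and types, and the ClickHouse table has to match them. Here is a minimal sketch that keeps that contract in one place and renders the matching ClickHouse DDL. It assumes a hypothetical `events` table with made-up columns; the repo's real schema may differ.

```python
# Hypothetical schema shared by the producer (Python) and consumer (Go).
# Column names and ClickHouse types are illustrative, not taken from the repo.
SCHEMA = [
    ("id", "UInt32"),
    ("name", "String"),
    ("amount", "Float64"),
    ("created_at", "DateTime"),
]


def create_table_sql(table: str, schema: list[tuple[str, str]], order_by: str) -> str:
    """Render a ClickHouse CREATE TABLE statement for a MergeTree table."""
    columns = ",\n".join(f"    {name} {ch_type}" for name, ch_type in schema)
    return (
        f"CREATE TABLE IF NOT EXISTS {table}\n(\n{columns}\n)\n"
        f"ENGINE = MergeTree\nORDER BY {order_by}"
    )


if __name__ == "__main__":
    print(create_table_sql("events", SCHEMA, order_by="id"))
```

Keeping the schema in a single definition means a column rename shows up in one diff instead of silently breaking the Go insert.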
🛠️ Tech Stack
- Python: flexible for data prep and Parquet generation
- Go: compiled and fast, used here for inserting data into ClickHouse
- ClickHouse: column-oriented OLAP database built for fast analytical queries
- Docker Compose: spins up ClickHouse locally
- Parquet: efficient columnar storage format
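Much of what makes Parquet efficient for analytics comes from storing each column's values together rather than writing whole rows one after another. A query that needs one column can skip the rest, and runs of similar values compress well. The toy pivot below shows only that idea. Parquet's real on-disk layout adds row groups, pages, encodings, and compression on top.

```python
# Toy data: three records in row-oriented form (one dict per row).
rows = [
    {"id": 1, "region": "north", "amount": 120.5},
    {"id": 2, "region": "north", "amount": 80.0},
    {"id": 3, "region": "south", "amount": 42.25},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    """Pivot non-empty row dicts into one list per column (column-major)."""
    columns: dict[str, list] = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(rows)
# An aggregate like SUM(amount) only has to scan a single list.
total = sum(columns["amount"])
print(columns["region"])  # ['north', 'north', 'south']
print(total)  # 242.75
```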
⚙️ How To Run It Locally
Step 1. Clone the repo
```bash
git clone https://github.com/mohhddhassan/go-clickhouse-parquet.git
cd go-clickhouse-parquet
```
Step 2. Generate sample Parquet data with Python
```bash
cd python
python3 generate_parquet.py
```
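The contents of `generate_parquet.py` aren't shown in this post. Below is a minimal sketch of what a script like it might do. It assumes hypothetical columns `id`, `name`, `amount`, and `created_at`, and it assumes `pyarrow` as the Parquet library (the original may use pandas or something else). The `pyarrow` import sits inside `write_parquet` so the row generator runs with the standard library alone.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_rows(n: int, seed: int = 42) -> list[dict]:
    """Build n deterministic sample rows with hypothetical columns."""
    rng = random.Random(seed)  # seeded so every run produces identical data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return [
        {
            "id": i,
            "name": f"user_{i}",
            "amount": round(rng.uniform(1, 500), 2),
            "created_at": start + timedelta(minutes=i),
        }
        for i in range(1, n + 1)
    ]


def write_parquet(rows: list[dict], path: str) -> None:
    """Write rows to a Parquet file (needs the third-party pyarrow package)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)  # Arrow infers column types from the values
    pq.write_table(table, path)
```

With `pyarrow` installed, calling `write_parquet(generate_rows(1000), "../parquet-files/sample.parquet")` from the `python/` directory would produce a file in the location the project tree uses. The relative path is an assumption based on that layout.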
Step 3. Start ClickHouse using Docker Compose (from the repo root, where docker-compose.yml lives)
```bash
cd ..
docker compose up -d
```
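`docker compose up -d` returns as soon as the container starts, which can be before ClickHouse accepts connections. This small readiness check polls the HTTP interface's `/ping` endpoint, which answers `Ok.` once the server is up. It assumes the compose file publishes ClickHouse's default HTTP port 8123; check `docker-compose.yml` for the actual mapping.

```python
import time
import urllib.request


def wait_for_clickhouse(url: str = "http://localhost:8123/ping",
                        timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll ClickHouse's /ping until it answers 'Ok.' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.read().strip() == b"Ok.":
                    return True
        except OSError:
            pass  # refused/reset/timed out: the server is probably still starting
        time.sleep(interval)
    return False
```

Calling `wait_for_clickhouse()` before Step 4 returns `True` once the server responds, or `False` after 30 seconds. Failing fast is clearer than an opaque connection error from the ingest step.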
Step 4. Run the Go app to ingest data
```bash
cd go
go run main.go
```
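The Go app's insert code lives in `go/main.go` and isn't reproduced here. To illustrate the batch-insert idea without a native driver, here is a Python sketch that swaps in a different technique: it targets ClickHouse's HTTP interface (port 8123 assumed) and sends rows in batches as `INSERT ... FORMAT JSONEachRow`. The table name `events` and the default batch size are hypothetical. Batching matters because ClickHouse generally handles a few large inserts much better than many single-row ones.

```python
import json
import urllib.parse
import urllib.request
from collections.abc import Iterator


def batches(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_insert_request(rows: list[dict], table: str,
                         base_url: str = "http://localhost:8123/") -> urllib.request.Request:
    """Build (but do not send) an HTTP POST inserting rows as JSONEachRow."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    # quote_via=quote encodes spaces as %20 rather than '+'
    url = base_url + "?" + urllib.parse.urlencode({"query": query},
                                                  quote_via=urllib.parse.quote)
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def insert_all(rows: list[dict], table: str, batch_size: int = 10_000) -> None:
    """Send each batch; urlopen raises HTTPError if ClickHouse rejects one."""
    for batch in batches(rows, batch_size):
        with urllib.request.urlopen(build_insert_request(batch, table)) as resp:
            resp.read()
```

Rows must be JSON-serializable. For example, `datetime` values would need converting to strings such as `"2024-01-01 00:00:00"` before they reach `json.dumps`.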
🗂️ Project Structure
```text
go-clickhouse-parquet/
├── docker-compose.yml # ClickHouse setup
├── parquet-files/
│ └── sample.parquet # Auto-generated test file
├── python/
│ └── generate_parquet.py # Script to create data
└── go/
├── go.mod
├── go.sum
└── main.go # Ingests Parquet into ClickHouse
```
🤯 What I Learned
💡 How to generate Parquet programmatically with Python
💡 Using Go to connect with ClickHouse and perform inserts
💡 Deploying ClickHouse quickly with Docker Compose
💡 The idea of polyglot pipelines — mixing languages for their strengths
🔍 Why You Should Try This
If you’re learning data engineering or systems programming:
- Practice mixing Python + Go in real-world data movement
- Get hands-on with Parquet files (a must-have in analytics)
- See how ClickHouse handles fast inserts and queries
- Get used to wiring up multiple components into a pipeline
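A quick way to watch ClickHouse answer a query, and to confirm that the ingest worked, is to count the rows after Step 4. This sketch uses the HTTP interface's GET form, which suits read-only queries. It again assumes port 8123 and a hypothetical table name. With no `FORMAT` clause, ClickHouse replies in TabSeparated, so a `count()` result is just the number.

```python
import urllib.parse
import urllib.request


def query_url(sql: str, base_url: str = "http://localhost:8123/") -> str:
    """Encode a read-only SQL statement into a ClickHouse HTTP GET URL."""
    return base_url + "?" + urllib.parse.urlencode({"query": sql},
                                                   quote_via=urllib.parse.quote)


def count_rows(table: str) -> int:
    """Run SELECT count() and parse the single TabSeparated value."""
    with urllib.request.urlopen(query_url(f"SELECT count() FROM {table}")) as resp:
        return int(resp.read().decode("utf-8").strip())
```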
📌 What’s Next?
📈 Build a dashboard on top of ClickHouse
⚙️ Try streaming Parquet data into ClickHouse
📂 Experiment with more complex schemas
🚀 Benchmark Python vs Go for performance in the pipeline
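For the streaming and benchmarking ideas, one baseline worth comparing against the Go reader is letting ClickHouse parse the file itself. Its HTTP interface accepts `INSERT ... FORMAT Parquet` with the raw file as the request body. The sketch below sends the file straight from disk instead of loading it into memory first. The port, the table name, and the requirement that the table's columns match the Parquet schema are all assumptions about this setup.

```python
import os
import urllib.parse
import urllib.request


def parquet_insert_url(table: str, base_url: str = "http://localhost:8123/") -> str:
    """URL for an INSERT whose body is a raw Parquet file."""
    query = f"INSERT INTO {table} FORMAT Parquet"
    return base_url + "?" + urllib.parse.urlencode({"query": query},
                                                   quote_via=urllib.parse.quote)


def insert_parquet_file(path: str, table: str) -> None:
    """POST a Parquet file to ClickHouse; the body is read from disk in blocks."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        req = urllib.request.Request(
            parquet_insert_url(table),
            data=f,  # a file object: http.client reads and sends it in chunks
            method="POST",
            headers={"Content-Length": str(size)},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Timing `insert_parquet_file` against `go run main.go` on the same `sample.parquet` would give a rough first data point for the Python vs Go benchmark.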
🙋‍♂️ About Me
Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub
🧪 Building one mini project at a time to become a better data engineer.