Polyglot Data Engineering: Python + Go in the Same Pipeline


Hey Devs 👋,

If you’re exploring modern data engineering stacks or curious about mixing languages in one pipeline – this post is for you!

I wanted to try something lightweight but real: using Python for data prep and Go for high-speed ingestion into ClickHouse.

Here’s what I built, how it works, and what I learned 👇

🔗 GitHub Repo: https://github.com/mohhddhassan/go-clickhouse-parquet




📦 What This Project Does

This is a beginner-friendly, containerized mini-project that shows how a polyglot pipeline can work:

🐍 Python — generates and prepares sample data
📁 Converts the data into a Parquet file
⚡ Go — reads the Parquet file and inserts into ClickHouse
🐳 ClickHouse runs locally using Docker Compose
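
To make the Python half concrete, here's a minimal sketch of what a generator script along these lines could look like. It is not the repo's actual generate_parquet.py: the column names, the `make_rows` and `write_parquet` helpers, and the output path are all assumptions for illustration, and the write step relies on the `pyarrow` package being installed.

```python
import random
from datetime import datetime, timedelta, timezone


def make_rows(n, seed=42):
    """Build n sample rows as a dict of columns (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return {
        "id": list(range(1, n + 1)),
        "name": [f"user_{i}" for i in range(1, n + 1)],
        "score": [round(rng.uniform(0, 100), 2) for _ in range(n)],
        "created_at": [start + timedelta(minutes=i) for i in range(n)],
    }


def write_parquet(columns, path):
    """Write the columns to a Parquet file (requires `pip install pyarrow`)."""
    # Imported here so make_rows stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)
```

Calling `write_parquet(make_rows(1000), "sample.parquet")` would produce a file for the Go side to pick up. Whatever schema you choose here has to match the ClickHouse table the Go app inserts into.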




🛠️ Tech Stack

  • Python — flexible for data prep & Parquet generation
  • Go — blazing fast for inserting data into ClickHouse
  • ClickHouse — lightning-fast OLAP DB
  • Docker Compose — to spin up ClickHouse locally
  • Parquet — efficient columnar storage format



⚙️ How To Run It Locally

Step 1. Clone the repo

git clone https://github.com/mohhddhassan/go-clickhouse-parquet.git
cd go-clickhouse-parquet

Step 2. Generate sample Parquet data with Python

cd python
python3 generate_parquet.py

Step 3. Start ClickHouse using Docker Compose

cd ..   # back to the repo root, where docker-compose.yml lives
docker compose up -d
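
ClickHouse can take a few seconds to accept connections after the container starts, so it can help to wait before running the ingest step. Here's a small sketch that polls ClickHouse's HTTP `/ping` health check. It assumes the compose file publishes the HTTP port on localhost:8123 (the ClickHouse default, but check your docker-compose.yml); `ping` and `wait_until_ready` are names I made up for this example.

```python
import time
import urllib.request


def ping(url="http://localhost:8123/ping", timeout=2.0):
    """Return True if ClickHouse answers its HTTP health check with 'Ok.'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except OSError:  # URLError and connection errors are OSError subclasses
        return False


def wait_until_ready(check=ping, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll `check` until it succeeds; return the attempt number that worked."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"ClickHouse not ready after {attempts} attempts")
```

Running `wait_until_ready()` right after `docker compose up -d` blocks until the server responds, or raises after roughly 30 seconds.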

Step 4. Run the Go app to ingest data

cd go
go run main.go
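
Once the Go app finishes, a quick sanity check is to count the rows that landed in ClickHouse. The sketch below sends a query over ClickHouse's HTTP interface. It assumes port 8123 is reachable on localhost and that the default user needs no password; `count_rows` is a hypothetical helper, and the table name is whatever main.go inserts into (not shown in this post).

```python
import urllib.request


def count_rows(table, base_url="http://localhost:8123/", opener=urllib.request.urlopen):
    """Ask ClickHouse how many rows `table` holds, via the HTTP interface."""
    # The table name is interpolated directly, so only pass trusted names.
    query = f"SELECT count() FROM {table}".encode("utf-8")
    req = urllib.request.Request(base_url, data=query, method="POST")
    with opener(req, timeout=5) as resp:
        return int(resp.read().decode("utf-8").strip())
```

If the count matches the number of rows in the Parquet file, the round trip from Python to Go to ClickHouse worked.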




🗂️ Project Structure

go-clickhouse-parquet/
├── docker-compose.yml         # ClickHouse setup
├── parquet-files/
│   └── sample.parquet         # Auto-generated test file
├── python/
│   └── generate_parquet.py    # Script to create data
└── go/
    ├── go.mod
    ├── go.sum
    └── main.go                # Ingests Parquet into ClickHouse




🤯 What I Learned

💡 How to generate Parquet programmatically with Python
💡 Using Go to connect to ClickHouse and perform inserts
💡 Deploying ClickHouse quickly with Docker Compose
💡 The idea of polyglot pipelines — mixing languages for their strengths




🔍 Why You Should Try This

If you’re learning data engineering or systems programming:

  • Practice mixing Python + Go in real-world data movement
  • Get hands-on with Parquet files (a must-have in analytics)
  • See how ClickHouse handles fast inserts and queries
  • Get used to wiring up multiple components into a pipeline



📌 What’s Next?

📈 Build a dashboard on top of ClickHouse
⚙️ Try streaming Parquet data into ClickHouse
📂 Experiment with more complex schemas
🚀 Benchmark Python vs Go for performance in the pipeline
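
For the benchmarking idea, a rough starting point could be a tiny timing harness like the one below. It only measures wall-clock time for a callable or an external command; `time_call`, `time_command`, and `summarize` are names invented for this sketch, and a Python ingester to compare against doesn't exist in the repo yet.

```python
import statistics
import subprocess
import time


def time_call(fn, repeats=5):
    """Run fn `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def time_command(argv, cwd=None, repeats=5):
    """Time an external command; raises if the command exits non-zero."""
    return time_call(lambda: subprocess.run(argv, cwd=cwd, check=True), repeats)


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

One caveat: `go run main.go` includes compile time, so for a fair comparison it's better to `go build` first and time the resulting binary. You'd also want to truncate the target table between runs so each run inserts into the same starting state.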




🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub

🧪 Building one mini project at a time to become a better data engineer.



