To Nha Notes | Oct. 20, 2025, 9:34 a.m.

I can't repeat this enough: 80–90% of GenAI work is already being done (or can be done) by Data Engineers.
Think about it...
When you build something like a RAG application, where's the AI magic, really? Let's break down what you actually do:
➡️ Ingest data (PDFs, APIs, etc.)
➡️ Extract and transform text into JSON
➡️ Create embeddings
➡️ Store them in a vector DB
➡️ Build an app to query it
All of this is engineering. Setting up pipelines, handling APIs, managing data flow, optimizing embeddings and retrieval. That's what engineers do.
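To make that concrete, the five steps above can be sketched in plain Python using only the standard library. This is a minimal sketch under loud assumptions: `embed` is a toy hashed bag-of-words stand-in for a real pre-trained embedding model, `VectorStore` is a hypothetical in-memory stand-in for a vector DB, and the sample documents are made up for illustration.

```python
import hashlib
import json
import math
import re

DIM = 256  # toy embedding size (assumption); real models define their own dimension


def ingest(raw_docs):
    """Steps 1-2: split raw text into sentence chunks stored as JSON-ready records."""
    records = []
    for doc_id, text in raw_docs.items():
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        for i, sentence in enumerate(sentences):
            records.append({"doc": doc_id, "chunk": i, "text": sentence})
    return records


def embed(text):
    """Step 3: toy stand-in for an embedding model (hashed bag of words, L2-normalized)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """Step 4: hypothetical in-memory stand-in for a vector DB."""

    def __init__(self):
        self.rows = []

    def add(self, record):
        self.rows.append((embed(record["text"]), record))

    def query(self, question, k=2):
        """Step 5: rank stored chunks by cosine similarity to the question."""
        q = embed(question)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), rec) for vec, rec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up sample data, for illustration only.
raw_docs = {
    "handbook.txt": "Invoices are paid within 30 days. Refunds require a receipt.",
    "faq.txt": "Support is open on weekdays. Passwords expire every 90 days.",
}

store = VectorStore()
for record in ingest(raw_docs):
    store.add(record)

for score, record in store.query("When do passwords expire?", k=1):
    print(round(score, 3), json.dumps(record))
```

Swapping `embed` for a call to a real embedding model and `VectorStore` for a real vector database changes the plumbing, not the shape of the pipeline. The retrieved chunks would then be pasted into the prompt of a pre-trained model, which is the only "AI part" left.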
The "AI part"? That's just using a pre-trained model (like Mistral or Llama). You don't need to train it, and you don't need a PhD: the math behind it is quite simple. You just use the model.
For 80–90% of GenAI use cases:
👉 It's all a data engineer's job. No special AI wizard skills needed.
OK, we might call this engineer "AI Engineer" now. I can live with that.
We already have pre-built, downloadable models that handle the modeling part just fine.
It's not about "AI magic".
It's about knowing your data and building solid systems.
That's why I always say: Everything is Data Engineering.
For all engineers who want to understand what's actually happening under the AI hood, and build it themselves: