The past, present, and future of Data Engineering

To Nha Notes | Nov. 10, 2022, 2:19 p.m.

The Data Engineering practice has been around since the early internet era in the 1990s. In the past, data engineers were mostly ETL developers using specific tools. Most of these tools were proprietary and located on-premises. The term data engineer itself didn't exist; the more common terms were data modeler, database admin, and ETL developer (where ETL refers to the name of the proprietary ETL tool). Each ETL tool had its own required expertise and best practices.

Now, in the present, Data Engineering has evolved into a more mature and singular role. This means that the practice has many more common principles, concepts, and best practices. This is due to two reasons: the rapid improvement in the technologies supporting the practice, and the fact that Data Engineering has become a critical and central role in organizations.

"Data engineer was the fastest-growing job in technology with a 50% year-over-year growth in the number of open positions" – 2019, reports from Burning Glass's Nova platform. The platform analyzes millions of active job postings. I have to say that the present day is the best day for anyone who wishes to become a data engineer.

The terms big data and the cloud are no longer considered the future; they are the present. If you are looking for jobs in Data Engineering, you must have strong knowledge of both. If you represent a company that is looking for data engineers, you must ask the candidates about these two topics. This is unavoidable and has become the new norm.

Now, what does the future look like?

There are two aspects that I can think of. The first is the technology aspect, while the second is the role aspect:

  • From a technology perspective, now, in 2022, the migration from on-premises to the cloud is still happening. It's still an uptrend, and it's still far from the peak. When it reaches its peak, traditional organizations such as banking corporations and governments will have adopted the cloud, not only in certain countries but globally. Advancements in data security, data governance, and multi-cloud environments are essential. As you may already know, data regulation has been maturing in a lot of countries in the last few years. Data and cloud technologies need to keep adapting to these regulations, especially in the financial industry. On top of that, the capabilities to integrate data and technologies across cloud platforms and with on-premises systems will become more mature.
  • The second aspect is the data engineer role. I think this role will go back to being a non-singular role. Companies will start to realize that the data engineer role can be broken down into clearer, more granular roles, which will make it easier for them to find good candidates for specific needs.

One other thing about the role is who will write SQL queries. In the past, and still in the present, data engineers are the ones who take full responsibility for writing SQL queries that turn business logic into answers. This is starting to change: more and more non-engineers are becoming SQL-savvy, and the trend will continue. Non-engineers can include the marketing team, the human resources department, C-level executives, or any other role. They understand the business context best, and they gain the most benefit when they can access data directly in databases using SQL. For data engineers, this means that the role will be more focused on shaping the foundation. In terms of ELT, the extract and load process will still be handled by data engineers, but the majority of the transformation process will probably be taken over by non-data engineers. Going back to what I mentioned previously, data engineers will need to be more focused on designing and developing data security and data governance as a foundation.
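The ELT split described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a prescribed implementation: it uses Python's built-in sqlite3 module as a stand-in for a data warehouse, and the table, view, and function names (raw_orders, revenue_by_team, extract_and_load, transform) as well as the sample rows are hypothetical. The extract and load step lands the data unchanged, and the transformation is a plain SQL statement of the kind a SQL-savvy non-engineer could write.

```python
import sqlite3

# Hypothetical sample rows standing in for data extracted from a source system.
RAW_ORDERS = [
    ("2022-11-01", "marketing", 120.0),
    ("2022-11-01", "sales", 80.0),
    ("2022-11-02", "marketing", 50.0),
]


def extract_and_load(conn, rows):
    """Extract + Load (data engineer side): land the data as-is in a raw table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_date TEXT, team TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


# Transform (business user side): plain SQL on top of the raw table.
REVENUE_BY_TEAM_SQL = """
CREATE VIEW revenue_by_team AS
SELECT team, SUM(amount) AS total_amount
FROM raw_orders
GROUP BY team
"""


def transform(conn):
    """Apply the SQL transformation inside the database."""
    conn.execute(REVENUE_BY_TEAM_SQL)


def run_elt():
    """Run the whole sketch against an in-memory database and return the result."""
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_ORDERS)
    transform(conn)
    rows = conn.execute(
        "SELECT team, total_amount FROM revenue_by_team ORDER BY team"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_elt())
```

Because the transformation lives in SQL inside the database, whoever owns the business logic can change the view without touching the load code.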

This is just a very short list of examples of what data engineers should or shouldn't be responsible for:

  • Handle all big data infrastructures and software installation.
  • Handle application databases.
  • Design the data warehouse data model.
  • Analyze big data to transform raw data into meaningful information.
  • Create a data pipeline for machine learning.

Figure: Data engineer-focused diagram

At the center of the diagram (number 3) are the jobs that are the key focus of data engineers, which I will call the core.

The jobs numbered 2 are the good-to-have area.

The jobs numbered 1 are the good-to-know area. For example, it's rare that a data engineer needs to be responsible for building application databases, developing machine learning models, maintaining infrastructure, and creating dashboards. It is possible, but less likely. These disciplines require knowledge that is a little too far from the core.

References

The ebook Data Engineering with Google Cloud Platform