Optimizing the Docker build process

To Nha Notes | Aug. 13, 2024, 12:45 a.m.

The Docker build process can and should be optimized. This will remove a lot of friction in the software development life cycle.

Many Docker beginners make the following mistake when crafting their first Dockerfile:

Figure 8.6 – Unoptimized Dockerfile for a Node.js application
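
A minimal sketch of such a Dockerfile, not necessarily identical to the one in the figure, could look like this. The base image tag `node:20-alpine`, the `/app` working directory, and the `start` script are assumptions chosen for illustration. Ignoring the comment lines, the `COPY . .` instruction is the third line, which is the one the discussion refers to as line 3.

```dockerfile
# Illustrative sketch; image tag and paths are assumptions, not taken from the figure.
FROM node:20-alpine
WORKDIR /app
# "Line 3" in the text: copies the whole source tree, so any change to any
# source file changes this layer.
COPY . .
# Comes after the COPY above, so it is rebuilt whenever that layer changes.
RUN npm install
# Assumes package.json defines a "start" script.
CMD ["npm", "start"]
```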


Can you spot the weak point in this typical Dockerfile for a Node.js application? In Chapter 4, Creating and Managing Container Images, we learned that an image consists of a series of layers. Each (logical) line in a Dockerfile creates a layer, except the lines with the CMD and/or ENTRYPOINT keywords, which only record metadata. We also learned that the Docker builder does its best to cache layers and reuse them if they have not changed between subsequent builds. However, the builder can only reuse cached layers that occur before the first changed layer; that layer and all subsequent layers must be rebuilt. That said, the preceding structure of the Dockerfile invalidates, or as we often hear it said, busts the image layer cache!

Why? Well, from experience, you certainly know that the npm install command can be a pretty expensive operation in a typical Node.js application with many external dependencies. The execution of this command can take from seconds to many minutes. Each time one of the source files changes, and we know that happens frequently during development, line 3 in the Dockerfile causes the corresponding image layer to change. Hence, the Docker builder cannot reuse this layer from the cache, nor can it reuse the subsequent layer created by RUN npm install. Any minor change in code causes a complete rerun of npm install. That can be avoided, because the package.json file containing the list of external dependencies rarely changes. With all of that information, let’s fix the Dockerfile:

Figure 8.7 – Optimized Dockerfile for a Node.js application
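
As a hedged sketch of what the optimized version might look like (again with `node:20-alpine`, `/app`, and the `start` script as illustrative assumptions rather than details from the figure), the dependency manifest is copied and installed before the rest of the source. Ignoring the comment lines, `COPY package.json ./` is line 3 and `COPY . .` is line 5.

```dockerfile
# Illustrative sketch; image tag and paths are assumptions, not taken from the figure.
FROM node:20-alpine
WORKDIR /app
# "Line 3" in the text: copy only the dependency manifest first.
COPY package.json ./
# Reused from the cache as long as package.json is unchanged.
RUN npm install
# "Line 5" in the text: copy the remaining source files; a code change
# invalidates the cache only from this instruction onward.
COPY . .
# Assumes package.json defines a "start" script.
CMD ["npm", "start"]
```

If the project also has a `package-lock.json`, copying it together with `package.json` in the same instruction (for example `COPY package.json package-lock.json ./`) would make the cached install layer depend on the lock file as well.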


This time, on line 3, we only copy the package.json file into the image, and this file rarely changes. Hence, the subsequent npm install command has to be executed equally rarely. The COPY command on line 5 is then a very fast operation, so rebuilding an image after some code has changed only needs to rebuild the layers from that point on. Build times can drop to a fraction of a second.

The very same principle applies to most languages or frameworks, such as Python, .NET, or Java. Avoid busting your image layer cache!
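
To illustrate how the same ordering could carry over to another language, here is a minimal sketch for a Python application. The image tag `python:3.12-slim`, the `requirements.txt` manifest, and the `main.py` entry point are all assumptions for illustration; the point is only that the dependency list is copied and installed before the source code.

```dockerfile
# Illustrative sketch; image tag, file names, and entry point are assumptions.
FROM python:3.12-slim
WORKDIR /app
# Copy only the dependency list first; it changes far less often than the code.
COPY requirements.txt ./
# Reused from the cache as long as requirements.txt is unchanged.
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application source last so code edits invalidate only this layer.
COPY . .
# Hypothetical entry point.
CMD ["python", "main.py"]
```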