chapter3
images
INTERMEDIATE DOCKER
Mike Metzger
Data Engineering Consultant
Docker image explanation
Docker images are the base of a given container
Holds all content initially available to a container instance
Docker image concerns
Tempting to add all potentially needed components to an image
Docker image recommendations
Split containers to the smallest level needed
Easier to combine multiple containers later vs. building a single large image
Like building with reusable components
Updates to specific software only affect containers using that image instead of all containers needing the update
Can optimize for size, making use and distribution much easier
Docker image breakdown example
Consider a data engineering project using the following software:
PostgreSQL database
Python ETL software
Web server software
FROM ubuntu
RUN apt update
RUN apt install -y postgresql
RUN apt install -y nginx
RUN apt install -y python3.9
...
Possible to use a single image, but we would need to update the image each time we had an update to the ETL or web server setup.
Example with minimized containers
Better options with Docker
Split each into its own container
PostgreSQL database container
Web server
bash> docker run -d postgres:latest
bash> docker run -d nginx:latest
...
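A minimal sketch of running the split services side by side, assuming the official `postgres` and `nginx` images from Docker Hub. The network name `etl-net`, the container names `db` and `web`, host port 8080, and the password value are placeholders that do not come from the slides. The official `postgres` image does not start without `POSTGRES_PASSWORD` (or an equivalent auth setting), so the sketch sets one.

```shell
#!/usr/bin/env bash
# Sketch: start the database and the web server as separate containers.
# All names, the port, and the password below are illustrative placeholders.

start_split_services() {
  # A user-defined network lets the containers reach each other by name.
  docker network create etl-net

  # PostgreSQL container; the official image requires a superuser password.
  docker run -d --name db --network etl-net \
    -e POSTGRES_PASSWORD=change-me postgres:latest

  # Web server container, published on host port 8080.
  docker run -d --name web --network etl-net \
    -p 8080:80 nginx:latest
}

# Usage (requires a running Docker daemon):
#   start_split_services
```

Each service can now be updated or replaced by restarting only its own container.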
Determining image size
Using docker images
bash> docker images
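As a sketch of how the size column can be narrowed down, the two helper functions below wrap standard Docker CLI formatting options. The function names are hypothetical; `--format` and the `.Repository`, `.Tag`, and `.Size` fields are part of the Docker CLI.

```shell
#!/usr/bin/env bash
# Sketch: two ways to look at image size with the Docker CLI.

# Print a compact table of repository, tag, and size for all local images.
list_image_sizes() {
  docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'
}

# Print the size in bytes of a single image, e.g. postgres:latest.
image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# Usage (requires a running Docker daemon):
#   list_image_sizes
#   image_size_bytes postgres:latest
```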
Understanding layers
Docker layers
Docker images are made up of layers
Why do we care about layers?
Reusability
Faster build time
Smaller builds
docker image inspect
How to determine the layers within an image?
docker image inspect <img id | name>
The RootFS:Layers section provides details about layers in a given Docker image
docker image inspect example
bash> docker image inspect postgres:latest
"RootFS": {
    "Type": "layers",
    "Layers": [
        "sha256:6f2d01c02c30cc1ffac781aff795cba8eeb29cc27756fe37bf525169856369c6",
        "sha256:c6ad2d5a3cad837ae66b5560e9c577bfad062556b1f00791d8d733ce44a577ce",
        "sha256:2153552a84ccbf7e4a28a50e766b72345072e59f8af0ff068baf98b413132e0c",
        "sha256:6c00217b1e4b15c25eb3f6e28b1af8c295f469568014621e31a4c5eb5a8aca6f",
        "sha256:167177d78e2a33aa822faebe9f01683c648ae78179059db05cd25737f215c305",
        ...
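When only the layer list is of interest, the built-in `--format` option can pull it out without any extra tooling. The sketch below assumes an image that is already present locally; the function names are hypothetical. `docker history` is included as a complementary view, though its rows do not map one-to-one onto the RootFS layers, because metadata-only instructions appear there with a size of 0B.

```shell
#!/usr/bin/env bash
# Sketch: inspect the layers of an image without reading the full JSON.

# Print the layer digests as a JSON array.
image_layers() {
  docker image inspect --format '{{json .RootFS.Layers}}' "$1"
}

# Show the instruction and size behind each step of the image.
image_history() {
  docker history "$1"
}

# Usage (requires a running Docker daemon):
#   image_layers postgres:latest
#   image_history postgres:latest
```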
jq command-line tool
Sometimes difficult to analyze the results from docker image inspect
The jq command-line tool reads and filters JSON data, such as the output of docker image inspect
jq recipes with Docker
Method to see just a specific section, for example the RootFS data:
docker image inspect <id> | jq '.[0] | .RootFS'
{
  "Type": "layers",
  "Layers": [
    "sha256:0f5c115c5eea96...",
    "sha256:20792593831cdc..."
  ]
}
jq recipes with Docker (part 2)
Method to count number of layers using jq:
docker image inspect <id> | jq '.[0] | {LayerCount: .RootFS.Layers | length}'
{
  "LayerCount": 2
}
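The jq recipes can be tried without Docker by feeding jq a small JSON document with the same shape as the inspect output. In this sketch the sample data and its shortened digests are placeholders, and the function names are hypothetical; the jq filters themselves use only standard jq syntax.

```shell
#!/usr/bin/env bash
# Sample data shaped like `docker image inspect` output. The digests are
# shortened placeholders, not real layer IDs.
sample_inspect='[{"RootFS":{"Type":"layers","Layers":["sha256:aaa","sha256:bbb"]}}]'

# Print each layer digest on its own line (-r strips the JSON quotes).
list_layers() {
  jq -r '.[0].RootFS.Layers[]'
}

# Print only the number of layers.
count_layers() {
  jq '.[0].RootFS.Layers | length'
}

# Usage (requires jq):
#   printf '%s\n' "$sample_inspect" | list_layers
#   printf '%s\n' "$sample_inspect" | count_layers
# With a real image:
#   docker image inspect postgres:latest | count_layers
```

With the sample data, `list_layers` prints the two placeholder digests and `count_layers` prints 2.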
Multi-stage builds
Single-stage builds
Typical Docker images are created using a single FROM instruction
Each addition to the source image adds space and makes its management more difficult
Consider an application that must be compiled prior to use
You can add all the necessary components to the image, compile it, and then configure the final image for use
FROM ubuntu
RUN apt update
RUN apt install gcc -y
...
RUN make
CMD ["data_app"]
Multi-stage builds
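Multi-stage builds use multiple stages, each started by its own FROM instruction
Typically has one or more build stages, followed by a final stage
Files are copied from an earlier stage with COPY --from=<alias>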
Multi-stage build example
# Create initial build stage
FROM ubuntu AS stage1
# Install compiler and compile code
RUN apt update && apt install gcc -y
...
RUN make
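The build stage above is only half of a multi-stage build. A possible complete Dockerfile, written here through a shell heredoc so it can be saved and built directly, might look like the sketch below. It assumes that `make` produces a binary named `data_app` in `/src`; that path, the install of `make`, the file name `Dockerfile.multistage`, and the image tag are illustrative assumptions rather than details from the slides.

```shell
#!/usr/bin/env bash
# Sketch: write a two-stage Dockerfile. Paths and names are assumptions.
cat > Dockerfile.multistage <<'EOF'
# Build stage: contains the compiler and the source code
FROM ubuntu AS stage1
RUN apt update && apt install -y gcc make
WORKDIR /src
COPY . .
RUN make

# Final stage: a clean base image that receives only the compiled binary
FROM ubuntu
COPY --from=stage1 /src/data_app /usr/local/bin/data_app
CMD ["data_app"]
EOF

# Build it (requires a running Docker daemon and a project with a Makefile):
#   docker build -f Dockerfile.multistage -t data_app:latest .
```

The compiler and source tree stay in `stage1`, so the final image carries only the base image plus the binary.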
Multi-platform builds
Multi-platform?
What does multi-platform mean?
Different OS types
linux
windows
macos
Different CPU architectures
arm64
arm7 (written arm/v7 in Docker platform strings)
Creating multi-platform builds
Is built on multi-stage build behavior
The initial / build stage tends to use cross-compilers and relies on the architecture of the host system
Multi-platform Dockerfile options
Build stage uses the --platform=$BUILDPLATFORM flag
$BUILDPLATFORM represents the platform of the host running the build
Environment variables needed by the build (for example GOOS and GOARCH) can be defined beforehand or set inline with the env command.
Multi-platform example
# Initial stage, using local platform
FROM --platform=$BUILDPLATFORM golang:1.21 AS build
# Copy source into place
WORKDIR /src
COPY . .
# Declare the build arguments that BuildKit sets from the target platform
ARG TARGETOS TARGETARCH
# Compile code using the ARG variables
RUN env GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /final/app .
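The example stops at the build stage. One way to finish it is to add a small final stage that copies the compiled binary out of `build`. The sketch below writes the whole Dockerfile through a shell heredoc. The `alpine` base image, the `/bin/app` destination, the `ENTRYPOINT`, the file name `Dockerfile.multiplatform`, and the added `CGO_ENABLED=0` (so the Go binary is statically linked and runs on a musl-based image) are assumptions, not part of the slides.

```shell
#!/usr/bin/env bash
# Sketch: complete multi-platform Dockerfile with an assumed final stage.
cat > Dockerfile.multiplatform <<'EOF'
# Build stage runs on the native platform of the build host
FROM --platform=$BUILDPLATFORM golang:1.21 AS build
WORKDIR /src
COPY . .
# Build arguments that BuildKit sets from the requested target platform
ARG TARGETOS TARGETARCH
# Cross-compile; CGO_ENABLED=0 is an addition here to get a static binary
RUN env CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /final/app .

# Final stage is pulled for each target platform and receives only the binary
FROM alpine
COPY --from=build /final/app /bin/app
ENTRYPOINT ["/bin/app"]
EOF
```

Because the final `FROM` has no `--platform` flag, it resolves to the target platform of each build, while the compiler in the first stage always runs natively on the build host.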
Building a multi-platform build
To create a multi-platform build, instead of using docker build, we must use docker buildx build with assorted options
docker buildx provides more commands and capabilities than docker build, including the option to specify a platform
Prior to running the build, we must also have a new builder container present. This is done with the docker buildx create --bootstrap --use command.
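Put together, the builder setup and the build itself might look like the following sketch. The platform list and the image name `registry.example.com/team/app:latest` are placeholders. `--push` is used on the assumption that a registry is available, since with Docker's default image store a multi-platform result generally cannot be loaded into the local image list.

```shell
#!/usr/bin/env bash
# Sketch: create a builder and run a multi-platform build.
# The image name and platform list are illustrative placeholders.

build_multi_platform() {
  # Create a BuildKit builder container, start it, and make it the default.
  docker buildx create --bootstrap --use

  # Build for two platforms in one invocation and push the result.
  docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t registry.example.com/team/app:latest \
    --push .
}

# Usage (requires Docker with buildx and push access to the registry):
#   build_multi_platform
```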