#include <seccomp.h>

// allow-all filter: every syscall is logged instead of being blocked
void log_all_syscalls(void) {
  scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_LOG);
  // on an x86_64 host seccomp_init already adds the native arch,
  // so this is effectively a no-op there
  seccomp_arch_add(ctx, SCMP_ARCH_X86_64);
  // raw BPF goes to stdout (fd 1), human-readable PFC to stderr (fd 2)
  seccomp_export_bpf(ctx, 1);
  seccomp_export_pfc(ctx, 2);
  seccomp_release(ctx);
}

int main(int argc, char **argv) {
  log_all_syscalls();
}
```
Building is straightforward. Just remember to link against `libseccomp` with `-lseccomp`.
```bash
gcc main.c -lseccomp
```
Running the above, we get the raw BPF program on stdout (the garbage on the first line below) and the human-readable pseudo filter code on stderr:
```txt
> 5@#
# pseudo filter code start
#
# filter for arch x86_64 (3221225534)
if ($arch == 3221225534)
# default action
action LOG;
# invalid architecture action
action KILL;
#
# pseudo filter code end
#
```
To actually use the filter, we dump the raw BPF to a file and hand it to `bwrap` over a file descriptor:
```bash
#!/usr/bin/dash
TEMP_LOG=/tmp/seccomp_logging_filter.bpf
# capture the raw BPF program that a.out writes to stdout
./a.out > ${TEMP_LOG}
# pass the filter to bubblewrap on fd 9 and start a sandboxed shell
bwrap --seccomp 9 9<${TEMP_LOG} bash
```
Then we can go and see where the logs end up. On my host, they are logged under `/var/log/audit/audit.log` and they look like this:
```
type=SECCOMP msg=audit(1716144132.339:4036728): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036729): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036730): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036731): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036732): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036733): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=14 compat=0 ip=0x7fa5859664f4 code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigprocmask
type=SECCOMP msg=audit(1716144132.339:4036734): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=13 compat=0 ip=0x7fa58591298f code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=rt_sigaction
type=SECCOMP msg=audit(1716144132.339:4036735): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=1 compat=0 ip=0x7fa5859ce5d0 code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=write
type=SECCOMP msg=audit(1716144132.339:4036736): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=1 compat=0 ip=0x7fa5859ce5d0 code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=write
type=SECCOMP msg=audit(1716144132.339:4036737): auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined pid=19633 comm="bash" exe="/usr/bin/bash" sig=0 arch=c000003e syscall=270 compat=0 ip=0x7fa5859d77bc code=0x7ffc0000 AUID="devi" UID="devi" GID="devi" ARCH=x86_64 SYSCALL=pselect6
```
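If `auditd` is running, `ausearch` can pull these events out without grepping the raw file; a quick sketch:
```sh
# show seccomp audit events from the last ten minutes
sudo ausearch -m SECCOMP --start recent
# tally which syscalls were logged, using the decoded SYSCALL= field
sudo ausearch -m SECCOMP --start recent |
  grep -o 'SYSCALL=[a-z0-9_]*' | sort | uniq -c | sort -rn
```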
Docker allows us to do the [same](https://docs.docker.com/engine/security/seccomp/): we can give Docker a seccomp profile that filters out the syscalls a specific container does not need.
You can find the default docker seccomp profile [here](https://github.com/moby/moby/blob/master/profiles/seccomp/default.json).
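Applying a profile is a single flag on `docker run`; here `my_profile.json` is a hypothetical profile in the same JSON format as the default one:
```sh
# run a container under a custom seccomp profile
docker run --security-opt seccomp=./my_profile.json alpine:3.19 sh
# or, for debugging only, disable seccomp filtering entirely
docker run --security-opt seccomp=unconfined alpine:3.19 sh
```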
## Namespaces
```
A namespace wraps a global system resource in an abstraction that makes it appear to the processes
within the namespace that they have their own isolated instance of the global resource. Changes
to the global resource are visible to other processes that are members of the namespace, but are
invisible to other processes. One use of namespaces is to implement containers.
```
From [man 7 namespaces](https://manpages.debian.org/bookworm/manpages/namespaces.7.en.html).
You can think of a Linux namespace as roughly analogous to a namespace in a programming language: it scopes which instances of a resource a process can see.
Docker creates its own namespaces for containers to further isolate the application containers from the host system.
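You can poke at this without Docker at all. A sketch using `unshare` from util-linux: a freshly created network namespace contains nothing but a downed loopback interface, no matter how many interfaces the host has.
```sh
# list interfaces inside a brand-new network namespace: only a down `lo`
sudo unshare --net ip link
# compare with the host's view
ip link
```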
### Namespaces in the Wild
As an example, let's look at the script below. We are creating a new network namespace. The new interface is provided by simply connecting an Android phone for USB tethering. Depending on your setup and your `udev` naming rules the interface name will differ, but the concept is the same. We create a new network namespace for a second internet provider, which in this case is our Android phone, and then execute commands in the context of that namespace. Essentially, we can choose which applications get to use our phone's internet and which ones use whatever we were previously connected to.
```sh
#!/usr/bin/env sh
PHONE_NS=phone_ns
# interface name of the USB-tethered phone; yours will differ
IF=enp0s20f0u6
# create the namespace and move the phone's interface into it
sudo ip netns add ${PHONE_NS}
sudo ip link set ${IF} netns ${PHONE_NS}
# bring up the interface and loopback inside the namespace
sudo ip netns exec ${PHONE_NS} ip link set ${IF} up
sudo ip netns exec ${PHONE_NS} ip link set dev lo up
# lease an IP address from the phone
sudo ip netns exec ${PHONE_NS} dhclient ${IF}
```
```sh
$ sudo ip netns exec phone_ns curl -4 icanhazip.com
113.158.237.102
$ curl -4 icanhazip.com
114.201.132.98
```
**_HINT_**: The IP addresses are made up. The only thing that matters is that they are different.
Since the Android phone's interface lives in another namespace, the two uplinks cannot interfere with each other. This is pretty much how Docker uses namespaces.
Without network namespaces, we would have to make a small VM, run a VPN on the VM, expose a socks5 proxy from the VM to the host, and then have applications pass their traffic through that proxy, with varying degrees of success.
**_NOTE_**: since we are not running the script on a hook, you might knock out your host's networking by having two upstreams active at the same time. In that case, run the script, then restart NetworkManager or whatever network manager you use.
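When you are done, deleting the namespace is enough; the kernel moves physical interfaces back to the default network namespace:
```sh
# tear down: the phone's interface returns to the default namespace
sudo ip netns del phone_ns
```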
## SBOM and Provenance Attestation
What is an SBOM?
NIST defines an SBOM as a "formal record containing the details and supply chain relationships of various components used in building software".
It contains details about the components used to create a certain piece of software.
SBOMs are meant to help mitigate the threat of supply chain attacks (remember xz?).
What is provenance?
```
The provenance attestations include facts about the build process, including details such as:
- Build timestamps
- Build parameters and environment
- Version control metadata
- Source code details
- Materials (files, scripts) consumed during the build
```
[source](https://docs.docker.com/build/attestations/slsa-provenance/)
### Example
Let's review all that we have learned in the form of a light exercise.
For the first build, we use a non-vendored version.
Vendoring means that you store your dependencies locally, which puts you in control of them: you don't need to pull them from a remote, and your build keeps working even if one or more of your dependencies vanishes from that remote. One of the more famous examples is Lua, whose authors actually recommend vendoring your Lua dependency.
Vendoring also helps with build reproducibility.
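In Go, vendoring is a single command; a minimal sketch:
```sh
# copy every module dependency into ./vendor and record them in
# vendor/modules.txt; commit the directory alongside your code
go mod vendor
# newer toolchains use ./vendor automatically; -mod=vendor makes it explicit
go build -mod=vendor
```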
We will use [milla](https://github.com/terminaldweller/milla) as an example. It's a simple Go codebase.
```Dockerfile
FROM alpine:3.19 AS builder
RUN apk update && \
    apk upgrade && \
    apk add go git
WORKDIR /milla
COPY go.sum go.mod /milla/
RUN go mod download
COPY *.go /milla/
RUN go build

FROM alpine:3.19
ENV HOME /home/user
RUN set -eux; \
    adduser -u 1001 -D -h "$HOME" user; \
    mkdir "$HOME/.irssi"; \
    chown -R user:user "$HOME"
COPY --from=builder /milla/milla "$HOME/milla"
RUN chown user:user "$HOME/milla"
ENTRYPOINT ["/home/user/milla"]
```
The first image build is fairly simple: we copy the source code in, fetch our dependencies, and build the executable. For the second stage of the build, we simply put the executable into a fresh base image and we are done.
The second build is a vendored build on a distroless base: we copy over the project's source code along with all of its dependencies and then do the same as before.
```Dockerfile
FROM golang:1.21 AS builder
WORKDIR /milla
COPY go.sum go.mod /milla/
# the vendored dependencies live in ./vendor, so no network access is needed
COPY vendor /milla/vendor/
COPY *.go /milla/
RUN CGO_ENABLED=0 go build -mod=vendor

FROM gcr.io/distroless/static-debian12
COPY --from=builder /milla/milla /usr/bin/milla
ENTRYPOINT ["/usr/bin/milla"]
```
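To build the two images, something like the following should work (the Dockerfile file names here are assumptions; the vendored one matches what the compose file below references):
```sh
# plain alpine-based build
docker build -t milla .
# vendored, distroless build
docker build -f Dockerfile_distroless_vendored -t milla_distroless_vendored .
```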
Below you can see an example docker compose file. Milla can optionally use a postgres database to store messages, and we also include a pgadmin instance. Let's walk through it.
```yaml
services:
  terra:
    image: milla_distroless_vendored
    build:
      context: .
      dockerfile: ./Dockerfile_distroless_vendored
    deploy:
      resources:
        limits:
          memory: 128M
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
    networks:
      - terranet
    user: "1000:1000"
    restart: unless-stopped
    entrypoint: ["/usr/bin/milla"]
    command: ["--config", "/config.toml"]
    volumes:
      - ./config.toml:/config.toml
      - /etc/localtime:/etc/localtime:ro
    cap_drop:
      - ALL
    runtime: runsc
  postgres:
    image: postgres:16-alpine3.19
    deploy:
      resources:
        limits:
          memory: 4096M
    logging:
      driver: "json-file"
      options:
        max-size: "200m"
    restart: unless-stopped
    ports:
      - "127.0.0.1:5455:5432/tcp"
    volumes:
      - terra_postgres_vault:/var/lib/postgresql/data
      - ./scripts/:/docker-entrypoint-initdb.d/:ro
    environment:
      - POSTGRES_PASSWORD_FILE=/run/secrets/pg_pass_secret
      - POSTGRES_USER_FILE=/run/secrets/pg_user_secret
      - POSTGRES_INITDB_ARGS_FILE=/run/secrets/pg_initdb_args_secret
      - POSTGRES_DB_FILE=/run/secrets/pg_db_secret
    networks:
      - terranet
      - dbnet
    secrets:
      - pg_pass_secret
      - pg_user_secret
      - pg_initdb_args_secret
      - pg_db_secret
    runtime: runsc
  pgadmin:
    image: dpage/pgadmin4:8.6
    deploy:
      resources:
        limits:
          memory: 1024M
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
    environment:
      - PGADMIN_LISTEN_PORT=${PGADMIN_LISTEN_PORT:-5050}
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL:-devi@terminaldweller.com}
      - PGADMIN_DEFAULT_PASSWORD_FILE=/run/secrets/pgadmin_pass
      - PGADMIN_DISABLE_POSTFIX=${PGADMIN_DISABLE_POSTFIX:-YES}
    ports:
      - "127.0.0.1:5050:5050/tcp"
    restart: unless-stopped
    volumes:
      - terra_pgadmin_vault:/var/lib/pgadmin
    networks:
      - dbnet
    secrets:
      - pgadmin_pass

networks:
  terranet:
    driver: bridge
  dbnet:

volumes:
  terra_postgres_vault:
  terra_pgadmin_vault:

secrets:
  pg_pass_secret:
    file: ./pg/pg_pass_secret
  pg_user_secret:
    file: ./pg/pg_user_secret
  pg_initdb_args_secret:
    file: ./pg/pg_initdb_args_secret
  pg_db_secret:
    file: ./pg/pg_db_secret
  pgadmin_pass:
    file: ./pgadmin/pgadmin_pass
```
We assign memory limits to each container and cap how much log data we keep on disk.
One thing that we did not talk about before is the networking side of compose.
As you can see, postgres and pgadmin share one network while postgres and milla share another, so milla and pgadmin have no access to each other. This is in line with the principle of least privilege: milla and pgadmin don't need to talk to each other, so they can't.
We also refrain from using host networking.
We bind the published ports to the host's localhost interface, which keeps the endpoints from being reachable directly from outside. In our example the ports don't need to be exposed to the internet, but we still need access to them; binding them to localhost and forwarding them over ssh gives us that, assuming the docker host is a remote:
```sh
ssh -L 127.0.0.1:5460:127.0.0.1:5455 user@remotehost
```
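With the tunnel up, the remote postgres is reachable on local port 5460; the credentials are whatever you put in the secret files (`your_pg_user` and `your_pg_db` below are placeholders):
```sh
# connect through the ssh forward as if the database were local
psql -h 127.0.0.1 -p 5460 -U your_pg_user -d your_pg_db
```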
While building milla, in the second stage of the build we created a non-privileged user, and we now map a non-privileged user on the host to that user. We drop all capabilities from milla: it only makes outbound requests, has no server functionality, and at most binds to high-numbered ports, which requires no special privileges.
We run both postgres and milla with gVisor's runsc runtime, since it is possible to do so.
Finally, we use docker secrets to make the credentials available to the containers.
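Compose mounts each secret as a read-only file under `/run/secrets/` inside the container; the `*_FILE` environment variables above point the postgres image at those files. You can verify this from a running container:
```sh
# secrets show up as files under /run/secrets, not as environment variables
docker compose exec postgres ls -l /run/secrets/
docker compose exec postgres cat /run/secrets/pg_user_secret
```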
Now onto the attestations.
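Attestations are attached at build time. A sketch, assuming you build and push with a recent buildx (the tag matches the one inspected below):
```sh
# build with an SBOM and max-detail SLSA provenance attached
docker buildx build --sbom=true --provenance=mode=max \
  -t terminaldweller/milla:main --push .
```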
In order to view the SBOM for the image we will use docker [scout](https://docs.docker.com/scout/install/).
```sh
docker scout sbom milla
docker scout sbom milla_distroless_vendored
```
The SBOMs can be viewed [here](https://gist.github.com/terminaldweller/8e8ecdcb68d4052aecb6804823648b4d) and [here](https://gist.github.com/terminaldweller/f4ede7122f159506f8e6e6be2bfd6a8b) respectively.
Now let's look at the provenance attestations.
```sh
docker buildx imagetools inspect terminaldweller/milla:main --format "{{ json .Provenance.SLSA }}"
```
And [here](https://gist.github.com/terminaldweller/033ae07a9e685db85b18eb822dea4be3) you can look at the result.
## Further Reading
- [man 7 cgroups](https://manpages.debian.org/bookworm/manpages/cgroups.7.en.html)
- system containers using [lxc/incus](https://github.com/lxc/incus)
- [katacontainers](https://katacontainers.io/)
timestamp:1716163133
version:1.0.0
https://raw.githubusercontent.com/terminaldweller/blog/main/mds/securedocker.md