This blog has been chugging along as a classic static website served by nginx for a few years now. Publishing articles happens automatically on a git push to master. I really like this setup, but it's not perfect. I did a lot of manual configuration to make the publishing work via gitlab CI. I probably should have at least put that configuration into ansible, but I never got around to it. I also forgot I was using my root ssh key to copy the files over (oops).
To the Cloud
I'm currently working on moving from my on-premise gitlab instance to gitlab.com, so I either had to configure a new gitlab runner to handle the blog publishing or find a new solution. I could take advantage of gitlab pages, but I already do other things on the VPS, so pages wouldn't really save me much. I could move to Digital Ocean's managed kubernetes, but that would essentially double my already enormous cost of running the smallest droplet available. So I decided to split the difference and go with a single-node docker swarm on the existing VPS (maybe sneaking in a transition from CentOS 7 to Ubuntu as well). Then, after setting up a test swarm and writing out most of the stack files, I pivoted to k3s. That decision was made a few weeks before Docker 20.10 was released, but even with that release swarm feels like a buggy mess. I also remembered I didn't have to run an all-out kubernetes cluster if I used k3s. It turns out k3s even has a one-step install (though it still requires a couple of follow-up tweaks).
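For reference, the one-step install is just the official script from get.k3s.io, and pointing kubectl at the generated kubeconfig is one of the follow-up tweaks I mean. This is a sketch rather than my literal setup:

```shell
# Official k3s install script: installs k3s and sets it up as a
# systemd service (run as root)
curl -sfL https://get.k3s.io | sh -

# k3s writes its kubeconfig here instead of ~/.kube/config,
# so point kubectl at it before first use
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl get nodes
```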
Why not podman?
I already run some services via podman, so why am I changing to ~docker swarm~ ...er, k3s? Podman is nice for local development, building, testing, and maybe even running a few one-off containers, but it doesn't scale (and isn't really meant to). I also don't care much for writing my own systemd unit files to start/stop the podman containers. I probably don't need to run docker in swarm mode either, but it allows me to use stack files and doesn't really cost anything in terms of performance or configuration.
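As an aside, podman can actually generate those unit files rather than making you write them by hand. A minimal sketch, assuming an existing container named `blog`:

```shell
# Generate a systemd unit for the existing "blog" container
mkdir -p ~/.config/systemd/user
podman generate systemd --name blog > ~/.config/systemd/user/container-blog.service

# Manage it as a user service
systemctl --user daemon-reload
systemctl --user enable --now container-blog.service
```

It still leaves you managing one unit per container, which is the part I'd rather avoid.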
Up to now I've spent more time managing the infrastructure side of docker than the image side, so this also gives me a chance to get more familiar with Dockerfiles and managing images. I'm also trying to stop myself from simply googling my way to a step-by-step article (i.e. "googling docker pelican blog") that I then follow without really learning anything.
I've been looking for a reason to try out caddy server, and since I already have to turn the blog upside down to fit it into a container, I might as well move it from nginx to caddy at the same time. There are two ways I can run my caddy container. The first is to use the standard caddy image and attach a volume with my site's files. The second is to bake my site into an image layered on top of caddy's image. I don't want my site tied to a specific server (or to have to set up storage), so I'm going with option 2. This does mean I have to find a place to store my images, and since Docker Hub is starting to impose pull limits, my personal choice becomes anywhere but there; most likely I'll end up using gitlab's built-in registry. But first I just want to get something working locally to make sure it's going to work out.
Now it's time to create the Dockerfile[link to docker file here]. I know I want to base it off the caddy image and just put in my files.
But I still need to convert my files from markdown to HTML and get them into the image. Of course I could keep running pelican from a python virtualenv on the gitlab runner, but do I really need to mess with a virtualenv when I could just run pelican in a container as well? Sure enough, docker expects this kind of workflow and supports it with multi-stage builds (link to docker docs). So I grab a python image and tell it to do everything I would have done manually on the runner.
```dockerfile
FROM python:3-alpine AS builder
WORKDIR /usr/src/app
ADD content ./content
COPY pelicanconf.py ./
COPY requirements.txt ./
RUN pip install -r requirements.txt
RUN apk add git && \
    git clone https://github.com/gilsondev/pelican-clean-blog.git theme
RUN pelican /usr/src/app/content/ -s pelicanconf.py
```
I realize my original pelican blog config was not much more complicated than this, but having it all in a single file makes the process much easier to follow. It's also fairly generic, the only special bit being the theme I chose to use. Putting everything together, we generate the HTML files and then simply include them in the final caddy image.
```dockerfile
...
RUN pelican /usr/src/app/content/ -s pelicanconf.py

FROM caddy:latest
COPY Caddyfile /etc/caddy/Caddyfile
COPY --from=0 /usr/src/app/public /srv
```
And now I can preview any changes by simply building and running the image locally (or I could use git branches and have CI build the image for me whenever I push a commit). Compare this with my previous CI job, which did not even include setting up the python environment or ssh keys:
```yaml
pages:
  script:
    - source ~/blogpy/bin/activate
    - pip install -r requirements.txt
    - pelican -s pelicanconf.py
    - rsync -av -e ssh --delete public/ digitalocean:/var/www/blog/
  artifacts:
    paths:
      - public/
```
By default caddy wants to automatically configure a TLS certificate with Let's Encrypt. This is awesome, but I plan on hosting multiple websites, so I'd rather let the reverse proxy handle TLS for everything and have caddy stick to plaintext. We simply add a Caddyfile of a few lines and it's all set.
```
:8080
root * /srv
file_server
```
Now we can run `podman build .` followed by `podman run` on the resulting image to see that everything works. One thing to watch out for is that pelican hardcodes the site's URLs, so anything not on the home page requires a bit of tweaking to truly test. Now we're ready to do this via CI.
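Spelled out, the local smoke test amounts to something like this (the `blog` tag, container name, and port mapping are placeholders of my choosing):

```shell
# Build the image and tag it locally
podman build -t blog .

# Run it detached, mapping the Caddyfile's :8080 to the host
podman run --rm -d -p 8080:8080 --name blog-test blog

# Fetch the home page to confirm caddy is serving the generated site
curl -s http://localhost:8080 | head

# Tear it down when done
podman stop blog-test
```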
Building via CI
I was considering running my own container registry, but gitlab has enough available in their free tier, so I'm going to try it out. I decided to use podman instead of a full docker install since I only need to build images.
```shell
yum install -y podman
```
I'm not sure why, but podman behaved differently on my local fedora workstation than on the CentOS 8 build server. I had forgotten to add the gitlab-runner user to /etc/subuid and /etc/subgid, but fixing that didn't completely solve it. A reboot plus clearing out the local container state appears to have done the trick:

```shell
rm -rf .local/share/containers
rm -rf .config/containers
```
I ran into a weird error:
```
Error committing the finished image: error adding layer with blob "sha256:801bfaa63ef2094d770c809815b9e2b9c1194728e5e754ef7bc764030e140cea": ApplyLayer exit status 1 stdout: stderr: potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/shadow): Check /etc/subuid and /etc/subgid: lchown /etc/shadow: invalid argument
```
and realized that rootless podman requires some extra configuration: the build user needs ID ranges in /etc/subuid and /etc/subgid. A gitlab bug led me to the upstream rootless podman instructions for CentOS 7 for the fix.
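For the record, the required entry looks like this in both /etc/subuid and /etc/subgid (the 100000 starting offset is conventional, not mandatory; any range that doesn't overlap another user's works):

```
gitlab-runner:100000:65536
```

After editing those files, `podman system migrate` (or logging out and back in) picks up the new ranges.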
After getting the `podman build` step to succeed, the push failed, likely due to the bug referenced here:

```
Error: error copying image to the remote destination: Error trying to reuse blob sha256:777b2c648970480f50f5b4d0af8f9a8ea798eea43dbcf40ce4a8c7118736bdcf at destination: Requesting bear token: invalid status code from registry 422 (Unprocessable Entity)
```
Later I ran into an issue pulling the image on the kubernetes server. For some reason I had to push the image with `--format docker`, a fix I found in a comment on this bug.
To put it all together, my .gitlab-ci.yml now looks like:
```yaml
stages:
  - build

build:
  stage: build
  script:
    - podman login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - podman build -t $CI_REGISTRY/blcarman/blog:$CI_JOB_ID .
    - podman push --format docker $CI_REGISTRY/blcarman/blog:$CI_JOB_ID
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
```
I'm still being a bit special by using a local runner and podman, but maybe at some point in the future I'll explore the docker-in-docker shared runners.
To be Continued
Above is only the first half of the puzzle. As you can see, nothing about the blog really changed; I'm still running mostly the same commands, just in a slightly different way. The benefit is that I now have a much more standard, portable build process and versioned artifacts. That might not matter much for a blog, but it could be critical for something more tightly coupled to its dependencies. I do have the blog running on k3s now, but I manually configured some things to make that happen. I need to add a deploy stage to the build pipeline, and I also need to codify the configuration changes I made to k3s, most notably configuring the gitlab registry and disabling the built-in traefik in favor of the latest release. I'll try to cover those in another blog post.