Engineering · 4 min read

Integration tests on GitLab CI with Docker

Using docker:dind plus docker compose in GitLab CI makes full integration tests feel local and repeatable.

Look, perhaps I’m easily impressed, but I recently built something that felt like one of the coolest things I’ve done. It was not fancy. It just made integration tests as simple as running a local stack, and that was enough to make the whole team more confident.

The idea: use GitLab CI with docker:dind, spin up a real stack with docker compose, run integration tests, and tear everything down. If your local dev flow already uses compose, you get a CI setup that behaves the same way. The friction to run integration tests drops to almost zero.

The stack

The example stack was straightforward:

  • A Python app
  • A PostgreSQL database
  • An MQTT broker for message flow

Locally, that stack already ran in Docker. The missing piece was running the same stack in CI, reliably, without flaky timing issues.
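The compose file for such a stack might look like this. This is a minimal sketch; the image tags, service names, ports, and credentials here are assumptions for illustration, not the original project’s values.

```yaml
# docker-compose.yml - a sketch of the three-service stack
services:
  app:
    build: .
    depends_on:
      - db
      - mqtt
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
      MQTT_HOST: mqtt
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
  mqtt:
    image: eclipse-mosquitto:2
```

The same file drives both the local workflow and the CI job, which is the whole point.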

The basic flow

The flow is simple and repeatable:

  1. Build the app image.
  2. Start all services with docker compose.
  3. Wait for the stack to be healthy.
  4. Run integration tests.
  5. Tear everything down.

The tests can connect to real services, publish MQTT messages, verify downstream effects, and validate database state as a system. That is the value of integration tests: they verify that the system works, not just that individual units pass in isolation.
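The five steps above fit in one small script you can run locally or in CI. This is a sketch: the compose service name `app` and the wait-script path are assumptions, and `DRY_RUN=1` is a convenience for previewing the commands without Docker.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 is set.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# The full integration-test flow, step by step.
integration_flow() {
  run docker compose build                   # 1. build the app image
  run docker compose up -d                   # 2. start all services
  run ./scripts/wait-for-services.sh         # 3. wait until the stack is healthy
  run docker compose exec -T app pytest -q   # 4. run the integration tests
  run docker compose down -v                 # 5. tear down, volumes included
}

# integration_flow             # run for real
# DRY_RUN=1 integration_flow   # preview the commands
```

Keeping the flow in a script means the CI job and your laptop execute the exact same steps.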

Example GitLab CI job

This is the shape of the job. It uses docker:dind, then calls compose inside the job to start the stack.

integration_tests:
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - docker version
    - docker compose build
    - docker compose up -d
    - ./scripts/wait-for-services.sh
    - docker compose exec -T app pytest -q
  after_script:
    - docker compose down -v

The point is not the exact YAML, but the pattern. Keep CI close to your local workflow. If you can run it locally, you should be able to run it in CI with the same file.

Why docker:dind and compose

I used docker:dind because the local workflow already depended on compose. That mattered more than micro-optimizing the CI job. The payoff is consistency: the same services, the same env vars, the same ports. You can debug locally and trust that CI is not hiding a different topology.

If you do want to optimize later, you can split build and test stages, cache images, or move to GitLab services. But start with the simplest thing that behaves like production, then improve the speed once it is stable.

Test data hygiene

Integration tests are only trustworthy when they are deterministic. That means you should control the data you insert and clean up after each run. I prefer short fixtures and idempotent flows. If a test needs a shared dataset, make it explicit and recreate it inside the job.

One simple rule: assume every CI run starts from zero. That keeps tests honest and makes failures easier to reproduce.

The only real gotcha: health

The main pain point is not Docker. It is timing. CI is faster, slower, or just different from your laptop. Services do not come up in the same order every run.

You need a reliable health check step. A sleep 10 works until it does not. A small script that checks each service and waits until it is ready is the difference between stable CI and flakiness.

For example:

  • Postgres is ready when it accepts a connection and a simple query succeeds.
  • MQTT is ready when it accepts a connection and you can publish a test message.
  • Your app is ready when its health endpoint returns a 200.

Bake those checks into a small wait-for-services.sh script, and the job becomes boring and stable. That is a win.
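A minimal sketch of such a script, built around one generic retry helper. The commented-out checks mirror the bullets above; the service names, ports, and credentials in them are assumptions.

```shell
#!/bin/sh
# wait-for-services.sh - block until each service passes its health check.

# wait_for <description> <timeout-seconds> <command...>
# Retries the command every 2 seconds until it succeeds or the timeout expires.
wait_for() {
  desc=$1; timeout=$2; shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $desc" >&2
      return 1
    fi
    sleep 2
  done
  echo "$desc is ready"
}

# Hypothetical checks matching the list above:
# wait_for "postgres" 60 docker compose exec -T db pg_isready -U app
# wait_for "mqtt"     60 docker compose exec -T mqtt mosquitto_pub -t ci/ping -m ok
# wait_for "app"      60 docker compose exec -T app wget -qO- http://localhost:8000/health
```

A real check per service beats any fixed sleep: the script finishes as soon as everything is up, and fails loudly with a named culprit when something never comes up.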

Why this is worth it

Integration tests are easy to postpone when they are annoying to run. The moment they are one command in CI, teams actually use them. Even if you do not catch new bugs every day, you gain confidence that your system still behaves like a system.

It is also an easy step toward more realistic testing. If you already have compose in your repo, you are most of the way there. The big benefit is consistency. The same stack, the same env vars, the same flow. That keeps tests honest.

Pitfalls to avoid

  • Relying on blind sleeps instead of health checks.
  • Forgetting to tear down volumes, which can leak state into later runs.
  • Mixing test data with long-lived shared environments.
  • Running tests against a partially ready stack and blaming CI for flakiness.
  • Letting integration tests grow without pruning slow cases.

Checklist

  • Does your local compose file represent the real integration stack?
  • Can the CI job run the same compose file without edits?
  • Do you have explicit health checks for each critical service?
  • Are tests idempotent and safe to re-run?
  • Do you always tear down containers and volumes in after_script?
