Pinning your SQLite version across environments

The project I'm currently working on only has a single external dependency: SQLite, with full text search enabled. As a result, the application is extremely easy to package and run. However, I found out that ensuring you have the exact same SQLite version and feature set in all your environments (development machines running macOS and linux, CI, and production) is trickier than I expected.

When you rely on a traditional database server (PostgreSQL, MySQL, MongoDB, etc.), you can achieve this by running the same server version in all your environments.

Docker really shines here, as it lets you do just that in a single command:

$ docker run postgres:15.4

Things are a bit different with SQLite, as it is not an SQL server. It is a library that you embed in your program (either by compiling it alongside your code, or by relying on a shared library and language bindings). Python does the latter: its sqlite3 module is written in C using the CPython API, and includes the sqlite3.h header file. Where do this header and the underlying library come from, though?

Inspecting the sqlite version on linux

If we have a look at a python3.11 installation directory on a random Ubuntu server, we see that it bundles a _sqlite3 shared object, which itself is dynamically linked against libsqlite3.so.0.

$ find /usr/lib/python3.11  -name "*sqlite3*.so"
/usr/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so
$ ldd /usr/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffcda976000)
    libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007fab44d9c000) # <--
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fab44a00000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fab44cb3000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fab44f17000)

Same question: where does /lib/x86_64-linux-gnu/libsqlite3.so.0 come from then?

$ apt-file search /lib/x86_64-linux-gnu/libsqlite3.so.0
libsqlite3-0: /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
libsqlite3-0: /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
$ apt-cache search libsqlite3-0
libsqlite3-0 - SQLite 3 shared library

This means that python relies on whatever libsqlite3 version the system package manager installed. We can double-check this by looking at the python3 package's recursive dependency chain: python3 -> libpython3-stdlib -> libpython3.11-stdlib -> libsqlite3-0.
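
We can also confirm this from within the interpreter itself. The session below reproduces the paths seen on the same Ubuntu server; on minimal systems without ldconfig, find_library may return None.

>>> import _sqlite3
>>> _sqlite3.__file__
'/usr/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so'
>>> from ctypes.util import find_library
>>> find_library("sqlite3")
'libsqlite3.so.0'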

To know what version is installed on that system, we can inspect the version of the libsqlite3-0 apt package:

$ apt-cache show libsqlite3-0 | grep Version
Version: 3.40.1-1

We can check that we're getting this exact version via python:

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("select sqlite_version()").fetchone()
('3.40.1',)
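
The sqlite3 module also exposes the version of the linked library directly, without opening a connection (note that this is the library version, not the version of the Python bindings):

>>> import sqlite3
>>> sqlite3.sqlite_version
'3.40.1'
>>> sqlite3.sqlite_version_info
(3, 40, 1)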

Inspecting the sqlite version on macOS

Assuming you install your packages via brew on macOS, you'll find that it does things a bit differently from apt. The python3 formula depends on sqlite, which downloads a source archive pinned to a specific version (3.43.0 at the time of writing) and then compiles libsqlite3.dylib.

Indeed, we can see this library when inspecting the contents of the sqlite brew package:

~  ls -alh /opt/homebrew/opt/sqlite/lib/libsqlite3.dylib
lrwxr-xr-x    18 br   16 May 15:45  /opt/homebrew/opt/sqlite/lib/libsqlite3.dylib -> libsqlite3.0.dylib

And sure enough, we see that we're running the expected version in python:

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("select sqlite_version()").fetchone()
('3.43.0',)

Pinning the sqlite version by vendoring the compiled library

To pin the sqlite version across all environments and OSes, we can compile these shared/dynamically loaded libraries ourselves for all architectures we plan to support, vendor them in our codebase, and inject them into our application via LD_PRELOAD (or DYLD_LIBRARY_PATH on macOS).

We'd need to cover all the ways we run the app:

  • running make run, which runs the app on the host, against the version of libsqlite3 installed by the host package manager
  • running make docker-run, which runs the application in a docker container, against the libsqlite3 version available through the image OS package manager
  • running make test in CI (GitHub Actions), which runs the tests against the libsqlite3 version made available by the runner OS package manager
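
As a quick illustration of what LD_PRELOAD changes, here is a minimal sketch (a hypothetical helper, not part of the project) that compares the SQLite version Python picks up with and without the vendored library preloaded. It assumes a linux host and a compiled library at ./lib/libsqlite3.so.

# preload_demo.py -- hypothetical sketch (linux only)
import os
import subprocess
import sys

probe = "import sqlite3; print(sqlite3.sqlite_version)"

# default: whatever libsqlite3.so.0 the system package manager installed
subprocess.run([sys.executable, "-c", probe])

# preloaded: symbols from the vendored library take precedence
env = dict(os.environ, LD_PRELOAD=os.path.abspath("lib/libsqlite3.so"))
subprocess.run([sys.executable, "-c", probe], env=env)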

Compiling the sqlite source code into a shared library turned out to be easy, as Simon Willison has already documented the process.

Compiling libsqlite3 for linux

The following script compiles libsqlite3 for linux, with full text search enabled:

# scripts/compile-libsqlite-linux.sh
#!/usr/bin/env bash
set -e

apt-get install -y build-essential wget tcl

# link associated with sqlite 3.42.0, found on https://www.sqlite.org/src/timeline?t=version-3.42.0
# pointing to https://www.sqlite.org/src/info/831d0fb2836b71c9
sqlite_ref=831d0fb2
wget https://www.sqlite.org/src/tarball/${sqlite_ref}/SQLite-${sqlite_ref}.tar.gz
tar -xzvf SQLite-${sqlite_ref}.tar.gz
pushd SQLite-${sqlite_ref}

CPPFLAGS="-DSQLITE_ENABLE_FTS5" ./configure
make

popd
mkdir -p ./lib  # make sure the destination directory exists
mv SQLite-${sqlite_ref}/.libs/libsqlite3.so ./lib/
rm -r SQLite-${sqlite_ref}.tar.gz SQLite-${sqlite_ref}
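
Once compiled, we can sanity-check the resulting shared object before wiring it into anything. The following sketch (a hypothetical helper, assuming the library ended up in ./lib/) loads it directly with ctypes and inspects its version and compile options:

# check_vendored_lib.py -- hypothetical helper: inspect the vendored library directly
import ctypes

lib = ctypes.CDLL("./lib/libsqlite3.so")

# sqlite3_libversion() returns the library version as a C string
lib.sqlite3_libversion.restype = ctypes.c_char_p
print(lib.sqlite3_libversion().decode())  # expected: 3.42.0

# sqlite3_compileoption_used() returns 1 if the given option was set at compile time
lib.sqlite3_compileoption_used.argtypes = [ctypes.c_char_p]
print(bool(lib.sqlite3_compileoption_used(b"ENABLE_FTS5")))  # expected: True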

Compiling libsqlite3 for macOS

The following script compiles libsqlite3 for macOS, with full text search enabled:

# scripts/compile-libsqlite-macos.sh
#!/usr/bin/env bash

set -eu

sqlite_version=3420000

wget https://www.sqlite.org/2023/sqlite-amalgamation-${sqlite_version}.zip
unzip sqlite-amalgamation-${sqlite_version}.zip
pushd sqlite-amalgamation-${sqlite_version}

gcc -dynamiclib sqlite3.c -o libsqlite3.0.dylib -lm -lpthread -DSQLITE_ENABLE_FTS5

popd
mkdir -p ./lib  # make sure the destination directory exists
mv sqlite-amalgamation-${sqlite_version}/libsqlite3.0.dylib ./lib/
rm -r sqlite-amalgamation-${sqlite_version}.zip sqlite-amalgamation-${sqlite_version}

Compiling the right version on-demand

We then define a $(libsqlite) make variable, pointing to lib/libsqlite3.so when running the app on linux and to lib/libsqlite3.0.dylib when running it on macOS, each backed by a target that invokes the matching compilation script. Finally, we make sure to override the system shared library with the vendored one when running the app, via LD_PRELOAD on linux and DYLD_LIBRARY_PATH on macOS.

# Makefile

UNAME_S := $(shell uname -s)
PWD := $(shell pwd)
ifeq ($(UNAME_S),Linux)
    libsqlite = lib/libsqlite3.so
    ld_preload = LD_PRELOAD=$(PWD)/$(libsqlite)
else ifeq ($(UNAME_S),Darwin)
    libsqlite = lib/libsqlite3.0.dylib
    ld_preload = DYLD_LIBRARY_PATH=$(PWD)/lib
endif

app-root = dnd5esheets
poetry-run = $(ld_preload) poetry run

lib/libsqlite3.so:
    @./scripts/compile-libsqlite-linux.sh

lib/libsqlite3.0.dylib:
    @./scripts/compile-libsqlite-macos.sh

build: $(libsqlite) ...

test:
    @$(poetry-run) pytest

run: build ...
    @cd $(app-root) && $(poetry-run) uvicorn --factory $(app-root).app:create_app --reload

Compiling libsqlite3 in docker

While the previous steps work, they also prove quite brittle, as they only work for a given CPU architecture. For example, the libsqlite3.0.dylib library will not load on an Intel Mac if it was compiled on an M1 or M2.

The most robust approach remains building libsqlite3 in a build stage of the docker image. This way, you only need to build it for linux, whatever the host OS is, and you're guaranteed that it will be built for your CPU architecture, thanks to the multi-arch nature of the python:3.11.4-slim base image.

# Dockerfile
...
# -- Build the libsqlite3.so shared object for the appropriate architecture
FROM python:3.11.4-slim AS sqlite-build

WORKDIR /app/src/build

COPY scripts/compile-libsqlite-linux.sh ./
RUN apt-get update && \
    apt-get install --no-install-recommends -y build-essential wget tcl && \
    ./compile-libsqlite-linux.sh && \
    apt-get remove -y build-essential wget tcl && \
    apt-get auto-clean


# -- Main build combining the FastAPI and compiled frontend apps
FROM python:3.11.4-slim
...
COPY --from=sqlite-build /app/src/build/lib/libsqlite3.so ./lib/libsqlite3.so
CMD ["./start-app.sh"]

The start-app.sh entrypoint is then responsible for injecting LD_PRELOAD before starting the server:

# start-app.sh
#!/bin/bash

set -e

# inject the LD_PRELOAD environment variable into the process environment
exec \
    env LD_PRELOAD=./lib/libsqlite3.so \
    uvicorn --factory dnd5esheets.app:create_app --host "0.0.0.0" --port 8000

Unit testing the SQLite version and feature set

With all of that said and done, we can now expose the sqlite version and compilation options through a debug API handler:

# dnd5esheets/api/debug.py
from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from dnd5esheets.db import create_scoped_session

debug_api = APIRouter(prefix="/debug")


@debug_api.get("/sqlite")
async def sqlite_info(
    session: AsyncSession = Depends(create_scoped_session),
):
    """Return debug information about the sqlite database"""
    version = (await session.execute(text("select sqlite_version()"))).scalar_one()
    pragma_compile_options = (
        (await session.execute(text("pragma compile_options"))).scalars().all()
    )
    return {
        "version": version,
        "compile_options": pragma_compile_options,
    }

We can then query the sqlite version through the API:

$ curl -s localhost:8000/api/debug/sqlite | jq .version
"3.42.0"

However, we can go even further! By unit-testing the version and compile options, we ensure that our CI uses the exact required sqlite version and feature set.

# dnd5esheets/tests/test_api_debug.py

def test_sqlite_version(client):
    sqlite_debug_info = client.get("/api/debug/sqlite").json()
    assert sqlite_debug_info["version"] == "3.42.0"
    assert "ENABLE_FTS5" in sqlite_debug_info["compile_options"]

See the effect of vendoring the compiled library in CI: before / after.
