Introducing schemamacros

Posted on Thu 02 August 2018 in projects

I've built large systems in PostgreSQL, sometimes unintentionally, sometimes out of necessity. The same issues tend to appear in every large SQL codebase I've worked on or seen:

  1. Code duplication everywhere
  2. Poor or no tests, unclear how to verify correctness
  3. Hard to debug stored procedures

There are many engineering limitations in SQL and PL/pgSQL that affect how willing we are to use the full power of these tools. Modern stylesheet development often happens through a preprocessor to solve the duplication problem, and I believe that a lot of SQL code bases could benefit from the same pattern.

Schemamacros is an experiment to see how PostgreSQL application development can be improved with the jinja2 templating engine.

Schemamacros is a thin wrapper on top of jinja2 that runs it as a CLI. It also adds a YAML-based configuration format and enough conventions to allow SQL libraries to be hosted in the Python package repositories.

Minitutorial

I've made a very simple example project available on my GitHub; it looks like this:

.
├── build/
├── schema/
│   └── main.sql
├── Readme.md
├── requirements.txt
└── sm-config.yml

2 directories, 4 files

sm-config.yml

Besides the boilerplate you need a configuration file, sm-config.yml. For this project it looks like this:

version: "1.2"

template_directories:
  - "schema"
template_packages:
  - "schemamacros-logging"

variables:
  debug: False

targets:
  build/schema.debug.sql:
    schema_template: "main.sql"
    transaction: True
    variables:
      LOGGING_LEVEL: DEBUG

  build/schema.prod.sql:
    schema_template: "main.sql"
    transaction: True
    variables:
      LOGGING_LEVEL: WARNING

First off we need to specify where we store the template files in this project:

template_directories:
  - "schema"

In this example we also use a very basic logging module I made, to showcase that we can package SQL and put it on PyPI. I'll write more about that later, but this is how we make sure that a package is available for use in our templates.

template_packages:
  - "schemamacros-logging"

We can pass in global variables: root-level variables in the config act as defaults and are overridden by more specific parts of the config, or by a jinja {% set ... %} block.

variables:
  debug: False
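The precedence rule amounts to an ordinary dictionary merge. A minimal sketch (illustrative only, not schemamacros' actual implementation):

```python
# Root-level variables are defaults; per-target variables override them.
# (Sketch of the precedence rule, not schemamacros' internal code.)

def resolve_variables(global_vars, target_vars):
    merged = dict(global_vars)   # root-level defaults first
    merged.update(target_vars)   # more specific settings win
    return merged

global_vars = {"debug": False}
target_vars = {"LOGGING_LEVEL": "DEBUG"}

print(resolve_variables(global_vars, target_vars))
# {'debug': False, 'LOGGING_LEVEL': 'DEBUG'}
```

A {% set ... %} block inside a template would then take precedence over both levels of the config.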

Now let's look at the targets. Every output path gets its own definition of how it's built, by setting the base template and override variables. This means we can easily produce debug and release builds from the same source if desired.

In this example we set LOGGING_LEVEL so that our schema.prod.sql will not put too much in our logs.

targets:
  build/schema.debug.sql:
    schema_template: "main.sql"
    variables:
      LOGGING_LEVEL: DEBUG

  build/schema.prod.sql:
    schema_template: "main.sql"
    variables:
      LOGGING_LEVEL: WARNING

schema/main.sql

The actual template in this example is very simple:

{% import 'logging.sql' as log with context  %}

{{ log.CRITICAL("critical message") }}
{{ log.INFO("just some info") }}

There is no actual SQL in this file. However, the imported file, provided by the package schemamacros-logging, does contain some PL/pgSQL stubs, and when we build the targets with sm-compile we get the following contents in our output files:

$ sm-compile
$ cat build/schema.debug.sql


RAISE NOTICE 'CRITICAL: %', $LOG$critical message$LOG$;
RAISE NOTICE 'INFO: %', $LOG$just some info$LOG$;
$ cat build/schema.prod.sql


RAISE NOTICE 'CRITICAL: %', $LOG$critical message$LOG$;
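The filtering above boils down to a severity comparison at compile time: a call survives only if its level is at or above the target's LOGGING_LEVEL. A stand-in sketch of that logic (the real schemamacros-logging package implements this as jinja2 macros, and the function name here is my own):

```python
# Compile-time log filtering: messages below the configured
# LOGGING_LEVEL are simply dropped from the output file.
# (Hypothetical Python stand-in for the package's jinja2 macros.)

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

def emit(level, message, logging_level):
    """Return a RAISE NOTICE statement, or '' if filtered out."""
    if LEVELS.index(level) < LEVELS.index(logging_level):
        return ""  # below the threshold: compiled away entirely
    return f"RAISE NOTICE '{level}: %', $LOG${message}$LOG$;"

# The debug build keeps both statements ...
print(emit("CRITICAL", "critical message", "DEBUG"))
print(emit("INFO", "just some info", "DEBUG"))
# ... while the production build drops INFO.
print(emit("INFO", "just some info", "WARNING"))  # prints an empty line
```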

Tool integration

The CLI sm-compile is built to work well with Makefiles, and if you pass a target path as an argument to sm-compile, only that target will be built. The goal is not to build a full suite for unit testing and developer workflow in the schemamacros project; I'd rather have this be a small project that does one thing well.
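Since each target maps one output path to its inputs, a Makefile rule can rebuild only what changed. A sketch using the example project's paths (the rule is my own, not something shipped with schemamacros):

```make
# Rebuild one output when its template or the config changes.
build/schema.prod.sql: schema/main.sql sm-config.yml
	sm-compile build/schema.prod.sql
```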

That's it for this time. I will probably add a more functional example that showcases how the workflow can look in practice.