Becoming a Jinja Ninja
Jinja is the templating engine used within Airflow.
For those of us coming to Airflow without a background in web development (e.g. with Flask), the power of Jinja can be quite hidden. At the time of writing, none of the examples in the Airflow documentation show off anything more than:
{{ my_var }}
or {{ my_func(a_variable) }}
This misses out on loads of awesome functionality that can save you so much time and make your code much more readable!
Additionally, from Python 3.6 onwards, this sort of templating is easy with f-strings:
my_template=f'{my_var}'
or my_template=f'{my_func(a_variable)}'
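For these simple cases the two approaches really are equivalent. The minimal sketch below assumes the `jinja2` package, which Airflow itself depends on. `my_var`, `my_func` and `a_variable` are hypothetical stand-ins, and `my_func` just wraps a value in double quotes.

```python
from jinja2 import Template

my_var = "schema.table"
a_variable = "col1"


def my_func(value):
    # hypothetical helper: wrap a column name in double quotes
    return f'"{value}"'


# f-string versions: evaluated immediately by Python
f_var = f"{my_var}"
f_func = f"{my_func(a_variable)}"

# Jinja versions: the template is defined first, and the values (even functions) are passed at render time
j_var = Template("{{ my_var }}").render(my_var=my_var)
j_func = Template("{{ my_func(a_variable) }}").render(my_func=my_func, a_variable=a_variable)

print(f_var, j_var)    # schema.table schema.table
print(f_func, j_func)  # "col1" "col1"
```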
So why bother using Jinja at all? Happily, because Jinja offers so much more, in a far more readable format than string manipulation in vanilla Python.
Filters
If columns = ['col1', 'col2', 'col3'] and my_table = 'schema.table'
then we can easily concatenate the list of columns, e.g.
SELECT {{ columns|join(', ') }}
FROM {{ my_table }}
would be:
SELECT col1, col2, col3
FROM schema.table
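Here is a minimal sketch of rendering this with the `jinja2` package directly. It assumes `my_table` holds the string `'schema.table'`. Because this runs outside Airflow, the values are passed to `render` by hand.

```python
from jinja2 import Template

columns = ["col1", "col2", "col3"]
my_table = "schema.table"  # assumed value, matching the rendered output above

template = Template(
    "SELECT {{ columns|join(', ') }}\n"
    "FROM {{ my_table }}"
)

# join concatenates the list items with the given separator
sql = template.render(columns=columns, my_table=my_table)
print(sql)
```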
Other built-in filters are listed in the Jinja documentation.
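This minimal sketch shows a few of the other built-in filters in action, using hypothetical values:

- `upper`
- `default`, for values that were never passed in
- `length`
- `map`, which applies a filter to every item
- `replace`

Filters can be chained with `|`, just like `join`.

```python
from jinja2 import Template

template = Template(
    "{{ table_name|upper }}\n"
    "{{ missing_value|default('N/A') }}\n"  # missing_value is never passed in
    "{{ columns|length }}\n"
    "{{ columns|map('upper')|join(', ') }}\n"  # chain: upper-case each item, then join
    "{{ 'my-table'|replace('-', '_') }}"
)

output = template.render(table_name="orders", columns=["col1", "col2", "col3"])
print(output)
```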
Statements
Using the {% ... %} syntax allows for some very powerful logic within the template.
Conditions (if statements)
SELECT *
FROM {{ my_table }}
{% if my_filter %}
WHERE {{ my_filter }}
{% endif %}
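Here is a minimal sketch of rendering this template with `jinja2`. It assumes `trim_blocks=True`, so the `{% if %}` and `{% endif %}` lines don't leave blank lines behind. Airflow's own environment settings may differ. Leaving `my_filter` out entirely also works, because an undefined variable is falsy in Jinja.

```python
from jinja2 import Environment

# trim_blocks removes the first newline directly after a {% ... %} tag
env = Environment(trim_blocks=True)

template = env.from_string(
    "SELECT *\n"
    "FROM {{ my_table }}\n"
    "{% if my_filter %}\n"
    "WHERE {{ my_filter }}\n"
    "{% endif %}\n"
)

with_filter = template.render(my_table="schema.table", my_filter="col1 > 0")
without_filter = template.render(my_table="schema.table")  # my_filter is undefined, so the if is false

print(with_filter)
print(without_filter)
```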
Loops
If columns = ['col1', 'col2', 'col3']
then we can do more complicated formatting e.g.
SELECT
{% for col in columns %}
{{ col }}{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ my_table }}
would be:
SELECT
col1,
col2,
col3
FROM schema.table
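With the template exactly as written, Jinja's default settings keep the newlines around each `{% ... %}` tag. The real output therefore contains a few blank lines, which is harmless for SQL. One way to get exactly the output shown above is Jinja's whitespace control: a `-` inside a tag, as in `-%}`, strips the whitespace on that side of the tag. The minimal sketch below assumes `my_table = 'schema.table'`.

```python
from jinja2 import Template

TEMPLATE = (
    "SELECT\n"
    "{% for col in columns -%}\n"  # -%} strips the newline after the for tag
    "{{ col }}{% if not loop.last %},{% endif %}\n"
    "{% endfor -%}\n"  # -%} strips the newline before FROM
    "FROM {{ my_table }}"
)

sql = Template(TEMPLATE).render(columns=["col1", "col2", "col3"], my_table="schema.table")
print(sql)
```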
The loop version can be much more readable for long lists than the |join(', ') filter.
Particularly useful is the {% if not loop.last %},{% endif %} pattern, which uses the special loop variable built in to Jinja. Other loop helpers are available too.
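For example, `loop.first` is handy for building a `WHERE ... AND ...` chain from a list of conditions. Other attributes of the special `loop` variable include `loop.index` (1-based), `loop.index0`, `loop.revindex` and `loop.length`. In the minimal sketch below, `filters` is a hypothetical list of SQL conditions.

```python
from jinja2 import Template

template = Template(
    "SELECT *\n"
    "FROM {{ my_table }}"
    "{% for f in filters %}\n"
    "{{ 'WHERE' if loop.first else '  AND' }} {{ f }}"  # first condition gets WHERE, the rest AND
    "{% endfor %}"
)

# hypothetical filter conditions
filters = ["col1 > 0", "col2 IS NOT NULL"]
sql = template.render(my_table="schema.table", filters=filters)
print(sql)

# an empty list simply produces no WHERE clause at all
no_filters = template.render(my_table="schema.table", filters=[])
```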
Single (variable) Object Pattern
Consider the template
SELECT {{ columns|join(', ') }}
FROM {{ schema }}.{{ name }}
with
my_template.render(
columns=columns,
schema=schema,
name=name,
)
vs this template
SELECT {{ table.columns|join(', ') }}
FROM {{ table.schema }}.{{ table.name }}
with
my_template.render(table=table)
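This minimal sketch puts the two approaches side by side, using a hypothetical `Table` dataclass as the “model” object. Jinja's `.` lookup falls back to item lookup. That means a plain dict with `schema`, `name` and `columns` keys would work in place of the dataclass too.

```python
from dataclasses import dataclass, field
from typing import List

from jinja2 import Template


@dataclass
class Table:
    # hypothetical "model" object holding everything a template needs about one table
    schema: str
    name: str
    columns: List[str] = field(default_factory=list)


table = Table(schema="schema", name="table", columns=["col1", "col2", "col3"])

# one keyword argument per value
separate_vars = Template(
    "SELECT {{ columns|join(', ') }}\n"
    "FROM {{ schema }}.{{ name }}"
).render(columns=table.columns, schema=table.schema, name=table.name)

# a single object, with attributes looked up inside the template
single_object = Template(
    "SELECT {{ table.columns|join(', ') }}\n"
    "FROM {{ table.schema }}.{{ table.name }}"
).render(table=table)

print(single_object)
```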
Pros
- Adding a new variable requires no change to the render call, just adding the attribute to the “model” object if it doesn’t already exist, e.g.
SELECT {{ table.columns|join(', ') }}
FROM {{ table.schema }}.{{ table.name }}
LIMIT {{ table.limit }}
- If the template uses variables from multiple “things”, it’s clearer which value belongs to which, without needing a huge number of separate variables, e.g.
SELECT *
FROM {{ table.schema }}.{{ table.name }} AS t1
LEFT JOIN {{ table_2.schema }}.{{ table_2.name }} AS t2
ON t1.{{ table.id_col }} = t2.{{ table_2.f_key_col }}
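This minimal sketch illustrates both pros. It uses a hypothetical `Table` dataclass with `limit`, `id_col` and `f_key_col` fields. The table names and columns are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

from jinja2 import Template


@dataclass
class Table:
    # hypothetical "model" object; new attributes can be added without touching render()
    schema: str
    name: str
    columns: List[str] = field(default_factory=list)
    limit: Optional[int] = None
    id_col: str = "id"
    f_key_col: str = ""


orders = Table(schema="sales", name="orders", columns=["order_id", "total"], limit=10, id_col="order_id")
items = Table(schema="sales", name="order_items", f_key_col="order_id")

# Pro 1: the new LIMIT needs no change to the render call
limited = Template(
    "SELECT {{ table.columns|join(', ') }}\n"
    "FROM {{ table.schema }}.{{ table.name }}\n"
    "LIMIT {{ table.limit }}"
).render(table=orders)

# Pro 2: with two tables it is obvious which value belongs to which
joined = Template(
    "SELECT *\n"
    "FROM {{ table.schema }}.{{ table.name }} AS t1\n"
    "LEFT JOIN {{ table_2.schema }}.{{ table_2.name }} AS t2\n"
    "ON t1.{{ table.id_col }} = t2.{{ table_2.f_key_col }}"
).render(table=orders, table_2=items)

print(limited)
print(joined)
```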
Cons
- ???
Conclusion
Combining these simple features can lead to some very elegant templates.
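As a rough illustration, here is one hypothetical template that combines the single object pattern, a loop using `loop.last`, and an optional `LIMIT` guarded by an `if`. The `Table` dataclass and its fields are assumptions made for this sketch. The `-` markers trim the whitespace around the loop tags.

```python
from dataclasses import dataclass, field
from typing import List, Optional

from jinja2 import Template


@dataclass
class Table:
    # hypothetical model object for the sketch
    schema: str
    name: str
    columns: List[str] = field(default_factory=list)
    limit: Optional[int] = None


TEMPLATE = Template(
    "SELECT\n"
    "{% for col in table.columns -%}\n"
    "{{ col }}{% if not loop.last %},{% endif %}\n"
    "{% endfor -%}\n"
    "FROM {{ table.schema }}.{{ table.name }}"
    "{% if table.limit %}\nLIMIT {{ table.limit }}{% endif %}"  # only rendered when limit is set
)

full = TEMPLATE.render(table=Table("schema", "table", ["col1", "col2"], limit=100))
no_limit = TEMPLATE.render(table=Table("schema", "table", ["col1", "col2"]))
print(full)
```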
These were just a few of the things I found super useful for creating templates, mostly for ELT workflows with SQL in a data warehouse.
See the Jinja template documentation for a more thorough overview.