Advanced Aspects of Protein
The Collapse Rules
Definition
The Collapse Rules define how Protein reduces empty sequences or mappings into simpler structures. They govern how Protein constructs produce final YAML structures.
Important
The Collapse Rules are key to understanding how data produced by Protein constructs is processed by
other Protein constructs (.do and .foreach).
These rules are central to Protein, because they ensure that it behaves in a way that least surprises users.
.do producing sequences
A sequence produced by a .do construct collapses according to the following principles:
- Empty sequence → None: an empty list represents the absence of a value.
- Single‑element sequence → the element itself: when a construct yields exactly one item, it is returned directly, without wrapping.
- Otherwise → no collapse: if the sequence meets neither of the above conditions, it is returned unchanged.
1. Empty sequence → None
Input
items: []
Collapsed result
items: null
2. Single‑element sequence → the element itself
Input
items:
- apple
Collapsed result
items: apple
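The three .do rules can be expressed as a minimal Python model of the collapse step. This is an illustrative sketch, not Protein's implementation: the function name collapse is hypothetical, and a Protein sequence is assumed to correspond to a Python list.

```python
from typing import Any, List


def collapse(seq: List[Any]) -> Any:
    """Hypothetical model of the .do collapse rules (not Protein's own code)."""
    if len(seq) == 0:
        return None          # empty sequence -> None (YAML null)
    if len(seq) == 1:
        return seq[0]        # single element -> the element itself
    return seq               # otherwise -> unchanged


print(collapse([]))                  # None
print(collapse(["apple"]))           # apple
print(collapse(["apple", "pear"]))   # ['apple', 'pear']
```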
.foreach and the Collapse Rule
.foreach always returns a sequence
Although .foreach contains a .do sequence, it always returns a list.
.foreach is, by nature, an operation on sequences (tables, lists, etc.).
The .do block collapses its results, but if a single result remains, .foreach still returns
it as a list of one element.
Example
result:
  .foreach:
    .values: [x, [1]]
    .do:
      - "{{x}}"
Will result in:
result:
- 1
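The difference between the per-iteration collapse and the outer list can be sketched in Python. This is a hypothetical model, not Protein's code: foreach and collapse are illustrative names, and the .do body is assumed to behave like a function that returns a list for each value.

```python
from typing import Any, Callable, Iterable, List


def collapse(seq: List[Any]) -> Any:
    """Hypothetical .do collapse: [] -> None, [x] -> x, else unchanged."""
    if not seq:
        return None
    return seq[0] if len(seq) == 1 else seq


def foreach(values: Iterable[Any], do: Callable[[Any], List[Any]]) -> List[Any]:
    """Hypothetical .foreach: collapse each iteration, but never the final list."""
    results = []
    for x in values:
        results.append(collapse(do(x)))  # .do output is collapsed per iteration
    return results                       # the outer list is always kept


# Mirrors the example above: one value, and .do yields a one-item sequence.
print(foreach([1], lambda x: [x]))   # [1]
print(collapse([1]))                 # 1
```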
.foreach producing a collected mapping
However, the .foreach loop does one additional thing: when the .do construct produces a sequence of mappings of cardinality 1 (single-key mappings), .foreach collects them into a single merged mapping.
Example
Input
.local:
  users:
    - { id: 1, name: joe }
    - { id: 2, name: jill }
accounts:
  .foreach:
    .values: [u, "{{ users }}"]
    .do:
      "{{ u.name }}":
        id: "{{ u.id }}"
Intermediate (before collecting)
accounts:
- { joe: { id: 1 } }
- { jill: { id: 2 } }
Each element is a mapping with exactly one key → can be collected.
Collected result
accounts:
  joe:
    id: 1
  jill:
    id: 2
How to disable collected mapping
If you wish to preserve the list of single-key mappings,
set the .collect_mappings attribute to false:
.local:
  users:
    - { id: 1, name: joe }
    - { id: 2, name: jill }
accounts:
  .foreach:
    .values: [u, "{{ users }}"]
    .collect_mappings: false
    .do:
      "{{ u.name }}":
        id: "{{ u.id }}"
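The collecting step, and the effect of turning it off, can be modelled in Python. This is a hypothetical sketch, not Protein's implementation: collect_mappings is an illustrative function whose enabled parameter stands in for the .collect_mappings attribute, and its handling of repeated keys (last value wins) and of empty lists are assumptions.

```python
from typing import Any, Dict, List


def collect_mappings(items: List[Any], enabled: bool = True) -> Any:
    """Hypothetical model of .foreach collecting single-key mappings."""
    if not enabled:
        return items  # .collect_mappings: false -> keep the list as-is
    if items and all(isinstance(i, dict) and len(i) == 1 for i in items):
        merged: Dict[Any, Any] = {}
        for item in items:
            merged.update(item)  # assumption: a repeated key keeps the last value
        return merged
    return items  # any other shape is left untouched


intermediate = [{"joe": {"id": 1}}, {"jill": {"id": 2}}]
print(collect_mappings(intermediate))
# {'joe': {'id': 1}, 'jill': {'id': 2}}
print(collect_mappings(intermediate, enabled=False))
# [{'joe': {'id': 1}}, {'jill': {'id': 2}}]
```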
How to escape expressions
The issue
When Protein renders a file, it uses Jinja as its templating engine to interpret keys and values.
This means that every occurrence of {{ ... }} is treated as a Jinja expression and evaluated before Protein produces the final YAML output.
However, the system that ultimately consumes the generated YAML—such as GitHub Actions—may also have its own templating syntax. GitHub Actions, for example, uses the form ${{ ... }} for its expressions. Because this syntax contains Jinja’s own {{ ... }} pattern, Jinja will try to evaluate the inner part first.
Consider this typical GitHub Actions snippet:
steps:
  - name: Show GitHub ref
    run: 'echo "Current ref is ${{ github.ref }}"'
If this appears inside a Protein template, Jinja will intercept the {{ github.ref }} portion, attempt to evaluate it, and almost certainly fail—preventing the correct GitHub expression from ever reaching GitHub Actions.
One-off solution
To prevent Jinja from interpreting GitHub’s ${{ ... }} expressions, you must explicitly tell it to treat that part of the template as literal text.
One solution, as described in the Jinja documentation, is to wrap the affected section in a {% raw %} / {% endraw %} block.
Everything inside this block is passed through unchanged, allowing Protein to output the exact ${{ github.ref }} expression that GitHub Actions expects.
It will work only once
This is a one-off escape: the {% raw %} block is consumed the first time the string is rendered. If the same string is interpreted a second time in an expression, Protein will attempt to apply the template to it.
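Why the escape works only once can be shown with a toy renderer in Python. It is a deliberately tiny stand-in for Jinja, written only for this illustration: it understands just {{ name }} or {{ a.b }} substitutions and exact {% raw %}…{% endraw %} blocks, and it raises KeyError where Jinja would report an undefined variable.

```python
import re

# Toy stand-in for Jinja: only {{ name }} / {{ a.b }} and exact {% raw %} blocks.
VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")
RAW = re.compile(r"\{% raw %\}(.*?)\{% endraw %\}", re.DOTALL)


def lookup(path, ctx):
    value = ctx
    for part in path.split("."):
        value = value[part]  # KeyError stands in for Jinja's undefined error
    return value


def substitute(text, ctx):
    return VAR.sub(lambda m: str(lookup(m.group(1), ctx)), text)


def render(text, ctx):
    out, pos = [], 0
    for m in RAW.finditer(text):
        out.append(substitute(text[pos:m.start()], ctx))
        out.append(m.group(1))  # raw content passes through verbatim
        pos = m.end()
    out.append(substitute(text[pos:], ctx))
    return "".join(out)


template = 'echo "Current ref is {% raw %}${{ github.ref }}{% endraw %}"'
once = render(template, {})
print(once)  # echo "Current ref is ${{ github.ref }}"
try:
    render(once, {})  # the raw markers are gone, so {{ github.ref }} is evaluated
except KeyError as exc:
    print("second render fails, undefined:", exc)
```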
Example: Protein‑idiomatic GitHub workflow generator
.local:
  workflow_name: Example Workflow
name: "{{ workflow_name }}"
on:
  push:
    branches: [ main ]
jobs:
  demo:
    runs-on: ubuntu-latest
    steps:
      - name: Show GitHub ref
        run: |
          {% raw %}
          echo "Current ref is ${{ github.ref }}"
          {% endraw %}
- .local appears first and contains only data, not logic.
- The template uses Protein interpolation ({{ workflow_name }}) where appropriate.
- GitHub’s own ${{ … }} syntax is preserved via {% raw %}.
- No unnecessary quoting, no heredocs, no shell tricks: just pure Protein.
Permanent solution to avoid interpretation
To guarantee that a string will never be used as a template,
put the #!literal prefix in front of it, for example:
.define:
  text: "#!literal Hello {{ name }}"
You can also use a Jinja expression that contains the template, with the quote filter:
.define:
  text: "{{ 'Hello {{ name }}' | quote }}"
This guarantees that Protein will never treat that string as a template until the value is exported:
- When the string is output as YAML, JSON, etc., it appears without the prefix (Hello {{ name }}).
- The .write_buffer construct also strips the prefix.
Example: Protein‑idiomatic multi‑workflow generator
.define:
  workflows:
    - { name: build, version: 0.0.1 }
    - { name: release, version: 0.0.2 }
.foreach:
  .values: [w, "{{ workflows }}"]
  .do:
    "{{ w.name }}.yml":
      name: "{{ w.name | capitalize }}"
      on:
        push:
          branches: [ main ]
      jobs:
        demo:
          runs-on: ubuntu-latest
          steps:
            - name: Show GitHub value
              run: |
                #!literal
                echo "Value is ${{ github.ref }}"
- .define comes first: it binds the two mappings (build and release) to the workflows key.
- Each iteration of .foreach produces a file named "{{ w.name }}.yml".
- #!literal ensures that GitHub’s ${{ … }} syntax survives untouched.
- .foreach produces a sequence of single-key mappings, which the collapse rule collects into a single mapping of workflow files.
The result is a directory‑like structure, with one key per workflow file:
build.yml:
  name: Build
  ...
release.yml:
  name: Release
  ...
Evaluation
What if you need to turn a literal template into a computed value?
To evaluate a template even when it is quoted, use the .eval construct:
# evaluation despite the quote:
text:
  .eval: "#!literal Hello {{ name }}"
The .eval construct must be applied to a string expression;
applying it to any other type raises a TYPE error.
Like any Protein expression, it can return a string, another type of scalar,
a mapping, or a sequence.
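The life cycle of a #!literal string (ignored while rendering, stripped on export, evaluated by .eval) can be modelled in a few lines of Python. This is a hypothetical sketch, not Protein's code: render, export and eval_ are illustrative names, the substitution handles only {{ name }}, and unlike the real .eval this model always returns a string.

```python
import re
from typing import Any

LITERAL = "#!literal"
VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # toy syntax: only {{ name }}


def substitute(text: str, ctx: dict) -> str:
    """Replace each {{ name }} with its value from ctx (toy renderer)."""
    return VAR.sub(lambda m: str(ctx[m.group(1)]), text)


def strip_literal(text: str) -> str:
    """Remove the #!literal prefix and the whitespace that follows it."""
    return text[len(LITERAL):].lstrip()


def render(value: Any, ctx: dict) -> Any:
    """Normal rendering: literal strings pass through untouched."""
    if isinstance(value, str) and not value.startswith(LITERAL):
        return substitute(value, ctx)
    return value


def export(value: Any) -> Any:
    """Output stage (YAML, JSON, .write_buffer): the prefix is stripped."""
    if isinstance(value, str) and value.startswith(LITERAL):
        return strip_literal(value)
    return value


def eval_(value: Any, ctx: dict) -> Any:
    """Model of .eval: only strings are accepted; literals are evaluated anyway."""
    if not isinstance(value, str):
        raise TypeError(f".eval expects a string, got {type(value).__name__}")
    if value.startswith(LITERAL):
        value = strip_literal(value)
    return substitute(value, ctx)


ctx = {"name": "Ann"}
text = "#!literal Hello {{ name }}"
print(render(render(text, ctx), ctx))  # #!literal Hello {{ name }}
print(export(text))                     # Hello {{ name }}
print(eval_(text, ctx))                 # Hello Ann
```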
Dynamically changing the initial context
There are three main ways of changing the initial context of your Protein file.
- A. With command-line arguments (also as sequences or mappings)
- B. Through environment variables
- C. With a dotenv file
A. With command-line arguments
From the command line, you can update (or create) the top-level .local context
of your initial Protein tree with the --set option:
Arguments as scalars
The easiest way is to pass scalars (typically integers or strings):
Protein test1.yaml --set env=prod count=5
Suppose that your Protein file contains:
.local:
  env: test
  count: 3
  foo: barbaz
After the --set arguments are applied, it will contain:
.local:
  env: prod
  count: 5
  foo: barbaz
If the tree starts with a sequence, a top-level mapping is created, with the original sequence under .do:
.local:
  env: prod
  count: 5
.do:
  - ...
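The effect of --set on the top of the tree can be modelled in Python. This is a hypothetical sketch, not the Protein CLI: parse_set_args and apply_overrides are illustrative names, only integers are converted from strings, and sequence or mapping values written in YAML syntax would need a real YAML parser.

```python
from typing import Any, Dict, List


def parse_scalar(raw: str) -> Any:
    """Toy conversion: integers become int, everything else stays a string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def parse_set_args(args: List[str]) -> Dict[str, Any]:
    """Turn ["env=prod", "count=5"] into {"env": "prod", "count": 5}."""
    overrides: Dict[str, Any] = {}
    for arg in args:
        key, _, raw = arg.partition("=")
        overrides[key] = parse_scalar(raw)
    return overrides


def apply_overrides(tree: Any, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Update (or create) the top-level .local mapping of a Protein tree."""
    if isinstance(tree, list):
        # A tree that starts with a sequence is wrapped in a new mapping.
        return {".local": dict(overrides), ".do": tree}
    local = dict(tree.get(".local", {}))
    local.update(overrides)  # command-line values win over file values
    return {**tree, ".local": local}


tree = {".local": {"env": "test", "count": 3, "foo": "barbaz"}}
print(apply_overrides(tree, parse_set_args(["env=prod", "count=5"])))
# {'.local': {'env': 'prod', 'count': 5, 'foo': 'barbaz'}}
```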
Arguments as sequences or mappings
You can also set arguments as sequences or mappings (use YAML syntax):
Protein test1.yaml --set env=prod users="[Laurent, Paul]"
B. Through environment variables
Another way to dynamically change the initial conditions that govern a Protein program
is to use the environment variables of the OS, through the get_env() function.
This function can be used in any part of the Protein tree.
server:
  address: "{{ get_env('MY_SERVER') }}"
C. With a dotenv file
Dotenv files (with the .env suffix) are a common way
of storing configuration information, with key-value pairs separated by the = sign.
Protein supports dotenv files as input. Suppose there is a file called .env in the source directory
of the Protein interpreter:
# Environment variables
API_KEY=123456
DEBUG=true
PORT=8080
If you want to make those values available as variables to your program:
.define:
  .load: '.env'
Since the values are loaded within a .define construct, they will not appear in the output
(which is normally what you want), unless you explicitly require it:
connect: "https://localhost:{{ PORT }}?api_key={{ API_KEY }}&debug={{ DEBUG }}"
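A simplified version of what loading such a file involves can be sketched in Python. This is a hypothetical model, not Protein's loader: parse_dotenv is an illustrative name, it ignores the quoting and export prefixes that full dotenv parsers handle, and it keeps every value as a string (whether Protein converts values such as 8080 or true is an open assumption here).

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            values[key.strip()] = value.strip()
    return values


ENV_TEXT = """\
# Environment variables
API_KEY=123456
DEBUG=true
PORT=8080
"""

env = parse_dotenv(ENV_TEXT)
connect = "https://localhost:{PORT}?api_key={API_KEY}&debug={DEBUG}".format(**env)
print(connect)  # https://localhost:8080?api_key=123456&debug=true
```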