General advice
This chapter contains general advice on writing Tree-sitter queries and also some specific, important Topiary semantics.
@leaf
Some nodes should not have their contents formatted at all; the classic example being string literals. The @leaf capture will mark such nodes as leaves -- even if they admit their own structure, by virtue of the grammar -- and leave them unformatted.
Example
; Don't format strings or comments
[
  (string)
  (comment)
] @leaf
This can make it tricky to format strings that allow interpolation. In such cases, ideally the grammar would expose this structure, such that the non-interpolated parts of the string can be @leaf.
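As a sketch of that ideal case, suppose a hypothetical grammar parses interpolated strings into string_content and interpolation child nodes. These node names are assumptions for illustration and are not taken from any particular grammar. Only the literal fragments would then be marked as leaves, leaving the embedded expressions open to formatting:

```scheme
; Hypothetical node names: (string_content) holds the literal text between
; interpolations, while (interpolation) wraps an embedded expression.
; Only the literal fragments are left unformatted.
(string_content) @leaf
```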
@do_nothing
If any of the captures in a query match are @do_nothing, then the entire match will be ignored. This is useful for cancelling a formatting query based on context.
Example
; Put a semicolon delimiter after field declarations, unless they already have
; one, in which case we do nothing.
(
  (field_declaration) @append_delimiter
  .
  ";"* @do_nothing
  (#delimiter! ";")
)
Nodes which are annotated with @do_nothing ought to be quantified with Tree-sitter's * (zero or more matches) or ? (at most one match) operators, to define a pattern where the exceptional node could appear. Without such a quantifier, the pattern only matches when the node is present, so the @do_nothing capture will always be applied and the query will be cancelled regardless.
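To illustrate, the following variation of the field declaration example drops the quantifier. It is a sketch of what to avoid: the pattern can only match when a semicolon is present, and every such match carries @do_nothing, so this query would never append a delimiter.

```scheme
; Anti-pattern: ";" is not quantified, so every match of this pattern
; includes the @do_nothing capture and is therefore cancelled.
(
  (field_declaration) @append_delimiter
  .
  ";" @do_nothing
  (#delimiter! ";")
)
```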
#query_name!
When the logging verbosity is set to -vv or higher (see runtime dialogue), Topiary outputs information about which queries are matched, for instance:
[2024-10-08T15:48:13Z INFO topiary_core::tree_sitter] Processing match: LocalQueryMatch { pattern_index: 17, captures: [ {Node "," (1,3) - (1,4)} ] } at location (286,1)
Here, pattern_index: 17 means that the 17th (0-based) pattern in the query file has matched. Counting patterns in the query file -- not to mention the potential for off-by-one errors -- is not a great developer experience!
As such, the optional predicate #query_name!, taking a string argument, can be added to any query. It will modify the log line to display its argument, to aid debugging.
Example
Consider the log line above, and assume that the query at location (286,1) is:
(
  "," @append_space
  .
  (_)
)
If we add a query_name predicate:
(
  "," @append_space
  .
  (_)
  (#query_name! "comma spacing")
)
Then the log line will become:
[2024-10-08T15:48:13Z INFO topiary_core::tree_sitter] Processing match of query "comma spacing": LocalQueryMatch { pattern_index: 17, captures: [ {Node "," (1,3) - (1,4)} ] } at location (286,1)
Tree-sitter predicates
Tree-sitter supports a number of predicates by default, which allow for fine-tuning queries. These are discussed in the Tree-sitter documentation and outlined here:
- #eq?: Checks a direct match against a capture or string.
- #match?: Checks a match against a regular expression.
- #any-of?: Checks a match against a list of strings.
- Prefixing not- negates any of the above predicates.
Note
Topiary does not allow arbitrary capture names; just those it defines for formatting. The Tree-sitter predicates expect a capture name and, as such, this can make using them with Topiary a little unwieldy (see issue #824).
For example, at the time of writing, while the documented any- prefix for eq and match is recognised by Topiary's Tree-sitter, it doesn't appear to work as advertised.
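As a minimal sketch of how a predicate can be combined with Topiary's captures, the predicate is pointed at whichever formatting capture is already attached to the node. The choice of the (word) node and the "echo" text is purely illustrative, and the exact behaviour should be checked against the grammar in use:

```scheme
; Append a space after a word whose text is exactly "echo".
; The predicate refers to @append_space because arbitrary capture
; names (for example, a custom @command) are not available in Topiary.
(
  (word) @append_space
  (#eq? @append_space "echo")
)
```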
Query and capture precedence
Formatting is not necessarily invariant over the order of queries. For example, queries that add delimiters or remove nodes can have a different effect on the formatted output depending on the order in which they appear in the query file.
Consider, say, the following two queries for the Bash grammar:
; Query A: Append semicolon
(
  (word) @append_delimiter
  .
  ";"? @do_nothing
  (#delimiter! ";")
)
; Query B: Surround with quotes
(
  "\""? @do_nothing
  .
  (word) @prepend_delimiter @append_delimiter
  .
  "\""? @do_nothing
  (#delimiter! "\"")
)
In the order presented above (A, then B), the input foo will be formatted as:
"foo;"
In the opposite order (B, then A), however, Topiary will produce the following output:
"foo";
A similar consideration exists for capture names. That is, while most captures do not meaningfully affect one another, there are three notable exceptions:
- @do_nothing (see above) will cancel all other captures in a matched query. This takes the highest priority.
- @delete (see insertion and deletion) will delete any matched node, providing the matching query is not cancelled.
- @leaf (see above) will suppress formatting within that node, even if it admits some internal structure. However, leaf nodes are still subject to deletion.
Note
While not in the same league as the above, also note that antispaces will cancel out all inserted spaces (see horizontal spacing).
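A brief sketch of how the first two priorities interact, assuming a hypothetical grammar in which ";" and (comment) can appear as siblings: @delete would remove the semicolon, but the optional @do_nothing capture cancels the whole match, and with it the deletion, whenever a comment follows directly.

```scheme
; Delete semicolons, unless one is immediately followed by a comment.
; The node names are assumptions chosen for illustration.
(
  ";" @delete
  .
  (comment)? @do_nothing
)
```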
Captures are always postfix
Note that a capture is put after the node it is associated with. If you want to put a space in front of a node, for example, you do so like this:
(infix_operator) @prepend_space
This, on the other hand, will not work:
@append_space (infix_operator)
A note on anchors
The behaviour of "anchors" can be counter-intuitive. Consider, for instance, the following query:
(
  (list_entry) @append_space
  .
)
One might assume that this query only matches the final element in the list, but this is not true. Since we did not explicitly match a parent node, the engine will match on every list_entry. After all, when looking only at the nodes in the query, the list_entry is indeed the last node.
To resolve this issue, match explicitly on the parent node:
(list
  (list_entry) @append_space
  .
)
Or even implicitly:
(_
  (list_entry) @append_space
  .
)
Note that while anchors can be defined between anonymous nodes, if they are given as explicit terminals, anonymous nodes that interpose an anchor's terminals (named or anonymous) will be skipped over.
For example, in this Bash code:
if this; then that; fi
The following query matches the nodes indicated in the comments:
(if_statement
  (_) ; will match "this"
  .
  (_) ; will match "that"
)
In the Bash grammar, this and that are named nodes, but are interposed by the ; and then anonymous nodes, which are ignored by the anchor.
Using anchors wherever possible is highly recommended, otherwise queries can become too general and over-match, despite resulting in the same outcome. This can significantly impact formatting performance.
For example, imagine the list [1 2 3 4 5]. Adding spaces between elements would be best expressed as:
(list
  (element) @append_space
  .
  (element)
)
Here, this query will match 4 times -- 1 2, 2 3, 3 4 and 4 5 -- and Topiary will insert exactly the right number of spaces.
If we remove the anchor, it will match 10 times -- 1 2, 1 3, 1 4, 1 5, 2 3, 2 4, 2 5, 3 4, 3 5 and 4 5 -- so Topiary does more than twice as much work, only for subsequent processing to remove all those extraneous spaces.
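For contrast, the unanchored variant would be written as follows. It is shown only to make the difference concrete, using the same list and element node names as the anchored query:

```scheme
; Without the anchor, each (element) pairs with every later (element) in
; the list, not just its immediate successor: 10 matches for [1 2 3 4 5].
(list
  (element) @append_space
  (element)
)
```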