diff options
Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Pattern-Matching.html')
-rw-r--r-- | admin/notes/tree-sitter/html-manual/Pattern-Matching.html | 430 |
1 files changed, 430 insertions, 0 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Pattern-Matching.html b/admin/notes/tree-sitter/html-manual/Pattern-Matching.html new file mode 100644 index 00000000000..e14efe71629 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Pattern-Matching.html @@ -0,0 +1,430 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ --> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<!-- This is the GNU Emacs Lisp Reference Manual +corresponding to Emacs version 29.0.50. + +Copyright © 1990-1996, 1998-2022 Free Software Foundation, +Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with the +Invariant Sections being "GNU General Public License," with the +Front-Cover Texts being "A GNU Manual," and with the Back-Cover +Texts as in (a) below. A copy of the license is included in the +section entitled "GNU Free Documentation License." + +(a) The FSF's Back-Cover Text is: "You have the freedom to copy and +modify this GNU manual. Buying copies from the FSF supports it in +developing GNU and promoting software freedom." --> +<title>Pattern Matching (GNU Emacs Lisp Reference Manual)</title> + +<meta name="description" content="Pattern Matching (GNU Emacs Lisp Reference Manual)"> +<meta name="keywords" content="Pattern Matching (GNU Emacs Lisp Reference Manual)"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<meta name="viewport" content="width=device-width,initial-scale=1"> + +<link href="index.html" rel="start" title="Top"> +<link href="Index.html" rel="index" title="Index"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source"> +<link href="Multiple-Languages.html" rel="next" title="Multiple Languages"> +<link href="Accessing-Node.html" rel="prev" title="Accessing Node"> +<style type="text/css"> +<!-- +a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em} +a.summary-letter {text-decoration: none} +blockquote.indentedblock {margin-right: 0em} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +kbd {font-style: oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +span.nolinebreak {white-space: nowrap} +span.roman {font-family: initial; font-weight: normal} +span.sansserif {font-family: sans-serif; font-weight: normal} +span:hover a.copiable-anchor {visibility: visible} +ul.no-bullet {list-style: none} +--> +</style> +<link rel="stylesheet" type="text/css" href="./manual.css"> + + +</head> + +<body lang="en"> +<div class="section" id="Pattern-Matching"> +<div class="header"> +<p> +Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3> + +<p>Tree-sitter let us pattern match with a small declarative language. +Pattern matching consists of two steps: first tree-sitter matches a +<em>pattern</em> against nodes in the syntax tree, then it <em>captures</em> +specific nodes in that pattern and returns the captured nodes. +</p> +<p>We describe first how to write the most basic query pattern and how to +capture nodes in a pattern, then the pattern-match function, finally +more advanced pattern syntax. +</p> +<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3> + +<span id="index-Tree_002dsitter-query-syntax"></span> +<span id="index-Tree_002dsitter-query-pattern"></span> +<p>A <em>query</em> consists of multiple <em>patterns</em>. Each pattern is an +s-expression that matches a certain node in the syntax node. A +pattern has the following shape: +</p> +<div class="example"> +<pre class="example">(<var>type</var> <var>child</var>...) +</pre></div> + +<p>For example, a pattern that matches a <code>binary_expression</code> node that +contains <code>number_literal</code> child nodes would look like +</p> +<div class="example"> +<pre class="example">(binary_expression (number_literal)) +</pre></div> + +<p>To <em>capture</em> a node in the query pattern above, append +<code>@capture-name</code> after the node pattern you want to capture. For +example, +</p> +<div class="example"> +<pre class="example">(binary_expression (number_literal) @number-in-exp) +</pre></div> + +<p>captures <code>number_literal</code> nodes that are inside a +<code>binary_expression</code> node with capture name <code>number-in-exp</code>. +</p> +<p>We can capture the <code>binary_expression</code> node too, with capture +name <code>biexp</code>: +</p> +<div class="example"> +<pre class="example">(binary_expression + (number_literal) @number-in-exp) @biexp +</pre></div> + +<span id="Query-function"></span><h3 class="heading">Query function</h3> + +<p>Now we can introduce the query functions. +</p> +<dl class="def"> +<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function matches patterns in <var>query</var> in <var>node</var>. +Parameter <var>query</var> can be either a string, a s-expression, or a +compiled query object. For now, we focus on the string syntax; +s-expression syntax and compiled query are described at the end of the +section. +</p> +<p>Parameter <var>node</var> can also be a parser or a language symbol. A +parser means using its root node, a language symbol means find or +create a parser for that language in the current buffer, and use the +root node. +</p> +<p>The function returns all captured nodes in a list of +<code>(<var>capture_name</var> . <var>node</var>)</code>. If <var>node-only</var> is +non-nil, a list of node is returned instead. If <var>beg</var> and +<var>end</var> are both non-nil, this function only pattern matches nodes +in that range. +</p> +<span id="index-treesit_002dquery_002derror"></span> +<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is +malformed. The signal data contains a description of the specific +error. You can use <code>treesit-query-validate</code> to debug the query. +</p></dd></dl> + +<p>For example, suppose <var>node</var>’s content is <code>1 + 2</code>, and +<var>query</var> is +</p> +<div class="example"> +<pre class="example">(setq query + "(binary_expression + (number_literal) @number-in-exp) @biexp") +</pre></div> + +<p>Querying that query would return +</p> +<div class="example"> +<pre class="example">(treesit-query-capture node query) + ⇒ ((biexp . <var><node for "1 + 2"></var>) + (number-in-exp . <var><node for "1"></var>) + (number-in-exp . <var><node for "2"></var>)) +</pre></div> + +<p>As we mentioned earlier, a <var>query</var> could contain multiple +patterns. For example, it could have two top-level patterns: +</p> +<div class="example"> +<pre class="example">(setq query + "(binary_expression) @biexp + (number_literal) @number @biexp") +</pre></div> + +<dl class="def"> +<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function parses <var>string</var> with <var>language</var>, pattern matches +its root node with <var>query</var>, and returns the result. +</p></dd></dl> + +<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3> + +<p>Besides node type and capture, tree-sitter’s query syntax can express +anonymous node, field name, wildcard, quantification, grouping, +alternation, anchor, and predicate. +</p> +<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4> + +<p>An anonymous node is written verbatim, surrounded by quotes. A +pattern matching (and capturing) keyword <code>return</code> would be +</p> +<div class="example"> +<pre class="example">"return" @keyword +</pre></div> + +<span id="Wild-card"></span><h4 class="subheading">Wild card</h4> + +<p>In a query pattern, ‘<samp>(_)</samp>’ matches any named node, and ‘<samp>_</samp>’ +matches any named and anonymous node. For example, to capture any +named child of a <code>binary_expression</code> node, the pattern would be +</p> +<div class="example"> +<pre class="example">(binary_expression (_) @in_biexp) +</pre></div> + +<span id="Field-name"></span><h4 class="subheading">Field name</h4> + +<p>We can capture child nodes that has specific field names: +</p> +<div class="example"> +<pre class="example">(function_definition + declarator: (_) @func-declarator + body: (_) @func-body) +</pre></div> + +<p>We can also capture a node that doesn’t have certain field, say, a +<code>function_definition</code> without a <code>body</code> field. +</p> +<div class="example"> +<pre class="example">(function_definition !body) @func-no-body +</pre></div> + +<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4> + +<p>Tree-sitter recognizes quantification operators ‘<samp>*</samp>’, ‘<samp>+</samp>’ and +‘<samp>?</samp>’. Their meanings are the same as in regular expressions: +‘<samp>*</samp>’ matches the preceding pattern zero or more times, ‘<samp>+</samp>’ +matches one or more times, and ‘<samp>?</samp>’ matches zero or one time. +</p> +<p>For example, this pattern matches <code>type_declaration</code> nodes +that has <em>zero or more</em> <code>long</code> keyword. +</p> +<div class="example"> +<pre class="example">(type_declaration "long"*) @long-type +</pre></div> + +<p>And this pattern matches a type declaration that has zero or one +<code>long</code> keyword: +</p> +<div class="example"> +<pre class="example">(type_declaration "long"?) @long-type +</pre></div> + +<span id="Grouping"></span><h4 class="subheading">Grouping</h4> + +<p>Similar to groups in regular expression, we can bundle patterns into a +group and apply quantification operators to it. For example, to +express a comma separated list of identifiers, one could write +</p> +<div class="example"> +<pre class="example">(identifier) ("," (identifier))* +</pre></div> + +<span id="Alternation"></span><h4 class="subheading">Alternation</h4> + +<p>Again, similar to regular expressions, we can express “match anyone +from this group of patterns” in the query pattern. The syntax is a +list of patterns enclosed in square brackets. For example, to capture +some keywords in C, the query pattern would be +</p> +<div class="example"> +<pre class="example">[ + "return" + "break" + "if" + "else" +] @keyword +</pre></div> + +<span id="Anchor"></span><h4 class="subheading">Anchor</h4> + +<p>The anchor operator ‘<samp>.</samp>’ can be used to enforce juxtaposition, +i.e., to enforce two things to be directly next to each other. The +two “things” can be two nodes, or a child and the end of its parent. +For example, to capture the first child, the last child, or two +adjacent children: +</p> +<div class="example"> +<pre class="example">;; Anchor the child with the end of its parent. +(compound_expression (_) @last-child .) + +;; Anchor the child with the beginning of its parent. +(compound_expression . (_) @first-child) + +;; Anchor two adjacent children. +(compound_expression + (_) @prev-child + . + (_) @next-child) +</pre></div> + +<p>Note that the enforcement of juxtaposition ignores any anonymous +nodes. +</p> +<span id="Predicate"></span><h4 class="subheading">Predicate</h4> + +<p>We can add predicate constraints to a pattern. For example, if we use +the following query pattern +</p> +<div class="example"> +<pre class="example">( + (array . (_) @first (_) @last .) + (#equal @first @last) +) +</pre></div> + +<p>Then tree-sitter only matches arrays where the first element equals to +the last element. To attach a predicate to a pattern, we need to +group then together. A predicate always starts with a ‘<samp>#</samp>’. +Currently there are two predicates, <code>#equal</code> and <code>#match</code>. +</p> +<dl class="def"> +<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>Matches if <var>arg1</var> equals to <var>arg2</var>. Arguments can be either a +string or a capture name. Capture names represent the text that the +captured node spans in the buffer. +</p></dd></dl> + +<dl class="def"> +<dt id="index-match"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>Matches if the text that <var>capture-name</var>’s node spans in the buffer +matches regular expression <var>regexp</var>. Matching is case-sensitive. +</p></dd></dl> + +<p>Note that a predicate can only refer to capture names appeared in the +same pattern. Indeed, it makes little sense to refer to capture names +in other patterns anyway. +</p> +<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3> + +<p>Besides strings, Emacs provides a s-expression based syntax for query +patterns. It largely resembles the string-based syntax. For example, +the following pattern +</p> +<div class="example"> +<pre class="example">(treesit-query-capture + node "(addition_expression + left: (_) @left + \"+\" @plus-sign + right: (_) @right) @addition + + [\"return\" \"break\"] @keyword") +</pre></div> + +<p>is equivalent to +</p> +<div class="example"> +<pre class="example">(treesit-query-capture + node '((addition_expression + left: (_) @left + "+" @plus-sign + right: (_) @right) @addition + + ["return" "break"] @keyword)) +</pre></div> + +<p>Most pattern syntax can be written directly as strange but +never-the-less valid s-expressions. Only a few of them needs +modification: +</p> +<ul> +<li> Anchor ‘<samp>.</samp>’ is written as <code>:anchor</code>. +</li><li> ‘<samp>?</samp>’ is written as ‘<samp>:?</samp>’. +</li><li> ‘<samp>*</samp>’ is written as ‘<samp>:*</samp>’. +</li><li> ‘<samp>+</samp>’ is written as ‘<samp>:+</samp>’. +</li><li> <code>#equal</code> is written as <code>:equal</code>. In general, predicates +change their ‘<samp>#</samp>’ to ‘<samp>:</samp>’. +</li></ul> + +<p>For example, +</p> +<div class="example"> +<pre class="example">"( + (compound_expression . (_) @first (_)* @rest) + (#match \"love\" @first) + )" +</pre></div> + +<p>is written in s-expression as +</p> +<div class="example"> +<pre class="example">'(( + (compound_expression :anchor (_) @first (_) :* @rest) + (:match "love" @first) + )) +</pre></div> + +<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3> + +<p>If a query will be used repeatedly, especially in tight loops, it is +important to compile that query, because a compiled query is much +faster than an uncompiled one. A compiled query can be used anywhere +a query is accepted. +</p> +<dl class="def"> +<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled +query object and returns it. +</p> +<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is +malformed. The signal data contains a description of the specific +error. You can use <code>treesit-query-validate</code> to debug the query. +</p></dd></dl> + +<dl class="def"> +<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function expands the s-expression <var>query</var> into a string +query. +</p></dd></dl> + +<dl class="def"> +<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function expands the s-expression <var>pattern</var> into a string +pattern. +</p></dd></dl> + +<p>Finally, tree-sitter project’s documentation about +pattern-matching can be found at +<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>. +</p> +</div> +<hr> +<div class="header"> +<p> +Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html> |