Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Language-Definitions.html')
-rw-r--r-- admin/notes/tree-sitter/html-manual/Language-Definitions.html | 283
1 files changed, 169 insertions, 114 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
index ba3eeb9eeb9..6df676b1680 100644
--- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html
+++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
@@ -66,14 +66,17 @@ Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Par
</div>
<hr>
<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
+<span id="index-language-definitions_002c-for-tree_002dsitter"></span>
<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
+<span id="index-loading-language-definition-for-tree_002dsitter"></span>
+<span id="index-language-argument_002c-for-tree_002dsitter"></span>
<p>Tree-sitter relies on language definitions to parse text in that
-language. In Emacs, A language definition is represented by a symbol.
-For example, C language definition is represented as <code>c</code>, and
-<code>c</code> can be passed to tree-sitter functions as the <var>language</var>
-argument.
+language. In Emacs, a language definition is represented by a symbol.
+For example, the C language definition is represented as the symbol
+<code>c</code>, and <code>c</code> can be passed to tree-sitter functions as the
+<var>language</var> argument.
</p>
<span id="index-treesit_002dextra_002dload_002dpath"></span>
<span id="index-treesit_002dload_002dlanguage_002derror"></span>
@@ -81,55 +84,92 @@ argument.
<p>Tree-sitter language definitions are distributed as dynamic libraries.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
-language definitions under load paths in
-<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter,
-and system default locations for dynamic libraries, in that order.
-Emacs tries each extensions in <code>treesit-load-suffixes</code>. If Emacs
-cannot find the library or has problem loading it, Emacs signals
-<code>treesit-load-language-error</code>. The signal data is a list of
-specific error messages.
+language definitions in several places, in the following order:
+</p>
+<ul>
+<li> first, in the list of directories specified by the variable
+<code>treesit-extra-load-path</code>;
+</li><li> then, in the <samp>tree-sitter</samp> subdirectory of the directory
+specified by <code>user-emacs-directory</code> (see <a href="Init-File.html">The Init File</a>);
+</li><li> and finally, in the system&rsquo;s default locations for dynamic libraries.
+</li></ul>
+
+<p>In each of these directories, Emacs looks for a file with file-name
+extensions specified by the variable <code>treesit-load-suffixes</code>.
+</p>
+<p>If Emacs cannot find the library or has problems loading it, Emacs
+signals the <code>treesit-load-language-error</code> error. The data of
+that signal could be one of the following:
+</p>
+<dl compact="compact">
+<dt><span><code>(not-found <var>error-msg</var> &hellip;)</code></span></dt>
+<dd><p>This means that Emacs could not find the language definition library.
+</p></dd>
+<dt><span><code>(symbol-error <var>error-msg</var>)</code></span></dt>
+<dd><p>This means that Emacs could not find in the library the expected function
+that every language definition library should export.
+</p></dd>
+<dt><span><code>(version-mismatch <var>error-msg</var>)</code></span></dt>
+<dd><p>This means that the version of the language definition library is incompatible
+with that of the tree-sitter library.
+</p></dd>
+</dl>
+
+<p>In all of these cases, <var>error-msg</var> might provide additional
+details about the failure.
</p>
<dl class="def">
-<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
-<dd><p>This function checks whether the dynamic library for <var>language</var> is
-present on the system, and return non-nil if it is.
+<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language &amp;optional detail</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
+<dd><p>This function returns non-<code>nil</code> if the language definitions for
+<var>language</var> exist and can be loaded.
+</p>
+<p>If <var>detail</var> is non-<code>nil</code>, return <code>(t . nil)</code> when
+<var>language</var> is available, and <code>(nil . <var>data</var>)</code> when it&rsquo;s
+unavailable. <var>data</var> is the signal data of
+<code>treesit-load-language-error</code>.
</p></dd></dl>
<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
-<p>By convention, the dynamic library for <var>language</var> is
-<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the
-system-specific extension for dynamic libraries. Also by convention,
+<p>By convention, the file name of the dynamic library for <var>language</var> is
+<samp>libtree-sitter-<var>language</var>.<var>ext</var></samp>, where <var>ext</var> is the
+system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
-<code>tree_sitter_<var>language</var></code>. If a language definition doesn&rsquo;t
-follow this convention, you should add an entry
+<code>tree_sitter_<var>language</var></code>. If a language definition library
+doesn&rsquo;t follow this convention, you should add an entry
</p>
<div class="example">
<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
</pre></div>
-<p>to <code>treesit-load-name-override-list</code>, where
-<var>library-base-name</var> is the base filename for the dynamic library
-(conventionally <code>libtree-sitter-<var>language</var></code>), and
+<p>to the list in the variable <code>treesit-load-name-override-list</code>, where
+<var>library-base-name</var> is the basename of the dynamic library&rsquo;s file name
+(usually, <samp>libtree-sitter-<var>language</var></samp>), and
<var>function-name</var> is the function provided by the library
-(conventionally <code>tree_sitter_<var>language</var></code>). For example,
+(usually, <code>tree_sitter_<var>language</var></code>). For example,
</p>
<div class="example">
<pre class="example">(cool-lang &quot;libtree-sitter-coool&quot; &quot;tree_sitter_cooool&quot;)
</pre></div>
-<p>for a language too cool to abide by conventions.
+<p>for a language that considers itself too &ldquo;cool&rdquo; to abide by
+conventions.
</p>
+<span id="index-language_002ddefinition-version_002c-compatibility"></span>
<dl class="def">
<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&amp;optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> &para;</a></span></dt>
-<dd><p>Tree-sitter library has a <em>language version</em>, a language
-definition&rsquo;s version needs to match this version to be compatible.
-</p>
-<p>This function returns tree-sitter library’s language version. If
-<var>min-compatible</var> is non-nil, it returns the minimal compatible
-version.
+<dd><p>This function returns the version of the language-definition
+Application Binary Interface (<acronym>ABI</acronym>) supported by the
+tree-sitter library. By default, it returns the latest ABI version
+supported by the library, but if <var>min-compatible</var> is
+non-<code>nil</code>, it returns the oldest ABI version that the library
+can still support. Language definition libraries must be built for
+ABI versions between the oldest and the latest versions supported by
+the tree-sitter library, otherwise the library will be unable to load
+them.
</p></dd></dl>
<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
+<span id="index-syntax-tree_002c-concrete"></span>
<p>A syntax tree is what a parser generates. In a syntax tree, each node
represents a piece of text, and nodes are connected to one another by a
@@ -155,31 +195,34 @@ parent-child relationship. For example, if the source text is
+------------+ +--------------+ +------------+
</pre></div>
-<p>We can also represent it in s-expression:
+<p>We can also represent it as an s-expression:
</p>
<div class="example">
<pre class="example">(root (expression (number) (operator) (number)))
</pre></div>
<span id="Node-types"></span><h4 class="subheading">Node types</h4>
-
-<span id="index-tree_002dsitter-node-type"></span>
-<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span>
-<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span>
-<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>,
-<code>operator</code> are nodes&rsquo; <em>type</em>. However, not all nodes in a
-syntax tree have a type. Nodes that don&rsquo;t are <em>anonymous nodes</em>,
-and nodes with a type are <em>named nodes</em>. Anonymous nodes are
-tokens with fixed spellings, including punctuation characters like
-bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like <code>return</code>.
+<span id="index-node-types_002c-in-a-syntax-tree"></span>
+
+<span id="index-type-of-node_002c-tree_002dsitter"></span>
+<span id="tree_002dsitter-node-type"></span><span id="index-named-node_002c-tree_002dsitter"></span>
+<span id="tree_002dsitter-named-node"></span><span id="index-anonymous-node_002c-tree_002dsitter"></span>
+<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
+<code>operator</code> specify the <em>type</em> of the nodes. However, not all
+nodes in a syntax tree have a type. Nodes that don&rsquo;t have a type are
+known as <em>anonymous nodes</em>, and nodes with a type are <em>named
+nodes</em>. Anonymous nodes are tokens with fixed spellings, including
+punctuation characters like bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like
+<code>return</code>.
</p>
<span id="Field-names"></span><h4 class="subheading">Field names</h4>
+<span id="index-field-name_002c-tree_002dsitter"></span>
<span id="index-tree_002dsitter-node-field-name"></span>
-<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to
-analyze, many language definitions assign <em>field names</em> to child
-nodes. For example, a <code>function_definition</code> node could have a
-<code>declarator</code> and a <code>body</code>:
+<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to analyze, many language definitions
+assign <em>field names</em> to child nodes. For example, a
+<code>function_definition</code> node could have a <code>declarator</code> and a
+<code>body</code>:
</p>
<div class="example">
<pre class="example">(function_definition
@@ -189,39 +232,40 @@ nodes. For example, a <code>function_definition</code> node could have a
<dl class="def">
<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> &para;</a></span></dt>
-<dd><p>This minor mode displays the node that <em>starts</em> at point in
-mode-line. The mode-line will display
+<dd><p>This minor mode displays on the mode-line the node that <em>starts</em>
+at point. The mode-line will display
</p>
<div class="example">
-<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...)))
+<pre class="example"><var>parent</var> <var>field</var>: (<var>child</var> (<var>grandchild</var> (&hellip;)))
</pre></div>
-<p><var>child</var>, <var>grand-child</var>, and <var>grand-grand-child</var>, etc, are
-nodes that have their beginning at point. And <var>parent</var> is the
-parent of <var>child</var>.
+<p><var>child</var>, <var>grandchild</var>, <var>grand-grandchild</var>, etc., are nodes that
+begin at point. <var>parent</var> is the parent node of <var>child</var>.
</p>
<p>If there is no node that starts at point, i.e., point is in the middle
of a node, then the mode-line only displays the smallest node that
-spans point, and its immediate parent.
+spans the position of point, and its immediate parent.
</p>
<p>This minor mode doesn&rsquo;t create parsers on its own. It simply uses the
first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
</p></dd></dl>
<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
+<span id="index-reading-grammar-definition_002c-tree_002dsitter"></span>
<p>Authors of language definitions define the <em>grammar</em> of a
-language, and this grammar determines how does a parser construct a
-concrete syntax tree out of the text. In order to use the syntax
-tree effectively, we need to read the <em>grammar file</em>.
+programming language, which determines how a parser constructs a
+concrete syntax tree out of the program text. In order to use the
+syntax tree effectively, you need to consult the <em>grammar file</em>.
</p>
-<p>The grammar file is usually <code>grammar.js</code> in a language
-definition’s project repository. The link to a language definition’s
-home page can be found in tree-sitter’s homepage
-(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>).
+<p>The grammar file is usually <samp>grammar.js</samp> in a language
+definition&rsquo;s project repository. The link to a language definition&rsquo;s
+home page can be found on
+<a href="https://tree-sitter.github.io/tree-sitter">tree-sitter&rsquo;s
+homepage</a>.
</p>
-<p>The grammar is written in JavaScript syntax. For example, the rule
-matching a <code>function_definition</code> node looks like
+<p>The grammar definition is written in JavaScript. For example, the
+rule matching a <code>function_definition</code> node looks like
</p>
<div class="example">
<pre class="example">function_definition: $ =&gt; seq(
@@ -231,12 +275,12 @@ matching a <code>function_definition</code> node looks like
)
</pre></div>
-<p>The rule is represented by a function that takes a single argument
+<p>The rules are represented by functions that take a single argument
<var>$</var>, representing the whole grammar. The function itself is
-constructed by other functions: the <code>seq</code> function puts together a
-sequence of children; the <code>field</code> function annotates a child with
-a field name. If we write the above definition in BNF syntax, it
-would look like
+constructed by other functions: the <code>seq</code> function puts together
+a sequence of children; the <code>field</code> function annotates a child
+with a field name. If we write the above definition in the so-called
+<em>Backus-Naur Form</em> (<acronym>BNF</acronym>) syntax, it would look like
</p>
<div class="example">
<pre class="example">function_definition :=
@@ -252,66 +296,77 @@ would look like
body: (compound_statement))
</pre></div>
-<p>Below is a list of functions that one will see in a grammar
-definition. Each function takes other rules as arguments and returns
-a new rule.
+<p>Below is a list of functions that one can see in a grammar definition.
+Each function takes other rules as arguments and returns a new rule.
</p>
-<ul>
-<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another.
-
-</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its
-arguments.
-
-</li><li> <code>repeat(rule)</code> matches <var>rule</var> for <em>zero or more</em> times.
+<dl compact="compact">
+<dt><span><code>seq(<var>rule1</var>, <var>rule2</var>, &hellip;)</code></span></dt>
+<dd><p>matches each rule one after another.
+</p></dd>
+<dt><span><code>choice(<var>rule1</var>, <var>rule2</var>, &hellip;)</code></span></dt>
+<dd><p>matches one of the rules in its arguments.
+</p></dd>
+<dt><span><code>repeat(<var>rule</var>)</code></span></dt>
+<dd><p>matches <var>rule</var> <em>zero or more</em> times.
This is like the &lsquo;<samp>*</samp>&rsquo; operator in regular expressions.
-
-</li><li> <code>repeat1(rule)</code> matches <var>rule</var> for <em>one or more</em> times.
+</p></dd>
+<dt><span><code>repeat1(<var>rule</var>)</code></span></dt>
+<dd><p>matches <var>rule</var> <em>one or more</em> times.
This is like the &lsquo;<samp>+</samp>&rsquo; operator in regular expressions.
-
-</li><li> <code>optional(rule)</code> matches <var>rule</var> for <em>zero or one</em> time.
+</p></dd>
+<dt><span><code>optional(<var>rule</var>)</code></span></dt>
+<dd><p>matches <var>rule</var> <em>zero or one</em> time.
This is like the &lsquo;<samp>?</samp>&rsquo; operator in regular expressions.
-
-</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child
-node matched by <var>rule</var>.
-
-</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as
-<var>alias</var> in the syntax tree generated by the parser. For example,
-
+</p></dd>
+<dt><span><code>field(<var>name</var>, <var>rule</var>)</code></span></dt>
+<dd><p>assigns field name <var>name</var> to the child node matched by <var>rule</var>.
+</p></dd>
+<dt><span><code>alias(<var>rule</var>, <var>alias</var>)</code></span></dt>
+<dd><p>makes nodes matched by <var>rule</var> appear as <var>alias</var> in the syntax
+tree generated by the parser. For example,
+</p>
<div class="example">
<pre class="example">alias(preprocessor_call_exp, call_expression)
</pre></div>
-<p>makes any node matched by <code>preprocessor_call_exp</code> to appear as
+<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
<code>call_expression</code>.
-</p></li></ul>
+</p></dd>
+</dl>
-<p>Below are grammar functions less interesting for a reader of a
+<p>Below are grammar functions of lesser importance for reading a
language definition.
</p>
-<ul>
-<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node.
-That is, instead of generating a parent node with individual child
-nodes under it, everything is combined into a single leaf node.
-
-</li><li> Normally, grammar rules ignore preceding whitespaces,
-<code>token.immediate(rule)</code> changes <var>rule</var> to match only when
-there is no preceding whitespaces.
-
-</li><li> <code>prec(n, rule)</code> gives <var>rule</var> a level <var>n</var> precedence.
-
-</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative,
-optionally with level <var>n</var>.
-
-</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative,
-optionally with level <var>n</var>.
-
-</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence
-is applied at runtime instead.
-</li></ul>
-
-<p>The tree-sitter project talks about writing a grammar in more detail:
-<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>.
-Read especially &ldquo;The Grammar DSL&rdquo; section.
+<dl compact="compact">
+<dt><span><code>token(<var>rule</var>)</code></span></dt>
+<dd><p>marks <var>rule</var> to produce a single leaf node. That is, instead of
+generating a parent node with individual child nodes under it,
+everything is combined into a single leaf node.
+</p></dd>
+<dt><span><code>token.immediate(<var>rule</var>)</code></span></dt>
+<dd><p>Normally, grammar rules ignore preceding whitespace; this
+changes <var>rule</var> to match only when there is no preceding
+whitespace.
+</p></dd>
+<dt><span><code>prec(<var>n</var>, <var>rule</var>)</code></span></dt>
+<dd><p>gives <var>rule</var> a precedence of level <var>n</var>.
+</p></dd>
+<dt><span><code>prec.left([<var>n</var>,] <var>rule</var>)</code></span></dt>
+<dd><p>marks <var>rule</var> as left-associative, optionally with level <var>n</var>.
+</p></dd>
+<dt><span><code>prec.right([<var>n</var>,] <var>rule</var>)</code></span></dt>
+<dd><p>marks <var>rule</var> as right-associative, optionally with level <var>n</var>.
+</p></dd>
+<dt><span><code>prec.dynamic(<var>n</var>, <var>rule</var>)</code></span></dt>
+<dd><p>this is like <code>prec</code>, but the precedence is applied at runtime
+instead.
+</p></dd>
+</dl>
+
+<p>The documentation of the tree-sitter project has
+<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">more
+about writing a grammar</a>. See especially the section &ldquo;The Grammar
+DSL&rdquo;.
</p>
</div>
<hr>
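
Illustrative note (not part of the patch): the manual text in this diff says that each grammar rule is a function of a single argument `$` and is built from other functions such as `seq` and `field`. The sketch below mimics that idea with toy stand-ins. The helpers `seq`, `choice`, `optional`, and `field` here are simplified assumptions that only build plain objects; they are not tree-sitter's implementation. The rule bodies are hypothetical and only loosely modeled on the `function_definition` example, whose full text is elided in the hunk above.

```javascript
// Toy stand-ins for a few grammar DSL functions. These are NOT
// tree-sitter's implementation; they only build plain objects so the
// shape of a rule can be inspected.
const seq = (...members) => ({ type: "SEQ", members });
const choice = (...members) => ({ type: "CHOICE", members });
const optional = (rule) => choice(rule, { type: "BLANK" });
const field = (name, content) => ({ type: "FIELD", name, content });

// `$` stands for the whole grammar: reading `$.foo` yields a
// reference to the rule named `foo`.
const $ = new Proxy({}, {
  get: (_target, name) => ({ type: "SYMBOL", name }),
});

// Hypothetical rules, for illustration only.
const rules = {
  function_definition: ($) => seq(
    field("declarator", $.declarator),
    field("body", $.compound_statement)
  ),
  return_statement: ($) => seq("return", optional($.expression), ";"),
};

// Calling a rule function with `$` produces a description of the rule.
const functionDefinition = rules.function_definition($);
const fieldNames = functionDefinition.members.map((m) => m.name);
console.log(fieldNames.join(", "));
```

Under these assumptions, reading a real `grammar.js` comes down to the same exercise: find the rule named after the node type of interest, then look at which `field` names and child rules its body combines.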