diff options
Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Language-Definitions.html')
-rw-r--r-- | admin/notes/tree-sitter/html-manual/Language-Definitions.html | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html new file mode 100644 index 00000000000..ba3eeb9eeb9 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -0,0 +1,326 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ --> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<!-- This is the GNU Emacs Lisp Reference Manual +corresponding to Emacs version 29.0.50. + +Copyright © 1990-1996, 1998-2022 Free Software Foundation, +Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with the +Invariant Sections being "GNU General Public License," with the +Front-Cover Texts being "A GNU Manual," and with the Back-Cover +Texts as in (a) below. A copy of the license is included in the +section entitled "GNU Free Documentation License." + +(a) The FSF's Back-Cover Text is: "You have the freedom to copy and +modify this GNU manual. Buying copies from the FSF supports it in +developing GNU and promoting software freedom." --> +<title>Language Definitions (GNU Emacs Lisp Reference Manual)</title> + +<meta name="description" content="Language Definitions (GNU Emacs Lisp Reference Manual)"> +<meta name="keywords" content="Language Definitions (GNU Emacs Lisp Reference Manual)"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<meta name="viewport" content="width=device-width,initial-scale=1"> + +<link href="index.html" rel="start" title="Top"> +<link href="Index.html" rel="index" title="Index"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source"> +<link href="Using-Parser.html" rel="next" title="Using Parser"> +<style type="text/css"> +<!-- +a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em} +a.summary-letter {text-decoration: none} +blockquote.indentedblock {margin-right: 0em} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +kbd {font-style: oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +span.nolinebreak {white-space: nowrap} +span.roman {font-family: initial; font-weight: normal} +span.sansserif {font-family: sans-serif; font-weight: normal} +span:hover a.copiable-anchor {visibility: visible} +ul.no-bullet {list-style: none} +--> +</style> +<link rel="stylesheet" type="text/css" href="./manual.css"> + + +</head> + +<body lang="en"> +<div class="section" id="Language-Definitions"> +<div class="header"> +<p> +Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3> + +<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3> + +<p>Tree-sitter relies on language definitions to parse text in that +language. In Emacs, A language definition is represented by a symbol. +For example, C language definition is represented as <code>c</code>, and +<code>c</code> can be passed to tree-sitter functions as the <var>language</var> +argument. +</p> +<span id="index-treesit_002dextra_002dload_002dpath"></span> +<span id="index-treesit_002dload_002dlanguage_002derror"></span> +<span id="index-treesit_002dload_002dsuffixes"></span> +<p>Tree-sitter language definitions are distributed as dynamic libraries. +In order to use a language definition in Emacs, you need to make sure +that the dynamic library is installed on the system. Emacs looks for +language definitions under load paths in +<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter, +and system default locations for dynamic libraries, in that order. +Emacs tries each extensions in <code>treesit-load-suffixes</code>. If Emacs +cannot find the library or has problem loading it, Emacs signals +<code>treesit-load-language-error</code>. The signal data is a list of +specific error messages. +</p> +<dl class="def"> +<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This function checks whether the dynamic library for <var>language</var> is +present on the system, and return non-nil if it is. +</p></dd></dl> + +<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span> +<p>By convention, the dynamic library for <var>language</var> is +<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the +system-specific extension for dynamic libraries. Also by convention, +the function provided by that library is named +<code>tree_sitter_<var>language</var></code>. If a language definition doesn’t +follow this convention, you should add an entry +</p> +<div class="example"> +<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>) +</pre></div> + +<p>to <code>treesit-load-name-override-list</code>, where +<var>library-base-name</var> is the base filename for the dynamic library +(conventionally <code>libtree-sitter-<var>language</var></code>), and +<var>function-name</var> is the function provided by the library +(conventionally <code>tree_sitter_<var>language</var></code>). For example, +</p> +<div class="example"> +<pre class="example">(cool-lang "libtree-sitter-coool" "tree_sitter_cooool") +</pre></div> + +<p>for a language too cool to abide by conventions. +</p> +<dl class="def"> +<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>Tree-sitter library has a <em>language version</em>, a language +definition’s version needs to match this version to be compatible. +</p> +<p>This function returns tree-sitter library’s language version. If +<var>min-compatible</var> is non-nil, it returns the minimal compatible +version. +</p></dd></dl> + +<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3> + +<p>A syntax tree is what a parser generates. In a syntax tree, each node +represents a piece of text, and is connected to each other by a +parent-child relationship. For example, if the source text is +</p> +<div class="example"> +<pre class="example">1 + 2 +</pre></div> + +<p>its syntax tree could be +</p> +<div class="example"> +<pre class="example"> +--------------+ + | root "1 + 2" | + +--------------+ + | + +--------------------------------+ + | expression "1 + 2" | + +--------------------------------+ + | | | ++------------+ +--------------+ +------------+ +| number "1" | | operator "+" | | number "2" | ++------------+ +--------------+ +------------+ +</pre></div> + +<p>We can also represent it in s-expression: +</p> +<div class="example"> +<pre class="example">(root (expression (number) (operator) (number))) +</pre></div> + +<span id="Node-types"></span><h4 class="subheading">Node types</h4> + +<span id="index-tree_002dsitter-node-type"></span> +<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span> +<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span> +<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, +<code>operator</code> are nodes’ <em>type</em>. However, not all nodes in a +syntax tree have a type. Nodes that don’t are <em>anonymous nodes</em>, +and nodes with a type are <em>named nodes</em>. Anonymous nodes are +tokens with fixed spellings, including punctuation characters like +bracket ‘<samp>]</samp>’, and keywords like <code>return</code>. +</p> +<span id="Field-names"></span><h4 class="subheading">Field names</h4> + +<span id="index-tree_002dsitter-node-field-name"></span> +<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to +analyze, many language definitions assign <em>field names</em> to child +nodes. For example, a <code>function_definition</code> node could have a +<code>declarator</code> and a <code>body</code>: +</p> +<div class="example"> +<pre class="example">(function_definition + declarator: (declaration) + body: (compound_statement)) +</pre></div> + +<dl class="def"> +<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> ¶</a></span></dt> +<dd><p>This minor mode displays the node that <em>starts</em> at point in +mode-line. The mode-line will display +</p> +<div class="example"> +<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...))) +</pre></div> + +<p><var>child</var>, <var>grand-child</var>, and <var>grand-grand-child</var>, etc, are +nodes that have their beginning at point. And <var>parent</var> is the +parent of <var>child</var>. +</p> +<p>If there is no node that starts at point, i.e., point is in the middle +of a node, then the mode-line only displays the smallest node that +spans point, and its immediate parent. +</p> +<p>This minor mode doesn’t create parsers on its own. It simply uses the +first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>). +</p></dd></dl> + +<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3> + +<p>Authors of language definitions define the <em>grammar</em> of a +language, and this grammar determines how does a parser construct a +concrete syntax tree out of the text. In order to use the syntax +tree effectively, we need to read the <em>grammar file</em>. +</p> +<p>The grammar file is usually <code>grammar.js</code> in a language +definition’s project repository. The link to a language definition’s +home page can be found in tree-sitter’s homepage +(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>). +</p> +<p>The grammar is written in JavaScript syntax. For example, the rule +matching a <code>function_definition</code> node looks like +</p> +<div class="example"> +<pre class="example">function_definition: $ => seq( + $.declaration_specifiers, + field('declarator', $.declaration), + field('body', $.compound_statement) +) +</pre></div> + +<p>The rule is represented by a function that takes a single argument +<var>$</var>, representing the whole grammar. The function itself is +constructed by other functions: the <code>seq</code> function puts together a +sequence of children; the <code>field</code> function annotates a child with +a field name. If we write the above definition in BNF syntax, it +would look like +</p> +<div class="example"> +<pre class="example">function_definition := + <declaration_specifiers> <declaration> <compound_statement> +</pre></div> + +<p>and the node returned by the parser would look like +</p> +<div class="example"> +<pre class="example">(function_definition + (declaration_specifier) + declarator: (declaration) + body: (compound_statement)) +</pre></div> + +<p>Below is a list of functions that one will see in a grammar +definition. Each function takes other rules as arguments and returns +a new rule. +</p> +<ul> +<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another. + +</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its +arguments. + +</li><li> <code>repeat(rule)</code> matches <var>rule</var> for <em>zero or more</em> times. +This is like the ‘<samp>*</samp>’ operator in regular expressions. + +</li><li> <code>repeat1(rule)</code> matches <var>rule</var> for <em>one or more</em> times. +This is like the ‘<samp>+</samp>’ operator in regular expressions. + +</li><li> <code>optional(rule)</code> matches <var>rule</var> for <em>zero or one</em> time. +This is like the ‘<samp>?</samp>’ operator in regular expressions. + +</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child +node matched by <var>rule</var>. + +</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as +<var>alias</var> in the syntax tree generated by the parser. For example, + +<div class="example"> +<pre class="example">alias(preprocessor_call_exp, call_expression) +</pre></div> + +<p>makes any node matched by <code>preprocessor_call_exp</code> to appear as +<code>call_expression</code>. +</p></li></ul> + +<p>Below are grammar functions less interesting for a reader of a +language definition. +</p> +<ul> +<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node. +That is, instead of generating a parent node with individual child +nodes under it, everything is combined into a single leaf node. + +</li><li> Normally, grammar rules ignore preceding whitespaces, +<code>token.immediate(rule)</code> changes <var>rule</var> to match only when +there is no preceding whitespaces. + +</li><li> <code>prec(n, rule)</code> gives <var>rule</var> a level <var>n</var> precedence. + +</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative, +optionally with level <var>n</var>. + +</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative, +optionally with level <var>n</var>. + +</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence +is applied at runtime instead. +</li></ul> + +<p>The tree-sitter project talks about writing a grammar in more detail: +<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>. +Read especially “The Grammar DSL” section. +</p> +</div> +<hr> +<div class="header"> +<p> +Next: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html> |