

24.6.10 Parser-based Font Lock

Besides simple syntactic font lock and regexp-based font lock, Emacs also provides complete syntactic font lock with the help of a parser. Currently, Emacs uses the tree-sitter library (see Parsing Program Source) for this purpose.

Parser-based font lock and other font lock mechanisms are not mutually exclusive. By default, if enabled, parser-based font lock runs first and replaces syntactic font lock; regexp-based font lock then runs after it.

Although parser-based font lock doesn’t share the same customization variables with regexp-based font lock, it uses similar customization schemes. The tree-sitter counterpart of font-lock-keywords is treesit-font-lock-settings.

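In general, tree-sitter fontification works as follows: first, a Lisp program (usually part of a major mode) provides a query consisting of patterns, each marked with a capture name; next, the tree-sitter parser finds the nodes in the parse tree that match these patterns, tags them with the corresponding capture names, and returns them to the Lisp program; finally, the Lisp program fontifies the buffer text corresponding to each returned node, using the node’s capture name to determine the face.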

For more information about queries, patterns, and capture names, see Pattern Matching Tree-sitter Nodes.

To set up tree-sitter fontification, a major mode should first set treesit-font-lock-settings with the output of treesit-font-lock-rules, then call treesit-major-mode-setup.

Function: treesit-font-lock-rules :keyword value query...

This function is used to set treesit-font-lock-settings. It takes care of compiling queries and other post-processing, and outputs a value that treesit-font-lock-settings accepts. Here’s an example:

(treesit-font-lock-rules
 :language 'javascript
 :feature 'constant
 :override t
 '((true) @font-lock-constant-face
   (false) @font-lock-constant-face)
 :language 'html
 :feature 'script
 "(script_element) @font-lock-builtin-face")

This function takes a series of queries, each written as a string or as an s-expression. Before each query, there are :keyword value pairs that configure that query. The :language keyword sets the query’s language, and every query must specify the language. The :feature keyword sets the feature name of the query. Users can control which features are enabled with font-lock-maximum-decoration and treesit-font-lock-feature-list (see below).

Other keywords are optional:

Keyword     Value     Description
:override   nil       If the region already has a face, discard the new face
            t         Always apply the new face
            append    Append the new face to existing ones
            prepend   Prepend the new face to existing ones
            keep      Fill in regions without an existing face
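
As an illustration of the :override values above, here is a minimal sketch (not taken from any existing mode) in which two rules match the same node. It assumes the comment node of the javascript grammar, as in the other examples in this section, and that rules are applied in the order given; pairing the doc feature with font-lock-doc-face is only an assumption made for illustration.

```elisp
(treesit-font-lock-rules
 ;; First rule: no :override is given (the same as nil), so the
 ;; face is applied only where the text has no face yet.
 :language 'javascript
 :feature 'comment
 '((comment) @font-lock-comment-face)

 ;; Second rule: the node already carries a face from the rule
 ;; above.  With `append', the new face is added after the
 ;; existing one instead of being discarded.
 :language 'javascript
 :feature 'doc
 :override 'append
 '((comment) @font-lock-doc-face))
```

Because keywords configure only the query that follows them, the second rule repeats :language.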

Lisp programs mark patterns in the query with capture names (names that start with @), and tree-sitter will return matched nodes tagged with those same capture names. For the purpose of fontification, capture names in the query should be face names like font-lock-keyword-face. The captured node will be fontified with that face.

Capture names can also be function names, in which case the function is called with 4 arguments: node, override, start, and end, where node is the node itself, override is the override property of the rule which captured this node, and start and end limit the region in which this function should fontify. (If this function wants to respect the override argument, it can use treesit-fontify-with-override.)

Beyond the 4 arguments presented, this function should accept more arguments as optional arguments for future extensibility.

If a capture name is both a face and a function, the face takes priority. If a capture name is neither a face nor a function, it is ignored.
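
A capture-name function of the kind described above might look like the following sketch. The names my-mode--fontify-comment and my-mode--comment-rules are hypothetical, and the TODO check is only an example of a decision that a plain face name could not express. The sketch assumes that treesit-fontify-with-override accepts a start position, an end position, a face, and the override value, in that order, and it clamps the node’s extent to start and end by hand.

```elisp
(require 'treesit)

;; Hypothetical helper name; not part of Emacs.
(defun my-mode--fontify-comment (node override start end &rest _)
  "Fontify comment NODE, using a warning face if it mentions TODO.
OVERRIDE is the override property of the rule that captured NODE.
START and END bound the region to fontify.  The `&rest _' absorbs
any arguments that may be added in the future."
  (let ((beg (max start (treesit-node-start node)))
        (fin (min end (treesit-node-end node)))
        (case-fold-search nil))
    ;; Do nothing if NODE lies entirely outside START..END.
    (when (< beg fin)
      (treesit-fontify-with-override
       beg fin
       (if (string-match-p "TODO" (treesit-node-text node t))
           'font-lock-warning-face
         'font-lock-comment-face)
       override))))

;; Hypothetical wrapper, so that the query is compiled only when a
;; mode actually asks for the rules.
(defun my-mode--comment-rules ()
  "Return font-lock rules that use `my-mode--fontify-comment'."
  (treesit-font-lock-rules
   :language 'javascript
   :feature 'comment
   '((comment) @my-mode--fontify-comment)))
```

The capture name is simply the function’s name prefixed with @, just as with a face name.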

Contextual entities, like multi-line strings or /* */ style comments, need special care, because a change in such an entity might change the fontification of a large portion of the buffer. For example, inserting the closing comment delimiter */ will change all the text between it and the opening delimiter to comment face. Such entities should be captured with the special name contextual, so Emacs can correctly update their fontification. Here is an example for comments:

(treesit-font-lock-rules
 :language 'javascript
 :feature 'comment
 :override t
 '((comment) @font-lock-comment-face
   (comment) @contextual))

Variable: treesit-font-lock-feature-list

This is a list of lists of feature symbols. Each element of the list is a list that represents a decoration level. font-lock-maximum-decoration controls which levels are activated.

Each element of the list is a list of the form (feature …), where each feature corresponds to the :feature value of a query defined in treesit-font-lock-rules. Removing a feature symbol from this list disables the corresponding query during font-lock.

Common feature names, for many programming languages, include function-name, type, variable-name (left-hand-side or LHS of assignments), builtin, constant, keyword, string-interpolation, comment, doc, string, operator, preprocessor, escape-sequence, and key (in key-value pairs). Major modes are free to subdivide or extend these common features.

For example, the value of this variable could be:

((comment string doc) ; level 1
 (function-name keyword type builtin constant) ; level 2
 (variable-name string-interpolation key)) ; level 3

Major modes should set this variable before calling treesit-major-mode-setup.
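
Putting these steps together, a major mode’s setup could be sketched as follows. The mode name my-js-ts-mode is hypothetical. The calls to treesit-ready-p and treesit-parser-create are assumptions that are not described in this section; the sketch assumes Emacs 29 or later, where treesit-ready-p takes the language as its first argument.

```elisp
(require 'treesit)

;; `my-js-ts-mode' is a hypothetical mode, for illustration only.
(define-derived-mode my-js-ts-mode prog-mode "MyJS"
  "Hypothetical major mode showing tree-sitter font-lock setup."
  ;; Proceed only if tree-sitter and the grammar are usable.
  (when (treesit-ready-p 'javascript)
    (treesit-parser-create 'javascript)
    ;; 1. Build the settings from the rules.
    (setq-local treesit-font-lock-settings
                (treesit-font-lock-rules
                 :language 'javascript
                 :feature 'comment
                 '((comment) @font-lock-comment-face)
                 :language 'javascript
                 :feature 'constant
                 '((true) @font-lock-constant-face
                   (false) @font-lock-constant-face)))
    ;; 2. Group the features into decoration levels.
    (setq-local treesit-font-lock-feature-list
                '((comment)     ; level 1
                  (constant)))  ; level 2
    ;; 3. Let Emacs install the font-lock machinery.
    (treesit-major-mode-setup)))
```

Every feature named in the rules appears in exactly one level of the feature list, and both variables are set before treesit-major-mode-setup is called.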

For this variable to take effect, a Lisp program should call treesit-font-lock-recompute-features (which resets treesit-font-lock-settings accordingly), or treesit-major-mode-setup (which calls treesit-font-lock-recompute-features).
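
For instance, a user who wants to turn off one feature could remove it from the list and then recompute, as in this sketch. The function name and the hook my-lang-ts-mode-hook are hypothetical; the sketch assumes the hook runs after the mode has called treesit-major-mode-setup, and that treesit-font-lock-recompute-features can be called with no arguments.

```elisp
(require 'treesit)

;; Hypothetical function and hook names, for illustration only.
(defun my-disable-constant-feature ()
  "Stop fontifying the `constant' feature in the current buffer."
  ;; Remove the feature symbol from every decoration level.
  (setq-local treesit-font-lock-feature-list
              (mapcar (lambda (level) (remq 'constant level))
                      treesit-font-lock-feature-list))
  ;; Make the change take effect.
  (treesit-font-lock-recompute-features))

(add-hook 'my-lang-ts-mode-hook #'my-disable-constant-feature)
```

Text that was already fontified may keep its old faces until it is refontified, for example with font-lock-flush.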

Variable: treesit-font-lock-settings

A list of settings for tree-sitter based font lock. The exact format of this variable is considered internal. One should always use treesit-font-lock-rules to set this variable.

Multi-language major modes should provide range functions in treesit-range-functions, and Emacs will set the ranges accordingly before fontifying a region (see Parsing Text in Multiple Languages).

