Don't distort character ranges in rx translation

author	Mattias Engdegård <mattiase@acm.org>	2023-07-17 13:05:21 +0200
committer	Mattias Engdegård <mattiase@acm.org>	2023-07-17 17:56:54 +0200
commit	157e735ce89ede9cc939f4ed0f72c5af7ae60735 (patch)
tree	58c17cfd219647bec129ccbc77ced41233d65844 /test/lisp/emacs-lisp/lisp-mode-tests.el
parent	7446a8c34e2b793df52dbf56b630e20f8c10568c (diff)
download	emacs-157e735ce89ede9cc939f4ed0f72c5af7ae60735.tar.gz emacs-157e735ce89ede9cc939f4ed0f72c5af7ae60735.tar.bz2 emacs-157e735ce89ede9cc939f4ed0f72c5af7ae60735.zip

The Emacs regexp engine interprets character ranges from ASCII to raw bytes, such as [a-\xfe], as not including non-ASCII Unicode at all; ranges from non-ASCII Unicode to raw bytes, such as [ü-\x91], are ignored entirely.

To make rx produce a translation that works as intended, split ranges that go from ordinary characters to raw bytes. Such ranges may appear from set manipulation and regexp optimisation.

* lisp/emacs-lisp/rx.el (rx--generate-alt): Split intervals that straddle the char-raw boundary when rendering a string regexp from an interval set.
* test/lisp/emacs-lisp/rx-tests.el (rx-char-any-raw-byte): Add test cases.
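
The splitting step can be illustrated with a minimal sketch. This is not the code from rx.el: the names my-first-raw-byte and my-split-interval are hypothetical, and the sketch assumes an interval is a cons (FROM . TO) of character codes. It relies only on the fact that Emacs represents the raw bytes \x80..\xff as the character codes #x3fff80..#x3fffff, directly above the last ordinary character #x3fff7f.

```elisp
;; Illustrative sketch only; not the implementation in lisp/emacs-lisp/rx.el.
;; All names here are hypothetical.

(defconst my-first-raw-byte #x3fff80
  "Character code that Emacs uses for the raw byte 80 (hex).")

(defun my-split-interval (interval)
  "Split INTERVAL, a cons (FROM . TO), at the char-raw boundary.
Return a list of one or two intervals, none of which runs from an
ordinary character to a raw byte."
  (let ((from (car interval))
        (to (cdr interval)))
    (if (and (< from my-first-raw-byte)
             (>= to my-first-raw-byte))
        ;; The interval straddles the boundary: cut it in two.
        (list (cons from (1- my-first-raw-byte))
              (cons my-first-raw-byte to))
      ;; Entirely ordinary characters or entirely raw bytes: keep as is.
      (list interval))))

;; An interval shaped like [a-\xfe] straddles the boundary and is split:
(my-split-interval (cons ?a #x3ffffe))
;; The result is `equal' to '((?a . #x3fff7f) (#x3fff80 . #x3ffffe)).
```

Each resulting piece can then be rendered as its own range, so that no single range in the generated regexp goes from an ordinary character to a raw byte.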

Diffstat (limited to 'test/lisp/emacs-lisp/lisp-mode-tests.el')

0 files changed, 0 insertions, 0 deletions
