author Eli Zaretskii <eliz@gnu.org> 2019-03-09 12:41:48 +0200
committer Eli Zaretskii <eliz@gnu.org> 2019-03-09 12:41:48 +0200
commit fddb915d234515af81dce30982a8dd22568b4e84
tree 7fc4d497bd317df930e6f492a6bddbf0ba1e5b96 /admin/notes
parent 4e082ce3941a9c1fcaae509897761d3e24e08625
Import Unicode 12.0 data files

* admin/unidata/copyright.html:
* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/Blocks.txt:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: New versions from Unicode 12.0.
* admin/unidata/unidata-gen.el (unidata-gen-file):
* admin/unidata/blocks.awk (name2alias): Adapt to changes in new
data files.
* admin/notes/unicode: Update and improve instructions for
importing a new Unicode Standard.
* lisp/international/characters.el (char-width-table): Update
lists of characters according to Unicode 12.0.
* lisp/international/fontset.el (script-representative-chars): Add
characters from new scripts to 'script-representative-chars'.
(otf-script-alist): Update according to data on the MS site.
* lisp/international/mule-cmds.el (ucs-names): Update unused
ranges of codepoints according to Unicode 12.0.
* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part1)
(ucs-normalize-tests--failing-lines-part2): Update for the new
NormalizationTest.txt file.
* test/manual/BidiCharacterTest.txt: Update with the new version
from Unicode 12.0.

Diffstat (limited to 'admin/notes')
-rw-r--r--  admin/notes/unicode  23
1 files changed, 18 insertions, 5 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index bbee3e9de7f..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
. UnicodeData.txt
. Blocks.txt
- . BidiMirroring.txt
. BidiBrackets.txt
+ . BidiCharacterTest.txt
+ . BidiMirroring.txt
. IVD_Sequences.txt
. NormalizationTest.txt
. SpecialCasing.txt
- . BidiCharacterTest.txt
First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect. Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect. Rebuilding
Emacs updates several derived files elsewhere in the Emacs source
tree, mainly in lisp/international/.
@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
new bidirectional attributes of characters, because unidata-gen.el,
bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay. unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.
Next, review the changes in UnicodeData.txt vs the previous version
used by Emacs. Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
The setting of char-width-table around line 1200 of characters.el
should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file. Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").
Any new scripts added by UnicodeData.txt will also need updates to
script-representative-chars defined in fontset.el, and also the list