Import Unicode 12.0 data files

* admin/unidata/copyright.html:
* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/Blocks.txt:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: New versions from Unicode 12.0.
* admin/unidata/unidata-gen.el (unidata-gen-file):
* admin/unidata/blocks.awk (name2alias): Adapt to changes in new data
files.
* admin/notes/unicode: Update and improve instructions for importing a
new Unicode Standard.
* lisp/international/characters.el (char-width-table): Update lists of
characters according to Unicode 12.0.
* lisp/international/fontset.el (script-representative-chars): Add
characters from new scripts to 'script-representative-chars'.
(otf-script-alist): Update according to data on the MS site.
* lisp/international/mule-cmds.el (ucs-names): Update unused ranges of
codepoints according to Unicode 12.0.
* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part1)
(ucs-normalize-tests--failing-lines-part2): Update for the new
NormalizationTest.txt file.
* test/manual/BidiCharacterTest.txt: Update with the new version from
Unicode 12.0.
author: Eli Zaretskii <eliz@gnu.org> 2019-03-09 12:41:48 +0200
committer: Eli Zaretskii <eliz@gnu.org> 2019-03-09 12:41:48 +0200
commit: fddb915d234515af81dce30982a8dd22568b4e84 (patch)
tree: 7fc4d497bd317df930e6f492a6bddbf0ba1e5b96 /admin/notes
parent: 4e082ce3941a9c1fcaae509897761d3e24e08625 (diff)
download: emacs-fddb915d234515af81dce30982a8dd22568b4e84.tar.gz
emacs-fddb915d234515af81dce30982a8dd22568b4e84.tar.bz2
emacs-fddb915d234515af81dce30982a8dd22568b4e84.zip
1 files changed, 18 insertions, 5 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index bbee3e9de7f..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
 
   . UnicodeData.txt
   . Blocks.txt
-  . BidiMirroring.txt
   . BidiBrackets.txt
+  . BidiCharacterTest.txt
+  . BidiMirroring.txt
   . IVD_Sequences.txt
   . NormalizationTest.txt
   . SpecialCasing.txt
-  . BidiCharacterTest.txt
 
 First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect.  Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect.  Rebuilding
 Emacs updates several derived files elsewhere in the Emacs source
 tree, mainly in lisp/international/.
 
@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages.  In particular,
 admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
 new bidirectional attributes of characters, because unidata-gen.el,
 bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay.  unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.
 
 Next, review the changes in UnicodeData.txt vs the previous version
 used by Emacs.  Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
 
 The setting of char-width-table around line 1200 of characters.el
 should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file.  Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").
 
 Any new scripts added by UnicodeData.txt will also need updates to
 script-representative-chars defined in fontset.el, and also the list
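
The char-width-table rules added in the last hunk above (double width for characters marked W or F in EastAsianWidth.txt; zero width for General Category Mn, Me, or Cf plus Hangul jungseong and jongseong) could be cross-checked with a small script along these lines. This is only a sketch and not part of the commit: the function names wide_ranges and is_zero_width are hypothetical, the Hangul range U+1160..U+11FF is an assumption that covers only the main Hangul Jamo block, and Python's unicodedata module reflects whatever Unicode version the interpreter ships with, which may not be 12.0.

```python
import re
import unicodedata

# Data lines of EastAsianWidth.txt look like "XXXX;W" or "XXXX..YYYY;W",
# possibly with spaces around the semicolon; comment lines start with '#'.
_EAW_LINE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def wide_ranges(lines):
    """Return sorted, merged (start, end) ranges marked W or F."""
    found = []
    for line in lines:
        m = _EAW_LINE.match(line)
        if not m or m.group(3) not in ("W", "F"):
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)
        found.append((start, end))

    merged = []
    for start, end in sorted(found):
        # Join ranges that overlap or are directly adjacent.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_zero_width(cp):
    """Apply the zero-width rule from the notes to one code point."""
    if unicodedata.category(chr(cp)) in ("Mn", "Me", "Cf"):
        return True
    # Assumption: Hangul jungseong and jongseong in the main
    # Hangul Jamo block, U+1160..U+11FF.
    return 0x1160 <= cp <= 0x11FF
```

To use it, pass the lines of a downloaded EastAsianWidth.txt to wide_ranges and compare the result by hand with the double-width ranges in characters.el; is_zero_width can be applied to individual code points in the same way.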