author Pierre Schmitz <pierre@archlinux.de> 2015-06-04 07:31:04 +0200
committer Pierre Schmitz <pierre@archlinux.de> 2015-06-04 07:58:39 +0200
commit f6d65e533c62f6deb21342d4901ece24497b433e (patch)
tree f28adf0362d14bcd448f7b65a7aaf38650f923aa /vendor/wikimedia/utfnormal/README.md
parent c27b2e832fe25651ef2410fae85b41072aae7519 (diff)
Update to MediaWiki 1.25.1
Diffstat (limited to 'vendor/wikimedia/utfnormal/README.md')
-rw-r--r-- vendor/wikimedia/utfnormal/README.md 69
1 files changed, 69 insertions, 0 deletions
diff --git a/vendor/wikimedia/utfnormal/README.md b/vendor/wikimedia/utfnormal/README.md
new file mode 100644
index 00000000..8e4b0372
--- /dev/null
+++ b/vendor/wikimedia/utfnormal/README.md
@@ -0,0 +1,69 @@
+[![Latest Stable Version](https://poser.pugx.org/wikimedia/utfnormal/v/stable.svg)](https://packagist.org/packages/wikimedia/utfnormal) [![License](https://poser.pugx.org/wikimedia/utfnormal/license.svg)](https://packagist.org/packages/wikimedia/utfnormal)
+
+utfnormal
+=========
+
+utfnormal is a library that contains Unicode normalization routines, including
+both pure PHP implementations and automatic use of the 'intl' PHP extension when
+ present.
+
+The main function to care about is UtfNormal\Validator::cleanUp(). This will
+strip illegal UTF-8 sequences and characters that are illegal in XML, and
+if necessary convert to normalization form C.
+
+If you know the string is already valid UTF-8, you can directly call
+UtfNormal\Validator::toNFC(), toNFK(), or toNFKC(); this will convert a given
+UTF-8 string to Normalization Form C, K, or KC if it's not already such.
+The function assumes that the input string is already valid UTF-8; if there
+are corrupt characters this may produce erroneous results.
+
+Performance is kind of stinky in absolute terms, though it should be speedy
+on pure ASCII text. ;) On text that can be determined quickly to already be
+in NFC it's not too awful but it can quickly get uncomfortably slow,
+particularly for Korean text (the hangul decomposition/composition code is
+extra slow).
+
+Bugs should be filed in [Wikimedia's Phabricator] under the "utfnormal" project.
+
+
+Regenerating data tables
+------------------------
+UtfNormalData.inc and UtfNormalDataK.inc are generated from the Unicode
+Character Database by the script "generate.php". Run "composer generate"
+to rebuild the tables. To fetch updated unicode data from the internet,
+run "composer generate -- --fetch".
+
+
+Testing
+-------
+
+Running "composer test" will run a syntax checker, PHPUnit conformance tests,
+and run some benchmarks using sample texts from Wikipedia. Take all benchmark
+numbers with large grains of salt.
+
+
+PHP module extension
+--------------------
+
+If the 'intl' PHP extension is present, ICU library functions are used which
+are *MUCH* faster than doing this work in pure PHP code.
+
+It is strongly recommended to enable this module if possible:
+http://php.net/manual/en/intro.intl.php
+
+Older versions of this library supported a one-off custom PHP extension,
+which has been dropped. If you were using this, please migrate to the
+intl extension.
+
+
+History
+-------
+This library was first introduced in [MediaWiki 1.3][] ([r4965]). It was
+split out of the MediaWiki codebase and published as an independent library
+during the [MediaWiki 1.25][] development cycle.
+
+---
+[Wikimedia's Phabricator]: https://phabricator.wikimedia.org/maniphest/task/create/?projects=utfnormal
+[MediaWiki 1.3]: https://www.mediawiki.org/wiki/MediaWiki_1.3
+[r4965]: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/4965
+[MediaWiki 1.25]: https://www.mediawiki.org/wiki/MediaWiki_1.25