UnicodeToUtf8 does not convert UTF-16 surrogate pairs correctly
Original Reporter info from Mantis: Milla
- Reporter name:
Description:
The UnicodeToUtf8 function in wstrings.inc does not correctly convert a UTF-16 surrogate pair to UTF-8. Instead, each half of the surrogate pair is converted individually, producing two three-byte sequences where UTF-8 requires a single four-byte sequence (this output is known as CESU-8). This patch updates the function to recognise surrogate pairs.
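To illustrate the difference, here is a small Python sketch using U+1F600 as an arbitrary example character (it is not taken from the report). It combines the surrogate pair into one code point and compares the correct UTF-8 bytes with the CESU-8 bytes that per-surrogate conversion produces. Python's `"surrogatepass"` error handler is used only to reproduce the buggy per-surrogate output; this is not how wstrings.inc works internally.

```python
# U+1F600 encoded in UTF-16 is the surrogate pair D83D DE00.
high, low = 0xD83D, 0xDE00

# Combine the pair into a single supplementary code point.
cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(cp))  # 0x1f600

# Correct UTF-8: one four-byte sequence for the whole pair.
utf8 = chr(cp).encode("utf-8")
print(utf8.hex(" "))  # f0 9f 98 80

# CESU-8 (the reported behaviour): each surrogate is encoded on its own
# as a three-byte sequence. Python only permits this with "surrogatepass".
cesu8 = (chr(high) + chr(low)).encode("utf-8", "surrogatepass")
print(cesu8.hex(" "))  # ed a0 bd ed b8 80
```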
Additional information:
High surrogates that are not followed by a low surrogate are silently dropped.
Likewise, low surrogates that are not preceded by a high surrogate are dropped.
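The conversion rules described above (combine valid pairs, drop unpaired surrogates) can be sketched in Python as follows. This is a minimal illustration, not the patched Free Pascal code: the input is modelled as a list of 16-bit code unit values, and the names `encode_code_point` and `utf16_units_to_utf8` are hypothetical.

```python
from typing import Sequence


def encode_code_point(cp: int) -> bytes:
    """Encode one non-surrogate code point (U+0000..U+10FFFF) as UTF-8."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf16_units_to_utf8(units: Sequence[int]) -> bytes:
    """Convert UTF-16 code units to UTF-8, dropping unpaired surrogates."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
                # Valid pair: combine into one code point, emit four bytes.
                cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
                out += encode_code_point(cp)
                i += 2
                continue
            # High surrogate not followed by a low surrogate: dropped.
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate not preceded by a high surrogate: dropped.
            pass
        else:
            out += encode_code_point(u)
        i += 1
    return bytes(out)


print(utf16_units_to_utf8([0x41, 0xD83D, 0xDE00, 0x42]))  # b'A\xf0\x9f\x98\x80B'
print(utf16_units_to_utf8([0xD83D, 0x41, 0xDE00]))        # b'A'
```

Dropping unpaired surrogates mirrors the behaviour reported here; other converters substitute U+FFFD instead, which would be a one-line change at the two "dropped" branches.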
It is likely that Utf8ToUnicode will also need to be updated, though I haven't checked whether this is the case.
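If Utf8ToUnicode does need the matching fix, the decoder has to do the inverse: decode a four-byte UTF-8 sequence into a code point above U+FFFF and split it into a high and a low surrogate. The Python sketch below shows only that step, under the assumption that the input sequence is already well-formed; both function names are hypothetical and unrelated to the actual RTL code.

```python
def decode_utf8_4byte(seq: bytes) -> int:
    """Decode one well-formed 4-byte UTF-8 sequence (lead byte F0..F4)."""
    # Hypothetical helper: continuation bytes are not validated here.
    return (((seq[0] & 0x07) << 18) | ((seq[1] & 0x3F) << 12)
            | ((seq[2] & 0x3F) << 6) | (seq[3] & 0x3F))


def code_point_to_utf16(cp: int) -> list:
    """Split a code point into one or two UTF-16 code units."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000  # 20-bit value shared between the two surrogates
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


units = code_point_to_utf16(decode_utf8_4byte(b"\xf0\x9f\x98\x80"))
print([hex(u) for u in units])  # ['0xd83d', '0xde00']
```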
This issue is probably related to issue 0013067.
Mantis conversion info:
- Mantis ID: 13075
- Version: 2.2.2
- Fixed in version: 2.4.0
- Fixed in revision: 12902 (#d67dbcf0)
- Target version: 2.4.0