UnicodeToUtf8 does not convert UTF-16 surrogate pairs correctly

Original Reporter info from Mantis: Milla

Reporter name:

Description:

The UnicodeToUtf8 function in wstrings.inc does not correctly convert a UTF-16 surrogate pair to UTF-8. Instead, each half of the surrogate pair is converted individually, which yields CESU-8 rather than valid UTF-8. This patch updates the function to recognise surrogate pairs.
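
To make the difference concrete, here is a small illustration in Python (not the Pascal code from wstrings.inc; the helper names `encode_code_point` and `combine_surrogates` are hypothetical). It encodes the surrogate pair for U+1F600 both ways: each half on its own, as the description says the function currently does, and as a single combined code point.

```python
def encode_code_point(cp: int) -> bytes:
    """Apply the generic UTF-8 bit layout to one integer value (1 to 4 bytes)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def combine_surrogates(high: int, low: int) -> int:
    """Turn a UTF-16 high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = 0xD83D, 0xDE00  # UTF-16 encoding of U+1F600

# Each half encoded on its own: six bytes, CESU-8 style.
cesu8 = encode_code_point(high) + encode_code_point(low)

# Pair combined first, then encoded: four bytes, valid UTF-8.
utf8 = encode_code_point(combine_surrogates(high, low))

print(cesu8.hex(" "))  # ed a0 bd ed b8 80
print(utf8.hex(" "))   # f0 9f 98 80
```

A strict UTF-8 decoder rejects the six-byte form, which is why the two outputs are not interchangeable.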

Additional information:

High surrogates that are not followed by low surrogates are silently dropped.

Similarly, low surrogates that are not preceded by high surrogates are also dropped.
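
The handling of unpaired surrogates described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patch itself: the function name `utf16_units_to_utf8` is hypothetical, and the input is taken to be a list of 16-bit code units.

```python
def utf16_units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8, dropping unpaired surrogates."""
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: only usable if a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out += chr(cp).encode("utf-8")
                i += 2
                continue
            # Unpaired high surrogate: silently dropped.
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate: silently dropped.
            pass
        else:
            # Ordinary BMP code unit: encode directly.
            out += chr(unit).encode("utf-8")
        i += 1
    return bytes(out)


print(utf16_units_to_utf8([0x41, 0xD83D, 0xDE00, 0x42]))  # b'A\xf0\x9f\x98\x80B'
print(utf16_units_to_utf8([0x41, 0xD83D, 0x42]))          # b'AB'
print(utf16_units_to_utf8([0xDE00, 0x41]))                # b'A'
```

Dropping is only one possible policy; substituting U+FFFD for each unpaired surrogate is a common alternative that keeps the data loss visible.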

It is likely that Utf8ToUnicode will also need to be updated, though I haven't checked if this is the case.
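
For reference, the reverse direction would need to split any code point above U+FFFF into a surrogate pair. The Python sketch below shows that arithmetic only; it makes no claim about what Utf8ToUnicode currently does, and the names `split_into_surrogates` and `utf8_to_utf16_units` are hypothetical.

```python
def split_into_surrogates(cp):
    """Split a supplementary-plane code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes into a list of UTF-16 code units."""
    units = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp >= 0x10000:
            # Four-byte UTF-8 sequences become two UTF-16 code units.
            units.extend(split_into_surrogates(cp))
        else:
            units.append(cp)
    return units


print([hex(u) for u in utf8_to_utf16_units(b"A\xf0\x9f\x98\x80")])
# ['0x41', '0xd83d', '0xde00']
```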

This issue is probably related to issue 0013067.

Mantis conversion info:

Mantis ID: 13075
Version: 2.2.2
Fixed in version: 2.4.0
Fixed in revision: 12902 (#d67dbcf0)
Target version: 2.4.0
