| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0037416 | FPC | Misc | public | 2020-07-24 15:01 | 2020-08-08 17:40 |

Reporter: HomeBoy TAZ
Assigned To:
Platform: Lazarus 2.0.10 / FPC 3.2.0
OS: Windows 10
Summary: 0037416: Readln and WriteLn issues with UTF16
ReadLn and WriteLn seem to have two kinds of issues:
- The first concerns end-of-line handling. If you read a UTF-16 encoded text file with ReadLn and write the same buffer back with WriteLn, the resulting file differs from the original.
- The second is that ReadLn/WriteLn behave differently depending on the project type (console-only application vs. application with a form). The behaviour also differs from one computer to another: the people who helped me in the thread referenced above apparently get different hex values than I do.
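The end-of-line problem can be seen at the byte level. In a UTF-16LE file, CR LF is stored as 0D 00 0A 00. A reader that treats the file as a single-byte stream therefore finds the 0D and 0A bytes, each followed by a 00 byte. The Python sketch below is an illustration, not FPC code. It assumes a hypothetical reader that splits on the LF byte and drops one trailing CR, and a writer that appends a single-byte CR LF (the Windows default line ending). FPC's actual ReadLn rules may differ in detail. Still, any single-byte reader leaves the 00 halves of the line-break characters inside the lines, so the rewritten file cannot match the original.

```python
# Illustration only: a byte-oriented line reader, NOT FPC's actual ReadLn.
# Assumption: lines are split on the LF byte and one trailing CR byte is dropped.

def naive_readlines(data: bytes) -> list[bytes]:
    """Split a buffer into lines the way a single-byte text reader might."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # buffer ended exactly on LF: no extra empty line
    return [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]

def naive_writelines(lines: list[bytes]) -> bytes:
    """Write each line followed by a single-byte CR LF (Windows line ending)."""
    return b"".join(ln + b"\r\n" for ln in lines)

# UTF-16LE file: BOM, "Dé", CRLF, "x", CRLF
# bytes: FF FE 44 00 E9 00 0D 00 0A 00 78 00 0D 00 0A 00
data = b"\xff\xfe" + "Dé\r\nx\r\n".encode("utf-16-le")
lines = naive_readlines(data)
rewritten = naive_writelines(lines)

for ln in lines:
    print(ln.hex(" "))                 # every line carries stray 00 bytes
print("identical:", rewritten == data)  # identical: False
```

The second "line" begins with the 00 byte that belonged to the LF. The CR is never stripped, because the byte in front of each LF is 00 rather than 0D.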
Steps To Reproduce: I attached some files:
- a UTF-16 little-endian file (sample.txt) with accented characters, used as the source file.
- Project1, a console application: build and run it from a console.
  * sample2.txt should be identical to sample.txt, but it is not (because of the first issue, the end-of-line handling).
  * The console output shows the hex values of the characters read from sample.txt. The end-of-line issue is visible here as well, and these values also serve as the reference for Project2.
- Project2, a form application (Lazarus):
  * sample3.txt should be identical to sample.txt, but it is not. It has the same end-of-line problem, and the accented characters are corrupted as well.
  * A memo in the form shows the hex values of the characters read from sample.txt. The BOM and the accented characters differ from those in the console version (Project1). For example, "D" is fine ("0044"), but "é" is "00E9" in the console version, whereas in the form it is "003F" on one computer (see my referenced post) and "00EF BFBD" on mine.
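One plausible explanation for the three different values of "é" follows. It is an assumption, not something confirmed in this report. The first byte of "é" in the UTF-16LE file is E9. Decoded with a Windows-1252-style single-byte code page, that byte becomes U+00E9. Decoded as UTF-8, it is an invalid sequence and becomes the replacement character U+FFFD; Lazarus GUI applications typically use UTF-8 as their default code page. U+FFFD is EF BF BD in UTF-8, and it turns into "?" (003F) when converted to a single-byte code page that cannot represent it. The Python sketch below reproduces those three values. The code-page choices are assumptions.

```python
# Illustration only (Python, not FPC). Assumptions: the console app decodes the
# file's bytes with a Windows-1252-style single-byte code page, while the GUI
# app decodes them as UTF-8. 0xE9 is the first byte of "é" in the UTF-16LE file.
raw = "é".encode("utf-16-le")[:1]  # b'\xe9'

console_char = raw.decode("cp1252")                      # valid single byte -> U+00E9
gui_char = raw.decode("utf-8", errors="replace")         # invalid UTF-8 -> U+FFFD
gui_bytes = gui_char.encode("utf-8")                     # U+FFFD stored as UTF-8
gui_ansi = gui_char.encode("cp1252", errors="replace")   # no cp1252 form -> b'?'

print(f"console: {ord(console_char):04X}")                # console: 00E9
print(f"GUI, UTF-8 bytes: {gui_bytes.hex(' ').upper()}")  # GUI, UTF-8 bytes: EF BF BD
print(f"GUI, single-byte: {gui_ansi[0]:04X}")             # GUI, single-byte: 003F
```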
Additional Information: I was able to reproduce this on x64 Linux with FPC 3.0.4 / Lazarus 2.0.8.
I can run any tests you need or provide additional information, but my skill level is fairly low, so please do not expect too much.
In my view the priority is fairly low; I am reporting this only to contribute to any UTF-16 work you may be doing.
UTF-16 big endian was not tested, but it likely has the same problem. UTF-32 may be entirely out of scope.
Tags: No tags attached.
Fixed in Revision:
Tests2.zip (132,205 bytes)
I wrote some test code, took some screenshots and added comments. I posted them here:
Text files do not support UTF-16 or UTF-32 encoding; only single-byte code pages and UTF-8 are supported.
Read(anything) always reads the data using the code page set for the text file. By default this is DefaultSystemCodepage, and you can change it with https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html. Afterwards, the data is converted to the string type you are using (e.g. a Unicodestring or Widestring).
Additionally, just as with ansistrings, the code pages you can pass to SetTextCodePage must be single-byte code pages or UTF-7/UTF-8. UTF-16 and UTF-32 are not supported at this time. Does Delphi support this?
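Text files only handle single-byte code pages and UTF-8. One general workaround is therefore to bypass the text-file layer. Read the whole file as bytes, decode it as UTF-16 using the BOM, and only then split it into lines. Below is a minimal Python sketch of that idea. It is not an FPC API, and split_utf16 and join_utf16 are hypothetical helper names. It round-trips both little- and big-endian files byte for byte.

```python
import codecs

def split_utf16(data: bytes) -> tuple[bytes, list[str]]:
    """Hypothetical helper: return (bom, lines) for a BOM-prefixed UTF-16 buffer.

    Each line keeps its own line ending, so nothing is lost or rewritten.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        bom, enc = codecs.BOM_UTF16_LE, "utf-16-le"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom, enc = codecs.BOM_UTF16_BE, "utf-16-be"
    else:
        raise ValueError("no UTF-16 byte order mark")
    text = data[len(bom):].decode(enc)  # decode the whole buffer first...
    return bom, text.splitlines(keepends=True)  # ...then split into lines

def join_utf16(bom: bytes, lines: list[str]) -> bytes:
    """Hypothetical helper: re-encode lines with the same BOM and byte order."""
    enc = "utf-16-le" if bom == codecs.BOM_UTF16_LE else "utf-16-be"
    return bom + "".join(lines).encode(enc)

sample = codecs.BOM_UTF16_LE + "Dé\r\nx\r\n".encode("utf-16-le")
bom, lines = split_utf16(sample)
print(lines)                              # ['Dé\r\n', 'x\r\n']
print(join_utf16(bom, lines) == sample)   # True: byte-identical round trip
```

Python's str.splitlines also breaks on some other separators, such as form feed and U+2028. Keeping each line's original ending (keepends=True) is what makes the round trip exact regardless.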