Readln and WriteLn issues with UTF16
Original Reporter info from Mantis: HomeBoy
- Reporter name: HomeBoy TAZ
Description:
Reference: https://forum.lazarus.freepascal.org/index.php/topic,46759.60.html
ReadLn and WriteLn seem to have two kinds of issues:
- the first is the handling of line endings: reading a UTF-16 encoded text file and writing the same buffer back produces a different file
- the second is that ReadLn/WriteLn seem to behave differently depending on the kind of project (a console-only application or a project with a form). They also behave differently from one computer to another: the people who helped me in the reference thread above apparently get different hex values than I do.
Steps to reproduce:
I attached some files:
Sample.txt:
UTF-16 little endian file with accented characters, used as the source file.
Project1:
- console application
- build and run it in a console
- output:
* sample2.txt should be identical to sample.txt, but it is not (because of the first issue, the line ending handling)
* the console output shows the hex values of the characters read from sample.txt. The line ending issue is visible there again, and these hex values also serve as the reference for Project2
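To pin down exactly where sample2.txt diverges from sample.txt, a byte-level comparison can help. The following is a minimal Python sketch, not part of the attached projects; the helper names `first_difference` and `hex_window` are hypothetical and only serve this illustration.

```python
import sys
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def hex_window(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of up to `width` bytes on each side of `offset`."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


# Only runs when two file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    original = Path(sys.argv[1]).read_bytes()
    copy = Path(sys.argv[2]).read_bytes()
    offset = first_difference(original, copy)
    if offset is None:
        print("files are identical")
    else:
        print(f"first difference at byte offset {offset}")
        print("original:", hex_window(original, offset))
        print("copy:    ", hex_window(copy, offset))
```

Saved under any script name and run with sample.txt and sample2.txt as its two arguments, it prints the offset of the first mismatch and the surrounding bytes of both files, which should make the line ending difference visible.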
Project2:
- form application (Lazarus)
- output:
* sample3.txt should be identical to sample.txt, but it is not. The line ending problem is present here too, and the accented characters are corrupted as well
* a memo in the form shows the hex values of the characters read from sample.txt. The BOM and the accented characters differ from those in the console version of Project1. For example, the "D" is correct ("0044"), but the "é" is "00E9" in the console version, whereas in the form it is "003F" on one computer (see my reference post) and "00EF BFBD" on my computer.
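For reference, the two corrupted values match well-known replacement patterns: 0x3F is "?", and EF BF BD is the UTF-8 encoding of U+FFFD (the Unicode replacement character). The Python sketch below only shows how such values can arise from "é". Whether the RTL or the LCL actually performs these conversions is an assumption, not something this report confirms.

```python
# "é" is U+00E9; in UTF-16 little endian it is the byte pair E9 00.
e_acute = "\u00e9"
utf16le = e_acute.encode("utf-16-le")

# Converting to an encoding that cannot represent the character,
# with replacement enabled, yields "?" (0x3F).
question_mark = e_acute.encode("ascii", errors="replace")

# Treating the lone byte E9 as UTF-8 is invalid; a lenient decoder
# substitutes U+FFFD, whose UTF-8 form is EF BF BD.
replacement = b"\xe9".decode("utf-8", errors="replace")
replacement_utf8 = replacement.encode("utf-8")

print(utf16le.hex(" "))
print(question_mark.hex(" "))
print(replacement_utf8.hex(" "))
```

If this reading is right, both form-application results would be the same failure (a code page conversion that cannot map "é") surfacing through two different replacement strategies.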
Additional information:
I was able to reproduce this on x64 Linux with FPC 3.0.4 / Lazarus 2.0.8
I can run whatever tests you might need, or provide any useful additional information, but my skill level is pretty low, so do not expect too much.
Priority seems pretty low from my point of view; I am just reporting this to contribute to any UTF-16 work you might have on your side.
UTF-16 big endian was not tested, but it is likely to have the same problem. UTF-32 might be totally out of scope.
Mantis conversion info:
- Mantis ID: 37416
- OS: Windows 10
- OS Build: 64 bits French
- Platform: Lazarus 2.0.10 / FPC 3.2.0
- Version: 3.2.0
- Monitored by: BrainWaveCC (Andrew S. Baker (ASB))