View Issue Details

ID: 0037416
Project: FPC
Category: Misc
View Status: public
Last Update: 2020-08-08 17:40
Reporter: HomeBoy TAZ
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: Lazarus 2.0.10 / FPC 3.2.0
OS: Windows 10
Product Version: 3.2.0
Summary: 0037416: Readln and WriteLn issues with UTF16

Description:
Reference: https://forum.lazarus.freepascal.org/index.php/topic,46759.60.html

ReadLn and WriteLn seem to have two kinds of issues:
- The first is end-of-line handling: reading a UTF-16 encoded text file and writing the same buffer back produces a different file.
- The second is that ReadLn/WriteLn seem to behave differently depending on the project type (a console-only application versus a project with a form). They also behave differently from one computer to another: the people who helped me in the reference thread above apparently get different hex values than I do.
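
As a rough illustration of the first issue, the Python sketch below models a byte-oriented line reader and writer applied to UTF-16 little endian data. It is a deliberately simplified model and an assumption about the cause, not FPC's actual ReadLn/WriteLn implementation: the reader splits at every 0x0A byte and the writer ends each line with a single-byte CR LF. Since a UTF-16 little endian line break is the four bytes 0D 00 0A 00, such a round trip cannot reproduce the input.

```python
def hex_bytes(data):
    """Format a byte string as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)

# Two lines of text stored as UTF-16 little endian with CRLF line ends.
original = "ab\r\ncd\r\n".encode("utf-16-le")

# Simplified byte-oriented reader (assumption): split at every 0x0A byte.
lines = original.split(b"\n")

# Simplified byte-oriented writer (assumption): end each line with the
# two single bytes 0D 0A instead of the four bytes 0D 00 0A 00.
rewritten = b"".join(line + b"\r\n" for line in lines)

print(hex_bytes(original))
print(hex_bytes(rewritten))
print(rewritten == original)  # False
```

In this model the second "line" starts with a stray 00 byte (the high byte of the UTF-16 line feed), so every following 16-bit unit is shifted by one byte in the rewritten data.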
Steps To Reproduce:
I attached some files:

Sample.txt:
 UTF-16 little endian file with accented characters, used as the source file.

Project1:
- console application
- build and run it in a console
- output:
   * sample2.txt should be identical to sample.txt, but it is not (because of the first issue, the end-of-line handling)
   * the console output shows the hex values of the characters read from sample.txt; the end-of-line issue is visible there too, and these hex values serve as the reference for Project2

Project2:
- form application (Lazarus)
- output:
   * sample3.txt should be identical to sample.txt, but it is not. The end-of-line problem is present here too, and the accented characters are corrupted as well
   * a memo in the form shows the hex values of the characters read from sample.txt. The BOM and the accented characters differ from those in the console version (Project1). For example, the D is correct ("0044"), but the "é" is "00E9" in the command-line version, whereas in the form it is "003F" on one computer (see my reference post) and "00EF BFBD" on my computer.
Additional Information:
I was able to reproduce this on x64 Linux with FPC 3.0.4 / Lazarus 2.0.8.

I can run whatever tests you need or provide any additional information that would be useful, but my skill level is fairly low, so do not expect too much.

The priority seems fairly low from my point of view; I am reporting this only to contribute to any UTF-16 work you might have on your side.

UTF-16 big endian was not tested, but it is likely to have the same problem. UTF-32 might be entirely out of scope.
Tags: No tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget
Attached Files

Activities

HomeBoy TAZ

2020-07-24 15:01

reporter  

Tests2.zip (132,205 bytes)

Handoko

2020-07-25 21:06

reporter   ~0124329

I wrote some test code and added some screenshots and comments. I posted them here:
https://forum.lazarus.freepascal.org/index.php/topic,46759.msg371103.html#msg371103

Jonas Maebe

2020-08-08 17:40

manager   ~0124672

Textfiles do not support UTF-16 or UTF-32 encoding. Only single byte and UTF-8 are supported.

Read(anything) will always read the data using the code page set for the text file (which by default is DefaultSystemCodepage, and which you can change with https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html). Afterwards, the data gets converted to the string type that you are using (e.g. a Unicodestring or Widestring).

Additionally, just like with ansistrings, the codepages that you can pass to SetTextCodePage must always be single-byte codepages or UTF-7/8. UTF-16 or UTF-32 are not supported at this time. Does Delphi support this?
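
To illustrate the point about the text file code page, here is a small Python sketch (Python is used only to show the byte-level effect; it is not FPC code). It decodes the UTF-16 little endian bytes of "Dé" in two ways. The two code pages are assumptions that the report does not state: Windows-1252 for the console build and UTF-8 for the form build. Under those assumptions the results match the hex values quoted in the report: "00E9" in the first case, and a replacement character whose UTF-8 bytes are EF BF BD in the second.

```python
def hex_units(text):
    """Format each character as a four-digit hex code point."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

# Bytes of "Dé" in UTF-16 little endian, without a BOM: 44 00 E9 00.
raw = "Dé".encode("utf-16-le")

# Case 1 (assumed): the bytes are interpreted with a single-byte code page.
as_cp1252 = raw.decode("cp1252")

# Case 2 (assumed): the bytes are interpreted as UTF-8. A lone 0xE9 byte is
# not valid UTF-8, so it is replaced by U+FFFD here.
as_utf8 = raw.decode("utf-8", errors="replace")

print(hex_units(as_cp1252))  # 0044 0000 00E9 0000
print(hex_units(as_utf8))    # 0044 0000 FFFD 0000

# U+FFFD written back out as UTF-8 is the byte sequence EF BF BD.
print(as_utf8[2].encode("utf-8").hex().upper())  # EFBFBD
```

A converter that substitutes "?" instead of U+FFFD for invalid input would give "003F" in the second case, which is the other value mentioned in the report. In both cases the 00 high bytes are read as separate characters, because each byte of the UTF-16 data is treated as its own code unit.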

Issue History

Date Modified Username Field Change
2020-07-24 15:01 HomeBoy TAZ New Issue
2020-07-24 15:01 HomeBoy TAZ File Added: Tests2.zip
2020-07-25 21:06 Handoko Note Added: 0124329
2020-08-08 17:40 Jonas Maebe Note Added: 0124672