ID: 0021668
Project: FPC
Category: Compiler
View Status: public
Date Submitted: 2012-04-06 18:49
Last Update: 2012-09-27 11:03
Reporter: Anton Kavalenka
Assigned To: Jonas Maebe
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: suspended
Platform: win32
OS: Windows XP
Product Version: 2.7.1
Summary: 0021668: Concatenation rules for ansistrings make it impossible to save UTF-8-encoded strings via TIniFile on Windows
Description: When the result of a concatenation is assigned to the default string type String, it is always converted to the ANSI code page.

string := string + utf8string
always yields AnsiString(string)

Examine the attached test under a debugger.
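
The attached test is not reproduced in this report, so the following is only a minimal sketch of the kind of check the description refers to. The program name and the helper MakeUtf8Sample are hypothetical, and it assumes a compiler with code page aware strings (StringCodePage and DefaultSystemCodePage in the System unit). It prints code pages and byte lengths as numbers, so the values seen depend on the compiler version and platform:

```pascal
program ConcatCodePageDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: builds a UTF-8 string holding U+0416 without
// depending on the encoding of this source file.
function MakeUtf8Sample: UTF8String;
var
  W: UnicodeString;
begin
  W := WideChar($0416);
  Result := UTF8Encode(W);
end;

var
  Prefix, Combined: String;   // String is AnsiString(CP_ACP) under {$H+}
  U: UTF8String;
begin
  Prefix := 'key=';
  U := MakeUtf8Sample;
  Combined := Prefix + U;     // the concatenation this report is about

  // Print numbers only, so console conversion cannot blur the result.
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('U code page:           ', StringCodePage(U));
  WriteLn('U length in bytes:     ', Length(U));
  WriteLn('Combined code page:    ', StringCodePage(Combined));
  WriteLn('Combined length:       ', Length(Combined));
end.
```

If the code page reported for Combined is a single-byte ANSI code page, the UTF-8 bytes of U went through a conversion during the assignment, which is the behaviour described above.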

Tags: No tags attached.
Attached Files:
project2.lpr (1,307 bytes) 2012-04-06 18:49
project2-2.lpr (1,656 bytes) 2012-04-30 19:08
test.ini (90 bytes) 2012-04-30 19:08
project3.lpr (1,060 bytes) 2012-05-04 18:52

- Relationships
related to 0022982 (closed, Joost van der Sluis): Problems with accentuation Lazarus 2.7.1

-  Notes
(0058399)
Marco van de Voort (manager)
2012-04-07 22:44
edited on: 2012-04-07 22:44

I'm not sure if RawByteString rvalues are correct.

I readily believe that this program doesn't do what you expect it to do. However, the question is: why do you think it is correct?

(0058439)
Anton Kavalenka (reporter)
2012-04-09 17:53

I think if I pass Unicode or rawbyte string into TIniFile, it should save exactly what I passed.
Data is lost in any concatenation.

None of the TIniFile internals uses string(CP_ACP), but the result is still translated to the current ANSI code page.

Just a minute ago I ran into another example:
SomeMenuItem.caption:=format('%d. ',[ANumber])+ UnicodeString(Name);

The Unicode content is lost.
I have to manually force the concatenation to be done in Unicode:

SomeMenuItem.caption:=(UnicodeString(format('%d. ',[ANumber]))+ Unicodestring(Name));

I'd like not to write my own TIniFile :)
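
A minimal, self-contained sketch of the workaround quoted above, for readers who want to try it outside a GUI project. The function name NumberedCaption and the sample values are hypothetical stand-ins for the menu item code; only the UnicodeString casts around the concatenation come from this note:

```pascal
program UnicodeConcatDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Both operands are UnicodeString, so the concatenation itself is done
// in UTF-16 and involves no ANSI code page.
function NumberedCaption(ANumber: Integer; const AName: UnicodeString): UnicodeString;
begin
  Result := UnicodeString(Format('%d. ', [ANumber])) + AName;
end;

var
  Caption: UnicodeString;
begin
  Caption := NumberedCaption(3, 'Open');
  // Length counts UTF-16 code units here.
  WriteLn(Length(Caption));
end.
```

This only keeps the concatenation lossless. Assigning the result to a property declared as String still converts it to that property's code page, so whether the text survives depends on the default code page in effect.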
(0058440)
Anton Kavalenka (reporter)
2012-04-09 17:59

http://docwiki.embarcadero.com/RADStudio/XE/en/Unicode_in_RAD_Studio#Code_Constructs_Independent_of_Character_Size
(0058441)
Marco van de Voort (manager)
2012-04-09 19:20
edited on: 2012-04-09 19:21

The default string type is string in the default system encoding (which is, as far as I know, the same as CP_ACP). So any string parameter of TIniFile is a string(CP_ACP).

This is pretty much the same as pre unicode Delphi, and this goes for the entire classes hierarchy.

So this is exactly the result I would expect for your code.

(0058460)
Anton Kavalenka (reporter)
2012-04-10 12:20

I cannot pinpoint the exact revision that broke concatenation.
But it worked 2 weeks ago.

According to your message, all Lazarus (LCL) captions declared as String would work in CP_ACP, and the LCL would strictly become non-Unicode?

I insist: it works when directly assigned as Unicode string i.e.
string property := Unicode String value

but does not work when

string property := string + Unicode string
(0059130)
Anton Kavalenka (reporter)
2012-04-30 19:05
edited on: 2012-04-30 19:07

The problem gets even funnier (truly anisotropic).

I can't write a utf8string, but I can READ it!

The attached project contains .INI file with utf8-encoded string.

Uploaded next version with .INI-file containing UTF-8 string.

(0059142)
Marco van de Voort (manager)
2012-04-30 21:11

No. Lazarus assumes that they are UTF-8, and manually inserts conversions where necessary.

FPC does not follow that convention.
(0059148)
Anton Kavalenka (reporter)
2012-04-30 23:22

The problem is demonstrated in a pure FPC Object Pascal example.

I can read a UTF-8 encoded string using the TIniFile class, but not write one.
A month ago I was able to both read and write.
That worked because there are no explicit code page conversions inside TIniFile.
It is not code page aware at all (agnostic).

That was before.

Now a single concatenation inside TIniFile during .Write() leads to data loss.
The problem is not inside TIniFile (Object Pascal RTL) but inside the concatenation, which IMO is broken.
(0059202)
Paul Ishenin (developer)
2012-05-03 04:44

That's because the RTL classes are not yet prepared for code page strings. First the compiler changes and the base RTL routines need to be finished, then the other RTL/FCL classes need to follow.
(0059208)
Marco van de Voort (manager)
2012-05-03 09:58

Anything that streams cannot be handled at compile time. In Delphi, all loadfile and stream methods get a parameter that signals the encoding to load/save.

Actually there is no reason why this couldn't already be done. Just the default would be different now.
(0059236)
Anton Kavalenka (reporter)
2012-05-04 18:51
edited on: 2012-05-04 18:52

Yet another test.
Compare the results with Delphi XE and FPC 2.7.1 trunk.

Case No. 3 inside the test project3.lpr is good for me.
But the case that is actually at work inside TIniFile is No. 2.

Thank you, Paul, you cleared the horizons for me.

(0062638)
Anton Kavalenka (reporter)
2012-09-26 12:43

Forcibly setting
System.DefaultSystemCodePage:=CP_UTF8;
at an early stage of program init (InitUnits)
makes my program behave identically on Linux and Windows.
(0062642)
Jonas Maebe (manager)
2012-09-26 13:50

It's a bit cleaner to use SetMultiByteConversionCodePage(CP_UTF8) instead. It doesn't do anything different than setting DefaultSystemCodePage right now, but maybe that will change in the future or on different platforms.
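
As a minimal sketch of the suggestion above: the program name and the wrapper procedure UseUtf8AsDefaultCodePage are hypothetical, and calling it as the first statement of the main program is an assumption. Note that unit initialization sections run before that point, so code there would still see the previous default.

```pascal
program Utf8DefaultDemo;
{$mode objfpc}{$H+}

// Hypothetical wrapper around the call suggested in this note.
procedure UseUtf8AsDefaultCodePage;
begin
  SetMultiByteConversionCodePage(CP_UTF8);
end;

begin
  WriteLn('Before: ', DefaultSystemCodePage);
  UseUtf8AsDefaultCodePage;
  // CP_ACP strings, including concatenation results assigned to String,
  // are converted to this code page from here on.
  WriteLn('After:  ', DefaultSystemCodePage);
end.
```

With UTF-8 as the default, assigning a UTF-8 concatenation result to a String no longer has to pass through a single-byte ANSI code page.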
(0062645)
Anton Kavalenka (reporter)
2012-09-26 16:13

OK, I did this.

The next step for me is to make Windows IO (TStream descendants) work with UTF-8 encoded filenames supplied to TStream.Create and FileCreate().

This issue can be closed. In short, the problem was the runtime behaviour of strings when the default system encoding is single-byte (ANSI).
(0062646)
Jonas Maebe (manager)
2012-09-26 16:23

I already started working locally on the sysutils part (I've already done FileCreate for all platforms, although it's only tested for Unix platforms currently). I'll commit what I've got to a separate branch, so you can then also work on it without having to duplicate anything.
(0062670)
Jonas Maebe (manager)
2012-09-27 11:03

I've committed my current changes to the new cpstrrtl branch.

- Issue History
Date Modified Username Field Change
2012-04-06 18:49 Anton Kavalenka New Issue
2012-04-06 18:49 Anton Kavalenka File Added: project2.lpr
2012-04-07 22:44 Marco van de Voort Note Added: 0058399
2012-04-07 22:44 Marco van de Voort Note Edited: 0058399
2012-04-09 17:53 Anton Kavalenka Note Added: 0058439
2012-04-09 17:59 Anton Kavalenka Note Added: 0058440
2012-04-09 19:20 Marco van de Voort Note Added: 0058441
2012-04-09 19:21 Marco van de Voort Note Edited: 0058441
2012-04-10 12:20 Anton Kavalenka Note Added: 0058460
2012-04-30 19:05 Anton Kavalenka Note Added: 0059130
2012-04-30 19:07 Anton Kavalenka Note Edited: 0059130
2012-04-30 19:08 Anton Kavalenka File Added: project2-2.lpr
2012-04-30 19:08 Anton Kavalenka File Added: test.ini
2012-04-30 21:11 Marco van de Voort Note Added: 0059142
2012-04-30 23:22 Anton Kavalenka Note Added: 0059148
2012-05-03 04:44 Paul Ishenin Note Added: 0059202
2012-05-03 09:58 Marco van de Voort Note Added: 0059208
2012-05-04 18:51 Anton Kavalenka Note Added: 0059236
2012-05-04 18:52 Anton Kavalenka File Added: project3.lpr
2012-05-04 18:52 Anton Kavalenka Note Edited: 0059236
2012-09-26 12:43 Anton Kavalenka Note Added: 0062638
2012-09-26 13:50 Jonas Maebe Note Added: 0062642
2012-09-26 14:58 Jonas Maebe Relationship added related to 0022982
2012-09-26 16:13 Anton Kavalenka Note Added: 0062645
2012-09-26 16:23 Jonas Maebe Status new => resolved
2012-09-26 16:23 Jonas Maebe Resolution open => suspended
2012-09-26 16:23 Jonas Maebe Assigned To => Jonas Maebe
2012-09-26 16:23 Jonas Maebe Note Added: 0062646
2012-09-26 16:24 Anton Kavalenka Status resolved => closed
2012-09-27 11:03 Jonas Maebe Note Added: 0062670


