View Issue Details

ID: 0019875
Project: Lazarus
Category: LCL
View Status: public
Last Update: 2012-08-23 15:01
Reporter: Tristan Linnell
Assigned To: Vincent Snijders
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 0.9.30
Fixed in Version: 1.1 (SVN)
Summary: 0019875: TXMLPropStorage does not save properties containing UTF-8 characters
Description: Lazarus uses UTF-8 for strings, so if, for example, you set TEdit.Caption to a string containing non-ASCII UTF-8 characters, TXMLPropStorage does not save and restore the TEdit's contents correctly.

An example:

Compile the attached example.
Run it.
Close it.
The XML file it creates is also attached.

On restarting the application, an exception is raised: invalid character at line 4, pos 29.

Tags: No tags attached.
Fixed in Revision: 38347
LazTarget: -
Widgetset: Win32/Win64
Attached Files
  • XMLProStorageUTF8.zip (2,161 bytes)
  • NULL.PNG (4,806 bytes)
  • new.xml (124 bytes)
  • xmlpropstorage1.png (16,060 bytes)
  • new.xml.png (4,869 bytes)
  • project1.2.xml.png (5,824 bytes)
  • project1.2.xml (140 bytes)
  • xmlpropstorage.diff (1,018 bytes)

Activities

2011-08-02 17:36

 

XMLProStorageUTF8.zip (2,161 bytes)

Bart Broersma

2011-08-03 13:19

developer   ~0050397

Seems to work OK with trunk (0.9.31 r31810, Fpc 2.4.4) on Linux.
Can you retest with trunk (or the latest snapshot)? Please also state your OS and FPC version.

Tristan Linnell

2011-08-03 14:05

reporter   ~0050398

Yes, this is on Windows XP.
FPC is 2.4.2.
It is the released 0.9.30 for Windows.
Thanks,
Tristan

Bart Broersma

2011-08-03 19:03

developer   ~0050408

Last edited: 2011-08-03 19:04

Try this in the file xmlpropstorage.pas

Change the following:

uses
  Classes, SysUtils, FileUtil, LCLProc, Forms, PropertyStorage, XMLCfg, DOM,
  LazConfigStorage;

into:

uses
  Classes, SysUtils, FileUtil, LCLProc, Forms, PropertyStorage, XMLConf, DOM,
  LazConfigStorage;

This seems to be the only difference between Lazarus trunk and 0.9.30 for this file.

Rebuild the LCL (or Lazarus), then rebuild your program and see if that solves the problem.

Tristan Linnell

2011-08-04 11:58

reporter   ~0050424

OK.

I just tried this, and I get the same problem.
I believe it has to do with Windows using code pages (ANSI encoding) while Lazarus uses UTF-8 for all strings and properties.

Thanks,
Tristan

Bart Broersma

2011-08-04 12:23

developer   ~0050425

It works for me on WinMe (cp850).
The original characters show up as ??, since my Windows has no font that supports them.
If I use accented characters (äëïöü), it works as expected.

Code pages should not be an issue here.
The characters inside the XML file are UTF-8 encoded and are read back into the LCL as UTF-8.

It might be an FPC issue.
You can try upgrading your compiler to 2.4.4 (stable) and then rebuild Lazarus.
You would end up having the same xmlpropstorage.pas and the same underlying fpc libraries as I have.

Tristan Linnell

2011-08-04 12:36

reporter   ~0050428

Thanks, I might try that then.
Looking at my XML file, the saved fields are not UTF-8. The XML file was included in the original zip file I uploaded.
Thanks.

Bart Broersma

2011-08-05 14:58

developer   ~0050475

The W3C validator says the XML from your example is valid UTF-8.
The text looks like some kind of Chinese to me.
On Linux I unpacked, built and ran the example.
The text in the edit box after startup is the same as in the designer.
The characters are definitely not stored as single-byte (code page) characters, but as multi-byte sequences.
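The two checks described here can be scripted: does the file decode as UTF-8, and are the non-ASCII characters multi-byte sequences rather than single code-page bytes? Below is a minimal Python sketch. The function name describe_utf8 is hypothetical. It only inspects bytes, so a file can pass it and still contain the wrong text.

```python
def describe_utf8(data: bytes) -> list[tuple[str, int]]:
    """Decode data strictly as UTF-8 and list every non-ASCII character
    together with the number of bytes it occupies in the encoding.

    Raises UnicodeDecodeError (exc.start is the offending byte offset)
    if data is not valid UTF-8.
    """
    text = data.decode("utf-8")  # strict mode: invalid sequences raise
    return [(ch, len(ch.encode("utf-8"))) for ch in text if ord(ch) > 0x7F]


if __name__ == "__main__":
    print(describe_utf8('Text="日本語"'.encode("utf-8")))
    # [('日', 3), ('本', 3), ('語', 3)]
    try:
        describe_utf8(b"abc\xe6\x97")  # truncated three-byte sequence
    except UnicodeDecodeError as exc:
        print("invalid UTF-8 at byte offset", exc.start)
```

On a real file you would pass the raw bytes, e.g. describe_utf8(open("project1.xml", "rb").read()).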

Shaun Simpson

2011-08-05 15:53

reporter   ~0050484

Last edited: 2011-08-05 16:48

There is a problem with loading the file project1.xml included in the archive.

I compiled the project and got an exception on launch:

"Project project1.exe raised exception class 'EXMLReadError=' with message:
In 'file:///C:/lazarus-projects/XMLProStorageUTF8/project1.xml' (line 4 pos 29): Invalid character"

I then deleted the XML file and restarted the application. The text in the edit box: 日本語
After restarting I got: 日本語

I tried the same after setting my Windows 7 box to the Japanese locale using the "Region and Language Settings" dialog.

I deleted the XML file and started the application.
Edit box: 日本語
After restart: 日本誁E

The text change indicates a problem with the encoding.

When changing to the Japanese locale, I get ????????? in the edit box on restart of the application.

If I change XMLPropStorage to use XMLConf, the text is always wrong with the Japanese locale, i.e.:
Edit box (no XML file): 日本誁E
After restart: 日本誁E

I will download FPC 2.4.4 and try that.

Edit: Tried 2.4.4; no change.

I am using:
Windows 7
Lazarus 0.9.30 (release)
FPC: 2.4.2

Thanks,
Shaun

Tristan Linnell

2011-08-05 16:40

reporter   ~0050486

Yes,
I am on the Japanese locale on Windows XP.

I have just tried again with the latest snapshot of Lazarus with FPC 2.4.4, and exactly the same effect is seen.

I have also attached a screenshot of opening the XML file up in a text editor, clearly showing a NULL character in the middle of the string (NULL.png).
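A NUL character would be enough to explain the EXMLReadError. XML 1.0 only allows the characters #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. Any U+0000 therefore makes the document ill-formed, and a conforming parser must reject it.

The Python sketch below reports the line and column of every such character. The function names are hypothetical and not part of Lazarus or FPC. Columns are counted in characters, so they may not match the parser's "pos" exactly. The sample XML is made up for illustration.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_xml_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, "U+XXXX") for each character XML 1.0 forbids.

    Line and column are 1-based and counted in characters, not bytes.
    """
    problems = []
    line, col = 1, 0
    for ch in text:
        col += 1
        if not is_xml10_char(ord(ch)):
            problems.append((line, col, f"U+{ord(ch):04X}"))
        if ch == "\n":
            line, col = line + 1, 0
    return problems


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<CONFIG>\n  <Edit1 Text="abc\x00def"/>\n</CONFIG>\n'
    print(find_illegal_xml_chars(sample))  # [(3, 19, 'U+0000')]
```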

2011-08-05 16:41

 

NULL.PNG (4,806 bytes)

Bart Broersma

2011-08-06 16:03

developer   ~0050556

The byte sequence of Edit1.Text is completely different from the one in the attached XML file (perhaps the .lfm is not UTF-8?).
Still, I get no read error at program start with the provided XML.

Try this (just to see whether the string is correct; I removed the trailing zero):

 S := #$E8 + #$AD + #$8C + #$EF + #$BD + #$A5 + #$E8 + #$AD + #$9B + #$EF + #$BD + #$AC + #$E9 + #$9A + #$B1;
 Edit1.Text := S;

Does this show the correct text in Edit1?

I attach a file "new.xml".
Copy this to project1.xml and then run the program. Does the read error go away?
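Decoding the 15 bytes assigned to S as UTF-8 gives 譌･譛ｬ隱. That string appears to be the UTF-8 bytes of 日本語 (E6 97 A5 E6 9C AC E8 AA 9E), misread as code page 932 (Shift-JIS) and then encoded to UTF-8 again. Code page 932 is the ANSI code page of a Japanese-locale Windows; that this conversion happened is an assumption, not something stated in the issue.

Under that assumption, the last byte 9E is a Shift-JIS lead byte with nothing valid after it, so it cannot be converted. This may be related to the stray zero, though the sketch does not show that. The Python sketch below only checks the byte arithmetic. It does not model the LCL or FPC conversion routines.

```python
# UTF-8 bytes assigned to S in the Pascal snippet above (trailing zero removed).
stored = bytes.fromhex("E8AD8C EFBDA5 E8AD9B EFBDAC E99AB1")

# The UTF-8 encoding of the intended text.
original = "日本語".encode("utf-8")  # E6 97 A5 E6 9C AC E8 AA 9E

# Assumption: those bytes were read as cp932 (Shift-JIS) text. The last
# byte, 0x9E, is a lead byte with no trail byte, so only 8 bytes decode.
misread = original[:-1].decode("cp932")
print(misread)                            # 譌･譛ｬ隱
print(misread.encode("utf-8") == stored)  # True

# Decoding all 9 bytes strictly fails on the dangling lead byte.
try:
    original.decode("cp932")
except UnicodeDecodeError as exc:
    print("undecodable byte", hex(original[exc.start]), "at offset", exc.start)
```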

2011-08-06 16:03

 

new.xml (124 bytes)

Bart Broersma

2011-08-06 16:50

developer   ~0050558

Running the compiled app on Win7 with the provided XML gives me the same read error as described. It fills the edit with 3 question marks.

Running the code from my note above and starting the program without the XML file (still on Win7) gives the result shown in the attached screenshot xmlpropstorage1.png.
The characters in Edit1 seem to be the same as in the file new.xml (when I open the latter in IE; see the new.xml.png screenshot).

When I close the program it now creates a very different file (see project1.2.xml).
This does not seem to contain UTF-8 characters (see: project1.2.xml.png screenshot)
When you run the program again, Edit1 has the same text in it.

Copying new.xml to project1.xml and then running the program shows 5 question marks in Edit.

2011-08-06 16:51

 

xmlpropstorage1.png (16,060 bytes)

2011-08-06 16:51

 

new.xml.png (4,869 bytes)

2011-08-06 16:52

 

project1.2.xml.png (5,824 bytes)

2011-08-06 16:52

 

project1.2.xml (140 bytes)

Bart Broersma

2011-08-06 17:32

developer   ~0050562

Maybe some conversion issue in xmlread/xmlwrite, where there seems to be some internal WideString-to-UTF-8 conversion?

Bart Broersma

2012-06-28 20:26

developer   ~0060707

Lazarus 1.1 r37467 FPC 2.6.0 i386-win32-win32/win64
Tested on Win7 (64-bit OS, 32-bit Lazarus/fpc)

If I build your sample app (I removed the project1.xml) and run it, then the text in the Edit is: 日本語 (hex: E6 97 A5 E6 9C AC E8 AA 9E)
I close the app and restart it: the text is the same.
I close the app and restart it again.
I change the text to 日本語語 (hex: E6 97 A5 E6 9C AC E8 AA 9E E8 AA 9E)
I close app and restart it: Text is as expected: 日本語語 (hex: E6 97 A5 E6 9C AC E8 AA 9E E8 AA 9E)
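Hex dumps like these can be reproduced outside the IDE. The short Python sketch below prints the UTF-8 byte sequence of a string; the helper name utf8_hex is hypothetical. Each of these kanji takes three bytes, so appending 語 adds E8 AA 9E.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated upper-case hex."""
    return text.encode("utf-8").hex(" ").upper()


print(utf8_hex("日本語"))    # E6 97 A5 E6 9C AC E8 AA 9E
print(utf8_hex("日本語語"))  # E6 97 A5 E6 9C AC E8 AA 9E E8 AA 9E
```

bytes.hex with a separator argument requires Python 3.8 or later.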

Bart Broersma

2012-06-28 21:07

developer   ~0060708

The produced XML file, however, does not look right at all.

This is the hex of the string in Edit1:
E6 97 A5 E6 9C AC E8 AA 9E

This is the hex of the string stored in project1.xml:
C3 A6 E2 80 94 C2 A5 C3 A6 C5 93 C2 AC C3 A8 C2 AA C5 BE
and opened in an editor capable of UTF-8 it shows mojibake rather than 日本語.
This value turns out to be the result of Utf8Encode('日本語') (the original text in Edit1).

Since all Lazarus controls are UTF-8, TXMLPropStorage shouldn't Utf8Encode the strings it stores in the XML files, should it?
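The 19 stored bytes can be reproduced with three assumed steps:

- Take the UTF-8 bytes of 日本語.
- Interpret them as Windows-1252 text, one character per byte. Windows-1252 is the ANSI code page of a Western-locale Windows, which is an assumption about the test machine.
- Encode the result to UTF-8 a second time.

That is what an implicit AnsiString-to-WideString conversion followed by UTF-8 output would produce. A minimal Python sketch of this double encoding:

```python
def double_encode(text: str, ansi_codepage: str = "cp1252") -> bytes:
    """Simulate UTF-8 bytes being taken as ANSI text and UTF-8 encoded again."""
    utf8_bytes = text.encode("utf-8")           # what the LCL string holds
    as_ansi = utf8_bytes.decode(ansi_codepage)  # each byte becomes one character
    return as_ansi.encode("utf-8")              # what ends up in the XML file


stored = double_encode("日本語")
print(stored.hex(" ").upper())
# C3 A6 E2 80 94 C2 A5 C3 A6 C5 93 C2 AC C3 A8 C2 AA C5 BE

# Undoing both steps restores the original bytes.
print(stored.decode("utf-8").encode("cp1252") == "日本語".encode("utf-8"))  # True
```

Windows-1252 defines every byte in this particular sequence, so both steps can be undone exactly. That may be why the text in Edit1 still survived a restart in the previous note even though the file content is wrong.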

2012-06-28 21:22

 

xmlpropstorage.diff (1,018 bytes)
Index: lcl/xmlpropstorage.pas
===================================================================
--- lcl/xmlpropstorage.pas	(revision 37597)
+++ lcl/xmlpropstorage.pas	(working copy)
@@ -169,8 +169,11 @@
 
 function TCustomXMLPropStorage.DoReadString(const Section, Ident,
   TheDefault: string): string;
+var
+  Res: UnicodeString;
 begin
-  Result:=FXML.GetValue(FixPath(Section)+'/'+Ident, TheDefault);
+  Res:=FXML.GetValue(FixPath(Section)+'/'+Utf8Decode(Ident), Utf8Decode(TheDefault));
+  Result := Utf8Encode(Res);
   //debugln('TCustomXMLPropStorage.DoReadString Section="',Section,'" Ident="',Ident,'" Result=',Result);
 end;
 
@@ -178,7 +181,7 @@
   Value: string);
 begin
   //debugln('TCustomXMLPropStorage.DoWriteString Section="',Section,'" Ident="',Ident,'" Value="',Value,'"');
-  FXML.SetValue(FixPath(Section)+'/'+Ident, Value);
+  FXML.SetValue(FixPath(Section)+'/'+Utf8Decode(Ident), Utf8Decode(Value));
 end;
 
 procedure TCustomXMLPropStorage.DoEraseSections(const ARootSection: String);

Bart Broersma

2012-06-28 21:27

developer   ~0060709

See attached xmlpropstorage.diff.
It treats incoming strings as UTF-8 and converts them to UnicodeStrings before calling TXMLConfig.SetValue(), which expects WideStrings.
It does the reverse conversion upon reading.
This will properly create Utf8 xml files.
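The patch treats the LCL string as UTF-8 and converts it explicitly to and from UnicodeString around TXMLConfig.GetValue/SetValue. A small simulation can contrast that with an implicit ANSI conversion.

This Python sketch is an analogy, not Lazarus code:

- Python bytes play the role of the 8-bit LCL string, and str plays UnicodeString.
- The code page names are assumptions about the machines involved.
- Replacing unconvertible bytes only approximates what Windows actually does.

```python
def save_then_load(text: str, codec: str) -> tuple[bytes, str]:
    """Simulate storing an LCL string in a UTF-8 XML file and reading it back.

    codec is how the 8-bit string is turned into a wide string: "utf-8"
    models the patch (Utf8Decode/Utf8Encode), while an ANSI code page such
    as "cp1252" or "cp932" models an implicit conversion. Unconvertible
    bytes are replaced, as a rough stand-in for data loss.
    Returns (bytes written to the file, text seen after reloading).
    """
    lcl_value = text.encode("utf-8")                        # LCL strings are UTF-8
    wide = lcl_value.decode(codec, errors="replace")        # write path: 8-bit -> wide
    file_bytes = wide.encode("utf-8")                       # XML written as UTF-8
    wide_again = file_bytes.decode("utf-8")                 # XML read back
    lcl_again = wide_again.encode(codec, errors="replace")  # read path: wide -> 8-bit
    return file_bytes, lcl_again.decode("utf-8", errors="replace")


for codec in ("utf-8", "cp1252", "cp932"):
    file_bytes, restored = save_then_load("日本語", codec)
    print(codec, file_bytes.hex(" ").upper(), repr(restored))
```

Under these assumptions, the three paths behave differently:

- UTF-8 path: writes 日本語 to the file and reads it back unchanged.
- cp1252 path: reads the text back correctly but writes double-encoded bytes, the same bytes reported for project1.xml above.
- cp932 path: loses the last character, in line with the Japanese-locale failures reported in this issue.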

Vincent Snijders

2012-08-23 15:00

manager   ~0061834

Bart, thanks for the patch, I adapted it slightly.

Issue History

Date Modified Username Field Change
2011-08-02 17:36 Tristan Linnell New Issue
2011-08-02 17:36 Tristan Linnell File Added: XMLProStorageUTF8.zip
2011-08-02 17:36 Tristan Linnell Widgetset => Win32/Win64
2011-08-03 13:19 Bart Broersma LazTarget => -
2011-08-03 13:19 Bart Broersma Note Added: 0050397
2011-08-03 13:19 Bart Broersma Status new => feedback
2011-08-03 14:05 Tristan Linnell Note Added: 0050398
2011-08-03 19:03 Bart Broersma Note Added: 0050408
2011-08-03 19:04 Bart Broersma Note Edited: 0050408
2011-08-04 11:58 Tristan Linnell Note Added: 0050424
2011-08-04 12:23 Bart Broersma Note Added: 0050425
2011-08-04 12:36 Tristan Linnell Note Added: 0050428
2011-08-05 14:58 Bart Broersma Note Added: 0050475
2011-08-05 15:53 Shaun Simpson Note Added: 0050484
2011-08-05 16:04 Shaun Simpson Note Edited: 0050484
2011-08-05 16:08 Shaun Simpson Note Edited: 0050484
2011-08-05 16:09 Shaun Simpson Note Edited: 0050484
2011-08-05 16:13 Shaun Simpson Note Edited: 0050484
2011-08-05 16:40 Tristan Linnell Note Added: 0050486
2011-08-05 16:41 Tristan Linnell File Added: NULL.PNG
2011-08-05 16:48 Shaun Simpson Note Edited: 0050484
2011-08-06 16:03 Bart Broersma Note Added: 0050556
2011-08-06 16:03 Bart Broersma File Added: new.xml
2011-08-06 16:50 Bart Broersma Note Added: 0050558
2011-08-06 16:51 Bart Broersma File Added: xmlpropstorage1.png
2011-08-06 16:51 Bart Broersma File Added: new.xml.png
2011-08-06 16:52 Bart Broersma File Added: project1.2.xml.png
2011-08-06 16:52 Bart Broersma File Added: project1.2.xml
2011-08-06 17:32 Bart Broersma Note Added: 0050562
2012-03-13 13:40 Vincent Snijders Status feedback => acknowledged
2012-06-28 20:26 Bart Broersma Note Added: 0060707
2012-06-28 21:07 Bart Broersma Note Added: 0060708
2012-06-28 21:22 Bart Broersma File Added: xmlpropstorage.diff
2012-06-28 21:27 Bart Broersma Note Added: 0060709
2012-08-23 15:00 Vincent Snijders Fixed in Revision => 38347
2012-08-23 15:00 Vincent Snijders Status acknowledged => resolved
2012-08-23 15:00 Vincent Snijders Fixed in Version => 1.1 (SVN)
2012-08-23 15:00 Vincent Snijders Resolution open => fixed
2012-08-23 15:00 Vincent Snijders Assigned To => Vincent Snijders
2012-08-23 15:00 Vincent Snijders Note Added: 0061834