View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0019875||Lazarus||LCL||public||2011-08-02 17:36||2012-08-23 15:01|
|Reporter||Tristan Linnell||Assigned To||Vincent Snijders|
|Fixed in Version||1.1 (SVN)|
|Summary||0019875: TXMLPropStorage does not save properties including UTF-8 characters,|
|Description||Lazarus uses UTF-8 for strings, so if, for example, you set a TEdit.Caption to a string that features UTF-8 characters, the TXMLPropStorage does not work properly when saving/restoring the TEdit's contents.|
Compile the example attached.
The xml file created is attached here too.
On restarting the application, an exception occurs reading an invalid character at Line 4, Pos 29.
|Tags||No tags attached.|
|Fixed in Revision||38347|
XMLProStorageUTF8.zip (2,161 bytes)
Seems to work OK with trunk (0.9.31 r31810, Fpc 2.4.4) on Linux.
Can you retest with trunk (or the latest snapshot), and please also state your OS and FPC version?
Yes this is on Windows XP.
FPC is 2.4.2.
It is the released 0.9.30 for Windows.
Try this in the file xmlpropstorage.pas
Change the following:
Classes, SysUtils, FileUtil, LCLProc, Forms, PropertyStorage, XMLCfg, DOM,
to:
Classes, SysUtils, FileUtil, LCLProc, Forms, PropertyStorage, XMLConf, DOM,
This seems to be the only difference between Lazarus trunk and 0.9.30 for this file.
Rebuild the LCL (or Lazarus), then rebuild your program and see if that solves the problem.
I just tried this, and I get the same problem.
I believe it is to do with Windows using codepages (ansi encoding) and Lazarus using UTF-8 for all strings and properties.
It works for me on WinMe (cp850).
The original chars show up as ??, since my Windows does not support any font for them.
If I use accented chars (äëïöü), it works as expected.
Codepages should not be an issue here.
The characters inside the xml file are UTF-8 encoded and read back (as UTF-8) to the LCL.
It might be an fpc issue.
You can try upgrading your compiler to 2.4.4 (stable) and then rebuild Lazarus.
You would end up having the same xmlpropstorage.pas and the same underlying fpc libraries as I have.
Thanks I might try that then.
Looking at my XML file, the saved fields are not UTF-8. The XML file was included in the original zip file I uploaded.
W3 Validator says the XML from your example passes as UTF-8.
The text looks like some kind of Chinese to me.
On Linux I unpacked, built and ran the example.
The text in the editbox after startup is the same as in the designer.
The characters are definitely not stored as single-byte (codepage) characters, but as multi-byte sequences.
There is a problem with loading the file project1.xml included in the archive.
I compiled the project and got an exception on launch:
"Project project1.exe raised exception class 'EXMLReadError=' with message:
In 'file:///C:/lazarus-projects/XMLProStorageUTF8/project1.xml' (line 4 pos 29): Invalid character"
I then deleted the XML file and restarted the application. The text in the edit box: 日本語
After restarting I got: 日本語
I tried the same after setting my Windows 7 box to Japanese Locale using the "Region and Language Settings" dialog.
I deleted the XML file and started the application.
Edit box: 日本語
After restart: 日本誁E
The text change indicates a problem with the encoding.
When changing to the Japanese locale, I get ????????? in the edit box on restart of the application.
If I change XMLPropStorage to use XMLConf the text is always wrong with the Japanese locale. i.e.
Edit box (no XML file): 日本誁E
After restart: 日本誁E
I will download FPC 2.4.4 and try that.
Edit: Tried 2.4.4 no change.
I am using:
Lazarus 0.9.30 (release)
I am on the Japanese locale on Windows XP.
I have just tried again with the latest snapshot of Lazarus with FPC 2.4.4, and exactly the same effect is seen.
I have also attached a screenshot of opening the XML file up in a text editor, clearly showing a NULL character in the middle of the string (NULL.png).
The byte sequence of the value of Edit1.text is completely different than the one in the attached xml file (the lfm is probably not utf-8?).
Still, I get no read error on program start with the provided xml.
Try this one, just to see whether the string is correct (I removed the trailing zero):
S := #$E8 + #$AD + #$8C + #$EF + #$BD + #$A5 + #$E8 + #$AD + #$9B + #$EF + #$BD + #$AC + #$E9 + #$9A + #$B1;
Edit1.Text := S;
Does this show the correct text in Edit1?
I attach a file "new.xml".
Copy this to project1.xml and then run the program. Does the read-error go away?
Running the compiled app on Win7 with the provided xml gives me the same read error as described. It fills the edit with 3 question marks.
Running the code in my note above, starting the program without the xml file, (still on Win7) gives the result as seen in attached screenshot: xmlpropstorage1.png
The characters in Edit1 seem to be the same as in the file new.xml (if I open the latter in IE. See: new.xml.png screenshot).
When I close the program it now creates a very different file (see project1.2.xml).
This does not seem to contain UTF-8 characters (see: project1.2.xml.png screenshot)
When you run the program again, Edit1 has the same text in it.
Copying new.xml to project1.xml and then running the program shows 5 question marks in Edit.
||Maybe some conversion issue with xmlread/xmlwrite, where there seems to be some internal widestring to UTF-8 conversion?|
Lazarus 1.1 r37467 FPC 2.6.0 i386-win32-win32/win64
Tested on Win7 (64-bit OS, 32-bit Lazarus/fpc)
If I build your sample app (I removed the project1.xml) and run it, then the text in the Edit is: 日本語 (hex: E6 97 A5 E6 9C AC E8 AA 9E)
I close app and restart it: Text is the same.
I close app and restart.
I change text to 日本語語 (hex: E6 97 A5 E6 9C AC E8 AA 9E E8 AA 9E)
I close app and restart it: Text is as expected: 日本語語 (hex: E6 97 A5 E6 9C AC E8 AA 9E E8 AA 9E)
The produced xml file however does not look alright at all.
This is the hex of the string in Edit1:
E6 97 A5 E6 9C AC E8 AA 9E
This is the hex of the string stored in project1.xml
C3 A6 E2 80 94 C2 A5 C3 A6 C5 93 C2 AC C3 A8 C2 AA C5 BE
and opened in an editor capable of UTF-8 it looks like this: æ—¥æœ¬èªž
This value turns out to be the result of Utf8Encode('日本語') (the original text in Edit1).
Since all Lazarus controls are UTF-8, TXMLPropStorage shouldn't Utf8Encode the strings it stores in the xml files?
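For what it's worth, the stored byte sequence can be reproduced by hand if one assumes that each byte of the original UTF-8 string was read as a Windows-1252 character and then encoded to UTF-8 a second time. The sketch below does only that, with plain integer arithmetic and no RTL string conversion. The Windows-1252 assumption and the helper names `Cp1252ToCodePoint` and `CodePointToUtf8Hex` are mine, not something taken from the LCL or FPC sources, and the mapping table only covers the bytes that occur in this example.

```pascal
program DoubleEncodeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: map one Windows-1252 byte to its Unicode code point.
// Only the 0x80..0x9F bytes that occur in this example are listed.
function Cp1252ToCodePoint(B: Byte): Integer;
begin
  case B of
    $97: Result := $2014; // em dash
    $9C: Result := $0153; // oe ligature
    $9E: Result := $017E; // z with caron
  else
    Result := B;          // 0x00..0x7F and 0xA0..0xFF map one to one
  end;
end;

// Hypothetical helper: UTF-8 encoding of a code point below U+10000, as hex.
function CodePointToUtf8Hex(CP: Integer): string;
begin
  if CP < $80 then
    Result := IntToHex(CP, 2)
  else if CP < $800 then
    Result := IntToHex($C0 or (CP shr 6), 2) + ' ' +
              IntToHex($80 or (CP and $3F), 2)
  else
    Result := IntToHex($E0 or (CP shr 12), 2) + ' ' +
              IntToHex($80 or ((CP shr 6) and $3F), 2) + ' ' +
              IntToHex($80 or (CP and $3F), 2);
end;

const
  // UTF-8 bytes of the text in Edit1, as listed above.
  Utf8Bytes: array[0..8] of Byte =
    ($E6, $97, $A5, $E6, $9C, $AC, $E8, $AA, $9E);
var
  I: Integer;
  Line: string;
begin
  Line := '';
  for I := Low(Utf8Bytes) to High(Utf8Bytes) do
  begin
    if Line <> '' then
      Line := Line + ' ';
    // Treat the byte as a cp1252 character, then encode it as UTF-8 again.
    Line := Line + CodePointToUtf8Hex(Cp1252ToCodePoint(Utf8Bytes[I]));
  end;
  // Prints: C3 A6 E2 80 94 C2 A5 C3 A6 C5 93 C2 AC C3 A8 C2 AA C5 BE
  WriteLn(Line);
end.
```

That matches the hex found in project1.xml, which would fit an implicit ANSI-to-widestring conversion happening somewhere before the XML writer produces UTF-8.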
xmlpropstorage.diff (1,018 bytes)
Index: lcl/xmlpropstorage.pas
===================================================================
--- lcl/xmlpropstorage.pas (revision 37597)
+++ lcl/xmlpropstorage.pas (working copy)
@@ -169,8 +169,11 @@
 function TCustomXMLPropStorage.DoReadString(const Section, Ident, TheDefault: string): string;
+var
+  Res: UnicodeString;
 begin
-  Result:=FXML.GetValue(FixPath(Section)+'/'+Ident, TheDefault);
+  Res:=FXML.GetValue(FixPath(Section)+'/'+Utf8Decode(Ident), Utf8Decode(TheDefault));
+  Result := Utf8Encode(Res);
   //debugln('TCustomXMLPropStorage.DoReadString Section="',Section,'" Ident="',Ident,'" Result=',Result);
 end;
@@ -178,7 +181,7 @@
   Value: string);
 begin
   //debugln('TCustomXMLPropStorage.DoWriteString Section="',Section,'" Ident="',Ident,'" Value="',Value,'"');
-  FXML.SetValue(FixPath(Section)+'/'+Ident, Value);
+  FXML.SetValue(FixPath(Section)+'/'+Utf8Decode(Ident), Utf8Decode(Value));
 end;
 
 procedure TCustomXMLPropStorage.DoEraseSections(const ARootSection: String);
See attached xmlpropstorage.diff.
It treats incoming strings as UTF-8 and converts them to UnicodeStrings before calling TXMLConfig.SetValue(), which expects widestrings.
It does the reverse conversion upon reading.
This will properly create Utf8 xml files.
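To illustrate the idea outside of the LCL, here is a minimal sketch of the explicit round trip: decode the UTF-8 bytes to a UnicodeString before handing them over, and encode them back to UTF-8 when reading. It deliberately leaves TXMLConfig out and only uses `UTF8Decode` and `UTF8Encode` from the System unit. The helper `BytesToHex` is hypothetical and only there to make the bytes visible.

```pascal
program Utf8RoundTripSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: render the bytes of a string as space-separated hex.
function BytesToHex(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

var
  Utf8Text, Restored: AnsiString;
  Wide: UnicodeString;
begin
  // UTF-8 bytes of the three-character test string, as an LCL control holds them.
  Utf8Text := #$E6#$97#$A5#$E6#$9C#$AC#$E8#$AA#$9E;

  // Write direction: decode explicitly, so no implicit ANSI conversion
  // can reinterpret the bytes on the way to a widestring parameter.
  Wide := UTF8Decode(Utf8Text);

  // Read direction: encode the widestring back to UTF-8 for the LCL.
  Restored := UTF8Encode(Wide);

  WriteLn('UTF-16 code units: ', Length(Wide));
  WriteLn('Bytes before: ', BytesToHex(Utf8Text));
  WriteLn('Bytes after:  ', BytesToHex(Restored));
end.
```

If the two hex lines are identical, the string survived the conversion in both directions, which is what the patch relies on.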
||Bart, thanks for the patch, I adapted it slightly.|
|2011-08-02 17:36||Tristan Linnell||New Issue|
|2011-08-02 17:36||Tristan Linnell||File Added: XMLProStorageUTF8.zip|
|2011-08-02 17:36||Tristan Linnell||Widgetset||=> Win32/Win64|
|2011-08-03 13:19||Bart Broersma||LazTarget||=> -|
|2011-08-03 13:19||Bart Broersma||Note Added: 0050397|
|2011-08-03 13:19||Bart Broersma||Status||new => feedback|
|2011-08-03 14:05||Tristan Linnell||Note Added: 0050398|
|2011-08-03 19:03||Bart Broersma||Note Added: 0050408|
|2011-08-03 19:04||Bart Broersma||Note Edited: 0050408|
|2011-08-04 11:58||Tristan Linnell||Note Added: 0050424|
|2011-08-04 12:23||Bart Broersma||Note Added: 0050425|
|2011-08-04 12:36||Tristan Linnell||Note Added: 0050428|
|2011-08-05 14:58||Bart Broersma||Note Added: 0050475|
|2011-08-05 15:53||Shaun Simpson||Note Added: 0050484|
|2011-08-05 16:04||Shaun Simpson||Note Edited: 0050484|
|2011-08-05 16:08||Shaun Simpson||Note Edited: 0050484|
|2011-08-05 16:09||Shaun Simpson||Note Edited: 0050484|
|2011-08-05 16:13||Shaun Simpson||Note Edited: 0050484|
|2011-08-05 16:40||Tristan Linnell||Note Added: 0050486|
|2011-08-05 16:41||Tristan Linnell||File Added: NULL.PNG|
|2011-08-05 16:48||Shaun Simpson||Note Edited: 0050484|
|2011-08-06 16:03||Bart Broersma||Note Added: 0050556|
|2011-08-06 16:03||Bart Broersma||File Added: new.xml|
|2011-08-06 16:50||Bart Broersma||Note Added: 0050558|
|2011-08-06 16:51||Bart Broersma||File Added: xmlpropstorage1.png|
|2011-08-06 16:51||Bart Broersma||File Added: new.xml.png|
|2011-08-06 16:52||Bart Broersma||File Added: project1.2.xml.png|
|2011-08-06 16:52||Bart Broersma||File Added: project1.2.xml|
|2011-08-06 17:32||Bart Broersma||Note Added: 0050562|
|2012-03-13 13:40||Vincent Snijders||Status||feedback => acknowledged|
|2012-06-28 20:26||Bart Broersma||Note Added: 0060707|
|2012-06-28 21:07||Bart Broersma||Note Added: 0060708|
|2012-06-28 21:22||Bart Broersma||File Added: xmlpropstorage.diff|
|2012-06-28 21:27||Bart Broersma||Note Added: 0060709|
|2012-08-23 15:00||Vincent Snijders||Fixed in Revision||=> 38347|
|2012-08-23 15:00||Vincent Snijders||Status||acknowledged => resolved|
|2012-08-23 15:00||Vincent Snijders||Fixed in Version||=> 1.1 (SVN)|
|2012-08-23 15:00||Vincent Snijders||Resolution||open => fixed|
|2012-08-23 15:00||Vincent Snijders||Assigned To||=> Vincent Snijders|
|2012-08-23 15:00||Vincent Snijders||Note Added: 0061834|