View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0014504||Lazarus||Widgetset||public||2009-09-04 22:59||2013-05-06 09:47|
|Reporter||Radu Dan||Assigned To||Vincent Snijders|
|Status||closed||Resolution||no change required|
|Product Version||0.9.29 (SVN)|
|Summary||0014504: ExpandFileNameUTF8 does not work|
|Description||I came across this when selecting a non-ASCII file name within a TOpenDialog, but I think that it affects TSaveDialog in the same manner.|
All the non-ASCII characters from the file name were converted to a question mark (?)
Via trial and error, I traced the problem to dialogs.pp in the common (non-interface) trunk, in the Dereference method of the TOpenDialog class. This method passes all the selected files through ExpandFileNameUTF8, which in turn does a double conversion: it converts the name to ANSI, passes the ansistring to ExpandFileName, and converts the result back.
I haven't quite figured out where exactly in this process the filename gets garbled (otherwise I would have submitted a patch), but I think that this is a big issue for non-unicode filesystems which effectively limits file support.
For any interested user, a quick and dirty fix for this is to add the ofNoDereferenceLinks option to your TOpenDialog (provided your application does not need links dereferenced). Doing so will bypass ExpandFileNameUTF8 and will leave your file name(s) the same way the OS reported them.
|Tags||unicode, utf8, win32|
|Fixed in Revision|
||Sorry, I reported this bug as IDE instead of LCL. Could a moderator correct this?|
I cannot reproduce this issue with the current version. Can you try a snapshot from http://www.hu.freepascal.org/lazarus/
If you reproduce this issue with a snapshot, please upload a zip file with a test project (source only) and the file I must try to open.
bug.zip (96,067 bytes)
Tested using the lazarus 0.9.29 win32 with fpc 2.2.4 from http://snapshots.lazarus.shikami.org/ (for some reason, I could not connect to ftp.hu.freepascal.org), and can confirm this happens on latest build as well.
I removed most of the files from the project file to minimize archive size, but compilation shouldn't be a problem.
Attached are two empty text files with Romanian national characters (they are titles to tracks by a Romanian rock band). I did not upload the original files due to size constraints, but the extension or file content doesn't even matter as they are never opened.
||See attached screenshot. I tested with lazarus 0.9.29 on windows xp 32 bits, but I don't see what is wrong.|
Compiled using Lazarus-0.9.29-21629-fpc-2.2.4-20090909 on Windows Vista Business 32bit
Maybe this is a Vista issue, or you have compiled it using fpc 2.3.1, I don't know (I'll check this evening if that's the case)
On a final note, check whether ofNoDereferenceLinks is included in the dialog's options. It shouldn't be for the bug to appear (that's what the Bypass button does)
I tried the bypass button too, which removes it back afterwards.
Can you add the line:
Label1.Caption := IntToStr(Win32Platform);
Please tell me what the value of Win32Platform is.
||I tried on vista 64 bits, but I cannot reproduce this there either. I used fpc 2.2.4.|
If I look up the dereference code in the dialogs, it seems to me a no-op for windows. Did you step through this code?
Sorry, I can't really use the debugger (for some reason, the application always fails to start after the first run and the only way to continue is to stop the debugger)
As for Win32Platform, it says 2.
Could this be an RTL issue? (sorry for the late response, it seems mantis stopped sending me notices when this bug is updated)
I have no idea where the error is. And because I cannot reproduce it, I cannot step through the code to see where a wrong (or unneeded) conversion is made.
For your debugging problems: Run -> Reset debugger is your friend, give it a shortcut.
Which codepage are you using ?
I think the problem is with chars not present in used codepage because ExpandFileNameUTF8 does a strange thing:
UTF8 -> ANSI -> ExpandFileNameANSI -> UTF8
This means that any char not valid in current codepage will be discarded and replaced with '?' if using fpc >= 2.3.1 and never recovered. So in practice there is no difference in using:
Maybe it could be checked by creating a filename with this UTF8 sequence:
Which is the Unicode code point U+024B62, but that file cannot be created using fpc's RTL.
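The round trip described in this note can be illustrated outside of Pascal. What follows is a minimal Python sketch, not the LCL or RTL code: `roundtrip_through_ansi` is a hypothetical helper, the file name is made up, and cp1252 is assumed as the ANSI codepage purely for illustration. It also prints the UTF-8 bytes of U+24B62, a character that no single-byte codepage contains.

```python
def roundtrip_through_ansi(name: str, codepage: str = "cp1252") -> str:
    """Mimic UTF8 -> ANSI -> UTF8; characters missing from the codepage become '?'."""
    ansi_bytes = name.encode(codepage, errors="replace")  # the lossy step
    return ansi_bytes.decode(codepage)                    # '?' stays '?', nothing is recovered


# Hypothetical file name: U+0103 (a with breve) and U+021B (t with comma below)
# are not part of cp1252, so both are replaced.
original = "f\u0103r\u0103 \u021bint\u0103.txt"
print(roundtrip_through_ansi(original))   # -> f?r? ?int?.txt

# U+24B62 as UTF-8 bytes, and what is left of it after the round trip.
han = chr(0x24B62)
print(han.encode("utf-8").hex(" "))       # -> f0 a4 ad a2
print(roundtrip_through_ansi(han))        # -> ?
```

Once a character has been replaced by '?', the second conversion has no way to restore it, which matches the "never recovered" remark above.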
This was my first guess as well; I said to myself that there's no way in hell a UTF-8 encoded string can be represented as ANSI, and that's the reason the method fails. However, it seems to work for you, and I can't really explain why.
I'm using the default codepage in vista, I think Latin-1. Do you know some easy way to check?
Either way, NTFS is unicode based and since it has mostly replaced FAT for all purposes except for removable storage, I believe that this is an issue that transcends the default operating system codepage. Since the LCL is UTF-8 based and the RTL is ANSI, maybe the ExpandFileName call should be delegated to the widgetset, so it can properly implement Unicode support for file names.
Or am I talking nonsense?
>Since the LCL is UTF-8 based and the RTL is ANSI, maybe the ExpandFileName call should be delegated to the widgetset, so it can properly implement Unicode support for file names.
The RTL should be extended to handle Unicode file names:
||José Mejuto, the question is: where is ExpandFileNameUTF8 called?|
Yes, Vincent, it is never called (the subject confused me), so to make up for my distraction I investigated everything a bit more and found some facts.
1) fpc 2.2.4 does not convert invalid chars to '?', so that conversion must be done by the operating system.
2) Not all chars > 128 are converted to '?', only some.
So maybe his Windows installation is wrong at some point, or Lazarus has been compiled without "UnicodeSupport", or the chosen font cannot represent the chars.
I changed my WinXP to Romanian settings and the filename is still correctly displayed. I also checked in Vista 64 and everything looks fine.
Radu: Can you check your test exe file on a different Windows installation?
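The observation that only some chars > 128 turn into '?' is what one would expect from a codepage conversion. The Python sketch below lists which Romanian letters a given codepage cannot represent. `lost_characters` is a hypothetical helper, the sample letters are an assumption (the thread does not say which characters the attached file names contain), and cp1252 and cp1250 are picked only as examples of Western and Central European ANSI codepages.

```python
def lost_characters(text: str, codepage: str) -> list:
    """Return the characters of `text` that `codepage` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


# a-breve, a-circumflex, i-circumflex, s-comma-below, t-comma-below
sample = "\u0103\u00e2\u00ee\u0219\u021b"

for cp in ("cp1252", "cp1250"):
    missing = [f"U+{ord(ch):04X}" for ch in lost_characters(sample, cp)]
    print(cp, missing)
# cp1252 -> ['U+0103', 'U+0219', 'U+021B']
# cp1250 -> ['U+0219', 'U+021B']
```

Under these assumptions, a-circumflex and i-circumflex survive in both codepages, while the comma-below letters are lost in both, so the set of characters that end up as '?' depends on the system codepage.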
||UTF8ToUTF16 does convert invalid chars to ?, IIRC.|
Oh boy, this turned out to be a lot more trouble than it's worth.
You are right, this is no longer an issue in 0.9.29 SVN, but only because the call to ReadAllLinks does nothing on Win32/64, and thus, a NoDereferenceLink option is implied.
The reason I was still getting the error in SVN was stupidity on my part. I had two concurrent installations of lazarus (stable and svn), and I was using the svn to compile the stable due to the way the directories were configured in the svn installation.
After a proper recompilation with 2.3.1 and a lazarus snapshot, I can confirm the behaviour Vincent has been experiencing.
However, ExpandFileNameUTF8 still does not work, and should the selected file be a symbolic link, the test case would yield a wrong answer as it does nothing to dereference the link.
Last but not least, I can still get the same behaviour (some non-ASCII chars converted to ?) by using:
Again, really sorry for the confusion I caused with my two concurrent installations
AFAIK, dereferencing links is done by the OS or the widgetset (the distinction is not so clear) on windows, so it works.
Fixing ExpandFileNameUTF8(OpenDialog1.Filename) needs a more unicode enabled RTL.
|2009-09-04 22:59||Radu Dan||New Issue|
|2009-09-04 22:59||Radu Dan||Widgetset||=> Win32/Win64|
|2009-09-04 23:00||Radu Dan||Tag Attached: unicode|
|2009-09-04 23:00||Radu Dan||Tag Attached: win32|
|2009-09-04 23:00||Radu Dan||Tag Attached: duplicate identifiers|
|2009-09-04 23:00||Radu Dan||Tag Attached: utf8|
|2009-09-05 01:18||Radu Dan||Note Added: 0030431|
|2009-09-06 19:25||Vincent Snijders||LazTarget||=> -|
|2009-09-06 19:25||Vincent Snijders||Category||IDE => Widgetset|
|2009-09-06 19:52||Vincent Snijders||Note Added: 0030484|
|2009-09-06 19:52||Vincent Snijders||Status||new => feedback|
|2009-09-06 19:52||Vincent Snijders||LazTarget||- => 1.0|
|2009-09-06 19:52||Vincent Snijders||Target Version||=> 1.0.0|
|2009-09-06 23:13||Radu Dan||File Added: bug.zip|
|2009-09-06 23:17||Radu Dan||Note Added: 0030493|
|2009-09-08 14:55||Vincent Snijders||Status||feedback => assigned|
|2009-09-08 14:55||Vincent Snijders||Status||assigned => acknowledged|
|2009-09-08 14:56||Vincent Snijders||Product Version||0.9.26.2 => 0.9.29 (SVN)|
|2009-09-08 21:10||Vincent Snijders||Tag Detached: duplicate identifiers|
|2009-09-08 21:11||Vincent Snijders||File Added: bug14504.PNG|
|2009-09-08 21:13||Vincent Snijders||Note Added: 0030549|
|2009-09-08 21:13||Vincent Snijders||Status||acknowledged => feedback|
|2009-09-09 16:24||Radu Dan||File Added: vista_screenshot.png|
|2009-09-09 16:28||Radu Dan||Note Added: 0030568|
|2009-09-09 16:45||Vincent Snijders||Note Added: 0030569|
|2009-09-09 16:51||Vincent Snijders||Note Added: 0030570|
|2009-09-09 17:05||Vincent Snijders||Note Added: 0030571|
|2009-09-09 17:05||Vincent Snijders||Note Edited: 0030571|
|2009-09-10 23:54||Radu Dan||Note Added: 0030618|
|2009-09-11 09:31||Vincent Snijders||Note Added: 0030619|
|2009-09-11 17:37||José Mejuto||Note Added: 0030639|
|2009-09-11 21:07||Radu Dan||Note Added: 0030641|
|2009-09-11 21:38||Vincent Snijders||Note Added: 0030642|
|2009-09-11 21:39||Vincent Snijders||Note Added: 0030643|
|2009-09-12 16:32||José Mejuto||Note Added: 0030660|
|2009-09-14 10:18||Vincent Snijders||Note Added: 0030705|
|2009-09-14 21:34||Radu Dan||Note Added: 0030714|
|2009-09-14 21:56||Vincent Snijders||Note Added: 0030715|
|2009-09-14 21:57||Vincent Snijders||Status||feedback => resolved|
|2009-09-14 21:57||Vincent Snijders||Resolution||open => no change required|
|2009-09-14 21:57||Vincent Snijders||Assigned To||=> Vincent Snijders|
|2011-12-01 11:22||Marc Weustink||Status||resolved => closed|
|2013-05-06 09:47||Juha Manninen||Relationship added||related to 0020835|