View Issue Details

ID: 0014504
Project: Lazarus
Category: Widgetset
View Status: public
Last Update: 2013-05-06 09:47
Reporter: Radu Dan
Assigned To: Vincent Snijders
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: no change required
Platform: i386
OS: Windows
Product Version: 0.9.29 (SVN)
Target Version: 1.0.0
Summary: 0014504: ExpandFileNameUTF8 does not work
Description: I came across this when selecting a file with a non-ASCII name in a TOpenDialog, but I think it affects TSaveDialog in the same way.

All non-ASCII characters in the file name were converted to question marks (?).

Through trial and error, I traced the problem to the common (non-interface) code in dialogs.pp, specifically the Dereference method of the TOpenDialog class. This method passes all the selected files through ExpandFileNameUTF8, which performs a double conversion: it converts each name to ANSI, passes that AnsiString to ExpandFileName, and converts the result back.

I haven't figured out exactly where in this process the file name gets garbled (otherwise I would have submitted a patch), but I think this is a big issue for non-Unicode filesystems, as it effectively limits file support.

For any interested user, a quick and dirty workaround is to add ofNoDereferenceLinks to your TOpenDialog's Options (provided your application does not need link dereferencing). This bypasses ExpandFileNameUTF8 and leaves the file name(s) exactly as the OS reported them.
Tags: unicode, utf8, win32
Fixed in Revision:
LazTarget: 1.0
Widgetset: Win32/Win64
Attached Files

Relationships

related to 0020835 (closed, Bart Broersma): ExpandFileNameUTF8 broken

Activities

Radu Dan

2009-09-05 01:18

reporter   ~0030431

Sorry, I reported this bug as IDE instead of LCL. Could a moderator correct this?

Vincent Snijders

2009-09-06 19:52

manager   ~0030484

I cannot reproduce this issue with the current version. Can you try a snapshot? They are available from http://www.hu.freepascal.org/lazarus/

If you can reproduce this issue with a snapshot, please upload a zip file with a test project (source only) and the file I should try to open.

Radu Dan

2009-09-06 23:13

bug.zip (96,067 bytes)

Radu Dan

2009-09-06 23:17

reporter   ~0030493

I tested using Lazarus 0.9.29 win32 with FPC 2.2.4 from http://snapshots.lazarus.shikami.org/ (for some reason, I could not connect to ftp.hu.freepascal.org) and can confirm that this happens on the latest build as well.

I removed most of the files from the project to minimize the archive size, but compilation shouldn't be a problem.
Attached are two empty text files whose names contain Romanian characters (they are titles of tracks by a Romanian rock band). I did not upload the original files because of size constraints, but neither the extension nor the file content matters, since the files are never opened.

Vincent Snijders

2009-09-08 21:11

bug14504.PNG (13,068 bytes)   

Vincent Snijders

2009-09-08 21:13

manager   ~0030549

See the attached screenshot. I tested with Lazarus 0.9.29 on Windows XP 32-bit, but I don't see what is wrong.

Radu Dan

2009-09-09 16:24

vista_screenshot.png (32,929 bytes)   

Radu Dan

2009-09-09 16:28

reporter   ~0030568

Compiled using Lazarus-0.9.29-21629-fpc-2.2.4-20090909 on Windows Vista Business 32bit

Maybe this is a Vista issue, or you compiled it with FPC 2.3.1; I don't know (I'll check this evening whether that's the case).

On a final note, check whether ofNoDereferenceLinks is included in the dialog's options. It must not be included for the bug to appear (adding it is what the Bypass button does).

Vincent Snijders

2009-09-09 16:45

manager   ~0030569

I tried the Bypass button too; it removes the option again afterwards.

Can you add the line:
Label1.Caption := IntToStr(Win32Platform);

Please tell me what the value of Win32Platform is.

Vincent Snijders

2009-09-09 16:51

manager   ~0030570

I tried on Vista 64-bit, but I cannot reproduce it there either. I used FPC 2.2.4.

Vincent Snijders

2009-09-09 17:05

manager   ~0030571

Last edited: 2009-09-09 17:05

Looking at the dereference code in the dialogs unit, it seems to be a no-op on Windows. Did you step through this code?

Radu Dan

2009-09-10 23:54

reporter   ~0030618

Sorry, I can't really use the debugger (for some reason, the application always fails to start after the first run, and the only way to continue is to stop the debugger).

As for Win32Platform, it says 2.

Could this be an RTL issue? (Sorry for the slow response; it seems Mantis stopped sending me notifications when this bug is updated.)
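For reference, FPC's SysUtils declares Win32Platform on Windows, and it should hold the platform ID that Windows reports in OSVERSIONINFO.dwPlatformId. A value of 2 therefore means an NT-family system such as Vista, where the Unicode ("W") versions of the Windows API are available. The portable sketch below only maps the documented constant values to names; the helper PlatformName is hypothetical.

```pascal
program PlatformIdName;
{$mode objfpc}{$H+}

const
  { dwPlatformId values from the Windows SDK (OSVERSIONINFO) }
  VER_PLATFORM_WIN32s        = 0;
  VER_PLATFORM_WIN32_WINDOWS = 1;
  VER_PLATFORM_WIN32_NT      = 2;

{ Hypothetical helper: describe a Win32Platform / dwPlatformId value }
function PlatformName(Id: LongInt): string;
begin
  case Id of
    VER_PLATFORM_WIN32s:        Result := 'Win32s on Windows 3.x';
    VER_PLATFORM_WIN32_WINDOWS: Result := 'Windows 95/98/Me';
    VER_PLATFORM_WIN32_NT:      Result := 'Windows NT family';
  else
    Result := 'unknown platform';
  end;
end;

begin
  { On Windows, PlatformName(Win32Platform) from SysUtils could be used
    instead of the literal value reported above. }
  WriteLn(PlatformName(2));
end.
```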

Vincent Snijders

2009-09-11 09:31

manager   ~0030619

I have no idea where the error is. And because I cannot reproduce it, I cannot step through the code to see where a wrong (or unneeded) conversion is made.

For your debugging problems: Run -> Reset debugger is your friend; give it a shortcut.

José Mejuto

2009-09-11 17:37

reporter   ~0030639

Which codepage are you using?

I think the problem is with characters not present in the active codepage, because ExpandFileNameUTF8 does a strange thing:

UTF8 -> ANSI -> ExpandFileNameANSI -> UTF8

This means that any character not valid in the current codepage will be replaced with '?' (when using FPC >= 2.3.1) and can never be recovered. So in practice there is no difference between using:

Result:=UTF8Encode(ExpandFileName(UTF8Decode(FileName)));
or
Result:=ExpandFileNameUTF8(FileName);
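The round trip described above can be sketched in plain Free Pascal. This is a simplified model, not the RTL's or the LCL's code: the "ANSI" codepage is assumed to be ISO-8859-1 (Latin-1), where only U+0000..U+00FF fit in one byte and everything else becomes '?'. Real Windows codepages such as cp1252 or cp1250 have different repertoires, and the helper names NextCodePoint, Utf8ToLatin1 and Latin1ToUtf8 are hypothetical.

```pascal
program LossyRoundTrip;
{$mode objfpc}{$H+}

{ Hypothetical helper: decode the UTF-8 sequence starting at S[I] and
  advance I past it. Assumes well-formed UTF-8 input. }
function NextCodePoint(const S: AnsiString; var I: Integer): Cardinal;
var
  B: Byte;
  Extra, K: Integer;
begin
  B := Ord(S[I]);
  if B < $80 then begin Result := B; Extra := 0; end
  else if (B and $E0) = $C0 then begin Result := B and $1F; Extra := 1; end
  else if (B and $F0) = $E0 then begin Result := B and $0F; Extra := 2; end
  else begin Result := B and $07; Extra := 3; end;
  for K := 1 to Extra do
    Result := (Result shl 6) or (Ord(S[I + K]) and $3F);
  Inc(I, Extra + 1);
end;

{ UTF-8 -> "ANSI" (modelled as Latin-1): code points above U+00FF
  have no single-byte form and are replaced with '?'. }
function Utf8ToLatin1(const S: AnsiString): AnsiString;
var
  I: Integer;
  CP: Cardinal;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    CP := NextCodePoint(S, I);
    if CP <= $FF then
      Result := Result + Chr(Byte(CP))
    else
      Result := Result + '?';
  end;
end;

{ "ANSI" (Latin-1) -> UTF-8: every byte maps back to U+0000..U+00FF,
  but a '?' stays a '?', so the lost characters cannot be recovered. }
function Latin1ToUtf8(const S: AnsiString): AnsiString;
var
  I: Integer;
  B: Byte;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
      Result := Result + Chr(B)
    else
      Result := Result + Chr($C0 or (B shr 6)) + Chr($80 or (B and $3F));
  end;
end;

const
  { 'Cântec și țară.txt' as explicit UTF-8 bytes:
    â = C3 A2, ș = C8 99, ț = C8 9B, ă = C4 83 }
  Original = 'C'#$C3#$A2'ntec '#$C8#$99'i '#$C8#$9B'ar'#$C4#$83'.txt';

var
  Back: AnsiString;
begin
  Back := Latin1ToUtf8(Utf8ToLatin1(Original));
  { â survives (it is U+00E2), while ș, ț and ă come back as '?' }
  WriteLn(Back = Original);
end.
```

In this model only some characters above 127 turn into '?', namely those the target codepage cannot represent, which is consistent with the observation later in this thread.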

Maybe this could be checked by creating a file name containing this UTF-8 sequence:

#$F0+#$A4+#$AD+#$A2

which is the Unicode code point U+24B62, but such a file cannot be created using FPC's RTL.
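As a quick check of the byte sequence above, the following Free Pascal sketch decodes it by hand (DecodeUtf8FourByte and ToSurrogatePair are hypothetical names, not RTL or LCL routines). It confirms that the four bytes encode U+24B62, a code point outside the Basic Multilingual Plane: it has no single-byte ANSI equivalent, and in UTF-16, the encoding used by the Windows "W" APIs, it needs a surrogate pair.

```pascal
program DecodeSupplementary;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Decode exactly one 4-byte UTF-8 sequence:
  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx }
function DecodeUtf8FourByte(const S: AnsiString): Cardinal;
var
  K: Integer;
begin
  if (Length(S) <> 4) or ((Ord(S[1]) and $F8) <> $F0) then
    raise Exception.Create('not a 4-byte UTF-8 lead byte');
  Result := Ord(S[1]) and $07;
  for K := 2 to 4 do
  begin
    if (Ord(S[K]) and $C0) <> $80 then
      raise Exception.Create('bad continuation byte');
    Result := (Result shl 6) or (Ord(S[K]) and $3F);
  end;
end;

{ Split a code point above U+FFFF into a UTF-16 surrogate pair }
procedure ToSurrogatePair(CP: Cardinal; out Hi, Lo: Word);
begin
  Dec(CP, $10000);
  Hi := $D800 + (CP shr 10);
  Lo := $DC00 + (CP and $3FF);
end;

var
  CP: Cardinal;
  Hi, Lo: Word;
begin
  CP := DecodeUtf8FourByte(#$F0+#$A4+#$AD+#$A2);
  ToSurrogatePair(CP, Hi, Lo);
  { prints: U+24B62 -> UTF-16 D852 DF62 }
  WriteLn('U+', IntToHex(Int64(CP), 5), ' -> UTF-16 ',
          IntToHex(Hi, 4), ' ', IntToHex(Lo, 4));
end.
```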

Radu Dan

2009-09-11 21:07

reporter   ~0030641

This was my first guess as well; I said to myself that there's no way in hell a UTF-8 encoded string can be represented as ANSI, and that's the reason the method fails. However, it seems to work for you, and I can't really explain why.

I'm using the default codepage in Vista; I think it's Latin-1. Do you know an easy way to check?

Either way, NTFS is Unicode-based, and since it has mostly replaced FAT everywhere except removable storage, I believe this issue goes beyond the operating system's default codepage. Since the LCL is UTF-8 based and the RTL is ANSI, maybe the ExpandFileName call should be delegated to the widgetset so that it can properly support Unicode file names.

Or am I talking nonsense?

Vincent Snijders

2009-09-11 21:38

manager   ~0030642

>Since the LCL is UTF-8 based and the RTL is ANSI, maybe the ExpandFileName call should be delegated to the widgetset, so it can properly implement Unicode support for file names.

The RTL should be extended to handle Unicode file names:
http://bugs.freepascal.org/view.php?id=12923

Vincent Snijders

2009-09-11 21:39

manager   ~0030643

José Mejuto, the question is: where is ExpandFileNameUTF8 called?

José Mejuto

2009-09-12 16:32

reporter   ~0030660

Yes, Vincent, it is never called (the summary confused me), so to make up for my inattention I investigated a bit further and found some facts:

1) FPC 2.2.4 does not convert invalid characters to '?', so that conversion must be done by the operating system.

2) Not all characters > 128 are converted to '?'; only some are.

So maybe his Windows installation is broken in some way, or Lazarus was compiled without "UnicodeSupport", or the chosen font cannot represent the characters.

I changed my Windows XP to Romanian settings and the file name is still displayed correctly. I also checked on Vista 64-bit and everything looks fine.

Radu: can you check your test executable on a different Windows installation?

Vincent Snijders

2009-09-14 10:18

manager   ~0030705

UTF8ToUTF16 does convert invalid characters to '?', IIRC.
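The kind of substitution recalled here can be illustrated with a simplified converter. This is a hypothetical sketch, not the LCL's UTF8ToUTF16: it decodes UTF-8, emits a surrogate pair for code points above U+FFFF, and appends '?' for each byte that does not start a complete, valid sequence. It does not reject overlong forms or encoded surrogates.

```pascal
program Utf8ToUtf16Sketch;
{$mode objfpc}{$H+}

{ Hypothetical converter: malformed UTF-8 bytes become '?' in the output }
function Utf8ToUtf16Lossy(const S: AnsiString): UnicodeString;
var
  I, Need, K: Integer;
  B: Byte;
  CP: Cardinal;
  Ok: Boolean;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    CP := 0;
    if B < $80 then begin CP := B; Need := 0; end
    else if (B and $E0) = $C0 then begin CP := B and $1F; Need := 1; end
    else if (B and $F0) = $E0 then begin CP := B and $0F; Need := 2; end
    else if (B and $F8) = $F0 then begin CP := B and $07; Need := 3; end
    else Need := -1;                       { stray continuation or invalid byte }
    Ok := (Need >= 0) and (I + Need <= Length(S));
    K := 1;
    while Ok and (K <= Need) do
    begin
      Ok := (Ord(S[I + K]) and $C0) = $80;
      CP := (CP shl 6) or (Ord(S[I + K]) and $3F);
      Inc(K);
    end;
    if not Ok then
    begin
      Result := Result + '?';              { replace one bad byte, then resync }
      Inc(I);
    end
    else
    begin
      Inc(I, Need + 1);
      if CP < $10000 then
        Result := Result + WideChar(Word(CP))
      else
      begin                                { supplementary plane: surrogate pair }
        Dec(CP, $10000);
        Result := Result + WideChar(Word($D800 + (CP shr 10)))
                         + WideChar(Word($DC00 + (CP and $3FF)));
      end;
    end;
  end;
end;

var
  W: UnicodeString;
begin
  W := Utf8ToUtf16Lossy('a'#$FF'b'#$C3#$A2);
  WriteLn(Length(W));
end.
```

Whether the LCL routine behaves exactly like this was not confirmed in this thread.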

Radu Dan

2009-09-14 21:34

reporter   ~0030714

Oh boy, this turned out to be a lot more trouble than it's worth.

You are right, this is no longer an issue in 0.9.29 SVN, but only because the call to ReadAllLinks does nothing on Win32/64, and thus the ofNoDereferenceLinks option is effectively implied.

The reason I was still getting the error in SVN was stupidity on my part. I had two parallel installations of Lazarus (stable and SVN), and because of the way the directories were configured in the SVN installation, I was using the SVN version to compile the stable one.

After properly recompiling with FPC 2.3.1 and a Lazarus snapshot, I can confirm the behaviour Vincent has been seeing.

However, ExpandFileNameUTF8 still does not work, and should the selected file be a symbolic link, the test case would yield a wrong answer as it does nothing to dereference the link.

Last but not least, I can still get the same behaviour (some non-ASCII chars converted to ?) by using:

Label1.Caption:=ExpandFileNameUTF8(OpenDialog1.Filename);

Again, really sorry for the confusion I caused with my two parallel installations.

Vincent Snijders

2009-09-14 21:56

manager   ~0030715

AFAIK, on Windows dereferencing links is done by the OS or the widgetset (the distinction is not so clear), so it works.

Fixing ExpandFileNameUTF8(OpenDialog1.Filename) needs a more Unicode-enabled RTL.

Issue History

Date Modified | Username | Field | Change
2009-09-04 22:59 | Radu Dan | New Issue |
2009-09-04 22:59 | Radu Dan | Widgetset | => Win32/Win64
2009-09-04 23:00 | Radu Dan | Tag Attached: unicode |
2009-09-04 23:00 | Radu Dan | Tag Attached: win32 |
2009-09-04 23:00 | Radu Dan | Tag Attached: duplicate identifiers |
2009-09-04 23:00 | Radu Dan | Tag Attached: utf8 |
2009-09-05 01:18 | Radu Dan | Note Added: 0030431 |
2009-09-06 19:25 | Vincent Snijders | LazTarget | => -
2009-09-06 19:25 | Vincent Snijders | Category | IDE => Widgetset
2009-09-06 19:52 | Vincent Snijders | Note Added: 0030484 |
2009-09-06 19:52 | Vincent Snijders | Status | new => feedback
2009-09-06 19:52 | Vincent Snijders | LazTarget | - => 1.0
2009-09-06 19:52 | Vincent Snijders | Target Version | => 1.0.0
2009-09-06 23:13 | Radu Dan | File Added: bug.zip |
2009-09-06 23:17 | Radu Dan | Note Added: 0030493 |
2009-09-08 14:55 | Vincent Snijders | Status | feedback => assigned
2009-09-08 14:55 | Vincent Snijders | Status | assigned => acknowledged
2009-09-08 14:56 | Vincent Snijders | Product Version | 0.9.26.2 => 0.9.29 (SVN)
2009-09-08 21:10 | Vincent Snijders | Tag Detached: duplicate identifiers |
2009-09-08 21:11 | Vincent Snijders | File Added: bug14504.PNG |
2009-09-08 21:13 | Vincent Snijders | Note Added: 0030549 |
2009-09-08 21:13 | Vincent Snijders | Status | acknowledged => feedback
2009-09-09 16:24 | Radu Dan | File Added: vista_screenshot.png |
2009-09-09 16:28 | Radu Dan | Note Added: 0030568 |
2009-09-09 16:45 | Vincent Snijders | Note Added: 0030569 |
2009-09-09 16:51 | Vincent Snijders | Note Added: 0030570 |
2009-09-09 17:05 | Vincent Snijders | Note Added: 0030571 |
2009-09-09 17:05 | Vincent Snijders | Note Edited: 0030571 |
2009-09-10 23:54 | Radu Dan | Note Added: 0030618 |
2009-09-11 09:31 | Vincent Snijders | Note Added: 0030619 |
2009-09-11 17:37 | José Mejuto | Note Added: 0030639 |
2009-09-11 21:07 | Radu Dan | Note Added: 0030641 |
2009-09-11 21:38 | Vincent Snijders | Note Added: 0030642 |
2009-09-11 21:39 | Vincent Snijders | Note Added: 0030643 |
2009-09-12 16:32 | José Mejuto | Note Added: 0030660 |
2009-09-14 10:18 | Vincent Snijders | Note Added: 0030705 |
2009-09-14 21:34 | Radu Dan | Note Added: 0030714 |
2009-09-14 21:56 | Vincent Snijders | Note Added: 0030715 |
2009-09-14 21:57 | Vincent Snijders | Status | feedback => resolved
2009-09-14 21:57 | Vincent Snijders | Resolution | open => no change required
2009-09-14 21:57 | Vincent Snijders | Assigned To | => Vincent Snijders
2011-12-01 11:22 | Marc Weustink | Status | resolved => closed
2013-05-06 09:47 | Juha Manninen | Relationship added | related to 0020835