View Issue Details

IDProjectCategoryView StatusLast Update
0035060FPCFCLpublic2019-02-23 14:23
ReporterBart Broersma Assigned ToJoost van der Sluis  
Status closedResolutionfixed 
Product Version3.3.1 
Target Version3.2.0Fixed in Version3.3.1 
Summary0035060: TRegistry: regression when reading a key that has unicode charcters in it's name
DescriptionIf you try to read a key, that has unicode characters in it's name the result will be an empty string.
Steps To ReproduceOpen regedit.
In HKCU\Software create the key 'XXXXXXXXXX' (10 X's)
In HKCU\Sotware\XXXXXXXXXX create the string with name 'äëï' and set content to anything other than an empty string.

Build and runn attached program.
Note: the sourcode is encoded in cp1252.

Output with fpc 3.0.4 and Delphi 7:

Output with fpctrunk:
Additional InformationIn current registry implementation, before querying the API the AnsiString parameters are converted to UnicodeString using Utf8Decode().
In the attached example project Utf8Decode(Name) will give you (represented as word values): 003F 003F 003F, while the proper representation of Name as UnicodeString is 00E4 00EB 00EF.
TagsNo tags attached.
Fixed in Revisionr41325, r41415
Attached Files


Bart Broersma

2019-02-10 22:14


notascii.lpr (2,114 bytes)

Bart Broersma

2019-02-11 09:49

reporter   ~0114026

Fixes 3.2 has the same issue and since this is a regression it would be nice if it were to be fixed (and merged) before final 3.2.0 is out.

Bart Broersma

2019-02-11 10:50


registry.stringtounicode.diff (2,550 bytes)   
Index: packages/fcl-registry/src/
--- packages/fcl-registry/src/	(revision 41290)
+++ packages/fcl-registry/src/	(working copy)
@@ -52,7 +52,7 @@
   SecurityAttributes := Nil;
-  u:=UTF8Decode(PrepKey(Key));
+  u:=UnicodeString(PrepKey(Key));
@@ -71,7 +71,7 @@
   u: UnicodeString;
-  u:=UTF8Decode(PRepKey(Key));
+  u:=UnicodeString(PRepKey(Key));
@@ -78,7 +78,7 @@
 function TRegistry.DeleteValue(const Name: String): Boolean;
-  FLastError:= RegDeleteValueW(fCurrentKey, PWideChar(UTF8Decode(Name)));
+  FLastError:= RegDeleteValueW(fCurrentKey, PWideChar(UnicodeString(Name)));
@@ -89,7 +89,7 @@
   RD : DWord;
-  u := UTF8Decode(Name);
+  u := UnicodeString(Name);
   if (FLastError<>ERROR_SUCCESS) Then
@@ -110,7 +110,7 @@
   RD : DWord;
-  u:=UTF8Decode(ValueName);
+  u:=UnicodeString(ValueName);
   With Value do
@@ -147,7 +147,7 @@
 {$ifdef WinCE}
 {$else WinCE}
-  u:=UTF8Decode(S);
+  u:=UnicodeString(S);
 {$endif WinCE}
@@ -212,7 +212,7 @@
   S: string;
   SecurityAttributes := Nil;
-  u:=UTF8Decode(PrepKey(Key));
+  u:=UnicodeString(PrepKey(Key));
   If CanCreate then
@@ -260,7 +260,7 @@
 {$ifdef WinCE}
-  FLastError:=RegConnectRegistryW(PWideChar(UTF8Decode(UNCName)),RootKey,newroot);
+  FLastError:=RegConnectRegistryW(PWideChar(UnicodeString(UNCName)),RootKey,newroot);
   if Result then begin
@@ -422,7 +422,7 @@
-  u:=UTF8Decode(Name);
+  u:=UnicodeString(Name);
registry.stringtounicode.diff (2,550 bytes)   

Bart Broersma

2019-02-11 10:53

reporter   ~0114027

Possible patch attached in file registry.stringtounicode.diff.

(I left the instances of Utf8Encode for return values intact, since this will set the codepage of the string to CP_UTF8 and conversion to current system codepage will be triggerd later if necessary.)

Bart Broersma

2019-02-12 12:09


tw35060.pp (4,248 bytes)

Bart Broersma

2019-02-12 12:19

reporter   ~0114051

Last edited: 2019-02-12 12:20

View 2 revisions

Please find attached a test program tw35060.pp.
It is supposed to go in /tests/test/packages/fcl-registry

The program uses plain Windows API call to create the Key:
The program will fail if OpenKeyReadOnly fails on that key.
The program will (try to) remove the test key afterwards (again using plain API calls).

Currently fpc trunk fails with:
EAssertionFailed: OpenKey('Software\Bug0035060\äëï') failed: "De bewerking is voltooid." [0] (tw35060.pp, line 142)
(The strange part of this is that GetLastOSError returns 0 in this case.)

The test program can be compiled with Delphi 7 and runs fine.
For testing with newer Delphis (if that is a requirement) all occurrences of "String" must be replaced wint "AnsiString".
If needed I will adjust tw35060.pp

Thaddy de Koning

2019-02-12 13:00

reporter   ~0114052

Last edited: 2019-02-12 13:02

View 3 revisions

Bart, the only way to do this properly is to make the code Ansi and UTF16 agnostic. There should not be any explicit UTF8 in the Freepascal rtl and packages. Making it utf16 agnostic will mean it is also assignment compatible to UTF8 on the Lazarus side. In that case string can stay string.
IOW AnsiString and UnicodeString would suffice.

Bart Broersma

2019-02-12 14:29

reporter   ~0114053

IMO returned stringvalues (not stringlists) that are UTF8 encoded (and which have there StringCodePage properly set to CP_UTF8) should cause no problems on the caller side.
Canging all Utf8Encode(AUnicodeStringVar) into String(AUnicodeStringVar) is not necessary to resolve this issue, hence I did not do that.

Joost van der Sluis

2019-02-15 21:31

manager   ~0114163

Thanks for the bug report. The proposed patch was not right, though. But the provided test was very useful (tests always are).
I duplicated your test and adapted it to see if the 'old' behavior with UTF8-encoded strings did still work. This was not the case.
At first I thought I made a mistake in the test, but in the end I discovered that the strings were converted into pchar's by the fcl-registry code. And thus lost all their codepage-information. Well, it's seems to be fixed now. Can you confirm this?

I did not touch the utf8encode functions for now, but I agree that they should be removed. (But not without a proper test)

Bart Broersma

2019-02-16 09:24

reporter   ~0114172

The tests run fine now, but I think the tests for the correct encoding of Name should use RawByteString as parameter, to prevent unwanted codepage conversion inside the test.
Patch attached.

Bart Broersma

2019-02-16 09:25


registrytests.diff (868 bytes)   
Index: tests/test/packages/fcl-registry/tw35060a.pp
--- tests/test/packages/fcl-registry/tw35060a.pp	(revision 41332)
+++ tests/test/packages/fcl-registry/tw35060a.pp	(working copy)
@@ -38,7 +38,7 @@
   Result := Trim(Result);
-function AnsiToHex(const S: String): String;
+function AnsiToHex(const S: RawByteString): String;
   i: Integer;
Index: tests/test/packages/fcl-registry/tw35060b.pp
--- tests/test/packages/fcl-registry/tw35060b.pp	(revision 41332)
+++ tests/test/packages/fcl-registry/tw35060b.pp	(working copy)
@@ -38,7 +38,7 @@
   Result := Trim(Result);
-function Utf8ToHex(const S: String): String;
+function Utf8ToHex(const S: RawByteString): String;
   i: Integer;
registrytests.diff (868 bytes)   

Bart Broersma

2019-02-17 11:33

reporter   ~0114214

I noticed that there are still 2 occurrences of Utf8Decode in registry.pp.
Attached patch: registry.stringtounicode.2.diff

Bart Broersma

2019-02-17 11:34


registry.stringtounicode.2.diff (547 bytes)   
Index: packages/fcl-registry/src/registry.pp
--- packages/fcl-registry/src/registry.pp	(revision 41343)
+++ packages/fcl-registry/src/registry.pp	(working copy)
@@ -504,7 +504,7 @@
   u: UnicodeString;
-  u:=UTF8Decode(Value);
+  u:=Value;
   PutData(Name, PWideChar(u), ByteLength(u), rdExpandString);
@@ -538,7 +538,7 @@
   u: UnicodeString;
-  u:=UTF8Decode(Value);
+  u:=Value;
   PutData(Name, PWideChar(u), ByteLength(u), rdString);

Thaddy de Koning

2019-02-17 12:52

reporter   ~0114216

These were left in intentional, but nice work Bart: the patch works.

Joost van der Sluis

2019-02-17 20:45

manager   ~0114226

Bart, I don't think that the parameters should be of type rawbytestring. I checked that by querying System.StringCodePage() on a few locations, and there are no changes in the dynamic codepage?

Joost van der Sluis

2019-02-17 20:46

manager   ~0114227

Thaddy, what do you mean that they were left in intentional? At first sight I think that Bart is right, and they should be removed?
Bart could you create a test for this?

Bart Broersma

2019-02-17 21:02

reporter   ~0114229

1. I'll try to make some test. May take some time.
2. I think from a theoretical standpoint RawByteString is correct? Nevertheless it seems to work OK, but I was afraid that was just "implementation detail".
I'll leave that up to your discretion.

@Thaddy: I overlooked them in my initial patch, because I only searched for Utf8Decode in the file, not in registry.pp itself.

Bart Broersma

2019-02-17 22:00


tw35060c.pp (4,507 bytes)

Bart Broersma

2019-02-17 22:02

reporter   ~0114232

Last edited: 2019-02-17 22:03

View 2 revisions

Test tw35060c.pp shows that WriteString currently translates unicode characters to '?'.
I did not bother testing WriteExpandString, since it uses the same mechanism.

The posted patch fixes it.

Let me know if you need a codepage UTF8 version of the test.

Bart Broersma

2019-02-18 10:37

reporter   ~0114237

tw35060c.pp misses the first line direcctive for test suite:
{ %TARGET=win32,win64,wince }

Can you add it yourself?

Joost van der Sluis

2019-02-22 22:29

manager   ~0114355

Thanks, applied.

And no, from a theoretical standpoint rawbytestring is not correct. Simply use string. Always use string, unless it is really necessary to use rawbytestring. (It's like casting a class-instance to a pointer. You override the checks the compiler made for you)

Bart Broersma

2019-02-23 14:23

reporter   ~0114364


> And no, from a theoretical standpoint rawbytestring is not correct.
Thanks for pointing that out.

Issue History

Date Modified Username Field Change
2019-02-10 22:14 Bart Broersma New Issue
2019-02-10 22:14 Bart Broersma File Added: notascii.lpr
2019-02-11 09:49 Bart Broersma Note Added: 0114026
2019-02-11 10:50 Bart Broersma File Added: registry.stringtounicode.diff
2019-02-11 10:53 Bart Broersma Note Added: 0114027
2019-02-12 12:09 Bart Broersma File Added: tw35060.pp
2019-02-12 12:19 Bart Broersma Note Added: 0114051
2019-02-12 12:20 Bart Broersma Note Edited: 0114051 View Revisions
2019-02-12 13:00 Thaddy de Koning Note Added: 0114052
2019-02-12 13:00 Thaddy de Koning Note Edited: 0114052 View Revisions
2019-02-12 13:02 Thaddy de Koning Note Edited: 0114052 View Revisions
2019-02-12 14:29 Bart Broersma Note Added: 0114053
2019-02-12 14:41 Joost van der Sluis Assigned To => Joost van der Sluis
2019-02-12 14:41 Joost van der Sluis Status new => assigned
2019-02-15 21:31 Joost van der Sluis Fixed in Revision => r41325
2019-02-15 21:31 Joost van der Sluis Note Added: 0114163
2019-02-15 21:31 Joost van der Sluis Status assigned => resolved
2019-02-15 21:31 Joost van der Sluis Fixed in Version => 3.3.1
2019-02-15 21:31 Joost van der Sluis Resolution open => fixed
2019-02-15 21:31 Joost van der Sluis Target Version => 3.2.0
2019-02-16 09:24 Bart Broersma Note Added: 0114172
2019-02-16 09:24 Bart Broersma Status resolved => feedback
2019-02-16 09:24 Bart Broersma Resolution fixed => reopened
2019-02-16 09:25 Bart Broersma File Added: registrytests.diff
2019-02-17 11:33 Bart Broersma Note Added: 0114214
2019-02-17 11:33 Bart Broersma Status feedback => assigned
2019-02-17 11:34 Bart Broersma File Added: registry.stringtounicode.2.diff
2019-02-17 12:52 Thaddy de Koning Note Added: 0114216
2019-02-17 20:45 Joost van der Sluis Note Added: 0114226
2019-02-17 20:46 Joost van der Sluis Note Added: 0114227
2019-02-17 21:02 Bart Broersma Note Added: 0114229
2019-02-17 22:00 Bart Broersma File Added: tw35060c.pp
2019-02-17 22:02 Bart Broersma Note Added: 0114232
2019-02-17 22:03 Bart Broersma Note Edited: 0114232 View Revisions
2019-02-18 10:37 Bart Broersma Note Added: 0114237
2019-02-22 22:29 Joost van der Sluis Fixed in Revision r41325 => r41325, r41415
2019-02-22 22:29 Joost van der Sluis Note Added: 0114355
2019-02-22 22:29 Joost van der Sluis Status assigned => resolved
2019-02-22 22:29 Joost van der Sluis Resolution reopened => fixed
2019-02-23 14:23 Bart Broersma Note Added: 0114364
2019-02-23 14:23 Bart Broersma Status resolved => closed