View Issue Details

IDProjectCategoryView StatusLast Update
0027086FPCRTLpublic2016-06-13 10:05
ReporterMattias Gaertner Assigned To 
PrioritynormalSeverityminorReproducibilityalways
Status newResolutionopen 
Product Version2.7.1 
Summary0027086: FormatSettings do not support changing DefaultSystemCodePage
DescriptionWhen changing the DefaultSystemCodePage the FormatSettings still contain the settings in Windows codepage.
Calling GetFormatSettings calls internally GetLocaleStr, which queries the Ansi function instead of the W function.

Attached patch fixes this.
Steps To ReproduceUse for example locale PT-BR.

SetMultiByteConversionCodePage(CP_UTF8);
GetFormatSettings;
LongDayNames[2] should be 'Terça-feira', but the ç is broken.
TagsNo tags attached.
Fixed in Revision
FPCOldBugId0
FPCTarget
Attached Files

Relationships

related to 0027099 new unit clocale creates invalid characters on Russian Linux 
related to 0028686 new FormatSettings.ThousandSeparator don't accept multbyte character 

Activities

Luiz Americo

2014-11-24 17:00

developer   ~0079365

As a further note, this patch will fix also the RTL compiled with UnicodeString as base type

Thaddy de Koning

2014-11-26 10:56

reporter   ~0079398

Why not an overload? This patch seems to break ANSI.

Mattias Gaertner

2014-11-26 11:14

manager   ~0079399

Theoretically it can break if the GetLocaleInfoA gives a different result than GetLocaleInfoW converted to system code page.
How can this happen?

Thaddy de Koning

2014-11-26 12:56

reporter   ~0079402

Last edited: 2014-11-26 12:58

View 3 revisions

Well, write a test ;)

e.g. Lithuanian

Mattias Gaertner

2014-11-26 15:48

manager   ~0079405

And if the test for Lithuanian runs successfully. Then what?

Mattias Gaertner

2014-11-28 17:09

manager   ~0079449

Here is a patch that uses the A or W function depending on CP_UTF8 and FPC_RTL_UNICODE.

EgonHugeist/ZeosDevTeam

2014-11-29 10:33

reporter   ~0079472

Last edited: 2014-11-29 10:37

View 2 revisions

Propose you stop using the localized W:WideString variable.
It's just a little performance tweak. IMO you could prevent the slow BSTR allocation if you would use something like:

WidestringManager.Unicode2AnsiMoveProc(PWideChar(@Buf[0]), Result, DefaultSystemCodePage, L-1);

Mattias Gaertner

2014-11-29 11:14

manager   ~0079473

Thanks for the hint.
I don't think performance is an issue here as this routine is only called a few times. But the change saves some bytes.
In case of FPC_RTL_UNICODE the result is a UnicodeString.

I uploaded a new version.

EgonHugeist/ZeosDevTeam

2014-11-29 22:58

reporter   ~0079494

Last edited: 2014-11-29 23:01

View 2 revisions

Yep it save some byte:

function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
var
  L: Integer;
  Buf: array[0..255] of WideChar;
begin
  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
  if L > 0 then
  begin
    {$ifdef FPC_RTL_UNICODE}
    SetString(Result, PWideChar(@Buf[0]), L - 1);
    {$else}
    widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1);
    {$endif}
  end else
    Result := Def;
end;

Allways call the W function if you're dealing with DefaultSystemCodePage...
You don't need the additional code.

Simple, short and the W-functions are prefered.
Cheers, Michael

Mattias Gaertner

2014-11-30 00:04

manager   ~0079495

About only W function: see Thaddy de Koning remark about possible difference between A and W.

EgonHugeist/ZeosDevTeam

2014-11-30 02:35

reporter   ~0079496

Sorry have to pass. You dropped your first patch..

If Thaddy de Koning means dataloss if the DefaultSystemCodePage is changed and WideCharToMultiByte returns some '?' in the Values than it's a user issue from my POV.

They should be aware about this case on dealing with DefaultSystemCodePage!

EgonHugeist/ZeosDevTeam

2014-11-30 03:09

reporter   ~0079497

IF this are the objections of Thaddy:

function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
var
  L: Integer;
  Buf: array[0..255] of WideChar;
begin
  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
  if L > 0 then
  begin
    {$ifdef FPC_RTL_UNICODE}
    SetString(Result, PWideChar(@Buf[0]), L - 1);
    {$else}
    if DefaultSystemCodePage = CP_UTF8 then
      widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1)
    else
      widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, GetACP, L - 1);
    {$endif}
  end else
    Result := Def;
end;

Still only half code... Even if i don't undertand why you should take care about it.
Btw. You have a buffer of 510 Bytes you could also bypass the WideStringManager and use the
  SetString(Result, @Buf[0], WideCharToMultiByte(DefaultSystemCodePage, 0, @Buf[0], L -1, @Buf[0], 510, nil, nil));

which simply reuse Buffer for reading/writing (untestet) but look ugly (:

Mattias Gaertner

2014-11-30 10:33

manager   ~0079501

The point is that GetLocaleInfoW converted to CP_ACP *can* create something other than GetLocalInfoA.

EgonHugeist/ZeosDevTeam

2014-11-30 11:26

reporter   ~0079503

> Why not an overload? This patch seems to break ANSI.

I can't see what you mean. Or my english realy is so bad.
IMO this is what you suggest can happen..

Luiz Americo

2014-11-30 13:16

developer   ~0079513

About the worry using the WideString version of the api:

According to the doc http://msdn.microsoft.com/en-us/library/windows/desktop/dd318101%28v=vs.85%29.aspx the ansi version (GetLocaleInfoA) converts internally from UTF16 to the ANSI code page of the passed locale.

So, the conversion is also done in the ansi version, but with the proposed change is done in fpc side.

Considering that this function is used only internally to initialize the
SysUtils.DefaultFormatSettings variable, i think we should convert always to DefaultSystemCodePage to avoid code page mismatch between this variable and other string variables.

EgonHugeist/ZeosDevTeam

2014-11-30 13:33

reporter   ~0079514

Last edited: 2014-11-30 13:41

View 2 revisions

That's something to work with! You're right. The conversion depends to LocalID parameter.

What do you think about this? If you do not want to follow my "Its a User problem!" idea:


function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
var
  L: Integer;
  Buf: array[0..255] of WideChar;
begin
  {$ifndef FPC_RTL_UNICODE}
  if DefaultSystemCodePage <> CP_UTF8 then
    L := GetLocaleInfoA(aLocaleID, aLCType, Buf, SizeOf(Buf)) //conversion depends to LocaleID
  else
  {$endif}
    L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
  if L > 0 then
    {$ifdef FPC_RTL_UNICODE}
    SetString(Result, PWideChar(@Buf[0]), L - 1);
    {$else}
    if DefaultSystemCodePage = CP_UTF8 then
      widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1)
    else
      SetString(Result, @Buf[0], L - 1)
    {$endif}
  else Result := Def;
end;

EgonHugeist/ZeosDevTeam

2014-11-30 13:43

reporter   ~0079515

From my POV the DefaultSystemCodePage should be the only interesting indicator here -> my full Version1 (:

Thaddy de Koning

2014-11-30 16:33

reporter   ~0079518

Last edited: 2014-11-30 16:36

View 2 revisions

Have a look at LOCALE_INVARIANT, which is the only locale that is guaranteed to be consistent across users and systems and thus should serve as the base for conversions as discussed here.

Luiz Americo

2014-11-30 16:59

developer   ~0079519

GetLocaleStr is an internal function, is not exposed to users.

The locale passed to GetLocaleStr is always returned by GetThreadLocale.

So we should not bother to make GetLocaleStr handle custom locales.

Using GetLocaleInfoW we ensure to get the data in a loss less, native format.

It's up to fpc side decide what conversion should be done.

To get the code page to be encoded we should answer this question:

What code page encoding should SysUtils.DefaultFormatSettings be?

DefaultSystemCodePage or CP_ACP?

IMO DefaultSystemCodePage.

Why?
To avoid mixing strings with different encodings in SysUtils and other base units.

Mattias Gaertner

2014-11-30 18:48

manager   ~0079520

Thanks Luiz for the "The ANSI string retrieved by the ANSI version of this function is translated from Unicode". So indeed the GetLocaleInfoA is not needed.
All code in the "Ansi" RTL expect the FormatSettings in DefaultSystemCodePage.

Which boils down to

function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
var
  L: Integer;
  Buf: array[0..255] of WideChar;
begin
  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
  if L > 0 then
    {$ifdef FPC_RTL_UNICODE}
    SetString(Result, PWideChar(@Buf[0]), L - 1);
    {$else}
    widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1)
    {$endif}
  else
    Result := Def;
end;

EgonHugeist/ZeosDevTeam

2014-12-01 01:07

reporter   ~0079524

Agree.

Note: me and you have a little syntax issue: a ';' needs to be removed, like:

function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
var
  L: Integer;
  Buf: array[0..255] of WideChar;
begin
  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
  if L > 0 then
    {$ifdef FPC_RTL_UNICODE}
    SetString(Result, PWideChar(@Buf[0]), L - 1)
    {$else}
    widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1)
    {$endif}
  else
    Result := Def;
end;

Matthias wrote:
>I don't think performance is an issue here as this routine is only called a few times.

Please stop thinking this way. All RTL functions should archive best performance as possible. Getting "something" running was a show-stopper on Zeos pre 7.2. I invested loads of time to "optimize" the code. Do not run into same issues.

Mattias Gaertner

2014-12-01 10:09

manager  

formatsettings.patch (1,125 bytes)   
Index: rtl/win/sysutils.pp
===================================================================
--- rtl/win/sysutils.pp	(revision 29171)
+++ rtl/win/sysutils.pp	(working copy)
@@ -669,19 +669,23 @@
                               Locale Functions
 ****************************************************************************}
 
-function GetLocaleStr(LID, LT: Longint; const Def: string): ShortString;
+function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
 var
   L: Integer;
-  Buf: array[0..255] of Char;
+  Buf: array[0..255] of WideChar;
 begin
-  L := GetLocaleInfoA(LID, LT, Buf, SizeOf(Buf));
+  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, SizeOf(Buf));
   if L > 0 then
-    SetString(Result, @Buf[0], L - 1)
-  else
+  begin
+    {$ifdef FPC_RTL_UNICODE}
+    SetString(Result, PWideChar(@Buf[0]), L - 1);
+    {$else}
+    widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1);
+    {$endif}
+  end else
     Result := Def;
 end;
 
-
 function GetLocaleChar(LID, LT: Longint; Def: Char): Char;
 var
   Buf: array[0..3] of Char; // sdate allows 4 chars.
formatsettings.patch (1,125 bytes)   

Mattias Gaertner

2014-12-01 10:12

manager   ~0079529

I uploaded the patch.

EgonHugeist/ZeosDevTeam

2014-12-01 11:21

reporter   ~0079532

Ok the added begin / end will do the job too.

Ondrej Pokorny

2016-06-12 10:26

developer   ~0093145

GetLocaleInfoW needs Length(Buf) not SizeOf(Buf)

Ondrej Pokorny

2016-06-13 10:05

developer  

formatsettings-2.patch (1,125 bytes)   
Index: rtl/win/sysutils.pp
===================================================================
--- rtl/win/sysutils.pp	(revision 29171)
+++ rtl/win/sysutils.pp	(working copy)
@@ -669,19 +669,23 @@
                               Locale Functions
 ****************************************************************************}
 
-function GetLocaleStr(LID, LT: Longint; const Def: string): ShortString;
+function GetLocaleStr(aLocaleID, aLCType: Longint; const Def: string): String;
 var
   L: Integer;
-  Buf: array[0..255] of Char;
+  Buf: array[0..255] of WideChar;
 begin
-  L := GetLocaleInfoA(LID, LT, Buf, SizeOf(Buf));
+  L := GetLocaleInfoW(aLocaleID, aLCType, Buf, Length(Buf));
   if L > 0 then
-    SetString(Result, @Buf[0], L - 1)
-  else
+  begin
+    {$ifdef FPC_RTL_UNICODE}
+    SetString(Result, PWideChar(@Buf[0]), L - 1);
+    {$else}
+    widestringmanager.Unicode2AnsiMoveProc(@Buf[0], Result, DefaultSystemCodePage, L - 1);
+    {$endif}
+  end else
     Result := Def;
 end;
 
-
 function GetLocaleChar(LID, LT: Longint; Def: Char): Char;
 var
   Buf: array[0..3] of Char; // sdate allows 4 chars.
formatsettings-2.patch (1,125 bytes)   

Issue History

Date Modified Username Field Change
2014-11-24 14:41 Mattias Gaertner New Issue
2014-11-24 14:41 Mattias Gaertner File Added: formatsettings.patch
2014-11-24 17:00 Luiz Americo Note Added: 0079365
2014-11-26 10:56 Thaddy de Koning Note Added: 0079398
2014-11-26 11:14 Mattias Gaertner Note Added: 0079399
2014-11-26 12:56 Thaddy de Koning Note Added: 0079402
2014-11-26 12:58 Thaddy de Koning Note Edited: 0079402 View Revisions
2014-11-26 12:58 Thaddy de Koning Note Edited: 0079402 View Revisions
2014-11-26 15:48 Mattias Gaertner Note Added: 0079405
2014-11-28 17:08 Mattias Gaertner File Deleted: formatsettings.patch
2014-11-28 17:08 Mattias Gaertner File Added: formatsettings.patch
2014-11-28 17:09 Mattias Gaertner Note Added: 0079449
2014-11-29 10:33 EgonHugeist/ZeosDevTeam Note Added: 0079472
2014-11-29 10:37 EgonHugeist/ZeosDevTeam Note Edited: 0079472 View Revisions
2014-11-29 11:11 Mattias Gaertner File Deleted: formatsettings.patch
2014-11-29 11:11 Mattias Gaertner File Added: formatsettings.patch
2014-11-29 11:14 Mattias Gaertner Note Added: 0079473
2014-11-29 22:58 EgonHugeist/ZeosDevTeam Note Added: 0079494
2014-11-29 23:01 EgonHugeist/ZeosDevTeam Note Edited: 0079494 View Revisions
2014-11-30 00:04 Mattias Gaertner Note Added: 0079495
2014-11-30 02:35 EgonHugeist/ZeosDevTeam Note Added: 0079496
2014-11-30 03:09 EgonHugeist/ZeosDevTeam Note Added: 0079497
2014-11-30 10:33 Mattias Gaertner Note Added: 0079501
2014-11-30 11:26 EgonHugeist/ZeosDevTeam Note Added: 0079503
2014-11-30 13:16 Luiz Americo Note Added: 0079513
2014-11-30 13:33 EgonHugeist/ZeosDevTeam Note Added: 0079514
2014-11-30 13:41 EgonHugeist/ZeosDevTeam Note Edited: 0079514 View Revisions
2014-11-30 13:43 EgonHugeist/ZeosDevTeam Note Added: 0079515
2014-11-30 16:33 Thaddy de Koning Note Added: 0079518
2014-11-30 16:36 Thaddy de Koning Note Edited: 0079518 View Revisions
2014-11-30 16:59 Luiz Americo Note Added: 0079519
2014-11-30 18:48 Mattias Gaertner Note Added: 0079520
2014-12-01 01:07 EgonHugeist/ZeosDevTeam Note Added: 0079524
2014-12-01 10:09 Mattias Gaertner File Deleted: formatsettings.patch
2014-12-01 10:09 Mattias Gaertner File Added: formatsettings.patch
2014-12-01 10:12 Mattias Gaertner Note Added: 0079529
2014-12-01 11:21 EgonHugeist/ZeosDevTeam Note Added: 0079532
2014-12-08 12:15 Marco van de Voort Relationship added related to 0027099
2015-10-01 07:46 Jonas Maebe Relationship added related to 0028686
2016-06-12 10:26 Ondrej Pokorny Note Added: 0093145
2016-06-13 10:05 Ondrej Pokorny File Added: formatsettings-2.patch