View Issue Details

IDProjectCategoryView StatusLast Update
0032961LazarusLazUtilspublic2019-04-28 11:07
ReporterKevin Morris Assigned ToMattias Gaertner  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Target Version1.10 
Summary0032961: There is a problem with the default code page and streams
DescriptionThe problem is that Lazarus is expecting codepage UTF8 but the DefaultCodePage in Windows is windows-1252.

In the Lazarus forum, I recieved enough help that we were able to determine that the problem was the default code page.

This problem affects EncodeStringBase64 and DecodeStringBase64 when either deals with anything other than characters a..A, z..Z and 0..9.

Once the Base64 messes up a string encode, then the subsequent actions you may want to do to the MIME string are messed up too, namely: BlowFish encryption.
Steps To ReproduceSee my example project. UPDATE: My example project is too big to be uploaded. Which is strange bc I was able to upload this exact project .zip to the Lazarus forum website. Anyway, see my example project in the "Additional Information memo field. Just below this paragraph, is the way I was told to cure this problem. Please note that I never enc_utf8.Free bc that causes a memory access error. I just leave it, so I hope my Lazarus application frees that memory when the app is closed. If it doesn't, then I've just created a memory leak.

uses
  LazUTF8
...
// Global var
var
  enc_utf8: TEncoding;
...

procedure SetDefaultEncoding; // I call this in my main form's FormCreate
begin
  if not Assigned(enc_utf8) then
    begin
      TEncoding.FreeEncodings;
      enc_utf8 := TMBCSEncoding.Create(DefaultSystemCodePage); // Do not manually enc_utf8.Free; It will be freed by another object.
    end;
end;
Additional Information.

{

  Built with Lazarus 1.9 and Fpc 3.1.1

  When clicking Button2 the error I get is:


  [Debugger Exception Notification]

    Project Base64_Blowfish_Stream_Bug raised exception class 'EReadError' with message:
    Stream read error

    At address 10003AFA1


    [Ignore this exception type]

    [Break] [Continue]



}

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  LazUTF8, Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls, ExtCtrls,
  Base64, Blowfish;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Button2: TButton;
    GroupBox1: TGroupBox;
    GroupBox2: TGroupBox;
    LabeledEdit1: TLabeledEdit;
    LabeledEdit2: TLabeledEdit;
    LabeledEdit3: TLabeledEdit;
    LabeledEdit4: TLabeledEdit;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
    procedure FormClose(Sender: TObject; var CloseAction: TCloseAction);
  private
    mime: String;
    in_s, out_s: String;
    enc_utf8: TEncoding;

    function EncryptItB64(const pwd, plain_text: String): String;
    function DecryptItB64(const pwd, encrypted_str_b64: String; var decrypted_str: String): Boolean;

  public

  end;

const
  master_pwd = '12345678';

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
begin
  in_s := LabeledEdit1.Text;
  mime := EncodeStringBase64(in_s);
  ShowMessage('mime: ' + mime);

  out_s := DecodeStringBase64(mime);
  ShowMessage('out_s: ' + out_s);

  LabeledEdit2.Text := out_s;
end;

procedure TForm1.FormClose(Sender: TObject; var CloseAction: TCloseAction);
begin
  //if Assigned(enc) then
    //enc.Free;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  try
    if not Assigned(enc_utf8) then
      begin
        TEncoding.FreeEncodings;
        enc_utf8 := TMBCSEncoding.Create(DefaultSystemCodePage); // Do not manually enc_utf8.Free; It will be freed by another object.
      end;

    in_s := LabeledEdit3.Text;
    mime := EncryptItB64(master_pwd, in_s);
    ShowMessage('mime: ' + mime);

    if DecryptItB64(master_pwd, mime, out_s) then
      ShowMessage('out_s: ' + out_s)
    else
      ShowMessage('Nope');

    LabeledEdit4.Text := out_s;

  finally
  end;
end;

function TForm1.EncryptItB64(const pwd, plain_text: String): String;
var
  bfsh: TBlowFishEncryptStream;
  strm: TStringStream;
  byt_str: RawByteString;

begin
  Result := '';
  byt_str := '';

  if Length(plain_text) = 0 then
    Exit;

  strm := nil;
  bfsh := nil;
  try
    strm := TStringStream.Create('');
    bfsh := TBlowfishEncryptStream.Create(pwd, strm);
    bfsh.WriteAnsiString(plain_text);
    bfsh.Flush;

    strm.Seek(0, soFromBeginning);
    byt_str := strm.Datastring;
    Result := EncodeStringBase64(byt_str);

  finally
    bfsh.Free; // <--- MUST be freed in THIS ORDER!
    strm.free;
  end;

end;

function TForm1.DecryptItB64(const pwd, encrypted_str_b64: String; var decrypted_str: String): Boolean;
var
  bfsh: TBlowFishDecryptStream;
  strm: TStringStream;
  byt_str: RawByteString;
begin
  Result := False;
  byt_str := '';
  decrypted_str := '';

  if Length(encrypted_str_b64) = 0 then
    Exit;

  strm := nil;
  bfsh := nil;
  try
    try
      byt_str := DecodeStringBase64(encrypted_str_b64);

      strm := TStringStream.Create(byt_str);
      strm.Seek(0, soFromBeginning);
      bfsh := TBlowfishDecryptStream.Create(pwd, strm);
      decrypted_str := bfsh.ReadAnsiString;
      Result := True;

    except
      on E: Exception do
        begin
          decrypted_str := '';
          Result := False;
        end;
    end;

  finally
    bfsh.Free; // <--- MUST be freed in THIS ORDER!
    strm.Free;
  end;

end;


end.

TagsNo tags attached.
Fixed in Revision
LazTarget1.10
Widgetset
Attached Files

Relationships

related to 0033681 closedMichael Van Canneyt FPC TEncoding.FreeEncodings does not assign nil to FStandardEncodings 
related to 0034856 resolvedMichael Van Canneyt FPC access violation when program ends after calling TEncoding.FreeEncodings 

Activities

Thaddy de Koning

2018-01-07 12:40

reporter   ~0105431

Last edited: 2018-01-07 12:42

View 4 revisions

This is not a compiler issue. The compiler aliases "string" to whatever legal string type is current: either shortstring, ansistring or UTF16 Unicodestring.
UTF8 is a Lazarus bold-on alias for "string", although FPC knows UTF8String.
The way I handle this is to simply refactor the string into AnsiString. Even with Lazarus default UTF8 type for "string" it should use the system default CP when the string type is explicitly defined as Ansi.

Thaddy de Koning

2018-01-07 13:24

reporter   ~0105432

Or use bytebuffers.

Marco van de Voort

2018-01-07 21:27

manager   ~0105465

Last edited: 2018-01-09 10:46

View 2 revisions

According to the maillist the problem is in tstringstream's use of TEncodign that is possibly not updated when lazarus changes DefaultSystemCodePage.

http://forum.lazarus-ide.org/index.php/topic,39485.30/topicseen.html

Thaddy de Koning

2018-01-09 16:53

reporter   ~0105542

So we are only talking about TStringStream, not TStream and all other decendants?
Streams are byte wise and should stay so.

Michael Van Canneyt

2018-01-13 18:41

administrator   ~0105748

Last edited: 2018-01-13 18:55

View 2 revisions

I investigated this. The problem is indeed in the lazutf8 unit.
 
After it changes the default encoding, it should simply free the default encodings. The encodings will be regenerated on first use.

I have documented this in the TEncoding and DefaultSystemCodePage identifiers.
(rev 1458.)

As an aside:
after all the changes in the encoding it's not a good idea to rely on the defaults. If it is really necessary to create a stringstream, always specify an encoding.

engkin

2018-06-01 23:01

reporter  

FreeEncodings.patch (609 bytes)   
Index: fpcadds.pas
===================================================================
--- fpcadds.pas	(revision 58071)
+++ fpcadds.pas	(working copy)
@@ -63,6 +63,11 @@
 {$ifdef UTF8_RTL}
 initialization
   SetMultiByteConversionCodePage(CP_UTF8);
+  TEncoding.FreeEncodings;
+  {$IF Defined(FPC_FULLVERSION) and (FPC_FULLVERSION <= 30004)}
+  // Workaround TEncoding.FreeEncodings bug #33681
+  TMBCSEncoding.Create(DefaultSystemCodePage);
+  {$ENDIF}
   // SetMultiByteFileSystemCodePage(CP_UTF8); not needed, this is the default under Windows
   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
 {$IFEND}
FreeEncodings.patch (609 bytes)   

engkin

2018-06-01 23:04

reporter   ~0108638

Since TEncoding bug 0033681 is fixed, this can be fixed as well.

Patch attached includes a workaround for FPC versions <= 3.0.4 affected by 33681.

Benito van der Zander

2019-01-19 13:21

reporter   ~0113476

This is bullshit

- You cannot call FreeEncodings, when it causes crashes 0033681, 0034856

- You must never free an encoding. Without reference counting, any other part of the program can still has a reference to the encoding, so there is no way from preventing it to crash on freeing

- You do not want to free all the other encodings besides the default encoding anyways. Unicode encoding stays an unicode encoding
 
- SetMultiByteConversionCodePage says it will handle additional actions necessary after changing DefaultSystemCodePage, which is like the perfect place to do something about the encodings. That was the whole point of calling SetMultiByteConversionCodePage rather than setting DefaultSystemCodePage directly.

engkin

2019-01-20 02:04

reporter   ~0113507

I wholeheartedly agree. The patch I had included is total trash.

Benito van der Zander

2019-02-27 00:56

reporter  

sysutils.encoding.patch (2,752 bytes)   
Index: rtl/objpas/sysutils/sysencoding.inc
===================================================================
--- rtl/objpas/sysutils/sysencoding.inc	(Revision 40721)
+++ rtl/objpas/sysutils/sysencoding.inc	(Arbeitskopie)
@@ -16,6 +16,23 @@
 { TEncoding }
 
 class function TEncoding.GetANSI: TEncoding;
+  procedure UpdateAnsi;
+  var
+    i: SizeInt;
+    Temp: TEncoding;
+  begin
+    for i := 0 to high(FOldEncodings) do
+      if FOldEncodings[i].CodePage = DefaultSystemCodePage then begin
+        Temp := FStandardEncodings[seAnsi];
+        FStandardEncodings[seAnsi] := FOldEncodings[i];
+        FOldEncodings[i] := temp;
+        exit;
+      end;
+    SetLength(FOldEncodings, length(FOldEncodings) + 1);
+    FOldEncodings[high(FOldEncodings)] := FStandardEncodings[seAnsi];
+    FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
+  end;
+
 begin
 {$ifdef FPC_HAS_FEATURE_THREADING}
   EnterCriticalSection(FLock);
@@ -22,7 +39,9 @@
   try
 {$endif}
     if not Assigned(FStandardEncodings[seAnsi]) then
-      FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
+      FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage)
+    else if FStandardEncodings[seAnsi].CodePage <> DefaultSystemCodePage then
+      UpdateAnsi;
 {$ifdef FPC_HAS_FEATURE_THREADING}
   finally
     LeaveCriticalSection(FLock);
@@ -162,6 +181,7 @@
 begin
   for E := Low(FStandardEncodings) to High(FStandardEncodings) do
     FStandardEncodings[E] := nil;
+  FOldEncodings := nil;
 {$ifdef FPC_HAS_FEATURE_THREADING}
   InitCriticalSection(FLock);
 {$endif}
@@ -168,8 +188,12 @@
 end;
 
 class destructor TEncoding.Destroy;
+var
+  i: SizeInt;
 begin
   FreeEncodings;
+  for i := 0 to high(FOldEncodings) do FOldEncodings[i].free;
+  FOldEncodings := nil;
 {$ifdef FPC_HAS_FEATURE_THREADING}
   DoneCriticalSection(FLock);
 {$endif}
@@ -196,10 +220,14 @@
 var
   Encoding: TEncoding;
 begin
-  if Assigned(AEncoding) then
+  if Assigned(AEncoding) then begin
     for Encoding in FStandardEncodings do
       if Encoding = AEncoding then
         Exit(True);
+    for Encoding in FOldEncodings do
+      if Encoding = AEncoding then
+        Exit(True);
+  end;
   Result := False;
 end;
 
Index: rtl/objpas/sysutils/sysencodingh.inc
===================================================================
--- rtl/objpas/sysutils/sysencodingh.inc	(Revision 40721)
+++ rtl/objpas/sysutils/sysencodingh.inc	(Arbeitskopie)
@@ -30,6 +30,8 @@
         seUTF8);
     var
       FStandardEncodings: array[TStandardEncoding] of TEncoding; static;
+    class var
+      FOldEncodings: array of TEncoding;
     Class Var
       FLock : TRTLCriticalSection;
     class function GetANSI: TEncoding; static;
sysutils.encoding.patch (2,752 bytes)   

Do-wan Kim

2019-02-27 11:18

reporter   ~0114485

Last edited: 2019-02-27 11:50

View 2 revisions

To Avoid this encoding problem, first bytes -> string conversion result type is must be 'RawByteString'.

(Edit) Add missing parts.

Do-wan Kim

2019-02-27 11:25

reporter  

sysutils-32961.patch (4,251 bytes)   
Index: rtl/objpas/sysutils/sysencoding.inc
===================================================================
--- rtl/objpas/sysutils/sysencoding.inc	(revision 41505)
+++ rtl/objpas/sysutils/sysencoding.inc	(working copy)
@@ -497,11 +497,11 @@
     Move(S[1], Result[0], Length(S));
 end;
 
-function TMBCSEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer): string;
+function TMBCSEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString;
 begin
   SetString(Result, Pointer(Bytes), ByteCount);
-  SetCodePage(RawByteString(Result), GetCodePage, False);
-  SetCodePage(RawByteString(Result), DefaultSystemCodePage, True);
+  SetCodePage(Result, GetCodePage, False);
+  SetCodePage(Result, DefaultSystemCodePage, True);
 end;
 
 function TMBCSEncoding.GetMaxByteCount(CharCount: Integer): Integer;
@@ -638,9 +638,9 @@
 end;
 
 function TUnicodeEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer
-  ): string;
+  ): RawByteString;
 begin
-  widestringmanager.Unicode2AnsiMoveProc(PUnicodeChar(Bytes), RawByteString(Result), DefaultSystemCodePage, ByteCount div SizeOf(UnicodeChar));
+  widestringmanager.Unicode2AnsiMoveProc(PUnicodeChar(Bytes), Result, DefaultSystemCodePage, ByteCount div SizeOf(UnicodeChar));
 end;
 
 function TUnicodeEncoding.GetMaxByteCount(CharCount: Integer): Integer;
@@ -722,7 +722,7 @@
 end;
 
 function TBigEndianUnicodeEncoding.GetAnsiString(Bytes: PByte;
-  ByteCount: Integer): string;
+  ByteCount: Integer): RawByteString;
 var
   B: TBytes;
 begin
Index: rtl/objpas/sysutils/sysencodingh.inc
===================================================================
--- rtl/objpas/sysutils/sysencodingh.inc	(revision 41505)
+++ rtl/objpas/sysutils/sysencodingh.inc	(working copy)
@@ -50,7 +50,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; virtual; abstract;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; virtual; abstract;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; virtual; abstract;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; virtual; abstract;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; virtual; abstract;
     function GetCodePage: Cardinal; virtual; abstract;
     function GetEncodingName: UnicodeString; virtual; abstract;
   public
@@ -117,7 +117,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
@@ -160,7 +160,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
@@ -179,7 +179,7 @@
     function GetBytes(Chars: PUnicodeChar; CharCount: Integer; Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
sysutils-32961.patch (4,251 bytes)   

Do-wan Kim

2019-02-27 11:49

reporter  

sysutils-32961-2.patch (5,235 bytes)   
Index: rtl/objpas/sysutils/sysencoding.inc
===================================================================
--- rtl/objpas/sysutils/sysencoding.inc	(revision 41505)
+++ rtl/objpas/sysutils/sysencoding.inc	(working copy)
@@ -48,10 +48,10 @@
 end;
 
 function TEncoding.GetAnsiString(const Bytes: TBytes; ByteIndex,
-  ByteCount: Integer): string;
+  ByteCount: Integer): RawByteString;
 begin
   Result := GetAnsiString(Pointer(@Bytes[ByteIndex]), ByteCount);
-  SetCodePage(RawByteString(Result), DefaultSystemCodePage, False);
+  SetCodePage(Result, DefaultSystemCodePage, False);
 end;
 
 class function TEncoding.GetASCII: TEncoding;
@@ -497,11 +497,11 @@
     Move(S[1], Result[0], Length(S));
 end;
 
-function TMBCSEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer): string;
+function TMBCSEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString;
 begin
   SetString(Result, Pointer(Bytes), ByteCount);
-  SetCodePage(RawByteString(Result), GetCodePage, False);
-  SetCodePage(RawByteString(Result), DefaultSystemCodePage, True);
+  SetCodePage(Result, GetCodePage, False);
+  SetCodePage(Result, DefaultSystemCodePage, True);
 end;
 
 function TMBCSEncoding.GetMaxByteCount(CharCount: Integer): Integer;
@@ -638,9 +638,9 @@
 end;
 
 function TUnicodeEncoding.GetAnsiString(Bytes: PByte; ByteCount: Integer
-  ): string;
+  ): RawByteString;
 begin
-  widestringmanager.Unicode2AnsiMoveProc(PUnicodeChar(Bytes), RawByteString(Result), DefaultSystemCodePage, ByteCount div SizeOf(UnicodeChar));
+  widestringmanager.Unicode2AnsiMoveProc(PUnicodeChar(Bytes), Result, DefaultSystemCodePage, ByteCount div SizeOf(UnicodeChar));
 end;
 
 function TUnicodeEncoding.GetMaxByteCount(CharCount: Integer): Integer;
@@ -722,7 +722,7 @@
 end;
 
 function TBigEndianUnicodeEncoding.GetAnsiString(Bytes: PByte;
-  ByteCount: Integer): string;
+  ByteCount: Integer): RawByteString;
 var
   B: TBytes;
 begin
Index: rtl/objpas/sysutils/sysencodingh.inc
===================================================================
--- rtl/objpas/sysutils/sysencodingh.inc	(revision 41505)
+++ rtl/objpas/sysutils/sysencodingh.inc	(working copy)
@@ -50,7 +50,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; virtual; abstract;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; virtual; abstract;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; virtual; abstract;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; virtual; abstract;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; virtual; abstract;
     function GetCodePage: Cardinal; virtual; abstract;
     function GetEncodingName: UnicodeString; virtual; abstract;
   public
@@ -89,7 +89,7 @@
     function GetAnsiBytes(const S: string): TBytes; overload;
     function GetAnsiBytes(const S: string; CharIndex, CharCount: Integer): TBytes; overload;
     function GetAnsiString(const Bytes: TBytes): string; overload;
-    function GetAnsiString(const Bytes: TBytes; ByteIndex, ByteCount: Integer): string; overload;
+    function GetAnsiString(const Bytes: TBytes; ByteIndex, ByteCount: Integer): RawByteString; overload;
 
     property CodePage: Cardinal read GetCodePage;
     property EncodingName: UnicodeString read GetEncodingName;
@@ -117,7 +117,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
@@ -160,7 +160,7 @@
     function GetCharCount(Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
@@ -179,7 +179,7 @@
     function GetBytes(Chars: PUnicodeChar; CharCount: Integer; Bytes: PByte; ByteCount: Integer): Integer; overload; override;
     function GetChars(Bytes: PByte; ByteCount: Integer; Chars: PUnicodeChar; CharCount: Integer): Integer; overload; override;
     function GetAnsiBytes(Chars: PChar; CharCount: Integer): TBytes; override;
-    function GetAnsiString(Bytes: PByte; ByteCount: Integer): string; override;
+    function GetAnsiString(Bytes: PByte; ByteCount: Integer): RawByteString; override;
     function GetCodePage: Cardinal; override;
     function GetEncodingName: UnicodeString; override;
   public
sysutils-32961-2.patch (5,235 bytes)   

Juha Manninen

2019-02-27 16:21

developer   ~0114489

There should be a report for FPC project with the sysutils patches attached.
They will probably be ignored here.

Michael Van Canneyt

2019-02-27 16:47

administrator   ~0114490

No it is OK, I have my eye on it and will look at the sysutils part.

Ondrej Pokorny

2019-04-01 16:18

developer   ~0115161

I just came across this problem within a pure FPC program - no Lazarus/LCL included.

As Benito already pointed out, if you change the DefaultSystemCodePage, the TEncoding.Default is not updated.

I would say the change should not be restricted only to the SetMultiByteConversionCodePage call but also to the change of DefaultSystemCodePage. This can be achieved with FPC properties:

property DefaultSystemCodePage: TSystemCodePage read GetDefaultSystemCodePage write SetDefaultSystemCodePage;

This issue should be moved to FPC because the LCL cannot do anything about it. The fix must happen in FPC's RTL.


Do-wan Kim: > To Avoid this encoding problem, first bytes -> string conversion result type is must be 'RawByteString'.

Can you elaborate on this idea? I don't get why you should get encoding problems if the result type is just "string".

Michael Van Canneyt

2019-04-01 16:56

administrator   ~0115162

Well, we're going to solve this differently.

The property will not help, since we cannot release the encoding, it could be in use in a stringlist or so.

The solution will be 2fold:
- TEncoding.Default will not be updated. It will be ANSICP, this is delphi compatible.
- We'll introduce TEncoding.SystemEncoding that will detect changes in the system code page, and in the getter will create a new encoding when needed.

Ondrej Pokorny

2019-04-01 17:01

developer  

DefaultCodePageMismatch-01.patch (1,314 bytes)   
Index: rtl/objpas/sysconst.pp
===================================================================
--- rtl/objpas/sysconst.pp	(revision 41802)
+++ rtl/objpas/sysconst.pp	(working copy)
@@ -137,6 +137,7 @@
   SCharacterIndexOutOfBounds    = 'character index out of bounds [%d]';
   SInvalidDestinationArray      = 'invalid destination array';
   SInvalidDestinationIndex      = 'invalid destination index [%d]';
+  SDefaultCodePageMismatch      = 'Default codepage mismatch [%d]<>[%d]. DefaultSystemCodePage was changed after TEncoding.ANSI was initialized.';
 
   SNoArrayMatch                 = 'Can''t match any allowed value at pattern position %d, string position %d.';
   SNoCharMatch                  = 'Mismatch char "%s" <> "%s" at pattern position %d, string position %d.';
Index: rtl/objpas/sysutils/sysencoding.inc
===================================================================
--- rtl/objpas/sysutils/sysencoding.inc	(revision 41802)
+++ rtl/objpas/sysutils/sysencoding.inc	(working copy)
@@ -29,6 +29,8 @@
   end;
 {$endif}
   Result := FStandardEncodings[seAnsi];
+  if Result.CodePage<>DefaultSystemCodePage then
+    raise EEncodingError.CreateFmt(SDefaultCodePageMismatch, [Result.CodePage, DefaultSystemCodePage]);
 end;
 
 function TEncoding.GetAnsiBytes(const S: string): TBytes;

Ondrej Pokorny

2019-04-01 17:02

developer   ~0115163

I have a simple solution to this problem: do not allow to change DefaultSystemCodePage after TEncoding.ANSI has been initialized.

For this you must ensure that TEncoding.ANSI is not initialized in any FPC-critical units - see 0035305.

Once you apply 0035305 you can raise an exception if TEncoding.ANSI doesn't match the DefaultSystemCodePage - with this exception the programmer is informed that he must change his code and change DefaultSystemCodePage before TEncoding.ANSI is needed. See DefaultCodePageMismatch-01.patch.

Of course it doesn't solve the problem if you want to change DefaultSystemCodePage several times in your program's life-cycle.

Ondrej Pokorny

2019-04-01 17:13

developer   ~0115164

Michael, sorry I missed you wrote an answer in between. Yes, this is possible as well.

Ondrej Pokorny

2019-04-13 12:01

developer  

SystemEncoding-01.patch (3,995 bytes)   
Index: rtl/objpas/sysutils/sysencoding.inc
===================================================================
--- rtl/objpas/sysutils/sysencoding.inc	(revision 41864)
+++ rtl/objpas/sysutils/sysencoding.inc	(working copy)
@@ -22,7 +22,13 @@
   try
 {$endif}
     if not Assigned(FStandardEncodings[seAnsi]) then
-      FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
+    begin
+      // DefaultSystemCodePage can be set to non-ANSI
+      if Assigned(widestringmanager.GetStandardCodePageProc) then
+        FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
+      else
+        FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
+    end;
 {$ifdef FPC_HAS_FEATURE_THREADING}
   finally
     LeaveCriticalSection(FLock);
@@ -91,6 +97,40 @@
   Result := GetANSI;
 end;
 
+class function TEncoding.GetSystemEncoding: TEncoding;
+var
+  I: Integer;
+begin
+{$ifdef FPC_HAS_FEATURE_THREADING}
+  EnterCriticalSection(FLock);
+  try
+{$endif}
+    for I := Low(FSystemEncodings) to High(FSystemEncodings) do
+    begin
+      if FSystemEncodings[I].CodePage=DefaultSystemCodePage then
+      begin
+        Result := FSystemEncodings[I];
+        if I<>Low(FSystemEncodings) then // exchange with first position to find it faster the next time
+        begin
+          FSystemEncodings[I] := FSystemEncodings[Low(FSystemEncodings)];
+          FSystemEncodings[Low(FSystemEncodings)] := Result;
+        end;
+        Exit;
+      end;
+    end;
+    // not found - create new encoding at first position
+    Result := TMBCSEncoding.Create(DefaultSystemCodePage);
+    SetLength(FSystemEncodings, Length(FSystemEncodings)+1);
+    if High(FSystemEncodings)<>Low(FSystemEncodings) then
+      FSystemEncodings[High(FSystemEncodings)] := FSystemEncodings[Low(FSystemEncodings)];
+    FSystemEncodings[Low(FSystemEncodings)] := Result;
+{$ifdef FPC_HAS_FEATURE_THREADING}
+  finally
+    LeaveCriticalSection(FLock);
+  end;
+{$endif}
+end;
+
 class function TEncoding.GetUnicode: TEncoding;
 begin
 {$ifdef FPC_HAS_FEATURE_THREADING}
@@ -142,6 +182,7 @@
 class procedure TEncoding.FreeEncodings;
 var
   E: TStandardEncoding;
+  I: Integer;
 begin
 {$ifdef FPC_HAS_FEATURE_THREADING}
   EnterCriticalSection(FLock);
@@ -149,6 +190,9 @@
 {$endif}
     for E := Low(FStandardEncodings) to High(FStandardEncodings) do
       FreeAndNil(FStandardEncodings[E]);
+    for I := Low(FSystemEncodings) to High(FSystemEncodings) do
+      FSystemEncodings[I].Free;
+    SetLength(FSystemEncodings, 0);
 {$ifdef FPC_HAS_FEATURE_THREADING}
   finally
     LeaveCriticalSection(FLock);
Index: rtl/objpas/sysutils/sysencodingh.inc
===================================================================
--- rtl/objpas/sysutils/sysencodingh.inc	(revision 41864)
+++ rtl/objpas/sysutils/sysencodingh.inc	(working copy)
@@ -30,6 +30,7 @@
         seUTF8);
     var
       FStandardEncodings: array[TStandardEncoding] of TEncoding; static;
+      FSystemEncodings: array of TEncoding; static;
     Class Var
       FLock : TRTLCriticalSection;
     class function GetANSI: TEncoding; static;
@@ -36,6 +37,7 @@
     class function GetASCII: TEncoding; static;
     class function GetBigEndianUnicode: TEncoding; static;
     class function GetDefault: TEncoding; static;
+    class function GetSystemEncoding: TEncoding; static;
     class function GetUnicode: TEncoding; static;
     class function GetUTF7: TEncoding; static;
     class function GetUTF8: TEncoding; static;
@@ -99,6 +101,7 @@
     class property ASCII: TEncoding read GetASCII;
     class property BigEndianUnicode: TEncoding read GetBigEndianUnicode;
     class property Default: TEncoding read GetDefault;
+    class property SystemEncoding: TEncoding read GetSystemEncoding;
     class property Unicode: TEncoding read GetUnicode;
     class property UTF7: TEncoding read GetUTF7;
     class property UTF8: TEncoding read GetUTF8;
SystemEncoding-01.patch (3,995 bytes)   

Ondrej Pokorny

2019-04-13 12:02

developer  

Codepages.lpr (1,334 bytes)

Ondrej Pokorny

2019-04-13 12:06

developer   ~0115470

Last edited: 2019-04-13 12:09

View 3 revisions

Michael, I made a patch (SystemEncoding-01.patch) according to your instructions:

> TEncoding.Default will not be updated. It will be ANSICP, this is delphi compatible.

This is done in TEncoding.GetANSI with widestringmanager.GetStandardCodePageProc(scpAnsi). It works on Windows, I didn't test other OS. I don't know if there is a universal way to get the ANSI codepage without widestringmanager. Feel free to improve it if there is a better approach.


> We'll introduce TEncoding.SystemEncoding that will detect changes in the system code page, and in the getter will create a new encoding when needed.

Done.

---

A test program is in Codepages.lpr.

Ondrej Pokorny

2019-04-13 12:22

developer   ~0115471

Strictly speaking it will not solve this bug report because TEncoding.Default will still be ANSI.

The bug reporter wants that TEncoding.Default will be DefaultSystemCodePage (UTF-8 on Lazarus). But this is not Delphi-compatible, as you mentioned. (In Delphi TEncoding.Default is still ANSI and not UTF-16 that corresponds with the native string encoding.)

IMO, Michael your solution approach (Default=ANSI) is correct and the programmer has to cope with the incompatibilities in TStringStream that were introduced when encoding support was added to TStringStream.

So this bug report should be resolved as "won't fix" rather than "fixed".

---

Maybe you could rename "TEncoding.SystemEncoding" to "TEncoding.DefaultSystemEncoding" or "TEncoding.DefaultSystem" to make the connection with "DefaultSystemCodePage" more obvious.

Michael Van Canneyt

2019-04-28 11:03

administrator  

Codepages-Fixed.lpr (1,304 bytes)   
program Codepages;

uses SysUtils;

var
  ACP,
StartDefaultSystemCodePage: TSystemCodePage;
begin
  StartDefaultSystemCodePage := DefaultSystemCodePage;
  ACP:=TEncoding.ANSI.CodePage;

  // test creating ANSI when DefaultSystemCodePage is set to non-ANSI
  if DefaultSystemCodePage<>CP_UTF8 then
    DefaultSystemCodePage := CP_UTF8
  else
    DefaultSystemCodePage := 1250;
  if TEncoding.ANSI.CodePage<>ACP then
    Halt(1);

  // test default
  DefaultSystemCodePage := StartDefaultSystemCodePage;
  if TEncoding.ANSI.CodePage<>TEncoding.SystemEncoding.CodePage then
    Halt(2);

  // try utf-8
  DefaultSystemCodePage := CP_UTF8;
  if TEncoding.ANSI.CodePage<>ACP then
    Halt(3);
  if TEncoding.SystemEncoding.CodePage<>DefaultSystemCodePage then
    Halt(4);

  // try a different single-byte encoding
  if StartDefaultSystemCodePage=1250 then
    DefaultSystemCodePage := 1251
  else
    DefaultSystemCodePage := 1250;

  if TEncoding.ANSI.CodePage<>ACP then
    Halt(5);
  if TEncoding.SystemEncoding.CodePage<>DefaultSystemCodePage then
    Halt(6);

  // try start again
  DefaultSystemCodePage := StartDefaultSystemCodePage;
  if TEncoding.SystemEncoding.CodePage<>DefaultSystemCodePage then
    Halt(7);

  WriteLn('Everything OK');
end.

Codepages-Fixed.lpr (1,304 bytes)   

Michael Van Canneyt

2019-04-28 11:07

administrator   ~0115866

I applied the patch of Ondrej implementing SystemEncoding. (many thanks!)
Attached a fixed version of the test project (it now also runs on linux).

As Ondrej notes, there is no way around handling correct codepage settings when creating string streams.
The programmer will always have to specify an encoding. This should now be possible.

Issue History

Date Modified Username Field Change
2018-01-06 22:29 Kevin Morris New Issue
2018-01-07 12:40 Thaddy de Koning Note Added: 0105431
2018-01-07 12:41 Thaddy de Koning Note Edited: 0105431 View Revisions
2018-01-07 12:42 Thaddy de Koning Note Edited: 0105431 View Revisions
2018-01-07 12:42 Thaddy de Koning Note Edited: 0105431 View Revisions
2018-01-07 13:24 Thaddy de Koning Note Added: 0105432
2018-01-07 21:27 Marco van de Voort Note Added: 0105465
2018-01-09 10:46 Marco van de Voort Note Edited: 0105465 View Revisions
2018-01-09 16:53 Thaddy de Koning Note Added: 0105542
2018-01-12 08:29 Michael Van Canneyt Category Compiler => RTL
2018-01-12 08:30 Michael Van Canneyt Assigned To => Michael Van Canneyt
2018-01-12 08:30 Michael Van Canneyt Status new => assigned
2018-01-13 18:34 Michael Van Canneyt Assigned To Michael Van Canneyt => Mattias Gaertner
2018-01-13 18:41 Michael Van Canneyt Note Added: 0105748
2018-01-13 18:46 Michael Van Canneyt Project FPC => Lazarus
2018-01-13 18:47 Michael Van Canneyt LazTarget => -
2018-01-13 18:47 Michael Van Canneyt Category RTL => LazUtils
2018-01-13 18:47 Michael Van Canneyt Product Version 3.1.1 =>
2018-01-13 18:55 Michael Van Canneyt Note Edited: 0105748 View Revisions
2018-06-01 23:01 engkin File Added: FreeEncodings.patch
2018-06-01 23:04 engkin Note Added: 0108638
2018-09-27 09:46 Juha Manninen Relationship added related to 0033681
2018-09-27 09:47 Juha Manninen LazTarget - => 1.10
2018-09-27 09:47 Juha Manninen Target Version => 1.10
2019-01-19 13:21 Benito van der Zander Note Added: 0113476
2019-01-20 02:04 engkin Note Added: 0113507
2019-02-27 00:56 Benito van der Zander File Added: sysutils.encoding.patch
2019-02-27 11:18 Do-wan Kim Note Added: 0114485
2019-02-27 11:25 Do-wan Kim File Added: sysutils-32961.patch
2019-02-27 11:49 Do-wan Kim File Added: sysutils-32961-2.patch
2019-02-27 11:50 Do-wan Kim Note Edited: 0114485 View Revisions
2019-02-27 16:21 Juha Manninen Note Added: 0114489
2019-02-27 16:47 Michael Van Canneyt Note Added: 0114490
2019-03-02 11:44 Michael Van Canneyt Relationship added related to 0034856
2019-04-01 16:18 Ondrej Pokorny Note Added: 0115161
2019-04-01 16:56 Michael Van Canneyt Note Added: 0115162
2019-04-01 17:01 Ondrej Pokorny File Added: DefaultCodePageMismatch-01.patch
2019-04-01 17:02 Ondrej Pokorny Note Added: 0115163
2019-04-01 17:13 Ondrej Pokorny Note Added: 0115164
2019-04-13 12:01 Ondrej Pokorny File Added: SystemEncoding-01.patch
2019-04-13 12:02 Ondrej Pokorny File Added: Codepages.lpr
2019-04-13 12:06 Ondrej Pokorny Note Added: 0115470
2019-04-13 12:08 Ondrej Pokorny Note Edited: 0115470 View Revisions
2019-04-13 12:09 Ondrej Pokorny Note Edited: 0115470 View Revisions
2019-04-13 12:22 Ondrej Pokorny Note Added: 0115471
2019-04-28 11:03 Michael Van Canneyt File Added: Codepages-Fixed.lpr
2019-04-28 11:07 Michael Van Canneyt Status assigned => resolved
2019-04-28 11:07 Michael Van Canneyt Resolution open => fixed
2019-04-28 11:07 Michael Van Canneyt Note Added: 0115866