ID: 0022501
Project: FPC
Category: RTL
View Status: public
Date Submitted: 2012-07-24 13:42
Last Update: 2012-07-25 15:49
Reporter: ocean
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: confirmed
Resolution: open
Platform: Win32
OS:
OS Version:
Product Version: 2.7.1
Product Build:
Target Version:
Fixed in Version:
Summary: 0022501: Stringreplace corrupts my strings
Description: This code shows the message "Bug!". I have simplified it from a larger program.

Tested with 1.1.-37904-fpc-2.7.1-20120710-win32.

The problem is NOT present in 0.9.30.4 / 2.6.0.

procedure test(s: string);
var i, j: integer;
begin
 i:=pos('€', s); //1
 s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
 j:=pos('€', s); //0
 if i<>j then showmessage('Bug!');
end;

procedure TForm1.Button1Click(Sender: TObject);
var o: olevariant;
begin
 o:=UTF8Decode('€');
 test(UTF8Encode(o));
end;
Tags: No tags attached.
FPCOldBugId:
Fixed in Revision:
Attached Files: test.pas (357 bytes) 2012-07-24 21:42

- Relationships

-  Notes
(0061200)
Michael Van Canneyt (administrator)
2012-07-24 14:12

I cannot reproduce the problem.

The following program:

uses sysutils;

procedure test(s: string);
var i, j: integer;
begin
 i:=pos('€', s); //1
 s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
 j:=pos('€', s); //0
  if i<>j then writeln('Bug!');
end;

begin
  test('€');
end.

does not print anything.
I suspect the bug is not in stringreplace, but in some conversion routine.

Tested with Free Pascal Compiler version 2.7.1 [2012/07/16] for x86_64,
with ansistring and shortstring.
(0061201)
Marco van de Voort (manager)
2012-07-24 14:48

Maybe the widestring (?) returned by utf8decode goes wrong when it is assigned to the variant.

But first we must have a minimal (preferably, without lazarus dependencies) COMPILABLE program to test this.
(0061206)
ocean (reporter)
2012-07-24 16:08

test('€'); works here too.

It is not related to the variant; even this gives me the problem. Remove the "stringreplace" line and it works.

var s: string;
begin
 s:=UTF8Encode(UTF8Decode('€'));
 s:=stringreplace(s, 'a', 'b', []);
 if pos('€', s)=0 then showmessage('bug');
end;

Tested on 3 computers:

WinXP + some older 2.7.1 = Bug
Win7 + version in the original post = Bug
Win7 + 0.9.30/2.6.0 = Works
(0061207)
Michael Van Canneyt (administrator)
2012-07-24 16:20

Tested your new program on Linux, it still works.

So it is either Windows-related, or you are not using the SysUtils version of StringReplace; maybe Lazarus has a Lazarus-specific version of this routine.

Please also check whether you are using the FPC variant of the UTF8 routines or the Lazarus-specific ones.

I tested without lazarus.
(0061208)
Bart Broersma (reporter)
2012-07-24 18:25

This program tests both scenarios (with and without olevariants).

program sr;

{$mode objfpc}{$H+}

uses SysUtils, ActiveX;


function testsr(s: string): Boolean;
var i, j: integer;
begin
 i:=pos('€', s);
 if i <> 1 then writeln('i = ',i,' [should be 1]');
 s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
 j:=pos('€', s);
 if j <> 1 then writeln('j = ',j,' [should be 1]');
 Result := (i=1) and (j=1);
end;

procedure testo;
var o: olevariant;
  b: Boolean;
begin
 o:=UTF8Decode('€');
 b := testsr(UTF8Encode(o));
 write('Test with olevariant = ');
 if b then writeln('Ok') else writeln('Fail');
end;

procedure tests;
var b: Boolean;
begin
 b := testsr(UTF8Encode(UTF8Decode('€')));
 write('Test with string = ');
 if b then writeln('Ok') else writeln('Fail');
end;

begin
  testo;
  tests;
end.

C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\StringReplace>fpc sr.lpr
Free Pascal Compiler version 2.6.0 [2011/12/25] for i386
Copyright (c) 1993-2011 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling sr.lpr
Linking sr.exe
41 lines compiled, 0.3 sec , 150128 bytes code, 22668 bytes data

C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\StringReplace>sr
Test with olevariant = Ok
Test with string = Ok

Tested on Win7
(0061210)
ocean (reporter)
2012-07-24 19:28

I tested your program, Win7/32, locale finnish.

You used 2.6.0, that works here too.

C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr.lpr
Linking c:\bug\sr.exe
39 lines compiled, 3.8 sec, 149984 bytes code, 24972 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr
i = 0 [should be 1]
j = 0 [should be 1]
Test with olevariant = Fail
i = 0 [should be 1]
j = 0 [should be 1]
Test with string = Fail
(0061211)
ocean (reporter)
2012-07-24 19:47

Try 2 (the encoding broke when I copied it from here to my text editor, sorry):

C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr2.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr2.lpr
Linking c:\bug\sr2.exe
40 lines compiled, 3.6 sec, 150000 bytes code, 24972 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr2
j = 0 [should be 1]
Test with olevariant = Fail
j = 0 [should be 1]
Test with string = Fail
(0061212)
Bart Broersma (reporter)
2012-07-24 20:56
edited on: 2012-07-24 21:07

This should fix any copy/paste problems, and it will tell us the content of S if things go wrong.

program sr;

{$mode objfpc}{$H+}

uses SysUtils, ActiveX;

const
  EUR = Chr(226) + Chr(130) + Chr(172); //UTF-8 sequence for the Euro symbol


function testsr(s: string): Boolean;
var x, i, j: integer;
begin
 i:=pos(EUR, s);
 if i <> 1 then
 begin
   writeln('Before StringReplace: i = ',i,' [should be 1]');
   write('EUR = ');
   for x := 1 to length(EUR) do write('#',Ord(EUR[x]),' '); writeln;
   write('S = ');
   for x := 1 to length(s) do write('#',Ord(s[x]),' '); writeln;
 end;
 s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
 j:=pos(EUR, s);
 if j <> 1 then
 begin
   writeln('Before StringReplace: j = ',j,' [should be 1]');
   write('EUR = ');
   for x := 1 to length(EUR) do write('#',Ord(EUR[x]),' '); writeln;
   write('S = ');
   for x := 1 to length(s) do write('#',Ord(s[x]),' '); writeln;
 end;
 Result := (i=1) and (j=1);
end;

procedure testo;
var o: olevariant;
  b: Boolean;
begin
 o:=UTF8Decode(EUR);
 b := testsr(UTF8Encode(o));
 write('Test with olevariant = ');
 if b then writeln('Ok') else writeln('Fail');
end;

procedure tests;
var b: Boolean;
begin
 b := testsr(UTF8Encode(UTF8Decode(EUR)));
 write('Test with string = ');
 if b then writeln('Ok') else writeln('Fail');
end;

begin
  testo;
  tests;
end.


Please re-test.
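The byte-dump loops in the program above can be factored into one function. This is a minimal sketch, assuming FPC 3.0 or later semantics: BytesToStr is a hypothetical helper, not an RTL routine, and its argument is declared as RawByteString so that passing a string in does not trigger an implicit code page conversion that would alter the very bytes being inspected.

```pascal
program dumpbytes;

{$mode objfpc}{$H+}

uses SysUtils;

{ Hypothetical helper: formats every byte of S as '#nnn', separated by spaces.
  RawByteString keeps the caller's bytes unchanged (no code page conversion). }
function BytesToStr(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + '#' + IntToStr(Ord(S[i]));
  end;
end;

const
  EUR = #226#130#172; // UTF-8 sequence for the euro sign

begin
  WriteLn('EUR = ', BytesToStr(EUR)); // prints: EUR = #226 #130 #172
end.
```

In testsr, each pair of loops would then become a single `writeln('S = ', BytesToStr(s));`.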

(0061213)
ocean (reporter)
2012-07-24 21:15

C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr.lpr
Linking c:\bug\sr.exe
56 lines compiled, 3.7 sec, 150704 bytes code, 25036 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr
Before StringReplace: j = 0 [should be 1]
EUR = #226 #130 #172
S = #128
Test with olevariant = Fail
Before StringReplace: j = 0 [should be 1]
EUR = #226 #130 #172
S = #128
Test with string = Fail
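The S = #128 result is not random corruption. #226 #130 #172 is the UTF-8 encoding of U+20AC (EURO SIGN), and #128 ($80) is how Windows-1252 encodes that same character, so the string appears to have been converted from UTF-8 to the ANSI code page. The sketch below decodes the 3-byte sequence by hand to show the mapping; DecodeUtf8Triple is a hypothetical illustration that does no validation.

```pascal
program eurobytes;

{$mode objfpc}{$H+}

uses SysUtils;

{ Hypothetical illustration: decodes one 3-byte UTF-8 sequence
  (1110xxxx 10xxxxxx 10xxxxxx) into a code point. No validation is done. }
function DecodeUtf8Triple(b0, b1, b2: Byte): Cardinal;
begin
  Result := (Cardinal(b0 and $0F) shl 12) or
            (Cardinal(b1 and $3F) shl 6) or
            Cardinal(b2 and $3F);
end;

const
  EuroCodePoint = $20AC; // U+20AC EURO SIGN
  EuroCp1252    = 128;   // Windows-1252 encodes U+20AC as the single byte $80

begin
  // UTF-8 form seen before StringReplace: #226 #130 #172
  WriteLn('#226 #130 #172 -> U+', IntToHex(Int64(DecodeUtf8Triple(226, 130, 172)), 4));
  // Windows-1252 form seen after StringReplace: #128
  WriteLn('#', EuroCp1252, ' in Windows-1252 -> U+', IntToHex(Int64(EuroCodePoint), 4));
end.
```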
(0061214)
Ludo Brands (developer)
2012-07-24 21:41
edited on: 2012-07-24 22:32

I can reproduce the problem with fpc 2.7.1 21643. Program attached. Replace var o: variant; with var o: olevariant; and the program will crash with an EAccessViolation. But that is probably yet another bug.
At i:=pos('€', s); s contains 0xe2 0x82 0xac 0x00 which is utf8 for €. At j:=pos('€', s); s contains 0x80 0x00.
Stringreplace is from sysstr.inc in the rtl.
I did some debugging, and the problem is in fpc_AnsiStr_Concat. StringReplace does Result:=Result+RemS; with Result being ''. fpc_AnsiStr_Concat does
  if (Pointer(DestS)=nil) then
    DestCP:=cp

and forces the codepage to the system code page which is 1252 on windows.

Edit: Olevariant crash reported as 0022504

(0061215)
Bart Broersma (reporter)
2012-07-24 22:43

Does the program (my latest version from the notes above) still fail if you leave out the utf8encode(utf8decode()) part?
It should, if your analysis is correct (which would mean that many string functions suffer from this?).
(0061229)
Ludo Brands (developer)
2012-07-25 08:16

@ Bart. Are you kidding? "we must have a minimal (preferably, without lazarus dependencies) COMPILABLE program to test this". Is 357 bytes not minimal enough? Why do we need to test your program when you have a program that reproduces the problem.

A shorter version, in case you have problems downloading the attached file, that also shows the extent of the problem (note the 'a second bug' line):

program test2;

var s,snul:string;
  s1:string;
begin
  snul:='';
  s:=utf8encode(utf8decode('€'));
  s1:=snul+s;
  if s[1]<>s1[1] then
    begin
    writeln('bug');
    if s=s1 then
      writeln('a second bug');
    end;
end.

The s=s1 comparison succeeds because both sides are converted to the system code page before they are compared. That is why I check s[1]<>s1[1] first. s1 does not contain a €.
The problem is linked to utf8Encode returning a RawByteString. To illustrate the problem with concatenating RawByteString and string try this one (even shorter and using your preferred € encoding):

program test2;

var sr:RawByteString;
  snul,s1:string;
begin
  snul:='';
  sr:=Chr(226) + Chr(130) + Chr(172);
  s1:=snul+sr;
  if sr<>s1 then
    writeln('bug');
end.

In this test s1 is empty which is even more surprising. Using an intermediate string for the concat doesn't change the result:

program test2;

var sr:RawByteString;
  snul,s,s1:string;
begin
  snul:='';
  sr:=Chr(226) + Chr(130) + Chr(172);
  s:=sr;
  s1:=snul+s;
  if sr<>s1 then
    writeln('bug');
end.

s contains € but s1 is '' again.
(0061235)
Sergei Gorelkin (developer)
2012-07-25 10:10

Make no mistake, in 2.7.1 the generic 'string' type and all functions using it as arguments or result types are generally unusable in Windows if you deal with data outside of ANSI codepage. It will drop data outside the ANSI codepage. Hacks like utf8encode, which actually permit the result string to contain data in a different encoding than its declaration, soften some corners but don't help in general.

In D2009+ this model is usable because 'string' aliases to UnicodeString. It is also usable in Linux because of its default utf8 codepage.

To make it work properly, each and every procedure/function working with strings, excluding a few basic ones like concat/insert/delete (these are already fixed), has to be changed to accept RawByteString arguments, examine actual encoding of arguments and do necessary conversions. While in principle this can be done for RTL/packages codebase, I remain very pessimistic about possibility for people to write their own string functions using the new model.
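A minimal sketch of what "accept RawByteString arguments, examine actual encoding of arguments and do necessary conversions" could look like for a Pos-like function. It assumes the FPC 3.0+ string model, using StringCodePage and SetCodePage from the System unit; CodePageAwarePos and AsCodePage are hypothetical names used only for illustration.

```pascal
program cpawarepos;

{$mode objfpc}{$H+}

{ Hypothetical helper: returns a copy of S relabelled with code page CP.
  The bytes are not converted; only the code page field changes. }
function AsCodePage(const S: RawByteString; CP: TSystemCodePage): RawByteString;
begin
  Result := S;
  SetCodePage(Result, CP, False);
end;

{ Hypothetical code-page-aware Pos: examines the code page of both
  arguments and, if they differ, converts Sub to the code page of S first,
  so the returned index is a byte position in S. }
function CodePageAwarePos(const Sub, S: RawByteString): SizeInt;
var
  tmp: RawByteString;
begin
  if (Length(Sub) = 0) or (Length(S) = 0) then
    Exit(0);
  if StringCodePage(Sub) = StringCodePage(S) then
    Exit(Pos(Sub, S)); // same encoding: a plain byte search is enough
  tmp := Sub;
  // Real conversion; characters that S's code page cannot represent may be lost.
  SetCodePage(tmp, StringCodePage(S), True);
  Result := Pos(tmp, S);
end;

var
  euro, line: RawByteString;
begin
  euro := AsCodePage(#226#130#172, CP_UTF8);
  line := AsCodePage('Price: '#226#130#172'5', CP_UTF8);
  WriteLn(CodePageAwarePos(euro, line)); // prints 8
end.
```

Converting Sub rather than S keeps the result meaningful as a byte index into S; comparing both as UnicodeString would instead return a UTF-16 index.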
(0061238)
Ludo Brands (developer)
2012-07-25 11:47

> excluding a few basic ones like concat/insert/delete (these are already fixed)

except that, in the case of concat with mixed encodings, the result goes through a Unicode-to-system-encoding conversion even if one side is empty. Concat with an empty string should be a no-op.

BTW The empty result string in the last tests is caused by Win32Ansi2UnicodeMove of a RawByteString that calls MultiByteToWideChar with cp=$ffff (CP_NONE) which is not supported.
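Ludo's point that concatenating with an empty string should be a no-op can be sketched at the user level. ConcatKeepCP is a hypothetical helper, not the RTL's fpc_AnsiStr_Concat; it assumes FPC 3.0+ RawByteString semantics, where assigning one RawByteString to another keeps both the bytes and the code page.

```pascal
program concatempty;

{$mode objfpc}{$H+}

{ Hypothetical helper: an empty operand is a no-op, so the other operand's
  bytes and code page are returned unchanged. }
function ConcatKeepCP(const A, B: RawByteString): RawByteString;
begin
  if Length(A) = 0 then
    Exit(B);
  if Length(B) = 0 then
    Exit(A);
  // Two non-empty operands use the normal RTL concatenation,
  // which may convert if their code pages differ.
  Result := A + B;
end;

var
  euro, joined: RawByteString;
begin
  euro := #226#130#172;
  SetCodePage(euro, CP_UTF8, False); // label the bytes as UTF-8, no conversion
  joined := ConcatKeepCP('', euro);
  // Shows the byte count and code page that survived the concatenation.
  WriteLn(Length(joined), ' bytes, code page ', StringCodePage(joined));
end.
```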
(0061239)
theo (reporter)
2012-07-25 12:26

I don't know if it is exactly the same, but I had some problems with StringReplace on Lazarus 1.1 r FPC 2.7.1 x86_64-win64-win32/win64 a few minutes ago.

Everything works as I expect when using this code:

initialization
widestringmanager.Unicode2AnsiMoveProc:=@DefaultUnicode2AnsiMove;
widestringmanager.Ansi2UnicodeMoveProc:=@DefaultAnsi2UnicodeMove;
(0061240)
Sergei Gorelkin (developer)
2012-07-25 13:07

Here you use RawByteString to hack a utf8 sequence into an ansistring which cannot contain one. This isn't going to work (and RawByteString is intended to be used only along with manual manipulations on codepage field -- but since nothing prevents its use as generic type, similar issues are going to pop up endlessly, I guess).

fpc_ansistr_concat routine assumes codepage of an empty argument equal to destination codepage, so it won't perform conversion if codepage of another argument matches destination too. If another argument has different codepage then conversion will still occur (and it will go through unicode because direct conversion from one non-unicode codepage to another is not possible). I believe this behavior is correct.

fpc_ansistr_concat_multi does not follow this pattern though, it likely needs a fix.
(0061241)
Bart Broersma (reporter)
2012-07-25 14:11

@Sergei:
> Make no mistake, in 2.7.1 the generic 'string' type and all functions using it
> as arguments or result types are generally unusable in Windows if you deal with
> data outside of ANSI codepage.

Maybe I misunderstand this, but it seems to me that this would make all fpc string operations almost useless in Lazarus on Windows, since LCL is UTF-8 by nature and Windows isn't.

@Ludo: forgive my ignorance. What I mainly wanted to know was:

program test2;

var s,snul:string;
  s1:string;
begin
  snul:='';
  s:=utf8encode(utf8decode('€')); //<<-- ****
  s1:=snul+s;
  if s[1]<>s1[1] then
    begin
    writeln('bug');
    if s=s1 then
      writeln('a second bug');
    end;
end.


If you leave out the utf8encode(utf8decode('€')) part and simply use s := '€', does this bug still happen? IOW, what part does utf8encode(utf8decode()) play in all of this?
(Also, my test offered some insight into the content of the strings.)
(0061243)
Sergei Gorelkin (developer)
2012-07-25 14:58

@Bart:
> Maybe I misunderstand this, but it seems to me that this would make all fpc
> string operations almost useless in Lazarus on Windows, since LCL is UTF-8 by
> nature and Windows isn't.

You understand correctly. Either LCL will have to change all 'string' declarations into 'utf8string' to work properly in utf8 encoding, or FPC will have to introduce means to redefine 'string' type globally as utf8string.
(0061245)
Jonas Maebe (manager)
2012-07-25 15:49

Or the LCL (or the LCL program) could set DefaultSystemCodePage to CP_UTF8.
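A minimal sketch of this suggestion applied to the reporter's test case. It assumes FPC 3.0 or later, where DefaultSystemCodePage is a writable variable in the System unit and CP_UTF8 is predefined; setting it as the first statement of the program, before any strings are created, is an assumption made here to keep already-existing strings out of the picture. EuroSurvivesStringReplace is a hypothetical name.

```pascal
program utf8default;

{$mode objfpc}{$H+}

uses SysUtils;

const
  EUR = #226#130#172; // UTF-8 sequence for the euro sign

{ Hypothetical name: the reporter's check, rewritten as a function. }
function EuroSurvivesStringReplace: Boolean;
var
  s: string;
begin
  s := UTF8Encode(UTF8Decode(EUR));
  s := StringReplace(s, 'anything', 'anything2', []); // should change nothing
  Result := Pos(EUR, s) = 1;
end;

begin
  // Make the code page of plain 'string' (CP_ACP) resolve to UTF-8.
  DefaultSystemCodePage := CP_UTF8;
  if EuroSurvivesStringReplace then
    WriteLn('Ok')
  else
    WriteLn('Fail');
end.
```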

- Issue History
Date Modified Username Field Change
2012-07-24 13:42 ocean New Issue
2012-07-24 14:12 Michael Van Canneyt Note Added: 0061200
2012-07-24 14:48 Marco van de Voort Note Added: 0061201
2012-07-24 15:36 Marco van de Voort Status new => feedback
2012-07-24 16:08 ocean Note Added: 0061206
2012-07-24 16:20 Michael Van Canneyt Note Added: 0061207
2012-07-24 18:25 Bart Broersma Note Added: 0061208
2012-07-24 19:28 ocean Note Added: 0061210
2012-07-24 19:47 ocean Note Added: 0061211
2012-07-24 20:56 Bart Broersma Note Added: 0061212
2012-07-24 20:57 Bart Broersma Note Edited: 0061212
2012-07-24 20:58 Bart Broersma Note Edited: 0061212
2012-07-24 21:05 Bart Broersma Note Edited: 0061212
2012-07-24 21:06 Bart Broersma Note Edited: 0061212
2012-07-24 21:07 Bart Broersma Note Edited: 0061212
2012-07-24 21:15 ocean Note Added: 0061213
2012-07-24 21:41 Ludo Brands Note Added: 0061214
2012-07-24 21:42 Ludo Brands File Added: test.pas
2012-07-24 22:32 Ludo Brands Note Edited: 0061214
2012-07-24 22:43 Bart Broersma Note Added: 0061215
2012-07-25 08:16 Ludo Brands Note Added: 0061229
2012-07-25 10:10 Sergei Gorelkin Note Added: 0061235
2012-07-25 11:47 Ludo Brands Note Added: 0061238
2012-07-25 12:26 theo Note Added: 0061239
2012-07-25 13:07 Sergei Gorelkin Note Added: 0061240
2012-07-25 14:11 Bart Broersma Note Added: 0061241
2012-07-25 14:58 Sergei Gorelkin Note Added: 0061243
2012-07-25 15:27 Marco van de Voort Status feedback => confirmed
2012-07-25 15:49 Jonas Maebe Note Added: 0061245


