| View Issue Details | ||||||||||||
| ID | Project | Category | View Status | Date Submitted | Last Update | ||||||||
| 0022501 | FPC | RTL | public | 2012-07-24 13:42 | 2012-07-25 15:49 | ||||||||
| Reporter | ocean | ||||||||||||
| Assigned To | |||||||||||||
| Priority | normal | Severity | minor | Reproducibility | always | ||||||||
| Status | confirmed | Resolution | open | ||||||||||
| Platform | Win32 | OS | OS Version | ||||||||||
| Product Version | 2.7.1 | Product Build | |||||||||||
| Target Version | Fixed in Version | ||||||||||||
| Summary | 0022501: Stringreplace corrupts my strings | ||||||||||||
| Description | This code shows the message "Bug!". I have simplified it from a larger program.

Tested with 1.1.-37904-fpc-2.7.1-20120710-win32. The problem is NOT present in 0.9.30.4 / 2.6.0.

procedure test(s: string);
var
  i, j: integer;
begin
  i:=pos('€', s); //1
  s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
  j:=pos('€', s); //0
  if i<>j then
    showmessage('Bug!');
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  o: olevariant;
begin
  o:=UTF8Decode('€');
  test(UTF8Encode(o));
end; | ||||||||||||
| Tags | No tags attached. | ||||||||||||
| FPCOldBugId | |||||||||||||
| Fixed in Revision | |||||||||||||
| Attached Files | |||||||||||||
Notes |
|
|
(0061200) Michael Van Canneyt (administrator) 2012-07-24 14:12 |
I cannot reproduce the problem. The following program:

uses sysutils;

procedure test(s: string);
var
  i, j: integer;
begin
  i:=pos('€', s); //1
  s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
  j:=pos('€', s); //0
  if i<>j then
    writeln('Bug!');
end;

begin
  test('€');
end.

does not print anything. I suspect the bug is not in stringreplace, but in some conversion routine. Tested with Free Pascal Compiler version 2.7.1 [2012/07/16] for x86_64, with ansistring and shortstring. |
|
(0061201) Marco van de Voort (manager) 2012-07-24 14:48 |
Maybe the widestring (?) return of utf8decode assigned to the variant goes wrong. But first we must have a minimal (preferably, without lazarus dependencies) COMPILABLE program to test this. |
|
(0061206) ocean (reporter) 2012-07-24 16:08 |
test('€'); works here too. It was not related to variant; even this gives me the problem. Remove the "stringreplace" line, and it works.

var
  s: string;
begin
  s:=UTF8Encode(UTF8Decode('€'));
  s:=stringreplace(s, 'a', 'b', []);
  if pos('€', s)=0 then
    showmessage('bug');
end;

Tested on 3 computers:
WinXP + some older 2.7.1 = Bug
Win7 + version in post0 = Bug
Win7 + 0.9.30/2.6.0 = Works |
|
(0061207) Michael Van Canneyt (administrator) 2012-07-24 16:20 |
Tested your new program on Linux, it still works. So it is either Windows-related, or you are not using the sysutils version of stringreplace; maybe Lazarus has a Lazarus-specific version of this routine. Please also check whether you are using the FPC variant of the UTF8 routines or the Lazarus-specific ones. I tested without Lazarus. |
|
(0061208) Bart Broersma (reporter) 2012-07-24 18:25 |
This program tests both scenarios (with and without olevariants).

program sr;
{$mode objfpc}{$H+}

uses
  SysUtils, ActiveX;

function testsr(s: string): Boolean;
var
  i, j: integer;
begin
  i:=pos('€', s);
  if i <> 1 then
    writeln('i = ',i,' [should be 1]');
  s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
  j:=pos('€', s);
  if j <> 1 then
    writeln('j = ',j,' [should be 1]');
  Result := (i=1) and (j=1);
end;

procedure testo;
var
  o: olevariant;
  b: Boolean;
begin
  o:=UTF8Decode('€');
  b := testsr(UTF8Encode(o));
  write('Test with olevariant = ');
  if b then writeln('Ok') else writeln('Fail');
end;

procedure tests;
var
  b: Boolean;
begin
  b := testsr(UTF8Encode(UTF8Decode('€')));
  write('Test with string = ');
  if b then writeln('Ok') else writeln('Fail');
end;

begin
  testo;
  tests;
end.

C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\StringReplace>fpc sr.lpr
Free Pascal Compiler version 2.6.0 [2011/12/25] for i386
Copyright (c) 1993-2011 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling sr.lpr
Linking sr.exe
41 lines compiled, 0.3 sec , 150128 bytes code, 22668 bytes data

C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\StringReplace>sr
Test with olevariant = Ok
Test with string = Ok

Tested on Win7 |
|
(0061210) ocean (reporter) 2012-07-24 19:28 |
I tested your program on Win7/32, Finnish locale. You used 2.6.0; that works here too.

C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr.lpr
Linking c:\bug\sr.exe
39 lines compiled, 3.8 sec, 149984 bytes code, 24972 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr
i = 0 [should be 1]
j = 0 [should be 1]
Test with olevariant = Fail
i = 0 [should be 1]
j = 0 [should be 1]
Test with string = Fail |
|
(0061211) ocean (reporter) 2012-07-24 19:47 |
Try 2 (the encoding broke when I copied the program from here into my text editor, sorry).

C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr2.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr2.lpr
Linking c:\bug\sr2.exe
40 lines compiled, 3.6 sec, 150000 bytes code, 24972 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr2
j = 0 [should be 1]
Test with olevariant = Fail
j = 0 [should be 1]
Test with string = Fail |
|
(0061212) Bart Broersma (reporter) 2012-07-24 20:56 edited on: 2012-07-24 21:07 |
This should fix any copy/paste problems, and it will tell us the content of S if things go wrong.

program sr;
{$mode objfpc}{$H+}

uses
  SysUtils, ActiveX;

const
  EUR = Chr(226) + Chr(130) + Chr(172); //UTF-8 sequence for the Euro symbol

function testsr(s: string): Boolean;
var
  x, i, j: integer;
begin
  i:=pos(EUR, s);
  if i <> 1 then
  begin
    writeln('Before StringReplace: i = ',i,' [should be 1]');
    write('EUR = ');
    for x := 1 to length(EUR) do write('#',Ord(EUR[x]),' ');
    writeln;
    write('S = ');
    for x := 1 to length(s) do write('#',Ord(s[x]),' ');
    writeln;
  end;
  s:=stringreplace(s, 'anything', 'anything2', []); // do nothing
  j:=pos(EUR, s);
  if j <> 1 then
  begin
    writeln('Before StringReplace: j = ',j,' [should be 1]');
    write('EUR = ');
    for x := 1 to length(EUR) do write('#',Ord(EUR[x]),' ');
    writeln;
    write('S = ');
    for x := 1 to length(s) do write('#',Ord(s[x]),' ');
    writeln;
  end;
  Result := (i=1) and (j=1);
end;

procedure testo;
var
  o: olevariant;
  b: Boolean;
begin
  o:=UTF8Decode(EUR);
  b := testsr(UTF8Encode(o));
  write('Test with olevariant = ');
  if b then writeln('Ok') else writeln('Fail');
end;

procedure tests;
var
  b: Boolean;
begin
  b := testsr(UTF8Encode(UTF8Decode(EUR)));
  write('Test with string = ');
  if b then writeln('Ok') else writeln('Fail');
end;

begin
  testo;
  tests;
end.

Please re-test. |
|
(0061213) ocean (reporter) 2012-07-24 21:15 |
C:\lazarus\fpc\2.7.1\bin\i386-win32>fpc c:/bug/sr.lpr
Free Pascal Compiler version 2.7.1 [2012/05/23] for i386
Copyright (c) 1993-2012 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling c:\bug\sr.lpr
Linking c:\bug\sr.exe
56 lines compiled, 3.7 sec, 150704 bytes code, 25036 bytes data

C:\lazarus\fpc\2.7.1\bin\i386-win32>c:/bug/sr
Before StringReplace: j = 0 [should be 1]
EUR = #226 #130 #172
S = #128
Test with olevariant = Fail
Before StringReplace: j = 0 [should be 1]
EUR = #226 #130 #172
S = #128
Test with string = Fail |
|
(0061214) Ludo Brands (developer) 2012-07-24 21:41 edited on: 2012-07-24 22:32 |
I can reproduce the problem with fpc 2.7.1 21643. Program attached.

Replace var o: variant; with var o: olevariant; and the program will crash with an EAccessViolation. But that is probably yet another bug.

At i:=pos('€', s); s contains 0xe2 0x82 0xac 0x00, which is UTF-8 for €. At j:=pos('€', s); s contains 0x80 0x00.

Stringreplace is from sysstr.inc in the RTL. I did some debugging and the problem is in fpc_AnsiStr_Concat. StringReplace does a Result:=Result+RemS; with Result being ''. fpc_AnsiStr_Concat does if (Pointer(DestS)=nil) then DestCP:=cp and forces the code page to the system code page, which is 1252 on Windows.

Edit: Olevariant crash reported as 0022504 |
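The byte-level change described in this note can be observed with a small diagnostic. The following is only a sketch: the program name `cpdump` and the helper `Dump` are hypothetical, and it assumes a compiler with code-page-aware strings (2.7.1 or later) that provides `StringCodePage` and `DefaultSystemCodePage` in the System unit. What it prints depends on the compiler revision and on the system code page, so no output is given here.

```pascal
program cpdump;
{$mode objfpc}{$H+}

{ Hypothetical helper, not part of the RTL: prints the dynamic code page
  and the raw bytes of S. A RawByteString parameter accepts any
  AnsiString without converting it. }
procedure Dump(const Caption: string; const S: RawByteString);
var
  i: Integer;
begin
  Write(Caption, ': codepage=', StringCodePage(S), ' bytes=');
  for i := 1 to Length(S) do
    Write('#', Ord(S[i]), ' ');
  WriteLn;
end;

var
  empty, s, s1: string;
begin
  WriteLn('DefaultSystemCodePage=', DefaultSystemCodePage);
  empty := '';
  s := UTF8Encode(UTF8Decode(#226#130#172)); // UTF-8 bytes of the Euro sign
  Dump('s', s);
  s1 := empty + s; // same kind of concatenation as Result:=Result+RemS
  Dump('s1', s1);
end.
```

If the two dumps differ, the concatenation converted the data; if they match, the revision under test no longer shows the behaviour described in this note.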
|
(0061215) Bart Broersma (reporter) 2012-07-24 22:43 |
Does the program (my latest version from the notes above) still fail if you leave out the UTF8Encode(UTF8Decode()) part? It should if your analysis is correct (which would mean that many string functions suffer from this?). |
|
(0061229) Ludo Brands (developer) 2012-07-25 08:16 |
@ Bart. Are you kidding? "we must have a minimal (preferably, without lazarus dependencies) COMPILABLE program to test this". Is 357 bytes not minimal enough? Why do we need to test your program when you have a program that reproduces the problem?

Here is a shorter version, in case you have problems downloading the attached file, that shows the extent of the problem (see the "a second bug" line):

program test2;
var
  s,snul:string;
  s1:string;
begin
  snul:='';
  s:=utf8encode(utf8decode('€'));
  s1:=snul+s;
  if s[1]<>s1[1] then
  begin
    writeln('bug');
    if s=s1 then
      writeln('a second bug');
  end;
end.

The s=s1 comparison succeeds because both sides are converted to the system code page before the compare. That is why I did the s[1]<>s1[1] check first. s1 does not contain a €. The problem is linked to utf8Encode returning a RawByteString.

To illustrate the problem with concatenating RawByteString and string, try this one (even shorter and using your preferred € encoding):

program test2;
var
  sr:RawByteString;
  snul,s1:string;
begin
  snul:='';
  sr:=Chr(226) + Chr(130) + Chr(172);
  s1:=snul+sr;
  if sr<>s1 then
    writeln('bug');
end.

In this test s1 is empty, which is even more surprising. Using an intermediate string for the concat doesn't change the result:

program test2;
var
  sr:RawByteString;
  snul,s,s1:string;
begin
  snul:='';
  sr:=Chr(226) + Chr(130) + Chr(172);
  s:=sr;
  s1:=snul+s;
  if sr<>s1 then
    writeln('bug');
end.

s contains € but s1 is '' again. |
|
(0061235) Sergei Gorelkin (developer) 2012-07-25 10:10 |
Make no mistake, in 2.7.1 the generic 'string' type and all functions using it as arguments or result types are generally unusable in Windows if you deal with data outside of the ANSI codepage. It will drop data outside the ANSI codepage. Hacks like utf8encode, which actually permit the result string to contain data in a different encoding than its declaration, soften some corners but don't help in general.

In D2009+ this model is usable because 'string' aliases to UnicodeString. It is also usable in Linux because of its default utf8 codepage.

To make it work properly, each and every procedure/function working with strings, excluding a few basic ones like concat/insert/delete (these are already fixed), has to be changed to accept RawByteString arguments, examine the actual encoding of its arguments and do the necessary conversions. While in principle this can be done for the RTL/packages codebase, I remain very pessimistic about the possibility for people to write their own string functions using the new model. |
|
(0061238) Ludo Brands (developer) 2012-07-25 11:47 |
> excluding a few basic ones like concat/insert/delete (these are already fixed)

except that in the case of concat with mixed encodings the result goes through a unicode to system encoding conversion even if one side is empty. Concat with an empty string should be a no-op.

BTW, the empty result string in the last tests is caused by Win32Ansi2UnicodeMove of a RawByteString, which calls MultiByteToWideChar with cp=$ffff (CP_NONE), which is not supported. |
|
(0061239) theo (reporter) 2012-07-25 12:26 |
Don't know if it is exactly the same, but I had some problems with StringReplace on Lazarus 1.1 r FPC 2.7.1 x86_64-win64-win32/win64 some minutes ago. Everything works as I expect when using this code:

initialization
  widestringmanager.Unicode2AnsiMoveProc:=@DefaultUnicode2AnsiMove;
  widestringmanager.Ansi2UnicodeMoveProc:=@DefaultAnsi2UnicodeMove; |
|
(0061240) Sergei Gorelkin (developer) 2012-07-25 13:07 |
Here you use RawByteString to hack a utf8 sequence into an ansistring which cannot contain one. This isn't going to work (and RawByteString is intended to be used only along with manual manipulations on codepage field -- but since nothing prevents its use as generic type, similar issues are going to pop up endlessly, I guess). fpc_ansistr_concat routine assumes codepage of an empty argument equal to destination codepage, so it won't perform conversion if codepage of another argument matches destination too. If another argument has different codepage then conversion will still occur (and it will go through unicode because direct conversion from one non-unicode codepage to another is not possible). I believe this behavior is correct. fpc_ansistr_concat_multi does not follow this pattern though, it likely needs a fix. |
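As an illustration of what "manual manipulations on the codepage field" could look like, here is a minimal sketch. The program name `rawcp` is hypothetical, and it assumes an RTL that provides `SetCodePage`, `StringCodePage` and `CP_UTF8` in the System unit (as code-page-aware FPC versions do). It is not taken from the RTL sources.

```pascal
program rawcp;
{$mode objfpc}{$H+}

var
  raw: RawByteString;
  u: UnicodeString;
begin
  raw := #226#130#172;               // UTF-8 bytes of the Euro sign
  // Label the existing bytes as UTF-8; Convert=False leaves the data untouched.
  SetCodePage(raw, CP_UTF8, False);
  WriteLn('codepage=', StringCodePage(raw), ' length=', Length(raw));
  // Decode explicitly instead of relying on an implicit conversion.
  u := UTF8Decode(raw);
  WriteLn('UTF-16 code units=', Length(u), ' first=$', HexStr(Ord(u[1]), 4));
end.
```

The point of the sketch is that the code page label is set deliberately before any routine gets a chance to guess it from the declared type.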
|
(0061241) Bart Broersma (reporter) 2012-07-25 14:11 |
@Sergei:
> Make no mistake, in 2.7.1 the generic 'string' type and all functions using it
> as arguments or result types are generally unusable in Windows if you deal with
> data outside of ANSI codepage.

Maybe I misunderstand this, but it seems to me that this would make all fpc string operations almost useless in Lazarus on Windows, since LCL is UTF-8 by nature and Windows isn't.

@Ludo: forgive my ignorance. What I mainly wanted to know was:

program test2;
var
  s,snul:string;
  s1:string;
begin
  snul:='';
  s:=utf8encode(utf8decode('€')); //<<-- ****
  s1:=snul+s;
  if s[1]<>s1[1] then
  begin
    writeln('bug');
    if s=s1 then
      writeln('a second bug');
  end;
end.

If you leave out the utf8encode(utf8decode('€')) part and simply use s := '€', does this bug still happen? IOW, what part does the utf8encode(utf8decode()) play in all of this? (Also, my test offered some insight into the content of the strings.) |
|
(0061243) Sergei Gorelkin (developer) 2012-07-25 14:58 |
@Bart:
> Maybe I misunderstand this, but it seems to me that this would make all fpc
> string operations almost useless in Lazarus on Windows, since LCL is UTF-8 by
> nature and Windows isn't.

You understand correctly. Either LCL will have to change all 'string' declarations into 'utf8string' to work properly in utf8 encoding, or FPC will have to introduce means to redefine the 'string' type globally as utf8string. |
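A minimal sketch of the first option, applied to the concatenation test from the earlier notes: every variable is declared as `UTF8String` instead of `string`. The program name `utf8decl` is hypothetical, and whether the bytes survive on a given 2.7.1 revision is an assumption to be checked by running it, not a confirmed result.

```pascal
program utf8decl;
{$mode objfpc}{$H+}

var
  empty, s, s1: UTF8String;   // declared code page is UTF-8, not the system one
begin
  empty := '';
  s := UTF8Encode(UTF8Decode(#226#130#172)); // UTF-8 bytes of the Euro sign
  s1 := empty + s;
  // Compare length and first byte rather than using s=s1, because a
  // string comparison may convert both sides before comparing.
  if (Length(s1) = Length(s)) and (s1[1] = s[1]) then
    WriteLn('bytes preserved')
  else
    WriteLn('bytes changed');
end.
```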
|
(0061245) Jonas Maebe (manager) 2012-07-25 15:49 |
Or the LCL (or the LCL program) could set DefaultSystemCodePage to CP_UTF8. |
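A sketch of this suggestion, assuming `DefaultSystemCodePage` is an assignable variable in the System unit and that the assignment happens before any string data is processed. The program name `utf8default` is hypothetical, and the effect on a particular 2.7.1 revision has not been verified here.

```pascal
program utf8default;
{$mode objfpc}{$H+}

var
  empty, s, s1: string;
begin
  // Treat plain 'string' data as UTF-8 instead of the Windows ANSI code page.
  DefaultSystemCodePage := CP_UTF8;

  empty := '';
  s := UTF8Encode(UTF8Decode(#226#130#172)); // UTF-8 bytes of the Euro sign
  s1 := empty + s;
  // Byte-level check, so that no conversion can hide a difference.
  if (Length(s1) = Length(s)) and (s1[1] = s[1]) then
    WriteLn('bytes preserved')
  else
    WriteLn('bytes changed');
end.
```

Compared with changing every declaration to utf8string, this keeps existing 'string' declarations as they are, at the cost of a process-wide setting that all code in the program has to agree with.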
Issue History |
|||
| Date Modified | Username | Field | Change |
| 2012-07-24 13:42 | ocean | New Issue | |
| 2012-07-24 14:12 | Michael Van Canneyt | Note Added: 0061200 | |
| 2012-07-24 14:48 | Marco van de Voort | Note Added: 0061201 | |
| 2012-07-24 15:36 | Marco van de Voort | Status | new => feedback |
| 2012-07-24 16:08 | ocean | Note Added: 0061206 | |
| 2012-07-24 16:20 | Michael Van Canneyt | Note Added: 0061207 | |
| 2012-07-24 18:25 | Bart Broersma | Note Added: 0061208 | |
| 2012-07-24 19:28 | ocean | Note Added: 0061210 | |
| 2012-07-24 19:47 | ocean | Note Added: 0061211 | |
| 2012-07-24 20:56 | Bart Broersma | Note Added: 0061212 | |
| 2012-07-24 20:57 | Bart Broersma | Note Edited: 0061212 | |
| 2012-07-24 20:58 | Bart Broersma | Note Edited: 0061212 | |
| 2012-07-24 21:05 | Bart Broersma | Note Edited: 0061212 | |
| 2012-07-24 21:06 | Bart Broersma | Note Edited: 0061212 | |
| 2012-07-24 21:07 | Bart Broersma | Note Edited: 0061212 | |
| 2012-07-24 21:15 | ocean | Note Added: 0061213 | |
| 2012-07-24 21:41 | Ludo Brands | Note Added: 0061214 | |
| 2012-07-24 21:42 | Ludo Brands | File Added: test.pas | |
| 2012-07-24 22:32 | Ludo Brands | Note Edited: 0061214 | |
| 2012-07-24 22:43 | Bart Broersma | Note Added: 0061215 | |
| 2012-07-25 08:16 | Ludo Brands | Note Added: 0061229 | |
| 2012-07-25 10:10 | Sergei Gorelkin | Note Added: 0061235 | |
| 2012-07-25 11:47 | Ludo Brands | Note Added: 0061238 | |
| 2012-07-25 12:26 | theo | Note Added: 0061239 | |
| 2012-07-25 13:07 | Sergei Gorelkin | Note Added: 0061240 | |
| 2012-07-25 14:11 | Bart Broersma | Note Added: 0061241 | |
| 2012-07-25 14:58 | Sergei Gorelkin | Note Added: 0061243 | |
| 2012-07-25 15:27 | Marco van de Voort | Status | feedback => confirmed |
| 2012-07-25 15:49 | Jonas Maebe | Note Added: 0061245 | |