View Issue Details

ID: 0029732
Project: FPC
Category: RTL
View Status: public
Last Update: 2017-07-23 10:48
Reporter: Holger Klemt
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Lazarus 1.6
OS: Windows
OS Version: 8/8.1/10
Target Version: 3.2.0
Fixed in Version: 3.1.1
Summary: 0029732: Stringreplace does strange things with german ß Umlaut
Description: SS is interpreted as ß by error
Steps To Reproduce:
procedure TForm1.Button1Click(Sender: TObject);
var s:String;
begin
  s:='ADRESSE';
  s:=Stringreplace(s,'ß','SZ',[rfReplaceAll,rfIgnoreCase]);
  ShowMessage(s);
  //result is ADRESZE instead of ADRESSE, double S seems to be internally interpreted

end;
Tags: No tags attached.

Activities

Maxim Ganetsky

2016-02-26 13:24

reporter   ~0090342

StringReplace is a function from SysUtils unit and thus belongs to FPC RTL.

Please attach small compilable example not depending on LCL.

Moving to FPC project.

Thaddy de Koning

2016-02-26 14:05

reporter   ~0090344

Last edited: 2016-02-26 14:25


As long as the codepage is correct everything goes to plan.
See:
program ringels;
{$APPTYPE CONSOLE}
{$CODEPAGE cp1252}
uses
  SysUtils;

var s:String;
begin
  s:='ADRESSE';
  s:=Stringreplace(s,'SS', 'ß',[rfReplaceAll,rfIgnoreCase]);
  writeln(s);
  s:=Stringreplace(s,'ß','SS',[rfReplaceAll,rfIgnoreCase]);
  writeln(s);
  s:=Stringreplace(s,'SS', 'ß',[rfReplaceAll,rfIgnoreCase]);
  writeln(s);
  readln;
end.

Note that with {$codepage UTF8} you will get a malformed UTF8 string error on the first ringel s. This seems to indicate that StringReplace is not Unicode-aware yet.

Thaddy de Koning

2016-02-26 14:40

reporter   ~0090347

Last edited: 2016-02-26 14:42


It may be that just SRUpperCase in sysstr.inc needs to be changed.
This is macro'd in sysstr.inc to AnsiUpperCase, which looks wrong for UTF8.

Bart Broersma

2016-02-26 17:06

reporter   ~0090352

Last edited: 2016-02-26 17:07


When using rfIgnoreCase the OldPattern parameter gets UpperCased first.
This will result in Ringel-S -> SS, and then that is replaced with NewPattern (SZ).
It's the fault of the Germans, they should have put an uppercase Ringel-S in their alphabet ;-)
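
The mechanism described in this note can be reproduced in isolation. The following is only a minimal sketch, not the RTL code: `ExpandingUpper` is a hypothetical stand-in for an uppercase routine that expands the sharp s to 'SS' (it is not SRUpperCase or AnsiUpperCase), and the sharp s is written as its UTF-8 bytes so that the encoding of the source file does not matter.

```pascal
program upper_sketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

const
  // UTF-8 bytes of the sharp s, spelled out so the source file encoding is irrelevant
  SharpS = #$C3#$9F;

// Hypothetical stand-in for an uppercase routine that expands sharp s to 'SS'.
// It only mimics the effect described in the note above.
function ExpandingUpper(const S: string): string;
begin
  Result := StringReplace(UpperCase(S), SharpS, 'SS', [rfReplaceAll]);
end;

var
  Src, Pattern: string;
begin
  Src := 'ADRESSE';
  // After uppercasing, the search pattern is no longer the sharp s but 'SS'
  Pattern := ExpandingUpper(SharpS);
  // An exact replace with the uppercased pattern now hits the plain 'SS' in the source
  WriteLn(StringReplace(ExpandingUpper(Src), Pattern, 'SZ', [rfReplaceAll]));
  // prints ADRESZE
end.
```

Under these assumptions the output matches what the reporter sees, because the comparison happens on the uppercased forms of both the source and the pattern.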

Marco van de Voort

2016-02-26 20:24

manager   ~0090362

The $codepage only changes literals. The functions will still be defined using ansistring(0)?

user268

2016-02-27 07:59

  ~0090371

It depends on which FPC is used. With anything other than the latest FPC from SVN, StringReplace is ANSI-aware only.

In the latest FPC from SVN, Jarto's ANSI-aware function was modified a bit by Michael Van Canneyt to be UTF-8 aware. This could be unfortunate, as I noted here:
http://bugs.freepascal.org/view.php?id=26864#c89802

Holger Klemt

2016-02-27 10:20

reporter  

p1.zip (128,802 bytes)

Holger Klemt

2016-02-27 10:20

reporter  

p2.zip (1,685 bytes)

Holger Klemt

2016-02-27 10:22

reporter   ~0090378

The attached p1.zip is a small Lazarus example which reproduces the error (at least on my German Windows 10 computer and a customer's German Win8 computer). The attached console example p2 does not show the error.

Thaddy de Koning

2016-02-27 10:48

reporter   ~0090380

Last edited: 2016-02-27 11:05


@Marco: as per my example, change it to codepage UTF8. That's how I found out StringReplace is not UTF8 compatible, but only regarding the use of AnsiUpperCase, afaict. I don't know about the ANSI string type, but the codepage is indeed just for string literals, as per the documentation.

[edit] must be something different after all. I followed up on Bart's comment and removed ignorecase.

It won't even compile for codepage UTF8. It bails out on the literal.

using this example:
program ringels;
{$APPTYPE CONSOLE}
{$CODEPAGE UTF8}
uses SysUtils;
var
  s,r:AnsiString;
begin
  s:='ADRESSE';
  r := 'ß'; // error here already!
  s:=Stringreplace(s,'SS', r,[rfReplaceAll]);
  writeln(s);
  readln;
end.

Output with codepage utf8 is:
C:\Kol64>fpc ringels.dpr
ringels.dpr(9,8) Error: Malformed UTF-8 string
ringels.dpr(9,8) Fatal: String exceeds line
Fatal: Compilation aborted
Error: c:\pp\bin\x86_64-win64\ppcx64.exe returned an error exitcode

And that should not be the case. The ringel s is plainly not even accepted as a literal.

Bart Broersma

2016-02-27 13:22

reporter   ~0090392

Example program from Thaddy in note (0090380), with {$codepage UTF8} enabled:

C:\Users\Bart\LazarusProjecten\ConsoleProjecten>fpc test.pas
Free Pascal Compiler version 3.0.0 [2015/11/16] for i386
Copyright (c) 1993-2015 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling test.pas
Linking test.exe
18 lines compiled, 2.6 sec, 65200 bytes code, 4068 bytes data

C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
ADREßE

@Thaddy: did you make sure the actual encoding of the sourcefile is indeed UTF8?

Bart Broersma

2016-02-27 13:23

reporter   ~0090393

Related to 0022501?

Thaddy de Koning

2016-02-27 15:15

reporter   ~0090402

Last edited: 2016-02-27 15:25


@Bart: If I explicitly set the sourcecode to UTF8, yes, then it compiles.
Sorry, still use D7 as my editor ;) After loading into geany and saving as UTF8 it compiled and worked.

Holger Klemt

2016-02-27 18:18

reporter   ~0090412

btw: regarding sourcecode: the Lazarus example with data entered at runtime in a TEdit control had the same problem

Holger Klemt

2016-03-01 12:05

reporter   ~0090517

One more thing, I don't know if the reason is the same.
I have used this button click event for years and it worked up to laz14x without a problem:

var
  path: string;
begin
  path:=Edit1.Text;
  WinExec(pchar('explorer.exe '+Utf8ToAnsi(path)+'"'),sw_show);
end;

When Edit1.Text contains a path, for example 'C:\path\', explorer.exe opens this path, also when German special characters like ä, ö, ü are used in the path name, for example 'C:\pathäöü\' (the path must of course exist).

Since laz16, it works only without special characters. When Edit1.Text contains 'C:\pathäöü\', explorer.exe does not recognize a valid path parameter, even though the path exists. It looks like some changes in Utf8ToAnsi are responsible for this error.

I created several workaround examples based on TProcess etc. and all have the same error, but only in laz16; older versions always work fine.

Bart Broersma

2016-03-01 16:53

reporter   ~0090528

Use the Utf8ToWinCP function in LazUtf8, or use TProcessUtf8().
WinExec() is legacy code and IIRC deprecated by MS.
ShellExecuteW with explicit conversions of String to WideString should work OK too.
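
The ShellExecuteW route could look roughly like this. It is a Windows-only sketch under stated assumptions: the path is a hypothetical placeholder (in the LCL case it would come from Edit1.Text, which is UTF-8 encoded), the conversion to UTF-16 is done explicitly with UTF8Decode, and error handling is left out.

```pascal
program open_folder_sketch;
{$mode objfpc}{$H+}
uses
  Windows, ShellApi;

var
  Utf8Path: UTF8String;
  Exe, Params: UnicodeString;
begin
  // Hypothetical path; replace with the UTF-8 text taken from the edit control
  Utf8Path := 'C:\path\';
  // Explicit UTF-8 -> UTF-16 conversion, so no ANSI code page is involved
  Params := UTF8Decode(Utf8Path);
  Exe := 'explorer.exe';
  // Wide-character API: characters outside the ANSI code page are passed through unchanged
  ShellExecuteW(0, nil, PWideChar(Exe), PWideChar(Params), nil, SW_SHOW);
end.
```

Going through the W API avoids the lossy conversion to the ANSI code page that WinExec requires.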

Holger Klemt

2016-03-01 19:36

reporter   ~0090534

Bart, thanks, you made my day. UTF8Process was the solution for the WinExec replacement and will also help me in some other areas. However, the SS to SZ conversion problem first mentioned is still alive.

Michael Van Canneyt

2017-07-23 10:48

administrator   ~0101861

Tested with latest trunk, original issue seems resolved, most likely due to StringReplace changes.

Possibly the fix made it to 3.0.2 or 3.0.4

Issue History

Date Modified Username Field Change
2016-02-26 12:25 Holger Klemt New Issue
2016-02-26 13:24 Maxim Ganetsky Note Added: 0090342
2016-02-26 13:24 Maxim Ganetsky Project Packages => FPC
2016-02-26 14:05 Thaddy de Koning Note Added: 0090344
2016-02-26 14:10 Thaddy de Koning Note Edited: 0090344
2016-02-26 14:11 Thaddy de Koning Note Edited: 0090344
2016-02-26 14:25 Thaddy de Koning Note Edited: 0090344
2016-02-26 14:40 Thaddy de Koning Note Added: 0090347
2016-02-26 14:40 Thaddy de Koning Note Edited: 0090347
2016-02-26 14:42 Thaddy de Koning Note Edited: 0090347
2016-02-26 17:06 Bart Broersma Note Added: 0090352
2016-02-26 17:07 Bart Broersma Note Edited: 0090352
2016-02-26 20:24 Marco van de Voort Note Added: 0090362
2016-02-27 07:59 user268 Note Added: 0090371
2016-02-27 10:20 Holger Klemt File Added: p1.zip
2016-02-27 10:20 Holger Klemt File Added: p2.zip
2016-02-27 10:22 Holger Klemt Note Added: 0090378
2016-02-27 10:30 Michael Van Canneyt Category LCL => RTL
2016-02-27 10:30 Michael Van Canneyt Product Version 1.6 =>
2016-02-27 10:48 Thaddy de Koning Note Added: 0090380
2016-02-27 10:51 Thaddy de Koning Note Edited: 0090380
2016-02-27 11:05 Thaddy de Koning Note Edited: 0090380
2016-02-27 13:22 Bart Broersma Note Added: 0090392
2016-02-27 13:23 Bart Broersma Note Added: 0090393
2016-02-27 15:15 Thaddy de Koning Note Added: 0090402
2016-02-27 15:25 Thaddy de Koning Note Edited: 0090402
2016-02-27 18:18 Holger Klemt Note Added: 0090412
2016-03-01 12:05 Holger Klemt Note Added: 0090517
2016-03-01 16:53 Bart Broersma Note Added: 0090528
2016-03-01 19:36 Holger Klemt Note Added: 0090534
2017-07-23 10:35 Michael Van Canneyt Assigned To => Michael Van Canneyt
2017-07-23 10:35 Michael Van Canneyt Status new => assigned
2017-07-23 10:48 Michael Van Canneyt Note Added: 0101861
2017-07-23 10:48 Michael Van Canneyt Status assigned => resolved
2017-07-23 10:48 Michael Van Canneyt Fixed in Version => 3.1.1
2017-07-23 10:48 Michael Van Canneyt Resolution open => fixed
2017-07-23 10:48 Michael Van Canneyt Target Version => 3.2.0