View Issue Details

IDProjectCategoryView StatusLast Update
0036755FPCCompilerpublic2020-03-05 01:25
ReporterMarcin Wiazowski Assigned To 
PrioritynormalSeverityminorReproducibilityalways
Status newResolutionopen 
Product Version3.3.1 
Summary0036755: UniqueString is not always called, as it should be
DescriptionThe problem described below occurs only in Delphi mode. Please see the attached Reproduce1 and Reproduce2 demos.


When compiled with Delphi, output of both demos is as expected:
  aaaaaaaa
  aXaaaaaa

When compiled with FPC (NOT in Delphi mode), output of both demos is also as expected:
  aaaaaaaa
  aXaaaaaa

When compiled with FPC (in Delphi mode), Reproduce1 returns Runtime error 216, while Reproduce2's output is:
  aXaaaaaa
  aXaaaaaa


Explanation: Reproduce1 contains the following code:

var
  Str1, Str2 : string;
  P : PChar;
begin
  Str1:='aaaaaaaa'; // Str1 points now to the read-only memory (i.e. to executable's code section)
  Str2:=Str1; // Str2 points now to the same memory address as Str1

  // FPC should internally generate the following call here (as Delphi does):
  // UniqueString(Str2);

  P:=@Str2[2];
  P^:='X'; // Due to lacking "UniqueString(Str2)" call, Str2 still points to
           // the read-only memory, so FPC causes access violation here

Similar problem occurs in Reproduce2.


Solution: when access to string's memory is detected (in the "P:=@Str2[2]" statement), compiler should internally insert an "UniqueString(Str2)" call before - as it is currently in non-Delphi mode.



After removing the {$MODE DELPHI} directive, the problem no longer occurs. Tested on Win32 platform, with FPC 3.0.4, 3.2.0 (r44257) and 3.3.1 (r44257).
TagsNo tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget
Attached Files

Relationships

related to 0035461 closedOndrej Pokorny SIGSEGV when modifying a PChar assigned from a string constant 

Activities

Marcin Wiazowski

2020-03-04 00:11

reporter  

Reproduce1.dpr (628 bytes)   
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{$APPTYPE CONSOLE}

program Reproduce1;

var
  Str1, Str2 : string;
  P : PChar;
begin
  Str1:='aaaaaaaa'; // Str1 points now to the read-only memory (i.e. to executable's code section)
  Str2:=Str1;       // Str2 points now to the same memory address as Str1

  // FPC should internally generate the following call here (as Delphi does):
  // UniqueString(Str2);

  P:=@Str2[2];
  P^:='X'; // Due to lacking "UniqueString(Str2)" call, Str2 still points to
           // the read-only memory, so FPC causes access violation here

  Writeln(Str1);
  Writeln(Str2);
end.
Reproduce1.dpr (628 bytes)   
Reproduce2.dpr (604 bytes)   
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{$APPTYPE CONSOLE}

program Reproduce2;

var
  Str1, Str2 : string;
  P : PChar;
begin
  Str1:=StringOfChar('a',8); // Str1 points now to the writble memory
  Str2:=Str1;                // Str2 points now to the same memory address as Str1

  // FPC should internally generate the following call here (as Delphi does):
  // UniqueString(Str2);

  P:=@Str2[2];
  P^:='X'; // Due to lacking "UniqueString(Str2)" call, Str2 still points to
           // the same memory as Str1, so BOTH strings are modified

  Writeln(Str1);
  Writeln(Str2);
end.
Reproduce2.dpr (604 bytes)   

Serge Anvarov

2020-03-04 02:42

reporter   ~0121349

In {$OBJFPC} mode it is also reproduced, if add {$H+} or explicitly specify the type as AnsiString.

Marcin Wiazowski

2020-03-04 13:54

reporter   ~0121358

Last edited: 2020-03-04 14:17

View 2 revisions

Indeed - after changing "string" declaration to AnsiString, UnicodeString, ShortString or WideString - {$MODE DELPHI} / {$MODE OBJFPC} / {$H+} / {$H-} no longer changes anything.

After changing "string" to:
- AnsiString or UnicodeString - both demos always work in a buggy way,
- ShortString or WideString - both demos always work properly.

Marco van de Voort

2020-03-04 15:00

manager   ~0121361

I'm not entirely sure why this _should_ be the case. Delphi might be an implementation detail, not really a designed feature.

Specially since you seem to need lowlevel access to force it, these things are rarely documented.

nanobit

2020-03-04 16:34

reporter   ~0121363

This is known behavior in FPC; the compiler could solve this only at the earliest stage:
Str1:='aaaaaaaa'; here create a copy (refcount = 1) instead of (refcount = -1).

From programmers view (high level),
Str1 can be seen as fully capable string as declared (ansistring variable);
to fulfill this (not readonly), the compiler needs to create a copy.
Later testing any PChar(P)^ for string memory is definitely too late.

Marcin Wiazowski

2020-03-04 16:46

reporter   ~0121364

In fact, Delphi only implements copy-on-write functionality. Delphi documentation says: "When indexing is used to change the value of a single character in a string, a copy of the string is made if - but only if - its reference count is greater than one. This is called copy-on-write semantics."

Although using a statement like "P:=@Str2[2]" may seem surprising at first sight, such a statement is very useful for optimization. I'm attaching Test1 and Test2 demos. Test1 contains the following code:



function MakeUpperCase(const Str : AnsiString) : AnsiString;
var
  I : Integer;
begin
  Result := Str;
  for I := 1 to Length(Result) do
    if (Result[I] >= 'a') and (Result[I] <= 'z') then
      Dec(Result[I], 32); // <== UniqueString call is inserted here by the
                          // compiler (both Delphi and FPC), and this call
                          // is executed IN EACH LOOP ITERATION
end;

begin
  Writeln(MakeUpperCase('TestString'));
end.



Both Delphi and FPC insert a "UniqueString" call as shown in the comment above (more precisely: FPC inserts an "fpc_ansistr_unique" call). The only problem is that the generated code is highly inefficient - the "UniqueString" call is executed in each loop iteration. To make the code more optimal, we can use (see Test2):



function MakeUpperCase(const Str : AnsiString) : AnsiString;
var
  P : PAnsiChar;
begin
  Result := Str;
  P := @Result[1]; // <== UniqueString call is inserted here by Delphi
  while P^ <> #0 do
  begin
    if (P^ >= 'a') and (P^ <= 'z') then
      Dec(P^, 32);
    Inc(P);
  end;
end;



But here we have the same problem again: Delphi works properly, while FPC raises an access violation exception (runtime error 216).



Since:
- there is no advantage of the current FPC implementation (having access violation - i.e. the runtime error 216 - when executing a valid code, doesn't give us any advantage),
- current FPC implementation is not Delphi compatible,
- FPC already implements the needed functionality (which is used for ShortStrings and WideStrings),

it would be good to have this functionality also for AnsiStrings and UnicodeStrings, I think. Or in other words: I can't see disadvantages of having this implemented, while I can see advantages.
Test1.dpr (604 bytes)   
{$IFDEF FPC}{$MODE OBJFPC}{$ENDIF} // This is to have the "Result" variable defined

{$APPTYPE CONSOLE}

program Test1;

function MakeUpperCase(const Str : AnsiString) : AnsiString;
var
  I : Integer;
begin
  Result := Str;
  for I := 1 to Length(Result) do
    if (Result[I] >= 'a') and (Result[I] <= 'z') then
      Dec(Result[I], 32); // <== UniqueString call is inserted here by the
                          // compiler (both Delphi and FPC), and this call
                          // is executed IN EACH LOOP ITERATION
end;

begin
  Writeln(MakeUpperCase('TestString'));
end.
Test1.dpr (604 bytes)   
Test2.dpr (483 bytes)   
{$IFDEF FPC}{$MODE OBJFPC}{$ENDIF} // This is to have the "Result" variable defined

{$APPTYPE CONSOLE}

program Test2;

function MakeUpperCase(const Str : AnsiString) : AnsiString;
var
  P : PAnsiChar;
begin
  Result := Str;
  P := @Result[1]; // <== UniqueString call is inserted here by Delphi
  while P^ <> #0 do
  begin
    if (P^ >= 'a') and (P^ <= 'z') then
      Dec(P^, 32);
    Inc(P);
  end;
end;

begin
  Writeln(MakeUpperCase('TestString'));
end.
Test2.dpr (483 bytes)   

Serge Anvarov

2020-03-04 17:02

reporter   ~0121366

I think performance considerations are more important because string constants are common and almost always don't change. But for rare cases of direct modification attempts, needs to provide additional code generation, like in Delphi. Just a specific case like this example was skipped.

nanobit

2020-03-04 17:22

reporter   ~0121367

Marcin, I agree this is a problem (and gave the solution for the compiler),
but specific pointer examples do not matter. The decision boils down
to whether every string variable should allow pointers for write access.
(By language reference, ony can say Yes. After all, its not const str1 = 'aaa..')

But copy-on-write (a means of optimization) already works as expected:
copy-on-write can be relied on only if the programmer does not! use pointers.

Marcin Wiazowski

2020-03-04 18:01

reporter   ~0121371

I think it's a matter of "copy-on-write" definition.

Everyone agrees, that string should be made unique, when writing directly to string's character, like in "Str[i] := 'x'" statement.

The question is: what should be done, when a pointer to the string is used. Delphi just uses a most safe solution - which guarantees, that everyone's "copy-on-write" understanding will be satisfied.

Just to note: since using a pointer to the string is not a very common practice, updating FPC's behavior will not - in fact - affect efficiency of the existing code (while it can solve specific problems described above - but this is obvious).

Marco van de Voort

2020-03-04 18:25

manager   ~0121372

macin "P:=@Str2[2]" does not fur-fill

"When indexing is used to change the value of a single character in a string, a copy of the string is made if - but only if - its reference count is greater than one. This is called copy-on-write semantics."

since the value of a single character in the string is NOT done. It is done later with pointer, not string- operations

Marcin Wiazowski

2020-03-04 19:06

reporter   ~0121373

Marco, you are right. To be exact: when using "P:=@Str2[2]", no reading from or writing to the string is still performed.

Assuming, that compiler wants to guarantee, that every writing to the string will be performed on an already-made-unique string, the compiler can:

1) Track all references to the "P" variable, in order to detect all writing operations by using "P" - so UniqueString call can be inserted in this case (this is an ideal solution). This is complicated, because "P" can be then assigned to - let's say - "P2" and "P3" - and all of them must be tracked in this case.

2) Just insert UniqueString call when obtaining "P" value (like Delphi does).

As for me, in real programming scenarios, option 1) would be an overkill.

Marco van de Voort

2020-03-04 19:11

manager   ~0121374

3) leave responsibility for low level hacks to the programmer, without needless copying of strings of both (1) and (2) in some cases. So that would make this an documentation mantis #

Marcin Wiazowski

2020-03-04 19:13

reporter   ~0121375

Last edited: 2020-03-04 19:32

View 2 revisions

I don't have anything against 3), although, in Delphi mode, 1) should be performed.


EDIT: 2) should be performed.

Marco van de Voort

2020-03-04 19:26

manager   ~0121376

I mean for all modes. Delphi mode is not an emulation of the most remote manual hacks. We don't support other internal details like calling __ functions either.

Marcin Wiazowski

2020-03-04 19:39

reporter   ~0121377

Well, it's not my decision.

However, please note, that the current behavior is inconsistent. The functionality, that you consider to be a hack, is supported by FPC for ShortString and WideString - while not for AnsiString and WideString. Although there may be some internal reason for this inconsistency, from the programmer's point of view, this doesn't seem to have much sense.

Marcin Wiazowski

2020-03-04 19:55

reporter   ~0121378

Some additional note: not supporting the __ functions (as far as I understand) leads to a compilation error. This is ok, because the programmer can see, that some changes to the code are needed.

Not calling UniqueString in Delphi mode is performed silently. This may be misleading, because there is no visible sign of the difference in code behavior.

Marco van de Voort

2020-03-04 20:55

manager   ~0121379

Again, not documented. Shortstring is a static type without refcount so uniquestring() is useless, and widestring probably always does so because of COM rules.

But it is not my call either. We both have made points, let's see what compiler devels say.

Ondrej Pokorny

2020-03-04 21:20

developer   ~0121380

If I am not mistaken, it is the same issue as 0035461 - and the current FPC behavior is wanted and documented. See comment from Serge 0035461:0115885.

Marcin Wiazowski

2020-03-04 21:59

reporter   ~0121382

Although these cases are similar, there is some difference:

In 0035461, we have:
  P := PAnsiChar(Str)

while here we have:
  P := @Str[]



Let's make some test:

{$APPTYPE CONSOLE}
program TestA;
var
  Str1, Str2 : AnsiString;
  P : PAnsiChar;
begin
  Str1 := StringOfChar('a', 8);
  Str2 := Str1;
  P := PAnsiChar(Str2);
  P[1] := 'X';
  Writeln(Str1);
  Writeln(Str2);
end.

This returns both in Delphi and in FPC:
  aXaaaaaa
  aXaaaaaa



But:

{$APPTYPE CONSOLE}
program TestB;
var
  Str1, Str2 : AnsiString;
  P : PAnsiChar;
begin
  Str1 := StringOfChar('a', 8);
  Str2 := Str1;
  P := @Str2[2];
  P^ := 'X';
  Writeln(Str1);
  Writeln(Str2);
end.

Returns in Delphi:
  aaaaaaaa
  aXaaaaaa

while in FPC:
  aXaaaaaa
  aXaaaaaa

nanobit

2020-03-04 23:08

reporter   ~0121384

Marcin, you should not use pointers to write to a string which has refcount unlike 1.
If you want to use a pointer for writing, you need refcount 1:
The compiler-independent way is calling uniqueString().
If this is documented, then nothing else (no compiler change) is needed.
Unless, we want more compatibility with Delphi application code
that has no explicit uniqueString() call.

Marcin Wiazowski

2020-03-04 23:22

reporter   ~0121385

Indeed, I'm concerned mainly about Delphi compatibility.

Jan Bruns

2020-03-05 01:01

reporter   ~0121388

Here's a counter example that would probably result in unexpected square instead of linear runtime.
jans_counter_example.pas (1,128 bytes)   
{$LINKLIB whatever}

type
Tstringtypetouse = ansistring;

{ for example, let some external lib-code detect some keyword
}
function ext_readonly_detect_something(p : pchar) : pchar; external;


{ where the keywords all have a fixed length
}
const ext_kwlen_minus1 = 4-1;


{ and maybe we wanna call it like
}
function do_detect(const a : Tstringtypetouse; idx : longint) : pchar;
begin
  { note there was some reason to declare param a as const. }
  if (idx>0) and ((idx+ext_kwlen_minus1)<=length(a))
  then do_test := ext_readonly_detect_something(@a[idx])
  else do_test := nil;
end;


{ but this all probably needs to be done in a loop
}
function detect_all(const a : Tstringtypetouse) : Pchar;
var idx : longint; p : pchar;
begin
  p := nil;
  idx := 1;
  while (p=nil) and (idx<=length(a)) do begin
    p := do_detect(a,idx);
    inc(idx);
  end;
  detect_all := p;
end;


var 
p : pchar; 
s1,s2 : Tstringtypetouse;

begin
  s2 := 'some text file that still fits into Tstringtypetouse';
  s1 := s2;
  p := detect_all(s1);
  if (p<>nil) then writeln('detected: ',p);
end.

jans_counter_example.pas (1,128 bytes)   

Marcin Wiazowski

2020-03-05 01:25

reporter   ~0121389

That's an interesting case.



When having:

function do_detect(const a : Tstringtypetouse; idx : longint) : pchar;
begin
  { note there was some reason to declare param a as const. }
  if (idx>0) and ((idx+ext_kwlen_minus1)<=length(a))
  then do_detect := ext_readonly_detect_something(@a[idx])
  else do_detect := nil;
end;

the "a" parameter is declared as "const" - in this case Delphi does NOT generate a UniqueString call due to the "@a[idx]" statement (so there is no square execution time).



After changing to:

function do_detect(var a : Tstringtypetouse; idx : longint) : pchar;

a UniqueString call is generated by Delphi.

Issue History

Date Modified Username Field Change
2020-03-04 00:11 Marcin Wiazowski New Issue
2020-03-04 00:11 Marcin Wiazowski File Added: Reproduce1.dpr
2020-03-04 00:11 Marcin Wiazowski File Added: Reproduce2.dpr
2020-03-04 02:42 Serge Anvarov Note Added: 0121349
2020-03-04 13:54 Marcin Wiazowski Note Added: 0121358
2020-03-04 14:17 Marcin Wiazowski Note Edited: 0121358 View Revisions
2020-03-04 15:00 Marco van de Voort Note Added: 0121361
2020-03-04 16:34 nanobit Note Added: 0121363
2020-03-04 16:46 Marcin Wiazowski File Added: Test1.dpr
2020-03-04 16:46 Marcin Wiazowski File Added: Test2.dpr
2020-03-04 16:46 Marcin Wiazowski Note Added: 0121364
2020-03-04 17:02 Serge Anvarov Note Added: 0121366
2020-03-04 17:22 nanobit Note Added: 0121367
2020-03-04 18:01 Marcin Wiazowski Note Added: 0121371
2020-03-04 18:25 Marco van de Voort Note Added: 0121372
2020-03-04 19:06 Marcin Wiazowski Note Added: 0121373
2020-03-04 19:11 Marco van de Voort Note Added: 0121374
2020-03-04 19:13 Marcin Wiazowski Note Added: 0121375
2020-03-04 19:26 Marco van de Voort Note Added: 0121376
2020-03-04 19:32 Marcin Wiazowski Note Edited: 0121375 View Revisions
2020-03-04 19:39 Marcin Wiazowski Note Added: 0121377
2020-03-04 19:55 Marcin Wiazowski Note Added: 0121378
2020-03-04 20:55 Marco van de Voort Note Added: 0121379
2020-03-04 21:18 Ondrej Pokorny Relationship added related to 0035461
2020-03-04 21:20 Ondrej Pokorny Note Added: 0121380
2020-03-04 21:59 Marcin Wiazowski Note Added: 0121382
2020-03-04 23:08 nanobit Note Added: 0121384
2020-03-04 23:22 Marcin Wiazowski Note Added: 0121385
2020-03-05 01:01 Jan Bruns File Added: jans_counter_example.pas
2020-03-05 01:01 Jan Bruns Note Added: 0121388
2020-03-05 01:25 Marcin Wiazowski Note Added: 0121389