View Issue Details

IDProjectCategoryView StatusLast Update
0033559FPCRTLpublic2018-04-14 14:08
ReporterBenito van der ZanderAssigned ToMichael Van Canneyt 
PrioritynormalSeverityminorReproducibilityhave not tried
Status resolvedResolutionfixed 
Product Version3.1.1Product Build 
Target Version3.2.0Fixed in Version3.1.1 
Summary0033559: TStringHelper.Starts/EndsWith
Descriptionthe string helper looks quite useful, but let's take a close look

> 1297 function TStringHelper.StartsWith(const AValue: string; IgnoreCase: Boolean
> 1298 ): Boolean;
> 1299 Var
> 1300 L : Integer;
> 1301 begin
> 1302 L:=System.Length(AValue);

length returns sizeint nowadays, not integer. But the string helper has integer everywhere. That is not going to work with longer strings...

> 1303 Result:=L>0;
> 1304 if Result then

So no string starts with an empty string? Would it not make more sense if every string starts with an empty string?

> 1305 if IgnoreCase then
> 1306 Result:=StrLiComp(PChar(AValue),PChar(Self),L)=0
> 1307 else
> 1308 Result:=StrLComp(PChar(AValue),PChar(Self),L)=0

This stops the comparison on #0 chars.

Is that behavior supposed to be Delphi compatible or is the function just completely broken?


On the other hand endsWith handles #0, but makes an unnecessary copy

> 567 S:=system.Copy(Self,Length-L+1,L);
> 568 Result:=system.Length(S)=L;
> 569 if Result then
> 570 if IgnoreCase then
> 571 Result:=CompareText(S,AValue)=0
> 572 else
> 573 Result:=S=AValue;
TagsNo tags attached.
Fixed in Revision38769
FPCOldBugId
FPCTarget
Attached Files
  • StrUtils.diff (4,479 bytes)
    Index: packages/rtl-objpas/src/inc/strutils.pp
    ===================================================================
    --- packages/rtl-objpas/src/inc/strutils.pp	(revision 38710)
    +++ packages/rtl-objpas/src/inc/strutils.pp	(working copy)
    @@ -33,8 +33,8 @@
     Function AnsiReplaceText(const AText, AFromText, AToText: string): string;inline;
     Function AnsiMatchText(const AText: string; const AValues: array of string): Boolean;inline;
     Function AnsiIndexText(const AText: string; const AValues: array of string): Integer;
    -Function StartsText(const ASubText, AText: string): Boolean;
    -Function EndsText(const ASubText, AText: string): Boolean;
    +Function StartsText(const ASubText, AText: string): Boolean; inline;
    +Function EndsText(const ASubText, AText: string): Boolean; inline;
     
     { ---------------------------------------------------------------------
         Case sensitive search/replace
    @@ -906,39 +906,25 @@
     
     function AnsiStartsText(const ASubText, AText: string): Boolean;
     begin
    -  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
    -    Result := AnsiStrLIComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := (ASubText = '') or AnsiSameText(LeftStr(AText, Length(ASubText)), ASubText);
     end;
     
     
     function AnsiEndsText(const ASubText, AText: string): Boolean;
     begin
    -  if Length(AText) >= Length(ASubText) then
    -    Result := AnsiStrLIComp(PChar(ASubText),
    -      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := (ASubText = '') or AnsiSameText(RightStr(AText, Length(ASubText)), ASubText);
     end;
     
     
    -function StartsText(const ASubText, AText: string): Boolean;
    +function StartsText(const ASubText, AText: string): Boolean; inline;
     begin
    -  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
    -    Result := StrLIComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := AnsiStartsText(ASubText, AText);
     end;
     
     
     function EndsText(const ASubText, AText: string): Boolean;
     begin
    -  if Length(AText) >= Length(ASubText) then
    -    Result := StrLIComp(PChar(ASubText),
    -      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := AnsiEndsText(ASubText, AText);
     end;
     
     
    @@ -955,17 +941,11 @@
     
     
     function AnsiIndexText(const AText: string; const AValues: array of string): Integer;
    -
    -var
    -  i : Integer;
    -
     begin
    -  Result:=-1;
    -  if (high(AValues)=-1) or (High(AValues)>MaxInt) Then
    -    Exit;
    -  for i:=low(AValues) to High(Avalues) do
    -     if CompareText(avalues[i],atext)=0 Then
    -       exit(i);  // make sure it is the first val.
    +  for Result := Low(AValues) to High(AValues) do
    +    if AnsiSameText(AValues[Result], AText) then
    +      Exit;
    +  Result := -1;
     end;
     
     
    @@ -981,20 +961,13 @@
     
     function AnsiStartsStr(const ASubText, AText: string): Boolean;
     begin
    -  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
    -    Result := AnsiStrLComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := (ASubText = '') or (LeftStr(AText, Length(ASubText)) = ASubText);
     end;
     
     
     function AnsiEndsStr(const ASubText, AText: string): Boolean;
     begin
    -  if Length(AText) >= Length(ASubText) then
    -    Result := AnsiStrLComp(PChar(ASubText),
    -      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
    -  else
    -    Result := False;
    +  Result := (ASubText = '') or (RightStr(AText, Length(ASubText)) = AText);
     end;
     
     
    @@ -1093,18 +1066,23 @@
       ---------------------------------------------------------------------}
     
     function DupeString(const AText: string; ACount: Integer): string;
    -
    -var i,l : SizeInt;
    -
    +var
    +  Len: SizeInt;
    +  Source, Target: PByte;
     begin
    - result:='';
    - if aCount>=0 then
    -   begin
    -     l:=length(atext);
    -     SetLength(result,aCount*l);
    -     for i:=0 to ACount-1 do
    -       move(atext[1],Result[l*i+1],l);
    -   end;
    +  Len := Length(AText);
    +  SetLength(Result, ACount * Len);
    +  // Use PByte to skip implicit UniqueString, because SetLength always unique
    +  Target := PByte(Result);
    +  if Target = nil then // ACount = 0 or AText = ''
    +    Exit;
    +  // Now ACount > 0 and Len > 0
    +  Source := PByte(AText);
    +  repeat
    +    Move(Source[0], Target[0], Len * SizeOf(Char));
    +    Inc(Target, Len * SizeOf(Char));
    +    Dec(ACount);
    +  until ACount = 0;
     end;
     
     function ReverseString(const AText: string): string;
    
    StrUtils.diff (4,479 bytes)

Activities

Michael Van Canneyt

2018-04-04 12:51

administrator   ~0107589

The #0 behaviour is Delphi compatible. I will look at the rest.

Serge Anvarov

2018-04-05 05:59

reporter   ~0107622

program Project1;
{$APPTYPE CONSOLE}
uses SysUtils, StrUtils;
const
  FullStr = '123'#0'45';
  TestStr = '123'#0'!';
begin
  Writeln('StartsWith=', FullStr.StartsWith(TestStr));
  Writeln('Pos=', Pos(TestStr, FullStr) > 0);
  Writeln('StartsText=', StartsText(TestStr, FullStr));
  Readln;
end.

On Delphi (10.3):
StartsWith=TRUE
Pos=FALSE
StartsText=FALSE

On FPC (3.1.1):
StartsWith=TRUE
Pos=FALSE
StartsText=TRUE

My opinion: should be false anywhere. This is Pascal, not C.

Benito van der Zander

2018-04-05 23:14

reporter   ~0107642

> My opinion: should be false anywhere. This is Pascal, not C.

+1

Serge Anvarov

2018-04-08 14:53

reporter  

StrUtils.diff (4,479 bytes)
Index: packages/rtl-objpas/src/inc/strutils.pp
===================================================================
--- packages/rtl-objpas/src/inc/strutils.pp	(revision 38710)
+++ packages/rtl-objpas/src/inc/strutils.pp	(working copy)
@@ -33,8 +33,8 @@
 Function AnsiReplaceText(const AText, AFromText, AToText: string): string;inline;
 Function AnsiMatchText(const AText: string; const AValues: array of string): Boolean;inline;
 Function AnsiIndexText(const AText: string; const AValues: array of string): Integer;
-Function StartsText(const ASubText, AText: string): Boolean;
-Function EndsText(const ASubText, AText: string): Boolean;
+Function StartsText(const ASubText, AText: string): Boolean; inline;
+Function EndsText(const ASubText, AText: string): Boolean; inline;
 
 { ---------------------------------------------------------------------
     Case sensitive search/replace
@@ -906,39 +906,25 @@
 
 function AnsiStartsText(const ASubText, AText: string): Boolean;
 begin
-  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
-    Result := AnsiStrLIComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := (ASubText = '') or AnsiSameText(LeftStr(AText, Length(ASubText)), ASubText);
 end;
 
 
 function AnsiEndsText(const ASubText, AText: string): Boolean;
 begin
-  if Length(AText) >= Length(ASubText) then
-    Result := AnsiStrLIComp(PChar(ASubText),
-      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := (ASubText = '') or AnsiSameText(RightStr(AText, Length(ASubText)), ASubText);
 end;
 
 
-function StartsText(const ASubText, AText: string): Boolean;
+function StartsText(const ASubText, AText: string): Boolean; inline;
 begin
-  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
-    Result := StrLIComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := AnsiStartsText(ASubText, AText);
 end;
 
 
 function EndsText(const ASubText, AText: string): Boolean;
 begin
-  if Length(AText) >= Length(ASubText) then
-    Result := StrLIComp(PChar(ASubText),
-      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := AnsiEndsText(ASubText, AText);
 end;
 
 
@@ -955,17 +941,11 @@
 
 
 function AnsiIndexText(const AText: string; const AValues: array of string): Integer;
-
-var
-  i : Integer;
-
 begin
-  Result:=-1;
-  if (high(AValues)=-1) or (High(AValues)>MaxInt) Then
-    Exit;
-  for i:=low(AValues) to High(Avalues) do
-     if CompareText(avalues[i],atext)=0 Then
-       exit(i);  // make sure it is the first val.
+  for Result := Low(AValues) to High(AValues) do
+    if AnsiSameText(AValues[Result], AText) then
+      Exit;
+  Result := -1;
 end;
 
 
@@ -981,20 +961,13 @@
 
 function AnsiStartsStr(const ASubText, AText: string): Boolean;
 begin
-  if (Length(AText) >= Length(ASubText)) and (ASubText <> '') then
-    Result := AnsiStrLComp(PChar(ASubText), PChar(AText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := (ASubText = '') or (LeftStr(AText, Length(ASubText)) = ASubText);
 end;
 
 
 function AnsiEndsStr(const ASubText, AText: string): Boolean;
 begin
-  if Length(AText) >= Length(ASubText) then
-    Result := AnsiStrLComp(PChar(ASubText),
-      PChar(AText) + Length(AText) - Length(ASubText), Length(ASubText)) = 0
-  else
-    Result := False;
+  Result := (ASubText = '') or (RightStr(AText, Length(ASubText)) = AText);
 end;
 
 
@@ -1093,18 +1066,23 @@
   ---------------------------------------------------------------------}
 
 function DupeString(const AText: string; ACount: Integer): string;
-
-var i,l : SizeInt;
-
+var
+  Len: SizeInt;
+  Source, Target: PByte;
 begin
- result:='';
- if aCount>=0 then
-   begin
-     l:=length(atext);
-     SetLength(result,aCount*l);
-     for i:=0 to ACount-1 do
-       move(atext[1],Result[l*i+1],l);
-   end;
+  Len := Length(AText);
+  SetLength(Result, ACount * Len);
+  // Use PByte to skip implicit UniqueString, because SetLength always unique
+  Target := PByte(Result);
+  if Target = nil then // ACount = 0 or AText = ''
+    Exit;
+  // Now ACount > 0 and Len > 0
+  Source := PByte(AText);
+  repeat
+    Move(Source[0], Target[0], Len * SizeOf(Char));
+    Inc(Target, Len * SizeOf(Char));
+    Dec(ACount);
+  until ACount = 0;
 end;
 
 function ReverseString(const AText: string): string;
StrUtils.diff (4,479 bytes)

Serge Anvarov

2018-04-08 15:03

reporter   ~0107696

Here is a patch for StrUtils in which strings can contain the #0 character. If administrators don't mind, I'll make a patch for StringHelper.
Notes about patch:
May be extracting a substring for comparison is not very efficient, but does not depend on the platform. In addition, on many platforms, strings are implicitly converted before comparison.
AnsiIndexText fixed, because it's didn't care locale information.
DupeString just caught my eye and I fixed it for speedup

Benito van der Zander

2018-04-08 17:44

reporter   ~0107698

Right, I did not think about encoding mismatches. (I have always used my own functions btw https://github.com/benibela/bbutils/blob/master/bbutils.pas#L1095-L1138 )

But usually the encoding is the same and then it should avoid the copy. So it needs two functions in each case, one for mismatched and one for same encoding (a function that create a temporary string is still slow even if it does not create it 0030777 )

Michael Van Canneyt

2018-04-14 14:08

administrator   ~0107783

Fixed.
* Use SizeInt in TStringHelper
* StartsWith now allows #0
* '' returns true in EndsWith,StartsWith
* Patch for StrUtils applied.

Issue History

Date Modified Username Field Change
2018-04-04 12:40 Benito van der Zander New Issue
2018-04-04 12:49 Michael Van Canneyt Assigned To => Michael Van Canneyt
2018-04-04 12:49 Michael Van Canneyt Status new => assigned
2018-04-04 12:51 Michael Van Canneyt Note Added: 0107589
2018-04-05 05:59 Serge Anvarov Note Added: 0107622
2018-04-05 23:14 Benito van der Zander Note Added: 0107642
2018-04-08 14:53 Serge Anvarov File Added: StrUtils.diff
2018-04-08 15:03 Serge Anvarov Note Added: 0107696
2018-04-08 17:44 Benito van der Zander Note Added: 0107698
2018-04-14 14:08 Michael Van Canneyt Fixed in Revision => 38769
2018-04-14 14:08 Michael Van Canneyt Note Added: 0107783
2018-04-14 14:08 Michael Van Canneyt Status assigned => resolved
2018-04-14 14:08 Michael Van Canneyt Fixed in Version => 3.1.1
2018-04-14 14:08 Michael Van Canneyt Resolution open => fixed
2018-04-14 14:08 Michael Van Canneyt Target Version => 3.2.0