View Issue Details

IDProjectCategoryView StatusLast Update
0035817FPCRTLpublic2019-07-11 22:34
ReporterSerge AnvarovAssigned ToMichael Van Canneyt 
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Product Version3.3.1Product Build 
Target VersionFixed in Version3.3.1 
Summary0035817: Patch. TStrings.LoadFromStream does not work correctly with large files on x64
DescriptionThis is true only for 64-bit systems. We are talking about files of size 2GB or more. In 32 bit such files and so will not fit in memory.

In LoadFromStream there are two errors:
1. Variables of type SizeInt are used, although the Read procedure only understands LongInt. As a result of the overflow, negative values are already passed to it.
2. The upper limit of the unit size for reading is set to 2**38, which also exceeds the LongInt size.

In "Steps To Reproduce" given the project that is load strings from large zeroed stream. In order to avoid creating such large files, implement simple class TFilledZerosStream.

Patch included.
Steps To Reproduce{$APPTYPE CONSOLE}
{$MODE OBJFPC}
{$LONGSTRINGS ON}

uses Math, Classes;

type
  TFilledZerosStream = class(TStream)
  strict private
    FSize: Int64;
    FPosition: Int64;
  protected
    function GetSize: Int64; override;
    function GetPosition: Int64; override;
  public
    function Read(var Buffer; Count: LongInt): LongInt; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
    constructor Create(ASize: Int64);
  end;

constructor TFilledZerosStream.Create(ASize: Int64);
begin
  inherited Create;
  if ASize > 0 then
    FSize := ASize;
end;

function TFilledZerosStream.GetPosition: Int64;
begin
  Result := FPosition;
end;

function TFilledZerosStream.GetSize: Int64;
begin
  Result := FSize;
end;

function TFilledZerosStream.Read(var Buffer; Count: LongInt): LongInt;
var
  Rest: Int64;
begin
  Result := 0;
  if (Count > 0) and InRange(FPosition, 0, FSize - 1) then // TRUE: FSize > 0
  begin
    Result := Count;
    // Avoid overflow with an intermediate variable of type int64
    Rest := FSize - FPosition;
    If Result > Rest then
      Result := LongInt(Rest);
    FillChar(Buffer, Result, 0);
    Inc(FPosition, Result);
  end;
end;

function TFilledZerosStream.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  // Avoid overflow by check value before addition
  case Origin of
    soBeginning:
      FPosition := EnsureRange(Offset, 0, FSize);
    soEnd:
      FPosition := FSize + EnsureRange(Offset, -FSize, 0);
    soCurrent:
      FPosition := FPosition + EnsureRange(Offset, -FPosition, FSize - FPosition);
  end;
  Result := FPosition;
end;

// Corrected version of TStrings.LoadFromStream
procedure LoadFromStream(Self: TStrings; Stream: TStream);
{
   Borlands method is no good, since a pipe for
   instance doesn't have a size.
   So we must do it the hard way.
}
Const
  BufSize = 1024;
Var
  Buffer : AnsiString;
  BytesRead: LongInt;
  BufLen: SizeInt;
  I,BufDelta : LongInt;
begin
  // reread into a buffer
  Self.beginupdate;
  try
    Buffer:='';
    BufLen:=0;
    I:=1;
    Repeat
      BufDelta:=BufSize*I;
      SetLength(Buffer,BufLen+BufDelta);
      BytesRead:=Stream.Read(Buffer[BufLen+1],BufDelta);
      Inc(BufLen, BufDelta);
      if BufDelta < (MaxInt div 2) then
         I:=I shl 1;
    Until BytesRead<>BufDelta;
    SetLength(Buffer, BufLen-BufDelta+BytesRead);
    Self.Text := Buffer;
    SetLength(Buffer,0);
  finally
    Self.EndUpdate;
  end;
end;

var
  L: TStringList;
  Z: TFilledZerosStream;
begin
  L := TStringList.Create;
  try
    Z := TFilledZerosStream.Create(MaxInt);
    try
      //L.LoadFromStream(Z); // Pos=2147482624, rest=1023
      LoadFromStream(L, Z); // Pos=2147483647, rest=0
      Writeln('Pos=', Z.Position, ', rest=', Z.Size - Z.Position);
    finally
      Z.Free;
    end;
  finally
    L.Free;
  end;
  Readln;
end.
Additional InformationThis error also exists in FPC 3.0.4. But it implementation of TStrings and so does not know how to work with large files.
TagsNo tags attached.
Fixed in Revision42351
FPCOldBugId
FPCTarget3.2.0
Attached Files
  • StringsLoadFromStream.diff (1,453 bytes)
    Index: rtl/objpas/classes/stringl.inc
    ===================================================================
    --- rtl/objpas/classes/stringl.inc	(revision 42346)
    +++ rtl/objpas/classes/stringl.inc	(working copy)
    @@ -1091,13 +1091,12 @@
     }
     Const
       BufSize = 1024;
    -  MaxGrow = 1 shl 29;
     
     Var
       Buffer     : AnsiString;
    -  BytesRead,
    -  BufLen,
    -  I,BufDelta     : SizeInt;
    +  BytesRead: LongInt;
    +  BufLen: SizeInt;
    +  I,BufDelta: LongInt;
     begin
       if not IgnoreEncoding then
         begin
    @@ -1115,7 +1114,7 @@
           SetLength(Buffer,BufLen+BufDelta);
           BytesRead:=Stream.Read(Buffer[BufLen+1],BufDelta);
           inc(BufLen,BufDelta);
    -      If I<MaxGrow then
    +      if BufDelta < (MaxInt div 2) then
             I:=I shl 1;
         Until BytesRead<>BufDelta;
         SetLength(Buffer, BufLen-BufDelta+BytesRead);
    @@ -1135,14 +1134,13 @@
     }
     Const
       BufSize = 1024;
    -  MaxGrow = 1 shl 29;
     
     Var
       Buffer         : TBytes;
       T              : string;
    -  BytesRead,
    -  BufLen,
    -  I,BufDelta: SizeInt;
    +  BytesRead: LongInt;
    +  BufLen: SizeInt;
    +  I,BufDelta: LongInt;
       PreambleLength : Longint;
     begin
       // reread into a buffer
    @@ -1156,7 +1154,7 @@
           SetLength(Buffer,BufLen+BufDelta);
           BytesRead:=Stream.Read(Buffer[BufLen],BufDelta);
           inc(BufLen,BufDelta);
    -      If I<MaxGrow then
    +      if BufDelta < (MaxInt div 2) then
             I:=I shl 1;
         Until BytesRead<>BufDelta;
         SetLength(Buffer,BufLen-BufDelta+BytesRead);
    

Activities

Serge Anvarov

2019-07-09 19:54

reporter  

StringsLoadFromStream.diff (1,453 bytes)
Index: rtl/objpas/classes/stringl.inc
===================================================================
--- rtl/objpas/classes/stringl.inc	(revision 42346)
+++ rtl/objpas/classes/stringl.inc	(working copy)
@@ -1091,13 +1091,12 @@
 }
 Const
   BufSize = 1024;
-  MaxGrow = 1 shl 29;
 
 Var
   Buffer     : AnsiString;
-  BytesRead,
-  BufLen,
-  I,BufDelta     : SizeInt;
+  BytesRead: LongInt;
+  BufLen: SizeInt;
+  I,BufDelta: LongInt;
 begin
   if not IgnoreEncoding then
     begin
@@ -1115,7 +1114,7 @@
       SetLength(Buffer,BufLen+BufDelta);
       BytesRead:=Stream.Read(Buffer[BufLen+1],BufDelta);
       inc(BufLen,BufDelta);
-      If I<MaxGrow then
+      if BufDelta < (MaxInt div 2) then
         I:=I shl 1;
     Until BytesRead<>BufDelta;
     SetLength(Buffer, BufLen-BufDelta+BytesRead);
@@ -1135,14 +1134,13 @@
 }
 Const
   BufSize = 1024;
-  MaxGrow = 1 shl 29;
 
 Var
   Buffer         : TBytes;
   T              : string;
-  BytesRead,
-  BufLen,
-  I,BufDelta: SizeInt;
+  BytesRead: LongInt;
+  BufLen: SizeInt;
+  I,BufDelta: LongInt;
   PreambleLength : Longint;
 begin
   // reread into a buffer
@@ -1156,7 +1154,7 @@
       SetLength(Buffer,BufLen+BufDelta);
       BytesRead:=Stream.Read(Buffer[BufLen],BufDelta);
       inc(BufLen,BufDelta);
-      If I<MaxGrow then
+      if BufDelta < (MaxInt div 2) then
         I:=I shl 1;
     Until BytesRead<>BufDelta;
     SetLength(Buffer,BufLen-BufDelta+BytesRead);

Bart Broersma

2019-07-10 12:53

reporter   ~0117134

> Borlands method is no good
You looked at their source code?

Michael Van Canneyt

2019-07-10 14:05

administrator   ~0117135

@Bart, the comment is mine, it exists since day 1.
And yes, I did know Borland's methods, one cannot avoid knowing them while debugging in delphi.

Michael Van Canneyt

2019-07-10 14:28

administrator   ~0117136

@Serge, why did you remove the MaxGrow constant ?
MaxGrow = Maxint Div 2:
would have been OK, no ?

Akira1364

2019-07-10 18:00

reporter   ~0117138

Last edited: 2019-07-10 19:03

View 4 revisions

@Michael:

MaxInt = MaxSmallint by default, and MaxLongint when in Delphi/ObjFPC modes. So perhaps it was part of the problem?

In general though I don't quite get why the patch still mixes (and compares / subtracts / e.t.c) both SizeInt and LongInt variables.

It would make more sense and be much safer to use exclusively SizeInt variables (especially considering, for example, SetLength takes a SizeInt, not a LongInt.)

Doing it that way would ensure correct behavior regardless of whether a 32 or 64 bit system was being used.

Michael Van Canneyt

2019-07-10 18:26

administrator   ~0117139

The classes unit is always compiled in 32-bit mode, so maxint is 32-bit unless you're compiling for 8086.
There is no confusion possible.

But using SizeInt everywhere is wrong.
Total size can be 64-bit, but read size for a single read operation can only be a longint,
so there using SizeInt is not correct, not on 64-bit, not on 16 bit.

Serge Anvarov

2019-07-11 00:28

reporter   ~0117149

@Michael:
Of course, it can be "MaxGrow = MaxInt div 2", but it is used only in one place, so it is easier to place it right where it is used.
I would even replace the line "I:=I shl 1;" with "Inc(I, I);", but for some it may be less clear.

@Akira1364:
About "use exclusively SizeInt" read error 1. in description.

Michael Van Canneyt

2019-07-11 22:34

administrator   ~0117190

Fixed the types, the constant was already changed in trunk.

Issue History

Date Modified Username Field Change
2019-07-09 19:54 Serge Anvarov New Issue
2019-07-09 19:54 Serge Anvarov File Added: StringsLoadFromStream.diff
2019-07-09 20:19 Michael Van Canneyt Assigned To => Michael Van Canneyt
2019-07-09 20:19 Michael Van Canneyt Status new => assigned
2019-07-10 12:53 Bart Broersma Note Added: 0117134
2019-07-10 14:05 Michael Van Canneyt Note Added: 0117135
2019-07-10 14:28 Michael Van Canneyt Note Added: 0117136
2019-07-10 14:28 Michael Van Canneyt Status assigned => feedback
2019-07-10 14:28 Michael Van Canneyt FPCTarget => -
2019-07-10 18:00 Akira1364 Note Added: 0117138
2019-07-10 18:25 Akira1364 Note Edited: 0117138 View Revisions
2019-07-10 18:26 Michael Van Canneyt Note Added: 0117139
2019-07-10 18:27 Akira1364 Note Edited: 0117138 View Revisions
2019-07-10 19:03 Akira1364 Note Edited: 0117138 View Revisions
2019-07-11 00:28 Serge Anvarov Note Added: 0117149
2019-07-11 00:28 Serge Anvarov Status feedback => assigned
2019-07-11 22:34 Michael Van Canneyt Status assigned => resolved
2019-07-11 22:34 Michael Van Canneyt Resolution open => fixed
2019-07-11 22:34 Michael Van Canneyt Fixed in Version => 3.3.1
2019-07-11 22:34 Michael Van Canneyt Fixed in Revision => 42351
2019-07-11 22:34 Michael Van Canneyt FPCTarget - => 3.2.0
2019-07-11 22:34 Michael Van Canneyt Note Added: 0117190