TCSVParser raises an unhandled EReadError exception when reading from a file containing fewer than 3 bytes and DetectBOM is True
Original Reporter info from Mantis: sileno
- Reporter name: Sileno Gödicke
Description:
Analysis and Cause of the Bug:
The issue is in TCSVParser.ResetParser, at line 460 of csvreadwrite.pp, where FSourceStream.ReadBuffer(b[0], 3) expects to read three bytes from the stream for the BOM check. If the file stream contains fewer than three bytes, e.g. because it is a very short ANSI-encoded file (no BOM), TStream.ReadBuffer raises the EReadError exception. From the online help: "ReadBuffer reads Count bytes of the stream into Buffer. If the stream does not contain Count bytes, then an exception is raised."
A bug fix is proposed in the Additional information section.
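The behaviour described in the analysis can be observed without the CSV parser at all. The following is a minimal sketch, assuming a TMemoryStream filled with two arbitrary bytes stands in for the short file; the program name and variable names are illustrative only. TStream.Read reports how many bytes it actually delivered, whereas TStream.ReadBuffer is documented to raise when it cannot deliver the full count.

```pascal
program ReadVsReadBuffer;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  S: TMemoryStream;
  B: packed array[0..2] of Byte;
  Count: LongInt;
begin
  S := TMemoryStream.Create;
  try
    // Two bytes only: fewer than the three bytes the BOM check asks for.
    S.WriteByte(Ord('a'));
    S.WriteByte(Ord(','));
    S.Position := 0;

    // Read does not raise on a short stream; it returns the byte count it got.
    Count := S.Read(B[0], 3);
    WriteLn('Read returned ', Count, ' byte(s)');

    S.Position := 0;
    try
      // ReadBuffer insists on all three bytes.
      S.ReadBuffer(B[0], 3);
      WriteLn('ReadBuffer succeeded');
    except
      on E: EReadError do
        WriteLn('ReadBuffer raised ', E.ClassName);
    end;
  finally
    S.Free;
  end;
end.
```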
Steps to reproduce:
The bug is systematically reproducible as follows:
- Create a TCSVParser object
- Set DetectBOM to True
- Create a file stream pointing to a CSV file containing zero, one or two bytes
- Assign the stream to the parser by calling TCSVParser.SetSource(AStream: TStream)
- An EReadError exception is raised
A simple test application showing the bug symptoms, together with a test data file, is attached. Fill TestFile.csv with 0, 1, 2 or 3 characters and run the ParserBOMBug.pas program.
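For readers without the attachment, the steps above can be sketched roughly as follows. This is not the attached ParserBOMBug.pas: it is a hypothetical stand-in that assumes a two-byte TMemoryStream behaves like the short file stream, and the program and variable names are made up for illustration. On an unpatched FCL the call to SetSource is expected to end in the except branch.

```pascal
program ParserBOMRepro;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, csvreadwrite;

var
  Parser: TCSVParser;
  Source: TMemoryStream;
begin
  Parser := TCSVParser.Create;
  Source := TMemoryStream.Create;
  try
    // Assumption: an in-memory stream with two bytes replaces the short CSV file.
    Source.WriteByte(Ord('a'));
    Source.WriteByte(Ord(','));

    Parser.DetectBOM := True;
    try
      // SetSource resets the parser, which performs the BOM check.
      Parser.SetSource(Source);
      WriteLn('No exception: the parser accepted the short stream');
    except
      on E: EReadError do
        WriteLn('EReadError raised: ', E.Message);
    end;
  finally
    Parser.Free;
    Source.Free;
  end;
end.
```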
Tested on Win 7 and Win 10 with Lazarus 1.8.4 win32 and FPC 3.0.4.
Debugging the FCL is not easy because it is not compiled with debug info. If you copy csvdocument.pp into the project directory and recompile it, you will be able to follow the code in this unit with the debugger.
Additional information:
Bug Fix: A check of the stream length is made before attempting to read data. If the stream size is less than three bytes we can assume that there is no BOM.
Old Code
procedure TCSVParser.ResetParser; {from Line 451 of csvreadwrite.pp}
var
  b: packed array[0..2] of byte;
  n: Integer;
begin
  ClearOutput;
  FSourceStream.Seek(0, soFromBeginning);
  if FDetectBOM then
  begin
    // Raises EReadError when the stream holds fewer than 3 bytes
    FSourceStream.ReadBuffer(b[0], 3);
    if (b[0] = $EF) and (b[1] = $BB) and (b[2] = $BF) then begin
      FBOM := bomUTF8;
      n := 3;
    end else
    if (b[0] = $FE) and (b[1] = $FF) then begin
      FBOM := bomUTF16BE;
      n := 2;
    end else
    if (b[0] = $FF) and (b[1] = $FE) then begin
      FBOM := bomUTF16LE;
      n := 2;
    end else begin
      FBOM := bomNone;
      n := 0;
    end;
    FSourceStream.Seek(n, soFromBeginning);
  end;
  EndOfFile := False;
  NextChar;
end;
New Code
procedure TCSVParser.ResetParser; {from Line 451 of csvreadwrite.pp}
var
  b: packed array[0..2] of byte;
  n: Integer;
begin
  ClearOutput;
  FSourceStream.Seek(0, soFromBeginning);
  if FDetectBOM then
  begin
    // Only attempt the 3-byte read when the stream is long enough
    if FSourceStream.Size >= 3 then
    begin
      FSourceStream.ReadBuffer(b[0], 3);
      if (b[0] = $EF) and (b[1] = $BB) and (b[2] = $BF) then begin
        FBOM := bomUTF8;
        n := 3;
      end else
      if (b[0] = $FE) and (b[1] = $FF) then begin
        FBOM := bomUTF16BE;
        n := 2;
      end else
      if (b[0] = $FF) and (b[1] = $FE) then begin
        FBOM := bomUTF16LE;
        n := 2;
      end else begin
        FBOM := bomNone;
        n := 0;
      end
    end
    else begin
      // Stream shorter than 3 bytes: assume there is no BOM
      FBOM := bomNone;
      n := 0;
    end;
    FSourceStream.Seek(n, soFromBeginning);
  end;
  EndOfFile := False;
  NextChar;
end;
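The size guard can also be tried outside the FCL. The sketch below is an illustration only: BOMLength is a hypothetical helper, not part of csvreadwrite.pp, that applies the same Size >= 3 check to any TStream and returns the number of BOM bytes to skip. The sample data (a UTF-8 BOM followed by one character) is an assumption chosen so that streams of 0 to 4 bytes can be tested in a loop.

```pascal
program BOMGuardSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper, not part of the FCL: returns the BOM length in bytes
// (0 when no BOM is detected), using the same size guard as the proposed fix.
function BOMLength(AStream: TStream): Integer;
var
  B: packed array[0..2] of Byte;
begin
  Result := 0;
  AStream.Seek(0, soFromBeginning);
  if AStream.Size >= 3 then
  begin
    AStream.ReadBuffer(B[0], 3);
    if (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
      Result := 3   // UTF-8
    else if (B[0] = $FE) and (B[1] = $FF) then
      Result := 2   // UTF-16 big endian
    else if (B[0] = $FF) and (B[1] = $FE) then
      Result := 2;  // UTF-16 little endian
  end;
  // Leave the stream positioned just after the BOM, if any.
  AStream.Seek(Result, soFromBeginning);
end;

const
  // UTF-8 BOM followed by the letter 'A'
  Sample: array[0..3] of Byte = ($EF, $BB, $BF, $41);
var
  S: TMemoryStream;
  N: Integer;
begin
  // Build streams holding the first 0, 1, 2, 3 and 4 bytes of Sample.
  for N := 0 to Length(Sample) do
  begin
    S := TMemoryStream.Create;
    try
      if N > 0 then
        S.WriteBuffer(Sample[0], N);
      WriteLn(N, ' byte(s) in stream: BOM length ', BOMLength(S));
    finally
      S.Free;
    end;
  end;
end.
```

With this data the helper should report a BOM length of 0 for the streams of 0, 1 and 2 bytes, without raising, and 3 for the longer ones. As in the proposed fix, a stream that holds only a two-byte UTF-16 BOM is treated as having no BOM.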
Mantis conversion info:
- Mantis ID: 33886
- OS: Windows
- OS Build: 10
- Platform: i386
- Version: 3.0.4
- Fixed in version: 3.1.1
- Fixed in revision: 39324 (#b0c0102d)
- Monitored by: » sileno (Sileno Gödicke)
- Target version: 3.2.0