View Issue Details

ID: 0036822
Project: FPC
Category: Utilities
View Status: public
Last Update: 2021-06-08 15:48
Reporter: Chris Rorden
Assigned To: Michael Van Canneyt
Priority: normal
Severity: feature
Reproducibility: always
Status: closed
Resolution: fixed
Platform: MacBook 2012 Retina 13"
OS: Darwin
Product Version: 3.0.4
Fixed in Version: 3.3.1
Summary: 0036822: zstream fails for concatenated gz files
Description: The gzip format allows multiple compressed streams (members) to be concatenated. Such a file decompresses to the concatenation of the members' contents, as if it had originally been one file (1). In other words, a single file can be split into parts, each part compressed with gzip, and the resulting .gz files stacked into a single file in order. Decompressing that file should reproduce the input exactly. This strategy is used by the bgzf format (2, 3) and by mgzip (4). The advantages include faster random access and faster compression and decompression, at a small cost in compression efficiency. Files created by bgzf and mgzip are fully compatible with the gzip standard, but the Pascal zstream unit only appears to read the first block.

(1) https://en.wikipedia.org/wiki/Gzip
(2) http://samtools.github.io/hts-specs/SAMv1.pdf
(3) https://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
(4) https://pypi.org/project/mgzip/
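The concatenation property described above can be checked with Python's standard gzip module, whose `gzip.decompress` reads every member of a multi-member file. The helper `blocked_gzip` is hypothetical: a minimal sketch that compresses fixed-size blocks independently and joins the resulting members. It assumes plain gzip members and omits any extra header fields that bgzf or mgzip may add for indexing.

```python
import gzip


def blocked_gzip(data: bytes, blocksize: int = 100_000) -> bytes:
    """Hypothetical helper: compress each block as its own gzip member.

    The members are simply concatenated; a conforming gzip reader must
    return the concatenation of all members' contents.
    """
    return b"".join(
        gzip.compress(data[i:i + blocksize], compresslevel=9)
        for i in range(0, len(data), blocksize)
    )


if __name__ == "__main__":
    original = bytes(range(256)) * 1000          # 256000 bytes -> 3 blocks
    blob = blocked_gzip(original)
    print(gzip.decompress(blob) == original)     # multi-member round trip
```

Compressing blocks independently is what makes random access and per-block parallelism possible; the cost is that each block restarts the compressor's history, which is where the small loss in compression ratio comes from.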
Steps To Reproduce: Create a blocked gzip file. For example, I compressed a 1147232-byte file to 284864 bytes using the following Python code:

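import mgzip

fnm = 'img.nii'
# Read the whole uncompressed input file
with open(fnm, "rb") as fh:
    data = fh.read()
# Write it as a blocked gzip file: mgzip compresses each 10**5-byte block
# as a separate gzip member and concatenates the members
with mgzip.open(fnm + ".gz", "wb", compresslevel=9, blocksize=10**5) as gh:
    gh.write(data)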

Now try to extract this file with the zstream unit. It only finds 200000 bytes (two of the 10**5-byte blocks), and trying to read more raises an exception.
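To get the expected size independently of zstream, the count printed by `gzBytes` in the attached decomp.pas can be compared with Python's gzip module, which continues into the next member when one ends. The function `count_gz_bytes` is a hypothetical sketch that mirrors `gzBytes` by reading in 4096-byte chunks.

```python
import gzip


def count_gz_bytes(path: str, chunksize: int = 4096) -> int:
    """Hypothetical helper: decompressed size of a (possibly blocked) gzip file.

    gzip.open reads across member boundaries, so every block is counted.
    """
    total = 0
    with gzip.open(path, "rb") as gz:
        while True:
            chunk = gz.read(chunksize)
            if not chunk:  # an empty read marks the end of the last member
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    print(count_gz_bytes("img.nii.gz"))
```

For the reporter's img.nii.gz this should print 1147232, the size of the original img.nii. The loop stops on an empty read rather than a short one, so it does not depend on how chunk reads line up with member boundaries.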
Tags: No tags attached.
Fixed in Revision: 49421
FPCOldBugId:
FPCTarget: 4.0.0
Attached Files

Activities

Chris Rorden

2020-03-24 20:41

reporter  

img.nii.gz (284,864 bytes)
decomp.pas (849 bytes)   
program decomp;

{$mode Delphi}{$H+}

uses
  Classes, SysUtils, zstream;

// Count the bytes that TGZFileStream can decompress from fnm
function gzBytes(fnm: string): integer;
const
  CHUNKSIZE = 4096;
var
  gz: TGZFileStream;
  chunk: array of byte;
  cnt, sum: integer;
begin
  sum := 0;
  SetLength(chunk, CHUNKSIZE);
  gz := TGZFileStream.Create(fnm, gzopenread);
  try
    // A short read signals the end of the decompressed data
    repeat
      cnt := gz.Read(chunk[0], CHUNKSIZE);
      sum := sum + cnt;
    until cnt < CHUNKSIZE;
  finally
    gz.Free;
  end;
  Result := sum;
end;

// Decompress the whole file in one call; ReadBuffer raises an
// exception if fewer than kOutSz bytes can be read
procedure unGz(fnm: string);
const
  kOutSz = 1147232; // size of the original, uncompressed img.nii
  //kOutSz = 200000;
var
  Stream: TGZFileStream;
  fRawVolBytes: array of byte;
begin
  SetLength(fRawVolBytes, kOutSz);
  Stream := TGZFileStream.Create(fnm, gzopenread);
  try
    Stream.ReadBuffer(fRawVolBytes[0], kOutSz);
  finally
    Stream.Free;
  end;
end;

begin
  Writeln(IntToStr(gzBytes('img.nii.gz')));
  unGz('img.nii.gz');
end.


Michael Van Canneyt

2020-03-25 07:31

administrator   ~0121701

I can confirm that gzip extracts the full file.

Chris Rorden

2021-05-28 14:05

reporter   ~0131070

geraldholdswor provides some code to read such files:
https://forum.lazarus.freepascal.org/index.php/topic,54796.0.html

Michael Van Canneyt

2021-05-31 20:09

administrator   ~0131103

gzio has support for reading concatenated files.

You created a blocked file, and 2 blocks were being read,
but at the end of the second block, an error occurred.

A check was using the 'total-over-blocks' instead of the 'total-for-this-block' so it gave an error at the end of the second block.

I fixed the check, the workaround is not necessary.
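In the gzip format each member ends with an 8-byte trailer holding the CRC-32 and the uncompressed size (mod 2^32) of that member alone. If a size check compares the trailer against the running total over all members instead, it passes at the end of the first block (the totals still agree) and fails at the end of the second. That matches the 200000 bytes reported above. The function `read_gzip_members` is a hypothetical sketch of a member-by-member reader that validates each trailer against per-member values. It is not the gzio code that was fixed.

```python
import struct
import zlib


def read_gzip_members(data: bytes) -> bytes:
    """Hypothetical sketch: decompress every member of a concatenated gzip file.

    Each member's trailer is checked against that member's own CRC-32 and
    length, never against totals accumulated over earlier members.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        if data[pos:pos + 3] != b"\x1f\x8b\x08":  # magic + CM=deflate
            raise ValueError(f"no gzip member at offset {pos}")
        flags = data[pos + 3]
        pos += 10  # fixed header: magic, CM, FLG, MTIME, XFL, OS
        if flags & 0x04:  # FEXTRA: 2-byte little-endian length + payload
            (xlen,) = struct.unpack_from("<H", data, pos)
            pos += 2 + xlen
        if flags & 0x08:  # FNAME: zero-terminated original file name
            pos = data.index(b"\x00", pos) + 1
        if flags & 0x10:  # FCOMMENT: zero-terminated comment
            pos = data.index(b"\x00", pos) + 1
        if flags & 0x02:  # FHCRC: 2-byte header CRC (not verified here)
            pos += 2
        # Raw deflate (negative wbits): zlib does not parse gzip framing
        d = zlib.decompressobj(-zlib.MAX_WBITS)
        member = d.decompress(data[pos:]) + d.flush()
        if not d.eof:
            raise ValueError("truncated deflate stream")
        trailer = d.unused_data  # starts with this member's 8-byte trailer
        if len(trailer) < 8:
            raise ValueError("missing gzip trailer")
        crc, isize = struct.unpack_from("<II", trailer, 0)
        if crc != zlib.crc32(member):
            raise ValueError("CRC mismatch")
        if isize != len(member) & 0xFFFFFFFF:  # per-member size only
            raise ValueError("length mismatch")
        out += member
        pos = len(data) - len(trailer) + 8  # next member follows the trailer
    return bytes(out)
```

Resetting the CRC and length state at the start of every member is the key design point: the reader accumulates output across members, but validation is strictly local to each one.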

Chris Rorden

2021-06-08 15:48

reporter   ~0131201

Thanks, this works perfectly!

Issue History

Date Modified Username Field Change
2020-03-24 20:41 Chris Rorden New Issue
2020-03-24 20:41 Chris Rorden File Added: img.nii.gz
2020-03-24 20:41 Chris Rorden File Added: decomp.pas
2020-03-24 21:33 Marco van de Voort Severity minor => feature
2020-03-24 21:33 Marco van de Voort FPCTarget => -
2020-03-25 07:31 Michael Van Canneyt Assigned To => Michael Van Canneyt
2020-03-25 07:31 Michael Van Canneyt Status new => acknowledged
2020-03-25 07:31 Michael Van Canneyt Note Added: 0121701
2021-05-28 14:05 Chris Rorden Note Added: 0131070
2021-05-31 20:09 Michael Van Canneyt Status acknowledged => resolved
2021-05-31 20:09 Michael Van Canneyt Resolution open => fixed
2021-05-31 20:09 Michael Van Canneyt Fixed in Version => 3.3.1
2021-05-31 20:09 Michael Van Canneyt Fixed in Revision => 49421
2021-05-31 20:09 Michael Van Canneyt FPCTarget - => 4.0.0
2021-05-31 20:09 Michael Van Canneyt Note Added: 0131103
2021-06-08 15:48 Chris Rorden Status resolved => closed
2021-06-08 15:48 Chris Rorden Note Added: 0131201