View Issue Details

ID: 0036822
Project: FPC
Category: Utilities
View Status: public
Last Update: 2020-03-25 08:31
Reporter: Chris Rorden
Assigned To: Michael Van Canneyt
Priority: normal
Severity: feature
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: MacBook 2012 Retina 13"
OS: Darwin
Product Version: 3.0.4
Summary: 0036822: zstream fails for concatenated gz files
Description: The gzip format allows multiple compressed streams to be concatenated; a decompressor treats the concatenation as if it were originally one file (1). In other words, a single file can be split into parts, each part compressed with gzip, and the resulting gz files stacked into a single file in sequential order. When decompressed, the result should be identical to the input file. This strategy is used by the bgzf format and mgzip. The advantages include faster random access, faster compression and faster decompression, at a small cost in compression efficiency. The files created by bgzf and mgzip are fully compatible with the gzip standard, but the Pascal zstream unit appears to read only the first block.
 


1.) https://en.wikipedia.org/wiki/Gzip
2.) http://samtools.github.io/hts-specs/SAMv1.pdf
3.) https://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
4.) https://pypi.org/project/mgzip/
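
The behaviour described above can be illustrated with a minimal sketch that uses only Python's standard `gzip` and `zlib` modules. The two byte strings are made-up sample data, and the single-member decoder is only an analogy for a reader that stops after the first stream; it is not a description of how zstream works internally.

```python
import gzip
import zlib

part_a = b"first part of the file\n"
part_b = b"second part of the file\n"

# Compress each part on its own and stack the results into one byte string.
stacked = gzip.compress(part_a) + gzip.compress(part_b)

# A reader that handles concatenated members returns the original data.
full = gzip.decompress(stacked)
assert full == part_a + part_b

# A decoder that handles one member only stops at the end of the first
# stream; the second member is left untouched in unused_data.
d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects the gzip wrapper
first_only = d.decompress(stacked)
print(len(full), len(first_only), len(d.unused_data) > 0)
```

The single-member decoder returns only `part_a`, which is the same kind of short read this report describes.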
Steps To Reproduce: Create a blocked gzip file. For this example, I compressed a 1147232-byte file to 284864 bytes using the following Python code:

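import mgzip

fnm = 'img.nii'
# Compress the whole file in 100000-byte blocks (blocksize=10**5).
# The with statement closes both files even if an error occurs.
with open(fnm, "rb") as fh:
    with mgzip.open(fnm + ".gz", "wb", compresslevel=9, blocksize=10**5) as gh:
        gh.write(fh.read())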

Now try to extract this with zstream. It reports only 200000 bytes, and an attempt to read more raises an exception.
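
For comparison, here is a hedged sketch of what member-by-member reading looks like, again in standard-library Python. The helper name `iter_gzip_members` and the synthetic three-block input are assumptions made for illustration; this is not the zstream implementation or a proposed patch, only the general loop a multi-member reader performs.

```python
import gzip
import zlib


def iter_gzip_members(raw: bytes):
    """Yield the decompressed payload of each gzip member in raw, in order."""
    while raw:
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        payload = d.decompress(raw)
        if not d.eof:
            raise ValueError("truncated gzip member")
        yield payload
        raw = d.unused_data  # bytes that follow this member, if any


# Synthetic stand-in for a blocked file: three independently compressed blocks.
blocks = [bytes([i]) * 1000 for i in range(3)]
raw = b"".join(gzip.compress(b) for b in blocks)

sizes = [len(p) for p in iter_gzip_members(raw)]
print(sizes, sum(sizes))
```

The key step is restarting the decoder on the leftover input after each member ends, rather than treating the end of the first member as the end of the file.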
Tags: No tags attached.
Fixed in Revision:
FPCOldBugId:
FPCTarget: -
Attached Files

Activities

Chris Rorden

2020-03-24 21:41

reporter  

img.nii.gz (284,864 bytes)
decomp.pas (849 bytes)   
program decomp;

{$mode Delphi}{$H+}

uses
  Classes, SysUtils, zstream;

// Count the decompressed bytes that zstream can read from a gzip file.
function gzBytes(fnm: string): integer;
const
  CHUNKSIZE = 4096;
var
  gz: TGZFileStream;
  chunk: array of byte;
  cnt: integer;
begin
  Result := 0;
  SetLength(chunk, CHUNKSIZE);
  gz := TGZFileStream.Create(fnm, gzopenread);
  try
    repeat
      cnt := gz.Read(chunk[0], CHUNKSIZE);
      Result := Result + cnt;
    until cnt < CHUNKSIZE; // a short read marks the end of the data
  finally
    gz.Free;
  end;
end;

// Read the full expected payload in one call.
// ReadBuffer raises an exception if fewer than kOutSz bytes are available.
procedure unGz(fnm: string);
const
  kOutSz = 1147232;  // size of the uncompressed img.nii
  //kOutSz = 200000; // the number of bytes zstream reports
var
  Stream: TGZFileStream;
  fRawVolBytes: array of byte;
begin
  Stream := TGZFileStream.Create(fnm, gzopenread);
  try
    SetLength(fRawVolBytes, kOutSz);
    Stream.ReadBuffer(fRawVolBytes[0], kOutSz);
  finally
    Stream.Free;
  end;
end;

begin
  Writeln(IntToStr(gzBytes('img.nii.gz')));
  unGz('img.nii.gz');
end.


Michael Van Canneyt

2020-03-25 08:31

administrator   ~0121701

I can confirm that gzip extracts the full file.

Issue History

Date Modified     Username             Field                    Change
2020-03-24 21:41  Chris Rorden         New Issue
2020-03-24 21:41  Chris Rorden         File Added: img.nii.gz
2020-03-24 21:41  Chris Rorden         File Added: decomp.pas
2020-03-24 22:33  Marco van de Voort   Severity                 minor => feature
2020-03-24 22:33  Marco van de Voort   FPCTarget                => -
2020-03-25 08:31  Michael Van Canneyt  Assigned To              => Michael Van Canneyt
2020-03-25 08:31  Michael Van Canneyt  Status                   new => acknowledged
2020-03-25 08:31  Michael Van Canneyt  Note Added: 0121701