[Patch] "LEB128" Encoding to reduce PPU file size by 10%
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This is a new facility in the compiler that reduces the size of PPU files by approximately 10% with only a very small performance hit. It does so by reducing the number of bytes stored for size, count, length and similar fields, which usually hold small values.
To achieve this, it uses the LEB128 encoding system (see https://en.wikipedia.org/wiki/LEB128 for a technical description), which is already in use within the compiler for the DWARF debug file format. Instead of using a full 4 or 8 bytes to store values that are frequently less than, say, 100 (but which must support the full range), LEB128 stores such small values in a single byte.
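For readers unfamiliar with the encoding, here is a minimal sketch of unsigned LEB128 in Python. It is illustrative only and is not taken from the patches; the function names encode_uleb128 and decode_uleb128 are made up for this example. Each output byte carries 7 bits of the value, and the top bit signals that another byte follows.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 (illustrative helper)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of what is left
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode one unsigned LEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


# A field holding 99 takes one byte instead of a fixed 4 or 8.
print(encode_uleb128(99).hex())       # 63
print(encode_uleb128(624485).hex())   # e58e26
```

Any unsigned value below 128 fits in one byte; larger values grow by one byte for every additional 7 bits.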
Steps to reproduce:
Apply patches and confirm correct operation and reduced file size of PPU files.
The "leb_" patches are independent of one another, although all depend on "leb_base.patch", which contains the encoding implementation (and increments the PPU version by 1).
"leb_misc_util_ppudump.patch" updates the ppudump utility to correctly decode LEB128-encoded fields (also requires "leb_base.patch").
Additional information:
Compared to Varlen ( http://wiki.freepascal.org/Varlen_Encoding ), some space efficiency is lost (when compiling Lazarus, the resultant 907 PPU files were 83,286 bytes larger in total), but LEB128 is a known, proven standard.
These minor inefficiencies are due to the following reasons:
- LEB128 requires 10 bytes to store the largest 64-bit values, whereas Varlen only requires 9.
- Varlen uses offsets to remove duplicate encodings and increase the chance of a value requiring fewer bytes to store (e.g. 16,384 requires 3 bytes to store under LEB128, whereas Varlen only requires 2).
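The byte counts above can be checked with a short Python sketch. uleb128_size follows the standard unsigned LEB128 rule of 7 payload bits per byte. offset_scheme_size is a hypothetical model of the offset idea only (each length starts counting where the shorter length ran out); it is not the actual Varlen format described on the wiki, and it does not model Varlen's 9-byte maximum.

```python
def uleb128_size(value):
    """Number of bytes unsigned LEB128 needs for a non-negative value."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def offset_scheme_size(value):
    """Byte count under a HYPOTHETICAL offset scheme (not the real Varlen).

    Assumption: an n-byte encoding has 7*n payload bits, and its range
    begins right after the largest value the (n-1)-byte form can hold,
    so no value has two encodings.
    """
    size = 1
    span = 1 << 7                 # how many values fit in `size` bytes
    while value >= span:
        value -= span             # skip the values covered by shorter forms
        size += 1
        span = 1 << (7 * size)
    return size


print(uleb128_size(16384), offset_scheme_size(16384))   # 3 2
print(uleb128_size(2**64 - 1))                           # 10
```

Under this model the 2-byte form covers 128 to 16,511 rather than 0 to 16,383, which is why 16,384 still fits in 2 bytes.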