View Issue Details

IDProjectCategoryView StatusLast Update
0030664FPCRTLpublic2019-08-05 14:01
ReporterBenito van der ZanderAssigned ToMarco van de Voort 
PrioritynormalSeverityfeatureReproducibilityhave not tried
Status resolvedResolutionno change required 
Product Version3.1.1Product Build 
Target VersionFixed in Version 
Summary0030664: ideas for ghashmap
DescriptionThe ghashmap is extremely slow (e.g. compared to TFPDataHashTable from contnrs).

Here are some ideas to improve it:

* make the key const

* For large keys (e.g. strings), drop the second arg of hash, so it does not generate a hash from <0,n-1> but from <0,maxuint-1> and cache that hash in the map. Currently it is comparing full keys even if the full hash codes of those keys would be different, because it only gets the truncated hash codes.
Although for keys smaller than the hash that would not matter, it needs to adjust depending on the size.

* Drop TVector. The additional cache-miss from the indirection to the table array and in each container to the container array can hurt a lot. An array of arrays, or even an array of pointer blocks will be faster.


* Actually consider dropping the containers altogether. Open hashing is usually faster, despite the deletion complications. https://infosys.cs.uni-saarland.de/publications/p249-richter.pdf

TagsNo tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget-
Attached Files

Activities

Maciej Izak

2016-09-28 20:49

reporter   ~0094856

IIRC ghashmap uses chained hashing. We don't need to change ghashmap, current list of containers is not bad (I rather not to use "complex" word ;) ). Please note that we have new rtl-generics package which has many new kinds of hash maps. Generics.Collections contains following hashing schemes:

* linear probing (TOpenAddressingLP / TDictionary)
* linear probing with tombstones (TOpenAddressingLPT)
* quadratic probing (TOpenAddressingQP)
* double hashing (TOpenAddressingDH)
* de-amortized cuckoo hashing (TCuckooD2/TFastHashMap, TCuckooD4/THashMap, TCuckooD6)

Each of schemes is configurable. TCuckooD2 is almost always faster than TFPDataHashTable.

Benito van der Zander

2016-09-28 21:35

reporter   ~0094857

Last edited: 2016-09-28 21:52

View 2 revisions

>IRC ghashmap uses chained hashing. We don't need to change ghashmap, current list of containers is not bad (I rather not to use "complex" word ;) ).

Well, there is no point in having several or a bad hash map implementation.

>Please note that we have new rtl-generics package which has many new kinds of hash maps

It is called rtl-generics now?
I tried that first, but from https://github.com/dathox/generics.collections and it did even not compile with trunk. rtl-generics however did compile (despite only differening in an <T>), was installed, but not found. Now Lazarus crashed parsing it. Not really recommendable, yet.

> Each of schemes is configurable. TCuckooD2 is almost always faster than TFPDataHashTable.

Seems consistently slower on my system (random string keys). Only quadratic probing is faster than TFPObjectHashTable. Actually cuckoo is slower than ghashmap, perhaps fpc screws up some optimizations.

Or perhaps, when I use random keys, the hashes are optimally distributed, so the collision handling does not affect the speed, only the implementation overhead does.

Maciej Izak

2016-09-29 09:29

reporter   ~0094862

> It is called rtl-generics now? ... Not really recommendable, yet.

Lazarus is separate project so it has nothing about recommendations especially that we have bug/bugs on Lazarus side for rtl-generics. btw. Lazarus has own temporary package sparta_Generics (rtl-generics will be part of FPC 3.2 or 4.0 but I need Generics.Collections now for my Lazarus work for FPC 3.0.x). rtl-generics FPC package and sparta_Generics Lazarus package have the same source:

https://github.com/dathox/generics.collections

github repo is hosted separately since I am not FPC developer, so I need to host somewhere my library :P. Github repo has latest features and is synced with rtl-generics.

> Only quadratic probing is faster than TFPObjectHashTable. Actually cuckoo is slower than ghashmap

all depends of kind of data, that is why we have many hashing schemes. Each of hashing scheme has different load factor and memory layout. Usually the biggest performance impact is visible for big data amount. TCuckooD2 is very fast for big data count (is developed for fast lookup, almost 2x faster than any other technique, but for big data amount)

Thaddy de Koning

2016-09-29 10:48

reporter   ~0094867

That is also consistent - at least in line - with the results presented in the research paper that Benito referred to, btw. Interesting read,

Maciej Izak

2016-09-29 11:24

reporter   ~0094868

Last edited: 2016-09-29 11:24

View 2 revisions

IMO this "topic" should be closed and moved into fpc-devel or fpc-pascal mailing list.

Benito van der Zander

2016-09-29 12:25

reporter   ~0094869

No, this was not supposed to be a discussion or a comparison to generics.collection, but requests for the ghashmap implementation.

It will be faster with const arguments and array instead of tvector

Maciej Izak

2016-09-29 12:44

reporter   ~0094871

Last edited: 2016-09-29 12:46

View 2 revisions

OK. Please split this report to few other reports. This report has too many topics and looks for me more like discussion about hash maps.

Maciej Izak

2016-09-29 12:48

reporter   ~0094872

I have even better idea - create patch ;)

Marco van de Voort

2018-02-26 21:06

manager   ~0106655

I do agree this report is a bit open ended. Better submit concrete proposals, or go to the maillists for abstract discussion.

Marco van de Voort

2019-07-18 22:03

manager   ~0117311

rtl-generics has been trunk (and fixes_3_2) for a while now. No reaction to the report.

Last call before closing. If you want to respond, please phrase your answer in a narrow way (i.e. concrete changes/patches)

Marco van de Voort

2019-08-05 14:01

manager   ~0117574

No feedback in over an month

Issue History

Date Modified Username Field Change
2016-09-28 19:13 Benito van der Zander New Issue
2016-09-28 20:49 Maciej Izak Note Added: 0094856
2016-09-28 21:35 Benito van der Zander Note Added: 0094857
2016-09-28 21:52 Benito van der Zander Note Edited: 0094857 View Revisions
2016-09-29 09:29 Maciej Izak Note Added: 0094862
2016-09-29 10:48 Thaddy de Koning Note Added: 0094867
2016-09-29 11:24 Maciej Izak Note Added: 0094868
2016-09-29 11:24 Maciej Izak Note Edited: 0094868 View Revisions
2016-09-29 12:25 Benito van der Zander Note Added: 0094869
2016-09-29 12:44 Maciej Izak Note Added: 0094871
2016-09-29 12:46 Maciej Izak Note Edited: 0094871 View Revisions
2016-09-29 12:48 Maciej Izak Note Added: 0094872
2018-02-26 21:06 Marco van de Voort Severity minor => feature
2018-02-26 21:06 Marco van de Voort Note Added: 0106655
2019-07-18 22:03 Marco van de Voort Assigned To => Marco van de Voort
2019-07-18 22:03 Marco van de Voort Status new => feedback
2019-07-18 22:03 Marco van de Voort FPCTarget => -
2019-07-18 22:03 Marco van de Voort Note Added: 0117311
2019-08-05 14:01 Marco van de Voort Status feedback => resolved
2019-08-05 14:01 Marco van de Voort Resolution open => no change required
2019-08-05 14:01 Marco van de Voort Note Added: 0117574