TAChart: sorted data autodetection for TListChartSource
Original Reporter info from Mantis: Marcin Wiazowski
-
Reporter name:
Original Reporter info from Mantis: Marcin Wiazowski
- Reporter name:
Description:
I'm attaching a patch, that introduces automatic detection of sorted data order in TListChartSource - which was already discussed a bit in ~115914 and ~116273.
In case, when TListChartSource detects, that its data points are sorted, TListChartSource.IsSorted() method returns True - even if Sorted property has not been explicitly set. This enables various optimizations and speed ups in chart handling - which is useful especially for time-series data.
Most of the implementation is placed in TCustomSortedChartSource, and TListChartSource was only adjusted a bit. TCustomSortedChartSource implements now an additional constructor, where an AUseSortedAutoDetected : Boolean parameter is passed - TListChartSource passes True, so new functionality is enabled, while TSortedChartSource passes False, so it still works as it was earlier (TSortedChartSource could also use the default constructor, but False is passed explicitly to show, that this is a deliberate decision).
The question is: when can we assume, that data is sorted? The answer is:
- if source is empty or has only one point,
OR
- if newly added/inserted point is still in the sorted order,
OR
- if modified point is still in the sorted order,
OR
- if DoSort() method has been successfully called.
Ad 1) We should monitor all the situations, when source's item count is decreased - which is possible only in TListChartSource.Clear() and TListChartSource.Delete() methods. To handle these cases, a new method has been introduced: TCustomSortedChartSource.ItemDeleted().
Ad 2) Adding/inserting data points is already fully monitored - TCustomSortedChartSource.ItemAdd() or TCustomSortedChartSource.ItemInsert() methods are called in these cases. So it's enough to adjust ItemAdd() and ItemInsert() as needed.
Ad 3) Modifying data points is already fully monitored - TCustomSortedChartSource.ItemModified() method is called in these cases. So it's enough to adjust ItemModified() as needed.
Ad 4) DoSort() is called only in TCustomSortedChartSource.Sort(). So it's enough to adjust Sort() as needed.
So the general idea of the introduced changes is rather simple. The TCustomSortedChartSource.IsSorted() implementation is now:
function TCustomSortedChartSource.IsSorted: Boolean;
begin
Result := false;
if FSorted or FSortedAutoDetected then
case FSortBy of
sbX:
Result := (FSortIndex = 0) or (FSortIndex < FXCount);
sbY:
Result := (FSortIndex = 0) or (FSortIndex < FYCount);
sbColor, sbText:
Result := true;
sbCustom:
Result := Assigned(FOnCompare);
end;
end;
As can be seen, to return True, either FSorted must be set (which means, that dataset has been sorted forcibly) or FSortedAutoDetected must be set (which means, that current dataset contents are sorted). In addition, current sorting parameters must be valid - otherwise FSorted / FSortedAutoDetected are meaningless.
So our task is to manage the FSortedAutoDetected variable. To achieve this at lowest level, two new methods have been added:
procedure TCustomSortedChartSource.ResetSortedAutoDetected;
begin
FSortedAutoDetected := FUseSortedAutoDetected and (FData.Count < 2) and (FSortBy <> sbCustom);
end;
procedure TCustomSortedChartSource.SetSortedAutoDetected;
begin
FSortedAutoDetected := FUseSortedAutoDetected and (FSortBy <> sbCustom);
end;
SetSortedAutoDetected() sets FSortedAutoDetected regardless of the current item count, so it is called only:
- after successful call to DoSort() - so there is guarantee, that all data points are sorted properly,
- in TListChartSource.CopyFrom(), when the whole sorted dataset is copied from some other source.
In all other cases ResetSortedAutoDetected() is called - this is either with the intention of setting FSortedAutoDetected to False, after adding/inserting/modifying data in unsorted order, or to try to set FSortedAutoDetected to True again, when some data point has been deleted. When changing sorting parameters, ResetSortedAutoDetected() is also called, because FSortedAutoDetected state is no longer valid.
On more observation is: the user is allowed to modify data points directly, when between BeginUpdate() and EndUpdate() calls. For this reason, TCustomSortedChartSource.GetItem() has been modified, so referencing the source's Item[] property - when in update mode - calls ResetSortedAutoDetected(). This is because calling Item[] allows to reference (and thus modify) data point's memory - so we can no longer be sure, that dataset is still sorted.
As a consequence - for internal needs - an ItemInternal[] protected property has been introduced. It just makes a call to the virtual GetItem() method - but then restores the FSortedAutoDetected state. ItemInternal[] is used internally, when there is guarantee, that: either data point is not modified (it is accessed in the read-only mode), or changes are handled properly, by making a call to ItemModified() / ItemDeleted() / etc. Because - of course - all dataset changes in TListChartSource's implementation are handled properly - all references to Item[] property, in TListChartSource's implementation, have been changed to ItemInternal[].
An advantage of having FSortedAutoDetected is also in TCustomSortedChartSource.Sort(): if FSortedAutoDetected is set, data is currently sorted - so Sort() just exits. Thanks to this, calling Sort() more than once does not consume additional time.
It's no longer assumed, that every call to Sort() finishes with a call to Notify(); for this reason, statements like:
if IsSorted then Sort else Notify;
were transformed into something like:
if IsSorted then Sort;
Notify;
But, to avoid useless notifications, a protected helper method was introduced:
procedure TCustomSortedChartSource.SortNoNotify;
begin
{ Don't call BeginUpdate() to avoid invalidating the caches. }
Inc(FUpdateCount);
try
Sort;
finally
Dec(FUpdateCount);
end;
end;
so in fact, statements are rather like:
if IsSorted then SortNoNotify;
Notify;
Using SortNoNotify() makes no sense, if already between BeginUpdate() .. EndUpdate() calls - in such cases, just Sort() is still called.
As can be seen in the ResetSortedAutoDetected() / SetSortedAutoDetected() implementation, FSortedAutoDetected is never set for FSortBy = sbCustom mode. This is because custom sorting function may use some external variables for its work - and changing these external variables may make the custom sorting function behaving differently. This means, that changing an external variable should lead to invalidating FSortedAutoDetected state - but we don't have a way of monitoring changes in custom external variables. This could be overcame by requiring the user to make a call to ResetSortedAutoDetected() in such cases - but there is rather no practical reason for using FSortedAutoDetected in sbCustom mode; this is because all the internal chart optimizations - like extent calculation - are designed to work only in sbX or sbY modes.
After applying changes, the SpeedTest application - attached to #35356 (closed) - returns exactly same results, with one exception: execution time of Test 3 is now much shorter, i.e. roughly equal to execution time of Test 2 (in fact even a bit shorter, due to some implementation differences). Explanation:
Test 2 is:
List.Sorted := true;
for I := 1 to 1000000 do
List.Add(I,I);
Test 3 is:
for I := 1 to 1000000 do
List.Add(I,I);
List.Sorted := true;
In Test 2, Sorted is set to True BEFORE adding any data - so Sort() is called, but there is no data to sort, so it exits immediately. Then new data is added in the sorted order - so there is still nothing to do.
In Test 3, Sorted is set to True AFTER adding data - so Sort() is called on the whole dataset. Without the patch, Sort() just sorts the existing data, which consumes much time. After applying the patch, FSortedAutoDetected is set to True (because data has been added in the sorted order), so Sort() implementation will just exit in this case.
This leads us to important conclusion: if we are going to add data in the sorted order, and also set Sorted property to True, it's no longer important to set Sorted BEFORE adding data points - this can be now done also AFTER adding the sorted data, without any performance penalty.
Although the patch introduces many changes, most of them are similar - like changing Item[] references to ItemInternal[] references in many places. I tested every line of the modified code, and everything worked properly for me.
Mantis conversion info:
- Mantis ID: 35681
- Build: 61331
- Version: 2.1 (SVN)