TAChart: TCubicSplineSeries may omit some points when drawing
Original Reporter info from Mantis: Marcin Wiazowski
-
Reporter name:
Original Reporter info from Mantis: Marcin Wiazowski
- Reporter name:
Description:
Here comes a bug report, that I promised in #35250 (closed).
Due to nature of the cubic spline interpolation, the interpolated curve crosses every existing data point. But, currently, TCubicSplineSeries doesn't guarantee this.
As described in detail in #35250 (closed), TCubicSplineSeries - and also three other series classes - use an internal TIntervalList object when drawing. These are:
- TExpressionSeries - doesn't use a data source, so doesn't have any data points,
- TFuncSeries - as above,
- TFitSeries - uses data source for calculations, and the curve is allowed to run smoothly between data points,
- TCubicSplineSeries - uses data source for calculations, and the curve is required to cross every data point.
So the problem described here is specific to TCubicSplineSeries only, because only TCubicSplineSeries is required to cross every data point.
As described in detail in #35250 (closed), TCubicSplineSeries has a Step property. When drawing the series, for every pixel on the horizontal axis (when Step = 1) / every second pixel on the horizontal axis (when Step = 2) / etc. / etc., function's Y value is calculated, and a straight line is drawn to the last calculated (X, Y) point; TIntervalList implements this behavior.
But this means, that - when there is a local data peak between two adjacent X pixels - this peak can be omitted when drawing; the attached Explanation.png shows this case.
This effect can be easily achieved in practice. For example, when horizontal axis' range is one week (604800 seconds) and chart's width is 1000 pixels, we have about 605 seconds (10 minutes) per one X pixel (assuming the most demanding setting Step = 1) - so peak in data, that lasts only 5 minutes, in many cases will be omitted - or drawn only partially. And, for the default TCubicSplineSeries' setting Step = 4, things are even worse.
The attached Reproduce application shows symptoms of the problem. Chart1 reproduces the behavior presented in Explanation.png. On Chart2, at the right side, we can see a strange effect for some chart widths - this is because horizontal axis' end is exactly at X = 1000, and X = 1000 is located somewhere between two pixels, so - in fact - function's value is calculated not for X = 1000, but for the last visible pixel, which may be for example at X = 1000.8 - and func(1000.8) is much greater than func(1000), so the curve finishes its drawing much higher than it should.
Thanks to changes introduced in #35250 (closed), solution is quite simple: we may create an exclusion range - of width 0 - around every existing data point:
for ...
FIntervals.AddRange(FX[i], FX[i], [ioOpenStart, ioOpenEnd])
This way, curve drawing will be ended - and then also restarted - exactly at each ( FX[i], func(FX[i]) ) point, where - because we are exactly at the data point - func(FX[i]) = FY[i]. So we will effectively visit every data point when drawing the curve - which is exactly what we needed.
Surprisingly, this still doesn't work properly. Short investigation shows, that TIntervalList reveals some unexpected behavior. Let's assume, that we defined two excluded ranges in TIntervalList:
[10 .. 20] [30 .. 40]
Now we call TIntervalList.Intersect() to ask for a 0 .. 100 range.
Expected behavior: Intersect() returns a 10 .. 20 range - which means, that we should draw a line from X = 0 to X = 10, and then the next line should start at X = 20. Consecutive call to Intersect() should return a 30 .. 40 range, so the line should end at X = 30 and then the next-next line should start at X = 40.
Current behavior: Intersect() returns a 10 .. 40 range - which means, that we should draw a line from X = 0 to X = 10, and then the next line should start at X = 40.
This way, we omitted the 20 .. 30 range when drawing. The described behavior has been probably assumed to be an optimization when drawing: since we asked about range 0 .. 100, this probably means that our pixel N is at X = 0 and pixel N+1 is at X = 100 - so it may seem that there is no need to draw a line between X = 20 and X = 30.
In addition, making "AHint" parameter - that is passed to Intersect() call - larger than usual (although still in range), may change the returned range from 10 .. 40 to 30 .. 40 - this is definitely is a bug, since AHint is only to make the execution potentially faster, and not to change the returned result.
Fortunately, we can safely remove the described optimization. After removing, we will be also drawing the line between X = 20 and X = 30. In most cases, points (20, func(20)) and (30, func(30)) will be mapped to the same pixel on the chart, so MoveTo() and LineTo() calls will operate on the same pixel - which will effectively draw nothing for the X range from 20 to 30 - exactly as it is now.
But, sometimes, points (20, func(20)) and (30, func(30)) will be mapped to different pixels, so a line will be drawn - this will be for cases, when there is a large change in function's value between consecutive X pixels. For these cases, the line should really be drawn - exactly as in our TCubicSplineSeries example.
Currently, TAChart's code uses - internally - only two exclusion ranges: -INF .. FirstPointX and LastPointX .. +INF (for TFitSeries and TCubicSplineSeries). So the change will not affect currently existing TAChart's internals in any way.
And this is everything that is required, from the functional point of view, to solve the problem with drawing TCubicSplineSeries.
The attached patch1.diff implements the solution.
But it's good to take a look at one more thing: when drawing, many calls to TIntervalList.Intersect() are made - but this call is highly optimized. On the contrary, TIntervalList.AddRange() is not optimized in any way, so adding many data points is slow. The fix is quite simple: in TIntervalList.AddRange(), there are two "while" loops: one searches the already existing ranges from first to last, and the other from last to first. But, in all practical cases - including our TCubicSplineSeries case - ranges will be added in the ascending order, i.e. at the end of the currently existing range list. Making both the "while" loops working from the last to the first item, makes things dramatically faster.
The attached patch2.diff implements this improvement - and also fully includes patch1.diff.
There is still one thing to improve: in our case, each AddRange() call executes "SetLength(FIntervals, Length(FIntervals) + 1)", so memory is reallocated a lot of times. This is of course slow.
So the attached patch3.diff implements a commonly known "capacity" functionality - and also fully includes patch1.diff and patch2.diff.
====
There are some speed measurement applications attached here. Testing algorithm is:
-
Launch "IntervalsSpeedTest0and1and2" (it adds 50000 ranges to TIntervalList) without any patches.
-
Launch "IntervalsSpeedTest0and1and2" (it adds 50000 ranges to TIntervalList) with patch1.diff.
-
Launch "IntervalsSpeedTest0and1and2" (it adds 50000 ranges to TIntervalList) with patch2.diff.
Results on a low-end laptop:
-
5820 ms
-
5820 ms
-
15 ms
-
Launch "IntervalsSpeedTest3" (it adds 2000000 ranges to TIntervalList) with patch3.diff.
Results on a low-end laptop:
3a) without using "capacity" feature (nevertheless, there are still much less memory reallocations than with patch2.diff): 2465 ms
3b) with using "capacity" feature (there are no memory reallocations at all): 125 ms
4a) Launch "DrawingSpeedTest" without any patches.
4b) Launch "DrawingSpeedTest" with patch3.diff.
Results on a low-end laptop:
4a) 3.28 ms per drawing
4b) 5.00 ms per drawing
"DrawingSpeedTest" uses 2000000 data points, but displays only about 500 of them due to horizontal axis' Range settings - this simulates a situation, when we have many data points collected (for example from few years), but display only some subrange of them (for example only from one week). Drawing is repeated 100 times and the result is averaged. There is a noticeable difference in drawing speed between 4a) - where TIntervalList holds 2 ranges, i.e from -INF to 1 and from 2000000 to +INF, and 4b) - where TIntervalList holds 2000000 ranges, but the difference is acceptable. This means, that internally executed TIntervalList.Intersect() calls, even when operating on 2000000 ranges, are still optimized enough to not introduce too much slow down.
As expected, the best solution is patch3.diff.