View Issue Details

IDProjectCategoryView StatusLast Update
0038118LazarusFCLpublic2020-12-14 05:44
ReporterJBThiel Assigned To 
PrioritynormalSeverityminorReproducibilityalways
Status newResolutionopen 
PlatformFPC 3.0.4 vs FPC 2.6.4OSLinux GTK2 
Product Version1.8.4 
Summary0038118: Extreme Slowdown in FPC 3.0.x versus 2.6.4, DBGrid Sqlite3 master-detail
DescriptionI am working to migrate a Lazarus App to newer FPC 3.0.x
From prior FPC 2.6.4, and Lazarus versions 1.2.x, 1.4.x, 1.6.x, 1.8.x
on Linux GTK2
Testing described below is all with Lazarus 1.8.4 which compiles with both FPC 2.6.4 and 3.0.4

App design is master-detail DBGrids, on SQLite v3.8.5 database, with associated SQLConnections, Datasets, etc.


PROBLEM: With FPC 3.0.4, master-detail DBGrid is extremely much slower, the app is unusable.

To illustrate:
Cursor Next Line (Down arrow) in a master DBGrid, the cursor should move and detail grid update.
With FPC 2.6.4 is instantaneous.
With FPC 3.0.4 this takes about 1+ second.

During the delay, it's 100% usage on a single cpu, and the whole app is frozen, menus don't work, etc.
When it comes back, the cursor moves and detail grid updates, simultaneously (as expected), and
correct data is displayed.
Does not seem to be dependent on how many records are in the detail.

Also, I noticed if you key Next Line 2x, the delay will be 2x, and the final update will show the 2nd line down.
and likewise 3x, 4x, etc. Also note PgDown takes same amount of time as NextLine.
(ie. the delay seems specifically related to time for the detail grid/query to track the newly selected master row)
Additional InformationI have done some valgrind Profiling, which does not show obvious runaway functions that I can distinguish from normal operation.
It's basically all idle loop and handle message, context dispatch, widget events, alloc, free, etc.

I have 2 suspect areas so far:

1) Possible Bug(s) in FCL event handlers, spurious/repeated events, exceptions, etc.

2) Comparing execution profile of FPC 304 with 264, I see considerably more ANSISTRING manipulations.
I noticed there are many changes in the FCL Sqlite code for 3.0.4, like sqlite3conn.pp and sqlite3.inc,
and all Sqlite external function declarations with pchar were changed to pansichar.
Also AnsiStrings now have Unicode/CodePage support in FPC 3.
Do I need to make some settings to account for this? I'm unfamiliar with codepage and basically ignoring it.

In particular, when profiling these functions stand out for FPC 304, are not in FPC 264:
  ( 4.06%) ???:SYSUTILS_$$_INTERNALCHANGECASE$ANSISTRING$TSYSCHARSET$LONGINT$$ANSISTRING
  ( 1.55%) ???:fpc_ansistr_compare_equal
  ( 1.48%) ???:CWSTRING_$$_UPPERANSISTRING$ANSISTRING$$ANSISTRING
Instead of the last one above, FPC 264 has
  ( 1.51%) ???:SYSUTILS_UPPERCASE$ANSISTRING$$ANSISTRING


REPORT is two-fold:

1) What is going wrong, what is the bug, what changed in FPC or FCL 3.0.x that could be causing such extreme performance drop?
  (also, should I upgrade my Sqlite version, from 3.8.5, which is somewhat older)

2) I would like to provide further detail, but what debugging techniques can be used for this kind of problem?
How can I find and measure effects like spurious/repeated events, excessive ansistrings, etc?
The results are all "correct".
If I catch specific events and step thru the debugger, each step basically happens "fast".
Just that overall it is 10x or 100x slower.

How can I find what it's doing during these 1+ second freezes?
like profile exactly all function calls between key NextLine until it refreshes all grid displays.

TagsNo tags attached.
Fixed in Revision
LazTarget
WidgetsetGTK 2
Attached Files

Relationships

related to 0038009 new TMemo.Modifed/OnChange handling is inconsistent and wrong on Assignments via TMemo.Text := 'aString' 

Activities

Juha Manninen

2020-11-23 10:09

developer   ~0127134

FPC 3.0.4 is old. How does FPC 3.2 behave?

JBThiel

2020-12-06 19:04

reporter   ~0127380

I reported on 2.6.4 to 3.0.4 to provide a near-minimal increment that produced the error, to help localize
and keep the same Lazarus 1.8.4.
This narrows the problem for sure to FCL.
FPC 3.2.0 doesn't compile Lazarus 1.8.4, nor Lazarus 2.0.0

Now I have installed and tested FPC 3.2.0 with Lazarus 2.0.10-2

The error is reproduced with FPC 3.2.0, the same extreme slowdown as described, and seems even worse.
Now it's like 2+ seconds or more, sometime 5-10+ seconds, going NextLine which updates the master-detail.
As before the operation and data is correct, just extremely slow.

With brief testing, I noticed another problem with FPC 3.2.0 / Laz 2.0.10-2, which is that all DBGrids appear to be forced into Edit mode
immediately, with a vertical bar in the grey Row header, instead of the black triangle. (I have not fully confirmed it's in dsEdit mode yet)
I previously detected this effect was caused by bugs with TMemo.Modified/OnChange handling
which I reported as
  0038009 TMemo.Modifed/OnChange handling is inconsistent and wrong on Assignments via TMemo.Text := 'aString'
Perhaps the workaround I supplied there is no longer working.
I don't know if it's the same bug as before, but could be why FPC 3.2.0 is now even slower.

I will try to investigate further, but it is extremely difficult to test and use my app with these newer FPC versions;
it's basically unresponsive, the slightest grid cursor movement takes seconds.

I will also test some newer versions of SQLite3 to see if any change/improvement from that. (does FCL have a dependency on which Sqlite3 version?)

Note also, for completeness in case it matters, I am using a custom static linkage directly to my compiled sqlite3.o object file.
This requires some changes to sqlite3conn.pp,
and a custom sqlite3static.pp I have made from sqlite3.inc (analogous to sqlite3dyn.pp)
(I will be filing a separate report to encourage better FCL support for direct linking to sqlite3.o object file,
which is really the preferred linkage for SQLite.)

I think this should not matter, I'm doing same in FPC 2.6.4 which is fast, but let me know if there are any issues to consider around this.
Related to the static .o linking, I have to outcomment a few lines in sqlite3conn.pp
which assume some sqlite3 function names are dynamically loaded. (by testing with Assigned -- the code should not do that)
Diffs shown below, I'm not sure if I have this SQLITE_ENABLE_COLUMN_METADATA enabled in my sqlite3.o
or if the functions mentioned here would affect performance for the slowdown issue,
  sqlite3_column_origin_name
  sqlite3_table_column_metadata
  sqlite3_column_database_name
  sqlite3_column_table_name

diff -c ../fpc-fcl-source/fpc320/sqlite3conn.pp sqlite3conn.pp

*** ../fpc-fcl-source/fpc320/sqlite3conn.pp 2020-11-29 08:14:11.244039571 -0800
--- sqlite3conn.pp 2020-12-06 08:23:55.016750372 -0800
***************
*** 561,573 ****
        ftUnknown : DatabaseErrorFmt('Unknown or unsupported data type %s of column %s', [FD, FN]);
      end; // Case
      // check if SQLite is compiled with SQLITE_ENABLE_COLUMN_METADATA
! if Assigned(sqlite3_column_origin_name) then
! CN := sqlite3_column_origin_name(st,i)
! else
        CN := nil;
      // check only for physical table columns (not computed)
      // is column declared as NOT NULL ? (table name parameter (3rd) must be not nil)
! if not (Assigned(CN) and (sqlite3_table_column_metadata(fhandle, sqlite3_column_database_name(st,i), sqlite3_column_table_name(st,i), CN, nil, nil, @NotNull, nil, nil) = SQLITE_OK)) then
        NotNull := 0;
      FieldDefs.Add(FN, FT, size1, size2, NotNull=1, false, i+1, CP_UTF8);
      end;
--- 561,573 ----
        ftUnknown : DatabaseErrorFmt('Unknown or unsupported data type %s of column %s', [FD, FN]);
      end; // Case
      // check if SQLite is compiled with SQLITE_ENABLE_COLUMN_METADATA
! //if Assigned(sqlite3_column_origin_name) then
! // CN := sqlite3_column_origin_name(st,i)
! //else
        CN := nil;
      // check only for physical table columns (not computed)
      // is column declared as NOT NULL ? (table name parameter (3rd) must be not nil)
! // if not (Assigned(CN) and (sqlite3_table_column_metadata(fhandle, sqlite3_column_database_name(st,i), sqlite3_column_table_name(st,i), CN, nil, nil, @NotNull, nil, nil) = SQLITE_OK)) then
        NotNull := 0;
      FieldDefs.Add(FN, FT, size1, size2, NotNull=1, false, i+1, CP_UTF8);
      end;
***************
*** 867,874 ****
    InitializeSQLite;
    filename := DatabaseName;
    checkerror(sqlite3_open_v2(PAnsiChar(filename),@fhandle,GetSQLiteOpenFlags,Nil));
! if (Length(Password)>0) and assigned(sqlite3_key) then
! checkerror(sqlite3_key(fhandle,PChar(Password),StrLen(PChar(Password))));
    if Params.IndexOfName('foreign_keys') <> -1 then
      execsql('PRAGMA foreign_keys = '+Params.Values['foreign_keys']);
  end;
--- 867,875 ----
    InitializeSQLite;
    filename := DatabaseName;
    checkerror(sqlite3_open_v2(PAnsiChar(filename),@fhandle,GetSQLiteOpenFlags,Nil));
! {$WARNING The Static-linked sqlite3.o doesnt support encryption, no sqlite3_key function}
! //if (Length(Password)>0) and assigned(sqlite3_key) then
! //checkerror(sqlite3_key(fhandle,PChar(Password),StrLen(PChar(Password))));
    if Params.IndexOfName('foreign_keys') <> -1 then
      execsql('PRAGMA foreign_keys = '+Params.Values['foreign_keys']);
  end;

Juha Manninen

2020-12-06 19:25

developer   ~0127381

This is a tricky one because it involves both FPC/FCL and Lazarus/LCL changes. FPC and Lazarus are 2 different projects. Ideally you should be able to isolate a slowdown in FCL code without DBGrid or other LCL components. Do you think it would be possible?
Now this report's Project=Lazarus. It may not get much attention from FPC project.
It would help a lot to bisect the guilty FPC/FCL revision. I know it requires some effort.

JBThiel

2020-12-06 22:12

reporter   ~0127387

I have now confirmed, with FPC 3.2.0 / Laz 2.0.10, the DBGrids/DataSet are being immediately put in dsEdit mode, purely from programmatic setting during the initial loading and pushing values into associated controls in OnDataChange. I saw this also before on Linux GTK2, even with FPC 2.6.4, and documented in 0038009, it is a problem with TMemo.Modified/OnChange events. There is a serious problem potential if the widgetset and Lazarus have different concept of who/when is supposed to generate certain events, callbacks, etc. And I know for sure this is happening with GTK2 Memo. So maybe with other controls too. (for example, if every single cell, in every grid, were generating spurious events and callbacks, that could be quite some slowdown)

In the context of this bug report about Extreme Slowdown, this dsEdit issue is a new secondary error in FPC 3.2.0, but supports my idea that there are spurious events being generated, which may be the cause of slowdown.

I will try to track down exactly what is triggering the dsEdit mode now.

@Juha, thanks for rapid feedbacks. Is there a separate place for Free Pascal bugs? I noticed this is categorized for Lazarus, even though I listed "FCL" as the category. It is a tricky overlap as you say. Is why I had hoped to keep the Lazarus version constant across different FPC/FCL versions, but I cannot compile Laz 1.8.4 with FPC 3.2.0
From my internal testing notes, i can confirm it was also slow with FPC 3.0.0. So the bisection point is
FPC 2.6.4 / Laz 1.8.4 fast
FPC 3.0.0 / Laz 1.8.4 extremely slow
(all on Linux GTK2)

If I DisableControls and step along thru the Master dataset programmatically (ie. do Next in a loop instead of with keystrokes), would that sufficiently de-couple the LCL stuff, to get some clues?

Another test I will try is different widgetset on Linux (qt5?) or OSX Carbon, though it will take a while to assemble the components.
If it works better/correctly on another widgetset, that will tell it is likely the GTK2 LCL/widgets causing the slowdown.

I am leaning to think the cause is bad LCL widgetset events/callbacks, giving spurious inputs to the FCL components.

Juha Manninen

2020-12-07 15:49

developer   ~0127424

> Is there a separate place for Free Pascal bugs? I noticed this is categorized for Lazarus, even though I listed "FCL" as the category.
The "Project" of the issue must be "FPC" then. I am not sure why Lazarus project has "FCL". There are components in the System tab that come from FCL and there is a wrapper package for them.

> I cannot compile Laz 1.8.4 with FPC 3.2.0
I believe it only requires few IFDEFs to support FPC 3.2.0. You could make a local modification.

> If I DisableControls and step along thru the Master dataset programmatically
> (ie. do Next in a loop instead of with keystrokes), would that sufficiently de-couple the LCL stuff, to get some clues?
I believe so, yes.

JBThiel

2020-12-08 14:01

reporter   ~0127455

UPDATE: My app grids going into dsEdit mode is because of a change/regression in TMemo.Modified handling on Linux GTK2,
so a workaround I had stopped working in FPC 3.2.0 / Laz 2.0.10. Described in 0038009

So that is a separate issue. I will have to fix my app around that, and then can continue testing with FPC 3.2.0

JBThiel

2020-12-14 05:44

reporter   ~0127602

A few updates:

With FPC 3.2, my grids were going into dsEdit mode by TMemo event handler. This was due to 0038009.
I have applied a TMemo patch discussed there, and now the grids work correctly in that regard.

Now performance is quite improved, but from other testing described below, I'm not sure if that was the performance factor, or just an ancillary problem.

Also, there are still a couple performance and slowdown issues, as follows:

1) The grids are still "losing sync" if you scroll too fast. This is better than before, but still happening.
It stops updating the display in realtime, and waits until you have stopped keypressing/repeating, and when it finally catches up, then it repaints.
I don't recall this behavior with FPC 2.6.4.
I would prefer that the grids update after each keypress/keyrepeat, even if that makes it overall slower.
Is this a new grid setting, that I can update somewhere?
Also I will check if I have some stray DisableControls setting.

2) There is still an extreme slowdown present, due to some condition.
When I trigger this condition, then the app goes to taking literally seconds to scroll 1 Nextline in the master grid.
I have found a clue about the cause: it may relate to a "dangling detail grid".
Sometimes I turn off a detail grid, so it's not following the master; it shouldn't be doing anything.
I'm not certain yet, but it seems like once such a grid has been turned off, then the extreme slowdown happens.
Again, this worked fine in FPC 2.6.4 / LAZ 1.8.4
Are there any known changes that could produce such an effect?
Any recently new grid/dataset/datasource settings for example, that I may have forgot to update or initialize?

I will further inspect such dangling grids to see if that is the problem.


Meanwhile, to see when the TMemo/dsEdit problem had come back, I went back to test my older app binaries built with FPC 300 and 304 (both Laz 1.8.4)
The TMemo/dsEdit bug was not in those, so this confirmed it was re-introduced in FPC 3.2.0

However, to my surprise, the FPC 300/304 binaries seem to be working somewhat faster.
I did not rebuild the app binaries, so I'm not sure what happened here.

Thinking over, I did some major system upgrades since my prior testing, which likely included GTK and other windowing components.
I don't recall the prior versions, they might have been 6 months to a year+ old
What I have installed now is:
x11-libs/gtk+:
  2.24.32-r1
  3.24.22

and all the FPC 3.2.0 / 3.0.4 / 3.0.0 seem at least a little faster than before. (3.2.0 is the fastest)



SUMMARY
Some improved performance with FPC 3.2.0, possibly from:
  TMemo/dsEdit fix
  GTK and other window components upgraded (may also have improved FPC 300/304 binaries)

Open issues:
  Loss of sync when scrolling DBgrids
  Extreme slowdown occasionally triggered, possibly from "dangling detail grid" with disconnected master.

Issue History

Date Modified Username Field Change
2020-11-23 04:55 JBThiel New Issue
2020-11-23 10:09 Juha Manninen Note Added: 0127134
2020-12-06 19:04 JBThiel Note Added: 0127380
2020-12-06 19:25 Juha Manninen Note Added: 0127381
2020-12-06 22:12 JBThiel Note Added: 0127387
2020-12-07 15:42 Juha Manninen Relationship added related to 0038009
2020-12-07 15:49 Juha Manninen Note Added: 0127424
2020-12-08 14:01 JBThiel Note Added: 0127455
2020-12-14 05:44 JBThiel Note Added: 0127602