View Issue Details

IDProjectCategoryView StatusLast Update
0036325LazarusOtherpublic2019-11-21 17:04
ReporterChris Rorden Assigned ToDmitry Boyarintsev  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionfixed 
PlatformMacBook 2012 Retina 13"OSDarwin 
Summary0036325: FPC optimization disables threading on Darwin
DescriptionThis seems to be specific to MacOS Darwin, as it does not happen on Linux. When I compile a project with an optimization level higher than -o1, the multi-threading is disabled. This is not only in the simple program attached (where one could argue that optimization removes useless calls, but also to real world projects like this one: https://github.com/neurolabusc/DistanceFields which includes sample data
Steps To ReproduceCompile a project with MacOS using -O1 and note threading works. Recompile with -O2 or -O3 and notice threads are run sequentially. It is not only the order of the threads (which could be random) but the speed which suggests single threading.

Using sample project
  https://wiki.freepascal.org/Parallel_procedures#Simple_example

$fpc -O1 -CX -Xs -XX tester
Free Pascal Compiler version 3.0.4 [2018/09/30] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
...
$./tester
1
5
4
3
2

$fpc -O2 -CX -Xs -XX tester
Free Pascal Compiler version 3.0.4 [2018/09/30] for x86_64
...
$./tester
1
2
3
4
5
Additional InformationThere is a patch attached for a file belonging to Lazarus, so I assume this issue does not belong in "Lazarus-CCR".
TagsNo tags attached.
Fixed in Revision62277
LazTarget-
Widgetset
Attached Files

Activities

Chris Rorden

2019-11-18 03:51

reporter  

mp.zip (7,818 bytes)

Thaddy de Koning

2019-11-18 07:53

reporter   ~0119378

Last edited: 2019-11-18 08:00

View 4 revisions

I don't know if it makes any difference, but in the unit mtprocs global inlining is active which means the compiler will try to inline *everything* it can at higher optimization settings.
This is also the case if a method is not marked as inline.
However, according to the comment, the method TProcThreadPool.DoParallelLocalProc should never be inlined. so should be marked with the noinline modifier in that case or the global {$inline on} should be removed.
I am not able to test on Catalina today, but see if it helps.

Note the above is used in TProcThreadPool.DoParallelIntern which is in turn used by TProcThreadPool.DoParallel - which is what you use - so this definitely needs correcting by either of my suggestions.

Thaddy de Koning

2019-11-18 10:02

reporter   ~0119382

I can now confirm that is likely the issue. A colleague just tested my hunch.

Thaddy de Koning

2019-11-18 10:47

reporter   ~0119383

Chris, the test was against 3.2.0, not 3.0.4.

Chris Rorden

2019-11-18 14:32

reporter   ~0119385

Is a 3.2.0 beta release available as a pkg? I tried to compile fps trunk on my Mojave computer, but got errors regarding "ld: file not found: /usr/lib/crt1.10.5.o". I reinstalled fpc3.0.4a pkg from the Lazarus SourceForge site, and that did not seem to help. I tried editing the /etc/fpc.cfg file using "sudo open /etc/fpc.cfg" as suggested below but was denied making changes by system integrity protection.
   https://forum.lazarus.freepascal.org/index.php/topic,42657.15.html
 I also tried editing the t_bsd.pas file as suggested below, but this did not seem to resolve the issue.
   https://fpc-devel.freepascal.narkive.com/dynJwugZ/macos-mojave-beta-crt1-o-not-installed-to-usr-lib

Commenting out {$inline on} in mtcpu.pas and mtprocs.pas did not help with my fpc 3.0.4 install.

Marco van de Voort

2019-11-18 15:24

manager   ~0119386

Might it simply optimize away empty for loops? Then the thread is probably done before the second is created.

Ryan Joseph

2019-11-18 16:20

reporter   ~0119387

I have the same problem on 3.3.1 and using Sleep instead of an empty loop:

procedure DoSomethingParallel(Index: PtrInt; Data: Pointer; Item: TMultiThreadProcItem);
var
  i: Integer;
begin
  Sleep(10+Random(1000));
  writeln(Index);
end;

Thaddy de Koning

2019-11-18 16:40

reporter   ~0119388

It is as far as I know just the absense of noinline that cause this. Or remove the global {$inline on}
This code is written by Matthias. He can have a look?
The error is not Mac specific, btw. It is reproducable on other platforms too, but really easy to fix. See my analysis.

Ryan Joseph

2019-11-18 17:07

reporter   ~0119389

"It is as far as I know just the absense of noinline that cause this. Or remove the global {$inline on}"

I disabled inlines with {$inline off} and tried adding no inline to TProcThreadPool.DoParallelLocalProc with no difference. Testing on 3.3.1 Catalina.

Chris Rorden

2019-11-18 17:11

reporter   ~0119390

Thaddy, can you be explicit about what your fix is? Simply removing {$inline on} functions in mtcpu.pas and mtprocs.pas does not fix this on my system. Are you suggesting some modification that allows other optimizations of -O2 except global inlining? If so, how do I enable this? With regards to Marco's comments about empty loops, the example project is very minimal so optimization could in theory eliminate these features. However, my Github example has each thread doing a lot of work, so the threads can not be optimized out of existence.

Thaddy de Koning

2019-11-18 19:04

reporter   ~0119391

Last edited: 2019-11-18 19:09

View 2 revisions

As far as I can see there is only one routine that is really affected by inlining as I pointed out.
the method TProcThreadPool.DoParallelLocalProc which is used, but deeply hidden in the code.
You should NOT use {inline on/off} at all, because the individual methods are already marked as inline. Inline off would remove inlining altogether and that is not what you want,
the method TProcThreadPool.DoParallelLocalProc should be marked noinline (fpc 3.2.0) or the {$inline on} at the to should be removed.
You should not inline this method is because it manipulates the stack directly.
So the code contains at least one bug.
If it is your bug, I don't know yet, but my collegue tested the small example on catalina + fpc 3.2.0.

Ryan Joseph

2019-11-18 20:07

reporter   ~0119392

Adding noinline doesn't work for me on the trunk (3.3.1). Can you confirm this?

Where is 3.2 coming from? I'm not aware this was ever a release.

Ryan Joseph

2019-11-19 17:43

reporter   ~0119400

The problem is with fpsysctl and something strange the optimizer does. In the example below try moving the variable "res" to the top of the list and then the bottom. Depending on the location the value changes. Should we make another bug report for this issue since it appears to be general compiler related issue? In the mean time we can just rearrange the variables to fix the problem but obviously the compiler needs to be fixed as well.

function NumberOfCPU: integer;
var
  res: cint; // <<<------ returns 4 on my system if res is here
  mib: array[0..1] of integer;
  status: integer;
  len: cint;
  res: cint; // <<<------ returns 0 if res is here
begin
  mib[0] := CTL_HW;
  mib[1] := HW_NCPU;
  len := sizeof(t);
  status := fpsysctl(@mib, length(mib), @res, @len, Nil, 0);
  if status <> 0 then RaiseLastOSError;
  result := res;
end;

Marco van de Voort

2019-11-19 17:54

manager   ~0119401

That sounds like the type/size of MIB is wrong.

Ryan Joseph

2019-11-19 18:11

reporter   ~0119402

But it only happens with -O2 on. If the size what wrong then we would expect bad results either way right?

Ryan Joseph

2019-11-19 18:23

reporter   ~0119403

Changing the size of "len" to size_t fixes the problem with -O2 on or off. Is this an optimizer bug or just a coincidence? The function GetSystemThreadCount in MTPCPU.pas needs to change the type of the local "t" to size_t (darwin/unix) to fix the problem in the RTL.

function NumberOfCPU: integer;
var
  mib: array[0..1] of integer;
  status: integer;
  len: size_t;
  res: cuint;
begin
  mib[0] := CTL_HW;
  mib[1] := HW_NCPU;
  len := sizeof(res);
  status := fpsysctl(@mib, length(mib), @res, @len, nil, 0);
  if status <> 0 then RaiseLastOSError;
  result := res;
end;

Jonas Maebe

2019-11-19 19:54

manager   ~0119404

> Changing the size of "len" to size_t fixes the problem with -O2 on or off. Is this an optimizer bug or just a coincidence?

Coincidence. If a size is wrong, then random data will get overwritten. The stack layout changes with -O2, because some variables will be moved to registers.

Sven Barth

2019-11-20 12:43

manager   ~0119406

According to the manpages for sysctl (Linux: http://man7.org/linux/man-pages/man2/sysctl.2.html, macOS: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/sysctl.3.html ) size_t for len is indeed the correct solution.

Ryan Joseph

2019-11-20 15:25

reporter   ~0119408

Here's a patch to add the correct type.
patch.diff (812 bytes)   
diff --git a/Users/ryanjoseph/Downloads/mp/mtpcpu.pas b/Users/ryanjoseph/Developer/lazarus/components/multithreadprocs/mtpcpu.pas
index fc71dc5..01a607e 100644
--- a/Users/ryanjoseph/Downloads/mp/mtpcpu.pas
+++ b/Users/ryanjoseph/Developer/lazarus/components/multithreadprocs/mtpcpu.pas
@@ -21,7 +21,7 @@ interface
 {$IF defined(windows)}
 uses Windows;
 {$ELSEIF defined(freebsd) or defined(darwin)}
-uses ctypes, sysctl;
+uses ctypes, unixtype, sysctl;
 {$ELSEIF defined(linux)}
 {$linklib c}
 uses ctypes;
@@ -71,12 +71,12 @@ end;
 var
   mib: array[0..1] of cint;
   len: cint;
-  t: cint;
+  t: size_t;
 begin
   mib[0] := CTL_HW;
   mib[1] := HW_NCPU;
   len := sizeof(t);
-  fpsysctl(pchar(@mib), 2, @t, @len, Nil, 0);
+  fpsysctl(@mib, 2, @t, @len, Nil, 0);
   Result:=t;
 end;
 {$ELSEIF defined(linux)}
patch.diff (812 bytes)   

Dmitry Boyarintsev

2019-11-21 16:02

developer   ~0119417

please test and close if ok

Issue History

Date Modified Username Field Change
2019-11-18 03:51 Chris Rorden New Issue
2019-11-18 03:51 Chris Rorden File Added: mp.zip
2019-11-18 07:53 Thaddy de Koning Note Added: 0119378
2019-11-18 07:55 Thaddy de Koning Note Edited: 0119378 View Revisions
2019-11-18 07:59 Thaddy de Koning Note Edited: 0119378 View Revisions
2019-11-18 08:00 Thaddy de Koning Note Edited: 0119378 View Revisions
2019-11-18 10:02 Thaddy de Koning Note Added: 0119382
2019-11-18 10:47 Thaddy de Koning Note Added: 0119383
2019-11-18 14:32 Chris Rorden Note Added: 0119385
2019-11-18 15:24 Marco van de Voort Note Added: 0119386
2019-11-18 16:20 Ryan Joseph Note Added: 0119387
2019-11-18 16:40 Thaddy de Koning Note Added: 0119388
2019-11-18 17:07 Ryan Joseph Note Added: 0119389
2019-11-18 17:11 Chris Rorden Note Added: 0119390
2019-11-18 19:04 Thaddy de Koning Note Added: 0119391
2019-11-18 19:09 Thaddy de Koning Note Edited: 0119391 View Revisions
2019-11-18 20:07 Ryan Joseph Note Added: 0119392
2019-11-19 17:43 Ryan Joseph Note Added: 0119400
2019-11-19 17:54 Marco van de Voort Note Added: 0119401
2019-11-19 18:11 Ryan Joseph Note Added: 0119402
2019-11-19 18:23 Ryan Joseph Note Added: 0119403
2019-11-19 19:54 Jonas Maebe Note Added: 0119404
2019-11-19 19:54 Jonas Maebe Project FPC => Lazarus CCR
2019-11-19 19:54 Jonas Maebe Category Compiler => General
2019-11-20 12:43 Sven Barth Note Added: 0119406
2019-11-20 15:25 Ryan Joseph File Added: patch.diff
2019-11-20 15:25 Ryan Joseph Note Added: 0119408
2019-11-20 16:38 Bart Broersma Project Lazarus CCR => Lazarus
2019-11-20 16:40 Bart Broersma Category General => Other
2019-11-20 16:40 Bart Broersma Product Version 3.0.4 =>
2019-11-20 16:40 Bart Broersma Additional Information Updated View Revisions
2019-11-20 16:40 Bart Broersma LazTarget => -
2019-11-21 16:02 Dmitry Boyarintsev Assigned To => Dmitry Boyarintsev
2019-11-21 16:02 Dmitry Boyarintsev Status new => resolved
2019-11-21 16:02 Dmitry Boyarintsev Resolution open => fixed
2019-11-21 16:02 Dmitry Boyarintsev Fixed in Revision => 62277
2019-11-21 16:02 Dmitry Boyarintsev Note Added: 0119417
2019-11-21 17:04 Chris Rorden Status resolved => closed