View Issue Details

ID: 0036308
Project: FPC
Category: Compiler
View Status: public
Last Update: 2019-11-24 21:27
Reporter: J. Gareth Moreton
Assigned To: Florian
Priority: low
Severity: tweak
Reproducibility: N/A
Status: resolved
Resolution: fixed
Platform: i386 and x86_64
OS: Microsoft Windows
OS Version: 10 Professional
Product Version: 3.3.1
Product Build: 43455
Target Version:
Fixed in Version: 3.3.1
Summary: 0036308: [Patch] "MOV REG, -1" -> "OR REG, -1" optimisation
Description: This patch adds an extra optimisation to "PostPeepholeOptMov" in compiler/x86/aoptx86.pas:

If the instruction "MOV REG, -1" (Intel notation) is found, where REG is either a 32- or 64-bit register, it is changed to "OR REG, -1" instead. The result is the same and the instruction executes at the same speed, but the encoding is smaller.

For 16-bit registers, only AX is optimised this way because it has its own encoding for OR that takes fewer bytes.

Though it only saves a handful of bytes per occurrence, -1 is a common value for indicating an error or for initialising the counter of a for-loop that starts at zero, so the cumulative effect can be quite substantial (about 30 KiB was noted to have been shaved off the binary for the x86_64-win64 build of Lazarus).
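The claim that the two instructions leave the same value in the register can be checked with a small model. This is only an illustrative sketch in Python, not part of the patch; the helper names `or_all_ones` and `as_signed` are made up for the example.

```python
def or_all_ones(value: int, bits: int) -> int:
    """Model "OR REG, -1" on a register that is `bits` wide."""
    mask = (1 << bits) - 1          # -1 as an unsigned bit pattern
    return (value | mask) & mask    # OR with all ones, truncated to width


def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned bit pattern as a two's complement integer."""
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


# Whatever the register held before, it holds -1 afterwards.
for bits in (32, 64):
    for previous in (0, 1, 0x1234, (1 << bits) - 1):
        assert as_signed(or_all_ones(previous, bits), bits) == -1
```

Note that the model takes the previous register value as an input even though the result never depends on it; MOV has no such input.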
Steps To Reproduce: Apply the patch, confirm that compilation is correct, and observe the size saving.
Additional Information:
- The optimisation is not applied if the FLAGS register is in use at the same time, as OR scrambles it.
- This particular optimisation has been observed in GCC as well, so it has a proven track record.
Tags: compiler, optimizations, patch, x86, x86_64
Fixed in Revision: 43579
FPCOldBugId:
FPCTarget: -
Attached Files
  • x86-mov-to-or-optimisation.patch (1,407 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 43455)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -4161,9 +4161,24 @@
                           Result := True;
                         end;
                       else
    -                    ;
    +                    { Do nothing };
                     end;
                   end;
    +            -1:
    +              { Don't make this optimisation if the CPU flags are required, since OR scrambles them }
    +              if (cs_opt_size in current_settings.optimizerswitches) and
    +                (taicpu(p).opsize <> S_B) and
    +                not (RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +                begin
    +                  { change "mov $-1,%reg" into "or $-1,%reg" }
    +                  { NOTES:
    +                    - No size saving is made when changing a Word-sized assignment unless the register is AX (smaller encoding)
    +                    - This operation creates a false dependency on the register, so only do it when optimising for size
    +                    - It is possible to set memory operands using this method, but this creates an even greater false dependency, so don't do this at all
    +                  }
    +                  taicpu(p).opcode := A_OR;
    +                  Result := True;
    +                end;
                 end;
               end;
           end;
    

Activities

J. Gareth Moreton

2019-11-13 07:19

developer   ~0119257

As an additional note... the reason OR takes fewer bytes to encode is that its immediate operand can be sign-extended, so "OR REG, -1" only uses 1 byte for the immediate (the value $FF), on top of the two bytes for the opcode and register.
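To make the byte counts concrete, here is a rough sketch in Python (not FPC code) that builds the machine code for both forms. It assumes the usual encodings an assembler picks: `B8+rd id` for a 32-bit MOV, `REX.W C7 /0 id` for a 64-bit MOV, and `83 /1 ib` for OR with a sign-extended 8-bit immediate. It only covers the eight legacy registers (no REX.B), and the function names are invented for the example.

```python
# Register numbers of the eight legacy general-purpose registers.
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def mov_minus1(reg: str, bits: int) -> bytes:
    """Encode "MOV reg, -1" for a 32- or 64-bit register."""
    n = REG[reg]
    if bits == 32:
        return bytes([0xB8 + n]) + b"\xff" * 4              # B8+rd id
    if bits == 64:
        return bytes([0x48, 0xC7, 0xC0 | n]) + b"\xff" * 4  # REX.W C7 /0 id
    raise ValueError("sketch only covers 32- and 64-bit registers")


def or_minus1(reg: str, bits: int) -> bytes:
    """Encode "OR reg, -1" using the sign-extended imm8 form."""
    n = REG[reg]
    modrm = 0xC0 | (1 << 3) | n                # mod=11, /1 selects OR, rm=reg
    if bits == 32:
        return bytes([0x83, modrm, 0xFF])      # 83 /1 ib
    if bits == 64:
        return bytes([0x48, 0x83, modrm, 0xFF])  # REX.W 83 /1 ib
    raise ValueError("sketch only covers 32- and 64-bit registers")


for bits in (32, 64):
    mov, orr = mov_minus1("ax", bits), or_minus1("ax", bits)
    print(bits, mov.hex(" "), "->", orr.hex(" "), f"({len(mov)} -> {len(orr)} bytes)")
```

Under these assumptions the 32-bit form shrinks from 5 bytes to 3, and the 64-bit form from 7 bytes to 4 (the REX.W prefix accounts for the extra byte).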

Florian

2019-11-13 22:14

administrator   ~0119279

Just curious: doesn't this cause more data dependencies?

J. Gareth Moreton

2019-11-14 01:25

developer   ~0119283

Aah, you are right; "or $-1,%reg", while significantly smaller when dealing with 32-bit or 64-bit registers, introduces a false dependency.

I did some reading up and it turns out even some major developers weren't aware of it - I found this suggestion thread to the Visual C++ team from as recently as 2017, and in response, they said they'd disable the optimisation unless one is compiling for size:

https://developercommunity.visualstudio.com/content/problem/102658/initialization-int-x-1-compiles-into-or-eax-1-whic.html

I've done something similar with the patch - it will now only make the optimisation if "(cs_opt_size in current_settings.optimizerswitches)" is true (and for compiler efficiency, it is the first thing checked in the if-statement).
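The resulting guard can be summarised as a small predicate. The following is a simplified model in Python, not the compiler's code: `MovInstr`, `should_rewrite_to_or` and their fields are hypothetical stand-ins for the checks on `cs_opt_size`, `taicpu(p).opsize` and `RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)` in the patch.

```python
from dataclasses import dataclass


@dataclass
class MovInstr:
    """Hypothetical, minimal stand-in for a "MOV dest, imm" instruction."""
    opsize_bits: int        # 8, 16, 32 or 64
    immediate: int          # the constant being moved
    dest_is_register: bool  # memory destinations are never rewritten


def should_rewrite_to_or(instr: MovInstr, optimise_for_size: bool,
                         flags_in_use: bool) -> bool:
    """Return True if "MOV reg, -1" may be turned into "OR reg, -1"."""
    return (
        optimise_for_size               # cheapest check first; false dependency is only accepted for size
        and instr.dest_is_register
        and instr.immediate == -1
        and instr.opsize_bits != 8      # byte-sized moves are excluded
        and not flags_in_use            # OR would clobber live flags
    )


mov = MovInstr(opsize_bits=32, immediate=-1, dest_is_register=True)
print(should_rewrite_to_or(mov, optimise_for_size=True, flags_in_use=False))   # True
print(should_rewrite_to_or(mov, optimise_for_size=False, flags_in_use=False))  # False
```

Because Python's `and` short-circuits, a build that is not optimising for size returns after the first test, which mirrors the reason given above for checking the size switch first.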

J. Gareth Moreton

2019-11-14 01:28

developer  

x86-mov-to-or-optimisation.patch (1,407 bytes)

Florian

2019-11-24 21:27

administrator   ~0119481

Thanks, applied.

Issue History

Date Modified | Username | Field | Change
2019-11-13 07:16 | J. Gareth Moreton | New Issue |
2019-11-13 07:16 | J. Gareth Moreton | File Added: x86-mov-to-or-optimisation.patch |
2019-11-13 07:16 | J. Gareth Moreton | Priority | normal => low
2019-11-13 07:16 | J. Gareth Moreton | Severity | minor => tweak
2019-11-13 07:16 | J. Gareth Moreton | FPCTarget | => -
2019-11-13 07:17 | J. Gareth Moreton | Tag Attached: patch |
2019-11-13 07:17 | J. Gareth Moreton | Tag Attached: compiler |
2019-11-13 07:17 | J. Gareth Moreton | Tag Attached: optimizations |
2019-11-13 07:17 | J. Gareth Moreton | Tag Attached: x86_64 |
2019-11-13 07:17 | J. Gareth Moreton | Tag Attached: x86 |
2019-11-13 07:19 | J. Gareth Moreton | Note Added: 0119257 |
2019-11-13 22:14 | Florian | Note Added: 0119279 |
2019-11-14 01:22 | J. Gareth Moreton | File Deleted: x86-mov-to-or-optimisation.patch |
2019-11-14 01:25 | J. Gareth Moreton | File Added: x86-mov-to-or-optimisation.patch |
2019-11-14 01:25 | J. Gareth Moreton | Note Added: 0119283 |
2019-11-14 01:28 | J. Gareth Moreton | File Deleted: x86-mov-to-or-optimisation.patch |
2019-11-14 01:28 | J. Gareth Moreton | File Added: x86-mov-to-or-optimisation.patch |
2019-11-24 21:27 | Florian | Assigned To | => Florian
2019-11-24 21:27 | Florian | Status | new => resolved
2019-11-24 21:27 | Florian | Resolution | open => fixed
2019-11-24 21:27 | Florian | Fixed in Version | => 3.3.1
2019-11-24 21:27 | Florian | Fixed in Revision | => 43579
2019-11-24 21:27 | Florian | Note Added: 0119481 |