View Issue Details

IDProjectCategoryView StatusLast Update
0015163FPCCompilerpublic2010-01-13 06:34
ReporterDmitry Boyarintsev Assigned ToJonas Maebe  
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Fixed in Version2.6.0 
Summary0015163: compiler: internal i386 darwin assembler
Descriptionthe patch contains darwin object file writer. It can be extended to support powerpc or arm, if necessary.
obj-p supported.

i don't know if it worths adding to the trunk or creating a branch.
TagsNo tags attached.
Fixed in Revision14628
Attached Files


related to 0015142 closedJonas Maebe wrong STABS SLINE info generated for the internal assembler 


2009-11-23 12:08 (42,151 bytes)

Jonas Maebe

2009-11-23 13:32

manager   ~0032379

Branches are only for features that are still under heavy development. Once the initial work is done and things are more or less stable then trunk is best, because otherwise you will have to adapt your changes to new trunk code every now and then (in case conflicting code is checked into trunk).

Jonas Maebe

2009-11-23 20:51

manager   ~0032400

Please adjust some things to the compiler coding style:
* no "begin" on the same line as, and indent such constructs like this
if x then
else if y then
* split all composite if-conditions over multiple lines, so no "if (x) and (y) then" but
if x and
   y then
(except possibly if x and y are simply boolean variables)

I also prefer not to make the binary writer the default assembler for Darwin. It may be faster, but it's another component that can cause bugs of which may require updates to support new features. Everyone who wants to use it by default can add -Amacho to their fpc.cfg, of course.

Dmitry Boyarintsev

2009-11-23 21:12

developer   ~0032403

tests shows that the compile time is the same as using external assembler.

I'll fix the style and resend the patch.

Jonas Maebe

2009-11-23 21:35

manager   ~0032404

Ok, thanks!

Jonas Maebe

2009-11-23 22:16

manager   ~0032406

Last edited: 2009-11-23 22:16

I also saw that you included the patch from 0015142. Have you checked whether it doesn't break stabs on Windows or Linux?

Dmitry Boyarintsev

2009-11-24 00:04

developer   ~0032410

Here's assembler listing of Windows (Linux is identical) version.
Offsets are explicitly set and match the address of the first instruction of a source line.
I wonder why darwin stabs are missing offset (and additional labels) set? The patch at 15142 wouldn't be required if proper stabs is generated.

    .stabn 68,0,3,.Ll1 - _main
# [test.pas]
# [3] begin
    pushl %ebp
    movl %esp,%ebp
    subl $4,%esp
    movl %ebx,-4(%ebp)
    .stabn 68,0,4,.Ll2 - _main
# [4] c:=5;
    movw $5,U_P$PROGRAM_C
    .stabn 68,0,5,.Ll3 - _main
# [5] writelN(1);
    call fpc_get_output
    movl %eax,%ebx
    movl %ebx,%edx
    movl $1,%ecx
    movl $0,%eax
    call fpc_write_text_sint
    call FPC_IOCHECK
    movl %ebx,%eax
    call fpc_writeln_end
    call FPC_IOCHECK

2009-11-24 00:44 (42,260 bytes)

Jonas Maebe

2009-11-24 10:25

manager   ~0032423

> I wonder why darwin stabs are missing offset (and additional labels) set?

Because the external assembler refuses to assemble the stabs if we do that (or it generates non-working stabs, I forgot which it is).

To test, you can change the current behaviour by adding tf_use_function_relative_addresses to the flags set in compiler/systems/i_bsd.pas for Darwin/i386.

I guess we can add an extra check in dbgstabs.pas for the macho-binary writer (instead of only for tf_use_function_relative_addresses), although it's not very clean.

Dmitry Boyarintsev

2009-11-24 15:47

developer   ~0032430

a program assembles and compiles fine with tf_use_function_relative_addresses flag used for the darwin external assembler:
Apple Inc version cctools-698.1~1, GNU assembler version 1.38

I don't like adding extra macho-binary writer check.
Maybe another assembler flag should be added, like "af_suppress_stabs_relative_addresses", that would be used by external darwin assembler only?

Jonas Maebe

2009-11-24 16:03

manager   ~0032431

Can it also be debugged correctly? Things definitely did not work when I first implemented Mac OS X support (although this was in Mac OS X 10.1 or 10.2 times). And at least GCC also not generate such stabs either, even on Mac OS X 10.5.

Furthermore, it's not just the Darwin external assembler that needs this (at least OS/2 and EMX don't have it either).

But changing it from a target flag into an assembler flag is indeed a good idea.

Dmitry Boyarintsev

2009-11-25 08:49

developer   ~0032452

Yes the resulting code of external asm + tf_use_function_relative_addresses flag can be debugged.

gdb determines the address of breakpoint correctly for "b filename:linenum" command.
There're differences, though. SLINE symbol doesn't have section number set (that should be 1, the number of __TEXT __text section).

00000000 - 01 0000 SO test.pas
00000000 - 00 0000 LSYM void:t2=2
00000000 - 00 0000 LSYM SMALLINT:t1=r1;-32768;32767;
0000011c - 04 0002 LCSYM I:S1
0000011e - 04 0002 LCSYM C:S1
00000000 - 01 0001 FUN PASCALMAIN:F2
00000000 - 01 0000 SOL test.pas
00000000 - 00 0003 SLINE
00000016 - 00 0004 SLINE
00000021 - 00 0005 SLINE
0000002c - 00 0006 SLINE
0000004a - 00 0007 SLINE
00000077 - 00 0008 SLINE
00000000 - 00 0000 LBRAC
00000084 - 00 0000 RBRAC
00000084 - 00 0000 FUN

without af_suppress_stabs_relative_addresses flag, generated stabs are:

00000000 - 01 0003 SLINE

the Internal assemble also doesn't set the section number for SLINE symbol, but it can be fixed.

2009-12-10 11:59 (42,801 bytes)

Dmitry Boyarintsev

2009-12-10 11:59

developer   ~0032903

the patch is updated to the latest trunk changes

Jonas Maebe

2009-12-10 18:24

manager   ~0032915

This does not seem like a good idea to me:

@@ -143,6 +145,12 @@
        size : aword;
        { Used for external and common solving during linking }
        exesymbol : TExeSymbol;
+ { Indirect symbol name, used by darwin }
+ { It's possible that indirect symbol name is set, but symbol itself is not defined }
+ { That's why there're indirect as string and indsymbol as TObjSymbol }
+ { indsymbol is left for the object writer, it's not filled by the Internal Asm }
+ indirect : string;
+ indsymbol : TObjSymbol; {not initialized, even if indirect<>''}
        constructor create(AList:TFPHashObjectList;const AName:string);
        function address:aword;
        procedure SetAddress(apass:byte;aobjsec:TObjSection;abind:TAsmsymbind;atyp:Tasmsymtype);

It enlarges the size of each symbol with 256 bytes (string = shortstring in the compiler for speed reasons). That should probably be a pshortstring instead, so it can be allocated only when needed (and only of the necessary size).

There are also still wrong indentations in the code, e.g.:

+ if Assigned(strsec) then begin
+ // 1 byte of zero name (that stands in the end of table, not at zero pos)
+ inc(symlen, strsec.Size + 1)
+ end else
+ inc(symlen); {the first zero byte}

"then begin" and "end else" have to be on separate lines (although if only that were the problem, I could fix it myself).

I really wish that there were someone who actually knows something about the internal assembler to review it. I can test whether it works and give some general remarks, but I cannot in any way say whether it's implemented properly or not...

At least the "indirect" handling with the string seems not optimal (even with pshortstring), and I don't understand why you cannot generate a symbol for that case as well (possibly of some new type, undefinedindirect or so, if it doesn't fit into any existing category).

Dmitry Boyarintsev

2009-12-12 20:20

developer   ~0033003

Sure, the additional symbol can be created (if none is present in the list).

But still the information about what symbol indirectly points to another is required. I guess indsymbol should be used for this reason.

Currently indsymbol is set by TMachoObjectOutput, though this should be done by TInternalAssembler (when handling .indirect modifier)

I tried to implement the darwin asm with as little changes to original fpc units as possible.

Jonas Maebe

2009-12-12 21:47

manager   ~0033011

It's no problem to modify other units if that's the right place to do things. I think symbols are better than strings, because afaik multiple symbols with the same name automatically point to the same data, saving memory.

Dmitry Boyarintsev

2010-01-11 04:10

developer   ~0033554

Last edited: 2010-01-11 04:12

using "indirect" string has been removed

make sure you don't have "ogmacho.pas","macho.pas","machoutils.pas" on the harddrive. otherwise, patch utility will add content of the patch to them (without of overwriting these files)

2010-01-11 04:10 (48,060 bytes)

Jonas Maebe

2010-01-11 21:44

manager   ~0033565

Last edited: 2010-01-11 21:45

Shouldn't the {$ifdef endian_big} part of the following code be deleted:

  {$ifdef ENDIAN_BIG}
    si_len_mask: array [TRelocInfoLength] of Integer = (0 shl 1, 1 shl 1, 2 shl 1, 3 shl 1);
    si_type_ofs = 4;
    si_pcrel_bit = 1 shl 1;
    si_scatter_bit = 1 shl 0;
    si_addr_ofs = 8;
    si_len_mask: array [TRelocInfoLength] of Integer = (0 shl 28, 1 shl 28, 2 shl 28, 3 shl 28);
    si_type_ofs = 24;
    si_pcrel_bit = 1 shl 30;
    si_scatter_bit = 1 shl 31;
    si_addr_ofs = 0;

  procedure ScatterRelocInfo(value, addr, reltype: integer; len: TRelocInfoLength; pcrel: Boolean; var info: scattered_relocation_info);
      relbit : array [Boolean] of Integer = (0, si_pcrel_bit);
                                                                                       // big endian
      info.r_info:=si_scatter_bit or // r_scattered:1, /* 1=scattered, 0=non-scattered (see above) */
                   relbit[pcrel] or // r_pcrel:1, /* was relocated pc relative already */
                   si_len_mask[len] or // r_length:2, /* 0=byte, 1=word, 2=long, 3=quad */
                   ((reltype and $F) shl si_type_ofs) or // r_type:4, /* if not 0, machine specific relocation type */
                   ((addr and $FFFFFF) shl si_addr_ofs); // r_address:24; /* offset in the section to what is being relocated */}
                                                          // little endian
                                                          // r_address:24, /* offset in the section to what is being relocated */
                                                          // r_type:4, /* if not 0, machine specific relocation type */
                                                          // r_length:2, /* 0=byte, 1=word, 2=long, 3=quad */
                                                          // r_pcrel:1, /* was relocated pc relative already */
                                                          // r_scattered:1; /* 1=scattered, 0=non-scattered (see above) */ *}

The big endian struct description places the r_pcrel bit first. The first bit on big endian corresponds to "1 shl 31".

Jonas Maebe

2010-01-11 21:47

manager   ~0033566

> make sure you don't have "ogmacho.pas","macho.pas",
> "machoutils.pas" on the harddrive. otherwise, patch utility
> will add content of the patch to them (without of overwriting these files)

I guess that's why the patch contains the contents of machoutils.pas twice :)

Jonas Maebe

2010-01-11 22:47

manager   ~0033568

BTW: please don't attach a full new patch, as I've already edited some things (mainly adjusting the formatting)

Dmitry Boyarintsev

2010-01-12 07:08

developer   ~0033571

Is there anything missing?

The latest patch provides the internal asm.

Jonas Maebe

2010-01-12 11:42

manager   ~0033573

I was wondering what you think about my remark in comment 0033565 above.

Dmitry Boyarintsev

2010-01-12 12:03

developer   ~0033574

I expected FPC to turn-out bitwise-operations, so 1 shl 0 is the first bit on Big-Endian CPUs too. (Never studied it, actually)

I guess $ifdef for big-endian can be removed, as well as comment lines starting from "// little endian"

Jonas Maebe

2010-01-12 15:03

manager   ~0033577

It doesn't work like that. Big endian only refers to how the *bytes* are stored in *memory*. It's unrelated to bits, and unrelated to how shl/shr work. In a register, conceptually all architectures are in fact big endian if you take "shift left/right" literally. Then there's also the issue of which bit you call "bit 0". IBM usually calls the most significant bit "bit 0", while Intel the least significant bit".

Anyway, "1 shl 0" is the least significant bit on both ppc and Intel. In memory, this corresponds to the least significant bit of the first byte on a little endian machine (Intel), and to the least significant bit of the fourth byte on a big endian machine (ppc).

If "r_scattered:1" is the first field in a C struct on a big endian machine, then this corresponds to 1 shl 31 there (if you treat the struct as a longint).

Marco van de Voort

2010-01-12 15:43

manager   ~0033579

(I think registers do not have endianess, and right/left is purely notational, because arabic numbers are bigendian. Endianess only plays a part when you start mapping them to linear addresses. memorysystem/cache)

Jonas Maebe

2010-01-12 15:49

manager   ~0033581

That's why I said "conceptually".

Jonas Maebe

2010-01-12 18:04

manager   ~0033585

They are in fact the same bits:

Dmitry Boyarintsev

2010-01-12 18:21

developer   ~0033586

so, scattered_relocation_info structs are identical on both platform (ppc and intel) and they can share the same code.

but relocation_info structs are not identical and require different initialization code?

Jonas Maebe

2010-01-12 18:53

manager   ~0033588

Last edited: 2010-01-12 19:41

> so, scattered_relocation_info structs are identical on both platform
> (ppc and intel) and they can share the same code.


> but relocation_info structs are not identical and require different initialization code?


Dmitry Boyarintsev

2010-01-12 20:40

developer   ~0033590

I guess RelocInfo() function (machoutils.pas:277)
should be the following:

  procedure RelocInfo(addr, symnum, reltype: integer; len: TRelocInfoLength; pcrel, extern: Boolean; var info: relocation_info);
    {$ifdef ENDIAN_BIG}
      relbit : array [Boolean] of Integer = (0, 1 shl 7);
      extbit : array [Boolean] of Integer = (0, 1 shl 4);
      ri_len_mask : array [TRelocInfoLength] of Integer = (0 shl 5, 1 shl 5, 2 shl 5, 3 shl 5);
      info.r_info:=((symnum and $FFFFFF) shl 8) or // r_symbolnum:24
                    relbit[pcrel] or // r_pcrel:1;
                    ri_len_mask[len] or // r_length:2;
                    extbit[extern] or // r_extern:1;
                    (reltype and $F); // r_type:4;
      relbit : array [Boolean] of Integer = (0, ri_pcrel_bit);
      extbit : array [Boolean] of Integer = (0, ri_extern_bit);
      ri_len_mask : array [TRelocInfoLength] of Integer = (0 shl 25, 1 shl 25, 2 shl 25, 3 shl 25);
      info.r_info:=(symnum and $FFFFFF) or // r_symbolnum:24
                   relbit[pcrel] or // r_pcrel:1;
                   extbit[extern] or // r_length:2;
                   ri_len_mask[len] or // r_extern:1;
                   (reltype shl 28); // r_type:4;

Jonas Maebe

2010-01-12 20:56

manager   ~0033592

It's committed. You can activate it using the -Amacho parameter.

There are still several errors though:
a) compiling with -gw causes internalerror 200603253
b) several tests fail with the internal assembler that work fine with the external one:

Dmitry Boyarintsev

2010-01-13 06:34

developer   ~0033602

I'll look into the failing tests.

Issue History

Date Modified Username Field Change
2009-11-23 12:08 Dmitry Boyarintsev New Issue
2009-11-23 12:08 Dmitry Boyarintsev File Added:
2009-11-23 13:30 Jonas Maebe Status new => assigned
2009-11-23 13:30 Jonas Maebe Assigned To => Jonas Maebe
2009-11-23 13:32 Jonas Maebe Note Added: 0032379
2009-11-23 20:51 Jonas Maebe Note Added: 0032400
2009-11-23 21:12 Dmitry Boyarintsev Note Added: 0032403
2009-11-23 21:35 Jonas Maebe Note Added: 0032404
2009-11-23 22:16 Jonas Maebe Note Added: 0032406
2009-11-23 22:16 Jonas Maebe Note Edited: 0032406
2009-11-23 22:16 Jonas Maebe Relationship added related to 0015142
2009-11-24 00:04 Dmitry Boyarintsev Note Added: 0032410
2009-11-24 00:44 Dmitry Boyarintsev File Added:
2009-11-24 10:25 Jonas Maebe Note Added: 0032423
2009-11-24 15:47 Dmitry Boyarintsev Note Added: 0032430
2009-11-24 16:03 Jonas Maebe Note Added: 0032431
2009-11-25 08:49 Dmitry Boyarintsev Note Added: 0032452
2009-12-10 11:59 Dmitry Boyarintsev File Added:
2009-12-10 11:59 Dmitry Boyarintsev Note Added: 0032903
2009-12-10 18:24 Jonas Maebe Note Added: 0032915
2009-12-12 20:20 Dmitry Boyarintsev Note Added: 0033003
2009-12-12 21:47 Jonas Maebe Note Added: 0033011
2010-01-11 04:10 Dmitry Boyarintsev Note Added: 0033554
2010-01-11 04:10 Dmitry Boyarintsev File Added:
2010-01-11 04:12 Dmitry Boyarintsev Note Edited: 0033554
2010-01-11 21:44 Jonas Maebe Note Added: 0033565
2010-01-11 21:45 Jonas Maebe Note Edited: 0033565
2010-01-11 21:47 Jonas Maebe Note Added: 0033566
2010-01-11 22:47 Jonas Maebe Note Added: 0033568
2010-01-12 07:08 Dmitry Boyarintsev Note Added: 0033571
2010-01-12 11:42 Jonas Maebe Note Added: 0033573
2010-01-12 12:03 Dmitry Boyarintsev Note Added: 0033574
2010-01-12 15:03 Jonas Maebe Note Added: 0033577
2010-01-12 15:43 Marco van de Voort Note Added: 0033579
2010-01-12 15:49 Jonas Maebe Note Added: 0033581
2010-01-12 18:04 Jonas Maebe Note Added: 0033585
2010-01-12 18:21 Dmitry Boyarintsev Note Added: 0033586
2010-01-12 18:53 Jonas Maebe Note Added: 0033588
2010-01-12 19:41 Jonas Maebe Note Edited: 0033588
2010-01-12 19:41 Jonas Maebe Note Edited: 0033588
2010-01-12 20:40 Dmitry Boyarintsev Note Added: 0033590
2010-01-12 20:56 Jonas Maebe Fixed in Revision => 14628
2010-01-12 20:56 Jonas Maebe Status assigned => resolved
2010-01-12 20:56 Jonas Maebe Fixed in Version => 2.5.1
2010-01-12 20:56 Jonas Maebe Resolution open => fixed
2010-01-12 20:56 Jonas Maebe Note Added: 0033592
2010-01-13 06:34 Dmitry Boyarintsev Status resolved => closed
2010-01-13 06:34 Dmitry Boyarintsev Note Added: 0033602