View Issue Details

IDProjectCategoryView StatusLast Update
0037469FPCCompilerpublic2020-10-23 00:14
ReporterSergey Larin Assigned To 
PrioritynormalSeverityminorReproducibilityN/A
Status newResolutionopen 
Platformx86_64OSWindows 
Product Version3.3.1 
Summary0037469: Win64 TLS section_threadvars implementation
DescriptionThis patch offers native threadvar implementation for Windows x64 with partial use of the existing "section_threadvars" functionality.
Additional InformationUsing SECTION_THREADVAR for Win64 is enabled if you define TEST_WIN64_SECTION_THREADVAR and do not define SUPPORT_WINXP.

Later, if necessary, it will be possible to rename TEST_WIN64_SECTION_THREADVARS to tls_threadvars as in i_linux.pas and i_bsd.pas

On Windows Vista and later, dynamically-loaded DLLs via LoadLibrary also full support Thread Local Storage.
Therefore, if someone still needs support for Windows XP (x64), then you need to define SUPPORT_WINXP and the old threadvars implementation will be used.

The unfinished and probably never used previous "native threadvar implementation" added in rev. 23359 has been completely removed.

SECREL32 relocation support is added in both AT&T asmoutput and Internal assembler.

In TDebugInfoDwarf.appendsym_var_with_name_type_offset, the `target_info.system in [system_x86_64_win64]` check is added, otherwise an internalerror(200905071) occurs in TCoffObjOutput.section_write_relocs.

In node_complexity, the estimation of the complexity of using threadvar has been changed. If the helper call is required, then the estimate is greatly increased (offhand 10 times), because the previous value was clearly underestimated.

If you specify that you need to use old binutils (cs_asm_re_binutils_2_25), then using SECREL32 relocation fails. Erroneous offsets are generated. Tested on version 2.21

It turned out that the Internal linker cannot be used in its current state. Offsets for `$threadvar@secrel32`,`_tls_index(%rip)` are generated incorrectly, and an "Incompatible section options" error occurs in TExeSection.AddObjSection.
The presented patch did not fix problems with the Internal linker.
In this regard, an External linker was used (ld.exe)

There was also a problem that the tls callback FreePascal_TLS_callback is not called if an external linker (ld.exe) is used. This error apparently was always regardless of this patch.
A workaround is provided in the tls_callback-ld-workaround.patch
Tagscompiler, optimization, patch, x86_64-win64
Fixed in Revision
FPCOldBugId
FPCTarget
Attached Files

Activities

Sergey Larin

2020-08-02 21:54

reporter  

tls-win64-section_threadvars.patch (31,067 bytes)   
---
 compiler/cgbase.pas        |  1 +
 compiler/dbgdwarf.pas      |  3 +-
 compiler/nld.pas           |  4 +-
 compiler/nutils.pas        |  9 +++--
 compiler/ogcoff.pas        | 22 ++--------
 compiler/systems/i_win.pas |  7 ++++
 compiler/x86/aasmcpu.pas   |  6 +++
 compiler/x86/agx86att.pas  |  9 +++++
 compiler/x86/nx86ld.pas    | 83 ++++++++++++++++++++++++--------------
 rtl/inc/heaptrc.pp         |  4 ++
 rtl/linux/si_impl.inc      |  2 +
 rtl/win/sysosh.inc         |  5 +--
 rtl/win/systhrd.inc        | 19 +++++++--
 rtl/win/systlsdir.inc      | 13 ++++--
 rtl/win/syswin.inc         |  6 ++-
 rtl/win32/sysinit.inc      | 10 +++++
 rtl/win32/system.pp        | 37 +----------------
 rtl/win64/sysinit.pp       | 10 +++++
 rtl/win64/system.pp        | 43 +-------------------
 19 files changed, 152 insertions(+), 141 deletions(-)

diff --git a/compiler/cgbase.pas b/compiler/cgbase.pas
index 6dca71ba40..0e862ca739 100644
--- a/compiler/cgbase.pas
+++ b/compiler/cgbase.pas
@@ -140,6 +140,7 @@ interface
 {$ifdef x86_64}
           ,addr_tpoff
           ,addr_tlsgd
+          ,addr_secrel
 {$endif x86_64}
          );
 
diff --git a/compiler/dbgdwarf.pas b/compiler/dbgdwarf.pas
index 5b75e257b5..d81436ed2a 100644
--- a/compiler/dbgdwarf.pas
+++ b/compiler/dbgdwarf.pas
@@ -2658,7 +2658,8 @@ implementation
                   begin
                     if vo_is_thread_var in sym.varoptions then
                       begin
-                        if tf_section_threadvars in target_info.flags then
+                        if (tf_section_threadvars in target_info.flags) and
+                           (not (target_info.system in [system_x86_64_win64])) then // TODO : dwarf for threadvars on win64
                           begin
                             case sizeof(puint) of
                               2:
diff --git a/compiler/nld.pas b/compiler/nld.pas
index e2bccbc080..0453f162e8 100644
--- a/compiler/nld.pas
+++ b/compiler/nld.pas
@@ -450,7 +450,9 @@ implementation
                 { call to get address of threadvar }
                 if (vo_is_thread_var in tabstractvarsym(symtableentry).varoptions) then
                   begin
-                    include(current_procinfo.flags,pi_do_call);
+                    if (not (tf_section_threadvars in target_info.flags)) or
+                       (current_settings.tlsmodel in [tlsm_local_dynamic,tlsm_global_dynamic]) then
+                      include(current_procinfo.flags,pi_do_call);
                     include(current_procinfo.flags,pi_uses_threadvar);
                   end;
               end;
diff --git a/compiler/nutils.pas b/compiler/nutils.pas
index d586fc222d..b5c0de1937 100644
--- a/compiler/nutils.pas
+++ b/compiler/nutils.pas
@@ -191,7 +191,7 @@ interface
 implementation
 
     uses
-      cutils,verbose,globals,
+      cutils,verbose,globals,systems,
       symconst,symdef,
       defcmp,defutil,
       ncon,ncnv,nld,nflw,nset,ncal,nadd,nmem,ninl,
@@ -799,10 +799,13 @@ implementation
                 begin
                   if assigned(tloadnode(p).left) then
                     inc(result,node_complexity(tloadnode(p).left));
-                  { threadvars need a helper call }
+                  { some threadvars models need a helper call }
                   if (tloadnode(p).symtableentry.typ=staticvarsym) and
                      (vo_is_thread_var in tstaticvarsym(tloadnode(p).symtableentry).varoptions) then
-                    inc(result,5)
+                    if (tf_section_threadvars in target_info.flags) and (current_settings.tlsmodel in [tlsm_local_exec,tlsm_initial_exec]) then
+                      inc(result,3)
+                    else
+                      inc(result,50)
                   else if not((tloadnode(p).symtableentry.typ in [staticvarsym,localvarsym,paravarsym,fieldvarsym]) and
                     (tabstractvarsym(tloadnode(p).symtableentry).varregable in [vr_intreg,vr_mmreg,vr_fpureg])) then
                     inc(result);
diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 8c05b3792a..3e361d10fd 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -2667,21 +2667,17 @@ const pemagic : array[0..3] of byte = (
 
         procedure UpdateTlsDataDir;
         var
-          {callbacksection : TExeSection;}
           tlsexesymbol: TExeSymbol;
           tlssymbol: TObjSymbol;
-          callbackexesymbol: TExeSymbol;
-          //callbacksymbol: TObjSymbol;
         begin
           { according to GNU ld,
             the callback routines should be placed into .CRT$XL*
             sections, and the thread local variables in .tls
             __tls_start__ and __tls_end__ symbols
-            should be used for the initialized part,
-            which we do not support yet. }
-          { For now, we only pass the address of the __tls_used
+            should be used for the initialized part. }
+          { We pass the address of the __tls_used
             asm symbol into PE_DATADIR_TLS with the correct
-            size of this table (different for win32/win64 }
+            size of this table (different for win32/win64) }
           tlsexesymbol:=texesymbol(ExeSymbolList.Find(
             target_info.Cprefix+'_tls_used'));
           if assigned(tlsexesymbol) then
@@ -2690,18 +2686,6 @@ const pemagic : array[0..3] of byte = (
               peoptheader.DataDirectory[PE_DATADIR_TLS].vaddr:=tlssymbol.address;
               { sizeof(TlsDirectory) is different on host and target when cross-compiling }
               peoptheader.DataDirectory[PE_DATADIR_TLS].size:=TLSDIR_SIZE;
-              if IsSharedLibrary then
-                begin
-                  { Here we should reset __FPC_tls_callbacks value to nil }
-                  callbackexesymbol:=texesymbol(ExeSymbolList.Find(
-                                        '__FPC_tls_callbacks'));
-                  if assigned (callbackexesymbol) then
-                    begin
-                      //callbacksymbol:=callbackexesymbol.ObjSymbol;
-
-                    end;
-                end;
-
            end;
         end;
 
diff --git a/compiler/systems/i_win.pas b/compiler/systems/i_win.pas
index d4e8f842d5..0e469042e2 100644
--- a/compiler/systems/i_win.pas
+++ b/compiler/systems/i_win.pas
@@ -107,6 +107,9 @@ unit i_win;
             shortname    : 'Win64';
             flags        : [tf_files_case_aware,tf_has_dllscanner,
                             tf_smartlink_sections,
+{$if defined(TEST_WIN64_SECTION_THREADVARS) and not defined(SUPPORT_WINXP)}
+                            tf_section_threadvars,
+{$endif}
                             tf_winlikewidestring,tf_no_pic_supported,
                             tf_dwarf_only_local_labels,
                             tf_no_generic_stackcheck,tf_has_winlike_resources,
@@ -139,7 +142,11 @@ unit i_win;
             dirsep       : '\';
             assem        : as_x86_64_pecoff;
             assemextern  : as_gas;
+{$if defined(TEST_WIN64_SECTION_THREADVARS) and not defined(SUPPORT_WINXP)}
+            link         : ld_windows;
+{$else}
             link         : ld_int_windows;
+{$endif}
             linkextern   : ld_windows;
             ar           : ar_gnu_ar;
             res          : res_gnu_windres;
diff --git a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
index f712af7196..115474e6d1 100644
--- a/compiler/x86/aasmcpu.pas
+++ b/compiler/x86/aasmcpu.pas
@@ -3617,6 +3617,12 @@ implementation
                       currabsreloc:=RELOC_TLSGD;
                       currabsreloc32:=RELOC_TLSGD;
                     end
+                  else if oper[opidx]^.ref^.refaddr=addr_secrel then
+                    begin
+                      currrelreloc:=RELOC_SECREL32;
+                      currabsreloc:=RELOC_SECREL32;
+                      currabsreloc32:=RELOC_SECREL32;
+                    end
                   else
 {$endif x86_64}
                     begin
diff --git a/compiler/x86/agx86att.pas b/compiler/x86/agx86att.pas
index 657f073163..bf69e08bd5 100644
--- a/compiler/x86/agx86att.pas
+++ b/compiler/x86/agx86att.pas
@@ -190,6 +190,8 @@ interface
                owner.writer.AsmWrite('@tpoff');
              addr_tlsgd:
                owner.writer.AsmWrite('@tlsgd');
+             addr_secrel:
+               owner.writer.AsmWrite('@secrel32');
 {$endif x86_64}
              else
                ;
@@ -259,6 +261,13 @@ interface
               {$ifdef x86_64},addr_tpoff,addr_tlsgd{$endif x86_64}
               ] then
               WriteReference(o.ref^)
+        {$ifdef x86_64}
+            else if o.ref^.refaddr=addr_secrel then
+              begin
+                owner.writer.AsmWrite('$');
+                WriteReference(o.ref^);
+              end
+        {$endif x86_64}
             else
               begin
                 owner.writer.AsmWrite('$');
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 774addd1ca..eed22bff5d 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -38,7 +38,7 @@ interface
 implementation
 
     uses
-      globals,
+      globals,fmodule,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
       cgutils,cgobj,
@@ -54,37 +54,10 @@ implementation
       var
         paraloc1 : tcgpara;
         pd: tprocdef;
-        href: treference;
-        hregister : tregister;
-        handled: boolean;
+        href, href_index: treference;
+        hregister, reg_index, reg_offset, reg_start : tregister;
       begin
-        handled:=false;
-        if (tf_section_threadvars in target_info.flags) then
-          begin
-            if target_info.system in [system_i386_win32,system_x86_64_win64] then
-              begin
-                paraloc1.init;
-                pd:=search_system_proc('fpc_tls_add');
-                paramanager.getcgtempparaloc(current_asmdata.CurrAsmList,pd,1,paraloc1);
-                if not(vo_is_weak_external in gvs.varoptions) then
-                  reference_reset_symbol(href,current_asmdata.RefAsmSymbol(gvs.mangledname,AT_DATA,use_indirect_symbol(gvs)),0,sizeof(pint),[])
-                else
-                  reference_reset_symbol(href,current_asmdata.WeakRefAsmSymbol(gvs.mangledname,AT_DATA),0,sizeof(pint),[]);
-                cg.a_loadaddr_ref_cgpara(current_asmdata.CurrAsmList,href,paraloc1);
-                paramanager.freecgpara(current_asmdata.CurrAsmList,paraloc1);
-                paraloc1.done;
-
-                cg.g_call(current_asmdata.CurrAsmList,'FPC_TLS_ADD');
-                cg.ungetcpuregister(current_asmdata.CurrAsmList,NR_FUNCTION_RESULT_REG);
-                hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
-                cg.a_load_reg_reg(current_asmdata.CurrAsmList,OS_ADDR,OS_ADDR,NR_FUNCTION_RESULT_REG,hregister);
-                location.reference.base:=hregister;
-                handled:=true;
-              end;
-          end;
-
-        if not handled then
-          inherited;
+        inherited;
 
         if (tf_section_threadvars in target_info.flags) then
           begin
@@ -156,6 +129,54 @@ implementation
                       Internalerror(2019012002);
                   end;
                 end;
+              system_x86_64_win64:
+                begin
+                  reference_reset(href,sizeof(AInt),[]); href.segment:=NR_GS; href.offset:=88;
+                  hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
+
+                  reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
+                  if (cs_link_nolink in current_settings.globalswitches) or
+                     (main_module=nil) or main_module.IsLibrary then
+                    begin // _tls_index <> 0
+                      // Internal linker - Error: Incompatible section options
+                      if not ((cs_link_extern in current_settings.globalswitches) or (target_info.link=ld_windows)) then
+                        Comment(V_Error,'Section threadvars currently requires external linker');
+
+                      reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
+                      href_index.base:=NR_RIP;
+                      reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                      current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href_index,
+                        newreg(getregtype(reg_index),getsupreg(reg_index),R_SUBD)));                       // movl _tls_index(%rip), %edx
+                      href.index:=reg_index; href.scalefactor:=8;
+                    end;
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq (%rcx), %rcx  |  movq (%rcx,%rdx,8), %rcx
+
+                  reg_offset:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                  href:=location.reference; // from inherited
+
+                if not (cs_asm_pre_binutils_2_25 in current_settings.globalswitches) and
+                   ((cs_link_extern in current_settings.globalswitches) or (target_info.link=ld_windows)) then
+                  begin
+                    href.refaddr:=addr_secrel;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href,
+                      newreg(getregtype(reg_offset),getsupreg(reg_offset),R_SUBD)));                       // movl $threadvar@secrel32, %rax
+                  end
+                else // Old binutils (2.21) and Internal linker not support $threadvar@secrel32
+                  begin
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_offset));      // movq $threadvar, %rax
+                    reg_start:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                    reference_reset_symbol(href,current_asmdata.RefAsmSymbol('___tls_start__',AT_TLS),0,sizeof(AInt),[]);
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_start));       // movq $___tls_start__, %rdx
+                    current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_SUB,S_Q,reg_start,reg_offset)); // subq %rdx, %rax
+                  end;
+
+                  reference_reset(location.reference,location.reference.alignment,location.reference.volatility);
+                  location.reference.base:=hregister;
+                  location.reference.index:=reg_offset;                                                    // (%rcx,%rax)
+                end;
               else
                 ;
             end;
diff --git a/rtl/inc/heaptrc.pp b/rtl/inc/heaptrc.pp
index 920f448293..da7caade9f 100644
--- a/rtl/inc/heaptrc.pp
+++ b/rtl/inc/heaptrc.pp
@@ -1035,11 +1035,13 @@ var
    edata : ptruint; external name '__data_end__';
    sbss : ptruint; external name '__bss_start__';
    ebss : ptruint; external name '__bss_end__';
+{$ifndef FPC_SECTION_THREADVARS}
    TLSKey : PDWord; external name '_FPC_TlsKey';
    TLSSize : DWord; external name '_FPC_TlsSize';
 
 function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
+{$endif !FPC_SECTION_THREADVARS}
 {$endif}
 
 {$ifdef BEOS}
@@ -1104,6 +1106,7 @@ begin
   { inside data, rdata ... bss }
   if (ptruint(p)>=ptruint(@sdata)) and (ptruint(p)<ptruint(@ebss)) then
     exit;
+  {$ifndef FPC_SECTION_THREADVARS}
   { is program multi-threaded and p inside Threadvar range? }
   if TlsKey^<>-1 then
     begin
@@ -1112,6 +1115,7 @@ begin
           (ptruint(p)<ptruint(datap)+TlsSize)) then
         exit;
     end;
+  {$endif !FPC_SECTION_THREADVARS}
 {$endif windows}
 
 {$IFDEF OS2}
diff --git a/rtl/linux/si_impl.inc b/rtl/linux/si_impl.inc
index a727310551..e3ceee427e 100644
--- a/rtl/linux/si_impl.inc
+++ b/rtl/linux/si_impl.inc
@@ -20,7 +20,9 @@ procedure SysEntry_InitTLS(constref info: TEntryInformation); external name 'FPC
 
 var
   InitFinalTable : record end; external name 'INITFINAL';
+  {$ifndef FPC_SECTION_THREADVARS}
   ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+  {$endif}
   {$ifdef FPC_HAS_RESSTRINITS}
   ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
   {$endif}
diff --git a/rtl/win/sysosh.inc b/rtl/win/sysosh.inc
index 5bee0d4c17..4b65a1d41d 100644
--- a/rtl/win/sysosh.inc
+++ b/rtl/win/sysosh.inc
@@ -56,7 +56,9 @@ type
     {$ifdef WIN32}
     asm_exit : Procedure;stdcall;
     {$endif WIN32}
+    {$ifndef FPC_SECTION_THREADVARS}
     TlsKeyAddr : PDWord;
+    {$endif !FPC_SECTION_THREADVARS}
     SysInstance: {$ifdef CPU64}PQWord{$else}PLongInt{$endif};
     WideInitTables : Pointer;
   end;
@@ -117,7 +119,4 @@ var
 {$ifdef FPC_USE_WIN64_SEH}
 procedure _fpc_local_unwind(frame,target: Pointer);compilerproc;
 {$endif FPC_USE_WIN64_SEH}
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer):pointer;compilerproc;
-{$endif FPC_SECTION_THREADVARS}
 
diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 10bb07fda4..1abc04b53e 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -98,6 +98,10 @@ var
                              Threadvar support
 *****************************************************************************}
 
+    var
+      MainThreadIdWin32 : DWORD;
+      
+{$ifndef FPC_SECTION_THREADVARS}
     var
       // public names are used by heaptrc unit
       threadvarblocksize : dword; public name '_FPC_TlsSize';
@@ -108,9 +112,6 @@ var
       TLSKey : PDWord = @TLSKeyVar; public name '_FPC_TlsKey';
       {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 
-    var
-      MainThreadIdWin32 : DWORD;
-
     procedure SysInitThreadvar(var offset : dword;size : dword);
       begin
         offset:=threadvarblocksize;
@@ -199,6 +200,7 @@ var
             TlsSetValue(tlskey^, nil);
           end;
       end;
+{$endif !FPC_SECTION_THREADVARS}
 
 
 {*****************************************************************************
@@ -255,7 +257,9 @@ var
         writeln('Creating new thread');
 {$endif DEBUG_MT}
         { Initialize multithreading if not done }
+{$ifndef FPC_SECTION_THREADVARS}
         SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
         IsMultiThread:=true;
 
         { the only way to pass data to the newly created thread
@@ -623,10 +627,17 @@ begin
     EnterCriticalSection   :=@SysEnterCriticalSection;
     TryEnterCriticalSection:=@SysTryEnterCriticalSection;
     LeaveCriticalSection   :=@SysLeaveCriticalSection;
+{$ifndef FPC_SECTION_THREADVARS}
     InitThreadVar          :=@SysInitThreadVar;
     RelocateThreadVar      :=@SysRelocateThreadVar;
     AllocateThreadVars     :=@SysAllocateThreadVars;
     ReleaseThreadVars      :=@SysReleaseThreadVars;
+{$else}
+    InitThreadVar          :=Nil;
+    RelocateThreadVar      :=Nil;
+    AllocateThreadVars     :=Nil;
+    ReleaseThreadVars      :=Nil;
+{$endif FPC_SECTION_THREADVARS}
     BasicEventCreate       :=@intBasicEventCreate;
     BasicEventDestroy      :=@intBasicEventDestroy;
     BasicEventResetEvent   :=@intBasicEventResetEvent;
@@ -641,10 +652,12 @@ begin
     end;
   SetThreadManager(WinThreadManager);
   ThreadID := GetCurrentThreadID;
+{$ifndef FPC_SECTION_THREADVARS}
 {$ifndef FPC_USE_TLS_DIRECTORY}
   if IsLibrary then
 {$endif}
     SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifndef WINCE}
   KernelHandle:=GetModuleHandle(KernelDLL);
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index db53b77a94..328f0b141a 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -32,8 +32,10 @@ Const
   DLL_PROCESS_DETACH = 0;
   DLL_THREAD_DETACH = 3;
 
+{$ifndef FPC_SECTION_THREADVARS}
 var
    TlsKey : PDWord = @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
 
 type
   TTlsDirectory=packed record
@@ -47,9 +49,11 @@ type
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
+{$ifndef FPC_SECTION_THREADVARS}
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
 procedure InitHeap; external name '_FPC_InitHeap';
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 procedure SetupEntryInformation(constref info: TEntryInformation); external name '_FPC_SetupEntryInformation';
@@ -58,7 +62,7 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 {$endif FPC_INSSIDE_SYSINIT}
 
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
-  stdcall; [public,alias:'_FPC_Tls_Callback'];
+  stdcall;
   begin
      if IsLibrary then
        Exit;
@@ -77,6 +81,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          and the EntryInformation is a constant which sholud prevent troubles }
        DLL_PROCESS_ATTACH:
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
            { since this procedure is called before SetupEntryInformation and thus
              before EXE_Entry we need to setup the entry information here so that
@@ -85,10 +90,12 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
            InitHeap;
            InitSystemThreads;
+         {$endif !FPC_SECTION_THREADVARS}
          end;
 
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
          {  !!! SysInitMultithreading must NOT be called here. Windows guarantees that
             the main thread invokes PROCESS_ATTACH, not THREAD_ATTACH. So this always
             executes in non-main thread. SysInitMultithreading() here will cause
@@ -100,6 +107,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -107,13 +115,12 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
-           if TlsGetValue(TLSKey^)<>nil then
+           if ThreadID<>0 then
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
   end;
 
-
 { Mingw tlssup.c source code has
   _CRTALLOC(".CRT$XLA") PIMAGE_TLS_CALLBACK __xl_a = 0;
   _CRTALLOC(".CRT$XLZ") PIMAGE_TLS_CALLBACK __xl_z = 0;
diff --git a/rtl/win/syswin.inc b/rtl/win/syswin.inc
index 63ee752569..829d432ff5 100644
--- a/rtl/win/syswin.inc
+++ b/rtl/win/syswin.inc
@@ -408,10 +408,12 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
          end;
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            { SysInitMultithreading must not be called here,
              see comments in exec_tls_callback below }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -425,7 +427,7 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
            if assigned(Dll_Thread_Detach_Hook) then
              Dll_Thread_Detach_Hook(DllParam);
            { Release Threadvars }
-           if TlsGetValue(TLSKey^)<>nil then
+           if ThreadID<>0 then
              DoneThread; { Assume everything is idempotent there }
          end;
        DLL_PROCESS_DETACH :
@@ -439,9 +441,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
                InternalExit;
              end;
 
+         {$ifndef FPC_SECTION_THREADVARS}
            SysReleaseThreadVars;
            { Free TLS resources used by ThreadVars }
            SysFiniMultiThreading;
+         {$endif !FPC_SECTION_THREADVARS}
            MainThreadIDWin32:=0;
          end;
      end;
diff --git a/rtl/win32/sysinit.inc b/rtl/win32/sysinit.inc
index 0eea501af0..c8f7b6f042 100644
--- a/rtl/win32/sysinit.inc
+++ b/rtl/win32/sysinit.inc
@@ -16,10 +16,14 @@
 
    var
       SysInstance : LongInt;
+ {$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+ {$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+ {$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+ {$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
  {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -53,7 +57,11 @@
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -65,7 +73,9 @@
         valgrind_used : false;
         OS : (
           asm_exit : @asm_exit;
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win32/system.pp b/rtl/win32/system.pp
index 4ffa0b65ff..4cf69de94d 100644
--- a/rtl/win32/system.pp
+++ b/rtl/win32/system.pp
@@ -78,7 +78,9 @@ end;
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -498,41 +500,6 @@ begin
 end;
 {$endif Set_i386_Exception_handler}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $tls_data_start,%eax
-      cmpb  $0,IsLibrary
-      mov   _tls_index,%ecx
-      jnz   .L1
-      mov   %fs:(0x2c),%edx
-      add   (%edx,%ecx,4),%eax
-      ret
-.L1:
-      push  %ebx
-      mov   %eax,%ebx
-      call  GetLastError
-      push  %eax                      { save LastError }
-      push  _tls_index
-      call  TlsGetValue
-      test  %eax,%eax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%eax
-      call  InitThread
-      push  _tls_index
-      call  TlsGetValue
-.L2:
-      add   %eax,%ebx
-      call  SetLastError              { restore (value is on stack) }
-      mov   %ebx,%eax
-      pop   %ebx
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 5d396beb8f..e27668a0cd 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -21,10 +21,14 @@ unit sysinit;
 
    var
       SysInstance : QWord;
+{$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+{$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+{$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+{$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
 {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -58,7 +62,11 @@ unit sysinit;
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -69,7 +77,9 @@ unit sysinit;
         PascalMain : @PascalMain;
         valgrind_used : false;
         OS : (
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index 6cf7b174cd..b6fad06fff 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -122,7 +122,9 @@ procedure PascalMain;external name 'PASCALMAIN';
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -504,47 +506,6 @@ begin
 end;
 {$endif VER3_0}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $56,%rsp                  { 32 spill area + 16 local vars + 8 misalignment }
-  .seh_stackalloc 56
-  .seh_endprologue
-      lea   tls_data_start(%rip),%rax
-      sub   %rax,%rcx
-      cmpb  $0,IsLibrary(%rip)
-      mov   _tls_index(%rip),%eax
-      jnz   .L1
-      mov   %gs:(88),%rdx
-      add   (%rdx,%rax,8),%rcx
-      mov   %rcx,%rax
-      jmp   .L3
-.L1:
-      mov   %rcx,32(%rsp)
-      call  GetLastError
-      mov   %rax,40(%rsp)             { save LastError }
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-      test  %rax,%rax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%rcx
-      call  InitThread
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-.L2:
-      add   %rax,32(%rsp)
-      mov   40(%rsp),%rcx
-      call  SetLastError
-      mov   32(%rsp),%rax
-.L3:
-      add   $56,%rsp
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
-- 
2.26.2.windows.1

tls_callback-ld-workaround.patch (631 bytes)   
---
 rtl/win/systlsdir.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index 328f0b141a..b7abf26d58 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -146,7 +146,7 @@ const
     data_start : @tls_data_start;
     data_end : @tls_data_end;
     index_pointer : @_tls_index;
-    callbacks_pointer : @tls_callbacks;
+    callbacks_pointer : @FreePascal_TLS_callback{@tls_callbacks}; // Workaround for the problem of not calling Exec_Tls_callback when using ld.exe
     zero_fill_size : 0;
     flags : 0;
   ); cvar; public;
-- 
2.26.2.windows.1

Sergey Larin

2020-08-02 22:02

reporter   ~0124517

If the exe is compiled, _tls_index is always 0 when running, so optimization is applied with this fact in mind.

Sergey Larin

2020-08-02 22:11

reporter   ~0124518

Note: If someone, for some reason, used the trick of creating their own tls callback by using `section '.CRT$XL*'` (which is generally very unlikely and undocumented), then this will stop working.

Sergey Larin

2020-08-03 11:24

reporter   ~0124525

To prevent calling DoneThread if PHP_SECTION_THREADVARS is not defined, the `TlsGetValue(TLSKey^)<>nil` check is used (as before), otherwise SysAllocateThreadVars will be called on the DLL_THREAD_DETACH event, which leads to a memory leak.
Fixup tls-win64-section_threadvars-fixup1.patch
tls-win64-section_threadvars-fixup1.patch (865 bytes)   
---
 rtl/win/systlsdir.inc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index 328f0b141a..af60fee15b 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -115,12 +115,17 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
+         {$ifdef FPC_SECTION_THREADVARS}
            if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
+           if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
   end;
 
+
 { Mingw tlssup.c source code has
   _CRTALLOC(".CRT$XLA") PIMAGE_TLS_CALLBACK __xl_a = 0;
   _CRTALLOC(".CRT$XLZ") PIMAGE_TLS_CALLBACK __xl_z = 0;
-- 
2.26.2.windows.1

Sergey Larin

2020-08-04 10:04

reporter   ~0124536

Another minor tweak.
Removed local imports of TlsAlloc, TlsFree, TlsGetValue, TlsSetValue, LocalAlloc, LocalFree functions if FPC_SECTION_THREADVARS is defined.
tls-win64-section_threadvars-fixup2.patch (2,770 bytes)   
---
 rtl/win/systhrd.inc   | 8 +++++---
 rtl/win/systlsdir.inc | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 1abc04b53e..dfa583fee7 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -19,6 +19,7 @@
                            Local WINApi imports
 *****************************************************************************}
 
+{$ifndef FPC_SECTION_THREADVARS}
 const
   { LocalAlloc flags  }
   LMEM_FIXED = 0;
@@ -34,15 +35,16 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
 function TlsSetValue(dwTlsIndex : DWord;lpTlsValue : pointer) : LongBool;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsSetValue';
+function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
+  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
+function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
+{$endif !FPC_SECTION_THREADVARS}
 function CreateThread(lpThreadAttributes : pointer;
   dwStackSize : SIZE_T; lpStartAddress : pointer;lpParameter : pointer;
   dwCreationFlags : DWord;var lpThreadId : DWord) : THandle;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'CreateThread';
 procedure ExitThread(dwExitCode : DWord);
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ExitThread';
-function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
-  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
-function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
 procedure Sleep(dwMilliseconds: DWord); {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'Sleep';
 function  WinSuspendThread (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'SuspendThread';
 function  WinResumeThread  (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ResumeThread';
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index af60fee15b..28bd4e9be3 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -46,10 +46,10 @@ type
   end;
 
 
+{$ifndef FPC_SECTION_THREADVARS}
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
-{$ifndef FPC_SECTION_THREADVARS}
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
 procedure InitHeap; external name '_FPC_InitHeap';
-- 
2.26.2.windows.1

Sergey Larin

2020-08-05 10:32

reporter   ~0124577

For GNU assembler, the .tls section is generated with the "b" (bss section) flag to be the same as the Internal assembler.
Internal linker implements RELOC_SECREL32 address adjustment for threadvar.
In this way, the Internal linker is activated.
Fixed missing generation of TLS Directory in dll.
SECTION_THREADVAR for Win64 is enabled by default. You can disable it if you define DISABLE_TLS_DIRECTORY or SUPPORT_WINXP.

Cumulative patch - tls-win64-section_threadvars-v3.patch
Patch that solves the problem of not calling tls callback when using ld.exe - tls_callback-ld-workaround.patch
tls-win64-section_threadvars-v3.patch (34,401 bytes)   
---
 compiler/aggas.pas         |  4 +-
 compiler/cgbase.pas        |  1 +
 compiler/dbgdwarf.pas      |  3 +-
 compiler/nld.pas           |  4 +-
 compiler/nutils.pas        |  9 +++--
 compiler/ogcoff.pas        | 28 ++++----------
 compiler/systems/i_win.pas |  3 ++
 compiler/x86/aasmcpu.pas   |  6 +++
 compiler/x86/agx86att.pas  |  9 +++++
 compiler/x86/nx86ld.pas    | 79 +++++++++++++++++++++++---------------
 rtl/inc/heaptrc.pp         |  4 ++
 rtl/linux/si_impl.inc      |  2 +
 rtl/win/sysosh.inc         |  5 +--
 rtl/win/systhrd.inc        | 27 ++++++++++---
 rtl/win/systlsdir.inc      | 14 ++++++-
 rtl/win/syswin.inc         |  8 ++++
 rtl/win32/sysinit.inc      | 10 +++++
 rtl/win32/system.pp        | 37 +-----------------
 rtl/win64/sysinit.pp       | 16 ++++++++
 rtl/win64/system.pp        | 43 +--------------------
 20 files changed, 167 insertions(+), 145 deletions(-)

diff --git a/compiler/aggas.pas b/compiler/aggas.pas
index 41d20c5129..f9b69f605b 100644
--- a/compiler/aggas.pas
+++ b/compiler/aggas.pas
@@ -533,7 +533,9 @@ implementation
              { according to the GNU AS guide AS for COFF does not support the
                progbits }
              writer.AsmWrite('.section ');
-             usesectionflags:=true;
+             { .tls section requires special processing }
+             if not((atype=sec_threadvar) and (tf_section_threadvars in target_info.flags)) then
+               usesectionflags:=true;
            end;
          system_powerpc_darwin,
          system_i386_darwin,
diff --git a/compiler/cgbase.pas b/compiler/cgbase.pas
index 6dca71ba40..0e862ca739 100644
--- a/compiler/cgbase.pas
+++ b/compiler/cgbase.pas
@@ -140,6 +140,7 @@ interface
 {$ifdef x86_64}
           ,addr_tpoff
           ,addr_tlsgd
+          ,addr_secrel
 {$endif x86_64}
          );
 
diff --git a/compiler/dbgdwarf.pas b/compiler/dbgdwarf.pas
index 5b75e257b5..d81436ed2a 100644
--- a/compiler/dbgdwarf.pas
+++ b/compiler/dbgdwarf.pas
@@ -2658,7 +2658,8 @@ implementation
                   begin
                     if vo_is_thread_var in sym.varoptions then
                       begin
-                        if tf_section_threadvars in target_info.flags then
+                        if (tf_section_threadvars in target_info.flags) and
+                           (not (target_info.system in [system_x86_64_win64])) then // TODO : dwarf for threadvars on win64
                           begin
                             case sizeof(puint) of
                               2:
diff --git a/compiler/nld.pas b/compiler/nld.pas
index e2bccbc080..0453f162e8 100644
--- a/compiler/nld.pas
+++ b/compiler/nld.pas
@@ -450,7 +450,9 @@ implementation
                 { call to get address of threadvar }
                 if (vo_is_thread_var in tabstractvarsym(symtableentry).varoptions) then
                   begin
-                    include(current_procinfo.flags,pi_do_call);
+                    if (not (tf_section_threadvars in target_info.flags)) or
+                       (current_settings.tlsmodel in [tlsm_local_dynamic,tlsm_global_dynamic]) then
+                      include(current_procinfo.flags,pi_do_call);
                     include(current_procinfo.flags,pi_uses_threadvar);
                   end;
               end;
diff --git a/compiler/nutils.pas b/compiler/nutils.pas
index d586fc222d..b5c0de1937 100644
--- a/compiler/nutils.pas
+++ b/compiler/nutils.pas
@@ -191,7 +191,7 @@ interface
 implementation
 
     uses
-      cutils,verbose,globals,
+      cutils,verbose,globals,systems,
       symconst,symdef,
       defcmp,defutil,
       ncon,ncnv,nld,nflw,nset,ncal,nadd,nmem,ninl,
@@ -799,10 +799,13 @@ implementation
                 begin
                   if assigned(tloadnode(p).left) then
                     inc(result,node_complexity(tloadnode(p).left));
-                  { threadvars need a helper call }
+                  { some threadvars models need a helper call }
                   if (tloadnode(p).symtableentry.typ=staticvarsym) and
                      (vo_is_thread_var in tstaticvarsym(tloadnode(p).symtableentry).varoptions) then
-                    inc(result,5)
+                    if (tf_section_threadvars in target_info.flags) and (current_settings.tlsmodel in [tlsm_local_exec,tlsm_initial_exec]) then
+                      inc(result,3)
+                    else
+                      inc(result,50)
                   else if not((tloadnode(p).symtableentry.typ in [staticvarsym,localvarsym,paravarsym,fieldvarsym]) and
                     (tabstractvarsym(tloadnode(p).symtableentry).varregable in [vr_intreg,vr_mmreg,vr_fpureg])) then
                     inc(result);
diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 8c05b3792a..803b9ebad3 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -1015,8 +1015,10 @@ const pemagic : array[0..3] of byte = (
                   end;
                 RELOC_SECREL32 :
                   begin
-                    { fixup address when the symbol was known in defined object }
-                    if (relocsec.objdata=objsec.objdata) then
+                    { fixup address when the symbol was known in defined object or
+                      when the symbol is a threadvar }
+                    if (relocsec.objdata=objsec.objdata) or
+                       assigned(objreloc.symbol) and (copy(objreloc.symbol.objsection.Name,1,4)='.tls') then
                       dec(address,relocsec.ExeSection.MemPos);
                     inc(address,relocval);
                   end;
@@ -2667,21 +2669,17 @@ const pemagic : array[0..3] of byte = (
 
         procedure UpdateTlsDataDir;
         var
-          {callbacksection : TExeSection;}
           tlsexesymbol: TExeSymbol;
           tlssymbol: TObjSymbol;
-          callbackexesymbol: TExeSymbol;
-          //callbacksymbol: TObjSymbol;
         begin
           { according to GNU ld,
             the callback routines should be placed into .CRT$XL*
             sections, and the thread local variables in .tls
             __tls_start__ and __tls_end__ symbols
-            should be used for the initialized part,
-            which we do not support yet. }
-          { For now, we only pass the address of the __tls_used
+            should be used for the initialized part. }
+          { We pass the address of the __tls_used
             asm symbol into PE_DATADIR_TLS with the correct
-            size of this table (different for win32/win64 }
+            size of this table (different for win32/win64) }
           tlsexesymbol:=texesymbol(ExeSymbolList.Find(
             target_info.Cprefix+'_tls_used'));
           if assigned(tlsexesymbol) then
@@ -2690,18 +2688,6 @@ const pemagic : array[0..3] of byte = (
               peoptheader.DataDirectory[PE_DATADIR_TLS].vaddr:=tlssymbol.address;
               { sizeof(TlsDirectory) is different on host and target when cross-compiling }
               peoptheader.DataDirectory[PE_DATADIR_TLS].size:=TLSDIR_SIZE;
-              if IsSharedLibrary then
-                begin
-                  { Here we should reset __FPC_tls_callbacks value to nil }
-                  callbackexesymbol:=texesymbol(ExeSymbolList.Find(
-                                        '__FPC_tls_callbacks'));
-                  if assigned (callbackexesymbol) then
-                    begin
-                      //callbacksymbol:=callbackexesymbol.ObjSymbol;
-
-                    end;
-                end;
-
            end;
         end;
 
diff --git a/compiler/systems/i_win.pas b/compiler/systems/i_win.pas
index d4e8f842d5..d1ffb4e57c 100644
--- a/compiler/systems/i_win.pas
+++ b/compiler/systems/i_win.pas
@@ -107,6 +107,9 @@ unit i_win;
             shortname    : 'Win64';
             flags        : [tf_files_case_aware,tf_has_dllscanner,
                             tf_smartlink_sections,
+{$if not defined(DISABLE_TLS_DIRECTORY) and not defined(SUPPORT_WINXP)}
+                            tf_section_threadvars,
+{$endif}
                             tf_winlikewidestring,tf_no_pic_supported,
                             tf_dwarf_only_local_labels,
                             tf_no_generic_stackcheck,tf_has_winlike_resources,
diff --git a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
index f712af7196..115474e6d1 100644
--- a/compiler/x86/aasmcpu.pas
+++ b/compiler/x86/aasmcpu.pas
@@ -3617,6 +3617,12 @@ implementation
                       currabsreloc:=RELOC_TLSGD;
                       currabsreloc32:=RELOC_TLSGD;
                     end
+                  else if oper[opidx]^.ref^.refaddr=addr_secrel then
+                    begin
+                      currrelreloc:=RELOC_SECREL32;
+                      currabsreloc:=RELOC_SECREL32;
+                      currabsreloc32:=RELOC_SECREL32;
+                    end
                   else
 {$endif x86_64}
                     begin
diff --git a/compiler/x86/agx86att.pas b/compiler/x86/agx86att.pas
index 657f073163..bf69e08bd5 100644
--- a/compiler/x86/agx86att.pas
+++ b/compiler/x86/agx86att.pas
@@ -190,6 +190,8 @@ interface
                owner.writer.AsmWrite('@tpoff');
              addr_tlsgd:
                owner.writer.AsmWrite('@tlsgd');
+             addr_secrel:
+               owner.writer.AsmWrite('@secrel32');
 {$endif x86_64}
              else
                ;
@@ -259,6 +261,13 @@ interface
               {$ifdef x86_64},addr_tpoff,addr_tlsgd{$endif x86_64}
               ] then
               WriteReference(o.ref^)
+        {$ifdef x86_64}
+            else if o.ref^.refaddr=addr_secrel then
+              begin
+                owner.writer.AsmWrite('$');
+                WriteReference(o.ref^);
+              end
+        {$endif x86_64}
             else
               begin
                 owner.writer.AsmWrite('$');
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 774addd1ca..61c82755d4 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -38,10 +38,10 @@ interface
 implementation
 
     uses
-      globals,
+      globals,fmodule,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
-      cgutils,cgobj,
+      cgutils,cgobj,hlcgobj,
       symconst,symdef,symtable,
       cgbase,cpubase,parabase,paramgr,
       procinfo;
@@ -54,37 +54,10 @@ implementation
       var
         paraloc1 : tcgpara;
         pd: tprocdef;
-        href: treference;
-        hregister : tregister;
-        handled: boolean;
+        href, href_index: treference;
+        hregister, reg_index, reg_offset, reg_start : tregister;
       begin
-        handled:=false;
-        if (tf_section_threadvars in target_info.flags) then
-          begin
-            if target_info.system in [system_i386_win32,system_x86_64_win64] then
-              begin
-                paraloc1.init;
-                pd:=search_system_proc('fpc_tls_add');
-                paramanager.getcgtempparaloc(current_asmdata.CurrAsmList,pd,1,paraloc1);
-                if not(vo_is_weak_external in gvs.varoptions) then
-                  reference_reset_symbol(href,current_asmdata.RefAsmSymbol(gvs.mangledname,AT_DATA,use_indirect_symbol(gvs)),0,sizeof(pint),[])
-                else
-                  reference_reset_symbol(href,current_asmdata.WeakRefAsmSymbol(gvs.mangledname,AT_DATA),0,sizeof(pint),[]);
-                cg.a_loadaddr_ref_cgpara(current_asmdata.CurrAsmList,href,paraloc1);
-                paramanager.freecgpara(current_asmdata.CurrAsmList,paraloc1);
-                paraloc1.done;
-
-                cg.g_call(current_asmdata.CurrAsmList,'FPC_TLS_ADD');
-                cg.ungetcpuregister(current_asmdata.CurrAsmList,NR_FUNCTION_RESULT_REG);
-                hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
-                cg.a_load_reg_reg(current_asmdata.CurrAsmList,OS_ADDR,OS_ADDR,NR_FUNCTION_RESULT_REG,hregister);
-                location.reference.base:=hregister;
-                handled:=true;
-              end;
-          end;
-
-        if not handled then
-          inherited;
+        inherited;
 
         if (tf_section_threadvars in target_info.flags) then
           begin
@@ -156,6 +129,48 @@ implementation
                       Internalerror(2019012002);
                   end;
                 end;
+              system_x86_64_win64:
+                begin
+                  reference_reset(href,sizeof(AInt),[]); href.segment:=NR_GS; href.offset:=88;
+                  hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
+
+                  reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
+                  if (main_module=nil) or main_module.IsLibrary or main_module.is_unit then
+                    begin // _tls_index <> 0
+                      reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
+                      reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                      hlcg.a_load_ref_reg(current_asmdata.CurrAsmList,u32inttype,u64inttype,
+                        href_index,reg_index);                                                             // movl _tls_index(%rip), %edx
+                      href.index:=reg_index; href.scalefactor:=8;
+                    end;
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq (%rcx), %rcx  |  movq (%rcx,%rdx,8), %rcx
+
+                  reg_offset:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                  href:=location.reference; // from inherited
+
+                if not (cs_asm_pre_binutils_2_25 in current_settings.globalswitches) then
+                  begin
+                    href.refaddr:=addr_secrel;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href,
+                      newreg(getregtype(reg_offset),getsupreg(reg_offset),R_SUBD)));                       // movl $threadvar@secrel32, %rax
+                  end
+                else // Old binutils (2.21) not support $threadvar@secrel32
+                  begin
+                    href.symbol.typ:=AT_DATA;
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_offset));      // movq $threadvar, %rax
+                    reg_start:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                    reference_reset_symbol(href,current_asmdata.RefAsmSymbol('___tls_start__',AT_DATA),0,sizeof(AInt),[]);
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_start));       // movq $___tls_start__, %rdx
+                    current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_SUB,S_Q,reg_start,reg_offset)); // subq %rdx, %rax
+                  end;
+
+                  reference_reset(location.reference,location.reference.alignment,location.reference.volatility);
+                  location.reference.base:=hregister;
+                  location.reference.index:=reg_offset;                                                    // (%rcx,%rax)
+                end;
               else
                 ;
             end;
diff --git a/rtl/inc/heaptrc.pp b/rtl/inc/heaptrc.pp
index 920f448293..da7caade9f 100644
--- a/rtl/inc/heaptrc.pp
+++ b/rtl/inc/heaptrc.pp
@@ -1035,11 +1035,13 @@ var
    edata : ptruint; external name '__data_end__';
    sbss : ptruint; external name '__bss_start__';
    ebss : ptruint; external name '__bss_end__';
+{$ifndef FPC_SECTION_THREADVARS}
    TLSKey : PDWord; external name '_FPC_TlsKey';
    TLSSize : DWord; external name '_FPC_TlsSize';
 
 function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
+{$endif !FPC_SECTION_THREADVARS}
 {$endif}
 
 {$ifdef BEOS}
@@ -1104,6 +1106,7 @@ begin
   { inside data, rdata ... bss }
   if (ptruint(p)>=ptruint(@sdata)) and (ptruint(p)<ptruint(@ebss)) then
     exit;
+  {$ifndef FPC_SECTION_THREADVARS}
   { is program multi-threaded and p inside Threadvar range? }
   if TlsKey^<>-1 then
     begin
@@ -1112,6 +1115,7 @@ begin
           (ptruint(p)<ptruint(datap)+TlsSize)) then
         exit;
     end;
+  {$endif !FPC_SECTION_THREADVARS}
 {$endif windows}
 
 {$IFDEF OS2}
diff --git a/rtl/linux/si_impl.inc b/rtl/linux/si_impl.inc
index a727310551..e3ceee427e 100644
--- a/rtl/linux/si_impl.inc
+++ b/rtl/linux/si_impl.inc
@@ -20,7 +20,9 @@ procedure SysEntry_InitTLS(constref info: TEntryInformation); external name 'FPC
 
 var
   InitFinalTable : record end; external name 'INITFINAL';
+  {$ifndef FPC_SECTION_THREADVARS}
   ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+  {$endif}
   {$ifdef FPC_HAS_RESSTRINITS}
   ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
   {$endif}
diff --git a/rtl/win/sysosh.inc b/rtl/win/sysosh.inc
index 5bee0d4c17..4b65a1d41d 100644
--- a/rtl/win/sysosh.inc
+++ b/rtl/win/sysosh.inc
@@ -56,7 +56,9 @@ type
     {$ifdef WIN32}
     asm_exit : Procedure;stdcall;
     {$endif WIN32}
+    {$ifndef FPC_SECTION_THREADVARS}
     TlsKeyAddr : PDWord;
+    {$endif !FPC_SECTION_THREADVARS}
     SysInstance: {$ifdef CPU64}PQWord{$else}PLongInt{$endif};
     WideInitTables : Pointer;
   end;
@@ -117,7 +119,4 @@ var
 {$ifdef FPC_USE_WIN64_SEH}
 procedure _fpc_local_unwind(frame,target: Pointer);compilerproc;
 {$endif FPC_USE_WIN64_SEH}
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer):pointer;compilerproc;
-{$endif FPC_SECTION_THREADVARS}
 
diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 10bb07fda4..dfa583fee7 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -19,6 +19,7 @@
                            Local WINApi imports
 *****************************************************************************}
 
+{$ifndef FPC_SECTION_THREADVARS}
 const
   { LocalAlloc flags  }
   LMEM_FIXED = 0;
@@ -34,15 +35,16 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
 function TlsSetValue(dwTlsIndex : DWord;lpTlsValue : pointer) : LongBool;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsSetValue';
+function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
+  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
+function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
+{$endif !FPC_SECTION_THREADVARS}
 function CreateThread(lpThreadAttributes : pointer;
   dwStackSize : SIZE_T; lpStartAddress : pointer;lpParameter : pointer;
   dwCreationFlags : DWord;var lpThreadId : DWord) : THandle;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'CreateThread';
 procedure ExitThread(dwExitCode : DWord);
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ExitThread';
-function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
-  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
-function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
 procedure Sleep(dwMilliseconds: DWord); {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'Sleep';
 function  WinSuspendThread (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'SuspendThread';
 function  WinResumeThread  (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ResumeThread';
@@ -98,6 +100,10 @@ var
                              Threadvar support
 *****************************************************************************}
 
+    var
+      MainThreadIdWin32 : DWORD;
+      
+{$ifndef FPC_SECTION_THREADVARS}
     var
       // public names are used by heaptrc unit
       threadvarblocksize : dword; public name '_FPC_TlsSize';
@@ -108,9 +114,6 @@ var
       TLSKey : PDWord = @TLSKeyVar; public name '_FPC_TlsKey';
       {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 
-    var
-      MainThreadIdWin32 : DWORD;
-
     procedure SysInitThreadvar(var offset : dword;size : dword);
       begin
         offset:=threadvarblocksize;
@@ -199,6 +202,7 @@ var
             TlsSetValue(tlskey^, nil);
           end;
       end;
+{$endif !FPC_SECTION_THREADVARS}
 
 
 {*****************************************************************************
@@ -255,7 +259,9 @@ var
         writeln('Creating new thread');
 {$endif DEBUG_MT}
         { Initialize multithreading if not done }
+{$ifndef FPC_SECTION_THREADVARS}
         SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
         IsMultiThread:=true;
 
         { the only way to pass data to the newly created thread
@@ -623,10 +629,17 @@ begin
     EnterCriticalSection   :=@SysEnterCriticalSection;
     TryEnterCriticalSection:=@SysTryEnterCriticalSection;
     LeaveCriticalSection   :=@SysLeaveCriticalSection;
+{$ifndef FPC_SECTION_THREADVARS}
     InitThreadVar          :=@SysInitThreadVar;
     RelocateThreadVar      :=@SysRelocateThreadVar;
     AllocateThreadVars     :=@SysAllocateThreadVars;
     ReleaseThreadVars      :=@SysReleaseThreadVars;
+{$else}
+    InitThreadVar          :=Nil;
+    RelocateThreadVar      :=Nil;
+    AllocateThreadVars     :=Nil;
+    ReleaseThreadVars      :=Nil;
+{$endif FPC_SECTION_THREADVARS}
     BasicEventCreate       :=@intBasicEventCreate;
     BasicEventDestroy      :=@intBasicEventDestroy;
     BasicEventResetEvent   :=@intBasicEventResetEvent;
@@ -641,10 +654,12 @@ begin
     end;
   SetThreadManager(WinThreadManager);
   ThreadID := GetCurrentThreadID;
+{$ifndef FPC_SECTION_THREADVARS}
 {$ifndef FPC_USE_TLS_DIRECTORY}
   if IsLibrary then
 {$endif}
     SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifndef WINCE}
   KernelHandle:=GetModuleHandle(KernelDLL);
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index db53b77a94..28bd4e9be3 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -32,8 +32,10 @@ Const
   DLL_PROCESS_DETACH = 0;
   DLL_THREAD_DETACH = 3;
 
+{$ifndef FPC_SECTION_THREADVARS}
 var
    TlsKey : PDWord = @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
 
 type
   TTlsDirectory=packed record
@@ -44,12 +46,14 @@ type
   end;
 
 
+{$ifndef FPC_SECTION_THREADVARS}
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
 procedure InitHeap; external name '_FPC_InitHeap';
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 procedure SetupEntryInformation(constref info: TEntryInformation); external name '_FPC_SetupEntryInformation';
@@ -58,7 +62,7 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 {$endif FPC_INSSIDE_SYSINIT}
 
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
-  stdcall; [public,alias:'_FPC_Tls_Callback'];
+  stdcall;
   begin
      if IsLibrary then
        Exit;
@@ -77,6 +81,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          and the EntryInformation is a constant which sholud prevent troubles }
        DLL_PROCESS_ATTACH:
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
            { since this procedure is called before SetupEntryInformation and thus
              before EXE_Entry we need to setup the entry information here so that
@@ -85,10 +90,12 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
            InitHeap;
            InitSystemThreads;
+         {$endif !FPC_SECTION_THREADVARS}
          end;
 
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
          {  !!! SysInitMultithreading must NOT be called here. Windows guarantees that
             the main thread invokes PROCESS_ATTACH, not THREAD_ATTACH. So this always
             executes in non-main thread. SysInitMultithreading() here will cause
@@ -100,6 +107,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -107,7 +115,11 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
diff --git a/rtl/win/syswin.inc b/rtl/win/syswin.inc
index 63ee752569..37fb574f33 100644
--- a/rtl/win/syswin.inc
+++ b/rtl/win/syswin.inc
@@ -408,10 +408,12 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
          end;
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            { SysInitMultithreading must not be called here,
              see comments in exec_tls_callback below }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -425,7 +427,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
            if assigned(Dll_Thread_Detach_Hook) then
              Dll_Thread_Detach_Hook(DllParam);
            { Release Threadvars }
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
        DLL_PROCESS_DETACH :
@@ -439,9 +445,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
                InternalExit;
              end;
 
+         {$ifndef FPC_SECTION_THREADVARS}
            SysReleaseThreadVars;
            { Free TLS resources used by ThreadVars }
            SysFiniMultiThreading;
+         {$endif !FPC_SECTION_THREADVARS}
            MainThreadIDWin32:=0;
          end;
      end;
diff --git a/rtl/win32/sysinit.inc b/rtl/win32/sysinit.inc
index 0eea501af0..c8f7b6f042 100644
--- a/rtl/win32/sysinit.inc
+++ b/rtl/win32/sysinit.inc
@@ -16,10 +16,14 @@
 
    var
       SysInstance : LongInt;
+ {$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+ {$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+ {$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+ {$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
  {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -53,7 +57,11 @@
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -65,7 +73,9 @@
         valgrind_used : false;
         OS : (
           asm_exit : @asm_exit;
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win32/system.pp b/rtl/win32/system.pp
index 4ffa0b65ff..4cf69de94d 100644
--- a/rtl/win32/system.pp
+++ b/rtl/win32/system.pp
@@ -78,7 +78,9 @@ end;
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -498,41 +500,6 @@ begin
 end;
 {$endif Set_i386_Exception_handler}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $tls_data_start,%eax
-      cmpb  $0,IsLibrary
-      mov   _tls_index,%ecx
-      jnz   .L1
-      mov   %fs:(0x2c),%edx
-      add   (%edx,%ecx,4),%eax
-      ret
-.L1:
-      push  %ebx
-      mov   %eax,%ebx
-      call  GetLastError
-      push  %eax                      { save LastError }
-      push  _tls_index
-      call  TlsGetValue
-      test  %eax,%eax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%eax
-      call  InitThread
-      push  _tls_index
-      call  TlsGetValue
-.L2:
-      add   %eax,%ebx
-      call  SetLastError              { restore (value is on stack) }
-      mov   %ebx,%eax
-      pop   %ebx
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 5d396beb8f..9d740e19ec 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -21,10 +21,14 @@ unit sysinit;
 
    var
       SysInstance : QWord;
+{$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+{$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+{$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+{$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
 {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -58,7 +62,11 @@ unit sysinit;
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -69,7 +77,9 @@ unit sysinit;
         PascalMain : @PascalMain;
         valgrind_used : false;
         OS : (
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
@@ -124,6 +134,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
@@ -135,6 +148,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index 6cf7b174cd..b6fad06fff 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -122,7 +122,9 @@ procedure PascalMain;external name 'PASCALMAIN';
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -504,47 +506,6 @@ begin
 end;
 {$endif VER3_0}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $56,%rsp                  { 32 spill area + 16 local vars + 8 misalignment }
-  .seh_stackalloc 56
-  .seh_endprologue
-      lea   tls_data_start(%rip),%rax
-      sub   %rax,%rcx
-      cmpb  $0,IsLibrary(%rip)
-      mov   _tls_index(%rip),%eax
-      jnz   .L1
-      mov   %gs:(88),%rdx
-      add   (%rdx,%rax,8),%rcx
-      mov   %rcx,%rax
-      jmp   .L3
-.L1:
-      mov   %rcx,32(%rsp)
-      call  GetLastError
-      mov   %rax,40(%rsp)             { save LastError }
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-      test  %rax,%rax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%rcx
-      call  InitThread
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-.L2:
-      add   %rax,32(%rsp)
-      mov   40(%rsp),%rcx
-      call  SetLastError
-      mov   32(%rsp),%rax
-.L3:
-      add   $56,%rsp
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
-- 
2.26.2.windows.1

Sven Barth

2020-08-05 21:55

manager   ~0124584

First of, thank you for implementing this.

I have not yet tested it, but I have a few points I noticed while looking through your changes:
- if you disable Windows XP/2003 support then maybe you should set the OS version of the PE binary to 6.0 so that it won't run on systems older than Vista (I don't know if ld allows it, but the internal linker definitely does) - it's done in TCoffexeoutput.writedata, look for Major-/MinorOperatingSystemVersion and probably also Major-/MinorSubsystemVersion
- why are you using main_module in nx86ld? I have the feeling that you're using it for the wrong reasons
- is there a reason why you use the TLS callback for libraries as well when the normal entry point should be enough?
- why did you remove the alias of the Exec_tls_callback? It's still used for the tls_callback variable
- why did you change rtl/linux/si_impl.inc?
- the DLL_PROCESS_ATTACH initialization for the heap in Exec_tls_callback needs to be kept, because if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH of the executable will be called so that the RTL for the thread can be initialized (we're not doing lazy initialization for each thread) and that needs the code in DLL_PROCESS_ATTACH to be run first

Additionally please also test access to thread variables contained in a dynamic package (which are essentially libraries). I don't know how this needs to be done for section threadsvars, but for the default implementation it's done using an indirect access.

You can test it as follows:
- add tf_supports_packages to system_x64_win64_info.flags and build the compiler as well as the RTL (or rebuild the whole source, your choice)
- build the RTL package by compiling the attached package file (this will generate a rtl.ppl and rtl.pcp)
- build a test program that accesses a threadvar in the RTL package (e.g. one of those around line 804 in rtl\inc\systemh.inc) passing "-FPrtl -Fppath\where\pcp\resides" as additional parameters
rtl.ppk (644 bytes)   
package Rtl;

contains
  System,
  ObjPas,
  MacPas,
  Iso7185,
  FPIntRes,
{$ifdef darwin}
  FPExtRes,
{$endif}
  Classes,
  SysUtils,
  character,
  charset,
  cpu,
  dos,
  dynlibs,
  exeinfo,
  fgl,
  fpwidestring,
  getopts,
  heaptrc,
  lineinfo,
  lnfodwrf,
  math,
  rtlconsts,
  strings,
  sysconst,
  types,
  typinfo,
  uuchar,
{$ifdef linux}
  syscall,
  linux,
  linuxvcs,
{$endif}
{$ifdef unix}
  ports,
  errors,
  dl,
  unixcp,
  unix,
  unixtype,
  unixutil,
  termio,
  baseunix,
  x86,
{$endif}
{$ifdef cpux86}
  mmx,
{$endif}
{$ifdef windows}
  Windows,
{$endif}
  { some unit at the end after the defines }
  ctypes;

end.
rtl.ppk (644 bytes)   

Sergey Larin

2020-08-06 19:15

reporter   ~0124623

Last edited: 2020-08-10 10:40

View 5 revisions

> - if you disable Windows XP/2003 support then maybe you should set the OS version of the PE binary to 6.0 so that it won't run on systems older than Vista (I don't know if ld allows it, but the internal linker definitely does) - it's done in TCoffexeoutput.writedata, look for Major-/MinorOperatingSystemVersion and probably also Major-/MinorSubsystemVersion

Implemented for both internal linker and ld.exe


> - why are you using main_module in nx86ld? I have the feeling that you're using it for the wrong reasons

Using the `not ((main_module=nil) or main_module.IsLibrary or main_module.is_unit)` check, I determine that the exe is currently being compiled and, accordingly, I can apply the optimization that _tls_index is 0.
If there is another more correct way, please tell us about it.


> - is there a reason why you use the TLS callback for libraries as well when the normal entry point should be enough?

If the symbol '_tls_used' is not referenced, then it will be removed by the linker, the dll will not contain the TLS Directory, the OS will not install tls_index. Therefore, the symbol '_tls_used' must be referenced in _FPC_DLLMainCRTStartup, _FPC_DLLWinMainCRTStartup.
The TLS callback itself is not needed, but this is not a problem, because at the beginning of Exec_Tls_callback there is a check `if IsLibrary then Exit;`


> - why did you remove the alias of the Exec_tls_callback? It's still used for the tls_callback variable

The alias name was '_FPC_Tls_Callback', not'__FPC_tls_callbacks'. '_FPC_Tls_Callback' is not used anywhere.


> - why did you change rtl/linux/si_impl.inc?

In general, it is at the same time.
The variable ThreadvarTablesTable is declared as `external name 'FPC_THREADVARTABLES'`.
The symbol named 'FPC_THREADVARTABLES' defines compiler only if `not (tf_section_threadvars in target_info.flags)`.
When linking, the presence of the ThreadvarTablesTable variable if `tf_section_threadvars in target_info.flags` causes an error on Windows, but not on Linux. But this is potentially a error.


> - the DLL_PROCESS_ATTACH initialization for the heap in Exec_tls_callback needs to be kept, because if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH of the executable will be called so that the RTL for the thread can be initialized (we're not doing lazy initialization for each thread) and that needs the code in DLL_PROCESS_ATTACH to be run first

Done. Added a comment.


> Additionally please also test access to thread variables contained in a dynamic package (which are essentially libraries).

Already when compiling pp.pas using ppc1.exe an error occurs: Undefined symbol: U_$SYSTEM_$$_THREADID$indirect (first seen in sysinit.o)
When tf_section_threadvars is disabled, the test application using the dynamic Rtl package works fine when accessing the ThreadID, for example.
In general, it seems that native threadvar support for dynamic packages is either not very easy to implement, or not possible at all. MSVC, for example, prohibits exporting threadvar from the dll, i.e. ' __declspec(dllexport) __declspec(thread) ` you can't write.
The current implementation is definitely not enough, because at least to correctly access threadvar from another library, you need to use tls_index from this other library, and not tls_index from the current module.
I also tried to do this in FPC on Linux. Fail.
An error occurs when compiling rtl.ppk: system.o: relocation R_X86_64_PC32 against symbol `SYSTEM_$$_HANDLEERRORADDRFRAME$LONGINT$POINTER$POINTER' can not be used when making a shared object; recompile with -fPIC
If you compile FPC (make all OPT='-fPIC') itself with -fPIC, then the error is the same.
Apparently now if you need to use tf_support_packages, then tf_section_threadvars must be disabled somehow.

Increment patch - tls-win64-section_threadvars-fixup4.patch
Cumulative patch - tls-win64-section_threadvars-v4.patch

Sergey Larin

2020-08-06 19:22

reporter   ~0124625

tls-win64-section_threadvars-fixup4.patch (4,936 bytes)   
---
 compiler/ogcoff.pas        |  7 ++++++-
 compiler/systems/t_win.pas | 27 ++++++++++++++++++++++++++-
 rtl/win/systlsdir.inc      |  8 ++++++--
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 803b9ebad3..90bbca8b0d 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -2776,7 +2776,10 @@ const pemagic : array[0..3] of byte = (
               end
             else
               begin
-                peoptheader.MajorOperatingSystemVersion:=4;
+                if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorOperatingSystemVersion:=6 // >= Windows Vista
+                else
+                  peoptheader.MajorOperatingSystemVersion:=4;
                 peoptheader.MinorOperatingSystemVersion:=0;
               end;
             if SetPEUserVersionSetExplicitely then
@@ -2798,6 +2801,8 @@ const pemagic : array[0..3] of byte = (
               begin
                 if target_info.system in systems_wince then
                   peoptheader.MajorSubsystemVersion:=3
+                else if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorSubsystemVersion:=6 // >= Windows Vista
                 else
                   peoptheader.MajorSubsystemVersion:=4;
                 peoptheader.MinorSubsystemVersion:=0;
diff --git a/compiler/systems/t_win.pas b/compiler/systems/t_win.pas
index 887787f8e9..b96f820a3c 100644
--- a/compiler/systems/t_win.pas
+++ b/compiler/systems/t_win.pas
@@ -1711,11 +1711,36 @@ implementation
             else
               ;
           end;
-        if dllversion<>'' then
+        if SetPEUserVersionSetExplicitely then
+          begin
+            peoptheader.MajorImageVersion:=peuserversionmajor;
+            peoptheader.MinorImageVersion:=peuserversionminor;
+          end
+        else if dllversion<>'' then
           begin
            peoptheader.MajorImageVersion:=dllmajor;
            peoptheader.MinorImageVersion:=dllminor;
           end;
+        if SetPEOSVersionSetExplicitely then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=peosversionmajor;
+            peoptheader.MinorOperatingSystemVersion:=peosversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorOperatingSystemVersion:=0;
+          end;
+        if SetPESubSysVersionSetExplicitely then
+          begin
+            peoptheader.MajorSubsystemVersion:=pesubsysversionmajor;
+            peoptheader.MinorSubsystemVersion:=pesubsysversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorSubsystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorSubsystemVersion:=0;
+          end;
         { reset timestamp }
         peheader.time:=0;
         { write header back, skip pe magic }
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index 28bd4e9be3..d0b48a89a5 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -52,8 +52,8 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
 
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
-procedure InitHeap; external name '_FPC_InitHeap';
 {$endif !FPC_SECTION_THREADVARS}
+procedure InitHeap; external name '_FPC_InitHeap';
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 procedure SetupEntryInformation(constref info: TEntryInformation); external name '_FPC_SetupEntryInformation';
@@ -81,14 +81,18 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          and the EntryInformation is a constant which sholud prevent troubles }
        DLL_PROCESS_ATTACH:
          begin
-         {$ifndef FPC_SECTION_THREADVARS}
            {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
            { since this procedure is called before SetupEntryInformation and thus
              before EXE_Entry we need to setup the entry information here so that
              the threadvar handling can be correctly initialized }
            SetupEntryInformation(SysInitEntryInformation);
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
+           { the DLL_PROCESS_ATTACH initialization for the heap is required, because
+             if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH
+             of the executable will be called so that the RTL for the thread can be initialized
+             and that needs the code in DLL_PROCESS_ATTACH to be run first }
            InitHeap;
+         {$ifndef FPC_SECTION_THREADVARS}
            InitSystemThreads;
          {$endif !FPC_SECTION_THREADVARS}
          end;
-- 
2.26.2.windows.1

tls-win64-section_threadvars-v4.patch (37,088 bytes)   
---
 compiler/aggas.pas         |  4 +-
 compiler/cgbase.pas        |  1 +
 compiler/dbgdwarf.pas      |  3 +-
 compiler/nld.pas           |  4 +-
 compiler/nutils.pas        |  9 +++--
 compiler/ogcoff.pas        | 35 +++++++----------
 compiler/systems/i_win.pas |  3 ++
 compiler/systems/t_win.pas | 27 ++++++++++++-
 compiler/x86/aasmcpu.pas   |  6 +++
 compiler/x86/agx86att.pas  |  9 +++++
 compiler/x86/nx86ld.pas    | 79 +++++++++++++++++++++++---------------
 rtl/inc/heaptrc.pp         |  4 ++
 rtl/linux/si_impl.inc      |  2 +
 rtl/win/sysosh.inc         |  5 +--
 rtl/win/systhrd.inc        | 27 ++++++++++---
 rtl/win/systlsdir.inc      | 18 ++++++++-
 rtl/win/syswin.inc         |  8 ++++
 rtl/win32/sysinit.inc      | 10 +++++
 rtl/win32/system.pp        | 37 +-----------------
 rtl/win64/sysinit.pp       | 16 ++++++++
 rtl/win64/system.pp        | 43 +--------------------
 21 files changed, 203 insertions(+), 147 deletions(-)

diff --git a/compiler/aggas.pas b/compiler/aggas.pas
index 41d20c5129..f9b69f605b 100644
--- a/compiler/aggas.pas
+++ b/compiler/aggas.pas
@@ -533,7 +533,9 @@ implementation
              { according to the GNU AS guide AS for COFF does not support the
                progbits }
              writer.AsmWrite('.section ');
-             usesectionflags:=true;
+             { .tls section requires special processing }
+             if not((atype=sec_threadvar) and (tf_section_threadvars in target_info.flags)) then
+               usesectionflags:=true;
            end;
          system_powerpc_darwin,
          system_i386_darwin,
diff --git a/compiler/cgbase.pas b/compiler/cgbase.pas
index 6dca71ba40..0e862ca739 100644
--- a/compiler/cgbase.pas
+++ b/compiler/cgbase.pas
@@ -140,6 +140,7 @@ interface
 {$ifdef x86_64}
           ,addr_tpoff
           ,addr_tlsgd
+          ,addr_secrel
 {$endif x86_64}
          );
 
diff --git a/compiler/dbgdwarf.pas b/compiler/dbgdwarf.pas
index 5b75e257b5..d81436ed2a 100644
--- a/compiler/dbgdwarf.pas
+++ b/compiler/dbgdwarf.pas
@@ -2658,7 +2658,8 @@ implementation
                   begin
                     if vo_is_thread_var in sym.varoptions then
                       begin
-                        if tf_section_threadvars in target_info.flags then
+                        if (tf_section_threadvars in target_info.flags) and
+                           (not (target_info.system in [system_x86_64_win64])) then // TODO : dwarf for threadvars on win64
                           begin
                             case sizeof(puint) of
                               2:
diff --git a/compiler/nld.pas b/compiler/nld.pas
index e2bccbc080..0453f162e8 100644
--- a/compiler/nld.pas
+++ b/compiler/nld.pas
@@ -450,7 +450,9 @@ implementation
                 { call to get address of threadvar }
                 if (vo_is_thread_var in tabstractvarsym(symtableentry).varoptions) then
                   begin
-                    include(current_procinfo.flags,pi_do_call);
+                    if (not (tf_section_threadvars in target_info.flags)) or
+                       (current_settings.tlsmodel in [tlsm_local_dynamic,tlsm_global_dynamic]) then
+                      include(current_procinfo.flags,pi_do_call);
                     include(current_procinfo.flags,pi_uses_threadvar);
                   end;
               end;
diff --git a/compiler/nutils.pas b/compiler/nutils.pas
index d586fc222d..b5c0de1937 100644
--- a/compiler/nutils.pas
+++ b/compiler/nutils.pas
@@ -191,7 +191,7 @@ interface
 implementation
 
     uses
-      cutils,verbose,globals,
+      cutils,verbose,globals,systems,
       symconst,symdef,
       defcmp,defutil,
       ncon,ncnv,nld,nflw,nset,ncal,nadd,nmem,ninl,
@@ -799,10 +799,13 @@ implementation
                 begin
                   if assigned(tloadnode(p).left) then
                     inc(result,node_complexity(tloadnode(p).left));
-                  { threadvars need a helper call }
+                  { some threadvars models need a helper call }
                   if (tloadnode(p).symtableentry.typ=staticvarsym) and
                      (vo_is_thread_var in tstaticvarsym(tloadnode(p).symtableentry).varoptions) then
-                    inc(result,5)
+                    if (tf_section_threadvars in target_info.flags) and (current_settings.tlsmodel in [tlsm_local_exec,tlsm_initial_exec]) then
+                      inc(result,3)
+                    else
+                      inc(result,50)
                   else if not((tloadnode(p).symtableentry.typ in [staticvarsym,localvarsym,paravarsym,fieldvarsym]) and
                     (tabstractvarsym(tloadnode(p).symtableentry).varregable in [vr_intreg,vr_mmreg,vr_fpureg])) then
                     inc(result);
diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 8c05b3792a..90bbca8b0d 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -1015,8 +1015,10 @@ const pemagic : array[0..3] of byte = (
                   end;
                 RELOC_SECREL32 :
                   begin
-                    { fixup address when the symbol was known in defined object }
-                    if (relocsec.objdata=objsec.objdata) then
+                    { fixup address when the symbol was known in defined object or
+                      when the symbol is a threadvar }
+                    if (relocsec.objdata=objsec.objdata) or
+                       assigned(objreloc.symbol) and (copy(objreloc.symbol.objsection.Name,1,4)='.tls') then
                       dec(address,relocsec.ExeSection.MemPos);
                     inc(address,relocval);
                   end;
@@ -2667,21 +2669,17 @@ const pemagic : array[0..3] of byte = (
 
         procedure UpdateTlsDataDir;
         var
-          {callbacksection : TExeSection;}
           tlsexesymbol: TExeSymbol;
           tlssymbol: TObjSymbol;
-          callbackexesymbol: TExeSymbol;
-          //callbacksymbol: TObjSymbol;
         begin
           { according to GNU ld,
             the callback routines should be placed into .CRT$XL*
             sections, and the thread local variables in .tls
             __tls_start__ and __tls_end__ symbols
-            should be used for the initialized part,
-            which we do not support yet. }
-          { For now, we only pass the address of the __tls_used
+            should be used for the initialized part. }
+          { We pass the address of the __tls_used
             asm symbol into PE_DATADIR_TLS with the correct
-            size of this table (different for win32/win64 }
+            size of this table (different for win32/win64) }
           tlsexesymbol:=texesymbol(ExeSymbolList.Find(
             target_info.Cprefix+'_tls_used'));
           if assigned(tlsexesymbol) then
@@ -2690,18 +2688,6 @@ const pemagic : array[0..3] of byte = (
               peoptheader.DataDirectory[PE_DATADIR_TLS].vaddr:=tlssymbol.address;
               { sizeof(TlsDirectory) is different on host and target when cross-compiling }
               peoptheader.DataDirectory[PE_DATADIR_TLS].size:=TLSDIR_SIZE;
-              if IsSharedLibrary then
-                begin
-                  { Here we should reset __FPC_tls_callbacks value to nil }
-                  callbackexesymbol:=texesymbol(ExeSymbolList.Find(
-                                        '__FPC_tls_callbacks'));
-                  if assigned (callbackexesymbol) then
-                    begin
-                      //callbacksymbol:=callbackexesymbol.ObjSymbol;
-
-                    end;
-                end;
-
            end;
         end;
 
@@ -2790,7 +2776,10 @@ const pemagic : array[0..3] of byte = (
               end
             else
               begin
-                peoptheader.MajorOperatingSystemVersion:=4;
+                if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorOperatingSystemVersion:=6 // >= Windows Vista
+                else
+                  peoptheader.MajorOperatingSystemVersion:=4;
                 peoptheader.MinorOperatingSystemVersion:=0;
               end;
             if SetPEUserVersionSetExplicitely then
@@ -2812,6 +2801,8 @@ const pemagic : array[0..3] of byte = (
               begin
                 if target_info.system in systems_wince then
                   peoptheader.MajorSubsystemVersion:=3
+                else if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorSubsystemVersion:=6 // >= Windows Vista
                 else
                   peoptheader.MajorSubsystemVersion:=4;
                 peoptheader.MinorSubsystemVersion:=0;
diff --git a/compiler/systems/i_win.pas b/compiler/systems/i_win.pas
index d4e8f842d5..d1ffb4e57c 100644
--- a/compiler/systems/i_win.pas
+++ b/compiler/systems/i_win.pas
@@ -107,6 +107,9 @@ unit i_win;
             shortname    : 'Win64';
             flags        : [tf_files_case_aware,tf_has_dllscanner,
                             tf_smartlink_sections,
+{$if not defined(DISABLE_TLS_DIRECTORY) and not defined(SUPPORT_WINXP)}
+                            tf_section_threadvars,
+{$endif}
                             tf_winlikewidestring,tf_no_pic_supported,
                             tf_dwarf_only_local_labels,
                             tf_no_generic_stackcheck,tf_has_winlike_resources,
diff --git a/compiler/systems/t_win.pas b/compiler/systems/t_win.pas
index 887787f8e9..b96f820a3c 100644
--- a/compiler/systems/t_win.pas
+++ b/compiler/systems/t_win.pas
@@ -1711,11 +1711,36 @@ implementation
             else
               ;
           end;
-        if dllversion<>'' then
+        if SetPEUserVersionSetExplicitely then
+          begin
+            peoptheader.MajorImageVersion:=peuserversionmajor;
+            peoptheader.MinorImageVersion:=peuserversionminor;
+          end
+        else if dllversion<>'' then
           begin
            peoptheader.MajorImageVersion:=dllmajor;
            peoptheader.MinorImageVersion:=dllminor;
           end;
+        if SetPEOSVersionSetExplicitely then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=peosversionmajor;
+            peoptheader.MinorOperatingSystemVersion:=peosversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorOperatingSystemVersion:=0;
+          end;
+        if SetPESubSysVersionSetExplicitely then
+          begin
+            peoptheader.MajorSubsystemVersion:=pesubsysversionmajor;
+            peoptheader.MinorSubsystemVersion:=pesubsysversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorSubsystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorSubsystemVersion:=0;
+          end;
         { reset timestamp }
         peheader.time:=0;
         { write header back, skip pe magic }
diff --git a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
index f712af7196..115474e6d1 100644
--- a/compiler/x86/aasmcpu.pas
+++ b/compiler/x86/aasmcpu.pas
@@ -3617,6 +3617,12 @@ implementation
                       currabsreloc:=RELOC_TLSGD;
                       currabsreloc32:=RELOC_TLSGD;
                     end
+                  else if oper[opidx]^.ref^.refaddr=addr_secrel then
+                    begin
+                      currrelreloc:=RELOC_SECREL32;
+                      currabsreloc:=RELOC_SECREL32;
+                      currabsreloc32:=RELOC_SECREL32;
+                    end
                   else
 {$endif x86_64}
                     begin
diff --git a/compiler/x86/agx86att.pas b/compiler/x86/agx86att.pas
index 657f073163..bf69e08bd5 100644
--- a/compiler/x86/agx86att.pas
+++ b/compiler/x86/agx86att.pas
@@ -190,6 +190,8 @@ interface
                owner.writer.AsmWrite('@tpoff');
              addr_tlsgd:
                owner.writer.AsmWrite('@tlsgd');
+             addr_secrel:
+               owner.writer.AsmWrite('@secrel32');
 {$endif x86_64}
              else
                ;
@@ -259,6 +261,13 @@ interface
               {$ifdef x86_64},addr_tpoff,addr_tlsgd{$endif x86_64}
               ] then
               WriteReference(o.ref^)
+        {$ifdef x86_64}
+            else if o.ref^.refaddr=addr_secrel then
+              begin
+                owner.writer.AsmWrite('$');
+                WriteReference(o.ref^);
+              end
+        {$endif x86_64}
             else
               begin
                 owner.writer.AsmWrite('$');
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 774addd1ca..61c82755d4 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -38,10 +38,10 @@ interface
 implementation
 
     uses
-      globals,
+      globals,fmodule,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
-      cgutils,cgobj,
+      cgutils,cgobj,hlcgobj,
       symconst,symdef,symtable,
       cgbase,cpubase,parabase,paramgr,
       procinfo;
@@ -54,37 +54,10 @@ implementation
       var
         paraloc1 : tcgpara;
         pd: tprocdef;
-        href: treference;
-        hregister : tregister;
-        handled: boolean;
+        href, href_index: treference;
+        hregister, reg_index, reg_offset, reg_start : tregister;
       begin
-        handled:=false;
-        if (tf_section_threadvars in target_info.flags) then
-          begin
-            if target_info.system in [system_i386_win32,system_x86_64_win64] then
-              begin
-                paraloc1.init;
-                pd:=search_system_proc('fpc_tls_add');
-                paramanager.getcgtempparaloc(current_asmdata.CurrAsmList,pd,1,paraloc1);
-                if not(vo_is_weak_external in gvs.varoptions) then
-                  reference_reset_symbol(href,current_asmdata.RefAsmSymbol(gvs.mangledname,AT_DATA,use_indirect_symbol(gvs)),0,sizeof(pint),[])
-                else
-                  reference_reset_symbol(href,current_asmdata.WeakRefAsmSymbol(gvs.mangledname,AT_DATA),0,sizeof(pint),[]);
-                cg.a_loadaddr_ref_cgpara(current_asmdata.CurrAsmList,href,paraloc1);
-                paramanager.freecgpara(current_asmdata.CurrAsmList,paraloc1);
-                paraloc1.done;
-
-                cg.g_call(current_asmdata.CurrAsmList,'FPC_TLS_ADD');
-                cg.ungetcpuregister(current_asmdata.CurrAsmList,NR_FUNCTION_RESULT_REG);
-                hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
-                cg.a_load_reg_reg(current_asmdata.CurrAsmList,OS_ADDR,OS_ADDR,NR_FUNCTION_RESULT_REG,hregister);
-                location.reference.base:=hregister;
-                handled:=true;
-              end;
-          end;
-
-        if not handled then
-          inherited;
+        inherited;
 
         if (tf_section_threadvars in target_info.flags) then
           begin
@@ -156,6 +129,48 @@ implementation
                       Internalerror(2019012002);
                   end;
                 end;
+              system_x86_64_win64:
+                begin
+                  reference_reset(href,sizeof(AInt),[]); href.segment:=NR_GS; href.offset:=88;
+                  hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
+
+                  reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
+                  if (main_module=nil) or main_module.IsLibrary or main_module.is_unit then
+                    begin // _tls_index <> 0
+                      reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
+                      reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                      hlcg.a_load_ref_reg(current_asmdata.CurrAsmList,u32inttype,u64inttype,
+                        href_index,reg_index);                                                             // movl _tls_index(%rip), %edx
+                      href.index:=reg_index; href.scalefactor:=8;
+                    end;
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq (%rcx), %rcx  |  movq (%rcx,%rdx,8), %rcx
+
+                  reg_offset:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                  href:=location.reference; // from inherited
+
+                if not (cs_asm_pre_binutils_2_25 in current_settings.globalswitches) then
+                  begin
+                    href.refaddr:=addr_secrel;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href,
+                      newreg(getregtype(reg_offset),getsupreg(reg_offset),R_SUBD)));                       // movl $threadvar@secrel32, %rax
+                  end
+                else // Old binutils (2.21) not support $threadvar@secrel32
+                  begin
+                    href.symbol.typ:=AT_DATA;
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_offset));      // movq $threadvar, %rax
+                    reg_start:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                    reference_reset_symbol(href,current_asmdata.RefAsmSymbol('___tls_start__',AT_DATA),0,sizeof(AInt),[]);
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_start));       // movq $___tls_start__, %rdx
+                    current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_SUB,S_Q,reg_start,reg_offset)); // subq %rdx, %rax
+                  end;
+
+                  reference_reset(location.reference,location.reference.alignment,location.reference.volatility);
+                  location.reference.base:=hregister;
+                  location.reference.index:=reg_offset;                                                    // (%rcx,%rax)
+                end;
               else
                 ;
             end;
diff --git a/rtl/inc/heaptrc.pp b/rtl/inc/heaptrc.pp
index 920f448293..da7caade9f 100644
--- a/rtl/inc/heaptrc.pp
+++ b/rtl/inc/heaptrc.pp
@@ -1035,11 +1035,13 @@ var
    edata : ptruint; external name '__data_end__';
    sbss : ptruint; external name '__bss_start__';
    ebss : ptruint; external name '__bss_end__';
+{$ifndef FPC_SECTION_THREADVARS}
    TLSKey : PDWord; external name '_FPC_TlsKey';
    TLSSize : DWord; external name '_FPC_TlsSize';
 
 function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
+{$endif !FPC_SECTION_THREADVARS}
 {$endif}
 
 {$ifdef BEOS}
@@ -1104,6 +1106,7 @@ begin
   { inside data, rdata ... bss }
   if (ptruint(p)>=ptruint(@sdata)) and (ptruint(p)<ptruint(@ebss)) then
     exit;
+  {$ifndef FPC_SECTION_THREADVARS}
   { is program multi-threaded and p inside Threadvar range? }
   if TlsKey^<>-1 then
     begin
@@ -1112,6 +1115,7 @@ begin
           (ptruint(p)<ptruint(datap)+TlsSize)) then
         exit;
     end;
+  {$endif !FPC_SECTION_THREADVARS}
 {$endif windows}
 
 {$IFDEF OS2}
diff --git a/rtl/linux/si_impl.inc b/rtl/linux/si_impl.inc
index a727310551..e3ceee427e 100644
--- a/rtl/linux/si_impl.inc
+++ b/rtl/linux/si_impl.inc
@@ -20,7 +20,9 @@ procedure SysEntry_InitTLS(constref info: TEntryInformation); external name 'FPC
 
 var
   InitFinalTable : record end; external name 'INITFINAL';
+  {$ifndef FPC_SECTION_THREADVARS}
   ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+  {$endif}
   {$ifdef FPC_HAS_RESSTRINITS}
   ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
   {$endif}
diff --git a/rtl/win/sysosh.inc b/rtl/win/sysosh.inc
index 5bee0d4c17..4b65a1d41d 100644
--- a/rtl/win/sysosh.inc
+++ b/rtl/win/sysosh.inc
@@ -56,7 +56,9 @@ type
     {$ifdef WIN32}
     asm_exit : Procedure;stdcall;
     {$endif WIN32}
+    {$ifndef FPC_SECTION_THREADVARS}
     TlsKeyAddr : PDWord;
+    {$endif !FPC_SECTION_THREADVARS}
     SysInstance: {$ifdef CPU64}PQWord{$else}PLongInt{$endif};
     WideInitTables : Pointer;
   end;
@@ -117,7 +119,4 @@ var
 {$ifdef FPC_USE_WIN64_SEH}
 procedure _fpc_local_unwind(frame,target: Pointer);compilerproc;
 {$endif FPC_USE_WIN64_SEH}
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer):pointer;compilerproc;
-{$endif FPC_SECTION_THREADVARS}
 
diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 10bb07fda4..dfa583fee7 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -19,6 +19,7 @@
                            Local WINApi imports
 *****************************************************************************}
 
+{$ifndef FPC_SECTION_THREADVARS}
 const
   { LocalAlloc flags  }
   LMEM_FIXED = 0;
@@ -34,15 +35,16 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
 function TlsSetValue(dwTlsIndex : DWord;lpTlsValue : pointer) : LongBool;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsSetValue';
+function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
+  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
+function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
+{$endif !FPC_SECTION_THREADVARS}
 function CreateThread(lpThreadAttributes : pointer;
   dwStackSize : SIZE_T; lpStartAddress : pointer;lpParameter : pointer;
   dwCreationFlags : DWord;var lpThreadId : DWord) : THandle;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'CreateThread';
 procedure ExitThread(dwExitCode : DWord);
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ExitThread';
-function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
-  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
-function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
 procedure Sleep(dwMilliseconds: DWord); {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'Sleep';
 function  WinSuspendThread (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'SuspendThread';
 function  WinResumeThread  (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ResumeThread';
@@ -98,6 +100,10 @@ var
                              Threadvar support
 *****************************************************************************}
 
+    var
+      MainThreadIdWin32 : DWORD;
+      
+{$ifndef FPC_SECTION_THREADVARS}
     var
       // public names are used by heaptrc unit
       threadvarblocksize : dword; public name '_FPC_TlsSize';
@@ -108,9 +114,6 @@ var
       TLSKey : PDWord = @TLSKeyVar; public name '_FPC_TlsKey';
       {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 
-    var
-      MainThreadIdWin32 : DWORD;
-
     procedure SysInitThreadvar(var offset : dword;size : dword);
       begin
         offset:=threadvarblocksize;
@@ -199,6 +202,7 @@ var
             TlsSetValue(tlskey^, nil);
           end;
       end;
+{$endif !FPC_SECTION_THREADVARS}
 
 
 {*****************************************************************************
@@ -255,7 +259,9 @@ var
         writeln('Creating new thread');
 {$endif DEBUG_MT}
         { Initialize multithreading if not done }
+{$ifndef FPC_SECTION_THREADVARS}
         SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
         IsMultiThread:=true;
 
         { the only way to pass data to the newly created thread
@@ -623,10 +629,17 @@ begin
     EnterCriticalSection   :=@SysEnterCriticalSection;
     TryEnterCriticalSection:=@SysTryEnterCriticalSection;
     LeaveCriticalSection   :=@SysLeaveCriticalSection;
+{$ifndef FPC_SECTION_THREADVARS}
     InitThreadVar          :=@SysInitThreadVar;
     RelocateThreadVar      :=@SysRelocateThreadVar;
     AllocateThreadVars     :=@SysAllocateThreadVars;
     ReleaseThreadVars      :=@SysReleaseThreadVars;
+{$else}
+    InitThreadVar          :=Nil;
+    RelocateThreadVar      :=Nil;
+    AllocateThreadVars     :=Nil;
+    ReleaseThreadVars      :=Nil;
+{$endif FPC_SECTION_THREADVARS}
     BasicEventCreate       :=@intBasicEventCreate;
     BasicEventDestroy      :=@intBasicEventDestroy;
     BasicEventResetEvent   :=@intBasicEventResetEvent;
@@ -641,10 +654,12 @@ begin
     end;
   SetThreadManager(WinThreadManager);
   ThreadID := GetCurrentThreadID;
+{$ifndef FPC_SECTION_THREADVARS}
 {$ifndef FPC_USE_TLS_DIRECTORY}
   if IsLibrary then
 {$endif}
     SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifndef WINCE}
   KernelHandle:=GetModuleHandle(KernelDLL);
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index db53b77a94..d0b48a89a5 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -32,8 +32,10 @@ Const
   DLL_PROCESS_DETACH = 0;
   DLL_THREAD_DETACH = 3;
 
+{$ifndef FPC_SECTION_THREADVARS}
 var
    TlsKey : PDWord = @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
 
 type
   TTlsDirectory=packed record
@@ -44,11 +46,13 @@ type
   end;
 
 
+{$ifndef FPC_SECTION_THREADVARS}
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
+{$endif !FPC_SECTION_THREADVARS}
 procedure InitHeap; external name '_FPC_InitHeap';
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
@@ -58,7 +62,7 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 {$endif FPC_INSSIDE_SYSINIT}
 
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
-  stdcall; [public,alias:'_FPC_Tls_Callback'];
+  stdcall;
   begin
      if IsLibrary then
        Exit;
@@ -83,12 +87,19 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
              the threadvar handling can be correctly initialized }
            SetupEntryInformation(SysInitEntryInformation);
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
+           { the DLL_PROCESS_ATTACH initialization for the heap is required, because
+             if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH
+             of the executable will be called so that the RTL for the thread can be initialized
+             and that needs the code in DLL_PROCESS_ATTACH to be run first }
            InitHeap;
+         {$ifndef FPC_SECTION_THREADVARS}
            InitSystemThreads;
+         {$endif !FPC_SECTION_THREADVARS}
          end;
 
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
          {  !!! SysInitMultithreading must NOT be called here. Windows guarantees that
             the main thread invokes PROCESS_ATTACH, not THREAD_ATTACH. So this always
             executes in non-main thread. SysInitMultithreading() here will cause
@@ -100,6 +111,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -107,7 +119,11 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
diff --git a/rtl/win/syswin.inc b/rtl/win/syswin.inc
index 63ee752569..37fb574f33 100644
--- a/rtl/win/syswin.inc
+++ b/rtl/win/syswin.inc
@@ -408,10 +408,12 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
          end;
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            { SysInitMultithreading must not be called here,
              see comments in exec_tls_callback below }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -425,7 +427,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
            if assigned(Dll_Thread_Detach_Hook) then
              Dll_Thread_Detach_Hook(DllParam);
            { Release Threadvars }
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
        DLL_PROCESS_DETACH :
@@ -439,9 +445,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
                InternalExit;
              end;
 
+         {$ifndef FPC_SECTION_THREADVARS}
            SysReleaseThreadVars;
            { Free TLS resources used by ThreadVars }
            SysFiniMultiThreading;
+         {$endif !FPC_SECTION_THREADVARS}
            MainThreadIDWin32:=0;
          end;
      end;
diff --git a/rtl/win32/sysinit.inc b/rtl/win32/sysinit.inc
index 0eea501af0..c8f7b6f042 100644
--- a/rtl/win32/sysinit.inc
+++ b/rtl/win32/sysinit.inc
@@ -16,10 +16,14 @@
 
    var
       SysInstance : LongInt;
+ {$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+ {$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+ {$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+ {$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
  {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -53,7 +57,11 @@
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -65,7 +73,9 @@
         valgrind_used : false;
         OS : (
           asm_exit : @asm_exit;
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win32/system.pp b/rtl/win32/system.pp
index 4ffa0b65ff..4cf69de94d 100644
--- a/rtl/win32/system.pp
+++ b/rtl/win32/system.pp
@@ -78,7 +78,9 @@ end;
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -498,41 +500,6 @@ begin
 end;
 {$endif Set_i386_Exception_handler}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $tls_data_start,%eax
-      cmpb  $0,IsLibrary
-      mov   _tls_index,%ecx
-      jnz   .L1
-      mov   %fs:(0x2c),%edx
-      add   (%edx,%ecx,4),%eax
-      ret
-.L1:
-      push  %ebx
-      mov   %eax,%ebx
-      call  GetLastError
-      push  %eax                      { save LastError }
-      push  _tls_index
-      call  TlsGetValue
-      test  %eax,%eax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%eax
-      call  InitThread
-      push  _tls_index
-      call  TlsGetValue
-.L2:
-      add   %eax,%ebx
-      call  SetLastError              { restore (value is on stack) }
-      mov   %ebx,%eax
-      pop   %ebx
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 5d396beb8f..9d740e19ec 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -21,10 +21,14 @@ unit sysinit;
 
    var
       SysInstance : QWord;
+{$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+{$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+{$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+{$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
 {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -58,7 +62,11 @@ unit sysinit;
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -69,7 +77,9 @@ unit sysinit;
         PascalMain : @PascalMain;
         valgrind_used : false;
         OS : (
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
@@ -124,6 +134,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
@@ -135,6 +148,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index 6cf7b174cd..b6fad06fff 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -122,7 +122,9 @@ procedure PascalMain;external name 'PASCALMAIN';
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -504,47 +506,6 @@ begin
 end;
 {$endif VER3_0}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $56,%rsp                  { 32 spill area + 16 local vars + 8 misalignment }
-  .seh_stackalloc 56
-  .seh_endprologue
-      lea   tls_data_start(%rip),%rax
-      sub   %rax,%rcx
-      cmpb  $0,IsLibrary(%rip)
-      mov   _tls_index(%rip),%eax
-      jnz   .L1
-      mov   %gs:(88),%rdx
-      add   (%rdx,%rax,8),%rcx
-      mov   %rcx,%rax
-      jmp   .L3
-.L1:
-      mov   %rcx,32(%rsp)
-      call  GetLastError
-      mov   %rax,40(%rsp)             { save LastError }
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-      test  %rax,%rax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%rcx
-      call  InitThread
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-.L2:
-      add   %rax,32(%rsp)
-      mov   40(%rsp),%rcx
-      call  SetLastError
-      mov   32(%rsp),%rax
-.L3:
-      add   $56,%rsp
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
-- 
2.26.2.windows.1

Sergey Larin

2020-08-06 21:47

reporter   ~0124630

Last edited: 2020-08-06 21:57

View 2 revisions

Regarding the problem that the tls callback is not called when using ld.exe, I found out the following: In the TLS Directory, the AddressOfCallBacks address is 16 bytes higher than the __FPC_tls_callbacks block start.
tlsdirectory.png (23,809 bytes)   
tlsdirectory.png (23,809 bytes)   
memory.png (14,149 bytes)   
memory.png (14,149 bytes)   

Sergey Larin

2020-08-10 12:58

reporter   ~0124719

> - is there a reason why you use the TLS callback for libraries as well when the normal entry point should be enough?

I found a way to completely exclude Exec_Tls_callback for libraries - tls-win64-section_threadvars-fixup5.patch

However this conflicts with the workaround for the issue of not calling tls callbacks when using ld.exe.
But, I think I fixed this error too - tls_callback-ld-fix.patch
Those. the tls_callback-ld-workaround.patch workaround has become outdated and unnecessary.

Increment patch - tls-win64-section_threadvars-fixup5.patch
Cumulative patch - tls-win64-section_threadvars-v5.patch
TLS callback ld.exe bug fix - tls_callback-ld-fix.patch
tls-win64-section_threadvars-v5.patch (37,291 bytes)   
---
 compiler/aggas.pas         |  4 +-
 compiler/cgbase.pas        |  1 +
 compiler/dbgdwarf.pas      |  3 +-
 compiler/nld.pas           |  4 +-
 compiler/nutils.pas        |  9 +++--
 compiler/ogcoff.pas        | 35 +++++++----------
 compiler/systems/i_win.pas |  3 ++
 compiler/systems/t_win.pas | 27 ++++++++++++-
 compiler/x86/aasmcpu.pas   |  6 +++
 compiler/x86/agx86att.pas  |  9 +++++
 compiler/x86/nx86ld.pas    | 79 +++++++++++++++++++++++---------------
 rtl/inc/heaptrc.pp         |  4 ++
 rtl/linux/si_impl.inc      |  2 +
 rtl/win/sysosh.inc         |  5 +--
 rtl/win/systhrd.inc        | 27 ++++++++++---
 rtl/win/systlsdir.inc      | 21 ++++++++--
 rtl/win/syswin.inc         |  8 ++++
 rtl/win32/sysinit.inc      | 10 +++++
 rtl/win32/system.pp        | 37 +-----------------
 rtl/win64/sysinit.pp       | 16 ++++++++
 rtl/win64/system.pp        | 43 +--------------------
 21 files changed, 204 insertions(+), 149 deletions(-)

diff --git a/compiler/aggas.pas b/compiler/aggas.pas
index 41d20c5129..f9b69f605b 100644
--- a/compiler/aggas.pas
+++ b/compiler/aggas.pas
@@ -533,7 +533,9 @@ implementation
              { according to the GNU AS guide AS for COFF does not support the
                progbits }
              writer.AsmWrite('.section ');
-             usesectionflags:=true;
+             { .tls section requires special processing }
+             if not((atype=sec_threadvar) and (tf_section_threadvars in target_info.flags)) then
+               usesectionflags:=true;
            end;
          system_powerpc_darwin,
          system_i386_darwin,
diff --git a/compiler/cgbase.pas b/compiler/cgbase.pas
index 6dca71ba40..0e862ca739 100644
--- a/compiler/cgbase.pas
+++ b/compiler/cgbase.pas
@@ -140,6 +140,7 @@ interface
 {$ifdef x86_64}
           ,addr_tpoff
           ,addr_tlsgd
+          ,addr_secrel
 {$endif x86_64}
          );
 
diff --git a/compiler/dbgdwarf.pas b/compiler/dbgdwarf.pas
index 5b75e257b5..d81436ed2a 100644
--- a/compiler/dbgdwarf.pas
+++ b/compiler/dbgdwarf.pas
@@ -2658,7 +2658,8 @@ implementation
                   begin
                     if vo_is_thread_var in sym.varoptions then
                       begin
-                        if tf_section_threadvars in target_info.flags then
+                        if (tf_section_threadvars in target_info.flags) and
+                           (not (target_info.system in [system_x86_64_win64])) then // TODO : dwarf for threadvars on win64
                           begin
                             case sizeof(puint) of
                               2:
diff --git a/compiler/nld.pas b/compiler/nld.pas
index e2bccbc080..0453f162e8 100644
--- a/compiler/nld.pas
+++ b/compiler/nld.pas
@@ -450,7 +450,9 @@ implementation
                 { call to get address of threadvar }
                 if (vo_is_thread_var in tabstractvarsym(symtableentry).varoptions) then
                   begin
-                    include(current_procinfo.flags,pi_do_call);
+                    if (not (tf_section_threadvars in target_info.flags)) or
+                       (current_settings.tlsmodel in [tlsm_local_dynamic,tlsm_global_dynamic]) then
+                      include(current_procinfo.flags,pi_do_call);
                     include(current_procinfo.flags,pi_uses_threadvar);
                   end;
               end;
diff --git a/compiler/nutils.pas b/compiler/nutils.pas
index d586fc222d..b5c0de1937 100644
--- a/compiler/nutils.pas
+++ b/compiler/nutils.pas
@@ -191,7 +191,7 @@ interface
 implementation
 
     uses
-      cutils,verbose,globals,
+      cutils,verbose,globals,systems,
       symconst,symdef,
       defcmp,defutil,
       ncon,ncnv,nld,nflw,nset,ncal,nadd,nmem,ninl,
@@ -799,10 +799,13 @@ implementation
                 begin
                   if assigned(tloadnode(p).left) then
                     inc(result,node_complexity(tloadnode(p).left));
-                  { threadvars need a helper call }
+                  { some threadvars models need a helper call }
                   if (tloadnode(p).symtableentry.typ=staticvarsym) and
                      (vo_is_thread_var in tstaticvarsym(tloadnode(p).symtableentry).varoptions) then
-                    inc(result,5)
+                    if (tf_section_threadvars in target_info.flags) and (current_settings.tlsmodel in [tlsm_local_exec,tlsm_initial_exec]) then
+                      inc(result,3)
+                    else
+                      inc(result,50)
                   else if not((tloadnode(p).symtableentry.typ in [staticvarsym,localvarsym,paravarsym,fieldvarsym]) and
                     (tabstractvarsym(tloadnode(p).symtableentry).varregable in [vr_intreg,vr_mmreg,vr_fpureg])) then
                     inc(result);
diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 8c05b3792a..90bbca8b0d 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -1015,8 +1015,10 @@ const pemagic : array[0..3] of byte = (
                   end;
                 RELOC_SECREL32 :
                   begin
-                    { fixup address when the symbol was known in defined object }
-                    if (relocsec.objdata=objsec.objdata) then
+                    { fixup address when the symbol was known in defined object or
+                      when the symbol is a threadvar }
+                    if (relocsec.objdata=objsec.objdata) or
+                       assigned(objreloc.symbol) and (copy(objreloc.symbol.objsection.Name,1,4)='.tls') then
                       dec(address,relocsec.ExeSection.MemPos);
                     inc(address,relocval);
                   end;
@@ -2667,21 +2669,17 @@ const pemagic : array[0..3] of byte = (
 
         procedure UpdateTlsDataDir;
         var
-          {callbacksection : TExeSection;}
           tlsexesymbol: TExeSymbol;
           tlssymbol: TObjSymbol;
-          callbackexesymbol: TExeSymbol;
-          //callbacksymbol: TObjSymbol;
         begin
           { according to GNU ld,
             the callback routines should be placed into .CRT$XL*
             sections, and the thread local variables in .tls
             __tls_start__ and __tls_end__ symbols
-            should be used for the initialized part,
-            which we do not support yet. }
-          { For now, we only pass the address of the __tls_used
+            should be used for the initialized part. }
+          { We pass the address of the __tls_used
             asm symbol into PE_DATADIR_TLS with the correct
-            size of this table (different for win32/win64 }
+            size of this table (different for win32/win64) }
           tlsexesymbol:=texesymbol(ExeSymbolList.Find(
             target_info.Cprefix+'_tls_used'));
           if assigned(tlsexesymbol) then
@@ -2690,18 +2688,6 @@ const pemagic : array[0..3] of byte = (
               peoptheader.DataDirectory[PE_DATADIR_TLS].vaddr:=tlssymbol.address;
               { sizeof(TlsDirectory) is different on host and target when cross-compiling }
               peoptheader.DataDirectory[PE_DATADIR_TLS].size:=TLSDIR_SIZE;
-              if IsSharedLibrary then
-                begin
-                  { Here we should reset __FPC_tls_callbacks value to nil }
-                  callbackexesymbol:=texesymbol(ExeSymbolList.Find(
-                                        '__FPC_tls_callbacks'));
-                  if assigned (callbackexesymbol) then
-                    begin
-                      //callbacksymbol:=callbackexesymbol.ObjSymbol;
-
-                    end;
-                end;
-
            end;
         end;
 
@@ -2790,7 +2776,10 @@ const pemagic : array[0..3] of byte = (
               end
             else
               begin
-                peoptheader.MajorOperatingSystemVersion:=4;
+                if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorOperatingSystemVersion:=6 // >= Windows Vista
+                else
+                  peoptheader.MajorOperatingSystemVersion:=4;
                 peoptheader.MinorOperatingSystemVersion:=0;
               end;
             if SetPEUserVersionSetExplicitely then
@@ -2812,6 +2801,8 @@ const pemagic : array[0..3] of byte = (
               begin
                 if target_info.system in systems_wince then
                   peoptheader.MajorSubsystemVersion:=3
+                else if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorSubsystemVersion:=6 // >= Windows Vista
                 else
                   peoptheader.MajorSubsystemVersion:=4;
                 peoptheader.MinorSubsystemVersion:=0;
diff --git a/compiler/systems/i_win.pas b/compiler/systems/i_win.pas
index d4e8f842d5..d1ffb4e57c 100644
--- a/compiler/systems/i_win.pas
+++ b/compiler/systems/i_win.pas
@@ -107,6 +107,9 @@ unit i_win;
             shortname    : 'Win64';
             flags        : [tf_files_case_aware,tf_has_dllscanner,
                             tf_smartlink_sections,
+{$if not defined(DISABLE_TLS_DIRECTORY) and not defined(SUPPORT_WINXP)}
+                            tf_section_threadvars,
+{$endif}
                             tf_winlikewidestring,tf_no_pic_supported,
                             tf_dwarf_only_local_labels,
                             tf_no_generic_stackcheck,tf_has_winlike_resources,
diff --git a/compiler/systems/t_win.pas b/compiler/systems/t_win.pas
index 887787f8e9..b96f820a3c 100644
--- a/compiler/systems/t_win.pas
+++ b/compiler/systems/t_win.pas
@@ -1711,11 +1711,36 @@ implementation
             else
               ;
           end;
-        if dllversion<>'' then
+        if SetPEUserVersionSetExplicitely then
+          begin
+            peoptheader.MajorImageVersion:=peuserversionmajor;
+            peoptheader.MinorImageVersion:=peuserversionminor;
+          end
+        else if dllversion<>'' then
           begin
            peoptheader.MajorImageVersion:=dllmajor;
            peoptheader.MinorImageVersion:=dllminor;
           end;
+        if SetPEOSVersionSetExplicitely then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=peosversionmajor;
+            peoptheader.MinorOperatingSystemVersion:=peosversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorOperatingSystemVersion:=0;
+          end;
+        if SetPESubSysVersionSetExplicitely then
+          begin
+            peoptheader.MajorSubsystemVersion:=pesubsysversionmajor;
+            peoptheader.MinorSubsystemVersion:=pesubsysversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorSubsystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorSubsystemVersion:=0;
+          end;
         { reset timestamp }
         peheader.time:=0;
         { write header back, skip pe magic }
diff --git a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
index f712af7196..115474e6d1 100644
--- a/compiler/x86/aasmcpu.pas
+++ b/compiler/x86/aasmcpu.pas
@@ -3617,6 +3617,12 @@ implementation
                       currabsreloc:=RELOC_TLSGD;
                       currabsreloc32:=RELOC_TLSGD;
                     end
+                  else if oper[opidx]^.ref^.refaddr=addr_secrel then
+                    begin
+                      currrelreloc:=RELOC_SECREL32;
+                      currabsreloc:=RELOC_SECREL32;
+                      currabsreloc32:=RELOC_SECREL32;
+                    end
                   else
 {$endif x86_64}
                     begin
diff --git a/compiler/x86/agx86att.pas b/compiler/x86/agx86att.pas
index 657f073163..bf69e08bd5 100644
--- a/compiler/x86/agx86att.pas
+++ b/compiler/x86/agx86att.pas
@@ -190,6 +190,8 @@ interface
                owner.writer.AsmWrite('@tpoff');
              addr_tlsgd:
                owner.writer.AsmWrite('@tlsgd');
+             addr_secrel:
+               owner.writer.AsmWrite('@secrel32');
 {$endif x86_64}
              else
                ;
@@ -259,6 +261,13 @@ interface
               {$ifdef x86_64},addr_tpoff,addr_tlsgd{$endif x86_64}
               ] then
               WriteReference(o.ref^)
+        {$ifdef x86_64}
+            else if o.ref^.refaddr=addr_secrel then
+              begin
+                owner.writer.AsmWrite('$');
+                WriteReference(o.ref^);
+              end
+        {$endif x86_64}
             else
               begin
                 owner.writer.AsmWrite('$');
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 774addd1ca..61c82755d4 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -38,10 +38,10 @@ interface
 implementation
 
     uses
-      globals,
+      globals,fmodule,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
-      cgutils,cgobj,
+      cgutils,cgobj,hlcgobj,
       symconst,symdef,symtable,
       cgbase,cpubase,parabase,paramgr,
       procinfo;
@@ -54,37 +54,10 @@ implementation
       var
         paraloc1 : tcgpara;
         pd: tprocdef;
-        href: treference;
-        hregister : tregister;
-        handled: boolean;
+        href, href_index: treference;
+        hregister, reg_index, reg_offset, reg_start : tregister;
       begin
-        handled:=false;
-        if (tf_section_threadvars in target_info.flags) then
-          begin
-            if target_info.system in [system_i386_win32,system_x86_64_win64] then
-              begin
-                paraloc1.init;
-                pd:=search_system_proc('fpc_tls_add');
-                paramanager.getcgtempparaloc(current_asmdata.CurrAsmList,pd,1,paraloc1);
-                if not(vo_is_weak_external in gvs.varoptions) then
-                  reference_reset_symbol(href,current_asmdata.RefAsmSymbol(gvs.mangledname,AT_DATA,use_indirect_symbol(gvs)),0,sizeof(pint),[])
-                else
-                  reference_reset_symbol(href,current_asmdata.WeakRefAsmSymbol(gvs.mangledname,AT_DATA),0,sizeof(pint),[]);
-                cg.a_loadaddr_ref_cgpara(current_asmdata.CurrAsmList,href,paraloc1);
-                paramanager.freecgpara(current_asmdata.CurrAsmList,paraloc1);
-                paraloc1.done;
-
-                cg.g_call(current_asmdata.CurrAsmList,'FPC_TLS_ADD');
-                cg.ungetcpuregister(current_asmdata.CurrAsmList,NR_FUNCTION_RESULT_REG);
-                hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
-                cg.a_load_reg_reg(current_asmdata.CurrAsmList,OS_ADDR,OS_ADDR,NR_FUNCTION_RESULT_REG,hregister);
-                location.reference.base:=hregister;
-                handled:=true;
-              end;
-          end;
-
-        if not handled then
-          inherited;
+        inherited;
 
         if (tf_section_threadvars in target_info.flags) then
           begin
@@ -156,6 +129,48 @@ implementation
                       Internalerror(2019012002);
                   end;
                 end;
+              system_x86_64_win64:
+                begin
+                  reference_reset(href,sizeof(AInt),[]); href.segment:=NR_GS; href.offset:=88;
+                  hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
+
+                  reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
+                  if (main_module=nil) or main_module.IsLibrary or main_module.is_unit then
+                    begin // _tls_index <> 0
+                      reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
+                      reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                      hlcg.a_load_ref_reg(current_asmdata.CurrAsmList,u32inttype,u64inttype,
+                        href_index,reg_index);                                                             // movl _tls_index(%rip), %edx
+                      href.index:=reg_index; href.scalefactor:=8;
+                    end;
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq (%rcx), %rcx  |  movq (%rcx,%rdx,8), %rcx
+
+                  reg_offset:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                  href:=location.reference; // from inherited
+
+                if not (cs_asm_pre_binutils_2_25 in current_settings.globalswitches) then
+                  begin
+                    href.refaddr:=addr_secrel;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href,
+                      newreg(getregtype(reg_offset),getsupreg(reg_offset),R_SUBD)));                       // movl $threadvar@secrel32, %rax
+                  end
+                else // Old binutils (2.21) not support $threadvar@secrel32
+                  begin
+                    href.symbol.typ:=AT_DATA;
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_offset));      // movq $threadvar, %rax
+                    reg_start:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                    reference_reset_symbol(href,current_asmdata.RefAsmSymbol('___tls_start__',AT_DATA),0,sizeof(AInt),[]);
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_start));       // movq $___tls_start__, %rdx
+                    current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_SUB,S_Q,reg_start,reg_offset)); // subq %rdx, %rax
+                  end;
+
+                  reference_reset(location.reference,location.reference.alignment,location.reference.volatility);
+                  location.reference.base:=hregister;
+                  location.reference.index:=reg_offset;                                                    // (%rcx,%rax)
+                end;
               else
                 ;
             end;
diff --git a/rtl/inc/heaptrc.pp b/rtl/inc/heaptrc.pp
index 920f448293..da7caade9f 100644
--- a/rtl/inc/heaptrc.pp
+++ b/rtl/inc/heaptrc.pp
@@ -1035,11 +1035,13 @@ var
    edata : ptruint; external name '__data_end__';
    sbss : ptruint; external name '__bss_start__';
    ebss : ptruint; external name '__bss_end__';
+{$ifndef FPC_SECTION_THREADVARS}
    TLSKey : PDWord; external name '_FPC_TlsKey';
    TLSSize : DWord; external name '_FPC_TlsSize';
 
 function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
+{$endif !FPC_SECTION_THREADVARS}
 {$endif}
 
 {$ifdef BEOS}
@@ -1104,6 +1106,7 @@ begin
   { inside data, rdata ... bss }
   if (ptruint(p)>=ptruint(@sdata)) and (ptruint(p)<ptruint(@ebss)) then
     exit;
+  {$ifndef FPC_SECTION_THREADVARS}
   { is program multi-threaded and p inside Threadvar range? }
   if TlsKey^<>-1 then
     begin
@@ -1112,6 +1115,7 @@ begin
           (ptruint(p)<ptruint(datap)+TlsSize)) then
         exit;
     end;
+  {$endif !FPC_SECTION_THREADVARS}
 {$endif windows}
 
 {$IFDEF OS2}
diff --git a/rtl/linux/si_impl.inc b/rtl/linux/si_impl.inc
index a727310551..e3ceee427e 100644
--- a/rtl/linux/si_impl.inc
+++ b/rtl/linux/si_impl.inc
@@ -20,7 +20,9 @@ procedure SysEntry_InitTLS(constref info: TEntryInformation); external name 'FPC
 
 var
   InitFinalTable : record end; external name 'INITFINAL';
+  {$ifndef FPC_SECTION_THREADVARS}
   ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+  {$endif}
   {$ifdef FPC_HAS_RESSTRINITS}
   ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
   {$endif}
diff --git a/rtl/win/sysosh.inc b/rtl/win/sysosh.inc
index 5bee0d4c17..4b65a1d41d 100644
--- a/rtl/win/sysosh.inc
+++ b/rtl/win/sysosh.inc
@@ -56,7 +56,9 @@ type
     {$ifdef WIN32}
     asm_exit : Procedure;stdcall;
     {$endif WIN32}
+    {$ifndef FPC_SECTION_THREADVARS}
     TlsKeyAddr : PDWord;
+    {$endif !FPC_SECTION_THREADVARS}
     SysInstance: {$ifdef CPU64}PQWord{$else}PLongInt{$endif};
     WideInitTables : Pointer;
   end;
@@ -117,7 +119,4 @@ var
 {$ifdef FPC_USE_WIN64_SEH}
 procedure _fpc_local_unwind(frame,target: Pointer);compilerproc;
 {$endif FPC_USE_WIN64_SEH}
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer):pointer;compilerproc;
-{$endif FPC_SECTION_THREADVARS}
 
diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 10bb07fda4..dfa583fee7 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -19,6 +19,7 @@
                            Local WINApi imports
 *****************************************************************************}
 
+{$ifndef FPC_SECTION_THREADVARS}
 const
   { LocalAlloc flags  }
   LMEM_FIXED = 0;
@@ -34,15 +35,16 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
 function TlsSetValue(dwTlsIndex : DWord;lpTlsValue : pointer) : LongBool;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsSetValue';
+function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
+  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
+function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
+{$endif !FPC_SECTION_THREADVARS}
 function CreateThread(lpThreadAttributes : pointer;
   dwStackSize : SIZE_T; lpStartAddress : pointer;lpParameter : pointer;
   dwCreationFlags : DWord;var lpThreadId : DWord) : THandle;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'CreateThread';
 procedure ExitThread(dwExitCode : DWord);
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ExitThread';
-function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
-  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
-function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
 procedure Sleep(dwMilliseconds: DWord); {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'Sleep';
 function  WinSuspendThread (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'SuspendThread';
 function  WinResumeThread  (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ResumeThread';
@@ -98,6 +100,10 @@ var
                              Threadvar support
 *****************************************************************************}
 
+    var
+      MainThreadIdWin32 : DWORD;
+      
+{$ifndef FPC_SECTION_THREADVARS}
     var
       // public names are used by heaptrc unit
       threadvarblocksize : dword; public name '_FPC_TlsSize';
@@ -108,9 +114,6 @@ var
       TLSKey : PDWord = @TLSKeyVar; public name '_FPC_TlsKey';
       {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 
-    var
-      MainThreadIdWin32 : DWORD;
-
     procedure SysInitThreadvar(var offset : dword;size : dword);
       begin
         offset:=threadvarblocksize;
@@ -199,6 +202,7 @@ var
             TlsSetValue(tlskey^, nil);
           end;
       end;
+{$endif !FPC_SECTION_THREADVARS}
 
 
 {*****************************************************************************
@@ -255,7 +259,9 @@ var
         writeln('Creating new thread');
 {$endif DEBUG_MT}
         { Initialize multithreading if not done }
+{$ifndef FPC_SECTION_THREADVARS}
         SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
         IsMultiThread:=true;
 
         { the only way to pass data to the newly created thread
@@ -623,10 +629,17 @@ begin
     EnterCriticalSection   :=@SysEnterCriticalSection;
     TryEnterCriticalSection:=@SysTryEnterCriticalSection;
     LeaveCriticalSection   :=@SysLeaveCriticalSection;
+{$ifndef FPC_SECTION_THREADVARS}
     InitThreadVar          :=@SysInitThreadVar;
     RelocateThreadVar      :=@SysRelocateThreadVar;
     AllocateThreadVars     :=@SysAllocateThreadVars;
     ReleaseThreadVars      :=@SysReleaseThreadVars;
+{$else}
+    InitThreadVar          :=Nil;
+    RelocateThreadVar      :=Nil;
+    AllocateThreadVars     :=Nil;
+    ReleaseThreadVars      :=Nil;
+{$endif FPC_SECTION_THREADVARS}
     BasicEventCreate       :=@intBasicEventCreate;
     BasicEventDestroy      :=@intBasicEventDestroy;
     BasicEventResetEvent   :=@intBasicEventResetEvent;
@@ -641,10 +654,12 @@ begin
     end;
   SetThreadManager(WinThreadManager);
   ThreadID := GetCurrentThreadID;
+{$ifndef FPC_SECTION_THREADVARS}
 {$ifndef FPC_USE_TLS_DIRECTORY}
   if IsLibrary then
 {$endif}
     SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifndef WINCE}
   KernelHandle:=GetModuleHandle(KernelDLL);
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index db53b77a94..32e0f28e36 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -32,8 +32,10 @@ Const
   DLL_PROCESS_DETACH = 0;
   DLL_THREAD_DETACH = 3;
 
+{$ifndef FPC_SECTION_THREADVARS}
 var
    TlsKey : PDWord = @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
 
 type
   TTlsDirectory=packed record
@@ -44,11 +46,13 @@ type
   end;
 
 
+{$ifndef FPC_SECTION_THREADVARS}
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
+{$endif !FPC_SECTION_THREADVARS}
 procedure InitHeap; external name '_FPC_InitHeap';
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
@@ -58,10 +62,9 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 {$endif FPC_INSSIDE_SYSINIT}
 
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
-  stdcall; [public,alias:'_FPC_Tls_Callback'];
+  stdcall;
   begin
-     if IsLibrary then
-       Exit;
+     Assert(not IsLibrary);
      case reason of
        { For executables, DLL_PROCESS_ATTACH is called *before* the entry point,
          and DLL_PROCESS_DETACH is called *after* RTL shuts down and calls ExitProcess.
@@ -83,12 +86,19 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
              the threadvar handling can be correctly initialized }
            SetupEntryInformation(SysInitEntryInformation);
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
+           { the DLL_PROCESS_ATTACH initialization for the heap is required, because
+             if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH
+             of the executable will be called so that the RTL for the thread can be initialized
+             and that needs the code in DLL_PROCESS_ATTACH to be run first }
            InitHeap;
+         {$ifndef FPC_SECTION_THREADVARS}
            InitSystemThreads;
+         {$endif !FPC_SECTION_THREADVARS}
          end;
 
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
          {  !!! SysInitMultithreading must NOT be called here. Windows guarantees that
             the main thread invokes PROCESS_ATTACH, not THREAD_ATTACH. So this always
             executes in non-main thread. SysInitMultithreading() here will cause
@@ -100,6 +110,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -107,7 +118,11 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
diff --git a/rtl/win/syswin.inc b/rtl/win/syswin.inc
index 63ee752569..37fb574f33 100644
--- a/rtl/win/syswin.inc
+++ b/rtl/win/syswin.inc
@@ -408,10 +408,12 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
          end;
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            { SysInitMultithreading must not be called here,
              see comments in exec_tls_callback below }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -425,7 +427,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
            if assigned(Dll_Thread_Detach_Hook) then
              Dll_Thread_Detach_Hook(DllParam);
            { Release Threadvars }
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
        DLL_PROCESS_DETACH :
@@ -439,9 +445,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
                InternalExit;
              end;
 
+         {$ifndef FPC_SECTION_THREADVARS}
            SysReleaseThreadVars;
            { Free TLS resources used by ThreadVars }
            SysFiniMultiThreading;
+         {$endif !FPC_SECTION_THREADVARS}
            MainThreadIDWin32:=0;
          end;
      end;
diff --git a/rtl/win32/sysinit.inc b/rtl/win32/sysinit.inc
index 0eea501af0..c8f7b6f042 100644
--- a/rtl/win32/sysinit.inc
+++ b/rtl/win32/sysinit.inc
@@ -16,10 +16,14 @@
 
    var
       SysInstance : LongInt;
+ {$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+ {$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+ {$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+ {$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
  {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -53,7 +57,11 @@
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -65,7 +73,9 @@
         valgrind_used : false;
         OS : (
           asm_exit : @asm_exit;
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win32/system.pp b/rtl/win32/system.pp
index 4ffa0b65ff..4cf69de94d 100644
--- a/rtl/win32/system.pp
+++ b/rtl/win32/system.pp
@@ -78,7 +78,9 @@ end;
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -498,41 +500,6 @@ begin
 end;
 {$endif Set_i386_Exception_handler}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $tls_data_start,%eax
-      cmpb  $0,IsLibrary
-      mov   _tls_index,%ecx
-      jnz   .L1
-      mov   %fs:(0x2c),%edx
-      add   (%edx,%ecx,4),%eax
-      ret
-.L1:
-      push  %ebx
-      mov   %eax,%ebx
-      call  GetLastError
-      push  %eax                      { save LastError }
-      push  _tls_index
-      call  TlsGetValue
-      test  %eax,%eax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%eax
-      call  InitThread
-      push  _tls_index
-      call  TlsGetValue
-.L2:
-      add   %eax,%ebx
-      call  SetLastError              { restore (value is on stack) }
-      mov   %ebx,%eax
-      pop   %ebx
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 5d396beb8f..f658cccb2b 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -21,10 +21,14 @@ unit sysinit;
 
    var
       SysInstance : QWord;
+{$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+{$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+{$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+{$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
 {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -58,7 +62,11 @@ unit sysinit;
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -69,7 +77,9 @@ unit sysinit;
         PascalMain : @PascalMain;
         valgrind_used : false;
         OS : (
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
@@ -124,6 +134,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,nil);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
@@ -135,6 +148,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,nil);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index 6cf7b174cd..b6fad06fff 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -122,7 +122,9 @@ procedure PascalMain;external name 'PASCALMAIN';
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -504,47 +506,6 @@ begin
 end;
 {$endif VER3_0}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $56,%rsp                  { 32 spill area + 16 local vars + 8 misalignment }
-  .seh_stackalloc 56
-  .seh_endprologue
-      lea   tls_data_start(%rip),%rax
-      sub   %rax,%rcx
-      cmpb  $0,IsLibrary(%rip)
-      mov   _tls_index(%rip),%eax
-      jnz   .L1
-      mov   %gs:(88),%rdx
-      add   (%rdx,%rax,8),%rcx
-      mov   %rcx,%rax
-      jmp   .L3
-.L1:
-      mov   %rcx,32(%rsp)
-      call  GetLastError
-      mov   %rax,40(%rsp)             { save LastError }
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-      test  %rax,%rax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%rcx
-      call  InitThread
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-.L2:
-      add   %rax,32(%rsp)
-      mov   40(%rsp),%rcx
-      call  SetLastError
-      mov   32(%rsp),%rax
-.L3:
-      add   $56,%rsp
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
-- 
2.26.2.windows.1

tls-win64-section_threadvars-fixup5.patch (1,619 bytes)   
---
 rtl/win/systlsdir.inc | 3 +--
 rtl/win64/sysinit.pp  | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index d0b48a89a5..32e0f28e36 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -64,8 +64,7 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
   stdcall;
   begin
-     if IsLibrary then
-       Exit;
+     Assert(not IsLibrary);
      case reason of
        { For executables, DLL_PROCESS_ATTACH is called *before* the entry point,
          and DLL_PROCESS_DETACH is called *after* RTL shuts down and calls ExitProcess.
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 9d740e19ec..f658cccb2b 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -135,7 +135,7 @@ unit sysinit;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
 {$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
-      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+      LinkIn(@tlsdir,@tls_callback_end,nil);
 {$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
@@ -149,7 +149,7 @@ unit sysinit;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
 {$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
-      LinkIn(@tlsdir,@tls_callback_end,@tls_callback);
+      LinkIn(@tlsdir,@tls_callback_end,nil);
 {$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
-- 
2.26.2.windows.1

tls_callback-ld-fix.patch (801 bytes)   
---
 compiler/systems/t_win.pas | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compiler/systems/t_win.pas b/compiler/systems/t_win.pas
index 887787f8e9..583b71088e 100644
--- a/compiler/systems/t_win.pas
+++ b/compiler/systems/t_win.pas
@@ -1343,7 +1343,7 @@ implementation
             Add('    ___crt_xi_start__ = . ;');
             Add('    *(SORT(.CRT$XI*))  /* C++ initialization */');
             Add('    ___crt_xi_end__ = . ;');
-            Add('    ___crt_xl_start__ = . ;');
+            Add('    PROVIDE (___crt_xl_start__ = .);');
             Add('    *(SORT(.CRT$XL*))  /* TLS callbacks */');
             Add('    /* ___crt_xl_end__ is defined in the TLS Directory support code */');
             Add('    PROVIDE (___crt_xl_end__ = .);');
-- 
2.26.2.windows.1

tls_callback-ld-fix.patch (801 bytes)   

Sergey Larin

2020-08-11 13:34

reporter   ~0124756

Also, if FPC_SECTION_THREADVARS is defined, then it is possible not to initialize threadvars (InOutRes, ExceptObjectStack, ExceptAddrStack) to zero in InitThread and System.initialization, because these are unnecessary and misleading operations.

Sergey Larin

2020-08-23 20:10

reporter   ~0125090

Last edited: 2020-08-29 11:03

View 2 revisions

Removed unnecessary threadvar initialization with zeros - tls-win64-section_threadvars-v6.patch

> - why are you using main_module in nx86ld? I have the feeling that you're using it for the wrong reasons

Instead of referring to the fmodule from nx86ld, I can suggest testing current_settings.tlsmodel - tls-win64-section_threadvars-fixup7.patch

Increment patch - tls-win64-section_threadvars-fixup6.patch, tls-win64-section_threadvars-fixup7.patch
Cumulative patch - tls-win64-section_threadvars-v7.patch
tls-win64-section_threadvars-fixup6.patch (1,637 bytes)   
---
 rtl/inc/thread.inc  | 4 ++++
 rtl/win64/system.pp | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/rtl/inc/thread.inc b/rtl/inc/thread.inc
index ce7ec796b4..07e20bb145 100644
--- a/rtl/inc/thread.inc
+++ b/rtl/inc/thread.inc
@@ -52,16 +52,20 @@ Var
           widestringmanager.ThreadInitProc;
 {$endif FPC_HAS_FEATURE_WIDESTRINGS}
 {$ifdef FPC_HAS_FEATURE_EXCEPTIONS}
+{$ifndef FPC_SECTION_THREADVARS}
         { ExceptAddrStack and ExceptObjectStack are threadvars       }
         { so every thread has its on exception handling capabilities }
         SysInitExceptions;
+{$endif !FPC_SECTION_THREADVARS}
 {$endif FPC_HAS_FEATURE_EXCEPTIONS}
 {$ifdef FPC_HAS_FEATURE_CONSOLEIO}
 {$ifndef EMBEDDED}
         { Open all stdio fds again }
         SysInitStdio;
+{$ifndef FPC_SECTION_THREADVARS}
         InOutRes:=0;
         // ErrNo:=0;
+{$endif !FPC_SECTION_THREADVARS}
 {$endif EMBEDDED}
 {$endif FPC_HAS_FEATURE_CONSOLEIO}
 {$ifdef FPC_HAS_FEATURE_STACKCHECK}
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index b6fad06fff..a5771e281e 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -593,15 +593,19 @@ initialization
     InitHeap;
     InitSystemThreads;
   end;
+{$ifndef FPC_SECTION_THREADVARS}
   SysInitExceptions;
+{$endif !FPC_SECTION_THREADVARS}
   initunicodestringmanager;
   InitWin32Widestrings;
   SysInitStdIO;
   { Arguments }
   setup_arguments;
   InitSystemDynLibs;
+{$ifndef FPC_SECTION_THREADVARS}
   { Reset IO Error }
   InOutRes:=0;
+{$endif !FPC_SECTION_THREADVARS}
   ProcessID := GetCurrentProcessID;
   DispCallByIDProc:=@DoDispCallByIDError;
 
-- 
2.26.2.windows.1

tls-win64-section_threadvars-fixup7.patch (3,140 bytes)   
---
 compiler/globals.pas    | 2 ++
 compiler/options.pas    | 3 +++
 compiler/pmodules.pas   | 6 ++++++
 compiler/x86/nx86ld.pas | 4 ++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/compiler/globals.pas b/compiler/globals.pas
index c3a1f781a0..78d8fb26eb 100644
--- a/compiler/globals.pas
+++ b/compiler/globals.pas
@@ -165,6 +165,7 @@ interface
          disabledircache : boolean;
 
          tlsmodel : ttlsmodel;
+         tlsmodel_auto : boolean;
 
 {$if defined(i8086)}
          x86memorymodel  : tx86memorymodel;
@@ -585,6 +586,7 @@ interface
         disabledircache : false;
 
         tlsmodel : tlsm_none;
+        tlsmodel_auto : false;
 {$if defined(i8086)}
         x86memorymodel : mm_small;
 {$endif defined(i8086)}
diff --git a/compiler/options.pas b/compiler/options.pas
index b77220e47a..65fa7611e4 100644
--- a/compiler/options.pas
+++ b/compiler/options.pas
@@ -4250,8 +4250,11 @@ begin
     begin
       if cs_create_pic in init_settings.moduleswitches then
         init_settings.tlsmodel:=tlsm_global_dynamic
+      else if target_info.system in [system_x86_64_win64] then
+        init_settings.tlsmodel:=tlsm_initial_exec
       else
         init_settings.tlsmodel:=tlsm_local_exec;
+      init_settings.tlsmodel_auto:=true;
     end;
 
   { set Mac OS X version default macros if not specified explicitly }
diff --git a/compiler/pmodules.pas b/compiler/pmodules.pas
index bf66024f02..9378377127 100644
--- a/compiler/pmodules.pas
+++ b/compiler/pmodules.pas
@@ -1962,6 +1962,12 @@ type
          { make the compiler happy and avoid an uninitialized variable warning on Setlength(sc,length(sc)+1); }
          sc:=nil;
 
+         if not IsLibrary and init_settings.tlsmodel_auto and (init_settings.tlsmodel=tlsm_initial_exec) then
+           begin
+             init_settings.tlsmodel:=tlsm_local_exec;
+             current_settings.tlsmodel:=tlsm_local_exec;
+           end;
+
          { DLL defaults to create reloc info }
          if islibrary or (target_info.system in [system_aarch64_win64]) then
            begin
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 61c82755d4..d9e07bad0e 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -38,7 +38,7 @@ interface
 implementation
 
     uses
-      globals,fmodule,
+      globals,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
       cgutils,cgobj,hlcgobj,
@@ -136,7 +136,7 @@ implementation
                   current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
 
                   reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
-                  if (main_module=nil) or main_module.IsLibrary or main_module.is_unit then
+                  if current_settings.tlsmodel<>tlsm_local_exec then
                     begin // _tls_index <> 0
                       reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
                       reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
-- 
2.26.2.windows.1

tls-win64-section_threadvars-v7.patch (40,624 bytes)   
---
 compiler/aggas.pas         |  4 +-
 compiler/cgbase.pas        |  1 +
 compiler/dbgdwarf.pas      |  3 +-
 compiler/globals.pas       |  2 +
 compiler/nld.pas           |  4 +-
 compiler/nutils.pas        |  9 +++--
 compiler/ogcoff.pas        | 35 +++++++----------
 compiler/options.pas       |  3 ++
 compiler/pmodules.pas      |  6 +++
 compiler/systems/i_win.pas |  3 ++
 compiler/systems/t_win.pas | 27 ++++++++++++-
 compiler/x86/aasmcpu.pas   |  6 +++
 compiler/x86/agx86att.pas  |  9 +++++
 compiler/x86/nx86ld.pas    | 77 +++++++++++++++++++++++---------------
 rtl/inc/heaptrc.pp         |  4 ++
 rtl/inc/thread.inc         |  4 ++
 rtl/linux/si_impl.inc      |  2 +
 rtl/win/sysosh.inc         |  5 +--
 rtl/win/systhrd.inc        | 27 ++++++++++---
 rtl/win/systlsdir.inc      | 21 +++++++++--
 rtl/win/syswin.inc         |  8 ++++
 rtl/win32/sysinit.inc      | 10 +++++
 rtl/win32/system.pp        | 37 +-----------------
 rtl/win64/sysinit.pp       | 16 ++++++++
 rtl/win64/system.pp        | 47 +++--------------------
 25 files changed, 222 insertions(+), 148 deletions(-)

diff --git a/compiler/aggas.pas b/compiler/aggas.pas
index 41d20c5129..f9b69f605b 100644
--- a/compiler/aggas.pas
+++ b/compiler/aggas.pas
@@ -533,7 +533,9 @@ implementation
              { according to the GNU AS guide AS for COFF does not support the
                progbits }
              writer.AsmWrite('.section ');
-             usesectionflags:=true;
+             { .tls section requires special processing }
+             if not((atype=sec_threadvar) and (tf_section_threadvars in target_info.flags)) then
+               usesectionflags:=true;
            end;
          system_powerpc_darwin,
          system_i386_darwin,
diff --git a/compiler/cgbase.pas b/compiler/cgbase.pas
index 6dca71ba40..0e862ca739 100644
--- a/compiler/cgbase.pas
+++ b/compiler/cgbase.pas
@@ -140,6 +140,7 @@ interface
 {$ifdef x86_64}
           ,addr_tpoff
           ,addr_tlsgd
+          ,addr_secrel
 {$endif x86_64}
          );
 
diff --git a/compiler/dbgdwarf.pas b/compiler/dbgdwarf.pas
index 5b75e257b5..d81436ed2a 100644
--- a/compiler/dbgdwarf.pas
+++ b/compiler/dbgdwarf.pas
@@ -2658,7 +2658,8 @@ implementation
                   begin
                     if vo_is_thread_var in sym.varoptions then
                       begin
-                        if tf_section_threadvars in target_info.flags then
+                        if (tf_section_threadvars in target_info.flags) and
+                           (not (target_info.system in [system_x86_64_win64])) then // TODO : dwarf for threadvars on win64
                           begin
                             case sizeof(puint) of
                               2:
diff --git a/compiler/globals.pas b/compiler/globals.pas
index c3a1f781a0..78d8fb26eb 100644
--- a/compiler/globals.pas
+++ b/compiler/globals.pas
@@ -165,6 +165,7 @@ interface
          disabledircache : boolean;
 
          tlsmodel : ttlsmodel;
+         tlsmodel_auto : boolean;
 
 {$if defined(i8086)}
          x86memorymodel  : tx86memorymodel;
@@ -585,6 +586,7 @@ interface
         disabledircache : false;
 
         tlsmodel : tlsm_none;
+        tlsmodel_auto : false;
 {$if defined(i8086)}
         x86memorymodel : mm_small;
 {$endif defined(i8086)}
diff --git a/compiler/nld.pas b/compiler/nld.pas
index e2bccbc080..0453f162e8 100644
--- a/compiler/nld.pas
+++ b/compiler/nld.pas
@@ -450,7 +450,9 @@ implementation
                 { call to get address of threadvar }
                 if (vo_is_thread_var in tabstractvarsym(symtableentry).varoptions) then
                   begin
-                    include(current_procinfo.flags,pi_do_call);
+                    if (not (tf_section_threadvars in target_info.flags)) or
+                       (current_settings.tlsmodel in [tlsm_local_dynamic,tlsm_global_dynamic]) then
+                      include(current_procinfo.flags,pi_do_call);
                     include(current_procinfo.flags,pi_uses_threadvar);
                   end;
               end;
diff --git a/compiler/nutils.pas b/compiler/nutils.pas
index d586fc222d..b5c0de1937 100644
--- a/compiler/nutils.pas
+++ b/compiler/nutils.pas
@@ -191,7 +191,7 @@ interface
 implementation
 
     uses
-      cutils,verbose,globals,
+      cutils,verbose,globals,systems,
       symconst,symdef,
       defcmp,defutil,
       ncon,ncnv,nld,nflw,nset,ncal,nadd,nmem,ninl,
@@ -799,10 +799,13 @@ implementation
                 begin
                   if assigned(tloadnode(p).left) then
                     inc(result,node_complexity(tloadnode(p).left));
-                  { threadvars need a helper call }
+                  { some threadvars models need a helper call }
                   if (tloadnode(p).symtableentry.typ=staticvarsym) and
                      (vo_is_thread_var in tstaticvarsym(tloadnode(p).symtableentry).varoptions) then
-                    inc(result,5)
+                    if (tf_section_threadvars in target_info.flags) and (current_settings.tlsmodel in [tlsm_local_exec,tlsm_initial_exec]) then
+                      inc(result,3)
+                    else
+                      inc(result,50)
                   else if not((tloadnode(p).symtableentry.typ in [staticvarsym,localvarsym,paravarsym,fieldvarsym]) and
                     (tabstractvarsym(tloadnode(p).symtableentry).varregable in [vr_intreg,vr_mmreg,vr_fpureg])) then
                     inc(result);
diff --git a/compiler/ogcoff.pas b/compiler/ogcoff.pas
index 8c05b3792a..90bbca8b0d 100644
--- a/compiler/ogcoff.pas
+++ b/compiler/ogcoff.pas
@@ -1015,8 +1015,10 @@ const pemagic : array[0..3] of byte = (
                   end;
                 RELOC_SECREL32 :
                   begin
-                    { fixup address when the symbol was known in defined object }
-                    if (relocsec.objdata=objsec.objdata) then
+                    { fixup address when the symbol was known in defined object or
+                      when the symbol is a threadvar }
+                    if (relocsec.objdata=objsec.objdata) or
+                       assigned(objreloc.symbol) and (copy(objreloc.symbol.objsection.Name,1,4)='.tls') then
                       dec(address,relocsec.ExeSection.MemPos);
                     inc(address,relocval);
                   end;
@@ -2667,21 +2669,17 @@ const pemagic : array[0..3] of byte = (
 
         procedure UpdateTlsDataDir;
         var
-          {callbacksection : TExeSection;}
           tlsexesymbol: TExeSymbol;
           tlssymbol: TObjSymbol;
-          callbackexesymbol: TExeSymbol;
-          //callbacksymbol: TObjSymbol;
         begin
           { according to GNU ld,
             the callback routines should be placed into .CRT$XL*
             sections, and the thread local variables in .tls
             __tls_start__ and __tls_end__ symbols
-            should be used for the initialized part,
-            which we do not support yet. }
-          { For now, we only pass the address of the __tls_used
+            should be used for the initialized part. }
+          { We pass the address of the __tls_used
             asm symbol into PE_DATADIR_TLS with the correct
-            size of this table (different for win32/win64 }
+            size of this table (different for win32/win64) }
           tlsexesymbol:=texesymbol(ExeSymbolList.Find(
             target_info.Cprefix+'_tls_used'));
           if assigned(tlsexesymbol) then
@@ -2690,18 +2688,6 @@ const pemagic : array[0..3] of byte = (
               peoptheader.DataDirectory[PE_DATADIR_TLS].vaddr:=tlssymbol.address;
               { sizeof(TlsDirectory) is different on host and target when cross-compiling }
               peoptheader.DataDirectory[PE_DATADIR_TLS].size:=TLSDIR_SIZE;
-              if IsSharedLibrary then
-                begin
-                  { Here we should reset __FPC_tls_callbacks value to nil }
-                  callbackexesymbol:=texesymbol(ExeSymbolList.Find(
-                                        '__FPC_tls_callbacks'));
-                  if assigned (callbackexesymbol) then
-                    begin
-                      //callbacksymbol:=callbackexesymbol.ObjSymbol;
-
-                    end;
-                end;
-
            end;
         end;
 
@@ -2790,7 +2776,10 @@ const pemagic : array[0..3] of byte = (
               end
             else
               begin
-                peoptheader.MajorOperatingSystemVersion:=4;
+                if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorOperatingSystemVersion:=6 // >= Windows Vista
+                else
+                  peoptheader.MajorOperatingSystemVersion:=4;
                 peoptheader.MinorOperatingSystemVersion:=0;
               end;
             if SetPEUserVersionSetExplicitely then
@@ -2812,6 +2801,8 @@ const pemagic : array[0..3] of byte = (
               begin
                 if target_info.system in systems_wince then
                   peoptheader.MajorSubsystemVersion:=3
+                else if tf_section_threadvars in target_info.flags then
+                  peoptheader.MajorSubsystemVersion:=6 // >= Windows Vista
                 else
                   peoptheader.MajorSubsystemVersion:=4;
                 peoptheader.MinorSubsystemVersion:=0;
diff --git a/compiler/options.pas b/compiler/options.pas
index b77220e47a..65fa7611e4 100644
--- a/compiler/options.pas
+++ b/compiler/options.pas
@@ -4250,8 +4250,11 @@ begin
     begin
       if cs_create_pic in init_settings.moduleswitches then
         init_settings.tlsmodel:=tlsm_global_dynamic
+      else if target_info.system in [system_x86_64_win64] then
+        init_settings.tlsmodel:=tlsm_initial_exec
       else
         init_settings.tlsmodel:=tlsm_local_exec;
+      init_settings.tlsmodel_auto:=true;
     end;
 
   { set Mac OS X version default macros if not specified explicitly }
diff --git a/compiler/pmodules.pas b/compiler/pmodules.pas
index bf66024f02..9378377127 100644
--- a/compiler/pmodules.pas
+++ b/compiler/pmodules.pas
@@ -1962,6 +1962,12 @@ type
          { make the compiler happy and avoid an uninitialized variable warning on Setlength(sc,length(sc)+1); }
          sc:=nil;
 
+         if not IsLibrary and init_settings.tlsmodel_auto and (init_settings.tlsmodel=tlsm_initial_exec) then
+           begin
+             init_settings.tlsmodel:=tlsm_local_exec;
+             current_settings.tlsmodel:=tlsm_local_exec;
+           end;
+
          { DLL defaults to create reloc info }
          if islibrary or (target_info.system in [system_aarch64_win64]) then
            begin
diff --git a/compiler/systems/i_win.pas b/compiler/systems/i_win.pas
index d4e8f842d5..d1ffb4e57c 100644
--- a/compiler/systems/i_win.pas
+++ b/compiler/systems/i_win.pas
@@ -107,6 +107,9 @@ unit i_win;
             shortname    : 'Win64';
             flags        : [tf_files_case_aware,tf_has_dllscanner,
                             tf_smartlink_sections,
+{$if not defined(DISABLE_TLS_DIRECTORY) and not defined(SUPPORT_WINXP)}
+                            tf_section_threadvars,
+{$endif}
                             tf_winlikewidestring,tf_no_pic_supported,
                             tf_dwarf_only_local_labels,
                             tf_no_generic_stackcheck,tf_has_winlike_resources,
diff --git a/compiler/systems/t_win.pas b/compiler/systems/t_win.pas
index 887787f8e9..b96f820a3c 100644
--- a/compiler/systems/t_win.pas
+++ b/compiler/systems/t_win.pas
@@ -1711,11 +1711,36 @@ implementation
             else
               ;
           end;
-        if dllversion<>'' then
+        if SetPEUserVersionSetExplicitely then
+          begin
+            peoptheader.MajorImageVersion:=peuserversionmajor;
+            peoptheader.MinorImageVersion:=peuserversionminor;
+          end
+        else if dllversion<>'' then
           begin
            peoptheader.MajorImageVersion:=dllmajor;
            peoptheader.MinorImageVersion:=dllminor;
           end;
+        if SetPEOSVersionSetExplicitely then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=peosversionmajor;
+            peoptheader.MinorOperatingSystemVersion:=peosversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorOperatingSystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorOperatingSystemVersion:=0;
+          end;
+        if SetPESubSysVersionSetExplicitely then
+          begin
+            peoptheader.MajorSubsystemVersion:=pesubsysversionmajor;
+            peoptheader.MinorSubsystemVersion:=pesubsysversionminor;
+          end
+        else if tf_section_threadvars in target_info.flags then
+          begin
+            peoptheader.MajorSubsystemVersion:=6; // >= Windows Vista
+            peoptheader.MinorSubsystemVersion:=0;
+          end;
         { reset timestamp }
         peheader.time:=0;
         { write header back, skip pe magic }
diff --git a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
index f712af7196..115474e6d1 100644
--- a/compiler/x86/aasmcpu.pas
+++ b/compiler/x86/aasmcpu.pas
@@ -3617,6 +3617,12 @@ implementation
                       currabsreloc:=RELOC_TLSGD;
                       currabsreloc32:=RELOC_TLSGD;
                     end
+                  else if oper[opidx]^.ref^.refaddr=addr_secrel then
+                    begin
+                      currrelreloc:=RELOC_SECREL32;
+                      currabsreloc:=RELOC_SECREL32;
+                      currabsreloc32:=RELOC_SECREL32;
+                    end
                   else
 {$endif x86_64}
                     begin
diff --git a/compiler/x86/agx86att.pas b/compiler/x86/agx86att.pas
index 657f073163..bf69e08bd5 100644
--- a/compiler/x86/agx86att.pas
+++ b/compiler/x86/agx86att.pas
@@ -190,6 +190,8 @@ interface
                owner.writer.AsmWrite('@tpoff');
              addr_tlsgd:
                owner.writer.AsmWrite('@tlsgd');
+             addr_secrel:
+               owner.writer.AsmWrite('@secrel32');
 {$endif x86_64}
              else
                ;
@@ -259,6 +261,13 @@ interface
               {$ifdef x86_64},addr_tpoff,addr_tlsgd{$endif x86_64}
               ] then
               WriteReference(o.ref^)
+        {$ifdef x86_64}
+            else if o.ref^.refaddr=addr_secrel then
+              begin
+                owner.writer.AsmWrite('$');
+                WriteReference(o.ref^);
+              end
+        {$endif x86_64}
             else
               begin
                 owner.writer.AsmWrite('$');
diff --git a/compiler/x86/nx86ld.pas b/compiler/x86/nx86ld.pas
index 774addd1ca..d9e07bad0e 100644
--- a/compiler/x86/nx86ld.pas
+++ b/compiler/x86/nx86ld.pas
@@ -41,7 +41,7 @@ implementation
       globals,
       cutils,verbose,systems,
       aasmbase,aasmtai,aasmdata,aasmcpu,
-      cgutils,cgobj,
+      cgutils,cgobj,hlcgobj,
       symconst,symdef,symtable,
       cgbase,cpubase,parabase,paramgr,
       procinfo;
@@ -54,37 +54,10 @@ implementation
       var
         paraloc1 : tcgpara;
         pd: tprocdef;
-        href: treference;
-        hregister : tregister;
-        handled: boolean;
+        href, href_index: treference;
+        hregister, reg_index, reg_offset, reg_start : tregister;
       begin
-        handled:=false;
-        if (tf_section_threadvars in target_info.flags) then
-          begin
-            if target_info.system in [system_i386_win32,system_x86_64_win64] then
-              begin
-                paraloc1.init;
-                pd:=search_system_proc('fpc_tls_add');
-                paramanager.getcgtempparaloc(current_asmdata.CurrAsmList,pd,1,paraloc1);
-                if not(vo_is_weak_external in gvs.varoptions) then
-                  reference_reset_symbol(href,current_asmdata.RefAsmSymbol(gvs.mangledname,AT_DATA,use_indirect_symbol(gvs)),0,sizeof(pint),[])
-                else
-                  reference_reset_symbol(href,current_asmdata.WeakRefAsmSymbol(gvs.mangledname,AT_DATA),0,sizeof(pint),[]);
-                cg.a_loadaddr_ref_cgpara(current_asmdata.CurrAsmList,href,paraloc1);
-                paramanager.freecgpara(current_asmdata.CurrAsmList,paraloc1);
-                paraloc1.done;
-
-                cg.g_call(current_asmdata.CurrAsmList,'FPC_TLS_ADD');
-                cg.ungetcpuregister(current_asmdata.CurrAsmList,NR_FUNCTION_RESULT_REG);
-                hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
-                cg.a_load_reg_reg(current_asmdata.CurrAsmList,OS_ADDR,OS_ADDR,NR_FUNCTION_RESULT_REG,hregister);
-                location.reference.base:=hregister;
-                handled:=true;
-              end;
-          end;
-
-        if not handled then
-          inherited;
+        inherited;
 
         if (tf_section_threadvars in target_info.flags) then
           begin
@@ -156,6 +129,48 @@ implementation
                       Internalerror(2019012002);
                   end;
                 end;
+              system_x86_64_win64:
+                begin
+                  reference_reset(href,sizeof(AInt),[]); href.segment:=NR_GS; href.offset:=88;
+                  hregister:=cg.getaddressregister(current_asmdata.CurrAsmList);
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq %gs:88, %rcx
+
+                  reference_reset_base(href,hregister,0,ctempposinvalid,sizeof(AInt),[]);
+                  if current_settings.tlsmodel<>tlsm_local_exec then
+                    begin // _tls_index <> 0
+                      reference_reset_symbol(href_index,current_asmdata.RefAsmSymbol('_tls_index',AT_DATA),0,sizeof(dword),[]);
+                      reg_index:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                      hlcg.a_load_ref_reg(current_asmdata.CurrAsmList,u32inttype,u64inttype,
+                        href_index,reg_index);                                                             // movl _tls_index(%rip), %edx
+                      href.index:=reg_index; href.scalefactor:=8;
+                    end;
+                  current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,hregister));         // movq (%rcx), %rcx  |  movq (%rcx,%rdx,8), %rcx
+
+                  reg_offset:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                  href:=location.reference; // from inherited
+
+                if not (cs_asm_pre_binutils_2_25 in current_settings.globalswitches) then
+                  begin
+                    href.refaddr:=addr_secrel;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_L,href,
+                      newreg(getregtype(reg_offset),getsupreg(reg_offset),R_SUBD)));                       // movl $threadvar@secrel32, %rax
+                  end
+                else // Old binutils (2.21) not support $threadvar@secrel32
+                  begin
+                    href.symbol.typ:=AT_DATA;
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_offset));      // movq $threadvar, %rax
+                    reg_start:=cg.getintregister(current_asmdata.CurrAsmList,OS_INT);
+                    reference_reset_symbol(href,current_asmdata.RefAsmSymbol('___tls_start__',AT_DATA),0,sizeof(AInt),[]);
+                    href.refaddr:=addr_full;
+                    current_asmdata.CurrAsmList.concat(taicpu.op_ref_reg(A_MOV,S_Q,href,reg_start));       // movq $___tls_start__, %rdx
+                    current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_SUB,S_Q,reg_start,reg_offset)); // subq %rdx, %rax
+                  end;
+
+                  reference_reset(location.reference,location.reference.alignment,location.reference.volatility);
+                  location.reference.base:=hregister;
+                  location.reference.index:=reg_offset;                                                    // (%rcx,%rax)
+                end;
               else
                 ;
             end;
diff --git a/rtl/inc/heaptrc.pp b/rtl/inc/heaptrc.pp
index 920f448293..da7caade9f 100644
--- a/rtl/inc/heaptrc.pp
+++ b/rtl/inc/heaptrc.pp
@@ -1035,11 +1035,13 @@ var
    edata : ptruint; external name '__data_end__';
    sbss : ptruint; external name '__bss_start__';
    ebss : ptruint; external name '__bss_end__';
+{$ifndef FPC_SECTION_THREADVARS}
    TLSKey : PDWord; external name '_FPC_TlsKey';
    TLSSize : DWord; external name '_FPC_TlsSize';
 
 function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
+{$endif !FPC_SECTION_THREADVARS}
 {$endif}
 
 {$ifdef BEOS}
@@ -1104,6 +1106,7 @@ begin
   { inside data, rdata ... bss }
   if (ptruint(p)>=ptruint(@sdata)) and (ptruint(p)<ptruint(@ebss)) then
     exit;
+  {$ifndef FPC_SECTION_THREADVARS}
   { is program multi-threaded and p inside Threadvar range? }
   if TlsKey^<>-1 then
     begin
@@ -1112,6 +1115,7 @@ begin
           (ptruint(p)<ptruint(datap)+TlsSize)) then
         exit;
     end;
+  {$endif !FPC_SECTION_THREADVARS}
 {$endif windows}
 
 {$IFDEF OS2}
diff --git a/rtl/inc/thread.inc b/rtl/inc/thread.inc
index ce7ec796b4..07e20bb145 100644
--- a/rtl/inc/thread.inc
+++ b/rtl/inc/thread.inc
@@ -52,16 +52,20 @@ Var
           widestringmanager.ThreadInitProc;
 {$endif FPC_HAS_FEATURE_WIDESTRINGS}
 {$ifdef FPC_HAS_FEATURE_EXCEPTIONS}
+{$ifndef FPC_SECTION_THREADVARS}
         { ExceptAddrStack and ExceptObjectStack are threadvars       }
         { so every thread has its on exception handling capabilities }
         SysInitExceptions;
+{$endif !FPC_SECTION_THREADVARS}
 {$endif FPC_HAS_FEATURE_EXCEPTIONS}
 {$ifdef FPC_HAS_FEATURE_CONSOLEIO}
 {$ifndef EMBEDDED}
         { Open all stdio fds again }
         SysInitStdio;
+{$ifndef FPC_SECTION_THREADVARS}
         InOutRes:=0;
         // ErrNo:=0;
+{$endif !FPC_SECTION_THREADVARS}
 {$endif EMBEDDED}
 {$endif FPC_HAS_FEATURE_CONSOLEIO}
 {$ifdef FPC_HAS_FEATURE_STACKCHECK}
diff --git a/rtl/linux/si_impl.inc b/rtl/linux/si_impl.inc
index a727310551..e3ceee427e 100644
--- a/rtl/linux/si_impl.inc
+++ b/rtl/linux/si_impl.inc
@@ -20,7 +20,9 @@ procedure SysEntry_InitTLS(constref info: TEntryInformation); external name 'FPC
 
 var
   InitFinalTable : record end; external name 'INITFINAL';
+  {$ifndef FPC_SECTION_THREADVARS}
   ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+  {$endif}
   {$ifdef FPC_HAS_RESSTRINITS}
   ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
   {$endif}
diff --git a/rtl/win/sysosh.inc b/rtl/win/sysosh.inc
index 5bee0d4c17..4b65a1d41d 100644
--- a/rtl/win/sysosh.inc
+++ b/rtl/win/sysosh.inc
@@ -56,7 +56,9 @@ type
     {$ifdef WIN32}
     asm_exit : Procedure;stdcall;
     {$endif WIN32}
+    {$ifndef FPC_SECTION_THREADVARS}
     TlsKeyAddr : PDWord;
+    {$endif !FPC_SECTION_THREADVARS}
     SysInstance: {$ifdef CPU64}PQWord{$else}PLongInt{$endif};
     WideInitTables : Pointer;
   end;
@@ -117,7 +119,4 @@ var
 {$ifdef FPC_USE_WIN64_SEH}
 procedure _fpc_local_unwind(frame,target: Pointer);compilerproc;
 {$endif FPC_USE_WIN64_SEH}
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer):pointer;compilerproc;
-{$endif FPC_SECTION_THREADVARS}
 
diff --git a/rtl/win/systhrd.inc b/rtl/win/systhrd.inc
index 10bb07fda4..dfa583fee7 100644
--- a/rtl/win/systhrd.inc
+++ b/rtl/win/systhrd.inc
@@ -19,6 +19,7 @@
                            Local WINApi imports
 *****************************************************************************}
 
+{$ifndef FPC_SECTION_THREADVARS}
 const
   { LocalAlloc flags  }
   LMEM_FIXED = 0;
@@ -34,15 +35,16 @@ function TlsGetValue(dwTlsIndex : DWord) : pointer;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsGetValue';
 function TlsSetValue(dwTlsIndex : DWord;lpTlsValue : pointer) : LongBool;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'TlsSetValue';
+function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
+  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
+function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
+{$endif !FPC_SECTION_THREADVARS}
 function CreateThread(lpThreadAttributes : pointer;
   dwStackSize : SIZE_T; lpStartAddress : pointer;lpParameter : pointer;
   dwCreationFlags : DWord;var lpThreadId : DWord) : THandle;
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'CreateThread';
 procedure ExitThread(dwExitCode : DWord);
   {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ExitThread';
-function LocalAlloc(uFlags:DWord; dwBytes:SIZE_T):Pointer;
-  {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalAlloc';
-function LocalFree(hMem : Pointer):Pointer; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'LocalFree';
 procedure Sleep(dwMilliseconds: DWord); {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'Sleep';
 function  WinSuspendThread (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'SuspendThread';
 function  WinResumeThread  (threadHandle : THandle) : dword; {$ifdef wince}cdecl{$else}stdcall{$endif};external KernelDLL name 'ResumeThread';
@@ -98,6 +100,10 @@ var
                              Threadvar support
 *****************************************************************************}
 
+    var
+      MainThreadIdWin32 : DWORD;
+      
+{$ifndef FPC_SECTION_THREADVARS}
     var
       // public names are used by heaptrc unit
       threadvarblocksize : dword; public name '_FPC_TlsSize';
@@ -108,9 +114,6 @@ var
       TLSKey : PDWord = @TLSKeyVar; public name '_FPC_TlsKey';
       {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
 
-    var
-      MainThreadIdWin32 : DWORD;
-
     procedure SysInitThreadvar(var offset : dword;size : dword);
       begin
         offset:=threadvarblocksize;
@@ -199,6 +202,7 @@ var
             TlsSetValue(tlskey^, nil);
           end;
       end;
+{$endif !FPC_SECTION_THREADVARS}
 
 
 {*****************************************************************************
@@ -255,7 +259,9 @@ var
         writeln('Creating new thread');
 {$endif DEBUG_MT}
         { Initialize multithreading if not done }
+{$ifndef FPC_SECTION_THREADVARS}
         SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
         IsMultiThread:=true;
 
         { the only way to pass data to the newly created thread
@@ -623,10 +629,17 @@ begin
     EnterCriticalSection   :=@SysEnterCriticalSection;
     TryEnterCriticalSection:=@SysTryEnterCriticalSection;
     LeaveCriticalSection   :=@SysLeaveCriticalSection;
+{$ifndef FPC_SECTION_THREADVARS}
     InitThreadVar          :=@SysInitThreadVar;
     RelocateThreadVar      :=@SysRelocateThreadVar;
     AllocateThreadVars     :=@SysAllocateThreadVars;
     ReleaseThreadVars      :=@SysReleaseThreadVars;
+{$else}
+    InitThreadVar          :=Nil;
+    RelocateThreadVar      :=Nil;
+    AllocateThreadVars     :=Nil;
+    ReleaseThreadVars      :=Nil;
+{$endif FPC_SECTION_THREADVARS}
     BasicEventCreate       :=@intBasicEventCreate;
     BasicEventDestroy      :=@intBasicEventDestroy;
     BasicEventResetEvent   :=@intBasicEventResetEvent;
@@ -641,10 +654,12 @@ begin
     end;
   SetThreadManager(WinThreadManager);
   ThreadID := GetCurrentThreadID;
+{$ifndef FPC_SECTION_THREADVARS}
 {$ifndef FPC_USE_TLS_DIRECTORY}
   if IsLibrary then
 {$endif}
     SysInitTLS;
+{$endif !FPC_SECTION_THREADVARS}
 
 {$ifndef WINCE}
   KernelHandle:=GetModuleHandle(KernelDLL);
diff --git a/rtl/win/systlsdir.inc b/rtl/win/systlsdir.inc
index db53b77a94..32e0f28e36 100644
--- a/rtl/win/systlsdir.inc
+++ b/rtl/win/systlsdir.inc
@@ -32,8 +32,10 @@ Const
   DLL_PROCESS_DETACH = 0;
   DLL_THREAD_DETACH = 3;
 
+{$ifndef FPC_SECTION_THREADVARS}
 var
    TlsKey : PDWord = @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
 
 type
   TTlsDirectory=packed record
@@ -44,11 +46,13 @@ type
   end;
 
 
+{$ifndef FPC_SECTION_THREADVARS}
 function TlsGetValue(dwTlsIndex : DWord) : pointer; stdcall;
   external 'kernel32' name 'TlsGetValue';
 
 procedure InitSystemThreads; external name '_FPC_InitSystemThreads';
 procedure SysAllocateThreadVars; external name '_FPC_SysAllocateThreadVars';
+{$endif !FPC_SECTION_THREADVARS}
 procedure InitHeap; external name '_FPC_InitHeap';
 
 {$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}
@@ -58,10 +62,9 @@ procedure SetupEntryInformation(constref info: TEntryInformation); external name
 {$endif FPC_INSSIDE_SYSINIT}
 
 procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer);
-  stdcall; [public,alias:'_FPC_Tls_Callback'];
+  stdcall;
   begin
-     if IsLibrary then
-       Exit;
+     Assert(not IsLibrary);
      case reason of
        { For executables, DLL_PROCESS_ATTACH is called *before* the entry point,
          and DLL_PROCESS_DETACH is called *after* RTL shuts down and calls ExitProcess.
@@ -83,12 +86,19 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
              the threadvar handling can be correctly initialized }
            SetupEntryInformation(SysInitEntryInformation);
            {$endif FPC_HAS_INDIRECT_ENTRY_INFORMATION}
+           { the DLL_PROCESS_ATTACH initialization for the heap is required, because
+             if the DLL_PROCESS_ATTACH entry of a used DLL creates a thread, DLL_THREAD_ATTACH
+             of the executable will be called so that the RTL for the thread can be initialized
+             and that needs the code in DLL_PROCESS_ATTACH to be run first }
            InitHeap;
+         {$ifndef FPC_SECTION_THREADVARS}
            InitSystemThreads;
+         {$endif !FPC_SECTION_THREADVARS}
          end;
 
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
          {  !!! SysInitMultithreading must NOT be called here. Windows guarantees that
             the main thread invokes PROCESS_ATTACH, not THREAD_ATTACH. So this always
             executes in non-main thread. SysInitMultithreading() here will cause
@@ -100,6 +110,7 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -107,7 +118,11 @@ procedure Exec_Tls_callback(Handle : pointer; reason : Dword; Reserved : pointer
          end;
        DLL_THREAD_DETACH :
          begin
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
      end;
diff --git a/rtl/win/syswin.inc b/rtl/win/syswin.inc
index 63ee752569..37fb574f33 100644
--- a/rtl/win/syswin.inc
+++ b/rtl/win/syswin.inc
@@ -408,10 +408,12 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
          end;
        DLL_THREAD_ATTACH :
          begin
+         {$ifndef FPC_SECTION_THREADVARS}
            { SysInitMultithreading must not be called here,
              see comments in exec_tls_callback below }
            { Allocate Threadvars  }
            SysAllocateThreadVars;
+         {$endif !FPC_SECTION_THREADVARS}
 
            { NS : no idea what is correct to pass here - pass dummy value for now }
            { passing a dummy is ok, the correct value is read from the coff header of SysInstance (FK) }
@@ -425,7 +427,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
            if assigned(Dll_Thread_Detach_Hook) then
              Dll_Thread_Detach_Hook(DllParam);
            { Release Threadvars }
+         {$ifdef FPC_SECTION_THREADVARS}
+           if ThreadID<>0 then
+         {$else !FPC_SECTION_THREADVARS}
            if TlsGetValue(TLSKey^)<>nil then
+         {$endif !FPC_SECTION_THREADVARS}
              DoneThread; { Assume everything is idempotent there }
          end;
        DLL_PROCESS_DETACH :
@@ -439,9 +445,11 @@ function Dll_entry{$ifdef FPC_HAS_INDIRECT_ENTRY_INFORMATION}(constref info : TE
                InternalExit;
              end;
 
+         {$ifndef FPC_SECTION_THREADVARS}
            SysReleaseThreadVars;
            { Free TLS resources used by ThreadVars }
            SysFiniMultiThreading;
+         {$endif !FPC_SECTION_THREADVARS}
            MainThreadIDWin32:=0;
          end;
      end;
diff --git a/rtl/win32/sysinit.inc b/rtl/win32/sysinit.inc
index 0eea501af0..c8f7b6f042 100644
--- a/rtl/win32/sysinit.inc
+++ b/rtl/win32/sysinit.inc
@@ -16,10 +16,14 @@
 
    var
       SysInstance : LongInt;
+ {$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+ {$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+ {$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+ {$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
  {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -53,7 +57,11 @@
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -65,7 +73,9 @@
         valgrind_used : false;
         OS : (
           asm_exit : @asm_exit;
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
diff --git a/rtl/win32/system.pp b/rtl/win32/system.pp
index 4ffa0b65ff..4cf69de94d 100644
--- a/rtl/win32/system.pp
+++ b/rtl/win32/system.pp
@@ -78,7 +78,9 @@ end;
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -498,41 +500,6 @@ begin
 end;
 {$endif Set_i386_Exception_handler}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $tls_data_start,%eax
-      cmpb  $0,IsLibrary
-      mov   _tls_index,%ecx
-      jnz   .L1
-      mov   %fs:(0x2c),%edx
-      add   (%edx,%ecx,4),%eax
-      ret
-.L1:
-      push  %ebx
-      mov   %eax,%ebx
-      call  GetLastError
-      push  %eax                      { save LastError }
-      push  _tls_index
-      call  TlsGetValue
-      test  %eax,%eax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%eax
-      call  InitThread
-      push  _tls_index
-      call  TlsGetValue
-.L2:
-      add   %eax,%ebx
-      call  SetLastError              { restore (value is on stack) }
-      mov   %ebx,%eax
-      pop   %ebx
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
diff --git a/rtl/win64/sysinit.pp b/rtl/win64/sysinit.pp
index 5d396beb8f..f658cccb2b 100644
--- a/rtl/win64/sysinit.pp
+++ b/rtl/win64/sysinit.pp
@@ -21,10 +21,14 @@ unit sysinit;
 
    var
       SysInstance : QWord;
+{$ifndef FPC_SECTION_THREADVARS}
       TlsKeyVar: DWord = $ffffffff;
+{$endif !FPC_SECTION_THREADVARS}
 
       InitFinalTable : record end; external name 'INITFINAL';
+{$ifndef FPC_SECTION_THREADVARS}
       ThreadvarTablesTable : record end; external name 'FPC_THREADVARTABLES';
+{$endif !FPC_SECTION_THREADVARS}
       WideInitTables : record end; external name 'FPC_WIDEINITTABLES';
 {$ifdef FPC_HAS_RESSTRINITS}
       ResStrInitTables : record end; external name 'FPC_RESSTRINITTABLES';
@@ -58,7 +62,11 @@ unit sysinit;
       STD_INPUT_HANDLE = dword(-10);
       SysInitEntryInformation : TEntryInformation = (
         InitFinalTable : @InitFinalTable;
+{$ifdef FPC_SECTION_THREADVARS}
+        ThreadvarTablesTable : nil;
+{$else FPC_SECTION_THREADVARS}
         ThreadvarTablesTable : @ThreadvarTablesTable;
+{$endif FPC_SECTION_THREADVARS}
         ResourceStringTables : @ResourceStringTables;
 {$ifdef FPC_HAS_RESSTRINITS}
         ResStrInitTables : @ResStrInitTables;
@@ -69,7 +77,9 @@ unit sysinit;
         PascalMain : @PascalMain;
         valgrind_used : false;
         OS : (
+{$ifndef FPC_SECTION_THREADVARS}
           TlsKeyAddr : @TlsKeyVar;
+{$endif !FPC_SECTION_THREADVARS}
           SysInstance : @SysInstance;
           WideInitTables: @WideInitTables;
           );
@@ -124,6 +134,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,nil);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
@@ -135,6 +148,9 @@ unit sysinit;
       sysinstance:=_hinstance;
       dllreason:=_dllreason;
       dllparam:=PtrInt(_dllparam);
+{$if defined(FPC_USE_TLS_DIRECTORY) and defined(FPC_SECTION_THREADVARS)}
+      LinkIn(@tlsdir,@tls_callback_end,nil);
+{$endif}
       SetupEntryInformation;
       DLL_Entry(SysInitEntryInformation);
     end;
diff --git a/rtl/win64/system.pp b/rtl/win64/system.pp
index 6cf7b174cd..a5771e281e 100644
--- a/rtl/win64/system.pp
+++ b/rtl/win64/system.pp
@@ -122,7 +122,9 @@ procedure PascalMain;external name 'PASCALMAIN';
 
 procedure OsSetupEntryInformation(constref info: TEntryInformation);
 begin
+{$ifndef FPC_SECTION_THREADVARS}
   TlsKey := info.OS.TlsKeyAddr;
+{$endif !FPC_SECTION_THREADVARS}
   FPCSysInstance := info.OS.SysInstance;
   WStrInitTablesTable := info.OS.WideInitTables;
 end;
@@ -504,47 +506,6 @@ begin
 end;
 {$endif VER3_0}
 
-{$ifdef FPC_SECTION_THREADVARS}
-function fpc_tls_add(addr: pointer): pointer; assembler; nostackframe;
-  [public,alias: 'FPC_TLS_ADD']; compilerproc;
-  asm
-      sub   $56,%rsp                  { 32 spill area + 16 local vars + 8 misalignment }
-  .seh_stackalloc 56
-  .seh_endprologue
-      lea   tls_data_start(%rip),%rax
-      sub   %rax,%rcx
-      cmpb  $0,IsLibrary(%rip)
-      mov   _tls_index(%rip),%eax
-      jnz   .L1
-      mov   %gs:(88),%rdx
-      add   (%rdx,%rax,8),%rcx
-      mov   %rcx,%rax
-      jmp   .L3
-.L1:
-      mov   %rcx,32(%rsp)
-      call  GetLastError
-      mov   %rax,40(%rsp)             { save LastError }
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-      test  %rax,%rax
-      jnz   .L2
-      { This can happen when a thread existed before DLL was loaded,
-        or if DisableThreadLibraryCalls was called. }
-      call  SysAllocateThreadVars
-      mov   $0x1000000,%rcx
-      call  InitThread
-      mov   _tls_index(%rip),%ecx
-      call  TlsGetValue
-.L2:
-      add   %rax,32(%rsp)
-      mov   40(%rsp),%rcx
-      call  SetLastError
-      mov   32(%rsp),%rax
-.L3:
-      add   $56,%rsp
-  end;
-{$endif FPC_SECTION_THREADVARS}
-
 function CheckInitialStkLen(stklen : SizeUInt) : SizeUInt;
   type
     tdosheader = packed record
@@ -632,15 +593,19 @@ initialization
     InitHeap;
     InitSystemThreads;
   end;
+{$ifndef FPC_SECTION_THREADVARS}
   SysInitExceptions;
+{$endif !FPC_SECTION_THREADVARS}
   initunicodestringmanager;
   InitWin32Widestrings;
   SysInitStdIO;
   { Arguments }
   setup_arguments;
   InitSystemDynLibs;
+{$ifndef FPC_SECTION_THREADVARS}
   { Reset IO Error }
   InOutRes:=0;
+{$endif !FPC_SECTION_THREADVARS}
   ProcessID := GetCurrentProcessID;
   DispCallByIDProc:=@DoDispCallByIDError;
 
-- 
2.26.2.windows.1

Issue History

Date Modified Username Field Change
2020-08-02 21:54 Sergey Larin New Issue
2020-08-02 21:54 Sergey Larin File Added: tls-win64-section_threadvars.patch
2020-08-02 21:54 Sergey Larin File Added: tls_callback-ld-workaround.patch
2020-08-02 22:02 Sergey Larin Note Added: 0124517
2020-08-02 22:04 Sergey Larin Tag Attached: compiler
2020-08-02 22:04 Sergey Larin Tag Attached: x86_64-win64
2020-08-02 22:04 Sergey Larin Tag Attached: patch
2020-08-02 22:04 Sergey Larin Tag Attached: optimization
2020-08-02 22:11 Sergey Larin Note Added: 0124518
2020-08-03 11:24 Sergey Larin Note Added: 0124525
2020-08-03 11:24 Sergey Larin File Added: tls-win64-section_threadvars-fixup1.patch
2020-08-04 10:04 Sergey Larin Note Added: 0124536
2020-08-04 10:04 Sergey Larin File Added: tls-win64-section_threadvars-fixup2.patch
2020-08-05 10:32 Sergey Larin Note Added: 0124577
2020-08-05 10:32 Sergey Larin File Added: tls-win64-section_threadvars-v3.patch
2020-08-05 21:55 Sven Barth Note Added: 0124584
2020-08-05 21:55 Sven Barth File Added: rtl.ppk
2020-08-06 19:15 Sergey Larin Note Added: 0124623
2020-08-06 19:17 Sergey Larin Note Edited: 0124623 View Revisions
2020-08-06 19:18 Sergey Larin Note Edited: 0124623 View Revisions
2020-08-06 19:21 Sergey Larin Note Edited: 0124623 View Revisions
2020-08-06 19:22 Sergey Larin Note Added: 0124625
2020-08-06 19:22 Sergey Larin File Added: tls-win64-section_threadvars-fixup4.patch
2020-08-06 19:22 Sergey Larin File Added: tls-win64-section_threadvars-v4.patch
2020-08-06 21:47 Sergey Larin Note Added: 0124630
2020-08-06 21:47 Sergey Larin File Added: tlsdirectory.png
2020-08-06 21:47 Sergey Larin File Added: memory.png
2020-08-06 21:57 Sergey Larin Note Edited: 0124630 View Revisions
2020-08-10 10:40 Sergey Larin Note Edited: 0124623 View Revisions
2020-08-10 12:58 Sergey Larin Note Added: 0124719
2020-08-10 12:58 Sergey Larin File Added: tls-win64-section_threadvars-v5.patch
2020-08-10 12:58 Sergey Larin File Added: tls-win64-section_threadvars-fixup5.patch
2020-08-10 12:58 Sergey Larin File Added: tls_callback-ld-fix.patch
2020-08-11 13:34 Sergey Larin Note Added: 0124756
2020-08-23 20:10 Sergey Larin Note Added: 0125090
2020-08-23 20:10 Sergey Larin File Added: tls-win64-section_threadvars-fixup6.patch
2020-08-23 20:10 Sergey Larin File Added: tls-win64-section_threadvars-fixup7.patch
2020-08-23 20:10 Sergey Larin File Added: tls-win64-section_threadvars-v7.patch
2020-08-29 11:03 Sergey Larin Note Edited: 0125090 View Revisions