View Issue Details

ID: 0035581
Project: FPC
Category: Compiler
View Status: public
Last Update: 2019-12-28 22:50
Reporter: Bart Broersma
Assigned To: Florian
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: i386
OS: Windows
Product Version: 3.3.1
Fixed in Version: 3.3.1
Summary: 0035581: Compiler crash when assigning CP_UTF7 string and compiling with -FcUTF8
Description: The following program crashes the compiler if it is compiled with -FcUTF8:

program cps;

{$mode objfpc}
{$h+}

type
  Utf7String = type AnsiString(CP_UTF7);

var
  U7: Utf7String;

begin
  U7 := 'U7'; //cps.lpr(13,9) Error: Unknown codepage "65000"
end.

Steps To Reproduce:
C:\Users\Bart\LazarusProjecten\bugs\Console\cpstring>fpc cps.lpr
Free Pascal Compiler version 3.3.1 [2019/05/12] for i386
Copyright (c) 1993-2018 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling cps.lpr
cps.lpr(13,9) Error: Unknown codepage "65000"
cps.lpr(65) Fatal: There were 1 errors compiling module, stopping
Fatal: Compilation aborted
Error: C:\pp\bin\i386-win32\ppc386.exe returned an error exitcode

Now add -FcUTF8
C:\Users\Bart\LazarusProjecten\bugs\Console\cpstring>fpc -FcUTF8 cps.lpr
Free Pascal Compiler version 3.3.1 [2019/05/12] for i386
Copyright (c) 1993-2018 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling cps.lpr
cps.lpr(13,9) Error: Unknown codepage "65000"
cps.lpr(13,9) Error: Compilation raised exception internally
Fatal: Compilation aborted
An unhandled exception occurred at $00477B14:
EAccessViolation: Access violation
  $00477B14 GETASCII, line 697 of C:/devel/fpc/trunk/rtl/inc/charset.pp
  $004C3954 TTYPECONVNODE__SIMPLIFY, line 2926 of ncnv.pas
  $004C27F9 TTYPECONVNODE__PASS_TYPECHECK, line 2426 of ncnv.pas
  $004CC597 TYPECHECKPASS_INTERNAL, line 81 of pass_1.pas
  $004BD2E7 INSERTTYPECONV, line 380 of ncnv.pas
  $004CC597 TYPECHECKPASS_INTERNAL, line 81 of pass_1.pas
  $00552EFB STATEMENT_BLOCK, line 1367 of pstatmnt.pas
  $00538079 BLOCK, line 381 of psub.pas
  $00439919 COMPILE, line 395 of parser.pas
  $00416674 COMPILE, line 278 of compiler.pas
Additional Information: See the related discussion at https://forum.lazarus.freepascal.org/index.php/topic,45380.msg320902.html#msg320902

While trying to use CP_UTF7 may be bonkers in the first place, the compiler should not crash.
I was unable to make it crash by specifying another codepage with -Fc.

FPC 3.0.4 also crashes on that line (with -FcUTF8).
Tags: No tags attached.
Fixed in Revision: 43764
FPCOldBugId:
FPCTarget: -
Attached Files:

Activities

Bart Broersma

2019-09-25 13:28

reporter   ~0118164

Here's the code flow that leads to the crash:

in unit ncon.pas:

procedure tstringconstnode.changestringtype:
  tstringdef(def).stringtype = st_ansistring
  cst_type = cst_conststring
  cp1=65000 {=CP_UTF7}, cp2=65001 {=CP_UTF8}
  cpavailable(cp1) =FALSE
  cpavailable(cp2) =FALSE
  current_settings.sourcecodepage=65001
  -> calls Message1(option_code_page_not_available {11039},'65000');

in unit verbose.pas

procedure Message1 calls procedure Msg2Comment

procedure Msg2Comment
  s="E_Unknown codepage "65000"", w=11039
  doqueue=FALSE, dostop=FALSE
  -> calls do_comment, this prints the "cps.pas(13,9) Error: Unknown codepage "65000"" message
  the result of do_comment function=FALSE
  status.errorcount=1, status.maxerrorcount=50, status.skip_error=FALSE
  exit Msg2Comment, no ECompilerAbort.Create is raised, so we return to the calling procedure

in unit ncon.pas
  return to tstringconstnode.changestringtype:

  initwidestr()
  setlengthwidestring(pw,2), len=2
  l:=Utf8ToUnicode() -> l=3 // Why is the value of l 3 here? (the string in question is 'U7')
  l<>getlengthwidestring
    setlengthwidestring(pw,3);
    ReallocMem(value_str,3)
  -> calls unicode2ascii(pw,value_str,65000{cp1}) //value_str='U7'

in unit widestr.pas

procedure unicode2ascii:
  m:=getmap(65000) -> returns NIL !
  source^=85 //'U'
  r^.len=3
  -> calls getascii in a for loop (1 to 3). Notice that the punicodemap parameter is nil, so this is bound to bomb out anyway.
    i=1, calls getascii(85, nil);

in unit charset.pp

function getascii: EAccessViolation at "begin" (the entry point of that function)
I expected this to fail at rm:=find(c,p), since that will reference p^.reversemap (and p=nil).
Something already seems to have gone wrong before the call to getascii was made.



An observation: in procedure unicode2ascii there is this remark:
        { can't implement that here, because the memory size for p() cannot
          be changed here, and we may need more bytes than have been allocated }
        if cp=CP_UTF8 then
          internalerrorproc(2015092701);
I think the same holds for CP_UTF7: it can require more bytes than were allocated before, e.g. '1 + 1 = 2' is encoded as '1 +- 1 +AD0- 2' in CP_UTF7 (see: https://en.wikipedia.org/wiki/UTF-7#Examples).
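To see the expansion concretely, here is a minimal UTF-7 encoder sketch. It is not FPC code, and the names EncodeUtf7 and IsDirect are hypothetical. It assumes ASCII-only input, writes RFC 2152 Set D plus space, TAB, CR and LF as themselves, writes '+' as "+-", and base64-encodes every other character as a 16-bit UTF-16 code unit. That last choice is the conservative one, and it matches the Wikipedia example.

```pascal
program Utf7Expansion;

{$mode objfpc}{$H+}

const
  { Base64 alphabet used by UTF-7 (RFC 2152); UTF-7 omits the '=' padding }
  B64: string = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

{ RFC 2152 Set D plus space, TAB, CR and LF: written as themselves }
function IsDirect(c: Char): Boolean;
begin
  Result := c in ['A'..'Z', 'a'..'z', '0'..'9', '''', '(', ')', ',', '-',
                  '.', '/', ':', '?', ' ', #9, #10, #13];
end;

{ Hypothetical helper: encodes an ASCII-only string as UTF-7.
  Everything outside IsDirect, except '+', is base64-encoded as UTF-16 units. }
function EncodeUtf7(const s: string): string;
var
  i, nbits: Integer;
  bits: Cardinal;
begin
  Result := '';
  i := 1;
  while i <= Length(s) do
    if IsDirect(s[i]) then
    begin
      Result := Result + s[i];
      Inc(i);
    end
    else if s[i] = '+' then
    begin
      Result := Result + '+-';   { a literal '+' is written as "+-" }
      Inc(i);
    end
    else
    begin
      Result := Result + '+';    { start of a base64 run }
      bits := 0;
      nbits := 0;
      while (i <= Length(s)) and not IsDirect(s[i]) and (s[i] <> '+') do
      begin
        { append one 16-bit UTF-16 code unit; the low 24 bits hold all unconsumed bits }
        bits := ((bits shl 16) or Ord(s[i])) and $FFFFFF;
        Inc(nbits, 16);
        while nbits >= 6 do
        begin
          Dec(nbits, 6);
          Result := Result + B64[((bits shr nbits) and $3F) + 1];
        end;
        Inc(i);
      end;
      if nbits > 0 then          { zero-pad the last partial 6-bit group }
        Result := Result + B64[((bits shl (6 - nbits)) and $3F) + 1];
      Result := Result + '-';    { explicit end of the base64 run }
    end;
end;

var
  Plain, Encoded: string;
begin
  Plain := '1 + 1 = 2';
  Encoded := EncodeUtf7(Plain);
  WriteLn(Plain, ' (', Length(Plain), ' bytes) -> ', Encoded, ' (', Length(Encoded), ' bytes)');
end.
```

For '1 + 1 = 2' this yields '1 +- 1 +AD0- 2', which is 14 bytes from a 9-byte source. A destination buffer sized for the source, which unicode2ascii cannot enlarge according to the remark quoted above, would therefore be too small. This is the same reason the remark gives for CP_UTF8.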

Possible solutions:

1. Make option_code_page_not_available a fatal error; then Msg2Comment will raise an ECompilerAbort (via ECompilerAbort.Create).
This would also affect the options.pas and scandir.pas units.
2. In tstringconstnode.changestringtype, add an Exit after the call to Message1(option_code_page_not_available,IntToStr(cp1)).

Another observation:
In tstringconstnode.changestringtype there is another code path that may lead to a compiler crash:
if cp1=CP_UTF8 and not cpavailable(cp2), the rest of that code block is still executed after the call to Message1(option_code_page_not_available,IntToStr(cp2)), which seems unsafe.
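The difference between the two proposed fixes is only in the control flow, and a small self-contained model can illustrate it. Every name here (TCodePageMapSketch, GetMapSketch, ReportErrorSketch, ConvertConstSketch, ECompilerAbortSketch) is a hypothetical stand-in and not the compiler's actual code. A map lookup that returns nil plays the role of cpavailable() returning FALSE. A non-fatal error only counts and returns, as Msg2Comment does in the trace, so the caller must Exit itself (option 2). A fatal error raises and never returns (option 1).

```pascal
program ExitAfterError;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Hypothetical stand-in for a loaded code page conversion map }
  PCodePageMapSketch = ^TCodePageMapSketch;
  TCodePageMapSketch = record
    CodePage: Word;
  end;

  { Hypothetical stand-in for ECompilerAbort }
  ECompilerAbortSketch = class(Exception);

var
  Utf8MapSketch: TCodePageMapSketch = (CodePage: 65001);
  ErrorCount: Integer = 0;

{ Returns nil when no map is available, as getmap(65000) does in the trace }
function GetMapSketch(cp: Word): PCodePageMapSketch;
begin
  if cp = Utf8MapSketch.CodePage then
    Result := @Utf8MapSketch
  else
    Result := nil;
end;

{ Counts the error; only a fatal error leaves by raising (option 1).
  A non-fatal error returns to the caller, like Msg2Comment in the trace. }
procedure ReportErrorSketch(const Msg: string; Fatal: Boolean);
begin
  Inc(ErrorCount);
  WriteLn('Error: ', Msg);
  if Fatal then
    raise ECompilerAbortSketch.Create(Msg);
end;

{ Option 2: Exit right after the non-fatal error, before the map is used }
function ConvertConstSketch(cp: Word): Boolean;
var
  Map: PCodePageMapSketch;
begin
  Result := False;
  Map := GetMapSketch(cp);
  if Map = nil then
  begin
    ReportErrorSketch('Unknown codepage "' + IntToStr(cp) + '"', False);
    Exit;
  end;
  { Only reached with a valid map, so the dereference is safe }
  Result := Map^.CodePage = cp;
end;

begin
  WriteLn(ConvertConstSketch(65001));   { map available: conversion runs }
  WriteLn(ConvertConstSketch(65000));   { error reported, no access violation }
  try
    ReportErrorSketch('Unknown codepage "65000"', True);   { option 1: fatal }
    WriteLn('not reached');
  except
    on E: ECompilerAbortSketch do
      WriteLn('Aborted: ', E.Message);
  end;
  WriteLn('Errors: ', ErrorCount);
end.
```

The early Exit keeps the fix local to the caller. Making the message fatal instead changes the behaviour of every place that emits option_code_page_not_available, which is why option 1 also touches options.pas and scandir.pas.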

Thaddy de Koning

2019-09-26 04:20

reporter   ~0118166

It is not Windows-specific; it also crashes on Linux.
~ $ fpc -FcUTF8 cps.pas
Free Pascal Compiler version 3.3.1-r43062 [2019/09/25] for arm
Copyright (c) 1993-2019 by Florian Klaempfl and others
Target OS: Linux for ARMHF
Compiling cps.pas
cps.pas(13,9) Error: Unknown codepage "65000"
cps.pas(13,9) Error: Compilation raised exception internally
Fatal: Compilation aborted
An unhandled exception occurred at $0006F190:
EAccessViolation: Access violation
  $0006F190
  $0011A18C
  $00118E20
  $0010CFD0

Error: /usr/local/bin/ppcarm returned an error exitcode

Anton Kavalenka

2019-09-26 08:00

reporter   ~0118169

0036104 may be related to the way the compiler processes string constants.

Bart Broersma

2019-09-27 16:15

reporter   ~0118175

@Thaddy: not really surprising since it follows the same codepath in the compiler.

@Anton: I don't see a relationship here. The compiler tries to do what it is supposed to do: convert a UTF-8 encoded string constant ('U7') to UTF-7. It cannot do so because it cannot load the conversion map for that. Up to that point this is not a problem. However, it then proceeds to try the conversion anyway, and that part of the code crashes. It should simply fail and stop.

Issue History

Date Modified | Username | Field | Change
2019-05-13 20:16 | Bart Broersma | New Issue |
2019-09-25 13:28 | Bart Broersma | Note Added: 0118164 |
2019-09-26 04:20 | Thaddy de Koning | Note Added: 0118166 |
2019-09-26 08:00 | Anton Kavalenka | Note Added: 0118169 |
2019-09-27 16:15 | Bart Broersma | Note Added: 0118175 |
2019-12-23 21:52 | Florian | Assigned To | => Florian
2019-12-23 21:52 | Florian | Status | new => resolved
2019-12-23 21:52 | Florian | Resolution | open => fixed
2019-12-23 21:52 | Florian | Fixed in Version | => 3.3.1
2019-12-23 21:52 | Florian | Fixed in Revision | => 43764
2019-12-23 21:52 | Florian | FPCTarget | => -
2019-12-28 22:50 | Bart Broersma | Status | resolved => closed