Assigning one character to ANSI string gives wrong .ascii output

Original Reporter info from Mantis: engkin @engkin

Reporter name:

Description:

Assigning a single character to a string with a single-byte code page can produce a wrong result if the character's UTF-8 encoding is two or more bytes long.

It seems that the internal conversion from UTF-8 to the target code page does not terminate the string properly and leaves some UTF-8 related bytes at the end of the .ascii output.

Steps to reproduce:

Check the assembly file (-al) produced for the following program (also attached):

    program Project1;
     
    {$mode objfpc}{$H+}
    {$Codepage UTF8}
     
    type
      CP437String = type ansistring(437);
     
    var
      s_cpUTF8: string;
      s_cp437_1, s_cp437_2: CP437String;
    begin
      s_cpUTF8  := '║';
      s_cp437_1 := '║';  //<--- buggy
      s_cp437_2 := '║1';
    end.

s_cp437_1 receives the wrong value:

    _$PROJECT1$_Ld2:
       .ascii   "\272?\221\000"

The correct value should be:

       .ascii   "\272\000"

The other two variables get correct values:

    _$PROJECT1$_Ld1:
       .ascii   "\342\225\221\000"

    _$PROJECT1$_Ld3:
       .ascii   "\2721\000"

Additional information:

One of the two wrong values in s_cp437_1 (\221) equals the last value in s_cpUTF8.
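
To check this observation byte by byte, the octal escapes in the three literals can be decoded into raw bytes. The following Python sketch is not part of the original report; `decode_ascii_literal` is a hypothetical helper that handles only the `\ooo` octal escapes seen in the listings above, not the full assembler string syntax.

```python
import re

# Matches an octal escape of one to three digits, as used in the .ascii listings.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def decode_ascii_literal(literal: str) -> bytes:
    """Decode a .ascii literal body containing only plain ASCII and \\ooo escapes."""
    out = bytearray()
    pos = 0
    for match in _OCTAL_ESCAPE.finditer(literal):
        # Copy any plain characters that precede this escape.
        out += literal[pos:match.start()].encode("ascii")
        out.append(int(match.group(1), 8) & 0xFF)
        pos = match.end()
    out += literal[pos:].encode("ascii")
    return bytes(out)


# Literal bodies copied from the assembly output in this report.
utf8_bytes = decode_ascii_literal(r"\342\225\221\000")
buggy_bytes = decode_ascii_literal(r"\272?\221\000")
expected_bytes = decode_ascii_literal(r"\272\000")

print("s_cpUTF8 :", utf8_bytes.hex(" "))
print("s_cp437_1:", buggy_bytes.hex(" "))
print("expected :", expected_bytes.hex(" "))

# The byte before the terminator in the buggy literal is the last UTF-8 byte.
print("trailing byte matches:", buggy_bytes[-2] == utf8_bytes[-2])
```

In the decoded buggy literal, the first byte is the expected code page 437 value, followed by a `?` ($3F) and the last byte of the UTF-8 sequence before the terminating zero.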

The example character U+2551 ║ is:
code page 437: 186 = &272
UTF8: $E2 $95 $91 = &342 &225 &221
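
The encodings of U+2551 can be cross-checked independently of the compiler. This is a minimal Python sketch, assuming Python's built-in `cp437` and `utf-8` codecs as the reference; the helper names `as_hex` and `as_octal` are made up for illustration and mirror the `$` and `&` notation used above.

```python
ch = "\u2551"  # BOX DRAWINGS DOUBLE VERTICAL, the character used in the test program

utf8 = ch.encode("utf-8")
cp437 = ch.encode("cp437")


def as_hex(data: bytes) -> str:
    """Format bytes as Pascal-style hexadecimal constants."""
    return " ".join(f"${b:02X}" for b in data)


def as_octal(data: bytes) -> str:
    """Format bytes as Pascal-style octal constants."""
    return " ".join(f"&{b:o}" for b in data)


print("code page 437:", cp437[0], "=", as_hex(cp437), "=", as_octal(cp437))
print("UTF8:", as_hex(utf8), "=", as_octal(utf8))
```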

I tested with FPC 3.0.4; Thaddy tested with trunk r38861.

Mantis conversion info:

Mantis ID: 33666
Fixed in version: 3.2.0
Fixed in revision: 40637 (#70cadc76),40735 (#fc9e9e80)
Target version: 3.2.0
