Assigning one character to ANSI string gives wrong .ascii output
Original Reporter info from Mantis: engkin @engkin
Description:
Assigning a single character to a string with a single-byte code page can produce a wrong result if the character's UTF8 encoding is two or more bytes long.
It seems that the internal conversion from UTF8 to the target code page does not terminate the string properly, and it leaves some UTF8-related bytes at the end of the .ascii output.
Steps to reproduce:
Check the assembly file (-al) produced for the following program (also attached):
program Project1;
{$mode objfpc}{$H+}
{$Codepage UTF8}
type
CP437String = type ansistring(437);
var
s_cpUTF8: string;
s_cp437_1, s_cp437_2: CP437String;
begin
s_cpUTF8 := '║';
s_cp437_1 := '║'; //<--- buggy
s_cp437_2 := '║1';
end.
s_cp437_1 receives wrong value:
_$PROJECT1$_Ld2:
.ascii "\272?\221\000"
the correct value should be:
.ascii "\272\000"
while the other two variables get correct values:
_$PROJECT1$_Ld1:
.ascii "\342\225\221\000"
_$PROJECT1$_Ld3:
.ascii "\2721\000"
Additional information:
One of the two wrong bytes in s_cp437_1 (\221) equals the last byte of s_cpUTF8.
The example character U+2551 ║ is:
code page 437: 186 = &272
UTF8: $E2 $95 $91 = &342 &225 &221
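The byte values for U+2551 can be cross-checked independently of the compiler. The following is a minimal sketch in Python, not part of the original report; the helper name octal_escapes is made up for illustration, and it relies only on Python's built-in "utf-8" and "cp437" codecs.

```python
# Cross-check of the encodings of U+2551 (BOX DRAWINGS DOUBLE VERTICAL).
# Illustrative only: octal_escapes is a hypothetical helper, not an FPC tool.
ch = "\u2551"


def octal_escapes(data: bytes) -> str:
    """Render bytes as backslash-octal escapes, as in an .ascii directive."""
    return "".join(f"\\{b:03o}" for b in data)


utf8 = ch.encode("utf-8")    # multi-byte UTF-8 form
cp437 = ch.encode("cp437")   # single-byte code page 437 form

print("UTF8 :", utf8.hex(" "), "=", octal_escapes(utf8))
print("CP437:", cp437.hex(" "), "=", octal_escapes(cp437))
```

This prints e2 95 91 = \342\225\221 for UTF8 and ba = \272 for code page 437, so a correctly converted one-character CP437 string should contain only the single byte \272 before the terminator.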
I tested with FPC 3.0.4; Thaddy tested with trunk r38861.
Related forum post:
http://forum.lazarus.freepascal.org/index.php/topic,41095.0.html