View Issue Details

ID: 0029851
Project: Lazarus
Category: LazUtils
View Status: public
Last Update: 2016-03-24 00:47
Reporter: Bart Broersma
Assigned To: Bart Broersma
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: i386
OS: Windows
OS Version: Win7
Product Version: 1.7 (SVN)
Product Build: r51965
Target Version: 1.6.2
Fixed in Version: 1.6.2
Summary: 0029851: Bug in UTF8FindNearestCharStart
Description: UTF8FindNearestCharStart returns the wrong result if BytePos points to $B8 in the 3-byte sequence $E0 $B8 $9A (which appears to be a valid codepoint: THAI CHARACTER BO BAIMAI, U+0E1A, see: http://unicode.scarfboy.com/?s=U%2b0E1A).

It returns an index pointing to $B8, where it should point to $E0 instead.
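
To illustrate the expected behaviour, here is a minimal Python sketch of walking back from a byte position to the lead byte of its UTF-8 sequence. It is not the LazUtf8 code and not the committed fix; the names `expected_len` and `find_char_start` are hypothetical, and offsets are 0-based, which appears to match the NCS values in the output of the sample.

```python
def expected_len(lead: int) -> int:
    """Sequence length announced by a lead byte, or 0 if it is not a lead byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 0  # continuation byte (10xxxxxx) or invalid lead


def find_char_start(data: bytes, byte_pos: int) -> int:
    """Return the 0-based offset of the lead byte covering byte_pos.

    Falls back to byte_pos itself when no lead byte within three
    bytes before byte_pos announces a sequence long enough to reach it.
    """
    if not 0 <= byte_pos < len(data):
        raise IndexError("byte_pos out of range")
    pos = byte_pos
    steps = 0
    # Step back over at most three continuation bytes.
    while pos > 0 and steps < 3 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
        steps += 1
    # Accept the candidate only if its sequence reaches byte_pos.
    if expected_len(data[pos]) > byte_pos - pos:
        return pos
    return byte_pos


sample = bytes([0xC3, 0xA4, 0xE0, 0xB8, 0x9A])
starts = [find_char_start(sample, i) for i in range(len(sample))]
```

For the sequence $C3 $A4 $E0 $B8 $9A, every byte of the Thai character maps to offset 2 (the $E0 byte), including the middle byte $B8.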
Steps To Reproduce: Unzip and build the attached sample.
(The sample project has more code than needed for this test, but it will just run the test demonstrating the problem.)

It outputs:
C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\comparestr>compare
Windows: using LazUtf8
$C3 $A4 $E0 $B8 $9A
1: NCS=0 B=C3
2: NCS=0 B=C3
3: NCS=2 B=E0
4: NCS=3 B=B8 Expected: E0
5: NCS=2 B=E0
Additional Information: I was looking for a similar function in LazUtf8 that would return the start of the codepoint only if the codepoint was valid.
The sample project has a function Utf8FindCodepointStart(...): Boolean that does just that.

Run the TestUtf8FindCodepointStart procedure to see the difference in behaviour (with a string that also has invalid codepoints):

C:\Users\Bart\LazarusProjecten\ConsoleProjecten\bugs\comparestr>compare
Windows: using LazUtf8
$C3 $A4 $E0 $B8 $9A $81 $F0
 1 C3 TRUE B=C3 CL=2 Cur-S=0 | TRUE B=C3 CL=2 Idx=1 | NCS=0 B=C3
 2 A4 TRUE B=C3 CL=2 Cur-S=0 | TRUE B=C3 CL=2 Idx=1 | NCS=0 B=C3
 3 E0 TRUE B=E0 CL=3 Cur-S=2 | TRUE B=E0 CL=3 Idx=3 | NCS=2 B=E0
 4 B8 TRUE B=E0 CL=3 Cur-S=2 | TRUE B=E0 CL=3 Idx=3 | NCS=3 B=B8
 5 9A TRUE B=E0 CL=3 Cur-S=2 | TRUE B=E0 CL=3 Idx=3 | NCS=2 B=E0
 6 81 FALSE | FALSE | NCS=5 B=81
 7 F0 FALSE | FALSE | NCS=6 B=F0
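
The validating behaviour shown in the first two columns can be sketched in Python as follows. This is only an illustration under assumptions, not the Utf8FindCodepointStart from the attached project: the name `find_codepoint_start` and its return shape (a `(start, length)` tuple, or `None` where the Pascal function would return False) are hypothetical, and validity is delegated to Python's strict UTF-8 decoder.

```python
from typing import Optional, Tuple


def find_codepoint_start(data: bytes, byte_pos: int) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the valid codepoint covering byte_pos, else None.

    Offsets are 0-based. A result is returned only when the whole
    sequence is present and decodes as exactly one codepoint.
    """
    if not 0 <= byte_pos < len(data):
        return None
    # Candidate starts: byte_pos itself and up to three bytes before it.
    for start in range(byte_pos, max(byte_pos - 4, -1), -1):
        lead = data[start]
        if (lead & 0xC0) == 0x80:
            continue  # continuation byte, keep walking back
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            return None  # byte can never start a valid sequence
        if start + length <= byte_pos:
            return None  # sequence ends before byte_pos
        chunk = data[start:start + length]
        if len(chunk) != length:
            return None  # truncated at the end of the data
        try:
            decoded = chunk.decode("utf-8")  # strict decoding
        except UnicodeDecodeError:
            return None
        return (start, length) if len(decoded) == 1 else None
    return None  # only continuation bytes were found


sample = bytes([0xC3, 0xA4, 0xE0, 0xB8, 0x9A, 0x81, 0xF0])
results = [find_codepoint_start(sample, i) for i in range(len(sample))]
```

On the 7-byte test string this yields a start and length for the first five bytes and `None` for the stray $81 and the truncated $F0, in line with the TRUE/FALSE and CL values in the table.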
Tags: No tags attached.
Fixed in Revision: r51973
LazTarget: 1.6.2
Widgetset:
Attached Files

Activities

Bart Broersma

2016-03-17 00:33

developer  

compare.zip (6,362 bytes)

Issue History

Date Modified Username Field Change
2016-03-17 00:33 Bart Broersma New Issue
2016-03-17 00:33 Bart Broersma File Added: compare.zip
2016-03-17 00:44 Bart Broersma Description Updated
2016-03-17 11:44 Bart Broersma Fixed in Revision => r51973
2016-03-17 11:44 Bart Broersma LazTarget - => 1.6.2
2016-03-17 11:44 Bart Broersma Status new => resolved
2016-03-17 11:44 Bart Broersma Fixed in Version => 1.6.2
2016-03-17 11:44 Bart Broersma Resolution open => fixed
2016-03-17 11:44 Bart Broersma Assigned To => Bart Broersma
2016-03-17 11:44 Bart Broersma Target Version => 1.6.2
2016-03-24 00:47 Bart Broersma Status resolved => closed