View Issue Details

ID: 0034754
Project: FPC
Category: RTL
View Status: public
Last Update: 2019-08-02 17:03
Reporter: rd0x
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: N/A
Status: closed
Resolution: fixed
Product Version: 3.2.0
Product Build:
Target Version: 3.2.0
Fixed in Version: 3.3.1
Summary: 0034754: Add IsLeadChar for Delphi compatibility
Description: Function IsLeadChar from Delphi is missing in FPC.

function IsLeadChar(C: AnsiChar): Boolean;
function IsLeadChar(C: Byte): Boolean;
function IsLeadChar(C: WideChar): Boolean;

"Checks whether a character is a valid lead character (first in a multi-byte character sequence).

Call IsLeadChar to check whether a character represents a valid lead character in the current system locale. A lead character is the first in a multi-byte character sequence. For ANSI characters, IsLeadChar uses the LeadBytes variable. Wide characters have a fixed locale-independent number of allowed lead characters."

http://docwiki.embarcadero.com/Libraries/Tokyo/en/System.SysUtils.IsLeadChar
Additional Information: I think one implementation can be seen here:
https://github.com/pasdoc/pasdoc/blob/master/source/component/PasDoc_Utils.pas#L668
Tags: No tags attached.
Fixed in Revision: 41337

Activities

Thaddy de Koning

2018-12-23 13:43

reporter   ~0112827

Isn't the link an exact copy of the Delphi sources? (but it seems trivial.)
I will check that.

rd0x

2019-01-27 14:46

reporter   ~0113668

Last edited: 2019-01-27 15:44


In the meantime I found these implementations:
Delphi:
(deleted)
Indy: (Ansi only)
https://github.com/fabioxgn/IRC/blob/master/Indy10/Protocols/IdGlobalProtocols.pas#L4486

Marco van de Voort

2019-01-27 15:53

manager   ~0113670

Last edited: 2019-01-27 15:58


Please don't post links to clearly copyrighted sources. I removed them from your comments.

Meanwhile I was also looking into this. The WideChar version is very predictable, so I committed it, but I also found out that the AnsiChar one uses the LeadBytes variable, which is defined but not filled yet:

http://docwiki.embarcadero.com/RADStudio/Rio/en/Unicode_in_RAD_Studio

I assume Delphi gets them from the Windows locale info somehow, but our 1-byte encoding (e.g. UTF-8 in Lazarus) can be different from the system encoding (e.g. in the Western world typically some Windows-125x equivalent of an ISO 8859-* encoding), so that needs different handling and thinking.

Worse, the code pages for file and general APIs (DefaultFileSystemCodePage, DefaultSystemCodePage, DefaultRTLFileSystemCodePage) can be different again, so it takes careful thought to decide how to apply such a function.

J. Gareth Moreton

2019-01-27 20:53

developer   ~0113678

I presume the whole "lead char" thing is not something that's exclusive to UTF8 - if it was, then the result is True if "(C and $C0) <> $80" - that is, the first two bits are either 00, 01 or 11, but not 10.
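
The bit test above can be sketched as a small Free Pascal program. This is only an illustration of the expression in this note, not the RTL's or Delphi's IsLeadChar; the helper name IsNotContinuationByte is made up for the example.

```pascal
program LeadByteSketch;
{$mode objfpc}

// Hypothetical helper (not an RTL function): True unless the top two bits
// of B are 10, i.e. unless B has the form of a UTF-8 continuation byte.
function IsNotContinuationByte(B: Byte): Boolean;
begin
  Result := (B and $C0) <> $80;
end;

begin
  WriteLn(IsNotContinuationByte($41)); // 01000001: top bits 01, so True
  WriteLn(IsNotContinuationByte($C3)); // 11000011: top bits 11, so True
  WriteLn(IsNotContinuationByte($A9)); // 10101001: top bits 10, so False
end.
```

Note that this test also accepts plain ASCII bytes, which is a separate question from whether a byte starts a multi-byte sequence.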

Bart Broersma

2019-01-27 21:24

reporter   ~0113680

http://docwiki.embarcadero.com/Libraries/Tokyo/en/System.WideStrUtils.IsUTF8LeadByte says different?

J. Gareth Moreton

2019-01-28 07:24

developer   ~0113686

Last edited: 2019-01-28 07:26


Ah yes. It's almost identical, with the exception of values 192, 193, 254 and 255.

One thing to be careful of... some dialects of UTF-8 allow the sequence $C0 $80 for encoding a null character, since $00 is frequently used as a terminator.

Marco van de Voort

2019-01-28 11:05

manager   ~0113693

Well, for older East Asian encodings it matters, but FPC doesn't really support them.

So that leaves a few problems:

- Startup state is plain ASCII, without lead chars.
- Lazarus sets parts to UTF-8.
- It is possible to set one of the named codepage variables to UTF-8, and another not. I assume it is logical to follow DefaultSystemCodePage.
- Bart's Delphi link seems to indicate that IsUTF8LeadByte is also true for 1-byte sequences (ASCII)?

Marco van de Voort

2019-01-29 11:23

manager   ~0113706

Last edited: 2019-01-29 11:26


{$apptype console}

uses widestrutils,sysutils;

var c : ansichar;
begin
 c:=#65; // 'A'
 writeln( isutf8leadbyte(chr(65)),' ',isleadchar(c));
end.

returns "true false" in Delphi XE10, but the default encoding is not utf8, so the isleadchar result is doubtful.

Michael Van Canneyt

2019-02-16 11:57

administrator   ~0114176

I added an initialization for LeadBytes on Windows.

On Linux, it does not make sense to have this. MBCS is a Windows thing; I do not believe we should consider UTF8 an MBCS in the sense that Windows had it.
As Marco said, it was something for eastern locales.

Despite the name, IsUTF8LeadByte is not the "UTF8 equivalent" of IsLeadChar: it just says whether a character is valid in UTF8, not whether it is a valid start of a multi-byte character (which is what IsLeadChar does).

Please test and close if OK.

Issue History

Date Modified Username Field Change
2018-12-23 11:36 rd0x New Issue
2018-12-23 11:54 Michael Van Canneyt Assigned To => Michael Van Canneyt
2018-12-23 11:54 Michael Van Canneyt Status new => assigned
2018-12-23 13:43 Thaddy de Koning Note Added: 0112827
2019-01-27 14:46 rd0x Note Added: 0113668
2019-01-27 15:44 Marco van de Voort Note Edited: 0113668
2019-01-27 15:44 Marco van de Voort Note Edited: 0113668
2019-01-27 15:53 Marco van de Voort Note Added: 0113670
2019-01-27 15:53 Marco van de Voort Fixed in Revision => 41085
2019-01-27 15:58 Marco van de Voort Note Edited: 0113670
2019-01-27 20:53 J. Gareth Moreton Note Added: 0113678
2019-01-27 21:24 Bart Broersma Note Added: 0113680
2019-01-28 07:24 J. Gareth Moreton Note Added: 0113686
2019-01-28 07:26 J. Gareth Moreton Note Edited: 0113686
2019-01-28 11:05 Marco van de Voort Note Added: 0113693
2019-01-29 11:23 Marco van de Voort Note Added: 0113706
2019-01-29 11:26 Marco van de Voort Note Edited: 0113706
2019-02-16 11:57 Michael Van Canneyt Fixed in Revision 41085 => 41337
2019-02-16 11:57 Michael Van Canneyt Note Added: 0114176
2019-02-16 11:57 Michael Van Canneyt Status assigned => resolved
2019-02-16 11:57 Michael Van Canneyt Fixed in Version => 3.3.1
2019-02-16 11:57 Michael Van Canneyt Resolution open => fixed
2019-02-16 11:57 Michael Van Canneyt Target Version => 3.2.0
2019-08-02 17:03 rd0x Status resolved => closed