View Issue Details

ID: 0037315
Project: FPC
Category: RTL
View Status: public
Last Update: 2020-11-13 22:28
Reporter: CudaText man
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed
Fixed in Version: 3.3.1
Summary: 0037315: WideStrUtils - more implemented
Description: https://github.com/graemeg/freepascal/blob/master/packages/rtl-objpas/src/inc/widestrutils.pp

Original docs:
http://docwiki.embarcadero.com/Libraries/Rio/en/System.WideStrUtils

A user asked for some functions to be added there, so I added N functions.
Attached.
Tags: No tags attached.
Fixed in Revision: 47393
FPCOldBugId:
FPCTarget: 3.2.2
Attached Files

Activities

CudaText man

2020-07-08 10:53

reporter  

tst-widestrutils.zip (3,629 bytes)

Tomas Hajny

2020-07-08 11:29

manager   ~0123815

Oh well, yet another half-baked invention created in Delphi. :-( I feel somehow that TEncodeType should contain at least something like etUTF16 (and preferably also etUTF32 for completeness) in addition to the other values defined for that set... Obviously, this comment is not meant as criticism of the contribution provided above; the contribution is appreciated as a compatibility improvement. I realize that the problem is in the original Delphi definition of the mentioned type.

CudaText man

2020-07-09 23:07

reporter   ~0123857

Updated unit fpdetectutf8 (optimized).
fpdetectutf8.zip (929 bytes)

CudaText man

2020-07-09 23:09

reporter   ~0123858

> that TEncodeType should contain at least something like etUTF16 (and preferably also etUTF32

No, the Delphi type is fully OK; there is no need for UTF-16 (it is very hard to detect even for Slavic text, and very hard for CJK).

Tomas Hajny

2020-07-09 23:19

manager   ~0123859

Last edited: 2020-07-09 23:23

What's so difficult with reading the different BOMs? You cannot differentiate UTF-8 without a BOM from a complex 8-bit codepage text file using the full range of characters completely reliably either...

And if you say that there's no need - well, there's obviously no such need for people interested in Delphi compatibility, which is probably the whole point of this unit. Apart from that, one could hardly say that there are no UTF-16 encoded text files which might need to be handled in Pascal code, right?

CudaText man

2020-08-13 20:07

reporter   ~0124851

Last edited: 2020-08-13 20:08

Tomas,
I don't get what you want from me. My code adds Delphi-compatible functions. It is my own code.

CudaText man

2020-08-13 20:11

reporter   ~0124852

"No need for UTF-16 here": I mean that it is missing in Delphi, and it is missing because it is hard to build heuristics for it.

Tomas Hajny

2020-08-13 21:08

manager   ~0124855

I do not "want" anything. I would just prefer to have the etUTF16 and etUTF32 values included in the set definition and the code to support recognition of the respective BOM marks. Not heuristics - I believe (I may be wrong, of course) that most UTF-x plain text includes BOM nowadays. But again, my comment doesn't imply that your contribution cannot be accepted as it is (again, thanks for providing it!). I didn't assign the report to myself, any other member of the FPC team may include it if he believes that my reasoning is wrong.
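
The BOM-based recognition described in this comment (as opposed to heuristics) can be sketched roughly as follows. This is only an illustration in Python, not code from the patch or from WideStrUtils: the `BomEncoding` enum and the `detect_bom` function are hypothetical names chosen for the example. The byte sequences themselves are the standard Unicode BOM encodings.

```python
from enum import Enum


class BomEncoding(Enum):
    """Hypothetical result type; not the Delphi/FPC TEncodeType."""
    NONE = "none"
    UTF8 = "utf-8"
    UTF16_LE = "utf-16-le"
    UTF16_BE = "utf-16-be"
    UTF32_LE = "utf-32-le"
    UTF32_BE = "utf-32-be"


# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the longer signatures are checked first.
_BOMS = (
    (b"\xef\xbb\xbf", BomEncoding.UTF8),
    (b"\xff\xfe\x00\x00", BomEncoding.UTF32_LE),
    (b"\x00\x00\xfe\xff", BomEncoding.UTF32_BE),
    (b"\xff\xfe", BomEncoding.UTF16_LE),
    (b"\xfe\xff", BomEncoding.UTF16_BE),
)


def detect_bom(data: bytes) -> BomEncoding:
    """Return the encoding indicated by a leading BOM, or NONE."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return BomEncoding.NONE
```

One caveat of checking UTF-32 LE first: UTF-16 LE text whose first character after the BOM is U+0000 has the same leading four bytes and would be reported as UTF-32 LE. A buffer without any BOM yields `NONE`, which is where heuristics would have to take over.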

CudaText man

2020-08-14 07:42

reporter   ~0124865

I may add UTF-16/32 LE/BE detection after the patch is applied.

Michael Van Canneyt

2020-11-12 11:13

administrator   ~0126853

Applied. Made the fpDetectUTF8 code part of the widestrutils unit (left code origin mention intact).
Thanks for the patch.

CudaText man

2020-11-13 20:48

reporter   ~0126891

I am reopening this to say that I updated the support function in https://github.com/Alexey-T/ATSynEdit/blob/master/atsynedit/atstringproc_utf8detect.pas
The function now returns an enum with 3 values, which is useful for my app CudaText: CudaText needed to know an additional, third state of UTF-8 buffer detection. Could you please update the function in FPC?
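
To illustrate why a third state can be useful, here is a minimal Python sketch of a three-valued UTF-8 check. The thread does not say what the three enum values are, so the split used here (no evidence / valid UTF-8 / not UTF-8) is an assumption, and `Utf8State` and `detect_utf8` are hypothetical names. It is not the code from the linked unit.

```python
from enum import Enum


class Utf8State(Enum):
    """Assumed three-way result; the real enum's values may differ."""
    UNKNOWN = 0  # only 7-bit ASCII seen: no evidence for or against UTF-8
    YES = 1      # at least one multi-byte sequence, and all are well formed
    NO = 2       # some byte sequence is not valid UTF-8


def detect_utf8(buf: bytes) -> Utf8State:
    """Classify a byte buffer using strict UTF-8 validation."""
    if buf.isascii():
        # Pure ASCII is valid in UTF-8 and in most 8-bit codepages alike,
        # so the buffer says nothing about which one was intended.
        return Utf8State.UNKNOWN
    try:
        buf.decode("utf-8")
    except UnicodeDecodeError:
        return Utf8State.NO
    return Utf8State.YES
```

With a plain boolean, a pure-ASCII buffer has to be reported as either "UTF-8" or "not UTF-8", although neither answer is supported by the data. Note that this sketch also reports `NO` for a buffer that is cut off in the middle of a multi-byte sequence, which matters when only the first part of a file is examined.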

Michael Van Canneyt

2020-11-13 22:28

administrator   ~0126897

Please provide a patch in a separate bugreport, thanks !

Issue History

Date Modified Username Field Change
2020-07-08 10:53 CudaText man New Issue
2020-07-08 10:53 CudaText man File Added: tst-widestrutils.zip
2020-07-08 11:00 Michael Van Canneyt Assigned To => Michael Van Canneyt
2020-07-08 11:00 Michael Van Canneyt Status new => assigned
2020-07-08 11:29 Tomas Hajny Note Added: 0123815
2020-07-09 23:07 CudaText man Note Added: 0123857
2020-07-09 23:07 CudaText man File Added: fpdetectutf8.zip
2020-07-09 23:09 CudaText man Note Added: 0123858
2020-07-09 23:19 Tomas Hajny Note Added: 0123859
2020-07-09 23:23 Tomas Hajny Note Edited: 0123859
2020-08-13 20:07 CudaText man Note Added: 0124851
2020-08-13 20:08 CudaText man Note Edited: 0124851
2020-08-13 20:11 CudaText man Note Added: 0124852
2020-08-13 21:08 Tomas Hajny Note Added: 0124855
2020-08-14 07:42 CudaText man Note Added: 0124865
2020-11-12 11:13 Michael Van Canneyt Status assigned => resolved
2020-11-12 11:13 Michael Van Canneyt Resolution open => fixed
2020-11-12 11:13 Michael Van Canneyt Fixed in Version => 3.3.1
2020-11-12 11:13 Michael Van Canneyt Fixed in Revision => 47393
2020-11-12 11:13 Michael Van Canneyt FPCTarget => 3.2.2
2020-11-12 11:13 Michael Van Canneyt Note Added: 0126853
2020-11-13 20:48 CudaText man Note Added: 0126891
2020-11-13 22:28 Michael Van Canneyt Note Added: 0126897