View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0037315||FPC||RTL||public||2020-07-08 10:53||2020-08-14 07:42|
|Reporter||CudaText man||Assigned To||Michael Van Canneyt|
|Summary||0037315: WideStrUtils - more implemented|
A user asked for some functions to be added there, so I added N functions.
|Tags||No tags attached.|
|Fixed in Revision|
tst-widestrutils.zip (3,629 bytes)
||Oh well, yet another half-baked invention created in Delphi. :-( I feel that TEncodeType should contain at least something like etUTF16 (and preferably also etUTF32 for completeness) in addition to the other values defined for that type... Obviously, this comment is not meant as criticism of the contribution provided above; the contribution is appreciated as a compatibility improvement. I realize that the problem lies in the original Delphi definition of the mentioned type.|
Updated unit fpdetectutf8 (optimized).
fpdetectutf8.zip (929 bytes)
> that TEncodeType should contain at least something like etUTF16 (and preferably also etUTF32
No, the Delphi type is fully OK; there is no need for UTF-16 (it is very hard to detect even for Slavic text, and very hard for CJK).
What's so difficult with reading the different BOMs? You cannot differentiate UTF-8 without BOM from complex 8-bit codepage text file using full range of characters completely reliably either...
And if you say that there's no need - well, there's obviously no such need for people interested in Delphi compatibility, which is probably the whole point of this unit. Apart from that, one could hardly say that there are no UTF-16 encoded text files which might need to be handled in Pascal code, right?
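The BOM check proposed here can be sketched in a few lines. This is a language-neutral illustration written in Python, not code from the attached unit; the function name `detect_bom` and the returned encoding labels are assumptions made for the example. The one subtlety is ordering: the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE), so the longer marks must be tested first.

```python
import codecs

# Longest BOMs first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caveat of this ordering: UTF-16 LE text whose first character after the BOM is U+0000 would be reported as UTF-32 LE. That is rare in plain text, but it shows that even BOM-based detection is not entirely free of ambiguity.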
I don't get what you want from me: my code adds Delphi-compatible functions. It is my own code.
||By "no need for UTF-16 here" I mean that it is missing in Delphi, and it is missing because it is hard to write heuristics for it.|
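For BOM-less 8-bit input, the kind of heuristic under discussion can be sketched as below. This is an illustration in Python, not the code of fpdetectutf8. The function name `classify_8bit` is hypothetical, and its three results are meant to correspond roughly to Delphi's etUSASCII, etUTF8 and etANSI, which is an assumption about the intended mapping.

```python
def classify_8bit(data: bytes) -> str:
    """Classify BOM-less bytes as 'ascii', 'utf8' or 'ansi' (heuristic).

    Hypothetical helper for illustration only.
    """
    # Pure 7-bit data is valid in ASCII, UTF-8 and most 8-bit codepages.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "ansi"
    return "utf8"
```

The result is only a guess. The two bytes C3 A9 are "é" in UTF-8 but also a valid two-character string in Windows-1252, and the sketch reports them as UTF-8 either way. This is the unreliability for 8-bit codepage text mentioned earlier in the thread.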
||I do not "want" anything. I would just prefer to have the etUTF16 and etUTF32 values included in the set definition and the code to support recognition of the respective BOM marks. Not heuristics - I believe (I may be wrong, of course) that most UTF-x plain text includes BOM nowadays. But again, my comment doesn't imply that your contribution cannot be accepted as it is (again, thanks for providing it!). I didn't assign the report to myself, any other member of the FPC team may include it if he believes that my reasoning is wrong.|
||I may add UTF-16/32 LE/BE detection after this is applied.|
|2020-07-08 10:53||CudaText man||New Issue|
|2020-07-08 10:53||CudaText man||File Added: tst-widestrutils.zip|
|2020-07-08 11:00||Michael Van Canneyt||Assigned To||=> Michael Van Canneyt|
|2020-07-08 11:00||Michael Van Canneyt||Status||new => assigned|
|2020-07-08 11:29||Tomas Hajny||Note Added: 0123815|
|2020-07-09 23:07||CudaText man||Note Added: 0123857|
|2020-07-09 23:07||CudaText man||File Added: fpdetectutf8.zip|
|2020-07-09 23:09||CudaText man||Note Added: 0123858|
|2020-07-09 23:19||Tomas Hajny||Note Added: 0123859|
|2020-07-09 23:23||Tomas Hajny||Note Edited: 0123859||View Revisions|
|2020-08-13 20:07||CudaText man||Note Added: 0124851|
|2020-08-13 20:08||CudaText man||Note Edited: 0124851||View Revisions|
|2020-08-13 20:11||CudaText man||Note Added: 0124852|
|2020-08-13 21:08||Tomas Hajny||Note Added: 0124855|
|2020-08-14 07:42||CudaText man||Note Added: 0124865|