View Issue Details

ID: 0038008
Project: FPC
Category: RTL
View Status: public
Last Update: 2020-11-13 21:36
Reporter: CudaText man_
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: N/A
Status: closed
Resolution: fixed
Product Version: 3.3.1
Fixed in Version: 3.3.1
Summary: 0038008: feature req: Utf8ToUnicodeEx with ErrorMode
Description:
https://github.com/graemeg/freepascal/blob/master/rtl/inc/ustrings.inc
Utf8ToUnicode exists. It always handles 'bad chars' like this:

                        //Not valid UTF-8 sequence
                        UC:=UNICODE_INVALID; //Alexey-- ord('?')

So my app cannot detect DATA LOSS. Replacing 'bad chars' with '?' is DATA LOSS if I pass some random German text file to Utf8ToUnicode.
Feature request:
Add Utf8ToUnicodeEx with the same params + new param ErrorMode. ErrorMode is enum (eemReplace, eemException).
Utf8ToUnicode will call Utf8ToUnicodeEx with ErrorMode=eemReplace.
In my app I will call Utf8ToUnicodeEx with ErrorMode=eemException.

Change is simple: replace

                        //Not valid UTF-8 sequence
                        UC:=UNICODE_INVALID; //Alexey-- ord('?')

with a call to DoError(UC). DoError will be an internal procedure which raises an exception.
Tags: No tags attached.
Fixed in Revision: 47391
FPCOldBugId:
FPCTarget: 3.2.2
Attached Files

Activities

Bart Broersma

2020-10-29 17:25

reporter   ~0126639

AFAICS this will break existing programs. Without a proper patch, though, it's hard to tell what exactly your proposal is.

Sven Barth

2020-10-30 09:56

manager   ~0126650

There is no breaking of existing programs, because the reporter requested a new overload where the existing Utf8ToUnicode calls the new one with the backwards compatible parameter.

Bart Broersma

2020-10-30 10:26

reporter   ~0126654

Without a patch it was a bit hard to deduce that from the description.

Sven Barth

2020-10-30 14:47

manager   ~0126657

Not really. The following part from the summary contained the information that I told you:

> Add Utf8ToUnicodeEx with the same params + new param ErrorMode. ErrorMode is enum (eemReplace, eemException).
> Utf8ToUnicode will call Utf8ToUnicodeEx with ErrorMode=eemReplace.
> In my app I will call Utf8ToUnicodeEx with ErrorMode=eemException.

Bart Broersma

2020-10-30 14:59

reporter   ~0126658

I stand corrected then.
Sorry for the noise.

Bart Broersma

2020-11-08 10:50

reporter   ~0126788

Is it feasible to change the Utf8ToUnicode signature to something like this?
function Utf8ToUnicode(Dest: PUnicodeChar; Source: PChar; MaxChars: SizeInt; RaiseExceptionOnError: Boolean=False): SizeInt;{$ifdef SYSTEMINLINE}inline;{$endif}
function Utf8ToUnicode(Dest: PUnicodeChar; MaxDestChars: SizeUInt; Source: PChar; SourceBytes: SizeUInt; RaiseExceptionOnError: Boolean=False): SizeUInt;

CudaText man

2020-11-08 10:56

reporter   ~0126789

Such a RaiseExceptionOnError: Boolean parameter is also OK for me.

Michael Van Canneyt

2020-11-08 17:19

administrator   ~0126794

No, we cannot change the signature for backwards compatibility reasons:
You need an overload in case someone used the current version in a callback.
It has happened in the past and we got complaints.
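
The callback concern can be illustrated with a small sketch. The names TConvertFunc, OldStyle and NewStyle are hypothetical stand-ins, not RTL declarations; the point is only that a procedural type written against the three-parameter signature is not expected to accept a routine that gained an extra parameter, even a defaulted one.

```pascal
program CallbackCompat;
{$mode objfpc}{$H+}

type
  // Hypothetical callback type that user code might have declared
  // against the existing three-parameter signature.
  TConvertFunc = function(Dest: PUnicodeChar; Source: PChar;
    MaxChars: SizeInt): SizeInt;

// Hypothetical stand-in for the existing routine.
function OldStyle(Dest: PUnicodeChar; Source: PChar;
  MaxChars: SizeInt): SizeInt;
begin
  Result := 0;
end;

// Hypothetical stand-in for a changed signature with a defaulted parameter.
function NewStyle(Dest: PUnicodeChar; Source: PChar; MaxChars: SizeInt;
  RaiseExceptionOnError: Boolean = False): SizeInt;
begin
  Result := 0;
end;

var
  Conv: TConvertFunc;
begin
  // Accepted: the parameter list matches the procedural type exactly.
  Conv := @OldStyle;
  // Expected to be rejected by the compiler, because a default value
  // does not remove the parameter from the signature:
  // Conv := @NewStyle;
  WriteLn(Conv(nil, nil, 0));
  // Direct calls are unaffected by the extra defaulted parameter.
  WriteLn(NewStyle(nil, nil, 0));
end.
```

A direct call such as NewStyle(nil, nil, 0) still compiles, which is why a signature change looks harmless until the routine is used as a callback. Adding an overload keeps the old signature available for such assignments.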

Michael Van Canneyt

2020-11-12 10:18

administrator   ~0126850

Added an overload:

function Utf8ToUnicode(Dest: PUnicodeChar; MaxDestChars: SizeUInt; Source: PChar; SourceBytes: SizeUInt; IgnoreInvalid : Boolean): SizeUInt;

Existing functions call this with IgnoreInvalid = True.

On error, run-time error 231 is triggered. This is converted by SysUtils to an EConversionError.
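
A minimal usage sketch of the new overload, assuming a compiler built from trunk at or after revision 47391. DecodeUtf8Strict is a hypothetical helper, not part of the RTL. It also assumes the returned count includes the terminating null, as with the older overloads. The handler catches the base Exception class rather than a specific class name, since the exact exception class was not verified here.

```pascal
program StrictDecode;
{$mode objfpc}{$H+}

uses
  SysUtils;  // needed so run-time errors are converted to exceptions

// Hypothetical helper: decode UTF-8 and fail on invalid input
// instead of silently substituting a replacement character.
function DecodeUtf8Strict(const S: RawByteString): UnicodeString;
var
  Written: SizeUInt;
begin
  Result := '';
  if S = '' then
    Exit;
  // UTF-8 never needs more UTF-16 code units than it has bytes.
  SetLength(Result, Length(S));
  // IgnoreInvalid = False: invalid input should trigger run-time error 231.
  Written := Utf8ToUnicode(PUnicodeChar(Result), Length(Result) + 1,
    PChar(S), Length(S), False);
  // Assumption: the count includes the terminating null character.
  if Written > 0 then
    SetLength(Result, Written - 1)
  else
    Result := '';
end;

var
  Bad: RawByteString;
begin
  // Build Latin-1 bytes for a German word; $FC and $DF are not valid UTF-8 here.
  Bad := 'Gr__e';
  Bad[3] := AnsiChar($FC);
  Bad[4] := AnsiChar($DF);
  try
    WriteLn(Length(DecodeUtf8Strict(Bad)));
  except
    on E: Exception do
      WriteLn('Invalid UTF-8 (', E.ClassName, '): ', E.Message);
  end;
end.
```

Passing True as the last argument would keep the existing behaviour, where invalid sequences are replaced rather than reported.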

CudaText man_

2020-11-13 20:41

reporter   ~0126890

Posted by mistake. Please close again.

Issue History

Date Modified Username Field Change
2020-10-29 15:41 CudaText man_ New Issue
2020-10-29 17:25 Bart Broersma Note Added: 0126639
2020-10-30 09:56 Sven Barth Note Added: 0126650
2020-10-30 10:26 Bart Broersma Note Added: 0126654
2020-10-30 14:47 Sven Barth Note Added: 0126657
2020-10-30 14:59 Bart Broersma Note Added: 0126658
2020-11-08 10:50 Bart Broersma Note Added: 0126788
2020-11-08 10:56 CudaText man Note Added: 0126789
2020-11-08 17:19 Michael Van Canneyt Note Added: 0126794
2020-11-12 10:18 Michael Van Canneyt Assigned To => Michael Van Canneyt
2020-11-12 10:18 Michael Van Canneyt Status new => resolved
2020-11-12 10:18 Michael Van Canneyt Resolution open => fixed
2020-11-12 10:18 Michael Van Canneyt Fixed in Version => 3.3.1
2020-11-12 10:18 Michael Van Canneyt Fixed in Revision => 47391
2020-11-12 10:18 Michael Van Canneyt FPCTarget => 3.2.2
2020-11-12 10:18 Michael Van Canneyt Note Added: 0126850
2020-11-13 20:40 CudaText man_ Status resolved => feedback
2020-11-13 20:40 CudaText man_ Resolution fixed => open
2020-11-13 20:41 CudaText man_ Note Added: 0126890
2020-11-13 20:41 CudaText man_ Status feedback => assigned
2020-11-13 21:36 Michael Van Canneyt Status assigned => closed
2020-11-13 21:36 Michael Van Canneyt Resolution open => fixed