View Issue Details

IDProjectCategoryView StatusLast Update
0036983FPCCompilerpublic2020-05-01 06:23
Reporternanobit Assigned ToJonas Maebe  
PrioritynormalSeverityminorReproducibilitysometimes
Status resolvedResolutionwon't fix 
Platformwin32OSWindows 
Product Version3.3.1 
Summary0036983: Case else-branch should reliably catch undefined value of fixed-size enum
DescriptionOn undefined enum-input, the case-statement
should reliably run the else-block or leave (if "else" is absent).
Case enm of ... else ... end; // e.g. on (byte(enm)=0) and type Tenum = (e1=1, e2)

Current situation: Undefined input may lead to undefined case-behavior (crash)
before reaching user-code. Reason is a functional FPC Pascal limit.
The limit is not widely known and inconvenient compared with other compilers (Delphi, C#, C++).
To reduce the gap, a test-operator for external use has been proposed
(0033603) to prevent undefined behaviors in subsequent operations.

This is not a duplicate of other bug (effect)-reports, but gives a systematic overview:
The four main arguments for allowing undefined input in case-statement:
1) Enum holders can contain undefined values, thus testing is required.
This reality (see note 0000001/2) has to be accepted as the starting point.
2) Case-statements already are input-test-operations.
3) Input-test prior to case-statement is inconvenient, which hinders its wide adoption.
4) obtain compatibility with Delphi case-statement which allows undefined input
like switch() in C# (enum Tenum: Tint {...};) and C++ (enum class Tenum: Tint {...};)
Programmers are typically unaware of the FPC else-limit and mostly rely on
common interpretation of: ALL unlisted values (cases) go to "else" branch.
In that sense, the limit is a bug, in the better sense it's an unrequired burden
(always own catching (even for ignore) required and earlier than in builtin versions).
Additional InformationThe report objective is a single improvement:
To become compatible with (and reaching convenience of) other compilers:
Case-statement should use else-handling (else-block run, or leave if absent) for ALL unlisted cases.
It should not optimize away else-handling of certain enum-basetype values.

Two important notes (0000001/2, 0000002/2) are the foundation of this report.
They try to answer "all" possible report-questions which stay close
enough to the case-statement input and implementation.
TagsNo tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget-
Attached Files

Activities

nanobit

2020-04-28 15:03

reporter   ~0122495

REPORT NOTE 1/2:
Case-statement:
Possible Tenum input:

What is the origin of fixed-size (basis of undefined values)?
Non-bitpacked TEnum in Delphi, FPC, C#, C++ is stored in underlying integer (Tint);
this combination is always documented, implementation-independent:
Sizeof( Tenum) (also important for streaming) and sign are fixed,
in FPC via Tenum declaration + $packenum.

How many different undefined Tenum values are possible?
UndefinedCount = (2^bitCount) - definedCount

How can undefined values arise in input?
1) Tenum value error (from failed source, corruption, uninitialized, or default Tenum(0) out-of-bounds)
2) Tenum is older (shorter list) in library which is upward compatible:
CaseStmt ignores or reports extended input-value from clients.
3) Tenum defines a case-selector (subview) on stored data (Tenum(data)),
where data has the fixed underlying integer-type (Tint) of Tenum.
(2) and (3) are legal Pascal use-cases of the documented relation to Tint.
All legal typecasting to Tenum can reinterpret stored values as undefined values.

Reversedly, can enm (Tenum) be forced to hold only defined (caseStmt crash-free) values?
1) Normal storage can be supervised, e.g. via graceful caseStmt which we want.
2) Sort of unusual: Bitpacking could be used for certain defined counts (2^bitCount).
(The concept of undefinedness (more-values-than-needed) enables bitpacking)
3) Conflictual workaround: reserve all: eMin = low( Tint), eMax = high( Tint)
Only (1) is of general use, but still the need is diverse like this:
Apps use case-statements each sensitive to a specific situation/task.

What does "undefined value" mean for function calls?
No specs (can) claim that Tenum-defined is the only possible value-state in programs.
Functions support at least Tenum-defined input, which should also be used if full support is unknown.
Functions (which use or are test-functions) can support Tenum-undefined input:
To sort-out undefined values and allow error-handling.

Does case-statement Always require defined enum-input?
-No, by docs and other compilers: All unlisted values (cases) go to "else" branch.
-No, by asset-theory: If stmts/functions can test/sort-out undefined values, then this is a useful asset.
-Yes with current FPC (introduced with optimize): case-statement needs defined enum-input.

nanobit

2020-04-28 15:04

reporter   ~0122496

REPORT NOTE 2/2:
Case-statement:
Focus on value-testing:

Use bounds-checking to preclude FPC caseStmt crash on enm:
(isNamedEnum() unneeded, because hole-values reach "else")
function isDefinedEnum( enm: Tenum): boolean;
var int: integer; // underlying type (basetype) or wider
begin
  int := integer( enm); // get ordinal value regardless of definedness
  result := (ord(Low(Tenum)) <= int) and (int <= ord(high(Tenum)));
end;

The case-statement is by its nature an input test-function too.
Thus there is no good reason, why the case-statement should
be dumber than isDefinedEnum( enm) for the same input:
Both test-functions declare Tenum and receive undefined values.
Both test-functions need to detect the presence (not meaning) of undefined values.
"if isDefinedEnum( enm) then case enm of ..." can be replaced with graceful case-else.
Fault-tolerant systems are safer if regular case-statements can gracefully check all values:
App-developers can easier focus on checking coherency and increasing redundancy to find solutions.

What should the case-else-statement do on undefined input?
The incoming value is stored in fixed underlying integer (Tint) of Tenum.
ALL stored bits build (represent) one Tenum-value (defined or not).
Ordinally compare input with listed cases; all unlisted cases belong to else-branch.
There is an important difference between undefined enum-value and undefined behavior.
An analogy: (List.indexOf( invalidPtr) = -1), but the valid (defined) range is not constant.
Analogously, we have listed (valid) and unlisted (valid + invalid) pointers. (-1) is graceful-else.

The case-statement is an abstract formulation hiding individual comparisons:
if (Tint(input) = ord( Lbl1)) then ... else ...
Ordinal comparison allows all possible Tenum input (defined and not).
After internal comparison, the user-code follows.

How do users handle undefined values after detection?
Often ignore (if value (command) is just unsupported), but generally start error-handling.

Jonas Maebe

2020-04-28 16:43

manager   ~0122500

Storing values that are invalid for a variable's type in that variable by definition leads to undefined behaviour, and hence cannot be handled by language constructs (otherwise the behaviour would not be undefined). It doesn't matter whether this is an enum, an ansistring, an integer, a pointer, or any other type.

Unnamed enum values in an enum with holes are still valid values for that enum, so those are handled in a defined way. It's also not just about "case", it's about how the entire compiler reasons about enums and subrange types at every step of the compilation process.

And yes, it _is_ reopening the same discussion again that has been had several times before.

nanobit

2020-05-01 05:38

reporter   ~0122571

Last edited: 2020-05-01 06:23

View 2 revisions

The specification (standard, convention) defines the behavior.
The defined caseStmt-behavior (for any input) is possible
because the caseStmt only needs defined mapping-behavior: input-value found (listed) or not.
CaseStmt means mapping from an input value to a piece of user-code.

isUndefined is an extra value-state, which is not always preventable in programs.
And for caseStmt, the useful convention (Delphi, C#, C++) is the support of the dataflow:
CaseStmt and assignStmt have the same input (Tenum) and the caseStmt can match the assignStmt:

enm2 := enm;

case enm of
e1: enm2 := enm;
else enm2 := enm;
end;

It is also convention, that compilers do not automatically remove else-blocks,
if the Tenum range (defined) does not completely cover the underlying basetype.
If the non-else branches are exhausting the defined range, then programmers
may omit the else-block or use something like the [[unreachable]] attribute in other languages.
But in both cases, the programmer decides (opt-in), not the compiler.

Again, the starting point is that undefined Tenum can occur in programs,
consequently some testing and some unprevented dataflow (before user-handling) must be supported.
How much, is a matter of convention and convenience.
If you are against the convention, then you must document this case for FPC.
And a range-error under {$R+} would be appropriate then, unlike a random crash.
But this would be still inferior to the report objective.

My notes in this report are sufficiently complete to fulfill the report objective,
thus I can pass it on.

Issue History

Date Modified Username Field Change
2020-04-28 15:01 nanobit New Issue
2020-04-28 15:03 nanobit Note Added: 0122495
2020-04-28 15:04 nanobit Note Added: 0122496
2020-04-28 16:43 Jonas Maebe Assigned To => Jonas Maebe
2020-04-28 16:43 Jonas Maebe Status new => resolved
2020-04-28 16:43 Jonas Maebe Resolution open => won't fix
2020-04-28 16:43 Jonas Maebe FPCTarget => -
2020-04-28 16:43 Jonas Maebe Note Added: 0122500
2020-05-01 05:38 nanobit Note Added: 0122571
2020-05-01 06:23 nanobit Note Edited: 0122571 View Revisions