View Issue Details

ID: 0018497
Project: Lazarus CCR
Category: -
View Status: public
Last Update: 2011-01-14 21:58
Reporter: Dr. Udo Rempe
Assigned To: Vincent Snijders
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: won't fix
Summary: 0018497: assigning chars as one-byte is not possible in the case of German Umlaute
Description: If a variable sym is declared as char and the assignment "sym := 'Ä' ;" follows, the character 'Ä' is stored as a two-byte string with the values #2, #195, #132 in positions 0, 1, and 2 (a length byte followed by two data bytes). It is not stored as an ASCII character with the ordinal value 142 or as an ANSI character with the ordinal value 196, although "sym := #196 ;" and "sym := #142 ;" remain possible. The same holds for the other German umlaut letters, and in string constants the umlaut characters are likewise stored as two bytes. Characters with ordinal values below 128 (English letters) are stored as one-byte chars and show no such difficulties. The difficulties do not occur in Turbo Delphi 6.0, but in Delphi 2010 they are even larger than in Lazarus.
Additional Information: The difficulties did not occur when using Lazarus with Windows 7, only with Windows Vista. Since the char constant 'Ä' was treated as a two-byte string, the assignment "sym := 'Ä' ;" was consequently rejected as illegal. A compiler switch that avoids such incompatibilities might be feasible.
Tags: No tags attached.

Activities

Vincent Snijders

2011-01-13 23:10

administrator   ~0045158

The source file uses UTF-8 encoding by default, so 'Ä' is not a one-byte char but a two-byte string.
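The byte values in the report (195 and 132) are the UTF-8 encoding of 'Ä'. The following is a minimal sketch, not taken from the issue, that makes this visible. It writes the two bytes as numeric char constants so that the result does not depend on how the example file itself is saved. The program name is an arbitrary choice.

```pascal
program Utf8UmlautBytes;
{$mode objfpc}{$H+}

var
  s: AnsiString;
  i: Integer;
begin
  // The UTF-8 encoding of 'Ä' (U+00C4) is the byte pair 195, 132.
  // Numeric char constants keep this independent of the source file encoding.
  s := #195#132;

  WriteLn('Length in bytes: ', Length(s));
  for i := 1 to Length(s) do
    WriteLn('byte ', i, ' = ', Ord(s[i]));
end.
```

Since the literal occupies two bytes, it cannot be assigned to a one-byte char, which is why the compiler rejects "sym := 'Ä'" in a UTF-8 source file.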

Dr. Udo Rempe

2011-01-14 18:40

reporter   ~0045182

In Delphi 2010 the following changes eliminate the difficulties:
(1) Use of the compiler switch "{$H+}".
(2) Redefining the type char by "type char = ANSIchar ;".
(3) Redefining the function chr by "
 function chr ( i : integer ) : ANSIchar ;
  begin
   chr := ANSIchar ( i )
  end {function chr} ;
".
In Lazarus these changes are accepted too, but they are not sufficient. As colleague Snijders pointed out, the default UTF-8 encoding of the source file is the cause. The encoding can be changed by right-clicking the name of the source file while it is open in the text editor. A popup menu then appears, in which
(1) File Settings,
(2) Encoding, and
(3) ANSI
can be clicked. (In German versions "Dateieinstellungen/Kodierung/ANSI").
Thereafter Lazarus no longer reports syntax errors, and 'Ä', for instance, can also be used in sets or as a case label. Changes to Lazarus are therefore not urgent, provided these options are described in the documentation. It would nevertheless seem more consistent to use ANSI encoding automatically as the default whenever the compiler switch {$H+} demands ANSI encoding for variable strings.
Sincerely

Udo Rempe
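The report notes that numeric assignments such as "sym := #196" remain possible. A minimal sketch of using that form for sets and case labels follows. It is an illustration, not code from the issue, and it assumes the ordinals 196, 214 and 220 are meant as Ä, Ö and Ü in the Windows-1252 (ANSI) code page. The names UpperUmlauts and AnsiUmlautChar are hypothetical.

```pascal
program AnsiUmlautChar;
{$mode objfpc}{$H+}

const
  // Assumed Windows-1252 ordinals for the upper-case umlauts.
  UpperUmlauts = [#196, #214, #220];

var
  sym: Char;  // a one-byte AnsiChar in this compiler mode
begin
  sym := #196;  // numeric char constant, unaffected by the file encoding

  if sym in UpperUmlauts then
    WriteLn('umlaut, Ord = ', Ord(sym));

  case sym of
    #196: WriteLn('A umlaut');
    #214: WriteLn('O umlaut');
    #220: WriteLn('U umlaut');
  else
    WriteLn('other');
  end;
end.
```

Numeric constants compile the same way whether the file is saved as UTF-8 or ANSI, at the cost of readability. Whether the value displays as 'Ä' still depends on the code page in which the output is interpreted.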

Vincent Snijders

2011-01-14 21:58

administrator   ~0045189

Last edited: 2011-01-14 21:59

No. To support full Unicode, all text in the LCL, the main library used to write Lazarus applications, must be UTF-8 encoded. Therefore the default source file encoding will remain UTF-8. Lazarus (mis-)uses ansistring to store UTF-8-encoded strings, despite the type's name.

See also: http://wiki.lazarus.freepascal.org/LCL_Unicode_Support

Please ask further questions or add further observations on the Lazarus forums and/or mailing lists.

Issue History

Date Modified Username Field Change
2011-01-13 22:48 Dr. Udo Rempe New Issue
2011-01-13 23:10 Vincent Snijders Note Added: 0045158
2011-01-14 18:40 Dr. Udo Rempe Note Added: 0045182
2011-01-14 21:58 Vincent Snijders Status new => resolved
2011-01-14 21:58 Vincent Snijders Resolution open => won't fix
2011-01-14 21:58 Vincent Snijders Assigned To => Vincent Snijders
2011-01-14 21:58 Vincent Snijders Note Added: 0045189
2011-01-14 21:59 Vincent Snijders Note Edited: 0045189