View Issue Details

ID: 0036492
Project: Lazarus
Category: Packages
View Status: public
Last Update: 2019-12-30 15:03
Reporter: CudaText man
Assigned To: Juha Manninen
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 2.1 (SVN)
Summary: 0036492: LConvEncoding fix for CP936 and other DBCS codepages
Description: The issue is described here:
https://github.com/Alexey-T/CudaText/issues/2344
The problem is that the cp936 conversion silently skips a character when converting the bytes $A1 $47 to UTF-8.
My fix was made in a clone of LConvEncoding:
https://github.com/Alexey-T/EncConv/commit/0044e2197859fa9920d786a4482cd8a36aa4f603

The same change must be made in Lazarus, in
lazarus/asiancodepagefunctions.inc
function DBCSToUTF8(const s: string; const ArrayUni, ArrayCP: array of word): string;
Tags: No tags attached.
Fixed in Revision: r62468, r62471
LazTarget: -
Widgetset:
Activities

CudaText man

2019-12-29 11:00

reporter  

ermode.diff (2,580 bytes)   
Index: components/lazutils/asiancodepagefunctions.inc
===================================================================
--- components/lazutils/asiancodepagefunctions.inc	(revision 62452)
+++ components/lazutils/asiancodepagefunctions.inc	(working copy)
@@ -46,6 +46,20 @@
       begin
         l:=UnicodeToUTF8Inline(code,Dest);
         inc(Dest,l);
+      end
+      else
+      case ConvertEncodingErrorMode of
+        ceemSkip:
+          begin end;
+        ceemException:
+          raise EConvertError.Create('Cannot convert DBCS code page to UTF8');
+        ceemReplace:
+          begin
+            Dest^:='?';
+            Inc(Dest);
+          end;
+        ceemReturmEmpty:
+          Exit('');
       end;
     end;
   until false;
@@ -209,8 +223,19 @@
         Inc(Dest);
       end
       else
-      if ConvertEncodingFromUtf8RaisesException then
-        raise EConvertError.Create('Cannot convert UTF8 to DBCS code page');
+      case ConvertEncodingErrorMode of
+        ceemSkip:
+          begin end;
+        ceemException:
+          raise EConvertError.Create('Cannot convert UTF8 to DBCS code page');
+        ceemReplace:
+          begin
+            Dest^ := '?';
+            Inc(Dest);
+          end;
+        ceemReturmEmpty:
+          Exit('');
+      end;
     end;
   until false;
   //SetLength(Result, Dest - PChar(Result));
Index: components/lazutils/lconvencoding.pas
===================================================================
--- components/lazutils/lconvencoding.pas	(revision 62452)
+++ components/lazutils/lconvencoding.pas	(working copy)
@@ -31,8 +31,16 @@
   SysUtils, Classes, dos, LazUTF8
   {$IFDEF EnableIconvEnc},iconvenc{$ENDIF};
 
+type
+  TConvertEncodingErrorMode = (
+    ceemSkip,
+    ceemException,
+    ceemReplace,
+    ceemReturmEmpty
+    );
+
 var
-  ConvertEncodingFromUtf8RaisesException: boolean = False;
+  ConvertEncodingErrorMode: TConvertEncodingErrorMode = ceemSkip;
 
 //encoding names
 const
@@ -2105,8 +2113,19 @@
         inc(Dest);
       end
       else
-      if ConvertEncodingFromUtf8RaisesException then
-        raise EConvertError.Create('Cannot convert UTF8 to single byte');
+      case ConvertEncodingErrorMode of
+        ceemSkip:
+          begin end;
+        ceemException:
+          raise EConvertError.Create('Cannot convert UTF8 to single byte');
+        ceemReplace:
+          begin
+            Dest^:='?';
+            inc(Dest);
+          end;
+        ceemReturmEmpty:
+          Exit('');
+      end;
     end;
   end;
   SetLength(Result,Dest-PChar(Result));

CudaText man

2019-12-29 11:00

reporter   ~0120123

I added a global variable, ConvertEncodingErrorMode, which replaces the old ConvertEncodingFromUtf8RaisesException flag and solves the problem.
Patch attached: ermode.diff.
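
For illustration only (not part of the patch): a minimal Python sketch of how the four error modes differ when a character cannot be encoded. The function name, the one-entry table, and the Python enum are hypothetical stand-ins. Only the mode names and the error message come from the diff above, with the last mode spelled as in the later fix.diff.

```python
from enum import Enum


class ErrorMode(Enum):
    # Values mirror the Pascal enum members; this Python type is illustrative only.
    SKIP = "ceemSkip"
    EXCEPTION = "ceemException"
    REPLACE = "ceemReplace"
    RETURN_EMPTY = "ceemReturnEmpty"


def utf8_to_single_byte(text: str, table: dict, mode: ErrorMode) -> bytes:
    """Encode text with a hypothetical char -> byte table, applying one error mode."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 128:
            out.append(ord(ch))        # ASCII passes through unchanged
        elif ch in table:
            out.append(table[ch])      # mapped character
        elif mode is ErrorMode.SKIP:
            continue                   # drop the character silently
        elif mode is ErrorMode.EXCEPTION:
            raise ValueError("Cannot convert UTF8 to single byte")
        elif mode is ErrorMode.REPLACE:
            out.append(ord("?"))       # substitute a question mark
        else:                          # RETURN_EMPTY: give up on the whole string
            return b""
    return bytes(out)


TOY_TABLE = {"é": 0xE9}  # hypothetical one-entry table, not a real code page

print(utf8_to_single_byte("aé€b", TOY_TABLE, ErrorMode.SKIP))     # b'a\xe9b'
print(utf8_to_single_byte("aé€b", TOY_TABLE, ErrorMode.REPLACE))  # b'a\xe9?b'
```

In this model the skip mode loses the unmappable character without any trace, while the replace mode leaves a visible "?" in its place.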

Juha Manninen

2019-12-30 00:23

developer   ~0120141

I applied the patch in r62468. It looks good.
The fundamental error in the conversion should still be fixed, however. What causes it? Unicode covers every possible character, so the conversion should be possible.

CudaText man

2019-12-30 14:13

reporter   ~0120150

Last edited: 2019-12-30 14:14


>What causes it?

Juha, you didn't read my GitHub issue report. A user opened an arbitrary text file as cp936. It contains the bytes
61 A1 47 62. The bytes A1 47 form a pair that canNOT be converted by cp936.

Please apply the new patch, which fixes a typo in an identifier. Sorry.
Patch attached: fix.diff.

fix.diff (1,421 bytes)   
Index: components/lazutils/asiancodepagefunctions.inc
===================================================================
--- components/lazutils/asiancodepagefunctions.inc	(revision 62469)
+++ components/lazutils/asiancodepagefunctions.inc	(working copy)
@@ -58,7 +58,7 @@
             Dest^:='?';
             Inc(Dest);
           end;
-        ceemReturmEmpty:
+        ceemReturnEmpty:
           Exit('');
       end;
     end;
@@ -233,7 +233,7 @@
             Dest^ := '?';
             Inc(Dest);
           end;
-        ceemReturmEmpty:
+        ceemReturnEmpty:
           Exit('');
       end;
     end;
Index: components/lazutils/lconvencoding.pas
===================================================================
--- components/lazutils/lconvencoding.pas	(revision 62469)
+++ components/lazutils/lconvencoding.pas	(working copy)
@@ -36,10 +36,12 @@
     ceemSkip,
     ceemException,
     ceemReplace,
-    ceemReturmEmpty
+    ceemReturnEmpty
     );
 
 var
+  //Global variable which controls behaviour of encoding conversion error, in 3 places:
+  //a) UTF8 to single byte encoding, b) DBCS (Asian) encoding to UTF8, c) UTF8 to DBCS
   ConvertEncodingErrorMode: TConvertEncodingErrorMode = ceemSkip;
 
 //encoding names
@@ -2123,7 +2125,7 @@
             Dest^:='?';
             inc(Dest);
           end;
-        ceemReturmEmpty:
+        ceemReturnEmpty:
           Exit('');
       end;
     end;

Juha Manninen

2019-12-30 15:03

developer   ~0120154

I read it but apparently didn't understand. :)
So the byte sequence was not valid cp936, and therefore it cannot be converted. Understood.
I applied the spelling correction in r62471.
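
For illustration only: a minimal Python sketch of why the byte sequence 61 A1 47 62 from this report loses a character. It is a toy model, not the Lazarus implementation. It assumes that every byte of $80 or above starts a two-byte pair that is consumed as a unit, and it uses a hypothetical one-entry table in place of the real cp936 arrays.

```python
def dbcs_to_utf8(data: bytes, table: dict, replace: bool = False) -> str:
    """Decode bytes with a toy DBCS table; unmapped pairs are skipped or replaced."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))      # ASCII byte maps to itself
            i += 1
            continue
        # Assumption: a non-ASCII byte always starts a two-byte pair.
        trail = data[i + 1] if i + 1 < len(data) else 0
        code = (lead << 8) | trail
        i += 2
        if code in table:
            out.append(table[code])    # valid pair
        elif replace:
            out.append("?")            # behaviour modelled on ceemReplace
        # otherwise the pair is dropped, modelled on ceemSkip
    return "".join(out)


# One-entry excerpt: A1 A1 is the ideographic space U+3000.
TOY_CP936 = {0xA1A1: "\u3000"}
SAMPLE = bytes([0x61, 0xA1, 0x47, 0x62])  # the bytes from the report

print(repr(dbcs_to_utf8(SAMPLE, TOY_CP936)))                # 'ab'
print(repr(dbcs_to_utf8(SAMPLE, TOY_CP936, replace=True)))  # 'a?b'
```

Because the pair A1 47 has no entry in the table, the decoder has nothing to emit for it. Under the skip behaviour the output shrinks to "ab", which matches the skipped character described in the report.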

Issue History

Date Modified Username Field Change
2019-12-29 09:49 CudaText man New Issue
2019-12-29 11:00 CudaText man File Added: ermode.diff
2019-12-29 11:00 CudaText man Note Added: 0120123
2019-12-30 00:04 Juha Manninen Assigned To => Juha Manninen
2019-12-30 00:04 Juha Manninen Status new => assigned
2019-12-30 00:23 Juha Manninen Status assigned => feedback
2019-12-30 00:23 Juha Manninen LazTarget => -
2019-12-30 00:23 Juha Manninen Note Added: 0120141
2019-12-30 00:23 Juha Manninen Fixed in Revision => r62468
2019-12-30 14:13 CudaText man File Added: fix.diff
2019-12-30 14:13 CudaText man Note Added: 0120150
2019-12-30 14:13 CudaText man Status feedback => assigned
2019-12-30 14:14 CudaText man Note Edited: 0120150
2019-12-30 15:03 Juha Manninen Note Added: 0120154
2019-12-30 15:03 Juha Manninen Status assigned => resolved
2019-12-30 15:03 Juha Manninen Resolution open => fixed
2019-12-30 15:03 Juha Manninen Fixed in Revision r62468 => r62468, r62471