View Issue Details

ID: 0027707
Project: Lazarus
Category: IDE
View Status: public
Last Update: 2016-12-11 15:18
Reporter: malcome
Assigned To: Martin Friebe
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: Windows
Product Version: 1.4RC2
Summary: 0027707: Source Editor (TSynEdit) doesn't respect "East Asian Width".
Description: See attached file.
Ref. East Asian Width - http://www.unicode.org/reports/tr11/
Tags: No tags attached.
Fixed in Revision:
LazTarget: -
Widgetset: Win32/Win64

Relationships

related to 0028540 (new): TMemo will cause crash in ATOK (Japanese input method editor)
related to 0026369 (new): uim and synedit.
related to 0013374 (assigned, Dmitry Boyarintsev): Cannot input Japanese character in IDE (SynEdit: GTK2, Carbon)
related to 0030478 (resolved, Juha Manninen): Cannot input chinese character

Activities

malcome

2015-03-21 23:17

reporter  

test.png (22,015 bytes)   

Martin Friebe

2015-03-22 00:34

manager   ~0082170

Please attach a file containing the text that you show in your screenshot.

Also specify which font you use in your editor.

Martin Friebe

2015-03-22 00:38

manager   ~0082171

Also note: if any of those are "ambiguous" (http://www.unicode.org/reports/tr11/#Ambiguous), then this is a known issue.

To resolve ambiguous ones, SynEdit will need to use external Unicode libraries (or ask the OS, if the OS provides that info). This will not be solved soon.

malcome

2015-03-22 00:55

reporter   ~0082172

Is the text below OK?

あいうえお
◯★★あいうえお★★◯
”あいうえお”
1234567

Please resolve it; we want to use your great TSynEdit in East Asia!
We can't use it for our customers now.

Martin Friebe

2015-03-22 02:29

manager   ~0082174

I checked the quote at the start of "a3":
'RIGHT DOUBLE QUOTATION MARK' (U+201D)

It is marked as ambiguous. My Windows (Vista, UK English, but with "Eastern" fonts enabled/installed) draws it (and also the circle and star) as narrow (half width).

So if I change the hardcoded default in SynEdit, it will work for people with an OS optimized for East Asian text. But it will break for anyone using this on a "Western" PC.

Worse, according to the Unicode standard, they depend on context. So the same OS could render them sometimes wide and sometimes narrow within the same text (if I understand the document correctly).

If you go to the reference editor (the one that does it correctly) and put Latin chars around ★◯”, are they still double width, or do they follow the context and become narrow?


Unfortunately I see no quick way.


---
You can always add them to SynEditTextDoubleWidthChars.
They will need PWidths^ := 2;
You need to translate the UTF-16 values to UTF-8. Unfortunately, that is a bit of work.
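To illustrate that translation: UAX #11 lists code points such as U+201D, while the case labels in SynEditTextDoubleWidthChars test UTF-8 bytes. The minimal Free Pascal sketch below converts a Basic Multilingual Plane code point into Pascal char literals that could be pasted into such a case statement. CodePointToUtf8Literal is a hypothetical helper, not part of SynEdit.

```pascal
program CodePointToCaseLabel;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Convert a BMP code point (as listed in UAX #11, e.g. U+201D) into its
  UTF-8 bytes, written as Pascal char literals such as #$E2#$80#$9D.
  Supplementary-plane code points and surrogates are rejected. }
function CodePointToUtf8Literal(CP: Cardinal): string;
var
  Bytes: array[0..2] of Byte;
  Count, I: Integer;
begin
  if (CP > $FFFF) or ((CP >= $D800) and (CP <= $DFFF)) then
    raise Exception.Create('U+' + IntToHex(Int64(CP), 4) + ' is not handled by this sketch');
  if CP < $80 then begin
    Bytes[0] := Byte(CP);                            // 0xxxxxxx
    Count := 1;
  end else if CP < $800 then begin
    Bytes[0] := Byte($C0 or (CP shr 6));             // 110xxxxx
    Bytes[1] := Byte($80 or (CP and $3F));           // 10xxxxxx
    Count := 2;
  end else begin
    Bytes[0] := Byte($E0 or (CP shr 12));            // 1110xxxx
    Bytes[1] := Byte($80 or ((CP shr 6) and $3F));   // 10xxxxxx
    Bytes[2] := Byte($80 or (CP and $3F));           // 10xxxxxx
    Count := 3;
  end;
  Result := '';
  for I := 0 to Count - 1 do
    Result := Result + '#$' + IntToHex(Bytes[I], 2);
end;

begin
  WriteLn(CodePointToUtf8Literal($201D)); // RIGHT DOUBLE QUOTATION MARK
  WriteLn(CodePointToUtf8Literal($2605)); // BLACK STAR
  WriteLn(CodePointToUtf8Literal($25EF)); // LARGE CIRCLE
end.
```

For the three characters from the screenshot it prints #$E2#$80#$9D, #$E2#$98#$85 and #$E2#$97#$AF.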


But there are lots of ambiguous chars. Are they all shown full width for you?
If not, this approach will only work if there is a subset that always has the correct width.


I would be willing to add an IFDEF (duplicating the entire procedure, not splitting it into dozens of IFDEF sections).
But I don't know the answer to "which of the ambiguous chars?", nor do I have the immediate time to do the work. However, if you have a copy of the file that works for you, I will add it under an IFDEF; then at least you need not keep patching the file on every update.

But if that IFDEF existed and you compiled with it, and the result ran on a European/US PC, it would look odd there.


---
There is also SynEditTextSystemCharWidth. It was experimental, and it is unfinished (and Windows only).
However, the API it uses has since been deprecated, so it needs to be redone from scratch.
That is not currently a priority (it will need major rework).

Martin Friebe

2015-03-22 03:19

manager   ~0082175

I just tested the experimental code: on my PC, GetCharacterPlacementW only returns dx/caret for "Western" chars, even though I can display "Eastern" ones too.

That means this function is not usable.

malcome

2015-03-22 10:15

reporter   ~0082179

Last edited: 2015-03-22 10:40

Yes, "ambiguous" chars are FULLWIDTH in East Asian text editors and terminals (like a DOS prompt, i.e. a fixed-pitch, character-matrix editor).
This problem happens often (e.g. https://bugs.kde.org/41744).
You probably have to check for a CJK system locale.
I can't find a good sample, sorry.

malcome

2015-03-22 10:27

reporter  

test2.png (79,682 bytes)   

malcome

2015-03-22 10:35

reporter  

test3.png (54,008 bytes)   

Martin Friebe

2015-03-22 14:42

manager   ~0082185

Last edited: 2015-03-22 14:45

It is more than just the locale.

I found that on my system (locale UK) it depends on the font.

Some fonts show it narrow, others wide.
Or, more specifically, some fonts (used for Western chars) contain a glyph for it (and it is narrow).
Other fonts do not contain that glyph, and because I have "Eastern" fonts installed, Windows uses a fallback font.

Therefore it is not always all ambiguous chars, but different subsets depending on the font.

The next best thing that comes to mind is that each such char needs to be measured once before being used.
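A measure-once cache could look roughly like the sketch below. It is an illustration under assumptions, not SynEdit code: TMeasureFunc, FakeMeasure, InitCache and CachedWidth are hypothetical names, the fake measurer simply reports U+2600..U+26FF as two cells, and only the Basic Multilingual Plane is covered.

```pascal
program MeasureOnceCache;
{$mode objfpc}{$H+}

type
  { Hypothetical callback standing in for a real font/OS measurement. }
  TMeasureFunc = function(CP: Word): Byte;

  TWidthCache = record
    Widths: array[Word] of Byte;   // Unmeasured ($FF) until first use
    Measure: TMeasureFunc;
  end;

const
  Unmeasured = $FF;

var
  MeasureCalls: Integer = 0;
  Cache: TWidthCache;

{ Fake measurement for the demo: U+2600..U+26FF reported as 2 cells. }
function FakeMeasure(CP: Word): Byte;
begin
  Inc(MeasureCalls);
  if (CP >= $2600) and (CP <= $26FF) then
    Result := 2
  else
    Result := 1;
end;

procedure InitCache(var C: TWidthCache; M: TMeasureFunc);
begin
  FillChar(C.Widths, SizeOf(C.Widths), Unmeasured);
  C.Measure := M;
end;

{ Call the (expensive) measurement only the first time a char is seen. }
function CachedWidth(var C: TWidthCache; CP: Word): Byte;
begin
  if C.Widths[CP] = Unmeasured then
    C.Widths[CP] := C.Measure(CP);
  Result := C.Widths[CP];
end;

begin
  InitCache(Cache, @FakeMeasure);
  WriteLn(CachedWidth(Cache, $2605));  // measured
  WriteLn(CachedWidth(Cache, $2605));  // served from the cache
  WriteLn('measurements: ', MeasureCalls);
end.
```

Because the width depends on the font, such a cache would have to be reset whenever the editor font changes.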

--
Of course, forcing a narrow glyph to use double space just leaves a gap, which is less bad than the current overlap. So as long as your app runs only on East Asian PCs, that may be an option (for you).

As I said, for that, all of those must be added to SynEditTextDoubleWidthChars.

---
It looks like the word processor draws the quote narrow? But never mind: for ease of editing in a monospaced font, you would want it forced to double width?

malcome

2015-03-22 22:06

reporter   ~0082190

Last edited: 2015-03-22 22:09

We have used FULLWIDTH since the age of DOS (MS C, Turbo Pascal, Borland C, etc.).
Japanese programmers use code like the following in text editors and terminals.
(In those days the character encoding was not UTF-8 but Shift-JIS or EUC.)

    PWidths^:=2;
    case Line^ of
      #$01..#$7f: PWidths^:=1;
      #$ef: begin
        if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^:=1;
        if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^:=1;
      end;
    end;

So 'RIGHT DOUBLE QUOTATION MARK' (U+201D) is FULLWIDTH in Japanese text editors and terminals even now.

Martin Friebe

2015-03-22 22:17

manager   ~0082191

Just a note on your code: PWidths[i] is only 2 (or 1) for the lead byte.

For the continuation bytes, add:
      #$80..#$BF: // continuation byte
        PWidths^ := 0;
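Applied to a whole UTF-8 line, the lead-byte rule looks roughly like this. The sketch keeps the forced-width scheme from the code above (non-ASCII is two cells, halfwidth katakana one) and adds the zero for continuation bytes. ForcedWidths is a hypothetical procedure, not the SynEdit implementation.

```pascal
program LeadByteWidths;
{$mode objfpc}{$H+}

const
  { 'A', U+201D (E2 80 9D), U+FF71 halfwidth katakana A (EF BD B1) }
  Sample = 'A'#$E2#$80#$9D#$EF#$BD#$B1;

var
  W: array of Byte;
  I: Integer;

{ One width per byte of a UTF-8 string: the lead byte gets the cell count,
  continuation bytes ($80..$BF) get 0. Non-ASCII chars are forced to 2 cells,
  except halfwidth katakana U+FF61..U+FF9F (EF BD A1..EF BE 9F). }
procedure ForcedWidths(const S: string; var Widths: array of Byte);
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    case S[I] of
      #$00..#$7F: Widths[I - 1] := 1;
      #$80..#$BF: Widths[I - 1] := 0;   // continuation byte
    else
      Widths[I - 1] := 2;               // lead byte of a multi-byte char
      if (S[I] = #$EF) and (I + 2 <= Length(S)) then begin
        if (S[I + 1] = #$BD) and (S[I + 2] in [#$A1..#$BF]) then Widths[I - 1] := 1;
        if (S[I + 1] = #$BE) and (S[I + 2] in [#$80..#$9F]) then Widths[I - 1] := 1;
      end;
    end;
end;

begin
  SetLength(W, Length(Sample));
  ForcedWidths(Sample, W);
  for I := 0 to High(W) do
    Write(W[I], ' ');
  WriteLn;
end.
```

For the sample it prints 1 2 0 0 1 0 0, so the line occupies four cells.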

malcome

2015-03-24 10:31

reporter  

synedittextdoublewidthchars.pas.patch (1,160 bytes)   
Index: components/synedit/synedittextdoublewidthchars.pas
===================================================================
--- components/synedit/synedittextdoublewidthchars.pas	(revision 48472)
+++ components/synedit/synedittextdoublewidthchars.pas	(working copy)
@@ -46,6 +46,9 @@
 
 
 implementation
+{$ifdef windows}
+uses Windows;
+{$endif}
 
 { SynEditTextDoubleWidthChars }
 
@@ -60,6 +63,31 @@
 
   dec(Line);
   dec(PWidths);
+
+  {$IFDEF Windows}
+  {$IF FPC_FULLVERSION>=20701}
+  if DefaultSystemCodePage = 932{Japanese} then
+  {$ELSE}
+  if GetACP = 932{Japanese} then
+  {$ENDIF}
+  begin
+    for i := 0 to LineLen - 1 do begin
+      inc(Line);
+      inc(PWidths);
+      if PWidths^ = 0 then continue;
+      PWidths^:=2;
+      case Line^ of
+        #$01..#$7F: PWidths^ := 1;
+        #$80..#$BF: PWidths^ := 0;
+        #$EF: begin
+          if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
+          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
+        end;
+      end;
+    end
+  end else
+  {$ENDIF}
+
   for i := 0 to LineLen - 1 do begin
     inc(Line);
     inc(PWidths);

malcome

2015-03-24 10:39

reporter   ~0082271

Last edited: 2015-03-24 11:13

I created a patch.
This is enough for Japan. It's better than displaying them narrow.
I do not know about China and Korea, and it is Windows only. Sorry.

Martin Friebe

2015-03-26 23:34

manager   ~0082357

Sorry, I did not see the patch before.

The patch demonstrates why I had not fixed it myself yet.

öäü <= These and many other chars are incorrectly displayed as double width (yes, only on the Japanese code page).

I agree they are very unlikely in Japanese text. But even on a PC with the Japanese code page, a user may open a document in another language.

-----------------------
The correct way is to:
1) Extract a list of all the chars that are ambiguous (and only those).
2) Convert them to UTF-8 (they are listed as UTF-16 in the spec).
3) Create the correct "case" statements.
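Step 1 could start from the Unicode data file EastAsianWidth.txt, which holds the data behind UAX #11. The sketch below parses one data line into a code point range and its width class, tolerating optional spaces around the semicolon, and keeps only the "A" (ambiguous) entries. ParseEawLine is a hypothetical helper, and the sample lines are abbreviated.

```pascal
program ParseEastAsianWidth;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  { Abbreviated sample lines in the style of EastAsianWidth.txt. }
  Sample: array[0..3] of string = (
    '# comment line',
    '201D;A  # Pf  RIGHT DOUBLE QUOTATION MARK',
    '3041..3096     ; W  # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE',
    '2605..2606     ; A  # So   [2] BLACK STAR..WHITE STAR');

var
  Lo, Hi: Cardinal;
  Cls: string;
  I: Integer;

{ Parse "XXXX;C" or "XXXX..YYYY ; C" (spaces optional, "# ..." comment
  ignored). Returns False for blank and comment-only lines. }
function ParseEawLine(const Line: string; out ALo, AHi: Cardinal;
  out ACls: string): Boolean;
var
  Data, RangePart: string;
  P: Integer;
begin
  Result := False;
  ALo := 0; AHi := 0; ACls := '';
  Data := Line;
  P := Pos('#', Data);
  if P > 0 then
    Data := Copy(Data, 1, P - 1);            // drop the trailing comment
  P := Pos(';', Data);
  if P = 0 then
    Exit;                                    // no data on this line
  RangePart := Trim(Copy(Data, 1, P - 1));
  ACls := Trim(Copy(Data, P + 1, MaxInt));
  P := Pos('..', RangePart);
  if P > 0 then begin
    ALo := StrToInt('$' + Copy(RangePart, 1, P - 1));
    AHi := StrToInt('$' + Copy(RangePart, P + 2, MaxInt));
  end else begin
    ALo := StrToInt('$' + RangePart);
    AHi := ALo;
  end;
  Result := True;
end;

begin
  { Keep only the ambiguous ("A") ranges. }
  for I := Low(Sample) to High(Sample) do
    if ParseEawLine(Sample[I], Lo, Hi, Cls) and (Cls = 'A') then
      WriteLn('U+', IntToHex(Int64(Lo), 4), '..U+', IntToHex(Int64(Hi), 4));
end.
```

Run on the sample, it prints U+201D..U+201D and U+2605..U+2606; each range would then still need step 2 (UTF-8 conversion) before it can become a case label.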

Martin Friebe

2015-03-26 23:47

manager   ~0082358

Last edited: 2015-03-26 23:48

Since your code introduces other problems, I do NOT consider it a fix.

But I acknowledge that in many cases the new errors are less of an issue than the original problem.

I have therefore added your code as a workaround (r48517),
but only inside an IFDEF:

{$IFDEF SynForceDoubeWidthHack}

Compiling your IDE with this define will activate your code.

-----------
For a correct (and unconditional) fix, see my last comment.

malcome

2015-03-26 23:56

reporter   ~0082359

I understand that it isn't perfect.
But this is enough for the IDE source editor in Japan.
As for the SynEdit component,
if it can be customized freely for our users, I'm happy.

malcome

2015-03-27 00:01

reporter   ~0082360

Last edited: 2015-03-27 00:02

Of course, a perfect table would be welcome.
I just wanted to be lazy.

Martin Friebe

2015-03-27 03:20

manager   ~0082361

Well, at some future time a better solution will be added.
Right now, I just don't have the time.

Do-wan Kim

2015-03-27 07:31

reporter   ~0082363

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65408&number=128

More additions for Korean:

      #$ef: begin
        if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^:=1;
        if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^:=1;
        if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^:=1;
        if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^:=1;
      end;
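The four byte tests above are easier to review as code point ranges: decoding the three-byte sequences shows they cover U+FF61..U+FFBE, U+FFC0..U+FFDC and U+FFE8..U+FFEE in the Halfwidth and Fullwidth Forms block. A minimal sketch; Decode3 and IsHalfwidthForm are hypothetical helpers and assume well-formed input.

```pascal
program HalfwidthRanges;
{$mode objfpc}{$H+}

{ Decode a 3-byte UTF-8 sequence (lead byte $E0..$EF); input assumed valid. }
function Decode3(B0, B1, B2: Char): Cardinal;
begin
  Result := ((Ord(B0) and $0F) shl 12) or ((Ord(B1) and $3F) shl 6) or
            (Ord(B2) and $3F);
end;

{ Code point view of the four byte tests in the Korean additions. }
function IsHalfwidthForm(CP: Cardinal): Boolean;
begin
  Result := ((CP >= $FF61) and (CP <= $FFBE))   // EF BD A1 .. EF BE BE
         or ((CP >= $FFC0) and (CP <= $FFDC))   // EF BF 80 .. EF BF 9C
         or ((CP >= $FFE8) and (CP <= $FFEE));  // EF BF A8 .. EF BF AE
end;

begin
  WriteLn(Decode3(#$EF, #$BD, #$A1) = $FF61);             // TRUE
  WriteLn(IsHalfwidthForm(Decode3(#$EF, #$BF, #$9C)));   // TRUE, U+FFDC
  WriteLn(IsHalfwidthForm(Decode3(#$EF, #$BC, #$A1)));   // FALSE, U+FF21 is fullwidth
end.
```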

Do-wan Kim

2015-03-27 23:25

reporter  

synedittextdoublewidthchars.pas-inc-ko.patch (1,181 bytes)   
Index: components/synedit/synedittextdoublewidthchars.pas
===================================================================
--- components/synedit/synedittextdoublewidthchars.pas	(revision 48524)
+++ components/synedit/synedittextdoublewidthchars.pas	(working copy)
@@ -64,9 +64,9 @@
 
   {$IFDEF SynForceDoubeWidthHack}
   {$IF FPC_FULLVERSION>=20701}
-  if DefaultSystemCodePage = 932{Japanese}
+  if (DefaultSystemCodePage = 932{Japanese}) or (DefaultSystemCodePage = 949{Korean})
   {$ELSE}
-  if GetACP = 932{Japanese}
+  if (GetACP = 932{Japanese}) or (GetACP = 949{Korean})
   {$ENDIF}
   then begin
     for i := 0 to LineLen - 1 do begin
@@ -79,7 +79,9 @@
         #$80..#$BF: PWidths^ := 0;
         #$EF: begin
           if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
-          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
+          if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^ := 1;
+          if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^ := 1;
+          if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^ := 1;          
         end;
       end;
     end;

Martin Friebe

2015-03-28 00:16

manager   ~0082387

Sorry, but this is taking entirely the wrong direction.

The current patch is totally wrong. Under the (apparently wrong) assumption that it would only be for Japanese users and not matter too much, I added it in an IFDEF.

That said, I do *not* plan to spend time maintaining code that is wrong.

The correct patch (if a code-page approach is taken) is:

  HalfWidthSize := 1;
  if Codepage = ... then HalfWidthSize := 2;

  case
   ...
    #..: PWidths^ := HalfWidthSize; // for each halfwidth char.
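A compilable version of that shape might look like the following. It is a sketch under assumptions: the code page would come from DefaultSystemCodePage or GetACP as in the patch, the list of CJK code pages (932 Japanese, 936 Simplified Chinese, 949 Korean, 950 Traditional Chinese) is this sketch's choice, and IsAmbiguous holds only a few code points from UAX #11 instead of a generated table. The variable called HalfWidthSize above is named AmbiguousWidth here, and all function names are hypothetical.

```pascal
program AmbiguousWidthByCodePage;
{$mode objfpc}{$H+}

{ Width for East Asian "Ambiguous" chars: decided once, from the code page. }
function AmbiguousWidthFor(CodePage: Word): Byte;
begin
  case CodePage of
    932, 936, 949, 950: Result := 2;   // CJK ANSI code pages
  else
    Result := 1;
  end;
end;

{ Illustrative subset only; a real table would be generated from UAX #11 data. }
function IsAmbiguous(CP: Cardinal): Boolean;
begin
  case CP of
    $00A1, $201C, $201D, $25EF, $2605, $2606: Result := True;
  else
    Result := False;
  end;
end;

function CellWidth(CP: Cardinal; AmbiguousWidth: Byte): Byte;
begin
  if IsAmbiguous(CP) then
    Result := AmbiguousWidth   // only the ambiguous chars change width
  else
    Result := 1;               // wide/fullwidth handling omitted in this sketch
end;

begin
  WriteLn(CellWidth($201D, AmbiguousWidthFor(932)));   // Japanese code page
  WriteLn(CellWidth($201D, AmbiguousWidthFor(1252)));  // Western code page
end.
```

The point of the design is that ö, ä and other non-ambiguous chars keep width 1 even on a Japanese code page, which is what the forced-width patch got wrong.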

Do-wan Kim

2015-03-28 06:29

reporter  

synedittextdoublewidthchars.pas-nohack.patch (4,222 bytes)   
Index: components/synedit/synedittextdoublewidthchars.pas
===================================================================
--- components/synedit/synedittextdoublewidthchars.pas	(revision 48524)
+++ components/synedit/synedittextdoublewidthchars.pas	(working copy)
@@ -64,9 +64,9 @@
 
   {$IFDEF SynForceDoubeWidthHack}
   {$IF FPC_FULLVERSION>=20701}
-  if DefaultSystemCodePage = 932{Japanese}
+  if (DefaultSystemCodePage = 932{Japanese}) or (DefaultSystemCodePage = 949{Korean})
   {$ELSE}
-  if GetACP = 932{Japanese}
+  if (GetACP = 932{Japanese}) or (GetACP = 949{Korean})
   {$ENDIF}
   then begin
     for i := 0 to LineLen - 1 do begin
@@ -79,7 +79,9 @@
         #$80..#$BF: PWidths^ := 0;
         #$EF: begin
           if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
-          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
+          if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^ := 1;
+          if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^ := 1;
+          if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^ := 1;          
         end;
       end;
     end;
@@ -95,15 +97,52 @@
     case Line^ of
       #$e1:
         case Line[1] of
-          #$84:
+          #$84,#$85,#$86:
             if (Line[2] >= #$80) then PWidths^ := 2;
-          #$85:
-            if (Line[2] <= #$9f) then PWidths^ := 2;
+          #$87:
+            if (Line[2] >= #$80) and (Line[2] <= #$b9) then PWidths^ := 2;
         end;
       #$e2:
         case Line[1] of
+          #$80:
+            if Line[2] in [#$a5,#$bb] then PWidths^ := 2;
+          #$81:
+            if Line[2] = #$b4 then PWidths^ := 2;
+          #$82:
+            if (Line[2] >= #$81) and (Line[2] <= #$84) then PWidths^ := 2;
+          #$84:
+            if Line[2] in [#$83,#$89,#$a1,#$ab] then PWidths^ := 2;
+          #$85:
+            if (Line[2] >= #$a0) and (Line[2] <= #$b9) then PWidths^ := 2;
+          #$86:
+            if (Line[2] >= #$96) and (Line[2] <= #$99) then PWidths^ := 2;
+          #$87:
+            if Line[2] in [#$92,#$94] then PWidths^ := 2;
+          #$88:
+            if Line[2] in [#$80,#$83,#$87,#$88,#$8b,#$9d,#$a0,#$a5,#$a7,#$a8,#$aa,#$ac,#$ae,#$b4,#$b5,#$bc,#$bd] then PWidths^ := 2;
+          #$89:
+            if Line[2] in [#$92,#$aa,#$ab] then PWidths^ := 2;
+          #$8a:
+            if Line[2] in [#$82,#$83,#$86,#$87,#$99,#$a5] then PWidths^ := 2;
           #$8c:
-            if (Line[2] = #$a9) or (Line[2] = #$aa) then PWidths^ := 2;
+            if (Line[2] = #$92) or (Line[2] = #$a9) or (Line[2] = #$aa) then PWidths^ := 2;
+          #$91:
+            if (Line[2] >= #$a0) and (Line[2] <= #$bf) then PWidths^ := 2;
+          #$92, #$93, #$94:
+            if (Line[2] >= #$80) and (Line[2] <= #$bf) then PWidths^ := 2;
+          #$95:
+            if (Line[2] >= #$80) and (Line[2] <= #$8b) then PWidths^ := 2;
+          #$96:
+            begin
+              if (Line[2] >= #$a3) and (Line[2] <= #$a9) then PWidths^ := 2;
+              if Line[2] in [#$b3,#$b6,#$b7,#$bd] then PWidths^ := 2;
+            end;
+          #$97:
+            if Line[2] in [#$80,#$81,#$86,#$87,#$88,#$8e,#$90,#$91] then PWidths^ := 2;
+          #$98:
+            if Line[2] in [#$85,#$86,#$8e,#$8f,#$9c,#$9e] then PWidths^ := 2;
+          #$99:
+            if Line[2] in [#$a1,#$a4,#$a7,#$a8,#$a9,#$ac,#$ad] then PWidths^ := 2;
           #$ba:
             if (Line[2] >= #$80) then PWidths^ := 2;
           #$bb..#$ff:
@@ -148,11 +187,10 @@
           #$93:
             if (Line[2] <= #$86) then PWidths^ := 2;
         end;
-      #$eb..#$ec:
+      #$eb..#$ec,#$ee:
         PWidths^ := 2;
       #$ed:
         if (Line[1] <= #$9e) or (Line[2] <= #$a3) then PWidths^ := 2;
-
       #$ef:
         case Line[1] of
           #$a4:
@@ -168,7 +206,7 @@
           #$bc:
             if (Line[2] >= #$81) then PWidths^ := 2;
           #$bd:
-            if (Line[2] <= #$a0) then PWidths^ := 2;
+            if (Line[2] >= #$80) and (Line[2] <= #$9e) then PWidths^ := 2;
           #$bf:
             if (Line[2] >= #$a0) and (Line[2] <= #$a6) then PWidths^ := 2;
         end;

malcome

2015-03-28 10:33

reporter   ~0082393

Do-wan, thank you for your help.
Martin, his patch causes no problems in Japan, of course.

Do-wan Kim

2015-03-31 03:24

reporter  

testfontwidth_tool.zip (177,102 bytes)

Do-wan Kim

2015-03-31 03:27

reporter   ~0082485

You're welcome, malcome.

I uploaded the source of the app I used to find the double-width chars.

Dmitry Boyarintsev

2015-04-12 06:44

developer  

cjkinfo2.zip (44,063 bytes)

Dmitry Boyarintsev

2015-04-12 06:46

developer   ~0082832

cjkinfo2.zip contains a project that parses the Unicode data list for UAX #11 (http://www.unicode.org/reports/tr11/).

The project also generates a Pascal array of character width information based on that file.
The array is used in the cjkinfo.pas unit.

The unit introduces a function, GetCJKWidth, that returns the East Asian width of a Unicode character.
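The contents of cjkinfo.pas are not shown here, but a lookup of this kind is commonly done with a binary search over sorted, non-overlapping ranges. The sketch below uses that technique with a tiny excerpt of the UAX #11 data; LookupEaw, TEawRange and the table are hypothetical and are not the GetCJKWidth implementation. Treating unlisted code points as Neutral is a simplification.

```pascal
program EawRangeLookup;
{$mode objfpc}{$H+}

type
  TEastAsianWidth = (eawN, eawNa, eawA, eawH, eawW, eawF);
  TEawRange = record
    Lo, Hi: Cardinal;
    Width: TEastAsianWidth;
  end;

const
  { Tiny excerpt; must stay sorted by Lo and non-overlapping. }
  EawTable: array[0..7] of TEawRange = (
    (Lo: $00A1; Hi: $00A1; Width: eawA),   // INVERTED EXCLAMATION MARK
    (Lo: $1100; Hi: $115F; Width: eawW),   // Hangul Jamo leading consonants
    (Lo: $201C; Hi: $201D; Width: eawA),   // curly double quotes
    (Lo: $2605; Hi: $2606; Width: eawA),   // BLACK STAR..WHITE STAR
    (Lo: $3000; Hi: $3000; Width: eawF),   // IDEOGRAPHIC SPACE
    (Lo: $3041; Hi: $3096; Width: eawW),   // Hiragana
    (Lo: $FF01; Hi: $FF60; Width: eawF),   // fullwidth forms
    (Lo: $FF61; Hi: $FFBE; Width: eawH));  // halfwidth katakana/Hangul

function LookupEaw(CP: Cardinal): TEastAsianWidth;
var
  L, R, M: Integer;
begin
  L := Low(EawTable);
  R := High(EawTable);
  while L <= R do begin
    M := (L + R) div 2;
    if CP < EawTable[M].Lo then
      R := M - 1
    else if CP > EawTable[M].Hi then
      L := M + 1
    else
      Exit(EawTable[M].Width);             // CP lies inside this range
  end;
  Result := eawN;                          // simplification for unlisted CPs
end;

begin
  WriteLn(LookupEaw($201D) = eawA);  // RIGHT DOUBLE QUOTATION MARK
  WriteLn(LookupEaw($3042) = eawW);  // HIRAGANA LETTER A
  WriteLn(LookupEaw($FF71) = eawH);  // HALFWIDTH KATAKANA LETTER A
end.
```

A range table keeps the data compact and the lookup at O(log n); the font-dependent cases discussed in this thread would still need a separate decision on top of the class it returns.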

Martin Friebe

2015-04-12 19:45

manager   ~0082853

Added the cjkinfo patch (inside the existing IFDEF, replacing the previous IFDEF code).

http://forum.lazarus.freepascal.org/index.php/topic,27838.msg174309.html#msg174309

@Do-wan Kim
Thanks, but this info has to be determined inside SynEdit. The same char may be half or full width with different fonts, or with the same font on different systems, or be even more complex to determine.

The next step is to look at the system API and find out more from there.

Issue History

Date Modified Username Field Change
2015-03-21 23:17 malcome New Issue
2015-03-21 23:17 malcome File Added: test.png
2015-03-22 00:34 Martin Friebe LazTarget => -
2015-03-22 00:34 Martin Friebe Note Added: 0082170
2015-03-22 00:34 Martin Friebe Assigned To => Martin Friebe
2015-03-22 00:34 Martin Friebe Status new => feedback
2015-03-22 00:38 Martin Friebe Note Added: 0082171
2015-03-22 00:55 malcome Note Added: 0082172
2015-03-22 00:55 malcome Status feedback => assigned
2015-03-22 02:29 Martin Friebe Note Added: 0082174
2015-03-22 03:19 Martin Friebe Note Added: 0082175
2015-03-22 10:15 malcome Note Added: 0082179
2015-03-22 10:27 malcome File Added: test2.png
2015-03-22 10:35 malcome File Added: test3.png
2015-03-22 10:40 malcome Note Edited: 0082179
2015-03-22 14:42 Martin Friebe Note Added: 0082185
2015-03-22 14:45 Martin Friebe Note Edited: 0082185
2015-03-22 22:06 malcome Note Added: 0082190
2015-03-22 22:07 malcome Note Edited: 0082190
2015-03-22 22:09 malcome Note Edited: 0082190
2015-03-22 22:17 Martin Friebe Note Added: 0082191
2015-03-24 10:31 malcome File Added: synedittextdoublewidthchars.pas.patch
2015-03-24 10:39 malcome Note Added: 0082271
2015-03-24 11:13 malcome Note Edited: 0082271
2015-03-26 23:34 Martin Friebe Note Added: 0082357
2015-03-26 23:34 Martin Friebe Status assigned => feedback
2015-03-26 23:47 Martin Friebe Note Added: 0082358
2015-03-26 23:48 Martin Friebe Note Edited: 0082358
2015-03-26 23:56 malcome Note Added: 0082359
2015-03-26 23:56 malcome Status feedback => assigned
2015-03-27 00:01 malcome Note Added: 0082360
2015-03-27 00:02 malcome Note Edited: 0082360
2015-03-27 03:20 Martin Friebe Note Added: 0082361
2015-03-27 07:31 Do-wan Kim Note Added: 0082363
2015-03-27 23:25 Do-wan Kim File Added: synedittextdoublewidthchars.pas-inc-ko.patch
2015-03-28 00:16 Martin Friebe Note Added: 0082387
2015-03-28 06:29 Do-wan Kim File Added: synedittextdoublewidthchars.pas-nohack.patch
2015-03-28 10:33 malcome Note Added: 0082393
2015-03-31 03:24 Do-wan Kim File Added: testfontwidth_tool.zip
2015-03-31 03:27 Do-wan Kim Note Added: 0082485
2015-04-12 06:44 Dmitry Boyarintsev File Added: cjkinfo2.zip
2015-04-12 06:46 Dmitry Boyarintsev Note Added: 0082832
2015-04-12 19:45 Martin Friebe Note Added: 0082853
2016-03-16 20:29 Juha Manninen Relationship added related to 0028540
2016-03-16 20:32 Juha Manninen Relationship added related to 0026369
2016-03-16 20:48 Juha Manninen Relationship added related to 0013374
2016-12-11 15:18 Juha Manninen Relationship added related to 0030478