View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0027707 | Lazarus | IDE | public | 2015-03-21 23:17 | 2016-12-11 15:18 |
Reporter | malcome | Assigned To | Martin Friebe | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | assigned | Resolution | open | ||
Platform | Windows | ||||
Product Version | 1.4RC2 | ||||
Summary | 0027707: Source Editor(TSynEdit) doesn't respect "East Asian Width". | ||||
Description | See attached file. Ref. East Asian Width - http://www.unicode.org/reports/tr11/ | ||||
Tags | No tags attached. | ||||
Fixed in Revision | |||||
LazTarget | - | ||||
Widgetset | Win32/Win64 | ||||
Attached Files |
|
related to | 0028540 | new | TMemo will cause crash in ATOK (Japanese input method editor) | |
related to | 0026369 | new | uim and synedit. | |
related to | 0013374 | assigned | Dmitry Boyarintsev | Cannot input Japanese character in IDE(SynEdit: GTK2, Carbon) |
related to | 0030478 | resolved | Juha Manninen | Cannot input chinese character |
|
Please attach a file containing the text that you show in your screenshot. Also specify which font you use in your editor. |
|
Also note: if any of those are "ambiguous" (see http://www.unicode.org/reports/tr11/#Ambiguous), then this is a known issue. To resolve ambiguous ones, SynEdit will need to use external Unicode libraries (or ask the OS, if the OS provides the info). This will not be solved soon. |
|
Is the text below OK? あいうえお ◯★★あいうえお★★◯ ”あいうえお” 1234567 Please resolve it; we want to use your great TSynEdit in East Asia! We can't use it for our customers now. |
|
I did check the quote at the start of "a3": 'RIGHT DOUBLE QUOTATION MARK' (U+201D). It is marked as ambiguous, and my Windows (Vista, UK English, but with "eastern" fonts enabled/installed) draws it (and also the circle and star) as narrow (half width).

So if I change the hardcoded default in SynEdit, it will work for people with an OS optimized for East Asian text, but it will break for anyone using this on a "western" PC. Worse, according to the Unicode standard, the width depends on context, so the same OS could render these chars sometimes wide and sometimes narrow in the same text (if I understand the doc correctly).

If you go to the reference editor (the one that does it correctly) and put Latin chars around the ★◯”, are they still double width, or do they follow the context and become narrow?

Unfortunately I see no quick way.

---

You can always add them to SynEditTextDoubleWidthChars. They will need PWidths^ := 2; and you need to translate the UTF-16 to UTF-8. That is unfortunately a bit of work.

But there are lots of ambiguous chars. Are they all shown fullwidth for you? If not, then this approach will only work if there is a selection that will always have the correct width.

I would be willing to add an IFDEF (duplicating the entire proc, not splitting it into dozens of ifdef sections). But I don't know the answer to "Which of the ambiguous chars?", nor do I have the time right now to do the work. If you have a copy of the file that works for you, I will add it as an IFDEF; then at least you need not keep patching the file on updates. But if you compile with that IFDEF and the result runs on a European/US PC, it will look odd there.

---

There also is SynEditTextSystemCharWidth. It was experimental, it is unfinished, and it is Windows only. The API it uses has been deprecated in the meantime, so it needs to be redone from scratch (a major rework), and that is not currently a priority. |
|
I just did a test on the experimental code: on my PC, GetCharacterPlacementW only returns dx/caret for "Western" chars, even though I can display "Eastern" ones too. That means this function is not usable. |
|
Yes, "ambiguous" is FULLWIDTH in East Asian text editors and terminals (like a DOS prompt; that is, fixed-pitch, character-matrix editors). This problem comes up often (e.g. https://bugs.kde.org/41744). You probably have to detect CJK from the system locale. I couldn't find a good sample, sorry. |
|
It is more than just locale. I found that on my system (locale UK) it depends on the font. Some fonts show it narrow, others wide. Or, probably more specifically, some fonts (used for western chars) contain a glyph for it (and it is narrow). Other fonts do not contain that glyph, and because I have "eastern" fonts installed, Windows uses a fallback font.

Therefore it is not always all ambiguous chars but, depending on the font, different subsets. The next best thing that comes to mind is that each such char needs to be measured once, before being used.

---

Of course, forcing a narrow glyph to use double space just leaves a gap, which is less bad than the current overlap. So as long as your app runs on East Asian PCs only, that may be an option (for you). As I said, for that all of those chars must be added to SynEditTextDoubleWidthChars.

---

It looks like the word processor draws the quote narrow? But never mind; for ease of editing in monospaced text you would want it forced to double? |
|
We have used FULLWIDTH since the days of DOS (MS C, Turbo Pascal, Borland C, etc.). Japanese programmers use code like the following in text editors and terminals. (In those days the character encoding was not UTF-8 but Shift-JIS or EUC.)

    PWidths^ := 2;                 // default: full width
    case Line^ of
      #$01..#$7f: PWidths^ := 1;   // ASCII: half width
      #$ef: begin
        // EF BD A1..BF = U+FF61..U+FF7F (halfwidth punctuation and katakana)
        if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
        // EF BE 80..9F = U+FF80..U+FF9F (halfwidth katakana)
        if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
      end;
    end;

So 'RIGHT DOUBLE QUOTATION MARK' (U+201D) is FULLWIDTH in Japanese text editors and terminals even now. |
|
Just a note on your code: PWidths[i] is only 2 (or 1) for the lead byte. For continuation bytes it is 0:

    #$80..#$BF: PWidths^ := 0;   // continuation byte
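
The note about continuation bytes can be illustrated with a small stand-alone sketch. It is not SynEdit code: `WidthsPerByte` and `TByteWidths` are hypothetical names, and the rule "every non-ASCII lead byte takes two cells" is only the simplification used in the snippet above, not a correct width table.

```pascal
program ContinuationByteSketch;
{$mode objfpc}{$H+}

type
  TByteWidths = array of Byte;

{ Hypothetical helper: returns one width entry per byte of a UTF-8 line.
  Only the lead byte of a sequence carries the column width;
  continuation bytes #$80..#$BF carry 0. }
function WidthsPerByte(const Line: RawByteString): TByteWidths;
var
  i: Integer;
begin
  Result := nil;
  SetLength(Result, Length(Line));
  for i := 1 to Length(Line) do
    case Line[i] of
      #$00..#$7F: Result[i - 1] := 1;  // ASCII: one cell
      #$80..#$BF: Result[i - 1] := 0;  // continuation byte: no cell of its own
    else
      Result[i - 1] := 2;              // lead byte (simplified: always two cells)
    end;
end;

var
  W: TByteWidths;
  i, Total: Integer;
begin
  // 'a' followed by U+3042 HIRAGANA LETTER A, given as raw UTF-8 bytes
  W := WidthsPerByte('a'#$E3#$81#$82);
  Total := 0;
  for i := 0 to High(W) do
    Inc(Total, W[i]);
  WriteLn('bytes=', Length(W), ' columns=', Total);
end.
```

For the four input bytes the per-byte widths are 1, 2, 0, 0, so summing the array gives the three screen columns the text occupies.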
|
synedittextdoublewidthchars.pas.patch (1,160 bytes)
    Index: components/synedit/synedittextdoublewidthchars.pas
    ===================================================================
    --- components/synedit/synedittextdoublewidthchars.pas (revision 48472)
    +++ components/synedit/synedittextdoublewidthchars.pas (working copy)
    @@ -46,6 +46,9 @@
     implementation
    +{$ifdef windows}
    +uses Windows;
    +{$endif}
     { SynEditTextDoubleWidthChars }
    @@ -60,6 +63,31 @@
       dec(Line);
       dec(PWidths);
    +
    +  {$IFDEF Windows}
    +  {$IF FPC_FULLVERSION>=20701}
    +  if DefaultSystemCodePage = 932{Japanese} then
    +  {$ELSE}
    +  if GetACP = 932{Japanese} then
    +  {$ENDIF}
    +  begin
    +    for i := 0 to LineLen - 1 do begin
    +      inc(Line);
    +      inc(PWidths);
    +      if PWidths^ = 0 then continue;
    +      PWidths^:=2;
    +      case Line^ of
    +        #$01..#$7F: PWidths^ := 1;
    +        #$80..#$BF: PWidths^ := 0;
    +        #$EF: begin
    +          if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
    +          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
    +        end;
    +      end;
    +    end
    +  end else
    +  {$ENDIF}
    +
       for i := 0 to LineLen - 1 do begin
         inc(Line);
         inc(PWidths);
|
I created a patch. This is enough in Japan; it is better than displaying these chars as narrow. I do not know about China and Korea, and it is Windows only. Sorry. |
|
Sorry, I did not see the patch before.

The patch demonstrates why I had not fixed it myself yet: öäü <= those and many other chars are incorrectly displayed as double width. (Yes, only on the Japanese codepage.) I agree that they will be very unlikely in Japanese text. But even on a PC with the Japanese codepage, a user may open a document in another language.

---

The correct way is to:
1) Extract a list of all the chars that are ambiguous (and only those).
2) Convert them to UTF-8 (they are UTF-16 in the specs).
3) Create the correct "case" statements. |
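
Steps 2 and 3 are mechanical and could be scripted. The following is a minimal sketch, assuming the code points are already known: `ByteLabel` and `Utf8Labels` are hypothetical helpers (not part of SynEdit), and the three sample code points are simply the quote, star and circle discussed in this report; whether each is classed as ambiguous should be checked against the Unicode data file.

```pascal
program Utf8CaseLabelSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: formats one byte as a Pascal char constant, e.g. #$e2. }
function ByteLabel(B: Integer): string;
begin
  Result := '#$' + LowerCase(IntToHex(B, 2));
end;

{ Hypothetical helper: encodes a code point up to U+FFFF (surrogates excluded)
  as UTF-8 and returns the bytes as char constants for use in "case" labels. }
function Utf8Labels(CodePoint: Integer): string;
begin
  if CodePoint < $80 then
    Result := ByteLabel(CodePoint)
  else if CodePoint < $800 then
    Result := ByteLabel($C0 or (CodePoint shr 6)) + ' ' +
              ByteLabel($80 or (CodePoint and $3F))
  else
    Result := ByteLabel($E0 or (CodePoint shr 12)) + ' ' +
              ByteLabel($80 or ((CodePoint shr 6) and $3F)) + ' ' +
              ByteLabel($80 or (CodePoint and $3F));
end;

const
  // Sample input only: U+201D, U+2605 and U+25EF from the text in this report.
  Samples: array[0..2] of Integer = ($201D, $2605, $25EF);
var
  i: Integer;
begin
  for i := Low(Samples) to High(Samples) do
    WriteLn('U+', IntToHex(Samples[i], 4), ' -> ', Utf8Labels(Samples[i]));
end.
```

U+201D, for example, comes out as `#$e2 #$80 #$9d`, which maps directly onto the nested `case Line^ of` / `case Line[1] of` / `Line[2]` structure used in synedittextdoublewidthchars.pas.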
|
Since your code introduces other problems, I do NOT consider it a fix. But I acknowledge that in many cases the new errors are less of an issue than the original problem. I have therefore added your code as a workaround (r48517), but only inside an IFDEF: {$IFDEF SynForceDoubeWidthHack}. Compiling your IDE with that define will activate your code.

---

For a correct (and unconditional) fix, see my last comment. |
|
I understand that it isn't perfect. But this is enough for the "IDE Source Editor" in Japan. As for the "SynEdit Component", if it can be customized freely for our users, I'm happy. |
|
If you add a perfect table, it is of course welcome. I just wanted to be lazy. |
|
Well at some future time, a better solution will be added. Now, I just don't have the time. |
|
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65408&number=128

More additions for Korean:

    #$ef: begin
      if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^:=1;
      if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^:=1;
      if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^:=1;
      if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^:=1;
    end;
|
synedittextdoublewidthchars.pas-inc-ko.patch (1,181 bytes)
    Index: components/synedit/synedittextdoublewidthchars.pas
    ===================================================================
    --- components/synedit/synedittextdoublewidthchars.pas (revision 48524)
    +++ components/synedit/synedittextdoublewidthchars.pas (working copy)
    @@ -64,9 +64,9 @@
       {$IFDEF SynForceDoubeWidthHack}
       {$IF FPC_FULLVERSION>=20701}
    -  if DefaultSystemCodePage = 932{Japanese}
    +  if (DefaultSystemCodePage = 932{Japanese}) or (DefaultSystemCodePage = 949{Korean})
       {$ELSE}
    -  if GetACP = 932{Japanese}
    +  if (GetACP = 932{Japanese}) or (GetACP = 949{Korean})
       {$ENDIF}
       then begin
         for i := 0 to LineLen - 1 do begin
    @@ -79,7 +79,9 @@
             #$80..#$BF: PWidths^ := 0;
             #$EF: begin
               if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
    -          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
    +          if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^ := 1;
    +          if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^ := 1;
    +          if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^ := 1;
             end;
           end;
         end;
|
Sorry, but this is going in entirely the wrong direction. The current patch is totally wrong. Under the (apparently wrong) assumption that it would be only for Japanese users and would not matter too much, I added it in an IFDEF. That said, I do *not* plan to spend time on maintaining code that is wrong.

The correct patch (if a codepage approach is taken) is:

    HalfWidthSize := 1;
    if Codepage = ... then HalfWidthSize := 2;
    case ...
      #..: PWidths^ := HalfWidthSize; // for each halfwidth char
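
A minimal stand-alone sketch of that codepage approach follows. It is an illustration under stated assumptions, not the SynEdit implementation: `AmbiguousWidthFor` and `SeqWidth` are hypothetical names, the code page is passed in as a parameter instead of being queried from the system, only the two code pages named in this report (932 and 949) are treated as East Asian, and the character table is cut down to two sample characters plus one always-wide range.

```pascal
program AmbiguousWidthSketch;
{$mode objfpc}{$H+}

{ Hypothetical: width to use for the listed chars on a given code page.
  932 (Japanese) and 949 (Korean) are the only ones named in this report. }
function AmbiguousWidthFor(CodePage: Integer): Integer;
begin
  if (CodePage = 932) or (CodePage = 949) then
    Result := 2
  else
    Result := 1;
end;

{ Hypothetical: column width of the UTF-8 sequence starting at P.
  Only the listed chars switch with the code page; everything else
  keeps a fixed width, so chars such as U+00F6 are never widened. }
function SeqWidth(P: PChar; AmbiguousWidth: Integer): Integer;
begin
  Result := 1;
  case P^ of
    #$80..#$BF:
      Result := 0;                                           // continuation byte
    #$E2:
      case P[1] of
        #$80: if P[2] = #$9D then Result := AmbiguousWidth;  // U+201D
        #$98: if P[2] = #$85 then Result := AmbiguousWidth;  // U+2605
      end;
    #$E3:
      if P[1] = #$81 then Result := 2;  // U+3040..U+307F (mostly Hiragana), treated as wide
  end;
end;

var
  Quote, Umlaut: RawByteString;
begin
  Quote := #$E2#$80#$9D;   // U+201D RIGHT DOUBLE QUOTATION MARK
  Umlaut := #$C3#$B6;      // U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
  WriteLn('U+201D, cp 1252: ', SeqWidth(PChar(Quote), AmbiguousWidthFor(1252)));
  WriteLn('U+201D, cp 932:  ', SeqWidth(PChar(Quote), AmbiguousWidthFor(932)));
  WriteLn('U+00F6, cp 932:  ', SeqWidth(PChar(Umlaut), AmbiguousWidthFor(932)));
end.
```

With these inputs U+201D is 1 column for code page 1252 and 2 columns for 932, while U+00F6 stays at 1 column in both cases, which is the difference from the "everything non-ASCII is wide" workaround.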
|
synedittextdoublewidthchars.pas-nohack.patch (4,222 bytes)
    Index: components/synedit/synedittextdoublewidthchars.pas
    ===================================================================
    --- components/synedit/synedittextdoublewidthchars.pas (revision 48524)
    +++ components/synedit/synedittextdoublewidthchars.pas (working copy)
    @@ -64,9 +64,9 @@
       {$IFDEF SynForceDoubeWidthHack}
       {$IF FPC_FULLVERSION>=20701}
    -  if DefaultSystemCodePage = 932{Japanese}
    +  if (DefaultSystemCodePage = 932{Japanese}) or (DefaultSystemCodePage = 949{Korean})
       {$ELSE}
    -  if GetACP = 932{Japanese}
    +  if (GetACP = 932{Japanese}) or (GetACP = 949{Korean})
       {$ENDIF}
       then begin
         for i := 0 to LineLen - 1 do begin
    @@ -79,7 +79,9 @@
             #$80..#$BF: PWidths^ := 0;
             #$EF: begin
               if (Line[1] = #$bd) and (Line[2] in [#$A1..#$bf]) then PWidths^ := 1;
    -          if (Line[1] = #$be) and (Line[2] in [#$80..#$9f]) then PWidths^ := 1;
    +          if (Line[1] = #$be) and (Line[2] in [#$80..#$be]) then PWidths^ := 1;
    +          if (Line[1] = #$bf) and (Line[2] in [#$80..#$9c]) then PWidths^ := 1;
    +          if (Line[1] = #$bf) and (Line[2] in [#$A8..#$ae]) then PWidths^ := 1;
             end;
           end;
         end;
    @@ -95,15 +97,52 @@
         case Line^ of
           #$e1:
             case Line[1] of
    -          #$84:
    +          #$84,#$85,#$86:
                 if (Line[2] >= #$80) then PWidths^ := 2;
    -          #$85:
    -            if (Line[2] <= #$9f) then PWidths^ := 2;
    +          #$87:
    +            if (Line[2] >= #$80) and (Line[2] <= #$b9) then PWidths^ := 2;
             end;
           #$e2:
             case Line[1] of
    +          #$80:
    +            if Line[2] in [#$a5,#$bb] then PWidths^ := 2;
    +          #$81:
    +            if Line[2] = #$b4 then PWidths^ := 2;
    +          #$82:
    +            if (Line[2] >= #$81) and (Line[2] <= #$84) then PWidths^ := 2;
    +          #$84:
    +            if Line[2] in [#$83,#$89,#$a1,#$ab] then PWidths^ := 2;
    +          #$85:
    +            if (Line[2] >= #$a0) and (Line[2] <= #$b9) then PWidths^ := 2;
    +          #$86:
    +            if (Line[2] >= #$96) and (Line[2] <= #$99) then PWidths^ := 2;
    +          #$87:
    +            if Line[2] in [#$92,#$94] then PWidths^ := 2;
    +          #$88:
    +            if Line[2] in [#$80,#$83,#$87,#$88,#$8b,#$9d,#$a0,#$a5,#$a7,#$a8,#$aa,#$ac,#$ae,#$b4,#$b5,#$bc,#$bd] then PWidths^ := 2;
    +          #$89:
    +            if Line[2] in [#$92,#$aa,#$ab] then PWidths^ := 2;
    +          #$8a:
    +            if Line[2] in [#$82,#$83,#$86,#$87,#$99,#$a5] then PWidths^ := 2;
               #$8c:
    -            if (Line[2] = #$a9) or (Line[2] = #$aa) then PWidths^ := 2;
    +            if (Line[2] = #$92) or (Line[2] = #$a9) or (Line[2] = #$aa) then PWidths^ := 2;
    +          #$91:
    +            if (Line[2] >= #$a0) and (Line[2] <= #$bf) then PWidths^ := 2;
    +          #$92, #$93, #$94:
    +            if (Line[2] >= #$80) and (Line[2] <= #$bf) then PWidths^ := 2;
    +          #$95:
    +            if (Line[2] >= #$80) and (Line[2] <= #$8b) then PWidths^ := 2;
    +          #$96:
    +            begin
    +              if (Line[2] >= #$a3) and (Line[2] <= #$a9) then PWidths^ := 2;
    +              if Line[2] in [#$b3,#$b6,#$b7,#$bd] then PWidths^ := 2;
    +            end;
    +          #$97:
    +            if Line[2] in [#$80,#$81,#$86,#$87,#$88,#$8e,#$90,#$91] then PWidths^ := 2;
    +          #$98:
    +            if Line[2] in [#$85,#$86,#$8e,#$8f,#$9c,#$9e] then PWidths^ := 2;
    +          #$99:
    +            if Line[2] in [#$a1,#$a4,#$a7,#$a8,#$a9,#$ac,#$ad] then PWidths^ := 2;
               #$ba:
                 if (Line[2] >= #$80) then PWidths^ := 2;
               #$bb..#$ff:
    @@ -148,11 +187,10 @@
               #$93:
                 if (Line[2] <= #$86) then PWidths^ := 2;
             end;
    -      #$eb..#$ec:
    +      #$eb..#$ec,#$ee:
             PWidths^ := 2;
           #$ed:
             if (Line[1] <= #$9e) or (Line[2] <= #$a3) then PWidths^ := 2;
    -
           #$ef:
             case Line[1] of
               #$a4:
    @@ -168,7 +206,7 @@
               #$bc:
                 if (Line[2] >= #$81) then PWidths^ := 2;
               #$bd:
    -            if (Line[2] <= #$a0) then PWidths^ := 2;
    +            if (Line[2] >= #$80) and (Line[2] <= #$9e) then PWidths^ := 2;
               #$bf:
                 if (Line[2] >= #$a0) and (Line[2] <= #$a6) then PWidths^ := 2;
             end;
|
Do-wan, thank you for your help. Martin, his patch is no problem in Japan, of course. |
|
You're welcome, malcome. I have uploaded the source of the app that I used to find the double-width chars. |
|
cjkinfo2.zip contains a project that is used to parse the Unicode data list of UAX #11 (http://www.unicode.org/reports/tr11/). The project also generates a Pascal array of character width information based on that file. The array is used in the cjkinfo.pas unit, which introduces a function, GetCJKWidth, that returns the East Asian width of a Unicode character. |
|
Added the cjkinfo patch (in the existing ifdef, instead of the previous ifdef): http://forum.lazarus.freepascal.org/index.php/topic,27838.msg174309.html#msg174309

@Do-wan Kim: Thanks, but this info has to be determined inside SynEdit. The same char may be half or full width with different fonts, or half or full width with the same font on different systems, or even more complex to determine. The next step is to look at the system API and find out more from there. |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-03-21 23:17 | malcome | New Issue | |
2015-03-21 23:17 | malcome | File Added: test.png | |
2015-03-22 00:34 | Martin Friebe | LazTarget | => - |
2015-03-22 00:34 | Martin Friebe | Note Added: 0082170 | |
2015-03-22 00:34 | Martin Friebe | Assigned To | => Martin Friebe |
2015-03-22 00:34 | Martin Friebe | Status | new => feedback |
2015-03-22 00:38 | Martin Friebe | Note Added: 0082171 | |
2015-03-22 00:55 | malcome | Note Added: 0082172 | |
2015-03-22 00:55 | malcome | Status | feedback => assigned |
2015-03-22 02:29 | Martin Friebe | Note Added: 0082174 | |
2015-03-22 03:19 | Martin Friebe | Note Added: 0082175 | |
2015-03-22 10:15 | malcome | Note Added: 0082179 | |
2015-03-22 10:27 | malcome | File Added: test2.png | |
2015-03-22 10:35 | malcome | File Added: test3.png | |
2015-03-22 10:40 | malcome | Note Edited: 0082179 | View Revisions |
2015-03-22 14:42 | Martin Friebe | Note Added: 0082185 | |
2015-03-22 14:45 | Martin Friebe | Note Edited: 0082185 | View Revisions |
2015-03-22 22:06 | malcome | Note Added: 0082190 | |
2015-03-22 22:07 | malcome | Note Edited: 0082190 | View Revisions |
2015-03-22 22:09 | malcome | Note Edited: 0082190 | View Revisions |
2015-03-22 22:17 | Martin Friebe | Note Added: 0082191 | |
2015-03-24 10:31 | malcome | File Added: synedittextdoublewidthchars.pas.patch | |
2015-03-24 10:39 | malcome | Note Added: 0082271 | |
2015-03-24 11:13 | malcome | Note Edited: 0082271 | View Revisions |
2015-03-26 23:34 | Martin Friebe | Note Added: 0082357 | |
2015-03-26 23:34 | Martin Friebe | Status | assigned => feedback |
2015-03-26 23:47 | Martin Friebe | Note Added: 0082358 | |
2015-03-26 23:48 | Martin Friebe | Note Edited: 0082358 | View Revisions |
2015-03-26 23:56 | malcome | Note Added: 0082359 | |
2015-03-26 23:56 | malcome | Status | feedback => assigned |
2015-03-27 00:01 | malcome | Note Added: 0082360 | |
2015-03-27 00:02 | malcome | Note Edited: 0082360 | View Revisions |
2015-03-27 03:20 | Martin Friebe | Note Added: 0082361 | |
2015-03-27 07:31 | Do-wan Kim | Note Added: 0082363 | |
2015-03-27 23:25 | Do-wan Kim | File Added: synedittextdoublewidthchars.pas-inc-ko.patch | |
2015-03-28 00:16 | Martin Friebe | Note Added: 0082387 | |
2015-03-28 06:29 | Do-wan Kim | File Added: synedittextdoublewidthchars.pas-nohack.patch | |
2015-03-28 10:33 | malcome | Note Added: 0082393 | |
2015-03-31 03:24 | Do-wan Kim | File Added: testfontwidth_tool.zip | |
2015-03-31 03:27 | Do-wan Kim | Note Added: 0082485 | |
2015-04-12 06:44 | Dmitry Boyarintsev | File Added: cjkinfo2.zip | |
2015-04-12 06:46 | Dmitry Boyarintsev | Note Added: 0082832 | |
2015-04-12 19:45 | Martin Friebe | Note Added: 0082853 | |
2016-03-16 20:29 | Juha Manninen | Relationship added | related to 0028540 |
2016-03-16 20:32 | Juha Manninen | Relationship added | related to 0026369 |
2016-03-16 20:48 | Juha Manninen | Relationship added | related to 0013374 |
2016-12-11 15:18 | Juha Manninen | Relationship added | related to 0030478 |