If a Unicode string (for example Chinese text) is passed as the parameter of ShowMessage(),
every following character is drawn overlapping the preceding one in the "Source Editor".
This is only a display problem in the editor:
at runtime the text is shown correctly.
At the moment synedit only supports one width for all characters (monospace fonts).
Chinese characters are double wide characters, even in monospace fonts, so the current synedit paints them overlapping.
The best solution would be to extend synedit for proportional fonts, where each character can have its own width (including fractional widths). But this is a lot of work.
Another solution is to handle double wide characters.
I don't think there is a quick solution, except for the "extraCharSpacing" that Mattias already implemented (although it does not (always?) work; see below).
Implementing double-width support first would be the best approach: it is easier than full proportional support, but still a lot of work.
(The work could be reduced significantly by doing it in a fashion similar to how tabs (without "tab to space") work, but that would allow the cursor to go into the middle of a char.)
---
As for the extra char spacing: it doesn't work on my box. It appears that the value is only used in synedit, but not in SynTextDrawer. All the extra space collects at the end of each token.
A bigger issue is that, at least under Windows Vista, the Chinese chars are not painted at all if the Windows function "SetTextCharacterExtra(FDC, Value)" has been used.
So all char spacing would need to be changed to use the FETODist array.
I have a patch in preparation for the extra char spacing, but need feedback on the SetTextCharacterExtra issue first. (I will attach a preview, which disables SetTextCharacterExtra. If it is OK, then SetTextCharacterExtra can be removed completely.)
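To illustrate the idea of putting the extra spacing into a per-character distance array instead of relying on SetTextCharacterExtra, here is a minimal sketch. BuildDistArray and its parameters are hypothetical names, not SynEdit code; the real FETODist handling in SynTextDrawer may be organised differently.

```pascal
program DistArraySketch;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

// Hypothetical helper: returns one advance (in pixels) per character cell.
// The extra spacing is added to every cell, so it is spread evenly
// instead of collecting at the end of a token.
function BuildDistArray(CellCount, CharWidth, ExtraSpacing: Integer): TIntArray;
var
  i: Integer;
begin
  SetLength(Result, CellCount);
  for i := 0 to CellCount - 1 do
    Result[i] := CharWidth + ExtraSpacing;
end;

var
  D: TIntArray;
  i, X: Integer;
begin
  // 5 cells, 8 pixel wide font, 2 pixel extra spacing (made-up values)
  D := BuildDistArray(5, 8, 2);
  X := 0;
  for i := 0 to High(D) do
  begin
    Write(X, ' ');   // x offset at which cell i starts
    Inc(X, D[i]);
  end;
  WriteLn;           // prints: 0 10 20 30 40
end.
```

An array like this could then be handed to the text output call as the per-character advance list, which would make SetTextCharacterExtra unnecessary.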
Edit:
If the SetTextCharacterExtra patch needs more discussion, it should go in another bug.
I am happy to have the original issue kept in my list, but other priorities first.
I agree that SetTextCharacterExtra should be avoided.
Why does the patch only disable TheTextDrawer.DoSetCharExtra instead of removing the code or using IFDEF SYN_LAZARUS?
Ok, I have created a patch for the original issue.
* Warning first: It's a hack, not a full fix. *
(No comments about the code quality please; it's the worst I ever wrote. But I am about to embark on a holiday, and you will not get anything better for at least some weeks, more likely some months, as I have other bugs on higher priority.)
In order to use it, you must compile Lazarus with -dDoubleChrWidthHack (once the patch is applied).
* It does display double width char correctly *
But allows the cursor to go into the middle of a char. (It is based on the code for tabs).
I have created a Japanese patch against the latest release (0.9.26.1).
fixjp09261.zip
* Display double width char correctly
* Caret (= cursor) is placed at the correct position
* Handles IME (character input method) on Windows
* Tested only on Japanese Windows, so it may work only on Japanese Windows.
Some problems still remain;
see the documentation of the patch in the zip file.
If possible, maybe even split it into display + caret handling and IME input.
(Maybe even open a new bug report for IME input, so it can be handled on its own.)
I will then see if I can merge it into the latest SVN.
It may be that some of the functions you modified have been replaced in latest SVN (but similar new ones may exist).
For example all CharWidth-related Functions have been moved into a new set of Units.
If you wish to help porting this to the latest SVN, let me know, and I can give you some pointers about the new Class-Layout.
1) I'll supply a new patch as diffs in a few days.
2) I have opened a bug report about IME: #13140 (closed).
3) Please let me know about the new class layout in the latest SVN.
There are several wrapper classes around TSynEditTextBuffer:
TSynEditTextTrimmer (Trailing spaces)
TSynEditTextTabExpander (Handling tab expansion)
TSynEditFoldedView
The first two are the important ones; TSynEditTextDoubleWidthChar should sit between them.
Those classes are now responsible for converting physical to logical positions and back. To be able to do that, they define a method:
  function GetPhysicalCharWidths(const Line: String; Index: Integer): TPhysicalCharWidths; override;
It returns an array mapping a number to each byte in the text of the line. (Yes, that's byte, not UTF-8 char.)
- A char that occupies 1 display cell has a width of 1
- A multi-byte UTF-8 char has a width of 1 on its first byte, and 0 on the other bytes (since the display space is already allocated)
The 0 Width can also be used to find the Start of a logical char (but that is not needed currently)
- Tabs can have any width
- Chinese/Japanese chars should have a width of 2 on the first byte of the UTF-8 sequence, 0 on the others (that still needs to be done)
However, this was only introduced a few days ago (I just started to work on this issue last weekend), so I am currently working on getting those widths recognized throughout the remainder of SynEdit.
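As an illustration of the per-byte width array described above, here is a minimal standalone sketch. It is not the SynEdit implementation: SketchPhysicalCharWidths, IsContinuationByte and the TabWidth parameter are names made up for this example, and double-width chars are deliberately not handled yet.

```pascal
program PhysWidthsSketch;
{$mode objfpc}{$H+}

type
  TPhysicalCharWidths = array of Integer;

// Hypothetical helper: True for bytes of the form 10xxxxxx,
// i.e. the continuation bytes of a multi-byte UTF-8 sequence.
function IsContinuationByte(B: Byte): Boolean;
begin
  Result := (B and $C0) = $80;
end;

// Returns one width entry per BYTE of Line (not per UTF-8 char).
function SketchPhysicalCharWidths(const Line: String;
  TabWidth: Integer): TPhysicalCharWidths;
var
  i, Col: Integer;
begin
  SetLength(Result, Length(Line));
  Col := 0; // physical column reached so far (0-based)
  for i := 1 to Length(Line) do
  begin
    if Line[i] = #9 then
      // A tab fills up to the next tab stop, so its width varies.
      Result[i - 1] := TabWidth - (Col mod TabWidth)
    else if IsContinuationByte(Ord(Line[i])) then
      // The display space was already allocated on the first byte.
      Result[i - 1] := 0
    else
      // ASCII or first byte of a sequence: one cell in this sketch.
      Result[i - 1] := 1;
    Inc(Col, Result[i - 1]);
  end;
end;

var
  W: TPhysicalCharWidths;
  i: Integer;
begin
  // 'a', TAB, U+00E4 (two bytes: C3 A4), 'b'
  W := SketchPhysicalCharWidths('a'#9#$C3#$A4'b', 4);
  for i := 0 to High(W) do
    Write(W[i], ' ');
  WriteLn; // prints: 1 3 1 0 1
end.
```

A double-width class would only need to return 2 instead of 1 on the first byte of the affected sequences; everything else stays the same.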
---
Once a class providing those widths for Chinese/Japanese chars exists, the rest can be based on it.
- I will have the display recognize this info
- The code you wrote can probably be moved easily into that class
(The important thing is to get rid of the need to transform the text into UTF-16, as in my initial patch)
- caret correction can be done on this info too.
The only tricky part may be if some users want to keep the current behaviour of stepping into a tab, because then caret correction must know the difference between a tab and a double-width char.
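A rough sketch of such a caret correction, based only on the per-byte width array: SnapCaret and the AllowInsideTab flag are hypothetical names for this example, not existing SynEdit API. The width array alone cannot tell a tab from a double-width char, so the sketch also looks at the line text.

```pascal
program CaretSnapSketch;
{$mode objfpc}{$H+}

type
  TPhysicalCharWidths = array of Integer;

// Snaps a 0-based physical column to the left edge of the char it falls
// into. Widths holds one entry per byte of Line. If AllowInsideTab is
// True, a column inside a tab is returned unchanged.
function SnapCaret(const Line: String; const Widths: TPhysicalCharWidths;
  PhysCol: Integer; AllowInsideTab: Boolean): Integer;
var
  i, StartCol: Integer;
begin
  StartCol := 0;
  for i := 0 to High(Widths) do
  begin
    if (PhysCol > StartCol) and (PhysCol < StartCol + Widths[i]) then
    begin
      // The caret is strictly inside a char that is wider than one cell.
      if AllowInsideTab and (Line[i + 1] = #9) then
        Exit(PhysCol);
      Exit(StartCol);
    end;
    Inc(StartCol, Widths[i]);
  end;
  Result := PhysCol; // already on a boundary, or past the end of the line
end;

var
  L: String;
  W: TPhysicalCharWidths;
begin
  // U+3042 (three bytes, two cells), TAB (two cells here), 'x'
  L := #$E3#$81#$82#9'x';
  SetLength(W, 5);
  W[0] := 2; W[1] := 0; W[2] := 0; W[3] := 2; W[4] := 1;
  WriteLn(SnapCaret(L, W, 1, True));  // inside the double-width char: 0
  WriteLn(SnapCaret(L, W, 3, True));  // inside the tab, allowed: 3
  WriteLn(SnapCaret(L, W, 3, False)); // inside the tab, not allowed: 2
end.
```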
I suggest you download a daily SVN snapshot to look at this.
Some more review of the attached code:
In TextDrawer you are dropping DistArray. This will break the editor for a lot of other people.
I may be mistaken, since I haven't looked at every single change, but it seems that in UTF8ColumnWidth (which presumably should return whether a char is double width?) you have:
  // In Japanese font, multi byte character width is double to single byte character width
  len := UTF8CharacterStrictLength
  if len <= 1 then Result := len else Result := 2;
This may hold true if you handle Japanese only. But in UTF-8 in general (looking at Japanese, Chinese, Arabic, European, ...) a multi-byte UTF-8 char can be either double or single width.
Try putting an umlaut or an accented western char into the middle of Japanese text.
If I am wrong, I am looking forward to seeing the code that actually detects the double/single char width.
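To make the point concrete, here is a small sketch that decides the width from the decoded code point rather than from the byte length. DecodeUtf8 and CellWidth are hypothetical helpers written for this example, and the ranges in CellWidth are only a rough, incomplete approximation of the wide and fullwidth blocks, not a complete table.

```pascal
program DoubleWidthSketch;
{$mode objfpc}{$H+}

// Decodes the UTF-8 sequence starting at S[Index] and returns the code
// point; ByteLen receives the sequence length in bytes.
// Assumes well-formed input (no validation in this sketch).
function DecodeUtf8(const S: String; Index: Integer;
  out ByteLen: Integer): Cardinal;
var
  B: Byte;
  k: Integer;
begin
  B := Ord(S[Index]);
  if B < $80 then begin ByteLen := 1; Result := B; end
  else if (B and $E0) = $C0 then begin ByteLen := 2; Result := B and $1F; end
  else if (B and $F0) = $E0 then begin ByteLen := 3; Result := B and $0F; end
  else begin ByteLen := 4; Result := B and $07; end;
  for k := 1 to ByteLen - 1 do
    Result := (Result shl 6) or (Ord(S[Index + k]) and $3F);
end;

// Returns 2 for code points in a few well-known wide ranges, else 1.
// ASSUMPTION: simplified ranges, chosen for illustration only.
function CellWidth(CodePoint: Cardinal): Integer;
begin
  if ((CodePoint >= $1100) and (CodePoint <= $115F)) or  // Hangul Jamo
     ((CodePoint >= $2E80) and (CodePoint <= $A4CF)) or  // CJK, Kana, Yi
     ((CodePoint >= $AC00) and (CodePoint <= $D7A3)) or  // Hangul syllables
     ((CodePoint >= $F900) and (CodePoint <= $FAFF)) or  // CJK compatibility
     ((CodePoint >= $FF00) and (CodePoint <= $FF60)) or  // fullwidth forms
     ((CodePoint >= $FFE0) and (CodePoint <= $FFE6)) then
    Result := 2
  else
    Result := 1;
end;

var
  S: String;
  i, Len: Integer;
  CP: Cardinal;
begin
  // U+3042 HIRAGANA A (E3 81 82), U+00E4 a-umlaut (C3 A4), 'x'
  S := #$E3#$81#$82#$C3#$A4'x';
  i := 1;
  while i <= Length(S) do
  begin
    CP := DecodeUtf8(S, i, Len);
    WriteLn('bytes=', Len, ' cells=', CellWidth(CP));
    Inc(i, Len);
  end;
end.
```

The umlaut is a two-byte sequence but occupies one cell, while the Hiragana char occupies two, so the byte length alone cannot decide the width.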
I will keep this bug open for a short time, in order to allow feedback if anything is still broken. (closing the bug may restrict feedback to the original reporter).
If you have issues with specific chars, please provide the UTF8 code, thanks.
Thank you, Martin.
Now I understand something from your note and new SVN source code.
My patch seems to be too old to be merged.
I agree that my patch is adapted to the old code and to Japanese Windows only, as I noted. (UTF8ColumnWidth only works for Japanese chars. You are right, it should be replaced for an international context.)
Sorry, I only downloaded the daily SVN snapshot yesterday. Some functions should be replaced by the daily SVN functions, and I must learn Unicode handling for other languages.
I will test and begin working on the daily SVN instead of the latest release.
@Saeka-jp:
Thanks for your feedback. The new patch format is better, since it is easier for me to see what changed.
If you can extract the IME parts and attach to the other bug-report, that will be helpful.
I am sorry, but the part for this bug will not be needed any more: it overlapped with my own fix. (The fix will be published in the snapshot for 2009/02/12.)