View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0031991 | Lazarus | IDE | public | 2017-06-09 19:20 | 2017-06-12 15:58 |
Reporter | CudaText man | Assigned To | Juha Manninen | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | Ubuntu 16.4 gtk2 | ||||
Product Version | 1.9 (SVN) | ||||
Summary | 0031991: OI help area wrong for TCombobox.Style | ||||
Description | The picture shows that the OI help area displays the wrong text: the list of values is missing [2 lists in UL-LI tags]. FPDoc shows the correct text in its own area. | ||||
Tags | No tags attached. | ||||
Fixed in Revision | r55307, r55319, r55325, r55329, r55336 | ||||
LazTarget | - | ||||
Widgetset | |||||
Attached Files |
|
|
|
|
Fixed, please test. I see you don't use the TurboPowerIProDsgn package, which gives a nice HTML rendering for code help in editor hints and in the OI Infobox. Without it the text looks butt-ugly. |
|
Still not nice: too few line breaks here. The 5 Combobox styles should appear on 5 separate lines, as the picture shows. |
|
|
|
The formatting is totally screwed without HTML rendering. Fortunately TurboPowerIProDsgn works on every platform and is installed by default. It is now the "standard" way to look at code help. Does it work well for you? After my fix all list items from the original XML file are included, aren't they? If you want to improve text rendering without HTML, please look at the function HTMLToCaption() in unit IDEHelpManager. It only strips the tags out and copies the text without any formatting. For most people this is a low-priority issue because HTML rendering works well. If you plan to provide a patch, I can keep this issue open for a while. Otherwise it will be closed soon. The task is not trivial: the code must do partly the same things that an HTML parser + renderer already does. |
|
Another idea: there must be some "HTML to plain text" rendering engines out there. If you find one with a proper license we could integrate it. It cannot show graphs or different font sizes but it could render text as nicely as possible. Such code should not be very big. We don't want to bloat Lazarus with code that is almost never used. Remember, most people use the HTML rendering provided by TurboPowerIProDsgn. [Edit] After thinking a little I realized that even the HTMLToCaption() function could be improved easily without implementing any state machine. Spaces could be removed after a "p" tag, list items would force a newline, etc. |
|
|
|
fix-html.diff (576 bytes)
Index: ide/idehelpmanager.pas
===================================================================
--- ide/idehelpmanager.pas (revision 55311)
+++ ide/idehelpmanager.pas (working copy)
@@ -380,8 +380,14 @@
   sp: LongInt;
   InHeader: Boolean;
   CurTagName: String;
+const
+  cReplacerForLI = LineEnding+'<br> * ';
 begin
   Result:=s;
+
+  Result:=StringReplace(Result, '<li>', cReplacerForLI, [rfReplaceAll]);
+  Result:=StringReplace(Result, '<LI>', cReplacerForLI, [rfReplaceAll]);
+
   //debugln(['HTMLToCaption HTML="',Result,'"']);
   Line:=1;
   p:=1;
|
Thanks for the note about HTMLToCaption. I added a fix for the LI tag; the picture shows the result. |
|
Actually HTMLToCaption() did more layouting than I remembered but it didn't work very well with lots of whitespace. I ended up making a proper parser / renderer after all in r55319. It is a general purpose class, not specific to the IDE help system, so I placed it in LazUtils package. Now I feel I wasted a lot of time. Something in parsers is pulling me. Damn! The parser is robust and can be easily extended. For example the attribute in <div class="title"> could be parsed and used. Please test. How does it work? |
|
That was no small piece of work. Good... It would be good to use "const" params in Render() and AddOutput(), and to name the param "aStream". |
|
Wish: add a LineEnding property (defaulting to the OS LineEnding), so that #10 can be used. |
|
It may be slower, but it would be good to delete HtmlEntity() and use simple post-processing instead: s:=StringReplace(s, '....', '<', [rfReplaceAll]); |
|
Why would StringReplace be good? It would be MUCH slower, you are right about that. Did you notice my renderer does not copy the same big memory areas many times, it copies char by char only once what is needed? LineEnding (with some other name) could be a useful property for somebody although not needed for the current use case. |
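To illustrate the trade-off discussed in this note, here is a minimal sketch of single-pass entity decoding, where each input char is read and written once instead of the whole string being rescanned and reallocated for every StringReplace call. It is not the code from r55319: the names LookupEntity and DecodeEntities are hypothetical, and only four entities are handled, as an assumption to keep the example short.

```pascal
program EntityDecodeSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: maps a few entity names to characters.
// Returns False for names it does not know.
function LookupEntity(const aName: string; out aChar: Char): Boolean;
begin
  Result := True;
  aChar := #0;
  case aName of
    'lt':   aChar := '<';
    'gt':   aChar := '>';
    'amp':  aChar := '&';
    'quot': aChar := '"';
  else
    Result := False;
  end;
end;

// Decodes entities in one left-to-right pass over the input.
// The output is never longer than the input, so the buffer is
// allocated once and trimmed at the end.
function DecodeEntities(const s: string): string;
var
  i, j, outLen: Integer;
  c: Char;
  buf: string;
begin
  SetLength(buf, Length(s));
  outLen := 0;
  i := 1;
  while i <= Length(s) do
  begin
    if s[i] = '&' then
    begin
      // Scan the candidate entity name up to the first non-letter.
      j := i + 1;
      while (j <= Length(s)) and (s[j] in ['a'..'z', 'A'..'Z']) do
        Inc(j);
      if (j <= Length(s)) and (s[j] = ';')
         and LookupEntity(Copy(s, i + 1, j - i - 1), c) then
      begin
        Inc(outLen);
        buf[outLen] := c;
        i := j + 1;
        Continue;
      end;
    end;
    // Ordinary char, or '&' without a known entity: copy verbatim.
    Inc(outLen);
    buf[outLen] := s[i];
    Inc(i);
  end;
  SetLength(buf, outLen);
  Result := buf;
end;

begin
  WriteLn(DecodeEntities('a &lt; b &amp;&amp; c &gt; d'));
  WriteLn(DecodeEntities('&xxx stays as it is'));
end.
```

With one StringReplace call per entity, the cost grows with the number of entities times the text length. The loop above stays linear in the text length and leaves an '&' that does not start a known entity untouched.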
|
Parser works ok for me, for k OI properties result is good. |
|
Just to consider: extracting text from HTML would be a simple exercise for the fasthtmlparser:

unit html2text;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils;

function ExtractTextFromHTML(const AHTMLText: String): String;

implementation

uses
  fasthtmlparser;

type
  THTMLTextExtractor = class
  private
    FParser: THTMLParser;
    FText: String;
  protected
    procedure FoundTextHandler(AText: String);
  public
    constructor Create(AHTMLText: String);
    destructor Destroy; override;
    function Execute: String;
  end;

constructor THTMLTextExtractor.Create(AHTMLText: String);
begin
  FParser := THTMLParser.Create(AHTMLText);
  FParser.OnFoundText := @FoundTextHandler;
end;

destructor THTMLTextExtractor.Destroy;
begin
  FParser.Free;
  inherited;
end;

function THTMLTextExtractor.Execute: String;
begin
  FText := '';
  FParser.Exec;
  Result := FText;
end;

procedure THTMLTextExtractor.FoundTextHandler(AText: String);
begin
  if AText = '' then
    exit;
  // Remove multiple line breaks from text start
  if (AText[1] in [#10, #13]) then
  begin
    while (AText <> '') and (AText[1] in [#10, #13]) do
      Delete(AText, 1, 1);
    AText := LineEnding + AText;
    if AText = '' then
      exit;
  end;
  // ... and from text end
  if (AText[Length(AText)] in [#10, #13]) then
  begin
    while (AText <> '') and (AText[Length(AText)] in [#10, #13]) do
      Delete(AText, Length(AText), 1);
    AText := AText + LineEnding;
    if AText = '' then
      exit;
  end;
  FText := FText + AText;
end;

function ExtractTextFromHTML(const AHTMLText: String): String;
var
  extractor: THTMLTextExtractor;
begin
  extractor := THTMLTextExtractor.Create(AHTMLText);
  try
    Result := extractor.Execute;
  finally
    extractor.Free;
  end;
end;

end.
|
+ 'A': // Link
+   Result:=AddOutput(' 👀');
+ '/A':
+   Result:=AddOutput('👀 ');
Eye chars?? This must be a property, and '[]' chars would be better, IMO. |
|
Result:=AddOutput('🔹');
And a property for this char too, please. |
|
+ Result:=AddOutput('&'); // Entity not found, add just '&'.
This needs a property too, and a "?" char would be better. |
|
I removed the eyes and added a TitleMark property in r55329. Unicode Emojis give nice opportunities for layout. They are essentially graphics inside text. IMO '🔹' looks good with a title. '&' without an entity is not legal HTML, but if one is encountered then it must be copied verbatim. Why would you change it to '?'? If the input is '&xxx', the output must also be '&xxx' and not '?xxx'. @wp: Yes, I believe fasthtmlparser and SAX could be used. However the code does not only extract text from HTML, it also renders it within the confines of pure text output. To my surprise I did not find such code. My class is loosely based on the original HTMLToCaption() function by Mattias. The function copied large memory blocks repeatedly while removing tags and thus was slow with big HTML. I got kind of carried away when making an optimized class. BTW, your example code removes newlines but it should remove the excess spaces, too. Delete(AText, 1, 1) inside a big loop is butt-slow. :) I am resolving this issue. The code can be discussed on the mailing list or forum. Patches can be added. |
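As an illustration of the remark about Delete(AText, 1, 1), the same trimming of leading and trailing line breaks can be done by moving two indexes and copying the kept range once. This is only a sketch under assumptions: the function name NormalizeEdgeBreaks is hypothetical, it is not part of fasthtmlparser or LazUtils, and like the FoundTextHandler example it does not touch excess spaces.

```pascal
program TrimBreaksSketch;
{$mode objfpc}{$H+}

// Collapses a run of line breaks at the start and at the end of aText
// to a single LineEnding each. Scans with indexes and copies the kept
// range once, instead of shifting the string on every Delete call.
function NormalizeEdgeBreaks(const aText: string): string;
var
  first, last: Integer;
begin
  first := 1;
  last := Length(aText);
  // Skip line break chars at the start.
  while (first <= last) and (aText[first] in [#10, #13]) do
    Inc(first);
  // Skip line break chars at the end.
  while (last >= first) and (aText[last] in [#10, #13]) do
    Dec(last);
  Result := Copy(aText, first, last - first + 1);
  if first > 1 then
    Result := LineEnding + Result;   // there was at least one leading break
  if last < Length(aText) then
    Result := Result + LineEnding;   // there was at least one trailing break
end;

begin
  Write(NormalizeEdgeBreaks(#10#10 + 'first line' + #13#10));
  WriteLn(NormalizeEdgeBreaks('second line'));
end.
```

Each Delete(AText, 1, 1) moves the rest of the string, so removing k leading chars from a text of length n costs on the order of k*n char moves. The index scan costs k comparisons plus one copy.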
|
rendr.diff (3,282 bytes)
Index: components/lazutils/html2textrender.pas
===================================================================
--- components/lazutils/html2textrender.pas (revision 55332)
+++ components/lazutils/html2textrender.pas (working copy)
@@ -31,11 +31,17 @@
   private
     fHTML, fOutput: string;
     fMaxLines: integer;
-    fLineEndMark: String; // End of line, by default std. "LineEnding".
-    fTitleMark: String; // Text at start and end of title text, by default Unicode graph.
+    fLineEndMark: String; // End of line, by default standard LineEnding
+    fTitleMark: String; // Text at start/end of title text: <div class="title">...</div>
+    fHorzLine: String; // Text for <hr> tag
+    fLinkBegin: String; // Text before link, <a href="...">
+    fLinkEnd: String; // Text after link
+    fListItemMark: String; // Text for <li> items
+    fMoreMark: String; // Text to add if too many lines
     fInHeader, fInDivTitle: Boolean;
     fPendingSpace: Boolean;
     fPendingNewLineCnt: Integer;
+    fIndentSize: integer; // Increment (in spaces) for each nested HTML level
     fIndent: integer;
     fLineCnt, fHtmlLen: Integer;
     p: Integer;
@@ -53,6 +59,12 @@
   public
     property LineEndMark: String read fLineEndMark write fLineEndMark;
     property TitleMark: String read fTitleMark write fTitleMark;
+    property HorzLineMark: String read fHorzLine write fHorzLine;
+    property LinkBeginMark: String read fLinkBegin write fLinkBegin;
+    property LinkEndMark: String read fLinkEnd write fLinkEnd;
+    property ListItemMark: String read fListItemMark write fListItemMark;
+    property MoreMark: String read fMoreMark write fMoreMark;
+    property IndentSize: integer read fIndentSize write fIndentSize;
   end;
 
 implementation
@@ -68,6 +80,12 @@
   // These can be changed by user later.
   fLineEndMark:=LineEnding;
   fTitleMark:='🔹';
+  fHorzLine:= '——————————————————';
+  fLinkBegin:='_';
+  fLinkEnd:='_';
+  fListItemMark:='* ';
+  fMoreMark:='...';
+  fIndentSize:=2;
 end;
 
 constructor THTML2TextRenderer.Create(const Stream: TStream);
@@ -122,13 +140,13 @@
     // Return False if max # of lines exceeded.
     if fLineCnt>fMaxLines then
     begin
-      fOutput:=fOutput+fLineEndMark+'...';
+      fOutput:=fOutput+fLineEndMark+fMoreMark;
       Exit(False);
     end;
   end;
   if fPendingNewLineCnt>0 then
   begin
-    fOutput:=fOutput+StringOfChar(' ',fIndent*2);
+    fOutput:=fOutput+StringOfChar(' ',fIndent*fIndentSize);
     fPendingNewLineCnt:=0;
   end;
   fOutput:=fOutput+aText;
@@ -211,18 +229,18 @@
         Inc(fIndent);
         // Don't leave empty lines before list item (not sure if this is good)
         AddOneNewLine;
-        Result:=AddOutput('* ');
+        Result:=AddOutput(fListItemMark);
       end;
     '/LI':
       Dec(fIndent);
     'A': // Link
-      Result:=AddOutput(' _');
+      Result:=AddOutput(' '+fLinkBegin);
     '/A':
-      Result:=AddOutput('_ ');
+      Result:=AddOutput(fLinkEnd+' ');
     'HR':
       begin
         AddOneNewLine;
-        Result:=AddOutput('——————————————————');
+        Result:=AddOutput(fHorzLine);
         //AddOneNewLine;
       end;
   end;
|
Did a refactoring: 6 new properties, patch added. |
|
Applied, although I don't find some of the properties very useful. For example, who would want to change the '...' at the end of truncated output? I renamed one property to IndentStep. |
|
@Juha there is a Unicode char for "3 dots". |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-06-09 19:20 | CudaText man | New Issue | |
2017-06-09 19:20 | CudaText man | File Added: combo-doc-bad.png | |
2017-06-10 16:30 | Juha Manninen | Assigned To | => Juha Manninen |
2017-06-10 16:30 | Juha Manninen | Status | new => assigned |
2017-06-10 18:37 | Juha Manninen | Fixed in Revision | => r55307 |
2017-06-10 18:37 | Juha Manninen | LazTarget | => - |
2017-06-10 18:37 | Juha Manninen | Note Added: 0101002 | |
2017-06-10 18:37 | Juha Manninen | Status | assigned => resolved |
2017-06-10 18:37 | Juha Manninen | Resolution | open => fixed |
2017-06-10 18:47 | CudaText man | Note Added: 0101004 | |
2017-06-10 18:47 | CudaText man | Status | resolved => assigned |
2017-06-10 18:47 | CudaText man | Resolution | fixed => reopened |
2017-06-10 18:47 | CudaText man | File Added: infobox.png | |
2017-06-10 19:07 | Juha Manninen | Note Added: 0101005 | |
2017-06-10 19:11 | Juha Manninen | Note Edited: 0101005 | View Revisions |
2017-06-10 19:18 | Juha Manninen | Note Edited: 0101005 | View Revisions |
2017-06-10 19:29 | Juha Manninen | Note Added: 0101007 | |
2017-06-10 20:08 | Juha Manninen | Note Edited: 0101007 | View Revisions |
2017-06-10 20:08 | Juha Manninen | Note Edited: 0101007 | View Revisions |
2017-06-10 22:21 | CudaText man | File Added: fixed1.png | |
2017-06-10 22:22 | CudaText man | File Added: fix-html.diff | |
2017-06-10 22:22 | CudaText man | Note Added: 0101011 | |
2017-06-11 20:43 | Juha Manninen | Note Added: 0101025 | |
2017-06-11 21:06 | CudaText man | Note Added: 0101030 | |
2017-06-11 21:09 | CudaText man | Note Added: 0101031 | |
2017-06-11 21:09 | CudaText man | Note Edited: 0101031 | View Revisions |
2017-06-11 21:38 | CudaText man | Note Added: 0101034 | |
2017-06-11 21:39 | CudaText man | Note Edited: 0101034 | View Revisions |
2017-06-11 22:21 | Juha Manninen | Note Added: 0101037 | |
2017-06-12 06:33 | CudaText man | Note Added: 0101043 | |
2017-06-12 09:47 | wp | Note Added: 0101046 | |
2017-06-12 10:55 | CudaText man | Note Added: 0101052 | |
2017-06-12 10:56 | CudaText man | Note Added: 0101053 | |
2017-06-12 10:58 | CudaText man | Note Added: 0101055 | |
2017-06-12 10:59 | CudaText man | Note Edited: 0101055 | View Revisions |
2017-06-12 11:43 | Juha Manninen | Note Added: 0101057 | |
2017-06-12 11:51 | Juha Manninen | Note Edited: 0101057 | View Revisions |
2017-06-12 11:54 | Juha Manninen | Note Edited: 0101057 | View Revisions |
2017-06-12 11:55 | Juha Manninen | Fixed in Revision | r55307 => r55307, r55319, r55329 |
2017-06-12 11:55 | Juha Manninen | Status | assigned => resolved |
2017-06-12 11:55 | Juha Manninen | Resolution | reopened => fixed |
2017-06-12 11:56 | Juha Manninen | Fixed in Revision | r55307, r55319, r55329 => r55307, r55319, r55325, r55329 |
2017-06-12 12:12 | Juha Manninen | Note Edited: 0101057 | View Revisions |
2017-06-12 12:18 | Juha Manninen | Note Edited: 0101057 | View Revisions |
2017-06-12 12:21 | Juha Manninen | Note Edited: 0101057 | View Revisions |
2017-06-12 13:10 | CudaText man | File Added: rendr.diff | |
2017-06-12 13:10 | CudaText man | Note Added: 0101058 | |
2017-06-12 13:11 | CudaText man | Status | resolved => assigned |
2017-06-12 13:11 | CudaText man | Resolution | fixed => reopened |
2017-06-12 14:22 | Juha Manninen | Fixed in Revision | r55307, r55319, r55325, r55329 => r55307, r55319, r55325, r55329, r55336 |
2017-06-12 14:22 | Juha Manninen | Note Added: 0101063 | |
2017-06-12 14:22 | Juha Manninen | Status | assigned => resolved |
2017-06-12 14:22 | Juha Manninen | Resolution | reopened => fixed |
2017-06-12 15:58 | CudaText man | Note Added: 0101064 |