View Issue Details

ID: 0031991
Project: Lazarus
Category: IDE
View Status: public
Last Update: 2017-06-12 15:58
Reporter: CudaText man
Assigned To: Juha Manninen
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Ubuntu 16.4 gtk2
Product Version: 1.9 (SVN)
Summary: 0031991: OI help area wrong for TCombobox.Style
Description: The picture shows that the area displays the wrong text; the lists of values are missing [2 lists in UL-LI tags].

FPDoc shows OK text in its area.
Tags: No tags attached.
Fixed in Revision: r55307, r55319, r55325, r55329, r55336
LazTarget: -
Widgetset:
Attached Files

Activities

CudaText man

2017-06-09 19:20

reporter  

combo-doc-bad.png (177,600 bytes)   

Juha Manninen

2017-06-10 18:37

developer   ~0101002

Fixed, please test.

I see you don't use the TurboPowerIProDsgn package, which gives nice HTML rendering for code help in editor hints and in the OI Infobox.
Without it the text looks butt-ugly.

CudaText man

2017-06-10 18:47

reporter   ~0101004

Still not nice: there are too few line breaks here.
The 5 Combobox styles should appear on 5 separate lines, as the picture shows.

CudaText man

2017-06-10 18:47

reporter  

infobox.png (72,209 bytes)   

Juha Manninen

2017-06-10 19:07

developer   ~0101005

Last edited: 2017-06-10 19:18


The formatting is totally screwed without HTML rendering. Fortunately TurboPowerIProDsgn works on every platform and it is installed by default.
It is now the "standard" way to look at code help.
Does it work well for you?

After my fix all list items from the original XML file are included, aren't they?
If you want to improve text rendering without HTML, please look at function HTMLToCaption() in unit IDEHelpManager.
It only strips the tags out and copies the text without any formatting.
For most people this is a low priority issue because HTML rendering works well.

If you plan to provide a patch then I can keep this issue open for a while. Otherwise it closes soon.
The task is not trivial. The code must partly do the same things that an HTML parser + renderer already does.

Juha Manninen

2017-06-10 19:29

developer   ~0101007

Last edited: 2017-06-10 20:08


Another idea: there must be some "HTML to plain text" rendering engines out there. If you find one with a proper license we could integrate it.
It cannot show graphs or different font sizes but it could render text as nicely as possible.
Such code should not be very big. We don't want to bloat Lazarus with code that is almost never used. Remember, most people use the HTML rendering provided by TurboPowerIProDsgn.

[Edit]
After thinking a little I realized that even the HTMLToCaption() function could be improved easily without implementing any state machine.
Spaces could be removed after "p" tag, list items would force a newline etc...
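
The idea above (drop tags, force a newline at list items, squeeze whitespace) can be sketched roughly as below. This is only an illustration under stated assumptions: HtmlToPlainSketch is a hypothetical name, it is not the IDE's HTMLToCaption() and not the renderer that was committed later, and the fixed #10 newline and the '* ' bullet are choices made just for the example.

```pascal
{$mode objfpc}{$H+}

// Hypothetical sketch, not the IDE's HTMLToCaption(): one pass over the input,
// tags are dropped, <li>/<p>/<br> start a new line, whitespace runs collapse.
function HtmlToPlainSketch(const aHtml: string): string;
const
  NL = #10;  // fixed newline so the output is the same on every platform
var
  i: Integer;
  TagName: string;
  PendingSpace, AtLineStart: Boolean;
begin
  Result := '';
  PendingSpace := False;
  AtLineStart := True;
  i := 1;
  while i <= Length(aHtml) do
  begin
    if aHtml[i] = '<' then
    begin
      // Read the upper-cased tag name, then skip to the closing '>'.
      Inc(i);
      TagName := '';
      while (i <= Length(aHtml)) and not (aHtml[i] in ['>', ' ', #9, #10, #13]) do
      begin
        TagName := TagName + UpCase(aHtml[i]);
        Inc(i);
      end;
      while (i <= Length(aHtml)) and (aHtml[i] <> '>') do
        Inc(i);
      Inc(i);  // step past '>'
      if (TagName = 'LI') or (TagName = 'P') or (TagName = 'BR') then
      begin
        // Line-breaking tag: start a new line unless nothing was output yet.
        if Result <> '' then
          Result := Result + NL;
        if TagName = 'LI' then
          Result := Result + '* ';
        AtLineStart := True;
        PendingSpace := False;  // whitespace before the tag is dropped
      end;
      // Every other tag (including closing tags) is simply skipped.
    end
    else if aHtml[i] in [' ', #9, #10, #13] then
    begin
      // Only remember that whitespace was seen; at most one space is emitted.
      PendingSpace := True;
      Inc(i);
    end
    else
    begin
      if PendingSpace and not AtLineStart then
        Result := Result + ' ';
      Result := Result + aHtml[i];
      PendingSpace := False;
      AtLineStart := False;
      Inc(i);
    end;
  end;
end;
```

Called on '<ul><li>csDropDown</li><li>csSimple</li></ul>' it returns two lines, each starting with '* '. Self-closing forms such as <br/>, entities and nested indentation are deliberately not handled here.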

CudaText man

2017-06-10 22:21

reporter  

fixed1.png (73,335 bytes)   

CudaText man

2017-06-10 22:22

reporter  

fix-html.diff (576 bytes)   
Index: ide/idehelpmanager.pas
===================================================================
--- ide/idehelpmanager.pas	(revision 55311)
+++ ide/idehelpmanager.pas	(working copy)
@@ -380,8 +380,14 @@
   sp: LongInt;
   InHeader: Boolean;
   CurTagName: String;
+const
+  cReplacerForLI = LineEnding+'<br>&nbsp;*&nbsp;';  
 begin
   Result:=s;
+  
+  Result:=StringReplace(Result, '<li>', cReplacerForLI, [rfReplaceAll]);
+  Result:=StringReplace(Result, '<LI>', cReplacerForLI, [rfReplaceAll]);
+  
   //debugln(['HTMLToCaption HTML="',Result,'"']);
   Line:=1;
   p:=1;

CudaText man

2017-06-10 22:22

reporter   ~0101011

Thanks for the note about HTMLToCaption. I added a fix for the LI tag, and the picture shows the result.

Juha Manninen

2017-06-11 20:43

developer   ~0101025

Actually HTMLToCaption() did more layouting than I remembered but it didn't work very well with lots of whitespace.
I ended up making a proper parser / renderer after all in r55319.
It is a general purpose class, not specific to the IDE help system, so I placed it in LazUtils package.
Now I feel I wasted a lot of time. Something in parsers is pulling me. Damn!

The parser is robust and can be easily extended. For example the attribute in <div class="title"> could be parsed and used.

Please test. How does it work?

CudaText man

2017-06-11 21:06

reporter   ~0101030

That was no small amount of work. Good...
It would be good to use "const" params in Render() and AddOutput(),
and to name the param "aStream".

CudaText man

2017-06-11 21:09

reporter   ~0101031

Last edited: 2017-06-11 21:09


Wish: add a LineEnding property (defaulting to the OS LineEnding), so that #10 can be used.

CudaText man

2017-06-11 21:38

reporter   ~0101034

Last edited: 2017-06-11 21:39


It may be slower, but it would be good to
delete HtmlEntity() and use simple post-processing instead:

s:=StringReplace(s, '....', '<', [rfReplaceAll]);

Juha Manninen

2017-06-11 22:21

developer   ~0101037

Why would StringReplace be good? It would be MUCH slower, you are right about that.
Did you notice that my renderer does not copy the same big memory areas many times? It copies only what is needed, char by char, and only once.

LineEnding (with some other name) could be a useful property for somebody although not needed for the current use case.
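
The performance point can be sketched like this. Each StringReplace call scans the whole string and builds a new copy, so one call per entity means many passes; a single loop can instead write every output char exactly once. The code below is only an illustration, not the renderer from r55319: the function name, the small entity table, the mapping of nbsp to a plain space and the handling of unknown entities (the '&' is kept as-is) are assumptions made for the example.

```pascal
{$mode objfpc}{$H+}

// Hypothetical sketch (not the committed renderer): decode a few entities in
// one pass. Each input char is read once and each output char is written once.
function DecodeEntitiesOnePass(const S: string): string;
var
  i, j, OutLen: Integer;
  EntName: string;
  Ch: Char;
begin
  // Decoded text is never longer than the input, so allocate once up front.
  SetLength(Result, Length(S));
  OutLen := 0;
  i := 1;
  while i <= Length(S) do
  begin
    Ch := S[i];
    if Ch = '&' then
    begin
      // Look a short distance ahead for the ';' that ends the entity name.
      j := i + 1;
      while (j <= Length(S)) and (j - i <= 6) and (S[j] <> ';') do
        Inc(j);
      if (j <= Length(S)) and (S[j] = ';') then
        EntName := Copy(S, i + 1, j - i - 1)
      else
        EntName := '';
      Ch := #0;
      if EntName = 'lt' then Ch := '<'
      else if EntName = 'gt' then Ch := '>'
      else if EntName = 'amp' then Ch := '&'
      else if EntName = 'quot' then Ch := '"'
      else if EntName = 'nbsp' then Ch := ' ';  // assumption: plain space
      if Ch <> #0 then
        i := j       // known entity: continue after its ';'
      else
        Ch := '&';   // unknown: keep '&'; the following chars are copied as text
    end;
    Inc(OutLen);
    Result[OutLen] := Ch;
    Inc(i);
  end;
  SetLength(Result, OutLen);  // trim the unused tail
end;
```

For example, DecodeEntitiesOnePass('a &lt; b') gives 'a < b', and an input such as '&xxx' comes back unchanged.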

CudaText man

2017-06-12 06:33

reporter   ~0101043

Parser works ok for me, for k OI properties result is good.

wp

2017-06-12 09:47

developer   ~0101046

Just to consider: Extracting text from html would be a simple exercise for the fasthtmlparser


unit html2text;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils;

function ExtractTextFromHTML(const AHTMLText: String): String;

implementation

uses
  fasthtmlparser;

type
  THTMLTextExtractor = class
  private
    FParser: THTMLParser;
    FText: String;
  protected
    procedure FoundTextHandler(AText: String);
  public
    constructor Create(AHTMLText: String);
    destructor Destroy; override;
    function Execute: String;
  end;

constructor THTMLTextExtractor.Create(AHTMLText: String);
begin
  FParser := THTMLParser.Create(AHTMLText);
  FParser.OnFoundText := @FoundTextHandler;
end;

destructor THTMLTextExtractor.Destroy;
begin
  FParser.Free;
  inherited;
end;

function THTMLTextExtractor.Execute: String;
begin
  FText := '';
  FParser.Exec;
  Result := FText;
end;

procedure THTMLTextExtractor.FoundTextHandler(AText: String);
begin
  if AText = '' then
    exit;

  // Remove multiple line breaks from text start
  if (AText[1] in [#10, #13]) then begin
    while (AText <> '') and (AText[1] in [#10, #13]) do
      Delete(AText, 1, 1);
    AText := LineEnding + AText;
    if AText = '' then
      exit;
  end;

  // ... and from text end
  if (AText[Length(AText)] in [#10, #13]) then begin
    while (AText <> '') and (AText[Length(AText)] in [#10, #13]) do
      Delete(AText, Length(AText), 1);
    AText := AText + LineEnding;
    if AText = '' then
      exit;
  end;

  FText := FText + AText;
end;

function ExtractTextFromHTML(const AHTMLText: String): String;
var
  extractor: THTMLTextExtractor;
begin
  extractor := THTMLTextExtractor.Create(AHTMLText);
  try
    Result := extractor.Execute;
  finally
    extractor.Free;
  end;
end;


end.

CudaText man

2017-06-12 10:55

reporter   ~0101052

+ 'A': // Link
+ Result:=AddOutput(' 👀');
+ '/A':
+ Result:=AddOutput('👀 ');
Eye chars?? This should be a property, and '[]' chars would be better, IMO.

CudaText man

2017-06-12 10:56

reporter   ~0101053

Result:=AddOutput('🔹');
And prop for this char, pls.

CudaText man

2017-06-12 10:58

reporter   ~0101055

Last edited: 2017-06-12 10:59


+ Result:=AddOutput('&'); // Entity not found, add just '&'.
Need prop, and better "?" char.

Juha Manninen

2017-06-12 11:43

developer   ~0101057

Last edited: 2017-06-12 12:21


I removed the eyes and added a TitleMark property in r55329.
Unicode Emojis give nice opportunities for layout. They are essentially graphics inside text.
IMO '🔹' looks good with a title.

'&' without an entity is not legal HTML, but if one is encountered then it must be copied verbatim. Why would you change it to '?'?
If input is '&xxx', output must also be '&xxx' and not '?xxx'.

@wp: Yes, I believe fasthtmlparser and SAX could be used. However, the code does not only extract text from HTML, it also renders it within the confines of pure text output.
To my surprise I did not find such code.
My class is loosely based on the original HTMLToCaption() function by Mattias. The function copied large memory blocks repeatedly while removing tags and thus was slow with big HTML.
I was kind of carried away when making an optimized class.
BTW, your example code removes newlines but it should remove the excess spaces, too.
Delete(AText, 1, 1) inside a big loop is butt-slow. :)

I am resolving this issue. The code can be discussed on mailing list or forum.
Patches can be added.

CudaText man

2017-06-12 13:10

reporter  

rendr.diff (3,282 bytes)   
Index: components/lazutils/html2textrender.pas
===================================================================
--- components/lazutils/html2textrender.pas	(revision 55332)
+++ components/lazutils/html2textrender.pas	(working copy)
@@ -31,11 +31,17 @@
   private
     fHTML, fOutput: string;
     fMaxLines: integer;
-    fLineEndMark: String; // End of line, by default std. "LineEnding".
-    fTitleMark: String; // Text at start and end of title text, by default Unicode graph.
+    fLineEndMark: String; // End of line, by default standard LineEnding
+    fTitleMark: String; // Text at start/end of title text: <div class="title">...</div>
+    fHorzLine: String; // Text for <hr> tag
+    fLinkBegin: String; // Text before link, <a href="...">
+    fLinkEnd: String; // Text after link
+    fListItemMark: String; // Text for <li> items
+    fMoreMark: String; // Text to add if too many lines
     fInHeader, fInDivTitle: Boolean;
     fPendingSpace: Boolean;
     fPendingNewLineCnt: Integer;
+    fIndentSize: integer; // Increment (in spaces) for each nested HTML level
     fIndent: integer;
     fLineCnt, fHtmlLen: Integer;
     p: Integer;
@@ -53,6 +59,12 @@
   public
     property LineEndMark: String read fLineEndMark write fLineEndMark;
     property TitleMark: String read fTitleMark write fTitleMark;
+    property HorzLineMark: String read fHorzLine write fHorzLine;
+    property LinkBeginMark: String read fLinkBegin write fLinkBegin;
+    property LinkEndMark: String read fLinkEnd write fLinkEnd;
+    property ListItemMark: String read fListItemMark write fListItemMark;
+    property MoreMark: String read fMoreMark write fMoreMark;
+    property IndentSize: integer read fIndentSize write fIndentSize;
   end;
 
 implementation
@@ -68,6 +80,12 @@
   // These can be changed by user later.
   fLineEndMark:=LineEnding;
   fTitleMark:='🔹';
+  fHorzLine:= '——————————————————';
+  fLinkBegin:='_';
+  fLinkEnd:='_';
+  fListItemMark:='* ';
+  fMoreMark:='...';
+  fIndentSize:=2;
 end;
 
 constructor THTML2TextRenderer.Create(const Stream: TStream);
@@ -122,13 +140,13 @@
     // Return False if max # of lines exceeded.
     if fLineCnt>fMaxLines then
     begin
-      fOutput:=fOutput+fLineEndMark+'...';
+      fOutput:=fOutput+fLineEndMark+fMoreMark;
       Exit(False);
     end;
   end;
   if fPendingNewLineCnt>0 then
   begin
-    fOutput:=fOutput+StringOfChar(' ',fIndent*2);
+    fOutput:=fOutput+StringOfChar(' ',fIndent*fIndentSize);
     fPendingNewLineCnt:=0;
   end;
   fOutput:=fOutput+aText;
@@ -211,18 +229,18 @@
         Inc(fIndent);
         // Don't leave empty lines before list item (not sure if this is good)
         AddOneNewLine;
-        Result:=AddOutput('* ');
+        Result:=AddOutput(fListItemMark);
       end;
     '/LI':
         Dec(fIndent);
     'A':                             // Link
-        Result:=AddOutput(' _');
+        Result:=AddOutput(' '+fLinkBegin);
     '/A':
-        Result:=AddOutput('_ ');
+        Result:=AddOutput(fLinkEnd+' ');
     'HR':
       begin
         AddOneNewLine;
-        Result:=AddOutput('——————————————————');
+        Result:=AddOutput(fHorzLine);
         //AddOneNewLine;
       end;
   end;

CudaText man

2017-06-12 13:10

reporter   ~0101058

Did some refactoring: 6 new properties, patch attached.

Juha Manninen

2017-06-12 14:22

developer   ~0101063

Applied, although I don't find some of the properties very useful. For example who would want to change the '...' at the end of truncated output?
I renamed one property to IndentStep.

CudaText man

2017-06-12 15:58

reporter   ~0101064

@Juha
there is Unicode char for "3 dots".

Issue History

Date Modified Username Field Change
2017-06-09 19:20 CudaText man New Issue
2017-06-09 19:20 CudaText man File Added: combo-doc-bad.png
2017-06-10 16:30 Juha Manninen Assigned To => Juha Manninen
2017-06-10 16:30 Juha Manninen Status new => assigned
2017-06-10 18:37 Juha Manninen Fixed in Revision => r55307
2017-06-10 18:37 Juha Manninen LazTarget => -
2017-06-10 18:37 Juha Manninen Note Added: 0101002
2017-06-10 18:37 Juha Manninen Status assigned => resolved
2017-06-10 18:37 Juha Manninen Resolution open => fixed
2017-06-10 18:47 CudaText man Note Added: 0101004
2017-06-10 18:47 CudaText man Status resolved => assigned
2017-06-10 18:47 CudaText man Resolution fixed => reopened
2017-06-10 18:47 CudaText man File Added: infobox.png
2017-06-10 19:07 Juha Manninen Note Added: 0101005
2017-06-10 19:11 Juha Manninen Note Edited: 0101005 View Revisions
2017-06-10 19:18 Juha Manninen Note Edited: 0101005 View Revisions
2017-06-10 19:29 Juha Manninen Note Added: 0101007
2017-06-10 20:08 Juha Manninen Note Edited: 0101007 View Revisions
2017-06-10 20:08 Juha Manninen Note Edited: 0101007 View Revisions
2017-06-10 22:21 CudaText man File Added: fixed1.png
2017-06-10 22:22 CudaText man File Added: fix-html.diff
2017-06-10 22:22 CudaText man Note Added: 0101011
2017-06-11 20:43 Juha Manninen Note Added: 0101025
2017-06-11 21:06 CudaText man Note Added: 0101030
2017-06-11 21:09 CudaText man Note Added: 0101031
2017-06-11 21:09 CudaText man Note Edited: 0101031 View Revisions
2017-06-11 21:38 CudaText man Note Added: 0101034
2017-06-11 21:39 CudaText man Note Edited: 0101034 View Revisions
2017-06-11 22:21 Juha Manninen Note Added: 0101037
2017-06-12 06:33 CudaText man Note Added: 0101043
2017-06-12 09:47 wp Note Added: 0101046
2017-06-12 10:55 CudaText man Note Added: 0101052
2017-06-12 10:56 CudaText man Note Added: 0101053
2017-06-12 10:58 CudaText man Note Added: 0101055
2017-06-12 10:59 CudaText man Note Edited: 0101055 View Revisions
2017-06-12 11:43 Juha Manninen Note Added: 0101057
2017-06-12 11:51 Juha Manninen Note Edited: 0101057 View Revisions
2017-06-12 11:54 Juha Manninen Note Edited: 0101057 View Revisions
2017-06-12 11:55 Juha Manninen Fixed in Revision r55307 => r55307, r55319, r55329
2017-06-12 11:55 Juha Manninen Status assigned => resolved
2017-06-12 11:55 Juha Manninen Resolution reopened => fixed
2017-06-12 11:56 Juha Manninen Fixed in Revision r55307, r55319, r55329 => r55307, r55319, r55325, r55329
2017-06-12 12:12 Juha Manninen Note Edited: 0101057 View Revisions
2017-06-12 12:18 Juha Manninen Note Edited: 0101057 View Revisions
2017-06-12 12:21 Juha Manninen Note Edited: 0101057 View Revisions
2017-06-12 13:10 CudaText man File Added: rendr.diff
2017-06-12 13:10 CudaText man Note Added: 0101058
2017-06-12 13:11 CudaText man Status resolved => assigned
2017-06-12 13:11 CudaText man Resolution fixed => reopened
2017-06-12 14:22 Juha Manninen Fixed in Revision r55307, r55319, r55325, r55329 => r55307, r55319, r55325, r55329, r55336
2017-06-12 14:22 Juha Manninen Note Added: 0101063
2017-06-12 14:22 Juha Manninen Status assigned => resolved
2017-06-12 14:22 Juha Manninen Resolution reopened => fixed
2017-06-12 15:58 CudaText man Note Added: 0101064