fcl-xml can't parse complex html file
Original Reporter info from Mantis: leonardorame
- Reporter name: Leonardo M. Ramé
Description:
Hi, I need to parse a web page that contains some complex scripting, and I'm getting an EDOMError in TDOMDocument.CreateElement.
Inspecting the tagName parameter of TDOMDocument.CreateElement(const tagName: DOMString), I found that the parser fails on this line:
document.write('<scr'+'ipt type="text/javascript" src="' + dtmtag + '"></scr'+'ipt>');
I attached the problematic document for you to test it. It looks like a bug in sax_html.pp.
Is there a way to tell the parser to ignore certain tags? For example, here I would like to ignore the tag "<script>...</script>".
Steps to reproduce:
Just run this code to get the error:
var
  lStrings: TStringList;
  lStream: TStringStream;
begin
  lStrings := TStringList.Create;
  try
    lStrings.LoadFromFile('/path_to/salida.txt');
    // The assignment below replaces the loaded file content with the
    // minimal failing fragment, so the attached file is not required.
    lStrings.Text := '<script type="text/javascript"> ' +
      'var dtmtag = window.location.protocol + "//a248." + (window.location.protocol == "http:" ? "g" : "e") + ".akamai.net/7/248/14564/20080403/dotomi.download.akamai.com/14564/rules/2370/dtmtag.js";'+
      'document.write(''<scr''+''ipt type="text/javascript" src="'' + dtmtag + ''"></scr''+''ipt>'');' +
      '</script>';
    lStream := TStringStream.Create(lStrings.Text);
    try
      // ExtractChainData is the reporter's own routine (not shown here).
      Memo1.Text := ExtractChainData(lStream);
    finally
      lStream.Free;
    end;
  finally
    lStrings.Free;
  end;
end.
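For anyone who wants to try this without the ExtractChainData routine (which is not shown in the report), here is a minimal self-contained sketch. It assumes that ExtractChainData ends up calling ReadHTMLFile from the SAX_HTML unit, which the report does not confirm. The program name, the Html constant and the variable names are made up for illustration.

```pascal
program repro18826;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

const
  // Same markup as in the snippet from the report.
  Html =
    '<script type="text/javascript"> ' +
    'var dtmtag = window.location.protocol + "//a248." + (window.location.protocol == "http:" ? "g" : "e") + ".akamai.net/7/248/14564/20080403/dotomi.download.akamai.com/14564/rules/2370/dtmtag.js";' +
    'document.write(''<scr''+''ipt type="text/javascript" src="'' + dtmtag + ''"></scr''+''ipt>'');' +
    '</script>';

var
  Stream: TStringStream;
  Doc: THTMLDocument;
begin
  Doc := nil;
  Stream := TStringStream.Create(Html);
  try
    try
      // Assumption: this is the parser entry point behind ExtractChainData.
      ReadHTMLFile(Doc, Stream);
      if Assigned(Doc) and Assigned(Doc.DocumentElement) then
        WriteLn('Parsed without error, root element: ', Doc.DocumentElement.NodeName)
      else
        WriteLn('Parsed without error, no root element');
    except
      // The report describes an EDOMError raised from CreateElement.
      on E: EDOMError do
        WriteLn('EDOMError: ', E.Message);
    end;
  finally
    Doc.Free;
    Stream.Free;
  end;
end.
```

On an affected build this should end up in the EDOMError branch if the assumption about the entry point holds; on a fixed build it should print one of the "Parsed without error" lines.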
Mantis conversion info:
- Mantis ID: 18826
- OS: All
- OS Build: All
- Build: r17000
- Platform: All
- Version: 2.5.1
- Fixed in version: 2.6.0
- Fixed in revision: 17003 (#b7e26ed9)