[Htmlparser-developer] Issue with dirty parsing
From: Somik R. <so...@ya...> - 2002-03-24 05:48:24
Hi Folks,

I am encountering a really strange scenario. Try creating a link like this in a web page:

<A HREF="...">something<A>

i.e. instead of the close tag </A>, put an open tag. I find that Internet Explorer renders it just fine. Now if IE renders it, then perhaps we ought to support it in HTML Parser.

However, it's not so easy. Check out the latest source from CVS: I have put in a testcase for this situation, which is currently failing (in HTMLLinkScannerTest, com.kizna.html.scannersTests).

The problem is in HTMLReader.find(), which goes into a sort of recursion: when it finds <A ...> the first time, the scanner asks it to find the remaining tags. If a second <A> is then encountered, it will try to keep parsing until the end tag is encountered, which won't happen. I need a clean, elegant way of telling the reader not to expand in exceptional situations like this one.

I can of course do it with some flags, but before I do, I was wondering if anyone has insights on this problem, and whether anyone thinks we should not support this dirty HTML even if IE does.

Regards,
Somik
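
To make the situation concrete, here is a minimal sketch of one possible approach: while a link is open, treat any further <A> as the end of that link instead of descending into it. This is not HTML Parser code. The class DirtyLinkSketch and the method linkTexts are hypothetical names, and the sketch uses a regular expression in place of HTMLReader and the scanners, so it only illustrates the rule, not how it would be wired into find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in HTML Parser.
class DirtyLinkSketch {
    // Matches an opening <A ...> (attributes optional) or a closing </A>.
    private static final Pattern A_TAG =
            Pattern.compile("<(/?)a(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the text of each link in the input. While a link is open,
     * a second <A> ends it, exactly as </A> would.
     */
    static List<String> linkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = A_TAG.matcher(html);
        int textStart = -1; // -1 means no link is currently open
        while (m.find()) {
            boolean isClose = !m.group(1).isEmpty();
            boolean hasAttributes =
                    m.group(2) != null && !m.group(2).trim().isEmpty();
            if (textStart >= 0) {
                // Either </A> or another <A ...>: close the open link
                // here rather than parsing onward for an end tag.
                texts.add(html.substring(textStart, m.start()));
                textStart = -1;
            }
            if (!isClose && hasAttributes) {
                // An <A> carrying attributes starts a new link.
                textStart = m.end();
            }
        }
        if (textStart >= 0) {
            // Input ended with a link still open: keep what was read.
            texts.add(html.substring(textStart));
        }
        return texts;
    }
}
```

Calling DirtyLinkSketch.linkTexts on the dirty example (with any HREF value filled in) yields a single link whose text is "something". The bare <A> closes the link and, having no attributes, does not open a new one. Whether a bare <A> should instead start an anchor of its own is a design choice this sketch does not settle.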