[Htmlparser-developer] Issue with dirty parsing


Hi Folks,
I have run into a really strange scenario. Try creating a link like this in a web page:
<A HREF="...">something<A>

That is, instead of a closing tag </A>, put an opening tag. I find that Internet Explorer renders it just fine. If IE renders it, then perhaps we ought to support it in HTML Parser. However, it's not so easy.

Check out the latest source from CVS: I have added a test case for this situation, which is currently failing (in HTMLLinkScannerTest, package com.kizna.html.scannersTests).

The problem is in HTMLReader.find(), which goes into a sort of recursion. When it finds <A ...> the first time, the scanner asks it to find the remaining tags. If a second <A> is then encountered, it tries to keep parsing until the end tag is found, which never happens. I need a clean, elegant way of telling the reader not to expand in exceptional situations like this one.
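For discussion, here is a minimal sketch of one way around the recursion. Instead of asking the reader to expand the link body until </A> turns up, a flat scanner treats a second <A> start tag as implicitly closing the link that is still open, and closes any unterminated link at end of input. Everything here is hypothetical: the class name ImplicitLinkClose, the Token type and the naive tokenize() helper are illustrations only, not the actual HTMLReader or HTMLLinkScanner API, and the tokenizer ignores quoting, comments and other real-world HTML details.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the HTML Parser API: scan links without recursion. */
public class ImplicitLinkClose {

    /** A minimal token: either a tag like <A ...> / </A> or a run of text. */
    static final class Token {
        final String text;      // raw text of the token
        final boolean isTag;    // true for <...> tokens
        final boolean isEndTag; // true for </...> tokens
        final String tagName;   // upper-cased tag name, or null for text

        Token(String text) {
            this.text = text;
            this.isTag = text.startsWith("<") && text.endsWith(">");
            this.isEndTag = isTag && text.startsWith("</");
            if (isTag) {
                String body = text.substring(isEndTag ? 2 : 1, text.length() - 1).trim();
                int space = body.indexOf(' ');
                this.tagName = (space < 0 ? body : body.substring(0, space)).toUpperCase();
            } else {
                this.tagName = null;
            }
        }
    }

    /** Naive tokenizer (assumption): splits input into <...> tags and text runs. */
    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            if (lt < 0) { tokens.add(new Token(html.substring(i))); break; }
            if (lt > i) tokens.add(new Token(html.substring(i, lt)));
            int gt = html.indexOf('>', lt);
            if (gt < 0) { tokens.add(new Token(html.substring(lt))); break; }
            tokens.add(new Token(html.substring(lt, gt + 1)));
            i = gt + 1;
        }
        return tokens;
    }

    /** Returns the text of each link; a new <A> start tag closes any open link. */
    static List<String> scanLinks(List<Token> tokens) {
        List<String> links = new ArrayList<>();
        StringBuilder current = null; // the only state: non-null while a link is open
        for (Token t : tokens) {
            boolean isA = t.isTag && "A".equals(t.tagName);
            if (isA && !t.isEndTag) {
                if (current != null) {
                    links.add(current.toString()); // implicit close instead of recursing
                }
                current = new StringBuilder();
            } else if (isA) {
                if (current != null) {             // normal </A> close
                    links.add(current.toString());
                    current = null;
                }
            } else if (current != null && !t.isTag) {
                current.append(t.text);            // collect link text only
            }
        }
        if (current != null) {
            links.add(current.toString());         // unterminated link ends at end of input
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(scanLinks(tokenize("<A HREF=\"a.html\">something<A>")));
    }
}
```

Running main prints [something, ]: the first link ends where the second <A> begins, and the second (empty) link is closed at end of input rather than sending the scanner looking for an end tag that never comes. This is more or less the "flags" route, with the single open-link state replacing the recursive expansion.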

I can of course do it with some flags, but before I do, I was wondering whether anyone has insights on this problem, and whether anyone thinks we should not support this dirty HTML even if IE does.

Regards,
Somik