htmlparser-developer Mailing List for HTML Parser


Hi Folks,
     A major bug fix has been done. I had previously reported that the parser crashes when encountering very dirty HTML of the form:
<A HREF="http://www.somelink.com">SomeText<A>

Instead of the end tag, a begin tag was put in by mistake, and the parser promptly crashes. Fixing this called for a modification to the evaluate() method, because the existing scanners have only local information about the parsing process. I've now introduced a parameter that takes in the scanner. So, if a tag is being parsed and, in the process, another tag starts being parsed, the second tag will now know that a scanner is already running.

This enables the HTMLLinkScanner to conclude that it is currently parsing a dirty HTML tag, and hence to take the appropriate action: flag the scanner into a dirty mode and return an HTMLEndTag, which is what the previous scanner expects.
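The recovery described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's code: `TagScanner`, `LinkScanner`, `LinkTag`, `EndTag`, the `scan` method and the exact parameter list of `evaluate` are hypothetical stand-ins for HTMLTagScanner, HTMLLinkScanner and HTMLEndTag, and the tag text is assumed to be whatever sits between `<` and `>`.

```java
import java.util.Locale;

// Hypothetical stand-ins; none of these types are the library's real classes.
interface Node {}

final class LinkTag implements Node {
    final String href;
    LinkTag(String href) { this.href = href; }
}

final class EndTag implements Node {
    final String name;
    EndTag(String name) { this.name = name; }
}

abstract class TagScanner {
    // previousOpenScanner is the scanner that is already mid-parse, or null if none is.
    abstract boolean evaluate(String tagText, TagScanner previousOpenScanner);
    abstract Node scan(String tagText);
}

final class LinkScanner extends TagScanner {
    private boolean dirty;

    @Override
    boolean evaluate(String tagText, TagScanner previousOpenScanner) {
        String t = tagText.trim().toUpperCase(Locale.ROOT);
        boolean isLinkStart = t.equals("A") || t.startsWith("A ");
        // A second <A> while a link scanner is still open is taken to be a mistyped </A>.
        dirty = isLinkStart && previousOpenScanner instanceof LinkScanner;
        return isLinkStart;
    }

    @Override
    Node scan(String tagText) {
        if (dirty) {
            dirty = false;
            return new EndTag("A"); // hand the previous scanner the end tag it is waiting for
        }
        return new LinkTag(extractHref(tagText));
    }

    // Deliberately naive attribute extraction, just enough for the example.
    private static String extractHref(String tagText) {
        int i = tagText.toUpperCase(Locale.ROOT).indexOf("HREF=");
        if (i < 0) return "";
        String value = tagText.substring(i + 5).trim();
        if (value.startsWith("\"")) {
            int end = value.indexOf('"', 1);
            return end < 0 ? value.substring(1) : value.substring(1, end);
        }
        int space = value.indexOf(' ');
        return space < 0 ? value : value.substring(0, space);
    }
}

final class DirtyLinkDemo {
    public static void main(String[] args) {
        LinkScanner scanner = new LinkScanner();
        String open = "A HREF=\"http://www.somelink.com\"";
        scanner.evaluate(open, null);
        Node first = scanner.scan(open);
        // The stray <A> arrives while the link scanner is still open.
        scanner.evaluate("A", scanner);
        Node second = scanner.scan("A");
        System.out.println(first.getClass().getSimpleName() + ", "
                + second.getClass().getSimpleName()); // prints "LinkTag, EndTag"
    }
}
```

The point of the extra argument is that the decision needs no global parser state: the scanner that is already open is handed in, and that alone is enough to recognise the second begin tag as a dirty end tag.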

This solves the bug, and finally we can handle some really crazy pages...
This fix and some others, along with some additions (META and TITLE), will make it into release 1.1 (coming soon). Currently, the latest code is available through CVS.

In case any of you have written your own scanners, you will need to modify the evaluate() method signature to be compatible with the new HTMLTagScanner.
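For a custom scanner, the change might look something like the sketch below. The names here are hypothetical (`SketchTagScanner` stands in for the updated HTMLTagScanner, `ImageScanner` for your own scanner), and the parameter list is an assumption; check the CVS sources for the real signature. A scanner that does not care about an already-open scanner can simply ignore the new argument.

```java
import java.util.Locale;

// Hypothetical base class standing in for the updated HTMLTagScanner.
abstract class SketchTagScanner {
    // The extra parameter carries the scanner that is already running, or null.
    abstract boolean evaluate(String tagText, SketchTagScanner previousOpenScanner);
}

// Hypothetical custom scanner, updated to the two-argument form.
final class ImageScanner extends SketchTagScanner {
    @Override
    boolean evaluate(String tagText, SketchTagScanner previousOpenScanner) {
        // This scanner needs no context, so previousOpenScanner is unused.
        String t = tagText.trim().toUpperCase(Locale.ROOT);
        return t.equals("IMG") || t.startsWith("IMG ");
    }
}
```

A scanner left on the old signature would no longer implement the abstract method in this sketch, so the compiler reports the mismatch rather than letting it fail silently at parse time.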

Regards,
Somik



htmlparser-developer — The developer mailing list of the htmlparser project