htmlparser-developer Mailing List for HTML Parser


Peter,

Yes, you have permission. In fact, we would be honoured, and will endeavor to
assist you in any way necessary.

It's funny you should mention images and DOM. The latest versions of
htmlparser include an example application that does a very similar
task: getting the images behind thumbnails (see lib/thumbelina.jar or
package org.htmlparser.lexerapplications.thumbelina). It uses the
low-level Lexer package to avoid having to form the entire document model. I
would check to see if something like this meets your needs.

If you need more than that (e.g. table parsing or balancing end tags),
you'll have to go with the full parser. Unfortunately, the Lexer
hasn't been completely integrated into the parser yet and the current
CVS snapshot is a bit of a mess. With a bit of patience, this too will
come to pass.

As far as performance comparisons go, I've only heard anecdotal evidence
that htmlparser is faster. I suppose this could be an area of
investigation.

Derrick
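
The lexer-level approach described above (pulling image references out of the tag stream without forming a document model) can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not htmlparser's API: it uses regular expressions from the Java standard library in place of the Lexer package, and the class name ImageSrcScanner and its extract method are hypothetical. It will not cope with every malformed page (for example, a '>' inside an attribute value).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of htmlparser: scans the markup tag by tag
// and collects IMG src values without building a document tree.
final class ImageSrcScanner {
    // Matches a single <img ...> tag, case-insensitively.
    private static final Pattern IMG_TAG =
        Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches src="x", src='x' or unquoted src=x inside a tag.
    private static final Pattern SRC_ATTR = Pattern.compile(
        "\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the src value of every IMG tag, in document order.
    static List<String> extract(String html) {
        List<String> sources = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(html);
        while (tag.find()) {
            Matcher src = SRC_ATTR.matcher(tag.group());
            if (src.find()) {
                // Exactly one of the three capture groups is non-null.
                String value = src.group(1) != null ? src.group(1)
                             : src.group(2) != null ? src.group(2)
                             : src.group(3);
                sources.add(value);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<html><body><IMG SRC='a.png'><p>text</p>"
                    + "<img alt=\"b\" src=\"img/b.jpg\"></body></html>";
        System.out.println(extract(html)); // prints [a.png, img/b.jpg]
    }
}
```

Only the list of image URLs is held in memory, never a tree of nodes, which is the property that matters when many simulated clients parse pages concurrently.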

-----Original Message-----
From: peter lin [mailto:jmw...@ya...]
Sent: September 29, 2003 8:53 AM
To: Derrick Oswald
Subject: question about using HTMLParser in Apache JMeter

Hi Derrick,

I am a committer on Apache's Jakarta JMeter project. I was wondering if
we can get permission to use HTMLParser. Since the Apache Software
Foundation can't use LGPL code without permission, I'm hoping you're
open to the idea.

Here is a quick description of how I want to use it. JMeter is currently
a load testing tool for HTTP, FTP, JDBC and Java. The HTTP plugin uses
JTidy to parse the HTML and extract the images for download.

Test plans with more than 20 clients perform poorly because of the high
cost of DOM: JTidy generates DOM documents. One trick is to turn off
image downloads in JMeter, but that doesn't solve the real problem. I
want to replace JTidy with HTMLParser. I haven't done any performance
comparison yet, but I'm guessing it should use less memory.

Has anyone done a performance comparison between JTidy and HTMLParser?

peter lin



htmlparser-developer — The developer mailing list of the htmlparser project