htmlparser-user Mailing List for HTML Parser
Brought to you by: derrickoswald
Messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2001 |  |  |  |  |  |  |  |  |  |  | 1 |  |
| 2002 | 7 |  | 9 | 50 | 20 | 47 | 37 | 32 | 30 | 11 | 37 | 47 |
| 2003 | 31 | 70 | 67 | 34 | 66 | 25 | 48 | 43 | 58 | 25 | 10 | 25 |
| 2004 | 38 | 17 | 24 | 25 | 11 | 6 | 24 | 42 | 13 | 17 | 13 | 44 |
| 2005 | 10 | 16 | 16 | 23 | 6 | 19 | 39 | 15 | 40 | 49 | 29 | 41 |
| 2006 | 28 | 24 | 52 | 41 | 31 | 34 | 22 | 12 | 11 | 11 | 11 | 4 |
| 2007 | 39 | 13 | 16 | 24 | 13 | 12 | 21 | 61 | 31 | 13 | 32 | 15 |
| 2008 | 7 | 8 | 14 | 12 | 23 | 20 | 9 | 6 | 2 | 7 | 3 | 2 |
| 2009 | 5 | 8 | 10 | 22 | 85 | 82 | 45 | 28 | 26 | 50 | 8 | 16 |
| 2010 | 3 | 11 | 39 | 56 | 80 | 64 | 49 | 48 | 16 | 3 | 5 | 5 |
| 2011 | 13 |  | 1 | 7 | 7 | 7 | 7 | 8 |  | 6 | 2 |  |
| 2012 | 5 |  | 3 | 3 | 4 | 8 | 1 | 5 | 10 | 3 | 2 | 4 |
| 2013 | 4 | 2 | 7 | 7 | 6 | 7 | 3 |  | 1 |  |  |  |
| 2014 |  | 2 | 1 |  | 3 | 1 |  |  | 1 | 4 | 2 | 4 |
| 2015 | 4 | 2 | 8 | 7 | 6 | 7 | 3 | 1 | 1 | 4 | 3 | 4 |
| 2016 | 4 | 6 | 9 | 9 | 6 | 1 | 1 |  |  | 1 | 1 | 1 |
| 2017 |  | 1 | 3 | 1 |  | 1 | 2 | 3 | 6 | 3 | 2 | 5 |
| 2018 | 3 | 13 | 28 | 5 | 4 | 2 | 2 | 8 | 2 | 1 | 5 | 1 |
| 2019 | 8 | 1 |  | 1 | 4 |  | 1 |  |  |  | 2 | 2 |
| 2020 |  |  | 1 | 1 | 1 | 2 | 1 | 1 | 1 |  | 1 | 1 |
| 2021 | 3 | 2 | 1 | 1 | 2 | 1 | 2 | 1 |  |  |  |  |
| 2022 |  |  |  | 1 | 1 | 1 |  | 1 |  |  |  |  |
| 2023 | 2 |  |  |  |  |  |  | 1 |  |  |  |  |
| 2024 | 2 |  |  |  |  |  |  |  |  |  |  |  |
| 2025 |  |  |  |  |  | 1 |  |  |  |  |  |  |
From: Derrick O. <Der...@Ro...> - 2005-12-29 14:41:53
Ted,

I don't think HTML Parser is what you need. Its primary use case is programmatic extraction of information from web pages, i.e. spidering, with some facilities for rewriting. As far as I know, there isn't anyone using HTML Parser as the parsing component of a JEditorPane, and I don't believe anyone has written a browser based on it. You might want to try some of the Java-based browsers, e.g. shogun <http://sourceforge.net/projects/shogun>, JXWB <http://sourceforge.net/projects/jxwb>, or others <http://sourceforge.net/softwaremap/trove_list.php?form_cat=91> that purport to do what you want.

Derrick

Ted Byers wrote:
> I have read that the HTML parser that is used within JEditorPane is seriously broken. In reading the archive of this list, the impression is created that using HTMLParser with JEditorPane is problematic at best, although there seems to be little recent material on this issue.
>
> HTMLParser comes highly recommended. However, it does me no good if I can't figure out how to get started in order to use it to render generic web pages. I need a Java component (perhaps an applet, or an application that can be launched using Web Start) that will display web pages (most of my users will need read-only access to the documents rendered, but there is a second category of user that needs read/write access to documents). I do not need, or want, to have to deal with examining the data parsed by the parser. I really don't want to write a class to render the output produced by HTMLParser. I just want to make a web page viewer (or, better, a web browser that supports basic scripting using e.g. JScript) that uses HTMLParser to make it more robust than the default parser used in JEditorPane.
>
> On the face of it, none of the example applications show me how to do this, although it is possible that I missed something.
>
> To do what I need done, do I need anything else other than HTMLParser? Or does HTMLParser include functions to render generic web pages on, e.g., a JFrame? In either case, where can I find an example program that shows me how to do what I need to do to get started?
>
> Once I have a start, the next phase will involve a WYSIWYG editor web page and a servlet that uses HTMLParser to validate web pages created with that editor, sending the user intelligible error messages when the user tries to create something HTMLParser doesn't understand. Or maybe there is already something out there that will do what I need to do (preferably open source). Any ideas/recommendations?
>
> Thanks,
>
> Ted
>
> R.E. (Ted) Byers, Ph.D., Ed.D.
> R & D Decision Support Software
> http://www.randddecisionsupportsolutions.com/
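For anyone skimming the archive, the "programmatic extraction" use case Derrick mentions looks roughly like the sketch below. It is only a minimal illustration written against the 1.6-era API; the target URL is a placeholder and real code would want proper ParserException handling.

    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeList;

    public class LinkLister
    {
        public static void main (String[] args) throws Exception
        {
            // fetch a page and pull out every <A> tag, the kind of job the parser is built for
            Parser parser = new Parser ("http://htmlparser.sourceforge.net");
            NodeList links = parser.extractAllNodesThatMatch (new NodeClassFilter (LinkTag.class));
            for (int i = 0; i < links.size (); i++)
            {
                LinkTag link = (LinkTag) links.elementAt (i);
                System.out.println (link.getLink () + " -> " + link.getLinkText ());
            }
        }
    }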
From: Third E. <nav...@gm...> - 2005-12-29 14:12:27
HTMLParser is not a browser, so it is not going to be possible to get coordinates and positions of elements directly. You may have to write some add-on on top of the parser where you load the output of the parser into some UI controls and then get the position. Just curious, is there some specific functionality you are looking for by knowing the coordinates?

Naveen

On 12/22/05, Gurpreet Sachdeva <gur...@gm...> wrote:
> Thanks for the reply Naveen.
>
> >>> HTML parser will give you position if the site is using absolute positioning and proper coordinates have been set in the STYLE attribute.
>
> How do we capture that information through HTML Parser? Let's say I need the coordinates of each element on http://news.bbc.co.uk -- how do I achieve that?
>
> Thanks for your help,
> Gurpreet Singh
>
> On 12/22/05, Naveen Kohli <nav...@gm...> wrote:
> > HTML parser will give you position if the site is using absolute positioning and proper coordinates have been set in the STYLE attribute. Otherwise, no, HTML Parser can't give you the coordinates.
> >
> > Naveen
> >
> > From: htm...@li... [mailto:htm...@li...] On Behalf Of Gurpreet Sachdeva
> > Sent: Thursday, December 22, 2005 5:51 AM
> > To: htm...@li...
> > Subject: [Htmlparser-user] coordinates of text rendered on browser.
> >
> > Hi guys,
> >
> > I have a basic query. Does HTML Parser give me the coordinates of text as rendered in the browser?
> >
> > When I tried this:
> > java -jar lib/htmlparser.jar http://news.bbc.co.uk A
> >
> > It gave something like:
> >
> > LinkData
> > --------
> > 0 Txt (55846[3258,81],55858[3258,93]): News sources
> > *** END of LinkData ***
> > Link to : http://www.bbc.co.uk/info/; titled : About the BBC; begins at : 55875; ends at : 55934, AccessKey=null
> > LinkData
> > --------
> > 0 Txt (55934[3259,65],55947[3259,78]): About the BBC
> > *** END of LinkData ***
> > Link to : http://news.bbc.co.uk/newswatch/ukfs/hi/feedback/default.stm; titled : Contact us; begins at : 55964; ends at : 56067, AccessKey=null
> > LinkData
> > --------
> > 0 Txt (56067[3260,109],56077[3260,119]): Contact us
> > *** END of LinkData ***
> >
> > In the end... do these numbers (56067[3260,109],56077[3260,119]) refer to the coordinates? If not, can I somehow get the coordinates at which the text is rendered in a standard browser (Mozilla/Firefox)?
> >
> > Thanks and Regards,
> > Gurpreet Singh
>
> --
> Thanks and Regards,
> GSS

--
Naveen K Kohli
http://www.netomatix.com
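A side note for archive readers: the numbers in that LinkData output are character offsets into the fetched HTML source, with [line,column] in brackets, not screen positions. A rough sketch of reading the same information through the API (1.6-era method names assumed, URL taken from the thread):

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;

    public class Offsets
    {
        public static void main (String[] args) throws Exception
        {
            Parser parser = new Parser ("http://news.bbc.co.uk");
            NodeList links = parser.extractAllNodesThatMatch (new TagNameFilter ("A"));
            for (int i = 0; i < links.size (); i++)
            {
                Node node = links.elementAt (i);
                // character offsets into the page source, not pixel coordinates
                System.out.println (node.getStartPosition () + "-" + node.getEndPosition ()
                    + ": " + node.toPlainTextString ());
            }
        }
    }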
From: Ted B. <r.t...@ro...> - 2005-12-29 01:13:33
I have read that the HTML parser that is used within JEditorPane is seriously broken. In reading the archive of this list, the impression is created that using HTMLParser with JEditorPane is problematic at best, although there seems to be little recent material on this issue.

HTMLParser comes highly recommended. However, it does me no good if I can't figure out how to get started in order to use it to render generic web pages. I need a Java component (perhaps an applet, or an application that can be launched using Web Start) that will display web pages (most of my users will need read-only access to the documents rendered, but there is a second category of user that needs read/write access to documents). I do not need, or want, to have to deal with examining the data parsed by the parser. I really don't want to write a class to render the output produced by HTMLParser. I just want to make a web page viewer (or, better, a web browser that supports basic scripting using e.g. JScript) that uses HTMLParser to make it more robust than the default parser used in JEditorPane.

On the face of it, none of the example applications show me how to do this, although it is possible that I missed something.

To do what I need done, do I need anything else other than HTMLParser? Or does HTMLParser include functions to render generic web pages on, e.g., a JFrame? In either case, where can I find an example program that shows me how to do what I need to do to get started?

Once I have a start, the next phase will involve a WYSIWYG editor web page and a servlet that uses HTMLParser to validate web pages created with that editor, sending the user intelligible error messages when the user tries to create something HTMLParser doesn't understand. Or maybe there is already something out there that will do what I need to do (preferably open source). Any ideas/recommendations?

Thanks,

Ted

R.E. (Ted) Byers, Ph.D., Ed.D.
R & D Decision Support Software
http://www.randddecisionsupportsolutions.com/
From: Gurpreet S. <gur...@gm...> - 2005-12-22 11:07:46
Thanks for the reply Naveen.

>>> HTML parser will give you position if the site is using absolute positioning and proper coordinates have been set in the STYLE attribute.

How do we capture that information through HTML Parser? Let's say I need the coordinates of each element on http://news.bbc.co.uk -- how do I achieve that?

Thanks for your help,
Gurpreet Singh

On 12/22/05, Naveen Kohli <nav...@gm...> wrote:
> HTML parser will give you position if the site is using absolute positioning and proper coordinates have been set in the STYLE attribute. Otherwise, no, HTML Parser can't give you the coordinates.
>
> Naveen
>
> *From:* htm...@li... [mailto:htm...@li...] *On Behalf Of* Gurpreet Sachdeva
> *Sent:* Thursday, December 22, 2005 5:51 AM
> *To:* htm...@li...
> *Subject:* [Htmlparser-user] coordinates of text rendered on browser.
>
> Hi guys,
>
> I have a basic query. Does HTML Parser give me the coordinates of text as rendered in the browser?
>
> When I tried this:
> java -jar lib/htmlparser.jar http://news.bbc.co.uk A
>
> It gave something like:
>
> LinkData
> --------
> 0 Txt (55846[3258,81],55858[3258,93]): News sources
> *** END of LinkData ***
> Link to : http://www.bbc.co.uk/info/; titled : About the BBC; begins at : 55875; ends at : 55934, AccessKey=null
> LinkData
> --------
> 0 Txt (55934[3259,65],55947[3259,78]): About the BBC
> *** END of LinkData ***
> Link to : http://news.bbc.co.uk/newswatch/ukfs/hi/feedback/default.stm; titled : Contact us; begins at : 55964; ends at : 56067, AccessKey=null
> LinkData
> --------
> 0 Txt (56067[3260,109],56077[3260,119]): Contact us
> *** END of LinkData ***
>
> In the end... do these numbers (56067[3260,109],56077[3260,119]) refer to the coordinates? If not, can I somehow get the coordinates at which the text is rendered in a standard browser (Mozilla/Firefox)?
>
> Thanks and Regards,
> Gurpreet Singh

--
Thanks and Regards,
GSS
From: Naveen K. <nav...@gm...> - 2005-12-22 11:00:41
HTML parser will give you position if the site is using absolute positioning and proper coordinates have been set in the STYLE attribute. Otherwise, no, HTML Parser can't give you the coordinates.

Naveen

_____

From: htm...@li... [mailto:htm...@li...] On Behalf Of Gurpreet Sachdeva
Sent: Thursday, December 22, 2005 5:51 AM
To: htm...@li...
Subject: [Htmlparser-user] coordinates of text rendered on browser.

Hi guys,

I have a basic query. Does HTML Parser give me the coordinates of text as rendered in the browser?

When I tried this:
java -jar lib/htmlparser.jar http://news.bbc.co.uk A

It gave something like:

LinkData
--------
0 Txt (55846[3258,81],55858[3258,93]): News sources
*** END of LinkData ***
Link to : http://www.bbc.co.uk/info/; titled : About the BBC; begins at : 55875; ends at : 55934, AccessKey=null
LinkData
--------
0 Txt (55934[3259,65],55947[3259,78]): About the BBC
*** END of LinkData ***
Link to : http://news.bbc.co.uk/newswatch/ukfs/hi/feedback/default.stm; titled : Contact us; begins at : 55964; ends at : 56067, AccessKey=null
LinkData
--------
0 Txt (56067[3260,109],56077[3260,119]): Contact us
*** END of LinkData ***

In the end... do these numbers (56067[3260,109],56077[3260,119]) refer to the coordinates? If not, can I somehow get the coordinates at which the text is rendered in a standard browser (Mozilla/Firefox)?

Thanks and Regards,
Gurpreet Singh
From: Gurpreet S. <gur...@gm...> - 2005-12-22 10:51:11
Hi guys,

I have a basic query. Does HTML Parser give me the coordinates of text as rendered in the browser?

When I tried this:
java -jar lib/htmlparser.jar http://news.bbc.co.uk A

It gave something like:

LinkData
--------
0 Txt (55846[3258,81],55858[3258,93]): News sources
*** END of LinkData ***
Link to : http://www.bbc.co.uk/info/; titled : About the BBC; begins at : 55875; ends at : 55934, AccessKey=null
LinkData
--------
0 Txt (55934[3259,65],55947[3259,78]): About the BBC
*** END of LinkData ***
Link to : http://news.bbc.co.uk/newswatch/ukfs/hi/feedback/default.stm; titled : Contact us; begins at : 55964; ends at : 56067, AccessKey=null
LinkData
--------
0 Txt (56067[3260,109],56077[3260,119]): Contact us
*** END of LinkData ***

In the end... do these numbers (56067[3260,109],56077[3260,119]) refer to the coordinates? If not, can I somehow get the coordinates at which the text is rendered in a standard browser (Mozilla/Firefox)?

Thanks and Regards,
Gurpreet Singh
From: Srinivas V. <sv...@al...> - 2005-12-21 13:45:32
Thanks Derrick, I got it now. That worked...

Srini

P.S.: I really have to get a handle on using these filters...

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
Sent: Wednesday, December 21, 2005 7:04 PM
To: htm...@li...
Subject: Re: [Htmlparser-user] Filter Help

If you are talking about the options at the bottom:

<li>Options available are:</ul>500 = Tape and Reel Packaging, 850<br>XXXE = Lead Free Option

they aren't part of a list. You're probably better off post-processing the list you get to find all its siblings:

NodeList lists = ...apply the filter to the page;
BulletList list = (BulletList)lists.elementAt (0);
NodeList siblings = list.getParent ().getChildren ();

Srinivas Vemula wrote:
> Thank you very much. Somehow I am not able to use the tool to create that perfect filter, and I am sure it's JUST ME.
>
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5213,C5249,P88749
>
> Could you please modify the filter so that it catches all the <UL> tags for the "Features" block? Above is the URL, which is an example; this product has sub-lists for a particular list item in the main features list.
>
> Thanks a lot
>
> -----Original Message-----
> From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
> Sent: Wednesday, December 21, 2005 6:45 PM
> To: htm...@li...
> Subject: Re: [Htmlparser-user] Filter Help
>
> Having the URL makes it easy...
> It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
> I've added this to the example.
>
> Srinivas Vemula wrote:
> > Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> > I am attaching the web page, and here is the URL
> > http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
> > Thanks for your time and help on this
From: Srinivas V. <sv...@al...> - 2005-12-21 13:37:37
Yes, you are right... But the current filter only returns up to </UL> for the block, and "500 = Tape and Reel Packaging, 850<br>XXXE = Lead Free Option" is just text after the list. I am not able to get to that part.

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
Sent: Wednesday, December 21, 2005 7:04 PM
To: htm...@li...
Subject: Re: [Htmlparser-user] Filter Help

If you are talking about the options at the bottom:

<li>Options available are:</ul>500 = Tape and Reel Packaging, 850<br>XXXE = Lead Free Option

they aren't part of a list. You're probably better off post-processing the list you get to find all its siblings:

NodeList lists = ...apply the filter to the page;
BulletList list = (BulletList)lists.elementAt (0);
NodeList siblings = list.getParent ().getChildren ();

Srinivas Vemula wrote:
> Thank you very much. Somehow I am not able to use the tool to create that perfect filter, and I am sure it's JUST ME.
>
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5213,C5249,P88749
>
> Could you please modify the filter so that it catches all the <UL> tags for the "Features" block? Above is the URL, which is an example; this product has sub-lists for a particular list item in the main features list.
>
> Thanks a lot
>
> -----Original Message-----
> From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
> Sent: Wednesday, December 21, 2005 6:45 PM
> To: htm...@li...
> Subject: Re: [Htmlparser-user] Filter Help
>
> Having the URL makes it easy...
> It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
> I've added this to the example.
>
> Srinivas Vemula wrote:
> > Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> > I am attaching the web page, and here is the URL
> > http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
> > Thanks for your time and help on this
From: Derrick O. <Der...@Ro...> - 2005-12-21 13:33:49
If you are talking about the options at the bottom:

<li>Options available are:</ul>500 = Tape and Reel Packaging, 850<br>XXXE = Lead Free Option

they aren't part of a list. You're probably better off post-processing the list you get to find all its siblings:

NodeList lists = ...apply the filter to the page;
BulletList list = (BulletList)lists.elementAt (0);
NodeList siblings = list.getParent ().getChildren ();

Srinivas Vemula wrote:
> Thank you very much. Somehow I am not able to use the tool to create that perfect filter, and I am sure it's JUST ME.
>
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5213,C5249,P88749
>
> Could you please modify the filter so that it catches all the <UL> tags for the "Features" block? Above is the URL, which is an example; this product has sub-lists for a particular list item in the main features list.
>
> Thanks a lot
>
> -----Original Message-----
> From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
> Sent: Wednesday, December 21, 2005 6:45 PM
> To: htm...@li...
> Subject: Re: [Htmlparser-user] Filter Help
>
> Having the URL makes it easy...
> It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
> I've added this to the example.
>
> Srinivas Vemula wrote:
> > Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> > I am attaching the web page, and here is the URL
> > http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
> > Thanks for your time and help on this
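Fleshing out Derrick's snippet, one possible way to walk the list's siblings might look like this. It is a sketch only: it assumes the first <UL> the filter returns is the one of interest, that the stock tag set maps <UL> to BulletList, and that the node actually has a parent, none of which is guaranteed for every page.

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.tags.BulletList;
    import org.htmlparser.util.NodeList;

    public class ListSiblings
    {
        public static void main (String[] args) throws Exception
        {
            Parser parser = new Parser (args[0]);
            NodeList lists = parser.extractAllNodesThatMatch (new TagNameFilter ("UL"));
            if (0 != lists.size ())
            {
                BulletList list = (BulletList) lists.elementAt (0);
                Node parent = list.getParent ();
                if (null != parent)
                {
                    // the parent's children include the <UL> itself plus any loose text
                    // that follows it, such as the trailing "500 = Tape and Reel ..." line
                    NodeList siblings = parent.getChildren ();
                    for (int i = 0; i < siblings.size (); i++)
                        System.out.println (siblings.elementAt (i).toHtml ());
                }
            }
        }
    }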
From: Srinivas V. <sv...@al...> - 2005-12-21 13:28:21
One more thing, Derrick. It should only catch the sub-lists, if there are any, or it should behave as before.

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Srinivas Vemula
Sent: Wednesday, December 21, 2005 6:51 PM
To: htm...@li...
Subject: RE: [Htmlparser-user] Filter Help

Thank you very much. Somehow I am not able to use the tool to create that perfect filter, and I am sure it's JUST ME.

http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5213,C5249,P88749

Could you please modify the filter so that it catches all the <UL> tags for the "Features" block? Above is the URL, which is an example; this product has sub-lists for a particular list item in the main features list.

Thanks a lot

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
Sent: Wednesday, December 21, 2005 6:45 PM
To: htm...@li...
Subject: Re: [Htmlparser-user] Filter Help

Having the URL makes it easy...
It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
I've added this to the example.

Srinivas Vemula wrote:
> Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> I am attaching the web page, and here is the URL
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
> Thanks for your time and help on this
From: Srinivas V. <sv...@al...> - 2005-12-21 13:21:14
Thank you very much. Somehow I am not able to use the tool to create that perfect filter, and I am sure it's JUST ME.

http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5213,C5249,P88749

Could you please modify the filter so that it catches all the <UL> tags for the "Features" block? Above is the URL, which is an example; this product has sub-lists for a particular list item in the main features list.

Thanks a lot

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
Sent: Wednesday, December 21, 2005 6:45 PM
To: htm...@li...
Subject: Re: [Htmlparser-user] Filter Help

Having the URL makes it easy...
It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
I've added this to the example.

Srinivas Vemula wrote:
> Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> I am attaching the web page, and here is the URL
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
> Thanks for your time and help on this
From: Derrick O. <Der...@Ro...> - 2005-12-21 13:16:10
This doesn't look like an HTML problem. Are you sure HTML Parser is the tool to use here? Why not just code it as a regular file modification?

'Sam Birch' wrote:
> Hi
> I've got a load of HTML templates that I want to access via HTMLParser to alter the text in them. The text at the start of the template is of the form:
>
> !ADMIN LOG
>     !AD_PARAM TEXT = A message here
> !END_ADMIN
>
> <table>
> .
> .
> .
>
> I'm trying to replace the text "A message here" with something else. I'm trying to put together a tag to extract this, but the problem is that since there is no <> tag at the start of the HTML, HTMLParser can't identify the text I want. I've tried using "" as the mID for the tag -- but no luck. If I use Firefox's DOM explorer to view the page, it will "magically" generate the <HTML> and <BODY> tags such that the page makes sense. Is there any way for me to do something similar in HTMLParser? Or perhaps check for specific text (e.g. !ADMIN -- this text doesn't appear anywhere else in my HTML)?
>
> Any advice would be greatly appreciated!
>
> Sam
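A sketch of the "regular file modification" route, using nothing but the JDK. The class name and arguments are invented for illustration, and it assumes the !AD_PARAM directive always sits on its own line, as in Sam's example.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;

    public class TemplateFixer
    {
        // hypothetical usage: java TemplateFixer template.html "New message"
        public static void main (String[] args) throws IOException
        {
            BufferedReader in = new BufferedReader (new FileReader (args[0]));
            StringBuffer buffer = new StringBuffer ();
            String line;
            while (null != (line = in.readLine ()))
            {
                // the !AD_PARAM directive isn't HTML, so a plain line match is enough
                if (line.trim ().startsWith ("!AD_PARAM TEXT"))
                    line = line.substring (0, line.indexOf ('=') + 1) + " " + args[1];
                buffer.append (line).append ("\n");
            }
            in.close ();
            Writer out = new FileWriter (args[0]);
            out.write (buffer.toString ());
            out.close ();
        }
    }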
From: Derrick O. <Der...@Ro...> - 2005-12-21 13:15:06
Having the URL makes it easy...
It needs to ignore lists with class or id attributes. I used a NotFilter containing an OrFilter with the two HasAttributeFilters.
I've added this to the example.

Srinivas Vemula wrote:
> Thanks for the help Derrick. It is not able to filter the Features part of it, and I am ending up getting all the <UL> tags in the web page.
> I am attaching the web page, and here is the URL
>
> http://www.avagotech.com/products/product-detail.jsp?navId=H0,C2,C5212,C5255,P88706
>
> Thanks for your time and help on this
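Expressed as a stand-alone filter combination, Derrick's description (UL tags that carry neither a class nor an id attribute) might look like the following. This is only a reconstruction of the idea, not the exact filter he added to the example.

    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.NotFilter;
    import org.htmlparser.filters.OrFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;

    public class PlainLists
    {
        public static void main (String[] args) throws Exception
        {
            // <UL> tags that have neither a class nor an id attribute
            NodeFilter decorated = new OrFilter (
                new HasAttributeFilter ("class"),
                new HasAttributeFilter ("id"));
            NodeFilter filter = new AndFilter (
                new TagNameFilter ("UL"),
                new NotFilter (decorated));
            Parser parser = new Parser (args[0]);
            NodeList lists = parser.extractAllNodesThatMatch (filter);
            System.out.println (lists.toHtml ());
        }
    }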
From: 'Sam Birch' <Sam...@ca...> - 2005-12-21 12:53:18
Hi

I've got a load of HTML templates that I want to access via HTMLParser to alter the text in them. The text at the start of the template is of the form:

!ADMIN LOG
    !AD_PARAM TEXT = A message here
!END_ADMIN

<table>
.
.
.

I'm trying to replace the text "A message here" with something else. I'm trying to put together a tag to extract this, but the problem is that since there is no <> tag at the start of the HTML, HTMLParser can't identify the text I want. I've tried using "" as the mID for the tag -- but no luck. If I use Firefox's DOM explorer to view the page, it will "magically" generate the <HTML> and <BODY> tags such that the page makes sense. Is there any way for me to do something similar in HTMLParser? Or perhaps check for specific text (e.g. !ADMIN -- this text doesn't appear anywhere else in my HTML)?

Any advice would be greatly appreciated!

Sam
From: Derrick O. <Der...@Ro...> - 2005-12-21 01:59:53
// Generated by FilterBuilder. http://htmlparser.org
// [aced0005737200206f72672e68746d6c7061727365722e66696c746572732e416e6446696c74657224c30516b2b7b2120200015b000b6d5072656469636174657374001c5b4c6f72672f68746d6c7061727365722f4e6f646546696c7465723b78707572001c5b4c6f72672e68746d6c7061727365722e4e6f646546696c7465723b8f17479b1d5f7992020000787000000002737200246f72672e68746d6c7061727365722e66696c746572732e5461674e616d6546696c746572b28b2601a614890f0200014c00056d4e616d657400124c6a6176612f6c616e672f537472696e673b7870740002554c737200266f72672e68746d6c7061727365722e66696c746572732e486173506172656e7446696c746572430e73bb2cda7a4e0200025a000a6d5265637572736976654c000d6d506172656e7446696c74657274001b4c6f72672f68746d6c7061727365722f4e6f646546696c7465723b7870017371007e00007571007e0003000000027371007e0005740003444956737200276f72672e68746d6c7061727365722e66696c746572732e4861735369626c696e6746696c746572eb4819a7c54a9a2b0200014c000e6d5369626c696e6746696c74657271007e000a78707371007e00007571007e0003000000027371007e0005740003444956737200256f72672e68746d6c7061727365722e66696c746572732e4861734368696c6446696c7465720d33e5cd9f31450e0200025a000a6d5265637572736976654c000c6d4368696c6446696c74657271007e000a787001737200236f72672e68746d6c7061727365722e66696c746572732e537472696e6746696c74657207df2adf4bd4ef0c0200045a000e6d4361736553656e7369746976654c00076d4c6f63616c657400124c6a6176612f7574696c2f4c6f63616c653b4c00086d5061747465726e71007e00064c000d6d55707065725061747465726e71007e0006787001737200106a6176612e7574696c2e4c6f63616c657ef811609c30f9ec02000449000868617368636f64654c0007636f756e74727971007e00064c00086c616e677561676571007e00064c000776617269616e7471007e00067870ffffffff7400025553740002656e740000740008466561747572657371007e0020]

import org.htmlparser.*;
import org.htmlparser.filters.*;
import org.htmlparser.beans.*;
import org.htmlparser.util.*;

public class ListItems
{
    public static void main (String args[])
    {
        TagNameFilter filter0 = new TagNameFilter ();
        filter0.setName ("UL");
        TagNameFilter filter1 = new TagNameFilter ();
        filter1.setName ("DIV");
        TagNameFilter filter2 = new TagNameFilter ();
        filter2.setName ("DIV");
        StringFilter filter3 = new StringFilter ();
        filter3.setCaseSensitive (true);
        filter3.setLocale (new java.util.Locale ("en", "US", ""));
        filter3.setPattern ("Features");
        HasChildFilter filter4 = new HasChildFilter ();
        filter4.setRecursive (true);
        filter4.setChildFilter (filter3);
        NodeFilter[] array0 = new NodeFilter[2];
        array0[0] = filter2;
        array0[1] = filter4;
        AndFilter filter5 = new AndFilter ();
        filter5.setPredicates (array0);
        HasSiblingFilter filter6 = new HasSiblingFilter ();
        filter6.setSiblingFilter (filter5);
        NodeFilter[] array1 = new NodeFilter[2];
        array1[0] = filter1;
        array1[1] = filter6;
        AndFilter filter7 = new AndFilter ();
        filter7.setPredicates (array1);
        HasParentFilter filter8 = new HasParentFilter ();
        filter8.setRecursive (true);
        filter8.setParentFilter (filter7);
        NodeFilter[] array2 = new NodeFilter[2];
        array2[0] = filter0;
        array2[1] = filter8;
        AndFilter filter9 = new AndFilter ();
        filter9.setPredicates (array2);
        NodeFilter[] array3 = new NodeFilter[1];
        array3[0] = filter9;
        FilterBean bean = new FilterBean ();
        bean.setFilters (array3);
        if (0 != args.length)
        {
            bean.setURL (args[0]);
            System.out.println (bean.getNodes ().toHtml ());
        }
        else
            System.out.println ("Usage: java -classpath .:htmlparser.jar ListItems <url>");
    }
}
From: Srinivas V. <sv...@al...> - 2005-12-20 14:28:17
Hi All,

I am trying to use the filter classes in the API to match the HTML code below, but have not been successful.

<div class="block">
<div class="hd"><h2>Features</h2></div>
<div class="indent">
<UL><LI>Well Defined Spatial Radiation Patterns
<LI>Viewing Angle: 15 degrees
<LI>High Luminous Output
<LI>Color:<BR>590 nm Amber<LI>High Operating Temperature: TJLED = +130 degrees C
<LI>Superior Resistance to Moisture
<LI>Package Options:<BR> Without Lead Stand-Offs<BR>Bulk
</UL>
</div>
</div>

Basically I am interested in the data from the list (<UL>) tag. There are many blocks like these in the whole HTML, and the only way to differentiate this list is from the fact that it will have a <DIV> block as a sibling which will have "Features" as text; it will also be enclosed in another <DIV> tag, and both DIV tags will be enclosed in a <DIV class="block"> tag.

How do I use the AND and NOT filters to match the above pattern?

I appreciate your help, and thank you all for your time.
From: Derrick O. <Der...@Ro...> - 2005-12-14 13:03:28
It's just the java.util.regex stuff under the hood, so look for examples for that package.

v.sudhakarreddy ch wrote:
> Hi,
> I want to know what kinds of regular expressions are supported by HTML Parser? In the documentation only one example was given, on dates. Where can I find more examples of using regular expressions with HTML?
> --
> Thanks in advance
> sudhakar
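One way to apply that here, sketched under the assumption that only the visible text is needed: extract it with StringBean and run an ordinary java.util.regex pattern over it. (Later releases also include a RegexFilter in org.htmlparser.filters that applies a pattern to nodes directly, if your version has it.) The date pattern below is just an arbitrary example.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.htmlparser.beans.StringBean;

    public class RegexOverText
    {
        public static void main (String[] args)
        {
            // collapse the page down to its visible text
            StringBean bean = new StringBean ();
            bean.setLinks (false);
            bean.setURL (args[0]);
            String text = bean.getStrings ();

            // then use plain java.util.regex on that text
            Pattern dates = Pattern.compile ("\\d{1,2}/\\d{1,2}/\\d{4}"); // e.g. 12/14/2005
            Matcher matcher = dates.matcher (text);
            while (matcher.find ())
                System.out.println (matcher.group ());
        }
    }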
From: v.sudhakarreddy c. <sud...@gm...> - 2005-12-14 12:14:01
Hi,

I want to know what kinds of regular expressions are supported by HTML Parser? In the documentation only one example was given, on dates. Where can I find more examples of using regular expressions with HTML?

--
Thanks in advance
sudhakar
From: Derrick O. <Der...@Ro...> - 2005-12-13 03:00:08
Dink,

The text of the children can be retrieved two ways:

System.out.println (getChildren ().asString ());

or

StringBean sb = new StringBean ();
getChildren ().visitAllNodesWith (sb);
System.out.println (sb.getStrings ());

The second way has better handling of line breaks and other whitespace.

As for attributes of tags, there are a lot of ways depending on whether you want all the attributes or a particular one. Look at Tag.getAttribute (String), or Tag.getAttributesEx () along with the Attribute class.

Derrick

dink wrote:
> Hello,
> I am a beginner with the HTML parser and would like to thank the contributors to this tool.
> When I want to get the content of a table, I encounter some problems. The table I want to parse is like below:
>
> <table>
> <TR>
> <TD><b>HTML</b></TD>
> </TR>
> </table>
>
> The code used is:
>
> NodeList tables = parser.parse (new TagNameFilter ("TABLE"));
> TableTag table = (TableTag) tables.elementAt(0);  // There is only one Table
> TableRow row = table.getRows (0);                 // There is only one TR
> TableColumn column = row.getColumn();             // get the TD
> System.out.println(column.getChildren ());        // print the content of TD
>
> The output is:
> 0 tag:b
> 1 txt:HTML
> 2 end:/b
>
> Can somebody tell me how to only get the content, "HTML"? And if there are some attributes in the tag <b>, e.g. <b attribute=xxx>, how can I get the attribute value "xxx"?
> Thanks in advance.
> Dink Lo
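Put together for dink's exact snippet, a minimal sketch might look like this (1.6-era API assumed; the sample HTML is the one from the question):

    import org.htmlparser.Parser;
    import org.htmlparser.Tag;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;

    public class BoldCell
    {
        public static void main (String[] args) throws Exception
        {
            String html = "<table><TR><TD><b attribute=xxx>HTML</b></TD></TR></table>";
            Parser parser = Parser.createParser (html, "UTF-8");
            NodeList bolds = parser.extractAllNodesThatMatch (new TagNameFilter ("B"));
            Tag bold = (Tag) bolds.elementAt (0);
            // the text content of the <b> element, without the tags themselves
            System.out.println (bold.toPlainTextString ());
            // a single attribute value, or null if it isn't present
            System.out.println (bold.getAttribute ("attribute"));
        }
    }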
From: Fairy E. <en...@gm...> - 2005-12-12 15:17:14
Hello anybody,

I want to know about an XML parser that works the way HTMLParser works. I think it would be very good if we could use "boolean operations" in order to navigate, search or modify the XML file. I want an XML parser that uses filters (and the wrapping-filters feature too!) to parse XML.

Thanks for your time and help,

bye

;)
From: Fairy E. <en...@gm...> - 2005-12-12 15:11:56
I think htmlparser is good for this task; just use the tree structure of nodes this tool returns when parsing an HTML file or stream.

The hard task is defining the DTD for the XML output. But I'll do that in a naive way...
From: dink <di...@mi...> - 2005-12-10 16:34:17
Hello,

I am a beginner with the HTML parser and would like to thank the contributors to this tool.

When I want to get the content of a table, I encounter some problems. The table I want to parse is like below:

<table>
<TR>
<TD><b>HTML</b></TD>
</TR>
</table>

The code used is:

NodeList tables = parser.parse (new TagNameFilter ("TABLE"));
TableTag table = (TableTag) tables.elementAt(0);  // There is only one Table
TableRow row = table.getRows (0);                 // There is only one TR
TableColumn column = row.getColumn();             // get the TD
System.out.println(column.getChildren ());        // print the content of TD

The output is:

0 tag:b
1 txt:HTML
2 end:/b

Can somebody tell me how to only get the content, "HTML"? And if there are some attributes in the tag <b>, e.g. <b attribute=xxx>, how can I get the attribute value "xxx"?

Thanks in advance.
Dink Lo
From: Ian M. <ian...@gm...> - 2005-12-09 15:10:17
That looks something like a depth-first search algorithm for fetching next and previous nodes. I've already volunteered the possibility of breadth-first traversal to the project, so we just have to see if the people who lead the project would like to accept it, and then both could be contributed.

By the way, the code would deal with Nodes rather than Tags (the logic is tree traversal), so you wouldn't want to check whether it was a tag or not (what you'd instead do is get the next node, and loop until it matches whatever you wanted it to match). I envisaged these methods:

getNextSibling(Node currentNode, Node rootNode, boolean depthFirst)
getPreviousSibling(Node currentNode, Node rootNode, boolean depthFirst)

and as depth-first is likely to be the more common use case, wrappers:

getNextSibling(Node currentNode, Node rootNode)
getPreviousSibling(Node currentNode, Node rootNode)

and indeed, as the entire document is likely to be what we are searching:

getNextSibling(Node currentNode, boolean depthFirst)
getPreviousSibling(Node currentNode, boolean depthFirst)
getNextSibling(Node currentNode)
getPreviousSibling(Node currentNode)

Though those last ones would have to wait till the getNext/Previous node methods could deal with documents with multiple root nodes. Either that, or there ought to be a DocumentNode that holds the entire document. I'm not yet sure what the best way is.

By the way, there is an inefficiency in your code that you'd want to change, in addition to changing Tag to generic Node. Instead of:

if(tempNode.getNextSibling() != null) {
    nextNode = tempNode.getNextSibling();
    break;
}

it's more efficient to do this:

tempNode2 = tempNode.getNextSibling();
if(tempNode2 != null) {
    nextNode = tempNode2;
    break;
}

That way it only calls getNextSibling at that point once, not twice.

Kind regards,

Ian Macfarlane

On 12/9/05, Madhur Kumar Tanwani <mad...@gm...> wrote:
> Hey,
> Thanks Ian!! Great!! That was a clear-cut explanation... cool!!
>
> OK, so to suit my situation at least, I've designed and implemented code snippets which get the Previous and Next Node. I've attached code for the same with this mail.
>
> I've tested the code with many HTML pages. It works fine. In case it is useful, the code is free to use by anybody anywhere, but I expect that you would preserve the ownership details.
>
> Please, if possible, could anyone comment on the code with critiques or suggestions. One probably important thing is that I could start supporting filters in the function (something like "get me the previous link node only").
>
> I'm not sure of the procedures and standards, but if this code, with whatever tweaks are required, could make it to some version of HTML Parser, I'll be obliged. I did not post it to the HTML Dev mailing list, since I think that it would be too early to announce the code.
>
> So, HTMLParser users, I need your comments and suggestions.
> Looking forward to comments,
>
> Thanks,
>
> Ian Macfarlane wrote:
> > After that, it exits the loop, because prevSibling is now null.
> >
> > Why? Because this is the node structure (the formatting might not come out right, I'll also explain below):
> >
> > On 12/7/05, Madhur Kumar Tanwani <mad...@gm...> wrote:
> > > String : Unsubscribe
> > > Prev Sibling Txt (389[3,100],402[3,113]): Unsubscribe
> > > Next Sibling Txt (389[3,100],402[3,113]): Unsubscribe
> > > I expected that the parser would treat the <A> tag and the <IMG> just before the text "Unsubscribe" as siblings and would return those.
> --
> __________________________
> Madhur Kumar Tanwani
> mad...@gm...
> Ph.: 0253-5614792.
> __________________________
> Always remember that you are absolutely unique. Just like everyone else.
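For readers following the thread, the depth-first "next node" idea being discussed reduces to a few lines over the existing Node methods. This is only a sketch of the traversal logic, not the code Madhur attached or anything that later shipped with the parser.

    import org.htmlparser.Node;
    import org.htmlparser.util.NodeList;

    public class TreeWalk
    {
        // Depth-first "next node": the first child if there is one, otherwise the
        // next sibling of the nearest ancestor that has one; null at the end of the tree.
        public static Node getNextNode (Node currentNode)
        {
            NodeList children = currentNode.getChildren ();
            if ((null != children) && (0 != children.size ()))
                return children.elementAt (0);
            Node node = currentNode;
            while (null != node)
            {
                Node sibling = node.getNextSibling ();
                if (null != sibling)
                    return sibling;
                node = node.getParent ();
            }
            return null;
        }
    }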
From: Madhur K. T. <mad...@gm...> - 2005-12-09 05:57:31
Hey,

Thanks Ian!! Great!! That was a clear-cut explanation... cool!!

OK, so to suit my situation at least, I've designed and implemented code snippets which get the Previous and Next Node. I've attached code for the same with this mail.

I've tested the code with many HTML pages. It works fine. In case it is useful, the code is free to use by anybody anywhere, but I expect that you would preserve the ownership details.

Please, if possible, could anyone comment on the code with critiques or suggestions. One probably important thing is that I could start supporting filters in the function (something like "get me the previous link node only").

I'm not sure of the procedures and standards, but if this code, with whatever tweaks are required, could make it to some version of HTML Parser, I'll be obliged. I did not post it to the HTML Dev mailing list, since I think that it would be too early to announce the code.

So, HTMLParser users, I need your comments and suggestions.
Looking forward to comments,

Thanks,

Ian Macfarlane wrote:
> After that, it exits the loop, because prevSibling is now null.
>
> Why? Because this is the node structure (the formatting might not come out right, I'll also explain below):
>
> On 12/7/05, Madhur Kumar Tanwani <mad...@gm...> wrote:
> > String : Unsubscribe
> > Prev Sibling Txt (389[3,100],402[3,113]): Unsubscribe
> > Next Sibling Txt (389[3,100],402[3,113]): Unsubscribe
> > I expected that the parser would treat the <A> tag and the <IMG> just before the text "Unsubscribe" as siblings and would return those.

--
__________________________
Madhur Kumar Tanwani
mad...@gm...
Ph.: 0253-5614792.
__________________________
Always remember that you are absolutely unique. Just like everyone else.
From: JeffJie <Je...@bo...> - 2005-12-09 04:57:27
Thank you. I have added the code to my main program and it seems to work, but in fact it doesn't take effect in the right place.

Another block of my code:

Page page = new Page(manager.openConnection(url));
page.setEncoding(Global.PAGE_ENCODING);
ps = new Parser(new Lexer(page));

After adding the timeout, when a connection timeout occurs a ParserException is thrown:

Exception getting input stream from http://localhost:8081/test/count

Then, looking back at the code I pasted here last time:

fb.setParser(ps);

the program still stops here sometimes. It seems the timeout doesn't work. Could you please help me find out the answer? Thank you!

---------- Original Message ----------------------------------
From: Derrick Oswald <Der...@Ro...>
Reply-To: htm...@li...
Date: Tue, 20 Sep 2005 07:39:56 -0400

> It's probably not timing out on the fetch from the page.
> Try adding this to your main program:
>
> System.setProperty ("sun.net.client.defaultReadTimeout", "7000");
> System.setProperty ("sun.net.client.defaultConnectTimeout", "7000");
>
> JeffJie wrote:
> > Hello, I have a problem when working with htmlparser.
> > My program sometimes stops when invoking the API of the htmlparser.
> > I wrote a function to get the content of a specific tag; the tag has one attribute and its own value.
> > Here's part of my code:
> >
> > private NodeList filterResult(Parser ps, String key_name, String attr_name, String attr_value) {
> >     NodeFilter name = new TagNameFilter(key_name);
> >     NodeFilter attr = new HasAttributeFilter(attr_name, attr_value);
> >     NodeFilter tag = new AndFilter(name, attr);
> >     NodeFilter[] nf = new NodeFilter[1];
> >     nf[0] = tag;
> >     FilterBean fb = new FilterBean();
> >     fb.setFilters(nf);
> >     if (log.isInfoEnabled()) {
> >         log.info("now setting the parser");
> >     }
> >     fb.setParser(ps); // program stops here
> >     if (log.isInfoEnabled()) {
> >         log.info("finished form the filter");
> >     }
> >     return fb.getNodes();
> > }
> >
> > The argument "ps" had been initialized like this:
> >
> > String url = "http://foo.bar.com";
> > ConnectionManager manager = new ConnectionManager();
> > Page page = new Page(manager.openConnection(url));
> > page.setEncoding(Global.PAGE_ENCODING);
> > Parser ps = new Parser(new Lexer(page));
> >
> > The problem comes up at the marked line:
> > fb.setParser(ps);
> > The log before this line executes normally, but the log after the line sometimes never executes.
> > Does it have something to do with the firewall, or is there another reason? I need your help! Thanks.
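One more option worth noting alongside the system properties: set the timeouts on the individual connection before the Page is built. This is a sketch only; it assumes ConnectionManager.openConnection() hands back the URLConnection before the response has been read (if your version connects eagerly, set the sun.net.client.* properties before it is called instead), and setConnectTimeout/setReadTimeout need Java 5 or later.

    import java.net.URLConnection;
    import org.htmlparser.Parser;
    import org.htmlparser.http.ConnectionManager;
    import org.htmlparser.lexer.Lexer;
    import org.htmlparser.lexer.Page;

    public class TimedFetch
    {
        public static Parser open (String url) throws Exception
        {
            ConnectionManager manager = new ConnectionManager ();
            URLConnection connection = manager.openConnection (url);
            // per-connection limits, so they apply to this request regardless of
            // when the sun.net.client.default* properties were set
            connection.setConnectTimeout (7000);
            connection.setReadTimeout (7000);
            Page page = new Page (connection);
            return new Parser (new Lexer (page));
        }
    }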
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-user >> >> >> > > > >------------------------------------------------------- >SF.Net email is sponsored by: >Tame your development challenges with Apache's Geronimo App Server. Download >it for free - -and be entered to win a 42" plasma tv or your very own >Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php >_______________________________________________ >Htmlparser-user mailing list >Htm...@li... >https://lists.sourceforge.net/lists/listinfo/htmlparser-user > ________________________________________________________________ Sent via the WebMail system at botwave.com |