htmlparser-user Mailing List for HTML Parser


Thanks a lot, it worked!
Sincerely,
Ope

>From: htm...@li...
>Reply-To: htm...@li...
>To: htm...@li...
>Subject: Htmlparser-user digest, Vol 1 #228 - 1 msg
>Date: Sun, 30 Mar 2003 12:09:36 -0800
>
>Send Htmlparser-user mailing list submissions to
>	htm...@li...
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>or, via email, send a message with subject or body 'help' to
>	htm...@li...
>
>You can reach the person managing the list at
>	htm...@li...
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Htmlparser-user digest..."
>
>
>Today's Topics:
>
>    1. Re: Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs (Somik Raha)
>
>--__--__--
>
>Message: 1
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs
>Date: Sat, 29 Mar 2003 22:18:18 -0800
>Reply-To: htm...@li...
>
>FYI, I've just found that the CompositeTagScanner had a bug, due to which
>the filters were not being set. Ope -->
>node.collectInto(nodeList, LinkTag.LINK_TAG_FILTER);
>
>will work in the next integration release.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Sent: Thursday, March 27, 2003 2:38 PM
>Subject: RE: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs
>
>
> > Instead of this,
> >
> > node.collectInto(nodeList,LinkTag.LINK_TAG_FILTER);
> >
> > use:
> >
> > node.collectInto(nodeList,LinkTag.class);
> >
> > Regards,
> > Somik
> > --- Marc Novakowski <ma...@ke...> wrote:
> > > Try removing the following line from your code:
> > >
> > > nodeList.add(node);
> > >
> > > It's most likely adding non-LinkTag nodes into
> > > nodeList which causes the ClassCastException later
> > > on.
> > >
> > > Marc
> > >
> > > -----Original Message-----
> > > From: ope tomori [mailto:op...@ho...]
> > > Sent: Thursday, March 27, 2003 1:31 PM
> > > To: htm...@li...
> > > Subject: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs
> > >
> > >
> > > I figured out the part using the nodeList.collectInto. My debug output
> > > shows the right output, but when I try to process the link information,
> > > I get this error (this is part of the error):
> > >
> > > Exception occurred during event dispatching:
> > > java.lang.ClassCastException: org.htmlparser.tags.DoctypeTag
> > >
> > >
> > > Thanks in advance for your help
> > >
> > > Sincerely,
> > > Ope T.
> > >
> > >
> > > This is my code below:
> > > try{
> > > //create the parser with the url to be parsed
> > > parser = new Parser(urlAddressComplete,new DefaultParserFeedback());
> > > parser.registerScanners();
> > > nodeList = new NodeList();
> > >
> > > //to extract all the embedded links and images
> > >
> > > for (NodeIterator e = parser.elements();e.hasMoreNodes();) {
> > > Node node = (Node)e.nextNode();
> > > nodeList.add(node);
> > >
> > > //node.collectInto(nodeList,ImageTag.IMAGE_TAG_FILTER);
> > > node.collectInto(nodeList,LinkTag.LINK_TAG_FILTER);
> > >
> > > }//for
> > >
> > > System.out.print("CHECKING NODES.. " + nodeList.toString()+ "\n");
> > >
> > > //now process the links and images
> > > //this is the part that doesn't seem to work
> > >
> > > for (SimpleNodeIterator e = nodeList.elements();e.hasMoreNodes();) {
> > > LinkTag linkTag = (LinkTag)e.nextNode();
> > >
> > > //put the links and their texts into vectors
> > > allTextLinkVector.addElement(linkTag.getLinkText());
> > > allLinkVector.addElement(linkTag.getLink());
> > > }
> > > // System.out.print( "All Links " + "Size: "+ allTextLinkVector.size() + " "+ allTextLinkVector.toString()+ "\n");
> > >
> > > }//inner try
> > >
> > > catch (ParserException e) {
> > > System.err.println("Error, could not create parser object");
> > > e.printStackTrace();
> > > }//catch
> > > }// outer try
> > > catch(IOException ex) { ex.printStackTrace(); }
> > >
> > >
> > >
> > >
> > >
> > >
> > > >From: htm...@li...
> > > >Reply-To: htm...@li...
> > > >To: htm...@li...
> > > >Subject: Htmlparser-user digest, Vol 1 #226 - 2 msgs
> > > >Date: Thu, 27 Mar 2003 12:49:39 -0800
> > > >
> > > >Send Htmlparser-user mailing list submissions to
> > > >htm...@li...
> > > >
> > > >To subscribe or unsubscribe via the World Wide Web, visit
> > > >https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > > >or, via email, send a message with subject or body 'help' to
> > > >htm...@li...
> > > >
> > > >You can reach the person managing the list at
> > > >htm...@li...
> > > >
> > > >When replying, please edit your Subject line so it is more specific than
> > > >"Re: Contents of Htmlparser-user digest..."
> > > >
> > > >
> > > >Today's Topics:
> > > >
> > > >1. Help with method --> node.collectInto() (ope tomori)
> > > >2. RE: Help with method --> node.collectInto() (Marc Novakowski)
> > > >
> > > >-- __--__--
> > > >
> > > >Message: 1
> > > >From: "ope tomori"
> > > >To: htm...@li...
> > > >Date: Thu, 27 Mar 2003 15:00:17 +0000
> > > >Subject: [Htmlparser-user] Help with method --> node.collectInto()
> > > >Reply-To: htm...@li...
> > > >
> > > >
> > > >Hi, I'm trying to use the method node.collectInto(...) to extract
> > > >embedded links and images on webpages. I'm using the latest integration
> > > >release, which means it's now Parser, not HTMLParser, NodeIterator, etc.
> > > >and all the other changes.
> > > >
> > > >I followed the sample code:
> > > >
> > > >HTMLParser parser = new HTMLParser("http://www.yahoo.com");
> > > >parser.registerScanners();
> > > >int i = 0;
> > > >Vector collectionVector = new Vector();
> > > >HTMLNode node;
> > > >for (HTMLEnumeration e = parser.elements();e.hasMoreNodes();) {
> > > >node = e.nextHTMLNode();
> > > >node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
> > > >}
> > > >// All items in the collection vector should be links
> > > >for (Enumeration e = collectionVector.elements();e.hasMoreElements();) {
> > > >HTMLLinkTag linkTag = (HTMLLinkTag)e.nextElement();
> > > >// you can now process the links as you like
> > > >}
> > > >***********************************************************
> > > >
> > > >I'm getting an error because this line:
> > > >
> > > >node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
> > > >
> > > >requires a NodeList and not a Vector. I've tried changing it (creating a
> > > >NodeList instead of a Vector) without any success.
> > > >
> > > >Can you please help me!!
> > > >
> > > >Thanks Ope
> > > >
> > > >
> > >
> > > >
> > > >
> > > >-- __--__--
> > > >
> > > >Message: 2
> > > >Subject: RE: [Htmlparser-user] Help with method --> node.collectInto()
> > > >Date: Thu, 27 Mar 2003 08:30:54 -0800
> > > >From: "Marc Novakowski"
> > > >To:
> > > >Reply-To: htm...@li...
> > > >
> > > >If you can paste the actual code you're trying to compile, I'd be more
> > > >than happy to take a look at it.
> > > >
> > > >Marc
> > > >
> > > >-----Original Message-----
> > > >From: ope tomori [mailto:op...@ho...]
> > > >Sent: Thursday, March 27, 2003 7:00 AM
> > > >To: htm...@li...
> > > >Subject: [Htmlparser-user]
> > === message truncated ===
> >
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
>
>
>--__--__--
>
>_______________________________________________
>Htmlparser-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
>End of Htmlparser-user Digest
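
For anyone reading this thread later: the ClassCastException discussed above can be reproduced without the library. What follows is only a minimal sketch in plain Java. The classes Node, LinkTag and DoctypeTag, the class-based collectInto, and the file names "a.html" and "b.html" are hypothetical stand-ins that imitate the shape of the calls quoted in the thread; they are not the real org.htmlparser classes. The sketch shows why adding every node to the list before collecting links makes the later (LinkTag) cast fail, and why collecting only LinkTag instances avoids it.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectIntoSketch {
    // Hypothetical stand-in for a parsed node; NOT the real org.htmlparser Node.
    static class Node {
        final List<Node> children = new ArrayList<>();

        // Adds this node and every descendant that is an instance of `type` to `out`.
        void collectInto(List<Node> out, Class<? extends Node> type) {
            if (type.isInstance(this)) {
                out.add(this);
            }
            for (Node child : children) {
                child.collectInto(out, type);
            }
        }
    }

    // Hypothetical stand-ins for the tag classes named in the thread.
    static class DoctypeTag extends Node {}

    static class LinkTag extends Node {
        final String link;
        LinkTag(String link) { this.link = link; }
    }

    // A tiny made-up page: a doctype followed by a container holding two links.
    static List<Node> samplePage() {
        Node body = new Node();
        body.children.add(new LinkTag("a.html"));
        body.children.add(new LinkTag("b.html"));
        List<Node> top = new ArrayList<>();
        top.add(new DoctypeTag());
        top.add(body);
        return top;
    }

    // Mirrors the posted loop: each top-level node is added, then its links are collected.
    static List<Node> collectWithAdd(List<Node> page) {
        List<Node> out = new ArrayList<>();
        for (Node node : page) {
            out.add(node); // the line Marc suggested removing
            node.collectInto(out, LinkTag.class);
        }
        return out;
    }

    // Same loop without the extra add: only LinkTag instances end up in the list.
    static List<Node> collectLinksOnly(List<Node> page) {
        List<Node> out = new ArrayList<>();
        for (Node node : page) {
            node.collectInto(out, LinkTag.class);
        }
        return out;
    }

    // Casts every element to LinkTag, as the second loop in the posted code does.
    static List<String> links(List<Node> nodes) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            result.add(((LinkTag) n).link); // fails on any element that is not a LinkTag
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> page = samplePage();
        try {
            links(collectWithAdd(page));
        } catch (ClassCastException e) {
            System.out.println("mixed list failed with ClassCastException");
        }
        System.out.println(links(collectLinksOnly(page)));
    }
}
```

In this sketch the first list starts with the DoctypeTag stand-in, so the cast fails on the very first element, which matches the class named in the error message quoted above.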



htmlparser-user — The user mailing list for users of the htmlparser library