htmlparser-user Mailing List for HTML Parser
Brought to you by:
derrickoswald
From: semeera B <sem...@ya...> - 2009-08-31 10:06:48
|
What is an HTML parser? How do I create one? |
From: Lee G. <20...@le...> - 2009-08-31 08:03:01
|
Thanks - it was a version error on my side, I think. |
From: Derrick O. <der...@gm...> - 2009-08-30 14:43:36
|
The tools.jar file comes with the JDK. It's in the ext directory I think. It's probably a version issue - it's looking for an older version of the JDK tools than you have. You may be able to edit the build.xml and change the version. |
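To check which tools.jar the build can actually see, a small program along these lines may help. It is only a sketch, assuming the JDK 8 and earlier layout, where tools.jar normally sits in the JDK's lib directory and java.home often points at the jre subdirectory (newer JDKs no longer ship the file). The class name ToolsJarLocator and its method are made up for this illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HTML Parser or Maven.
final class ToolsJarLocator {

    // Places where tools.jar is usually found, relative to a java.home directory.
    static List<Path> candidates(Path javaHome) {
        return Arrays.asList(
                javaHome.resolve("lib").resolve("tools.jar"),          // java.home is the JDK root
                javaHome.resolveSibling("lib").resolve("tools.jar")); // java.home is <jdk>/jre
    }

    public static void main(String[] args) {
        Path javaHome = Paths.get(System.getProperty("java.home"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (Path p : candidates(javaHome)) {
            System.out.println(p + (Files.exists(p) ? "  (found)" : "  (missing)"));
        }
    }
}
```

If both candidates are reported missing, the build is probably running on a JRE, or on a JDK release that does not match the version named in the missing artifact.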
From: Lee G. <20...@le...> - 2009-08-30 08:24:06
|
Sorry if this is a FAQ, I couldn't see it mentioned on the site. Since using HTML Parser, I've been getting the following from Maven: "Missing artifact com.sun:tools:jar:1.6.0:system" I've tried adding the extra build profile mentioned on the Maven FAQ page, to no avail. Could someone please help? |
From: tamizh v. <tam...@gm...> - 2009-08-29 13:57:05
|
I am going to do a project based on HtmlParser, so as a first step I need to learn HtmlParser. I went through the documentation, but it is difficult to keep track of the context while learning from it. Could you please provide some tutorials for HtmlParser so that I can learn it well? Thanks in advance. |
From: Derrick O. <der...@gm...> - 2009-08-28 08:48:05
|
You don't need a parser. Just get the text directly:

    URL url = new URL ("http://www.example.com/"); // placeholder address
    URLConnection con;
    InputStream in;
    con = url.openConnection ();
    con.connect ();
    in = con.getInputStream ();

then do what you want with the contents. |
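The single-function form asked for in the question could be wrapped around those same calls roughly as follows. This is a minimal sketch using only the JDK; the class name PageFetcher and the method getHtmlSource are hypothetical, and the caller is assumed to know the page's character encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;

// Hypothetical helper, not part of HTML Parser.
final class PageFetcher {

    // Reads everything the URL returns and decodes it with the given charset.
    static String getHtmlSource(URL url, Charset charset) throws IOException {
        URLConnection con = url.openConnection();
        con.connect();
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        }
    }
}
```

A call would then look like PageFetcher.getHtmlSource(url, StandardCharsets.UTF_8). Passing the charset explicitly is a simplification; a real crawler would probably read it from the Content-Type header instead.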
From: Neftali P. <pap...@ya...> - 2009-08-28 07:43:34
|
Hi, good day. I've been trying to find a function in this library that can return the HTML text of a web page as a string. I know java.net.URLConnection can provide me with it, but it's better for me to just use a single function, say getHTMLSource, that returns the HTML text of a URL. Please let me know if it's possible here, with sample code :) Thanks in advance. Kind Regards, nef |
From: Derrick O. <der...@gm...> - 2009-08-24 15:42:55
|
You probably want the text that you can get from the StringBean: http://htmlparser.sourceforge.net/javadoc/index.html. Or if you really want the tags too, you can use toHtml(). |
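For comparison, the idea of collecting only the text that sits inside a given tag can be sketched with the HTML parser bundled in the JDK (javax.swing.text.html.parser.ParserDelegator). This is a stand-in that runs without extra jars, not HTML Parser's StringBean or toHtml(); the names TagText and textInside are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser, not on HTML Parser.
final class TagText {

    // Returns the text found inside each occurrence of the wanted tag.
    static List<String> textInside(Reader html, HTML.Tag wanted) throws IOException {
        List<String> found = new ArrayList<>();
        new ParserDelegator().parse(html, new HTMLEditorKit.ParserCallback() {
            private int depth = 0;                        // > 0 while inside the wanted tag
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == wanted) {
                    depth++;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == wanted && depth > 0 && --depth == 0) {
                    found.add(text.toString());
                    text.setLength(0);
                }
            }
        }, true);
        return found;
    }
}
```

Calling TagText.textInside(reader, HTML.Tag.STRONG) gives one string per STRONG element, which is roughly what the question was after.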
From: Agrawal A. <agr...@st...> - 2009-08-24 12:53:30
|
Dear Users,

I am quite new to this library. I want to use the function getStringText() from CompositeParser class. I don't know how I can use it. I am doing the following:

    Parser parser = new Parser (urlString);
    NodeList list = new NodeList ();
    NodeFilter filter = new TagNameFilter ("STRONG");

    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        (e.nextNode ()).collectInto (list, filter);

Can you help me find a way to typecast, or something similar, to get the getStringText() function to work?

Thank you very much
Ashish |
From: Neftali P. <pap...@ya...> - 2009-08-22 00:59:41
|
Good day! I just woke up; it's 8:30 in the morning. I'm very glad to have already gotten a reply from this organization with very helpful information. I will look at this later this morning, as I have a seminar to attend at the university. Thank you very much! I really appreciate this help :) I will check here from time to time if I get hung up on a problem regarding the topic. Respectfully, neftali |
From: Derrick O. <der...@gm...> - 2009-08-21 20:56:24
|
Have a look at org.htmlparser.beans.HTMLLinkBean <http://htmlparser.sourceforge.net/javadoc/index.html>. At the bottom of the source file <http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup> is a commented out main program to get you started. |
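As a rough picture of what link collection involves, here is a sketch that pulls the href of every anchor out of a page and resolves it against the page's address. It deliberately uses the parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) so that it runs without extra jars; it is not the bean mentioned above, and the names LinkCollector and links are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser, not on HTML Parser.
final class LinkCollector {

    // Collects the href of every <a> tag, resolved against the page's own address.
    static List<URI> links(Reader html, URI base) throws IOException {
        List<URI> found = new ArrayList<>();
        new ParserDelegator().parse(html, new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return;
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return; // a named anchor without a target
                }
                try {
                    found.add(base.resolve(href.toString().trim()));
                } catch (IllegalArgumentException badHref) {
                    // Not a syntactically valid URI; skip it in this sketch.
                }
            }
        }, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample page and example.com addresses are placeholders.
        String page = "<html><body><a href=\"/a.html\">A</a> "
                + "<a href=\"http://example.org/b\">B</a></body></html>";
        URI base = URI.create("http://example.com/dir/page.html");
        for (URI link : links(new StringReader(page), base)) {
            System.out.println(link);
        }
    }
}
```

A crawler would feed each returned address back into a queue of pages to visit, keeping a set of addresses already seen so that it does not loop.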
From: Neftali P. <pap...@ya...> - 2009-08-21 17:42:44
|
Hi everyone.

I am Neftali Papelleras, an Engineering student from University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The title of my project is A Web Extraction Tool to Monitor Websites, and it is implemented in Java. I am still in the first month of this one-year thesis project, and still at the information gathering stage.

The first question I need to answer is how to create a Java-based web crawler. Next is how to retrieve the web contents of every web page. And lastly, how to retrieve links from a given web source. The first thing that came to my mind was to use Java RegEx to retrieve the links from a given web source. But now I understand it's not the right way to do it. That's why I came to HTML Parser, because I knew this is the right way.

I know Java, but not at an advanced level; I just know the concepts. Though I have created several programs already (the last was a chat system), I am still not confident in my Java skills. But I am very eager to learn, and I am starting now, again.

I have already downloaded the 1.6 version of HTML Parser and have browsed through the different folders and files. I attempted to create a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve a link from a given web page source/HTML text. I can follow the program, and it will eventually lead me to an understanding of this API.

Looking forward to a good response from this organization.

Respectfully, neftali |
From: Derrick O. <der...@gm...> - 2009-08-19 17:18:53
|
Have a look at the mainline in Parser.java: http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/Parser.java?revision=8&view=markup That program prints it out, but the result of parser.parse (filter) is a NodeList, which is your (nested) DOM tree. Also have a look for other main methods in the code. |
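To make the "nested tree" idea concrete, the sketch below builds a small tree of element names from a page. It is an assumption-laden illustration: it uses the JDK's own Swing HTML parser rather than HTML Parser's NodeList, it ignores text and attributes, and the names SimpleTree, Element, parse and find are made up here.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical tree builder on top of the JDK's Swing HTML parser.
final class SimpleTree {

    // One element of the tree: a tag name plus its child elements.
    static final class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }
    }

    // Parses the page and returns a synthetic "#document" root element.
    static Element parse(Reader html) throws IOException {
        Element root = new Element("#document");
        Deque<Element> open = new ArrayDeque<>(); // elements that are still open
        open.push(root);
        new ParserDelegator().parse(html, new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Element e = new Element(tag.toString());
                open.peek().children.add(e);
                open.push(e);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Tags without content (br, img, ...) become leaves.
                open.peek().children.add(new Element(tag.toString()));
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (open.size() > 1) {
                    open.pop();
                }
            }
        }, true);
        return root;
    }

    // Depth-first search for the first element with the given name, or null.
    static Element find(Element from, String name) {
        if (from.name.equals(name)) {
            return from;
        }
        for (Element child : from.children) {
            Element hit = find(child, name);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }
}
```

The JDK parser may add implied html, head and body elements of its own, so code that walks the tree is safer searching by name, as find does, than relying on fixed positions.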
From: tamizh v. <tam...@gm...> - 2009-08-19 15:12:17
|
I am a newbie to HTML parsing. I know both Java and HTML well. I would like to construct a DOM tree from the HTML code of a web page. It would be helpful if someone could explain how to get started and kindly provide some tutorial or article links. Please provide sample programs if possible. Thanks in advance. |