htmlparser-user Mailing List for HTML Parser
Brought to you by:
derrickoswald
From: william l. <wil...@ya...> - 2007-08-28 04:27:30
I have HTML like the following:

    "<li>info</li> <li>info</li> <li>info</li> <li>info</li>"

or:

    "<a title="taught" href="/https/sourceforge.net/index.html" rel="section">info</a> info <a title="research-led" href="/https/sourceforge.net/index.html" rel="section">info</a> info."

How can I extract each of these "info" values in a loop? Thanks in advance!
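HTML Parser's filters (e.g. TagNameFilter) would be the idiomatic answer on this list; as a dependency-free illustration of the extraction loop the poster is after, here is a JDK-only regex sketch (the class name and sample strings are invented for the example, and a regex is fragile on real-world HTML):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the text content of every <li> element.
// A regex is used here only so the sketch runs with the JDK alone;
// a proper solution would walk the parse tree with org.htmlparser filters.
public class LiTextExtractor {
    private static final Pattern LI = Pattern.compile("<li>(.*?)</li>");

    public static List<String> extract(String html) {
        List<String> out = new ArrayList<String>();
        Matcher m = LI.matcher(html);
        while (m.find()) {          // loop over every repeated match
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<li>info</li> <li>info2</li> <li>info3</li>";
        System.out.println(extract(html)); // prints [info, info2, info3]
    }
}
```

The same while-find loop works for the `<a>` case by matching the anchor tag instead.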
From: Derrick O. <der...@ro...> - 2007-08-24 22:29:07
You probably have to hit the login page first.
Then use the same ConnectionManager to access the desired page.

----- Original Message ----
From: "mic...@Ta..." <mic...@Ta...>
To: htmlparser user list <htm...@li...>
Sent: Friday, August 24, 2007 10:36:05 AM
Subject: Re: [Htmlparser-user] How to login to web page

Derrick,
I'm trying this approach (strictly HtmlParser) as well, but I can't get
logged in. The array of URLs is that of a page shown when not logged in.

Can you suggest anything else?
Thanks,
Mick

    URL[] urlArray;

    ConnectionManager connectionManager = new ConnectionManager();
    url = new URL("www.someloginpage.com");
    connectionManager.openConnection(url);

    connectionManager.setRedirectionProcessingEnabled(true);
    connectionManager.setCookieProcessingEnabled(true);
    connectionManager.setUser(USER_NAME);
    connectionManager.setPassword(PASSWORD);

    // go to link with stuff
    url = new URL("a page beyond the login page");
    connectionManager.openConnection(url);
    linkBean.setConnection(connectionManager.openConnection(url));
    urlArray = linkBean.getLinks(); // get all links

---------------------------------------------------------------------
> You might try setRedirectionProcessingEnabled(true). Often the first URL
> is only a gateway.
> Also, it's setCookieProcessingEnabled(true), not addCookies.
From: <mic...@Ta...> - 2007-08-24 14:36:12
Derrick,
I'm trying this approach (strictly HtmlParser) as well, but I can't get
logged in. The array of URLs is that of a page shown when not logged in.

Can you suggest anything else?
Thanks,
Mick

    URL[] urlArray;

    ConnectionManager connectionManager = new ConnectionManager();
    url = new URL("www.someloginpage.com");
    connectionManager.openConnection(url);

    connectionManager.setRedirectionProcessingEnabled(true);
    connectionManager.setCookieProcessingEnabled(true);
    connectionManager.setUser(USER_NAME);
    connectionManager.setPassword(PASSWORD);

    // go to link with stuff
    url = new URL("a page beyond the login page");
    connectionManager.openConnection(url);
    linkBean.setConnection(connectionManager.openConnection(url));
    urlArray = linkBean.getLinks(); // get all links

---------------------------------------------------------------------
> You might try setRedirectionProcessingEnabled(true). Often the first URL
> is only a gateway.
> Also, it's setCookieProcessingEnabled(true), not addCookies.
From: <mic...@Ta...> - 2007-08-24 13:35:25
Thanks for all the information!

In your first response to my post, in your code, you call these methods:

    getCookiesArrayList(client)
    client.getResponseHeaders()
    getRequestHeaders()

I removed these calls, but I guess now I need to use them. Can you post these
methods?
From: Mattia T. <mat...@gm...> - 2007-08-24 07:00:24
2007/8/24, mic...@ta... <mic...@ta...>:
>
> Mattia,
>
> In my original post, I showed this code that uses HtmlParser to connect to
> a web page and get all links from that page.
>
>     url = new URL("http://www.google.com");
>     urlConnection = url.openConnection();
>     ConnectionManager connectionManager = new ConnectionManager();
>     connectionManager.setRedirectionProcessingEnabled(true);
>     linkBean.setConnection(connectionManager.openConnection(url));
>     urlArray = linkBean.getLinks(); // get all links

OK, so this works for you, and you are in exactly the same situation I was in
some months ago. I needed to parse some pages, and when I faced pages
protected by a login I used HttpClient. Sorry for confusing you a bit! :o)

> My problem was that I couldn't get a page that required a login. You
> replied with some code, which I have modified a bit (below). This works,
> but it is not using HtmlParser. It uses Apache Commons.
>
> Can I get the HtmlParser code to work with the Apache Commons code? In
> particular, can I use the connection established with the HttpClient (from
> Apache Commons) in the HtmlParser code? If so, how?

The site I'm working on requires 2 cookies to be sent to pages protected by
login, so the first step is logging in with the HttpClient commons lib, taking
the cookies and sending them inside the HTTP headers to the second page (the
page protected by login), so I can enter and parse that second page.

I think you have to investigate whether the site you are trying to enter has a
particular system to tell whether you are logged in or not. In Firefox,
right-click on the page and select "Page Info"; you will see headers, forms
and so on. Let's see if there are cookies.

Supposing you are logged in and you took a cookie that tells the site you are
logged in, I use the HtmlParser in this way (similar to what you do in your
code):

    public Hashtable getMySecondPage(String url, ArrayList cookies,
            Header[] headers) {
        logger.info("url: " + url);
        try {
            // I pass headers and cookies when making the request, so the
            // site sees that I'm logged in
            setUpConnectionManager(cookies, headers);
            Parser parser = new Parser(url);
            NodeList nodelist = parser.parse(null);
            for (int i = 0; i < nodelist.size(); i++) {
                System.out.print(nodelist.toHtml());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

If the site doesn't use cookies, just try to make 2 requests sequentially:
FIRST the login call, then the call to the page protected by login.

Cheers

Mattia
From: <mic...@Ta...> - 2007-08-23 22:46:47
Mattia,

In my original post, I showed this code that uses HtmlParser to connect to a
web page and get all links from that page.

    url = new URL("http://www.google.com");
    urlConnection = url.openConnection();
    ConnectionManager connectionManager = new ConnectionManager();
    connectionManager.setRedirectionProcessingEnabled(true);
    linkBean.setConnection(connectionManager.openConnection(url));
    urlArray = linkBean.getLinks(); // get all links

My problem was that I couldn't get a page that required a login. You replied
with some code, which I have modified a bit (below). This works, but it is
not using HtmlParser. It uses Apache Commons.

Can I get the HtmlParser code to work with the Apache Commons code? In
particular, can I use the connection established with the HttpClient (from
Apache Commons) in the HtmlParser code? If so, how?

Thanks,
Mick

--------------------------------------------------------------------------

    import java.io.IOException;
    import java.util.logging.*;
    import java.util.ArrayList;
    import org.apache.commons.httpclient.NameValuePair;

    /**
     * WebScraper2
     */
    public class WebScraper2 {

        private Logger logger;
        private HttpClientUtil httpClientUtil;
        private String loginURL;

        /**
         * constructor
         */
        public WebScraper2() {
            createLogger();
            loginURL = "https://www.ctslink.com/login.do";
            httpClientUtil = new HttpClientUtil();
            login();
        }

        /*
         * login to site
         */
        public String login() {
            String responseString = null;
            //logger.info(this.getClass().getName() + " - login");
            try {
                ArrayList<NameValuePair> parameters = new ArrayList<NameValuePair>();
                parameters.add(new NameValuePair("username", "joeUser"));
                parameters.add(new NameValuePair("password", "somePassword"));

                int response = httpClientUtil.submitPostForm(this.loginURL, "",
                        parameters, null);
                logger.info("Response = " + response);
                parameters.clear();
            } catch (Exception e) {
                logger.warning(" LOGIN PROBLEM!");
                e.printStackTrace();
            }
            return responseString;
        }

        /*
         * create the logger
         */
        public void createLogger() {
            // Get a logger; the logger is automatically created if
            // it doesn't already exist
            try {
                // Create a file handler that writes log records to a file
                FileHandler handler = new FileHandler("webscraper.log");
                handler.setFormatter(new SimpleFormatter()); // plain text, not xml

                // Add to the desired logger
                logger = Logger.getLogger("webscraper.Webscraper");
                logger.addHandler(handler);
                logger.setLevel(Level.INFO);
            } catch (IOException e) {
                System.out.println("WebScraper2:createLogger(): Error creating logger");
            }
        }

        /**
         * @param args
         */
        public static void main(String[] args) {
            new WebScraper2();
            System.out.println("Done");
        }
    }

---------------------------------------------------------------------

    import java.io.*;
    import java.util.ArrayList;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.*;
    import org.apache.commons.httpclient.cookie.CookiePolicy;
    import org.apache.commons.httpclient.NameValuePair;

    /**
     * HttpClientUtil
     */
    public class HttpClientUtil extends HttpClient {

        private PostMethod postMethod;

        /**
         * constructor
         */
        public HttpClientUtil() {
        }

        /**
         * submitPostForm
         *
         * @param relativeUrl
         * @param formName
         * @param params
         * @param requestHeaders
         * @return
         */
        public int submitPostForm(String relativeUrl, String formName,
                ArrayList<NameValuePair> params, Header[] requestHeaders) {
            BufferedReader bufferedReader = null;
            int statusCode = -999;

            byte[] result = null;
            try {
                NameValuePair[] data = null;
                if (params != null) {
                    data = new NameValuePair[params.size()];
                    for (int i = 0; i < params.size(); i++) {
                        data[i] = (NameValuePair) params.get(i);
                    }
                }
                PostMethod method = new PostMethod(relativeUrl);
                this.postMethod = method;
                method.getParams().setCookiePolicy(CookiePolicy.RFC_2109);

                if (params != null) {
                    method.addParameters(data);
                }
                statusCode = this.executeMethod(method);

                bufferedReader = new BufferedReader(new
                        InputStreamReader(method.getResponseBodyAsStream()));
                String readLine;
                while ((readLine = bufferedReader.readLine()) != null) {
                    System.out.println(readLine);
                }
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
            return statusCode;
        }
    }
From: Mattia T. <mat...@gm...> - 2007-08-23 21:07:05
Ok, PERFECT!

200 is the HTTP code that stands for "page successfully reached or loaded".
Search for "HTTP status codes" to look at the others; you'll need them: 404 is
page not found, 500 is server error...

Good job.

Mattia

2007/8/23, mic...@ta... <mic...@ta...>:
>
> I think I got it.
> The 'response' returned is the HTML code of loginURL.
> The statusCode returned = 200
>
> Did it work? (i.e., did it log in successfully?)
> Where can I find what status code = 200 means?
From: <mic...@Ta...> - 2007-08-23 20:38:47
I think I got it. The 'response' returned is the HTML code of loginURL. The
statusCode returned = 200.

Did it work? (i.e., did it log in successfully?)
Where can I find what status code = 200 means?
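The status-code lookup the poster asks about follows a simple rule: HTTP codes group by their first digit (2xx success, 3xx redirection, 4xx client error, 5xx server error). A tiny JDK-only helper (class name invented for the illustration) captures it:

```java
// Minimal illustration of HTTP status code classes (per RFC 2616):
// 2xx = success (200 OK means the request succeeded),
// 3xx = redirection, 4xx = client error (e.g. 404 Not Found),
// 5xx = server error (e.g. 500 Internal Server Error).
public class HttpStatus {
    public static String classify(int code) {
        if (code >= 200 && code < 300) return "success";
        if (code >= 300 && code < 400) return "redirection";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(200)); // prints success
    }
}
```

One caveat for login flows: a successful form POST often answers 302 (a redirect to the logged-in page), so a 200 from the login URL is worth checking against the response body, which may just be the login form again.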
From: Mattia T. <mat...@gm...> - 2007-08-23 19:41:34
Exactly, those are global objects of the class with the login method.

For the login page, look at this example: open Firefox on the page where the
login form is, view the page's source code, and look at the form tag. You will
see something like:

    <FORM name=form enctype=x-www-form-urlencoded method=post action="/forms/Myform.jsp">

loginPage, for me, is the page where the submission of the form sends the
navigation.

I hope it's all clear now.

Bye

Mattia

2007/8/23, mic...@ta... <mic...@ta...>:
>
> So I did this:
>
>     private String baseUrl;
>     private Any user;
>     private Any password;
>
>     this.user.insert_string("giovanni_doe");
>     this.password.insert_string("mypassword");
>
> I think all I need is to understand what "loginPage" is and it will compile,
> at least.
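Mattia's tip above — read the `action` attribute off the `<FORM>` tag in the page source to find where the login POST goes — can be sketched with a JDK-only snippet (the class name and sample markup are illustrative; a regex like this handles simple form tags, not every quoting variant):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls the action attribute out of the first <form> tag in a page:
// that action URL is where a login form submission should be POSTed.
public class FormActionFinder {
    private static final Pattern ACTION = Pattern.compile(
            "<form[^>]*\\baction\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    public static String findAction(String html) {
        Matcher m = ACTION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<FORM name=form method=post action=\"/forms/Myform.jsp\">";
        System.out.println(findAction(page)); // prints /forms/Myform.jsp
    }
}
```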
From: <mic...@Ta...> - 2007-08-23 16:46:08
So I did this:

    private String baseUrl;
    private Any user;
    private Any password;

    this.user.insert_string("giovanni_doe");
    this.password.insert_string("mypassword");

I think all I need is to understand what "loginPage" is and it will compile,
at least.
From: <mic...@Ta...> - 2007-08-23 16:24:22
I guess I still don't understand what "loginPage" is. Can you give me an
example?

> loginData: forget it, it's just a check so as not to log in every time.
> Delete the if instruction.
>
> loginPage is the action of the form you are trying to submit.
>
> For now forget Headers, both Request and Response, and also cookies.
>
> Just call submitPostForm(String url, ArrayList params), passing a url and
> the login parameters.
>
> client inside HttpClientUtil is an HttpClient instance.
>
> HttpClientUtil is a wrapper of HttpClient, just to reuse some basic methods
> to post forms in post or get way.
>
> Delete what you don't need for now.
>
> Bye.
>
> Mattia
>
> 2007/8/23, mic...@ta... <mic...@ta...>:
>>
>> I find some problems with this code:
>>
>> loginData, loginPage, getRequestHeaders(), getResponseHeaders(),
>> getCookiesArrayList() are undefined.
>>
>> In HttpClientUtil:
>>
>> If client is an instantiation of HttpClientUtil and submitPostForm() is a
>> method in HttpClientUtil, then how can submitPostForm() use 'client'?
>>
>> Method addHeaders() is undefined for HttpClientUtil.
>>
>> cookies (in printCookies(cookies);) is undefined.
>>
>> > Hi,
>> >
>> > this is code I used to login:
>> >
>> >     public Object login() {
>> >         logger.info(this.getClass().getName() + " - login");
>> >         if (loginData == null) {
>> >             try {
>> >                 client.setBaseURL(new URL(this.baseUrl));
>> >
>> >                 ArrayList parametri = new ArrayList();
>> >                 parametri.add(new NameValuePair("username", this.user));
>> >                 parametri.add(new NameValuePair("password", this.password));
>> >
>> >                 byte[] response = client.submitPostForm(this.loginPage, "",
>> >                         parametri, getRequestHeaders());
>> >                 String workingResponse = new String(response);
>> >                 Header[] headers = client.getResponseHeaders();
>> >                 for (int j = 0; j < headers.length; j++) {
>> >                     logger.info("headers " + j + ": " + headers[j].getName()
>> >                             + "=" + headers[j].getValue());
>> >                 }
>> >                 logger.info(workingResponse);
>> >
>> >                 parametri.clear();
>> >             } catch (Exception e) {
>> >                 logger.error(" LOGIN PROBLEM!");
>> >                 e.printStackTrace();
>> >             }
>> >         }
>> >         return getCookiesArrayList(client);
>> >     }
>> >
>> > client is a utility class HttpClientUtil that contains the method
>> > submitPostForm below.
>> >
>> >     public byte[] submitPostForm(String relativeUrl, String formName,
>> >             ArrayList params, Header[] requestHeaders) {
>> >         byte[] result = null;
>> >         try {
>> >             NameValuePair[] data = null;
>> >             if (params != null) {
>> >                 data = new NameValuePair[params.size()];
>> >                 for (int i = 0; i < params.size(); i++) {
>> >                     data[i] = (NameValuePair) params.get(i);
>> >                 }
>> >             }
>> >             PostMethod method = new PostMethod(getBaseURL() + relativeUrl);
>> >             this.method = method;
>> >             logger.info("BASE URL: " + getBaseURL());
>> >             logger.info("RELATIVE URL: " + relativeUrl);
>> >             method.getParams().setCookiePolicy(CookiePolicy.RFC_2109);
>> >
>> >             addHeaders(requestHeaders);
>> >             logger.info("URI PRE QueryString Setting>>> " + method.getURI());
>> >             if (params != null) {
>> >                 method.addParameters(data);
>> >             }
>> >             int statusCode = client.executeMethod(method);
>> >
>> >             logger.info("QueryString>>> " + method.getQueryString());
>> >             logger.info("URI>>> " + method.getURI());
>> >
>> >             result = method.getResponseBody();
>> >
>> >             setCookies(client.getState().getCookies());
>> >             printCookies(cookies);
>> >             Header[] headers = method.getRequestHeaders();
>> >             addHeaders(headers);
>> >         } catch (IOException ioe) {
>> >             ioe.printStackTrace();
>> >         }
>> >         return result;
>> >     }
>> >
>> > Hope it helps.
>> >
>> > Cheers
>> >
>> > Mattia
>> >
>> > 2007/8/23, Derrick Oswald <der...@ro...>:
>> >>
>> >> You might try setRedirectionProcessingEnabled(true). Often the first URL
>> >> is only a gateway.
>> >> Also, it's setCookieProcessingEnabled(true), not addCookies.
>> >>
>> >> ----- Original Message ----
>> >> From: "mic...@Ta..." <mic...@Ta...>
>> >> To: htm...@li...
>> >> Sent: Wednesday, August 22, 2007 7:05:36 PM
>> >> Subject: [Htmlparser-user] How to login to web page
>> >>
>> >> How does one get to a web page that requires login?
>> >> The code that I wrote (below) doesn't seem to work.
>> >>
>> >>     url = new URL("http://www.somewebsite.com");
>> >>     urlConnection = url.openConnection();
>> >>
>> >>     ConnectionManager connectionManager = new ConnectionManager();
>> >>     connectionManager.setUser(USER_NAME);
>> >>     connectionManager.setPassword(PASSWORD);
>> >>     connectionManager.addCookies(urlConnection);
>> >>
>> >>     linkBean.setConnection(connectionManager.openConnection(url));
>> >>     URL[] urlArray = linkBean.getLinks(); // get all links
>> >>
>> >> Thanks for any help.
>> > Still grepping through log files to find problems? Stop. >> > Now Search log events and configuration files using AJAX and a >> browser. >> > Download your FREE copy of Splunk now >> >> > http://get.splunk.com/_______________________________________________ >> > Htmlparser-user mailing list >> > Htm...@li... >> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user >> > >> >> >> ------------------------------------------------------------------------- >> This SF.net email is sponsored by: Splunk Inc. >> Still grepping through log files to find problems? Stop. >> Now Search log events and configuration files using AJAX and a browser. >> Download your FREE copy of Splunk now >> http://get.splunk.com/ >> _______________________________________________ >> Htmlparser-user mailing list >> Htm...@li... >> https://lists.sourceforge.net/lists/listinfo/htmlparser-user >> > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> > http://get.splunk.com/_______________________________________________ > Htmlparser-user mailing list > Htm...@li... > https://lists.sourceforge.net/lists/listinfo/htmlparser-user > |
From: <mic...@Ta...> - 2007-08-23 15:57:43
What are the classes of this.baseUrl, this.user, and this.password?
Apparently user and password are not Strings, which was my guess.

> loginData: forget it, it's just a check not to log in every time. Delete
> the if statement.
>
> loginPage is the action of the form you are trying to submit.
>
> For now, forget about the Headers, both Request and Response, and also
> the cookies.
>
> Just call submitPostForm(String url, ArrayList params), passing a URL and
> the login parameters.
>
> client inside HttpClientUtil is an HttpClient instance.
>
> HttpClientUtil is a wrapper around HttpClient, just to reuse some basic
> methods for submitting forms by POST or GET.
>
> Delete what you don't need for now.
>
> Bye.
>
> Mattia
From: Mattia T. <mat...@gm...> - 2007-08-23 15:06:07
loginData: forget it, it's just a check not to log in every time. Delete
the if statement.

loginPage is the action of the form you are trying to submit.

For now, forget about the Headers, both Request and Response, and also the
cookies.

Just call submitPostForm(String url, ArrayList params), passing a URL and
the login parameters.

client inside HttpClientUtil is an HttpClient instance.

HttpClientUtil is a wrapper around HttpClient, just to reuse some basic
methods for submitting forms by POST or GET.

Delete what you don't need for now.

Bye.

Mattia

2007/8/23, mic...@ta... <mic...@ta...>:
>
> I find some problems with this code:
>
> loginData, loginPage, getRequestHeaders(), getResponseHeaders(), and
> getCookiesArrayList() are undefined.
>
> In HttpClientUtil:
>
> If client is an instantiation of HttpClientUtil, and submitPostForm() is
> a method in HttpClientUtil, then how can submitPostForm() use 'client'?
>
> The method addHeaders() is undefined for HttpClientUtil.
>
> cookies (in printCookies(cookies);) is undefined.
From: <mic...@Ta...> - 2007-08-23 14:49:28
I find some problems with this code:

loginData, loginPage, getRequestHeaders(), getResponseHeaders(), and
getCookiesArrayList() are undefined.

In HttpClientUtil:

If client is an instantiation of HttpClientUtil, and submitPostForm() is a
method in HttpClientUtil, then how can submitPostForm() use 'client'?

The method addHeaders() is undefined for HttpClientUtil.

cookies (in printCookies(cookies);) is undefined.
From: Mattia T. <mat...@gm...> - 2007-08-23 06:42:48
Hi,

this is the code I used to log in:

    public Object login() {
        logger.info(this.getClass().getName() + " - login");
        if (loginData == null) {
            try {
                client.setBaseURL(new URL(this.baseUrl));

                ArrayList parametri = new ArrayList();
                parametri.add(new NameValuePair("username", this.user));
                parametri.add(new NameValuePair("password", this.password));

                byte[] response = client.submitPostForm(this.loginPage, "",
                        parametri, getRequestHeaders());
                String workingResponse = new String(response);
                Header[] headers = client.getResponseHeaders();
                for (int j = 0; j < headers.length; j++) {
                    logger.info("headers " + j + ": " + headers[j].getName()
                            + "=" + headers[j].getValue());
                }
                logger.info(workingResponse);

                parametri.clear();
            } catch (Exception e) {
                logger.error(" LOGIN PROBLEM!");
                e.printStackTrace();
            }
        }
        return getCookiesArrayList(client);
    }

client is a utility class, HttpClientUtil, that contains the method
submitPostForm below.

    public byte[] submitPostForm(String relativeUrl, String formName,
            ArrayList params, Header[] requestHeaders) {
        byte[] result = null;
        try {
            NameValuePair[] data = null;
            if (params != null) {
                data = new NameValuePair[params.size()];
                for (int i = 0; i < params.size(); i++) {
                    data[i] = (NameValuePair) params.get(i);
                }
            }
            PostMethod method = new PostMethod(getBaseURL() + relativeUrl);
            this.method = method;
            logger.info("BASE URL: " + getBaseURL());
            logger.info("RELATIVE URL: " + relativeUrl);
            method.getParams().setCookiePolicy(CookiePolicy.RFC_2109);

            addHeaders(requestHeaders);
            logger.info("URI PRE QueryString Setting>>> " + method.getURI());
            if (params != null) {
                method.addParameters(data);
            }
            int statusCode = client.executeMethod(method);

            logger.info("QueryString>>> " + method.getQueryString());
            logger.info("URI>>> " + method.getURI());

            result = method.getResponseBody();

            setCookies(client.getState().getCookies());
            printCookies(cookies);
            Header[] headers = method.getRequestHeaders();
            addHeaders(headers);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
        return result;
    }

Hope it helps.

Cheers

Mattia

2007/8/23, Derrick Oswald <der...@ro...>:
>
> You might try setRedirectionProcessingEnabled(true). Often the first URL
> is only a gateway.
> Also, it's setCookieProcessingEnabled(true), not addCookies.
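Mattia's code leans on his private HttpClientUtil wrapper and the Jakarta Commons HttpClient classes, which is why several identifiers look undefined when pasted elsewhere. As a point of comparison, here is a minimal sketch of the same submit-a-login-form flow using only java.net from the JDK; the class name, method names, and field values are all invented for illustration and are not part of HTMLParser or HttpClient.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical stand-in for the HttpClientUtil wrapper discussed above:
// URL-encode the form fields and POST them, returning the response body.
public class SimpleFormPoster {

    // Build an application/x-www-form-urlencoded body from name/value pairs.
    public static String formBody(String[][] params) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(p[0], "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(p[1], "UTF-8"));
        }
        return sb.toString();
    }

    // POST the encoded body to the form's action URL and return the raw response.
    public static byte[] submitPostForm(String url, String[][] params) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] body = formBody(params).getBytes("UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            java.io.ByteArrayOutputStream buf = new java.io.ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        // No network here: just show what the encoded login body looks like.
        System.out.println(formBody(new String[][] {
                { "username", "alice" }, { "password", "p@ss w" } }));
        // prints: username=alice&password=p%40ss+w
    }
}
```

This is only the transport half; a real login also needs cookie handling so the session survives across requests.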
From: Derrick O. <der...@ro...> - 2007-08-23 00:04:01
You might try setRedirectionProcessingEnabled(true). Often the first URL
is only a gateway.
Also, it's setCookieProcessingEnabled(true), not addCookies.

----- Original Message ----
From: "mic...@Ta..." <mick...@Ta...>
To: htm...@li...
Sent: Wednesday, August 22, 2007 7:05:36 PM
Subject: [Htmlparser-user] How to login to web page

How does one get to a web page that requires login?
The code that I wrote (below) doesn't seem to work.

url = new URL("http://www.somewebsite.com");
urlConnection = url.openConnection();

ConnectionManager connectionManager = new ConnectionManager();
connectionManager.setUser(USER_NAME);
connectionManager.setPassword(PASSWORD);
connectionManager.addCookies(urlConnection);

linkBean.setConnection(connectionManager.openConnection(url));
URL[] urlArray = linkBean.getLinks(); // get all links

Thanks for any help.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
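For anyone following along without HTMLParser's ConnectionManager, the JDK has a rough equivalent of the cookie switch Derrick mentions: installing a java.net.CookieManager as the default CookieHandler makes HttpURLConnection store and replay cookies automatically (redirect following is already on by default for HttpURLConnection). A small sketch under those assumptions; the host name and session id below are placeholders.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

// JDK analogue of setCookieProcessingEnabled(true): once a CookieManager is
// installed as the default CookieHandler, HttpURLConnection records Set-Cookie
// headers and sends matching Cookie headers on later requests automatically.
public class CookieSetup {

    public static CookieManager install() {
        CookieManager manager = new CookieManager();
        manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    public static void main(String[] args) throws Exception {
        CookieManager manager = install();
        // Simulate what a login response's Set-Cookie would leave behind
        // (www.somewebsite.com and the session id are placeholders).
        URI site = new URI("http://www.somewebsite.com/");
        manager.getCookieStore().add(site, new HttpCookie("JSESSIONID", "abc123"));
        // A later request to the same host would carry the cookie back.
        List<HttpCookie> sent = manager.getCookieStore()
                .get(new URI("http://www.somewebsite.com/members"));
        System.out.println(sent.get(0).getName() + "=" + sent.get(0).getValue());
    }
}
```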
From: <mic...@Ta...> - 2007-08-22 23:05:42
How does one get to a web page that requires login?
The code that I wrote (below) doesn't seem to work.

    url = new URL("http://www.somewebsite.com");
    urlConnection = url.openConnection();

    ConnectionManager connectionManager = new ConnectionManager();
    connectionManager.setUser(USER_NAME);
    connectionManager.setPassword(PASSWORD);
    connectionManager.addCookies(urlConnection);

    linkBean.setConnection(connectionManager.openConnection(url));
    URL[] urlArray = linkBean.getLinks(); // get all links

Thanks for any help.
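One likely mismatch in the snippet above: ConnectionManager's setUser/setPassword typically feed HTTP authentication, while most sites with a login page expect a form POST followed by a session cookie on every later request. Independent of any parser library, the cookie half of that round trip looks like this; the JSESSIONID value is a made-up example.

```java
import java.net.HttpCookie;
import java.util.List;

// Why cookie processing matters for login: the login page answers with a
// Set-Cookie header, and every later request must echo it back in a Cookie
// header, or the server treats you as logged out.
public class CookieRoundTrip {

    // Parse a Set-Cookie response header into cookies.
    public static List<HttpCookie> fromResponse(String setCookieHeader) {
        return HttpCookie.parse(setCookieHeader);
    }

    // Build the Cookie request-header value to send back.
    public static String toRequestHeader(List<HttpCookie> cookies) {
        StringBuilder sb = new StringBuilder();
        for (HttpCookie c : cookies) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(c.getName()).append('=').append(c.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<HttpCookie> cookies =
                fromResponse("Set-Cookie: JSESSIONID=abc123; Path=/");
        // On the next request: conn.setRequestProperty("Cookie", ...)
        System.out.println(toRequestHeader(cookies));
        // prints: JSESSIONID=abc123
    }
}
```

Libraries such as Commons HttpClient or HTMLParser's ConnectionManager do exactly this bookkeeping for you once cookie processing is switched on.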