Thread: [Htmlparser-user] Link Parsing within RemarkNode
From: George M. <emi...@be...> - 2005-09-21 13:31:33
I have a link that I would like to parse out of HTML code, but the link is located within a remark. How can I extract the value of the "href" attribute inside a comment like this one?

    <!-- <a class="greylink" href="somecode"> -->

Thanks,
Ed
From: Derrick O. <Der...@Ro...> - 2005-09-22 01:10:05
You can take the text from the Remark node and pass it into another parser:

    NodeList remarks = parser.parse (new NodeClassFilter (Remark.class));
    // you can't pass the string in the constructor... it wants a URL
    Parser remarkParser = new Parser ();
    for (int i = 0; i < remarks.size (); i++)
    {
        String s = remarks.elementAt (i).getText ();
        remarkParser.setInputHTML (s);
        NodeList links = remarkParser.parse (new TagNameFilter ("A"));
        // if there's at least one link...
        if (1 <= links.size ())
        {
            LinkTag link = (LinkTag)links.elementAt (0);
            String href = link.getLink (); // the contents of the href
        }
    }
From: prince p. <pri...@gm...> - 2005-09-23 11:16:37
I wrote the code below. Its output differs from the output I get by running the StringExtractor batch file: in my output I did not get the line breaks (\n). Can you suggest any remedy for this?

    sextract = new StringExtractor (url1);
    PrintWriter out1 = new PrintWriter (new FileOutputStream ("C:\\goo.txt"));
    out1.println (sextract.extractStrings (true));
    out1.close ();
    sextract1 = new StringExtractor ("c:\\goo.txt");
    PrintWriter out11 = new PrintWriter (new FileOutputStream ("C:\\goo1.txt"));
    out11.println (sextract1.extractStrings (true));
    out11.close ();

Original output from running the StringExtractor batch file, as shown on screen (sample):

Yahoo! Finance Music HotJobs Mail My Yahoo! Messenger Search for: on the web in Images in Video in Directory in Local in Shopping ? Advanced ? My Web Yahoo! Games-Play Family Feud,Chicken Invaders 2,poker superstar,Tumble bugs,Diner Dash, More Check your mail status:Sign In Free mail: Signup

Output of my code above:

Yahoo! Finance Music HotJobs Mail My Yahoo! Messenger Search for: on the Web in Images in Video in Directory in Local in News in Shopping ? Advanced ? My Web Yahoo! Travel - Flights , Hotels , Cars , Vacations , Today's Deals , Enter for a Chance to Win a Car Check your mail status: Sign In Free mail: Sign Up 360

Can you suggest how to get the same output with my code? Please reply.
From: George M. <emi...@be...> - 2005-09-24 23:24:47
Thanks! This worked great. Now I have another issue. I want to parse the link that was just extracted. The link now contains spaces in the string instead of %20. This causes a parser exception. Is there a way to convert the spaces to %20?
From: Eric A. H. <lu...@ta...> - 2005-09-25 03:42:20
I think this may be what you're looking for. Hope it helps.

    String encodedString = java.net.URLEncoder.encode("string you want to convert to UTF-8", "UTF-8");
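One caveat on the URLEncoder suggestion: java.net.URLEncoder performs HTML form encoding, so it turns each space into "+" rather than "%20" and also escapes "/" as "%2F". That may not be what a link passed back to the parser needs. The sketch below compares it with two alternatives that do produce %20. The class name, the helper names, and the example.com link are made up for illustration, and it assumes the only problem with the link is unescaped characters such as spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of htmlparser.
class SpaceEncoding {

    // Form encoding: spaces become '+', and '/' becomes "%2F".
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Simplest fix: replace only the spaces and leave the rest untouched.
    static String percentEncodeSpaces(String link) {
        return link.replace(" ", "%20");
    }

    // The multi-argument URI constructors quote illegal characters
    // (such as spaces) in each component as percent escapes.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("my docs/page one.html"));
        // prints my+docs%2Fpage+one.html
        System.out.println(percentEncodeSpaces("http://example.com/my docs/page one.html"));
        // prints http://example.com/my%20docs/page%20one.html
        System.out.println(encodePath("http", "example.com", "/my docs/page one.html"));
        // prints http://example.com/my%20docs/page%20one.html
    }
}
```

If the extracted link is otherwise well formed, the plain replace is probably enough; the URI constructor is the more general option when the scheme, host, and path are available separately.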