Thread: [Htmlparser-user] Link Parsing within RemarkNode

[Htmlparser-user] Link Parsing within RemarkNode

From: George M. <emi...@be...> - 2005-09-21 13:31:33

I have a link that I would like to parse out of html code, but the link 
is located within a remark.  How can I parse out the contents of the 
"href" that is between the <!--  <a class="greylink" href="somecode"> 
-->

Thanks,

Ed

Re: [Htmlparser-user] Link Parsing within RemarkNode

From: Derrick O. <Der...@Ro...> - 2005-09-22 01:10:05

You can take the text from the Remark node and pass it into another parser:

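// "parser" here is the Parser already set up on the page
NodeList remarks = parser.parse (new NodeClassFilter (Remark.class));

// use the no-argument constructor; you can't pass the string in the
// constructor... it wants a URL
Parser remarkParser = new Parser ();
for (int i = 0; i < remarks.size (); i++)
{
    Remark remarkNode = (Remark)remarks.elementAt (i);
    String s = remarkNode.getText ();
    remarkParser.setInputHTML (s);
    NodeList links = remarkParser.parse (new TagNameFilter ("A"));
    // if there's at least one link...
    if (1 <= links.size ())
    {
        LinkTag link = (LinkTag)links.elementAt (0);
        String href = link.getLink ();
    }
}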
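
For readers without the library at hand, the same two-pass idea (first isolate the remarks, then look for links inside each one) can be illustrated with the standard library alone. This is only a rough sketch that swaps in regular expressions for the parser; the class name RemarkLinkSketch and the method hrefsInRemarks are hypothetical, and it assumes double-quoted href values and well-formed comments, which real pages do not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemarkLinkSketch {
    // First pass: match an HTML comment and capture its body.
    private static final Pattern REMARK =
        Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);
    // Second pass: match an <a ...> tag and capture a double-quoted href value.
    private static final Pattern HREF =
        Pattern.compile("<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"",
                        Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all links found inside HTML comments. */
    static List<String> hrefsInRemarks(String html) {
        List<String> hrefs = new ArrayList<String>();
        Matcher remark = REMARK.matcher(html);
        while (remark.find()) {
            // Re-scan only the text of the remark, as the second parser does.
            Matcher href = HREF.matcher(remark.group(1));
            while (href.find()) {
                hrefs.add(href.group(1));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p>text</p><!--  <a class=\"greylink\" href=\"somecode\"> -->";
        System.out.println(hrefsInRemarks(html)); // prints [somecode]
    }
}
```

Links outside of remarks are ignored because only the captured remark text is scanned in the second pass.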



Re: [Htmlparser-user] Link Parsing within RemarkNode

From: prince p. <pri...@gm...> - 2005-09-23 11:16:37

I wrote the code below, but its output differs from the output I get by
running the StringExtractor batch file: my output has no line breaks (\n).
Can you suggest a remedy?

sextract = new StringExtractor (url1);
PrintWriter out1 = new PrintWriter (new FileOutputStream ("C:\\goo.txt"));
out1.println (sextract.extractStrings (true));
out1.close ();

sextract1 = new StringExtractor ("c:\\goo.txt");
PrintWriter out11 = new PrintWriter (new FileOutputStream ("C:\\goo1.txt"));
out11.println (sextract1.extractStrings (true));
out11.close ();

Original output from running the StringExtractor batch file, as shown on screen:

*sample output:*

Yahoo!
Finance
Music
HotJobs
Mail
My Yahoo!
Messenger
Search for:
on the web in Images in Video in Directory in Local in Shopping
? Advanced
? My Web
Yahoo! Games-Play Family Feud,Chicken Invaders 2,poker superstar,Tumble
bugs,Diner Dash, More
Check your mail status:Sign In
Free mail: Signup

The output of my code above looks like this:

Yahoo! Finance Music HotJobs Mail My Yahoo! Messenger Search for: on the Web
in Images in Video in Directory in Local in News in Shopping ? Advanced ? My
Web Yahoo! Travel - Flights , Hotels , Cars , Vacations , Today's Deals ,
Enter for a Chance to Win a Car Check your mail status: Sign In Free mail:
Sign Up 360

Can you suggest how to get the same output with my code?

Please reply.

Re: [Htmlparser-user] Link Parsing within RemarkNode

From: George M. <emi...@be...> - 2005-09-24 23:24:47

Thanks!  This worked great.  Now I have another issue.  I want to parse 
the link that was just extracted.  The link now contains spaces in the 
string instead of %20.  This causes a parser exception.  Is there a way 
to convert the spaces to %20?



Re: [Htmlparser-user] Link Parsing within RemarkNode

From: Eric A. H. <lu...@ta...> - 2005-09-25 03:42:20

I think this may be what you're looking for. Hope it helps.

String encodedString = java.net.URLEncoder.encode ("string you want to convert to UTF-8", "UTF-8");
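
One thing worth checking with this suggestion: java.net.URLEncoder produces application/x-www-form-urlencoded output, so it turns a space into "+" rather than "%20" and also escapes characters such as ":" and "/". If only the spaces need converting, a plain string replacement may be enough. A minimal sketch, where the class name SpaceEscaper, the method escapeSpaces and the example.com link are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SpaceEscaper {
    /** Replaces each literal space with %20 and leaves every other character untouched. */
    static String escapeSpaces(String link) {
        return link.replace(" ", "%20");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String link = "http://example.com/my docs/file one.html"; // hypothetical link

        // Only the spaces change, so the result is still a usable URL.
        System.out.println(escapeSpaces(link));
        // prints http://example.com/my%20docs/file%20one.html

        // URLEncoder escapes the whole string as a form value instead.
        System.out.println(URLEncoder.encode(link, "UTF-8"));
        // prints http%3A%2F%2Fexample.com%2Fmy+docs%2Ffile+one.html
    }
}
```

The replacement only handles spaces; a link containing other characters that are illegal in a URL would need further escaping.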