htmlparser-user Mailing List for HTML Parser


Sorry, replied without thinking.
You can apply the StringBean directly to a node list:

Parser parser = new Parser ("http://yadda.yadda");
NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
// elementAt() returns a Node, so cast it to the tag type you filtered for
Div div = (Div) list.elementAt (0);
StringBean bean = new StringBean ();
// visit the div's children directly; no second Parser is needed
div.getChildren ().visitAllNodesWith (bean);
System.out.println (bean.getStrings ());

Derrick
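
For anyone who wants to see the shape of this approach without the library on the classpath, here is a minimal, self-contained sketch of the two steps: find the container node, then visit its children and collect the text. Node, Visitor, find, visit, textsUnder and sample are hypothetical names made up for this illustration; they are not part of org.htmlparser, and the real StringBean does more than this (for example collapsing whitespace and optionally including link URLs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a parsed HTML tree. Every type here is a hypothetical
// stand-in for illustration and is NOT the org.htmlparser API.
class VisitorSketch {

    /** A node is either a tag (name, optional id, children) or a text run. */
    static final class Node {
        final String tag;   // null for text nodes
        final String id;    // null when the tag has no id
        final String text;  // null for tag nodes
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String id, String text) {
            this.tag = tag;
            this.id = id;
            this.text = text;
        }

        static Node ofTag(String name, String id, Node... kids) {
            Node n = new Node(name, id, null);
            n.children.addAll(Arrays.asList(kids));
            return n;
        }

        static Node ofText(String s) {
            return new Node(null, null, s);
        }
    }

    /** Callback invoked once per text node, in document order. */
    interface Visitor {
        void visitText(String text);
    }

    // Step 1: depth-first search for the first tag with the given name and id.
    static Node find(Node root, String tag, String id) {
        if (tag.equals(root.tag) && id.equals(root.id)) {
            return root;
        }
        for (Node child : root.children) {
            Node hit = find(child, tag, id);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Step 2: walk a subtree and hand every text node to the visitor.
    static void visit(Node node, Visitor visitor) {
        if (node.text != null) {
            visitor.visitText(node.text);
        }
        for (Node child : node.children) {
            visit(child, visitor);
        }
    }

    // Both steps together: collect the trimmed, non-empty text under one tag.
    static List<String> textsUnder(Node root, String tag, String id) {
        List<String> out = new ArrayList<>();
        Node start = find(root, tag, id);
        if (start != null) {
            for (Node child : start.children) {
                visit(child, s -> {
                    String trimmed = s.trim();
                    if (!trimmed.isEmpty()) {
                        out.add(trimmed);
                    }
                });
            }
        }
        return out;
    }

    // A cut-down version of the markup from the original question.
    static Node sample() {
        Node div = Node.ofTag("div", "video_infobox_con",
            Node.ofText("  add by:"),
            Node.ofTag("span", null, Node.ofText("2006.07.27 - 01:22")),
            Node.ofTag("a", null, Node.ofTag("u", null, Node.ofText("test_a"))),
            Node.ofText("   "),
            Node.ofTag("a", null, Node.ofTag("u", null, Node.ofText("test_b"))));
        Node input = Node.ofTag("input", "htmlurl");
        return Node.ofTag("body", null, div, input);
    }

    public static void main(String[] args) {
        System.out.println(textsUnder(sample(), "div", "video_infobox_con"));
    }
}
```

In this toy tree the input tag sits outside the div, as in the original markup, so the walk never reaches it. Its value is also an attribute rather than text, so a text-collecting visitor would not pick it up even if it were inside; it needs to be read from the tag separately.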

Derrick Oswald wrote:

>Jesse,
>
>The job breaks down into two tasks:
>  1) get the outermost tag (your <div id="video_infobox_con"> tag) using 
>a filter you construct.
>  2) use a StringBean as a visitor on that node and its children to 
>extract the text, like so:
>
>Parser parser = new Parser ("http://yadda.yadda");
>NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
>Div div = (Div) list.elementAt (0);
>// now re-create the HTML and pass it into another Parser
>// Note: for older versions you need to use setInputHtml()
>parser = new Parser (div.toHtml ());
>StringBean bean = new StringBean ();
>parser.visitAllNodesWith (bean);
>System.out.println (bean.getStrings ());
>
>Derrick
>
>h pq wrote:
>
>  
>
>>Hi all, I have a question about parsing HTML content. The content 
>>contains many tags. Getting the text of a single kind of tag, such as 
>>LinkTag or TableTag, is easy with LinkRegexFilter or TagNameFilter, 
>>but if I want the content of more than one kind of tag, is there a 
>>filter chain? Maybe the following example will explain what I mean 
>>more directly:
>> 
>> <div id="video_infobox_con">
>>    ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
>>    ·Label: 
>>                 <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1" 
>>class="lnk_04" target=_self><u>test_a</u></a>              
>>              
>>                 <a href="search.do?q=%D7%B4%D4%AA%D0%E3" 
>>class="lnk_04" target=_self><u>test_b</u></a>              
>>              
>>                 <a href=" search.do?q=%C0%BA%C7%F2" class="lnk_04" 
>>target=_self><u>test_c</u></a>              
>>              
>>                 <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04" 
>>target=_self><u>test_d</u></a>              
>>              
>> </div>
>><input type="text" id="htmlurl" name="htmlurl" value='value_test'  />
>> 
>>There are four kinds of tags here (div, span, a, input), and I need 
>>the content of all of them: 2006.07.27 - 01:22, test_a, test_b, 
>>test_c, test_d and value_test.
>>What should I do? I could parse the HTML four times, once per tag, 
>>but I think that would hurt performance. Could you help me? Thank 
>>you very much.
>> 
>>Best Regards
>>Jesse
>> 
>>

