[Htmlparser-user] Can't use extractAllNodesThatMatch back-to-back for same Parser instance
From: Daniel D. <me...@cr...> - 2008-03-11 14:03:57
Hello,

Does anyone know why I can't call extractAllNodesThatMatch(filter) twice in a row on the same Parser instance? More specifically, I have this code:

========================================
Parser parser = new Parser(google);
NodeList titleList = parser.extractAllNodesThatMatch(titleFilter);
NodeList summaryTableList = parser.extractAllNodesThatMatch(summaryTableFilter);
========================================

The Google search results page I'm parsing has a series of these:

<a href="blah">Title</a>
<table><tr><td>.....Summary info....</td></tr></table>

Each filter works fine on its own. Run them back-to-back, though, and the second one comes up empty. I don't see where extractAllNodesThatMatch actually removes nodes from the captured source in a way that would affect the second filter.

Here are my filters:

========================================
// filter to pull out titles (all links that are next to a table)
NodeFilter titleFilter = new AndFilter (
    new NodeClassFilter (LinkTag.class),
    new HasSiblingFilter (new NodeClassFilter(TableTag.class))
);

// filter to pull out summaries (all tables that are next to a title link)
NodeFilter summaryTableFilter = new AndFilter (
    new NodeClassFilter (TableTag.class),
    new NodeClassFilterOnPreviousSibling (LinkTag.class) // custom filter
);
========================================

Thanks for the help. I've already tried subclassing Parser so that I could implement the clone() method, but got the same result.

-Daniel
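The symptom fits a single-pass read: if extractAllNodesThatMatch advances the parser's shared read position to the end of the page, the second call starts at the end and finds nothing, so no nodes are being removed at all. The sketch below is a library-free Java model of that behaviour, not HTMLParser code; OnePassParser, filterList and the string "nodes" are hypothetical stand-ins. It shows the empty second result, a rewind via reset(), and the alternative of draining the input once into a list and filtering that list twice. In HTMLParser itself the analogous calls are believed to be Parser.reset() between the two extractions, or parsing once and applying NodeList.extractAllNodesThatMatch(filter, true) to the result for each filter; check both against the version you are using.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class ResetDemo {

    // Hypothetical model of a parser that reads its input through one shared cursor.
    // This is NOT the HTMLParser API; strings stand in for top-level nodes.
    static class OnePassParser {
        private final List<String> source; // the page's nodes, in document order
        private Iterator<String> cursor;   // shared read position, like a lexer offset

        OnePassParser(List<String> source) {
            this.source = source;
            this.cursor = source.iterator();
        }

        // Consumes every remaining node from the shared cursor and keeps the matches.
        List<String> extractAllNodesThatMatch(Predicate<String> filter) {
            List<String> matches = new ArrayList<>();
            while (cursor.hasNext()) {
                String node = cursor.next();
                if (filter.test(node)) {
                    matches.add(node);
                }
            }
            return matches;
        }

        // Rewinds the cursor so the next extraction starts from the first node again.
        void reset() {
            cursor = source.iterator();
        }
    }

    // Filters an already-collected list; a list can be scanned any number of times.
    static List<String> filterList(List<String> nodes, Predicate<String> filter) {
        List<String> matches = new ArrayList<>();
        for (String node : nodes) {
            if (filter.test(node)) {
                matches.add(node);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> page = List.of("a:Title1", "table:Summary1", "a:Title2", "table:Summary2");
        Predicate<String> isLink = n -> n.startsWith("a:");
        Predicate<String> isTable = n -> n.startsWith("table:");

        OnePassParser parser = new OnePassParser(page);
        System.out.println(parser.extractAllNodesThatMatch(isLink));  // [a:Title1, a:Title2]
        System.out.println(parser.extractAllNodesThatMatch(isTable)); // [] (cursor already at the end)

        // Fix 1: rewind between the two extractions.
        parser.reset();
        System.out.println(parser.extractAllNodesThatMatch(isTable)); // [table:Summary1, table:Summary2]

        // Fix 2: drain once into a list, then filter that list for each query.
        parser.reset();
        List<String> all = parser.extractAllNodesThatMatch(n -> true);
        System.out.println(filterList(all, isLink));  // [a:Title1, a:Title2]
        System.out.println(filterList(all, isTable)); // [table:Summary1, table:Summary2]
    }
}
```

Rewinding re-reads the whole input, which depends on the underlying source supporting a rewind; collecting once reads the page a single time and lets every filter run over the in-memory list.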