[Htmlparser-user] Can't use extractAllNodesThatMatch back-to-back for same Parser instance


Hello,

Does anyone know why I can't call extractAllNodesThatMatch(filter)
twice, back-to-back, on the same Parser instance?

More specifically I have this code:

========================================
Parser parser = new Parser(google);

NodeList titleList = parser.extractAllNodesThatMatch(titleFilter);
NodeList summaryTableList = parser.extractAllNodesThatMatch(summaryTableFilter);
========================================

The Google search results page I'm parsing has a series of these:

<a href="blah">Title</a>
<table><tr><td>.....Summary info....</td></tr></table>

Each of the two filters works fine when it is the only one run on a
Parser.  Run them back-to-back on the same Parser and the second comes
up empty.  I don't see where the extractAllNodesThatMatch method
literally pulls the nodes out of the captured source, which is what
would affect the second filter.  Here are my filters:

========================================
// filter to pull out titles (all links that are next to a table)
NodeFilter titleFilter = new AndFilter (
    new NodeClassFilter (LinkTag.class),
    new HasSiblingFilter (new NodeClassFilter (TableTag.class))
);
// filter to pull out summaries (all tables that are next to a title link)
NodeFilter summaryTableFilter = new AndFilter (
    new NodeClassFilter (TableTag.class),
    new NodeClassFilterOnPreviousSibling (LinkTag.class) // custom filter
);
========================================

Thanks for the help.  I've already tried subclassing the Parser so
that I could implement the clone() method, but got the same result.
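
For reference, here is a library-free sketch of one possible cause of the symptom. It is only a guess. It assumes the Parser reads its source through a single forward-only cursor that every extraction call shares, which I have not confirmed. OnePassParser, extractAll and reset below are hypothetical stand-ins, not HTMLParser classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class OnePassDemo {

    /** Hypothetical stand-in for a parser whose calls all share one read position. */
    static class OnePassParser {
        private final List<String> nodes;
        private int cursor = 0; // shared by every extractAll call

        OnePassParser(List<String> nodes) {
            this.nodes = nodes;
        }

        /** Walks from the current cursor to the end, keeping nodes that match. */
        List<String> extractAll(Predicate<String> filter) {
            List<String> matches = new ArrayList<>();
            while (cursor < nodes.size()) {
                String node = nodes.get(cursor++);
                if (filter.test(node)) {
                    matches.add(node);
                }
            }
            return matches;
        }

        /** Rewinds the cursor so the next extraction starts from the beginning. */
        void reset() {
            cursor = 0;
        }
    }

    public static void main(String[] args) {
        OnePassParser parser =
            new OnePassParser(Arrays.asList("a", "table", "a", "table"));

        // First extraction consumes the whole source.
        System.out.println(parser.extractAll(n -> n.equals("a")).size());     // 2
        // Second extraction starts at the end, so it finds nothing.
        System.out.println(parser.extractAll(n -> n.equals("table")).size()); // 0
        // After rewinding, the same filter matches again.
        parser.reset();
        System.out.println(parser.extractAll(n -> n.equals("table")).size()); // 2
    }
}
```

If the real Parser behaves like this model, the things to try would be calling the Parser's reset() between the two extractions, or parsing the page once into a NodeList and running both filters against that list instead of against the Parser.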

-Daniel