Thread: RE: [Htmlparser-developer] Writing OPTION tag


Hi guys,

I am still trying to solve my problem with the scanner for my OPTION tag.
I would really appreciate any help from the developers of the parsing
engine. I think the solution may lie in knowing certain internals of the
parser.

Let me explain my problem in detail.

Assume the following two OPTION tags:
<OPTION value="AltaVista Search">AltaVista
<OPTION value="Lycos Search"></OPTION>

The OPTION tag does not explicitly require an end tag, hence the first
line is valid.

My parsing logic in scan() is as follows:
1. Disable the existing scanners.
2. Read elements from the Reader.
3. Check whether it is an EndTag for OPTION or SELECT (since OPTION tags
are always under SELECT). If so create an OptionTag object with
necessary values
4. If it is not an EndTag, check whether it is a StringNode (this would
be for the value between <OPTION> and </OPTION> tags). If so it is the
text of the OPTION tag and store it temporarily. (This will be later
used in the constructor).
5. If it is neither, it could be an error or the beginning of another tag
(possibly another <OPTION> tag, as above), so the current loop must be
terminated and the option object must be constructed.

The problem with my input is as follows. <OPTION value="AltaVista Search">
is read as an OptionTag and handed to my scanner, and AltaVista is then
read as the StringNode. Next, <OPTION value="Lycos Search"> is read; since
it is neither a StringNode nor an EndTag, the loop ends and an OptionTag
is created from the first tag and its text. However, the second <OPTION>
tag has already been consumed from the reader, so it is never scanned as a
new OptionTag and I lose it in my parsing. I hope I have explained my
problem clearly. If not, I will gladly clarify any points that are
unclear.
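
To make the behaviour concrete, here is a minimal, self-contained sketch
that reproduces the lost tag outside the parser. It does not use the real
htmlparser classes: OptionScanDemo, TokenReader, scanOption and parse are
hypothetical stand-ins, and the input is assumed to be already split into
tokens. The only property it shares with my scanner is that the reader
hands out elements one at a time and cannot give one back.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionScanDemo {
    // Hypothetical stand-in for the reader: returns tokens one by one, no push-back.
    static class TokenReader {
        private final String[] tokens;
        private int pos = 0;
        TokenReader(String... tokens) { this.tokens = tokens; }
        String readElement() { return pos < tokens.length ? tokens[pos++] : null; }
    }

    static boolean isTag(String t) { return t.startsWith("<"); }
    static boolean isOptionStart(String t) { return t.toUpperCase().startsWith("<OPTION"); }

    // Mirrors steps 2-5: collect text until an end tag or any other tag is read.
    static String scanOption(String startTag, TokenReader reader) {
        StringBuilder text = new StringBuilder();
        String node;
        while ((node = reader.readElement()) != null) {
            if (isTag(node)) {
                break; // the tag that ends the loop has been consumed and is discarded
            }
            text.append(node);
        }
        return startTag + " -> \"" + text + "\"";
    }

    // Top-level loop: each OPTION start tag it sees is handed to scanOption.
    static List<String> parse(TokenReader reader) {
        List<String> options = new ArrayList<>();
        String node;
        while ((node = reader.readElement()) != null) {
            if (isOptionStart(node)) {
                options.add(scanOption(node, reader));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        TokenReader reader = new TokenReader(
                "<OPTION value=\"AltaVista Search\">", "AltaVista",
                "<OPTION value=\"Lycos Search\">", "</OPTION>");
        // Only the AltaVista option is reported; the Lycos start tag was
        // swallowed inside scanOption while it looked for the end of the first.
        System.out.println(parse(reader));
    }
}
```

In this toy model two OPTION tags go in and only one option comes out,
which is exactly what I see with the real scanner.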

A snippet of code from scan() of HTMLOptionTagScanner is given below:

Vector lScannerVector = HTMLParserUtils.adjustScanners(pReader);  
do 
{
      lNode = pReader.readElement();
      System.out.println(lNode.toHTML());
      if (lNode instanceof HTMLEndTag)
      {
            lEndTag = (HTMLEndTag)lNode;
            String lEndTagString = lEndTag.getText().toUpperCase();
            if (lEndTagString.equals("OPTION") ||
                lEndTagString.equals("SELECT"))
            {
                  endTagFound = true;
            }
      }
      else if (lNode instanceof HTMLStringNode)
      {
            lText.append(lNode.toHTML());
      }
      else if (lNode instanceof HTMLTag)
      {
            endTagFound = true;
      }
}
while (!endTagFound);

HTMLOptionTag lOptionTag = new HTMLOptionTag(0, lNode.elementEnd(),
      pTag.getText(), lText.toString(), pCurrLine);
HTMLParserUtils.restoreScanners(pReader, lScannerVector);

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
