htmlparser-developer Mailing List for HTML Parser
Brought to you by:
derrickoswald
From: Raghavender S. <kin...@ho...> - 2002-04-29 23:37:03
|
Hi Somik,
I encountered a strange problem today. While I was running htmlparser, I
got a java.lang.OutOfMemoryError. It seems that a lot of objects are being
allocated. Where exactly is this happening? Could you give me an idea of
which file the potential problem might be in?
Raghav
>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] Hints on how to change image tag locations
>and write out document
>Date: Sat, 27 Apr 2002 18:22:34 +0900
>
>Hi Annette,
>    Pls find attached a program to get you started. This program will do
>what you want - you will need to modify the construct that checks for the
>image tag - and replace it with the location of your choice.
>    Also - I found one bug thanks to this requirement - image tags params
>were not being correctly put in. Though it needs a deeper look, I have done
>a quick fix for now, and all test cases are passing (with one test case in
>HTMLImageScannerTest trapping this bug).
>    Please check out the latest html parser source code from CVS.
>
>Regards,
>Somik
>
>  ----- Original Message -----
>  From: Doyle, Annette
>  To: htm...@li...
>  Sent: Friday, April 26, 2002 10:08 PM
>  Subject: [Htmlparser-user] Hints on how to change image tag locations
>and write out document
>
>  Could you please give me some hints as how to change only image tag
>locations and then, (or at the same time) write out the html document to
>file (with new image tag locations)?
>
>  Thanks-
>  Annette Doyle
>
><< ImageTagRetriever.java >>
|
|
From: Somik R. <so...@ya...> - 2002-04-27 09:33:26
|
Hi Folks,
I am having a lot of pain integrating html parser with Swing. It
seems like Sun doesn't want folks to change their parser. I am trying to
come to terms with the fact that I need 72 if-thens, for all kinds of
tags. I had initially written an object framework to compare html parser
parsed objects with the swing parser objects, and it's a nightmare, because
even simple tags are not being picked up correctly by the latter.
Meta tags don't seem to work, and tags with attributes have the
attributes not showing up.
I think it's crazy for one person to do all of this, but if I can
have help - then I will put up this integration code, and maybe we'd be
able to get this done in a month (??)
I guess this would be kind of prestigious if it gets finished - so
developers - pls let me know who volunteers to help in this enterprise.
(It's not hard really, but lots to be done)
Cheers,
Somik
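One way the long if-chain might be avoided is a table lookup from tag name to Swing's tag constants. The sketch below is only an illustration under stated assumptions: SwingBridgeSketch, ToyParser, BridgeKit and demo() are hypothetical names, and ToyParser merely emits fixed events instead of wrapping htmlparser. The Swing calls it relies on (HTML.getTag, HTML.UnknownTag, HTMLEditorKit.getParser, HTMLEditorKit.Parser, HTMLEditorKit.ParserCallback) are the standard JDK ones; whether they cover every case that was causing trouble here is not something this thread confirms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

class SwingBridgeSketch {

    /** Maps a tag name to a Swing HTML.Tag by lookup rather than one "if" per tag. */
    static HTML.Tag toSwingTag(String name) {
        // Swing registers its known tag names in lower case.
        String key = name.toLowerCase(Locale.ROOT);
        HTML.Tag tag = HTML.getTag(key);
        // Names Swing does not know are wrapped instead of being dropped.
        return (tag != null) ? tag : new HTML.UnknownTag(key);
    }

    /** Hypothetical stand-in for a replacement parser; it ignores its input. */
    static class ToyParser extends HTMLEditorKit.Parser {
        @Override
        public void parse(Reader in, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet)
                throws IOException {
            // A real bridge would read from 'in'; this emits the events for <P>hello</P>.
            cb.handleStartTag(toSwingTag("P"), new SimpleAttributeSet(), 0);
            cb.handleText("hello".toCharArray(), 3);
            cb.handleEndTag(toSwingTag("P"), 8);
        }
    }

    /** Editor kit that hands out the replacement parser via getParser(). */
    static class BridgeKit extends HTMLEditorKit {
        @Override
        protected HTMLEditorKit.Parser getParser() {
            return new ToyParser();
        }
    }

    /** Runs the toy parser against a recording callback and returns what it saw. */
    static List<String> demo() {
        final List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback recorder = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("start:" + t);
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text:" + new String(data));
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end:" + t);
            }
        };
        try {
            new ToyParser().parse(new StringReader(""), recorder, true);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

Calling SwingBridgeSketch.demo() returns the three recorded events in order. The lookup only resolves tag names; attributes would still have to be copied into a MutableAttributeSet separately.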
|
|
From: Somik R. <so...@ya...> - 2002-04-26 03:43:51
|
Hi Annette,
I just figured out what is happening...
Sorry for the previous mail - this is not a bug in the parser. You see -
the tags which weren't getting reported as image tags were sandwiched
between link tags <A HREF="..."><IMG ..></A>. Hence, in your application,
you will also need to watch out for link tags, and pick up the images from
within should there be any.
Now - if this causes you additional headaches, then don't register all
the scanners, so the link scanner will not interfere, and you will only get
the image tags.
In order to prove that this analysis is correct - I added one more test
case to HTMLImageScannerTest.java -
testImageTagsFromYahooWithAllScannersRegistered()
This test case extracts the link and checks that the image is found within.
The number of tags found is also verified. You can check out this code from
CVS; it might help you if you are interested in getting image tags out of
link tags.
Correspondingly, there is also testImageTagsFromYahoo() which passes (with
only the html image scanner registered).
Let me know if you need further help.
Regards,
Somik
----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned
Is there any known problem about not all image tags being returned? I did
the following code:
    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode)e.nextElement();
        if (node instanceof HTMLImageTag) {
            System.out.println();
            System.out.println(((HTMLImageTag)node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }
I was testing with another html parser and it found all the image tags.
Attached is the source from www.yahoo.com when I ran the code above.
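The advice above (look inside link tags for images as well) can be sketched as follows. This is a minimal illustration, not the htmlparser API: Node, ImageNode and LinkNode are hypothetical stand-ins for the parsed node classes, and the way a link exposes its children is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NestedImageSketch {

    /** Hypothetical stand-in for a parsed node; not the htmlparser class hierarchy. */
    interface Node {}

    static final class ImageNode implements Node {
        final String location;

        ImageNode(String location) {
            this.location = location;
        }
    }

    static final class LinkNode implements Node {
        final String href;
        final List<Node> children;

        LinkNode(String href, Node... children) {
            this.href = href;
            this.children = Arrays.asList(children);
        }
    }

    /** Collects image locations from top-level images and from images nested in links. */
    static List<String> imageLocations(List<Node> nodes) {
        List<String> found = new ArrayList<>();
        for (Node node : nodes) {
            if (node instanceof ImageNode) {
                found.add(((ImageNode) node).location);
            } else if (node instanceof LinkNode) {
                // An <A HREF="..."><IMG ..></A> pair arrives as a link, so look inside it.
                found.addAll(imageLocations(((LinkNode) node).children));
            }
        }
        return found;
    }
}
```

Checking only for the image type at the top level, as in the quoted loop, would miss every image that sits inside a link node.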
|
|
From: Somik R. <so...@ya...> - 2002-04-26 03:28:17
|
Hi Annette,
Thanks for the report. I wrote a functional testcase that does a raw
check for IMG tags and a check with the parser, and could reproduce the
bug. I don't think it's a problem with the image scanner code - because
the unit tests are passing with the same yahoo tags.
Here's a quick solution for you : Don't use registerScanners() for
now. Since your app specifically needs to check only image scanners,
replace the line :
    parser.registerScanners();
with
    parser.addScanner(new HTMLImageScanner("-i"));
I checked that all the yahoo image tags come fine with this change.
The functional test has been checked into CVS (FunctionalTests.java),
and the one with registerScanners() fails. The corresponding unit test
in HTMLImageScanner passes.
Meanwhile, I am trying to find out which scanner is messing up..
Thanks again for your report.
Cheers,
Somik
----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned
Is there any known problem about not all image tags being returned? I
did the following code:
    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode)e.nextElement();
        if (node instanceof HTMLImageTag) {
            System.out.println();
            System.out.println(((HTMLImageTag)node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }
I was testing with another html parser and it found all the image
tags. Attached is the source from www.yahoo.com when I ran the code
above.
|
|
From: Somik R. <so...@ya...> - 2002-04-23 14:56:17
|
Hi Developers,
What do you think of Gordon Deudney's bug report at
http://sourceforge.net/tracker/?group_id=24399&atid=381399
This is actually open for discussion.
Regards
Somik
|
|
From: Somik R. <so...@ya...> - 2002-04-18 01:59:39
|
Hi Folks,
To all the developers - here is a to-do list for the project. You
can pick any to get involved :
[1] Swing integration - Plug in htmlparser and demonstrate how it can be
used instead of the HTML Parser that comes with the Sun JDK
[2] Set up a servlet - which allows people to test the html parser
online. The idea is :
  (i) Enter your URL - and click Parse
  (ii) The parser is launched on the server, and produces all the
  nodes (node.print()) on the display.
  (iii) If an exception gets thrown, then this url is saved into a
  database(??)
  (iv) If no exception is thrown, but there is an error in the
  parsing, then a report can be entered on the result page by the tester,
  telling us why he thinks the output is incorrect.
  (v) We get notified every time there is a report, either of a crash,
  or a human reported error
The vision is - to capitalize on distributed testing resources. Also
everyone has a tendency to desire simple testing - without downloading
and wasting time on manuals. I think we can get a lot of feedback if
we can harness the power of the web.
To do this - some simple servlets will need to be written. And we
will need to find hosting, either at sourceforge or myservlets.com
[3] Create AWT components which can understand HTML formatting. Since
HTMLParser works with Java 1.1, no special download is required for it
to run in standard browsers.
[4] Have a report section on the htmlparser site, where people can
report and see how html parser is being used in various industry
projects.
Pls feel free to add to this list - especially if you have any
interesting insights or vision about where you see this project going.
Once we are done with some basic brainstorming, we could probably set
milestones for each of these tasks.
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-04-17 03:40:48
|
Hi Folks,
HTMLParser 1.1 has just been released. This is a production release
- HTMLParser finally moves out of the beta stage.
A whole lot of bug fixes, architecture modifications, and intense
testing have been done.
You can get it from http://htmlparser.sourceforge.net
Thanks are due to a whole lot of people who helped with bug reports
and suggestions for this release:
[1] Sam Joseph
[2] Raj Sharma
[3] Raghavender Srimantula
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-04-17 02:31:05
|
> Due to time constraints, I've decided to use the HTML parser in Swing
> for the time being, but I'd definitely like to see the effect of a
> better parser in Swing. Just try a search for 'JEditorPane' in the Bug
> Parade and you'll see how long Sun has had issues with this area...
Yes, I know the parser from Sun is not good.
> I think your idea of trying the integration after 1.1 release is good.
Ok - 1.1 should be out really soon. I am done with an ant script for
building (phew!), and am giving the final touches to the code. We can
expect a release this week.
Regards,
Somik
----- Original Message -----
From: "Craig Raw" <cr...@qu...>
To: "'Somik Raha'" <so...@ya...>; <htm...@li...>
Sent: Tuesday, April 16, 2002 6:17 PM
Subject: [Htmlparser-developer] RE: [Htmlparser-user] Swing integration
> Due to time constraints, I've decided to use the HTML parser in Swing
> for the time being, but I'd definitely like to see the effect of a
> better parser in Swing. Just try a search for 'JEditorPane' in the Bug
> Parade and you'll see how long Sun has had issues with this area...
>
> I think your idea of trying the integration after 1.1 release is good.
>
> -craig
|
|
From: Craig R. <cr...@qu...> - 2002-04-16 09:19:24
|
Due to time constraints, I've decided to use the HTML parser in Swing
for the time being, but I'd definitely like to see the effect of a
better parser in Swing. Just try a search for 'JEditorPane' in the Bug
Parade and you'll see how long Sun has had issues with this area...
I think your idea of trying the integration after 1.1 release is good.
-craig
|
|
From: Somik R. <so...@ya...> - 2002-04-16 02:59:53
|
Hi Craig, Asgher
I finally had the time to check Swing integration. Boy - the parser
design in Swing sucks!! Theoretically it's possible to do it - and I got
started, but just realized that in order to be compatible with swing objects
that do compile time type checking with a particular tag, I have to actually
have 73 if statements to give the right tag to the callback.
I have more important things to do at the moment, but probably will get
back to this donkey work. *sigh*
I am thinking we should make release 1.1 and then try this. Any
suggestions ?
Regards,
Somik
----- Original Message -----
From: "Somik Raha" <so...@ya...>
To: <htm...@li...>
Sent: Thursday, April 04, 2002 11:20 AM
Subject: Re: [Htmlparser-user] Swing integration
> Hi Craig,
> Thanks a lot for the post. Pls go ahead with your analysis. I will try
> to catch up this weekend.
> Regards,
> Somik
> ----- Original Message -----
> From: "Craig Raw" <cr...@qu...>
> To: "'Somik Raha'" <so...@ya...>
> Sent: Tuesday, April 02, 2002 3:32 PM
> Subject: RE: [Htmlparser-user] Swing integration
>
>
> > Hi Somik,
> >
> > A quick excerpt from javax.swing.text.html.HTMLEditorKit javadoc - which
> > is the driver behind JEditorPane's reading and writing HTML
> > capabilities.
> >
> > ---
> > Extendable/Scalable
> >
> > To maximize the usefulness of this kit, a great deal of effort has gone
> > into making it extendable. These are some of the features.
> > The parser is replaceable. The default parser is the Hot Java parser
> > which is DTD based. A different DTD can be used, or an entirely
> > different parser can be used. To change the parser, reimplement the
> > getParser method. The default parser is dynamically loaded when first
> > asked for, so the class files will never be loaded if an alternative
> > parser is used. The default parser is in a separate package called
> > parser below this package.
> >
> > The parser drives the ParserCallback, which is provided by HTMLDocument.
> > To change the callback, subclass HTMLDocument and reimplement the
> > createDefaultDocument method to return document that produces a
> > different reader. The reader controls how the document is structured.
> > Although the Document provides HTML support by default, there is nothing
> > preventing support of non-HTML tags that result in alternative element
> > structures.
> > ---
> >
> > I may find some time to look into this as well, although I am not sure
> > how much it would fix JEditorPane's somewhat buggy HTML rendering
> > capabilities....
> >
> > -craig
> >
> >
> > -----Original Message-----
> > From: htm...@li...
> > [mailto:htm...@li...] On Behalf Of Somik Raha
> > Sent: 01 April 2002 05:28 PM
> > To: HTMLParser User List
> > Cc: HTMLParser Developer List
> > Subject: Re: [Htmlparser-user] Swing integration
> >
> > Hi Craig
> > Wow! That's a great question.
> > Actually, I doubt if I could replace Sun Microsystems' code with mine.
> > I don't think Java is that open (or is it ?)
> > However, we could think of writing our own adapter for the html parser
> > that might plug in in some way...
> > I have never used Sun's html parser (If I had, I might not have started
> > this project).
> > I will need to study Sun's parser before I can answer your question..
> > But there do seem to be some interesting possibilities.
> >
> > Regards
> > Somik
> > ----- Original Message -----
> > From: "Craig Raw" <cr...@qu...>
> > To: <htm...@li...>
> > Sent: Monday, April 01, 2002 10:20 PM
> > Subject: [Htmlparser-user] Swing integration
> >
> >
> > > Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> > > provide a better implementation of JEditorPane's HTML viewing
> > > capabilities? HTML Parser would need to replace
> > > javax.swing.text.html.parser.Parser, which is currently somewhat buggy.
> > > Anyone tried this?
> > >
> > > -craig
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
|
|
From: Somik R. <so...@ya...> - 2002-04-15 05:14:41
|
Hi Folks,
Thanks to Sam Joseph (creator of Neurogrid). Sam is using the parser
in the neurogrid project, and has pointed out a bug that slipped our
attention. If links or image urls contain spaces, those spaces were
being absorbed. That is incorrect behaviour, especially if you have
something of the form :
http://myservlet.com/someservlet?name=Sam Joseph&age=22
The same goes for images like
http://www.kizna.com/images/kizna corp.jpg
Also - previously, newline characters were being converted to spaces.
This has been modified - newline characters are left as is. The
responsibility to deal with them is now with the appropriate scanner.
So, the link and image scanners specifically filter out the newline
characters, whereas jsp tags, which might contain jsp code, preserve
the newline chars.
Over 73 testcases now in the htmlparser, and all passing..
I think we're ready for release 1.1 now, unless I get any more bug
reports this week.
You can check out the latest code from CVS.
Regards,
Somik
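The behaviour described in this mail (spaces inside a link or image URL are kept, newline characters are filtered out by the scanner) might look roughly like the sketch below. It is an illustration only: UrlAttributeSketch and stripLineBreaks are hypothetical names, and treating carriage returns the same way as newlines is an assumption, since the mail mentions only newlines.

```java
class UrlAttributeSketch {

    /** Removes line breaks from a raw URL value while leaving spaces untouched. */
    static String stripLineBreaks(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // Assumption: '\r' is dropped along with '\n'; every other character is kept.
            if (c != '\n' && c != '\r') {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this split of responsibilities, a scanner for content that must keep its line structure (such as the jsp case mentioned above) would simply not call the helper.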
|
|
From: Raghavender S. <kin...@ho...> - 2002-04-12 03:38:39
|
Thanks somik. I will work on it.
Raghav
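The bug discussed in the quoted thread below (comparing the whole tag text against the tag name) can be illustrated with a small sketch. It is not the fix that went into CVS, which this thread does not show: TagNameSketch, tagName and isTagNamed are hypothetical helpers, and the case-insensitive comparison is an assumption.

```java
class TagNameSketch {

    /** Returns the leading name of a tag's text, e.g. "OPTION" from "OPTION value=...". */
    static String tagName(String tagText) {
        String text = tagText.trim();
        int end = 0;
        // The name stops at the first whitespace, '>' or '/'.
        while (end < text.length()) {
            char c = text.charAt(end);
            if (Character.isWhitespace(c) || c == '>' || c == '/') {
                break;
            }
            end++;
        }
        return text.substring(0, end);
    }

    /** Compares only the tag name, not the attributes or trailing text. */
    static boolean isTagNamed(String tagText, String name) {
        return tagName(tagText).equalsIgnoreCase(name);
    }
}
```

For the text reported in the thread, OPTION value="#">Select a destination, tagName yields OPTION, so a comparison against "OPTION" succeeds, while a plain equals() on the full text does not.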
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>, "Raghavender Srimantula"
><kin...@ho...>
>Subject: Re: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
>Date: Fri, 12 Apr 2002 11:57:50 +0900
>
>Hi Raghav
>    You are right. That is indeed a bug. I have written a test case for it,
>captured it, and fixed it.
>    Code is checked into CVS - it should work for you now.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Raghavender Srimantula" <kin...@ho...>
>To: <so...@ya...>; <htm...@li...>
>Sent: Friday, April 12, 2002 6:12 AM
>Subject: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
>
>
> > hi Somik,
> > the code snippet you mailed me seems to have some problems.
> > Let me explain. The method
> > isXMLTagFound(node,"OPTION")
> > would always return false. The reason: in the definition of the above
> > method we have
> >
> > if (node instanceof HTMLTag) {
> >     System.out.println("node instanceof HTMLTag in tagscanner ");
> >     HTMLTag tag = (HTMLTag)node;
> >     if (tag.getText().equals(tagName)) {
> >         xmlTagFound=true;
> >     }
> > }
> >
> > tag.getText() would always give me
> > OPTION value="#">Select a destination
> >
> > which is not equal to the tagName, in this case the tagName=OPTION.
> >
> > Raghav
> >
> >
> > >From: "Somik Raha" <so...@ya...>
> > >To: "Raghavender Srimantula" <kin...@ho...>,
> > ><htm...@li...>
> > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > >Date: Thu, 11 Apr 2002 11:14:51 +0900
> > >
> > >Hi Raghav
> > > I replied to your earlier query. Did you recieve the mail (I
>forwarded
> > >it again) ?
> > > Regarding your current query, there are two ways to handle option
> > >tags.
> > >
> > >[1] Like in the previous question, you will have to recognize a HTMLTag
> > >(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
> > >[2] To make life easier, since this tag is basic xml, you can use a
>special
> > >XML parsing method provided in the superclass HTMLTagScanner.
> > >
> > >The methods are :
> > >(i) isXMLTagFound
> > >(ii) extractXMLData
> > >
> > >both of them are static mehods.
> > >You would use it like this :
> > >
> > >HTMLNode node = reader.readElement();
> > >if (isXMLTag(node,"OPTION")) {
> > > String option = extractXMLData(node,"OPTION",reader);
> > > // The string now contains the data within the option xml tag
> > > // So given an input : <OPTION value="#">Select a
>destination</OPTION>
> > > // option will hold "Select a destination"
> > >}
> > >
> > >But getting the value from the option tag itself would need to be
>handled
> > >separately.
> > >
> > >Regards,
> > >Somik
> > >----- Original Message -----
> > >From: "Raghavender Srimantula" <kin...@ho...>
> > >To: <so...@ya...>; <htm...@li...>
> > >Sent: Thursday, April 11, 2002 9:22 AM
> > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > >
> > >
> > > > hi Somik,
> > > > any ideas about my previous mail. let us say if we have
> > > > <OPTION value="#">Select a destination</OPTION>
> > > > when I do a
> > > > node = reader.readElement();
> > > > where "reader" is HTMLReader
> > > > the node I get is of type neither HTMLStringNode, HTMLEndTag,
> > > > HTMLRemarkNode.
> > > > how do I classify this if I want to do some thing with them.
> > > > Raghav
> > > >
> > > > >From: "Somik Raha" <so...@ya...>
> > > > >To: "Raghavender Srimantula" <kin...@ho...>
> > > > >CC: <htm...@li...>
> > > > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > > > >Date: Mon, 8 Apr 2002 13:04:07 +0900
> > > > >
> > > > >Hi Raghav
> > > > > > when would be this HTMLparser 1.1 out?
> > > > >As soon as I can wrap it up. Technically, the code is ready and
>already
> > > > >checked into CVS. I need to do the process of creating a release -
>make
> > > > >some
> > > > >documentation, check everything is ok, ..
> > > > >If I had some help I could wrap it up sooner.
> > > > >
> > > > > > I am not sure, but to me the way htmlparser parses is it gives
>me
> > >the
> > > > >tag
> > > > > > parameter of the first line in the above snippet of html code,
>when
> > >I
> > >do
> > > > > > Hashtable table = tag.parseParameters();
> > > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > > .....</FORM>
> > > > >
> > > > >Yes - parseParameters() will give you the stuff inside the FORM
>tag.
> > >That
> > > > >is
> > > > >what I call "microscopic" parsing. But to get the remaining tags -
>till
> > >you
> > > > >encounter </FORM> you need to do "macroscopic" parsing. This is not
> > >hard-
> > > > >check HTMLAppletScanner as an example.
> > > > >
> > > > >In a nutshell - concept is very simple. The scan method provides
>you
> > >with
> > >a
> > > > >reader. So you are to use that reader to read ahead and get the
>next
> > >tags.
> > > > >This is simple bcos the reader will automatically identify the
>correct
> > > > >tags,
> > > > >and the mechanism is very similar to using the parser to get the
>tags
> > >you
> > > > >want. The HTMLLinkScanner among others, also works on the same
> > >principle.
> > > > >
> > > > >Bytway - I think we should take this discussion to the Developer
>list.
> > > > >
> > > > >Regards,
> > > > >Somik
> > > > >----- Original Message -----
> > > > >From: "Raghavender Srimantula" <kin...@ho...>
> > > > >To: <htm...@li...>
> > > > >Sent: Monday, April 08, 2002 6:39 AM
> > > > >Subject: [Htmlparser-user] HTML parser 1.1
> > > > >
> > > > >
> > > > > > Hi Somik,
> > > > > > when would be this HTMLparser 1.1 out?
> > > > > > one more question. to parse the FORM tags, I have a small
>question.
> > > > > > let us say this is a form tag
> > > > > >
> > > > > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > > > > <P>User name:
> > > > > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > > > > <P>Password:
> > > > > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > > > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > > > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > > > > </FORM>
> > > > > >
> > > > > > I am not sure, but to me the way htmlparser parses is it gives
>me
> > >the
> > > > >tag
> > > > > > parameter of the first line in the above snippet of html code,
>when
> > >I
> > >do
> > > > > > Hashtable table = tag.parseParameters();
> > > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > > .....</FORM>
> > > > > >
> > > > > > could you suggest me how to go ahead with this.
> > > > > > Raghav
> > > > > >
> > > > > >
> > > > > > to extract the INPUT tag parameters
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
>_________________________________________________________________
> > > > > > MSN Photos is the easiest way to share and print your photos:
> > > > > > http://photos.msn.com/support/worldwide.aspx
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Htmlparser-user mailing list
> > > > > > Htm...@li...
> > > > > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > > > >
> > > > >
> > > > >_________________________________________________________
> > > > >Do You Yahoo!?
> > > > >Get your free @yahoo.com address at http://mail.yahoo.com
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > > _________________________________________________________________
> > > > Get your FREE download of MSN Explorer at
> > >http://explorer.msn.com/intl.asp.
> > >
> > >
> > >_________________________________________________________
> > >Do You Yahoo!?
> > >Get your free @yahoo.com address at http://mail.yahoo.com
> > >
> >
> >
> >
> >
> > _________________________________________________________________
> > Chat with friends online, try MSN Messenger: http://messenger.msn.com
> >
> >
> > _______________________________________________
> > Htmlparser-developer mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
>
>
>_________________________________________________________
>Do You Yahoo!?
>Get your free @yahoo.com address at http://mail.yahoo.com
>
|
|
From: Somik R. <so...@ya...> - 2002-04-12 03:00:50
|
Hi Raghav
You are right. That is indeed a bug. I have written a test case for it,
captured it, and fixed it.
Code is checked into CVS - it should work for you now.
Regards,
Somik
----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <so...@ya...>; <htm...@li...>
Sent: Friday, April 12, 2002 6:12 AM
Subject: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
> hi Somik,
> the code snippet you mailed me seems to have some problems.
> let me explain you. the method
> isXMLTagFound(node,"OPTION")
> would always return false. the reason: in the definition of the above
method
> we have
>
> if (node instanceof HTMLTag) {
> System.out.println("node instanceof HTMLTag in tagscanner ");
> HTMLTag tag = (HTMLTag)node;
> if (tag.getText().equals(tagName)) {
> xmlTagFound=true;
> }
> }
>
> tag.getText() would always give me
> OPTION value="#">Select a destination
>
> which is not equal to the tagName, in this case the tagName=OPTION.
>
> Raghav
>
>
> >From: "Somik Raha" <so...@ya...>
> >To: "Raghavender Srimantula" <kin...@ho...>,
> ><htm...@li...>
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >Date: Thu, 11 Apr 2002 11:14:51 +0900
> >
> >Hi Raghav
> > I replied to your earlier query. Did you receive the mail (I
forwarded
> >it again) ?
> > Regarding your current query, there are two ways to handle option
> >tags.
> >
> >[1] Like in the previous question, you will have to recognize a HTMLTag
> >(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
> >[2] To make life easier, since this tag is basic xml, you can use a
special
> >XML parsing method provided in the superclass HTMLTagScanner.
> >
> >The methods are :
> >(i) isXMLTagFound
> >(ii) extractXMLData
> >
> >both of them are static methods.
> >You would use it like this :
> >
> >HTMLNode node = reader.readElement();
> >if (isXMLTagFound(node,"OPTION")) {
> > String option = extractXMLData(node,"OPTION",reader);
> > // The string now contains the data within the option xml tag
> > // So given an input : <OPTION value="#">Select a
destination</OPTION>
> > // option will hold "Select a destination"
> >}
> >
> >But getting the value from the option tag itself would need to be handled
> >separately.
> >
> >Regards,
> >Somik
> >----- Original Message -----
> >From: "Raghavender Srimantula" <kin...@ho...>
> >To: <so...@ya...>; <htm...@li...>
> >Sent: Thursday, April 11, 2002 9:22 AM
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >
> >
> > > hi Somik,
> > > any ideas about my previous mail. let us say if we have
> > > <OPTION value="#">Select a destination</OPTION>
> > > when I do a
> > > node = reader.readElement();
> > > where "reader" is HTMLReader
> > > the node I get is of type neither HTMLStringNode, HTMLEndTag,
> > > HTMLRemarkNode.
> > > how do I classify this if I want to do some thing with them.
> > > Raghav
> > >
> > > >From: "Somik Raha" <so...@ya...>
> > > >To: "Raghavender Srimantula" <kin...@ho...>
> > > >CC: <htm...@li...>
> > > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > > >Date: Mon, 8 Apr 2002 13:04:07 +0900
> > > >
> > > >Hi Raghav
> > > > > when would be this HTMLparser 1.1 out?
> > > >As soon as I can wrap it up. Technically, the code is ready and
already
> > > >checked into CVS. I need to do the process of creating a release -
make
> > > >some
> > > >documentation, check everything is ok, ..
> > > >If I had some help I could wrap it up sooner.
> > > >
> > > > > I am not sure, but to me the way htmlparser parses is it gives me
> >the
> > > >tag
> > > > > parameter of the first line in the above snippet of html code,
when
> >I
> >do
> > > > > Hashtable table = tag.parseParameters();
> > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > .....</FORM>
> > > >
> > > >Yes - parseParameters() will give you the stuff inside the FORM tag.
> >That
> > > >is
> > > >what I call "microscopic" parsing. But to get the remaining tags -
till
> >you
> > > >encounter </FORM> you need to do "macroscopic" parsing. This is not
> >hard-
> > > >check HTMLAppletScanner as an example.
> > > >
> > > >In a nutshell - concept is very simple. The scan method provides you
> >with
> >a
> > > >reader. So you are to use that reader to read ahead and get the next
> >tags.
> > > >This is simple bcos the reader will automatically identify the
correct
> > > >tags,
> > > >and the mechanism is very similar to using the parser to get the tags
> >you
> > > >want. The HTMLLinkScanner among others, also works on the same
> >principle.
> > > >
> > > >Bytway - I think we should take this discussion to the Developer
list.
> > > >
> > > >Regards,
> > > >Somik
> > > >----- Original Message -----
> > > >From: "Raghavender Srimantula" <kin...@ho...>
> > > >To: <htm...@li...>
> > > >Sent: Monday, April 08, 2002 6:39 AM
> > > >Subject: [Htmlparser-user] HTML parser 1.1
> > > >
> > > >
> > > > > Hi Somik,
> > > > > when would be this HTMLparser 1.1 out?
> > > > > one more question. to parse the FORM tags, I have a small
question.
> > > > > let us say this is a form tag
> > > > >
> > > > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > > > <P>User name:
> > > > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > > > <P>Password:
> > > > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > > > </FORM>
> > > > >
> > > > > I am not sure, but to me the way htmlparser parses is it gives me
> >the
> > > >tag
> > > > > parameter of the first line in the above snippet of html code,
when
> >I
> >do
> > > > > Hashtable table = tag.parseParameters();
> > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > .....</FORM>
> > > > >
> > > > > could you suggest me how to go ahead with this.
> > > > > Raghav
> > > > >
> > > > >
> > > > > to extract the INPUT tag parameters
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > _________________________________________________________________
> > > > > MSN Photos is the easiest way to share and print your photos:
> > > > > http://photos.msn.com/support/worldwide.aspx
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Htmlparser-user mailing list
> > > > > Htm...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > > >
> > > >
> > > >_________________________________________________________
> > > >Do You Yahoo!?
> > > >Get your free @yahoo.com address at http://mail.yahoo.com
> > > >
> > >
> > >
> > >
> > >
> > > _________________________________________________________________
> > > Get your FREE download of MSN Explorer at
> >http://explorer.msn.com/intl.asp.
> >
> >
> >_________________________________________________________
> >Do You Yahoo!?
> >Get your free @yahoo.com address at http://mail.yahoo.com
> >
>
>
>
>
> _________________________________________________________________
> Chat with friends online, try MSN Messenger: http://messenger.msn.com
>
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
|
|
From: Raghavender S. <kin...@ho...> - 2002-04-11 21:12:58
|
hi Somik,
the code snippet you mailed me seems to have some problems.
Let me explain. The method
isXMLTagFound(node,"OPTION")
would always return false. The reason: in the definition of the above method
we have
    if (node instanceof HTMLTag) {
        System.out.println("node instanceof HTMLTag in tagscanner ");
        HTMLTag tag = (HTMLTag)node;
        if (tag.getText().equals(tagName)) {
            xmlTagFound=true;
        }
    }
tag.getText() would always give me
OPTION value="#">Select a destination
which is not equal to the tagName; in this case tagName=OPTION.
Raghav
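The mismatch described above comes from comparing the whole tag text with the tag name. A minimal sketch of one way around it, assuming the tag name is simply the first token of that text, is to compare only that token. The names TagNameMatchSketch, tagNameOf and isTagNamed are hypothetical and not part of the HTML Parser API, and this thread does not show how the fix in CVS was actually written.

```java
// Hypothetical helper for illustration only; not the library's code.
class TagNameMatchSketch {

    /** Returns the leading token of the tag text, stopping at whitespace or '>'. */
    static String tagNameOf(String tagText) {
        String trimmed = tagText.trim();
        int end = 0;
        while (end < trimmed.length()) {
            char c = trimmed.charAt(end);
            if (Character.isWhitespace(c) || c == '>') {
                break;
            }
            end++;
        }
        return trimmed.substring(0, end);
    }

    /** True when the tag text begins with the given tag name (assumed case-insensitive). */
    static boolean isTagNamed(String tagText, String tagName) {
        return tagNameOf(tagText).equalsIgnoreCase(tagName);
    }

    public static void main(String[] args) {
        // The text reported in the message above.
        String text = "OPTION value=\"#\">Select a destination";
        System.out.println(text.equals("OPTION"));      // prints false
        System.out.println(isTagNamed(text, "OPTION")); // prints true
    }
}
```

Comparing on the leading token keeps attributes and any trailing text out of the comparison, which is the part that made equals() fail.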
>From: "Somik Raha" <so...@ya...>
>To: "Raghavender Srimantula" <kin...@ho...>,
><htm...@li...>
>Subject: Re: [Htmlparser-user] HTML parser 1.1
>Date: Thu, 11 Apr 2002 11:14:51 +0900
>
>Hi Raghav
> I replied to your earlier query. Did you receive the mail (I forwarded
>it again) ?
> Regarding your current query, there are two ways to handle option
>tags.
>
>[1] Like in the previous question, you will have to recognize a HTMLTag
>(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
>[2] To make life easier, since this tag is basic xml, you can use a special
>XML parsing method provided in the superclass HTMLTagScanner.
>
>The methods are :
>(i) isXMLTagFound
>(ii) extractXMLData
>
>both of them are static methods.
>You would use it like this :
>
>HTMLNode node = reader.readElement();
>if (isXMLTagFound(node,"OPTION")) {
> String option = extractXMLData(node,"OPTION",reader);
> // The string now contains the data within the option xml tag
> // So given an input : <OPTION value="#">Select a destination</OPTION>
> // option will hold "Select a destination"
>}
>
>But getting the value from the option tag itself would need to be handled
>separately.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Raghavender Srimantula" <kin...@ho...>
>To: <so...@ya...>; <htm...@li...>
>Sent: Thursday, April 11, 2002 9:22 AM
>Subject: Re: [Htmlparser-user] HTML parser 1.1
>
>
> > hi Somik,
> > any ideas about my previous mail. let us say if we have
> > <OPTION value="#">Select a destination</OPTION>
> > when I do a
> > node = reader.readElement();
> > where "reader" is HTMLReader
> > the node I get is of type neither HTMLStringNode, HTMLEndTag,
> > HTMLRemarkNode.
> > how do I classify this if I want to do some thing with them.
> > Raghav
> >
> > >From: "Somik Raha" <so...@ya...>
> > >To: "Raghavender Srimantula" <kin...@ho...>
> > >CC: <htm...@li...>
> > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > >Date: Mon, 8 Apr 2002 13:04:07 +0900
> > >
> > >Hi Raghav
> > > > when would be this HTMLparser 1.1 out?
> > >As soon as I can wrap it up. Technically, the code is ready and already
> > >checked into CVS. I need to do the process of creating a release - make
> > >some
> > >documentation, check everything is ok, ..
> > >If I had some help I could wrap it up sooner.
> > >
> > > > I am not sure, but to me the way htmlparser parses is it gives me
>the
> > >tag
> > > > parameter of the first line in the above snippet of html code, when
>I
>do
> > > > Hashtable table = tag.parseParameters();
> > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > .....</FORM>
> > >
> > >Yes - parseParameters() will give you the stuff inside the FORM tag.
>That
> > >is
> > >what I call "microscopic" parsing. But to get the remaining tags - till
>you
> > >encounter </FORM> you need to do "macroscopic" parsing. This is not
>hard-
> > >check HTMLAppletScanner as an example.
> > >
> > >In a nutshell - concept is very simple. The scan method provides you
>with
>a
> > >reader. So you are to use that reader to read ahead and get the next
>tags.
> > >This is simple bcos the reader will automatically identify the correct
> > >tags,
> > >and the mechanism is very similar to using the parser to get the tags
>you
> > >want. The HTMLLinkScanner among others, also works on the same
>principle.
> > >
> > >Bytway - I think we should take this discussion to the Developer list.
> > >
> > >Regards,
> > >Somik
> > >----- Original Message -----
> > >From: "Raghavender Srimantula" <kin...@ho...>
> > >To: <htm...@li...>
> > >Sent: Monday, April 08, 2002 6:39 AM
> > >Subject: [Htmlparser-user] HTML parser 1.1
> > >
> > >
> > > > Hi Somik,
> > > > when would be this HTMLparser 1.1 out?
> > > > one more question. to parse the FORM tags, I have a small question.
> > > > let us say this is a form tag
> > > >
> > > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > > <P>User name:
> > > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > > <P>Password:
> > > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > > </FORM>
> > > >
> > > > I am not sure, but to me the way htmlparser parses is it gives me
>the
> > >tag
> > > > parameter of the first line in the above snippet of html code, when
>I
>do
> > > > Hashtable table = tag.parseParameters();
> > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > .....</FORM>
> > > >
> > > > could you suggest me how to go ahead with this.
> > > > Raghav
> > > >
> > > >
> > > > to extract the INPUT tag parameters
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _________________________________________________________________
> > > > MSN Photos is the easiest way to share and print your photos:
> > > > http://photos.msn.com/support/worldwide.aspx
> > > >
> > > >
> > > > _______________________________________________
> > > > Htmlparser-user mailing list
> > > > Htm...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > >
> > >
> > >_________________________________________________________
> > >Do You Yahoo!?
> > >Get your free @yahoo.com address at http://mail.yahoo.com
> > >
> >
> >
> >
> >
> > _________________________________________________________________
> > Get your FREE download of MSN Explorer at
>http://explorer.msn.com/intl.asp.
>
>
>_________________________________________________________
>Do You Yahoo!?
>Get your free @yahoo.com address at http://mail.yahoo.com
>
|
|
From: Somik R. <so...@ya...> - 2002-04-11 02:17:41
|
Hi Raghav
I replied to your earlier query. Did you receive the mail (I forwarded
it again)?
Regarding your current query, there are two ways to handle option tags.
[1] Like in the previous question, you will have to recognize an HTMLTag
(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
[2] To make life easier, since this tag is basic XML, you can use the special
XML parsing methods provided in the superclass HTMLTagScanner.
The methods are:
(i) isXMLTagFound
(ii) extractXMLData
Both of them are static methods.
You would use them like this:
    HTMLNode node = reader.readElement();
    if (isXMLTagFound(node,"OPTION")) {
        String option = extractXMLData(node,"OPTION",reader);
        // The string now contains the data within the option xml tag
        // So given an input : <OPTION value="#">Select a destination</OPTION>
        // option will hold "Select a destination"
    }
But getting the value from the option tag itself would need to be handled
separately.
Regards,
Somik
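Getting the value attribute out of the option tag is left open above. A rough sketch of one way to do it on the raw tag text, assuming attributes appear as name="value", name='value' or name=value: the class OptionValueSketch and the method attributeValue are hypothetical names for illustration, and the sketch uses only java.util.regex rather than any HTML Parser call. (The thread elsewhere mentions tag.parseParameters(), which returns a Hashtable; that call is not used here.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the HTML Parser API.
class OptionValueSketch {

    /**
     * Returns the value of the named attribute in the tag text, or null
     * when the attribute is absent. Attribute names are matched
     * case-insensitively.
     */
    static String attributeValue(String tagText, String attrName) {
        Pattern p = Pattern.compile(
                "(?i)\\b" + Pattern.quote(attrName)
                + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");
        Matcher m = p.matcher(tagText);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three alternatives captured the value.
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(attributeValue("OPTION value=\"#\"", "value")); // prints #
        System.out.println(attributeValue(
                "FORM NAME=\"LoginForm\" METHOD=POST ACTION=\"urltoInvoke\"",
                "method")); // prints POST
    }
}
```

The text between the tags ("Select a destination") would still come from extractXMLData as described above; this sketch only covers the attribute.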
----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <so...@ya...>; <htm...@li...>
Sent: Thursday, April 11, 2002 9:22 AM
Subject: Re: [Htmlparser-user] HTML parser 1.1
> hi Somik,
> any ideas about my previous mail. let us say if we have
> <OPTION value="#">Select a destination</OPTION>
> when I do a
> node = reader.readElement();
> where "reader" is HTMLReader
> the node I get is of type neither HTMLStringNode, HTMLEndTag,
> HTMLRemarkNode.
> how do I classify this if I want to do some thing with them.
> Raghav
>
> >From: "Somik Raha" <so...@ya...>
> >To: "Raghavender Srimantula" <kin...@ho...>
> >CC: <htm...@li...>
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >Date: Mon, 8 Apr 2002 13:04:07 +0900
> >
> >Hi Raghav
> > > when would be this HTMLparser 1.1 out?
> >As soon as I can wrap it up. Technically, the code is ready and already
> >checked into CVS. I need to do the process of creating a release - make
> >some
> >documentation, check everything is ok, ..
> >If I had some help I could wrap it up sooner.
> >
> > > I am not sure, but to me the way htmlparser parses is it gives me the
> >tag
> > > parameter of the first line in the above snippet of html code, when I
do
> > > Hashtable table = tag.parseParameters();
> > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > .....</FORM>
> >
> >Yes - parseParameters() will give you the stuff inside the FORM tag. That
> >is
> >what I call "microscopic" parsing. But to get the remaining tags - till
you
> >encounter </FORM> you need to do "macroscopic" parsing. This is not hard-
> >check HTMLAppletScanner as an example.
> >
> >In a nutshell - concept is very simple. The scan method provides you with
a
> >reader. So you are to use that reader to read ahead and get the next
tags.
> >This is simple bcos the reader will automatically identify the correct
> >tags,
> >and the mechanism is very similar to using the parser to get the tags you
> >want. The HTMLLinkScanner among others, also works on the same principle.
> >
> >Bytway - I think we should take this discussion to the Developer list.
> >
> >Regards,
> >Somik
> >----- Original Message -----
> >From: "Raghavender Srimantula" <kin...@ho...>
> >To: <htm...@li...>
> >Sent: Monday, April 08, 2002 6:39 AM
> >Subject: [Htmlparser-user] HTML parser 1.1
> >
> >
> > > Hi Somik,
> > > when would be this HTMLparser 1.1 out?
> > > one more question. to parse the FORM tags, I have a small question.
> > > let us say this is a form tag
> > >
> > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > <P>User name:
> > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > <P>Password:
> > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > </FORM>
> > >
> > > I am not sure, but to me the way htmlparser parses is it gives me the
> >tag
> > > parameter of the first line in the above snippet of html code, when I
do
> > > Hashtable table = tag.parseParameters();
> > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > .....</FORM>
> > >
> > > could you suggest me how to go ahead with this.
> > > Raghav
> > >
> > >
> > > to extract the INPUT tag parameters
> > >
> > >
> > >
> > >
> > >
> > > _________________________________________________________________
> > > MSN Photos is the easiest way to share and print your photos:
> > > http://photos.msn.com/support/worldwide.aspx
> > >
> > >
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> >
> >
> >_________________________________________________________
> >Do You Yahoo!?
> >Get your free @yahoo.com address at http://mail.yahoo.com
> >
>
>
>
>
> _________________________________________________________________
> Get your FREE download of MSN Explorer at
http://explorer.msn.com/intl.asp.
|
|
From: Raghavender S. <kin...@ho...> - 2002-04-11 00:23:02
|
hi Somik,
any ideas about my previous mail. let us say if we have
<OPTION value="#">Select a destination</OPTION>
when I do a
node = reader.readElement();
where "reader" is HTMLReader
the node I get is of type neither HTMLStringNode, HTMLEndTag,
HTMLRemarkNode.
how do I classify this if I want to do some thing with them.
Raghav
>From: "Somik Raha" <so...@ya...>
>To: "Raghavender Srimantula" <kin...@ho...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] HTML parser 1.1
>Date: Mon, 8 Apr 2002 13:04:07 +0900
>
>Hi Raghav
> > when would be this HTMLparser 1.1 out?
>As soon as I can wrap it up. Technically, the code is ready and already
>checked into CVS. I need to do the process of creating a release - make some
>documentation, check everything is ok, ..
>If I had some help I could wrap it up sooner.
>
> > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > parameter of the first line in the above snippet of html code, when I do
> > Hashtable table = tag.parseParameters();
> > it is looking for parameters inside <FORM ..... >, but not <FORM
> > .....</FORM>
>
>Yes - parseParameters() will give you the stuff inside the FORM tag. That is
>what I call "microscopic" parsing. But to get the remaining tags - till you
>encounter </FORM> you need to do "macroscopic" parsing. This is not hard-
>check HTMLAppletScanner as an example.
>
>In a nutshell - concept is very simple. The scan method provides you with a
>reader. So you are to use that reader to read ahead and get the next tags.
>This is simple bcos the reader will automatically identify the correct tags,
>and the mechanism is very similar to using the parser to get the tags you
>want. The HTMLLinkScanner among others, also works on the same principle.
>
>Bytway - I think we should take this discussion to the Developer list.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Raghavender Srimantula" <kin...@ho...>
>To: <htm...@li...>
>Sent: Monday, April 08, 2002 6:39 AM
>Subject: [Htmlparser-user] HTML parser 1.1
>
>
> > Hi Somik,
> > when would be this HTMLparser 1.1 out?
> > one more question. to parse the FORM tags, I have a small question.
> > let us say this is a form tag
> >
> > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > <P>User name:
> > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > <P>Password:
> > <INPUT TYPE="password" NAME="password" SIZE="12">
> > <P><INPUT TYPE="submit" VALUE="Log in">
> > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > </FORM>
> >
> > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > parameter of the first line in the above snippet of html code, when I do
> > Hashtable table = tag.parseParameters();
> > it is looking for parameters inside <FORM ..... >, but not <FORM
> > .....</FORM>
> >
> > could you suggest me how to go ahead with this.
> > Raghav
> >
> >
> > to extract the INPUT tag parameters
> >
> >
> > _________________________________________________________________
> > MSN Photos is the easiest way to share and print your photos:
> > http://photos.msn.com/support/worldwide.aspx
> >
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
>_________________________________________________________
>Do You Yahoo!?
>Get your free @yahoo.com address at http://mail.yahoo.com
>
|
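The "macroscopic" parsing described in the quoted reply (read ahead with the reader until </FORM> is reached) can be sketched roughly as below. This is only an illustration under stated assumptions: a plain Iterator of tag texts stands in for the library's reader, text nodes are left out, and FormScanSketch and collectInputs are hypothetical names rather than HTML Parser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a scanner's read-ahead loop; not library code.
class FormScanSketch {

    /**
     * Called after the opening FORM tag has been read. Reads ahead,
     * collecting the text of every INPUT tag, and stops once the
     * closing /FORM tag has been consumed.
     */
    static List<String> collectInputs(Iterator<String> reader) {
        List<String> inputs = new ArrayList<>();
        while (reader.hasNext()) {
            String tagText = reader.next().trim();
            if (tagText.equalsIgnoreCase("/FORM")) {
                break; // end of the form: stop reading ahead
            }
            boolean startsWithInput = tagText.regionMatches(true, 0, "INPUT", 0, 5);
            boolean nameEndsThere = tagText.length() == 5
                    || (tagText.length() > 5 && Character.isWhitespace(tagText.charAt(5)));
            if (startsWithInput && nameEndsThere) {
                inputs.add(tagText);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        Iterator<String> reader = Arrays.asList(
                "P",
                "INPUT TYPE=\"text\" NAME=\"userName\" SIZE=\"10\"",
                "P",
                "INPUT TYPE=\"password\" NAME=\"password\" SIZE=\"12\"",
                "/FORM").iterator();
        System.out.println(collectInputs(reader).size()); // prints 2
    }
}
```

Each collected tag text could then be handed to "microscopic" parsing, which in the library is what parseParameters() is said to do for a single tag.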
|
From: Somik R. <so...@ya...> - 2002-04-09 14:42:52
|
Hi Raghav
> Begin Tag : SELECT name="pulldown" class="smaller-text"; begins at : 0; ends
> at : 44
>
> this node which I get is of neither HTMLRemarkNode, HTMLStringNode,
> HTMLEndTag.
That's right - this is expected behaviour. The type of this node is HTMLTag.
If you downcast to HTMLTag, you can get all the info.
Regards,
Somik
----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <so...@ya...>
Cc: <htm...@li...>
Sent: Tuesday, April 09, 2002 7:01 PM
Subject: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
> hi Somik,
> question regarding the form parsing. let us say I have this tag
> <SELECT name="pulldown" class="smaller-text">
>
> so now when I do a
> node = reader.readElement();
>
> if I do a node.print(), I get
>
> Begin Tag : SELECT name="pulldown" class="smaller-text"; begins at : 0; ends
> at : 44
>
> this node which I get is of neither HTMLRemarkNode, HTMLStringNode,
> HTMLEndTag.
> I am not sure how to classify this. because if I want to take some action
> here I need to classify this node.
> could you help me out.
> Raghav
>
>
> >From: "Somik Raha" <so...@ya...>
> >To: "Raghavender Srimantula" <kin...@ho...>
> >CC: <htm...@li...>
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >Date: Mon, 8 Apr 2002 13:04:07 +0900
> >
> >Hi Raghav
> > > when would be this HTMLparser 1.1 out?
> >As soon as I can wrap it up. Technically, the code is ready and already
> >checked into CVS. I need to do the process of creating a release - make some
> >documentation, check everything is ok, ..
> >If I had some help I could wrap it up sooner.
> >
> > > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > > parameter of the first line in the above snippet of html code, when I do
> > > Hashtable table = tag.parseParameters();
> > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > .....</FORM>
> >
> >Yes - parseParameters() will give you the stuff inside the FORM tag. That is
> >what I call "microscopic" parsing. But to get the remaining tags - till you
> >encounter </FORM> you need to do "macroscopic" parsing. This is not hard-
> >check HTMLAppletScanner as an example.
> >
> >In a nutshell - concept is very simple. The scan method provides you with a
> >reader. So you are to use that reader to read ahead and get the next tags.
> >This is simple bcos the reader will automatically identify the correct tags,
> >and the mechanism is very similar to using the parser to get the tags you
> >want. The HTMLLinkScanner among others, also works on the same principle.
> >
> >Bytway - I think we should take this discussion to the Developer list.
> >
> >Regards,
> >Somik
> >----- Original Message -----
> >From: "Raghavender Srimantula" <kin...@ho...>
> >To: <htm...@li...>
> >Sent: Monday, April 08, 2002 6:39 AM
> >Subject: [Htmlparser-user] HTML parser 1.1
> >
> >
> > > Hi Somik,
> > > when would be this HTMLparser 1.1 out?
> > > one more question. to parse the FORM tags, I have a small question.
> > > let us say this is a form tag
> > >
> > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > <P>User name:
> > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > <P>Password:
> > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > </FORM>
> > >
> > > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > > parameter of the first line in the above snippet of html code, when I do
> > > Hashtable table = tag.parseParameters();
> > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > .....</FORM>
> > >
> > > could you suggest me how to go ahead with this.
> > > Raghav
> > >
> > >
> > > to extract the INPUT tag parameters
> > >
> > >
> > > _________________________________________________________________
> > > MSN Photos is the easiest way to share and print your photos:
> > > http://photos.msn.com/support/worldwide.aspx
> > >
> > >
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> >
> >
> >_________________________________________________________
> >Do You Yahoo!?
> >Get your free @yahoo.com address at http://mail.yahoo.com
> >
>
>
> _________________________________________________________________
> Join the world's largest e-mail service with MSN Hotmail.
> http://www.hotmail.com
>
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
|
|
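The downcast suggested in the reply above can be sketched roughly as follows. This is only an illustration: the nested classes are hypothetical stand-ins that borrow the class names mentioned in the thread (HTMLTag, HTMLEndTag, HTMLStringNode, HTMLRemarkNode), and their constructors, fields and the `getName()` accessor are assumptions, not the library's actual API. The point is just the `instanceof` checks followed by a cast.

```java
import java.util.Hashtable;

class NodeClassifierDemo {
    // Hypothetical stand-ins: only the class names come from the thread.
    static class HTMLNode {}
    static class HTMLStringNode extends HTMLNode {}
    static class HTMLRemarkNode extends HTMLNode {}
    static class HTMLEndTag extends HTMLNode {}

    static class HTMLTag extends HTMLNode {
        private final String name;
        private final Hashtable<String, String> params;

        HTMLTag(String name, Hashtable<String, String> params) {
            this.name = name;
            this.params = params;
        }

        String getName() { return name; }  // assumed accessor

        // Mirrors the parseParameters() call quoted in the thread.
        Hashtable<String, String> parseParameters() { return params; }
    }

    // Decide what kind of node we were handed. The end-tag check comes
    // before the begin-tag check so the order would still be right if the
    // end-tag class happened to be a subclass of the tag class.
    static String classify(HTMLNode node) {
        if (node instanceof HTMLStringNode) return "text";
        if (node instanceof HTMLRemarkNode) return "remark";
        if (node instanceof HTMLEndTag) return "end tag";
        if (node instanceof HTMLTag) {
            HTMLTag tag = (HTMLTag) node;  // the downcast
            Hashtable<String, String> p = tag.parseParameters();
            return "begin tag " + tag.getName() + " (name=" + p.get("name") + ")";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Hashtable<String, String> params = new Hashtable<>();
        params.put("name", "pulldown");
        params.put("class", "smaller-text");
        System.out.println(classify(new HTMLTag("SELECT", params)));
        System.out.println(classify(new HTMLEndTag()));
    }
}
```

Anything that is not a string node, remark node or end tag falls through to the begin-tag branch, which is where the SELECT node from the question would land.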
From: Raghavender S. <kin...@ho...> - 2002-04-09 10:01:43
|
hi Somik,
question regarding the form parsing. let us say I have this tag
<SELECT name="pulldown" class="smaller-text">

so now when I do a
node = reader.readElement();

if I do a node.print(), I get

Begin Tag : SELECT name="pulldown" class="smaller-text"; begins at : 0; ends at : 44

this node which I get is of neither HTMLRemarkNode, HTMLStringNode, HTMLEndTag.
I am not sure how to classify this. because if I want to take some action here I need to classify this node.
could you help me out.
Raghav

>From: "Somik Raha" <so...@ya...>
>To: "Raghavender Srimantula" <kin...@ho...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] HTML parser 1.1
>Date: Mon, 8 Apr 2002 13:04:07 +0900
>
>Hi Raghav
> > when would be this HTMLparser 1.1 out?
>As soon as I can wrap it up. Technically, the code is ready and already
>checked into CVS. I need to do the process of creating a release - make some
>documentation, check everything is ok, ..
>If I had some help I could wrap it up sooner.
>
> > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > parameter of the first line in the above snippet of html code, when I do
> > Hashtable table = tag.parseParameters();
> > it is looking for parameters inside <FORM ..... >, but not <FORM
> > .....</FORM>
>
>Yes - parseParameters() will give you the stuff inside the FORM tag. That is
>what I call "microscopic" parsing. But to get the remaining tags - till you
>encounter </FORM> you need to do "macroscopic" parsing. This is not hard-
>check HTMLAppletScanner as an example.
>
>In a nutshell - concept is very simple. The scan method provides you with a
>reader. So you are to use that reader to read ahead and get the next tags.
>This is simple bcos the reader will automatically identify the correct tags,
>and the mechanism is very similar to using the parser to get the tags you
>want. The HTMLLinkScanner among others, also works on the same principle.
>
>Bytway - I think we should take this discussion to the Developer list.
>
>Regards,
>Somik |
|
From: Somik R. <so...@ya...> - 2002-04-08 04:06:58
|
Hi Raghav

> when would be this HTMLparser 1.1 out?

As soon as I can wrap it up. Technically, the code is ready and already checked into CVS. I need to go through the process of creating a release: write some documentation, check that everything is OK, and so on. If I had some help I could wrap it up sooner.

> I am not sure, but to me the way htmlparser parses is it gives me the tag
> parameter of the first line in the above snippet of html code, when I do
> Hashtable table = tag.parseParameters();
> it is looking for parameters inside <FORM ..... >, but not <FORM
> .....</FORM>

Yes, parseParameters() will give you the stuff inside the FORM tag. That is what I call "microscopic" parsing. But to get the remaining tags, until you encounter </FORM>, you need to do "macroscopic" parsing. This is not hard; check HTMLAppletScanner for an example.

In a nutshell, the concept is very simple. The scan method provides you with a reader, and you use that reader to read ahead and get the next tags. This is simple because the reader will automatically identify the correct tags, and the mechanism is very similar to using the parser to get the tags you want. HTMLLinkScanner, among others, works on the same principle.

By the way, I think we should take this discussion to the Developer list.

Regards,
Somik

----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <htm...@li...>
Sent: Monday, April 08, 2002 6:39 AM
Subject: [Htmlparser-user] HTML parser 1.1

> Hi Somik,
> when would be this HTMLparser 1.1 out?
> one more question. to parse the FORM tags, I have a small question.
> let us say this is a form tag
>
> <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> <P>User name:
> <INPUT TYPE="text" NAME="userName" SIZE="10">
> <P>Password:
> <INPUT TYPE="password" NAME="password" SIZE="12">
> <P><INPUT TYPE="submit" VALUE="Log in">
> <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> </FORM>
>
> I am not sure, but to me the way htmlparser parses is it gives me the tag
> parameter of the first line in the above snippet of html code, when I do
> Hashtable table = tag.parseParameters();
> it is looking for parameters inside <FORM ..... >, but not <FORM
> .....</FORM>
>
> could you suggest me how to go ahead with this.
> Raghav
>
> to extract the INPUT tag parameters |
|
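The "macroscopic" read-ahead described in the reply above might look something like the sketch below. It is not taken from HTMLAppletScanner or any other scanner in the library: `Tag` and `TagReader` are hypothetical stand-ins, and only the `readElement()` name and the LoginForm markup are borrowed from the thread. The sketch assumes the opening FORM tag has already been consumed by the caller, and that the reader returns null when the input runs out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormScanDemo {
    // Hypothetical stand-in for a parsed tag.
    static final class Tag {
        final String name;
        final boolean isEnd;
        final Map<String, String> params;

        Tag(String name, boolean isEnd, Map<String, String> params) {
            this.name = name;
            this.isEnd = isEnd;
            this.params = params;
        }
    }

    // Hypothetical stand-in for the reader handed to a scan method:
    // it returns the next tag, or null when the input is exhausted.
    static final class TagReader {
        private final Iterator<Tag> tags;

        TagReader(List<Tag> tags) { this.tags = tags.iterator(); }

        Tag readElement() { return tags.hasNext() ? tags.next() : null; }
    }

    // Read ahead until </FORM>, collecting the parameters of each INPUT tag.
    static List<Map<String, String>> scanForm(TagReader reader) {
        List<Map<String, String>> inputs = new ArrayList<>();
        Tag tag;
        while ((tag = reader.readElement()) != null) {
            if (tag.isEnd && tag.name.equals("FORM")) break;  // end of the form
            if (!tag.isEnd && tag.name.equals("INPUT")) inputs.add(tag.params);
        }
        return inputs;
    }

    static Map<String, String> params(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    static Tag begin(String name, String... keyValues) {
        return new Tag(name, false, params(keyValues));
    }

    static Tag end(String name) {
        return new Tag(name, true, params());
    }

    // The tags that follow <FORM NAME="LoginForm" ...> in the example,
    // plus one trailing tag that belongs to the rest of the page.
    static List<Tag> loginFormBody() {
        List<Tag> tags = new ArrayList<>();
        tags.add(begin("P"));
        tags.add(begin("INPUT", "TYPE", "text", "NAME", "userName", "SIZE", "10"));
        tags.add(begin("P"));
        tags.add(begin("INPUT", "TYPE", "password", "NAME", "password", "SIZE", "12"));
        tags.add(begin("P"));
        tags.add(begin("INPUT", "TYPE", "submit", "VALUE", "Log in"));
        tags.add(begin("INPUT", "TYPE", "button", "VALUE", "Cancel"));
        tags.add(end("FORM"));
        tags.add(begin("P"));
        return tags;
    }

    public static void main(String[] args) {
        for (Map<String, String> input : scanForm(new TagReader(loginFormBody()))) {
            System.out.println(input);
        }
    }
}
```

Because the loop stops at the closing FORM tag, whatever follows the form is left in the reader for the next scanner or for the caller.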
From: Somik R. <so...@ya...> - 2002-04-05 07:14:43
|
Hi Folks,
The dynamic page parsing bug is fixed, and as far as I've tested, I am able to correctly parse pages like
http://search.yahoo.com/bin/search?p=dogs
which Mats had posted earlier.
We are now ready for release 1.1. I'd be grateful for some help in testing the parser, to see if there are any showstopper bugs for this release. (Get the latest code from CVS.)
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-04-05 03:11:21
|
> I have used parser available in JDK.
> If u say I can send u example.

Yes Asgher, pls go ahead.

Regards,
Somik |
|
From: Somik R. <so...@ya...> - 2002-04-05 03:04:56
|
Hi Folks,
An important bug has been pointed out by Raj Sharma, which would halt the parser if a page contained a link spread over two lines. This was a bug in HTMLTag, and I was able to find it quickly, thanks to the refactoring done earlier with the help of Arnaud.
Also, HTMLLinkScanner and HTMLImageScanner have some small changes in connection with the fix.
Please get the latest code from CVS.

Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-04-04 15:51:32
|
>How come when you use the parser on most sites to extract links it works
>fine but when you use it on search engine i.e.
>http://search.yahoo.com/bin/search?p=dogs which is a page with search
>results for dogs, it does not work?

Ah, this is a known bug. It doesn't work because the parser is not capable of handling dynamic pages. This is actually not a difficult bug to fix. Version 1.10 of HTMLParser (the next release, coming soon) will contain this and other fixes. So you will have to wait till this weekend, or make the fix yourself; the bug probably lies in HTMLParser.java itself, in the way a page extension is handled.

Regards
Somik |
|
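For anyone attempting the fix themselves, the sketch below shows one way an extension check can go wrong on a dynamic URL. It is a guess at the general kind of problem, not the actual code in HTMLParser.java: both methods and the class name are hypothetical. The first method looks at the end of the whole URL string, so a query string defeats it. The second looks only at the path component, using the standard `java.net.URI` class, and treats a path with no extension as possibly HTML.

```java
import java.net.URI;

class DynamicUrlDemo {
    // Naive check (hypothetical): inspects the end of the whole URL string.
    static boolean naiveLooksLikeHtml(String url) {
        String lower = url.toLowerCase();
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    // Path-based check (hypothetical): ignores the query string, and accepts
    // paths whose last segment has no extension, such as /bin/search.
    static boolean pathLooksLikeHtml(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) return true;  // e.g. http://host
        String last = path.substring(path.lastIndexOf('/') + 1).toLowerCase();
        int dot = last.lastIndexOf('.');
        if (dot < 0) return true;                          // no extension at all
        String ext = last.substring(dot + 1);
        return ext.equals("html") || ext.equals("htm");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://search.yahoo.com/bin/search?p=dogs",
            "http://example.com/index.html?x=1",
            "http://example.com/logo.gif"
        };
        for (String url : urls) {
            System.out.println(url + " naive=" + naiveLooksLikeHtml(url)
                    + " path=" + pathLooksLikeHtml(url));
        }
    }
}
```

A more robust approach would probably look at the Content-Type of the HTTP response rather than the URL, but that goes beyond what the thread discusses.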
From: Somik R. <so...@ya...> - 2002-04-04 02:22:15
|
Hi Asgher,
> I have used parser available in JDK.
> If u say I can send u example.
Yes, please go ahead. I don't have much time till the weekend, and
some help would really get me up to speed.
Regards,
Somik
|
|
From: Asgher A. <as...@lw...> - 2002-04-02 05:02:54
|
I have used parser available in JDK.
If u say I can send u example.

On Monday, April 01, 2002 at 12:47:41 PM, htm...@li... wrote:

> Today's Topics:
>
> 1. Re: [Htmlparser-user] Swing integration (Somik Raha)
>
> Message: 1
> From: "Somik Raha" <so...@ya...>
> To: "HTMLParser User List" <htm...@li...>
> Cc: "HTMLParser Developer List" <htm...@li...>
> Date: Tue, 2 Apr 2002 00:28:28 +0900
> Subject: [Htmlparser-developer] Re: [Htmlparser-user] Swing integration
>
> Hi Craig
> Wow! Thats a great question.
> Actually, I doubt if I could replace Sun Microsystems' code with mine. I
> dont think Java is that open (or is it ?)
> However, we could think of writing our own adapter for the html parser that
> might plugin in some way...
> I have never used Sun's html parser (If I had, I might not have started
> this project).
> I will need to study Sun's parser before I can answer your question..
> But there does seem to be some interesting possibilities.
>
> Regards
> Somik
> ----- Original Message -----
> From: "Craig Raw" <cr...@qu...>
> To: <htm...@li...>
> Sent: Monday, April 01, 2002 10:20 PM
> Subject: [Htmlparser-user] Swing integration
>
> > Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> > provide a better implementation of JEditorPane's HTML viewing
> > capabilities? HTML Parser would need to replace
> > javax.swing.text.html.parser.Parser, which is currently somewhat buggy.
> > Anyone tried this?
> >
> > -craig

Asgher Ali
e-mail: as...@lw...
---------------------------------------------
Lahore Wide Web
"The Intranet Company"
http://www.lww.org/ |
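For readers who have not used the parser that ships with the JDK, a minimal sketch of driving it directly is given below. It is not the example offered in the message above, which does not appear in this thread. It uses the standard `javax.swing.text.html.parser.ParserDelegator` and `HTMLEditorKit.ParserCallback` classes to collect the `href` of every anchor; the class name, method name and sample markup are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class JdkParserDemo {
    // Collect the href attribute of every <A> tag in the given markup.
    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) links.add(href.toString());
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<a href=\"http://example.com/one\">one</a>"
                + "<a href=\"http://example.com/two\">two</a>"
                + "</body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The callback style is event driven: the parser pushes tags at the callback, whereas the `reader.readElement()` calls quoted elsewhere in this archive have the caller pull nodes one at a time.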