htmlparser-developer Mailing List for HTML Parser

Brought to you by: derrickoswald

htmlparser-developer — The developer mailing list of the htmlparser project


[Htmlparser-developer] Re: [Htmlparser-user] Hints on how to change image tag locations and write out document

From: Raghavender S. <kin...@ho...> - 2002-04-29 23:37:03

Hi Somik,
I encountered a strange problem today. While I was running htmlparser, I
got a java.lang.OutOfMemoryError. It seems that a lot of objects are being
allocated. Where exactly is this happening? Could you give me an idea of
where, or in which file, the potential problem could be?
Raghav


>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] Hints on how to change image tag locations 
>and write out document
>Date: Sat, 27 Apr 2002 18:22:34 +0900
>
>Hi Annette,
>     Pls find attached a program to get you started. This program will do 
>what you want - you will need to modify the construct that checks for the 
>image tag - and replace it with the location of your choice.
>     Also - I found one bug thanks to this requirement - image tags params 
>were not being correctly put in. Though it needs a deeper look, I have done 
>a quick fix for now, and all test cases are passing (with one test case in 
>HTMLImageScannerTest trapping this bug).
>     Please check out the latest html parser source code from CVS.
>
>Regards,
>Somik
>
>   ----- Original Message -----
>   From: Doyle, Annette
>   To: htm...@li...
>   Sent: Friday, April 26, 2002 10:08 PM
>   Subject: [Htmlparser-user] Hints on how to change image tag locations 
>and write out document
>
>
>   Could you please give me some hints as how to change only image tag 
>locations and then, (or at the same time) write out the html document to 
>file (with new image tag locations)?
>
>
>
>   Thanks-
>
>   Annette Doyle
>
><< ImageTagRetriever.java >>





[Htmlparser-developer] Integration with Swing - Huge Pains

From: Somik R. <so...@ya...> - 2002-04-27 09:33:26

Hi Folks,
    I am getting a lot of pain integrating html parser with Swing. It
seems like Sun doesn't want folks to change their parser. I am trying to
come to terms with the fact that I need 72 if-thens, for all kinds of
tags. I had initially written an object framework to compare html parser
parsed objects with the swing parser objects, and it's a nightmare, because
even simple tags are not being picked up correctly by the latter.
    Meta tags don't seem to work, or tags with attributes have the
attributes not showing up.
    I think it's crazy for one person to do all of this, but if I can
have help - then I will put up this integration code, and maybe we'd be
able to get this done in a month (??)
    I guess this would be kind of prestigious if it gets finished - so
developers - pls let me know who volunteers to help in this enterprise.
(It's not hard really, but lots to be done)

Cheers,
Somik
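
A minimal sketch related to the "72 if-thens" concern above, offered as one possible direction and not as what the project did. It assumes the tag name is available as a plain string and uses the JDK's HTML.getTag lookup to find the matching Swing HTML.Tag constant, falling back to HTML.UnknownTag. The class and method names (SwingTagLookupDemo, toSwingTag, reportStartTag) are hypothetical, and the thread does not confirm whether a lookup like this would have covered every case Somik ran into.

```java
import java.util.Locale;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: maps a tag name to a Swing HTML.Tag without an if-chain.
final class SwingTagLookupDemo {

    // HTML.getTag expects the lower-case tag name and returns null for
    // names it does not know, so unknown names become HTML.UnknownTag.
    static HTML.Tag toSwingTag(String name) {
        HTML.Tag known = HTML.getTag(name.toLowerCase(Locale.ROOT));
        return known != null ? known : new HTML.UnknownTag(name);
    }

    // Forwards a start tag (with no attributes, for brevity) to a Swing callback.
    static void reportStartTag(String name, int pos, HTMLEditorKit.ParserCallback callback) {
        callback.handleStartTag(toSwingTag(name), new SimpleAttributeSet(), pos);
    }

    public static void main(String[] args) {
        System.out.println(toSwingTag("IMG") == HTML.Tag.IMG);
        System.out.println(toSwingTag("madeuptag") instanceof HTML.UnknownTag);
    }
}
```

Attributes would still have to be copied into the attribute set separately; the sketch only covers the name-to-tag step.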

[Htmlparser-developer] Re: [Htmlparser-user] Not all image tags are returned [Not a Bug]

From: Somik R. <so...@ya...> - 2002-04-26 03:43:51

Hi Annette,
    I just figured out what is happening...

    Sorry for the previous mail - this is not a bug in the parser. You see -
the tags which weren't getting reported as image tags were sandwiched
between link tags <A HREF="..."><IMG ..></A>. Hence, in your application,
you will also need to watch out for link tags, and pick up the images from
within should there be any.

    Now - if this causes you additional headaches, then don't register all
the scanners, so the link scanner will not interfere, and you will only get
the image tags.

   In order to prove that this analysis is correct - I added one more test
case to HTMLImageScannerTest.java -

testImageTagsFromYahooWithAllScannersRegistered()

This test case extracts the link and checks that the image is found within.
The number of tags found is also verified. You can check out this code from
CVS; it might help you if you are interested in getting image tags out of
link tags.

Correspondingly, there is also testImageTagsFromYahoo() which passes (with
only html image scanner registered).

Let me know if you need further help.

Regards,
Somik
----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned


Is there any known problem about not all image tags being returned? I did
the following code:

    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode) e.nextElement();
        if (node instanceof HTMLImageTag)
        {
            System.out.println();
            System.out.println(((HTMLImageTag) node).getTagLine());
            System.out.println();

            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }

I was testing with another html parser and it found all the image tags.
Attached is the source from www.yahoo.com when I ran the code above.
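
A minimal sketch of the point made in the reply above: a loop that only checks top-level nodes misses images that sit inside link tags, so the link's contents need to be visited as well. The types here (DemoNode, DemoImage, DemoLink) are hypothetical stand-ins written for this illustration. They are not the HTMLParser classes, and how the real link tag exposes its nested nodes is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for parser nodes; not the HTMLParser API.
interface DemoNode {}

final class DemoImage implements DemoNode {
    final String location;
    DemoImage(String location) { this.location = location; }
}

final class DemoLink implements DemoNode {
    final List<DemoNode> children;
    DemoLink(List<DemoNode> children) { this.children = children; }
}

final class NestedImageDemo {

    // Collects image locations from top-level nodes and from inside links.
    static List<String> imageLocations(List<DemoNode> nodes) {
        List<String> found = new ArrayList<>();
        for (DemoNode node : nodes) {
            if (node instanceof DemoImage) {
                found.add(((DemoImage) node).location);
            } else if (node instanceof DemoLink) {
                // An image sandwiched between <A ...> and </A> lives here.
                found.addAll(imageLocations(((DemoLink) node).children));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<DemoNode> insideLink = new ArrayList<>();
        insideLink.add(new DemoImage("logo.gif"));
        List<DemoNode> page = new ArrayList<>();
        page.add(new DemoImage("banner.gif"));
        page.add(new DemoLink(insideLink));
        System.out.println(imageLocations(page));
    }
}
```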

[Htmlparser-developer] Re: [Htmlparser-user] Not all image tags are returned

From: Somik R. <so...@ya...> - 2002-04-26 03:28:17

Hi Annette,
    Thanks for the report. I wrote a functional testcase to do a raw
check of IMG tags, and one with the parser, and could reproduce the bug. I
don't think it's a problem with the image scanner code, because the unit
tests are passing with the same yahoo tags.
    Here's a quick solution for you: don't use registerScanners() for
now. Since your app specifically needs to check only image scanners,
replace the line:

parser.registerScanners();

with

parser.addScanner(new HTMLImageScanner("-i"));

    I checked that all the yahoo image tags come through fine with this
change. The functional test has been checked into CVS
(FunctionalTests.java), and the one with registerScanners() fails. The
corresponding unit test in HTMLImageScanner passes.

    Meanwhile, I am trying to find out which scanner is messing up..
    Thanks again for your report.

Cheers,
Somik


[Htmlparser-developer] Gordon's bug report

From: Somik R. <so...@ya...> - 2002-04-23 14:56:17

Hi Developers,
    What do you think of Gordon Deudney's bug report at
http://sourceforge.net/tracker/?group_id=24399&atid=381399
This is actually open for discussion.

Regards
Somik

[Htmlparser-developer] To do list

From: Somik R. <so...@ya...> - 2002-04-18 01:59:39

Hi Folks,
    To all the developers - here is a to-do list for the project. You
can pick any to get involved:

[1] Swing integration - Plug in htmlparser and demonstrate how it can be
used instead of the HTML parser that comes with the Sun JDK
[2] Set up a servlet which allows people to test the html parser
online. The idea is:
    (i) Enter your URL - and click Parse
    (ii) The parser is launched on the server, and produces all the
nodes (node.print()) on the display.
    (iii) If an exception gets thrown, then this url is saved into a
database(??)
    (iv) If no exception is thrown, but there is an error in the
parsing, then a report can be entered on the result page by the tester,
telling us why he thinks the output is incorrect.
    (v) We get notified every time there is a report, either of a crash
or a human-reported error

    The vision is to capitalize on distributed testing resources. Also,
everyone has a tendency to desire simple testing - without downloading
and wasting time going through manuals. I think we can get a lot of
feedback if we can harness the power of the web.

    To do this - some simple servlets will need to be written. And we
will need to find hosting, either at sourceforge or myservlets.com

[3] Create AWT components which can understand HTML formatting. Since
HTMLParser works with Java 1.1, no special download is required for it
to run in standard browsers.

[4] Have a report section on the htmlparser site, where people can
report and see how html parser is being used in various industry
projects.

Pls feel free to add to this list - especially if you have any
interesting insights or vision about where you see this project going.
Once we are done with some basic brainstorming, we could probably set
milestones for each of these tasks.

Cheers,
Somik
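
A rough sketch of the bookkeeping behind item [2] in the list above, leaving out the servlet and database parts entirely. It assumes the parser can be passed in as a function from a URL to the printed nodes. ParseTrialDemo, tryParse and reportWrongOutput are hypothetical names invented for this illustration, and the in-memory lists stand in for whatever storage the project might have chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the online test flow: parse, record crashes, take reports.
final class ParseTrialDemo {
    final List<String> crashedUrls = new ArrayList<>();   // step (iii)
    final List<String> humanReports = new ArrayList<>();  // step (iv)

    // Steps (i)-(iii): run the parser on a URL; if it throws, remember the URL
    // and return an empty result instead of propagating the failure.
    List<String> tryParse(String url, Function<String, List<String>> parser) {
        try {
            return parser.apply(url);
        } catch (RuntimeException e) {
            crashedUrls.add(url);
            return new ArrayList<>();
        }
    }

    // Step (iv): a tester explains why output that did not crash looks wrong.
    void reportWrongOutput(String url, String reason) {
        humanReports.add(url + ": " + reason);
    }

    public static void main(String[] args) {
        ParseTrialDemo trial = new ParseTrialDemo();
        trial.tryParse("http://example.com/broken", u -> {
            throw new IllegalStateException("parser crashed");
        });
        trial.reportWrongOutput("http://example.com/", "an image tag is missing");
        System.out.println(trial.crashedUrls);
        System.out.println(trial.humanReports);
    }
}
```

Step (v), the notification, would hang off the two recording methods; it is omitted here because the thread does not say how notifications were meant to be delivered.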

[Htmlparser-developer] HTMLParser 1.1 released

From: Somik R. <so...@ya...> - 2002-04-17 03:40:48

Hi Folks,
    HTMLParser 1.1 has just been released. This is a production release
- HTMLParser finally moves out of the beta stage.
    A whole lot of bug fixes, architecture modifications, and intense
testing have been done.
    You can get it from http://htmlparser.sourceforge.net
    Thanks are due to a whole lot of people who helped with bug reports
and suggestions for this release:

[1] Sam Joseph
[2] Raj Sharma
[3] Raghavender Srimantula

Regards,
Somik

Re: [Htmlparser-developer] RE: Swing integration

From: Somik R. <so...@ya...> - 2002-04-17 02:31:05

> Due to time constraints, I've decided to use the HTML parser in Swing
> for the time being, but I'd definitely like to see the effect of a
> better parser in Swing. Just try a search for 'JEditorPane' in the Bug
> Parade and you'll see how long Sun has had issues with this area...

Yes, I know the parser from Sun is not good.

> I think your idea of trying the integration after 1.1 release is good.
Ok - 1.1 should be out really soon. I am done with an ant script for
building (phew!), and am giving the final touches to the code. We can expect
a release this week.

Regards,
Somik


[Htmlparser-developer] RE: [Htmlparser-user] Swing integration

From: Craig R. <cr...@qu...> - 2002-04-16 09:19:24

Due to time constraints, I've decided to use the HTML parser in Swing
for the time being, but I'd definitely like to see the effect of a
better parser in Swing. Just try a search for 'JEditorPane' in the Bug
Parade and you'll see how long Sun has had issues with this area...

I think your idea of trying the integration after 1.1 release is good.

-craig



[Htmlparser-developer] Re: [Htmlparser-user] Swing integration

From: Somik R. <so...@ya...> - 2002-04-16 02:59:53

Hi Craig, Asgher
    I finally had the time to check Swing integration. Boy - the parser
design in Swing sucks!! Theoretically it's possible to do it - and I got
started, but just realized that in order to be compatible with Swing objects
that do compile-time type checking with a particular tag, I have to actually
have 73 if statements to give the right tag to the callback.
    I have more important things to do at the moment, but probably will get
back to this donkey work. *sigh*

    I am thinking we should make release 1.1 and then try this. Any
suggestions ?

Regards,
Somik
----- Original Message -----
From: "Somik Raha" <so...@ya...>
To: <htm...@li...>
Sent: Thursday, April 04, 2002 11:20 AM
Subject: Re: [Htmlparser-user] Swing integration


> Hi Craig,
>     Thanks a lot for the post. Pls go ahead with your analysis. I will try
> to catch up this weekend.
> Regards,
> Somik
> ----- Original Message -----
> From: "Craig Raw" <cr...@qu...>
> To: "'Somik Raha'" <so...@ya...>
> Sent: Tuesday, April 02, 2002 3:32 PM
> Subject: RE: [Htmlparser-user] Swing integration
>
>
> > Hi Somik,
> >
> > A quick excerpt from javax.swing.text.html.HTMLEditorKit javadoc - which
> > is the driver behind JEditorPane's reading and writing HTML
> > capabilities.
> >
> > ---
> > Extendable/Scalable
> >
> > To maximize the usefulness of this kit, a great deal of effort has gone
> > into making it extendable. These are some of the features.
> > The parser is replaceable. The default parser is the Hot Java parser
> > which is DTD based. A different DTD can be used, or an entirely
> > different parser can be used. To change the parser, reimplement the
> > getParser method. The default parser is dynamically loaded when first
> > asked for, so the class files will never be loaded if an alternative
> > parser is used. The default parser is in a separate package called
> > parser below this package.
> >
> > The parser drives the ParserCallback, which is provided by HTMLDocument.
> > To change the callback, subclass HTMLDocument and reimplement the
> > createDefaultDocument method to return document that produces a
> > different reader. The reader controls how the document is structured.
> > Although the Document provides HTML support by default, there is nothing
> > preventing support of non-HTML tags that result in alternative element
> > structures.
> > ---
> >
> > I may find some time to look into this as well, although I am not sure
> > how much it would fix JEditorPane's somewhat buggy HTML rendering
> > capabilities....
> >
> > -craig
> >
> >
> > -----Original Message-----
> > From: htm...@li...
> > [mailto:htm...@li...] On Behalf Of Somik Raha
> > Sent: 01 April 2002 05:28 PM
> > To: HTMLParser User List
> > Cc: HTMLParser Developer List
> > Subject: Re: [Htmlparser-user] Swing integration
> >
> > Hi Craig
> >     Wow! That's a great question.
> >     Actually, I doubt if I could replace Sun Microsystems' code with
> > mine. I don't think Java is that open (or is it ?)
> > However, we could think of writing our own adapter for the html parser
> > that might plug in in some way...
> >      I have never used Sun's html parser (If I had, I might not have
> > started this project).
> >      I will need to study Sun's parser before I can answer your
> > question..
> >     But there do seem to be some interesting possibilities.
> >
> > Regards
> > Somik
> > ----- Original Message -----
> > From: "Craig Raw" <cr...@qu...>
> > To: <htm...@li...>
> > Sent: Monday, April 01, 2002 10:20 PM
> > Subject: [Htmlparser-user] Swing integration
> >
> >
> > > Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> > > provide a better implementation of JEditorPane's HTML viewing
> > > capabilities? HTML Parser would need to replace
> > > javax.swing.text.html.parser.Parser, which is currently somewhat buggy.
> > > Anyone tried this?
> > >
> > > -craig
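
The HTMLEditorKit javadoc quoted in this thread says that the parser can be changed by reimplementing getParser. A minimal sketch of that hook follows, using only JDK classes. PlainTextParser and CustomParserKit are hypothetical names, and the toy parser simply reports the whole input as one text run; it does not stand for HTML Parser, and wiring a real parser in would mean translating its nodes into the full set of ParserCallback calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical toy parser: hands the entire input to the callback as text.
final class PlainTextParser extends HTMLEditorKit.Parser {
    @Override
    public void parse(Reader in, HTMLEditorKit.ParserCallback callback, boolean ignoreCharSet)
            throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }
        callback.handleText(text.toString().toCharArray(), 0);
    }
}

// The kit returns the replacement parser in place of the default one.
class CustomParserKit extends HTMLEditorKit {
    @Override
    protected HTMLEditorKit.Parser getParser() {
        return new PlainTextParser();
    }
}

final class CustomParserDemo {
    // Runs the toy parser directly and returns the text the callback received.
    static String parseToText(String input) {
        final StringBuilder seen = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                seen.append(data);
            }
        };
        try {
            new PlainTextParser().parse(new StringReader(input), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseToText("hello from a replacement parser"));
    }
}
```

A kit like this would normally be handed to a JEditorPane through setEditorKit, which is left out here so the sketch runs without any GUI.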

[Htmlparser-developer] Major bug fix - spaces in links and images are now handled

From: Somik R. <so...@ya...> - 2002-04-15 05:14:41

Hi Folks,
    Thanks to Sam Joseph (creator of Neurogrid). Sam is using the parser
in the neurogrid project, and has pointed out a bug that slipped our
attention. If links or image urls contain spaces, those spaces were
being absorbed. That is incorrect behaviour, especially if you have
something of the form:

http://myservlet.com/someservlet?name=Sam Joseph&age=22

The same goes for images like
http://www.kizna.com/images/kizna corp.jpg

Also - previously, newline characters were being converted to spaces.
This has been modified - newline characters are left as is. The
responsibility to deal with them is now with the appropriate scanner.
So, the link and image scanners specifically filter out the newline
characters, whereas jsp tags, which might have jsp code, would like to
preserve the newline chars.
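
A minimal sketch of the behaviour described above for link and image URLs: newline characters are filtered out while spaces are kept. UrlWhitespaceDemo and stripNewlines are hypothetical names written for this illustration; this is not the code checked into CVS.

```java
// Hypothetical helper illustrating "drop newlines, keep spaces" for URLs.
final class UrlWhitespaceDemo {

    // Removes carriage returns and line feeds but leaves every other
    // character, including spaces, untouched.
    static String stripNewlines(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\n' && c != '\r') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A URL with a space in the file name, broken across two lines.
        System.out.println(stripNewlines("images/kizna\r\n corp.jpg"));
    }
}
```

A scanner that wants to preserve line breaks, as the jsp case above does, would simply skip this step.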

Over 73 testcases now in the htmlparser, and all passing..
I think we're ready for release 1.1 now, unless I get any more bug
reports this week.

You can check out the latest code from CVS.

Regards,
Somik

Re: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1

From: Raghavender S. <kin...@ho...> - 2002-04-12 03:38:39

Thanks somik. I will work on it.
Raghav


>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>, "Raghavender Srimantula" 
><kin...@ho...>
>Subject: Re: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
>Date: Fri, 12 Apr 2002 11:57:50 +0900
>
>Hi Raghav
>     You are right. That is indeed a bug. I have written a test case for it,
>captured it, and fixed it.
>     Code is checked into CVS - it should work for you now.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Raghavender Srimantula" <kin...@ho...>
>To: <so...@ya...>; <htm...@li...>
>Sent: Friday, April 12, 2002 6:12 AM
>Subject: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1
>
>
> > hi Somik,
> > the code snippet you mailed me seems to have some problems.
> > let me explain. the method
> > isXMLTagFound(node,"OPTION")
> > would always return false. the reason: in the definition of the above
> > method we have
> >
> > if (node instanceof HTMLTag) {
> >     System.out.println("node instanceof HTMLTag in tagscanner  ");
> >     HTMLTag tag = (HTMLTag)node;
> >     if (tag.getText().equals(tagName)) {
> >         xmlTagFound=true;
> >     }
> > }
> >
> > tag.getText() would always give me
> > OPTION value="#">Select a destination
> >
> > which is not equal to the tagName, in this case the tagName=OPTION.
> >
> > Raghav
> >
> >
> > >From: "Somik Raha" <so...@ya...>
> > >To: "Raghavender Srimantula" <kin...@ho...>,
> > ><htm...@li...>
> > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > >Date: Thu, 11 Apr 2002 11:14:51 +0900
> > >
> > >Hi Raghav
> > >     I replied to your earlier query. Did you recieve the mail (I
>forwarded
> > >it again) ?
> > >     Regarding your current query, there are two ways to handle option
> > >tags.
> > >
> > >[1] Like in the previous question, you will have to recognize a HTMLTag
> > >(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
> > >[2] To make life easier, since this tag is basic xml, you can use a
>special
> > >XML parsing method provided in the superclass HTMLTagScanner.
> > >
> > >The methods are :
> > >(i) isXMLTagFound
> > >(ii) extractXMLData
> > >
> > >both of them are static mehods.
> > >You would use it like this :
> > >
> > >HTMLNode node = reader.readElement();
> > >if (isXMLTag(node,"OPTION")) {
> > >     String option = extractXMLData(node,"OPTION",reader);
> > >     // The string now contains the data within the option xml tag
> > >     // So given an input : <OPTION value="#">Select a
>destination</OPTION>
> > >     // option will hold "Select a destination"
> > >}
> > >
> > >But getting the value from the option tag itself would need to be 
>handled
> > >seperately.
> > >
> > >Regards,
> > >Somik
> > >----- Original Message -----
> > >From: "Raghavender Srimantula" <kin...@ho...>
> > >To: <so...@ya...>; <htm...@li...>
> > >Sent: Thursday, April 11, 2002 9:22 AM
> > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > >
> > >
> > > > hi Somik,
> > > > any ideas about my previous mail. let us say if we have
> > > > <OPTION value="#">Select a destination</OPTION>
> > > > when I do a
> > > > node = reader.readElement();
> > > > where "reader" is HTMLReader
> > > > the node I get is of type neither HTMLStringNode, HTMLEndTag,
> > > > HTMLRemarkNode.
> > > > how do I classify this if I want to do some thing with them.
> > > > Raghav
> > > >
> > > > >From: "Somik Raha" <so...@ya...>
> > > > >To: "Raghavender Srimantula" <kin...@ho...>
> > > > >CC: <htm...@li...>
> > > > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > > > >Date: Mon, 8 Apr 2002 13:04:07 +0900
> > > > >
> > > > >Hi Raghav
> > > > > > when would be this HTMLparser 1.1 out?
> > > > >As soon as I can wrap it up. Technically, the code is ready and
>already
> > > > >checked into CVS. I need to do the process of creating a release -
>make
> > > > >some
> > > > >documentation, check everything is ok, ..
> > > > >If I had some help I could wrap it up sooner.
> > > > >
> > > > > > I am not sure, but to me the way htmlparser parses is it gives 
>me
> > >the
> > > > >tag
> > > > > > parameter of the first line in the above snippet of html code,
>when
> > >I
> > >do
> > > > > > Hashtable table = tag.parseParameters();
> > > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > > .....</FORM>
> > > > >
> > > > >Yes - parseParameters() will give you the stuff inside the FORM 
>tag.
> > >That
> > > > >is
> > > > >what I call "microscopic" parsing. But to get the remaining tags -
>till
> > >you
> > > > >encounter </FORM> you need to do "macroscopic" parsing. This is not
> > >hard-
> > > > >check HTMLAppletScanner as an example.
> > > > >
> > > > >In a nutshell - concept is very simple. The scan method provides 
>you
> > >with
> > >a
> > > > >reader. So you are to use that reader to read ahead and get the 
>next
> > >tags.
> > > > >This is simple bcos the reader will automatically identify the
>correct
> > > > >tags,
> > > > >and the mechanism is very similar to using the parser to get the 
>tags
> > >you
> > > > >want. The HTMLLinkScanner among others, also works on the same
> > >principle.
> > > > >
> > > > >Bytway - I think we should take this discussion to the Developer
>list.
> > > > >
> > > > >Regards,
> > > > >Somik
> > > > >----- Original Message -----
> > > > >From: "Raghavender Srimantula" <kin...@ho...>
> > > > >To: <htm...@li...>
> > > > >Sent: Monday, April 08, 2002 6:39 AM
> > > > >Subject: [Htmlparser-user] HTML parser 1.1
> > > > >
> > > > >
> > > > > > Hi Somik,
> > > > > > when would be this HTMLparser 1.1 out?
> > > > > > one more question. to parse the FORM tags, I have a small
>question.
> > > > > > let us say this is a form tag
> > > > > >
> > > > > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > > > > <P>User name:
> > > > > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > > > > <P>Password:
> > > > > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > > > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > > > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > > > > </FORM>
> > > > > >
> > > > > > I am not sure, but to me the way htmlparser parses is it gives 
>me
> > >the
> > > > >tag
> > > > > > parameter of the first line in the above snippet of html code,
>when
> > >I
> > >do
> > > > > > Hashtable table = tag.parseParameters();
> > > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > > .....</FORM>
> > > > > >
> > > > > > could you suggest me how to go ahead with this.
> > > > > > Raghav
> > > > > >
> > > > > >
> > > > > > to extract the INPUT tag parameters
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > 
>_________________________________________________________________
> > > > > > MSN Photos is the easiest way to share and print your photos:
> > > > > > http://photos.msn.com/support/worldwide.aspx
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Htmlparser-user mailing list
> > > > > > Htm...@li...
> > > > > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > > > >
> > > > >
> > > > >_________________________________________________________
> > > > >Do You Yahoo!?
> > > > >Get your free @yahoo.com address at http://mail.yahoo.com
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > > _________________________________________________________________
> > > > Get your FREE download of MSN Explorer at
> > >http://explorer.msn.com/intl.asp.
> > >
> > >
> > >_________________________________________________________
> > >Do You Yahoo!?
> > >Get your free @yahoo.com address at http://mail.yahoo.com
> > >
> >
> >
> >
> >
> > _________________________________________________________________
> > Chat with friends online, try MSN Messenger: http://messenger.msn.com
> >
> >
> > _______________________________________________
> > Htmlparser-developer mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
>
>
>_________________________________________________________
>Do You Yahoo!?
>Get your free @yahoo.com address at http://mail.yahoo.com
>




_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.

Re: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1

From: Somik R. <so...@ya...> - 2002-04-12 03:00:50

Hi Raghav
    You are right. That is indeed a bug. I have written a test case for it,
captured it, and fixed it.
    Code is checked into CVS - it should work for you now.
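
For the archive, a minimal sketch of why the comparison failed and one possible way around it. This is not the code that was checked into CVS. The class TagNameCheck and the helpers tagNameOf and isTagNamed are hypothetical, and the tag text is passed in as a plain String so that the example runs without htmlparser on the classpath.

```java
public class TagNameCheck {
    /**
     * Hypothetical helper (not the htmlparser API): returns the first
     * whitespace-delimited token of a tag's text, i.e. the tag name.
     */
    public static String tagNameOf(String tagText) {
        String trimmed = tagText.trim();
        int end = 0;
        while (end < trimmed.length()
                && !Character.isWhitespace(trimmed.charAt(end))) {
            end++;
        }
        return trimmed.substring(0, end);
    }

    /** Compares only the tag name, ignoring attributes and letter case. */
    public static boolean isTagNamed(String tagText, String tagName) {
        return tagNameOf(tagText).equalsIgnoreCase(tagName);
    }

    public static void main(String[] args) {
        // The text Raghav reported getting back from tag.getText().
        String text = "OPTION value=\"#\">Select a destination";
        // Whole-text comparison, as in the reported code: prints false.
        System.out.println(text.equals("OPTION"));
        // Name-only comparison: prints true.
        System.out.println(isTagNamed(text, "OPTION"));
    }
}
```

The point is only that the tag text carries the attributes along with the name, so an equals() on the whole text can never match a bare tag name.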

Regards,
Somik
----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <so...@ya...>; <htm...@li...>
Sent: Friday, April 12, 2002 6:12 AM
Subject: [Htmlparser-developer] Re: [Htmlparser-user] HTML parser 1.1


> hi Somik,
> the code snippet you mailed me seems to have a problem.
> let me explain. the method
> isXMLTagFound(node,"OPTION")
> would always return false. the reason: in the definition of the above
> method we have
>
> if (node instanceof HTMLTag) {
>     System.out.println("node instanceof HTMLTag in tagscanner  ");
>     HTMLTag tag = (HTMLTag)node;
>     if (tag.getText().equals(tagName)) {
>         xmlTagFound=true;
>     }
> }
>
> tag.getText() would always give me
> OPTION value="#">Select a destination
>
> which is not equal to the tagName, in this case the tagName=OPTION.
>
> Raghav
>
>
> >From: "Somik Raha" <so...@ya...>
> >To: "Raghavender Srimantula" <kin...@ho...>,
> ><htm...@li...>
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >Date: Thu, 11 Apr 2002 11:14:51 +0900
> >
> >Hi Raghav
> >     I replied to your earlier query. Did you receive the mail (I
> >forwarded it again)?
> >     Regarding your current query, there are two ways to handle option
> >tags.
> >
> >[1] Like in the previous question, you will have to recognize an HTMLTag
> >(begin tag), followed by HTMLStringNode, and finally HTMLEndTag.
> >[2] To make life easier, since this tag is basic XML, you can use a
> >special XML parsing method provided in the superclass HTMLTagScanner.
> >
> >The methods are:
> >(i) isXMLTagFound
> >(ii) extractXMLData
> >
> >Both of them are static methods.
> >You would use them like this:
> >
> >HTMLNode node = reader.readElement();
> >if (isXMLTagFound(node,"OPTION")) {
> >     String option = extractXMLData(node,"OPTION",reader);
> >     // The string now contains the data within the option xml tag
> >     // So given an input : <OPTION value="#">Select a destination</OPTION>
> >     // option will hold "Select a destination"
> >}
> >
> >But getting the value from the option tag itself would need to be handled
> >separately.
> >
> >Regards,
> >Somik
> >----- Original Message -----
> >From: "Raghavender Srimantula" <kin...@ho...>
> >To: <so...@ya...>; <htm...@li...>
> >Sent: Thursday, April 11, 2002 9:22 AM
> >Subject: Re: [Htmlparser-user] HTML parser 1.1
> >
> >
> > > hi Somik,
> > > any ideas about my previous mail. let us say if we have
> > > <OPTION value="#">Select a destination</OPTION>
> > > when I do a
> > > node = reader.readElement();
> > > where "reader" is HTMLReader
> > > the node I get is none of HTMLStringNode, HTMLEndTag, or
> > > HTMLRemarkNode.
> > > how do I classify it if I want to do something with it.
> > > Raghav
> > >
> > > >From: "Somik Raha" <so...@ya...>
> > > >To: "Raghavender Srimantula" <kin...@ho...>
> > > >CC: <htm...@li...>
> > > >Subject: Re: [Htmlparser-user] HTML parser 1.1
> > > >Date: Mon, 8 Apr 2002 13:04:07 +0900
> > > >
> > > >Hi Raghav
> > > > > when would be this HTMLparser 1.1 out?
> > > >As soon as I can wrap it up. Technically, the code is ready and
> > > >already checked into CVS. I need to go through the process of
> > > >creating a release - write some documentation, check everything is
> > > >ok, ..
> > > >If I had some help I could wrap it up sooner.
> > > >
> > > > > I am not sure, but to me the way htmlparser parses is it gives me
> > > > > the tag parameter of the first line in the above snippet of html
> > > > > code, when I do
> > > > > Hashtable table = tag.parseParameters();
> > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > .....</FORM>
> > > >
> > > >Yes - parseParameters() will give you the stuff inside the FORM tag.
> > > >That is what I call "microscopic" parsing. But to get the remaining
> > > >tags - till you encounter </FORM> - you need to do "macroscopic"
> > > >parsing. This is not hard - check HTMLAppletScanner as an example.
> > > >
> > > >In a nutshell - the concept is very simple. The scan method provides
> > > >you with a reader, so you use that reader to read ahead and get the
> > > >next tags. This is simple because the reader will automatically
> > > >identify the correct tags, and the mechanism is very similar to using
> > > >the parser to get the tags you want. The HTMLLinkScanner, among
> > > >others, also works on the same principle.
> > > >
> > > >By the way - I think we should take this discussion to the Developer
> > > >list.
> > > >
> > > >Regards,
> > > >Somik
> > > >----- Original Message -----
> > > >From: "Raghavender Srimantula" <kin...@ho...>
> > > >To: <htm...@li...>
> > > >Sent: Monday, April 08, 2002 6:39 AM
> > > >Subject: [Htmlparser-user] HTML parser 1.1
> > > >
> > > >
> > > > > Hi Somik,
> > > > > when would be this HTMLparser 1.1 out?
> > > > > one more question. to parse the FORM tags, I have a small question.
> > > > > let us say this is a form tag
> > > > >
> > > > > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > > > > <P>User name:
> > > > > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > > > > <P>Password:
> > > > > <INPUT TYPE="password" NAME="password" SIZE="12">
> > > > > <P><INPUT TYPE="submit" VALUE="Log in">
> > > > > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > > > > </FORM>
> > > > >
> > > > > I am not sure, but to me the way htmlparser parses is it gives me
> > > > > the tag parameter of the first line in the above snippet of html
> > > > > code, when I do
> > > > > Hashtable table = tag.parseParameters();
> > > > > it is looking for parameters inside <FORM ..... >, but not <FORM
> > > > > .....</FORM>
> > > > >
> > > > > could you suggest me how to go ahead with this.
> > > > > Raghav
> > > > >
> > > > >
> > > > > to extract the INPUT tag parameters
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > _________________________________________________________________
> > > > > MSN Photos is the easiest way to share and print your photos:
> > > > > http://photos.msn.com/support/worldwide.aspx
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Htmlparser-user mailing list
> > > > > Htm...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > > >
> > > >
> > > >_________________________________________________________
> > > >Do You Yahoo!?
> > > >Get your free @yahoo.com address at http://mail.yahoo.com
> > > >
> > >
> > >
> > >
> > >
> > > _________________________________________________________________
> > > Get your FREE download of MSN Explorer at
> >http://explorer.msn.com/intl.asp.
> >
> >
> >_________________________________________________________
> >Do You Yahoo!?
> >Get your free @yahoo.com address at http://mail.yahoo.com
> >
>
>
>
>
> _________________________________________________________________
> Chat with friends online, try MSN Messenger: http://messenger.msn.com
>
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com