htmlparser-developer Mailing List for HTML Parser
Brought to you by:
derrickoswald
From: Derrick O. <Der...@ro...> - 2003-08-30 23:35:16
Chris,

Maybe you misconstrued the open source paradigm; it's only slightly organized anarchy. You can do whatever you want, with or without my, or anyone else's, permission. It's not my code, I only rent.

BTW, you don't lose the power of inheritance, you only constrain it to an interface-driven methodology.

Derrick

Christopher Bird wrote:
> Great, thanks.
>
> Yes, I had thought that a factory mechanism would be a good way as well
> - almost a decorator pattern at that point, I think. Actually, that
> whole idea suggests a development paradigm for OO projects in general.
> Of course you lose the power of inheritance (and the native engine
> performance opportunities), but you gain a great deal of flexibility.
>
> I would love to hear some consensus (or at least informed opinions) on
> this.
>
> I also plan (with your permission) to write to the IEEE Software
> Engineering magazine (I am an IEEE member) and ask for opinions there
> too. I would like your permission because I would like to reference
> this concrete example. I would be glad to submit the letter to you
> before sending it - I am not interested in making waves, but am always
> interested in finding ways to make our profession better. Since the
> HTMLParser is such a well executed piece of software, it strikes me
> that it would make a good example for the letter.
>
> Regards
>
> Chris
>
>> From: Derrick Oswald <Der...@ro...>
>> To: sea...@ho...
>> CC: htm...@li...
>> Subject: Re: Adding methods to Tag
>> Date: Fri, 29 Aug 2003 22:20:09 -0400
>>
>> (...)
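Chris compares the factory approach to a decorator, and Derrick points out that inheritance is not lost but constrained to interfaces. A hand-written decorator makes that trade-off concrete. This is a minimal sketch under stated assumptions, not HTML Parser code: IBaseTag, ICompositeTag, IFormTag and IColorSupport are the interface names proposed in this thread (they are not library types), while BasicFormTag, ColorFormTag, getChildren() and getFormMethod() are hypothetical. The color and bgcolor tag lists are illustrative, taken from HTML 4.01.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DecoratorSketch {

    // Hypothetical interfaces named after the proposal in this thread.
    public interface IBaseTag {
        String getTagName();
    }

    public interface ICompositeTag extends IBaseTag {
        List<IBaseTag> getChildren();
    }

    public interface IFormTag extends ICompositeTag {
        String getFormMethod();
    }

    public interface IColorSupport {
        boolean supportsColor();
        boolean supportsBGColor();
    }

    // Stand-in for the library's own form tag; the decorator never modifies it.
    public static class BasicFormTag implements IFormTag {
        public String getTagName() { return "FORM"; }
        public List<IBaseTag> getChildren() { return Collections.emptyList(); }
        public String getFormMethod() { return "POST"; }
    }

    // Decorator: same IFormTag contract plus IColorSupport, with no change to BasicFormTag.
    public static class ColorFormTag implements IFormTag, IColorSupport {
        // Illustrative HTML 4.01 lists: color on FONT/BASEFONT, bgcolor on BODY and table elements.
        private static final Set<String> COLOR_TAGS =
                new HashSet<>(Arrays.asList("FONT", "BASEFONT"));
        private static final Set<String> BGCOLOR_TAGS =
                new HashSet<>(Arrays.asList("BODY", "TABLE", "TR", "TD", "TH"));

        private final IFormTag inner;

        public ColorFormTag(IFormTag inner) { this.inner = inner; }

        // IFormTag (and inherited) methods: forward to the wrapped tag.
        public String getTagName() { return inner.getTagName(); }
        public List<IBaseTag> getChildren() { return inner.getChildren(); }
        public String getFormMethod() { return inner.getFormMethod(); }

        // IColorSupport methods: answered here, from the tag name.
        public boolean supportsColor() { return COLOR_TAGS.contains(getTagName()); }
        public boolean supportsBGColor() { return BGCOLOR_TAGS.contains(getTagName()); }
    }

    public static void main(String[] args) {
        IFormTag tag = new ColorFormTag(new BasicFormTag());
        System.out.println("method: " + tag.getFormMethod());
        if (tag instanceof IColorSupport) {
            System.out.println("supportsColor: " + ((IColorSupport) tag).supportsColor());
        }
    }
}
```

The wrapper still satisfies every IFormTag contract, so callers need not know it is there. The cost is one forwarding class per tag interface, which is the boilerplate Derrick's dynamic-proxy suggestion in this thread is meant to remove.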
From: Joshua K. <jo...@in...> - 2003-08-30 23:28:52
Derrick,

Thanks for sharing this email. I have some opinions on this, but can't email them at the moment. Will do soon...

regards
jk

Derrick Oswald wrote:
> Chris,
>
> I'm opening this up to a wider audience, because it may have been solved
> before, or might be of interest to others with the same problem.
>
> (...)
From: Derrick O. <Der...@ro...> - 2003-08-30 02:20:43
Chris,

I'm opening this up to a wider audience, because it may have been solved before, or might be of interest to others with the same problem.

The basic problem is how to add functionality like supportsColor() to base classes, like Tag, without recompiling the whole class hierarchy.

One way would be to join the htmlparser project as a developer and just add it, if it's germane to others besides yourself. If it's not, then a bolt-on is needed.

One way to handle this problem is a 'Factory' mechanism. A 'deep-in-the-bowels' class would ask the 'factory' for a tag, i.e. factory.makeTag ("Form"). So you would wedge your own factory in there. Choosing the factory is usually done with a Class.forName() where the string specifying the class comes from a configuration setting. With some design effort, we should be able to come up with a definition for a factory class and a suitable set of interfaces which the whole project would be refactored to use. For example, the IFormTag interface extends the ICompositeTag interface and adds form-related methods; the ICompositeTag interface extends the IBaseTag interface and adds child accessors; and nothing references FormTag directly except the factory.

So then there is the problem of your factory supplying your special tag that implements IFormTag *and* IColorSupport when makeTag ("Form") is called. Most of what you need is already written in FormTag; you just need to add a couple of methods. I think this is where dynamic proxies come in: http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html. The InvocationHandler would determine if the target method comes from IColorSupport, and if so perform the needful directly. Otherwise it would delegate to the wrapped tag object. This means the whole htmlparser project shuttles wrapped tag objects around and doesn't know it, until they bubble up to your code where you cast them to an IColorSupport and invoke the supportsColor() method:

    // proposed hook: select the tag factory by class name
    Parser.setTagFactory ("ChrisBirdFactory");
    Parser parser = new Parser (url);
    parser.registerScanners ();
    for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
    {
        Node node = e.nextNode ();
        // I presume all tags, but not nodes, support IColorSupport
        if (node instanceof Tag)
            ((IColorSupport)node).supportsColor ();
    }

Derrick

Christopher Bird wrote:
> Thanks for the reply, that is my dilemma.
>
> I am an old Smalltalk programmer from years gone by and have always
> used what is sometimes called responsibility-driven development. So
> (at least in my head), the responsibility for knowing that a tag
> supports color or BG color is the tag's responsibility, and not the
> responsibility of some agent acting on the tag.
>
> The trouble with that style of development, especially for "packaged"
> software, is that you (I) find yourself (myself) in a bind like this one.
>
> Indeed I had to recompile the whole package! But since I have the
> source in my project (to help me learn the intricacies of certain
> behaviors - especially the creation of handlers for very complex tags)
> that was no big deal for me. However, now I am in violation of protocol
> for OpenSource, I am sure.
>
> This really gets to the crux of OOness and Open Source development.
> When there are requirements for classes high in the inheritance
> hierarchy and they do "rightfully" belong there, how does one get them
> there - short term to overcome a specific issue, and long term as part
> of the overall release cycle of the product?
>
> I am probably not the first to wonder this!
>
> BTW, I love the implementation. It took some mind-bending to get used
> to it at first - again, separating the responsibilities out so I can
> factor my solutions properly was initially a challenge, but I have
> become very productive.
>
> Thank you so much for an excellent piece of technology.
>
> Regards
>
> Chris
>
>> From: Derrick Oswald <Der...@ro...>
>> To: Christopher Bird <se...@us...>
>> Subject: Re: Adding methods to Tag
>> Date: Fri, 29 Aug 2003 07:41:08 -0400
>>
>> Chris,
>>
>> It's unclear how your ChrisTag method would work without recompiling
>> the whole package. The Tag class extends AbstractNode, so presumably
>> ChrisTag would extend AbstractNode and add the methods you want, then
>> Tag would extend ChrisTag. You would still need to 'fix' each new
>> release by doctoring Tag.
>>
>> The best way is probably to have a class external to everything with
>> the static methods needed (see Tag.breaksFlow() for example code):
>>
>>     class ColorKnowledge {
>>         public static boolean supportsColor (Node node)
>>         {
>>             return (listofNodesSupportingForegroundColor.contains (node.getText ().toUpperCase ()));
>>         }
>>         ...
>>     }
>>
>> If it's generic enough, submit it and we'll add it to Node.
>>
>> Derrick
>>
>> Christopher Bird wrote:
>>
>>> I am new to using OpenSource code. I have found it very helpful, and
>>> am using the HTMLParser for a number of purposes.
>>>
>>> I want to add some code to Tag, especially the following two methods:
>>>
>>>     public boolean supportsColor ()
>>>     /* Returns true iff the color attribute is valid for this tag */
>>>
>>>     public boolean supportsBGColor ()
>>>     /* Returns true iff the bgColor attribute is valid for this tag */
>>>
>>> I would be happy if you guys were to add that, but failing that, what
>>> is the process if I have to do it myself? There may be a bunch of
>>> other things that I will want to add to Tag, for handling some of my
>>> own custom behaviors.
>>>
>>> I can see a couple of ways of doing this (none pretty). One is to
>>> create a new ChrisTag superclass and change Tag's implements clause to
>>> implements ChrisTag. I can then define my methods there. Of course the
>>> community doesn't get the benefit (? dubious in some cases, I fear) of
>>> my additions.
>>>
>>> The other obvious way is simply to add the methods to Tag itself. I am
>>> not wild about doing that either because as I download new editions of
>>> HTMLParser, my changes get lost, especially since I am a solo
>>> practitioner at the moment and am not using a source code management
>>> system.
>>>
>>> Any assistance would be greatly appreciated, both with the short-term
>>> (immediate) problem and with the general question.
>>>
>>> Thanks in advance
>>>
>>> Chris Bird
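Derrick's proposal combines two standard Java mechanisms: choosing a factory class by name with Class.forName(), and wrapping each tag in a java.lang.reflect.Proxy whose InvocationHandler answers IColorSupport calls itself and delegates everything else to the wrapped tag. The following is a minimal, self-contained sketch of that idea under stated assumptions, not HTML Parser code: IBaseTag, IFormTag and IColorSupport are the interface names from the proposal (they are not library types as of this thread), while TagFactory, FormTag, SimpleTag, the tag lists and the `tag.factory` property are hypothetical stand-ins. The static ColorKnowledge helper from the earlier reply is adapted here to take an IBaseTag instead of a Node.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ProxyFactorySketch {

    // Hypothetical interfaces from the proposal; not HTML Parser API.
    public interface IBaseTag { String getTagName(); }
    public interface IFormTag extends IBaseTag { String getFormMethod(); }
    public interface IColorSupport {
        boolean supportsColor();
        boolean supportsBGColor();
    }
    public interface TagFactory { IBaseTag makeTag(String name); }

    // Stand-ins for library tag classes, which stay untouched.
    public static class FormTag implements IFormTag {
        public String getTagName() { return "FORM"; }
        public String getFormMethod() { return "GET"; }
    }
    public static class SimpleTag implements IBaseTag {
        private final String name;
        public SimpleTag(String name) { this.name = name.toUpperCase(); }
        public String getTagName() { return name; }
    }

    // External static helper in the spirit of ColorKnowledge (illustrative HTML 4.01 lists).
    static final class ColorKnowledge {
        private static final Set<String> COLOR = new HashSet<>(Arrays.asList("FONT", "BASEFONT"));
        private static final Set<String> BGCOLOR =
                new HashSet<>(Arrays.asList("BODY", "TABLE", "TR", "TD", "TH"));
        static boolean supportsColor(IBaseTag t) { return COLOR.contains(t.getTagName()); }
        static boolean supportsBGColor(IBaseTag t) { return BGCOLOR.contains(t.getTagName()); }
    }

    // Answers IColorSupport calls itself; delegates every other call to the wrapped tag.
    static final class ColorHandler implements InvocationHandler {
        private final IBaseTag target;
        ColorHandler(IBaseTag target) { this.target = target; }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getDeclaringClass() == IColorSupport.class) {
                return method.getName().equals("supportsColor")
                        ? ColorKnowledge.supportsColor(target)
                        : ColorKnowledge.supportsBGColor(target);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the wrapped tag's own exception
            }
        }
    }

    // User-supplied factory: builds the ordinary tag, then wraps it in a proxy that
    // implements the tag's own interfaces plus IColorSupport.
    public static class ChrisBirdFactory implements TagFactory {
        public IBaseTag makeTag(String name) {
            IBaseTag tag = "FORM".equalsIgnoreCase(name) ? new FormTag() : new SimpleTag(name);
            Class<?>[] own = tag.getClass().getInterfaces(); // direct interfaces only
            Class<?>[] all = Arrays.copyOf(own, own.length + 1);
            all[own.length] = IColorSupport.class;
            return (IBaseTag) Proxy.newProxyInstance(
                    IBaseTag.class.getClassLoader(), all, new ColorHandler(tag));
        }
    }

    // Picks the factory by class name, as a configuration setting would.
    public static TagFactory loadFactory(String className) {
        try {
            return (TagFactory) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load tag factory " + className, e);
        }
    }

    public static void main(String[] args) {
        // "tag.factory" is a made-up property name standing in for a configuration setting.
        String name = System.getProperty("tag.factory", ChrisBirdFactory.class.getName());
        TagFactory factory = loadFactory(name);

        IBaseTag form = factory.makeTag("Form");
        System.out.println(((IFormTag) form).getFormMethod());       // delegated to FormTag
        System.out.println(((IColorSupport) form).supportsColor());  // answered by the handler
        System.out.println(((IColorSupport) factory.makeTag("td")).supportsBGColor());
    }
}
```

Because the proxy implements the tag's own interfaces, code that expects an IFormTag keeps working unchanged; only the caller that knows about IColorSupport casts to it. One limitation of the sketch: getInterfaces() returns only the interfaces a class declares directly, so a fuller version would also walk superclasses.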
From: Couball, J. <jam...@co...> - 2003-08-27 17:01:13
|
Although I personally prefer tabs, I would +1 any consistent coding style.

FWIW, you may want to loosely enforce coding standards through the use of the Checkstyle Ant task (see http://checkstyle.sourceforge.net/). This could produce a report of violations without really impacting the project. An example report is here: http://maven.apache.org/checkstyle-report.html.

Sincerely,
James.
|
From: zheng z. <mon...@ya...> - 2003-08-27 14:46:39
|
I'm a beginner with htmlparser and would appreciate it if somebody could give me some hints. Here is the code:

NodeReader nodeR = new NodeReader(new FileReader(new File("C:/temp/b.html")), 1000);
System.out.println("nodeR.getLineCount():" + nodeR.getLineCount());

The problem is: why is nodeR.getLineCount() always 1?

Thanks again,
zz
|
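This question goes unanswered in this part of the thread. One plausible explanation, which is an assumption and not confirmed for `NodeReader`, is that a reader-based line counter only reflects the lines consumed so far, so a count taken right after construction, before any nodes are read, has not advanced. The effect is easy to observe with the JDK's own `java.io.LineNumberReader` (used here purely as an illustration; it is not what htmlparser uses internally), whose count starts at 0 and moves only as lines are read:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustration with the JDK's LineNumberReader, not htmlparser's NodeReader:
// a reader-based line counter only advances as input is actually consumed.
final class LineCountDemo {
    /** Returns {line number before reading, line number after reading everything}. */
    static int[] countsBeforeAndAfter(String text) {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            int before = reader.getLineNumber();   // nothing consumed yet
            while (reader.readLine() != null) {
                // each line read advances the counter
            }
            int after = reader.getLineNumber();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // a StringReader should not fail
        }
    }
}
```

If the same holds for `NodeReader`, calling `getLineCount()` after reading some nodes, rather than immediately after constructing the reader, would be the thing to try; that has not been verified here.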
From: Derrick O. <Der...@ro...> - 2003-08-27 11:21:03
|
Tabs are an issue because of the ambiguity. A space is a space. A tab can be anything. The original intent of tabs (on typewriters) was to allow quick columnar alignment. This obviously doesn't work in electronic documents, where the interpretation depends on the program used. Try opening a file formatted with a tabstop setting of 4 in a program (like Notepad) that has a hard-coded tabstop spacing of 8. Some think that tabs conserve disk space (one tab is worth 8 spaces, right?), but hard disk space is pennies a megabyte and compression programs handle runs of spaces very nicely for transmission. I just think they are an anachronism that long ago lost its usefulness.

The Sun "Code Conventions for the Java Programming Language", available at http://java.sun.com/docs/codeconv, is a good basis from which to start. I'm adjusting it a bit to account for open source, CVS and htmlparser specifics. I disagree with some of its suggestions, though, like:

    Four spaces should be used as the unit of indentation. The exact construction of the indentation (spaces vs. tabs) is unspecified. Tabs must be set exactly every 8 spaces (not 4).

I mean, this is just sloppy. It should be:

    Four spaces should be used as the unit of indentation. The use of tabs, vertical tab, form-feed, carriage-return and other control characters, other than newline, to control displayed formatting is forbidden.

I hope to provide, within the document, a rationale wherever I have differed.

Derrick

Somik Raha wrote:

>Hi Derrick,
>  Hmm.. If you mean that Eclipse will auto-convert tabs to spaces, maybe I
>misunderstood. As long as one does not have to press space four times...
>
>  Bytway, just curious as to why tabs are a problem in the first place..
>
>>I'm working on a Java Coding Standards document.
>
>  What do you think of the Sun coding standard?
>
>Cheers,
>Somik
|
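Derrick's proposed wording is mechanical enough to check automatically, in the spirit of James's suggestion to produce a report of violations without blocking anyone. A minimal sketch in plain Java, assuming the rule exactly as worded (newline is the only control character allowed); the class name is hypothetical and this is not how Checkstyle implements its checks:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for Derrick's proposed rule: no tab, vertical tab,
// form feed, carriage return or other control character except newline.
final class ControlCharCheck {
    /** Returns "line:column" (both 1-based) for every forbidden character. */
    static List<String> violations(String source) {
        List<String> found = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {                 // newline is the only allowed control char
                line++;
                column = 1;
                continue;
            }
            if (Character.isISOControl(c)) { // covers tab, vertical tab, form feed, CR, ...
                found.add(line + ":" + column);
            }
            column++;
        }
        return found;
    }
}
```

Read literally, the rule also flags every carriage return, so a file saved with Windows CRLF line endings would report one violation per line.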
From: Fernando M. <fn...@ne...> - 2003-08-27 06:12:32
|
Hi all,

+1 for Sun Coding Standard

Regards,
-fmc

> Subject: AW: [Htmlparser-developer] tabs
> Date: Tue, 26 Aug 2003 09:37:49 +0200
> From: "Holger Stenzhorn" <Hol...@xt...>
> To: <htm...@li...>
> Reply-To: htm...@li...
(...)
> How about the original one from Sun
> (http://java.sun.com/docs/codeconv/)?
(...)
>
> Cheers,
> Holger
>
> From: "Somik Raha" <so...@ya...>
> To: <htm...@li...>
> Subject: Re: [Htmlparser-developer] Re: Htmlparser-developer digest, Vol 1 #255 - 1 msg
> Date: Tue, 26 Aug 2003 20:32:31 -0400
> Reply-To: htm...@li...
>
> Hi Folks,
(...)
>
> I would personally prefer to maintain the tabs, and follow the Sun
> Microsystems java coding standard.
> http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html
>
(...)
>
> Regards
> Somik
|
From: Somik R. <so...@ya...> - 2003-08-27 02:11:18
|
Hi Derrick,

Hmm.. If you mean that Eclipse will auto-convert tabs to spaces, maybe I misunderstood. As long as one does not have to press space four times...

By the way, just curious as to why tabs are a problem in the first place..

> I'm working on a Java Coding Standards document.

What do you think of the Sun coding standard?

Cheers,
Somik
|
From: Derrick O. <Der...@ro...> - 2003-08-27 01:10:13
|
Somik,

The (rather meager) response to an earlier poll indicated that Eclipse was the most popular by far, followed by a few NetBeans and JBuilder users.

Replacing tabs with spaces is automatic in modern editors and IDEs. For Eclipse, you need to make the settings in the Java/Editor Typing tab. You also need to make the setting in the Java/Code Formatter Style tab. For NetBeans use Tools-Options-Editing-Editor Settings-Java Editor-Java Indentation Engine-...-Expand Tabs To Spaces-True. I don't know how to do it in JBuilder, but I know it can be done.

I'm working on a Java Coding Standards document.

Derrick
|
From: Somik R. <so...@ya...> - 2003-08-27 00:32:15
|
Hi Folks,

For what it's worth, tabs are an incredibly useful and standard way of formatting code. A lot of folks use Eclipse, and pressing the tab key every so many seconds comes really naturally. It also reduces the risk of RSI (pressing space four times as opposed to tab once). Note that the space key is a big killer - it really hurts your thumb in the long run. A state-of-the-art ergonomic keyboard that I am trying to adjust to takes the space key away from the thumb. (Does this reason sound silly? Look at your fingers: do you feel any pain in your thumbs? Or your shoulder or your neck? Do you want to avoid surgery?)

It would also be good to know what IDE most developers on this project use.

I would personally prefer to maintain the tabs, and follow the Sun Microsystems Java coding standard.
http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html

There are pieces of code where the braces are not consistent.

I agree with Fernando - we should have an "official" coding standard that is clearly communicated on the site.

Finally, the coding standard is for active developers, who must feel comfortable with it. My views are secondary to those of the active developers, as I have ceased to contribute beyond an occasional code cleanup.

Regards
Somik

I n d u s t r i a l L o g i c , I n c .
Somik Raha
Extreme Programmer & Coach
http://industriallogic.com
http://industrialxp.org
866-540-8336 (toll free)
510-540-8336 (phone)

.. the major danger in vertical thinking is not that of being trapped by the obvious but of failing to realize that one may be trapped by the obvious. It is not a matter of avoiding vertical thinking but of using it and at the same time being aware that it might be necessary to escape from a particular way of looking at a situation.

--- Edward De Bono in Lateral Thinking, Chapter 16, Analogies
|
From: Holger S. <Hol...@xt...> - 2003-08-26 07:38:33
|
Hi,

The proposed standard of an indent of 4 sounds good. At our company we actually use an indent of 2. Would that be OK too?

I also support Fernando in his view of more complete coding standards. How about the original one from Sun (http://java.sun.com/docs/codeconv/)?

Also: many if not most classes are pretty well documented, but some aren't. I would like to actively join this project again, so that might help a lot. ;-)

Cheers,
Holger
|
From: Fernando M. <fn...@ne...> - 2003-08-26 03:39:47
|
Hi,

What do you think about setting complete coding standards? Not only spaces, but comments, functions, for's, if's, etc.

My $0.02.

Regards,
-fmac
|
From: Derrick O. <Der...@ro...> - 2003-08-26 02:56:14
|
I'm thinking of making a gratuitous change to nearly all the htmlparser source files -- replace tabs with spaces.

I've been using a tabstop of 4 and my guess is some others have been using 8. This is too much in my opinion, but the point is there seems to be too much ambiguity in the repository at the moment about whether to use tabs or not and how many spaces they represent, and hence how much indent is applied when entering a block of code. Maybe it's my fault. I've been a 4-space person ever since moving away from the old DOS text screens, where it was two spaces, and only because screen real estate was so precious. So the code I've inserted must look horrendous for those with an 8 spacing.

How about arbitrarily dictating that no tabs are allowed, and the indent is 4? Just set a standard and adhere to it.

I know every editor in use has a 'replace tabs with spaces' option and it's just a matter of some people turning that feature on. I can correct the existing files in a few minutes (correctly adding the number of spaces to get to the next tab stop, not just globally substituting spaces for tabs).

I know this is a religious issue, so I'll gladly offer to convince anyone my way is correct and theirs is wrong, and trump anyone's code drop with one that doesn't contain tabs until they give up. Harrumph!

Derrick
|
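The distinction Derrick draws, padding each tab out to the next tab stop rather than globally substituting a fixed run of spaces, matters whenever a tab follows other text on the same line. A minimal sketch, assuming one display column per character (true for plain ASCII source); the class and method names are illustrative only and are not taken from any tool he used:

```java
// Sketch of tab expansion to the next tab stop, as opposed to replacing
// every tab with a fixed number of spaces. Works on a single line.
final class TabExpander {
    static String expand(String line, int tabStop) {
        if (tabStop <= 0) {
            throw new IllegalArgumentException("tabStop must be positive");
        }
        StringBuilder out = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                // advance to the next multiple of tabStop (always at least one space)
                int spaces = tabStop - (out.length() % tabStop);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With a tab stop of 4, `"ab\tc"` becomes `"ab  c"` (two spaces, landing on column 4), whereas a blind substitution of four spaces would give `"ab    c"` and break the alignment the author saw in their editor.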
From: zheng z. <mon...@ya...> - 2003-08-22 18:13:44
|
Hello everyone,

I want to get a DOM tree after parsing a web page. I saw that the Parser can register scanners with registerDomScanners(), but there seems to be no difference with recreateReader(). Does anybody know how to use it?
|
From: Marc N. <ma...@ke...> - 2003-08-22 05:24:03
|
Derrick, these changes sound great! Thank you so much for putting so much work into creating a top-notch lexer package. I'll definitely put some time into going over your code, and I'll definitely help with testing out the integration once it gets underway.

Marc
|
From: Derrick O. <Der...@ro...> - 2003-08-21 06:03:09
|
Marc, James, Somik, Joshua, Amit, et al.

I've just dropped some speed fixes to the lexer package, the new low-level i/o subsystem I've been working on.

It now appears to be 10% to 50% faster at getting raw nodes than the NodeReader/parserHelpers were.

It's not complete:
- it needs an EndNode class for speed and memory reasons
- I backed off multi-threading for speed
- character set detection isn't really working yet
- there's no constructor taking a file name

But the next logical step is probably integration into the real parser to run against real test cases.

However, I think this will cause a *lot* of unit tests to fail. There are a number of reasons for this:
- attributes will have case preserved; I think I've gotten around this temporarily with a switch in the ParserTestCase class
- whitespace is preserved; a lot of this has to do with the different line-ending handling
- the order of attributes in tags is preserved, so toHtml() output is completely different
- the count of nodes may be altered by the whitespace nodes; this may require changing the ParserTestCase counting strategy
- remark nodes store all the text, even the dashes
- I mostly only paid attention to the HTML specification; real HTML is somewhat more exotic

All these failing tests will need labour-intensive manual attention to detail to get the tests correct again. In other words, once this is integrated there's no turning back. As with any animal that's having its spine replaced, there's bound to be a bit of pain.

So, before that happens, the code should go through a period of severe code review. That's what open source is about, right?

So if you have some time, please go over the lexer package with a fine-tooth comb. Add more test cases to the lexerTests package. Take a look at the toString() output (see testReal in LexerTests for an example). Optimize the hell out of it. Bounce it around and see what methods would make you happy. Then add them.

I'm thinking two weeks minimum, so this period would span at least two integration builds. The first one will be August 24th, so if you don't have CVS access you'll need to start with that.

OK, let's have at 'er folks!

Derrick
|
From: Amit R. <ami...@ya...> - 2003-08-18 07:17:14
|
Hi,

I looked into the problem briefly. While trying to parse for links on www.009.com, the parser loops infinitely on the following tags:

<IMG src="www_009_com home page_files/imode.gif" border=0 width="44" height="54"><A href="http://www.009.com/cgi/machine1.pl"> iモード対応ページ</A>

(The link text means "i-mode compatible page".)

I will look at it in detail when I get time later.

Regards,
Amit.

NOTE: The parser successfully returns

<IMG src="www_009_com home page_files/qv10anim.gif" border=0 width="28" height="16"><BR><A href="http://www.009.com/suginami/">Digital Photo 杉並デジカメ探偵団 with Casio QV-10A</A>

(The Japanese text means "Suginami Digital Camera Detectives".)
|