htmlparser-developer Mailing List for HTML Parser


Chris,

Maybe you misconstrued the open source paradigm; it's only slightly 
organized anarchy.  You can do whatever you want, with or without my, or 
anyone else's, permission. It's not my code, I only rent.

BTW, you don't lose the power of inheritance, you only constrain it to 
an interface-driven methodology.

Derrick

Christopher Bird wrote:

> Great, thanks.
>
> Yes I had thought that a factory mechanism would be a good way as well 
> - almost a decorator pattern at that point, I think. Actually, that 
> whole idea suggests a development paradigm for OO projects in general. 
> Of course you lose the power of inheritance (and the native engine 
> performance opportunities), but you gain a great deal of flexibility.
>
> I would love to hear some consensus (or at least informed opinions) on 
> this.
>
> I also plan (with your permission) to write to the IEEE Software 
> Engineering magazine (I am an IEEE member) and ask for opinions there 
> too. I would like your permission because I would like to reference 
> this concrete example. I would be glad to submit the letter to you 
> before sending it - I am not interested in making waves, but am always 
> interested in finding ways to make our profession better. Since the 
> HTMLParser is such a well executed piece of software, it strikes me 
> that it would make a good example for the letter.
>
> Regards
>
> Chris
>
>
>
>
>> From: Derrick Oswald <Der...@ro...>
>> To: sea...@ho...
>> CC: htm...@li...
>> Subject: Re: Adding methods to Tag
>> Date: Fri, 29 Aug 2003 22:20:09 -0400
>>
>> Chris,
>>
>> I'm opening this up to a wider audience, because it may have been 
>> solved before, or might be of interest to others with the same problem.
>>
>> The basic problem is how to add functionality like supportsColor() to 
>> base classes, like Tag, without recompiling the whole class hierarchy.
>>
>> One way would be to join the htmlparser project as a developer and 
>> just add it, if it's germane to others besides yourself. If it's not, 
>> then a bolt-on is needed.
>>
>> One way to handle this problem is a 'Factory' mechanism.  A 
>> 'deep-in-the-bowels' class would ask the 'factory' for a tag, i.e. 
>> factory.makeTag ("Form"). So you would wedge your own factory in 
>> there. Choosing the factory is usually done with a Class.forName() 
>> where the string specifying the class comes from a configuration 
>> setting.  With some design effort, we should be able to come up with 
>> a definition for a factory class and a suitable set of interfaces 
>> which the whole project would be refactored to use, i.e. the IFormTag 
>> interface extends the ICompositeTag interface and adds form related 
>> methods; the ICompositeTag interface extends the IBaseTag interface 
>> and adds child accessors; and nothing references FormTag directly 
>> except the factory.
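
A minimal sketch of the factory idea described above, not code from the htmlparser project. The interface names IBaseTag, ICompositeTag and IFormTag follow the proposal in the message; everything else (TagFactory, DefaultTagFactory, DefaultFormTag, the getChildren/getFormMethod accessors and the "tag.factory" property) is a hypothetical name chosen for illustration. The point is that only the factory names a concrete tag class, and the factory itself is chosen by a configuration string passed to Class.forName().

```java
import java.util.ArrayList;
import java.util.List;

/** Picks a factory by class name; the parser core would only call load(). */
class FactoryLoader {
    static TagFactory load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (TagFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load tag factory " + className, e);
        }
    }

    public static void main(String[] args) {
        // "tag.factory" is a made-up configuration key for this sketch.
        String name = System.getProperty("tag.factory", DefaultTagFactory.class.getName());
        IFormTag form = (IFormTag) load(name).makeTag("Form");
        System.out.println(form.getTagName() + " " + form.getFormMethod());
    }
}

/** Hypothetical interface hierarchy, named after the proposal above. */
interface IBaseTag {
    String getTagName();
}

interface ICompositeTag extends IBaseTag {
    List<IBaseTag> getChildren();   // child accessors
}

interface IFormTag extends ICompositeTag {
    String getFormMethod();         // form related methods
}

/** Hypothetical factory contract. */
interface TagFactory {
    IBaseTag makeTag(String name);
}

/** Stand-in for the stock FormTag; nothing but the factory refers to it. */
class DefaultFormTag implements IFormTag {
    private final List<IBaseTag> children = new ArrayList<>();

    public String getTagName() { return "FORM"; }
    public List<IBaseTag> getChildren() { return children; }
    public String getFormMethod() { return "GET"; }
}

class DefaultTagFactory implements TagFactory {
    public IBaseTag makeTag(String name) {
        if ("Form".equalsIgnoreCase(name)) {
            return new DefaultFormTag();
        }
        throw new IllegalArgumentException("unknown tag: " + name);
    }
}
```

Swapping in a different factory then means supplying another class name in the configuration, with no change to the code that asks for tags.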
>>
>> So then there is the problem of your factory supplying your special 
>> tag that implements IFormTag *and* IColorSupport when makeTag 
>> ("Form") is called. Most of what you need is already written in 
>> FormTag, you just need to add a couple of methods.  I think this is 
>> where dynamic proxies come in: 
>> http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html. The 
>> InvocationHandler would determine if the target method comes from 
>> IColorSupport, and if so perform the needful directly.  Otherwise it 
>> would delegate to the wrapped tag object.  This means the whole 
>> htmlparser project shuttles wrapped tag objects around and doesn't 
>> know it, till they bubble up to your code where you cast them to an 
>> IColorSupport and invoke the supportsColor() method:
>>
>> Parser.setTagFactory ("ChrisBirdFactory");
>> Parser parser = new Parser (url);
>> parser.registerScanners ();
>> for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
>> {
>>    Node node = e.nextNode ();
>>    // I presume all tags, but not nodes, support IColorSupport
>>    if (node instanceof Tag)
>>        ((IColorSupport)node).supportsColor ();
>> }
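
The dynamic proxy step can be sketched roughly as follows, using only java.lang.reflect.Proxy and InvocationHandler from the JDK. IFormTag and IColorSupport are the names used in the message; their methods here, plus PlainFormTag, ColorHandler and ColorProxyDemo, are hypothetical stand-ins, and the set of tag names is illustrative only. The handler answers IColorSupport calls itself and delegates everything else to the wrapped tag.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ColorProxyDemo {
    /** Wraps a tag so the result implements both IFormTag and IColorSupport. */
    static Object wrap(IFormTag tag) {
        return Proxy.newProxyInstance(
                ColorProxyDemo.class.getClassLoader(),
                new Class<?>[] { IFormTag.class, IColorSupport.class },
                new ColorHandler(tag));
    }

    public static void main(String[] args) {
        Object node = wrap(new PlainFormTag());
        System.out.println(((IFormTag) node).getTagName());
        System.out.println(((IColorSupport) node).supportsColor());
    }
}

/** Hypothetical, cut-down versions of the interfaces named in the message. */
interface IFormTag {
    String getTagName();
}

interface IColorSupport {
    boolean supportsColor();
}

/** Stand-in for the existing FormTag, which knows nothing about color. */
class PlainFormTag implements IFormTag {
    public String getTagName() { return "FORM"; }
}

class ColorHandler implements InvocationHandler {
    // Illustrative list only: tags assumed to accept a color attribute.
    private static final Set<String> COLOR_TAGS =
            new HashSet<>(Arrays.asList("FONT", "BASEFONT"));

    private final IFormTag target;

    ColorHandler(IFormTag target) {
        this.target = target;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getDeclaringClass() == IColorSupport.class) {
            // The bolt-on method is handled here, not by the wrapped tag.
            return COLOR_TAGS.contains(target.getTagName());
        }
        try {
            // Everything else, including toString/equals/hashCode, is delegated.
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```

A factory along the lines proposed above would return wrap(...) of the stock tag, so the rest of the library would only ever see an IFormTag.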
>>
>> Derrick
>>
>> Christopher Bird wrote:
>>
>>> Thanks for the reply, that is my dilemma.
>>>
>>> I am an old Smalltalk programmer from years gone by and have always 
>>> used what is sometimes called responsibility driven development. So 
>>> (at least in my head), the responsibility for knowing that a tag 
>>> supports color or BG color is the tag's responsibility, and not the 
>>> responsibility of some agent acting on the tag.
>>>
>>> The trouble with that style of development, especially for 
>>> "packaged" software is that you(I) find yourself(myself) in a bind 
>>> like this one.
>>>
>>> Indeed I had to recompile the whole package! But since I have the 
>>> source in my project (to help me learn the intricacies of certain 
>>> behaviors - especially the creation of handlers for very complex 
>>> tags) that was no big deal for me. However now I am in violation of 
>>> protocol for OpenSource, I am sure.
>>>
>>> This really gets to the crux of OOness and Open Source development. 
>>> When there are requirements for classes high in the inheritance 
>>> hierarchy and they do "rightfully" belong there how does one get 
>>> them there - short term to overcome a specific issue, and long term 
>>> as part of the overall release cycle of the product.
>>>
>>> I am probably not the first to wonder this!
>>>
>>> BTW, I love the implementation. It took some mind-bending to get 
>>> used to it at first - again separating the responsibilities out so I 
>>> can factor my solutions properly was initially a challenge, but I 
>>> have become very productive.
>>>
>>> Thank you so much for an excellent piece of technology.
>>>
>>> Regards
>>>
>>> Chris
>>>
>>>
>>>> From: Derrick Oswald <Der...@ro...>
>>>> To: Christopher Bird <se...@us...>
>>>> Subject: Re: Adding methods to Tag
>>>> Date: Fri, 29 Aug 2003 07:41:08 -0400
>>>>
>>>> Chris,
>>>>
>>>> It's unclear how your ChrisTag method would work without 
>>>> recompiling the whole package. The Tag class extends AbstractNode, 
>>>> so presumably ChrisTag would extend AbstractNode and add the 
>>>> methods you want, then Tag would extend ChrisTag. You would still 
>>>> need to 'fix' each new release by doctoring Tag.
>>>>
>>>> The best way is probably to have a class external to everything 
>>>> with the static methods needed (see Tag.breaksFlow() for example 
>>>> code):
>>>>    class ColorKnowledge {
>>>>        public static boolean supportsColor (Node node)
>>>>        {
>>>>            return (listofNodesSupportingForegroundColor.contains (node.getText ().toUpperCase ()));
>>>>        }
>>>>
>>>>        ...
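
For readers who want something that compiles on its own, here is one way the external helper might be fleshed out. It is a sketch under stated assumptions: the methods take a plain tag name rather than an htmlparser Node so that no library is needed, and the two sets are illustrative lists based on the HTML 4 color and bgcolor attributes, not data taken from htmlparser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Static lookup kept outside the tag classes, so no library code changes. */
final class ColorKnowledge {
    // Assumed lists for illustration; adjust to the tags you care about.
    private static final Set<String> FOREGROUND = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("FONT", "BASEFONT")));
    private static final Set<String> BACKGROUND = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("BODY", "TABLE", "TR", "TD", "TH")));

    private ColorKnowledge() {
        // no instances; static helpers only
    }

    /** True if the named tag is in the assumed list of color-capable tags. */
    static boolean supportsColor(String tagName) {
        return FOREGROUND.contains(tagName.toUpperCase(Locale.ROOT));
    }

    /** True if the named tag is in the assumed list of bgcolor-capable tags. */
    static boolean supportsBGColor(String tagName) {
        return BACKGROUND.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(supportsColor("font"));
        System.out.println(supportsBGColor("form"));
    }
}
```

The trade-off is the one discussed in the thread: the knowledge lives in an agent acting on the tag rather than in the tag itself, but it survives every new release untouched.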
>>>>
>>>> If it's generic enough, submit it and we'll add it to Node.
>>>>
>>>> Derrick
>>>>
>>>> Christopher Bird wrote:
>>>>
>>>>> I am new to using OpenSource code. I have found it very
>>>>> helpful, and am using the HTMLParser for a number of
>>>>> purposes.
>>>>>
>>>>> I want to add some code to Tag - especially the
>>>>> following two methods:
>>>>>
>>>>> public boolean supportsColor()
>>>>> /* returns true iff the color attribute is valid for this tag */
>>>>>
>>>>> public boolean supportsBGColor ()
>>>>> /* Returns true iff the bgColor attribute is valid for this tag */
>>>>>
>>>>> I would be happy if you guys were to add that, but failing
>>>>> that what is the process if I have to do it myself? There may
>>>>> be a bunch of other things that I will want to add to Tag -
>>>>> for handling some of my own custom behaviors.
>>>>>
>>>>> I can see a couple of ways of doing this (neither pretty). One is
>>>>> to create a new ChrisTag superclass and change Tag's
>>>>> implements clause to implements ChrisTag. I can then define
>>>>> my methods there. Of course the community doesn't get the
>>>>> benefit (? dubious in some cases, I fear) of my additions.
>>>>>
>>>>> The other obvious way is simply to add the methods to Tag
>>>>> itself. I am not wild about doing that either because as I
>>>>> download new editions of HTMLParser, my changes get lost -
>>>>> especially since I am a solo practitioner at the moment and
>>>>> am not using a source code management system.
>>>>>
>>>>> Any assistance would be greatly appreciated - both with the
>>>>> short term (immediate) problem and with the general question.
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Chris Bird
>>>>


htmlparser-developer — The developer mailing list of the htmlparser project