htmlparser-developer Mailing List for HTML Parser
Brought to you by:
derrickoswald
From: zheng z. <mon...@ya...> - 2003-08-22 18:13:44
Hello everyone,

I want to get a DOM tree after parsing a web page. I saw that the Parser can register scanners with registerDomScanners(), but I see no difference compared with recreateReader(). Does anybody know how to use it?
From: Marc N. <ma...@ke...> - 2003-08-22 05:24:03
Derrick, these changes sound great! Thank you so much for putting so much work into creating a top-notch lexer package. I'll definitely put some time into going over your code, and I'll definitely help with testing out the integration once it gets underway.

Marc

-----Original Message-----
From: Derrick Oswald [mailto:Der...@ro...]
Sent: Wednesday, August 20, 2003 7:56 PM
To: htm...@li...
Subject: [Htmlparser-developer] new i/o subsystem

Marc, James, Somik, Joshua, Amit, et al.

I've just dropped some speed fixes to the lexer package, the new low-level i/o subsystem I've been working on. It now appears to be 10% to 50% faster at getting raw nodes than the NodeReader/parserHelpers were.

It's not complete:
- it needs an EndNode class for speed and memory reasons
- I backed off multi-threading for speed
- character set detection isn't really working yet
- there's no constructor taking a file name

But the next logical step is probably integration into the real parser to run against real test cases.

However, I think this will cause a *lot* of unit tests to fail. There are a number of reasons for this:
- attributes will have case preserved; I think I've gotten around this temporarily with a switch in the ParserTestCase class
- whitespace is preserved; a lot of this has to do with the different line endings handling
- the order of attributes in tags is preserved, so toHtml() output is completely different
- the count of nodes may be altered by the whitespace nodes; this may require changing the ParserTestCase counting strategy
- remark nodes store all the text, even the dashes
- I mostly only paid attention to the HTML specification; real HTML is somewhat more exotic

All these failing tests will need labour-intensive manual attention to detail to get the tests correct again. In other words, once this is integrated there's no turning back. As with any animal that's having its spine replaced, there's bound to be a bit of pain.

So, before that happens, the code should go through a period of severe code review. That's what open source is about, right? So if you have some time, please go over the lexer package with a fine-tooth comb. Add more test cases to the lexerTests package. Take a look at the toString() output (see testReal in LexerTests for example). Optimize the hell out of it. Bounce it around and see what methods would make you happy. Then add them.

I'm thinking two weeks minimum, so this period would span at least two integration builds. The first one will be August 24th, so if you don't have CVS access you'll need to start with that.

OK, let's have at 'er folks!

Derrick
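[Archive note] One of the reasons Derrick lists for failing tests is that attribute case and order are now preserved, so toHtml() output no longer matches what existing tests expect. The following is only a minimal, self-contained Java sketch of that effect. It does not use the HTML Parser library: the class AttributeOrderSketch and its methods are hypothetical, and the "old style" storage (upper-cased names, source order not kept) is an assumption made for illustration, not a description of the actual NodeReader code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from the HTML Parser code base.
class AttributeOrderSketch {

    // Serialize a tag from an attribute map, following the map's iteration order.
    static String toHtml(String tagName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    // Assumed "old style" storage: names upper-cased, source order lost (sorted here).
    static Map<String, String> normalized(String[][] pairs) {
        Map<String, String> m = new TreeMap<>();
        for (String[] p : pairs) {
            m.put(p[0].toUpperCase(Locale.ROOT), p[1]);
        }
        return m;
    }

    // Storage as described in the message: attribute case and order preserved.
    static Map<String, String> preserved(String[][] pairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String[] p : pairs) {
            m.put(p[0], p[1]);
        }
        return m;
    }

    public static void main(String[] args) {
        String[][] attrs = { {"src", "logo.gif"}, {"Alt", "Logo"} };
        System.out.println(toHtml("img", normalized(attrs))); // <img ALT="Logo" SRC="logo.gif">
        System.out.println(toHtml("img", preserved(attrs)));  // <img src="logo.gif" Alt="Logo">
    }
}
```

A test that compares toHtml() output against a hard-coded string written for the first form would fail against the second, even though both describe the same tag. That is the kind of expectation that would need manual updating after integration.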