This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!
Log of /trunk/spider
Directory Listing
Revision 57
Modified Sun Jan 25 16:49:50 2004 UTC (20 years, 2 months ago) by dpavlin
various fixes
Revision 56
Modified Fri Jan 23 13:10:40 2004 UTC (20 years, 2 months ago) by dpavlin
better support for DocBook generated files
Revision 51
Modified Tue Jan 20 18:40:06 2004 UTC (20 years, 2 months ago) by dpavlin
better removal of JavaScript
Revision 50
Modified Tue Jan 20 18:13:32 2004 UTC (20 years, 2 months ago) by dpavlin
support for 0-size files
Revision 48
Modified Tue Jan 20 16:01:13 2004 UTC (20 years, 2 months ago) by dpavlin
removed debugging output
Revision 46
Modified Sat Jan 17 23:57:55 2004 UTC (20 years, 2 months ago) by dpavlin
- moved text/html content filtering to filter.pm to facilitate code re-use
- added progspider, which can be used with -S prog to crawl files and
use filtering subroutines
Revision 45
Modified Wed Nov 19 12:07:07 2003 UTC (20 years, 4 months ago) by dpavlin
fixes and improvements
Revision 42
Modified Tue Jul 29 10:40:58 2003 UTC (20 years, 8 months ago) by dpavlin
better handling of characters in URLs; support for
<!-- noindex --> and <!-- index -->, which are supported natively in swish 2.4
Revision 40
Modified Sun Jun 1 11:45:19 2003 UTC (20 years, 10 months ago) by dpavlin
- support for listing of files in .tar.gz; decompressing of .gz and .bz2
content
- changed order of arguments for swishspider: now baseurl,url (but it's
backwards compatible, so your old configurations will work)
- do html fixup just on html files (to prevent binary archive corruption)
- crawl sites that have frames
Revision 32
Modified Wed Apr 30 12:40:09 2003 UTC (20 years, 11 months ago) by dpavlin
added make_config.pl which creates swish config file
added checkbox to hide document properties (like content, size etc)
remove comments between <html> and <head> which confuse swish
Revision 30
Modified Mon Mar 24 09:57:44 2003 UTC (21 years ago) by dpavlin
added instructions about formatting of html before indexing it (and added
ability to unroll wrongly split tags into a form which is acceptable to swish)
Revision 15
Modified Sun Mar 16 21:31:55 2003 UTC (21 years ago) by dpavlin
support for image map and skip pictures (speedup)
Revision 1
Added Tue Jun 4 06:39:53 2002 UTC (21 years, 10 months ago) by dpavlin
Initial revision