This is a repository of my old source code, which isn't updated anymore. Go to git.rot13.org for current projects!
Log of /trunk/spider
Revision 66 - Modified Wed Mar 17 12:19:42 2004 UTC (20 years, 3 months ago) by dpavlin
index pdf files page-by-page
Revision 65 - Modified Wed Mar 17 12:19:14 2004 UTC (20 years, 3 months ago) by dpavlin
fixed back-references in regexps
Revision 63 - Modified Fri Feb 6 13:29:39 2004 UTC (20 years, 4 months ago) by dpavlin
convert pdf files when indexing with progspider
Revision 61 - Modified Thu Jan 29 18:26:19 2004 UTC (20 years, 4 months ago) by dpavlin
better extracting of titles
Revision 57 - Modified Sun Jan 25 16:49:50 2004 UTC (20 years, 5 months ago) by dpavlin
various fixes
Revision 56 - Modified Fri Jan 23 13:10:40 2004 UTC (20 years, 5 months ago) by dpavlin
better support for DocBook generated files
Revision 51 - Modified Tue Jan 20 18:40:06 2004 UTC (20 years, 5 months ago) by dpavlin
better removal of JavaScript
Revision 50 - Modified Tue Jan 20 18:13:32 2004 UTC (20 years, 5 months ago) by dpavlin
support for 0-size files
Revision 48 - Modified Tue Jan 20 16:01:13 2004 UTC (20 years, 5 months ago) by dpavlin
removed debugging output
Revision 46 - Modified Sat Jan 17 23:57:55 2004 UTC (20 years, 5 months ago) by dpavlin
- moved text/html content filtering to filter.pm to facilitate code re-use
- added progspider which can be used with -S prog to crawl files and
use filtering subroutines
Revision
45 -
Directory Listing
Modified
Wed Nov 19 12:07:07 2003 UTC
(20 years, 7 months ago)
by
dpavlin
fixes and improvements
Revision
42 -
Directory Listing
Modified
Tue Jul 29 10:40:58 2003 UTC
(20 years, 10 months ago)
by
dpavlin
better handling of characters in URLs; support for
<!-- noindex --> and <!-- index --> comments, which are supported natively in swish 2.4
Revision
40 -
Directory Listing
Modified
Sun Jun 1 11:45:19 2003 UTC
(21 years ago)
by
dpavlin
- support for listing of files in .tar.gz; decompressing of .gz and .bz2
content
- changed order of arguments for swishspider: now baseurl,url (but it's
backwards compatible, so your old configurations will work)
- do html fixup just on html files (to prevent binary archive corruption)
- crawl sites that have frames
Revision
32 -
Directory Listing
Modified
Wed Apr 30 12:40:09 2003 UTC
(21 years, 1 month ago)
by
dpavlin
added make_config.pl which creates swish config file
added checkbox to hide document properties (like content, size, etc.)
remove comments between <html> and <head> which confuse swish
Revision
30 -
Directory Listing
Modified
Mon Mar 24 09:57:44 2003 UTC
(21 years, 3 months ago)
by
dpavlin
added instructions about formatting HTML before indexing it (and added the
ability to unroll wrongly split tags into a form acceptable to swish)
Revision
15 -
Directory Listing
Modified
Sun Mar 16 21:31:55 2003 UTC
(21 years, 3 months ago)
by
dpavlin
support for image maps; skip pictures (speedup)
Revision
1 -
Directory Listing
Added
Tue Jun 4 06:39:53 2002 UTC
(22 years ago)
by
dpavlin
Initial revision