This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!
Log of /trunk/spider

Revision 72 - Modified Tue Apr 6 15:06:58 2004 UTC (19 years, 11 months ago) by dpavlin
pdf pagination now works correctly

Revision 71 - Modified Sat Apr 3 15:15:36 2004 UTC (19 years, 11 months ago) by dpavlin
remove empty lines before <html> so that the swish parser catches <title> correctly

Revision 69 - Modified Thu Mar 18 23:07:21 2004 UTC (20 years ago) by dpavlin
more verbose adding of titles

Revision 68 - Modified Thu Mar 18 11:14:49 2004 UTC (20 years ago) by dpavlin
don't save empty pages in index

Revision 66 - Modified Wed Mar 17 12:19:42 2004 UTC (20 years ago) by dpavlin
index pdf files page-by-page

Revision 65 - Modified Wed Mar 17 12:19:14 2004 UTC (20 years ago) by dpavlin
fixed back-references in regexps

Revision 63 - Modified Fri Feb 6 13:29:39 2004 UTC (20 years, 1 month ago) by dpavlin
convert pdf files when indexing with progspider

Revision 61 - Modified Thu Jan 29 18:26:19 2004 UTC (20 years, 2 months ago) by dpavlin
better extracting of titles

Revision 57 - Modified Sun Jan 25 16:49:50 2004 UTC (20 years, 2 months ago) by dpavlin
various fixes

Revision 56 - Modified Fri Jan 23 13:10:40 2004 UTC (20 years, 2 months ago) by dpavlin
better support for DocBook generated files

Revision 51 - Modified Tue Jan 20 18:40:06 2004 UTC (20 years, 2 months ago) by dpavlin
better removal of JavaScript

Revision 50 - Modified Tue Jan 20 18:13:32 2004 UTC (20 years, 2 months ago) by dpavlin
support for 0-size files

Revision 48 - Modified Tue Jan 20 16:01:13 2004 UTC (20 years, 2 months ago) by dpavlin
removed debugging output

Revision 46 - Modified Sat Jan 17 23:57:55 2004 UTC (20 years, 2 months ago) by dpavlin
- moved text/html content filtering to filter.pm to facilitate code re-use
- added progspider, which can be used with -S prog to crawl files and use the filtering subroutines

Revision 45 - Modified Wed Nov 19 12:07:07 2003 UTC (20 years, 4 months ago) by dpavlin
fixes and improvements

Revision 42 - Modified Tue Jul 29 10:40:58 2003 UTC (20 years, 8 months ago) by dpavlin
better handling of characters in URLs; support for <!-- noindex --> and <!-- index -->, which are supported natively in swish 2.4

Revision 40 - Modified Sun Jun 1 11:45:19 2003 UTC (20 years, 10 months ago) by dpavlin
- support for listing files in .tar.gz archives; decompression of .gz and .bz2 content
- changed order of arguments for swishspider: now baseurl,url (but it's backwards compatible, so your old configurations will still work)
- do HTML fixup only on HTML files (to prevent corruption of binary archives)
- crawl sites that use frames

Revision 32 - Modified Wed Apr 30 12:40:09 2003 UTC (20 years, 11 months ago) by dpavlin
- added make_config.pl, which creates a swish config file
- added a checkbox to hide document properties (like content, size, etc.)
- remove comments between <html> and <head>, which confuse swish

Revision 30 - Modified Mon Mar 24 09:57:44 2003 UTC (21 years ago) by dpavlin
added instructions about formatting HTML before indexing it (and added the ability to unroll wrongly split tags into a form which is acceptable to swish)

Revision 15 - Modified Sun Mar 16 21:31:55 2003 UTC (21 years ago) by dpavlin
support for image maps; skip pictures (speedup)

Revision 1 - Added Tue Jun 4 06:39:53 2002 UTC (21 years, 9 months ago) by dpavlin
Initial revision