- support for listing files in .tar.gz archives; decompression of .gz and .bz2 content
- changed order of arguments for swishspider: now baseurl, url (backwards compatible, so your old configurations will still work)
- do HTML fixup only on HTML files (to prevent corruption of binary archives)
- crawl sites that use frames
###################################################
IncludeConfigFile /data/swish/common.config

# this is a kludge to implement a "no parent URL" feature in the swish indexer
#IndexDir "start_URI don't_go_above_this_URI"
IndexDir "https://www.rot13.org/~dpavlin/ https://www.rot13.org/~dpavlin/"
# remove parent URI from index file
ReplaceRules regex "!^http://.*rot13.org/~dpavlin/ !!i"
#ReplaceRules regex "! http://.*$!!i"

IndexFile /data/swish/index/rot13

IndexName "rot13"
IndexDescription "Personal web pages"
IndexPointer "https://www.rot13.org/~dpavlin/"
IndexAdmin "dpavlin@rot13.org"

# servers which are same
EquivalentServer http://cgi.rot13.org http://www.rot13.org
# rewrite names for index
ReplaceRules replace cgi.rot13.org www.rot13.org

# how many bytes of html to store?
StoreDescription HTML <body> 500
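
The active `ReplaceRules regex` line in the configuration above removes a leading "parent URI plus space" prefix from each indexed path. The following is a minimal Python sketch that approximates that substitution, not the indexer's own code. It assumes the entries have the form `<parent URI> <document URL>`, as the `IndexDir` comment suggests; the sample entry, the file name `projects.html`, and the helper `strip_parent` are hypothetical.

```python
import re

# Pattern and "i" flag copied from the active ReplaceRules line;
# in the config, "!" is only the delimiter and the replacement is empty.
STRIP_PARENT = re.compile(r"^http://.*rot13.org/~dpavlin/ ", re.IGNORECASE)


def strip_parent(entry: str) -> str:
    """Drop the leading 'parent URI + space' prefix, if it is present."""
    return STRIP_PARENT.sub("", entry, count=1)


# Hypothetical entry: parent URI, a space, then the document URL.
entry = "http://www.rot13.org/~dpavlin/ http://www.rot13.org/~dpavlin/projects.html"
print(strip_parent(entry))
```

Because the pattern requires a trailing space after `~dpavlin/`, only the first URI is removed and the document URL is left intact. As written, the pattern matches only entries that begin with `http://`; an entry beginning with `https://` passes through unchanged.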
Name | Value |
---|---|
cvs2svn:cvs-rev | 1.3 |