/[tokyocabinet-toys]/crawl.sh
This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!

Contents of /crawl.sh



Revision 4
Tue Jul 21 14:33:31 2009 UTC (14 years, 9 months ago) by dpavlin
File MIME type: application/x-sh
File size: 450 byte(s)
strip trailing slash

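#!/bin/sh -x

# Require the start URL; exit non-zero so callers can detect the misuse.
test -z "$1" && echo "Usage: $0 http://www.example.com" >&2 && exit 1

# Strip the http:// scheme and any trailing slashes to get the base name.
base=$(echo "$1" | sed -e 's!http://!!' -e 's!/*$!!')
dir=$(dirname "$base")
# dirname never returns an empty string, so only the directory test is needed.
test -d "$dir" || mkdir -v -p "$dir"

# Crawl only if no TSV dump exists yet, then import it and build the indexes.
test -f "$base.tsv" || ruby wgettsv -allow "$base" -deny cgi -max 10000 "http://$base" > "$base.tsv" || exit
tctmgr importtsv "$base.tct" "$base.tsv"
tctmgr setindex -it qgram "$base.tct" title
tctmgr setindex -it qgram "$base.tct" body
tctmgr inform "$base.tct"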
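
The base name that the script derives from its argument decides where the .tsv and .tct files end up, so it may help to see that step in isolation. The sketch below wraps the script's two sed expressions in a function; the name url_to_base is hypothetical and is not part of crawl.sh, and the URLs are only example inputs.

```shell
#!/bin/sh
# Hypothetical helper (not in crawl.sh): reduce a URL to the base name,
# using the same two sed expressions as the script.
url_to_base() {
	# 1st expression drops the http:// scheme,
	# 2nd expression drops any run of trailing slashes.
	printf '%s\n' "$1" | sed -e 's!http://!!' -e 's!/*$!!'
}

url_to_base 'http://www.example.com/'        # prints www.example.com
url_to_base 'http://www.example.com/docs//'  # prints www.example.com/docs
```

For a URL with a path, dirname of the base (www.example.com for the second call) is the directory the script creates before writing the dump. Only the http:// scheme is removed, so an https:// URL would pass through with its scheme intact.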

Properties

Name Value
svn:executable *
