Log of /[swish]/trunk

This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!

Revision 107
Modified Sat Jul 9 13:14:25 2005 UTC (13 years, 11 months ago) by dpavlin
highlight up to the last word character to catch suffixes


Revision 106
Modified Sat Jul 9 13:10:22 2005 UTC (13 years, 11 months ago) by dpavlin
experiment with HyperEstraier perl module


Revision 105
Modified Sat Jul 9 13:09:57 2005 UTC (13 years, 11 months ago) by dpavlin
fixed


Revision 104
Modified Sat Apr 30 23:35:27 2005 UTC (14 years, 1 month ago) by dpavlin
merge index slices into a single index


Revision 103
Modified Sat Apr 30 23:29:27 2005 UTC (14 years, 1 month ago) by dpavlin
fixed warning


Revision 102
Modified Sat Apr 30 23:29:14 2005 UTC (14 years, 1 month ago) by dpavlin
check indexes, re-index if needed


Revision 101
Modified Sat Apr 30 20:21:10 2005 UTC (14 years, 1 month ago) by dpavlin
small fixes


Revision 100
Modified Sat Apr 30 20:21:02 2005 UTC (14 years, 1 month ago) by dpavlin
extract title from the beginning of the document if no other data is found


Revision 99
Modified Sat Apr 30 20:20:42 2005 UTC (14 years, 1 month ago) by dpavlin
support for multiple directories


Revision 98
Modified Sun Apr 24 18:09:01 2005 UTC (14 years, 1 month ago) by dpavlin
added --skipoutput (for testing)


Revision 97
Modified Sun Apr 24 16:44:13 2005 UTC (14 years, 1 month ago) by dpavlin
small changes


Revision 96
Modified Sun Apr 24 16:34:21 2005 UTC (14 years, 1 month ago) by dpavlin
added splitting of merges into slices


Revision 95
Modified Sun Apr 24 16:33:53 2005 UTC (14 years, 1 month ago) by dpavlin
added --exclude path


Revision 94
Modified Sun Apr 24 16:33:21 2005 UTC (14 years, 1 month ago) by dpavlin
added experimental slicing before merge (not used)


Revision 93
Modified Mon Nov 22 17:09:44 2004 UTC (14 years, 6 months ago) by dpavlin
better cleanup


Revision 92
Modified Mon Nov 22 17:09:23 2004 UTC (14 years, 6 months ago) by dpavlin
skip symlinks


Revision 91
Modified Tue Sep 14 19:29:50 2004 UTC (14 years, 9 months ago) by dpavlin
fixed warning


Revision 90
Modified Wed Sep 1 14:12:57 2004 UTC (14 years, 9 months ago) by dpavlin
support for an array of arrays in highlite; this way you can
fill in alternative spellings from e.g. Lingua::Spelling::Alternative
and get correct highlighting


Revision 89
Modified Tue Aug 31 09:04:15 2004 UTC (14 years, 9 months ago) by dpavlin
extract snippet and highlite into a separate module


Revision 88
Modified Tue Aug 31 07:47:05 2004 UTC (14 years, 9 months ago) by dpavlin
ignore negated words (-computer) in queries when highlighting


Revision 87
Modified Mon Aug 30 16:59:17 2004 UTC (14 years, 9 months ago) by dpavlin
much, much better snippets


Revision 86
Modified Mon Aug 30 11:16:39 2004 UTC (14 years, 9 months ago) by dpavlin
better snippets


Revision 85
Modified Mon Aug 30 11:14:24 2004 UTC (14 years, 9 months ago) by dpavlin
extract metadata for LJ


Revision 84
Modified Sun Aug 29 21:19:13 2004 UTC (14 years, 9 months ago) by dpavlin
if a pdf file doesn't have a title, display its filename and page number


Revision 83
Modified Sun Aug 29 18:26:58 2004 UTC (14 years, 9 months ago) by dpavlin
produce valid html, escape characters in snippet


Revision 82
Modified Sun Aug 29 18:17:15 2004 UTC (14 years, 9 months ago) by dpavlin
added a maximum size of content (16k) to extract snippets from, plus other
smaller improvements


Revision 81
Modified Sat Aug 28 22:15:59 2004 UTC (14 years, 9 months ago) by dpavlin
implement snippets of content and highlighting of words


Revision 80
Modified Sat May 22 18:33:33 2004 UTC (15 years, 1 month ago) by dpavlin
major improvement: added <path2title> to the configuration so that you can specify
the part of the path for which a prefix (collection title) is added to results;
code cleanup (removed unused parts of code); specified but non-existent
affix and findaffix files are now skipped


Revision 79
Modified Sun Apr 18 08:36:35 2004 UTC (15 years, 2 months ago) by dpavlin
new search URL


Revision 78
Modified Sun Apr 18 08:11:22 2004 UTC (15 years, 2 months ago) by dpavlin
modified configuration to include a frameset with search on
top and the normal mailman archive or search results on the bottom


Revision 77
Modified Sun Apr 18 06:31:38 2004 UTC (15 years, 2 months ago) by dpavlin
index MailMan archives


Revision 76
Modified Sat Apr 17 18:41:21 2004 UTC (15 years, 2 months ago) by dpavlin
Pages: translation


Revision 75
Modified Sat Apr 17 18:34:45 2004 UTC (15 years, 2 months ago) by dpavlin
new navigation: previous page (<<), previous set (..), pages (1..x),
next set (..), next page (>>)


Revision 74
Modified Wed Apr 7 12:54:21 2004 UTC (15 years, 2 months ago) by dpavlin
fix title extraction (again)


Revision 73
Modified Tue Apr 6 19:21:07 2004 UTC (15 years, 2 months ago) by dpavlin
print collection name before link; the collection name
is the part of the document title up to the first " :: " delimiter


Revision 72
Modified Tue Apr 6 15:06:58 2004 UTC (15 years, 2 months ago) by dpavlin
pdf pagination now works correctly


Revision 71
Modified Sat Apr 3 15:15:36 2004 UTC (15 years, 2 months ago) by dpavlin
remove empty lines before <html> so that the swish parser will catch <title>
correctly


Revision 70
Modified Fri Mar 19 09:46:33 2004 UTC (15 years, 3 months ago) by dpavlin
update SourceForge repository


Revision 69
Modified Thu Mar 18 23:07:21 2004 UTC (15 years, 3 months ago) by dpavlin
more verbose adding of titles


Revision 68
Modified Thu Mar 18 11:14:49 2004 UTC (15 years, 3 months ago) by dpavlin
don't save empty pages in index


Revision 67
Modified Wed Mar 17 12:22:26 2004 UTC (15 years, 3 months ago) by dpavlin
if a path is specified, use progspider


Revision 66
Modified Wed Mar 17 12:19:42 2004 UTC (15 years, 3 months ago) by dpavlin
index pdf files page-by-page


Revision 65
Modified Wed Mar 17 12:19:14 2004 UTC (15 years, 3 months ago) by dpavlin
fixed back-references in regexps


Revision 63
Modified Fri Feb 6 13:29:39 2004 UTC (15 years, 4 months ago) by dpavlin
convert pdf files when indexing with progspider


Revision 62
Modified Fri Feb 6 13:27:51 2004 UTC (15 years, 4 months ago) by dpavlin
small improvements


Revision 61
Modified Thu Jan 29 18:26:19 2004 UTC (15 years, 4 months ago) by dpavlin
better extraction of titles


Revision 60
Modified Thu Jan 29 18:25:55 2004 UTC (15 years, 4 months ago) by dpavlin
fix for pages_in_set when there are no results (I should really report this
as a bug!)


Revision 59
Modified Mon Jan 26 08:08:41 2004 UTC (15 years, 4 months ago) by dpavlin
use SWISH::API instead of SWISH::Fork; new paging
using Data::Pageset


Revision 58
Modified Mon Jan 26 08:05:39 2004 UTC (15 years, 4 months ago) by dpavlin
use HTML or HTML2 parser


Revision 57
Modified Sun Jan 25 16:49:50 2004 UTC (15 years, 4 months ago) by dpavlin
various fixes


Revision 56
Modified Fri Jan 23 13:10:40 2004 UTC (15 years, 4 months ago) by dpavlin
better support for DocBook generated files


Revision 55
Modified Tue Jan 20 20:36:32 2004 UTC (15 years, 5 months ago) by dpavlin
moved rot13.config to config/ dir


Revision 54
Modified Tue Jan 20 18:42:05 2004 UTC (15 years, 5 months ago) by dpavlin
make script less chatty


Revision 53
Modified Tue Jan 20 18:41:38 2004 UTC (15 years, 5 months ago) by dpavlin
configuration moved to config/ directory


Revision 52
Modified Tue Jan 20 18:40:52 2004 UTC (15 years, 5 months ago) by dpavlin
common configuration for file-system indexing


Revision 51
Modified Tue Jan 20 18:40:06 2004 UTC (15 years, 5 months ago) by dpavlin
better removal of JavaScript


Revision 50
Modified Tue Jan 20 18:13:32 2004 UTC (15 years, 5 months ago) by dpavlin
support for 0-size files


Revision 49
Modified Tue Jan 20 16:02:27 2004 UTC (15 years, 5 months ago) by dpavlin
example configuration which crawls www.rot13.org


Revision 48
Modified Tue Jan 20 16:01:13 2004 UTC (15 years, 5 months ago) by dpavlin
removed debugging output


Revision 47
Modified Tue Jan 20 15:58:15 2004 UTC (15 years, 5 months ago) by dpavlin
Start parallel swish-e to index multiple sets of documents.
More info at: http://blog.rot13.org/index.cgi/id_14


Revision 46
Modified Sat Jan 17 23:57:55 2004 UTC (15 years, 5 months ago) by dpavlin
- moved text/html content filtering to filter.pm to facilitate code re-use
- added progspider which can be used with -S prog to crawl files and
  use filtering subroutines


Revision 45
Modified Wed Nov 19 12:07:07 2003 UTC (15 years, 7 months ago) by dpavlin
fixes and improvements


Revision 44
Modified Mon Aug 4 16:41:14 2003 UTC (15 years, 10 months ago) by dpavlin
added some html and URI of indexed content


Revision 43
Modified Sun Aug 3 21:36:16 2003 UTC (15 years, 10 months ago) by dpavlin
added template make and shell script which merges all indexes


Revision 42
Modified Tue Jul 29 10:40:58 2003 UTC (15 years, 10 months ago) by dpavlin
better handling of chars in URL, support for
<!-- noindex -->, <!-- index --> which is supported natively in swish 2.4


Revision 41
Modified Sun Jun 1 12:13:36 2003 UTC (16 years ago) by dpavlin
- support for more than one affix or findaffix file at same time


Revision 40
Modified Sun Jun 1 11:45:19 2003 UTC (16 years ago) by dpavlin
- support for listing of files in .tar.gz; decompressing of .gz and .bz2
  content
- changed order of arguments for swishspider: now baseurl,url (but it's
  backwards compatible, so your old configurations will work)
- do html fixup only on html files (to prevent binary archive corruption)
- crawl sites that have frames


Revision 39
Modified Sun Jun 1 11:41:39 2003 UTC (16 years ago) by dpavlin
support for affix and findaffix in the same configuration file


Revision 38
Modified Tue May 20 21:01:11 2003 UTC (16 years, 1 month ago) by dpavlin
more logical place for translation


Revision 37
Modified Tue May 20 20:57:31 2003 UTC (16 years, 1 month ago) by dpavlin
updated Croatian translation


Revision 36
Modified Tue May 20 20:41:09 2003 UTC (16 years, 1 month ago) by dpavlin
additional properties example


Revision 35
Modified Tue May 20 20:10:16 2003 UTC (16 years, 1 month ago) by dpavlin
fix "Use of uninitialized value" in apache error.log


Revision 34
Modified Sun May 4 12:08:29 2003 UTC (16 years, 1 month ago) by dpavlin
added optional title, and fixed strip url


Revision 33
Modified Sun May 4 01:31:31 2003 UTC (16 years, 1 month ago) by dpavlin
usage for "strip url" option, fix for indexing of whole host (without
path in URL argument)


Revision 32
Modified Wed Apr 30 12:40:09 2003 UTC (16 years, 1 month ago) by dpavlin
added make_config.pl which creates swish config file
added checkbox to hide document properties (like content, size etc)
remove comments between <html> and <head> which confuse swish


Revision 31
Modified Mon Mar 24 16:14:44 2003 UTC (16 years, 2 months ago) by dpavlin
save document title too


Revision 30
Modified Mon Mar 24 09:57:44 2003 UTC (16 years, 3 months ago) by dpavlin
added instructions about formatting html before indexing it (and added the
ability to unroll wrongly split tags into a form which is acceptable to swish)


Revision 29
Modified Mon Mar 24 09:04:57 2003 UTC (16 years, 3 months ago) by dpavlin
escape special characters in title


Revision 28
Modified Fri Mar 21 22:23:06 2003 UTC (16 years, 3 months ago) by dpavlin
don't store index dir under CVS


Revision 27
Modified Fri Mar 21 22:01:59 2003 UTC (16 years, 3 months ago) by dpavlin
search for explicit path, added examples


Revision 26
Modified Fri Mar 21 21:28:21 2003 UTC (16 years, 3 months ago) by dpavlin
added limit to path (and save swishdocpath to database to enable that)


Revision 25
Modified Fri Mar 21 21:27:51 2003 UTC (16 years, 3 months ago) by dpavlin
ignore some files in CVS


Revision 24
Modified Fri Mar 21 21:16:55 2003 UTC (16 years, 3 months ago) by dpavlin
better design


Revision 23
Modified Fri Mar 21 21:10:51 2003 UTC (16 years, 3 months ago) by dpavlin
added limit to path


Revision 22
Modified Tue Mar 18 20:24:57 2003 UTC (16 years, 3 months ago) by dpavlin
properties are optional


Revision 21
Modified Tue Mar 18 20:20:11 2003 UTC (16 years, 3 months ago) by dpavlin
support for different properties in output (aside from standard ones) and
formatting of output for each hit


Revision 20
Modified Tue Mar 18 19:08:56 2003 UTC (16 years, 3 months ago) by dpavlin
better explanation


Revision 19
Modified Sun Mar 16 22:08:17 2003 UTC (16 years, 3 months ago) by dpavlin
I use findaffix output and not affix :-)


Revision 18
Modified Sun Mar 16 21:59:10 2003 UTC (16 years, 3 months ago) by dpavlin
decode all strings before output to the charset defined in the xml file


Revision 17
Modified Sun Mar 16 21:45:23 2003 UTC (16 years, 3 months ago) by dpavlin
all.xml is the English template, while rot13.xml is the Croatian one


Revision 16
Modified Sun Mar 16 21:44:42 2003 UTC (16 years, 3 months ago) by dpavlin
moved all text into configuration file


Revision 15
Modified Sun Mar 16 21:31:55 2003 UTC (16 years, 3 months ago) by dpavlin
support for image map and skip pictures (speedup)


Revision 12
Modified Sun Mar 16 21:20:22 2003 UTC (16 years, 3 months ago) by dpavlin
Initial revision


Revision 11
Modified Sun Mar 16 21:16:41 2003 UTC (16 years, 3 months ago) by dpavlin
Initial revision


Revision 8
Modified Sun Mar 16 21:06:43 2003 UTC (16 years, 3 months ago) by dpavlin
Initial revision


Revision 7
Modified Sun Mar 16 21:02:29 2003 UTC (16 years, 3 months ago) by dpavlin
Initial revision


Revision 4
Modified Tue Jun 4 07:04:34 2002 UTC (17 years ago) by dpavlin
Initial revision


Revision 1
Added Tue Jun 4 06:39:53 2002 UTC (17 years ago) by dpavlin
Initial revision

