/[swish]/trunk/spider/filter.pm
This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!
Log of /trunk/spider/filter.pm


Revision 85
Modified Mon Aug 30 11:14:24 2004 UTC (19 years, 6 months ago) by dpavlin
File length: 5241 byte(s)
Diff to previous 74
extract metadata for LJ


Revision 74
Modified Wed Apr 7 12:54:21 2004 UTC (19 years, 11 months ago) by dpavlin
File length: 4833 byte(s)
Diff to previous 71
fix title extraction (again)


Revision 71
Modified Sat Apr 3 15:15:36 2004 UTC (19 years, 11 months ago) by dpavlin
File length: 3962 byte(s)
Diff to previous 69
remove empty lines before <html> so that swish parser will catch <title> correctly


Revision 69
Modified Thu Mar 18 23:07:21 2004 UTC (20 years ago) by dpavlin
File length: 3619 byte(s)
Diff to previous 65
more verbose adding of titles


Revision 65
Modified Wed Mar 17 12:19:14 2004 UTC (20 years ago) by dpavlin
File length: 3405 byte(s)
Diff to previous 61
fixed back-references in regexps


Revision 61
Modified Thu Jan 29 18:26:19 2004 UTC (20 years, 1 month ago) by dpavlin
File length: 3405 byte(s)
Diff to previous 57
better extracting of titles


Revision 57
Modified Sun Jan 25 16:49:50 2004 UTC (20 years, 1 month ago) by dpavlin
File length: 3144 byte(s)
Diff to previous 56
various fixes


Revision 56
Modified Fri Jan 23 13:10:40 2004 UTC (20 years, 2 months ago) by dpavlin
File length: 3073 byte(s)
Diff to previous 51
better support for DocBook generated files


Revision 51
Modified Tue Jan 20 18:40:06 2004 UTC (20 years, 2 months ago) by dpavlin
File length: 1597 byte(s)
Diff to previous 46
better removal of JavaScript


Revision 46
Added Sat Jan 17 23:57:55 2004 UTC (20 years, 2 months ago) by dpavlin
File length: 1591 byte(s)
- moved text/html content filtering to filter.pm to facilitate code re-use
- added progspider which can be used with -S prog to crawl files and
  use filtering subroutines

