This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!

Log of /lib/Grep


Revision 134 - Directory Listing
Modified Tue May 1 21:06:10 2007 UTC (17 years ago) by dpavlin
More DWIM changes: scrape can now also return multiple elements, which will
be separated in results by <hr/>.
Attribute values are now treated as words surrounded by word boundaries (\b)
so multiple classes separated with spaces will now be treated correctly.
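
The word-boundary matching described in this revision can be illustrated with a minimal Python sketch. It is not the project's code; the function name attribute_matches and the sample values are assumptions made for illustration only.

```python
import re


def attribute_matches(attr_value: str, wanted: str) -> bool:
    """Return True if `wanted` occurs in `attr_value` as a whole word.

    Hypothetical helper: it only mirrors the idea of wrapping the wanted
    value in \\b anchors, as the commit message describes.
    """
    # re.escape keeps characters in `wanted` from being read as regex syntax.
    pattern = r"\b" + re.escape(wanted) + r"\b"
    return re.search(pattern, attr_value) is not None


if __name__ == "__main__":
    # A class attribute holding several space-separated classes.
    print(attribute_matches("post search_result odd", "search_result"))
    # A longer class name that merely starts with the wanted value.
    print(attribute_matches("search_results", "search_result"))
```

One consequence of using \b in this way: a hyphen counts as a word boundary, so in this sketch "search" also matches the class "search-result".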

Revision 133 - Directory Listing
Modified Tue May 1 20:50:14 2007 UTC (17 years ago) by dpavlin
Very experimental support for selecting multiple wrapper divs in which
we will then try to find search results -- this change is mostly needed
for sites which have so little semantic markup that we need to pass
several divs of which just one has results.
For Source modules, everything should "just work"(tm).
The PunBB forum is to blame for this feature, so it is the new source.

Revision 132 - Directory Listing
Modified Tue May 1 12:20:49 2007 UTC (17 years ago) by dpavlin
Implemented redirect_single_result option for sources (MoinMoin uses that), and documented
element_by_triplet and scrape

Revision 131 - Directory Listing
Modified Tue May 1 10:47:23 2007 UTC (17 years ago) by dpavlin
synced with KinoSearch subversion r2382

Revision 130 - Directory Listing
Modified Sun Apr 29 12:07:06 2007 UTC (17 years ago) by dpavlin
remove debugging snippet

Revision 128 - Directory Listing
Modified Sun Apr 29 00:48:04 2007 UTC (17 years ago) by dpavlin
use superuser when reindex-ing

Revision 127 - Directory Listing
Modified Sun Apr 29 00:16:05 2007 UTC (17 years ago) by dpavlin
Move from Lucene (mostly because locking problems prevented fastcgi
deployment, and later haunted development server too) to KinoSearch.
For good measure added (slow) de-duplication and increased version to 0.02

Revision 126 - Directory Listing
Modified Sat Apr 28 22:53:37 2007 UTC (17 years ago) by dpavlin
rename templates to triplets to be in sync with method name

Revision 125 - Directory Listing
Modified Sat Apr 28 22:52:08 2007 UTC (17 years ago) by dpavlin
fix to really support triplets in templates

Revision 124 - Directory Listing
Modified Sat Apr 28 22:51:43 2007 UTC (17 years ago) by dpavlin
It seems that Jifty::DBI::Record has a problem with https:// URLs, which get
overridden in URI::https, so we convert them to http:// and hope for the
best.

Revision 123 - Directory Listing
Modified Sat Apr 28 17:55:37 2007 UTC (17 years ago) by dpavlin
support legacy single result

Revision 122 - Directory Listing
Modified Sat Apr 28 17:54:53 2007 UTC (17 years ago) by dpavlin
from my limited sample, I would say that all DokuWiki results are inside
<div class="search_result">

Revision 121 - Directory Listing
Modified Sat Apr 28 13:08:50 2007 UTC (17 years ago) by dpavlin
Refactor scraping by extracting element_by_triplet into its own method; now
every parameter accepts either one argument (tag) or any number of triplets
(tag, attribute, value)
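
The argument convention described here (a lone tag, or a flat list of triplets) might look roughly like the Python sketch below. It is not the project's element_by_triplet; parse_selector and its error handling are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (tag, attribute, value); attribute and value are None for a bare tag.
Triplet = Tuple[str, Optional[str], Optional[str]]


def parse_selector(*args: str) -> List[Triplet]:
    """Normalize a selector given as one tag or as a flat list of triplets."""
    if len(args) == 1:
        # Single argument: match on the tag name only.
        return [(args[0], None, None)]
    if not args or len(args) % 3 != 0:
        raise ValueError("expected one tag or a multiple of three arguments")
    # Group the flat argument list into consecutive (tag, attribute, value) sets.
    return [(args[i], args[i + 1], args[i + 2]) for i in range(0, len(args), 3)]


if __name__ == "__main__":
    print(parse_selector("div"))
    print(parse_selector("div", "class", "search_result", "dl", "class", "hits"))
```

Rejecting argument counts that are neither one nor a multiple of three is a design choice of this sketch; the commit message does not say how the real method handles them.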

Revision 120 - Directory Listing
Modified Mon Apr 9 19:23:05 2007 UTC (17 years ago) by dpavlin
fix warning

Revision 118 - Directory Listing
Modified Sun Apr 1 11:53:22 2007 UTC (17 years, 1 month ago) by dpavlin
tweaks

Revision 115 - Directory Listing
Modified Sun Mar 25 11:55:08 2007 UTC (17 years, 1 month ago) by dpavlin
collect dd instead of dt

Revision 114 - Directory Listing
Modified Sun Mar 25 11:42:49 2007 UTC (17 years, 1 month ago) by dpavlin
scraper for unknown wiki engine at http://www.nslu2-linux.org/

Revision 113 - Directory Listing
Modified Sun Mar 25 11:41:54 2007 UTC (17 years, 1 month ago) by dpavlin
dump number of result nodes for better debugging

Revision 112 - Directory Listing
Modified Wed Mar 14 21:10:53 2007 UTC (17 years, 1 month ago) by dpavlin
Grep::Search really shouldn't be a Jifty::Object because its serialization
within Jifty confuses Lucene locks. We just need ->log anyway...

Revision 111 - Directory Listing
Modified Wed Mar 14 21:09:57 2007 UTC (17 years, 1 month ago) by dpavlin
finish is called from collection so this is redundant

Revision 110 - Directory Listing
Modified Wed Mar 14 20:02:19 2007 UTC (17 years, 1 month ago) by dpavlin
another bunch of various tweaks, but Lucene still doesn't lock index right

Revision 109 - Directory Listing
Modified Wed Mar 14 18:46:37 2007 UTC (17 years, 1 month ago) by dpavlin
rewrite Grep::Search to be isa Jifty::Object

Revision 107 - Directory Listing
Modified Tue Mar 6 23:00:32 2007 UTC (17 years, 1 month ago) by dpavlin
show list of sources on home page if user is logged in

Revision 106 - Directory Listing
Modified Tue Mar 6 16:17:19 2007 UTC (17 years, 1 month ago) by dpavlin
 r860@mjesec:  dpavlin | 2007-03-06 00:11:49 +0100
 add front page explaining why we need users to sign in


Revision 104 - Directory Listing
Modified Sun Mar 4 23:29:37 2007 UTC (17 years, 1 month ago) by dpavlin
isa Jifty::Object (so that ->log works)

Revision 103 - Directory Listing
Modified Sun Mar 4 22:51:01 2007 UTC (17 years, 1 month ago) by dpavlin
fetch maximum of 15 pages from remote wiki when scraping results

Revision 102 - Directory Listing
Modified Sun Mar 4 22:16:23 2007 UTC (17 years, 1 month ago) by dpavlin
removed all debug warn(s) or move them to $self->log->debug

Revision 101 - Directory Listing
Modified Sun Mar 4 22:04:58 2007 UTC (17 years, 1 month ago) by dpavlin
added support for DokuWiki (not superb, but working)

Revision 100 - Directory Listing
Modified Sat Feb 24 12:32:31 2007 UTC (17 years, 2 months ago) by dpavlin
isa Jifty::Object

Revision 99 - Directory Listing
Modified Sat Feb 24 12:32:09 2007 UTC (17 years, 2 months ago) by dpavlin
really save feed.xml content, put item link under title if there is no title

Revision 98 - Directory Listing
Modified Sat Feb 24 12:16:57 2007 UTC (17 years, 2 months ago) by dpavlin
code cleanup, now isa Jifty::Object, more debug logging

Revision 96 - Directory Listing
Modified Sat Feb 24 11:56:18 2007 UTC (17 years, 2 months ago) by dpavlin
explicitly destroy $parent passed to plugins as another try to get around
Lucene's locking problems
use HTML::ResolveLink to resolve all links before add_record

Revision 95 - Directory Listing
Modified Sat Feb 24 11:42:10 2007 UTC (17 years, 2 months ago) by dpavlin
undef all Lucene vars on finish

Revision 94 - Directory Listing
Modified Sat Feb 24 11:33:57 2007 UTC (17 years, 2 months ago) by dpavlin
fix warning

Revision 93 - Directory Listing
Modified Sat Feb 24 11:16:17 2007 UTC (17 years, 2 months ago) by dpavlin
WebGUI search scraper

Revision 92 - Directory Listing
Modified Sat Feb 24 11:16:05 2007 UTC (17 years, 2 months ago) by dpavlin
redirect errors and warnings to warn so they are all non-fatal

Revision 90 - Directory Listing
Modified Fri Feb 23 23:57:42 2007 UTC (17 years, 2 months ago) by dpavlin
skip form submit for MoinMoin

Revision 89 - Directory Listing
Modified Fri Feb 23 21:54:39 2007 UTC (17 years, 2 months ago) by dpavlin
report feed name in error message

Revision 88 - Directory Listing
Modified Fri Feb 23 21:52:29 2007 UTC (17 years, 2 months ago) by dpavlin
treat missing results div as no results and don't die

Revision 87 - Directory Listing
Modified Fri Feb 23 21:17:54 2007 UTC (17 years, 2 months ago) by dpavlin
added two variants of MediaWiki

Revision 86 - Directory Listing
Modified Fri Feb 23 21:16:44 2007 UTC (17 years, 2 months ago) by dpavlin
added hooks to Grep::Source->save to keep useful snippets of html in /tmp/grep (if writable)

Revision 85 - Directory Listing
Modified Fri Feb 23 20:47:08 2007 UTC (17 years, 2 months ago) by dpavlin
refactor most of code for scraping into common code making plugins really simple

Revision 84 - Directory Listing
Modified Fri Feb 23 18:51:40 2007 UTC (17 years, 2 months ago) by dpavlin
hush

Revision 83 - Directory Listing
Modified Fri Feb 23 18:38:48 2007 UTC (17 years, 2 months ago) by dpavlin
remove *all* arguments from page uris

Revision 82 - Directory Listing
Modified Fri Feb 23 18:10:26 2007 UTC (17 years, 2 months ago) by dpavlin
really work like designed as opposed to returning first available plugin (ouch!)

Revision 81 - Directory Listing
Modified Fri Feb 23 18:09:41 2007 UTC (17 years, 2 months ago) by dpavlin
decode special chars from bookmarklet

Revision 80 - Directory Listing
Modified Fri Feb 23 18:09:19 2007 UTC (17 years, 2 months ago) by dpavlin
set correct class (stringify) for source

Revision 78 - Directory Listing
Modified Fri Feb 23 17:34:20 2007 UTC (17 years, 2 months ago) by dpavlin
give base_uri to Feed::Find->find_in_html so that relative links work

Revision 77 - Directory Listing
Modified Fri Feb 23 17:33:43 2007 UTC (17 years, 2 months ago) by dpavlin
remove arguments from page uri to make it unique
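
As a rough illustration of this normalization, the Python sketch below drops the query string so that different views of one page collapse to a single URI. It is an assumption-laden example, not the project's code: strip_arguments is a hypothetical name, and dropping the fragment as well is a choice made here, not something the commit message states.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_arguments(uri: str) -> str:
    """Return `uri` without its query string and fragment."""
    parts = urlsplit(uri)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    print(strip_arguments("http://wiki.example.org/Page?action=show&rev=3#top"))
```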

Revision 76 - Directory Listing
Modified Fri Feb 23 17:17:50 2007 UTC (17 years, 2 months ago) by dpavlin
close searcher and undef writer after close to prevent orphan locks on lucene

Revision 75 - Directory Listing
Modified Fri Feb 23 17:16:51 2007 UTC (17 years, 2 months ago) by dpavlin
remove page_tree when not needed any more

Revision 74 - Directory Listing
Modified Fri Feb 23 17:16:33 2007 UTC (17 years, 2 months ago) by dpavlin
fix warning

Revision 73 - Directory Listing
Modified Fri Feb 23 11:48:39 2007 UTC (17 years, 2 months ago) by dpavlin
each feed now has default source class which is called for it. Added PhpWiki
source. Code still has problems with Lucene locking.

Revision 72 - Directory Listing
Modified Fri Feb 23 09:54:28 2007 UTC (17 years, 2 months ago) by dpavlin
another great refactoring: added new Source object which implements
searching within a feed (which can now be anything as long as it produces fields
which somewhat resemble an RSS feed). Source plugins implement just the (site or
source format specific) fetching of items.

Sample implementation of a MoinMoin scraper, which fetches full pages from the wiki
for results, so it has a performance impact on the remote wiki; be kind to it.

Revision 71 - Directory Listing
Modified Thu Feb 22 18:12:35 2007 UTC (17 years, 2 months ago) by dpavlin
refactor XML feed parsing and Grep::Model::ItemCollection generation...

Revision 70 - Directory Listing
Modified Thu Feb 22 17:59:56 2007 UTC (17 years, 2 months ago) by dpavlin
make models less chatty about current_user_can in debug log

Revision 69 - Directory Listing
Modified Thu Feb 22 16:57:23 2007 UTC (17 years, 2 months ago) by dpavlin
we will always fetch url using our LWP::Ua, which doesn't require locally
modified Feed::Find and has additional benefit of providing xml feed
detection (so you can now again add URIs directly to feed)

Revision 67 - Directory Listing
Modified Wed Feb 21 21:39:52 2007 UTC (17 years, 2 months ago) by dpavlin
oops, I forgot to commit CurrentUser which knows how to return user's feeds

Revision 65 - Directory Listing
Modified Wed Feb 21 20:24:34 2007 UTC (17 years, 2 months ago) by dpavlin
different users can have same URIs but with different credentials...

Revision 64 - Directory Listing
Modified Wed Feb 21 20:22:07 2007 UTC (17 years, 2 months ago) by dpavlin
store _owner_id in index and add it to search queries so that we get just
the search results we should see (as opposed to Jifty current_user_can throwing
them away much later)

Revision 63 - Directory Listing
Modified Wed Feb 21 19:52:31 2007 UTC (17 years, 2 months ago) by dpavlin
proper ACL model for Item so that users only see their own

Revision 60 - Directory Listing
Modified Wed Feb 21 19:31:26 2007 UTC (17 years, 2 months ago) by dpavlin
dump Lucene query

Revision 59 - Directory Listing
Modified Wed Feb 21 19:11:06 2007 UTC (17 years, 2 months ago) by dpavlin
better error messages which enable Grep to actually work without index for the first time

Revision 58 - Directory Listing
Modified Wed Feb 21 19:10:20 2007 UTC (17 years, 2 months ago) by dpavlin
Another mish-mash of different changes rolled into one commit: added feed owners,
better support for bootstrapping without data (actually, fixes to be able to do so...)

Revision 57 - Directory Listing
Modified Wed Feb 21 17:42:24 2007 UTC (17 years, 2 months ago) by dpavlin
catching signals (as expected) broke Jifty in so many ways...

Revision 55 - Directory Listing
Modified Wed Feb 21 16:19:31 2007 UTC (17 years, 2 months ago) by dpavlin
added form element to setup number of results

Revision 53 - Directory Listing
Modified Wed Feb 21 16:06:25 2007 UTC (17 years, 2 months ago) by dpavlin
better unrolling of object values (why do I have to peek inside {values} to make title work?),
added snippet

Revision 49 - Directory Listing
Modified Wed Feb 21 13:01:34 2007 UTC (17 years, 2 months ago) by dpavlin
re-wrote Lucene support with fresh eyes

Revision 47 - Directory Listing
Modified Wed Feb 21 03:04:48 2007 UTC (17 years, 2 months ago) by dpavlin
use real full-text search engine (Lucene in this case) for Search action,
added Grep::Search helper object

Revision 46 - Directory Listing
Modified Tue Feb 20 22:44:59 2007 UTC (17 years, 2 months ago) by dpavlin
understand messages from load_or_create to count new entries (and produce
nice sentences about results)

Revision 44 - Directory Listing
Modified Tue Feb 20 21:55:24 2007 UTC (17 years, 2 months ago) by dpavlin
refactoring: after testing IPC::PubSub with various back-ends, it seems that COMET isn't a
good choice if you want predictable delivery. It also has problems with delay, because it's
a, uh, bus....

However, this refactoring has a good side: code size is reduced and is now easier to handle.

Revision 43 - Directory Listing
Modified Tue Feb 20 16:26:56 2007 UTC (17 years, 2 months ago) by dpavlin
small refactoring for better debugging messages while exploring Jifty::Event
way of match(ing) events (while my use is more filter-like) and de-crufting code
in places

Revision 42 - Directory Listing
Modified Tue Feb 20 12:26:14 2007 UTC (17 years, 2 months ago) by dpavlin
Slightly better messages, Fetch action result now includes count as number of results,
remote feeds now have a run parameter to, well, run them :-)

Revision 41 - Directory Listing
Modified Tue Feb 20 11:53:13 2007 UTC (17 years, 2 months ago) by dpavlin
hush debug log

Revision 39 - Directory Listing
Modified Tue Feb 20 10:51:36 2007 UTC (17 years, 2 months ago) by dpavlin
fix warning with anonymous users

Revision 38 - Directory Listing
Modified Tue Feb 20 10:51:15 2007 UTC (17 years, 2 months ago) by dpavlin
better message

Revision 37 - Directory Listing
Modified Tue Feb 20 10:32:30 2007 UTC (17 years, 2 months ago) by dpavlin
current_user_can everything (for now) if logged in

Revision 36 - Directory Listing
Modified Tue Feb 20 10:14:28 2007 UTC (17 years, 2 months ago) by dpavlin
a bit of glue to use Login plugin

Revision 34 - Directory Listing
Modified Mon Feb 19 21:57:07 2007 UTC (17 years, 2 months ago) by dpavlin
small cleanup

Revision 33 - Directory Listing
Modified Mon Feb 19 21:26:30 2007 UTC (17 years, 2 months ago) by dpavlin
report better messages

Revision 32 - Directory Listing
Modified Mon Feb 19 20:53:09 2007 UTC (17 years, 2 months ago) by dpavlin
actually use transported message in event, result hits message now includes source

Revision 31 - Directory Listing
Modified Mon Feb 19 20:40:55 2007 UTC (17 years, 2 months ago) by dpavlin
re-wrote parts of Fetch action to better support its publish mode
(mostly with error reporting)

Revision 30 - Directory Listing
Modified Mon Feb 19 20:07:48 2007 UTC (17 years, 2 months ago) by dpavlin
transfer item_fragment correctly to results, report number of new results or error
(needs source feed info), removed some debugging code

Revision 29 - Directory Listing
Modified Mon Feb 19 18:25:12 2007 UTC (17 years, 2 months ago) by dpavlin
Fetch action now returns Grep::Model::ItemCollection and new fragment on the right allows
using PubSub refresh of left pane with new results

Revision 28 - Directory Listing
Modified Mon Feb 19 16:28:00 2007 UTC (17 years, 2 months ago) by dpavlin
sweeping changes to include PubSub backend JiftyDBI to make publishing work,
re-organize templates into (hopefully) more meaningful hierarchy,
and a new Search action to drive it all.

Revision 26 - Directory Listing
Modified Mon Feb 19 10:58:27 2007 UTC (17 years, 2 months ago) by dpavlin
a sprinkle of ajax magic and better detection of grep in search URIs

Revision 25 - Directory Listing
Modified Sun Feb 18 20:45:59 2007 UTC (17 years, 2 months ago) by dpavlin
implement item_fragment selection in UI, better error message,
warn on empty ATOM feeds instead of die

Revision 22 - Directory Listing
Modified Sun Feb 18 15:44:29 2007 UTC (17 years, 2 months ago) by dpavlin
created separate page for fetching new results

Revision 21 - Directory Listing
Modified Sun Feb 18 15:07:03 2007 UTC (17 years, 2 months ago) by dpavlin
Bookmarklet is now designed to work on html results page (to capture cookies so that
Grep will later be able to fetch feeds with user credentials creating single sign-on
scenario :-), and it will automatically (using new requirement Feed::Find) find feed
on that page.

For that to work, new action AddFeed was added.

Revision 20 - Directory Listing
Modified Sun Feb 18 12:53:48 2007 UTC (17 years, 2 months ago) by dpavlin
add feed option to navigation

Revision 19 - Directory Listing
Modified Sun Feb 18 12:51:26 2007 UTC (17 years, 2 months ago) by dpavlin
added cookie and search_uri to feed model

Revision 12 - Directory Listing
Modified Sun Feb 18 00:35:48 2007 UTC (17 years, 2 months ago) by dpavlin
helpful hints

Revision 9 - Directory Listing
Modified Sat Feb 17 23:32:33 2007 UTC (17 years, 2 months ago) by dpavlin
huge sweeping changes including: PubSub infrastructure and, well, results :-)

Revision 8 - Directory Listing
Modified Sat Feb 17 21:24:48 2007 UTC (17 years, 2 months ago) by dpavlin
with no results report failure which will keep entered data in form. nice :-)

Revision 7 - Directory Listing
Modified Sat Feb 17 21:22:17 2007 UTC (17 years, 2 months ago) by dpavlin
support %s in feed URI as placeholder for query string
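
The %s placeholder can be illustrated with a small Python sketch. It is not the project's implementation: build_search_uri is a hypothetical name, and URL-encoding the query before substitution is an assumption of this example.

```python
from urllib.parse import quote_plus


def build_search_uri(feed_uri: str, query: str) -> str:
    """Substitute the search query for the %s placeholder in a feed URI."""
    if "%s" not in feed_uri:
        # No placeholder: the feed URI is used unchanged.
        return feed_uri
    # str.replace is used instead of the % operator so that other
    # percent-escapes already present in the URI (e.g. %20) are left alone.
    return feed_uri.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(build_search_uri("http://example.org/search?q=%s&format=rss", "lucene lock"))
```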

Revision 5 - Directory Listing
Modified Sat Feb 17 18:06:42 2007 UTC (17 years, 2 months ago) by dpavlin
store actual content body, not class name :-)

Revision 3 - Directory Listing
Modified Sat Feb 17 17:10:27 2007 UTC (17 years, 2 months ago) by dpavlin
a try at implementing fetch action

Revision 2 - Directory Listing
Added Sat Feb 17 16:17:18 2007 UTC (17 years, 2 months ago) by dpavlin
items, feeds

