Revision Log
one more level in tree, nodes with children are no longer links.
refactored tree output into a data-driven (and recursive) one
added show/hide to tree view
better progress_bar, more documentation
low_mem option for desktop-class machines
warn, don't die
create tree structure from input data
clean old data before generating new, create JavaScript indexes, fix inserting into index
create index files for bfilter
implemented filtered sorted indexes
added sorted index using WebPAC::Index module
create index with much larger B, found jsFind bug.
moved headline information into $webpac->{'headline'} after data_structure is called. This makes the headline disappear from output templates and enables a new template variable 'headline' to contain the headline.
fix path for index
first cut into making jsFind-based search
added progress_bar
save mfn as field v000, _get logger handles calls from main as it should, support for <filename> tag
a lot more logging, lookups are now working as expected (and documented)
Log4perl implementation
output method uses Template Toolkit to produce output
make in-memory data_structure
open_import_xml, debug option to new
fetch_rec method
implemented eval{...}
format seems to work
implement limit_mfn
WebPac -> WebPAC
various cleanups
Object-oriented design re-implementation: simple field substitution and lookups are working well. Added some documentation about new features.
don't materialize hash values which are undef
first commit of new code
new trunk for webpac v2
print warning if type is not handled (probably a typo)
implement my_unac_string function, and my_unac_filter option in global.conf, which you *REALLY* want to use unless your data contains only clean 7-bit characters
You can now specify the configuration file as a command-line option; if you don't, the default all2xml.conf is used
delimiter and append now work as expected
Implemented a new form of delimiters like this: <tag> <delimiter>, </delimiter> <value>200a</value> </tag> which is equivalent to the following old markup: <tag delimiter=", ">200a</tag> but won't lose spaces in attribute values (which are invalid by XML specification, and XML::Simple removes them, so WebPac never gets them)
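A minimal sketch of why the element form keeps the delimiter intact: when the delimiter is element text rather than an attribute value, a parser hands back ", " with its trailing space. This uses Python's xml.etree.ElementTree instead of the project's Perl and XML::Simple, and it assumes (hypothetically) that the delimiter joins repeated values of the listed subfields; read_tag, render and the sample record are illustrative names only.

```python
import xml.etree.ElementTree as ET

# New-style markup from the log entry: the delimiter is element text,
# so its trailing space survives parsing.
NEW_STYLE = "<tag> <delimiter>, </delimiter> <value>200a</value> </tag>"


def read_tag(xml_text):
    """Return (delimiter, [field names]) from a <tag> definition (hypothetical helper)."""
    el = ET.fromstring(xml_text)
    delim_el = el.find("delimiter")
    delimiter = delim_el.text if delim_el is not None and delim_el.text else ""
    fields = [v.text.strip() for v in el.findall("value") if v.text]
    return delimiter, fields


def render(xml_text, record):
    """Join all values of the listed fields with the delimiter (assumed semantics)."""
    delimiter, fields = read_tag(xml_text)
    values = []
    for name in fields:
        values.extend(record.get(name, []))
    return delimiter.join(values)


if __name__ == "__main__":
    record = {"200a": ["First title", "Second title"]}
    print(render(NEW_STYLE, record))  # First title, Second title
```

With the old attribute form, a parser that trims attribute whitespace would return "," and produce "First title,Second title" instead.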
<config> tags (which use values from all2xml.conf) are now properly handled if there is more than one in the same swish tag. However, using <config type="index"> is useless IMHO, and <config type="index_lookup"> is not implemented.
ported r260 from hidra branch: moved eval to parse_format.pm where it belongs. Also changed eval format to: eval{v901^a eq "Mikrotezaurus"} (please note: same format as in the ISIS formatting language)
ported 257:258 from hidra branch: all2xml.pl - fix for swish without filter; openisis/perl/OpenIsis.pm - removed warning
ported r254 from hidra branch
ported r248:252 from hidra branch: r248: much improved installation instructions, especially for Debian GNU/Linux distributions; r249: changed use of Spreadsheet::ParseExcel and MARC to require/import so that dependency on those modules can be resolved at runtime; r250: finished installation documentation; r251: removing dependency on HTML::Parser would ease installation; r252: smaller eval{} fixes. eval{} logic should really move to parse_format.pm
eval{...} now works for type="swish" also...
lookup_key and lookup_val types now support filters
clear memory cache when opening new file lookup
important bug fix for bug introduced in 1.57: it might eat your data if you are not using a filter. This one was hard to find...
Renamed the never-used format configuration option for import_xml to marc_format to prevent a clash with format for output. If you don't specify it (as I never do), it defaults to 'usmarc', which is probably the right thing (tm).
brown-bag bug: I was using MARC.pm wrong. Now the whole file is loaded at the start of indexing, which makes memory usage much more step-like, but enables a real progress indicator and gains a few seconds in indexing speed.
thesaurus is finally working... It contains recursive entries to the parent term, and we actually needed to display narrower terms, so mem_lookup was created. Important changes: - you can write eval{"901a" eq "Mikrotezaurus"} within an <isis> tag, and if the expression evaluates to false, no content will be output (it's used to hide microthesaurus terms from lower-level descriptors); - mem_lookup.pm now supports formats: you can write something like [a:5614];;[d:[a:5614]] and it will correctly embed values
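The log doesn't spell out the key syntax, so here is only a sketch of nested embedding under stated assumptions: each [key] is looked up in an in-memory dict, and inner references are resolved before outer ones, so [d:[a:5614]] looks up "d:" plus the result of [a:5614]. It is Python rather than the project's Perl mem_lookup.pm; resolve, INNER_REF and the sample keys are hypothetical.

```python
import re

# Innermost [key] reference: brackets with no nested brackets inside.
INNER_REF = re.compile(r"\[([^\[\]]+)\]")


def resolve(fmt, lookup, max_depth=10):
    """Expand [key] references innermost-first using an in-memory dict.

    Missing keys expand to an empty string (an assumption for this sketch).
    max_depth bounds the number of passes in case values contain references.
    """
    for _ in range(max_depth):
        expanded = INNER_REF.sub(lambda m: lookup.get(m.group(1), ""), fmt)
        if expanded == fmt:
            break
        fmt = expanded
    return fmt


if __name__ == "__main__":
    lookup = {"a:5614": "123", "d:123": "Narrower term"}
    print(resolve("[a:5614];;[d:[a:5614]]", lookup))  # 123;;Narrower term
```

Resolving innermost-first means the outer lookup key is built from an already embedded value, which is what lets a format follow a chain such as term to narrower term.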
fixed filter delimiter bug
Changed behaviour of creating data for swish_exact when using type="index". Now every line is a separate entry in swish_exact. That will create additional clutter in the index (fields which wouldn't be used because we are not inserting them in the index), but you will have to bear with this for now.
correct support for swish_exact when there are repeatable fields
don't repeat field name if same as last, support format_name and format_delimiter on field level if using iterate_by_page (without this, it's really hard to get useful formatting when using iterate_by_page), don't warn in the rare case (a faulty import_xml definition, but anyway...) when using append="1"
implemented index_delimiter, which enables formatting index entries as (values to be inserted in index);;(values to be displayed) when index_delimiter=";;" is defined. This allows you to index (and search) values from the original database while still being able to display lookup fields.
make index with lookup field work with iterate on page
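A small sketch of how an entry written with index_delimiter=";;" could be split into the part that goes into the index and the part that is displayed. It is an illustration in Python, not the project's Perl; split_index_entry is a hypothetical name, and treating entries without the delimiter as both indexed and displayed is an assumption.

```python
def split_index_entry(entry, index_delimiter=";;"):
    """Split 'index part;;display part' into (to_index, to_display).

    Entries without the delimiter are assumed to be both indexed and
    displayed unchanged.
    """
    to_index, sep, to_display = entry.partition(index_delimiter)
    if not sep:
        return entry, entry
    return to_index, to_display


if __name__ == "__main__":
    # Raw database value is indexed; the lookup result is shown to the user.
    print(split_index_entry("5614;;Narrower term"))  # ('5614', 'Narrower term')
```

str.partition splits only at the first delimiter, so a display part that itself contains ";;" is kept whole.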
fix swish_exact fields so that they don't show up in display
invalidate memory cache when needed
major improvements: you can select the order of scanning in each topic tag to be either by line (the default, where repeatable fields in one line are unrolled) or page by page (using the new iterate_by_page="1" attribute). The new page-by-page mode is really useful with lookups (because you can append fields with lookups in the same line, but using two tags), but it will create multiple rows in HTML output.
support for lookup fields. Implemented using GDBM or TDB (which I recommend because it's the fastest implementation)
Rewrote parsing for ISO-type data (isis, marc) to use an in-memory cache of formats: 10% speed improvement and cleaner code. Include filter functions just once.
implemented filter which can replace (or be used together with) unac_string from Text::Unaccent
Added type="swish_exact" to save data into the swish index with boundaries: xxbxx data xxexxx. This is helpful for implementing exact match from the beginning of a query and exact match of the full query, which are selected using the e[nr] field in the web user interface (with the same [nr] as the f[nr] and v[nr] fields); it must have the value 1 (from beginning), 2 (from end, not that useful...) or 3 (1+2, exact match).
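A sketch of why the boundary words make anchored matching possible. Plain string checks stand in for swish-e phrase queries here, so this is not swish-e query syntax; wrap_exact and matches are hypothetical names, and the markers are copied verbatim from the log entry. Checks include the surrounding space so they respect word boundaries, as a word-based index would.

```python
# Boundary markers, copied verbatim from the log entry.
BEGIN = "xxbxx"
END = "xxexxx"


def wrap_exact(value):
    """Store a field value with begin/end boundary words around it."""
    return f"{BEGIN} {value} {END}"


def matches(stored, query, mode):
    """Simulate the e[nr] modes on a stored swish_exact value.

    1 = match from beginning, 2 = match to end, 3 = exact match (1 + 2).
    """
    if mode == 1:
        return stored.startswith(f"{BEGIN} {query} ")
    if mode == 2:
        return stored.endswith(f" {query} {END}")
    if mode == 3:
        return stored == wrap_exact(query)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    s = wrap_exact("Introduction to Perl")
    print(s)                             # xxbxx Introduction to Perl xxexxx
    print(matches(s, "Introduction", 1))  # True
    print(matches(s, "Introduction", 3))  # False
```

Without the markers, a query for "Introduction" could only say the word occurs somewhere in the field; the markers turn "at the start" and "the whole value" into ordinary phrase matches.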
implemented formats which can be used to produce links between records in WebPac (documented in README.links)
fixed filters (again)
Aargh! I should really go to sleep or make PostgreSQL replication or something...
I removed too much: this always added delimiter before first element
another fix for repeatable fields
fix repeatable fields in index data
erase also *.PTR files
Overcome the limit of 32 open databases. Unfortunately, OpenIsis in the current version (0.9.0) doesn't support the close call, so you need the patch from: https://www.rot13.org/~dpavlin/projects/openisis-0.9.0-perl_close.diff
check for bogus *.TXT databases (with zero length or 0 records) and erase them to force OpenIsis to use binary files
remove fake progress bar also
removed debugging
- better error reporting from OpenIsis - added show_progress in global.conf to turn off progress bar
fixed ordering
ability to join repeatable fields before inserting into index
repeatable fields (broken when other input formats were introduced) work again
the great rename: isis2xml.* -> all2xml.*
support for a new feed format with the decimal field number, a colon and a space at the beginning of each line (like: 0: data)
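A sketch of reading that line format into a record keyed by field number. It is Python, not the project's Perl; parse_feed is a hypothetical name, and skipping lines that don't match the format is an assumption. The lines could come from any iterable, such as the output of an external program read line by line.

```python
import re

# One feed line: decimal field number, colon, space, then the data.
FEED_LINE = re.compile(r"^(\d+): (.*)$")


def parse_feed(lines):
    """Group feed lines into {field number: [values]}.

    Lines that don't match the format are skipped (an assumption).
    Repeated field numbers collect their values in order.
    """
    record = {}
    for line in lines:
        m = FEED_LINE.match(line.rstrip("\n"))
        if m is None:
            continue
        record.setdefault(int(m.group(1)), []).append(m.group(2))
    return record


if __name__ == "__main__":
    print(parse_feed(["0: data\n", "200: First", "200: Second"]))
    # {0: ['data'], 200: ['First', 'Second']}
```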
implemented feed method which calls external program that returns data line-by-line
added MARC file import
added config tag which can read any variable from isis2xml.conf file for that library
support type and sub-types (in form type_subtype)
don't choke on input which iconv can't convert
use start_row from excel.xml
added Microsoft Excel file import
move database arguments to .conf file
fix
fixed alphabet soup -- character encoding should really work now!
filter fix && optimisation
major de-mungling of different codepages: use same codepage inside perl (as opposed to UTF-8) and in files on disk
last changes; completely broken charsets
append="1" fix
display fields using order="" attribute
repeatable field support, filter functions added, broken charset (again!)
fix
add filter="name" for fields (to correct strange input data or make variations for indexing)
fix index insertion
added configuration file with database descriptions, moved isis.xml definition file into a separate directory (in preparation for MARC), support for different encodings in different files, various fixes, improvements and badly written parts which will change ;-)
bunch of changes: make design more modular, implement index (partial implementation) and other small and big changes
renamed "old" index to swish, and introduced index which is -- index; implemented using PostgreSQL for now.
major modifications to produce first (non-working) version of Web CGI interface.
require 1.02 version of Text::Unaccent (1.01 can't pass 'make test' here!)
remove subfield definition from values which are displayed and indexed
first really working version -- creates xml file for swish + swish config
Initial revision