<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  <title>OpenIsis 0.8.x</title>
	<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
</head>
<body>
<a href="#current">current version</a>
<a href="#isis">isis</a>
<a href="#openisis">openisis</a>
<a href="#howto">howto</a>
<a href="#about">about</a>
<a href="#links">links</a>
<a href="index-es.html">en español</a>
<hr>
<img src="openisi.gif" alt="Isi" width="128" height="128" align="RIGHT" border="0"/>
<h2>welcome to OpenIsis.org</h2>
<h3><a name="current">current version</a></h3>
	DEMOS:<br>
	<a href="openisis/Demo?search=development">standard</a>
	or try
	<a href="openisis/unicode?search=&amp;prefix=1&amp;action=search">
	searching for unicode characters</a>
	(<a href="openisis/unicode?search=&amp;prefix=1&amp;action=search&amp;terms=1">
	unicode index</a>)
	<br>
	<br>
	Available for download:<br>
	Sources
	<ul>
	<li><a href="tar/openisis.0.8.7.tgz">Current version 0.8.7</a>
	This probably works on Linux only.
	</li>
	<li><a href="tar/openisis.0.8.6.tgz">version 0.8.6</a></li>
	<li><a href="tar/openisis.0.8.5.2.tgz">version 0.8.5.2</a></li>
	</ul>
	Binaries
	<ul>
	<li><a href="tar/builds/win32/openisis-0860-win.zip">win32 (0.8.6)</a>
	(mingw32 cross-build) should work on all win32 platforms from Win95b on.<br>
	Note: If you want to use Java, you will need JDK 1.3.x.<br>
	The current Windows version is not yet thread-safe, so using it with a Java
	servlet engine under Windows requires some care.
	</li>
	<li><a href="tar/builds/solaris/solaris-20020212.tar.gz">solaris (0.8.4)</a>
	(built on Solaris 5.8).
	Please <a href="mailto:erik@openisis.org">mail</a> us
	if you need a more up-to-date Solaris binary.
	<br>Note: the JDK shipped with Solaris 5.7 / 5.8 (Java 1.2) should work.
	</li>
	</ul>
	News:
	<ul>
	<li>October 19th 2002<br>
	The OpenIsis society ("Verein") has been founded with 15 members.
	Chairman is Erik Grziwotz, other board members are
	Gabi Rohmann, Ingo Struck and Thomas Sonnemann.
	</li>
	<li>0.8.7 October 2002<br>
	Version 0.8.7 supports writing of DOS/WinIsis masterfiles and xrf.
	This currently works fine for a single process on Linux.
	See <a href="doc/writing.txt">writing.txt</a> for details.
	Current TODOs:
	interlock multiprocess writing (PHP, Perl CGI)
	and fix the windows and solaris versions.
	Besides writing support, there is a new streaming record reader,
	which groks various formats like the SYSPAR.PAR, email headers and
	property files, so you can fill your db from such textfiles.
	Next step: new indexing engine.
	</li>
	<li>August 2002<br>
		No new software yet. Still very busy doing metawork.
		We are preparing to set up an organisation to support OpenIsis
		development with much more momentum and a company for professional
		services like help on large scale ISIS installations.
		Some paperwork on <a href="doc/unirec.txt">the universal ISIS record</a>.
	</li>
	<li>July 2002<br>
		Some paperwork on <a href="doc/whatabout.txt">
		What is it about ISIS that makes it ISIS?</a>
	</li>
	<li>0.8.6 June 2002<br>
		This version supports basic formatting. While most, especially graphical,
		features of WinISIS or CISIS formatting are not yet supported
		(which are typically not used in a web environment anyway),
		there is support for repeated subfields as declared by MARC for many fields.
		See <a href="doc/formatting.txt">formatting notes</a> for details.
		The Perl binding supports formatting (see test.pl);
		enhanced versions of the other language bindings are to follow soon.
	</li>
	<li>PHP June 2002<br>
		<a href="mailto:braulio@bsolano.com">Braulio José Solano Rojas</a>
		from Costa Rica created a PHP binding, which can be seen in action at
		<a href="http://galileo.or.cr/php_isis/">galileo</a>.
		Available as <a href="ftp://galileo.or.cr/php_isis.zip">download</a>
		or from sourceforge module php-openisis.
		Also, the <a href="#itb">Institut Teknologi Bandung</a> of Indonesia
		switched its web index to a PHP/OpenIsis based version.
	</li>
	<li>0.8.5.2 March 2002<br>
		Much enhanced Perl binding. See the OpenIsis.pm included in the sources.
	</li>
	<li>0.8.5.1 March 2002<br>
		Java now has support for basic formatting modes like MHL,
		various HTML-safety modes (like escaping all non-ASCIIs),
		a <a href="jdoc/org/openisis/Rec.html">Vn-field-selector-style method</a>
		and several nice utils.
		Indentation is not properly handled, since there is no easy
		common solution in HTML. Will build tools for a choice of
		standard strategies later ...
	</li>
	<li>0.8.5 March 2002<br>
		Finally some implementation of the query language
		(<a href="openisis/Demo?search=plant+water">demo</a>).
		All the operators are there (including /(tag), but not /(t1,t2...)).
		Every attempt is made to limit the potential costs of even extremely
		stupid queries like "$"^"$", so no historical (#n) or intermediate
		results (for precedence) are stored.
		Queries are processed strictly left to right on a buffer of 1000 hits.
	</li>
	<li>0.8.4 January 2002<br>
		Nearly complete rewrite of search code with support for NEAR conditions.
		Fixed alignment problems in IFP, now works with the unesb db
		(-format aligned) and the cds db as distributed with winisis.<br>
		The cds db we had with previous versions (from an old CDS distribution)
		has a mixed format: aligned leaf files, others unaligned.
		Although support for a mixed format is easily achieved with openisis,
		it will not be included unless somebody has a need for it (send us mail).<br>
		The JSP demo now supports
		<a href="openisis/Demo?db=1&amp;search=a%24">searching the unesb db</a>
		with over 58,000 rows (note the hosting server is a 500MHz Celeron).<br>
		Searching is limited to 1000 postings, usually resulting in a
		somewhat lower number of rows (where rows have multiple matching postings).
		The lowest row number (MFN) that was cut off is recorded,
		and it is possible (not yet in the JSP) to repeat the search
		starting from that rowid.
	</li>
	<li>0.8.3 October 2001<br>
		First truly usable release, since we have true search by index now
		(as prefix or complete word).
		Search gives a list (array) of sorted MFNs;
		arithmetic on those lists (and, or, not) is straightforward.
		The JSP <a href="openisis/Demo?search=development">demo</a>
		shows how a query is refined (narrowed) iteratively
		by ANDing it with a second query.
		And thanks to Veronica Lencinas and colleagues,
		we have a version of this document <a href="index-es.html">en español</a>.
	</li>
	<li>September 2001<br>
		Not a new release yet, but maintenance and testing.
		New structure logging mechanism.
		Sources are available via CVS at 
		<a href="http://sourceforge.net/projects/isis/">sourceforge</a>.
		Windows version openisis.exe running.
	</li>
	<li>0.8.2 August 2001<br>
		openisis now under <a href="LGPL.txt">LGPL</a>, no legacy code.
		Conversion from file structures is governed by an abstract dynamic description
		rather than C structs, so we can support different file layouts,
		much larger databases, big-endian processors and more.
		Simple full-scan search available in C-Lib and Java.
		Given a random read throughput of about 30,000 records/sec on a
		lame 300 MHz notebook, this seems to be of practical use.
		jsp demo under <a href="openisis/Demo?search=development">development</a> ;).
	</li>
	<li>0.8.1 June 2001<br>
		Java native interface version available.
		Java package org.openisis has Db, Rec, Field, Test.
		NativeDb implemented in libopenjsis.so.
		Subfield splitting and htmlifying in pure java.
	</li>
	<li>0.8.0 May 2001<br>
		Subfield splitting and htmlifying.
		Everything also available from perl as an xsub.
		Record shows up as hash, handy but no repeated fields.
		Makefile.PL, test.pl etc.
	</li>
	<li>0.7.9 May 2001<br>
		First version:
		static C library libopenisis.a for reading records by rowid (MFN).
		Executable "openisis" runs the test.
		Logging, argument parsing, Makefile, demo etc.
	</li>
	</ul>

<h3><a name="isis">what is isis</a></h3>
	Isis is a simple, yet powerful database system with a large installed
	base since the 80s. Since it's well suited for bibliographic data,
	it's commonly used in libraries, and since it's very low cost,
	especially in those running on a low budget.

<h4><a name="isisdb">introduction to the isis db</a></h4>
	An isis DB is a list of rows of unspecified structure,
	each identified by a unique number, the rowid (a.k.a. mfn).
	Each row is a list of fields, and each field has a number (tag)
	and a string value. Within a row there may be zero, one or more
	fields with a given tag. While the field's value usually is a
	textual representation of data in one character encoding or
	another (commonly one of the IBM/DOS code pages), it may
	actually contain arbitrary bytes.
	This is closely modelled after ISO2709 "Information Interchange Format"
	(IIF, a.k.a. ANSI/NISO
	<a href="http://www.niso.org/standards/resources/Z39-2.pdf">Z39.2</a>).
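<p>
	The row model described above can be sketched in a few lines of Python.
	This is only an illustration under stated assumptions: the names
	<code>Record</code>, <code>add</code> and <code>fields_with_tag</code>
	and the sample tags are hypothetical and are not part of the OpenIsis API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory model of one isis row; not the OpenIsis API."""
    mfn: int                                    # the rowid
    fields: list = field(default_factory=list)  # (tag, value) pairs, in order

    def add(self, tag: int, value: bytes) -> None:
        # Values are kept as bytes because a field may hold arbitrary bytes.
        self.fields.append((tag, value))

    def fields_with_tag(self, tag: int) -> list:
        # A tag may occur zero, one or many times within a row.
        return [value for (t, value) in self.fields if t == tag]


rec = Record(mfn=1)
rec.add(24, b"A title")
rec.add(70, b"First author")
rec.add(70, b"Second author")
print(rec.fields_with_tag(70))   # two occurrences of tag 70
print(rec.fields_with_tag(99))   # tag 99 does not occur in this row
```

	Keeping the fields as an ordered list of pairs, rather than a hash keyed
	by tag, is what lets repeated fields survive.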
	<h5>subfields</h5>
	There is a convention to encode multiple fields in one by separating
	them with a <tt>'^'</tt> followed by one character tagging the subfield.
	So the field value <tt>'^afoo^bbar^bbaz'</tt> represents a field having
	one 'a' subfield with value 'foo' and two 'b' subfields 'bar' and 'baz'.
	Another separator character may be used,
	e.g. ASCII character 31 ("Unit Separator") is used in the
	<a href="http://www.loc.gov/marc/specifications/specrecstruc.html">MARC</a>
	standard.
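<p>
	A minimal Python sketch of this convention follows. The function name
	<code>split_subfields</code> is made up for illustration, and the handling
	of text before the first separator (returned with an empty code) is an
	assumption, not something the convention itself prescribes.

```python
def split_subfields(value: str, sep: str = "^") -> list:
    """Split a field value into (code, text) pairs.

    Assumption: text before the first separator is returned with an
    empty subfield code.
    """
    head, *parts = value.split(sep)
    result = [("", head)] if head else []
    for part in parts:
        if part:  # skip an empty piece, e.g. from a trailing separator
            result.append((part[0], part[1:]))
    return result


print(split_subfields("^afoo^bbar^bbaz"))
# [('a', 'foo'), ('b', 'bar'), ('b', 'baz')]

# The same value with ASCII 31 (Unit Separator) as the separator.
print(split_subfields("\x1fafoo\x1fbbar\x1fbbaz", sep="\x1f"))
```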
	<h5>formatting</h5>
	There is a formatting language, with literal text, field and subfield
	variables, <code>if-else</code> branches (on field existence)
	and <code>for</code> loops (over field repetitions) (roughly speaking).
	<h5>indexing</h5>
	An index is built by converting a row into a list of words
	(optionally applying formats) and stuffing every word,
	qualified by the position of its occurrence in the row,
	into a B+-Tree (which is actually spread over six files).
	Searching for a word or
	word prefix is possible with or without qualifying the position (field).
	Since all fields can be combined into one index, it is usually not
	necessary (but possible) to set up multiple indexes.
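<p>
	The idea can be sketched in Python as a toy inverted index. Several things
	here are assumptions made for illustration only: a sorted key list stands in
	for the B+-Tree, words are upper-cased, the posting layout
	(mfn, tag, occurrence, word position) is a guess at what "position" covers,
	and the names <code>build_index</code> and <code>lookup</code> are hypothetical.

```python
import bisect
import re
from collections import defaultdict


def build_index(rows: dict) -> dict:
    """Map each upper-cased word to sorted (mfn, tag, occ, pos) postings."""
    index = defaultdict(list)
    for mfn, fields in rows.items():
        seen = defaultdict(int)              # occurrences per tag in this row
        for tag, value in fields:
            seen[tag] += 1
            for pos, word in enumerate(re.findall(r"\w+", value), start=1):
                index[word.upper()].append((mfn, tag, seen[tag], pos))
    return {word: sorted(postings) for word, postings in index.items()}


def lookup(index: dict, term: str, prefix: bool = False, tag=None) -> list:
    """Return sorted MFNs for a word or word prefix, optionally in one tag."""
    term = term.upper()
    words = sorted(index)                    # stands in for B+-Tree key order
    start = bisect.bisect_left(words, term)
    mfns = set()
    for word in words[start:]:
        if not (word.startswith(term) if prefix else word == term):
            break                            # matching keys are contiguous
        mfns.update(m for (m, t, _, _) in index[word] if tag in (None, t))
    return sorted(mfns)


rows = {
    1: [(24, "Water plants"), (70, "Smith")],
    2: [(24, "Plant development"), (69, "water")],
}
index = build_index(rows)
print(lookup(index, "water"))               # any field
print(lookup(index, "water", tag=24))       # qualified by field
print(lookup(index, "plant", prefix=True))  # word prefix
```

	Because every posting carries its tag, one combined index can answer both
	qualified and unqualified lookups.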
	<h5>queries</h5>
	A query language allows for combination of word lookups using
	<code>and</code>, <code>or</code> and <code>not</code> (without) operators.
	This is very similar to the "Type-1" query of
	<a href="ftp://ftp.loc.gov/pub/z3950/official/part1.txt">Z39.50</a>.
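<p>
	Assuming each word lookup yields a sorted list of MFNs, the three operators
	reduce to simple list arithmetic. The Python sketch below shows one way to
	do it; the function names are hypothetical and it does not claim to mirror
	the OpenIsis implementation.

```python
def mfn_and(a: list, b: list) -> list:
    """Intersect two sorted MFN lists with a linear merge."""
    out, i, j = [], 0, 0
    while i != len(a) and j != len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif b[j] > a[i]:
            i += 1                 # a is behind, advance it
        else:
            j += 1                 # b is behind, advance it
    return out


def mfn_or(a: list, b: list) -> list:
    """Union of two MFN lists, returned sorted and without duplicates."""
    return sorted(set(a).union(b))


def mfn_not(a: list, b: list) -> list:
    """Rows in a that are not in b ("a without b"), keeping a's order."""
    exclude = set(b)
    return [m for m in a if m not in exclude]


water = [1, 2, 5, 9]
plant = [2, 3, 9]
print(mfn_and(water, plant))   # [2, 9]
print(mfn_or(water, plant))    # [1, 2, 3, 5, 9]
print(mfn_not(water, plant))   # [1, 5]
```

	Refining a query iteratively then amounts to calling
	<code>mfn_and</code> on the previous result and the new lookup.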
	<h5>usage</h5>
	While isis lacks most features of RDBMS like complex relations
	between different entities, its flexibility comes in handy for
	many catalogues and directories with highly varying records and
	one single level of substructure, which today are usually
	modelled in XML documents rather than table rows.
	In other words, isis is an ideal storage for many XML applications.
	The flexible indexing mechanism combines the best of full text
	searching and structured retrieval.

<h4><a name="isissoft">overview of isis software</a></h4>
	The mother of all isis software is a DOS version of "MicroISIS"
	as an integrated system with textual user interface.
	There is a BSD version of "CDS/ISIS" which
	also runs under linux up to some 2.2.x kernels
	(current 2.4 kernels do not support the iBCS module for COFF binaries).
	Then there are several versions of "WinISIS" (M$-Windows only,
	but runs under linux/wine).
<p>
	A shared library version "isis.dll" of functions
	to access an isis db from your code exists, despite its name,
	also in a linux version ("isilux");
	however, you need some pretty special libs to make it run.
	A set of command line tools ("cisis") performs tasks like importing
	ISO2709 bibliographic databases, inverting (index building) etc.
	The closest thing to an isis database server is "wwwisis",
	which runs as CGI or from the command line and performs
	most isis tasks (Windows and Linux versions). However, it actually
	runs per request, not as a server itself, and thus cannot
	provide concurrency control.
<p>
	This "official" isis software, which is maintained by
	<a href="#unesco">Unesco</a> and/or
	<a href="#bireme">Bireme</a>,
	is accompanied by a couple of independent developments,
	some of which are in the public domain.
	<a href="#javaisis">Javaisis</a>
	is an AWT-based GUI (3.5 uses SWING) and a corresponding server,
	which in turn uses wwwisis.
	<a href="#rj">Robert Janusz</a>
	wrote a C-lib (iAPI) from scratch,
	which was the starting point for the openisis software.

<h3><a name="openisis">what is openisis</a></h3>
	So why are we writing the openisis software?
	Because Isis is not open source software, it's not even free software,
	and that leads to a whole bunch of problems.
<h4><a name="problems">problems resulting from closed software</a></h4>
	<ol>
	<li>Availability (in theory)<br>
	Versions of the software exist for some operating systems,
	library versions and languages.
	For other environments, there is no version of the software,
	and there is not much one can do about it.
	</li>
	<li>Availability (in practice)<br>
	You may download most of the software, but it's partly protected with passwords,
	which you have to order from some national distributor.
	You have to pay some fee and/or declare some good reasons
	why you want to use the software.
	Then you have to wait. In Germany, for example,
	it didn't work at all for some time, until the newly founded
	<a href="http://www.isisnetz.de/">Isisnetz</a> remedied the situation.
	</li>
	<li>Availability (in legal terms)<br>
	Some parts of the software are accompanied by different documents
	stating some license terms, others are not.
	Terms seem to be pretty different between countries.
	One cannot easily figure out what exactly the allowed usage might be.
	</li>
	<li>Availability (of documentation)<br>
	Some documentation is available in English,
	some only in Portuguese, Spanish or Italian.
	Only a small part is downloadable at all, most is paperware.
	</li>
	<li>Bugfixing<br>
	There is no way one can fix a bug,
	and not much one can do about having somebody fix it.
	</li>
	<li>Extending<br>
	The only way one could write a binding for Perl or Java
	would be using the isis.dll.
	There are problems with regard to required additional libraries
	(especially some C++ stuff), there are no statements about
	thread safety, unicode compatibility and so on.
	As a consequence, it's practically impossible to write a
	state-of-the-art web application based on an isis db.
	</li>
	<li>Improving<br>
	Many users develop useful ideas for improvements from practice.
	Their expertise is lost as they are not able to turn it into
	improved software.
	</li>
	<li>Enabling<br>
	While open source software enables people all over the world
	to shape their tools themselves, closed software leaves them
	dependent on others.
	</li>
	</ol>

<h4><a name="measures">benefits of open software</a></h4>
	To address these problems we feel the need for an open source
	implementation of isis.
	Of course it would be best to have all of the existing isis code
	under one or the other form of open license (GPL, LGPL, artistic
	or similar as appropriate).
<p>
	On the other hand, an independent secondary implementation has
	advantages in its own right. It may have a different focus
	and develop strengths in one aspect while another approach
	performs better in other situations. For example,
	openisis will have some support for multithreading and unicode,
	which is paid for by a certain overhead.
	A rewrite by developers with a different background might
	introduce new ideas which finally, after having had their
	independent test bed, help improve the standard.
<p>
	OpenIsis as a software to access isis databases is and will be freely
	available for everybody with full sourcecode, no fee, no restrictions.

<h4><a name="roadmap">developing openisis</a></h4>
	In general, there are no plans to reimplement every piece
	of code ever written for isis. To be of practical value,
	OpenIsis has to maintain compatibility in the format of
	the database files anyway. So, one may use winisis or
	whatever existing import scripts to create and maintain
	the database, yet deploy OpenIsis' perl interface to run
	powerful reports and the Java Native Interface to allow
	queries from a Servlet based web application.
<p>
	OpenIsis will focus on providing tools rather than applications.
	For example, there will be no attempt to mirror functionality
	of winisis unless the GUI toolkit is done. To achieve this,
	OpenIsis provides access from the most important programming
	languages: Java and PHP for the web (DONE), perl for the scripts (DONE) and
	Tcl/Tk for platform independent GUIs (partly DONE).
	All others can, of course, link the lib.
<p>
	Next steps:
	<ul>
	<li>make file layout configurable to allow for larger db (DONE)
	</li>
	<li>implement searching (full-scan searching DONE)
	</li>
	<li>implement index-based searching (DONE)
	</li>
	<li>more performance: try std (DONE, performs badly)
	and homegrown io buffering,
	further accelerate loop in ldb's convert function (DONE)
	</li>
	<li>start work on thread-safe pure-java implementation
	(cancelled due to lack of demand)
	</li>
	<li>prepare binary releases for windows (.exe and .dll for java) (DONE)
	</li>
	<li>implement query language (simple version DONE)
	</li>
	<li>implement formatting (simple version DONE)
	</li>
	<li>implement writing data (masterfile DONE, index underway)
	</li>
	<li>finish Tcl binding, create GUI version using TK (-&gt; Erik)
	</li>
	<li>implement server version
	</li>
	<li>... volunteers are welcome !
	</li>
	</ul>

<h3><a name="howto">howto open isis</a></h3>
	Start by downloading the software.
	Unpack everything in some arbitrary directory.
	For the tests, you will also need some isis database,
	which must be located as files db/cds/cds.*.
	Try <a href="tar/cds.tgz">this one</a>.
	Make sure filenames are lowercase.
<p>
	If you are on Windows, you should either get yourself the cygwin
	environment with tools like gmake and gcc or volunteer as a porter
	and start writing the Makefile for your make and compiler.
	Erik has built a Windows version using mingw and Linux gcc as cross-compiler.
	If you are on Linux, everything is fine.
	Ports to Mac OS X and other UNIXes should be no problem.
<p>
	Type "make" and enjoy the compiler messages.
	(If your make complains, e.g. on BSD, try "gmake").
	Type "make demo" and enjoy your first open isis record.
	(You installed a db/cds/cds.*, didn't you? It has 42 rows?)
	Type "make run" and watch the guts of your db passing by. 
	Type "make test", there should be no difference between the testout.txt
	and the testres.txt as provided
	(using <a href="tar/cds.tgz">this cds</a> database from winisis
	and <a href="tar/unesb.tgz">this 15 MB 58,000+ row unesb.zip</a> db).
	Type "make time" to measure performance;
	subsequent tries are usually much faster.
	My 800 MHz P3 random-reads more than 179,000 records a second,
	once the files are in the system cache.
	Typical values:
	<pre>
time ./openisis -perf 1000000 -db db/cds/cds &gt;/dev/null

real    0m5.655s
user    0m3.650s
sys     0m2.000s

time ./openisis -perf 100000 -db db/unesb/unesb &gt;/dev/null

real    0m0.991s
user    0m0.670s
sys     0m0.320s

time ./openisis -fmt mfnf -search 'k$' -db db/unesb/unesb &gt;/dev/null
860     rows for        k

real    0m0.044s
user    0m0.040s
sys     0m0.000s
	</pre>
	Type "make perl" to build the perl stuff;
	some perl 5.* must be installed beforehand.
	Type "make java" or, if you just can't get enough, "make jdump",
	to see it all happen in your shiny new JDK1.3 Java VM.
	Some 1.2.* JDKs should do, but tell the Makefile to not
	look in /usr/java/jdk1.3 by setting JAVAHOME.

<h4><a name="installing">installing openisis</a></h4>
	libopenisis.a can be linked with your code; no installation necessary.
	You may want to install the 'openisis' binary somewhere in your path
	for the fun of it; go ahead, just copy, no magic registry entries.
<p>
	To install the perl stuff for general availability in your
	/usr/lib/perl5 or whatever, cd to the perl subdir (after "make perl")
	and issue "make install" (as root or otherwise legitimated).
	After that, try "perldoc OpenIsis" and the demo.pl script.
<p>
	Java, like perl, needs to dynamically slurp both some stuff in
	its own language and a native shared object.
	The former is openisis.jar; set your CLASSPATH to include it,
	or specify it when invoking java, as in the Makefile.
	The latter is libopenjsis.so on linux (yes, it's <em>j</em>sis).
	The system dynamic linker must be able to find it;
	see NativeDb.java for details.

<h3><a name="about">about openisis.org</a></h3>
	OpenIsis.org is sponsored by
	<a href="http://www.allmaxx.de/">
	<img src="http://public.allmaxx.de/img/public/banner/aufweiss.gif"
		width="114" height="38" border="0" alt="allmaxx.de"></a>,
	a service of <a href="http://www.merconic.com/">merconic</a>, Berlin, Germany.
	As a student's site, allmaxx supports open software with a focus on
	education and knowledge management.
	See also the <a href="http://www.oc4s.org/">open community for science</a>.
<p>
	Currently the site is maintained by
	<a href="#erik">Erik</a> and <a href="#paul">Paul</a>.
	Volunteers are very welcome.
	Openisis sources are available at
	<a href="http://sourceforge.net/projects/isis/">
	<img src="http://sourceforge.net/sflogo.php?group_id=11257"
		width="88" height="31" border="0" alt="SourceForge"></a>
	side by side with Franck Martin's PHP isis project. Thanks, Franck!

<h3><a name="links">links</a></h3>
	<a name="sourceforge" href="http://sourceforge.net/projects/isis/">
		openisis and PHP isis at sourceforge</a><br>
	isis core sites:<br>
	<a name="unesco" href="http://www.unesco.org/webworld/isis/isis.htm">Unesco</a><br>
	<a name="bireme" href="http://www.bireme.br/isis/">Bireme</a><br>
	documentation:<br>
	<a name="manual" href="http://www.cindoc.csic.es/isis/indice.htm">
		THE BOOK</a> CDS/ISIS reference manual incl. data formats (in Spanish)<br>
	standards:<br>
	ISO2709 "Information Interchange Format", a.k.a. ANSI/NISO
	<a href="http://www.niso.org/standards/resources/Z39-2.pdf">Z39.2</a> <br>
	(US)
	<a href="http://www.loc.gov/marc/specifications/specrecstruc.html">MARC</a>
	21, <a href="http://lcweb.loc.gov/marc/">overview</a> <br>
	<a href="ftp://ftp.loc.gov/pub/z3950/official/">Z39.50</a>,
	overview at
	<a href="http://www.oclcpica.org/content/45/pdf/z3950_handbook_paper.pdf">
	OCLC|Pica</a>,<br>
	links at <a href="http://indexdata.dk/z3950/">indexdata</a>,
	makers of excellent free Z39.50 software. <br>
	people and projects:<br>
	<a name="rj" href="http://www.jezuici.krakow.pl/soft/iapi/">
		Robert Janusz' iAPI</a><br>
	<a name="kc" href="http://members.aol.com/cdsisis/">
		Kafkas Caprazli's EVERYTHING about CDS/ISIS</a><br>
	<a name="oss4lib" href="http://www.oss4lib.org/">
		open source software for libraries</a><br>
	<a name="javaisis" href="http://web.tiscali.it/javaisis/">javaisis</a><br>
	<a name="itb" href="http://isisonline.lib.itb.ac.id/">
		Institut Teknologi Bandung</a>
		IsisOnline in Indonesia<br>
	user groups:<br>
	<a name="nlug" href="http://www.bib.wau.nl/isis/">
		Netherlands / international</a><br>
	<a name="isisplus" href="http://www.axp.mdx.ac.uk/~alan2/isisplus.htm">
		UK (ISIS PLUS)</a><br>
	<a name="isisnetz" href="http://www.isisnetz.de/">
		Germany (isisnetz)</a><br>
	staff:<br>
	<a name="erik" href="mailto:erik@openisis.org">Erik Grziwotz</a><br>
	<a name="paul" href="mailto:krip@openisis.org">Klaus "Paul" Ripke</a><br>
	<a name="braulio" href="mailto:braulio@bsolano.com">Braulio José Solano Rojas</a><br>

<h3><a name="docs">documentation</a></h3>
	<a name="charsets" href="doc/charsets.html">ISIS, charsets and unicode</a><br>
	<a href="doc/whatabout.txt">What is it about ISIS that makes it ISIS?</a><br>
	<a href="doc/unirec.txt">the universal ISIS record</a><br>
	<a href="doc/writing.txt">record writing implementation</a><br>
	<a href="doc/threading.txt">multi-threading performance</a><br>
<hr>
$Revision: 1.32 $ last changed $Date: 2002/10/21 10:24:16 $ by $Author: kripke $
<hr>
(this page intentionally left blank :)
</body>
</html>