welcome to OpenIsis.org
DEMOS: standard, or try searching for unicode characters (unicode index)
Available for download:
Sources
Binaries
- win32 (0.8.6)
(mingw32 cross-build); should work on all win32 platforms from Win95b on.
Note: if you want to use Java, you will need JDK 1.3.x.
The current windows version is not yet thread-safe, so using it with a Java
servlet engine under windows requires some care.
- solaris (0.8.4)
(built on Solaris 5.8)
Please mail us if you need a more up-to-date Solaris binary.
Note: the JDK shipped with Solaris 5.7 / 5.8 (Java 1.2) should work.
News:
- October 19th 2002
The OpenIsis society ("Verein") has been founded with 15 members.
Chairman is Erik Grziwotz, other board members are
Gabi Rohmann, Ingo Struck and Thomas Sonnemann.
- 0.8.7 October 2002
Version 0.8.7 supports writing of DOS/WinIsis masterfiles and xrf.
This currently works fine for a single process on Linux.
See writing.txt for details.
Current TODOs:
interlock multiprocess writing (PHP, Perl CGI)
and fix the windows and solaris versions.
Besides writing support, there is a new streaming record reader,
which groks various formats like the SYSPAR.PAR, email headers and
property files, so you can fill your db from such textfiles.
Next step: new indexing engine.
- August 2002
No new software yet. Still very busy doing metawork.
We are preparing to set up an organisation to support OpenIsis
development with much more momentum and a company for professional
services like help on large scale ISIS installations.
Paperwork on the universal ISIS record.
- July 2002
Some paperwork on "What is it about ISIS that makes it ISIS?"
- 0.8.6 June 2002
This version supports basic formatting. While most, especially graphical,
features of WinISIS or CISIS formatting are not yet supported
(which are typically not used in a web environment anyway),
there is support for repeated subfields as declared by MARC for many fields.
See formatting notes for details.
The perl binding supports formatting (see test.pl);
enhanced versions of the other language bindings are to follow soon.
- PHP June 2002
Braulio José Solano Rojas from Costa Rica created a PHP binding,
which can be seen in action at galileo.
Available as download or from sourceforge module php-openisis.
Also, the Institut Teknologi Bandung of Indonesia
switched its web index to a PHP/OpenIsis based version.
- 0.8.5.2 March 2002
Much enhanced PERL binding. See the OpenIsis.pm included in the sources.
- 0.8.5.1 March 2002
Java now has support for basic formatting modes like MHL,
various HTML-safety modes (like escaping all non-ASCIIs),
a Vn-field-selector-style method
and several nice utils.
Indentation is not properly handled, since there is no easy
common solution in HTML. Will build tools for a choice of
standard strategies later ...
- 0.8.5 March 2002
Finally some implementation of the query language (demo).
All the operators are there (including /(tag), but not /(t1,t2...)).
Every attempt is made to limit the potential costs of even extremely
stupid queries like "$"^"$", so no historical (#n) or intermediate
results (for precedence) are stored.
Queries are processed strictly left to right on a buffer of 1000 hits.
- 0.8.4 January 2002
Nearly complete rewrite of search code with support for NEAR conditions.
Fixed alignment problems in IFP, now works with the unesb db
(-format aligned) and the cds db as distributed with winisis.
The cds db we had with previous versions (from an old CDS distribution)
has a mixed format: aligned leaf files, others unaligned.
Although support for a mixed format is easily achieved with openisis,
it will not be included unless somebody has a need for it (send us mail).
The JSP demo now supports searching the unesb db
with over 58.000 rows (note the hosting server is a 500MHz Celeron).
Searching is limited to 1000 postings, usually resulting in a
somewhat lower number of rows (where rows have multiple matching postings).
The lowest row number (MFN) that was cut off is recorded,
and it is possible (not yet in the JSP) to repeat the search
starting from that rowid.
- 0.8.3 October 2001
First truly usable release, since we have true search by index now
(as prefix or complete word).
Search gives a list (array) of sorted MFNs;
arithmetic on those lists (and, or, not) is straightforward.
The JSP demo shows how a query is refined (narrowed) iteratively
by ANDing it with a second query.
And thanks to Veronica Lencinas and colleagues,
we have a version of this document en espanol.
- September 2001
Not a new release yet, but maintenance and testing.
New structure logging mechanism.
Sources are available via CVS at sourceforge.
Windows version openisis.exe running.
- 0.8.2 August 2001
openisis now under LGPL, no legacy code.
Conversion from file structures governed by abstract dynamic description
rather than C-structs, so we can support different file layouts,
much larger databases, big endian processors and more.
Simple full-scan search available in C-Lib and Java.
Given a random read throughput of about 30.000 records/sec on a
lame 300 MHz notebook, this seems to be of practical use.
jsp demo under development ;).
- 0.8.1 June 2001
Java native interface version available.
Java package org.openisis has Db, Rec, Field, Test.
NativeDb implemented in libopenjsis.so.
Subfield splitting and htmlifying in pure java.
- 0.8.0 May 2001
Subfield splitting and htmlifying.
Everything also available from perl as an xsub.
Record shows up as hash, handy but no repeated fields.
Makefile.PL, test.pl etc.
- 0.7.9 May 2001
First version:
static C-Library libopenisis.a for reading records by rowid(Mfn).
Executable "openisis" does test.
Logging, argumentparsing, Makefile, demo etc.
Isis is a simple, yet powerful database system with a large installed
base since the 80s. Since it's well suited for bibliographic data,
it's commonly used in libraries, and since it's very low cost,
especially in those running on a low budget.
An isis DB is a list of rows of unspecified structure,
each identified by a unique number, the rowid (a.k.a. mfn).
Each row is a list of fields, and each field has a number (tag)
and a string value. Within a row there may be zero, one or more
fields with a given tag. While the field's value usually is a
textual representation of data in one character encoding or
another (commonly one of the IBM/DOS code pages), it may
actually contain arbitrary bytes.
This is closely modelled after ISO2709 "Information Interchange Format"
(IIF, a.k.a. ANSI/NISO Z39.2).
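The record model described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not the OpenIsis API: the class name Record, its methods, and the tags 24 and 70 are assumptions made up for the example.

```python
class Record:
    """One isis row: a rowid (MFN) plus an ordered list of (tag, value) fields.

    Hypothetical illustration only; not part of any OpenIsis binding.
    """

    def __init__(self, mfn):
        self.mfn = mfn
        self.fields = []  # list of (tag, raw bytes); a tag may repeat

    def add(self, tag, value):
        # Values are kept as raw bytes, since a field may hold arbitrary bytes.
        self.fields.append((tag, bytes(value)))

    def get(self, tag):
        """Return all values for a tag, in order; an empty list if absent."""
        return [value for t, value in self.fields if t == tag]


rec = Record(mfn=42)
rec.add(24, b"Der Prozess")
rec.add(70, b"Kafka, Franz")
rec.add(70, b"M\x81ller, Hans")  # byte 0x81 is 'ü' in DOS code page 850

print(rec.get(70))                     # two occurrences of tag 70
print(rec.get(99))                     # tag 99 does not occur
print(rec.get(70)[1].decode("cp850"))  # interpret the bytes as code page 850
```

Keeping values as bytes and decoding only on demand mirrors the point that the encoding is a convention of the database, not a property of the record structure.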
subfields
There is a convention to encode multiple fields in one by separating
them with a '^' followed by one character tagging the subfield.
So the field value '^afoo^bbar^bbaz' represents a field having
one 'a' subfield with value 'foo' and two 'b' subfields 'bar' and 'baz'.
Another separator char may be used,
e.g. ASCII character 31 ("Unit Separator") is used in the MARC standard.
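As a minimal sketch of this convention (not the OpenIsis implementation; the function name split_subfields and its handling of untagged leading text are assumptions), subfield splitting could look like this in Python:

```python
def split_subfields(value, sep="^"):
    """Split a field value into a list of (subfield code, text) pairs.

    Text before the first separator, if any, is returned with an empty code.
    Hypothetical helper for illustration only.
    """
    parts = value.split(sep)
    result = []
    if parts[0]:
        result.append(("", parts[0]))  # untagged leading text
    for part in parts[1:]:
        if part:  # ignore empty pieces, e.g. from a doubled separator
            result.append((part[0], part[1:]))
    return result


print(split_subfields("^afoo^bbar^bbaz"))
# The same data with the MARC-style separator, ASCII 31 (Unit Separator):
print(split_subfields("\x1fafoo\x1fbbar\x1fbbaz", sep="\x1f"))
```

Returning a list of pairs rather than a dictionary keeps repeated subfields such as the two 'b' entries.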
formatting
There is a formatting language, with literal text, field and subfield
variables, if-else branches (on field existence)
and for loops (over field repetitions) (roughly speaking).
indexing
An index is built by converting a row into a list of words
(optionally applying formats) and stuffing every word,
qualified by the position of its occurrence in the row,
into a B+-Tree (which is actually spread over six files).
Searching for a word or word prefix is possible
with or without qualifying the position (field).
Since all fields can be combined into one index, it is usually not
necessary (but possible) to set up multiple indexes.
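The idea can be sketched in Python under some loud assumptions: a sorted in-memory list with binary search stands in for the B+-Tree, words are simply upper-cased alphanumeric runs, and the names build_index and lookup are made up for this example. The trailing '$' marks a prefix search, as in the 'k$' search shown in the how-to.

```python
import bisect
import re


def build_index(rows):
    """rows: {mfn: [(tag, text), ...]} -> sorted list of postings.

    Each posting is (word, mfn, tag, word position within the field).
    """
    postings = []
    for mfn, fields in rows.items():
        for tag, text in fields:
            for pos, word in enumerate(re.findall(r"\w+", text.upper())):
                postings.append((word, mfn, tag, pos))
    postings.sort()
    return postings


def lookup(postings, term, tag=None):
    """Return sorted MFNs for a word, or for a prefix if term ends in '$'.

    If tag is given, only occurrences in fields with that tag count.
    """
    is_prefix = term.endswith("$")
    key = term.rstrip("$").upper()
    start = bisect.bisect_left(postings, (key,))  # first posting >= key
    mfns = set()
    for word, mfn, ptag, _pos in postings[start:]:
        matches = word.startswith(key) if is_prefix else word == key
        if not matches:
            break  # postings are sorted, so no later entry can match
        if tag is None or ptag == tag:
            mfns.add(mfn)
    return sorted(mfns)


rows = {
    1: [(24, "Open source libraries"), (70, "Kafka, Franz")],
    2: [(24, "Kafkaesque catalogues")],
    3: [(70, "Knuth, Donald")],
}
index = build_index(rows)
print(lookup(index, "kafka"))       # exact word
print(lookup(index, "kafka$"))      # word prefix
print(lookup(index, "k$", tag=70))  # prefix, qualified by field
```

Because all fields land in one sorted structure with the tag kept in each posting, one index serves both unqualified and field-qualified searches.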
queries
A query language allows for combination of word lookups using
and, or and not (without) operators.
This is very similar to the "Type-1" query of Z39.50.
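Since a word lookup yields a sorted list of MFNs, the three operators reduce to merges of sorted lists. The following Python sketch shows one way to do that; the function names are assumptions for illustration and do not reflect how the OpenIsis C code is organised.

```python
def mfn_and(a, b):
    """MFNs present in both sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def mfn_or(a, b):
    """MFNs present in either sorted list, without duplicates."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def mfn_not(a, b):
    """MFNs in a that are not in b ("a without b")."""
    out, j = [], 0
    for mfn in a:
        while j < len(b) and b[j] < mfn:
            j += 1
        if j == len(b) or b[j] != mfn:
            out.append(mfn)
    return out


hits_a = [1, 3, 5, 8]
hits_b = [3, 4, 8, 9]
print(mfn_and(hits_a, hits_b))
print(mfn_or(hits_a, hits_b))
print(mfn_not(hits_a, hits_b))
```

Each operation runs in a single pass over both lists and keeps the result sorted, so results can be fed straight into the next operator.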
usage
While isis lacks most features of an RDBMS like complex relations
between different entities, its flexibility comes in handy for
many catalogues and directories with highly varying records and
one single level of substructure, which today are usually
modelled in XML documents rather than table rows.
In other words, isis is an ideal store for many XML applications.
The flexible indexing mechanism combines the best of full text
searching and structured retrieval.
The mother of all isis software is a DOS version of "MicroISIS"
as an integrated system with textual user interface.
There is a BSD version of "CDS/ISIS" which
also runs under linux up to some 2.2.x kernels
(current 2.4 kernels do not support the iBCS module for COFF binaries).
Then there are several versions of "WinISIS" (M$-Windows only,
but runs under linux/wine).
A shared library version "isis.dll" of functions
to access an isis db from your code exists, despite its name,
also in a linux version ("isilux");
however, you need some pretty special libs to make it run.
A set of command line tools ("cisis") performs tasks like importing
ISO2709 bibliographic databases, inverting (index building) etc.
The closest thing to an isis database server is "wwwisis",
which runs as CGI or from the command line and performs
most isis tasks (Windows and Linux versions). However, it actually
runs per request, not as a server itself, and thus cannot
provide concurrency control.
This "official" isis software, which is maintained by
Unesco and/or Bireme,
is accompanied by a couple of independent developments,
some of which are in the public domain.
Javaisis is an AWT-based GUI (3.5 uses SWING) and a corresponding server,
which in turn uses wwwisis.
Robert Janusz wrote a C-lib (iAPI) from scratch,
which was the starting point for the openisis software.
So why are we writing the openisis software?
Because Isis is not open source software, it's not even free software,
and that leads to a whole bunch of problems.
- Availability (in theory)
Versions of the software exist for some operating systems,
library versions and languages.
For other environments, there is no version of the software,
and there is not much one can do about it.
- Availability (in practice)
You may download most of the software, but it's partly protected with passwords,
which you have to order from some national distributor.
You have to pay some fee and/or declare some good reasons
why you want to use the software.
Then you have to wait. In Germany, for example,
it didn't work at all for some time, until the newly founded
Isisnetz remedied the situation.
- Availability (in legal terms)
Some parts of the software are accompanied by different documents
stating some license terms, others are not.
Terms seem to be pretty different between countries.
One cannot easily figure out what exactly might be allowed usage.
- Availability (of documentation)
Some documentation is available in English,
some only in Portuguese, Spanish or Italian.
Only a small part is downloadable at all, most is paperware.
- Bugfixing
There is no way one can fix a bug,
and not much one can do to get somebody else to fix it.
- Extending
The only way one could write a Binding for perl or Java
would be using the isis.dll.
There are problems with regard to required additional libraries
(especially some C++ stuff), there are no statements about
thread safety, unicode compatibility and so on.
As a consequence, it's practically impossible to write a
state-of-the-art web application based on an isis db.
- Improving
Many users develop useful ideas for improvements from practice.
Their expertise is lost as they are not able to turn it into
improved software.
- Enabling
While open source software enables people all over the world
to shape their tools themselves, closed software leaves them
dependent.
To address these problems we feel the need for an open source
implementation of isis.
Of course it would be best to have all of the existing isis code
under one or the other form of open license (GPL, LGPL, artistic
or similar as appropriate).
On the other hand, an independent secondary implementation has
advantages in its own right. It may have a different focus
and develop strengths in one aspect while another approach
performs better in other situations. For example,
openisis will have some support for multithreading and unicode,
which is paid for by a certain overhead.
A rewrite by developers with a different background might
introduce new ideas which finally, after having had their
independent test bed, help improve the standard.
OpenIsis as a software to access isis databases is and will be freely
available for everybody with full sourcecode, no fee, no restrictions.
In general, there are no plans to reimplement every piece
of code ever written for isis. To be of practical value,
OpenIsis has to maintain compatibility in the format of
the database files anyway. So, one may use winisis or
whatever existing import scripts to create and maintain
the database, yet deploy OpenIsis' perl interface to run
powerful reports and the Java Native Interface to allow
queries from a Servlet based web application.
OpenIsis will focus on providing tools rather than applications.
For example, there will be no attempt to mirror functionality
of winisis until the GUI toolkit is done. To achieve this,
OpenIsis provides access from the most important programming
languages: Java and PHP for the web (DONE), perl for the scripts (DONE) and
Tcl/Tk for platform independent GUIs (partly DONE).
All others can, of course, link the lib.
Next steps:
- make file layout configurable to allow for larger db (DONE)
- implement searching (full-scan searching DONE)
- implement index-based searching (DONE)
- more performance: try std (DONE, performs badly)
and homegrown io buffering,
further accelerate loop in ldb's convert function (DONE)
- start work on thread-safe pure-java implementation
(cancelled due to lack of demand)
- prepare binary releases for windows (.exe and .dll for java) (DONE)
- implement query language (simple version DONE)
- implement formatting (simple version DONE)
- implement writing data (masterfile DONE, index underway)
- finish Tcl binding, create GUI version using TK (-> Erik)
- implement server version
- ... volunteers are welcome !
Start by downloading the Software.
Unpack everything in some arbitrary directory.
For the tests, you will also need some isis database,
which must be located as files db/cds/cds.*.
Try this one.
Make sure filenames are lowercase.
If you are on Windows, you should either get yourself the cygwin
environment with tools like gmake and gcc or volunteer as a porter
and start writing the Makefile for your make and compiler.
Erik has built a Windows version using mingw and Linux gcc as cross-compiler.
If you are on Linux, everything is fine.
Ports to Mac OS X and other UNIXes should be no problem.
Type "make" and enjoy the compiler messages.
(If your make complains, e.g. on BSD, try "gmake").
Type "make demo" and enjoy your first open isis record.
(You installed a db/cds/cds.*, didn't you? It has 42 rows?)
Type "make run" and watch the guts of your db passing by.
Type "make test"; there should be no difference between testout.txt
and the testres.txt as provided
(using this cds database from winisis
and this 15 MB 58.000+ row unesb.zip db).
Type "make time" to measure performance,
subsequent tries are usually much faster.
My 800 MHz P3 random-reads more than 179.000 records a second,
once the files are in the system cache.
Typical values:
time ./openisis -perf 1000000 -db db/cds/cds >/dev/null
real 0m5.655s
user 0m3.650s
sys 0m2.000s
time ./openisis -perf 100000 -db db/unesb/unesb >/dev/null
real 0m0.991s
user 0m0.670s
sys 0m0.320s
time ./openisis -fmt mfnf -search 'k$' -db db/unesb/unesb >/dev/null
860 rows for k
real 0m0.044s
user 0m0.040s
sys 0m0.000s
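To turn such timings into a throughput figure, divide the number of reads by the elapsed ("real") time. The short Python calculation below does this for the two -perf runs above, assuming the -perf argument is the number of records read; the helper name records_per_second is made up for the example.

```python
def records_per_second(count, seconds):
    """Throughput for `count` record reads taking `seconds` of wall time."""
    return count / seconds


# Figures taken from the "Typical values" listing above.
cds = records_per_second(1_000_000, 5.655)
unesb = records_per_second(100_000, 0.991)

print(f"cds:   {cds:,.0f} records/s")
print(f"unesb: {unesb:,.0f} records/s")
```

For these sample runs this works out to roughly 177.000 records a second on the cds db and roughly 101.000 on the larger unesb db.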
Type "make perl" to build the perl stuff;
some perl 5.* must be installed beforehand.
Type "make java" or, if you just can't get enough, "make jdump",
to see it all happen in your shiny new JDK1.3 Java VM.
Some 1.2.* JDKs should do, but tell the Makefile to not
look in /usr/java/jdk1.3 by setting JAVAHOME.
libopenisis.a can be linked with your code; no installation necessary.
You may want to install the 'openisis' binary somewhere in your path
for the fun of it; go ahead, just copy, no magic registry entries.
To install the perl stuff for general availability in your
/usr/lib/perl5 or whatever, cd to the perl subdir (after "make perl")
and issue "make install" (as root or otherwise legitimated).
After that, try "perldoc OpenIsis" and the demo.pl script.
Java, like perl, needs to dynamically slurp both some stuff in
its own language and a native shared object.
The former is openisis.jar; set your CLASSPATH to include it,
or specify it when invoking java, as in the Makefile.
The latter is libopenjsis.so on linux (yes, it's jsis).
The system dynamic linker must be able to find it;
see NativeDb.java for details.
OpenIsis.org is sponsored by allmaxx,
a service of merconic, Berlin, Germany.
As a students' site, allmaxx supports open software with a focus on
education and knowledge management.
See also the open community for science.
Currently the site is maintained by
Erik and Paul.
Volunteers are very welcome.
Openisis sources are available at sourceforge,
side by side with Franck Martin's PHP isis project. Thanks, Franck!
openisis and PHP isis at sourceforge
isis core sites:
Unesco
Bireme
documentation:
THE BOOK CDS/ISIS reference manual incl. data formats (en espanol)
standards:
ISO2709 "Information Interchange Format", a.k.a. ANSI/NISO Z39.2
(US) MARC 21, overview
Z39.50, overview at OCLC|Pica, links at indexdata,
makers of excellent free Z39.50 software.
people and projects:
Robert Janusz' iAPI
Kafkas Caprazli's EVERYTHING about CDS/ISIS
open source software for libraries
javaisis
Institut Teknologi Bandung
IsisOnline in Indonesia
user groups:
Netherlands / international
UK (ISIS PLUS)
Germany (isisnetz)
staff:
Erik Grziwotz
Klaus "Paul" Ripke
Braulio José Solano Rojas
ISIS, charsets and unicode
What is it about ISIS that makes it ISIS?
the universal ISIS record
record writing implementation
multi-threading performance
$Revision: 1.32 $ last changed $Date: 2002/10/21 10:24:16 $ by $Author: kripke $