1 Representing ISIS data in a relational database
3 If you consider setting up a table with columns "author", "title"
4 and so on, don't do it. The following structure gives you at least
5 some of the benefits of ISIS, because it's shaped more closely
6 to the ISIS data model.
10 It is not very difficult to think of a table structure
11 for lossless representation of the contents of an ISIS masterfile:
14 mfn number, -- master file number
15 fld number, -- counting fields in this masterfile record
16 tag number, -- the field tag
17 val varchar(2048) -- field's content
21 - a row in this table does NOT hold an ISIS record,
22 but one field of an ISIS record
23 - each ISIS record is represented by all table rows
25 - you may omit the column "fld" if you don't care for the ordering
26 - you may choose other datatypes for the field tag,
27 for example number(3) or char(3),
28 depending on the compatibility required (e.g. MARC)
29 - you may choose other dimensioning for the val column.
30 ISIS can hold up to 32KB (signed short) in one field,
31 Oracle has a maximum of 4KB on the varchar type.
35 Retrieving row 123 is easily done with
37 SELECT tag, val WHERE mfn = 123 ORDER BY fld
39 It is suggested that you fill the val column with native
40 ISIS data as of "MPL" proof mode. Then you have to extract
41 subfields and perform other substitutions for output.
42 This can be achieved for example using the v-method of the
43 > /jdoc/org/openisis/Field.html Field
44 Java class (it's a static method of about 100 lines,
45 which you may grab and port to your system).
48 An alternative would be to pre-split the subfields
55 sub char(1), -- subfield code
56 val varchar(2048) -- subfield's content
59 A typical keywords field like
61 <deltas><humid zones><tropical zones>
63 should be split into three subfields using any subfield code, say '>'.
65 The advantage of the pre-split approach is that it makes both
66 display and indexing easier.
67 The disadvantage is, that you'll loose some information,
72 Searching is also relatively easy, as long as you ask simple questions ;).
73 To get a list of the mfns with a field 24 containing Twain, use
75 SELECT DISTINCT(mfn) FROM Mst WHERE val = 'Twain' AND tag = 24
77 For this to work efficiently, you have to put an index on val.
78 You may than search for 'Twain' with or without the restriction
79 on a certain tag, just like in ISIS.
80 You can also use right truncation and restrict on a couple of tags:
82 SELECT DISTINCT(mfn) FROM Mst WHERE val LIKE 'Twain%' AND tag IN (201,401)
84 Right truncation should still use the index on val,
85 thus be reasonable efficient (depending on how smart your RDB is).
86 With the pre-split structure as above, this applies to any subfield.
87 If you want to find Twain anywhere within the field's values, you need
89 SELECT DISTINCT(mfn) FROM Mst WHERE val LIKE '%Twain%'
91 This will perform a pretty costly full table scan,
92 so you may go fetch yourself a coffee meanwhile.
95 While you may use SQLs OR predicate to combine several terms as
96 alternatives, the ISIS AND does NOT map to SQLs AND,
97 but to an INTERSECTion on the mfn lists.
98 It's actually pretty much the same as ISIS does internally (but more costly).
102 Still, this is lightyears away from the possibilities of an ISIS db.
103 The index on val does not do word splits, does not use stopwords,
104 is not case insensitive, does not apply character or word conversion
107 To come anywhere near the features of ISIS, you have to do just
108 what ISIS does: set up a separate "Inverted File".
111 val varchar(30), -- index entry
112 mfn number, -- record containing entry
113 tag number, -- tag of field containing entry
114 occ number, -- field is occ'th occurence of field tag in record
115 pos number -- entry is pos'th word in field
118 To fill this table, you could of course go and reimplement
119 all the features of an ISIS .FST in, say, Oracle's PL/SQL
120 and recreate one record's entries from within an INSERT/UPDATE trigger.
122 Much easier, given you had a true ISIS database in the first place,
123 is to load the .LK1 and .LK2 text files created during full index update.
124 Some ISIS tools also allow you to create these files separately.
126 Searching is performed against the Index table
127 (which of course needs an RDB index at least on val),
129 The latter is only used to retrieve the full content for a given mfn,
130 or for full table scans.
132 * so why then use ISIS in the first place?
134 There is a long list of reasons, here go a few of them:
135 - You don't want to reimplement the .FST, the full formatting language,
136 all the nifty details of the various indexing techniques and so on.
137 Actually, if you DO want, you're very welcome in the OpenIsis team ;)
138 - Not only the index building, but also data entry is done much more
139 comfortable using ISIS tools.
140 - Any queries but the most simple examples given above will not only
141 be somewhat complex to write in SQL, but will also execute much slower
142 than in a native ISIS implementation (consider publishing with OpenIsis).
143 - Especially the NEAR operators can't easily be expressed in SQL.
144 Although the index structure given above contains the necessary data
145 in occ and pos, advanced SQL features like nested queries are needed
146 in the appropriate SQL query. You need to have an advanced (read:
147 expensive) RDBMS and a deep and thorough understanding of it's
148 query optimizer and hinting system for this to run in any
150 - So why then use a RDBMS in the first place?
153 > whatabout a comparision of ISIS and other database systems
155 > unirec notes on the flexibility of the ISIS data model.
157 $Id: RdbConv.txt,v 1.4 2002/11/26 16:57:19 kripke Exp $