Documentation describing usage of lookups

author Dobrica Pavlinusic <dpavlin@rot13.org>

Sun, 8 Feb 2004 15:44:28 +0000 (15:44 +0000)

committer Dobrica Pavlinusic <dpavlin@rot13.org>

Sun, 8 Feb 2004 15:44:28 +0000 (15:44 +0000)
diff --git a/doc/lookup.txt b/doc/lookup.txt

new file mode 100644 (file)

index 0000000..ea8fc2b
--- /dev/null
+++ b/doc/lookup.txt
@@ -0,0 +1,169 @@
+How do I look up a value for my output?
+
+
+You might want to use this feature if you want to display something that is
+related to the current record.
+
+All lookups are modelled around the key => value(s) idea, so you can store
+any value attached to a unique key. Both key and value can contain fields
+from any import format as well as fixed strings (delimiters, prefixes etc.)
+
+First, it's important that the database which creates the key => value data
+is specified in all2xml.conf before the database that uses those values.
+
+Second, that usually means you will need two database configurations in
+all2xml.conf which point to the same database if you want to look up records
+from that same database. I would suggest having two import_xml/ files: one
+which just stores lookup keys and values (and thus executes faster) and
+another which creates output for swish and the indexer and just uses the
+lookup.
+
+
+1. Lookup to other database (using type="lookup_key" and lookup="1")
+
+For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms
+with unique identifiers in field 900, and we want those terms for display.
+
+The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
+which holds the value of field 900 from the thesaurus entry. While that's
+enough to create links in search results (using links and format, see
+doc/links.txt), we would like to display the term from the thesaurus and not
+the value of field 900.
+
+In the first step, we store fields from the thesaurus (as value) that relate
+to field 900 for that entry (which is the key) using the following XML (in
+import_xml/isis_hidra_ths.xml):
+
+       <IDths name="ID" order="300">
+               <isis type="lookup_key">900</isis>
+       </IDths>
+
+       <SubjectIndex name="Predmetno kazalo" order="301">
+               <isis type="lookup_val">[5624] 562a</isis>
+       </SubjectIndex>
+
+This will create a lookup which you might write like this:
+
+       900 => "[5624] 562a"
+
+Quotes are added to denote that the value is a single entry.
+We also have to specify something like this in all2xml.conf:
+
+       lookup_newfile=/data/webpac/thes.lookup
+
+which will create a new lookup file.
+
+For the bibliographic database, which will do lookups into the previously
+created file, all2xml.conf must have:
+
+       lookup_open=/data/webpac/thes.lookup
+
+and then in import_xml/ we use:
+
+       <isis lookup="1">6013</isis>
+
+The value of field 6103 must exactly match field 900 (which is the key) from
+the thesaurus. You can, however, add an arbitrary prefix or suffix to the key
+in order to store unrelated keys and values in the same lookup.
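+
+The idea above can be sketched in a few lines of Python. This is only an
+illustration, not code from all2xml.pl: a plain dict stands in for the
+on-disk lookup file, the record layout, the helper names build_lookup and
+resolve, the "ths:" prefix and the sample values are all made up, and the
+way "[5624] 562a" is filled in is an assumed reading of the template.
+
```python
# Sketch only: a plain dict stands in for the on-disk lookup file.
# Record layout and helper names are hypothetical, not all2xml.pl internals.

def build_lookup(thesaurus_records, prefix=""):
    """Store one display value under each (optionally prefixed) field-900 key."""
    lookup = {}
    for rec in thesaurus_records:
        # Assumed reading of "[5624] 562a": field values replace field names.
        lookup[prefix + rec["900"]] = "[%s] %s" % (rec["5624"], rec["562a"])
    return lookup


def resolve(lookup, key, prefix=""):
    """Return the stored value; the key must match exactly, else None."""
    return lookup.get(prefix + key)


# Made-up sample records, keyed by field name.
thesaurus = [
    {"900": "1001", "5624": "A1", "562a": "Libraries"},
    {"900": "1002", "5624": "A2", "562a": "Archives"},
]

lookup = build_lookup(thesaurus, prefix="ths:")
print(resolve(lookup, "1001", prefix="ths:"))   # [A1] Libraries
print(resolve(lookup, "1001 ", prefix="ths:"))  # None (no exact match)
```
+
+The prefix keeps keys from unrelated sources apart when they share one
+lookup, and the second call shows why the match has to be exact.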
+
+
+1.1 NOTE about memory usage:
+
+These lookups are created on disk. The default configuration also creates a
+memory cache for faster indexing, which you can turn off by changing the line
+
+ my $use_lhash_cache = 1;
+
+in all2xml.pl to
+
+ my $use_lhash_cache = 0;
+
+You probably won't need to do that, so it's not a configuration option.
+
+
+2. Lookup that has to store more than one value
+
+While the lookups described above are sufficient when you want to store just
+one value associated with one key, they don't quite help us if we need to
+have more than one value for each key.
+
+A typical example of that might be displaying narrower terms in a thesaurus.
+Each narrower term has the id of its parent term (which is enough to display
+the narrower term), but we would also like to display all sibling terms with
+each term.
+
+So, under the key of the parent term we'll store the keys of all terms which
+are siblings. But we would also like to display terms and not term numbers.
+That requires first finding all sibling terms (a lookup returning one or more
+term ids) and then looking up the names of those returned terms for display.
+
+It's usually called indirect lookup, and is much hated by CS majors in their
+freshman year. Later, it becomes so natural that you think it's the only way
+to solve a problem. So, you are stuck with it :-)
+
+Since lookups can return more than one value, and we would like to use format
+to create links, this lookup is implemented as filter="mem_lookup". Let's
+look at an example.
+
+       <LookupThesNT name="lookup for thesaurus narrow term">
+               <!--
+                       Store value of field 250a (for display) in key composed
+                       of prefix "d:" and value of field 900.
+                       This is one key - one value lookup.
+               -->
+               <isis filter="mem_lookup" type="display">d:900 => 250a</isis>
+
+               <!--
+                       Now, for each entry generate a key from the parent
+                       ID (fields 5614, 5624, 4611 with prefix "a:")
+                       and use the value of field 900 as the value.
+                       That will create a lookup which can (and will) have
+                       more than one value for each key (because a parent
+                       term has more than one child).
+               -->
+               <isis filter="mem_lookup" type="display">a:5614:5624:4611 => 900</isis>
+
+       </LookupThesNT>
+
+So, after we index the database with an import_xml file which has the
+mem_lookup filter (which won't create any output for swish or the index), we
+have just two lookups stored in memory (that's where the name mem_lookup
+comes from):
+
+       d:900 => 250a
+
+       a:5614:5624:4611 => 900 900 900 900 900 ...
+
+The actual key of the second ("a:") lookup can have the form a:5614,
+a:5614:5624 or a:5614:5624:4611 depending on the record (micro-thesaurus
+terms have just 5614, and descriptors have 5614 and 5624 or all of them,
+depending on level).
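+
+As a rough sketch of what ends up in memory, the two lookups could be
+modelled like this in Python. It only illustrates the data shape: the
+function names parent_key and build_mem_lookups, the dict-based record layout
+and the sample records are assumptions and are not taken from all2xml.pl.
+
```python
from collections import defaultdict

# Sketch only: models the two in-memory lookups as Python dicts.
PARENT_FIELDS = ("5614", "5624", "4611")


def parent_key(rec):
    """Build the "a:" key from whichever parent fields the record has."""
    parts = [rec[f] for f in PARENT_FIELDS if rec.get(f)]
    return "a:" + ":".join(parts) if parts else None


def build_mem_lookups(records):
    """Return (display, children): d:900 => 250a and a:... => [900, ...]."""
    display = {}
    children = defaultdict(list)
    for rec in records:
        display["d:" + rec["900"]] = rec["250a"]      # one key, one value
        key = parent_key(rec)
        if key is not None:
            children[key].append(rec["900"])          # one key, many values
    return display, dict(children)


# Made-up sample records, keyed by field name.
records = [
    {"900": "251", "250a": "Science"},
    {"900": "300", "250a": "Physics", "5614": "251"},
    {"900": "301", "250a": "Chemistry", "5614": "251"},
    {"900": "400", "250a": "Optics", "5614": "251", "5624": "300"},
]

display, children = build_mem_lookups(records)
print(children["a:251"])      # ['300', '301']
print(children["a:251:300"])  # ['400']
print(display["d:300"])       # Physics
```
+
+Note how the length of the "a:" key depends on how many parent fields a
+record carries, which matches the three key forms listed above.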
+
+Now, let's display some of those lookups.
+
+First, we can display the ids of all entries which are children of field 251:
+
+       <isis type="display" filter="mem_lookup">[a:251]</isis>
+
+That's not very useful, because we would like to display terms and not
+ids, possibly separated by " * ":
+
+       <isis type="display" filter="mem_lookup" delimiter=" * ">[d:[a:251]]</isis>
+
+That's great. But, let's link those fields using format:
+
+       <format name="IDths"><![CDATA[
+               <a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>
+       ]]></format>
+
+       <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">[a:251];;[d:[a:251]]</isis>
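+
+To make the indirect lookup concrete, here is a small Python sketch of what
+the tag above is expected to produce: first resolve [a:251] to a list of ids,
+then resolve each id through the "d:" lookup, fill the format once per id and
+join the results with the delimiter. The two dicts, the helper name
+render_children and the sample values are hypothetical, and the real
+mem_lookup filter may well work differently inside.
+
```python
# Sketch only: hand-filled dicts stand in for the two mem_lookup tables.
display = {"d:300": "Physics", "d:301": "Chemistry"}
children = {"a:251": ["300", "301"]}

# Same link template as the IDths format: first %s is the id, second the term.
LINK = '<a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>'


def render_children(parent_id, delimiter=" * "):
    """Resolve parent -> child ids -> child names and format each as a link."""
    ids = children.get("a:" + parent_id, [])
    links = []
    for child_id in ids:
        # Fall back to the raw id if no display value was stored for it.
        name = display.get("d:" + child_id, child_id)
        links.append(LINK % (child_id, name))
    return delimiter.join(links)


print(render_children("251"))
```
+
+A parent without stored children simply yields an empty string in this
+sketch.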
+
+
+There is only one problem left. Since we want to display just the child
+records of the current record, we have to use three different tags to display
+child records (for field, micro-thesaurus and term). However, that means that
+a term will also display all child fields and child micro-thesaurus terms,
+which isn't what's needed.
+
+But each record also has its own level written in 901a, so we can select
+just the correct child entries using something like:
+
+       <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]</isis>
+