From e10ef24d73868411ad330c3fe07479ed7ecee5ea Mon Sep 17 00:00:00 2001
From: Dobrica Pavlinusic
Date: Sun, 8 Feb 2004 15:44:28 +0000
Subject: [PATCH] Documentation describing usage of lookups

git-svn-id: file:///home/dpavlin/private/svn/webpac/trunk@222 13eb9ef6-21d5-0310-b721-a9d68796d827
---
 doc/lookup.txt | 169 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100644 doc/lookup.txt

diff --git a/doc/lookup.txt b/doc/lookup.txt
new file mode 100644
index 0000000..ea8fc2b
--- /dev/null
+++ b/doc/lookup.txt
@@ -0,0 +1,169 @@
+How to look up some value in my output?
+
+
+You might want to use this feature if you want to display something that is
+related to the current record.
+
+All lookups are modelled around the key => value(s) idea, so you can store any
+value attached to a unique key. Both key and value can contain fields from
+any import format or fixed values (delimiters, prefixes etc.)
+
+First, it's important that the database which creates key => value data
+is specified in all2xml.conf before the database that uses those values.
+
+Second, that usually means that you will need two database
+configurations in all2xml.conf which point to the same database if you want to
+look up records from the same database. I would suggest having two import_xml/
+files: one which just stores lookup keys and values (and thus executes
+faster) and another which creates output for swish and the indexer and just
+uses the lookup.
+
+
+1. Lookup to other database (using type="lookup_key" and lookup="1")
+
+For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms which
+have unique identifiers in field 900, and we want those terms for display.
+
+The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
+which holds field 900 of the thesaurus entry. While that's enough to create
+links in search results (using links and format, see doc/links.txt), we would
+like to display the term from the thesaurus and not the value of field 900.
+
+In the first step, we store fields from the thesaurus (as value) that relate to
+field 900 of that entry (which is the key) using the following XML (in
+import_xml/isis_hidra_ths.xml):
+
+
+    900
+
+
+
+    [5624] 562a
+
+
+This will create a lookup which you might write like this:
+
+    900 => "[5624] 562a"
+
+Quotes are added to denote that the value is a single entry.
+We also have to specify in all2xml.conf something like:
+
+    lookup_newfile=/data/webpac/thes.lookup
+
+which will create a new lookup file.
+
+For the bibliographic database, which will do lookups into the previously
+created file, all2xml.conf must have:
+
+    lookup_open=/data/webpac/thes.lookup
+
+and then in import_xml/ we use:
+
+    6013
+
+Value of field 6103 must exactly match field 900 (which is the key) from the
+thesaurus. You can, however, add an arbitrary prefix or suffix to store
+unrelated keys and values in the same lookup.
+
+
+1.1 NOTE about memory usage:
+
+These lookups are created on disk. The default configuration also creates a
+memory cache for faster indexing, which you can turn off by changing the line
+
+    my $use_lhash_cache = 1;
+
+in all2xml.pl to
+
+    my $use_lhash_cache = 0;
+
+You probably won't need to do that, so it's not a configuration option.
+
+
+2. Lookup that has to store more than one value
+
+While the lookups described above are sufficient when you want to store just one
+value associated with one key, they don't quite help us if we need to have
+more than one value for each key.
+
+A typical example of that might be displaying narrower terms in a thesaurus.
+Each narrower term has the id of its parent term (which is enough to display
+the narrower term), but we would also like to display all sibling terms with
+each term.
+
+So, under the key of the parent term we'll store the keys of all sibling terms.
+But we would also like to display terms and not term numbers. That requires
+first finding all sibling terms (a lookup returning one or more term ids)
+and then looking up the names of those returned terms for display.
+
+This is usually called indirect lookup, and is much hated by CS majors in their
+freshman year. Later, it becomes so natural that you think it's the only way
+to solve the problem. So, you are stuck with it :-)
+
+Since lookups can return more than one value, and we would like to use format
+to create links, this lookup is implemented as filter="mem_lookup". Let's
+look at an example.
+
+
+    d:900 => 250a
+
+
+    a:5614:5624:4611 => 900
+
+
+
+So, after we index the database with an import_xml which has the mem_lookup
+filter (which won't create any output for swish or the index), we have just two
+lookups stored in memory (that's where the name mem_lookup comes from):
+
+    d:900 => 250a
+
+    a:5614:5624:4611 => 900 900 900 900 900 ...
+
+The actual key of the second ("a:") lookup can have the form a:5614, a:5614:5624
+or a:5614:5624:4611 depending on the record (micro-thesaurus terms have just
+5614, and descriptors have 5614 and 5624 or all of them, depending on level).
+
+Now, let's display some of those lookups.
+
+First, we can display the ids of all children of field 251:
+
+    [a:251]
+
+That's not very useful, because we would like to display terms, and not
+ids, possibly separated by " * ".
+
+    [d:[a:251]]
+
+That's great. But let's link those fields using format:
+
+    %s
+    ]]>
+
+    [a:251];;[d:[a:251]]
+
+
+There is only one problem left. Since we want to display just the child records
+of the current record, we have to use three different tags to display child
+records (for field, micro-thesaurus and term). However, that means that a
+term will also display all child fields and child micro-thesaurus terms, which
+isn't what's needed.
+
+But each record also has its own level written in 901a, so we can select
+just the correct child entries using something like:
+
+    eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]
+
-- 
2.20.1
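
The indirect lookup described in section 2 of doc/lookup.txt can be modelled
in a few lines. The sketch below is only an illustration in Python, not the
code used by all2xml.pl: the function name expand, the LOOKUPS table and all
ids and terms in it are invented sample data. It resolves the innermost
[key] reference first, and a key that holds several values produces several
results. How the real filter pairs values in a template such as
[a:251];;[d:[a:251]] is not modelled here.

```python
import re

# Hypothetical in-memory lookups modelled on the "a:" and "d:" examples.
# All ids and terms are invented sample data, not from a real database.
LOOKUPS = {
    "a:251": ["300", "301"],  # parent id -> ids of child records
    "d:300": ["Term A"],      # record id -> term to display
    "d:301": ["Term B"],
}

# Matches a [key] reference that contains no further brackets (the innermost one).
INNERMOST = re.compile(r"\[([^\[\]]+)\]")


def expand(template, lookups):
    """Return every string produced by resolving [key] references, innermost first."""
    match = INNERMOST.search(template)
    if match is None:
        # Nothing left to resolve: the template is a finished value.
        return [template]
    results = []
    # A key may hold several values; each one yields its own expansion.
    # Unknown keys yield no values, so that branch is silently dropped.
    for value in lookups.get(match.group(1), []):
        replaced = template[:match.start()] + value + template[match.end():]
        results.extend(expand(replaced, lookups))
    return results


if __name__ == "__main__":
    print(expand("[a:251]", LOOKUPS))                   # ids of child records
    print(" * ".join(expand("[d:[a:251]]", LOOKUPS)))   # terms joined by " * "
```

Resolving from the inside out means [d:[a:251]] first becomes [d:300] and
[d:301], and only then are the terms fetched, which is the two-step lookup
the text describes.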