step
- source file CDS/ISIS, MARC, Excel, robots, ...
+ source data CDS/ISIS, MARC, Excel, robots, ...
|
- 1 | apply import normalisation rules (xml)
+ 0 | apply lookup rules (optional)
+ 1 | apply input normalisation rules (xml or yaml)
V
- intermidiate this data is re-formatted source data converted
- data to chunks based on tag names from config/input/*.xml
+ intermediate this data is re-formatted source data converted
+ data to chunks based on tag names from config/input/
|
- 2 | apply output filter (TT2)
+ 2 | optionally apply output filter (TT2)
V
- data search engine, HTML, OAI, RDBMS
+ data search engine, HTML, OAI, RDBMS
|
3 | filter using query in REST format
4 | apply output filter (TT2)
V
- client Web browser, SOAP
+ client Web browser (html), JSON
-=head2 Normalisation and Intermidiate data
+=head2 Source data
-This is first step in working with your data.
+WebPAC supports various input formats:
+
+=over 2
+
+=item L<WebPAC::Input::ISIS> CDS/ISIS data
+
+=item L<WebPAC::Input::MARC> for MARC records
+
+=item L<WebPAC::Input::Excel> Microsoft Excel C<.xls> support
+
+=item L<WebPAC::Input::DBF> support for legacy tables (e.g. Clipper)
+
+=item L<WebPAC::Input::Gutenberg> for RDF catalog data from Project Gutenberg
+
+=back
+
+=head2 Create data lookups
+
+Before you can begin normalisation, you might want to create lookups which store
+C<< key -> value(s) >> pair(s). Lookups are especially useful if you want to,
+I<well>, look up a value from some other record using some sort of identifier.
+
+Lookups are described in more detail in L<WebPAC::Lookup>.
+
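The idea of a C<< key -> value(s) >> lookup can be sketched in plain Perl. This is only an illustration under stated assumptions: the record fields (C<id>, C<title>, C<parent>) and the C<lookup_values> helper are hypothetical and are not the L<WebPAC::Lookup> API.

```perl
use strict;
use warnings;

# Hypothetical records; the field names are illustrative, not WebPAC's.
my @records = (
    { id => 'a1', title => 'Parent title', parent => undef },
    { id => 'b2', title => 'Child title',  parent => 'a1' },
);

# Build the lookup: each key maps to a list of values.
my %lookup;
for my $rec (@records) {
    push @{ $lookup{ $rec->{id} } }, $rec->{title};
}

# Return all values stored under a key, or an empty list if there are none.
sub lookup_values {
    my ($key) = @_;
    return () unless defined($key) && exists( $lookup{$key} );
    return @{ $lookup{$key} };
}

# Resolve a value of another record through its identifier.
my @parent_titles = lookup_values( $records[1]{parent} );
print "parent of b2: @parent_titles\n";
```

Storing a list under each key keeps the sketch usable when several records share one identifier.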
+=head2 Normalisation to intermediate data
+
+Intermediate data is the internal representation of data on which WebPAC operates.
You are creating mappings, one-to-one from source data records to documents
-in webpac. You can split or merge data from input records, apply filters
-(perl subroutines), use lookups within same source file or do simple
-evaluations while producing output.
+in WebPAC. You can split or merge data from input records, apply regexes,
+use lookups within the same source file, and use conditions, branches and/or
+simple evaluations while producing intermediate data.
+
+All that is controlled with the C<config/config.yml> configuration file.
+This file is in human-readable YAML format, and it describes the whole configuration of
+WebPAC and its front-end, Webpacus.
-All that is controlled with C<config/input/*.xml> configuration file. You
+
+All that is controlled with C<config/input/> configuration files. You
will want to create fine-grained chunks of data (like separate first and
last name), which will later be used to produce output. You can think of
-conversation process as application of C<config/input/*.xml> recepie on
+conversion process as application of the C<config/input/> recipe to
every input record.
Each tag within the recipe is creating one new record as long as there are
formatting or specification of output type and that granularity of each tag
has increased.
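As a rough illustration of producing fine-grained chunks, the sketch below splits one source field into separate first and last name values. It is plain Perl, not the actual C<config/input/> rule syntax; the MARC-like tags (C<700>, C<245>), the output keys and the C<normalise> helper are all assumptions made for the example.

```perl
use strict;
use warnings;

# Turn one hypothetical source record (tag => value) into named chunks.
sub normalise {
    my ($rec) = @_;
    my %out;

    # Split "Last, First" into two separate chunks.
    if ( defined $rec->{700}
        && $rec->{700} =~ /^\s*([^,]+?)\s*,\s*(.+?)\s*$/ )
    {
        @out{qw(last_name first_name)} = ( $1, $2 );
    }

    # Copy the title through unchanged.
    $out{title} = $rec->{245} if defined $rec->{245};

    return \%out;
}

my $chunks = normalise( { 700 => 'Doe, John', 245 => 'Example title' } );
print join( ', ', map { "$_=$chunks->{$_}" } sort keys %$chunks ), "\n";
```

Keeping first and last name apart at this stage lets a later output step combine them in whatever order it needs.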
+B<this document should really be updated to reflect Webpacus front-end from
+this point...>
+
=head2 Output filter
Now that we have normalized record, we can create some output. You can create