IMPORTANT - refactor MARC character set handling
author Galen Charlton <galen.charlton@liblime.com>
Fri, 1 Feb 2008 00:27:04 +0000 (18:27 -0600)
committer Joshua Ferraro <jmf@liblime.com>
Sun, 3 Feb 2008 13:23:56 +0000 (07:23 -0600)
commit 60a98d258addadd6e642dea4483b0451b0fe37f7
tree 80611f0965e443d4d0450acd043b4696c49fb7de
parent 0f0699f2c2bd2d3a5dff27e0cdc199c9050ecfea

The new C4::Charset module provides:

* IsStringUTF8ish - determine if scalar contains a string in UTF-8
* MarcToUTF8Record - convert MARC blob or MARC::Record to UTF-8
* SetMarcUnicodeFlag - set appropriate MARC21 or UNIMARC field to
  indicate that record is in UTF-8.
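
As a rough illustration of two of the checks listed above, here is a
minimal sketch using only core Perl. It is not the code from
C4/Charset.pm: the helper names looks_like_utf8 and
flag_marc21_leader_as_unicode are hypothetical, and the second helper
covers only the MARC21 case (leader position 9 set to 'a'), working on
a bare leader string rather than a MARC::Record object.

```perl
use strict;
use warnings;

# Hypothetical helper (not the C4::Charset implementation): true when the
# scalar is already a Perl character string or holds well-formed UTF-8 octets.
sub looks_like_utf8 {
    my ($str) = @_;                      # copy, so the caller's scalar is untouched
    return 1 if utf8::is_utf8($str);     # already flagged as characters
    return utf8::decode($str) ? 1 : 0;   # false when the octets are not valid UTF-8
}

# Hypothetical helper: mark a 24-character MARC21 leader as Unicode by
# putting 'a' in position 9 (the character coding scheme).
sub flag_marc21_leader_as_unicode {
    my ($leader) = @_;
    die "leader must be 24 characters\n" unless length($leader) == 24;
    substr($leader, 9, 1, 'a');          # 4-argument substr replaces in place
    return $leader;
}
```

A caller could run looks_like_utf8 on incoming data first and set the
leader flag only once the record is known to be in UTF-8.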

Design points of this module include:

* No dependencies on other C4 modules, making it easier to add
  more test cases
* All character conversion code in one place
* Single entry point for doing a character conversion on a
  MARC record
* Capture of errors and warnings produced by Text::Iconv
  and MARC::Charset
* Start of support for guessing the source character set of
  a MARC record.
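
The "single entry point" and "capture of errors and warnings" points
can be sketched roughly as follows. This is an assumption-laden
illustration, not the module's code: it swaps in the core Encode
module for Text::Iconv and MARC::Charset, works on a plain octet
string instead of a MARC record, and the function name
decode_with_messages is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical single entry point: decode octets from $from_charset into a
# Perl character string, returning the result together with any messages
# collected along the way instead of letting them reach STDERR.
sub decode_with_messages {
    my ($octets, $from_charset) = @_;
    my @messages;
    my $converted = eval {
        # Collect warnings raised during conversion.
        local $SIG{__WARN__} = sub { push @messages, "warning: $_[0]" };
        # FB_CROAK turns malformed input into an exception caught by eval.
        decode($from_charset, $octets, FB_CROAK | LEAVE_SRC);
    };
    push @messages, "error: $@" unless defined $converted;
    return ($converted, \@messages);
}
```

Returning the messages alongside the result lets a batch job decide
whether to skip the record, log the problem, or retry with another
source character set.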

Several functions were moved from other scripts
or modules to C4::Charset:

* C4::Koha->FixEncoding (expanded and renamed to
  MarcToUTF8Record)
* C4::Koha->char_decode5426
* fMARC8ToUTF8 from bulkmarcimport.pl (renamed to
  _marc_marc8_to_utf8)

Several batch jobs were adjusted to use MarcToUTF8Record instead of
FixEncoding.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
C4/Breeding.pm
C4/Charset.pm [new file with mode: 0644]
C4/ImportBatch.pm
C4/Koha.pm
cataloguing/z3950_search.pl
misc/migration_tools/bulkmarcimport.pl
t/Charset.t [new file with mode: 0755]