<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article>
  <title>Set Parameters and Migrate Your Data to Koha 2.2</title>

  <articleinfo>
    <pubdate>2005-04-18</pubdate>

    <author>
      <firstname>Paul</firstname>

      <surname>POULAIN</surname>

      <email>paul AT koha-fr.org</email>
    </author>

    <authorblurb>
      <para>Independent free software consultant, Koha "Release Manager"
      for versions 2.0 and 2.2, member of the international steering
      committee</para>
    </authorblurb>

    <copyright>
      <year>2004</year>

      <year>2005</year>

      <holder>Paul Poulain</holder>
    </copyright>

    <legalnotice>
      <para>This document is related to Koha and is licensed to you under the
      GNU General Public License version 2 or later (<ulink
      url="http://www.gnu.org/licenses/gpl.html">http://www.gnu.org/licenses/gpl.html</ulink>).</para>

      <para>Koha-related documents may be reproduced and distributed in whole
      or in part, in any medium physical or electronic, as long as this
      copyright notice is retained on all copies.</para>

      <para>You may create a derivative work and distribute it provided that
      you:</para>

      <orderedlist>
        <listitem>
          <para>License the derivative work with this same license, or the
          Linux Documentation Project License (<ulink
          url="http://www.tldp.org/COPYRIGHT.html">http://www.tldp.org/COPYRIGHT.html</ulink>).
          Include a copyright notice and at least a pointer to the license
          used.</para>
        </listitem>

        <listitem>
          <para>Give due credit to previous authors and major
          contributors.</para>
        </listitem>
      </orderedlist>

      <para>Commercial redistribution is allowed and encouraged; however, the
      author would like to be notified of any such distributions.</para>

      <para>No liability for the contents of this document can be accepted.
      Use the concepts, examples and information at your own risk. There may
      be errors and inaccuracies, that could be damaging to your system.
      Proceed with caution, and although this is highly unlikely, the
      author(s) do not take any responsibility.</para>

      <para>All copyrights are held by their respective owners,
      unless specifically noted otherwise. Use of a term in this document
      should not be regarded as affecting the validity of any trademark or
      service mark. Naming of particular products or brands should not be seen
      as endorsements.</para>
    </legalnotice>

    <revhistory>
      <revision>
        <revnumber>2.2.2</revnumber>

        <date>2005-04-18</date>

        <authorinitials>rsh</authorinitials>

        <revdescription>
          <para>English translation by Regula Sebastiao H.</para>
        </revdescription>
      </revision>

      <revision>
        <revnumber>2.2.0</revnumber>

        <date>2005-01-06</date>

        <authorinitials>pp</authorinitials>

        <revdescription>
          <para>Initial version</para>
        </revdescription>
      </revision>
    </revhistory>
  </articleinfo>

  <section>
    <title>Introduction</title>

    <section>
      <title>General remarks</title>

      <para>Data migration is an iterative process:</para>

      <itemizedlist>
        <listitem>
          <para>Examine the data to migrate (assumed to be in an ISO 2709
          file).</para>
        </listitem>

        <listitem>
          <para>Set up part of the parametrization.</para>
        </listitem>

        <listitem>
          <para>Import the records and check the result.</para>
        </listitem>

        <listitem>
          <para>Refine the parametrization of the bibliographic records.</para>
        </listitem>

        <listitem>
          <para>Re-import the records, and so on.</para>
        </listitem>

        <listitem>
          <para>As soon as the import of the bibliographic records is finished
          and validated, migrate the authority records, which is done with a
          similar iterative process.</para>
        </listitem>
      </itemizedlist>
    </section>

    <section>
      <title>Essentials</title>

      <para>Data migration requires both librarian and IT skills. Don't
      start unless you can bring the two together. If you only have IT
      skills, you may disappoint your users. If you are a librarian, all that
      follows may seem completely obscure to you.</para>

      <para>So if you do not have both skills, it might be a good idea to
      ask for external help.</para>

      <para>Technical essentials:</para>

      <itemizedlist>
        <listitem>
          <para>Koha installation: Koha is assumed to be installed already.
          $KOHA is the directory in which the professional (librarian's)
          interface is stored.</para>

          <programlisting>ls -l $KOHA</programlisting>

          <para>must output: <filename>cgi-bin</filename>
          <filename>htdocs</filename> <filename>modules</filename>
          <filename>scripts</filename></para>
        </listitem>

        <listitem>
          <para>The configuration file of Koha is in
          <filename>/etc/koha.conf</filename></para>
        </listitem>

        <listitem>
          <para>All following commands are executed from a console (shell) on
          the server.</para>
        </listitem>

        <listitem>
          <para>In the shell (command line), make sure that Perl can find
          the Koha modules:</para>

          <programlisting>export PERL5LIB=$KOHA/modules</programlisting>
        </listitem>
      </itemizedlist>
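
      <para>The checks above can be gathered in a small shell function. This
      is only a sketch: the function name check_koha_dir is made up for this
      example, and C4::Context is assumed to be one of the modules found
      under <filename>$KOHA/modules</filename>; adapt both to your
      installation.</para>

      <programlisting>#!/bin/sh
# Pre-flight check before a migration (illustrative sketch, not part of Koha).

# Succeeds only if the given directory holds the four expected sub-directories.
check_koha_dir() {
  for sub in cgi-bin htdocs modules scripts; do
    if [ ! -d "$1/$sub" ]; then
      echo "missing: $1/$sub"
      return 1
    fi
  done
  echo "ok: $1"
}

# Typical use, once KOHA is set in your shell:
#   check_koha_dir "$KOHA" || exit 1
#   [ -r /etc/koha.conf ] || echo "cannot read /etc/koha.conf"
#   export PERL5LIB=$KOHA/modules
#   perl -e 'use C4::Context' || echo "Perl cannot load the Koha modules"</programlisting>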
    </section>
  </section>

  <section>
    <title>First iteration</title>

    <section>
      <title>dumpmarc.pl</title>

      <para>Go to the directory that contains the file of records to migrate.
      In this example it's called <filename>notices.iso</filename>.</para>

      <para>To examine its content, the script
      <filename>dumpmarc.pl</filename> is useful:</para>

      <programlisting>$KOHA/scripts/misc/dumpmarc.pl -f notices.iso</programlisting>

      <para>All the records are displayed in a human-readable form, which is
      not the case for a record in raw "iso2709" format.</para>

      <para>It's possible to stop the display with
      <userinput>CTRL-C</userinput> if you have a lot of records!</para>

      <para>Examine the records displayed, and note the most frequently used
      tags. The following tags probably occur most frequently (UNIMARC tags
      first, with the corresponding MARC21 tags in square brackets):</para>

      <itemizedlist>
        <listitem>
          <para>200, with title and authors. [245]</para>
        </listitem>

        <listitem>
          <para>205, for edition information. [250]</para>
        </listitem>

        <listitem>
          <para>210, publisher. [260]</para>
        </listitem>

        <listitem>
          <para>215, physical description [300]</para>
        </listitem>

        <listitem>
          <para>010, ISBN [020]</para>
        </listitem>

        <listitem>
          <para>011, ISSN [022]</para>
        </listitem>

        <listitem>
          <para>600 to 699 subject authorities</para>
        </listitem>

        <listitem>
          <para>700 to 799 author authorities</para>
        </listitem>
      </itemizedlist>

      <para>But this document is not meant to be a UNIMARC [nor MARC21 ;-)]
      lesson!</para>

      <para>It's also helpful to look at the item records. They should be
      stored in tag 995 if your records follow the 995
      recommendation.</para>
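
      <para>To note the most frequently used tags without reading every
      record by eye, the tags can also be counted straight from the file. The
      following is a minimal sketch and not part of Koha: the function name
      count_tags is made up for this example, and the file is assumed to be
      well-formed ISO 2709, with records directly following one another. It
      relies only on the ISO 2709 layout: a 24-character leader, followed by
      a directory of 12-character entries that each start with a 3-character
      tag.</para>

      <programlisting>#!/bin/sh
# Count how often each tag occurs, by reading the directory of every record.
# Usage: count_tags notices.iso
count_tags() {
  LC_ALL=C awk '
    BEGIN { RS = "\035" }                   # each record ends with byte 0x1D
    {
      stop = index($0, "\036")              # the directory ends at the first 0x1E
      if (stop > 25) {
        dir = substr($0, 25, stop - 25)     # the directory follows the 24-byte leader
        for (i = 1; length(dir) >= i + 11; i += 12)
          seen[substr(dir, i, 3)]++         # an entry starts with its 3-character tag
      }
    }
    END { for (tag in seen) print seen[tag], tag }
  ' "$1" | sort -rn
}</programlisting>

      <para>Each output line shows a count followed by a tag, the most
      frequent tag first. A tag that is repeated inside a record is counted
      once per occurrence.</para>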
    </section>

    <section>
      <title>UNIMARC parametrization</title>

      <para>When Koha was first installed, you had to choose and import the
      full set of UNIMARC tags and subfields (the same applies to other MARC
      formats). You can now go to</para>

      <synopsis>Koha &gt;&gt; Parameters &gt;&gt; Biblio framework (MARC structure) </synopsis>

      <para>This is where you can adapt your frameworks according to your
      wishes and according to what you observed in the previous
      section.</para>

      <section>
        <title>Basic notion: MARC-based / non-MARC-based</title>

        <para>One of the major constraints on the Koha development team is
        to develop a "multi-MARC" application.</para>

        <para>But although all variants of the MARC format have the same
        form (ISO2709), the meaning of each tag and its subfields can be
        completely different from one dialect to another.</para>

        <para>For example, in UNIMARC, the title is stored in 200$a.</para>

        <para>In MARC21, it's in 245$a.</para>

        <para>In UNIMARC, 245$a contains nothing at all: the field simply
        doesn't exist!</para>

        <para>This means that there is no simple way to handle all the
        differences.</para>

        <para>The Koha development team has thus chosen a method to circumvent
        this problem without putting a strain on performance: everything is
        stored twice,<note>
            <para>To those who wonder about storing everything twice: the
            cost in megabytes of disk space is negligible compared to the
            gain in search performance! By the way, and without going into
            details, the data are actually stored three times, again for
            performance reasons.</para>
          </note> in "MARC" format (200$a, which ignores the meaning of the
        field) and in a "decoded" form (title, which ignores the MARC
        position).</para>

        <para>A parametrization table is set up at installation. This
        table contains all the labels for all MARC21 (or UNIMARC) fields and
        subfields, as well as the links between the two databases.</para>

        <para>The application itself takes care of storing all data twice,
        whenever a record is created or modified.</para>

        <para>Although this makes record creation and modification
        considerably more complex, it simplifies search and read operations
        just as much. Since in an ILS over 90% of access operations are
        searches and less than 10% are record creations or modifications,
        the application is faster overall.</para>

        <para>For example, to display the result of a search, the
        application uses the "title" directly. If only the MARC format were
        stored, the whole record would have to be read just to extract
        its title!</para>
      </section>

      <section>
        <title>Biblio framework configuration</title>

        <para>Note that Koha is capable of managing different frameworks. I
        recommend refining the first framework before creating subsequent
        ones.</para>

        <para>For now, we leave authority records aside.</para>

        <para>Select the default framework and modify its fields one after
        the other.</para>

        <para>For each MARC tag, the following information has to be
        defined:</para>

        <itemizedlist>
          <listitem>
            <para>Repeatable or not. If the tag is repeatable, a sign appears
            in front of it, allowing you to repeat the field if
            necessary.</para>
          </listitem>

          <listitem>
            <para>Mandatory or not. If a field is mandatory, it is not
            possible to validate the record without at least one subfield in
            this tag.</para>
          </listitem>

          <listitem>
            <para>The label of the tag is also defined in the configuration
            tables. You can adapt the label to your needs; the default
            values are the UNIMARC labels.</para>
          </listitem>
        </itemizedlist>

        <para>For each subfield, the following information has to be
        defined:</para>

        <itemizedlist>
          <listitem>
            <para>Activated or not. Choose a tab other than -1 (ignore) to
            activate it. Note that all subfields of a given tag must appear in
            the same tab. Otherwise, if the tag is repeatable, Koha won't
            know how to react when the field is to be repeated!</para>
          </listitem>

          <listitem>
            <para>Mandatory or not. If the subfield is mandatory, the record
            cannot be validated until the subfield contains data.</para>
          </listitem>

          <listitem>
            <para>Repeatable or not. A repeatable subfield is repeated
            simply by separating the values with the | character.</para>
          </listitem>

          <listitem>
            <para>Linking the "Koha field" with the MARC subfield. As Koha is
            multi-MARC, the meaning of certain specific MARC fields has to be
            "taught" to it first. For example, teach it that the subfield
            245$a contains the title of the record, the subfield 245$c the
            author(s), etc. In the default framework, the main "links" have
            already been activated. You may need to modify certain links,
            such as bibliosubject.subject, which points to a 6XX subfield,
            additionalauthors.author, which holds the link to the co-authors,
            or bibliosubtitle.subtitle, which holds the subtitles.</para>

            <para>The "Koha fields" can originate from the following tables:
            biblio, biblioitems, items, additionalauthors, bibliosubtitle,
            bibliosubject.</para>
          </listitem>

          <listitem>
            <para>"Related fields". If you search a certain field, Koha will
            automatically expand the search to the "related fields" which you
            have declared. This lets you extend an author search to
            co-authors and to author authorities, if you use them, or expand
            a title search to subtitles, serial titles, uniform titles,
            etc.</para>
          </listitem>

          <listitem>
            <para>Whether to check the "hidden" option. This option is
            normally used for all $9 subfields of authorities (see the
            section on authorities). If you check it, the subfield appears in
            the MARC editor but not when the record is displayed. It can thus
            be used to hide a subfield from the OPAC display.</para>
          </listitem>

          <listitem>
            <para>Whether to check the "URL" option. If you check it, the
            subfield is displayed as a hyperlink.</para>
          </listitem>
        </itemizedlist>

        <para>There are also three options for constraining the data input
        in subfields:</para>

        <itemizedlist>
          <listitem>
            <para>Authorised values</para>
          </listitem>

          <listitem>
            <para>Thesaurus</para>
          </listitem>

          <listitem>
            <para>Plugin</para>
          </listitem>
        </itemizedlist>

        <para>These elements can be ignored for now; they are dealt with
        later.</para>

        <para>Now go to the cataloging menu and add an empty record. Check
        that you have indeed activated all the options you want.</para>
      </section>

      <section>
        <title>Verification of the "MARC &lt;=&gt; non-MARC link"</title>

        <para>The links between the two databases, MARC and non-MARC, can
        easily be verified through</para>

        <synopsis>Koha &gt;&gt; Parameters &gt;&gt; Links Koha - MARC DB</synopsis>

        <para>This window lists all the connections between the non-MARC
        fields and the MARC fields/subfields.</para>

        <para>The links can be modified in this interface, but beware:
        <emphasis role="bold">each modification is valid for ALL
        frameworks</emphasis>.</para>
      </section>
    </section>

    <section>
      <title>First import</title>

      <para>Return to the shell, and start the first import:</para>

      <programlisting>$KOHA/scripts/misc/bulkmarcimport.pl -d -c UNIMARC --file notices.iso</programlisting>

      <para>The option -d deletes the existing records. It's useless for
      the first import, but as it will be useful later, it's good to start
      with the right habit.</para>

      <para>The option -c UNIMARC indicates how the characters of your file
      are ENCODED. Beware: it's possible to have records in UNIMARC format
      that use the MARC21 character encoding. If you see strange diacritical
      characters when looking at the records, try the other option
      (-c MARC21).</para>
    </section>

    <section>
      <title>First results</title>

      <section>
        <title>Visual verification</title>

        <para>As soon as the script has finished running, or if you have
        interrupted it, you can have a look at the catalogue.</para>

        <para>Normally, the first result won't look great. Search for a
        frequently used term and check whether there are any item records (in
        the simple view as well as in the full view, all items are gathered
        in a special "group").</para>

        <para>If the item records are missing, go back to the parametrization
        of Koha and adjust it so that the item records appear.</para>

        <para>Also verify that both the "simple" (non-MARC) and the
        "complete" (MARC) views contain data.</para>

        <para>If subfields seem to be missing in the MARC view on display, two
        things are possible:</para>

        <itemizedlist>
          <listitem>
            <para>The subfield exists in the catalogue, but doesn't appear on
            screen. This means that the subfield has not been "activated":
            its tab is still set to -1 (ignore). Modify the parametrization
            and check the display again; there is no need to re-import the
            data.</para>
          </listitem>

          <listitem>
            <para>The subfield is actually missing from the database, but you
            think it should be there. Use the
            <filename>dumpmarc.pl</filename> utility to check whether it
            really exists where you expect it.</para>
          </listitem>
        </itemizedlist>

        <para>If data appears in the "complete" (MARC) record but not in the
        "simple" (non-MARC) record, this may or may not be a problem.</para>

        <para>The number of fields which can appear in the "non-MARC" part
        is limited. Only the following fields can be found there:</para>

        <itemizedlist>
          <listitem>
            <para>author</para>
          </listitem>

          <listitem>
            <para>title</para>
          </listitem>

          <listitem>
            <para>unititle (uniform title)</para>
          </listitem>

          <listitem>
            <para>notes (bibliographic)</para>
          </listitem>

          <listitem>
            <para>abstract</para>
          </listitem>

          <listitem>
            <para>seriestitle</para>
          </listitem>

          <listitem>
            <para>copyrightdate</para>
          </listitem>

          <listitem>
            <para>volume</para>
          </listitem>

          <listitem>
            <para>number (of volume)</para>
          </listitem>

          <listitem>
            <para>classification (local classification code)</para>
          </listitem>

          <listitem>
            <para>itemtype (document type)</para>
          </listitem>

          <listitem>
            <para>url</para>
          </listitem>

          <listitem>
            <para>isbn and issn</para>
          </listitem>

          <listitem>
            <para>dewey</para>
          </listitem>

          <listitem>
            <para>publicationyear (date of publication, edition)</para>
          </listitem>

          <listitem>
            <para>publishercode (name of publisher)</para>
          </listitem>

          <listitem>
            <para>volumedate</para>
          </listitem>

          <listitem>
            <para>volumeddesc (description of volume)</para>
          </listitem>

          <listitem>
            <para>illus (illustrator)</para>
          </listitem>

          <listitem>
            <para>pages</para>
          </listitem>

          <listitem>
            <para>bnotes (2nd note field)</para>
          </listitem>

          <listitem>
            <para>size (textarea, to be able to have an area as big as
            24x30)</para>
          </listitem>

          <listitem>
            <para>lccn (Library of Congress classification)</para>
          </listitem>
        </itemizedlist>

        <para>All these fields are NOT repeatable in the "non-MARC" part of
        the database, the part which appears in the simple view. If one of
        these fields is repeated, only the first occurrence is displayed or,
        in certain cases, the occurrences are separated by the |
        character.</para>

        <para>Three further fields are repeatable:</para>

        <itemizedlist>
          <listitem>
            <para>Additional authors (additionalauthors.author)</para>
          </listitem>

          <listitem>
            <para>Subtitles (bibliosubtitle.subtitle)</para>
          </listitem>

          <listitem>
            <para>Subjects (bibliosubject.subject)</para>
          </listitem>
        </itemizedlist>
      </section>

      <section>
        <title>Verification query</title>

        <para>It's also possible to run some SQL queries for data
        verification.</para>

        <para>The following query displays all the fields and subfields
        with their use. It is quite helpful for detecting errors in the
        parametrization of the cataloguing framework:</para>

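        <programlisting>-- For each subfield defined in the framework, count the imported subfields
-- that use it (0 means the subfield never occurs in the imported data).
SELECT tab, tagfield, tagsubfield, COUNT(marc_subfield_table.subfieldcode) AS tot
FROM marc_subfield_structure
LEFT JOIN marc_subfield_table
  ON marc_subfield_table.tag = marc_subfield_structure.tagfield
 AND marc_subfield_table.subfieldcode = marc_subfield_structure.tagsubfield
GROUP BY tab, tagfield, tagsubfield</programlisting>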
      </section>

      <section>
        <title>Iterations</title>

        <para>It's possible to iterate the imports of the catalogue as long as
        needed, i.e. until a satisfying result is obtained.</para>
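
        <para>While iterating, it can save time to import only a small
        sample of the file. The following is a minimal sketch and not part of
        Koha: the function name first_records and the file name
        <filename>sample.iso</filename> are made up for this example, and the
        file is assumed to be well-formed ISO 2709, in which each record ends
        with the byte 0x1D.</para>

        <programlisting>#!/bin/sh
# Copy the first N records of an ISO 2709 file to standard output.
# Usage: first_records N FILE
first_records() {
  LC_ALL=C awk -v max="$1" '
    BEGIN { RS = "\035"; ORS = "\035" }   # split on, and restore, the record terminator
    NR > max { exit }                     # stop once N records have been copied
    { print }
  ' "$2"
}

# Example: build a 200-record sample, then import it like the full file.
#   first_records 200 notices.iso > sample.iso
#   $KOHA/scripts/misc/bulkmarcimport.pl -d -c UNIMARC --file sample.iso</programlisting>

        <para>Once the sample imports cleanly, run the import again on the
        complete file before validating the result.</para>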

        <para>Note that the script <filename>bulkmarcimport.pl</filename> does
        not modify the bibliographic records.</para>

        <para>A migration can also be a good opportunity to clean up your
        data, deleting parts which will no longer be used, or modifying
        others to make them comply with the norms.</para>

        <para>Modifying the <filename>bulkmarcimport.pl</filename> script
        requires a good knowledge of Perl, and above all of MARC record
        manipulation.</para>

        <para>The manipulation of MARC records is done by the Perl package
        <filename>MARC::Record</filename>.</para>

        <para>Its documentation can be read with perldoc:</para>

        <programlisting>perldoc /usr/lib/perl5/site_perl/5.8.3/MARC/Record.pm</programlisting>

        <para>(this is the path on my Mandrake 10.0 machine; it can differ
        according to your Linux/Unix distribution)</para>

        <para>The same documentation can be found on the net:</para>

        <para><ulink
        url="http://marcpm.sourceforge.net/MARC/Record.html">http://marcpm.sourceforge.net/MARC/Record.html</ulink></para>

        <para>and</para>

        <para><ulink
        url="http://marcpm.sourceforge.net/MARC/Field.html">http://marcpm.sourceforge.net/MARC/Field.html</ulink></para>

        <para><filename>MARC::Record</filename> allows any manipulation
        directly on the record. Its only flaw is its complex interface, which
        is a direct consequence of the complexity of the iso2709 norm.</para>
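
        <para>As a first contact with this interface, the following sketch
        reads a file with MARC::Batch (distributed together with
        MARC::Record) and prints one subfield of every record. The function
        name print_subfield is made up for this example, and only data fields
        (tag 010 and above) are handled; treat it as a starting point, not as
        the way <filename>bulkmarcimport.pl</filename> itself works.</para>

        <programlisting>#!/bin/sh
# Print one subfield of every record, e.g. the UNIMARC title 200$a.
# Usage: print_subfield FILE TAG CODE
print_subfield() {
  perl -MMARC::Batch -e '
    my ($file, $tag, $code) = @ARGV;
    my $batch = MARC::Batch->new("USMARC", $file);   # "USMARC" selects the ISO 2709 reader
    while (my $record = $batch->next) {
      my $field = $record->field($tag) or next;      # first occurrence of the tag, if any
      my $value = $field->subfield($code);           # first occurrence of the subfield
      print "$value\n" if defined $value;
    }
  ' "$1" "$2" "$3"
}

# Example:
#   print_subfield notices.iso 200 a</programlisting>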
      </section>
    </section>

    <section>
      <title>Enhancing the cataloging framework</title>

      <para>Up to now, all the subfields have had exactly the same input
      format: the subfield is unrestricted, and the cataloguer can input any
      kind of data.</para>

      <para>Koha has three more formats for subfield input:</para>

      <itemizedlist>
        <listitem>
          <para>List of authorized values</para>
        </listitem>

        <listitem>
          <para>Thesaurus/authority</para>
        </listitem>

        <listitem>
          <para>Plugin</para>
        </listitem>
      </itemizedlist>

      <section>
        <title>List of authorized values</title>

        <para>A list of authorized values defines the set of possible values
        for a given subfield. This is especially useful for subfields where
        only a well-defined and limited set of normalized values is allowed
        for input.</para>

        <para>Examples:</para>

        <itemizedlist>
          <listitem>
            <para>Languages (language codes)</para>
          </listitem>

          <listitem>
            <para>Countries (country codes)</para>
          </listitem>

          <listitem>
            <para>Codes for the function of secondary authors</para>
          </listitem>
        </itemizedlist>

        <para>The following example shows how to activate the list of
        language codes.</para>

        <para>As soon as this has been done, the subfield is no longer a
        free-text field, but a drop-down list of predefined values.</para>

        <para><note>
            <para>From an ergonomic point of view, a drop-down list is useful
            only if the number of values is limited. In general, the
            list shouldn't contain more than around 20 entries. We recommend
            not using larger lists.</para>
          </note></para>

        <section>
          <title>Definition of the drop-down list box and its authorized
          values</title>

          <para>Koha &gt;&gt; Parameters &gt;&gt; Authorised values &gt;&gt;
          New category</para>

          <para>Choose the category code. It's helpful to use a meaningful
          code (e.g. LANG for the languages).</para>

          <para>After having entered the category code, enter the first
          possible value.</para>

          <para>An authorized value always has two elements: the code, which
          is stored in the record, and its label, which is displayed in the
          cataloguing template.</para>

          <para>Example for languages: <emphasis>eng</emphasis> would be the
          code and <emphasis>english</emphasis> the label. The code
          <emphasis>eng</emphasis> is inserted in the record if the
          cataloguer chooses "english" in the list.</para>
        </section>

        <section>
          <title>"Connection"</title>

          <para>When your list is complete (or as soon as you wish, since the
          list can always be edited afterwards), return to the
          parametrization of the biblio framework and go to the field you
          want to constrain (e.g. languages, 041 in MARC21; but this is still
          not a MARC lesson).</para>

          <para>In the input constraints for the tag, the list of
          "authorized values" now offers a category LANG. Choose this
          category, and from now on the cataloging template no longer offers
          a text field in 041, but a drop-down list of the predefined
          values.</para>
        </section>

        <section>
          <title>Two more things for drop-down lists</title>

          <para>Two more things you should know about drop-down lists:</para>

          <itemizedlist>
            <listitem>
              <para>If a subfield is optional, the MARC editor automatically
              adds an empty value to the list. If the subfield is mandatory,
              no empty value is added, and none should be entered. In either
              case, defining an empty entry yourself is useless!</para>
            </listitem>

            <listitem>
              <para>Please note that the values are listed in the order of
              their labels, not of their codes. A blank is considered
              "smaller than the letter 'A'". Knowing this, it's possible to
              define a default value for a subfield: put a space in front of
              a label, and the value will be listed first. And since a single
              leading blank is ignored in HTML, the label is displayed
              without the added space!</para>
            </listitem>
          </itemizedlist>
        </section>

        <section>
          <title>Conclusion</title>

          <para>Now that you know how to define authorized values, it's time
          to learn that a script exists which takes over the task of
          defining the list of languages.</para>

          <para>The script is:</para>

          <programlisting>$KOHA/misc/migration_tools/buildLANG.pl</programlisting>

          <para>Run it without parameters to display its options (and
          don't forget to export PERL5LIB, as the script uses certain Koha
          modules).</para>
        </section>

        <section>
          <title>Special case : itemtypes and branches (IMPORTANT)</title>

          <para>The list of authorized-value categories that can be linked
          to a subfield is automatically extended with two special
          categories.</para>

          <para><emphasis role="bold">These two authorized values must be
          linked to MARC fields.</emphasis></para>

          <variablelist>
            <varlistentry>
              <term>itemtypes</term>

              <listitem>
                <para>The table of document types which are available in the
                program. This table is used in different places.</para>

                <itemizedlist>
                  <listitem>
                    <para>The readers can limit their search to a document
                    type</para>
                  </listitem>

                  <listitem>
                    <para>The loan rules depend on the document types</para>
                  </listitem>
                </itemizedlist>

                <para>This table must thus be linked. In UNIMARC, it should
                logically be connected to the 200$b field.</para>

                <para>The field to which the table itemtypes is linked must
                also be linked to the <emphasis>Koha field</emphasis>
                biblioitems.itemtype.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term>branches</term>

              <listitem>
                <para>The table of library branches. At least one branch (the
                main branch) has to be defined.</para>

                <para>This table must be linked to two subfields of the item
                field. For UNIMARC, the item field follows the 995
                recommendation.</para>

                <para>These two subfields have to be linked to the fields
                items.holdingbranch and items.homebranch.</para>
              </listitem>
            </varlistentry>
          </variablelist>
        </section>
      </section>

      <section>
        <title>Thesaurus / Authority lists</title>

        <para>I will not explain here what authority records are
        according to the UNIMARC norm.</para>

        <para>Koha is able to manage authorities in MARC format (UNIMARC or
        another MARC variant). The following section explains how to
        parametrize the different categories of thesaurus/authorities, and how
        to migrate a thesaurus.</para>

        <section>
          <title>Initial remarks</title>

          <para>In the UNIMARC standard, subfield $3 of the bibliographic
          record is used to store the number of the authority record, which
          establishes the link between the authority record (A) and the
          bibliographic record (B).</para>

          <para>Koha works with this field, but it uses subfield $9 to keep
          the links between records (B and A) internally. $9 is reserved for
          the local system, so this complies with the standard. It also has
          some other internal advantages.</para>
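
          <para>As a minimal sketch of what such a link looks like, the
          following Perl fragment builds a UNIMARC 700 field with MARC::Field
          (from the MARC::Record distribution) and stores an authority record
          number in $9. The number 1234 and the author name are made-up values
          for illustration; this is not code taken from Koha.</para>

          <programlisting>use strict;
use warnings;
use MARC::Field;

# Illustrative values only: 1234 is a made-up authority record number.
my $author = MARC::Field-&gt;new(
    '700', ' ', '1',
    '9' =&gt; '1234',    # internal link to the authority record (A)
    'a' =&gt; 'Hugo',
    'b' =&gt; 'Victor',
);

print "linked to authority ", $author-&gt;subfield('9'), "\n";</programlisting>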

          <para>Note that this section does not cover the migration from
          version 2.0 to version 2.2. That migration is complicated; please
          contact me if you need help, as I have some scripts I can send
          you.</para>
        </section>

        <section>
          <title>Definition of thesaurus / authority list</title>

          <para>During the installation of Koha, you had the possibility to
          import the definition of UNIMARC authorities (at the end, when you
          were offered the option to import SQL files).</para>

          <para>If this has not been done, do it manually. Look for the SQL
          file in the directory where you unpacked Koha.</para>

          <para>This parameter file cannot be used directly: it lists all the
          basic fields needed to define your own authority structures.</para>

          <para>The following example shows how to define the structure for
          personal author name authorities. The principle is the same for all
          authorities, whether subjects or authors.</para>
        </section>

        <section>
          <title>Parametrization of authority frameworks</title>

          <para>Go to: Koha &gt;&gt; Parameters &gt;&gt; Thesaurus
          Structure</para>

          <para>Click on <emphasis>Add an authority type</emphasis>.</para>

          <para>Input:</para>

          <itemizedlist>
            <listitem>
              <para>A code for this authority type. For personal names, you
              could use PN.</para>
            </listitem>

            <listitem>
              <para>Description: a description of the authority type. It is
              purely informational.</para>
            </listitem>

            <listitem>
              <para>Summary: enter the elements that define how the records
              are displayed in search result lists. The subfields to be
              displayed are given in [ ]. Each subfield can be preceded or
              followed by a string. For example, for personal author names,
              you might enter (in UNIMARC):</para>

              <synopsis>[200a][200b][200c]
[400a][400z]
[100a]</synopsis>

              <para>The summary of the record in search result lists would
              thus contain the heading field and the see-from tracing fields.
              If you find this list too long, keep only the line of the
              [200]s.</para>
            </listitem>

            <listitem>
              <para>Summary (MARC21):</para>

              <synopsis>[100a][...][...] to be completed for MARC21
[400a][...]
[500a]</synopsis>
            </listitem>

            <listitem>
              <para>The tag number (000 - 999) of the authority record field
              that corresponds to the bibliographic record field. For the PN
              authorities, for example, it is tag 200 which corresponds to
              the bibliographic record (tags 700, 701 or 702 in UNIMARC, or
              700, 710, 720 in MARC21). As you can see, you indicate the whole
              field, not subfields: Koha automatically matches each subfield
              with the subfield of the same code in the bibliographic record.
              In the above example, 200$a corresponds to 700$a, 200$b to
              700$b, etc.</para>
            </listitem>
          </itemizedlist>
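
          <para>To illustrate the summary syntax described above, here is a
          minimal sketch of how such a template could be expanded. The
          function render_summary and the layout of the hash are hypothetical
          and are not Koha's own code; the sketch also assumes that a bracket
          whose subfield is empty is dropped together with its surrounding
          string.</para>

          <programlisting>use strict;
use warnings;

# Hypothetical helper, not Koha's own code: expand a summary template.
# $values maps "tag + subfield code" keys such as "200a" to their content.
sub render_summary {
    my ($template, $values) = @_;
    # Each [ ] holds: optional string, tag, subfield code, optional string.
    $template =~ s{\[([^\[\]]*?)(\d{3})([a-z0-9])([^\[\]]*)\]}{
        my $value = $values-&gt;{"$2$3"};
        defined $value ? "$1$value$4" : "";
    }ge;
    return $template;
}

# Made-up record content, for illustration only.
my %person = ('200a' =&gt; 'Hugo', '200b' =&gt; 'Victor');
print render_summary('[200a][, 200b][ (200c)]', \%person), "\n";</programlisting>

          <para>Here the third bracket disappears completely because the
          record has no 200$c.</para>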

          <para>When you have finished, confirm the input. The authority type
          should appear in the list of authorities. Click on "MARC
          structure". Koha will first detect that there is no template for
          this authority type yet and will ask which existing template has to
          be copied. At the beginning, only "default" can be copied.</para>

          <para>Now, an authority framework can be defined exactly like the
          framework for cataloguing bibliographic records. Some options which
          are specific to bibliographic records will not be available, but
          otherwise it looks pretty much the same.</para>

          <para>Once the authority framework is defined, go to the authority
          menu and create an authority record to check how the input template
          for authorities looks.</para>
        </section>

        <section>
          <title>Connection B =&gt; A</title>

          <para>Now that the authority records are set up, return to the
          configuration of bibliographic records in order to link the B
          records (bibliographic) with the A records (authorities).</para>

          <para>To do so, return to</para>

          <para><synopsis>Koha &gt;&gt; Parameters &gt;&gt; Biblio framework (MARC structure)</synopsis></para>

          <para>Go to tag 700, which contains the heading of the main author
          or authors.</para>

          <variablelist>
            <varlistentry>
              <term>Subfield $9</term>

              <listitem>
                <para>As explained above, Koha uses subfield $9 to link record
                B with record A.</para>

                <para>Check that subfield $9 has been created; if not, create
                it, and activate it in the same tab as the other subfields of
                tag 700. Check the box "Hidden" so that it is not displayed
                in the OPAC.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term>Link</term>

              <listitem>
                <para>Edit the subfields of tag 700. For one of the subfields
                (logically it would be $a, but this is not mandatory) select
                the value PN from the authority/thesaurus list as an input
                constraint.</para>

                <para>This tells Koha that the 700 field contains values from
                the PN authority list.</para>

                <para>Now add a new record. Note that three dots (...) are
                displayed in front of 700$a.</para>

                <para>The ... is a hyperlink that opens a popup window, in
                which you can search the authority list and copy over the
                values found. After you select a value, the popup window
                closes and the selected entry is copied automatically into
                the bibliographic window.</para>
              </listitem>
            </varlistentry>
          </variablelist>
        </section>

        <section>
          <title>Automatic reconstruction</title>

          <para>It is impossible to provide a script that works in every
          case, especially as it would have to be different for PN (personal
          names), NC (common nouns), CO (corporate entries), etc.</para>

          <para>Nevertheless, the directory
          <filename>misc/migration_tools</filename> contains a script,
          <filename>build6xx.pl</filename>, that reconstructs NC authorities:
          it rebuilds the 606 tag while checking for the presence of subfield
          $x.</para>

          <para>This script can be used as a basis for reconstructing the
          700, 500, or whatever you need.</para>
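
          <para>The general idea of such a reconstruction can be sketched as
          follows. This is not the logic of
          <filename>build6xx.pl</filename> itself: the function
          link_subjects, the in-memory numbering and the sample record are
          assumptions made for illustration, and a real script would also
          have to create the authority records in Koha.</para>

          <programlisting>use strict;
use warnings;
use MARC::Record;
use MARC::Field;

my %number_of;          # heading text =&gt; authority number
my $next_number = 1;    # made-up numbering, for illustration only

# Give every 606 field of a record a $9 pointing to "its" authority.
sub link_subjects {
    my ($record) = @_;
    foreach my $field ($record-&gt;field('606')) {
        # The heading is built from $a and, when present, $x.
        my $heading = join ' -- ', $field-&gt;subfield('a'), $field-&gt;subfield('x');
        $number_of{$heading} ||= $next_number++;
        $field-&gt;update('9' =&gt; $number_of{$heading});
    }
    return $record;
}

my $record = MARC::Record-&gt;new();
$record-&gt;append_fields(
    MARC::Field-&gt;new('606', ' ', ' ', 'a' =&gt; 'Libraries', 'x' =&gt; 'Automation'),
);
link_subjects($record);
print "606\$9 = ", $record-&gt;field('606')-&gt;subfield('9'), "\n";</programlisting>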
        </section>
      </section>

      <section>
        <title>Plugins (so far - UNIMARC only)</title>

        <para>Plugins are supplementary modules which can be used to process
        data in different ways so as to ease cataloguing of bibliographic (or
        authority) records.</para>

        <para>For example, for UNIMARC, there are plugins facilitating:</para>

        <itemizedlist>
          <listitem>
            <para>input of publishers and series statements</para>
          </listitem>

          <listitem>
            <para>input of coded fields (1XX)</para>
          </listitem>
        </itemizedlist>

        <section>
          <title>General remarks</title>

          <para>A plugin is active on one given subfield. When the plugin is
          active, the subfield has the following properties:</para>

          <itemizedlist>
            <listitem>
              <para>Manual input is possible as for any normal field.</para>
            </listitem>

            <listitem>
              <para>It has a ... link which opens a popup window. This window
              assists in filling in the field.</para>
            </listitem>

            <listitem>
              <para>When you enter or leave the subfield, certain plugins
              automatically modify the values in other subfields of the
              cataloguing framework.</para>
            </listitem>
          </itemizedlist>
        </section>

        <section>
          <title>Coded fields</title>

          <para>For version 2.2 of Koha, the "Ecole Nationale Supérieure des
          Mines de Paris" has developed a plugin for all the 1xx (UNIMARC)
          fields, which are coded fields.</para>

          <para>The ... opens a popup window which contains a drop-down list
          for each coded element of a subfield, which makes cataloguing
          considerably easier.</para>
        </section>

        <section>
          <title>Publishers/Series statements</title>

          <para>The plugins called <filename>unimarc_field_210c.pl</filename>
          and <filename>unimarc_field_225a.pl</filename> automatically fill
          in the fields containing the publisher and the series statement
          according to the ISBN (if there is one).</para>

          <para>These plugins are rather delicate to configure.</para>

          <para>First of all, they use... authority files. The list of
          publishers can indeed be seen as an authority list (the authority
          being the heading under which the publisher is listed).</para>

          <para>Thus:</para>

          <para>Koha &gt;&gt; Parameters &gt;&gt; Authority values</para>

          <para>Add an authority type, which must be called EDITORS. Its
          summary will be:<programlisting>[200a ][ / 200b]</programlisting> and
          the linking field will be 200.</para>

          <para>Define for the publisher authority records the following
          structure:</para>

          <programlisting>200$a =&gt; ISBN
200$b =&gt; Publisher
200$c (repeatable) =&gt; Series</programlisting>

          <para>Then "link" the UNIMARC subfields 210$c and 225$a.</para>

          <para>That's it.</para>

          <para>It is now possible to enter the "publisher records", using
          the first two elements of the ISBN (e.g. for ISBN 1-22-333333-4, use
          122, that is, WITHOUT the hyphens).</para>
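
          <para>As a small sketch, the value made of the first two elements
          of the ISBN can be computed as follows. The function name
          publisher_key is hypothetical, and the sketch assumes that the ISBN
          is written with its hyphens, since the elements cannot be told apart
          without them.</para>

          <programlisting>use strict;
use warnings;

# Hypothetical helper: first two elements of a hyphenated ISBN, hyphens removed.
sub publisher_key {
    my ($isbn) = @_;
    my @elements = split /-/, $isbn;
    return join '', @elements[0, 1];
}

print publisher_key('1-22-333333-4'), "\n";    # prints 122</programlisting>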

          <para>Enter the information you want for the publisher, and the
          series statements (separated by |, as always when separating
          subfields).</para>

          <para>These functions are pretty handy for libraries that have a
          limited number of publishers (i.e. mainly specialized
          libraries).</para>

          <para>When cataloguing, once the ISBN has been entered in a
          bibliographic record, moving to the publisher field fills it in
          automatically.</para>

          <para>Clicking on the ... in front of the series statement opens a
          drop-down list with the "series" of this publisher.</para>

          <para>Please note that it is not yet possible to create a series
          statement on the fly, nor to create authority tables automatically
          from records (during a migration, for example), but somebody will
          do that one day.</para>
        </section>
      </section>
    </section>

    <section>
      <title>Migration of a non-iso2709 file</title>

      <para>In principle, this case is easy to handle: construct a
      MARC::Record (i.e. iso2709) on the fly.</para>

      <para>In practice, this is pretty complex, especially as every possible
      case must be anticipated!</para>

      <section>
        <title>Migration of a Texto file</title>

        <para>I have some routines to migrate a Texto database. They are not
        available here as each Texto installation has its own list of
        fields.</para>

        <para>But reading a Texto file can be summarized as follows:</para>

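        <programlisting>$/ = "";    # paragraph mode: each read returns one blank-line-separated record
while (&lt;AJOUT_PILOTE&gt;)
{
    my %resul;    # field name =&gt; accumulated value, reset for each record
    my $last;     # name of the field currently being filled
    my @fichier = split /\n/, $_;
    foreach my $ligne (@fichier) {
        next if $ligne eq "";    # skip empty lines
        # Split "NAME.value" at the first dot only, so that dots inside the value are kept.
        my @mots = split(/\./, $ligne, 2);
        $mots[0] =~ s/ //g;
        # When the name part is empty, the value is appended to the previous field.
        $last = $mots[0] if $mots[0];
        next unless defined $last;
        $resul{$last} .= (defined $mots[1] ? $mots[1] : '') . " ";
    }

    # OK, we have the whole record: build the MARC::Record.
    my $newRecord = MARC::Record-&gt;new();</programlisting>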

        <para>This part of the code builds a hash table for each record. From
        there, an iso2709 record can be created and then imported
        normally.</para>

        <programlisting>$resul{ISBN} =~ s/-//g;    # store the ISBN without hyphens
my $newField = MARC::Field-&gt;new(
  '010','','',
  'a' =&gt; $resul{ISBN},
);
$newRecord-&gt;insert_fields_ordered($newField);
$newField = MARC::Field-&gt;new(
  '011','','',
  'a' =&gt; $resul{ISSN},
);
$newRecord-&gt;insert_fields_ordered($newField);
$newField = MARC::Field-&gt;new(
  '200','','',
  'a' =&gt; $resul{TIT},
  'b' =&gt; $resul{TYP},
  'e' =&gt; $resul{STI},
  'f' =&gt; $resul{AUT},
  'g' =&gt; $resul{NOT},
);
$newRecord-&gt;insert_fields_ordered($newField);
# Store the record in Koha, unless this is a test run.
my ($bibid,$oldbibnum,$oldbibitemnum);
($bibid,$oldbibnum,$oldbibitemnum) = NEWnewbiblio($dbh,$newRecord,'') unless ($test_parameter);
}    # end of the while loop over the Texto records</programlisting>

        <para>Please note that the above code does not handle item records
        (this is far too specific a case).</para>
      </section>
    </section>
  </section>
</article>