Contents: Introduction; Background to the CCF; Differences between the CCF and MARC; Requirements of the exchange format today; What can the CCF teach us here; Conclusion
Introduction
It is difficult, at this conference, to know exactly which aspects of the CCF to cover in my paper. It is particularly difficult because I am following a paper on MARC and, within the context of this conference, what applies to MARC in almost all cases applies also to the CCF.
I therefore propose to begin with a history of the CCF, pointing out as I go the main differences between the CCF and the MARC formats, where there are any.
I then propose to discuss what is required of a communication format today. In doing so I will take into account the present requirements of users of bibliographic data and I will make suggestions as to how these requirements might be met. It will be useful to look back at what the developers of the CCF thought it needed to provide, and to ask whether the huge quantities of the world's bibliographic records in ISO 2709-based (1) formats can ever provide what is required of the bibliographic record today.
(Incidentally, I believe communication format and exchange format to be absolutely synonymous when
used in the bibliographic arena in the English language.)
Background to the CCF
I will begin only with the birth of the CCF; the events leading up to that time belong more properly to the
history of MARC.
In April 1978 the UNESCO General Information Programme (UNESCO/PGI) sponsored an International
Symposium on Bibliographic Exchange Formats (2), held in Taormina, Sicily. Organized by the UNISIST
International Centre for Bibliographic Descriptions (UNIBID) in co-operation with the International
Council of Scientific Unions Abstracting Board (ICSU-AB), the International Federation of Library
Associations and Institutions (IFLA), and the International Organization for Standardization (ISO), the
Symposium was convened 'to study the desirability and feasibility of establishing maximum compatibility
between existing bibliographic exchange formats.' The need for this discussion had arisen because
UNESCO and ICSU-AB had themselves within the framework of their UNISIST programme developed
an exchange format (known as the UNISIST Reference Manual (3)) to enable institutions in the secondary
services sector to exchange data with each other. Unlike the majority of MARC formats at the time,
which emphasised the monograph, the UNISIST Reference Manual gave equal emphasis to the
article in a journal or a contribution to conference proceedings. Consultants working for UNESCO
were on occasion faced with having to decide whether to use this format or to use the
MARC formats which were being developed by the national libraries.
The Symposium recognised in its recommendations the need for compatibility to be achieved. Following
the Symposium, and as a direct result of its recommendations, UNESCO/PGI formed the Ad hoc Group
on the Establishment of a Common Communication Format, which included experts able to present the
views of a broad spectrum of the information community. Members of this Group worked at meetings
and through correspondence to produce a common bibliographic exchange format that would be useful
both to libraries and other information services. At the start of its deliberations the Group decided:
In addition it was affirmed that the CCF should be more than merely a new format: it should be based on,
and provide a bridge between, the major international exchange formats, while taking into account the
International Standard Bibliographic Descriptions (ISBD) developed by IFLA.
Early in its deliberations the Group undertook a comparison of all of the data elements in the Reference
Manual, UNIMARC (4), ISDS Manual (5),
MEKOF-2 (6), ASIDIC/EUSIDIC/ICSU-AB/NFAIS Interchange
Specifications (7), and the USSR-US Common Communication Format (8). With these six standard formats
as a guide, the Group identified a small number of data elements which were used by virtually all
information-handling communities, including both libraries and abstracting and indexing organizations.
These commonly used data elements formed the core of the CCF. A technique was developed to show
relationships between bibliographic records, and between elements within bibliographic records. The
concept of the record segment was developed and refined, and a method for designating relationships
between records, segments, and fields was accepted by the group. The first edition of CCF: The
Common Communication Format (9) was published in 1984.
Following its publication, bibliographic agencies around the world developed national and local formats
based on the CCF. These were presented at the first meeting of CCF users, which was held in Geneva in
1989 (10). At that meeting users recommended some minor changes which were incorporated into the
format. There was also one main change. Hitherto, there had been two methods of linking between
records, each of which required a separate segment. Henceforward, a separate segment is required only if
the data of the related record is embedded in the main record. If it is desired to link to a related item for
which a record actually exists, this is now achieved by incorporating a linking field into the main record.
This change was made to simplify the format. Segments are retained for the treatment of records which
include within them segments relating to different bibliographic levels.
At the same time, though it perhaps concerns us little in this conference, a new manual was published to
include those data elements for recording factual information which are most often used for referral
purposes. The result was the division of the CCF format documentation into two volumes: CCF/B for
bibliographic information (11), and CCF/F for factual information (12).
Since 1991 there has been little development work on the CCF, in contrast to the recent activity on the
MARC formats. It is worth considering why that is. One key decision of the group in the beginning was
that the CCF should be kept simple. The National Library community shares its record creation in such a
way that it can aim at high quality records. This is much more appropriate and necessary for
monographic material than it is for articles in monographs or journals or for those organisations indexing
grey literature. There is a need for national libraries to constantly monitor new developments in the
material they catalogue, so there is a continuous evolution of the MARC formats. The CCF on the other
hand has been kept simple, with as few fields as possible and there is not so much need for evolution. It
was interesting that, in discussions at the CCF Users Meeting in 1989, a number of participants reported
using the CCF for many different kinds of bibliographic material and not merely for printed materials.
One area, however, which does need development is a CCF for archive material. Otherwise, the latest
revision of the CCF should remain valid for some time, so long as we see the CCF as an ISO 2709-based
record exchange format. This of course begs many of the questions being asked in this conference.
Differences between the CCF and MARC
To return to the differences between the CCF and MARC, we have already noted record linking. In the CCF
this was designed to be complex but logical, and it was developed with the benefit of hindsight from record
linking in MARC, since the Ad hoc Group appointed a subgroup consisting of experts with experience of
developing record-linking procedures for MARC.
Record linking has always been more important for the secondary services than for national bibliography
production because they wish to provide records at different bibliographic levels in a common database.
Nevertheless, UNIMARC has had from the outset a record linking technique and many national MARC
formats also include record linking. The real difference probably is that secondary services cannot begin
to function without their databases taking into account bibliographic levels and their relationships; the
vast majority of records from national libraries, in percentage terms, do not contain any explicit links.
Indeed, some national MARC formats still do not include a facility for record linking. It is interesting to
look at the complex record the developers of the CCF had in mind when the CCF was being devised.
Figure 1, taken from the Implementation notes for the CCF (13), is
an example of such a record structure.
Other differences relate to the philosophy of the majority of the users of each format. To
take the users of MARC first: the national library sector, which leads the MARC format users, tends to
aim for cooperation and therefore for standardisation to the greatest extent possible. There have been
moves over the years to standardise, initially, the cataloguing codes used by the national libraries of
Australia, Canada and Great Britain and by the Library of Congress. This has extended to standardising the
exact forms of headings used. At the same time, three of the four countries have agreed on a standard
format and Great Britain will, no doubt, follow. There have also been moves to provide simpler, minimal
or cut-down versions of the format. Standard formats like UNIMARC have been developed to
enable the exchange of data between the different MARC formats. But it is more economical of time and
effort if everyone uses the same format.
Thus, the main aim of MARC is to provide a universally acceptable record which follows standard rules
and which can be incorporated into the records of other national libraries and their national systems. The
existence of separate national formats rather than one international MARC format can be seen as an
accident of history rather than a technical necessity.
The users of the CCF on the other hand, coming from many different backgrounds (some indeed national
libraries), would never consider aiming at such a level of homogeneity between records originating in
different systems. They have been able to accept that there will be different practices in record creation
resulting in records which, when merged into a database, will show their different origins. Many users of
the CCF use the format in the hope that they might gain by using a common format in a number of ways
in addition to exchanging data. Many of them are looking for a simple format; many are users of
CDS/ISIS (14), a UNESCO software product.
There is certainly no feeling that all users should use related cataloguing codes; even if there were such a
desire, it is unlikely they would find a code acceptable to everyone since the organisations concerned have
many different clienteles for their records. Some, though, have adopted AACR or AACR-based formats
as being the most readily available.
Requirements of the exchange format today
Users of a format (as indeed of any standard) will always be circumscribed by the characteristics of the
format. It is often stated that standards stifle innovation, though it is my view that innovation is better
launched from the stability afforded by standards.
ISO 2709-based formats were intended for use with half-inch magnetic tape. Though they have been
applied fairly well to exchange on other media, e.g. on-line and disk, new media have caused slight
problems. Files with no carriage returns are inconvenient on a microcomputer because of the way the
majority of programs which manipulate files have been written. So carriage returns have been added,
either at the end of each record or after every 80 characters. The use of ISO 2709 for disk files is not
ideal, because users make these slight variations to the record structure, and the resulting files are
more difficult to transfer to other systems.
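The effect of these added carriage returns can be illustrated with a minimal sketch in Python. It assumes ASCII data, that any line breaks in the file were added after the record was built, and that the directory layout is described by the directory map in the leader (character positions 20 to 22), as ISO 2709 provides. The function names, the two-character implementation-defined part of each directory entry and the sample tags are illustrative assumptions, not taken from any published implementation.

```python
FIELD_END = "\x1e"   # ISO 2709 field terminator
RECORD_END = "\x1d"  # ISO 2709 record terminator


def split_records(file_text: str) -> list[str]:
    """Drop line breaks added for microcomputer files, then split into records.

    Assumption: the data itself contains no carriage returns or line feeds.
    """
    cleaned = file_text.replace("\r", "").replace("\n", "")
    return [rec for rec in cleaned.split(RECORD_END) if rec]


def parse_record(record: str) -> list[tuple[str, str]]:
    """Return (tag, raw field content) pairs from one ISO 2709 record."""
    leader = record[:24]
    base = int(leader[12:17])                 # base address of data
    len_len = int(leader[20])                 # digits in 'length of field'
    start_len = int(leader[21])               # digits in 'starting position'
    extra_len = int(leader[22]) if leader[22].isdigit() else 0
    entry_len = 3 + len_len + start_len + extra_len
    directory = record[24:base - 1]           # directory ends with FIELD_END
    fields = []
    for i in range(0, len(directory), entry_len):
        entry = directory[i:i + entry_len]
        tag = entry[:3]
        length = int(entry[3:3 + len_len])
        start = int(entry[3 + len_len:3 + len_len + start_len])
        content = record[base + start:base + start + length]
        fields.append((tag, content.rstrip(FIELD_END)))
    return fields


def build_record(fields: list[tuple[str, str]], extra: str = "00") -> str:
    """Hypothetical helper: assemble a small test record (ASCII only)."""
    directory, data = "", ""
    for tag, content in fields:
        body = content + FIELD_END
        directory += f"{tag}{len(body):04d}{len(data):05d}{extra}"
        data += body
    directory += FIELD_END
    base = 24 + len(directory)
    total = base + len(data) + 1              # +1 for the record terminator
    leader = (f"{total:05d}" + " " * 7 + f"{base:05d}" + " " * 3
              + f"45{len(extra)} ")
    return leader + directory + data + RECORD_END


def wrap_80(text: str) -> str:
    """Insert a carriage return and line feed after every 80 characters."""
    return "\r\n".join(text[i:i + 80] for i in range(0, len(text), 80))


first = [("001", "REC0001"), ("200", "A sample title for the first record")]
second = [("001", "REC0002"), ("200", "Another title")]
sample_file = wrap_80(build_record(first) + build_record(second))
for rec in split_records(sample_file):
    print(parse_record(rec))
```

A receiving system which does not know that the line breaks were added would find that the record lengths and starting positions in the leader and directory no longer match the file, which is why such local variations hinder transfer.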
ISO 2709-based records form huge collections of records throughout the world. Many are held in
computer systems in different formats and only become ISO 2709 while being exchanged. Export and
import procedures have been set up in systems using the ISO 2709 record structure as the exchange
format. It is no less suitable than any other format for the exchange of large quantities of data between
systems, which is the purpose for which it was developed.
However, record exchange is being done in different ways now. Library automation systems need to
capture and provide records in other ways.
ISO 2709 standard formats were already seen as being in need of greater definition in 1984, when the
Office for Official Publications of the European Communities developed an extension to the CCF using
SGML, the Standard Generalized Markup Language, which was in the process of becoming an
international standard (15). The Office needed to produce a journal and its articles, and used the CCF as a
basis for producing the index to the journal and the bibliographic records relating to each article. Nevertheless,
the whole record was held in the CCF format, with the SGML coding carried in one of its fields; this was in fact
the field used to produce the journal article. It has to be stated that this use of the CCF went beyond what an
ISO 2709 format was intended to do.
What do we see today as the main areas of interest in the exchange of bibliographic records? We still see
large quantities of CCF and MARC records being exchanged between systems. Usually records are being
obtained from bibliographic utilities to add to an individual catalogue. Sometimes they are being
transferred between different software systems belonging to one organisation, either on an ongoing basis
or when one system is being discontinued in favour of another. In the UK, for example, when we wish to
transfer data between systems because we are changing our library automation software package or
because library systems merge as a result of changes in local government or in academic institutions, we
feel the lack of a holdings format within UK MARC. Similarly, it was never thought necessary to add
this kind of data to the CCF, though many users have added such fields to their own systems, each doing so
differently because there is no recommendation to follow.
Today, the situation is different from that of the early days of MARC. The explosion in the use of
microcomputers has changed all this. Bibliographic records are transferred no longer only between large
computer systems. End users of systems now have their own requirements for the exchange of
bibliographic data. The use of PCs as intelligent terminals means that students in school expect to be
able to manipulate bibliographic records as they see them on the screen. Many library automation
systems have a facility for the library customer to be able to capture screens or, better, to extract records
as displayed on the screen into a floppy disk file. Almost invariably these records will be delimited by
position or by the label from the library catalogue display. Often the high level of specificity found in
MARC is lost. In many cases such refinements as italicisation are lost. However, in the case of WWW
based systems which use HTML, it is possible to get as high a level of specificity as in the source which
generates the view on the screen. Is this the kind of area to develop for the future? Patrons want to be
able to incorporate these records into bibliographies; professors want to produce reading lists; students
want to use this data as input to references at the end of their papers.
There is a danger that HTML is being used purely to represent the visual. End users do not realise this
and wish to do more with the records they extract from HTML than the format will allow. We
return to the catalogue card, or rather to the multiple representations possible for catalogue cards: even AACR
allows a little licence in the eye-readable record. What is needed is a standard underlying the WWW
record which can give the greater specificity found in MARC. Of course this would suit the end users if
they were provided with software into which they could download data. ProCite is one such package and
some people have used CDS/ISIS; but many library automation systems do not provide an output format
which allows easy record transfer into these desk-top bibliographic packages. Users want to
produce records in a particular house-style, and MARC has from the beginning been defined to allow this.
UK MARC even allows for different punctuation to be generated at more points than US MARC, such as
between a person's family name and forename. The solution is to be able to incorporate the highly
specified MARC definition into the records as they are 'exported' from the screen display.
It should be remembered that CCF and MARC records can easily be output as HTML. It is not difficult to
write a program to convert a standard record to an HTML record. However, the resulting HTML record will look
quite different when different people have prepared their own specifications, and so what is more
difficult (if not impossible) is to reconstruct a MARC record from an HTML record. To overcome this it is
necessary to have a standard which recommends a way of representing the record in HTML. HTML needs to
include the specificity of the ISO 2709-based formats from which the records have been derived, to enable the
flexibility enjoyed by systems which use the ISO 2709 record. Whether these records will remain in the
ISO 2709 record structure is another matter which will, I am sure, be covered elsewhere in this conference.
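One possible convention, offered only as a sketch and not as an existing standard, is to carry the field tag and subfield identifier as attributes on the HTML elements, so that the display can be converted back into a structured record. The attribute names `data-tag` and `data-code`, the function names and the sample tags below are all assumptions made for illustration. The example uses only the Python standard library.

```python
from html import escape
from html.parser import HTMLParser

# A record is a list of (tag, [(subfield code, value), ...]) pairs.
Record = list[tuple[str, list[tuple[str, str]]]]


def record_to_html(record: Record) -> str:
    """Render a record as HTML, keeping tags and subfield codes as attributes."""
    parts = ['<div class="record">']
    for tag, subfields in record:
        parts.append(f'<p class="field" data-tag="{escape(tag)}">')
        for code, value in subfields:
            parts.append(
                f'<span class="subfield" data-code="{escape(code)}">'
                f"{escape(value)}</span>"
            )
        parts.append("</p>")
    parts.append("</div>")
    return "\n".join(parts)


class RecordParser(HTMLParser):
    """Rebuild the record from HTML written by record_to_html."""

    def __init__(self) -> None:
        super().__init__()
        self.record: Record = []
        self._in_subfield = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-tag" in attributes:
            self.record.append((attributes["data-tag"], []))
        elif "data-code" in attributes and self.record:
            self.record[-1][1].append((attributes["data-code"], ""))
            self._in_subfield = True

    def handle_data(self, data):
        # Text outside a subfield element (layout whitespace) is ignored.
        if self._in_subfield:
            code, value = self.record[-1][1][-1]
            self.record[-1][1][-1] = (code, value + data)

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_subfield = False


def html_to_record(text: str) -> Record:
    parser = RecordParser()
    parser.feed(text)
    parser.close()
    return parser.record


sample = [
    ("200", [("a", "Bibliographic formats & the Web")]),
    ("300", [("a", "Smith"), ("b", "Jane")]),
]
page = record_to_html(sample)
print(page)
print(html_to_record(page) == sample)
```

The point of the design is that punctuation and layout can vary between house-styles without affecting the round trip, because the conversion back to a record relies on the attributes and not on the appearance of the display.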
It is interesting to observe that the exchange of records via ISO 2709-based programs does at present
enable a more controlled, higher quality record to be exchanged, and prevents unauthorised users from obtaining
records to which they are not entitled for reasons of copyright. Abandoning a record structure like ISO
2709 would make the records procured this way fully integrable into library systems. Many librarians,
when considering copyright issues of MARC records, have felt reassured that records downloaded from
their systems would not be capable of being put to very sophisticated uses in other systems. Many library
automation systems do not allow their users to see the MARC record for similar reasons.
What can the CCF teach us here
One of the problems with which the developers of the CCF wrestled was record linking. The majority of
the world's library systems tend to regard the borrowable item as the unit record. The national libraries
create records which relate to the 'work'. The first area of conflict may well be caused by a discrepancy
between how the two parties see the bibliographic unit. What is one multi-volume monograph to a national library
will be many separate volumes to an issuing system and probably to an end user. A volume of an
encyclopedia may be the unit, or the whole encyclopedia may be the unit. Here libraries of the same type may
differ in their desired approach. Libraries tend to index series, but the series provides nothing more than
an entry point, and that is probably all it really is. However, using theories of bibliographic levels, system
designers have seen fit to create distinct records for these.
Add to the above the need to be able to search a library catalogue for any article in any journal in the
library, and other kinds of links become necessary. The end user needs not only a reference to the article
but an immediate indication as to whether the issue (or the volume into which it has been bound) is owned
by that library and if it is on loan or not. This requires the facility to apply record linking between all
types of record which the structure of the CCF well permits. Some record linking of this kind must surely
be incorporated into HTML if we wish to be able to use records we find on the WWW and incorporate
them into other systems. Indeed, HTML is a 'format' which is very good at dealing with linking.
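The kind of linking described here can be sketched with a small data model. This is not the CCF linking mechanism itself: the class names, the single `parent_id` link and the wording of the status messages are assumptions made purely for illustration. The idea shown is that an article record is linked upwards to the issue or bound volume, and the availability is read from whichever level actually carries copies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Copy:
    barcode: str
    on_loan: bool = False


@dataclass
class Record:
    rec_id: str
    level: str                        # e.g. "article", "issue", "serial"
    title: str
    parent_id: Optional[str] = None   # link to the containing item
    copies: list[Copy] = field(default_factory=list)


def availability(rec_id: str, catalogue: dict[str, Record]) -> str:
    """Follow upward links until a record with copies is found."""
    seen: set[str] = set()
    current = catalogue.get(rec_id)
    while current is not None and current.rec_id not in seen:
        seen.add(current.rec_id)      # guards against circular links
        if current.copies:
            free = any(not copy.on_loan for copy in current.copies)
            state = "available" if free else "on loan"
            return f"{state} ({current.level}: {current.title})"
        current = catalogue.get(current.parent_id) if current.parent_id else None
    return "not held"


catalogue = {
    rec.rec_id: rec
    for rec in [
        Record("ser1", "serial", "Journal of Examples"),
        Record("iss1", "issue", "Vol. 12, no. 3", parent_id="ser1",
               copies=[Copy("B001", on_loan=True)]),
        Record("art1", "article", "An article on formats", parent_id="iss1"),
        Record("iss2", "issue", "Vol. 12, no. 4", parent_id="ser1"),
        Record("art2", "article", "Another article", parent_id="iss2"),
    ]
}
print(availability("art1", catalogue))
print(availability("art2", catalogue))
```

In this sketch the end user searching for the first article is told at once that the issue containing it is on loan, while the second article belongs to an issue for which the library holds no copy.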
The question is: are we too late to go in this direction? MARC has in practice very much concerned
itself with a unit record though linking mechanisms have been developed for MARC. I speak with most
of my MARC experience as a UK MARC user where we have not developed record linking except for
analytics in monographs containing more than one work separately published elsewhere. We have
structures set up to provide for the needs of part of the library world which is a smaller part than it used
to be. Would existing MARC records need too much adaptation to enable record linking? This is a
question which I hope to hear answered by others in this Conference. However, I am optimistic, because
even though these links are not there explicitly in the records, they can be created in systems which use
MARC records, usually via the authority file or index: during the last year I have overseen the transfer of
over 300,000 records from one system to another and the authority records have been generated with a
large degree of success from the bibliographic MARC records. Even a system without authority records
will have indexes which provide conceptually a kind of record linking. Similarly, other kinds of records
(i.e. sub-records) could be created from records with a flat structure.
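How links might be derived from flat records can be shown with a minimal sketch. It assumes records held as simple lists of tagged fields and a deliberately crude normalisation of headings; the tags `AUT` and `TIT`, the function names and the normalisation rule are hypothetical, and a real conversion would need far more careful matching of heading forms.

```python
def normalise(heading: str) -> str:
    """Reduce a heading to a crude matching key (lower case, no punctuation)."""
    cleaned = heading.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def build_heading_index(records: list[dict], heading_tags: set[str]) -> dict:
    """Create one authority-like entry per distinct heading.

    Each entry keeps the first form of the heading seen and the identifiers
    of every bibliographic record which carries it.
    """
    index: dict[str, dict] = {}
    for rec in records:
        for tag, value in rec["fields"]:
            if tag in heading_tags:
                entry = index.setdefault(
                    normalise(value), {"heading": value, "records": []}
                )
                if rec["id"] not in entry["records"]:
                    entry["records"].append(rec["id"])
    return index


def linked_records(rec_id: str, index: dict) -> list[str]:
    """Records implicitly linked to rec_id because they share a heading."""
    related: set[str] = set()
    for entry in index.values():
        if rec_id in entry["records"]:
            related.update(entry["records"])
    related.discard(rec_id)
    return sorted(related)


records = [
    {"id": "b1", "fields": [("AUT", "Smith, Jane"), ("TIT", "First title")]},
    {"id": "b2", "fields": [("AUT", "SMITH, Jane."), ("TIT", "Second title")]},
    {"id": "b3", "fields": [("AUT", "Jones, Mary"), ("TIT", "Third title")]},
]
index = build_heading_index(records, {"AUT"})
print(index["smith jane"])
print(linked_records("b1", index))
```

The links are not stored in the bibliographic records at all: they exist only in the index generated from them, which is the sense in which an index provides conceptually a kind of record linking.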
So, I am optimistic that existing records in the ISO 2709 format can continue to be used. There is also
something to be said for simplicity in transferring large quantities of data. The main problem with ISO
2709 is that the record structure does not lend itself to reading by the human eye. However, HTML is
just as bad if not worse!
Conclusion
I can find no better way to conclude than to quote from the CCF (16).
The CCF states that within an information system, records making up the database will usually exist in a
number of separate but highly compatible formats. At the very least there will be:
In addition, if two or more organizations wish to exchange records with one another, it will be necessary
for each of these organizations to agree upon a common standard format for exchange purposes. Each
must be able to convert to an exchange-format record from an internal-format record, and vice versa.
Exchange formats are clearly needed for the last purpose. We must avoid the danger of trying to include
in an exchange format features that more properly belong elsewhere and which can be generated within a
system rather than have to be communicated or exchanged between systems.
Figure 1: Relationships in a complex CCF record