THE FUTURE OF COMMUNICATION FORMATS

The Common Communication Format (CCF)

BY
Alan Hopkinson
Systems Librarian
Middlesex University
London, UK

October 7th 1996



Introduction


Background to the CCF


Differences between the CCF and MARC


Requirements of the exchange format today


What can the CCF teach us here


Conclusion

Introduction

It is difficult in this conference to know exactly which aspects of the CCF to cover in my paper. It is particularly difficult as I am following a paper on MARC since what applies to MARC, within the context of this conference, in almost all cases applies to the CCF.

I therefore propose to begin with a history of the CCF, pointing out as I go the main differences between the CCF and the MARC formats, where there are any.

I then propose to discuss what is required of a communication format today. In doing this I will take into account the requirements of present-day users of bibliographic data and suggest how these requirements might be met. It will be useful to look back at what the developers of the CCF thought it needed to provide, and to ask whether the huge quantities of the world's bibliographic records held in ISO 2709-based (1) formats can ever provide what is required of the bibliographic record today.

(Incidentally, I believe communication format and exchange format to be absolutely synonymous when used in the bibliographic arena in the English language.)

Background to the CCF

I will begin only with the birth of the CCF; the events leading up to that time belong more properly to the history of MARC.

In April 1978 the UNESCO General Information Programme (UNESCO/PGI) sponsored an International Symposium on Bibliographic Exchange Formats (2), held in Taormina, Sicily. Organized by the UNISIST International Centre for Bibliographic Descriptions (UNIBID) in co-operation with the International Council of Scientific Unions Abstracting Board (ICSU-AB), the International Federation of Library Associations and Institutions (IFLA), and the International Organization for Standardization (ISO), the Symposium was convened 'to study the desirability and feasibility of establishing maximum compatibility between existing bibliographic exchange formats.' The need for this discussion had arisen because UNESCO and ICSU-AB had themselves within the framework of their UNISIST programme developed an exchange format (known as the UNISIST Reference Manual (3)) to enable institutions in the secondary services sector to exchange data with each other. Unlike the majority of MARC formats at the time which laid the accent on the monograph, the UNISIST Reference Manual gave equal emphasis to the article in a journal or a contribution to a proceedings. There were occasions when consultants working for UNESCO were faced with having to make the decision as to whether to use this format or to use the MARC formats which were being developed by the national libraries.

The Symposium recognised in its recommendations the need for compatibility to be achieved. Following the Symposium, and as a direct result of its recommendations, UNESCO/PGI formed the Ad hoc Group on the Establishment of a Common Communication Format, which included experts able to present the views of a broad spectrum of the information community. Members of this Group worked at meetings and through correspondence to produce a common bibliographic exchange format that would be useful both to libraries and other information services. At the start of its deliberations the Group decided:

In addition it was affirmed that the CCF should be more than merely a new format: it should be based on, and provide a bridge between, the major international exchange formats, while taking into account the International Standard Bibliographic Descriptions (ISBD) developed by IFLA.

Early in its deliberations the Group undertook a comparison of all of the data elements in the Reference Manual, UNIMARC (4), ISDS Manual (5), MEKOF-2 (6), ASIDIC/EUSIDIC/ICSU-AB/NFAIS Interchange Specifications (7), and the USSR-US Common Communication Format (8). With these six standard formats as a guide, the Group identified a small number of data elements which were used by virtually all information-handling communities, including both libraries and abstracting and indexing organizations. These commonly used data elements formed the core of the CCF. A technique was developed to show relationships between bibliographic records, and between elements within bibliographic records. The concept of the record segment was developed and refined, and a method for designating relationships between records, segments, and fields was accepted by the group. The first edition of CCF: The Common Communication Format (9) was published in 1984.

Following its publication, bibliographic agencies around the world developed national and local formats based on the CCF. These were presented at the first meeting of CCF users, which was held in Geneva in 1989 (10). At that meeting users recommended some minor changes which were incorporated into the format. There was also one main change. Hitherto, there had been two methods of linking between records, each of which required a separate segment. Henceforward, a separate segment is required only if the data of the related record is embedded in the main record. If it is desired to link to a related item for which a record actually exists, this is now achieved by incorporating a linking field into the main record. This change was made to simplify the format. Segments are retained for the treatment of records which include within them segments relating to different bibliographic levels.
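The two ways of attaching a related item can be pictured with a small data-structure sketch. It is an illustration only, written in Python: the class names, the relationship labels and the field name "title" are all hypothetical and are not taken from the CCF manual, which defines its own tags for segments and linking fields.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    level: str    # bibliographic level, e.g. "serial" or "analytic" (labels assumed)
    fields: dict  # data elements, keyed here by name rather than by CCF tag

@dataclass
class Record:
    record_id: str
    primary: Segment
    # Related items whose data is embedded in this record: (relationship, Segment)
    embedded: list = field(default_factory=list)
    # Related items that have records of their own: (relationship, record_id)
    links: list = field(default_factory=list)

def related_items(record, database):
    """Yield (relationship, title) for each related item, however it is attached."""
    for relationship, segment in record.embedded:
        yield relationship, segment.fields["title"]
    for relationship, target_id in record.links:
        target = database.get(target_id)
        if target is not None:  # a link may point outside the local file
            yield relationship, target.primary.fields["title"]

journal = Record("J1", Segment("serial", {"title": "A journal"}))
article = Record(
    "A1",
    Segment("analytic", {"title": "An article"}),
    embedded=[("in series", Segment("series", {"title": "A series"}))],
    links=[("part of", "J1")],
)
database = {r.record_id: r for r in (journal, article)}
found = list(related_items(article, database))
```

In this sketch the journal exists as a record in its own right, so the article only points to it, while the series has no record of its own and is carried inside the article record as a segment.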

At the same time, though it perhaps concerns us little in this conference, a new manual was published to include those data elements for recording factual information which are most often used for referral purposes. The result was the division of the CCF format documentation into two volumes: CCF/B for bibliographic information (11), and CCF/F for factual information (12).

Since 1991 there has been little development work on the CCF, in contrast to the recent activity on the MARC formats. It is worth considering why that is. One key decision of the group in the beginning was that the CCF should be kept simple. The National Library community shares its record creation in such a way that it can aim at high quality records. This is much more appropriate and necessary for monographic material than it is for articles in monographs or journals or for those organisations indexing grey literature. There is a need for national libraries to constantly monitor new developments in the material they catalogue, so there is a continuous evolution of the MARC formats. The CCF on the other hand has been kept simple, with as few fields as possible and there is not so much need for evolution. It was interesting that, in discussions at the CCF Users Meeting in 1989, a number of participants reported using the CCF for many different kinds of bibliographic material and not merely for printed materials. One area, however, which does need development is a CCF for archive material. Otherwise, the latest revision of the CCF should remain valid for some time, so long as we see the CCF as an ISO 2709-based record exchange format. This of course begs many of the questions being asked in this conference.

Differences between the CCF and MARC

To return to the differences between the CCF and MARC, we have already noted record linking. The CCF mechanism was designed to be complex but logical, and it benefited from hindsight of record linking in MARC, since the Ad hoc Group appointed a subgroup of experts who had experience of developing record-linking procedures for MARC.

Record linking has always been more important for the secondary services than for national bibliography production because they wish to provide records at different bibliographic levels in a common database. Nevertheless, UNIMARC has had from the outset a record linking technique and many national MARC formats also include record linking. The real difference probably is that secondary services cannot begin to function without their databases taking into account bibliographic levels and their relationships; the vast majority of records from national libraries, in percentage terms, do not contain any explicit links. Indeed, some national MARC formats still do not include a facility for record linking. It is interesting to look at the complex record the developers of the CCF had in mind when the CCF was being devised. Figure 1 taken from the Implementation notes for the CCF (13) is an example of such a record structure.


Figure 1: Relationships in a complex CCF record

Other differences relate to the philosophy of the majority of the users of each format. To take the users of MARC first: the national library sector, which leads the MARC format users, tends to aim for cooperation and therefore for standardisation to the greatest extent possible. There have been moves over the years to standardise, initially, the cataloguing codes used by the national libraries of Australia, Canada and Great Britain and the Library of Congress. This has extended to standardising the exact forms of headings used. At the same time, three of the four countries have agreed on a standard format and Great Britain will, no doubt, follow. There have been moves also to provide simpler, minimal or cut-down versions of the format. Standard formats like UNIMARC have been developed to enable the exchange of data between the different MARC formats. But it is more economical of time and effort if everyone uses the same format.

Thus, the main aim of MARC is to provide a universally acceptable record which follows standard rules and which can be incorporated into the records of other national libraries and their national systems. The existence of separate national formats rather than one international MARC format can be seen as an accident of history rather than a technical necessity.

The users of the CCF on the other hand, coming from many different backgrounds (some indeed national libraries), would never consider aiming at such a level of homogeneity between records originating in different systems. They have been able to accept that there will be different practices in record creation resulting in records which, when merged into a database, will show their different origins. Many users of the CCF adopt the format in the hope that a common format will bring them benefits beyond the exchange of data. Many of them are looking for a simple format; many are users of CDS/ISIS (14), a UNESCO software product.

There is certainly no feeling that all users should use related cataloguing codes; even if there were such a desire, it is unlikely they would find a code acceptable to everyone since the organisations concerned have many different clienteles for their records. Some, though, have adopted AACR or AACR-based formats as being the most readily available.

Requirements of the exchange format today

Users of a format (as indeed of any standard) will always be circumscribed by the characteristics of the format. It is often stated that standards stifle innovation, though it is my view that innovation is better launched from the stability afforded by standards.

ISO 2709-based formats were intended for use with half-inch magnetic tape. Though they have been applied fairly well to exchange on other media, e.g. on-line and disk, new media have caused slight problems. On a microcomputer it is inconvenient for files to have no carriage returns, because of the way most file-handling programs have been written. So carriage returns have been added, either at the end of each record or after every 80 characters. The use of ISO 2709 for disk files is therefore not ideal: users make these slight variations to the record structure, and the resulting files are more difficult to transfer to other systems.
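The effect of such added carriage returns can be seen in a minimal Python sketch of the ISO 2709 record structure: a 24-character label (leader), a directory of tag, length and starting-position entries, and then the data fields. The sketch rests on several assumptions which are mine and not part of the standard or of any particular format: the data is plain ASCII, so that characters and octets coincide; fields are treated as opaque strings without indicators or subfields; line breaks never occur legitimately inside the data; and the function names are purely illustrative.

```python
FIELD_END = "\x1e"   # ISO 2709 field separator
RECORD_END = "\x1d"  # ISO 2709 record separator

def build_record(fields):
    """Serialise (3-character tag, data) pairs using a 4-5-0 directory map."""
    directory, body = "", ""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: tag, field length (4 digits), start position (5 digits)
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    directory += FIELD_END
    base = 24 + len(directory)    # base address of data
    total = base + len(body) + 1  # plus one for the record separator
    leader = f"{total:05d}" + " " * 5 + "00" + f"{base:05d}" + " " * 3 + "450 "
    return leader + directory + body + RECORD_END

def parse_record(raw):
    """Recover (tag, data) pairs, reading the directory map from the leader."""
    leader = raw[:24]
    base = int(leader[12:17])
    len_len, start_len, impl_len = (int(c) for c in leader[20:23])
    entry_len = 3 + len_len + start_len + impl_len
    directory = raw[24:base - 1]  # omit the directory's own field separator
    fields = []
    for i in range(0, len(directory), entry_len):
        entry = directory[i:i + entry_len]
        length = int(entry[3:3 + len_len])
        start = int(entry[3 + len_len:3 + len_len + start_len])
        # The stored length includes the field separator, which is dropped here
        fields.append((entry[:3], raw[base + start:base + start + length - 1]))
    return fields

def wrap80(raw):
    """Insert a line break after every 80 characters, as some disk files do."""
    return "\r\n".join(raw[i:i + 80] for i in range(0, len(raw), 80))

def unwrap(text):
    """Strip line breaks before parsing (assumes none belong to the data)."""
    return text.replace("\r", "").replace("\n", "")

sample = [("200", "The Common Communication Format"), ("300", "Hopkinson, Alan")]
record = build_record(sample)
```

Because every directory entry gives a character position, a single inserted carriage return shifts all the data that follows it. A receiving program must therefore know which convention the sender used and undo it, as unwrap does here, before the directory can be trusted.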

ISO 2709-based records form huge collections throughout the world. Many are held in computer systems in different internal formats and only become ISO 2709 records while being exchanged. Export and import procedures have been set up in systems using the ISO 2709 record structure as the exchange format. For the exchange of large quantities of data between systems, the purpose for which it was developed, it is no less suitable than any other format.

However, record exchange is being done in different ways now. Library automation systems need to capture and provide records in other ways.

ISO 2709 standard formats were already seen as being in need of greater definition in 1984 when the Office for Official Publications of the European Communities developed an extension to the CCF using SGML, the Standard Generalized Markup Language, which was in the process of becoming an international standard (15). They needed to produce a journal and the actual articles, and used the CCF as a basis for producing the index to it and the bibliographic records relating to each article. Nevertheless the whole record was held in the CCF format in a field which included the SGML coding. In fact this was the field used to produce the journal article. It has to be stated that this use of the CCF was beyond what an ISO 2709 format was intended to do.

What do we see today as the main areas of interest in the exchange of bibliographic records? We still see large quantities of CCF and MARC records being exchanged between systems. Usually records are being obtained from bibliographic utilities to add to an individual catalogue. Sometimes they are being transferred between different software systems belonging to one organisation, either on an ongoing basis or when one system is being discontinued in favour of another. In the UK, for example, when we wish to transfer data between systems because we are changing our library automation software package or because library systems merge as a result of changes in local government or in academic institutions, we feel the lack of a holdings format within UK MARC. Similarly, it was never thought necessary to add this kind of data to the CCF, though many users have added such fields to their own systems each doing it differently because there is no recommendation to follow.

Today, the situation is different from that of the early days of MARC, and it is the explosion in the use of microcomputers that has changed it. Bibliographic records are no longer transferred only between large computer systems. End users of systems now have their own requirements for the exchange of bibliographic data. The use of PCs as intelligent terminals means that students in school expect to be able to manipulate bibliographic records as they see them on the screen. Many library automation systems have a facility for the library customer to capture screens or, better, to extract records as displayed on the screen into a floppy disk file. Almost invariably these records will be delimited by position or by the label from the library catalogue display. Often the high level of specificity found in MARC is lost. In many cases such refinements as italicisation are lost. However, in the case of WWW-based systems which use HTML, it is possible to get as high a level of specificity as in the source which generates the view on the screen. Is this the kind of area to develop for the future? Patrons want to be able to incorporate these records into bibliographies; professors want to produce reading lists; students want to use this data as input to references at the end of their papers.

There is a danger that HTML is being used purely to represent the visual. End users do not realise this and wish to do more with the records they can extract from HTML than the format will allow them. We return to the catalogue card, or rather to the multiple representations possible for catalogue cards: even AACR allows a little licence in the eye-readable record. What is needed is a standard underlying the WWW record which can give the greater specificity found in MARC. Of course this would suit the end users only if they were provided with software into which they could download data. ProCite is one such example and some people have used CDS/ISIS; but many library automation systems do not provide an output format which allows easy record transfer into these desk-top bibliographic packages. The users want to produce records in a particular house-style. MARC has from the beginning been defined to allow this. UK MARC even allows for different punctuation to be generated at more points than US MARC, such as between a person's family name and forename. The solution is to be able to incorporate the highly specified MARC definition into the records as they are 'exported' from the screen display.

It should be remembered that the CCF and MARC can easily be output as HTML. It is not difficult to write a program to convert a standard record to an HTML record. The resulting HTML record will, however, look quite different when different people have prepared their own specifications. What is therefore more difficult (if not impossible) is to construct a MARC record from an HTML record. To overcome this it is necessary to have a standard which recommends how the conversion should be done. HTML needs to include the specificity of the ISO 2709-based formats from which the records have been derived, to enable the flexibility enjoyed by systems which use the ISO 2709 record. Whether these records will remain in the ISO 2709 record structure is another matter which will, I am sure, be covered elsewhere in this conference.
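A minimal Python sketch may make the point concrete. If the program which generates the HTML follows an agreed convention for carrying the field tags, the conversion can be reversed. The convention assumed here, a span element whose class attribute is the letter "f" followed by the field tag, is invented for this illustration and is not taken from any standard; the function and class names are likewise hypothetical.

```python
from html import escape
from html.parser import HTMLParser

def to_html(fields):
    """Render (tag, text) pairs as spans which carry the field tag in a class."""
    spans = [f'<span class="f{tag}">{escape(text)}</span>' for tag, text in fields]
    return '<p class="record">' + " ".join(spans) + "</p>"

class RecordExtractor(HTMLParser):
    """Collect (tag, text) pairs from markup written by to_html."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self._tag = None   # field tag of the span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "span" and css_class.startswith("f"):
            self._tag = css_class[1:]
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._tag is not None:
            self.fields.append((self._tag, "".join(self._text)))
            self._tag = None

def from_html(markup):
    """Rebuild the list of (tag, text) pairs from the HTML display."""
    parser = RecordExtractor()
    parser.feed(markup)
    parser.close()
    return parser.fields

sample = [("200", "Bits & bytes"), ("300", "Hopkinson, Alan")]
page = to_html(sample)
recovered = from_html(page)
```

The round trip succeeds only because both directions share the same convention. Markup written to a different local specification, for instance with the title merely set in italics, would give the extractor nothing to work from, which is the difficulty described above.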

It is interesting to observe that the exchange of records via ISO 2709-based programs does at present enable a more controlled, higher quality record to be exchanged and prevent unauthorised users obtaining records to which they are not entitled for reasons of copyright. Abandoning a record structure like ISO 2709 would make the records procured this way fully integrable into library systems. Many librarians when considering copyright issues of MARC records have felt reassured that records downloaded from their systems would not be capable of being put to very sophisticated uses in other systems. Many library automation systems do not allow their users to see the MARC record for similar reasons.

What can the CCF teach us here

One of the problems with which the developers of the CCF wrestled was record linking. The majority of the world's library systems tend to regard the borrowable item as the unit record. The national libraries create records which relate to the 'work'. The first area of conflict may well be caused by a discrepancy between how the two parties see the bibliographic unit. A multi-volume monograph to a national library will be many separate volumes to an issuing system and probably to an end user. A volume of an encyclopedia may be the unit or the whole encyclopedia may be the unit. Here libraries of the same type may differ in their desired approach. Libraries tend to index series but the series provides nothing more than an entry point, and that is probably all it really is. However, using theories of bibliographic levels, system designers have seen fit to create distinct records for these.

Add to the above the need to be able to search a library catalogue for any article in any journal in the library, and other kinds of links become necessary. The end user needs not only a reference to the article but an immediate indication as to whether the issue (or the volume into which it has been bound) is owned by that library and if it is on loan or not. This requires the facility to apply record linking between all types of record which the structure of the CCF well permits. Some record linking of this kind must surely be incorporated into HTML if we wish to be able to use records we find on the WWW and incorporate them into other systems. Indeed, HTML is a 'format' which is very good at dealing with linking.

The question is: 'Are we too late to go in this direction?' MARC has in practice very much concerned itself with a unit record, though linking mechanisms have been developed for MARC. I speak with most of my MARC experience as a UK MARC user, where we have not developed record linking except for analytics in monographs containing more than one work separately published elsewhere. We have structures set up to provide for the needs of a part of the library world which is smaller than it used to be. Would existing MARC records need too much adaptation to enable record linking? This is a question which I hope to hear answered by others in this Conference. However, I am optimistic, because even though these links are not there explicitly in the records, they can be created in systems which use MARC records, usually via the authority file or index: during the last year I have overseen the transfer of over 300,000 records from one system to another and the authority records have been generated with a large degree of success from the bibliographic MARC records. Even a system without authority records will have indexes which provide conceptually a kind of record linking. Similarly, other kinds of records (i.e. sub-records) could be created from records with a flat structure.
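How an index can supply links which are not explicit in the records may be sketched in a few lines of Python. This is an illustration under simple assumptions and not a description of the system used in the transfer mentioned above: the records are plain dictionaries, the field name "author" and the crude normalisation rule (lower case, commas and full stops ignored) are both invented for the example.

```python
from collections import defaultdict

def normalise(heading):
    """Reduce a heading to a matching key: lower case, no commas or full stops."""
    return " ".join(heading.lower().replace(",", " ").replace(".", " ").split())

def build_authority_index(records, heading_field="author"):
    """Group flat records under one entry for each distinct normalised heading."""
    index = defaultdict(lambda: {"heading": None, "records": []})
    for record in records:
        for heading in record.get(heading_field, []):
            entry = index[normalise(heading)]
            # The first form encountered is kept as the displayed heading
            entry["heading"] = entry["heading"] or heading
            entry["records"].append(record["id"])
    return dict(index)

catalogue = [
    {"id": 1, "author": ["Hopkinson, Alan"]},
    {"id": 2, "author": ["HOPKINSON, Alan."]},
    {"id": 3, "author": ["Another, Author"]},
]
authority = build_authority_index(catalogue)
```

Records 1 and 2 carry no reference to each other, yet both are reached from the single entry generated for their shared heading, which is the conceptual link referred to above. A real conversion would need far more careful matching rules than this one.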

So, I am optimistic that existing records in the ISO 2709 format can continue to be used. There is also something to be said for simplicity in transferring large quantities of data. The main problem with ISO 2709 is that the record structure does not lend itself to reading by the human eye. However, HTML is just as bad if not worse!

Conclusion

I can find no better way to conclude than to quote from the CCF (16).

The CCF states that within an information system, records making up the database will usually exist in a number of separate but highly compatible formats. At the very least there will be:

In addition, if two or more organizations wish to exchange records with one another, it will be necessary for each of these organizations to agree upon a common standard format for exchange purposes. Each must be able to convert to an exchange-format record from an internal-format record, and vice versa.

Exchange formats are clearly needed for the last purpose. We must avoid the danger of trying to include in an exchange format features that more properly belong elsewhere and which can be generated within a system rather than have to be communicated or exchanged between systems.