The video revolution — or why the 1990s will be the decade of the image in the information...



World Patent Information, Vol. 13, No. 4, pp. 187-192, 1991. Printed in Great Britain.

0172-2190/91$3.00+.00. Pergamon Press plc

CEC/WIPO

The Video Revolution - or Why the 1990s Will be the Decade of the Image in the Information Industry*

Paul Dixon, Derwent Publications Ltd., London WC1X 8RP, U.K.

Summary

The 1970s and 1980s saw the rise and rapid development of online textual databases, particularly those covering science and technology. Online patents files are amongst the most used of these databases, and are now accessed worldwide by both patent specialists and researchers. However, most online databases covering intellectual property are based on an original document or publication which includes highly meaningful drawings, chemical structures or diagrams. Until recently these have not been available in electronic format. The recent rapid advances in image handling and distribution technology, such as Document Image Processing (DIP), image extraction, CD-ROM, WORM discs, erasable optical discs, high speed laser printing, and the delivery of images over high speed telecommunications networks or by Group 4 fax, will enable the next generation of information products to be developed during the 1990s. The ability to mix searchable text with corresponding images, held either locally or delivered online, will enable the benefits of the original printed document or product to be made available to users in an electronic format. Since images are such a key element in making the most effective use of patent information, patent information products will be amongst the first to experience this image handling revolution. This paper reviews some of the present developments in image handling, and speculates on what kinds of information tools might evolve during the decade.

Introduction

The availability of online services, offering access to scientific and technical databases, revolutionised the working practices of information professionals when they appeared in the mid-1970s. In those early days access to these services could be difficult and was often frustrating, largely because of the low telecommunications speeds which were available. Anyone who now uses online at 2400 baud, or higher, and who started to access databases at 11-12 baud (on a teletype compatible terminal with an integral tape reader) will realise just how much progress has been made in the past 15 years.

*Paper presented at the Symposium Practical Problems in the Online Search of Patent Information held by OLPI (International Association of Producers and Users of Online Patent Information) in association with the World Intellectual Property Organisation, Geneva, 4-5 July 1991.

However, with a gradual improvement in telecommunications access the 1980s proved to be a decade of significant growth in the online industry - with new hosts and external databases appearing on a regular basis. This encouraged interest and use among an increasingly wide audience. Now the benefits of online retrieval of scientific, technical and business information are available to professionals in large, medium and small organisations. However, few real ‘end users’ access the services with which the information professional is familiar, and searching these sources of information remains a specialist skill largely in the hands of the intermediary.

Statistics which estimate the size and value of the online information industry appear on a regular basis. Frankly most of these are meaningless as they vary so much in how they define what is included within the ‘online industry’. Many of these estimates include real-time financial services, electronic mail services and end user online services (such as CompuServe) - which are very many times more heavily used than the kinds of online services with which we are most familiar in the information profession. Within the specialist area of patent information one of the best estimates of the volume of online usage was made by Jacques Michel of the European Patent Office at the Montreux Chemical Information Meeting held last September. He estimated the value of this usage at $86m per year. As someone with some inside knowledge in this area I would think that this estimate bears a close relation to the current volumes of online use of patents databases.

While growth in the online industry is still very healthy, and shows percentage annual increases that would be the envy of most industries, it is now not as spectacular as the 25% plus seen during the 1980s as the industry expanded rapidly. I would contend that this slowing in growth is because access to online services has now been extended to most information professionals around the world - leaving a much smaller pool of potential new users. Another reason is that few fundamentally new developments have taken place in the past few years. Most of the developments have been in advancing techniques of retrieval rather than in developing completely new sources or kinds of information. After all it is now very difficult to see where you could launch a new database covering an area not previously exploited, and which would prove popular enough to put it in the Top Ten or Twenty of the most heavily used scientific and technical databases. If you can think of a database which has this potential I would be very pleased to hear from you!

This means that database producers need to re-examine their core services and look at methods of making them even more useful and meaningful. For publishers, like Derwent, who publish numerous information products on paper the target must be to make all of the benefits of the paper product available in the electronic version. In the case of patent information it is quite clear that, until recently, the electronic information products had one major disadvantage compared with the paper products. While an online database offers really sophisticated methods and systems for retrieval, the references which are retrieved do not offer the drawings, diagrams and chemical structures which are such an important feature of the printed equivalent products. Many users of Derwent’s printed services, particularly engineers who use the printed products produced as part of the ‘Electrical Patents Index’ Service, actually scan the diagrams, rather than the text, in order to locate new patents which may be of interest to them.

With the availability of new optical storage media I believe that the numerous benefits of having textual information in close association with a meaningful diagram or drawing, which until now have been the preserve of printed products, will increasingly become available as an integral part of new electronic versions of these traditional products. My thesis is that the availability of new, or enhanced, information products with ‘images’ represents the next revolutionary step in the development of the information industry and that this will add new impetus to growth during the 1990s. This current decade will be dominated by technology used to distribute, store and retrieve information products with ‘pictures’.

In recent years there have been very significant advances in the basic technology necessary for these developments to become possible. I would now like to move on to discuss the most important of these advances.

CD-ROM

This is the best known, and most widely used, of the new media capable of holding large quantities of data. The primary reasons why CD-ROM has proved successful are almost certainly its ease of manufacture (both in terms of cost and speed), the relatively low cost of a CD-ROM drive, and most importantly, the fact that there is an international standard. The formal PC CD-ROM standard, developed by an ad-hoc industry group known as ‘High Sierra’, was approved by the International Standards Organisation (ISO) in 1987. By laying out the CD-ROM files in ISO 9660 format, a publisher ensures that the CD-ROM will play on any standard CD-drive attached to a computer with the appropriate software.

Recent figures indicate that the number of installed CD-ROM drives worldwide is increasing very rapidly:

Year    Installed drives (000s)
1988      171
1989      541
1990    1,300

This level of increase in interest in CD-ROM is clearly illustrated by the number of articles and advertisements which now appear in the popular PC magazines. The average PC user ‘in the street’ is now becoming familiar with the basic concepts of the technology and for this reason the forecast for 1992 is that six million drives will be in use worldwide. Furthermore the following PC manufacturers are currently committed to installing CD-ROM drives as an integral part of their hardware, which will widen the market yet further:

IBM: mainframes, RISC 6000, PS/2 95
NEC: PCs and portables
Apple: all computers
Sun: all workstations
DEC: all PCs
BULL: all PCs
ICL: all mainframes and PCs

Another important reason for the wide acceptance of CD-ROM relates to the price of the drives. Prices are falling very rapidly as a result of the benefits arising from mass manufacture. The movement in the average price for a CD-ROM drive over recent years is illustrated below:

Fig. 1. Price of CD-ROM Drives, 1987-92 (price axis unlabelled)

Ironically the price axis is not labelled: the prices could be expressed in $ or £ (or the Sterling equivalent in any European currency), and the figures would not change. However, this disparity in pricing is a wider subject outside the scope of this paper. The very low prices predicted in the near future reflect the fact that a CD-ROM drive will be an integral part of a microcomputer, and that the drive would simply be another mass-manufactured module.

The geographical breakdown of the current installed base of drives, illustrated in the pie chart below, is also interesting:

Fig. 2. Geographical breakdown of the installed base of CD-ROM drives (pie chart; legible labels: Asia; North America 56%)

Much of the Japanese growth has been in the manufacture of CD-ROM drives for use with entertainment products, e.g. computer games. This emphasis on the ‘out of office’ market was illustrated during a recent survey which Derwent conducted of its subscribers. The results revealed a very low number of CD-ROM drives in use in the offices of the major companies in Japan who subscribe to Derwent’s patents services. The survey asked for details of the equipment available to them for the storage of large quantities of data and showed that they have already moved on to use the larger capacity optical disks. This is rather ironic given that many of these organisations are the major manufacturers of CD-ROM drives.

My view is that, for all its advantages, CD-ROM is not in itself the total solution for the long term storage of patents information containing drawings. For example the existing Derwent backlog image collection, going back to 1975, would occupy over 300 CD-ROMs which, while an improvement on the equivalent on microfilm, would not provide the convenience and ease of use which is really required. I believe, as do many of the manufacturers of CD-ROMs and the CD retrieval companies, that the future of CD-ROM is as a distribution medium rather than as a storage medium for these particular kinds of applications. CD-ROM products will, of course, continue to flourish where the product can be put onto one disk, and we will undoubtedly see an explosion of new products and services of all kinds on CD-ROM. However, I would anticipate that the vast bulk of these new products will be aimed at the end user rather than at the information professional. It is also likely that collections of software will be distributed on CD-ROM since it is possible to pack a lot of files onto a single disk.

The advantages of CD-ROM for distribution are, perhaps, best illustrated by looking at its capacity to hold data and by comparing it with the cost of distribution of data over telecommunications networks. In simple terms, of the kind used by the Guinness Book of Records, a CD-ROM has the following characteristics:

• Equivalent to:
  - a 36 foot high pile of paper
  - 1 tonne of paper
  - 8 medium sized trees

• One CD-ROM can hold 2036 floppy disks

• There are 3.876 miles of track on a CD-ROM

• There are 21 752 000 000 bits on a CD-ROM

However, it is in the area of data distribution where CD-ROM really has its advantages, and this can be illustrated by direct comparison with data transmission over networks. The capacity of a CD-ROM is 703 Mbytes and to transmit this data at 1200 baud over a network would take 57 days, with the associated very high cost. If, however, an information producer wished to distribute this same volume of data to different customers around the globe this can be achieved at a price of about £4 per CD-ROM inclusive of the postage and packaging costs!
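The 57-day figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes 1 Mbyte = 2**20 bytes, 8 bits per byte with no start/stop or protocol overhead, and treats 1200 baud as 1200 bit/s. None of these assumptions is stated in the paper, and the function name is hypothetical.

```python
# Rough transfer-time estimate for one CD-ROM's worth of data over a serial line.
# Assumptions (not from the paper): 1 Mbyte = 2**20 bytes, 8 bits per byte,
# no framing or protocol overhead, and 1 baud = 1 bit/s.

CD_ROM_MBYTES = 703            # capacity quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(mbytes: float, bits_per_second: float) -> float:
    """Return the number of days needed to send `mbytes` at `bits_per_second`."""
    total_bits = mbytes * 2**20 * 8
    return total_bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Compare the quoted 1200 baud with two faster line speeds.
    for speed in (1200, 2400, 9600):
        days = transfer_days(CD_ROM_MBYTES, speed)
        print(f"{speed:>5} bit/s: {days:.1f} days")
```

Under these assumptions the 1200 bit/s case comes to just under 57 days, in line with the figure quoted above; adding start and stop bits would lengthen it further.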

My thesis is, therefore, that CD-ROM will be used to distribute large collections of patent information for uploading onto other devices. This then leaves the question of how this data will be stored for long-term archival use, and for this reason it is now important to move on to discuss other methods of mass storage.

Optical Disks and Associated Technology

There is now a wide variety of mass storage devices on the market, but not all are likely to become widely used or to have a long term future. The table below identifies some of these media and provides details of their storage capacity and the cost of the medium itself (not including any necessary devices). In order to put these storage capacities into context I have indicated how many of each storage medium would be required to hold the Derwent backlog collection of images of abstracts and drawings (1975-1990) of about 180 Gbytes.

Medium                                  Capacity      Cost          No. required to hold
                                                                    Derwent image collection
Floppy Disk (5.25”)                     0.36 Mbyte    £1            500,000
Bernoulli Box                           44 Mbyte      £90           4,090
Magnetic Tape                           160 Mbyte     [illegible]   1,200
IBM 3480 Tape Cartridge                 210 Mbyte     £4            860
Erasable Optical Disk (5.25”)           650 Mbyte     £22.5         330
CD-ROM                                  700 Mbyte     £5            300
WORM Disk (5.25”)                       940 Mbyte     £140          190
Helical-scan Tape Cartridge (DAT)       1200 Mbyte    £10           150
Helical-scan Tape Cartridge (Exabyte)   2300 Mbyte    £30           80
12” WORM Disk                           6500 Mbyte    £300          40

Of these alternatives it is very likely that WORM (‘write once read many times’) and Erasable (‘re-writable’) optical disks will provide the practical means of holding large collections of patent information in image format. The current problems in this area are the lack of standardisation, with each manufacturer adopting a different format, and the relatively high cost of the device to play the disk. However, a standard is likely to evolve - if only by ‘attrition’, with one manufacturer eventually dominating the market. The price of these storage devices is also likely to fall very rapidly, in line with the price reductions I outlined earlier for CD-ROM players.

The implication is, therefore, that CD-ROM will be used to distribute information products in image format with the data being uploaded onto a larger capacity optical disk. The producers of retrieval software will then be providing the capability to access and use the data from either a CD-ROM or an optical disk. Of course, not all users will have access to optical storage devices or need to move in this direction. However, the major users of patent information are the very large companies who very often already have such devices installed within their organisations. The advent of Document Image Processing (DIP) for the long term storage of documents has generated great interest in the Administrative Departments of many companies on the basis of saving storage space. It should be possible for patent and information departments, who would be relatively small users of such systems, to ally themselves with these departments in order to gain access to DIP systems and the benefits they offer.
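The ‘number required’ column in the table of storage media can be approximated by dividing the size of the collection by the capacity of each medium. The following sketch assumes that ‘about 180 Gbytes’ means 180,000 Mbytes and that every unit can be filled completely; the function name and dictionary are illustrative only. The printed figures are rounded, and for some rows they differ from this simple division, so the author’s own calculation presumably involved factors that the paper does not state.

```python
import math

# Size of the Derwent backlog image collection, taking 1 Gbyte = 1000 Mbyte
# (an assumption; the paper only says "about 180 Gbytes").
COLLECTION_MBYTES = 180 * 1000

# Capacities in Mbytes as listed in the table of storage media.
CAPACITY_MBYTES = {
    "Floppy Disk (5.25 inch)": 0.36,
    "Bernoulli Box": 44,
    "Magnetic Tape": 160,
    "IBM 3480 Tape Cartridge": 210,
    "Erasable Optical Disk (5.25 inch)": 650,
    "CD-ROM": 700,
    "WORM Disk (5.25 inch)": 940,
    "Helical-scan Tape Cartridge (DAT)": 1200,
    "Helical-scan Tape Cartridge (Exabyte)": 2300,
    "12 inch WORM Disk": 6500,
}


def units_required(capacity_mbytes: float,
                   collection_mbytes: float = COLLECTION_MBYTES) -> int:
    """Whole number of media units needed if each unit is filled completely."""
    return math.ceil(collection_mbytes / capacity_mbytes)


if __name__ == "__main__":
    for medium, capacity in CAPACITY_MBYTES.items():
        print(f"{medium:<40} {units_required(capacity):>8,}")
```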

Smaller companies will still continue to use CD-ROM until high capacity optical devices fall to an acceptable price level. CD-ROM sub-collections of patent data in subject areas where the organisation has the highest level of interest are ideal in these cases. Similarly full text CD-ROM products, like ESPACE, provide an ideal method of maintaining an archive of patent documents which can be printed on demand. After all the CD-ROMs occupy a very small amount of space and it is really not inconvenient to have to take the appropriate disk off the shelf and to load the disk containing the particular document you want to print.

However, neither CD-ROM nor optical disks provide the answer for the occasional user of patent information - who, whatever the price, is not going to invest in products providing local access to this data when this is only needed very infrequently. These users want to be able to access information products containing drawings, diagrams and chemical structures on demand, as the need arises. This then moves us onto the next topic - which is the availability of information products with images on online systems.

Online Databases with Images

Drawings and diagrams are a vital component of any information service covering intellectual property since at a glance a diagram, chemical structure or device can instantly provide a great deal of detailed information which is incapable of being adequately expressed in words.

For this reason the database producers who are most involved with intellectual property are also those who have made the biggest strides towards the provision of online databases containing images. Trademark databases, in particular, really are incomplete if you cannot immediately retrieve the trademark device or symbol as a result of your search. Two sister companies of Derwent, within The Thomson Corporation, are Thomson & Thomson and Compu-Mark. Both of these organisations have made considerable progress in developing production systems which incorporate advanced methods of image capture and storage, and in the provision of trademark images online. The online hosts have also made some advances in providing capabilities for transmitting images to an online terminal in real time, and in allowing images to be incorporated into offline prints.

DIALOG Information Services have provided a module within their DIALOGLINK communications software package which can recover trademark images as a result of a search of the Thomson & Thomson Trademarkscan database. DIALOG can also provide these trademark images as laser printed offline prints, with the ASCII text record shown merged with its associated design. QUESTEL are also working in this area and, during last year’s International Online Information Meeting in London, demonstrated software which can capture trademark images transmitted online.

After trademarks it is patent information which is clearly the next area where important developments can be expected. One component of the Derwent digitisation project involves ‘clipping’ out the drawings, diagrams and chemical structures from the scanned images of the Derwent abstracts. The resulting subcollection could then be loaded in parallel with the online ‘World Patents Index’ database to provide laser printed offline prints as ASCII text together with the associated bit mapped image. Naturally it would also be technically possible to transmit the image through the telecommunications network to the user. However, two remaining problems need to be solved before these options become a practical possibility. Firstly, the online hosts have to address the question of how to store these images. Using magnetic storage devices, which are currently used to hold text databases, to store these very large image collections is not practical in the long term. The hosts will need to invest in optical storage devices to make this really economic. The second area for development is out of the hands of both the online hosts and the database producers and concerns the speed of telecommunications. Current line speeds mean that the transmission of images is relatively slow. However, this is an area which is developing very rapidly and, with the advent of more advanced Integrated Services Digital Networks (ISDNs), very much faster line speeds can be expected. This development will revolutionise the use of online generally and will also make the transmission of large numbers of images a really practical possibility. In my view until it is possible to download an image in less than 10 seconds this method of delivery will not, however, have universal appeal.
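To give a feel for the 10-second threshold, the sketch below estimates how long one image would take to arrive at different line speeds. The image size is purely hypothetical (the paper gives no figure for a clipped drawing), the calculation ignores protocol overhead, and the 64,000 bit/s entry represents a single ISDN B-channel.

```python
# Download-time estimate for a single bit mapped image.
# HYPOTHETICAL_IMAGE_KBYTES is an assumed size for illustration only;
# the paper does not state how large a clipped drawing is.
HYPOTHETICAL_IMAGE_KBYTES = 50


def seconds_to_send(image_kbytes: float, bits_per_second: float) -> float:
    """Seconds needed to send the image, ignoring protocol overhead."""
    return image_kbytes * 1024 * 8 / bits_per_second


def min_speed_for(image_kbytes: float, max_seconds: float = 10.0) -> float:
    """Lowest line speed (bit/s) that delivers the image within max_seconds."""
    return image_kbytes * 1024 * 8 / max_seconds


if __name__ == "__main__":
    for speed in (1200, 2400, 9600, 64000):
        t = seconds_to_send(HYPOTHETICAL_IMAGE_KBYTES, speed)
        print(f"{speed:>6} bit/s: {t:7.1f} s")
    needed = min_speed_for(HYPOTHETICAL_IMAGE_KBYTES)
    print(f"Speed needed for a 10 s download: {needed:.0f} bit/s")
```

With the assumed image size, only the ISDN speed falls under the 10-second target; a different image size would move the threshold proportionally.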

In the interim period, and while telecommunications advance, most users are likely to opt for obtaining images incorporated into their search results as offline prints. These offline prints could then be sent by airmail or transmitted by fax to the user. There is nothing new about this method of working since, in the early days of online when line speeds were much lower than they are now, most people opted to receive their search results as offline prints by mail as the most economic method of delivery.

If intellectual property database producers make the breakthrough that I have described then other producers will not be far behind. However, I am a little surprised that producers of other information products do not yet seem to be taking much interest in these developments. After all there is absolutely no reason why an online database covering the journal literature in a particular subject area should not be considerably enhanced by adding the capacity to recover important diagrams, drawings, graphs, or charts from the original papers and documents.

Business databases could be significantly improved by adding graphs, charts and other graphics as images, and it is my view that these business producers will be the next to exploit these new developments in image technology.

This then brings me to explore the future, and to speculate on what information products might be available at the turn of the century.

The Shape of Things to Come . . .

Most of the developments which I am predicting are not as outlandish as they may sound, and are based on developments which have already taken place or on work currently underway. However, as with all predictions of the future, I have no doubt that when, in 10 years’ time, I rediscover what I have written for this meeting I will be amazed at just how naive these predictions were.

However, here are just a few of the things which I believe will be familiar to the patents searcher of 1999:

-Microcomputer-based front end software will allow online databases, CD-ROM products and optical disks containing patent information, with and without images, to be searched in total combination. It will use standard Graphical User Interface (GUI) functions common to many other software products. The microcomputer on which this will run will be at least ten times as powerful as current machines, and is likely to be UNIX based. The GUI will operate within a windows environment, which has the benefits of coping with a wide variety of screen and printer drivers. It will be possible to operate in different windows at the same time, i.e. to be online in one window and to be recovering images stored locally within another window simultaneously. Data will be able to be automatically transferred between the different windows using electronic ‘cut and paste’.

-CD-ROM will be used to distribute the full text of patent specifications or abstract journals, some of which currently appear only in print, on a weekly basis. Data will be transferred onto larger optical storage devices held locally or retained on CD-ROM as an archive. In larger organisations, this will be used to distribute the appropriate documents direct to the terminal on the desk of the chemist or engineer. Front end software on each terminal will present the ASCII text, in a choice of font styles, together with any associated graphic, mimicking the printed page. The end user will be able to browse the latest material for current awareness purposes and to archive interesting documents to an optical disk to create their own personal database.

-Compression techniques, based on advanced developments in Mandelbrot Theory, will allow data to be compressed 100 times more than is currently possible. This will mean that even greater collections of data will be able to be distributed and held on relatively small storage devices used locally.

-New optical storage media will have evolved which will hold Terabytes of data at a very reasonable price. (Currently Creo Products Inc., in Canada, are working on an optical tape player which uses ICI Image Data’s Digital Paper. One single reel of 35 mm optical tape will hold a Terabyte, or 1000 Gigabytes, of data. The initial use of this tape is for holding sensing data beamed from earth observation satellites).

-The major online patents databases will, as a matter of routine, provide access to drawings, diagrams or chemical structures associated with the online record. These images will be able to be downloaded at the high line speeds offered by new ISDNs. Alternatively, offline prints will be available to the user and will be delivered through high speed electronic mail services which utilise satellite communications, or through high speed fax delivery. Group 4 fax will be the standard, with CCITT Group 5 and 6 Standards having been agreed.

-Patent documents will continue to be made available in paper form. However, it will also be possible to obtain the document on an optical card the size of a credit card. This card will hold a very highly compressed form of the ASCII text, together with any associated drawings or chemical structures, all of which will be able to be read with a standard front end software package. The drawings on the card, if they have been created using computer aided design (CAD), will be able to be loaded into software packages capable of developing the image as a three-dimensional wire drawing. Chemical structures in the document will be capable of being loaded into three-dimensional molecular modelling packages. Other kinds of data (e.g. nucleic acid sequences) will be capable of being extracted and loaded into other applications used by scientists.

-Printed information bulletins will continue to be produced and read on a regular basis and will have an even wider audience. The printed information product still has the major advantages of being able to be browsed very quickly, which allows for ‘chance’ retrieval, and of being capable of being read anywhere at any time. Even the technology of the late 1990s will not have overcome these considerable benefits.