Survey of PCT search reports and the importance of the internet as a source of non-patent literature


Contents lists available at SciVerse ScienceDirect

World Patent Information

journal homepage: www.elsevier.com/locate/worpatin

World Patent Information 34 (2012) 112–123


Stephen Adams
Magister Ltd., Crown House, 231 Kings Road, Reading RG1 4LS, UK

Keywords: PCT application; International Search Authority; Search report; Non-patent literature; Internet; Field of search

E-mail address: [email protected].

0172-2190/$ – see front matter © 2012 Elsevier Ltd.
doi:10.1016/j.wpi.2012.01.003

Abstract

Previous studies of the internet as prior art in patentability searching have concentrated on the difficulty of establishing a date of publication and a stable form of citation. The current work examines whether the internet is actually contributing new prior art, or merely replicating non-patent literature which can be obtained by other means. A sample of PCT international applications published in 2007 provides some evidence that certain International Searching Authorities (ISAs) are more effective than others in locating and citing internet-based non-patent literature. The sample also reinforces the widespread perception that non-patent literature forms a higher proportion of total citations in certain technical fields. Some recommendations are made about bibliographic control of internet disclosures, and about the methods of citation in search reports which are most helpful for third parties wishing to locate the cited work.

© 2012 Elsevier Ltd. All rights reserved.

1. Background

One of the primary motivations for the current research work was the emergence of an EPO Board of Appeal decision, T1134/06 [1], in early 2007. The applicant was appealing against a decision by the Examining Division of the EPO to refuse their patent application. The application’s priority date was 28th July 2000 and the search report was completed on 12th November 2003. This search report cited an item of non-patent literature from the Wayback Machine (www.archive.org) which had been accessed on 6th November 2003 [2].

The patent application in suit disclosed a computer game based on stock trading, and it appeared from the cited web pages that a similar game had been accessible via this website before the priority date (and should therefore be considered part of the state of the art and included in the EPO’s assessment of patentability). However, the game itself was loaded in a password-protected part of the website, and was thus invisible to the web crawlers and not archived. The cited web pages consisted of help files and FAQ pages for the game.

The Board of Appeal held that there was insufficient evidence to support the view that the cited website had appeared in exactly the same form on the earlier date as it had been found in the Wayback Machine version. Although part of the Wayback URL is usually indicative of the point in time when the original website had been crawled for archiving (expressed as the string ‘YYYYMMDDHHMMSS’), there were certain other points which did not support this date. The Board was also concerned about the lack of completeness of the trawling processes employed by the Wayback Machine and noted that:

“…the archived web pages constitute no more than circumstantial evidence of the existence of a game with the properties and advantages described in the web pages at their archiving date, but do not disclose the game itself.”
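The Wayback URL convention mentioned above can be illustrated with a short Python sketch. It assumes the common form `http://web.archive.org/web/<14-digit timestamp>/<original URL>`, allows optional modifier letters after the timestamp (an assumption), and uses a hypothetical archived address. It only reads the crawl timestamp and compares it with a priority date; as the Board's reasoning shows, such a timestamp is at most circumstantial evidence, so this is a screening aid, not proof of publication.

```python
import re
from datetime import date, datetime

# Assumed form: .../web/YYYYMMDDHHMMSS[optional modifier]/original-url
WAYBACK_TS = re.compile(r"/web/(\d{14})[a-z_]*/")

def crawl_timestamp(wayback_url: str) -> datetime:
    """Return the crawl time encoded in a Wayback Machine URL."""
    match = WAYBACK_TS.search(wayback_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {wayback_url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")

def crawled_before(wayback_url: str, priority: date) -> bool:
    """True if the capture predates the priority date (not proof of publication)."""
    return crawl_timestamp(wayback_url).date() < priority

# Hypothetical archived URL; 28 July 2000 is the priority date in the case discussed.
url = "http://web.archive.org/web/20000615123456/http://www.example.com/game/faq.html"
print(crawl_timestamp(url))                     # 2000-06-15 12:34:56
print(crawled_before(url, date(2000, 7, 28)))   # True
```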

The decision also raised doubts about the general veracity of the internet as a source of prior art for patent applications, and stated that:

“The Board holds that absent directly applicable rules or guidelines, the criteria to be applied for establishing a disclosure made available to the public through the Internet as in the present case should be the same as those introduced by the jurisprudence of the Boards of Appeal for establishing a prior use or a prior oral disclosure.” [my emphasis].

In other words, the criteria for establishing a publication date when dealing with internet-based prior art should be rather stricter than when dealing with “conventional” publications.

Any doubts about publication date are of course a primary concern in establishing whether an internet disclosure is a valid part of the state of the art, and the Board highlighted this problem when they made the comment that:

“Web sites are updated and come and go with breathtaking speed. Linking between sites changes continuously. Internet and web are … relatively unregulated … [,] … allow unrestricted access in large areas [and] … are notoriously insecure. … It is thus at the present state of affairs often very difficult to establish with a high degree of reliability what exactly appeared on a web site and when.” [original emphasis].

The Board’s decision noted that a distinction could be made between “reputable” or “regulated” websites, wherein claimed publication dates could be treated at face value, and “websites of unknown reliability”, where supporting evidence would be required.

From the point of view of the information specialist, this decision could have profound consequences for the ease of use of the internet as a source of prior art. It could affect not only how the internet can be cited but also the manner of retrieval and storage of any relevant disclosures.

However, there is a dearth of substantive evidence concerning how the internet is contributing to the state of the art at the present time, let alone how this might be changing as patent offices pay increased attention to the non-patent literature (NPL). A few examples exist in the literature of studies on the use of NPL by patent offices, but these have usually been limited to either one patent office or a defined technical field [3,4]. The paper by Michel et al. [5] considered the question of variation between patent offices, but did not further analyse performance in relation to NPL, whilst the earlier paper by Baré [6] concentrated only on EPO searches. Previous work on PCT search reports has been carried out by Claus et al. [7]. The paper by Sternitzke [8] mentioned the importance of NPL citations, but only in the context of EPO searches, and did not distinguish internet citations from other NPL. Callaert [9] has conducted a study on non-patent references in US and EP search reports, but again has not explicitly considered internet citations.

Before examining the overall impact of the internet, it is important to distinguish between its different possible uses during patentability searches.

2. The internet: passive distribution medium, active new source, or both?

Many searchers will be familiar with electronic journals, which are essentially an accelerated distribution of traditional paper journals. In some instances, the publishers make the final form of papers available in electronic form several weeks or months earlier than their paper counterparts, but otherwise the content is identical. Under these circumstances, the internet is a passive means of storing and distributing the journal content, and the only point of contention may arise if the publication dates of the electronic and paper editions straddle the priority date of a patent application, making it desirable to cite the earlier of the two possible dates of public availability.

However, as the use of the internet for scientific and technical communication increases, there is a greater likelihood that it will become a primary source in its own right, i.e. the actual disclosures are not present in any other form. The simplest form of this deviation from conventional publishing is found in journals which make available some form of ‘enhanced content’ in their electronic versions, such as supplementary data packages, more sophisticated graphics or even entire articles which appear in the ‘web-only’ version of a journal. More serious challenges to the prior art searcher come in the form of open access (OA) journals, some of which publish only in electronic form and have no paper counterparts. Added to this is the ever-increasing number of institutional repositories, websites, blogs, wikis, discussion lists, mash-ups, social networking sites and other tools, most of which are, in the legal sense, ‘made available to the public’ and hence form part of the state of the art. Some of these contain explicit, consolidated disclosures (actual ‘documents’) whilst others contain adventitious disclosures in the form of (e.g.) press releases, individual web pages, ongoing work-sharing discussions, archived instant-messaging sessions or similar unencrypted text which may, en passant, reveal some of the knowledge held by the proverbial “person skilled in the art” at a fixed point in time: information which may be vital in later litigation [10]. Some examples of the new types of disclosures which we may encounter as a result of the widespread availability of the internet have been described in a paper by van Staveren [11].

The objective of this current work was to answer a number of distinct questions:

a) What proportion of a typical patent office search report currently consists of NPL, including material which is accessible via the internet (i.e. passive disclosures)?

b) Are there any significant differences in the proportion of NPL located by the different patent offices during their search work? (both discussed in section 5.1 below)

c) If differences do exist, do they reflect genuine variation in retrieval performance between patent offices and/or a bias in the significance of NPL in certain technical fields? (section 5.2 below)

d) What proportion of the NPL in a typical search report is only present on the internet, i.e. active disclosures? (section 5.3 below)

It is hoped that, if these questions can be answered, it will give both patent offices and patent information professionals some means of identifying which are the most important issues when developing policies for storing information on, or retrieving information from, the internet. In some quarters, scepticism remains concerning the ability of patent offices to locate the best prior art at all, especially when it is found in the NPL, so this is also an important issue from the point of view of confidence in the patent system itself [12].

3. Internet citation standards

Much of the prior work in relation to internet disclosures in patent search reports has concentrated on the issue of how to cite the disclosure accurately (i.e. relating to objective (a) above) rather than whether or not the internet is actually providing new material (objective (d) above). It is therefore appropriate to consider this as part of the current research.

In the wake of the T1134/06 decision, the EPO produced new guidelines on the use of the internet, which were incorporated into a revised edition of the formal Guidelines for Examination [13], together with a policy statement in the Official Journal [14] which reiterated many of the points raised by T1134/06. Some of the legal challenges surrounding internet citations, such as establishing proof of publication date, had earlier been raised by Archontopoulos in a paper in 2004 [15], and remain pertinent in the light of the later decision.

Table 2
Variant forms of citation.

No. | Citation | Form
1 | http://jis.sagepub.com/cgi/content/abstract/10/4/181 | Electronic availability, subscriber-only, pre-publication
2 | Oppenheim C. (1985) Journal of Information Science, 10(4), pp. 181–186 | Harvard citation standard for paper journal
3 | Oppenheim, Charles. Journal of Information Science 10(4) (1985): 181–186 | MLA citation standard for same journal
4 | http://www.doi.org/10.1177/016555158501000408 | Digital Object Identifier citation (without browser plug-in)
5 | DOI: 10.1177/016555158501000408 | Digital Object Identifier citation (with browser plug-in)

In addition, the EPO was asked at the 2009 European Patent Information Conference in Biarritz [16] to consider who bore the responsibility to guarantee the correctness of all NPL citations, including internet URLs, in their published applications and granted patents. The EPO response was to point out that the implementation of a formal time stamping and archiving mechanism would require further analysis, as this service would also involve data for which the copyright was not held by the EPO [17]. A similar suggestion had been made by Archontopoulos [15], who commented that:

“In the future, it is foreseeable that the use of time stamping authorities, sort of electronic notaries, will be more frequent for intellectual property witnessing and document authentication, especially if performed following international standards…”

The 2009 Biarritz statement also pointed out that the EPO had sought to improve the URL stability of some internet citations by adopting DOIs (Digital Object Identifiers) in the Espacenet search engine and for NPL citations in the Register in relation to oppositions and appeals, and was currently seeking to harmonise the rules for presentation of citations in both the Register and Espacenet.
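The two DOI citations in Table 2 (examples 4 and 5) differ only in presentation. A minimal Python sketch of how a citation database might reduce such variants to one comparable key is shown below. The prefix list is an assumption covering common forms, and lower-casing relies on DOI names being case-insensitive. The sketch does not check that the DOI actually resolves, which cannot be taken for granted.

```python
import re

# Assumed set of prefixes seen in citations; extend as needed.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.|www\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

def doi_key(citation: str) -> str:
    """Reduce a DOI citation to a bare, lower-cased DOI name for comparison."""
    bare = DOI_PREFIX.sub("", citation.strip())
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI citation: {citation!r}")
    # DOI names are case-insensitive, so lower-case them for matching.
    return bare.lower()

a = doi_key("http://www.doi.org/10.1177/016555158501000408")
b = doi_key("DOI: 10.1177/016555158501000408")
print(a == b, a)  # True 10.1177/016555158501000408
```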

Although it is undeniable that the internet has created a general improvement in the accessibility of many forms of information, this has often been at the expense of bibliographic control. Even experienced information professionals are not always clear about which data elements matter in providing a clear path to retrieve a document at a later date. This is particularly true when citing a URL. Many readers will have seen instances where an applicant wishes to refer to a known article from a journal which is available electronically within their organisation, and thinks that (by analogy with citing other website content) they can provide an adequate citation by simply cutting and pasting the URL from their browser window. In fact, these URLs frequently contain embedded access controls which mean that a non-subscriber will effectively find that the link is broken; at best, it provides access to an abstract, certainly not to the full text which the subscriber was viewing at the time of pasting the link. Table 1 shows how the same URLs give very different results in terms of what information is actually accessible to different users.

Quite apart from the extreme clumsiness of citing a URL as long as the examples given, it is clear that the link is effectively broken (as measured by whether it gives direct access to the information which the person intended to cite) for all users other than the original subscriber. Indeed, it is possible that by citing such URLs, some secure information may be divulged; note that in the second example, the string “&_user=” has been amended to remove the author’s ScienceDirect user number!
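Where a subscriber URL must be quoted at all, the user identifier can at least be removed mechanically before it is pasted. The Python sketch below drops named query parameters; treating `_user` and `md5` as the sensitive ones is an assumption based on the second URL in Table 1, not a documented ScienceDirect rule. Removing them does not make the link work for non-subscribers, so a conventional citation is still preferable.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of session- or user-specific parameters to strip.
SENSITIVE_PARAMS = {"_user", "md5"}

def redact_url(url: str, drop=SENSITIVE_PARAMS) -> str:
    """Return the URL with the named query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(redact_url("http://www.sciencedirect.com/science?_ob=MImg"
                 "&_user=0000000&_pii=S0172219009001409"))
# http://www.sciencedirect.com/science?_ob=MImg&_pii=S0172219009001409
```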

The phenomenon of broken links on the internet, and the consequent loss of usefulness of this method of providing a citation to prior publications, is not a problem unique to patent applicants or searchers. It has been estimated [18] that up to 81% of the links cited in Hansard (the official UK record of Parliamentary debates) in answer to parliamentary questions in the period 1997–2006 were broken, effectively leaving the request for information unanswered. The best performance during the same period was 56% of the links working up to 10 years after being cited, which still leaves nearly half effectively useless. Surveys by the National Archives [18] show, not surprisingly, that Members of Parliament were frustrated that they could no longer access the information which they had sought to elicit from the Government. Partly as a result of this work, the UK Government has issued best-practice recommendations in respect of public sector websites [19].

Table 1
URL citation to subscriber-only NPL.

URL | Result for subscriber | Result for non-subscriber
http://www.sciencedirect.com/science/article/B6V5D-4YB84H7-1/2/c516264056fff2bc26781950976627?&zone=raall | Full HTML text of article | Title, abstract and table of contents
http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V5D-4YB84H7-1-1&_cdi=5784&_user=0000000&_pii=S0172219009001409&_origin=na&_coverDate=12%2F31%2F2010&_sk=999679995&view=c&wchp=dGLzVtz-zSkzV&md5=8f433676404f83f5b1a829da703d6efc&ie=/sdarticle.pdf | Full PDF of article | Login screen to pay-per-view

Despite increasing awareness of the problem of instability of citations on the internet, it is becoming more common for there to be multiple forms of citation for a single item of information, including at least one ‘internet citation’. This is true partly because of the existence of multiple copies of the same item (e.g. copyright-free documents loaded on different websites, or plagiarised text reproduced without acknowledgement in different places) but also when the item has several ‘identities’ at different points in its life, or indeed variant forms of the same identity, as shown in Table 2. Whilst it may be very easy for the human-readable citations (examples 2 and 3) to be reconciled as essentially the same but merely re-formatted, computer-readable citations such as examples 1 and 4 need only vary by a single character to become unusable to at least some of the potential readership. Indeed, in the course of preparing this paper, it was discovered that the DOI in examples 4 and 5, which is supposed to be a permanent link, was itself broken.

As indicated above, it seems very unlikely that there are many instances in which the dates of public availability of different forms of the same disclosure straddle a priority date, leading to the situation that one disclosure may be novelty-destroying but a later one would not. Citations which utilise the standards for the paper format of a document are relatively stable, and certainly less sensitive to minor variation (e.g. a missing punctuation mark or even an entire missing element such as ‘end page’ does not render the citation useless). Therefore, I would suggest that patent offices should enforce a standard for both applicants and their own search examiners, requiring that whenever citations to multiple formats could be given, the preferred form is the one which enables access for the maximum number of users for the longest possible time, i.e. the conventional paper form. It should be optional to record the means of access (and/or date of access) to equivalent electronic forms, but these would only be considered if the date of disclosure was objected to by the applicant or a later litigant. Such an objection would only be reasonable if the cited disclosure date was very close to the priority date of the document, and there was reason to believe that one form of disclosure was made earlier than another.

Before turning to the analysis which forms the bulk of this paper, it is worth considering the EPO’s views (as cited above) on “reputable” websites and websites of “unknown reliability”. In the period after T1134/06 was issued, the epi made a suggestion that the EPO should consider setting up a formal website of deposit, with a verifiable date and time mechanism. Such a website would be regarded as a “reputable” source of prior art. However, in my opinion, it is unlikely that the existence of such a mechanism in 2003 would have avoided the T1134/06 dispute. Any depositary is only as good as the rules and procedures for deposit, and the people required to make deposits. It is extremely difficult to ensure that all authors are aware of the archiving policies of their institution or corporation. This can be seen in the statistics of deposit at many so-called Green Open Access sites; they frequently exhibit a surge in deposits when the site is opened and the backfile is loaded, but this often tails off in later years unless there are sustained efforts to incorporate the system into organisational memory and procedure [20].

This ad hoc approach works even less well when attempting to control the archiving of non-documentary material, such as web pages, blogs etc. Such material is, at the moment, rarely regarded by its authors as likely to form part of the state of the art in a patenting dispute, and is therefore seen as not worth archiving in any controlled manner. For example, I consider it highly unlikely that webmasters around the world could be persuaded regularly to deposit date-stamped copies of their websites into a formal EPO depositary, on the off-chance that they may disclose novelty-destroying subject matter. Commercial information providers, such as Research Disclosure or IP.com, already offer a mechanism for rapid deposit and entry into the state of the art, but their effectiveness also depends entirely upon the initiative of the author, who must take the decision that the material is worth depositing.

If we accept that it is, and will remain, extremely difficult to regulate all electronic disclosures in a manner which satisfies the legal definition of ‘public availability’, the best compromise is likely to be achieved by making an explicit distinction between the dual roles of the internet. First, whenever the internet is being used as only one of a number of different distribution media, decisions need to be made concerning which citation format will provide the best long-term information retrieval quality. Secondly, whenever the internet is a source of unique material, it is necessary to institute better control and archiving in order to minimise any ambiguity about the public availability of the material. Establishing such policies would clearly affect (at present) a very small proportion of the state of the art, and would inevitably be difficult and costly to achieve. However, if nothing is done, we face a future where there may be doubt about the usefulness of an increasing proportion of the theoretical state of the art.

At present, there is a mixture of policies and standards for use and citation of the internet, as will be seen in the following analysis, based on PCT search reports.

4. Sampling and data extraction

According to the statistics published by WIPO [21], the year 2007 saw production of a total of 150,075 newly published applications, in the form of WO-A1 or WO-A2 pamphlets. A substantial proportion of the WO-A2 applications were followed by a delayed search report (WO-A3) within the following 12 months. In order to test some of the hypotheses outlined above, a 1% sample (1500 documents) of the 2007 output was analysed during 2009–10, and certain information about the nature of their search reports was recorded.

In determining how to make the sample, it was obviously important to try to ensure that it should be representative both of the relative volumes of work done by each ISA, and across all technical fields in a way which represented the relative proportions of applications across all parts of the IPC. In an ideal world, it would have been useful to select records according to the ISA which had conducted the search, but this field is not searchable in any form of the PCT database.

With regard to stratifying the sample according to the IPC, it was considered that it would be unhelpful to try to bias the sample in this way. Part of the objective of the present work was to try to ascertain whether ISAs were in practice identifying NPL across all subject fields. It is impossible to ascertain objectively whether a search report returns zero NPL because (i) no NPL exists, or (ii) NPL exists but was not located by the ISA, or (iii) the NPL was not considered because the ISA located close prior art in the patent literature and closed the search. However, by locating examples where each ISA has been faced with a broad range of subject matter, it was hoped to obtain some clue to the performance of the ISA by comparing NPL proportions identified in the same subject field.

Given the lack of searchability of the ISA and the need to obtain an unbiased range of subject matter, it was decided to select the sample on the basis of random publication numbers. Before being allocated a publication number, each week’s production of PCT documents is sorted into alphanumeric order according to the international application number (PCT/CCYYYY/NNNNNN). Consequently, within any given week, a random selection of publication numbers will reflect the pro rata distribution across all Receiving Offices, and hence by extension across all nominated ISAs. Extending across a full year, the sample should also be a fair reflection of the relative proportions of applications filed in each section of the IPC, worldwide.

The sample was selected using the random number generator within Microsoft Excel, with limits set between 0 and 150,000. A total of 1500 entries were generated, rounded to the nearest integer and sorted into numerical order. This list then formed the basis of sampling by investigating the PDF version of the corresponding publication number WO 2007/nnnnnn from the WIPO PatentScope site.
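The selection step just described can be sketched in Python. Here `random.Random` with an arbitrary seed stands in for Excel's generator, so the numbers drawn will differ from the original sample, and the function names are illustrative. Duplicates produced by rounding are deliberately kept, because they are handled by the policies listed next.

```python
import random

def draw_sample(size=1500, upper=150_000, seed=None):
    """Draw `size` random numbers in [0, upper], round them and sort them.

    Duplicates from rounding are kept so they can be resolved later.
    """
    rng = random.Random(seed)
    return sorted(round(rng.uniform(0, upper)) for _ in range(size))

def wo_number(n: int) -> str:
    """Format an integer as a 2007 PCT publication number."""
    return f"WO 2007/{n:06d}"

sample = draw_sample(seed=2009)
print(len(sample), wo_number(sample[0]))
```

A draw that rounds to 0 would name a non-existent publication; under the policies below it would simply be moved forward to the next number with a search report.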

During the course of the analysis, the following policies were adopted:

a) If the selected publication was originally released as a WO-A2, the PatentScope record was examined to see whether a corresponding WO-A3 was available as of the date of the sampling. If so, the A3 document was used.

b) If no WO-A3 document was available, the selected publication number was eliminated from the sample and replaced by the next integer for which a search report was available (either as WO-A1 or WO-A3).

c) If the rounding process of the random numbers had resulted in two entries with the same integer, the first sample was taken as that integer and a second sample was taken from the next integer for which a search report was available, using the criteria for (a) and (b) above.
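Policies (b) and (c) amount to walking forward from each drawn number until an unused number with a search report is found. In this Python sketch, `has_report` is a hypothetical predicate standing in for the manual PatentScope check (a WO-A1, or a WO-A2 with a published WO-A3). The upper bound only prevents an endless loop, and moving a replacement past a later draw that it collides with is an assumption, since the text does not address that case.

```python
from typing import Callable, Iterable, List

def resolve_sample(drawn: Iterable[int],
                   has_report: Callable[[int], bool],
                   upper: int = 150_075) -> List[int]:
    """Apply policies (b) and (c): move each draw forward to the next
    integer that has a search report and has not already been used."""
    used = set()
    chosen = []
    for n in sorted(drawn):
        candidate = n
        while candidate in used or not has_report(candidate):
            candidate += 1
            if candidate > upper:
                raise ValueError(f"no usable publication at or after {n}")
        used.add(candidate)
        chosen.append(candidate)
    return chosen

# Toy example: publication 6 has no search report yet.
print(resolve_sample([5, 5, 7], has_report=lambda n: n != 6))  # [5, 7, 8]
```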

For each document sampled, the following items were extracted:

i) Publication number
ii) First-cited IPC to sub-class level
iii) The International Searching Authority (ISA) which conducted the search
iv) The total number of items in the search report
v) The total number of patents (or utility models) present in the search report
vi) The total number of non-patent literature (NPL) items in the search report
vii) The number of NPL items which were ‘internet-unique’ (defined below).

Table 3
Sources of 2007 search reports.

ISA | No. of reports in sample | Percentage within sample (rounded to 1 decimal place) | Overall 2007 percentage (a)
AT | 13 | 0.9% | 0.7%
AU | 29 | 1.9% | 1.8%
CA | 27 | 1.8% | 1.6%
CN | 48 | 3.2% | 3.5%
EP | 708 | 47.2% | 47.1%
ES | 10 | 0.7% | 0.7%
FI | 9 | 0.6% | 0.5%
JP | 250 | 16.7% | 16.5%
KR | 77 | 5.1% | 6.4%
RU | 9 | 0.6% | 0.5%
SE | 30 | 2.0% | 2.0%
US | 290 | 19.3% | 18.7%
Total | 1500 | 100.0% | 100.0%

(a) Data obtained from reference [21].
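The within-sample percentages in Table 3 follow directly from the sample counts. This short Python check recomputes them from the counts in the table only; it makes no attempt at a goodness-of-fit comparison with the WIPO column.

```python
# Sample counts per ISA, as reported in Table 3.
counts = {"AT": 13, "AU": 29, "CA": 27, "CN": 48, "EP": 708, "ES": 10,
          "FI": 9, "JP": 250, "KR": 77, "RU": 9, "SE": 30, "US": 290}

total = sum(counts.values())
share = {isa: round(100 * n / total, 1) for isa, n in counts.items()}

print(total)                                  # 1500
print(share["EP"], share["US"], share["JP"])  # 47.2 19.3 16.7
```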

Table 4
Gross statistics on search reports (all ISAs).

Measure | Patents (A) | NPL (B) | Total (A + B) | ‘Internet-unique’
Total number of items (all reports) | 6254 | 828 | 7082 | 107
Percent of each literature type (all reports) | 88.31% | 11.69% | 100.00% | 1.51%
Mean number of items per report | 4.17 | 0.55 | 4.72 | 0.07
Median number of items per report | 4 | 0 | 4 | 0
Modal number of items per report | 3 | 0 | 4 | 0
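The percentage and mean rows of Table 4 can be recomputed from the totals alone, as in the Python sketch below. The ‘internet-unique’ percentage is a share of all cited items (7082), not of the 828 NPL items. The median and modal rows need the per-report counts, which are not reproduced here.

```python
totals = {"patents": 6254, "npl": 828, "internet_unique": 107}
reports = 1500

all_items = totals["patents"] + totals["npl"]  # internet-unique items are a subset of NPL

percent = {k: round(100 * v / all_items, 2) for k, v in totals.items()}
mean_per_report = {k: round(v / reports, 2) for k, v in totals.items()}
mean_per_report["total"] = round(all_items / reports, 2)

print(all_items)        # 7082
print(percent)          # {'patents': 88.31, 'npl': 11.69, 'internet_unique': 1.51}
print(mean_per_report)  # {'patents': 4.17, 'npl': 0.55, 'internet_unique': 0.07, 'total': 4.72}
```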


Given data values for (vi) and (iv), the percentage of NPL in each search report was calculated in Excel and retained for further analysis.
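A per-report record covering items (i) to (vii), with the NPL percentage derived from (vi) and (iv), might look like the following Python sketch. The class and field names are illustrative (the original calculation was done in Excel). The consistency checks reflect that every cited item was counted either as a patent or as NPL, and that internet-unique items are a subset of NPL; treating a report with no citations as 0% NPL is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SearchReport:
    """One sampled search report (items i-vii); names are illustrative."""
    publication_number: str   # (i)
    ipc_subclass: str         # (ii) first-cited IPC, sub-class level
    isa: str                  # (iii) searching authority
    total_items: int          # (iv)
    patents: int              # (v) patents or utility models
    npl: int                  # (vi)
    internet_unique: int      # (vii) subset of the NPL items

    def __post_init__(self):
        if self.patents + self.npl != self.total_items:
            raise ValueError("patents + NPL must equal the total number of items")
        if not 0 <= self.internet_unique <= self.npl:
            raise ValueError("internet-unique items must be a subset of NPL")

    @property
    def npl_percent(self) -> float:
        """(vi) / (iv) * 100; a report with no citations counts as 0% (assumption)."""
        return 100 * self.npl / self.total_items if self.total_items else 0.0

# Hypothetical report, not taken from the sample.
r = SearchReport("WO 2007/000000", "G06F", "EP", total_items=5, patents=4, npl=1, internet_unique=1)
print(r.npl_percent)  # 20.0
```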

In determining whether an item of NPL was ‘internet-unique’, the basic criterion was whether the citation in the search report included any evidence which might indicate that the same document had been published in some medium other than via the internet. This included, for example,

• a 10- or 13-digit ISBN found in association with a book title or conference proceedings, or

• an ISSN, journal name, volume or pagination details in association with a journal item.

Whilst this method may be prone to accidentally counting some established open-access, internet-only journals (with ISSNs) as if they were conventional journals with print equivalents, the problem of identifying these does not seem to have arisen. In most instances, it was straightforward to judge whether a given citation was ‘internet-unique’ or not. Details of the items which were found are discussed later in this paper.
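The judgement described above was made manually, but the kinds of evidence it relied on can be illustrated with a few text patterns. The following Python sketch is hypothetical: the regular expressions are assumptions about how ISBNs, ISSNs, volumes and page ranges are typically written in citations, and a journal name on its own cannot be detected this way, so a human check would still be needed.

```python
import re

# Illustrative patterns for print-publication evidence in a citation string.
# These are assumptions, not the rule set used in the survey.
EVIDENCE_PATTERNS = [
    # 10- or 13-digit ISBN, allowing hyphens or spaces between digits
    re.compile(r"\bISBN(?:-1[03])?[\s:]*(?:\d[\s-]?){9,12}[\dX]", re.IGNORECASE),
    # ISSN in the usual NNNN-NNNC form
    re.compile(r"\bISSN[\s:]*\d{4}-\d{3}[\dX]", re.IGNORECASE),
    # volume details, e.g. "vol. 41" or "volume 41"
    re.compile(r"\bvol(?:ume)?\.?\s*\d+", re.IGNORECASE),
    # pagination, e.g. "p. 45" or "pp. 45-48"
    re.compile(r"\bpp?\.\s*\d+", re.IGNORECASE),
]


def looks_internet_unique(citation: str) -> bool:
    """True if no print-publication evidence is found in the citation text."""
    return not any(p.search(citation) for p in EVIDENCE_PATTERNS)


if __name__ == "__main__":
    print(looks_internet_unique(
        "The modelling of the biodiesel reaction. "
        "<http://journeytoforever.org/biofuel_library/macromodx.html>"))
    print(looks_internet_unique("Hypothetical Journal, ISSN 1234-5678"))
```

The ISSN and ISBN values in the usage lines are invented placeholders; the patterns check format only, not check digits.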

Before discussing the results, it is important to emphasise that a sample of this size makes it impossible to run formal statistical analyses. In many instances, sub-samples (for example, by ISA or IPC sub-class) consisted of fewer than 10 documents and were therefore open to statistical distortion. However, as noted above, since it is currently impossible to use the ISA as a searchable field or to categorise the contents of a search report by automatic means, each data item in the overall sample was extracted manually from the PDF version of the document. It is hard to see how this type of analysis can be extended to much larger samples, yielding enough data to determine statistically meaningful trends, until these issues have been addressed. Despite these shortcomings, the author believes that some interesting phenomena have been observed.

5. Results

5.1. Proportions of NPL retrieved and variations between ISAs

After applying the selection policies above, a total of 44 items out of 1500 in the sample (2.9%) had to be replaced by a new number due to (b) or (c) above. Although this is not a report on the timeliness of PCT searching as such, it is worth noting in passing that several documents which had been published as a WO-A2 during 2007 were still without a search report in 2010, over three years later, when most of the data extraction was carried out.

The first analysis was to determine whether a reasonable spread of International Searching Authorities (ISAs) had featured in the search reports. At the time, the Nordic Patent Institute was not yet functioning as an ISA, so it does not feature in the list. The overall results are given in Table 3, where the distribution is compared with the overall distribution for the entire year. This shows a close match, indicating that the sample is reasonably representative of the entire year’s content. Unfortunately, WIPO only records statistics on the number of applications filed broken down by selected ISAs, not on the actual annual number of searches conducted by each ISA, so it is impossible to carry out a chi-squared test or similar technique to measure the closeness of fit quantitatively. All that can be said in the light of Table 3 is that the random sampling method has apparently delivered a representative spread of work by all the ISAs.
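A purely descriptive version of the Table 3 comparison can still be made: the sketch below recomputes each ISA's share of the sample from the raw counts and subtracts the overall 2007 percentage. It is not a goodness-of-fit test (for the reasons given above); it simply locates the largest gap between sample and population shares.

```python
# Table 3: sample counts per ISA and WIPO's overall 2007 percentages.
SAMPLE = {"AT": 13, "AU": 29, "CA": 27, "CN": 48, "EP": 708, "ES": 10,
          "FI": 9, "JP": 250, "KR": 77, "RU": 9, "SE": 30, "US": 290}
OVERALL_2007 = {"AT": 0.7, "AU": 1.8, "CA": 1.6, "CN": 3.5, "EP": 47.1,
                "ES": 0.7, "FI": 0.5, "JP": 16.5, "KR": 6.4, "RU": 0.5,
                "SE": 2.0, "US": 18.7}


def share_deviations(sample, overall):
    """Sample share minus overall share, in percentage points, per ISA."""
    total = sum(sample.values())
    return {isa: 100 * n / total - overall[isa] for isa, n in sample.items()}


dev = share_deviations(SAMPLE, OVERALL_2007)
worst = max(dev, key=lambda isa: abs(dev[isa]))
print(worst, round(dev[worst], 1))  # KR -1.3
```

On these figures no ISA's share differs from its overall 2007 share by more than about 1.3 percentage points (KIPO is slightly under-represented).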

The first analysis of the composition of the search reports was a single figure aggregated across all ISAs. It became evident that outlying data points (mainly in the form of unusually large single search reports) had the potential to skew an arithmetic mean, so the mean, median and mode were all calculated for comparison purposes. These results are shown in Table 4.

In the Table, the number of citations to items which are only accessible via the internet (the so-called ‘internet-unique’ items in the final column) is a subset of the NPL data (column (B)).

These total figures are capable of two contrasting interpretations, one suggesting that the internet is under-performing and the other that it is making a reasonable contribution. Firstly, the absolute figures show that the internet is responsible for patent office examiners (or, in the case of applicant-supplied citations, the applicants’ pre-filing searches) locating less than 2% (107/7082) of the prior art, and that over 98% is still coming from established literature sources. The more optimistic conclusion, based on the ratios, is that the internet is revealing 107/828, or nearly 13%, of the total non-patent literature in search reports.
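The two readings differ only in the denominator. This short Python check recomputes both ratios from the Table 4 totals.

```python
# Totals from Table 4 (all ISAs, 1500 search reports).
patents, npl, internet_unique = 6254, 828, 107
total = patents + npl  # 7082 cited items in all

share_of_all = 100 * internet_unique / total  # pessimistic reading
share_of_npl = 100 * internet_unique / npl    # optimistic reading
print(f"{share_of_all:.2f}% of all citations, {share_of_npl:.2f}% of NPL")
# 1.51% of all citations, 12.92% of NPL
```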

The next analysis was directed towards understanding whether the ISAs differed in the proportion of NPL cited in their search reports. As for Table 4, this was initially carried out for all reports by a given ISA, across all technical fields.

At this point, it became clear that there are two distinct ‘averages’ which might be calculated during the analysis. One average (which will be referred to as the mean percentage NPL, or M1) is calculated by determining the percentage of NPL in each search report in the sample and taking the mean of all those values (Equation (1)). The other figure is derived by calculating the total NPL items across all reports and expressing this as a percentage of the total items of all types (percentage mean NPL, or M2, in Equation (2)).

$$M_1 = \frac{1}{n}\sum_{1}^{n}\frac{B_n}{A_n + B_n} \tag{1}$$

$$M_2 = \frac{\sum_{1}^{n} B_n}{\sum_{1}^{n}\left(A_n + B_n\right)} \tag{2}$$

Table 5. Relative proportions of patent and NPL retrieved, by ISA.

International Search Authority | Mean patent items per report | Mean NPL items per report | Mean percentage NPL (M1) | Percentage mean NPL (M2)
AT | 3.23 | 0.15 | 4.5% | 4.5%
AU | 5.24 | 1.76 | 19.6% | 25.0%
CA | 4.33 | 0.48 | 13.8% | 10.0%
CN | 4.35 | 0.15 | 4.8% | 3.2%
EP | 4.35 | 0.76 | 11.9% | 14.8%
ES | 3.80 | 0.40 | 10.0% | 9.5%
FI | 4.00 | 0.22 | 3.2% | 5.3%
JP | 4.55 | 0.43 | 6.1% | 8.6%
KR | 3.84 | 0.14 | 3.7% | 3.6%
RU | 4.00 | 0.56 | 13.3% | 12.2%
SE | 4.17 | 0.50 | 10.1% | 10.7%
US | 3.40 | 0.26 | 6.5% | 7.2%


In both Equations, $A_n$ = the number of patents in the nth search report, $B_n$ = the number of items of NPL in the nth search report, and $n$ = the number of search reports in the sample.
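Equations (1) and (2) translate directly into code. In the Python sketch below each report is represented as a hypothetical `(A_n, B_n)` pair, both results are scaled by 100 to give percentages as in Table 5, and an empty report (no cited items) is treated as an error; that last choice is an assumption, since the paper does not say how such reports would be handled.

```python
from typing import Sequence, Tuple

# One search report as (A_n, B_n): patents cited, NPL items cited.
Report = Tuple[int, int]


def _check(reports: Sequence[Report]) -> None:
    # Assumption: every sampled report cites at least one item.
    if not reports or any(a + b == 0 for a, b in reports):
        raise ValueError("need at least one report, each citing at least one item")


def m1(reports: Sequence[Report]) -> float:
    """Equation (1): the mean of the per-report NPL percentages."""
    _check(reports)
    return 100 * sum(b / (a + b) for a, b in reports) / len(reports)


def m2(reports: Sequence[Report]) -> float:
    """Equation (2): total NPL as a percentage of all cited items."""
    _check(reports)
    return 100 * sum(b for _, b in reports) / sum(a + b for a, b in reports)


sample = [(4, 0), (3, 1), (1, 1)]  # hypothetical three-report sample
print(m1(sample), m2(sample))  # 25.0 20.0
```

Applied to the pooled totals of Table 4 as a single "report", `m2([(6254, 828)])` reproduces the gross figure of 11.69%.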

The advantage of M1 is that it is not distorted by the presence of a small number of very large search reports (where A + B is large, irrespective of the size of B), whereas M2 will be. This effect is more pronounced with small sample sizes, such as some of the more detailed analyses below. Table 5 shows both M1 and M2, but Fig. 1 is based solely upon M2. Unless otherwise stated, the more detailed discussions below use the data for M1 only.
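The distortion can be seen with a small invented sub-sample. In the Python sketch below (hypothetical numbers, not survey data), nine ordinary reports each cite four patents and one NPL item; adding a single large report citing 40 patents and no NPL barely moves M1 but almost halves M2, because M2 weights every report by its size.

```python
def m1(reports):
    """Mean of per-report NPL percentages (Equation (1))."""
    return 100 * sum(b / (a + b) for a, b in reports) / len(reports)


def m2(reports):
    """Pooled NPL percentage (Equation (2))."""
    return 100 * sum(b for _, b in reports) / sum(a + b for a, b in reports)


# Hypothetical sub-sample: nine modest reports (4 patents + 1 NPL each) ...
typical = [(4, 1)] * 9
# ... plus one very large report citing 40 patents and no NPL.
with_outlier = typical + [(40, 0)]

for label, sample in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{label}: M1 = {m1(sample):.1f}%, M2 = {m2(sample):.1f}%")
# typical: M1 = 20.0%, M2 = 20.0%
# with outlier: M1 = 18.0%, M2 = 10.6%
```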

The implications of both Table 5 and Fig. 1 are that some ISAs are rather more ‘successful’ in locating NPL than others. It is no surprise that the EPO is consistently achieving over 10%, but the performances of AU, CA, ES, RU and SE are also above this level. There is some distortion due to small sample numbers, but it will be seen in later analysis that IP Australia in particular has an above-average performance.

Turning to a subject-based analysis, it should be noted that one of the prime motivators for the increase in coverage of NPL, particularly by the EPO, was the perception that in certain fast-developing subject areas the NPL was a particularly important aspect of the state of the art. This was first seen in the field of biotechnology, where fast-publishing journals and electronic pre-prints became the media of choice for disseminating research results. Accordingly, it has long been accepted that NPL is likely to form a more prominent component of search reports in some technical fields than in others. This research provided one mechanism for measuring this, since the first-cited IPC to sub-class level was recorded for all records in the sample.

Fig. 1. Analysis by ISA of percentage of NPL per search report (M2).

For each search report, the IPC to sub-class level and the ISA were both recorded. A cross-analysis using pivot tables in Excel revealed a number of interesting phenomena.
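The pivot-table step amounts to grouping per-report NPL percentages by (IPC class, ISA) and averaging each group, which gives an M1 value per cell. The Python sketch below reproduces that grouping with the standard library; the records are invented for illustration and are not taken from the sample.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-report records: (IPC class, ISA, % NPL in that report).
records = [
    ("C12", "EP", 60.0), ("C12", "EP", 50.0), ("C12", "US", 25.0),
    ("G06", "EP", 20.0), ("G06", "US", 0.0), ("G06", "US", 10.0),
]


def pivot_m1(rows):
    """Mean % NPL (M1) for every (IPC class, ISA) cell, like an Excel pivot."""
    cells = defaultdict(list)
    for ipc, isa, pct in rows:
        cells[(ipc, isa)].append(pct)
    return {key: mean(values) for key, values in cells.items()}


for (ipc, isa), value in sorted(pivot_m1(records).items()):
    print(f"{ipc} {isa}: {value:.2f}%")
```

Keeping the per-report percentages (rather than summed counts) in each cell is what makes the result an M1 rather than an M2 figure.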

The sample of 1500 search reports was spread across a total of 105 IPC classes. Of these, only 43 classes had a value of M1 greater than 0.00% (to 2 decimal places). The top 20 subject fields and their corresponding M1 values are shown in Table 6. It is interesting to observe that only the top 12 have an M1 greater than the gross M2 score of 11.69% reported in Table 4; the remaining 31 non-zero classes fall below this gross average. This tends to support the observation that the expected NPL content in search reports is highly skewed by subject, and not equally distributed across all technical fields.
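The cut-off against the gross average can be checked directly from the published figures. This Python snippet holds the Table 6 M1 values and counts how many exceed the gross M2 of 11.69% from Table 4.

```python
# Table 6 (top 20 IPC classes by M1, all ISAs) and the gross M2 from Table 4.
TOP20_M1 = {
    "C12": 55.65, "G21": 37.50, "G05": 24.65, "G10": 22.78, "C07": 21.02,
    "C25": 20.00, "C30": 16.67, "H03": 15.49, "A01": 13.27, "H04": 12.88,
    "A61": 12.81, "G01": 12.42, "G06": 11.62, "G03": 11.11, "C22": 11.11,
    "C05": 10.00, "B81": 10.00, "C04": 9.40, "A23": 7.21, "C02": 7.14,
}
GROSS_M2 = 11.69

ranked = sorted(TOP20_M1.items(), key=lambda kv: -kv[1])
above = [ipc for ipc, m1 in ranked if m1 > GROSS_M2]
print(len(above), above[-1])  # 12 G01
```

Class G06 (11.62%) is the first to fall just below the line.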

A further analysis of some of the top-ranked IPC classes reveals that by no means do all ISAs perform equally. For example, in biotechnology (Table 7, column 2) there were 6 ISAs which had conducted one or more searches in this field [22]; three of these (AU, JP and EP) were clustered in a range from just under 60% to just over 70% NPL per search report, but the US appears to be under-performing, in that it has only averaged just over 30% NPL in its reports.

For organic chemistry (Table 7, column 3), the picture is somewhat different. The performances of the EPO, the USPTO, SIPO and KIPO are within a fairly narrow range from 15 to 25%, but now the JPO comparatively under-performs, retrieving only 8% NPL on average in the relatively small number of reports which it completed.

In the field of computing, the differences are even more marked (Table 7, column 4). Despite conducting more search reports than any other ISA, the USPTO only averages just under half the average M1, and the JPO and KIPO are both even worse. In all of these examples, the EPO seems to be at or above the average performance for all ISAs.

It proved more difficult than anticipated to come to any conclusion about the relative performance of ISAs in retrieving NPL across a wide range of subject matter. Only 13 IPC classes had been searched by a majority (6 or more) of the ISAs, and only 9 of these classes showed any non-zero retrieval when all ISAs were taken together. The bulk result is shown in Table 8. Since the absolute number of searches conducted by any single ISA in a given class was small, the M1 score is prone to distortion; see, for example, the 100% record of the Spanish ISA in class A61 and the 83.33% record of Canada in class C07 (both based on a single report, n = 1 in Equation (1)).

The Table does seem to indicate that useful NPL disclosures are rarely found by any ISA in the subject fields of sports and games (A63), layered products (B32), printing (B41), vehicles (B60), dyes, paints and adhesives (C09) or basic electrical elements (H01). Conversely, a majority of ISAs found at least some NPL when searching organic chemistry and biotechnology (C07 and C12, as noted above in Table 7), measuring and testing (G01) and communications (H04).

Table 7. NPL retrieval measured by M1 score in a range of significant IPC classes, by ISA.

ISA | M1 (C12) | M1 (C07) | M1 (G06)
AU | 72.12% | 65.00% | 13.84%
CA | None | 83.33% | 45.00%
CN | 0.00% | 22.50% | 0.00%
EP | 58.95% | 20.89% | 26.09%
JP | 65.17% | 8.08% | 2.08%
KR | 100.00% | 20.00% | 0.00%
RU | None | None | 0.00%
SE | None | 0.00% | 0.00%
US | 30.95% | 16.67% | 6.62%

Table 6. Top 20 values of M1 across all ISAs.

Rank | IPC class | Technical field | Mean percentage NPL (M1) | Number of search reports
1 | C12 | Biochemistry; beer; spirits; wine; vinegar; microbiology; enzymology; mutation or genetic engineering | 55.65% | 50
2 | G21 | Nuclear physics; nuclear engineering | 37.50% | 2
3 | G05 | Controlling; regulating | 24.65% | 8
4 | G10 | Musical instruments; acoustics | 22.78% | 6
5 | C07 | Organic chemistry | 21.02% | 62
6 | C25 | Electrolytic or electrophoretic processes; apparatus therefor | 20.00% | 5
7 | C30 | Crystal growth | 16.67% | 2
8 | H03 | Basic electronic circuitry | 15.49% | 12
9 | A01 | Agriculture; forestry; animal husbandry; hunting; trapping; fishing | 13.27% | 23
10 | H04 | Electric communication technique | 12.88% | 162
11 | A61 | Medical or veterinary science; hygiene | 12.81% | 183
12 | G01 | Measuring; testing | 12.42% | 82
13 | G06 | Computing; calculating; counting | 11.62% | 120
14 | G03 | Photography; cinematography; analogous techniques using waves other than optical waves; electrography; holography | 11.11% | 9
15 | C22 | Metallurgy; ferrous or non-ferrous alloys; treatment of alloys or non-ferrous metals | 11.11% | 6
16 | C05 | Fertilisers; manufacture thereof | 10.00% | 2
17 | B81 | Micro-structural technology | 10.00% | 2
18 | C04 | Cements; concrete; artificial stone; ceramics; refractories | 9.40% | 7
19 | A23 | Foods or foodstuffs; their treatment, not covered by other classes | 7.21% | 16
20 | C02 | Treatment of water, waste water, sewage, or sludge | 7.14% | 7

As noted above, it would be inaccurate, if not invidious, to infer too many trends from these data due to the small sample size, but some points are worthy of note:

• The JPO is the only office which has located NPL in the field of sports and games (A63), whereas the EPO was the only office to locate NPL in basic electrical elements (H01).
• Of the tripartite offices (which performed over 80% of all the searches under scrutiny), the USPTO attains a worse-than-mean retrieval in 8 out of 9 non-zero subject areas, and better-than-mean in only one (G01); the EPO performs better-than-mean in 6 out of 9 subject areas, and worse-than-mean in the remaining three (including organic chemistry, C07); the JPO is intermediate, being better-than-mean in 3 subject areas (A63, C08 (polymers) and C12) but worse-than-mean in six.
• The searches from IP Australia did not always locate NPL in common subject areas, but on those occasions when they did (see columns for C07, C12, G01, G06 and H04 in Table 8), they invariably returned higher-than-average M1 scores.
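The better-than-mean and worse-than-mean counts for the tripartite offices can be re-derived from Table 8. The Python sketch below uses only the fully populated EP, US and JP rows and the ‘Mean M1’ row, restricted to the nine classes with a non-zero mean; it compares each office’s M1 with the class mean and lists the classes on either side.

```python
# Table 8 values (% M1) for the nine classes with a non-zero mean.
CLASSES = ["A61", "A63", "C07", "C08", "C12", "G01", "G06", "H01", "H04"]
MEAN_M1 = [20.20, 1.39, 29.56, 2.79, 54.53, 11.76, 10.29, 1.31, 16.03]
TRIPARTITE = {
    "EP": [13.30, 0.00, 20.89, 4.39, 58.95, 13.04, 26.09, 9.17, 22.75],
    "US": [10.37, 0.00, 16.67, 0.00, 30.95, 16.30, 5.62, 0.00, 3.60],
    "JP": [20.00, 12.50, 8.08, 6.81, 65.17, 1.52, 2.08, 0.00, 7.49],
}


def compare_to_mean(scores, means):
    """Return (classes above the mean, classes below the mean)."""
    above = [c for c, s, m in zip(CLASSES, scores, means) if s > m]
    below = [c for c, s, m in zip(CLASSES, scores, means) if s < m]
    return above, below


for isa, scores in TRIPARTITE.items():
    above, below = compare_to_mean(scores, MEAN_M1)
    print(f"{isa}: better in {len(above)} {above}, worse in {len(below)}")
```

The output matches the counts in the second bullet above: EP 6 better and 3 worse, US 1 better (G01) and 8 worse, JP 3 better and 6 worse.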

Table 8. Variation in NPL retrieval (M1 score) across common IPC classes (% to 2 decimal places).

ISA A61 A63 B32 B41 B60 C07
AT 0.00 0.00
AU 0.00 0.00 0.00 0.00 65.00
CA 0.00 0.00 0.00 83.33
CN 25.00 0.00 0.00 0.00 0.00 22.50
EP 13.30 0.00 0.00 0.00 0.00 20.89
ES 100.00 0.00
FI 0.00
JP 20.00 12.50 0.00 0.00 0.00 8.08
KR 0.00 0.00 0.00 20.00
RU 33.33
SE 0.00 0.00 0.00
US 10.37 0.00 0.00 0.00 0.00 16.67
Mean M1 20.20 1.39 29.56
EP − Mean −6.90 −1.39 −8.67
US − Mean −9.83 −1.39 −12.89
JP − Mean −0.20 11.11 −21.48

ISA C08 C09 C12 G01 G06 H01 H04
AT 25.00
AU 0.00 72.12 16.67 13.84 0.00 50.00
CA 25.00 45.00 0.00 0.00
CN 0.00 0.00 0.00 2.27
EP 4.39 0.00 58.95 13.04 26.09 9.17 22.75
ES 0.00
FI 0.00
JP 6.81 0.00 65.17 1.52 2.08 0.00 7.49
KR 8.33 0.00 100.00 0.00 0.00 0.00 5.88
RU 0.00 33.33 0.00 0.00 0.00
SE 0.00 0.00 0.00 43.30
US 0.00 0.00 30.95 16.30 5.62 0.00 3.60
Mean M1 2.79 54.53 11.76 10.29 1.31 16.03
EP − Mean 1.60 4.42 1.28 15.80 7.86 6.72
US − Mean −2.79 −23.58 4.54 −4.68 −1.31 −12.43
JP − Mean 4.02 10.64 −10.25 −8.21 −1.31 −8.54

5.2. Evidence of subject specialisation

The nature of the survey renders many phenomena subject to the vagaries of statistical method. For example, the only ISA to retrieve any NPL in technical field A63 was the JPO (Table 8, column 2); its performance appears impressive until it is realised that the numbers are based on only two search reports, which retrieved a total of 16 items (eight in each), of which 2 were NPL and both appeared in the same report, for WO 2007/086399. The two items consist of a German trade journal article (Schiff Hafen Kommandobrücke 41(8), (1989), 45–48) and a technical paper published by a US professional institute (Pap. Amer. Inst. Aeronaut. Astronaut. AIAA-93-0016 (1993)). Examination of the English-language translation of this application on entering the national phase in the United Kingdom (GB 2448261-A) indicates that neither of these NPL citations was provided by the applicant, so we can infer that they were located by the search examiner. Since neither item is Japanese-language NPL, this would have required good searching skills in languages other than the searcher’s mother tongue.

At the other extreme, the EPO’s record in technical field H01 (Table 8, column 12) is somewhat better attested. The ISA completed 38 search reports in this field, with a total of 21 items of NPL contained in 14 of the reports, and again none of the other ISAs which completed search reports in this technical field located any NPL at all. The details of the EPO reports are shown in Table 9. It is clear that a large proportion were identified either by searching full-text journals (Elsevier, North Holland etc.) or appropriate abstracting databases such as INSPEC. All the citations of NPL included an ISSN or similar which would have facilitated access to the hard-copy equivalents of these publications. It may also be significant that, although a large proportion of these cases were applications originating in the USA, the EPO had been nominated as the competent ISA.

Table 10. Citations of NPL in IPC class G01 by the USPTO (PCT Minimum Documentation in italics).

Publication number | Items of NPL in search report | % NPL in search report | Comments
WO 2007/016418 | 2 | 33% | J. Nanoparticle Res. (2001); Proc. Nat. Acad. Sci. (USA) (2001)
WO 2007/035676 | 1 | 50% | Clin. Cancer Res. (2005)
WO 2007/056160 | 1 | 33% | J. Biol. Chem. (2005)
WO 2007/062431 | 1 | 50% | General Technical Report NE-313, US Dept. of Agriculture (2004)
WO 2007/133215 | 7 | 78% | 2 @ Materials Science & Engineering (1998); 2 @ J. Eur. Ceramic Soc. (2000, 2001); Sensors & Actuators A (1996); NDT&E International (1996); Materials Characterization (2000)

The USPTO’s performance in G01 (measuring and testing) is worth examining further, since this is the single area in which this ISA performed better than average, whilst the other two Tripartite offices were either average or worse than average (Table 8, column 10).

The USPTO completed 15 search reports in G01; those citing NPL are shown in Table 10. The reports contained 69 items, of which 12 were NPL. Ten of the search reports contained no NPL at all, and 7 of the 12 NPL items were in a single report (WO 2007/133215). Seven of the 15 reports in class G01 were classified in sub-class G01N (‘investigating chemical or physical properties’, which covers analytical and diagnostic methods for food, drugs and biological materials), and of these, four featured at least one item of NPL in their search report; the only other sub-class citing NPL was G01B (‘measuring dimensions, angles, areas or surfaces’, which covers a wide variety of instrumentation).

Of the ten sources of NPL in these five reports, only two (in italics in Table 10) are found in the February 2010 listing of periodicals forming part of the PCT Minimum Documentation [23]. Locating the US Government report cited in the ’431 case would certainly require some ingenuity on the part of a person skilled in the subject field; there is no indication that the applicant cited this document, although they did refer to a number of other relatively unusual sources, such as ANSI and national trade association standards. However, apart from this report, there is no obvious reason why the USPTO has been more successful than other ISAs in locating NPL in this subject area; all of the other items are from well-established, widely available journals of recent date, not local or limited-availability items.

Table 9. Citations of NPL in IPC class H01 by the EPO.

Publication number | Items of NPL in search report | % NPL in search report | Comments
WO 2007/040774 | 1 | 20% | IEEE conference (1988) (XP10750696)
WO 2007/041113 | 3 | 38% | 2 @ J. Power Sources (2003) (XP4430304 and XP4430222); Electrochimica Acta (2002) (XP4391765)
WO 2007/049170 | 1 | 13% | IEEE Circuits and Devices Magazine (2004) (XP1192975)
WO 2007/052083 | 2 | 33% | 2 @ Applied Physics Letters (1997, 1998) (XP680570 and XP12021895)
WO 2007/059961 | 1 | 17% | Applied Physics Letters (2004) (XP1234739)
WO 2007/064334 | 2 | 29% | 2 @ Applied Physics Letters (1993, 1998) (XP394584 and XP12021001)
WO 2007/064507 | 2 | 25% | 2 @ IEEE conferences (1998, 2002) (XP10361991 and XP10677014)
WO 2007/087959 | 1 | 25% | Progress in Quantum Electronics (2006) (XP5280682)
WO 2007/107601 | 1 | 20% | IBM Technical Disclosure Bulletin (1989) (XP27009)
WO 2007/116362 | 1 | 20% | IBM Technical Disclosure Bulletin (1990) (XP123854)
WO 2007/127014 | 2 | 50% | J. Non-Crystalline Solids (2006) (XP5280521); J. Alloys and Compounds (2003) (XP4422774)
WO 2007/131659 | 1 | 10% | Science et Technique du Froid (2004) (XP962505)
WO 2007/134560 | 1 | 13% | Materials at High Temperatures (2005) (XP8083919)
WO 2007/149680 | 1 | 25% | IEEE Transactions (2006) (XP2464046)

5.3. The nature of ‘internet-unique’ citations

Table 11 provides a complete listing of the 51 search reports which cited one or more pieces of information which were considered to be ‘internet-unique’. The list has been sorted in order of IPC sub-class to highlight which subject areas seem to attract the most internet-unique NPL items. This confirms that sub-classes C12N (micro-organisms and enzymes), H04L (data transmission) and G06F (electronic data processing) feature highly not just in the generic NPL citation rate, but also in the internet-unique area; these three sub-classes contribute 58/107, or 54%, of the total internet-unique citations in the sample.

Table 11. ‘Internet-unique’ NPL references.

IPC sub-class | ISA | Publication No. | ‘Internet-unique’ items (no.)
A23L | US | WO2007/046937 | 1
(Sub-class total) | | | 1
A61C | US | WO2007/096824 | 1
(Sub-class total) | | | 1
A61K | EP | WO2007/051763 | 1
 | EP | WO2007/080263 | 1
 | EP | WO2007/113207 | 1
 | RU | WO2007/008111 | 1
 | US | WO2007/044989 | 1
(Sub-class total) | | | 5
A61Q | EP | WO2007/003915 | 2
(Sub-class total) | | | 2
B29C | EP | WO2007/024856 | 4
(Sub-class total) | | | 4
C07D | JP | WO2007/043400 | 1
(Sub-class total) | | | 1
C07K | EP | WO2007/044756 | 1
(Sub-class total) | | | 1
C10L | US | WO2007/142983 | 1
(Sub-class total) | | | 1
C12N | AU | WO2007/048189 | 13
 | EP | WO2007/048086 | 7
 | EP | WO2007/051483 | 2
 | EP | WO2007/147395 | 2
 | JP | WO2007/018047 | 1
 | JP | WO2007/114516 | 3
 | JP | WO2007/119515 | 2
(Sub-class total) | | | 30
C12Q | EP | WO2007/039232 | 6
 | EP | WO2007/060471 | 1
 | EP | WO2007/101664 | 1
(Sub-class total) | | | 8
E04C | AU | WO2007/131284 | 2
(Sub-class total) | | | 2
F02C | AU | WO2007/137370 | 3
(Sub-class total) | | | 3
G01N | EP | WO2007/000583 | 1
(Sub-class total) | | | 1
G01T | CA | WO2007/016783 | 1
(Sub-class total) | | | 1
G05B | EP | WO2007/096322 | 1
(Sub-class total) | | | 1
G06F | AU | WO2007/118271 | 3
 | EP | WO2007/068563 | 2
 | EP | WO2007/088100 | 2
 | EP | WO2007/089503 | 1
 | US | WO2007/076313 | 2
(Sub-class total) | | | 10
G06Q | AU | WO2007/065209 | 1
(Sub-class total) | | | 1
G06T | CA | WO2007/112557 | 2
 | EP | WO2007/023502 | 1
(Sub-class total) | | | 3
G09F | FI | WO2007/116117 | 2
(Sub-class total) | | | 2
G11B | EP | WO2007/063348 | 2
(Sub-class total) | | | 2
H03M | EP | WO2007/127360 | 2
(Sub-class total) | | | 2
H04H | SE | WO2007/132310 | 1
(Sub-class total) | | | 1
H04K | US | WO2007/078329 | 1
(Sub-class total) | | | 1
H04L | AU | WO2007/018476 | 1
 | EP | WO2007/036399 | 1
 | EP | WO2007/066299 | 1
 | EP | WO2007/082056 | 2
 | EP | WO2007/100809 | 2
 | EP | WO2007/130637 | 3
 | JP | WO2007/015482 | 3
 | SE | WO2007/007170 | 5
(Sub-class total) | | | 18
H04M | EP | WO2007/125025 | 1
(Sub-class total) | | | 1
H04Q | EP | WO2007/077436 | 2
 | EP | WO2007/133448 | 2
(Sub-class total) | | | 4

Fig. 2 and Fig. 3 show a breakdown of the internet-unique citations according to the ISA which cited them (across all technical fields) and by the general form or category of the disclosure being cited. The categories used in Fig. 3 are explained in more detail, with an example, in Table 12.

Fig. 2 confirms what has been observed from other analyses; namely, that the EPO is a significant performer in retrieving NPL of all types, and that (for this sample at least) IP Australia also performs impressively.

It is worth highlighting some aspects from the categories and examples in Table 12:

• Biological databank: Although these databank records often include a bibliographic reference to a corresponding journal or patent disclosure, the sequence data may not be retrievable in electronic form from these sources (e.g. the journal article may only include a printed version of the sequence). Therefore, the databank record is sufficiently distinct to be regarded as an ‘internet-unique’ source.

• Defensive publication: Although both Research Disclosure and IP.com are available as printed disclosure journals, they both have enhanced searchability in their electronic format via their own websites, as well as being partially covered by some electronic secondary services (e.g. CAS and Thomson DWPI), so they were treated as internet sources for the purposes of this survey. Searching on the XP number in Espacenet produces an additional accession number which could be used to locate the individual article in the IP.com Journal issue cited.

• Standard/draft standard: The cited URL is a typical example of the problem of citing from within search engines (discussed above); for non-subscribers, it does not link to the document but only to the login page of the IEEE Xplore digital library. See <http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6236> for the nearest corresponding link for a non-subscriber. In addition, given the ANSI number, it is possible to purchase a copy from the ANSI website, but the searcher only has access to a brief abstract to assist retrieval.

• Technical document: This example article had previously been published in CircuiTree Magazine by an employee of the company owning the Camtek.co.il website, and had been loaded as a facsimile reprint.
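The sub-class and ISA tallies quoted for Table 11 can be reproduced mechanically. The Python sketch below transcribes each row of Table 11 as (sub-class, ISA, items) and aggregates with `collections.Counter`; the per-ISA totals it prints are simply those implied by Table 11, since the values plotted in Fig. 2 are not given in the text.

```python
from collections import Counter

# (IPC sub-class, ISA, internet-unique items) for each report in Table 11.
TABLE_11 = [
    ("A23L", "US", 1), ("A61C", "US", 1),
    ("A61K", "EP", 1), ("A61K", "EP", 1), ("A61K", "EP", 1),
    ("A61K", "RU", 1), ("A61K", "US", 1),
    ("A61Q", "EP", 2), ("B29C", "EP", 4), ("C07D", "JP", 1),
    ("C07K", "EP", 1), ("C10L", "US", 1),
    ("C12N", "AU", 13), ("C12N", "EP", 7), ("C12N", "EP", 2), ("C12N", "EP", 2),
    ("C12N", "JP", 1), ("C12N", "JP", 3), ("C12N", "JP", 2),
    ("C12Q", "EP", 6), ("C12Q", "EP", 1), ("C12Q", "EP", 1),
    ("E04C", "AU", 2), ("F02C", "AU", 3), ("G01N", "EP", 1),
    ("G01T", "CA", 1), ("G05B", "EP", 1),
    ("G06F", "AU", 3), ("G06F", "EP", 2), ("G06F", "EP", 2),
    ("G06F", "EP", 1), ("G06F", "US", 2),
    ("G06Q", "AU", 1), ("G06T", "CA", 2), ("G06T", "EP", 1),
    ("G09F", "FI", 2), ("G11B", "EP", 2), ("H03M", "EP", 2),
    ("H04H", "SE", 1), ("H04K", "US", 1),
    ("H04L", "AU", 1), ("H04L", "EP", 1), ("H04L", "EP", 1), ("H04L", "EP", 2),
    ("H04L", "EP", 2), ("H04L", "EP", 3), ("H04L", "JP", 3), ("H04L", "SE", 5),
    ("H04M", "EP", 1), ("H04Q", "EP", 2), ("H04Q", "EP", 2),
]

by_subclass, by_isa = Counter(), Counter()
for subclass, isa, items in TABLE_11:
    by_subclass[subclass] += items
    by_isa[isa] += items

total = sum(by_subclass.values())
top3 = by_subclass["C12N"] + by_subclass["H04L"] + by_subclass["G06F"]
print(len(TABLE_11), total, top3, f"{100 * top3 / total:.0f}%")  # 51 107 58 54%
print(by_isa.most_common(2))  # [('EP', 55), ('AU', 23)]
```

The row count (51 reports), the grand total (107 items) and the 58/107 share all agree with the figures given in the text.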

As a general observation, it is also worth pointing out that 4 out of the 7 URLs in these examples did not lead to the document which had been cited in the original search report; this was often due to website restructuring or, in the case of the EMEA example (E-commerce information), relaunch of the website under a new domain name. The maximum interval between the date of citation in the search report and checking the link was less than 5 years, which does not bode well for the archival quality of such search reports in the future.

Fig. 2. Internet-unique citations by ISA.

Fig. 3. Internet-unique citations by category.

The breakdown in Fig. 3 shows that the various public databanks of genetic sequences (‘Biological databank’) form the largest single proportion of all internet-unique disclosures. This is not surprising, since these information sources have effectively grown up with the ‘internet age’ and, with a few exceptions, have not been reloaded onto existing commercial online hosts, so the internet is the most effective way of retrieving information from them.

Table 12. Categorisation of internet-unique citations.

Category | Detailed content | Example
Archived webpage | Record of website as recorded in Archive.org or similar backup service. | WO 2007/065209-A1: Cascading Style Sheet, Level 2, CSS2 Specification, W3C recommendation. <http://web.archive.org/web/20051013073706/www.w3.org/TR/REC-CSS2/cover.html#minitoc>
Biological databank | Individual record from genetic sequence sources such as GeneSeq, EMBL, NCBI. | WO 2007/147395-A3: “Solanum tuberosum disease resistance homolog gene, clone 32.” EBI accession number EMBL:U60077, nucleotides 430–444
Defensive publication | Dedicated electronic sources for deliberate public disclosure, e.g. Research Disclosure, IP.com. | WO 2007/036399-A1: “Mobile Transaktionsnummern” IP.com Journal 2003-07-23. XP13012161.
E-commerce information | Websites for sale of invention components, including product datasheets or factsheets, chemical catalogues etc. | WO 2007/051763-A1: “Equilis Prequenza Te: summary of product characteristics.” <http://www.emea.eu.int/vetdocs/vets/Epar/equilisPrequenzaTe/equilisPrequenzaTeM.htm> [broken link – closest corresponding document now at <http://www.ema.europa.eu/ema/index.jsp?curl=pages/medicines/veterinary/medicines/000095/vet_med_000118.jsp&murl=menus/medicines/medicines.jsp&mid=WC0b01ac058001fa1c>]
Meeting proceedings | References to, or documents from, technical meetings where there is no evidence to suggest that formal publication of proceedings has taken place. | WO 2007/096322-A3: “Silicon piezoresistive 6-DOF Micro Force-Moment Sensing Chip and Application to Fluid Dynamics.” <http://www.ritsumei.ac.jp/acd/cg/se/rt/mems/report/FirstSymposium/proc/Dzung-MicroForce.pdf>
Standard/draft standard | Complete standards which have been promulgated through the website of the standard-setting body, or public submissions by industry commenting on draft versions. | WO 2007/016783-A1: “American National Standard for Calibration and Use of Germanium Spectrometers for the Measurement of Gamma-Ray Emission Rates of Radionuclides.” ANSI N42.14-1999. <http://ieeexplore.ieee.org/ie15/6236/16663/00768889.pdf> [unusable link]
Technical document | Individual white papers, reports, presentation slides, Word documents or similar discrete disclosures available from within a website. | WO 2007/023502-A3: “Optimization and evaluation method for optical inspection systems”. <http://www.camtek.co.il/php/component/option,com_docman/task,doc_view/gid,4/Itemid,148/> [broken link – now available at <http://www.camtek.co.il/php/index.php?option=com_docman&task=cat_view&gid=92&Itemid=148>]
University repository/thesis | Reference to, or full text of, advanced theses from one or more universities, loaded into an institutional document repository. | WO 2007/112557-A1: “Interactive Volume-Rendering Techniques for Medical Data Visualization.” Dissertation, Institut für Technische Naturwissenschaften und Informatik. <http://www.cg.tuwien.at/research/publications/2001/Csebfalvi-thesis/Csebfalvi-thesis-PDF.pdf> [broken link – document cannot be located]
Webpage | Full text disclosures embedded within a website; content other than discrete documents. | WO 2007/142983-A3: “The modelling of the biodiesel reaction.” <http://journeytoforever.org/biofuel_library/macromodx.html>

The next most popular category is ‘Technical document’, which covers a wide range of disclosures, similar to the classical paper ‘grey literature’. Examples include white papers, archived PowerPoint presentations and reports in the form of Word or PDF documents; essentially, these are all items which are amenable to the creation of a structured bibliographic reference and storage in a separate database. One of the drawbacks of the present situation is that, even once a document of this type is retrieved, its date of public availability may be difficult to establish. Copyright statements, if present at all, only indicate the year of publication. Material in this category might be better controlled if an audited depositary server mechanism were available (such as that proposed by the epi), but as can be seen, this would only directly affect perhaps 21% (22/107) of internet-unique citations. University repository (especially thesis) servers already provide this service in theory for their specialist publications, but it seems likely that few other organisations yet recognise the possible impact of loading their ‘internal’ documents onto a website. Only a small minority utilise either university repositories or the formal defensive publication services such as Research Disclosure or IP.com (only 5 instances out of 107, or less than 5%, within these two categories).

One of the surprises in the analysis was the number of e-commerce sites which were cited. These form a special type of website, where information about products offered for sale may be accompanied by detailed specifications, photographs etc. describing the product. Details of the public availability of the disclosure are particularly difficult to establish, as products may be placed on sale and withdrawn from sale on a regular basis. One example is from WO 2007/080263-A1 (Pharmaceutical composition containing omega-3 fatty acids and a silica), where the EPO has cited XP002394475, a reference to the website <www.buyomegaprotein.com>, shown in Fig. 4. The exact citation referred to a sub-page providing certain product descriptions, which no longer exists; the most likely corresponding information is now on a companion website, <www.omegapure.com>.

Fig. 4. Extract from www.buyomegaprotein.com [accessed on 2011.08.19, © Omega Protein, 2011].

One example from the general Webpage category illustrates the danger of ‘accidental’ disclosures. Publication WO 2007/088100 is an application by IBM (Computer-implemented method, system, and program product for optimizing a distributed application). One of the seven named inventors is Lee Kang-Won, a Korean national resident in the US. The search report compiled by the EPO contains only one X-citation, given the reference XP002429930, with the URL <http://www.research.ibm.com/people/k/kangwon/publications/deployment_time_optimization.pdf>. This leads to a reprint of a paper from the IBM Journal of Research and Development, co-authored by the same Lee Kang-Won, entitled “Deployment time optimization of distributed applications.” A second, L-category citation in the search report corroborates the publication date of the paper (November 2005) as being before the priority date of the invention (February 2006).

6. Summary and conclusions

This analysis has confirmed the general anecdotal evidence that “NPL now constitutes up to 15% of the average patentability search report”; the gross finding in this survey is approximately 11% when measured across multiple searching offices. It has also provided some quantitative evidence of differences in NPL retrieval across technical fields (ranging from 7 to 55% amongst the highest-scoring examples). For the first time, differences in performance between 12 national or regional patent offices have been examined in detail, both at the gross level and in specific subject fields, in circumstances where all offices have access to the same information sources in their role as PCT ISAs.

A major finding is that the internet provides new material to the state of the art in only approximately 2% of all search report citations; in the majority of cases where a URL is given, the internet is acting as a distribution medium for conventionally published NPL that is available via multiple delivery platforms. In these latter cases, it would be preferable to enforce the use of a more conventional and stable citation form in search reports, which would help third parties to locate the cited references in the future. Where different disclosure forms may straddle a priority date, searchers should of course retain the electronic citation, but it need only be cited directly in the event of dispute; otherwise the reference should be cited in a form that helps to ensure long-term stability.
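
The question of whether an archived electronic disclosure falls before or after a priority date can be illustrated with the Wayback Machine citation in reference [2]. As noted in the Background, part of a Wayback URL usually encodes the crawl time as the string YYYYMMDDHHMMSS. The following Python sketch is a hypothetical helper, not a tool used in this survey; its function names, and the assumption that the first 14-digit segment after web.archive.org/web/ is the crawl time, are illustrative only. It extracts that timestamp and compares it with a priority date (28 July 2000 in T1134/06).

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical helper. Assumes a Wayback Machine URL of the form
# http://web.archive.org/web/<YYYYMMDDHHMMSS>/<original URL>
WAYBACK_TS = re.compile(r"web\.archive\.org/web/(\d{14})")


def capture_time(url: str) -> Optional[datetime]:
    """Return the crawl timestamp encoded in a Wayback URL, or None if absent."""
    match = WAYBACK_TS.search(url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def captured_before_priority(url: str, priority: date) -> Optional[bool]:
    """True if the capture date is earlier than the priority date.

    This only dates the archived copy; it does not show that the page
    looked the same, or was publicly accessible, on that date.
    Returns None when the URL carries no Wayback timestamp.
    """
    captured = capture_time(url)
    if captured is None:
        return None
    return captured.date() < priority


if __name__ == "__main__":
    # URL from reference [2]; priority date 28 July 2000 from T1134/06.
    url = ("http://web.archive.org/web/20000620174023/"
           "http://www.boersenspiel.de")
    print(capture_time(url))                                 # 2000-06-20 17:40:23
    print(captured_before_priority(url, date(2000, 7, 28)))  # True
```

A True result is at best circumstantial evidence: as the Board held in T1134/06, a crawl date does not establish that the page appeared in exactly the same form on that date. Returning None for URLs without a Wayback timestamp keeps such citations distinguishable from captures that post-date the priority date.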

In the minority of cases where the internet is acting as a genuinely new source, better bibliographic control and archiving policies are needed for the discrete documents, to ensure continued accessibility. Disclosures in the form of dynamic web pages are potentially very damaging to novelty, but are so far poorly understood or controlled; examples include not only internal wikis (none of which were cited in this study) but also rapidly updated websites such as e-commerce sites.


References

[1] Game system, game providing method, and information recording medium. European Patent Office Board of Appeal Decision T1134/06, 16 Jan 2007.

[2] FAZ Börsenspiel [Online], XP002260458. Retrieved from the Internet: URL: http://web.archive.org/web/20000620174023/http://www.boersenspiel.de [Retrieved on 06.11.11].

[3] Park J. Evolution of industry knowledge in the public domain: prior art searching for software patents. SCRIPT-ed 2005;2:47–70.

[4] Sampat BN. Examining patent examination: an analysis of examiner- and applicant-generated prior art. Working Paper, National Bureau of Economic Research (NBER) Summer Institute. Cambridge, MA: NBER; 2004.

[5] Michel J, Bettels B. Patent citation analysis: a closer look at the basic input data from patent search reports. Scientometrics 2001;51:185–201.

[6] Baré R. Results of a statistical study of the references cited in the search reports established by the EPO (January 1981). World Patent Information 1981;3:56–60.

[7] Claus P, Higham PA. Study of citations given in search reports of international patent applications published under the Patent Cooperation Treaty. World Patent Information 1982;4:105–9.

[8] Sternitzke C. Reducing uncertainty in the patent application procedure – insights from invalidating prior art in European Patent applications. World Patent Information 2009;31:48–53.

[9] Callaert J, Van Looy B, Verbeek A, Debackere K, Thijs B. Traces of prior art: an analysis of non-patent references found in patent documents. Scientometrics 2006;69:3–20.

[10] It should be noted that the author’s recent experience with a supposedly ‘closed’ discussion list has illustrated that personal details, such as the topics and content of discussions and e-mail addresses of the correspondents, can indeed ‘leak’ into the public internet as well.

[11] van Staveren M. Prior art searching on the Internet: further insights. World Patent Information 2009;31:54–6.

[12] Royal Society Working Group in Intellectual Property. Keeping science open: the effects of intellectual property policy on the conduct of science. Para. 3.28. London: The Royal Society; April 2003.

[13] Guidelines for Examination in the European Patent Office, April 2010 edition, Part B (Guidelines for search), Sections B-III, 2.5 and B-IV, 2.3 para. (vi) and (vii); Part C (Guidelines for Substantive Examination), Section C-IV, 6.2 and Part D (Guidelines for Opposition and Limitation/Revocation Procedures), Section D-V, 3.1.3.

[14] Notice from the European Patent Office concerning internet citations. Off. J. EPO 2009;32:456–462.

[15] Archontopoulos E. Prior art search tools on the Internet and legal status of results: a European Patent Office perspective. World Patent Information 2004;26:113–21.

[16] Actions resulting from the EPO Patent Information Conference 2009 in Biarritz, France: Status Report.

[17] The epi, the professional body representing European Patent Attorneys, had suggested that the EPO should consider establishing a new electronic repository, with a formal time/date-stamping system and safeguards against later changes, as a better alternative to the unverified archiving of the Wayback Machine.

[18] Spencer A, Sheridan J, Thomas D, Pullinger D. UK Government Web continuity: persisting access through aligning infrastructures. International Journal of Digital Curation 2009;4:107–24, and refs. therein to unpublished studies by the UK National Archives. Available on the internet at: www.ijdc.net/index.php/ijdc/article/view/106/81 [accessed 2011.02.03].

[19] Spencer A, Pullinger D. Managing URLs. Guidance Note TG125 v1.0, UK Central Office of Information, March 2009.

[20] See, for example, the activity statistics generated by the Registry of Open Access Repositories (ROAR). Available at: http://roar.eprints.org/

[21] Anon. The international patent system yearly review; developments and performance in 2007. Publication No. 901(E), section 3.6, p. 14. Geneva: WIPO; 2008. ISBN 978-92-805-1712-5.

[22] ISAs which had conducted no searches in a given subject area are marked as ‘None’ in the table, whereas ISAs which had conducted searches but found no NPL are given as ‘0.00%’.

[23] Anon. PCT Minimum Documentation. WIPO Handbook on Industrial Property Information and Documentation, Part 4.2, Appendix 2. Available at: http://www.wipo.int/standards/en/part_04.html [accessed on 17.08.11].

S.R. Adams is founder and managing director of Magister Ltd., an information and training consultancy specialising in patents documentation. He trained as a chemist at the University of Bristol, UK, followed by a Masters degree in Information Science at City University, London. He has worked in technical information since 1981, latterly with Zeneca Agrochemicals (now Syngenta) as their principal patent searcher until 1997. He has also been the editor of “International Packaging Abstracts”, a technical searcher in the Ministry of Agriculture, Fisheries & Food in the UK, and Chair of the Patent and Trade Mark Group.