An Introduction to the Patstat Database with Example Queries

Article in Australian Economic Review · March 2014


DOI: 10.1111/1467-8462.12073


The Australian Economic Review, vol. 47, no. 3, pp. 395–408

Data Survey

Gaétan de Rassenfosse, Hélène Dernis and Geert Boedt*

* de Rassenfosse: Melbourne Institute of Applied Economic and Social Research and Intellectual Property Research Institute of Australia, The University of Melbourne, Victoria 3010 Australia; Dernis: Organisation for Economic Co-operation and Development, 75775 Paris Cedex 16 France; Boedt: European Patent Office, Vienna 1030 Austria. Corresponding author: de Rassenfosse, email <gaetand@unimelb.edu.au>. The authors are grateful to Monica Coffano, Jérôme Danguy, Paul Jensen, Catalina Martínez, Clinton McCarthur, Nico Rasters and Gianluca Tarasconi for helpful comments. The article has also benefited from comments by participants at a staff development workshop at IP Australia, Canberra.

© 2014 The University of Melbourne, Melbourne Institute of Applied Economic and Social Research
Published by Wiley Publishing Asia Pty Ltd

Abstract

This article provides an introduction to the Patstat patent database. It offers guided examples of 10 popular queries that are relevant for research purposes and that cover the most important data tables. It is targeted at academic researchers and practitioners who are willing to learn the basics of the database.

1. Introduction

Empirical research on the economics and management of innovation is benefiting from greater availability of structured data. The most prominent database is certainly the European Patent Office’s (EPO’s) Worldwide Patent Statistical Database, henceforth ‘Patstat’. Patstat offers bibliographic patent data for more than 100 patent offices, sometimes as early as the nineteenth century. It is a valuable tool for the community of researchers because it contains raw data that are collected in a transparent manner. This rich database promises to greatly improve the quality of empirical research in the field. It is, however, difficult to navigate in the wealth of data it offers and many prospective users are deterred by its apparent complexity.

This article seeks to demystify Patstat and offers guided examples on a broad range of queries.1 It is assumed that the reader has a general knowledge of Structured Query Language (SQL).2 We have used the April 2013 edition of the database and rely on the MySQL language. Users of another dialect of SQL may have to slightly adapt the queries. Our guiding philosophy in creating the queries was to cover the most important tables and to exploit useful SQL commands. We devote particular attention to outlining some potential uses of the queries for research purposes, as well as explaining pitfalls of the data. The reader should refer to OECD (2009) for further guidelines for building and interpreting patent data.

In a desire to make this introduction accessible to the greatest number, we have produced a test database in MS Access format.
This database contains the relevant data used in this article, as well as all the queries. It allows readers to familiarise themselves with the Patstat database without having to install it on their computers. The test database and the queries are available at <http://www.runmycode.org> or upon request from the authors.

2. Patstat Cookbook

The Patstat database consists of a set of tables that follow a relational database schema, where tables can be connected to each other using a relevant entry key.3 The table on patent applications, labelled tls201_appln, contains more than 74 million records and is the central element of Patstat, as shown in Figure 1. The other tables contain information on each of the patent applications; for example, inventors and applicants, technology fields, titles and abstracts, publication instances and citations.

To limit the volume of data retrieval, we run our queries on a sample of patent applications describing inventions related to wind-turbine technologies and filed in the year 2005 anywhere in the world. Patent applications related to wind-turbine technologies are predominantly found in the International Patent Classification (IPC) sub-class F03D (Dubaric et al. 2011).4 This sub-class includes all the IPC codes that start with F03D such as ‘F03D 1/00’ (wind motors with rotation axis substantially in wind direction) and ‘F03D 5/02’ (wind-engaging parts being attached to endless chains or the like).

2.1 Identification of Patents by Technology Field

We use the term ‘application’ to refer to entries in table tls201_appln. This table lists all the applications available in the Patstat database and assigns them a unique and stable appln_id, which is built from a combination of the patent authority (the patent office where the application was submitted), the patent application number and the application kind code (indicating, for example, whether the application is a patent application, a Patent Cooperation Treaty (PCT) application or a design application). The application number is distinct from the application identifier. The application number is the number issued by the patent authority where the application was filed, whereas the application identifier is specific to the Patstat database. The latter is called a ‘primary key’ in SQL jargon. In Query 1, the appln_id is used to link table tls201_appln with table tls209_appln_ipc, which contains the IPC codes assigned to each application.

Figure 1 Patstat Database Schema

[Schema diagram: the central table TLS201_APPLN (Application) is linked to the groups of tables listed below.]

Classification: TLS209_APPLN_IPC, TLS222_APPLN_JP_CLASS, TLS224_APPLN_CPC
Title: TLS202_APPLN_TITLE
Abstract: TLS203_APPLN_ABSTR
Families: TLS218_DOCDB_FAM, TLS219_INPADOC_FAM
Legal status: TLS221_INPADOC_PRS
Priorities: TLS204_APPLN_PRIOR
Publication: TLS211_PAT_PUBLN
Citations: TLS212_CITATION, TLS214_NPL_PUBLN, TLS215_CITN_CATEG
Applicants and inventors: TLS206_PERSON, TLS207_PERS_APPLN, TLS208_DOC_STD_NMS

Note: Not all the tables are reported.
Source: European Patent Office, Patstat database, April 2013.
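
As a rough illustration of the link between tls201_appln and tls209_appln_ipc described in Subsection 2.1, the snippet below builds two toy tables in an in-memory SQLite database (through Python's sqlite3 module) and selects applications of kind 'A' or 'W' that were filed in 2005 and carry an IPC code starting with F03D. It is a sketch under stated assumptions, not the authors' Query 1: the column names appln_filing_date and ipc_class_symbol, the sample rows and the use of SQLite instead of MySQL are all assumptions made for the example.

```python
import sqlite3

# Toy stand-ins for two Patstat tables. The rows are invented, and the columns
# appln_filing_date and ipc_class_symbol are assumed names for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    appln_auth TEXT, appln_nr TEXT, appln_kind TEXT,
    appln_filing_date TEXT
);
CREATE TABLE tls209_appln_ipc (appln_id INTEGER, ipc_class_symbol TEXT);

INSERT INTO tls201_appln VALUES
    (1, 'DK', '200500031',  'A', '2005-01-20'),
    (2, 'US', '11111111',   'A', '2005-06-01'),
    (3, 'EP', '05000001',   'A', '2004-12-31'),
    (4, 'DK', '2005000046', 'W', '2005-02-02'),
    (5, 'DE', '102005001',  'U', '2005-03-03');
INSERT INTO tls209_appln_ipc VALUES
    (1, 'F03D   1/00'), (1, 'F03D   5/02'),  -- two F03D codes, one application
    (2, 'H02K   7/18'),                      -- not an F03D code
    (3, 'F03D   1/00'),                      -- filed in 2004
    (4, 'F03D   7/04'),
    (5, 'F03D   1/06');                      -- kind code other than 'A' or 'W'

-- A view keeps the selection reusable in later queries.
CREATE VIEW our_sample AS
SELECT DISTINCT t1.appln_id, t1.appln_auth, t1.appln_nr, t1.appln_kind
FROM tls201_appln AS t1
JOIN tls209_appln_ipc AS t2 ON t1.appln_id = t2.appln_id
WHERE t2.ipc_class_symbol LIKE 'F03D%'
  AND t1.appln_kind IN ('A', 'W')
  AND t1.appln_filing_date BETWEEN '2005-01-01' AND '2005-12-31';
""")

# Sorting is applied when reading the view, not inside its definition.
sample = conn.execute(
    "SELECT * FROM our_sample ORDER BY appln_auth, appln_id"
).fetchall()
print(sample)
```

Because application 1 carries two F03D codes in the toy data, dropping DISTINCT from the view would list it twice.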


The first statement extracts: the unique application identifier (appln_id); the two-letter code of the patent application authority (appln_auth); the patent application number (appln_nr) and the kind of application (appln_kind). We select patent applications by choosing applications with an appln_kind code of either ‘A’ (direct filing) or ‘W’ (PCT application, see Subsection 2.2). The SELECT DISTINCT clause is used to avoid duplicates in the result table in case a given patent application has more than one IPC code starting with F03D. The query returns 2,125 distinct patent applications and sorts them by appln_auth and appln_id. The first five results are presented in Table 1 for illustrative purposes. Note that the use of ORDER BY generally slows down queries and can usually be avoided.

The two-letter code shown in the appln_auth column in Table 1 corresponds to the receiving office: ‘AP’ refers to the African Regional Intellectual Property Organization (ARIPO) and ‘AR’ to Argentina’s National Institute of Industrial Property. The codes follow the World Intellectual Property Organization’s (WIPO’s) ST.3 format.5 Exceptionally, some codes in Patstat might not have a correspondence (for instance, if an applicant cites a patent document with a non-standard country code).

Table 1 First Five Rows of Query 1

appln_id   appln_auth   appln_nr     appln_kind
55286477   AP           200603687    A
55286499   AP           200603713    A
532990     AR           P050100289   A
533082     AR           P050100386   A
533175     AR           P050100493   A

The second statement creates a temporary table (a ‘view’ in SQL jargon) that is referred to as our_sample and contains the set of patents related to wind-turbine technologies as defined in the first statement. Views are particularly useful to break down queries into smaller, simpler pieces. Views cannot have indices, so they are better suited for small populations. Users of MS SQL should remove the ORDER BY keyword from the first query.

2.2 Identifying Patent Cooperation Treaty Applications

The PCT is an international patent law treaty that provides a unified procedure for filing patent applications to protect an invention in each of its contracting states. A patent application filed under the PCT is called an international application, or PCT application. These applications are often associated with inventions of high market potential (van Zeebroeck and van Pottelsberghe de la Potterie 2011) and are being used increasingly by patent applicants (Frietsch, Neuhäusler and Rothengatter 2013).6 Researchers sometimes use them to study the international dimension of patenting activity (see, for example, Allred and Park 2007). A PCT application does not automatically lead to global patent protection. Instead, patent applicants eventually need to ‘apply’ for patents in each of the jurisdictions where they wish to pursue patent protection by starting the national search and/or examination process. These ‘national’ patents are formally referred to as national-phase entry of PCT applications.

In Patstat, PCT applications at international phase can be identified in different ways. They are associated with an appln_kind code ‘W’ in table tls201_appln and they are associated with a publishing patent authority (publn_auth) that is set to ‘WO’ in table tls211_pat_publn. The two-letter code ‘WO’ stands for WIPO. National-phase entry of PCT applications can be identified with the field internat_appln_id in table tls201_appln, which corresponds to the appln_id of the PCT application (the field internat_appln_id is set to 0 for applications not originating from a PCT filing).
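
The two identification rules above can be tried out on toy data. The following snippet is a minimal sketch, assuming an in-memory SQLite database (through Python's sqlite3 module) and invented rows; only the field names appln_kind and internat_appln_id come from the text. It first lists international-phase applications and then pairs each of them with its national-phase entries through a self-join of tls201_appln.

```python
import sqlite3

# Toy version of tls201_appln; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    appln_auth TEXT, appln_nr TEXT, appln_kind TEXT,
    internat_appln_id INTEGER
);
INSERT INTO tls201_appln VALUES
    (10, 'DK', '2005000031', 'W', 0),   -- PCT application, international phase
    (11, 'DK', '2005000046', 'W', 0),   -- PCT application without national phase
    (20, 'CN', '200580001',  'A', 10),  -- national-phase entry of application 10
    (21, 'JP', '2006500001', 'A', 10),  -- national-phase entry of application 10
    (30, 'US', '11222333',   'A', 0);   -- direct filing, unrelated to any PCT
""")

# International-phase PCT applications carry appln_kind = 'W'.
pct = [row[0] for row in conn.execute(
    "SELECT appln_id FROM tls201_appln WHERE appln_kind = 'W' ORDER BY appln_id"
)]

# National-phase entries point back to the PCT application via internat_appln_id.
entries = conn.execute("""
    SELECT p.appln_id, n.appln_id, n.appln_auth
    FROM tls201_appln AS p
    JOIN tls201_appln AS n ON n.internat_appln_id = p.appln_id
    WHERE p.appln_kind = 'W'
    ORDER BY p.appln_id, n.appln_id
""").fetchall()
print(pct, entries)
```

Adding conditions on p.appln_auth and n.appln_auth would restrict the match to particular receiving offices and offices of destination.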


The query below lists all PCT applications that entered the national phase either at the State Intellectual Property Office of the People’s Republic of China (SIPO) or at the Japan Patent Office (JPO) and for which the Danish Patent and Trademark Office (DKPTO) was the receiving office (that is, application authority) of the initial PCT application.

The statement relies on patents in our_sample. It literally selects all the applications submitted at the SIPO or the JPO that have their internat_appln_id equal to the appln_id of PCT patent applications in our_sample that were submitted at the DKPTO. The first five results (out of a total of 15 national-phase entries) are presented in Table 2.

Table 2 First Five Rows of Query 2

PCT_appln_id   PCT_appln_auth   PCT_appln_nr   appln_kind   appln_id_sf   appln_auth_sf
15563101       DK               2005000031     W            8300709       CN
15563116       DK               2005000046     W            8300768       CN
15563118       DK               2005000048     W            8300756       CN
15563246       DK               2005000181     W            8306357       CN
15563258       DK               2005000193     W            39635652      JP

2.3 Obtaining Information on Priority Status

A priority patent application is the first patent application that was filed to protect an invention. Under the 1883 Paris convention, a priority patent can be filed in other jurisdictions, with the aim of extending the protection to other countries. The subsequent patents are called ‘second filings’.

The priority status of patent documents is an important piece of information. First, the count of patent applications described in Query 1 may lead to multiple counts of inventions since it mixes both priority and second filings. Counting unique inventions involves counting only priority filings.7 de Rassenfosse et al. (2013) explain the details of such a ‘worldwide count of priority filings’. The issue of double counting becomes less acute if patents are counted at a single office of reference such as at the EPO.8 Second, it may be desirable to know the priority status of the patent document in order to avoid potential selection bias, especially when patents are counted at a single office of reference. de Rassenfosse, Schoen and Wastyn (2014) explain that the single-office count may produce biased econometric estimates of patent production functions.9 They propose a test based on the priority status of the patent application to detect the presence of selection bias.

The query below returns the priority status of the patent documents in our set.

This statement selects every appln_id from our_sample dataset and matches them to appln_id provided in table tls204_appln_prior, which lists priority patents claimed in second filings. By definition, all patent applications that do not claim a priority are priority filings. Therefore, the column labelled is_a_pf takes the value 1 if no match is found. Note
that, contrary to previous queries, tables are linked together using the LEFT OUTER JOIN statement. This join returns all rows from the left table (t1) and adds information from the right-hand-side table (t2) when a match exists. Note also that second filings may claim more than one priority filing in table tls204_appln_prior; hence, the use of the DISTINCT clause. Query 3 reports 957 priority applications out of 2,125 patent applications originally identified in our_sample. The first five records are presented in Table 3 for illustrative purposes.

Table 3 First Five Rows of Query 3

appln_id   is_a_pf
65303      0
133780     0
149552     1
151084     0
151176     0

2.4 Computing the Patent-Family Size

A patent family refers to a group of patent applications that are all related to each other by way of one or several common priority filings. Following Putnam (1996), researchers use information on patent families as a proxy for patent value. The validity of this approach was established by Harhoff, Scherer and Vopel (2003), who show that family size is correlated with estimates of the value of patent rights from a survey of patent-holders. The family size is an internationally comparable measure of value and is thus well suited for studies relying on patent applications that are filed in different jurisdictions.

The next query counts the patent-family size associated with the applications in our_sample. We adopt the ‘extended’ family definition (International Patent Documentation Center (INPADOC)), which captures all applications that are directly or indirectly linked via priority filings. An alternative approach involves using the DOCDB family, available in table tls218_docdb_fam. Various definitions of (and hence ways to measure) patent families exist and a good overview is provided in OECD (2009) and Martínez (2011).

Notice that Query 4 calls table tls219_inpadoc_fam twice, under the aliases t2 and t3: t2 links each appln_id from our_sample to its patent-family identifier inpadoc_family_id and is in turn linked to t3 to retrieve and count all family members (t3.appln_id) that belong to the same inpadoc_family_id. The first five rows are presented in Table 4.

Table 4 First Five Rows of Query 4

appln_id   family_size
65303      9
133780     4
149552     14
151084     13
151176     9

Researchers are sometimes interested in the number of jurisdictions that the family covers. For example, the OECD produces an indicator on triadic patent families, which captures patents granted by the US Patent and Trademark Office (USPTO) and filed at the EPO and the JPO to protect the same set of inventions (Dernis and Khan 2004). de Rassenfosse and van Pottelsberghe de la Potterie (2009) show that triadic patents are a good indicator of countries’ research productivity (compared with priority filings, which are affected by variations in the propensity to patent across countries). Information on how to identify triadic patents in Patstat is provided in Appendix 1. Another family-based indicator is obtained by simply counting the number of jurisdictions identified in a family; we call it here the ‘geographic’ family size (see also Squicciarini, Dernis and Criscuolo 2013). Query 4 can be easily adapted to measure the geographic family size, as illustrated in Query 5.

Compared with Query 4, Query 5 uses information from an additional table, tls211_pat_publn, to recover information on the patent offices of destination (publication authorities) of all INPADOC family members and excludes the PCT publication authority (WO) as it has an international coverage.10 The first five results are presented in Table 5. A comparison with the results that are presented in Table 4 suggests that large differences may exist between the two measures of family size. For example, while the family associated with appln_id number 65303 has nine members, it covers only four jurisdictions: Germany, members of the European Patent Convention (through the EPO), the United States and China. There are various reasons why the family size may differ from the geographic family size such as procedural reasons (unity of invention requirement or maximum number of independent claims) and patent strategy reasons (for example, creation of patent thickets).

Table 5 First Five Rows of Query 5

appln_id   geog_family_size
65303      4
133780     4
149552     12
151084     11
151176     8

Note that Query 5 reports the number of distinct patent offices and not the number of distinct countries per se. This distinction matters when patents are filed at regional offices, such as the ARIPO or the EPO, which cover many jurisdictions. Patents granted by a regional office must be validated in each of the member states where patent protection is sought. As a result, while a patent application at the EPO virtually covers a market of approximately 500 million people, its actual coverage could be much smaller depending on the countries in which the patent was validated. One way of dealing with the issue involves adding information on the number of jurisdictions in which regional applications were validated after the patent was granted at the regional office considered. This can be done with Patstat but we will not discuss it as it far exceeds the scope of this article.11 Note also that patents listed in our_sample may belong to the same family and further consolidation may be envisaged to control for double counting.

2.5 Counting Patents by Country (Simple Counts versus Fractional Counts)

Patent data provide information on inventors and applicants and thus are a rich source of information about the structure of technology production. Briefly, the inventor country of residence reflects the country of origin of inventions, whereas the applicant country of residence reflects the ownership of inventions. OECD (2009) provides a comprehensive discussion on the choice of the reference country for building patent counts. Two distinct counting approaches can be applied in response to specific analytical requirements: simple count method versus fractional count method. Since a large number of patent applications are due to teamwork, it is likely that more than one inventor has contributed to the protected invention, located in one or several countries. Similarly, several applicants may co-own a unique patent. The fractional count procedure is used to better reflect the contribution of each country and avoid multiple counts of the same patent in different countries.

The list of inventors (applicants) can be identified using two additional tables: tls207_pers_appln lists the correspondence between patent application and inventors (applicants) and tls206_person provides details on names and addresses. The person_id identifier enables one to establish the link between these two tables. Note that not all patent documents listed in tls201_appln have an entry in tls207_pers_appln.
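
To make the difference between the two counting methods concrete, the snippet below computes both on toy versions of tls206_person and tls207_pers_appln. It is a hedged sketch rather than the authors' query: it runs in an in-memory SQLite database through Python's sqlite3 module, the rows are invented, inventors are assumed to be the records with invt_seq_nr greater than 0, and applications without any inventor record are simply ignored.

```python
import sqlite3

# Toy tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_ctry_code TEXT);
CREATE TABLE tls207_pers_appln (
    person_id INTEGER, appln_id INTEGER,
    invt_seq_nr INTEGER, applt_seq_nr INTEGER
);
INSERT INTO tls206_person VALUES
    (1, 'DE'), (2, 'DE'), (3, 'DE'), (4, 'CH'), (5, 'DE');
INSERT INTO tls207_pers_appln VALUES
    (1, 100, 1, 0), (2, 100, 2, 0),                  -- 100: two German inventors
    (3, 101, 1, 0), (2, 101, 2, 0), (1, 101, 3, 0),  -- 101: three German inventors
    (4, 101, 4, 0),                                  -- 101: one Swiss inventor
    (5, 101, 0, 1);                                  -- applicant only, not counted
""")

# Simple count: a patent counts once in every country where it has an inventor.
simple = conn.execute("""
    SELECT p.person_ctry_code, COUNT(DISTINCT pa.appln_id)
    FROM tls207_pers_appln AS pa
    JOIN tls206_person AS p ON p.person_id = pa.person_id
    WHERE pa.invt_seq_nr > 0
    GROUP BY p.person_ctry_code
    ORDER BY p.person_ctry_code
""").fetchall()

# Fractional count: each inventor contributes 1 / (inventors on the patent).
fractional = conn.execute("""
    SELECT p.person_ctry_code, SUM(1.0 / n.tot_in_patent) AS fractional_count
    FROM tls207_pers_appln AS pa
    JOIN tls206_person AS p ON p.person_id = pa.person_id
    JOIN (SELECT appln_id, COUNT(*) AS tot_in_patent
          FROM tls207_pers_appln
          WHERE invt_seq_nr > 0
          GROUP BY appln_id) AS n ON n.appln_id = pa.appln_id
    WHERE pa.invt_seq_nr > 0
    GROUP BY p.person_ctry_code
    ORDER BY p.person_ctry_code
""").fetchall()
print(simple, fractional)
```

In this toy example the simple counts add up to three for two patents, whereas the fractional counts add up to exactly two.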


The query below performs a fractional count of inventors’ country of residence for patent applications in our_sample. Inventors have a field invt_seq_nr greater than 0 in table tls207_pers_appln, while applicants have a field applt_seq_nr greater than 0.

The above script is more advanced than previous scripts as it is composed of embedded queries providing intermediary counts for facilitating fractional counts by country. (It is possible to break it into smaller statements using VIEWS.) The aggregated counts by country are performed on a selection of fields (named our_sample_with_country), extracted using our_sample table and two sub-queries. Sub-query t1 reports the count of inventors by country and by patent and sub-query t2 reports the total number of inventors by patent. Output from t1 and t2 is then linked to patents in our_sample by using a LEFT OUTER JOIN statement to account for missing records in tls207_pers_appln table. MySQL function ‘ifnull()’ replaces the missing records with an empty record and sets the count to 1 where records are missing (either because the appln_id was not found in table tls207_pers_appln or because no person_id was identified for invt_seq_nr greater than 0). Users of MS SQL should use the ‘isnull()’ function instead. They should also specify that the final count needs to be a float by using (CONVERT(float, COUNT(b.person_id)) AS tot_in_ctry) in query t1. Previews of results for the field selection (our_sample_with_country) and the final count are presented in Tables 6 and 7, respectively.

Table 6 Five Randomly Selected Rows of Field Selection of Query 6 (Joined from t1 and t2)

appln_id   person_ctry_code   tot_in_ctry   tot_in_patent
263066     DE                 2             2
273390     CH                 1             4
273390     DE                 3             4
273768     JP                 1             1
273769     JP                 1             1

Table 7 First Five Rows of Query 6 (Fractional Count by Applicant Country)

person_ctry_code   fractional_count
– (a)              609.5
DE                 357.2
US                 248.0
CN                 155.8
DK                 113.5

Note: (a) The 609.5 figure represents the total fraction of inventors for which no country code is available.

Table 6 shows that all the inventors of appln_id 263066 are German. By contrast, one-fourth of the inventors of appln_id 273390 are Swiss and three-fourths are German. Grouping all the shares by person_ctry_code leads to the results presented in Table 7. Among the 2,125 patent applications in our_sample, 609.5 have not been allocated to a country and 357.2 patents were due to German inventors. A methodology for recovering missing country codes is presented in de Rassenfosse et al. (2013).

It is straightforward to adapt Query 6 to applicants’ country of residence (using the applt_seq_nr field instead of invt_seq_nr in tls207_pers_appln). It is important to stress that applicant and inventor information provided in Patstat and linked via the tls207_pers_appln table corresponds to the information available in the last publication associated with an application. For example, if an EP-B1 publication has different applicant names to the
original EP-A1 publication, then only the person information for the B1 publication will be available.12 To recover the current information, it is possible to link Patstat data with data provided by national patent offices, as explained in Subsection 2.9.

2.6 Identifying Patents Resulting from International Collaborations

The information on applicants and inventors has been used to study, among other questions, international R&D collaboration (Guellec and van Pottelsberghe de la Potterie 2001; Picci 2010; Danguy 2014), R&D offshoring (Thomson 2013), or networks of inventors (Balconi, Breschi and Lissoni 2004; Ejermo and Karlsson 2006). To the best of our knowledge, only a limited number of studies assess the validity of these indicators. One such study is Bergek and Bruzelius (2010), which casts some doubt on the use of inventor data to measure R&D collaboration.

An example of a query identifying patents resulting from international collaboration is presented below, the rationale being that patent applications for which the field nb_locations is greater than 1 involve inventors that reside in different countries and are thus the outcome of an international collaboration (that is, co-invented patents).

Query 7 counts the number of distinct inventor countries listed in each patent application in our_sample. It reports a positive international collaboration conditional on the availability of records in tls207_pers_appln table or in the person_ctry_code field in table tls206_person. The first five results are presented in Table 8.

Table 8 First Five Rows of Query 7

appln_id   nb_locations
48145305   3
273390     2
4975233    2
4979189    2
5804835    2

2.7 Counting Citations Received

Following early works by Carpenter, Narin and Woolf (1981) and Trajtenberg (1990), citation data are used as an indicator of quality, which is broadly defined as the technological merit and the economic potential of an invention. Note that other indicators of patent quality exist: see in particular the recent work by Squicciarini, Dernis and Criscuolo (2013). Citation data are also frequently used to track knowledge flows (Jaffe, Trajtenberg and Henderson 1993) and to measure the speed of knowledge obsolescence (Clark 1976; Jaffe and Trajtenberg 1996). While patent citation data may offer very rich insights, they must be used with caution.

One must pay close attention to the effects of the institutional environment on the relevance of citation data as an economic indicator. In particular, patent citation practices differ across patent offices (Michel and Bettels 2001) and examiner-added citations may add extra noise to the data (see Alcácer and Gittelman 2006 for USPTO evidence). In addition, many publications from different patenting authorities but covering the same invention can be cited, leading to a fragmentation of citation records, as explained in Webb et al. (2005).

The next query counts the number of citations received in a 3-year time window by patent applications published by the German Patent and Trade Mark Office from patent applications published by the EPO. Citations received by a patent are often referred to as ‘forward’ citations, in opposition to ‘backward’ citations, which indicate citations made by a patent. The latter is sometimes also called ‘references’ (by analogy to the reference list of a scientific paper).
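
A forward-citation count of this kind can be sketched on toy data as follows. The snippet is an illustration under assumptions, not the authors' Query 8: it uses an in-memory SQLite database through Python's sqlite3 module, the column names pat_publn_id, cited_pat_publn_id and publn_date and all rows are assumed for the example, each application has a single publication, and SQLite's date() modifier stands in for MySQL date arithmetic.

```python
import sqlite3

# Toy publication and citation tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls211_pat_publn (
    pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER,
    publn_auth TEXT, publn_date TEXT
);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_pat_publn_id INTEGER);

INSERT INTO tls211_pat_publn VALUES
    (1, 100, 'DE', '2005-03-01'),   -- cited German publication
    (2, 200, 'EP', '2006-05-10'),   -- EPO citation within three years
    (3, 201, 'EP', '2008-02-28'),   -- EPO citation within three years
    (4, 202, 'EP', '2009-01-15'),   -- EPO citation outside the window
    (5, 203, 'US', '2006-01-01');   -- in time, but not an EPO publication
INSERT INTO tls212_citation VALUES (2, 1), (3, 1), (4, 1), (5, 1);
""")

# Count citing EPO applications published at most three years after the
# cited German publication.
cites = conn.execute("""
    SELECT cited.appln_id, COUNT(DISTINCT citing.appln_id) AS cites_3y
    FROM tls211_pat_publn AS cited
    JOIN tls212_citation AS c ON c.cited_pat_publn_id = cited.pat_publn_id
    JOIN tls211_pat_publn AS citing ON citing.pat_publn_id = c.pat_publn_id
    WHERE cited.publn_auth = 'DE'
      AND citing.publn_auth = 'EP'
      AND citing.publn_date <= date(cited.publn_date, '+3 years')
    GROUP BY cited.appln_id
""").fetchall()
print(cites)
```

Dates are stored as ISO strings, so the comparison in the WHERE clause orders them correctly; with several publications per application, the reference date would first have to be reduced to the earliest one.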


Note that the citation records are based on the published patent documents; hence, the use of publn_auth from table tls211_pat_publn instead of appln_auth from table tls201_appln. The field publn_auth captures the publication authority of the patent document. The publication authority is also often the receiving office (appln_auth), except in the case of PCT applications, where the publication authority is WIPO and the receiving office is the patent office where the patent application was actually lodged. Thus, an alternative to criterion (t2.publn_auth = 'DE') is (t1.appln_auth = 'DE' AND t1.appln_kind = 'A'). The use of a time window is important when working with patents of different age cohorts in order to avoid data truncation. It is easily implemented with the function ‘DATE_ADD()’. Users of MS SQL should use (DATEDIFF(YEAR, t2.earliest_date, t4.publn_date) <= 3) instead. In order to better estimate the citation lag, the date of reference is set to the earliest date of publication of the cited patent. Note that the count is fairly naïve for reasons explained above, as well as because it does not take into account the type of EPO citation. See Webb et al. (2005, p. 8) for an overview of citation types at the EPO. The first five results are presented in Table 9.

Table 9 First Five Rows of Query 8

appln_id   cites_3y
14995919   5
14997816   2
14971868   1
14974947   1
14975309   1

2.8 Obtaining Grant Information

A published patent application provides legal rights and some economic benefits to its owner (see, for example, Guellec, Martínez and Zuniga 2012), but most of the value of a patent is achieved when the patent is granted and the owner can enforce its exclusive right. The grant status is therefore an important economic variable. Query 9 shows how to recover information on whether patent applications in our_sample that were filed at the UK Intellectual Property Office (UKIPO) have been granted.

The query uses information from table tls211_pat_publn. Each application is associated with one or more published documents and each published document is tagged with an office-specific publication kind code to indicate the kind of publication. The Patstat team has identified the publication kind codes associated with granted documents and the earliest document of an application corresponding to a grant is given a value of 1 in the field publn_first_grant. All other documents are given a value of 0. A simple way of finding whether a patent application was granted is thus to select the maximum value of the field publn_first_grant for each appln_id. If the maximum value is 1, the patent was granted. The status of a patent application associated with a value of 0 is unclear. Other types of legal status include, but are not limited to: pending, withdrawn, and refused. Detailed information on legal status can be recovered from table tls221_inpadoc_prs for some patent offices (see Appendix 1 for details). For other offices, it is necessary to link Patstat data with data provided by national patent
Table 10 First Five Rows of Query 9 Table 11 First Five Rows of Query 10

appln_id granted appln_id publn_nr_patstat publn_nr_ukipo

21465239 1 21465239 2410379 GB2410379


21466952 0 21467768 2423650 GB2423650
21467768 0 21470294 2441770 GB2441770
21470294 0 21471154 2424926 GB2424926
21471154 0 21471862 2425334 GB2425334

offices, as explained in Subsection 2.9. Note that PCT applications are never granted per se: only applications that entered the national phase can be granted. The first five results are presented in Table 10.

2.9 Linking Patstat with Data Provided by National Patent Offices

It is sometimes desirable to enrich Patstat data with data directly provided by national patent offices; for example, to get accurate information on the legal status of patent applications or to collect information on reassignments. This can be done by using information from the field publn_nr in table tls211_pat_publn. The reconstruction of the publication number is specific to each patent office and Query 10 focuses on the rather simple example of the UKIPO.

The online patent document and information service of the UKIPO (Ipsum) requires the publication number to be in the following format: ‘GBnnnnnnn’; that is, the characters ‘GB’ followed by seven digits. Query 10 thus prepends the characters ‘GB’ to the last seven digits of the field publn_nr in order to recompose a publication number that is compatible with the UKIPO online service. Users of MS SQL need to replace the last element of the SELECT DISTINCT clause with (('GB' + RIGHT(t2.publn_nr,7)) AS publn_nr_ukipo). Note the exclusion of documents with publn_kind code ‘D0’, which for the UKIPO correspond to patent applications filed. The field publn_nr_ukipo can now be used to search for additional information on the UKIPO website. More generally, one must reverse-engineer the Patstat format to the format used by the national patent office. The first five results are presented in Table 11.

3. Concluding Remarks

This article has provided a broad overview of the Patstat database by discussing typical queries that rely on the main tables. A good way to proceed from here is to slightly alter the queries and observe how the result sets returned are affected. We hope that users will be able to devise indicators tailored to their research needs and therefore contribute to further improving the quality of empirical research in the fields of economics and management of innovation. In order to avoid duplication of work, however, we encourage researchers to share their contributions with the broad community. Appendix 1 briefly describes add-ons provided by institutions or individual contributors to enrich Patstat data.

A large community of users has emerged over time and is keen to share its experience and answer questions of beginners on the Patstat forum on the EPO website. An additional helpful resource is the annual Patent Statistics for Decision Makers conference (and the preceding user workshop), where the Patstat community gathers and exchanges recent developments.

April 2014
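As one way of experimenting with the rules described in Subsections 2.8 and 2.9 without a Patstat installation, the following minimal sketch loads a handful of invented rows into an in-memory SQLite table named after tls211_pat_publn and runs simplified versions of the two rules. It is not Query 9 or Query 10 themselves: the rows, the reduced column list, the filter on publn_auth and the SQLite syntax ('GB' || substr(...)) are assumptions made for illustration, and users of MySQL or MS SQL would rely on the string functions of their own dialect.

```python
import sqlite3

# Toy stand-in for tls211_pat_publn with a reduced column list.
# All rows are invented for illustration; none come from Patstat.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tls211_pat_publn (
        pat_publn_id      INTEGER PRIMARY KEY,
        appln_id          INTEGER,
        publn_auth        TEXT,
        publn_nr          TEXT,
        publn_kind        TEXT,
        publn_first_grant INTEGER
    )
""")
rows = [
    (1, 101, "GB", "2400001", "A", 0),
    (2, 101, "GB", "2400001", "B", 1),     # first grant publication of application 101
    (3, 102, "GB", "002400002", "A", 0),   # padded on purpose to exercise the last-seven rule
    (4, 103, "GB", "0512345", "D0", 0),    # kind code 'D0' is excluded from the UKIPO rule
    (5, 104, "EP", "1234567", "A1", 0),
]
conn.executemany("INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?, ?, ?)", rows)

# Subsection 2.8: an application is treated as granted when the maximum
# value of publn_first_grant over its publications is 1.
granted = conn.execute("""
    SELECT appln_id, MAX(publn_first_grant) AS granted
    FROM tls211_pat_publn
    GROUP BY appln_id
    ORDER BY appln_id
""").fetchall()

# Subsection 2.9: 'GB' followed by the last seven characters of publn_nr,
# restricted here to GB publications (an assumption) other than kind 'D0'.
ukipo = conn.execute("""
    SELECT DISTINCT appln_id,
           publn_nr AS publn_nr_patstat,
           'GB' || substr(publn_nr, -7) AS publn_nr_ukipo
    FROM tls211_pat_publn
    WHERE publn_auth = 'GB' AND publn_kind <> 'D0'
    ORDER BY appln_id
""").fetchall()

print(granted)
print(ukipo)
```

Changing a grant flag or a kind code in the toy rows and re-running the script shows how each result set reacts, which mirrors the suggestion to alter the queries slightly and observe the effect.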

de Rassenfosse, Dernis and Boedt: An Introduction to the Patstat Database 405

Appendix 1: Resources for Patstat

European Patent Office’s Worldwide Legal Status Database

Also known as tls221_inpadoc_prs table, it contains information on legal events that occurred during the life of a patent, either before or after grant. Typical events are payment of (national) renewal fees, lapse of the patent, change of ownership, withdrawal of the application and entrance into the national phase. The records in this table originate from the patent gazettes and registers of various national patent authorities, including the EPO and WIPO. Currently over 50 offices provide the EPO with legal status data.

European Patent Register Database

Released twice a year, the database contains bibliographic, legal and procedural information on published European patent applications and on published PCT applications for which the EPO is a designated office (so-called Euro-PCT applications). The database is extracted from the European Patent Register, which stores all publicly available information that the EPO has on European patent applications as they pass through the application and examination procedure. It includes information on applicants, inventors, opponents and representatives, procedural events during application and examination proceedings, opposition and appeal proceedings, and limitation and revocation proceedings.

OECD REGPAT Database

It covers records on patent applications at the EPO (derived from Patstat) and PCT patents at international phase (derived from the EPO’s Bibliographic Database’s weekly downloads), for which addresses of inventors and applicants have been regionalised (that is, assigned to a region code); see Maraut et al. (2008) for technical details. The dataset covers regional information for most OECD and EU28 member countries, plus the BRICS countries. It can be linked to Patstat data using the appln_id field. All OECD databases are freely available on the OECD website.

OECD Triadic Patent Families Database

It relies on a specific definition of patent family, covering patent applications filed at the EPO, the JPO and granted by the USPTO and that share the same set of priorities (Dernis and Khan 2004). These data are compiled by using different patent linkages provided in Patstat and are a consolidated sub-set of the tls219_inpadoc_fam table. The appln_id field can be used to link the data to Patstat.

OECD Citations Database

It proposes a consolidated patent citation record of Patstat data for patents filed at the EPO or through the PCT. It mainly draws on the infrastructure proposed in Webb et al. (2005) and takes into account citations of patent and non-patent literature (NPL). In addition to the list of cited patents and NPL, it proposes a list of EPO or WIPO equivalents to patents cited in order to facilitate further consolidation of the data.

OECD Patent Quality Indicators Database

It proposes a number of indicators that are aimed at capturing the quality of patents and the possible impact that patent quality might have on subsequent technological developments, as described in Squicciarini, Dernis and Criscuolo (2013). The current version of the dataset only relies on patent applications filed at the EPO but coverage probably will be expanded in the future to include patents filed to other offices. Indicators can be replicated using the program lines available in Squicciarini, Dernis and Criscuolo (2013).

OECD Harmonised Applicant Names Database

The OECD Harmonised Applicant Names database proposes a grouping of patent applicant names resulting from a cleaning and matching of names.
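Several of the add-on datasets above are linked to Patstat through the appln_id field. The sketch below illustrates that kind of link on invented data, using SQLite from Python. The table addon_regions, its column region_code and all rows are hypothetical stand-ins and do not reproduce the actual REGPAT layout; only the use of appln_id as the join key is taken from the text. A LEFT JOIN is chosen so that applications without a matching add-on record remain visible in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced stand-in for the Patstat application table.
    CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);

    -- Hypothetical add-on table; real add-on files have their own layout.
    CREATE TABLE addon_regions (appln_id INTEGER, region_code TEXT);

    -- Invented rows for illustration only.
    INSERT INTO tls201_appln VALUES (1, 'EP'), (2, 'EP'), (3, 'US');
    INSERT INTO addon_regions VALUES (1, 'R-B'), (1, 'R-A'), (2, 'R-C');
""")

# Link the add-on table to Patstat through appln_id. Applications with
# no add-on record come back with region_code set to NULL (None in Python).
linked = conn.execute("""
    SELECT a.appln_id, a.appln_auth, r.region_code
    FROM tls201_appln AS a
    LEFT JOIN addon_regions AS r ON r.appln_id = a.appln_id
    ORDER BY a.appln_id, r.region_code
""").fetchall()

for row in linked:
    print(row)
```

Application 1 appears twice because it has two add-on records, which is worth keeping in mind before counting applications on the joined result.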


EEE-PPAT Table

In collaboration with the ECOOM department at KU Leuven, the EPO and Sogeti (a software consultancy), EUROSTAT has devoted considerable effort to harmonising applicant names and allocating applicants to sectors (private business enterprises, universities and higher education institutions, governmental agencies, individuals). Sector allocation is relevant for analysing the constituents and dynamics of technological performance on the level of innovation systems. Read more at <http://www.ecoom.be/en/EEE-PPAT>.

EP-INV Database on Academic Inventors

This database is the result of a project sponsored by the European Science Foundation and chaired by Francesco Lissoni. The database contains cleaned and standardised inventors’ names and addresses, as well as information on the affiliations of academic scientists. See Den Besten et al. (2012) for more information.

World Intellectual Property Organization’s International Patent Classification – Technology Concordance Table

The WIPO’s technology concordance table links the IPC symbols with 35 fields of technology. The concordance table is updated on a regular basis to reflect revisions to the IPC. Further information is provided on the WIPO website.

Worldwide Count of Priority Filings

de Rassenfosse et al. (2013) have proposed an algorithm that exploits patent-family linkages (direct equivalents and other second filings) to recover missing information on inventor and applicant country of residence. Their algorithm can be used for the recovery of other information such as missing IPC codes.

Endnotes

1. This document focuses on the offline Patstat database that is available in a series of DVDs from the EPO. A specific introduction exists for the online version of Patstat (‘Sample Queries and Tips – Patstat Online’), which is available on the EPO website. The online version offers visualisation tools and linked resources, but is less flexible than the offline version.

2. In particular, we assume knowledge of joins, groups, views and embedded queries. Many introductory courses to SQL are freely available online, including one on the EPO website.

3. The identifier of patent applications (appln_id) is frequently used to link tables with each other. A full description of tables and fields is provided in the Data Catalog, which is available on the Patstat DVDs and can also be downloaded from the EPO website.

4. International Patent Classification codes are used by patent examiners to identify the areas of technology to which patents pertain. Note that not all patents have IPC codes. Wind energy patents can also be identified using the Cooperative Patent Classification (CPC) code Y02E10/70 that is available in table tls224_appln_cpc. The CPC is a joint classification system between the USPTO and the EPO.

5. See WIPO’s ‘Recommended Standard on Two-Letter Codes for the Representation of States, Other Entities and Intergovernmental Organizations’ (Standard ST.3) for the exhaustive list of codes, available on the WIPO website.

6. Note that the link between patent value and PCT status is a priori ambiguous. As Guellec and van Pottelsberghe de la Potterie (2000) and Reitzig (2004) point out, patent applicants may be uncertain about the economic success of the patent’s underlying invention and use the PCT route to ‘buy’ additional decision-making time. Alternatively, the economic success of the patent’s underlying invention may be well established at the date of filing and PCT is used to seek global protection as fast as possible.

7. Note that an alternative approach for counting unique inventions involves counting families of patents. More information on patent families is provided in the next section.

8. One can often observe priority filings and subsequent second filings at the same patent office. This phenomenon is driven by divisional (or similar) applications. If a priority application was filed at the EPO and a divisional application was also filed at the EPO, this divisional application would claim priority from the original document and is therefore technically equivalent to a second filing. Such cases can be identified with table tls216_appln_contn.

9. Patent production functions are used in econometric studies to analyse the determinants of the number of patents produced by an economic unit such as a firm or a country.

10. Indeed, not excluding the PCT application at international phase inflates the family count by one unit. For example, if the JPO is the receiving office of a PCT application that then enters national phase at the JPO only, not excluding the PCT application at international phase will lead to a family size of 2 instead of 1.


11. Briefly, the approach for EPO patents would be to use the INPADOC legal status database in addition to Patstat and identify the relevant legal status codes that indicate a validation or renewal fee payment in a designated state. The INPADOC database is available as an add-in table to Patstat, as explained in Appendix 1.

12. Persons are also linkable to publications since the October 2013 release of Patstat. Kind code ‘A1’ refers to a European patent application that is published with European search report and ‘B1’ refers to a European patent granted.

References

Alcácer, J. and Gittelman, M. 2006, ‘Patent citations as a measure of knowledge flows: The influence of examiner citations’, Review of Economics and Statistics, vol. 88, pp. 774–9.
Allred, B. and Park, W. 2007, ‘Patent rights and innovative activity: Evidence from national and firm-level data’, Journal of International Business Studies, vol. 38, pp. 878–900.
Balconi, M., Breschi, S. and Lissoni, F. 2004, ‘Networks of inventors and the role of academia: An exploration of Italian patent data’, Research Policy, vol. 33, pp. 127–45.
Bergek, A. and Bruzelius, M. 2010, ‘Are patents with multiple inventors from different countries a good indicator of international R&D collaboration? The case of ABB’, Research Policy, vol. 39, pp. 1,321–34.
Carpenter, M., Narin, F. and Woolf, P. 1981, ‘Citation rates to technologically important patents’, World Patent Information, vol. 3, pp. 160–3.
Clark, C. 1976, ‘Obsolescence of the patent literature’, Journal of Documentation, vol. 32, pp. 32–52.
Danguy, J. 2014, ‘Globalization of innovation production: A patent-based industry analysis’, iCite Working Paper no. 2014-009, Université Libre de Bruxelles.
de Rassenfosse, G., Dernis, H., Guellec, D., Picci, L. and van Pottelsberghe de la Potterie, B. 2013, ‘The worldwide count of priority patents: A new indicator of inventive activity’, Research Policy, vol. 42, pp. 720–37.
de Rassenfosse, G., Schoen, A. and Wastyn, A. 2014, ‘Selection bias in innovation studies: A simple test’, Technological Forecasting and Social Change, vol. 81, pp. 287–99.
de Rassenfosse, G. and van Pottelsberghe de la Potterie, B. 2009, ‘A policy insight into the R&D–patent relationship’, Research Policy, vol. 38, pp. 779–92.
Den Besten, M., Lissoni, F., Maurino, A., Pezzoni, M. and Tarasconi, G. 2012, ‘APE-INV data dissemination and users’ feedback project’, draft paper, 6 June, viewed May 2014, <http://www.esf-ape-inv.eu/download/Feedback_Document.pdf>.
Dernis, H. and Khan, M. 2004, ‘Triadic patent families methodology’, Organisation for Economic Co-operation and Development Directorate for Science, Technology and Industry Working Paper no. 2004/02, Paris.
Dubaric, E., Giannoccaro, D., Bengtsson, R. and Ackermann, T. 2011, ‘Patent data as indicators of wind power technology development’, World Patent Information, vol. 33, pp. 144–9.
Ejermo, O. and Karlsson, C. 2006, ‘Interregional inventor networks as studied by patent coinventorships’, Research Policy, vol. 35, pp. 412–30.
Frietsch, R., Neuhäusler, P. and Rothengatter, O. 2013, ‘Which road to take? Filing routes to the European Patent Office’, World Patent Information, vol. 35, pp. 8–19.
Guellec, D., Martínez, C. and Zuniga, P. 2012, ‘Pre-emptive patenting: Securing market exclusion and freedom of operation’, Economics of Innovation and New Technology, vol. 21, pp. 1–29.
Guellec, D. and van Pottelsberghe de la Potterie, B. 2000, ‘Applications, grants and the value of patent’, Economics Letters, vol. 69, pp. 109–14.
Guellec, D. and van Pottelsberghe de la Potterie, B. 2001, ‘The internationalisation of technology analysed with patent data’, Research Policy, vol. 30, pp. 1,253–66.
Harhoff, D., Scherer, F. and Vopel, K. 2003, ‘Citations, family size, opposition and the value of patent rights’, Research Policy, vol. 32, pp. 1,343–63.
Jaffe, A. and Trajtenberg, M. 1996, ‘Flows of knowledge from universities and federal laboratories: Modeling the flow of patent citations over time and across institutional and geographic boundaries’, Proceedings of the National Academy of Sciences of the United States of America, vol. 93, pp. 12,671–7.
Jaffe, A., Trajtenberg, M. and Henderson, R. 1993, ‘Geographic localization of knowledge spillovers as evidenced by patent citations’, Quarterly Journal of Economics, vol. 108, pp. 577–98.
Maraut, S., Dernis, H., Webb, C., Spiezia, V. and Guellec, D. 2008, ‘The OECD REGPAT database: A presentation’, Organisation for Economic Co-operation and Development Directorate for Science, Technology and Industry Working Paper no. 2008/02, Paris.
Martínez, C. 2011, ‘Patent families: When do different definitions really matter?’, Scientometrics, vol. 86, pp. 39–63.
Michel, J. and Bettels, B. 2001, ‘Patent citation analysis: A closer look at the basic input data from patent search reports’, Scientometrics, vol. 51, pp. 185–201.
Organisation for Economic Co-operation and Development 2009, OECD Patent Statistics Manual, OECD, Paris.
Picci, L. 2010, ‘The internationalization of inventive activity: A gravity model using patent data’, Research Policy, vol. 39, pp. 1,070–81.
Putnam, J. 1996, ‘The value of international patent rights’, PhD thesis, Yale University.
Reitzig, M. 2004, ‘Improving patent valuations for management purposes – Validating new indicators by analyzing application rationales’, Research Policy, vol. 33, pp. 939–57.
Squicciarini, M., Dernis, H. and Criscuolo, C. 2013, ‘Measuring patent quality: Indicators of technological and economic value’, Organisation for Economic Co-operation and Development Directorate for Science, Technology and Industry Working Paper no. 2013/03, Paris.
Thomson, R. 2013, ‘National scientific capacity and R&D offshoring’, Research Policy, vol. 42, pp. 517–28.
Trajtenberg, M. 1990, ‘A penny for your quotes: Patent citations and the value of innovations’, RAND Journal of Economics, vol. 21, pp. 172–87.
van Zeebroeck, N. and van Pottelsberghe de la Potterie, B. 2011, ‘Filing strategies and patent value’, Economics of Innovation and New Technology, vol. 20, pp. 539–62.
Webb, C., Dernis, H., Harhoff, D. and Hoisl, K. 2005, ‘Analysing European and international patent citations: A set of EPO patent database building blocks’, Organisation for Economic Co-operation and Development Directorate for Science, Technology and Industry Working Paper no. 2005/9, Paris.
