An Introduction To The Patstat Database With Example Queries
All content following this page was uploaded by Gaétan De Rassenfosse on 25 April 2020.
Data Survey
Abstract

This article provides an introduction to the Patstat patent database. It offers guided examples of 10 popular queries that are relevant for research purposes and that cover the most important data tables. It is targeted at academic researchers and practitioners who are willing to learn the basics of the database.

1. Introduction

Empirical research on the economics and management of innovation is benefiting from greater availability of structured data. The most prominent database is certainly the European Patent Office’s (EPO’s) Worldwide Patent Statistical Database, henceforth ‘Patstat’. Patstat offers bibliographic patent data for more than 100 patent offices, sometimes as early as the nineteenth century. It is a valuable tool for the community of researchers because it contains raw data that are collected in a transparent manner. This rich database promises to greatly improve the quality of empirical research in the field. It is, however, difficult to navigate in the wealth of data it offers and many prospective users are deterred by its apparent complexity.
This article seeks to demystify Patstat and offers guided examples on a broad range of queries.[1] It is assumed that the reader has a general knowledge of Structured Query Language (SQL).[2] We have used the April 2013 edition of the database and rely on the MySQL dialect. Users of another dialect of SQL may have to slightly adapt the queries. Our guiding philosophy in creating the queries was to cover the most important tables and to exploit useful SQL commands. We devote particular attention to outlining some potential uses of the queries for research purposes, as well as explaining pitfalls of the data. The reader should refer to OECD (2009) for further guidelines for building and interpreting patent data.

In a desire to make this introduction accessible to the greatest number, we have produced a test database in MS Access format.

* de Rassenfosse: Melbourne Institute of Applied Economic and Social Research and Intellectual Property Research Institute of Australia, The University of Melbourne, Victoria 3010 Australia; Dernis: Organisation for Economic Co-operation and Development, 75775 Paris Cedex 16 France; Boedt: European Patent Office, Vienna 1030 Austria. Corresponding author: de Rassenfosse, email <gaetand@unimelb.edu.au>. The authors are grateful to Monica Coffano, Jérôme Danguy, Paul Jensen, Catalina Martínez, Clinton McCarthur, Nico Rasters and Gianluca Tarasconi for helpful comments. The article has also benefited from comments by participants at a staff development workshop at IP Australia, Canberra.
© 2014 The University of Melbourne, Melbourne Institute of Applied Economic and Social Research
Published by Wiley Publishing Asia Pty Ltd
396 The Australian Economic Review September 2014
This database contains the relevant data used in this article, as well as all the queries. It allows readers to familiarise themselves with the Patstat database without having to install it on their computers. The test database and the queries are available at <http://www.runmycode.org> or upon request from the authors.

2. Patstat Cookbook

The Patstat database consists of a set of tables that follow a relational database schema, where tables can be connected to each other using a relevant entry key.[3] The table on patent applications, labelled tls201_appln, contains more than 74 million records and is the central element of Patstat, as shown in Figure 1. The other tables contain information on each of the patent applications; for example, inventors and applicants, technology fields, titles and abstracts, publication instances and citations.

To limit the volume of data retrieval, we run our queries on a sample of patent applications describing inventions related to wind-turbine technologies and filed in the year 2005 anywhere in the world. Patent applications related to wind-turbine technologies are predominantly found in the International Patent Classification (IPC) sub-class F03D (Dubaric et al. 2011).[4] This sub-class includes all the IPC codes that start with F03D such as ‘F03D 1/00’ (wind motors with rotation axis substantially in wind direction) and ‘F03D 5/02’ (wind-engaging parts being attached to endless chains or the like).

2.1 Identification of Patents by Technology Field

We use the term ‘application’ to refer to entries in table tls201_appln. This table lists all the applications available in the Patstat database and assigns them a unique and stable appln_id, which is built from a combination of the patent authority (the patent office where the application was submitted), the patent application number and the application kind code (indicating, for example, whether the application is a patent application, a Patent Cooperation Treaty (PCT) application or a design application). The application number is distinct from the application identifier. The application number is the number issued by the patent authority where the application was filed, whereas the application identifier is specific to the Patstat database. The latter is called a ‘primary key’ in SQL jargon. In Query 1, the appln_id is used to link table tls201_appln with table tls209_appln_ipc, which contains the IPC codes assigned to each application.
[Figure 1, table labels recovered. Classification: TLS209_APPLN_IPC, TLS222_APPLN_JP_CLASS, TLS224_APPLN_CPC. Title: TLS202_APPLN_TITLE. Abstract: TLS203_APPLN_ABSTR.]
de Rassenfosse, Dernis and Boedt: An Introduction to the Patstat Database 397
The query below lists all PCT applications that entered the national phase either at the State Intellectual Property Office of the People’s Republic of China (SIPO) or at the Japan Patent Office (JPO) and for which the Danish Patent and Trademark Office (DKPTO) was the receiving office (that is, application authority) of the initial PCT application.

The statement relies on patents in our_sample. It literally selects all the applications submitted at the SIPO or the JPO that have their internat_appln_id equal to the appln_id of PCT patent applications in our_sample that were submitted at the DKPTO. The first five results (out of a total of 15 national-phase entries) are presented in Table 2.

[...] of patent applications described in Query 1 may lead to multiple counts of inventions since it mixes both priority and second filings. Counting unique inventions involves counting only priority filings.[7] de Rassenfosse et al. (2013) explain the details of such a ‘worldwide count of priority filings’. The issue of double counting becomes less acute if patents are counted at a single office of reference such as at the EPO.[8] Second, it may be desirable to know the priority status of the patent document in order to avoid potential selection bias, especially when patents are counted at a single office of reference. de Rassenfosse, Schoen and Wastyn (2014) explain that the single-office count may produce biased econometric estimates of patent production functions.[9] They propose a test based on the priority status of the patent application to detect the presence of selection bias.

The query below returns the priority status of the patent documents in our set.
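A priority-status flag of this kind can be sketched as follows. This is an illustrative stand-in rather than the authors' Query 3: it runs on Python's sqlite3 module with toy data, it assumes a table tls204_appln_prior with columns appln_id and prior_appln_id, and it assumes that an application counts as a priority filing (is_a_pf = 1) when it claims no earlier priority itself. PCT national-phase entries and the divisional cases mentioned in the endnotes are deliberately ignored here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE our_sample (appln_id INTEGER PRIMARY KEY);
-- Assumed layout: one row per (claiming application, claimed priority).
CREATE TABLE tls204_appln_prior (appln_id INTEGER, prior_appln_id INTEGER);
INSERT INTO our_sample VALUES (10), (11), (12);
-- Hypothetical data: 11 and 12 claim priority from 10;
-- 12 also claims a filing outside the sample (id 99).
INSERT INTO tls204_appln_prior VALUES (11, 10), (12, 10), (12, 99);
""")

# Flag an application as a priority filing when it claims no earlier priority.
QUERY = """
SELECT s.appln_id,
       CASE WHEN EXISTS (SELECT 1
                         FROM tls204_appln_prior AS p
                         WHERE p.appln_id = s.appln_id)
            THEN 0 ELSE 1 END AS is_a_pf
FROM our_sample AS s
ORDER BY s.appln_id
"""
priority_status = conn.execute(QUERY).fetchall()
print(priority_status)
```

The EXISTS subquery avoids duplicating rows for applications that claim several priorities, which a plain join on tls204_appln_prior would do.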
appln_id is_a_pf
65303 0
133780 0
149552 1
151084 0
151176 0
Table 4 First Five Rows of Query 4

appln_id    family_size
65303       9
133780      4
149552      14
151084      13
151176      9

Table 5 First Five Rows of Query 5

appln_id    geog_family_size
65303       4
133780      4
149552      12
151084      11
151176      8

[...] approximately 500 million people, its actual coverage could be much smaller depending on the countries in which the patent was validated. One way of dealing with the issue involves adding information on the number of jurisdictions in which regional applications were validated after the patent was granted at the regional office considered. This can be done with Patstat but we will not discuss it as it far exceeds the scope of this article.[11] Note also that patents listed in our_sample may belong to the same family and further consolidation may be envisaged to control for double counting.

2.5 Counting Patents by Country (Simple Counts versus Fractional Counts)
The query below performs a fractional count of inventors’ country of residence for patent applications in our_sample. Inventors have a field invt_seq_nr greater than 0 in table tls207_pers_appln, while applicants have a field applt_seq_nr greater than 0.

Table 6 Five Randomly Selected Rows of Field Selection of Query 6 (Joined from t1 and t2)

appln_id    person_ctry_code    tot_in_ctry    tot_in_patent
263066      DE                  2              2
273390      CH                  1              4
273390      DE                  3              4
273768      JP                  1              1
273769      JP                  1              1

person_ctry_code    fractional_count
– a                 609.5
DE                  357.2
US                  248.0
CN                  155.8
DK                  113.5
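The fractional count can be sketched along the lines suggested by Table 6: one sub-query (t1) counts inventors per application and country, another (t2) counts inventors per application, and each country receives the share tot_in_ctry / tot_in_patent of every application. The sketch below is an assumption-laden illustration, not the authors' Query 6. It uses Python's sqlite3 module, assumes that person_ctry_code is stored in a table tls206_person keyed by person_id, and uses hypothetical person identifiers arranged to reproduce the five rows of Table 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE our_sample (appln_id INTEGER PRIMARY KEY);
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_ctry_code TEXT);
CREATE TABLE tls207_pers_appln (
    person_id INTEGER, appln_id INTEGER,
    invt_seq_nr INTEGER, applt_seq_nr INTEGER
);
INSERT INTO our_sample VALUES (263066), (273390), (273768), (273769);
-- Hypothetical persons; person 9 is an applicant only (invt_seq_nr = 0).
INSERT INTO tls206_person VALUES
    (1, 'DE'), (2, 'DE'), (3, 'CH'), (4, 'DE'), (5, 'DE'),
    (6, 'DE'), (7, 'JP'), (8, 'JP'), (9, 'US');
INSERT INTO tls207_pers_appln VALUES
    (1, 263066, 1, 0), (2, 263066, 2, 0), (9, 263066, 0, 1),
    (3, 273390, 1, 0), (4, 273390, 2, 0), (5, 273390, 3, 0), (6, 273390, 4, 0),
    (7, 273768, 1, 1),
    (8, 273769, 1, 0);
""")

QUERY = """
SELECT t1.person_ctry_code,
       SUM(1.0 * t1.tot_in_ctry / t2.tot_in_patent) AS fractional_count
FROM (
    -- inventors per application and country
    SELECT pa.appln_id, p.person_ctry_code, COUNT(*) AS tot_in_ctry
    FROM our_sample AS s
    JOIN tls207_pers_appln AS pa ON pa.appln_id = s.appln_id
    JOIN tls206_person AS p ON p.person_id = pa.person_id
    WHERE pa.invt_seq_nr > 0
    GROUP BY pa.appln_id, p.person_ctry_code
) AS t1
JOIN (
    -- inventors per application
    SELECT pa.appln_id, COUNT(*) AS tot_in_patent
    FROM our_sample AS s
    JOIN tls207_pers_appln AS pa ON pa.appln_id = s.appln_id
    WHERE pa.invt_seq_nr > 0
    GROUP BY pa.appln_id
) AS t2 ON t1.appln_id = t2.appln_id
GROUP BY t1.person_ctry_code
ORDER BY fractional_count DESC, t1.person_ctry_code
"""
fractional = conn.execute(QUERY).fetchall()
print(fractional)
```

With these toy rows, Germany receives 2/2 from application 263066 plus 3/4 from application 273390, Switzerland receives 1/4, and the fractional counts sum to the number of applications (four). The multiplication by 1.0 forces floating-point division, which matters in SQL dialects where dividing two integers truncates.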
[...] original EP-A1 publication, then only the person information for the B1 publication will be available.[12] To recover the current information, it is possible to link Patstat data with data provided by national patent offices, as explained in Subsection 2.9.

2.6 Identifying Patents Resulting from International Collaborations

Table 8 First Five Rows of Query 7

appln_id    nb_locations
48145305    3
273390      2
4975233     2
4979189     2
5804835     2
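One way to flag applications resulting from international collaborations is to count the distinct countries of residence of the inventors of each application. The sketch below illustrates that idea under stated assumptions and is not the authors' Query 7: it reads nb_locations as the number of distinct inventor countries, keeps only applications with more than one country, skips blank country codes, and relies on Python's sqlite3 module, an assumed tls206_person table and hypothetical person and application identifiers (except 263066 and 273390, whose inventor countries follow Table 6).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE our_sample (appln_id INTEGER PRIMARY KEY);
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_ctry_code TEXT);
CREATE TABLE tls207_pers_appln (
    person_id INTEGER, appln_id INTEGER,
    invt_seq_nr INTEGER, applt_seq_nr INTEGER
);
INSERT INTO our_sample VALUES (263066), (273390), (900001);
-- Hypothetical persons; person 7 has a missing country code.
INSERT INTO tls206_person VALUES
    (1, 'DE'), (2, 'DE'), (3, 'CH'), (4, 'DE'), (5, 'DE'),
    (6, 'DE'), (7, ''), (8, 'DK'), (9, 'US'), (10, 'CN');
INSERT INTO tls207_pers_appln VALUES
    (1, 263066, 1, 0), (2, 263066, 2, 0),
    (3, 273390, 1, 0), (4, 273390, 2, 0), (5, 273390, 3, 0),
    (6, 273390, 4, 0), (7, 273390, 5, 0),
    (8, 900001, 1, 0), (9, 900001, 2, 0), (10, 900001, 3, 0);
""")

# Count distinct inventor countries per application; keep multi-country cases.
QUERY = """
SELECT pa.appln_id,
       COUNT(DISTINCT p.person_ctry_code) AS nb_locations
FROM our_sample AS s
JOIN tls207_pers_appln AS pa ON pa.appln_id = s.appln_id
JOIN tls206_person AS p ON p.person_id = pa.person_id
WHERE pa.invt_seq_nr > 0
  AND p.person_ctry_code <> ''
GROUP BY pa.appln_id
HAVING COUNT(DISTINCT p.person_ctry_code) > 1
ORDER BY nb_locations DESC, pa.appln_id
"""
collaborations = conn.execute(QUERY).fetchall()
print(collaborations)
```

Application 263066 drops out because both of its inventors reside in Germany. Whether blank country codes should be excluded or treated as a separate location is a design choice that depends on how much missing address information the sample contains.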
appln_id cites_3y
14995919 5
14997816 2
14971868 1
14974947 1
14975309 1
Table 10 First Five Rows of Query 9

Table 11 First Five Rows of Query 10

[...] offices, as explained in Subsection 2.9. Note that PCT applications are never granted per se: only applications that entered the national phase can be granted. The first five results are presented in Table 10.

2.9 Linking Patstat with Data Provided by National Patent Offices

It is sometimes desirable to enrich Patstat data with data directly provided by national patent offices; for example, to get accurate information on the legal status of patent applications or to collect information on reassignments. This can be done by using information from the field publn_nr in table tls211_pat_publn. The reconstruction of the publication number is specific to each patent office and Query 10 focuses on the rather simple example of the UKIPO.

The online patent document and information service of the UKIPO (Ipsum) requires the publication number to be in the following format: ‘GBnnnnnnn’; that is, the characters ‘GB’ followed by seven digits. Query 10 thus prepends the characters ‘GB’ to the last seven digits of the field publn_nr in order to recompose a publication number that is compatible with the UKIPO online service. Users of MS SQL need to replace the last element of the SELECT DISTINCT clause with ('GB' + RIGHT(t2.publn_nr,7)) AS publn_nr_ukipo. Note the exclusion of documents with publn_kind code ‘D0’, which for the UKIPO correspond to patent applications filed. The field publn_nr_ukipo can now be used to search for additional information on the UKIPO website. More generally, one must reverse-engineer the Patstat format into the format used by the national patent office. The first five results are presented in Table 11.

3. Concluding Remarks

This article has provided a broad overview of the Patstat database by discussing typical queries that rely on the main tables. A good way to proceed from here is to slightly alter the queries and observe how the returned result sets are affected. We hope that users will be able to devise indicators tailored to their research needs and therefore contribute to further improving the quality of empirical research in the fields of economics and management of innovation. In order to avoid duplication of work, however, we encourage researchers to share their contributions with the broad community. Appendix 1 briefly describes add-ons provided by institutions or individual contributors to enrich Patstat data.

A large community of users has emerged over time and is keen to share its experience and answer questions of beginners on the Patstat forum on the EPO website. An additional helpful resource is the annual Patent Statistics for Decision Makers conference (and the preceding user workshop), where the Patstat community gathers and exchanges recent developments.

April 2014
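As a supplement to Subsection 2.9, the recomposition of a UKIPO-compatible publication number can be sketched as follows. This is an illustration under assumptions, not the authors' Query 10: it uses Python's sqlite3 module, where string concatenation is written with || and the last seven characters are taken with substr(publn_nr, -7), and the columns pat_publn_id and publn_auth as well as all publication numbers are hypothetical toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE our_sample (appln_id INTEGER PRIMARY KEY);
CREATE TABLE tls211_pat_publn (
    pat_publn_id INTEGER PRIMARY KEY,  -- assumed column name
    appln_id INTEGER,
    publn_auth TEXT,                   -- assumed column name
    publn_nr TEXT,
    publn_kind TEXT
);
INSERT INTO our_sample VALUES (100), (101), (102), (103);
-- Hypothetical publications.
INSERT INTO tls211_pat_publn VALUES
    (1, 100, 'GB', '2412345',    'A'),
    (2, 100, 'GB', '2412345',    'B'),   -- same number, later publication
    (3, 101, 'GB', '0500123',    'D0'),  -- application filed: excluded
    (4, 102, 'GB', '0002430001', 'A'),   -- padded number: keep last 7 digits
    (5, 103, 'US', '7000001',    'B2');  -- not a UKIPO publication
""")

# Prepend 'GB' to the last seven characters of publn_nr, skipping kind 'D0'.
QUERY = """
SELECT DISTINCT t1.appln_id,
       'GB' || substr(t2.publn_nr, -7) AS publn_nr_ukipo
FROM our_sample AS t1
JOIN tls211_pat_publn AS t2 ON t1.appln_id = t2.appln_id
WHERE t2.publn_auth = 'GB'
  AND t2.publn_kind <> 'D0'
ORDER BY t1.appln_id
"""
ukipo_numbers = conn.execute(QUERY).fetchall()
print(ukipo_numbers)
```

DISTINCT collapses the A and B publications of application 100 into a single row. In MySQL the same expression would typically be written with CONCAT and RIGHT, and in MS SQL with the + operator, as noted in Subsection 2.9.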
Appendix 1: Resources for Patstat

European Patent Office’s Worldwide Legal Status Database

[...] All OECD databases are freely available on the OECD website.

OECD Triadic Patent Families Database
EEE-PPAT Table

In collaboration with the ECOOM department at KU Leuven, the EPO and Sogeti (a software consultancy), EUROSTAT has devoted considerable effort to harmonising applicant names and allocating applicants to sectors (private business enterprises, universities and higher education institutions, governmental agencies, individuals). Sector allocation is relevant for analysing the constituents and dynamics of technological performance on the level of innovation systems. Read more at <http://www.ecoom.be/en/EEE-PPAT>.

EP-INV Database on Academic Inventors

This database is the result of a project sponsored by the European Science Foundation and chaired by Francesco Lissoni. The database contains cleaned and standardised inventors’ names and addresses, as well as information on the affiliations of academic scientists. See Den Besten et al. (2012) for more information.

World Intellectual Property Organization’s International Patent Classification - Technology Concordance Table

The WIPO’s technology concordance table links the IPC symbols with 35 fields of technology. The concordance table is updated on a regular basis to reflect revisions to the IPC. Further information is provided on the WIPO website.

Worldwide Count of Priority Filings

de Rassenfosse et al. (2013) have proposed an algorithm that exploits patent-family linkages (direct equivalents and other second filings) to recover missing information on inventor and applicant country of residence. Their algorithm can be used for the recovery of other information such as missing IPC codes.

Endnotes

1. This document focuses on the offline Patstat database that is available in a series of DVDs from the EPO. A specific introduction exists for the online version of Patstat (‘Sample Queries and Tips - Patstat Online’), which is available on the EPO website. The online version offers visualisation tools and linked resources, but is less flexible than the offline version.

2. In particular, we assume knowledge of joins, groups, views and embedded queries. Many introductory courses to SQL are freely available online, including one on the EPO website.

3. The identifier of patent applications (appln_id) is frequently used to link tables with each other. A full description of tables and fields is provided in the Data Catalog, which is available on the Patstat DVDs and can also be downloaded from the EPO website.

4. International Patent Classification codes are used by patent examiners to identify the areas of technology to which patents pertain. Note that not all patents have IPC codes. Wind energy patents can also be identified using the Cooperative Patent Classification (CPC) code Y02E10/70 that is available in table tls224_appln_cpc. The CPC is a joint classification system between the USPTO and the EPO.

5. See WIPO’s ‘Recommended Standard on Two-Letter Codes for the Representation of States, Other Entities and Intergovernmental Organizations’ (Standard ST.3) for the exhaustive list of codes, available on the WIPO website.

6. Note that the link between patent value and PCT status is a priori ambiguous. As Guellec and van Pottelsberghe de la Potterie (2000) and Reitzig (2004) point out, patent applicants may be uncertain about the economic success of the patent’s underlying invention and use the PCT route to ‘buy’ additional decision-making time. Alternatively, the economic success of the patent’s underlying invention may be well established at the date of filing and PCT is used to seek global protection as fast as possible.

7. Note that an alternative approach for counting unique inventions involves counting families of patents. More information on patent families is provided in the next section.

8. One can often observe priority filings and subsequent second filings at the same patent office. This phenomenon is driven by divisional (or similar) applications. If a priority application was filed at the EPO and a divisional application was also filed at the EPO, this divisional application would claim priority from the original document and is therefore technically equivalent to a second filing. Such cases can be identified with table tls216_appln_contn.

9. Patent production functions are used in econometric studies to analyse the determinants of the number of patents produced by an economic unit such as a firm or a country.

10. Indeed, not excluding the PCT application at international phase inflates the family count by one unit. For example, if the JPO is the receiving office of a PCT application that then enters the national phase at the JPO only, not excluding the PCT application at international phase will lead to a family size of 2 instead of 1.
11. Briefly, the approach for EPO patents would be to use the INPADOC legal status database in addition to Patstat and identify the relevant legal status codes that indicate a validation or renewal fee payment in a designated state. The INPADOC database is available as an add-in table to Patstat, as explained in Appendix 1.

12. Persons are also linkable to publications since the October 2013 release of Patstat. Kind code ‘A1’ refers to a European patent application that is published with European search report and ‘B1’ refers to a European patent granted.

References

Alcácer, J. and Gittelman, M. 2006, ‘Patent citations as a measure of knowledge flows: The influence of examiner citations’, Review of Economics and Statistics, vol. 88, pp. 774–9.

Allred, B. and Park, W. 2007, ‘Patent rights and innovative activity: Evidence from national and firm-level data’, Journal of International Business Studies, vol. 38, pp. 878–900.

Balconi, M., Breschi, S. and Lissoni, F. 2004, ‘Networks of inventors and the role of academia: An exploration of Italian patent data’, Research Policy, vol. 33, pp. 127–45.

Bergek, A. and Bruzelius, M. 2010, ‘Are patents with multiple inventors from different countries a good indicator of international R&D collaboration? The case of ABB’, Research Policy, vol. 39, pp. 1,321–34.

Carpenter, M., Narin, F. and Woolf, P. 1981, ‘Citation rates to technologically important patents’, World Patent Information, vol. 3, pp. 160–3.

Clark, C. 1976, ‘Obsolescence of the patent literature’, Journal of Documentation, vol. 32, pp. 32–52.

Danguy, J. 2014, ‘Globalization of innovation production: A patent-based industry analysis’, iCite Working Paper no. 2014-009, Université Libre de Bruxelles.

de Rassenfosse, G., Dernis, H., Guellec, D., Picci, L. and van Pottelsberghe de la Potterie, B. 2013, ‘The worldwide count of priority patents: A new indicator of inventive activity’, Research Policy, vol. 42, pp. 720–37.

de Rassenfosse, G., Schoen, A. and Wastyn, A. 2014, ‘Selection bias in innovation studies: A simple test’, Technological Forecasting and Social Change, vol. 81, pp. 287–99.

de Rassenfosse, G. and van Pottelsberghe de la Potterie, B. 2009, ‘A policy insight into the R&D–patent relationship’, Research Policy, vol. 38, pp. 779–92.

Den Besten, M., Lissoni, F., Maurino, A., Pezzoni, M. and Tarasconi, G. 2012, ‘APE-INV data dissemination and users’ feedback project’, draft paper, 6 June, viewed May 2014, <http://www.esf-ape-inv.eu/download/Feedback_Document.pdf>.

Dernis, H. and Khan, M. 2004, ‘Triadic patent families methodology’, Organisation for Economic Co-operation and Development Directorate for Science, Technology and Industry Working Paper no. 2004/02, Paris.

Dubaric, E., Giannoccaro, D., Bengtsson, R. and Ackermann, T. 2011, ‘Patent data as indicators of wind power technology development’, World Patent Information, vol. 33, pp. 144–9.

Ejermo, O. and Karlsson, C. 2006, ‘Interregional inventor networks as studied by patent coinventorships’, Research Policy, vol. 35, pp. 412–30.

Frietsch, R., Neuhäusler, P. and Rothengatter, O. 2013, ‘Which road to take? Filing routes to the European Patent Office’, World Patent Information, vol. 35, pp. 8–19.

Guellec, D., Martínez, C. and Zuniga, P. 2012, ‘Pre-emptive patenting: Securing market exclusion and freedom of operation’, Economics of Innovation and New Technology, vol. 21, pp. 1–29.

Guellec, D. and van Pottelsberghe de la Potterie, B. 2000, ‘Applications, grants and the value of patent’, Economics Letters, vol. 69, pp. 109–14.

Guellec, D. and van Pottelsberghe de la Potterie, B. 2001, ‘The internationalisation of technology analysed with patent data’, Research Policy, vol. 30, pp. 1,253–66.

Harhoff, D., Scherer, F. and Vopel, K. 2003, ‘Citations, family size, opposition and the value of patent rights’, Research Policy, vol. 32, pp. 1,343–63.

Jaffe, A. and Trajtenberg, M. 1996, ‘Flows of knowledge from universities and federal laboratories: Modeling the flow of patent citations over time and across institutional and geographic boundaries’, Proceedings