SSRN Id4032020
Abstract
The copyright protectability of outputs generated by, or with the help of, Artificial Intelligence (AI) is hotly
debated in academia and by many institutions. In practice, sophisticated AI algorithms have
become a meaningful assistant in the European news industry in the reporting of sports (Retresco’s
collaboration with the German Football Association), weather (textOmatic’s collaboration with FOCUS
Online) or finance (the Guardian’s “Guarbot”). Furthermore, for the first time in copyright history a court in
China had to assess the validity of a company’s copyright claim over the articles produced by the
corporation’s algorithm. The copyright protection of robojournalism is no longer just a buzzwordy trend.
From a technological perspective, robojournalism currently relies on assistive, generative and distributive
technologies. The first two seem to be the most problematic from a copyright perspective as they challenge
the well-rooted human authorship requirement. While so far experts have agreed that it does not look like
AI technology is going to be a disruptive force in the media industry, researching the impact of AI in
journalism matters a great deal. There are numerous benefits stemming from the use of AI in the newsroom
- from expanding news coverage, through fast content production, all the way to leaving journalists more
time for “creative” and investigative tasks where the algorithm remains weak.
This paper addresses, first, the protectability of the outputs of robojournalism under the existing European
Union copyright laws. Second, it introduces the findings related to the practical significance of
robojournalism in the European news industry. Here, our focus is on the business, media and
communications studies perspectives of automated journalism. Our results demonstrate that the extent to
which European journalism relies on assistive and generative technologies to produce written output does
not justify, from a copyright perspective, the changing of the current anthropocentric copyright system.
These findings have wider implications as AI-generated outputs have prompted many to talk about market
failure if copyright (or related rights) protection is refused for such works.
Keywords
artificial intelligence, copyright law, robojournalism, European media industry, authorship, originality
1. Introduction
Sophisticated Artificial Intelligence (AI) algorithms have become a meaningful assistant in the
European news industry. Going beyond mere computer-assisted reporting, otherwise known as
1 Parts of the paper were presented at the SLS 2021 Annual Conference and IP After AI Conference 2021.
The authors are grateful for the comments of Daniela Simone, Tanya Aplin, Joe Atkinson, Dilan
Thampapillai and Jeremie Clos. We are grateful for the excellent support of our research assistant Anushka
Tanwar in completing this research.
2 Assistant Professor in Law and Autonomous Systems, The University of Nottingham.
3 Associate Professor, University of Szeged, Faculty of Law and Political Sciences, Institute of Comparative
Law and Legal Theory. Adjunct Professor (dosentti) of the University of Turku (Finland). Member of the
European Copyright Society.
This selected list of examples highlights that the topic of robojournalism is no longer just
a buzzwordy trend. Algorithmic or automated content creation seems to be an irreversible part of
2014) <https://slate.com/technology/2014/03/quakebot-los-angeles-times-robot-journalist-writes-article-on-la-earthquake.html>
accessed 9 February 2022; Bruce Boyden, ‘Emergent Works’ (2016) 39 Colum.
JL & Arts 377, 380–381; Denicola (n 10) 257.
12 Oremus (2014). See further Boyden (2016) 380-381; Denicola (2016) 257.
13 ‘“Tencent Dreamwriter” - Decision of the People’s Court of Nanshan (District of Shenzhen) 24
December 2019 – Case No. (2019) Yue 0305 Min Chu No. 14010’ (2020) 51 IIC 652, where the Court
argued that a direct connection (or causal link) existed between the editorial team’s creative choices and
the final output of the applied algorithm. The selection, judgment and skills of the editorial team’s
members and the above-the-minimum level of creativity of the outputs ultimately allow for the protection
of the news reports by copyright for the benefit of the publisher (the employer of the editors). Compare
to Li Yan, ‘Court Rules AI-Written Article Has Copyright’ (ECNS, 9 January 2020)
<http://www.ecns.cn/news/2020-01-09/detail-ifzsqcrm6562963.shtml> accessed 9 February 2022; Rory
O’Neill, ‘AI-Written Articles Are Copyright-Protected, Rules Chinese Court’ (World IP Review, 10 January
2020) <https://www.worldipreview.com/news/ai-written-articles-are-copyright-protected-rules-chinese-court-19102>
accessed 9 February 2022. For a detailed analysis of AI under Chinese copyright law see
He Tianxiang, ‘The Sentimental Fools and the Fictitious Authors: Rethinking the Copyright Issues of AI-
Generated Contents in China’ [2019] Asia Pacific Law Review 184.
One of the central issues in this respect is the copyright protectability of outputs generated
by, or with the help of, AI. This has given rise to masses of academic research, consultations on
multiple fora – both nationally as well as internationally,18 and various institutional reports.19 This
literature focuses on the authorship and originality issues, which underlie copyright protectability.
The discussion has pivoted around the ability of the human author to express free and creative
choices in the algorithmic process.
14 Michael Latzer and others, ‘The Economics of Algorithmic Selection on the Internet’ in Johannes M
Bauer and Michael Latzer (eds), Handbook on the Economics of the Internet (Edward Elgar Publishing
2016) 396–397.
15 Tania Bucher, ‘Machines Don’t Have Instincts’: Articulating the Computational in Journalism’ [2017]
Machine 1.
17 Caitlin Petre, ‘A Quantitative Turn in Journalism?’ (Tow Center for Digital Journalism, 30 October 2013)
(WIPO 2019) WIPO/IP/AI/2/GE/20/1; WIPO, ‘Revised Issues Paper on Intellectual Property Policy and
Artificial Intelligence’ (WIPO 2020) WIPO/IP/AI/2/GE/20/1 REV
<https://www.wipo.int/edocs/mdocs/mdocs/en/wipo_ip_ai_2_ge_20/wipo_ip_ai_2_ge_20_1_rev.pdf>
accessed 27 November 2020.
19 See for example Bernt Hugenholtz and others, ‘Trends and Developments in Artificial Intelligence -
24 Mark Perry and Thomas Margoni, ‘From Music Tracks to Google Maps: Who Owns Computer-Generated
Works?’ (2010) 26 Computer Law & Security Review 621; Ana Ramalho, ‘Will Robots Rule the (Artistic)
World? A Proposed Model for the Legal Status of Creations by Artificial Intelligence Systems’ (2017) 21
Journal of Internet Law 12; Ana Ramalho, ‘Originality Redux: An Analysis of the Originality Requirement in
AI-Generated Works’ [2018] AIDA 23; Jean-Marc Deltorn and Franck Macrez, ‘Authorship in the Age of
Machine Learning and Artificial Intelligence’ in Sean M O’Connor (ed), The Oxford Handbook of Music Law
and Policy (2019) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3261329> accessed 5
September 2019; Gerald Spindler, ‘Copyright Law and Artificial Intelligence’ (2019) 50 IIC - International
Review of Intellectual Property and Competition Law 1049; Péter Mezei, ‘From Leonardo to the Next
Rembrandt – The Need for AI-Pessimism in the Age of Algorithms’ (2020) 2 UFITA Forthcoming;
Hugenholtz and others (n 19); P Bernt Hugenholtz and João Pedro Quintais, ‘Copyright and Artificial
Creation: Does EU Copyright Law Protect AI-Assisted Output?’ [2021] IIC - International Review of
Intellectual Property and Competition Law <https://link.springer.com/epdf/10.1007/s40319-021-01115-0>
accessed 5 October 2021; Daniel J Gervais, ‘The Human Cause’ in Ryan Abbott (ed), Research Handbooks
on Intellectual Property and Artificial Intelligence (Forthcoming)
<https://papers.ssrn.com/abstract=3857844> accessed 19 November 2021; Tim W Dornis, ‘Artificial
Creativity: Emergent Works and the Void in Current Copyright Doctrine’ (2020) 22 YALE J. L. & TECH 1;
Tim W Dornis, ‘Of “Authorless Works” and “Inventions without Inventor” - the Muddy Waters of “AI
Autonomy” in Intellectual Property Doctrine’ (2021) 43 EIPR 570.
25 Hugenholtz and others (n 19) 33.
26 See Section 5.1.
27 Elizabeth Blankespoor, Ed deHaan and Christina Zhu, ‘Capital Market Effects of Media Synthesis and
Dissemination: Evidence from Robo-Journalism’ (2018) 23 Review of Accounting Studies 1; Yair Galily,
‘Artificial Intelligence and Sports Journalism: Is It a Sweeping Change?’ (2018) 54 Technology in Society
47; Andrey Miroshnichenko, ‘AI to Bypass Creativity. Will Robots Replace Journalists? (The Answer Is
“Yes”)’ (2018) 9 Information 183.
28 David Caswell and Konstantin Dörr, ‘Automated Journalism 2.0: Event-Driven Narratives: From Simple
Descriptions to Real Stories’ (2018) 12 Journalism Practice 477, 478; Aljosha Karim Schapals and Colin
Porlezza, ‘Assistance or Resistance? Evaluating the Intersection of Automated Journalism and Journalistic
Role Conceptions’ (2020) 8 Media and Communication 16, 21.
The paper is structured as follows. In Section 2, this paper lays the groundwork of the law
by addressing briefly the protectability of the outputs of robojournalism under the existing
European Union copyright laws. Section 3 introduces the technological perspectives of
robojournalism. Section 4 covers the business realities of robojournalism in the European written
news industry. Finally, Section 5 summarizes the key findings of media and communications
studies research papers on the implications of AI for journalism.
Our findings generally indicate that the majority of corporations outsource the creation of
the relevant technology, and, to a certain degree, they apply the same available technologies,
namely natural language processing. Still, our results demonstrate that the extent to which
European journalism relies on assistive and generative technologies to produce written output
does not justify, from a copyright perspective, changing the current anthropocentric copyright
system. These findings have wider implications as AI-generated outputs have prompted many to
talk about market failure if copyright (or related rights) protection is refused for such works.
We believe that our research evidences that the correct argument is to the contrary. Relying on
automated journalism has other benefits that go beyond copyright law: reporting news at a speed
that only “robojournalists” can achieve satisfies demanding consumer expectations, namely
receiving news reports extremely quickly on a wide variety of topics.
This, coupled with the fact that human journalists will now have more free time to dedicate to
creative and investigative journalism, should be seen as a sufficient incentive for the news
industry, leaving extended copyright protection aside. An important caveat is nonetheless needed
in this respect – the newly introduced press publishers’ related right still needs to be tested on the
market with respect to robojournalism. It will be interesting to see to what extent robojournalism will
challenge the operation of this new right.
Where copyright and AI are concerned, the broader discussion can be divided into two specific categories
of issues – upstream and downstream.29 The former concerns the input of an AI
process, namely the legal issues tied to the training data. These include text and data mining,30
liability for copyright infringing content, adaptation right and derivative works, as well as broader
29 Burkhard Schafer and others, ‘A Fourth Law of Robotics? Copyright and the Law and Ethics of Machine
Co-Production’ (2015) 23 Artificial Intelligence and Law 217, 219.
30 For a recent EU discussion on the problems and solutions with respect to text and data mining, see Alain
Strowel and Rossana Ducato, ‘Ensuring Text and Data Mining: Remaining Issues With the EU Copyright
Exceptions and Possible Ways Out’ (2021) 43 EIPR 322.
One final caveat is necessary. The newly introduced related right for press publishers as
per Article 15 of the CDSM Directive33 will certainly have significant consequences for journalism
fueled by AI.34 This paper only briefly touches upon the potential impact of the new right. We
cautiously suggest that perhaps not copyright protection, but related rights protection of AI-generated
output in the field of journalism will prove to be the revolutionary legal right.35 Before going into this, this
section will focus on the so-called downstream, or output, issues and ask to what extent
copyright protection subsists in robojournalistic output.
Output generated through, and with the assistance of, AI requires serious consideration of the
essence of copyright law. The two key notions are ‘authorship’ and ‘originality’. These are highly
interconnected and discussion of one inevitably leads to considerations of the other.36 Despite the
Journalism?’ in Taina Pihlajarinne and Anette Alén-Savikko (eds), Artificial Intelligence and the Media
(2022) <https://papers.ssrn.com/abstract=3853730> accessed 11 January 2022.
35 For further analysis of the press publishers’ right see Ula Furgał, ‘The EU Press Publishers’ Right:
Where Do Member States Stand?’ (2021) 16 Journal of Intellectual Property Law & Practice 887;
Pihlajarinne and others (n 34).
36 Jane C Ginsburg, ‘The Concept of Authorship in Comparative Law’ (2003) 52 DePaul Law Review 1063,
1072; Jani McCutcheon, ‘The Concept of the Copyright Work under EU Law’ (2019) 44 European Law
Review 767, 183.
With respect to authorship, the Berne Convention lacks a correlative definition.38 This
could be because such a definition was considered unnecessary, or perhaps
because it may be considered obvious that the author of a copyright work must be a human being.
With this in mind, some academics as well as copyright law statutes suggest that despite the lack
of an explicit internationally agreed definition of an author, generally the author is the one who
creates the work.39 To this end, the substantive provisions of the Berne Convention point towards
human authorship. One such indication, according to Sam Ricketson and Jane C. Ginsburg,
transpires, on the one hand, from the fact that copyright duration is linked to the life of the author
and, on the other hand, from the fact that moral rights can vest only in a human. In that respect, moral rights are attached
to the personality and presence of an author.40 Thus, the human being is an indispensable
element in the equation. Besides, considering that the Berne Convention was inspired by a group
of European authors under the leadership of Victor Hugo,41 it is not surprising that an
anthropocentric view on authorship prevailed.42
Equally, the term ‘originality’ is not defined in the Berne Convention. There is a
reference to “intellectual creations” in Article 2(5), but this is strictly tied to collections of literary or
artistic works such as encyclopaedias and anthologies. Nevertheless, considering the dependence of
authorship on originality, which becomes clearer in the brief analysis of the EU setting below, the
anthropocentric view of originality comes to the surface.
37 See the following among many others Andreas Rahmatian, ‘Originality in UK Copyright Law: The Old
“Skill and Labour” Doctrine Under Pressure’ (2013) 44 IIC - International Review of Intellectual Property
and Competition Law 4; Thomas Margoni, ‘The Harmonisation of EU Copyright Law: The Originality
Standard’ in Mark Perry (ed), Global Governance of Intellectual Property in the 21st Century (Springer
2016); Eleonora Rosati, Originality in EU Copyright: Full Harmonization through Case Law (Edward Elgar
Pub 2013). Sam Ricketson, ‘The 1992 Horace S. Manges Lecture - People or Machines: The Berne
Convention and the Changing Concept of Authorship’ (1991) 16 Columbia-VLA Journal of Law & the Arts
1; Adolf Dietz, ‘The Concept of Authorship under the Berne Convention’ (1993) 155 RIDA 3; Lionel Bently,
‘Copyright and the Death of the Author in Literature and Law’ (1994) 57 Modern Law Review 973; Lionel
Bently, ‘R. v. the Author: From Death Penalty to Community Service - 20th Annual Horace S. Manges
Lecture, Tuesday, April 10, 2007’ (2008) 32 Columbia Journal of Law and the Arts 1; Jane C Ginsburg,
‘The Role of the Author in Copyright’ in Ruth L Okediji (ed), Copyright Law in an Age of Limitations and
Exceptions (Cambridge University Press 2017); Martha Woodmansee, ‘On the Author Effect: Recovering
Collectivity’ in Martha Woodmansee and Peter Jaszi (eds), The Construction of Authorship: Textual
Appropriation in Law and Literature (Duke University Press 1994); Martha Woodmansee and Peter Jaszi,
The Construction of Authorship: Textual Appropriation in Law and Literature (Duke University Press 1994).
38 Sam Ricketson, The Berne Convention for the Protection of Literary and Artistic Works: 1886-1986 (1987)
para 6.4.
39 Antoon Quaedvlieg, ‘Authorship and Ownership: Authors, Entrepreneurs and Rights’ in Tatiana-Eleni
Synodinou (ed), Codification of European Copyright Law Challenges and Perspectives (Wolters Kluwer
2012) 198–199; Copyright, Designs and Patents Act 1988, section 9(1) (UK).
40 Stef van Gompel, ‘Creativity, Autonomy and Personal Touch’ in Mireille van Eechoud (ed), The Work of
Convention and Beyond Two Volume Set (Second Edition, Oxford University Press 2006) pt 1.
42 Madeleine de Cock Buning, ‘Autonomous Intelligent Systems as Creative Agents under the EU
Framework for Intellectual Property’ (2016) 7 Eur. J. Risk Reg. 310, 319; Ricketson (n 37) 6.
Turning to the EU, the focus of this paper, human authorship emerges very prominently from the
originality standard. Originality has been the subject of a long list of cases from the Court of Justice
of the European Union (CJEU), some of which will be briefly analysed here from the perspective
of journalism. A detailed and thorough analysis of the two key notions – authorship and originality
– usually engages with them separately and studies their origins and evolution independently
before bringing them under the same umbrella. Yet, such an exercise is beyond the objective of
this paper and furthermore, as stated above, the two are highly interdependent. Therefore, this
section will only briefly examine authorship and originality from an EU law perspective and will
then reflect upon what these legal standards mean for the purposes of journalism and more
specifically, robojournalism.
The standard of originality that the CJEU established requires that a work be
considered original (and thus potentially copyright protected) only if it constitutes the author’s
own intellectual creation.43 This definition has been criticised as being rather circular.44
Nonetheless, it puts the figure of the author at the centre stage of EU copyright law. The standard
is said to entail two dimensions: normative and causative,45 also known as a subjective and an
objective one.46 The normative focuses on the substance of originality as such, namely a work
should reflect an intellectual creation. Present very prominently in civil law jurisdictions, this
constitutes the idea that a work should demonstrate the imprint and personal stamp of the
author.47 Importantly, the emphasis on intellectual creation and authorial imprint should not be
confused with a requirement for a certain degree of aesthetic quality, merit or specific purpose,
none of which forms a requirement of originality under copyright law.48
The causative, also considered an objective, dimension pertains to the originating factor.49
Rooted in UK copyright law, the idea is that a work is not protected unless it originates from a
human author. Thus, the emphasis in originality is not on novelty and creativity, but on the fact
that a work is created by an author. This is a clear indication of how the originality standard
encompasses the authorship notion. Consequently, a work is protected only if it is the product
of a human author whose intellectual expression stamps the work50 and all of this should result in
a subject matter that is sufficiently clear and objective.51
43 Case C-145/10 Eva-Maria Painer v Standard VerlagsGmbH and Others [2011] CJEU
ECLI:EU:C:2011:798 [89]; Case C-833/18 SI and Brompton Bicycle Ltd v Chedech / Get2Ge [2020]
[22]; Case C-5/08 Infopaq International A/S v Danske Dagblades Forening [2009] [37].
44 Hugenholtz and others (n 19) 70.
45 Daniela Simone, Copyright and Collective Authorship: Locating the Authors of Collaborative Work
Determining the presence of “free and creative choices”52 and thus of the intellectual creation in
a work is not a straightforward exercise. Is it a high or a low hurdle to pass?53 Is it capable at all
of being assessed objectively?54 Does the originality test follow the common law or the civil law
tradition; or is it better described as a mix of both?55 To that end, the CJEU case-law has provided
some insight into the parameters of originality.
Even though the “author’s own intellectual creation” standard is nowadays understood to
apply universally to all types of works, an argument can be made that it is of value to nonetheless
determine and bear in mind the type of work in question. Journalistic literary output very often
follows a pre-determined style, imposed by the specific type of publication, newspaper or
magazine, audience or subject matter, among other things. There will be norms with which
journalists must necessarily comply as a matter of general journalistic practice, but also norms
imposed more specifically by their editors. Copyright scholarship has
addressed this issue under the broader label of “creative constraints”56 or “freedom of the
creator”.57
The CJEU’s guidance in this respect has been instructive. In BSA, the CJEU addressed
the protectability of a graphic user interface enabling communication between a computer
program and the user. The interface may potentially fall within the general subject
matter protectable by copyright law pursuant to the Information Society Directive58 provided that the interface
meets the golden “author’s own intellectual creation” standard.59 The CJEU stressed that if the
expression of the graphic user interface’s components was dictated by their technical function,
the criterion of originality would not be met.60 In Football Dataco, the Court expanded further on
this notion of functionality and technical limitations.61 That case concerned a claim of infringement
of intellectual property rights – a sui generis database right as well as copyright in a database – in
27; Benoît Michaux, ‘L’originalité en Droit d’auteur, Une Notion Davantage Communautaire après l’arrêt
Infopaq’ (2009) 5 Auteurs & Media 473, 473.
56 van Gompel (n 40) 104.
57 Estelle Derclaye and Marco Ricolfi, ‘Opinion of the European Copyright Society in Relation to the Pending
Reference before the CJEU in Cofemel v G-Star, C-683/17’ (European Copyright Society 2018) 6
<https://europeancopyrightsocietydotorg.files.wordpress.com/2018/11/ecs-opinion-cofemel_final_signed.pdf>
accessed 11 January 2022.
58 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the
harmonisation of certain aspects of copyright and related rights in the information society Official Journal
L 167 (hereinafter ’InfoSoc Directive’).
59 Case C‑393/09 Bezpečnostní softwarová asociace – Svaz softwarové ochrany v Ministerstvo kultury
ECLI:EU:C:2012:115.
Importantly, the CJEU placed emphasis on the way in which the selection and the
arrangement of the data in the databases was carried out. In Football Dataco, this was done in
accordance with a set of rules, parameters and organisational constraints as well as the specific
requests of the football clubs in question.66 With this in mind, the CJEU turned to analyse whether
this process could reach the required originality threshold – would the selection and the
arrangement of the data in the fixtures amount to the expression of the author’s creative ability in
an original manner through which that author has made free and creative choices and thus
stamped the work with their own personal touch? At this stage, the CJEU reaffirmed its position that
there will be no room for creative freedom where choices are dictated by technical considerations,
rules or constraints. Consequently, the CJEU seems to suggest that evaluating the creative
elements in the process of producing the copyright work is as important as the final creative
features of the product itself.
A crucial aspect in this discussion is the available room for creativity, i.e. the creative
constraints. Limiting the author by certain creative constraints is not sufficient reason to deny that
author copyright protection.67 Yet, this is a very delicate point. Some constraints might be too
rigid leaving the author no, or very limited, space for creativity. Others may actually stir creativity
– too much freedom may “paralyse” creativity as the creative space becomes too wide to control
and make any creative choices.68
The creative freedom of journalists could also be dictated by some very specific restraints.
Journalists often strictly follow an editorial statute and/or an ethical code.69 For example, the
Reuters Handbook of Journalism lists the following as aspects guiding their journalistic outputs:
story length, basic story structure, consistency of style, key words, language that must be
62 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection
of databases OJ L 77, Article 7(1).
63 ibid, Article 3.
64 ibid.
65 Football Dataco and Others (n 61) para 33.
66 ibid 35.
67 Hugenholtz and others (n 19) 73.
68 van Gompel (n 40) 107.
69 ibid 116.
Inevitably, these lead to one very straightforward and simple conclusion, often overlooked
in the context of AI. Not all output, even if directly the product of human hands, receives
copyright protection. Free and creative choices and expression of intellectual creation must be
present. Turning to the realities of robojournalism, in identifying these choices which trigger
copyrightability, it is necessary to lift the technical veil and unpack the basic process behind the
production of journalistic pieces with the aid of, or allegedly entirely autonomously by, AI.
A 2020 report, commissioned by the European Commission and carried out by the Institute for
Information Law (IViR) and the Joint Institute for Innovation Policy (JIIP), studied the specific IPR
challenges from the perspective of copyright and patent law. It identified three specific domains
as priority ones – pharmaceutical research, science/meteorology and journalism.75 The present
ECLI:EU:C:2019:623 [19–20].
73 Opinion of Advocate General in C‑469/17 Funke Medien NRW GmbH v Bundesrepublik Deutschland
[2018] [21].
74 van Gompel (n 40) 107.
75 Hugenholtz and others (n 19) 33.
What creates most difficulties are generative technologies. The presumption there is that
these are “capable of creating media content largely autonomously and with very little human
intervention”.79 In this part we seek to unpack these technologies and identify the degree of
autonomy in the robojournalism process where generative technologies are used. The available
technologies have become mainstream in the field of descriptive reporting tasks, but are still not
capable of being directly applied to other more complicated forms such as storytelling journalism.
This study turns to the technological reality behind descriptive reporting only. However, it is worth
mentioning that more sophisticated journalism is still out of reach for
robojournalism because of the lack of data models suitable for encoding event-driven narratives.80
Some progress has been made and models have nowadays been suggested that aim to target
event-driven natural language generation.81 With that in mind, our study digs into the technology
behind robojournalism that thrives on dry statistics and numbers, such as sports, weather and
finance. From the outside, it appears that the technology in these fields has indeed become rather
mainstream and accessible to many. This is a proposition we test with the empirical study in
Section 4. Here, we seek to unpack their functioning in order to address the copyright protectability
issues. In this discussion, we start from the main protagonist – “natural language generation”, or
“NLG”.
76 Efthimis Kotenidis and Andreas Veglis, ‘Algorithmic Journalism—Current Applications and Future
Perspectives’ (2021) 2 Journalism and Media 244, 246.
77 Hugenholtz and others (n 19) 57.
78 Jane C Ginsburg and Luke Ali Budiardjo, ‘Authors and Machines’ (2019) 34 Berkeley Technology Law
The study in this paper turns to unravel NLG techniques as applied to the specific
journalistic fields of weather, sport, finance and real estate reports. The reason behind the
selection of these specific domains of journalism is that they focus on telling us “what happened
or is happening” since “the limitation of only answering the “what”, rather than the “why”, is due to
the inability of computer systems to analyse events against contextual life-world knowledge”.88 An
important development which fuels robojournalism in these fact-driven fields is datafication.
Society is used to digitalization – there is barely a domain still operating on analogue. With respect to
news, however, the fact that data are constantly generated, rendered open and available has
pushed the literature to talk about digitalization evolving into datafication, which becomes particularly
relevant for the news industry.89 The accuracy and reliability of data are pertinent, especially in
a field such as journalism which is tied to many ethical responsibilities. Generating text from such data is usually referred to
broadly as “data-to-text” generation.90 Many authors and journalists spend a significant amount
of their time producing documents from data and this is often not their primary task. Once this can
be delegated, either entirely or to a large extent, to an algorithm, the journalists’ productivity and
morale are automatically enhanced.91 It must, however, be borne in mind that developing an NLG
system is costly and not all news companies can afford it. The decision to invest is very often an
economic one.92
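To make the notion of “data-to-text” generation more concrete, the following is a minimal sketch of a template-based generator for a football result. It is purely illustrative and is not drawn from any of the systems discussed in this paper: the names MatchResult, describe_margin and generate_report, as well as the wording thresholds, are hypothetical assumptions. The sketch shows that the vocabulary and sentence structure of the output are fixed in advance by human choices, while the algorithm merely fills in the data.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    # Hypothetical structured input, e.g. one row of a results feed.
    home: str
    away: str
    home_goals: int
    away_goals: int


def describe_margin(diff: int) -> str:
    # Assumed editorial rule, written by a human in advance:
    # which verb is used for which goal difference.
    if diff >= 3:
        return "thrashed"
    if diff == 2:
        return "comfortably beat"
    return "edged past"


def generate_report(m: MatchResult) -> str:
    # Draws get their own fixed sentence template.
    if m.home_goals == m.away_goals:
        return f"{m.home} and {m.away} drew {m.home_goals}-{m.away_goals}."
    # Decide winner and loser from the data alone.
    if m.home_goals > m.away_goals:
        winner, loser = m.home, m.away
        high, low = m.home_goals, m.away_goals
    else:
        winner, loser = m.away, m.home
        high, low = m.away_goals, m.home_goals
    verb = describe_margin(high - low)
    return f"{winner} {verb} {loser} {high}-{low}."


if __name__ == "__main__":
    print(generate_report(MatchResult("Team A", "Team B", 3, 0)))
```

In such a set-up, every expressive element of the resulting sentence can be traced back either to the input data or to a rule that a person encoded beforehand, which is where any search for “free and creative choices” would have to begin.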
82 ibid 477.
83 Stefanie Sirén-Heikel and others, ‘Unboxing News Automation’ (2019) 1 Nordic Journal of Media Studies
47, 49.
84 ibid.
85 Warren McCulloch and Walter Pitts, ‘A Logical Calculus of Ideas Immanent in Nervous Activity’ (1943) 5
Tasks, Applications and Evaluation’ (2018) 61 The Journal of Artificial Intelligence Research 65, 66.
91 Ehud Reiter and Robert Dale, ‘Building Applied Natural Language Generation Systems’ (1997) 3 Natural
and Ching-Hua Chuan, ‘Artificial Intelligence and Journalism’ (2019) 96 Journalism & Mass Communication
Quarterly 673, 677-678.
The central question is – what is the degree of human involvement in the generative technology
process itself and does that degree justify a copyright claim? In these complex technical
processes, human involvement can take place at various stages of the process – pre-production,
during the NLG process and post-production. This is something Hugenholtz and Quintais refer to
as conception, execution and redaction.93 This corresponds neatly to the CJEU’s analysis in
Painer, where the CJEU emphasised that authorial creative choices can take place in three
different stages when a photograph is taken: at the preparation stage, when taking the photograph
and post-production when choosing a developing technique and method.94 To apply this to
robojournalism, it is important to dissect the typical NLG process.
As with all machine learning based technologies, there is no one-size-fits-all technical
model for all NLG systems. Yet, it appears that a consensus exists that in any NLG process six
basic activities need to be performed; these lead all the way from input data to a final output text.95
Even though the order of these may vary and some of them may be merged together, these stages
always come back in one way or another as they represent the stages of any text generation.
Reiter and Dale define them in the following manner:96
Each stage entails its own individual peculiarities, depending on several elements, including the
type of text to be produced, the style of writing and the target audience. For example, the editorial
constraints discussed above with the Reuters Handbook example would certainly play a heavy
role in the setting up of the technical specifications in each of the six stages. The beauty of a
process of this kind is that it provides editors, journalists and the computer scientists involved with
a wide freedom to tweak and adjust. In that regard, a parallel could be made with what Lehr and
Ohm underline with respect to machine learning in general – these complex processes are not
Matching this technical analysis with the copyright discussion above, cases like Painer already
stress the fact that originality can take place at different stages.99 What matters is not one single
epiphany-like moment. Instead, creativity and originality can take place at different moments of
the NLG process. Consequently, one is prompted to seek the choices that the authors involved in the
process make (as opposed to the system itself) in order to satisfy the human authorship
requirement, and to determine whether these choices are indeed sufficiently “free and creative” to constitute an
intellectual expression. The main instrument in this analysis will be the notion of constraints
introduced above in Section 2. The essential question to be asked is whether the imposed
technical constraints limit excessively the creators’ freedom in each of these stages to the extent
that there is no copyright claim subsisting.
Literature has categorised these six tasks into two stages – early and late ones.100 Early
decisions are directly tied to the input data. In this respect, Gatt and Krahmer pivot the early
decisions around the question of which information to convey to the reader, while the late
decisions are strictly tied to the decision of which words to use in a particular sentence and how
to put them in their correct order.101 The first stage – content determination – is an early task and
it can be suggested that the decision of which data to insert in the NLG process is not immediately
the type of free and creative choice that triggers a copyright claim. More importantly, this content
determination in the NLG process is typically carried out through automated means where the
process leaves little room for human intervention.
Content determination does not appear to entail any free and creative choices that would
trigger a copyright claim. This is due to the idea/expression dichotomy, according to which
copyright does not protect ideas, but only expression.102 While this has always been a very difficult
line to draw in reality, it can be safely stated that deciding what information should be
communicated in the text may stay closer to the idea side of the spectrum. Admittedly, it can be
argued that there may be some free and creative choices in the selection of the information that
would go into the NLG process. Yet it must be stressed that words in isolation would not constitute
the author’s own intellectual creation. Infopaq taught us that it is “only through the choice,
sequence and combination of those words that the author may express his creativity in an original
97 David Lehr and Paul Ohm, ‘Playing with the Data: What Legal Scholars Should Learn About Machine
Learning’ (2017) 51 UC Davis Law Review 653, 669.
98 Reiter and Dale (n 91) 68.
99 Painer (n 43).
100 Gatt and Krahmer (n 90) 71.
101 Gatt and Krahmer (n 90).
102 TRIPS (n 31) article 9(2).
Discourse planning, the second activity, may on the other hand cover some features that are
important for the copyright claim. The ordering and structuring of the text into a coherent form,
with logical connections between the beginning, the middle and the end of the text,
would certainly entail free and creative choices. These could be limited by the editorial constraints
imposed by the specific journalistic output, but discourse planning would nonetheless be the type of activity that
triggers a copyright claim, as it goes beyond a simple idea dictated by functionality.
The next stage – sentence aggregation – does not appear to have any impact on the
copyright claims, especially considering that this is not always a necessary stage and would
typically entail the grouping of the sentences together. Arguably, these are not choices that would
entail sufficient intellectual creation in a free and creative manner as required by copyright law.
Most likely, these choices would be heavily influenced by the information that is being conveyed.
Thereafter comes the lexicalisation phase, which appears to be particularly important from
a copyright perspective. Lexicalisation entails the process of deciding which specific words and
phrases would be used to express the domain concepts and relations.104 It looks like lexicalisation
can be carried out by hard-coding whereby humans determine in advance which words would
come to represent any specific concept or domain. Arguably, the decision of using one word
instead of another could reflect free and creative choices of the author. Yet, it is questionable
whether choosing merely one word could constitute the authorial choice sufficient to trigger
originality. The CJEU case-law has not established a minimum, nor a de minimis rule; thus, a
case-by-case analysis is required here.
As for the task of referring expression generation, considering that at this point
certain phrases or words are selected to stand in for others, it does not look
like any copyright-relevant free and creative choices would take place here. Deciding to use ‘the
team’ and ‘they’, or ‘the score’ and ‘it’ interchangeably are minimal choices which do not contribute
to the creative expression.
Finally, during the linguistic realisation task, grammar, syntax, morphology and
orthography are revised. Once again, none of these pertain to the copyright-relevant intellectual
creativity – these decisions are mostly dictated by certain rules and therefore the creative freedom
for such choices is rather restricted.
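The six activities discussed above can be illustrated with a deliberately simplified sketch. It is not taken from Reiter and Dale, nor from any of the service providers analysed in this paper: the match record, the function names, the verb table and the team names are all hypothetical, and a production NLG system would be far more elaborate. The sketch only aims to show where human, hard-coded choices (for instance the verb table used for lexicalisation) could enter a rule-based data-to-text pipeline.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical input record; the field names are illustrative only."""
    home: str
    away: str
    home_goals: int
    away_goals: int


def determine_content(match):
    """1. Content determination: decide which facts to convey."""
    if match.home_goals >= match.away_goals:
        first, second = match.home, match.away
        first_goals, second_goals = match.home_goals, match.away_goals
    else:
        first, second = match.away, match.home
        first_goals, second_goals = match.away_goals, match.home_goals
    return {
        "first": first,
        "second": second,
        "first_goals": first_goals,
        "score": f"{first_goals}-{second_goals}",
        "margin": first_goals - second_goals,
    }


def plan_discourse(facts):
    """2. Discourse planning: fix the order of the messages."""
    return ["result", "tally"]


def aggregate(plan):
    """3. Sentence aggregation: here every message keeps its own sentence."""
    return [[message] for message in plan]


# 4. Lexicalisation: word choices hard-coded in advance by a human editor.
# Each entry is (minimum goal margin, verb); the wording is an assumption.
RESULT_VERBS = [(3, "thrashed"), (1, "beat"), (0, "drew with")]


def lexicalise(margin):
    for minimum_margin, verb in RESULT_VERBS:
        if margin >= minimum_margin:
            return verb
    raise ValueError("margin must be non-negative")


def refer(team, already_mentioned):
    """5. Referring expression generation: full name first, then 'the team'."""
    if team in already_mentioned:
        return "the team"
    already_mentioned.add(team)
    return team


def realise(words):
    """6. Linguistic realisation: capitalisation and final punctuation."""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def generate_report(match):
    facts = determine_content(match)
    mentioned = set()
    sentences = []
    for group in aggregate(plan_discourse(facts)):
        for message in group:
            if message == "result":
                words = [
                    refer(facts["first"], mentioned),
                    lexicalise(facts["margin"]),
                    refer(facts["second"], mentioned),
                    facts["score"],
                ]
            else:  # "tally"
                goals = facts["first_goals"]
                noun = "goal" if goals == 1 else "goals"
                words = [refer(facts["first"], mentioned), "scored", str(goals), noun]
            sentences.append(realise(words))
    return " ".join(sentences)


print(generate_report(Match("Reds", "Blues", 4, 1)))
```

In this sketch the only choices that resemble authorial ones are the ordering fixed in `plan_discourse` and the wording stored in `RESULT_VERBS`; the remaining steps follow mechanically from the data or from grammatical rules, which mirrors the assessment of the individual stages above.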
4. Business perspectives
So far, a limited amount of literature has been published that empirically tests NLG service
providers. One of the relevant sources pointed out that the algorithmic content industry (ranging
from the selection through recommendation to creation) is a massively developing field of
business.105 In 2014, Latzer et al. nevertheless found that automation was ancillary for the news
industry.106 Some studies have nonetheless started shedding light on the use of NLG. Graefe107
and Dörr108 discussed the functionality and offers of 11 and 13 NLG service providers,
respectively, while Fanta interviewed 15 news agencies on their use of AI tools.109
None of these research papers had any focus on the copyright law aspects of
robojournalism. This does not, however, mean that the researchers’ findings cannot inform – at
least indirectly – a research project on copyright implications of robojournalism. One of the most
important findings of these research papers is that media corporations generally outsource the
development of AI tools with which they might generate the final literary outputs. As Graefe noted,
“[m]any newsrooms, however, lack the necessary resources and skills to develop automated
journalism solutions in-house. Media organizations have thus started to collaborate with
companies that specialize in developing natural language generation technology to automatically
generate stories from data for a variety of domains.”110 The involvement of NLG service providers
in the production of media outputs calls into question the manner and extent to which media companies
may claim copyright protection of output generated with the assistance of NLG.
This led us to conduct a targeted empirical analysis of selected European NLG service providers
against several factors. We checked how widely they support news publishers with automated
journalism tools. We analysed 10 service providers: AX Semantics, Retresco, Textomatic from
Germany; Syllabs and Labsense from France; United Robots from Sweden; Bakken & Baeck
from Norway; Arria and RADAR from the United Kingdom; and Connexun from Italy.111
The collection of data has evidenced a significant overlap of functionalities and market
presence of the distinct service providers, as well as a huge difference among the service
providers with respect to the transparency/availability of data on the distinct factors/variables we
paid attention to.
To start with the commonalities (or available information): the majority of services are
offered on a software-as-a-service (SaaS) basis, although some corporations provide
content-as-a-service (CaaS), hyper-personalised or custom-built solutions. The majority of
service providers offer NLG, but several corporations also provide NLP solutions.
Only half of the analysed services published data on the number of available languages
in which they offer their services. Where data was available with respect to this variable, the
numbers vary heavily: from 6 (United Robots)112 to 110 (AX Semantics). If the language variations
111 Our focus on the “European” news industry was therefore not limited to European Union Member States
only.
112 It is hard to measure whether Syllabs’ “multiple languages” means more or less than 6.
Corporations are simultaneously present with their solutions in multiple market segments,
ranging from e-commerce to national government communications. In general, our empirical
findings confirm that the most relevant services are connected to data-driven markets:
telecommunication; financial sector; weather forecasts; sports; real estate, etc.
The number of confirmed clients of the service providers varies significantly. Three
corporations do not provide data on the number of their clients,113 but the rest report from multiple
dozens to 800+ clients. At the same time, these numbers are not fully comparable. Some
corporations publish the overall number of their partners, while others specify the number of the
news industry clients, too. For example, Syllabs has 800+ and AX Semantics has 500+ clients
overall; on the other hand, Labsense and United Robots report 100+ media clients (including,
however, radio and audiovisual corporations, too). Finding a correlation between the various factors
was not the purpose of the present paper. It would be interesting to explore further how the various
service providers’ language variations or their market presence correlate with the reported client
numbers. Only a much deeper empirical analysis – with a direct focus on the given corporation’s
business strategy – would be capable of shedding light on the correlations.
The best available examples of the use of the selected services tend to focus on the
often-mentioned sports or financial reports, or weather forecasts;114 although other important
elements of the online publishing process (e.g. SEO visibility or topic management) are supported
as well.115 This aspect once again underlines the growing prominence of robojournalism in these
specific fields.
113 These are Textomatic, Bakken & Baeck and Connexun. We nevertheless know that they have existing
(and famous) collaborations: Textomatic has built a fruitful collaboration with FOCUS Online (compare to
note 6 above); and Bakken & Baeck has collaborated with NTB on football sport reports.
114 The Stuttgarter Zeitung uses AX Semantics’ service to generate sport, fine dust and live air quality
reports (https://en.ax-semantics.com/portfolio/stuttgarter-zeitung/); FOCUS Online automatically generates
weather and finance news with the help of Textomatic’s solution
(https://www.pt-magazin.de/de/wirtschaft/innovation/roboter-journalismus---ist-nicht-mehr-wegzudenken_jknpci4d.html);
Mediafin automatically generates stock market news feed with Syllabs technology
(https://www.syllabs.com/en/client/lecho-automatically-generates-stock-market-newsfeed); Ouest France
generates reports on weather and upcoming cultural events by using Syllabs’ solution
(https://www.syllabs.com/en/client/Ouest-France-boosts-its-local-information); 60,000 local soccer games
were reported on during the first “COVID season” in the Netherlands
(https://www.unitedrobots.ai/for-newsrooms/news/how-dutch-ndc-will-cover-60000-regional-football-matches?hsLang=en);
Bonnier News Local also automated live sports reporting
(https://www.unitedrobots.ai/for-newsrooms/news/automating-live-sports-at-bonnier-news-local?hsLang=en);
Bakken & Baeck and NTB’s collaboration was also centered around digital football reporters
(https://medium.com/bakken-b%C3%A6ck/building-a-robot-journalist-171554a68fa8).
115 The FAZ.NET opts for an audience-first experience to increase SEO visibility and topic management
(https://www.retresco.de/wp-content/uploads/2020/09/Retresco-TMS-Case-Study-FAZ.pdf); TF1 uses
Labsense’s service to generate automated editorial content
(https://www.lalettrea.fr/medias_presse-ecrite/2019/05/20/tf1-fait-appel-a-l-intelligence-artificielle-de-labsense-pour-rediger-des-textes-automatiques,108357671-art).
This empirical research demonstrates that the NLG services market is thriving. Outsourcing the
development of algorithms is the standard solution in robojournalism. It comes as no surprise that
the use of algorithms is generally present only in data-driven fields such as finance, weather
forecasts or sports reporting. Most of the analysed service providers obscure their contractual
practices. The publicly available and relevant documents almost unanimously require the
client to provide the source data and allow the use of the content without claiming any copyright
interest in the input content. Indeed, it is plausible that the other service providers,
which did not disclose their service contracts, follow the same logic. Furthermore, although the
majority of services advertise the underlying algorithm as fully automated, the final publication of
the given content requires some degree of human intervention in the newsrooms. Hence, the
copyright protection of the relevant media outputs might effectively arise as a consequence of the
potential free and creative choices made at the level of editing, after the NLG process has taken
place. These choices will certainly vary widely from process to process – each newsroom is
orchestrated differently, so the amount of postproduction creative effort necessary to turn the
NLG output into a readable journalistic piece is not the same in all circumstances. These
practices are discussed later in this paper. For the purposes of the interim conclusion on the
116 AX Semantics’ Master Subscription Agreement, §2.2 (“The customer may only process his own data, or
data he has legal access and usage rights for, for his own purposes. All rights to the data provided by the
customer remain with him.”) Available via https://assets.ax-semantics.com/terms-and-conditions.pdf.
117 Retresco’s Terms & Conditions, G.2 [“Retresco will store (duplicate) and process (catalogue or prepare
and summarise for the semantic search function) the Customer’s data and content solely on behalf of the
Customer and, unless expressly agreed otherwise, for use by the Customer.”] Available via
https://www.retresco.com/terms-conditions/.
118 Textomatic’s Cooperation Agreement for News-Alert-System (NAS) and rob.by-Chatbot, Preamble and
Definitions [“The databases of the NAS system are filled with licensed data (e.g. Tradegate/Deutsche
Börse/VWD, DFB, Wetterkontor) and Open Data (e.g. Wikipedia) or with content from media partners.”]
Available via https://newsletter.textomatic.ag/en/Contract/NAS/index.html.
119 Connexun’s Terms & Conditions, API Data usage [“Data accessible through Connexun may contain
Third Party Content (such as text, images, videos obtained from various news sources). This content will
remain the sole responsibility of those who make it available. In some cases content accessible through
our Services may also be subject to intellectual property rights. In these cases you are allowed only to
perform actions and activities that are awarded to you by the owner of the content.”] Available via
https://connexun.com/terms-and-conditions.
There is a general understanding among some AI researchers that the biggest threat to the
development of AI is the human fear of the effects of such changes.123 Such a “Frankenstein
Complex” is certainly present with respect to robojournalism as well. Journalists inescapably meet
the challenge of “resistance versus assistance”, that is, whether they believe robojournalists will
replace or only supplement them.
120 Daewon Kim and Seongcheol Kim, ‘Newspaper Companies’ Determinants in Adopting Robot
Journalism’ (2017) 117 Technological Forecasting and Social Change 184, 188.
121 ibid.
122 At the same time, we will not be discussing professional questions such as the ethical aspects of
automated journalism, as well as issues related to objectivity, bias or newsworthiness.
123 Lee McCauley, ‘The Frankenstein Complex and Asimov’s Three Laws’ (AAAI, 10 May 2007)
Indeed, the general trend among journalists is to argue that “[a]lgorithms make possible
journalistic practices that would not be feasible based on human labor alone. Algorithmic systems
help news sites determine quality reader comments, find important stories on social media
platforms, and use data sets to generate stories”.126 The empirical research of Schapals and
Porlezza showed that journalists tend to defend their positions by referring to expressions like
creativity, context or uniqueness to describe their work; and journalism is regularly treated by
journalists themselves as “an ‘art’ or a ‘craft’ rather than some manual task on an assembly
belt”.127 Human experience and know-how are argued to be irreplaceable,128 especially as
algorithms are only a form of programmed logic.129 As Coddington stated, “[d]ata journalism
retains an emphasis on editorial selection and professional news judgment in analysing and
presenting data, but it does so while also building around a recognition that expertise in analyzing
and drawing meaning from that data often exists outside of the profession, among the
audience”.130 Some estimate that only about 15% of journalists’ and 9% of editors’ jobs might be
replaced by automated technologies.131
Furthermore, robojournalists are usually designed to free up human journalists for more
sophisticated workplace tasks,132 and so they get the chance and time to “produce a better
story”.133 Arguably, this refers to practices such as creative writing, investigative journalism as
well as clever interviewing, where the creative intellectual effort of the journalist is indispensable to
the final piece. Another study found that the journalists’ three key motives for using AI were:
making their own work more efficient; delivering more relevant content; and improving business
efficiency.134 Each of these is directly linked to the speed and coverage that AI systems are
As the interviews made by Schapals and Porlezza showed, journalists’ craft can “best be
described by linguistic eloquence, stylistic nuance and a general need to not merely convey facts
objectively, but to contextualise them, that is, to take readers by the hand and help understand
the deeper meanings, possible consequences and wider (societal) significance of the factual
information they are consuming. [The journalists] also stressed the need for a human editor to
double-check and to validate accounts of sports or financial news coverage”.138 Finally, as Graefe
pointed out, journalists should focus on tasks that algorithms cannot perform. The authors suggest
that going forward, human and automated journalism will likely become closely integrated and
form a relationship that Reginald Chua refers to as a ‘man-machine marriage’, whereby algorithms
will analyze data, find interesting stories, and provide a first draft, which journalists will then enrich
with more in-depth analyses, interviews with key people, and behind-the-scenes reporting.139 As
the technological reality section below will demonstrate, this is already the reality.
No doubt, not all journalists are happy with the recent changes. Those who are less
trained in technology might find their future in the news industry more vulnerable. Empirical
evidence also shows fears of the gradual disappearance of data-intensive newsroom jobs,140
especially those related to sports, weather and financial reports.141
135 Mark Hansen and others, ‘Artificial Intelligence: Practice and Implications for Journalism’ (2017) 8
<https://academiccommons.columbia.edu/doi/10.7916/D8SN0NFD/download> accessed 9 February
2022.
136 Graefe (n 8) 597.
137 Fanta (n 109) 10; Neil Thurman, Konstantin Dörr and Jessica Kunert, ‘When Reporters Get Hands-on
Compositional Forms, and Journalistic Authority’ (2015) 3 Digital Journalism 416, 422–424.
141 Graefe (n 8) 33–34; Schapals and Porlezza (n 28) 22.
Graefe pointed out that “[i]n automating traditional journalistic tasks, such as data collection and
analysis, as well as the actual writing and publication of news stories, there are two obvious
economic benefits: increasing the speed and scale of news coverage. Advocates further argue
that automated journalism could potentially improve the accuracy and objectivity of news
coverage. Finally, the future of automated journalism will potentially allow for producing news on
demand and writing stories geared toward the needs of the individual reader”.142 Reading this
opinion in conjunction with other sources, the key motivation of publishers in introducing NLG
solutions might be the speedy creation of new products, rather than cutting the costs of human
workload.143 Indeed, there is a discernible “profit trap” in NLG solutions. On the one hand, publishers
struggle for profitability, and NLG solutions are able to reduce some transaction costs due to
process automation.144 On the other hand, collaboration between journalists and computer
scientists necessitates extra resources.145 The development expenses of robojournalism,
including the hiring of trained technical experts or the internal training of them, are barriers to entry
and further expansion.146
Another key factor is that “[c]omputers never get tired. Thus, algorithms are less
error-prone”.147 We do not believe that the latter necessarily flows from the former. Computers do crash,
the code could be flawed, and the data with which the machine learning algorithm is fed could
be biased and lacking in objectivity. Yet, the absence of the physical and emotional tiredness from which
even the keenest and most dedicated human journalists suffer makes the machine more efficient
in contrast to humans. While this is a factor that publishers typically tend to consider from the
perspective of users’ expectations rather than from the perspective of the creation of news
outputs, it must be highlighted that this accuracy and speed certainly render the use of
robojournalists more attractive to publishers and should be seen as a benefit in itself.
(n 135).
The rising potential of NLG has led to rising user expectations. Such expectations are related to
the quality of news,150 transparency,151 trustworthiness of robojournalists,152 the personalisation
of media coverage153 or “news on demand”154 among many others. The importance of these
values becomes even greater. This is essentially due to the fact that NLG algorithms are capable
of generating outputs that the readers/consumers identify with human messages.155
At the same time, there is a perceptible danger of “information overload”.156 It is more
than a hypothesis that robojournalism multiplies “the number of available stories well beyond
present limits”.157 There is, however, a significant risk that “[t]his expansion of stories necessarily
reduces the odds that any single story will be read”.158 Tied to this is the well-known danger of not
being able to determine the authenticity and trustworthiness of information, as well as the
possibility of falling into a filter bubble.159 If so, the negative externalities of NLG-based news
production can heavily outweigh the benefits of robojournalism.
and Kunert (n 137) 1252. Graefe (n 8) 36–42; As Fanta pointed out, however, “not all use of automation is
made transparent to customers and readers. Reuters, AP and NTB usually tag their robot stories,
However, this does not apply to single-line alerts, so-called snaps, which Reuters sends out. At least two
news agencies produce partial stories from templates without mentioning the robot as a co-author”, see
more at Fanta (n 109) 11.
152 Inge Graef, Raphael Gellert and Martin Husovec, ‘Towards a Holistic Regulatory Approach for the
European Data Economy: Why the Illusive Notion of Non-Personal Data Is Counterproductive to Data
Innovation’ [2018] TILEC Discussion Paper No. 2018-029 599
<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3256189>.
153 As Graefe says, to “[t]ell the same story in a different tone depending on the reader’s needs”, see
Graefe (n 8) 22.
154 ibid 27.
155 Christer Clerwall, ‘Enter the Robot Journalist - Users’ Perceptions of Automated Content’ [2014]
News publishers certainly see a possibility in NLG services, but financial considerations
play a frustrating role in this regard. Due to the massive amount of resources needed to set up a
functioning robojournalism newsroom (including the building of human-robot collaboration in the
creative phase), only bigger media corporations are, for now, in a position to take the first steps
towards switching to NLG solutions. At the same time, cost reduction seems to remain a daydream,
which is another reason for small players to think twice before investing in robojournalism.
It is not yet possible to measure whether the externalities of robojournalism will mainly be
positive or negative for users. As a general consequence, however, we can conclude that
massive news consumption, in conjunction with the generational shift towards tweets or TikTok
videos rather than in-depth writing,162 might contribute to a substantive devaluation of journalism.
Taking all these considerations into account - the long-lasting need for human involvement
in news creation; the limited switch to NLG by the bigger media corporations; and the hardly
predictable outcomes of robojournalism for users - we argue that there is no convincing evidence
in media and communications studies that would justify introducing copyright protection of automated news for
the benefit of artificial intelligence systems or their developers.
6. Conclusion/recommendations
This paper looked at the implications of robojournalism from the perspective of copyright law. It
studied the techniques of NLG as applied to journalism and established that there may be several
stages in the process where there is room for free and creative choices that would trigger a valid
copyright claim. Yet, this should not be taken at face value. Most of the journalistic fields in which
NLG is applied are rather dry, data-heavy and fact-based, such as sports, weather and
finance. Thus, it is questionable whether the journalistic output in those fields would actually
trigger a valid copyright claim even if it were written by a human author without the involvement
of any NLG system. Basic principles of copyright law dictate that what is subject to
protection is the expression of ideas, while facts belong to the public domain. Additionally, from
the perspective of business, developing an NLG system is particularly costly. This is backed up
by the empirical analysis underlying this paper, which showed that outsourcing the development of
NLG – due to the lack of resources and/or the lack of expertise – is the standard practice. Looking
into the practices in the editorial room, it appears that postproduction plays a significant role.
The three perspectives studied in this paper – technological, business as well as media
and communications – demonstrate that copyright law should not be extended to cover output
generated by NLG. The current copyright framework is rooted in the presence of a human author
and that should remain so. The absence of free and creative choices should not be artificially
compensated by considerations of potential market failures that might occur if copyright protection
does not arise for robojournalism output. It can be concluded that robojournalism fits well within the negative
spaces theory.163 Being the first to utilise generative techniques that are trustworthy,
transparent, accurate and free of discrimination brings sufficient benefits to companies resorting
to NLG techniques even in the absence of intellectual property protection, especially copyright protection.
163 Chris Sprigman and K Raustiala, ‘The Piracy Paradox: Innovation and Intellectual Property in Fashion
Design’ (2006) 39 Cardozo Arts & Entertainment Law Journal 535, 538, according to which certain
creative fields thrive regardless of the protection of intellectual property.