fotolia, Gerhard Seybert
Plagiarism: Definition and Detection
Prof. Dr. Debora Weber-Wulff
HTW Berlin
Science – Between Research Ethics and Plagiarism
November 12, 2020 // GRACE@Zoom
Debora Weber-Wulff
• Professor for Media and Computing, HTW Berlin
• PhD in theoretical computer science
• Plagiarism researcher since 2002; has tested
plagiarism detection software since 2004
(newest test just published)
Software cannot detect plagiarism!
• Computing & Ethics working group and fellow
of the Gesellschaft für Informatik
• Active with the German plagiarism documentation
wikis GuttenPlag Wiki and VroniPlag Wiki since
2011
Definition: What is Plagiarism?
Charles Robinson, 2008, http://charles.robinsontwins.org/photos/2008/twinsdays/content/IMGP1792_large.html
Used by permission.
Definition: What is Plagiarism?
• A legal definition does not exist; this is an
educational and integrity problem.
• Some authors speak of theft of text, others about
fraud.
• A common misconception sees plagiarism only as
an issue of copyright.
• It is not about giving proper references for facts.
• There are quite a number of attempts to define
plagiarism in English, but most are more or less:
"We know it when we see it".¹
¹ 378 U.S. 184 at 197 (1964), Justice Stewart concurring,
https://supreme.justia.com/cases/federal/us/378/184/case.html#197
What is Plagiarism?
Definition Adapted from Teddi Fishman
(former Director of the International Center for Academic Integrity, ICAI)
• Plagiarism occurs when someone
• uses words, ideas, or work products
• attributable to another identifiable person or source
• without properly attributing the work to the source
from which it was obtained
• in a situation in which there is a legitimate
expectation of original authorship.
Fishman, T. (2009). “We know it when we see it” is not good enough: toward a standard definition of plagiarism that transcends theft, fraud, and copyright. In: Proceedings of the Fourth
Asia Pacific Conference on Educational Integrity (4APCEI), 28–30 September 2009, University of Wollongong, NSW, Australia.
What About Intent?
Portrait of Tommaso Inghirami, Wikimedia Commons
Types of Academic Misconduct
• Plagiarism
  • Copy & Paste
  • Disguised Plagiarism
  • Shake & Paste
  • Translation
  • Pawn Sacrifice
• Other types
  • Image Manipulation
  • Data Fabrication
  • Duplicate Publication
  • …
DerHexer, [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
Copy & Paste
Siena, 2012, PhD, https://vroniplag.wikia.org/de/wiki/Mmu/Fragment_085_01
Copy & Paste with Wikipedia links
Wikipedia links copied from source!
LMU München, 2010, Medicine, http://de.vroniplag.wikia.com/wiki/Pak/Fragment_024_01
Disguised or Mosaic Plagiarism
• Changing word order in phrases
“Iceland and Greenland” → “Greenland and Iceland”
• Exchanging words with synonyms
“The hypothesis that gained wide acceptance is that the spread of SD
probably involves the release and diffusion of the chemical mediators” →
“The hypothesis that gained wide acceptance is that the propagation of SD
probably involves the release of different chemical mediators”
http://de.vroniplag.wikia.com/wiki/Amh/Fragment_007_14
• Adding or removing words
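A minimal sketch of why such small edits matter for text matching, assuming a simple word n-gram comparison. This is an illustration only, not how any particular detection system works; the function names and the choice of three-word n-grams are assumptions made for the example. It compares the two sentences quoted above.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fraction(source, suspect, n=3):
    """Fraction of the suspect's n-grams that also occur in the source."""
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)

# The two sentences from the synonym example on this slide.
source = ("The hypothesis that gained wide acceptance is that the spread of SD "
          "probably involves the release and diffusion of the chemical mediators")
suspect = ("The hypothesis that gained wide acceptance is that the propagation of SD "
           "probably involves the release of different chemical mediators")

# An exact string comparison fails even though most of the wording is shared.
print(source == suspect)
print(round(shared_fraction(source, suspect), 2))

# Swapping the word order of a three-word phrase removes the only 3-gram match.
print(shared_fraction("Iceland and Greenland", "Greenland and Iceland"))
```

Each exchanged word removes every n-gram that contained it, so a handful of substitutions can lower the overlap considerably while the passage remains recognisably the same to a human reader.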
Disguised or Mosaic Plagiarism
WU Wien, 2001, PhD, https://vroniplag.wikia.org/de/wiki/Svr/Fragment_061_01
Disguised or Mosaic Plagiarism
TU München, med., http://de.vroniplag.wikia.com/wiki/Vm/Fragment_036_01
Mosaic Plagiarism in Scientific,
“Peer-Reviewed” Publications
Int J Clin Exp Med. 2015; 8(5): 6773–6783.
Int J Clin Exp Med. 2015; 8(5): 6957–6966.
Shake & Paste
• Wikipedia: Wer wird Millionär, 2008
• Bild, 14 June 2007
• Der Tagesspiegel, 14 June 2007
• Die Welt, 14 June 2007
Osnabrück, 2009, Cultural Studies, http://de.vroniplag.wikia.com/wiki/Gma/070
Translation
Heidelberg, 1997, Medicine, http://de.vroniplag.wikia.com/wiki/Awb/Fragment_022_01
Translation
MH Hannover, med.habil., 2015, http://de.vroniplag.wikia.com/wiki/Mjm/Fragment_036_55
Pawn Sacrifice
Münster, 2010, Medicine http://de.vroniplag.wikia.com/wiki/Slo/Fragment_008_09
Image Manipulation
TU München, med., http://de.vroniplag.wikia.com/wiki/Vm/Fragment_028_00
Example of a Manipulated Blot from an Online
Journal
Plagiarism AND Data Fabrication
Charité, 2010, Medicine, http://de.vroniplag.wikia.com/wiki/Ali/Fragment_024_01
Different values, same percentages!
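One way to check a table for this pattern is to recompute each percentage from the counts it is supposed to summarise. The sketch below is an illustration under stated assumptions: the rows are hypothetical numbers invented for the example (they are not taken from the thesis shown), and the function name and tolerance are likewise assumptions.

```python
def inconsistent_rows(rows, tolerance=0.5):
    """Return labels of rows whose reported percentage does not match count/total."""
    flagged = []
    for label, count, total, reported_pct in rows:
        recomputed = 100.0 * count / total
        if abs(recomputed - reported_pct) > tolerance:
            flagged.append(label)
    return flagged

# Hypothetical rows: (label, count, group size, reported percentage).
rows = [
    ("group A", 12, 40, 30.0),  # 12 of 40 is 30.0 %, consistent
    ("group B", 9, 40, 30.0),   # 9 of 40 is 22.5 %, but 30.0 % is reported
]

print(inconsistent_rows(rows))
```

A flagged row is only a reason to look more closely; rounding conventions or a different denominator can also explain a mismatch.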
Detecting Plagiarism
• Teachers, researchers, and administrators want a
simple solution
Photo: Flickr cc-by-nc-sa: xtrarant, 2008
Art Installation: Jamie Pawlus, Indianapolis, Indiana, 2003
There are many companies willing to oblige
The sales pitches promise the moon
• "Advanced online plagiarism detection" (CatchItFirst)
• "Originality check" (Turnitin)
• "Student work is instantly checked for potential
plagiarism using pattern recognition
algorithms." (Turnitin)
• "Easy, quick and accurate" (Ephorus)
• "Based on the latest research in computer
linguistics" (PlagScan)
• "Verification of originality" (StrikePlagiarism)
• "Plagiarism prevention that simply works" (Urkund)
Luc Viatour, CC BY-SA 3.0, via Wikimedia Commons
European Network for Academic Integrity:
Testing of support tools for plagiarism detection
https://link.springer.com/article/10.1186/s41239-020-00192-4
Test of 15 web-based text-matching systems
in eight languages
• Czech
• English
• German
• Italian
• Latvian
• Slovak
• Spanish
• Turkish
Coverage vs. Usability
→ Ouriginal
= Turnitin
Such software is mostly snake oil
• False positives
• False negatives
• The companies sometimes violate
students’ copyrights.
• Reports are often very hard to
interpret.
• The numbers reported are generally
meaningless.
Snake Oil, Macau 2015,
Debora Weber-Wulff
CC-BY-SA
So do I have to look myself?
Sure! But you have to read the text
• The text is nicely written but …
• … it is stylistically above the level of the writer.
• Strange formatting
• Typos
• Abrupt style changes
• Odd words
• Embedded links
Flickr, cc-by-nc-nd, t_buchtele, 2009
Not just dissertations: published articles may also
contain hidden links pointing to a source
Simorangkir, D. N. (2010). The Feminization of Public Relations in Indonesia: Role Expectations and Prejudices.
International Journal of Arts and Sciences, 3(15), 71-89.
Searching with Google & Co
• Three to five nouns
• Enclose a phrase in “...”
• The typo
• Phrases from an often referenced source
• Don’t only look at the first page
• Set a time limit
Flickr, cc-by-nc-nd, Athena1970, 2008
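The first two tips can be sketched as a small query builder. This is a hedged illustration: the helper names are hypothetical, combining the nouns and the quoted phrase in one query is an assumption (the slide lists them as separate tips), and the example nouns and phrase are simply taken from the sentence quoted earlier in this talk.

```python
from urllib.parse import quote_plus

def build_query(nouns, phrase=None):
    """Join three to five nouns and, optionally, one exact phrase in double quotes."""
    if not 3 <= len(nouns) <= 5:
        raise ValueError("use three to five nouns")
    parts = list(nouns)
    if phrase:
        parts.append(f'"{phrase}"')
    return " ".join(parts)

def search_url(query, base="https://www.google.com/search?q="):
    """URL-encode the query and append it to a search engine's query URL."""
    return base + quote_plus(query)

query = build_query(["hypothesis", "release", "mediators"],
                    phrase="gained wide acceptance")
print(query)
print(search_url(query))
```

The length check mirrors the "three to five" advice: longer queries tend to exclude the source as soon as one word has been changed.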
Less is more
• Three to five words suffice
Really!
Additional Tools
• Google Scholar
• Google Books
• Scanner, Optical Character Recognition (OCR)
• PicaPica
• Wikiblame
• TinEye & Google image search
• Text Compare
• Similarity Texter
https://people.f4.htw-berlin.de/~weberwu/simtexter/app.html
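Once a candidate source has been found, the two texts still have to be compared side by side. The following is a minimal sketch of that idea using Python's standard difflib module. It is not how any of the tools listed above is implemented; the function name, the three-word threshold, and the sample strings are assumptions for illustration.

```python
from difflib import SequenceMatcher

def common_runs(text_a, text_b, min_words=3):
    """Return runs of at least min_words consecutive words found in both texts."""
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks()
            if m.size >= min_words]

# Hypothetical sample texts, invented for the example.
text_a = "one two three four five six"
text_b = "zero one two three nine five six"

print(common_runs(text_a, text_b))
print(common_runs(text_a, text_b, min_words=2))
```

The output is a list of shared passages for a person to read and judge, not a verdict.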
Don’t sweep plagiarism under the carpet!
By Banksy
Photo by Abhishek.ujoshi (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via
Wikimedia Commons
Thank you! Hvala vam!
Questions? Vprašanja?
• My homepage:
people.f4.htw-berlin.de/~weberwu/
• My blog:
copy-shake-paste.blogspot.com
• VroniPlag Wiki:
vroniplag.wikia.org/de/wiki/Home
• TeSToP report:
https://link.springer.com/article/10.1186/s41239-020-00192-4
(c) 2018 HTW Berlin / Nikolas Fahlbusch