Linked Books
Giovanni Colavizza EPFL
Motivation: a question
How do you find sources for humanities research?
How do you find literature for research in the “hard” sciences?
Motivation: the differences between humanities and “hard” sciences
• Primary and secondary sources
• Citation history (e.g. Google Scholar)
• Citation semantics
Motivation: primary and secondary sources
Approx. half of the citations in humanities are to
primary sources [Wiberley (2009)].
Their use has hardly ever been studied with
citation analytic methods.
“For scholarship in the humanities there are three kinds of literature: primary literature that contains the evidence on which humanists base their scholarship, secondary literature in which humanists write up their scholarship, and access services that describe and index the publications written by humanists.” (Wiberley, 2009)
Motivation: citation history
Citation data are lacking [Sula and Miller (2014)]. Why?
• Sparse and local sub-fields
• Nationality (language and schools)
• Proliferation of editorial practices
Motivation: citation semantics
• Humanists are less prone to credit each other than scientists [Heinzkill, 1980; Swales, 1990; Hellqvist, 2010].
• They are less prone to work together: an average of 1.06 authors per publication in a study by Linmans (2010).
• They use citations in a great variety of ways and with many meanings: agreement, disagreement, full association, minor reference, etc. [Harwood (2008), Cano (1989)].
Examples:
Strongly negative: “Professor Epstein’s comment presents no new findings and ignores the theoretical issues I raise.”, citing Epstein (2008). From Ogilvie (2008).
Association: “it is not enough to scale down the structural aspects of the economic decline, which for Venice was in any case only ‘relative’, ..”, citing Rapp (1979). From Trivellato (2000), translated from the Italian.
Motivation: our answer
Citation analysis for the humanities is an almost non-existent field, yet its results could be very rich.
We cannot simply apply traditional citation analysis methods to humanities data: we need new questions and new methods.
The project: goals
• Digitise all the historiography on Venice that we can (for now, limited to history).
• Extract all citations and populate a database.
• Analyse the history of the history of Venice and develop a framework for citation analysis in the humanities.
• Publish an open access search engine for scholars and the general public.
The project: goals
“Side effects”: we have the full text of most publications on Venice, and we are also digitising documents at the Archive. This makes possible:
• Indexes of keywords (e.g. named entities)
• Direct links between publications and sources
• Topic modelling and fine-grained classification of publications (currently, at most, Dewey subjects)
• An enhanced library catalogue
The project: partners and materials
Partnership with the Ca’ Foscari Library System (humanities library) and discussions with major Venetian libraries.
Digitisation goal: digitise all secondary literature on Venice from the last 200 years (monographs, journals, editions, etc.). Current estimate: circa 5000 items (there are many more). Digitisation is ongoing (1513 items done as of last Friday).
Methods: overview
Methods I: data extraction
The steps:
• OCR
• Citation detection
• Citation parsing
• Model and populate the database (ontologies for citations)
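The last step above, populating the database, can be pictured with a minimal sketch. It assumes each extracted citation becomes a small set of subject-predicate-object triples. The namespace, the property names (`cites`, `sourceType`) and the identifiers are hypothetical, invented for this illustration, and are not the project's actual ontology.

```python
# Hypothetical namespace and property names, chosen for illustration only.
EX = "http://example.org/linkedbooks/"


def citation_triples(citing_id, cited_id, cited_kind):
    """Return (subject, predicate, object) triples for one extracted citation."""
    if cited_kind not in {"primary", "secondary"}:
        raise ValueError("cited_kind must be 'primary' or 'secondary'")
    citing = EX + "publication/" + citing_id
    cited = EX + cited_kind + "/" + cited_id
    return [
        (citing, EX + "cites", cited),
        (cited, EX + "sourceType", cited_kind),
    ]


def to_ntriples(triples):
    """Serialise triples as N-Triples lines; objects that are not IRIs become literals."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Identifiers below are made up; they only echo the Trivellato/Rapp example.
triples = citation_triples("trivellato2000", "rapp1979", "secondary")
print(to_ntriples(triples))
```

Keeping the source type on the cited item is one way to keep primary and secondary sources distinguishable in later queries; a real ontology for citations would be considerably richer.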
Basic tools:
• Active annotation for supervised learning (minimise the training data to annotate)
• Conditional Random Fields for parsing
• RDF and triple stores as the database
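To make the parsing tool above more concrete: a Conditional Random Field labels each token of a reference (author, title, year, and so on) from features of the token and its neighbours. The sketch below shows only the feature-extraction side, in plain Python. The feature names, the tokeniser and the sample reference are assumptions made for illustration, not the project's actual configuration, and training the model would be left to a CRF library.

```python
import re


def tokenize(reference):
    """Split a reference string into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Features for token i, as a dict of the kind CRF toolkits commonly accept."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalised": tok[:1].isupper(),
        # Four-digit number between 1500 and 2099, a rough guess at a year.
        "is_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "has_digit": any(c.isdigit() for c in tok),
        "is_punct": not any(c.isalnum() for c in tok),
        # Context features let the model use the neighbouring tokens.
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


# An invented reference, used only to exercise the functions.
tokens = tokenize("Rossi, M., Storia di Venezia, Venezia, 1985, p. 12.")
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Each feature dict would be paired with a label during annotation, which is where active annotation helps: only the references the model is least sure about need to be labelled by hand.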
Methods II: citation analysis, networks
Network-based models. Recall the distinction between primary and secondary sources: how many graphs can we build?
Bibliographic coupling and co-citation
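As a minimal sketch of the two measures just named: bibliographic coupling links two citing publications by the number of references they share, while co-citation links two cited items by the number of publications that cite both. The toy data and the function names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy data, invented for illustration: citing publication -> set of cited items.
citations = {
    "A": {"x", "y", "z"},
    "B": {"y", "z"},
    "C": {"z", "w"},
}


def bibliographic_coupling(citations):
    """Weight for each pair of citing publications = number of shared references."""
    weights = {}
    for p, q in combinations(sorted(citations), 2):
        shared = len(citations[p] & citations[q])
        if shared:
            weights[(p, q)] = shared
    return weights


def co_citation(citations):
    """Weight for each pair of cited items = number of publications citing both."""
    weights = defaultdict(int)
    for refs in citations.values():
        for a, b in combinations(sorted(refs), 2):
            weights[(a, b)] += 1
    return dict(weights)


coupling = bibliographic_coupling(citations)
cocited = co_citation(citations)
```

Restricting the cited sets to primary sources only, or to secondary sources only, yields a different graph from the same data, which is one way to read the question of how many graphs can be built.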
Methods II: citation analysis
Network-based models:
• Global analysis
• Local analysis (communities and nodes)
• Temporal analysis
• Publication classification and analysis
Big questions:
• Key works, authors, sources
• Disciplinary segmentations
• Measure intellectual influence and schools of thought
• Map scholarly debates
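Two of the simplest analyses behind these questions can be sketched as follows: ranking the most cited items separately for primary and secondary sources (a first approximation of "key works" and "key sources"), and counting citations per decade as a basic temporal view. The record layout and the data are invented for illustration, and raw citation counts are only a crude proxy for influence.

```python
from collections import Counter

# Toy citation records, invented for illustration:
# (citing publication, cited item, kind of cited item, year of the citing publication).
records = [
    ("A", "archive_doc_1", "primary", 1950),
    ("B", "archive_doc_1", "primary", 1972),
    ("B", "A", "secondary", 1972),
    ("C", "A", "secondary", 1990),
    ("C", "B", "secondary", 1990),
    ("C", "archive_doc_2", "primary", 1990),
]


def most_cited(records, kind):
    """Rank cited items of the given kind by the number of citations received."""
    counts = Counter(cited for _, cited, k, _ in records if k == kind)
    return counts.most_common()


def citations_per_decade(records):
    """Count citations by the decade of the citing publication."""
    counts = Counter((year // 10) * 10 for _, _, _, year in records)
    return dict(sorted(counts.items()))


key_sources = most_cited(records, "primary")
key_works = most_cited(records, "secondary")
by_decade = citations_per_decade(records)
```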
Linked Books
Thank you
Giovanni Colavizza EPFL

Linked Books - DH Venice Fall School 2014
