MACROSCOPES
AND DISTANT
READING
IMPLICATIONS FOR INFRASTRUCTURES TO
SUPPORT COMPUTATIONAL HUMANITIES
SCHOLARSHIP
Joël de Rosnay, 1979, The Macroscope
DISTANT READING
"Where distance is not an
obstacle, but a specific form of
knowledge: fewer elements,
hence a sharper sense of their
overall interconnection. Shapes,
relations, structures. Forms.
Models."
Moretti, Graphs, Maps, Trees: Abstract Models for Literary History.
HYPERMEDIA
FLATTENED
HIERARCHIES
EXPLORE
ORDER IN
AGGREGATES:
COLLECTIONS,
ARCHIVES, ETC.
THREE MODELS IN THE WILD
1. Data Dumps & Export Buttons
2. Sandboxes & Platforms
3. Analysis as a Service & Onsite Facilities
EXPORT
BUTTONS
& DATA DUMPS
THREE CHALLENGES
1. Rights: Can you broadly provide bulk access to works?
2. Scale: Can your infrastructure deliver bulk exports? Is the material too large for researchers to work with in their environments?
3. Skills: Do your users have the skills to work with data at the command line?
SANDBOXES & PLATFORMS
ANALYSIS
AS A SERVICE
& ONSITE
FACILITIES
UNPACKING IMPLICATIONS
1. Whenever possible, move toward providing bulk access to data.
2. Consider deriving intermediary or transformative data products, like n-grams.
3. If 1 and 2 are not possible, explore possibilities for analytic services.
Catspyjamasnz, The Network by @nancywhite, https://www.flickr.com/photos/catspyjamasnz/7169043832 CC-BY-NC-ND
onegoodbumblebee, Pez, https://www.flickr.com/photos/onegoodbumblebee/1413880294 CC-BY-NC-ND
Dstrelau, Toys, https://www.flickr.com/photos/dstrelau/5861814214 CC-BY

Macroscopes and Distant Reading: Implications for Infrastructures to Support Computational Humanities Scholarship
Trevor Owens

Editor's Notes

  • #2: As scholars become increasingly interested in approaching digital collections and digital objects as data for computational analysis, it becomes critical for libraries, archives and museums to rethink some of their paradigms for providing access to materials. Two related concepts from emergent methodologies in the digital humanities, macroscopes and distant reading, provide a point of entry for identifying the requirements for digital library platforms to support this kind of scholarship.
  • #3: Josh Greenberg of the Sloan Foundation described the concept of macroscopes this way: “Telescopes let you see far, microscopes let you see small, a macroscope lets you see big and complex.” That is, it’s about zooming out to visualize and explore relationships and patterns in aggregates and networks.
  • #4: Relatedly, literary scholar Franco Moretti famously coined the term “distant reading” to describe similar kinds of activities. In contrast to close reading, distant reading involves studying trends and patterns in things like graphs, maps and tree diagrams of features of texts. These two neologisms are part of a common trend: a push by scholars to use tools to explore and interpret patterns in wholes.
  • #5: By and large, the web has been great for the item and the object in cultural heritage organizations. In hypermedia, every resource is the first resource; every item’s URL is potentially the front door to everything else. As far as Google’s search algorithms are concerned, the page for each of the thousand individual items in a collection is as important as the page about the collection they form part of. This non-hierarchical and rhizomatic nature of the web, and of much of digital media more broadly, has been a bit disconcerting to librarians and archivists long committed to the coherence of collections and the importance of the context of fonds.
  • #6: To this end, the growing interest in macroscopes and distant reading offers a shift in approach to interpretation and analysis that could better respect the value that comes from aggregates. That is, the parts in the whole of a particular archive or collection and their relationship to each other. Importantly, this makes it all the more critical that the structure and completeness of any given archive or collection are front and center for analysis. That is, the pattern in any distant reading of an archive is as much a map of relationships in the content as it is a map of the processes by which records were created, appraised, selected, and organized.
  • #8: In the emerging literature on historians’ use of digital collections for data analysis, a common theme is that they try, as quickly as possible, to download data and take it away for use with their own tools on their own systems. Ian Milligan, who works with web archives, has referred to this as “Looking for the big red button.” To this end, whenever possible, the best first step for systems to support this kind of scholarly use is to provide easy ways for someone to export aggregate data. With this noted, for particularly large sets of data, or data restricted to certain kinds of use, it’s likely a good idea to provide smaller sample sets of data.
  • #9: With this said, it is important to note that data dumps are not the bulk access silver bullet that one might hope for, for three reasons: rights, scale, and the skills necessary to make use of them. In terms of rights, many collections, particularly of modern materials, come with rights restrictions that make it impossible to provide direct downloads of full content. In terms of scale, while it is possible to let someone download increasingly large sets of data, there are still aggregates of data that require significant resources to provide access to. Importantly, in many humanities cases this kind of analysis is still possible at scales that are modest in comparison to the requirements that scientists have for working with data sets. Lastly, there is a significant skills gap around working with “raw” data. That is, of the possible field of users of a data set in the humanities, only a rather small community has the necessary chops to work at the command line to iron out issues and process collection and object data into processable and computable information. With that said, there are a range of ongoing projects and initiatives focused on bootstrapping humanities scholars into the competencies required to do this kind of work. To this end, there are two other primary methods for working around these three limitations that I think are promising in a variety of ways.
  • #10: Sandboxes & Multi-Purpose and Purpose Built Platforms: A tool like Bookworm, the software that powers the Google Books N-Gram viewer, illustrates the potential of two related approaches, derivative data products and purpose built platforms, to enable scholars with limited command line chops to engage in analysis of a corpus. Set up against the derived set of n-grams, a derivative data product created from the Google Books corpus that records the frequency of sequences of words in that corpus, the viewer lets a user search for terms and compare their relative frequency over time. In this case, by producing a derivative data set, the n-grams, its creators sidestepped the rights issues that would have arisen had they provided raw full text access to the underlying works. To this end, the n-grams can themselves be downloaded and used with other tools. Along with that, the Bookworm platform provides a way for scholars who do not have any command line expertise to make use of the data. There are a range of tools and platforms that I would put in this category; for example, this is the kind of thing that the HathiTrust Research Center is working to support. With this noted, it is important to recognize the limitations of these kinds of purpose built tools. In cases where one does not provide the data product underlying the tool, there are clear limits to what scholars can do with the underlying data. Furthermore, the reason that the Google n-gram viewer works is that considerable work was put into the preparation of the underlying dataset. In contrast, many digital collections are a bit of a mess, so a researcher who wants to do sophisticated computational work with them would likely need to engage in this kind of data cleanup and processing to get materials into a form fit for analysis.
  • #11: Analysis as a Service and Onsite Research Facilities: Something like the National Software Reference Library, a project of the US National Institute of Standards and Technology, models a third way of supporting this kind of computational work. The NSRL provides an onsite research environment where researchers can come in to engage in computational analysis of the tens of millions of files from commercial software in the collection. Staff in this research environment can also run algorithms created by remote researchers and provide them with the outputs and results. In this case, for a collection held by an organization with particularly high concerns about limiting access to the corpus, creating an onsite research space and setting up staff to run the jobs that researchers around the world create provides a solution that ensures that rights are protected while computational scholarship is enabled. The significant limitations here are the resources required to stand up and staff such a research center and the fact that the process is much less immediate than manipulating a platform or interface on the web or directly downloading data.
  • #12: Whenever possible, move toward providing bulk access to data. That means, ideally, exploring ways to offer downloads of arbitrary aggregates of both metadata and digital objects. Given that some of these aggregates could be massive in size, it is likely best to explore ways to queue up large requests and to use things like BitTorrent to limit the resources they would consume. Provide persistent identifiers for those aggregates to enable dataset citation. In cases where one cannot provide access directly to works, consider deriving intermediary or transformative data products, like n-grams, and explore ways to create purpose built tools, like the Google n-gram viewer, that can be deployed to enable exploratory analysis of those intermediary products. In cases with particularly thorny rights situations, consider establishing in-house services whereby researchers can give you their algorithms and you run them against a corpus and provide the outputs back to them.
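
To make the idea of a derivative n-gram data product in notes #10 and #12 a little more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the pipeline behind the Google Books n-grams or Bookworm: the function names (`ngrams`, `counts_by_year`, `relative_frequency`), the simple regular-expression tokenizer, and the two toy records are all hypothetical. It counts word sequences per publication year, so that only counts, not full texts, would need to be released, and then compares a phrase's relative frequency across years.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def ngrams(text, n):
    """Return every run of n consecutive tokens in text, as strings."""
    tokens = TOKEN.findall(text.lower())
    # Simplification: sequences are allowed to cross sentence boundaries.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def counts_by_year(records, n=2):
    """Build {year: Counter of n-grams} from (year, full_text) pairs."""
    table = defaultdict(Counter)
    for year, text in records:
        table[year].update(ngrams(text, n))
    return table


def relative_frequency(table, phrase, year):
    """Share of a year's n-grams that match phrase (0.0 for an empty year)."""
    total = sum(table[year].values())
    return table[year][phrase] / total if total else 0.0


# Toy corpus, invented for illustration only.
records = [
    (1850, "The whale surfaced. The whale dived."),
    (1851, "The ship sailed past the whale."),
]
table = counts_by_year(records, n=2)

for year in sorted(table):
    print(year, relative_frequency(table, "the whale", year))
```

The per-year table is the derivative product: it could be offered for download in its own right, or placed behind a search interface for users who do not work at the command line.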