Working with Scholarly APIs
NISO Training Series
Course Moderator: Phill Jones, PhD
Co-founder, Digital and Technology, MoreBrains
1. What is this course about?
Application Programming Interface
“An application programming interface (API) is a connection between computers or between
computer programs. It is a type of software interface, offering a service to other pieces of
software. A document or standard that describes how to build or use such a connection or
interface is called an API specification. A computer system that meets this standard is said to
implement or expose an API. The term API may refer either to the specification or to the
implementation.”*
*Reddy, Martin. API Design for C++. United States, Elsevier Science, 2011.
APIs let computers and programs talk to each other
API
● They are software, specifications, and implementations that let systems communicate in predictable ways
● They are NOT user interfaces
● There are several types of API: operating system APIs, remote APIs, database APIs and web APIs
Web applications weren’t always built with APIs
Traditional way - multi page apps
[Diagram: Client sends an initial request, Server returns HTML; Client sends a form POST, Server returns HTML]
● When user inputs a request, the server generates a new page based on a template
● The ‘work’ is done on the server
● Less demanding for client computer
● Tightly coupled to a single server / data source

Using a web API - single page apps
[Diagram: Client sends an initial request, Server returns HTML; Client sends an API call, Server returns JSON {...}]
● When user inputs a request, a call is sent to an API on a server
● The response is in plain text
● Front-end JavaScript is used to interpret the results and display them on the page
● Can call on multiple APIs if needed
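The single page flow can be sketched in a few lines. The sketch below uses Python rather than front-end JavaScript, only to show the idea: the API answers in plain text, and client code parses that text before anything is displayed. The payload and its field names are hypothetical, not taken from a real service.

```python
import json

# Hypothetical plain-text body returned by an API call (not a real service).
response_text = '{"city": "Geneva", "temperature_c": 21}'


def render(body: str) -> str:
    """Parse the JSON text and build the string a page would display."""
    data = json.loads(body)  # plain text -> Python dict
    return f"{data['city']}: {data['temperature_c']} C"


print(render(response_text))
```

In a real single page app the same two steps (parse, then display) happen in the browser each time the user triggers a call.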
The power of the API-driven web
[Diagram labels: Server, API, Database, Internet of Things (21°C)]
● Reproducibly connect to a service
● Can be available for all to use, or restricted using
authentication
● Read and write data in well supported formats
● Makes life easier for a range of use cases
○ App development
○ Data interchange workflows
○ Data science
2. More about use cases
Example 1 - Workflow integrations
● Metadata about research objects moves around the
scholarly research cycle
○ Research grants
○ Outputs
○ People
○ Institutions
● It’s wasteful and error-prone for that information to be
manually rekeyed into…
○ Research management systems (CRIS)
○ Funder databases
○ Researchers’ personal profiles and CVs
● Persistent identifier (PID) agencies use APIs to enable
stakeholders to deposit and read metadata from central
registries
● Publishers, institutions and funders can save resources and
increase accuracy, timeliness and completeness of data by
integrating with APIs provided by those registries
https://doi.org/10.5281/zenodo.4991733
Example 2 - Building applications
[Diagram: an application combining data from multiple sources through APIs, from the same organisation and from different organisations]
● It’s relatively easy to access data and services if organisations make the API and documentation
public
● Many organisations use the same API for their own applications as they offer to third-party
developers, e.g. ORCID, Altmetric
List of Public APIs: https://github.com/public-apis/public-apis#open-data
Example 3 - Data science / bibliometrics
[Figure source: Digital Science; Draux, Hélène; Szomszor, Martin (2017): Topic Modelling of
Research in the Arts and Humanities. Digital Science. Report.
https://doi.org/10.6084/m9.figshare.5621260.v2]
● Data can be extracted from databases using APIs and analysed
using languages like Python and R
● Because there are strong standards around APIs, libraries
exist to make it easy to connect to APIs and get the data you
want
● There are increasing numbers of free APIs that you can use to
do your own research
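As a small illustration of the analysis step, the sketch below counts outputs per year with the Python standard library. The records are hypothetical and only shaped like items an API might return as JSON; no real database is queried.

```python
from collections import Counter

# Hypothetical records, shaped like items an API might return as JSON.
records = [
    {"title": "Paper A", "year": 2016, "type": "article"},
    {"title": "Paper B", "year": 2017, "type": "article"},
    {"title": "Dataset C", "year": 2017, "type": "dataset"},
]


def outputs_per_year(items):
    """Count how many records fall in each publication year."""
    return Counter(item["year"] for item in items)


counts = outputs_per_year(records)
for year in sorted(counts):
    print(year, counts[year])
```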
3. Some basic vocabulary
REpresentational State Transfer
● Not a protocol, REST is a set of design principles for web APIs that ‘sits on top of’ HTTP
● Intended to be lighter weight and easier than previous approaches
● Makes up over 70% of APIs
● Stateless
○ The server doesn’t remember previous requests, so if you have to authenticate, you do so with EVERY request
● Uses HTTP methods or ‘verbs’ to perform operations
Database operations    RESTful methods
Create                 POST
Read                   GET
Update                 PUT
Delete                 DELETE
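The mapping listed above, together with statelessness, can be sketched with Python's standard library. This is a minimal illustration that builds requests without sending them; the example.com URL and the token value are placeholders, not a real API.

```python
from urllib.request import Request

# Mapping from database-style operations to HTTP methods, as listed above.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation: str, url: str, token: str) -> Request:
    """Build (but do not send) a request for the given CRUD operation.

    Because REST is stateless, the Authorization header is attached to
    every request rather than remembered by the server.
    """
    return Request(
        url,
        method=CRUD_TO_HTTP[operation],
        headers={"Authorization": f"Bearer {token}"},
    )


# example.com and the token value are placeholders, not a real API.
for op in CRUD_TO_HTTP:
    req = build_request(op, "https://example.com/API/records/1", "my-token")
    print(op, "->", req.get_method(), req.get_header("Authorization"))
```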
Being RESTful
Photo by Lisa Fotios: https://www.pexels.com/photo/adult-dog-on-white-bed-2102839/
[Diagram: Request (GET/POST/PUT/DELETE) sent to the API; Response (usually JSON text) returned]
GET     Asks the API for data. The URL that you send will have a query built into it
POST    Used for adding a new ‘child’ resource (e.g. a new record)
PUT     Used for making modifications to an existing child resource
DELETE  What it says on the tin. Deletes a resource
Anatomy of an API GET request
Method (AKA verb): GET
Endpoint: https://example.com/API/
Parameters, e.g.
● Search terms
● Filters
● etc.
Headers, e.g.
● Authorization
● User-Agent
● Content-Type
curl -X GET "https://example.com/API/?<par1>=<val1>&<par2>=<val2>" -H "Authorization: Bearer <token>"
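The same anatomy can be assembled in Python. This is a sketch under assumptions: the endpoint is the placeholder from the cURL line, and the parameter names, token and User-Agent value are made up for illustration. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(endpoint: str, params: dict, token: str) -> Request:
    """Assemble method, endpoint, parameters and headers into one request."""
    url = f"{endpoint}?{urlencode(params)}"  # parameters go in the query string
    headers = {
        "Authorization": f"Bearer {token}",
        "User-Agent": "api-course-example/0.1",  # hypothetical client name
    }
    return Request(url, method="GET", headers=headers)


req = build_get("https://example.com/API/", {"query": "open access", "rows": 5}, "my-token")
print(req.get_method(), req.full_url)
```

Note that `urlencode` takes care of escaping, so a search term containing a space is safe to pass as-is.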
What you get back from a simple GET request
● Plain text response
● Usually JSON or XML
● Highly flexible tree-like structure
● May not easily go into a single table
{
"characters": [
{
"name": "Janet",
"age": 12,
"species": "human"
},
{
"name": "John",
"age": 11,
"species": "human"
},
{
"name": "Spot",
"age": 4,
"species": "dog",
"fur": [
{
"colour": "brown"
},
{
"pattern": "spotted"
}
]
}
]
}
name   age  species  furID
Janet  12   human
John   11   human
Spot   4    dog      00001

furID  colour  pattern
00001  brown   spotted
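The step from the tree-like JSON to the two tables can be sketched as follows. The sequential furID scheme is an assumption made for illustration; the slide does not say how the IDs are generated.

```python
import json

raw = """{"characters": [
  {"name": "Janet", "age": 12, "species": "human"},
  {"name": "John", "age": 11, "species": "human"},
  {"name": "Spot", "age": 4, "species": "dog",
   "fur": [{"colour": "brown"}, {"pattern": "spotted"}]}
]}"""


def flatten(text):
    """Split nested JSON into a characters table and a linked fur table."""
    characters, fur = [], []
    for item in json.loads(text)["characters"]:
        row = {"name": item["name"], "age": item["age"],
               "species": item["species"], "furID": ""}
        if "fur" in item:
            fur_id = f"{len(fur) + 1:05d}"  # sequential ID: an assumption
            merged = {}
            for part in item["fur"]:  # merge the list of one-key objects
                merged.update(part)
            fur.append({"furID": fur_id, **merged})
            row["furID"] = fur_id
        characters.append(row)
    return characters, fur


characters, fur = flatten(raw)
print(characters)
print(fur)
```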
Status codes
In a world with few guarantees, for any API request you will ALWAYS get a status code…
…it just might not be the one you want
1xx  Informational  Used in more advanced multi-post workflows
2xx  Success        Generally good news - something worked
3xx  Redirection    Usually the URL has changed, use the new one
4xx  Client Error   You probably made a mistake
5xx  Server Error   API server is probably having trouble
Example common codes:
● 200 - OK
● 201 - Created
● 202 - Accepted
● 400 - Bad Request (an error in the request)
● 403 - Forbidden (e.g. a bad authorisation key)
● 404 - Not Found (wrong URL)
● 500 - Internal Server Error (a problem with the server)
Complete list: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
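A short sketch of how client code might interpret a status code, using the five classes above and the reason phrases from Python's standard `http.HTTPStatus` enum. The `describe` helper is a name introduced here for illustration.

```python
from http import HTTPStatus

# The five status classes, keyed by the first digit of the code.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase and its status class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (200, 201, 202, 400, 403, 404, 500):
    print(describe(code))
```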
4. Tools used in this course
For each lecture, you may or may not need…
1. Command line to use cURL
2. Postman (either the client app or online)
3. Google Colaboratory (online)
a. …or Jupyter Notebook as an alternative
cURL on the command line
1. On a Mac, open Terminal. On Windows, open Command Prompt (CMD)
2. Create a temporary folder in your home directory
> mkdir curl-temp
> cd curl-temp
> curl -X GET "http://info.cern.ch/hypertext/WWW/TheProject.html"
> curl -X GET "https://api.crossref.org/works/10.5555%2F12345678"
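For those working in Colab or a notebook instead of the command line, the Crossref request above can be sketched in Python with the standard library. The function names are introduced here for illustration; `fetch_work` needs network access and is therefore defined but not called.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def works_url(doi: str) -> str:
    """Build the Crossref works URL, percent-encoding the slash in the DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="")


def fetch_work(doi: str) -> dict:
    """Send the GET request and parse the JSON body (needs network access)."""
    with urlopen(Request(works_url(doi), method="GET"), timeout=30) as resp:
        return json.load(resp)


print(works_url("10.5555/12345678"))
# Calling fetch_work("10.5555/12345678") would perform the actual request.
```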
Postman
https://www.postman.com/niso-api-course-lecturers/workspace/2022-niso-api-course-collection/
Google Colaboratory
https://drive.google.com/drive/folders/1zYFefI4uvLg3oq0kD98_jBlrlHhdL6mx
Google Colaboratory - Requirement
You need a Google (Gmail) account
Schedule
Date          Speaker                          Topic                  Role
28 Apr, 2022  Dr Phill Jones                   Introduction           Co-Founder, MoreBrains Cooperative
5 May, 2022   Jordan Holt                      ORCID API              Member Support Technical Specialist, ORCID
12 May, 2022  Patricia Feeney                  Crossref API           Head of Metadata, Crossref
19 May, 2022  Dr Hélène Draux                  Dimensions API         Data Scientist, Digital Science
26 May, 2022  Pavel Kasyanov and Kadri Nedbiu  Web of Science         Bibliometrics expert and Product manager, Clarivate
2 Jun, 2022   Dr Donny Winston                 OpenAlex               President, Polyneme LLC
9 Jun, 2022   Dr Martin Szomszor               APIs for Data Science  Founder and Chief Scientist, Electric Data Solutions
16 Jun, 2022  Jakob Fix                        OECD Data API          Digital Product Engineering Manager
Questions
Open Discussion

Jones "Working with Scholarly APIs: A NISO Training Series, Session One: Foundational Specifics"
