Manual
Mathew W. McLean
Texas A&M University
Abstract
This work introduces the R package RefManageR, which provides tools for importing
and working with bibliographic references. It extends the bibentry class in R in a number
of useful ways, including providing R with previously unavailable support for BIBLATEX.
BIBLATEX provides a superset of the functionality of BIBTEX, including full Unicode support,
no memory limitations, additional fields and entry types, and more sophisticated
sorting of references. RefManageR provides functions for citing and generating a bibliography
with hyperlinks for documents prepared with RMarkdown or RHTML. Existing .bib
files can be read into R and converted from BIBTEX to BIBLATEX and vice versa. References
can also be imported via queries to NCBI’s Entrez, Zotero libraries, Google Scholar, and
CrossRef. Additionally, references can be created by reading PDFs stored on the user’s
machine with the help of Poppler. Entries stored in the reference manager can be easily
searched by any field, by date ranges, and by various formats for name lists (author by
last names, translator by full names, etc.). Entries can also be updated, combined, sorted,
printed in a number of styles, and exported.
1. Introduction
Creating, managing, and processing references can often be a hassle. There are a number of
reasons one may want or need to work with bibliographic data in R (R Core Team 2013), for
example, for bibliometrics. The person and bibentry classes available in the base-priority
utils package since R 2.14.0 provide very useful functionality for working with names and
bibliographic information, respectively. An introduction to these classes is available in Hornik,
Murdoch, and Zeileis (2012). In this paper, I introduce the RefManageR package, which uses
these classes as building blocks to greatly simplify working with bibliographies in R.
The bibentry class is designed to work with references in BIBTEX format (Patashnik 1988).
2 Straightforward Bibliography Managament in R Using the RefManageR Package
RefManageR provides the BibEntry class, which also works with BIBTEX references but
additionally supports BIBLATEX formatting.
The BIBTEX fields stored in a bibentry object can be easily accessed using the `$` operator,
but no functions exist for conveniently conducting complex searches. Such searches are
provided by the RefManageR package through the `[` operator. With this operator one
may search a collection of references by any field or group of fields. BIBLATEX fields for lists
of names, such as ‘author’ and ‘editor’, can be searched by family name only, full name, or
full name with initials. Additionally, dates may be specified by ranges and are compared
using the lubridate package (Grolemund and Wickham 2011). Entries may also be indexed
by key, created in several different ways using functions BibEntry and as.BibEntry, and
updated using the `[<-` operator. The bibentry class provides a method for the c generic for
concatenating entries; RefManageR retains this feature while also providing a merge method
to remove potential duplicate entries when combining entries from various sources.
Entries may be imported into R in a number of ways. A function is provided for reading in
.bib files in BIBLATEX and BIBTEX format. For machines with Poppler (http://poppler.freedesktop.org)
installed, bibliographic metadata can be read from PDFs stored on the
user’s machine to generate a citation for each PDF. The package also contributes interfaces to
the CrossRef, Zotero, and NCBI’s Entrez APIs to search and import references from these
resources, using the RCurl package (Lang 2013a) for the HTTP requests. References can
additionally be obtained from a researcher’s Google Scholar profile.
The package is equipped with additional printing formats and several bibliography and citation
styles. All the bibliography sorting options available in BIBLATEX are available in RefManageR.
An interface similar to the options function is provided for conveniently setting optional
arguments of the most commonly used functions. In case it is necessary to convert between formats, for
example when submitting to a journal that does not support BIBLATEX, a function is provided
for converting a bibliography with BIBLATEX formatting back to BIBTEX.
To our knowledge, RefManageR is the first package to provide support for including citations
and bibliographies with hyperlinks in [R]HTML and [R]Markdown documents. Links can
point from each citation to its bibliography entry and vice versa, and hyperlinks are also
automatically created for values in the BIBLATEX fields ‘url’, ‘doi’, and ‘eprint’.
The rest of the document proceeds as follows: In Section 2 I show how to create bibliography
entries in R, import them from local files, and discuss setting package options; Section 3
discusses importing references from the web; in Section 4 I discuss printing, sorting, and
exporting references; in Section 5 I show how to search and update BibEntry objects; Section 6
introduces using RefManageR to cite references and print a bibliography of only cited references;
lastly, Section 7 concludes.
other arguments in field = value form to the "..." argument. Though the ‘year’ field is
still supported in BIBLATEX to allow backwards compatibility with BIBTEX, the field ‘date’ is
preferred and allows for a number of different formats for the date, which will be discussed
later. The field ‘journaltitle’ is preferred for specifying journals, though ‘journal’ remains
supported. Below I create and print an entry of type ‘Article’ with fields ‘author’, ‘title’,
‘date’, ‘journaltitle’, ‘volume’, and ‘number’. The print function for BibEntry objects offers a
number of features which will be discussed in detail later. Its default settings are chosen to
mimic the defaults of BIBLATEX. The toBiblatex function can be used to display the entry in
its .bib file format.
toBiblatex(bib)
## @Article{barry1996,
## date = {1996-08},
## title = {A Diagnostic to Assess the Fit of a Variogram to Spatial Data},
## author = {Ronald Barry},
## journaltitle = {Journal of Statistical Software},
## volume = {1},
## number = {1},
## }
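The call that creates `bib` does not appear in the listing above. The following is a minimal sketch that should produce an equivalent entry, with the field values copied from the printed .bib output; passing every field value as a string is a choice made here, not a requirement stated in the text.

```r
library(RefManageR)

# Fields other than bibtype and key are passed in field = value form via "...";
# 'date' and 'journaltitle' are the fields BibLaTeX prefers over 'year'/'journal'
bib <- BibEntry(bibtype = "Article", key = "barry1996", date = "1996-08",
                title = "A Diagnostic to Assess the Fit of a Variogram to Spatial Data",
                author = "Ronald Barry",
                journaltitle = "Journal of Statistical Software",
                volume = "1", number = "1")

print(bib)       # formatted with the default, BibLaTeX-like settings
toBiblatex(bib)  # the entry in .bib file format
```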
BIBLATEX offers a great deal of additional functionality compared to BIBTEX. For the
full details, see the 253-page user manual (Lehman, Kime, Boruvka, and Wright
2013). BIBLATEX expands the number of automatically recognized entry types and fields offered
by BIBTEX, allowing for much more detailed bibliographic entries, while still maintaining
compatibility with BIBTEX. For example, to handle an arXiv eprint in BIBTEX, one needs to
use or create a special BIBTEX style, or perhaps use the ‘note’ and ‘year’ fields in unintended
ways. The entry below is an attempt to cite a submitted manuscript by the first
author using the bibentry function.
Though arXiv provides suggestions for creating BIBTEX entries for its papers
(http://arxiv.org/hypertex/bibstyles/), there is a frustrating lack of consistency in how people
choose to create BIBTEX entries for their arXiv papers. In BIBLATEX, there is greatly expanded
support for electronic publications with fields for eprint, eprinttype, eprintclass, urldate, and
pubstate. One can cite the same article in BIBLATEX without needing a special .bst file or
the note field using
@misc{mclean2013bayesian,
author = {M. W. McLean and F. Scheipl and G. Hooker
and S. Greven and D. Ruppert},
title = {Bayesian Functional Generalized Additive Models
with Sparsely Observed Covariates},
urldate = {2013-10-06},
date = {2013},
eprinttype = {arxiv},
eprintclass = {stat.ME},
eprint = {1305.3585},
pubstate = {submitted},
}
In BIBLATEX the ‘eprint’ identifier will automatically become a hyperlink to the paper on
arXiv.
The bibentry class supports BIBTEX-style cross-referencing, while the BibEntry class supports
the more sophisticated BIBLATEX-style cross-referencing. Cross-references are handled specially
when indexing and searching BibEntry objects, as discussed in Section 5.1. In a similar vein to
cross-referencing, BIBLATEX supports an entry type “XData”, which is never printed but may be
used to store fields that are shared by several entries.
Entries can specify a field ‘xdata’ containing a comma-separated list of keys belonging to
XData entries that the entry inherits from. The following example demonstrates its use for
User Manual for R package RefManageR 5
online references available on arXiv, and uses the c function for combining BibEntry objects.
## @XData{statME,
## eprinttype = {arxiv},
## eprintclass = {stat.ME},
## }
##
## @XData{online2013,
## year = {2013},
## urldate = {2013-12-20},
## }
## XData: online2013
##
## XData: statME
##
## [1] M. McLean, G. Hooker and D. Ruppert. _Restricted Likelihood
## Ratio Tests for Scalar-on-Function Regression_. 2013. arXiv:
## 1310.5811 [stat.ME]. <URL: http://arxiv.org/abs/1310.5811>
## (visited on 12/20/2013).
##
## [2] M. McLean, F. Scheipl, G. Hooker, et al. _Bayesian Functional
## Generalized Additive Models for Sparsely Observed Covariates_.
## 2013. arXiv: 1305.3585 [stat.ME]. <URL:
## http://arxiv.org/abs/1305.3585> (visited on 12/20/2013).
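The code that builds the object printed above is not shown. A sketch along the following lines should create comparable XData and ‘Online’ entries. Only the XData keys appear in the output, so the keys of the two ‘Online’ entries are hypothetical, and check.entries is switched off on the assumption that the children would otherwise fail the required-field check before inheriting ‘year’ from their XData parent.

```r
library(RefManageR)
# Assumption: the children would fail the required-field check before
# inheriting 'year' from an XData entry, so the check is switched off
BibOptions(check.entries = FALSE)

# XData entries store shared fields and are never printed themselves
statME <- BibEntry(bibtype = "XData", key = "statME",
                   eprinttype = "arxiv", eprintclass = "stat.ME")
online2013 <- BibEntry(bibtype = "XData", key = "online2013",
                       year = "2013", urldate = "2013-12-20")

# Each child names the XData entries it inherits from in 'xdata'
# (the two child keys are hypothetical)
rlrt <- BibEntry(bibtype = "Online", key = "mclean2013rlrt",
                 author = "M. McLean and G. Hooker and D. Ruppert",
                 title = "Restricted Likelihood Ratio Tests for Scalar-on-Function Regression",
                 eprint = "1310.5811", url = "http://arxiv.org/abs/1310.5811",
                 xdata = "online2013, statME")
fgam <- BibEntry(bibtype = "Online", key = "mclean2013bayesian",
                 author = paste("M. W. McLean and F. Scheipl and G. Hooker",
                                "and S. Greven and D. Ruppert"),
                 title = paste("Bayesian Functional Generalized Additive Models",
                               "for Sparsely Observed Covariates"),
                 eprint = "1305.3585", url = "http://arxiv.org/abs/1305.3585",
                 xdata = "online2013, statME")

# The c method concatenates BibEntry objects
arxiv.bib <- c(statME, online2013, rlrt, fgam)
print(arxiv.bib)
BibOptions(restore.defaults = TRUE)
```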
The cross-referencing system in BIBLATEX and RefManageR is more sophisticated than the
symmetric field mapping system used in BIBTEX, resulting in less clutter and duplication
of fields. In BIBLATEX, the “InBook” entry type is used for a self-contained work with its own
title within a book, as opposed to simply referring to an untitled part of a book as in BIBTEX.
In the following example, involving an ‘InBook’ entry inheriting from a ‘Book’ entry, there is
no need to create a ‘booktitle’ field duplicating the ‘title’ field in the parent entry to pass on
to the child entry, and there is also no need to create an empty ‘subtitle’ field in the child
entry to ensure it does not incorrectly inherit the ‘subtitle’ of the parent.
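The entries for this example are not reproduced here. A hypothetical parent ‘Book’ and child ‘InBook’ pair (all names and values invented for illustration) might be set up as follows: the child carries its own ‘title’ and points to its parent through ‘crossref’, relying on the inheritance rules described above instead of duplicated ‘booktitle’ or empty ‘subtitle’ fields.

```r
library(RefManageR)
BibOptions(check.entries = FALSE)  # the child relies on inherited fields

# Parent 'Book' (all values hypothetical)
parent <- BibEntry(bibtype = "Book", key = "doe2001collected",
                   author = "Jane Doe", title = "Collected Essays",
                   subtitle = "Volume One", date = "2001",
                   publisher = "Example Press", location = "Springfield")

# Child 'InBook': a self-contained work with its own title. Under BibLaTeX's
# inheritance rules the parent's title/subtitle become the child's
# booktitle/booksubtitle, so neither needs to be duplicated here.
child <- BibEntry(bibtype = "InBook", key = "doe2001essay",
                  crossref = "doe2001collected",
                  title = "On Bibliographies", pages = "1--20")

books <- c(parent, child)
print(books)
toBiblatex(books)  # fields as stored, including 'crossref'
BibOptions(restore.defaults = TRUE)
```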
RefManageR recognizes some, but not all, localization keys defined by default in BIBLATEX. A
localization key is a special value that BIBLATEX parses for certain fields and replaces with
predefined text called the ‘localization string’ when printing the bibliography. In the example
below I use localization keys to specify the roles of editors using the ‘editortype’ field and refer
to portions of a text using the ‘bookpagination’ field.
This last feature is important when searching the BibEntry object later, as it would not be
possible to properly search by parts of a name (such as family name only) if a name has not
been correctly converted to a person object. Since it is often not necessary in BIBLATEX to
provide all the "required" fields for an entry, it can be useful to be able to turn off the check for
required fields in R when one wants to work with entries that are missing some fields. For
example, the sample .bib file that comes with the BIBLATEX package, which is also included with
RefManageR for demonstration purposes, has three entries that are missing required fields.
The default behaviour is to not add these entries, but this can be changed.
## Ignoring entry titled "The Chicago Manual of Style" because A bibentry of bibtype
## 'Manual' has to specify the field: c("author", "editor")
## Ignoring entry titled "CTAN" because A bibentry of bibtype 'Online' has to
## specify the field: c("author", "editor")
## Ignoring entry titled "Computers and Graphics" because A bibentry of bibtype '
## Periodical' has to specify the field: editor
## [1] _The Chicago Manual of Style. The Essential Guide for Writers,
## Editors, and Publishers_. 15th ed. Chicago, Ill.: University of
## Chicago Press, 2003. ISBN: 0-226-10403-6.
##
## [2] _Computers and Graphics_. 35.4 (2011): _Semantic 3D Media and
## Content_. ISSN: 0097-8493.
##
## [3] _CTAN. The Comprehensive TeX Archive Network_. 2006. <URL:
## http://www.ctan.org> (visited on 10/01/2006).
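To see the effect of the check without depending on where the package keeps its copy of the sample file, the sketch below writes a two-entry .bib file (contents invented for illustration) in which a ‘Manual’ entry lacks both ‘author’ and ‘editor’, and then reads it with the check on and off. It assumes that ReadBib's `check` argument accepts the same values as the check.entries option.

```r
library(RefManageR)

# A tiny .bib file: 'cms' lacks the author/editor a 'Manual' entry requires
bib.file <- tempfile(fileext = ".bib")
writeLines(c(
  "@Manual{cms,",
  "  title = {The Chicago Manual of Style},",
  "  date = {2003},",
  "}",
  "@Article{barry1996,",
  "  author = {Ronald Barry},",
  "  title = {A Diagnostic to Assess the Fit of a Variogram to Spatial Data},",
  "  journaltitle = {Journal of Statistical Software},",
  "  date = {1996-08},",
  "}"
), bib.file)

strict  <- ReadBib(bib.file, check = "error")  # skips 'cms' with a message
relaxed <- ReadBib(bib.file, check = FALSE)    # keeps every entry
c(strict = length(strict), relaxed = length(relaxed))
unlink(bib.file)
```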
If there is no DOI available and the document does not have a JSTOR cover page, it is
considerably more difficult to obtain an accurate citation. The function is often able to
recover the title, author, and date information. It can parse journal title, volume, and issue
information if it is present in an obvious format. Articles with complicated formatting and
missing the features discussed in the previous paragraph are not likely to be parsed correctly
and the user will have to manually edit the entries, which will be covered in a later section.
With the following code, Windows binaries of Poppler are downloaded along with some PDFs
to test out the function.
bib
Note that entry [4] is not complete. To clean up, use setwd(curdir) and unlink(tmpdir).
Keys may be extracted from BibEntry objects using either the names method or the `$`
operator with name argument (the value to the right of the ‘$’ sign) equal to ‘key’. The extra
step involving the `names<-` method for BibEntry objects assigns a unique key to each entry,
as the citation function does not provide a key for each entry and may return more than
one reference for a single package. At this point, because of the use of lapply, pkg.bib is
a list of BibEntry objects instead of a single BibEntry object. One way to rectify this is
to use the internal function MakeCitationList. Entries in our BibEntry object of packages may
be referred to using the
package name/key because of the special features of the `[` operator for BibEntry objects,
which will be discussed in detail in a later section.
pkg.bib[key = "boot"]
The call pkg.bib["boot"] matches only the entry whose key is exactly “boot”, whereas
pkg.bib[key = "boot"] gives a (partial) match for any entry whose ‘key’ contains the string
“boot”, due to the default settings of the `[` operator discussed in Section 5. The BibEntry class also
has methods as.data.frame and unlist to convert BibEntry objects to a data frame and
an unlisted vector, respectively. The function as.data.frame creates a data frame from a
BibEntry object with each row corresponding to a unique entry and one column for every
field present in the BibEntry object, including a column called ‘bibtype’ for the type of entry.
NA values indicate that the field is not present in that entry (row of the data frame). The
row names are the keys of the entries.
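The code behind pkg.bib is not shown. A sketch of the steps described above follows. It uses the base packages utils and tools so that citation() is guaranteed to work, and it combines the list with do.call(c, ...) as a substitute for the internal MakeCitationList; that substitution is an assumption, not the method used in the text.

```r
library(RefManageR)

pkgs <- c("utils", "tools")  # base packages, so citation() always works
pkg.bib <- lapply(pkgs, function(pkg) {
  refs <- as.BibEntry(citation(pkg))
  # citation() supplies no keys, so key each entry by its package name
  names(refs) <- make.unique(rep(pkg, length(refs)))
  refs
})
# Collapse the list into a single BibEntry object with the c method
pkg.bib <- do.call(c, pkg.bib)

pkg.bib["utils"]              # exact key match
df <- as.data.frame(pkg.bib)  # one row per entry, one column per field
df[, c("bibtype", "title")]
```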
BibOptions("check.entries")
## $check.entries
## [1] "error"
The "..." argument of ReadPubMed can be used to pass additional optional arguments to
ESearch. Among them are retmax to specify the maximum number of entries to return,
retstart to specify the index of the first result to return, and field to search only a
particular field of the entries for a match. Several options control the dates of the matches:
datetype gives the type of date to consider when searching by date; for example,
datetype = "pdat" specifies to search by publication date and datetype = "mdat" specifies
to search by modification date. The mindate and maxdate options specify the minimum and
maximum dates that the search results should be restricted to. Dates should be in the format
"YYYY", "YYYY/MM", or "YYYY/MM/DD". Our next query returns one entry published in
2009 in the Journal of Statistical Software.
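The sketch below illustrates the accepted date formats with a small validation helper, IsEntrezDate, which is hypothetical and not part of RefManageR, and then shows a query of the kind just described. The query string and option values are assumptions; the search only runs in an interactive session because it needs network access to NCBI.

```r
# Hypothetical helper (not part of RefManageR): TRUE for dates written as
# "YYYY", "YYYY/MM", or "YYYY/MM/DD"
IsEntrezDate <- function(x) grepl("^[0-9]{4}(/[0-9]{2}(/[0-9]{2})?)?$", x)

IsEntrezDate(c("2009", "2009/06", "2009/06/15", "15/06/2009"))
# returns TRUE TRUE TRUE FALSE

if (interactive()) {  # needs network access to NCBI's E-utilities
  library(RefManageR)
  # Extra arguments are passed through "..." to ESearch
  jss <- ReadPubMed("journal of statistical software", database = "PubMed",
                    retmax = 1, datetype = "pdat",
                    mindate = "2009", maxdate = "2009")
  print(jss)
}
```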
The GetPubMedRelated function uses the ELink E-Utility to find articles related to a set of
articles or IDs. Either a character vector of IDs or a BibEntry object containing entries with
‘eprinttype’ field equal to "pubmed" and PubMed IDs stored in the ‘eprint’ field (the format
expected by BIBLATEX and also returned by the ReadPubMed function) should be specified for
the id argument. Given a set of IDs, ELink can operate in two distinct ways: it can either search
for related articles for each ID in the set separately, or use the entire set at once to find articles
that are related to every article specified by the set of IDs. The latter behaviour is
requested in GetPubMedRelated by specifying batch.mode = TRUE as an argument in the call.
In the example below I find entries related to the articles returned by the previous query for
publications by RJC.
Entrez returns a similarity score with each returned citation, measuring how similar
the returned entry is to the specified IDs. These scores can be included in the output
BibEntry object in a field called ‘score’ by specifying return.sim.scores = TRUE in the call.
Additionally, the IDs in the call that were used to determine the relation can be included in
the output in a field called ‘PMIDrelated’ if the argument return.related.ids is TRUE. In
the next example, batch.mode = FALSE is used and one related article is returned for each of
two entries in rjc.pm.
BibOptions(check.entries = FALSE)
ids <- rjc.pm$eprint[3:4]
ids
## $guenther2014healthy
## [1] "24453128"
##
## $li2013selecting
## [1] "24376287"
## @Article{guenther2008evaluation,
## title = {Evaluation of the Healthy Eating Index-2005},
## author = {Patricia M Guenther and Jill Reedy and Susan M Krebs-Smith and
## Bryce B Reeve},
## year = {2008},
## journal = {Journal of the American Dietetic Association},
## volume = {108},
## number = {11},
## pages = {1854-64},
## eprint = {18954575},
## doi = {10.1016/j.jada.2008.08.011},
## eprinttype = {pubmed},
## score = {54583749},
## pmidrelated = {24453128},
## }
##
## @Article{seghouane2007criterion,
## title = {The AIC criterion and symmetrizing the Kullback-Leibler
## divergence},
## author = {Abd-Krim Seghouane and Shun-Ichi Amari},
## year = {2007},
## journal = {IEEE transactions on neural networks / a publication of the
## IEEE Neural Networks Council},
## volume = {18},
## number = {1},
## pages = {97-106},
## eprint = {17278464},
## doi = {10.1109/TNN.2006.882813},
## eprinttype = {pubmed},
## score = {20610997},
## pmidrelated = {24376287},
## }
BIBTEX file of references to RJC papers from Google Scholar and search for PubMed IDs for
the first ten entries. If the search is successful and an ID is found, the corresponding entry is
updated so that the ‘eprinttype’ field is assigned the value “pubmed” and the ‘eprint’ field is
assigned the ID.
Finally, the GetPubMedByID function uses Entrez’s EFetch to obtain bibliographic data given a
vector of PubMed IDs. The PubMed IDs just obtained can be used to get the BIBTEX entry
from Entrez and compare it with the one already in our bibliography from Google Scholar.
GetPubMedByID(unlist(bib$eprint)[1L])
If one wishes to use other NCBI E-Utilities and does not wish to work with BibEntry or
bibentry objects, see the rentrez package (Winter 2012).
3.2. Zotero
Zotero is free, open source software for collecting and sharing bibliographic information.
Zotero can automatically retrieve bibliographic metadata that has been embedded in webpages
using ContextObjects in Spans (COinS), and is thus a very convenient way to collect
bibliographic information when browsing, for example, journal websites. The RefManageR
package contains functions for querying existing Zotero libraries and converting the results
to a BibEntry object, and also for uploading an existing BibEntry object to a Zotero library.
To use the Zotero API, one needs a Zotero account, a userID and an API key for the library
one wishes to access. The userID and API key for personal libraries may be found by logging
in and visiting the page https://www.zotero.org/settings/keys. The following call to
ReadZotero searches for the first two references with the word ‘Bayesian’ in the title contained
in the library specified by the ‘key’ parameter.
The function also stores the number of citations of each result in a field ‘cites’, which
BIBLATEX and BIBTEX ignore when generating a bibliography unless additional effort is made to
handle a custom entry field. The following code obtains the second author’s three most cited
works according to Google Scholar and prints the citation count and entry type for each entry.
cbind(rjc.bib$cites, rjc.bib$bibtype)
## [,1] [,2]
## carroll2012measurement "2495" "Book"
## ruppert2003semiparametric "1931" "Book"
## carroll1988transformation "1416" "Book"
A shortcoming of this approach is that long author lists, long titles, or long journal/publisher
information can all lead to incomplete information being returned for those fields for the offending
entries. In this case, the ReadGS function will either omit the entry or add it with a warning,
depending on the value of the check.entries argument.
length(rjc.bib) == length(rjc.bib2)
## [1] FALSE
## The offending entry: RJC is missing because the list of authors was too long
print(rjc.bib2[title='dietary measurement error'],
.opts = list(max.names = 99, bib.style = 'alphabetic'))
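Because ‘cites’ is an ordinary field, it can be used directly in R. The sketch below builds the three entries by hand (titles and authors filled in for illustration, citation counts taken from the output above) and ranks them by citation count; it does not call ReadGS.

```r
library(RefManageR)
BibOptions(check.entries = FALSE)  # the hand-made entries are deliberately sparse

# Entries carrying a 'cites' field like the one ReadGS adds
rjc <- c(
  BibEntry(bibtype = "Book", key = "ruppert2003semiparametric",
           title = "Semiparametric Regression",
           author = "David Ruppert and M. P. Wand and R. J. Carroll",
           date = "2003", cites = "1931"),
  BibEntry(bibtype = "Book", key = "carroll1988transformation",
           title = "Transformation and Weighting in Regression",
           author = "R. J. Carroll and David Ruppert",
           date = "1988", cites = "1416"),
  BibEntry(bibtype = "Book", key = "carroll2012measurement",
           title = "Measurement Error in Nonlinear Models",
           author = paste("R. J. Carroll and David Ruppert and L. A. Stefanski",
                          "and C. M. Crainiceanu"),
           date = "2012", cites = "2495")
)

# 'cites' is stored as text, so convert it before ranking
cites <- as.integer(unlist(rjc$cites))
names(rjc)[order(cites, decreasing = TRUE)]
BibOptions(restore.defaults = TRUE)
```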
3.4. CrossRef
The function ReadCrossRef uses the CrossRef Metadata Search API (http://search.crossref.org/help/api)
to import references based on a search of CrossRef’s nearly 60 million records.
Given a search term and possibly a search year, the function receives BIBTEX entries as JSON
objects using the RJSONIO package (Lang 2013b), which are saved to a temporary file and
then read back into R using the ReadBib function to be returned as a BibEntry object.
Although false negatives are rare, the CrossRef Metadata Search can be prone to false
positives. For this reason, it is important to specify the min.relevance argument. Each
reference returned by CrossRef comes with a relevancy score which is CrossRef’s determination
of how likely the reference is to be a match for the supplied query. The maximum possible
value is 100, so for the strictest possible matching, specify min.relevance = 100. If the
argument verbose is TRUE, a message is printed with the relevancy score and full citation
for each reference with a relevancy score greater than min.relevance, in addition to returning
the references in a BibEntry object.
4.1. Printing
A number of BIBLATEX bibliography styles are available in RefManageR for formatting and
displaying citations. The styles currently implemented are “numeric” (the default), “authortitle”,
“authoryear”, “alphabetic”, and “draft”. The “authoryear” style always begins with the
family name of the first author and follows the list of authors with the year of publication in
parentheses. The other four styles all use the same format, differing only in the label they print
before each entry. Style “numeric” prints the numeric index of each entry in the bibliography,
style “authortitle” uses no label, style “alphabetic” creates a label using the family names of
the authors and the last two digits of the publication year, and style “draft” uses the entry
key as the label.
Entries may be printed as plain text, HTML, BIBTEX format, BIBLATEX format, as R code,
Markdown, or as a mixture of BIBTEX and plain text commonly used for citations. For an
example of the “authoryear” style
The package has a number of options similar to those available in BIBLATEX, including dashed
to control the use of dashes for duplicate authors as in the above example, max.names to
control the number of names in name list fields that will be printed before they are truncated
with “et al.”, and first.inits to control whether given names are truncated to first initials
or full names are used. These options can be set using the BibOptions function or passed
as options to the .opts argument of the print method. There is also a package option,
no.print.fields, for suppressing the printing of certain fields.
## <p><cite>Spiegelberg, H.
## ““Intention” und “Intentionalität” in
## der Scholastik, bei Brentano und Husserl”.
## In: <EM>Studia Philosophica</EM> 29 (1969), pp. 189-216.</cite></p>
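The “authoryear” style and the options described above can be sketched as follows. The entry is the arXiv paper cited earlier, and the particular option values are chosen only for illustration; passing them through .opts affects only the call in which they appear.

```r
library(RefManageR)

bib <- BibEntry(bibtype = "Online", key = "mclean2013bayesian",
                author = paste("M. W. McLean and F. Scheipl and G. Hooker",
                               "and S. Greven and D. Ruppert"),
                title = paste("Bayesian Functional Generalized Additive Models",
                              "with Sparsely Observed Covariates"),
                date = "2013", eprinttype = "arxiv", eprintclass = "stat.ME",
                eprint = "1305.3585", url = "http://arxiv.org/abs/1305.3585")

# "authoryear" style, printing all five names
print(bib, .opts = list(bib.style = "authoryear", max.names = 99))
# Default "numeric" style, truncating the name list with "et al."
print(bib, .opts = list(max.names = 2))
# Markdown output with the 'url' field suppressed
print(bib, .opts = list(style = "markdown", no.print.fields = "url"))
```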
The user can create a custom BIBLATEX or BIBTEX bibliography style using the bibstyle
function in the tools package. Doing so involves creating an environment containing functions
for formatting entries of each type, with signatures such as formatArticle(paper) and
formatBook(paper).
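A minimal custom style might look as follows. The sketch uses a plain utils bibentry object and base R's tools::bibstyle; the style name briefArticles and its one-line article format are hypothetical. Setting .init = TRUE starts the new style from a copy of the default style, so only formatArticle needs to be supplied, and .default = FALSE leaves the default style unchanged.

```r
# Plain utils bibentry; the style defined below is hypothetical
ref <- bibentry(bibtype = "Article", key = "barry1996",
                author = person("Ronald", "Barry"),
                title = "A Diagnostic to Assess the Fit of a Variogram to Spatial Data",
                journal = "Journal of Statistical Software",
                year = "1996", volume = "1", number = "1")

# Copy the default style, replace formatArticle, and register the result
# under a new name without making it the default
tools::bibstyle("briefArticles", .init = TRUE, .default = FALSE,
  formatArticle = function(paper) {
    # 'paper' is a single entry: a list of fields with a 'bibtype' attribute
    paste0(paste(format(paper$author, include = "family"), collapse = ", "),
           ": ", paper$title, " (", paper$year, ")")
  })

format(ref, style = "text", .bibstyle = "briefArticles")
```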
A downside of BIBLATEX is that the majority of academic journals do not support its use,
having long ago written a custom bst file for generating citations which can only be used
by BIBTEX. For this reason RefManageR provides a toBibtex method returning a character
vector with entries converted from BIBLATEX to BIBTEX format. Entries of a type that are
not supported by BIBTEX will be converted to a type that is, e.g., entries of type ‘report’ are
converted to type ‘techreport’. Other conversions include replacing the ‘date’ field with a
properly formatted ‘year’ field (if year is not already present) and converting the ‘journaltitle’
field to ‘journal’. Since the cross-referencing system in BIBTEX is more limited than the one
supported by BIBLATEX, an attempt is made to ensure the cross-referencing will still work as
expected in BIBTEX. All fields not normally supported by BIBTEX are dropped unless they are
specified in the argument extra.fields. The argument note.replace.field can be used to
specify fields to add to the ‘note’ field in entries that are missing it. As already demonstrated,
the toBiblatex function will convert a BibEntry object to a character vector containing lines
of the corresponding BIBLATEX-formatted bibliography. No fields are converted or dropped by
this function; in this way it is very similar to the toBibtex method for bibentry objects.
## @Thesis{schieplthesis,
## date = {2011-03-17},
## url = {http://edoc.ub.uni-muenchen.de/13028/},
## urldate = {2014-03-06},
## title = {Bayesian Regularization and Model Choice for Structured Additive
## Regression},
## type = {phdthesis},
## institution = {LMU Munich},
## author = {Fabian Scheipl},
## }
## @PhdThesis{schieplthesis,
## url = {http://edoc.ub.uni-muenchen.de/13028/},
## title = {Bayesian Regularization and Model Choice for Structured Additive
## Regression},
## author = {Fabian Scheipl},
## year = {2011},
## month = {mar},
## school = {LMU Munich},
## note = {Last visited on 03/06/2014},
## }
The function WriteBib, based on the function write.bib in the package bibtex (Francois
2013), is provided for writing a BibEntry object to a bib file in BIBLATEX or BIBTEX format
using toBiblatex and toBibtex, respectively, depending on the value of the biblatex logical
argument to WriteBib. In the next example I write the previous thesis reference to a file
in BIBTEX format, and for demonstration purposes only, read back in the .bib file using
read.bib in package bibtex so that a bibentry object is created instead of a BibEntry one.
unlink(tmpfile)
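The code for the thesis example is not shown. A sketch that recreates the entry from the printed output, converts it, and writes it to a temporary file in BIBTEX format follows; it reads the file back with readLines instead of read.bib from bibtex, so no additional package is needed.

```r
library(RefManageR)

thesis <- BibEntry(bibtype = "Thesis", key = "schieplthesis", date = "2011-03-17",
                   url = "http://edoc.ub.uni-muenchen.de/13028/",
                   urldate = "2014-03-06",
                   title = paste("Bayesian Regularization and Model Choice for",
                                 "Structured Additive Regression"),
                   type = "phdthesis", institution = "LMU Munich",
                   author = "Fabian Scheipl")

toBiblatex(thesis)  # BibLaTeX fields, nothing converted or dropped
toBibtex(thesis)    # converted: @PhdThesis with 'year', 'month', 'school', 'note'

# Write the entry in BibTeX format (biblatex = FALSE) and inspect the file
tmpfile <- tempfile(fileext = ".bib")
WriteBib(thesis, file = tmpfile, biblatex = FALSE)
bibtex.lines <- readLines(tmpfile)
bibtex.lines
unlink(tmpfile)
```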
4.2. Sorting
Nine different methods are available for sorting citations stored in a BibEntry object, cor-
responding to the ones predefined in BIBLATEX. Depending on the bib.style option, the
default sorting method is “nty” to sort by ‘name’ (‘n’), then ‘title’ (‘t’), then ‘year’/‘date’
(‘y’). Other possibilities are “debug” to sort by ‘key’, “none” for no sorting, “nyt”, “nyvt”,
“anyt”, “anyvt”, “ynt”, and “ydnt”; where the ‘a’ stands for sorting by alphabetic label, ‘v’
stands for sorting by ‘volume’, and ‘yd’ for sorting by ‘year’/‘date’ in descending order.
All sorting methods first consider the field ‘presort’, if available. Entries with no ‘presort’ field
are assigned ‘presort’ value “mm”. Next the ‘sortkey’ field is used. When sorting by name,
the ‘sortname’ field is used first. If it is not present, the ‘author’ field is used, if that is not
present ‘editor’ is used, and if that is not present ‘translator’ is used. When sorting by ‘title’,
first ‘sorttitle’ is considered. Similarly, when sorting by ‘year’, ‘sortyear’ is first considered.
When sorting by ‘volume’, if the field is present, it is padded to four digits with leading zeros;
otherwise, the string “0000” is used. When sorting by alphabetic label, first ‘shorthand’ is
considered, then ‘label’, then ‘shortauthor’, ‘shorteditor’, ‘author’, ‘editor’, and ‘translator’.
Refer to Lehman et al. (2013, Sections 3.1.2.1 and 3.5 and Appendix C.2) for further details.
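To make the sorting rules concrete, the following base-R sketch builds “nyvt” sort keys for a few invented entries in the order described above: ‘presort’ (defaulting to “mm”), then name (‘sortname’, else ‘author’, else ‘editor’), year, zero-padded volume, and title. It is a simplified illustration, not RefManageR's implementation; ‘sortkey’, ‘sorttitle’, ‘sortyear’, and ‘translator’ are left out.

```r
# Hypothetical entries; NA marks a missing field
entries <- data.frame(
  key      = c("smith2010b", "zeta", "smith2010a", "jones2004", "lee1999", "edited2000"),
  presort  = c(NA, "zz", NA, NA, NA, NA),
  sortname = c(NA, NA, NA, NA, "Adams", NA),
  author   = c("Smith", "Abbott", "Smith", "Jones", "Lee", NA),
  editor   = c(NA, NA, NA, NA, NA, "Brown"),
  year     = c("2010", "2001", "2010", "2004", "1999", "2000"),
  volume   = c("10", NA, "2", NA, NA, NA),
  title    = c("Part Two", "Last by Presort", "Part One", "A Study", "Essays", "Proceedings"),
  stringsAsFactors = FALSE
)

# Element-wise fallback: first non-missing value among the given fields
FirstPresent <- function(...) {
  out <- ..1
  for (v in list(...)[-1]) out <- ifelse(is.na(out), v, out)
  ifelse(is.na(out), "", out)
}

presort <- ifelse(is.na(entries$presort), "mm", entries$presort)
name    <- tolower(FirstPresent(entries$sortname, entries$author, entries$editor))
year    <- FirstPresent(entries$year)
volume  <- rep("0000", nrow(entries))  # a missing volume sorts as "0000"
has.vol <- !is.na(entries$volume)
volume[has.vol] <- sprintf("%04d", as.integer(entries$volume[has.vol]))
title   <- tolower(FirstPresent(entries$title))

# "nyvt": presort, then name, year, volume, title
entries$key[order(presort, name, year, volume, title)]
# lee1999 edited2000 jones2004 smith2010a smith2010b zeta
```

Padding matters for the two “Smith” entries: compared as plain strings, volume “10” would sort before “2”, whereas “0002” correctly precedes “0010”.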
a .bib file, e.g., “Doe, Jr., John and Jane {Doe Smith}”. Names can be matched based
on family names only, by family name and given name initials, or by full name, depending on
the value of the option match.author.
Entries containing valid crossref and xdata fields are expanded prior to searching, so that
when a match is found for a field and value that a child entry inherits from its parent, the
result is both the parent and child being returned. If a match is found in a child entry and not
in the parent, only the child entry is returned, but the returned entry will contain any fields
it inherits from its parent. Any xdata entries that the child references will also be returned.
Examples follow.
# no match with parent entry, the returned child has inherited fields
bib[author = "westfahl"]
##
## [4] Homer. _Die Ilias_. Trans. by W. Schadewaldt. With an intro.
## by J. Latacz. 3rd ed. Düsseldorf and Zürich: Artemis \& Winkler,
## 2004.
length(bib[author = "!knuth"])
## [1] 85
The list extraction operator, `[[`, is used for extracting BibEntry objects by position (an
integer) or the entry key (a string). Unlike the default `[[` operator, it accepts a vector of
indices to extract more than one entry at a time.
As with bibentry objects, the `$` operator for BibEntry objects is used to return a list
containing the value of a particular field for all entries, with a value of NULL returned for
entries that do not have the specified field. A list of all entry types or keys for the BibEntry
object, bib, can be obtained using bib$bibtype and bib$key, respectively.
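The operators described in this section can be tried on a small object. The three entries below are abbreviated versions of references used earlier (the key mclean2013rlrt is hypothetical), and the searches assume the default match.author setting of matching on family names.

```r
library(RefManageR)
BibOptions(check.entries = FALSE)  # keep the sketch entries short

bib <- c(
  BibEntry(bibtype = "Article", key = "barry1996", author = "Ronald Barry",
           title = "A Diagnostic to Assess the Fit of a Variogram to Spatial Data",
           date = "1996-08"),
  BibEntry(bibtype = "Online", key = "mclean2013bayesian",
           author = paste("M. W. McLean and F. Scheipl and G. Hooker",
                          "and S. Greven and D. Ruppert"),
           title = paste("Bayesian Functional Generalized Additive Models",
                         "with Sparsely Observed Covariates"),
           date = "2013"),
  BibEntry(bibtype = "Online", key = "mclean2013rlrt",
           author = "M. McLean and G. Hooker and D. Ruppert",
           title = "Restricted Likelihood Ratio Tests for Scalar-on-Function Regression",
           date = "2013")
)

bib[author = "mclean"]                   # family-name match
bib[author = "!mclean"]                  # a leading "!" negates the match
bib[[c("barry1996", "mclean2013rlrt")]]  # extract several entries by key
unlist(bib$key)                          # all keys
unlist(bib$bibtype)                      # all entry types
BibOptions(restore.defaults = TRUE)
```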
## [1] FALSE
## $z2010oracle
## [1] "J G M AR TI NE Z" "R J C AR RO LL"
##
## $caroll2006measurement
## [1] "R J Caroll" "D Ruppert" "L A Stefanski" "C M Crainiceanu"
##
## $ll1996measurement
## [1] "R J C AR RO LL"
##
## $wu1989estimation
Clearly, one paper is incorrectly attributed to RJC and the other four have spelling errors.
We thus drop that entry and correct the spelling on the other four entries.
## [1] TRUE
I can update different fields of multiple entries using the operator `[<-` as follows.
Notice that I set sorting = "none" above. Sorting of the entries is done by default when
printing, and after sorting the print order of entries is unlikely to correspond to the index
order in the BibEntry object.
A BibEntry object may be used as the replacement value. A field may be removed by setting
its value to the empty string "".
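A sketch of an in-place update follows, modelled on the spelling correction described above: a named character vector supplies the new field values, and an empty string removes a field. The entry and its stray ‘note’ field are invented for illustration.

```r
library(RefManageR)

bib <- BibEntry(bibtype = "Book", key = "caroll2006measurement",
                author = "R. J. Caroll and D. Ruppert and L. A. Stefanski and C. M. Crainiceanu",
                title = "Measurement Error in Nonlinear Models", date = "2006",
                note = "imported from Google Scholar")  # hypothetical stray field

# Fix the misspelled family name and drop 'note' in a single assignment
bib[1] <- c(author = paste("R. J. Carroll and D. Ruppert and L. A. Stefanski",
                           "and C. M. Crainiceanu"),
            note = "")
toBiblatex(bib)
```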
5.3. Merging
The combine function, c, is available for concatenating multiple BibEntry objects and has
been inherited from the bibentry class. It does not, however, check for duplicate entries.
For that purpose, the base package generics anyDuplicated, duplicated, and unique are
available; they check vectors for duplicate elements. But if BibEntry objects have been
compiled from a number of different sources, these functions may be too strict, declaring
two entries distinct even if only one field differs slightly between them. For this reason,
an additional operator `+` is supplied, along with a wrapper function merge, that compares
entries based only on the fields specified by the user. Given BibEntry objects bib1 and bib2,
bib1 + bib2 returns bib1 appended with all entries of bib2 that have been determined not to
be duplicates of entries already in bib1, by comparing all fields in
BibOptions()$merge.fields.to.check, which can include bibtype and key. The function also
checks whether there are any duplicate keys in the result and, if duplicates are detected,
forces the keys to be unique using make.unique.
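The merging rule described above can be sketched in base R. This is not RefManageR's implementation: entries are modelled as rows of a data frame, merge_entries() is a hypothetical helper, the ignore.case argument is an assumption of the sketch, and the keys and titles are made-up toy data. It shows the two steps in the text: drop entries of bib2 that match an entry of bib1 on the chosen fields, then make any clashing keys unique with make.unique.

```r
# Hypothetical helper illustrating field-based duplicate detection.
merge_entries <- function(bib1, bib2, fields.to.check = "key",
                          ignore.case = TRUE) {
  # Build one comparison string per entry from the selected fields only.
  signature <- function(bib) {
    sig <- do.call(paste, c(bib[fields.to.check], sep = "\r"))
    if (ignore.case) tolower(sig) else sig
  }
  # An entry of bib2 is a duplicate if its signature occurs in bib1.
  is.dup <- signature(bib2) %in% signature(bib1)
  out <- rbind(bib1, bib2[!is.dup, , drop = FALSE])
  # Remaining key clashes are resolved, e.g. "smith2000" -> "smith2000.1".
  out$key <- make.unique(out$key)
  rownames(out) <- NULL
  out
}

# Toy data (hypothetical entries).
bib1 <- data.frame(key = c("smith2000", "doe2010"),
                   title = c("A Note on Sorting", "Another Paper"),
                   year = c("2000", "2010"), stringsAsFactors = FALSE)
bib2 <- data.frame(key = c("smith2000", "smith2000"),
                   title = c("A note on sorting", "A Third Paper"),
                   year = c("2000", "2005"), stringsAsFactors = FALSE)

# Compare on title and year only: the first entry of bib2 differs from
# bib1's first entry only in capitalization, so it is dropped.
merged <- merge_entries(bib1, bib2, fields.to.check = c("title", "year"))
merged$key
```

Here merged has three rows, and the second "smith2000" key from bib2 becomes "smith2000.1". In RefManageR itself, the fields compared by bib1 + bib2 are controlled through BibOptions(merge.fields.to.check = ...), as described above.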
Citet(bib, "loh") produces Loh (1992), a "textual" citation using an entry key. It is possible
to cite in parentheses by year using Citep(bib, year = "1899", .opts = list(cite.style =
"alphabetic")), which produces [Wil99]. Next, three works by Averroes are cited with
AutoCite(bib, author = "averroes", .opts = list(super = TRUE, cite.style = "numeric")),
which produces [1;2;3]. There is some support for resolving ambiguous citations; consider
Citet(bib, author = "Baez"), which produces Baez and Lauda (2004a); Baez and Lauda (2004b).
Finally, the bibliography is printed using PrintBibliography.
Typically, when using knitr, one would load RefManageR, load the bibliography, and set
package options in a chunk at the start of the document with the chunk option include =
FALSE, and then include citations and print the bibliography with the options echo = FALSE
and results = "asis". For demonstrations of these functions in use in RMarkdown and RHTML
documents, and of the hyperlinking features, see the package vignettes as well as the
examples at ?Cite.
7. Conclusion
The RefManageR package provides R with considerable extra resources for working with
bibliographic data, alleviating much of the difficulty of managing references from several
different sources. Functions have been introduced for importing references from a number of
online resources, as well as for conveniently editing entries and creating new ones. By
implementing many of the features of BIBLATEX, the package removes several shortcomings of
working with BIBTEX format. Conversion between different formats and bibliography styles,
and between BIBLATEX and BIBTEX, is made easy with the package. Users need to rely less on
remembering entry keys when writing a document and can make complicated searches using a
simple syntax with the `[` operator. As more and more researchers become aware of the
benefits of working with Markdown, the citation, hyperlinking, and printing capabilities of
RefManageR should prove useful.
Future work on RefManageR will include allowing for additional citation and bibliography
styles, making it easier for users to define custom styles, and adding support for
Pandoc-style citations (http://johnmacfarlane.net/pandoc/). Additionally, more work may be
needed to ensure that searching and merging are fast enough for applications involving
extremely large bibliographies. I also wish to explore creating a revamped version of the
citEntry function in package utils to allow package developers to include citations in
BIBLATEX format in their packages.
Acknowledgements
The author was supported in part by a postdoctoral award from the Texas A&M Institute for
Applied Mathematics and Computational Science, and in part by a grant from the National
Cancer Institute (R37-CA057030, R. J. Carroll, P.I.). He would also like to thank R. J.
Carroll for helpful comments on the manuscript and for having so many articles to reference
in examples.
References
Francois R (2013). bibtex: bibtex parser. R package version 0.3-6, URL
http://CRAN.R-project.org/package=bibtex.
Grolemund G, Wickham H (2011). “Dates and Times Made Easy with lubridate.” Journal of
Statistical Software, 40(3), 1–25. URL http://www.jstatsoft.org/v40/i03/.
Hornik K, Murdoch D, Zeileis A (2012). “Who Did What? The Roles of R Package Authors
and How to Refer to Them.” The R Journal, 4(1). URL
http://journal.r-project.org/archive/2012-1/RJournal_2012-1.pdf.
Keirstead J (2013). scholar: Analyse citation data from Google Scholar. R package version
0.1.1, URL http://CRAN.R-project.org/package=scholar.
Lang DT (2013a). RCurl: General network (HTTP/FTP/...) client interface for R. R package
version 1.95-4.1, URL http://CRAN.R-project.org/package=RCurl.
Lehman P, Kime P, Boruvka A, Wright J (2013). The biblatex Package. URL
http://ctan.mirrorcatalogs.com/macros/latex/contrib/biblatex/doc/biblatex.pdf.
R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Affiliation:
Mathew W. McLean
Institute for Applied Mathematics and Computational Science
Texas A&M University
3143 TAMU
College Station, TX, 77843
E-mail: [email protected]
URL: http://stat.tamu.edu/~mmclean