MIT 201_Tutorial 02
1 Libraries in R
In R, libraries (more precisely called packages) are collections of functions, data, and
documentation that extend the functionality of base R. Strictly speaking, a library is the
directory where installed packages are stored, but the two terms are often used
interchangeably. Libraries provide additional capabilities and tools for data manipulation,
analysis, visualization, and more. You can install and load libraries to access their
functions and resources.
1.1 Installing Libraries
Before using a library, you need to install it on your system. This is typically done once using
the install.packages() function, specifying the name of the library you want to install.
Example:
# Installing a library
install.packages("ggplot2")
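Because installation is only needed once, scripts often check whether a library is already present before calling install.packages(). The sketch below is one possible way to do this; the helper name missing_packages is hypothetical, and the packages stats and utils are used only because they ship with R, so nothing is downloaded when the example runs.

```r
# Return the subset of `pkgs` that is not installed yet.
# (missing_packages is a hypothetical helper, not part of base R.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Swap in the libraries you actually need, e.g. c("ggplot2", "dplyr").
wanted <- c("stats", "utils")
to_install <- missing_packages(wanted)

# Install only what is missing.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Note that requireNamespace() loads a package's namespace as a side effect of the check, which is usually harmless in an interactive session.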
1.2 Loading Libraries
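Once a library is installed, you can load it into your R session using
the library() or require() function. Loading a library makes its functions and resources
available for use. If the library is not installed, library() stops with an error, whereas
require() only returns FALSE with a warning, so library() is usually the safer choice in
scripts.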
Example:
# Loading a library
library(ggplot2)
1.3 Commonly Used Libraries
R has a vast ecosystem of libraries covering various domains. Here are some commonly used
libraries:
• ggplot2: A powerful library for data visualization, offering a wide range of high-quality
plotting capabilities.
• dplyr: A library for data manipulation and transformation, providing efficient and
intuitive functions for filtering, summarizing, and transforming data.
• tidyr: A library for data tidying, used for reshaping and restructuring data into a more
organized format.
• readr: A library for reading and writing rectangular text data, such as CSV and other
delimited files. (Excel files are handled by separate packages such as readxl.)
• caret: A library for machine learning and predictive modeling, offering tools for data
preprocessing, model training, and evaluation.
• stringr: A library for string manipulation, providing functions for pattern matching,
string extraction, and text manipulation.
• lubridate: A library for working with dates and times, offering convenient functions for
parsing, manipulating, and formatting date-time data.
• magrittr: A library for enhancing the readability and expressiveness of R code by
enabling the use of the pipe operator %>%.
• purrr: A library that enhances functional programming in R, providing a consistent and
intuitive syntax for working with lists and vectors.
• rmarkdown: A library for creating dynamic and reproducible reports and documents using
R Markdown, allowing integration of code, text, and visualizations.
1.4 Using Library Functions
Once a library is loaded, you can use its functions by calling them directly by name.
Alternatively, you can prefix the function name with the package name and :: (for example,
ggplot2::ggplot()), which also works without loading the library first.
Example:
# Using a function from the ggplot2 library
# (mydata is assumed to be a data frame with Age and Salary columns)
ggplot(data = mydata, aes(x = Age, y = Salary)) + geom_point()
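Functions can also be called through an explicit package prefix using ::, without loading the library first. The sketch below uses the stats and utils packages, chosen only because they ship with R and so the example runs without installing anything; the same pattern applies to calls such as ggplot2::ggplot().

```r
# Call functions through an explicit package prefix; no library() call is needed.
values <- c(12, 7, 3, 9, 15)

middle <- stats::median(values)         # median() from the stats package
first_three <- utils::head(letters, 3)  # head() from the utils package

print(middle)
print(first_three)
```

The prefix form is handy when two loaded libraries define functions with the same name, since it makes clear which one is meant.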
Libraries in R extend the capabilities of base R, providing additional functionality and tools
for various tasks. By leveraging the power of libraries, you can enhance your data analysis,
visualization, and programming workflows in R.
2 tidyverse
The tidyverse is a collection of R packages designed to enhance data manipulation, analysis,
and visualization. It follows the principles of tidy data and provides a consistent and intuitive
grammar for working with data. The tidyverse packages are designed to work together
seamlessly, allowing you to build efficient and readable data workflows. Let's explore the
tidyverse in more detail:
2.1 Core Packages
The tidyverse consists of several core packages that form the foundation of the ecosystem.
These packages include:
• ggplot2: A powerful package for data visualization, offering a layered grammar of
graphics that enables the creation of highly customizable and publication-quality plots.
• dplyr: A package for data manipulation and transformation, providing a set of intuitive
functions for filtering rows, selecting columns, mutating data, and summarizing data.
• tidyr: A package for data tidying, used to reshape and restructure data into a more
organized format, such as converting data from wide to long format or vice versa.
• readr: A package for reading and writing rectangular text data, such as CSV and other
delimited files. It provides fast and efficient functions for importing data into R. (Excel
files are handled by the separate readxl package.)
• purrr: A package that enhances functional programming in R, providing a consistent and
intuitive syntax for working with lists and vectors.
2.2 Loading the tidyverse
To use the tidyverse packages, you can load them all at once using
the library(tidyverse) command. This will load the core packages and their dependencies
into your R session.
Example:
# Loading the tidyverse
library(tidyverse)
2.3 Tidy Data Principles
The tidyverse follows the principles of tidy data, which emphasize a standardized format for
data representation. Tidy data has the following characteristics:
• Each variable has its own column.
• Each observation has its own row.
• Each value has its own cell.
By adhering to these principles, the tidyverse simplifies data manipulation and analysis
workflows.
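To make the three principles concrete, here is a minimal sketch that reshapes a small untidy table into tidy form with tidyr's pivot_longer(). The data frame and its column names (student, math, science) are made up for illustration.

```r
library(tidyr)

# Wide (untidy) layout: the "subject" variable is spread across column names.
scores_wide <- data.frame(
  student = c("Ana", "Ben", "Cai"),
  math    = c(90, 75, 82),
  science = c(85, 80, 78)
)

# Tidy layout: one row per student-subject observation,
# with one column per variable (student, subject, score).
scores_tidy <- pivot_longer(
  scores_wide,
  cols      = c(math, science),
  names_to  = "subject",
  values_to = "score"
)

print(scores_tidy)
```

In the tidy version, subject and score are ordinary columns, so they can be filtered, grouped, and plotted like any other variable.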
2.4 Tidyverse Workflow
The tidyverse promotes a consistent and intuitive workflow for working with data. It typically
involves a series of steps, such as importing data, tidying data, transforming and summarizing
data, and visualizing the results using ggplot2.
Example:
# Example tidyverse workflow
library(tidyverse)
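The workflow steps described above might look like the following sketch. It assumes the built-in mtcars dataset in place of an imported file, so the import step is skipped, and it loads dplyr and ggplot2 individually; the choice of columns and summary is purely illustrative.

```r
library(dplyr)
library(ggplot2)

# Transform and summarize: average fuel economy per cylinder count.
by_cyl <- mtcars %>%
  filter(!is.na(mpg)) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

print(by_cyl)

# Visualize the summary as a bar chart.
p <- ggplot(by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(p)
```

Each step passes its result to the next through the pipe operator %>%, which keeps the sequence of operations readable from top to bottom.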
# FTP example (getURL() is provided by the RCurl package)
library(RCurl)
url <- "ftp://example.com/data.csv"
data <- getURL(url)
4.5 Other Protocols and Libraries
R supports additional protocols and libraries for importing data from the internet. For
example, the curl package allows you to make HTTP requests and retrieve data.
The XML and jsonlite packages can handle XML and JSON data retrieved from web
services.
Example (using curl package to retrieve data using HTTP):
# Installing and loading the curl and jsonlite packages
install.packages(c("curl", "jsonlite"))
library(curl)
library(jsonlite)  # provides fromJSON()
# HTTP request example
url <- "https://api.example.com/data"
response <- curl_fetch_memory(url)
# response$content is a raw vector, so convert it to text before parsing
data <- fromJSON(rawToChar(response$content))
These are just a few examples of how to import data from the internet into R. R provides a
wide range of packages and functions to handle web-based data sources, allowing you to
efficiently retrieve and work with data from websites, APIs, FTP servers, and more.
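Once a response has been retrieved as text, parsing it does not depend on the network. The sketch below parses a hard-coded JSON string with jsonlite; the string stands in for the body of a web service response, and its fields (id, name) are invented for illustration.

```r
library(jsonlite)

# Stand-in for the text body of an API response (made-up content).
json_text <- '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'

# A JSON array of objects with the same fields is converted to a data frame.
records <- fromJSON(json_text)

str(records)
```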
7 Previewing data in R
Previewing data is an essential step in data analysis as it provides an initial understanding of
the dataset's structure and content. In R, there are several methods to preview data effectively.
The head() function allows you to quickly examine the first few rows of a dataset, giving you
a glimpse of the data's structure. Additionally, the View() function provides an interactive
spreadsheet-like view in RStudio, enabling you to explore the dataset more conveniently.
The str() function summarizes the structure of the data, displaying variable types and example
values. Furthermore, the summary() function provides summary statistics for each variable,
offering insights into the distribution and characteristics of the data. By utilizing these
previewing techniques in R, analysts can gain an initial understanding of the data, identify any
potential issues, and begin the process of data exploration and analysis.
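The previewing functions described above can be tried on any data frame. This sketch uses the built-in mtcars dataset as a stand-in for your own data; View() is left commented out because it needs an interactive session such as RStudio.

```r
# First rows of the dataset (six by default, or a chosen number).
head(mtcars)
preview <- head(mtcars, 3)
print(preview)

# Structure: variable types and example values.
str(mtcars)

# Summary statistics for a single variable (or the whole data frame).
summary(mtcars$mpg)

# Interactive spreadsheet-like view (RStudio or another interactive session).
# View(mtcars)
```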
The `file` parameter specifies the name and path of the output file, and the `row.names`
parameter determines whether row names should be included in the exported file.
The `write.xlsx()` function writes the data frame to an Excel file specified by the `file`
parameter.
The `write.table()` function can be used to export data to a TXT file, while other packages
like `jsonlite` and `xml2` provide functions specific to JSON and XML file formats,
respectively.
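As a minimal sketch of the file and row.names parameters in practice, the example below writes a few rows of the built-in mtcars dataset to a CSV file and a tab-separated TXT file, then reads both back. The file names are arbitrary, and the files go to a temporary directory so that nothing in your project is overwritten.

```r
export_data <- head(mtcars, 5)

# Output paths inside R's temporary directory (names are arbitrary).
csv_path <- file.path(tempdir(), "cars.csv")
txt_path <- file.path(tempdir(), "cars.txt")

# CSV export without the row-name column.
write.csv(export_data, file = csv_path, row.names = FALSE)

# Tab-separated TXT export.
write.table(export_data, file = txt_path, sep = "\t", row.names = FALSE)

# Read the files back to confirm the round trip.
csv_back <- read.csv(csv_path)
txt_back <- read.delim(txt_path)

print(dim(csv_back))
print(dim(txt_back))
```

With row.names = FALSE, the car names stored as row names in mtcars are not written, so only the 11 data columns appear in the files.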
8.4 Customizing export options
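Depending on the file format, you can customize various export options. For example, you
can specify the delimiter using the sep parameter of write.table(), or include additional
parameters to control the output format. Note that write.csv() always writes comma-separated
output and ignores sep with a warning. For semicolon-separated files, use write.table() with
sep = ";" or write.csv2() (which also uses a comma as the decimal mark).
# Export data to a file with a semicolon delimiter
write.table(data, file = "output.csv", row.names = FALSE, sep = ";")
By adjusting these export options, you can tailor the output to meet your specific
requirements.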
The `png()` and `pdf()` functions specify the output file, and the `dev.off()` function
closes the device and saves the plot.
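The open-plot-close pattern can be sketched as follows. The example uses pdf() with a temporary file and the built-in mtcars dataset, all chosen for illustration; png() follows the same pattern, with width and height given in pixels by default.

```r
# Output file in R's temporary directory (name is arbitrary).
plot_path <- file.path(tempdir(), "scatter.pdf")

# 1. Open the graphics device; width and height are in inches for pdf().
pdf(plot_path, width = 6, height = 4)

# 2. Draw the plot; it is sent to the file instead of the screen.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 3. Close the device so the file is completely written.
dev.off()

print(file.exists(plot_path))
```

Forgetting dev.off() is a common cause of empty or unreadable plot files, because the output is only finalized when the device is closed.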
By utilizing these export methods in R, you can save your data and visualizations to different
file formats, making them accessible and shareable with others or for further analysis in
external applications.
In the first example, the "data" folder is located within the current working directory. In
the second example, the absolute path specifies the full path to the "data" folder.
These functions are useful for exploring and manipulating files and directories within your
working directory.
Customizing the working directory allows you to conveniently manage and access files in R.
By setting the appropriate working directory, you can easily read and write files, load data,
and save outputs, enhancing your workflow and efficiency in data analysis and project
management.
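A minimal sketch of working with the working directory is shown below. It assumes a throwaway project folder created under R's temporary directory (the folder and file names are hypothetical) and uses the base R functions getwd(), setwd(), dir.create(), list.files(), and normalizePath().

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Create a throwaway project folder with a "data" subfolder.
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Make the project folder the working directory.
setwd(project_dir)

# A relative path is resolved against the working directory.
write.csv(head(mtcars), file.path("data", "cars.csv"), row.names = FALSE)

# The same file expressed as an absolute path.
abs_path <- normalizePath(file.path("data", "cars.csv"))

# List the contents of the "data" folder.
files <- list.files("data")
print(files)

# Restore the original working directory.
setwd(old_wd)
```

Restoring the previous directory at the end keeps the example from affecting the rest of your session.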