Data Transformation with data.table :: CHEAT SHEET

Basics

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R's native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i and manipulate columns with j, grouped according to by.

data.tables are also data frames – functions that work with data frames therefore also work with data.tables.

Create a data.table

data.table(a = c(1, 2), b = c("a", "b")) – create a data.table from scratch. Analogous to data.frame().
setDT(df)* or as.data.table(df) – convert a data frame or a list to a data.table.

Subset rows using i

dt[1:2, ] – subset rows based on row numbers.
dt[a > 5, ] – subset rows based on values in one or more columns.

LOGICAL OPERATORS TO USE IN i
<    <=    is.na()     %in%    |    %like%
>    >=    !is.na()    !       &    %between%

Manipulate columns with j

EXTRACT
dt[, c(2)] – extract columns by number. Prefix column numbers with "-" to drop.
dt[, .(b, c)] – extract columns by name.

SUMMARIZE
dt[, .(x = sum(a))] – create a data.table with new columns based on the summarized values of rows.
Summary functions like mean(), median(), min(), max(), etc. can be used to summarize rows.

COMPUTE COLUMNS*
dt[, c := 1 + 2] – compute a column based on an expression.
dt[a == 1, c := 1 + 2] – compute a column based on an expression but only for a subset of rows.
dt[, `:=`(c = 1 , d = 2)] – compute multiple columns based on separate expressions.

DELETE COLUMN
dt[, c := NULL] – delete a column.

CONVERT COLUMN TYPE
dt[, b := as.integer(b)] – convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.

Group according to by

dt[, j, by = .(a)] – group rows by values in specified columns.
dt[, j, keyby = .(a)] – group and simultaneously sort rows by values in specified columns.

COMMON GROUPED OPERATIONS
dt[, .(c = sum(b)), by = a] – summarize rows within groups.
dt[, c := sum(b), by = a] – create a new column and compute rows within groups.
dt[, .SD[1], by = a] – extract first row of groups.
dt[, .SD[.N], by = a] – extract last row of groups.

Chaining

dt[…][…] – perform a sequence of data.table operations by chaining multiple "[]".

Functions for data.tables

REORDER
setorder(dt, a, -b) – reorder a data.table according to specified columns. Prefix column names with "-" for descending order.

* SET FUNCTIONS AND :=
data.table's functions prefixed with "set" and the operator ":=" work without "<-" to alter data without making copies in memory. E.g., the more efficient "setDT(df)" is analogous to "df <- as.data.table(df)".
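
The dt[i, j, by] form described under Basics can be tried end to end on a small table. The following is only a minimal sketch: the values in dt are hypothetical, and the result names (rows, total, by_group, and so on) are invented here for illustration.

```r
library(data.table)

# Hypothetical toy data; the column names a and b follow the cheat sheet.
dt <- data.table(a = c(1, 1, 2, 2, 3), b = c(10, 20, 30, 40, 50))

# i: subset rows based on values in a column.
rows <- dt[a > 1, ]

# j: extract a column by name, then summarize it.
b_only <- dt[, .(b)]
total <- dt[, .(x = sum(b))]

# by: summarize within groups; keyby also sorts by the grouping column.
by_group <- dt[, .(c = sum(b)), keyby = a]

# .SD[1] and .SD[.N] pick the first and last row of each group.
first_rows <- dt[, .SD[1], by = a]
last_rows <- dt[, .SD[.N], by = a]

# Chaining: summarize per group, then filter the grouped result.
big_groups <- dt[, .(c = sum(b)), by = a][c > 30]
```

With this data, by_group holds the sums 30, 70 and 50 for a = 1, 2 and 3, and big_groups keeps only the groups a = 2 and a = 3.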

CC BY SA Erik Petrovski • www.petrovski.dk • Learn more with the data.table homepage or vignette • data.table version 1.11.8 • Updated: 2019-01
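
The note on set functions and := is easier to see in a short script where nothing is reassigned with "<-". This is a sketch under stated assumptions: the data frame df and its values are hypothetical, and the names snapshot, d and e are invented for this example. copy() is data.table's function for taking an independent copy when the original must stay untouched.

```r
library(data.table)

# Hypothetical data frame used only for illustration.
df <- data.frame(a = c(2, 1, 2), b = c(1.5, 2.6, 3.0))

# setDT() converts df in place; no assignment with "<-" is needed.
setDT(df)

# := computes, converts, and deletes columns by reference.
df[, c := 1 + 2]            # new column c holding 3 in every row
df[a == 1, d := 1 + 2]      # rows where a is not 1 get NA in d
df[, b := as.integer(b)]    # 1.5 becomes 1, 2.6 becomes 2, 3.0 becomes 3
df[, c := NULL]             # delete column c again

# setorder() reorders rows in place: a ascending, b descending.
setorder(df, a, -b)

# copy() returns an independent table, so changes to it do not reach df.
snapshot <- copy(df)
snapshot[, e := 0]
```

After these steps df is a data.table with columns a, b and d, and the column e exists only in snapshot.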
UNIQUE ROWS
unique(dt, by = c("a", "b")) – extract unique rows based on columns specified in "by". Leave out "by" to use all columns.
uniqueN(dt, by = c("a", "b")) – count the number of unique rows based on columns specified in "by".

RENAME COLUMNS
setnames(dt, c("a", "b"), c("x", "y")) – rename columns.

SET KEYS
setkey(dt, a, b) – set keys to enable fast repeated lookup in specified columns using "dt[.(value), ]" or for merging without specifying merging columns using "dt_a[dt_b]".

Combine data.tables

JOIN
dt_a[dt_b, on = .(b = y)] – join data.tables on rows with equal values.
dt_a[dt_b, on = .(b = y, c > z)] – join data.tables on rows with equal and unequal values.

ROLLING JOIN
dt_a[dt_b, on = .(id = id, date = date), roll = TRUE] – join data.tables on matching rows in id columns but only keep the most recent preceding match with the left data.table according to date columns. "roll = -Inf" reverses direction.

BIND
rbind(dt_a, dt_b) – combine rows of two data.tables.
cbind(dt_a, dt_b) – combine columns of two data.tables.

Reshape a data.table

RESHAPE TO WIDE FORMAT
dcast(dt,
      id ~ y,
      value.var = c("a", "b"))
Reshape a data.table from long to wide format.
dt – A data.table.
id ~ y – Formula with a LHS: ID columns containing IDs for multiple entries. And a RHS: columns with values to spread in column headers.
value.var – Columns containing values to fill into cells.

RESHAPE TO LONG FORMAT
melt(dt,
     id.vars = c("id"),
     measure.vars = patterns("^a", "^b"),
     variable.name = "y",
     value.name = c("a", "b"))
Reshape a data.table from wide to long format.
dt – A data.table.
id.vars – ID columns with IDs for multiple entries.
measure.vars – Columns containing values to fill into cells (often in pattern form).
variable.name, value.name – Names of new columns for variables and values derived from old headers.

Apply function to cols.

APPLY A FUNCTION TO MULTIPLE COLUMNS
dt[, lapply(.SD, mean), .SDcols = c("a", "b")] – apply a function – e.g. mean(), as.character(), which.max() – to columns specified in .SDcols with lapply() and the .SD symbol. Also works with groups.
cols <- c("a")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols] – apply a function to specified columns and assign the result with suffixed variable names to the original data.

Sequential rows

ROW IDS
dt[, c := 1:.N, by = b] – within groups, compute a column with sequential row IDs.

LAG & LEAD
dt[, c := shift(a, 1), by = b] – within groups, duplicate a column with rows lagged by specified amount.
dt[, c := shift(a, 1, type = "lead"), by = b] – within groups, duplicate a column with rows leading by specified amount.

read & write files

IMPORT
fread("file.csv") – read data from a flat file such as .csv or .tsv into R.
fread("file.csv", select = c("a", "b")) – read specified columns from a flat file into R.

EXPORT
fwrite(dt, "file.csv") – write data to a flat file from R.
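
The JOIN and ROLLING JOIN entries can be checked on small tables. The sketch below takes its values from the cheat sheet's join diagrams. Writing the dates in ISO form is an assumption, and the names roll_a, roll_b, joined, rolled and backward are invented here.

```r
library(data.table)

# Equi join: for each row of dt_b, look up the row of dt_a where b equals y.
dt_a <- data.table(a = 1:3, b = c("c", "a", "b"))
dt_b <- data.table(x = 3:1, y = c("b", "c", "a"))
joined <- dt_a[dt_b, on = .(b = y)]

# Rolling join: match on id, then roll to the most recent preceding date.
roll_a <- data.table(
  a = c(1, 2, 3, 1, 2),
  id = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2010-01-01", "2012-01-01", "2014-01-01",
                   "2010-01-01", "2012-01-01"))
)
roll_b <- data.table(
  b = c(1, 1),
  id = c("A", "B"),
  date = as.Date(c("2013-01-01", "2013-01-01"))
)
rolled <- roll_a[roll_b, on = .(id = id, date = date), roll = TRUE]

# roll = -Inf reverses direction and takes the next following date instead.
backward <- roll_a[roll_b, on = .(id = id, date = date), roll = -Inf]
```

Here rolled picks a = 2 for both ids (the 2012 rows), while backward picks a = 3 for id A and NA for id B, because B has no row after 2013.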

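The dcast() and melt() calls in the reshape section can be run as a round trip. This is a minimal sketch that uses the values from the cheat sheet's reshape diagrams. The table names long, wide and long_again are invented for this example.

```r
library(data.table)

# Long-format table with one row per id and y combination.
long <- data.table(
  id = c("A", "A", "B", "B"),
  y = c("x", "z", "x", "z"),
  a = c(1, 2, 1, 2),
  b = c(3, 4, 3, 4)
)

# Long to wide: one row per id, one column per value column and y level.
wide <- dcast(long, id ~ y, value.var = c("a", "b"))

# Wide to long: patterns() groups the a_* and b_* columns into two value columns.
long_again <- melt(
  wide,
  id.vars = c("id"),
  measure.vars = patterns("^a", "^b"),
  variable.name = "y",
  value.name = c("a", "b")
)
```

The wide table has the columns id, a_x, a_z, b_x and b_z. After melting with patterns(), the y column holds the positions 1 and 2 rather than the original labels x and z, so the round trip does not restore the labels on its own.
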