Tips: manipulating datasets

> mtcars %>% count(carb)
  carb  n
1    1  7
2    2 10
3    3  3
4    4 10
5    6  1
6    8  1

Function count: counts the number of rows in each group defined by the given variable(s).
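As a rough illustration of what count does, the sketch below compares it with an explicit group_by() followed by summarise(). The names by_count and by_summarise are only illustrative, and dplyr is assumed to be installed.

```r
library(dplyr)

# count(carb) tallies the rows for each distinct carb value.
by_count <- mtcars %>% count(carb)

# The same tally written out by hand: group, then count rows with n().
by_summarise <- mtcars %>%
  group_by(carb) %>%
  summarise(n = n())

# Both approaches should give the same group sizes.
identical(by_count$n, by_summarise$n)
```

count() also accepts several variables at once, e.g. count(cyl, gear), to tally each combination.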

Function map: applies a function to each element of a vector or list and returns a list.

.x: a list or atomic vector.

mtcars %>%
    split(.$cyl) %>%
    map(~ lm(mpg ~ wt, data = .x)) 

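~ : shorthand for an anonymous function (purrr's formula syntax). Inside it, .x is the current element, here the data subset for each cyl value (4, 6, 8), so one model is fitted per subset.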
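To make the result of the pipeline above more concrete, here is a small sketch that stores the fitted models and pulls the wt slope out of each one. The names models and slopes are illustrative, and dplyr and purrr are assumed to be loaded.

```r
library(dplyr)
library(purrr)

# Fit one linear model per cyl group; split() names the list by cyl value.
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x))

names(models)

# Extract the wt coefficient from each model as a named double vector.
slopes <- map_dbl(models, ~ coef(.x)[["wt"]])
slopes
```

The formula ~ lm(mpg ~ wt, data = .x) behaves like function(d) lm(mpg ~ wt, data = d); the shorthand just saves typing.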

> mtcars %>% count(cyl) %>% mutate(n_samples = n %>% map(~ seq(1, .x, length.out = 10) %>% round()))
  cyl  n                        n_samples
1   4 11   1, 2, 3, 4, 5, 7, 8, 9, 10, 11
2   6  7     1, 2, 2, 3, 4, 4, 5, 6, 6, 7
3   8 14 1, 2, 4, 5, 7, 8, 10, 11, 13, 14

Here count(cyl) gives the sample size n for cyl 4, 6 and 8. For each n, seq(1, n, length.out = 10) produces 10 evenly spaced values from 1 to n, which are rounded and stored in a new list-column n_samples.
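To see where the values in n_samples come from, the sketch below runs the seq() and round() steps by hand for the cyl = 4 group (n = 11). The variable name raw is illustrative only.

```r
# 10 evenly spaced values from 1 to 11, before rounding.
raw <- seq(1, 11, length.out = 10)
raw

# After rounding, this matches the n_samples entry for cyl = 4 above.
round(raw)
```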

If I change length.out to 2, the result is as follows:

> mtcars %>% count(cyl) %>% mutate(n_samples = n %>% map(~ seq(1, .x, length.out = 2) %>% round()))
  cyl  n n_samples
1   4 11     1, 11
2   6  7      1, 7
3   8 14     1, 14

Function unnest

unnest() expands a list-column so that each of its elements gets its own row.

> mtcars %>% count(cyl) %>% mutate(n_samples = n %>% map(~ seq(1, .x, length.out = 2) %>% round())) %>% select(cyl, n_samples) %>% unnest(n_samples)
# A tibble: 6 × 2
    cyl n_samples
  <dbl>     <dbl>
1     4         1
2     4        11
3     6         1
4     6         7
5     8         1
6     8        14
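A smaller, self-contained sketch of the same idea may help: a tibble with a list-column is flattened so that every list element becomes a row. The data and the names df and long are made up for illustration; tibble and tidyr are assumed to be installed.

```r
library(tibble)
library(tidyr)

# vals is a list-column: row "a" holds 2 values, row "b" holds 3.
df <- tibble(id = c("a", "b"), vals = list(1:2, 3:5))

# unnest() repeats id once per element of vals.
long <- unnest(df, vals)
long
```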

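Function expand_grid

expand_grid() builds every combination of its inputs. It varies the last argument fastest, then the second-last, and so on, whereas base expand.grid() varies the first argument fastest. It should be faster than expand.grid.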

> expand.grid(x = c("a", "b", "c"), y = c(1, 2), z = c(TRUE, FALSE))
   x y     z
1  a 1  TRUE
2  b 1  TRUE
3  c 1  TRUE
4  a 2  TRUE
5  b 2  TRUE
6  c 2  TRUE
7  a 1 FALSE
8  b 1 FALSE
9  c 1 FALSE
10 a 2 FALSE
11 b 2 FALSE
12 c 2 FALSE

> mtcars %>% count(cyl) %>% mutate(n_samples = n %>% map(~ seq(1, .x, length.out = 2) %>% round())) %>% select(cyl, n_samples) %>% unnest(n_samples) %>% expand_grid(trail = seq(100))
# A tibble: 600 × 3
     cyl n_samples trail
   <dbl>     <dbl> <int>
 1     4         1     1
 2     4         1     2
 3     4         1     3
 4     4         1     4
 5     4         1     5
 6     4         1     6
 7     4         1     7
 8     4         1     8
 9     4         1     9
10     4         1    10
# … with 590 more rows
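The difference in row ordering between the two functions can be checked on a tiny input. This is only a sketch; base_grid and tidy_grid are illustrative names, and tidyr is assumed to be installed.

```r
library(tidyr)

# Base R: the first argument (x) varies fastest.
base_grid <- expand.grid(x = c("a", "b"), y = 1:2, stringsAsFactors = FALSE)

# tidyr: the last argument (y) varies fastest, so x changes slowest.
tidy_grid <- expand_grid(x = c("a", "b"), y = 1:2)

base_grid$x
tidy_grid$x
```

expand_grid() also returns a tibble and never converts strings to factors, which is why stringsAsFactors = FALSE is set on the base call to keep the comparison fair.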

Function map2: applies a function to pairs of elements taken from two vectors in parallel.

> x <- list(1, 1, 1)
> y <- list(10, 20, 30)
> x
[[1]]
[1] 1

[[2]]
[1] 1

[[3]]
[1] 1

> y
[[1]]
[1] 10

[[2]]
[1] 20

[[3]]
[1] 30

> map2(x, y, `+`)
[[1]]
[1] 11

[[2]]
[1] 21

[[3]]
[1] 31

.x, .y: vectors of the same length. A vector of length 1 will be recycled.
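The recycling rule and the .x / .y pronouns can be tried out with a short sketch like the following. The names recycled and products are illustrative; purrr is assumed to be installed.

```r
library(purrr)

# The length-1 first input is recycled against the three values of the second.
recycled <- map2(1, c(10, 20, 30), ~ .x + .y)
recycled

# map2_dbl() is the typed variant: it returns a double vector instead of a list.
products <- map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x * .y)
products
```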

pull: extracts a single column as a vector

> mtcars %>% pull(mpg)
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
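For contrast, the sketch below compares pull() with select() on the same column: the former gives a bare vector, the latter keeps a data frame. The names as_vector and as_frame are illustrative; dplyr is assumed to be installed.

```r
library(dplyr)

# pull() returns the column itself as a plain vector.
as_vector <- mtcars %>% pull(mpg)

# select() keeps the data frame structure, with one column.
as_frame <- mtcars %>% select(mpg)

is.numeric(as_vector)
is.data.frame(as_frame)

# pull(mpg) should give the same values as mtcars$mpg.
identical(as_vector, mtcars$mpg)
```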

map_dbl

> map_dbl(mtcars, mean)
       mpg        cyl       disp         hp       drat         wt       qsec         vs         am       gear       carb
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750   0.437500   0.406250   3.687500   2.812500

map() returns a list, whereas map_dbl() returns a double vector.
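The difference in return type can be checked directly, and the typed variant is also stricter: each call must return exactly one value. The following is a sketch with illustrative names (as_list, as_dbl, strict); purrr is assumed to be installed.

```r
library(purrr)

as_list <- map(mtcars, mean)      # list with one element per column
as_dbl  <- map_dbl(mtcars, mean)  # named double vector

is.list(as_list)
is.double(as_dbl)

# range() returns two values per column, so map_dbl() raises an error,
# which is caught here and turned into a short message.
strict <- tryCatch(
  map_dbl(mtcars, range),
  error = function(e) "map_dbl needs length-1 results"
)
strict
```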

e.g., from 9 Functionals | Advanced R:

# map_chr() always returns a character vector
map_chr(mtcars, typeof)
#>      mpg      cyl     disp       hp     drat       wt     qsec       vs 
#> "double" "double" "double" "double" "double" "double" "double" "double" 
#>       am     gear     carb 
#> "double" "double" "double"

# map_lgl() always returns a logical vector
map_lgl(mtcars, is.double)
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

# map_int() always returns an integer vector
n_unique <- function(x) length(unique(x))
map_int(mtcars, n_unique)
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

# map_dbl() always returns a double vector
map_dbl(mtcars, mean)
#>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear 
#>  20.091   6.188 230.722 146.688   3.597   3.217  17.849   0.438   0.406   3.688 
#>    carb 
#>   2.812

Function first

first() returns the first element of a vector. See: first function - RDocumentation

> first(mtcars$mpg)
[1] 21
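Assuming the first() used here is the one from dplyr, it has two companions, last() and nth(), that work the same way. A minimal sketch (the variable name mpg is just a local copy of the column):

```r
library(dplyr)

mpg <- mtcars$mpg

first(mpg)    # same as mpg[1]
last(mpg)     # same as mpg[length(mpg)]
nth(mpg, 2)   # second element
```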

otu_table: Build or access the otu_table

taxa_are_rows: (Conditionally optional). Logical; of length 1. Ignored unless object is a matrix, in which case it is required.

estimate_richness: Summarize alpha diversity

measures: (Optional). Default is NULL, meaning that all available alpha-diversity measures will be included. Alternatively, you can specify one or more measures as a character vector of measure names. Values must be among those supported: c("Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson", "Fisher").
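The two phyloseq functions above can be combined in a small sketch. The count matrix below is made up purely for illustration (3 taxa in rows, 3 samples in columns), the names counts, otu and richness are hypothetical, and the phyloseq package from Bioconductor is assumed to be installed.

```r
library(phyloseq)

# Toy abundance matrix: rows are taxa, columns are samples (invented data).
counts <- matrix(
  c(10, 0, 1,
     5, 3, 0,
     1, 7, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:3))
)

# taxa_are_rows is required here because the input is a plain matrix.
otu <- otu_table(counts, taxa_are_rows = TRUE)

# Restrict the output to two of the supported measures.
richness <- estimate_richness(otu, measures = c("Observed", "Shannon"))
richness
```

The result should be a data frame with one row per sample and one column per requested measure.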

Ref: 

unnest function - RDocumentation

map function - RDocumentation

map2: Map over multiple inputs simultaneously. in purrr: Functional Programming Tools

estimate_richness function - RDocumentation

otu_table function - RDocumentation

set Chinese input 

language support - How do I get Chinese input to work? - Ask Ubuntu
