This cheat sheet summarizes functions in the fasstr package for analyzing streamflow data in R. It is divided into sections for tidying data, calculating summary statistics, computing cumulative statistics, and visualizing results. The tidying section includes functions for filling missing dates, adding date variables and identifiers. The basic summary section calculates mean, median, maximum and percentiles for daily, monthly and annual time periods. The cumulative section totals flows by year or month. Annual statistics functions compute timing of flows and low flow periods.

Streamflow data analysis with fasstr : : CHEAT SHEET

Getting Started

fasstr, Flow Analysis Summary Statistics Tool for R, is a package for tidying, summarizing, performing hydrologic analyses, and visualizing daily streamflow data.

Install fasstr from CRAN using:
install.packages("fasstr")

To use the station_number function argument, a Water Survey of Canada HYDAT database must be downloaded using:
tidyhydat::download_hydat()

Function Usage

fasstr functions can be generally categorized into the following groups:
• Tidying - preparing data for analyses; add_* and fill_* functions.
• Screening - to look for outliers and missing data; screen_* functions.
• Calculating summary statistics - long-term, annual, monthly and daily statistics; calc_* functions.
• Visualizing summary statistics - plotting the various statistics; plot_* functions.
• Computing analyses - volume frequency analyses and trending; compute_* functions.
• Writing data and plots - to save your data and plots; write_* functions.

Getting Data

There are two argument options in most functions to choose a data source:
1. data: Data frame of daily data with dates (YYYY-MM-DD), flow values, and optional groupings. ‘data’ is the first argument listed to allow for piping (%>%). Arguments for selecting columns in the data frame:
• dates: Dates column, default ‘Date’.
• values: Flow values column, default ‘Value’.
• groups: Groupings columns (optional), default ‘STATION_NUMBER’.
2. station_number: Extracts daily data from a HYDAT database using a vector of HYDAT station numbers (ex. "08NM116" or c("08NM116", "08FA002")); downloaded HYDAT required.

Example data with default column names:
STATION_NUMBER  Date        Value
08NM116         1987-04-06  6.230
08NM116         1987-04-07  6.440

Function Outputs

All outputs from fasstr functions are one, or lists, of the following:
• All data tables / data frames are produced as tibbles.
• All plots are produced as lists of ggplot2 objects.

Data Tidying

These functions add rows and columns to daily streamflow data frames to prepare for custom analyses.

fill_missing_dates()
Fill missing dates with rows whose flow values are NA.

add_date_variables(water_year_start = 1)
Add ‘Year’, ‘Month’, ‘MonthName’, ‘WaterYear’, and ‘DayofYear’ columns. ‘WaterYear’ and ‘DayofYear’ adjust to the year start selected with the water_year_start argument.

add_seasons(seasons_length)
Add a column of season identifiers called ‘Season’, with the length of seasons in months chosen with the seasons_length argument; seasons start in the first month of the year.

add_rolling_means(roll_days, roll_align)
Add columns of rolling daily flow means (ex. 7-day means).

add_basin_area(basin_area)
Add a basin area column, in square kilometres. See basin_area argument on reverse of cheat sheet.

add_daily_volume()
Add daily volumetric flows, converted from daily mean to cubic metres.

add_daily_yield(basin_area)
Add daily yields, converted from daily mean to millimetres based on upstream basin area.

add_cumulative_volume()
Add daily cumulative volumetric flows on an annual basis, in cubic metres.

add_cumulative_yield()
Add daily cumulative runoff yield flows on an annual basis, in millimetres based on upstream basin area.

Data Screening

These functions calculate and plot statistics to screen data for outliers, gaps, and missing dates.

screen_flow_data()
Calculate annual mean, maximum, minimum, standard deviation, and missing dates.

plot_data_screening()
Plot annual mean, maximum, minimum, and standard deviation.

plot_missing_dates()
Plot the number of missing dates for each month and year.

plot_flow_data(plot_by_year = FALSE, one_plot = TRUE)
Plot the daily mean flows over all years.

Basic Summary Statistics

These functions calculate and plot the mean, median, maximum, minimum, and selected percentiles using the ‘percentiles’ argument. Can select duration of statistics (ex. 7-day) using ‘roll_days’ and ‘roll_align’ arguments.

calc_longterm_daily_stats()
plot_longterm_daily_stats()
calc_longterm_monthly_stats()
plot_longterm_monthly_stats()
Statistics for all daily and monthly data for each month over all years.

calc_annual_stats()
plot_annual_stats()
Statistics for each year.

calc_daily_stats()
plot_daily_stats(add_year)
Statistics for each day of the year over all years.

calc_monthly_stats()
plot_monthly_stats()
Statistics for each month of each year.

Cumulative Statistics

These functions calculate and plot the total flows for years by volume (m³) or by area-based yield (mm) using the ‘use_yield’ and ‘basin_area’ arguments.

calc_annual_cumulative_stats(include_seasons = TRUE)
plot_annual_cumulative_stats(include_seasons = TRUE)
Total annual cumulative discharge for each year (option to include seasonal totals).

calc_monthly_cumulative_stats()
plot_monthly_cumulative_stats(add_year)
Cumulative monthly statistics for each month over all years.

calc_daily_cumulative_stats()
plot_daily_cumulative_stats(add_year)
Cumulative daily statistics for each day of year over all years.

Long-term Statistics

These functions calculate and plot various long-term statistics outside of the basic summary statistics.

plot_flow_duration()
Plot flow duration curves for each month and annually over all years.

calc_longterm_mean(percent_MAD)
Calculate the mean discharge over all years with options to include percentages of the long-term mean.

calc_longterm_percentile(percentiles)
Calculate percentile flow values over all years.

calc_flow_percentile(flow_value)
Calculate the percentile rank of a specific flow value from the data set.

Annual Statistics

These functions calculate and plot various annual statistics beyond the basic summary statistics.

calc_annual_flow_timing(percent_total = c(25,33.3,50,75))
plot_annual_flow_timing(percent_total = c(25,33.3,50,75))
Calculate the day of year when portions of total annual flows have occurred (ex. timing of half flows).

calc_annual_lowflows(roll_days = c(1,3,7,30))
plot_annual_lowflows(roll_days = c(1,3,7,30))
Calculate the values and day of occurrence for annual minimum flow values. Multiple ‘roll_days’ allowed.

calc_annual_outside_normal(normal_percentiles = c(25,75))
plot_annual_outside_normal(normal_percentiles = c(25,75))
Calculate the number of days per year that fall above or below "normal", with the "normal" range based on the percentile values provided.

plot_annual_means()
Plot annual mean flows with the x-axis centred on the long-term mean.

calc_annual_peaks()
Calculate the values and day of occurrence for annual n-day (using ‘roll_days’) minimum and maximum flow values.

calc_all_annual_stats(annual_percentiles = c(10,90), monthly_percentiles = c(10,20), stats_days = 1, lowflow_days = c(1,3,7,30), timing_percent = c(25,33.3,50,75), normal_percentiles = c(25,75))
Calculate all statistics from all calc_annual_* and calc_monthly_stats() functions.
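The tidying functions can be tried without a HYDAT download by passing a data frame through the data argument. The following is a minimal sketch under stated assumptions: the station ID "STN001" and the flow values are synthetic and purely illustrative, and the rolling-mean column name (Q7Day) is the name the package is understood to generate for a 7-day mean.

```r
library(fasstr)

# Synthetic daily flows for a hypothetical station (not a real HYDAT ID),
# using the default column names STATION_NUMBER, Date and Value
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
flows <- data.frame(
  STATION_NUMBER = "STN001",
  Date = dates,
  Value = round(10 + 5 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
                  runif(length(dates)), 3)
)

# Remove ten days to simulate a gap in the record
flows_gappy <- flows[-(100:109), ]

# 1. Restore the missing dates as rows with NA flow values
filled <- fill_missing_dates(flows_gappy)

# 2. Add year, month, water year and day-of-year columns
#    (water years starting in October here)
dated <- add_date_variables(filled, water_year_start = 10)

# 3. Add a 7-day rolling mean aligned to the last day of each window
tidy_flows <- add_rolling_means(dated, roll_days = 7, roll_align = "right")

head(tidy_flows)
```

Because data is the first argument of each function, the three steps can equally be chained with a pipe.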

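As an illustration of the calc_* and plot_* pairing described for the summary statistics, the sketch below computes annual statistics, a long-term mean, and the matching annual plot from a synthetic data frame. The station ID and flow values are made up for the example, and the choice of ignore_missing = TRUE for the 7-day case is an assumption made so that the incomplete first rolling window does not blank out the first year.

```r
library(fasstr)

# Synthetic five-year daily record for a hypothetical station
set.seed(42)
dates <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), by = "day")
flows <- data.frame(
  STATION_NUMBER = "STN001",
  Date = dates,
  Value = round(10 + 5 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
                  runif(length(dates)), 3)
)

# Annual mean, median, maximum, minimum and the 10th/90th percentiles
annual <- calc_annual_stats(flows, percentiles = c(10, 90))

# The same statistics on 7-day rolling means; the first six days of the
# record have no complete window, so missing values are ignored here
annual_7day <- calc_annual_stats(flows,
                                 roll_days = 7,
                                 roll_align = "right",
                                 percentiles = c(10, 90),
                                 ignore_missing = TRUE)

# Long-term mean discharge plus 10% and 20% of that mean
lt_mean <- calc_longterm_mean(flows, percent_MAD = c(10, 20))

# plot_* functions return a list of ggplot2 objects
annual_plots <- plot_annual_stats(flows)

annual
lt_mean
```

Printing an element of annual_plots draws the corresponding figure.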
Learn more at https://bcgov.github.io/fasstr/ • fasstr 0.3.2 • Updated: 2020-11


Arguments and Options

These arguments are used to customize many of the functions. Not all are listed; see function documentation for more specific argument information.

Date Filtering and Options

ignore_missing: Logical value indicating whether dates with missing values should be included in the analysis. If TRUE then a statistic will be calculated regardless of missing dates. If FALSE then only statistics with no missing dates will be returned.

water_year_start: Numeric value indicating the starting month (1 through 12) of years to filter/group data instead of calendar years, designated by the calendar year in which the year ends; default 1.

start_year and end_year: Numeric values of the first and last year to consider for analysis. Leave blank to include all years of data provided.

exclude_years: Numeric vector of years to exclude from analysis; ex. c(1991:1993, 1995). Leave blank to include all years of data provided.

complete_years: Logical value indicating whether to only include years with complete data in analysis. Only in selected analyses; default FALSE.

months: Numeric vector of months to include in analysis; default 1:12.

Data Analysis Options

roll_days: Numeric value (or values for some functions) of the number of days to apply a rolling mean; default 1.

roll_align: Character string identifying the direction of the rolling mean from the specified date, either by the first ('left'), last ('right'), or middle ('center') day of the rolling n-day group of observations; default 'right'.

use_yield: Logical value indicating to use area-based yield, in mm, instead of volumetric for cumulative analysis functions; default FALSE. Requires basin_area.

basin_area: Drainage basin area, in square km, to use when use_yield = TRUE. Three options: 1) leave blank if there is a column of HYDAT station numbers; 2) single numeric value to apply to all observations; 3) list each basin area for each station, c("08NM116" = 795, "08NM242" = 10), to supply an area or override the HYDAT supplied area.

percentiles: Numeric vector of percentiles to calculate, ex. c(5,25,75,95). Set to NA if none required.

Tibble Options

transpose: Logical value indicating whether to transpose rows and columns of results; default FALSE.

Plotting Options

log_discharge: Logical value to indicate plotting the discharge axis on a logarithmic scale; default FALSE.

include_title: Logical value to indicate adding the group/station number to the plot, if provided.

add_year: Numeric value indicating a year of daily flows to add to the daily and long-term statistics plot.

Writing Functions

These functions help save the outputted objects (tibbles and lists of plots) from the fasstr functions.

write_flow_data()
Write a streamflow dataset as a .xlsx, .xls, or .csv file. Can extract and save HYDAT data with this function.

write_results(digits = 10)
Write a data frame as a .xlsx, .xls, or .csv file. Can save a data frame and round digits of all numeric columns.

write_plots(plots, folder_name, plot_filetype, combined_pdf)
Write plots from a list object into a directory or PDF document. By default will save all plots in a folder. To create a PDF of all plots, set combined_pdf = TRUE.

write_objects_list(list, folder_name, table_filetype, plot_filetype)
Write all tables and plots contained in a list object into a folder. Saves only data frames and ggplot2 objects.

Annual Trending Analysis

This function computes and plots prewhitened, non-parametric annual trends on streamflow data.

This function calculates prewhitened, non-parametric annual trends using the ‘zyp’ package. It calculates various annual metrics using the calc_all_annual_stats() function and then calculates and plots the trends. See the zyp package, function documentation, and the trending vignette for more information on the analysis.

Function

compute_annual_trends()
Calculate prewhitened nonlinear annual trends on streamflow data.

Arguments

zyp_method: Prewhitening method, either ‘yuepilon’ or ‘zhang’. See zyp methodology for more information.

include_plots: Logical value indicating if annual trending plots should be included. Default TRUE.

zyp_alpha: Numeric value of the significance level (ex. 0.05) of when to plot a trend line. Leave blank for no line.

Outputs

$Annual_Trends_Data: A tibble of annual data from the calc_all_annual_stats() function used for trending.
$Annual_Trends_Results: A tibble of annual trending results, including significance, confidence intervals, trend values, etc.
$Annual_*: A ggplot2 object for each annual statistic trended, with the slope plotted if the trend is significant at the ‘zyp_alpha’ level provided.

Volume Frequency Analyses

These functions compute and plot volume frequency analyses on annual low or high streamflow data.

These functions perform volume frequency analyses by fitting annual minimums or maximums to Log-Pearson Type III or Weibull probability distributions. They plot probabilities of data using chosen plotting methods and calculate frequency quantiles (ex. 7Q10) based on fitting data to selected distributions and fitting methods. See function documentation for more information.

Functions

compute_annual_frequencies()
Annual frequency analysis from daily streamflow data; calculates minimums or maximums of selected roll_days.

compute_frequency_quantile()
Annual frequency analysis from daily streamflow data; calculates minimums or maximums of selected roll_days and return_period. Quantile value is returned.

compute_hydat_peak_frequencies()
Annual frequency analysis from instantaneous peak data (minimum or maximum) for stations from HYDAT. Data selected using station_number argument.

compute_frequency_analysis()
Conduct a frequency analysis with custom data.

Arguments

use_max: Rank data from high to low rather than low to high (for peak analyses); default FALSE.

use_log: Log-transform event data before analysis; default FALSE.

prob_plot_positions: Plotting positions used to plot the probabilities; ‘weibull’ (default), ‘hazen’, or ‘median’.

prob_scale_points: Probabilities to be plotted on the x-axis; default c(.9999, .999, .99, .9, .5, .2, .1, .02, .01, .001, .0001).

fit_distr: Distribution used to fit the data; one of Log-Pearson Type III, ‘PIII’ (default), or Weibull, ‘weibull’.

fit_dist_method: Method used to fit the data to the distribution; one of method of moments, ‘MOM’ (default), or maximum likelihood estimation, ‘MLE’.

fit_quantiles: Quantiles to be estimated from the fitted distributions (event probabilities); default c(.975, .99, .98, .95, .90, .80, .50, .20, .10, .05, .01).

plot_curve: Plot the computed curve on the plot; default TRUE.

Outputs

$Freq_Analysis_Data: Tibble of computed or extracted data used in analysis.
$Freq_Plot_Data: Tibble of plotting coordinates used in the frequency plot.
$Freq_Plot: ggplot2 object of the frequency plot with return periods and probabilities.
$Freq_Fitting: fitdistrplus::fitdist objects of fitted distributions.
$Freq_Fitted_Quantiles: Tibble of fitted quantiles with probabilities and return periods.

Computing Full Analyses

These functions calculate a suite of data and plots from many of the fasstr functions.

These functions calculate many of the data and plot analyses from the fasstr functions, producing tables and plots organized by analysis types. See the function documentation for more information.

Functions

compute_full_analysis()
Computes a suite of analyses from fasstr functions and produces assorted tables and plots organized in lists grouped by time period and analysis type.

write_full_analysis()
Writes the compute_full_analysis() objects into an Excel workbook and accompanying plot files.

Arguments

analyses: Numeric vector of the analyses to include; default is all (1:7). Include the analyses for which statistics are desired: 1: Screening, 2: Long-term, 3: Annual, 4: Monthly, 5: Daily, 6: Trending, 7: Low-flow Frequencies.

Writing Arguments

file_name: Name of Excel workbook, and plots folder if necessary, to save analysis results.

Outputs

$Screening: List of table and plot objects to review and screen data.
$Longterm: List of table and plot objects from long-term statistics, including summary statistics and flow duration.
$Annual: List of table and plot objects from annual statistics, including summary and cumulative statistics, and other annual metrics.
$Monthly: List of table and plot objects from monthly statistics, including summary and cumulative statistics.
$Daily: List of table and plot objects from daily statistics, including summary and cumulative statistics.
$Trending: List of table and plot objects from an annual trending analysis.
$Lowflow_Frequencies: List of table and plot objects from a low-flow frequency analysis.

Writing Outputs

Excel ‘.xlsx’ workbook containing tables and plots from selected analyses, and a folder of plots if Daily and/or Trending analyses are computed.

Further Help

There are several vignettes that provide more information and examples of the many fasstr functions:
• Users Guide
• Trending Analysis Guide
• Frequency Analysis Guide
• Full Analysis Guide
• fasstr Internal Workflows

See https://bcgov.github.io/fasstr/ or view them in R using browseVignettes("fasstr")

