Data Algorithms

by Mahmoud Parsian
July 2015
Intermediate to advanced
778 pages
17h 9m
English
O'Reilly Media, Inc.

Chapter 22. The T-Test

The t-test (also known as the two-sample t-test) is used in clinical applications and genome analysis to test statistical hypotheses. The t-test for independent samples compares the means (μ, also known as the averages) of two samples. In statistics, to compare two data sets, we reduce each one to a simpler summary, such as its mean, and then compare those summaries. Since we are comparing random samples, there is room for random error (usually quantified by the standard deviation, σ). The standard deviation of a population of N values is defined as:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

where:

  • σ = the standard deviation
  • N = the number of values in the population
  • X_i = the ith value in the population
  • μ = the mean of the values in the population
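
As a quick illustration of the population formula, the following minimal Python sketch computes σ directly from its definition. The function name `population_std` and the sample values are made up for this example and do not come from the book.

```python
import math


def population_std(values):
    """Population standard deviation: divides by N, not N - 1."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n  # mean of the population
    # Sum of squared deviations from the mean
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)


# Hypothetical data: mean is 5, squared deviations sum to 32, so sigma = sqrt(32 / 8)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))  # 2.0
```

Python's standard library offers the same computation as `statistics.pstdev`, which is usually the better choice outside of a teaching example.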

Factoring in random error, therefore, we might be comparing μ ± σ. According to Sarah Boslaugh’s book Statistics in a Nutshell (O’Reilly), “The purpose of [the t-test] is to determine whether the means of the populations from which the samples were drawn are the same. The subjects in the two samples are assumed to be unrelated and to have been independently selected from their populations.”
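
To make the comparison of two independent samples concrete, here is a minimal sketch of a two-sample t statistic in Python. The text above does not say which variant of the test is meant, so this sketch assumes Welch's unequal-variance form; the function name `welch_t` and the sample values are hypothetical. It returns only the statistic and the approximate degrees of freedom, since turning those into a p-value needs a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # statistics.variance is the sample variance (divides by n - 1)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    # Difference of the sample means, scaled by its standard error
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements for two unrelated groups
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)
```

A value of t near zero suggests the sample means are close relative to their variability, while a large absolute value suggests the population means may differ.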

This chapter provides MapReduce/Hadoop and Spark solutions for the t-test. The MapReduce algorithm presented here is generic and can be applied to large volumes of data.

Performing the T-Test on Biosets

In genome analysis and especially in somatic mutations, the t-test ...

ISBN: 9781491906170