Basic Statistics

Random Variable
Sample
Distribution
Law of large numbers
Central limit theorem
Normal distribution
Expected value
Variance
Bernoulli trials
Binomial distribution

Random Variable

A random variable is a function defined on the sample space of a random experiment; a real-valued random variable assigns a real number to each outcome of the experiment.
 

Sample

The Sample Mean

Consider a random experiment with a sample space and a probability measure P. Suppose that X is a real-valued random variable. We will denote the mean and standard deviation of X by mu and d, respectively.

Now suppose we perform independent replications of the basic experiment. This defines a new, compound experiment with a sequence of independent random variables, each with the same distribution as X:

X1, X2, ...,

In statistical terms, for each n, (X1, X2, ..., Xn) is a random sample of size n from the distribution of X. The sample mean is simply the average of the variables in the sample:

Mn = (X1 + X2 + . . . + Xn) / n.

The sample mean is a real-valued function of the random sample and thus is a statistic. Like any statistic, the sample mean is itself a random variable with a distribution, mean, and variance of its own. Many times, the distribution mean is unknown and the sample mean is used as an estimator of the distribution mean.

Properties of the Sample Mean

 E(Mn) = mu

This shows that Mn is an unbiased estimator of mu. Consequently, when the sample mean is used as an estimator of the distribution mean, its variance is also its mean square error.

 var(Mn) = d^2 / n.
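These two properties are easy to check numerically. The following is a minimal Python sketch (added here for illustration, assuming a uniform(0, 1) distribution for X, so that mu = 1/2 and d^2 = 1/12); it estimates the mean and variance of Mn from many replications and compares them with mu and d^2 / n:

import random

n = 30          # sample size
reps = 100000   # number of replications of the compound experiment

sample_means = []
for _ in range(reps):
    sample = [random.random() for _ in range(n)]   # uniform(0, 1) draws
    sample_means.append(sum(sample) / n)           # Mn for this replication

mean_of_Mn = sum(sample_means) / reps
var_of_Mn = sum((m - mean_of_Mn) ** 2 for m in sample_means) / reps

print(mean_of_Mn)   # should be close to mu = 0.5
print(var_of_Mn)    # should be close to d^2 / n = (1/12) / 30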

Distribution

Discrete Densities

Consider a random experiment with a sample space and a probability measure P. A random variable X for the experiment that takes values in a countable set S is said to have a discrete distribution. The (discrete) probability density function of X is the function f from S to R defined by

f(x) = P(X = x) for x in S.

Basic Properties

 f satisfies the following properties:

   1. f(x) >= 0 for x in S.
   2. [sum] x in S f(x) = 1
   3. [sum] x in A f(x) = P(X [in] A) for A subset of S

Property 3 is particularly important since it shows that the probability distribution of a discrete random variable is completely determined by its density function.
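As a concrete check of these properties (a short Python sketch added for illustration, using a fair six-sided die as the assumed distribution, so f(x) = 1/6 for x in S = {1, ..., 6}):

from fractions import Fraction

S = range(1, 7)                      # the countable value set of X (a fair die)
f = {x: Fraction(1, 6) for x in S}   # discrete density: f(x) = P(X = x)

# Property 1: f(x) >= 0 for all x in S
assert all(f[x] >= 0 for x in S)

# Property 2: the values of f sum to 1
assert sum(f[x] for x in S) == 1

# Property 3: P(X in A) is the sum of f over A, for example A = {2, 4, 6}
A = {2, 4, 6}
print(sum(f[x] for x in A))          # P(X in A) = 1/2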
 

Law of Large Numbers

The Weak Law of Large Numbers

Because

var(Mn) = d^2 / n,

var(Mn) [converges to] 0 as n [converges to] [infinity]. This means that Mn [converges to] mu as n [converges to] [infinity] in mean square.
Note: a sequence Xn converges to X in mean square if E(|Xn - X|^2) [converges to] 0 as n [converges to] [infinity].

Then, by Chebyshev's inequality,

P[|Mn - mu| > r] [converges to] 0 as n [converges to] [infinity] for any r > 0.

This result is known as the weak law of large numbers, and states that the sample mean converges to the mean of the distribution in probability. Recall that in general, convergence in mean square implies convergence in probability.
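The convergence can be seen numerically. The sketch below (an illustration in Python, assuming a fair-coin indicator as the underlying variable, so mu = 1/2 and d^2 = 1/4) estimates P(|Mn - mu| > r) by simulation for increasing n and compares it with the Chebyshev bound d^2 / (n r^2):

import random

mu, var = 0.5, 0.25    # mean and variance of a single fair-coin indicator
r = 0.05
reps = 2000            # replications used to estimate the probability

for n in [10, 100, 1000]:
    count = 0
    for _ in range(reps):
        Mn = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(Mn - mu) > r:
            count += 1
    # estimated P(|Mn - mu| > r) versus the Chebyshev bound d^2 / (n r^2)
    print(n, count / reps, var / (n * r * r))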

The Strong Law of Large Numbers

The strong law of large numbers states that the sample mean Mn converges to the distribution mean mu with probability 1:

P(Mn [converges to] mu as n [converges to] [infinity] ) = 1.

This is a much stronger result than the weak law.

Central Limit Theorem

The central limit theorem links probability and statistics.

Statement of the Theorem

Roughly, the central limit theorem states that the distribution of the sum of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution.

The central limit theorem and the law of large numbers are the two fundamental theorems of probability.
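The effect is easy to see in a simulation. The following Python sketch (added for illustration, with exponential(1) summands assumed purely as an example of a non-normal underlying distribution) standardizes the sum of n such variables and checks that about 68.3% of the standardized sums fall within one standard deviation of 0, as the standard normal distribution predicts:

import math
import random

n, reps = 100, 20000
within_one_sd = 0
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(n))   # sum of n iid exponential(1) variables
    z = (s - n) / math.sqrt(n)                            # standardized sum (mean n, variance n)
    if abs(z) <= 1:
        within_one_sd += 1
print(within_one_sd / reps)                               # approximately 0.683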

The Normal Distribution

The normal distribution is also called the Gaussian distribution, in honor of Carl Friedrich Gauss, who was among the first to use the distribution.

The Standard Normal Distribution

A random variable Z is said to have the standard normal distribution if it has the probability density function g given by

g(z) = exp(-z^2 / 2) / (2 [pi])^1/2 for z in R.

The standard normal density function g has the following properties:

   1. g is symmetric about z = 0.
   2. g is increasing for z < 0 and decreasing for z > 0.
   3. The mode occurs at z = 0.
   4. g is concave upward for z < -1 and for z > 1 and is concave downward for -1 < z < 1.
   5. The inflection points of g occur at z = +1 and z = -1.
   6. g(z) [converges to] 0 as z [converges to] [infinity] and as z [converges to] -[infinity].
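The density g is straightforward to evaluate directly; a small Python sketch (added for illustration) that exhibits the mode at 0 and the symmetry about 0:

import math

def g(z):
    # standard normal density: exp(-z^2 / 2) / (2 pi)^(1/2)
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

print(g(0))          # the mode: about 0.3989
print(g(1), g(-1))   # symmetry: g(1) equals g(-1)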

Expected value of a random variable

The expected value of a real-valued random variable gives the center of the distribution of the variable.

Definitions

Consider a random experiment with a sample space and a probability measure P. Suppose that X is a random variable taking values in a subset S of R.

If X has a discrete distribution with density function f then the expected value of X is defined by

E(X) = [sum] x in S xf(x).

The mean is the center of the probability distribution of X.
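For a concrete computation from this definition (a brief Python sketch added for illustration, assuming a fair six-sided die as the distribution):

from fractions import Fraction

S = range(1, 7)
f = {x: Fraction(1, 6) for x in S}   # density of a fair die

EX = sum(x * f[x] for x in S)        # E(X) = sum over S of x f(x)
print(EX)                            # 7/2, the center of the distribution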

Basic Properties

E(X + Y) = E(X) + E(Y)

E(cX) = cE(X).

E(aX + bY) = aE(X) + bE(Y) for constants a and b.

Thus, expected value is a linear operation.

 If X >= 0 (with probability 1) then E(X) >= 0.

 If X <= Y (with probability 1) then E(X) <= E(Y).

 |E(X)| <= E(|X|).

Example

Suppose that we create a new, compound experiment by repeating the basic experiment over and over again. This gives a sequence of independent random variables,

X1, X2, X3 ...

each with the same distribution as X. In statistical terms, we are sampling from the distribution of X. The average value, or sample mean, after n runs is

Mn = (X1 + X2 + . . . + Xn) / n

The average value Mn converges to the expected value mu as n [converges to] [infinity] . The precise statement of this is the law of large numbers, one of the fundamental theorems of probability.

Variance and Higher Moments

Definition

As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable for the experiment, taking values in a subset S of R. Recall that the expected value or mean of X gives the center of the distribution of X. The variance of X is a measure of the spread of the distribution about the mean and is defined by

var(X) = E{[X - E(X)]^2}

Thus, the variance is the second central moment of X.

Suppose that X has a discrete distribution with density function f. Use the change of variables theorem to show that

var(X) = [sum] x in S [x - E(X)]^2 f(x).

The standard deviation of X is the square root of the variance:

sd(X) = [var(X)]^1/2.
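Continuing the fair-die example used above (a small Python sketch added for illustration; the die is an assumption, not part of the original text), the variance and standard deviation follow directly from these formulas:

from fractions import Fraction
import math

S = range(1, 7)
f = {x: Fraction(1, 6) for x in S}            # density of a fair die

EX = sum(x * f[x] for x in S)                 # E(X) = 7/2
varX = sum((x - EX) ** 2 * f[x] for x in S)   # second central moment
print(varX)                                   # 35/12
print(math.sqrt(varX))                        # sd(X), about 1.708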

Basic Properties of variance

var(X) = E(X^2) - [E(X)]^2.

var(X) >= 0

var(X) = 0 if and only if P(X = c) = 1 for some constant c.

If a and b are constants then var(aX + b) = a^2 var(X).

Let Z = [X - E(X)] / sd(X). Then Z has mean 0 and variance 1.

The random variable Z  is sometimes called the standard score associated with X. Since X and its mean and standard deviation all have the same physical units, the standard score Z is dimensionless. It measures the directed distance from E(X) to X in terms of standard deviations.
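As a small illustration of standard scores (a Python sketch with hypothetical values of mu and d assumed for the example):

# Standard score Z = [X - E(X)] / sd(X), shown with hypothetical numbers
mu, d = 100.0, 15.0            # assumed mean and standard deviation of X
for x in [70.0, 100.0, 130.0]:
    z = (x - mu) / d
    print(x, z)                # directed distance from the mean, in standard deviations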

Suppose that Z has density f(z) = exp(-z^2 / 2) / (2 [pi])^1/2 for z in R. This defines the standard normal distribution.

var(Z) = 1.

If X has the normal distribution with mean mu and standard deviation d then Z = (X - mu) / d has the standard normal distribution.

Bernoulli Trials

The Bernoulli trials process, named after James Bernoulli, is one of the simplest yet most important random processes in probability. Essentially, the process is the mathematical abstraction of coin tossing, but because of its wide applicability, it is usually stated in terms of a sequence of generic trials that satisfy the following assumptions:

   1. Each trial has two possible outcomes, generically called success and failure.
   2. The trials are independent. Intuitively, the outcome of one trial has no influence over the outcome of another trial.
   3. On each trial, the probability of success is p and the probability of failure is 1 - p.

Mathematically, we can describe the Bernoulli trials process with a sequence of indicator random variables:

I1, I2, I3, ...

An indicator variable is a random variable that takes only the values 1 and 0, which in this setting denote success and failure, respectively. The j'th indicator variable simply records the outcome of trial j. Thus, the indicator variables are independent and have the same density function:

P(Ij = 1) = p, P(Ij = 0) = (1 - p)

Thus, the Bernoulli trials process is characterized by a single parameter p.
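A Bernoulli trials process is simple to simulate; the following Python sketch (added for illustration, with p = 0.3 assumed) generates the first 20 indicator variables:

import random

p = 0.3                                                    # probability of success on each trial
trials = [1 if random.random() < p else 0 for _ in range(20)]
print(trials)                                              # I1, I2, ..., I20: independent indicators
print(sum(trials))                                         # number of successes in these trials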

The Binomial Distribution

Suppose that our random experiment is to perform Bernoulli trials I1, I2, .... In this section we will study the random variable Xn that gives the number of successes in the first n trials. This variable has a simple expression in terms of the indicator variables:

 Xn = I1 + I2 + . . . + In.

The Density Function

Suppose that K subset  N = {1, 2, ..., n} and #(K) = k. Use the assumptions of Bernoulli trials to show that

P(Ij = 1 for j in K and Ij = 0 for j in N - K) = p^k (1 - p)^(n - k).

Recall that the number of subsets of size k from a set of size n is the binomial coefficient

C(n, k) = n! / [k! (n - k)!]

P(Xn = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, 1, ..., n.

The distribution with this density function is known as the binomial distribution with parameters n and p. The binomial family of distributions is one of the most important in probability.
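The density function translates directly into code; a short Python sketch (added for illustration, with n = 10 and p = 1/2 assumed) computes the binomial probabilities and confirms that they sum to 1:

from math import comb

def binomial_pmf(n, p, k):
    # P(Xn = k) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
print([round(binomial_pmf(n, p, k), 4) for k in range(n + 1)])
print(sum(binomial_pmf(n, p, k) for k in range(n + 1)))    # sums to 1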

Famous Problems

In 1693, Samuel Pepys asked Isaac Newton whether it is more likely to get at least one ace in 6 rolls of a die or at least two aces in 12 rolls of a die. This problem is known as Pepys' problem; naturally, Pepys had fair dice in mind.
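Pepys' problem is itself a binomial calculation. A short Python sketch (added for illustration) uses the binomial density with p = 1/6 for rolling an ace:

from math import comb

def binomial_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 1 / 6
at_least_one_in_6 = 1 - binomial_pmf(6, p, 0)
at_least_two_in_12 = 1 - binomial_pmf(12, p, 0) - binomial_pmf(12, p, 1)
print(at_least_one_in_6)     # about 0.665
print(at_least_two_in_12)    # about 0.619

So at least one ace in 6 rolls is the more likely event, which is the answer Newton gave Pepys.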

Moments

We will compute the mean and variance of the binomial distribution.

E(Xn) = np.

This makes intuitive sense, since p should be approximately the proportion of successes in a large number of trials.

var(Xn) = np(1 - p)

The Proportion of Successes

Suppose again that our random experiment is to perform Bernoulli trials I1, I2, ... Recall that the number of successes in the first n trials, Xn, has the binomial distribution with parameters n and p. In this section, we will study the random variable that gives the proportion of successes in the first n trials:

Mn = Xn / n = (I1 + I2 + . . . + In) / n.

Note that Mn takes the values k / n where k = 0, 1, ..., n.

The Density Function

It is easy to express the density function of the proportion of successes Mn in terms of the density function of the number of successes Xn:

P(Mn = k / n) = C(n, k) p^k (1 - p)^(n - k) for k = 0, 1, ..., n.

Properties

The proportion of successes can also be thought of as the average value of the indicator variables. In statistical terms, the indicator variables form a random sample, since they are independent and identically distributed, and in this context, Mn is a special case of a sample mean. The proportion of successes Mn is typically used to estimate the probability of success p when this probability is unknown. It is basic to the very notion of probability that if the number of trials is large, then Mn should be close to p. The mathematical formulation of this idea is a special case of the law of large numbers.

E(Mn) = p.

In statistical terms, this means that Mn is an unbiased estimator of p.

var(Mn) = p(1 - p) / n.

Note that for fixed p, var(Mn) decreases to 0 as the number of trials increases to infinity. This means that the estimate improves as n increases; in statistical terms, this is known as consistency.
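Consistency can be seen by simulation; the Python sketch below (added for illustration, with the true p = 0.3 assumed) computes Mn for increasing n and shows the estimates settling near p:

import random

p = 0.3                           # true probability of success (assumed for the sketch)
for n in [10, 100, 1000, 10000]:
    Mn = sum(random.random() < p for _ in range(n)) / n
    print(n, Mn)                  # estimates of p; the error shrinks as n grows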