Random Variable
Sample
Distribution
Law of large numbers
Central limit theorem
Normal distribution
Expected value
Variance
Bernoulli trials
Binomial distribution
Now suppose we perform independent replications of the basic experiment. This defines a new, compound experiment with a sequence of independent random variables, each with the same distribution as X:
X1, X2, ...
In statistical terms, for each n, (X1, X2, ..., Xn) is a random sample of size n from the distribution of X. The sample mean is simply the average of the variables in the sample:
Mn = (X1 + X2 + ... + Xn) / n.
The sample mean is a real-valued function of the random sample and thus is a statistic. Like any statistic, the sample mean is itself a random variable with a distribution, mean, and variance of its own. Often the distribution mean is unknown, and the sample mean is used as an estimator of it.
By linearity of expected value, E(Mn) = mu, so Mn is an unbiased estimator of mu. Consequently, the variance of the sample mean is also the mean square error when the sample mean is used as an estimator of the distribution mean:
var(Mn) = d^2 / n, where d is the standard deviation of the distribution.
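As a quick numerical illustration (not part of the original notes), here is a minimal Python sketch that estimates E(Mn) and var(Mn) by simulation; the exponential distribution, the sample size n = 25, and the 100,000 replications are all arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
mu, d, n = 2.0, 2.0, 25                  # exponential with mean 2 has sd 2 (assumed example)
samples = rng.exponential(scale=mu, size=(100_000, n))
Mn = samples.mean(axis=1)                # one sample mean per replication

print(Mn.mean())                         # approximately mu = 2 (unbiasedness)
print(Mn.var())                          # approximately d^2 / n = 4 / 25 = 0.16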
Recall that if X has a discrete distribution on a countable set S, the density function f of X is given by
f(x) = P(X = x) for x in S.
The density function satisfies the following properties:
1. f(x) >= 0 for x in S.
2. sum_{x in S} f(x) = 1.
3. sum_{x in A} f(x) = P(X in A) for A subset of S.
Property 3 is particularly important since it shows that the probability
distribution of a discrete random variable is completely determined by
its density function.
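To make the properties concrete, here is a small Python sketch (the fair die is an assumed example, not from the notes) that checks properties 1-3 for a simple discrete density:

f = {x: 1/6 for x in range(1, 7)}          # density of a fair die (assumed example)

assert all(p >= 0 for p in f.values())      # property 1: f(x) >= 0
assert abs(sum(f.values()) - 1) < 1e-12     # property 2: densities sum to 1
A = {2, 4, 6}                               # the event "X is even"
print(sum(f[x] for x in A))                 # property 3: P(X in A) = 1/2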
Recall that var(Mn) = d^2 / n, and note that var(Mn) -> 0 as n -> infinity.
This means that Mn -> mu as n -> infinity in mean square.
Note: in general, Xn -> X in mean square means that E(|Xn - X|^2) -> 0 as n -> infinity.
By Chebyshev's inequality, it then follows that
P(|Mn - mu| > r) -> 0 as n -> infinity for any r > 0.
This result is known as the weak law of large numbers, and states that the sample mean converges to the mean of the distribution in probability. Recall that in general, convergence in mean square implies convergence in probability.
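A simulation sketch of the weak law in Python (the uniform(0, 1) distribution, r = 0.05, and the replication counts are arbitrary assumptions); the estimated tail probability P(|Mn - mu| > r) shrinks toward 0 as n grows:

import numpy as np

rng = np.random.default_rng(1)
mu, r = 0.5, 0.05                        # uniform(0, 1) has mean 1/2 (assumed example)
for n in (10, 100, 1000):
    Mn = rng.uniform(size=(10_000, n)).mean(axis=1)
    print(n, np.mean(np.abs(Mn - mu) > r))   # estimate of P(|Mn - mu| > r)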
The strong law of large numbers states that the sample mean converges to mu with probability 1:
P(Mn -> mu as n -> infinity) = 1.
This is a much stronger result than the weak law.
The central limit theorem and the law of large numbers are the two fundamental theorems of probability.
g(z) = exp(-z^2 / 2) / (2 pi)^(1/2) for z in R.
The graph of the standard normal density function g can be sketched using the following properties:
1. g is symmetric about z = 0.
2. g is increasing for z < 0 and decreasing for z > 0.
3. The mode occurs at z = 0.
4. g is concave upward for z < -1 and for z > 1, and is concave downward for -1 < z < 1.
5. The inflection points of g occur at z = ±1.
6. g(z) -> 0 as z -> infinity and as z -> -infinity.
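Several of these properties can be checked numerically; here is a short Python sketch (the grid range and step size are arbitrary choices):

import numpy as np

def g(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

z = np.linspace(-6, 6, 120_001)              # arbitrary grid for the check
dz = z[1] - z[0]
print(np.allclose(g(z), g(-z)))              # property 1: symmetric about 0
print(g(z).sum() * dz)                       # total area is approximately 1
g2 = np.gradient(np.gradient(g(z), dz), dz)  # numerical second derivative
core = np.abs(z) <= 3                        # avoid floating-point noise in the far tails
sign_change = np.diff(np.sign(g2[core])) != 0
print(z[core][:-1][sign_change])             # property 5: near z = -1 and z = 1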
If X has a discrete distribution with density function f then the expected value of X is defined by
E(X) = sum_{x in S} x f(x).
The mean is the center of the probability distribution of X.
E(cX) = cE(X).
E(aX + bY) = aE(X) + bE(Y)
for constants a and b; expected value is a linear operation.
If X >= 0 (with probability 1) then E(X) >= 0.
If X <= Y (with probability 1) then E(X) <= E(Y).
|E(X)| <= E(|X|).
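Here is a small Python sketch of the linearity property (the two independent fair dice and the constants a = 2, b = -3 are assumed purely for illustration; linearity itself does not require independence):

from itertools import product

f = {x: 1/6 for x in range(1, 7)}        # density of a fair die (assumed example)
EX = sum(x * f[x] for x in f)            # E(X) = sum_{x} x f(x) = 3.5
a, b = 2, -3                             # arbitrary constants

# For independent X and Y the joint density is f(x) f(y).
E_lin = sum((a*x + b*y) * f[x] * f[y] for x, y in product(f, f))
print(E_lin, a * EX + b * EX)            # E(aX + bY) = aE(X) + bE(Y) = -3.5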
Now suppose that the basic experiment is repeated indefinitely, generating a sequence of independent random variables
X1, X2, X3, ...,
each with the same distribution as X. In statistical terms, we are sampling from the distribution of X. The average value, or sample mean, after n runs is
Mn = (X1 + X2 + ... + Xn) / n.
The average value Mn converges to the expected value mu as n -> infinity. The precise statement of this is the law of large numbers, one of the fundamental theorems of probability.
var(X) = E{[X - E(X)]^2}
Thus, the variance is the second central moment of X.
Suppose that X has a discrete distribution with density function f. Use the change of variables theorem to show that
var(X) = sum_{x in S} [x - E(X)]^2 f(x).
The standard deviation of X is the square root of the variance:
sd(X) = [var(X)]^(1/2).
var(X) >= 0.
var(X) = 0 if and only if P(X = c) = 1 for some constant c.
If a and b are constants then var(aX + b) = a^2 var(X).
Let Z = [X - E(X)] / sd(X). Then Z has mean 0 and variance 1.
The random variable Z is sometimes called the standard score associated with X. Since X and its mean and standard deviation all have the same physical units, the standard score Z is dimensionless. It measures the directed distance from E(X) to X in terms of standard deviations.
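A small Python sketch tying these definitions together for a fair die (an assumed example, not from the notes): the variance formula above, the standard deviation, and the fact that the standard score has mean 0 and variance 1:

f = {x: 1/6 for x in range(1, 7)}                 # density of a fair die (assumed example)
EX = sum(x * f[x] for x in f)                     # mean: 3.5
varX = sum((x - EX)**2 * f[x] for x in f)         # variance: 35/12
sdX = varX ** 0.5                                 # standard deviation

# Standard score Z = [X - E(X)] / sd(X) has mean 0 and variance 1.
EZ = sum((x - EX) / sdX * f[x] for x in f)
varZ = sum(((x - EX) / sdX)**2 * f[x] for x in f) # E(Z^2), since E(Z) = 0
print(EX, varX, sdX, EZ, varZ)                    # ... 0.0 1.0 (up to rounding)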
Suppose that Z has density f(z) = exp(-z^2 / 2) / (2 pi)^(1/2) for z in R. This defines the standard normal distribution.
E(Z) = 0 and var(Z) = 1.
If X has the normal distribution with mean mu and standard deviation d then Z = (X - mu) / d has the standard normal distribution.
1. Each trial has two possible outcomes, generically called
success and failure.
2. The trials are independent. Intuitively, the outcome
of one trial has no influence over the outcome of another trial.
3. On each trial, the probability of success is p and
the probability of failure is 1 - p.
Mathematically, we can describe the Bernoulli trials process with a sequence of indicator random variables:
I1, I2, I3, ...
An indicator variable is a random variable that takes only the values 1 and 0, which in this setting denote success and failure, respectively. The j'th indicator variable simply records the outcome of trial j. Thus, the indicator variables are independent and have the same density function:
P(Ij = 1) = p, P(Ij = 0) = 1 - p.
Thus, the Bernoulli trials process is characterized by a single parameter p.
The number of successes in the first n trials is
Xn = I1 + I2 + ... + In.
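A minimal simulation sketch of the process in Python (p = 0.3 and n = 10 trials are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(2)
p, n = 0.3, 10                              # arbitrary parameters for illustration
I = (rng.uniform(size=n) < p).astype(int)   # indicator variables I1, ..., In
print(I)                                    # a sequence of 0s and 1s
print(I.sum())                              # Xn, the number of successes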
If K is a subset of {1, 2, ..., n} with k elements, then
P(Ij = 1 for j in K and Ij = 0 for j not in K) = p^k (1 - p)^(n-k).
Recall that the number of subsets of size k from a set of size n is the binomial coefficient
C(n, k) = n! / [k! (n - k)!].
P(Xn = k) = C(n, k) p^k (1 - p)^(n-k) for k = 0, 1, ..., n.
The distribution with this density function is known as the binomial distribution with parameters n and p. The binomial family of distributions is one of the most important in probability.
E(Xn) = np.
This makes intuitive sense: each trial succeeds with probability p, so the expected number of successes in n trials is np; equivalently, p should be approximately the proportion of successes in a large number of trials.
var(Xn) = np(1 - p)
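The density and both moments can be checked directly in Python (n = 10 and p = 0.3 are assumed parameters, chosen arbitrarily):

from math import comb

n, p = 10, 0.3                            # arbitrary parameters for illustration
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(sum(pmf))                                           # densities sum to 1
mean = sum(k * q for k, q in enumerate(pmf))
print(mean)                                               # E(Xn) = np = 3.0
print(sum((k - mean)**2 * q for k, q in enumerate(pmf)))  # np(1 - p) = 2.1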
Mn = Xn / n = (I1 + I2 + ... + In) / n.
Note that Mn takes the values k / n where k = 0, 1, ..., n.
P(Mn = k / n) = C(n, k) p^k (1 - p)^(n-k) for k = 0, 1, ..., n.
E(Mn) = p.
In statistical terms, this means that Mn is an unbiased estimator of p.
var(Mn) = p(1 - p) / n.
Note that, for fixed p, var(Mn) decreases to 0 as the number of trials n increases to infinity. This means that the estimate improves as n increases; in statistical terms, the estimator Mn is said to be consistent.
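A closing simulation sketch of consistency in Python (p = 0.3 and the replication counts are arbitrary assumptions); the simulated variance of Mn tracks p(1 - p) / n as n grows:

import numpy as np

rng = np.random.default_rng(3)
p = 0.3                                   # arbitrary success probability
for n in (10, 100, 1000):
    Mn = (rng.uniform(size=(10_000, n)) < p).mean(axis=1)
    print(n, Mn.var(), p * (1 - p) / n)   # simulated vs. exact variance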