Thursday, 9 October 2014

Simple Statistics - terms, definitions and application

Real-life implications for everyday marketing are described here:
Mean - the arithmetic average.
Median - the middle value of the data sorted in ascending order. It can be more representative than the mean when there is a small number of unusual results (outliers).
Standard deviation (S) - how much the data vary around the mean: the sum of the squared differences from the mean, divided by (n-1), with the square root taken.
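
A minimal sketch of these three measures, using Python's standard `statistics` module. The order counts are made up for illustration; the last value is a deliberate outlier to show why the median can be more representative than the mean.

```python
import statistics

# Hypothetical daily order counts; the last value is an unusual spike.
orders = [12, 15, 11, 14, 13, 95]

mean = statistics.mean(orders)      # arithmetic average
median = statistics.median(orders)  # middle of the sorted values
stdev = statistics.stdev(orders)    # sample standard deviation, divides by n - 1

# The same standard deviation computed by hand from the definition:
# sum of squared differences from the mean, divided by (n - 1), square root taken.
n = len(orders)
manual_stdev = (sum((x - mean) ** 2 for x in orders) / (n - 1)) ** 0.5

print("mean:", round(mean, 2))
print("median:", median)
print("stdev:", round(stdev, 2), "manual:", round(manual_stdev, 2))
```

The single spike pulls the mean well above the median, while the median stays close to a typical day.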

On a histogram, if the data are bimodal (two peaks), it is a sign that the group may consist of two subgroups that need to be separated. Unimodal means one peak on the histogram.

IQR (interquartile range) - the spread of the middle 50% of the data, i.e. Q3 - Q1.
Box plots are used to compare two samples of data.
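
As a rough illustration, the IQR can be computed from the quartiles returned by `statistics.quantiles`. The sample values are hypothetical, and quartile conventions differ slightly between tools, so other software may give a somewhat different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical session durations in seconds (illustrative data only).
durations = [10, 20, 30, 40, 50, 60, 70]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(durations, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
```

A box plot draws exactly these numbers: the box runs from Q1 to Q3 with a line at the median, which is what makes two samples easy to compare side by side.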


Venn diagrams are a graphical way to represent probabilities as event circles that may or may not overlap.
A and B are events. P(A) is the probability of event A. A ∪ B means A OR B (union, ||); A ∩ B means A AND B (intersection, &&).

Odds in favour of A:
odds = P(A)/(1-P(A)). Odds against A: (1-P(A))/P(A).
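
A small sketch of both formulas. The function names and the 20% conversion probability are assumptions chosen for the example; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Odds in favour of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    return (1 - p) / p

# Hypothetical probability that a visitor converts: 1 in 5.
p_convert = Fraction(1, 5)

print("odds in favour:", odds_in_favour(p_convert))  # 1 to 4
print("odds against:", odds_against(p_convert))      # 4 to 1
```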

The probability that A or B, or both, occur: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A and B are mutually exclusive, then A ∩ B = ∅ (the empty set), so P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
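
The addition rule can be checked on a small example. This sketch assumes one roll of a fair six-sided die; the events A, B and C are picked only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative choice).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event, given equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3
C = {1}        # roll is 1, which cannot happen together with A

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print("P(A or B):", lhs, "rule gives:", rhs)

# Mutually exclusive events: the intersection is empty, so probabilities add.
print("A and C disjoint:", A & C == set())
print("P(A or C):", prob(A | C), "sum:", prob(A) + prob(C))
```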

P(A|B) is the conditional probability of A given that B has occurred.

P(A ∩ B) = P(A|B)P(B)
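
Rearranged, the same rule gives P(A|B) = P(A ∩ B)/P(B). A quick sketch with made-up campaign counts, where B is "visitor opened the email" and A is "visitor bought":

```python
from fractions import Fraction

# Hypothetical campaign counts (made up for illustration).
visitors = 1000
opened_email = 200       # event B: visitor opened the email
opened_and_bought = 50   # event A and B: opened the email and bought

p_b = Fraction(opened_email, visitors)
p_a_and_b = Fraction(opened_and_bought, visitors)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print("P(bought | opened):", p_a_given_b)

# The multiplication rule recovers the joint probability.
print("P(A|B) * P(B):", p_a_given_b * p_b, "P(A and B):", p_a_and_b)
```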

A discrete random variable is one that can assume only separate, countable values, such as the number of purchases.
A continuous random variable is measured in real units, such as time, weight, temperature or length.

For discrete random variables, a probability is assigned to each individual value that the variable can take.
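
For instance, a probability table for a discrete variable can be built by enumerating outcomes. This sketch assumes X is the number of heads in two flips of a fair coin, an example chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# X = number of heads in two flips of a fair coin (illustrative example).
flips = list(product("HT", repeat=2))  # 4 equally likely outcomes

# Assign a probability to each individual value X can take.
pmf = {}
for outcome in flips:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(flips))

for value in sorted(pmf):
    print(f"P(X = {value}) = {pmf[value]}")

# The probabilities over all values must sum to 1.
total = sum(pmf.values())
expected_value = sum(value * p for value, p in pmf.items())
print("total:", total, "expected value:", expected_value)
```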

Normal distribution (bell-shaped probability density curve):

  1. About 68.3% of a normal distribution lies within one standard deviation from the mean.
  2. 95.4% of a normal distribution lies within two standard deviations from the mean.
  3. 99.7% of a normal distribution lies within three standard deviations from the mean.
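
These percentages can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch uses the standard normal (mean 0, standard deviation 1); the shares are the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean:
# the area under the curve between -k and +k.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```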

A statistical hypothesis is a statement about the parameters of the population.

The statement that we are trying to test is called the null hypothesis, denoted H0. If the null hypothesis is not supported by the data, we adopt the statement contained in the alternative hypothesis, denoted Ha.

We use p-values to decide between the null and the alternative hypothesis. The smaller the p-value, the stronger the evidence in favour of the alternative hypothesis.
The p-value is the probability of obtaining the observed result, or one more extreme, when the null hypothesis is true.

The conclusions corresponding to different p-values:
p-value < 0.01: very strong evidence against H0
0.01 ≤ p-value < 0.05: strong evidence against H0
0.05 ≤ p-value < 0.1: some inconclusive evidence against H0
p-value ≥ 0.1: little or no evidence against H0
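
To make the idea concrete, here is a sketch of a one-sided exact binomial test. The scenario is an assumption made for the example: H0 says the success rate is 0.5, and we observe 15 successes in 20 trials. The helper names `binomial_p_value` and `interpret` are hypothetical, and `interpret` simply encodes the thresholds listed above.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when H0 (rate = p_null) is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def interpret(p_value):
    """Map a p-value to the conclusions listed in the notes."""
    if p_value < 0.01:
        return "Very strong evidence against H0"
    if p_value < 0.05:
        return "Strong evidence against H0"
    if p_value < 0.1:
        return "Some inconclusive evidence against H0"
    return "Little or no evidence against H0"

# Hypothetical observation: 15 successes in 20 trials, H0: success rate is 0.5.
p_value = binomial_p_value(15, 20)
print(f"p-value = {p_value:.4f}:", interpret(p_value))
```

The p-value here is the probability of seeing 15 or more successes ("the observed result, or one more extreme") if the true rate really were 0.5.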

Regression analysis is a method for investigating the functional relationship among variables.
Based on a regression function, we can predict future values. The difference between an actual value and the predicted value is called the residual.
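
A minimal sketch of simple linear regression fitted by ordinary least squares, written out by hand. The data points (ad spend vs. sales) are invented for illustration, and a straight line is only one possible choice of regression function.

```python
# Hypothetical data: x = ad spend, y = sales (illustrative numbers only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for the line y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(value):
    """Predicted y for a given x, using the fitted line."""
    return intercept + slope * value

# Residual = actual value minus predicted value.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("residuals:", [round(r, 3) for r in residuals])
print("prediction for x = 6:", round(predict(6), 3))
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a handy sanity check on the fit.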
