## Thursday, 9 October 2014

### Simple Statistics - terms, definitions and application

Real-life implications for everyday marketing are described here:
http://www.optimizesmart.com/bare-minimum-statistics-web-analytics/
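
- Mean: the arithmetic average.
- Median: the middle value when the data are sorted in ascending order. It can be more representative than the mean when there are a few unusual results (outliers).
- Standard deviation (s): how much the data vary around the mean. Sum the squared differences from the mean, divide by (n − 1), and take the square root: s = √( Σ(xᵢ − x̄)² / (n − 1) ).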
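
A minimal Python sketch of the three measures, using the standard `statistics` module and a hypothetical list of order values that contains one unusual result. The hand-written `sample_std` helper (an illustrative name) follows the (n − 1) formula above and should agree with `statistics.stdev`.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(data) / n
    return math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))

# Hypothetical order values with one unusual result (250)
orders = [20, 22, 23, 25, 26, 250]

print(statistics.mean(orders))    # 61, pulled up by the outlier
print(statistics.median(orders))  # 24.0, close to a typical order
print(sample_std(orders))         # hand-written (n - 1) formula
print(statistics.stdev(orders))   # library sample SD, same formula
```

Here a single large order drags the mean far above the typical value, while the median stays close to it.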

If a histogram of the data is bimodal (has two peaks), it is a sign that the group actually consists of two groups, which need to be separated. A unimodal distribution has a single peak on the histogram.
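
One rough way to spot bimodality programmatically is to bin the data and count local maxima in the bar heights. This is only a heuristic sketch on hypothetical session-length data; the helper names are illustrative, and the result depends on the bin width, since narrow bins on small samples easily produce spurious peaks.

```python
from collections import Counter

def histogram_counts(data, bin_width):
    """Counts per bin [k*w, (k+1)*w), including empty bins between min and max."""
    counts = Counter(int(x // bin_width) for x in data)
    return [counts.get(k, 0) for k in range(min(counts), max(counts) + 1)]

def count_peaks(counts):
    """Count local maxima in the bar heights; a flat top of equal bars counts once."""
    # Merge runs of equal heights (padded with zeros at both ends)
    merged = []
    for c in [0] + counts + [0]:
        if not merged or merged[-1] != c:
            merged.append(c)
    return sum(1 for i in range(1, len(merged) - 1)
               if merged[i - 1] < merged[i] > merged[i + 1])

# Hypothetical session lengths (minutes) from two kinds of visitors
quick = [1, 2, 2, 3, 3, 3, 4, 4, 5]
engaged = [11, 12, 12, 13, 13, 13, 14, 14, 15]

print(count_peaks(histogram_counts(quick, 2)))            # 1 -> unimodal
print(count_peaks(histogram_counts(quick + engaged, 2)))  # 2 -> bimodal
```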

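IQR (interquartile range): the spread of the middle 50% of the data, i.e. IQR = Q3 − Q1, the difference between the third and first quartiles.
Box plots, which show the median, the quartiles and the IQR, are used to compare two (or more) samples of data.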
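
The sketch below computes the five numbers a box plot typically draws, the IQR, and the points beyond the common 1.5 × IQR whisker rule (Tukey's convention), for two hypothetical samples. It assumes Python 3.8+ for `statistics.quantiles`; several quartile definitions exist, so other tools may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    return min(data), q1, q2, q3, max(data)

def iqr(data):
    """Interquartile range: spread of the middle 50% of the data (Q3 - Q1)."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1

def tukey_outliers(data, k=1.5):
    """Points beyond Q1 - k*IQR or Q3 + k*IQR (a common whisker convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1
    return [x for x in data if x < q1 - k * spread or x > q3 + k * spread]

# Hypothetical page load times (seconds) before and after a site change
before = [1, 2, 3, 4, 5, 6, 7, 8]
after = [1, 1, 2, 2, 3, 3, 4, 9]

for name, sample in (("before", before), ("after", after)):
    print(name, five_number_summary(sample), iqr(sample), tukey_outliers(sample))
# before (1, 2.25, 4.5, 6.75, 8) 4.5 []
# after (1, 1.25, 2.5, 3.75, 9) 2.5 [9]
```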

#### Probabilities

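Venn diagrams are a graphical way to represent probabilities: events are drawn as circles that may or may not overlap.
Let A and B be events, and let P(A) denote the probability of event A. "A or B" corresponds to the union, A ∪ B (like `||`), and "A and B" to the intersection, A ∩ B (like `&&`).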

The odds in favour of A are odds = P(A) / (1 − P(A)). The odds against A are the reciprocal, (1 − P(A)) / P(A).
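
Converting a probability into odds is a one-line calculation. This sketch uses `fractions.Fraction` so the results stay exact; P(A) = 3/4 is an arbitrary example value and the function names are illustrative.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Odds in favour of A: P(A) / (1 - P(A))."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)

def odds_against(p):
    """Odds against A: (1 - P(A)) / P(A), the reciprocal of the odds in favour."""
    return 1 / odds_in_favour(p)

p = Fraction(3, 4)         # example: P(A) = 0.75
print(odds_in_favour(p))   # 3   -> "3 to 1" in favour
print(odds_against(p))     # 1/3 -> "1 to 3" against
```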

The probability that A or B, or both, occur is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are mutually exclusive, then A ∩ B = ∅ (the empty set), so P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
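
Python's `set` type mirrors the Venn-diagram view: `|` is union (OR) and `&` is intersection (AND). This sketch assumes a single roll of a fair die, so every outcome is equally likely and P(event) = favourable outcomes / 6; it checks the addition rule and the mutually exclusive case. The events chosen are only examples.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (equally likely outcomes)
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "even number"
B = {4, 5, 6}   # "greater than 3"
C = {1, 3}      # shares no outcome with A

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
print(prob(A | B))                       # 2/3 (outcomes 2, 4, 5, 6)
print(prob(A) + prob(B) - prob(A & B))   # 2/3, the same value
# Mutually exclusive: A ∩ C is empty, so P(A ∪ C) = P(A) + P(C)
print(A & C, prob(A | C) == prob(A) + prob(C))  # set() True
```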

P(A|B) denotes the conditional probability of A given that B has occurred.

Multiplication rule: P(A ∩ B) = P(A|B)P(B). Equivalently, P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
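
A small sketch, assuming one roll of a fair die and two example events, that computes P(A|B) from the ratio form and checks it against the multiplication rule. Exact fractions avoid rounding noise in the comparison.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3

print(cond_prob(A, B))                            # 2/3: of {4, 5, 6}, two are even
print(cond_prob(A, B) * prob(B) == prob(A & B))   # True: multiplication rule
```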

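A discrete random variable is one that can take only a finite or countably infinite set of distinct values (for example, counts).
A continuous random variable can take any value in an interval of real numbers; it is measured on a continuous scale, such as time, weight, temperature or length.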

For discrete random variables, a probability is assigned to each individual value that the variable can take, and these probabilities sum to 1.
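
As a concrete discrete example (chosen for illustration), let X be the number of heads in three tosses of a fair coin. Enumerating the eight equally likely outcomes assigns a probability to each value of X, and the probabilities sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = number of heads in three tosses of a fair coin
outcomes = list(product("HT", repeat=3))            # 8 equally likely outcomes
counts = Counter(seq.count("H") for seq in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
print(sum(pmf.values()))  # 1
```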

Normal distribution: graph of the probability density function (the bell curve).

1. About 68.3% of a normal distribution lies within one standard deviation of the mean.
2. About 95.4% of a normal distribution lies within two standard deviations of the mean.
3. About 99.7% of a normal distribution lies within three standard deviations of the mean.
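
These percentages follow from the normal cumulative distribution function. A short sketch using `statistics.NormalDist` (Python 3.8+) computes the share within k standard deviations; `share_within` is an illustrative name, and the result does not depend on the mean or standard deviation chosen.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, f"{100 * share_within(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```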

A statistical hypothesis is a statement about the parameters of the population.

The statement being tested is called the null hypothesis and is denoted H0. If the data do not support the null hypothesis, we reject it and adopt the statement contained in the alternative hypothesis, denoted Ha.

We use p-values to decide between the null and the alternative hypothesis. The smaller the p-value, the stronger the evidence against H0 and in favour of Ha.
The p-value is the probability of obtaining the observed result, or one more extreme, assuming the null hypothesis is true.
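
To make the definition concrete, here is an exact one-sided p-value for a hypothetical coin experiment: 9 heads in 10 tosses, testing H0: the coin is fair against Ha: it favours heads. The p-value is the probability, under H0, of 9 or more heads. The helper name is illustrative, and `math.comb` needs Python 3.8+.

```python
from fractions import Fraction
from math import comb

def binomial_upper_p_value(successes, trials, p0=Fraction(1, 2)):
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, p0), i.e. assuming H0: p = p0."""
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical: a coin lands heads 9 times in 10 tosses.
# H0: the coin is fair (p = 0.5); Ha: the coin favours heads (p > 0.5).
p_value = binomial_upper_p_value(9, 10)
print(p_value, float(p_value))   # 11/1024 0.0107421875
```

A p-value of about 0.011 falls between 0.01 and 0.05, which the scale of conclusions in this post describes as strong evidence against H0.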

The conclusions corresponding to different p-values:

| | p-value | Conclusion |
|---|---|---|
| A | p-value < 0.01 | Very strong evidence against H0 |
| B | 0.01 ≤ p-value < 0.05 | Strong evidence against H0 |
| C | 0.05 ≤ p-value < 0.1 | Some inconclusive evidence against H0 |
| D | p-value ≥ 0.1 | Little or no evidence against H0 |
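
The verbal scale is easy to encode as a lookup. This hypothetical helper simply mirrors the four conclusions listed above; the cut-offs are conventions for reporting evidence, not strict rules.

```python
def evidence_against_h0(p_value):
    """Map a p-value to the verbal scale of conclusions A-D."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie in [0, 1]")
    if p_value < 0.01:
        return "A", "Very strong evidence against H0"
    if p_value < 0.05:
        return "B", "Strong evidence against H0"
    if p_value < 0.1:
        return "C", "Some inconclusive evidence against H0"
    return "D", "Little or no evidence against H0"

print(evidence_against_h0(11 / 1024))  # ('B', 'Strong evidence against H0')
```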

Regression analysis is a method for investigating the functional relationship among variables.
Based on a fitted regression function, we can predict future values. The difference between an actual value and the predicted value is called the residual.
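
A minimal sketch of simple linear regression fitted by ordinary least squares (one common choice of regression method, assumed here), with residuals computed as actual minus predicted values. The week/sales numbers and the `fit_line` name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data: x = week number, y = sales
xs = [1, 2, 3, 4, 5]
ys = [3, 6, 7, 9, 12]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual - predicted

print(round(a, 2), round(b, 2))            # 1.1 2.1
print([round(r, 2) for r in residuals])    # [-0.2, 0.7, -0.4, -0.5, 0.4]
print(round(a + b * 6, 2))                 # 13.7, prediction for week 6
```

With an intercept in the model, least-squares residuals always sum to zero, which makes a handy sanity check.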