
Let us recall some definitions from mathematical statistics. Let a probability space $(\Omega, \Sigma, P)$ be given.
Definition 1: A random variable taking values in a set $B$ with a $\sigma$-algebra of subsets $\mathscr{B}$ is any measurable function $\xi \colon \Omega \to B$, i.e. for every $A \in \mathscr{B}$ the condition
$$\xi^{-1}(A) = \{\omega \in \Omega \colon \xi(\omega) \in A\} \in \Sigma$$
is satisfied.
Definition 2: The sample space is the space of all possible values of an observation or sample, together with a $\sigma$-algebra of measurable subsets of this space. Designation: $(B, \mathscr{B})$.
Random variables $\xi, \eta, \ldots$ defined on the probability space $(\Omega, \Sigma, P)$ induce probability measures on the sample space:
$$P_\xi\{C\} = P\{\xi \in C\}, \quad P_\eta\{C\} = P\{\eta \in C\}, \quad \ldots$$
On a sample space, not one probability measure is defined but a finite or infinite family of probability measures.
In problems of mathematical statistics, a family of probability measures $\{P_\theta, \ \theta \in \Theta\}$ defined on the sample space is known, and it is required to determine from the sample which probability measure of this family corresponds to the sample.
Definition 3: A statistical model is a collection consisting of a sample space and a family of probability measures defined on it. Designation: $(B, \mathscr{B}, \mathscr{P})$, where $\mathscr{P} = \{P_\theta, \ \theta \in \Theta\}$.
Let $B = \mathbb{R}$ and $\mathscr{B} = \mathscr{B}(\mathbb{R})$, so that $(B, \mathscr{B})$ is the sample space. A sample $X = (x_1, \ldots, x_n)$ can be regarded as a collection of $n$ real numbers. We assign to each element of the sample a probability equal to $\frac{1}{n}$.
Let $\nu(C)$ denote the number of sample elements that belong to a set $C \in \mathscr{B}$.
Definition 4: The empirical distribution constructed from the sample $X$ is the probability measure $P_n^*$:
$$P_n^*\{C\} = \frac{\nu(C)}{n}.$$
That is, $P_n^*\{C\}$ is the ratio of the number of sample elements that belong to $C$ to the total number of sample elements.
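As an illustration (not part of the original notes), here is a minimal Python sketch of the empirical measure; the function name `empirical_measure` and the sample values are ours, chosen only to show that $P_n^*\{C\}$ is simply the fraction of sample points falling into $C$.

```python
import numpy as np

def empirical_measure(sample, indicator):
    """P_n*{C}: the fraction of sample elements x_i for which indicator(x_i) is True."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean([indicator(x) for x in sample]))

X = [1.2, -0.3, 0.7, 2.5, -1.1, 0.4]            # a sample of n = 6 real numbers
print(empirical_measure(X, lambda x: x > 0))    # C = (0, +inf): 4 of 6 points -> 0.666...
```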
Definition 5: The sample moment of order $k$ is $\hat{m}_k = \frac{1}{n}\sum\limits_{i=1}^{n} x_i^k$; the first sample moment $\bar{x} = \frac{1}{n}\sum\limits_{i=1}^{n} x_i$ is called the sample mean.
Definition 6: The sample central moment of order $k$ is defined by the equality $\hat{\mu}_k = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i - \bar{x})^k$; the second central moment $S^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i - \bar{x})^2$ is the sample variance.
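For concreteness, a small Python sketch of these two definitions (the function names are ours):

```python
import numpy as np

def sample_moment(x, k):
    """Sample moment of order k: (1/n) * sum(x_i ** k)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** k))

def sample_central_moment(x, k):
    """Sample central moment of order k: (1/n) * sum((x_i - mean) ** k)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** k))

X = [1.2, -0.3, 0.7, 2.5, -1.1, 0.4]
print(sample_moment(X, 1))           # first sample moment = sample mean
print(sample_central_moment(X, 2))   # second central moment = sample variance (divides by n)
```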
In machine learning, many tasks amount to learning to select, from the available data, a parameter which best describes this data. In mathematical statistics, the maximum likelihood method is often used to solve a similar problem.
In practice, errors often follow a normal distribution. As a partial justification, we state the central limit theorem.
Theorem 1 (CLT): If the random variables $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ are independent and identically distributed with mathematical expectation $\mathsf{E}\xi_i = a$ and variance $\mathsf{D}\xi_i = \sigma^2$, then
$$\lim\limits_{n \to \infty} P\left\{\frac{\xi_1 + \xi_2 + \ldots + \xi_n - na}{\sigma\sqrt{n}} \leq x\right\} = F(x) = \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{x} e^{-u^2/2}\,du.$$
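To see the theorem at work, here is a small simulation sketch; the choice of the uniform distribution, the sample size and the number of trials are ours.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, trials = 100, 50_000
a, sigma = 0.5, sqrt(1 / 12)                 # mean and standard deviation of Uniform(0, 1)

sums = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
z = (sums - n * a) / (sigma * sqrt(n))       # standardized sums

x = 1.0
print(np.mean(z <= x))                       # empirical P{(S_n - n*a) / (sigma*sqrt(n)) <= 1}
print(0.5 * (1 + erf(x / sqrt(2))))          # F(1) = Phi(1) ~ 0.8413
```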
Below, we formulate the maximum likelihood method and consider how it works using the example of a family of normal distributions.
Maximum likelihood method
Let two conditions be satisfied for a statistical model $(B, \mathscr{B}, \mathscr{P} = \{P_\theta, \ \theta \in \Theta\})$:
- if $\theta_1 \neq \theta_2$, then $P_{\theta_1} \neq P_{\theta_2}$;
- there exists a measure $\mu$ on $(B, \mathscr{B})$ with respect to which every measure $P_\theta$, $\theta \in \Theta$, has a density $p_\theta$, i.e. $P_\theta\{C\} = \int_C p_\theta(x)\,\mu(dx)$ for all $C \in \mathscr{B}$.
Definition 7: The maximum likelihood estimate (MLE) of the parameter $\theta$, constructed from the sample $X = (x_1, \ldots, x_n)$, is the value $\hat{\theta} = \hat{\theta}(X)$ at which the product $p_\theta(x_1)\, p_\theta(x_2) \cdots p_\theta(x_n)$ attains its maximum over $\theta \in \Theta$.
Definition 8: The function $\Lambda_\theta(X) = p_\theta(x_1)\, p_\theta(x_2) \cdots p_\theta(x_n)$, considered as a function of $\theta$, is called the likelihood function, and the function $L_\theta(X) = \ln \Lambda_\theta(X)$ is called the logarithmic likelihood function.
These functions attain their maxima at the same values of $\theta$, since $\ln$ is a monotonically increasing function.
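A Python sketch of what Definitions 7 and 8 mean in practice for a normal model (considered in the example below): we evaluate the logarithmic likelihood on a grid of parameter values and take the maximizer as an approximate MLE. The grid, the sample and the function name are ours.

```python
import numpy as np

def log_likelihood(X, a, sigma):
    """ln Lambda_{a,sigma}(X) = sum_i ln phi_{a,sigma^2}(x_i) for the normal density."""
    X = np.asarray(X, dtype=float)
    return float(np.sum(-np.log(sigma * np.sqrt(2 * np.pi)) - (X - a) ** 2 / (2 * sigma ** 2)))

rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, scale=1.5, size=500)       # a sample from N(2, 1.5^2)

# brute-force maximization over a parameter grid; the maximizer approximates the MLE
grid_a = np.linspace(0.0, 4.0, 201)
grid_s = np.linspace(0.5, 3.0, 201)
best = max((log_likelihood(X, a, s), a, s) for a in grid_a for s in grid_s)
print(best[1], best[2])                            # close to X.mean() and X.std()
```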
Example: $\mathscr{P} = \{N(a, \sigma^2) \ | \ a \in \mathbb{R}, \ \sigma \in (0, +\infty)\}$ is the family of normal distributions with densities
$$\phi_{a,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(x-a)^2\right\}.$$
For a sample $X = (x_1, \ldots, x_n)$ the likelihood function is
$$\Lambda_{a,\sigma}(X) = \frac{1}{(2\pi)^{\frac{n}{2}}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}\sum\limits_{j=1}^{n}(x_j - a)^2\right\}.$$
Maximizing the logarithmic likelihood function yields the estimate $\hat{a} = \bar{x}$ for the mathematical expectation and $\hat{\sigma}^2 = \frac{1}{n}\sum\limits_{j=1}^{n}(x_j - \bar{x})^2$ for the variance, i.e. the sample mean and the sample variance.
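A sketch of the standard computation behind these estimates, written in the notation above (differentiate the logarithmic likelihood function and set the derivatives to zero):
$$L(a, \sigma) = \ln \Lambda_{a,\sigma}(X) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum\limits_{j=1}^{n}(x_j - a)^2,$$
$$\frac{\partial L}{\partial a} = \frac{1}{\sigma^2}\sum\limits_{j=1}^{n}(x_j - a) = 0 \ \Longrightarrow\ \hat{a} = \frac{1}{n}\sum\limits_{j=1}^{n} x_j = \bar{x},$$
$$\frac{\partial L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum\limits_{j=1}^{n}(x_j - a)^2 = 0 \ \Longrightarrow\ \hat{\sigma}^2 = \frac{1}{n}\sum\limits_{j=1}^{n}(x_j - \bar{x})^2.$$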
Looking closely at the formula for $\Lambda_{a,\sigma}(X)$, we can conclude that this function attains its maximum value with respect to the parameter $a$ when $\sum\limits_{j=1}^{n}(x_j - a)^2$ is minimal. In machine learning problems, the least squares method is often used, in which the sum of squared deviations of the predicted values from the true ones is minimized.
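To illustrate this connection (the linear model, the synthetic data and the true parameters here are ours, not from the notes): if observations are modeled as $y_i = w x_i + b + \varepsilon_i$ with normally distributed errors $\varepsilon_i$, then maximizing the likelihood in $(w, b)$ amounts to minimizing the sum of squared residuals, i.e. ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)   # noisy observations of y = 3x + 1

# least squares: argmin_{w, b} sum_i (y_i - (w * x_i + b))^2
A = np.column_stack([x, np.ones_like(x)])                 # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)                                               # close to the true values 3.0 and 1.0
```

Thus, under the assumption of normally distributed errors, the least squares method can be viewed as a special case of the maximum likelihood method.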