A recap of mathematical analysis
Function Continuity and Derivative
Let $E \subset \mathbb{R}$, let $a$ be a limit point of the set $E$ (i.e., every punctured neighborhood of $a$ contains points of $E$), and let $f : E \to \mathbb{R}$.
Definition 1 (Cauchy definition of the limit of a function): The function $f$ tends to $A$ as $x$ tends to $a$ if
$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in E: \quad 0 < |x - a| < \delta \Rightarrow |f(x) - A| < \varepsilon.$$
Notation: $\lim\limits_{x \to a} f(x) = A$.
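For instance, $\lim\limits_{x \to 2} 3x = 6$ in the sense of Definition 1: for any $\varepsilon > 0$ the choice $\delta = \varepsilon/3$ works, since
$$0 < |x - 2| < \delta \;\Rightarrow\; |3x - 6| = 3|x - 2| < 3\delta = \varepsilon.$$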
Definition 2:
- An interval is a set $]a, b[ \; := \{x \in \mathbb{R} \mid a < x < b\}$;
- an interval containing a point is called a neighborhood of this point;
- a punctured neighborhood of a point is a neighborhood of the point from which the point itself is excluded.

Notation:
- $U(a)$ (or $U_\delta(a) := \, ]a - \delta, a + \delta[$ for a $\delta$-neighborhood) denotes a neighborhood of the point $a$;
- $\dot{U}(a)$ denotes a punctured neighborhood of the point $a$.
Definition 3 (limit of a function via neighborhoods): $\lim\limits_{x \to a} f(x) = A$ if for every neighborhood $U(A)$ of $A$ there is a punctured neighborhood $\dot{U}(a)$ of $a$ such that $f\bigl(\dot{U}(a) \cap E\bigr) \subset U(A)$.
Definitions 1 and 3 are equivalent.
Definition 4 (continuity of a function at a point):
- $f$ is continuous at $a \in E$ if $\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in E: \ |x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon$;
- equivalently, $f$ is continuous at $a$ if for every neighborhood $U(f(a))$ there is a neighborhood $U(a)$ such that $f\bigl(U(a) \cap E\bigr) \subset U(f(a))$.

Definitions 3 and 4 show that
$$\bigl(f \text{ is continuous at } a\bigr) \;\Leftrightarrow\; \lim_{x \to a} f(x) = f(a),$$
where $a$ is a limit point of $E$.
Definition 5: A function $f$ is called continuous on a set if it is continuous at each point of that set.
Definition 6:
- A function $f : E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}$ is called differentiable at a point $x_0 \in E$ that is a limit point of $E$ if there exists a function $A \cdot (x - x_0)$, linear in the increment $x - x_0$ of the argument [the differential of the function $f$ at the point $x_0$], such that the increment of $f$ can be represented as
$$f(x) - f(x_0) = A \cdot (x - x_0) + o(x - x_0) \quad \text{as } x \to x_0, \ x \in E.$$
- The value $A = f'(x_0)$ is called the derivative of the function $f$ at the point $x_0$.

Equivalently,
$$f'(x_0) = \lim_{\substack{x \to x_0 \\ x \in E}} \frac{f(x) - f(x_0)}{x - x_0}.$$
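For instance, for $f(x) = x^2$ the increment at a point $x_0$ decomposes exactly in this form:
$$f(x) - f(x_0) = x^2 - x_0^2 = 2x_0(x - x_0) + (x - x_0)^2,$$
where $2x_0(x - x_0)$ is linear in the increment and $(x - x_0)^2 = o(x - x_0)$, so $f'(x_0) = 2x_0$.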
Definition 7:
- A point $x_0 \in E$ is called a point of local maximum (minimum), and the value of the function at it a local maximum (minimum) of the function, if there is a neighborhood $U_E(x_0)$ of $x_0$ in $E$ such that $f(x) \le f(x_0)$ (respectively, $f(x) \ge f(x_0)$) for all $x \in U_E(x_0)$.
- The points of local maximum and minimum are called points of local extremum, and the values of the function at them are called local extrema of the function.
- An extremum point $x_0$ of a function is called an interior extremum point if $x_0$ is a limit point both of the set $E_- = \{x \in E \mid x < x_0\}$ and of the set $E_+ = \{x \in E \mid x > x_0\}$.
Lemma 1 (Fermat): If a function $f$ is differentiable at an interior extremum point $x_0$, then its derivative at this point is zero: $f'(x_0) = 0$.
Proposition 1 (Rolle's theorem): If a function $f : [a, b] \to \mathbb{R}$ is continuous on the segment $[a, b]$, differentiable in the interval $]a, b[$, and $f(a) = f(b)$, then there is a point $\xi \in \, ]a, b[$ such that $f'(\xi) = 0$.
Theorem 1 (Lagrange's finite-increment theorem): If a function $f : [a, b] \to \mathbb{R}$ is continuous on the segment $[a, b]$ and differentiable in the interval $]a, b[$, then there is a point $\xi \in \, ]a, b[$ such that
$$f(b) - f(a) = f'(\xi)(b - a).$$
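For instance, for $f(x) = x^2$ on $[a, b]$ the theorem is verified directly: $f(b) - f(a) = b^2 - a^2 = (a + b)(b - a)$ and $f'(\xi) = 2\xi$, so the point $\xi = \frac{a + b}{2} \in \, ]a, b[$ satisfies the conclusion.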
Corollary 1 (a monotonicity test): If at every point of some interval the derivative of a function is non-negative (positive), then the function is non-decreasing (increasing) on this interval.
Corollary 2 (criterion for the constancy of a function): A function continuous on a segment $[a, b]$ is constant if and only if its derivative is zero at every point of the segment $[a, b]$ (or at least of the interval $]a, b[$).
Partial derivatives of a function of several variables
By $\mathbb{R}^m$ we denote the set
$$\mathbb{R}^m = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{m} = \{(\omega_1, \omega_2, \ldots, \omega_m) \mid \omega_i \in \mathbb{R} \ \forall i \in \overline{1, m}\}.$$
Definition 8: A function $f : E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}^m$ is called differentiable at a point $x \in E$ that is a limit point of $E$ if
$$f(x + h) - f(x) = L(x)h + \alpha(x; h), \tag{1}$$
where $L(x)$ is a function linear in $h$ [the differential of the function $f$ at the point $x$, denoted $df(x)$ or $Df(x)$], and $\alpha(x; h) = o(h)$ as $h \to 0$, $x + h \in E$.
Relation (1) can be rewritten as
$$f(x + h) - f(x) = L(x)h + o(h) \quad \text{as } h \to 0, \ x + h \in E,$$
or
$$\Delta f(x; h) = df(x)h + o(h).$$
If we pass to the coordinate representation of the point $x = (x^1, \ldots, x^m)$, the vector $h = (h^1, \ldots, h^m)$, and the linear function $L(x)h = a_1 h^1 + \cdots + a_m h^m$, then equality (1) takes the form
$$f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = a_1 h^1 + \cdots + a_m h^m + o(h), \tag{2}$$
where $a_1, \ldots, a_m$ are real numbers associated with the point $x$. We need to find these numbers.
We set $h = h^i e_i = (0, \ldots, 0, h^i, 0, \ldots, 0)$, where $\{e_1, \ldots, e_m\}$ is the standard basis in $\mathbb{R}^m$.
For $h = h^i e_i$, relation (2) yields
$$f(x^1, \ldots, x^{i-1}, x^i + h^i, x^{i+1}, \ldots, x^m) - f(x^1, \ldots, x^m) = a_i h^i + o(h^i). \tag{3}$$
From (3) we obtain
$$a_i = \lim_{h^i \to 0} \frac{f(x^1, \ldots, x^{i-1}, x^i + h^i, x^{i+1}, \ldots, x^m) - f(x^1, \ldots, x^m)}{h^i}. \tag{4}$$
Definition 9: The limit (4) is called the partial derivative of the function $f$ at the point $x$ with respect to the variable $x^i$. It is denoted:
$$\frac{\partial f}{\partial x^i}(x), \qquad \partial_i f(x), \qquad f'_{x^i}(x).$$
Example 1:
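Take, for illustration, $f(x^1, x^2) = (x^1)^2 x^2 + \sin x^2$. Then
$$\frac{\partial f}{\partial x^1}(x) = 2 x^1 x^2, \qquad \frac{\partial f}{\partial x^2}(x) = (x^1)^2 + \cos x^2:$$
in each case the other variable is held fixed and the usual one-dimensional derivative is taken.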
Gradient descent
Let $f : \mathbb{R}^n \to \mathbb{R}$, where
$$\mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n} = \{(\theta_1, \theta_2, \ldots, \theta_n) \mid \theta_i \in \mathbb{R} \ \forall i \in \overline{1, n}\}.$$
Definition 10: The gradient of a function $f$ is the vector whose $i$-th element is the partial derivative $\frac{\partial f}{\partial \theta_i}$:
$$\nabla f = \left( \frac{\partial f}{\partial \theta_1}, \frac{\partial f}{\partial \theta_2}, \ldots, \frac{\partial f}{\partial \theta_n} \right).$$

The gradient is the direction in which the function grows most rapidly. This means that the direction in which it decreases most rapidly is the opposite one, i.e., $-\nabla f$.
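Partial derivatives, and hence the gradient, can be approximated by finite differences; a minimal sketch (the helper `numerical_gradient` and the test function below are illustrative, not from the original text):

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    """Approximate each partial derivative of f at theta by a central difference."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# f(theta) = theta_1^2 + 3*theta_2, so grad f = (2*theta_1, 3).
f = lambda th: th[0] ** 2 + 3 * th[1]
print(numerical_gradient(f, [2.0, 5.0]))  # approximately [4.0, 3.0]
```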
The aim of the gradient descent method is to find an extremum (minimum) point of the function.
Denote by $\theta^{(t)}$ the vector of function parameters at step $t$. The parameter update at step $t$ is:
$$\theta^{(t+1)} = \theta^{(t)} - \eta \nabla f(\theta^{(t)}).$$
In the formula above, the parameter $\eta$ is the learning rate, which controls the size of the step we take against the gradient (a minimal code sketch of this update rule follows the list). In particular, two opposing problems may arise:
- if the steps are too small, training will take too long, and the likelihood of getting stuck in a poor local minimum along the way increases (the first image in the picture below);
- if the steps are too large, we may endlessly jump back and forth over the desired minimum and never reach the lowest point (the third image in the picture below).
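A minimal sketch of the update rule above (the function name `gradient_descent` and all parameter values here are illustrative):

```python
import numpy as np

def gradient_descent(grad_f, theta0, eta=0.1, n_steps=100):
    """Repeatedly apply theta <- theta - eta * grad_f(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - eta * grad_f(theta)
    return theta

# Example: f(theta) = ||theta||^2, whose gradient is 2 * theta;
# the minimum is at the origin.
print(gradient_descent(lambda th: 2 * th, theta0=[3.0, -2.0], eta=0.1))
```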
Example: Consider the gradient descent method in the simplest case ($n = 1$), i.e. $f : \mathbb{R} \to \mathbb{R}$. Let $f(\theta) = \theta^2$, so $\nabla f(\theta) = f'(\theta) = 2\theta$, and start from some $\theta^{(0)} \neq 0$.

Let $\eta = 1$. Then:
$$\theta^{(t+1)} = \theta^{(t)} - 1 \cdot 2\theta^{(t)} = -\theta^{(t)}.$$
In this case the situation is as in the third image of the picture above: we constantly jump over the extremum point.

Let $\eta = 0.25$ (any $0 < \eta < \tfrac{1}{2}$ behaves similarly). Then:
$$\theta^{(t+1)} = \theta^{(t)} - 0.25 \cdot 2\theta^{(t)} = 0.5\,\theta^{(t)}.$$
It can be seen that we approach the extremum point iteratively, halving the distance at every step.

Let $\eta = 0.5$. Then:
$$\theta^{(t+1)} = \theta^{(t)} - 0.5 \cdot 2\theta^{(t)} = 0.$$
The extremum point is found in one step.
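These three regimes are easy to reproduce numerically; a small sketch (the starting point $\theta^{(0)} = 3$ and the step counts are arbitrary):

```python
def descend(theta, eta, n_steps=5):
    """Gradient descent on f(theta) = theta**2, using f'(theta) = 2 * theta."""
    trajectory = [theta]
    for _ in range(n_steps):
        theta = theta - eta * 2 * theta
        trajectory.append(theta)
    return trajectory

print(descend(3.0, eta=1.0))   # [3.0, -3.0, 3.0, -3.0, ...] oscillates forever
print(descend(3.0, eta=0.25))  # [3.0, 1.5, 0.75, 0.375, ...] converges gradually
print(descend(3.0, eta=0.5))   # [3.0, 0.0, 0.0, ...]         minimum in one step
```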
Bibliography:
- "Mathematical Analysis. Part 1", V. A. Zorich, Moscow, 1997;
- "Deep Learning. Immersion in the World of Neural Networks", S. Nikolenko, A. Kadurin, E. Arkhangelskaya, Piter, 2018.