Gradually and imperceptibly, we have reached a point where the complexity of serious C++ projects becomes prohibitive. Unfortunately, a C++ programmer can no longer rely solely on their own skills.
Firstly, there is simply too much code. It is no longer feasible for even a couple of programmers on a project to know the entire codebase. For example, the Linux 1.0.0 kernel contained about 176 thousand lines of code. That is a lot, but you could put a coffee machine nearby and, in a couple of weeks, read through more or less all of it and understand the general principles of its operation. The Linux 5.0.0 kernel, by contrast, weighs in at about 26 million lines of code: it has grown almost 150-fold. All you can do is pick a few parts of the project and take part in their development. It is impossible to sit down and figure out exactly how the whole thing works and how the various sections of the code relate to one another.
Secondly, the C++ language continues to evolve rapidly. On the one hand, this is good: new constructs let you write more compact and safer code. On the other hand, because of backward compatibility, old large projects become heterogeneous, interweaving old and new approaches to writing code. The analogy with growth rings in a tree trunk suggests itself. As a result, immersing yourself in projects written in C++ becomes harder every year. You have to understand code written in the "C with classes" style alongside modern approaches (lambdas, move semantics, and so on). Learning C++ thoroughly takes too much time. But since projects still have to ship, people start writing C++ without having mastered all of its nuances. This leads to additional defects, yet it would be irrational to stop and wait until every developer knows C++ perfectly.
Is the situation hopeless? No. A new class of tools comes to the rescue: static code analyzers. At this point, many experienced programmers purse their lips as if someone had slipped them a lemon :). We know these linters of yours... Lots of messages, little value... And what "new class of tools" is this?! We were running linters 20 years ago!
Nevertheless, I will venture to claim that this is indeed a new class of tools. What existed 10-20 years ago is not at all what we now call static analyzers. First, I am not talking about tools oriented toward code formatting. They also belong to the family of static analysis tools, but here we are talking about finding errors in code. Second, modern tools use sophisticated analysis technologies that take into account the relationships between functions and effectively execute certain sections of code virtually. These are not the 20-year-old linters built on regular expressions. In fact, a proper static analyzer cannot be built on regular expressions at all. To find errors, it relies on technologies such as data flow analysis, automatic method annotation, symbolic execution, and so on.
These are not abstract words but the reality I observe as one of the creators of the PVS-Studio tool. Check out this article to see how analyzers manage to find such interesting errors.
However, what matters far more is that modern static analyzers have extensive knowledge of error patterns; in this respect they know more than even professional developers. It has become too difficult to keep every nuance in mind while writing code. For example, unless you have specifically read about it somewhere, you would never guess that a call to the memset function intended to erase private data can sometimes disappear: from the compiler's point of view, the call is redundant and may be optimized away. Meanwhile, this is a serious security flaw, CWE-14, which is found literally everywhere.
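To make the pattern concrete, here is a minimal sketch of CWE-14 (the function and variable names are invented for the illustration). The first function shows the store a compiler may legally remove; the second shows a common workaround that writes through a volatile pointer, which the compiler must preserve:

```cpp
#include <cstddef>
#include <cstring>

// Sketch of CWE-14: 'password' is never read after the memset call,
// so the compiler may treat the call as a dead store and remove it,
// leaving the secret lingering in memory.
void risky_clear() {
    char password[64] = "secret";
    // ... use 'password' for authentication ...
    std::memset(password, 0, sizeof(password));  // may be optimized away
}

// Common workaround: writes through a volatile pointer count as
// observable behavior, so the compiler cannot elide them.
void secure_clear(void* p, std::size_t n) {
    volatile unsigned char* vp = static_cast<volatile unsigned char*>(p);
    while (n--) {
        *vp++ = 0;
    }
}
```

C11 Annex K also offers memset_s for exactly this purpose, and Windows provides SecureZeroMemory; both are specified so that the clearing cannot be optimized out.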
Or who, for example, knows what is dangerous about filling a container like this?

std::vector<std::unique_ptr<MyType>> v;
v.emplace_back(new MyType(123));

I suspect that not everyone will immediately realize that such code is potentially dangerous and can lead to memory leaks.
In addition to their extensive knowledge of error patterns, static analyzers are infinitely attentive and never get tired. For example, unlike a human, they are never too lazy to look into the header files to make sure that isspace is a real function and not a crazy macro that spoils everything. Such cases demonstrate the essence of what makes finding errors in large projects so hard: something changes in one place and breaks in another.
I am convinced that static analysis will soon become an integral part of DevOps, as natural and necessary as using a version control system. This is already happening gradually: at conferences devoted to the development process, static analysis is increasingly mentioned as one of the first lines of defense against bugs.
Static analysis acts as a kind of coarse filter. Hunting for silly errors and typos with unit tests or manual testing is inefficient. It is much faster and cheaper to catch them right after the code is written by detecting them with static analysis. This idea, along with the importance of running the analyzer regularly, is well described in the article "Embed static analysis into the process, and do not look for bugs with it".
Someone may say that there is no point in specialized tools, since compilers are also learning to perform such static checks. That is true. However, static analyzers do not stand still either, and as specialized tools they outperform compilers. For example, every time we check LLVM, we find errors there with PVS-Studio.
The world offers a large number of static code analysis tools. As they say, choose to your taste. Want to find a lot of errors and potential vulnerabilities while the code is still being written? Use static code analyzers and improve the quality of your codebase!
If you want to share this article with an English-speaking audience, please use this link: Andrey Karpov. Why Static Analysis Can Improve a Complex C++ Codebase.