This is the second and last part of my abbreviation cheat sheet that a C ++ developer should know. C ++ is mentioned here only because I created the cheat sheet primarily for myself, but I am just the same C ++ developer.
In fact, this part contains concepts whose scope is not limited to C ++. So the selection may be of interest to a wider audience.
As in the
first part , abbreviations are grouped, if that makes sense. If it makes no sense, they are listed alphabetically.
Concurrency and atomic operations:•
CAS•
ABA•
FAA•
RCUData storage:•
ACID•
CAP•
PACELC•
BASESoftware development principles:•
DRY•
KISS•
YAGNI•
NIH•
FTSE•
GRASP•
SOLIDOther:•
ABI•
COW•
FBC, FBCP•
LRUConcurrency and Atomic Operations
Cas
Compare And Swap. Comparison with the exchange. This is an atomic instruction with three arguments: atomic variable or memory address, expected value, new value. If and only if the value of the variable matches the expected value, the variable receives a new value and the instruction completes successfully.
CAS either simply returns a Boolean value (and then it can be called Compare And Set), or if it fails, it also returns the current value of the first argument.
Pseudo codebool cas(int* addr, int& expected, int new_value) { if (*addr != expected) { expected = *addr; return false; } *addr = new_value; return true; }
In C ++,
CAS is represented by the families of methods
std::atomic<T>::compare_exchange_weak
,
std::atomic<T>::compare_exchange_strong
and free functions
std::atomic_compare_exchange_weak
,
std::atomic_compare_exchange_strong
. The difference between
* weak and
* strong is that the former can produce false negative results. Those. if the value is expected, they will return
false
and will not replace it with a new one. The reason for the existence of
* weak operations is because on some architectures
* strong is relatively expensive. In most cases,
CAS instructions spin in a loop (the so-called CAS loop), so using
* weak instead of
* strong will not change the logic, but it can improve performance.
CAS instructions are used to implement synchronization primitives (such as mutexes and semaphores) and lock-free algorithms. Often lead to an
ABA problem.
Read more:
one (Russian) ,
two (English)ABA
ABA problem. ABA problem. This problem arises in parallel algorithms based on comparisons with exchanges (see
CAS ), for example, in lock-free algorithms. The bottom line is that the thread reads the value of the atomic variable, does something else and updates this variable through comparison with the exchange. Those. the flow logic is as follows: if the variable still contains the previous value, then nothing has changed, everything is in order. But this may not be so. A more formal description of the problem:
- thread 1 reads the value of the variable, it is equal to A
- thread 1 is being forced out, thread 2 is starting
- thread 2 changes the value of the variable from A to B, makes a bunch of changes (changes some value associated with the variable or just frees memory), and then again changes the value - from B to A
- thread 1 resumes operation, compares the previously obtained value with the current one and concludes that nothing has changed
Possible solutions to the problem:
- The simplest and most obvious is to use locks. This will result in the usual thread-safe algorithm with critical sections. But it will cease to be free of locks. But if it came to CAS and ABA , then this is most likely not an option.
- Add special labels to the compared values. For example, a counter of the number of changes. On the one hand, this counter may overflow, but on the other, modern x86_64 processors support 128-bit CAS operations. Those. when comparing pointers to a counter, you can give up to 64 bits, and someone figured that this is enough for 10 years of continuous operation of the algorithm.
- Some architectures (ARM, for example) provide LL / SC (load linked, store conditional) instructions that not only allow you to get the current value of the address in memory, but also to understand whether this value has changed since the last read.
For use in data-free data structures such as stack, list, and queue - in general, where there is a risk of staying with a hanging pointer to a remote node - there is a whole family of solutions to the
ABA problem based on deferred removal of nodes. This includes the garbage collector, hazard pointers and the read-modify-write mechanism (see
RCU ).
Read more:
one (Russian) ,
two (English) ,
three (English)FAA
Fetch And Add. Ahem ... get and add (it seems this concept is never translated into Russian). An atomic operation with two arguments: an atomic variable or an address in memory, and the value by which this variable must be changed. If the architecture allows, the operation returns the previous value of the changed variable (x86 allows since i486). Unlike
CAS , the
FAA is always successful.
Pseudo code int faa(int* addr, int diff) { int value = *addr; *addr = value + diff; return value; }
In C ++, it is implemented as the families of methods
std::atomic<T>::fetch_add
,
fetch_sub
,
fetch_and
,
fetch_or
,
fetch_xor
and the corresponding free functions
std::atomic_fetch_add
, etc.
As befits an atomic instruction, the
FAA is used in implementations of synchronization primitives and lock-free algorithms and data structures.
Read more:
one (Russian) ,
two (English)RCU
Read-Copy-Update. Read-modify-write. This is a non-blocking mechanism for synchronizing access to the data structure (lock-free of course). It is used in cases where the reading speed is critical. It is an example of the tradeoff of time and memory (space-time tradeoff).
The idea of
RCU is that the writer stream does not change the data in place, but creates a copy, makes the necessary change in it and atomically swaps the current data and the changed copy. At the same time, reader threads constantly have access to data - old or new, whoever has time. When there are no readers left working with the obsolete version, the writer deletes data that is no longer needed, freeing up memory.
Very simplified RCU works like this:
- Many readers, one writer.
- Reading and change occur simultaneously.
- Readers use very lightweight synchronization. In fact, the reader only has to notify the writer at the moment of entering the critical section and at the moment of leaving it. Work with synchronized data occurs only in the critical section.
- The writer, as soon as he atomically replaced the data with a copy, announces the beginning of the so-called grace-period (grace period). The grace period ends when all readers who were in critical sections at the beginning of this period left their critical sections. Now the writer can safely delete obsolete data. It is implied that all critical sections are finite, which guarantees the finiteness of the grace period.
RCU is great for data that is frequently read and rarely updated. This mechanism is actively used in the Linux kernel, where it is quite simple to determine when the grace period ends.
Disadvantages:
- Poor for synchronizing access to frequently modified data.
- Difficult to implement in user space.
- Depends on the ability to atomically change pointers to an address in memory, but not all architectures provide this capability.
Read more:
one (Russian) ,
two (English)Data storage
ACID
Atomicity, Consistency, Isolation, Durability. Atomicity, consistency, isolation, durability. This is a set of transaction requirements in a DBMS.
ACID provides reliable and predictable DBMS operation even in case of errors.
- Atomicity ensures that the transaction either completes completely or does nothing. An intermediate state is impossible, it will not be such that one transaction operation was successful, and the other does not. All or nothing.
- Consistency ensures that all data in the database satisfies all the specified rules and restrictions, both before the transaction begins and after its completion. During the execution of the transaction, consistency may be violated.
- Isolation ensures concurrent transactions do not affect each other. No transaction has access to inconsistent data processed by another transaction.
- Durability means that the result of a successful transaction is stored in the database and cannot be lost, no matter what happens to the database immediately after the transaction is completed.
All major relational DBMSs fully support
ACID . In the NoSQL world, such full support is more likely an exception.
Read more: one
time (English) ,
two (English)Cap
CAP theorem. CAP Theorem. The theorem says that any distributed system can have no more than two properties from the list: data consistency (
Consistency ), availability (
Availability ), resistance to separation (
Partition tolerance ).
- Consistency in this case means (simplified) consistent consistency. Those. as soon as the data update operation on one node has completed successfully, all other nodes already have this updated data. Accordingly, all nodes are in a consistent state. This is not the consistency required by ACID .
- Accessibility means that every failed node returns a correct answer to each request (both read and write) in a reasonable amount of time. There is no guarantee that the responses from different nodes match.
- Resistance to separation means that the system will continue to work correctly in case of loss of an arbitrary number of messages between its nodes.
Because all three properties are unattainable; from the point of view of the
CAP theorem, all distributed systems fall into three classes:
CA ,
CP, and
AP .
CA systems are obviously not partitioned resistant. Because in the vast majority of cases, distributedness implies distributedness over a real network, and in a real network there is always a non-zero probability of losing a packet, then
CA systems are of little interest.
The choice is between
CP and
AP , i.e., between consistency and availability. Traditional relational DBMSs that follow
ACID principles prefer consistency. While many NoSQL solutions choose accessibility and
BASE .
In the case of normal operation of the network, i.e., when there is no network separation, the
CAP theorem does not impose any restrictions on consistency and availability. Those. donating something is not necessary.
Read more:
one (Russian) ,
two (English) ,
three (English)Pacelc
PACELC theorem. PACELC Theorem. This is an extension of the
CAP theorem, which states that a distributed system in the case of
Partition is forced to choose between
Availability and
Consistency , and in the case of normal network operation (
Else ), you have to choose between
Latency and
Consistency )
Accordingly, if the
CAP theorem distinguished 2 classes of systems that are stable with network separation, then
PACELC has 4 of them:
PA / EL ,
PA / EC ,
PC / EL and
PC / EC . Some NoSQL databases can change their class depending on the settings.
Read more:
one (Russian) ,
two (English)BASE
Basically Available, Soft state, Eventual consistency. Basic accessibility, unstable state, consistency in the long run. According to the
CAP theorem in distributed systems, something will have to be abandoned. Strict coherence is usually abandoned in favor of coherence in the long run. Which means that in the absence of data changes, the system will someday eventually come to a consistent state.
To indicate such a compromise, the abbreviation
BASE began to be used somewhat
tightly , and a game of chemical terms turned out (
ACID - acidity,
BASE - basicity).
- Basically Available means that the system guarantees data availability, it responds to every request. But the answer may be outdated or inconsistent data (or lack thereof)
- Soft state means that the state of the system can change over time even in the absence of requests for data changes. Because at any point in time, data can be brought into a consistent state.
- Eventual consistency means that if the data stops changing, they will of course end up in a consistent state. Those. the same request to different nodes will lead to the same answers.
Read more: one
time (English) ,
two (English)Software Development Principles
DRY
Don't Repeat Yourself. Do not repeat. This is the principle of software development, the main idea of which is to reduce the amount of duplicated information in the system, and the goal is to reduce the complexity of the system and increase its manageability.
In the original (
The Pragmatic Programmer ,
Andrew Hunt and
David Thomas ), this principle is formulated as follows: “Each piece of knowledge should have a single, consistent and authoritative representation within the system.” In this case, knowledge is understood to mean any part of the subject area or algorithm: code, database design, some interaction protocol, etc. Thus, in order to make one change to the system, only one “knowledge” needs to be updated in one place.
A dumb example: the client and server transmit structured data to each other. Because these are different applications running on different machines, then they both must have their own implementations of these structures. If something changes, changes will have to be made in two places. An obvious step to avoid this repetition is to allocate the common code into a separate library. The next step is to generate it according to the description of structures (Google Protocol Buffers, for example), so as not to write the same type of code to access the fields of structures.
Code duplication is just a special case of
DRY violation. And that is not always the case. If two pieces of code look the same, but each implement their own business logic,
DRY is not broken.
Like any other principle,
DRY is a tool, not a dogma. The larger the system, the easier it is to break this principle. First, the ideal is unattainable. And secondly, if blindly following
DRY leads to more complicated code and makes it more difficult to understand, then it is better to abandon it.
Read more:
one (Russian) ,
two (Russian)KISS
Keep It Simple, Stupid. Make it easier (stupid in this case is not a call). This is the design principle that most systems work better if they remain simple. The principle originated in the aircraft industry and is applied a lot where, including in software development.
In the latter case,
KISS is useful both in designing and in writing code directly. Simple architecture and code are not only easier to understand, they are also easier to use, maintain and develop. MapReduce should not be fooled if a producer-consumer pair is enough. The magic of metaprogramming is excessively complex if you can get by with a couple of ordinary functions and achieve the required level of performance.
With all this, one must not forget that simplicity is not a goal, but only a non-functional requirement. The main thing is to achieve the goal of design / implementation, and it is advisable to do it as simple as possible.
Read more:
one (Russian) ,
two (Russian) ,
three (English)YAGNI
You Ain't Gonna Need It. You don’t need it. This is the principle of software development, the main idea of which is to abandon excessive functionality, and the goal is to save resources spent on development.
YAGNI says it’s not necessary to design or implement functions that are not needed right now. Even if you are sure that they will be needed in the future. No need to add extra abstractions, the benefits of which will appear sometime later.
The problem is that people do not predict the future well. Therefore, most likely everything done "in reserve" simply will not come in handy. And it turns out that the time and money spent on such development, testing, and documentation are wasted. Plus, the software has become more complicated; additional resources have to be spent on supporting it again. Even worse, when it turns out that it was necessary to do differently. Money and time will be spent also on correction.
The
YAGNI principle is
somewhat more radical than
DRY and
KISS . If they break the system into understandable parts and make decisions simple, then
YAGNI simply cuts out unnecessary parts and solutions.
Read more:
one (Russian) ,
two (English) ,
three (English)Nih
Not Invented Here. Not invented here. This is a syndrome of rejection of other people's developments, a position that practically borders on the invention of a bicycle. Often, the syndrome is accompanied by the belief that creating technology within the company will be both faster and cheaper, and the technology itself will better meet the needs of the company. However, in most cases this is not the case, and
NIH is an anti-pattern.
Here are a few cases justifying
NIH :
- The quality of the third-party solution is not high enough, or the price is not low enough.
- A third-party solution has licensing restrictions.
- Using a third-party solution creates dependence on its supplier and thus threatens the business.
Read more:
one (Russian) ,
two (English) ,
three (English)Ftse
Fundamental Theorem of Software Engineering. Fundamental theorem of software development. In fact, this is not a theorem; it has no proof. This is a famous saying by Andrew Koenig:
Any problem can be solved by adding another layer of abstraction.
Sometimes they add to this phrase "... except for the problem of too many layers of abstraction." In general, the “theorem" is not a serious thing, but it's worth knowing about it.
Read more: one
time (English) ,
two (English)GRASP
General Responsibility Assignment Software Patterns. General responsibility allocation templates. These nine patterns are formulated in
Applying UML and Patterns by
Craig Larman . Each template is a typical solution to one (but rather general) software design problem.
- Information Expert . Problem: what is the general principle of the distribution of responsibilities between objects? Solution: assign a duty to someone who has the information required to fulfill this obligation.
- Creator Problem: who should be responsible for creating a new object? Solution: class B must create instances of class A if one or more of the following conditions is true:
- class B aggregates or contains instances of A
- B writes A
- B actively uses A
- B has initialization data A - Low Coupling Problem: how to reduce the impact of change? How to increase the possibility of reuse? Solution: distribute responsibilities so that connectivity is low. Coupling is a measure of how rigidly the elements are connected, how much they depend on one another. Those. It is recommended to connect objects so that they know about each other only the necessary minimum.
- High gearing ( High Cohesion ). Problem: How do I manage complexity? Solution: distribute responsibilities so that high engagement is maintained. High engagement means that the responsibilities of one element are focused on one area.
- Controller Problem: who should be responsible for handling input events? Solution: designate a class as responsible, which either represents the whole system or subsystem as a whole (external controller), or one specific script (script or session controller). At the same time, the controller does not implement a reaction to events; it delegates this to the relevant executors.
- Polymorphism ( Polymorphism ). Problem: how to handle different behaviors based on type? Solution: if the behavior depends on the type, assign as responsible the type that implements this or that behavior, access the types through a generalized interface.
- ( Pure Fabrication ). : , ? : , .
- ( Indirection ). : , Low Coupling ? : .
- Resistance to changes ( Protected Variations ). Problem: how to design objects and subsystems so that changes in them do not have an undesirable effect on other elements? Solution: find possible instability points and create a stable interface around them, communicate only through such an interface.
GRASP patterns constantly intersect with Gang Four patterns and SOLID principles . This is normal, because they all solve a common problem - to simplify the creation of high-quality software.Read more: one (Russian) , two (English) , three (Russian)SOLID
The principles of SOLID. These are the five principles of object-oriented programming and design (the abbreviation is made from the first letters of their names). Gained fame thanks to Robert Martin (Robert Martin) in the early 2000s. The main goal of the principles is to create software that is easy to understand, maintain and expand.- The principle of single responsibility ( Single Responsibility Principle, the SRP ). A class or module should have only one responsibility. “Do only one thing, but do it well.”
- / ( Open Closed Principle, OCP ). (, , ) , . : , .
- ( Liskov Substitution Principle, LSP ). , . Those. - .
- ( Interface Segregation Principle, ISP ). , . , . , , . , .
- Dependency Inversion Principle ( of Dependency Inversion Principle, the DIP ). Upper level modules should not depend on lower level modules. All modules should depend on abstractions. Abstractions should not depend on the details. Details should depend on abstractions. For example, neither interfaces nor concrete classes should contain or accept other concrete classes as arguments of their methods, but only interfaces (in the sense of Java and C #).
Read more: one (Russian) , two (English) , three (English)Other
Abi
Application Binary Interface. Binary application interface. This is a set of conventions that defines the interaction of binary modules (executable files, libraries, OS). Two modules must be created in compliance with one ABI - this is a necessary condition for their binary compatibility, in this case they can interact without problems (for example, the executable file is linked to the library and executed by the operating system).Examples of ABIs are ELF executable file formats on Linux and PE on Windows. Each OS expects that the necessary data (resources, entry point, etc.) are located in a binary file according to the corresponding format. Obviously, ELF and PE are different, because Linux programs do not run directly on Windows and vice versa.At the level of libraries and executable files, ABI can determine the placement of fields within a class, base classes inside descendants, the mechanism for implementing virtual functions, the format of the call stack frame, the rules for passing arguments to the called function, etc. etc.C ++ does not have a single standard ABI , which is not surprising, because it depends on the architecture and OS. For example, C ++ compilers for many Unix-like operating systems (Linux, FreeBSD, MacOS) on x86_64 follow System V AMD64 ABI , on ARM - ARM C ++ ABI . Visual C ++ ABI is not officially published, but at least partially re-engineered. It is very different from System V ABI, they have completely different rules for hiding names (mangling) andpassing arguments to the called function (Linux uses 6 registers, Windows uses 4 registers), and a bunch of other differences.Even if the API and ABI remain the same, and only the implementation details change, binary compatibility may be broken . For example, in C ++ 11, there was a requirement for strings to store characters sequentially (as in a vector). Because of this, GCC 5 had to change the implementation of strings ( COW was used there before ), which led to binary incompatibility.Read more: time (Russian) , two (English) and all links from the previous two paragraphs.Cow
Copy On Write. Copy on recording. This is a resource management mechanism, also known as implicit sharing and lazy copy. The idea is that when a copy is required, the resource is not actually copied, but a link to it is created. And only when there is a request for changes - in the original or in the "copy" - only then does the creation of a full copy.The advantage of COW is obvious: copying any object occurs instantly. If objects are often copied but rarely changed, performance gains can be significant.Examples of using COW :- Virtual process memory management in Linux. When the
fork()
pages are called up, the process memory is not copied, but only marked as shared. - Snapshots in some file systems (Btrfs, ZFS) and databases (MS SQL Server).
- Prior to C ++ 11, some implementations
std::string
used COW . In C ++ 11, the requirements for std::string
changed (see ABI ). - Many types in Qt use COW .
Read more: one time (English) , two (English)FBC, FBCP
Fragile Base Class (Problem). The problem of a fragile base class. This is a fundamental OOP problem, its essence is that a correct change to the base class can lead to an error in one of the heirs.For example, to infinite recursion struct Base { virtual void method1() {
You can solve the FBC problem only by abandoning inheritance in favor of composition, for example, or extending interfaces in Java terminology (in C ++, this will be inheriting only to abstract base classes without state and method implementations). In other cases, you can only try to minimize the likelihood of FBCP with the following tips:- Prohibit inheritance or redefinition where they are not needed (keyword
final
in C ++ and Java). - The heir should not have access to the interiors of the base class, communication can only go through the public interface.
- The heir method can call only those virtual methods that are called in the redefined method of the base class, and the redefined method itself.
Read more: one (English) , two (English) , three (English)LRU
Least Recently Used. Extrusion is long unused. This is one of the caching algorithms (they are also preemptive policies). In general, the cache can be considered a fast storage of key-value pairs, one of the main characteristics of which is the hit ratio. The higher this level, the more often the desired value is in the fast cache and the less often it needs to be found in slow storage. But since memory is never rubbery, cache size has to be limited. The task of caching algorithms is to determine which element to throw out of the filled cache, if necessary, so as to maximize the level of hits.LRUreplaces their cache element, which no one has accessed the longest. This is perhaps the most famous caching algorithm. Probably due to the combination of efficiency and simplicity. The memory consumption of LRU is O (n), the average access time to the value is O (1), the average time to add an element is also O (1). For implementation, a hash table and a doubly linked list are usually used.For example, so template <class K, class V> class LRU { private: using Queue = std::list<std::pair<K, V>>; using Iterator = typename Queue::iterator; using Hash = std::unordered_map<K, Iterator>; Queue queue_; Hash hash_; const size_t limit_; public: LRU(size_t limit) : limit_(limit) { } std::optional<V> get(const K& key) { const auto it = hash_.find(key); if (it == hash_.end()) { return {}; } it->second = reorder(it->second); return { it->second->second }; } void add(K&& key, V&& value) { if (hash_.size() >= limit_) { pop(); } queue_.emplace_front(std::move(key), std::move(value)); const auto it = queue_.begin(); hash_[it->first] = it; } private: Iterator reorder(Iterator it) { queue_.emplace_front(std::move(it->first), std::move(it->second)); queue_.erase(it); return queue_.begin(); } void pop() { hash_.erase(queue_.back().first); queue_.pop_back(); } };
The obvious drawback of LRU is the large memory consumption, because it uses two structures of n elements each. In addition to LRU, there are many other caching algorithms for a variety of cases: MRU (Most Recently Used), LFU (Least Frequently Used), Segmented LRU, 2Q, etc.Read one more time : once (English) , two (Russian) , threePS
If I missed something or was mistaken somewhere - write in the comments.