Hi Habr!
The other day I once again ran into code like this:
```csharp
if (someParameter.Volatilities.IsEmpty())
{
    // We have to report broken channels, but we cannot tell this case apart
    // from a cold system that simply has not started yet.
    // So we write it to the log, and in an emergency IT Ops will be able to find the right line.
    Log.Info("Channel {0} is broken or was not started yet", someParameter.Key);
}
```
This code has one rather important property: the recipient would very much like to know what actually happened. In one case the system has a problem; in the other it is just warming up. The model, however, does not give us this information (to the convenience of the sender, who is usually also the author of the model).
Moreover, even the hint that “maybe something is wrong” comes only from the fact that the Volatilities collection is empty, which in some cases may be perfectly valid.
I’m sure most experienced developers have seen lines of code containing secret knowledge in the style of “if this combination of flags is set, then we are being asked to do A, B and C” (even though none of this is visible from the model itself).
From my point of view, this kind of saving on class structure has an extremely negative effect on the project in the long run, turning it into a pile of hacks and crutches and gradually transforming more or less maintainable code into legacy.
Important: the examples in this article are useful for projects developed by several people (not just one) that will be maintained and extended for at least 5-10 years. None of this makes sense if the project has a single developer for its whole life, or if no changes are planned after release. Which is logical: if the project is only needed for a couple of months, there is no point in investing in a clean data model.
However, if you are building something long-lived, read on.
Use the visitor pattern
Often the same field holds an object that can have different semantic meanings (as in the example above). However, to save on classes, the developer keeps a single type and augments it with flags (or with comments in the style of “if there is nothing here, then nothing was calculated”). Such an approach can mask an error (which is bad for the project, but convenient for the team that ships the service, because the bugs are not visible from the outside). A better option, which lets even the far end of the wire find out what is actually going on, is to use an interface plus visitors.
With this approach, the example from the beginning of the article turns into code like this:
```csharp
class Response
{
    public IVolatilityResponse Data { get; }
}

interface IVolatilityResponse
{
    TOutput Visit<TInput, TOutput>(IVolatilityResponseVisitor<TInput, TOutput> visitor, TInput input);
}

class VolatilityValues : IVolatilityResponse
{
    public Surface Data;

    public TOutput Visit<TInput, TOutput>(IVolatilityResponseVisitor<TInput, TOutput> visitor, TInput input)
        => visitor.Visit(this, input);
}

class CalculationIsBroken : IVolatilityResponse
{
    public TOutput Visit<TInput, TOutput>(IVolatilityResponseVisitor<TInput, TOutput> visitor, TInput input)
        => visitor.Visit(this, input);
}

interface IVolatilityResponseVisitor<TInput, TOutput>
{
    TOutput Visit(VolatilityValues instance, TInput input);
    TOutput Visit(CalculationIsBroken instance, TInput input);
}
```
With this approach:
- We need more code. Alas, if we want to express more information in the model, there has to be more of it.
- Because of the inheritance, we can no longer serialize Response to JSON / protobuf directly, since the type information is lost there. We will have to create a special container for that (for example, a class with a separate field for each implementation, of which only one is ever filled).
- Extending the model (that is, adding new classes) requires extending the IVolatilityResponseVisitor<TInput, TOutput> interface, which means the compiler will force every visitor to support the new type. The programmer cannot forget to handle it, otherwise the project will not compile.
- Thanks to static typing, we do not need to keep documentation somewhere describing the possible combinations of fields, etc. All the possible options are described in code that both the compiler and a human understand. There will be no desync between documentation and code, since we can do without the former.
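For illustration, here is a sketch of a concrete visitor (the names here are mine, not part of the original model) that turns a response into a log message; the compiler forces it to handle every implementation:

```csharp
// Hypothetical visitor: renders any IVolatilityResponse as a log message.
class LogMessageVisitor : IVolatilityResponseVisitor<string, string>
{
    public string Visit(VolatilityValues instance, string channelKey)
        => $"Channel {channelKey}: volatility surface received";

    public string Visit(CalculationIsBroken instance, string channelKey)
        => $"Channel {channelKey}: calculation is broken, alert IT Ops";
}

// Usage:
// var message = response.Data.Visit(new LogMessageVisitor(), someParameter.Key);
```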
On restricting inheritance in other languages
A number of other languages (for example, Scala or Kotlin) have keywords that prohibit inheriting from a type outside certain boundaries. Thus, at compile time we know all possible subtypes of our type.
In particular, the example above can be rewritten in Kotlin like this:
```kotlin
class Response(
    val data: VolatilityResponse
)

sealed class VolatilityResponse

class VolatilityValues(val data: Surface) : VolatilityResponse()

class CalculationIsBroken : VolatilityResponse()
```
The code turned out a little shorter, and now at compile time we know that all possible subtypes of VolatilityResponse live in the same file with it. That means the following code will not compile, since we did not cover all possible subtypes:
```kotlin
fun getResponseString(response: VolatilityResponse) = when (response) {
    is VolatilityValues -> response.data.toString()
}
```
However, it is worth remembering that such checks only work when when is used as an expression. The code below compiles without errors:
```kotlin
fun getResponseString(response: VolatilityResponse) {
    when (response) {
        is VolatilityValues -> println(response.data.toString())
    }
}
```
Identical primitive types do not always mean the same thing
Consider fairly typical development against a database. Most likely, somewhere in the code you will have object identifiers. For example:
```csharp
class Group
{
    public int Id { get; }
    public string Name { get; }
}

class User
{
    public int Id { get; }
    public int GroupId { get; }
    public string Name { get; }
}
```
It looks like standard code; the types even match those in the database. But the question is: is the code below correct?
```csharp
public bool IsInGroup(User user, Group group)
{
    return user.Id == group.Id;
}

public User CreateUser(string name, Group group)
{
    return new User
    {
        Id = group.Id,
        GroupId = group.Id,
        Name = name
    };
}
```
The answer is most likely no: in the first method we compare a user Id with a group Id, and in the second we mistakenly assign the id of the Group as the id of the User.
Oddly enough, this is quite simple to fix: just introduce the types GroupId, UserId and so on. The incorrect creation of a User will then no longer compile, since the types will not match. Which is incredibly nice, because you have managed to tell the compiler about the model.
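A minimal sketch of what the model might look like with such identifier types (constructors and equality members are omitted here; they are discussed below):

```csharp
// Hypothetical shape of the model once identifier types are introduced.
class Group
{
    public GroupId Id { get; }
    public string Name { get; }
}

class User
{
    public UserId Id { get; }
    public GroupId GroupId { get; }
    public string Name { get; }
}

// The erroneous assignment from CreateUser no longer compiles:
// a GroupId value cannot be stored in the UserId property.
// new User { Id = group.Id, GroupId = group.Id, Name = name };
```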
Moreover, methods whose parameters used to share the same primitive type become much harder to call incorrectly, since the parameter types no longer repeat:
```csharp
public void SetUserGroup(UserId userId, GroupId groupId)
{
    /* some sql code */
}
```
However, let us return to comparing identifiers. This is a little more involved, since you have to make the compiler reject comparisons of incomparable things at build time.
And you can do this as follows:
```csharp
class GroupId
{
    public int Id { get; }

    public bool Equals(GroupId groupId) => Id == groupId?.Id;

    [Obsolete("GroupId can be equal only to GroupId", error: true)]
    public override bool Equals(object obj) => Equals(obj as GroupId);

    public static bool operator ==(GroupId id1, GroupId id2)
    {
        if (ReferenceEquals(id1, id2))
            return true;
        if (ReferenceEquals(id1, null) || ReferenceEquals(id2, null))
            return false;
        return id1.Id == id2.Id;
    }

    [Obsolete("GroupId can be equal only to GroupId", error: true)]
    public static bool operator ==(object _, GroupId __)
        => throw new NotSupportedException("GroupId can be equal only to GroupId");

    [Obsolete("GroupId can be equal only to GroupId", error: true)]
    public static bool operator ==(GroupId _, object __)
        => throw new NotSupportedException("GroupId can be equal only to GroupId");

    // Note: the matching operator != overloads, GetHashCode and the constructor are omitted
    // for brevity; C# requires == and != to be declared in pairs, so a real implementation
    // must add them (see the remarks below).
}
```
As a result:
- We again needed more code. Alas, if you want to give more information to the compiler, you often need to write more lines.
- We have created new types (we will talk about optimizations below), which sometimes can slightly degrade performance.
- In our code:
  - We have made it impossible to mix up identifiers. Both the compiler and the developer now clearly see that a UserId value cannot be put into a GroupId field (or vice versa).
  - We have forbidden comparing the incomparable. Note that the comparison code above is not complete (it is also desirable to implement the IEquatable<GroupId> interface, and you must implement the GetHashCode method), so the example should not simply be copied into a project. The idea itself, however, is clear: we explicitly told the compiler which comparisons make no sense. That is, instead of asking “are these fruits equal?”, the compiler now sees “is a pear equal to an apple?”
A little more about SQL and constraints
Applications of this kind often come with additional rules that are easy to verify. In the worst case, some functions look like this:
```csharp
void SetName(string name)
{
    if (string.IsNullOrEmpty(name)
        || !char.IsLetter(name[0])
        || !char.IsUpper(name[0])
        || name.Length > MAX_NAME_COLUMN_LENGTH)
    {
        throw new ArgumentException("Invalid name", nameof(name));
    }
    /* ... */
}
```
That is, the function accepts a much wider type than it actually supports, and then runs the checks. This is generally a bad approach, because:
- We did not explain to the programmer and compiler what we want here.
- In another similar function, you will need to copy the checks.
- When we received a string that is supposed to represent a name, we did not fail immediately; for some reason we continued execution, only to fail a few processor instructions later.
The correct behavior:
- Create a separate type (in our case, apparently, Name).
- Put all the necessary validations and checks into it.
- Wrap the string in a Name as early as possible, so that an error surfaces as early as possible.
As a result, we get:
- Less duplicated code, since the checks for name now live in the constructor and are not copied into every method.
- A Fail Fast strategy: having received a problematic name, we fail immediately instead of calling a couple more methods and failing anyway. Moreover, instead of a database error along the lines of “value too long for column”, we find out right away that there is no point in even starting to process such a name.
- It is already much harder to mix up the arguments when the function signature is void UpdateData(Name name, Email email, PhoneNumber number). After all, we now pass not three identical strings, but three different entities (a sketch of such a Name type follows after this list).
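To make this concrete, here is a minimal sketch of such a Name type. The constraints (first letter a capital, maximum length) are taken from the SetName example above, and the column limit is an assumed value; a more general version with a shared base class appears in the next section.

```csharp
using System;

sealed class Name
{
    private const int MaxNameColumnLength = 100; // assumed column limit

    public string Value { get; }

    public Name(string value)
    {
        // Fail fast: an invalid name never makes it into the model.
        if (string.IsNullOrEmpty(value)
            || !char.IsLetter(value[0])
            || !char.IsUpper(value[0])
            || value.Length > MaxNameColumnLength)
        {
            throw new ArgumentException("Invalid name", nameof(value));
        }

        Value = value;
    }
}
```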
A bit about casting
Having introduced fairly strict typing, we should not forget that when sending data to SQL we still need to get at the real underlying value. In this case, it makes sense to slightly extend the types that wrap a single value:
- Add an implementation of an interface of the form interface IValueGet<TValue> { TValue Wrapped { get; } }. The SQL translation layer can then obtain the underlying value directly (a sketch of such a helper follows after the code below).
- Instead of creating a bunch of more or less identical types, you can make an abstract ancestor and inherit the rest from it. The result is code of the form:
```csharp
interface IValueGet<TValue>
{
    TValue Wrapped { get; }
}

abstract class BaseWrapper<TValue> : IValueGet<TValue>
{
    protected BaseWrapper(TValue initialValue)
    {
        Wrapped = initialValue;
    }

    public TValue Wrapped { get; private set; }
}

sealed class Name : BaseWrapper<string>
{
    public Name(string value) : base(value)
    {
        /* all necessary validations */
    }
}

sealed class UserId : BaseWrapper<int>
{
    public UserId(int id) : base(id)
    {
        /* all necessary validations */
    }
}
```
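For example, a hypothetical helper in the data access layer could unwrap any such type when building a command (a sketch, assuming ADO.NET's IDbCommand; the method name is mine):

```csharp
using System.Data;

static class CommandExtensions
{
    // Unwrap any IValueGet<T> and pass the raw value to the SQL parameter.
    public static void AddWrappedParameter<T>(this IDbCommand command, string name, IValueGet<T> value)
    {
        var parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value.Wrapped;
        command.Parameters.Add(parameter);
    }
}

// Usage: command.AddWrappedParameter("@userId", userId);
```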
Performance
When talking about creating a large number of types, two stock counterarguments often come up:
- The more types, nesting and IL code there is, the slower the software, since it is harder for the JIT to optimize the program. Therefore, this kind of strict typing will seriously slow the project down.
- The more wrappers there are, the more memory the application consumes. Therefore, adding wrappers will seriously increase RAM requirements.
Strictly speaking, both arguments are usually given without any supporting facts. Nevertheless:
- In fact, in most applications (in Java, for example) the bulk of memory is taken up by strings and byte arrays. So creating wrappers is generally unlikely to be noticeable to the end user. Moreover, this kind of typing gives an important bonus: when analyzing a memory dump you can evaluate the contribution of each of your types to memory usage. Instead of an anonymous pile of strings scattered across the project, you see which kinds of objects take up the most space. And since only the wrappers hold strings and other heavy objects, it is easier to understand how much each particular wrapper type contributes to total memory.
- The argument about JIT optimization is partly true, but incomplete. Thanks to strict typing, your software sheds numerous checks at function entry points: all your models are validated as they are constructed. So in the general case you end up with fewer checks (it is enough to simply require the correct type). In addition, because the checks are concentrated in constructors instead of being smeared across the code, it becomes easier to determine which of them actually take time.
- Unfortunately, I cannot give a full-fledged performance test in this article comparing a project built on a large number of microtypes against classical development that uses only int, string and other primitive types. The main reason is that you would first have to build a typical large project for the test, and then justify that this particular project is representative. The second part is the hard one, since real-life projects genuinely differ. Purely synthetic tests would also be rather strange because, as I already said, in my measurements the creation of microtype objects in enterprise applications has always consumed a negligible amount of resources (at the level of measurement error).
How can code consisting of a large number of such microtypes be optimized?
Important: you should take on such optimizations only when you have hard evidence that it is the microtypes that slow the application down. In my experience this practically never happens. It is far more likely that, say, the logger slows you down because every operation waits for a flush to disk (everything looked acceptable on the developer's machine with an M.2 SSD, but a user with an old HDD sees completely different results).
That said, here are the techniques themselves:
- Use value types instead of reference types. This can help when the wrapper itself wraps a value type, which means that in theory all the necessary information can be passed on the stack (see the sketch after this item). Keep in mind, though, that this only brings a speedup if your code genuinely suffers from frequent GC caused by the microtypes.
  - A struct in .NET can cause frequent boxing / unboxing. At the same time, such structures may require more memory in Dictionary / Map collections (since their internal arrays are allocated with spare capacity).
  - inline types in Kotlin / Scala have limited applicability. For example, you cannot store multiple fields in them (which can sometimes be useful for caching the ToString / GetHashCode value).
  - A number of optimizers can allocate memory on the stack. In particular, .NET does this for small temporary objects, while GraalVM for Java can allocate an object on the stack and then copy it to the heap if it has to be returned (which suits code rich in branches).
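As a sketch of the first point: instead of the class-based wrapper shown earlier, an identifier could be a value type, which avoids a separate heap allocation per id (an alternative representation, not something to combine with the class version):

```csharp
using System;

// Sketch: a value-type identifier avoids a heap allocation per id.
// Implementing IEquatable<UserId> lets Dictionary<UserId, ...> compare keys without boxing.
readonly struct UserId : IEquatable<UserId>
{
    public int Id { get; }

    public UserId(int id) => Id = id;

    public bool Equals(UserId other) => Id == other.Id;
    public override bool Equals(object obj) => obj is UserId other && Equals(other);
    public override int GetHashCode() => Id;

    public static bool operator ==(UserId left, UserId right) => left.Equals(right);
    public static bool operator !=(UserId left, UserId right) => !left.Equals(right);
}
```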
- Use interning of objects (that is, try to reuse ready-made, previously created objects).
- If the constructor has a single argument, you can simply make a cache where the key is that argument and the value is the previously created object (see the sketch after this item). Then, if the variety of objects is fairly small, you can simply reuse the ready-made ones.
- If an object has several arguments, then you can simply create a new object, and then check to see if it is in the cache. If there is a similar one, then it is better to return the already created one.
- Such a scheme slows down constructors, since Equals / GetHashCode have to be computed over all the arguments. On the other hand, it also speeds up future comparisons of objects if you cache the hash value: if the hashes differ, the objects are different, and identical objects will often share a single reference.
- So this optimization can speed up the program thanks to the faster GetHashCode / Equals (see the point above). In addition, the lifetime of newly created objects whose equivalents are already in the cache drops dramatically, so they never leave Generation 0.
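A minimal sketch of such a single-argument cache, using the hypothetical Name wrapper from earlier (a real implementation would also need an eviction policy so the cache does not grow without bound):

```csharp
using System.Collections.Concurrent;

static class NameCache
{
    private static readonly ConcurrentDictionary<string, Name> Cache =
        new ConcurrentDictionary<string, Name>();

    // Reuse an existing instance when one has already been created for this value.
    public static Name Get(string value) => Cache.GetOrAdd(value, v => new Name(v));
}

// Usage: var name = NameCache.Get("Alice");
```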
- When creating new objects, check the input parameters rather than adjusting them. Although this advice usually lives in the coding-style section, it actually makes the program more efficient. For example, if your object requires a string consisting only of CAPITAL LETTERS, two approaches are commonly used for the check: either call ToUpperInvariant on the argument, or check in a loop that every letter is uppercase (a sketch of the loop check follows below). In the first case a new string is guaranteed to be allocated; in the second, at most an iterator. As a result you save memory (in both cases every character is still examined, so the performance gain shows up only as less frequent garbage collection).
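A sketch of the second approach: validating in a loop instead of normalizing, so no new string is allocated (the helper name is mine):

```csharp
static bool IsAllUpperCase(string value)
{
    // Examine each character; unlike ToUpperInvariant, no new string is created.
    foreach (var c in value)
    {
        if (char.IsLetter(c) && !char.IsUpper(c))
            return false;
    }
    return true;
}
```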
Conclusion
Let me repeat the important point from the beginning: everything described in this article makes sense for large projects that are developed and used for years, projects where it pays to reduce the cost of support and of adding new functionality. In other cases it is often more reasonable to ship the product as quickly as possible without bothering with tests, models and “good code”.
However, for long-term projects it is reasonable to use the strictest typing available, so that the model itself describes exactly which values are possible in principle.
If your service can sometimes return a broken result, express that in the model and show it to the developer explicitly, instead of adding a thousand flags described only in the documentation.
If two types look identical in the program but represent different business entities, define them as different types. Do not mix them, even if their fields have the same types.
If you have performance concerns, apply the scientific method and run a benchmark (better yet, ask an independent person to verify it). That way you will actually speed up the program instead of just wasting the team's time. The opposite is also true: if there is a suspicion that your program or library is slow, run a benchmark. Do not just say that everything is fine; show it in numbers.