Exploring the depths of type annotations in Python. Part 2

Today we are publishing the second part of the translation of an article on type annotations in Python.

The first part

How does Python support data types?

Python is a dynamically typed language. This means that the types of variables are checked only at run time. The example in the previous part of the article showed that a Python programmer does not need to plan variable types in advance or think about how much memory the data will need.

Here's what happens when you prepare your Python code for execution: “In Python, the source code is converted using CPython to a much simpler form called bytecode. The bytecode consists of instructions that are, in essence, similar to processor instructions. However, they are executed not by the processor but by a software system called a virtual machine. (This does not mean the kind of virtual machine that can run an entire operating system. Here it is an environment that is a simplified version of the environment available to programs running on the processor.)”
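
To look at this bytecode directly, the standard library's dis module can disassemble a function. The following is a minimal sketch: the add function is a made-up example, and the exact instruction names differ between CPython versions.

```python
import dis


def add(a, b):
    # A deliberately tiny function so the bytecode stays short.
    return a + b


# Collect the names of the bytecode instructions CPython generated for add().
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# dis.dis(add) prints the same information as a human-readable listing.
dis.dis(add)
```

The listing contains no type information about a or b: the instruction that performs the addition is the same whatever objects are passed in.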

How does CPython know what the variable types should be when it prepares a program for execution? After all, we never specified them. It does not know. It only knows that variables are objects. Everything in Python is an object, at least until it turns out that something has a more specific type.

For example, Python treats everything enclosed in single or double quotes as a string. If Python encounters a number, it treats the value as a numeric type. If we try to do something with a value that its type does not support, Python will tell us only later, when that code runs.
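
This can be checked in a few lines. A small sketch; the variable attempts is added here purely for illustration.

```python
name = 'Vicki'      # quoted text becomes a str
seconds = 4.71      # a number with a decimal point becomes a float
attempts = 3        # a whole number becomes an int

for value in (name, seconds, attempts):
    # Every value is an object, and each one also carries a more specific type.
    print(repr(value), type(value).__name__, isinstance(value, object))
```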

Consider the following error message when trying to add a string and a number:

    name = 'Vicki'
    seconds = 4.71;

    ---------------------------------------------------------------------------
    TypeError                 Traceback (most recent call last)
    <ipython-input-9-71805d305c0b> in <module>
          3
          4
    ----> 5 name + seconds

    TypeError: must be str, not float

The system tells us that it cannot add a string and a floating-point number. Moreover, the fact that name is a string and seconds is a number did not interest the system until an attempt was made to add name and seconds.

In other words, it can be described as follows: “Duck typing is used when performing addition. Python is not interested in which type a particular object has. All that the system is interested in is whether the call to the addition method returns something meaningful. If it does not, an error is raised.”
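
The behaviour described in the quote can be reproduced with a small class. This is only a sketch: Duration is a hypothetical class invented for the example, not something from the article or the standard library.

```python
class Duration:
    """Hypothetical class that knows how to add a number to itself."""

    def __init__(self, seconds):
        self.seconds = seconds

    def __add__(self, other):
        # Returning NotImplemented tells Python this operand cannot handle the addition.
        if isinstance(other, (int, float)):
            return Duration(self.seconds + other)
        return NotImplemented


total = Duration(4.71) + 1.29   # works: Duration.__add__ returns a meaningful result
print(total.seconds)

try:
    'Vicki' + 4.71              # str's addition cannot handle a float
except TypeError as error:
    print('TypeError:', error)
```

Python never asks what Duration is. It only calls the addition method and raises an error when no operand can produce a result.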

What does that mean in practice? When we write programs in Python, we do not get an error message until the CPython interpreter actually executes the line that contains the error.
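
The sketch below shows how long such a mistake can stay hidden. The report function and its verbose flag are made up for the example.

```python
def report(name, seconds, verbose=False):
    if verbose:
        # Bug: str + float. Python notices only when this line actually runs.
        return name + seconds
    return name


print(report('Vicki', 4.71))          # the faulty line is never reached

try:
    report('Vicki', 4.71, verbose=True)
except TypeError as error:
    print('Failed only now:', error)
```

As long as nobody passes verbose=True, the program appears to work, which is exactly why such errors tend to surface late.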

This approach has proved inconvenient for teams working on large projects. Such projects deal not with individual variables but with complex data structures, and their functions call other functions, which in turn call still others. Team members need to be able to check the project code quickly. If they cannot write good tests that catch errors before the code goes into production, the project is in for serious trouble.

This is exactly what brings us to type annotations in Python.

In general, type annotations have many strengths. If you work with complex data structures or with functions that take many input values, annotations make such structures and functions much easier to work with, especially some time after they were written. If you have only one function with one parameter, as in the examples given here, working with it is very simple in any case.

What if we need to work with complex functions that accept many input values, like this one from the PyTorch documentation?

    import torch.nn.functional as F  # provides nll_loss, used below

    def train(args, model, device, train_loader, optimizer, epoch):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()
            if batch_idx % args.log_interval == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item()))

What is model? Of course, we can dig into the code base and find out:

 model = Net().to(device) 

But it would be nice if you could just specify the type of model in the signature of the function and save yourself from unnecessary code analysis. Perhaps it would look like this:

 def train(args, model (type Net), device, train_loader, optimizer, epoch): 

What about device? If you rummage through the code, you can find out the following:

 device = torch.device("cuda" if use_cuda else "cpu") 

Now we are faced with the question of what torch.device is. It is a special PyTorch type. Its description can be found in the corresponding section of the PyTorch documentation.

It would be nice if we could specify the type of device in the function's argument list. That would save a lot of time for anyone who has to analyze this code.

 def train(args, model (type Net), device (type torch.device), train_loader, optimizer, epoch): 

These considerations can continue for a very long time.
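
In current Python the wish expressed above has real syntax: a colon after the parameter name, followed by the type. The sketch below is only an illustration. Net and Device are empty stand-ins (assumptions) so that the example runs without PyTorch installed, and the int annotation on epoch is a guess. In real code the annotations would name the actual model class and torch.device.

```python
class Net:
    """Stand-in for the model class from the PyTorch example."""


class Device:
    """Stand-in for torch.device, so the sketch needs no PyTorch."""


def train(args, model: Net, device: Device, train_loader, optimizer, epoch: int) -> None:
    # Body omitted; only the signature matters here.
    pass


# The annotations are stored on the function and can be inspected.
print(train.__annotations__)
```

Parameters without an annotation, such as args or optimizer, are simply left out of the stored annotations, so a signature can be annotated gradually.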

So type annotations are very useful for the person writing the code. But they also benefit those who read someone else's code. Typed code is much easier to read than code in which you first have to work out what each entity is. Type annotations improve code readability.

So, what has been done in Python to bring the code to the same level of readability that distinguishes code written in statically typed languages?

Type Annotations in Python

Now we are ready to talk seriously about type annotations in Python. In programs written in Python 2, one could see that programmers supplied their code with hints telling readers the types of variables and of the values returned by functions.

Such code originally looked like this:

    users = [] # type: List[UserID]
    examples = {} # type: Dict[str, Any]

Type annotations used to be simple comments. Over time, however, Python gradually moved towards a more uniform way of handling annotations. In particular, this led to PEP 3107, a document dedicated to function annotations.
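
For comparison, here are the same two declarations written with the annotation syntax that Python 3.6 and later accept for variables. This is a sketch: the source does not define UserID, so it is assumed here to be a NewType over int, and the _old suffix is used only to keep both variants in one file.

```python
from typing import Any, Dict, List, NewType

# Assumption: UserID is a distinct integer type; the source does not define it.
UserID = NewType('UserID', int)

# Comment style: ordinary comments that only external tools read.
users_old = []      # type: List[UserID]
examples_old = {}   # type: Dict[str, Any]

# Annotation syntax: the type is part of the statement itself.
users: List[UserID] = []
examples: Dict[str, Any] = {}

users.append(UserID(42))
examples['first'] = users
print(users, examples)
```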

Next came work on PEP 484. This document, devoted to type annotations, was developed in close connection with mypy, a Dropbox project that checks types before scripts are run. When using mypy, keep in mind that no type checking is performed while the script runs. You can still get an error at run time if, for example, you try to do something with a value that its type does not support, such as slicing a dictionary or calling the .pop() method on a string.

Here is what PEP 484 says about the details of the implementation: “Although these annotations are available at run time through the usual __annotations__ attribute, no type checking happens at run time. Instead, the proposal assumes the existence of a separate, stand-alone type checker that users can, if they wish, run over their source code. In essence, such a type checker works like a very powerful linter. (Although individual users could, of course, employ a similar checker at run time for Design By Contract enforcement or JIT optimization, such tools have not yet reached sufficient maturity.)”
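
Both halves of that statement are easy to observe. A minimal sketch, in which the function double is invented for the example:

```python
def double(value: int) -> int:
    # The annotations document the intent; Python itself does not enforce them.
    return value * 2


print(double.__annotations__)   # the hints are stored on the function object
print(double(21))               # used as annotated
print(double('ab'))             # wrong type according to the hints, yet it runs
```

The last call is exactly the kind of mistake that a separate checker such as mypy reports, while the interpreter runs it without complaint.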

What does working with type annotations look like in practice?

For one thing, they make work in various IDEs easier. PyCharm, for example, uses type information to offer code completion and checking. Similar features are available in VS Code.

Type annotations are useful for one more reason: they protect the developer from stupid mistakes. Here is a great example of such protection.

Suppose we add the names of people to the dictionary:

    names = {'Vicki': 'Boykis',
             'Kim': 'Kardashian'}

    def append_name(dict, first_name, last_name):
        dict[first_name] = last_name

    append_name(names,'Kanye',9)

If we allow this, the dictionary will fill up with malformed entries: here a number was stored where a last name was expected.
Let's fix this:

    from typing import Dict

    names_new: Dict[str, str] = {'Vicki': 'Boykis',
                                 'Kim': 'Kardashian'}

    def append_name(dic: Dict[str, str] , first_name: str, last_name: str):
        dic[first_name] = last_name

    append_name(names_new,'Kanye',9.7)
    names_new

Now let's check this code with mypy. Here is the result:

    (kanye) mbp-vboykis:types vboykis$ mypy kanye.py
    kanye.py:9: error: Argument 3 to "append_name" has incompatible type "float"; expected "str"

As you can see, mypy does not let you use a number where a string is expected. If you want to run such checks regularly, add mypy to the testing stage of your continuous integration system.

Type hints in various IDEs

One of the most important benefits of using type annotations is that they allow Python programmers to use the same code completion features in various IDEs that are available for statically typed languages.

For example, suppose you have a piece of code that resembles the following. These are a couple of functions from the previous examples, wrapped in classes.

    from typing import Dict

    class rainfallRate:
        def __init__(self, hours, inches):
            self.hours = hours
            self.inches = inches

        def calculateRate(self, inches:int, hours:int) -> float:
            return inches/hours

    # Incomplete call, left as typed: this is where the IDE shows the parameter hints.
    # Run as is, it raises TypeError because the arguments are missing.
    rainfallRate.calculateRate()

    class addNametoDict:
        def __init__(self, first_name, last_name):
            self.first_name = first_name
            self.last_name = last_name
            self.dict = dict

        def append_name(dict:Dict[str, str], first_name:str, last_name:str):
            dict[first_name] = last_name

    # Incomplete call again, for the same reason.
    addNametoDict.append_name()

The nice thing is that, because we added the type descriptions ourselves, we can see what each method expects when the class methods are called:

IDE Type Tips

Getting started with type annotations

The mypy documentation has good recommendations on where to begin when adding types to a codebase:

  1. Start small: get a clean mypy run for a few files that contain only a few annotations.
  2. Write a script to run mypy. This will help achieve consistent results.
  3. Run mypy in CI pipelines to prevent type errors.
  4. Gradually annotate the modules that are used most often in the project.
  5. Add type annotations to existing code as you modify it, and to the new code you write.
  6. Use MonkeyType or PyAnnotate to automatically annotate old code.
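
Step 2 of the list can be as small as the following sketch. It assumes that mypy is installed in the current environment and that the code to check lives in a directory named src (a hypothetical path). Both function names are invented for illustration.

```python
import subprocess
import sys
from typing import List, Sequence


def build_mypy_command(paths: Sequence[str]) -> List[str]:
    # Running mypy as a module uses the mypy installed for this interpreter.
    return [sys.executable, '-m', 'mypy', *paths]


def run_mypy(paths: Sequence[str]) -> int:
    # Returns mypy's exit code; a non-zero value can be used to fail a CI job.
    completed = subprocess.run(build_mypy_command(paths))
    return completed.returncode


# Example usage (hypothetical path):
# sys.exit(run_mypy(['src']))
```

Keeping the command in one place means that developers and the CI pipeline from step 3 run mypy in the same way.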

Before you start annotating your own code, there are a few things worth understanding.

First, you will need to import the typing module if you use anything other than strings, integers, booleans, and values of other basic Python types.

Secondly, this module makes it possible to work with several complex types, among them Dict, Tuple, List and Set. A construction of the form Dict[str, float] means that you want to work with a dictionary whose keys are strings and whose values are floating-point numbers. There are also types called Optional and Union.
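
Here is a short sketch that uses each of the types just mentioned. All names and values in it are made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple, Union


def average_rainfall(readings: Dict[str, float]) -> Optional[float]:
    # Optional[float] means "a float or None"; None is returned when there is no data.
    if not readings:
        return None
    return sum(readings.values()) / len(readings)


def describe(value: Union[int, str]) -> str:
    # Union[int, str] accepts either of the two listed types.
    return f'{type(value).__name__}: {value}'


coordinates: Tuple[float, float] = (52.5, 13.4)   # fixed-size pair of floats
cities: List[str] = ['Berlin', 'Oslo']            # list of strings
seen: Set[int] = {1, 2, 3}                        # set of integers

print(average_rainfall({'Berlin': 1.5, 'Oslo': 2.5}))
print(average_rainfall({}))
print(describe(7), describe('seven'))
```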

Thirdly, you need to familiarize yourself with the type annotation format:

    import typing

    def some_function(variable: type) -> return_type:
        do_something
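
Filled in with concrete types, the template might look like the sketch below. It is loosely based on the rainfall example from earlier in the article; the name calculate_rate and the float annotations are assumptions.

```python
def calculate_rate(inches: float, hours: float) -> float:
    # Parameters are annotated after a colon, the return type after the arrow.
    return inches / hours


print(calculate_rate(3.0, 2.0))
```

Because only basic types are used here, the typing module does not need to be imported.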

If you want to learn more about how to start using type annotations in your projects, there are many good tutorials on the subject. Here is one of them, which I consider the best. Working through it will teach you how to annotate code and how to check it.

Summary: is it worth using type annotations in Python?

Now let's ask ourselves whether you should use type annotations in Python. It depends on your project. Here is what Guido van Rossum says about this in the documentation for mypy: “The purpose of mypy is not to convince everyone to write statically typed Python code. Static typing is completely optional now and in the future. Mypy's goal is to give Python programmers more options. It is to make Python a more competitive alternative to other statically typed languages used in large projects. It is to increase the productivity of programmers and improve the quality of software.”

The time needed to set up mypy and to plan the types for a program does not pay off in small projects or in experiments (for example, those run in Jupyter). Which projects count as small? Probably those that, by a conservative estimate, do not exceed 1000 lines.

Type annotations make sense in larger projects, where they can save a lot of time. This means projects developed by teams of programmers, packages, and code that relies on version control and CI pipelines.

I believe that over the next couple of years type annotations will become much more common than they are now, and may well turn into an ordinary everyday tool. And I believe that anyone who starts working with them early will lose nothing.

Dear readers! Do you use type annotations in your Python projects?

Source: https://habr.com/ru/post/463931/
