How to reduce memory usage and speed up Python code using generators

Hello. Today we are sharing a useful translation prepared ahead of the launch of the course "Web Developer in Python." Writing time-efficient and memory-efficient code in Python is especially important when building a web application, training a machine learning model, or writing tests.



When I started learning about generators in Python, I had no idea how important they would turn out to be. Since then, they have helped me write functions again and again throughout my journey through machine learning.


Generator functions let you declare a function that behaves like an iterator. They allow programmers to create fast, simple, and clean iterators. An iterator is an object that produces the items of a sequence one at a time, which is what a loop relies on. It abstracts the underlying data container so that you can step through it without knowing how it is stored. Objects that can hand out such an iterator are called iterables; strings, lists, and dictionaries are common examples.
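To make the difference between an iterable and an iterator concrete, here is a minimal sketch using only built-ins. The variable names are chosen for illustration and do not come from the original article.

```python
# A list is an iterable: iter() asks it for a fresh iterator.
letters = ["a", "b", "c"]
iterator = iter(letters)

# next() pulls one item at a time from the iterator.
first = next(iterator)
second = next(iterator)

# A for loop calls next() implicitly and consumes whatever is left.
remaining = []
for item in iterator:
    remaining.append(item)

print(first, second, remaining)
```

The list itself is untouched by this; only the iterator keeps track of the current position.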


The generator looks like a function, but uses the yield keyword instead of return. Let's look at an example to make it clearer.


def generate_numbers():
    n = 0
    while n < 3:
        yield n
        n += 1

This is a generator function. When you call it, it returns a generator object.


>>> numbers = generate_numbers()
>>> type(numbers)
<class 'generator'>

Notice how the state is encapsulated in the body of the generator function. You can step through the values one at a time using the built-in next() function:


>>> next_number = generate_numbers()
>>> next(next_number)
0
>>> next(next_number)
1
>>> next(next_number)
2

What happens if you call next() after the generator has finished?


The call raises StopIteration, a built-in exception that is raised automatically once the generator has no more values to yield. It is also the signal that tells a for loop to stop.
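This behavior can be checked with a short sketch that reuses generate_numbers() from the earlier example. The names collected, exhausted, and looped are introduced here purely for illustration.

```python
def generate_numbers():
    n = 0
    while n < 3:
        yield n
        n += 1

numbers = generate_numbers()
collected = [next(numbers), next(numbers), next(numbers)]

# A fourth call finds nothing left to yield, so StopIteration is raised.
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True

# A for loop handles StopIteration internally and simply ends.
looped = []
for value in generate_numbers():
    looped.append(value)

print(collected, exhausted, looped)
```

In everyday code you rarely catch StopIteration yourself, because for loops and functions such as list() or sum() deal with it for you.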


Yield statement


The main job of yield is to control the flow of a generator function in a way that resembles a return statement. Calling a generator function or evaluating a generator expression returns a special iterator called a generator. To use it, assign it to a variable. When you call one of the generator's special methods, such as next(), the function body runs until it reaches the next yield.
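Generator expressions are mentioned above but not shown, so here is a minimal sketch of one. The names squares, kind, first, and rest are illustrative assumptions.

```python
# A generator expression looks like a list comprehension in parentheses.
squares = (n * n for n in range(5))

# Nothing has been computed yet; squares is just a generator object.
kind = type(squares).__name__

# Values are produced one by one as next() or a consumer asks for them.
first = next(squares)
rest = list(squares)

print(kind, first, rest)
```

A generator expression is convenient for short, single-expression cases; a generator function is the better fit once the logic needs several statements.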


When execution reaches a yield statement, the function is suspended and the value is handed back to the caller. (A return statement, by contrast, ends the function completely.) While the function is suspended, its state is preserved, so the next call resumes exactly where it left off.
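The pause-and-resume behavior is easier to follow with a trace. The sketch below uses a hypothetical countdown() generator, not taken from the original article, and records each step in a list.

```python
events = []  # records the order in which things happen


def countdown(start):
    events.append("body started")
    while start > 0:
        yield start  # pause here and hand the value to the caller
        events.append(f"resumed after yielding {start}")
        start -= 1
    events.append("body finished")


gen = countdown(2)
events.append("generator created")  # the body has not run yet
events.append(f"got {next(gen)}")   # runs the body up to the first yield
events.append(f"got {next(gen)}")   # resumes, loops, and pauses again

for line in events:
    print(line)
```

The local variable start keeps its value between calls, which is the preserved state described above. The entry "body finished" never appears, because the generator is still suspended at its second yield.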


Now that we are familiar with generators in Python, let's compare the usual approach with the generator-based approach in terms of the memory and time the code needs to run.


Problem statement


Suppose we need to go through a large sequence of numbers (for example, 100,000,000 of them) and store the squares of all the even numbers in a separate list.


Usual approach


import time

import memory_profiler  # third-party package, must be installed separately


def check_even(numbers):
    # Build the full list of squares of the even numbers in memory.
    even = []
    for num in numbers:
        if num % 2 == 0:
            even.append(num * num)
    return even


if __name__ == '__main__':
    m1 = memory_profiler.memory_usage()
    t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
    squares = check_even(range(100000000))
    t2 = time.perf_counter()
    m2 = memory_profiler.memory_usage()
    time_diff = t2 - t1
    mem_diff = m2[0] - m1[0]
    print(f"It took {time_diff} Secs and {mem_diff} Mb to execute this method")

After running the code above, we get the following:


 It took 21.876470000000005 Secs and 1929.703125 Mb to execute this method 

Using generators


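import time

import memory_profiler  # third-party package, must be installed separately


def check_even(numbers):
    # Yield the square of each even number, one at a time.
    for num in numbers:
        if num % 2 == 0:
            yield num * num


if __name__ == '__main__':
    m1 = memory_profiler.memory_usage()
    t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
    # This only creates the generator object; no squares are computed yet.
    squares = check_even(range(100000000))
    t2 = time.perf_counter()
    m2 = memory_profiler.memory_usage()
    time_diff = t2 - t1
    mem_diff = m2[0] - m1[0]
    print(f"It took {time_diff} Secs and {mem_diff} Mb to execute this method")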

After running the code above, we get the following:


 It took 2.9999999995311555e-05 Secs and 0.02656277 Mb to execute this method 

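As we can see, the measured runtime and memory usage are far lower. Keep in mind what is being measured, though: calling check_even() only creates the generator object, so the timing above does not include computing any squares. That work is deferred until the values are actually requested. Generators follow a principle known as lazy evaluation: each value is computed only when it is needed, so the full result never has to be held in memory at once. This is how they save memory and avoid work that is never asked for.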
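To see lazy evaluation at work, the sketch below consumes the generator and compares object sizes with sys.getsizeof(). It assumes a much smaller input (limit = 100_000, a value chosen here so the example runs quickly), and it is a rough illustration rather than a benchmark.

```python
import sys


def check_even(numbers):
    for num in numbers:
        if num % 2 == 0:
            yield num * num


limit = 100_000  # far smaller than the article's 100,000,000

# Eager: every square is computed and held in memory at once.
squares_list = [n * n for n in range(limit) if n % 2 == 0]

# Lazy: only a small generator object exists until values are requested.
squares_gen = check_even(range(limit))

# getsizeof() reports the container itself, not the integers it refers to,
# so the real footprint of the list is even larger than shown.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# The actual work happens here, one value at a time, as sum() pulls values.
total = sum(squares_gen)

print(f"list: {list_size} bytes, generator: {gen_size} bytes, total: {total}")
```

Consuming the generator still takes time proportional to the input, since every number has to be checked. What the generator avoids is building and storing the whole result. Note also that a generator can be consumed only once; after sum() finishes, squares_gen is exhausted.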


Conclusion


I hope this article has shown how generators in Python can be used to save resources such as memory and time. The advantage comes from the fact that generators do not store all of their results in memory. Instead, they compute each value on the fly, and memory is used only when we request the next result. Generators also hide much of the boilerplate that writing an iterator by hand requires, so they help reduce the amount of code as well.
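The point about boilerplate can be illustrated by writing the same sequence twice. Both SquaresIterator and squares_generator are hypothetical names invented for this comparison; they are not part of the original article.

```python
class SquaresIterator:
    """Class-based iterator over the squares of 0 .. limit - 1."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        # The position must be tracked and StopIteration raised by hand.
        if self.current >= self.limit:
            raise StopIteration
        value = self.current ** 2
        self.current += 1
        return value


def squares_generator(limit):
    """The same sequence written as a generator function."""
    for n in range(limit):
        yield n ** 2


from_class = list(SquaresIterator(4))
from_generator = list(squares_generator(4))
print(from_class, from_generator)
```

Both versions produce the same values, but the generator function leaves the state tracking and the StopIteration handling to Python.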



Source: https://habr.com/ru/post/477926/

