A Few Considerations for Concurrent Computing in R for "Enterprise" Tasks

Parallel or distributed computing is highly nontrivial in itself. The development environment must support it, the DS specialist must have the skills to carry out parallel computations, and the task must be reducible to a form that can be split into parts, if such a form exists. But with a competent approach you can greatly speed up a task compared with single-threaded R, provided you have at least a multi-core processor (and almost everyone does now), subject to the theoretical speedup limit set by Amdahl's law. In some cases, however, even that limit can be circumvented.
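As a reminder of what that limit looks like, here is a minimal sketch of Amdahl's law in R. The function name amdahl_speedup and the 90% parallel fraction are illustrative assumptions, not figures from this article.

```r
# Amdahl's law: theoretical speedup with n workers when a fraction p
# of the single-threaded runtime can be parallelized.
amdahl_speedup <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n >= 1)
  1 / ((1 - p) + p / n)
}

# Assumed example: 90% of the work is parallelizable.
speedups <- amdahl_speedup(p = 0.9, n = c(1, 4, 8, 64))

# Upper bound as the number of workers grows without limit.
limit <- 1 / (1 - 0.9)
```

However many workers are added, the speedup stays below 1 / (1 - p), because the serial part of the task does not shrink.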


This is a continuation of previous publications.


Typical approach


As a rule, when an analyst (DS specialist, developer, or pick whatever name suits you) tries to speed up a task within one computer and moves from single-threaded to multi-threaded mode, they do it in a boilerplate manner: parApply, foreach %dopar%, etc. A compact and clear overview can be found, for example, here: “Parallelism in R”. There are 3 steps:


  1. create (number of cores - 1) threads,
  2. run the tasks using foreach,
  3. collect the answers and get the result.
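The three steps can be sketched roughly as follows. To stay within base R, this sketch uses parLapply from the parallel package in place of foreach. The toy function cpu_task and the input sizes are hypothetical stand-ins for real work.

```r
library(parallel)

# Hypothetical CPU-bound task and inputs (stand-ins for real work).
cpu_task <- function(n) sum(sqrt(seq_len(n)))
inputs <- rep(1e5, 8)

# Step 1: one worker per core minus one, but never more workers than tasks.
# detectCores() may return NA, hence na.rm = TRUE.
n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)
n_workers <- min(n_workers, length(inputs))
cl <- makeCluster(n_workers)

# Step 2: run the tasks on the workers.
pieces <- parLapply(cl, inputs, cpu_task)

# Step 3: release the workers, then collect the answers into the result.
stopCluster(cl)
result <- sum(unlist(pieces))
```

Note that makeCluster starts separate R processes rather than threads in the strict sense. The sketch follows the article's looser wording.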

For typical computing tasks that keep the CPU 100% busy and do not require transferring a large amount of input data, this is the right approach. The main point that needs attention is logging inside the threads, so that you can monitor the process. Without logging, you are flying blind, without instruments.
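One simple way to get such logging is to have each worker append timestamped lines to a shared file. This is a minimal sketch under assumptions: the log location, the line format, and the function logged_task are all hypothetical, and a real project may prefer a dedicated logging package.

```r
library(parallel)

log_file <- tempfile(fileext = ".log")   # hypothetical log location

# Worker function: appends one line per event so progress is visible
# from outside the worker process.
logged_task <- function(i, log_file) {
  log_line <- function(msg) {
    cat(sprintf("%s pid=%d task=%d %s\n",
                format(Sys.time(), "%H:%M:%S"), Sys.getpid(), i, msg),
        file = log_file, append = TRUE)
  }
  log_line("start")
  value <- i^2                           # stand-in for the real computation
  log_line("done")
  value
}

cl <- makeCluster(2L)
squares <- parLapply(cl, 1:6, logged_task, log_file = log_file)
stopCluster(cl)

# Inspect the log afterwards (or follow it live with `tail -f` on *nix).
log_lines <- readLines(log_file)
```

Each line carries the worker's process id, so it is possible to see which process handled which task and where a stuck task stopped.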


In the case of "enterprise" tasks, when they are parallelized, many additional methodological difficulties arise that significantly reduce the effect of the above straightforward approach:



This is a completely typical scenario: as part of its work, a process needs to receive a voluminous job as input, read data from disk, pull a large chunk from a database, query external systems and wait for their answers (the classic case is a REST API request), and then return N megabytes to the parent process as the result.


Map-reduce by users, locations, documents, IP addresses, dates, ... (add your own). In the saddest cases, parallel execution may take longer than single-threaded execution. Out-of-memory problems may also occur. Is all lost? Not at all.


Alternative way


Let us outline a way to radically improve the situation. At the same time, we must not forget that we live in a full zoo: the production environment runs on *nix, DS laptops run Windows, *nix, or macOS, yet everything has to work uniformly everywhere.


  1. The microtask: receive a user as input, query a database, query 2 external information systems via REST, load and parse the directory from disk, perform a calculation, and write the result to disk or to the database. There are, for example, 10^6 users.
  2. We switch to the future package and the universal doFuture adapter.
  3. If the individual tasks need only a small amount of processor time (most of the time we are waiting for answers from third-party systems), then doFuture lets you switch from splitting the work by threads to splitting it across separate processes by changing one line (on *nix you can see the launch parameters in htop).
  4. Many more of these processes can be created than there are cores. No contention will occur, because the individual processes are in standby mode most of the time. However, the optimal number of processes has to be found experimentally, based on the timing profile of a typical processing run.
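A minimal sketch of how steps 2 to 4 might look with future and doFuture follows. The article does not state which backends the author switched between, so the multicore and multisession plans shown here are an assumption. The function process_user, the sleep time, and the worker count are hypothetical placeholders.

```r
library(foreach)
library(future)
library(doFuture)

# Make foreach's %dopar% run on whatever future plan is active.
registerDoFuture()

# The one-line switch: choose the backend here, leave the loop untouched.
# plan(multicore, workers = n_workers)   # forked processes (not on Windows)
# Assumption: twice the core count, capped at 8; tune this experimentally.
n_workers <- as.integer(min(8, 2 * availableCores()))
plan(multisession, workers = n_workers)  # separate background R processes

# Hypothetical micro-task: mostly waits on external systems.
process_user <- function(user_id) {
  Sys.sleep(0.2)                         # stands in for DB / REST waiting time
  data.frame(user_id = user_id, score = user_id %% 7)
}

results <- foreach(u = 1:16, .combine = rbind) %dopar% process_user(u)

# Shut the background workers down again.
plan(sequential)
```

Because the backend is chosen only in the plan() call, the same loop runs unchanged on *nix, Windows, and macOS. Requesting more workers than cores may trigger a warning from the underlying packages, so the count is worth checking against their documentation.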

Result: the original task runs many times faster. The speedup can even exceed the number of available cores.
The project code is deliberately not shown, since the main purpose of this publication is to share the approach and the excellent future family of packages.
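The claim that the speedup can exceed the core count is easy to check on a toy example in base R. This is only an illustration, not the author's code: wait_task is a hypothetical task that does nothing but wait, and the counts of 16 tasks and 8 processes are arbitrary.

```r
library(parallel)

# Hypothetical I/O-bound task: 0.1 s of waiting, almost no CPU work.
wait_task <- function(i) {
  Sys.sleep(0.1)
  i
}

n_tasks <- 16L
n_procs <- 8L   # deliberately not tied to the number of cores

# Baseline: run the tasks one after another.
t_serial <- system.time(
  lapply(seq_len(n_tasks), wait_task)
)[["elapsed"]]

# Same tasks spread over n_procs worker processes.
cl <- makeCluster(n_procs)
t_parallel <- system.time(
  out <- parLapply(cl, seq_len(n_tasks), wait_task)
)[["elapsed"]]
stopCluster(cl)

speedup <- t_serial / t_parallel
```

Since the workers spend their time sleeping rather than computing, the measured speedup depends on the number of processes, not on the number of cores. With real tasks the waiting share is smaller, so the gain has to be measured on a typical workload.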


PS


There are a few small nuances that also need to be watched:



Previous publication - “Business processes in enterprise companies: speculation and reality. We shed light with R”.

Source: https://habr.com/ru/post/462469/

