About four years ago we moved our reporting system from Oracle to SAP HANA. Today it stores about 10,000 tables, serves 38,000 users, and runs more than 5,000 data-load processes daily. The cluster the system runs on currently consists of 8 servers with 14 TB of memory, and every day the reporting system processes 1.5 PB of data. At the same time, HANA proved to be about 20 times faster than Oracle, DB2, and MySQL. In this post I want to describe how, as part of the integration of M.Video and Eldorado, we further increased the performance of the reporting system, and which optimizations we made.
Today, out of the several thousand reports the system produces, just 10 generate 70% of the total load. The heaviest report in the HANA database is the Weekly Business Review: it runs for 30 minutes and consumes 4 TB of memory. 90% of all reports are generated by the contact center: we built a service that, when a client calls, looks them up by phone number, automatically opens a report, and shows the operator the caller's entire purchase history and interactions with the company.
The key data model on which the bulk of the reports is built contains 1,500 tables, most of them reference tables. Using this model, you can analyze any area of the company's business. One example is the visualization of a data model created with the Universe designer; admittedly, it reflects only about a tenth of our system.
At the time of the M.Video and Eldorado merger, the two reporting systems held about 10 TB of data: 7 TB in the M.Video BW on HANA system, 2 TB in the Eldorado BW on HANA system, and 1 TB of additional (historical) data in Hadoop. In combining the systems we faced hardware limitations: the M.Video cluster consisted of 6 nodes of 2 TB each, including one standby, and could be expanded to a maximum of 8 nodes, one of them a standby.
In preparation for the merger, we estimated that the total data volume would reach 13 TB. Based on SAP recommendations, this called for a system of 16 nodes of 2 TB each, including two standby nodes. One more node had to be dedicated as the master, which at this data volume takes over management functions. That left 13 worker nodes to hold the data.
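The node arithmetic can be sketched in a few lines. This is a back-of-the-envelope script, not an SAP tool: the 50% data-to-RAM rule per worker node is our reading of the SAP sizing guideline, and all names in it are our own.

```python
# Rough sizing of the merged scale-out landscape.
# Assumption: a 2 TB worker node holds about 1 TB of data,
# leaving roughly half of RAM for working memory.
NODE_RAM_TB = 2
DATA_PER_NODE_TB = NODE_RAM_TB // 2   # ~50% of RAM usable for data
TOTAL_DATA_TB = 13                    # expected combined data volume
STANDBY_NODES = 2                     # standby (backup) nodes for this size
MASTER_NODES = 1                      # master node for management functions

worker_nodes = -(-TOTAL_DATA_TB // DATA_PER_NODE_TB)  # ceiling division
total_nodes = worker_nodes + STANDBY_NODES + MASTER_NODES

print(worker_nodes, total_nodes)  # 13 16
```

This reproduces the numbers above: 13 worker nodes, 16 nodes in total.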
As you can see, the available resources were clearly insufficient. That was the first challenge.
The second major difficulty before the merger was that the system's speed often failed to meet the needs of the business. The main reason was the large number of concurrent calls to the database. From the system's point of view it snowballed: parts of the workload would freeze or be interrupted, and in the worst case nodes crashed with global out-of-memory (OOM) dumps.
It was clear that without significant improvements, doubling the amount of data in the reports (for two brands) would make the situation roughly twice as bad.
Therefore, we decided to optimize the system in the following areas:
- Reporting. Accelerating the most critical and most resource-intensive reports and revising the data model.
- Storage. Archiving and storage optimization.
- Loads. Streamlining the load procedures and changing the load schedule.
The general approach to optimization was this:
First, we analyzed all of these areas and identified the causes of the problems; then we assessed what the system could do with the required resources. From the start we tried to automate this process as much as possible, so that in the future we could quickly identify the causes of problems and quickly restore performance.
What have we done:
- Changed the configuration of the ABAP application servers: the number of instances, effective use of NUMA, and the optimal number of work processes.
- Applied optimal parameters for HANA and the Linux operating system.
- Analyzed how to reduce CPU consumption.
- Analyzed RAM consumption over the entire observed time interval.
- Analyzed the occurrence of OOM events on HANA.
- Analyzed locks in the system and time spent waiting for system resources (wait events).
- Analyzed data balancing, including redistribution and repartitioning of tables for the scale-out HANA landscape.
- Analyzed the causes of ABAP dumps affecting critical process chains.
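As an illustration of the OOM analysis, here is a minimal sketch of the kind of script used to count OOM events per node from trace files. The trace line format shown is hypothetical; in practice the pattern has to be adapted to the actual indexserver trace output.

```python
# Count out-of-memory events per node from (hypothetical) trace lines.
import re
from collections import Counter

OOM_RE = re.compile(r"^\[(?P<node>\w+)\]\s+[\d-]+ [\d:]+.*OutOfMemory")

def count_oom_by_node(lines):
    """Return how many OOM events each node reported."""
    hits = Counter()
    for line in lines:
        if OOM_RE.match(line):
            hits[OOM_RE.match(line).group("node")] += 1
    return hits

# Illustrative input; real traces look different.
trace = [
    "[node03] 2019-04-01 10:15:02 Memory::allocate failed: OutOfMemory",
    "[node03] 2019-04-01 10:15:04 statement aborted: OutOfMemory",
    "[node05] 2019-04-01 11:02:11 Memory::allocate failed: OutOfMemory",
    "[node01] 2019-04-01 11:30:00 checkpoint finished",
]
print(count_oom_by_node(trace))  # Counter({'node03': 2, 'node05': 1})
```

Aggregating like this per node and per hour quickly shows which nodes and which time windows are the problem.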
Based on the results, we compiled performance reports and wrote instructions so that in the future we could independently pinpoint bottlenecks and peak time intervals in the system.
What results were achieved:
A number of optimized SAP BO reports now run many times faster and consume hundreds of times less memory.
Here are a few striking examples of how the system can handle selection conditions inefficiently, and how to build HANA queries correctly.
Problems surfaced when filtering on non-materialized objects, especially (!) when using calculated measures (a calculated dimension, CD, can be defined either at the Universe level or in a function inside a Calculation View). Even if you exclude the CD from the query, the first variant still runs 20 times slower than the second, and with the CD the difference exceeds 500 times.
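The effect is easy to reproduce with any SQL engine; in the sketch below SQLite stands in for HANA, and the table and index names are invented. A filter on a materialized column can use an index, while the same predicate computed per row forces a full scan:

```python
# Materialized-column filter vs. per-row expression filter.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (calday TEXT, calyear TEXT, amount REAL)")
con.execute("CREATE INDEX idx_year ON sales (calyear)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2019-03-0%d" % d, "2019", 100.0 * d) for d in range(1, 8)],
)

# Filter on the materialized column: the planner can use idx_year.
plan_fast = con.execute(
    "EXPLAIN QUERY PLAN SELECT sum(amount) FROM sales WHERE calyear = '2019'"
).fetchall()

# Same predicate computed per row: full table scan.
plan_slow = con.execute(
    "EXPLAIN QUERY PLAN SELECT sum(amount) FROM sales "
    "WHERE substr(calday, 1, 4) = '2019'"
).fetchall()

print(plan_fast[-1][-1])  # typically a SEARCH ... USING INDEX step
print(plan_slow[-1][-1])  # typically a SCAN step
```

In HANA the same principle holds: materialize the attribute you filter on rather than computing it in the filter.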
A special case of non-materialized objects in filters is a composite filter built from two or more objects, for example a week concatenated with a year. Queries with such "glued" filters are not as slow as on-the-fly date conversion, but they still slow queries down by roughly 2-3 times.
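The rewrite itself is mechanical: split the glued comparison into one predicate per column. A small SQLite stand-in (invented table and values) shows that the two variants return identical results, while the split form lets the optimizer handle each column separately:

```python
# "Glued" composite filter vs. independent per-column predicates.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (calyear TEXT, calweek TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2019", "%02d" % w, 10.0 * w) for w in range(1, 53)],
)

# Slow variant: the comparison value is concatenated per row.
glued = con.execute(
    "SELECT sum(amount) FROM sales WHERE calyear || '/' || calweek = '2019/05'"
).fetchone()[0]

# Fast variant: the same condition as two independent column predicates.
split = con.execute(
    "SELECT sum(amount) FROM sales WHERE calyear = '2019' AND calweek = '05'"
).fetchone()[0]

print(glued, split)  # 50.0 50.0 -- identical result, cheaper plan
```
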
To collect statistics on the reporting system, the load processes, and the process chains, we developed the following scheme:
In addition, we embed a special comment containing the report name into each report's queries. This lets us compare the load generated by different parts of the system, period against period.
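The comment-tagging part of this scheme can be sketched as follows. The tag format and the helper functions are our own convention, not a HANA feature, and the memory numbers are purely illustrative:

```python
# Tag each statement with its report name, then attribute captured load.
import re
from collections import defaultdict

TAG_RE = re.compile(r"/\*\s*report=(?P<name>[\w ]+?)\s*\*/")

def tag(sql, report_name):
    """Prepend the report-name comment to a statement."""
    return "/* report=%s */ %s" % (report_name, sql)

def load_by_report(captured):
    """Sum memory usage per report from (sql, mem_gb) pairs."""
    totals = defaultdict(float)
    for sql, mem_gb in captured:
        m = TAG_RE.search(sql)
        totals[m.group("name") if m else "untagged"] += mem_gb
    return dict(totals)

# Illustrative captured statements, e.g. from an expensive-statements trace.
captured = [
    (tag("SELECT ...", "Weekly Business Review"), 4000.0),
    (tag("SELECT ...", "Weekly Business Review"), 3800.0),
    (tag("SELECT ...", "Contact Center"), 2.5),
    ("SELECT 1", 0.1),
]
print(load_by_report(captured))
```

Because the tag travels inside the SQL text, it survives into the database's own statement statistics, which is what makes per-report load comparison possible.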
We have many plans for developing the business functionality and substantially reworking the data visualization tool. The global trend we are actively participating in is integrating the reporting system into the digital transformation paradigm.
What is meant?
When our reporting system was young, users often came to us with requests like: "Automate a report that shows how much net profit a particular store, or the whole company, earned."
Later they started asking us to come up with an algorithm that would build a plan or forecast for net profit depending on specific factors.
Today we have reached the point where users want an accurate net profit forecast. We have all the data needed to develop forecasting algorithms, and we have data analysts who can build machine learning models. As is well known, this requires truly large amounts of data, so one of the main directions for our reporting system is the move to analysis and model building on big data.
A few words about our team
Today, large companies are increasingly introducing forecasting systems based on machine learning algorithms. Two years ago we created a data analysis competency center, the Digital Retail Data Science Center, and this year we added a group of data engineers. We are rolling out a new system for processing and analyzing big data, and we need people on the team in support, development, and applied data analysis.
If these areas interest you and you feel up to the challenge - welcome! Interesting and difficult work awaits you: sometimes stressful, but always creative.