Optimizing JavaScript loading on Wikipedia

The author of this article, which we publish today in translation, says that in mid-September 2019 he finally completed a project he had been working on for a year. The goal was to reduce the size of the manifest needed to initialize Wikipedia's asynchronous JavaScript pipeline. The manifest was 36 KB in size, and it had to fit in less than 28 KB, which corresponds to two 14 KB bursts of Internet packets.

The result of this project was a daily saving of 4.3 terabytes of traffic.


Before optimization the manifest exceeded 36 KB; afterwards it was smaller than 28 KB

The graph shows a gradual decrease in manifest size. The figures refer to compressed data, that is, the actual number of bytes transferred over the network from the server to the browser.

Optimization process


The initialization manifest is not easy to optimize. Most of it is not functional logic that could be optimized by traditional means. Instead, almost the entire manifest is pure data. This data is generated automatically by the ResourceLoader content delivery system and forms a registry of module bundles. Wikipedia uses ResourceLoader to deliver JavaScript, CSS, and text resources.

The registry includes metadata for all front-end functionality deployed on Wikipedia. The manifest lists the names of the bundles, their current versions, and their dependencies on other bundles.

I started by looking for code that was never used in practice ( T202154 ). This included finding unfinished or forgotten pieces of code related to legacy features. I also removed code that provided compatibility with browsers that no longer pass our test for inclusion in the group of modern browsers ( Grade A ). In addition, I prepared a document on page loading performance. It serves as a reference that helps developers understand how different kinds of changes affect the various stages of the page loading process.

Reduce the number of modules


The next step was a collaboration with the engineering teams of the Wikimedia Foundation and Wikimedia Deutschland. We needed to find out which features used an excessive number of modules. Knowing this, we could combine the scattered bundles that made up a given feature. Such bundles were always loaded together anyway, even when they were separate. Combining them meant fewer endpoints whose metadata had to be stored in the registry generated by ResourceLoader.
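The effect of such merging can be illustrated with a small sketch. The registry layout, the module names, and the `merge_bundles` helper below are hypothetical and do not reflect ResourceLoader's actual format; the sketch only shows that bundles which always load together can be described by one registry entry instead of several.

```python
import json

# Hypothetical registry: module name -> (version ID, dependencies).
# This is NOT ResourceLoader's real format, only an illustration.
registry = {
    "feature.core": ("a1b2c3d", []),
    "feature.ui": ("e4f5g6h", ["feature.core"]),
    "feature.api": ("i7j8k9l", ["feature.core"]),
    "other.module": ("m0n1o2p", []),
}


def merge_bundles(registry, names, merged_name, merged_version):
    """Return a new registry where the bundles in `names` share one entry."""
    group = set(names)

    # Dependencies between members of the group become internal and vanish;
    # dependencies on outside modules are kept once each.
    merged_deps = []
    for name in names:
        for dep in registry[name][1]:
            if dep not in group and dep not in merged_deps:
                merged_deps.append(dep)

    result = {}
    for name, (version, deps) in registry.items():
        if name in group:
            continue
        # Remaining modules now depend on the merged bundle instead.
        new_deps = []
        for dep in deps:
            target = merged_name if dep in group else dep
            if target not in new_deps:
                new_deps.append(target)
        result[name] = (version, new_deps)

    result[merged_name] = (merged_version, merged_deps)
    return result


merged = merge_bundles(
    registry,
    ["feature.core", "feature.ui", "feature.api"],
    "feature",
    "q3r4s5t",
)
print(len(registry), "->", len(merged), "registry entries")
print(len(json.dumps(registry)), "->", len(json.dumps(merged)), "bytes as JSON")
```

In this toy model every merged group removes all but one of its names, version IDs, and internal dependency lists from the registry, which is where the size reduction comes from.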

Here are some interesting points about applying this optimization approach:


It is also worth noting that the Wikidata client for Wikipedia was optimized. This part of the work was an epic project in its own right ( T203696 ). Initially, 248 individual modules implemented this feature. After we got rid of more than 200 of them, only 42 remained.

The diagram above shows the small improvements made to the project over the year. All of them brought us closer to the goal. Two large drops in the size of the manifest deserve special mention. The first occurred in the first week of August, when the improved version of the Wikidata client was deployed. The second can be seen at the very end of the graph, in mid-September. It is described in the next section.

Reduce metadata sizes


The mid-September improvement in the manifest was made possible by two broad changes aimed at organizing the data more intelligently.

The first improvement concerns the schema metadata of the EventLogging extension, which used to be part of the main manifest. This mechanism was refactored so that the schema metadata is now included in the JS bundle of the EventLogging client. As a result, EventLogging's contribution to the manifest size was reduced by more than 90%, which means the critical path now contains 2 KB less data. It also means that extending EventLogging no longer increases the manifest size. These bundles are built with Package Files, a new ResourceLoader feature introduced in February 2019. One reason for the interest in it was that it could help reduce the number of modules in the registry. Package Files greatly simplifies combining generated data and JavaScript code in a single module.

The second improvement came from reducing the average size of each registry entry ( T229245 ). The manifest contains two pieces of data for each module: the module's name and its version identifier (ID). A version ID used to take 7 bytes. After thinking about the birthday paradox in the context of ResourceLoader, we decided that the space of possible version IDs could safely be reduced from 78 billion to "only" 60 million. Details can be found in the code comments. In short, this saved 2 bytes in the entry of each of the 1,100 modules still in the registry. As a result, the manifest shrank by another 2-3 KB.
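The numbers in the previous paragraph can be checked with a short calculation. The sketch below assumes that version IDs are drawn from a 36-character alphabet (digits plus lowercase letters); this is an inference from the figures, since 36^7 is about 78 billion and 36^5 is about 60 million, and is not stated in the article. The collision formula is the standard birthday-paradox approximation, and the values of `n` are arbitrary illustrations rather than the figures used in the actual code comments.

```python
import math

ALPHABET_SIZE = 36  # assumption: digits 0-9 plus letters a-z

old_space = ALPHABET_SIZE ** 7  # 7-character IDs
new_space = ALPHABET_SIZE ** 5  # 5-character IDs


def collision_probability(n, space):
    """Approximate chance that at least two of n random IDs coincide.

    Uses the birthday-paradox approximation 1 - exp(-n(n-1) / (2 * space)).
    expm1 keeps precision when the probability is very small.
    """
    return -math.expm1(-n * (n - 1) / (2.0 * space))


MODULES = 1_100          # modules remaining in the registry (from the article)
BYTES_SAVED_PER_ID = 2   # 7 bytes down to 5 bytes
saved_bytes = MODULES * BYTES_SAVED_PER_ID

print(f"old ID space: {old_space:,}")
print(f"new ID space: {new_space:,}")
print(f"uncompressed bytes saved: {saved_bytes:,}")

# Illustrative only: how the collision chance grows with the number of IDs
# that must be distinct from each other.
for n in (10, 100, 1_000):
    p_old = collision_probability(n, old_space)
    p_new = collision_probability(n, new_space)
    print(f"n={n}: 7 chars -> {p_old:.2e}, 5 chars -> {p_new:.2e}")
```

The shorter IDs are always more likely to collide, so whether the reduction is safe depends on how many IDs actually need to differ from one another; that reasoning lives in the code comments the article refers to.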

Below is an enlarged fragment of the diagram showing the last few days. These figures come from the synthetic monitoring system and refer to uncompressed data.


Change in manifest size at the final stage of the project

The change was captured by the ResourceLoader monitoring system. The screenshot shows the Startup manifest size panel in a public Grafana instance. It shows that the size of the uncompressed data decreased by 2.8 KB.

The mid-September deployment achieved the original goal of shrinking the manifest to no more than 28 KB. Over the course of this large project, the initialization manifest was reduced by 9 KB (compressed). A year ago its size was 36.2 KB; at the end of the project it was 27.2 KB.

About 363,000 page views are generated every minute on Wikipedia and related projects. That is 21.8 million per hour and 523 million per day ( here are the statistics on page views). The version deployed in mid-September saves approximately 1.4 terabytes of traffic per day. Compared with a year ago, 4.3 terabytes of traffic are now saved daily.
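The arithmetic behind these figures can be reproduced roughly as follows. This is a back-of-the-envelope sketch under two assumptions that the article does not spell out: that every page view transfers the manifest once (caching is ignored), and that kilobytes and terabytes are binary units. With these assumptions the result lands close to the 4.3 terabytes quoted above; the exact figure depends on the units and rounding the author used.

```python
VIEWS_PER_MINUTE = 363_000  # from the article

views_per_hour = VIEWS_PER_MINUTE * 60
views_per_day = views_per_hour * 24

OLD_MANIFEST_KB = 36.2  # compressed size a year earlier
NEW_MANIFEST_KB = 27.2  # compressed size after the project
saved_kb_per_view = OLD_MANIFEST_KB - NEW_MANIFEST_KB

# Assumption: binary units (1 KB = 1024 bytes, 1 TB = 1024**4 bytes).
BYTES_PER_KB = 1024
BYTES_PER_TB = 1024 ** 4

# Assumption: one manifest transfer per page view, no caching.
saved_tb_per_day = views_per_day * saved_kb_per_view * BYTES_PER_KB / BYTES_PER_TB

print(f"page views per hour: {views_per_hour:,}")
print(f"page views per day:  {views_per_day:,}")
print(f"saved per page view: {saved_kb_per_view:.1f} KB")
print(f"saved per day:       {saved_tb_per_day:.1f} TB")
```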

What's next?


We managed to fit the Wikipedia initialization manifest into 28 KB. This target was chosen because it is a multiple of 14 KB: data of this size fits into two bursts of Internet packets sent to the browser.
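The reasoning behind the 28 KB target can be sketched as a simple calculation. The model below takes the article's 14 KB burst size at face value and only counts how many such bursts a payload needs; real network behavior is more involved, and the `bursts_needed` helper is purely illustrative.

```python
import math

BURST_KB = 14  # burst size used in the article


def bursts_needed(size_kb, burst_kb=BURST_KB):
    """Number of fixed-size bursts required to carry size_kb of data."""
    return math.ceil(size_kb / burst_kb)


for label, size_kb in (("before", 36.2), ("after", 27.2)):
    print(f"{label}: {size_kb} KB needs {bursts_needed(size_kb)} bursts")
```

In this simplified model, crossing below 28 KB is what removes a whole burst, while a reduction from, say, 36 KB to 30 KB would not.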

Now we face a new challenge: not losing ground. Over the past year I have been watching the manifest closely. I did this to confirm our progress and to spot potential problems that could set us back. In the end, I automated this process using the public Grafana dashboard.

Judging by this dashboard, we still have many opportunities to improve how code is packaged and to make creating bundles even easier than it is now. I hope these upcoming improvements will prove useful, but for now we are working on new features while striving to meet the project's performance requirements.

Dear readers! Have you ever participated in the optimization of large Internet projects?


Source: https://habr.com/ru/post/470874/

