How to launch Hi-CPU VDS for Bitrix, boost the parrots, and not go broke

Few hosting providers offer VDS tariffs with a high processor clock speed, although it seems simple enough: put a more powerful i9 into the server, set up billing, and you're done.

When we prepared the Hi-CPU tariffs, we found out that:


Here is how we dealt with this and launched Hi-CPU.



Why Hi-CPU is needed


We prepared the perfect tariff for Bitrix. Why?
Of course, because of the money.



According to CMS iTrack, half of all CMS-based sites run on WordPress and only 11.68% use Bitrix. However, CMS Magazine counted twice as many commercial sites on Bitrix as on WordPress. Most WordPress sites are blogs, personal pages, and simple business-card sites.

Thousands of Russian companies use Bitrix and are ready to pay for a high-quality VDS. Many of them need Hi-CPU solutions, which are scarce on the market: most hosting providers offer tariffs with a processor frequency of 2-3 GHz, which is fine for everyday tasks but not enough for fast processing of many small ones. Especially if the provider does not fight CPU-time overselling.

So the surest way to become a quality Bitrix hosting provider was to build a profitable Hi-CPU tariff and become a featured partner, that is, to get into the rating of recommended hosting providers that Bitrix itself maintains.

Preparation: Initial Testing


To begin with, we checked how many Bitrix parrots (as benchmark points are jokingly called) a build on a standard tariff produces. The processor was an Intel Scalable Xeon Silver 4116. We got 107 parrots.


A similar build is available today from 2 rubles per day.

Intel Scalable Xeon Silver 4116 does a good job with typical VDS tasks, but Bitrix needs something more powerful, especially if the goal is to get to the top of the rating.



Finding powerful hardware for parrots


The first thing to do is to pick a processor with a higher frequency: clock speed is what mainly drives the parrot count up.

At first we considered a self-built server based on the Intel Core i9-9900K (S1151). Some colleagues do just that, and they even get more parrots than server processors deliver. However, as we mentioned, top-end i9s and builds based on them consume so much power that we would have to either raise the price or go broke on electricity bills. The data center was not enthusiastic either: it demanded additional cooling for the racks and engineers to set up and maintain the self-built machines (and additional cooling for the engineers).

Given the risks, the lack of a warranty, and the desktop-grade components throughout, a self-built server turned out to be more trouble than it was worth.

We went looking for the best that official suppliers had to offer. In addition to performance and energy efficiency, we looked at the rack space occupied: you pay for servicing every unit, which also raises the cost of the tariff.

The best option we found was a MicroCloud in 3U. In effect, it is 12 servers in one chassis, which takes 4 times less rack space than twelve 1U servers at the same performance. We were choosing servers in November 2018, when there were not many 3U solutions, so the choice fell almost immediately on the Supermicro SuperServer 5039MS-H12TRF.
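The rack-space arithmetic behind that factor of 4 can be sketched in a few lines of Python. The function name and the assumption that each standalone server occupies exactly 1U are illustrative and not taken from any vendor tool.

```python
def rack_space_saving(nodes: int, chassis_units: int, units_per_server: int = 1) -> float:
    """Return how many times less rack space a multi-node chassis takes
    compared with the same number of standalone servers."""
    if nodes <= 0 or chassis_units <= 0 or units_per_server <= 0:
        raise ValueError("all arguments must be positive")
    # Space the same number of standalone servers would need, in rack units.
    standalone_units = nodes * units_per_server
    return standalone_units / chassis_units


if __name__ == "__main__":
    # 12 nodes in a 3U MicroCloud versus twelve separate 1U servers
    print(rack_space_saving(nodes=12, chassis_units=3))  # 4.0
```

Since servicing is paid per rack unit, the same ratio applies to that part of the tariff cost, all else being equal.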


It consists of twelve separate nodes


Each node is essentially a separate server. We have a different assembly than in the picture, but the principle is the same.

As the heart we chose the Intel Xeon E3-1270 v6. We relied on experience: we had already used this processor on the Dell R330 platform for other heavily loaded projects. The E3-1270 had never let us down, and both price and quality suited us.

To begin with, we bought only one MicroCloud: it costs around 20 thousand dollars, and we did not have much spare money. It was for the better: new, more efficient and less expensive solutions keep appearing on the market. By the time we had money for a new server, we analyzed the market again.

First installation issue


The first MicroCloud was delivered a week after ordering. Only at the data center did it turn out that it did not fit in the rack. We wanted to put it next to the 1U servers, but the rails in the rack were positioned so that the MicroCloud would not go in. To fit it, we would have had to arrange downtime for the other servers and move the rails.

We decided to postpone the launch and put the MicroCloud in a new rack. This turned out to be the best solution: the power consumption and heat dissipation of a MicroCloud differ from those of ordinary servers, and its network equipment has its own peculiarities too.


MicroClouds now live in a separate rack

We planned to install ten-gigabit network cards in the MicroCloud to properly speed up the migration of VDS containers. We had already done this trick with 1U servers, but with the MicroCloud everything turned out to be more complicated.

Ten-gigabit network cards for MicroCloud servers turned out to be a rarity. We ordered the Low Profile AOM-CTGS-i2TM MicroLP, waited a couple of months, and received the answer: “Sorry, the manufacturer rarely gets such orders. The cards will be ready in six months.” We had to abandon the idea: for now the standard gigabit cards are enough, but we will try again to buy ten-gigabit cards in the future.


A bit of geek porn: this is how MicroClouds are assembled

Bitrix-optimized template and the application


Initially, we built the template with a bias towards Bitrix, but also with convenience for other CMSs in mind: for example, we added a non-standard configuration for Vesta with a choice of PHP version. All configuration and optimization was done for the apache + mod_fcgi scheme. The parameters were chosen to give the best average result across all tariffs.

Bitrix performance depends on the processor clock speed. On average, the processor frequency on Hi-CPU tariffs was 40-50% higher than that of the processors serving regular tariffs. The measurement results correlated with this: at least 30% more performance under high server load, and about 60% in "good weather".



We got these numbers on a tariff that costs 26.6 rubles per day

When everything was debugged, we registered on the site for Bitrix partners and filled out an application, attaching data from a VDS with the template optimized for Bitrix.

The fight for first place in the rating


The performance results were confirmed, but the final rating was lower than we expected: it takes into account not only performance but also the absolute cost of the tariff and the availability of a trial period.



And we consciously gave up the fight for first place in the rating, for two reasons.

Price and common sense


Other hosting companies submitted applications with their cheapest tariffs, which have less RAM, SSD space, and traffic than ours. We submitted a more expensive tariff, but one better suited to the normal operation of Bitrix.

Why we refused a free trial period


The lack of a free trial period kept us off the first line of the rating, but we had a serious reason to refuse it. Why? Because we are service-oriented.

In building VDSina we rely on convenience: registration should happen on the fly, without a captcha (it gives us heartburn), passport verification, or phone number confirmation. You enter an email, top up the balance by 30 rubles, and the VDS is deployed in 60 seconds. For us this is a matter of principle.

Hosting providers complicate registration in order to deal with scammers who mine cryptocurrency during the free trial period, creating hundreds of free accounts.

With this way of fighting freeloaders, normal customers suffer, and as a matter of principle we do not want to burden them with our problem.

So that customers can still test the hosting, we made billing daily and set the minimum payment at 30 rubles: that costs practically nothing to customers who are really looking for a convenient VDS for work.

So far, our customers are happy with this situation, and so are we.

Performance tests of our Hi-CPU tariffs





Test Details
BYTE UNIX regular VDS

====================================================================
BYTE UNIX Benchmarks (Version 5.1.3)

System: v148399.hosted-by-vdsina.ru: GNU/Linux
OS: GNU/Linux - 3.10.0-957.5.1.el7.x86_64 - #1 SMP Fri Feb 1 14:54:57 UTC 2019
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Common KVM processor (4394.9 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 1: Common KVM processor (4394.9 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
10:42:54 up 21 min, 1 user, load average: 0.07, 0.21, 0.21; runlevel 3

- Benchmark Run: Wed Sep 11 2019 10:42:54 - 11:10:59
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 26770638.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 4222.7 MWIPS (9.8 s, 7 samples)
Execl Throughput 1763.2 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 226998.4 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 60299.3 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 702987.3 KBps (30.0 s, 2 samples)
Pipe Throughput 315773.1 lps (10.0 s, 7 samples)
Pipe-based Context Switching 85613.2 lps (10.0 s, 7 samples)
Process Creation 5140.5 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3570.0 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 730.3 lpm (60.1 s, 2 samples)
System Call Overhead 293013.8 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 26770638.9 2294.0
Double-Precision Whetstone 55.0 4222.7 767.8
Execl Throughput 43.0 1763.2 410.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 226998.4 573.2
File Copy 256 bufsize 500 maxblocks 1655.0 60299.3 364.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 702987.3 1212.0
Pipe Throughput 12440.0 315773.1 253.8
Pipe-based Context Switching 4000.0 85613.2 214.0
Process Creation 126.0 5140.5 408.0
Shell Scripts (1 concurrent) 42.4 3570.0 842.0
Shell Scripts (8 concurrent) 6.0 730.3 1217.2
System Call Overhead 15000.0 293013.8 195.3
========
System Benchmarks Index Score 552.6

- Benchmark Run: Wed Sep 11 2019 11:10:59 - 11:39:17
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables 50497275.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 8233.3 MWIPS (9.8 s, 7 samples)
Execl Throughput 3435.3 lps (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 386580.4 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 102199.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1187846.7 KBps (30.0 s, 2 samples)
Pipe Throughput 614216.9 lps (10.0 s, 7 samples)
Pipe-based Context Switching 168877.2 lps (10.0 s, 7 samples)
Process Creation 11055.3 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 5620.2 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 804.7 lpm (60.1 s, 2 samples)
System Call Overhead 561793.2 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 50497275.9 4327.1
Double-Precision Whetstone 55.0 8233.3 1497.0
Execl Throughput 43.0 3435.3 798.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 386580.4 976.2
File Copy 256 bufsize 500 maxblocks 1655.0 102199.5 617.5
File Copy 4096 bufsize 8000 maxblocks 5800.0 1187846.7 2048.0
Pipe Throughput 12440.0 614216.9 493.7
Pipe-based Context Switching 4000.0 168877.2 422.2
Process Creation 126.0 11055.3 877.4
Shell Scripts (1 concurrent) 42.4 5620.2 1325.5
Shell Scripts (8 concurrent) 6.0 804.7 1341.2
System Call Overhead 15000.0 561793.2 374.5
========
System Benchmarks Index Score 979.3
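For readers unfamiliar with the report format: each INDEX value in the tables above is the RESULT divided by the BASELINE and multiplied by 10, and the final score is the geometric mean of the individual indices. The sketch below recomputes the score of the single-copy run on the regular VDS from its index table. The function and constant names are ours, and small deviations are possible because the report prints rounded values.

```python
import math


def index_value(result: float, baseline: float) -> float:
    """Index of a single test: result relative to the baseline, times 10."""
    return result / baseline * 10.0


def index_score(indices) -> float:
    """Overall score: geometric mean of the per-test indices."""
    values = list(indices)
    if not values:
        raise ValueError("at least one index is required")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# INDEX column of the regular VDS, run with 1 parallel copy of tests
REGULAR_VDS_SINGLE_COPY = [
    2294.0, 767.8, 410.1, 573.2, 364.3, 1212.0,
    253.8, 214.0, 408.0, 842.0, 1217.2, 195.3,
]

if __name__ == "__main__":
    # Dhrystone line: RESULT 26770638.9 against BASELINE 116700.0
    print(round(index_value(26770638.9, 116700.0), 1))
    # Should land close to the reported Index Score of 552.6
    print(round(index_score(REGULAR_VDS_SINGLE_COPY), 1))
```

Because the score is a geometric mean, one very fast or very slow test shifts it less than it would shift an arithmetic average.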

BYTE UNIX Old Hi-CPU VDS

====================================================================
BYTE UNIX Benchmarks (Version 5.1.3)

System: v148401.hosted-by-vdsina.ru: GNU/Linux
OS: GNU/Linux - 3.10.0-957.5.1.el7.x86_64 - #1 SMP Fri Feb 1 14:54:57 UTC 2019
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Common KVM processor (6624.1 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 1: Common KVM processor (6624.1 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
14:01:52 up 3:40, 1 user, load average: 0.00, 0.07, 0.07; runlevel 3

- Benchmark Run: Wed Sep 11 2019 14:01:52 - 14:30:53
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 41165945.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3454.8 MWIPS (15.4 s, 7 samples)
Execl Throughput 2102.9 lps (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 323989.0 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 88536.1 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1090490.9 KBps (30.0 s, 2 samples)
Pipe Throughput 456730.9 lps (10.0 s, 7 samples)
Pipe-based Context Switching 126170.4 lps (10.0 s, 7 samples)
Process Creation 6282.5 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 5172.3 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1122.8 lpm (60.0 s, 2 samples)
System Call Overhead 426422.9 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 41165945.1 3527.5
Double-Precision Whetstone 55.0 3454.8 628.1
Execl Throughput 43.0 2102.9 489.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 323989.0 818.2
File Copy 256 bufsize 500 maxblocks 1655.0 88536.1 535.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 1090490.9 1880.2
Pipe Throughput 12440.0 456730.9 367.1
Pipe-based Context Switching 4000.0 126170.4 315.4
Process Creation 126.0 6282.5 498.6
Shell Scripts (1 concurrent) 42.4 5172.3 1219.9
Shell Scripts (8 concurrent) 6.0 1122.8 1871.4
System Call Overhead 15000.0 426422.9 284.3
========
System Benchmarks Index Score 753.4

- Benchmark Run: Wed Sep 11 2019 14:30:53 - 15:00:04
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables 73510146.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 6546.6 MWIPS (16.2 s, 7 samples)
Execl Throughput 5306.0 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 580128.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 149810.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1896766.5 KBps (30.0 s, 2 samples)
Pipe Throughput 891359.8 lps (10.0 s, 7 samples)
Pipe-based Context Switching 245363.7 lps (10.0 s, 7 samples)
Process Creation 17811.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 8446.7 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1147.3 lpm (60.0 s, 2 samples)
System Call Overhead 831002.3 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 73510146.2 6299.1
Double-Precision Whetstone 55.0 6546.6 1190.3
Execl Throughput 43.0 5306.0 1234.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 580128.9 1465.0
File Copy 256 bufsize 500 maxblocks 1655.0 149810.9 905.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 1896766.5 3270.3
Pipe Throughput 12440.0 891359.8 716.5
Pipe-based Context Switching 4000.0 245363.7 613.4
Process Creation 126.0 17811.2 1413.6
Shell Scripts (1 concurrent) 42.4 8446.7 1992.1
Shell Scripts (8 concurrent) 6.0 1147.3 1912.1
System Call Overhead 15000.0 831002.3 554.0
========
System Benchmarks Index Score 1391.3

BYTE UNIX Hi-CPU VDS

====================================================================
BYTE UNIX Benchmarks (Version 5.1.3)

System: v148401.hosted-by-vdsina.ru: GNU/Linux
OS: GNU/Linux - 3.10.0-957.5.1.el7.x86_64 - #1 SMP Fri Feb 1 14:54:57 UTC 2019
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Common KVM processor (6624.1 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 1: Common KVM processor (6624.1 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
10:42:58 up 21 min, 1 user, load average: 0.03, 0.07, 0.06; runlevel 3

- Benchmark Run: Wed Sep 11 2019 10:42:58 - 11:12:20
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 50496763.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3290.3 MWIPS (18.2 s, 7 samples)
Execl Throughput 3416.6 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 419298.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 105903.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1417343.7 KBps (30.0 s, 2 samples)
Pipe Throughput 539629.9 lps (10.0 s, 7 samples)
Pipe-based Context Switching 152917.5 lps (10.0 s, 7 samples)
Process Creation 10424.5 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 7237.0 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1502.7 lpm (60.0 s, 2 samples)
System Call Overhead 495647.5 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 50496763.2 4327.1
Double-Precision Whetstone 55.0 3290.3 598.2
Execl Throughput 43.0 3416.6 794.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 419298.9 1058.8
File Copy 256 bufsize 500 maxblocks 1655.0 105903.4 639.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 1417343.7 2443.7
Pipe Throughput 12440.0 539629.9 433.8
Pipe-based Context Switching 4000.0 152917.5 382.3
Process Creation 126.0 10424.5 827.3
Shell Scripts (1 concurrent) 42.4 7237.0 1706.8
Shell Scripts (8 concurrent) 6.0 1502.7 2504.5
System Call Overhead 15000.0 495647.5 330.4
========
System Benchmarks Index Score 966.0

- Benchmark Run: Wed Sep 11 2019 11:12:20 - 11:41:45
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables 101242206.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 6543.9 MWIPS (18.3 s, 7 samples)
Execl Throughput 7095.4 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 793174.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 203939.8 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2721785.9 KBps (30.0 s, 2 samples)
Pipe Throughput 1072159.2 lps (10.0 s, 7 samples)
Pipe-based Context Switching 307924.6 lps (10.0 s, 7 samples)
Process Creation 23097.3 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 11354.9 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1585.1 lpm (60.1 s, 2 samples)
System Call Overhead 979658.1 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 101242206.9 8675.4
Double-Precision Whetstone 55.0 6543.9 1189.8
Execl Throughput 43.0 7095.4 1650.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 793174.9 2003.0
File Copy 256 bufsize 500 maxblocks 1655.0 203939.8 1232.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 2721785.9 4692.7
Pipe Throughput 12440.0 1072159.2 861.9
Pipe-based Context Switching 4000.0 307924.6 769.8
Process Creation 126.0 23097.3 1833.1
Shell Scripts (1 concurrent) 42.4 11354.9 2678.1
Shell Scripts (8 concurrent) 6.0 1585.1 2641.9
System Call Overhead 15000.0 979658.1 653.1
========
System Benchmarks Index Score 1793.6
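To make the three reports easier to compare, here is a small Python sketch that computes the relative gain of each Hi-CPU run over the regular VDS. The Index Scores are copied from the logs above; the dictionary layout and function names are our own illustration. Keep in mind that these are synthetic BYTE UNIX scores, not Bitrix parrots.

```python
# (score with 1 parallel copy, score with 2 parallel copies), from the logs above
INDEX_SCORES = {
    "regular VDS": (552.6, 979.3),
    "old Hi-CPU VDS": (753.4, 1391.3),
    "Hi-CPU VDS": (966.0, 1793.6),
}


def gain_percent(score: float, baseline_score: float) -> float:
    """Relative gain of score over baseline_score, in percent."""
    return (score / baseline_score - 1.0) * 100.0


def gains_over(baseline_name: str, scores=INDEX_SCORES) -> dict:
    """Gains of every other entry over the named baseline, per run type."""
    base = scores[baseline_name]
    return {
        name: tuple(gain_percent(s, b) for s, b in zip(values, base))
        for name, values in scores.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, (single, parallel) in gains_over("regular VDS").items():
        print(f"{name}: +{single:.1f}% (1 copy), +{parallel:.1f}% (2 copies)")
```

With the scores above, this gives roughly 36% and 42% for the old Hi-CPU VDS and roughly 75% and 83% for the Hi-CPU VDS.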


Future plans


Recently our fourth server arrived. This time it is a Supermicro MicroCloud with 12 x Xeon E-2136, 48 x 16 GB DDR4, and 12 x 1 TB NVMe P4510.


On average, the new MicroCloud performs 8-10% better than its rack brothers


The new MicroCloud has already been put into operation, and now we are making plans to expand Hi-CPU to the Netherlands and other countries. We have servers for regular tariffs in two Dutch data centers, but as soon as the question is about something more complicated than a 1U server, you have to go through nine circles of approvals.

But that is another story.

Source: https://habr.com/ru/post/466925/

