Apache Ignite Zero Deployment: Really Zero?


We are the technology development department of a retailer. One day management set us the task of speeding up large-scale calculations by using Apache Ignite together with MSSQL, and showed us a site with beautiful illustrations and Java code examples. What I liked on the site right away was Zero Deployment, whose description promises miracles: you don't have to manually deploy your Java or Scala code on each node in the grid and re-deploy it each time it changes. In the course of the work it turned out that Zero Deployment has a specific scope of use, and I want to share its peculiarities. Below are my reflections and implementation details.


1. Statement of the problem


The essence of the problem is as follows. There is a directory of points of sale, SalesPoint, and a directory of products, Sku (Stock Keeping Unit). Each point of sale has a "Store type" attribute with the values "small" and "large". An assortment (the list of products of a point of sale) is attached to each point of sale, loaded from the DBMS, together with dated records stating that a given product was excluded from the assortment or added to it on a given date.
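
The dated add/exclude records can be read as a small timeline for each pair (point of sale, product). Here is a minimal plain-Java sketch of that idea, with no Ignite involved. It assumes that the latest event on or before a given day decides whether the product is attached, and that an exclusion takes effect on its own date. The class name, method names, and dates are hypothetical, not our production model.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical timeline of one product at one point of sale.
class AssortmentHistory {
    // Event date -> true if the product was added, false if it was excluded.
    private final TreeMap<LocalDate, Boolean> events = new TreeMap<>();

    void add(LocalDate date) { events.put(date, Boolean.TRUE); }

    void exclude(LocalDate date) { events.put(date, Boolean.FALSE); }

    // Assumption: the latest event on or before `day` decides the state,
    // and an exclusion takes effect on its own date.
    boolean isConnected(LocalDate day) {
        Map.Entry<LocalDate, Boolean> last = events.floorEntry(day);
        return last != null && last.getValue();
    }

    public static void main(String[] args) {
        AssortmentHistory history = new AssortmentHistory();
        history.add(LocalDate.of(2019, 10, 1));      // made-up dates
        history.exclude(LocalDate.of(2019, 10, 15));

        System.out.println(history.isConnected(LocalDate.of(2019, 10, 10)));
        System.out.println(history.isConnected(LocalDate.of(2019, 10, 20)));
    }
}
```

Keeping the events in a sorted map makes the "state on day X" question a single floorEntry lookup, which is convenient when the state has to be evaluated for every day of the coming month.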


We need to set up a partitioned cache of points of sale and store in it the information about attached products for a month ahead. For compatibility with the production system, the Ignite client node must load the data, calculate an aggregate of the form (Store type, Product code, Day, Number of points of sale), and upload it back to the DBMS.


2. Studying the literature


I have no experience yet, so I start from square one, that is, with a review of publications.


The 2016 article "Introducing Apache Ignite: The first steps" links to the Apache Ignite project documentation and at the same time reproaches it for being vague. I read it a couple of times, but clarity does not come. I turn to the official getting-started tutorial, which optimistically promises "You'll be up and running in a jiffy!". I sort out the environment variable settings and watch two Apache Ignite Essentials videos, which turn out to be not very useful for my specific task. I successfully launch Ignite from the command line with the standard file "example-ignite.xml" and build my first Compute Application with Maven. The application works and uses Zero Deployment. What a beauty!


I read on, and the next example immediately uses an affinityKey (created earlier through an SQL query) and even the mysterious BinaryObject:


IgniteCache<BinaryObject, BinaryObject> people = ignite.cache("Person").withKeepBinary(); 

I read up a little: the binary format is something like reflection, giving access to an object's fields by name. It can read the value of a field without fully deserializing the object (which saves memory). But why is BinaryObject used instead of Person, if there is Zero Deployment? Why is IgniteCache<Key, Person> turned into IgniteCache<BinaryObject, BinaryObject>? It is not clear yet.


I adapt the Compute Application to my case. The primary key of the point-of-sale directory in MSSQL is defined as [id] [int] NOT NULL, so by analogy I create a cache:


 IgniteCache<Integer, SalesPoint> salesPointCache = ignite.cache("spCache");

In the XML config I specify that the cache is partitioned:


 <bean class="org.apache.ignite.configuration.CacheConfiguration">
     <property name="name" value="spCache"/>
     <property name="cacheMode" value="PARTITIONED"/>
 </bean>

Partitioning by point of sale means that each cluster node builds the required aggregate for the salesPointCache records stored on it, after which the client node performs the final summation.
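
The final summation on the client can be sketched in plain Java, without any Ignite API. This is only an illustration under my own assumptions: each node is imagined to return a map from an aggregate key to a count of points of sale, and the "storeType|skuCode|day" key layout, the class name, and the numbers are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartialAggregates {
    // Builds the aggregate key; the "storeType|skuCode|day" layout is an assumption.
    static String key(String storeType, int skuCode, String day) {
        return storeType + "|" + skuCode + "|" + day;
    }

    // Client-side step: sum the per-node counts key by key.
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Pretend results from two server nodes (made-up numbers).
        Map<String, Integer> node1 = new TreeMap<>();
        node1.put(key("small", 1001, "2019-10-01"), 3);
        node1.put(key("large", 1001, "2019-10-01"), 1);

        Map<String, Integer> node2 = new TreeMap<>();
        node2.put(key("small", 1001, "2019-10-01"), 2);

        System.out.println(merge(Arrays.asList(node1, node2)));
    }
}
```

Because every point of sale lives on exactly one node, the per-node counts never overlap, so a plain sum per key is enough and no deduplication is needed on the client.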


I read the First Ignite Compute Application tutorial and follow it by analogy. On each cluster node I run an IgniteRunnable, something like this:


 @Override
 public void run() {
     SalesPoint sp = salesPointCache.get(spId);
     sp.calculateSalesPointCount();
     ..
 }

I add the aggregation and upload logic and run it on a test data set. Locally, on the development server, everything works.


I bring up two CentOS test servers, specify the IP addresses in default-config.xml, and run on each:


 ./bin/ignite.sh config/default-config.xml 

Both Ignite nodes start up and see each other. I specify the necessary addresses in the XML config of the client application and start it. It joins the topology as a third node, and immediately there are two nodes again. The log shows "ClassNotFoundException: model.SalesPoint" at the line


 SalesPoint sp=salesPointCache.get(spId); 

StackOverflow says the cause of the error is that the CentOS servers do not have the custom SalesPoint class. Here we are. What about "you don't have to manually deploy your Java code on each node" and so on? Or does "your Java code" not include SalesPoint?


I must have missed something, so I search again, read, and search some more. After a while I get the feeling that I have read everything on the topic and there is nothing new. Along the way I found some interesting comments.


Valentin Kulichenko, Lead Architect at GridGain Systems, in a StackOverflow answer from April 2016:


 Model classes are not peer deployed, but you can use withKeepBinary() flag on the cache and query BinaryObjects. This way you will avoid deserialization on the server side and will not get ClassNotFoundException. 
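
In plain Java terms, a server node can only turn a cached value back into a SalesPoint object if its own JVM can load the class model.SalesPoint. The sketch below is a plain-JDK illustration of that lookup, not Ignite's actual code, and the helper name isOnClasspath is my own.

```java
class ClassLookupSketch {
    // Returns true if the class can be loaded from this JVM's classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The same exception type that appeared in the server log.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String: " + isOnClasspath("java.lang.String"));
        System.out.println("model.SalesPoint: " + isOnClasspath("model.SalesPoint"));
    }
}
```

A JVM started without the model jar answers the second question with "no", which is the situation of a stock server node. Working with BinaryObject avoids asking that question on the server at all.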

Another authoritative opinion comes from Denis Magda, Director of Product Management at GridGain Systems.


An article on Habr about microservices refers to three articles by Denis Magda from 2016-2017: Microservices Part I, Microservices Part II, and Microservices Part III. In the second article, Denis suggests starting a cluster node through MaintenanceServiceNodeStartup.jar. You can also start a node from the command line with an XML configuration, but then you need to manually put the custom classes on each deployed cluster node:


 That's it. Start (..) node using MaintenanceServiceNodeStartup file or pass maintenance-service-node-config.xml to Apache Ignite's ignite.sh/bat scripts. If you prefer the latter then make sure to build a jar file that will contain all the classes from java/app/common and java/services/maintenance directories. The jar has to be added to the classpath of every node where the service might be deployed. 

Indeed, that's it. So that is what this mysterious binary format is for!


3. SingleJar


Denis takes first place in my personal rating: IMHO his is the most useful tutorial of all available. His GitHub repository MicroServicesExample contains a complete, ready-made example of configuring cluster nodes, and it compiles without any extra effort.


I follow its example and get a single jar file that launches a "data node" or a "client node" depending on a command-line argument. The build starts and works. Zero Deployment is defeated.


The transition from megabytes of test data to tens of gigabytes of production data showed that the binary format exists for good reason. We had to optimize memory consumption on the nodes, and here BinaryObject proved very useful.


4. Conclusions


The reproach we met at the very start, that the Apache Ignite project documentation is vague, turned out to be fair: it has changed little since 2016. It is not easy for a beginner to build a working prototype based only on the site and/or the repository.


After the work done, my impression is that Zero Deployment works, but only at the system level. Something like this: BinaryObject is used to teach remote cluster nodes to work with custom classes, while Zero Deployment is an internal mechanism of Apache Ignite itself that distributes system objects across the cluster.


I hope my experience will be useful to new Apache Ignite users.



Source: https://habr.com/ru/post/472568/

