How we solved the task of visualizing cohorts of users in the Promopult service using Grafana.
Promopult is a powerful service with a large number of users. Over 10 years of operation, the number of registrations in the system has exceeded one million. Those who have come across similar services know that this array of users is far from homogeneous.
Someone registered and “fell asleep” forever. Someone forgot the password and registered a couple more times in six months. Someone brings money to the cashier, and someone came for free tools . And it would be nice to get some profit from everyone.
On such large data arrays as ours, it makes no sense to analyze the behavior of an individual user and make micro-decisions. But catching trends and working with large groups is possible and necessary. What we, in fact, are doing.
- What is cohort analysis and why is it needed.
- How to make cohorts by month of user registration in SQL.
- How to transfer cohorts to Grafana .
If you already know what cohort analysis is, and how to do it in SQL, skip to the last section.
1. What is cohort analysis and why is it needed
Cohort analysis is a method based on a comparison of different groups (cohorts) of users. Most often, our groups are formed by the week or month in which the user started using the service. The user’s lifetime is calculated from here, and this is an indicator on the basis of which a rather complicated analysis can be carried out. For example, understand:
- how does the channel attract on the life of the user;
- how the use of a function or service affects a lifetime;
- how the launch of features X affected the life time compared to last year.
2. How to make cohorts in SQL?
The size of the article and common sense do not allow us to give our real data here - in the test dump, statistics for a year and a half are: 1200 users and 53,000 transactions. So that you can play with this data, we have prepared a docker image with MySQL and Grafana, in which you can feel it all yourself. Link to GitHub at the end of the article.
And here we show the creation of cohorts with a simplified example.
Suppose we have a service. Users register in it and spend money on services. Over time, users fall off. We want to know how long users live, and how many of them fall off after the 1st and 2nd month of using the service.
To answer these questions, we need to build cohorts by month of registration. Activity will be measured by expenses in each month. Instead of costs, there may be orders, a monthly fee, or any other time-bound activity.
The examples are made in MySQL, but for the rest of the DBMS there should not be significant differences.
User table - users:
Spending table - billing:
Select all user charges and registration date:
SELECT b.userId, b.Date, u.RegistrationDate FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
We build cohorts by months, for this we will convert all dates to months:
Now we need to know how many months the user was active - this is the difference between the month of debiting and the month of registration. MySQL has a function PERIOD_DIFF () - the difference between two months. Add PERIOD_DIFF () to the request:
SELECT b.userId, DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth, DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth, PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
We count the users activated in each month - we group the records by BillingMonth, RegistrationMonth and MonthsDiff:
SELECT COUNT(DISTINCT(b.userId)) AS UsersCount, DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth, DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth, PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId GROUP BY BillingMonth, RegistrationMonth, MonthsDiff
In January, February and March, one new user appeared - MonthsDiff = 0. One January user was active and in February RegistrationMonth = 2019-01, BillingMonth = 2019-02, and one February user was active in March.
On a large data set, patterns are naturally better seen.
How to Transfer Cohorts to Grafana
We learned how to form cohorts, but when there are a lot of records, analyzing them is no longer easy. Records can be exported to Excel and create beautiful tables, but this is not our method!
Cohorts can be displayed as interactive graphs in Grafana .
To do this, add another query to convert the data into a format suitable for Grafana:
SELECT DATE_ADD(CONCAT(s.RegistrationMonth, '-01'), INTERVAL s.MonthsDiff MONTH) AS time_sec, SUM(s.Users) AS value, s.RegistrationMonth AS metric FROM (
And upload the data to Grafana.
Example chart from the demo :
The github repository with an example is a docker image with MySQL and Grafana, which can be run on your computer. The database already has demo data for a year and a half, from January 2018 to July 2019.
If you wish, you can upload your data to this image.
PS Articles about cohort analysis in SQL: