Why
With complex advertising campaign structures and a large number of calls, additional tools for storing, processing and analyzing information about incoming calls become necessary. Often you need quick access to data over a long period; sometimes you need more complex processing, such as attributing calls to a specific channel or campaign.
One way to speed up this work, which also brings additional benefits, is to import calls from CoMagic into Google BigQuery. Much has been written about the benefits of BigQuery, so let's get straight to building the import.
To create an automatic import you will need:
- a Google account (if you don't have one yet) with a project created
- Python knowledge
- familiarity with the Google Cloud documentation
How to create a project is described here. After the project is created, you need to create a dataset in BigQuery (see the BQ documentation and the instructions for creating a dataset).
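If it is more convenient to do this from code rather than from the console, a minimal sketch with the official Python client might look like this (the project and dataset names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client(project='my-project')           # placeholder project id
dataset = bigquery.Dataset('my-project.comagic_calls')   # placeholder dataset name
dataset.location = 'US'
client.create_dataset(dataset, exists_ok=True)           # does nothing if it already exists
```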
Retrieving data from CoMagic
We turn to the CoMagic documentation. To get the list of calls, we need the reports section.
We create a simple class for working with the CoMagic API. All the necessary requirements are listed at the end, in the link to GitHub.
```python
import json
import requests
import random
import pandas as pd


class ComagicClient:
    """Client for working with the CoMagic Data API."""

    def __init__(self, login, password):
        """Authorize in CoMagic and obtain an access token."""
        self.login = login
        self.password = password
        self.base_url = 'https://dataapi.comagic.ru/v2.0'
        self.payload_ = {"jsonrpc": "2.0", "id": 1, "method": None, "params": None}
        self.token = self.get_token(self.login, self.password)

    def base_request(self, method, params):
        """Base request to CoMagic. The API accepts a JSON-like payload.
        See: https://www.comagic.ru/support/api/data-api/"""
        id_ = random.randrange(10**7)
```
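The rest of the class (obtaining the token and sending requests) is in the repository linked at the end. Purely as an illustration of how a JSON-RPC 2.0 call to this API could be sent with `requests`, here is a rough sketch; it mirrors the payload template above but is not the author's implementation:

```python
import random
import requests


def jsonrpc_call(base_url, method, params):
    """Sketch of a JSON-RPC 2.0 request following the payload template above."""
    payload = {
        "jsonrpc": "2.0",
        "id": random.randrange(10**7),
        "method": method,
        "params": params,
    }
    response = requests.post(base_url, json=payload)
    response.raise_for_status()
    return response.json()

# Illustrative call: 'get.calls_report' is the method used later in the article;
# the exact params (including where the access token goes) are in the CoMagic docs.
# result = jsonrpc_call('https://dataapi.comagic.ru/v2.0', 'get.calls_report', params)
```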
Now you need to decide what data is needed. The data has to be processed and converted into a form suitable for loading into BigQuery.
We create an auxiliary class and describe the data received from CoMagic.
```python
class ComagicHandler(ComagicClient):
    """Class for handling the data received from CoMagic."""

    time_partition_field = 'PARTITION_DATE'

    def __init__(self, login, password, first_call_date):
        self.day_before_first_call = pd.to_datetime(first_call_date) - pd.Timedelta(days=1)
        super().__init__(login, password)

    def get_calls_report(self, date_from, date_till):
        """Request the calls report for the period and return it as a
        Pandas DataFrame. Used by the Connector class.

        Returns: pandas.DataFrame"""
        method = "get.calls_report"
        fields = ['id', 'visitor_id', 'person_id', 'start_time', 'finish_reason',
                  'is_lost', 'tags', 'campaign_name', 'communication_number',
                  'contact_phone_number', 'talk_duration', 'clean_talk_duration',
                  'virtual_phone_number', 'ua_client_id', 'ym_client_id',
                  'entrance_page', 'gclid', 'yclid', 'visitor_type', 'visits_count',
                  'visitor_first_campaign_name', 'visitor_device', 'site_domain_name',
                  'utm_source', 'utm_medium', 'utm_campaign', 'utm_content',
                  'eq_utm_source', 'eq_utm_medium', 'eq_utm_campaign', 'attributes']
```
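Assuming the full class from the repository, using it might look like this (the credentials and dates below are placeholders):

```python
# Illustrative usage only; login, password and dates are placeholders.
handler = ComagicHandler('my_login', 'my_password', first_call_date='2020-01-01')
calls_df = handler.get_calls_report('2020-05-01', '2020-05-31')
print(calls_df.shape)
```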
Sending data to BigQuery
After the data from CoMagic is received and converted, you need to send it to BigQuery.
```python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
import pandas as pd


class BQTableHanler:
    """Class for working with a BigQuery table."""

    time_partition_field = 'PARTITION_DATE'

    def __init__(self, full_table_id, service_account_file_key_path=None):
        """`full_table_id` is expected in the format `myproject.mydataset.mytable`.
        If no service account key file is given, Application Default Credentials
        are used."""
        self.full_table_id = full_table_id
        project_id, dataset_id, table_id = full_table_id.split(".")
        self.project_id = project_id
        self.dataset_id = dataset_id
        self.table_id = table_id
        if service_account_file_key_path:
```
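The rest of the class (creating the table and writing the DataFrame into it) is in the repository. As a minimal standalone sketch, appending a DataFrame to a table partitioned by `PARTITION_DATE` with the official client could look like this (not the exact code of the class above):

```python
from google.cloud import bigquery
import pandas as pd


def load_df_to_partitioned_table(df: pd.DataFrame, full_table_id: str):
    """Sketch: append a DataFrame to a BigQuery table partitioned by PARTITION_DATE."""
    client = bigquery.Client()  # DataFrame loads require pyarrow to be installed
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        time_partitioning=bigquery.TimePartitioning(field='PARTITION_DATE'),
    )
    job = client.load_table_from_dataframe(df, full_table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
```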
Determine the logic for updating data
Since CoMagic limits the number of data rows returned per request, we have to limit the amount of requested data by restricting the request period. To do this, we need an auxiliary function that splits a long period of time into segments of a given length.
```python
def interval_split(array, interval):
    """Split an array into intervals of the given length and return the
    boundaries (first and last element) of each interval.

    Examples:
        interval_split([1,2,3,4,5,6,7], 3) => [[1,3], [4,6], [7]]
        interval_split([1,2,3], 4) => [[1,3]]"""
    intervals = []
    iw, i = 0, 0
    l = len(array)
    for v in array:
        if i == 0 or i % interval == 0:
            intervals.append([v])
        if (i + 1) % interval == 0 or (i + 1) == l:
            intervals[iw].append(v)
            iw += 1
        i += 1
    return intervals
```
This is needed when loading data for the first time, when you have to download data for a long period: the period is split into several smaller ones. Incidentally, it is better to do the initial load without Cloud Functions, since they have an execution time limit; alternatively, you can run the function many times.
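For example, splitting a date range into chunks of at most 30 days with the helper above might look like this (the dates are illustrative):

```python
import pandas as pd
from helpfunctions import interval_split

dates = list(pd.date_range('2020-01-01', '2020-04-15').strftime('%Y-%m-%d'))
for chunk in interval_split(dates, 30):
    date_from, date_till = chunk[0], chunk[-1]
    print(date_from, date_till)  # each pair is then requested from CoMagic separately
```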
We create a connector class to link the BigQuery table where we want to store the data and the data from CoMagic.
```python
from helpfunctions import interval_split
import pandas as pd


class Connector:
    """Links the BigQuery table with the data received from CoMagic."""

    time_partition_field = 'PARTITION_DATE'
```
Next, we write the main function for updating the data, which will be launched on a schedule.
```python
from connector import Connector
from bqhandler import BQTableHanler
from comagichandler import ComagicHandler
from credfile import *


def main(event, context):
    """Entry point of the Cloud Function. For the `event` and `context`
    arguments see:
    https://cloud.google.com/functions/docs/writing/background#functions-writing-background-hello-pubsub-python"""
```
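The body of `main` is in the repository. As a rough sketch of what such a function might do (the `Connector` constructor, the `update` call and the credfile variable names below are assumptions, not the repository's actual API), a Pub/Sub-triggered background function receives the scheduler payload base64-encoded in `event['data']`:

```python
import base64
import json

from connector import Connector
from bqhandler import BQTableHanler
from comagichandler import ComagicHandler
from credfile import *


def main(event, context):
    # The payload configured in Cloud Scheduler arrives base64-encoded in event['data'].
    payload = {}
    if event and 'data' in event:
        payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))

    comagic_handler = ComagicHandler(COMAGIC_LOGIN, COMAGIC_PASSWORD, FIRST_CALL_DATE)
    bq_handler = BQTableHanler(FULL_TABLE_ID)
    connector = Connector(comagic_handler, bq_handler)  # constructor signature assumed

    # Hypothetical call: update the table for the period passed in the payload.
    connector.update(date_from=payload.get('date_from'),
                     date_till=payload.get('date_till'))
```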
Configure Google Cloud Platform
We collect all the files into a ZIP archive. In the credfile.py file we enter the CoMagic login and password used to obtain the token, as well as the full name of the BigQuery table and the path to the service account key file if the script is run from a local machine.
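The exact variable names in credfile.py are not shown in the article; a plausible layout (all names here are assumptions) could be:

```python
# credfile.py — assumed variable names, adjust to match the repository
COMAGIC_LOGIN = 'your_comagic_login'
COMAGIC_PASSWORD = 'your_comagic_password'
FIRST_CALL_DATE = '2020-01-01'                           # date of the first call to load
FULL_TABLE_ID = 'myproject.mydataset.calls'              # BigQuery table full name
SERVICE_ACCOUNT_FILE = '/path/to/service_account.json'   # only needed for local runs
```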
Create a Cloud Function
- Go to the console
- If no function has been created yet, click “Create function”
- In the trigger field, select Pub/Sub
- Create a new Pub/Sub topic, for example 'update_calls'
- Source code: ZIP upload (local ZIP file)
- Runtime: Python 3.7
- Upload the ZIP file
- Select a temporary Cloud Storage bucket (stage bucket)
- In the `Function to execute` field, enter 'main'
- Memory allocated: at your discretion
Configuring Scheduler and Pub/Sub
In the previous step we created the `update_calls` trigger; a topic with the same name appeared automatically in the list of Pub/Sub topics.
Now, using Cloud Scheduler, you need to configure when the trigger will fire and the Cloud Function will start.
- Go to the console
- In the frequency field, set in CRON format when the trigger should fire and the function should run, e.g. `0 1 * * *` for every day at 01:00
- Target: Pub/Sub
- Topic: specify the topic that was set when creating the function: 'update_calls'
- Payload: information that will be passed to Pub/Sub and on to the main function (a sketch of publishing such a message manually is shown after this list)
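To check the whole chain without waiting for the schedule, you can publish a test message to the topic yourself; a minimal sketch with the Pub/Sub client (the project id and payload keys are illustrative):

```python
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'update_calls')  # placeholder project id
payload = json.dumps({'date_from': '2020-05-01', 'date_till': '2020-05-02'}).encode('utf-8')
future = publisher.publish(topic_path, payload)
print(future.result())  # message id
```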
Now the script will run daily at 01:00 and the call data will be updated as of the end of the previous day.
Link to GitHub for running from a local machine: GitHub
Link to the ZIP file