Hello!
In September of this year (2019), the election of the Governor of St. Petersburg was held. All voting data is publicly available on the website of the election commission. We won't break anything: we will simply visualize the information from www.st-petersburg.vybory.izbirkom.ru in the form we need, run a very simple analysis, and identify some "magic" patterns.
I usually use Google Colab for such tasks. It is a service that runs Jupyter notebooks and gives free access to a GPU (NVIDIA Tesla K80), which can speed up the heavier processing steps. Some preparatory work was needed before the imports.
%%time
!apt update
!apt upgrade
!apt install gdal-bin python-gdal python3-gdal
Next, the imports.
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import xlrd
Description of the libraries used
- requests - module for sending HTTP requests to the site
- BeautifulSoup - module for parsing HTML and XML documents; gives direct access to the content of any tag in the HTML
- numpy - a math module with a basic, essential set of numerical functions
- pandas - data analysis library
- matplotlib.pyplot - a set of plotting functions
- geopandas - module for building the election map
- xlrd - module for reading Excel spreadsheet files
The time has come to collect the data itself, so let's parse. The election commission saved us time by publishing its reports as tables, which is convenient.
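As a rough illustration of this step, the sketch below parses one results table with BeautifulSoup and loads it into a pandas DataFrame. The HTML snippet, the column names and the helper `parse_results_table` are made up for the example; the real pages on the commission's website are laid out differently and would first have to be downloaded with `requests.get`.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for a downloaded page; in practice the text would
# come from requests.get(url).text for a page of the commission's website.
SAMPLE_HTML = """
<table>
  <tr><th>Station</th><th>Voters on the list</th><th>Ballots cast</th></tr>
  <tr><td>PEC 1</td><td>2000</td><td>600</td></tr>
  <tr><td>PEC 2</td><td>1500</td><td>450</td></tr>
</table>
"""

def parse_results_table(html: str) -> pd.DataFrame:
    """Turn the first <table> in the page into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    header = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    body = [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in rows[1:]
    ]
    frame = pd.DataFrame(body, columns=header)
    # Everything is parsed as text, so convert the numeric columns explicitly.
    numeric_columns = header[1:]
    frame[numeric_columns] = frame[numeric_columns].apply(pd.to_numeric)
    return frame

stations = parse_results_table(SAMPLE_HTML)
print(stations)
```

Keeping the parsing in a function that takes a string makes it easy to try on a saved page before running it against the live site.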
So, this is what we were talking about. The data is collected fairly quickly in Google Colab, though not blazingly fast.
Before building various graphs and maps, it helps to get an idea of what our “dataset” looks like.
Analysis of the data of the election commission
St. Petersburg has 30 territorial election commissions; as a 31st entry, we add the digital polling stations.
Each territorial commission has several dozen PECs (precinct election commissions).
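To make the structure concrete, here is a small sketch of how such a dataset might look in pandas: one row per PEC, labeled with its territorial commission, with turnout computed per station and per commission. The column names (`tec`, `pec`, `registered`, `ballots`) and all the numbers are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Invented sample rows; the real dataset has one row per PEC.
data = pd.DataFrame({
    "tec": [1, 1, 2, 2],
    "pec": [101, 102, 201, 202],
    "registered": [2000, 1500, 1800, 2200],
    "ballots": [600, 450, 540, 880],
})

# Turnout per station, in percent.
data["turnout_pct"] = 100 * data["ballots"] / data["registered"]

# Turnout per territorial commission: sum first, then divide, so that
# large stations weigh more than small ones.
by_tec = data.groupby("tec")[["registered", "ballots"]].sum()
by_tec["turnout_pct"] = 100 * by_tec["ballots"] / by_tec["registered"]
print(by_tec)
```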
What interests us most is the turnout at each polling station and what dependencies we can observe. I will plot the following:
- dependence of the number of polling stations on the turnout;
- dependence of the percentage of votes for candidates on the turnout;
- dependence of the turnout on the number of voters in the precinct.
It is quite difficult to trace how the election went and draw conclusions from the bare data table, so charts are our way out.
Let's build what we came up with.
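A minimal sketch of the three plots with matplotlib, assuming a DataFrame with per-station columns named `registered`, `turnout_pct` and `leader_pct`. These names, the helper `plot_dependencies` and the random sample data are assumptions made for the example, not the original notebook's code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_dependencies(stations: pd.DataFrame):
    """Draw the three charts side by side and return the figure and axes."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Number of polling stations per turnout bin (a histogram).
    axes[0].hist(stations["turnout_pct"], bins=np.arange(0, 101, 5))
    axes[0].set_xlabel("Turnout, %")
    axes[0].set_ylabel("Number of polling stations")

    # 2. Share of votes for a candidate against turnout.
    axes[1].scatter(stations["turnout_pct"], stations["leader_pct"], s=5)
    axes[1].set_xlabel("Turnout, %")
    axes[1].set_ylabel("Votes for the candidate, %")

    # 3. Turnout against the number of registered voters.
    axes[2].scatter(stations["registered"], stations["turnout_pct"], s=5)
    axes[2].set_xlabel("Registered voters")
    axes[2].set_ylabel("Turnout, %")

    fig.tight_layout()
    return fig, axes

# Random placeholder data, only to make the sketch runnable.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "registered": rng.integers(500, 3000, size=200),
    "turnout_pct": rng.uniform(15, 60, size=200),
    "leader_pct": rng.uniform(40, 80, size=200),
})
fig, axes = plot_dependencies(sample)
```

With several candidates, the second panel would get one `scatter` call per candidate, each in its own color.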
Dependence of turnout and number of polling stations
Dependence of the percentage of votes for candidates on the turnout
- “Green” - votes for Amosov
Dependence of turnout on the number of voters in the precinct
The plots came out quite tolerably, but in the course of the work it turned out that the average turnout is about 400 people with 50 to 70 percent for Beglov, while there are two polling stations with a turnout of more than 1200 people and a percentage of 90 ± 0.2. It is interesting what happened at these stations. Did some fantastic campaigners work there? Or were 10 busloads of people simply driven in and made to vote? One way or another, we are intrigued: a small investigation is taking shape. But we still have maps to draw. Let's continue.
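Stations like these can be picked out of the table with a simple filter. The sketch below assumes per-station columns named `voted` and `beglov_pct`; both the names and the rows are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration; "voted" is the number of people who came
# to the station, "beglov_pct" the share of votes for Beglov in percent.
stations = pd.DataFrame({
    "pec": [101, 102, 103, 104],
    "voted": [380, 1250, 420, 1310],
    "beglov_pct": [55.0, 90.1, 68.0, 89.9],
})

# Stations with more than 1200 voters and a share within 90 +/- 0.2 percent.
suspicious = stations[
    (stations["voted"] > 1200)
    & (stations["beglov_pct"].sub(90).abs() <= 0.2)
]
print(suspicious)
```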
Visual representation and work with geopandas
We colored the administrative districts of the city and labeled them. It looks familiar, it looks like St. Petersburg, although the Neva is still missing.
Number of voters
Turnout
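Maps of this kind can be sketched with geopandas roughly as follows. The two rectangular "districts", their names and the turnout values are made up; real district boundaries would be loaded from a file with `gpd.read_file`.

```python
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Two made-up rectangular "districts" standing in for real boundaries.
districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Made-up turnout per district, joined to the shapes by name.
turnout = pd.DataFrame({"district": ["A", "B"], "turnout_pct": [28.5, 33.0]})
districts = districts.merge(turnout, on="district")

# Color each district by turnout and label it at a point inside the shape.
fig, ax = plt.subplots(figsize=(6, 4))
districts.plot(column="turnout_pct", cmap="OrRd", legend=True, ax=ax)
labels = zip(districts["district"], districts.geometry.representative_point())
for name, point in labels:
    ax.annotate(name, xy=(point.x, point.y), ha="center")
ax.set_axis_off()
```

Swapping `column="turnout_pct"` for a column with the number of voters gives the second map from the same shapes.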
Conclusion
You can have fun with the data for a long time, use it in different fields and, of course, get some benefit from it; that is what data is for. Geodata visualization tools, both simple and sophisticated, can do great things. Write about your results in the comments!