Count Scoring de la Fer or a study on credit scoring as part of broadening one's horizons. Part 3

Part three, in which Athos precipitated, and the Count de la Fer makes wise with algorithms.

UPD Part One is here
UPD part two here

Introduction from the authors:

Good afternoon! Today we continue the series of articles devoted to scoring and the use of graph theory in it. You can get acquainted with the first and second articles here and here, respectively. We strongly recommend, otherwise, this article may seem like a meaningless experiment with algorithms.

All comic allegories, inserts, etc., are designed to slightly relieve the narrative and not allow it to fall into a boring lecture. We apologize to everyone who doesn’t get into our humor

The purpose of this article: in no more than 30 minutes, to describe the graph construction algorithm and calculate the scoring score for NPO “One for All”.

Terms and Definitions:

Depth-first search (DFS) algorithm - The strategy for deep search, as the name implies, is to go “deep” into the graph as far as possible. The search algorithm is described recursively: we sort through all the edges coming from the vertex in question. If the edge leads to a vertex that was not previously considered, then we run the algorithm from this unexamined vertex, and after that we return and continue to sort through the edges. The return occurs if there are no edges in the vertex under consideration that lead to the unexamined vertex. If, after the completion of the algorithm, not all vertices were considered, then it is necessary to run the algorithm from one of the unexamined vertices
Recursion - calling a function (procedure) from it itself, directly (simple recursion) or through other functions (complex or indirect recursion)

In our first article ( here ), we determined the graph model with which we will experiment, and gave it a brief description.

In our second article ( here ) we described the basic data structures that can be used to store the graph and described in more detail the graph model and how to work with it.

It's time to pull the sword out of its scabbard and start stabbing everyone, more precisely, create an algorithm for calculating the scoring score for NPO “One for All”.

Let's go!

The algorithm will be executed in Python . As the saying goes, pull the snake on the sword!

The graph will be stored in the dictionary, as follows:

graph = { 'Odin_za_vseh' : {'goody': 10, 'rank': 4, 'fund':4, 'relations':{'Bonasye':3, 'Trevi': 1, 'Gusak':2,'Suchka':5, 'Roshfor':1, 'Cardi':1}}, 'Cardi' : {'goody': -8, 'rank': 9, 'fund':8, 'relations':{'Korol':2,'Odin_za_vseh':3,'Gvardia':5, 'Roshfor':5 }}, 'Roshfor' : {'goody': -7, 'rank': 7, 'fund':5,'relations':{'Cardi':1, 'Suchka': 3,'Odin_za_vseh':2, 'Bonasye':5 }}, 'Suchka' : {'goody': -10, 'rank': 4, 'fund':4, 'relations':{'Odin_za_vseh':5,'Roshfor':3 }}, 'Gvardia' : {'goody': -5, 'rank': 3, 'fund':4, 'relations':{'Cardi':1,'Gusak':1 }}, 'Gusak' : {'goody': -7, 'rank': 4, 'fund':4, 'relations':{'Gvardia':5,'Odin_za_vseh':2 }}, 'Trevi' : {'goody': 7, 'rank': 5, 'fund':5, 'relations':{'Odin_za_vseh':4,'Mushketery':5 }}, 'Bonasye': {'goody': -2, 'rank': 2, 'fund':3, 'relations':{'Odin_za_vseh':1,'Roshfor':1,'Konsta':2 }}, 'Konsta' : {'goody': 10, 'rank': 5, 'fund':3, 'relations':{'Koroleva':2,'Bonasye':1 }}, 'Mushketery' : {'goody': 7, 'rank': 5, 'fund':4, 'relations':{'Korol':1, 'Trevi':1}}, 'Koroleva' : {'goody': 6, 'rank': 8, 'fund':7, 'relations':{'Korol':2,'Konsta':5, 'EnglishMan':3 }}, 'Korol' : {'goody': 5, 'rank': 10, 'fund':10, 'relations':{'Cardi':3, 'Koroleva':5,'Mushketery':5 }}, 'EnglishMan' : {'goody': 5, 'rank': 8, 'fund':10, 'relations':{'Koroleva':2}} }

'goody' - node rating - good or bad and the degree of "priority" of the node. For example, the Cardinal is a negative hero of the film, so his rating is negative. The Cardinal is one of the key enemies of our heroes, but not the most important (we decided so)
'rank' - score of the node in the ranking table
'fund' - node viability
'relations' - with whom is connected and how much can influence the node connected with it

So far, everything is simple, but an attentive reader could already notice some of the features, yes, content questions arise already. Let's try to predict and answer them. The right of the first injection is ours!

I think it’s not worth explaining what we used for the name of the key of the first level dictionary. You already perfectly guessed, although some names have undergone changes due to our perception of the work.
Where did these parameters come from ('goody', 'rank', etc.), why are they and where did their rating come from?
In order:
- In real life, these parameters will characterize a particular node-company. For everyone who will conduct the assessment and depending on the type of this assessment, these parameters will be different. For example, such parameters for evaluating the scoring of a borrower may be - turnover, amount of payables / receivables, writ of execution, etc.
- Why exactly they - the author sees it this way))) these parameters reflect the essence of what the Count describes, in our opinion
- Assessment is, as is now customary to say, expert. In real life, this estimate will be obtained through mining. We will write more about this in the next article.
Why do some nodes have different reciprocal links (for example, Odin_za_vseh - Bonasye = 3, and Bonasye - Odin_za_vseh = 1)? This is because Odin_za_vseh has a much greater effect on Bonasye than vice versa. And this is important to understand. When scoring Odin_za_vseh, we will need to consider the effect of Bonasye on Odin_za_vseh.
What is a bond score and how is it calculated?
- Communication assessment is a measure of the influence of one node on another
- Each connection in our Graph is two-way, but at the same time, the assessment of communication in different directions may vary
- Communication score is a value from 1 to 5 (from 1 to 5 just for example). Negative connection can not be, because a node is either connected to another, and we evaluate how much it affects this node, or is not connected. In this case, of course, the connection with an unreliable node will ultimately be negative for the node we are evaluating, because this unreliable node will have a negative internal rating.
- An internal evaluation of a node is an aggregated value that consists of the internal parameters of a node. In our example, this is 'goody', 'rank', 'fund'. Internal assessment may be negative.
How will a scoring ball be considered?
- Each company that will use this approach can lay down its scoring score calculation based on its requirements and its business experience.
- In our case, we chose a simple and intricate way:
  - Scoring score = Internal score of the node we are evaluating + Sum (Internal assessment of a level 1 child node evaluating the impact of this node on the node being evaluated) + (Sum (Internal rating of a level 2 child node assessing the effect of this node on a level 1 parent node) / 2)
  - If you get a negative scoring score, then we do not give a loan
- The algorithm presented below is basic and can be easily modified to fit any scoring score calculation technique.
- All the difficulty will be in the “right” mining, but it will be described in the next article

And here is the algorithm itself:

 searched=[] #   def search_outer(name,n): return graph[name]['goody']*graph[name]['rank']*graph[name]['fund']+search(name,n) # search_outer()    ,       #,       search() #name –   ,     #n –   –       def search(name, n, z=1): ball=0 #,      global searched #    if z <= n and name not in searched: #        searched.append(name) for x in graph[name]['relations']: #   ball += graph[x]['goody'] * graph[x]['rank'] * graph[x]['fund'] * graph[x]['relations'][name] / z + search(x,n,z+1) #      #   ,     return ball #, ,     def main(): ball = search_outer ('Odin_za_vseh', 2) print(ball) if __name__ == '__main__': main()

To summarize:

I am sure that reading the article did not take more than 30 minutes
We calculated the scoring score for NPO One for All. We don’t give credit, NPO “One for All” turned out to be those adventurers with a bunch of enemies. We can give credit to state employees, for example, 'Mushketery'
The algorithm turned out to be simple and quite effective.
Computational complexity is O (N ** d + 1), where d is the number of levels. Visually, the algorithm is shown in the figure below.

Thanks to sobolevn and his article

You can go to the most interesting and difficult - data mining!

PS Regarding links to other articles with theory (in the last article we made a remark):

Source: https://habr.com/ru/post/465927/

All Articles

Count Scoring de la Fer or a study on credit scoring as part of broadening one's horizons. Part 3

More articles: