Despite the many excellent data science materials out there, such as those from Open Data Science, I stubbornly keep gathering scraps from the feast of reason and sharing my experience of learning machine learning and data analysis from scratch.
In the previous articles we looked at a couple of classification problems, extracting our own data with sweat and blood along the way. Now it is regression's turn. Nothing lighting-related was at hand this time, so I had to scrape the bottom of other barrels.
I remember campaigning in one of the articles for readers to look toward domestic open data. But since I am not a young lady from an advertisement for digestive kefir or horse shampoo, my conscience would not let me recommend something without trying it myself.
Where to start? With the Russian government's open data portal, of course; there is a whole ministry behind it.
My acquaintance with the Russian government's open data went roughly like the illustration to this article. Not that I have no interest at all in the registries of Novy Urengoy or in lists of rental equipment at ice rinks, but none of that suits a regression task.
With some digging I think something worthwhile can be found on the Russian federal government's open data site, but it is not that easy.
I also decided to leave the Ministry of Finance data for later.
What I liked most was probably the Moscow government's open data: I looked through several candidate problems there and finally settled on the registration of acts of civil status in Moscow, broken down by year and month.
What came of applying minimal linear regression skills to it can be seen briefly on GitHub and, of course, by looking under the cut.

UPD: added a Bonus section.
Introduction
As usual, I start the article with the skills you need to follow it.
You will need:
- To have read a tutorial or taken a simple machine learning course
- A little understanding of Python
- Almost no knowledge of mathematics
If the field of data analysis and machine learning is completely new to you, read the earlier articles of the series in order. Each one was written hot on the heels of what I had just learned, so they will help you decide whether data science is worth your time.
All previously prepared articles are under the spoiler below
Other articles in the series
1. Learning the basics:
2. Practicing the first skills
As promised earlier, articles in this series now come with a table of contents.
Contents:
- Part I: "Marrying is not like putting on a bast shoe" - obtaining the data and primary analysis
- Part II: "One man in the field is no warrior" - regression on a single feature
- Part III: "One head is good, but many are better" - regression on several features with regularization
- Part IV: "All that glitters is not gold" - adding features
- Part V: "Cut a new caftan, but measure it against the old one!" - trend prediction
- Bonus - improving accuracy through a different approach to the month
Now to the task itself. Our goal is to dig up a dataset sufficient to demonstrate the basic techniques of linear regression, and to decide for ourselves what we will predict.
This time I will try to be brief and stick to linear regression only (you probably already know that other methods exist).
Part I: "Marrying is not like putting on a bast shoe" - obtaining the data and primary analysis
Unfortunately, the Moscow government's open data is not as vast and boundless as the budget spent on the "My Street" improvement program, but something worthwhile turned up nonetheless.
The dynamics of civil status act registrations suit us quite well.
That is almost a hundred records, broken down by month, with counts of weddings, births, paternity establishments, deaths, name changes and so on.
This is quite suitable for a regression problem.
All the code is posted on GitHub.
We will go through it piece by piece right now.
First, import the libraries.
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn import linear_model
import warnings

warnings.filterwarnings('ignore')
# Jupyter magic: render plots inside the notebook
%matplotlib inline
```
Then load the data. The Pandas library can download a file straight from a remote server, which is wonderful in general, provided that the page redirection on the portal does not change.
(I hope the download link in the code keeps working; if it stops, please send me a PM so that I can update it.)
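The loading call itself is not shown at this point in the text. The bonus section at the end reads the file with the call below, so presumably the same one was used here. A minimal sketch, wrapped in a hypothetical helper `load_registrations` (not part of pandas) so that the source can be swapped for a local copy if the portal link changes:

```python
import pandas as pd

# URL as it appears in the bonus section of this article; it may have changed.
DATA_URL = 'https://op.mos.ru/EHDWSREST/catalog/export/get?id=230308'


def load_registrations(source=DATA_URL, compression='zip'):
    """Read the semicolon-separated, cp1251-encoded export into a DataFrame.

    `load_registrations` is a hypothetical helper name used for illustration.
    Pass a local path and compression=None to read an unzipped copy.
    """
    return pd.read_csv(source, compression=compression, header=0,
                       encoding='cp1251', sep=';', quotechar='"')
```

Calling `df = load_registrations()` would then fetch the zipped CSV from the portal, assuming the link is still alive.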
Let's look at the data:
ID | global_id | Year | Month | StateRegistrationOfBirth | StateRegistrationOfDeath | StateRegistrationOfMarriage | StateRegistrationOfDivorce | StateRegistrationOfPaternityExamination | StateRegistrationOfAdoption | StateRegistrationOfNameChange | TotalNumber |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 37591658 | 2010 | January | 9206 | 10430 | 4997 | 3302 | 1241 | 95 | 491 | 29762 |
2 | 37591659 | 2010 | February | 9060 | 9573 | 4873 | 2937 | 1326 | 97 | 639 | 28505 |
3 | 37591660 | 2010 | March | 10934 | 10528 | 3642 | 4361 | 1644 | 147 | 717 | 31973 |
4 | 37591661 | 2010 | April | 10140 | 9501 | 9698 | 3943 | 1530 | 128 | 642 | 35572 |
5 | 37591662 | 2010 | May | 9457 | 9482 | 3726 | 3554 | 1397 | 96 | 492 | 28204 |
6 | 62353812 | 2010 | June | 11253 | 9529 | 9148 | 3666 | 1570 | 130 | 556 | 35852 |
7 | 62353813 | 2010 | July | 11477 | 14340 | 12473 | 3675 | 1568 | 123 | 564 | 44220 |
8 | 62353814 | 2010 | August | 10302 | 15016 | 10882 | 3496 | 1512 | 134 | 578 | 41920 |
9 | 62353816 | 2010 | September | 10140 | 9573 | 10736 | 3738 | 1480 | 101 | 686 | 36454 |
10 | 62353817 | 2010 | October | 10776 | 9350 | 8862 | 3899 | 1504 | 89 | 687 | 35167 |
11 | 62353818 | 2010 | November | 10293 | 9091 | 6080 | 3923 | 1355 | 97 | 568 | 31407 |
12 | 62353819 | 2010 | December | 10600 | 9664 | 6023 | 4145 | 1556 | 124 | 681 | 32793 |
To use the month data, it has to be converted into a form the model understands. scikit-learn has its own methods for this, but for reliability (and since there is not much work) we will do it by hand, and at the same time drop a couple of useless columns with IDs and the like.
Note: it would be more correct to apply one-hot encoding to the Month column here, but since we do not care much about prediction quality in this case, we leave it as is.
UPD: I could not resist. Improved variants are added in the Bonus section.
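The manual conversion is not shown in the text, so here is a minimal sketch of what it might look like. The English month names, the helper name `encode_months` and the choice of `ID` and `global_id` as the columns to drop are assumptions for illustration; the real export may spell the months differently, in which case only the dictionary keys change.

```python
import pandas as pd

# Assumption: month names appear as English strings, as in the table above.
MONTH_TO_NUMBER = {
    'January': 1, 'February': 2, 'March': 3, 'April': 4,
    'May': 5, 'June': 6, 'July': 7, 'August': 8,
    'September': 9, 'October': 10, 'November': 11, 'December': 12,
}


def encode_months(frame):
    """Return a copy with Month as an integer and identifier columns removed."""
    out = frame.copy()
    out['Month'] = out['Month'].map(MONTH_TO_NUMBER)
    # errors='ignore' keeps the sketch usable if a column is already absent.
    return out.drop(columns=['ID', 'global_id'], errors='ignore')


sample = pd.DataFrame({
    'ID': [1, 2],
    'global_id': [37591658, 37591659],
    'Year': [2010, 2010],
    'Month': ['January', 'February'],
    'StateRegistrationOfMarriage': [4997, 4873],
})
encoded = encode_months(sample)
print(encoded)
```

Encoding months as 1..12 treats December as "twelve times January", which is exactly the weakness the note above admits to.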
I am not sure the tabular view renders for every reader, so let's also look at the data as a picture.

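Let's build pair plots, which show which columns of the table depend linearly on each other. We will not consider all the data at once, though, so that there is something left to add later; hence we first drop part of the data.
A simple way to select ("drop") some of the columns of a pandas DataFrame is to pick just the columns you need.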
```python
columns_to_show = ['StateRegistrationOfBirth',
                   'StateRegistrationOfMarriage',
                   'StateRegistrationOfPaternityExamination',
                   'StateRegistrationOfDivorce',
                   'StateRegistrationOfDeath']
data = df[columns_to_show]
```
Now we can build the plot.
```python
grid = sns.pairplot(data)
```

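It is good practice to scale the features, so as not to compare horses with the weight of a haystack and the average atmospheric pressure over a month.
In our case all the data is expressed in the same units (the number of registered acts), but let's see what scaling changes.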
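The scaling code is not shown in the text. Since `MinMaxScaler` is imported at the top, and the later snippets refer to `df2` and `data2`, those names presumably hold the scaled frame. A minimal sketch under that assumption, with a hypothetical helper `scale_frame` and three rows of values taken from the table above:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def scale_frame(frame):
    """Rescale every column to the [0, 1] range, keeping names and index."""
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(frame)
    return pd.DataFrame(scaled, columns=frame.columns, index=frame.index)


toy = pd.DataFrame({
    'StateRegistrationOfBirth': [9206, 9060, 10934],
    'StateRegistrationOfMarriage': [4997, 4873, 3642],
})
toy_scaled = scale_frame(toy)
print(toy_scaled)
```

Min-max scaling is a linear transformation of each column, so the shape of every scatter plot in the pair plot stays the same; only the axis ranges change.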

Almost nothing, but to be safe we will use the scaled data from here on.
Part II: "One man in the field is no warrior" - regression on a single feature
Looking at the plots, the first thing to try is describing the relationship between two features, StateRegistrationOfBirth and StateRegistrationOfPaternityExamination, with a straight line. The relationship itself is hardly surprising: paternity establishments and birth registrations rise together.
Let's prepare the data: pick two columns, the feature and the target, and use a ready-made library function to split the data into training and test samples (the last manipulations in the code are needed to bring the data into the required shape).
```python
# One feature and one target, both taken from the scaled frame
X = data2['StateRegistrationOfBirth'].values
y = data2['StateRegistrationOfPaternityExamination'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# scikit-learn expects 2-D arrays of shape (n_samples, 1)
X_train = np.reshape(X_train, [X_train.shape[0], 1])
y_train = np.reshape(y_train, [y_train.shape[0], 1])
X_test = np.reshape(X_test, [X_test.shape[0], 1])
y_test = np.reshape(y_test, [y_test.shape[0], 1])
```
Note that, although the data could obviously be tied to a time series, for demonstration purposes we treat it simply as a set of records with no reference to time.
We "feed" the data to the model, look at the coefficient of the feature, and estimate how well the model fits using R^2 (the coefficient of determination).
The result is not great, but on the other hand it is much better than guessing blindly.
Coefficients: [[ 0.78600258]]
Score: 0.611493944197
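To make the "Score" line less of a black box, here is a small sketch of what `LinearRegression.score` reports: R^2 = 1 - SS_res / SS_tot. The data is synthetic (an assumption, not the registry data), so the numbers differ from those above; the point is only that the hand-computed value agrees with the library's.

```python
import numpy as np
from sklearn import linear_model

# Synthetic single-feature data standing in for the scaled registry columns.
rng = np.random.RandomState(42)
X = rng.uniform(0, 1, size=(80, 1))
y = 0.8 * X + rng.normal(0, 0.1, size=(80, 1))

lr = linear_model.LinearRegression()
lr.fit(X, y)

# R^2 by hand: one minus residual sum of squares over total sum of squares.
pred = lr.predict(X)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print('Coefficients:', lr.coef_)
print('Score (library):', lr.score(X, y))
print('Score (by hand):', r2_manual)
```

A score of 0 corresponds to always predicting the mean of the target, and 1 to a perfect fit, which is why 0.61 reads as "not great, but far better than guessing".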
Let's look at the plots, first on the training data:
```python
plt.scatter(X_train, y_train, color='black')
plt.plot(X_train, lr.predict(X_train), color='blue', linewidth=3)
plt.xlabel('StateRegistrationOfBirth')
plt.ylabel('StateRegistrationOfPaternityExamination')
# plt.title is a function; assigning a string to it would silently overwrite it
plt.title('Regression on train data')
```

And now on the test sample:
```python
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, lr.predict(X_test), color='green', linewidth=3)
plt.xlabel('StateRegistrationOfBirth')
plt.ylabel('StateRegistrationOfPaternityExamination')
# plt.title is a function; assigning a string to it would silently overwrite it
plt.title('Regression on test data')
```

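Part III: "One head is good, but many are better" - regression on several features with regularization
To make things more interesting, let's pick another target, one that does not depend on the features in such an obviously linear way.
To match the title of the article, the target will be marriage registrations.
All the remaining columns from the pair plot set become the features.
First, we simply train a linear regression model.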
```python
lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)
print('Coefficients:', lr.coef_)
print('Score:', lr.score(X_test, y_test))
```
The result is somewhat worse than in the previous case (which is not surprising).
Coefficients: [[-0.03475475 0.97143632 -0.44298685 -0.18245718]]
Score: 0.38137432065
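The data preparation for this multi-feature fit is not shown in the text. The sketch below is one way it might look: `columns_to_show2` is a name used later in the article, but its contents here (the pair plot columns minus the target, matching the four coefficients printed above) are an assumption, as are the helper `make_split` and the random stand-in frame used in place of the scaled `df2`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed contents: the pair plot columns without the new target.
columns_to_show2 = ['StateRegistrationOfBirth',
                    'StateRegistrationOfPaternityExamination',
                    'StateRegistrationOfDivorce',
                    'StateRegistrationOfDeath']
TARGET = 'StateRegistrationOfMarriage'


def make_split(frame, feature_columns, target=TARGET,
               test_size=0.25, random_state=42):
    """Split a frame into train/test feature matrices and column targets."""
    X = frame[feature_columns].values
    y = frame[[target]].values  # double brackets keep the shape (n, 1)
    return train_test_split(X, y, test_size=test_size,
                            random_state=random_state)


# Random stand-in for the scaled frame df2 used in the article.
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(20, 5), columns=columns_to_show2 + [TARGET])
X_train, X_test, y_train, y_test = make_split(demo, columns_to_show2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Selecting the target with double brackets yields a two-dimensional array directly, which avoids the separate `np.reshape` calls used in Part II.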
To combat overfitting and/or to select features, regularization is usually used together with linear regression. In this article we look at Lasso (L1 regularization).
The higher the regularization coefficient alpha, the more aggressively the model penalizes large coefficients, up to driving some regression coefficients to zero.
Note that I am doing something bad here: I tune the regularization coefficient on the test data. You should not do this in practice, but it will do for a demonstration.
Let's look at the output.
Alpha: 0.01
Coefficients: [ 0. 0.46642996 -0. -0. ]
Score: 0.222071102783
Alpha: 1e-09
Coefficients: [-0.03475462 0.97143616 -0.44298679 -0.18245715]
Score: 0.38137433837
Alpha: 0.00025
Coefficients: [-0.00387233 0.92989507 -0.42590052 -0.17411615]
Score: 0.385551648602
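The code that produced this output is not shown, so here is a minimal sketch of such a sweep over alpha. The helper name `lasso_sweep` and the synthetic data are assumptions for illustration; like the article, it scores on the test split, which is fine for a demo but not for real model selection.

```python
import numpy as np
from sklearn import linear_model


def lasso_sweep(X_train, y_train, X_test, y_test, alphas):
    """Fit one Lasso model per alpha; return (alpha, coefficients, test R^2)."""
    results = []
    for alpha in alphas:
        model = linear_model.Lasso(alpha=alpha)
        model.fit(X_train, y_train)
        results.append((alpha, model.coef_.copy(),
                        model.score(X_test, y_test)))
    return results


# Synthetic stand-in for the scaled data: only the second feature matters.
rng = np.random.RandomState(1)
X = rng.rand(100, 4)
y = 2.0 * X[:, 1] + 0.05 * rng.randn(100)

# 0.00025 and 0.01 appear in the article; 10.0 is added to show the extreme.
results = lasso_sweep(X[:75], y[:75], X[75:], y[75:], [0.00025, 0.01, 10.0])
for alpha, coef, score in results:
    print('Alpha:', alpha)
    print('Coefficients:', coef)
    print('Score:', score)
```

With a large enough alpha every coefficient is driven to zero and the model degenerates to predicting the mean, which is the same mechanism that zeroed three of the four coefficients at alpha 0.01 above.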
In this case the regularized model does not improve quality much, so let's try adding features.
Part IV: "All that glitters is not gold" - adding features
Let's add an obviously useless feature, the total number of registrations. Why obviously? You will see for yourself in a moment.
```python
columns_to_show3 = columns_to_show2.copy()
columns_to_show3.append("TotalNumber")
X = df2[columns_to_show3].values
```
First, let's look at the result without regularization.
```python
lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)
print('Coefficients:', lr.coef_)
print('Score:', lr.score(X_test, y_test))
```
Coefficients: [[-0.45286477 -0.08625204 -0.19375198 -0.63079401 1.57467774]]
Score: 0.999173764473
Wow! Almost 100% accuracy!
How could this feature possibly be useless?!
Well, let's think sensibly: our number of marriages is part of the total, so once the other components are known, accuracy approaches 100%. In practice this is not particularly useful.
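The effect is easy to reproduce on made-up numbers. In the sketch below (synthetic data and a simplification: the total is exactly the marriages plus four other counts, whereas the real TotalNumber also includes adoptions and name changes), the target is unrelated to the other features, yet adding the total makes the fit essentially perfect.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(7)
# Four unrelated "other" counts and an independent marriage count.
others = rng.randint(1000, 10000, size=(60, 4)).astype(float)
marriages = rng.randint(3000, 12000, size=60).astype(float)
# Simplifying assumption: the total is just the sum of these five columns.
total = others.sum(axis=1) + marriages

# Without the total the features carry no information about the target.
plain = linear_model.LinearRegression().fit(others[:45], marriages[:45])
plain_score = plain.score(others[45:], marriages[45:])

# With the total, marriages = total - sum(others) can be recovered exactly.
X_leaky = np.column_stack([others, total])
leaky = linear_model.LinearRegression().fit(X_leaky[:45], marriages[:45])
leaky_score = leaky.score(X_leaky[45:], marriages[45:])

print('Score without total:', plain_score)
print('Score with total:', leaky_score)
print('Coefficients with total:', leaky.coef_)
```

On this unscaled toy data the model learns coefficients of about -1 for each component and +1 for the total. The article's coefficients differ because its features are min-max scaled and the real total has extra components, but the mechanism is the same: the answer is hidden inside a feature.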
Let's move on to Lasso.
First, we pick a small regularization coefficient.
Alpha: 0.00015
Coefficients: [-0.44718703 -0.07491507 -0.1944702 -0.62034146 1.55890505]
Score: 0.999266251287
Well, nothing much has changed, which is not interesting, so let's see what happens if we increase it.
Alpha: 0.01
Coefficients: [-0. -0. -0. -0.05177979 0.87991931]
Score: 0.802210158982
So the model has declared almost all features useless, and the total number of records remains the most useful one. If we suddenly had to make do with only one or two features, we would know which to pick to keep the losses minimal.
Out of pure curiosity, let's see how much of the marriage registrations can be explained by the total number of records alone.
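```python
# Keep only the fifth column (TotalNumber) as the single feature
X_train = np.reshape(X_train[:, 4], [X_train.shape[0], 1])
X_test = np.reshape(X_test[:, 4], [X_test.shape[0], 1])
lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)
print('Coefficients:', lr.coef_)
# Note: this score is computed on the training sample
print('Score:', lr.score(X_train, y_train))
```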
Coefficients: [ 1.0571131]
Score: 0.788270672692
Well, not bad, but objectively less than when the other features are taken into account.
Let's look at the plots:

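Now let's try adding another useless feature to the original dataset: the state registration of name changes. You can build a model yourself and check how much of the data this feature explains on its own.
We will go straight to selecting the data and training the model on the old four features plus this new one.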
```python
columns_to_show4 = columns_to_show2.copy()
columns_to_show4.append("StateRegistrationOfNameChange")
X = df2[columns_to_show4].values
```
Coefficients: [[ 0.06583714 1.1080889 -0.35025999 -0.24473705 -0.4513887 ]]
Score: 0.285094398157
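This feature only spoils things for us. We will not try regularization here; it would not change anything fundamentally.
Finally, let's pick a useful feature.
Everyone knows there is a wedding season (summer and early autumn) and a quiet season (winter).
By the way, I was surprised by how few weddings take place in May.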
Coefficients: [[-0.10613428 0.91315175 -0.55413198 -0.13253367 0.28536285]]
Score: 0.472057997208
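Quality has improved and, most importantly, everything agrees with common sense.
Part V: "Cut a new caftan, but measure it against the old one!" - trend prediction
All that remains is to look at linear regression as a tool for predicting a trend. In the previous parts we sampled the data at random, so records from the whole time range ended up in the training set. This time we split the data into past and future and see whether we can predict anything.
For convenience we measure time in months elapsed since January 2010. To do that we write a simple anonymous function that converts the data, and as a result replace the year column with the number of months.
We train on the data up to 2016, and everything from 2016 onward is our future.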
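The conversion and the split are not shown in the text, so here is a minimal sketch under some assumptions: `Month` is already encoded as 1..12, the new column is called `MonthsElapsed` (a hypothetical name; the article replaces the year column instead), and the helper names are made up for illustration.

```python
import pandas as pd


def add_months_elapsed(frame, start_year=2010):
    """Add the number of months elapsed since January of start_year."""
    out = frame.copy()
    # Anonymous function applied row by row; Month is assumed to be 1..12.
    out['MonthsElapsed'] = out.apply(
        lambda row: (row['Year'] - start_year) * 12 + row['Month'] - 1,
        axis=1)
    return out


def split_past_future(frame, first_future_year=2016):
    """Everything before first_future_year is the past, the rest the future."""
    past = frame[frame['Year'] < first_future_year]
    future = frame[frame['Year'] >= first_future_year]
    return past, future


demo = pd.DataFrame({'Year': [2010, 2010, 2015, 2016, 2017],
                     'Month': [1, 2, 12, 1, 3]})
demo = add_months_elapsed(demo)
past, future = split_past_future(demo)
print(demo)
print(len(past), 'past rows,', len(future), 'future rows')
```

Unlike `train_test_split`, this split never lets the model see records that come later than the ones it is asked to predict, which is what makes the resulting score an honest estimate of forecasting quality.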
Coefficients: [ 2.60708376e-01 1.30751121e+01 -3.31447168e+00 -2.34368684e-01
2.88096512e+02]
Score: 0.383195050367
As you can see, with the data split this way accuracy has dropped somewhat, but the prediction is still better than pointing a finger at the sky.
Let's check this on the plots.
```python
plt.figure(figsize=(9,23))
```

On the chart the past is shown in blue, the future in green, and the link between them in purple.
The model clearly describes the points imperfectly, but it does at least capture the seasonal pattern.
So we can hope that in the future, given the available data, the model will orient us at least roughly as to the number of marriage registrations.
There are more advanced tools for trend analysis, but they are beyond the scope of this article (and, in my opinion, beyond basic data science skills).
Conclusion
So, we have looked at the regression problem. I suggest searching for other dependencies on government open data portals; you may well find something interesting. As a "challenge", I suggest digging something up on the open data portal of the Republic of Belarus, opendata.by.
To finish, a picture based on Alexander Grigoryevich's communication with journalists and his answers to awkward questions.

Bonus - improving accuracy through a different approach to the month
Colleagues left useful comments with recommendations for improving the quality of the prediction.
In short, all the suggestions come down to the fact that, in trying to keep everything simple, I encoded the Month column incorrectly (and that is indeed true). We will try to fix this in two ways.
Option 1 - one-hot encoding, where a separate feature is created for each value of the month.
To begin, load the source table again without any edits:
```python
df_base = pd.read_csv('https://op.mos.ru/EHDWSREST/catalog/export/get?id=230308',
                      compression='zip', header=0, encoding='cp1251',
                      sep=';', quotechar='"')
```
Then we apply one-hot encoding as implemented in the pandas library (the get_dummies function), drop the unneeded columns, and rerun the model training and the plotting.
We get:
Coefficients: [ 2.18633008e-01 -1.41397731e-01 4.56991414e-02 -5.17558633e-01
4.48131002e+03 -2.94754108e+02 -1.14429758e+03 3.61201946e+03
2.41208054e+03 -3.23415050e+03 -2.73587261e+03 -1.31020899e+03
4.84757208e+02 3.37280689e+03 -2.40539320e+03 -3.23829714e+03]
Score: 0.869208071831

Quality has improved significantly!
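For readers who have not used it, here is a small sketch of what `pd.get_dummies` does to the Month column. The frame is a three-row toy built from the first rows of the table, with English month names assumed; on the full data the twelve indicator columns plus the four original features would account for the sixteen coefficients printed above.

```python
import pandas as pd

demo = pd.DataFrame({
    'Year': [2010, 2010, 2010],
    'Month': ['January', 'February', 'March'],
    'StateRegistrationOfMarriage': [4997, 4873, 3642],
})

# Each distinct month becomes its own 0/1 column; the original column is dropped.
encoded = pd.get_dummies(demo, columns=['Month'], dtype=int)
print(encoded)
```

Unlike the 1..12 encoding, this does not force the model to assume that the effect of the month grows linearly from January to December; each month gets its own coefficient.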
Option 2 - target encoding: each month is encoded by the mean value of the target for that month over the training sample (thanks to roryorangepants).
We get:
Coefficients: [ 0.16556761 -0.12746446 -0.03652408 -0.21649349 0.96971467]
Score: 0.875882918435
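The encoding code is not shown, so here is a minimal sketch of the idea. The helper name `target_encode_month`, the new column name `MonthEncoded` and the fallback to the overall mean for months absent from the training data are all assumptions for illustration; the essential point is that the means come from the training sample only.

```python
import pandas as pd

TARGET = 'StateRegistrationOfMarriage'


def target_encode_month(train, test, target=TARGET):
    """Replace each month with the mean target seen for it in the training set."""
    means = train.groupby('Month')[target].mean()
    fallback = train[target].mean()  # used for months unseen in training
    train_out = train.copy()
    test_out = test.copy()
    train_out['MonthEncoded'] = train_out['Month'].map(means)
    test_out['MonthEncoded'] = test_out['Month'].map(means).fillna(fallback)
    return train_out, test_out


train = pd.DataFrame({'Month': [1, 1, 2, 2], TARGET: [100, 200, 300, 500]})
test = pd.DataFrame({'Month': [1, 2, 3], TARGET: [0, 0, 0]})
train_enc, test_enc = target_encode_month(train, test)
print(test_enc)
```

This adds a single numeric column instead of twelve, which matches the five coefficients printed above.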

The result is very similar in quality, but uses far fewer features.
Well, that is all from me.
There is also a farewell picture with Mr. Zhoposranchik; I hope it offends no one and does not start any "holivars" :)
