This article is about machine learning in general and about working with datasets. If you are a beginner who does not know where to start, and you want to find out what a "dataset" is, why machine learning is needed at all, and why it has become so popular lately, read on. We will use Python 3, since it is a fairly simple tool for learning machine learning.
Who is this article for?
Anyone who would like to dig into the history in search of new facts, or who has wondered at least once how machine learning actually works, will find an answer here. An experienced reader will most likely find nothing new: the programming part is somewhat simplified so that beginners can follow it. Still, learning where machine learning came from and how it developed will not hurt anyone.
In numbers
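Every year the need to study big data grows, both for companies and for enthusiasts. Large companies such as Yandex and Google make increasing use of data analysis tools such as the R programming language and libraries for Python (the examples in this article are written for Python 3). According to Moore's law (the picture shows the man himself), the number of transistors on an integrated circuit doubles every 24 months. Computer performance therefore grows from year to year, and the previously unreachable frontier of knowledge "shifts to the right" again. This opens up room for studying big data, which is the main reason a "science of big data" emerged. It became possible largely thanks to machine learning algorithms that had been described long before but could only be tested half a century later. Who knows, perhaps in a few years we will be able to describe, for example, the various forms of fluid motion with absolute precision.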
Is data analysis easy?
Yes. And interesting, too. Besides being important for humanity as a whole, big data is relatively easy to study on your own, and the resulting "answer" is easy to apply (from one enthusiast to another). A huge number of resources exist today for solving classification problems; skipping most of them, we can use the tools of the Scikit-learn library (SKlearn). Let's create our first trainable machine:
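    from sklearn.ensemble import RandomForestClassifier

    # X holds the feature rows and y the matching class labels
    clf = RandomForestClassifier()
    clf.fit(X, y)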
That's it: we have created the simplest machine that can predict (or classify) values of inputs from their features.
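To make the two lines above runnable on their own, here is a minimal sketch. The toy X and y are made up for illustration and are not from the article; n_estimators=10 and random_state=0 are arbitrary choices that keep the run small and repeatable.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (made up for illustration): two features per object, two classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
     [1.0, 0.9], [0.8, 1.1], [0.9, 1.0]]
y = [0, 0, 0, 1, 1, 1]

# Train the forest on the features and their known classes.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# Classify two new objects by their features.
predictions = clf.predict([[0.1, 0.1], [0.9, 0.9]])
print(predictions)
```

The fit() call learns from (X, y) pairs; predict() then returns one class label per new row.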
- If everything is so simple, why isn't everyone predicting, say, currency prices yet? The article could end with those words, but of course I won't do that (well, I will, but later). There are nuances to getting correct predictions for a given task, and by no means every task is solved this easily (you can read more about that here).
Down to business
- So I won't be able to make money with this right away? Right: we are still a long way from solving problems with $100,000 prizes, but everyone started with something simple.
So, today we will need:
- Python 3 (with pip3 installed)
- Jupyter
- SKlearn, NumPy, pandas and matplotlib
If something is missing: install everything in 5 minutes
First, download and install Python 3 (if you downloaded the Windows installer, do not forget to install pip and add Python to PATH during setup). Then, for convenience, the Anaconda package was used; it bundles more than 150 libraries for Python (download link). It is convenient for working with Jupyter and the numpy, scikit-learn and matplotlib libraries, and it simplifies installing all of them. After installation, start Jupyter Notebook from the Anaconda control panel or from the command line (terminal): "jupyter notebook".
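A quick way to check that the installation worked is to import each library and print its version. This is only a sketch; it assumes the packages are importable under their usual names (numpy, pandas, sklearn, matplotlib).

```python
import sys

import matplotlib
import numpy
import pandas
import sklearn

# Collect the version of the interpreter and of each library used in the article.
versions = {
    "Python": sys.version.split()[0],
    "NumPy": numpy.__version__,
    "pandas": pandas.__version__,
    "scikit-learn": sklearn.__version__,
    "matplotlib": matplotlib.__version__,
}
for name, version in versions.items():
    print(name, version)
```

If any import fails with ModuleNotFoundError, that package is the one still missing.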
What follows requires some knowledge of Python syntax and features from the reader (links to useful resources, including "Python 3 basics", are given at the end of the article).
As usual, we import the libraries we need:
    import numpy as np
    from pandas import read_csv as read
- OK, NumPy is clear. But why do we need Pandas, and read_csv in particular? Sometimes it helps to "visualize" the data you have, which makes it easier to work with. Besides, most datasets on the popular Kaggle service are put together by users in CSV format.
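This is what a dataset looks like when displayed by pandas. Here the Activity column shows whether the reaction takes place (1 for a positive answer, 0 for a negative one). The remaining columns are sets of features and their values (the percentages of various substances in the reaction, their states of matter, and so on).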
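To make this concrete, the sketch below reads a tiny table with an Activity column from CSV text and lets pandas display it. The column names D1 and D2 and all the values are made up for illustration; a real file would be passed to read_csv by path instead of through io.StringIO.

```python
import io

from pandas import read_csv as read

# Made-up CSV text standing in for a downloaded file.
csv_text = (
    "Activity,D1,D2\n"
    "1,0.50,0.10\n"
    "0,0.25,0.90\n"
)

# read_csv accepts a path or any file-like object.
frame = read(io.StringIO(csv_text), delimiter=",")

print(frame)        # the table, one row per object
print(frame.shape)  # (number of rows, number of columns)
```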
- I remember you used the word "dataset". So what is it? A dataset is a sample of data, usually in the form "a set of feature sets" → "some values" (for example housing prices, or the index of one of several classes), where X is the set of features and y is those values. Determining, for example, the correct class index is a classification task, while finding target values (such as a price or the distance to an object) is a regression task. You can read more about the types of machine learning in the articles and publications linked, as promised, at the end of the article.
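The split into X and y, and the difference between predicting a class index and predicting a numeric target, can be sketched as follows. The table and its column layout are hypothetical. RandomForestRegressor is used for the numeric target only to mirror the classifier used in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical table: each row is [class index, price, feature 1, feature 2].
table = np.array([
    [0, 100.0, 1.0, 0.2],
    [0, 110.0, 1.1, 0.1],
    [1, 200.0, 2.0, 0.9],
    [1, 210.0, 2.1, 1.0],
])

X = table[:, 2:]                   # the features
y_class = table[:, 0].astype(int)  # target for classification: a class index
y_price = table[:, 1]              # target for a numeric prediction: a price

classifier = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_class)
regressor = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_price)

new_object = [[2.0, 1.0]]
predicted_class = classifier.predict(new_object)[0]
predicted_price = regressor.predict(new_object)[0]
print(predicted_class)  # one of the known class indices
print(predicted_price)  # a number within the range of the training prices
```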
Getting to know the data
The proposed dataset can be downloaded here. A link to the source data and a description of the features is given at the end of the article. Given the parameters, we are asked to determine which grade a particular wine belongs to. Now we can work out what is going on in there:
    # "% %" is the placeholder for the directory that holds wine.csv
    path = "% %/wine.csv"
    data = read(path, delimiter=",")
    data.head()
Working in Jupyter Notebook, we get the following output:
This means the data is now available for analysis. The Grade values in the first column show which grade each wine belongs to, and the remaining columns are the features by which the wines can be told apart.
Try entering just data instead of data.head(): now you can view more than the "top part" of the dataset.
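For readers who do not have wine.csv at hand, a similar first look can be sketched with the wine dataset bundled with scikit-learn. This is a stand-in and an assumption, not the article's file: here the class column is called target and comes last, whereas the article's file has Grade in the first column.

```python
from sklearn.datasets import load_wine

# Stand-in for the article's wine.csv: the wine dataset shipped with scikit-learn.
wine = load_wine(as_frame=True)
data = wine.frame  # a pandas DataFrame: 13 feature columns plus "target"

print(data.shape)   # (rows, columns)
print(data.head())  # only the first five rows

# How many wines fall into each class.
class_counts = data["target"].value_counts()
print(class_counts)
```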
A simple implementation of a classification task
We now move on to the main part of the article and solve a classification problem. Step by step:
- create a training sample
- try to train the machine on randomly selected parameters and their corresponding classes
- compute the quality of the resulting machine
Let's look at the implementation (each code excerpt is a separate cell in the notebook):
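    # Features are columns 1-13; the class (Grade) is column 0
    X = data.values[:, 1:14]
    y = data.values[:, 0]  # 1-D array of labels, as fit() expects

    # train_test_split lives in sklearn.model_selection
    # (the old sklearn.cross_validation module has been removed)
    from sklearn.model_selection import train_test_split as train
    X_train, X_test, y_train, y_test = train(X, y, test_size=0.6)

    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X_train, y_train)

    clf.score(X_test, y_test)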
We create the arrays: X holds the features (columns 1 to 13) and y holds the classes (column 0). Then, to assemble test and training samples from the source data, we use the convenient train_test_split function from scikit-learn. With the samples ready, we import RandomForestClassifier from the ensemble module of sklearn. This class contains all the methods needed to train and test the machine. We assign an instance of RandomForestClassifier to the variable clf (classifier) and train it by calling fit(), where X_train holds the features of the categories in y_train. Now we can use the score metric built into the class to measure how accurately the categories predicted for X_test match their true values y_test. The metric returns an accuracy between 0 and 1, where 1 <=> 100%. Done!
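What score() reports for a classifier is the share of test objects whose predicted class matches the true one. The sketch below checks this by hand. It uses scikit-learn's bundled wine dataset as a stand-in for the article's file, and random_state=0 is an added assumption that makes the run repeatable.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: the wine dataset shipped with scikit-learn.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Accuracy by hand: the fraction of test objects classified correctly.
predicted = clf.predict(X_test)
manual_accuracy = np.mean(predicted == y_test)

# The built-in metric gives the same number.
builtin_accuracy = clf.score(X_test, y_test)
print(manual_accuracy, builtin_accuracy)
```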
About RandomForestClassifier and the train_test_split function
When initializing clf as a RandomForestClassifier we set n_estimators=100 and n_jobs=-1. The first is the number of trees in the forest, the second the number of processor cores used (with -1 all cores are used; by default a single core is). Since we are working with this one dataset and have nowhere else to get a test sample, we use train_test_split to split the data "smartly" into a training sample and a test sample. To learn more about a class or method, select it and press Shift+Tab in Jupyter.
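A short sketch of what the split produces, again on scikit-learn's bundled wine dataset as a stand-in. The random_state and stratify arguments are not used in the article. They are added here as assumptions, to make the split repeatable and to keep all classes present in both parts.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# test_size=0.6 sends about 60% of the rows to the test sample.
# random_state fixes the shuffle; stratify=y keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42, stratify=y)

print(len(X_train), len(X_test))
print(np.unique(y_train), np.unique(y_test))
```

Without random_state, every run shuffles differently, so the accuracy reported by score() changes slightly from run to run.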
- Not bad accuracy. Does it always turn out like this? An important factor in solving classification problems is choosing the best parameters for the training sample. The more, the better. But not always (you can read more about this on the internet; most likely, though, I will write another article about it aimed at beginners).
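One way to explore "the more, the better, but not always" is to vary a single parameter and compare the results. The sketch below does this for the number of trees. It uses scikit-learn's bundled wine dataset as a stand-in, and the tree counts and 5-fold cross-validation are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Mean cross-validated accuracy for several forest sizes.
scores_by_trees = {}
for n_trees in (1, 10, 100):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)  # one accuracy per fold
    scores_by_trees[n_trees] = scores.mean()
    print(n_trees, round(scores_by_trees[n_trees], 3))
```

Typically the gain from adding trees flattens out while training time keeps growing, which is one sense in which more is not always better.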
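- Too easy. More meat! To view the training result on this dataset visually, here is an example: keeping only two parameters so that they can be laid out in two-dimensional space, we plot the trained sample (you will get a plot roughly like this one; it depends on the training):
Yes, as the number of features decreases, recognition accuracy drops too. The plot did not turn out especially pretty either, but that is not decisive in a simple analysis: you can clearly see how the machine picked out the training sample (points) and compared it with the predicted values (fill).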
The implementation is here:
    from sklearn.preprocessing import scale

    # Keep only the first two features and standardize them
    X_train_draw = scale(X_train[:, 0:2])
    X_test_draw = scale(X_test[:, 0:2])

    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X_train_draw, np.ravel(y_train))

    # Grid covering the plotting area, with step h
    x_min, x_max = X_train_draw[:, 0].min() - 1, X_train_draw[:, 0].max() + 1
    y_min, y_max = X_train_draw[:, 1].min() - 1, X_train_draw[:, 1].max() + 1
    h = 0.02
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict a class for every grid node
    pred = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    pred = pred.reshape(xx.shape)

    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

    # Fill = predicted class, points = training sample
    plt.figure()
    plt.pcolormesh(xx, yy, pred, cmap=cmap_light)
    plt.scatter(X_train_draw[:, 0], X_train_draw[:, 1],
                c=np.ravel(y_train), cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("Score: %.0f percents" % (clf.score(X_test_draw, y_test) * 100))
    plt.show()
I suggest the reader work out independently why and how it works.
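As a starting point for that exploration, the claim that accuracy drops when features are removed can be checked directly. This is a sketch on scikit-learn's bundled wine dataset, used as a stand-in for the article's file. The helper holdout_accuracy and the random_state and stratify settings are introduced here for illustration only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=0, stratify=y)


def holdout_accuracy(n_features):
    """Train on the first n_features columns and score on the held-out part."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[:, :n_features], y_train)
    return clf.score(X_test[:, :n_features], y_test)


accuracy_two = holdout_accuracy(2)           # only the two plotted features
accuracy_all = holdout_accuracy(X.shape[1])  # all 13 features
print("2 features:", accuracy_two)
print("all features:", accuracy_all)
```

On most splits the two-feature model scores noticeably lower, though the exact numbers depend on the split and on the forest's random seed.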
Final word
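I hope this article has helped you get at least a little more comfortable with developing simple machine learning in Python. This knowledge is enough to continue with an intensive course of further study in BigData + Machine Learning. The main thing is to move from the simple to the advanced gradually. And here, as promised, are the useful resources and articles: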
Materials that inspired the author to write this article
Historical essays:
More about machine learning:
Learning Python, or before working with data:
However, to get the most out of the sklearn library, knowledge of English will help: this source contains all the necessary information (since it is the API reference).
A deeper study of using machine learning with Python has become possible, and easier, thanks to instructors from Yandex: this course has everything needed to explain how the whole system works and covers the types of machine learning in more detail.
The file with today's dataset was taken from here and slightly modified.
Where to get data, or a "dataset repository": a huge amount of data from a wide variety of sources is collected here. Training on real data is very useful.
I would be grateful for help in improving this article, and I am open to any kind of constructive criticism.