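In the previous article we looked at the problem of finding a profitable car, using the example of buying a Mercedes-Benz E-Class no older than 2010 and priced up to 1.5 million rubles in Moscow. "Profitable" means an offer whose price is below the current market price among the ads collected from all of the most reputable used-car sites in the Russian Federation.
At the first stage, multiple linear regression was chosen as the machine learning method, and we discussed why its use was justified along with its strengths and weaknesses. Plain linear regression was chosen simply as a familiar algorithm. Obviously, there are many more machine learning methods for solving regression problems. In this article I want to explain exactly how I chose the most suitable machine learning algorithm for the model under study; that algorithm is now used in the implemented service, robasta.ru.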
Choosing the algorithm
Contenders for the title of "champion":
Before making the choice, I would have liked to study all of the candidate algorithms and describe each of them in detail. However, such a brute-force search is far from optimal, and it is more reasonable to first take a closer look at the task itself.
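In addition to the Mercedes-Benz E-Class, I spent some time on the Audi A5, in particular the version with the 239 hp diesel engine, which offers excellent dynamics (6 seconds to 100 km/h) and an acceptable tax. Looking at how the price of this creation of German engineers depends on engine power (visualization below), many questions disappear by themselves.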

There can be no talk of a linear dependence here, so algorithms based on a linear dependence of the explained variable (in this case the cost) on the regressors can be discarded entirely. Using polynomial and nonlinear models is unjustified because, for each individual car model, the type of dependence of the price is not known in advance.
Therefore, taking the above into account, only algorithms based on decision trees are considered, random forest and Xgboost (two kinds of boosting, xgbDart and xgbTree), and the optimal algorithm is chosen from among them.
Note that the optimal algorithm is the one that shows the best performance (the smallest RMSE) both during cross-validation and on the hold-out sample.
Before moving on to the direct application of the selected algorithms, the next sections describe in detail how they are tuned.
Cross-validation
Cross-validation (CV) is widely used in machine learning tasks to assess how a model will actually perform and to tune its parameters. A set of partitions of the original sample into a training subsample and a control subsample is fixed. For each partition, the algorithm is fitted on the training subsample and its average error is then estimated on the control subsample.
The cross-validation estimate is the average error on the control subsamples over all partitions.
For cross-validation to give an unbiased estimate of the error probability and to avoid the effects of overfitting, the training and control samples must form disjoint subsets.
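One way to write this estimate down is shown below. The notation is an assumption introduced here for illustration and is not taken from the article: the sample is split $N$ times into a training part $X_n^{\text{train}}$ and a control part $X_n^{\text{ctrl}}$, $\mu$ is the learning algorithm, and $Q$ is the error measure (RMSE in this article).

```latex
\mathrm{CV}(\mu) = \frac{1}{N}\sum_{n=1}^{N} Q\bigl(\mu(X_n^{\text{train}}),\, X_n^{\text{ctrl}}\bigr),
\qquad X_n^{\text{train}} \cap X_n^{\text{ctrl}} = \varnothing,
\qquad
Q_{\mathrm{RMSE}} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\bigl(\hat{y}_i - y_i\bigr)^2}
```

Here $m$ is the size of the control part, $y_i$ is the actual price and $\hat{y}_i$ the predicted one.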
Types of cross-validation:
- k-fold cross-validation
This method randomly splits the data into k disjoint blocks of approximately equal size. Each block in turn is treated as the validation sample, and the remaining k-1 blocks as the training sample. The model is trained on the k-1 blocks and predicts the validation block. The prediction is scored with the chosen measure: accuracy, root-mean-square error (RMSE), and so on. The process is repeated k times, which gives k estimates; their mean is the final estimate of the model. Usually k is set to 10, sometimes to 5. If k equals the number of elements in the original data set, the method is called leave-one-out cross-validation (not considered in this article).
- Repeated k-fold cross-validation
In this method, k-fold cross-validation is run several times. For example, 10-fold cross-validation repeated 5 times gives 50 estimates, from which the mean estimate is computed. Note that this is not the same as 50-fold cross-validation.
- Monte Carlo cross-validation (MCCV, also called leave-group-out cross-validation)
This method randomly splits the original data set into training and validation samples in a predetermined proportion, a specified number of times.
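The three schemes above differ only in how row indices are assigned to validation sets. A minimal base-R sketch under stated assumptions: the function names `kfold`, `repeated_kfold` and `mccv` are hypothetical helpers written for this illustration, and the sample size is just an example value.

```r
set.seed(1)   # for reproducibility
n <- 292      # example sample size
k <- 10

# k-fold: every observation is assigned to exactly one of k blocks
kfold <- function(n, k) sample(rep(seq_len(k), length.out = n))

# Repeated k-fold: `repeats` independent k-fold partitions
repeated_kfold <- function(n, k, repeats) {
  lapply(seq_len(repeats), function(r) kfold(n, k))
}

# Monte Carlo CV: `times` random splits with a fixed validation share
mccv <- function(n, times, p_valid = 0.1) {
  lapply(seq_len(times), function(i) sample(n, size = round(n * p_valid)))
}

folds     <- kfold(n, k)
rep_folds <- repeated_kfold(n, k, repeats = 5)
mc_valid  <- mccv(n, times = 50)

table(folds)            # block sizes of 29 or 30
length(rep_folds) * k   # 5 x 10 = 50 train/validate rounds
lengths(mc_valid)[1:3]  # each Monte Carlo validation set has round(0.1 * n) rows
```

In k-fold and repeated k-fold every row is validated exactly once per partition, whereas in the Monte Carlo variant a row may land in several validation sets or in none.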
Each of the cross-validation methods above can be characterized by its bias and variance. Bias characterizes the accuracy of the estimate. Variance characterizes its precision.
In general, the bias of a cross-validation method depends on the size of the validation sample. If the validation sample is 50% of the original data (2-fold cross-validation), the final RMSE estimate will be more biased than when it is 10% of the original data. On the other hand, a smaller validation sample increases the variance: each validation sample contains less data, so the RMSE computed on it is less stable.
Therefore, for k-fold cross-validation, choose a large k to minimize the bias, and use repeated k-fold cross-validation to reduce the variance.
As for MCCV, the size of the validation sample affects the variance of this type of cross-validation much more than the number of repetitions does. Note also that the number of repetitions has no significant effect on the bias.
Therefore, for MCCV it is recommended to use a small validation sample (for example, 10%) and to run a large number of repetitions to reduce the variance.
However, other things being equal, repeated 10-fold CV gives a smaller variance, mainly because in this method, unlike MCCV, the same data element cannot appear in different validation samples within one partition.
To conclude, a caveat: for large amounts of data, even a single run of 10-fold or 5-fold CV gives quite acceptable results. In this task, repeated 10-fold cross-validation is used to tune the models.
Random forest
"Random forest" is an algorithm that builds many decision trees at random on the input data and averages their predictions. The tree-building algorithm is very fast, so it is easy to build as many trees as needed.
From a practical point of view, this method has one big advantage: it requires almost no configuration. Regression, neural networks and other machine learning algorithms all have many parameters that must be selected for a specific task. RF has, in effect, only one important parameter that needs tuning: mtry (the size of the random subset of predictors chosen at each step of tree construction). Even so, quite acceptable results can be obtained with the default value.
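For reference, the default that the randomForest package uses for regression is the number of predictors divided by three, rounded down (and at least 1). A small sketch; the helper name `default_mtry_regression` is hypothetical and only restates that rule.

```r
# Default mtry for regression in the randomForest package: max(floor(p / 3), 1)
default_mtry_regression <- function(p) max(floor(p / 3), 1)

default_mtry_regression(15)   # default for a model with 15 predictors
```

Tuning mtry, as done later in this article, checks whether a value other than this default gives a lower cross-validated RMSE.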
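As in the previous article, we replace missing values (N/A) with the median for all regressors, exclude engine volume from the sample (because of its strong correlation with the power parameter), and then examine how this algorithm performs.
dat <- read.csv("dataset.txt")  # load the data into R
dat$mileage[is.na(dat$mileage)] <- median(na.omit(dat$mileage))  # replace missing mileage with the median
dat <- dat[-c(1,11)]  # drop columns 1 and 11 by position
set.seed(1)  # seed the random number generator (for reproducibility)
split <- runif(dim(dat)[1]) > 0.2  # random split, roughly 80/20
train <- dat[split,]  # training sample (cross-validation)
test <- dat[!split,]  # test sample (hold-out)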
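The code above imputes only the mileage column. If several regressors contain gaps, the same idea can be applied column by column. A minimal sketch on made-up data: the data frame `cars`, its column names and the helper `impute_median` are assumptions for illustration, not part of the article's data set.

```r
# Hypothetical toy data; column names and values are illustrative only
cars <- data.frame(
  price   = c(1200000, 950000, 1100000, 1400000),
  mileage = c(90000, NA, 120000, 60000),
  power   = c(184, 204, NA, 252)
)

# Replace NA in every numeric column with that column's median
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

cars_filled <- impute_median(cars)
cars_filled$mileage[2]   # median of 90000, 120000 and 60000
```

In a real pipeline the target column (the price) would normally be left out of imputation; rows without a price cannot be used for training anyway.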
For cross-validation we use the caret package, which offers more options for assessing model quality than rfcv.
library(caret)  # load the caret package
fit.control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)  # 10-fold CV, repeated 10 times
train.rf.model <- train(price ~ ., data = train, method = "rf", trControl = fit.control, metric = "RMSE")  # tune mtry by RMSE
train.rf.model  # print the resampling summary
Random Forest

292 samples
 15 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 262, 262, 262, 263, 263, 263, ...
Resampling results across tuning parameters:

  mtry  RMSE      Rsquared
   2    134565.8  0.4318963
   8    117451.8  0.4378768
  15    122897.6  0.3956822

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 8.
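The training sizes reported in the summary follow directly from splitting 292 rows into 10 blocks. A quick base-R check using a plain, unstratified assignment (caret stratifies folds on the outcome, so its exact sizes can differ slightly from this simplified version):

```r
n <- 292
k <- 10
fold_id     <- rep(seq_len(k), length.out = n)  # plain, unstratified assignment
valid_sizes <- as.vector(table(fold_id))        # 30, 30, 29, ..., 29
train_sizes <- n - valid_sizes                  # rows left for training in each round
range(train_sizes)
```

Each model is therefore fitted on about 90% of the training sample and scored on the remaining 10%.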
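library(randomForest)  # load the randomForest package
train.rf.model <- randomForest(price ~ ., train, mtry = 8)  # fit on the training sample with the selected mtry
Let's build a plot that clearly shows the importance of each predictor in the model.
varImpPlot(train.rf.model)  # variable importance plot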

rf.model.predictions <- predict(train.rf.model, test)  # predict prices for the hold-out sample
print(sqrt(sum((as.vector(rf.model.predictions - test$price))^2)/length(rf.model.predictions)))  # RMSE on the hold-out sample
[1] 121760.5
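The expression inside `print` is the root-mean-square error. Since the same formula is reused for every model, it can be wrapped in a small helper; `rmse` is a hypothetical name, and the prices below are made up for illustration.

```r
# Root-mean-square error between predicted and actual values
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

# Hypothetical prices in rubles
predicted <- c(1000000, 1200000, 900000)
actual    <- c(1100000, 1150000, 900000)
rmse(predicted, actual)
```

`mean((predicted - actual)^2)` is the same quantity as the sum of squared differences divided by the number of predictions used above.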
The average error obtained when estimating the value of a car is comparable to the one obtained with linear regression. Note that, unlike for RF, outliers were excluded when the linear model was built, since they made its estimates of car cost even less accurate. We can therefore speak of the robustness of random forest to outliers.
Xgboost
The idea of gradient boosting is to build an ensemble of base models that successively improve one another. Each subsequent base model is trained on the "mistakes" of the ensemble of the previous base models, and the responses of the models are weighted and summed.
Almost any model can be "boosted" (general linear, generalized linear, decision trees, K-nearest neighbors and many others).
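To make the idea concrete, here is a toy base-R sketch of boosting for squared-error loss, where the "mistakes" are simply the current residuals. It is an illustration under stated assumptions, not xgboost's algorithm: the data are synthetic, and `fit_stump` and `predict_stump` are hypothetical helpers implementing a one-split regression tree.

```r
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.2)   # synthetic data, not the car sample

# Fit a one-split regression tree (stump) to the targets r
fit_stump <- function(x, r) {
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # candidate thresholds
  sse  <- sapply(cuts, function(cut) {
    left  <- r[x <= cut]
    right <- r[x > cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  best <- cuts[which.min(sse)]
  list(cut = best, left = mean(r[x <= best]), right = mean(r[x > best]))
}
predict_stump <- function(s, x) ifelse(x <= s$cut, s$left, s$right)

eta  <- 0.3                          # learning rate (shrinkage)
pred <- rep(mean(y), length(y))      # start from a constant model
rmse_start <- sqrt(mean((y - pred)^2))
for (m in 1:100) {
  stump <- fit_stump(x, y - pred)    # next base model is fitted to the residuals
  pred  <- pred + eta * predict_stump(stump, x)
}
rmse_end <- sqrt(mean((y - pred)^2))
c(rmse_start, rmse_end)              # training error before and after boosting
```

The training error alone says nothing about overfitting, which is why the number of rounds and the learning rate are chosen by cross-validation in the article.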
The features of the boosting implementation in xgboost include, first, the use of the second derivative of the loss function in addition to the first, which makes the algorithm more efficient. Second, built-in regularization, which helps to fight overfitting. And finally, the ability to define custom loss functions and quality metrics.
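The role of the two derivatives can be seen in the second-order approximation of the objective at boosting step $t$, written here in the notation commonly used in the XGBoost documentation (the symbols are introduced for illustration and do not appear in the article): $l$ is the loss, $f_t$ the tree added at step $t$, and $\Omega$ the regularization term.

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\Bigl[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2\Bigr] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^2}
```

For the squared-error loss $l = \tfrac{1}{2}(y_i - \hat{y}_i)^2$ this gives $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$. With $\Omega(f_t) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$, the optimal weight of leaf $j$ is $w_j^{*} = -G_j / (H_j + \lambda)$, where $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the observations in that leaf.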
Thanks to the experimental parameter num_parallel_tree, which sets the number of trees built simultaneously, a random forest can be represented as a special case of a boosting model with a single iteration. With several iterations, the result is a boosting of random forests, where each random forest acts as a base model.
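As a rough illustration, a random-forest-style configuration could look like the parameter list below. The specific values are assumptions chosen for this sketch, not settings used in the article; the parameter names (`num_parallel_tree`, `subsample`, `colsample_bynode`, `eta`) are standard xgboost parameters.

```r
# Hypothetical parameter list for a random-forest-style model in xgboost
rf_like_params <- list(
  booster           = "gbtree",
  num_parallel_tree = 500,   # trees grown in each boosting round (illustrative value)
  subsample         = 0.63,  # share of rows sampled for each tree (illustrative value)
  colsample_bynode  = 0.5,   # share of columns sampled at each split (illustrative value)
  eta               = 1,     # no shrinkage, as recommended for random-forest mode
  max_depth         = 6
)
# Intended to be used with a single round, for example:
# xgb.train(params = rf_like_params, data = xgb_train, nrounds = 1)
```

Raising the number of rounds above 1 with the same list would give the "boosted random forests" variant described above.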
In this article only one type of boosting is considered, xgbTree. xgbDart gives similar results.
fit.control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)  # 10-fold CV, repeated 10 times
train.xgb.model <- train(price ~ ., data = train, method = "xgbTree", trControl = fit.control, metric = "RMSE")  # tune over caret's parameter grid by RMSE
train.xgb.model  # print the resampling summary
eXtreme Gradient Boosting

292 samples
 15 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 263, 262, 262, 263, 264, 263, ...
Resampling results across tuning parameters:

  eta  max_depth  colsample_bytree  nrounds  RMSE      Rsquared
  0.3  1          0.6                50      114131.1  0.4705512
  0.3  1          0.6               100      113639.6  0.4745488
  0.3  1          0.6               150      113821.3  0.4734121
  0.3  1          0.8                50      114234.6  0.4694687
  0.3  1          0.8               100      113960.5  0.4712563
  0.3  1          0.8               150      114337.1  0.4685121
  0.3  2          0.6                50      115364.6  0.4604643
  0.3  2          0.6               100      117576.4  0.4472452
  0.3  2          0.6               150      119443.6  0.4358365
  0.3  2          0.8                50      116560.3  0.4494750
  0.3  2          0.8               100      119054.2  0.4350078
  0.3  2          0.8               150      121035.4  0.4222440
  0.3  3          0.6                50      117883.2  0.4422659
  0.3  3          0.6               100      121916.7  0.4162103
  0.3  3          0.6               150      125206.7  0.3968248
  0.3  3          0.8                50      119331.3  0.4296062
  0.3  3          0.8               100      124385.7  0.3987044
  0.3  3          0.8               150      128396.6  0.3753334
  0.4  1          0.6                50      113771.6  0.4727520
  0.4  1          0.6               100      113951.6  0.4717968
  0.4  1          0.6               150      114135.0  0.4710503
  0.4  1          0.8                50      114055.0  0.4700165
  0.4  1          0.8               100      114345.5  0.4680938
  0.4  1          0.8               150      114715.8  0.4655844
  0.4  2          0.6                50      116982.1  0.4499777
  0.4  2          0.6               100      119511.9  0.4347406
  0.4  2          0.6               150      122337.9  0.4163611
  0.4  2          0.8                50      118384.6  0.4379478
  0.4  2          0.8               100      121302.6  0.4201654
  0.4  2          0.8               150      124283.7  0.4015380
  0.4  3          0.6                50      118843.2  0.4356722
  0.4  3          0.6               100      124315.3  0.4017282
  0.4  3          0.6               150      128263.0  0.3796033
  0.4  3          0.8                50      122043.1  0.4135415
  0.4  3          0.8               100      128164.0  0.3782641
  0.4  3          0.8               150      132538.2  0.3567702

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 100, max_depth = 1, eta = 0.3, gamma = 0, colsample_bytree = 0.6 and min_child_weight = 1.
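The selection rule reported in the summary is simply "take the row with the smallest RMSE". A short sketch of that rule on a few rows copied from the resampling results (only a subset, for brevity; the data frame name `grid` is an assumption):

```r
# A few rows copied from the resampling results of the xgbTree run
grid <- data.frame(
  eta              = c(0.3, 0.3, 0.3, 0.4, 0.4),
  max_depth        = c(1, 1, 2, 1, 3),
  colsample_bytree = c(0.6, 0.6, 0.6, 0.6, 0.8),
  nrounds          = c(50, 100, 50, 50, 150),
  RMSE             = c(114131.1, 113639.6, 115364.6, 113771.6, 132538.2)
)

best <- grid[which.min(grid$RMSE), ]   # row with the smallest cross-validated RMSE
best
```

Note how the error grows with max_depth and nrounds in the full table: on a sample of this size, deeper trees and more rounds overfit.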
library(xgboost)  # load the xgboost package
xgb_train <- xgb.DMatrix(as.matrix(train[-c(1)]), label = train$price)  # training matrix: column 1 dropped, price as the label
xgb_test <- xgb.DMatrix(as.matrix(test[-c(1)]), label = test$price)  # hold-out matrix
xgb.param <- list(booster = "gbtree", max.depth = 1, eta = 0.3, gamma = 0, subsample = 0.5, colsample_bytree = 0.6, min_child_weight = 1, eval_metric = "rmse")  # tuned values from cross-validation, plus subsample = 0.5
train.xgb.model <- xgb.train(data = xgb_train, nrounds = 100, params = xgb.param)  # fit the model
Let's build a plot showing the importance of each predictor in the model.
importance.frame <- xgb.importance(colnames(train[-c(1)]), model = train.xgb.model)  # importance of each predictor
library(Ckmeans.1d.dp)  # used by xgb.plot.importance
xgb.plot.importance(importance.frame)

xgb.model.predictions <- predict(train.xgb.model, xgb_test)  # predict prices for the hold-out sample
print(sqrt(sum((as.vector(xgb.model.predictions - test$price))^2)/length(xgb.model.predictions)))  # RMSE on the hold-out sample
[1] 118742.8
In this particular case XGboost gave a slightly more accurate estimate of the cost of a car. However, there is a concern about the large number of hyperparameters that would need to be re-tuned depending on the chosen make and model. For this reason, the random forest algorithm was preferred for use in the robasta.ru service.
Testing the selected algorithm
Now that the "champion" has been chosen, let's see how it behaves.
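library(randomForest)  # load the randomForest package
rf.model <- randomForest(price ~ ., dat, mtry = 8)  # fit the model on the full sample
predicted.price <- predict(rf.model, dat)  # predicted price
real.price <- dat$price  # actual asking price
profit <- predicted.price - real.price  # "profit": predicted price minus asking price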
As was done for linear regression in the previous article, let's plot the dependence of profit on price.
plot(real.price, profit)
abline(0, 0)

And let's calculate the profit as a ratio.
sorted <- sort(predicted.price / real.price, decreasing = TRUE)
sorted[1:10]
      69       42      122       15      168      248      346      109      231      244
1.412597 1.363876 1.354881 1.256323 1.185104 1.182895 1.168575 1.158208 1.157928 1.154557
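The same ratio can be used to shortlist offers. A minimal sketch on made-up numbers: the data frame `offers`, its ids and prices, and the 10% cut-off are all assumptions for illustration, not values from the article or the service.

```r
# Hypothetical asking and predicted prices in rubles
offers <- data.frame(
  id              = c("a", "b", "c", "d"),
  real.price      = c(1000000, 1200000, 900000, 1400000),
  predicted.price = c(1250000, 1180000, 1035000, 1390000)
)

offers$ratio <- offers$predicted.price / offers$real.price
threshold <- 1.1   # illustrative cut-off: predicted price at least 10% above asking

profitable <- offers[offers$ratio >= threshold, ]
profitable <- profitable[order(profitable$ratio, decreasing = TRUE), ]
profitable$id      # offers ranked from most to least profitable
```

Because the model here is fitted on the same rows it then scores, ratios computed this way are optimistic; a ranking is more reliable when each offer is scored by a model that has not seen it.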
The results bear only a weak resemblance to those obtained with linear regression, even though the RMSE of the two models is almost identical, and they look more plausible.
To make the results of this article comparable, I used the sample from the previous publication. Now let's look at a number of profitable offers currently on the market for a Mercedes-Benz E-Class no older than 2010 and priced up to 1.5 million rubles in Moscow.

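Summing up all of the above, I can say with confidence that we now have a powerful tool for choosing a used car that works in real time and is "not susceptible to advertising" in our country. There is no longer any need to spend time on several sites with car ads, or to drive around looking at potentially unprofitable offers.
But that is not all: using the mathematical apparatus described here, Robasta can now help not only those who want to buy a car but also those who want to sell one.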
Selling a car
When you sell your car, you of course want to sell it, at the very least, not too cheaply and in the shortest possible time. For a quick and profitable sale, you need to understand how its various characteristics contribute to its value.
To solve this problem, a "car valuation" service was developed on the basis of the same random forest. You fill in all the fields of the search form according to the parameters of your car, after which the model is trained on the current market offers. If there are 5 or more ads on the market, the algorithm predicts a price for the entered data and provides some interesting features depending on the overall market situation. It is worth emphasizing that, to achieve the best accuracy, only cars of the same generation as yours are selected for the analysis. The result of the valuation is generated as a pdf report and costs 99 ₽.

In conclusion
Several directions for further development are currently being considered. The main ones are:
- Relatively new cars (mileage up to 100,000 km) are often sold just before a relatively expensive scheduled maintenance (TO), and it would be useful to take this data into account in the model. We are therefore currently looking for reliable partners among medium and large car dealers.
- Opening an offline center for car selection and valuation in Moscow which, thanks to the implemented algorithms, will be much cheaper than the competition.
- Creating a convenient API that provides this functionality to "intelligent resellers".
Is there anyone who can offer something or share ideas that would help implement these tasks? Write to me; I am always ready to consider any form of cooperation.
Reference materials
- dat (Mercedes-Benz E-Class sample)
- dat_a5 (Audi A5 sample)