This is a translation of a chapter from the book Hands-On Data Science with Anaconda.
"Predictive Data Analytics: Modeling and Validation"
The main purpose of carrying out data analysis is to find patterns in order to predict what will happen in the future. In the stock market, researchers and professionals run a variety of tests to understand market mechanisms. In this setting we can ask many questions. What will the level of the market index be over the next five years? What will IBM's next price range be? Will market volatility increase or decrease in the future? What is the impact if the government changes its tax policy? What are the potential gains and losses if one country starts a trade war with another? How can we predict consumer behavior by analyzing several related variables? Can we predict the probability that a graduate student will graduate? Can we find a relationship between certain behaviors and a specific disease?
Accordingly, we will cover the following topics:
- Understanding predictive data analytics
- Useful datasets
- Predicting future events
- Model selection
- Granger causality test
Understanding predictive data analytics
People may have many questions about future events:
- Investors could make large profits if they could predict the future movement of stock prices.
- Companies could raise their stock price and market share if they could predict the trends for their products.
- Governments would be better placed to design policies on budgets and other related strategic decisions if they could predict the impact of an aging population on society and the economy.
- Universities could develop a better set of programs, or launch new ones, to meet future workforce needs if they had a good grasp of market demand in terms of the quality and skills expected of their graduates.
To make better forecasts, researchers should consider many questions. For example, is the sample too small? How should missing values be handled? Is the dataset biased by the way the data was collected? How should extreme values or outliers be treated? What is seasonality, and how do we deal with it? Which model should be used? This chapter touches on some of these issues. Let's start with useful datasets.
Useful datasets
One of the best data sources is the UCI Machine Learning Repository. When you visit the site, you see the following list:

For example, if we select the first dataset (Abalone), it is displayed as follows. To save space, only the top part is shown:

From here, users can download the dataset and find the variable definitions. The following code loads a listing of the repository's datasets:
dataSet<-"UCIdatasets"
path<-"http://canisius.edu/~yany/RData/"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
dim(.UCIdatasets)
head(.UCIdatasets)
The corresponding output is as follows:

From the preceding output, we can see that the listing contains 427 observations (datasets). For each one there are seven related variables: Name, Data_Types, Default_Task, Attribute_Types, N_Instances (number of instances), N_Attributes (number of attributes), and Year. The variable Default_Task can be read as the main use of each dataset. For example, the first dataset, Abalone, can be used for Classification. The unique() function lists all possible values of Default_Task, shown here:

R package AppliedPredictiveModeling
This package contains many useful datasets that can be used in this chapter and others. The easiest way to find these datasets is with the help() function, as shown here:
library(AppliedPredictiveModeling)
help(package=AppliedPredictiveModeling)
Here are a few examples of loading these datasets. To load a single dataset, use the data() function. For the first dataset, called abalone, the code is:
library(AppliedPredictiveModeling)
data(abalone)
dim(abalone)
head(abalone)
The output is as follows:

In some cases, a large dataset contains several sub-datasets:
library(AppliedPredictiveModeling)
data(solubility)
ls(pattern="sol")
[1] "solTestX"       "solTestXtrans"  "solTestY"
[4] "solTrainX"      "solTrainXtrans" "solTrainY"
To inspect each dataset once it is loaded, you can use the functions dim(), head(), tail(), and summary().
Time series analysis
A time series can be defined as a set of values obtained at successive points in time, often with equal intervals between them. There are different frequencies, such as annual, quarterly, monthly, weekly, and daily. For a GDP (gross domestic product) time series, we usually use quarterly or annual data. For stock quotes, we typically have annual, monthly, and daily frequencies. The following code retrieves US GDP data at both annual and quarterly frequency:
path<-"http://canisius.edu/~yany/RData/"
dataSet<-"usGDPannual"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
head(.usGDPannual)
  YEAR  GDP
1 1930 92.2
2 1931 77.4
3 1932 59.5
4 1933 57.2
5 1934 66.8
6 1935 74.3
dataSet<-"usGDPquarterly"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
head(.usGDPquarterly)
    DATE GDP_CURRENT GDP2009DOLLAR
1 1947Q1       243.1        1934.5
2 1947Q2       246.3        1932.3
3 1947Q3       250.1        1930.3
4 1947Q4       260.3        1960.7
5 1948Q1       266.2        1989.5
6 1948Q2       272.9        2021.9
However, time series analysis raises many issues. For example, from a macroeconomic point of view, there are business or economic cycles. Industries and companies can also show seasonality. In agriculture, for instance, farmers spend more in spring and fall and less in winter. For retailers, there is a huge cash inflow at the end of the year.
To manipulate time series, we can use the many useful functions in the R package called timeSeries. In this example, we convert daily data to weekly frequency by averaging:
library(timeSeries)
data(MSFT)
x <- MSFT
by <- timeSequence(from = start(x), to = end(x), by = "week")
y<-aggregate(x,by,mean)
We can also use the head() function to look at the first few observations:
head(x)
GMT
              Open    High     Low   Close   Volume
2000-09-27 63.4375 63.5625 59.8125 60.6250 53077800
2000-09-28 60.8125 61.8750 60.6250 61.3125 26180200
2000-09-29 61.0000 61.3125 58.6250 60.3125 37026800
2000-10-02 60.5000 60.8125 58.2500 59.1250 29281200
2000-10-03 59.5625 59.8125 56.5000 56.5625 42687000
2000-10-04 56.3750 56.5625 54.5000 55.4375 68226700
head(y)
GMT
              Open    High     Low   Close   Volume
2000-09-27 63.4375 63.5625 59.8125 60.6250 53077800
2000-10-04 59.6500 60.0750 57.7000 58.5500 40680380
2000-10-11 54.9750 56.4500 54.1625 55.0875 36448900
2000-10-18 53.0375 54.2500 50.8375 52.1375 50631280
2000-10-25 61.7875 64.1875 60.0875 62.3875 86457340
2000-11-01 66.1375 68.7875 65.8500 67.9375 53496000
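To make the weekly averaging concrete, here is a minimal Python sketch that reproduces the first two weekly Close values from the six daily rows shown above. The bucketing rule (each weekly breakpoint collects the days after the previous breakpoint, up to and including itself) is an assumption inferred from the printed output, not taken from the timeSeries documentation; the names `daily_close`, `breakpoints`, and `weekly` are illustrative.

```python
from datetime import date
from statistics import mean

# Daily closing prices copied from the head(x) output.
daily_close = {
    date(2000, 9, 27): 60.6250,
    date(2000, 9, 28): 61.3125,
    date(2000, 9, 29): 60.3125,
    date(2000, 10, 2): 59.1250,
    date(2000, 10, 3): 56.5625,
    date(2000, 10, 4): 55.4375,
}

# Weekly breakpoints, seven days apart, starting at the first observation.
breakpoints = [date(2000, 9, 27), date(2000, 10, 4)]

weekly = {}
for k, end in enumerate(breakpoints):
    start = breakpoints[k - 1] if k > 0 else None
    # Assumed rule: a bucket holds the days in (previous breakpoint, this one].
    bucket = [v for d, v in daily_close.items()
              if d <= end and (start is None or d > start)]
    weekly[end] = mean(bucket)

for day, value in weekly.items():
    print(day, round(value, 4))
```

Under this assumption the second bucket averages the five trading days from 2000-09-28 to 2000-10-04, which agrees with the Close value of 58.55 in the head(y) output.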
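Predicting future events
There are many methods we can use to predict the future, such as moving averages, regression, and autoregression. Let's start with the simplest one, the moving average:
movingAverageFunction<- function(data,n=10){
  out= data
  for(i in n:length(data)){
    # mean of the n most recent values up to and including position i
    out[i] = mean(data[(i-n+1):i])
  }
  return(out)
}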
In the preceding code, the default number of periods is 10. We can use the dataset called MSFT included in the R package timeSeries (see the following code):
library(timeSeries)
data(MSFT)
p<-MSFT$Close
ma<-movingAverageFunction(p,3)  # 3-period window, consistent with the output below
head(p)
[1] 60.6250 61.3125 60.3125 59.1250 56.5625 55.4375
head(ma)
[1] 60.62500 61.31250 60.75000 60.25000 58.66667 57.04167
mean(p[1:3])
[1] 60.75
mean(p[2:4])
[1] 60.25
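Checking by hand, we see that the mean of the first three values of p matches the third value of ma, and the mean of values two to four matches the fourth. In this sense, a moving average can be used to predict the future: the latest average serves as the forecast for the next period.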
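For readers working in Python, the same trailing moving average can be sketched as follows. This is a minimal illustration that mirrors the R function above (the first n-1 entries are left unchanged); the function name `moving_average` and the idea of using the last average as a naive forecast are illustrative choices.

```python
def moving_average(data, n=10):
    """Trailing n-period mean; the first n-1 entries are copied unchanged."""
    out = list(data)
    for i in range(n - 1, len(data)):
        window = data[i - n + 1 : i + 1]   # the n most recent values
        out[i] = sum(window) / n
    return out

# First six MSFT closing prices from the head(p) output.
p = [60.6250, 61.3125, 60.3125, 59.1250, 56.5625, 55.4375]

ma = moving_average(p, n=3)
print([round(v, 5) for v in ma])

# A naive one-step-ahead forecast: carry the latest average forward.
forecast = ma[-1]
print(round(forecast, 5))
```

With a 3-period window the third and fourth entries are 60.75 and 60.25, the same numbers as mean(p[1:3]) and mean(p[2:4]) in the R session.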
The next example shows how to estimate the expected market return for next year. Here we use the S&P 500 index and take its historical average annual return as the expected value. The first few commands load the related dataset, called .sp500monthly. The goal of the program is to estimate the annual mean and its 90% confidence interval:
library(data.table)
path<-'http://canisius.edu/~yany/RData/'
dataSet<-'sp500monthly.RData'
link<-paste(path,dataSet,sep='')
load(url(link))
[min mean max ]
cat(min2,ourMean,max2,"\n")
0.05032956 0.09022369 0.1301178
As the result shows, the historical average annual return of the S&P 500 is about 9%. However, we cannot say that next year's index return will be 9%. It could be anywhere from 5% to 13%, which is a wide range.
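The interval calculation itself is short. The sketch below shows one common way to build a 90% confidence interval for a mean, using a normal approximation (mean plus or minus z times the standard error). The annual returns in it are made up for illustration and are not the S&P 500 data, and the source does not show which multiplier its program used, so treat the choice of z as an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical annual returns, for illustration only (not the S&P 500 data).
annual_returns = [0.12, -0.05, 0.21, 0.08, 0.15, -0.10, 0.03, 0.18, 0.09, 0.02]

n = len(annual_returns)
avg = mean(annual_returns)
std_err = stdev(annual_returns) / sqrt(n)   # standard error of the mean

# Two-sided 90% interval: 5% in each tail, so take the 95th percentile.
z = NormalDist().inv_cdf(0.95)              # about 1.645

lower = avg - z * std_err
upper = avg + z * std_err
print(round(lower, 4), round(avg, 4), round(upper, 4))
```

With few observations, a Student t quantile would give a somewhat wider interval than the normal approximation used here.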
Seasonality
The next example shows how to use autocorrelation. First, we load the R package called astsa, which stands for Applied Statistical Time Series Analysis. Then we load US GDP at quarterly frequency:
library(astsa)
path<-"http://canisius.edu/~yany/RData/"
dataSet<-"usGDPquarterly"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
x<-.usGDPquarterly$DATE
y<-.usGDPquarterly$GDP_CURRENT
plot(x,y)
diff4 = diff(y,4)   # difference against the same quarter one year earlier
acf2(diff4,24)      # sample ACF and PACF up to lag 24
In the code above, the diff() function takes differences, that is, the current value minus an earlier value. The second argument specifies the lag.
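As a small illustration of what a lag-4 difference does, the following Python sketch applies it to the first six quarterly GDP_CURRENT values listed earlier. The helper name `lagged_diff` is illustrative.

```python
def lagged_diff(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

# First six quarterly GDP_CURRENT values (1947Q1 to 1948Q2).
gdp = [243.1, 246.3, 250.1, 260.3, 266.2, 272.9]

# With lag=4, each quarter is compared with the same quarter a year earlier,
# which removes a pattern that repeats every four quarters.
diff4 = lagged_diff(gdp, lag=4)
print([round(d, 1) for d in diff4])  # [23.1, 26.6]
```

Six observations yield only two lag-4 differences (1948Q1 minus 1947Q1, and 1948Q2 minus 1947Q2); in general the differenced series is shorter than the original by the lag.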
The acf2() function is used to compute and plot the ACF and PACF of a time series. ACF stands for autocorrelation function and PACF stands for partial autocorrelation function. The related graphs are as follows:

Visualizing components
Concepts and datasets are clearly much easier to understand when we can use graphs. The first example shows the movement of US GDP over the past 50 years:
path<-"http://canisius.edu/~yany/RData/"
dataSet<-"usGDPannual"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
title<-"US GDP"
xTitle<-"Year"
yTitle<-"US annual GDP"
x<-.usGDPannual$YEAR
y<-.usGDPannual$GDP
plot(x,y,main=title,xlab=xTitle,ylab=yTitle)
The corresponding graph is as follows:

If we use a log scale for GDP, we have the following code and graph:
yTitle<-"Log US annual GDP"
plot(x,log(y),main=title,xlab=xTitle,ylab=yTitle)
The resulting graph is close to a straight line:

R package - LiblineaR
This package provides linear predictive models based on the LIBLINEAR C/C++ library. Here is one example using the iris dataset. The program tries to predict which category a plant belongs to by using the training data:
library(LiblineaR)
data(iris)
attach(iris)
x=iris[,1:4]
y=factor(iris[,5])
train=sample(1:dim(iris)[1],100)
xTrain=x[train,];xTest=x[-train,]
yTrain=y[train]; yTest=y[-train]
s=scale(xTrain,center=TRUE,scale=TRUE)
The output is as follows. BCR is the balanced classification rate; the higher it is, the better:
cat("Best model type is:",bestType,"\n")
Best model type is: 4
cat("Best cost is:",bestCost,"\n")
Best cost is: 1
cat("Best accuracy is:",bestAcc,"\n")
Best accuracy is: 0.98
print(res)
            yTest
             setosa versicolor virginica
  setosa         16          0         0
  versicolor      0         17         0
  virginica       0          3        14
print(BCR)
[1] 0.95
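The BCR value can be checked by hand from the confusion matrix. The Python sketch below assumes that rows are predicted classes and columns are the true classes in yTest (the layout suggested by the printout), and computes the balanced classification rate as the average of the per-class recalls.

```python
# Confusion matrix from the print(res) output.
# Assumed layout: rows = predicted class, columns = true class (yTest).
classes = ["setosa", "versicolor", "virginica"]
res = [
    [16, 0, 0],    # predicted setosa
    [0, 17, 0],    # predicted versicolor
    [0, 3, 14],    # predicted virginica
]

# Per-class recall: correct predictions divided by the column total.
recalls = []
for j in range(len(classes)):
    column_total = sum(res[i][j] for i in range(len(classes)))
    recalls.append(res[j][j] / column_total)

# Balanced classification rate: unweighted mean of the recalls.
bcr = sum(recalls) / len(recalls)
print(dict(zip(classes, recalls)))
print(round(bcr, 2))
```

Under that layout, three of the 20 true versicolor flowers are predicted as virginica, so the recalls are 1.0, 0.85, and 1.0, and their mean is the 0.95 printed by the R program.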
R package - eclust
This package provides environment-based clustering for interpretable predictive models in high-dimensional data. First, let's look at a dataset called simdata, which contains the package's simulated data:
library(eclust)
data("simdata")
dim(simdata)
[1] 100 502
simdata[1:5, 1:6]
              Y E      Gene1      Gene2      Gene3       Gene4
[1,] -94.131497 0 -0.4821629  0.1298527  0.4228393  0.36643188
[2,]   7.134990 0 -1.5216289 -0.3304428 -0.4384459  1.57602830
[3,]   1.974194 0  0.7590055 -0.3600983  1.9006443 -1.47250061
[4,] -44.855010 0  0.6833635  1.8051352  0.1527713 -0.06442029
[5,]  23.547378 0  0.4587626 -0.3996984 -0.5727255 -1.75716775
table(simdata[,"E"])
 0  1
50 50
The preceding output shows that the dimensions of the data are 100 x 502. Y is a continuous response vector, and E is the binary environment variable for the ECLUST method: E = 0 means unexposed (n = 50) and E = 1 means exposed (n = 50).
The following R program evaluates Fisher's z-transformation:
library(eclust)
data("simdata")
X = simdata[,c(-1,-2)]
firstCorr<-cor(X[1:50,])
secondCorr<-cor(X[51:100,])
score<-u_fisherZ(n0=100,cor0=firstCorr,n1=100,cor1=secondCorr)
dim(score)
[1] 500 500
score[1:5,1:5]
          Gene1     Gene2    Gene3     Gene4     Gene5
Gene1  1.000000 -8.062020 6.260050 -8.133437 -7.825391
Gene2 -8.062020  1.000000 9.162208 -7.431822 -7.814067
Gene3  6.260050  9.162208 1.000000  8.072412  6.529433
Gene4 -8.133437 -7.431822 8.072412  1.000000 -5.099261
Gene5 -7.825391 -7.814067 6.529433 -5.099261  1.000000
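Let us define Fisher's z-transformation. Assuming we have a set of n pairs x_i and y_i, we can estimate their correlation with the following formula:

$$\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Here, $\rho$ is the correlation between the two variables, and $\bar{x}$ and $\bar{y}$ are the sample means of the random variables x and y. The value of z is defined as follows:

$$z = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right) = \operatorname{arctanh}(\rho)$$

where ln is the natural logarithm function and arctanh() is the inverse hyperbolic tangent function.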
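The two definitions can be checked numerically. The following Python sketch computes the sample correlation of a small made-up dataset and then applies the z-transformation in both of its equivalent forms; the data and the helper names `pearson_r` and `fisher_z` are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher's z-transformation written with the natural logarithm."""
    return 0.5 * math.log((1 + r) / (1 - r))

# Made-up paired data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(xs, ys)
z = fisher_z(r)
print(round(r, 4), round(z, 4), round(math.atanh(r), 4))
```

The logarithmic form and math.atanh give the same value, and z grows without bound as the correlation approaches 1 or -1.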
Model selection
When looking for a good model, we sometimes face underfitting or overfitting. The following example is borrowed from an external source. It demonstrates these problems and shows how linear regression with polynomial features can be used to approximate nonlinear functions. The function is given by:

In the following program, we try to approximate the function with a linear model and with polynomial models. The slightly modified code is shown here. The program illustrates the effect of underfitting and overfitting on a model:
import sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
The resulting graph is as follows:
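Since only the imports of that program are shown, here is a separate, self-contained sketch of the same idea using NumPy alone. It is not the program from the source. The target function cos(1.5 * pi * x), the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration. The sketch fits polynomials of increasing degree and compares the error on the training sample with the error on a held-out sample.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fun(x):
    # Assumed target function, for illustration only.
    return np.cos(1.5 * np.pi * x)

# Noisy training and validation samples on [0, 1].
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = true_fun(x_train) + rng.normal(0, 0.1, 30)
x_valid = np.sort(rng.uniform(0, 1, 30))
y_valid = true_fun(x_valid) + rng.normal(0, 0.1, 30)

train_mse, valid_mse = {}, {}
for degree in (1, 4, 12):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = float(np.mean((model(x_train) - y_train) ** 2))
    valid_mse[degree] = float(np.mean((model(x_valid) - y_valid) ** 2))
    print(degree, train_mse[degree], valid_mse[degree])
```

A straight line (degree 1) cannot follow the curve, so both errors are large, which is underfitting. A very high degree keeps lowering the training error but typically stops helping, or hurts, on the held-out sample, which is the signature of overfitting.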

Python package - model-catwalk
An example is available online. The first few lines of the code are as follows:
import datetime
import pandas
from sqlalchemy import create_engine
from metta import metta_io as metta
from catwalk.storage import FSModelStorageEngine, CSVMatrixStore
from catwalk.model_trainers import ModelTrainer
from catwalk.predictors import Predictor
from catwalk.evaluation import ModelEvaluator
from catwalk.utils import save_experiment_and_get_hash
help(FSModelStorageEngine)
The corresponding output is shown here. To save space, only the top part is shown:
Help on class FSModelStorageEngine in module catwalk.storage:

class FSModelStorageEngine(ModelStorageEngine)
 |  Method resolution order:
 |      FSModelStorageEngine
 |      ModelStorageEngine
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __init__(self, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  get_store(self, model_hash)
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from ModelStorageEngine:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
Python package - sklearn
Since sklearn is a very useful package, it is worth showing more examples of its use. The example here shows how to use the package to classify documents by topic with a bag-of-words approach.
The example uses a scipy.sparse matrix to store the features and demonstrates various classifiers that can handle sparse matrices efficiently. It uses the 20 newsgroups dataset, which is downloaded automatically and then cached. The zip file containing the input files can also be downloaded separately, and the code is available online. To save space, only the first few lines are shown:
import logging
import numpy as np
from optparse import OptionParser
import sys
from time import time
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.feature_selection import SelectFromModel
The corresponding output is as follows:

For each method there are three measures: score, training time, and testing time.
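Julia package - QuantEcon
As an example, consider the use of Markov chains:
using QuantEcon
using Statistics  # provides mean() on Julia 1.0 and later
P = [0.4 0.6; 0.2 0.8];
mc = MarkovChain(P)
x = simulate(mc, 100000);
mean(x .== 1)
Result (the share of periods spent in state 1; the stationary probability of that state for this matrix is 0.25, so the simulated share should be close to it):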

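The purpose of this example is to see how a person in one economic status may move to another economic status in the future. First, let's look at the following chart:

Look at the leftmost oval, the "poor" status. The value 0.9 means that a person in this status has a 90% probability of remaining poor, and a 10% probability of moving to the middle class. The chart can be represented by the following matrix, where zeros mark the places with no edge between nodes:

$$P = \begin{pmatrix} 0.9 & 0.1 & 0.0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

Two states x and y are said to communicate with each other if there are positive integers j and k such that:

$$P^j(x, y) > 0 \quad \text{and} \quad P^k(y, x) > 0$$

A Markov chain P is called irreducible if all states communicate; that is, if x and y communicate for every pair (x, y). The following code checks this:
using QuantEcon
P = [0.9 0.1 0.0; 0.4 0.4 0.2; 0.1 0.1 0.8];
mc = MarkovChain(P)
is_irreducible(mc)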
The next graph represents an extreme case, since a poor person remains poor in the future with 100% probability:

The following code confirms this, since the result is false:
using QuantEcon
P2 = [1.0 0.0 0.0; 0.1 0.8 0.1; 0.0 0.2 0.8];
mc2 = MarkovChain(P2)
is_irreducible(mc2)
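To show what such a check involves, here is a minimal Python sketch that tests irreducibility by graph reachability: a chain is irreducible when every state can reach every other state along edges with positive probability. This illustrates the definition and is not a description of how QuantEcon implements is_irreducible; the Python function names are illustrative.

```python
from collections import deque

def reachable(P, start):
    """States reachable from start along positive-probability transitions."""
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j, prob in enumerate(P[i]):
            if prob > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    """True when every state can reach every other state."""
    n = len(P)
    return all(len(reachable(P, s)) == n for s in range(n))

# The two transition matrices used in the Julia examples.
P = [[0.9, 0.1, 0.0], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8]]
P2 = [[1.0, 0.0, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]

print(is_irreducible(P), is_irreducible(P2))  # True False
```

In P2 the first state is absorbing: once poor, the chain never leaves, so the other two states cannot be reached from it and the chain is not irreducible.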
Granger causality test
The Granger causality test is used to determine whether one time series is a factor that offers useful information for forecasting a second time series. The following code uses the dataset called ChickEgg as an illustration. The dataset has two columns, the number of chickens and the number of eggs, with a timestamp:
library(lmtest)
data(ChickEgg)
dim(ChickEgg)
[1] 54 2
ChickEgg[1:5,]
     chicken  egg
[1,]  468491 3581
[2,]  449743 3532
[3,]  436815 3327
[4,]  444523 3255
[5,]  433937 3156
The question is whether this year's number of eggs can be used to predict next year's number of chickens. If so, we say that the number of eggs Granger-causes the number of chickens. If not, we say that the number of eggs does not Granger-cause the number of chickens. The related code is as follows:
library(lmtest)
data(ChickEgg)
grangertest(chicken~egg, order = 3, data = ChickEgg)
Granger causality test

Model 1: chicken ~ Lags(chicken, 1:3) + Lags(egg, 1:3)
Model 2: chicken ~ Lags(chicken, 1:3)
  Res.Df Df     F   Pr(>F)
1     44
2     47 -3 5.405 0.002966 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In Model 1, we try to use lags of chickens plus lags of eggs to explain the number of chickens. Because the p-value is very small (significant at the 0.01 level), we can say that the number of eggs Granger-causes the number of chickens.
The next test shows that data on chickens cannot be used to predict the number of eggs in the next period:
grangertest(egg~chicken, order = 3, data = ChickEgg)
Granger causality test

Model 1: egg ~ Lags(egg, 1:3) + Lags(chicken, 1:3)
Model 2: egg ~ Lags(egg, 1:3)
  Res.Df Df      F Pr(>F)
1     44
2     47 -3 0.5916 0.6238
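The F statistic in these tables compares a model with only the target's own lags (Model 2) against a model that also includes lags of the other series (Model 1). The Python sketch below works through that comparison on synthetic data. It is an illustration of the general idea rather than a reimplementation of lmtest's grangertest; the data, the helper names, and the small least-squares solver are all assumptions made for the example.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):                      # Gaussian elimination
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        s = a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / a[i][i]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(y, x, order=1):
    """F statistic: do lags of x help predict y beyond y's own lags?"""
    n = len(y)
    targets = y[order:]
    restricted = [[1.0] + [y[t - l] for l in range(1, order + 1)]
                  for t in range(order, n)]
    unrestricted = [row + [x[t - l] for l in range(1, order + 1)]
                    for row, t in zip(restricted, range(order, n))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    df_u = len(targets) - (2 * order + 1)     # residual df of the full model
    return ((rss_r - rss_u) / order) / (rss_u / df_u)

# Synthetic data: y depends on the previous value of x, but not vice versa.
random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, n)]

f_x_to_y = granger_f(y, x, order=1)
f_y_to_x = granger_f(x, y, order=1)
print(f_x_to_y, f_y_to_x)
```

With 54 observations and order 3, the same degrees-of-freedom arithmetic gives 51 - 7 = 44 for the full model and 51 - 4 = 47 for the restricted one, matching the Res.Df column above. A p-value would come from the F distribution with (order, df_u) degrees of freedom.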
In the next example, we check the returns of IBM and the S&P 500 to see whether one Granger-causes the other.
First, we define a return function:
ret_f<-function(x,ticker=""){
  n<-nrow(x)
  p<-x[,6]                          # price column used to compute returns
  ret<-p[2:n]/p[1:(n-1)]-1          # simple one-period returns
  output<-data.frame(x[2:n,1],ret)
  name<-paste("RET_",toupper(ticker),sep='')
  colnames(output)<-c("DATE",name)
  return(output)
}
x<-read.csv("http://canisius.edu/~yany/data/ibmDaily.csv",header=T)
ibmRet<-ret_f(x,"ibm")
x<-read.csv("http://canisius.edu/~yany/data/^gspcDaily.csv",header=T)
mktRet<-ret_f(x,"mkt")
final<-merge(ibmRet,mktRet)
head(final)
        DATE      RET_IBM       RET_MKT
1 1962-01-03  0.008742545  0.0023956877
2 1962-01-04 -0.009965497 -0.0068887673
3 1962-01-05 -0.019694350 -0.0138730891
4 1962-01-08 -0.018750380 -0.0077519519
5 1962-01-09  0.011829467  0.0004340133
6 1962-01-10  0.001798526 -0.0027476933
Now we can call the test function with these inputs. The goal of this program is to test whether lags of the market return can be used to explain IBM's returns. In the same way, we then check whether lags of IBM's return explain the market return:
library(lmtest)
grangertest(RET_IBM ~ RET_MKT, order = 1, data =final)
Granger causality test

Model 1: RET_IBM ~ Lags(RET_IBM, 1:1) + Lags(RET_MKT, 1:1)
Model 2: RET_IBM ~ Lags(RET_IBM, 1:1)
  Res.Df Df      F    Pr(>F)
1  14149
2  14150 -1 24.002 9.729e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The result shows that the S&P 500 can be used to explain IBM's return in the next period, since it is statistically significant at the 0.1% level. The following code checks whether lags of IBM's return explain changes in the S&P 500:
grangertest(RET_MKT ~ RET_IBM, order = 1, data =final)
Granger causality test

Model 1: RET_MKT ~ Lags(RET_MKT, 1:1) + Lags(RET_IBM, 1:1)
Model 2: RET_MKT ~ Lags(RET_MKT, 1:1)
  Res.Df Df      F   Pr(>F)
1  14149
2  14150 -1 7.5378 0.006049 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This result suggests that, over this period, IBM's returns can be used to explain the S&P 500 index return in the next period.