Word2vec is practically the only deep learning algorithm that can be run relatively easily on an ordinary PC (rather than on video cards) and that builds a distributed representation of words in a reasonable time. At least, that is what they think at Kaggle. After reading here about the tricks you can do with a trained model, I realized I simply had to try it. There was just one problem: I work mainly in R, and I could not find an official implementation of word2vec for R; I think it simply does not exist.
However, there is the word2vec source code in C and a description on Google's site, and R can use external libraries written in C, C++, and Fortran. Incidentally, the fastest R libraries are written in C and C++. There is also the tmcn.word2vec R wrapper, which is under development. Its author, Jian Li (website in Chinese), made something like a demo for Chinese (it also works with English; I have not tried it with Russian yet). The problems with this version are:
- First, all the parameters are hard-coded in the C code.
- Second, the author wrote only one function for working with a trained model: it evaluates word similarity and prints the 20 candidates with the highest values.
- Third, I could not build the package for x64 Windows. On win32 the package installs without problems.
Having assessed all this "wealth", I decided to write my own version of an R interface to word2vec. To be honest, I do not know C well and have only written simple programs in it, so I decided to start from Jian Li's source code, since it definitely compiles on Windows. If something does not work, it can always be compared with the original.
Preparation
To compile C code for R on Windows, you additionally need to install Rtools. This toolset includes the gcc compiler, which runs under Cygwin. After installing Rtools, check the PATH variable. It should contain something like:
D:\Rtools\bin;D:\Rtools\gcc-4.6.3\bin;D:\R\bin
On OS X, Rtools is not needed. You need an installed compiler, whose presence can be checked with the gcc --version command. If it is missing, install Xcode and, through Xcode, the Command Line Tools.
To call C libraries from R, you need to know the following:
- All values are passed to the function as pointers, and you must take care to declare their types explicitly. The most reliable approach is to pass parameters as char and convert them to the required type in C.
- The called function does not return a value; it must be of type void.
- The C code must include the directive #include <R.h> and, if there is complex math, #include <Rmath.h> as well.
- If you need to print something to the R console, it is better to use Rprintf() instead of printf(). That said, printf() works for me too.
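The calling convention in the list above can be sketched in C as follows. This is only an illustration, not code from the package: the function name demo_wrapper and its arguments are hypothetical. It is a void function, every argument arrives as a pointer (R character vectors arrive as char **), the settings are passed as strings and converted in C, and the results go back through pointer arguments. <R.h> is left out so that the sketch also compiles outside R; in a library built with R CMD SHLIB you would include it as described above.

```c
#include <stdlib.h>

/* Hypothetical wrapper in the style expected by R's .C() interface.
 * R character vectors arrive as char **, numeric vectors as int * or double *.
 * Nothing is returned: results are written into the output arguments. */
void demo_wrapper(char **num_features, char **sample,
                  int *features_out, double *sample_out)
{
    /* The parameters were passed from R as strings; convert them here. */
    *features_out = atoi(num_features[0]);
    *sample_out = atof(sample[0]);
}
```

From R, such a function could be called as .C("demo_wrapper", as.character(300), as.character(0.001), features_out = integer(1), sample_out = double(1)); the returned list then contains the filled-in output arguments.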
To begin with, I decided to make something very simple, like Hello, World!, but one that takes a value as an argument. RStudio, which I normally use, lets you write C and C++ code and highlights everything correctly. After writing the code and saving it as hello.c, I opened the command line, changed to the right directory, and ran the compiler with the following command:
> R --arch x64 CMD SHLIB hello.c
On win32 the architecture flag is not needed:
> R CMD SHLIB hello.c
As a result, two files appeared in the directory: hello.o (it can safely be deleted) and the library hello.dll. (On OS X you get a file with the .so extension instead of a dll.) The resulting hello function is called from R with the following code:
dyn.load("hello.dll")
hellof <- function(n) {
  .C("hello", as.integer(n))
}
hellof(5)
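A minimal sketch of what such a hello.c might look like, assuming a single integer argument as in the R call above. The body is my guess, not the original file. It uses plain printf() so that it also builds outside R; inside R you would normally include <R.h> and call Rprintf() instead.

```c
#include <stdio.h>

/* Called from R as .C("hello", as.integer(n)): n arrives as a pointer. */
void hello(int *n)
{
    int i;
    /* Print the greeting *n times; the argument itself is left unchanged. */
    for (i = 0; i < *n; i++) {
        printf("Hello, World!\n");
    }
}
```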
The test showed that everything works correctly, so all that remained for the word2vec experiments was to prepare the data. I decided to take it from the Kaggle task Bag of Words Meets Bags of Popcorn. It has training, test, and unlabeled samples, which together contain one hundred thousand movie reviews from IMDB. After downloading these files, I removed HTML tags, special characters, digits, punctuation, and stop words from them and tokenized them. I will skip the processing details, since I have already written about them.
Word2vec accepts training data as a text file containing a single long line of words separated by spaces (I found this out by analyzing the word2vec usage examples in the official documentation). I concatenated the datasets into a single line and saved it to a text file.
Model
In Jian Li's version there are two files, word2vec.h and word2vec.c. The first contains the main code, which largely matches the original word2vec.c. The second is a wrapper for calling the TrainModel() function. The first thing I decided to do was to expose all the model parameters in the R code. I had to edit the R script and the wrapper in word2vec.c, and ended up with the following construction:
dyn.load("word2vec.dll")

word2vec <- function(train_file, output_file, binary, cbow, num_threads,
                     num_features, window, min_count, sample)
{
  # ...
  OUT <- .C("CWrapper_word2vec",
            train_file = as.character(train_file),
            output_file = as.character(output_file),
            binary = as.character(binary),
            # ...
  )
  # ... OUT ...
}

word2vec("train_data.txt", "model.bin",
         binary = 1,          # output format, 1-binary, 0-txt
         cbow = 0,            # skip-gram (0) or continuous bag of words (1)
         num_threads = 1,     # num of workers
         num_features = 300,  # word vector dimensionality
         window = 10,         # context / window size
         min_count = 40,      # minimum word count
         sample = 1e-3        # downsampling of frequent words
)
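A few words about the parameters:
binary - the output format of the model.
cbow - which training algorithm to use: skip-gram or continuous bag of words (cbow). Skip-gram is slower but gives better results on rare words.
num_threads - the number of processor threads used to build the model.
num_features - the dimensionality of the word space (or of the vector for each word); tens to hundreds are recommended.
window - how many context words the training algorithm should take into account.
min_count - limits the vocabulary to significant words. Words that occur in the text fewer times than the specified number are ignored. The recommended value is from ten to one hundred.
sample - the frequency threshold above which words are downsampled; values from .00001 to .01 are recommended.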
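To make the sample parameter more concrete: in the original word2vec.c, as far as I can tell from the code, the threshold is turned into a probability of keeping each occurrence of a word during training. The notation below (f, t, P_keep) is introduced here for illustration and does not appear in the source.

```latex
% f = count(w) / total_words : relative frequency of word w in the corpus
% t = sample                 : the threshold parameter
P_{\text{keep}}(w) = \min\!\left(1,\ \left(\sqrt{\frac{f}{t}} + 1\right)\frac{t}{f}\right)
% Example with t = 10^{-3}: a word with f = 4 \cdot 10^{-3} is kept with
% probability (\sqrt{4} + 1) / 4 = 0.75, while a word with f \le t is always kept.
```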
I compiled with the following command, using the flags recommended in the makefile:
> R --arch x64 CMD SHLIB -lm -pthread -O3 -march=native -Wall -funroll-loops -Wno-unused-result word2vec.c
The compiler issued a few warnings, but nothing serious. I loaded the library into R without problems using dyn.load("word2vec.dll") and ran the function of the same name. I think only the pthread flag is useful. You can do without the rest (some of them are already set in the Rtools configuration).
Result:
In total, my file turned out to contain 11.5 million words with a vocabulary of 19,133 words, and building the model took 6 minutes on a computer with an Intel Core i7. To check whether my parameters work, I changed num_threads from one to six. I did not even need to look at the resource monitor: the model build time dropped to a minute and a half. In other words, this thing can process eleven million words in minutes.
Evaluating similarity
In distance I changed practically nothing; I only exposed a parameter for the number of returned values. I then compiled the library, loaded it into R, and ran a quick check on the words "bad" and "good":
Word: bad  Position in vocabulary: 15
         Word   CosDist
1    terrible 0.5778409
2    horrible 0.5541780
3       lousy 0.5527389
4       awful 0.5206609
5   laughably 0.4910716
6   atrocious 0.4841466
7  horrendous 0.4808238
8        good 0.4805901
9       worse 0.4726501
10     horrid 0.4579800
Word: good  Position in vocabulary: 6
         Word   CosDist
1      decent 0.5678578
2        nice 0.5364762
3       great 0.5197815
4         bad 0.4805902
5   excellent 0.4554003
6          ok 0.4365533
7     alright 0.4361723
8      really 0.4153538
9       liked 0.4061105
10       fine 0.4004776
Everything worked well again. Interestingly, if you count in words, the distance from bad to good is greater than from good to bad. As the saying goes, "from love to hate...", and the same holds in reverse. The algorithm computes similarity as the cosine of the angle between the vectors, using the following formula (from the wiki):
similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖)
So, with a trained model, you can compute the distance without C and, instead of similarity, evaluate difference, for example. To do this, build the model in text format (binary = 0), load it into R with read.table(), and write a certain amount of code. The code, without exception handling:
similarity <- function(word1, word2, model) {
  # Column 1 holds the word; vector values are taken from columns 2..size
  size <- ncol(model) - 1
  vec1 <- model[model$word == word1, 2:size]
  vec2 <- model[model$word == word2, 2:size]
  # Cosine of the angle between the two word vectors
  sim <- sum(vec1 * vec2)
  sim <- sim / (sqrt(sum(vec1^2)) * sqrt(sum(vec2^2)))
  return(sim)
}

difference <- function(string, model) {
  words <- tokenize(string)  # tokenize() is a helper that is not shown here
  num_words <- length(words)
  diff_mx <- matrix(rep(0, num_words^2), nrow = num_words, ncol = num_words)
  for (i in 1:num_words) {
    for (j in 1:num_words) {
      sim <- similarity(words[i], words[j], model)
      if (i != j) {
        diff_mx[i, j] <- sim
      }
    }
  }
  # The word with the smallest total similarity to the others is the odd one out
  return(words[which.min(rowSums(diff_mx))])
}
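Here a square matrix is built whose size is the number of words in the query by the number of words. Then similarity is computed for each pair of non-matching words. Next, the values are summed by row and the row with the minimum sum is found. The row number corresponds to the position of the "odd" word in the query. The work can be sped up by computing only half of the matrix. A couple of examples:
> difference("squirrel deer human dog cat", model)
[1] "human"
> difference("bad red good nice awful", model)
[1] "red"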
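The half-matrix speed-up mentioned above can be sketched in C. This is an illustration under stated assumptions, not the code used for the examples: the function name odd_one_out is hypothetical, and the word vectors are assumed to be already scaled to unit length, so that the dot product equals the cosine similarity.

```c
#include <stdlib.h>

/* Dot product; equals the cosine similarity when both vectors have unit length. */
static double dot(const double *a, const double *b, size_t dim)
{
    double s = 0.0;
    size_t k;
    for (k = 0; k < dim; k++) {
        s += a[k] * b[k];
    }
    return s;
}

/* vecs: n_words rows of dim values, row-major, each row assumed unit length.
 * Returns the index of the word whose summed similarity to all the other
 * words is smallest (0 if memory cannot be allocated). */
size_t odd_one_out(const double *vecs, size_t n_words, size_t dim)
{
    double *row_sum = calloc(n_words, sizeof *row_sum);
    size_t i, j, best = 0;

    if (row_sum == NULL) {
        return 0;
    }
    for (i = 0; i < n_words; i++) {
        /* Upper triangle only: each pair is computed once and credited to both rows. */
        for (j = i + 1; j < n_words; j++) {
            double sim = dot(vecs + i * dim, vecs + j * dim, dim);
            row_sum[i] += sim;
            row_sum[j] += sim;
        }
    }
    for (i = 1; i < n_words; i++) {
        if (row_sum[i] < row_sum[best]) {
            best = i;
        }
    }
    free(row_sum);
    return best;
}
```

Because similarity is symmetric, this does roughly half the work of filling the full square matrix while producing the same row sums.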
Analogies
Searching for analogies lets you solve problems like "man is to woman as king is to ?". The dedicated word-analogy function exists only in the original Google code, so I had to tinker with it. I wrote a wrapper to call the function from R, removed the infinite loop from the code, and replaced the standard input/output streams with parameter passing. Then I compiled it into a library and ran a few experiments. I had no luck with queen; apparently eleven million words are not enough (the word2vec authors recommend about a billion). A few successful examples:
> analogy("model300.bin", "man woman king", 3)
        Word   CosDist
1     throne 0.4466286
2       lear 0.4268206
3   princess 0.4251665
> analogy("model300.bin", "man woman husband", 3)
        Word   CosDist
1       wife 0.6323696
2 unfaithful 0.5626401
3    married 0.5268299
> analogy("model300.bin", "man woman boy", 3)
        Word   CosDist
1       girl 0.6313665
2     mother 0.4309490
3    teenage 0.4272232
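The idea behind such a query can be sketched in C: form the vector for the second word minus the first plus the third and look for the closest remaining word. This is a simplified illustration, not the wrapper described above. The function name analogy_index is hypothetical, the vectors are assumed to be unit length, and the target vector is not normalized, so the ranking is the same as with cosine similarity but the scores are not the CosDist values shown in the output.

```c
#include <stddef.h>

/* vecs: n_words rows of dim values, row-major, rows assumed unit length.
 * Returns the index of the word closest to vec(b) - vec(a) + vec(c),
 * skipping the three query words. Returns n_words if no candidate exists. */
size_t analogy_index(const double *vecs, size_t n_words, size_t dim,
                     size_t a, size_t b, size_t c)
{
    size_t w, k, best = n_words;
    double best_score = 0.0;

    for (w = 0; w < n_words; w++) {
        double score = 0.0;
        if (w == a || w == b || w == c) {
            continue;
        }
        for (k = 0; k < dim; k++) {
            /* Component of the target vector b - a + c. */
            double target = vecs[b * dim + k] - vecs[a * dim + k] + vecs[c * dim + k];
            score += target * vecs[w * dim + k];
        }
        if (best == n_words || score > best_score) {
            best = w;
            best_score = score;
        }
    }
    return best;
}
```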
Clustering
After reading the documentation, I realized that word2vec has built-in K-Means clustering. To use it, it is enough to expose one more parameter in R: classes. This is the number of clusters; if it is greater than zero, word2vec produces a text file in the format word - cluster number. Three hundred clusters turned out not to be enough to get anything sensible. The heuristic from the developers is the vocabulary size divided by 5. Accordingly, I chose 3000. Here are a few successful clusters (successful in the sense that I understand why these words are close together):
             word   id
335         humor 2952
489       serious 2952
872        clever 2952
1035       humour 2952
1796   references 2952
1916       satire 2952
2061    slapstick 2952
2367       quirky 2952
2810        crude 2952
2953        irony 2952
3125   outrageous 2952
3296        farce 2952
3594        broad 2952
4870       bright 2952
4979         edgy 2952
             word   id
1025          cat  241
3242        mouse  241
11189 [unreadable]  241
             word   id
1089         army  322
1127     military  322
1556      mission  322
1558     soldiers  322
3254         navy  322
3323       combat  322
3902      command  322
3975         unit  322
4270      colonel  322
4277    commander  322
7821      platoon  322
7853      marines  322
8691        naval  322
9762          pow  322
10391          gi  322
12452       corps  322
15839    infantry  322
16697 [unreadable]  322
With the help of clustering, sentiment analysis is easy to do. For this you need to build a "bag of clusters": a matrix whose size is the number of reviews by the maximum number of clusters. Each cell of such a matrix should contain the number of words from the review that fall into the given cluster. I have not tried it, but I do not see any problems here.
They say that the accuracy for IMDB reviews comes out the same as, or slightly lower than, doing this through Bag of Words.
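One row of such a "bag of clusters" matrix could be filled as in the sketch below. It is only an illustration of the idea, since this step was not actually tried here; the function name and the array layout (vocabulary indices for the words of a review, and a table giving the cluster number of every vocabulary index) are assumptions.

```c
#include <stddef.h>

/* Fill one row of a "bag of clusters" matrix for a single review.
 * word_ids:   vocabulary indices of the words in the review
 * cluster_of: cluster number for every vocabulary index
 * counts:     output row with n_clusters entries, zeroed here */
void bag_of_clusters(const int *word_ids, size_t n_words,
                     const int *cluster_of, size_t n_clusters, int *counts)
{
    size_t i;

    for (i = 0; i < n_clusters; i++) {
        counts[i] = 0;
    }
    for (i = 0; i < n_words; i++) {
        int c = cluster_of[word_ids[i]];
        /* Ignore cluster numbers that fall outside the row. */
        if (c >= 0 && (size_t)c < n_clusters) {
            counts[c]++;
        }
    }
}
```

Stacking one such row per review gives the reviews-by-clusters matrix, which can then be passed to any classifier in place of a bag-of-words matrix.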
Phrases
Word2vec can work with phrases, or rather with stable word combinations. For this, the original code has the word2phrase procedure. Its job is to find frequently occurring word combinations and replace the space between them with an underscore. The file obtained after the first pass contains two-word combinations. If you send it to word2phrase again, triples and quadruples appear. The result can then be used to train word2vec.
I called this procedure from R by analogy with word2vec:
word2phrase("train_data.txt", "train_phrase.txt", min_count=5, threshold=100)
The min_count parameter excludes phrases that occur fewer times than the specified value. The threshold value controls the sensitivity of the algorithm: the larger the value, the fewer phrases are detected. After the second pass I got about six thousand combinations. To look at the phrases themselves, I first built a model in text format, extracted the column of words from it, and filtered by underscore. Here is a fragment as an example:
[5887] "works_perfectly" "four_year_old" "multi_million_dollar"
[5890] "fresh_faced" "return_living_dead" "seemed_forced"
[5893] "freddie_prinze_jr" "re_lucky" "puerto_rico"
[5896] "every_sentence" "living_hell" "went_straight"
[5899] "supporting_cast_include" "action_set_pieces" "space_shuttle"
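How min_count and threshold interact can be sketched as follows. The scoring formula is the one I understand the original word2phrase.c to use for a pair of adjacent words; the function names and argument names are hypothetical, and the sketch leaves out reading the text and counting the pairs.

```c
/* Score for a pair of adjacent words "a b".
 * count_ab: occurrences of the pair; count_a, count_b: occurrences of each word;
 * train_words: total number of words in the training text. */
double phrase_score(long long count_ab, long long count_a, long long count_b,
                    long long train_words, int min_count)
{
    /* Words rarer than min_count never form a phrase. */
    if (count_a < min_count || count_b < min_count) {
        return 0.0;
    }
    return (double)(count_ab - min_count) / (double)count_a
           / (double)count_b * (double)train_words;
}

/* The pair is joined with an underscore when its score exceeds the threshold. */
int is_phrase(long long count_ab, long long count_a, long long count_b,
              long long train_words, int min_count, double threshold)
{
    return phrase_score(count_ab, count_a, count_b, train_words, min_count) > threshold;
}
```

For example, a pair seen 50 times, made of words seen 100 and 200 times in a text of one million words, scores about 2250 with min_count = 5 and is joined at threshold = 100, which is why raising the threshold leaves fewer phrases.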
I picked a few phrases for distance():
> distance("p_model300_2.bin", "crouching_tiger_hidden_dragon", 10)
Word: crouching_tiger_hidden_dragon  Position in vocabulary: 15492
                   Word   CosDist
1             tsui_hark 0.6041993
2               ang_lee 0.5996884
3    martial_arts_films 0.5541546
4        kung_fu_hustle 0.5381692
5           blockbuster 0.5305687
6             kill_bill 0.5279162
7            grindhouse 0.5242150
8          [unreadable] 0.5224440
9                budget 0.5141657
10             john_woo 0.5046486
> distance("p_model300_2.bin", "academy_award_winning", 10)
Word: academy_award_winning  Position in vocabulary: 15780
                   Word   CosDist
1             nominated 0.4570983
2         ever_produced 0.4558123
3  francis_ford_coppola 0.4547777
4     producer_director 0.4545878
5          set_standard 0.4512480
6         participation 0.4503479
7     won_academy_award 0.4477891
8          michael_mann 0.4464636
9           huge_budget 0.4424854
10    directorial_debut 0.4406852
That concludes the experiments. One important caveat: because word2vec works with memory directly, R may behave unstably as a result and the session may crash. Sometimes this is caused by the output of diagnostic messages from the OS, which R cannot handle correctly. If there are no errors in the code, restarting the interpreter or RStudio helps.
The R code, the C sources, and a dll compiled for x64 Windows are in my repository.
UPD: As a result of an argument with ServPonomarev and a subsequent analysis of the word2vec code, it turned out that the algorithm trains on lines of 1000 words, moving a window of plus or minus 5 words along them. When an EOL character is detected, the algorithm converts it into a special word with index zero in the vocabulary, stops moving the window, and continues on a new line. The representation of words separated by EOL in the model will differ from the representation of the same words separated by a space. Conclusion: if the source text is a collection of documents, or phrases or paragraphs separated by line feeds, do not remove this extra information; leave the EOL characters in the training set. Unfortunately, this is very hard to illustrate with examples.
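Although the effect on the trained vectors is hard to show, the effect on the training pairs themselves can be sketched. The hypothetical function below counts (center word, context word) pairs for a token stream in which 0 stands for the special end-of-line word, with the window never crossing that marker. It is a simplification: the window size is fixed, whereas the original code also shrinks the window randomly at each position and cuts lines at 1000 words.

```c
#include <stddef.h>

/* Count (center, context) pairs for a stream of vocabulary indices.
 * A token equal to 0 marks end-of-line; the window stops at that marker. */
size_t count_context_pairs(const int *tokens, size_t n, size_t window)
{
    size_t pairs = 0, start = 0, i;

    for (i = 0; i <= n; i++) {
        if (i == n || tokens[i] == 0) {
            /* A line runs from start to i - 1; count the pairs inside it. */
            size_t len = i - start, p;
            for (p = 0; p < len; p++) {
                size_t left = p < window ? p : window;
                size_t rest = len - 1 - p;
                size_t right = rest < window ? rest : window;
                pairs += left + right;
            }
            start = i + 1;
        }
    }
    return pairs;
}
```

With a window of 5, the four words 1 2 3 4 on one line give 12 pairs, while the same words split as 1 2, end-of-line, 3 4 give only 4: words on opposite sides of the line break are never trained as each other's context.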