
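A few years ago, an interview about artificial intelligence, and chatbots in particular, was published. The interviewee stressed that chatbots do not communicate; they imitate communication.
Lacking genuine human-level intelligence, they build a core of micro-dialogues plus a communication algorithm that constantly steers the conversation back to that core. That is all.
In my opinion, there is something to this...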
There is a lot of talk about chatbots on Habr these days, and they can be very different. Bots based on neural predictive networks that generate a response word by word are common. This is very interesting, but expensive to implement, especially for Russian with its large number of word forms. To implement the Boltoon chatbot, I chose a different approach.
Boltoon works on the principle of selecting the semantically closest answer from a provided database and then post-processing it. This approach has several advantages:
- speed;
- the chatbot can be used for different tasks: all it takes is loading a new database;
- after the database is updated, the bot needs no additional training.
How does it work?
There is a database containing questions for the bot and the answers to them.

The bot must correctly recognize the meaning of the entered phrase and find similar phrases in the database. For example, "How are you?", "How are you doing?" and "How's it going?" all mean the same thing. Since computers work well with numbers rather than letters, matching the entered phrase against existing phrases has to be reduced to comparing numbers. We need to convert the whole column of questions from the database into numbers, or more precisely into vectors of N real numbers. Every document then receives coordinates in an N-dimensional space. This is hard to imagine, so for clarity we can reduce the dimension of the space to 2.

In the same space we find the coordinates of the phrase entered by the user, compare it with the available phrases using the cosine metric, and pick the closest one. Boltoon is built on this simple idea.
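A minimal sketch of this lookup in plain Python, assuming the phrases already have 2-D coordinates. The phrase texts and coordinates below are made up for illustration; in Boltoon they would come from the trained model.

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_phrase(query_vec, database):
    """Return the stored phrase whose vector has the largest cosine with query_vec."""
    return max(database, key=lambda phrase: cosine(query_vec, database[phrase]))

# Hypothetical 2-D coordinates of phrases already in the database
database = {
    "How are you?": [0.9, 0.1],
    "What's the weather like?": [0.1, 0.95],
}
user_vec = [0.8, 0.3]  # hypothetical coordinates of the user's phrase
print(closest_phrase(user_vec, database))  # How are you?
```

Cosine ignores vector length, so only the direction of a phrase vector matters when choosing the answer.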

次ã«ããã¹ãŠã®é åºãšããæ£åŒãªèšèªã«ã€ããŠèª¬æããŸãã ãããã¹ãã®ãã¯ãã«è¡šçŸãïŒåèªã®åã蟌ã¿ïŒ-ãããã³ã°ã®æŠå¿µã玹ä»ããŸã
èªç¶èšèªããåºå®é·ã®ãã¯ãã«ãžã®åèªïŒéåžžã¯100ãã500次å
ããã®å€ãé«ãã»ã©è¡šçŸã¯ããæ£ç¢ºã«ãªããŸãããèšç®ãé£ãããªããŸãïŒã
For example, the words "science" and "book" are represented as follows:
v("science") = [0.956, -1.987, ...]
v("book") = [0.894, 0.234, ...]
This topic has already been covered on Habr. Distributed text representation models are the best fit for this task. Imagine a "space of meanings": an N-dimensional sphere in which every word, sentence, or paragraph is a point. The question is how to build it.
In 2013, Tomas Mikolov and colleagues published the paper "Efficient Estimation of Word Representations in Vector Space", which introduced word2vec: a set of algorithms for finding distributed representations of words. Each word is mapped to a point in a semantic space, and algebraic operations in this space correspond to operations on word meanings (hence the idea of "semantic arithmetic" on words).
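The classic example with the vectors for "king" and "woman" illustrates a very important property of this space: if you subtract the vector for "man" from the vector for "king" and add the vector for "woman", the nearest word to the result is "queen".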
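To make the arithmetic concrete, here is a toy version with hand-made 2-D vectors (hypothetical: one axis loosely stands for "royalty", the other for "male/female"). The function adds the positive vectors, subtracts the negative ones and returns the remaining word with the highest cosine. gensim's most_similar works along the same lines, but it normalizes each vector first.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(vectors, positive, negative):
    """Return the word closest to sum(positive) - sum(negative), skipping the query words."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical toy vectors: axis 0 ~ "royalty", axis 1 ~ "male (+) / female (-)"
toy = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.5, 0.0],
}
print(analogy(toy, positive=["king", "woman"], negative=["man"]))  # queen
```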
Many more examples can be found in a Yandex lecture, which also explains word2vec without heavy math.

In Python it looks like this (the gensim package must be installed):
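```python
import gensim

# Pretrained word2vec model from the Russian Distributional Thesaurus project
w2v_fpath = "all.norm-sz100-w10-cb0-it1-min100.w2v"
w2v = gensim.models.KeyedVectors.load_word2vec_format(
    w2v_fpath, binary=True, unicode_errors='ignore')
w2v.init_sims(replace=True)  # normalize vectors in place (deprecated in gensim 4.x)

# positive / negative: Russian query words
for word, score in w2v.most_similar(positive=[u"", u""], negative=[u""]):
    print(word, score)
```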
We use a word2vec model already built by the Russian Distributional Thesaurus project.
The result (cosine similarity scores of the ten closest words):
0.856020450592041, 0.8100876212120056, 0.8040660619735718, 0.7984248995780945, 0.7981560826301575, 0.7949156165122986, 0.7862951159477234, 0.7808529138565063, 0.7741949558258057, 0.7644592523574829
Let's take a closer look at the words closest to "answer". There are online resources for searching semantically related words that display the results as an ego network; for example, they can show the 20 nearest neighbours of the word "answer".

The model proposed by Mikolov is very simple: words that appear in similar contexts are assumed to have similar meanings. Let's look at the neural network architecture.

Word2vec uses a single hidden layer. The input layer has as many neurons as there are words in the vocabulary. The size of the hidden layer is the dimension of the space. The output layer has the same size as the input layer. So if the training vocabulary consists of V words and N is the dimension of a word vector, the weights between the input and hidden layers form a V×N matrix SYN0.

Each of the V rows is the N-dimensional vector representation of one word.
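Similarly, the weights between the hidden and output layers form the N×V matrix SYN1. The input to the j-th neuron of the output layer is then

$$U_j = val_j \cdot H$$

where H is the output vector of the hidden layer and $val_j$ is the j-th column of SYN1.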
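This dot product shows how close the word vectors are: for vectors of unit length it equals the cosine of the angle between the two points in n-dimensional space, so it is 1 for vectors pointing in the same direction and -1 for vectors pointing in opposite directions. Next, softmax (the "soft maximum" function) turns these scores into a probability distribution over words.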
Through softmax, word2vec maximizes the similarity between the vectors of words that appear next to each other and minimizes it for words that do not. The resulting distribution is the output of the neural network.
To better understand how the algorithm works, consider a training corpus made up of the following sentences:
"The cat saw the dog."
"The cat stalked the dog."
"The white cat climbed up the tree."
The corpus vocabulary contains eight words: ["white", "climbed up", "tree", "cat", "on", "stalked", "dog", "saw"]
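After sorting alphabetically (by the original Russian words, which is why the English glosses above are not in alphabetical order), each word can be referred to by its index in the vocabulary. In this example the neural network has eight input neurons and eight output neurons. Suppose the hidden layer has three neurons. This means that SYN0 and SYN1 are 8×3 and 3×8 matrices, respectively. Before training, these matrices are initialized with small random values, as is usual in neural network training.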

Suppose the neural network has to learn the relationship between the words "climbed up" and "cat". That is, when "climbed up" is fed to the network input, the network should output a high probability for the word "cat". In computational linguistics terms, "cat" is the center (target) word and "climbed up" is the context word.

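In this case, the input vector is X = [0, 1, 0, 0, 0, 0, 0, 0], since "climbed up" is the second word in the vocabulary. The vector for the word "cat" is [0, 0, 0, 1, 0, 0, 0, 0].
When the vector representing "climbed up" is fed to the network input, the output of the hidden-layer neurons can be computed as:

$$H = X \cdot \mathrm{SYN0}$$

Note that the hidden-layer vector H equals the second row of SYN0. So the "activation" of the hidden layer simply copies the input word's vector to the hidden layer.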
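The same computation in plain Python, as a sketch: the eight-word vocabulary and the three hidden neurons come from the example above, while the random seed and the ±0.5 range for the initial weights are arbitrary choices. Multiplying a one-hot row vector by SYN0 simply selects one row, which is why practical implementations index the row directly instead of multiplying.

```python
import random

vocab = ["white", "climbed up", "tree", "cat", "on", "stalked", "dog", "saw"]
V, N = len(vocab), 3  # 8 words, 3 hidden neurons

random.seed(42)  # arbitrary seed for reproducibility
# SYN0: V x N matrix of small random weights
syn0 = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def vec_times_matrix(x, m):
    """Row vector x (length V) times matrix m (V x N) gives a row vector of length N."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

x = one_hot(vocab.index("climbed up"), V)  # 1.0 at index 1, 0.0 elsewhere
h = vec_times_matrix(x, syn0)
print(h == syn0[1])  # True: H is the second row of SYN0
```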
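Similarly, for the output layer:

$$U = H \cdot \mathrm{SYN1}, \qquad U_j = val_j \cdot H$$

where $val_j$ is the j-th column of SYN1. At the output layer we need the probabilities of the words, because the scores $U_j$ reflect the relationship between the center word and the context input. To turn the vector into probabilities we use softmax. The output of the j-th neuron is computed as: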
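$$y_j = P(\text{word}_j \mid \text{word}_{context}) = \frac{\exp(val_j \cdot val_{context})}{\sum_{k \in V} \exp(val_k \cdot val_{context})}$$

Here $val_{context}$ is the vector of the context word (the hidden-layer output H) and $val_j$ is the j-th column of SYN1; this is exactly the softmax function.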
So the probabilities of the eight words in the corpus are [0.143073, 0.094925, 0.114441, 0.111166, 0.144920, 0.122874, 0.119431, 0.144880]. The probability of "cat" (the fourth word in the vocabulary) is 0.111166.
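The forward pass and repeated training updates can be sketched in plain Python. This is a toy under stated assumptions: random SYN0/SYN1 values, a learning rate of 0.1, and plain full softmax as in the formula above (the real word2vec implementation uses hierarchical softmax or negative sampling to avoid summing over the whole vocabulary). Repeating the gradient step for the pair ("climbed up", "cat") raises P("cat").

```python
import math
import random

vocab = ["white", "climbed up", "tree", "cat", "on", "stalked", "dog", "saw"]
V, N = len(vocab), 3

random.seed(0)  # arbitrary seed; small random initial weights
syn0 = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]  # V x N
syn1 = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]  # N x V

def softmax(scores):
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(context_idx):
    h = list(syn0[context_idx])  # one-hot input selects a row of SYN0
    u = [sum(h[i] * syn1[i][j] for i in range(N)) for j in range(V)]  # U = H . SYN1
    return h, softmax(u)

def sgd_step(context_idx, target_idx, lr=0.1):
    """One gradient step on -log P(target | context) with full softmax."""
    h, y = forward(context_idx)
    err = [y[j] - (1.0 if j == target_idx else 0.0) for j in range(V)]
    # The gradient for H must use SYN1 before it is modified
    grad_h = [sum(err[j] * syn1[i][j] for j in range(V)) for i in range(N)]
    for i in range(N):
        for j in range(V):
            syn1[i][j] -= lr * err[j] * h[i]
    for i in range(N):
        syn0[context_idx][i] -= lr * grad_h[i]

ctx, cat = vocab.index("climbed up"), vocab.index("cat")
p_before = forward(ctx)[1][cat]
for _ in range(100):
    sgd_step(ctx, cat)
p_after = forward(ctx)[1][cat]
print(f"P(cat | climbed up): {p_before:.4f} -> {p_after:.4f}")
```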
So far we have matched each word with a vector. But we need to work not with single words but with phrases and whole sentences, because that is how people communicate.
For this there is Doc2vec (also known as Paragraph Vector), an algorithm based on word2vec that produces distributed representations of pieces of text. The text can be of any length, from a collocation to a paragraph, and, crucially, the output is always a fixed-length vector.
Boltoon is based on this technique. First, we build a 300-dimensional semantic space from a Russian Wikipedia dump (as mentioned above, the dimension is usually chosen between 100 and 500).
A bit more Python:
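```python
from gensim.models.doc2vec import Doc2Vec

# `size` is the older gensim name; gensim 4.x requires `vector_size` instead
model = Doc2Vec(min_count=1, window=10, size=100, sample=1e-4, workers=8)
```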
This creates an instance of the class with the parameters for subsequent training:
- min_count: minimum number of occurrences; words that occur less often are ignored
- window: the size of the context window taken into account
- size: the dimension of the vectors (of the space)
- sample: threshold for frequent words; words occurring more often than this are randomly downsampled
- workers: number of threads
```python
model.build_vocab(documents)
```
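This builds the vocabulary table. documents is the Wikipedia dump; gensim expects it as an iterable of TaggedDocument objects (a list of words plus tags).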
```python
model.train(documents, total_examples=model.corpus_count, epochs=20)
```
This is the training step. total_examples is the number of input documents. Training is performed only once. It is a resource-intensive process, so the model is built from a 50 MB Wikipedia dump (my laptop with 8 GB of RAM cannot handle more). Then we save the trained model, which is written out as several files.

As mentioned above, SYN0 and SYN1 are the weight matrices formed during training. These objects are saved to separate files using pickle. Their size is proportional to N×V×W, where N is the vector dimension, V is the number of words in the vocabulary, and W is the size of a single weight. This is what made the files so large.
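A back-of-the-envelope estimate of that proportionality, assuming 32-bit floats (W = 4 bytes per weight, which is what gensim uses for its vectors) and a hypothetical vocabulary of one million words:

```python
def matrix_size_bytes(n_dims, vocab_size, bytes_per_weight=4):
    """Size of one weight matrix: N x V x W bytes (float32 means W = 4)."""
    return n_dims * vocab_size * bytes_per_weight

# Hypothetical numbers: N = 300 dimensions, V = 1,000,000 words
size = matrix_size_bytes(300, 1_000_000)
print(size / 1e9, "GB per matrix")  # 1.2 GB per matrix
```

Both SYN0 and SYN1 have this shape (V×N and N×V), so together they take roughly twice as much.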
質åãšåçãšãšãã«ããŒã¿ããŒã¹ã«æ»ããŸãã æ°ããæ§ç¯ããã空éã§ãã¹ãŠã®ãã¬ãŒãºã®åº§æšãèŠã€ããŸãã ããŒã¿ããŒã¹ãæ¡åŒµãããšãã·ã¹ãã ãåãã¬ãŒãã³ã°ããå¿
èŠããªããªãã远å ããããã¬ãŒãºãèæ
®ããŠãåãã¹ããŒã¹ã§åº§æšãèŠã€ããã ãã§ååã§ããããšãããããŸãã ãããBoltoon'aã®äž»ãªå©ç¹ã§ã-ããŒã¿ã®æŽæ°ãžã®è¿
éãªé©å¿ã
次ã«ããŠãŒã¶ãŒãã£ãŒãããã¯ã«ã€ããŠèª¬æããŸãã 空éå
ã®è³ªåã®åº§æšãšããã«æãè¿ããã¬ãŒãºãæ€çŽ¢ããŸããããã¯ããŒã¿ããŒã¹ã§å©çšã§ããŸãã ããããããã§ã¯ãN次å
空éã§ç¹å®ã®ãã€ã³ãã«æãè¿ããã€ã³ããèŠã€ãããšããåé¡ãçºçããŸãã KD-Treeã®äœ¿çšããå§ãããŸãïŒè©³çްã«ã€ããŠã¯ã
ãã¡ããã芧ãã ãã ïŒã
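A KD-tree (K-dimensional tree) is a data structure that partitions K-dimensional space by recursively splitting it with hyperplanes into smaller regions, so a nearest-neighbour query can skip whole regions that cannot contain the answer.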
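```python
from scipy.spatial import KDTree

def build_tree(self, ethalon):
    # ethalon: mapping of reference phrases to their vectors; the tree indexes the vectors
    return KDTree(list(ethalon.values()))
```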
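scipy's KDTree searches by Euclidean distance by default (p=2 in query), whereas phrases are compared by cosine. If every vector is first scaled to unit length, the two agree, because for unit vectors |a - b|² = 2 - 2·cos(a, b). The text does not say whether Boltoon normalizes its vectors this way; the sketch below only checks the identity on made-up vectors.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up phrase vectors and a query
phrases = {"p1": [3.0, 1.0, 0.5], "p2": [-1.0, 2.0, 2.0], "p3": [0.2, 0.1, 4.0]}
query = [2.0, 0.5, 1.0]

unit = {name: normalize(v) for name, v in phrases.items()}
q = normalize(query)
for name, v in unit.items():
    # For unit vectors: |a - b|^2 == 2 - 2 cos(a, b)
    assert math.isclose(sq_euclidean(q, v), 2 - 2 * cosine(q, v), abs_tol=1e-12)

by_cosine = max(phrases, key=lambda n: cosine(query, phrases[n]))
by_euclid = min(unit, key=lambda n: sq_euclidean(q, unit[n]))
print(by_cosine, by_euclid)  # p1 p1
```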
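However, this has a serious drawback: whenever elements are added, the tree has to be rebuilt, which takes O(N log N) on average and is slow. That is why Boltoon uses "lazy" updates: the tree is rebuilt after every M phrases added to the database. A search takes O(log N) on average (this holds for low-dimensional data; in spaces of hundreds of dimensions, KD-tree queries tend to degrade toward a linear scan).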
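A self-contained sketch of the lazy-update idea. To stay dependency-free it uses a tiny hand-written KD-tree instead of scipy's KDTree. The class name LazyIndex, the parameter rebuild_every (the M above) and the linear scan over phrases added since the last rebuild are all assumptions; the text does not say how Boltoon handles new phrases between rebuilds.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(items, depth=0):
    """Build a KD-tree over (vector, payload) pairs using median splits."""
    if not items:
        return None
    axis = depth % len(items[0][0])  # cycle through the coordinates
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def kd_nearest(node, query, best=None):
    """Return the (vector, payload) pair nearest to query, or best if node is empty."""
    if node is None:
        return best
    vec = node["item"][0]
    if best is None or sq_dist(query, vec) < sq_dist(query, best[0]):
        best = node["item"]
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = kd_nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match
    if diff ** 2 < sq_dist(query, best[0]):
        best = kd_nearest(far, query, best)
    return best

class LazyIndex:
    """Rebuilds the KD-tree only after every `rebuild_every` new phrases."""

    def __init__(self, items, rebuild_every=100):
        self.rebuild_every = rebuild_every
        self.indexed = list(items)
        self.pending = []  # phrases added since the last rebuild
        self.tree = build_kdtree(self.indexed)

    def add(self, vector, payload):
        self.pending.append((vector, payload))
        if len(self.pending) >= self.rebuild_every:
            self.indexed.extend(self.pending)
            self.pending = []
            self.tree = build_kdtree(self.indexed)  # full rebuild, done rarely

    def nearest(self, query):
        best = kd_nearest(self.tree, query)
        for item in self.pending:  # assumption: linear scan of not-yet-indexed phrases
            if best is None or sq_dist(query, item[0]) < sq_dist(query, best[0]):
                best = item
        return None if best is None else best[1]

index = LazyIndex([([0.0, 0.0], "hello"), ([10.0, 10.0], "bye")], rebuild_every=2)
index.add([5.0, 5.0], "how are you")  # stays in `pending` until the next rebuild
print(index.nearest([4.0, 6.0]))  # how are you
```

The trade-off: queries pay a small linear cost for at most M-1 pending phrases, in exchange for not rebuilding the tree on every insertion.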
To train Boltoon further, the following feature was introduced: after a question is received, the answer is sent together with two buttons for rating its quality.

If the rating is negative, the user is asked to correct the answer, and the corrected result is added to the database.
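The rating flow can be sketched as a small per-user state machine. Everything here is hypothetical: the class and method names, the reply strings, and answer_fn as a stand-in for the nearest-phrase search. The sketch also encodes the rule, stated in the bot's usage notes, that new messages are ignored while an answer is still waiting for a rating.

```python
class FeedbackLoop:
    """Hypothetical sketch of the rate-then-correct flow."""

    def __init__(self, answer_fn):
        self.answer_fn = answer_fn  # stand-in for the nearest-phrase search
        self.database = []          # (question, answer) pairs added by users
        self.pending = {}           # user_id -> [state, question]

    def on_message(self, user_id, text):
        if user_id not in self.pending:
            self.pending[user_id] = ["await_rating", text]
            return self.answer_fn(text)
        state, question = self.pending[user_id]
        if state == "await_fix":
            # This message is the user's corrected answer: store it
            self.database.append((question, text))
            del self.pending[user_id]
            return "Thanks, your answer has been saved."
        return None  # the last answer has not been rated yet: ignore

    def on_rating(self, user_id, is_good):
        state, question = self.pending.get(user_id, [None, None])
        if state != "await_rating":
            return None
        if is_good:
            del self.pending[user_id]
            return None
        self.pending[user_id] = ["await_fix", question]
        return "How should I have answered?"
```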

An example of a dialogue with Boltoon using phrases that are not in the database.

Of course, it would be a stretch to call the bot a mind, and it is no genius. It is far from products like Siri or the recent Alice, but that does not make it useless or uninteresting. After all, this is a student project built by one person as part of a summer internship. In the future I plan to improve the response-processing module (for example, to match the interlocutor's sense of humour), to remember the conversation context (within the last few messages), and to handle typos. I hope to end up with a smarter Boltoon 2.0. But that is a topic for the next article.
P.S. You can test Boltoon in Telegram at @boltoon_bot. Don't forget to rate every answer you receive; otherwise subsequent messages will be ignored. And since I see the logs of who wrote what, let's keep it polite.
P.P.S. Thanks to my mentors PavelMSTU and ov7a for their advice and constructive criticism of this article.