
ããŒã¿ãµã€ãšã³ã¹ã®åéã®ããã°ã©ããå°éå®¶ã¯ããã䌌ããŠãŒã¶ãŒãããã¡ã€ã«ãèŠã€ãããã䌌ããããªé³æ¥œãéžæãããšãã課é¡ã«çŽé¢ããããšããããããŸãã ãœãªã¥ãŒã·ã§ã³ã¯ããªããžã§ã¯ãããã¯ãã«åœ¢åŒã«å€æããæãè¿ããã®ãèŠã€ããããšã«éå
ã§ããŸãã
ãŸããé¡èªèåé¡ã§æè¿åãæ€çŽ¢ããå¿
èŠæ§ã«çŽé¢ããŸããã ããã§ããã¥ãŒã©ã«ãããã¯ãŒã¯ã䜿çšããŠé¡ã®ãã¯ãã«è¡šçŸã圢æãããã§ã«æåãªäººã
ã®æãè¿ããã¯ãã«ãæ¢ããŸãã æåã«ãæ€çŽ¢ã®ããã«ãSpotifyãå«ãããç¥ãããå®çžŸã®ããã¢ã«ãŽãªãºã ãšããŠAnnoyãéžæããŸããã ãããã圌ãã¯ããã«ã圌ã®èšæ¶æ¬²æ±ã§ãç§ãã¡ã¯RAMã«é©åããªãã£ãããå€ãã®ç²ŸåºŠã倱ã£ãããšã«æ°ä»ããŸããã ããã¯å°ãç ç©¶ã«ã€ãªãããŸããã ãã®çµæã以äžã§èª¬æããŸãã
çè«ãå®è·µã§èããããã«ããã®èšäºã«ã¯ãåèªã«æãè¿ããã®ãæ¢ãå°ããªã³ãŒãããããŸãã 人æ°ã®ããword2vecã䜿çšããŠãã¯ã¿ãŒè¡šçŸãååŸããŸãã ãã®ã¢ã«ãŽãªãºã ã¯ãæå³çã«é¡äŒŒããåèªã«å¯ŸããŠè¿ããã¯ãã«ãçæããŸãã word2vec CBOWã§ã¯ãç°å¢ã«ãã£ãŠåèªãäºæž¬ããå°ããªãã¥ãŒã©ã«ãããã¯ãŒã¯ã®åŠç¿ã®å¯ç£ç©ãšããŠãã¯ãã«è¡šçŸãååŸãããŸãã ãã¯ãã«ã䜿çšãããšãking +ïŒwoman-manïŒ= queenã®ãããªç®è¡æŒç®ãã§ããã®ã¯è峿·±ãã§ãã

ã³ãŒãã§ãããæäœããæ¹æ³ãèŠãŠã¿ãŸãããã
model = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True) start = time.time() pprint.pprint(model.wv.most_similar(positive=['king'])) print('time:', time.time() - start) print('king + (woman - man) = ', model.wv.most_similar(positive=['woman', 'king'], negative=['man'])[0]) print('Japan + (Moscow - Russia) = ', model.wv.most_similar(positive=['Moscow', 'Japan'], negative=['Russia'])[0])
ååŸãããã®ïŒ
[(u'kings', 0.7138045430183411), (u'queen', 0.6510956883430481), (u'monarch', 0.6413194537162781), (u'crown_prince', 0.6204219460487366), (u'prince', 0.6159993410110474), (u'sultan', 0.5864823460578918), (u'ruler', 0.5797567367553711), (u'princes', 0.5646552443504333), (u'Prince_Paras', 0.5432944297790527), (u'throne', 0.5422105193138123)] time: 0.236690998077 king + (woman - man) = (u'queen', 0.7118192911148071) Japan + (Moscow - Russia) = (u'Tokyo', 0.8696038722991943)
äžèšã®äŸã§ã¯ãã©ã€ãã©ãªã䜿çšããŠgensimããã¹ããšãGoogleã®word2vec ã¢ãã« ïŒ1.5 GBïŒãåŠçããŸããããã®ã¢ãã«ã«ã¯ã300äžã®åèªãšçããã¬ãŒãºããããŸãã ã³ãŒãåºåã¯ã女çãåäž»ãçåãçã«è¿ãããšã瀺ããŠããŸãã ãŸãããã¯ãã«ã䜿çšããæŒç®ãæ©èœããããšã確èªããŸããã ãã ãããªã¯ãšã¹ãããšã«1/4ç§ã¯ããŸãé
åçã§ã¯ãªããçµå±ã®ãšãããgensimã¯ãã«ãŒããã©ãŒã¹æ€çŽ¢ã®æ¯èŒçè¯ãå®è£
ãæã£ãŠããŸãïŒãã¹ãŠã®æ¢ç¥ã®ãªããžã§ã¯ããŸã§ã®è·é¢ã®èšç®ã䜿çšïŒã 次ã«ã粟床ã®ããããªæå€±ã®ã¿ã§ããã®æéãæ°çŸåã®1ã«ççž®ã§ããæ¹æ³ãæ€èšããŸãã
ããããåçŽãªã¢ã€ãã¢ããå§ããŸãããïŒå¹³é¢ã§2ã€ã®ååã«åå²ããããšã«ãããæ¢çŽ¢ç©ºéãçããããã«ããŠãã ããã ãŸããæ€çŽ¢äžã«ãã¯ãšãªãšåãå¹³é¢ã®åãåŽã«ããé£äººãŸã§ã®è·é¢ã®ã¿ãèæ
®ããŸãã
nbrs = NearestNeighbors(algorithm='brute', metric='cosine') nbrs.fit(model.wv.syn0norm) king_vec = model.wv['king'][np.newaxis, :]
ååŸãããã®ïŒ
full search time: 20.3163180351 [u'king', u'kings', u'queen', u'monarch', u'crown_prince', u'prince', u'sultan', u'ruler', u'princes', u'Prince_Paras'] elements in mask: 1961204 half search time: 9.15824007988 [u'king', u'kings', u'queen', u'monarch', u'crown_prince', u'prince', u'sultan', u'ruler', u'princes', u'Prince_Paras']
ãã®ãããèšç®ãååã«ããŠãé£è¡æ©ã®ããé£ã§æ£ç¢ºã«å€±ããŸããã åŸã§èª¬æããã¢ã«ãŽãªãºã ã®äžã«ã¯ãåæ§ã®ããªãã¯ã䜿çšãããã®ããããŸãã
1. HNSW
2011幎ã«ãæé«ã®ããã²ãŒã·ã§ã³å¯èœãªã¹ã¢ãŒã«ã¯ãŒã«ãïŒNSWïŒã®æè¿åæ€çŽ¢ã¢ã«ãŽãªãºã ã®1ã€ãå
¬éãããŸããã 2016幎ã圌ã®éå±€çå¯èªå°äžçïŒHNSWïŒã®åŸç¶è
ãç»å ŽããŸããã
NSW芪ã¢ã«ãŽãªãºã ããå§ããŸãããã ãäžçã¯çãããšããã³ã©ã ã«åºã¥ããŠããŸãã ãããã®ã°ã©ãã«ã¯å¥åŠã§äŸ¿å©ãªæ©èœããããŸããé ç¹ã®ãã¢ã¯ã»ãšãã©é£æ¥ããŠããªãå¯èœæ§ãé«ãã§ãããæ¯èŒçå°æ°ã®ã¹ãããã§å°éå¯èœã§ãïŒ å¹³åããŠïŒã ãã®ãããªã°ã©ãã¯éåžžã«äžè¬çã§ãã ããšãã°ãè³ã®ãã¥ãŒã©ã«ãããã¯ãŒã¯ããœãŒã·ã£ã«ã¡ãã£ã¢ã°ã«ãŒããããã³WordNetã»ãã³ãã£ãã¯ãããã¯ãŒã¯ã¯SWã°ã©ãã§ãã ãã®å Žåãé ç¹ã¯ãã¯ãã«ã§ããããšããžã¯ããããæãè¿ãé ç¹ã«æ¥ç¶ããŸãã ã°ã©ãã«ã¯ãé è·é¢ã®ããŒã¯ãçµã¶ãšããžã衚瀺ãããŸãã
è¿åãæ€çŽ¢ããã«ã¯ãã¯ãšãªãŸã§ã®æçè·é¢ã§é ç¹ãæ€çŽ¢ããŠã°ã©ããå·¡åããŸãã ã©ã³ãã ãªé ç¹ããéå§ãããå人ãã®æªèšªåã®é ç¹ïŒçŸåšã®ãšããžã«æ¥ç¶ãããŠããé ç¹ïŒããã¯ãšãªãŸã§ã®è·é¢ãèæ
®ããæçè·é¢ã®é ç¹ã«ç§»åããŸãã é·ããšããžã¯ãã°ã©ãã«worldå±ãªäžçã®ç¹æ§ãäžãããªã¯ãšã¹ãã«è¿ããªããžã§ã¯ãã®é åã«ãã°ããç§»åã§ããããã«ããçããšããžã¯æãè¿ãé£äººãç±å¿ã«æ€çŽ¢ããŸãã
ã°ã©ããå·¡åããéã«ãæè¿åã®å°ããªãªã¹ããæŽæ°ããæ¬¡ã®å埩ã§ãªã¹ããæŽæ°ãããŠããªãå Žåã¯åæ¢ããŸãã
æ¬äŒŒã³ãŒã K-NNSearch(object q, integer: m, k) 1 TreeSet [object] tempRes, candidates, visitedSet, result 2 for (iâ0; i < m; i++) do: 3 put random entry point in candidates 4 tempResânull 5 repeat: 6 get element c closest from candidates to q 7 remove c from candidates 8 //check stop condition: 9 if c is further than k-th element from result 10 than break repeat 11 //update list of candidates: 12 for every element e from friends of c do: 13 if e is not in visitedSet than 14 add e to visitedSet, candidates, tempRes 15 16 end repeat 17 //aggregate the results: 18 add objects from tempRes to result 19 end for 20 return best k elements from result
index = nmslib.init(space='cosinesimil', method='sw-graph') nmslib.addDataPointBatch(index, np.arange(model.wv.syn0.shape[0], dtype=np.int32), model.wv.syn0) index.createIndex({}, print_progress=True) start = time.time() items = nmslib.knnQuery(index, 10, king_vec.tolist()) print(time.time() - start) print([model.wv.index2word[idx] for idx in items])
ååŸãããã®ïŒ
0.000545024871826 [u'king', u'kings', u'queen', u'monarch', u'crown_prince', u'prince', u'sultan', u'ruler', u'princes', u'royal']
Hierarchical Navigable Small WorldïŒHNSWïŒã¢ã«ãŽãªãºã ã§ã®äžèšã®ã¢ã€ãã¢ã®éçºãæ€èšããŠãã ããã ããã¯å€ãã®ç¹ã§NSWã«éåžžã«äŒŒãŠããŸãããçŸåšã¯ã°ã©ãã®éå±€ãæ±ã£ãŠããŸãããã¹ãŠã®ãªããžã§ã¯ãã¯ãŒãã¬ã€ã€ãŒã§è¡šãããã¬ã€ã€ãŒã倧ãããªã£ããå°ãããªã£ããããã«ã€ããŠå€§ãããªããŸãã ãã®å Žåãã¬ã€ã€ãŒäžã®ãã¹ãŠã®ãªããžã§ã¯ã ã¬ã€ã€ãŒäžã«ãããŸã ã
æ€çŽ¢ãããšããéå§ã¯äžäœå±€ã®ã°ã©ãã®ã©ã³ãã ãªé ç¹ããçºçããããã§èŠæ±ã«è¿ãé ç¹ïŒåè£ïŒããã°ããèŠã€ããåã®å±€ã§ãããããæ€çŽ¢ãåéããŸãã
æ¬äŒŒã³ãŒã SearchAtLayer (object q, Set[object] enterPoints, integer: M, ef, layer) 1 Set [object] visitedSet 2 priority_queue [object] candidates (closer - first), result (further - first) 3 candidates, visitedSet, result â enterPoints 7 4 repeat: 5 object c =candidates.top() 6 candidates.pop() 7 //check stop condition: 8 if d(c,q)>d(result.top(),q) do: 9 break 10 //update list of candidates: 11 for_each object e from c.friends(layer) do: 12 if e is not in visitedSet do: 13 add e to visitedSet 14 if d(e, q)< d(result.top(),q) or result.size()<ef do: 15 add e to candidates, result 16 if result.size()>ef do: 17 result.pop() 18 return best k elements from result K-NNSearch (object query, integer: ef) 1 Set [object] tempRes, enterPoints=[enterpoint] 2 for i=maxLayer downto 1 do: 3 tempRes=SearchAtLayer (query, enterPoints, M, 1, i) 4 enterPoints =closest elements from tempRes 5 tempRes=SearchAtLayer (query, enterPoints, M, ef, 0) 6 return best K of tempRes
1.1ã é·æãšçæ
+ã¢ã«ãŽãªãºã ã¯çè§£ãããã
+æå
端ã®çµæã衚瀺ããŸã
+ nmslibã©ã€ãã©ãªã«å¹æçãªå®è£
ããããŸã
+ã°ã©ããšããžãä¿åããããã®è¿œå ã¡ã¢ãªã³ã¹ããå°ãªã
-ãã®ã¢ã«ãŽãªãºã ã¯ãã¯ãã«è¡šçŸã®å§çž®ããµããŒãããŠããŸãããããã«ã€ããŠã¯ããã«æ€èšããŸã
index = nmslib.init(space='cosinesimil', method='hnsw') nmslib.addDataPointBatch(index, np.arange(model.wv.syn0.shape[0], dtype=np.int32), model.wv.syn0) index.createIndex({}, print_progress=True) start = time.time() items = nmslib.knnQuery(index, 10, king_vec.tolist()) print(time.time() - start) print([model.wv.index2word[idx] for idx in items])
ååŸãããã®ïŒ
0.000469923019409 [u'king', u'kings', u'queen', u'monarch', u'crown_prince', u'prince', u'sultan', u'ruler', u'princes', u'Prince_Paras']
2. FAISS
2017幎3æãFacebookã¯FANNã©ã€ãã©ãªã§ããANNã®ãœãªã¥ãŒã·ã§ã³ã玹ä»ããŸããã å€ãã®æ¹æ³ãšã¢ã«ãŽãªãºã ãçµã¿åãããŠããŸãã 以äžã§æ€èšããã¢ã«ãŽãªãºã ã§ã¯ããã¯ãã«ã®ã°ã«ãŒããŸã§ã®è·é¢ã¯ããããã®è¿ãã®åºæºç¹ãŸã§ã®è·é¢ã§è¿äŒŒãããŸãã ãã®ããããªã¯ãšã¹ãããå°æ°ã®ã³ã³ãããŒã«ãã€ã³ããŸã§ã®è·é¢ãèŠã€ãã培åºçãªæ€çŽ¢ã«ããããªã¯ãšã¹ãã®æ®ãã®éšåã«ããè¿ãã³ã³ãããŒã«ãã€ã³ãã«å±ãããã¯ãã«ãŸã§ã®è·é¢ãèšç®ã§ããŸãã ãã®ã¢ã«ãŽãªãºã ãéšåçã«åæããŸãããã
2.1ã ADC-é察称è·é¢èšç®
次ã®ã¢ã€ãã¢ããã詳现ã«èããŠã¿ãŸãããããã¯ãã«ã®ã°ã«ãŒããŸã§ã®è·é¢ã¯ããããã®è¿ãã®å¶åŸ¡ç¹ãŸã§ã®è·é¢ã§è¿äŒŒã§ããŸãã ã¢ã³ã«ãŒãã€ã³ãã¯ã¹ããŒã¹ããšãªã¢ã«åå²ããŸãã FAISSã§åç
§ç¹ãæ€çŽ¢ããã«ã¯ãããç¥ãããŠããk-meansã¯ã©ã¹ã¿ãªã³ã°ã¢ã«ãŽãªãºã ã䜿çšãããåŸãããéå¿ããã¯ãã«ãšæ¯èŒãããŸãã
ïŒ0.1ã0.2ïŒâ1
ïŒ0.5ã-0.2ïŒâ2
ïŒ0.1ã0.1ïŒâ1
ïŒ0.6ã-0.1ïŒâ2
ã³ã¬ã¯ã·ã§ã³ãã¯ãã«ã¯ãã®éå¿ã§è¿äŒŒãããŸã ã©ãã§ ãã㊠-å€ãã®éå¿ã 次ã«ããªã¯ãšã¹ãããã®è·é¢ åã« è¿ããããããŸãã ã è·é¢ãèšç®ãããã®æ¹æ³ã¯ãé察称ãšåŒã°ããŸãã ç°¡åã«èšãã°ã空éãé åã«åå²ããã¯ãšãªãã1ã€ã®é åã«åé¡ããããã¯ãã«ã®ã°ã«ãŒããŸã§ã®è·é¢ã¯ããã®é åã圢æããéå¿ãŸã§ã®è·é¢ã«ã»ãŒçãããšèšããŸããã
éå¿ã«å±ãããã¯ãã«ã广çã«ä¿åãããã°ããæ€çŽ¢ããããšã¯ãå転ãã¡ã€ã«ãšåŒã°ããç°¡åãªããªãã¯ã«åœ¹ç«ã¡ãŸãã
2.2ã IVF-å転ãã¡ã€ã«
IVFã§ã¯ãå²ãåœãŠãéã«ããŸãã ãã¯ãã«ã®ãªã¹ããéå¿ã«äžèŽããããã«ãªããŸããã
1â[ïŒ0.1ã0.2ïŒãïŒ0.1ã0.1ïŒ]
2â[ïŒ0.5ã-0.2ïŒãïŒ0.6ã-0.1ïŒ]
ãã®ãããéå¿ãŸã§ã®è·é¢ãèšç®ããããšã§åè£ããã°ããèŠã€ããŠãæãè¿ããã®ã«ã€ããŠæ¢ã«æºåæžã¿ã®ãªã¹ããååŸã§ããŸãã
2.3ã PQ-補åéåååš
ãã®èšäºã§æåŸã«è§Šããã³ã³ããŒãã³ãã¯ã補åéåååšãšåŒã°ããŸãã éå¯éãã¯ãã«å§çž®ãæäŸãããã¯ãã«è¡šçŸãã¡ã¢ãªã«åãŸããªãå Žåã«äœ¿çšãããŸãã ãã¯ãã«ã®æ¬¡å
ã128ã§ã64ãããïŒã³ã³ããŒãã³ãããã0.5ãããã®ã¿ïŒã§ãšã³ã³ãŒããããå Žåãéå¿ã®æ°ã«çããéååãåŠçããå¿
èŠããããŸã ã ããã¯éèŠãªã¿ã¹ã¯ã§ãã ãããã«ã¯èšå€§ãªãã¬ãŒãã³ã°ã»ãããå¿
èŠã§ãã
ãã¯ãã«ãåå²ããŠåé¡ãåçŽåãã éšå ãããã³äŒçµ±ã«ãããåéšåã«256ã®éå¿ããããŸãã ã€ãŸãããã¯ãã«ã¯éå¿ã€ã³ããã¯ã¹ã®ã»ãããšããŠæžãæããããšãã§ããŸã-äŸãã° ãããããã®çµæžã¯å ãã ãã€ã
ãã®çš®ã®ã³ãŒãã£ã³ã°ãæ®å·®ãã¯ãã«ã«é©çšããŸã ã ãããŠ
2.4ã æ€çŽ¢ãã
ãããŠä»ãããããã¹ãŠã1ã€ã®ã¹ããŒã ã«ãŸãšããŸãã
ãªã¯ãšã¹ãçš èŠã€ãã æãè¿ãéå¿ã®ãã¡ããããã®éå¿ã«å¯Ÿå¿ãããã¯ãã«ã®ãªã¹ããåéããæ®å·®ãã¯ãã«ã䜿çšããŠããããŸã§ã®è·é¢ãã«ãŠã³ãããŠããã æãè¿ãã
import faiss index = faiss.index_factory(model.wv.syn0norm.shape[1], 'IVF16384,Flat') index.verbose = True train = model.wv.syn0norm[np.random.binomial(1, 1./3, size=model.wv.syn0norm.shape[0]).astype(bool)] index.train(train) index.add(model.wv.syn0norm) index.nprobe = 100 start = time.time() distances, items = index.search(king_vec, 10) print(time.time() - start) print([model.wv.index2word[idx] for idx in items[0]])
ååŸãããã®ïŒ
0.0048999786377 [u'king', u'kings', u'queen', u'monarch', u'crown_prince', u'prince', u'sultan', u'ruler', u'princes', u'Prince_Paras']
2.5ã é·æãšçæ
+å§çž®ã®ãµããŒã
+å°ããªã»ã³ããã€ãã¹ãã¬ãŒãžã®ãªãŒããŒããã
+ GPUã³ã³ãã¥ãŒãã£ã³ã°æ©èœ*
-CPUã®HNSWãã5åé
ã
* GPUå®è£
ãããã«äœ¿çšã§ããªãã£ããããæéãç¡é§ã«ããªãããšã«ããŸããã
3.ãããããã
ã¹ããŒã¹ãå¹³é¢ã§åå²ãããšããèãã¯Annoyã§ããŸãå®è£
ãããŠããŸãã ã¢ã«ãŽãªãºã ã¯ã ããã°ã§èè
ã«ãã£ãŠçŸãã説æãããŠããŸãã詳现ãçè§£ãããå Žåã¯ããã®ã¢ã«ãŽãªãºã ãèªãããšããå§ãããŸãã ãã®æ¬è³ªãèŠçŽããŠã¿ãŸãã ãã®ã¢ã«ãŽãªãºã ã§ã¯ã空éãå¹³é¢ã§ååž°çã«åå²ããäºåæšã圢æããŸãã ããªãŒã®åããŒãã«ã¯ãçŸåšã®å¹³é¢ãå®çŸ©ãããã¯ãã«ãæ ŒçŽãããŸãã æ€çŽ¢ãããšãã¯ãã«ãŒãããéå§ãããã¬ãŒã³ã«å¯Ÿããã¯ãšãªã®äœçœ®ã«åºã¥ããŠåããŒããéžæããŸãã ãããã£ãŠãããªãŒã®ãªãŒãèŠçŽ ã«è¡ããŸãããã®ãªãŒãèŠçŽ ã¯ããã¬ãŒã³ã®ã°ã«ãŒãïŒããã¯å°ããªã¹ããŒã¹ã§ãïŒã®çåŽã«ãããã¯ãã«ãæ ŒçŽããŸãããããã¯ãå¿
èŠãªæè¿åã«ãªãããã§ãã ä»ã®ã¢ã«ãŽãªãºã ãšæ¯èŒããAnnoyã®é·æãšçæãèŠãŠã¿ãŸãããã
3.1ã é·æãšçæ
+ã¢ã«ãŽãªãºã ã¯çè§£ãããã
-ããã¯å€ãã®ã¡ã¢ãªãå¿
èŠãšããŸã
-é床ã®äœäž
4.ã¢ã«ãŽãªãºã ã®æ¯èŒ
åã¢ã«ãŽãªãºã ã«ã¯ãé ç¹ã®ãã¬ã³ãã®æå€§æ°ïŒNSWãHNSWïŒãŸãã¯éå¿ã®æ°ïŒFAISSïŒã®ãã©ã¡ãŒã¿ãŒã®ã»ããããããŸãã ãããã®ãã©ã¡ãŒã¿ã¯ãæ¶è²»ãããã¡ã¢ãªã®éãæ€çŽ¢ã®å質ãšé床ã«åœ±é¿ããŸãã Annoyã®èè
ã¯ãããŸããŸãªãã©ã¡ãŒã¿ãŒã§annãã³ãããŒã¯ãªããžããªã«ANNã¢ã«ãŽãªãºã ã®ã°ã«ãŒãã®ãã¹ããå®è£
ããŸããã GloVeããã³SIFTã¢ã«ãŽãªãºã ã䜿çšããŠååŸããããŒã¿ã»ããå
ã®10åã®æè¿åã®æ€çŽ¢ã®ç²ŸåºŠãè©äŸ¡ããŸãã
GloVeã¯ãåèªã®ãã¯ãã«è¡šçŸãååŸããå¥ã®æ¹æ³ã§ã;åããµã€ãºã®ã±ãŒã¹ã§ãã¬ãŒãã³ã°ããå Žåãããããç¹ã§word2vecãè¶
ããŸãã ãã®ããŒã¿ã»ããã¯ã20åã®ãã€ãŒãã§ãã¬ãŒãã³ã°ããã120äžã®åèªã®ãã¯ãã«è¡šçŸã§æ§æãããŠããŸãã SIFTã¯ã倿ã«èæ§ã®ããäž»èŠãªç»åãã€ã³ããšãã®ãã¯ãã«è¡šçŸãååŸããããã®å€ãã¢ã«ãŽãªãºã ã§ãã ããã¯ãªããžã§ã¯ãèªèã«äœ¿çšããããã®èªèã®éèŠãªéšåã¯åæ§ã®ãã¯ãã«è¡šçŸã®æ€çŽ¢ã§ããã ããŒã¿ã»ããã«ã¯ããã€ãã®ããªãšãŒã·ã§ã³ããããŸããã100äžåã®ãã¯ã¿ãŒãå«ãSIFT 1Mã«èå³ããããŸãã
以äžã¯ãã¢ã«ãŽãªãºã ã®é床ãš10åã®æè¿åã®æ€çŽ¢ã®ç²ŸåºŠãšã®é¢ä¿ã瀺ãã°ã©ãã§ãã ã¢ã«ãŽãªãºã ã¯ãã€ã³ãã®ã°ã«ãŒãã§è¡šãããåãã€ã³ãã¯äžé£ã®ãã©ã¡ãŒã¿ãŒã§ãã¹ãå®è¡ãããŸãã


HNSWãèªä¿¡ãæã£ãŠãªãŒãããŠããããšãããããŸãã ãã ããã°ã©ãã«ã¯FAISSããããŸããã Facebookã¯ãç°ãªãæ§æã§HNSWãšFAISSãåå¥ã«æ¯èŒããŸãããçµæã衚ã«ç€ºããŸãã
æ¹æ³ | æ€çŽ¢æé | 1-R @ 1 | ã€ã³ããã¯ã¹ãµã€ãº | ã€ã³ããã¯ã¹äœææé |
---|
ãã©ããCPU | 9.100ç§ | 1.0000 | 512 MB | 0ç§ |
nmslibïŒhnswïŒ | 0.081ç§ | 0.8195 | 512 + 796 MB | 173ç§ |
IVF16384ããã©ãã | 0.538ç§ | 0.8980 | 512 + 8 MB | 240ç§ |
IVF16384ããã©ããïŒTitan XïŒ | 0.059ç§ | 0.8145 | 512 + 8 MB | 5ç§ |
ãã©ããGPUïŒTitan XïŒ | 0.753ç§ | 0.9935 | 512 MB | 0ç§ |
衚ã§ã¯ãå§çž®ãªãã®FAISSã¡ãœãããç¹ã«IVF16384,Flat
ã ãããã£ãŠã16,384ã®éå¿ãæã€IVFADCã䜿çšãããŸãã float32ã®æ¬¡å
128ã®çŸäžãã¯ãã«ã®å Žåã®ã¡ã¢ãªã³ã¹ãã瀺ãããŸãã HNSWã¯ãCPUã§èšç®ããå Žåã¯5åé«éã§ãããã°ã©ãã®ãšããžïŒ ïŒéå¿ãããå€ãã®ã¡ã¢ãªãå¿
èŠã§ãïŒ ïŒ
5.ãŸãšã
æè¿åããã°ããèŠã€ããããã«äœ¿çšãããããã€ãã®ã¢ã«ãŽãªãºã ã調ã¹ãŸããã ã¢ãã€ã¯ã¡ã¢ãªãšã¹ããŒãã®äž¡æ¹ã倱ããŸããããã¢ã€ãã¢ã¯è¯ããé¢é£ããåé¡ã®è§£æ±ºã«åœ¹ç«ã¡ãŸãã ããšãã°ãããŒã¿ã»ããã®ã¯ãªãŒãã³ã°ã«äŸ¿å©ãªéé¢ãã©ã¬ã¹ãã®ç°åžžæ€çŽ¢ã¢ã«ãŽãªãºã ã¯ãæŠå¿µãéåžžã«äŒŒãŠããŸãã FAISSã¯ã¡ã¢ãªã«å¶éã®ããåªãããœãªã¥ãŒã·ã§ã³ã§ãããIVF16384ãPQ64ã䜿çšããŠ60 GBã®RAMã®10ååã®ãã¯ãã«è¡šçŸãç©ã¿éããããšãã§ããŸãã ãã ããã¡ã¢ãªãããã«ããã¯ã§ãªãå Žåã¯ãHNSWãéžæããå¿
èŠããããŸãã
PSæãè峿·±ãåºçç©ã¯ãFAISSã®ã¢ã«ãŽãªãºã ã«é¢ãããã®ã§ããã ããšãã°ãGPUã®æé©åãéååã®æ¹åãããæ¹æ³ïŒæé©åããã補åã®éååïŒãã€ã³ããã¯ã¹ãäœæããããã®ããªãããŒãªæ¹æ³ïŒéãã«ãã€ã³ããã¯ã¹ïŒã«ã€ããŠèªãããšãã§ããŸãã
PPSç§ãã¡èªèº«ã®ããã«ãHNSWãéžæããŸããã
6.æåŠ