Search can be improved in two simple ways. Every large web search engine today can read minds: first, it corrects errors in keywords; second, it suggests queries as you type. The smaller search sites that follow them want the same. Both features can be implemented neatly with the open-source search engine Sphinx. This post explains exactly how.
So: "did you mean" corrections ("Did you mean ...?") and query completion ("Are you really looking for Vasya?").
Listen to your favorite song, "Valenki":
Let's start with the easy part: query completion. The user starts typing in the search box, and we want to show hints right away as they type (so that they search for "Valenok" like everyone else rather than for something unique). Preferably with result counts, too. Where do we get the data?
There are roughly two options: suggest whole user queries, or suggest individual keywords.
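Suggesting keywords is especially easy. The indexing program, unexpectedly named indexer, has two rarely used switches: --buildstops, which builds a dictionary of the most frequent words, and --buildfreqs, which also writes their counts (giving a frequency dictionary). Use indexer to build a list of the 10 (or 100) thousand most frequent words in the document collection: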
$ indexer myindex --buildstops dict.txt 10000 --buildfreqs
The result is a file like this:
i 9533843
and 5427048
to 5418872
the 5371581
a 4282225
you 2877338
...
The rest is a matter of technique. Create an SQL table, write a 12-line import script, use a plain LIKE for the hints, and sort the results by word frequency. LIKE on an index works reasonably fast here, because we are looking for the beginning of the key. For 1-character queries it is better to compute and cache the results ahead of time, so as not to shuffle thousands of rows on every keystroke. That said, the MySQL query cache, if enabled, should in theory take care of this.
CREATE TABLE keywords
(
    keyword VARCHAR(255) NOT NULL,
    freq INTEGER NOT NULL,
    INDEX(keyword, freq)
);
SELECT * FROM keywords WHERE keyword LIKE 'valen%' ORDER BY freq DESC LIMIT 10
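The same prefix lookup can be prototyped without a database. This is only a sketch, assuming the frequency dictionary fits in memory; the function name suggestKeywords and the sample words and counts are made up for the example.

```php
<?php
// Hypothetical in-memory stand-in for:
//   SELECT ... WHERE keyword LIKE 'prefix%' ORDER BY freq DESC LIMIT 10
function suggestKeywords(array $freqs, string $prefix, int $limit = 10): array
{
    $matches = [];
    foreach ($freqs as $keyword => $freq) {
        // Cast because PHP turns numeric-looking array keys into integers.
        if (strncmp((string)$keyword, $prefix, strlen($prefix)) === 0) {
            $matches[$keyword] = $freq;
        }
    }
    arsort($matches); // most frequent keywords first
    return array_map('strval', array_slice(array_keys($matches), 0, $limit));
}

// Sample data invented for the illustration.
$freqs = ['valenki' => 120, 'valentine' => 300, 'value' => 900, 'valence' => 45];
print_r(suggestKeywords($freqs, 'valen'));
```

With a real dictionary the SQL version is the better choice; the sketch only shows what the LIKE, ORDER BY and LIMIT combination does.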
Whole queries are suggested in almost the same way: a table, LIKE, and so on. Only now, instead of single words, we need whole query strings. They have to be taken from the query log, and their frequencies counted there as well. The log also needs a little preprocessing first: as far as search is concerned, "vasya pupkin" and "Vasya! Pupkin!" are the same query, so treating them as different ones is not a good idea. The BuildKeywords() method of the Sphinx API takes care of this: it takes an arbitrary query string and splits it into keywords (folding case and so on), from which the normalized query string can be reassembled. All of this is best done after establishing a persistent connection with the Open() method; otherwise it runs many times slower. After that it is routine: log, file, sort, uniq -c, import script, SQL table, LIKE, profit. The script looks like this:
$cl = new SphinxClient();
$cl->Open(); // persistent connection, otherwise this is many times slower
foreach ($log as $entry)
{
    // split the raw query into normalized keywords
    $keywords = $cl->BuildKeywords($entry, "myindex", false);
    foreach ($keywords as $keyword)
        print $keyword["tokenized"] . " ";
    print "\n";
}
The script that imports a two-field text file into the SQL database is left as homework for the reader.
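The "sort, uniq -c" step can also be done in PHP before the import. This is a minimal sketch, assuming the normalized queries arrive one per line; the function name countQueries and the sample lines are made up for the example.

```php
<?php
// Hypothetical helper: a PHP equivalent of `sort | uniq -c | sort -rn`.
function countQueries(array $lines): array
{
    $normalized = [];
    foreach ($lines as $line) {
        // Collapse whitespace runs and drop the trailing space and newline.
        $query = trim(preg_replace('/\s+/', ' ', $line));
        if ($query !== '') {
            $normalized[] = $query;
        }
    }
    $counts = array_count_values($normalized);
    arsort($counts); // most frequent queries first
    return $counts;
}

foreach (countQueries(["vasya pupkin \n", "vasya pupkin\n", "green light \n"]) as $query => $freq) {
    echo $query, "\t", $freq, "\n"; // two tab-separated fields, ready for import
}
```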
As you may have noticed, those were the hints.
With the hints out of the way, let's move on to fixing errors. As everyone knows, the name Britney can be mistyped in just under 600 different ways. And she has a sister, too. None of those variants can be searched for, at all! A query with an error finds nothing. The page comes up empty. Adsense / Ydirect / Unameit shows poorly matched ads. Nobody clicks. The startup burns out. Nobody buys commercial services for Sphinx, and the project dies out. This is unacceptable: keywords must be fixed, urgently!
Of course, there is always the option of plugging in ispell, aspell, hunspell, or whatever spell is in fashion right now. Obviously, that always runs into either the quality of the xxxspell dictionary or the lack of a dictionary for the language you need. It is also clear that it will not help with neologisms (slang), special terms (acidum acetylsalicylicum), geographic names, and so on. So we want something better. And besides, this blog is not about ispell, so we have to.
Once again we need a frequency dictionary, this time with more than 10,000 keywords; on the other hand, there is no point in "correcting" rare but valid words into frequent similar ones. A dictionary of 1 million words is usually enough, and 10 million certainly is. The command changes to indexer --buildstops dict.txt 10000000 --buildfreqs MYINDEXNAME (by the way, on a C2D E8500 it runs at over 20 MB/s). Unlike with hints, an SQL lookup in such a dictionary will not help: there is more data, and the type of query is different. But Sphinx will help.
The main idea is as follows. For each word in the dictionary, generate its set of trigrams, that is, every run of 3 consecutive characters. Index the trigrams with Sphinx. To find replacement candidates, build the trigrams of the misspelled word and look them up in the index. This returns a number of candidates. The more trigrams match, the smaller the difference in word length, and the more frequent the candidate, the better. Now let's go through all of this in more detail with a live example.
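Generating the trigrams themselves takes only a few lines. This is a minimal sketch, assuming single-byte (ASCII) words and two underscores of padding on each side, as in the sample rows of this post; the function name buildTrigrams is made up for the example.

```php
<?php
// Hypothetical helper; assumes a single-byte (ASCII) word.
function buildTrigrams(string $word): string
{
    $padded = '__' . $word . '__'; // padding marks the word boundaries
    $trigrams = [];
    for ($i = 0; $i + 3 <= strlen($padded); $i++) {
        $trigrams[] = substr($padded, $i, 3);
    }
    return implode(' ', $trigrams);
}

echo buildTrigrams('deal'), "\n"; // __d _de dea eal al_ l__
```

For UTF-8 text the byte-based strlen and substr would have to be replaced with multibyte-aware equivalents.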
The dictionary produced by indexer --buildstops still looks like this (I picked a different fragment so that the words are more real and the example clearer).
...
deal 32431
created 32429
light 32275
needed 32252
mood 32185
death 32140
behind 32136
usually 32113
action 32053
line 32052
pissed 32043
...
For each word we need to create a unique ID, store the word itself and its frequency, build the trigrams, and save everything to the database. If the indexed data itself contains typos, it makes sense to drop words that are too rare, since they are probably typos.
CREATE TABLE suggest (
    id INTEGER PRIMARY KEY AUTO_INCREMENT NOT NULL,
    keyword VARCHAR(255) NOT NULL,
    trigrams VARCHAR(255) NOT NULL,
    freq INTEGER NOT NULL
);
INSERT INTO suggest VALUES
...
(735, 'deal', '__d _de dea eal al_ l__', 32431),
(736, 'created', '__c _cr cre rea eat ate ted ed_ d__', 32429),
(737, 'light', '__l _li lig igh ght ht_ t__', 32275),
(738, 'needed', '__n _ne nee eed ede ded ed_ d__', 32252),
(739, 'mood', '__m _mo moo ood od_ d__', 32185),
(740, 'death', '__d _de dea eat ath th_ h__', 32140),
(741, 'behind', '__b _be beh ehi hin ind nd_ d__', 32136),
(742, 'usually', '__u _us usu sua ual all lly ly_ y__', 32113),
(743, 'action', '__a _ac act cti tio ion on_ n__', 32053),
(744, 'line', '__l _li lin ine ne_ e__', 32052),
(745, 'pissed', '__p _pi pis iss sse sed ed_ d__', 32043),
(746, 'bye', '__b _by bye ye_ e__', 32012),
...
Only the trigrams field needs to be indexed, but to rank the candidates (that is, to pick the best correction) we also need the length of the word and its frequency in the collection.
sql_query = SELECT id, trigrams, freq, LENGTH(keyword) AS len FROM suggest
sql_attr_uint = freq
sql_attr_uint = len
Suspicious words are identified from the results of the search query: if there are too few results (or none at all), we go through the $result["words"] section of the response and look at the number of documents for each word. If there are few documents, we try to correct that word. For example, for the query "green liight" on a test index, "green" has 34421 occurrences and "liight" far fewer, so it is immediately clear which word needs fixing. The specific threshold for "few" is very individual and depends on the collection of documents and on the queries. Look at the dictionary and the query log and pick the magic constants.
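The check itself might look like the sketch below. It assumes that $result["words"] maps each keyword to an array with "docs" and "hits" counts; the function name findSuspiciousWords, the threshold of 50, and every count other than 34421 are invented for the illustration.

```php
<?php
// Hypothetical helper. $words mimics the $result["words"] section:
// keyword => ["docs" => ..., "hits" => ...].
function findSuspiciousWords(array $words, int $minDocs): array
{
    $suspicious = [];
    foreach ($words as $word => $stats) {
        if ($stats['docs'] < $minDocs) {
            $suspicious[] = (string)$word; // too few documents: try to correct it
        }
    }
    return $suspicious;
}

// Only 34421 comes from the text; the other numbers are made up.
$words = [
    'green'  => ['docs' => 34421, 'hits' => 40000],
    'liight' => ['docs' => 2,     'hits' => 2],
];
print_r(findSuspiciousWords($words, 50)); // the threshold is a "magic constant"
```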
We build the trigrams and run a query against the special trigram index. Because the word contains an error, it is unlikely that all trigrams will match. On the other hand, a candidate that matches on only one trigram is of little interest: that happens only when 3 letters in the middle of the word match (and nothing else), or 1 letter at the start (and nothing else). So we use the quorum operator, which does exactly what we are after: it returns all documents in which at least 2 trigrams match. We also add a length constraint: we assume that the length of the correct variant differs by no more than 2 characters.
$len = strlen("liight");
$cl->SetFilterRange("len", $len-2, $len+2);
$cl->Query('"__l _li lii iig igh ght ht_ t__"/2', 'suggest');
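The query string and the length range can be assembled from a list of trigrams. This is a rough sketch, not part of the Sphinx API; the function names buildQuorumQuery and lengthRange are made up for the example.

```php
<?php
// Hypothetical helper: wraps the trigrams in a quoted phrase with a quorum threshold.
function buildQuorumQuery(array $trigrams, int $quorum = 2): string
{
    // The quorum cannot exceed the number of trigrams in the phrase.
    $quorum = max(1, min($quorum, count($trigrams)));
    return '"' . implode(' ', $trigrams) . '"/' . $quorum;
}

// Hypothetical helper: the allowed candidate lengths for a word.
function lengthRange(string $word, int $maxDiff = 2): array
{
    $len = strlen($word); // byte length, fine for ASCII words
    return [max(0, $len - $maxDiff), $len + $maxDiff];
}

$trigrams = ['__l', '_li', 'lii', 'iig', 'igh', 'ght', 'ht_', 't__'];
echo buildQuorumQuery($trigrams), "\n";
print_r(lengthRange('liight'));
```

The two results would then be passed to Query() and SetFilterRange() as in the lines above.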
The candidates found have to be sorted, and the best one picked. Recall the factors we have:
- the more trigrams match, the better;
- the smaller the difference in word length, the better;
- the more frequent the candidate, the better.
Recent versions of Sphinx can compute all of these factors and do the sorting entirely on the server side. The number of matching trigrams can be computed with the SPH_RANK_WORDCOUNT ranker (in this special search, every trigram acts as a separate keyword). The difference in word length is abs(len-$len), and the frequency is stored in the freq attribute. We compute the factors, combine them, and pick the best candidate.
$cl->SetMatchMode(SPH_MATCH_EXTENDED2);
$cl->SetRankingMode(SPH_RANK_WORDCOUNT);
$cl->SetSelect("*, @weight+2-abs(len-$len) AS myrank");
$cl->SetSortMode(SPH_SORT_EXTENDED, "myrank DESC, freq DESC");
Hooray! For the word liight, "light" is found successfully. (More precisely, Sphinx finds the ID, and the "light" row is then fetched from the database.)
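To see why "light" wins, the same arithmetic can be replayed on the client. This is a sketch for illustration only: in the setup described here Sphinx does the work on the server. The function name rankCandidates is made up, and the "matched" counts were worked out by hand from the trigram lists of "liight", "light" and "line".

```php
<?php
// Hypothetical client-side replica of "myrank DESC, freq DESC".
// Each candidate: keyword, matched (number of matching trigrams), len, freq.
function rankCandidates(array $candidates, int $queryLen): array
{
    foreach ($candidates as &$candidate) {
        $candidate['myrank'] = $candidate['matched'] + 2 - abs($candidate['len'] - $queryLen);
    }
    unset($candidate); // break the reference left by foreach
    usort($candidates, function (array $a, array $b): int {
        // Higher myrank first, then higher frequency.
        return [$b['myrank'], $b['freq']] <=> [$a['myrank'], $a['freq']];
    });
    return $candidates;
}

$ranked = rankCandidates([
    ['keyword' => 'line',  'matched' => 2, 'len' => 4, 'freq' => 32052],
    ['keyword' => 'light', 'matched' => 6, 'len' => 5, 'freq' => 32275],
], strlen('liight'));
echo $ranked[0]['keyword'], "\n"; // light (myrank 7 versus 2 for "line")
```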
This is how the demo shipped with Sphinx 0.9.9-rc2 works (see the misc/suggest directory in the archive), so you can try it on your own data right away without writing any extra code :-)
The demo is, understandably, imperfect and needs some finishing with a file. (Sorry, couldn't resist.) There is a risk that on some PHP installations it will not work with Russian out of the box, because it expects UTF-8 but uses substr. FREQ_THRESHOLD will almost certainly need tuning: it is the minimum number of occurrences below which a word is considered a typo and is left out of the special index. Lower it for small data collections and raise it for large ones. For the same reason (so that a rare junk word is not ranked above a frequent one), the myrank formula may need tweaking. For example, add an extra point to the number of matched trigrams for every 1000-fold difference in frequency:
$cl->SetSelect("*, @weight+2-abs(len-$len)+ln(freq)/ln(1000) AS myrank");
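The effect of the extra term is easy to check by hand. The sketch below mirrors the formula in plain PHP; the function names frequencyBonus and myRank are made up for the example, and freq is assumed to be greater than zero.

```php
<?php
// Hypothetical helpers mirroring the formula above on the client side.
function frequencyBonus(int $freq): float
{
    return log($freq) / log(1000); // grows by 1 for every 1000-fold increase in frequency
}

function myRank(int $matched, int $candidateLen, int $queryLen, int $freq): float
{
    return $matched + 2 - abs($candidateLen - $queryLen) + frequencyBonus($freq);
}

// Two candidates that differ only in frequency, by a factor of 1000.
$rare     = myRank(6, 5, 6, 32);
$frequent = myRank(6, 5, 6, 32000);
printf("%.3f\n", $frequent - $rare); // the frequent one gets one extra point
```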
Moreover, the trigram trick is effective but very simple, and it ignores a lot. The order of the trigrams is not taken into account, although for words of reasonable length this is generally not a problem. More interesting is that it does not account for how people actually make mistakes: by trigram count, swapping two adjacent letters is no different from replacing those letters with any others (!); the proximity of letters on the keyboard is not taken into account at all; phonetic proximity (actually/akshully) is not taken into account at all either. A fairly obvious way to improve the quality of corrections with little effort: instead of the 1 best candidate, fetch 10-20 of them, compute the Levenshtein distance on the client side, and adjust the results. With more effort, you can re-rank the top one or two dozen candidates with any other algorithm of your own.
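The Levenshtein step could look like the sketch below. It uses PHP's built-in levenshtein() function, which compares bytes, so ASCII words are assumed; the function name rerankByLevenshtein and the candidate list are made up for the example.

```php
<?php
// Hypothetical helper: re-rank the candidates returned by Sphinx by edit distance.
// Candidates with equal distance keep their original (Sphinx) order.
function rerankByLevenshtein(string $typo, array $candidates): array
{
    $scored = [];
    foreach (array_values($candidates) as $position => $candidate) {
        $scored[] = [levenshtein($typo, $candidate), $position, $candidate];
    }
    sort($scored); // compares distance first, then original position
    return array_column($scored, 2);
}

print_r(rerankByLevenshtein('liight', ['line', 'flight', 'light']));
```

Here "light" is 1 edit away from "liight", "flight" is 2, and "line" is 4, so the list comes back as light, flight, line.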
On the whole, the demo can be used as is. But it is still a demo, and there is plenty of room left for further creativity. Create, invent, try!