In the first part of the article we looked at universal Levenshtein automata, a powerful tool for picking out the words that lie within a given Levenshtein distance of a word W. Now we will see how to use this tool to solve the problem of fuzzy dictionary search efficiently.
The simplest and most obvious algorithm for fuzzy dictionary search with Levenshtein automata is exhaustive search: for every word in the dictionary, the Levenshtein (Damerau-Levenshtein) distance to the search query is evaluated. An example implementation in C# is given in the first part of the article.
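For reference, a minimal Python sketch of this exhaustive search. Unlike the C# example from the first part, it evaluates the distance with the classic dynamic-programming table rather than a Levenshtein automaton, and it uses the "optimal string alignment" (OSA) variant of the Damerau-Levenshtein distance (adjacent transpositions, no substring edited twice). The function names are illustrative, not taken from the article's code.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance, 'optimal string alignment' (OSA) variant."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[rows - 1][cols - 1]


def exhaustive_search(dictionary: Iterable[str], query: str, n: int) -> List[str]:
    """Compare the query with every dictionary word; keep those within distance n."""
    return [word for word in dictionary if osa_distance(word, query) <= n]
```

For example, `exhaustive_search(["fast", "funny", "fully", "fuzzy"], "fuzy", 1)` returns `["fuzzy"]`. Note that OSA can differ from the unrestricted Damerau-Levenshtein distance: `osa_distance("ca", "abc")` is 3, whereas the unrestricted distance is 2.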
Even this simple algorithm is faster than computing the distance with dynamic programming. However, there are more efficient algorithms.
Schulz and Mihov's basic algorithm
Consider the first such algorithm, proposed, as far as I know, by Schulz and Mihov (2002); I will call it the basic Schulz and Mihov algorithm, or simply the basic algorithm. Suppose we are given a deterministic Levenshtein automaton $A_N(W)$ for a word W and an edit-distance threshold N. Suppose also that the dictionary D under consideration is represented as a deterministic finite automaton $A_D$ with input alphabet E. We denote the states of the two automata by $q$ and $q_D$, their transition functions by $V$ and $V_D$, and their sets of final states by $F$ and $F_D$, respectively. The fuzzy dictionary search algorithm proposed by Schulz and Mihov is a standard backtracking search procedure, which works as follows.

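The algorithm starts from the initial states of the automata. Each time a new character is fed in, the pair of successor states is pushed onto the stack, provided the successor state is not empty. If the next states of both automata are final, a matching word has been found.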
The algorithm can be thought of as an "intersection" of finite-state machines: only words that correspond to final states of both automata end up in the result. Only prefixes that take both machines into non-empty states are explored.
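The procedure described above can be sketched in Python as follows. This is a minimal sketch under two assumptions: the dictionary automaton $A_D$ is a prefix tree (`TrieNode`), and instead of the precomputed universal automaton from the first part, $A_N(W)$ is emulated by a row of the OSA (Damerau-Levenshtein) dynamic-programming table, so a row whose minimum exceeds N plays the role of an empty state. All names are hypothetical.

```python
from typing import Dict, List, Optional


class TrieNode:
    """A state of the dictionary automaton A_D (a prefix-tree node)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # transition function V_D
        self.is_word = False                        # membership in F_D


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def next_row(query: str, row: List[int], prev_row: Optional[List[int]],
             ch: str, prev_ch: str) -> List[int]:
    """Emulated transition of A_N(W): extend the OSA distance row by ch."""
    new = [row[0] + 1]
    for j in range(1, len(query) + 1):
        cost = 0 if query[j - 1] == ch else 1
        value = min(row[j] + 1, new[j - 1] + 1, row[j - 1] + cost)
        if (prev_row is not None and j > 1
                and ch == query[j - 2] and prev_ch == query[j - 1]):
            value = min(value, prev_row[j - 2] + 1)  # adjacent transposition
        new.append(value)
    return new


def basic_search(root: TrieNode, query: str, n: int) -> List[str]:
    results = []
    # Stack entries: (trie node, prefix, current row, previous row, last char).
    stack = [(root, "", list(range(len(query) + 1)), None, "")]
    while stack:
        node, prefix, row, prev_row, last = stack.pop()
        if node.is_word and row[-1] <= n:  # both automata are in final states
            results.append(prefix)
        for ch, child in node.children.items():
            new = next_row(query, row, prev_row, ch, last)
            if min(new) <= n:  # the successor state of A_N(W) is not empty
                stack.append((child, prefix + ch, new, row, ch))
    return sorted(results)
```

The row minimum never decreases along a path, so discarding a branch once it exceeds N cannot lose a match. With `build_trie(["fast", "funny", "fully", "fuzzy"])`, the query "fsat" with N = 1 returns `["fast"]` through the transposition rule.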
The computational complexity of the algorithm depends on the size of the dictionary and on the edit distance N. When N reaches the length of the longest word in the dictionary, the algorithm degenerates into a complete enumeration of the states of $A_D$. In practice, however, small values of N are usually used, and then the algorithm visits only a very small subset of the states of $A_D$. For N = 0 the algorithm finds the word W in the dictionary in $O(|W|)$ time.
Note that the algorithm described is lossless: 100% of the dictionary words that lie within distance N of W end up in the result.
Implementation details
You are probably already familiar with the prefix tree (also called a "loaded tree", or trie), a data structure used to store dictionaries. The figure shows the prefix tree for a dictionary of four words: "fast", "funny", "fully" and "fuzzy".

If you are not familiar with prefix trees, you can read up on them in any publication that describes this structure in detail.
Why use a prefix tree to store the dictionary? Because a prefix tree can be viewed as a finite-state machine. Each node of the tree represents a state of the automaton. The initial state is the root of the tree, and the final states are the nodes that correspond to words. For each node and symbol there is at most one transition, so the automaton is deterministic.
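To make the automaton view concrete, a small Python sketch that builds the four-word dictionary as an explicit deterministic automaton: integer states, a transition table standing in for $V_D$, and a set of final states for $F_D$. This representation (`build_dfa`, `accepts`) is an illustrative assumption, not the article's C# data structure. Running `accepts` costs one transition per symbol, which is why the N = 0 case takes $O(|W|)$ time.

```python
def build_dfa(words):
    """Build the prefix tree of `words` as a deterministic automaton.

    States are integers (0 is the root, i.e. the initial state); `delta` maps
    (state, symbol) to exactly one next state, so the automaton is deterministic.
    """
    delta = {}
    final = set()
    next_state = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        final.add(state)  # the node where a word ends is a final state
    return delta, final, next_state  # next_state == number of states


def accepts(delta, final, word):
    """Run the automaton on `word`: one transition per symbol, O(|word|)."""
    state = 0
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in final
```

For `["fast", "funny", "fully", "fuzzy"]` this gives 15 states (the root plus 14 nodes, since the four words share the prefix "f" and three of them share "fu"); "fuzzy" is accepted, while the prefixes "fuzz" and "fun" are not.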
So, if we view the prefix tree as a deterministic finite-state machine and have a software implementation of the deterministic Levenshtein automaton, it is not difficult to turn the pseudocode of the algorithm into code in a programming language.
FB-Trie algorithm
In 2004, Mihov and Schulz proposed a modification of the above algorithm. Its main idea is to use a forward and a reverse prefix tree, combined with splitting the search query W into two nearly equal parts $W_1$ and $W_2$. This algorithm is known as FB-Trie (forward and backward trie).
A reverse prefix tree is a prefix tree built from the reversals of all the words in the dictionary. By the reversal of a word I simply mean the word written backwards.
An important point: Mihov and Schulz showed in their work that the Damerau-Levenshtein distance between the reversals of two strings equals the distance between the strings themselves.
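The reversal property is easy to spot-check empirically (a check, not a proof). The sketch below uses the optimal string alignment (OSA) variant of the Damerau-Levenshtein distance and compares $d(a, b)$ with $d(\mathrm{rev}(a), \mathrm{rev}(b))$ for all pairs of short strings over a small alphabet; the helper names are hypothetical.

```python
from itertools import product


def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance, 'optimal string alignment' (OSA) variant."""
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def reversal_invariant(alphabet: str, max_len: int) -> bool:
    """True if d(a, b) == d(a reversed, b reversed) for all strings up to max_len."""
    strings = ["".join(p) for k in range(max_len + 1)
               for p in product(alphabet, repeat=k)]
    return all(osa_distance(a, b) == osa_distance(a[::-1], b[::-1])
               for a in strings for b in strings)
```

`reversal_invariant("abc", 3)` checks all 1,600 ordered pairs of strings of length up to 3 and returns `True`.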
For N = 1 the algorithm relies on the following statement. A string S that lies within Damerau-Levenshtein distance $d(S, W) \le 1$ of a string W can be split into two parts $S_1$ and $S_2$ such that one of the following three conditions holds:
a) $d(S_1, W_1) = 0$ and $d(S_2, W_2) \le 1$
b) $d(S_1, W_1) \le 1$ and $d(S_2, W_2) = 0$
c) $d(S_1, W_1') = 0$ and $d(S_2, W_2') = 0$
In case c, the strings $W_1'$ and $W_2'$ are obtained from $W_1$ and $W_2$ by exchanging the last character of $W_1$ with the first character of $W_2$. For example, if $W_1$ = 'FU' and $W_2$ = 'ZZY', then $W_1'$ = 'FZ' and $W_2'$ = 'UZY'.
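A minimal sketch of how the query can be split and how $W_1'$ and $W_2'$ are formed. Where exactly to split is an assumption here (the text only says "two nearly equal parts"); this sketch puts the split at `len(w) // 2`, which reproduces the 'FU' / 'ZZY' example. The function names are hypothetical.

```python
from typing import Optional, Tuple


def split_query(w: str) -> Tuple[str, str]:
    """Split w into two nearly equal halves W1, W2 (W1 gets the shorter half)."""
    mid = len(w) // 2
    return w[:mid], w[mid:]


def swap_boundary(w1: str, w2: str) -> Optional[Tuple[str, str]]:
    """Return (W1', W2'): the last char of W1 and the first char of W2 exchanged.

    Returns None when either half is empty, since there is nothing to swap."""
    if not w1 or not w2:
        return None
    return w1[:-1] + w2[0], w1[-1] + w2[1:]
```

`split_query("FUZZY")` gives `("FU", "ZZY")`, and `swap_boundary("FU", "ZZY")` gives `("FZ", "UZY")`.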
How can we quickly find all dictionary words that satisfy case a? It is very simple: in the prefix tree, find the node for the prefix $W_1$, traverse all its descendants following the basic Schulz and Mihov algorithm, and select those whose suffixes are within distance 1 of $W_2$.
For case b, the reverse prefix tree comes in handy: find the node corresponding to the reversed $W_2$, traverse all its descendants, and select those within distance 1 of the reversed $W_1$, again following the basic algorithm.
For case c, it is enough to swap the two characters of W at the split boundary and check whether the resulting word is in the prefix tree.
To solve fuzzy dictionary search with the FB-Trie algorithm, it only remains to find the three word sets described above and take their union.
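A compact Python sketch of FB-Trie for N = 1, under stated assumptions: both tries are built from a plain word list, the Levenshtein automaton is emulated by a row of the OSA (Damerau-Levenshtein) dynamic-programming table, and the split point is `len(W) // 2`. The helper names (`find`, `search_below`, `fb_trie_search_n1`) are hypothetical. Case b relies on the reversal property: the reverse trie is searched with the reversed halves.

```python
from typing import Dict, Iterable, List, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find(root: TrieNode, prefix: str):
    """Follow `prefix` from the root; None if the path does not exist."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def search_below(start: TrieNode, target: str, n: int) -> List[str]:
    """Basic algorithm started at `start`: word-ending paths below it whose
    OSA distance to `target` is at most n."""
    found = []
    stack = [(start, "", list(range(len(target) + 1)), None)]
    while stack:
        node, path, row, prev_row = stack.pop()
        if node.is_word and row[-1] <= n:
            found.append(path)
        for ch, child in node.children.items():
            new = [row[0] + 1]
            for j in range(1, len(target) + 1):
                cost = 0 if target[j - 1] == ch else 1
                v = min(row[j] + 1, new[j - 1] + 1, row[j - 1] + cost)
                if (prev_row is not None and j > 1 and ch == target[j - 2]
                        and path[-1] == target[j - 1]):
                    v = min(v, prev_row[j - 2] + 1)  # adjacent transposition
                new.append(v)
            if min(new) <= n:
                stack.append((child, path + ch, new, row))
    return found


def fb_trie_search_n1(forward: TrieNode, backward: TrieNode, query: str) -> List[str]:
    w1, w2 = query[: len(query) // 2], query[len(query) // 2:]
    results: Set[str] = set()
    # Case a: S1 == W1 exactly, d(S2, W2) <= 1, using the forward trie.
    node = find(forward, w1)
    if node is not None:
        results.update(w1 + s for s in search_below(node, w2, 1))
    # Case b: S2 == W2 exactly, d(S1, W1) <= 1, using the reverse trie.
    node = find(backward, w2[::-1])
    if node is not None:
        results.update(s[::-1] + w2 for s in search_below(node, w1[::-1], 1))
    # Case c: transposition across the split point, checked by exact lookup.
    if w1 and w2:
        swapped = w1[:-1] + w2[0] + w1[-1] + w2[1:]
        node = find(forward, swapped)
        if node is not None and node.is_word:
            results.add(swapped)
    return sorted(results)
```

Usage: build `forward = build_trie(words)` and `backward = build_trie(w[::-1] for w in words)`. With a dictionary containing "fuzzy", "fzuzy" and "buzzy", the query "fuzzy" finds "fzuzy" only through case c and "buzzy" only through case b.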
For N = 2, the following cases have to be considered:
a) $d(S_1, W_1) = 0$ and $d(S_2, W_2) \le 2$
b) $1 \le d(S_1, W_1) \le 2$ and $d(S_2, W_2) = 0$
c) $d(S_1, W_1) = 1$ and $d(S_2, W_2) = 1$
In addition, if the last character of $S_1$ equals the first character of $W_2$ and the last character of $W_1$ equals the first character of $S_2$, two more cases are possible:
d) $d(S_1, W_1') = 0$ and $d(S_2, W_2') \le 1$
e) $d(S_1, W_1') \le 1$ and $d(S_2, W_2') = 0$
The first two cases are found easily, by analogy with cases a and b for N = 1, except that a Levenshtein automaton for N = 2 is used.
Cases d and e for N = 2 repeat cases a and b for N = 1, with the substrings $W_1'$ and $W_2'$ used instead of $W_1$ and $W_2$.
Case c is a little more involved. In the first step, we find directly in the prefix tree all nodes corresponding to prefixes within distance 1 of $W_1$ (basic algorithm). In the second step, for each such node we traverse its descendants and select those whose suffixes are within distance 1 of $W_2$ (again, the basic algorithm).
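The two steps just described can be sketched in Python, assuming the dictionary is stored as a trie and the Levenshtein automaton is emulated by a row of the OSA (Damerau-Levenshtein) dynamic-programming table. `walk` and `case_c_n2` are hypothetical names. As in the text, both steps use the threshold "at most 1", so the result may also contain words that case a already finds; a set removes such duplicates.

```python
from typing import Dict, Iterable, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def step(target: str, row: List[int], prev_row, ch: str, prev_ch: str) -> List[int]:
    """Extend the OSA dynamic-programming row for `target` by one character."""
    new = [row[0] + 1]
    for j in range(1, len(target) + 1):
        cost = 0 if target[j - 1] == ch else 1
        v = min(row[j] + 1, new[j - 1] + 1, row[j - 1] + cost)
        if (prev_row is not None and j > 1
                and ch == target[j - 2] and prev_ch == target[j - 1]):
            v = min(v, prev_row[j - 2] + 1)  # adjacent transposition
        new.append(v)
    return new


def walk(start: TrieNode, target: str, n: int, words_only: bool):
    """Basic algorithm from an arbitrary node: yield (path, node) for paths below
    `start` within distance n of `target` (word-final nodes only if words_only)."""
    stack = [(start, "", list(range(len(target) + 1)), None)]
    while stack:
        node, path, row, prev_row = stack.pop()
        if row[-1] <= n and (node.is_word or not words_only):
            yield path, node
        for ch, child in node.children.items():
            new = step(target, row, prev_row, ch, path[-1] if path else "")
            if min(new) <= n:
                stack.append((child, path + ch, new, row))


def case_c_n2(root: TrieNode, w1: str, w2: str) -> List[str]:
    results = set()
    for s1, node in walk(root, w1, 1, words_only=False):  # step 1: prefixes near W1
        for s2, _ in walk(node, w2, 1, words_only=True):   # step 2: suffixes near W2
            results.add(s1 + s2)
    return sorted(results)
```

For example, with $W_1$ = "fu" and $W_2$ = "zzy", the word "fazzi" is found through the prefix "fa" (one edit from "fu") and the suffix "zzi" (one edit from "zzy").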
For N = 3, seven cases have to be considered. I will not list them here; see the original article by Mihov and Schulz (2004). The scheme can be continued by analogy for any N, but this is rarely needed when solving practical problems.
Performance evaluation
Interestingly, the search time of the FB-Trie algorithm decreases as the length of the word W grows.
A detailed analysis of FB-Trie search time compared with other well-known fuzzy search algorithms can be found in Leonid Boytsov's work "Indexing methods for approximate dictionary searching: comparative analysis" (2011). It gives a thorough comparison of search time and memory consumption for algorithms such as:
- exhaustive search;
- various modifications of the n-gram method;
- various modifications of the sample expansion method;
- signature hashing;
- FB-Trie and other algorithms.
Rather than repeating all the numbers and graphs here, I will limit myself to the general conclusions for natural languages.
In short, the FB-Trie algorithm offers a reasonable trade-off between performance and memory consumption. If your application must support an edit distance of 2 or more and the dictionary contains more than 500,000 words, FB-Trie is a sensible choice: it gives the lowest search time at a reasonable memory cost (about 300% of the memory occupied by the lexicon).
If you limit the edit distance to N = 1, or if your dictionary is small, some algorithms may run faster (for example, the Mor-Fraenkel method or FastSS), but be prepared for higher memory consumption (up to 20,000% of the lexicon size). If you have tens of gigabytes of RAM available for the fuzzy index, these methods can be used with large dictionaries as well.
To help the reader judge how much 500,000 words is, here are some figures on the number of words in Russian:
- Lopatin's spelling dictionary contains 162,240 words; the lexicon file is 2 MB.
- The list of Russian surnames contains at least 247,780 surnames; the lexicon file is 4.6 MB.
- A. A. Zaliznyak's full accented paradigm of Russian contains 2,645,347 word forms; the lexicon file is about 35 MB.
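To put the memory percentages above in perspective, a rough back-of-the-envelope calculation for the three lexicons listed. It assumes, as a simplification, that index size is a constant multiple of the lexicon file size (about 3x for FB-Trie, at most 200x for the faster N = 1 methods); real ratios depend on the implementation and the data.

```python
# Lexicon file sizes from the list above, in megabytes.
LEXICONS_MB = {
    "Lopatin's spelling dictionary": 2.0,
    "Russian surnames": 4.6,
    "Zaliznyak's accented paradigm": 35.0,
}

FB_TRIE_RATIO = 3.0          # "about 300%" of the lexicon size
FAST_N1_MAX_RATIO = 200.0    # "up to 20,000%" (e.g. Mor-Fraenkel, FastSS)


def index_size_mb(lexicon_mb: float, ratio: float) -> float:
    """Rough index size, assuming it is a constant multiple of the lexicon size."""
    return round(lexicon_mb * ratio, 1)


if __name__ == "__main__":
    for name, size in LEXICONS_MB.items():
        print(f"{name}: FB-Trie ~{index_size_mb(size, FB_TRIE_RATIO)} MB, "
              f"fast N = 1 methods up to ~{index_size_mb(size, FAST_N1_MAX_RATIO)} MB")
```

For the 35 MB Zaliznyak lexicon this comes to roughly 105 MB for FB-Trie versus up to about 7,000 MB (7 GB) for the memory-hungry N = 1 methods.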
But what if the dictionary cannot be stored as two prefix trees, for example because it is represented as a sorted list? In that case Levenshtein automata can still be used for fuzzy search, but it is impractical: probably only various modifications of exhaustive search remain (for example, with pruning by word length $|W| \pm N$), and they give no performance gain over methods that are simpler to implement (such as the sample expansion algorithm).
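A sketch of the length-pruned exhaustive search mentioned above, assuming an arbitrary list of words (sorted or not). The pruning is safe because every edit operation changes the length of a string by at most one, so $d(S, W) \ge \lvert |S| - |W| \rvert$. The function name is hypothetical, and the distance is the OSA variant of Damerau-Levenshtein.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance, 'optimal string alignment' (OSA) variant."""
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def length_filtered_search(dictionary: Iterable[str], query: str, n: int) -> List[str]:
    """Exhaustive scan that skips words whose length lies outside |W| +/- N
    before paying for the full distance computation."""
    return [word for word in dictionary
            if abs(len(word) - len(query)) <= n and osa_distance(word, query) <= n]
```

The length check only avoids distance computations; every word is still visited, which is why this stays an exhaustive search.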
Note that the basic Schulz and Mihov algorithm needs only half as much memory as FB-Trie. The price, however, is a search time that is an order of magnitude longer (according to the authors of the algorithm).
With this, I conclude the review of fuzzy dictionary search algorithms based on Levenshtein automata.
By the way, the complete source code of the C# spellchecker is linked in the references below. My implementation is certainly not optimal in terms of performance, but it may help you understand how the FB-Trie algorithm works, and perhaps help you solve your own applied problems.
Thank you to everyone who read this post for your interest.
References
- The first part of my post
- Levenshtein automata and the basic Schulz and Mihov algorithm (2002)
- Backtracking
- The FB-Trie algorithm in the article by Mihov and Schulz (2004)
- Leonid Boytsov's site dedicated to fuzzy search
- A good post on fuzzy search in dictionaries and text
- A Habr post on prefix trees
- The FastSS algorithm
- C# source code for the article
- Java implementations: 1 and 2
- A collection of Russian-language dictionaries