The code for this post, as well as the post itself, is on GitHub.
Until recently, regular expressions seemed like a kind of magic to me. I never understood how you could determine whether a string matches a given regular expression. Now I know! Below is an implementation of a simple regular expression engine in under 200 lines of code.
Part 1: Parsing
The Spec
Implementing regular expressions in full is a rather tedious job and, worse, it teaches you little. The version we implement here is just enough to learn from without sinking into routine. Our regular expression language supports:
- . matches any character
- | matches either side: abc|cde matches abc or cde
- + matches one or more of the previous pattern
- * matches zero or more of the previous pattern
- ( and ) for grouping
The set of options is small, but it is enough to build some interesting regular expressions, such as m (t|n| ) | b, which finds Star Wars subtitles without finding Star Trek ones, or (..)*, which matches every string of even length.
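To get a feel for what these operators mean before building anything, here is a minimal sketch that checks a few patterns against Java's built-in regex engine (through String.matches), not the engine developed in this post. It assumes the operators of our small language behave the same way in Java's syntax; SpecExamples and fullMatch are names made up for this illustration.

```scala
object SpecExamples {
  // String.matches succeeds only if the whole input matches the pattern.
  def fullMatch(input: String, pattern: String): Boolean =
    input.matches(pattern)

  def main(args: Array[String]): Unit = {
    println(fullMatch("abcd", "(..)*"))   // four characters: two groups of two
    println(fullMatch("abc", "(..)*"))    // odd length
    println(fullMatch("cde", "abc|cde"))  // either side of |
    println(fullMatch("aaa", "a+"))       // one or more
    println(fullMatch("", "a*"))          // zero or more
  }
}
```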
The Plan of Attack
We will evaluate regular expressions in three stages:
- Parse the regular expression into a syntax tree
- Convert the syntax tree into a state machine
- Evaluate the state machine against our string
To evaluate regular expressions we will use a state machine called an NFA (more on this below). At a high level, the NFA represents the regular expression. As input is consumed, we move from state to state in the NFA. If we reach a point where no valid transition can be taken, the regular expression does not match the string.
This approach was first demonstrated by Ken Thompson, one of the authors of Unix. In his 1968 article in CACM he outlined the principles of implementing a text editor and included this approach as a regular expression interpreter. The only difference is that his article was written in 7094 machine code. Things were more hardcore back then.
This algorithm is a simplification of how non-backtracking engines such as RE2 evaluate regular expressions in provably linear time. It differs considerably from the regular expression engines in Python and Java, which use backtracking. On certain inputs, those engines can run almost forever even on short strings. Our implementation runs in O(length(input) * length(expression)).
My approach roughly follows the strategy Russ Cox lays out in his excellent post.
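Representing a Regular Expression
Let's step back and think about how to represent a regular expression. Before we can start evaluating one, we need to convert it into a data structure the computer can work with. Strings have a linear structure, but regular expressions have a natural hierarchy.
Consider the string abc|(c|(de)). If we left it as a string, we would have to go back and jump around, keeping track of the various sets of parentheses while evaluating the expression. One solution is to convert it into a tree, which a computer can easily traverse. For example, b+a becomes a tree with a concatenation at the root whose two children are the repetition b+ and the literal a.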
To represent the tree we need a hierarchy of classes. For example, the Or class needs two subtrees to represent its two sides. From the spec it is clear that we need to recognize four different components of a regular expression: +, *, |, and character literals such as ., a and b. In addition, we need to be able to represent one expression following another. Here are the classes:

    abstract class RegexExpr

    // ., a, b
    case class Literal(c: Char) extends RegexExpr

    // a|b
    case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr

    // ab -> Concat(a,b); abc -> Concat(a, Concat(b, c))
    case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr

    // a*
    case class Repeat(expr: RegexExpr) extends RegexExpr

    // a+
    case class Plus(expr: RegexExpr) extends RegexExpr

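As a quick illustration of how these classes nest, the sketch below writes out by hand the trees for the two expressions mentioned earlier, b+a and abc|(c|(de)), and walks them with a small recursive function. The object name SyntaxTreeExamples and the helper size are made up for this example; the trees follow the precedence rules described in this post.

```scala
object SyntaxTreeExamples {
  abstract class RegexExpr
  case class Literal(c: Char) extends RegexExpr
  case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr
  case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr
  case class Repeat(expr: RegexExpr) extends RegexExpr
  case class Plus(expr: RegexExpr) extends RegexExpr

  // b+a: "+" binds to b first, then the result is concatenated with a
  val bPlusA: RegexExpr = Concat(Plus(Literal('b')), Literal('a'))

  // abc|(c|(de)): "|" is the weakest operator, so it sits at the root
  val nested: RegexExpr = Or(
    Concat(Literal('a'), Concat(Literal('b'), Literal('c'))),
    Or(Literal('c'), Concat(Literal('d'), Literal('e')))
  )

  // Traversing the tree is a plain recursive match; here we count nodes.
  def size(e: RegexExpr): Int = e match {
    case Literal(_)   => 1
    case Or(a, b)     => 1 + size(a) + size(b)
    case Concat(a, b) => 1 + size(a) + size(b)
    case Repeat(r)    => 1 + size(r)
    case Plus(r)      => 1 + size(r)
  }
}
```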
Parsing a Regular Expression
To get from a string to a tree representation, we need a conversion process called parsing. I won't go into detail about building parsers. Instead, I'll give just enough information to point you in the right direction if you want to write your own. Here I'll briefly describe how I parsed regular expressions with Scala's parser combinator library.
Scala's parser library lets us write a parser just by writing down the set of rules that describe our language. Unfortunately it uses a lot of obtuse symbols, but I hope you can look past the noise and get a general understanding of what is going on.
When implementing a parser we need to decide on the order of operations. Just as PEMDAS applies to arithmetic (a mnemonic for the order of arithmetic operations: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction), a different set of rules applies to regular expressions. We can express this more formally with the idea of an operator binding to the characters next to it. Different operators bind with different strengths: in an expression like 5 + 6 * 4, the operator * binds more tightly than +. In regular expressions, * binds more tightly than |. We can picture this as a tree in which the weakest operators end up at the top.
It follows that we should parse the weakest operators first and the stronger ones afterwards. When parsing, you can think of this as extracting an operator, adding it to the tree, and then recursively processing the two remaining parts of the string.
In regular expressions the order of binding strength, from strongest to weakest, is:
- Character literals and parentheses
- + and *
- Concatenation (a after b)
- |
Since there are four levels of binding strength, we need four different kinds of expressions. We named them (somewhat arbitrarily) lit, lowExpr (+, *), midExpr (concatenation) and highExpr (|). Let's move on to the code. First we create a parser for the most basic level, a single character:
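
    // RegexParsers comes from the scala-parser-combinators module
    import scala.util.parsing.combinator._

    object RegexParser extends RegexParsers {
        // a single word character or a literal "."
        def charLit: Parser[RegexExpr] = ("""\w""".r | ".") ^^ {
            char => Literal(char.head)
        }
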
Let's take a moment to explain the syntax. This code defines a parser that builds a RegexExpr. The right-hand side says: find something that matches \w (any word character) or a period. If you find it, turn it into a Literal.
Parentheses have the strongest binding, so they must be defined at the lowest level of the parser. However, we need to be able to put anything inside parentheses. The following code accomplishes this:

        def parenExpr: Parser[RegexExpr] = "(" ~> highExpr <~ ")"
        def lit: Parser[RegexExpr] = charLit | parenExpr

Here we define * and +:

        def repeat: Parser[RegexExpr] = lit <~ "*" ^^ { case l => Repeat(l) }
        def plus: Parser[RegexExpr] = lit <~ "+" ^^ { case p => Plus(p) }
        def lowExpr: Parser[RegexExpr] = repeat | plus | lit

次ã«ã次ã®ã¬ãã«ãå®çŸ©ããŸã-é£çµïŒ
def concat: Parser[RegexExpr] = rep(lowExpr) ^^ { case list => listToConcat(list)} def midExpr: Parser[RegexExpr] = concat | lowExpr
Finally, we define or:

        // the right operand is highExpr rather than midExpr, so chains such as a|b|c parse too
        def or: Parser[RegexExpr] = midExpr ~ "|" ~ highExpr ^^ { case l ~ "|" ~ r => Or(l, r) }

And lastly we define highExpr. highExpr is an or, the weakest binding operator, or, if there is no or, a midExpr.

        def highExpr: Parser[RegexExpr] = or | midExpr

次ã«ãããã€ãã®ãã«ããŒã³ãŒããè¿œå ããŠçµäºããŸãã
def listToConcat(list: List[RegexExpr]): RegexExpr = list match { case head :: Nil => head case head :: rest => Concat(head, listToConcat(rest)) } def apply(input: String): Option[RegexExpr] = { parseAll(highExpr, input) match { case Success(result, _) => Some(result) case failure : NoSuccess => None } } }
And that's it! With this Scala code you can generate a syntax tree for any regular expression that meets the spec. The resulting data structure is a tree.
Now that we can convert regular expressions into syntax trees, we are well on the way to evaluating them.
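The same four-level structure can also be written without a parser library. The following is a rough hand-written recursive-descent sketch, not the parser used in this post: each level calls the next stronger one, mirroring highExpr, midExpr, lowExpr and lit. The object name HandParser, the Result type and the function parse are assumptions made for this illustration, and isLetterOrDigit only approximates \w.

```scala
object HandParser {
  abstract class RegexExpr
  case class Literal(c: Char) extends RegexExpr
  case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr
  case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr
  case class Repeat(expr: RegexExpr) extends RegexExpr
  case class Plus(expr: RegexExpr) extends RegexExpr

  // A successful parse returns the tree and the position just after it.
  type Result = Option[(RegexExpr, Int)]

  // Weakest level: midExpr, optionally followed by "|" and another highExpr
  def highExpr(s: String, pos: Int): Result =
    midExpr(s, pos).flatMap { case (left, p) =>
      if (p < s.length && s(p) == '|')
        highExpr(s, p + 1).map { case (right, q) => (Or(left, right), q) }
      else Some((left, p))
    }

  // Concatenation: one or more lowExpr in a row, nested to the right
  def midExpr(s: String, pos: Int): Result =
    lowExpr(s, pos).map { case (first, p) =>
      midExpr(s, p) match {
        case Some((rest, q)) => (Concat(first, rest), q)
        case None            => (first, p)
      }
    }

  // Postfix * and +
  def lowExpr(s: String, pos: Int): Result =
    lit(s, pos).map { case (e, p) =>
      if (p < s.length && s(p) == '*') (Repeat(e), p + 1)
      else if (p < s.length && s(p) == '+') (Plus(e), p + 1)
      else (e, p)
    }

  // Strongest level: a single character or a parenthesised expression
  def lit(s: String, pos: Int): Result =
    if (pos >= s.length) None
    else if (s(pos) == '(')
      highExpr(s, pos + 1).flatMap { case (e, p) =>
        if (p < s.length && s(p) == ')') Some((e, p + 1)) else None
      }
    else if (s(pos).isLetterOrDigit || s(pos) == '_' || s(pos) == '.')
      Some((Literal(s(pos)), pos + 1))
    else None

  // Succeeds only if the whole input was consumed.
  def parse(s: String): Option[RegexExpr] =
    highExpr(s, 0).collect { case (e, p) if p == s.length => e }
}
```

With this sketch, HandParser.parse("b+a") yields the Concat of Plus(Literal('b')) and Literal('a') described earlier, and unbalanced input such as "(a" yields None.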
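Part 2: Generating an NFA
Converting the Syntax Tree to an NFA
In the previous part we converted the flat string representation of a regular expression into a hierarchical syntax tree. In this part we convert the syntax tree into a state machine. The state machine linearizes the regular expression components into a graph, producing an "a is followed by b, which is followed by c" relationship. The graph representation makes it easier to evaluate against a candidate string.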
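Why make yet another transformation just to match a regular expression?
Certainly, we could translate directly from the string to a state machine. We could even try to evaluate the regular expression directly from the syntax tree or the string. But that would mean dealing with considerably more complicated code. By lowering the level of abstraction gradually, we keep the code at each stage easy to understand. This matters especially when building something like a regular expression interpreter, which has an almost endless number of edge cases.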
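NFAs, DFAs and You
We will build a state machine called an NFA, or nondeterministic finite automaton. It has two kinds of components: states and transitions. When drawn in a diagram, states are shown as circles and transitions as arrows. The match state is shown as a double circle. If a transition is labeled, we must consume that character from the input to take the transition. Transitions can also be unlabeled, which means we can take them without consuming input.
Note: in the literature this is sometimes called ε.
The state machine for the simple regular expression "ab" is a start state, a transition labeled a to a second state, and a transition labeled b to the match state.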
以äžã®å³ã¯ãããŒãããåãæ¿ãããšãã«åãå
¥åãæ¶è²»ãã2ã€ã®ãã¹ã瀺ããŠãããããããŒãã«ã¯ç¹å®ã®å
¥åã«å¯ŸããŠããã€ãã®æ£ããåŸç¶ã®ç¶æ
ãããå ŽåããããŸãã
ããã決å®è«çãªæéç¶æ
ãã·ã³ïŒDFAïŒãšæ¯èŒããŠãã ãããDFAã§ã¯ããã®ååã瀺ãããã«ãäžããããå
¥åãåäžã®ç¶æ
å€åãåŒãèµ·ããå¯èœæ§ããããŸãã äžæ¹ã§ããã®å¶éã®åæžã«ãããåæãå°ãè€éã«ãªããŸãããäžæ¹ã§ãããã«ãããããã«ãããªãŒããã®çæã倧å¹
ã«ç°¡çŽ åãããŸãã åºæ¬çã«ãNFAãšDFAã¯ãè¡šçŸã§ããã¹ããŒããã·ã³ãäºãã«äŒŒãŠããŸãã
Transforming, in Theory
Let's outline a strategy for converting the syntax tree into an NFA. It may seem daunting, but we will see that breaking the process into composable pieces makes it straightforward. Recall the syntax elements we need to transform:
- Character literals: Literal(c: Char)
- *: Repeat(r: RegexExpr)
- +: Plus(r: RegexExpr)
- Concatenation: Concat(first: RegexExpr, second: RegexExpr)
- |: Or(a: RegexExpr, b: RegexExpr)
The transformations that follow were first described by Thompson in his 1968 article. In the transformation diagrams, In refers to the entry point of a state machine and Out to its exit. In practice these can be the Match state, the Start state, or other regular expression components. The In/Out abstraction lets us compose and combine state machines, which is the key insight. We apply this principle more generally to transform each element of the syntax tree into a composable state machine.
Let's start with character literals. A character literal is a transition from one state to another that consumes input. Consuming the literal "a" is a single arrow labeled a leading from In to Out.
次ã«ãé£çµã調ã¹ãŠã¿ãŸãããã æ§æããªãŒã®2ã€ã®ã³ã³ããŒãã³ããé£çµããã«ã¯ãã©ãã«ãªãã®ç¢å°ã§ããããæ¥ç¶ããã ãã§ãã ãã®äŸã§ã¯ã
Concat
ã¯2ã€ã®ä»»æã®æ£èŠè¡šçŸã®é£çµãå«ããããšãã§ãããããã¹ããŒã å
ã®
A
ãš
B
ã¯ãå¥ã
ã®ç¶æ
ã§ã¯ãªããæéç¶æ
ãã·ã³ã«ãªããŸãã
A
ãš
B
äž¡æ¹ããªãã©ã«ã§ããå Žåãå°ãå¥åŠãªããšãèµ·ãããŸãã äžéç¶æ
ãªãã§2ã€ã®ç¢å°ãæ¥ç¶ããã«ã¯ã©ãããã°ããã§ããïŒ çãã¯ããªãã©ã«ã¯ãã¹ããŒããã·ã³ã®æŽåæ§ãç¶æããããã«ãäž¡åŽã«ãã¡ã³ãã ç¶æ
ãæã€ããšãã§ãããšããããšã§ãã
Concat(A, B)
次ã®ããã«å€æãããŸãã
To represent Or(A, B), we need to branch from the initial state into two separate machines. When those machines finish, both must point to the next state (Out).
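Now consider *. The star allows zero or more repetitions of a pattern. To implement it, we need one connection that points directly to the next state and one that loops back to the current state through A. In A*, a split state either goes straight to Out or enters A, whose exit leads back to the split.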
For + we use a small trick: a+ is just aa*. Generalizing, Plus(A) can be rewritten as Concat(A, Repeat(A)). We do that instead of developing a special pattern for this case. I included + in the language for a specific reason: to show how other, more complex regular expression elements that I left out can often be expressed in terms of our language.
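The rewrite of + can be sketched as a small tree-to-tree function. This is only an illustration of the idea: desugar is a hypothetical helper, not something from the post, and it simply replaces every Plus node with the equivalent Concat and Repeat nodes.

```scala
object Desugar {
  abstract class RegexExpr
  case class Literal(c: Char) extends RegexExpr
  case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr
  case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr
  case class Repeat(expr: RegexExpr) extends RegexExpr
  case class Plus(expr: RegexExpr) extends RegexExpr

  // Replace every Plus(A) by Concat(A, Repeat(A)); leave everything else as is.
  def desugar(e: RegexExpr): RegexExpr = e match {
    case Plus(r) =>
      val inner = desugar(r)
      Concat(inner, Repeat(inner))
    case Repeat(r)    => Repeat(desugar(r))
    case Concat(a, b) => Concat(desugar(a), desugar(b))
    case Or(a, b)     => Or(desugar(a), desugar(b))
    case lit: Literal => lit
  }
}
```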
Transforming, in Practice
Now that we have a theoretical plan, let's see how to implement it in code. We will build a mutable graph to store the tree. I prefer immutability, but building immutable graphs is annoyingly complicated, and I am lazy.
If we try to boil the diagrams above down to basic components, we end up with three types: arrows that consume input, the match state, and one state that splits into two. I know this looks a bit odd and potentially incomplete. You will have to trust me that this choice leads to the cleanest code. Here are the three classes of NFA components:

    abstract class State

    class Consume(val c: Char, val out: State) extends State
    class Split(val out1: State, val out2: State) extends State
    case class Match() extends State

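To see how far these three classes go, here is a small sketch that wires up the NFAs for "ab" and "a|b" by hand, following the diagrams described in the theory section. HandBuiltNfa and describe are names invented for this example, and describe only works on graphs without cycles.

```scala
object HandBuiltNfa {
  abstract class State
  class Consume(val c: Char, val out: State) extends State
  class Split(val out1: State, val out2: State) extends State
  case class Match() extends State

  // "ab": consume a, then consume b, then match
  val ab: State = new Consume('a', new Consume('b', Match()))

  // "a|b": split into two branches that both end in the match state
  val aOrB: State =
    new Split(new Consume('a', Match()), new Consume('b', Match()))

  // Renders a cycle-free NFA as text, for inspection only.
  def describe(s: State): String = s match {
    case c: Consume => s"${c.c} -> ${describe(c.out)}"
    case sp: Split  => s"(${describe(sp.out1)} | ${describe(sp.out2)})"
    case _: Match   => "match"
  }
}
```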
泚ïŒéåžžã®ã¯ã©ã¹ã§ã¯ãªãã Match
case-
ãäœæããŸããã Scalaã®ã±ãŒã¹ã¯ã©ã¹ã¯ãããã©ã«ãã§ã¯ã©ã¹ã«äŸ¿å©ãªæ©èœãæäŸããŸãã å€ã«åºã¥ããŠåçæ§ãäžãããããç§ã¯ããã䜿çšããŸããã ããã«ããããã¹ãŠã®Match
ãªããžã§ã¯ããåçã«ãªãã䟿å©ãªæ©èœã«ãªããŸãã ä»ã®ã¿ã€ãã®NFAã³ã³ããŒãã³ãã«ã€ããŠã¯ããªã³ã¯ã®ç䟡æ§ãå¿
èŠã§ããã³ãŒãã¯ãæ§æããªãŒãååž°çã«èµ°æ»ãã
andThen
ãªããžã§ã¯ãããã©ã¡ãŒã¿ãŒãšããŠä¿åããŸãã
andThen
ãåŒã®èªç±ãªåºåã«æ·»ä»ãããã®ã§ãã æ§æããªãŒã®ä»»æã®ãã©ã³ãã«ã¯ãããã«å
ã«é²ããã®ã®ååãªã³ã³ããã¹ãããªããããå¿
èŠã§ã-
andThen
ãååž°çãªèµ°æ»äžã«ãã®ã³ã³ããã¹ããæž¡ãããšãã§ããŸãã ãŸãã
Match
ç¶æ
ã«åå ããç°¡åãªæ¹æ³ãæäŸããŸãã
When it comes to handling Repeat, we run into a problem: the andThen for Repeat is the split operator itself. To deal with this, we introduce a placeholder that we can bind later. We represent the placeholder with the following class:

    class Placeholder(var pointingTo: State) extends State

The var in Placeholder means that pointingTo is mutable. This is the one isolated bit of mutability, and it lets us create a cyclic graph easily. All the other members are immutable.
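As a rough sketch of how the placeholder closes a loop, the following builds the graph for a* by hand: the placeholder is created first with no target, handed to the Consume state as its exit, and bound to the split afterwards. PlaceholderCycle and aStar are names made up for this illustration.

```scala
object PlaceholderCycle {
  abstract class State
  class Consume(val c: Char, val out: State) extends State
  class Split(val out1: State, val out2: State) extends State
  case class Match() extends State
  class Placeholder(var pointingTo: State) extends State

  // a*: the split either leaves through Match, or consumes an 'a'
  // and comes back to itself through the placeholder.
  def aStar(): (Split, Placeholder) = {
    val placeholder = new Placeholder(null)  // target not known yet
    val split = new Split(new Consume('a', placeholder), Match())
    placeholder.pointingTo = split           // bind it now: the cycle is closed
    (split, placeholder)
  }
}
```

The same example also shows the two kinds of equality from the note above: Match() == Match() holds, while two separately created Consume objects with the same fields are not equal.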
To begin with, andThen is Match(). This means we create a state machine matching our regular expression that can then move to the Match state without consuming any more input. The code is short but dense:

    object NFA {
        def regexToNFA(regex: RegexExpr): State =
            regexToNFA(regex, Match())

        private def regexToNFA(regex: RegexExpr, andThen: State): State = {
            regex match {
                case Literal(c) => new Consume(c, andThen)
                case Concat(first, second) => {

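The listing above stops partway through the Concat case. What follows is a hedged reconstruction of the complete function, derived only from the transformations described in the theory section (literal, concatenation, Or, star with a placeholder, and + rewritten as Concat plus Repeat). It is a sketch under those assumptions, wrapped in an object named NfaSketch to keep it self-contained, and should not be read as the author's exact code.

```scala
object NfaSketch {
  // Syntax tree
  abstract class RegexExpr
  case class Literal(c: Char) extends RegexExpr
  case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr
  case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr
  case class Repeat(expr: RegexExpr) extends RegexExpr
  case class Plus(expr: RegexExpr) extends RegexExpr

  // NFA components
  abstract class State
  class Consume(val c: Char, val out: State) extends State
  class Split(val out1: State, val out2: State) extends State
  case class Match() extends State
  class Placeholder(var pointingTo: State) extends State

  def regexToNFA(regex: RegexExpr): State =
    regexToNFA(regex, Match())

  private def regexToNFA(regex: RegexExpr, andThen: State): State =
    regex match {
      case Literal(c) => new Consume(c, andThen)

      // The exit of `first` is whatever `second` converts to.
      case Concat(first, second) =>
        regexToNFA(first, regexToNFA(second, andThen))

      // Both branches continue to the same andThen.
      case Or(l, r) =>
        new Split(regexToNFA(l, andThen), regexToNFA(r, andThen))

      // One path loops through r back to the placeholder,
      // the other leaves towards andThen.
      case Repeat(r) =>
        val placeholder = new Placeholder(null)
        val split = new Split(regexToNFA(r, placeholder), andThen)
        placeholder.pointingTo = split
        placeholder

      // a+ is aa*
      case Plus(r) =>
        regexToNFA(Concat(r, Repeat(r)), andThen)
    }
}
```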
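And that's it! The ratio of words to lines of code in this part is very high: each line encodes a lot of information, but it all boils down to the transformations described in the previous section. It is worth clarifying that I did not just sit down and write the code in this form. The brevity and functional style of the code are the result of several iterations on the data structures and the algorithm. Writing clean code is hard.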
While debugging, I wrote a short script that generates dot files from NFAs so that I could look at the NFAs actually being generated. Note that the NFAs this code produces contain many unnecessary transitions and states. In writing this article I tried to keep the code as simple as possible, at the cost of performance. Examples of NFAs generated for simple regular expressions: (..)*, ab and a|b.
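The dot-generating script itself is not shown in the post. As a hypothetical illustration of what such a helper might look like, the sketch below walks an NFA with a visited set (so cycles terminate) and emits Graphviz DOT text with numbered nodes. The names DotSketch and toDot, the node numbering and the output layout are all assumptions.

```scala
import scala.collection.mutable

object DotSketch {
  abstract class State
  class Consume(val c: Char, val out: State) extends State
  class Split(val out1: State, val out2: State) extends State
  case class Match() extends State
  class Placeholder(var pointingTo: State) extends State

  def toDot(start: State): String = {
    val ids = mutable.LinkedHashMap[State, Int]()  // node numbers, in order of discovery
    val visited = mutable.Set[State]()
    val lines = mutable.ListBuffer[String]()
    def id(s: State): Int = ids.getOrElseUpdate(s, ids.size)

    def visit(s: State): Unit = if (visited.add(s)) {
      s match {
        case c: Consume =>
          lines += s"""  ${id(s)} -> ${id(c.out)} [label="${c.c}"];"""
          visit(c.out)
        case sp: Split =>
          lines += s"  ${id(s)} -> ${id(sp.out1)};"
          lines += s"  ${id(s)} -> ${id(sp.out2)};"
          visit(sp.out1)
          visit(sp.out2)
        case p: Placeholder =>
          lines += s"  ${id(s)} -> ${id(p.pointingTo)};"
          visit(p.pointingTo)
        case _: Match =>
          lines += s"  ${id(s)} [shape=doublecircle];"
      }
    }

    visit(start)
    lines.mkString("digraph nfa {\n", "\n", "\n}")
  }
}
```

The resulting text can be saved to a file and rendered with the Graphviz dot tool.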
Now that we have an NFA (the hardest part), all that remains is to evaluate it (the easy part).
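Part 3: Evaluating the NFA
NFAs, DFAs and Regular Expressions
In part 2 we said that there are two kinds of finite state machines: deterministic and nondeterministic. They have one key difference: a nondeterministic finite state machine can have several paths out of the same node for the same token. In addition, it can follow paths without consuming any input. In terms of expressiveness (often called "power"), NFAs, DFAs and regular expressions are all equivalent. This means that if you can express a rule or pattern (for example, strings of even length) with an NFA, you can also express it with a DFA or a regular expression. First, let's look at the regular expression abc* expressed as a DFA.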
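One way to picture such a DFA is as a transition table. The sketch below encodes a DFA for abc* with three hypothetical state numbers (0 is the start, 1 means "seen a", 2 means "seen ab followed by any number of c" and is the only accepting state) and walks the table one character at a time. DfaSketch, transitions and matches are names chosen for this example.

```scala
object DfaSketch {
  // (current state, input character) -> next state; at most one target per pair
  val transitions: Map[(Int, Char), Int] = Map(
    (0, 'a') -> 1,
    (1, 'b') -> 2,
    (2, 'c') -> 2
  )
  val accepting: Set[Int] = Set(2)

  def matches(input: String): Boolean = {
    // None means the machine got stuck: no transition for the character read
    val end = input.foldLeft(Option(0)) { (state, ch) =>
      state.flatMap(s => transitions.get((s, ch)))
    }
    end.exists(accepting.contains)
  }
}
```

Because each (state, character) pair has at most one target, a single current state is all the evaluator needs to remember.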
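Evaluating a DFA is simple: we just move from state to state, consuming the input string. If we finish consuming the input in a match state, there is a match; otherwise there is not.
Our state machine, on the other hand, is an NFA. The NFA our code generates for this regular expression consumes a and then b, then reaches a placeholder that points at a split: one branch consumes c and returns to the placeholder, the other leads to the match state.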
Notice that there are several unlabeled edges that we can follow without consuming a character. How do we track them efficiently? The answer is surprisingly simple: instead of tracking just one possible state, we keep a list of the states the engine is currently in. When we hit a fork, we follow both paths (turning one state into two). When a state has no valid transition for the current input, it is removed from the list.
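The idea of carrying a set of current states can be sketched on a toy NFA with numbered states, independent of the State classes used in this post. The machine below is a hand-written NFA for abc*; the state numbers, the two transition maps and the names StateSetSketch, closure and matches are all assumptions made for this illustration. The seen set in closure is what keeps the walk over unlabeled edges from looping forever on cyclic graphs.

```scala
object StateSetSketch {
  // 0 -a-> 1, 1 -b-> 2, 2 -> 3 (match) and 2 -> 4 unlabeled, 4 -c-> 2
  val labeled: Map[(Int, Char), Set[Int]] =
    Map((0, 'a') -> Set(1), (1, 'b') -> Set(2), (4, 'c') -> Set(2))
  val unlabeled: Map[Int, Set[Int]] = Map(2 -> Set(3, 4))
  val matchState = 3

  // All states reachable from `states` by following unlabeled edges only.
  def closure(states: Set[Int]): Set[Int] = {
    def loop(frontier: Set[Int], seen: Set[Int]): Set[Int] = {
      val next = frontier.flatMap(s => unlabeled.getOrElse(s, Set.empty[Int])) -- seen
      if (next.isEmpty) seen else loop(next, seen ++ next)
    }
    loop(states, states)
  }

  def matches(input: String): Boolean = {
    val end = input.foldLeft(closure(Set(0))) { (current, ch) =>
      // states with no transition on ch simply drop out of the set
      closure(current.flatMap(s => labeled.getOrElse((s, ch), Set.empty[Int])))
    }
    end.contains(matchState)
  }
}
```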
There are two subtleties to consider here: avoiding infinite loops in the graph, and handling transitions without input correctly.
When evaluating a given state, we first advance through all states to enumerate every state reachable from the current one without consuming any more input. This is the step that requires care: we must maintain a set of visited states to avoid looping forever in the graph. Once we have enumerated those states, we consume the next token of input, either advancing those states or removing them from the list.

    import scala.collection.mutable

    object NFAEvaluator {
        def evaluate(nfa: State, input: String): Boolean =
            evaluate(Set(nfa), input)

        def evaluate(nfas: Set[State], input: String): Boolean = {
            input match {
                case "" =>
                    evaluateStates(nfas, None).exists(_ == Match())
                case string =>
                    evaluate(
                        evaluateStates(nfas, input.headOption),
                        string.tail
                    )
            }
        }

        def evaluateStates(nfas: Set[State], input: Option[Char]): Set[State] = {
            val visitedStates = mutable.Set[State]()
            nfas.flatMap { state =>
                evaluateState(state, input, visitedStates)
            }
        }

        def evaluateState(currentState: State, input: Option[Char],
                          visitedStates: mutable.Set[State]): Set[State] = {
            if (visitedStates contains currentState) {
                Set()
            } else {
                visitedStates.add(currentState)
                currentState match {
                    case placeholder: Placeholder =>
                        evaluateState(placeholder.pointingTo, input, visitedStates)
                    case consume: Consume =>
                        // '.' matches any character, but only while input remains;
                        // otherwise "a." would also match the one-character string "a"
                        if (input.exists(c => consume.c == '.' || consume.c == c)) {
                            Set(consume.out)
                        } else {
                            Set()
                        }
                    case s: Split =>
                        evaluateState(s.out1, input, visitedStates) ++
                            evaluateState(s.out2, input, visitedStates)
                    case m: Match =>
                        if (input.isDefined) Set() else Set(Match())
                }
            }
        }
    }

And that's it!
Finishing Up
All the important code is done, but the API is not as clean as we would like. We now need a single-call user interface for invoking our regular expression engine. We will also add, with a bit of syntactic sugar, the ability to match a pattern anywhere in the string.

    object Regex {
        def fullMatch(input: String, pattern: String) = {
            val parsed = RegexParser(pattern).getOrElse(
                throw new RuntimeException("Failed to parse regex")
            )
            val nfa = NFA.regexToNFA(parsed)
            NFAEvaluator.evaluate(nfa, input)
        }

        // the pattern is parenthesised so that a top-level | stays inside the wrapper
        def matchAnywhere(input: String, pattern: String) =
            fullMatch(input, ".*(" + pattern + ").*")
    }

And here is how to use it:

    Regex.fullMatch("aaaaab", "a*b")

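One detail of the match-anywhere trick deserves a small demonstration: when the pattern contains a top-level |, wrapping it in .* without parentheses changes its meaning, because | binds more weakly than concatenation. The sketch below uses Java's built-in engine (String.matches) as a stand-in for the engine in this post, on the assumption that the precedence rules are the same; AnywhereSketch, wrap and matchAnywhere are hypothetical names.

```scala
object AnywhereSketch {
  // Parenthesise the pattern before surrounding it with .*
  def wrap(pattern: String): String = ".*(" + pattern + ").*"

  def matchAnywhere(input: String, pattern: String): Boolean =
    input.matches(wrap(pattern))

  def main(args: Array[String]): Unit = {
    // Without parentheses, ".*a|b.*" means "(.*a)|(b.*)"
    println("xbx".matches(".*a|b.*"))
    println(matchAnywhere("xbx", "a|b"))
    println(matchAnywhere("abcde", "cde"))
  }
}
```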
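And with that we are done. We have a semi-functional regular expression implementation in 106 lines. There are more features that could be added, but I decided not to increase the complexity without adding enough value to the code:
- Character classes
- Value extraction
- ?
- Escape characters
- And many more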
I hope this simple implementation helps you understand what goes on inside a regular expression engine. It is worth mentioning that the performance of this evaluator is awful, truly terrible. Perhaps in a future post I will look into why and discuss ways to optimize it...