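Hello, colleagues!
In the last publication of the outgoing year, we wanted to mention reinforcement learning, a topic on which we are already translating a book.
Judge for yourself: we found an introductory article on Medium that outlines the context of the problem and describes the simplest algorithm, with an implementation in Python. The article includes several GIFs. And motivation, reward, and choosing the right strategy on the road to success are things that will be extremely useful to each of us in the coming year.
Enjoy reading!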
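Reinforcement learning is a type of machine learning in which an agent learns to behave in an environment by performing actions, building up intuition, and observing the results of those actions. This article explains how to understand and formulate a reinforcement learning problem and then solve it in Python.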
Lately we have grown used to computers playing games against humans, either as bots in multiplayer games or as opponents in one-on-one games such as Dota 2, PUBG, and Mario. The research company DeepMind made headlines in 2016 when its AlphaGo program defeated the South Korean Go champion. If you are an avid gamer, you may have heard of the OpenAI Five series of Dota 2 matches, in which machines played against people and beat some of the best Dota 2 players in several matches. (If you are interested in the details, the algorithm is analyzed in detail here, along with how the machines played.)
The latest version of OpenAI Five takes Roshan.
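So let's start with the central question. Why do we need reinforcement learning? Is it used only in games, or does it apply to realistic scenarios for solving applied problems? If this is the first time you are reading about reinforcement learning, you may find it hard to imagine the answers. In fact, reinforcement learning is one of the most widely used and fastest-developing technologies in artificial intelligence.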
Reinforcement learning systems are particularly in demand in the following areas:
- Self-driving vehicles
- The games industry
- Robotics
- Recommender systems
- Advertising and marketing
Overview and background of reinforcement learning
How did reinforcement learning take shape when so many machine learning and deep learning methods were already at our disposal? "It was invented by Rich Sutton and Andrew Barto, Rich's research supervisor, who helped him prepare his PhD." The paradigm first took shape in the 1980s and looked archaic at the time. Rich nevertheless believed that it had a great future and would eventually gain recognition.
Reinforcement learning supports automation in the environment where it is deployed. Machine learning and deep learning work in roughly the same way: they are arranged differently in strategic terms, but both paradigms support automation. So why did reinforcement learning arise?
It closely resembles the natural learning process: a process or model acts and receives feedback on how it is coping with the task, well or badly.
Machine learning and deep learning are also ways of training, but they are geared more toward identifying patterns in the available data. In reinforcement learning, by contrast, that experience is gained through trial and error: the system gradually finds the right options or the global optimum. A significant additional advantage of reinforcement learning is that you do not need to supply an extensive training data set, as you do in supervised learning. A few small fragments are enough.
The concept of reinforcement learning
Imagine teaching your cat new tricks. Unfortunately, cats do not understand human language, so you cannot simply tell the cat what you are going to play. Instead you act differently: you imitate a situation, and the cat tries to respond in some way. If the cat reacts the way you wanted, you pour it some milk. Do you see what happens next? In a similar situation the cat will perform the desired action again, with even more enthusiasm, expecting to be fed even better. This is how learning from a positive example works. If you try to "educate" the cat with negative incentives, for example by staring at it sternly and frowning, it usually does not learn in such situations.
Reinforcement learning works in a similar way. We give the machine some input and actions, and then reward the machine according to the output. Our ultimate goal is to maximize the reward. Now let's see how to reformulate the problem above in reinforcement learning terms.
- The cat acts as the "agent" placed in the "environment".
- The environment is the home or the play area, depending on what you are teaching the cat.
- Situations that arise during training are called "states". For the cat, examples of states are the cat "running" or "crawling under the bed".
- The agent reacts by performing actions and moving from one "state" to another.
- After the state changes, the agent receives a "reward" or a "penalty" depending on the action it took.
- The "policy" (strategy) is the method for choosing actions that lead to the best results.
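Now that we understand what reinforcement learning is, let's talk in more detail about the origins and evolution of reinforcement learning and deep reinforcement learning, discuss how this paradigm solves problems that neither supervised nor unsupervised learning can handle, and note a curious fact: the Google search engine is currently optimized using reinforcement learning algorithms.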
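Understanding reinforcement learning terminology
The agent and the environment play the key roles in a reinforcement learning algorithm. The environment is the world in which the agent has to survive. In addition, the agent receives a reinforcement signal from the environment (the reward): a number describing how good or bad the current state of the world is. The agent's goal is to maximize the total reward, the so-called "return". Before writing our first reinforcement learning algorithm, we need to understand the following terms.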
- State: a state is a complete description of the world, with no piece of information that characterizes the world left out. It can be a fixed or a dynamic position. As a rule, states are written as arrays, matrices, or higher-order tensors.
- Action: actions usually depend on the environment, and the agent takes different actions in different environments. The set of valid agent actions is recorded in a space called the "action space". The number of actions in the space is usually finite.
- Environment: the place where the agent exists and with which it interacts. Different environments use different kinds of rewards, policies, and so on.
- Reward and return: in reinforcement learning the reward function R has to be tracked at all times. It matters when tuning the algorithm, when optimizing it, and when deciding to stop training. It depends on the current state of the world, the action just taken, and the next state of the world.
- Policy: the policy is the rule by which the agent chooses its next action. The set of policies is also called the agent's "brain".
Now that we are familiar with the terminology of reinforcement learning, let's solve a problem with a suitable algorithm. Before that, we need to understand how to formulate such a problem, relying on the reinforcement learning terms while we solve it.
Taxi solution
So, let's solve a problem using reinforcement algorithms.
Suppose we have a training zone for a self-driving taxi, which we are training to deliver passengers in a parking lot to four different points (R, G, Y, B). Before that, we need to understand and set up the environment in which we will start programming in Python. If you are only beginning to learn Python, I recommend this article.
The environment for the taxi problem can be set up with OpenAI's Gym, one of the most popular libraries for solving reinforcement learning problems. Before using Gym you need to install it on your machine, and the Python package manager pip is convenient for that. The installation command is:
pip install gym
Next, let's see how the environment is displayed. All the models and the interface for this task are already configured in Gym under the name Taxi-v2. The code snippet below is used to display this environment.
"There are 4 locations (labeled by different letters). Our job is to pick up the passenger at one location and drop him off at another. We receive +20 points for a successful drop-off and lose 1 point for every time step it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions." (Source:
gym.openai.com/envs/Taxi-v2 )
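To make the scoring rule concrete, the snippet below totals the reward for one made-up episode. The episode itself (ten ordinary moves, one illegal pick-up, then a successful drop-off) is an assumption chosen purely for illustration, as is the reading that the illegal step scores -10 and the final step scores +20 on their own.

```python
# Reward values quoted in the environment description.
STEP_REWARD = -1        # every ordinary time step
ILLEGAL_REWARD = -10    # illegal pick-up or drop-off
DROPOFF_REWARD = 20     # successful drop-off

# Hypothetical episode: 10 ordinary moves, 1 illegal pick-up, 1 successful drop-off.
episode_rewards = [STEP_REWARD] * 10 + [ILLEGAL_REWARD, DROPOFF_REWARD]

total_reward = sum(episode_rewards)
print(total_reward)  # 10 * (-1) + (-10) + 20 = 0
```

A single careless pick-up is enough to cancel out half of the final bonus, which is why the agent has an incentive to avoid illegal actions as well as long routes.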
The output shown in the console looks like this:
Taxi V2 ENV
Great. env is the heart of OpenAI Gym: it is the unified environment interface. Here are the env methods that we will find useful:
- env.reset: resets the environment and returns a random initial state.
- env.step(action): advances the environment by one time step and returns the following variables:
  - observation: an observation of the environment.
  - reward: whether your action was beneficial.
  - done: indicates whether we have successfully picked up and dropped off a passenger, also called "one episode".
  - info: additional information, such as performance and latency, needed for debugging.
- env.render: displays one frame of the environment (useful for visualization).
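The interaction pattern behind these methods can be tried out without installing anything. The sketch below uses a made-up `LineWorld` class (hypothetical, not part of Gym) that mimics the `reset` / `step` / `render` shape and the four-value return described above; its layout and reward values are assumptions chosen only to keep the example small.

```python
class LineWorld:
    """Hypothetical stand-in with the same reset/step/render shape as a Gym env."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # The Taxi environment picks a random start; this toy always starts at cell 0.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; moves off the line are ignored.
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 20 if done else -1
        info = {}
        return self.position, reward, done, info

    def render(self):
        # "T" marks the agent, "." marks an empty cell.
        return "".join("T" if i == self.position else "." for i in range(self.length))


env = LineWorld()
observation = env.reset()
done = False
total_reward = 0
frames = [env.render()]
while not done:
    observation, reward, done, info = env.step(1)  # always move right
    total_reward += reward
    frames.append(env.render())

print("\n".join(frames))
print("total reward:", total_reward)  # three steps at -1, then +20: 17
```

The loop is the same whatever the environment: reset once, then call `step` until `done` is true, accumulating the reward along the way.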
Now let's examine the environment and try to understand the problem more deeply. The taxi is the only car in this parking lot. The lot can be divided into a 5x5 grid, which gives 25 possible taxi locations. These 25 values are one component of our state space. Note: at the moment the taxi is at coordinates (3, 1).
There are four points in the environment where passengers can be picked up: R, G, Y, B, or [(0,0), (0,4), (4,0), (4,3)] in (row, column) coordinates, if the environment above is read as a grid. If we also take into account one more passenger state, being inside the taxi, we can take every combination of passenger location and destination to count the total number of states in the taxi training environment: four destinations and five (4 + 1) passenger locations.
So our taxi environment has 5 × 5 × 5 × 4 = 500 possible states. The agent deals with one of these 500 states and takes an action. In our case the options are to move in one direction or another, or to decide to pick up or drop off the passenger. In other words, we have six possible actions at our disposal:
pickup, dropoff, north, east, south, west (the last four values are the directions in which the taxi can move).
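The count of 500 can be checked by enumerating every combination. The sketch below also packs each combination into a single integer; the packing order (taxi row, taxi column, passenger location, destination) is an assumption modeled on how the Gym Taxi environment is commonly described, and the helper names `encode` and `decode` are chosen here for illustration.

```python
from itertools import product

GRID_ROWS, GRID_COLS = 5, 5
PASSENGER_LOCATIONS = 5   # R, G, Y, B, or inside the taxi
DESTINATIONS = 4          # R, G, Y, B


def encode(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into one integer index."""
    index = taxi_row
    index = index * GRID_COLS + taxi_col
    index = index * PASSENGER_LOCATIONS + passenger
    index = index * DESTINATIONS + destination
    return index


def decode(index):
    """Inverse of encode: recover (taxi_row, taxi_col, passenger, destination)."""
    index, destination = divmod(index, DESTINATIONS)
    index, passenger = divmod(index, PASSENGER_LOCATIONS)
    taxi_row, taxi_col = divmod(index, GRID_COLS)
    return taxi_row, taxi_col, passenger, destination


all_states = [
    encode(*combo)
    for combo in product(range(GRID_ROWS), range(GRID_COLS),
                         range(PASSENGER_LOCATIONS), range(DESTINATIONS))
]
print(len(all_states))      # 500
print(encode(3, 1, 2, 0))   # 328
print(decode(328))          # (3, 1, 2, 0)
```

Under this assumed packing, a taxi at (3, 1) with the passenger at location index 2 and destination index 0 maps to state 328.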
This is the action space: the set of all actions that the agent can take in a given state.
As the illustration above makes clear, in some situations the taxi cannot perform certain actions (the walls get in the way). In the code that describes the environment, we simply assign a penalty of -1 for every hit against a wall. These penalties accumulate, so the taxi will try not to run into walls.
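A minimal sketch of that idea follows. It models only the outer boundary of the 5x5 grid (the interior walls of the real map are left out), and it assumes that a blocked move leaves the taxi where it was while still costing one point; the `try_move` helper is hypothetical.

```python
GRID_SIZE = 5
# Row index grows toward the south, column index toward the east.
MOVES = {"south": (1, 0), "north": (-1, 0), "east": (0, 1), "west": (0, -1)}


def try_move(row, col, direction):
    """Return the new (row, col) and the step reward for one move."""
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE:
        return (new_row, new_col), -1
    # Outer wall: the taxi stays where it was, but the step is still charged.
    return (row, col), -1


position, total = (0, 0), 0
for direction in ["north", "north", "east"]:
    position, reward = try_move(*position, direction)
    total += reward

print(position, total)  # (0, 1) -3
```

Two of the three steps were wasted against the wall: the taxi paid three points to move one cell, which is the kind of accumulated cost that steers a learning agent away from walls.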
Reward table: when the taxi environment is created, an initial reward table called P is created as well. You can think of it as a matrix in which the number of states corresponds to the number of rows and the number of actions corresponds to the number of columns. In other words, it is a states × actions matrix.
Because absolutely every state is recorded in this matrix, we can look at the default reward values assigned to the state we chose for illustration:
>>> import gym
>>> env = gym.make("Taxi-v2").env
>>> env.P[328]
{0: [(1.0, 433, -1, False)],
 1: [(1.0, 233, -1, False)],
 2: [(1.0, 353, -1, False)],
 3: [(1.0, 333, -1, False)],
 4: [(1.0, 333, -10, False)],
 5: [(1.0, 333, -10, False)]}
This dictionary has the following structure: {action: [(probability, nextstate, reward, done)]}.
- The values 0 to 5 correspond to the actions (south, north, east, west, pickup, dropoff) that the taxi can take in the current state shown in the illustration.
- done tells us when we have dropped the passenger off at the right location.
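The structure is easy to walk through in plain Python. The snippet below copies the dictionary from the listing above (so it runs without Gym) and labels each entry with the action names given in the list; the variable names are chosen here for illustration.

```python
# Transition entry copied from the listing above:
# {action: [(probability, next_state, reward, done)]}
transitions = {
    0: [(1.0, 433, -1, False)],
    1: [(1.0, 233, -1, False)],
    2: [(1.0, 353, -1, False)],
    3: [(1.0, 333, -1, False)],
    4: [(1.0, 333, -10, False)],
    5: [(1.0, 333, -10, False)],
}
ACTION_NAMES = ["south", "north", "east", "west", "pickup", "dropoff"]

summary = {}
for action, outcomes in transitions.items():
    for probability, next_state, reward, done in outcomes:
        name = ACTION_NAMES[action]
        summary[name] = (next_state, reward, done)
        print(f"{name:>8}: p={probability} -> state {next_state}, reward {reward}, done={done}")

# Actions penalized with -10 are the illegal ones in this state.
illegal = [name for name, (_, reward, _) in summary.items() if reward == -10]
print(illegal)  # ['pickup', 'dropoff']
```

Three of the six entries (west, pickup, dropoff) lead to state 333. If those are the moves that leave the taxi in place, the listing would describe state 333 rather than 328, so the index in the call is best treated as approximate.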
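To solve this problem without any reinforcement learning, we can set a target state, sample the action space, and, if the target state is reached after some number of iterations, assume that this moment corresponds to the maximum reward. In the other states, the reward either approaches the maximum when the program acts correctly (moves closer to the goal) or penalties accumulate when it makes mistakes. A single penalty can be as large as -10.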
Let's write code that solves this problem without reinforcement learning.
Since we have the P table with default reward values for every state, we can try to organize the taxi's navigation on the basis of that table alone.
We create an infinite loop that runs until the passenger reaches the destination (one episode), in other words, until the reward reaches 20. The env.action_space.sample() method automatically picks a random action from the set of all available actions. Consider what happens:
import gym
from time import sleep
Output:
Credits: OpenAI
The problem is solved, but the solution is not optimized, and this algorithm will not work in every case. We need a suitable interacting agent, so that the number of iterations the machine or algorithm spends on solving the problem stays as small as possible. This is where the Q-learning algorithm helps; we look at its implementation in the next section.
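The cost of acting at random is easy to see on a much smaller problem. The sketch below replaces the taxi grid with a hypothetical six-cell corridor and uses `random.choice` as a stand-in for `env.action_space.sample()`; it is an illustration of the random-action approach, not the article's taxi code.

```python
import random

random.seed(0)  # fixed seed so repeated runs behave the same

GOAL = 5          # hypothetical corridor with cells 0..5; the destination is cell 5
position = 0
timesteps = 0
wall_bumps = 0

while position != GOAL:
    action = random.choice([-1, +1])   # stand-in for env.action_space.sample()
    new_position = position + action
    if new_position < 0:
        wall_bumps += 1                # bumped into the left wall: stay put
    else:
        position = new_position
    timesteps += 1

print("Timesteps taken:", timesteps)
print("Wall bumps:", wall_bumps)
print("Shortest possible:", GOAL)
```

The random walker always arrives eventually, but it typically needs far more than the five steps an informed agent would take, and the gap grows quickly with the size of the state space.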
Introduction to Q-learning
What follows is one of the most popular and simplest reinforcement learning algorithms. The environment rewards the agent for learning step by step and for taking the most optimal step in a given state. In the implementation discussed above, we had the reward table "P" from which our agent learns. Based on the reward table, the agent chooses the next action according to how useful it is, and then updates another value, called the Q-value. The result is a new table, called the Q-table, indexed by (state, action) combinations. Better Q-values lead to more optimized rewards.
For example, if the taxi is in a state where the passenger is at the same location as the taxi, the Q-value of the "pickup" action is very likely to be higher than that of other actions such as "drop off the passenger" or "go north".
Q-values are initialized to random values, and as the agent interacts with the environment and receives various rewards for performing particular actions, the Q-values are updated according to the following equation:
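The standard tabular Q-learning update, written with the learning rate α and the discount factor γ that the text goes on to discuss, is shown below. It is given here on the assumption that this is the equation the article refers to; S is the current state, a the action taken, r the reward received, and S' the next state.

```latex
Q(S, a) \leftarrow (1 - \alpha)\, Q(S, a) + \alpha \left( r + \gamma \max_{a'} Q(S', a') \right)
```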
This raises two questions: how to initialize the Q-values and how to compute them. As actions are taken, the Q-values are updated by applying this equation.
Here alpha and gamma are the parameters of the Q-learning algorithm. Alpha is the learning rate and gamma is the discount factor. Both values lie in the range from 0 to 1 and may equal 1. Gamma can be zero, but alpha cannot, because the value of the loss has to be corrected during updates (the learning rate is positive). The alpha value here plays the same role as in supervised learning. Gamma determines how much importance we give to the rewards that await us in the future.
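A single update can be worked through by hand. The function below implements the standard Q-learning update under the assumption that it is the rule the text describes; the numbers (alpha = 0.1, gamma = 0.6, a next-state value of 2.0) are illustrative choices, not values taken from the article.

```python
def q_update(old_value, reward, best_next_value, alpha, gamma):
    """One tabular Q-learning update for a single (state, action) pair."""
    return (1 - alpha) * old_value + alpha * (reward + gamma * best_next_value)


# Illustrative numbers: an ordinary step (reward -1) into a state whose best Q-value is 2.0.
new_value = q_update(old_value=0.0, reward=-1, best_next_value=2.0, alpha=0.1, gamma=0.6)
print(round(new_value, 3))  # 0.9 * 0 + 0.1 * (-1 + 0.6 * 2.0) = 0.02

# With gamma = 0 the future is ignored and only the immediate reward counts.
myopic_value = q_update(old_value=0.0, reward=-1, best_next_value=2.0, alpha=0.1, gamma=0.0)
print(round(myopic_value, 3))  # 0.1 * (-1) = -0.1
```

With gamma at 0.6 the promising next state outweighs the step cost and the estimate turns slightly positive; with gamma at 0 the same step looks like a pure loss.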
The algorithm can be outlined as follows:
- Step 1: Initialize the Q-table, filling it with zeros, and set the Q-values to arbitrary constants.
- Step 2: Let the agent react to the environment and try different actions. For each change of state, choose one of all the actions possible in the current state (S).
- Step 3: Move to the next state (S') as a result of the chosen action (a).
- Step 4: Of all the actions possible from state (S'), select the one with the highest Q-value.
- Step 5: Update the Q-table values according to the equation above.
- Step 6: Make the next state the current state.
- Step 7: If the target state has been reached, end the episode and repeat the process.
Q-learning in Python
import gym
import numpy as np
import random
from IPython.display import clear_output
Now all the values are stored in the q_table variable.
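As a minimal sketch of the seven steps listed above, independent of Gym, the code below trains a Q-table on a made-up six-cell corridor. The environment, the `step` helper, the epsilon-greedy exploration, and the hyperparameter values are all assumptions chosen to keep the example short and runnable; it is not the article's taxi implementation.

```python
import random

random.seed(42)

N_STATES = 6            # hypothetical corridor: cells 0..5, goal at cell 5
ACTIONS = [0, 1]        # 0 = step left, 1 = step right
GOAL = N_STATES - 1
alpha, gamma, epsilon = 0.1, 0.6, 0.1   # illustrative hyperparameters


def step(state, action):
    """Toy environment: -1 per move, +20 on reaching the goal."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = next_state == GOAL
    return next_state, (20 if done else -1), done


# Step 1: Q-table of zeros, one row per state and one column per action.
q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        # Step 2: explore with probability epsilon, otherwise exploit the table.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[state][a])
        # Step 3: move to the next state.
        next_state, reward, done = step(state, action)
        # Step 4: best Q-value available from the next state.
        best_next = max(q_table[next_state])
        # Step 5: apply the update rule.
        target = reward + gamma * best_next
        q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target
        # Step 6: the next state becomes the current state.
        state = next_state
    # Step 7: the goal was reached, so this episode ends and the next one starts.

# Read the learned policy off the table: the best action in every non-goal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)  # [1, 1, 1, 1, 1]
```

After training, the greedy policy steps right in every cell, which is the shortest route to the goal; the same loop structure carries over to the taxi problem with a 500 × 6 table.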
So the model has been trained in the environment and now knows how to pick up passengers more accurately. You, in turn, are now familiar with reinforcement learning and can program an algorithm to solve a new problem.
Other reinforcement learning techniques:
- Markov decision processes (MDP) and Bellman equations
- Dynamic programming: model-based RL, policy iteration, and value iteration
- Deep Q-learning
- Policy gradient methods
- SARSA
The code for this exercise is available at:
vihar / python-reinforcement-learning