In the previous article, I described several evolution strategies (ES) algorithms that are useful for optimising the parameters of a function without explicitly computing gradients. When solving reinforcement learning (RL) problems, these algorithms can be used to find a suitable set of model parameters for a neural network agent. In this article I describe how ES can be applied to some RL tasks, and how to find policies that are more stable and robust.
Strategies for reinforcement learning
RL algorithms need a reward signal to be sent to the agent at every timestep, whereas ES algorithms only care about the cumulative reward the agent receives after its rollout in the environment. In many problems we only know the outcome at the end of the task: whether the agent succeeded, whether the robot arm picked up the object, whether the agent survived. In tasks like these, ES can be more effective than traditional RL. Below is pseudocode that wraps a rollout of an agent in an OpenAI Gym environment. Here we only care about the cumulative reward.
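```python
def rollout(agent, env):
    # Run one episode and return the cumulative reward.
    obs = env.reset()
    done = False
    total_reward = 0
    while not done:
        a = agent.get_action(obs)
        # Classic gym API: step() returns (obs, reward, done, info).
        obs, reward, done, info = env.step(a)
        total_reward += reward
    return total_reward
```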
As described in the previous article, we can define rollout as the objective function that maps the agent's model parameters to a fitness score, and use an ES solver to find a suitable set of parameters.
```python
import gym
import numpy as np

# EvolutionStrategy, Agent and MY_REQUIREMENT are placeholders in this pseudocode.
env = gym.make('worlddomination-v0')

# use our favourite ES
solver = EvolutionStrategy()

while True:
    # ask the ES to give set of params
    solutions = solver.ask()
    # create array to hold the results
    fitlist = np.zeros(solver.popsize)
    # evaluate for each given solution
    for i in range(solver.popsize):
        # init the agent with a solution
        agent = Agent(solutions[i])
        # rollout env with this agent
        fitlist[i] = rollout(agent, env)
    # give scores results back to ES
    solver.tell(fitlist)
    # get best param & fitness from ES
    bestsol, bestfit = solver.result()
    # see if our task is solved
    if bestfit > MY_REQUIREMENT:
        break
```
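The loop above relies on only four things from the solver: `popsize`, `ask()`, `tell()` and `result()`. As a rough illustration of that interface, here is a minimal sketch of a simple Gaussian ES, run on a toy quadratic instead of a gym rollout. `SimpleES`, its hyperparameters and `toy_fitness` are hypothetical names made up for this example; this is not the es.py implementation from the previous article.

```python
import numpy as np


class SimpleES:
    """Toy Gaussian evolution strategy with an ask/tell interface (illustrative only)."""

    def __init__(self, num_params, popsize=32, sigma=0.5, elite_frac=0.25, seed=0):
        self.popsize = popsize
        self.sigma = sigma
        self.num_elite = max(1, int(popsize * elite_frac))
        self.mu = np.zeros(num_params)
        self.rng = np.random.default_rng(seed)
        self.best_params = self.mu.copy()
        self.best_fitness = -np.inf

    def ask(self):
        # Sample a population of candidate parameter vectors around the current mean.
        noise = self.rng.standard_normal((self.popsize, self.mu.size))
        self.solutions = self.mu + self.sigma * noise
        return self.solutions

    def tell(self, fitness_list):
        # Higher fitness is better: move the mean to the average of the elite.
        fitness = np.asarray(fitness_list, dtype=float)
        order = np.argsort(fitness)[::-1]
        elite = self.solutions[order[: self.num_elite]]
        self.mu = elite.mean(axis=0)
        if fitness[order[0]] > self.best_fitness:
            self.best_fitness = fitness[order[0]]
            self.best_params = self.solutions[order[0]].copy()

    def result(self):
        return self.best_params, self.best_fitness


def toy_fitness(w):
    # Stand-in for rollout(agent, env): maximised at w = (1, -2, 3).
    target = np.array([1.0, -2.0, 3.0])
    return -np.sum((w - target) ** 2)


solver = SimpleES(num_params=3)
for generation in range(200):
    solutions = solver.ask()
    fitlist = np.array([toy_fitness(s) for s in solutions])
    solver.tell(fitlist)
best_params, best_fitness = solver.result()
```

Swapping `toy_fitness` for a call to `rollout` with an agent built from each solution gives the same structure as the loop above.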
Deterministic and stochastic policies
The agent takes an observation of the environment as input and outputs an action at each timestep of the rollout. We can model the agent however we like: hard-coded rules, decision trees, linear functions, or recurrent neural networks. In this article I use a simple feed-forward network with two hidden layers to map the agent's observation (vector x) directly to an action (vector y).
$$h_1 = f_h(W_1 x + b_1)$$
$$h_2 = f_h(W_2 h_1 + b_2)$$
$$y = f_{out}(W_{out} h_2 + b_{out})$$
The activation functions f_h and f_out can be tanh, sigmoid, relu, or anything else. I used tanh in all of my experiments. If needed, f_out at the output layer can be a pass-through function with no nonlinearity. If we concatenate all the weight and bias parameters into a single vector W, the neural network above is a deterministic function y = F(x, W). We can then use ES to find a solution W with the search loop described in the previous article.
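To make the mapping y = F(x, W) concrete, here is a minimal numpy sketch of a two-hidden-layer feed-forward policy that reads all of its weights and biases from one flat vector W. The 24 inputs and 4 outputs match the bipedal walker task discussed later in this article; the hidden size of 40 and the function names are assumptions chosen for the example.

```python
import numpy as np


def layer_shapes(n_in, n_hidden1, n_hidden2, n_out):
    # (rows, cols) of W_1, W_2, W_out; each layer also has a bias of length rows.
    return [(n_hidden1, n_in), (n_hidden2, n_hidden1), (n_out, n_hidden2)]


def num_params(shapes):
    # Total length of the flat vector W: all weights plus all biases.
    return sum(rows * cols + rows for rows, cols in shapes)


def policy(x, W, shapes, f_h=np.tanh, f_out=np.tanh):
    """Deterministic policy y = F(x, W) with two hidden layers."""
    h = np.asarray(x, dtype=float)
    offset = 0
    for i, (rows, cols) in enumerate(shapes):
        weight = W[offset : offset + rows * cols].reshape(rows, cols)
        offset += rows * cols
        bias = W[offset : offset + rows]
        offset += rows
        activation = f_out if i == len(shapes) - 1 else f_h
        h = activation(weight @ h + bias)
    return h


shapes = layer_shapes(n_in=24, n_hidden1=40, n_hidden2=40, n_out=4)
W = np.random.default_rng(0).standard_normal(num_params(shapes)) * 0.1
y = policy(np.zeros(24), W, shapes)
```

Because the whole network is described by a single vector, an ES solver can treat W as just another point in parameter space.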
But what if we do not want the agent's policy to be deterministic? Even for some simple tasks, the optimal policy is a random action, so the agent has to be able to learn a stochastic policy. One way to turn y = F(x, W) into a stochastic policy is to make W random. Each model parameter w_i ∈ W can be a random value drawn from a normal distribution N(μ_i, σ_i).
This type of stochastic neural network is called a Bayesian neural network: a network with a prior distribution over its weights. In this case, the model parameters we solve for are the vectors μ and σ rather than the weights W. At each forward pass of the network, a new W is drawn from N(μ, σI). These networks have been applied to a variety of tasks, and training them raises problems of its own. With ES we can find a stochastic policy directly by treating μ and σ as the solution instead of W.
Stochastic networks also appear in the RL literature. For example, in the Proximal Policy Optimization (PPO) algorithm, the final layer outputs a set of parameters μ and σ, and the action is sampled from N(μ, σI). Adding noise to the parameters can encourage the agent to explore the environment and escape local optima.
I found that when the agent needs to explore its environment, the whole vector W often does not need to be random; randomness in the biases alone is enough. For difficult locomotion tasks, such as those in the roboschool environments, I often need to use ES to find a stochastic policy in which only the bias parameters are drawn from a normal distribution.
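As a small illustration of the two variants above, the sketch below samples a fresh W from N(μ, σ) either for every parameter or for the biases only. It assumes σ is a per-parameter standard deviation and that each layer is stored in the flat vector as weights followed by biases; `sample_weights`, `bias_mask` and the tiny network shape are hypothetical and only serve the example.

```python
import numpy as np


def sample_weights(mu, sigma, rng, noisy_mask=None):
    """Draw W ~ N(mu, diag(sigma^2)); entries where noisy_mask is False stay at mu."""
    noise = rng.standard_normal(mu.shape) * sigma
    if noisy_mask is not None:
        noise = np.where(noisy_mask, noise, 0.0)
    return mu + noise


def bias_mask(shapes):
    # Boolean mask over the flat parameter vector: True for bias entries only.
    # Assumes each layer is stored as [weights..., biases...] in order.
    parts = []
    for rows, cols in shapes:
        parts.append(np.zeros(rows * cols, dtype=bool))
        parts.append(np.ones(rows, dtype=bool))
    return np.concatenate(parts)


rng = np.random.default_rng(0)
shapes = [(3, 2), (1, 3)]  # hypothetical tiny network: 2 -> 3 -> 1
mask = bias_mask(shapes)
mu = np.zeros(mask.size)
sigma = np.full(mask.size, 0.1)

W_full = sample_weights(mu, sigma, rng)                   # every parameter is random
W_bias = sample_weights(mu, sigma, rng, noisy_mask=mask)  # only the biases are random
```

In both cases the ES solution vector would hold μ and σ (for example concatenated), and a new W would be sampled before each forward pass.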
Evolving robust policies for the bipedal walker
This is one of the areas where I found ES useful for finding robust policies. I want to control the balance between data efficiency and how robust the policy is over many random trials. I tested ES on the excellent BipedalWalkerHardcore-v2 environment created by Oleg Klimov. The environment uses the Box2D physics engine, which was also used in Angry Birds.
Our agent solving BipedalWalkerHardcore-v2.
In this environment, the agent has to learn a policy for walking through a randomly generated obstacle course within a time limit, without falling over. It receives 24 inputs: 10 lidar sensors, angles, and contacts with the ground. The agent does not know where it is along the course. The action space consists of 4 continuous values that control the torques of its 4 motors. The total reward is calculated from the total distance the agent covers. An agent that completes the whole course gets a little over 300 points. A small number of points is subtracted according to the amount of torque applied, so energy use is also a constraint.
In BipedalWalkerHardcore-v2, the task is considered solved when the agent achieves an average score of 300 or more over 100 consecutive random trials. With an RL algorithm it is relatively easy to train an agent to complete the course once, but getting the agent to do so efficiently and consistently is much harder, which makes this task an interesting challenge. As far as I know, as of October 2017 my agent has the best known result on this course.

Early on: learning to walk.

Learning to correct its mistakes, but still slow...
Because a new random course is generated for each trial, the course is sometimes easy and sometimes very hard. We do not want natural selection to pass on to the next generation agents whose policies were merely lucky enough to draw an easy course. We also want agents with good policies to have a chance to show that they are no worse than the others. So I defined an agent's episode as the average of 16 random rollouts, and used the mean cumulative reward over those 16 rollouts as its fitness score.
Another way to look at it: we test the agent over 100 trials but usually train it on single trials, so the test task does not match the training task we are optimising for. In a stochastic environment, averaging each agent in the population over several rollouts narrows the gap between the training set and the test set. If we can overfit to the training set, we might as well overfit to the test set, which is an acceptable thing to do in RL :)
Of course, the data efficiency of the algorithm is now 16 times worse. The final policy, however, is far more robust. When I tested the final policy over 100 consecutive random trials, it scored an average of more than 300 points, which is enough to solve this environment. Without the averaging, the best agent only reached about 220 to 230 points over 100 trials. As far as I know, this is the first solution to solve this environment (as of October 2017).
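The averaging idea can be sketched in a few lines. The example below is only an illustration under simple assumptions: `noisy_rollout` is a hypothetical stand-in for `rollout(agent, env)` in which course-dependent luck is modelled as Gaussian noise, and the noise scale of 50 is arbitrary.

```python
import numpy as np


def averaged_fitness(rollout_fn, params, num_rollouts=16):
    """Fitness of one agent: mean cumulative reward over several random rollouts."""
    rewards = [rollout_fn(params) for _ in range(num_rollouts)]
    return float(np.mean(rewards))


# Hypothetical stand-in for rollout(agent, env): the "true" quality of the policy
# plus course-dependent luck that changes on every trial.
rng = np.random.default_rng(0)


def noisy_rollout(params):
    true_quality = -np.sum(np.square(params))
    course_luck = rng.normal(0.0, 50.0)
    return true_quality + course_luck


params = np.zeros(3)
single = np.array([noisy_rollout(params) for _ in range(1000)])
averaged = np.array([averaged_fitness(noisy_rollout, params) for _ in range(1000)])
```

With independent noise, averaging 16 rollouts shrinks the spread of the fitness estimate by roughly a factor of 4 (the square root of 16), at 16 times the number of rollouts per agent, so lucky draws are much less likely to decide which agents survive.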

The winning solution, evolved with PEPG using an average of 16 rollouts per episode.
I also tried PPO, a state-of-the-art policy gradient algorithm for RL, and tuned it for this task as well as I could. So far I have only managed about 240 to 250 points over 100 random trials. I am sure that someone will eventually solve this environment with PPO or another RL algorithm.
In real-world settings where we need safe policies, being able to control the tradeoff between data efficiency and policy robustness is very useful. In theory, with enough compute, we could even average over all 100 required rollouts and optimise the bipedal walker directly against the requirement. When designing devices, professional engineers often have to take quality assurance and safety factor requirements into account. The same considerations have to be taken into account when we train agents whose policies may affect the real world around us.
Some of the solutions that ES found:

CMA-ES solution.

OpenAI-ES solution.
I also trained the agent using a stochastic policy network with high initial noise parameters, so that the agent sees noise everywhere, including in its own actions. As a result, the agent learned to solve the task even when it could not rely on the accuracy of its input and output signals (although this agent did not score above 300 points).

Bipedal walker with a stochastic policy.
Kuka robot arm grasping
I also tried applying ES with this averaging technique to a simplified task with the Kuka robot arm. The environment is available in pybullet. The Kuka model used in the simulator corresponds to the real Kuka manipulator. In this task, the agent is given the coordinates of the object.
More advanced RL environments may require the agent to act on pixel inputs, but in principle this simplified model could be combined with a pretrained convolutional neural network that estimates the coordinates.

Robot arm grasping task with a stochastic policy.
If the agent succeeds in grasping the object it receives 10,000 points, otherwise it receives 0. Some points are deducted for energy use. Averaging the reward over 16 random trials lets ES optimise for robustness. In the end, however, I only obtained policies that grasp the object about 70 to 75% of the time, with both deterministic and stochastic policies. There is still work to do.
Teaching Minitaur multiple tasks
Learning to perform several difficult tasks at the same time makes us better at each individual task. For example, Shaolin monks who lift weights while standing on a pole balance much better without the weights. Learning not to spill a cup of water while driving a car at 140 km/h on a mountain road makes someone a great driver in illegal street races. We can also train agents to perform multiple tasks at once, so that they learn more stable policies.

Shaolin agents.

Drift training.
Recent work on self-play agents has shown that agents that master a difficult task such as sumo wrestling (a sport that requires many skills) can also perform simpler tasks, such as withstanding wind while walking, without any additional training. Erwin Coumans recently experimented with placing a duck on top of a Minitaur that is learning to walk. If the duck falls, the task counts as failed. The hope is that this kind of addition to the task helps to transfer policies learned in the simulator to a real Minitaur. I took one of his examples and tried training the Minitaur with the duck using ES.

CMA-ES walking policy in pybullet.

The real Minitaur from Ghost Robotics.
The Minitaur model in pybullet is modelled on the real Minitaur. However, a policy learned in an idealised virtual environment usually does not work in the real world. It may not even generalise to small changes to the task inside the simulator. For example, in the video above the Minitaur was trained (with CMA-ES) to walk forward, but the same policy is not always able to carry a duck across the room when a duck is placed on top of the robot in the simulator.

The walking policy working with the duck.

Policy trained with the duck.
The policy learned for plain walking without the duck still somehow works when a duck is placed on the robot, which suggests that the duck did not make the task much harder. The duck is stable, so it was not very difficult for the Minitaur to keep it from falling. To make the task much harder, I tried replacing the duck with a ball.


Learning to cheat.
However, this did not immediately lead to a stable balancing policy. Instead, CMA-ES found a policy that technically carries the ball by letting it roll into the hollow between the legs and holding it there. The lesson is that an objective-driven search algorithm will learn to exploit design flaws in the environment in order to reach its objective.

Stochastic policy trained with the ball.

The same policy, but with the duck.
After making the ball smaller, CMA-ES was able to find a stochastic policy that walks and balances the ball at the same time. This policy also transferred to the easier duck task. I hope that in the future this way of augmenting tasks will help with transferring experience to real robots.
One of the key features of ES is that the computation can be parallelised across multiple workers running in separate threads of execution on different CPU cores, or even on different machines.
Python's multiprocessing library makes it possible to run processes in parallel. I prefer to use the Message Passing Interface (MPI) with mpi4py to launch a separate Python process for each job. This bypasses the global interpreter lock and ensures that each process gets its own sandboxed instances of numpy and gym, which matters when it comes to seeding random number generators.
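To illustrate why per-process seeding matters, here is a minimal sketch that gives every job its own random stream. It uses the standard multiprocessing library rather than mpi4py, and it is not how estool is implemented. `evaluate_worker`, `make_jobs` and `evaluate_population` are hypothetical names, and the rollout is replaced by a noisy toy function.

```python
import multiprocessing as mp

import numpy as np


def evaluate_worker(job):
    """Evaluate one candidate with a random stream that belongs to this job only."""
    params, seed = job
    # Seeding with [job_index, master_seed] gives each job an independent stream.
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for a rollout in a stochastic environment.
    return float(-np.sum(np.square(params)) + rng.normal(0.0, 1.0))


def make_jobs(solutions, master_seed):
    # Pair every candidate with a seed derived from its index and the master seed.
    return [(params, [index, master_seed]) for index, params in enumerate(solutions)]


def evaluate_population(solutions, master_seed, num_workers=4):
    # Run the jobs in separate processes; call this under `if __name__ == "__main__":`.
    jobs = make_jobs(solutions, master_seed)
    with mp.Pool(processes=num_workers) as pool:
        return pool.map(evaluate_worker, jobs)
```

Without explicit seeds, processes that start from a copy of the same generator state can end up drawing identical "random" numbers, which would make the evaluations of different workers correlated.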

Roboschool Hopper, Walker, Ant.

Roboschool Reacher.
Agents trained with estool on various roboschool tasks.
I implemented a simple tool called estool. It uses the es.py library described in the previous article to train simple feed-forward policy networks on continuous control RL tasks. I used estool to simplify training in all of the experiments described above, as well as in various other continuous control tasks in gym and roboschool. Because estool uses MPI for distributed processing, it should not take much work to distribute the workers across different machines.
Github
In addition to the environments that come with gym and roboschool, estool works well with most pybullet gym environments. Existing pybullet environments can be modified to better suit a task. For example, I easily created the environment in which the Minitaur carries a ball (in the custom_envs directory of the repository). Being able to set up environments easily makes it easy to test new ideas. If you want to bring in 3D models from other software packages such as ROS or Blender, you can create new and interesting pybullet environments and let others try them.
Many of the models and environments in pybullet, such as the Kuka and the Minitaur, are modelled on real robots, with the aim of transferring what the trained algorithms learn to those robots. In fact, pybullet has been used for transfer learning experiments in many recent studies (1, 2, 3, 4).
However, you do not need an expensive robot to experiment with transferring knowledge from a simulator to a real device. pybullet includes a racecar model based on the open source MIT racecar hardware kit. There is also a pybullet environment that mounts a virtual camera on the virtual racecar, so that the agent can use the camera image as its observation.
Let's try a simpler version first, in which the car only has to learn a policy for driving to a giant ball. In the RacecarBulletEnv-v0 environment, the agent receives the relative coordinates of the ball as input and outputs continuous actions that control the motor speed and the steering direction. The task is easy: training takes about 5 minutes (50 generations) on a 2014 Macbook Pro (8-core processor). With estool, the following command starts training with 8 processes and assigns 4 jobs to each process, for a total of 32 workers. CMA-ES is used to evolve the policy.
python train.py bullet_racecar -o cma -n 8 -t 4
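The worker arithmetic behind `-n 8 -t 4` can be sketched as follows. This is only an illustration of how a population of 8 x 4 = 32 candidates could be divided into one chunk of jobs per process; `assign_jobs` and the 10 parameters per candidate are hypothetical, and estool's own scheduling may differ.

```python
import numpy as np


def assign_jobs(population, num_processes, jobs_per_process):
    """Split a population into one equally sized chunk of jobs per process."""
    population = np.asarray(population)
    expected = num_processes * jobs_per_process
    if len(population) != expected:
        raise ValueError(f"expected {expected} candidates, got {len(population)}")
    return np.split(population, num_processes)


# -n 8 -t 4: 8 processes x 4 jobs each = 32 candidates per generation.
population = np.random.default_rng(0).standard_normal((8 * 4, 10))
chunks = assign_jobs(population, num_processes=8, jobs_per_process=4)
```

The same arithmetic applies to any other combination of the two options: the population size is the number of processes multiplied by the number of jobs per process.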
The training progress and the model parameters found are saved in the log subdirectory. The following command renders the agent in the environment using the best policy found.
python model.py bullet_racecar log/bullet_racecar.cma.1.32.best.json

The pybullet racecar environment, based on the MIT Racecar.
In the simulator you can move the ball with the mouse cursor, and even move the car.
The IPython notebook plot_training_progress.ipynb can be used to view the training history of the agents across all generations. For each generation you can see the best score, the worst score, and the average score over the whole population.


Standard locomotion tasks similar to the ones in roboschool are also available in pybullet: Hopper, Walker, HalfCheetah, Ant and Humanoid. I evolved a policy for Ant that reaches 3000 points in about an hour on a multi-core machine, using PEPG with a population of 256 agents.
python train.py bullet_ant -o pepg -n 64 -t 4


Example rollout of AntBulletEnv. gym.wrappers.Monitor can be used to record rollouts as MP4 video.


Conclusion
In this article I described how to use ES to evolve policies for neural network agents that perform various continuous control RL tasks defined through the gym interface. I also described the estool tool, which makes it possible to quickly try ES algorithms with different settings in a distributed computing environment using the MPI framework.
So far I have only described methods that train an agent through trial and error. This form of learning from scratch is called model-free reinforcement learning. In the next article (if I ever write it), I will go into more detail on model-based learning, where the agent learns to complete the current task using a previously trained model. And yes, I will be using evolutionary algorithms again.
Interesting links
ESTool
Stable or Robust? What's the Difference?
OpenAI Gym documentation
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Edward: a library for probabilistic modeling, inference, and criticism
History of Bayesian Neural Networks
BipedalWalkerHardcore-v2
roboschool
pybullet
Emergent Complexity
GraspGAN