Hello! In this article I want to talk about the optimization technique known as Hessian-Free optimization, or the Truncated Newton method, and about implementing it with the TensorFlow deep learning library. It exploits second-order optimization ideas, yet never requires computing the full matrix of second derivatives. I will describe the HF algorithm itself, and then my work on training feedforward networks on the MNIST and XOR datasets.

A bit about optimization methods

To train a neural network, we need to minimize a loss function with respect to the network's weights. Accordingly, there are many optimization methods for solving this problem.
Gradient descent

Gradient descent is the simplest iterative method for finding the minimum of a differentiable function (for a neural network, the cost function). Differentiating the function with respect to its parameters (the network weights) yields the vector of partial derivatives, the gradient. The gradient always points in the direction of the function's steepest ascent, so if we move in the opposite direction (i.e., along the negative gradient), we approach the minimum over time, which is exactly what we need. The simplest gradient descent algorithm:

- Initialization: choose the parameters θ at random
- Compute the gradient ∇f(θ)
- Change the parameters in the direction of the negative gradient: θ ← θ − α∇f(θ), where α is the learning-rate parameter
- Repeat the previous steps until the gradient is close to zero

Gradient descent is a fairly simple and proven optimization technique, but it has a drawback: it is first-order, meaning it uses only first derivatives of the cost function. This imposes a limitation: the cost function is treated as locally flat, and its curvature is not taken into account.
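The loop above can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the objective f(x) = (x − 3)², the learning rate, and the tolerance are all illustrative choices.

```python
# Minimal gradient-descent sketch for f(x) = (x - 3)^2, whose gradient is
# 2*(x - 3). All hyperparameter values here are illustrative.
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iters=1000):
    x = x0
    for _ in range(max_iters):
        g = grad(x)
        if abs(g) < tol:   # stop once the gradient is close to zero
            break
        x = x - lr * g     # step in the negative-gradient direction
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Running it converges to the minimum at x = 3.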
Newton's method

But what happens if we obtain and use information about the second derivatives of the cost function? The best-known optimization method using second derivatives is Newton's method. Its main idea is to minimize a quadratic approximation of the cost function. What does that mean? Let's work it out.

Consider the one-dimensional case. Suppose we have a function f(x). To find a minimum, we need to find a zero of its derivative, since at the minimum f′(x) = 0. We approximate the function with its second-order Taylor expansion:

f(x + Δx) ≈ f(x) + f′(x)Δx + ½ f″(x)Δx²

We look for the Δx at which f(x + Δx) is minimal. To do this, we set the derivative with respect to Δx equal to zero:

f′(x) + f″(x)Δx = 0,  so  Δx = −f′(x)/f″(x),

which for a quadratic function is the exact global minimum. When the minimum is found iteratively, x is updated by the rule

x_{n+1} = x_n − f′(x_n)/f″(x_n),

and over time the iterates approach the minimum.
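The one-dimensional update rule above is easy to try out. A minimal sketch, where the test function f(x) = x⁴ (so f′(x) = 4x³ and f″(x) = 12x²), the starting point, and the iteration count are illustrative choices:

```python
# One-dimensional Newton iteration: x_{n+1} = x_n - f'(x_n) / f''(x_n).
# Here f(x) = x^4, whose minimum is at x = 0; each step multiplies x by 2/3.
def newton_1d(df, d2f, x0, iters=50):
    x = x0
    for _ in range(iters):
        x = x - df(x) / d2f(x)
    return x

x_min = newton_1d(lambda x: 4 * x**3, lambda x: 12 * x**2, x0=2.0)
```

After 50 iterations the iterate is very close to the minimum at zero.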
Now consider the multidimensional case. Suppose we have a multivariate function f(x). Then the analogous update is

x_{n+1} = x_n − H⁻¹∇f(x_n),

where H is the Hessian, the matrix of second derivatives. Based on it, the parameters are updated using this formula.
Problems with Newton's method

As you can see, Newton's method is a second-order method, and it often works better than ordinary gradient descent: at each step, instead of merely moving toward a local minimum, it moves toward the minimum of a quadratic model of the function, and the second-order Taylor expansion is a good approximation of it.

However, this method has one big drawback. To optimize the cost function, we must find the Hessian. If θ is the parameter vector of dimension N, then the Hessian is the N×N matrix of second derivatives, and computing it requires O(N²) work: an extremely expensive computational operation for networks with hundreds of thousands or millions of parameters. Furthermore, to solve the optimization problem with Newton's method, we need the inverse Hessian, and for that the Hessian must be invertible. For this, it must be positive definite, which we now define.

A matrix A of size N×N is called positive semi-definite if xᵀAx ≥ 0 holds for every vector x, and positive definite if the strict inequality holds. An important property of such matrices is their non-singularity: an inverse matrix exists.
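Since invertibility hinges on positive definiteness, a quick way to check it numerically is through the eigenvalues: a symmetric matrix is positive definite if and only if all its eigenvalues are strictly positive. A minimal sketch (the two test matrices are illustrative):

```python
import numpy as np

# A symmetric matrix is positive definite iff all eigenvalues are > 0;
# positive definiteness guarantees that an inverse exists.
def is_positive_definite(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

pd = is_positive_definite(np.array([[2.0, 0.5], [0.5, 1.0]]))      # True
not_pd = is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]]))  # False (has eigenvalue -1)
```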
Hessian-Free optimization

The main idea of HF optimization is to take Newton's method as the foundation, but to use a more suitable procedure to minimize the quadratic model. First, let us introduce the basic notation we will need later.

Let θ denote the network parameters: the weight matrices W and the bias vectors b. The network output is f(x, θ), where x is the input vector, L is the loss function, and t is the target value. We then define the function to be minimized as the average of the loss over all training examples in the batch S:

h(θ) = (1/|S|) Σ_{(x,t)∈S} L(f(x, θ), t)

Next, following Newton's method, we define the quadratic model obtained from the second-order Taylor expansion:

h(θ + δ) ≈ h(θ) + ∇h(θ)ᵀδ + ½ δᵀHδ  (1)

Setting the derivative of the above expression with respect to δ equal to zero, we obtain the linear system

Hδ = −∇h(θ).  (2)

To find δ, we use the conjugate gradient method.
Conjugate gradient method

The conjugate gradient (CG) method is an iterative method for solving systems of linear equations of the form

Ax = b.

A simple CG algorithm (inputs: the matrix A, the vector b, and the starting point x₀; i denotes the CG iteration number):

- Initialization: r₀ = b − Ax₀ (the error vector, or residual); d₀ = r₀ (the search-direction vector)
- Repeat until the stopping condition is met:
  α_i = r_iᵀr_i / d_iᵀAd_i,
  x_{i+1} = x_i + α_i d_i,
  r_{i+1} = r_i − α_i Ad_i,
  β_{i+1} = r_{i+1}ᵀr_{i+1} / r_iᵀr_i,
  d_{i+1} = r_{i+1} + β_{i+1} d_i

So, by solving equation (2) with the conjugate gradient method, we thereby minimize the quadratic model (1). In our case A = H and b = −∇h(θ).

The CG algorithm can be stopped according to various criteria. We do so based on the relative progress of the quadratic-model optimization:

s = (φ(x_i) − φ(x_{i−k})) / φ(x_i),

where k is the size of the window over which progress is measured. The stopping condition is then: i > k, φ(x_i) < 0, and s < kε.
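The CG listing above can be sketched directly in NumPy. This is an illustrative standalone solver, not the article's TensorFlow graph version; the test system A, b is an arbitrary symmetric positive-definite example.

```python
import numpy as np

# Conjugate gradient for A x = b with symmetric positive-definite A,
# following the residual / search-direction updates described above.
def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iters=1000):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x              # residual (error vector)
    d = r.copy()               # search direction
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:  # simple residual-norm stopping condition
            break
        Ad = A @ d
        alpha = rs / (d @ Ad)  # step length along d
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d  # next conjugate direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

For an N×N system, CG converges in at most N iterations in exact arithmetic, which is why truncating it early still gives a usable δ in HF optimization.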
Here the main feature of HF optimization becomes clear: there is no need to find the Hessian explicitly; we only need to find the result of its product with a vector.
Multiplying the Hessian by a vector

As mentioned above, the charm of this method is that the Hessian never has to be computed directly. We only need to compute the result of multiplying the matrix of second derivatives by a vector. One way to picture Hv is as the derivative of the gradient in the direction v:

Hv = lim_{ε→0} (∇f(θ + εv) − ∇f(θ)) / ε  (3)

However, using this formula in practice can cause numerical problems related to computing with a sufficiently small ε. There is therefore a way to compute the exact matrix-vector product. We introduce the differential operator R_v: R_v{f(θ)} denotes the derivative of f in the direction v:

R_v{f(θ)} = ∂/∂ε f(θ + εv) |_{ε=0}

This shows that to compute the product of the Hessian with a vector, it suffices to compute

Hv = R_v{∇f(θ)}.  (4)
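The directional-derivative picture in (3) is easy to verify on a toy quadratic, where the exact product is known. A minimal sketch; the matrix A, the point θ, the direction v, and ε are illustrative:

```python
import numpy as np

# For f(theta) = 0.5 * theta^T A theta the gradient is A @ theta and the
# Hessian is A itself, so the finite-difference directional derivative of
# the gradient can be checked against the exact product A @ v.
A = np.array([[2.0, 1.0], [1.0, 4.0]])
grad = lambda theta: A @ theta

theta = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
eps = 1e-6

Hv_fd = (grad(theta + eps * v) - grad(theta)) / eps  # finite-difference Hv
Hv_exact = A @ v                                      # exact Hv (H = A here)
```

On a quadratic the two agree up to floating-point error; on a general nonlinear function the finite-difference version degrades as ε shrinks, which is exactly why the exact R-operator formulation (4) is preferred.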
Some improvements to HF optimization

1. The generalized Gauss-Newton matrix. Indefiniteness of the Hessian is a problem when optimizing non-convex functions: it can leave the quadratic model without a lower bound, and as a result its minimum cannot be found. This problem can be solved in many ways; for example, by introducing a trust region that restricts the optimization, or by adding a positive semi-definite component to the curvature matrix based on a penalty (damping).

In practice, the best way to solve this problem has turned out to be using the Gauss-Newton matrix instead of the Hessian:

G = Jᵀ H_L J,  (5)

where J is the Jacobian of the network outputs with respect to the parameters, and H_L is the matrix of second derivatives of the loss function with respect to the outputs. To find the product of the matrix G with a vector v, proceed in three steps: first find the product of the Jacobian with the vector, Jv; then compute the product H_L Jv; and finally multiply by the transposed Jacobian to obtain Jᵀ H_L Jv.
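The three-step product above can be sketched with small dense matrices. This is an illustration of the order of operations only; in the real implementation J and H_L are never materialized, and the example J, H_L, and v are arbitrary.

```python
import numpy as np

# Gauss-Newton matrix-vector product G v = J^T (H_L (J v)), computed as
# three matrix-vector products without ever forming G itself.
J = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])   # Jacobian: outputs (3) x parameters (2)
H_L = np.eye(3)              # loss curvature w.r.t. the outputs
v = np.array([1.0, -1.0])

Jv = J @ v        # step 1: Jacobian-vector product
HJv = H_L @ Jv    # step 2: multiply by the loss Hessian
Gv = J.T @ HJv    # step 3: transposed-Jacobian product
```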
2. Damping. The standard Newton method can perform poorly when optimizing strongly nonlinear objective functions. The reason is that in the early stages of optimization the starting point may be far from the minimum, and the steps taken may be very large and aggressive. To combat this problem, damping is used: a way of modifying the quadratic model, or imposing new constraints on it, such that within those constraints it remains a good approximation.

Tikhonov regularization, or Tikhonov damping (not to be confused with the term "regularization" as commonly used in the machine learning context), is the most famous damping method. It adds a quadratic penalty to the model:

ĥ(δ) = h(θ) + ∇h(θ)ᵀδ + ½ δᵀ(H + λI)δ,

where λ is the damping parameter. The damped curvature-vector product is then computed as follows:

Ĥv = Hv + λv.  (6)
3. The Levenberg-Marquardt heuristic. These dampings are characterized by dynamic adjustment of the parameter λ. The changes follow the Levenberg-Marquardt rule, often used in the context of the LM method (an optimization method that is an alternative to Newton's method). To use the LM heuristic, we need to compute the so-called reduction ratio:

ρ = (h(θ_n + δ_n) − h(θ_n)) / (q_n(δ_n) − q_n(0)),

where n is the HF algorithm's step number and δ_n is the result of the CG minimization. According to the Levenberg-Marquardt heuristic, the update rule is:

if ρ > 3/4, then λ ← (2/3)λ;  if ρ < 1/4, then λ ← (3/2)λ.
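The damping update above is small enough to state as a function. A sketch; the 3/4 and 1/4 thresholds and the 2/3 and 3/2 factors follow the heuristic just described, and the test values of ρ are illustrative.

```python
# Levenberg-Marquardt damping update: rho compares the actual decrease of
# the loss to the decrease predicted by the quadratic model.
def update_damping(damping, rho):
    if rho > 0.75:
        damping *= 2.0 / 3.0  # model is trustworthy: damp less
    elif rho < 0.25:
        damping *= 3.0 / 2.0  # model is poor: damp more
    return damping            # otherwise leave lambda unchanged

damp_good = update_damping(1.0, rho=0.9)  # decreases
damp_bad = update_damping(1.0, rho=0.1)   # increases
```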
4. Preconditioning of the conjugate gradient algorithm. In the context of HF optimization, preconditioning transforms the system by some invertible matrix C: instead of minimizing φ(x), we minimize the transformed function φ̂(y), where x = C⁻¹y, in which the variables are better conditioned. To apply this within the CG algorithm, we need to compute the transformed error vector, which amounts to solving My = r, where M = CᵀC.

A simple PCG (preconditioned conjugate gradient) algorithm (inputs: the matrix A, the vector b, the starting point x₀, and the preconditioner M; i denotes the CG iteration number):

- Initialization: r₀ = b − Ax₀ (the error vector, or residual); y₀, the solution of My₀ = r₀; d₀ = y₀ (the search-direction vector)
- Repeat until the stopping condition is met:
  α_i = r_iᵀy_i / d_iᵀAd_i,
  x_{i+1} = x_i + α_i d_i,
  r_{i+1} = r_i − α_i Ad_i,
  y_{i+1}, the solution of My_{i+1} = r_{i+1},
  β_{i+1} = r_{i+1}ᵀy_{i+1} / r_iᵀy_i,
  d_{i+1} = y_{i+1} + β_{i+1} d_i

Choosing the matrix M is far from a simple task. In practice, using a diagonal matrix (instead of a full-rank one) already gives quite good results. One option for choosing M is the diagonal Fisher matrix (the empirical Fisher diagonal), built from the element-wise squares of the per-example gradients.
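The PCG listing above differs from plain CG only in the extra solve My = r each iteration. A sketch with a diagonal (Jacobi) preconditioner, where the solve is just element-wise division; the test system is an illustrative choice.

```python
import numpy as np

# Preconditioned CG with a diagonal M: each iteration additionally solves
# M y = r, which for a diagonal M is an element-wise division.
def pcg(A, b, M_diag, tol=1e-10, max_iters=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    y = r / M_diag          # preconditioned residual: solves M y = r
    d = y.copy()
    ry = r @ y
    for _ in range(max_iters):
        if np.sqrt(r @ r) < tol:
            break
        Ad = A @ d
        alpha = ry / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        y = r / M_diag
        ry_new = r @ y
        d = y + (ry_new / ry) * d
        ry = ry_new
    return x

A = np.array([[10.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = pcg(A, b, M_diag=np.diag(A))
```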
5. Initializing CG. It is recommended to initialize the CG algorithm with the value of δ found at the previous step of the HF algorithm; some decay constant ε can also be applied:

x₀ = ε δ_{n−1}.

The indices are worth noting: n denotes the HF algorithm's step number, while the index 0 refers to the initial step of the CG algorithm.
The full Hessian-Free optimization algorithm (inputs: the damping parameter λ, the learning-rate parameter α; n is the algorithm's iteration step):

Initialization: set the initial parameters θ₀ and the initial δ₀.

Main HF optimization cycle:
- Compute the gradient ∇h(θ_n)
- Form the damped curvature matrix-vector product (Hessian or Gauss-Newton)
- Solve the minimization problem with CG or PCG to find δ_n
- Update the damping parameter λ using the Levenberg-Marquardt heuristic
- Update the parameters: θ_{n+1} = θ_n + αδ_n, where α is the learning-rate parameter

Thus, an optimization technique that does not use the Hessian matrix allows us to solve the problem of finding the minimum of a high-dimensional function. There is no need to find the Hessian directly.
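The full cycle above can be sketched end-to-end on a toy quadratic loss h(θ) = ½θᵀAθ − bᵀθ. This is a hedged, simplified illustration: the damping is kept fixed (the LM update is omitted), and the matrix A, the vector b, and all hyperparameter values are illustrative.

```python
import numpy as np

# One HF step: damped curvature-vector product, an inner CG solve for delta,
# then theta <- theta + lr * delta.
def hf_step(theta, A, b, damping=0.1, lr=1.0, cg_iters=50):
    grad = A @ theta - b                   # gradient of the quadratic loss
    mv = lambda v: A @ v + damping * v     # damped curvature-vector product

    # Inner CG solve of (A + damping * I) delta = -grad, delta starts at zero.
    delta = np.zeros_like(theta)
    r = -grad
    d = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        if rs < 1e-20:
            break
        Ad = mv(d)
        step = rs / (d @ Ad)
        delta += step * d
        r -= step * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return theta + lr * delta

A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, 1.0])
theta = np.zeros(2)
for _ in range(20):                        # outer HF iterations
    theta = hf_step(theta, A, b)
```

With fixed damping each outer step contracts the error by roughly λ/(λ_min + λ), so a handful of iterations already lands close to the minimizer of the quadratic.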
Implementing HF optimization in TensorFlow

The theory is certainly nice, but let's try to implement this optimization technique in practice and see what comes of it. To write the HF algorithm, I used Python and the TensorFlow deep learning library. After that, as a performance check, I trained feedforward networks with several layers on the XOR and MNIST datasets, using the HF method for optimization.

Implementation of the conjugate gradient method (building the TensorFlow computation graph):
def __conjugate_gradient(self, gradients):
    """ Performs conjugate gradient method to minimize quadratic
    equation and find best delta of network parameters.

    gradients: list of Tensorflow tensor objects
        Network gradients.
    return: Tensorflow tensor object
        Update operation for delta.
    return: Tensorflow tensor object
        Residual norm, used to prevent numerical errors.
    return: Tensorflow tensor object
        Delta loss.
    """
    with tf.name_scope('conjugate_gradient'):
        cg_update_ops = []
        prec = None
The code for computing the preconditioning matrix looks as follows. TensorFlow computes the gradient over the whole training batch at once and sums the result, so I had to use a small hack to obtain the gradient of each example individually, and this affected the numerical stability of the solution. So, as they say, using preconditioning is possible at your own peril and risk.
prec = [[g**2 for g in tf.gradients(tf.gather(self.prec_loss, i), self.W)] for i in range(batch_size)]
Computing the product of the Hessian with a vector (4); Tikhonov damping (6) is applied here as well.
def __Hv(self, grads, vec):
    """ Computes Hessian vector product.

    grads: list of Tensorflow tensor objects
        Network gradients.
    vec: list of Tensorflow tensor objects
        Vector that is multiplied by the Hessian.
    return: list of Tensorflow tensor objects
        Result of multiplying Hessian by vec.
    """
    grad_v = [tf.reduce_sum(g * v) for g, v in zip(grads, vec)]
    Hv = tf.gradients(grad_v, self.W, stop_gradients=vec)
    Hv = [hv + self.damp_pl * v for hv, v in zip(Hv, vec)]
    return Hv
While trying to use the generalized Gauss-Newton matrix (5), I ran into a small problem. Namely, TensorFlow does not know how to compute Jacobian-vector products the way the Theano deep learning framework does (Theano has an Rop function designed specifically for this). So I had to build an analogue of that operation in TensorFlow.
def __Rop(self, f, x, vec):
    """ Computes Jacobian vector product.

    f: Tensorflow tensor object
        Objective function.
    x: list of Tensorflow tensor objects
        Parameters with respect to which computes Jacobian matrix.
    vec: list of Tensorflow tensor objects
        Vector that is multiplied by the Jacobian.
    return: list of Tensorflow tensor objects
        Result of multiplying Jacobian (df/dx) by vec.
    """
    r = None
    if self.batch_size is None:
        try:
            r = [tf.reduce_sum([tf.reduce_sum(v * tf.gradients(f, x)[i])
                for i, v in enumerate(vec)])
                for f in tf.unstack(f)]
        except ValueError:
            assert False, clr.FAIL + clr.BOLD + 'Batch size is None, but used '\
                'dynamic shape for network input, set proper batch_size in '\
                'HFOptimizer initialization' + clr.ENDC
    else:
        r = [tf.reduce_sum([tf.reduce_sum(v * tf.gradients(tf.gather(f, i), x)[j])
            for j, v in enumerate(vec)])
            for i in range(self.batch_size)]
    assert r is not None, clr.FAIL + clr.BOLD +\
        'Something went wrong in Rop computation' + clr.ENDC
    return r
With that in place, the product of the generalized Gauss-Newton matrix with a vector is straightforward to implement.
def __Gv(self, vec):
    """ Computes the product G by vec = JHJv (G is the Gauss-Newton matrix).

    vec: list of Tensorflow tensor objects
        Vector that is multiplied by the Gauss-Newton matrix.
    return: list of Tensorflow tensor objects
        Result of multiplying Gauss-Newton matrix by vec.
    """
    Jv = self.__Rop(self.output, self.W, vec)
    Jv = tf.reshape(tf.stack(Jv), [-1, 1])
    HJv = tf.gradients(tf.matmul(tf.transpose(tf.gradients(self.loss,
        self.output)[0]), Jv), self.output, stop_gradients=Jv)[0]
    JHJv = tf.gradients(tf.matmul(tf.transpose(HJv), self.output), self.W,
        stop_gradients=HJv)
    JHJv = [gv + self.damp_pl * v for gv, v in zip(JHJv, vec)]
    return JHJv
The main training routine is shown below. First the quadratic model is minimized with CG/PCG, then the main update of the network weights is performed, and the damping parameter is adjusted based on the Levenberg-Marquardt heuristic.
def minimize(self, feed_dict, debug_print=False):
    """ Performs main training operations.

    feed_dict: dictionary
        Input training batch.
    debug_print: bool
        If True prints CG iteration number.
    """
    self.sess.run(tf.assign(self.cg_step, 0))
    feed_dict.update({self.damp_pl: self.damping})

    if self.adjust_damping:
        loss_before_cg = self.sess.run(self.loss, feed_dict)

    dl_track = [self.sess.run(self.ops['dl'], feed_dict)]
    self.sess.run(self.ops['set_delta_0'])

    for i in range(self.cg_max_iters):
        if debug_print:
            d_info = clr.OKGREEN + '\r[CG iteration: {}]'.format(i) + clr.ENDC
            sys.stdout.write(d_info)
            sys.stdout.flush()

        k = max(self.gap, i // self.gap)
        rn = self.sess.run(self.ops['res_norm'], feed_dict)
Testing HF optimization

Let's test the written HF optimizer. For this we use a simple example with the XOR dataset and a harder one with the MNIST dataset. To visualize some information during training we will use TensorBoard. Note also the rather intricate TensorFlow computation graph that results.

The TensorFlow computation graph.

Network architecture and training on the XOR dataset. We create a simple network of size two input neurons, two hidden neurons, one output. As the activation function we use the sigmoid; as the loss function, the log loss. We then compare the training results of HF optimization (with the Hessian), HF optimization (with the Gauss-Newton matrix), and ordinary gradient descent with a learning-rate parameter of 0.01. The number of iterations is 100.
Loss for gradient descent (red line), for HF optimization with the Hessian (orange line), and for HF optimization with the Gauss-Newton matrix (blue line).

It turned out that HF optimization with the Gauss-Newton matrix converges in a matter of seconds, while after 100 iterations gradient descent has made very little progress. For the gradient descent loss to rival HF optimization, roughly 100,000 iterations were needed.

Gradient descent loss over 100,000 iterations.

Network architecture and training on the MNIST dataset. To solve the handwritten-digit recognition problem, we create a network of size 784 input neurons, 300 hidden, 10 output. As the loss function we use cross-entropy. The size of the batches fed during training is 50.
with tf.name_scope('loss'):
    one_hot = tf.one_hot(t, n_outputs, dtype=tf.float64)
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot, logits=y_out)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(tf.cast(y_out, tf.float32), t, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float64))

n_epochs = 10
batch_size = 50

with tf.Session() as sess:
    """ Initializing hessian free optimizer """
    hf_optimizer = HFOptimizer(sess, loss, y_out, dtype=tf.float64,
        batch_size=batch_size, use_gauss_newton_matrix=True)

    init = tf.global_variables_initializer()
    init.run()
As in the XOR case, we compare the training results of HF optimization (with the Hessian), HF optimization (with the Gauss-Newton matrix), and ordinary gradient descent with a learning-rate parameter of 0.01. The number of iterations is 200; with a batch size of 50, 200 iterations is not a full epoch (not every example of the training set is used). This was done to make everything run faster, but the general trend is visible even so.

Left plot: accuracy on the test samples. Right plot: accuracy on the training samples. Accuracy of gradient descent (red line), of HF optimization with the Hessian (orange line), and of HF optimization with the Gauss-Newton matrix (blue line).

Loss for gradient descent (red line), for HF optimization with the Hessian (orange line), and for HF optimization with the Gauss-Newton matrix (blue line).

As the plots above show, HF optimization with the Hessian does not behave very stably, but it does eventually converge when trained over several epochs. The best results are shown by HF optimization with the Gauss-Newton matrix.

Full epochs of training. Left plot: accuracy on the test samples; right plot: accuracy on the training samples. Accuracy of gradient descent (dark-green line) and of HF optimization with the Gauss-Newton matrix (pink line).

Full epochs of training. Loss of gradient descent (dark-green line) and of HF optimization with the Gauss-Newton matrix (pink line).

When the conjugate gradient method with preconditioning was used, the computation itself slowed down significantly, and it did not converge faster than plain CG.

Loss of HF optimization with the PCG algorithm.

All these plots show that the best results were achieved by HF optimization using the Gauss-Newton matrix and the standard conjugate gradient method.

The full code can be viewed on GitHub.
Conclusion

As a result, an implementation of the HF algorithm was written in Python using the TensorFlow library. While writing it, I ran into several problems implementing the main features of the algorithm, namely support for the Gauss-Newton matrix and preconditioning. This is because TensorFlow is not as flexible a library as we would like, and it is not really designed for research purposes; for experimentation Theano may be the better choice, since it allows more freedom. But I decided from the start to do all of this in TensorFlow. Testing the program showed that the HF algorithm with the Gauss-Newton matrix gives the best results. Using the conjugate gradient algorithm with preconditioning produced unstable numerical results and made the computation significantly slower, which I attribute to the peculiarities of TensorFlow (to implement preconditioning, I had to contrive quite a lot).
Sources

This article briefly covered the theoretical aspects of Hessian-Free optimization so that the main essence of the algorithm can be understood. If a more detailed exposition of the material is needed, I cite the sources from which the basic theoretical information was taken and on the basis of which the HF method was implemented in Python.

1) Training Deep and Recurrent Networks with Hessian-Free Optimization (James Martens and Ilya Sutskever, University of Toronto): a complete description of HF optimization.
2) Deep Learning via Hessian-Free Optimization (James Martens, University of Toronto): the article with the results of applying HF optimization.
3) Fast Exact Multiplication by the Hessian (Barak A. Pearlmutter, Siemens Corporate Research): a detailed description of multiplying the Hessian by a vector.
4) An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (Jonathan Richard Shewchuk, Carnegie Mellon University): a detailed description of the conjugate gradient method.