Speech synthesis is now used in many areas: voice assistants, IVR systems, smart homes, and more. The task itself is, to my taste, very clear and easy to grasp: written text should be spoken the way a person would say it.
Not long ago machine learning arrived in speech synthesis, as it has in many other fields. It turned out that many components of the overall system can be replaced with neural networks, which makes it possible not only to approach the quality of existing algorithms but to surpass them significantly.
I decided to try building a fully neural synthesis system with my own hands and to share the experience with the community. Read on to see what came of it.
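Speech synthesis
Building a speech synthesis system takes a whole team of specialists from different fields. Each field has a host of algorithms and approaches of its own; doctoral dissertations and thick books have been written describing just the basic ones. Let's start with a surface-level look at each.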
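Linguistics
- Text normalization. First, all abbreviations, numbers, and dates must be expanded into words. "The 50s of the XX century" should turn into "the fifties of the twentieth century", and an abbreviated address such as "St. Petersburg, Bolshoy pr. P.S." into "the city of Saint Petersburg, Bolshoy Prospekt of the Petrograd Side". This should come out as naturally as if a person had been asked to read the text aloud.
- Preparing a stress dictionary. Stress can be placed according to the rules of the language. In English it often falls on the first syllable, in Spanish on the penultimate one. These rules have a great many exceptions that follow no general pattern, and all of them have to be taken into account. Russian has no general rules for stress placement at all, so there is no getting by without a dictionary of stressed words.
- Homograph disambiguation. Homographs are words that are spelled the same but pronounced differently. A native speaker places the stress without effort in "a door lock" and "a castle on the hill" (one spelling in Russian, two pronunciations). "The key to the lock" is harder, because it could just as well be the key to the castle. Homographs cannot be fully resolved without taking context into account.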
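To make the normalization step more concrete, here is a minimal sketch in Python. It uses English words for readability, and the abbreviation table and the `normalize` helper are hypothetical, made up for this illustration. A real normalizer for Russian also has to handle grammatical case, gender, dates, and much more.

```python
import re

# Hypothetical abbreviation table; a real system needs a far larger one.
ABBREVIATIONS = {"st.": "street", "dr.": "doctor", "no.": "number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Lowercase the text, expand known abbreviations and 1-3 digit numbers."""
    tokens = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            tokens.append(number_to_words(int(token)))
        else:
            tokens.append(token)
    return " ".join(tokens)


print(normalize("Dr. Smith lives at 221 Baker St."))
```

Note that "St." could equally mean "Saint". A lookup table cannot tell the two apart, which is exactly the kind of ambiguity that needs context to resolve.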
Prosody
- Splitting into syntagmas and placing pauses. A syntagma is a segment of speech that is relatively complete in meaning. When people speak, they usually insert pauses between phrases. The system has to learn to split text into such syntagmas.
- Determining the intonation type. Completeness, a question, and an exclamation are the simplest intonations. Conveying irony, doubt, or enthusiasm is a much harder task.
Phonetics
- Obtaining a transcription. Since in the end we work with pronunciation rather than writing, it is logical to use sounds (phonemes) instead of letters (graphemes). Converting a grapheme sequence into phonemes is a separate task made up of many rules and exceptions.
- Computing intonation parameters. At this point you have to decide how pitch and speaking rate will change depending on the placed pauses, the chosen phoneme sequence, and the intonation type to be expressed. Besides fundamental frequency and rate, there are other parameters one can experiment with for a long time.
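Acoustics
- Selecting sound units. Synthesis systems operate on so-called allophones: realizations of a phoneme that depend on its environment. The recordings in the training data are cut into pieces according to the phoneme markup, and these pieces form the allophone database. Each allophone is described by a set of parameters such as context (neighboring phonemes), pitch, and duration. The synthesis process itself is the selection of the right sequence of allophones, the one best suited to the current conditions.
- Modification and sound effects. The resulting audio sometimes needs post-processing with special filters that bring the synthesized speech a little closer to human speech or correct certain defects.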
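If it suddenly seems that all of this can be simplified, worked out in your head, or that heuristics for the individual modules can be found quickly, imagine that you need to build synthesis for Hindi. If you do not know the language, you cannot even judge the quality of the synthesis without bringing in someone who knows it well enough. My native language is Russian, and I can hear when synthesized speech gets the stress wrong or uses the wrong intonation. At the same time, all synthesized English sounds about the same to me, to say nothing of more exotic languages.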
Implementations
We will try to find an end-to-end (E2E) implementation of synthesis that takes on all the difficulties tied to the subtleties of the language. In other words, we want to build a system based on neural networks that takes text as input and produces synthesized speech as output. Is it possible to train a network that replaces a whole team of narrow specialists with a team (perhaps even a single person) specializing in machine learning?
A Google search for "end2end tts" returns a great many results. At the top is the Tacotron implementation from Google itself. The easiest route, it seemed to me, was to start from specific people on GitHub who do research in this area and publish their implementations of various architectures.
I picked three:
- Kyubyong Park
- Keith Ito
- Ryuichi Yamamoto
Take a look at their repositories: they are a real treasure trove of information. There are many architectures and approaches to E2E synthesis. Among the main ones:
- Tacotron (versions 1, 2),
- DeepVoice (versions 1, 2, 3),
- Char2Wav,
- DCTTS,
- WaveNet.
We have to pick one of them. As the basis for future experiments I chose Deep Convolutional Text-To-Speech (DCTTS) in the implementation by Kyubyong Park. The original paper is available online. Let's look at the implementation in more detail.
The author has published synthesis results for three different datasets at different stages of training. To my taste, as a non-native speaker, they sound quite decent. The last of the English datasets (a Kate Winslet audiobook) contains only 5 hours of speech, which matters to me because my own dataset holds roughly the same amount of data.
Some time after I had trained my system, a note appeared in the repository saying that the author had successfully trained a model for Korean. This matters a lot: languages can differ greatly, and robustness to the language is a welcome bonus. It suggests that training should not require a special approach for each dataset, whether that is the language, the voice, or some other characteristic.
Another important point for this kind of system is training time. On the hardware I have, Tacotron would by my estimate need about two weeks to train. For early-stage prototyping that seemed too resource-intensive. Of course, the machine does the work and nobody has to turn the pedals, but building even a basic prototype would eat a lot of calendar time. DCTTS in its final version trains in a couple of days.
Every researcher has a set of tools they use in their work, chosen to their own taste. I really like PyTorch. Unfortunately, I could not find a DCTTS implementation in it and had to use TensorFlow. Perhaps at some point I will publish my own PyTorch implementation.
Training data
A good dataset is the main guarantee of success in synthesis. Preparing a new voice is a very thorough process. A professional speaker reads prepared phrases for many hours. In every utterance the speaker has to hold all the pauses, speak without jerks or slowdowns, reproduce the right fundamental-frequency contour, and do all of this with the correct intonation. On top of that, not all voices sound equally pleasant.
I had at hand a dataset of about 8 hours recorded by a professional speaker. My colleagues and I are currently discussing whether this voice can be released freely for non-commercial use. If all goes well, the distribution will include, besides the recordings themselves, the exact text for each of them.
Getting started
We want to build a network that takes text as input and produces synthesized sound as output. The abundance of implementations shows that this is possible, though of course with a number of caveats.
The main system parameters are usually called hyperparameters and are moved into a separate file, typically named hparams.py or, as in our case, hyperparams.py. Everything that can be tuned without touching the main code goes there, from the log directory to the sizes of the hidden layers. The hyperparameters are then used in the code like this:
    from hyperparams import Hyperparams as hp

    batch_size = hp.B
From here on, every variable with the hp prefix comes from the hyperparameter file. These parameters are assumed not to change during training, so be careful when restarting anything with new parameters.
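For orientation, a hyperparameter file of this kind might look roughly like the sketch below. The attribute names `B`, `n_mels`, `r`, `max_N`, `max_T`, and `vocab` are the ones mentioned in this article; `logdir`, the toy alphabet, and the values of `max_N` and `max_T` are assumptions made for illustration, not the settings used in the experiments.

```python
class Hyperparams:
    """Hypothetical hyperparameter container in the style of hyperparams.py."""
    logdir = "logdir"   # assumed name: where logs and checkpoints are written
    vocab = "E abc"     # toy alphabet: "E" marks the end of the text
    n_mels = 80         # number of mel bands
    r = 4               # keep every r-th mel frame
    max_N = 180         # assumed limit on text length, in characters
    max_T = 210         # assumed limit on the number of decimated mel frames
    B = 8               # batch size


# In the real project this would be: from hyperparams import Hyperparams as hp
hp = Hyperparams
batch_size = hp.B
print(batch_size, hp.n_mels, hp.r)
```

Keeping every tunable value in one class means an experiment can be reproduced by saving a single file next to the checkpoints.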
Text
Text is usually processed with a so-called embedding layer, placed at the very start of the network. Its essence is simple: it is just a lookup table that maps each character of the alphabet to a vector. During training we find the best values for these vectors, and when synthesizing with the finished model we simply read the values from the table. This approach is used in the already well-known Word2Vec, which builds vector representations of words.
For example, take a simple alphabet:
    ['a', 'b', 'c']
Suppose that during training the optimal values for each symbol turned out to be:
    {'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Then for the string aabbcc we get the following matrix after the embedding layer:
    [[0, 1], [0, 1], [2, 3], [2, 3], [4, 5], [4, 5]]
This matrix is then passed on to the other layers, which no longer deal with the notion of a symbol.
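The lookup itself fits in a few lines. The sketch below is plain Python rather than the TensorFlow layer used in the actual implementation, and it hard-codes the toy table from the example instead of learning it.

```python
# Toy embedding table from the example: one 2-dimensional vector per symbol.
EMBEDDING = {"a": [0, 1], "b": [2, 3], "c": [4, 5]}


def embed(text):
    """Replace every character with its vector, giving a len(text) x 2 matrix."""
    return [EMBEDDING[ch] for ch in text]


matrix = embed("aabbcc")
print(matrix)
```

A character missing from the table raises a `KeyError` here, which is a small preview of the alphabet restrictions discussed next.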
This is where the first limitations appear: the set of characters we can send to synthesis is limited. For each character the training data must contain a non-zero number of examples, preferably in different contexts. This means the alphabet has to be chosen with care.
In my experiments I settled on the following option: the Russian alphabet, a hyphen, a space, and an end-of-line marker. Several important points and assumptions follow from this.
- I did not add punctuation marks to the alphabet. On the one hand, we do not actually pronounce them. On the other hand, punctuation is what splits a phrase into parts (syntagmas) separated by pauses. How will the system pronounce the classic "execute cannot pardon", whose meaning depends entirely on where the comma goes?
- There are no digits in the alphabet. We expect them to be expanded into words before the text is sent to synthesis, that is, during normalization. In general, every E2E architecture I have seen requires properly normalized text.
- There are no Latin letters in the alphabet. The system cannot pronounce English. You can try transliteration and get a strong Russian accent.
- The alphabet contains the letter ё. In the data I trained on it was present wherever it belonged, and I decided to keep it that way. However, when I came to evaluate the results it turned out that this letter now also has to be placed correctly before the text is sent to synthesis; otherwise the system pronounces exactly е rather than ё.
Future versions could give each of these points more attention; for now we leave things in this slightly simplified form.
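The sketch below shows how text might be mapped to indices under these restrictions. The exact alphabet string from the experiments is not shown here, so `VOCAB` is an assumption built from the description (end marker, space, hyphen, Russian letters), and silently dropping unknown characters is just one possible policy.

```python
# Hypothetical alphabet: end-of-text marker "E", space, hyphen, then the
# 33 Russian letters. Treat the exact string and its ordering as an assumption.
VOCAB = "E -" + "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Lowercase the text, drop unknown characters, append the end marker."""
    # Lowercased input can never contain the uppercase Latin "E" marker.
    indices = [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]
    return indices + [CHAR_TO_INDEX["E"]]


print(encode("Привет, мир!"))
```

The comma and the exclamation mark vanish without a trace, which illustrates the first point in the list: any pause they implied is lost before the network ever sees the text.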
Sound
Almost all systems work not with the signal itself but with various kinds of spectra computed over windows with a certain step. I will not go into detail; there is plenty of literature on the topic. Let's focus on implementation and usage. The DCTTS implementation uses two kinds of spectra: the magnitude spectrum and the mel spectrum.
They are computed as follows (this listing and all code below are taken from the DCTTS implementation, modified for clarity).
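The sketch below is not that listing. It is a dependency-free illustration of what a magnitude spectrum over sliding windows is, using a naive discrete Fourier transform. The tiny `n_fft` and `hop_length` defaults are chosen only to keep the example readable and are not the values used in the DCTTS implementation.

```python
import cmath
import math


def magnitude_spectrogram(signal, n_fft=8, hop_length=4):
    """Return one list of n_fft // 2 + 1 magnitudes per analysis window."""
    # Periodic Hann window, a common choice for spectral analysis.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = []
        for k in range(n_fft // 2 + 1):  # non-negative frequencies only
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                        for n in range(n_fft))
            frame.append(abs(total))
        frames.append(frame)
    return frames


# A sine wave with exactly two periods per window should peak in bin 2.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = magnitude_spectrogram(signal)
print(len(spec), "frames,", len(spec[0]), "bins per frame")
print([round(v, 3) for v in spec[0]])
```

In practice a library routine such as `librosa.stft` does the same job far more efficiently. The mel spectrum is then obtained by pooling these frequency bins into a smaller number of mel-spaced bands.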
For the computations, almost all E2E synthesis projects use the LibROSA library (https://librosa.imtqy.com/librosa/). It contains a lot of useful things; I recommend looking through the documentation to see what is in it.
次ã«ã䜿çšããããŒã¿ããŒã¹ã®ãã¡ã€ã«ã®1ã€ã§æ¯å¹
ã¹ãã¯ãã«ãã©ã®ããã«èŠããããèŠãŠã¿ãŸãããã
ãŠã£ã³ããŠã¹ãã¯ã¿ãŒãè¡šããã®ãªãã·ã§ã³ã¯ã¹ãã¯ããã°ã©ã ãšåŒã°ããŸãã ç§åäœã®æéã¯æšªåº§æšã«ããããã«ãåäœã®åšæ³¢æ°ã¯çžŠåº§æšã«ãããŸãã ã¹ãã¯ãã«ã®æ¯å¹
ãè²ã§åŒ·èª¿è¡šç€ºãããŸãã ãã€ã³ããæããã»ã©ãæ¯å¹
ã¯å€§ãããªããŸãã
The mel spectrum is a magnitude spectrum taken on the mel scale with a certain step and window. The number of bands is fixed in advance; most implementations use 80 for synthesis (set by the hp.n_mels parameter). Switching to the mel spectrum greatly reduces the amount of data while preserving the characteristics that matter for a speech signal. The mel spectrogram of the same file looks like this.
Note that in the last line of the listing the mel spectrum is decimated in time: only every 4th vector is kept (hp.r == 4), which lowers the frame rate accordingly. Speech synthesis then boils down to predicting the mel spectrum from a sequence of characters. The idea is simple: the less the network has to predict, the better it copes.
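Decimation in time is just a strided slice over the frames. Here is a minimal sketch with a toy spectrogram stored as a list of frames; the helper name `reduce_frames` is made up for this example.

```python
def reduce_frames(mel, r=4):
    """Keep every r-th frame of a mel spectrogram given as a list of frames."""
    return mel[::r]


# Toy "spectrogram": 10 frames with 3 bands each (real ones use hp.n_mels = 80).
mel = [[t, t + 0.1, t + 0.2] for t in range(10)]
reduced = reduce_frames(mel, r=4)
print(len(mel), "->", len(reduced))  # 10 -> 3
```

With r = 4 the network predicts four times fewer frames, and the skipped ones have to be filled in later by a separate stage.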
We can compute a spectrogram from audio, but we cannot listen to it, so the signal has to be reconstructed. For this, systems often use the Griffin-Lim algorithm and its more modern variants (for example, RTISILA). The algorithm reconstructs a signal from its magnitude spectrum. The implementation I used:
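    import copy

    import librosa
    import numpy as np
    from hyperparams import Hyperparams as hp


    def griffin_lim(spectrogram, n_iter=hp.n_iter):
        """Estimate a waveform from a magnitude spectrogram with Griffin-Lim."""
        x_best = copy.deepcopy(spectrogram)
        for i in range(n_iter):
            x_t = librosa.istft(x_best, hop_length=hp.hop_length,
                                win_length=hp.win_length, window="hann")
            est = librosa.stft(x_t, n_fft=hp.n_fft, hop_length=hp.hop_length,
                               win_length=hp.win_length)
            # Keep the estimated phase, combine it with the known magnitudes.
            phase = est / np.maximum(1e-8, np.abs(est))
            x_best = spectrogram * phase
        x_t = librosa.istft(x_best, hop_length=hp.hop_length,
                            win_length=hp.win_length, window="hann")
        return np.real(x_t)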
The signal can then be restored from a magnitude spectrogram by following the steps for computing the spectrum in reverse order.
Let's compute a magnitude spectrum, restore the signal from it, and listen.
Original:
Restored signal:
To my taste, the result got worse. The authors of Tacotron (whose first version also uses this algorithm) noted that they used Griffin-Lim as a temporary solution to demonstrate the capabilities of the architecture. WaveNet and similar architectures can synthesize speech of better quality, but they are more heavyweight and take considerable effort to train.
Training
DCTTS, the architecture we chose, consists of two practically independent neural networks: Text2Mel and the Spectrogram Super-resolution Network (SSRN).
Text2Mel predicts a mel spectrum from text, using an attention mechanism that links two encoders (TextEnc, AudioEnc) and one decoder (AudioDec). Note that Text2Mel restores precisely the decimated mel spectrum.
SSRN restores the full magnitude spectrum from the mel spectrum, filling in the skipped frames and restoring the original time resolution.
The sequence of computations is described in detail in the original paper. The source code of the implementation is also available, so you can always debug it and dig into the subtleties. Note that the author of the implementation departed from the paper in a few places. I would highlight two points:
- additional normalization layers were added; without them, according to the author, nothing worked;
- the implementation uses dropout for better regularization, which the paper does not.
I took the voice dataset with 8 hours of recordings (several thousand files) and kept only the recordings for which:
- the text contains only letters, spaces, and hyphens;
- the text is no longer than hp.max_N;
- the mel spectrum after decimation is no longer than hp.max_T.
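The three filters can be sketched as a single predicate. The function name, the default limits for `max_N` and `max_T`, and the way the number of mel frames is passed in are assumptions for illustration; the real preprocessing code works on the actual spectrogram arrays.

```python
# Letters, space, and hyphen: the only characters allowed in a transcript.
ALLOWED = set(" -" + "абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def keep_record(text, n_mel_frames, max_N=180, max_T=210, r=4):
    """Apply the three filters: alphabet, text length, decimated mel length."""
    decimated = -(-n_mel_frames // r)  # ceiling division: frames left after [::r]
    return (set(text.lower()) <= ALLOWED
            and len(text) <= max_N
            and decimated <= max_T)


print(keep_record("привет мир", 400))   # short Russian text, 100 frames after decimation
print(keep_record("hello", 400))        # Latin letters are outside the alphabet
print(keep_record("привет", 1000))      # 250 decimated frames exceed max_T
```

Filtering by length bounds the tensor shapes the networks have to handle, at the cost of discarding part of the data.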
That left a little over 5 hours. I computed the required spectra for all recordings and then started training Text2Mel and SSRN one after the other. All of this takes just a few commands:
    $ python prepro.py
    $ python train.py 1
    $ python train.py 2
Note that in the original repository prepro.py is called prepo.py. My inner perfectionist could not stand that, so I renamed it.
DCTTS contains only convolutional layers and, unlike RNN-based implementations such as Tacotron, trains much faster.
On my machine with an Intel Core i5-4670, 16 GB of RAM, and a GeForce 1080, 50 thousand steps of Text2Mel take 15 hours and 75 thousand steps of SSRN take 5 hours. The time per 1000 steps barely changed during training, so it is easy to estimate how long training for more steps will take.
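Since the time per step stays roughly constant, the estimate is a simple proportion. A small sketch using the figures quoted above; the 200 thousand step target is an arbitrary example, not a recommendation.

```python
def hours_for(steps, measured_steps, measured_hours):
    """Scale a measured training time linearly to another number of steps."""
    return steps * measured_hours / measured_steps


# Measured: 50k Text2Mel steps in 15 h, 75k SSRN steps in 5 h.
print(hours_for(1000, 50_000, 15))      # hours per 1000 Text2Mel steps
print(hours_for(200_000, 50_000, 15))   # hours for a hypothetical 200k-step run
print(hours_for(1000, 75_000, 5) * 60)  # minutes per 1000 SSRN steps
```

This gives about 18 minutes per 1000 Text2Mel steps and about 4 minutes per 1000 SSRN steps on this hardware.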
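The batch size can be adjusted with hp.B. From time to time training crashed with an out-of-memory error, so I simply halved the batch size and restarted training from scratch. I believe the problem lies somewhere in the guts of TensorFlow (I was not using the latest version) and in the intricacies of the batching implementation. I did not dig into it, because at a value of 8 everything stopped crashing.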
Result
Once the models are trained, we can finally run synthesis. To do that, fill a file with phrases and run:
    $ python synthesize.py
I tweaked the implementation slightly so that it generates phrases from the file I need.
The results are saved as WAV files in the samples directory. Here are examples of what the resulting system produces.
Conclusions and remarks
The result exceeded my personal expectations for quality. The system places stress, the speech is intelligible, and the voice is recognizable. All in all, not bad for a first version, especially given that only 5 hours of training data were used.
Questions remain about how controllable such synthesis is. You cannot even correct the stress in a word if it comes out wrong. We are rigidly tied to the maximum phrase length and the size of the mel spectrogram. There is no way to control intonation or speaking rate.
I did not publish my changes to the code of the original implementation. They only concerned loading the training data and the phrases for synthesis with the finished system, plus the values of two hyperparameters: the alphabet (hp.vocab) and the batch size (hp.B). The rest of the implementation is unchanged.
In this story I did not touch at all on bringing such systems into production; fully E2E speech synthesis systems are still very far from that. I used a GPU with CUDA, and even so everything runs slower than real time. On a CPU it all runs indecently slowly.
All of these issues will be tackled by large companies and the research community in the coming years. I am sure it will be very interesting.