These days only the lazy are not talking (or writing, or thinking) about machine learning, neural networks, and artificial intelligence. Last year ML was compared to teenage sex: everyone wants it, but nobody actually does it. Today everyone worries that AI will take their job. According to a recent Gartner study, there is no need to panic: by 2020 AI is expected to create more jobs than it eliminates. So, dear friends, learn ML and you will be happy.
Note: we continue the series publishing full versions of articles from Hacker magazine. The author's spelling and punctuation are preserved.
**ML with Python Azure Functions** :)
This article introduces ML through a practical case: a project we did for Aktion-press, an online subscription service. I am confident the walkthrough will be useful to many people. Why so many? Because the problem we solved can be described as "sorting a huge volume of email and forwarding it to the right addresses". A large stream of correspondence that an administrator has to sort and forward to the appropriate departments is an almost universal problem, and it deserves a modern solution.
So, after consulting with the customer, we decided to develop a machine learning model that automates as much of the manual sorting as possible.
Machine learning model
It will probably surprise no one that we chose Python as the language for this solution. Partly this is for historical reasons, but Python is also high-level and, most importantly, has many libraries that are useful for machine learning. We will discuss them below.
Frankly, there is nothing special about the ML in this case. A set of simple binary classifiers based on logistic regression showed promising results, which let us abstract away from the model itself to some degree and focus on preparing the data and building text embeddings. Still, the repository has already served as the basis for three other independent projects, proved itself in several classification experiments, and become a reliable foundation for moving quickly into development. The purpose of this section is therefore not to show off know-how but to lay the groundwork for the operationalization section that follows.
To help you try this code yourself or reuse it, I will share my experience here and give a few recommendations.
To preserve confidentiality, the original dataset has been replaced with a similar, publicly available review classification dataset. See the file data\data.csv.
The data itself came as CSV files with three columns: Id, Text, and Class. NLTK has no built-in support for reading data from CSV files, so we wrote our own module. It reads the files in a folder into a single pandas DataFrame and can extract the text in NLTK's usual forms: paragraphs, sentences, words, and so on.
The script starts by initializing this home-grown module, CsvCorpusReader, with the client data. The class implementation is in lib\corpus.py. I strongly recommend reading through Experiments\TrainingExperiment.py as well.
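To make the idea of a CSV-backed corpus reader concrete, here is a minimal sketch. It is not the article's CsvCorpusReader: the class name SimpleCsvCorpus is hypothetical, the standard csv module stands in for pandas, and naive regular expressions stand in for NLTK's tokenizers.

```python
import csv
import io
import re


class SimpleCsvCorpus:
    """Hypothetical, simplified stand-in for the article's CsvCorpusReader."""

    def __init__(self, csv_file):
        # Expects the three columns described above: Id, Text, Class.
        self.rows = list(csv.DictReader(csv_file))

    def ids(self):
        return [row["Id"] for row in self.rows]

    def classes(self):
        return [row["Class"] for row in self.rows]

    def paras(self, text):
        # Paragraphs are separated by blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    def sents(self, text):
        # Naive split after ., ! or ? (NLTK's punkt tokenizer is far more robust).
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def words(self, text):
        return re.findall(r"[A-Za-z0-9']+", text)

    def docs(self):
        # One flat list of words per document, all structure discarded.
        return [self.words(row["Text"]) for row in self.rows]


sample = io.StringIO(
    "Id,Text,Class\n"
    '1,"Service was slow. Very slow!",SlowService\n'
    '2,"Great food and friendly staff.",Other\n'
)
corpus = SimpleCsvCorpus(sample)
docs = corpus.docs()
```

The point of the design is that the classifier code only ever sees lists of words, so the storage format (CSV here) can change without touching the rest of the pipeline.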
After initialization, the words have to be extracted from the documents and normalized. In our case, after a series of experiments, we decided to wrap the whole process in a set of helper functions that hide the NLTK and Gensim calls behind a convenient configuration layer.
The extraction step is told to return each document as a flat list of words, discarding paragraph and sentence structure (see keep_levels=Levels.Nothing). Each word is then lowercased, stop words are dropped, and the words are stemmed. In the final stage, low-frequency words are removed as well, on the assumption that they are mere typos and have no significant effect on classification.
Note that the code focuses on English data samples only. The original version also implemented Russian lemmatization with PyMorphy2, which allows more accurate classification of Russian text.
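The normalization steps described above can be sketched in plain Python. This is an illustration only: the stop-word list is a tiny hand-made sample, naive_stem is a crude hypothetical replacement for a real NLTK stemmer, and the min_count cut-off is an assumed parameter.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "of"}  # tiny illustrative list


def naive_stem(word):
    # Crude suffix stripping; a real pipeline would use an NLTK stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(docs, min_count=2):
    # 1) lowercase, 2) drop stop words, 3) stem
    stemmed = [
        [naive_stem(w.lower()) for w in doc if w.lower() not in STOPWORDS]
        for doc in docs
    ]
    # 4) drop words seen fewer than min_count times across the whole corpus
    counts = Counter(w for doc in stemmed for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in stemmed]


docs = [
    ["The", "waiter", "was", "waiting"],
    ["Waiters", "waited", "and", "smiled"],
    ["A", "typo", "wiater"],
]
clean = normalize(docs)
```

In this toy corpus the misspelled "wiater" occurs once and is filtered out, which is exactly the effect the low-frequency rule is meant to have on typos.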
Once the documents are tokenized, the next step is to build embeddings. The next piece of code converts each document into a vector of numbers that the classifier can use.
We tested several approaches (including BoW, TF-IDF, LSI, RP, and w2v), but in this case a classic LSI model extracting 500 topics gave the best result (AUC = 0.98). The code first checks whether a serialized model already exists in the shared folder. If there is none, it trains a new model on the previously prepared data and saves the result to disk. If a model is found, it is simply loaded into memory. The code then transforms the dataset and repeats the process for the next embedding.
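The "load if present, otherwise train and save" logic can be sketched as follows. This is a minimal illustration rather than the repository's code: load_or_train, train_dummy, and the file name lsi.bin are hypothetical, and pickle stands in for whatever serializer the real embedding models use.

```python
import os
import pickle
import tempfile


def load_or_train(model_path, train_fn):
    """Return (model, was_trained); train_fn is called only when no file exists."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f), False
    model = train_fn()
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model, True


def train_dummy():
    # Placeholder for fitting a real embedding (for example an LSI model).
    return {"num_topics": 500}


model_dir = os.path.join(tempfile.mkdtemp(), "model")
path = os.path.join(model_dir, "lsi.bin")  # hypothetical file name
first, trained_first = load_or_train(path, train_dummy)
second, trained_second = load_or_train(path, train_dummy)
```

The second call finds the file written by the first and skips training. That is also why a stale model folder has to be deleted before retraining on new data.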
In terms of performance, the LSI model beat far more powerful vectorization algorithms based on word2vec and other more complex approaches. There are several possible reasons.
The most obvious one is that the kinds of email we were looking for contain predictable, repetitive word patterns, as auto-replies do (for example, "Thank you. I am ... if your question is urgent, ... office ..."). Something as simple as TF-IDF is enough to handle them. LSI follows the same general idea, and the model can be seen as a way of adding synonym handling on top. By contrast, word2vec trained on Wikipedia is likely to generate unnecessary noise because of its complex synonym structures; this blurs the templates in the messages and therefore lowers classification accuracy.
This showed that even in the age of word2vec and recurrent neural networks, old and simple methods are still worth trying.
As usual, there is no getting away from mandatory boilerplate code. Here it prepares the data for machine learning with scikit-learn.
As mentioned above, we use several binary classifiers instead of one multiclass classifier. We therefore create a binary target for one of the classes (SlowService in this sample). To train each class's classifier separately, change the value of the class_to_find variable and rerun the code. The scoring script is designed to work with several models and loads them automatically from the chosen folder. Finally, the training and test datasets are formed, and rows with gaps are excluded entirely.
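A minimal sketch of the target preparation described above, in plain Python. The real code works on scikit-learn inputs; make_binary_dataset, the 25% test share, and the class name RudeStaff are assumptions made for illustration.

```python
import random


def make_binary_dataset(rows, class_to_find, test_share=0.25, seed=0):
    """Split (features, label) rows into train and test sets with 0/1 targets."""
    # Rows with a gap in either the features or the label are dropped entirely.
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    # One-vs-rest target: 1 for the class of interest, 0 for everything else.
    labelled = [(x, 1 if y == class_to_find else 0) for x, y in complete]
    random.Random(seed).shuffle(labelled)
    n_test = int(len(labelled) * test_share)
    return labelled[n_test:], labelled[:n_test]


rows = [
    ([0.1, 0.9], "SlowService"),
    ([0.8, 0.2], "Other"),
    (None, "SlowService"),        # embedding missing: excluded
    ([0.2, 0.7], "SlowService"),
    ([0.9, 0.1], "RudeStaff"),    # hypothetical class name
    ([0.7, 0.3], None),           # label missing: excluded
    ([0.6, 0.4], "Other"),
    ([0.3, 0.8], "SlowService"),
]
class_to_find = "SlowService"
train, test = make_binary_dataset(rows, class_to_find)
```

Changing class_to_find and calling the function again yields the dataset for the next one-vs-rest classifier.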
Next, we train the classifier (logistic regression in this case) and save the model to the same shared directory used earlier for the transformations and embeddings.
The code follows a special naming format for models: class_{0}_thresh_{1}.bin. This is needed to determine the class name and the corresponding threshold later, during scoring.
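The naming convention can be illustrated with a small round trip. The helper names model_file_name and parse_model_file_name are hypothetical; the parsing expressions mirror the ones the scoring loop later in this article applies to each file name.

```python
def model_file_name(class_name, threshold):
    # Same pattern as in the text: class_{0}_thresh_{1}.bin
    return "class_{0}_thresh_{1}.bin".format(class_name, threshold)


def parse_model_file_name(file_name):
    stem = file_name.rsplit(".", 1)[0]        # drop the ".bin" extension
    class_name = file_name.split("_")[1]      # second "_"-separated field
    threshold = float(stem.split("_")[-1])    # last "_"-separated field
    return class_name, threshold


name = model_file_name("SlowService", 0.04)
parsed = parse_model_file_name(name)
```

One consequence of this scheme is that a class name must not contain an underscore, otherwise the second field would hold only part of it.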
One last note before we continue. We chose Visual Studio Code as the development tool. It is an easy-to-use, lightweight editor that offers basic IntelliSense features (code completion and tooltips) even for a dynamic language like Python. Combined with the Jupyter and Python extensions and the IPython kernel, it lets you run code line by line and visualize results without restarting the script, which is always handy for ML tasks. Yes, it resembles standard Jupyter, but with the power of IntelliSense and git. I recommend trying it at least while working with this sample, since many of the other features for productive development depend on VS Code.
In the code, the lines that plot ROC and thresholds are an example of using the Jupyter extension. Click the special "Run cell" button above them to see the TP and FP rates and compare them against the thresholds in the results pane on the right. We used this chart heavily during the work: because the dataset is markedly imbalanced, the optimal cut-off was consistently around 0.04 rather than the usual 0.5. If you cannot use VS Code for testing, run the script with standard Python tools, view the results in a separate window, and then change the file name directly.
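To show why an imbalanced dataset pushes the cut-off far below 0.5, here is a small sketch that computes the TP and FP rates for every candidate threshold and picks one automatically. The article chose the threshold by inspecting the chart; selecting by Youden's J statistic (TPR minus FPR) is an assumption made here for illustration, and the data is invented.

```python
def roc_points(y_true, scores):
    """Return (threshold, tpr, fpr) for every distinct score, highest first."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / positives, fp / negatives))
    return points


def best_threshold(y_true, scores):
    # Youden's J (TPR - FPR) is one common criterion, used here as an assumption.
    return max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])[0]


# Imbalanced toy data: 2 positives among 10 rows, all scores far below 0.5.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.01, 0.01, 0.02, 0.02, 0.02, 0.03, 0.03, 0.05, 0.04, 0.09]
threshold = best_threshold(y_true, scores)
```

With the default cut-off of 0.5, none of these rows would be flagged at all, even though the scores separate the two classes reasonably well.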
Now it is time for the scoring script: Score\run.py. There is almost nothing new in it; most of the code comes from the training experiment described above. See the file in the GitHub repository.
A CSV file is sent to the input for scoring, and the output is two separate files: one contains the scored classes, the other contains the identifiers of rows that could not be scored. Why files are used will be explained later, in the discussion of operationalization.
To close this section, here is why we used several binary classifiers instead of one multiclass classifier. First, it made it very easy to get started and to tune performance for each class separately. This approach also allows a different mathematical model per class, as with auto-replies, which often have a fairly rigid structure and can be handled with a simple bag of words. And from an IT specialist's point of view, code like the following simplifies deployment: new models can be plugged in and existing ones changed without affecting the others.
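    import os
    import joblib

    # features_notnull and result are prepared earlier in the script.
    # Every file in ../model named class_<name>_thresh_<value>.bin is one binary classifier.
    model_dir = os.path.join('..', 'model')
    model_paths = [path for path in os.listdir(model_dir) if path.startswith('class_')]

    for model_path in model_paths:
        model = joblib.load(os.path.join(model_dir, model_path))
        # Probability of the positive class for each row with complete features
        res = model.predict_proba(features_notnull)[:, 1]
        # Class name and cut-off threshold are encoded in the file name
        class_name = model_path.split('_')[1]
        threshold = float(model_path.rsplit('.', 1)[0].split('_')[-1])
        result.loc[:, "class_" + class_name] = res > threshold
        result.loc[:, "class_" + class_name + "_score"] = res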
You can try the code right now on your own data, from a local PC, without any operationalization:
- Clone the repository and follow the instructions to set up a local Anaconda environment and install Visual Studio Code with the required extensions.
- Put your data into data\data.csv in the supported format, open Experiments\TrainingExperiment.py, and train the models for the classes you want to score.
- Remember to delete the entire model folder first; otherwise the code will try to reuse the transformations and models from the sample.
- Go to Score\run.py, replace the data in Score\debug\input.csv with your own, and run the script line by line with the Jupyter extension.
In VS Code you can also open the Debug pane (Ctrl + Alt + D), choose Score (Python) as the configuration, and click Start Debugging to step through the code line by line in the editor. When the algorithm finishes, the results are in the files input.scores.csv and input.unscorable.csv in the Score\debug folder.
Operationalization
Python support in Azure Functions is still in early preview, so using it for mission-critical tasks is not advisable. ML is often not mission-critical, though, so the ease of implementation can outweigh the difficulties of adapting to a preview version.
At this stage, then, we had two scripts. Experiments\TrainingExperiment.py trains the models and saves the fitted transformations and trained models to a shared directory; this training script is meant to be rerun on a local machine as needed. Score\run.py runs daily and sorts new email as it arrives.
This section describes operationalizing the process with Azure Functions. Functions are easy to use, bind a script to a variety of triggers (HTTP, queues, blob storage objects, WebHooks, and so on), provide several automatic output bindings, and are cheap: on the Consumption plan you pay only $0.000016 per gigabyte of RAM used per second. There are limits, though: a function cannot run for more than 10 minutes or use more than 1.5 GB of RAM. If that does not suit you, you can always switch to a dedicated App Service pricing plan and still keep the other benefits of the serverless approach. For simple logistic regression and batches of a few hundred emails, however, the Consumption plan was the best fit.
From a programmer's point of view, a function is a folder named after the function (simply Score in this case) that contains two files:
- function.json describes the function's configuration (see the format reference);
- run.py is the Python script executed when the trigger fires.
function.json can be written by hand or configured through the Azure portal. The one we ended up with is shown below. The first binding, inputcsv, runs the script whenever a file whose name matches mail-classify/input/{input_file_name}.csv appears in the Azure Blob storage account. The other two bindings save the output files after the function succeeds. Here we store them in a separate output folder, under names formed from the input file name plus the suffix scored or unscorable. So you can drop a file with any unique name, such as a GUID, into the input folder, and a little later two new files with names derived from that GUID appear in the output folder.
    {
      "bindings": [
        {
          "name": "inputcsv",
          "type": "blobTrigger",
          "path": "mail-classify/input/{input_file_name}.csv",
          "connection": "apmlstor",
          "direction": "in"
        },
        {
          "name": "scoredcsv",
          "type": "blob",
          "path": "mail-classify/output/{input_file_name}.scored.csv",
          "connection": "apmlstor",
          "direction": "out"
        },
        {
          "name": "unscorablecsv",
          "type": "blob",
          "path": "mail-classify/output/{input_file_name}.unscorable.csv",
          "connection": "apmlstor",
          "direction": "out"
        }
      ],
      "disabled": false
    }
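The way the {input_file_name} placeholder ties the trigger path to the two output paths can be mimicked locally. The sketch below only imitates the name mapping for illustration; resolve_outputs is a hypothetical helper, not part of the Azure Functions runtime, and the real binding engine does this work for you.

```python
import re

INPUT_PATTERN = "mail-classify/input/{input_file_name}.csv"
OUTPUT_PATTERNS = [
    "mail-classify/output/{input_file_name}.scored.csv",
    "mail-classify/output/{input_file_name}.unscorable.csv",
]


def resolve_outputs(blob_path):
    """Return the output paths for a blob, or None if it would not match the trigger."""
    # Turn the trigger pattern into a regex with one named group.
    regex = re.escape(INPUT_PATTERN).replace(
        re.escape("{input_file_name}"), r"(?P<input_file_name>.+)"
    )
    match = re.fullmatch(regex, blob_path)
    if match is None:
        return None
    return [p.format(**match.groupdict()) for p in OUTPUT_PATTERNS]


outputs = resolve_outputs("mail-classify/input/3f2a.csv")
ignored = resolve_outputs("mail-classify/archive/3f2a.csv")
```

A blob outside the input folder does not match the pattern, so it produces no output paths.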
The run.py script of the Azure function is almost identical to the original, non-operationalized version. The only change concerns how the function receives and returns its data streams. Whatever type of input and output is chosen (HTTP request, queue message, blob file, ...), the content is stored in a temporary file, and the path to that file is written to an environment variable named after the corresponding binding. In this case, for example, every run of the function creates a file named "...\Binding[GUID]\inputcsv" and stores its path in the inputcsv environment variable. The same happens for each output file. We made a few small changes to the script to account for this logic.
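As a rough sketch of what such changes look like, the function below reads its input path from the inputcsv environment variable and writes to the paths held in scoredcsv and unscorablecsv. The variable names come from the bindings above; the run function, the column layout, and the toy scorer are assumptions for illustration and not the actual run.py.

```python
import csv
import os
import tempfile


def run(score_row):
    # Each binding is exposed as an environment variable holding a file path.
    with open(os.environ["inputcsv"], newline="") as f:
        rows = list(csv.DictReader(f))
    with open(os.environ["scoredcsv"], "w", newline="") as scored, \
            open(os.environ["unscorablecsv"], "w", newline="") as unscorable:
        scored_writer = csv.writer(scored)
        unscorable_writer = csv.writer(unscorable)
        scored_writer.writerow(["Id", "Score"])
        unscorable_writer.writerow(["Id"])
        for row in rows:
            score = score_row(row["Text"])
            if score is None:
                unscorable_writer.writerow([row["Id"]])
            else:
                scored_writer.writerow([row["Id"], score])


# Local simulation: point the three variables at files in a temp folder.
work_dir = tempfile.mkdtemp()
for name in ("inputcsv", "scoredcsv", "unscorablecsv"):
    os.environ[name] = os.path.join(work_dir, name)
with open(os.environ["inputcsv"], "w", newline="") as f:
    f.write("Id,Text\n1,slow service\n2,\n")

# Hypothetical scorer: empty text cannot be scored.
run(lambda text: len(text) / 100 if text else None)
```

Because the script depends only on three environment variables, the same code runs unchanged in the cloud and on a developer machine.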
These are all the changes needed for the service to start whenever a CSV file appears in blob storage and to produce files with predictions as a result.
To be honest, we tested other triggers too, and found that Python's greatest strength, its modules, becomes a curse in a serverless system. A Python module is not a static library to be linked, as in many other languages, but code that is executed on every start. For long-lived solutions such as services this is barely noticeable, but for an Azure function, executing the whole script on every invocation is quite costly. This makes HTTP triggers awkward to use with Python, whereas batch processing based on CSV files, common in many ML scripts, brings this cost per data row down to a reasonable minimum.
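The import cost is easy to observe directly. The sketch below times a first and a repeated import within one process and shows how a fixed start-up cost is amortized over a batch. The module json.tool is only a stand-in for a heavy ML library, and the figures passed to per_row_overhead are invented for illustration.

```python
import importlib
import sys
import time


def timed_import(module_name):
    """Import a module and return the seconds the import took."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def per_row_overhead(import_seconds, rows_in_batch):
    # A fixed start-up cost spread evenly over the rows of one batch.
    return import_seconds / rows_in_batch


first = timed_import("json.tool")    # executes the module code unless already loaded
second = timed_import("json.tool")   # served from the sys.modules cache

# Hypothetical numbers: a 10-second start-up cost and a batch of 500 rows.
overhead = per_row_overhead(10.0, 500)
```

Within a single process the second import is served from sys.modules, but each new function invocation starts a fresh interpreter and pays the first-import price again, which is why batching matters.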
If you cannot do without real-time triggers in Python, you can switch to a dedicated Azure App Service pricing plan, which greatly increases the host's compute resources and may speed up imports. In our case, the ease of implementation and low cost of the Consumption plan outweighed the benefits of faster execution.
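Before moving on, let's look at how Visual Studio Code can simplify development. At the time of writing, the Functions CLI offered initial generation of Python templates but had no debugging capabilities. However, simulating the runtime with VS Code's built-in features is not hard. The .vscode\launch.json file helps configure the debug options. The JSON below defines a debug configuration named Score (Python) that tells VS Code to launch ${workspaceRoot}/Score/run.py with ${workspaceRoot}/Score as the working directory and sets environment variables holding the file paths, much as Azure Functions does at run time. Open the Debug pane (Ctrl + Alt + D) in VS Code, select Score (Python), and click Start Debugging to step through the script.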
    [...]
    {
      "name": "Score (Python)",
      "type": "python",
      "request": "launch",
      "stopOnEntry": true,
      "pythonPath": "${config:python.pythonPath}",
      "console": "integratedTerminal",
      "program": "${workspaceRoot}/Score/run.py",
      "cwd": "${workspaceRoot}/Score",
      "env": {
        "inputcsv": "${workspaceRoot}/Score/debug/input.csv",
        "outputcsv": "${workspaceRoot}/Score/debug/output.csv",
        "unscorablecsv": "${workspaceRoot}/Score/debug/unscorable.csv"
      },
      "debugOptions": [
        "RedirectOutput",
        "WaitOnAbnormalExit"
      ]
    }
    [...]
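When the script is run through the Jupyter extension instead, these environment variables are not set. The check below detects IPython, as opposed to the Debug configuration, and sets them by hand.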
    import os
    import sys

    # Running under IPython from the workspace root: set the paths by hand
    if "IPython" in sys.modules and 'Score' not in os.getcwd():
        os.environ['inputcsv'] = os.path.join('debug', 'input.csv')
        os.environ['scoredcsv'] = os.path.join('debug', 'input.scores.csv')
        os.environ['unscorablecsv'] = os.path.join('debug', 'input.unscorable.csv')
        os.chdir('Score')
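Finally, the function has to be deployed to Azure. Python on Azure Functions is still in preview, and the default runtime is Python 2.7. To switch to 3.6, the wiki suggests placing a Python distribution in D:\home\site\tools. That folder precedes the Python 2.7 location in PATH, so its python.exe is picked up first.
This can be done by hand through Kudu. Instead, a setup step in the script takes care of it: it checks whether the interpreter is 3.6 and, if not, unpacks a Python archive (.zip) into D:\home\site\tools.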
    import sys

    tools_path = 'D:\\home\\site\\tools'
    if not sys.version.startswith('3.6'):
        ...  # the rest of this block is not included in the text
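The next step is pip. Pip has an API that can be called from Python, so packages can be installed from the script itself. Pure-Python packages (langid, pymorphy) install without trouble. Others contain C++ code, and App Service has no Visual C++ compiler, so they need precompiled binaries (wheels). For the ML libraries, the wheel files were uploaded to Azure Blob Storage, which is reachable from Azure, and pip installs them by URL.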
    import pip

    def install_package(package_name):
        # pip.main() exists in pip releases before 10; with newer pip,
        # run "python -m pip install ..." in a subprocess instead.
        pip.main(['install', package_name])

    # Precompiled wheels hosted in Azure Blob Storage
    install_package('https://apmlstor.blob.core.windows.net/wheels/numpy-1.13.1%2Bmkl-cp36-cp36m-win_amd64.whl')
    install_package('https://apmlstor.blob.core.windows.net/wheels/pandas-0.20.3-cp36-cp36m-win_amd64.whl')
    install_package('https://apmlstor.blob.core.windows.net/wheels/scipy-0.19.1-cp36-cp36m-win_amd64.whl')
    install_package('https://apmlstor.blob.core.windows.net/wheels/scikit_learn-0.18.2-cp36-cp36m-win_amd64.whl')
    install_package('https://apmlstor.blob.core.windows.net/wheels/gensim-2.3.0-cp36-cp36m-win_amd64.whl')
    install_package('https://apmlstor.blob.core.windows.net/wheels/nltk-3.2.4-py2.py3-none-any.whl')

    # Pure-Python packages are installed by name
    install_package('langid')
    install_package('pymorphy2')
One more thing remains: the data files used by NLTK. They are downloaded after install_packages.
    import os
    import nltk

    nltk_path = os.path.abspath(os.path.join('..', 'lib', 'nltk_data'))
    if not os.path.exists(nltk_path):
        os.makedirs(nltk_path)
        print("INFO: Created {0}".format(nltk_path))

    # Tokenizer models and stop-word lists
    nltk.download('punkt', download_dir=nltk_path)
    nltk.download('stopwords', download_dir=nltk_path)
With Setup in place, the environment is complete: Python 3.6, the packages, and the NLTK data.
Conclusion
So, we have looked at how Azure Functions can be used to run ML in Python. The code is available on GitHub.
As a reminder, this is the full version of an article from Hacker magazine.