
Last year Badoo started actively using the Hadoop + Spark stack and built its own system for collecting and processing millions of metrics with Spark Streaming.
To broaden our knowledge and catch up on the latest innovations in this area, at the end of May this year our BI (Business Intelligence) developers went to London for the latest conference in the Strata + Hadoop series, which is dedicated to big data and data analytics. We were also drawn by the technology stack we use: a whole track was devoted to Spark, with speakers from Cloudera, Databricks, Hortonworks, and IBM, among them book authors, creators, and active contributors.
The conference is organized by O'Reilly, the IT publisher known for the gorgeous animals on its book covers, and Cloudera, an IT company that specializes in Hadoop-based solutions.
This year's venue was the huge ExCeL exhibition centre in London. Its size is impressive: it stretches between two DLR stations, and if you do not know which part of the building your event is in, you can spend 15 to 20 minutes walking from one end to the other.
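The scale of the event itself was just as ambitious. The conference ran for four days, from Tuesday to Friday. The first two days were given over to introductory lectures, seminars, and master classes, and the conference proper took place on the following days. This is probably the conference dedicated to working with big data, so the amount of information on offer was enormous. It opened with a series of eight talks (!), after which more than ten parallel tracks started, each with a further six hours of talks.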

With such a dense schedule, the organizers should have paid more attention to how attendees find their way around it. For example, they could have stated the topic of each track, or grouped the talks by target audience (that is, split them into talks for engineers, analysts, and business people).
Instead, at registration everyone was handed an A3 sheet with the schedule printed in the smallest possible font. To be fair, as an alternative you could use a mobile app that tracked the events currently in progress and let you put together your own schedule.
Each day of the conference began with a series of short keynotes in which speakers spent 10 to 15 minutes sharing their views on the main trends in artificial intelligence, data analytics, machine learning, and security. After these speeches, the tracks with talks opened. In today's review we cover the talks we found most interesting.
Spark 2.0: What's Next?
The Apache Spark track was opened by Tathagata Das, a developer at Databricks (he quickly realized that nobody could pronounce his name, so everyone simply calls him TD). In his talk he covered what to expect from the Apache Spark 2.0 release.
TD said that the major release is scheduled for June of this year. At the time of writing, only an unstable preview release is available to all users. The speaker also assured the audience that, even though this is a "major" release, backward compatibility with 1.x will be preserved almost completely.
He then went straight to the breakthrough features this release promises.
- Tungsten Phase 2. Project Tungsten is a set of optimizations aimed at improving memory and CPU usage in the Spark framework. The updated version promises to speed up Tungsten's work by a factor of 5 to 10. This was achieved by optimizing code generation and improving the memory management algorithms. Where a query made up of several consecutive operations used to require a chain of virtual calls, it is now compiled into a single piece of code.
- Structured Streaming. After receiving a lot of feedback from developers, the Spark team substantially reworked the streaming model. The updated version uses interactive streaming, called Structured Streaming, in which you can run various queries against an existing stream, build machine learning models, and do so at runtime. In essence, it is a high-level API built on top of the SQL API. As a result, streaming should receive all the optimizations from Tungsten.

- Datasets and DataFrames. The new version merges these two APIs by making DataFrame = Dataset[Row]. This allows Dataset operations such as map and filter to be run on DataFrame objects. The Dataset functionality was marked as experimental, and this is one of the places where compatibility with the 1.x versions breaks.
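To make the Tungsten point about replacing a chain of virtual calls with a single piece of code more concrete, here is a toy model in plain Python. It is not Spark code and does not reflect how Spark actually generates code; the `Filter`, `Project`, `chained_sum`, and `fused_sum` names are hypothetical and exist only for this sketch. The first plan pulls each row through one iterator per operator, while the second is a hand-written version of the kind of single loop a fused plan could collapse into.

```python
from typing import Callable, Iterable, Iterator


class Filter:
    """Pull-based operator: every row passes through one more generator frame."""

    def __init__(self, child: Iterable[int], predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Pull-based operator that applies a function to each row of its child."""

    def __init__(self, child: Iterable[int], func: Callable[[int], int]) -> None:
        self.child = child
        self.func = func

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            yield self.func(row)


def chained_sum(rows: Iterable[int]) -> int:
    """Operator-at-a-time plan: aggregate over Project(Filter(rows))."""
    plan = Project(Filter(rows, lambda x: x % 2 == 0), lambda x: x * 10)
    return sum(plan)


def fused_sum(rows: Iterable[int]) -> int:
    """Hand-written equivalent of what a single fused loop might look like."""
    total = 0
    for x in rows:
        if x % 2 == 0:       # filter
            total += x * 10  # projection and aggregation in the same loop body
    return total


print(chained_sum(range(10)), fused_sum(range(10)))
```

Both functions return the same result; the difference is only in how many call boundaries each row crosses on the way.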

If you develop applications on Spark, make sure you set aside time this summer to prepare for the new version and upgrade to it. The API improvements and performance gains are impressive.
A very similar talk from Spark Summit by Matei Zaharia, CTO of Databricks and the creator of Spark, can be watched here: "Spark 2.0".
The Future of Streaming in Spark: Structured Streaming
TD's second talk at the conference was a more detailed account of where Spark Streaming is heading. According to the speaker, more than half of the developers who use Spark consider Spark Streaming the most important component of the system.
The main conclusion the developers reached over the three years that streaming has existed is that this process should not live in isolation. Users do not just take a data stream, process it, and store it in a database for later use. In most cases they also need to hook monitoring up to the stream, collect data for machine learning, and so on.
With this in mind, the developers started thinking more broadly and now describe the new project not merely as streaming but as Continuous Applications, a framework within which all of the capabilities above are to be added.
TD went through the main problem areas of the current DStreams model and introduced the new solution, called Structured Streaming.
Structured Streaming is a new concept that lets you treat streaming as working with an infinite table.

The data in this table can be queried with SQL queries through the DataFrames API. Depending on what the user needs, a query can be run either over all of the data or only over the delta of newly received data.
Combining the DStreams and DataFrames APIs makes it possible to run operations that join data from the stream with static datasets.
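The "infinite table" idea can be sketched in a few lines of plain Python. This is only a toy model under our own assumptions, not the Structured Streaming API: the `UnboundedTable` class, the `(user_id, event)` row shape, and the `USER_COUNTRY` lookup table are all hypothetical. It shows the two things described above: a query answered either over the whole table or incrementally from the delta alone, and stream rows joined with a static dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (user_id, event): hypothetical schema for this sketch

# Hypothetical static dataset that stream rows are joined against.
USER_COUNTRY: Dict[str, str] = {"u1": "UK", "u2": "DE", "u3": "UK"}


class UnboundedTable:
    """Toy model of a stream viewed as a table that only ever grows."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.counts: Counter = Counter()  # running "events per country" result

    def append(self, delta: Iterable[Row]) -> Counter:
        """Append a batch and update the running result from the delta only."""
        batch = list(delta)
        self.rows.extend(batch)
        # Join each new row with the static lookup, then aggregate the delta.
        delta_counts = Counter(USER_COUNTRY.get(user, "unknown") for user, _ in batch)
        self.counts.update(delta_counts)
        return delta_counts

    def full_query(self) -> Counter:
        """Recompute the same aggregate over every row seen so far."""
        return Counter(USER_COUNTRY.get(user, "unknown") for user, _ in self.rows)


table = UnboundedTable()
table.append([("u1", "click"), ("u2", "view")])
latest = table.append([("u3", "click"), ("u9", "view")])
print(latest)        # result over the newest delta only
print(table.counts)  # incrementally maintained result over all data
```

The incrementally maintained `counts` always equals `full_query()`, which is the property that lets a streaming engine avoid rescanning the whole table on every batch.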
The talk also looked at how the new system works "under the hood" and, most importantly, at how fault tolerance is achieved.
If you are building a complex system on top of Spark Streaming, you should definitely take a look at the new Structured Streaming concept: according to the developers, it gives you fast, fault-tolerant streaming with the simplest possible API.
The specification is in "Structured Streaming Programming Abstraction, Semantics, and APIs".
A recording of a similar talk by the same author at Spark Summit can be watched here: "A Deep Dive into Structured Streaming".
Lunch
Between talks the organizers arranged perfectly standard coffee breaks, and at midday lunch was served. Unlike at many IT conferences, lunch at Strata + Hadoop was not a paid extra but was provided by the conference sponsors. On the first day, for example, the lunch from Teradata was cold and fresh, and on the second day there was a hearty one from IBM.
Inside the Shuffle, by Holden Karau

To the jingle of the metal rings around her neck, the flamboyant speaker told us a lot of interesting things about Spark internals.
A Spark job is running, the shuffle stage arrives, and... the OOM killer!
And all you wanted was happiness and kittens. Speaking of cats, there were a lot of them in the slides. Well, she likes them. They carry no meaning, they are just there to be cute.
So where do excessive memory consumption and poor performance come from?
- Unnecessary grouping by key. Wherever possible, use reduceByKey rather than groupByKey. This reduces the amount of data that is passed on.
- Unevenly distributed data. If one key has far more data than the others, then during the shuffle all of that data goes to a single reducer and... the OOM killer is right there! So sometimes changing how the shuffle is done can help.
- The need to join with other datasets. This is a disaster for all map-reduce algorithms in general, because a join cannot reduce the amount of data, it can only increase it. You have to make sure the growth is not explosive.
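The first two points can be illustrated with a small simulation in plain Python. This is a sketch under our own assumptions, not Spark code: the function names are hypothetical, `toy_partition` is a deterministic stand-in for a real hash partitioner, and key salting is shown as one commonly used way to spread a hot key, not necessarily the remedy the speaker had in mind.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def group_by_key_shuffle(partitions: List[List[Pair]]) -> int:
    """groupByKey style: every record is sent across the network unchanged."""
    return sum(len(part) for part in partitions)


def reduce_by_key_shuffle(partitions: List[List[Pair]]) -> int:
    """reduceByKey style: values are combined per key inside each partition
    first, so only one record per distinct key per partition is shuffled."""
    shuffled = 0
    for part in partitions:
        combined: Dict[str, int] = defaultdict(int)
        for key, value in part:
            combined[key] += value
        shuffled += len(combined)
    return shuffled


def toy_partition(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for a hash partitioner (sum of character codes)."""
    return sum(ord(ch) for ch in key) % num_reducers


def reducer_loads(keys: List[str], num_reducers: int, salts: int = 1) -> List[int]:
    """Count records per reducer. With salts > 1, each key is split into
    several sub-keys ("key#0", "key#1", ...) that can land on different reducers."""
    loads = [0] * num_reducers
    for i, key in enumerate(keys):
        salted_key = f"{key}#{i % salts}"
        loads[toy_partition(salted_key, num_reducers)] += 1
    return loads


partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("a", 5), ("a", 6)],
]
print(group_by_key_shuffle(partitions), reduce_by_key_shuffle(partitions))

hot_keys = ["hot"] * 100
print(reducer_loads(hot_keys, num_reducers=4))           # one reducer gets everything
print(reducer_loads(hot_keys, num_reducers=4, salts=4))  # load is spread out
```

Note that salting is not free: the partial results for the sub-keys have to be merged in a second aggregation step, so it pays off only when the skew is severe.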
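All of this was broken down and explained, and solutions were proposed. She also showed what problems you may run into while waiting for further optimizations.
And of course it would not have been complete without the new features of the upcoming Spark 2.0.
Next came unit testing in Spark. And to round things off, an invitation to send in code that cannot be optimized.
Overall it was interesting and informative, even if not always practical and not always clear to everyone.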
Slides from the talk are available on SlideShare.
Booth Demos
At the company stands you could talk to their representatives and, at some of them, to the authors of the products.
At the MapR stand, for example, Ted Dunning showed us how MapR-FS works. He mounted the cluster as his home directory, started periodically writing the current time to a file in one console, and ran tail -f in another. Pretty cool! You simply work with files, and the file system itself takes care of servers, replication, and everything else. And to read the data you can use it not only as files mounted through the FS client, but also in Hive / Spark for processing.
This FS has a community edition. We think we should give it a try!
Securing Apache Spark on Production Hadoop Clusters

If you need to configure security for Spark or Hadoop, definitely bookmark this presentation! The speaker started with a little history of how Hadoop's security system developed and which components it includes. In general, security was not part of the original design at all; everything appeared much later, which partly explains why it is all arranged the way it is.
Data security in Spark is built on data security in Hadoop. So the story begins with Kerberos: authenticating users and configuring HDFS / YARN.
From there, everything gets secured in turn:
- user authentication
- HDFS
- YARN
- Web UI
- RPC API
- EncryptedFS
- wire encryption
- JVM memory
- encryption of temporary shuffle blocks
Phew, I don't think I forgot anything. And if I did, it is all in the slides!
Next he described the options for granting privileges at the file level, at the Hive table level, and at the row and column level: where each is possible and how they differ.
He also talked about the outlook for security development in Spark.
So now I know everything about security. Or at least it seems that way to me.
A similar talk from Hadoop Summit is available as "Securing Spark on Production Hadoop Clusters".
Bread and Circuses

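At the end of the first day of talks there were two after-parties back to back. First, most of the speakers and attendees headed to the exhibition hall, where the organizers had set out alcoholic and soft drinks and hot snacks at the sponsors' stands. The second was a pub crawl (where participants spend the evening moving from one pub to the next along a set route).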
During the day, organizers and volunteers handed out "route sheets" to attendees, showing a chain of four London pubs where you were supposed to "rest" in order. As with lunch, the beer and pub snacks were provided by sponsors. Some of the venues had been rented out entirely for the evening, while in others the attendees terrified the local patrons.

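The next morning it was easy to spot the attendees who had managed to visit every pub: they had crumpled faces and marker on their hands.
After such a stormy night, not everyone made it to the second day of the conference. We, however, found the strength to attend several interesting talks, and we share our impressions below.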
Why Is My Hadoop Job Slow?
An informative talk about Apache Ambari, a tool for monitoring the health of a Hadoop cluster.
It showed examples of using the Ambari Metrics System and the standard dashboards for the subsystems (HDFS, YARN, HBase).
Concrete examples of situations observed in practice were given.

It also showed what can be found in the HDFS and YARN audit logs and how, and how to work with logs through Ambari.


A useful tool! It gives you a much more detailed picture of why the cluster behaves the way it does, whether your tasks have enough resources, and what is being executed and how.
We tried to work out how we could start using something like this, but so far the official documentation says that the cluster has to be deployed through Ambari, and we already have a cluster. We are not going to kill it, are we? So we will keep digging.
Presentation slides (careful, the link points to a .pptx file)
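Book Signings
Since the conference was organized by O'Reilly, they naturally had a large stand at the entrance to the exhibition hall. There you could buy serious books on big data at a discount, track down the author at the conference, and get the book signed.
A schedule of signing sessions was also posted at the stand, and during the long breaks between talks you could receive, as a gift straight from the author's hands, a signed early release of a forthcoming book.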

However, even though it was all free, the early-release books turned out to be fairly useless and, in some cases, frankly promotional.
A Fly in the Ointment
As at other conferences, the marketers at Strata did not hang back. They would happily go without food as long as they could lay their cunning nets across the whole venue.
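Even before the conference started, experienced people warned us against going to talks marked "Sponsored by X" in the schedule, since they are essentially nothing but advertising.
Unfortunately, there were also talks whose titles and descriptions used just the right buzzwords for the audience, yet in reality the speaker promoted a product and did not say a word about the technologies mentioned in the description.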
Every attendee wore a badge with a personal plastic card and an embedded RFID tag, and the cunning sponsors handed out stickers, pens, and other souvenirs at their stands only after scanning the badge of whoever approached. Scanning the badge shared all of that person's registration data with the sponsor.
At some stands they even managed to scan your badge before answering a question.
Because of these tricks, I had to spend a while unsubscribing from all the sponsors' mailing lists.
In Place of a Conclusion
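The Strata + Hadoop conference is held five times a year in different countries around the world. The next event, for example, takes place in early August in Beijing. If you have not yet decided whether to attend a conference in this series, it is quite hard for us to give a clear-cut recommendation.
On the one hand, if you are mainly interested in engineering talks and specific technologies, look at more specialized events, such as Spark Summit. Judging by the descriptions, there are many engineers there, and many of them want feedback and "feature requests" from the audience to guide future development.
On the other hand, if you have a fairly large BI team, the very wide range of talks means there will be plenty of interest for everyone, and you definitely will not be wasting your time. You also get a very professionally organized event, a positive atmosphere, and the chance to chat with the creators of all the popular products in data science, machine learning, and business analytics.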
Vadim Babaev, BI Software Engineer
Valery Starynin, BI Software Engineer