
As everyone knows, the amount of data in the world keeps growing, and collecting and processing the flow of information gets harder and harder. The popular answer is Hadoop, built around the idea of simplifying the development and debugging of multithreaded applications with the MapReduce paradigm. That paradigm does not always cope with its tasks, so after a while add-ons on top of Hadoop appeared, such as Apache Tez with its DAG paradigm. With the arrival of Tez, Hive (the SQL handler for HDFS) adapted to it as well. But new is not always better than old. In most cases HiveOnTez is considerably faster than HiveOnMapReduce, but a few pitfalls can badly hurt the performance of your solution. Here I want to describe the nuances I have run into. I hope this helps you speed up your ETL or another Hadoop use case.
MapReduce, Tez, and Hive
As I said, there is more and more data in the world, and all sorts of clever solutions keep appearing to store and process it, Hadoop among them. To make processing data stored in HDFS easy even for an average analyst, Hadoop has several SQL add-ons. The oldest and "simplest" of them is Hive. The essence of Hive is this: you have data in some sensible column-store format, you enter information about it into the metadata, you write standard SQL with a few restrictions, and Hive generates a chain of MapReduce jobs that solves your problem. Great, convenient, but slow. For example, here is a simple query:
select t1.column1, t2.column2 from table1 t1 inner join table2 t2 on t1.column1 = t2.column1 union select t3.column1, t4.column2 from table3 t3 inner join table4 t4 on t3.column1 = t4.column1 order by column1;
This query generates four jobs:
- table1 inner join table2;
- table3 inner join table4;
- union;
- sort.

The steps run sequentially, and each one ends by writing its data to HDFS. That looks far from optimal. For example, steps 1 and 2 could run in parallel. There are also cases where it makes sense to apply the same Mapper in several steps and then apply several types of Reducer to that Mapper's output. But the MapReduce concept does not allow this within a single job. To solve the problem, Apache Tez and its DAG concept appeared fairly quickly. The essence of a DAG is that instead of a Mapper-Reducer pair (plus epsilon) you build a directed acyclic graph: each vertex is a Mapper.Class or a Reducer.Class, and the edges show the data flow and execution order. Besides the DAG, Tez brings a few more bonuses: faster job startup (DAG jobs can be submitted through an already running Tez engine), the ability to keep resources in node memory between steps, and the ability to launch parallel work on its own. Once Tez appeared, a matching add-on was added to Hive. With that add-on, the query turns into a DAG job with roughly this structure:
- A Mapper reads table1.
- A Mapper reads table2 and joins it with the result of step 1.
- A Mapper reads table3 and filters on column1 IS NOT NULL.
- A Mapper reads table4 and filters on column1 IS NOT NULL.
- A Reducer joins the results of steps 3 and 4.
- A Reducer performs the union.
- A Reducer does the grouping and sorting.
- The result is collected.
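The step list above can be modelled as a small dependency graph to see which steps are free to run at the same time. This is only an illustrative sketch: the step names are hypothetical labels for the eight steps, and the edges are my reading of the list, not something extracted from a real Hive plan.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical labels for the eight steps; each step maps to the set of
# steps whose output it consumes.
dag = {
    "map_table1": set(),
    "map_table2_join": {"map_table1"},
    "map_table3": set(),
    "map_table4": set(),
    "reduce_join_3_4": {"map_table3", "map_table4"},
    "reduce_union": {"map_table2_join", "reduce_join_3_4"},
    "reduce_group_sort": {"reduce_union"},
    "collect": {"reduce_group_sort"},
}


def waves(graph):
    """Group steps into waves; every step in a wave has all inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for number, wave in enumerate(waves(dag), start=1):
    print(number, wave)
```

The first wave contains three independent mappers, which is exactly the parallelism a chain of sequential MapReduce jobs cannot express.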

In fact, steps 1 and 2 are the first join, and steps 3, 4 and 5 are the second (I deliberately picked tables of different sizes so that the joins are processed differently). The two blocks are independent of each other and can run in parallel. That alone is very cool. Tez really does speed up complex queries considerably. Sometimes, though, Tez turns out worse than MapReduce, so before sending a query to production it is worth running it with both
set hive.execution.engine=tez
and
set hive.execution.engine=mr
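One way to make that comparison routine is a tiny timing harness. This is a minimal sketch that assumes the hive CLI is on the PATH and that the query lives in a file; the function names are hypothetical, and the runner argument exists only so the logic can be exercised without a cluster.

```python
import subprocess
import time


def build_command(hql_file, engine):
    """Build a hive CLI call that pins the execution engine for one run."""
    if engine not in ("mr", "tez"):
        raise ValueError("engine must be 'mr' or 'tez'")
    return ["hive", "--hiveconf", "hive.execution.engine=" + engine, "-f", hql_file]


def time_query(hql_file, engine, runner=subprocess.run):
    """Run the query once and return the wall-clock time in seconds."""
    start = time.monotonic()
    runner(build_command(hql_file, engine), check=True)
    return time.monotonic() - start


def compare_engines(hql_file, runner=subprocess.run):
    """Time the same query on MapReduce and on Tez."""
    return {engine: time_query(hql_file, engine, runner) for engine in ("mr", "tez")}

# Example (needs a working Hive installation):
# print(compare_engines("query.hql"))
```

A single run per engine is noisy on a shared cluster, so treat the numbers as a hint rather than a benchmark.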
So what is Tez?
Here is what you should know about Tez. It changes the MapReduce logic into a DAG (directed acyclic graph), which makes it possible to run several different processes, Mappers or Reducers alike, at the same time within one DataFlow. The main requirement is that their input is ready. Between steps, data can be kept locally on the nodes, and sometimes simply in node RAM with no disk operations at all. Tez optimizes the number and placement of mappers and reducers to minimize data transfer over the network, even taking multi-step computations into account. It reuses containers that have already worked in neighbouring processes within a single Tez job, and it can adjust parallel execution based on statistics gathered in the previous step. On top of that, the engine lets the end user create DAG tasks as simply as MapReduce ones, while it takes care of resources, restarts and DAG management on the cluster itself. Tez is very mobile: adding Tez support does not break processes that are already running, and a new version can be tested locally, "on the client side", while the old version of Tez keeps serving all cluster tasks. Last but not least, Tez can run on the cluster as a service, in the background, so it can submit tasks far faster than a normal MapReduce launch allows. If you have not tried Tez and still have doubts, look at the speed comparison published in the HortonWorks presentation.

And paired with Hive?

For all the beauty of the graphs and descriptions, HiveOnTez has problems.
Tez is less tolerant of uneven data distribution than MapReduce
The first and biggest problem lies in the difference between how a DAG job and a MapReduce job are built. They share one principle: the number of mappers and reducers is calculated when the job starts. But when a query runs as a chain of MapReduce jobs, Hadoop calculates the required number of tasks from the results of the previous steps plus the analytics collected on the sources. For a DAG job, this is done before any step has been computed, from the analytics alone.
Let me explain with an example. Somewhere in the middle of a query, in a subquery, two tables are joined. According to the statistics, each has n rows and k unique values of the join key. About n * k rows are expected at the output. Suppose that this volume fits nicely into one container, so Tez allocates a single Reducer for the next step (a sort, say). That number of Reducers will not change during execution, no matter what. Now suppose the tables are in fact badly skewed: one key value has n - k + 1 rows, and every other value has one row. Then the output has (n - k + 1)^2 + (k - 1) = n^2 + k^2 - 2kn - k + 2n rows. That is (n + 2 - 2k) / k + (k - 1) / n times more than expected, or roughly n / k times more when n is much larger than k. A single Reducer on that volume will run forever. With MapReduce, Hadoop receives those n^2 + k^2 - 2kn - k + 2n rows at the output of the step, assesses its capacity objectively and provides the number of Mappers and Reducers it actually needs. As a result, everything runs much faster on MapReduce.
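The arithmetic is easy to check by counting join rows directly. This sketch is only a sanity check of the formulas above; the helper names and the values n = 1000, k = 10 are arbitrary choices for illustration.

```python
from collections import Counter


def join_rows(left_keys, right_keys):
    """Number of rows produced by an inner join on the given key columns."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(count * right[key] for key, count in left.items())


def skewed_keys(n, k):
    """n rows, k distinct keys: one hot key with n - k + 1 rows, the rest once."""
    return [0] * (n - k + 1) + list(range(1, k))


n, k = 1000, 10
keys = skewed_keys(n, k)
actual = join_rows(keys, keys)

# Closed form from the text and its ratio to the n * k estimate.
closed_form = n**2 + k**2 - 2 * k * n - k + 2 * n
ratio = (n + 2 - 2 * k) / k + (k - 1) / n

print(actual, closed_form, actual / (n * k), ratio)
```

With these numbers the reducer that was sized for about 10 000 rows receives close to a million.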
The dry calculations may look far-fetched, but in practice this situation is real. If it has not happened to you yet, consider yourself lucky. I have run into a similar Tez-DAG effect when using LATERAL VIEW in complex queries and with custom Mappers.
Tez tuning features
Ironically, the last important Tez trait I know of is the very power of its DAG. Most of the time a cluster is not just an information store. It is also a system where data gets processed, and it matters that other activity does not interfere with that part of the cluster. Nodes are a resource, so the number of containers is usually limited. When you start a job, it is therefore better not to clog up all the containers, so that regular processes are not slowed down too much. And this is where the DAG can play a dirty trick on you. On average a DAG needs fewer containers, thanks to reuse, smooth loading and so on. But when there are many quick steps, the containers start to multiply exponentially. The first mappers have not finished yet, but data is already being handed to other mappers, containers are allocated for all of it, and boom! The cluster is packed to the ceiling, nobody else can start a single job, there are not enough resources, and you watch how slowly the numbers on the progress bar change. MapReduce, being sequential, does not suffer from this effect, but as always you pay for that with speed.
How to deal with standard MapReduce taking up too many containers has long been known. Tune these parameters:
- mapreduce.input.fileinputformat.split.maxsize: decreasing it increases the number of mappers;
- mapreduce.input.fileinputformat.split.minsize: increasing it decreases the number of mappers;
- mapreduce.input.fileinputformat.split.minsize.per.node, mapreduce.input.fileinputformat.split.minsize.per.rack: finer settings for controlling local (per node or per rack) partitions;
- hive.exec.reducers.bytes.per.reducer: increasing it decreases the number of reducers;
- mapred.tasktracker.reduce.tasks.maximum: sets the maximum number of reducers;
- mapred.reduce.tasks: sets a specific number of reducers.
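To get a feel for how the split-size and bytes-per-reducer knobs above move the task counts, here is a rough model. It assumes the plain FileInputFormat rule (split size = max(minsize, min(maxsize, block size))) and a simple bytes-per-reducer estimate with an upper cap; max_reducers is my stand-in for that cap and is not a parameter from the list. Hive's combining input formats behave differently, so treat the results as an approximation only.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB


def split_size(block_size, min_size, max_size):
    """Split size under the plain FileInputFormat rule (assumption)."""
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_bytes, block_size, min_size, max_size):
    """Approximate mapper count for one large splittable file."""
    return math.ceil(file_bytes / split_size(block_size, min_size, max_size))


def estimate_reducers(input_bytes, bytes_per_reducer, max_reducers):
    """Approximate reducer count; max_reducers is a hypothetical cap."""
    return max(1, min(max_reducers, math.ceil(input_bytes / bytes_per_reducer)))


# A lower maxsize gives more mappers, a higher minsize gives fewer.
print(estimate_mappers(10 * GIB, 128 * MIB, 1, 64 * MIB))
print(estimate_mappers(10 * GIB, 128 * MIB, 256 * MIB, 1024 * GIB))
# A higher bytes-per-reducer value gives fewer reducers.
print(estimate_reducers(10 * GIB, 256 * MIB, 1009))
```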
Attention! In a DAG, every reduce step will get the number of processes you set in mapred.reduce.tasks! Tez parameters are trickier, though, and parameters set for MapReduce do not always affect it. First, Tez is very sensitive to
hive.tez.container.size
and the Internet advises picking a value between
yarn.scheduler.minimum-allocation-mb
and
yarn.scheduler.maximum-allocation-mb
Second, look at the parameters that control how long unused containers are held:
tez.am.container.idle.release-timeout-max.millis
tez.am.container.idle.release-timeout-min.millis
The
tez.am.container.reuse.enabled
option turns container reuse on or off. If it is off, the previous two parameters have no effect. And third, look at the grouping options:
tez.grouping.split-waves
tez.grouping.max-size
tez.grouping.min-size
The point is that, to parallelize the reading of external data, Tez changed the way tasks are formed. First it estimates how many waves (w) the cluster can run, then it multiplies that number by the
tez.grouping.split-waves
parameter and divides the standard splits among the resulting number of tasks (N). If the amount of data per task ends up between
tez.grouping.min-size
and
tez.grouping.max-size
all is well and the job starts with N tasks. If not, the number is adjusted to fit that range. The Tez documentation advises setting the
tez.grouping.split-count
parameter "only as an experiment": it cancels all of the logic above and groups the splits into the number of groups given in the parameter. I try not to use this property, because it takes away the flexibility that Tez, and Hadoop as a whole, need to optimize for the specific input data.
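The grouping logic described above can be sketched roughly as follows. This is a simplified model of my reading of the text, not Tez's actual implementation. The function name and the available_slots argument are hypothetical, and the default values (1.7 waves, 50 MiB, 1 GiB) are the defaults as I remember them, so check them against your Tez version.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB


def grouped_task_count(total_bytes, available_slots, split_waves=1.7,
                       min_size=50 * MIB, max_size=1 * GIB):
    """Estimate how many tasks the grouping produces (simplified model)."""
    # Desired number of tasks: slots that fit in one wave times split-waves.
    desired = max(1, round(available_slots * split_waves))
    bytes_per_task = total_bytes / desired
    if bytes_per_task < min_size:
        size = min_size   # too many tiny tasks: grow each one to min-size
    elif bytes_per_task > max_size:
        size = max_size   # tasks too big: cap each one at max-size
    else:
        return desired    # within the range: start with N tasks
    return max(1, math.ceil(total_bytes / size))


print(grouped_task_count(100 * GIB, 100))   # within range
print(grouped_task_count(1 * GIB, 100))     # clamped by min-size
print(grouped_task_count(1000 * GIB, 10))   # clamped by max-size
```

The clamping is what "the number is adjusted to fit the range" means here: a small input gets fewer tasks than slots times waves, and a huge input gets more.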
Tez nuances
Besides the big problems, Tez is not free of small flaws. For example, in the Hadoop ResourceManager web UI you cannot see how many containers a Tez job occupies, let alone the state of its mappers and reducers. To monitor the state of the cluster, I use this small Python script:
import os
import threading

result = []
lock = threading.Lock()


def getContainers(appel):
    # appel is [application id, fourth column of the app list, container count]
    attemptfile = os.popen("yarn applicationattempt -list " + appel[0])
    attemptlines = attemptfile.readlines()
    attemptfile.close()
    # The first two lines of the CLI output are headers
    for attempt in attemptlines[2:]:
        splt = attempt.split('\t')
        if len(splt) > 1 and splt[1].strip() == "RUNNING":
            containerfile = os.popen("yarn container -list " + splt[0])
            containerlines = containerfile.readlines()
            containerfile.close()
            # First line looks like "Total number of containers :N"
            appel[2] += int(containerlines[0].split("Total number of containers :")[1].strip())
    with lock:
        result.append(appel)


appfile = os.popen("yarn application -list -appStates RUNNING")
applines = appfile.read()
appfile.close()

# Everything before the first "application_" is header text
apps = applines.split('application_')
del apps[0]

appsparams = []
for app in apps:
    splt = app.split('\t')
    appsparams.append(['application_' + splt[0], splt[3], 0])

# One thread per application, because each yarn CLI call is slow
threads = []
for app in appsparams:
    threads.append(threading.Thread(target=getContainers, args=(app,)))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Print applications sorted by container count, then the grand total
result.sort(key=lambda x: x[2])
total = 0
for app in result:
    print(app[0].strip() + '\t' + app[1].strip() + '\t' + str(app[2]))
    total += app[2]
print("Total:", total)
Despite the assurances of HortonWorks, our practice shows that when you run a simple SELECT smth FROM table WHERE smth in Hive, MapReduce finishes faster most of the time. Also, at the beginning of the article I misled you a little: parallelization is possible in HiveOnMapReduce too, just not as smart. All you need is
set hive.exec.parallel=true
and a suitable value for
set hive.exec.parallel.thread.number=
and independent steps (mapper + reducer pairs) will run in parallel. Yes, the output of one Mapper cannot feed several Reducers or the next Mappers. Yes, the parallelization is far more primitive. But it still speeds things up.
Another interesting trait of Tez is that it starts its engine on the cluster and keeps it running for a while. On the one hand, this really does speed up work, because tasks start on the nodes much faster. On the other hand, there is an unexpected downside: important processes cannot be launched in this mode, because over time the Tez engine spawns a large number of classes and crashes with a GC overflow. It goes like this: you start
nohup hive -f ....hql > hive.log &
for the night, come back in the morning, and find that it has fallen over. Unpleasant.
One more for the piggy bank of small problems: good old MapReduce has long been in stable releases, while Tez, for all its popularity and progressiveness, is still at version 0.8.4, and you can hit a bug at any stage. For me the worst kind of bug would be data loss, and I have not seen that. But we have seen Tez compute a wrong result where MapReduce computes the right one. For example, a colleague of mine had two tables, table1 and table2, each with a unique EntityId field. He ran this query through Tez:
select table1.EntityId, count(1) from table1 left join table2 on table1.EntityId = table2.EntityId group by EntityId having count(1) > 1
and got several rows in the output! MapReduce returns the expected empty result. There is a bug report about a similar problem.
Conclusion
Tez is an unconditional good: in most cases it makes life easier and lets you write more complex queries in Hive and expect fast answers to them. But like any good thing, it requires a careful approach, caution and knowledge of a few nuances. As a result, using old, proven, reliable MapReduce is sometimes better than using Tez. I was very surprised that I could not find a single article, either on RuNet or in English, about the downsides of HiveOnTez, so I decided to fill that gap. I hope this information is useful to someone. Thank you! Good luck to everyone, and bye!