䞊åã³ã³ãã¥ãŒãã£ã³ã°ããŒã«ã䜿çšããå Žåãã¢ã«ãŽãªãºã ã«æ¬¡ã®ãããª2ã€ã®é 次ã¹ããããå«ãŸããå Žåãç¶æ³ãçºçããå¯èœæ§ãéåžžã«é«ããªããŸã
ãi ïŒå
j- thã¹ããªãŒã ã¯
j- thã¡ã¢ãªã»ã«ã«äžéèšç®çµæãæ ŒçŽãã
ii ïŒãã®ã¹ããªãŒã ã¯1ã€ã®çµæ以äžã®ãé£æ¥ãã¹ã¬ããã æããã«ãããã°ã©ã ã³ãŒãã«ç¹å®ã®æéããªã¢ãç·šæããå¿
èŠããã
ãŸããåã¹ã¬ãã 㯠ãäžéçµæã察å¿ããã¡ã¢ãªã»ã«ã«ä¿åããåŸã«
åã¹ã¬ãããå
æã
ãŸã ïŒã¹ãããïŒ
i ïŒïŒã ããããªããšãäžéšã®ã¹ã¬ãããã¹ãããïŒ
ii ïŒã«é²ã¿ãä»ã®ã¹ã¬ããããŸã ã¹ãããïŒ
i ïŒãå®äºããŠããªãå ŽåããããŸãã æ®å¿µã§ãããCUDAã®äœæè
ã¯ã1ã€ã®GPUã§ä»»æã®æ°ã®ã¹ã¬ãããåæããããã®ãã®ãããª
ç¹å¥ãªçµã¿èŸŒã¿ã¡ã«ããºã ã¯å¿
èŠãªããšæããŸããã ããã§ã¯ããã®æšåã«ã©ã®ããã«å¯ŸåŠã§ããŸããïŒ Googleã¯ããã³ããã§å€æããŠãã®åé¡ã«ç²ŸéããŠããŸããããã®ã¿ã¹ã¯ã®ããã®æ¢è£œã®æºè¶³ã§ããã¬ã·ããèŠã€ããããšã¯ã§ããŸããã§ããããããŠåå¿è
ïŒç§ã¯ïŒã«ãšã£ãŠæãŸããçµæãéæããæ¹æ³ã«ããã€ãã®èœãšãç©ŽããããŸãã
CUDAã¢ãŒããã¯ãã£ã«é¢ããããã€ãã®èšè
ãŸããå
¬åŒããã¥ã¡ã³ã
[1,2]ããã³ã¹ã©ã€ã
[3,4] ãããŸããŸãªãµãŒãããŒãã£ãµã€ãã®è³æ
[5-11]ã«åºã¥ããŠãããã°ã©ããCUDAã䜿çšãããšãã«éé
ããäžè¬çãª
ç¶æ³ãæãåºãããŠãã ããã æé«ã¬ãã«ã®æœè±¡åã§ã¯ãSIMTïŒ
ã·ã³ã°ã«åœä»€ããã«ãã¹ã¬ãã ïŒã¢ãŒããã¯ãã£ãåãã䞊åã³ã³ãã¥ãŒãã£ã³ã°ã·ã¹ãã ãååŸããŸãã1ã€ã®ã³ãã³ããå€ããå°ãªããç¬ç«ãã
ã¹ã¬ãããšäžŠè¡ããŠå®è¡ãããŸãã åäžã®ã¿ã¹ã¯ã®äžéšãšããŠå®è¡ãããããããã¹ãŠã®ã¹ã¬ããã®å
šäœïŒå³1ãåç
§ïŒã¯ã
gridãšåŒã°ããŸãã

ã°ãªããã®äžŠåå®è¡ã¯ããŸãæåã«ãå®éã«ãããŒãå®è¡ããå€æ°ã®åäžã®
ã¹ã«ã©ãŒããã»ããµããããªã«ãŒãäžã«ååšããããšã«ãã£ãŠä¿èšŒãããŸãïŒå³3ãåç
§ïŒã ç©ççã«ïŒå³2ãåç
§ïŒãã¹ã«ã©ãŒããã»ããµã¯
ã¹ããªãŒãã³ã°ãã«ãããã»ããµ ïŒ
SM ïŒã®äžéšã§ãã

ããšãã°ãTeslaã«ã¯30åã®SMããããåSMã«ã¯8åã®ã¹ã«ã©ãŒããã»ããµããããŸãã ãã ãããããã®240ã®ã³ã¢ã§ã¯ãå©çšå¯èœãªãªãœãŒã¹ïŒãããã®ã³ã¢ã®äœæ¥æéãšå©çšå¯èœãªã¡ã¢ãªã®äž¡æ¹ïŒãå
±æããããŒããŠã§ã¢ã¡ã«ããºã ã®ãããã§ãéåžžã«å€ãã®ã¹ã¬ãã
ïŒ1ïŒããã°ãªãããå®è¡ã§ããŸãã ãããŠããããã®ã¡ã«ããºã ã ãã®å®è£
ã®ããã€ãã®æ©èœã¯ãå
±æã¡ã¢ãªã«ã¢ã¯ã»ã¹ãããšãã«ã¹ã¬ãããåæããããã®æè¡ã決å®ããŸãã
ãã®ãããªéèŠãªæ©èœã®1ã€ã¯ã
ã¯ãŒãã®32åã®ãããŒã®ã°ã«ãŒãåã§ããããã¯ã倧ããªãã©ãŒã¡ãŒã·ã§ã³ã®äžéšã§ããããšãå€æããŸããã åãããã¯ã®ãã¹ãŠã®ã¹ã¬ããïŒããšãã°ãTeslaãããã¯ã®å Žåãæ倧512ã¹ã¬ãã
ïŒ1ïŒãå«ããããšãã§ããŸãïŒã¯ãå³å¯ã«1ã€ã®SMã§å®è¡ãããããããªãœãŒã¹ã«ã®ã¿ã¢ã¯ã»ã¹ã§ããŸãã ãã ãã1ã€ã®SMã§è€æ°ã®ãããã¯ãèµ·åããããšãã§ãïŒå³3ãåç
§ïŒããªãœãŒã¹ã¯ãããã®éã§åçã«åå²ãããŸãã

åSMã«ã¯ãããã»ããµæéãªãœãŒã¹ãé
åžããå¶åŸ¡ãŠãããããããŸãã ããã¯ã1ã€ã®SMã®ãã¹ãŠã®ã«ãŒãã«ãåžžã«1ã€ã®ã¯ãŒããå³å¯ã«å®è¡ããããã«è¡ãããŸãã ãããŠãå®äºãããšããã®SMã«å²ãåœãŠããã次ã®ã¯ãŒãããcãªæé©ãªæ¹æ³ã§éžæãããŸãã ãããã£ãŠã1ã€ã®ã¯ãŒãã®ãããŒã¯CUDAã®ããŒããŠã§ã¢æ©èœã«ããåæãããSIMDïŒ
åäžåœä»€ãè€æ°ããŒã¿ ïŒã«ããã«è¿ãæ¹æ³ã«åŸã£ãŠå®è¡ãããããšãããããŸãã ããããç°ãªãã¯ãŒãããã®1ã€ã®ãããã¯ã®ãããŒããèããåæããŠããªãå ŽåããããŸãã
ãã1ã€ã®éèŠãªæ©èœã¯ãCUDAã®ã¡ã¢ãªã®æ§æãšãããŸããŸãªéšåãžã®ã¹ã¬ããã®ã¢ã¯ã»ã¹ã§ãã ã¹ããªãŒã ã®äžè¬çãªæé«ã®ã¢ã¯ã»ã¹å¯èœæ§ã¯ãã°ã©ãã£ãã¯ã«ãŒãã«å¯å°ãããéç©åè·¯ã®åœ¢ã§ç©ççã«å®è£
ããã
ã°ããŒãã«ã¡ã¢ãª ïŒç©ç
ã¡ã¢ãª ïŒã«ãã£ãŠæäŸãããŸãã ããã»ããµã®å€éšã«ããããããã®ã¿ã€ãã®ã¡ã¢ãªã¯ããããªã«ãŒãã§ã®èšç®çšã«æäŸãããŠããä»ã®ã¡ã¢ãªã«æ¯ã¹ãŠæãé
ããªããŸãã ããå°ããªãã¢ã¯ã»ã¹å¯èœæ§ãã¯
å
±æã¡ã¢ãªã§ããéåžžããµã€ãºã16KB
ïŒ1ïŒã®åSMã«ãããããã¯ïŒå³2ãåç
§
ïŒã¯ããã®SMã®ã³ã¢ã§å®è¡ãããã¹ã¬ããã®ã¿ã«ã¢ã¯ã»ã¹ã§ããŸãïŒå³ãåç
§ããŠãã ããïŒ 1ãå³3ïŒã 1ã€ã®SMã§ã®äžŠåå®è¡ã«è€æ°ã®ãããã¯ãå²ãåœãŠãããšãã§ãããããSMã§äœ¿çšå¯èœãªå
±æã¡ã¢ãªã®å
šéããããã®ãããã¯ã«åçã«é
åãããŸãã å
±æã¡ã¢ãªã¯SMã³ã¢ã«éåžžã«è¿ãå Žæã«ç©ççã«é
眮ãããŠãããããã¡ã¢ãªã®äž»ãªçš®é¡ã§ãã
ã¬ãžã¹ã¿ã®é床ã«å¹æµããé«ãã¢ã¯ã»ã¹é床ãæã£ãŠããããšã«æ³šæããŠãã ããã åºæ¬çãªæ©æ¢°èªåœä»€ã®ãªãã©ã³ããšããŠäœ¿çšã§ããã¬ãžã¹ã¿ã§ãããæéã®ã¡ã¢ãªã§ãã 1ã€ã®SMã®ãã¹ãŠã®ãã£ãã·ã¥ã¬ãžã¹ã¿ã¯ããã®SMã§å®è¡ãããŠãããã¹ãŠã®ã¹ã¬ããã«åçã«åå²ãããŸãã ã¹ã¬ããã䜿çšããããã«å²ãåœãŠãããã¬ãžã¹ã¿ã®ã°ã«ãŒãã¯ã圌ã ãã䜿çšã§ããŸãã CUDAïŒãŸãã¯éã«çœå®³ã®èŠæš¡ïŒã®åã®å®äŸãšããŠïŒåããã¹ã©ã§ã¯ãåSMã¯16384åã®32ãããæ±çšã¬ãžã¹ã¿ãŒã®äœ¿çšãæäŸããŸã
ïŒ1ïŒ ã
äžèšã®ãã¹ãŠããã1ã€ã®ãããã¯ã®ãããŒéã®çžäºäœçšã¯å
±éã®é«éå
±æã¡ã¢ãªãä»ããŠã2ã€ã®ç°ãªããããã¯ã®ãããŒéã§ã¯ã°ããŒãã«ã¡ã¢ãªã®ã¿ã䜿çšããŠè©Šè¡ããããšçµè«ä»ããããšãã§ããŸãã ããã¯ãå°å
¥éšã§æŠèª¬ããåé¡ãçºçããå Žæã§ããã¡ã¢ãªé åã®èªã¿åããšæžã蟌ã¿ã«å
¬éãããŠããããŸããŸãªã¹ããªãŒã ã®ããŒã¿ã®é¢é£æ§ã远跡ããŸãã èšãæããã°ãã¹ã¬ããåæã®åé¡ã ãã§ã«è¿°ã¹ãããã«ã1ã€ã®ãããã¯å
ã§ãåã¯ãŒãã®ãããŒã¯äºãã«åæããŠããŸãã ã¯ãŒãã¡ã³ããŒã·ããã«é¢ä¿ãªããããã¯ãããŒãåæããã«ã¯ãããã€ãã®ããªã¢ã¿ã€ãã®ã³ãã³ãããããŸãã
- __syncthreadsïŒïŒãæã確å®ãªæ¹æ³ã§ãã ãã®é¢æ°ã¯ãïŒaïŒ ãã®ãããã¯ã®ä»ã®ãã¹ãŠã®ã¹ã¬ããããã®ãã€ã³ãã«å°éããïŒbïŒ ãã®ãããã¯ã®ã¹ã¬ããã«ãã£ãŠå®è¡ãããå
±æã¡ã¢ãªããã³ã°ããŒãã«ã¡ã¢ãªã«ã¢ã¯ã»ã¹ãããã¹ãŠã®æäœãå®äºãããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ãåã¹ã¬ãããåŸ
æ©ãããŸã ã æ¡ä»¶ä»ãifã¹ããŒãã¡ã³ãå
ã«ãã®ã³ãã³ããé
眮ããå¿
èŠã¯ãããŸãããããããã¯ã®ãã¹ãŠã®ã¹ã¬ããã«ãããã®é¢æ°ãžã®ç¡æ¡ä»¶åŒã³åºããæäŸããå¿
èŠããããŸãã
- __threadfence_blockïŒïŒã¯ãå
±æã¡ã¢ãªããã³ã°ããŒãã«ã¡ã¢ãªãžã®ãã¹ãŠã®ã³ããããããã¢ã¯ã»ã¹æäœãå®äºãããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ã åŒã³åºããã¹ã¬ãããåŸ
æ©ãããŸã ã
- __threadfenceïŒïŒã¯ãå
±æã¡ã¢ãªãžã®ãã¹ãŠã®ã³ããããããã¢ã¯ã»ã¹æäœããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ã¹ã¬ãããåŸ
æ©ããããããã€ã¹ãäžã®ãã¹ãŠã®ã¹ã¬ããã«å¯ŸããŠã°ããŒãã«ã¡ã¢ãªã䜿çšããŸãã ãããã€ã¹ããšã¯ãã°ã©ãã£ãã¯ã¢ããã¿ãŒãæå³ããŸãã
- __threadfence_systemïŒïŒã¯__threadfenceïŒïŒã«äŒŒãŠããŸãããéåžžã«äŸ¿å©ãªããŒãžããã¯ã¡ã¢ãªã䜿çšããå ŽåãCPUïŒããã¹ããïŒäžã®ã¹ã¬ãããšã®åæãå¯èœã«ããŸãã 詳现ã«ã€ããŠã¯ã [1,2]ããã³ä»¥äžã®ãªã¹ãã«ãªã¹ããããŠããä»ã®ãœãŒã¹ãåç
§ããŠãã ããã
æåã®ããŒã ã¯ã1ã€ã®ãããã¯ã®ãã¹ãŠã®ã¹ã¬ããã«1ã€ã®ããªã¢ãé
眮ããä»ã®3ã€ã¯ãåã¹ã¬ããã«ç¬èªã®ããªã¢ãé
眮ããŸãã ã°ãªããå
šäœã®ãããŒãåæããã«ã¯ãä»ã®äœããèãåºãå¿
èŠããããŸãã ãã®ããŸã ããæ€èšããåã«ãæå³ã®ããCã³ãŒãã®äŸã瀺ãããšãã§ããããã«ãã¿ã¹ã¯ãæå®ããŸãã
ããå€ãã®ã¿ã¹ã¯
ãããã£ãŠãããå
·äœçã«ã¯ã次ã®äŸãæ€èšããŠãã ããã ã¢ããã¿ã®ã°ããŒãã«ã¡ã¢ãªã«2ã€ã®ã»ã¯ã·ã§ã³ãå²ãåœãŠãŸãïŒ
X []ããã³
P []é
åã®128èŠçŽ ã é
å
X []ããã¹ãããïŒã³ã³ãã¥ãŒã¿ãŒã®RAMã®äžå€®åŠçè£
眮ã«ãã£ãŠïŒæžã蟌ãŸããããã«ããŸãã ããããã«64ã¹ã¬ããã®2ãããã¯ã®ã°ãªãããã€ãŸãåèš128ã¹ã¬ãããäœæããŸãïŒå³4ãåç
§ïŒã

ããã§ãã¹ãããïŒ
i ïŒãå®è¡ã§ããŸããçªå·
jã®åã¹ããªãŒã ã¯ãé
å
X []ã®ãã¹ãŠã®èŠçŽ ãå ç®ããçµæã
P [j]ã«æžã蟌ã¿ãŸãã 次ã«ãã¹ãããïŒ
ii ïŒãå®è¡ããå¿
èŠããããŸããå
jçªç®ã®ã¹ããªãŒã ã¯ãé
å
P []ã®ãã¹ãŠã®èŠçŽ ã®åèšãéå§ãã察å¿ãã
X [j]ã«æžã蟌ã¿ãŸãã ãã¡ãããCUDAã䜿çšããŠåãããšã128å䞊è¡ããŠå®è¡ããããšã¯ç¡æå³ã§ãããå®éã«ã¯åã¹ããªãŒã ã«ã¯å ç®ãçºçããç¬èªã®éã¿ä¿æ°ã®ã»ããããããå€æ
X- >
P ãããã³ãã®éã
P- >
X-ã¯äœåºŠãçºçããŸã ãã®äŸã§ã¯ãæå¿«ããšåçŽãã®ããã«ããŠããã£ã«çããä¿æ°ãéžæããŸããããã¯äžè¬æ§ã«éåããŸããã
çè«ããå®éšã«ç§»ããŸãã ã¢ã«ãŽãªãºã ã¯éåžžã«ééçã§ããããã«ãã¹ã¬ãããæ±ã£ãããšããªã人ã¯ããã«æ¬¡ã®CUDAã«ãŒãã«ã³ãŒããææ¡ã§ããŸãã
__global__ void Kernel(float *X, float *P) { const int N = 128;
ãã®ã«ãŒãã«ãç¹°ãè¿ãå®è¡ãããšãé
å
P []ãæã
åãã«ãªãããšãããããŸãããããã§ã¯
X []ãç°ãªãå ŽåããããŸãã ããã«ãéããããå Žåãããã¯1ã€ã®èŠçŽ
X [j]ã§ã¯ãªãã32ã®é£ç¶ããèŠçŽ ã®ã°ã«ãŒãã«ãªããŸãïŒ ãã®å Žåããšã©ãŒã®ãããããã¯ã®æåã®èŠçŽ ã®ã€ã³ããã¯ã¹ã32ã®åæ°ã«ãªããŸããããã¯ãéåžžã«ã¯ãŒãã®åæãšãããŸããŸãªã¯ãŒãovã®éåæã¹ããªãŒã ã®äžéšã®åæã®çŸãã§ãã äœããã®ã¹ã¬ããã§ãšã©ãŒãçºçããå Žåãããã¯åœŒã®æ®ãã®ã¯ãŒããã¹ãŠã«çºçããŸãã CUDAéçºè
ãææ¡ããåæã¡ã«ããºã ãé©çšããå Žå
__global__ void Kernel(float *X, float *P) { ... __syncthreads(); ... }
ãã®åŸãåãããã¯ã¹ããªãŒã ãåãçµæã«ãªãããã«ããŸãã ãããŠã©ãããééã£ãŠããå Žå-ãããã¯å
šäœã ãããã£ãŠãç°ãªããããã¯ãåæããããšã¯ã©ããããããæ®ããŸãã
解決æ¹æ³
æ®å¿µãªãããç§ã¯2ã€ã®æ¹æ³ããç¥ããŸããã
- CUDAã«ãŒãã«ã¯ããã¹ãŠã®ã¹ã¬ãããçµäºããå Žåã«ã®ã¿çµäºããŸãã ãããã£ãŠã1ã€ã®ã³ã¢ã2ã€ã«åå²ããã¡ã€ã³ããã°ã©ã ããé 次åŒã³åºãããšãã§ããŸãã
- ã°ããŒãã«ã¡ã¢ãªã«ãã©ã°ã®ã·ã¹ãã ãèãåºããŸãã
ç§ã®ã¿ã¹ã¯ã§ã¯ããã®ãããªã«ãŒãã«ãé »ç¹ã«ïŒæ°ååïŒåŒã³åºãå¿
èŠããããããæåã®ãªãã·ã§ã³ã¯ããŸã奜ãã§ã¯ãããŸããã§ããããŸããã«ãŒãã«ã®éå§æã«è¿œå ã®é
延ãçºçããããšãæããçç±ããããŸãã åã³ã¢ã®éå§æã«ããã€ãã®å€æ°ãæºåããå¿
èŠãããå Žåã«ã®ã¿ãã«ãŒãã«é¢æ°ã®åŒæ°ãåŠçããŸããã倧ããªãã«ãŒãã«ã§ããã1åè¡ããšãCPUã«å¹²æžãããã°ã©ãã£ãã¯ã¢ããã¿ãŒãç¬èªã®ããŒã¿ãããžã¥ãŒã¹ã沞ããã®ã§ãããè«ççãã€é«éã«ãªããŸãã¡ã¢ãªã
ãã©ã°ã·ã¹ãã ã®2çªç®ã®ãªãã·ã§ã³ã«ã€ããŠã¯ãåæ§ã®ã¡ã«ããºã ã
[1]ã®ã»ã¯ã·ã§ã³ãB.5ã¡ã¢ãªãã§ã³ã¹é¢æ°ãã«èšèŒãããŠããŸãã ãã ããããã§ã¯ãããã«ç°ãªãã«ãŒãã«ã¢ã«ãŽãªãºã ãèæ
®ãããŸãã ãããã¯åæãå®è£
ããããã«ã2ã€ã®æ©èœãå°å
¥ããŸãã1ã€ç®ã¯äœ¿çšæžã¿ãããã¯ã®ã«ãŠã³ã¿ãŒã®å€ãæºåãã2ã€ç®ã¯ããªã¢ã®åœ¹å²ãæãããŸãããã¹ãŠã®ãããã¯ãå®äºãããŸã§åã¹ã¬ãããé
延ãããŸãã ããšãã°ããããã®é¢æ°ãšãããã䜿çšããã«ãŒãã«ã¯æ¬¡ã®ããã«ãªããŸãã
__device__ unsigned int count;
ããã ãã§ãã ãã©ã°ãå·»ãäžããããé¢æ°ãäœæãããŸããã 1çªç®ãš2çªç®ã®æ¹æ³ã®ããã©ãŒãã³ã¹ãæ¯èŒããããšã¯æ®ã£ãŠããŸãã ãã ããæ®å¿µãªããã
SyncWholeDevice ïŒïŒé¢æ°ã¯ã«ãŠã³ã¿ãŒãã€ã³ã¯ãªã¡ã³ãããŸãããããªã¢é
延ãæäŸããŸããã ã©ãããŠã§ããïŒ çµå±ã
whileã«ãŒãã
ãããŸã ã ããã§ãèŠçŽã«èšèŒãããŠããèœãšãç©Žã«ç®ãåããŸã
ãnvccã³ã³ãã€ã©ãŒ
[12-14]ã«ãã£ãŠçæãããptxãã¡ã€ã«ãèŠããšã圌ã¯èŠç¹ãã空ã®ã«ãŒãã芪åã«æããŠããããšãããããŸãã å°ãªããšã2ã€ã®æ¹æ³ã§ããã®æ¹æ³ã§ã«ãŒããæé©åããªãããã«ã³ã³ãã€ã©ãŒã«åŒ·å¶ã§ããŸãã
ptxã¢ã»ã³ãã©ãŒãžã®æ瀺çãªæ¿å
¥ã¯ç¢ºå®ã«æ©èœããŸãã ããšãã°ããã®ãããªé¢æ°ã¯ããã®åŒã³åºãã§
whileã«ãŒãã眮ãæããå¿
èŠã
ãããŸã ã
__device__ void do_while_count_not_eq(int val) { asm("{\n\t" "$my_while_label: \n\t" " .reg .u32 r_count; \n\t" " .reg .pred p; \n\t" " ld.global.u32 r_count, [count]; \n\t" " setp.ne.u32 p, r_count, %0; \n\t" "@p bra $my_while_label; \n\t" "}\n\t" : : "r"(val)); }
æ§æçã«ãšã¬ã¬ã³ããªãã1ã€ã®æ¹æ³ã¯ãã«ãŠã³ã¿ãŒãã©ã°ã宣èšãããšãã«
volatileæå®åã䜿çšããããšã§ãã ããã¯ãã°ããŒãã«ïŒãŸãã¯å
±æïŒã¡ã¢ãªå
ã®å€æ°ããã€ã§ãã©ã®ã¹ã¬ããã§ãå€æŽã§ããããšãã³ã³ãã€ã©ãŒã«äŒããŸãã ãããã£ãŠããã®å€æ°ã«ã¢ã¯ã»ã¹ãããšãã¯ããã¹ãŠã®æé©åããªãã«ããå¿
èŠããããŸãã ã³ãŒãå
ã§å€æŽããå¿
èŠãããã®ã¯2è¡ã®ã¿ã§ãã
__device__ volatile unsigned int count;
解æ³ã®è©äŸ¡
ããã§ã2ã€ã®ãããã¯åææ¹æ³ã®ããã©ãŒãã³ã¹ã®å€§ãŸããªçè«çæšå®ãå®è¡ããŸãã åã«ãããšãã«ãŒãã«åŒã³åºãã«ã¯10ãã€ã¯ãç§çšåºŠããããŸããããã¯ãè€æ°ã®ã³ã¢åŒã³åºãã«ããåæã®ä»£åã§ãã ã«ãŒãããããªã¢ãå°å
¥ããŠåæãè¡ãå Žåãæ倧10åã®ã¹ã¬ããïŒãããã¯æ°ã«å¿ããŠïŒãã«ãŒãå
ã®ã°ããŒãã«ã¡ã¢ãªå
ã®1ã€ã®ã»ã«ãã€ã³ã¯ãªã¡ã³ãããŠèªã¿åããŸããåå
¥åºåæäœã«ã¯çŽ500ã¯ããã¯ãµã€ã¯ã«ããããŸãã åãããã¯ã«ãã®ãããªæäœ3ãå®è¡ãããŸãããã®åŸãåææäœã«çŽ10 * 500 * 3 = 1.5 * 10 ^ 4ãµã€ã¯ã«ãè²»ããããŸãã 1.5 GHzã®ã³ã¢åšæ³¢æ°ã§ã¯ã1.0 * 10 ^ïŒ-5ïŒç§= 10ÎŒsã«ãªããŸãã ã€ãŸãã倧ããã®é åºã¯åãã§ãã
ãããããã¡ãããå°ãªããšãããã€ãã®ãã¹ãã®çµæãèŠãããšã¯èå³æ·±ãã§ãã å³5ã§ã¯ããã¹ããªãŒããŒã¯ãã°ãªããæ§æããšã«10åç¹°ãè¿ããã
X- >
P- >
Xã® 100åã®é£ç¶ããå€æã«è²»ããããæéã®æ¯èŒãèŠãããšãã§ããŸãã 100åã®å€æã«å¿
èŠãªæéãå¹³åããããã«ã10åã®ç¹°ãè¿ããè¡ãããŸã
ïŒ2ïŒ ã

æ°Žå¹³é¢ã«ã¯ãããªã¬ãŒããããããã¯ã®æ°ãšããããã®ã¹ã¬ããã®æ°ããããããããŸãã 瞊軞ã¯ã
ããã«ãã«ãŒãã«èµ·å
ãã¡ãœããïŒ
MKL ïŒã«å¯Ÿãã
ã1ã«ãŒãã«èµ·å
ãã¡ãœããïŒ
SKL-ã·ã³ã°ã«ã«ãŒãã«èµ·åïŒã®æé
ã²ã€ã³ã®å²åãè¡šããŸãã æ€èšäžã®ã°ãªããæ§æã®ã²ã€ã³ã¯ãéåžžã«å°ãããã®ã®ãã»ãŒåžžã«ãã©ã¹ã§ããããšãæ確ã«ããããŸãã ãã ãããããã¯æ°ãå€ãã»ã©ãããã©ãŒãã³ã¹ãé
ããMKLã¡ãœããã¯å°ãªããªããŸãã 32ãããã¯ã®å Žåã圌ã¯SKLã¡ãœããããããããã«åªããŠããŸãã ããã¯ããããã¯ãå€ãã»ã©ãã¹ã¬ããïŒ
threadIdx.x == 0ãæã€ïŒãé
ãã°ããŒãã«ã¡ã¢ãªãã
ã«ãŠã³ãå€æ°ãèªã¿åãããã§ãã ãããããäžåºŠèªãã§ããã¹ãŠã®ãããŒã«æå³ãäžããããšããã¡ã«ããºã ã¯ãããŸããã ãããã¯å
ã®ã¹ã¬ããã®æ°ã«å¿ããçžå¯Ÿçãªçç£æ§ã®å€åãèæ
®ããå Žåããããã¯èªäœã®æ°ã¯äžå®ã§ãããããäžå®ã®èŠåæ§ã«ãæ°ä»ãããšãã§ããŸãã ããããããã§ã¯ããããã¯å
ã®ãããŒã®åæãSMã§ã®ã¯ãŒãã®ç®¡çã«é¢é£ãããèè
ã®äœæ¥ã«ã¯äžæãªå¹æããããŸãã ãããã£ãŠããã以äžã®ã³ã¡ã³ãã¯æ§ããŸãã
åãæ°ã®äœæ¥ã¹ã¬ããïŒ1024ïŒã§ããããã¯ãžã®åå²ãç°ãªãããã©ãŒãã³ã¹ãèŠãã®ã¯èå³æ·±ãã§ãã å³6ã¯ã2ã€ã®ã¡ãœããïŒMKLãšSKLïŒã®äžèšã®å€æã®100 * 10ã«è²»ããããç¹°ã蟌ã¿æéã®ã°ã©ãã瀺ããŠããŸãã

å®éãããã¯å³5ã®æãã®ãã¹ã©ã€ã¹ãã§ãã æåã¯ããããã¯ã倧ãããªããšãäž¡æ¹ã®åææ¹æ³ã®ããã©ãŒãã³ã¹ãçããåäžããããšãã¯ã£ãããšããããŸãã CUDAã®éçºè
ã¯å
¬åŒããã¥ã¡ã³ãã§ãã®ãããªå¹æãèŠåããŠããŸã
[2]ããèè
ã¯åã³æ®å¿µãªãããã®çŸè±¡ã®ã¡ã«ããºã ã®è©³çŽ°ãç¥ããŸããã ãã§ã«è¿°ã¹ãããã«ãå€æ°
countã®èªã¿åãåæ°ã®å¢å ã«äŒŽããã®ã£ããã®çž®å°ãšããããã¯ãžã®æå°ã®åå²ã«ããSKLã¡ãœããã®æ倱ããé¢é£ããŠã
ãŸã ã
whileã«ãŒãã ptx-assembler insertã«çœ®ãæããããšã«ãããSKLã¡ãœããã®å®è£
äžã«ãã¹ããå®è¡ãããããšã«æ³šæããŠãã ããã
æ®çºæ§æå®åã䜿çšãããšïŒã°ãªããæ§æã«å¿ããŠïŒãããã»ã¹ã®é床ãäœäžããå Žåããããé床ãåäžããå ŽåããããŸãã æžé床ã®å€§ããã¯0.20ïŒ
ã«éããå é床ã¯0.15ïŒ
ã§ãã ã»ãšãã©ã®å Žåããã®åäœã¯
ãã³ã³ãã€ã©ãŒ
ã«ããwhileã«ãŒãã®å®è£
ã®ç¹åŸŽãšã人ã«ããptx-assemblerã®æ¿å
¥ã«ãã£ãŠæ±ºå®ãããSKLã¡ãœããã®äž¡æ¹ã®å®è£
ãçããçç£çã§ãããšèããããšãã§ããŸãã
ãããã«
ãã®èšäºã§ã¯ãã¹ã¬ããåæã®åé¡ããããã¯åæã®æ¹æ³ãåºæ¬ã¬ãã«ã§æããã«ããããšããŸããã ããã€ãã®ãã¹ãã®åŸãCUDAã·ã¹ãã ã®äžè¬çãªèª¬æãåçã«äžããŸãã ããã«ããã¹ãããã°ã©ã ã®ãœãŒã¹ã³ãŒã
ïŒ2ïŒã§ãèªè
ã¯å
±æã¡ã¢ãªå
ã®ãããã¡ã®ä¿¡é Œã§ãã䜿çšã®å¥ã®äŸãèŠã€ããããšãã§ããŸãïŒã¹ã¬ããã¯
__syncthreadsïŒïŒãä»ããŠåæãããŸãïŒã 誰ããããã圹ç«ã€ããšãé¡ã£ãŠããŸãã å人çã«ã¯ããã®æ
å ±ã1ãæã§åéããããšã§ãã³ãŒããäœåºŠãè©ŠããŠãã°ãŒã°ã«ã§ãæ€çŽ¢ããæéãç¯çŽã§ããŸããããã¯ãããã¥ã¡ã³ããããŸã泚ææ·±ãèªãŸãªãæããªåŸåãããããã§ãã
ïŒ1ïŒ CUDA APIé¢æ°cudaGetDeviceProperties ïŒ...ïŒ [ 1-2ã15 ]ã䜿çšããŠãã³ã³ãã¥ãŒã¿ãŒã§äœ¿çšå¯èœãªã¢ããã¿ãŒã«é¢ããæè¡æ
å ±ãååŸããããšããå§ãããŸããïŒ2ïŒ pastebin.comã«ã¢ããããŒãããããã¹ãããã°ã©ã ã®ãœãŒã¹ã³ãŒã ãæ
å ±æºã®ãªã¹ã
[1]
CUDA Cããã°ã©ãã³ã°ã¬ã€ã[2]
CUDA Cãã¹ããã©ã¯ãã£ã¹ã¬ã€ã[3]é«åºŠãªCUDAãŠã§ãããŒïŒ
ã¡ã¢ãªã®æé©å[4] S. Tariqã
GPUã³ã³ãã¥ãŒãã£ã³ã°ãšCUDAã¢ãŒããã¯ãã£ã®çŽ¹ä»[5]ãŽã¡ã³ããŒãã«ã倧åŠãACCREã
GPU Computing with CUDA[6] OmSTUãç¡ç·å·¥åŠéšãçµ±åæ
å ±ä¿è·éšã
åãã¬ãŒãã³ã°ããã°ã©ã ãProgramming for GPUã[7]å€ã®ã¹ãŒããŒã³ã³ãã¥ãŒã¿ãŒã¢ã«ãããŒã
NVIDIAã°ã©ãã£ãã¯ã¢ã¯ã»ã©ã¬ãŒã¿ã䜿çšããé«æ§èœã¯ã©ã¹ã¿ãŒã³ã³ãã¥ãŒãã£ã³ã°[8] iXBT.comïŒNVIDIA
CUDA-GPUã§ã®éã°ã©ãã£ãã¯ã³ã³ãã¥ãŒãã£ã³ã°[9] cgm.computergraphics.ruïŒCUDA
ãã¯ãããžãŒã®çŽ¹ä»[10] THG.ruïŒnVidia
CUDAïŒã°ã©ãã£ãã¯ã«ãŒãã§ã®ã³ã³ãã¥ãŒãã£ã³ã°ãŸãã¯CPUã®æ»ïŒ[11] steps3d.narod.ruïŒCUDAã®
åºæ¬ãCUDA ããã°ã©ãã³ã°ïŒããŒã2ïŒ[12]
CUDAã³ã³ãã€ã©ãã©ã€ããŒïŒNVCCïŒ[13]
CUDAã§ã®ã€ã³ã©ã€ã³PTXã¢ã»ã³ããªã®äœ¿çš[14]
PTXïŒäžŠåã¹ã¬ããå®è¡ISAããŒãžã§ã³3.0[15] CUDA APIãªãã¡ã¬ã³ã¹ããã¥ã¢ã«ïŒ
PDF ã
HTMLãªã³ã©ã€ã³ ïŒ