
The goal of this work is to define one more technique for optimizing loops. We will not tie ourselves to any existing architecture; on the contrary, we will try to stay as abstract as possible and rely mainly on common sense.
The author calls this technique "loop pyramiding", by analogy with "loop unrolling", "loop nesting" and the like. The term reflects the idea and is not already taken.
Loops are the main target of optimization: it is in loops that most programs spend most of their time. Plenty of optimization techniques already exist and are well documented.
Main sources of optimization
- Saving on loop-termination logic. Checking the loop exit condition produces a branch, and branches break the pipeline, so let us check less often. The result is delightful pieces of code such as Duff's device:
    void send(int *to, int *from, int count)
    {
        int n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while (--n > 0);
        }
    }
Today, branch predictors (where the processor has them) have made this kind of optimization pointless.
- Moving loop invariants out of the loop (hoisting).
- Optimizing memory accesses so that the cache works more effectively. If a loop touches an amount of memory that clearly exceeds the cache size, the order of the accesses becomes important. Beyond the obvious cases this is hard for a compiler to handle; sometimes getting the effect really means writing a different algorithm. So this optimization rests on the shoulders of application programmers, while compilers and profilers supply statistics, hints... feedback.
- Using the processor's parallelism, explicit or implicit. Modern processors can execute code in parallel.
In architectures with explicit parallelism (EPIC, VLIW), a single instruction can contain several operations that address different functional units and are executed in parallel.
Superscalar processors analyze the instruction stream on their own, look for parallelism in it and use it wherever possible.

Figure: schematic of superscalar instruction execution.
Another option is vector operations (SIMD).
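To make the hoisting item in the list above concrete, here is a minimal C++ sketch. The function names `scale_naive` and `scale_hoisted` and the use of `std::sqrt` as the invariant expression are assumptions made for illustration only; an optimizing compiler will often perform this transformation by itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive form: std::sqrt(scale) does not depend on i,
// yet it is written inside the loop body.
void scale_naive(std::vector<double>& v, double scale) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= std::sqrt(scale);
    }
}

// Hoisted form: the loop-invariant value and the loop bound
// are computed once, before the loop starts.
void scale_hoisted(std::vector<double>& v, double scale) {
    const double factor = std::sqrt(scale);
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}
```

Both functions compute the same result; the second one only makes explicit what the loop does not need to recompute.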
Finding a way to use the processor's parallelism to the fullest is what we turn to now.
What we have
To begin with, for the sake of experiment, let us look at a few simple examples using MSVS 2013 (x64) on an Intel Core i7-2600 processor. Incidentally, GCC does much the same, at least on examples this simple.
The simplest loop computes the sum of an integer array:
    int64_t data[100000];
    ...
    int64_t sum = 0;
    for (int64_t val : data) {
        sum += val;
    }
Here is what the compiler produced:
        lea rsi,[data]
        mov ebp,186A0h              ; 100 000
        mov r14d,ebp
        ...
        xor edi,edi
        mov edx,edi
        nop dword ptr [rax+rax]
    loop_start:
        add rdx,qword ptr [rsi]
        lea rsi,[rsi+8]
        dec rbp
        jne loop_start
The same for double (AVX, /fp:precise and /fp:strict, that is, ANSI-compliant):
        vxorps xmm1,xmm1,xmm1
        lea rax,[data]
        mov ecx,186A0h
        nop dword ptr [rax+rax]
    loop_start:
        vaddsd xmm1,xmm1,mmword ptr [rax]
        lea rax,[rax+8]
        dec rcx
        jne loop_start
One million runs of this code take 85 seconds.
There is no sign of the compiler trying to detect parallelism, although in this task it seems obvious. The compiler found a data dependency and could not get around it.
The same again, but with (AVX, /fp:fast, no ANSI compliance):
        vxorps ymm2,ymm0,ymm0
        lea rax,[data]
        mov ecx,30D4h               ; 12500, 1/8
        vmovupd ymm3,ymm2
    loop_start:
        vaddpd ymm1,ymm3,ymmword ptr [rax+20h]   ; SIMD
        vaddpd ymm2,ymm2,ymmword ptr [rax]
        lea rax,[rax+40h]
        vmovupd ymm3,ymm1
        dec rcx
        jne loop_start
        vaddpd ymm0,ymm1,ymm2
        vhaddpd ymm2,ymm0,ymm0
        vmovupd ymm0,ymm2
        vextractf128 xmm4,ymm2,1
        vaddpd xmm0,xmm4,xmm0
        vzeroupper
This takes 26 seconds and uses vector operations.
The same loop written in traditional C style:
    for (i = 0; i < 100000; i++) {
        sum += data[i];
    }
unexpectedly gives, with /fp:precise:
        vxorps xmm4,xmm4,xmm4
        lea rax,[data+8h]
        lea rcx,[piecewise_construct+2h]
        vmovups xmm0,xmm4
        nop word ptr [rax+rax]
    loop_start:
        vaddsd xmm0,xmm0,mmword ptr [rax-8]
        add rax,50h
        vaddsd xmm1,xmm0,mmword ptr [rax-50h]
        vaddsd xmm2,xmm1,mmword ptr [rax-48h]
        vaddsd xmm3,xmm2,mmword ptr [rax-40h]
        vaddsd xmm0,xmm3,mmword ptr [rax-38h]
        vaddsd xmm1,xmm0,mmword ptr [rax-30h]
        vaddsd xmm2,xmm1,mmword ptr [rax-28h]
        vaddsd xmm3,xmm2,mmword ptr [rax-20h]
        vaddsd xmm0,xmm3,mmword ptr [rax-18h]
        vaddsd xmm0,xmm0,mmword ptr [rax-10h]
        cmp rax,rcx
        jl loop_start
There is no parallelism here, only an attempt to save on loop overhead. This code runs in 87 seconds. With /fp:fast the generated code is the same as in the previous /fp:fast case.
Let us give the compiler a hint by nesting the loop:
    double data[100000];
    ...
    double sum = 0, sum1 = 0, sum2 = 0;
    for (int i = 0; i < 100000; i += 2) {
        sum1 += data[i];
        sum2 += data[i + 1];
    }
    sum = sum1 + sum2;
We got exactly what we asked for, and the code is the same with /fp:fast and /fp:precise. On some processors (AMD Bulldozer) the vaddsd operations can execute in parallel.
        vxorps xmm0,xmm0,xmm0
        vmovups xmm1,xmm0
        lea rax,[data+8h]
        lea rcx,[piecewise_construct+2h]
        nop dword ptr [rax]
        nop word ptr [rax+rax]
    loop_start:
        vaddsd xmm0,xmm0,mmword ptr [rax-8]
        vaddsd xmm1,xmm1,mmword ptr [rax]
        add rax,10h
        cmp rax,rcx
        jl loop_start
One million runs of this code take 43 seconds, twice as fast as the "naive and precise" approach.
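The two-accumulator source above generalizes to any number of independent partial sums. The following is a minimal sketch of that idea, not the code that was measured: the template `sum_k` and its parameter `K` are hypothetical names, and whether the compiler keeps the `K` accumulators in separate registers depends on the compiler and its options.

```cpp
#include <array>
#include <cstddef>

// Sums n doubles using K independent accumulators ("nesting" by K).
// Each accumulator forms its own dependency chain, so up to K
// additions can be in flight in the adder pipeline at once.
template <std::size_t K>
double sum_k(const double* data, std::size_t n) {
    std::array<double, K> acc{};              // K partial sums, all zero
    std::size_t i = 0;
    for (; i + K <= n; i += K) {
        for (std::size_t k = 0; k < K; ++k) {
            acc[k] += data[i + k];
        }
    }
    double sum = 0.0;
    for (std::size_t k = 0; k < K; ++k) {
        sum += acc[k];                        // combine the partial sums
    }
    for (; i < n; ++i) {
        sum += data[i];                       // tail when n is not a multiple of K
    }
    return sum;
}
```

Note that for floating-point data the result may differ in the last bits from the strictly sequential sum, because the additions are performed in a different order.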
With a step of four elements the code looks like this (the compiler output is the same for /fp:fast and /fp:precise):
        vxorps xmm0,xmm0,xmm0
        vmovups xmm1,xmm0
        vmovups xmm2,xmm0
        vmovups xmm3,xmm0
        lea rax,[data+8h]
        lea rcx,[piecewise_construct+2h]
        nop dword ptr [rax]
    loop_start:
        vaddsd xmm0,xmm0,mmword ptr [rax-8]
        vaddsd xmm1,xmm1,mmword ptr [rax]
        vaddsd xmm2,xmm2,mmword ptr [rax+8]
        vaddsd xmm3,xmm3,mmword ptr [rax+10h]
        add rax,20h
        cmp rax,rcx
        jl loop_start
        vaddsd xmm0,xmm1,xmm0
        vaddsd xmm1,xmm0,xmm2
        vaddsd xmm1,xmm1,xmm3
One million runs of this code take 34 seconds. To get guaranteed vector computation, one has to resort to various tricks, such as:
- writing hints for the compiler in the form of pragmas: #pragma ivdep, #pragma loop(ivdep), #pragma GCC ivdep, #pragma vector always, #pragma omp simd ...
- using intrinsics, that is, telling the compiler directly which instructions to use, for example to add two arrays element by element.
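As an illustration of the intrinsics approach, here is a minimal sketch that adds two arrays element by element. It assumes SSE2 (two doubles per 128-bit register) rather than the AVX used in the listings above, because SSE2 is available on every x86-64 processor; the function name `add_arrays` is hypothetical, and on other targets the sketch falls back to a plain scalar loop.

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1
#endif

// out[i] = a[i] + b[i] for i in [0, n)
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    // Two doubles per step; unaligned loads and stores keep the sketch simple.
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar tail, and the whole loop when SSE2 is not available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The price is visible immediately: the source now names a specific register width and instruction set.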
Somehow, none of this fits well with the bright image of a "high-level language".
On the one hand, when the result is really needed, such optimizations are no burden at all. On the other hand, a portability problem appears. Suppose a program was written and debugged for a processor with four adders. When we try to run it on a version of the processor with six adders, we will not get the expected gain. And on a version with three adders we will get a slowdown by a factor of two rather than by a quarter.
Finally, let us compute a sum of squares (/fp:precise):
        vxorps xmm2,xmm2,xmm2
        lea rax,[data+8h]                   ; pdata = &data[1]
        mov ecx,2710h                       ; 10 000
        nop dword ptr [rax+rax]
    loop_start:
        vmovsd xmm0,qword ptr [rax-8]       ; xmm0 = pdata[-1]
        vmulsd xmm1,xmm0,xmm0               ; xmm1 = pdata[-1] ** 2
        vaddsd xmm3,xmm2,xmm1               ; xmm3 = 0 + pdata[-1] ** 2 ; sum
        vmovsd xmm2,qword ptr [rax]         ; xmm2 = pdata[0]
        vmulsd xmm0,xmm2,xmm2               ; xmm0 = pdata[0] ** 2
        vaddsd xmm4,xmm3,xmm0               ; xmm4 = sum + pdata[0] ** 2 ; sum
        vmovsd xmm1,qword ptr [rax+8]       ; xmm1 = pdata[1]
        vmulsd xmm2,xmm1,xmm1               ; xmm2 = pdata[1] ** 2
        vaddsd xmm3,xmm4,xmm2               ; xmm3 = sum + pdata[1] ** 2 ; sum
        vmovsd xmm0,qword ptr [rax+10h]     ; ...
        vmulsd xmm1,xmm0,xmm0
        vaddsd xmm4,xmm3,xmm1
        vmovsd xmm2,qword ptr [rax+18h]
        vmulsd xmm0,xmm2,xmm2
        vaddsd xmm3,xmm4,xmm0
        vmovsd xmm1,qword ptr [rax+20h]
        vmulsd xmm2,xmm1,xmm1
        vaddsd xmm4,xmm3,xmm2
        vmovsd xmm0,qword ptr [rax+28h]
        vmulsd xmm1,xmm0,xmm0
        vaddsd xmm3,xmm4,xmm1
        vmovsd xmm2,qword ptr [rax+30h]
        vmulsd xmm0,xmm2,xmm2
        vaddsd xmm4,xmm3,xmm0
        vmovsd xmm1,qword ptr [rax+38h]
        vmulsd xmm2,xmm1,xmm1
        vaddsd xmm3,xmm4,xmm2
        vmovsd xmm0,qword ptr [rax+40h]
        vmulsd xmm1,xmm0,xmm0
        vaddsd xmm2,xmm3,xmm1               ; xmm2 = sum;
        lea rax,[rax+50h]
        dec rcx
        jne loop_start
The compiler unrolled the loop by 10 elements to save on loop logic, and gets by with five registers: one for the sum and a pair for each of the two parallel multiplication branches.
Or, with /fp:fast:
        vxorps ymm4,ymm0,ymm0
        lea rax,[data]
        mov ecx,30D4h               ; 12500, 1/8
    loop_start:
        vmovupd ymm0,ymmword ptr [rax]
        lea rax,[rax+40h]
        vmulpd ymm2,ymm0,ymm0       ; SIMD
        vmovupd ymm0,ymmword ptr [rax-20h]
        vaddpd ymm4,ymm2,ymm4
        vmulpd ymm2,ymm0,ymm0
        vaddpd ymm3,ymm2,ymm5
        vmovupd ymm5,ymm3
        dec rcx
        jne loop_start
        vaddpd ymm0,ymm3,ymm4
        vhaddpd ymm2,ymm0,ymm0
        vmovupd ymm0,ymm2
        vextractf128 xmm4,ymm2,1
        vaddpd xmm0,xmm4,xmm0
        vzeroupper
Summary table (array summation, one million runs):
| Variant | MSVC, /fp:strict or /fp:precise, seconds | MSVC, /fp:fast, seconds |
| --- | --- | --- |
| foreach | 85 | 26 |
| C-style loop | 87 | 26 |
| C-style nesting X2 | 43 | 43 |
| C-style nesting X4 | 34 | 34 |
How can these numbers be explained?
Note that only the processor's designers know what is really going on; we can only guess.
The first idea, that the speedup comes from several independent adders, is apparently wrong. The i7-2600 has a single vector adder, which cannot execute independent scalar operations.
The processor clock runs at up to 3.8 GHz. The 85 seconds of the naive loop (one million runs of 100,000 additions) give about 3 clock cycles per iteration. This agrees well with the published 3-cycle latency of the vaddpd vector instruction (1, 2), even though we are adding scalars. Because of the data dependency, an iteration cannot complete in less than 3 cycles.
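The estimate is simple arithmetic. Assuming the clock really stayed at its maximum of 3.8 GHz for the whole run (an assumption, since the actual frequency was not measured), the naive loop costs

$$
\frac{85\ \text{s} \times 3.8 \times 10^{9}\ \text{cycles/s}}{10^{6} \times 10^{5}\ \text{additions}} \approx 3.2\ \text{cycles per addition}.
$$

Under the same assumption the other rows of the table give roughly $43 \times 3.8 / 100 \approx 1.6$ cycles per addition for nesting X2, $34 \times 3.8 / 100 \approx 1.3$ for nesting X4, and $26 \times 3.8 / 100 \approx 1.0$ cycle per element for the vectorized /fp:fast code.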
With nesting (X2) there is no data dependency inside an iteration, so the adder pipeline can be fed with operations one cycle apart. In the next iteration the data dependencies appear, also one cycle apart, and the result is a twofold speedup.
With nesting (X4) the adder pipeline is likewise fed one cycle apart, but the threefold speedup suggested by the pipeline length does not materialize, because some additional factor interferes. For example, the loop body may no longer fit into an L0m cache line and pick up an idle cycle or several.
So:
- with the /fp:fast compilation model, the simplest source yields the fastest code; after all, we are dealing with a high-level language;
- manual nesting-style optimizations give good results under /fp:precise but only get in the compiler's way under /fp:fast;
- manual optimizations are more portable than vectorized code.
A little about compilers
Register architectures provide a simple and universal way to obtain acceptable code from source text in a high-level language. Compilation can be loosely divided into several steps.
- Parsing. At this stage syntax-directed translation is performed and static checks are carried out. The output is a parse tree (DAG).
- Intermediate code generation. If desired, intermediate code generation can be combined with parsing. Moreover, when three-address instructions are used as the intermediate code, this step becomes trivial, because "three-address code is a linearized representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph". In essence, three-address code targets a virtual processor with an unlimited number of registers.
- Code generation. The result of this step is a program for the target architecture. Since the real number of registers is limited, at this stage we have to decide which temporaries live in registers at each moment and distribute them over specific registers. Even in its pure form this task is NP-complete, and it is complicated further by the various restrictions that usually apply to register use. Nevertheless, acceptable heuristics have been developed for it. In addition, three-address (or equivalent) code provides a formal apparatus for data-flow analysis, optimization, dead-code elimination and so on.
Problems that have come to a head:
- Heuristics are used to solve the NP-complete register-allocation problem, and they yield code of acceptable quality. These heuristics do not like additional restrictions on the use of memory or registers: for example, interleaved memory, implicit use of registers by instructions, vector operations, register rings... to the point where the heuristics may stop working and building near-optimal code ceases to be a problem that can be solved in a universal way.
As a result, (vector!) processor features can be used only when the compiler recognizes a typical situation from the set it has been trained on.
- The scaling problem. Registers are allocated statically; if we run the compiled code on a processor with the same instruction set but a larger number of registers, we gain nothing.
This applies even to SPARC with its stack of register windows, where a larger number of windows only means that more call frames fit into the register pool, which reduces the frequency of memory accesses.
EPIC made an attempt in the direction of scaling: "Each group of multiple instructions is called a bundle. Each bundle has a stop bit indicating whether the next group depends on its results. With this bit, future generations of the architecture can be built with the ability to execute several bundles in parallel. The dependency information is computed by the compiler, so the hardware does not have to perform additional checks of operand independence." The assumption was that independent bundles could be executed in parallel, and that the more execution units a system has, the more widely the program's internal parallelism could be exploited. On the other hand, such features do not always pay off; in any case, for summing an array they look useless to the author.
Superscalar processors tackle the problem by introducing "registers for us" and "registers for themselves". The compiler is guided by the number of the former when it colors (allocates) registers. The number of the latter can be arbitrary and is usually several times larger. During decoding, a superscalar processor reassigns registers according to their actual number, within a window of the program body. The window size is determined by the complexity of the logic the processor can afford. Besides the number of registers, the functional units are of course also subject to scaling.
- Compatibility problems. Take x86-64 in particular and its line of technologies: SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX512, ...
Top-down compatibility (the code is compiled for a more advanced technology, but we want to run it on lesser processors) can be provided in only one way: by generating code for each of the technologies mentioned and selecting the appropriate branch at run time. That does not sound very attractive.
Bottom-up compatibility is provided by the processor. It guarantees that the code will run, but does not promise that it will run efficiently. For example, if the code was compiled for a technology with two independent adders and runs on a processor with four, only two of them will actually be used. Generating several branches of code for different technologies does not solve the problem of future technologies, planned or not.
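Selecting a code branch at run time can be sketched as follows. This is only an illustration under stated assumptions: `sum_scalar`, `sum_wide`, `cpu_has_avx2` and `select_sum` are hypothetical names, `sum_wide` is a plain C++ stand-in for a variant that would really be compiled for the newer instruction set, and the feature test uses the GCC/Clang builtin `__builtin_cpu_supports`, falling back to "not supported" on other compilers and targets.

```cpp
#include <cstddef>

// Baseline variant: runs on any processor.
double sum_scalar(const double* d, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += d[i];
    return s;
}

// Stand-in for a variant built for a newer technology
// (here simply four independent accumulators).
double sum_wide(const double* d, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += d[i];
        s1 += d[i + 1];
        s2 += d[i + 2];
        s3 += d[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) s += d[i];
    return s;
}

// Run-time feature test; assumes GCC or Clang on x86.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

using SumFn = double (*)(const double*, std::size_t);

// The branch is chosen once; callers then go through the pointer.
SumFn select_sum() {
    return cpu_has_avx2() ? sum_wide : sum_scalar;
}
```

Every new technology adds one more variant to build, test and ship, which is the unattractive part.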
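A look at loops
Consider the same task, summing an array. Think of this sum as the evaluation of a single expression. Since addition is a binary operation, the expression can be represented as a binary tree, and because addition is associative, there are many such trees.
The computation is performed by a depth-first, left-to-right traversal of the tree. The ordinary summation looks like a left-growing tree degenerated into a list: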

    double data[N];
    ...
    double sum = 0;
    for (int i = 0; i < N; i++) {
        sum += data[i];
    }
The maximum stack depth needed here (a depth-first traversal implies postfix evaluation, that is, a stack) is 2 elements. No parallelism is possible: every addition except the first has to wait for the result of the previous one, so the data dependency is evident.
On the other hand, three registers (the sum and two to emulate the top of the stack) are enough to sum an array of any size.
Nesting the loop into two streams looks like this:

    double data[N];
    ...
    double sum = 0;
    double sum1 = 0, sum2 = 0;
    for (int i = 0; i < N; i += 2) {    // N is assumed to be even
        sum1 += data[i];
        sum2 += data[i + 1];
    }
    sum = sum1 + sum2;
The computation needs twice the resources, five registers in total, but part of the additions can now be done in parallel.
The worst option from the computational point of view is a right-growing tree degenerated into a list: evaluating it takes a stack the size of the array and offers no parallelism.
Which tree gives the most parallelism? Obviously a tree that is balanced (as far as possible), where the source data is accessed only in the "leaf" nodes that perform additions.

The pseudocode for this scheme uses the following functions:
- push(val) puts a value on top of the stack, growing the stack. The stack is assumed to be organized on a pool of registers.
- popadd() adds the two elements on top of the stack, places the result in the second element from the top and removes the top element.
- bit_count(val) counts the set bits in an integer value.
Once the pseudocode has run, the single element left on the stack equals the desired sum.
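A minimal C++ sketch of such a scheme, reconstructed from the description of the three functions, might look like this. It is an illustration, not the author's pseudocode: the register pool is emulated with a `std::vector`, and the names `PyramidStack`, `trailing_ones` and `pyramid_sum` are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates the register stack with a vector.
struct PyramidStack {
    std::vector<double> regs;
    void push(double v) { regs.push_back(v); }
    void popadd() {                    // replace the top two elements by their sum
        double top = regs.back();
        regs.pop_back();
        regs.back() += top;
    }
};

// Number of set bits in v.
unsigned bit_count(std::uint64_t v) {
    unsigned n = 0;
    for (; v != 0; v &= v - 1) ++n;
    return n;
}

// Number of consecutive set bits at the low end of v.
unsigned trailing_ones(std::uint64_t v) {
    unsigned n = 0;
    for (; (v & 1u) != 0; v >>= 1) ++n;
    return n;
}

// Sums data[0..n) along a balanced tree.
double pyramid_sum(const double* data, std::size_t n) {
    if (n == 0) return 0.0;
    PyramidStack st;
    for (std::size_t i = 0; i < n; ++i) {
        // Invariant: the stack holds bit_count(i) partial sums here.
        st.push(data[i]);
        // One addition for every completed subtree that ends at element i.
        for (unsigned k = trailing_ones(i); k > 0; --k) st.popadd();
    }
    // bit_count(n) partial sums are left; when n is a power of two
    // there is exactly one and this loop does nothing.
    for (unsigned k = bit_count(n); k > 1; --k) st.popadd();
    return st.regs.back();
}
```

For 16 elements the stack never holds more than 5 values, while a sequential sum of the same data forms a single chain of 16 dependent additions.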
How does it work? Note that the element index in binary encodes the path to the element from the top of the expression tree, read from the most significant bit to the least significant one, where 0 means a move to the left and 1 a move to the right (similar to a Huffman code).
Note also that the number of consecutive set low-order bits equals the number of additions that must be performed to process the current element, and the total number of set bits in the index is the number of elements on the stack before this element is handled.
Note the following:
- The stack size, that is, the number of registers needed, is log2 of the data size. On the one hand this is not very convenient: the data size may be computed at run time, whereas the stack size should be fixed at compile time. On the other hand, nothing prevents the compiler from cutting the data into tiles whose size is determined by the number of available registers.
- In this formulation the task has enough parallelism to use automatically however many independent adders are available. Loading elements from the array can be done independently and in parallel. Additions on the same level can be done in parallel as well.
- There is a problem with parallelism, though. Suppose there were N adders at compile time. To work efficiently with a number other than N (which is the whole point), hardware support is needed.
- For architectures with explicit parallelism things are not simple. A pool of adders and permission to execute several independent "wide" instructions in parallel would help. An addition then takes not a specific adder but the first free one in the queue. If a wide instruction wants to perform three additions, three adders are taken from the pool. If that many free adders are not available, the instruction blocks until they are released.
- In superscalar architectures the state of the stack has to be tracked. There are unary (for example, sign change) and binary (for example, popadd) operations on the stack, and leaf operations without dependencies (for example, push). The last kind is the simplest: such operations can be issued at any time. An operation that has arguments, however, must wait until its arguments are ready before it executes.
So both values for a popadd operation have to be on top of the stack. But by the time their turn to be added comes, they may no longer be on top of the stack; physically they may not even be adjacent.
The way out is to place (allocate) the corresponding registers from the stack pool at the moment the instruction is issued for execution, not at the moment its result is ready and has to be stored.
push(data[i]) , . . .
popadd , , . , popadd , .
.
. / . .
- ANSI - double' . /fp:fast, . , ( ) , .
- Can the scheme be implemented on existing architectures, without hardware support for the stack? Yes. For a fixed array size the compiler can, for example, recognize the loop, cut it into tiles of 64 elements and bind the stack machine to registers explicitly; finishing the loop additionally needs code for the remaining powers of two below 64. Rather cumbersome, but it would work.
        lea rax,[data]
        vxorps xmm6,xmm6,xmm6                   ; 0
        vaddsd xmm0,xmm6,mmword ptr [rax]       ; 0
        vaddsd xmm1,xmm0,mmword ptr [rax+8]     ; 1
        vaddsd xmm0,xmm6,mmword ptr [rax+10h]   ; 1 0
        vaddsd xmm2,xmm0,mmword ptr [rax+18h]   ; 1 2
        vaddsd xmm0,xmm1,xmm2                   ; 0
        vaddsd xmm1,xmm6,mmword ptr [rax+20h]   ; 0 1
        vaddsd xmm2,xmm1,mmword ptr [rax+28h]   ; 0 2
        vaddsd xmm1,xmm6,mmword ptr [rax+30h]   ; 0 2 1
        vaddsd xmm3,xmm1,mmword ptr [rax+38h]   ; 0 2 3
        vaddsd xmm1,xmm2,xmm3                   ; 0 1
        vaddsd xmm0,xmm0,xmm1                   ; 0
        vaddsd xmm1,xmm6,mmword ptr [rax+40h]   ; 0 1
        vaddsd xmm2,xmm1,mmword ptr [rax+48h]   ; 0 2
        vaddsd xmm1,xmm6,mmword ptr [rax+50h]   ; 0 2 1
        vaddsd xmm3,xmm1,mmword ptr [rax+58h]   ; 0 2 3
        vaddsd xmm1,xmm2,xmm3                   ; 0 1
        vaddsd xmm2,xmm6,mmword ptr [rax+60h]   ; 0 1 2
        vaddsd xmm3,xmm2,mmword ptr [rax+68h]   ; 0 1 3
        vaddsd xmm2,xmm6,mmword ptr [rax+70h]   ; 0 1 3 2
        vaddsd xmm4,xmm2,mmword ptr [rax+78h]   ; 0 1 3 4
        vaddsd xmm2,xmm4,xmm3                   ; 0 1 2
        vaddsd xmm3,xmm1,xmm2                   ; 0 3
        vaddsd xmm1,xmm0,xmm3                   ; 1
This is what the code for a 16-element pyramid might look like. The comments show which xmm registers hold the stack after each instruction.
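The same 16-element pyramid can be written in source form and used as a tile. The sketch below is a simplified illustration with hypothetical names (`sum4`, `sum16`, `tiled_sum`); note that it adds the tile results to a single running total, so the tiles themselves are still chained one after another, and that a compiler in a strict floating-point mode is expected to keep the parenthesized order.

```cpp
#include <cstddef>

// Balanced tree over 4 elements.
inline double sum4(const double* p) {
    return (p[0] + p[1]) + (p[2] + p[3]);
}

// Balanced tree over 16 elements, the same shape as the listing above.
inline double sum16(const double* p) {
    return (sum4(p) + sum4(p + 4)) + (sum4(p + 8) + sum4(p + 12));
}

// Cuts the array into 16-element tiles and handles the remainder
// with a plain loop.
double tiled_sum(const double* data, std::size_t n) {
    double total = 0.0;
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        total += sum16(data + i);
    }
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The tile size of 16 is fixed in the source, which brings back the portability concern discussed earlier: a processor with more registers or adders gains nothing until the code is rewritten.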
What next?
We have concentrated on finding the sum of an array, which is a very simple task. Let us look at something more complex. One instructive example: for the sake of optimization, recursion is usually converted into iteration, so the typical text (the main loop) looks like this:
    nn = N >> 1;
    ie = N;
    for (n = 1; n <= LogN; n++) {
        rw = Rcoef[LogN - n];
        iw = Icoef[LogN - n];
        if (Ft_Flag == FT_INVERSE) iw = -iw;
        in = ie >> 1;
        ru = 1.0;
        iu = 0.0;
        for (j = 0; j < in; j++) {
            for (i = j; i < N; i += ie) {
                io = i + in;
                rtp = Rdat[i] + Rdat[io];
                itp = Idat[i] + Idat[io];
                rtq = Rdat[i] - Rdat[io];
                itq = Idat[i] - Idat[io];
                Rdat[io] = rtq * ru - itq * iu;
                Idat[io] = itq * ru + rtq * iu;
                Rdat[i] = rtp;
                Idat[i] = itp;
            }
            sr = ru;
            ru = ru * rw - iu * iw;
            iu = iu * rw + sr * iw;
        }
        ie >>= 1;
    }
What can be done in this case? In the spirit of the loop optimizations described here, probably nothing. Whether the hardware stack described here could help is an interesting question, but that is a different story entirely.
P.S. Thanks to Tasit Murki (Felid) for advice on SIMD.
P.P.S. The title illustration is a frame from King Crimson footage: Fracture, Live in Boston 1974.