Once I had to calculate the sum of a vector of integers.

That sounds unusual. Who ever really needs to do that? Normally such calculations appear only in primary-school problems or compiler benchmarks. But this time it happened for real.
In fact, the task was to verify the checksum of an IPv4 header, which is the ones' complement sum of two-byte machine words. Put simply, it amounts to adding up all the words together with all the carry bits produced along the way. This procedure has several convenient properties:
- it can be computed efficiently with the ADC processor instruction (unfortunately, this feature is not accessible from C);
- it can be computed on words of any size (you may add 8-byte values if you wish; the result then only has to be reduced to two bytes, with all the overflow bits added back in);
- it is insensitive to byte order (surprisingly, this is true).
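The claim that the word size does not matter can be checked directly. The following is a minimal sketch, not the article's code: the names `fold16`, `sum_by_16`, `sum_by_32` and `sample_header` are invented for illustration, and the 20 bytes are a sample IPv4 header (checksum field 0xB861) chosen as test data. Words are read with `memcpy` in host byte order so that the sketch does not depend on alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduce a wide accumulator to 16 bits, adding the carries back in. */
uint16_t fold16(uint64_t sum)
{
    while (sum > 0xFFFF)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t) sum;
}

/* Ones' complement sum over 2-byte words, read in host byte order. */
uint16_t sum_by_16(const unsigned char *p, size_t nbytes)
{
    uint64_t sum = 0;
    assert(nbytes % 4 == 0);           /* IP headers are whole 4-byte words */
    for (size_t i = 0; i < nbytes; i += 2) {
        uint16_t w;
        memcpy(&w, p + i, sizeof w);
        sum += w;
    }
    return fold16(sum);
}

/* The same sum over 4-byte words; only the final fold differs in effect. */
uint16_t sum_by_32(const unsigned char *p, size_t nbytes)
{
    uint64_t sum = 0;
    assert(nbytes % 4 == 0);
    for (size_t i = 0; i < nbytes; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);
        sum += w;
    }
    return fold16(sum);
}

/* Sample header used as illustrative test data (an assumption of this sketch). */
const unsigned char sample_header[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7
};
```

For a header with a valid checksum, both `sum_by_16 (sample_header, 20)` and `sum_by_32 (sample_header, 20)` give 0xFFFF, which is the condition the article's function tests for.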
There was one important requirement: the source data was not aligned (the IP frames are exactly as received from the hardware or read from a file).
I did not have to worry about portability, since the code had to run on a single platform only: Intel x64 (Linux and GCC 4.8.3). Intel places no restrictions on the alignment of integer operands (access to unaligned data used to be slow, but no longer is), and since byte order does not matter, little-endian will do. So I quickly wrote:
    _Bool check_ip_header_sum (const char * p, size_t size)
    {
        const uint32_t * q = (const uint32_t *) p;
        uint64_t sum = 0;

        sum += q[0];
        sum += q[1];
        sum += q[2];
        sum += q[3];
        sum += q[4];

        for (size_t i = 5; i < size / 4; i++) {
            sum += q[i];
        }

        do {
            sum = (sum & 0xFFFF) + (sum >> 16);
        } while (sum & ~0xFFFFL);

        return sum == 0xFFFF;
    }
The source code and the assembly output can be found in the repository.
The most common IP header size is 20 bytes (5 double words, which I will simply call words), which is why the code looks the way it does. The size cannot be smaller than that, either; this is checked before the function is called. Since an IP header cannot exceed 15 words, the loop runs between 0 and 10 iterations.
This code is not actually portable: accessing arbitrary memory through pointers to 32-bit values is known not to work on some processors, for example on most, if not all, RISC processors. But, as I said, I did not expect it to cause problems on x86.
And of course (otherwise there would be no story to tell), reality proved otherwise: the code crashed with a SIGSEGV error.

Simplification

The crash occurred only when the loop was executed, that is, when the header was longer than 20 bytes. In real life this almost never happens, but luckily my test data set contained such headers. Let's reduce the code to just this loop. We'll write it in pure C and split it into two files to avoid inlining. Here is sum.c:
    #include <stdlib.h>
    #include <stdint.h>

    uint64_t sum (const uint32_t * p, size_t nwords)
    {
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < nwords; i++) res += p [i];
        return res;
    }
And here is main.c:
    #include <stdint.h>
    #include <stdio.h>

    extern uint64_t sum (const uint32_t * p, size_t nwords);

    char x [100];

    int main (void)
    {
        size_t i;
        for (i = 0; i < sizeof (x); i++) x [i] = (char) i;

        for (i = 0; i < 16; i++) {
            printf ("Trying %d sum\n", (int) i);
            printf ("Done: %d\n", (int) sum ((const uint32_t*) (x + i), 16));
        }
        return 0;
    }
SIGSEGV now occurs in the sum function when i is 1.

Investigation

The code of the sum function is surprisingly large, so I will show only the main loop:
    .L13:
        movdqa    (%r8), %xmm2
        addq      $1, %rdx
        addq      $16, %r8
        cmpq      %rdx, %r9
        pmovzxdq  %xmm2, %xmm1
        psrldq    $8, %xmm2
        paddq     %xmm0, %xmm1
        pmovzxdq  %xmm2, %xmm0
        paddq     %xmm1, %xmm0
        ja        .L13
The compiler is smart. Too smart, in my opinion. It has applied the SSE instruction set (it was allowed to, since I use SSE elsewhere and specify -msse4.2 on the command line). This code reads four values at once (movdqa), converts them to 64-bit format in two registers (two pmovzxdq instructions and one psrldq) and adds them to the running sums (%xmm0). After the loop it adds the accumulated values together.
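This looks like a reasonable optimisation when a large number of words is processed, but that is not the case here. The compiler could not establish the typical number of loop iterations, so it optimised the code for the maximum, correctly reasoning that for a small number of words the loss from over-optimisation would be small. We'll check later what that loss is, and whether there is any.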
What in this code can cause a crash? We quickly see that it is the movdqa instruction. Like most SSE instructions that access memory, it requires the address of its source operand to be aligned on 16 bytes. But we cannot expect such alignment from a uint32_t pointer, so how can this instruction be used at all?
The compiler does in fact take care of alignment. Before starting the loop, it computes how many words have to be processed before the loop can start:
        testq   %rsi, %rsi      # %rsi is n
        je      .L14
        movq    %rdi, %rax      # %rdi is p
        movq    %rsi, %rdx
        andl    $15, %eax
        shrq    $2, %rax
        negq    %rax
        andl    $3, %eax
Or, in a more familiar form:
    if (nwords == 0) return 0;
    unsigned start_nwords = (- (((unsigned)p & 0x0F) >> 2)) & 3;
This yields 0 if p ends in 0, 1, 2 or 3 in hexadecimal, 3 if it ends in 4-7, 2 for the range 8-B, and 1 for C-F. Once these first words have been processed, the loop can start (provided at least 4 words remain, and with the leftovers taken care of afterwards).
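The formula is easy to try out on plain numbers. The sketch below is only an illustration: the helper names `leading_words` and `aligned_after_peel` are hypothetical, and the address is passed as an integer so that no real pointer has to be misaligned.

```c
#include <stdint.h>

/* Number of leading 4-byte words the generated code handles one by one
   before entering the movdqa loop (same arithmetic as start_nwords). */
unsigned leading_words(uintptr_t addr)
{
    return (0u - (unsigned) ((addr & 0x0F) >> 2)) & 3u;
}

/* Returns 1 if the address is 16-byte aligned after skipping those words. */
int aligned_after_peel(uintptr_t addr)
{
    return ((addr + 4u * leading_words(addr)) & 0x0F) == 0;
}
```

For an address ending in 4, 8 or C the adjusted address is 16-byte aligned. For an address ending in 1 the formula gives 0 and the address stays misaligned, which is the situation in which movdqa faults.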
In short, this code aligns the pointer on 16 bytes, but only if it is already aligned on 4.
Suddenly our x86 behaves like a RISC processor: it crashes when a pointer to uint32_t is not aligned on 4 bytes.

Simple solutions do not work

No simple manipulation of this function solves the problem. For example, in a naive attempt to "explain the arbitrary nature of the pointer to the compiler", we can declare the parameter p as char*:
    uint64_t sum0 (const char * p, size_t nwords)
    {
        const uint32_t * q = (const uint32_t *) p;
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < nwords; i++) res += q [i];
        return res;
    }
Or we can replace indexing with pointer arithmetic:
    uint64_t sum01 (const uint32_t * p, size_t n)
    {
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < n; i++) res += *p++;
        return res;
    }
Or apply both methods:
    uint64_t sum02 (const char * p, size_t n)
    {
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < n; i++, p += sizeof (uint32_t))
            res += *(const uint32_t *) p;
        return res;
    }
None of these changes helps. The compiler is clever enough to ignore all the syntactic sugar and reduce the code to its core. All these versions crash with SIGSEGV.

What the standard says

This looks like a rather dirty compiler trick. The transformation it applied to the program contradicts a programmer's usual expectations for x86. Is the compiler allowed to do this? To answer this question we have to consult the standard.
I am not going to dig deep into the various C and C++ standards. Let's look at just one of them, C99, or more precisely the latest publicly available version of the C99 standard (2007).

It introduces the concept of alignment:
3.2 alignment
requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address

This concept is used in the definition of pointer conversion:
6.3.2.3
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

It is also used in pointer dereferencing:
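6.5.3.2 Address and indirection operators
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined. 87)
87) Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.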
If I understand these clauses correctly, converting pointers (other than converting anything to char *) is dangerous in general. The program may crash right there, during the conversion. Alternatively, the conversion may succeed but produce an invalid value that makes the program crash (or output garbage) when the pointer is dereferenced. In our case both can happen if the alignment requirement for uint32_t, as enforced by this compiler, differs from one (the alignment of char*). Since the most natural alignment of uint32_t is 4, the compiler was entirely correct.
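The alignment requirement the standard talks about can be inspected from the program itself. The following sketch assumes a C11 compiler (for `_Alignof`) and an implementation where converting a pointer to `uintptr_t` yields its address, which holds on common platforms but is not guaranteed by the standard. The function name `is_aligned_for_u32` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p satisfies the alignment requirement of uint32_t
   on this implementation, 0 otherwise. */
int is_aligned_for_u32(const void *p)
{
    return ((uintptr_t) p % _Alignof(uint32_t)) == 0;
}
```

A check like this only reports whether a conversion to uint32_t* would be valid. It does not make an unaligned conversion legal.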
The sum0 version does not solve the problem, but it is still better than the original sum, because the latter requires the pointer to be of type uint32_t* already, which forces a pointer conversion in the calling code. That conversion may crash immediately or produce an invalid pointer value. Let's make alignment the responsibility of the sum function and replace sum with sum0.
These clauses of the standard explain why our attempts to solve the problem by playing with pointer types and the ways they are computed failed. Whatever we do with the pointer, in the end it is converted to uint32_t*, which immediately tells the compiler that the pointer is aligned on a four-byte boundary.
There are only two suitable solutions.

Disabling SSE

The first solution is not so much a solution as a trick. The alignment problem on x86 arises only when SSE is in use, so let's turn it off. We can do this for the entire file in which sum is declared or, if that is inconvenient, just for this specific function:
    __attribute__ ((target("no-sse")))
    uint64_t sum1 (const char * p, size_t nwords)
    {
        const uint32_t * q = (const uint32_t *) p;
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < nwords; i++) res += q [i];
        return res;
    }
Such code is even less portable than the original, since it uses an attribute specific to both GCC and Intel. This can be cleaned up with suitable conditional compilation:
    #if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
    __attribute__ ((target ("no-sse")))
    #endif
However, this approach is of little practical use: it only allows the program to be compiled by other compilers and for other architectures, not necessarily to work there. The program can still fail if the target is a RISC processor, or if the compiler uses a different syntax for disabling SSE.
Even if we stay within the GCC and Intel framework, who can guarantee that in ten years there will be no other architecture besides SSE? After all, my original code could have been written twenty years ago, when SSE did not exist (the first MMX appeared in 1997).
Still, such a program compiles into very neat code:
    sum0:
        testq   %rsi, %rsi
        je      .L34
        leaq    (%rdi,%rsi,4), %rcx
        xorl    %eax, %eax
    .L33:
        movl    (%rdi), %edx
        addq    $4, %rdi
        addq    %rdx, %rax
        cmpq    %rcx, %rdi
        jne     .L33
        ret
    .L34:
        xorl    %eax, %eax
        ret
This is exactly the code I had in mind when I wrote the function. I expect it to run faster than the SSE-based code on small vector sizes, which is the case for IP headers. We'll measure this later.

Using memcpy

Another option is to use the memcpy function. This function can copy the bytes that represent a number into a variable of the appropriate type, regardless of alignment, and it does so in full compliance with the standard. This may look inefficient, and twenty years ago it was. Today, however, the function need not be implemented as a procedure call: the compiler can treat it as an intrinsic of the language and replace it with a memory-to-register transfer. GCC certainly does this. It compiles the following code:
    uint64_t sum2 (const char * p, size_t nwords)
    {
        uint64_t res = 0;
        size_t i;
        uint32_t temp;
        for (i = 0; i < nwords; i++) {
            memcpy (&temp, p + i * sizeof (uint32_t), sizeof (temp));
            res += temp;
        }
        return res;
    }
into code similar to the original SSE code, but using movdqu instead of movdqa. This instruction permits unaligned data, but its performance varies. On some processors it is considerably slower than movdqa even when the data is in fact aligned. On others it runs at practically the same speed.
Another difference in the generated code is that it makes no attempt to align the pointer. It uses movdqu with the original pointer even where it could align the pointer and use movdqa. As a result this code, although more universal, may be slower than the original code on some inputs.
This solution is fully portable; it can be used anywhere, even on RISC architectures.
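The memcpy idiom can be wrapped into small helpers so that the copying stays in one place. This is a sketch under assumptions, not code from the article: the names `load_u32`, `store_u32` and `sum_unaligned` are hypothetical, and whether the calls are turned into single load instructions depends on the compiler and optimisation level.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Sum nwords 32-bit words starting at an arbitrary byte address. */
uint64_t sum_unaligned(const char *p, size_t nwords)
{
    uint64_t res = 0;
    for (size_t i = 0; i < nwords; i++)
        res += load_u32(p + i * sizeof(uint32_t));
    return res;
}
```

Because values are both written and read through memcpy, the helpers behave the same at any byte offset, for example at `buf + 1` in a char buffer.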

Combined solution

The first solution appears to be faster on our data (although we have not measured it yet), while the second is more portable. We can combine them:
    #if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
    __attribute__ ((target ("no-sse")))
    #endif
    uint64_t sum3 (const char * p, size_t nwords)
    {
        uint64_t res = 0;
        size_t i;
        uint32_t temp;
        for (i = 0; i < nwords; i++) {
            memcpy (&temp, p + i * sizeof (uint32_t), sizeof (temp));
            res += temp;
        }
        return res;
    }
This code compiles into a good non-SSE loop on GCC/Intel, yet still produces working (and reasonably good) code on other architectures. This is the version I use in my project.

The code generated for x86 is identical to that obtained from sum1.

Speed measurements

We have seen that the compiler had every right to generate code using movdqa. How good is that decision in terms of performance? Let's measure the performance of all our solutions. First we do it on fully aligned data (the pointer is aligned on a 16-byte boundary). The values in the table are given in nanoseconds per word added.
Size, words | sum0 (movdqa) | sum1 (loop) | sum2 (movdqu) | sum3 (loop, memcpy) |
---|---|---|---|---|
1 | 2.91 | 1.95 | 2.90 | 1.94 |
5 | 0.84 | 0.79 | 0.77 | 0.79 |
16 | 0.46 | 0.45 | 0.41 | 0.46 |
1024 | 0.24 | 0.46 | 0.26 | 0.48 |
65536 | 0.24 | 0.45 | 0.24 | 0.45 |
The table confirms that for a very small word count (1) the plain loop is faster than the SSE-based versions, although the difference is not large (1 ns per word, and there is only one word).
SSE is much faster for large word counts (1024 and more), where the overall gain is quite significant.
For medium-sized inputs (such as 16) the speeds are almost the same, with a slight advantage for SSE (movdqu).
Let's run the test for all values from 1 to 16 and see where the break-even point lies. The sum1 (non-SSE loop) and sum3 versions show very similar results (as expected, since the code is identical; the difference between them reflects the measurement error, around 0.02 ns). For this reason only the final version (sum3) is shown in the chart.

We can see that the simple loop beats the SSE versions for up to 3 words, after which the SSE versions take over (with the movdqu version generally faster than the original movdqa one).
I think that, in the absence of additional information, the compiler is right to assume that an arbitrary loop will execute more than three times, so its decision to use SSE was entirely correct. But why did it not go for the movdqu option straight away? Is there any reason to use movdqa?
We saw that when the data is aligned, the movdqu version runs at the same speed as movdqa for large word counts and faster for small ones. The latter can be explained by the smaller number of instructions before the loop (there is no need to check alignment). What happens if we run the test on unaligned data? Here are the results for some of the options:
Size, words | Offset 0, movdqa | Offset 0, movdqu | Offset 0, loop | Offset 1, movdqu | Offset 1, loop | Offset 4, movdqa | Offset 4, movdqu | Offset 4, loop |
---|---|---|---|---|---|---|---|---|
1 | 2.91 | 2.90 | 1.94 | 2.93 | 1.94 | 2.90 | 2.90 | 1.94 |
5 | 0.84 | 0.77 | 0.79 | 0.77 | 0.79 | 0.84 | 0.79 | 0.78 |
16 | 0.46 | 0.41 | 0.46 | 0.42 | 0.46 | 0.52 | 0.40 | 0.46 |
1024 | 0.24 | 0.26 | 0.48 | 0.26 | 0.51 | 0.25 | 0.25 | 0.47 |
65536 | 0.24 | 0.24 | 0.45 | 0.25 | 0.50 | 0.24 | 0.24 | 0.46 |
As you can see, alignment makes only a slight difference to speed, with one exception: the movdqa version becomes somewhat slower (0.52 ns instead of 0.46 ns) at 16 words with an offset of 4. The plain loop is still the best solution for small word counts, and movdqu is the best for large ones. The compiler was wrong to use movdqa. A possible explanation is that it optimised for an older Intel processor model. The movdqu instruction used to be slower than movdqa on Xeon processors, even on fully aligned data. This no longer seems to be the case, so the compiler could be simplified (and the alignment requirement relaxed).

The original function

The original function for checking the IP header should now be rewritten as follows:
    #if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
    __attribute__ ((target ("no-sse")))
    #endif
    _Bool check_ip_header_sum (const char * p, size_t size)
    {
        const uint32_t * q = (const uint32_t *) p;
        uint32_t temp;
        uint64_t sum = 0;

        memcpy (&temp, &q [0], 4); sum += temp;
        memcpy (&temp, &q [1], 4); sum += temp;
        memcpy (&temp, &q [2], 4); sum += temp;
        memcpy (&temp, &q [3], 4); sum += temp;
        memcpy (&temp, &q [4], 4); sum += temp;

        for (size_t i = 5; i < size / 4; i++) {
            memcpy (&temp, &q [i], 4);
            sum += temp;
        }

        do {
            sum = (sum & 0xFFFF) + (sum >> 16);
        } while (sum & ~0xFFFFL);

        return sum == 0xFFFF;
    }
If we are wary of converting an unaligned pointer to uint32_t* (which the standard describes as undefined behaviour), the code looks like this:
    #if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
    __attribute__ ((target ("no-sse")))
    #endif
    _Bool check_ip_header_sum (const char * p, size_t size)
    {
        uint32_t temp;
        uint64_t sum = 0;

        memcpy (&temp, p, 4); sum += temp;
        memcpy (&temp, p + 4, 4); sum += temp;
        memcpy (&temp, p + 8, 4); sum += temp;
        memcpy (&temp, p + 12, 4); sum += temp;
        memcpy (&temp, p + 16, 4); sum += temp;

        for (size_t i = 20; i < size; i+= 4) {
            memcpy (&temp, p + i, 4);
            sum += temp;
        }

        do {
            sum = (sum & 0xFFFF) + (sum >> 16);
        } while (sum & ~0xFFFFL);

        return sum == 0xFFFF;
    }
Both versions look quite ugly, especially the second one. Both are reminiscent of programming in pure assembly language. Nevertheless, this is the correct way to write portable C code.
Interestingly, although in the tests the loop ran at the same speed as movdqu on 5 words, this function became slower once it was rewritten as a single loop from 0 to size (typical results are 0.48 ns versus 0.83 ns per word).
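For reference, a single-loop variant of the kind mentioned above might look like the sketch below. This is an illustration only: the name `check_ip_header_sum_loop` is hypothetical, the exact form the author measured is not shown in the article, and the final reduction is written as a comparison with 0xFFFF rather than with a bit mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Verify an IP header checksum with one loop over all 4-byte words.
   p may have any alignment; size is the header length in bytes. */
bool check_ip_header_sum_loop(const char *p, size_t size)
{
    uint64_t sum = 0;

    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t temp;
        memcpy(&temp, p + i, sizeof temp);
        sum += temp;
    }

    /* Fold the carries back in until the sum fits into 16 bits. */
    while (sum > 0xFFFF)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return sum == 0xFFFF;
}
```

According to the measurements quoted above, a loop of this shape was slower for the author than the version with the first five words unrolled, so it trades speed for brevity.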

C++ version

In C++ the same function can be written in a more readable way by applying some templates. Let's introduce a parameterised type, const_unaligned_pointer:
    template<typename T> class const_unaligned_pointer
    {
        const char * p;
    public:
        const_unaligned_pointer () : p (0) {}
        const_unaligned_pointer (const void * p) : p ((const char*)p) {}
        T operator* () const
        {
            T tmp;
            memcpy (&tmp, p, sizeof (T));
            return tmp;
        }
        const_unaligned_pointer operator+ (ptrdiff_t d) const
        {
            return const_unaligned_pointer (p + d * sizeof (T));
        }
        T operator[] (ptrdiff_t d) const { return * (*this + d); }
    };
This is, of course, only a skeleton. A proper definition must include an equality test, a minus operator for two pointers, a plus operator in the other direction, some conversions and probably other things.
With the parameterised type, the function looks very close to what we started with:
    bool check_ip_header_sum (const char * p, size_t size)
    {
        const_unaligned_pointer<uint32_t> q (p);
        uint64_t sum = 0;

        sum += q[0];
        sum += q[1];
        sum += q[2];
        sum += q[3];
        sum += q[4];

        for (size_t i = 5; i < size / 4; i++) {
            sum += q[i];
        }

        do {
            sum = (sum & 0xFFFF) + (sum >> 16);
        } while (sum & ~0xFFFFL);

        return sum == 0xFFFF;
    }
This produces exactly the same assembly code as the C code with memcpy and, obviously, runs at the same speed.

Some more templates

Our code only reads unaligned data, so the const_unaligned_pointer class is sufficient. What if we wanted to write such data as well? We can create a class for that, but in this case two classes are needed: one for the pointer and one for the l-value obtained when this pointer is dereferenced.

    template<typename T> class unaligned_ref
    {
        void * p;
    public:
        unaligned_ref (void * p) : p (p) {}
        T operator= (const T& rvalue)
        {
            memcpy (p, &rvalue, sizeof (T));
            return rvalue;
        }
        operator T() const
        {
            T tmp;
            memcpy (&tmp, p, sizeof (T));
            return tmp;
        }
    };

    template<typename T> class unaligned_pointer
    {
        char * p;
    public:
        unaligned_pointer () : p (0) {}
        unaligned_pointer (void * p) : p ((char*)p) {}
        unaligned_ref<T> operator* () const { return unaligned_ref<T> (p); }
        unaligned_pointer operator+ (ptrdiff_t d) const
        {
            return unaligned_pointer (p + d * sizeof (T));
        }
        unaligned_ref<T> operator[] (ptrdiff_t d) const { return *(*this + d); }
    };
Again, this code only illustrates the idea; a lot would have to be added to make it suitable for production use. Let's run a simple test:

    char mem [5];

    void dump ()
    {
        std::cout << (int) mem [0] << " " << (int) mem [1] << " "
                  << (int) mem [2] << " " << (int) mem [3] << " "
                  << (int) mem [4] << "\n";
    }

    int main (void)
    {
        dump ();
        unaligned_pointer<int> p (mem + 1);
        int r = *p;
        r++;
        *p = r;
        dump ();
        return 0;
    }

The output is:

    0 0 0 0 0
    0 1 0 0 0
We could have written

    ++ *p;

but that requires operator++ to be defined in unaligned_ref.

Conclusions
- RISC. - SSE x86 ( 32-, 64- ).
- , SSE . â , ( , - ).
- : .
- Code that has been written over twenty years and has so far worked, perhaps only on Intel, can suddenly start failing in the same way. One piece of practical advice: when compiling such code, disable all the extended instruction sets you can. Even that, however, may not help.
- This story shows that code coverage tools are useful. Here we were lucky that the input caused all of the code to be executed. Next time it may be different.

Update

On /r/cpp/, user OldWolf2 noticed that the checksum code contains an error in its last line:

    } while (sum & ~0xFFFFL);

He is right: the type of 0xFFFFL is not always the same as uint64_t. The long type (like unsigned long) can be 32 bits wide, in which case the bit inversion (ones' complement) happens before the extension to 64 bits, and the actual constant in the test becomes 0x00000000FFFF0000. It is easy to come up with input for which such a test fails, for example an array of two words, 0xFFFFFFFF and 0x00000001. We can perform the inversion after converting to 64 bits:

    } while (sum & ~(uint64_t) 0xFFFF);

Or, as an alternative, we can use a comparison:

    } while (sum > 0xFFFF);
Interestingly, GCC generates more concise code in the second case. Here is the version with the bit test:

    .L15:
        movzwl  %ax, %edx
        shrq    $16, %rax
        addq    %rdx, %rax
        movq    %rax, %rdx
        xorw    %dx, %dx
        testq   %rdx, %rdx
        jne     .L15

And here is the version with the comparison:

    .L44:
        movzwl  %ax, %edx
        shrq    $16, %rax
        addq    %rdx, %rax
        cmpq    $65535, %rax
        ja      .L44
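The effect of inverting a mask before or after widening can be seen with fixed-width types. In this sketch `uint32_t` stands in for a 32-bit unsigned constant type; that choice, like the function names, is an assumption made for illustration and is not taken from the article.

```c
#include <stdint.h>

/* Inversion done in a 32-bit unsigned type and widened afterwards:
   the upper 32 bits of the resulting mask are zero. */
uint64_t mask_inverted_before_widening(void)
{
    return ~(uint32_t) 0xFFFF;
}

/* Inversion done after widening to 64 bits: all upper bits are set. */
uint64_t mask_inverted_after_widening(void)
{
    return ~(uint64_t) 0xFFFF;
}

/* Carry folding written with a comparison, which needs no mask at all. */
uint16_t fold_by_comparison(uint64_t sum)
{
    while (sum > 0xFFFF)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t) sum;
}
```

The first mask is 0x00000000FFFF0000 and does not detect a carry in bit 32, while the second is 0xFFFFFFFFFFFF0000 and does. The comparison form avoids the question of constant types altogether.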

Comments are welcome below or on reddit.