
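Starting with version 8.2, the Intel Performance Primitives (IPP) library has been systematically moving from internal parallelization of its functions to external parallelization. The reasons for this decision are outlined in the article "IPP Functions with Border Support for Image Processing in Multiple Threads".
This post looks at the functions that implement a filter with a finite impulse response: the FIR (Finite Impulse Response) filter.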
FIR filter
Filtering is one of the most important areas of digital signal processing, and the IPP library naturally provides implementations of most filter classes, including the FIR (finite impulse response) filter. Detailed descriptions of FIR filters can be found in the extensive literature or on Wikipedia. In short, a FIR filter multiplies the current sample and several previous samples of the discrete input signal by their corresponding coefficients and sums these products to obtain the current sample of the output signal. A little more formally: a FIR filter transforms an input vector X of N samples into an output vector Y of length N by multiplying K samples of the input vector by the corresponding K coefficients H and summing the products. The number of coefficients K is called the filter order.
Fig. 1. FIR filter formula: tapsLen is the filter order, numIters is the vector length.
The figure is taken from the IPP library documentation, so it uses the terminology adopted in IPP.
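Written out, the relation the figure expresses is presumably the standard FIR convolution. The form below is a reconstruction from the description above, using the IPP parameter names tapsLen and numIters; the index letter k is chosen here for illustration.

```latex
y[n] = \sum_{k=0}^{\mathrm{tapsLen}-1} h[k]\, x[n-k], \qquad 0 \le n < \mathrm{numIters}
```

Terms x[n-k] with a negative index refer to samples that precede the start of the vector.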
Visually, the FIR filter can be pictured as follows.
Fig. 2. Schematic of the FIR filter
As the figure shows, the filter order K here is 4: four filter coefficients h are multiplied by four samples of the vector x, and the sum is written to one sample of the output vector y. Note that, in accordance with the generally accepted formula, the filter coefficients h[3], h[2], h[1], h[0] in the figure lie in memory in reverse order relative to x and y.
Delay line
A FIR filter is an ordinary convolution, so obtaining an output vector of N samples requires N + K - 1 input samples, where K is the kernel length. The first K - 1 samples are called the "delay line"; in Fig. 2 they are x[-3], x[-2], x[-1]. The data passed to the function can be very large, so it may be split into blocks that are processed separately, one after another. An audio signal, for example, may be buffered by the operating system; data from an external device may arrive in portions over a communication line. The total amount of data may also be unknown in advance, in which case the application itself processes it through a buffer: a working buffer of some fixed length is allocated, chosen for example to fit in a particular cache level, and all the data passes through this buffer in portions. In all these cases the delay line is very useful: it lets the blocks be "glued" into one continuous stream very simply, so that splitting the data into blocks produces no edge effects.
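The "gluing" can be illustrated with a small reference filter in plain C. This is a minimal sketch, not IPP code: fir_ref and its parameter names are hypothetical, and it only works out of place (x and y must not overlap). It reads samples with a negative index from an input delay line and optionally saves the last K - 1 inputs as the delay line for the next block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference single-rate FIR (hypothetical helper, not an IPP function):
 *   y[i] = sum over j in [0, k) of h[j] * x[i - j]
 * dly_src holds x[-(k-1)] .. x[-1]; NULL means these samples are zero.
 * dly_dst, if not NULL, receives the last k-1 input samples of the block.
 * x and y must not overlap. */
void fir_ref(const float *x, float *y, int n, const float *h, int k,
             const float *dly_src, float *dly_dst)
{
    assert(k >= 1 && n >= k - 1);
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < k; j++) {
            int idx = i - j;                 /* may be negative near the start */
            float s;
            if (idx >= 0)
                s = x[idx];
            else if (dly_src)
                s = dly_src[(k - 1) + idx];  /* idx is in [-(k-1), -1] */
            else
                s = 0.0f;
            acc += h[j] * s;
        }
        y[i] = acc;
    }
    if (dly_dst)  /* tail of this block becomes the next block's delay line */
        memcpy(dly_dst, x + n - (k - 1), (size_t)(k - 1) * sizeof(float));
}
```

Filtering a vector in two halves, passing the delay line written by the first call as the input delay line of the second, gives the same output as filtering the whole vector in one call.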
IPP API
Years of experience with the IPP library showed that the FIR filter API needed to change to meet the following requirements:
- Vectors can be processed in sequential blocks.
- There are no hidden memory allocations.
- Vectors can be processed in different threads.
- In-place mode is allowed, that is, the input vector can also serve as the output vector.
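To satisfy all these requirements at once, the concepts of "input" and "output" delay lines were introduced, and the API came to consist of three functions: ippsFIRSRGetSize, ippsFIRSRInit and ippsFIRSR.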
This API follows the standard scheme used in IPP. First, the ippsFIRSRGetSize function reports the memory sizes required for the function context and for the working buffer. Next, ippsFIRSRInit is called with the filter coefficients. It initializes the internal data tables in the pSpec structure, which speed up the ippsFIRSR processing function. The contents of this structure do not change while the function runs, which is what the "Spec" in its name reflects, so a single pSpec can be shared by several threads at once, saving memory. The pBuf parameter, by contrast, is a working buffer that the function modifies, so a separate working buffer must be allocated for each thread.
The SR suffix stands for single rate and is used for consistency with the MR (multi-rate) filters, whose description would fill a separate article. The numIters parameter also comes from the MR filters; here it simply means the vector length.
The pSrc parameter points to the start of the block being processed, x[0].
Now let us look at the meaning of the pDlySrc and pDlyDst parameters.
Fig. 3. "Input" and "output" delay lines
As mentioned above, the need for x[-3], x[-2], x[-1] follows from the convolution formula. These elements are called the "input delay line", pDlySrc. The samples x[N-3], x[N-2], x[N-1] are the "tail" of the processed vector, that is, its last K - 1 elements. They are called the "output delay line", pDlyDst. For the next block they become the input delay line, and so on.
The input delay line pDlySrc may point to the K - 1 samples to the left of x[0], to some other buffer, or be NULL. If it is NULL, all elements of the input delay line are assumed to be 0, which is convenient for the first block, when there is no earlier data yet.
The "tail" of the block, that is, its last K - 1 samples, is written to the address pDlyDst. If pDlyDst is NULL, nothing is written.
This two-delay-line mechanism makes it possible to process a vector in parallel even in in-place mode, that is, when the vector is overwritten. It is enough to first copy the "tails" of the blocks into separate buffers and pass them to each thread as its input delay line. The code examples used in this article are collected in a single listing at the end.
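The in-place case can be sketched in plain C as follows. This is an illustration under stated assumptions rather than IPP's internal method: fir_chunk_inplace and fir_inplace_split are hypothetical names, the chunks are processed in a sequential loop that stands in for one thread per chunk, and each chunk is filtered from its end towards its start so that every input sample is read before it is overwritten.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* In-place FIR over one chunk: buf[0..n-1] is replaced by the output.
 * dly holds the k-1 samples that preceded the chunk (NULL means zeros).
 * Iterating backwards keeps every input that is still needed unmodified. */
static void fir_chunk_inplace(float *buf, int n, const float *h, int k,
                              const float *dly)
{
    for (int i = n - 1; i >= 0; i--) {
        float acc = 0.0f;
        for (int j = 0; j < k; j++) {
            int idx = i - j;
            float s = (idx >= 0) ? buf[idx]
                                 : (dly ? dly[(k - 1) + idx] : 0.0f);
            acc += h[j] * s;
        }
        buf[i] = acc;
    }
}

/* Splits buf into `chunks` pieces, saves each piece's input delay line
 * first, then filters the pieces in place.
 * Returns 0 on success, -1 if the side buffer cannot be allocated. */
int fir_inplace_split(float *buf, int len, const float *h, int k, int chunks)
{
    assert(chunks >= 1 && k >= 1);
    int tlen = len / chunks;
    assert(tlen >= k - 1);
    /* One slot of k floats per chunk (k-1 are used), so the size is never 0. */
    float *tails = malloc((size_t)chunks * (size_t)k * sizeof *tails);
    if (!tails)
        return -1;

    /* Step 1: copy the k-1 samples preceding every chunk except the first. */
    for (int c = 1; c < chunks; c++)
        memcpy(tails + (size_t)c * k, buf + c * tlen - (k - 1),
               (size_t)(k - 1) * sizeof *tails);

    /* Step 2: the chunks are now independent of each other. */
    for (int c = 0; c < chunks; c++) {
        int n = (c == chunks - 1) ? len - c * tlen : tlen;
        fir_chunk_inplace(buf + c * tlen, n, h, k,
                          c ? tails + (size_t)c * k : NULL);
    }
    free(tails);
    return 0;
}
```

Because the tails are saved before any chunk is touched, the order in which the chunks are filtered no longer matters, which is the property the parallel version relies on.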
Example: a low-pass IPP FIR filter
As an example, consider how to use the IPP FIR filter to keep only the low-frequency components of a signal.
To generate the original unfiltered signal, we use the special IPP function Jaehne:
pDst[n] = magn * sin((0.5 * π * n^2) / len), 0 ≤ n < len
This function is the workhorse on which many IPP functions are tested. We write the generated signal to a simple .csv file and plot it in Excel. The original signal looks like this.
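Fig. 4. 128 samples of the Jaehne signal
As an example, take a filter of order 31. Its coefficients are generated with the IPP function ippsFIRGenLowpass_64f. This function computes coefficients only in double precision, so they are then converted to float; see the code of the firgenlowpass() function in the appendix. After that call come the computation of the buffer sizes, the initialization, and the call to the main function ippsFIRSR, whose performance we will measure.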
After the low-pass filter is applied, only the low-frequency components remain in the signal. Note that the phase has shifted; this is a property of the FIR filter itself and has nothing to do with the IPP library.
Fig. 5. 128 samples of the Jaehne signal after the low-pass filter
In this figure the FIR filter processes 128 samples. The 30 samples of the input delay line are taken to be 0, which is indicated by pDlySrc = NULL. The output delay line is not needed either, so pDlyDst = NULL.
Multithreaded performance
The word "Performance" is in the name of the IPP library, and performance comes first. So we measure the performance of the ippsFIRSR function on a processor that supports AVX2. We then implement the following multithreaded code with OpenMP, measure it too, and put the results on one graph.
The FIR filter API was designed so that splitting a vector across several threads is simple and logical, as the figure shows.
Fig. 6. Splitting the original vector between threads
The following way of splitting a vector between threads is proposed; see the fir_omp function.
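fir_omp code

    void fir_omp(Ipp32f* src, Ipp32f* dst, int len, int order, IppsFIRSpec_32f* pSpec,
                 Ipp32f* pDlySrc, Ipp32f* pDlyDst, Ipp8u* pBuffer)
    {
        int tlen, ttail;
        tlen  = len / NTHREADS;   /* samples per thread */
        ttail = len % NTHREADS;   /* remainder, handled by the last thread */
    #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            Ipp32f* s = src + id*tlen;           /* start of this thread's input block */
            Ipp32f* d = dst + id*tlen;           /* start of this thread's output block */
            int len = tlen + ((id == NTHREADS-1) ? ttail : 0);  /* per-thread length */
            Ipp8u* b = pBuffer + id*bufSize;     /* per-thread working buffer */
            if (id == 0)
                /* first block: delay line comes from the previous portion */
                ippsFIRSR_32f(s, d, len, pSpec, pDlySrc, NULL, b);
            else if (id == NTHREADS - 1)
                /* last block: also writes the tail of the whole portion */
                ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), pDlyDst, b);
            else
                /* middle blocks: delay line is the order-1 samples before s */
                ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), NULL, b);
        }
    }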
Let us look at what this code does. The function receives the next portion of the signal to be filtered, x[0], ..., x[N-1], together with pointers to the input and output delay lines, that is, to the "tail" of the previous portion and to a buffer in which to place the "tail" of the current one. To speed up the filtering, the portion is split into T = NTHREADS blocks, one per thread. All that is required is to specify the input and output delay lines correctly and to give each thread its own working buffer.
For thread 0, the input delay line passed to ippsFIRSR is the "tail" of the previous portion. For all other threads, the input delay line is a pointer into the vector, shifted back by order - 1 elements from the start of the thread's block. Only the last thread writes the "tail" of the portion.
This approach assumes that the result vector is written to a different address from the source vector. If the data is overwritten in place, the delay lines must be copied to separate buffers beforehand.
The graph shows the performance of the single-threaded version and of the multithreaded version on 4 threads for a filter of order 31, on an Intel® Core(TM) i7-4770K 3.50 GHz processor, which supports AVX2 instructions. For FIR filters performance is expressed in cpMAC, the number of clock cycles per "multiply + add" operation:
cpMAC = (function execution time in cycles) / (vector length * filter order)
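As a small worked form of this metric, the helper below divides a measured cycle count by the number of multiply-add operations performed. It is a sketch: cp_mac is a hypothetical name, and the calls parameter is an assumption added to cover the common case where the function is timed over many repeated calls, as the sample listing at the end does.

```c
#include <assert.h>

/* Cycles per multiply-accumulate: total measured cycles divided by the
 * number of MACs performed (calls * vector length * filter order). */
double cp_mac(unsigned long long cycles, int calls, int len, int order)
{
    assert(calls > 0 && len > 0 && order > 0);
    return (double)cycles / ((double)calls * (double)len * (double)order);
}
```

A lower value is better: it means each multiply-add costs fewer clock cycles on average.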
Fig. 7. Performance of the single-threaded and multithreaded versions of the FIR filter
The function scales very well: on sufficiently long vectors the multithreaded version runs about 3.7 times faster than the single-threaded one, which matches 4 threads very closely. With the new API, the criterion for switching between the single-threaded and multithreaded versions can be chosen experimentally for a particular machine, unlike before, when the criterion was built into the code and the function was parallelized internally.
Direct implementation versus FFT implementation
In digital signal processing, the correspondence between convolution and the Fourier transform is widely used.
In addition to the direct implementation, the IPP FIR filter has an FFT-based implementation, whose resulting cpMAC can beat the value that is theoretically attainable for the direct algorithm on a given CPU.
To specify which algorithm to use, pass one of the algType values: ippAlgDirect, ippAlgFFT, or ippAlgAuto. The last value means that the function selects the algorithm itself according to fixed criteria for the CPU in use, and that choice is not always optimal.
Consider the performance, on the same CPU, of filters of different orders for vector lengths of 1024 and 128 samples, using the direct algorithm and the FFT implementation.
Fig. 8. Performance of the direct and FFT implementations for a length of 1024 samples
Fig. 9. Performance of the direct and FFT implementations for a length of 128 samples
The FFT implementation is characterized by steps. They appear because filters of several neighboring orders use an FFT of the same order, and performance changes when the transition to the next FFT order takes place. For maximum performance, use whichever algorithm is lower on the graph. With the proposed API it is possible to implement an example that runs both versions of the algorithm, measures them on a particular machine, and selects the best one. The picture then looks as follows. The figure plots a two-dimensional space of size 1024 x 1024, with the filter order along the X axis and the vector length along the Y axis. Green means that the FFT algorithm is faster than the direct version. The characteristic straight lines at the top of the picture correspond to Fig. 9: after switching to the next FFT order, the FFT variant runs slower for a while.
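The selection step can be sketched as a comparison of two arrays of measured cpMAC values, one per algorithm. This is a hypothetical helper, not part of IPP: build_choice_map and its parameters are names chosen here, and the measurements themselves are assumed to come from a routine such as perf() in the listing at the end, run once with ippAlgDirect and once with ippAlgFFT.

```c
#include <assert.h>
#include <stddef.h>

/* For each measured configuration i, sets use_fft[i] to 1 when the FFT
 * variant needed fewer cycles per MAC than the direct variant, else 0.
 * Returns the number of configurations in which the FFT variant won. */
size_t build_choice_map(const double *cp_direct, const double *cp_fft,
                        size_t n, unsigned char *use_fft)
{
    size_t fft_wins = 0;
    assert(cp_direct && cp_fft && use_fft);
    for (size_t i = 0; i < n; i++) {
        use_fft[i] = (unsigned char)(cp_fft[i] < cp_direct[i]);
        fft_wins += use_fft[i];
    }
    return fft_wins;
}
```

Indexing the arrays by (vector length, filter order) yields a map of the kind described above, with the entries equal to 1 playing the role of the green area.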
Fig. 10. Direct versus FFT performance of the IPP FIR low-pass filter implementation over a 1024 x 1024 space of filter order by vector length
The picture is quite complex, and it is clear that it would not be easy to approximate it inside IPP for an arbitrary platform. Moreover, the pattern may differ from machine to machine. Besides the choice between the direct and FFT code, the number of threads can be added as another dimension, which gives a multi-layered picture. In that case too, the proposed API makes it possible to choose the best option for the particular platform.
Conclusion
The FIR filter API introduced in IPP 9.0 makes it possible to use the filter even more efficiently in applications, by choosing the best option between the direct and FFT algorithms and by parallelizing each of them. In addition, the IPP library is completely free and can be downloaded from this link: Intel Performance Primitives (IPP).
Appendix. Sample code for measuring the performance of the IPP FIR filter
Sample code

    #include <stdio.h>
    #include <math.h>
    #include <omp.h>
    #if defined(_MSC_VER)
    #include <intrin.h>      /* __rdtsc on MSVC */
    #else
    #include <x86intrin.h>   /* __rdtsc on GCC/Clang */
    #endif
    #include "ippcore.h"
    #include "ipps.h"

    /* Write a vector to a one-column .csv file for plotting. */
    void save_csv(Ipp32f* pSrc, int len, char* fName)
    {
        FILE *fp;
        int i;
        if ((fp = fopen(fName, "w")) == NULL) {
            printf("Cannot open %s\n", fName);
            return;
        }
        for (i = 0; i < len; i++) {
            fprintf(fp, "%.3f\n", pSrc[i]);
        }
        fclose(fp);
    }

    Ipp32f* pSrc;
    Ipp32f* pDft;
    Ipp32f* pDst;
    Ipp32f* pTaps;
    Ipp64f rFreq = 0.2;
    int bufSize;
    int NTHREADS = 1;
    IppAlgType algType = ippAlgDirect;

    /* Generate low-pass coefficients in double precision and convert to float. */
    void firgenlowpass(int order)
    {
        IppStatus status;
        Ipp8u* pBuffer;
        Ipp64f* pTaps_64f;
        int size;
        int i;
        status = ippsFIRGenGetBufferSize(order, &size);
        pBuffer = ippsMalloc_8u(size);
        pTaps_64f = ippsMalloc_64f(order);
        ippsFIRGenLowpass_64f(rFreq, pTaps_64f, order, ippWinBartlett, ippTrue, pBuffer);
        for (i = 0; i < order; i++) {
            pTaps[i] = (Ipp32f)pTaps_64f[i];
        }
        ippsFree(pTaps_64f);
        ippsFree(pBuffer);
    }

    void fir_omp(Ipp32f* src, Ipp32f* dst, int len, int order, IppsFIRSpec_32f* pSpec,
                 Ipp32f* pDlySrc, Ipp32f* pDlyDst, Ipp8u* pBuffer)
    {
        int tlen, ttail;
        tlen  = len / NTHREADS;   /* samples per thread */
        ttail = len % NTHREADS;   /* remainder, handled by the last thread */
    #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            Ipp32f* s = src + id*tlen;
            Ipp32f* d = dst + id*tlen;
            int len = tlen + ((id == NTHREADS-1) ? ttail : 0);  /* per-thread length */
            Ipp8u* b = pBuffer + id*bufSize;                    /* per-thread buffer */
            if (id == 0)
                ippsFIRSR_32f(s, d, len, pSpec, pDlySrc, NULL, b);
            else if (id == NTHREADS - 1)
                ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), pDlyDst, b);
            else
                ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), NULL, b);
        }
    }

    void perf(int len, int order, float* cpMAC)
    {
        IppStatus status;
        IppsFIRSpec_32f* pSpec;
        Ipp8u* pBuffer;
        int specSize;
        Ipp32f* pDlySrc = NULL; /* initialize delay line with "0" */
        Ipp32f* pDlyDst = NULL; /* don't write output delay line */
        Ipp64u beg = 0, end = 0;
        int i, loop = 10000;

        /* allocate memory for input and output vectors */
        pSrc  = ippsMalloc_32f(len);
        pDst  = ippsMalloc_32f(len);
        pTaps = ippsMalloc_32f(order);
        /* create special vector Jaehne */
        ippsVectorJaehne_32f(pSrc, len, 128);
        /* get lowpass filter coeffs */
        firgenlowpass(order);
        /* get necessary buffer sizes for pSpec and for pBuffer */
        status = ippsFIRSRGetSize(order, ipp32f, &specSize, &bufSize);
        /* allocate memory for pSpec */
        pSpec = (IppsFIRSpec_32f*)ippsMalloc_8u(specSize);
        /* for N threads bufSize should be multiplied by N: */
        /* allocate bufSize*NTHREADS bytes */
        pBuffer = ippsMalloc_8u(bufSize*NTHREADS);
        /* initialize pSpec */
        status = ippsFIRSRInit_32f(pTaps, order, algType, pSpec);

        /* apply FIR filter: one warm-up call, then the timed loop */
        if (NTHREADS == 1) {
            /* measurement for the single-threaded version */
            ippsFIRSR_32f(pSrc, pDst, len, pSpec, pDlySrc, pDlyDst, pBuffer);
            beg = __rdtsc();
            for (i = 0; i < loop; i++) {
                ippsFIRSR_32f(pSrc, pDst, len, pSpec, pDlySrc, pDlyDst, pBuffer);
            }
            end = __rdtsc();
        } else {
            /* measurement for the multithreaded version */
            fir_omp(pSrc, pDst, len, order, pSpec, pDlySrc, pDlyDst, pBuffer);
            beg = __rdtsc();
            for (i = 0; i < loop; i++) {
                fir_omp(pSrc, pDst, len, order, pSpec, pDlySrc, pDlyDst, pBuffer);
            }
            end = __rdtsc();
        }
        /* cycles per multiply-add */
        *cpMAC = (float)((double)(end - beg) / ((double)loop * (double)len * (double)order));
        printf("%5d, %5d, %3.3f\n", len, order, *cpMAC);

        ippsFree(pSrc);
        ippsFree(pDst);
        ippsFree(pTaps);
        ippsFree(pSpec);
        ippsFree(pBuffer);
    }

    int main()
    {
        int len = 32768;
        int order;
        float cpMAC;

        NTHREADS = 1;
        algType = ippAlgDirect;
        //algType = ippAlgFFT;
        len = 128;
        printf("\nthreads: %d\n", NTHREADS);
        printf("len, order, cpMAC\n\n");
        for (order = 1; order <= 512; order++) {
            perf(len, order, &cpMAC);
        }
        return 0;
    }