MIPS SIMD Technology and the Baikal-T1 Processor

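Colleagues at Baikal Electronics offered me the chance to work with the Baikal-T1 processor [L1] and write about my impressions. For them, this is a way to tell developers about the capabilities and features of their processor. For me, it is an opportunity to get to know systems built on modern processor cores, for example in order to add new features to the MIPSfpga-plus project [L2], and to reinvent fewer "bicycles" in the future. Well, and plain engineering curiosity, once again...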


Today I will talk about the MIPS SIMD Architecture vector extension. It is available in the MIPS Warrior P-class P5600 core [L3] and is therefore also present in the Baikal-T1 processor. The article is aimed at developers.



Introduction


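In most cases, developing a device (an instrument, a piece of hardware, a hardware-software complex, and so on) involves processing digital and analog signals. The inputs may be sensor readings, signals from input/output devices, data from files on disk, and so on. The outputs may be an image on a monitor, sound from a speaker, drive control signals, indicators on a dashboard, and so on. Between the input and the output lies a certain set of mathematical operations.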


If we briefly list the ways to implement this "math in hardware", we get the following list of tools, which a developer can apply together or separately:



The same list, in more detail
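  • Implementation as analog circuits
    Digital equipment dominates, but the world and human senses remain analog. Like it or not, even if you process information "exclusively" in digital form, you still need a filter in front of the ADC. Integrators, differentiators, adders, and so on can also be built from analog components. Over the decades when analog electronics dominated, engineers accumulated vast experience in solving all kinds of problems, and good developers (even of digital electronics) take this legacy into account [D1].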


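  • Software implementation on a microcontroller
    Are there only a few signals to process, and is the math simple and light on resources? In that case, a relatively inexpensive microcontroller with an ADC, a low clock frequency, and power-saving options is a very good choice. If necessary, the bottlenecks can be written in assembler.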


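  • FPGA implementation
    Suppose the solution has strict requirements for speed, parallelism, and scalability. Then we describe the math as modules in Verilog or VHDL and choose an FPGA that can run at the frequency the processing requires. If the solution proves very successful and mass replication makes sense, welcome to the world of ASICs [L3].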


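  • Hardware-software implementation as a system-on-chip
    Is the system too complex to describe entirely in Verilog? Do you want to write part of the logic in a high-level language and actually manage everything from Linux? In that case the answer is an SoC (System-on-a-Chip). Take a ready-made processor core (Nios II, MIPSfpga, etc.) and surround it with the peripheral modules that perform the tricky math. Some operations can even be exposed as processor instructions [L4]. And yes, this too can later be turned into an ASIC.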


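  • Using a digital signal processor (DSP)
    Here we buy a ready-made chip with a processor core, its own peripherals, and an instruction set dedicated to high-speed digital signal processing. Then we build the solution around it [L10, L5].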


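  • Software implementation on a general-purpose processor
    Each processor vendor offers its own architectural features that speed up certain mathematical operations. Where necessary, the software developer's task is to use those features to accelerate the computations. This is what I describe below for MIPS processors.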


  • Using graphics processors for computing
    To make the list more complete, we should also mention offloading the most complex and resource-intensive computations to a graphics card [L6, L7].

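There is no perfect tool. The best tool solves the problem within an acceptable time frame. The project team must already have the necessary skills for it, and it must be available or obtainable at minimal cost. Budget constraints, customer requirements, and sometimes political considerations also shape such decisions.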


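I hope this article serves as an introduction for readers who need to optimize specific computations on the Baikal-T1 processor. It should also help with other processors built on MIPS cores that support MIPS SIMD technology.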


Computing resources


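Before going further, let us look at one of the most common digital signal processing (DSP) tasks: filtering. As an example, we take a finite impulse response (FIR) filter [L8]. Without going deep into DSP theory and the math, let us focus on the essential part, the equation that describes this type of digital filter:

$$ y(n) = \sum_{i=0}^{P} b_i \, x(n-i) $$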



ここで、 xnは入力信号、 ynは出力信号、 Pはフィルタヌ次数、 biはフィルタヌ係数です。 同じ匏を次のように曞くこずができたす。



この堎合、入力信号xnの性質を無芖したす。 それをADCから取埗したデヌタずしたすが、同じ成功を収めお、ファむルから読み取るこずができたす。 この堎合、私たちにずっおは重芁ではありたせん。 珟圚の蚘事は「DSPに぀いお」ではなく「蚈算に぀いお」であるため、フィルタヌマゞックに飛び蟌むのではなく、オンラむンサヌビスの1぀を䜿甚しお係数を蚈算したす[ L9 ]


Set the desired filter parameters (for example, one of the predefined options, a band-stop or notch filter [L11]) and click the "Design Filter" button.



The result of the calculation is the frequency response of the filter [L12].


Filter frequency response


Coefficients and source code


SampleFilter.h
#ifndef SAMPLEFILTER_H_
#define SAMPLEFILTER_H_

/*
FIR filter designed with
http://t-filter.appspot.com

sampling frequency: 2000 Hz

* 0 Hz - 200 Hz
  gain = 1
  desired ripple = 5 dB
  actual ripple = 3.1077303934211127 dB

* 300 Hz - 500 Hz
  gain = 0
  desired attenuation = -40 dB
  actual attenuation = -42.49314043914754 dB

* 600 Hz - 1000 Hz
  gain = 1
  desired ripple = 5 dB
  actual ripple = 3.1077303934211127 dB
*/

#define SAMPLEFILTER_TAP_NUM 25

typedef struct {
  double history[SAMPLEFILTER_TAP_NUM];  /* ring buffer of the last TAP_NUM input samples */
  unsigned int last_index;               /* position where the next sample will be written */
} SampleFilter;

void SampleFilter_init(SampleFilter* f);
void SampleFilter_put(SampleFilter* f, double input);
double SampleFilter_get(SampleFilter* f);

#endif

SampleFilter.c
#include "SampleFilter.h"

static double filter_taps[SAMPLEFILTER_TAP_NUM] = {
  0.037391727827352596,
  -0.03299884552335979,
  0.044230583967321345,
  0.0023050970833628304,
  -0.06768087195950104,
  -0.046347105409124706,
  -0.011717387509232432,
  -0.0707342284185183,
  -0.049766517282999544,
  0.16086413543836361,
  0.21561058688743148,
  -0.10159456907827959,
  0.6638637561392535,
  -0.10159456907827959,
  0.21561058688743148,
  0.16086413543836361,
  -0.049766517282999544,
  -0.0707342284185183,
  -0.011717387509232432,
  -0.046347105409124706,
  -0.06768087195950104,
  0.0023050970833628304,
  0.044230583967321345,
  -0.03299884552335979,
  0.037391727827352596
};

/* Clears the sample history. */
void SampleFilter_init(SampleFilter* f) {
  int i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i)
    f->history[i] = 0;
  f->last_index = 0;
}

/* Stores a new input sample, overwriting the oldest one. */
void SampleFilter_put(SampleFilter* f, double input) {
  f->history[f->last_index++] = input;
  if(f->last_index == SAMPLEFILTER_TAP_NUM)
    f->last_index = 0;
}

/* Computes one output sample: walks the ring buffer from newest to oldest
   and multiplies each sample by the matching coefficient. */
double SampleFilter_get(SampleFilter* f) {
  double acc = 0;
  int index = f->last_index, i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
    index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
    acc += f->history[index] * filter_taps[i];
  }
  return acc;
}

Look at the filter parameters and the SampleFilter_get function, recall the FIR filter formula above, and note the most important points:



Now suppose that, for some objective reasons, the requirements of the task have been refined:



As a result, we get the following frequency response:


Filter frequency response


What matters to us:



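Please do not focus on the absolute values of the filter parameters in this example. The main idea I want to convey is that, at some point, the resources a computation consumes may grow beyond previously forecast limits. After a small change to the algorithm, its parameters, or the input data, it may suddenly turn out that the processor core does nothing but "shovel" data. It may not even keep up with that, let alone run other tasks. Or the data is processed correctly, but the resulting load or computation time does not meet the system requirements. At that point, optimization becomes more urgent than ever.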
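To put a number on that load, take the original example from SampleFilter.h: 25 taps at a 2000 Hz sampling rate. Assume SampleFilter_get is called once per input sample. Each output sample then costs one multiply and one add per tap, so the filter needs:

$$ N_{\text{MAC}} = N_{\text{taps}} \cdot f_s = 25 \cdot 2000\ \text{s}^{-1} = 5 \cdot 10^{4}\ \text{multiply-accumulates per second} $$

The load grows linearly in both factors. Doubling either the number of taps or the sampling rate doubles the work the core has to do.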


Computation speed


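Now that a simple example has shown us the problem, let us look at how to solve it. Processor manufacturers use several tricks to speed up computation. They raise the clock frequency, add processor cores, and introduce new instructions. They also experiment with pipeline organization and cache sizes, and move to faster buses and interfaces.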


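None of this takes away the developer's own options. You can still write the worst bottlenecks in assembler and experiment with compiler options. You can also replace the algorithm with one that uses fewer resources or behaves more predictably over the same range of parameters.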


I will focus on two ways of increasing processing speed that cannot be used without support from the processor architecture:



Combining arithmetic operations


Let us return to the filter and take a closer look at the code of the SampleFilter_get function:


double SampleFilter_get(SampleFilter* f) {
  double acc = 0;
  int index = f->last_index, i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
    index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
    acc += f->history[index] * filter_taps[i];
  }
  return acc;
}

In particular, at the line


 acc += f->history[index] * filter_taps[i]; 

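Here we see two operations executed one after the other: a multiplication by a coefficient, and the accumulation of its result in the accumulator variable. This combination of multiplication and addition is very common in DSP algorithms. If these two operations so often sit side by side, why not merge them into a single instruction executed in one cycle? Engineers had this idea long ago. The result was the multiply-accumulate instruction (also called fused multiply-add or MAC), which is present in practically every digital signal processor.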
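The pattern can be made explicit in C. The sketch below uses Q15 fixed point, a format commonly used with DSP MAC instructions. It assumes a 64-bit accumulator for headroom and GCC's arithmetic right shift on negative values. `mac_q15` and `fir_q15` are hypothetical names. On hardware with a MAC instruction (MSA has MADDV and MADD_Q), each `mac_q15` step would correspond to one instruction; here it is plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* One multiply-accumulate step: acc + a * b.
   A hardware MAC instruction performs this as a single operation. */
static inline int64_t mac_q15(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;   /* Q15 * Q15 = Q30 product */
}

/* One FIR output in Q15: sum of x[i] * h[i], one MAC per tap. */
int16_t fir_q15(const int16_t *x, const int16_t *h, size_t taps)
{
    int64_t acc = 0;   /* wide accumulator: headroom against overflow */
    for (size_t i = 0; i < taps; ++i)
        acc = mac_q15(acc, x[i], h[i]);

    /* Q30 -> Q15 with round-half-up (assumes arithmetic >> for negatives),
       then saturate to the int16 range instead of wrapping around. */
    acc = (acc + (1 << 14)) >> 15;
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

For example, 0.5 * 0.5 + 0.5 * 0.5 in Q15 (16384 for each value) gives 16384, i.e. 0.5, and sums that exceed the Q15 range saturate at 32767.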



The SIMD approach


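Now let us approach the problem from the other side. Why process each sample x(n) separately? We could combine several samples into one vector (array) and apply an instruction to the whole vector at once, or more precisely, to every element of the vector simultaneously. Then several samples are processed in a single cycle (not counting memory accesses).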



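All else being equal, the larger the maximum supported vector size, the higher the data processing rate. This way of organizing computation is called SIMD (Single Instruction, Multiple Data) [L13]. Vector processors [L14] are built on this approach, and so are the vector extensions of scalar processors: SSE and AVX for the x86 architecture, and MIPS SIMD [L15] for the MIPS architecture.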
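You can try the idea on any machine with the GCC/Clang vector extensions. They use the same `__attribute__((vector_size))` mechanism as the MSA code later in this article. The sketch below assumes 4-lane float vectors, and `fir4_simd` is a hypothetical name. It computes four consecutive FIR outputs per loop iteration, so one vector multiply and one vector add perform four multiply-accumulates. Whether the compiler emits SSE, MSA, or scalar code for it depends on the target and the flags.

```c
#include <stddef.h>
#include <string.h>

/* Four single-precision lanes in one 128-bit vector (GCC/Clang extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Computes four FIR outputs at once:
   y[k] = sum_{i=0}^{taps-1} b[i] * x[n + k - i], for k = 0..3.
   Requires n >= taps - 1 and x valid up to x[n + 3]. */
void fir4_simd(const float *b, size_t taps, const float *x, size_t n, float y[4])
{
    v4sf acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < taps; ++i) {
        v4sf xs;
        /* lanes hold x[n-i], x[n-i+1], x[n-i+2], x[n-i+3];
           memcpy avoids assuming 16-byte alignment */
        memcpy(&xs, &x[n - i], sizeof xs);
        /* broadcast the coefficient to all lanes (compare MSA FILL/SPLAT) */
        v4sf bi = {b[i], b[i], b[i], b[i]};
        /* one vector multiply and one vector add: four MACs */
        acc += bi * xs;
    }
    memcpy(y, &acc, sizeof acc);
}
```

With b = {1, 2}, x = {1, 2, 3, 4, 5} and n = 1, the four lanes yield 4, 7, 10 and 13 in a single pass over the two taps.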


MIPS SIMD


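Now that we understand the basic principles behind vector architecture extensions, we can turn to MIPS SIMD itself. A comprehensive description of the extension is given in the documentation [D2], but let us note the main points: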



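Integer arithmetic operations
Mnemonic | Description
ADDV, ADDVI | Add
ADD_A, ADDS_A | Add absolute values; saturated add of absolute values
ADDS_S, ADDS_U | Signed and unsigned saturated add
HADD_S, HADD_U | Signed and unsigned horizontal add
ASUB_S, ASUB_U | Absolute value of signed and unsigned subtract
AVE_S, AVE_U | Signed and unsigned average
AVER_S, AVER_U | Signed and unsigned average with rounding
DOTP_S, DOTP_U | Signed and unsigned dot product
DPADD_S, DPADD_U | Signed and unsigned dot product add
DPSUB_S, DPSUB_U | Signed and unsigned dot product subtract
DIV_S, DIV_U | Signed and unsigned divide
MADDV | Multiply-add
MAX_A, MIN_A | Maximum and minimum of absolute values
MAX_S, MAXI_S, MAX_U, MAXI_U | Signed and unsigned maximum
MIN_S, MINI_S, MIN_U, MINI_U | Signed and unsigned minimum
MSUBV | Multiply-subtract
MULV | Multiply
MOD_S, MOD_U | Signed and unsigned remainder (modulo)
SAT_S, SAT_U | Signed and unsigned saturate
SUBS_S, SUBS_U | Signed and unsigned saturated subtract
HSUB_S, HSUB_U | Signed and unsigned horizontal subtract
SUBSUU_S | Signed saturated subtract of unsigned values
SUBSUS_U | Unsigned saturated subtract of signed from unsigned
SUBV, SUBVI | Subtract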

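Bitwise operations
Mnemonic | Description
AND, ANDI | Logical AND
BCLR, BCLRI | Bit clear
BINSL, BINSLI, BINSR, BINSRI | Bit insert left and right
BMNZ, BMNZI | Bit move if not zero
BMZ, BMZI | Bit move if zero
BNEG, BNEGI | Bit negate
BSEL, BSELI | Bit select
BSET, BSETI | Bit set
NLOC | Leading one bits count
NLZC | Leading zero bits count
NOR, NORI | Logical negated OR
PCNT | Population count (number of bits set to 1)
OR, ORI | Logical OR
SLL, SLLI | Shift left
SRA, SRAI | Shift right arithmetic
SRAR, SRARI | Shift right arithmetic with rounding
SRL, SRLI | Shift right logical
SRLR, SRLRI | Shift right logical with rounding
XOR, XORI | Logical exclusive OR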
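Several of these operations appear in the filter example later on: ADDS_S, SAT_S, SRARI and XORI. When checking vectorized code against plain C, a scalar model of a single lane can help. The sketch below reflects my reading of the instruction semantics in [D2]. It is not code from any MIPS library, the function names are hypothetical, and it assumes GCC's arithmetic right shift for negative values.

```c
#include <stdint.h>

/* ADDS_S.H lane: signed 16-bit add that saturates instead of wrapping. */
int16_t adds_s_h_lane(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* SRARI.H lane: arithmetic shift right by m with rounding; the last bit
   shifted out is added back (round half up). Assumes 0 <= m <= 15. */
int16_t srari_h_lane(int16_t x, unsigned m)
{
    int32_t v = x;
    if (m == 0)
        return x;
    return (int16_t)((v >> m) + ((v >> (m - 1)) & 1));
}

/* SAT_S.H lane: saturate a signed value to m + 1 bits, i.e. to the range
   [-2^m, 2^m - 1]; m = 7 gives the int8 range. */
int16_t sat_s_h_lane(int16_t x, unsigned m)
{
    int32_t hi = (1 << m) - 1, lo = -(1 << m);
    if (x > hi) return (int16_t)hi;
    if (x < lo) return (int16_t)lo;
    return x;
}

/* XORI.B with 128, then reading the byte as signed: maps a pixel in
   0..255 to -128..127, which equals u - 128. */
int warp_u8(uint8_t u)
{
    uint8_t w = (uint8_t)(u ^ 0x80);
    return w >= 128 ? (int)w - 256 : (int)w;
}
```

Note that srari_h_lane(x, 7) gives the same result as the scalar ROUND_POWER_OF_TWO(x, 7) used in the reference filter further on, because both add half of the divisor before shifting.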

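Floating-point arithmetic operations
Mnemonic | Description
FADD | Floating-point addition
FDIV | Floating-point division
FEXP2 | Floating-point base 2 exponentiation
FLOG2 | Floating-point base 2 logarithm
FMADD, FMSUB | Floating-point fused multiply-add and multiply-subtract
FMAX, FMIN | Floating-point maximum and minimum
FMAX_A, FMIN_A | Floating-point maximum and minimum of absolute values
FMUL | Floating-point multiplication
FRCP | Approximate floating-point reciprocal
FRINT | Floating-point round to integer
FRSQRT | Approximate floating-point reciprocal of square root
FSQRT | Floating-point square root
FSUB | Floating-point subtraction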

Non-arithmetic floating-point operations
Mnemonic | Description
FCLASS | Floating-point class mask

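Floating-point compare operations
Mnemonic | Description
FCAF | Floating-point quiet compare always false
FCUN | Floating-point quiet compare unordered
FCOR | Floating-point quiet compare ordered
FCEQ | Floating-point quiet compare equal
FCUNE | Floating-point quiet compare unordered or not equal
FCUEQ | Floating-point quiet compare unordered or equal
FCNE | Floating-point quiet compare not equal
FCLT | Floating-point quiet compare less than
FCULT | Floating-point quiet compare unordered or less than
FCLE | Floating-point quiet compare less than or equal
FCULE | Floating-point quiet compare unordered or less than or equal
FSAF | Floating-point signaling compare always false
FSUN | Floating-point signaling compare unordered
FSOR | Floating-point signaling compare ordered
FSEQ | Floating-point signaling compare equal
FSUNE | Floating-point signaling compare unordered or not equal
FSUEQ | Floating-point signaling compare unordered or equal
FSNE | Floating-point signaling compare not equal
FSLT | Floating-point signaling compare less than
FSULT | Floating-point signaling compare unordered or less than
FSLE | Floating-point signaling compare less than or equal
FSULE | Floating-point signaling compare unordered or less than or equal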

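Floating-point conversion operations
Mnemonic | Description
FEXDO | Floating-point down-convert interchange format
FEXUPL, FEXUPR | Left-half and right-half floating-point up-convert interchange format
FFINT_S, FFINT_U | Floating-point convert from signed and unsigned integer
FFQL, FFQR | Left-half and right-half floating-point convert from fixed-point
FTINT_S, FTINT_U | Floating-point round and convert to signed and unsigned integer
FTRUNC_S, FTRUNC_U | Floating-point truncate and convert to signed and unsigned integer
FTQ | Floating-point round and convert to fixed-point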

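Fixed-point operations
Mnemonic | Description
MADD_Q, MADDR_Q | Fixed-point multiply and add, without and with rounding
MSUB_Q, MSUBR_Q | Fixed-point multiply and subtract, without and with rounding
MUL_Q, MULR_Q | Fixed-point multiply, without and with rounding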

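Branch and compare operations
Mnemonic | Description
BNZ | Branch if not zero
BZ | Branch if zero
CEQ, CEQI | Compare equal
CLE_S, CLEI_S, CLE_U, CLEI_U | Compare less than or equal, signed and unsigned
CLT_S, CLTI_S, CLT_U, CLTI_U | Compare less than, signed and unsigned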

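Vector load, store and move operations
Mnemonic | Description
CFCMSA, CTCMSA | Copy from and copy to MSA control register
LD | Load vector
LDI | Load immediate
MOVE | Vector to vector move
SPLAT, SPLATI | Replicate vector element
FILL | Fill vector from GPR
INSERT, INSVE | Insert GPR, and vector element 0, into a vector element
COPY_S, COPY_U | Copy element to GPR, signed and unsigned
ST | Store vector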

Vector permutation operations
Mnemonic | Description
ILVEV, ILVOD | Interleave even, odd
ILVL, ILVR | Interleave left, right
PCKEV, PCKOD | Pack even and odd elements
SHF | Set shuffle
SLD, SLDI | Element slide
VSHF | Vector shuffle
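The filter example later uses interleave and pack instructions to widen 8-bit pixels to 16 bits and then narrow them back. Below is a byte-level sketch of ILVR.B, ILVL.B and PCKEV.B, written from my understanding of the spec. In that reading, element 0 is the rightmost (least significant) element, and "left" means elements 8 to 15. The function names are hypothetical. Suppose a byte vector is interleaved with a vector of its sign bytes, as produced by CLTI_S.B with 0 (0xFF for negative elements). On a little-endian target this yields sign-extended halfwords, and PCKEV.B of the two halves restores the original bytes.

```c
#include <stdint.h>

/* ILVR.B: interleave the right halves (elements 0..7);
   even result bytes come from wt, odd result bytes from ws. */
void ilvr_b(uint8_t wd[16], const uint8_t ws[16], const uint8_t wt[16])
{
    for (int i = 0; i < 8; ++i) {
        wd[2 * i] = wt[i];
        wd[2 * i + 1] = ws[i];
    }
}

/* ILVL.B: the same for the left halves (elements 8..15). */
void ilvl_b(uint8_t wd[16], const uint8_t ws[16], const uint8_t wt[16])
{
    for (int i = 0; i < 8; ++i) {
        wd[2 * i] = wt[8 + i];
        wd[2 * i + 1] = ws[8 + i];
    }
}

/* PCKEV.B: even bytes of wt fill the right half of wd,
   even bytes of ws fill the left half. */
void pckev_b(uint8_t wd[16], const uint8_t ws[16], const uint8_t wt[16])
{
    for (int i = 0; i < 8; ++i) {
        wd[i] = wt[2 * i];
        wd[8 + i] = ws[2 * i];
    }
}

/* Reads halfword j of a little-endian 16-byte vector as a signed value. */
int halfword_le(const uint8_t v[16], int j)
{
    int u = v[2 * j] | (v[2 * j + 1] << 8);
    return u >= 32768 ? u - 65536 : u;
}
```

The output buffer must not alias either input, since these models write wd while still reading ws and wt.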

Other operations
Mnemonic | Description
LSA | Left-shift add or load/store address calculation


Formal description of the instructions


Compiler-level support


MIPS SIMD is supported by the gcc compiler, but this support has its own peculiarities:
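A common way to live with optional MSA support is to keep a scalar fallback and select the vector path at compile time. The sketch below assumes that GCC defines the `__mips_msa` macro under `-mmsa`. It also assumes that GCC's `msa.h` header provides the `v16u8` type and `__builtin_msa_adds_u_b`. The function `add_sat_u8` is a hypothetical example. On a target without MSA, only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __mips_msa
#include <msa.h>
#endif

/* dst[i] = min(a[i] + b[i], 255) for n bytes (unsigned saturated add). */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#ifdef __mips_msa
    /* Vector path: 16 bytes per iteration with ADDS_U.B. */
    for (; i + 16 <= n; i += 16) {
        v16u8 va, vb, vr;
        memcpy(&va, a + i, 16);
        memcpy(&vb, b + i, 16);
        vr = __builtin_msa_adds_u_b(va, vb);
        memcpy(dst + i, &vr, 16);
    }
#endif
    /* Scalar path: the remaining bytes, or everything without MSA. */
    for (; i < n; ++i) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```

Both paths must produce identical results, so the scalar loop also serves as a reference when testing the vector one.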



Code before optimization
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

static inline unsigned char clip_pixel(int i32Val)
{
    return ((i32Val) > 255) ? 255u : ((i32Val) < 0) ? 0u : (i32Val);
}

void vert_filter_8taps_16width_c(unsigned char *pSrc,   // SOURCE POINTER
                                 int SrcStride,         // SOURCE BUFFER PITCH
                                 unsigned char *pDst,   // DEST POINTER
                                 int DstStride,         // DEST BUFFER PITCH
                                 signed char *pFilter,  // POINTER TO FILTER BANK (SIGNED: COEFFS MAY BE NEGATIVE)
                                 int Height)            // HEIGHT OF THE BLOCK
{
    unsigned int Row, Col;
    int FiltSum;
    short Src0, Src1, Src2, Src3, Src4, Src5, Src6, Src7;

    pSrc -= (8 / 2 - 1) * SrcStride; // MOVE INPUT SRC POINTER TO APPROPRIATE POSITION

    // LOOP FOR NUMBER OF COLUMNS-16
    for (Col = 0; Col < 16; ++Col) {
        Src0 = pSrc[0 * SrcStride];
        Src1 = pSrc[1 * SrcStride];
        Src2 = pSrc[2 * SrcStride];
        Src3 = pSrc[3 * SrcStride];
        Src4 = pSrc[4 * SrcStride];
        Src5 = pSrc[5 * SrcStride];
        Src6 = pSrc[6 * SrcStride];

        // LOOP FOR NUMBER OF ROWS
        for (Row = 0; Row < (unsigned int)Height; Row++) {
            Src7 = pSrc[(7 + Row) * SrcStride];
            FiltSum = 0;
            // ACCUMULATED FILTER SUM += PIXEL * FILTER COEFF
            FiltSum += (Src0 * pFilter[0]);
            FiltSum += (Src1 * pFilter[1]);
            FiltSum += (Src2 * pFilter[2]);
            FiltSum += (Src3 * pFilter[3]);
            FiltSum += (Src4 * pFilter[4]);
            FiltSum += (Src5 * pFilter[5]);
            FiltSum += (Src6 * pFilter[6]);
            FiltSum += (Src7 * pFilter[7]);
            FiltSum = ROUND_POWER_OF_TWO(FiltSum, 7);     // ROUNDING
            pDst[Row * DstStride] = clip_pixel(FiltSum);  // CLIP RESULT IN 0-255 (UNSIGNED CHAR)

            // PREPARING FOR NEXT CONVOLUTION - SLIDING WINDOW
            Src0 = Src1;
            Src1 = Src2;
            Src2 = Src3;
            Src3 = Src4;
            Src4 = Src5;
            Src5 = Src6;
            Src6 = Src7;
        }
        pSrc += 1;
        pDst += 1;
    }
}

Code after optimization with MIPS SIMD
/* MSA VECTOR TYPES */
/* BUILD WITH MSA ENABLED: gcc -mmsa (GCC ALSO REQUIRES -mfp64 -mhard-float) */
#define WRLEN 128               // VECTOR REGISTER LENGTH 128-BIT
#define NUMWRELEM (WRLEN >> 3)  // VECTOR SIZE IN BYTES (16)

typedef signed char   IMG_VINT8  __attribute__ ((vector_size(NUMWRELEM))); // VEC SIGNED BYTES
typedef unsigned char IMG_VUINT8 __attribute__ ((vector_size(NUMWRELEM))); // VEC UNSIGNED BYTES
typedef short         IMG_VINT16 __attribute__ ((vector_size(NUMWRELEM))); // VEC SIGNED HALF-WORDS

/* LOADS 16 PIXELS AND WIDENS THEM TO TWO VECTORS OF 8 SIGNED HALF-WORDS */
#define LOAD_UNPACK_VEC(pSrc, SrcStride, vi16VecRight, vi16VecLeft) \
{ \
    IMG_VUINT8 vu8Src; \
    IMG_VINT8 vi8Vec0; \
    IMG_VINT8 vi8Tmp0; \
    /* LOAD INPUT VECTOR */ \
    vu8Src = *((IMG_VUINT8 *)(pSrc)); \
    /* RANGE WARPING 0..255 -> -128..127 TO MAINTAIN 16 BIT PRECISION */ \
    vi8Vec0 = (IMG_VINT8)__builtin_msa_xori_b(vu8Src, 128); \
    /* CALCULATE SIGN EXTENSION: 0xFF FOR NEGATIVE BYTES, 0x00 OTHERWISE */ \
    vi8Tmp0 = __builtin_msa_clti_s_b(vi8Vec0, 0); \
    /* INTERLEAVE RIGHT (ELEMENTS 0..7) TO 16 BIT VEC */ \
    vi16VecRight = (IMG_VINT16)__builtin_msa_ilvr_b(vi8Tmp0, vi8Vec0); \
    /* INTERLEAVE LEFT (ELEMENTS 8..15) TO 16 BIT VEC */ \
    vi16VecLeft = (IMG_VINT16)__builtin_msa_ilvl_b(vi8Tmp0, vi8Vec0); \
    pSrc += SrcStride; \
}

void vert_filter_8taps_16width_msa(unsigned char *pSrc,   // SOURCE POINTER
                                   int SrcStride,         // SOURCE BUFFER PITCH
                                   unsigned char *pDst,   // DEST POINTER
                                   int DstStride,         // DEST BUFFER PITCH
                                   signed char *pFilter,  // POINTER TO FILTER BANK
                                   int Height)            // HEIGHT OF THE BLOCK
{
    int u32LoopCnt;
    IMG_VINT16 vi16Vec0Right, vi16Vec1Right, vi16Vec2Right, vi16Vec3Right;
    IMG_VINT16 vi16Vec4Right, vi16Vec5Right, vi16Vec6Right, vi16Vec7Right;
    IMG_VINT16 vi16Vec0Left, vi16Vec1Left, vi16Vec2Left, vi16Vec3Left;
    IMG_VINT16 vi16Vec4Left, vi16Vec5Left, vi16Vec6Left, vi16Vec7Left;
    IMG_VINT16 vi16Temp1Right, vi16Temp1Left;
    IMG_VINT16 vi16Filt0, vi16Filt1, vi16Filt2, vi16Filt3;
    IMG_VINT16 vi16Filt4, vi16Filt5, vi16Filt6, vi16Filt7;

    pSrc -= (3 * SrcStride);

    // PREPARE FILTER COEFF IN VEC REGISTERS (EACH COEFF BROADCAST TO ALL 8 LANES)
    vi16Filt0 = __builtin_msa_fill_h(*(pFilter));
    vi16Filt1 = __builtin_msa_fill_h(*(pFilter + 1));
    vi16Filt2 = __builtin_msa_fill_h(*(pFilter + 2));
    vi16Filt3 = __builtin_msa_fill_h(*(pFilter + 3));
    vi16Filt4 = __builtin_msa_fill_h(*(pFilter + 4));
    vi16Filt5 = __builtin_msa_fill_h(*(pFilter + 5));
    vi16Filt6 = __builtin_msa_fill_h(*(pFilter + 6));
    vi16Filt7 = __builtin_msa_fill_h(*(pFilter + 7));

    // LOAD 7 INPUT VECTORS
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec0Right, vi16Vec0Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec1Right, vi16Vec1Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec2Right, vi16Vec2Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec3Right, vi16Vec3Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec4Right, vi16Vec4Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec5Right, vi16Vec5Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec6Right, vi16Vec6Left)

    // START CONVOLUTION VERTICALLY
    for (u32LoopCnt = Height; u32LoopCnt--; ) {
        // LOAD 8TH INPUT VECTOR
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec7Right, vi16Vec7Left)

        /* FILTER CALC */
        IMG_VINT16 vi16Tmp1, vi16Tmp2;
        IMG_VINT8 vi8Tmp3;

        // 8 TAP VECTORIZED CONVOLUTION FOR RIGHT HALF
        vi16Tmp1  = (vi16Vec0Right * vi16Filt0);
        vi16Tmp1 += (vi16Vec1Right * vi16Filt1);
        vi16Tmp1 += (vi16Vec2Right * vi16Filt2);
        vi16Tmp1 += (vi16Vec3Right * vi16Filt3);
        vi16Tmp2  = (vi16Vec4Right * vi16Filt4);
        vi16Tmp2 += (vi16Vec5Right * vi16Filt5);
        vi16Tmp2 += (vi16Vec6Right * vi16Filt6);
        vi16Tmp2 += (vi16Vec7Right * vi16Filt7);
        vi16Temp1Right = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

        // 8 TAP VECTORIZED CONVOLUTION FOR LEFT HALF
        vi16Tmp1  = (vi16Vec0Left * vi16Filt0);
        vi16Tmp1 += (vi16Vec1Left * vi16Filt1);
        vi16Tmp1 += (vi16Vec2Left * vi16Filt2);
        vi16Tmp1 += (vi16Vec3Left * vi16Filt3);
        vi16Tmp2  = (vi16Vec4Left * vi16Filt4);
        vi16Tmp2 += (vi16Vec5Left * vi16Filt5);
        vi16Tmp2 += (vi16Vec6Left * vi16Filt6);
        vi16Tmp2 += (vi16Vec7Left * vi16Filt7);
        vi16Temp1Left = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

        // ROUNDING RIGHT SHIFT, RANGE CLIPPING (TO -128..127) AND NARROWING
        vi16Temp1Right = __builtin_msa_srari_h(vi16Temp1Right, 7);
        vi16Temp1Right = __builtin_msa_sat_s_h(vi16Temp1Right, 7);
        vi16Temp1Left  = __builtin_msa_srari_h(vi16Temp1Left, 7);
        vi16Temp1Left  = __builtin_msa_sat_s_h(vi16Temp1Left, 7);
        vi8Tmp3 = __builtin_msa_pckev_b((IMG_VINT8)vi16Temp1Left, (IMG_VINT8)vi16Temp1Right);
        // UNDO RANGE WARPING: -128..127 -> 0..255
        vi8Tmp3 = (IMG_VINT8)__builtin_msa_xori_b((IMG_VUINT8)vi8Tmp3, 128);

        // STORE OUTPUT VEC
        *((IMG_VINT8 *)(pDst)) = (vi8Tmp3);
        pDst += DstStride;

        // PREPARING FOR NEXT CONVOLUTION - SLIDING WINDOW
        vi16Vec0Right = vi16Vec1Right;
        vi16Vec1Right = vi16Vec2Right;
        vi16Vec2Right = vi16Vec3Right;
        vi16Vec3Right = vi16Vec4Right;
        vi16Vec4Right = vi16Vec5Right;
        vi16Vec5Right = vi16Vec6Right;
        vi16Vec6Right = vi16Vec7Right;
        vi16Vec0Left = vi16Vec1Left;
        vi16Vec1Left = vi16Vec2Left;
        vi16Vec2Left = vi16Vec3Left;
        vi16Vec3Left = vi16Vec4Left;
        vi16Vec4Left = vi16Vec5Left;
        vi16Vec5Left = vi16Vec6Left;
        vi16Vec6Left = vi16Vec7Left;
    }
}

Performance evaluation


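At first I planned to write a simple application (a synthetic test) to measure the performance gain from MIPS SIMD. For all its appeal, such a test says little, because it is isolated from the real tasks users run. Fortunately, engineers from Imagination Technologies and MIPS have contributed substantially to ffmpeg [L18], a widely used open-source application for converting audio and video [L19]. I believe they know better than anyone how to use this technology properly, which means this code should be about as efficient as it gets.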


So we can build ffmpeg in two versions, with and without MIPS SIMD support, and compare their speed on the same input data. The results then tell us how effective vector computing is.


Building ffmpeg


The build runs on an x86 machine under Linux, using the development tools from the Imagination Technologies website [L20] in cross-compilation mode. Testing uses ffmpeg 3.3 [L19], the latest stable release at the time of writing.


ffmpeg configuration for the version with MIPS SIMD support


 ./configure --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv 

And with MIPS SIMD support disabled


 ./configure --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv --disable-msa 

Description of the configuration options
Parameter | Description
--enable-cross-compile | The build runs on a machine whose architecture differs from the target architecture
--prefix=../ffmpeg-msa | Directory where the files are placed after the make install command
--cross-prefix=../mips-mti-linux-gnu- | Path to the toolchain
--arch=mips | Target architecture: MIPS
--cpu=p5600 | Target processor core: P5600
--target-os=linux | Target OS: Linux
--extra-cflags="-EL -static" | The target system is little-endian; use static linking
--extra-ldflags="-EL -static" | Same as above, for the linker
--disable-iconv | Disables iconv-based text encoding conversion
--disable-msa | Do not use MIPS SIMD

If you repeat these steps, note that the ffmpeg 3.3 build with MIPS SIMD support fails with a minor error. To fix it, add the following line to the file libavcodec/mips/hevcpred_msa.c:


 #include "libavcodec/hevcdec.h" 

Testing


Testing is performed on the Baikal-T1 processor.


 # uname -a Linux baikal-BFK-18446744073709551615 4.4.41-bfk #0 SMP Tue Apr 25 15:54:24 MSK 2017 mips GNU/Linux 

The input consists of two videos, one encoded with x264 [L21] and one with x265 [L22]. The test task is to decode each video and save a screenshot at regular intervals.


./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-msa_x264_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-msa_x265_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-soft_x264_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-soft_x265_%d.jpg -report -benchmark

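Description of the launch options
Parameter | Description
-i ./Tears_400_x265.mp4 | File to process
-vf fps=1 | Screenshot rate: fps=1 (one frame per second) for the short clip, fps=1/10 (one frame every 10 seconds) for the long one
./out_img/ffmpeg-soft_x264_%d.jpg | Output file name template
-report | Write a report on the run
-benchmark | Include performance data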

ffmpeg results


スクリプト継続時間秒
x264 MIPS SIMD113
x265 MIPS SIMD22
x264 MIPS SIMD164
x265 MIPS SIMD52

Thus, using MIPS SIMD speeds up decoding in these tests by a factor of about 1.5 to 2.4.
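Those factors are ratios of the measured durations, with the run time without MIPS SIMD divided by the run time with it. This assumes the faster runs in the table are the MSA builds; the x264 report below confirms it with utime=113.070s for the ffmpeg-msa binary.

$$ k_{x264} = \frac{164}{113} \approx 1.45, \qquad k_{x265} = \frac{52}{22} \approx 2.36 $$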


The complete ffmpeg reports are available on GitHub [L23].


x264 with MIPS SIMD (ffmpeg report, abridged)
 ffmpeg started on 2010-10-18 at 00:30:26 Report written to "ffmpeg-20101018-003026.log" Command line: ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-msa_x264_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'. Reading option './out_img/ffmpeg-msa_x264_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4. Successfully parsed a group of options. Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4. 
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: isom [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2 [h264 @ 0x1fcecb0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1fcecb0] nal_unit_type: 8, nal_ref_idc: 3 [h264 @ 0x1fcecb0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1fcecb0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1fcecb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1fcecb0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1fcecb0] no picture [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2007-02-19T05:03:04.000000Z Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default) Metadata: 
creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default) Metadata: creation_time : 2007-02-19T05:03:08.000000Z handler_name : GPAC ISO Audio Handler Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-msa_x264_%d.jpg. Applying option vf (set video filters) with argument fps=1/10. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-msa_x264_%d.jpg. Successfully opened the file. detected 2 logical cores [h264 @ 0x20191e0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x20191e0] nal_unit_type: 8, nal_ref_idc: 3 Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x20191e0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x20191e0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x20191e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x20191e0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x20191e0] no picture cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x2026050] nal_unit_type: 1, nal_ref_idc: 2 [h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 0 cur_dts is invalid (this is harmless if it occurs 
once at the start per stream) [h264 @ 0x20191e0] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x202bf50] Setting 'fps' to value '1/10' [Parsed_fps_0 @ 0x202bf50] fps=1/10 [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'video_size' to value '1280x544' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'frame_rate' to value '24000/1001' [graph 0 input from stream 0:0 @ 0x202c6f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2 [format @ 0x202c030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x202c030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x202c660] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x202c660] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x202c030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x202bb80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x202c660] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x202cf00] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x202c660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1ff5f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1ff5f90] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-msa_x264_%d.jpg': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 200 
kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default) Metadata: creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). cur_dts is invalid (this is harmless if it occurs once at the start per stream) ... [h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [file @ 0x209dc40] Setting default whitelist 'file,crypto' [AVIOContext @ 0x2189ab0] Statistics: 0 seeks, 1 writeouts frame= 15 fps=0.2 q=1.6 size=N/A time=00:02:30.00 bitrate=N/A speed=2.03x No more output streams to write to, finishing. frame= 15 fps=0.2 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=2.03x video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./The Simpsons Movie - Trailer_x264.mp4): Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded; Input stream #0:1 (audio): 1 packets read (134 bytes); Total: 3289 packets (72364602 bytes) demuxed Output file #0 (./out_img/ffmpeg-msa_x264_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes); Total: 15 packets (1414925 bytes) muxed bench: utime=113.070s 3288 frames successfully decoded, 0 decoding errors bench: maxrss=39264kB [Parsed_fps_0 @ 0x202bf50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1fd6230] Statistics: 73517562 bytes read, 5 seeks 

x265 with MIPS SIMD (ffmpeg report, abridged)
 ffmpeg started on 2010-10-18 at 00:27:58 Report written to "ffmpeg-20101018-002758.log" Command line: ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-msa_x265_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'. Reading option './out_img/ffmpeg-msa_x265_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./Tears_400_x265.mp4. Successfully parsed a group of options. Opening an input file: ./Tears_400_x265.mp4. 
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: iso4 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1 [hevc @ 0x1fceca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding VPS [hevc @ 0x1fceca0] Main profile bitstream [hevc @ 0x1fceca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding SPS [hevc @ 0x1fceca0] Main profile bitstream [hevc @ 0x1fceca0] Decoding VUI [hevc @ 0x1fceca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding PPS [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 creation_time : 2014-08-25T18:10:46.000000Z Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-msa_x265_%d.jpg. Applying option vf (set video filters) with argument fps=1. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-msa_x265_%d.jpg. Successfully opened the file. 
detected 2 logical cores [hevc @ 0x1fe5a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding VPS [hevc @ 0x1fe5a00] Main profile bitstream [hevc @ 0x1fe5a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding SPS [hevc @ 0x1fe5a00] Main profile bitstream [hevc @ 0x1fe5a00] Decoding VUI [hevc @ 0x1fe5a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding PPS Stream mapping: Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding SEI [hevc @ 0x1fe5a00] Skipped PREFIX SEI 5 [hevc @ 0x1fe5a00] Decoding SEI [hevc @ 0x1fe5a00] Skipped PREFIX SEI 6 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1ffeba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x200c4d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x200c4d0] Output frame with POC 0. [hevc @ 0x1fe5a00] Decoded frame with POC 0. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1fe5a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Output frame with POC 1. [hevc @ 0x1ffeba0] Decoded frame with POC 5. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1ffeba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1ffeba0] Output frame with POC 2. [hevc @ 0x200c4d0] Decoded frame with POC 3. 
[Parsed_fps_0 @ 0x201a6c0] Setting 'fps' to value '1' [Parsed_fps_0 @ 0x201a6c0] fps=1/1 [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'video_size' to value '1920x800' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'frame_rate' to value '24/1' [graph 0 input from stream 0:0 @ 0x201abb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2 [format @ 0x201aad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x201aad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x201a350] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x201a350] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x201aad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x201a2f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x201a350] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x21d7da0] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x201a350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1fe2ba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1fe2ba0] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-msa_x265_%d.jpg': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 fps, 1 tbn, 1 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z 
handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) ... No more output streams to write to, finishing. frame= 15 fps=0.7 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.668x video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./Tears_400_x265.mp4): Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded; Total: 335 packets (701773 bytes) demuxed Output file #0 (./out_img/ffmpeg-msa_x265_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes); Total: 15 packets (1109604 bytes) muxed bench: utime=22.300s 335 frames successfully decoded, 0 decoding errors bench: maxrss=72432kB [Parsed_fps_0 @ 0x201a6c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1fd6220] Statistics: 734659 bytes read, 2 seeks 

x264 without MIPS SIMD
 ffmpeg started on 2010-10-18 at 00:28:31 Report written to "ffmpeg-20101018-002831.log" Command line: ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-soft_x264_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'. Reading option './out_img/ffmpeg-soft_x264_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4. Successfully parsed a group of options. Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4. 
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: isom [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2 [h264 @ 0x1f4acb0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1f4acb0] nal_unit_type: 8, nal_ref_idc: 3 [h264 @ 0x1f4acb0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1f4acb0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1f4acb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1f4acb0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1f4acb0] no picture [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2007-02-19T05:03:04.000000Z Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default) Metadata: 
creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default) Metadata: creation_time : 2007-02-19T05:03:08.000000Z handler_name : GPAC ISO Audio Handler Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-soft_x264_%d.jpg. Applying option vf (set video filters) with argument fps=1/10. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-soft_x264_%d.jpg. Successfully opened the file. detected 2 logical cores [h264 @ 0x1f951e0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1f951e0] nal_unit_type: 8, nal_ref_idc: 3 Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x1f951e0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1f951e0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1f951e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1f951e0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1f951e0] no picture cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 2 [h264 @ 0x1fdcf00] nal_unit_type: 1, nal_ref_idc: 0 cur_dts is invalid (this is harmless if it occurs 
once at the start per stream) [h264 @ 0x1f951e0] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x1fa7f50] Setting 'fps' to value '1/10' [Parsed_fps_0 @ 0x1fa7f50] fps=1/10 [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'video_size' to value '1280x544' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'frame_rate' to value '24000/1001' [graph 0 input from stream 0:0 @ 0x1fa86f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2 [format @ 0x1fa8030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x1fa8030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x1fa8660] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x1fa8660] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x1fa8030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x1fa7b80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x1fa8660] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x1fa8f00] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x1fa8660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1f71f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1f71f90] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-soft_x264_%d.jpg': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 
200 kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default) Metadata: creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x1fa7f50] Dropping 1 frame(s). [h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 0 ... [AVIOContext @ 0x2229af0] Statistics: 0 seeks, 1 writeouts No more output streams to write to, finishing. frame= 15 fps=0.1 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=1.45x video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./The Simpsons Movie - Trailer_x264.mp4): Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded; Input stream #0:1 (audio): 1 packets read (134 bytes); Total: 3289 packets (72364602 bytes) demuxed Output file #0 (./out_img/ffmpeg-soft_x264_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes); Total: 15 packets (1414925 bytes) muxed bench: utime=164.240s 3288 frames successfully decoded, 0 decoding errors bench: maxrss=39936kB [Parsed_fps_0 @ 0x1fa7f50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1f52230] Statistics: 73517562 bytes read, 5 seeks 

x265 without MIPS SIMD
 ffmpeg started on 2010-10-18 at 00:27:14 Report written to "ffmpeg-20101018-002714.log" Command line: ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-soft_x265_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'. Reading option './out_img/ffmpeg-soft_x265_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./Tears_400_x265.mp4. Successfully parsed a group of options. Opening an input file: ./Tears_400_x265.mp4. 
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: iso4 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1 [hevc @ 0x1f4aca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding VPS [hevc @ 0x1f4aca0] Main profile bitstream [hevc @ 0x1f4aca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding SPS [hevc @ 0x1f4aca0] Main profile bitstream [hevc @ 0x1f4aca0] Decoding VUI [hevc @ 0x1f4aca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding PPS [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 creation_time : 2014-08-25T18:10:46.000000Z Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-soft_x265_%d.jpg. Applying option vf (set video filters) with argument fps=1. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-soft_x265_%d.jpg. Successfully opened the file. 
detected 2 logical cores [hevc @ 0x1f61a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding VPS [hevc @ 0x1f61a00] Main profile bitstream [hevc @ 0x1f61a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding SPS [hevc @ 0x1f61a00] Main profile bitstream [hevc @ 0x1f61a00] Decoding VUI [hevc @ 0x1f61a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding PPS Stream mapping: Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding SEI [hevc @ 0x1f61a00] Skipped PREFIX SEI 5 [hevc @ 0x1f61a00] Decoding SEI [hevc @ 0x1f61a00] Skipped PREFIX SEI 6 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f7aba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] Output frame with POC 0. [hevc @ 0x1f61a00] Decoded frame with POC 0. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Output frame with POC 1. [hevc @ 0x1f7aba0] Decoded frame with POC 5. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f7aba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f7aba0] Output frame with POC 2. [hevc @ 0x1f884d0] Decoded frame with POC 3. 
[Parsed_fps_0 @ 0x1f966c0] Setting 'fps' to value '1' [Parsed_fps_0 @ 0x1f966c0] fps=1/1 [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'video_size' to value '1920x800' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'frame_rate' to value '24/1' [graph 0 input from stream 0:0 @ 0x1f96bb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2 [format @ 0x1f96ad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x1f96ad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [hevc @ 0x1f61a00] Decoded frame with POC 1. [hevc @ 0x1f7aba0] Decoded frame with POC 2. [auto_scaler_0 @ 0x1f96350] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x1f96350] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x1f96ad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x1f962f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x1f96350] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x2153da0] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x1f96350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1f5eba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1f5eba0] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-soft_x265_%d.jpg': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 
fps, 1 tbn, 1 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f884d0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] Output frame with POC 3. [Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s). frame= 0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed= 0x cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s). [hevc @ 0x1f61a00] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] Output frame with POC 4. ... No more output streams to write to, finishing. frame= 15 fps=0.5 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.451x video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./Tears_400_x265.mp4): Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded; Total: 335 packets (701773 bytes) demuxed Output file #0 (./out_img/ffmpeg-soft_x265_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes); Total: 15 packets (1109604 bytes) muxed bench: utime=52.330s 335 frames successfully decoded, 0 decoding errors bench: maxrss=72480kB [Parsed_fps_0 @ 0x1f966c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1f52220] Statistics: 734659 bytes read, 2 seeks 

Conclusion



References


[L1] — Baikal-T1 processor;
[L2] — MIPSfpga-plus github;
[L3] — P-Class P5600 Multiprocessor Core;
[L4] — MIPS, ;
[L5] — Texas Instruments. Digital Signal Processors;
[L6] — GPU;
[L7] — OpenCL. ;
[L8] — Wikipedia: ;
[L9] — TFilter. Free online FIR filter design tool;
[L10] — Wikipedia: ;
[L11] — Wikipedia: - ;
[L12] — Wikipedia: - ;
[L13] — Wikipedia: SIMD;
[L14] — Wikipedia: ;
[L15] — MIPS SIMD;
[L16] — GCC: MIPS SIMD Architecture (MSA) Support;
[L17] — GCC: MIPS SIMD Architecture Built-in Functions;
[L18] — ffmpeg github (libavcodec/mips/);
[L19] — FFmpeg multimedia framework;
[L20] — Codescape MIPS SDK;
[L21] — H.264 Demo Clips;
[L22] — x265. Sample HEVC Video Files;
[L23] — ffmpeg;
[L24] — ;


Documents


[D1] — ., . — ;
[D2] — MIPS Architecture for Programmers Volume IV-j: The MIPS32 SIMD Architecture Module;
[D3] — MIPS SIMD programming. Optimizing multimedia codecs;


Images and tables


[P1] — Baikal-T1 (source: L1);
[P2] — TFilter. 1 ();
[P3] — TFilter. 1;
[P4] — TFilter. 2 ();
[P5] — TFilter. 2;
[P6] — SIMD- (source: D3);
[P7] — MSA Vector registers (source: D2);
[P8] — MADDV Operation description (source: D2);



Source: https://habr.com/ru/post/J328566/

