During the development of meshoptimizer, the question "can this algorithm use SIMD?" comes up fairly often. The library is performance-oriented, but SIMD does not always deliver a significant speedup, and it can make the code less portable and harder to maintain, so a tradeoff has to be found case by case. Where performance is paramount, separate SIMD implementations for the SSE and NEON instruction sets have to be developed and maintained. In other cases, it helps to understand how much SIMD would actually gain. Today we will try to accelerate the sloppy mesh simplifier, a new algorithm recently added to the library, using the SSEn/AVXn instruction sets.

The benchmark simplifies the Thai Buddha model, which has 6 million triangles, to 0.1% of that count. We use the Microsoft Visual Studio 2019 compiler targeting the x64 architecture. The scalar algorithm performs this simplification in about 210 ms on a single thread of an Intel Core i7-8700K (about 4.4 GHz), which corresponds to about 28.5 million triangles per second. That is arguably fast enough in practice, but I wanted to explore the limits of the hardware.
In some cases the procedure could be parallelized by splitting the mesh into chunks, but that requires extra connectivity analysis to preserve the chunk boundaries, so here we restrict ourselves to pure SIMD optimizations.
Measure seven times
To understand the optimization opportunities, we profile the code with Intel VTune, running the simplification 100 times to make sure there is enough data.

Here I turned on the microarchitecture exploration mode to see both the time each function takes and where the bottlenecks are. Simplification is performed by a set of functions, each of which takes a certain number of cycles. The profiler sorts the list by time; below, the functions are listed in execution order to make the algorithm easier to follow.
rescalePositions normalizes the positions of all vertices into a unit cube, preparing them for quantization with computeVertexIds.
computeVertexIds computes a 30-bit quantized identifier for each vertex on a uniform grid of a given size; each axis is quantized to the grid (the grid size fits in 10 bits, so the identifier needs 30).
countTriangles computes the approximate number of triangles the simplifier would produce for a given grid size, assuming all vertices within one grid cell are merged.
fillVertexCells fills a table that maps each vertex to its cell; all vertices with the same identifier map to the same cell.
fillCellQuadrics fills a Quadric structure (a symmetric 4×4 matrix) for each cell, reflecting aggregate information about the corresponding geometry.
fillCellRemap computes a vertex index for each cell, choosing the vertex in that cell that minimizes geometric distortion.
filterTriangles outputs the final set of triangles according to the vertex-to-cell and cell-to-vertex tables built earlier. A naive mapping produces about 5% duplicate triangles on average, so the function filters out duplicates.
computeVertexIds and countTriangles run several times: the algorithm chooses the grid size for merging vertices with an accelerated binary search that aims for the target triangle count (6000 in this case), computing on each iteration the number of triangles that the candidate grid size would produce. The other functions run once. For this file, the search takes 5 passes and arrives at a target grid size of 40^3.
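The grid-size search described above can be sketched as follows. This is a minimal illustration, not the library's code: findGridSize is a hypothetical helper, it uses plain bisection rather than the accelerated search the article mentions, and it assumes the triangle count does not decrease as the grid gets finer (which is only approximately true in general).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vector3 { float x, y, z; };

// Same quantization as computeVertexIds in the article: 10 bits per axis.
static void computeVertexIds(std::vector<unsigned int>& ids, const std::vector<Vector3>& positions, int grid_size)
{
    assert(grid_size >= 1 && grid_size <= 1024);
    float cell_scale = float(grid_size - 1);
    ids.resize(positions.size());
    for (size_t i = 0; i < positions.size(); ++i)
    {
        const Vector3& v = positions[i];
        int xi = int(v.x * cell_scale + 0.5f);
        int yi = int(v.y * cell_scale + 0.5f);
        int zi = int(v.z * cell_scale + 0.5f);
        ids[i] = (xi << 20) | (yi << 10) | zi;
    }
}

// Counts triangles whose three vertices land in three different cells.
static size_t countTriangles(const std::vector<unsigned int>& ids, const std::vector<unsigned int>& indices)
{
    size_t result = 0;
    for (size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        unsigned int id0 = ids[indices[i + 0]];
        unsigned int id1 = ids[indices[i + 1]];
        unsigned int id2 = ids[indices[i + 2]];
        result += (id0 != id1) & (id0 != id2) & (id1 != id2);
    }
    return result;
}

// Hypothetical helper: largest grid size whose triangle count stays within the target.
// Every probe re-runs computeVertexIds and countTriangles over the whole mesh,
// which is why these two functions show up several times in the profile.
static int findGridSize(const std::vector<Vector3>& positions, const std::vector<unsigned int>& indices, size_t target_count)
{
    std::vector<unsigned int> ids;
    int lo = 1, hi = 1024; // a grid of size 1 merges everything, giving 0 triangles
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        computeVertexIds(ids, positions, mid);
        if (countTriangles(ids, indices) <= target_count)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

With positions normalized to the unit cube, findGridSize(positions, indices, 6000) would return the finest grid that still meets a 6000-triangle budget under the stated assumptions.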
VTune helpfully reports that the most expensive function is the one that computes quadrics: it accounts for almost half of the total run time of 21 seconds. This is our first target for SIMD optimization.
Piecewise SIMD
Let's look at the source of fillCellQuadrics to understand exactly what it computes:
static void fillCellQuadrics(Quadric* cell_quadrics, const unsigned int* indices, size_t index_count, const Vector3* vertex_positions, const unsigned int* vertex_cells)
{
    for (size_t i = 0; i < index_count; i += 3)
    {
        unsigned int i0 = indices[i + 0];
        unsigned int i1 = indices[i + 1];
        unsigned int i2 = indices[i + 2];

        unsigned int c0 = vertex_cells[i0];
        unsigned int c1 = vertex_cells[i1];
        unsigned int c2 = vertex_cells[i2];

        bool single_cell = (c0 == c1) & (c0 == c2);
        float weight = single_cell ? 3.f : 1.f;

        Quadric Q;
        quadricFromTriangle(Q, vertex_positions[i0], vertex_positions[i1], vertex_positions[i2], weight);

        if (single_cell)
        {
            quadricAdd(cell_quadrics[c0], Q);
        }
        else
        {
            quadricAdd(cell_quadrics[c0], Q);
            quadricAdd(cell_quadrics[c1], Q);
            quadricAdd(cell_quadrics[c2], Q);
        }
    }
}
The function iterates over all triangles, computes a quadric for each one, and adds it to the quadric of each cell. A Quadric is a symmetric 4×4 matrix represented as 10 floating-point numbers:
struct Quadric
{
    float a00;
    float a10, a11;
    float a20, a21, a22;
    float b0, b1, b2, c;
};
Computing a quadric requires solving the plane equation of the triangle, building the quadric matrix, and weighting it by the triangle area:
static void quadricFromPlane(Quadric& Q, float a, float b, float c, float d)
{
    Q.a00 = a * a;
    Q.a10 = b * a;
    Q.a11 = b * b;
    Q.a20 = c * a;
    Q.a21 = c * b;
    Q.a22 = c * c;
    Q.b0 = d * a;
    Q.b1 = d * b;
    Q.b2 = d * c;
    Q.c = d * d;
}

static void quadricFromTriangle(Quadric& Q, const Vector3& p0, const Vector3& p1, const Vector3& p2, float weight)
{
    // triangle edges
    Vector3 p10 = {p1.x - p0.x, p1.y - p0.y, p1.z - p0.z};
    Vector3 p20 = {p2.x - p0.x, p2.y - p0.y, p2.z - p0.z};

    // cross product of the edges gives the (unnormalized) plane normal
    Vector3 normal = {
        p10.y * p20.z - p10.z * p20.y,
        p10.z * p20.x - p10.x * p20.z,
        p10.x * p20.y - p10.y * p20.x
    };

    float area = normalize(normal);
    float distance = normal.x*p0.x + normal.y*p0.y + normal.z*p0.z;

    quadricFromPlane(Q, normal.x, normal.y, normal.z, -distance);
    quadricMul(Q, area * weight);
}
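To make the role of the ten coefficients concrete, here is a small sketch of how such a quadric can be evaluated at a point. quadricError is a hypothetical helper that is not part of the excerpt above; it assumes the usual convention in which a00..a22 form the symmetric 3×3 block, b0..b2 the linear part, and c the constant. For a plane with unit normal, the result is the squared distance from the point to the plane.

```cpp
#include <cassert>

struct Quadric
{
    float a00;
    float a10, a11;
    float a20, a21, a22;
    float b0, b1, b2, c;
};

// Same construction as in the article: the outer product of (a, b, c, d) with itself.
static void quadricFromPlane(Quadric& Q, float a, float b, float c, float d)
{
    Q.a00 = a * a;
    Q.a10 = b * a;
    Q.a11 = b * b;
    Q.a20 = c * a;
    Q.a21 = c * b;
    Q.a22 = c * c;
    Q.b0 = d * a;
    Q.b1 = d * b;
    Q.b2 = d * c;
    Q.c = d * d;
}

// Hypothetical helper: evaluates v^T A v + 2 b.v + c for v = (x, y, z).
// For a plane quadric this expands to (a*x + b*y + c*z + d)^2.
static float quadricError(const Quadric& Q, float x, float y, float z)
{
    float r = Q.a00 * x * x + Q.a11 * y * y + Q.a22 * z * z;
    r += 2.f * (Q.a10 * x * y + Q.a20 * x * z + Q.a21 * y * z);
    r += 2.f * (Q.b0 * x + Q.b1 * y + Q.b2 * z);
    r += Q.c;
    return r;
}
```

Because the error is linear in the coefficients, adding the quadrics of many triangles yields a quadric whose error is the sum of the individual squared distances, which is what makes per-cell accumulation meaningful.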
This involves a lot of floating-point operations, so it should be possible to parallelize it with SIMD. To start, we represent each vector as a 4-wide SIMD vector. We also change the Quadric structure to hold 12 floats instead of 10 so that it fits exactly into three SIMD registers (the larger size does not affect performance), and reorder the fields so that the computations in quadricFromPlane become more uniform:
struct Quadric
{
    float a00, a11, a22;
    float pad0;
    float a10, a21, a20;
    float pad1;
    float b0, b1, b2, c;
};
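The following scalar sketch shows why this field order is convenient: each group of four floats can be produced by one element-wise multiply of two 4-wide values. quadricFromPlaneRows is a hypothetical name used only for illustration, and the padding lanes are set to zero here as an assumption; the inner loop stands in for a single SIMD multiply per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Quadric
{
    float a00, a11, a22;
    float pad0;
    float a10, a21, a20;
    float pad1;
    float b0, b1, b2, c;
};

// The padded layout is exactly three rows of four floats.
static_assert(sizeof(Quadric) == 3 * 4 * sizeof(float), "three rows of four floats");
static_assert(offsetof(Quadric, a10) == 4 * sizeof(float), "second row starts at float 4");
static_assert(offsetof(Quadric, b0) == 8 * sizeof(float), "third row starts at float 8");

// Hypothetical helper: builds the plane quadric row by row.
static void quadricFromPlaneRows(Quadric& Q, float a, float b, float c, float d)
{
    const float n[4] = {a, b, c, 0.f};   // normal, padding lane zeroed
    const float lhs[4] = {b, c, c, 0.f}; // a shuffle of the normal
    const float rhs[4] = {a, b, a, 0.f}; // another shuffle of the normal
    const float nd[4] = {a, b, c, d};    // normal with d in the last lane

    float rows[3][4];
    for (int lane = 0; lane < 4; ++lane)
    {
        rows[0][lane] = n[lane] * n[lane];     // a00, a11, a22, pad0
        rows[1][lane] = lhs[lane] * rhs[lane]; // a10, a21, a20, pad1
        rows[2][lane] = d * nd[lane];          // b0, b1, b2, c
    }

    std::memcpy(&Q, rows, sizeof(Q));
}
```

With the original 10-float layout, the same values would straddle register boundaries, and each row would need extra shuffles or partial stores.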
Some of these computations, the dot product in particular, do not map well onto earlier versions of SSE. Fortunately, SSE4.1 introduced a dot product instruction, which is very handy here:
static void fillCellQuadrics(Quadric* cell_quadrics, const unsigned int* indices, size_t index_count, const Vector3* vertex_positions, const unsigned int* vertex_cells)
{
    const int yzx = _MM_SHUFFLE(3, 0, 2, 1);
    const int zxy = _MM_SHUFFLE(3, 1, 0, 2);
    const int dp_xyz = 0x7f;

    for (size_t i = 0; i < index_count; i += 3)
    {
        unsigned int i0 = indices[i + 0];
        unsigned int i1 = indices[i + 1];
        unsigned int i2 = indices[i + 2];

        unsigned int c0 = vertex_cells[i0];
        unsigned int c1 = vertex_cells[i1];
        unsigned int c2 = vertex_cells[i2];

        bool single_cell = (c0 == c1) & (c0 == c2);

        __m128 p0 = _mm_loadu_ps(&vertex_positions[i0].x);
        __m128 p1 = _mm_loadu_ps(&vertex_positions[i1].x);
        __m128 p2 = _mm_loadu_ps(&vertex_positions[i2].x);

        __m128 p10 = _mm_sub_ps(p1, p0);
        __m128 p20 = _mm_sub_ps(p2, p0);

        __m128 normal = _mm_sub_ps(
            _mm_mul_ps(
                _mm_shuffle_ps(p10, p10, yzx),
                _mm_shuffle_ps(p20, p20, zxy)),
            _mm_mul_ps(
                _mm_shuffle_ps(p10, p10, zxy),
                _mm_shuffle_ps(p20, p20, yzx)));

        __m128 areasq = _mm_dp_ps(normal, normal, dp_xyz);
        // ...
    }
}
There is nothing particularly interesting in this code. It makes liberal use of unaligned load/store instructions: the Vector3 input could be aligned, but there seems to be no noticeable penalty for unaligned reads. Note that the first half of the function does not use the vector unit well: the vectors have three components, and in some cases just one (see the areasq/area/distance computations), while the processor can perform four operations in parallel. Either way, let's see how much the vectorization helped.

100 runs of fillCellQuadrics now take 5.3 seconds instead of 9.8, saving about 45 ms per simplification run. That is not bad, but not very impressive either. Many instructions use three components instead of four, and the dot product instruction we rely on has considerable latency. If you have written SIMD code before, you know the right way to compute dot products.
The right way is to process four vectors at once. Instead of storing one whole vector in one SIMD register, we use three registers: one holds the four x components, another holds y, and the third holds z. This requires four vectors at a time, which means processing four triangles simultaneously.
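The idea can be illustrated with a portable scalar sketch in which each float[4] array plays the role of one SIMD register. Vector3x4, loadTransposed and dot4 are hypothetical names introduced here; a real implementation would use intrinsics, while this version only shows the data arrangement.

```cpp
#include <cassert>
#include <cstddef>

struct Vector3 { float x, y, z; };

// Four vectors stored component-wise: one "register" per component.
struct Vector3x4 { float x[4], y[4], z[4]; };

// Loads four dynamically indexed vectors and transposes them into component-wise form.
static Vector3x4 loadTransposed(const Vector3* positions, size_t position_count, const unsigned int (&index)[4])
{
    Vector3x4 r;
    for (int lane = 0; lane < 4; ++lane)
    {
        assert(index[lane] < position_count);
        const Vector3& p = positions[index[lane]];
        r.x[lane] = p.x;
        r.y[lane] = p.y;
        r.z[lane] = p.z;
    }
    return r;
}

// Four dot products at once: every multiply and add works on all four lanes,
// and no horizontal (within-register) operation is needed.
static void dot4(float (&out)[4], const Vector3x4& a, const Vector3x4& b)
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = a.x[lane] * b.x[lane] + a.y[lane] * b.y[lane] + a.z[lane] * b.z[lane];
}
```

In this arrangement the dot product costs three multiplies and two adds per four results, with all lanes doing useful work.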
Many of our arrays are indexed dynamically. Normally it helps to move the data into arrays that are already split into x/y/z components (or, more commonly, into small SIMD-register-sized groups, for example float x[8], y[8], z[8] for every 8 vertices of the input; this is called AoSoA, arrays of structures of arrays, and strikes a balance between cache locality and ease of loading into SIMD registers). Dynamic indexing does not work well with that layout, though, so we load the data for four triangles as usual and transpose the vectors with the handy _MM_TRANSPOSE macro.
In theory, each component of the four resulting quadrics should be computed in its own SIMD register (for example, an __m128 Q_a00 holding the a00 components of the four final quadrics). In this case, however, the quadric operations already map quite well onto 4-wide SIMD instructions, and the transposition actually slows the code down. So we transpose only the initial vectors, then transpose the plane equations back and run the same quadric code as before, repeated four times. The code that computes the plane equations now looks like this (the remaining parts are omitted for brevity):
unsigned int i00 = indices[(i + 0) * 3 + 0];
unsigned int i01 = indices[(i + 0) * 3 + 1];
unsigned int i02 = indices[(i + 0) * 3 + 2];
unsigned int i10 = indices[(i + 1) * 3 + 0];
unsigned int i11 = indices[(i + 1) * 3 + 1];
unsigned int i12 = indices[(i + 1) * 3 + 2];
unsigned int i20 = indices[(i + 2) * 3 + 0];
unsigned int i21 = indices[(i + 2) * 3 + 1];
unsigned int i22 = indices[(i + 2) * 3 + 2];
unsigned int i30 = indices[(i + 3) * 3 + 0];
unsigned int i31 = indices[(i + 3) * 3 + 1];
unsigned int i32 = indices[(i + 3) * 3 + 2];
The code got somewhat longer; each iteration now processes four triangles, and the SSE4.1 instructions are no longer needed. In theory, the SIMD units should be used much more efficiently. Let's see how much this helped.

倧äžå€«ã倧äžå€«ã§ãã
fillCellQuadrics
颿°ã¯SIMDãªãã§å
ã®é¢æ°ã®ã»ãŒ2åã®é床ã§å®è¡ãããŸãããã³ãŒãã¯ãããã«å éããŸããããããè€éãã®å€§å¹
ãªå¢å ãæ£åœåãããã©ããã¯äžæã§ãã çè«çã«ã¯ãAVX2ã䜿çšããŠå埩ããšã«8ã€ã®äžè§åœ¢ãåŠçã§ããŸãããããã§ã¯ã«ãŒããæåã§ããã«ã¹ãã³ããå¿
èŠããããŸãïŒçæ³çã«ã¯ããã®ã³ãŒãã¯ãã¹ãŠISPCã䜿çšããŠçæãããŸãããè¯ãã³ãŒããçæããããã®ç§ã®çŽ æŽãªè©Šã¿ã¯æåããŸããã§ããïŒã·ãŒã±ã³ã¹ã®ããŒã/ä¿åã®ä»£ããã«åœŒã¯æç¶çã«ã®ã£ã¶ãŒ/ã¹ãã£ãã¿ãŒãçºè¡ãããããå®è¡é床ã倧å¹
ã«äœäžããŸããã ä»ã®ããšã詊ããŠã¿ãŸãããã
AVX2 = SSE2 + SSE2
AVX2 is a somewhat peculiar instruction set. It has 8-wide floating-point registers, and a single instruction can perform 8 operations. In practice, however, such an instruction is not very different from two SSE2 instructions executed on the two halves of the register (as far as I know, the first processors with AVX2 decoded each instruction into two or more micro-operations, so the performance gain was limited by the instruction fetch phase). For example, _mm_dp_ps computes a dot product between two SSE2 registers, while _mm256_dp_ps computes two dot products between the two halves of two AVX2 registers, so it is limited to 4-wide within each half.
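A scalar model may help to see what "two dot products between halves" means. The functions dp4 and dp8 below are hypothetical stand-ins, not intrinsics; they follow the documented meaning of the immediate mask, where bits 4..7 select the lanes that enter the sum and bits 0..3 select the output lanes that receive it. The exact floating-point summation order of the hardware instruction is not modeled.

```cpp
#include <cassert>

// Scalar model of a 4-wide masked dot product in the style of _mm_dp_ps.
static void dp4(float* out, const float* a, const float* b, int mask)
{
    assert(mask >= 0 && mask <= 0xff);

    float sum = 0.f;
    for (int lane = 0; lane < 4; ++lane)
        if (mask & (16 << lane)) // high nibble: which products are summed
            sum += a[lane] * b[lane];

    for (int lane = 0; lane < 4; ++lane)
        out[lane] = (mask & (1 << lane)) ? sum : 0.f; // low nibble: where the sum is written
}

// Scalar model in the style of _mm256_dp_ps: the same operation applied
// independently to each 128-bit half. Nothing crosses between the halves.
static void dp8(float* out, const float* a, const float* b, int mask)
{
    dp4(out, a, b, mask);
    dp4(out + 4, a + 4, b + 4, mask);
}
```

With the mask 0x7f used in the article (dp_xyz), the x, y and z lanes are summed and the result is broadcast to all four lanes of each half, so an 8-wide register naturally holds the results for two independent 3-component vectors.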
Because of this, AVX2 code often differs from generic "8-wide SIMD", but here that works in our favor. Instead of trying to improve vectorization by transposing 4-wide vectors, we go back to the first SIMD version and unroll the loop 2x, using AVX2 instructions instead of SSE2/SSE4. We still have to load and store 4-wide vectors, but in general the change amounts to replacing __m128 with __m256 and _mm_ with _mm256_, plus a few tweaks:
unsigned int i00 = indices[(i + 0) * 3 + 0];
unsigned int i01 = indices[(i + 0) * 3 + 1];
unsigned int i02 = indices[(i + 0) * 3 + 2];
unsigned int i10 = indices[(i + 1) * 3 + 0];
unsigned int i11 = indices[(i + 1) * 3 + 1];
unsigned int i12 = indices[(i + 1) * 3 + 2];

__m256 p0 = _mm256_loadu2_m128(
    &vertex_positions[i10].x,
    &vertex_positions[i00].x);
__m256 p1 = _mm256_loadu2_m128(
    &vertex_positions[i11].x,
    &vertex_positions[i01].x);
__m256 p2 = _mm256_loadu2_m128(
    &vertex_positions[i12].x,
    &vertex_positions[i02].x);

__m256 p10 = _mm256_sub_ps(p1, p0);
__m256 p20 = _mm256_sub_ps(p2, p0);

__m256 normal = _mm256_sub_ps(
    _mm256_mul_ps(
        _mm256_shuffle_ps(p10, p10, yzx),
        _mm256_shuffle_ps(p20, p20, zxy)),
    _mm256_mul_ps(
        _mm256_shuffle_ps(p10, p10, zxy),
        _mm256_shuffle_ps(p20, p20, yzx)));

__m256 areasq = _mm256_dp_ps(normal, normal, dp_xyz);
__m256 area = _mm256_sqrt_ps(areasq);
__m256 areanz = _mm256_cmp_ps(area, _mm256_setzero_ps(), _CMP_NEQ_OQ);

normal = _mm256_and_ps(_mm256_div_ps(normal, area), areanz);

__m256 distance = _mm256_dp_ps(normal, p0, dp_xyz);
__m256 negdistance = _mm256_sub_ps(_mm256_setzero_ps(), distance);
__m256 normalnegdist = _mm256_blend_ps(normal, negdistance, 0x88);

__m256 Qx = _mm256_mul_ps(normal, normal);
__m256 Qy = _mm256_mul_ps(
    _mm256_shuffle_ps(normal, normal, _MM_SHUFFLE(3, 2, 2, 1)),
    _mm256_shuffle_ps(normal, normal, _MM_SHUFFLE(3, 0, 1, 0)));
__m256 Qz = _mm256_mul_ps(negdistance, normalnegdist);
We could now take each 128-bit half of the resulting Qx/Qy/Qz vectors and run the same quadric-accumulation code as before. Instead, we assume that if one triangle has all three vertices in a single cell (single_cell == true), the other triangle very likely has all three of its vertices in another single cell, and perform the final quadric aggregation with AVX2 as well:
unsigned int c00 = vertex_cells[i00];
unsigned int c01 = vertex_cells[i01];
unsigned int c02 = vertex_cells[i02];
unsigned int c10 = vertex_cells[i10];
unsigned int c11 = vertex_cells[i11];
unsigned int c12 = vertex_cells[i12];

bool single_cell =
    (c00 == c01) & (c00 == c02) &
    (c10 == c11) & (c10 == c12);

if (single_cell)
{
    area = _mm256_mul_ps(area, _mm256_set1_ps(3.f));
    Qx = _mm256_mul_ps(Qx, area);
    Qy = _mm256_mul_ps(Qy, area);
    Qz = _mm256_mul_ps(Qz, area);

    Quadric& q00 = cell_quadrics[c00];
    Quadric& q10 = cell_quadrics[c10];

    __m256 q0x = _mm256_loadu2_m128(&q10.a00, &q00.a00);
    __m256 q0y = _mm256_loadu2_m128(&q10.a10, &q00.a10);
    __m256 q0z = _mm256_loadu2_m128(&q10.b0, &q00.b0);

    _mm256_storeu2_m128(&q10.a00, &q00.a00, _mm256_add_ps(q0x, Qx));
    _mm256_storeu2_m128(&q10.a10, &q00.a10, _mm256_add_ps(q0y, Qy));
    _mm256_storeu2_m128(&q10.b0, &q00.b0, _mm256_add_ps(q0z, Qz));
}
else
{
    // ...
}
The resulting code is simpler, more concise and faster than the failed SSE2 approach:

Of course, we did not get an 8x speedup, only 2.45x. Loads and stores are still 4-wide because dynamic indexing forces an inconvenient memory layout, and the computations are not optimal for SIMD. But fillCellQuadrics is no longer the bottleneck of the pipeline in the profile, so we can focus on other functions.
Gather 'round, children
We saved 4.8 seconds in the test run (48 ms per simplification run), and the main offender is now countTriangles. It looks like a simple function, but it runs 5 times instead of once:
static size_t countTriangles(const unsigned int* vertex_ids, const unsigned int* indices, size_t index_count)
{
    size_t result = 0;

    for (size_t i = 0; i < index_count; i += 3)
    {
        unsigned int id0 = vertex_ids[indices[i + 0]];
        unsigned int id1 = vertex_ids[indices[i + 1]];
        unsigned int id2 = vertex_ids[indices[i + 2]];

        result += (id0 != id1) & (id0 != id2) & (id1 != id2);
    }

    return result;
}
The function iterates over all source triangles and counts the non-degenerate ones by comparing vertex identifiers. It is not immediately obvious how to parallelize this with SIMD, unless you use gather instructions.
The AVX2 instruction set added a family of gather instructions to x64 SIMD (the matching scatter instructions arrived later, with AVX-512). Each takes a vector register of 4 or 8 indices and performs 4 or 8 loads at once. With gather we can load the three indices, gather for all of them at once (or in groups of 4 or 8), and compare the results. Gather has historically been quite slow on Intel processors, but let's try it. For simplicity, we load the data for 8 triangles, transpose the vectors the same way as in the earlier attempt, and compare the corresponding elements of each vector:
for (size_t i = 0; i < (triangle_count & ~7); i += 8)
{
    __m256 tri0 = _mm256_loadu2_m128(
        (const float*)&indices[(i + 4) * 3 + 0],
        (const float*)&indices[(i + 0) * 3 + 0]);
    __m256 tri1 = _mm256_loadu2_m128(
        (const float*)&indices[(i + 5) * 3 + 0],
        (const float*)&indices[(i + 1) * 3 + 0]);
    __m256 tri2 = _mm256_loadu2_m128(
        (const float*)&indices[(i + 6) * 3 + 0],
        (const float*)&indices[(i + 2) * 3 + 0]);
    __m256 tri3 = _mm256_loadu2_m128(
        (const float*)&indices[(i + 7) * 3 + 0],
        (const float*)&indices[(i + 3) * 3 + 0]);

    _MM_TRANSPOSE8_LANE4_PS(tri0, tri1, tri2, tri3);

    __m256i i0 = _mm256_castps_si256(tri0);
    __m256i i1 = _mm256_castps_si256(tri1);
    __m256i i2 = _mm256_castps_si256(tri2);

    __m256i id0 = _mm256_i32gather_epi32((int*)vertex_ids, i0, 4);
    __m256i id1 = _mm256_i32gather_epi32((int*)vertex_ids, i1, 4);
    __m256i id2 = _mm256_i32gather_epi32((int*)vertex_ids, i2, 4);

    __m256i deg = _mm256_or_si256(
        _mm256_cmpeq_epi32(id1, id2),
        _mm256_or_si256(
            _mm256_cmpeq_epi32(id0, id1),
            _mm256_cmpeq_epi32(id0, id2)));

    result += 8 - _mm_popcnt_u32(_mm256_movemask_epi8(deg)) / 4;
}
The _MM_TRANSPOSE8_LANE4_PS macro is the AVX2 counterpart of _MM_TRANSPOSE4_PS. It is not in the standard header, but it is easy to derive. It takes four AVX2 vectors and transposes two 4×4 matrices independently of each other:
#define _MM_TRANSPOSE8_LANE4_PS(row0, row1, row2, row3) \
do { \
    __m256 __t0, __t1, __t2, __t3; \
    __t0 = _mm256_unpacklo_ps(row0, row1); \
    __t1 = _mm256_unpackhi_ps(row0, row1); \
    __t2 = _mm256_unpacklo_ps(row2, row3); \
    __t3 = _mm256_unpackhi_ps(row2, row3); \
    row0 = _mm256_shuffle_ps(__t0, __t2, _MM_SHUFFLE(1, 0, 1, 0)); \
    row1 = _mm256_shuffle_ps(__t0, __t2, _MM_SHUFFLE(3, 2, 3, 2)); \
    row2 = _mm256_shuffle_ps(__t1, __t3, _MM_SHUFFLE(1, 0, 1, 0)); \
    row3 = _mm256_shuffle_ps(__t1, __t3, _MM_SHUFFLE(3, 2, 3, 2)); \
} while (0)
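The net effect of the macro can be modeled in plain C++ as follows. transpose8Lane4 is a hypothetical scalar stand-in: it treats each row as 8 floats and transposes the 4×4 block formed by the low halves and the 4×4 block formed by the high halves separately, which is what the unpack/shuffle sequence achieves within each 128-bit lane.

```cpp
#include <cassert>
#include <utility>

// Scalar model of the net effect of _MM_TRANSPOSE8_LANE4_PS.
// rows[r][0..3] is the low 128-bit half of row r, rows[r][4..7] the high half.
static void transpose8Lane4(float (&rows)[4][8])
{
    for (int half = 0; half < 2; ++half)
        for (int r = 0; r < 4; ++r)
            for (int c = r + 1; c < 4; ++c)
                std::swap(rows[r][half * 4 + c], rows[c][half * 4 + r]);
}
```

If row k initially holds item k in its low half and item k + 4 in its high half, then after the transpose row 0 holds the first component of all eight items, row 1 the second, and row 2 the third.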
Because of some quirks of the SSE2/AVX2 instruction sets, the transposition has to use floating-point register operations. The data is also loaded somewhat carelessly, but that mostly does not matter, because the code is limited by gather performance:
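The counting step at the end of the gather loop is easy to misread, so here is a scalar model of one 8-triangle iteration. countTriangles8 is a hypothetical helper; it assumes, as in the article's code, that a byte-wise move mask yields 4 bits per 32-bit lane, so the population count divided by 4 is the number of degenerate triangles.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Scalar model of one iteration: 8 triangles in, number of non-degenerate ones out.
static size_t countTriangles8(const unsigned int* vertex_ids, const unsigned int* indices)
{
    assert(vertex_ids && indices);

    unsigned int mask = 0; // stands in for the byte-wise move mask: 4 bits per lane

    for (int lane = 0; lane < 8; ++lane)
    {
        // the "gather": one dependent load per triangle corner
        unsigned int id0 = vertex_ids[indices[lane * 3 + 0]];
        unsigned int id1 = vertex_ids[indices[lane * 3 + 1]];
        unsigned int id2 = vertex_ids[indices[lane * 3 + 2]];

        bool degenerate = (id1 == id2) | (id0 == id1) | (id0 == id2);

        // an all-ones 32-bit comparison result contributes four set bits
        if (degenerate)
            mask |= 0xfu << (lane * 4);
    }

    return 8 - std::bitset<32>(mask).count() / 4;
}
```

Each lane still needs three dependent memory reads, which is consistent with the observation that the loop is bound by gather throughput rather than by arithmetic.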

countTriangles is now about 27% faster. Note the terrible CPI (cycles per instruction): we issue about 4 times fewer instructions, but the gathers take a long time. It is great that the overall work got faster, but the performance gain is of course somewhat disappointing. We did get past fillCellQuadrics in the profile, which brings us to the last function at the top of the list that we have not looked at yet.
第6ç« ããã¹ãŠãæ£åžžã«æ©èœãå§ãã
æåŸã®é¢æ°ã¯
computeVertexIds
ã§ãã ã¢ã«ãŽãªãºã ã§ã¯6åå®è¡ããããããæé©åã®åªããç®æšã§ããããŸãã SIMDã§æç¢ºãªæé©åã®ããã«äœæããããšæããã颿°ãåããŠèŠãããŸãã
static void computeVertexIds(unsigned int* vertex_ids, const Vector3* vertex_positions, size_t vertex_count, int grid_size)
{
    assert(grid_size >= 1 && grid_size <= 1024);
    float cell_scale = float(grid_size - 1);

    for (size_t i = 0; i < vertex_count; ++i)
    {
        const Vector3& v = vertex_positions[i];

        // round each normalized coordinate to the nearest grid line
        int xi = int(v.x * cell_scale + 0.5f);
        int yi = int(v.y * cell_scale + 0.5f);
        int zi = int(v.z * cell_scale + 0.5f);

        // pack three 10-bit coordinates into one 30-bit identifier
        vertex_ids[i] = (xi << 20) | (yi << 10) | zi;
    }
}
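After the previous optimizations we know what to do: unroll the loop 4 or 8 times, since there is no point in accelerating a single iteration, transpose the vector components, and run the computations in parallel. Let's do that with AVX2, processing 8 vertices at a time: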
__m256 scale = _mm256_set1_ps(cell_scale);
__m256 half = _mm256_set1_ps(0.5f);

for (size_t i = 0; i < (vertex_count & ~7); i += 8)
{
    __m256 vx = _mm256_loadu2_m128(
        &vertex_positions[i + 4].x,
        &vertex_positions[i + 0].x);
    __m256 vy = _mm256_loadu2_m128(
        &vertex_positions[i + 5].x,
        &vertex_positions[i + 1].x);
    __m256 vz = _mm256_loadu2_m128(
        &vertex_positions[i + 6].x,
        &vertex_positions[i + 2].x);
    __m256 vw = _mm256_loadu2_m128(
        &vertex_positions[i + 7].x,
        &vertex_positions[i + 3].x);

    _MM_TRANSPOSE8_LANE4_PS(vx, vy, vz, vw);

    __m256i xi = _mm256_cvttps_epi32(
        _mm256_add_ps(_mm256_mul_ps(vx, scale), half));
    __m256i yi = _mm256_cvttps_epi32(
        _mm256_add_ps(_mm256_mul_ps(vy, scale), half));
    __m256i zi = _mm256_cvttps_epi32(
        _mm256_add_ps(_mm256_mul_ps(vz, scale), half));

    __m256i id = _mm256_or_si256(
        zi,
        _mm256_or_si256(
            _mm256_slli_epi32(xi, 20),
            _mm256_slli_epi32(yi, 10)));

    _mm256_storeu_si256((__m256i*)&vertex_ids[i], id);
}
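The loop above only covers vertex counts rounded down to a multiple of 8, so the remaining vertices need separate handling. How the original code deals with them is not shown in this excerpt; the portable sketch below illustrates one common pattern, a blocked main loop followed by a scalar tail. quantize and computeVertexIdsBlocked are hypothetical names, and the inner 8-iteration loop merely stands in for the 8-wide vector operations.

```cpp
#include <cassert>
#include <cstddef>

struct Vector3 { float x, y, z; };

// Quantizes one vertex to a 30-bit grid identifier, as in computeVertexIds.
static unsigned int quantize(const Vector3& v, float cell_scale)
{
    int xi = int(v.x * cell_scale + 0.5f);
    int yi = int(v.y * cell_scale + 0.5f);
    int zi = int(v.z * cell_scale + 0.5f);
    return (xi << 20) | (yi << 10) | zi;
}

// Hypothetical blocked variant: full blocks of 8 first, then a scalar tail.
static void computeVertexIdsBlocked(unsigned int* vertex_ids, const Vector3* vertex_positions, size_t vertex_count, int grid_size)
{
    assert(grid_size >= 1 && grid_size <= 1024);
    float cell_scale = float(grid_size - 1);

    size_t blocked_count = vertex_count & ~size_t(7);
    size_t i = 0;

    // main loop: each outer iteration corresponds to one 8-wide SIMD iteration
    for (; i < blocked_count; i += 8)
        for (size_t lane = 0; lane < 8; ++lane)
            vertex_ids[i + lane] = quantize(vertex_positions[i + lane], cell_scale);

    // tail: the last vertex_count % 8 vertices, processed one at a time
    for (; i < vertex_count; ++i)
        vertex_ids[i] = quantize(vertex_positions[i], cell_scale);
}
```

A scalar tail also sidesteps a subtlety of 16-byte loads from 12-byte Vector3 elements: a load that starts at the last element of the array reads 4 bytes past its end.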
And look at the results: computeVertexIds is now 2x faster. With all optimizations combined, the total run time drops to about 120 ms, which corresponds to 50 million triangles per second.
It may seem that we again fell short of the expected gain: shouldn't computeVertexIds speed up by more than 2x after parallelization? To answer that, let's look at how much work the function does.
computeVertexIds runs 6 times per program run: 5 times during the binary search, and once at the end to compute the final identifiers used for further processing. Each time, the function processes 3 million vertices, reading 12 bytes and writing 4 bytes per vertex.
Over 100 runs of the simplifier, the function processes 1.8 billion vertices in total, reading 21 GB and writing 7 GB back. Processing 28 GB in 1.46 seconds requires a bus bandwidth of 19 GB/s. We can check the memory bandwidth by running memcmp(block1, block2, 512 MB): it takes 45 ms, that is, about 22 GB/s on one core (the AIDA64 benchmark shows read speeds of up to 31 GB/s on my system, but it uses multiple cores). In effect we are close to the achievable memory bandwidth limit, and improving performance further would require packing the vertex data more tightly so that a position takes fewer than 12 bytes.
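The bandwidth estimate can be reproduced with a few lines of arithmetic. The sketch below uses the figures quoted in the text (3 million vertices, 6 calls per run, 100 runs, 1.46 seconds) and assumes decimal gigabytes (10^9 bytes); Traffic and computeVertexIdsTraffic are hypothetical names introduced for the illustration.

```cpp
#include <cassert>

struct Traffic
{
    double vertices;      // total vertices processed
    double read_gb;       // gigabytes read (12 bytes per vertex)
    double write_gb;      // gigabytes written (4 bytes per vertex)
    double gb_per_second; // required memory bandwidth
};

// Hypothetical helper: memory traffic generated by computeVertexIds over a benchmark.
static Traffic computeVertexIdsTraffic(double vertex_count, int calls_per_run, int runs, double seconds)
{
    assert(seconds > 0);

    Traffic t;
    t.vertices = vertex_count * calls_per_run * runs;
    t.read_gb = t.vertices * 12 / 1e9;
    t.write_gb = t.vertices * 4 / 1e9;
    t.gb_per_second = (t.read_gb + t.write_gb) / seconds;
    return t;
}
```

Calling computeVertexIdsTraffic(3e6, 6, 100, 1.46) gives 1.8 billion vertices, about 21.6 GB read and 7.2 GB written, and a little under 20 GB/s, in line with the rounded figures above and just below the roughly 22 GB/s measured with memcmp.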
èŠããããŸãããããã«
æ¯ç§2800äžãã©ã€ã¢ã³ã°ã«ã®é床ã§éåžžã«å€§ããªã°ãªãããç°¡çŽ åãããããªãæé©åãããã¢ã«ãŽãªãºã ãæ¡çšããSSEããã³AVXåœä»€ã»ããã䜿çšããŠãã»ãŒ2åã1ç§ããã5,000äžãã©ã€ã¢ã³ã°ã«ãŸã§é«éåããŸããããã®éçšã§ãSIMDã®ããŸããŸãªäœ¿ç𿹿³ãåŠç¿ããå¿
èŠããããŸããã3ã¯ã€ããã¯ãã«ãæ ŒçŽããã¬ãžã¹ã¿ãSoA転眮ã2ã€ã®3ã¯ã€ããã¯ãã«ãæ ŒçŽããAVX2åœä»€ãã¹ã«ã©ãŒåœä»€ãšæ¯èŒããŠããŒã¿ããŒããé«éåããåœä»€ãåéããæåŸã«ã¹ããªãŒãã³ã°åŠçã«AVX2ãçŽæ¥é©çšããŸãããå€ãã®å ŽåãSIMDã¯æé©åã®æé©ãªåºçºç¹ã§ã¯ãããŸãããã¡ãã·ã¥ãªããã£ãã€ã¶ãŒã¯ããã©ãããã©ãŒã åºæã®åœä»€ã䜿çšããã«ãã¢ã«ãŽãªãºã æé©åãšãã€ã¯ãæé©åã®å€ãã®å埩ãå®è¡ããŸãããããããããæç¹ã§ããããã®å¯èœæ§ã¯äœ¿ãæããããããã©ãŒãã³ã¹ãéèŠãªå ŽåãSIMDã¯å¿
èŠã«å¿ããŠäœ¿çšã§ããçŽ æŽãããããŒã«ã§ãããããã®æé©åãã¡ã€ã³ãã©ã³ãã«åœãŠã¯ãŸãmeshoptimizer
ãã©ããã¯ããããŸãããæçµçã«ãããã¯ãã¢ã«ãŽãªãºã ãæ ¹æ¬çã«å€æŽããã«ã³ãŒããã©ãã ãå éãããã確èªããããã®åãªãå®éšã§ãããã®èšäºãæçã§ãããã³ãŒããæé©åããããã®ã¢ã€ãã¢ãæäŸããŠãããããšãé¡ã£ãŠããŸãããã®èšäºã®æçµæ
å ±æºã¯ãã¡ãã§ãããã®äœæ¥ã¯ãmeshoptimizer 99ab49ã®ããŒãžã§ã³ã«åºã¥ããŠããŸãããããŠã¿ã€ä»ã¢ãã«ãSketchfabã§å
¬éãããŠããŸãã