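UTF-8 in PHP. Part 1

Hello. With this post I want to bring closer the bright future in which everyone uses the "kosher" UTF-8 encoding. This applies above all to the environment closest to me, the Web, and to my programming language, PHP. At the end of the series we will get to the practical part and develop yet another reinvented-wheel library.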

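1. Introduction

To follow the rest of the text, beginners need to know a few details about encodings in general. I will try to keep the presentation as simple as possible. Readers who know nothing about bitwise operations should first read up on them in Wikipedia.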

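It is worth starting from the fact that a computer works with numbers, and strings (and the characters they consist of) are also stored in numeric form. This is what encodings are for. In essence they are tables that map numbers to characters. Historically, the basic encoding, ASCII, contains only 128 entries: control codes, Latin letters, digits and punctuation (127 is the largest number that fits in 7 bits).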
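To make the "characters are numbers" idea concrete, here is a minimal sketch using PHP's built-in ord() and chr(), which map between a single byte and its numeric value. The sample string 'Hi' is only an illustration.

```php
<?php
// ord() returns the numeric value of the first byte of a string;
// chr() turns a number in the range 0..255 back into a one-byte string.
$codes = [];
foreach (str_split('Hi') as $byte) {
    $codes[] = ord($byte);
}
echo implode(' ', $codes), "\n"; // 72 105

// ASCII only defines codes 0..127, that is, values that fit in 7 bits.
echo chr(72) . chr(105), "\n";   // Hi
```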

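To store non-ASCII text, many other encodings were created by adding an eighth bit. These can hold up to 256 characters; the first 128 traditionally matched ASCII, while everyone crammed whatever they wanted into the remaining half. It came to the point where every operating system vendor had its own set of encodings, each serving the needs of a relatively narrow group of people. The lack of a common standard made things worse: encodings could not be told apart algorithmically, and detection turned into something closer to guesswork (more on this in the next part).

As a result, a universal solution became necessary: an encoding that could store every possible character and account for the differences between writing systems (for example, the direction of writing). The problem was solved by creating Unicode, which can represent nearly all of the world's writing systems in a single character set.

The most popular encoding on the Web is UTF-8, which has a number of important advantages.

I would like to dwell on the last point. It means that, whereas previously you could do a simple table lookup and write down the result, now the way the result is stored depends on the number of bits needed to store it. The storage principle is shown in the table (x marks a bit that carries data):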
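Bits  Max stored value  Octet 1 (leading)  Octet 2   Octet 3   Octet 4
                                           (continuation octets)
7     U+007F            0xxxxxxx
11    U+07FF            110xxxxx           10xxxxxx
16    U+FFFF            1110xxxx           10xxxxxx  10xxxxxx
21    U+10FFFF *        11110xxx           10xxxxxx  10xxxxxx  10xxxxxx

* U+10FFFF is the limit set by the standard; the bit layout itself could hold values up to U+1FFFFF.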


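It is easy to see that the high bits of the first octet always hold a counter giving the number of bytes in the sequence: a run of leading ones followed by a zero. Note: when there is only one octet, it has no leading ones at all. Continuation octets always start with 10, so a leading octet is easy to tell apart from a continuation octet.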
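The counter can be read with a few bit masks. The sketch below is only an illustration: utf8SequenceLength() is a hypothetical helper name, and it looks at the counter bits alone without checking whether the rest of the sequence is valid.

```php
<?php
/**
 * Hypothetical helper: returns the sequence length announced by a leading
 * octet, or 0 when the byte is a continuation octet or carries no valid counter.
 */
function utf8SequenceLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0; // 10xxxxxx (continuation) or 11111xxx (not used by UTF-8)
}

echo utf8SequenceLength(0xD0), "\n"; // 2
echo utf8SequenceLength(0x48), "\n"; // 1
echo utf8SequenceLength(0x9F), "\n"; // 0
```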

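As an example, let's see how the string "Привет Hi" looks in UTF-8.

Step 1. Using the Unicode table, convert each character to its numeric representation (we use hexadecimal).

Привет Hi = 0x041F 0x0440 0x0438 0x0432 0x0435 0x0442 0x0020 0x0048 0x0069
Don't forget that the space is a character too.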

Step 2. Convert the numbers from hexadecimal to binary. I use the Windows 7 calculator (in Programmer mode).

0x041F = 0000 0100 0001 1111
0x0440 = 0000 0100 0100 0000
0x0438 = 0000 0100 0011 1000
0x0432 = 0000 0100 0011 0010
0x0435 = 0000 0100 0011 0101
0x0442 = 0000 0100 0100 0010
0x0020 = 0010 0000
0x0048 = 0100 1000
0x0069 = 0110 1001
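For clarity, I padded the high-order digits with zeros. Note that characters can occupy different numbers of bytes.

Step 3. Convert the numeric representation to a sequence of UTF-8 octets.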
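The same conversion can be done without a calculator. This is a small sketch using PHP's decbin() and str_pad(); the padding widths (8 or 16 bits) are chosen here only to match the listing above.

```php
<?php
// The nine code points from step 1, printed as hex and zero-padded binary.
$codePoints = [0x041F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442, 0x0020, 0x0048, 0x0069];
$lines = [];
foreach ($codePoints as $cp) {
    // Pad to 8 bits for one-byte values and to 16 bits otherwise.
    $width = $cp < 0x100 ? 8 : 16;
    $bits = str_pad(decbin($cp), $width, '0', STR_PAD_LEFT);
    // Group the bits into nibbles for readability.
    $lines[] = sprintf('0x%04X = %s', $cp, implode(' ', str_split($bits, 4)));
}
echo implode("\n", $lines), "\n";
```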

0x041F = 100 0001 1111 = 110xxxxx 10xxxxxx = 110 10000 10 011111
0x0440 = 100 0100 0000 = 110xxxxx 10xxxxxx = 110 10001 10 000000
0x0438 = 100 0011 1000 = 110xxxxx 10xxxxxx = 110 10000 10 111000
0x0432 = 100 0011 0010 = 110xxxxx 10xxxxxx = 110 10000 10 110010
0x0435 = 100 0011 0101 = 110xxxxx 10xxxxxx = 110 10000 10 110101
0x0442 = 100 0100 0010 = 110xxxxx 10xxxxxx = 110 10001 10 000010
0x0020 = 010 0000 = 0xxxxxxx = 0 0100000
0x0048 = 100 1000 = 0xxxxxxx = 0 1001000
0x0069 = 110 1001 = 0xxxxxxx = 0 1101001
The counter bits are the prefix of each octet (110, 10 or 0), written separately from the data bits. Note: characters with codes below 0x0080 are stored unchanged; this is the ASCII compatibility. You should also understand that for Russian text UTF-8 takes twice as much space (2 bytes per letter) as Windows-1251, which uses only 1 byte per letter.
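The bit splitting in step 3 can be reproduced in a few lines. The sketch below handles only the two-octet case for U+041F and then checks the size claim; the "\u{...}" escapes require PHP 7 or later.

```php
<?php
// Split U+041F into the 5 + 6 data bits of a two-octet sequence.
$cp = 0x041F;
$lead = 0b11000000 | ($cp >> 6);         // 110xxxxx: counter + top 5 bits
$cont = 0b10000000 | ($cp & 0b00111111); // 10xxxxxx: marker + low 6 bits
printf("%08b %08b\n", $lead, $cont);     // 11010000 10011111

// The same two bytes are what PHP stores for the string "\u{041F}".
var_dump(chr($lead) . chr($cont) === "\u{041F}"); // bool(true)

// Six Cyrillic letters occupy 12 bytes in UTF-8 (strlen() counts bytes).
echo strlen("\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}"), "\n"; // 12
```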

As the result, we can write out the whole sequence in a row (I hope there are no mistakes): "11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001".

You can check the result with code:
$tmp = '';
// Turn each group of 8 bits into one byte and append it to the string.
foreach (explode(' ', '11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001') as $octet) {
    $tmp .= chr(bindec($octet));
}
echo $tmp;


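To perform the reverse operation in code you need to (simplified):
  1. Determine the number of octets in the first character and remember this value.
  2. Strip the octet counter from the first byte and keep the remainder.
  3. For sequences longer than one octet, shift the value kept so far left by 6 bits and write in the low 6 bits of the next octet; do this for every continuation octet.
  4. Repeat from step 1 until satisfied :).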
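The steps above can be sketched as follows. This is a simplified illustration rather than the optimized code of this article: utf8ToCodePoints() is a hypothetical helper that assumes well-formed input and performs no validation at all.

```php
<?php
/**
 * Hypothetical helper: decodes a well-formed UTF-8 string into code points.
 * It does not detect overlong forms, surrogates or truncated input.
 */
function utf8ToCodePoints(string $s): array
{
    $result = [];
    $length = strlen($s);
    for ($i = 0; $i < $length;) {
        $byte = ord($s[$i++]);
        // Step 1: read the octet counter from the leading byte.
        // Step 2: mask the counter away and keep the remaining data bits.
        if ($byte < 0x80) {
            $count = 1; $cp = $byte;
        } elseif (($byte & 0xE0) === 0xC0) {
            $count = 2; $cp = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $count = 3; $cp = $byte & 0x0F;
        } else {
            $count = 4; $cp = $byte & 0x07;
        }
        // Step 3: shift left by 6 and add the low 6 bits of each continuation octet.
        for ($k = 1; $k < $count; $k++) {
            $cp = ($cp << 6) | (ord($s[$i++]) & 0x3F);
        }
        $result[] = $cp;
        // Step 4: the outer loop moves on to the next character.
    }
    return $result;
}

// The bytes of the example string from the walkthrough above.
$bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 Hi";
$hex = array_map(function ($cp) { return sprintf('0x%04X', $cp); }, utf8ToCodePoints($bytes));
echo implode(' ', $hex), "\n";
```

Running it on these bytes should give back the code points from step 1 of the walkthrough.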


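Optimized PHP code that obtains the numeric representation of a character and performs the reverse operation (I will publish the full version at the end of the series):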
class String_Multibyte
{
    /**
     * Returns the code point of the UTF-8 character that starts at offset $index in $char.
     * Overlong forms, surrogates, the private-use areas, the BOM and everything
     * up to 0x10FFFE-0x10FFFF in planes 15 and 16 yield FALSE.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char String (of bytes) to read from.
     * @param int &$index Offset of the leading octet; on success it is left on the last octet of the character.
     * @return int|false Code point, or FALSE if the sequence is rejected.
     */
    public function getCodePoint($char, &$index = 0)
    {
        // Read the leading octet.
        $octet1 = ord($char[$index]);
        // ASCII character (0bbb bbbb): return it as is.
        if ($octet1 >> 7 == 0x00) {
            return $octet1;
        } elseif ($octet1 >> 6 != 0x02) {
            // Leading octet of a multi-byte sequence: a second octet must exist.
            if (!isset($char[++$index])) {
                return false;
            }
            $octet2 = ord($char[$index]);
            // It must be a continuation octet (10bb bbbb).
            if ($octet2 >> 6 != 0x02) {
                --$index;
                return false;
            }
            // Keep the 6 data bits.
            $octet2 &= 0x3F;
            // Two-octet sequence (110b bbbb).
            if ($octet1 >> 5 == 0x06) {
                $result = ($octet1 & 0x1F) << 6 | $octet2;
                // Reject overlong forms (values below U+0080).
                if (0x80 <= $result) {
                    return $result;
                }
            } else {
                if (!isset($char[++$index])) {
                    return false;
                }
                $octet3 = ord($char[$index]);
                if ($octet3 >> 6 != 0x02) {
                    --$index;
                    return false;
                }
                $octet3 &= 0x3F;
                // Three-octet sequence (1110 bbbb).
                if ($octet1 >> 4 == 0x0E) {
                    $result = ($octet1 & 0x0F) << 12 | $octet2 << 6 | $octet3;
                    // Reject overlong forms; surrogates and the private-use area (U+D800..U+F8FF); the BOM.
                    if (0x800 <= $result && !(0xD7FF < $result && $result < 0xF900) && $result != 0xFEFF) {
                        return $result;
                    }
                } else {
                    if (!isset($char[++$index])) {
                        return false;
                    }
                    $octet4 = ord($char[$index]);
                    if ($octet4 >> 6 != 0x02) {
                        --$index;
                        return false;
                    }
                    $octet4 &= 0x3F;
                    // Four-octet sequence (1111 0bbb).
                    if ($octet1 >> 3 == 0x1E) {
                        $result = ($octet1 & 0x07) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
                        // Reject overlong forms and private-use planes 15 and 16;
                        // this also keeps the value within the Unicode limit of U+10FFFF.
                        if (0x10000 <= $result && $result < 0xF0000) {
                            return $result;
                        }
                    }
                }
            }
        }
        return false;
    }

    /**
     * Returns the UTF-8 character for a code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is out of range.
     */
    public function getChar($codePoint)
    {
        if ($codePoint < 0x80) {
            return chr($codePoint);
        } elseif ($codePoint < 0x800) {
            return chr(0xC0 | ($codePoint >> 6))
                . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x10000) {
            return chr(0xE0 | ($codePoint >> 12))
                . chr(0x80 | (($codePoint >> 6) & 0x3F))
                . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x110000) {
            return chr(0xF0 | ($codePoint >> 18))
                . chr(0x80 | (($codePoint >> 12) & 0x3F))
                . chr(0x80 | (($codePoint >> 6) & 0x3F))
                . chr(0x80 | ($codePoint & 0x3F));
        } else {
            return false;
        }
    }
}
  1. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  2. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  3. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  4. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  5. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  6. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  7. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  8. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  9. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  10. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
class String_Multibyte {
    /**
     * Returns the code point of the UTF-8 character that starts at byte $index of $char.
     * Malformed or overlong sequences, surrogates, private-use code points and the BOM yield FALSE.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char UTF-8 string (byte sequence).
     * @param int &$index Byte offset of the first octet; advanced as continuation octets are read.
     * @return int|false Code point, or FALSE if the sequence is rejected.
     */
    public function getCodePoint( $char, &$index = 0 ) {
        if ( !isset( $char[$index] ) ) {
            return false;
        }
        $octet1 = ord( $char[$index] );

        // ASCII (0bbb bbbb): the octet is the code point itself.
        if ( $octet1 >> 7 == 0x00 ) {
            return $octet1;
        }
        // A continuation octet (10bb bbbb) cannot start a character.
        if ( $octet1 >> 6 == 0x02 ) {
            return false;
        }

        // Second octet: must exist and be a continuation octet (10bb bbbb).
        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet2 = ord( $char[$index] );
        if ( $octet2 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        // Keep the 6 payload bits.
        $octet2 &= 0x3F;

        // Two-octet sequence (110b bbbb).
        if ( $octet1 >> 5 == 0x06 ) {
            $result = ( $octet1 & 0x1F ) << 6 | $octet2;
            // Reject overlong forms: two octets must encode at least U+0080.
            return ( 0x80 <= $result ) ? $result : false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet3 = ord( $char[$index] );
        if ( $octet3 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet3 &= 0x3F;

        // Three-octet sequence (1110 bbbb).
        if ( $octet1 >> 4 == 0x0E ) {
            $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3;
            // Reject overlong forms, surrogates and the BMP private-use area
            // (U+D800..U+F8FF), and the BOM (U+FEFF).
            if ( 0x800 <= $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) {
                return $result;
            }
            return false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet4 = ord( $char[$index] );
        if ( $octet4 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet4 &= 0x3F;

        // Four-octet sequence (1111 0bbb).
        if ( $octet1 >> 3 == 0x1E ) {
            $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
            // Reject overlong forms and everything from U+F0000 up:
            // the private-use planes and values beyond the Unicode limit U+10FFFF.
            if ( 0x10000 <= $result && $result < 0xF0000 ) {
                return $result;
            }
        }
        return false;
    }

    /**
     * Returns the UTF-8 character for a code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is out of range.
     */
    public function getChar( $codePoint ) {
        if ( $codePoint < 0 ) {
            return false;
        } elseif ( $codePoint < 0x80 ) {
            return chr( $codePoint );
        } elseif ( $codePoint < 0x800 ) {
            return chr( 0xC0 | ( $codePoint >> 6 ) )
                 . chr( 0x80 | ( $codePoint & 0x3F ) );
        } elseif ( $codePoint < 0x10000 ) {
            return chr( 0xE0 | ( $codePoint >> 12 ) )
                 . chr( 0x80 | ( ( $codePoint >> 6 ) & 0x3F ) )
                 . chr( 0x80 | ( $codePoint & 0x3F ) );
        } elseif ( $codePoint < 0x110000 ) {
            return chr( 0xF0 | ( $codePoint >> 18 ) )
                 . chr( 0x80 | ( ( $codePoint >> 12 ) & 0x3F ) )
                 . chr( 0x80 | ( ( $codePoint >> 6 ) & 0x3F ) )
                 . chr( 0x80 | ( $codePoint & 0x3F ) );
        } else {
            return false;
        }
    }
}
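As a quick sanity check on the bit arithmetic in the listing, the sketch below hand-encodes U+041F (the Cyrillic letter П) with the two-octet template and compares the result with PHP's built-in `mb_chr()` and `mb_ord()`. It assumes PHP 7.2 or later with the mbstring extension enabled. The helper name `encodeTwoOctets` is hypothetical and is not part of the class above.

```php
<?php
// Hypothetical helper (not part of String_Multibyte): encodes a code point
// in the range U+0080..U+07FF using the 110xxxxx 10xxxxxx template.
function encodeTwoOctets(int $codePoint): string
{
    if ($codePoint < 0x80 || $codePoint > 0x7FF) {
        throw new InvalidArgumentException('Code point does not fit in two octets');
    }
    // Upper 5 bits go into the start octet, lower 6 bits into the continuation octet.
    return chr(0xC0 | ($codePoint >> 6)) . chr(0x80 | ($codePoint & 0x3F));
}

$encoded = encodeTwoOctets(0x041F);
echo bin2hex($encoded), "\n";                    // d09f
var_dump($encoded === mb_chr(0x041F, 'UTF-8'));  // bool(true)
var_dump(mb_ord($encoded, 'UTF-8') === 0x041F);  // bool(true)
```

Comparing against the built-ins in this way is one possible approach to testing a hand-written encoder or decoder at the boundary values of each octet range (U+0080, U+07FF, U+0800 and so on).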
  12. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  13. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  14. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  15. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  16. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  17. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  18. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  19. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  20. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  21. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  22. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  23. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
class String_Multibyte
{
    /**
     * Returns the Unicode code point of the UTF-8 character that starts at byte $index of $char.
     * Overlong encodings, surrogates, private-use code points, the BOM and 0x10FFFE-0x10FFFF yield FALSE.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char   UTF-8 string (one or more characters).
     * @param int    &$index Byte offset of the character's first octet. It is advanced as
     *                       continuation octets are read, so on success it points at the
     *                       character's last octet.
     * @return int|false Code point, or FALSE if the sequence is invalid or rejected.
     */
    public function getCodePoint( $char, &$index = 0 )
    {
        // Lead octet
        $octet1 = ord( $char[$index] );

        // ASCII (0bbb bbbb): the octet is the code point.
        if ( $octet1 >> 7 == 0x00 ) {
            return $octet1;
        }

        // A continuation octet (10bb bbbb) cannot start a character.
        // (The original fell off the end of the function here and returned NULL.)
        if ( $octet1 >> 6 == 0x02 ) {
            return false;
        }

        // The second octet must exist ...
        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet2 = ord( $char[$index] );
        // ... and must be a continuation octet (10bb bbbb).
        if ( $octet2 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        // Keep the 6 payload bits.
        $octet2 &= 0x3F;

        // Two-octet sequence (110b bbbb)
        if ( $octet1 >> 5 == 0x06 ) {
            $result = ( $octet1 & 0x1F ) << 6 | $octet2;
            // Reject overlong encodings; U+0080 itself is valid (the original used a strict <).
            return 0x80 <= $result ? $result : false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet3 = ord( $char[$index] );
        if ( $octet3 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet3 &= 0x3F;

        // Three-octet sequence (1110 bbbb)
        if ( $octet1 >> 4 == 0x0E ) {
            $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3;
            // Reject overlong encodings; surrogates and the private-use area (U+D800..U+F8FF); the BOM.
            if ( 0x800 <= $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) {
                return $result;
            }
            return false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet4 = ord( $char[$index] );
        if ( $octet4 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet4 &= 0x3F;

        // Four-octet sequence (1111 0bbb)
        if ( $octet1 >> 3 == 0x1E ) {
            $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
            // Reject overlong encodings and everything from U+F0000 up
            // (the private-use planes and anything beyond the Unicode limit 10FFFF).
            if ( 0x10000 <= $result && $result < 0xF0000 ) {
                return $result;
            }
        }

        return false;
    }

    /**
     * Returns the UTF-8 encoding of a Unicode code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is above U+10FFFF.
     */
    public function getChar( $codePoint )
    {
        if ( $codePoint < 0x80 ) {
            // 1 octet: 0xxxxxxx
            return chr( $codePoint );
        } elseif ( $codePoint < 0x800 ) {
            // 2 octets: 110xxxxx 10xxxxxx
            return chr( 0xC0 | $codePoint >> 6 )
                 . chr( 0x80 | $codePoint & 0x3F );
        } elseif ( $codePoint < 0x10000 ) {
            // 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
            return chr( 0xE0 | $codePoint >> 12 )
                 . chr( 0x80 | $codePoint >> 6 & 0x3F )
                 . chr( 0x80 | $codePoint & 0x3F );
        } elseif ( $codePoint < 0x110000 ) {
            // 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return chr( 0xF0 | $codePoint >> 18 )
                 . chr( 0x80 | $codePoint >> 12 & 0x3F )
                 . chr( 0x80 | $codePoint >> 6 & 0x3F )
                 . chr( 0x80 | $codePoint & 0x3F );
        } else {
            return false;
        }
    }
}
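To sanity-check the two methods above, it may help to compare them with PHP's own `mb_chr()` and `mb_ord()`. The sketch below is only an illustration and assumes the mbstring extension on PHP 7.2 or later. The helper name `describeCodePoint` is hypothetical and not part of `String_Multibyte`. U+0048 and U+041F come from the walk-through earlier in the article, while U+20AC and U+1F600 are extra samples picked here to cover the three- and four-octet forms.

```php
<?php
// Hypothetical helper, not part of String_Multibyte: encodes a code point with
// mb_chr() and decodes it again with mb_ord() (mbstring extension, PHP 7.2+).
function describeCodePoint(int $codePoint): string
{
    $bytes = mb_chr($codePoint, 'UTF-8');   // code point -> UTF-8 octets
    $back  = mb_ord($bytes, 'UTF-8');       // UTF-8 octets -> code point

    return sprintf(
        'U+%04X -> %s (%d octet(s)) -> U+%04X',
        $codePoint,
        bin2hex($bytes),    // octets shown as lowercase hex
        strlen($bytes),     // strlen() counts bytes, not characters
        $back
    );
}

// One sample per sequence length: 1, 2, 3 and 4 octets.
foreach ([0x0048, 0x041F, 0x20AC, 0x1F600] as $codePoint) {
    echo describeCodePoint($codePoint), "\n";
}
```

For these four samples `getChar()` should produce the same octets. The results are expected to differ for code points that `getCodePoint()` deliberately rejects (the BOM and the private-use ranges), since the mbstring functions accept those.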
  38. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  39. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  40. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  41. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  42. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  43. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  44. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  45. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  46. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  47. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  48. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  49. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  64. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  65. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  66. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  67. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  68. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  69. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  70. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  71. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  72. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  73. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  74. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  75. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }

The getChar() method is taken from the Jevix library. I had already seen that code and remember it well, so even if I had rewritten it from memory, its author would still deserve the credit.

You can test the resulting class with the following code:

// Create an instance of the class
$obj = new String_Multibyte();

// Build the string in the form most convenient for testing
$tmp = '';
foreach (explode(' ', '11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001') as $octet) {
    $tmp .= chr(bindec($octet));
}

// Build a map of code points.
// getCodePoint() advances $i past the continuation octets by reference.
$map = array();
$len = strlen($tmp);
for ($i = 0; $i < $len; $i++) {
    // Compare strictly against false so that code point 0 is not dropped
    if (false !== ($result = $obj->getCodePoint($tmp, $i))) {
        $map[] = $result;
    }
}

// Clear the string and rebuild it from the map
$tmp = '';
$count = count($map);
for ($i = 0; $i < $count; $i++) {
    $tmp .= $obj->getChar($map[$i]);
}

// Print the restored string
echo $tmp, '<br />' . PHP_EOL;

// Check validity (this is the simplest way)
echo preg_match('#.{1}#u', $tmp) ? 'Valid Unicode' : 'Unknown', '<br />' . PHP_EOL;
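I did not try to write the prettiest or most correct test code, but it lets you change character values bit by bit and see the result immediately. Every invalid sequence is skipped, so the output string is always valid. That is not the whole story, though.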
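To see what flipping a single bit does, here is a minimal sketch that does not depend on String_Multibyte. The helper names isValidUtf8() and fromBinaryOctets() are my own, hypothetical ones, and the type hints assume PHP 7 or later. It relies only on the fact that preg_match() with the u modifier returns false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of String_Multibyte): true when $s is valid UTF-8.
// preg_match() with the /u modifier returns false for a malformed UTF-8 subject.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: turn space-separated binary octets into a byte string.
function fromBinaryOctets(string $bits): string
{
    $out = '';
    foreach (explode(' ', $bits) as $octet) {
        $out .= chr(bindec($octet));
    }
    return $out;
}

// U+041F (two octets), a space, then "Hi".
$good = fromBinaryOctets('11010000 10011111 00100000 01001000 01101001');
// Same bytes, but the top bit of the continuation octet is cleared
// (10011111 -> 00011111), so the start octet is left without a continuation.
$bad = fromBinaryOctets('11010000 00011111 00100000 01001000 01101001');

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
```

A check like this only tells you whether the whole string is well formed. It does not say which octet is broken, which is why a decoder that walks the string octet by octet is still useful.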

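To make sure the text contains nothing superfluous, you also need to strip unwanted characters (non-printable, non-marking, unassigned, surrogate, and so on) and then normalize it. More on that in the next part.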
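As a rough illustration of that kind of cleanup, the sketch below removes everything in the Unicode "Other" general category (control, format, private-use and unassigned characters) while keeping tab, CR and LF. The function name stripUnwanted() is hypothetical, the choice of categories is my assumption rather than the rule set this series will settle on, and the nullable return type assumes PHP 7.1 or later. Note that the format category also contains characters some scripts legitimately need, such as joiners, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: drop characters in the Unicode "Other" category (\p{C}),
// keeping tab, CR and LF. The class [^\P{C}\t\r\n] reads as
// "anything that is in \p{C} and is not tab, CR or LF".
// Returns null when $s is not valid UTF-8, because preg_replace() fails on it.
function stripUnwanted(string $s): ?string
{
    return preg_replace('/[^\P{C}\t\r\n]+/u', '', $s);
}

// Contains NUL, ZERO WIDTH SPACE (U+200B, bytes E2 80 8B) and BEL.
$dirty = "Hi\x00 \xE2\x80\x8Bthere\x07\n";
$clean = stripUnwanted($dirty);

var_dump($clean === "Hi there\n"); // bool(true)
```

Surrogates never reach the pattern: a string containing encoded surrogates is not valid UTF-8, so preg_replace() returns null for it. Normalization is a separate step and is not attempted here.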

P.S.
The following parts will cover normalization, security, encoding detection, and working with UTF-8 in PHP.


Source: https://habr.com/ru/post/J113715/

