Counting words in a text

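Hello, Habr! Quite recently I learned that in works of Russian literature the letter "o" is more popular than any other letter, and I immediately remembered a long-standing idea of mine: to write a simple script that builds a list of the most frequently used words in a given text.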

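I sometimes need to read texts in English, but my vocabulary is not rich enough to understand everything on the fly, so I often have to break off and reach for a dictionary. Many words come up again and again, yet a translation does not always stick in my head after the first encounter. This is where such a "top" list comes to the rescue. The idea is very simple: the input is a source text, and the output is a list of the N most frequently used words, which can then be fed to a translator to get a dictionary for that text.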

PHP is the only language that has left a mark in my head, and since all I need from the script is the result, I decided to write it in PHP:
    #!/usr/bin/php
    <?php
    // Print usage information if no input file was given.
    if (!isset($argv[1])) {
        die('Usage: ./wtop filename [lines] [filter]' . PHP_EOL .
            'Use -1 lines for show all words' . PHP_EOL);
    }

    if (!file_exists($argv[1]) || !is_readable($argv[1])) {
        die('Data file not found or can not be read' . PHP_EOL);
    }

    // Optional third argument: a file with words to exclude from the list.
    if (isset($argv[3])) {
        if (file_exists($argv[3]) && is_readable($argv[3])) {
            $filter = str_word_count(file_get_contents($argv[3]), 1);
        } else {
            die('Filter file not found or can not be read' . PHP_EOL);
        }
    }

    // Number of lines to print; -1 means no limit.
    $lines = isset($argv[2]) ? (int) $argv[2] : -1;

    $data = file_get_contents($argv[1]);
    $words = str_word_count($data, 1);

    // Count occurrences of each lowercased word.
    $result = [];
    foreach ($words as $word) {
        $word = strtolower($word);
        if (isset($filter) && in_array($word, $filter)) {
            continue;
        }
        if (isset($result[$word])) {
            $result[$word] += 1;
        } else {
            $result[$word] = 1;
        }
    }

    // Sort by count in descending order, keeping the word keys.
    arsort($result);

    foreach ($result as $word => $count) {
        if ($lines-- == 0) {
            break;
        }
        echo $count . ' ' . $word . PHP_EOL;
    }

Example run:
stream@sapphire:~/development$ cat text.txt
With PHP breaking new ground in the enterprise arena, the establishment of a rati-
fied certification was, some might say, inevitable. However, for me, it couldn't come
soon enough—and I was ecstatic when Zend launched their PHP 4 Certification.
With more than 1,500 certified engineers to date, there is no doubt that their en-
deavour has been a success.
Now, with the introduction of the long-awaited PHP 5 certification, Zend has once
again raised the bar for PHP developers everywhere. This examination is much
broader, and requires much more than just theoretical knowledge—in order to pass
the test, candidates need real-world knowledge in addition to a solid theoretical
background.
The effect of the PHP 5 certification, for me, is even more profound than that of
the original certification, and I believe that it will become the gold standard for those
looking to hire PHP-centric Web Developers. I think that it is apt to consider Zend's
work a job well done, and to applaud those who invest the time and effort needed to
become Zend Certified Engineers.
stream@sapphire:~/development$ cat filter.txt
a the
am are
i you we
stream@sapphire:~/development$ ./wtop text.txt 10 filter.txt
7 to
5 and
5 certification
5 php
4 for
4 is
4 that
4 of
4 zend
3 more
stream@sapphire:~/development$
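
For anyone who wants to reuse the counting step outside the command-line script, here is a minimal sketch of the same logic wrapped in a function. The name `topWords` and its parameters are my own and are not part of the original script. It assumes PHP 7 or newer (for type declarations and the `??` operator). It also differs from the script in two ways: the filter is turned into a lookup table with `array_flip` instead of being scanned with `in_array`, and `arsort` is called without the call-time `&`, which PHP 5.4 and later reject.

```php
<?php
// Hypothetical helper (not from the original script): counts the words in $text,
// skips every word listed in $filter, and returns the $limit most frequent ones.
function topWords(string $text, int $limit = -1, array $filter = []): array
{
    // Flip the filter so that membership checks are hash lookups
    // rather than a linear in_array() scan for every word.
    $skip = array_flip(array_map('strtolower', $filter));

    $result = [];
    foreach (str_word_count($text, 1) as $word) {
        $word = strtolower($word);
        if (isset($skip[$word])) {
            continue;
        }
        $result[$word] = ($result[$word] ?? 0) + 1;
    }

    // arsort() already takes its argument by reference, so no "&" at the call site.
    arsort($result);

    // A negative limit means "show all words", as in the script's usage message.
    return $limit < 0 ? $result : array_slice($result, 0, $limit, true);
}

// Usage: the single most frequent word, ignoring "the".
foreach (topWords('The cat and the dog and the bird', 1, ['the']) as $word => $count) {
    echo $count . ' ' . $word . PHP_EOL; // prints "2 and"
}
```

Like the script, this relies on `str_word_count`, which by default only recognizes Latin letters, apostrophes and hyphens, so it suits English text and would need a different tokenizer for other alphabets.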


I hope this helps someone. Good luck!

Update: rewrote the code a little (thanks to DevMan) and added filter support.

Source: https://habr.com/ru/post/J92770/

