Good day, everyone!
As promised, we are sharing another piece of material that we studied while preparing the PHP course. We hope you find it interesting and useful.
Introduction
Lately, everyone seems to be talking about machine learning. Social media feeds are full of posts about ML, Python, TensorFlow, Spark, Scala, Go and so on. And if you and I have anything in common, you are asking: what about PHP?
Yes, what about machine learning and PHP? Fortunately, someone was crazy enough not only to ask that question but also to develop a general-purpose machine learning library that we can use in our next project. In this post we will look at PHP-ML, a machine learning library for PHP, and then write a sentiment analysis class that we can later reuse for our own chat or tweet bot. The main goals of this post are:
- Learn the general concepts behind machine learning and text sentiment analysis.
- Review the capabilities and shortcomings of PHP-ML.
- Define the problem we are going to solve.
- Show that trying machine learning in PHP is not a completely crazy goal (optional).
What is machine learning?
Machine learning is a subset of artificial intelligence research that focuses on giving "computers the ability to learn without being explicitly programmed". This is achieved with generic algorithms that can "learn" from a particular data set.
For example, one common use of machine learning is classification. Classification algorithms are used to put data into different groups or categories. Some examples of classification applications:
- Email spam filters
- Market segmentation packages
- Fraud protection systems
Machine learning is an umbrella term that covers many generic algorithms for different tasks. There are two main types of algorithm, classified by how they learn: supervised learning and unsupervised learning.
Supervised learning
In supervised learning, we train the algorithm using labeled training data in the form of input objects (vectors) and the desired output values. The algorithm analyzes the training data and produces what is called a target function, which can then be applied to a new, unlabeled data set.
For the rest of this post we will focus on supervised learning, simply because it is easier to visualize and validate. Keep in mind that both kinds of algorithm are equally important and interesting. Some would argue that unsupervised learning is the more useful of the two because it removes the need for labeled training data.
Unsupervised learning
In contrast, this type of learning works without labeled training data from the start. We do not know the desired output values for the data set, so we let the algorithm draw conclusions from the samples alone. Unsupervised learning is especially handy for revealing hidden patterns in data.
PHP-ML
Meet PHP-ML, a library that claims to be a fresh approach to machine learning in PHP. The library implements algorithms, neural networks, and tools for data preprocessing, cross-validation, and feature extraction.
I will be the first to admit that PHP is an unusual choice for machine learning, since the language's strengths are not well suited to implementing it. That said, not every machine learning application needs to process petabytes of data and run massive computations. For simple applications, PHP and PHP-ML should be enough.
The best use case I can see for this library right now is implementing a classifier, whether that is something like a spam filter or text sentiment analysis. To see how PHP-ML can be used in a project, we will define a classification problem and build a solution step by step.
The problem
To walk through the process of implementing PHP-ML and adding machine learning to an application, I wanted to find an interesting problem to solve, and the best way to showcase a classifier is to build a tweet sentiment analysis class.
One of the key requirements for a successful machine learning project is a reliable source data set. The data set matters because it lets us train the classifier on examples that have already been classified. With all the recent noise in the media around airlines, what better data to use than tweets from airline customers?
Fortunately, a data set of tweets is already available to us thanks to Kaggle. The Twitter US Airline Sentiment data set can be downloaded from the Kaggle website.
The solution
Let's start by examining the data set we will be working with. The raw data set has the following columns:
- tweet_id
- airline_sentiment
- airline_sentiment_confidence
- negativereason
- negativereason_confidence
- airline
- airline_sentiment_gold
- name
- negativereason_gold
- retweet_count
- text
- tweet_coord
- tweet_created
- tweet_location
- user_timezone
And here is what a sample looks like:

The file contains 14,640 tweets, which is a decent-sized data set for our purposes. With this many columns available, we have more data than the example needs. For practical purposes we only care about the following columns:
- text
- airline_sentiment
Here text is the feature and airline_sentiment is the target. The remaining columns are not used in this exercise and can be dropped. Let's start by creating the project and initializing Composer with the following file:
{
    "name": "amacgregor/phpml-exercise",
    "description": "Example implementation of a Tweet sentiment analysis with PHP-ML",
    "type": "project",
    "require": {
        "php-ai/php-ml": "^0.4.1"
    },
    "license": "Apache License 2.0",
    "authors": [
        {
            "name": "Allan MacGregor",
            "email": "amacgregor@allanmacgregor.com"
        }
    ],
    "autoload": {
        "psr-4": {"PhpmlExercise\\": "src/"}
    },
    "minimum-stability": "dev"
}
composer install
If you need an introduction to Composer, read up on it before continuing.
To make sure everything is installed correctly, let's write a quick script that loads the Tweets.csv data file and verifies that it contains the data we need. Copy the following code into the project root as reviewDataset.php:
<?php

namespace PhpmlExercise;

require __DIR__ . '/vendor/autoload.php';

use Phpml\Dataset\CsvDataset;

$dataset = new CsvDataset('datasets/raw/Tweets.csv', 1);

foreach ($dataset->getSamples() as $sample) {
    print_r($sample);
}
Next, run the reviewDataset.php script and check the output:
Array( [0] => 569587371693355008 )
Array( [0] => 569587242672398336 )
Array( [0] => 569587188687634433 )
Array( [0] => 569587140490866689 )
That does not look very useful yet, does it? Let's look at the CsvDataset class to get a better idea of what is happening internally:
<?php

public function __construct(string $filepath, int $features, bool $headingRow = true)
{
    if (!file_exists($filepath)) {
        throw FileException::missingFile(basename($filepath));
    }

    if (false === $handle = fopen($filepath, 'rb')) {
        throw FileException::cantOpenFile(basename($filepath));
    }

    if ($headingRow) {
        $data = fgetcsv($handle, 1000, ',');
        $this->columnNames = array_slice($data, 0, $features);
    } else {
        $this->columnNames = range(0, $features - 1);
    }

    while (($data = fgetcsv($handle, 1000, ',')) !== false) {
        $this->samples[] = array_slice($data, 0, $features);
        $this->targets[] = $data[$features];
    }

    fclose($handle);
}
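The CsvDataset constructor takes three arguments:
- The path to the source CSV file
- An integer specifying the number of features in the file
- A boolean indicating whether the first row is a header
Looking a little closer, we can see that the class splits the CSV file into two internal arrays: samples and targets. Samples contains all the features provided by the file, and targets contains the known values (negative, positive, or neutral). This also explains the output above: with a feature count of 1, only the first column (tweet_id) is treated as a feature.
Based on the above, we can see that the CSV file needs to have the following format: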
| feature_1 | feature_2 | feature_n | target |
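To make the samples and targets split concrete, here is a minimal plain-PHP sketch of the same idea, independent of PHP-ML. The function name splitSamplesAndTargets and the two example rows are made up for illustration; this is not the library's code.

```php
<?php

/**
 * Hypothetical helper: split parsed CSV rows into samples and targets,
 * following the "features first, target next" layout described above.
 */
function splitSamplesAndTargets(array $rows, int $features): array
{
    $samples = [];
    $targets = [];

    foreach ($rows as $row) {
        $samples[] = array_slice($row, 0, $features); // first N columns are features
        $targets[] = $row[$features];                 // the following column is the target
    }

    return [$samples, $targets];
}

// Made-up rows in the "feature_1 | target" layout.
$rows = [
    ['great flight', 'positive'],
    ['lost my bag', 'negative'],
];

[$samples, $targets] = splitSamplesAndTargets($rows, 1);

print_r($samples); // one-element arrays holding the tweet text
print_r($targets); // the matching sentiment labels
```

With a feature count of 1, each sample is a one-element array, which is why the later steps read the text with an index of 0.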
We need to generate a clean data set containing only the columns we need to continue working. Let's call this script generateCleanDataset.php:
<?php

namespace PhpmlExercise;

require __DIR__ . '/vendor/autoload.php';

use Phpml\Exception\FileException;

$sourceFilepath = __DIR__ . '/datasets/raw/Tweets.csv';
$destinationFilepath = __DIR__ . '/datasets/clean_tweets.csv';

$rows = [];
$rows = getRows($sourceFilepath, $rows);
writeRows($destinationFilepath, $rows);

// Read the raw CSV and keep only the text (column 10) and the sentiment (column 1).
function getRows($filepath, $rows)
{
    $handle = checkFilePermissions($filepath);

    while (($data = fgetcsv($handle, 1000, ',')) !== false) {
        $rows[] = [$data[10], $data[1]];
    }

    fclose($handle);

    return $rows;
}

// Open a file, throwing a FileException if it is missing or cannot be opened.
function checkFilePermissions($filepath, $mode = 'rb')
{
    // The destination file does not exist before the first run, so only
    // require the file to exist when it is opened for reading.
    if ($mode[0] === 'r' && !file_exists($filepath)) {
        throw FileException::missingFile(basename($filepath));
    }

    if (false === $handle = fopen($filepath, $mode)) {
        throw FileException::cantOpenFile(basename($filepath));
    }

    return $handle;
}

// Write the selected columns to the clean CSV file.
function writeRows($filepath, $rows)
{
    $handle = checkFilePermissions($filepath, 'wb');

    foreach ($rows as $row) {
        fputcsv($handle, $row);
    }

    fclose($handle);
}
Nothing too complex, just enough to do the job. Let's run it:
php generateCleanDataset.php
Now let's go ahead and point the reviewDataset.php script at the clean data set:
Array ( [0] => @AmericanAir That will be the third time I have been called by 800-433-7300 an hung on before anyone speaks. What do I do now??? )
Array ( [0] => @AmericanAir How clueless is AA. Been waiting to hear for 2.5 weeks about a refund from a Cancelled Flightled flight & been on hold now for 1hr 49min )
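Bam! This is data we can work with! So far we have been writing simple scripts to manipulate the data. Next, let's create a new class in src/Classification/SentimentAnalysis.php (the path follows from the PSR-4 mapping in composer.json and the class namespace):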
<?php

namespace PhpmlExercise\Classification;

class SentimentAnalysis
{
    public function train() {}

    public function predict() {}
}
The SentimentAnalysis class needs two methods:
- A train method, which takes the data set samples and labels, plus some optional parameters.
- A predict method, which takes an unlabeled data set and assigns it a set of labels based on the training data.
In the project root, create a script called classifyTweets.php. We will use it to create and test our sentiment analysis class. Here is the template we will use:
<?php

namespace PhpmlExercise;

use PhpmlExercise\Classification\SentimentAnalysis;

require __DIR__ . '/vendor/autoload.php';
Step 1: Load the data set
We already have the basic code for loading the CSV into a dataset object from the earlier examples. We will use the same code with a few small changes:
<?php
...
use Phpml\Dataset\CsvDataset;
...
$dataset = new CsvDataset('datasets/clean_tweets.csv', 1);

$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}
This produces a flat array containing only the features, in this case the tweet text, which we will use to train our classifier.
Step 2: Prepare the data set
Passing the raw text to a classifier would be neither useful nor accurate, because every tweet is essentially different. Fortunately, there are ways of handling text when applying classification or other machine learning algorithms. In this example we will use the following two classes:
- Token Count Vectorizer: this class transforms a collection of text samples into vectors of token counts. Essentially, every word in the tweets becomes a unique number, and the vectorizer keeps track of how many times each word occurs in a given text sample.
- Tf-idf Transformer: short for term frequency-inverse document frequency, a numerical statistic used to evaluate how important a word is to a document in a collection or corpus.
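The token counting described above can be sketched in plain PHP. This is only an illustration of the idea, not PHP-ML's implementation; the countTokens function, its simple regex tokenizer, and the two sample texts are assumptions made for the example.

```php
<?php

/**
 * Hypothetical helper: build token-count vectors for a list of texts.
 * Returns [vocabulary, vectors], where vocabulary maps word => column index.
 */
function countTokens(array $texts): array
{
    $vocabulary = [];
    $tokenized = [];

    foreach ($texts as $text) {
        // Naive tokenizer: lowercase, then grab runs of letters, digits and apostrophes.
        preg_match_all("/[a-z0-9']+/", strtolower($text), $matches);
        $tokenized[] = $matches[0];

        foreach ($matches[0] as $token) {
            if (!isset($vocabulary[$token])) {
                $vocabulary[$token] = count($vocabulary); // next free column index
            }
        }
    }

    // Turn every text into a vector with one counter per known word.
    $vectors = [];
    foreach ($tokenized as $tokens) {
        $vector = array_fill(0, count($vocabulary), 0);
        foreach ($tokens as $token) {
            $vector[$vocabulary[$token]]++;
        }
        $vectors[] = $vector;
    }

    return [$vocabulary, $vectors];
}

[$vocabulary, $vectors] = countTokens(['late late flight', 'great flight']);

print_r($vocabulary); // late => 0, flight => 1, great => 2
print_r($vectors);    // [2, 1, 0] and [0, 1, 1]
```

Every sample ends up with the same number of columns, one per word in the vocabulary, which is what lets a classifier compare tweets of different lengths.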
Let's start with the text vectorizer:
<?php
...
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
...
$vectorizer = new TokenCountVectorizer(new WordTokenizer());

$vectorizer->fit($samples);
$vectorizer->transform($samples);
Next, apply the Tf-idf Transformer:
<?php
...
use Phpml\FeatureExtraction\TfIdfTransformer;
...
$tfIdfTransformer = new TfIdfTransformer();

$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
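To get a feel for what the tf-idf weighting does to the count vectors, here is a small plain-PHP sketch. It assumes the textbook variant idf = ln(N / df); the exact formula inside PHP-ML may differ, and the tfIdf function and the sample counts are made up for illustration.

```php
<?php

/**
 * Hypothetical helper: weight token-count vectors by inverse document frequency.
 * Assumes idf = ln(N / df), one common textbook variant.
 */
function tfIdf(array $vectors): array
{
    $documentCount = count($vectors);
    $columnCount = $documentCount > 0 ? count($vectors[0]) : 0;

    // Document frequency: in how many samples does each token appear?
    $documentFrequency = array_fill(0, $columnCount, 0);
    foreach ($vectors as $vector) {
        foreach ($vector as $column => $count) {
            if ($count > 0) {
                $documentFrequency[$column]++;
            }
        }
    }

    // Multiply every count by the idf of its column.
    $weighted = [];
    foreach ($vectors as $vector) {
        $row = [];
        foreach ($vector as $column => $count) {
            $idf = $documentFrequency[$column] > 0
                ? log($documentCount / $documentFrequency[$column])
                : 0.0;
            $row[] = $count * $idf;
        }
        $weighted[] = $row;
    }

    return $weighted;
}

// Columns: late, flight, great (made-up counts for two tweets).
$weighted = tfIdf([[2, 1, 0], [0, 1, 1]]);

print_r($weighted);
```

In this toy input the word "flight" appears in both tweets, so its weight drops to zero, while "late" and "great" keep a positive weight because each is specific to one tweet.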
Our samples array is now in a format the classifier can understand. We are not done yet: we still need to label each sample with its corresponding sentiment.
Step 3: Generate the training data set
Fortunately, PHP-ML already knows how to do this, and the code is quite simple:
<?php
...
use Phpml\Dataset\ArrayDataset;
...
$dataset = new ArrayDataset($samples, $dataset->getTargets());
We could go ahead and train the classifier with this data set. However, we do not have a separate test data set to validate against, so we will cheat a little and split the original data set into two parts: a training set and a much smaller set used to check the model's accuracy.
<?php
...
use Phpml\CrossValidation\StratifiedRandomSplit;
...
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);

$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels = $randomSplit->getTrainLabels();

$testSamples = $randomSplit->getTestSamples();
$testLabels = $randomSplit->getTestLabels();
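The idea behind a stratified split is that every label keeps roughly the same share in the training and test parts. A minimal plain-PHP sketch of that idea follows; it is not PHP-ML's code, and the stratifiedSplit function, the fixed seed, and the sample data are assumptions for illustration.

```php
<?php

/**
 * Hypothetical helper: stratified train/test split.
 * Keeps roughly the same share of every label in both parts.
 */
function stratifiedSplit(array $samples, array $labels, float $testSize, int $seed = 42): array
{
    mt_srand($seed); // fixed seed so the sketch is repeatable

    // Group sample indices by label.
    $byLabel = [];
    foreach ($labels as $index => $label) {
        $byLabel[$label][] = $index;
    }

    $train = ['samples' => [], 'labels' => []];
    $test = ['samples' => [], 'labels' => []];

    // Split each label group separately so the proportions are preserved.
    foreach ($byLabel as $indices) {
        shuffle($indices);
        $testCount = (int) round(count($indices) * $testSize);

        foreach ($indices as $position => $index) {
            if ($position < $testCount) {
                $test['samples'][] = $samples[$index];
                $test['labels'][] = $labels[$index];
            } else {
                $train['samples'][] = $samples[$index];
                $train['labels'][] = $labels[$index];
            }
        }
    }

    return [$train, $test];
}

// Made-up data: 10 negative and 10 positive samples.
$samples = range(1, 20);
$labels = array_merge(array_fill(0, 10, 'negative'), array_fill(0, 10, 'positive'));

[$train, $test] = stratifiedSplit($samples, $labels, 0.2);

echo count($train['samples']) . " training samples, " . count($test['samples']) . " test samples\n";
print_r(array_count_values($test['labels']));
```

With a test size of 0.2, the test part receives two samples of each label and the remaining sixteen go to training.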
This approach is called cross-validation. The term comes from statistics and can be defined as follows:
Cross-validation is a method for evaluating an analytical model and its behavior on independent data. When evaluating the model, the available data is split into k parts. The model is then trained on k-1 parts of the data, and the remaining part is used for testing. The procedure is repeated k times, so that in the end each of the k parts is used for testing. The result is an estimate of how effective the chosen model is, with the available data used as evenly as possible.
-Wikipedia.com
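The k-part procedure in the definition above can be sketched in a few lines of plain PHP. The kFoldIndices function is hypothetical and only produces the index sets for each round; a single random split like the one used in this post can be seen as one such round.

```php
<?php

/**
 * Hypothetical helper: split sample indices into k folds and return,
 * for each round, which indices to train on and which to test on.
 */
function kFoldIndices(int $sampleCount, int $k): array
{
    $folds = array_fill(0, $k, []);
    for ($i = 0; $i < $sampleCount; $i++) {
        $folds[$i % $k][] = $i; // deal indices out round-robin
    }

    $rounds = [];
    for ($testFold = 0; $testFold < $k; $testFold++) {
        $train = [];
        foreach ($folds as $fold => $indices) {
            if ($fold !== $testFold) {
                $train = array_merge($train, $indices); // k-1 folds for training
            }
        }
        $rounds[] = ['train' => $train, 'test' => $folds[$testFold]];
    }

    return $rounds;
}

$rounds = kFoldIndices(6, 3);

foreach ($rounds as $round => $split) {
    echo "Round $round: train on [" . implode(', ', $split['train'])
        . "], test on [" . implode(', ', $split['test']) . "]\n";
}
```

In a real run you would train and score the model once per round and average the scores, so that every sample is used for testing exactly once.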
Step 4: Train the classifier
Finally, we are ready to go back and implement the SentimentAnalysis class. If you have not noticed by now, a large part of machine learning is about gathering and manipulating data. The actual implementation of machine learning models tends to be a lot less involved.
We have three classification algorithms to choose from for our sentiment analysis class:
- Support Vector Classification (SVC)
- K-Nearest Neighbors (KNearestNeighbors)
- Naive Bayes (NaiveBayes)
For this exercise we will use the simplest of them, the NaiveBayes classifier, so let's go ahead and update our class to implement the train method:
<?php

namespace PhpmlExercise\Classification;

use Phpml\Classification\NaiveBayes;

class SentimentAnalysis
{
    protected $classifier;

    public function __construct()
    {
        $this->classifier = new NaiveBayes();
    }

    public function train($samples, $labels)
    {
        $this->classifier->train($samples, $labels);
    }
}
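If you are curious what a Naive Bayes classifier does with vectors like ours, the following plain-PHP sketch shows one textbook variant: multinomial Naive Bayes with Laplace smoothing. It is a teaching example only. The TinyNaiveBayes class and its data are made up, and PHP-ML's NaiveBayes class may well use a different variant internally.

```php
<?php

/**
 * Illustrative multinomial Naive Bayes over token-count vectors.
 * Not PHP-ML's implementation; a teaching sketch with Laplace smoothing.
 */
class TinyNaiveBayes
{
    private array $logPriors = [];
    private array $logLikelihoods = [];

    public function train(array $samples, array $labels): void
    {
        $columns = count($samples[0]);
        $docsPerLabel = array_count_values($labels);
        $tokenTotals = [];

        // Sum the token counts of every sample, per label.
        foreach ($samples as $i => $sample) {
            $label = $labels[$i];
            $tokenTotals[$label] = $tokenTotals[$label] ?? array_fill(0, $columns, 0);
            foreach ($sample as $column => $count) {
                $tokenTotals[$label][$column] += $count;
            }
        }

        foreach ($tokenTotals as $label => $totals) {
            $this->logPriors[$label] = log($docsPerLabel[$label] / count($labels));
            $denominator = array_sum($totals) + $columns; // +1 per column (Laplace)
            foreach ($totals as $column => $total) {
                $this->logLikelihoods[$label][$column] = log(($total + 1) / $denominator);
            }
        }
    }

    public function predict(array $sample): string
    {
        $bestLabel = '';
        $bestScore = -INF;

        // Pick the label with the highest log prior + weighted log likelihoods.
        foreach ($this->logPriors as $label => $logPrior) {
            $score = $logPrior;
            foreach ($sample as $column => $count) {
                $score += $count * $this->logLikelihoods[$label][$column];
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $bestLabel = $label;
            }
        }

        return $bestLabel;
    }
}

// Columns: late, great, flight (made-up counts).
$model = new TinyNaiveBayes();
$model->train(
    [[2, 0, 1], [1, 0, 1], [0, 2, 1], [0, 1, 1]],
    ['negative', 'negative', 'positive', 'positive']
);

echo $model->predict([1, 0, 1]), "\n"; // a tweet mentioning "late" and "flight"
```

The "naive" part is the assumption that words contribute independently, which is why the score is just a sum of per-word terms.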
As you can see, we are letting PHP-ML do all the heavy lifting. We are just creating a nice little abstraction for our project. But how do we know whether the classifier is actually trained and working? Time to use our testSamples and testLabels.
Step 5: Validate the classifier's accuracy
Before we can go ahead and test our classifier, we need to implement the predict method:
<?php
...
class SentimentAnalysis
{
    ...
    public function predict($samples)
    {
        return $this->classifier->predict($samples);
    }
}
Once again, PHP-ML does the work for us. Let's update the classifyTweets script accordingly:
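<?php
...
// Train on the training split, then predict labels for the held-out test split.
$classifier = new SentimentAnalysis();
$classifier->train($trainingSamples, $trainingLabels);

$predictedLabels = $classifier->predict($testSamples);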
Finally, we need a way to measure the accuracy of our trained model. Fortunately, PHP-ML has that covered too, with several metric classes. In our case we are interested in the accuracy of the model. Let's look at the code:
<?php
...
use Phpml\Metric\Accuracy;
...
echo 'Accuracy: ' . Accuracy::score($testLabels, $predictedLabels);
You should see something like this:
Accuracy: 0.73651877133106
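The score is a fraction between 0 and 1, so a value of about 0.7365 means that roughly 73.7% of the test tweets were labeled correctly. As a rough plain-PHP sketch of what such a metric computes (the accuracy function and the labels below are made up for illustration, not taken from PHP-ML):

```php
<?php

/**
 * Hypothetical helper: share of predictions that match the known labels.
 */
function accuracy(array $actual, array $predicted): float
{
    if (count($actual) !== count($predicted) || count($actual) === 0) {
        throw new InvalidArgumentException('Both arrays must be non-empty and of equal size.');
    }

    $correct = 0;
    foreach ($actual as $i => $label) {
        if ($label === $predicted[$i]) {
            $correct++;
        }
    }

    return $correct / count($actual);
}

$actual = ['negative', 'negative', 'positive', 'neutral'];
$predicted = ['negative', 'positive', 'positive', 'neutral'];

$score = accuracy($actual, $predicted);

echo 'Accuracy: ' . $score . "\n"; // 3 of 4 predictions are correct
```

Accuracy alone can be misleading when one label dominates the data set, so it is worth checking the label distribution before reading too much into a single number.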
This article turned out to be rather long, so let's recap what we have learned:
- Having a good data set from the start is crucial for implementing machine learning algorithms.
- The difference between supervised and unsupervised learning.
- The meaning and use of cross-validation in machine learning.
- Vectorization and transformation are essential for preparing text data sets for machine learning.
- How to implement tweet sentiment analysis with PHP-ML's NaiveBayes classifier.
This post also serves as an introduction to the PHP-ML library, and hopefully it has given you a good idea of what the library can do and how it can be integrated into your own projects.
Finally, this post is by no means comprehensive, and there is plenty left to learn, improve, and experiment with. Here are a few ideas to get you started on where to go next:
- Replace the NaiveBayes algorithm with support vector classification.
- If you try to run against the full data set (14,000+ rows), you will probably notice how memory-intensive the process is. Try implementing model persistence so that the model does not have to be trained on every run.
- Move the data set generation into its own helper class.
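As a starting point for the persistence idea, here is a minimal sketch that caches an object on disk with PHP's built-in serialize() and unserialize(). It assumes the trained classifier object is serializable; the saveModel and loadModel helpers, the file name, and the stand-in model are all hypothetical.

```php
<?php

/**
 * Hypothetical helpers: cache a trained model on disk with PHP's
 * built-in serialize()/unserialize() so later runs can skip training.
 */
function saveModel(object $model, string $path): void
{
    if (file_put_contents($path, serialize($model)) === false) {
        throw new RuntimeException("Could not write model to $path");
    }
}

function loadModel(string $path): object
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Could not read model from $path");
    }

    // Only unserialize files you wrote yourself; the format is not safe for untrusted input.
    return unserialize($data);
}

// Stand-in for a trained classifier; any serializable object works the same way.
$model = new stdClass();
$model->labels = ['negative', 'neutral', 'positive'];

$path = sys_get_temp_dir() . '/sentiment-model.bin';
saveModel($model, $path);

$restored = loadModel($path);
print_r($restored->labels);
```

In the classifyTweets script, this would mean training only when the cached file is missing and loading the saved model otherwise.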
The end
As always, we look forward to your comments and opinions.