This article looks at functional programming in Perl, using as an example a script that finds broken links with AnyEvent::HTTP. The following topics are covered:
- anonymous subroutines;
- closures;
- callbacks.
Anonymous subroutines
An anonymous subroutine is declared like an ordinary subroutine, except that there is no name between the sub keyword and the opening brace of the block. In addition, this form is treated as part of an expression, so in most cases the declaration of an anonymous subroutine must end with a semicolon or another expression delimiter.
sub { ... ... };
For example, let's implement a subroutine that triples the value passed to it:
use feature 'say';
my $triple = sub {
    my $val = shift;
    return 3 * $val;
};
say $triple->(2);    # 6
The main advantage of anonymous subroutines is that they let you treat code as data: the code is stored in a variable (for example, passed to a function as a callback) and executed later.
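To make the "code as data" point concrete, here is a minimal sketch of a dispatch table: anonymous subroutines stored as hash values and chosen at run time. The table %ops and the helper apply_op are names made up for this illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Anonymous subroutines stored as ordinary hash values (hypothetical table).
my %ops = (
    double => sub { my $v = shift; return 2 * $v },
    triple => sub { my $v = shift; return 3 * $v },
    square => sub { my $v = shift; return $v * $v },
);

# Hypothetical helper: look a subroutine up by name and run it later.
sub apply_op {
    my ($name, $value) = @_;
    my $code = $ops{$name} or die "unknown operation '$name'";
    return $code->($value);
}

say apply_op('triple', 2);    # 6
say apply_op('square', 5);    # 25
```

Adding a new operation means adding one more hash entry; apply_op itself does not change.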
Anonymous subroutines can also be used to build recursion, including in combination with callbacks. For example, using the __SUB__ token, which appeared in Perl 5.16.0 and returns a reference to the current subroutine, we can implement a factorial:
use 5.16.0;
my $factorial = sub {
    my $x = shift;
    return 1 if $x <= 1;    # base case; also stops the recursion for 0
    return $x * __SUB__->($x - 1);
};
say $factorial->(5);    # 120
An example of recursion combined with callbacks is given below, in the discussion of the broken link task.
Closures
As Wikipedia puts it:
A closure is a first-class function whose body refers to variables that are declared outside the function body in the surrounding code and are not its parameters.
In effect, a closure is the analogue of a class in OOP: it ties functionality and data together and packages them as one unit. Consider a closure in Perl and a class in C++.
Perl
sub multiplicator {
    my $multiplier = shift;
    return sub {
        return shift() * $multiplier;
    };
}
C++
class multiplicator {
public:
    multiplicator(const int &mul): multiplier(mul) { }
    long int operator()(const int &n) {
        return n * multiplier;
    }
private:
    int multiplier;
};
以äžã®ã³ãŒããåæããŠã¿ãŸãããã
ãã©ã€ããŒã倿°å®£èšïŒ
æåã«ãã¬ãã·ã«ã«å€æ°ïŒ my
ïŒãå®çŸ©ããŸã$multiplier
ïŒ my $multiplier = shift;
ïŒ;
private
ã¢ã¯ã»ã¹ããŒã¯ã³ã®åŸã«int
åã®multiplier
倿°ã宣èšããŸãã
ãã©ã€ããŒã倿°ã®åæåïŒ
倿°ãäœæãããšãã«ãæž¡ãããå€ãåæåããŸãã
ã³ã³ã¹ãã©ã¯ã¿ãŒããªãŒããŒããŒãããŠãæ°å€ãmultiplier
ããåæåãªã¹ãã§å€æ°ã®multiplier
ãåæåããŸãã
æž¡ãããå€ãšä»¥åã«åæåããã倿°ãä¹ç®ãããµãã«ãŒãã³ãäœæããŸãã
å
¥åãšããŠãã©ã¡ãŒã¿ãŒãåãåããããã以åã«åæåããã倿°$multiplier
ããçµæã®å€ãè¿ãå¿åãµãã«ãŒãã³ãè¿ããŸãã
颿°åŒã³åºãæŒç®å()
ããªãŒããŒããŒãããŸãããã®æŒç®åã¯ããã©ã¡ãŒã¿ãŒn
ãå
¥åãšããŠåãåãã倿°ã®multiplier
ããŠå€ãè¿ããŸãã
Perlã§ã¯ããŒãžã£ãŒã䜿çšããC ++ã§ã¯ã©ã¹ã䜿çšããã«ã¯ãããããå®çŸ©ããå¿
èŠããããŸãã ãªããžã§ã¯ããäœæããŸãïŒ
Perl:
- object definition:
my $doubled = multiplicator(2);
my $tripled = multiplicator(3);
say $doubled->(3); # 6
say $tripled->(4); # 12
C++:
- object definition:
multiplicator doubled(2), tripled(3);
cout << doubled(3) << endl; // 6
cout << tripled(4) << endl; // 12
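Just as each C++ object carries its own copy of the member multiplier, each Perl closure carries its own captured variable. The following sketch illustrates this with a counter generator; make_counter is a name invented for the example.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical constructor: every call creates a fresh $count
# that only the returned closure can see.
sub make_counter {
    my ($start) = @_;
    my $count = $start // 0;
    return sub { return ++$count };
}

my $from_zero = make_counter();
my $from_ten  = make_counter(10);

$from_zero->();
$from_zero->();
say $from_zero->();   # prints 3: third call on the first counter
say $from_ten->();    # prints 11: the second counter is unaffected
```

The two closures share the same code but not the same data, which is exactly the relationship between two objects of one class.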
In C++, an object of a class that defines operator() is often called a function object, or functor. Function objects are most often used as arguments to generic algorithms. For example, to sum the elements of a vector you can use the for_each algorithm, which applies the passed function to each element of a sequence, together with a Sum class whose overloaded operator() adds up the elements and keeps the running total. Instead of the Sum class you can also use a lambda, available since C++11.
C++:
#include <iostream>
#include <vector>
#include <algorithm>

using std::cout;
using std::endl;
using std::vector;

class Sum {
public:
    Sum() : sum(0) { }
    void operator() (int n) { sum += n; }
    inline int get_sum() { return sum; }
private:
    int sum;
};

int main() {
    vector<int> nums{3, 4, 2, 9, 15, 267};

    Sum s = for_each(nums.begin(), nums.end(), Sum());
    cout << " Sum: " << s.get_sum() << endl;

    long int sum_of_elems = 0;
    for_each(nums.begin(), nums.end(), [&](int n) {
        sum_of_elems += n;
    });
    cout << " : " << sum_of_elems << endl;

    return 0;
}
Perl:
sub for_each {
    my ($arr, $cb) = @_;
    for my $item (@$arr) {
        $cb->($item);
    }
}

my $sum = 0;
for_each [3, 4, 2, 9, 15, 267], sub {
    $sum += $_[0];
};
say $sum;
As the example shows, in C++ we declare a Sum class containing:
- a private variable sum, initialized in the default constructor;
- an overloaded operator (), which receives each value of the sequence and adds it to sum;
- a get_sum method that gives access to the private variable sum.
In the Perl example we create a for_each function that takes a reference to an array and an anonymous function. It then walks the array and calls the anonymous function (the closure), passing it the next array element as a parameter.
When using for_each, we first define the lexical variable $sum, initialized to zero. Then we pass for_each a reference to an array and a closure in which each array element is added to $sum. After for_each returns, $sum holds the sum of the array.
The C++ counterpart of the closure in the Perl example is the lambda shown in the code. In the Perl example, the closure passed to the function is also called a callback, or callback function.
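Perl's core List::Util module offers the same callback style out of the box. As a small sketch (the variable names are illustrative), reduce takes a block that plays the role of the closure passed to for_each:

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(reduce sum);

my @nums = (3, 4, 2, 9, 15, 267);

# reduce calls the block repeatedly: $a holds the running value,
# $b the next element of the list.
my $total = reduce { $a + $b } @nums;

# sum is the ready-made equivalent for this particular callback.
my $same = sum(@nums);

say $total;   # 300
say $same;    # 300
```

Unlike the for_each version, no outer variable is modified here; the running total is threaded through the callback itself.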
Callback functions
As the for_each example shows, a callback is executable code passed as one of the parameters of other code. The passed function often works as a closure: it has access to lexical variables and may be defined in a different context of the program, one that is not directly reachable from the parent function (the function that receives the closure/callback).
In effect, a callback is the analogue of function polymorphism: it lets you write more general-purpose functions instead of a series of functions that share the same structure and differ only in the subtask executed at certain points. Consider the task of reading from one file and writing to another. In Perl we create two functions, a reader and a writer (the example is taken from Mikhail Ozerov's talk "Lazy iterators for parsing heterogeneous data"); in C++ we create the classes Reader_base, Writer_base and ReaderWriter.
Perl
read_write_file.pl
use strict;
use warnings;

sub reader {
    my ($fn, $cb) = @_;
    open my $in, '<', $fn or die "can't open input file '$fn': $!";
    while (my $ln = <$in>) {
        chomp $ln;
        $cb->($ln);    # hand the current line to the callback
    }
    close $in;
}

sub write_file {
    my ($fn, $cb) = @_;
    open my $out, '>', $fn or die "can't open output file '$fn': $!";
    $cb->(sub {    # give the callback a closure that writes one line
        my $ln = shift;
        syswrite($out, $ln.$/);
    });
    close $out;
}

write_file('./out.cvs', sub {
    my $writer = shift;    # sub { my $ln = shift; syswrite() }
    reader('./in.csv', sub {
        my $ln = shift;
        my @fields = split /;/, $ln;
        return unless substr($fields[1], 0, 1) == 6;
        @fields = @fields[0,1,2];
        $writer->(join(';', @fields));    # write the filtered line
    });
});
C++
Reader_base.hpp
#pragma once
#include <iostream>
#include <string>
#include <fstream>
#include <stdexcept>    // runtime_error

using std::ifstream;
using std::getline;
using std::cout;
using std::runtime_error;
using std::endl;
using std::cerr;
using std::string;

class Reader_base {
public:
    Reader_base(const string &fn_in) : file_name(fn_in) { open(file_name); }
    virtual ~Reader_base() { infile.close(); }

    virtual void open(const string &fn_in) {
        infile.open(fn_in);
        if (! infile.is_open())
            throw runtime_error("can't open input file \"" + file_name + "\"");
    }

    virtual void main_loop() {
        try {
            while(getline(infile, line)) {
                rcallback(line);
            }
        } catch(const runtime_error &e) {
            cerr << e.what() << " Try again." << endl;
        }
    }

protected:
    virtual void rcallback(const string &ln) {
        throw runtime_error("Method 'callback' must be overloaded!");
    };

private:
    ifstream infile;
    string line;
    string file_name;
};
Writer_base.hpp
#pragma once
#include <iostream>
#include <string>
#include <fstream>
#include <stdexcept>    // runtime_error

using std::string;
using std::ofstream;
using std::cout;
using std::runtime_error;
using std::endl;
using std::cerr;

class Writer_base {
public:
    Writer_base(const string &fn_out) : file_name(fn_out) { open(file_name); }
    virtual ~Writer_base() { outfile.close(); }

    virtual void open(const string &fn_out) {
        outfile.open(fn_out);
        if (! outfile.is_open())
            throw runtime_error("can't open output file \"" + file_name + "\"");
    }

    virtual void write(const string &ln) {
        outfile << ln << endl;
    }

private:
    string file_name;
    ofstream outfile;
};
ReaderWriter.hpp
#pragma once
#include "Reader_base.hpp"
#include "Writer_base.hpp"

class ReaderWriter : public Reader_base, public Writer_base {
public:
    ReaderWriter(const string &fn_in, const string &fn_out)
        : Reader_base(fn_in), Writer_base(fn_out) {}
    virtual ~ReaderWriter() {}

protected:
    virtual void rcallback(const string &ln) {
        write(ln);
    }
};
main.cpp
#include "ReaderWriter.hpp"

int main() {
    ReaderWriter rw("charset.out", "writer.out");
    rw.main_loop();
    return 0;
}
Compile it as follows:
$ g++ -std=c++11 -o main main.cpp
Let's analyze the code.
Reading from a file:
Perl: we pass the reader function the name of the file to read and a callback. First we open the file for reading. Then we loop over the file line by line, calling the callback on each iteration and passing it the next line. When the loop finishes, we close the file. In OOP terms, the constructor is responsible for initialization and for opening the file, and the main_loop method is responsible for the main loop, in which the file is read and the callback is called for each line. The file is closed in the destructor. A callback is essentially a virtual method that is overridden in a descendant and called from the parent. This analogy can be seen in the C++ example.
C++: in the constructor of the Reader_base class we initialize the file_name variable and open the file for reading. Next we create the virtual member function main_loop, in which we loop over the file line by line and pass each line to the member function rcallback, which must be overridden in a descendant.
Writing to a file:
Perl: we pass the writer function (write_file in the listing) the name of the file to write and a callback. As in the reader example, we first open the file for writing. Then we call the callback, passing it another callback (a closure) that takes a line and writes it to the file. After the callback returns, we close the file. In OOP terms, the constructor is responsible for initialization and for opening the file, the write method takes a string as input and writes it to the file, and the file is closed in the destructor. This analogy can be seen in the C++ example.
C++: in the constructor of the Writer_base class we initialize the file_name variable and open the file for writing. Next we create the virtual member function write, which receives the string to be written to the file. The file is closed in the destructor.
Using the functions created in Perl and the classes created in C++:
Perl: first we call the writer function (write_file), passing it the name of the file to write and a callback. Inside that callback we receive, in the $writer variable, another callback that writes the string passed to it to the file. Then we call the reader function, passing it the name of the file to read and a callback. In the reader's callback we take the next line from the file, process it, and write it to the output file using the $writer callback. As the example shows, the reader's callback is essentially a closure, because it holds a reference to the lexical variable $writer.
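The reader/writer pair can be tried without touching the disk. The sketch below keeps the same shape as reader and write_file, but opens in-memory filehandles on scalar references. Two things here are assumptions of the sketch rather than part of the original listing: the sample data, and the use of print in place of syswrite (chosen because syswrite works on file descriptors, which in-memory handles do not have).

```perl
use strict;
use warnings;

# Same shape as reader above; $src may be a file name or a reference
# to a scalar, in which case Perl opens an in-memory filehandle.
sub reader {
    my ($src, $cb) = @_;
    open my $in, '<', $src or die "can't open input: $!";
    while (my $ln = <$in>) {
        chomp $ln;
        $cb->($ln);
    }
    close $in;
}

# Same shape as write_file above, with print instead of syswrite.
sub write_file {
    my ($dst, $cb) = @_;
    open my $out, '>', $dst or die "can't open output: $!";
    $cb->(sub {
        my $ln = shift;
        print {$out} "$ln\n";
    });
    close $out;
}

# Made-up sample data: keep rows whose second field starts with 6.
my $input  = "1;600;a;x\n2;700;b;y\n3;650;c;z\n";
my $output = '';

write_file(\$output, sub {
    my $writer = shift;
    reader(\$input, sub {
        my $ln = shift;
        my @fields = split /;/, $ln;
        return unless substr($fields[1], 0, 1) eq '6';
        $writer->(join ';', @fields[0 .. 2]);
    });
});

print $output;   # the first and third rows, cut to three fields
```

The inner callback never sees a filehandle: it only knows $writer, which is what makes it a closure over the writer's state.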
C++: using multiple inheritance, we create the ReaderWriter class, which inherits from the Reader_base and Writer_base classes. In its constructor we initialize Reader_base and Writer_base with the names of the files to read and to write, respectively. Then we create the overridden rcallback method, which receives the next line and writes it to the file using the write method of the Writer_base class. The overridden rcallback method is called from the main_loop method of the Reader_base class. As the main.cpp example shows, to work with the classes we create an object rw of the ReaderWriter class, passing its constructor the names of the files to read and to write. Then we call the main_loop member function of the rw object.
Next we look at a more complex and practical task, finding broken links with AnyEvent::HTTP, which uses all the topics above: anonymous subroutines, closures and callback functions.
The broken link task
To find broken links (links that return 4xx and 5xx response codes), we first need to understand how to crawl a site. In effect, a site is a graph of links, in which URLs can point to both external and internal pages. To crawl the site we use the following algorithm:
process_page(current_page):
    for each link on the current_page:
        if target_page is not already in your graph:
            create a Page object to represent target_page
            add it to to_be_scanned set
        add a link from current_page to target_page

scan_website(start_page)
    create Page object for start_page
    to_be_scanned = set(start_page)
    while to_be_scanned is not empty:
        current_page = to_be_scanned.pop()
        process_page(current_page)
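Before moving to the asynchronous implementation, the algorithm can be tried synchronously in plain Perl. In this sketch the "site" is a made-up hash mapping each page to the pages it links to, the graph is a hash of hashes, and no HTTP requests are made.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical site: page => list of pages it links to.
my %site = (
    '/'        => ['/about', '/blog'],
    '/about'   => ['/', '/contact'],
    '/blog'    => ['/', '/blog/1'],
    '/blog/1'  => ['/blog', '/missing'],
    '/contact' => [],
);

my (%graph, %to_be_scanned);

sub process_page {
    my ($current_page) = @_;
    for my $target_page (@{ $site{$current_page} // [] }) {
        unless (exists $graph{$target_page}) {
            $graph{$target_page} = {};            # create the page node
            $to_be_scanned{$target_page} = 1;     # schedule it for scanning
        }
        $graph{$current_page}{$target_page} = 1;  # link current -> target
    }
}

sub scan_website {
    my ($start_page) = @_;
    $graph{$start_page} = {};
    %to_be_scanned = ($start_page => 1);
    while (%to_be_scanned) {
        my ($current_page) = sort keys %to_be_scanned;   # "pop" one page
        delete $to_be_scanned{$current_page};
        process_page($current_page);
    }
}

scan_website('/');
say join ' ', sort keys %graph;
```

Every page is scanned once, because a page enters to_be_scanned only at the moment it first appears in the graph.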
The implementation of this task is in the Broken link checker repository; consider the checker_with_graph.pl script. First we initialize the variables $start_page_url (the URL of the start page) and $cnt (the number of URLs downloaded), and create the hash $to_be_scanned and the graph $g.
Then we create the scan_website function, to which we pass the limit on the maximum number of URLs to download and a callback.
sub scan_website {
    my ($count_url_limit, $cb) = @_;
First we initialize the $to_be_scanned hash with the start page.
# to_be_scanned = set(start_page)
$to_be_scanned->{$start_page_url}{internal_urls} = [$start_page_url];
The $to_be_scanned structure is analyzed in full further on; for now it is enough to note that the link is internal (internal_urls).
Then we create an anonymous function and execute it. A notation of the form
my $do; $do = sub { ... }; $do->();
is a standard idiom that lets the closure refer to the $do variable, for example to build recursion:
my $do; $do = sub { ...; $do->(); ... }; $do->();
or to remove the circular reference:
my $do; $do = sub { ...; undef $do; ... }; $do->();
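Both uses of the idiom can be combined in one small runnable sketch. The counter and the @visited array are invented for the illustration; the point is that $do must be declared before the subroutine is assigned to it, and that undef $do breaks the cycle in which the subroutine keeps itself alive through the variable it captured.

```perl
use strict;
use warnings;
use feature 'say';

my @visited;
my $left = 3;

my $do;
$do = sub {
    push @visited, $left;
    if (--$left > 0) {
        $do->();        # the closure sees $do, so it can call itself
    }
    else {
        undef $do;      # break the cycle: $do -> sub -> $do
    }
};
$do->();

say "@visited";                                       # 3 2 1
say defined $do ? 'still referenced' : 'released';    # released
```

Writing my $do = sub { ... $do->() ... } in a single statement would not work: inside the body, $do would not yet refer to the new lexical.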
In the $do closure we build the %urls hash, to which we add URLs from the $to_be_scanned hash.
my %urls;
for my $parent_url (keys %$to_be_scanned) {
    my $type_urls = $to_be_scanned->{$parent_url};    # $type_urls - internal_urls|external_urls
    push @{$urls{$parent_url}}, splice(@{$type_urls->{internal_urls}}, 0, $max_connects);
    while (my ($root_domain, $external_urls) = each %{$type_urls->{external_urls}}) {
        push @{$urls{$parent_url}}, splice(@$external_urls, 0, 1);
    }
}
The structure of the %urls hash is as follows:
{parent_url1 => [target_url1, target_url2, target_url3], parent_url2 => [...]}
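To see what the splice calls do to the queue, here is the same selection run on made-up data. The URLs and the value of $max_connects are assumptions, and the external domains are walked in sorted order instead of with each so that the output is deterministic.

```perl
use strict;
use warnings;
use feature 'say';

my $max_connects = 2;   # assumed value for the illustration

# Sample queue in the shape of $to_be_scanned (all URLs are made up).
my $to_be_scanned = {
    'http://site.test/' => {
        internal_urls => [map { "http://site.test/p$_" } 1 .. 5],
        external_urls => {
            'a.test' => ['http://a.test/1', 'http://a.test/2'],
            'b.test' => ['http://b.test/1'],
        },
    },
};

my %urls;
for my $parent_url (keys %$to_be_scanned) {
    my $type_urls = $to_be_scanned->{$parent_url};
    # take at most $max_connects internal URLs off the front of the queue
    push @{ $urls{$parent_url} },
        splice(@{ $type_urls->{internal_urls} }, 0, $max_connects);
    # and one URL per external domain
    for my $root_domain (sort keys %{ $type_urls->{external_urls} }) {
        push @{ $urls{$parent_url} },
            splice(@{ $type_urls->{external_urls}{$root_domain} }, 0, 1);
    }
}

say for @{ $urls{'http://site.test/'} };
say scalar @{ $to_be_scanned->{'http://site.test/'}{internal_urls} }, ' internal URLs left';
```

Because splice removes what it returns, each pass through the closure takes the next batch, and whatever remains waits in $to_be_scanned for a later pass.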
次ã«ã颿°process_page
ãå®è¡ãã %urls
hash %urls
ãžã®ãªã³ã¯ãšã³ãŒã«ããã¯ãæž¡ããŸãã
process_page(\%urls, sub { ... });
process_page
颿°ã§ãåä¿¡ããããã·ã¥ãšã³ãŒã«ããã¯ãä¿åããŸãã
sub process_page { my ($current_page_urls, $cb) = @_;
ãã®åŸãURLããã·ã¥ãã«ãŒãããŠãã¢(parent_url => current_urls)
ãååŸããçŸåšã®URLã®ãªã¹ãïŒcurrent_urlsïŒã(parent_url => current_urls)
ãŸã
while (my ($parent_url, $current_urls) = each %$current_page_urls) { for my $current_url (@$current_urls) {
Before looking at how data is fetched from pages, a short digression. The basic algorithm for parsing a page and extracting URLs assumes a single HTTP GET request, regardless of whether the URL is internal or external. This implementation uses two calls, HEAD and GET, in order to reduce the load on servers, as follows:
- a HEAD request is made for all external URLs (whether or not they return an error) and for internal URLs that return an error or are not web pages;
- HEAD and GET requests are made for internal web pages without errors.
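The two rules can be condensed into a small decision function. This is a sketch, not the script's own code: next_step and its return values are invented here, and $headers only mimics the hash that AnyEvent::HTTP hands to its callback (the Status pseudo-header plus lower-cased header names).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: decide what to do after the HEAD response.
sub next_step {
    my ($headers, $is_internal) = @_;
    my $status = $headers->{Status};

    # 4xx/5xx is an error, unless the server merely refuses HEAD
    # (405) while listing GET in its Allow header
    my $head_refused = $status == 405
        && ($headers->{allow} // '') =~ /\bget\b/i;
    return 'report_error' if $status =~ /^[45]/ && !$head_refused;

    # only internal HTML pages are downloaded and parsed
    return 'get_and_parse'
        if $is_internal && ($headers->{'content-type'} // '') =~ m{^text/html};

    return 'done';
}

say next_step({ Status => 404 }, 1);                                        # report_error
say next_step({ Status => 200, 'content-type' => 'text/html' }, 1);         # get_and_parse
say next_step({ Status => 200, 'content-type' => 'text/html' }, 0);         # done
say next_step({ Status => 200, 'content-type' => 'application/pdf' }, 1);   # done
```

Only the second case costs a full GET; every other URL is settled by the cheaper HEAD request.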
So first we run the http_head function of the AnyEvent::HTTP module, passing it the current URL, the request parameters and a callback.
$cv->begin;
http_head $current_url, %params, sub {
In the callback we obtain the headers (the HTTP headers)
my $headers = $_[1];
from which we obtain the real URL (the URL after redirects)
my $real_current_url = $headers->{URL};
Then we save the pair (current_url => real_current_url) in the %urls_with_redirects hash
$urls_with_redirects{$current_url} = $real_current_url if $current_url ne $real_current_url;
Next, if an error occurred (status codes 4xx and 5xx), we write the error to the log and save the headers in a hash for later use. A 405 response whose Allow header lists GET is not counted as an error.
if (
    $headers->{Status} =~ /^[45]/
    && !($headers->{Status} == 405 && $headers->{allow} =~ /\bget\b/i)
) {
    $warn_log->("$headers->{Status} | $parent_url -> $real_current_url") if $warn;
    $note_log->(sub { p($headers) }) if $note;
    $urls_with_errors{$current_url} = $headers;
Otherwise, if the site is internal and the resource is a web page,
elsif (
    # the site is internal
    ($start_page_url_root eq $url_normalization->root_domain($real_current_url))
    # and it is a web page
    && ($headers->{'content-type'} =~ m{^text/html})
) {
次ã«ã http_get
颿°ãå®è¡ããŸãã http_get
颿°ã«ãäžèšã§åãåã£ãå®éã®çŸåšã®URLããªã¯ãšã¹ããã©ã¡ãŒã¿ãã³ãŒã«ããã¯ã転éããŸãã
$cv->begin; http_get $real_current_url, %params, sub {
http_get颿°ã®ã³ãŒã«ããã¯ã§ãããŒãžã®ããããŒã𿬿http_get
ååŸããããŒãžããã³ãŒãããŸãã
my ($content, $headers) = @_; $content = content_decode($content, $headers->{'content-type'});
Web :: Queryã¢ãžã¥ãŒã«ã䜿çšããŠãããŒãžè§£æãšURLååŸãå®è¡ããŸãã
wq($content)->find('a') ->filter(sub { my $href = $_[1]->attr('href');
each
ã¡ãœããã®åå埩ã§ãã³ãŒã«ããã¯ã«ãªã³ã¯ãååŸããŸã
my $href = $_->attr('href');
ãããŠããã倿ãã
$href = $url_normalization->canonical($href); # '/', '/contact' (//dev.twitter.com/etc) if ($href =~ m{^/[^/].*}) { $href = $url_normalization->path($real_current_url, $href) ; } $href = $url_normalization->without_fragment($href);
次ã«ãã§ãã¯ããŸã-ã°ã©ãã«ãã®ãããªãªã³ã¯ããªãå Žå
unless($g->has_vertex($href)) {
次ã«ããªã³ã¯ã®ã«ãŒããã¡ã€ã³ãååŸããŸãïŒãŸãã¯ã倱æãã«å
¥ããŸãïŒ
my $root_domain = $url_normalization->root_domain($href) || 'fails';
ãã®åŸã $new_urls
ã®æ§é ã$new_urls
ãããã¯ã $to_be_scanned
ã®æ§é ã«äŒŒãŠãããæ¬¡ã®åœ¢åŒã«ãªããŸãã
$new_urls = $to_be_scanned = { parent_url => { external_urls => { root_domain1 => [qw/url1 url2 url3/], root_domain2 => [qw/url1 url2 url3/], }, internal_urls => [qw/url url url/], }, };
$new_urls
æ§é äœã§ã¯ããã¢(parent_url => target_url)
ãäœæããŸããã target_url
ãããã€ãã®éšåã«åå²ããŸããã€ãŸããé
åã«ä¿åããå
éšURLãšãã¡ã€ã³ã«åå²ããé
åã«ä¿åããå€éšURLã«åå²ããŸãã ãã®æ§é ã«ãããæ¬¡ã®ããã«ãµã€ãã®è² è·ãæžããããšãã§ããŸãã %urls
ããã·ã¥ãæ§ç¯ããéã®äžèšã®$do
ã¯ããŒãžã£ãŒã«ç€ºãããã«ãå
éšURLã®$max_connects ( )
åãã¡ã€ã³ããšã«1ã€ã®å€éšURLãéžæããŸãã ãããã£ãŠã scan_website
颿°ã®éå§æã«ãéå§ããŒãžã次ã®ããã«ä¿åããŸããã
$to_be_scanned = { $start_page_url => { internal_urls => [$start_page_url], }, };
ã€ãŸã ãã®å Žåã芪ããŒãžãšçŸåšã®ããŒãžã®äž¡æ¹ãéå§ããŒãžã§ããïŒä»ã®å ŽåãããŒãžããŒã¿ã¯ç°ãªããŸãïŒã
The structure is built as follows: if the site is internal, we create the structure
$new_urls->{$real_current_url}{internal_urls} //= []
otherwise, if the site is external, the structure
$new_urls->{$real_current_url}{external_urls}{$root_domain} //= []
and store a reference to one of these structures in the $urls variable, which we then use to write into the $new_urls structure.
push @$urls, $href;
Here we use references to create and work with complex data structures. The $urls variable refers to part of the $new_urls structure, so changing $urls changes $new_urls. For more on data structures and algorithms in Perl, see "Mastering Algorithms with Perl" by Jon Orwant.
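The aliasing can be checked in isolation. In the sketch below, root_domain_of is a deliberately naive, hypothetical stand-in for $url_normalization->root_domain (it just extracts the host), and the links are made up.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for $url_normalization->root_domain: just the host.
sub root_domain_of {
    my ($url) = @_;
    return $url =~ m{^https?://([^/]+)} ? $1 : undef;
}

my $page       = 'http://site.test/';
my $start_root = root_domain_of($page);
my $new_urls   = {};

for my $href ('http://site.test/about', 'http://a.test/x', 'http://a.test/y', 'mailto:me') {
    my $root_domain = root_domain_of($href) || 'fails';
    # //= creates the array on first use; $urls then refers to that same array
    my $urls = $root_domain eq $start_root
        ? ($new_urls->{$page}{internal_urls} //= [])
        : ($new_urls->{$page}{external_urls}{$root_domain} //= []);
    push @$urls, $href;     # writes straight into $new_urls
}

say scalar @{ $new_urls->{$page}{internal_urls} };                # 1
say join ',', sort keys %{ $new_urls->{$page}{external_urls} };   # a.test,fails
say scalar @{ $new_urls->{$page}{external_urls}{'a.test'} };      # 2
```

Nothing is ever assigned back to $new_urls explicitly; the push through $urls is enough because both names point at the same array.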
次ã«ãã°ã©ãã«ã«ããã«ã远å ããŸã(real_current_url (parent) => href (current))
ã
$g->add_edge($real_current_url, $href);
ãã®åŸã $new_urls
ã®æ§é ã確èªããŸã-é
åinternal_urls
ãŸãã¯external_urls
空ã§ãªãå Žåã¯ãããŒã¿ããã°ã«åºåããŠã³ãŒã«ããã¯ãå®è¡ããæ§é $new_urls
æž¡ããŸã
if (is_to_be_scanned($new_urls)) { $debug_log->(($parent_url // '')." -> $real_current_url ".p($new_urls)) if $debug; $cb->($new_urls); }
ãªãã·ã§ã³ïŒãšã©ãŒãŸãã¯å
éšããŒãžã®è§£æïŒã®ãããã«ã該åœããªãã£ãå Žåãã€ãŸã ãµã€ãã¯å€éšã§ãšã©ãŒãçºçããŠããªããããã³ãŒã«ããã¯ãå®è¡ããŸã
else { $cb->(); }
ãã®åŒã³åºãã¯ããã¹ãŠã®å€éšãµã€ããçŸåšã®URL $current_urls
ãªã¹ãã«ããå Žåã«å¿
èŠã§ããã $to_be_scanned
ãŸã $to_be_scanned
ã$to_be_scanned
ã ãã®åŒã³åºãããªããã°ã http_head
ãšhttp_head
å®è¡ããŠ$current_urls
ã®ãªã¹ããhttp_head
ã
process_page
颿°ã®ã³ãŒã«ããã¯ã§ãçµæã®æ§é $new_urls
ãä¿åã$new_urls
ã
process_page(\%urls, sub { my $new_urls = shift;
ããã倿°$to_be_scanned
ãšçµã¿åãããŸãã
$to_be_scanned = merge($to_be_scanned, $new_urls) if $new_urls;
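The excerpt does not show where merge comes from; it may well be a general-purpose hash-merging module. As a sketch of what such a merge has to do for this particular structure, the hypothetical merge_urls below merges nested hashes key by key and concatenates arrays. The sample URLs are made up.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical deep merge: hashes are merged key by key, arrays are concatenated.
sub merge_urls {
    my ($left, $right) = @_;
    return $right unless defined $left;
    return $left  unless defined $right;
    if (ref $left eq 'ARRAY' && ref $right eq 'ARRAY') {
        return [@$left, @$right];
    }
    if (ref $left eq 'HASH' && ref $right eq 'HASH') {
        my %merged = %$left;
        for my $key (keys %$right) {
            $merged{$key} = merge_urls($merged{$key}, $right->{$key});
        }
        return \%merged;
    }
    return $left;   # on a type clash keep the existing value
}

my $to_be_scanned = {
    'http://site.test/' => { internal_urls => ['http://site.test/a'] },
};
my $new_urls = {
    'http://site.test/'  => { internal_urls => ['http://site.test/b'] },
    'http://site.test/a' => { external_urls => { 'a.test' => ['http://a.test/1'] } },
};

$to_be_scanned = merge_urls($to_be_scanned, $new_urls);

say join ' ', @{ $to_be_scanned->{'http://site.test/'}{internal_urls} };
say join ' ', sort keys %$to_be_scanned;
```

URLs found under a parent that is already queued are appended to its lists, while new parents are added alongside the existing ones.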
次ã«ãã°ã©ãèŠçŽ ã®æ°ãURLã®æ°ã®å¶é以äžãã©ããã確èªããå¿åãµãã«ãŒãã³ãžã®ãªã³ã¯ãåé€ããŠ$cv->send()
ãŸãã
if (scalar($g->vertices) >= $count_url_limit) { undef $do; $cb->(); $cv->send; }
ãã以å€ã®å Žåããã§ãã¯ããURLãããã°ã
elsif (is_to_be_scanned($to_be_scanned)) {
ãã®åŸãå¿åãµãã«ãŒãã³ãååž°çã«åŒã³åºããŸã
$do->();
äžèšã®èª²é¡ãèæ
®ãããŸããã $to_be_scanned
process_page
( ).
, GraphViz â svg, png .. :
$ perl bin/checker_with_graph.pl -u planetperl.ru -m 500 -c 5 \
    -g -f svg -o etc/panetperl_ru.svg -l "broken link check" -r "http_//planetperl.ru/"
$ perl bin/checker_with_graph.pl -u habrahabr.ru -m 500 -c 5 \
    -g -f svg -o etc/habr_ru.svg -l "broken link check" -r "https_//habrahabr.ru/"
$ perl bin/checker_with_graph.pl -u habrahabr.ru -m 100 -c 5 \
    -g -f png -o etc/habr_ru.png -l "broken link check" -r "https_//habrahabr.ru/"
where
--url | -u
--max_urls | -m
--max_connects | -c
--graphviz | -g
--graphviz_log_level | -e (see perldoc Log::Handler)
--format | -f (png, svg, etc.)
--output_file | -o
--label | -l
--root | -r (used by twopi)
The log level is set with the PERL_ANYEVENT_VERBOSE environment variable:
$ export PERL_ANYEVENT_VERBOSE=n
where n is:
- 5 (warn): HTTP errors
- 6 (note): HTTP errors with details ($headers)
- 7 (info): URLs
- 8 (debug)
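Conclusion
This article covered functional programming in Perl: anonymous subroutines, closures and callbacks. Perl closures were compared with C++ classes, and callbacks in Perl with virtual methods in C++. These techniques were then applied to finding broken links with AnyEvent::HTTP.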