The Node.js library puppeteer lets you automate work in the Google Chrome browser. In particular, you can use puppeteer to write programs that automatically collect data from websites: so-called web scrapers, which imitate the actions of an ordinary user. In this kind of scenario you can use a browser without a user interface, so-called "headless Chrome".
puppeteer can also control a browser running in normal mode, which is especially useful when debugging programs.
Today we'll talk about building a web scraper with Node.js and puppeteer. The author has tried to make this article interesting to as many programmers as possible, so it should be useful both to web developers who already have experience with puppeteer and to those who are new to headless Chrome.
Prerequisites
Before you start, you need Node 8 or later. You can find and download the current version on the official Node.js website. If you have never worked with Node, have a look at a training course or other material; there is plenty of it on the web.
After installing Node, create a folder for the project and install puppeteer in it. This also installs a recent version of Chromium that is guaranteed to work with the API. To do this, use the following command:
npm install puppeteer
Example #1: Taking a screenshot
With puppeteer installed, let's look at a simple example. It repeats an example from the library's documentation with minor changes. The code we're about to review takes a screenshot of a given web page.
First, create a file named test.js and put the following code in it:
const puppeteer = require('puppeteer');

async function getPic() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});

  await browser.close();
}

getPic();
Let's go through this code line by line, starting with an overview.
const puppeteer = require('puppeteer');
In this line, we import the puppeteer library installed earlier as a dependency.
async function getPic() { ... }
This is the main function, getPic(). It contains the code that automates our work in the browser.
getPic()
In this line we call the getPic() function, that is, we run it.
It is important to note that getPic() is an asynchronous function, declared with the async keyword. It uses the async/await syntax introduced in ES2017.
Since getPic() is an async function, calling it returns a Promise object, usually simply called a promise. When a function defined with async finishes, its promise is resolved with the returned value (if the operation succeeded) or rejected (if an error occurred).
Declaring a function with the async keyword lets us call other functions with the await keyword inside it. await pauses the function's execution until the corresponding promise is settled, and then the function continues. If all of this is not yet clear, keep reading; things will start to fall into place later.
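To make this concrete, here is a minimal sketch in plain Node.js (no puppeteer needed). The names delay and demo are hypothetical, and delay only stands in for a slow operation such as launching Chrome. It shows that calling an async function immediately returns a Promise, and that await pauses only that function while the rest of the program keeps running.

```javascript
// Stand-in for a slow operation (e.g. launching Chrome):
// a promise that resolves with `value` after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Because it is declared `async`, demo() always returns a Promise.
async function demo(log) {
  log.push('start');
  const result = await delay(10, 'done'); // demo() pauses here until the promise resolves
  log.push(result);
  return result.toUpperCase(); // becomes the resolved value of demo()'s promise
}

const log = [];
const promise = demo(log);
log.push('called'); // runs while demo() is still waiting

console.log(promise instanceof Promise); // true
promise.then((value) => {
  console.log(log);   // [ 'start', 'called', 'done' ]
  console.log(value); // DONE
});
```

Note that 'called' is logged before 'done': the caller continues as soon as demo() reaches its first await.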
次ã«
getPic()
颿°ã³ãŒãã
getPic()
ãŸãããã
const browser = await puppeteer.launch();
Here we launch puppeteer. In practice, this means starting an instance of the Chrome browser and storing a reference to that instance in the newly created browser constant.
This line uses the await keyword: await pauses the execution of our main function until the corresponding promise is settled. In this case, that means waiting until the Chrome instance has either started successfully or failed with an error.
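If the promise returned by puppeteer.launch() is rejected, await throws the error inside the async function, so it can be handled with an ordinary try/catch. The sketch below uses a hypothetical helper, tryLaunch, that receives the launch function as a parameter; the two stand-in launchers replace puppeteer.launch() so the example runs without Chrome. In real code you would pass () => puppeteer.launch().

```javascript
// Hypothetical helper: `launch` is any function that returns a promise,
// e.g. () => puppeteer.launch() in the real script.
async function tryLaunch(launch) {
  try {
    const browser = await launch(); // continues here if the promise resolves...
    return { ok: true, browser };
  } catch (err) {
    // ...or jumps here if it is rejected (for example, Chrome failed to start)
    return { ok: false, error: err.message };
  }
}

// Stand-ins for puppeteer.launch() so the sketch runs without Chrome.
const launchOk = async () => ({ name: 'fake-browser' });
const launchFails = async () => {
  throw new Error('Chrome did not start');
};

tryLaunch(launchOk).then((r) => console.log(r.ok));       // true
tryLaunch(launchFails).then((r) => console.log(r.error)); // Chrome did not start
```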
const page = await browser.newPage();
Here we create a new page in the browser controlled by our program. That is, we request this operation, wait for it to complete, and store a reference to the page in the page constant.
await page.goto('https://google.com');
Using the page variable created on the previous line, we can tell the page to navigate to a given URL. In this example that is https://google.com. As on the previous line, execution pauses until the operation completes.
await page.screenshot({path: 'google.png'});
Here we ask puppeteer to take a screenshot of the current page, represented by the page constant. The screenshot() method takes an object as its parameter, in which we can specify the path where the screenshot will be saved as a .png file. Again, the await keyword is used here, so the function pauses until the operation completes.
await browser.close();
Finally, at the end of getPic(), we close the browser.
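One thing getPic() does not handle: if page.goto() or page.screenshot() throws, execution never reaches browser.close(). A common remedy is try/finally. Below is a sketch with a hypothetical withBrowser helper; the fake browser only records whether close() was called, so the example runs without puppeteer.

```javascript
// Hypothetical helper: launch a browser, run `task(page)`, and always close
// the browser, whether the task succeeds or throws.
// In the real script, `launch` would be () => puppeteer.launch().
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await task(page); // `await` so that `finally` runs only after the task has finished
  } finally {
    await browser.close(); // runs on success and on failure
  }
}

// Fake browser that only records whether close() was called.
function makeFakeBrowser() {
  return {
    closed: false,
    async newPage() { return { url: 'about:blank' }; },
    async close() { this.closed = true; },
  };
}

const browser = makeFakeBrowser();
withBrowser(async () => browser, async () => { throw new Error('goto failed'); })
  .catch((err) => console.log(err.message, browser.closed)); // goto failed true
```

With real Puppeteer, the call could look like withBrowser(() => puppeteer.launch(), async (page) => { await page.goto('https://google.com'); await page.screenshot({path: 'google.png'}); }).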
Running the example
The code above, saved in test.js, can be run with Node like this:
node test.js
After it completes successfully, the screenshot is saved to google.png in the directory you ran the command from.

Great! Now, to make things more fun (and debugging easier), we can do the same thing with Chrome launched in normal, non-headless mode.
What does that mean? Try it and see for yourself. To do this, replace this line of code:
const browser = await puppeteer.launch();
with this one:
const browser = await puppeteer.launch({headless: false});
Save the file and run it with Node again:
node test.js
Nice, isn't it? By passing the object {headless: false} as a parameter when launching the browser, we can watch how our code drives Google Chrome.
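If you switch between headless and visible mode often, it can help to derive the launch options from a flag. The sketch below assumes a DEBUG environment variable and a hypothetical launchOptions helper. headless and slowMo are real puppeteer.launch() options (slowMo slows down each Puppeteer operation by the given number of milliseconds); the 250 ms value is an arbitrary choice.

```javascript
// Hypothetical helper: pick puppeteer.launch() options from a debug flag.
// `headless` and `slowMo` are real launch options; 250 ms is an arbitrary value.
function launchOptions(debug) {
  return debug
    ? { headless: false, slowMo: 250 } // visible window, each operation slowed by 250 ms
    : { headless: true };              // no window, full speed
}

// Assumed convention: run `DEBUG=1 node test.js` to watch the browser.
const debug = process.env.DEBUG === '1';
console.log(launchOptions(debug));

// In the real script:
// const browser = await puppeteer.launch(launchOptions(debug));
```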
Before moving on, let's do one more thing. Did you notice that the screenshot our program takes contains only part of the page? That's because the viewport, the visible area of the page (800×600 pixels by default in Puppeteer), is smaller than the web page. We can change the viewport size with the following line:
await page.setViewport({width: 1000, height: 500});
Add it to the code right after the command that navigates to the URL. With it, the program takes a much better-looking screenshot.
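The order of calls in the final version can be expressed as a small helper. This is a sketch: screenshotAt is a hypothetical name, and the recording page is a stand-in that only logs the calls, so the example checks the order (goto, setViewport, screenshot) without launching Chrome. In the real script you would pass the page returned by browser.newPage().

```javascript
// Hypothetical helper mirroring the final getPic(): open a URL,
// resize the viewport, then save a screenshot. `page` would be the
// object returned by browser.newPage().
async function screenshotAt(page, url, viewport, path) {
  await page.goto(url);
  await page.setViewport(viewport); // right after navigation, as in the article
  await page.screenshot({ path });
}

// Stand-in page that only records calls, so the order can be checked without Chrome.
function makeRecordingPage() {
  const calls = [];
  return {
    calls,
    async goto(url) { calls.push(['goto', url]); },
    async setViewport({ width, height }) { calls.push(['setViewport', width, height]); },
    async screenshot({ path }) { calls.push(['screenshot', path]); },
  };
}

const page = makeRecordingPage();
screenshotAt(page, 'https://google.com', { width: 1000, height: 500 }, 'google.png')
  .then(() => console.log(page.calls));
```

If you want the whole page rather than a larger viewport, page.screenshot() also accepts a fullPage: true option, which captures the full scrollable page.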

Here is the final version of the code:
const puppeteer = require('puppeteer');

async function getPic() {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.setViewport({width: 1000, height: 500});
  await page.screenshot({path: 'google.png'});

  await browser.close();
}

getPic();
Example #2: Web scraping
Now that you've mastered the basics of automating Chrome with puppeteer, let's look at a more sophisticated example that collects data from a web page.
First, take a look at the puppeteer documentation. Note the huge number of methods it offers, which let you not only simulate mouse clicks on page elements but also fill in forms and read data from pages.
We'll collect data from Books To Scrape, a mock online bookstore created for web-scraping practice.
In the same directory as test.js, create a file named scrape.js and paste the following code into it:
const puppeteer = require('puppeteer');

let scrape = async () => {
  // Scraping code will go here...

  // Return a value
};

scrape().then((value) => {
  console.log(value); // Success!
});
Ideally, after working through the first example, you already understand how this code works. But if not, that's fine.
In this snippet we import the puppeteer library we installed earlier. Next comes the scrape() function, to which we'll add the scraping code below. This function will return some value. Finally, we call scrape() and work with whatever it returns; in this case, we simply print it to the console.
Let's check this code by adding a return line to the scrape() function:
let scrape = async () => { return 'test'; };
Then run the program with node scrape.js: the word test appears in the console. We've confirmed that the code works and that we get the value we want in the console. Now we can move on to the actual web scraping.
▍Step 1: Setup
First, we need to create a browser instance, open a new page, and go to the URL. Here's how we do all of that:
let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('http://books.toscrape.com/');
  await page.waitFor(1000);
Let's analyze this code.
const browser = await puppeteer.launch({headless: false});
In this line we create a browser instance and set the headless parameter to false, so that we can watch what's going on.
const page = await browser.newPage();
Here we create a new page in the browser.
await page.goto('http://books.toscrape.com/');
Here we go to http://books.toscrape.com/.
await page.waitFor(1000);
Here we add a 1000-millisecond delay to give the browser time to fully load the page. Usually this step can be skipped, since page.goto() already waits for the page's load event by default.
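In newer Puppeteer releases page.waitFor() has been deprecated and later removed, so await page.waitFor(1000) may fail there. A fixed pause can be written with a plain timer, as in the sketch below (sleep is a hypothetical helper name). Often a better choice is to wait for something specific, for example await page.waitForSelector('h1'), which resolves as soon as the element appears.

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds.
// `await sleep(1000)` pauses an async function the way page.waitFor(1000) did.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function main() {
  const start = Date.now();
  await sleep(1000); // in the scraper: replaces `await page.waitFor(1000);`
  console.log(`waited about ${Date.now() - start} ms`);
}

main();
```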
await browser.close();
return result;
Finally, we close the browser and return the result.
With the preparation done, let's move on to the scraping.
▍Step 2: Scraping
As you probably already know, the Books To Scrape website has a large catalog of books with dummy data. We'll take the first book on the page and return its title and price. Here is the site's home page; we'll click the first book (highlighted in red).

The puppeteer documentation describes a method that simulates a mouse click on the page:
page.click(selector[, options])
selector <string>: a selector to search for the element to click. If there are multiple elements satisfying the selector, the first one will be clicked.
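Clicking a link starts a navigation, and the following code should run only after the new page has loaded. One common Puppeteer pattern is to start page.waitForNavigation() together with page.click() inside Promise.all. The sketch below uses a hypothetical clickAndWait helper and a fake page whose navigation promise resolves when click() is called, so it runs without Chrome; with a real page, this could take the place of the fixed one-second delay used in this example.

```javascript
// Hypothetical helper: click an element and wait for the navigation it triggers.
// waitForNavigation() is started together with click() so a fast navigation is not missed.
async function clickAndWait(page, selector) {
  await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ]);
}

// Fake page: its "navigation" finishes when click() is called.
function makeFakePage() {
  let finishNavigation;
  const navigation = new Promise((resolve) => { finishNavigation = resolve; });
  const log = [];
  return {
    log,
    waitForNavigation() { log.push('waitForNavigation'); return navigation; },
    async click(selector) { log.push(`click ${selector}`); finishNavigation(); },
  };
}

const page = makeFakePage();
clickAndWait(page, 'a').then(() => console.log(page.log));
```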
The Google Chrome developer tools make it easy to find the selector of a particular element. To do this, right-click the image and choose the Inspect command.

This opens the Elements panel, which shows the page's code with the fragment corresponding to the element highlighted. Then click the button with three dots on the left and choose Copy → Copy selector from the menu that appears.

Great! Now we have the selector, and everything is ready to write the click call and paste it into the program. Here's what it looks like:
await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
Now the program simulates a click on the first product image, which opens that product's page.
On this new page, we're interested in the book's title and price. They are highlighted in the figure below.

To get these values, we'll use the page.evaluate() method. It lets us work with the DOM using JavaScript methods such as querySelector().
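Note that the function passed to page.evaluate() is sent to the browser and executed there, so it cannot use variables from the surrounding Node.js code; values it needs are passed as extra arguments to evaluate(). The sketch below shows this with a hypothetical readText helper. So that it runs in plain Node, a fake document and a fake page (whose evaluate() simply calls the function locally) stand in for the browser. Puppeteer also provides page.$eval(selector, fn) as a shortcut for this pattern.

```javascript
// Hypothetical helper: return the text of the first element matching `selector`,
// or null if there is none. The arrow function runs in the browser, so the
// selector is passed in as an argument instead of being read from Node scope.
async function readText(page, selector) {
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    return el ? el.innerText : null;
  }, selector);
}

// Stand-ins so the sketch runs in plain Node: a fake `document` and a fake page
// whose evaluate() just calls the function locally with the given arguments.
globalThis.document = {
  querySelector: (sel) => (sel === 'h1' ? { innerText: 'A Light in the Attic' } : null),
};
const fakePage = { evaluate: async (fn, ...args) => fn(...args) };

readText(fakePage, 'h1').then((text) => console.log(text));       // A Light in the Attic
readText(fakePage, '.missing').then((text) => console.log(text)); // null
```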
First, we call page.evaluate() and assign the value it returns to the result constant:
const result = await page.evaluate(() => {
Inside this function we can select the elements we need. To figure out how to describe them, we'll use the Chrome developer tools again: right-click the book's title and choose the Inspect command.

In the Elements panel you can see that the book title is an ordinary first-level heading, h1. We can select this element with the following code:
let title = document.querySelector('h1');
We need the text contained in this element, so we use its .innerText property. This gives us the following:
let title = document.querySelector('h1').innerText;
The same approach helps us figure out how to get the book's price from the page.

You may notice that the price_color class corresponds to the line containing the price. We can use this class to select the element and read the text it contains:
let price = document.querySelector('.price_color').innerText;
Now that we've extracted the book's title and price from the page, we can return them from the function as an object:
return { title, price }
The result is the following code:
const result = await page.evaluate(() => {
  let title = document.querySelector('h1').innerText;
  let price = document.querySelector('.price_color').innerText;

  return { title, price };
});
So we read the book's title and price from the page, save them in an object, and return that object, which ends up in result.
All that's left is to return the result constant so its contents can be printed to the console:
return result;
The complete code of this example looks like this:
const puppeteer = require('puppeteer');

let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();

  await page.goto('http://books.toscrape.com/');
  await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
  await page.waitFor(1000);

  const result = await page.evaluate(() => {
    let title = document.querySelector('h1').innerText;
    let price = document.querySelector('.price_color').innerText;

    return { title, price };
  });

  await browser.close();
  return result;
};

scrape().then((value) => {
  console.log(value);
});
Now you can run the program with Node:
node scrape.js
If everything is done correctly, the book's title and price appear in the console:
{ title: 'A Light in the Attic', price: '£51.77' }
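The values come back as strings exactly as they appear on the page, so the price is '£51.77' rather than a number. If you want to sort or sum prices, you can convert them in Node.js. parsePrice below is a hypothetical helper; it assumes a '.' decimal separator and no thousands separators.

```javascript
// Hypothetical helper: convert a scraped price string such as '£51.77' to a number.
// Assumes '.' is the decimal separator and there are no thousands separators.
function parsePrice(text) {
  const value = parseFloat(text.replace(/[^0-9.]/g, '')); // keep digits and '.'
  return Number.isNaN(value) ? null : value;
}

console.log(parsePrice('£51.77')); // 51.77
console.log(parsePrice('n/a'));    // null
```

For the object printed above, parsePrice(value.price) would give the numeric price.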
That's web scraping in a nutshell, although in this lesson we have only taken the first steps.
Example #3: Improving the program
A perfectly reasonable question arises here: "If both the book title and the price are shown on the home page, why click through to the book's page at all? Why not take them right there? And if we can do that, why not read the titles and prices of all the books?"
The answer is that there are many approaches to web scraping. Also, if you limit yourself to the data shown on the home page, you may find that some book titles are shortened. Still, all of these ideas give you a great opportunity to practice.
▍Task
Your goal is to read the titles and prices of all the books on the home page and return them as an array of objects. Here's the array I got:

Go ahead. Try to do it all yourself before reading any further. I should say that this problem is very similar to the one we just solved.
Did it work? If not, here's a hint.
Hint
The main difference between this task and the previous example is that here we need to go through a list of data. Here's how:
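const result = await page.evaluate(() => {
  let data = []; // create an empty array for the results
  let elements = document.querySelectorAll('xxx'); // select all products

  // loop over each product
    // read the title
    // read the price

    data.push({title, price}); // add an object with the data to the array

  return data; // return the array
});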
If you still can't solve the problem, don't worry; this is an exercise. Here is one possible solution.
▍Solution
const puppeteer = require('puppeteer');

let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('http://books.toscrape.com/');

  const result = await page.evaluate(() => {
    let data = []; // an empty array for the results
    let elements = document.querySelectorAll('.product_pod'); // all product cards

    for (const element of elements) { // loop over each product
      let title = element.childNodes[5].innerText; // the title (h3)
      let price = element.childNodes[7].children[0].innerText; // the price

      data.push({title, price}); // add an object with the data to the array
    }

    return data; // return the array
  });

  await browser.close();
  return result; // return the data
};

scrape().then((value) => {
  console.log(value); // Success!
});
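The solution above depends on the exact positions of child nodes (childNodes[5], childNodes[7]), which breaks as soon as the markup changes, and the title it reads is the shortened text shown on the home page. Below is a sketch of an alternative evaluate() callback that looks elements up by selector instead. It assumes the product markup on books.toscrape.com, where each .product_pod contains an h3 a link whose title attribute holds the full book name, plus an element with the price_color class; extractBooks is a hypothetical name.

```javascript
// Sketch: collect { title, price } for every product card by selector
// instead of by child-node position. `root` defaults to `document`, so the
// function can be passed directly to page.evaluate(), where it runs in the page.
function extractBooks(root = document) {
  const data = [];
  for (const element of root.querySelectorAll('.product_pod')) {
    const link = element.querySelector('h3 a');          // assumed: the title link
    const price = element.querySelector('.price_color'); // the price element
    data.push({
      // the title attribute holds the full name; the link text may be shortened
      title: link ? link.getAttribute('title') : null,
      price: price ? price.innerText : null,
    });
  }
  return data;
}

// In the scraper (inside the async scrape function):
// const result = await page.evaluate(extractBooks);
```

Because the function is serialized and run in the page, it must not refer to anything else defined in the Node.js file.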
Summary
In this article we learned how to build a web-scraping system with the Google Chrome browser and the Puppeteer library. We looked at how the code is structured, how to control the browser programmatically, how to take screenshots, how to simulate user actions on a page, and how to read and save data published on web pages. If this is your first encounter with web scraping, we hope you now have everything you need to start getting the data you want from the internet.
芪æãªãèªè
ïŒ ãŠãŒã¶ãŒã€ã³ã¿ãŒãã§ã€ã¹ãªãã§Puppeteerã©ã€ãã©ãªãšGoogle Chromeãã©ãŠã¶ãŒã䜿çšããŠããŸããïŒ