When the voting on joining or not joining the renovation program ended, the per-house turnout data for some reason disappeared from the Moscow mayor's site; only the overall votes for and against remained. Some figures do appear in the news, of course, but wouldn't you rather look at the data yourself, run some statistics and draw some charts?
Especially after statements like this one:
As for the popularity of these services, the "My Documents" public service centers were chosen by slightly more than half of all voters, with the "Active Citizen" portal only slightly behind.
That raises all sorts of questions. So let's start collecting the information, and then analyze it. For that we need some language (say, Python), some database (say, sqlite) and some web scraper, of which there are plenty for Python. At the end I will post a link to the resulting database; I should say right away that you can do whatever you like with it.
We go to the mayor's office site and get to work. We need to collect data for 4543 houses.
In the list of districts, we click a random house to see what the data looks like in general.
Apparently every house has its own id, which is visible in the URL:
https://www.mos.ru/otvet-stroitelstvo/itogi-golosovaniya-zhitelej-po-proektu-programmy-renovacii/?u=121
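So, to process everything automatically, we need to get a list of all the ids, walk through it, and pull the information we need from each page: the votes for, the votes against, the total number of votes (there are also "undecided" apartments, where half of the voters in an apartment were for and half against), and whether a general meeting of the house took place (and with what result).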
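Where do we get the list of ids? As we saw above, the list of districts has no links to individual houses; it is just a text list of addresses and nothing more. A pity, that would have been convenient. It looks like we will have to brute-force the ids. But first, let's write a function that collects the data for one specific id.
To begin with, we make a GET request for one house and look at the response to understand what we have to scrape from it.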
    import requests

    r = requests.get('https://www.mos.ru/otvet-stroitelstvo/itogi-golosovaniya-zhitelej-po-proektu-programmy-renovacii/?u=121')
    print(r.text)
To our surprise, the response contains no voting data at all; we are handed some kind of page template. Let's see how the results are actually loaded. And indeed: the browser first shows a page with empty fields, and the data appears a moment later. We open a browser with decent developer tools (Mozilla, for example) and look at exactly what happens there.

Aha! We find a browser request to some API, and the id in it is the same as the id of the house we are looking at.
So if we send a GET request without any parameters to
www.mos.ru/altmosprx/api/1/renovation/house_result/121
we get back JSON of this kind:
    {
      "execTime": 0.044450044631958,
      "errorMessage": "",
      "result": {
        "table": "<table class=\"table table-big\"><thead><tr class=\"table-header\"><td> </td><td> </td><td></td><td></td><td> </td></tr></thead><tbody><tr><td class=\"apartment-id\">0G6O4</td><td class=\"voting-info\"><p>bf659227e8e3</p><p>5f9403659209</p></td><td class=\"voting-choice\"><p></p><p></p></td><td class=\"voting-date\"><p>18.05</p><p>18.05</p></td><td class=\"apartment-position apartment-agree\"><p></p></td></tr><tr><td class=\"apartment-id\">0G6O5</td><td class=\"voting-info\"><p>3f12be5cea77</p></td><td class=\"voting-choice\"><p></p></td><td class=\"voting-date\"><p>15.05</p></td><td class=\"apartment-position apartment-agree\"><p></p></td></tr><tr> ... <td class=\"apartment-id\">0G6V1</td><td class=\"voting-info\"><p>5acd126a410ea1a842e67066ea68fa8f</p></td><td class=\"voting-choice\"><p></p></td><td class=\"voting-date\"><p>24.05</p></td><td class=\"apartment-position apartment-agree\"><p></p></td></tr></tbody></table>",
        "total": {
          "und": 0,
          "za": 100,
          "protocol_res": 0,
          "protiv": 0,
          "gorod_mark": 0,
          "protocol_date": null,
          "house_status": 1,
          "gorod": 0
        },
        "und_table": "<table class=\"table table-big\"><thead><tr class=\"table-header\"><td> </td><td> </td><td></td><td></td><td> </td></tr></thead><tbody></tbody></table>",
        "address": " , 63, 2"
      },
      "request_id": "empty_requestid",
      "errorCode": 0
    }
So there is nothing to scrape here: all the data comes straight from the API. All we need to do is count the apartment ids to get the total (the number of apartments that voted) and take the ready-made vote percentages.
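As a rough illustration of that counting step, here is a minimal sketch that works on an already decoded response. The `sample` dictionary is a made-up, heavily shortened stand-in for the JSON above, and `summarize` is a hypothetical helper name, not something from the original code.

```python
import re

# Hypothetical, shortened response in the same shape as the API's JSON.
sample = {
    "result": {
        "table": '<tr><td class="apartment-id">0G6O4</td></tr>'
                 '<tr><td class="apartment-id">0G6O5</td></tr>',
        "und_table": "<table><tbody></tbody></table>",
        "total": {"und": 0, "za": 100, "protiv": 0, "protocol_res": 0},
        "address": "Example street, 63, 2",
    }
}


def summarize(response):
    """Count voted apartments and pick out the ready-made percentages."""
    result = response["result"]
    # Each apartment that voted is one "apartment-id" cell in either table.
    voted = (len(re.findall('class="apartment-id"', result["table"]))
             + len(re.findall('class="apartment-id"', result["und_table"])))
    total = result["total"]
    return {
        "voted_flats": voted,
        "za": total["za"],
        "protiv": total["protiv"],
        "und": total["und"],
    }


summary = summarize(sample)
print(summary)
```

Counting the class attribute with a regex is crude but sufficient here, because the table is only used as a tally.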
But which range of ids should we look at? There should be 4543 houses in total. We give the API 0: such a house exists. We give it -1: error. Thanks, the lower bound is settled. We give it 10000: such a house exists. So the ids clearly do not stop at 4543. Let's look at a few houses in recently annexed territories to roughly find the upper bound... We go back to the map and move from "old" Moscow to somewhere in "new" Moscow. Roughly: Kokoshkino settlement, Kokoshkino dacha village, Truda street, house 2. id: 440931. Well, that makes at least half a million.
Walking through half a million links in an ordinary loop is not the best idea, so we use the concurrent.futures module. Something like asyncio would also work, of course, but the task is not that big and can be done with less effort. Everything is very simple. We look at what the API returns when the house number is known to be valid and when it is known to be invalid, and write a function that checks one id. Then we run it over all the ids in a loop with parallel requests; there are quite a lot of ids to get through. Then we assemble the result and write it down. Overall, we get something like the following code:
    import requests
    from concurrent.futures import ProcessPoolExecutor
    import concurrent.futures

    def check(url):
        ...  # the listing breaks off here
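Since the listing above is incomplete, here is a minimal sketch of how such a parallel existence check could look. It is not the author's code: it uses threads instead of the processes imported above (the work is I/O-bound), the standard library instead of `requests`, and it assumes that an existing house answers with `errorCode` 0 and a non-empty `result`. The names `fetch_json`, `house_exists` and `find_houses` are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API = 'https://www.mos.ru/altmosprx/api/1/renovation/house_result/{}/'


def fetch_json(house_id):
    """Return the decoded JSON for one id, or None on any failure."""
    try:
        with urllib.request.urlopen(API.format(house_id), timeout=10) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Network errors are OSError subclasses; bad JSON raises ValueError.
        return None


def house_exists(house_id, fetch=fetch_json):
    """Assumption: a real house has errorCode 0 and a non-empty result."""
    data = fetch(house_id)
    return (isinstance(data, dict)
            and data.get('errorCode') == 0
            and bool(data.get('result')))


def find_houses(ids, fetch=fetch_json, workers=6):
    """Check many ids in parallel and return those that look like houses."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = pool.map(lambda i: house_exists(i, fetch), ids)
        return [i for i, ok in zip(ids, flags) if ok]
```

Passing `fetch` as a parameter makes it possible to try the logic on canned responses before pointing it at the real server, for example `find_houses(range(0, 1000))` once the check has been verified by hand on a known-good and a known-bad id.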
We start it up, and it takes a long time even with 6 workers. Looking ahead: after processing a million ids I ended up with 70 houses fewer than expected, so I had to raise the upper bound to 10 million. That takes a while, so I left it running and went to work.
The number of parallel requests could be increased, of course, but we are guests on someone else's server and should behave politely (otherwise we may suddenly get banned).
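So now we have a list of the ids of all the houses. Next we need to go through them and collect all the available information. An array like that could be dumped into a text file, but working with it would be inconvenient. We will use sqlite3.
We create the database. All the fields are self-explanatory, except perhaps that the house number is kept separately from additional elements such as building or structure; I split them in case we want to look at trends among neighbouring houses.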
    import sqlite3

    schema = """CREATE TABLE `houses` (
        `id` INTEGER PRIMARY KEY,
        `street` TEXT NOT NULL,
        `house_nbr` TEXT NOT NULL,
        `house_additional` TEXT,
        `total_votes` INTEGER,
        `total_za` INTEGER,
        `meeting` INTEGER DEFAULT '0',
        `flats` INTEGER
    );"""

    conn = sqlite3.connect('renovation.db')
    cur = conn.cursor()
    db = cur.execute(schema)
    conn.commit()
    conn.close()
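Now, to the point! We write a function that fetches the information about a house and adds it to the database, recording along the way every JSON error (most likely the response contained no JSON) and every unknown error (most likely there was no response at all), so that we can go over those ids separately afterwards.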
    import re
    import sqlite3

    import requests

    def gethouseinfo(idd):
        """Fetch one house from the API and insert it into the houses table."""
        print(idd)
        urly = 'https://www.mos.ru/altmosprx/api/1/renovation/house_result/' + str(idd) + '/'
        try:
            r = requests.get(urly)
            r.encoding = 'utf-8'
            results = r.json()
            adress = results['result']['address']
            print(adress)
            # Addresses look like "street, house" or "street, house, building".
            parts = re.match('(.*), (.*), (.*)', adress)
            if parts:
                adress_street, adress_house, adress_building = parts.groups()
            else:
                adress_street, adress_house = re.match('(.*), (.*)', adress).groups()
                adress_building = ''
            # Every apartment that voted is one "apartment-id" cell in either table.
            totalvotes = (len(re.findall('apartment-id', results['result']['table']))
                          + len(re.findall('apartment-id', results['result']['und_table'])))
            aye = results['result']['total']['za']
            meetinghappened = bool(results['result']['total']['protocol_res'])
            check = cur.execute('SELECT * FROM houses WHERE id=?', [idd])
            if check.fetchone():
                print('already exists')
            else:
                cur.execute('INSERT INTO houses (id, street, house_nbr, house_additional, '
                            'total_votes, total_za, meeting) values (?, ?, ?, ?, ?, ?, ?)',
                            [idd, adress_street, adress_house, adress_building,
                             totalvotes, aye, meetinghappened])
                print('added ' + str(idd))
        except ValueError:
            # r.json() raises ValueError when the body is not JSON.
            print('no data for id ' + str(idd))
            jsonerror.append(idd)
        except Exception:
            print('unknown eggog')
            unknownerror.append(idd)

    jsonerror = []
    unknownerror = []

    with open('/home/deb/mosres.txt') as fc:
        mosres = fc.read().splitlines()

    conn = sqlite3.connect('/home/deb/renovation.db')
    cur = conn.cursor()
    for house in mosres:
        gethouseinfo(house)
    conn.commit()
    conn.close()

    # Keep the failed ids so they can be retried separately.
    if jsonerror:
        with open('/home/deb/jsonerror.txt', 'w') as f:
            for item in jsonerror:
                f.write('{}\n'.format(item))
    if unknownerror:
        with open('/home/deb/unknownerror.txt', 'w') as f:
            for item in unknownerror:
                f.write('{}\n'.format(item))
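The address handling is the least obvious part of that function, so here is a small stand-alone sketch of the same idea without regular expressions. It assumes addresses of the form "street, house" or "street, house, building" and treats everything after the second comma as the additional part; `split_address` is a hypothetical helper, and the sample addresses are made up.

```python
def split_address(address):
    """Split "street, house[, building...]" into three strings."""
    parts = [part.strip() for part in address.split(',')]
    if len(parts) < 2:
        raise ValueError('unexpected address format: {!r}'.format(address))
    street, house, *rest = parts
    # Anything after the house number (building, structure) is kept together.
    return street, house, ', '.join(rest)


print(split_address('Example street, 63, 2'))
print(split_address('Example street, 63'))
```

Note that this differs from the greedy regular expression in the listing when a street name itself contains a comma, so it is worth checking a few real addresses before relying on either version.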
So that is already something. We now have a database of the renovation vote, built from the public information on the mayor's office website. We can finally roll up our sleeves and draw some charts!
All results (sorted in ascending order so that the share of for and against can be judged by eye; X is simply the index, Y is the percentage; the green line marks, for convenience, the 66% needed to enter the program):

Distribution of the votes in favour:

This is all well and good, but what we were interested in was turnout. And here it gets harder. There is in fact no ordinary, simple, centralized resource that gives the number of apartments for an address. 2GIS, at least, does not include the number of apartments, judging by its sample of a paid data export. Paid? That is not our way! We will go another way.
Relentless googling and yandexing lead us to the site
tvoyadres.ru/doma, where many houses come with information about their apartments. But how do we collect it? Ideally, we first gather the full list of houses that have data (or at least the streets), then link the addresses in our database, which are in the mayor's office format, to the addresses in this site's format, and then pull the number of apartments for whatever we manage to link. So it makes sense to start with the streets?
tvoyadres.ru/ulitsy is only 200 pages to walk through, but scraping and processing each one is rather tedious. Perhaps there is some API to be found here too?
Success awaits on the city page: it has not only a list of streets but also a "More streets" button:
tvoyadres.ru/moskovskaya-oblast/goroda/551
Aha! A request of the form
http://tvoyadres.ru/js/street.php?region=81&city=&count=2073&_=1499809159225
returns a list of streets with links to them (and the ids in the links). It remains to work out what the parameters in the request mean. We will leave region and city alone, of course. Let's drop the last, incomprehensible parameter and keep only
tvoyadres.ru/js/street.php?region=81&city=Moscow&count=2073: the result is the same. OK, we click the button again. The same request is sent, but count is now 100 lower. Let's play with this parameter by hand.
0: an error comes back. 1: one street comes back. 2: two streets come back, one of which we have already seen. 100: a hundred streets. 200: another hundred streets. We started at 2073, so shall we try 2173? Yes, these are the first hundred streets, the ones shown on the city page. 2174?
Fatal error
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-1, 100' at line 1
Oops. It looks like count goes into the LIMIT of a SELECT query: the number of rows is always 100, while the first row is computed as 2173 minus count. It seems fairly safe, by the way; I could not see how to build an SQL injection out of it. The value is not passed through as is but used in a calculation, and anything that is not a number probably just breaks it. Oh well. We have our result, and that will do.
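To make the guess about `count` concrete, here is a minimal sketch of the arithmetic the server appears to perform. It is only a reading of the behaviour observed above, not the site's actual code; the total of 2173 streets and the page size of 100 are inferred from the experiments, and both function names are hypothetical.

```python
TOTAL = 2173  # assumed: inferred from count=2173 returning the first page
PAGE = 100    # assumed: the row count in the LIMIT clause never changed


def guessed_limit(count, total=TOTAL, page=PAGE):
    """Return the (offset, rows) pair the query seems to use for a count."""
    offset = total - count
    if offset < 0:
        # Matches the MySQL syntax error seen for count=2174 ("-1, 100").
        raise ValueError('negative offset: {}'.format(offset))
    return offset, page


def counts_for_all_pages(total=TOTAL, page=PAGE):
    """count values that would walk offsets 0, 100, 200, ... in order."""
    return [total - offset for offset in range(0, total, page)]


print(guessed_limit(2173))
print(guessed_limit(2073))
print(counts_for_all_pages()[:3])
```

Under these assumptions, 22 requests would cover every street exactly once.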
Everything is simpler than usual:
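    import requests

    # url is not shown in the original listing; this value is assumed from
    # the simplified "More streets" request above, without the count value.
    url = 'http://tvoyadres.ru/js/street.php?region=81&city=Moscow&count='

    def getstreets(num):
        r = requests.get(url + str(num))
        results = r.json()
        return results['string']

    totalres = ''
    for i in range(1, 2272, 100):
        totalres += getstreets(i)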
And we save the result. The result looks like a big pile of HTML. Strictly speaking, this is the point where one could move on to something better than regular expressions, but the task is very simple, so we still use them to pull out the streets together with their ids, and then build a dict keyed by street name.
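    import re

    # findall yields (id, name) pairs; turn them into a dict keyed by street name.
    sids = {name: sid for sid, name in re.findall(r'ulitsy\/(.*?)">(.*?)<\/a>', totalres)}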
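To show what that expression extracts, here is a tiny self-contained run on a made-up fragment. The markup in `sample` is hypothetical and only imitates the shape of the links the pattern expects; the street names and ids are invented.

```python
import re

# Hypothetical fragment in the shape the pattern expects.
sample = (
    '<li><a href="/moskovskaya-oblast/moskva/ulitsy/101">First street</a></li>'
    '<li><a href="/moskovskaya-oblast/moskva/ulitsy/102">Second street</a></li>'
)

# Each match is an (id, name) pair.
pairs = re.findall(r'ulitsy/(.*?)">(.*?)</a>', sample)

# Keyed by street name, so a name from the database can be mapped to an id.
street_ids = {name: sid for sid, name in pairs}
print(street_ids)
```

Keying by name means that two streets with identical names would overwrite each other, which is worth keeping in mind when the matching results look odd.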
Next, however, we have to pull the houses out of the street pages, in other words work with a whole lot of HTML. Parsing that with a string of regular expressions would cost more than it is worth. So it is time to get acquainted with
BeautifulSoup.
We write a function and use it to fill another dict, where the keys are the street ids and the values are the houses. The logic is this: from a street name we can get its id, and from the id we can get the houses.
    import re

    import requests
    from bs4 import BeautifulSoup

    def gethouses(num):
        r = requests.get('http://tvoyadres.ru/moskovskaya-oblast/moskva/ulitsy/' + str(num) + '/')
        soup = BeautifulSoup(r.text, 'html.parser')
        ul = soup.find("ul", {"class": "next"})
        houses = []
        try:
            for li in ul.find_all("li"):
                urly = li.a['href']
                # Keep only the house id from the link.
                urly = re.search(r'doma\/(.*)\/', urly).group(1)
                houses.append([li.get_text(), urly])
            return houses
        except Exception:
            # No house list on the page (ul is None) or an unexpected link format.
            print('None')
            return 'None'

    totalyres = {}
    for key in sids:
        num = sids[key]
        totalyres[num] = gethouses(num)
Two things come to mind at once. First, instead of a dict in which every street holds a list whose items are lists with a house name and its id, we should build a dict of nested dicts. That takes a simple loop, so I will not dwell on it. I called that dict totre; yes, by the end my imagination had left me completely. Sorry.
Second, the URL of each house contains not only the id but also the transliterated street name, so we need to collect that as well:
    for key in totre:
        urlo = 'http://tvoyadres.ru/moskovskaya-oblast/moskva/ulitsy/' + key + '/'
        ra = requests.get(urlo)
        try:
            # The first house link on the page carries the transliterated street name.
            streetname = re.search(r'<ul class="next"><li><a href="\/moskovskaya-oblast\/moskva\/(.+?)\/doma\/', ra.text).group(1)
            totre[key]['streetname'] = streetname
        except Exception:
            print(key)
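The "simple loop" that turns the per-street lists into a dict of nested dicts is not shown in the article. A minimal sketch of what it could look like follows; `nest` is a hypothetical name, the sample data is made up, and skipping streets for which the scraper returned the `'None'` marker is an assumption.

```python
def nest(houses_by_street):
    """Turn {street_id: [[house_name, house_id], ...]} into nested dicts."""
    nested = {}
    for street_id, houses in houses_by_street.items():
        if not isinstance(houses, list):
            # The scraper returns the string 'None' when a page had no houses.
            continue
        nested[street_id] = {name.strip(): house_id for name, house_id in houses}
    return nested


# Made-up data in the shape produced by the house scraper.
sample = {'5': [['1', 'a1'], ['2 k1', 'b2']], '6': 'None'}
print(nest(sample))
```

With the real data this would be called as `totre = nest(totalyres)`, after which each street's dict also has room for the extra `'streetname'` key.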
And now we come to what is probably the hardest part. We have to link the streets from the mayor's office data in our database with the street ids used by the address site.
Difflib, which is built into Python, helps here. But relying on difflib alone is not an option: a cutoff and a similarity score are nice, of course, but we need confirmation by the user to avoid silly mistakes. In general, looking at the formats, we notice that in one place the letter ё is absent altogether, in another the word "street" has been dropped from the street names, and off we go.
    import difflib
    import json
    import sqlite3

    conn = sqlite3.connect('renovation.db')
    cur = conn.cursor()
    streets = cur.execute('SELECT DISTINCT street FROM houses order by street asc')
    streeets = streets.fetchall()
    conn.close()

    exactmatches = {}
    keyslist = list(sids.keys())

    def glue(maxres=3, freq=0.6):
        for each in streeets:
            # The Cyrillic literals were lost from this listing; judging by the
            # variable name, the call normalised 'ё' to 'е' (an assumption).
            eachnoyo = each[0].replace('ё', 'е')
            diffres = difflib.get_close_matches(eachnoyo, keyslist, maxres, freq)
            if each[0] not in exactmatches.keys():
                if len(diffres) == 1:
                    print(each[0] + ': ' + diffres[0])
                    notcompleted = False
                    while notcompleted == False:
                        inp = input('Correct? y/n ')
                        if inp == 'y':
                            notcompleted = True
                            exactmatches[each[0]] = sids[diffres[0]]
                        elif inp == 'n':
                            notcompleted = True
                        else:
                            print('Incorrect input, try again')
                elif len(diffres) == 0:
                    print('No matches for ' + each[0])
                elif len(diffres) > 1:
                    print(each[0] + ': ' + str(diffres))
                    notcompleted = False
                    while notcompleted == False:
                        inp = input('List number? Or n ')
                        try:
                            listnum = int(inp)
                        except ValueError:
                            listnum = None
                        if inp == 'n':
                            notcompleted = True
                        elif listnum in range(0, len(diffres)):
                            notcompleted = True
                            # Use the candidate the user picked, not always the first.
                            exactmatches[each[0]] = sids[diffres[listnum]]
                        else:
                            print('Incorrect input, try again')
        # Save the confirmed matches after every pass.
        with open('exactmatches.json', 'w') as f:
            json.dump(exactmatches, f, ensure_ascii=False)
We sit at the console, look at the suggestions and press keys. When the whole loop is finished, we run the function again with more lenient parameters, for example glue(10, freq=0.4).
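To make the effect of those two parameters easier to see, here is a minimal sketch of fuzzy matching with confirmation, separate from the database code. The word list is the one used in the `difflib` documentation rather than real street names, and `match_with_confirmation` with its injectable `ask` callback is a hypothetical simplification of the interactive loop.

```python
import difflib

candidates = ['ape', 'apple', 'peach', 'puppy']

# n limits how many suggestions come back, cutoff is the minimum similarity.
strict = difflib.get_close_matches('appel', candidates, n=3, cutoff=0.6)
loose = difflib.get_close_matches('appel', candidates, n=10, cutoff=0.4)
print(strict)
print(loose)


def match_with_confirmation(names, options, ask=input, n=3, cutoff=0.6):
    """Offer the close matches for each name and keep the first one accepted."""
    confirmed = {}
    for name in names:
        for guess in difflib.get_close_matches(name, options, n=n, cutoff=cutoff):
            if ask('{} -> {}? y/n '.format(name, guess)) == 'y':
                confirmed[name] = guess
                break
    return confirmed
```

Lowering the cutoff and raising `n` can only add suggestions, which is why a second pass with looser settings picks up streets that found no match the first time, at the price of more wrong candidates to reject by hand.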

My patience lasted for 506 streets out of 700-odd. In my opinion that is an excellent result and, most importantly, a statistically significant one (most likely).
And now we have to do the same for the houses and actually fetch the number of apartments for them. And put it into the database.
    import difflib
    import re
    import sqlite3

    import requests

    conn = sqlite3.connect('renovation.db')
    cur = conn.cursor()
    allhouses = cur.execute('SELECT * FROM houses WHERE flats IS NULL ORDER BY id')
    allhousesres = allhouses.fetchall()
    url2 = 'http://tvoyadres.ru/moskovskaya-oblast/moskva/'

    def getnumberofflats(streetname, houseid):
        urlo = url2 + str(streetname) + '/doma/' + str(houseid) + '/'
        r = requests.get(urlo)
        results = r.text
        numbe = re.search(r'<span class="left"> <\/span> <span class="right">(\d*)<', results).group(1)
        return numbe

    def gluehousesnumbers(freq=3, ratio=0.6):
        for house in allhousesres:
            if house[1] in exactmatches.keys():
                street = totre[exactmatches[house[1]]]
                # The string literals of this replace() were lost from the
                # listing; as written the call changes nothing.
                housenbr = house[2].replace('', '')
                if house[3]:
                    housenbr = housenbr + ' ' + house[3]
                housenbr = housenbr.lower()
                diffres = difflib.get_close_matches(housenbr, street.keys(), freq, ratio)
                if len(diffres) == 1:
                    print(housenbr + ': ' + diffres[0])
                    notcompleted = False
                    while notcompleted == False:
                        inp = input('Correct? y/n ')
                        if inp == 'y':
                            notcompleted = True
                            try:
                                flatsnumber = getnumberofflats(street['streetname'], street[diffres[0]])
                                cur.execute('UPDATE houses SET flats = ? WHERE id = ?', [flatsnumber, house[0]])
                            except Exception:
                                print('weird, no flat number for ' + str(house))
                        elif inp == 'n':
                            notcompleted = True
                        else:
                            print('Incorrect input, try again')
                elif len(diffres) > 1:
                    print(housenbr + ': ' + str(diffres))
                    notcompleted = False
                    while notcompleted == False:
                        inp = input('List number? Or n ')
                        try:
                            listnum = int(inp)
                        except ValueError:
                            listnum = None
                        if inp == 'n':
                            notcompleted = True
                        elif listnum in range(0, len(diffres)):
                            notcompleted = True
                            try:
                                # Use the candidate the user picked, not always the first.
                                flatsnumber = getnumberofflats(street['streetname'], street[diffres[listnum]])
                                cur.execute('UPDATE houses SET flats = ? WHERE id = ?', [flatsnumber, house[0]])
                            except Exception:
                                print('weird, no flat number for ' + str(house))
                        else:
                            print('Incorrect input, try again')

    conn.commit()
    conn.close()
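The fun continues. The main thing is not to forget to commit the changes after each run of the function. I did not put the connection and the commit inside the function itself, so that the database would not be locked all the time.
Yes, this is rather tiring. But I repeat: if you leave everything to difflib, you get silly mistakes like the one in the screenshot, where "35b" is judged closer to "35" than to "35 b". That is not a difflib error, of course, but honestly I could not be bothered to spend even more time hunting for the perfect query. It is better, and more reliable, with user confirmation.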

Total: we have the number of apartments for 3,592 houses out of about 4,500! An excellent result (if you do not praise yourself, nobody will). But of course, when you confirm that many matches by hand, mistakes creep in.

We drop 43 houses in which more apartments voted than the house supposedly has. Clearly either a match was confirmed by mistake or the data was plainly wrong.
With what remains we can already have some fun. Turnout is calculated, obviously, as the number of apartments that voted divided by the number of apartments. The chart is drawn in the obvious way too, with one caveat: there is no point in plotting every single turnout value, because the result looks like a sheet of paper shaded with a pencil. So it is better to round the turnout and take the average result for each rounded turnout value.
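The turnout calculation and the rounding described above can be sketched as follows. This is an illustration on made-up numbers, not the author's plotting code; the tuple layout `(voted_flats, flats, percent_for)` and both function names are assumptions.

```python
import math
from collections import defaultdict


def round_to(value, step):
    """Round a non-negative value to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def average_result_by_turnout(houses, step=5):
    """houses: iterable of (voted_flats, flats, percent_for) tuples."""
    buckets = defaultdict(list)
    for voted, flats, percent_for in houses:
        if not flats or voted > flats:
            # Unknown flat count, or more voters than flats: drop the house.
            continue
        turnout = 100 * voted / flats
        buckets[round_to(turnout, step)].append(percent_for)
    # Average result for every rounded turnout value, in ascending order.
    return {t: sum(v) / len(v) for t, v in sorted(buckets.items())}


# Made-up houses: two around 50% turnout, one implausible, one without a
# flat count, one with almost full turnout.
sample = [(50, 100, 80), (52, 100, 90), (120, 100, 100), (10, 0, 50), (99, 100, 100)]
print(average_result_by_turnout(sample, step=5))
```

Calling the same function with `step=1` gives the finer rounding, and the resulting dict can be passed straight to a plotting library as x and y values.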
Rounded to the nearest integer:

Rounded to the nearest multiple of 5:

On the whole, apart from a couple of anti-peaks around 80% turnout (or, with the coarser rounding, around 30%, 40% and 80%), turnout did not affect the result. Except that, remarkably, 100% turnout always gives a 100% result. And the average turnout was 58.7%.
Was it worth it? For me, yes: I learned a lot. And for the reader? Well, for the reader I am posting the database itself.
Perhaps you can do something more interesting with this data.