Translator's note
This translation landed rather unexpectedly among the other data science tutorials on Habr. :)
It was written by Vic Paruchuri, the founder of Dataquest.io, which offers this kind of interactive data science training and prepares people for real work in the field. There is no exclusive know-how here, but the process from collecting the data to drawing the first conclusions from it is described in great detail, which should help anyone who wants to try data science but does not know where to start.
Data science companies increasingly look at portfolios when making hiring decisions, largely because a portfolio is the best way to judge practical skills. The good news is that a portfolio is entirely within your control: with some effort, you can put together a strong one that impresses many companies.
The first step toward a high-quality portfolio is understanding which skills it needs to demonstrate.
The main skills that companies want to see in a data scientist are:
- Ability to communicate
- Ability to collaborate with others
- Technical competence
- Ability to draw conclusions from data
- Motivation and ability to take initiative
Any good portfolio contains several projects, each of which can demonstrate one or two of the points above. This is the first post in a series on building a well-rounded data science portfolio. We will look at how to create the first project for a portfolio and how to tell a good story with data. By the end, you will have a project that shows your ability to communicate and your ability to draw conclusions from data.
Translator's note: I will definitely not translate the whole series, but I do plan to cover the interesting tutorials on machine learning.
Storytelling with data
At its core, data science is about communication. You see a pattern in the data, look for an effective way to explain that pattern to others, and then persuade them to take the action you believe is needed. One of the most important skills in data science is telling a story with data. A good story conveys your insights better and helps other people understand your ideas.
A story in the data science context is a summary of everything you found and what it means. For example, you discover that the company's profit fell by 20% over the last year. Pointing out that fact is not enough: you need to explain why profit fell and what to do about it.
The main components of a story told with data are:
- Understanding and setting the context
- Exploring multiple angles
- Using compelling visualizations
- Using varied data sources
- Having a consistent narrative
The best tool for telling a story with data clearly is the Jupyter notebook. If you are not familiar with it yet, it is worth working through a tutorial first. A Jupyter notebook lets you explore data interactively and then publish the result on various sites, including GitHub. Publishing results helps collaboration, because other people can extend your analysis.
In this post we use Jupyter notebook together with Python libraries such as Pandas and matplotlib.
Choosing a topic for your data science project
The first step in creating a project is deciding on the topic. Pick something you are interested in and motivated to explore. It is always obvious when people build a project just to have one, and when they build it because digging into the data was genuinely interesting to them. It makes sense to spend extra time at this step to find something that really appeals to you.
A good way to find a topic is to browse different datasets and see what looks interesting. Some good places to start are listed below.
In real-world data science, you often will not find a dataset that is fully prepared for analysis. You may need to aggregate different data sources or do serious cleaning. If a topic is very interesting to you, it makes sense to do the same here, so that you can show your skills more fully.
For this post we use data about New York City public schools.
Translator's note: just in case, here are examples of similar datasets closer to home (in Russian).
Picking a topic
It is important to take the whole project from start to finish. To do that, it helps to limit the scope of the study so that you know exactly what you will be able to finish. It is easier to add something to a completed project than to complete one you have grown tired of long before the end.
In our case, we will look at the SAT scores of high school students together with various demographic and other information. The SAT, or Scholastic Aptitude Test, is a test that high school students take before applying to college. Colleges take the scores into account when making admission decisions, so doing well on it matters a great deal. The test has 3 sections, each scored out of 800 points, for a total of 2400 (the scale has changed back and forth over time, but in this dataset everything is out of 2400). High schools are often ranked by average SAT score, and a high average is usually taken as a sign of how good a school district is.
There have been complaints that the SAT is unfair to certain racial groups in the US, so an analysis of New York City data helps shed some light on the fairness of the test.
We have one dataset with SAT scores and another with information about each school. These form the basis of the project, but a complete analysis needs more information.
Translator's note: the exam in the original is the SAT (Scholastic Aptitude Test). It is practically the same thing as the Russian Unified State Exam (USE), which is why the Russian translation rendered it that way.
Collecting the data
Once you have a good topic, it is useful to look for other datasets that can extend the topic or give the study more depth. It is best to do this up front, so that you have as much data as possible to explore while you build the project. With too little data, you may give up too early.
In our case, the same site has several more datasets on this topic, covering demographic information and test results.
The links to all the datasets used are as follows.
All of these datasets are interrelated, and we can combine them before starting the analysis.
Getting background information
Before diving into the analysis, it is useful to gather some general information about the subject. In our case, we know a few things that may help:
- New York City is divided into 5 boroughs, which are essentially separate areas.
- Schools in New York City are divided into several school districts, each of which can contain dozens of schools.
- Not all of the schools in the datasets are high schools, so the data may need cleaning first.
- Each school in New York City has a unique code called a DBN, or District Borough Number.
- By aggregating the data by district, we can use district mapping data to plot the differences between districts.
Translator's note: what is translated as "neighborhoods" is actually called a "borough" in NYC, and the columns are named accordingly.
Understanding the data
To really understand the context of the data, you need to spend time reading about it. In this case, each of the links above includes a description of the data for every column. It looks like we have data on the SAT scores of high school students, along with other datasets containing demographic and other information.
Let's run some code to read in the data. We use a Jupyter notebook for the exploration. The code below:
- Loops through each file we downloaded
- Reads each one into a Pandas DataFrame
- Puts all the DataFrames into a Python dictionary.
    import pandas
    import numpy as np

    files = ["ap_2010.csv", "class_size.csv", "demographics.csv", "graduation.csv",
             "hs_directory.csv", "math_test_results.csv", "sat_results.csv"]

    data = {}
    for f in files:
        # Read each CSV from the schools/ folder; the dictionary key is the file name without ".csv".
        d = pandas.read_csv("schools/{0}".format(f))
        data[f.replace(".csv", "")] = d
Once everything is read in, we can use the head method on each DataFrame to print its first 5 rows:
    for k, v in data.items():
        print("\n" + k + "\n")
        print(v.head())
Certain features of the datasets are already visible in the output:
math_test_results
| DBN | Grade | Year | Category | Number Tested | Mean Scale Score | Level 1 # | \ |
---|
0 | 01M015 | 3 | 2006 | All Students | 39 | 667 | 2 |
1 | 01M015 | 3 | 2007 | All Students | 31 | 672 | 2 |
2 | 01M015 | 3 | 2008 | All Students | 37 | 668 | 0 |
3 | 01M015 | 3 | 2009 | All Students | 33 | 668 | 0 |
4 | 01M015 | 3 | 2010 | All Students | 26 | 677 | 6 |
| Level 1 % | Level 2 # | Level 2 % | Level 3 # | Level 3 % | Level 4 # | Level 4 % | \ |
---|
0 | 5.1% | 11 | 28.2% | 20 | 51.3% | 6 | 15.4% |
1 | 6.5% | 3 | 9.7% | 22 | 71% | 4 | 12.9% |
2 | 0% | 6 | 16.2% | 29 | 78.4% | 2 | 5.4% |
3 | 0% | 4 | 12.1% | 28 | 84.8% | 1 | 3% |
4 | 23.1% | 12 | 46.2% | 6 | 23.1% | 2 | 7.7% |
| Level 3+4 # | Level 3+4 % |
---|
0 | 26 | 66.7% |
1 | 26 | 83.9% |
2 | 31 | 83.8% |
3 | 29 | 87.9% |
4 | 8 | 30.8% |
ap_2010
| DBN | SchoolName | AP Test Takers | Total Exams Taken | Number of Exams with scores 3 4 or 5 |
---|
0 | 01M448 | UNIVERSITY NEIGHBORHOOD H.S. | 39 | 49 | 10 |
1 | 01M450 | EAST SIDE COMMUNITY HS | 19 | 21 | s |
2 | 01M515 | LOWER EASTSIDE PREP | 24 | 26 | 24 |
3 | 01M539 | NEW EXPLORATIONS SCI,TECH,MATH | 255 | 377 | 191 |
4 | 02M296 | High School of Hospitality Management | s | s | s |
sat_results
| DBN | SCHOOL NAME | Num of SAT Test Takers | SAT Critical Reading Avg. Score | SAT Math Avg. Score | SAT Writing Avg. Score |
---|
0 | 01M292 | HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES | 29 | 355 | 404 | 363 |
1 | 01M448 | UNIVERSITY NEIGHBORHOOD HIGH SCHOOL | 91 | 383 | 423 | 366 |
2 | 01M450 | EAST SIDE COMMUNITY SCHOOL | 70 | 377 | 402 | 370 |
3 | 01M458 | FORSYTH SATELLITE ACADEMY | 7 | 414 | 401 | 359 |
4 | 01M509 | MARTA VALLE HIGH SCHOOL | 44 | 390 | 433 | 384 |
class_size
| CSD | BOROUGH | SCHOOL CODE | SCHOOL NAME | GRADE | PROGRAM TYPE | CORE SUBJECT (MS CORE and 9-12 ONLY) | CORE COURSE (MS CORE and 9-12 ONLY) | \ |
---|
0 | 1 | M | M015 | P.S. 015 ROBERTO CLEMENTE | 0K | GEN ED | - | - |
1 | 1 | M | M015 | P.S. 015 ROBERTO CLEMENTE | 0K | CTT | - | - |
2 | 1 | M | M015 | P.S. 015 ROBERTO CLEMENTE | 01 | GEN ED | - | - |
3 | 1 | M | M015 | P.S. 015 ROBERTO CLEMENTE | 01 | CTT | - | - |
4 | 1 | M | M015 | P.S. 015 ROBERTO CLEMENTE | 02 | GEN ED | - | - |
| SERVICE CATEGORY (K-9* ONLY) | NUMBER OF STUDENTS / SEATS FILLED | NUMBER OF SECTIONS | AVERAGE CLASS SIZE | SIZE OF SMALLEST CLASS | \ |
---|
0 | - | 19.0 | 1.0 | 19.0 | 19.0 |
1 | - | 21.0 | 1.0 | 21.0 | 21.0 |
2 | - | 17.0 | 1.0 | 17.0 | 17.0 |
3 | - | 17.0 | 1.0 | 17.0 | 17.0 |
4 | - | 15.0 | 1.0 | 15.0 | 15.0 |
| SIZE OF LARGEST CLASS | DATA SOURCE | SCHOOLWIDE PUPIL-TEACHER RATIO |
---|
0 | 19.0 | ATS | NaN |
1 | 21.0 | ATS | NaN |
2 | 17.0 | ATS | NaN |
3 | 17.0 | ATS | NaN |
4 | 15.0 | ATS | NaN |
人å£çµ±èš
| Dbn | ãåå | åŠå¹Ž | fl_percent | frl_percent | \ |
---|
0 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20052006 | 89.4 | ãã³ |
1 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20062007 | 89.4 | ãã³ |
2 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20072008 | 89.4 | ãã³ |
3 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20082009 | 89.4 | ãã³ |
4 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20092010 | | 96.5 |
| total_enrollment | ãã¬ã㯠| k | ã°ã¬ãŒã1 | ã°ã¬ãŒã2 | ... | black_num | black_per | \ |
---|
0 | 281 | 15 | 36 | 40 | 33 | ... | 74 | 26.3 |
1 | 243 | 15 | 29æ¥ | 39 | 38 | ... | 68 | 28.0 |
2 | 261 | 18 | 43 | 39 | 36 | ... | 77 | 29.5 |
3 | 252 | 17 | 37 | 44 | 32 | ... | 75 | 29.8 |
4 | 208 | 16 | 40 | 28 | 32 | ... | 67 | 32.2 |
| hispanic_num | hispanic_per | white_num | white_per | male_num | male_per | female_num | female_per | \ |
---|
0 | 189 | 67.3 | 5 | 1.8 | 158.0 | 56.2 | 123.0 | 43.8 |
1 | 153 | 63.0 | 4 | 1.6 | 140.0 | 57.6 | 103.0 | 42.4 |
2 | 157 | 60.2 | 7 | 2.7 | 143.0 | 54.8 | 118.0 | 45.2 |
3 | 149 | 59.1 | 7 | 2.8 | 149.0 | 59.1 | 103.0 | 40.9 |
4 | 118 | 56.7 | 6 | 2.9 | 124.0 | 59.6 | 84.0 | 40.4 |
åæ¥
| 人å£çµ±èš | Dbn | åŠæ ¡å | ã³ããŒã | \ |
---|
0 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2003 |
1 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2004 |
2 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2005幎 |
3 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2006 |
4 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2006幎8æ |
| ç·ã³ããŒã | ç·åæ¥-n | ç·åæ¥ç-ã³ããŒãã®å²å | åèšãªãŒãžã§ã³ã-n | \ |
---|
0 | 5 | s | s | s |
1 | 55 | 37 | 67.3ïŒ
| 17 |
2 | 64 | 43 | 67.2ïŒ
| 27 |
3 | 78 | 43 | 55.1ïŒ
| 36 |
4 | 78 | 44 | 56.4ïŒ
| 37 |
| ç·ãªãŒãžã§ã³ã-ã³ããŒãã®ïŒ
| ç·ãªãŒãžã§ã³ã-åæ¥çã®å²å | ... | é«åºŠãªãã®ãªãŒãžã§ã³ã-n | \ |
---|
0 | s | s | ... | s |
1 | 30.9ïŒ
| 45.9ïŒ
| ... | 17 |
2 | 42.2ïŒ
| 62.8ïŒ
| ... | 27 |
3 | 46.2ïŒ
| 83.7ïŒ
| ... | 36 |
4 | 47.4ïŒ
| 84.1ïŒ
| ... | 37 |
| é«åºŠãªãã®ãªãŒãžã§ã³ã-ã³ããŒãã®ïŒ
| é«åºŠãªãã®ãªãŒãžã§ã³ã-åæ¥çã®ïŒ
| \ |
---|
0 | s | s |
1 | 30.9ïŒ
| 45.9ïŒ
|
2 | 42.2ïŒ
| 62.8ïŒ
|
3 | 46.2ïŒ
| 83.7ïŒ
|
4 | 47.4ïŒ
| 84.1ïŒ
|
| ããŒã«ã«-n | ããŒã«ã«-ã³ããŒãã®ïŒ
| ããŒã«ã«-åæ¥çã®ïŒ
| sãŸã ç»é²æžã¿-n | \ |
---|
0 | s | s | s | s |
1 | 20 | 36.4ïŒ
| 54.1ïŒ
| 15 |
2 | 16 | 25ïŒ
| 37.200000000000003ïŒ
| 9 |
3 | 7 | 9ïŒ
| 16.3ïŒ
| 16 |
4 | 7 | 9ïŒ
| 15.9ïŒ
| 15 |
| ãŸã ç»é²æžã¿-ã³ããŒãã®ïŒ
| ããããã¢ãŠã-n | ããããã¢ãŠã-ã³ããŒãã®ïŒ
|
---|
0 | s | s | s |
1 | 27.3ïŒ
| 3 | 5.5ïŒ
|
2 | 14.1ïŒ
| 9 | 14.1ïŒ
|
3 | 20.5ïŒ
| 11 | 14.1ïŒ
|
4 | 19.2ïŒ
| 11 | 14.1ïŒ
|
hs_directory
| dbn | school_name | ãã | \ |
---|
0 | 17K548 | ãã«ãã¯ãªã³é³æ¥œåŠæ ¡ | ãã«ãã¯ãªã³ |
1 | 09X543 | ãã€ãªãªã³ãšãã³ã¹ã®é«æ ¡ | ããã³ã¯ã¹ |
2 | 09X327 | å
æ¬çãªã¢ãã«ã¹ã¯ãŒã«ãããžã§ã¯ãMS 327 | ããã³ã¯ã¹ |
3 | 02M280 | ãã³ããã¿ã³åºåå€§åŠ | ãã³ããã¿ã³ |
4 | 28Q680 | ã¯ã€ãŒã³ãºãã«ã¹ã²ãŒããŠã§ã€ã»ã«ã³ããªãµã€ãšã³ã¹ã³ãŒã¹... | ã¯ã€ãŒã³ãº |
| building_code | é»è©±çªå· | fax_number | grade_span_min | grade_span_max | \ |
---|
0 | K440 | 718-230-6250 | 718-230-6262 | 9 | 12 |
1 | X400 | 718-842-0687 | 718-589-9849 | 9 | 12 |
2 | X240 | 718-294-8111 | 718-294-8109 | 6 | 12 |
3 | M520 | 718-935-3477 | ãã³ | 9 | 10 |
4 | Q695 | 718-969-3155 | 718-969-3552 | 6 | 12 |
| expgrade_span_min | expgrade_span_max | ... | priority02 | \ |
---|
0 | ãã³ | ãã³ | ... | ãããããã¥ãŒãšãŒã¯åžã®äœæ°ãž |
1 | ãã³ | ãã³ | ... | ãã®åŸããã¥ãŒãšãŒã¯åžã®äœæ°ã«åºåžã... |
2 | ãã³ | ãã³ | ... | 次ã«ãåºåžããããã³ã¯ã¹ã®åŠçãŸãã¯å±
äœè
ã«... |
3 | 9 | 14.0 | ... | ãã®åŸããã¥ãŒãšãŒã¯åžã®äœæ°ã«åºåžã... |
4 | ãã³ | ãã³ | ... | 次ã«ã28åºããã³29åºã®åŠçãŸãã¯å±
äœè
ãž |
| priority03 | priority04 | priority05 | \ |
---|
0 | ãã³ | ãã³ | ãã³ |
1 | ãã®åŸãããã³ã¯ã¹ã®åŠçãŸãã¯å±
äœè
ã« | ãããããã¥ãŒãšãŒã¯åžã®äœæ°ãž | ãã³ |
2 | ãã®åŸããã¥ãŒãšãŒã¯åžã®äœæ°ã«åºåžã... | ãã®åŸãããã³ã¯ã¹ã®åŠçãŸãã¯å±
äœè
ã« | ãããããã¥ãŒãšãŒã¯åžã®äœæ°ãž |
3 | ãã®åŸããã³ããã¿ã³ã®åŠçãŸãã¯å±
äœè
ã« | ãããããã¥ãŒãšãŒã¯åžã®äœæ°ãž | ãã³ |
4 | ãã®åŸãã¯ã€ãŒã³ãºã®åŠçãŸãã¯å±
äœè
ãž | ãããããã¥ãŒãšãŒã¯åžã®äœæ°ãž | ãã³ |
| priority06 | priority07 | priority08 | priority09 | åªå
床10 | å Žæ1 |
---|
0 | ãã³ | ãã³ | ãã³ | ãã³ | ãã³ | 883 Classon Avenue \ nãã«ãã¯ãªã³ããã¥ãŒãšãŒã¯11225 \ nïŒ40.67 ... |
1 | ãã³ | ãã³ | ãã³ | ãã³ | ãã³ | 1110 Boston Road \ nBronxãNY 10456 \ nïŒ40.8276026 ... |
2 | ãã³ | ãã³ | ãã³ | ãã³ | ãã³ | 1501ãžã§ããŒã ã¢ããã¥ãŒ\ nããã³ã¯ã¹ããã¥ãŒãšãŒã¯10452 \ nïŒ40.84241 ... |
3 | ãã³ | ãã³ | ãã³ | ãã³ | ãã³ | 411 Pearl Street \ nãã¥ãŒãšãŒã¯ãNY 10038 \ nïŒ40.7106 ... |
4 | ãã³ | ãã³ | ãã³ | ãã³ | ãã³ | 160-20 Goethals Avenue \ nãžã£ãã€ã«ãNY 11432 \ nïŒ40 ... |
- Most of the datasets contain a DBN column.
- Some fields look interesting for mapping, in particular Location 1, which contains coordinates inside a longer string.
- Some datasets have several rows per school (the DBN values repeat), which suggests that preprocessing will be needed.
Unifying the data
To make the data easier to work with, we need to combine all the datasets into one, which lets us quickly compare columns across datasets. For that, we first need to find a common column to join on. Looking at the output above, DBN appears in several datasets, so we can assume it is that column.
Searching Google for "DBN New York City Schools" leads to a page explaining that the DBN is a code unique to each school. When exploring datasets, especially government ones, you often have to do some detective work to understand what each column means, and sometimes what each dataset is.
The problem now is that two datasets, class_size and hs_directory, have no DBN column. In hs_directory the column is called dbn, so we only need to rename it or copy it to DBN. class_size needs a different approach.
The DBN column looks like this:
In [5]:
    data["demographics"]["DBN"].head()
Out[5]:
    0    01M015
    1    01M015
    2    01M015
    3    01M015
    4    01M015
    Name: DBN, dtype: object
If we look at class_size, this is what the first 5 rows show:
In [4]: data["class_size"].head() Out[4]:
| CSD | èªæ²»åº | åŠæ ¡ã³ãŒã | åŠæ ¡å | ã°ã¬ãŒã | ããã°ã©ã ã®çš®é¡ | CORE SUBJECTïŒMS COREããã³9-12ã®ã¿ïŒ | / |
---|
0 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 0K | GEN ED | - |
1 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 0K | CTT | - |
2 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 01 | GEN ED | - |
3 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 01 | CTT | - |
4 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 02 | GEN ED | - |
| ã³ã¢ã³ãŒã¹ïŒMSã³ã¢ããã³9-12ã®ã¿ïŒ | ãµãŒãã¹ã«ããŽãªïŒK-9 *ã®ã¿ïŒ | åè¬è
æ°/åžæ° | / |
---|
0 | - | - | 19.0 |
1 | - | - | 21.0 |
2 | - | - | 17.0 |
3 | - | - | 17.0 |
4 | - | - | 15.0 |
| ã»ã¯ã·ã§ã³ã®æ° | å¹³åã¯ã©ã¹ãµã€ãº | æå°ã¯ã©ã¹ã®ãµã€ãº | æ倧ã¯ã©ã¹ã®ãµã€ãº | ããŒã¿ãœãŒã¹ | åŠæ ¡ã®çåŸãšæåž«ã®æ¯ç |
---|
0 | 1.0 | 19.0 | 19.0 | 19.0 | ATS | ãã³ |
1 | 1.0 | 21.0 | 21.0 | 21.0 | ATS | ãã³ |
2 | 1.0 | 17.0 | 17.0 | 17.0 | ATS | ãã³ |
3 | 1.0 | 17.0 | 17.0 | 17.0 | ATS | ãã³ |
4 | 1.0 | 15.0 | 15.0 | 15.0 | ATS | ãã³ |
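As you can see, the DBN is simply a combination of the CSD, BOROUGH, and SCHOOL CODE columns. For those unfamiliar with New York City: it consists of 5 boroughs, and each borough is an organizational unit roughly the size of a fairly large US city. DBN stands for District Borough Number. CSD is the district and BOROUGH is the borough. Because SCHOOL CODE already starts with the borough letter (M015, for example), padding CSD to two digits and appending SCHOOL CODE gives the DBN (01M015).
Now that we know how to build the DBN, we can add it to class_size and hs_directory: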
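In [ ]:
    # class_size: zero-pad the district number (CSD) to two digits and append SCHOOL CODE.
    data["class_size"]["DBN"] = data["class_size"].apply(
        lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)
    # hs_directory: the column already exists under a lower-case name, so copy it.
    data["hs_directory"]["DBN"] = data["hs_directory"]["dbn"]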
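The formatting rule used for class_size can be checked on its own, without pandas. This is a minimal sketch: the helper name `make_dbn` is made up for illustration, and the sample values are the ones visible in the class_size and hs_directory rows above.

```python
def make_dbn(csd, school_code):
    """Build a DBN from a district number (CSD) and a school code.

    The district is zero-padded to two digits. The school code is assumed
    to already start with the borough letter (for example "M015").
    """
    return "{0:02d}{1}".format(int(csd), school_code)


# CSD 1 and SCHOOL CODE "M015" are taken from the class_size rows shown above.
print(make_dbn(1, "M015"))   # 01M015
```

The `02d` format is what turns district 1 into "01"; without it the result would not match the DBN values in the other datasets.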
Adding in the surveys
One of the potentially most interesting datasets is the one with student, parent, and teacher surveys about the quality of schools. These surveys contain information on subjective perceptions of each school, such as safety and academic standards. Before combining the datasets, let's add the survey data. In real data science projects you often come across interesting data in the middle of an analysis and want to connect it. A flexible tool such as a Jupyter notebook lets you quickly add code and redo the analysis.
In this case, we add the survey data to our data dictionary and then combine all the datasets. The survey data consists of two files, one for all schools and one for school district 75, so combining them takes some code. In it, we:
- Read the survey for all schools using the windows-1252 encoding
- Read the survey for district 75 using windows-1252
- Add a flag that indicates which district each dataset belongs to
- Combine the datasets into one using the concat method on DataFrames.
In [66]:
    # Both survey files are tab-delimited text in the windows-1252 encoding.
    survey1 = pandas.read_csv("schools/survey_all.txt", delimiter="\t", encoding='windows-1252')
    survey2 = pandas.read_csv("schools/survey_d75.txt", delimiter="\t", encoding='windows-1252')
    # Flag which file each row came from before stacking them.
    survey1["d75"] = False
    survey2["d75"] = True
    survey = pandas.concat([survey1, survey2], axis=0)
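A small stand-in example shows the flag-then-concatenate pattern in isolation. The frames and their values are made up for illustration, and `ignore_index=True` is an optional addition that is not in the article's code.

```python
import pandas as pd

# Two tiny stand-in frames. The column names mirror the survey files,
# the rows are invented for illustration only.
all_schools = pd.DataFrame({"dbn": ["01M015", "01M019"], "saf_p_11": [8.5, 8.4]})
district_75 = pd.DataFrame({"dbn": ["75K004"], "saf_p_11": [9.0]})

# Record the origin of each row before stacking the frames.
all_schools["d75"] = False
district_75["d75"] = True

# axis=0 stacks rows. ignore_index=True gives the result a fresh 0..n-1
# index instead of repeating the index values of each input frame.
combined = pd.concat([all_schools, district_75], axis=0, ignore_index=True)
print(combined)
```

Without `ignore_index=True` the combined frame keeps the original index labels, so label 0 would appear twice. That is harmless for the merge on DBN later, but it can be surprising when selecting rows by label.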
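Once the surveys are combined, there is a further complication. We want to keep the number of columns in the combined dataset to a minimum, so that columns are easy to compare and relationships are easy to spot. Unfortunately, the survey data contains many columns we do not need: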
In [16]: survey.head() Out[16]:
| N_p | N_s | N_t | aca_p_11 | aca_s_11 | aca_t_11 | aca_tot_11 | / |
---|
0 | 90.0 | ãã³ | 22.0 | 7.8 | ãã³ | 7.9 | 7.9 |
1 | 161.0 | ãã³ | 34.0 | 7.8 | ãã³ | 9.1 | 8.4 |
2 | 367.0 | ãã³ | 42.0 | 8.6 | ãã³ | 7.5 | 8.0 |
3 | 151.0 | 145.0 | 29.0 | 8.5 | 7.4 | 7.8 | 7.9 |
4 | 90.0 | ãã³ | 23.0 | 7.9 | ãã³ | 8.1 | 8.0 |
| åå | com_p_11 | com_s_11 | ... | t_q8c_1 | t_q8c_2 | t_q8c_3 | t_q8c_4 | / |
---|
0 | M015 | 7.6 | ãã³ | ... | 29.0 | 67.0 | 5.0 | 0.0 |
1 | M019 | 7.6 | ãã³ | ... | 74.0 | 21.0 | 6.0 | 0.0 |
2 | M020 | 8.3 | ãã³ | ... | 33.0 | 35.0 | 20.0 | 13.0 |
3 | M034 | 8.2 | 5.9 | ... | 21.0 | 45.0 | 28.0 | 7.0 |
4 | M063 | 7.9 | ãã³ | ... | 59.0 | 36.0 | 5.0 | 0.0 |
| t_q9 | t_q9_1 | t_q9_2 | t_q9_3 | t_q9_4 | t_q9_5 |
---|
0 | ãã³ | 5.0 | 14.0 | 52.0 | 24.0 | 5.0 |
1 | ãã³ | 3.0 | 6.0 | 3.0 | 78.0 | 9.0 |
2 | ãã³ | 3.0 | 5.0 | 16.0 | 70.0 | 5.0 |
3 | ãã³ | 0.0 | 18.0 | 32.0 | 39.0 | 11.0 |
4 | ãã³ | 10.0 | 5.0 | 10.0 | 60.0 | 15.0 |
We can deal with this by looking at the data dictionary file that was downloaded along with the survey data. It tells us which fields are important.
We can then remove all the survey columns that are not relevant to us:
In [17]:
    survey["DBN"] = survey["dbn"]
    survey_fields = ["DBN", "rr_s", "rr_t", "rr_p", "N_s", "N_t", "N_p",
                     "saf_p_11", "com_p_11", "eng_p_11", "aca_p_11",
                     "saf_t_11", "com_t_11", "eng_t_10", "aca_t_11",
                     "saf_s_11", "com_s_11", "eng_s_11", "aca_s_11",
                     "saf_tot_11", "com_tot_11", "eng_tot_11", "aca_tot_11"]
    # reindex keeps the listed columns and fills any name missing from the data with NaN;
    # recent pandas raises KeyError when .loc is given a label that does not exist.
    survey = survey.reindex(columns=survey_fields)
    data["survey"] = survey
    survey.shape
Out[17]:
    (1702, 23)
Understanding exactly what each dataset contains and which columns matter can save a lot of time and effort later.
Condensing the datasets
If we look at some of the datasets, including class_size, we immediately see a problem:
In [18]: data["class_size"].head() Out[18]:
| CSD | èªæ²»åº | åŠæ ¡ã³ãŒã | åŠæ ¡å | ã°ã¬ãŒã | ããã°ã©ã ã®çš®é¡ | CORE SUBJECTïŒMS COREããã³9-12ã®ã¿ïŒ | / |
---|
0 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 0K | GEN ED | - |
1 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 0K | CTT | - |
2 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 01 | GEN ED | - |
3 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 01 | CTT | - |
4 | 1 | M | M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 02 | GEN ED | - |
| ã³ã¢ã³ãŒã¹ïŒMSã³ã¢ããã³9-12ã®ã¿ïŒ | ãµãŒãã¹ã«ããŽãªïŒK-9 *ã®ã¿ïŒ | åè¬è
æ°/åžæ° | ã»ã¯ã·ã§ã³ã®æ° | å¹³åã¯ã©ã¹ãµã€ãº | / |
---|
0 | - | - | 19.0 | 1.0 | 19.0 |
1 | - | - | 21.0 | 1.0 | 21.0 |
2 | - | - | 17.0 | 1.0 | 17.0 |
3 | - | - | 17.0 | 1.0 | 17.0 |
4 | - | - | 15.0 | 1.0 | 15.0 |
| æå°ã¯ã©ã¹ã®ãµã€ãº | æ倧ã¯ã©ã¹ã®ãµã€ãº | ããŒã¿ãœãŒã¹ | åŠæ ¡ã®çåŸãšæåž«ã®æ¯ç | Dbn |
---|
0 | 19.0 | 19.0 | ATS | ãã³ | 01M015 |
1 | 21.0 | 21.0 | ATS | ãã³ | 01M015 |
2 | 17.0 | 17.0 | ATS | ãã³ | 01M015 |
3 | 17.0 | 17.0 | ATS | ãã³ | 01M015 |
4 | 15.0 | 15.0 | ATS | ãã³ | 01M015 |
There are several rows for each school (which you can tell from the repeated DBN and SCHOOL NAME fields). However, if we look at sat_results, it has only one row per school:
In [21]: data["sat_results"].head() Out[21]:
| DBN | SCHOOL NAME | Num of SAT Test Takers | SAT Critical Reading Avg. Score | SAT Math Avg. Score | SAT Writing Avg. Score |
---|
0 | 01M292 | HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES | 29 | 355 | 404 | 363 |
1 | 01M448 | UNIVERSITY NEIGHBORHOOD HIGH SCHOOL | 91 | 383 | 423 | 366 |
2 | 01M450 | EAST SIDE COMMUNITY SCHOOL | 70 | 377 | 402 | 370 |
3 | 01M458 | FORSYTH SATELLITE ACADEMY | 7 | 414 | 401 | 359 |
4 | 01M509 | MARTA VALLE HIGH SCHOOL | 44 | 390 | 433 | 384 |
To combine these datasets, we need a way to condense datasets like class_size so that there is one row per high school. Otherwise there is no way to compare SAT scores with class size. We can get there by first understanding the data better and then doing some aggregation.
In the class_size dataset, GRADE and PROGRAM TYPE appear to take several values for each school. By restricting each field to a single value, we can discard most of the duplicate rows. In the code below, we:
- Select only the rows of class_size where the GRADE field is 09-12.
- Select only the rows of class_size where the PROGRAM TYPE field is GEN ED.
- Group class_size by DBN and take the mean of each column. In essence, we find the average class_size values for each school.
- Reset the index so that DBN is added back as a column.
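In [68]:
    class_size = data["class_size"]
    # The column name really is "GRADE " with a trailing space.
    class_size = class_size[class_size["GRADE "] == "09-12"]
    class_size = class_size[class_size["PROGRAM TYPE"] == "GEN ED"]
    # numeric_only=True averages only numeric columns; recent pandas no longer
    # drops text columns silently when agg(np.mean) is used.
    class_size = class_size.groupby("DBN").mean(numeric_only=True)
    class_size.reset_index(inplace=True)
    data["class_size"] = class_size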
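The effect of the group-and-average step is easier to see on a few rows. This is a sketch on made-up numbers: only the column names follow the class_size dataset, the class sizes are invented.

```python
import pandas as pd

# Three already-filtered rows: two for one school, one for another.
toy = pd.DataFrame({
    "DBN": ["01M292", "01M292", "01M332"],
    "PROGRAM TYPE": ["GEN ED", "GEN ED", "GEN ED"],
    "AVERAGE CLASS SIZE": [20.0, 25.0, 22.0],
})

# One row per DBN. Text columns are skipped, numeric ones are averaged.
per_school = toy.groupby("DBN").mean(numeric_only=True).reset_index()
print(per_school)
```

After this step every DBN appears exactly once, which is the property the later merge with sat_results relies on.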
Condensing the other datasets
Next, we need to condense the demographics dataset. The data was collected over several years for the same schools, so we select only the rows where the schoolyear field is the most recent one available:
In [69]:
    demographics = data["demographics"]
    # Keep only the most recent school year (2011-2012).
    demographics = demographics[demographics["schoolyear"] == 20112012]
    data["demographics"] = demographics
Next, we need to condense the math_test_results dataset. It is split by the Grade and Year values, so we can select a single grade from a single year:
In [70]:
    data["math_test_results"] = data["math_test_results"][data["math_test_results"]["Year"] == 2011]
    # The filtered output printed later contains only grade 8 rows.
    data["math_test_results"] = data["math_test_results"][data["math_test_results"]["Grade"] == "8"]
Finally, graduation needs to be condensed:
In [71]:
    data["graduation"] = data["graduation"][data["graduation"]["Cohort"] == "2006"]
    data["graduation"] = data["graduation"][data["graduation"]["Demographic"] == "Total Cohort"]
Cleaning and exploring the data is essential before getting to the heart of the project. A good, consistent dataset helps speed up the analysis.
Computing aggregate variables
Computing variables speeds up the analysis by making comparisons faster, and it enables some comparisons that would otherwise be impossible. The first thing we can do is compute a total SAT score from the individual columns SAT Math Avg. Score, SAT Critical Reading Avg. Score, and SAT Writing Avg. Score. In the code below, we:
- Convert each SAT score column from strings to numbers
- Add all the columns together to get the sat_score column, which is the total SAT score.
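In [72]:
    cols = ['SAT Math Avg. Score', 'SAT Critical Reading Avg. Score', 'SAT Writing Avg. Score']
    for c in cols:
        # convert_objects was removed from pandas; to_numeric with errors="coerce"
        # turns values that are not numbers into NaN.
        data["sat_results"][c] = pandas.to_numeric(data["sat_results"][c], errors="coerce")
    # Sum all three sections to get the total score.
    data['sat_results']['sat_score'] = (data['sat_results'][cols[0]]
                                        + data['sat_results'][cols[1]]
                                        + data['sat_results'][cols[2]])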
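The string-to-number conversion is worth seeing on its own, because some cells in these datasets hold a placeholder such as "s" instead of a number. The sketch below uses `pandas.to_numeric`, which is one way to do this conversion in current pandas (the older `convert_objects` method is no longer available in recent releases). The sample values are illustrative.

```python
import pandas as pd

# Scores arrive as text; "s" stands in for a suppressed value.
scores = pd.Series(["404", "423", "s"])

# errors="coerce" converts what it can and replaces the rest with NaN
# instead of raising an exception.
numeric = pd.to_numeric(scores, errors="coerce")
print(numeric)
```

Any row with a NaN in one of the three sections also gets a NaN total, since adding NaN to a number gives NaN.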
Next, we need to parse the coordinates of each school so that we can make maps. They let us plot the location of each school. In the code, we:
- Parse the latitude and longitude columns out of the Location 1 column
- Convert lat and lon to numbers.
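The two steps in the list above can be sketched as follows. The text does not show the code for this step, so this is only one possible approach: the helper name `parse_lat_lon` and the regular expression are assumptions. The sample string is hypothetical: it follows the shape of the truncated Location 1 values in the hs_directory output, with the coordinates filled in from the lat and lon values printed later.

```python
import re

# Matches a trailing "(lat, lon)" pair such as "(40.670299, -73.961648)".
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_lat_lon(location):
    """Return (lat, lon) as floats, or (None, None) if no pair is found."""
    match = COORD_RE.search(str(location))
    if match is None:
        return (None, None)
    return (float(match.group(1)), float(match.group(2)))


# Hypothetical full value, shaped like the truncated strings in hs_directory.
sample = "883 Classon Avenue\nBrooklyn, NY 11225\n(40.670299, -73.961648)"
print(parse_lat_lon(sample))   # (40.670299, -73.961648)
```

Applied to the real data, a function like this could be mapped over the Location 1 column with `apply`, and the two tuple elements stored as the lat and lon columns. Returning floats directly makes the separate numeric conversion step unnecessary.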
Let's print the datasets and see what we have:
In [74]:
    for k, v in data.items():
        print(k)
        print(v.head())
math_test_results
| Dbn | ã°ã¬ãŒã | 幎 | ã«ããŽãªãŒ | ãã¹ãæžã¿ã®æ° | å¹³åã¹ã±ãŒã«ã¹ã³ã¢ | \ |
---|
111 | 01M034 | 8 | 2011 | ãã¹ãŠã®åŠç | 48 | 646 |
280 | 01M140 | 8 | 2011 | ãã¹ãŠã®åŠç | 61 | 665 |
346 | 01M184 | 8 | 2011 | ãã¹ãŠã®åŠç | 49 | 727 |
388 | 01M188 | 8 | 2011 | ãã¹ãŠã®åŠç | 49 | 658 |
411 | 01M292 | 8 | 2011 | ãã¹ãŠã®åŠç | 49 | 650 |
| ã¬ãã«1ïŒ | ã¬ãã«1ïŒ
| ã¬ãã«2ïŒ | ã¬ãã«2ïŒ
| ã¬ãã«3ïŒ | ã¬ãã«3ïŒ
| ã¬ãã«4ïŒ | \ |
---|
111 | 15 | 31.3ïŒ
| 22 | 45.8ïŒ
| 11 | 22.9ïŒ
| 0 |
280 | 1 | 1.6ïŒ
| 43 | 70.5ïŒ
| 17 | 27.9ïŒ
| 0 |
346 | 0 | 0ïŒ
| 0 | 0ïŒ
| 5 | 10.2ïŒ
| 44 |
388 | 10 | 20.4ïŒ
| 26 | 53.1ïŒ
| 10 | 20.4ïŒ
| 3 |
411 | 15 | 30.6ïŒ
| 25 | 51ïŒ
| 7 | 14.3ïŒ
| 2 |
| ã¬ãã«4ïŒ
| ã¬ãã«3 + 4ïŒ | ã¬ãã«3 + 4ïŒ
|
---|
111 | 0ïŒ
| 11 | 22.9ïŒ
|
280 | 0ïŒ
| 17 | 27.9ïŒ
|
346 | 89.8ïŒ
| 49 | 100ïŒ
|
388 | 6.1ïŒ
| 13 | 26.5ïŒ
|
411 | 4.1ïŒ
| 9 | 18.4ïŒ
|
調æ»
| Dbn | rr_s | rr_t | rr_p | N_s | N_t | N_p | saf_p_11 | com_p_11 | eng_p_11 | \ |
---|
0 | 01M015 | ãã³ | 88 | 60 | ãã³ | 22.0 | 90.0 | 8.5 | 7.6 | 7.5 |
1 | 01M019 | ãã³ | 100 | 60 | ãã³ | 34.0 | 161.0 | 8.4 | 7.6 | 7.6 |
2 | 01M020 | ãã³ | 88 | 73 | ãã³ | 42.0 | 367.0 | 8.9 | 8.3 | 8.3 |
3 | 01M034 | 89.0 | 73 | 50 | 145.0 | 29.0 | 151.0 | 8.8 | 8.2 | 8.0 |
4 | 01M063 | ãã³ | 100 | 60 | ãã³ | 23.0 | 90.0 | 8.7 | 7.9 | 8.1 |
| ... | eng_t_10 | aca_t_11 | saf_s_11 | com_s_11 | eng_s_11 | aca_s_11 | \ |
---|
0 | ... | ãã³ | 7.9 | ãã³ | ãã³ | ãã³ | ãã³ |
1 | ... | ãã³ | 9.1 | ãã³ | ãã³ | ãã³ | ãã³ |
2 | ... | ãã³ | 7.5 | ãã³ | ãã³ | ãã³ | ãã³ |
3 | ... | ãã³ | 7.8 | 6.2 | 5.9 | 6.5 | 7.4 |
4 | ... | ãã³ | 8.1 | ãã³ | ãã³ | ãã³ | ãã³ |
| saf_tot_11 | com_tot_11 | eng_tot_11 | aca_tot_11 |
---|
0 | 8.0 | 7.7 | 7.5 | 7.9 |
1 | 8.5 | 8.1 | 8.2 | 8.4 |
2 | 8.2 | 7.3 | 7.5 | 8.0 |
3 | 7.3 | 6.7 | 7.1 | 7.9 |
4 | 8.5 | 7.6 | 7.9 | 8.0 |
ap_2010
| Dbn | åŠæ ¡å | APåéšè
| åèšè©Šéšæ° | ã¹ã³ã¢3 4ãŸãã¯5ã®è©Šéšã®æ° |
---|
0 | 01M448 | 倧åŠãã€ããŒãããHS | 39 | 49 | 10 |
1 | 01M450 | ã€ãŒã¹ããµã€ãã³ãã¥ããã£HS | 19 | 21 | s |
2 | 01M515 | ã€ãŒã¹ããµã€ãã®äžãããã | 24 | 26 | 24 |
3 | 01M539 | æ°ããæ¢æ»SCIãTECHãMATH | 255 | 377 | 191 |
4 | 02M296 | ãã¹ãã¿ãªãã£ãããžã¡ã³ãã®é«æ ¡ | s | s | s |
sat_results
| Dbn | åŠæ ¡å | SATåéšè
æ° | SAT Critical Reading Avgã åŸç¹ | \ |
---|
0 | 01M292 | åœéç 究ã®ããã®ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã« | 29æ¥ | 355.0 |
1 | 01M448 | 倧åŠè¿é£é«çåŠæ ¡ | 91 | 383.0 |
2 | 01M450 | ã€ãŒã¹ããµã€ãã³ãã¥ããã£ã¹ã¯ãŒã« | 70 | 377.0 |
3 | 01M458 | ãã©ãŒãµã€ã¹ãµãã©ã€ãã¢ã«ãã㌠| 7 | 414.0 |
4 | 01M509 | ãã«ã¿ãã¬ãŒãã€ã¹ã¯ãŒã« | 44 | 390.0 |
| SAT Mathå¹³å åŸç¹ | SATã©ã€ãã£ã³ã°å¹³å åŸç¹ | sat_score |
---|
0 | 404.0 | 363.0 | 1122.0 |
1 | 423.0 | 366.0 | 1172.0 |
2 | 402.0 | 370.0 | 1149.0 |
3 | 401.0 | 359.0 | 1174.0 |
4 | 433.0 | 384.0 | 1207.0 |
class_size
| Dbn | CSD | åè¬è
æ°/åžæ° | ã»ã¯ã·ã§ã³ã®æ° | \ |
---|
0 | 01M292 | 1 | 88.0000 | 4.000000 |
1 | 01M332 | 1 | 46.0000 | 2.000000 |
2 | 01M378 | 1 | 33.0000 | 1.000000 |
3 | 01M448 | 1 | 105.6875 | 4.750000 |
4 | 01M450 | 1 | 57.6000 | 2.733333 |
| å¹³åã¯ã©ã¹ãµã€ãº | æå°ã¯ã©ã¹ã®ãµã€ãº | æ倧ã¯ã©ã¹ã®ãµã€ãº | åŠæ ¡ã®çåŸãšæåž«ã®æ¯ç |
---|
0 | 22.564286 | 18.50 | 26.571429 | ãã³ |
1 | 22.000000 | ååŸ9æ | 23.500000 | ãã³ |
2 | 33.000000 | 33.00 | 33.000000 | ãã³ |
3 | 22.231250 | 18.25 | 06/27/2500 | ãã³ |
4 | 21.200000 | 19.40 | 22.866667 | ãã³ |
人å£çµ±èš
| Dbn | ãåå | åŠå¹Ž | \ |
---|
6 | 01M015 | PS 015ããã«ãã¯ã¬ã¡ã³ã | 20112012 |
13 | 01M019 | PS 019ã¢ã·ã£ãŒã¬ãŽã£ãŒ | 20112012 |
20 | 01M020 | PS 020ã¢ã³ãã·ã«ã㌠| 20112012 |
27 | 01M034 | PS 034ãã©ã³ã¯ãªã³Dã«ãŒãºãŽã§ã«ã | 20112012 |
35 | 01M063 | PS 063ãŠã£ãªã¢ã ã»ãããã³ã¬ãŒ | 20112012 |
| fl_percent | frl_percent | total_enrollment | ãã¬ã㯠| k | ã°ã¬ãŒã1 | ã°ã¬ãŒã2 | \ |
---|
6 | ãã³ | 89.4 | 189 | 13 | 31 | 35 | 28 |
13 | ãã³ | 61.5 | 328 | 32 | 46 | 52 | 54 |
20 | ãã³ | 92.5 | 626 | 52 | 102 | 121 | 87 |
27 | ãã³ | 99.7 | 401 | 14 | 34 | 38 | 36 |
35 | ãã³ | 78.9 | 176 | 18 | 20 | 30 | 21 |
| ... | black_num | black_per | hispanic_num | hispanic_per | white_num | \ |
---|
6 | ... | 63 | 33.3 | 109 | 57.7 | 4 |
13 | ... | 81 | 24.7 | 158 | 48.2 | 28 |
20 | ... | 55 | 8.8 | 357 | 57.0 | 16 |
27 | ... | 90 | 22.4 | 275 | 68.6 | 8 |
35 | ... | 41 | 23.3 | 110 | 62.5 | 15 |
| white_per | male_num | male_per | female_num | female_per |
---|
6 | 2.1 | 97.0 | 51.3 | 92.0 | 48.7 |
13 | 8.5 | 147.0 | 44.8 | 181.0 | 55.2 |
20 | 2.6 | 330.0 | 52.7 | 296.0 | 47.3 |
27 | 2.0 | 204.0 | 50.9 | 197.0 | 49.1 |
35 | 8.5 | 97.0 | 55.1 | 79.0 | 44.9 |
åæ¥
| 人å£çµ±èš | Dbn | åŠæ ¡å | ã³ããŒã | \ |
---|
3 | ç·ã³ããŒã | 01M292 | ãã³ãªãŒã¹ããªãŒãã¹ã¯ãŒã«ãã©ãŒã€ã³ã¿ãŒãã·ã§ãã« | 2006 |
10 | ç·ã³ããŒã | 01M448 | 倧åŠè¿é£é«çåŠæ ¡ | 2006 |
17 | ç·ã³ããŒã | 01M450 | ã€ãŒã¹ããµã€ãã³ãã¥ããã£ã¹ã¯ãŒã« | 2006 |
24 | ç·ã³ããŒã | 01M509 | ãã«ã¿ãã¬ãŒãã€ã¹ã¯ãŒã« | 2006 |
31 | ç·ã³ããŒã | 01M515 | äžéšæ±åŽæºåé«çåŠæ ¡ | 2006 |
| ç·ã³ããŒã | ç·åæ¥-n | ç·åæ¥ç-ã³ããŒãã®å²å | åèšãªãŒãžã§ã³ã-n | \ |
---|
3 | 78 | 43 | 55.1ïŒ
| 36 |
10 | 124 | 53 | 42.7ïŒ
| 42 |
17 | 90 | 70 | 77.8ïŒ
| 67 |
24 | 84 | 47 | 56ïŒ
| 40 |
31 | 193 | 105 | 54.4ïŒ
| 91 |
| ç·ãªãŒãžã§ã³ã-ã³ããŒãã®ïŒ
| ç·ãªãŒãžã§ã³ã-åæ¥çã®å²å | ... | é«åºŠãªãã®ãªãŒãžã§ã³ã-n | \ |
---|
3 | 46.2ïŒ
| 83.7ïŒ
| ... | 36 |
10 | 33.9ïŒ
| 79.2ïŒ
| ... | 34 |
17 | 74.400000000000006ïŒ
| 95.7ïŒ
| ... | 67 |
24 | 47.6ïŒ
| 85.1ïŒ
| ... | 23 |
31 | 47.2ïŒ
| 86.7ïŒ
| ... | 22 |
| Regents w/o Advanced â % of cohort | Regents w/o Advanced â % of grads | \ |
---|
3 | 46.2% | 83.7% |
10 | 27.4% | 64.2% |
17 | 74.400000000000006% | 95.7% |
24 | 27.4% | 48.9% |
31 | 11.4% | 21% |
| Local â n | Local â % of cohort | Local â % of grads | Still Enrolled â n | \ |
---|
3 | 7 | 9% | 16.3% | 16 |
10 | 11 | 8.9% | 20.8% | 46 |
17 | 3 | 3.3% | 4.3% | 15 |
24 | 7 | 8.300000000000001% | 14.9% | 25 |
31 | 14 | 7.3% | 13.3% | 53 |
| Still Enrolled â % of cohort | Dropped Out â n | Dropped Out â % of cohort |
---|
3 | 20.5% | 11 | 14.1% |
10 | 37.1% | 20 | 16.100000000000001% |
17 | 16.7% | 5 | 5.6% |
24 | 29.8% | 5 | 6ïŒ
|
31 | 27.5% | 35 | 18.100000000000001% |
hs_directory
| dbn | school_name | boro | \ |
---|
0 | 17K548 | Brooklyn School for Music & Theatre | Brooklyn |
1 | 09X543 | High School for Violin and Dance | Bronx |
2 | 09X327 | Comprehensive Model School Project MS 327 | Bronx |
3 | 02M280 | Manhattan Early College School for Advertising | ãã³ããã¿ã³ |
4 | 28Q680 | Queens Gateway to Health Sciences Secondary Sc... | Queens |
| building_code | phone_number | fax_number | grade_span_min | grade_span_max | \ |
---|
0 | K440 | 718-230-6250 | 718-230-6262 | 9 | 12 |
1 | X400 | 718-842-0687 | 718-589-9849 | 9 | 12 |
2 | X240 | 718-294-8111 | 718-294-8109 | 6 | 12 |
3 | M520 | 718-935-3477 | ãã³ | 9 | 10 |
4 | Q695 | 718-969-3155 | 718-969-3552 | 6 | 12 |
| expgrade_span_min | expgrade_span_max | ... | priority05 | priority06 | priority07 | priority08 | \ |
---|
0 | ãã³ | ãã³ | ... | ãã³ | ãã³ | ãã³ | ãã³ |
1 | ãã³ | ãã³ | ... | ãã³ | ãã³ | ãã³ | ãã³ |
2 | ãã³ | ãã³ | ... | Then to New York City residents | ãã³ | ãã³ | ãã³ |
3 | 9 | 14.0 | ... | ãã³ | ãã³ | ãã³ | ãã³ |
4 | ãã³ | ãã³ | ... | ãã³ | ãã³ | ãã³ | ãã³ |
| priority09 | priority10 | Location 1 | \ |
---|
0 | ãã³ | ãã³ | 883 Classon Avenue\nBrooklyn, NY 11225\n(40.67... |
1 | ãã³ | ãã³ | 1110 Boston Road\nBronx, NY 10456\n(40.8276026... |
2 | ãã³ | ãã³ | 1501 Jerome Avenue\nBronx, NY 10452\n(40.84241... |
3 | ãã³ | ãã³ | 411 Pearl Street\nNew York, NY 10038\n(40.7106... |
4 | ãã³ | ãã³ | 160-20 Goethals Avenue\nJamaica, NY 11432\n(40... |
| DBN | lat | lon |
---|
0 | 17K548 | 40.670299 | -73.961648 |
1 | 09X543 | 40.827603 | -73.904475 |
2 | 09X327 | 40.842414 | -73.916162 |
3 | 02M280 | 40.710679 | -74.000807 |
4 | 28Q680 | 40.718810 | -73.806500 |
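Combining the datasets
Now that the preparation is done, we can combine the datasets using the DBN column. The result is a dataset with hundreds of columns, drawn from each of the original datasets. When joining, note that some datasets are missing high schools that exist in sat_results. To avoid losing data, those datasets are merged with an outer join, and the rest with an inner join. Missing data is common in real-world analysis, and showing that you can reason about it and handle it is an important part of a portfolio.
The code below loops over the datasets, prints how many repeated DBN values each one has, picks the join type, and merges each dataset into full on DBN: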
In [75]:
    flat_data_names = [k for k, v in data.items()]
    flat_data = [data[k] for k in flat_data_names]
    full = flat_data[0]
    for i, f in enumerate(flat_data[1:]):
        name = flat_data_names[i + 1]
        print(name)
        # Number of repeated DBN values in this dataset.
        print(len(f["DBN"]) - len(f["DBN"].unique()))
        join_type = "inner"
        if name in ["sat_results", "ap_2010", "graduation"]:
            join_type = "outer"
        if name not in ["math_test_results"]:
            full = full.merge(f, on="DBN", how=join_type)

    full.shape

    survey
    0
    ap_2010
    1
    sat_results
    0
    class_size
    0
    demographics
    0
    graduation
    0
    hs_directory
    0
Out[75]:
    (374, 174)
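The difference between the two join types used here shows up clearly on a tiny example. The DBN values, SAT totals, and AP counts below are copied from the outputs earlier in the article. The `indicator=True` argument is an optional extra that is not part of the article's code.

```python
import pandas as pd

sat = pd.DataFrame({"DBN": ["01M292", "01M448", "01M450"],
                    "sat_score": [1122.0, 1172.0, 1149.0]})
# 01M292 has no row here, like schools missing from one of the datasets.
ap = pd.DataFrame({"DBN": ["01M448", "01M450"],
                   "AP Test Takers ": [39, 19]})

# An inner join keeps only DBNs present in both frames.
inner = sat.merge(ap, on="DBN", how="inner")

# An outer join keeps every DBN; indicator=True adds a "_merge" column
# that says which side each row came from.
outer = sat.merge(ap, on="DBN", how="outer", indicator=True)

print(len(inner), len(outer))
print(outer[["DBN", "_merge"]])
```

In the outer result, the school that is missing from the AP data is kept with NaN in the AP column. That is why the missing values are filled in a later step.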
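Adding in values
Now that we have the full DataFrame, we have almost all the information needed for the analysis. A few pieces are still missing. We may want to correlate the Advanced Placement exam results with SAT scores, but first those columns have to be converted to numbers and their missing values filled in: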
In [76]:
    cols = ['AP Test Takers ', 'Total Exams Taken', 'Number of Exams with scores 3 4 or 5']
    for col in cols:
        # convert_objects was removed from pandas; to_numeric with errors="coerce" replaces it.
        full[col] = pandas.to_numeric(full[col], errors="coerce")
    full[cols] = full[cols].fillna(value=0)
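Next, we compute a school_dist column that holds the school district of each school. It lets us match schools to districts and plot district-level statistics on a map: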
In [77]:
    # The first two characters of the DBN are the district number.
    full["school_dist"] = full["DBN"].apply(lambda x: x[:2])
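Finally, we fill the remaining missing values in full with the mean of each column, so that correlations can be computed: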
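In [79]:
    # numeric_only=True restricts the means to numeric columns, which recent pandas requires.
    full = full.fillna(full.mean(numeric_only=True))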
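A short illustration of filling gaps with the column mean, on made-up rows. The DBN values and the two known totals come from the sat_results output, the gap is invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "DBN": ["01M292", "01M448", "01M450"],
    "sat_score": [1122.0, None, 1149.0],
})

# mean() returns one value per numeric column, and fillna matches those
# values to columns by name. Text columns such as DBN are left untouched.
filled = toy.fillna(toy.mean(numeric_only=True))
print(filled)
```

Mean imputation keeps every school in the table, but it also pulls the filled values toward the center, which slightly weakens any correlation computed afterwards.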
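Computing correlations
A good way to explore a dataset and see which columns are related to the one you care about is to compute correlations. We can do this with the corr method on Pandas DataFrames. The closer a correlation is to 0, the weaker the relationship. The closer to 1, the stronger the positive correlation, and the closer to -1, the stronger the negative correlation: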
In [80]:
    # numeric_only=True skips text columns, which recent pandas no longer ignores silently.
    full.corr(numeric_only=True)['sat_score']
Out[80]:
    Year                                             NaN
    Number Tested                           8.127817e-02
    rr_s                                    8.484298e-02
    rr_t                                   -6.604290e-02
    rr_p                                    3.432778e-02
    N_s                                     1.399443e-01
    N_t                                     9.654314e-03
    N_p                                     1.397405e-01
    saf_p_11                                1.050653e-01
    com_p_11                                2.107343e-02
    eng_p_11                                5.094925e-02
    aca_p_11                                5.822715e-02
    saf_t_11                                1.206710e-01
    com_t_11                                3.875666e-02
    eng_t_10                                         NaN
    aca_t_11                                5.250357e-02
    saf_s_11                                1.054050e-01
    com_s_11                                4.576521e-02
    eng_s_11                                6.303699e-02
    aca_s_11                                8.015700e-02
    saf_tot_11                              1.266955e-01
    com_tot_11                              4.340710e-02
    eng_tot_11                              5.028588e-02
    aca_tot_11                              7.229584e-02
    AP Test Takers                          5.687940e-01
    Total Exams Taken                       5.585421e-01
    Number of Exams with scores 3 4 or 5    5.619043e-01
    SAT Critical Reading Avg. Score         9.868201e-01
    SAT Math Avg. Score                     9.726430e-01
    SAT Writing Avg. Score                  9.877708e-01
    ...
    SIZE OF SMALLEST CLASS                  2.440690e-01
    SIZE OF LARGEST CLASS                   3.052551e-01
    SCHOOLWIDE PUPIL-TEACHER RATIO                   NaN
    schoolyear                                       NaN
    frl_percent                            -7.018217e-01
    total_enrollment                        3.668201e-01
    ell_num                                -1.535745e-01
    ell_percent                            -3.981643e-01
    sped_num                                3.486852e-02
    sped_percent                           -4.413665e-01
    asian_num                               4.748801e-01
    asian_per                               5.686267e-01
    black_num                               2.788331e-02
    black_per                              -2.827907e-01
    hispanic_num                            2.568811e-02
    hispanic_per                           -3.926373e-01
    white_num                               4.490835e-01
    white_per                               6.100860e-01
    male_num                                3.245320e-01
    male_per                               -1.101484e-01
    female_num                              3.876979e-01
    female_per                              1.101928e-01
    Total Cohort                            3.244785e-01
    grade_span_max                         -2.495359e-17
    expgrade_span_max                                NaN
    zip                                    -6.312962e-02
    total_students                          4.066081e-01
    number_programs                         1.166234e-01
    lat                                    -1.198662e-01
    lon                                    -1.315241e-01
    Name: sat_score, dtype: float64
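For readers who want to see what a single number in that output means, here is a plain-Python sketch of the Pearson correlation coefficient, which is what the corr method computes by default. The function name `pearson_r` and the toy inputs are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3], [2, 4, 6]))   # perfectly increasing together
print(pearson_r([1, 2, 3], [6, 4, 2]))   # one rises while the other falls
```

The first call gives a value of 1 and the second a value of -1, the two extremes of the scale. A column whose values never change has zero variance, so the division is undefined. That is why constant columns such as schoolyear show NaN in the output above.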
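A few things stand out in this output:
- Total enrollment ( total_enrollment ) correlates positively with SAT score ( sat_score ), at about 0.37.
- The percentage of female students ( female_per ) correlates positively with SAT score, while the percentage of male students ( male_per ) correlates negatively.
- None of the survey response columns correlates strongly with SAT score.
- The race columns ( white_per , asian_per , black_per , hispanic_per ) differ sharply in how they correlate with SAT score.
- ell_percent correlates negatively with SAT score, at about -0.40.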
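The code below draws a map centered on the average school location and adds a marker for each school: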
In [82]:
import folium
from folium import plugins

schools_map = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
marker_cluster = folium.MarkerCluster().add_to(schools_map)
for name, row in full.iterrows():
    folium.Marker([row["lat"], row["lon"]], popup="{0}: {1}".format(row["DBN"], row["school_name"])).add_to(marker_cluster)
schools_map.create_map('schools.html')
schools_map

Out[82]:

We can also show the same locations as a heatmap, which makes the density of schools easier to see:
In [84]:
schools_heatmap = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
schools_heatmap.add_children(plugins.HeatMap([[row["lat"], row["lon"]] for name, row in full.iterrows()]))
schools_heatmap.save("heatmap.html")
schools_heatmap

Out[84]:

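To compare districts, we group full by school_dist, average every column, and convert school_dist so that it no longer has a leading zero: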
In [ ]:
import numpy as np

district_data = full.groupby("school_dist").agg(np.mean)
district_data.reset_index(inplace=True)
district_data["school_dist"] = district_data["school_dist"].apply(lambda x: str(int(x)))
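The grouping step can be checked on a toy frame. This is a sketch with invented values. It uses groupby(...).mean(numeric_only=True), which is an alternative to agg(np.mean) for recent pandas versions and not the call used in the cell above. The idea that the district shapes store ids without a leading zero is an assumption inferred from the str(int(x)) conversion.

```python
import pandas as pd

# Hypothetical rows; school_dist holds the first two characters of the DBN.
df = pd.DataFrame({
    "school_dist": ["01", "01", "02"],
    "sat_score": [1100.0, 1300.0, 1250.0],
})

# Average every numeric column within each district.
district = df.groupby("school_dist").mean(numeric_only=True).reset_index()
# "01" -> "1", so the key can match ids stored without a leading zero.
district["school_dist"] = district["school_dist"].apply(lambda x: str(int(x)))

print(district)
```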
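Now we can plot a column by district. The function below takes district shapes from a GeoJSON file, matches each shape to district_data on school_dist, and shades each district by the chosen column: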
In [85]:
def show_district_map(col):
    geo_path = 'schools/districts.geojson'
    districts = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
    districts.geo_json(
        geo_path=geo_path,
        data=district_data,
        columns=['school_dist', col],
        key_on='feature.properties.school_dist',
        fill_color='YlGn',
        fill_opacity=0.7,
        line_opacity=0.2,
    )
    districts.save("districts.html")
    return districts

show_district_map("sat_score")

Out[85]:
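The key_on argument names the path inside each GeoJSON feature whose value is matched against the first entry of columns. The sketch below walks through that matching step in plain Python. It is not folium's implementation, and the tiny GeoJSON document and the scores are invented for illustration.

```python
import json

# Hypothetical, heavily simplified GeoJSON; real district shapes have many more points.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"school_dist": "1"},
   "geometry": {"type": "Polygon", "coordinates": [[[0,0],[1,0],[1,1],[0,0]]]}},
  {"type": "Feature", "properties": {"school_dist": "2"},
   "geometry": {"type": "Polygon", "coordinates": [[[1,0],[2,0],[2,1],[1,0]]]}}
]}
"""
# Hypothetical district averages keyed by the same string ids.
district_scores = {"1": 1200.0, "3": 1150.0}

features = json.loads(geojson_text)["features"]
matched = {}
for feature in features:
    # Follow the path feature -> properties -> school_dist for each shape.
    key = feature["properties"]["school_dist"]
    # get() returns None when no data row matches the shape.
    matched[key] = district_scores.get(key)

print(matched)
```

Shapes without a matching data row have nothing to be shaded by, which is why the district ids need to be in the same format on both sides.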

A scatter plot of total_enrollment against sat_score shows how the two relate:
In [87]:
%matplotlib inline

full.plot.scatter(x='total_enrollment', y='sat_score')

Out[87]: <matplotlib.axes._subplots.AxesSubplot at 0x10fe79978>

To look closer, we pull the names of the schools with both low enrollment and low SAT scores:
In [88]:
full[(full["total_enrollment"] < 1000) & (full["sat_score"] < 1000)]["School Name"]

Out[88]:
34     INTERNATIONAL SCHOOL FOR LIBERAL ARTS
143    NaN
148    KINGSBRIDGE INTERNATIONAL HIGH SCHOOL
203    MULTICULTURAL HIGH SCHOOL
294    INTERNATIONAL COMMUNITY HIGH SCHOOL
304    BRONX INTERNATIONAL HIGH SCHOOL
314    NaN
317    HIGH SCHOOL OF WORLD CULTURES
320    BROOKLYN INTERNATIONAL HIGH SCHOOL
329    INTERNATIONAL HIGH SCHOOL AT PROSPECT
331    IT TAKES A VILLAGE ACADEMY
351    PAN AMERICAN INTERNATIONAL HIGH SCHOO
Name: School Name, dtype: object
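The filter in this cell combines two boolean masks. A small sketch with invented rows shows the pattern; only the column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical schools; only the column names mirror the real data.
df = pd.DataFrame({
    "School Name": ["A", "B", "C"],
    "total_enrollment": [400, 3000, 600],
    "sat_score": [950.0, 1500.0, 1250.0],
})

small = df["total_enrollment"] < 1000   # one True/False value per row
low = df["sat_score"] < 1000
# & combines the masks row by row; only rows where both are True are kept.
names = df[small & low]["School Name"]

print(names.tolist())
```

When the comparisons are written inline, as in the cell above, each one needs its own parentheses because & binds more tightly than < in Python.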
Many of these names contain INTERNATIONAL, which suggests schools that serve students who are still learning English.
The ell_percent column gives the percentage of English language learners in each school. We plot it against SAT score:
In [89]:
full.plot.scatter(x='ell_percent', y='sat_score')

Out[89]: <matplotlib.axes._subplots.AxesSubplot at 0x10fe824e0>

To see how ell_percent varies across the city, we map it by district:
In [90]: show_district_map("ell_percent") Out[90]:

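Next, we look at the survey results. The bar chart below shows how each survey column correlates with SAT score: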
In [91]:
full.corr()["sat_score"][["rr_s", "rr_t", "rr_p", "N_s", "N_t", "N_p", "saf_tot_11", "com_tot_11", "aca_tot_11", "eng_tot_11"]].plot.bar()

Out[91]: <matplotlib.axes._subplots.AxesSubplot at 0x114652400>

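N_p and N_s, the numbers of parents and students who responded to the survey, correlate with SAT score more than the other columns plotted here. Both likely track enrollment, so they may reflect the same ell_learners effect seen earlier. Among the remaining survey columns, saf_t_11 is one of the strongest, at about 0.12 in the correlation table above.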
Another angle is race. We plot how each of the race percentage columns correlates with SAT score:
In [92]:
full.corr()["sat_score"][["white_per", "asian_per", "black_per", "hispanic_per"]].plot.bar()

Out[92]: <matplotlib.axes._subplots.AxesSubplot at 0x108166ba8>

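white_per and asian_per correlate positively with SAT score, while black_per and hispanic_per correlate negatively. To look at this geographically, we map hispanic_per by district: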
In [93]: show_district_map("hispanic_per") Out[93]:

Gender is the next angle. We plot the correlations of male_per and female_per with SAT score:
In [94]:
full.corr()["sat_score"][["male_per", "female_per"]].plot.bar()

Out[94]: <matplotlib.axes._subplots.AxesSubplot at 0x10774d0f0>

To look closer, we make a scatter plot of female_per against sat_score:
In [95]:
full.plot.scatter(x='female_per', y='sat_score')

Out[95]: <matplotlib.axes._subplots.AxesSubplot at 0x104715160>

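There appears to be a group of schools with a high percentage of female students and high SAT scores (in the top right of the plot). We list their names: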
In [96]:
full[(full["female_per"] > 65) & (full["sat_score"] > 1400)]["School Name"]

Out[96]:
3      PROFESSIONAL PERFORMING ARTS HIGH SCH
92     ELEANOR ROOSEVELT HIGH SCHOOL
100    TALENT UNLIMITED HIGH SCHOOL
111    FIORELLO H. LAGUARDIA HIGH SCHOOL OF
229    TOWNSEND HARRIS HIGH SCHOOL
250    FRANK SINATRA SCHOOL OF THE ARTS HIGH SCHOOL
265    BARD HIGH SCHOOL EARLY COLLEGE
Name: School Name, dtype: object
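Next, we look at AP exams. We compute ap_avg, the number of AP test takers divided by total enrollment, and plot it against SAT score: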
In [98]:
full["ap_avg"] = full["AP Test Takers "] / full["total_enrollment"]
full.plot.scatter(x='ap_avg', y='sat_score')

Out[98]: <matplotlib.axes._subplots.AxesSubplot at 0x11463a908>

We list the schools where ap_avg is above 0.3 and sat_score is above 1700:
In [99]:
full[(full["ap_avg"] > .3) & (full["sat_score"] > 1700)]["School Name"]

Out[99]:
92     ELEANOR ROOSEVELT HIGH SCHOOL
98     STUYVESANT HIGH SCHOOL
157    BRONX HIGH SCHOOL OF SCIENCE
161    HIGH SCHOOL OF AMERICAN STUDIES AT LE
176    BROOKLYN TECHNICAL HIGH SCHOOL
229    TOWNSEND HARRIS HIGH SCHOOL
243    QUEENS HIGH SCHOOL FOR THE SCIENCES A
260    STATEN ISLAND TECHNICAL HIGH SCHOOL
Name: School Name, dtype: object
What's next?
â , .
Dataquest , , . â .