Assignment: Notebook for Peer Assignment¶
Introduction¶
Using this Python notebook you will:
- Understand three Chicago datasets
- Load the three datasets into three tables in a SQLIte database
- Execute SQL queries to answer assignment questions
Understand the datasets¶
To complete the assignment problems in this notebook you will be using three datasets that are available on the city of Chicago's Data Portal:
1. Socioeconomic Indicators in Chicago¶
This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
2. Chicago Public Schools¶
This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t
3. Chicago Crime Data¶
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
Download the datasets¶
This assignment requires you to have these three tables populated with a subset of the whole datasets.
In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the links below to download and save the datasets (.CSV files):
NOTE: Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment.
Store the datasets in database tables¶
To analyze the data using SQL, it first needs to be loaded into SQLite DB. We will create three tables in as under:
- CENSUS_DATA
- CHICAGO_PUBLIC_SCHOOLS
- CHICAGO_CRIME_DATA
Let us now load the ipython-sql extension and establish a connection with the database
Here you will be loading the csv files into the pandas Dataframe and then loading the data into the above mentioned sqlite tables.
Next you will be connecting to the sqlite database FinalDB.
%load_ext sql
The sql extension is already loaded. To reload it, use: %reload_ext sql
import csv, sqlite3
con = sqlite3.connect("Final.db")
cur = con.cursor()
%pip install --upgrade pandas
Requirement already satisfied: pandas in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (2.2.3) Requirement already satisfied: numpy>=1.26.0 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (1.26.4) Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1) Requirement already satisfied: tzdata>=2022.7 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1) Requirement already satisfied: six>=1.5 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0) Note: you may need to restart the kernel to use updated packages.
%sql sqlite:///Final.db
import pandas
df1 = pandas.read_csv('ChicagoCensusData.csv')
df2 = pandas.read_csv('ChicagoCrimeData.csv')
df3 = pandas.read_csv('ChicagoPublicSchools.csv')
df1.to_sql("ChicagoCensusData", con, if_exists='replace', index=False)
df2.to_sql("ChicagoCrimeData", con, if_exists='replace', index=False)
df3.to_sql("ChicagoPublicSchools", con, if_exists='replace', index=False)
566
cur. execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cur. fetchall())
[('ChicagoCensusData',), ('ChicagoCrimeData',), ('ChicagoPublicSchools',)]
%sql SELECT * FROM ChicagoCensusData LIMIT 5;
* sqlite:///Final.db Done.
COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME | PERCENT_OF_HOUSING_CROWDED | PERCENT_HOUSEHOLDS_BELOW_POVERTY | PERCENT_AGED_16__UNEMPLOYED | PERCENT_AGED_25__WITHOUT_HIGH_SCHOOL_DIPLOMA | PERCENT_AGED_UNDER_18_OR_OVER_64 | PER_CAPITA_INCOME | HARDSHIP_INDEX |
---|---|---|---|---|---|---|---|---|
1.0 | Rogers Park | 7.7 | 23.6 | 8.7 | 18.2 | 27.5 | 23939 | 39.0 |
2.0 | West Ridge | 7.8 | 17.2 | 8.8 | 20.8 | 38.5 | 23040 | 46.0 |
3.0 | Uptown | 3.8 | 24.0 | 8.9 | 11.8 | 22.2 | 35787 | 20.0 |
4.0 | Lincoln Square | 3.4 | 10.9 | 8.2 | 13.4 | 25.5 | 37524 | 17.0 |
5.0 | North Center | 0.3 | 7.5 | 5.2 | 4.5 | 26.2 | 57123 | 6.0 |
%sql SELECT * FROM ChicagoCrimeData LIMIT 5;
* sqlite:///Final.db Done.
ID | CASE_NUMBER | DATE | BLOCK | IUCR | PRIMARY_TYPE | DESCRIPTION | LOCATION_DESCRIPTION | ARREST | DOMESTIC | BEAT | DISTRICT | WARD | COMMUNITY_AREA_NUMBER | FBICODE | X_COORDINATE | Y_COORDINATE | YEAR | LATITUDE | LONGITUDE | LOCATION |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3512276 | HK587712 | 28/8/2004 | 047XX S KEDZIE AVE | 890 | THEFT | FROM BUILDING | SMALL RETAIL STORE | 0 | 0 | 911 | 9 | 14.0 | 58.0 | 6 | 1155838.0 | 1873050.0 | 2004 | 41.8074405 | -87.70395585 | (41.8074405, -87.703955849) |
3406613 | HK456306 | 26/6/2004 | 009XX N CENTRAL PARK AVE | 820 | THEFT | $500 AND UNDER | OTHER | 0 | 0 | 1112 | 11 | 27.0 | 23.0 | 6 | 1152206.0 | 1906127.0 | 2004 | 41.89827996 | -87.71640551 | (41.898279962, -87.716405505) |
8002131 | HT233595 | 4/4/2011 | 043XX S WABASH AVE | 820 | THEFT | $500 AND UNDER | NURSING HOME/RETIREMENT HOME | 0 | 0 | 221 | 2 | 3.0 | 38.0 | 6 | 1177436.0 | 1876313.0 | 2011 | 41.81593313 | -87.62464213 | (41.815933131, -87.624642127) |
7903289 | HT133522 | 30/12/2010 | 083XX S KINGSTON AVE | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | 0 | 0 | 423 | 4 | 7.0 | 46.0 | 6 | 1194622.0 | 1850125.0 | 2010 | 41.74366532 | -87.56246276 | (41.743665322, -87.562462756) |
10402076 | HZ138551 | 2/2/2016 | 033XX W 66TH ST | 820 | THEFT | $500 AND UNDER | ALLEY | 0 | 0 | 831 | 8 | 15.0 | 66.0 | 6 | 1155240.0 | 1860661.0 | 2016 | 41.7734553 | -87.70648047 | (41.773455295, -87.706480471) |
%sql ALTER TABLE ChicagoPublicSchools RENAME COLUMN 'Elementary, Middle, or High School' TO Elementary_Middle_or_High_School;
* sqlite:///Final.db Done.
[]
%sql SELECT * FROM ChicagoPublicSchools LIMIT 5;
* sqlite:///Final.db Done.
School_ID | NAME_OF_SCHOOL | Elementary_Middle_or_High_School | Street_Address | City | State | ZIP_Code | Phone_Number | Link | Network_Manager | Collaborative_Name | Adequate_Yearly_Progress_Made_ | Track_Schedule | CPS_Performance_Policy_Status | CPS_Performance_Policy_Level | HEALTHY_SCHOOL_CERTIFIED | Safety_Icon | SAFETY_SCORE | Family_Involvement_Icon | Family_Involvement_Score | Environment_Icon | Environment_Score | Instruction_Icon | Instruction_Score | Leaders_Icon | Leaders_Score | Teachers_Icon | Teachers_Score | Parent_Engagement_Icon | Parent_Engagement_Score | Parent_Environment_Icon | Parent_Environment_Score | AVERAGE_STUDENT_ATTENDANCE | Rate_of_Misconducts__per_100_students_ | Average_Teacher_Attendance | Individualized_Education_Program_Compliance_Rate | Pk_2_Literacy__ | Pk_2_Math__ | Gr3_5_Grade_Level_Math__ | Gr3_5_Grade_Level_Read__ | Gr3_5_Keep_Pace_Read__ | Gr3_5_Keep_Pace_Math__ | Gr6_8_Grade_Level_Math__ | Gr6_8_Grade_Level_Read__ | Gr6_8_Keep_Pace_Math_ | Gr6_8_Keep_Pace_Read__ | Gr_8_Explore_Math__ | Gr_8_Explore_Read__ | ISAT_Exceeding_Math__ | ISAT_Exceeding_Reading__ | ISAT_Value_Add_Math | ISAT_Value_Add_Read | ISAT_Value_Add_Color_Math | ISAT_Value_Add_Color_Read | Students_Taking__Algebra__ | Students_Passing__Algebra__ | 9th Grade EXPLORE (2009) | 9th Grade EXPLORE (2010) | 10th Grade PLAN (2009) | 10th Grade PLAN (2010) | Net_Change_EXPLORE_and_PLAN | 11th Grade Average ACT (2011) | Net_Change_PLAN_and_ACT | College_Eligibility__ | Graduation_Rate__ | College_Enrollment_Rate__ | COLLEGE_ENROLLMENT | General_Services_Route | Freshman_on_Track_Rate__ | X_COORDINATE | Y_COORDINATE | Latitude | Longitude | COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME | Ward | Police_District | Location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
610038 | Abraham Lincoln Elementary School | ES | 615 W Kemper Pl | Chicago | IL | 60614 | (773) 534-5720 | http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf | Fullerton Elementary Network | NORTH-NORTHWEST SIDE COLLABORATIVE | No | Standard | Not on Probation | Level 1 | Yes | Very Strong | 99.0 | Very Strong | 99 | Strong | 74.0 | Strong | 66.0 | Weak | 65 | Strong | 70 | Strong | 56 | Average | 47 | 96.00% | 2.0 | 96.40% | 95.80% | 80.1 | 43.3 | 89.6 | 84.9 | 60.7 | 62.6 | 81.9 | 85.2 | 52 | 62.4 | 66.3 | 77.9 | 69.7 | 64.4 | 0.2 | 0.9 | Yellow | Green | 67.1 | 54.5 | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | 813 | 33 | NDA | 1171699.458 | 1915829.428 | 41.92449696 | -87.64452163 | 7 | LINCOLN PARK | 43 | 18 | (41.92449696, -87.64452163) |
610281 | Adam Clayton Powell Paideia Community Academy Elementary School | ES | 7511 S South Shore Dr | Chicago | IL | 60649 | (773) 535-6650 | http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610281.pdf | Skyway Elementary Network | SOUTH SIDE COLLABORATIVE | No | Track_E | Not on Probation | Level 1 | No | Average | 54.0 | Strong | 66 | Strong | 74.0 | Very Strong | 84.0 | Weak | 63 | Strong | 76 | Weak | 46 | Average | 50 | 95.60% | 15.7 | 95.30% | 100.00% | 62.4 | 51.7 | 21.9 | 15.1 | 29 | 42.8 | 38.5 | 27.4 | 44.8 | 42.7 | 14.1 | 34.4 | 16.8 | 16.5 | 0.7 | 1.4 | Green | Green | 17.2 | 27.3 | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | 521 | 46 | NDA | 1196129.985 | 1856209.466 | 41.76032435 | -87.55673627 | 43 | SOUTH SHORE | 7 | 4 | (41.76032435, -87.55673627) |
610185 | Adlai E Stevenson Elementary School | ES | 8010 S Kostner Ave | Chicago | IL | 60652 | (773) 535-2280 | http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610185.pdf | Midway Elementary Network | SOUTHWEST SIDE COLLABORATIVE | No | Standard | Not on Probation | Level 2 | No | Strong | 61.0 | NDA | NDA | Average | 50.0 | Weak | 36.0 | Weak | NDA | NDA | NDA | Average | 47 | Weak | 41 | 95.70% | 2.3 | 94.70% | 98.30% | 53.7 | 26.6 | 38.3 | 34.7 | 43.7 | 57.3 | 48.8 | 39.2 | 46.8 | 44 | 7.5 | 21.9 | 18.3 | 15.5 | -0.9 | -1.0 | Red | Red | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | 1324 | 44 | NDA | 1148427.165 | 1851012.215 | 41.74711093 | -87.73170248 | 70 | ASHBURN | 13 | 8 | (41.74711093, -87.73170248) |
609993 | Agustin Lara Elementary Academy | ES | 4619 S Wolcott Ave | Chicago | IL | 60609 | (773) 535-4389 | http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609993.pdf | Pershing Elementary Network | SOUTHWEST SIDE COLLABORATIVE | No | Track_E | Not on Probation | Level 1 | No | Average | 56.0 | Average | 44 | Average | 45.0 | Weak | 37.0 | Weak | 65 | Average | 48 | Average | 53 | Strong | 58 | 95.50% | 10.4 | 95.80% | 100.00% | 76.9 | NDA | 26 | 24.7 | 61.8 | 49.7 | 39.2 | 27.2 | 69.7 | 60.6 | 9.1 | 18.2 | 11.1 | 9.6 | 0.9 | 2.4 | Green | Green | 42.9 | 25 | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | 556 | 42 | NDA | 1164504.29 | 1873959.199 | 41.8097569 | -87.6721446 | 61 | NEW CITY | 20 | 9 | (41.8097569, -87.6721446) |
610513 | Air Force Academy High School | HS | 3630 S Wells St | Chicago | IL | 60609 | (773) 535-1590 | http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610513.pdf | Southwest Side High School Network | SOUTHWEST SIDE COLLABORATIVE | NDA | Standard | Not on Probation | Not Enough Data | Yes | Average | 49.0 | Strong | 60 | Strong | 60.0 | Average | 55.0 | Weak | 45 | Average | 54 | Average | 53 | Average | 49 | 93.30% | 15.6 | 96.90% | 100.00% | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | NDA | None | None | None | None | NDA | NDA | NDA | NDA | 14.6 | 14.8 | NDA | 16 | 1.4 | NDA | NDA | NDA | NDA | NDA | 302 | 40 | 91.8 | 1175177.622 | 1880745.126 | 41.82814609 | -87.63279369 | 34 | ARMOUR SQUARE | 11 | 9 | (41.82814609, -87.63279369) |
%sql SELECT COUNT (*) AS Total_num_of_crime FROM 'ChicagoCrimeData';
* sqlite:///Final.db Done.
Total_num_of_crime |
---|
533 |
%sql SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME FROM ChicagoCensusData WHERE PER_CAPITA_INCOME < 11000;
* sqlite:///Final.db Done.
COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME |
---|---|
26.0 | West Garfield Park |
30.0 | South Lawndale |
37.0 | Fuller Park |
54.0 | Riverdale |
%sql SELECT CASE_NUMBER FROM ChicagoCrimeData WHERE DESCRIPTION LIKE '%MINOR%';
* sqlite:///Final.db Done.
CASE_NUMBER |
---|
HL266884 |
HK238408 |
%sql SELECT CASE_NUMBER FROM ChicagoCrimeData WHERE PRIMARY_TYPE = 'KIDNAPPING' AND DESCRIPTION LIKE '%CHILD%';
* sqlite:///Final.db Done.
CASE_NUMBER |
---|
HN144152 |
%sql SELECT CASE_NUMBER, PRIMARY_TYPE FROM ChicagoCrimeData WHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%'
* sqlite:///Final.db Done.
CASE_NUMBER | PRIMARY_TYPE |
---|---|
HL353697 | BATTERY |
HL725506 | BATTERY |
HP716225 | BATTERY |
HH639427 | BATTERY |
JA460432 | BATTERY |
HS200939 | CRIMINAL DAMAGE |
HK577020 | NARCOTICS |
HS305355 | NARCOTICS |
HT315369 | ASSAULT |
HR585012 | CRIMINAL TRESPASS |
HH292682 | PUBLIC PEACE VIOLATION |
G635735 | PUBLIC PEACE VIOLATION |
%sql SELECT Elementary_Middle_or_High_School, AVG(SAFETY_SCORE) AS SAFETY_SCORE FROM ChicagoPublicSchools GROUP BY Elementary_Middle_or_High_School;
* sqlite:///Final.db Done.
Elementary_Middle_or_High_School | SAFETY_SCORE |
---|---|
ES | 49.52038369304557 |
HS | 49.62352941176471 |
MS | 48.0 |
%sql SELECT COMMUNITY_AREA_NAME, PERCENT_HOUSEHOLDS_BELOW_POVERTY FROM ChicagoCensusData ORDER BY PERCENT_HOUSEHOLDS_BELOW_POVERTY DESC LIMIT 5;
* sqlite:///Final.db Done.
COMMUNITY_AREA_NAME | PERCENT_HOUSEHOLDS_BELOW_POVERTY |
---|---|
Riverdale | 56.5 |
Fuller Park | 51.2 |
Englewood | 46.6 |
North Lawndale | 43.1 |
East Garfield Park | 42.4 |
%sql SELECT COMMUNITY_AREA_NUMBER, COUNT(*) AS TOTAL_CRIMES FROM ChicagoCrimeData GROUP BY COMMUNITY_AREA_NUMBER ORDER BY TOTAL_CRIMES DESC LIMIT 1;
* sqlite:///Final.db Done.
COMMUNITY_AREA_NUMBER | TOTAL_CRIMES |
---|---|
25.0 | 43 |
Double-click here for a hint
%sql SELECT COMMUNITY_AREA_NAME FROM ChicagoCensusData WHERE HARDSHIP_INDEX = (SELECT MAX(HARDSHIP_INDEX) FROM ChicagoCensusData);
* sqlite:///Final.db Done.
COMMUNITY_AREA_NAME |
---|
Riverdale |
%sql SELECT ChicagoCensusData.COMMUNITY_AREA_NUMBER, ChicagoCensusData.COMMUNITY_AREA_NAME, COUNT(*) AS TOTAL_CRIMES FROM ChicagoCrimeData JOIN ChicagoCensusData ON ChicagoCrimeData.COMMUNITY_AREA_NUMBER = ChicagoCensusData.COMMUNITY_AREA_NUMBER GROUP BY ChicagoCrimeData.COMMUNITY_AREA_NUMBER ORDER BY TOTAL_CRIMES DESC LIMIT 1;
* sqlite:///Final.db Done.
COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME | TOTAL_CRIMES |
---|---|---|
25.0 | Austin | 43 |
Copyright © 2020 This notebook and its source code are released under the terms of the MIT License.