Assignment: Notebook for Peer Assignment¶

Introduction¶

Using this Python notebook you will:

  1. Understand three Chicago datasets
  2. Load the three datasets into three tables in a SQLIte database
  3. Execute SQL queries to answer assignment questions

Understand the datasets¶

To complete the assignment problems in this notebook you will be using three datasets that are available on the city of Chicago's Data Portal:

  1. Socioeconomic Indicators in Chicago
  2. Chicago Public Schools
  3. Chicago Crime Data

1. Socioeconomic Indicators in Chicago¶

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2

2. Chicago Public Schools¶

This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t

3. Chicago Crime Data¶

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Download the datasets¶

This assignment requires you to have these three tables populated with a subset of the whole datasets.

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the links below to download and save the datasets (.CSV files):

  • Chicago Census Data

  • Chicago Public Schools

  • Chicago Crime Data

NOTE: Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment.

Store the datasets in database tables¶

To analyze the data using SQL, it first needs to be loaded into SQLite DB. We will create three tables in as under:

  1. CENSUS_DATA
  2. CHICAGO_PUBLIC_SCHOOLS
  3. CHICAGO_CRIME_DATA

Let us now load the ipython-sql extension and establish a connection with the database

  • Here you will be loading the csv files into the pandas Dataframe and then loading the data into the above mentioned sqlite tables.

  • Next you will be connecting to the sqlite database FinalDB.

In [25]:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
In [26]:
import csv, sqlite3

con = sqlite3.connect("Final.db")
cur = con.cursor()
In [27]:
%pip install --upgrade pandas
Requirement already satisfied: pandas in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (2.2.3)
Requirement already satisfied: numpy>=1.26.0 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: six>=1.5 in c:\users\ducth\appdata\local\programs\python\python312\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
In [28]:
%sql sqlite:///Final.db
In [29]:
import pandas
df1 = pandas.read_csv('ChicagoCensusData.csv')
df2 = pandas.read_csv('ChicagoCrimeData.csv')
df3 = pandas.read_csv('ChicagoPublicSchools.csv')
df1.to_sql("ChicagoCensusData", con, if_exists='replace', index=False)
df2.to_sql("ChicagoCrimeData", con, if_exists='replace', index=False)
df3.to_sql("ChicagoPublicSchools", con, if_exists='replace', index=False)
Out[29]:
566
In [30]:
cur. execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cur. fetchall())
[('ChicagoCensusData',), ('ChicagoCrimeData',), ('ChicagoPublicSchools',)]
In [31]:
%sql SELECT * FROM ChicagoCensusData LIMIT 5;
 * sqlite:///Final.db
Done.
Out[31]:
COMMUNITY_AREA_NUMBER COMMUNITY_AREA_NAME PERCENT_OF_HOUSING_CROWDED PERCENT_HOUSEHOLDS_BELOW_POVERTY PERCENT_AGED_16__UNEMPLOYED PERCENT_AGED_25__WITHOUT_HIGH_SCHOOL_DIPLOMA PERCENT_AGED_UNDER_18_OR_OVER_64 PER_CAPITA_INCOME HARDSHIP_INDEX
1.0 Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39.0
2.0 West Ridge 7.8 17.2 8.8 20.8 38.5 23040 46.0
3.0 Uptown 3.8 24.0 8.9 11.8 22.2 35787 20.0
4.0 Lincoln Square 3.4 10.9 8.2 13.4 25.5 37524 17.0
5.0 North Center 0.3 7.5 5.2 4.5 26.2 57123 6.0
In [32]:
%sql SELECT * FROM ChicagoCrimeData LIMIT 5;
 * sqlite:///Final.db
Done.
Out[32]:
ID CASE_NUMBER DATE BLOCK IUCR PRIMARY_TYPE DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT DISTRICT WARD COMMUNITY_AREA_NUMBER FBICODE X_COORDINATE Y_COORDINATE YEAR LATITUDE LONGITUDE LOCATION
3512276 HK587712 28/8/2004 047XX S KEDZIE AVE 890 THEFT FROM BUILDING SMALL RETAIL STORE 0 0 911 9 14.0 58.0 6 1155838.0 1873050.0 2004 41.8074405 -87.70395585 (41.8074405, -87.703955849)
3406613 HK456306 26/6/2004 009XX N CENTRAL PARK AVE 820 THEFT $500 AND UNDER OTHER 0 0 1112 11 27.0 23.0 6 1152206.0 1906127.0 2004 41.89827996 -87.71640551 (41.898279962, -87.716405505)
8002131 HT233595 4/4/2011 043XX S WABASH AVE 820 THEFT $500 AND UNDER NURSING HOME/RETIREMENT HOME 0 0 221 2 3.0 38.0 6 1177436.0 1876313.0 2011 41.81593313 -87.62464213 (41.815933131, -87.624642127)
7903289 HT133522 30/12/2010 083XX S KINGSTON AVE 840 THEFT FINANCIAL ID THEFT: OVER $300 RESIDENCE 0 0 423 4 7.0 46.0 6 1194622.0 1850125.0 2010 41.74366532 -87.56246276 (41.743665322, -87.562462756)
10402076 HZ138551 2/2/2016 033XX W 66TH ST 820 THEFT $500 AND UNDER ALLEY 0 0 831 8 15.0 66.0 6 1155240.0 1860661.0 2016 41.7734553 -87.70648047 (41.773455295, -87.706480471)
In [33]:
%sql ALTER TABLE ChicagoPublicSchools RENAME COLUMN 'Elementary, Middle, or High School' TO Elementary_Middle_or_High_School;
 * sqlite:///Final.db
Done.
Out[33]:
[]
In [34]:
%sql SELECT * FROM ChicagoPublicSchools LIMIT 5;
 * sqlite:///Final.db
Done.
Out[34]:
School_ID NAME_OF_SCHOOL Elementary_Middle_or_High_School Street_Address City State ZIP_Code Phone_Number Link Network_Manager Collaborative_Name Adequate_Yearly_Progress_Made_ Track_Schedule CPS_Performance_Policy_Status CPS_Performance_Policy_Level HEALTHY_SCHOOL_CERTIFIED Safety_Icon SAFETY_SCORE Family_Involvement_Icon Family_Involvement_Score Environment_Icon Environment_Score Instruction_Icon Instruction_Score Leaders_Icon Leaders_Score Teachers_Icon Teachers_Score Parent_Engagement_Icon Parent_Engagement_Score Parent_Environment_Icon Parent_Environment_Score AVERAGE_STUDENT_ATTENDANCE Rate_of_Misconducts__per_100_students_ Average_Teacher_Attendance Individualized_Education_Program_Compliance_Rate Pk_2_Literacy__ Pk_2_Math__ Gr3_5_Grade_Level_Math__ Gr3_5_Grade_Level_Read__ Gr3_5_Keep_Pace_Read__ Gr3_5_Keep_Pace_Math__ Gr6_8_Grade_Level_Math__ Gr6_8_Grade_Level_Read__ Gr6_8_Keep_Pace_Math_ Gr6_8_Keep_Pace_Read__ Gr_8_Explore_Math__ Gr_8_Explore_Read__ ISAT_Exceeding_Math__ ISAT_Exceeding_Reading__ ISAT_Value_Add_Math ISAT_Value_Add_Read ISAT_Value_Add_Color_Math ISAT_Value_Add_Color_Read Students_Taking__Algebra__ Students_Passing__Algebra__ 9th Grade EXPLORE (2009) 9th Grade EXPLORE (2010) 10th Grade PLAN (2009) 10th Grade PLAN (2010) Net_Change_EXPLORE_and_PLAN 11th Grade Average ACT (2011) Net_Change_PLAN_and_ACT College_Eligibility__ Graduation_Rate__ College_Enrollment_Rate__ COLLEGE_ENROLLMENT General_Services_Route Freshman_on_Track_Rate__ X_COORDINATE Y_COORDINATE Latitude Longitude COMMUNITY_AREA_NUMBER COMMUNITY_AREA_NAME Ward Police_District Location
610038 Abraham Lincoln Elementary School ES 615 W Kemper Pl Chicago IL 60614 (773) 534-5720 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf Fullerton Elementary Network NORTH-NORTHWEST SIDE COLLABORATIVE No Standard Not on Probation Level 1 Yes Very Strong 99.0 Very Strong 99 Strong 74.0 Strong 66.0 Weak 65 Strong 70 Strong 56 Average 47 96.00% 2.0 96.40% 95.80% 80.1 43.3 89.6 84.9 60.7 62.6 81.9 85.2 52 62.4 66.3 77.9 69.7 64.4 0.2 0.9 Yellow Green 67.1 54.5 NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA 813 33 NDA 1171699.458 1915829.428 41.92449696 -87.64452163 7 LINCOLN PARK 43 18 (41.92449696, -87.64452163)
610281 Adam Clayton Powell Paideia Community Academy Elementary School ES 7511 S South Shore Dr Chicago IL 60649 (773) 535-6650 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610281.pdf Skyway Elementary Network SOUTH SIDE COLLABORATIVE No Track_E Not on Probation Level 1 No Average 54.0 Strong 66 Strong 74.0 Very Strong 84.0 Weak 63 Strong 76 Weak 46 Average 50 95.60% 15.7 95.30% 100.00% 62.4 51.7 21.9 15.1 29 42.8 38.5 27.4 44.8 42.7 14.1 34.4 16.8 16.5 0.7 1.4 Green Green 17.2 27.3 NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA 521 46 NDA 1196129.985 1856209.466 41.76032435 -87.55673627 43 SOUTH SHORE 7 4 (41.76032435, -87.55673627)
610185 Adlai E Stevenson Elementary School ES 8010 S Kostner Ave Chicago IL 60652 (773) 535-2280 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610185.pdf Midway Elementary Network SOUTHWEST SIDE COLLABORATIVE No Standard Not on Probation Level 2 No Strong 61.0 NDA NDA Average 50.0 Weak 36.0 Weak NDA NDA NDA Average 47 Weak 41 95.70% 2.3 94.70% 98.30% 53.7 26.6 38.3 34.7 43.7 57.3 48.8 39.2 46.8 44 7.5 21.9 18.3 15.5 -0.9 -1.0 Red Red NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA 1324 44 NDA 1148427.165 1851012.215 41.74711093 -87.73170248 70 ASHBURN 13 8 (41.74711093, -87.73170248)
609993 Agustin Lara Elementary Academy ES 4619 S Wolcott Ave Chicago IL 60609 (773) 535-4389 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609993.pdf Pershing Elementary Network SOUTHWEST SIDE COLLABORATIVE No Track_E Not on Probation Level 1 No Average 56.0 Average 44 Average 45.0 Weak 37.0 Weak 65 Average 48 Average 53 Strong 58 95.50% 10.4 95.80% 100.00% 76.9 NDA 26 24.7 61.8 49.7 39.2 27.2 69.7 60.6 9.1 18.2 11.1 9.6 0.9 2.4 Green Green 42.9 25 NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA 556 42 NDA 1164504.29 1873959.199 41.8097569 -87.6721446 61 NEW CITY 20 9 (41.8097569, -87.6721446)
610513 Air Force Academy High School HS 3630 S Wells St Chicago IL 60609 (773) 535-1590 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610513.pdf Southwest Side High School Network SOUTHWEST SIDE COLLABORATIVE NDA Standard Not on Probation Not Enough Data Yes Average 49.0 Strong 60 Strong 60.0 Average 55.0 Weak 45 Average 54 Average 53 Average 49 93.30% 15.6 96.90% 100.00% NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA NDA None None None None NDA NDA NDA NDA 14.6 14.8 NDA 16 1.4 NDA NDA NDA NDA NDA 302 40 91.8 1175177.622 1880745.126 41.82814609 -87.63279369 34 ARMOUR SQUARE 11 9 (41.82814609, -87.63279369)

Problems¶

Now write and execute SQL queries to solve assignment problems

Problem 1¶

Find the total number of crimes recorded in the CRIME table.¶
In [35]:
%sql SELECT COUNT (*) AS Total_num_of_crime FROM 'ChicagoCrimeData';
 * sqlite:///Final.db
Done.
Out[35]:
Total_num_of_crime
533

Problem 2¶

List community areas with per capita income less than 11000.¶
In [36]:
%sql SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME FROM ChicagoCensusData WHERE PER_CAPITA_INCOME < 11000;
 * sqlite:///Final.db
Done.
Out[36]:
COMMUNITY_AREA_NUMBER COMMUNITY_AREA_NAME
26.0 West Garfield Park
30.0 South Lawndale
37.0 Fuller Park
54.0 Riverdale

Problem 3¶

List all case numbers for crimes involving minors?(children are not considered minors for the purposes of crime analysis)¶
In [37]:
%sql SELECT CASE_NUMBER FROM ChicagoCrimeData WHERE DESCRIPTION LIKE '%MINOR%';
 * sqlite:///Final.db
Done.
Out[37]:
CASE_NUMBER
HL266884
HK238408

Problem 4¶

List all kidnapping crimes involving a child?¶
In [38]:
%sql SELECT CASE_NUMBER FROM ChicagoCrimeData WHERE PRIMARY_TYPE = 'KIDNAPPING' AND DESCRIPTION LIKE '%CHILD%';
 * sqlite:///Final.db
Done.
Out[38]:
CASE_NUMBER
HN144152

Problem 5¶

What kinds of crimes were recorded at schools?¶
In [39]:
%sql SELECT CASE_NUMBER, PRIMARY_TYPE FROM ChicagoCrimeData WHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%'
 * sqlite:///Final.db
Done.
Out[39]:
CASE_NUMBER PRIMARY_TYPE
HL353697 BATTERY
HL725506 BATTERY
HP716225 BATTERY
HH639427 BATTERY
JA460432 BATTERY
HS200939 CRIMINAL DAMAGE
HK577020 NARCOTICS
HS305355 NARCOTICS
HT315369 ASSAULT
HR585012 CRIMINAL TRESPASS
HH292682 PUBLIC PEACE VIOLATION
G635735 PUBLIC PEACE VIOLATION

Problem 6¶

List the average safety score for each type of school.¶
In [41]:
%sql SELECT Elementary_Middle_or_High_School, AVG(SAFETY_SCORE) AS SAFETY_SCORE FROM ChicagoPublicSchools GROUP BY Elementary_Middle_or_High_School;
 * sqlite:///Final.db
Done.
Out[41]:
Elementary_Middle_or_High_School SAFETY_SCORE
ES 49.52038369304557
HS 49.62352941176471
MS 48.0

Problem 7¶

List 5 community areas with highest % of households below poverty line¶
In [42]:
%sql SELECT COMMUNITY_AREA_NAME, PERCENT_HOUSEHOLDS_BELOW_POVERTY FROM ChicagoCensusData ORDER BY PERCENT_HOUSEHOLDS_BELOW_POVERTY DESC LIMIT 5;
 * sqlite:///Final.db
Done.
Out[42]:
COMMUNITY_AREA_NAME PERCENT_HOUSEHOLDS_BELOW_POVERTY
Riverdale 56.5
Fuller Park 51.2
Englewood 46.6
North Lawndale 43.1
East Garfield Park 42.4

Problem 8¶

Which community area is most crime prone?¶
In [46]:
%sql SELECT COMMUNITY_AREA_NUMBER, COUNT(*) AS TOTAL_CRIMES FROM ChicagoCrimeData GROUP BY COMMUNITY_AREA_NUMBER ORDER BY TOTAL_CRIMES DESC LIMIT 1;
 * sqlite:///Final.db
Done.
Out[46]:
COMMUNITY_AREA_NUMBER TOTAL_CRIMES
25.0 43

Double-click here for a hint

Problem 9¶

Use a sub-query to find the name of the community area with highest hardship index¶
In [48]:
%sql SELECT COMMUNITY_AREA_NAME FROM ChicagoCensusData WHERE HARDSHIP_INDEX = (SELECT MAX(HARDSHIP_INDEX) FROM ChicagoCensusData);
 * sqlite:///Final.db
Done.
Out[48]:
COMMUNITY_AREA_NAME
Riverdale

Problem 10¶

Use a sub-query to determine the Community Area Name with most number of crimes?¶
In [ ]:
%sql SELECT ChicagoCensusData.COMMUNITY_AREA_NUMBER, ChicagoCensusData.COMMUNITY_AREA_NAME, COUNT(*) AS TOTAL_CRIMES FROM ChicagoCrimeData JOIN ChicagoCensusData ON ChicagoCrimeData.COMMUNITY_AREA_NUMBER = ChicagoCensusData.COMMUNITY_AREA_NUMBER GROUP BY ChicagoCrimeData.COMMUNITY_AREA_NUMBER ORDER BY TOTAL_CRIMES DESC LIMIT 1;
 * sqlite:///Final.db
Done.
Out[ ]:
COMMUNITY_AREA_NUMBER COMMUNITY_AREA_NAME TOTAL_CRIMES
25.0 Austin 43

Copyright © 2020 This notebook and its source code are released under the terms of the MIT License.