Introduction

This notebook shows how to work with a real world dataset using SQL and Python. In this lab you will:

  1. Understand the dataset for Chicago Public School level performance
  2. Store the dataset in an Db2 database on IBM Cloud instance
  3. Retrieve metadata about tables and columns and query data from mixed case columns
  4. Solve example problems to practice your SQL skills including using built-in database functions

Chicago Public Schools - Progress Report Cards (2011-2012)

The city of Chicago released a dataset showing all school level performance data used to create School Report Cards for the 2011-2012 school year. The dataset is available from the Chicago Data Portal: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t

This dataset includes a large number of metrics. Start by familiarizing yourself with the types of metrics in the database: https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F?download=true

NOTE: Do not download the dataset directly from City of Chicago portal. Instead download a more database friendly version from the link below. Now download a static copy of this database and review some of its contents: https://ibm.box.com/shared/static/f9gjvj1gjmxxzycdhplzt01qtz0s7ew7.csv

Store the dataset in a Table

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.

While it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database as we saw in the previous lab, it results in mapping to default datatypes which may not be optimal for SQL querying. For example a long textual field may map to a CLOB instead of a VARCHAR.

Therefore, it is highly recommended to manually load the table using the database console LOAD tool, as indicated in Week 2 Lab 1 Part II. The only difference with that lab is that in Step 5 of the instructions you will need to click on create "(+) New Table" and specify the name of the table you want to create and then click "Next".

Now open the Db2 console, open the LOAD tool, Select / Drag the .CSV file for the CHICAGO PUBLIC SCHOOLS dataset and load the dataset into a new table called SCHOOLS.

Connect to the database

Let us now load the ipython-sql extension and establish a connection with the database

In [1]:
%load_ext sql
In [2]:
dsn_driver = "{IBM DB2 ODBC DRIVER}"
dsn_database = "BLUDB"            # e.g. "BLUDB"
dsn_port = "50000"                # e.g. "50000" 
dsn_protocol = "TCPIP"            # i.e. "TCPIP"

%sql ibm_db_sa://rjz07540:xwc76jpv64jfwb%40v@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Out[2]:
'Connected: rjz07540@BLUDB'

Query the database system catalog to retrieve table metadata

You can verify that the table creation was successful by retrieving the list of all tables in your schema and checking whether the SCHOOLS table was created
In [4]:
# type in your query to retrieve list of all tables in the database for your db2 schema (username)
%sql select * from syscat.columns where tabname = 'SCHOOLS' 
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[4]:
tabschema tabname colname colno typeschema typename length scale typestringunits stringunitslength DEFAULT NULLS codepage collationschema collationname logged compact colcard high2key low2key avgcollen keyseq partkeyseq nquantiles nmostfreq numnulls target_typeschema target_typename scope_tabschema scope_tabname source_tabschema source_tabname dl_features special_props hidden inline_length pctinlined IDENTITY rowchangetimestamp GENERATED text compress avgdistinctperpage pagevarianceratio sub_count sub_delim_length avgcollenchar implicitvalue seclabelname rowbegin rowend transactionstartid pctencoded avgencodedcollen qualifier func_path randdistkey remarks
RJZ07540 SCHOOLS School_ID 0 SYSIBM INTEGER 4 0 None None None Y 0 None None 566 610543 609674 5 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS NAME_OF_SCHOOL 1 SYSIBM VARCHAR 65 0 OCTETS 65 None Y 1208 SYSIBM IDENTITY 566 'Woodlawn Community Elementary Sch' 'Abraham Lincoln Elementary School' 41 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 36 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Elementary, Middle, or High School 2 SYSIBM VARCHAR 2 0 OCTETS 2 None Y 1208 SYSIBM IDENTITY 3 'MS' 'ES' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Street_Address 3 SYSIBM VARCHAR 30 0 OCTETS 30 None Y 1208 SYSIBM IDENTITY 566 '9912 S Avenue H' '10015 S Leavitt St' 22 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 17 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS City 4 SYSIBM VARCHAR 7 0 OCTETS 7 None Y 1208 SYSIBM IDENTITY 1 'Chicago' 'Chicago' 12 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 7 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS State 5 SYSIBM VARCHAR 2 0 OCTETS 2 None Y 1208 SYSIBM IDENTITY 1 'IL' 'IL' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ZIP_Code 6 SYSIBM INTEGER 4 0 None None None Y 0 None None 51 60707 60607 5 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Phone_Number 7 SYSIBM VARCHAR 14 0 OCTETS 14 None Y 1208 SYSIBM IDENTITY 566 '(773) 535-9930' '(773) 534-0146' 19 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 14 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Link 8 SYSIBM VARCHAR 78 0 OCTETS 78 None Y 1208 SYSIBM IDENTITY 2 '' 'http://schoolreports.cps.edu/Scho' 83 None 0 20 10 1 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 78 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Network_Manager 9 SYSIBM VARCHAR 40 0 OCTETS 40 None Y 1208 SYSIBM IDENTITY 20 'Southwest Side High School Networ' 'Austin-North Lawndale Elementary ' 35 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 30 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Collaborative_Name 10 SYSIBM VARCHAR 34 0 OCTETS 34 None Y 1208 SYSIBM IDENTITY 5 'SOUTHWEST SIDE COLLABORATIVE' 'NORTH-NORTHWEST SIDE COLLABORATIV' 33 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 28 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Adequate_Yearly_Progress_Made_ 11 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 3 'Yes' 'NDA' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Track_Schedule 12 SYSIBM VARCHAR 12 0 OCTETS 12 None Y 1208 SYSIBM IDENTITY 3 'Track_E' 'Non_Standard' 13 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 8 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS CPS_Performance_Policy_Status 13 SYSIBM VARCHAR 16 0 OCTETS 16 None Y 1208 SYSIBM IDENTITY 4 'Not on Probation' 'Not Applicable' 18 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 13 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS CPS_Performance_Policy_Level 14 SYSIBM VARCHAR 15 0 OCTETS 15 None Y 1208 SYSIBM IDENTITY 5 'NDA' 'Level 2' 12 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 7 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS HEALTHY_SCHOOL_CERTIFIED 15 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 2 'Yes' 'No' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Safety_Icon 16 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 6 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS SAFETY_SCORE 17 SYSIBM SMALLINT 2 0 None None None Y 0 None None 86 99 5 3 None 0 20 10 53 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Family_Involvement_Icon 18 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Family_Involvement_Score 19 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 72 '99' '17' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Environment_Icon 20 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 6 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Environment_Score 21 SYSIBM SMALLINT 2 0 None None None Y 0 None None 80 99 2 3 None 0 20 10 53 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Instruction_Icon 22 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 6 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Instruction_Score 23 SYSIBM SMALLINT 2 0 None None None Y 0 None None 83 99 11 3 None 0 20 10 53 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Leaders_Icon 24 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Leaders_Score 25 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 79 '99' '14' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Teachers_Icon 26 SYSIBM VARCHAR 11 0 OCTETS 11 None Y 1208 SYSIBM IDENTITY 6 'Very Weak' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Teachers_Score 27 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 77 '99' '14' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Parent_Engagement_Icon 28 SYSIBM VARCHAR 7 0 OCTETS 7 None Y 1208 SYSIBM IDENTITY 4 'Strong' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Parent_Engagement_Score 29 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 32 '69' '39' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Parent_Environment_Icon 30 SYSIBM VARCHAR 7 0 OCTETS 7 None Y 1208 SYSIBM IDENTITY 4 'Strong' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Parent_Environment_Score 31 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 33 '70' '36' 7 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 2 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS AVERAGE_STUDENT_ATTENDANCE 32 SYSIBM VARCHAR 6 0 OCTETS 6 None Y 1208 SYSIBM IDENTITY 155 '98.40%' '60.90%' 11 None 0 20 10 1 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 6 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Rate_of_Misconducts__per_100_students_ 33 SYSIBM DECIMAL 5 1 None None None Y 0 None None 288 +0230.6 +0000.1 4 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Average_Teacher_Attendance 34 SYSIBM VARCHAR 6 0 OCTETS 6 None Y 1208 SYSIBM IDENTITY 55 '98.10%' '91.70%' 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 6 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Individualized_Education_Program_Compliance_Rate 35 SYSIBM VARCHAR 7 0 OCTETS 7 None Y 1208 SYSIBM IDENTITY 69 '99.60%' '72.70%' 12 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 7 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Pk_2_Literacy__ 36 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 308 '96.3' '16.4' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Pk_2_Math__ 37 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 224 '97.2' '10.1' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr3_5_Grade_Level_Math__ 38 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 336 '98.6' '10' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr3_5_Grade_Level_Read__ 39 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 340 '98.6' '10' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr3_5_Keep_Pace_Read__ 40 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 276 '82.5' '28.8' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr3_5_Keep_Pace_Math__ 41 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 260 '83.5' '21.7' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr6_8_Grade_Level_Math__ 42 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 296 '96.5' '10.2' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr6_8_Grade_Level_Read__ 43 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 300 '98.9' '100' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr6_8_Keep_Pace_Math_ 44 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 268 '93.6' '24.3' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr6_8_Keep_Pace_Read__ 45 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 260 '85.5' '21.4' 9 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 4 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr_8_Explore_Math__ 46 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 208 '9.7' '1.4' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Gr_8_Explore_Read__ 47 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 244 '99' '1.4' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Exceeding_Math__ 48 SYSIBM DECIMAL 4 1 None None None Y 0 None None 297 +100.0 +000.6 4 None 0 20 10 90 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Exceeding_Reading__ 49 SYSIBM DECIMAL 4 1 None None None Y 0 None None 261 +100.0 +000.5 4 None 0 20 10 90 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Value_Add_Math 50 SYSIBM DECIMAL 3 1 None None None Y 0 None None 57 +03.6 -03.1 3 None 0 20 10 98 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Value_Add_Read 51 SYSIBM DECIMAL 3 1 None None None Y 0 None None 60 +04.9 -02.8 3 None 0 20 10 98 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Value_Add_Color_Math 52 SYSIBM VARCHAR 6 0 OCTETS 6 None Y 1208 SYSIBM IDENTITY 4 'Red' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS ISAT_Value_Add_Color_Read 53 SYSIBM VARCHAR 6 0 OCTETS 6 None Y 1208 SYSIBM IDENTITY 4 'Red' 'NDA' 10 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 5 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Students_Taking__Algebra__ 54 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 116 '98.5' '10.5' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Students_Passing__Algebra__ 55 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 94 '93.3' '11.1' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS 9th Grade EXPLORE (2009) 56 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 45 '22.4' '11.6' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS 9th Grade EXPLORE (2010) 57 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 51 '22.2' '11.2' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS 10th Grade PLAN (2009) 58 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 45 '24.5' '12.9' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS 10th Grade PLAN (2010) 59 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 49 '24.7' '12.7' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Net_Change_EXPLORE_and_PLAN 60 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 16 '2.3' '0.5' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS 11th Grade Average ACT (2011) 61 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 53 '28.8' '13.6' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Net_Change_PLAN_and_ACT 62 SYSIBM VARCHAR 3 0 OCTETS 3 None Y 1208 SYSIBM IDENTITY 30 '4.3' '0.3' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS College_Eligibility__ 63 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 78 '98' '10.6' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Graduation_Rate__ 64 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 75 '97.6' '10.3' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS College_Enrollment_Rate__ 65 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 70 '90.7' '32.6' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS COLLEGE_ENROLLMENT 66 SYSIBM SMALLINT 2 0 None None None Y 0 None None 444 3320 26 3 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS General_Services_Route 67 SYSIBM SMALLINT 2 0 None None None Y 0 None None 20 48 30 3 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Freshman_on_Track_Rate__ 68 SYSIBM VARCHAR 4 0 OCTETS 4 None Y 1208 SYSIBM IDENTITY 79 '98.1' '37.1' 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 3 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS X_COORDINATE 69 SYSIBM DECIMAL 13 3 None None None Y 0 None None 504 +0001202756.093 +0001121355.698 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Y_COORDINATE 70 SYSIBM DECIMAL 13 3 None None None Y 0 None None 560 +0001949501.716 +0001817244.896 8 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Latitude 71 SYSIBM DECIMAL 18 8 None None None Y 0 None None 472 +0000000042.01709536 +0000000041.65376037 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Longitude 72 SYSIBM DECIMAL 18 8 None None None Y 0 None None 566 -0000000087.53301251 -0000000087.82933190 11 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS COMMUNITY_AREA_NUMBER 73 SYSIBM SMALLINT 2 0 None None None Y 0 None None 77 76 2 3 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS COMMUNITY_AREA_NAME 74 SYSIBM VARCHAR 22 0 OCTETS 22 None Y 1208 SYSIBM IDENTITY 77 'WEST TOWN' 'ARCHER HEIGHTS' 16 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 11 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Ward 75 SYSIBM SMALLINT 2 0 None None None Y 0 None None 50 49 2 3 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Police_District 76 SYSIBM SMALLINT 2 0 None None None Y 0 None None 23 24 2 3 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 -1 None None N N N -1 -1.0 None None N None
RJZ07540 SCHOOLS Location 77 SYSIBM VARCHAR 27 0 OCTETS 27 None Y 1208 SYSIBM IDENTITY 536 '(42.01709536, -87.67779928)' '(41.65376037, -87.61434156)' 32 None 0 20 10 0 None None None None None None None None 0 -1 N N None O None -1.0 -1 -1 27 None None N N N -1 -1.0 None None N None

Double-click here for a hint

Double-click here for the solution.

Query the database system catalog to retrieve column metadata

The SCHOOLS table contains a large number of columns. How many columns does this table have?
In [5]:
# type in your query to retrieve the number of columns in the SCHOOLS table
%sql select count(*) from syscat.columns where tabname = 'SCHOOLS' 
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[5]:
1
78

Double-click here for a hint

Double-click here for the solution.

Now retrieve the the list of columns in SCHOOLS table and their column type (datatype) and length.

In [6]:
# type in your query to retrieve all column names in the SCHOOLS table along with their datatypes and length
%sql select COLNAME, TYPENAME, LENGTH from SYSCAT.COLUMNS where TABNAME = 'SCHOOLS'
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[6]:
colname typename length
School_ID INTEGER 4
NAME_OF_SCHOOL VARCHAR 65
Elementary, Middle, or High School VARCHAR 2
Street_Address VARCHAR 30
City VARCHAR 7
State VARCHAR 2
ZIP_Code INTEGER 4
Phone_Number VARCHAR 14
Link VARCHAR 78
Network_Manager VARCHAR 40
Collaborative_Name VARCHAR 34
Adequate_Yearly_Progress_Made_ VARCHAR 3
Track_Schedule VARCHAR 12
CPS_Performance_Policy_Status VARCHAR 16
CPS_Performance_Policy_Level VARCHAR 15
HEALTHY_SCHOOL_CERTIFIED VARCHAR 3
Safety_Icon VARCHAR 11
SAFETY_SCORE SMALLINT 2
Family_Involvement_Icon VARCHAR 11
Family_Involvement_Score VARCHAR 3
Environment_Icon VARCHAR 11
Environment_Score SMALLINT 2
Instruction_Icon VARCHAR 11
Instruction_Score SMALLINT 2
Leaders_Icon VARCHAR 11
Leaders_Score VARCHAR 3
Teachers_Icon VARCHAR 11
Teachers_Score VARCHAR 3
Parent_Engagement_Icon VARCHAR 7
Parent_Engagement_Score VARCHAR 3
Parent_Environment_Icon VARCHAR 7
Parent_Environment_Score VARCHAR 3
AVERAGE_STUDENT_ATTENDANCE VARCHAR 6
Rate_of_Misconducts__per_100_students_ DECIMAL 5
Average_Teacher_Attendance VARCHAR 6
Individualized_Education_Program_Compliance_Rate VARCHAR 7
Pk_2_Literacy__ VARCHAR 4
Pk_2_Math__ VARCHAR 4
Gr3_5_Grade_Level_Math__ VARCHAR 4
Gr3_5_Grade_Level_Read__ VARCHAR 4
Gr3_5_Keep_Pace_Read__ VARCHAR 4
Gr3_5_Keep_Pace_Math__ VARCHAR 4
Gr6_8_Grade_Level_Math__ VARCHAR 4
Gr6_8_Grade_Level_Read__ VARCHAR 4
Gr6_8_Keep_Pace_Math_ VARCHAR 4
Gr6_8_Keep_Pace_Read__ VARCHAR 4
Gr_8_Explore_Math__ VARCHAR 4
Gr_8_Explore_Read__ VARCHAR 4
ISAT_Exceeding_Math__ DECIMAL 4
ISAT_Exceeding_Reading__ DECIMAL 4
ISAT_Value_Add_Math DECIMAL 3
ISAT_Value_Add_Read DECIMAL 3
ISAT_Value_Add_Color_Math VARCHAR 6
ISAT_Value_Add_Color_Read VARCHAR 6
Students_Taking__Algebra__ VARCHAR 4
Students_Passing__Algebra__ VARCHAR 4
9th Grade EXPLORE (2009) VARCHAR 4
9th Grade EXPLORE (2010) VARCHAR 4
10th Grade PLAN (2009) VARCHAR 4
10th Grade PLAN (2010) VARCHAR 4
Net_Change_EXPLORE_and_PLAN VARCHAR 3
11th Grade Average ACT (2011) VARCHAR 4
Net_Change_PLAN_and_ACT VARCHAR 3
College_Eligibility__ VARCHAR 4
Graduation_Rate__ VARCHAR 4
College_Enrollment_Rate__ VARCHAR 4
COLLEGE_ENROLLMENT SMALLINT 2
General_Services_Route SMALLINT 2
Freshman_on_Track_Rate__ VARCHAR 4
X_COORDINATE DECIMAL 13
Y_COORDINATE DECIMAL 13
Latitude DECIMAL 18
Longitude DECIMAL 18
COMMUNITY_AREA_NUMBER SMALLINT 2
COMMUNITY_AREA_NAME VARCHAR 22
Ward SMALLINT 2
Police_District SMALLINT 2
Location VARCHAR 27

Double-click here for the solution.

Questions

  1. Is the column name for the "SCHOOL ID" attribute in upper or mixed case?
  2. What is the name of "Community Area Name" column in your table? Does it have spaces?
  3. Are there any columns in whose names the spaces and paranthesis (round brackets) have been replaced by the underscore character "_"?

Problems

Problem 1

How many Elementary Schools are in the dataset?
In [12]:
%sql select count(*) from SCHOOLS where "Elementary, Middle, or High School" = 'ES';
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[12]:
1
462

Double-click here for a hint

Double-click here for another hint

Double-click here for the solution.

Problem 2

What is the highest Safety Score?
In [14]:
%sql select MAX(SAFETY_SCORE) from SCHOOLS;
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[14]:
1
99

Double-click here for a hint

Double-click here for the solution.

Problem 3

Which schools have highest Safety Score?
In [20]:
%sql select NAME_OF_SCHOOL,SAFETY_SCORE from SCHOOLS WHERE SAFETY_SCORE = (SELECT MAX(SAFETY_SCORE) from SCHOOLS);
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[20]:
name_of_school safety_score
Abraham Lincoln Elementary School 99
Alexander Graham Bell Elementary School 99
Annie Keller Elementary Gifted Magnet School 99
Augustus H Burley Elementary School 99
Edgar Allan Poe Elementary Classical School 99
Edgebrook Elementary School 99
Ellen Mitchell Elementary School 99
James E McDade Elementary Classical School 99
James G Blaine Elementary School 99
LaSalle Elementary Language Academy 99
Mary E Courtenay Elementary Language Arts Center 99
Northside College Preparatory High School 99
Northside Learning Center High School 99
Norwood Park Elementary School 99
Oriole Park Elementary School 99
Sauganash Elementary School 99
Stephen Decatur Classical Elementary School 99
Talman Elementary School 99
Wildwood Elementary School 99

Double-click here for the solution.

Problem 4

What are the top 10 schools with the highest "Average Student Attendance"?
In [22]:
%sql select NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE from SCHOOLS ORDER BY AVERAGE_STUDENT_ATTENDANCE DESC nulls last LIMIT 10;
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[22]:
name_of_school average_student_attendance
John Charles Haines Elementary School 98.40%
James Ward Elementary School 97.80%
Edgar Allan Poe Elementary Classical School 97.60%
Orozco Fine Arts & Sciences Elementary School 97.60%
Rachel Carson Elementary School 97.60%
Annie Keller Elementary Gifted Magnet School 97.50%
Andrew Jackson Elementary Language Academy 97.40%
Lenart Elementary Regional Gifted Center 97.40%
Disney II Magnet School 97.30%
John H Vanderpoel Elementary Magnet School 97.20%

Double-click here for the solution.

Problem 5

Retrieve the list of 5 Schools with the lowest Average Student Attendance sorted in ascending order based on attendance
In [24]:
%sql select NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE from SCHOOLS ORDER BY AVERAGE_STUDENT_ATTENDANCE fetch first 5 rows only;
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[24]:
name_of_school average_student_attendance
Richard T Crane Technical Preparatory High School 57.90%
Barbara Vick Early Childhood & Family Center 60.90%
Dyett High School 62.50%
Wendell Phillips Academy High School 63.00%
Orr Academy High School 66.30%

Double-click here for the solution.

Problem 6

Now remove the '%' sign from the above result set for Average Student Attendance column
In [25]:
%sql select NAME_OF_SCHOOL, TRIM('%' FROM AVERAGE_STUDENT_ATTENDANCE) AS ATTENDANCE  from SCHOOLS ORDER BY AVERAGE_STUDENT_ATTENDANCE fetch first 5 rows only;
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[25]:
name_of_school attendance
Richard T Crane Technical Preparatory High School 57.90
Barbara Vick Early Childhood & Family Center 60.90
Dyett High School 62.50
Wendell Phillips Academy High School 63.00
Orr Academy High School 66.30

Double-click here for a hint

Double-click here for the solution.

Problem 7

Which Schools have Average Student Attendance lower than 70%?
In [28]:
%sql select NAME_OF_SCHOOL, TRIM('%' FROM AVERAGE_STUDENT_ATTENDANCE) AS ATTENDANCE  from SCHOOLS WHERE CAST ( REPLACE(Average_Student_Attendance, '%', '') AS DOUBLE )  < 70.00 ORDER BY AVERAGE_STUDENT_ATTENDANCE ;
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[28]:
name_of_school attendance
Richard T Crane Technical Preparatory High School 57.90
Barbara Vick Early Childhood & Family Center 60.90
Dyett High School 62.50
Wendell Phillips Academy High School 63.00
Orr Academy High School 66.30
Manley Career Academy High School 66.80
Chicago Vocational Career Academy High School 68.80
Roberto Clemente Community Academy High School 69.60

Double-click here for a hint

Double-click here for another hint

Double-click here for the solution.

Problem 8

Get the total College Enrollment for each Community Area
In [29]:
%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name 
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[29]:
community_area_name total_enrollment
ALBANY PARK 6864
ARCHER HEIGHTS 4823
ARMOUR SQUARE 1458
ASHBURN 6483
AUBURN GRESHAM 4175
AUSTIN 10933
AVALON PARK 1522
AVONDALE 3640
BELMONT CRAGIN 14386
BEVERLY 1636
BRIDGEPORT 3167
BRIGHTON PARK 9647
BURNSIDE 549
CALUMET HEIGHTS 1568
CHATHAM 5042
CHICAGO LAWN 7086
CLEARING 2085
DOUGLAS 4670
DUNNING 4568
EAST GARFIELD PARK 5337
EAST SIDE 5305
EDGEWATER 4600
EDISON PARK 910
ENGLEWOOD 6832
FOREST GLEN 1431
FULLER PARK 531
GAGE PARK 9915
GARFIELD RIDGE 4552
GRAND BOULEVARD 2809
GREATER GRAND CROSSING 4051
HEGEWISCH 963
HERMOSA 3975
HUMBOLDT PARK 8620
HYDE PARK 1930
IRVING PARK 7764
JEFFERSON PARK 1755
KENWOOD 4287
LAKE VIEW 7055
LINCOLN PARK 5615
LINCOLN SQUARE 4132
LOGAN SQUARE 7351
LOOP 871
LOWER WEST SIDE 7257
MCKINLEY PARK 1552
MONTCLARE 1317
MORGAN PARK 3271
MOUNT GREENWOOD 2091
NEAR NORTH SIDE 3362
NEAR SOUTH SIDE 1378
NEAR WEST SIDE 7975
NEW CITY 7922
NORTH CENTER 7541
NORTH LAWNDALE 5146
NORTH PARK 4210
NORWOOD PARK 6469
OAKLAND 140
OHARE 786
PORTAGE PARK 6954
PULLMAN 1620
RIVERDALE 1547
ROGERS PARK 4068
ROSELAND 7020
SOUTH CHICAGO 4043
SOUTH DEERING 1859
SOUTH LAWNDALE 14793
SOUTH SHORE 4543
UPTOWN 4388
WASHINGTON HEIGHTS 4006
WASHINGTON PARK 2648
WEST ELSDON 3700
WEST ENGLEWOOD 5946
WEST GARFIELD PARK 2622
WEST LAWN 4207
WEST PULLMAN 3240
WEST RIDGE 8197
WEST TOWN 9429
WOODLAWN 4206

Double-click here for a hint

Double-click here for another hint

Double-click here for the solution.

Problem 9

Get the 5 Community Areas with the least total College Enrollment sorted in ascending order
In [30]:
%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name \
   order by TOTAL_ENROLLMENT asc \
   fetch first 5 rows only
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[30]:
community_area_name total_enrollment
OAKLAND 140
FULLER PARK 531
BURNSIDE 549
OHARE 786
LOOP 871

Double-click here for a hint

Double-click here for the solution.

Problem 10

Get the hardship index for the community area which has College Enrollment of 4638
In [31]:
%%sql 
select hardship_index 
   from chicago_socioeconomic_data CD, schools CPS 
   where CD.ca = CPS.community_area_number 
      and college_enrollment = 4368
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[31]:
hardship_index
6.0

Double-click here for the solution.

Problem 11

Get the hardship index for the community area which has the highest value for College Enrollment
In [32]:
%sql select ca, community_area_name, hardship_index from chicago_socioeconomic_data \
   where ca in \
   ( select community_area_number from schools order by college_enrollment desc limit 1 )
 * ibm_db_sa://rjz07540:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
Done.
Out[32]:
ca community_area_name hardship_index
5.0 North Center 6.0

Double-click here for the solution.

Summary

In this lab you learned how to work with a real word dataset using SQL and Python. You learned how to query columns with spaces or special characters in their names and with mixed case names. You also used built in database functions and practiced how to sort, limit, and order result sets, as well as used sub-queries and worked with multiple tables.

Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.