
Summer Session Courses

Northwestern SPS Undergraduate Students


STAT 305-0 : Information Management for Data Science


Description

The Information Management for Data Science course aims to give students an extensive skill set to upload, clean, process, store and utilize data from various sources. Starting with the main libraries and data structures in Python, it moves on to advanced techniques for obtaining data. Specifically, it covers extracting HTML text from websites using CSS and XPath techniques, and interacting with Application Programming Interfaces (APIs) using JavaScript Object Notation (JSON) files and the corresponding libraries. Students are expected to have fundamental Python skills from STAT 303-1 or CS 110.
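As a rough illustration of the kind of data extraction described above, the following minimal sketch uses only the Python standard library. It is not taken from the course materials: the markup snippet, the JSON payload, and the names `page`, `payload`, and `parse_scores` are all hypothetical. Note that `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so real web pages usually call for a dedicated third-party HTML parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded web page.
page = (
    "<html><body><ul>"
    "<li class='course'>STAT 303-1</li>"
    "<li class='course'>CS 110</li>"
    "<li class='note'>Summer 2024</li>"
    "</ul></body></html>"
)

# Select every <li> whose class attribute is 'course' with an XPath-style query.
root = ET.fromstring(page)
courses = [li.text for li in root.findall(".//li[@class='course']")]

# Hypothetical JSON text, as an API might return it in a response body.
payload = '{"results": [{"name": "Ada", "score": "91"}, {"name": "Grace", "score": "88"}]}'


def parse_scores(text):
    """Convert the JSON text into (name, score) tuples with integer scores."""
    data = json.loads(text)
    return [(item["name"], int(item["score"])) for item in data["results"]]


records = parse_scores(payload)
print(courses)
print(records)
```

The same pattern (select the elements of interest, then convert the text into typed values) carries over when the input comes from a live page or API instead of a string literal.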

The course then moves on to relational databases and how to store data in them and retrieve data from them using Structured Query Language (SQL). Students are not expected to have any prior knowledge of SQL; it will be introduced from scratch and applied during the lectures. Once a working understanding of SQL is established, database design will be the last main topic of the course.
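For readers unfamiliar with the idea, here is a minimal sketch of storing data in related tables and querying across them with SQL. It uses Python's built-in `sqlite3` module with an in-memory database; the `students` and `enrollments` tables and their contents are hypothetical examples, not part of the course materials.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables: each enrollment row points at a student by id.
conn.executescript(
    """
    CREATE TABLE students (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(id),
        course     TEXT NOT NULL
    );
    INSERT INTO students (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO enrollments (student_id, course) VALUES
        (1, 'STAT 305-0'), (2, 'STAT 303-1'), (1, 'CS 110');
    """
)

# Join the two tables to list the courses for one student.
# The ? placeholder passes the name as a parameter instead of pasting it into the SQL.
rows = conn.execute(
    """
    SELECT s.name, e.course
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.id
    WHERE s.name = ?
    ORDER BY e.course
    """,
    ("Ada",),
).fetchall()
conn.close()

print(rows)
```

Splitting students and enrollments into separate tables linked by an id is a small instance of the database design decisions mentioned above.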

Prerequisites/Registration Requirements: STAT 303-1 or CS 110 (or equivalent Python knowledge). If you have introductory Python knowledge but are not sure whether it is sufficient for the course, please email emre.besler@northwestern.edu to check.

Learning Objectives: At the completion of this course, students should be able to:
- Identify parts of the data that are misleading, wrong, irrelevant or redundant for the task at hand, and process the uploaded dataset accordingly.
- Create new variables from the data they have, in a new data type if necessary.
- Visualize the data in an interactive and visually appealing manner.
- Scrape different types of data from online sources and process them for further analysis.
- Write SQL queries to obtain data that is spread across multiple related tables and databases.
- Obtain data from a mobile application or a website and process it for numerical analysis.
- Design relational databases according to the needs of the datasets at hand.

Teaching Method: Remote asynchronous. Each week's videos will be released on Panopto (via Canvas) during the preceding week, and weekly office hours will be held on Thursdays and Fridays.

Evaluation Method: There will be 5 homework assignments (35%), post-lecture assignments after every video (35%), a take-home midterm exam (15%), and a take-home final exam (15%).

Required Class Materials: A laptop that is able to run Anaconda Navigator for Python programming and SQLiteStudio for SQL programming. 


Additional Information:

PLEASE NOTE: The registration period for ALL summer courses is April 8, 2024 through June 16, 2024, even if the course begins later in the summer.


Summer 2024
Start/End Dates: 07/22/24 - 08/25/24
Day(s): Asynch
Time: Asynch
Section: 20
Instructor: Besler, Emre
Course Location: Online
Status: Closed
CAESAR Course ID: 42516