
CSC 455 Database Processing for Large-Scale Analytics

Alexander Rasin

Office: CDM 847
Fall 2014-2015
Class number: 10645
Section number: 710
Online Campus
Course homepage: http://d2l.depaul.edu/

Summary

This is an introductory graduate course in database design and applications. Specific topics to be covered include:


* Relational Model

* Structured Query Language (SQL)

* Database Design and Normalization

* Extract-Transform-Load (ETL) using Python

* Materialized views and clustered indexes



Texts

Python for Data Analysis, by Wes McKinney, O'Reilly 2012. ISBN 1449319793 or 978-1449319793


Additional reading: Oracle Database 11g SQL, by Jason Price. ISBN 9780071498500 (available in DePaul books 24x7 library)


Grading

There will be homework assignments given most weeks; assignments (with associated readings) will be posted on the course web site and will be due one week after the day they are posted, unless otherwise noted. Details of the submission process will be discussed in class; it is your responsibility to verify that your submitted files are readable and submitted in the correct locations. Late assignments will be accepted up to three days late with a 10% penalty for each day or fraction of a day that the assignment is late; these penalties will be assessed uniformly and in full to all assignments submitted at any point beyond the posted due date and time (including those submitted or re-submitted later the same day). No assignments will be accepted more than three days beyond the posted due date and time (contact me if you require an extension due to extenuating circumstances). The homework assignments will be worth a total of 40% of the course grade. There will be a midterm exam that is worth 30% of the course grade. The final exam will be given as a take-home and will be worth 30% of the grade.
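The late-penalty and grade-weight arithmetic above can be sketched in Python as follows. This is only an illustration, not the official grading procedure. It assumes the 10% penalty is measured in percentage points of the assignment's maximum score, which the policy does not state explicitly. The function names and the due date are hypothetical.

```python
import math
from datetime import datetime
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60
PENALTY_PER_DAY = 10  # percentage points per day or fraction of a day (assumed basis)
MAX_DAYS_LATE = 3     # nothing is accepted beyond three days late


def late_penalty_percent(due: datetime, submitted: datetime) -> Optional[int]:
    """Return the penalty in percentage points, or None if the work is not accepted."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # submitted on time
    # Any fraction of a day counts as a full day.
    days_late = math.ceil(seconds_late / SECONDS_PER_DAY)
    if days_late > MAX_DAYS_LATE:
        return None  # more than three days late
    return PENALTY_PER_DAY * days_late


def course_grade(homework: float, midterm: float, final: float) -> float:
    """Combine percentage scores using the 40/30/30 weights."""
    return (40 * homework + 30 * midterm + 30 * final) / 100


due = datetime(2014, 10, 1, 23, 59)  # hypothetical due date and time
print(late_penalty_percent(due, datetime(2014, 10, 2, 0, 5)))  # six minutes late
print(course_grade(90, 80, 70))
```

Under these assumptions, a submission six minutes past the deadline already carries the full first-day penalty, and one submitted more than 72 hours late is not accepted.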


Prerequisites

CSC 401


Regarding Email Communication

Please begin the subject line of any email to me with "CSC 455", so that I can easily identify your messages. I will reply to email messages within one business day after the day I receive them; therefore questions that are only received by me on an assignment's due date (or late the night before) are not guaranteed replies before the assignment is due. Please plan accordingly and begin the assignments early enough to ask questions and receive answers. If you are having problems, send me a detailed description of the problems you are having; I will try to guide you in locating and solving your problems yourself, rather than simply solve your problems for you. Please do not use the comment field of the assignment submission system to send me questions.


Regarding Academic Integrity

You are expected to be familiar with and to adhere to DePaul's Academic Integrity Policy, which is available on-line at http://academicintegrity.depaul.edu/AcademicIntegrityPolicy.pdf. Violations of the Academic Integrity Policy will be dealt with decisively; penalties may range up to an automatic F in the course and possible expulsion.


Plagiarism includes, but is not limited to: Turning in another person's work as your own (including hiring someone else to complete an assignment for you); Starting with another person's work and modifying it to turn in as your own; Cutting and pasting, or otherwise copying, sections of another person's work into your assignment; Allowing another person (such as a tutor) to write part of your assignment; and so on. Supplying such assistance to another student is considered an equivalent violation of the policy. You may feel free to discuss the assignments with other students at a general level. However, when it comes to actually completing your assignment, you must work independently. Your assignments must be entirely your own individual work. If you have any questions or doubts about what plagiarism entails, you should consult me.


Week 1: Introduction to Databases
- General database intro
- Relational model
- Schema normalization
- Python review

Week 2: Database Design and SQL DDL
- Functional dependencies and keys
- Normal forms and normalization
- CREATE, domains, INSERT, CHECK, CONSTRAINT, ALTER, UPDATE, DELETE
- Python review
- (Assignment 1: Schema normalization)

Week 3: SQL
- SQL DDL with Python
- SQL Developer
- SELECT, FROM, WHERE, comparisons, LIKE, ORDER BY
- Aggregate functions, GROUP BY, HAVING
- (Assignment 2: SQL I)

Week 4: Intermediate SQL
- SQL queries with Python
- Joins (Cartesian product, equi-join, inner join, outer join)
- Aggregate functions
- Twitter data/parsing data from file
- (Assignment 3: SQL II)

Week 5: Database Programming
- Read/parse data from the web
- Data cleaning and validation
- Loading data into DBMS

Week 6: Advanced SQL
- Guest lecture (TBA)
- Windowing aggregate functions and time series
- Customized reporting in SQL
- Midterm exam review
- (Assignment 4: Database Programming I, due Week 8, after Midterm)

Week 7: Midterm
- Midterm

Week 8: Database Programming
- Extract-Transform-Load with Python
- Transformation in DBMS vs. Python
- Database integrity
- (Assignment 5: Database Programming II)

Week 9: Query Performance
- Performance considerations in a DBMS
- Views, Materialized views
- Indexes, Clustered indexes

Week 10: NoSQL Databases
- Key-value, document, and columnar stores
- MapReduce/Hadoop
- Streaming data processing engines

School policies:

Changes to Syllabus

This syllabus is subject to change as necessary during the quarter. If a change occurs, it will be thoroughly addressed during class, posted under Announcements in D2L and sent via email.

Online Course Evaluations

Evaluations are a way for students to provide valuable feedback regarding their instructor and the course. Detailed feedback will enable the instructor to continuously tailor teaching methods and course content to meet the learning goals of the course and the academic needs of the students. They are a requirement of the course and are key to continuing to provide you with the highest quality of teaching. The evaluations are anonymous; the instructor and administration do not track who entered what responses. A program is used to check whether the student completed the evaluations, but the evaluation is completely separate from the student’s identity. Since 100% participation is our goal, students are sent periodic reminders over three weeks. Students do not receive reminders once they complete the evaluation. Students complete the evaluation online in CampusConnect.

Academic Integrity and Plagiarism

This course will be subject to the university's academic integrity policy. More information can be found at http://academicintegrity.depaul.edu/. If you have any questions, be sure to consult with your professor.

All students are expected to abide by the University's Academic Integrity Policy, which prohibits cheating and other misconduct in student coursework. Publicly sharing or posting online any prior or current materials from this course (including exam questions or answers) is considered to be providing unauthorized assistance prohibited by the policy. Both students who share or post such materials and students who access or use them are considered to be cheating under the Policy and will be subject to sanctions for violations of Academic Integrity.

Academic Policies

All students are required to manage their class schedules each term in accordance with the deadlines for enrolling and withdrawing as indicated in the University Academic Calendar. Information on enrollment, withdrawal, grading and incompletes can be found at http://www.cdm.depaul.edu/Current%20Students/Pages/PoliciesandProcedures.aspx.

Students with Disabilities

Students who feel they may need an accommodation based on the impact of a disability should contact the instructor privately to discuss their specific needs. All discussions will remain confidential.
To ensure that you receive the most appropriate accommodation based on your needs, contact the instructor as early as possible in the quarter (preferably within the first week of class), and make sure that you have contacted the Center for Students with Disabilities (CSD) at:
Lewis Center 1420, 25 East Jackson Blvd.
Phone number: (312)362-8002
Fax: (312)362-6544
TTY: (773)325-7296