Course Info

DSC 333: Introduction to Big Data Processing

Due to COVID-19, Spring in-class sections will be conducted online. Please contact your instructor for more information.

This course explores approaches and a framework for performing data analytics on a dynamic, heterogeneous cluster of computing nodes. It begins with the principles behind MapReduce and the implementation of custom distributed queries using Hadoop. It then expands to cover higher-level languages and tools within the Hadoop ecosystem (e.g., Pig, Hive) and cluster configuration techniques. Finally, the course offers a comparative evaluation of several NoSQL and NewSQL databases that make fundamentally different assumptions about data processing (e.g., OLAP vs. OLTP, disk-bound vs. in-memory, or real-time streaming data). The primary focus of the course is hands-on implementation and performance tuning for large-scale clusters and data sets.

CSC 355 is a prerequisite for this class.

Fall 2020-2021

Section: 401
Class number: 16273
Meeting time: TuTh 10:10AM - 11:40AM
Location: TBA at Loop Campus
Instructor:

Spring 2019-2020

Section: 901
Class number: 30934
Meeting time: Tu 5:45PM - 9:00PM
Location: Remote

Spring 2018-2019

Section: 901
Class number: 37056
Meeting time: Tu 5:45PM - 9:00PM
Location: CDM 00226 at Loop Campus
CLOSED