Course Info

DSC 333: Introduction to Big Data Processing

This course will explore different approaches and a framework for performing data analytics on a dynamic, heterogeneous cluster of computing nodes. The course will begin by studying the principles behind MapReduce and the implementation of custom distributed queries using Hadoop. It will then expand to cover higher-level languages and tools within the Hadoop ecosystem (e.g., Pig, Hive) and cluster configuration techniques. Finally, the course will delve into a comparative evaluation of several NoSQL and NewSQL databases that make fundamentally different assumptions about data processing (e.g., OLAP vs. OLTP, disk-bound vs. in-memory, or real-time streaming data). The primary focus of the course will be hands-on implementation and performance tuning for large-scale clusters and data sets.

CSC 355 is a prerequisite for this class.

Fall 2020-2021

Section: 401
Class number: 16273
Meeting time: Tu 5:45PM - 9:00PM
Location: Online: Sync

Section: 410
Class number: 17309
Meeting time: -
Location: Online: Async (Sync-Option)

Spring 2019-2020

Section: 901
Class number: 30934
Meeting time: Tu 5:45PM - 9:00PM
Location: REMOT E0000

Spring 2018-2019

Section: 901
Class number: 37056
Meeting time: Tu 5:45PM - 9:00PM
Location: CDM 00226 at Loop Campus