Module code: ECS640U
Credits: 15
Semester: SEM1
Big Data Processing covers the new large-scale programming models that make it easy to create algorithms that process massive amounts of information on a cluster of computer nodes. These platforms hide the complexity of coordinating parallel computations across the cooperating nodes and instead provide developers with a high-level programming model.
The module is based on the MapReduce programming model. Lectures explain how multiple data analysis algorithms can be expressed under this model and executed automatically over clusters of machines. The module also covers the internal mechanisms that a MapReduce framework uses to coordinate and execute a job across the infrastructure. Finally, additional related topics in the area of Big Data are presented, such as alternative large-scale processing platforms, NoSQL data stores, and Cloud Computing execution infrastructure. In addition to the lectures, weekly lab sessions and coursework exercises present multiple applications in which real-world datasets are analysed using platforms such as Hadoop.
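As a rough illustration of what expressing an algorithm under the MapReduce model can look like, the sketch below counts words in plain Python on a single machine. It is not taken from the module materials: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration, and a real framework such as Hadoop would distribute these steps across a cluster rather than run them in one process.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key. In a real framework this
    # grouping happens between the map and reduce stages, across nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Hypothetical driver that chains the three steps in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Illustrative input only, not a dataset used in the module.
counts = word_count(["big data processing", "big clusters process big data"])
print(counts)
```

The developer writes only the map and reduce functions; the grouping and scheduling in between is the part a MapReduce framework is expected to handle.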
* Note that this module is dual level, i.e. it is taught at both levels 6 and 7. The assessment for the level 6 and level 7 variants differs by at least 1/3, in either the coursework or the exam components, with the higher level variant testing the more advanced learning objectives noted in the relevant module descriptor. Any student who has already studied the level 6 variant may not subsequently study the equivalent level 7 variant. *
Level: 6