After completing this course, students
- can explain the system architecture of a data centre and articulate the challenges of programming at data-centre scale;
- can describe the design of distributed filesystems;
- understand the system architecture of the Map-Reduce and Spark big data platforms;
- can apply widely used, scalable algorithms, including map-reduce design patterns (a word-count sketch follows this list);
- understand core techniques and data structures that scale to large data, including Bloom filters (see the second sketch below), locality-sensitive hashing and inverted files;
- can use the Apache Spark architecture as a basis for solving big data problems.
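To give a flavour of the map-reduce style, here is a minimal word-count job expressed in PySpark. This is an illustrative sketch, not official course material; the application name and the input path "input.txt" are placeholders.

    # Word count: the canonical map-reduce design pattern, in PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount").getOrCreate()

    lines = spark.sparkContext.textFile("input.txt")      # placeholder input path
    counts = (lines.flatMap(lambda line: line.split())    # map: split lines into words
                   .map(lambda word: (word, 1))           # map: emit (word, 1) pairs
                   .reduceByKey(lambda a, b: a + b))      # reduce: sum counts per word

    for word, n in counts.take(10):                       # inspect a few results
        print(word, n)
    spark.stop()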
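Similarly, here is a minimal Bloom filter in plain Python, illustrating the kind of probabilistic data structure the course covers. The bit-array size m, the number of hash functions k, and the salted SHA-256 hashing scheme are illustrative choices, not the course's implementation.

    # Bloom filter: a compact set-membership structure that may yield
    # false positives but never false negatives.
    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=3):
            self.m = m                    # number of bits (illustrative default)
            self.k = k                    # number of hash functions (illustrative)
            self.bits = [False] * m

        def _positions(self, item):
            # Derive k bit positions from salted SHA-256 digests.
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            # False means definitely absent; True means possibly present.
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("spark")
    print(bf.might_contain("spark"))      # True
    print(bf.might_contain("hadoop"))     # False, with high probability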
How do you program a data centre rather than a single computer?
Would you like to find out how internet giants such as Amazon, Facebook, Google, Twitter and Netflix build their solutions? This course offers an introduction to the techniques for processing (very) large amounts of data efficiently. We cover the motivation for big data analysis, key aspects of large-scale compute infrastructure, algorithms and implementation techniques suited to handling large volumes of data, and the fundamental design decisions behind the large-scale software platforms that have evolved over the past decade.
Instructional Modes
Basic programming knowledge (at the level of first-year Computing Science).
Written exam (two separate tests), practical assignments and a final project. Both tests and the project must each receive a minimum grade of 5.0.
The tests contribute 40% and 30% to the final grade; the project contributes the remaining 30%. The assignments are graded pass/fail and are intended as preparation for the project.
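In formula form (assuming the first test carries the 40% weight, which the description leaves implicit):

    final grade = 0.40 × test 1 + 0.30 × test 2 + 0.30 × project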