NWI-IBC036
Big Data
Course module: NWI-IBC036
Credits (ECTS): 6
Category: BA (Bachelor)
Language of instruction: English
Offered by: Radboud University; Faculty of Science; Informatica en Informatiekunde
Lecturer(s)
Coordinator
prof. dr. ir. A.P. de Vries
Lecturer
prof. dr. ir. A.P. de Vries
Contact person for the course
prof. dr. ir. A.P. de Vries
Examiner
prof. dr. ir. A.P. de Vries
Academic year: 2018
Period
KW3-KW4  (28/01/2019 to 01/09/2019)
Starting block
KW3
Course mode
full-time
Remarks: -
Registration using OSIRIS: Yes
Course open to students from other faculties: Yes
Pre-registration: No
Waiting list: No
Placement procedure: -
Aims
After completing this course, students
  • can explain the system architecture of a data centre and clarify the challenges of programming at data centre scale;
  • can describe the design of distributed filesystems;
  • understand the system architecture of the Map-Reduce and Spark big data platforms;
  • analyse and apply widely used and scalable algorithms, including map-reduce design patterns;
  • understand core techniques and data structures that scale to large data, such as locality-sensitive hashing and inverted files;
  • use the Apache Spark architecture as a basis for solving big data problems.
Content
How do you program a data centre instead of a single computer?
Would you like to find out how internet giants like Amazon, Facebook, Google, Twitter and Netflix build their solutions? This course offers a basic introduction to the techniques for processing (very) large amounts of data efficiently. We cover the motivation for big data analysis, key aspects of large-scale compute infrastructure, algorithms and implementation techniques appropriate for handling large volumes of data, and the fundamental design decisions behind the large-scale software platforms that have evolved over the past decade.
Topics
Big data, large-scale data engineering, access patterns, latency vs. throughput, distributed file systems, MapReduce / Hadoop, Spark, NoSQL, locality-sensitive hashing, inverted files, sharding, streaming, replication, fault tolerance.

Test information
Written exam (two separate tests), practical assignments and a final project.

Prerequisites
Basic programming knowledge (at the level of first-year Computing Science).

Required materials
To be announced
Literature is made available through Brightspace.

Instructional modes
Course occurrence
Attendance mandatory: Yes

Lectures
Attendance mandatory: Yes

Project
Attendance mandatory: Yes

Remark
After a series of pass/fail assignments, you carry out your own big data project on a large web crawl.

Self study

General
The aim of the course is to enhance your practical skills, such as using Spark, GitHub and Docker. Assignments prepare you for a final project that involves a large web crawl (~150 TB) on the national Hadoop cluster of SURFsara.

Tests
Test 1
Test weight: 4
Test type: Test
Opportunities: Block KW4, Block KW4

Test 2
Test weight: 3
Test type: Test
Opportunities: Block KW4, Block KW4

Assignment
Test weight: 3
Test type: Assignment
Opportunities: Block KW4, Block KW4