OSIRIS - Course catalogue NWI-IBC036 2019

Course

NWI-IBC036

Credits (ECTS)

Category

BA (Bachelor)

Language of instruction

English

Offered by

Radboud Universiteit; Faculteit der Natuurwetenschappen, Wiskunde en Informatica; Informatica en Informatiekunde;

Lecturers

Coordinator		prof. dr. ir. A.P. de Vries
Lecturer		prof. dr. ir. A.P. de Vries
Course contact person		prof. dr. ir. A.P. de Vries
Examiner		prof. dr. ir. A.P. de Vries

Academic year

2019

Period

KW3-KW4

(03-02-2020 to 30-08-2020)

Starting block

KW3

Mode of study

full-time

Remark

Registration via OSIRIS

Registration for students from other programmes

Pre-registration

No

Waiting list

No

Placement procedure

Course objectives

After completing this course, students

can explain the system architecture of a data centre and clarify the challenges of programming at data centre scale;
can describe the design of distributed filesystems;
understand the system architecture of the Map-Reduce and Spark big data platforms;
can analyse and apply widely used and scalable algorithms, including map-reduce design patterns;
understand core techniques and data structures that scale to large data, such as locality-sensitive hashing and inverted files;
can use the Apache Spark architecture as a basis for solving big data problems.
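
As an illustration of two of the objectives above (map-reduce design patterns and inverted files), the following is a minimal single-machine sketch in plain Python. It is not taken from the course material: the toy corpus and the function names (map_phase, shuffle, reduce_phase, build_inverted_index) are assumptions made for this example, and a real platform such as Hadoop or Spark would distribute these phases over many machines.

```python
from collections import defaultdict

# Hypothetical toy corpus (document id -> text); not from the course material.
DOCS = {
    1: "big data on a big cluster",
    2: "data centre scale programming",
}

def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, doc_ids):
    """Turn the grouped document ids of one term into a sorted postings list."""
    return term, sorted(doc_ids)

def build_inverted_index(docs):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())

index = build_inverted_index(DOCS)
print(index["data"])  # postings list: ids of the documents containing "data"
```

The resulting dictionary maps each term to the list of documents that contain it, which is the basic shape of an inverted file. The map and reduce functions share no state, which is what would allow a framework to run them in parallel.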

Content

How to program a data centre instead of a single computer?

Would you like to find out how internet giants like Amazon, Facebook, Google, Twitter and Netflix build their solutions? This course offers an introduction to the techniques for processing (very) large amounts of data efficiently. We cover the motivation for big data analysis, key aspects of large-scale compute infrastructure, algorithms and implementation techniques appropriate for handling large volumes of data, and the fundamental design decisions behind the large-scale software platforms that have evolved over the past decade.

Level

Prerequisites

Basic programming knowledge (at the level of first-year Computing Science).

Assessment information

Written exam (two separate tests), practical assignments and a final project.

Additional information

Topics

Big data, large-scale data engineering, access patterns, latency vs. throughput, distributed file systems, MapReduce / Hadoop, Spark, NoSQL, locality-sensitive hashing, inverted files, sharding, streaming, replication, fault tolerance.

Required materials

To be announced

Literature is made available through Brightspace.

Teaching formats

Course event

Mandatory attendance

Exam Q4

Lectures

Mandatory attendance

Project

Mandatory attendance

Remark

After a series of pass/fail assignments, you carry out your own big data project on a large web crawl.

Resit Exam Q4

Self study

General

The course focuses on strengthening practical skills, including the use of Spark, GitHub and Docker. Assignments prepare students for a final project in which they work with a large web crawl (~150 TB) on the national Hadoop cluster of SurfSara.

Tests

Test 1

Weighting

Test type

Test

Opportunities

Block KW3, Block KW4

Digital test 2

Weighting

Test type

Digital test with CIRRUS

Opportunities

Block KW4, Block KW4

Project

Weighting

Test type

Project

Opportunities

Block KW4, Block KW4