RSS02.E1 Big Data: Collecting Web Data with R
In recent years, more and more information useful for social science research has become available online. For example, political organizations publish press releases and recordings of their meetings online. NGOs publish policy briefings and report indicators of economic, social, and political developments. And social media users produce large amounts of online content every day.
The increasing availability of such information online enables new types of research in the social sciences. Yet extracting this information and reshaping it into data formats ready for downstream analyses can be challenging. This makes web data collection skills essential for researchers.
The goal of this course is to equip participants with the R programming skills necessary to gather online data and process them into formats they can use in their research.
Participants will learn:
- about the characteristics of web data
- how to extract data via Application Programming Interfaces (APIs), including those maintained by popular social media platforms such as Instagram or Twitter
- how to scrape content from different types of webpages
|Dates||26 June 2023 - 30 June 2023|
Early Bird Regular: €895 (application deadline* April 1st)
|Scholarships and discounts||Find more information here|
*Your application is complete only once the course fee has been paid.
|Course leader||Hauke Licht|
|Level of participant||
|Admission requirements||Participants require prior experience with programming in R. Specifically, they should know how to create and manipulate vector, list, and data frame objects; how to write for loops; how to use the functions in base R’s apply family (mainly lapply and sapply) or the map functions in the “purrr” package; how to import data files into and export them from an R session (e.g., with read.csv or saveRDS, or with functions in the “readr” package); and the basics of manipulating character vectors and values in R.
Moreover, participants will need to complete some rather technical setup tasks before and during the course. These include applying for and establishing API access (“authentication”) and setting up the “RSelenium” package in Docker for scraping dynamic webpages. The instructor will, however, provide guidelines and support for these steps.
Further, participants should be willing and motivated to engage with different web technologies.
|Mode of Study||On Campus|
|ECTS||2 or 4 Find more information here|