Computers that search

How do search engines and recommendation systems work?
Duration
2024 until 2024
Project member(s)
Dr H.R. Oosterhuis (Harrie)
Project type
Education

We have come to depend heavily on major search engines (such as Google, Bing and Yahoo) for finding information and navigating the vast internet. Every day we also receive dozens of suggestions from recommendation systems about news that we should read, videos that we must watch, music that we must listen to and products that we should buy. It is essential to both the users of these systems and the companies that make money from them that these systems work properly. This project is based on Harrie Oosterhuis’ research into an improved method for search engines and recommendation systems. The project introduces pupils to the workings of search engines and recommendation systems, without making use of a computer. This helps them to discover the answers to the following questions:

  • How do search engines know which of the billions of web pages you are looking for?
  • And how do they know this in a split second?
  • How does a recommendation system figure out what you like?
  • When is this easy or difficult to figure out?
  • What are the disadvantages of these large systems?
  • And could we actually live without them?

Classroom project

The ‘Computers That Search’ project involves a number of activities that will be carried out in the classroom, and which are designed to provide insight into how internet search engines and recommendation systems work. Recommendation systems are programs that ensure that certain suggestions are made when you are using websites and apps like YouTube, Netflix and TikTok.

Following an introduction by Harrie Oosterhuis, pupils will make their first acquaintance with search engines. But instead of using a computer to search for information, they will carry out a manual search. This will show them how difficult it is to find the information that they need in a mass of disorganised information. In the next step, the pupils will once again search for information, but this time they will be allowed to use indexes, which search engines also use when searching for information.
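To illustrate why an index helps, here is a minimal sketch in Python of an inverted index, a common kind of index that maps each word to the pages containing it. The page names, page texts and the function names `build_index` and `search` are made up for this illustration; real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical mini "web" of three pages; names and texts are invented.
pages = {
    "page_a": "cats and dogs are popular pets",
    "page_b": "dogs need daily walks",
    "page_c": "cats sleep most of the day",
}


def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then keep only those
    # pages that also contain each of the remaining words.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "dogs")))       # ['page_a', 'page_b']
print(sorted(search(index, "cats dogs")))  # ['page_a']
```

Without the index, every page would have to be read in full for every search. With it, a search only needs to look up the words of the query.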

The next step involves recommendation systems. Once the pupils have been given an explanation of the difference between search engines and recommendation systems, they will simulate a recommendation system themselves. They will do this by collectively trying to establish a preference that is based on previous choices, in the same way as recommendation systems do. They will then learn about the various principles that computers also use when making recommendations.

In between the activities, the parallel between the activities and computers will also be discussed and explained. Learning about the principles that the computer works with without actually using a computer will give the pupils insight into how computer processes work.

About Harrie Oosterhuis’ research

Search engines and recommendation systems use ‘machine learning’ methods to learn from users’ behaviour, so that they can automatically adapt to their preferences. These self-learning methods basically work as follows: they look at very large datasets, which contain many examples of choices that users have made, and they then try to recognise patterns in them. For example, they look at which series Netflix users have watched and then identify the user and series properties that usually go together. This means that the system tries to predict whether a series and a user are a good match. The more accurate this prediction is, the better the service’s recommendations will be.
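The idea of spotting what ‘usually goes together’ can be illustrated with a toy example. The Python sketch below is not the method used by any real service or in the research described here. It is a deliberately simple assumption: it recommends series watched by users whose viewing history overlaps with yours. All user names, series titles and the function name `recommend` are invented.

```python
from collections import Counter

# Hypothetical viewing histories; users and series names are invented.
histories = {
    "ann": {"Space Saga", "Robot Wars", "Baking Show"},
    "ben": {"Space Saga", "Robot Wars"},
    "cas": {"Baking Show", "Garden Time"},
    "dee": {"Space Saga", "Robot Wars", "Star Quest"},
}


def recommend(user, histories):
    """Suggest unseen series, ranked by overlap with other users' histories."""
    seen = histories[user]
    scores = Counter()
    for other, other_seen in histories.items():
        if other == user:
            continue
        # The number of shared series acts as a simple similarity score.
        overlap = len(seen & other_seen)
        for series in other_seen - seen:
            scores[series] += overlap
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [series for series, score in ranked if score > 0]


print(recommend("ben", histories))  # ['Baking Show', 'Star Quest']
```

Ben has watched the same two series as Ann and Dee, so the sketch suggests the series that they watched and he did not. Nothing is suggested on the basis of Cas, whose history has no overlap with Ben’s.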

When it comes to learning from user behaviour, the problem is that it is influenced by a lot of factors that have nothing to do with preference. There may be several reasons why someone buys product X instead of product Y. It could be that this person actually prefers product X, but it may also be that product X has been recommended more often or that they have never been exposed to product Y. In many cases, user behaviour says more about the recommendations that were made or the search results that were displayed (presentation factors) than about users’ actual preferences (preference effects).

The focus of Harrie Oosterhuis’ research is on separating the effects that preferences and presentation factors (i.e., the role of recommendations and search results) have on user behaviour. There are two common approaches to this: a frequency method and a regression method. The frequency method starts from how many users click on an item. On the basis of various data, an estimate is then made of how many users considered clicking on the corresponding result. This helps to determine the percentage of users who would click on the item if all of the users were to consider it. For example, say that 12% of users clicked on a search result, but since it’s at the bottom of the list, we believe that only 40% of the users considered it. The frequency method accounts for the 60% of users who never considered it and estimates that 30% of all users preferred the result (12% divided by 40% is 30%). The regression method looks at certain characteristics of the result, for example, how many words of the search query appear in the result, and estimates how many users will click on the basis of these. Harrie Oosterhuis has devised a method that combines both approaches: it examines both the click frequency and the characteristics of the results. The uniqueness of the new method lies in the fact that it is mathematically substantiated, which makes it possible to prove that, under certain conditions, the method is guaranteed to give the right answer if enough data is available.
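The arithmetic of the frequency method in the example above can be checked with a few lines of Python. This is only a sketch of that single calculation, not of the combined method. The function name `estimate_preference` and its parameter names are chosen for this illustration.

```python
def estimate_preference(click_rate, consideration_rate):
    """Correct an observed click rate for how many users considered the result.

    click_rate: fraction of all users who clicked on the result.
    consideration_rate: estimated fraction of all users who considered it.
    """
    if not 0 < consideration_rate <= 1:
        raise ValueError("consideration_rate must be greater than 0 and at most 1")
    return click_rate / consideration_rate


# Numbers from the example: 12% clicked, 40% considered the result.
estimate = estimate_preference(click_rate=0.12, consideration_rate=0.40)
print(f"{estimate:.0%}")  # 30%
```

The lower the share of users who considered a result, the more each observed click counts. This is how the method compensates for results that are shown in less visible positions.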

Because the new method requires much less data than the old methods, smaller companies that have a small user base can also benefit from it. But large tech companies can also take greater advantage of this method, because it will enable them to learn much more efficiently from their customers, which will allow them to adapt to user preferences more quickly.