COURSE OUTLINE

Session 1

• Course logistics: communication, project organisation. Brief introduction to Git and GitHub and the ideology of continuous integration

• Course outline. How a modern web search engine works (a bird’s-eye view in 30 minutes)

Session 2

• Searching a database. How it differs from substring search

• Database indices

• Boolean search. Processing a Boolean query

Session 3

• Real-world queries. Fixing typos. Edit distance and variations

• Searching real-world texts. Stemming and stopping. Tokenization, n-grams

• Practice

Session 4

• Getting the language right. Language models via Markov chains

• Practice

Session 5

• Ranking problem: the Holy Grail of a modern search engine

• Historical outline of the ranking approaches

• What to optimise for: ranking metrics

Session 6

• Frequency-based approaches to ranking

• PageRank

• Learning-to-rank. Typical factors

Sessions 7-8

• Practice, Q&A

• Mid-course test

Session 9

• How a typical SERP is organised

• The problem of snippet extraction. Automated summarisation

Session 10

• Practice, Q&A

• Profiling a user for better search results. Location, query history, intents

Session 11

• Collecting the data: web crawling

• Duplicates and near-duplicates, origin vs. copy

• Web crawling challenges. The “Hidden Web”

Session 12

• Practice, Q&A

Session 13

• Automated web search evaluation. Collecting implicit user feedback

• A/B testing

Session 14

• Beyond standard text retrieval: what challenges search engines face searching for images, tweets, etc.

Session 15

• Course wrap-up, project discussion