
COURSE OUTLINE

Session 1

• Course logistics: communication, project organisation. Brief introduction to Git and GitHub and the ideology of continuous integration

• Course outline. How a modern web search engine works (a bird’s-eye view in 30 minutes)

Session 2

• Searching a database. How it differs from substring search

• Database indices

• Boolean search. Processing a Boolean query

Session 3

• Real-world queries. Fixing typos. Edit distance and variations

• Searching real-world texts. Stemming and stopping. Tokenization, n-grams

• Practice

Session 4

• Getting the language right. Language models via Markov chains

• Practice

Session 5

• Ranking problem: the Holy Grail of a modern search engine

• Historical outline of the ranking approaches

• What to optimise for: ranking metrics

Session 6

• Frequency-based approaches to ranking

• PageRank

• Learning-to-rank. Typical factors

Session 7-8

• Practice, Q&A

• Mid-course test

Session 9

• How a typical SERP is organised

• The problem of snippet extraction. Automated summarisation

Session 10

• Practice, Q&A

• Profiling a user for better search results. Location, query history, intents

Session 11

• Collecting the data: web crawling

• Duplicates and near-duplicates, origin vs. copy

• Web crawling challenges. The “Hidden Web”

Session 12

• Practice, Q&A

Session 13

• Automated web search evaluation. Collecting implicit user feedback

• A/B testing

Session 14

• Beyond standard text retrieval: what challenges search engines face when searching for images, tweets, etc.

Session 15

• Course wrap-up, project discussion