COURSE OUTLINE
Session 1
• Course logistics: communication, project organisation. Brief introduction to Git and GitHub and the philosophy of continuous integration
• Course outline. How a modern web search engine works (a bird’s-eye view in 30 minutes)
Session 2
• Searching a database. How it differs from substring search
• Database indices
• Boolean search. Processing a boolean query
Session 3
• Real-world queries. Fixing typos. Edit distance and variations
• Searching real-world texts. Stemming and stopping. Tokenization, n-grams
• Practice
Session 4
• Getting the language right. Language models via Markov chains
• Practice
Session 5
• Ranking problem: the Holy Grail of a modern search engine
• Historical outline of the ranking approaches
• What to optimise for: ranking metrics
Session 6
• Frequency-based approaches to ranking
• PageRank
• Learning-to-rank. Typical factors
Sessions 7-8
• Practice, Q&A
• Mid-course test
Session 9
• How a typical SERP is organised
• The problem of snippet extraction. Automated summarisation
Session 10
• Practice, Q&A
• Profiling a user for better search results. Location, query history, intents
Session 11
• Collecting the data: web crawling
• Duplicates and near-duplicates, original vs. copy
• Web crawling challenges. The “Hidden Web”
Session 12
• Practice, Q&A
Session 13
• Automated web search evaluation. Collecting implicit user feedback
• A/B testing
Session 14
• Beyond standard text retrieval: what challenges search engines face when searching for images, tweets, etc.
Session 15
• Course wrap-up, project discussion