Alex Dainiak was born in Moscow in 1985. He first encountered programming in 1998 in a Pascal study group and discovered he loved it. After working for some time as a programmer, he turned to mathematics. He now considers himself a professional tutor and applied mathematician rather than a programmer. Nevertheless, he still writes a reasonable amount of code from time to time and takes part in personal and collective software development projects.
Research/Academic Interests:
Graph Theory, Combinatorics, Data Visualisation, Discrete Optimisation
The main benefit for a student who completes the course is the opportunity to structure the knowledge they have already gained from courses in machine learning, probability, algorithms and data structures. The measurable goal is a working prototype of a search engine that implements all the traditional steps of the data flow.
SKILLS:
- Algorithms
- Computer Science
- Machine Learning
- Discrete Mathematics
- C++
- Research
- Python
- Data Analysis
- Natural Language Processing
DATE: 16 Oct - 3 Nov, 2017
DURATION: 3 Weeks
LECTURES: 3 Hours per day
LANGUAGE: English
LOCATION: Barcelona, Harbour.Space Campus
COURSE TYPE: Offline
WHAT YOU WILL LEARN
ABOUT ALEX
HARBOUR.SPACE
INFORMATION RETRIEVAL AND WEB SEARCH
Web search has been one of the major applications of applied computer science and data analytics tools for the last two decades, and it remains so. We’ll consider the typical architecture of a modern search engine, including crawling unstructured text data, building a data index, information storage, user query analysis and result ranking. Eventually we’ll build a tiny but decent search engine from scratch.
ALEX DAINIAK
HARBOUR.SPACE UNIVERSITY
COURSE OUTLINE
Session 1
• Course logistics: communication, project organisation. Brief introduction to Git and GitHub and the philosophy of continuous integration
• Course outline. How does a modern web search engine work (a bird’s-eye view in 30 minutes)
Session 2
• Searching a database. How it differs from substring search
• Database indices
• Boolean search. Processing a boolean query
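As an illustration of the Session 2 topics, here is a minimal sketch of an inverted index with Boolean query operations. It is not taken from the course materials: the function names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not`) and the three sample documents are assumptions made for this example, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Ids of documents that contain every given term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Ids of documents that contain at least one given term."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())


# Hypothetical toy collection, used only for illustration.
docs = {
    1: "graph theory and combinatorics",
    2: "web search and graph ranking",
    3: "discrete optimisation",
}
index = build_index(docs)
graph_and_search = boolean_and(index, "graph", "search")
```

Unlike substring search, which scans every document for each query, the index is built once and each query then reduces to set operations on the posting sets of its terms.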
Session 3
• Real-world queries. Fixing typos. Edit distance and variations
• Searching real-world texts. Stemming and stopping. Tokenization, n-grams
• Practice
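For the Session 3 topics, the following sketch shows one common form of edit distance (Levenshtein distance, which counts insertions, deletions and substitutions) together with character n-grams. The course may cover other variations; the function names `edit_distance` and `char_ngrams` are assumptions chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def char_ngrams(text, n):
    """All contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


typo_distance = edit_distance("serach", "search")
bigrams = char_ngrams("web", 2)
```

Note that plain Levenshtein distance treats a swap of two adjacent letters, as in "serach", as two edits; variants that count a transposition as a single edit are often preferred for typo correction.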
Session 4
• Getting the language right. Language models via Markov chains
• Practice
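A language model built on a Markov chain can be sketched as a table of bigram counts, where the probability of the next word depends only on the current one. This is an illustrative sketch under simplifying assumptions (whitespace tokenization, no smoothing); the names `train_bigram_model` and `transition_probability`, the boundary markers `<s>` and `</s>`, and the tiny corpus are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count word-to-word transitions, with <s> and </s> as sentence boundaries."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_probability(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][nxt] / total if total else 0.0


# Hypothetical toy corpus, used only for illustration.
corpus = ["web search", "web crawling", "web search engine"]
model = train_bigram_model(corpus)
p_search_given_web = transition_probability(model, "web", "search")
```

In this corpus "web" is followed by "search" in two of its three occurrences, so the estimate is 2/3. Unseen transitions get probability zero here, which is why practical models add some form of smoothing.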