TEXT MINING AND TRANSLATION
SERGEY NIKOLENKO
HARBOUR.SPACE
We offer innovative university degrees taught in English by industry leaders from around the world, aimed at giving our students meaningful and creatively satisfying top-level professional futures. We think the future is bright if you make it so.
Ten years ago, machine learning went through a revolution. Although neural networks are among the oldest tools in artificial intelligence, deep architectures could not be trained efficiently until the mid-2000s. After the breakthrough results of the groups of Geoffrey Hinton and Yoshua Bengio, however, deep neural architectures quickly outperformed the state of the art in image processing, speech recognition, and natural language processing, and by now they essentially define the modern state of machine learning in many different domains, from face recognition and self-driving cars to playing Go. In this course, we will see what makes modern neural networks so powerful, learn to train them properly, go through the most important architectures, and, best of all, learn to implement all of these ideas in code with standard libraries such as TensorFlow and Keras.
ABOUT SERGEY
@snikolenko
Sergey Nikolenko is a computer scientist with wide experience in machine learning and data analysis, algorithm design and analysis, theoretical computer science, and algebra. He graduated from St. Petersburg State University in 2005, majoring in algebra (Chevalley groups), and earned his Ph.D. at the Steklov Mathematical Institute at St. Petersburg in 2009 in theoretical computer science (circuit complexity and theoretical cryptography). Since then, Dr. Nikolenko has been interested in machine learning and probabilistic modeling, producing theoretical results and working on practical projects for industry. He is currently employed at the Steklov Mathematical Institute at St. Petersburg and the Higher School of Economics at St. Petersburg. Dr. Nikolenko has more than 100 publications, including papers in top computer science journals and conferences, as well as several books. His research interests include:
• Machine learning: probabilistic graphical models, recommender systems, topic modeling
• Algorithms for networking: competitive analysis, FIB optimization
• Bioinformatics: processing mass-spectrometry data, genome assembly
• Proof theory, automated reasoning, computational complexity, circuit complexity
• Algebra (Chevalley groups), algebraic geometry (motives)
WHAT YOU WILL LEARN
• Understand the main problems of natural language processing
• Be able to construct topic models using standard libraries
• Understand and be able to use different forms of word embeddings (a short sketch follows this list)
• Learn the structure and composition of encoder-decoder architectures and be able to construct such models in practice
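To make the word-embedding item above concrete, here is a minimal sketch of training word2vec embeddings; it assumes gensim 4.x, and the toy corpus is purely illustrative, not part of the course materials:

# Minimal word2vec sketch (assumes gensim >= 4.0; toy corpus is illustrative).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# Each word is mapped to a dense vector; words appearing in similar contexts
# end up with similar vectors (on a real corpus, not this tiny one).
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)
print(model.wv.most_similar("cat", topn=3))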
SKILLS:
- Machine learning
- Algorithms for networking
- Bioinformatics
- Mathematical Modeling
- Python
DATE: 8 Jan - 26 Jan, 2018
DURATION: 3 Weeks
LECTURES: 3 Hours per day
LANGUAGE: English
LOCATION: Barcelona, Harbour.Space Campus
COURSE TYPE: Offline
COURSE OUTLINE
Session 1
NLP problems and naive Bayes
Natural language processing: defining the problems. From syntactic to semantic problems. The text classification problem and the naive Bayesian classifier. Tf-idf weights.
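As a concrete taste of this session, here is a minimal sketch of a tf-idf plus naive Bayes text classifier; it assumes scikit-learn, and the toy dataset is purely illustrative:

# Minimal tf-idf + multinomial naive Bayes sketch (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it",
         "terrible film, waste of time",
         "wonderful acting and plot",
         "boring and predictable"]
labels = ["pos", "neg", "pos", "neg"]

# TfidfVectorizer turns raw text into tf-idf weighted bag-of-words vectors;
# MultinomialNB then applies the naive independence assumption over tokens.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["what a wonderful movie"]))  # expected: ['pos']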
Session 2
Extending naive Bayes
Can we remove the naive Bayes assumptions? From classification to clustering. From clustering to topic modeling: probabilistic latent semantic analysis.
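To make the pLSA step concrete, here is a minimal numpy sketch of the EM algorithm for probabilistic latent semantic analysis over a toy term-document count matrix; all sizes and data are illustrative assumptions:

# Minimal EM for pLSA over a toy term-document count matrix (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_dw = np.array([[3., 0., 1., 0.],   # counts n(d, w): 4 documents x 4 words
                 [2., 1., 0., 0.],
                 [0., 0., 2., 3.],
                 [0., 1., 1., 2.]])
D, W = n_dw.shape
T = 2                                 # number of latent topics

phi = rng.random((T, W)); phi /= phi.sum(1, keepdims=True)        # p(w | t)
theta = rng.random((D, T)); theta /= theta.sum(1, keepdims=True)  # p(t | d)

for _ in range(100):
    # E-step: responsibilities p(t | d, w), shape (D, W, T)
    p = theta[:, None, :] * phi.T[None, :, :]
    p /= p.sum(axis=2, keepdims=True) + 1e-12
    # M-step: re-estimate phi and theta from expected topic counts
    nt = n_dw[:, :, None] * p
    phi = nt.sum(axis=0).T
    phi /= phi.sum(axis=1, keepdims=True)
    theta = nt.sum(axis=1)
    theta /= theta.sum(axis=1, keepdims=True)

print(np.round(phi, 2))   # word distributions of the two recovered topics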
Session 3
Topic modeling
Regularised pLSA: additive regularisation of topic models (ARTM). Bayesian pLSA: latent Dirichlet allocation (LDA). LDA extensions: additional dependencies and/or additional information.
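For a sense of how LDA is used in practice with a standard library, here is a minimal sketch with gensim; the tiny corpus and the number of topics are illustrative assumptions:

# Minimal LDA sketch with gensim (toy corpus and topic count are illustrative).
from gensim import corpora, models

texts = [["cat", "dog", "pet", "vet"],
         ["dog", "bone", "pet", "leash"],
         ["stock", "market", "price", "trade"],
         ["market", "trade", "price", "index"]]

dictionary = corpora.Dictionary(texts)             # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]    # bag-of-words counts
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

for topic_id, words in lda.print_topics():
    print(topic_id, words)   # top words of each recovered topic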
Session 4
Practical session
Construct different topic models.
Natural language processing is one of the most challenging parts of artificial intelligence. It encompasses many different problems, from well-defined classification tasks to rather vague problems that involve text generation. In this course, we will go over some of the most common NLP problems, including text classification, topic modeling, and sentiment analysis, but we will pay the most attention to modern deep learning approaches that use word embeddings and/or character-based models. We will consider encoder-decoder architectures and architectures with attention, specifically as applied to machine translation and similar problems.
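To illustrate the encoder-decoder idea, here is a minimal Keras sketch of a sequence-to-sequence model, without the attention mechanism discussed in the course; all vocabulary and layer sizes are illustrative assumptions, not course specifications:

# Minimal encoder-decoder (seq2seq) sketch in Keras, without attention.
# All sizes are illustrative; a real translation model also needs data
# pipelines, teacher forcing during training, and a decoding loop at
# inference time.
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, emb_dim, units = 5000, 5000, 128, 256  # assumed sizes

# Encoder: embed the source tokens and compress them into the LSTM state.
enc_inputs = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(src_vocab, emb_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: generate target tokens conditioned on the encoder's final state.
dec_inputs = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(tgt_vocab, emb_dim)(dec_inputs)
dec_seq = layers.LSTM(units, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(tgt_vocab, activation="softmax")(dec_seq)

model = Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()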