TEXT MINING AND 
TRANSLATION

SERGEY
NIKOLENKO

We offer innovative university degrees taught in English by industry leaders from around the world, aimed at giving our students meaningful and creatively satisfying top-level professional futures. We think the future is bright if you make it so.

Ten years ago, machine learning went through a revolution. While neural networks had been one of the oldest tools in artificial intelligence, people had not really been able to train deep architectures efficiently until the mid-2000s. After the breakthrough results of the groups of Geoffrey Hinton and Yoshua Bengio, however, deep neural architectures quickly outperformed the state of the art in image processing, speech recognition, and natural language processing, and by now they basically define the modern state of machine learning in many different domains, from face recognition and self-driving cars to playing Go. In the course, we will see what makes modern neural networks so powerful, learn to train them properly, go through the most important architectures, and, best of all, learn to implement all of these ideas in code through standard libraries such as TensorFlow and Keras.

ABOUT SERGEY

Senior Researcher

Sergey Nikolenko is a computer scientist with wide experience in machine learning and data analysis, algorithm design and analysis, theoretical computer science, and algebra. He graduated from St. Petersburg State University in 2005, majoring in algebra (Chevalley groups), and earned his Ph.D. at the Steklov Mathematical Institute at St. Petersburg in 2009 in theoretical computer science (circuit complexity and theoretical cryptography). Since then, Dr. Nikolenko has been interested in machine learning and probabilistic modeling, producing theoretical results and working on practical projects for industry. He is currently employed at the Steklov Mathematical Institute at St. Petersburg and the Higher School of Economics at St. Petersburg. Dr. Nikolenko has more than 100 publications, including papers in top computer science journals and conferences and several books.

• Machine learning: probabilistic graphical models, recommender systems, topic modeling

• Algorithms for networking: competitive analysis, FIB optimization

• Bioinformatics: processing mass-spectrometry data, genome assembly

• Proof theory, automated reasoning, computational complexity, circuit complexity

• Algebra (Chevalley groups), algebraic geometry (motives)

SKILLS:

- Machine learning

- Algorithms for networking

- Bioinformatics

- Mathematical Modeling

- Python

WHAT YOU WILL LEARN

• Understand the main problems of natural language processing

• Be able to construct topic models by using standard libraries

• Understand and be able to use different forms of word embeddings

• Learn the structure and composition of encoder-decoder architectures and be able to construct such models in practice

DATE: 8 Jan - 26 Jan, 2018

DURATION: 3 Weeks

LECTURES: 3 Hours per day

LANGUAGE: English

LOCATION: Barcelona, Harbour.Space Campus

COURSE TYPE: Offline

HARBOUR.SPACE UNIVERSITY

@snikolenko

COURSE OUTLINE

Session 1

NLP problems and naive Bayes

Natural language processing: defining the problems. From syntactic to semantic problems. The text classification problem and the naive Bayesian classifier. Tf-idf weights.
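
To give a flavour of Session 1, here is a minimal sketch of tf-idf weighting and a multinomial naive Bayes classifier in plain Python. The whitespace tokenizer, the add-one smoothing, the idf formula log(N / df), and the tiny spam/ham data set are all assumptions made for illustration; the course itself may use different variants and library implementations.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document, with idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior of the class
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                score += math.log(p)  # words are assumed independent given the class
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("buy cheap pills")
weights = tf_idf(train_docs)
```

Terms that occur in fewer documents receive a larger tf-idf weight, which is why such weights are often used instead of raw counts as classifier features.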

Session 2

Extending naive Bayes

Can we remove the naive Bayes assumptions? From classification to clustering. From clustering to topic modeling: probabilistic latent semantic analysis.

Session 3

Topic modeling

Regularised pLSA: additive regularisation of topic models (ARTM). Bayesian pLSA: latent Dirichlet allocation (LDA). LDA extensions: additional dependencies and/or additional information.

Session 4

Practical session

Construct different topic models.
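
As a point of reference for the topic modeling sessions, probabilistic latent semantic analysis (pLSA) can be fitted with a short EM loop. The sketch below is plain Python under stated assumptions: the function name `plsa`, the random initialisation, the fixed number of iterations, and the toy corpus are all hypothetical, and documents are assumed to be non-empty. In practice one would more likely rely on a standard topic modeling library.

```python
import random
from collections import Counter


def plsa(docs, n_topics, n_iters=30, seed=0):
    """Fit pLSA with EM; returns (p_w_given_t, p_t_given_d, vocab)."""
    rng = random.Random(seed)
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({w for c in counts for w in c})

    def random_dist(n):
        raw = [rng.random() + 0.1 for _ in range(n)]
        total = sum(raw)
        return [x / total for x in raw]

    # p_w_given_t[t][w] = p(word | topic), p_t_given_d[d][t] = p(topic | document).
    p_w_given_t = [dict(zip(vocab, random_dist(len(vocab)))) for _ in range(n_topics)]
    p_t_given_d = [random_dist(n_topics) for _ in docs]

    for _ in range(n_iters):
        new_wt = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        new_td = [[0.0] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n_dw in doc_counts.items():
                # E-step: posterior over topics for this (document, word) pair.
                joint = [p_w_given_t[t][w] * p_t_given_d[d][t] for t in range(n_topics)]
                norm = sum(joint)
                for t in range(n_topics):
                    resp = n_dw * joint[t] / norm
                    new_wt[t][w] += resp
                    new_td[d][t] += resp
        # M-step: renormalise the expected counts into distributions.
        for t in range(n_topics):
            total = sum(new_wt[t].values())
            p_w_given_t[t] = {w: v / total for w, v in new_wt[t].items()}
        for d in range(len(docs)):
            total = sum(new_td[d])
            p_t_given_d[d] = [v / total for v in new_td[d]]
    return p_w_given_t, p_t_given_d, vocab


# Hypothetical toy corpus with two obvious themes.
docs = [
    "cat dog pet cat",
    "dog pet cat dog",
    "stock market trade stock",
    "market trade stock market",
]
p_w_given_t, p_t_given_d, vocab = plsa(docs, n_topics=2)

for t, dist in enumerate(p_w_given_t):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {t}: {top_words}")
```

Each row of `p_t_given_d` is a document's mixture over topics, and each entry of `p_w_given_t` is a topic's distribution over the vocabulary. Regularised and Bayesian variants such as ARTM and LDA change how these two sets of distributions are estimated, not what they represent.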

BIBLIOGRAPHY

Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze

Natural language processing is one of the most challenging parts of artificial intelligence. It encompasses many different problems, from well-defined classification problems to rather vague tasks that involve text generation. In the course, we will go over some of the most common NLP problems, including text classification, topic modeling, and sentiment analysis. But we will pay the most attention to modern deep learning approaches that use word embeddings and/or character-based models. We will consider encoder-decoder architectures and architectures with attention, specifically in application to machine translation and similar problems.
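
To illustrate the attention idea mentioned above, here is a minimal sketch of a single attention step in plain Python: the decoder state is scored against every encoder state, the scores are normalised with a softmax, and the context vector is the resulting weighted average. Dot-product scoring is only one common choice and is an assumption here, as are the function names and the toy 2-dimensional vectors; real models learn these states with neural networks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Dot-product score between the decoder state and each encoder state.
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted average of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Hypothetical encoder states for a three-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
```

The weights sum to one, so the context vector stays on the same scale as the encoder states while emphasising the source positions most relevant to the current decoding step.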
