TEXT MINING
SERGEY KHOROSHENKIKH
The field of Natural Language Processing (NLP) has gained increasing attention in recent years thanks to impressive algorithmic advances in Deep Learning and significant progress in hardware.
Text Mining is a subset of NLP focused on unsupervised and semi-supervised algorithms for text analysis.
The course covers the main algorithms and concepts of Text Mining, including both “classical” methods from the Information Retrieval domain (such as TF-IDF and topic modeling) and modern Deep Learning architectures.
ABOUT SERGEY
Sergey Khoroshenkikh is a senior software engineer with 5 years of experience in applied machine learning and data analysis. He graduated from the Moscow Institute of Physics and Technology in 2015 and is now pursuing a PhD there in the area of random geometric graphs.
Currently he works in the R&D department at Yandex, developing large-scale machine learning solutions for web advertising (currently the company’s main source of income).
WHAT YOU WILL LEARN
Students will learn:
- What types of problems can be solved with Text Mining
- Which algorithms are used for various Text Mining problems
- How to use practical tools for Text Mining
SKILLS:
- Python programming language
- Calculus and optimisation
- Probability
- Linear algebra
DATE: 18 May - 5 Jun, 2020
DURATION: 3 Weeks
LECTURES: 3 Hours per day
LANGUAGE: English
LOCATION: Barcelona, Harbour.Space Campus
COURSE TYPE: Offline
All rights reserved. 2017
COURSE OUTLINE
BIBLIOGRAPHY
“Introduction to Information Retrieval” by Christopher D. Manning et al., Cambridge University Press (2008)
“The Hundred-Page Machine Learning Book” by Andriy Burkov, self-published (2019)
Session 1
Introduction
NLP pipeline with spaCy.
TF-IDF.
Text analysis with scikit-learn.
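To make the TF-IDF topic of Session 1 concrete, here is a minimal standard-library sketch. It assumes raw term counts for tf and the smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each document vector; this mirrors the default weighting of scikit-learn’s `TfidfVectorizer`, but the whitespace tokenizer here is far cruder than a spaCy or scikit-learn pipeline. The toy documents are made up for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf = raw count, idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    then each document vector is L2-normalised.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = {t: cnt * idf[t] for t, cnt in tf.items()}
        norm = math.sqrt(sum(w * w for w in row.values()))
        rows.append({t: w / norm for t, w in row.items()} if norm else row)
    return rows

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
for row in tfidf(docs):
    # Three highest-weighted terms per document.
    print(sorted(row.items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur in fewer documents (such as “cat”) get a higher idf than terms shared across documents (such as “sat”); without stop-word removal, a frequent word like “the” can still score highly through its raw count.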
Session 2
Language models and text classification
Language models: definition, algorithms, and evaluation.
Text classification: algorithms and feature engineering.
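As an illustration of the language-model topics in Session 2 (the specific algorithms taught are not listed here), the sketch below trains a bigram model with add-one (Laplace) smoothing and evaluates it by perplexity. The `<s>`, `</s>` and `<unk>` markers and the mapping of unseen words to `<unk>` are assumptions of this sketch; they keep every conditional distribution normalised over the vocabulary.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"

def pad(sentence, vocab=None):
    """Lowercase, split on whitespace, map unseen words to <unk>, add boundaries."""
    toks = sentence.lower().split()
    if vocab is not None:
        toks = [t if t in vocab else UNK for t in toks]
    return [BOS] + toks + [EOS]

def train_bigram(sentences):
    contexts, bigrams = Counter(), Counter()
    vocab = {EOS, UNK}  # every token the model can predict (<s> is never predicted)
    for sent in sentences:
        toks = pad(sent)
        vocab.update(toks[1:])
        for prev, cur in zip(toks, toks[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def prob(model, prev, cur):
    contexts, bigrams, vocab = model
    # Add-one smoothing: unseen bigrams still get non-zero probability.
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))

def perplexity(model, sentences):
    """exp of the average negative log-probability per predicted token."""
    _, _, vocab = model
    log_prob, n = 0.0, 0
    for sent in sentences:
        toks = pad(sent, vocab)
        for prev, cur in zip(toks, toks[1:]):
            log_prob += math.log(prob(model, prev, cur))
            n += 1
    return math.exp(-log_prob / n)

train = ["the cat sat on the mat", "the dog sat on the log"]
model = train_bigram(train)
print(perplexity(model, ["the cat sat on the log"]))  # plausible word order
print(perplexity(model, ["log the on sat cat the"]))  # scrambled: higher perplexity
```

Lower perplexity means the model finds the text less surprising, which is why it is a common intrinsic evaluation metric for language models.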
Session 3
Topic modeling
Non-negative matrix factorization (NMF).
Latent Semantic Indexing (LSI).
Latent Dirichlet Allocation (LDA).
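For the NMF item in Session 3, a minimal pure-Python sketch follows. It factorises a toy document-term count matrix V ≈ W·H with Lee and Seung’s multiplicative updates for the squared Frobenius loss; this solver choice is an assumption (libraries such as scikit-learn offer other NMF solvers), and the vocabulary and counts are invented for illustration. Each row of H can be read as a “topic” over terms.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=1000, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x k) @ H (k x m) using
    multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * x / (y + eps) for h, x, y in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * x / (y + eps) for w, x, y in zip(*rows)] for rows in zip(W, num, den)]
    return W, H

terms = ["goal", "match", "team", "vote", "election", "party"]
# Rows are documents, columns are term counts (toy data).
V = [
    [2, 1, 3, 0, 0, 0],
    [4, 2, 6, 0, 0, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 4, 2],
]
W, H = nmf(V, k=2)
for t, row in enumerate(H):
    top = sorted(range(len(terms)), key=lambda j: -row[j])[:3]
    print(f"topic {t}:", [terms[j] for j in top])
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what makes the factors readable as additive topics.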
Session 4
Word vectors
Distributional hypothesis.
Word2Vec algorithm.
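The distributional hypothesis in Session 4 says that words occurring in similar contexts tend to have similar meanings. Word2Vec learns dense vectors by prediction; the simpler sketch below is a count-based stand-in (not Word2Vec) that shows the hypothesis directly: each word is represented by counts of its neighbours within a window, and vectors are compared by cosine similarity. The window size of 2 and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, word in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][toks[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "she drives a fast car to work",
    "she drives a fast truck to work",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["coffee"], vecs["tea"]))  # 1.0 (identical contexts)
print(cosine(vecs["coffee"], vecs["car"]))  # 0.0 (no shared contexts)
```

On real corpora these count vectors are huge and sparse; methods such as Word2Vec compress the same contextual signal into short dense vectors.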
HARBOUR.SPACE
Harbour.Space is a university created by entrepreneurs for entrepreneurs. We focus on meeting the demands of the future, while traditional education providers are too often stuck in the past.
We’re one of the only European institutions completely dedicated to technology, design and entrepreneurship, and our interdisciplinary courses are taught by some of today’s leading professionals. Our aim is not only to equip students with the knowledge to take on the real world, but to nurture, create and shape tomorrow’s tech superstars.