COURSE OUTLINE

Session 1

Introduction

NLP pipeline with spaCy.
TF-IDF.
Text analysis with scikit-learn.
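As a preview of the TF-IDF topic, here is a minimal standard-library sketch of one common weighting variant: the raw term count multiplied by log(N / df). The function name tf_idf and the whitespace tokenization are illustrative assumptions. Library implementations such as scikit-learn's TfidfVectorizer use smoothed IDF and vector normalization by default, so their weights will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumed variant: tf = raw count of the term in the document,
    idf = log(N / df), where df = number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

w = tf_idf(["the cat sat", "the dog sat", "the cat ran"])
```

A term that occurs in every document ("the") gets weight 0, while a term found in only one document ("dog") gets the largest IDF.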

Session 2

Language models and text classification

Language models: definition, algorithms, and evaluation.
Text classification: algorithms and feature engineering.
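To illustrate language-model definition and evaluation, here is a small sketch of a bigram model with add-one (Laplace) smoothing that is scored by perplexity. The smoothing choice, the <s> and </s> padding tokens, and all function names are illustrative assumptions. The sketch only handles words seen during training.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])                 # every token used as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Predictable vocabulary: training words plus the end-of-sentence token.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(sentence, contexts, bigrams, vocab):
    """exp of the average negative log-probability per predicted token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(math.log(bigram_prob(p, w, contexts, bigrams, vocab))
                   for p, w in zip(tokens[:-1], tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

model = train_bigram(["the cat sat", "the dog sat"])
```

A lower perplexity means the model finds the sentence less surprising. A word order seen in training should therefore score better than a scrambled one.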

Session 3

Topic modeling

Non-negative matrix factorization (NMF).
Latent Semantic Indexing (LSI).
Latent Dirichlet Allocation (LDA).

Session 4

Word vectors

Distributional hypothesis.
Word2Vec algorithm.
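The distributional hypothesis is put into practice in Word2Vec's skip-gram variant, which trains on (center word, context word) pairs taken from a sliding window. The sketch below only generates those pairs and does no training. The function name skipgram_pairs and the window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)               # clip the window at the sentence start
        hi = min(len(tokens), i + window + 1)  # and at the sentence end
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
```

Words that appear in similar contexts produce similar pairs. This is the signal that pushes their vectors together during training.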

Session 5

Word vectors - 2

GloVe algorithm.
Typical use cases of word vectors in NLP tasks.
Word2Vec in recommendation systems.
Graph analysis using Node2Vec.

Session 6

Neural Networks

Feedforward neural networks.
Computation graph and backpropagation.
Optimization methods.

Session 7

Recurrent Neural Networks

Vanilla RNN.
Neural language models.

Session 8

Recurrent Neural Networks - 2

Vanishing gradients.
LSTM and GRU.
Bidirectional RNN.

Session 9

Contextual Word Embeddings

ELMo and ULMFiT.

Session 10

Attention and transformers

Session 11

Text mining in the wild

Structured datasets.
Public APIs and semi-structured datasets.
Using metadata and link information.
HTML parsing.

Session 12

NLP libraries and models

fastText.
More about spaCy and Gensim.
Pre-trained transformers.

Session 13

Case study: text summarization
(example project)

Session 14

Case study: news aggregator

Session 15

Final project session