COURSE OUTLINE
Session 1
Introduction
NLP pipeline with spaCy.
TF-IDF.
Text analysis with scikit-learn.
Session 2
Language models and text classification
Language models: definition, algorithms, and evaluation.
Text classification: algorithms and feature engineering.
Session 3
Topic modeling
Non-negative matrix factorization (NMF).
Latent Semantic Indexing (LSI).
Latent Dirichlet Allocation (LDA).
Session 4
Word vectors
Distributional hypothesis.
Word2Vec algorithm.
Session 5
Word vectors - 2
GloVe algorithm.
Typical use cases of word vectors in NLP tasks.
Word2Vec in recommendation systems.
Analysis of graphs using Node2Vec.
Session 6
Neural networks
Feedforward neural networks.
Computation graph and backpropagation.
Optimization methods.
Session 7
Recurrent neural networks
Vanilla RNN.
Neural language models.
Session 8
Recurrent neural networks - 2
Vanishing gradients.
LSTM and GRU.
Bidirectional RNN.
Session 9
Contextual word embeddings
ELMo and ULMFiT.
Session 10
Attention and transformers
Session 11
Text mining in the wild
Structured datasets.
Public APIs and semi-structured datasets.
Using metadata and link information.
HTML parsing.
Session 12
NLP libraries and models
fastText.
More about spaCy and Gensim.
Pre-trained transformers.
Session 13
Case study: text summarization
(example project)
Session 14
Case study: news aggregator
Session 15
Final project session