Topic Overview
(*) indicates that you should be familiar with the basic concept, but details will not be tested on the exam.
General Linguistics Concepts (Parts of J&M 2nd ed. Ch. 1, but mostly split over different chapters in J&M; Ch. 1 is not yet available in the 3rd ed.)
- Levels of linguistic representation: phonetics/phonology, morphology, syntax, semantics, pragmatics
- Ambiguity: know some examples in syntax and semantics, including PP attachment and noun-noun compounds.
- Garden-path sentences.
- Type/Token distinction.
- Know the following terms: word form, stem, lemma.
- Parts of speech:
  - Know the 9 traditional POS and some of the Penn Treebank tags (but no need to remember the complete tagset).
- Types of Linguistic Theories (Prescriptive, Descriptive, Explanatory)
- Syntax:
  - Constituency and Recursion. Constituency tests (*).
  - Dependency.
  - Grammatical Relations.
  - Subcategorization.
  - Long-distance dependencies. (*)
  - Syntactic heads (connection between dependency and constituency structure).
  - Center embeddings.
- Dependency syntax:
  - Head, dependent
  - Dependency relations and labels
  - Projectivity
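The dependency-syntax items above (head, dependent, labeled relations, projectivity) can be made concrete with a small sketch: a tree stored as (dependent, head, label) triples plus a check for crossing arcs. This is illustrative Python only; the sentence, indices, and function name are my own, not from the readings.

```python
# A dependency tree as (dependent_index, head_index, label) triples.
# Indices are 1-based; head 0 is the artificial ROOT.
# Invented example analysis of "She saw a cat":
arcs = [
    (1, 2, "nsubj"),   # She  <- saw
    (2, 0, "root"),    # saw  <- ROOT
    (3, 4, "det"),     # a    <- cat
    (4, 2, "obj"),     # cat  <- saw
]

def is_projective(arcs):
    """A tree is projective if no two arcs cross when drawn above the sentence."""
    spans = [(min(d, h), max(d, h)) for d, h, _ in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            # Two arcs cross if exactly one endpoint of one lies strictly inside the other.
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective(arcs))  # True: no crossing arcs in this tree
```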
Text Processing (Split over different chapters in J&M. Parts of J&M 3rd ed. Ch. 6)
- Tokenization (word segmentation).
- Sentence splitting.
- Lemmatization.
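As a rough illustration of tokenization and sentence splitting (not the method of any particular chapter), here is a naive regex-based sketch; real tools also handle abbreviations, clitics, hyphenation, URLs, and more. The example text and the regular expressions are my own.

```python
import re

text = "Mr. Smith bought 300 shares. He didn't sell them."

# Naive sentence splitting on sentence-final punctuation followed by whitespace.
# (This wrongly splits after "Mr.", which is exactly why abbreviations make
# sentence splitting harder than it looks.)
sentences = re.split(r'(?<=[.!?])\s+', text)

# Naive tokenization: keep runs of word characters together, split off punctuation.
def tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)

for s in sentences:
    print(tokenize(s))
```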
Probability Background
- Prior vs. conditional probability.
- Sample space, basic outcomes
- Probability distribution
- Events
- Random variables
- Bayes' rule
- conditional independence
- discriminative vs. generative models
- Noisy channel model.
- Calculating with probabilities in log space.
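A tiny worked example of Bayes' rule and of calculating with probabilities in log space; all the numbers are invented for illustration.

```python
import math

# Bayes' rule with invented numbers:
# P(spam | word) = P(word | spam) * P(spam) / P(word),
# where P(word) is obtained by marginalizing over both classes.
p_spam = 0.4
p_word_given_spam = 0.05
p_word_given_ham = 0.01
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))           # 0.769

# Calculating with probabilities in log space: multiplying many small
# probabilities underflows to 0.0, so we add their logs instead.
probs = [1e-5] * 100
print(math.prod(probs))                      # 0.0 (underflow)
print(sum(math.log(p) for p in probs))       # about -1151.3, easily representable
```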
Text Classification (J&M 3rd ed. Ch. 6)
- Task definition and applications
- Document representation: Set/Bag-of-words, vector space model
- Naive Bayes and its independence assumptions in text classification.
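A minimal bag-of-words Naive Bayes sketch with add-one smoothing. The toy corpus, labels, and helper names are invented; the point is only the independence assumption: class prior times a product of per-word likelihoods.

```python
import math
from collections import Counter

# Toy training data (invented): bag-of-words documents with class labels.
train = [
    ("pos", "great fun fun film".split()),
    ("pos", "great acting".split()),
    ("neg", "boring film".split()),
]

classes = {"pos", "neg"}
vocab = {w for _, doc in train for w in doc}
prior = {c: sum(1 for lab, _ in train if lab == c) / len(train) for c in classes}
counts = {c: Counter(w for lab, doc in train if lab == c for w in doc) for c in classes}

def log_likelihood(word, c):
    # Add-one (Laplace) smoothed estimate of P(word | class).
    return math.log((counts[c][word] + 1) /
                    (sum(counts[c].values()) + len(vocab)))

def classify(doc):
    # Naive Bayes: argmax_c  log P(c) + sum_w log P(w | c), ignoring unseen words.
    scores = {c: math.log(prior[c]) +
                 sum(log_likelihood(w, c) for w in doc if w in vocab)
              for c in classes}
    return max(scores, key=scores.get)

print(classify("fun film".split()))  # 'pos' on this toy data
```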
Language Models (J&M 2nd ed. Ch 4.1-4.8, J&M 3rd ed. Ch 4)
- Task definition and applications
- Probability of the next word and probability of a sentence
- Markov independence assumption.
- n-gram language models.
- Role of the START and END markers.
- Estimating n-gram probabilities from a corpus:
  - Maximum Likelihood Estimates
  - Dealing with unseen tokens
- Smoothing and Back-off:
  - Additive Smoothing
  - Discounting
  - Linear Interpolation
  - Katz' Backoff
- Perplexity
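A minimal bigram language model sketch: maximum likelihood counts, add-one smoothing, START/END markers, and perplexity on a two-sentence toy corpus (invented). Conventions differ on exactly which tokens count toward the normalization, so treat the numbers as illustrative only.

```python
import math
from collections import Counter

# Toy corpus (invented); <s> and </s> are the START and END markers.
corpus = [
    "<s> the cat sat </s>".split(),
    "<s> the dog sat </s>".split(),
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
vocab_size = len(unigram_counts)

def p_mle(w2, w1):
    # Maximum likelihood estimate: count(w1 w2) / count(w1). Zero for unseen bigrams.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def p_addone(w2, w1):
    # Additive (add-one) smoothing reserves probability mass for unseen bigrams.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

def perplexity(sent, p):
    # Perplexity = exp(-(1/N) * sum of log P(w_i | w_{i-1})) over the N bigram events.
    log_prob = sum(math.log(p(w2, w1)) for w1, w2 in zip(sent, sent[1:]))
    return math.exp(-log_prob / (len(sent) - 1))

test = "<s> the cat sat </s>".split()
print(perplexity(test, p_mle))      # lower: every bigram was seen in training
print(perplexity(test, p_addone))   # higher: smoothing takes mass from seen bigrams
```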
Sequence Labeling (POS tagging) (J&M 2nd ed Ch 5.1-5.5, J&M 3rd ed. Ch 10.1-10.4)
- Linguistic tests for part of speech (*).
- Hidden Markov Model:
  - Observations (sequence of tokens)
  - Hidden states (sequence of part of speech tags)
  - Transition probabilities, emission probabilities
  - Markov chain
- Three tasks on HMMs: Decoding, Evaluation, Training
  - Decoding: Find the most likely sequence of tags:
    - Viterbi algorithm (dynamic programming, know the algorithm and data structures involved; see the sketch at the end of this section)
  - Evaluation: Find the probability of a sequence of words:
    - Spurious ambiguity: multiple hidden state sequences can lead to the same observation sequence.
    - Forward algorithm (difference to Viterbi).
  - Training: We only discussed maximum likelihood estimates.
- Extending HMMs to trigrams. (*)
- Applying HMMs to other sequence labeling tasks, for example Named Entity Recognition
- B.I.O. tags for NER.
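The Viterbi sketch referenced above: decoding with a bigram HMM tagger, using a dynamic-programming table and backpointers. The tagset, transition, and emission probabilities are invented toy values, not estimates from any corpus.

```python
import math

# Toy HMM (probabilities invented for illustration).
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {                      # P(tag_i | tag_{i-1})
    "DET":  {"DET": 0.05, "NOUN": 0.85, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
emit_p = {                       # P(word | tag)
    "DET":  {"the": 0.7, "dog": 0.0, "barks": 0.0},
    "NOUN": {"the": 0.0, "dog": 0.4, "barks": 0.1},
    "VERB": {"the": 0.0, "dog": 0.05, "barks": 0.3},
}

def viterbi(words):
    # table[t][s]: log probability of the best tag sequence ending in state s at position t.
    # backptr[t][s]: the previous tag on that best path (used to recover the sequence).
    def logp(x):
        return math.log(x) if x > 0 else float("-inf")

    table = [{s: logp(start_p[s]) + logp(emit_p[s].get(words[0], 0)) for s in states}]
    backptr = [{}]
    for t in range(1, len(words)):
        table.append({})
        backptr.append({})
        for s in states:
            best_prev = max(states, key=lambda q: table[t - 1][q] + logp(trans_p[q][s]))
            table[t][s] = (table[t - 1][best_prev] + logp(trans_p[best_prev][s])
                           + logp(emit_p[s].get(words[t], 0)))
            backptr[t][s] = best_prev
    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: table[-1][s])
    tags = [last]
    for t in range(len(words) - 1, 0, -1):
        tags.append(backptr[t][tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```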
Parsing with Context-Free Grammars (J&M 2nd ed. Ch. 12, 13.1-13.4, 14.1-14.4, and Ch. 16; J&M 3rd ed. Ch. 11.1-11.5 and Ch. 13.1-13.4)
- Tree representation of constituency structure. Phrase labels.
- CFG definition: terminals, nonterminals, start symbol, productions/rules
- derivations and language of a CFG.
- derivation trees (vs. derived string), parse tree.
- Regular grammars / languages (*)
- Complexity classes.
- "Chomsky hierarchy"
- Center embeddings as an example of a non-regular phenomenon.
- Cross-serial dependencies as an example of a non-context-free phenomenon. (*)
- Probabilistic context-free grammars (PCFG):
  - Probability of a tree vs. probability of a sentence.
  - Maximum likelihood estimates from treebanks.
  - Computing the most likely parse tree using CKY.
- Recognition (membership checking) problem vs. parsing
- Top-down vs. bottom-up parsing.
- CKY parser:
  - Bottom-up approach.
  - Chomsky normal form.
  - Dynamic programming algorithm (know the algorithm and the required data structure: the CKY parse table). Split position.
  - Backpointers.
- Parsing with PCFGs
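A sketch of CKY recognition (membership checking) for a toy grammar in Chomsky normal form, showing the parse table and the split position. Storing backpointers in each cell, or rule probabilities for a PCFG, is what turns this recognizer into a parser. The grammar, sentence, and function name are invented for illustration.

```python
# Toy grammar in Chomsky normal form (invented).
# Binary rules A -> B C and lexical rules A -> word.
binary_rules = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("NP", "DET", "N"),
]
lexical_rules = {
    "the": {"DET"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cky_recognize(words):
    n = len(words)
    # table[i][j]: set of nonterminals that derive the span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(w, set()))
    for span in range(2, n + 1):              # fill shorter spans first
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # k is the split position
                for a, b, c in binary_rules:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return "S" in table[0][n]

print(cky_recognize("the dog chased the cat".split()))  # True
```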
Dependency Parsing (J&M 3rd ed. Ch. 14.1-14.5; supplementary material: Kübler, McDonald, and Nivre (2009): Dependency Parsing, Ch. 1 to 4.2)
- Transition-based dependency parsing:
  - States (configuration): Stack, Buffer, Partial dependency tree A
  - Transitions (Arc-standard system): Shift, Left-Arc, Right-Arc (see the sketch at the end of this section)
  - Predicting the next transition using discriminative classifiers.
  - Feature definition (address + attribute)
  - Training the parser from a treebank:
    - Oracle transitions from the annotated dependency tree.
  - Arc-eager system (*)
- Graph-based dependency parsing (* know the general idea: each word needs to get a head; start with a completely connected graph, score each edge, and keep the highest-scoring edges)
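The arc-standard sketch referenced above: the Shift, Left-Arc, and Right-Arc transitions on a configuration (stack, buffer, arcs), applied here with a hand-written transition sequence. In a trained parser, a discriminative classifier would predict each transition from features of the configuration. The sentence, labels, and helper function names are my own.

```python
# Arc-standard transitions on a configuration (stack, buffer, arcs).

def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    dep = stack.pop(-2)              # second-from-top becomes a dependent of the top
    arcs.append((stack[-1], dep, label))

def right_arc(stack, buffer, arcs, label):
    dep = stack.pop()                # top becomes a dependent of the new top
    arcs.append((stack[-1], dep, label))

# "ROOT She saw cats": word indices 0..3, with 0 the artificial ROOT.
words = ["ROOT", "She", "saw", "cats"]
stack, buffer, arcs = [0], [1, 2, 3], []

shift(stack, buffer, arcs)                   # stack [0, 1]
shift(stack, buffer, arcs)                   # stack [0, 1, 2]
left_arc(stack, buffer, arcs, "nsubj")       # She  <- saw
shift(stack, buffer, arcs)                   # stack [0, 2, 3]
right_arc(stack, buffer, arcs, "obj")        # cats <- saw
right_arc(stack, buffer, arcs, "root")       # saw  <- ROOT

print([(words[h], words[d], l) for h, d, l in arcs])
# [('saw', 'She', 'nsubj'), ('saw', 'cats', 'obj'), ('ROOT', 'saw', 'root')]
```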
Machine Learning (Some textbook references below)
- Generative vs. discriminative algorithms
- Supervised learning. Classification vs. regression problems, multi-class classification
- Loss functions: Least squares error (*). Classification error.
- Training error (empirical risk) vs. test error (validation risk). Overfitting.
- Linear Models.
  - Activation function (step function)
  - Bias weight
  - Perceptron learning algorithm (*)
  - Linear separability and the XOR problem. (*)
- Feature functions
- Log-linear models
  - Model formulation
  - Logistic regression and logits (*)
  - Log likelihood as a function of the weight vector.
  - Gradient ascent
  - Derivative of the log likelihood function (*)
  - Regularization (*)
- Maximum Entropy Markov Models (for POS tagging)
- Feed-forward neural nets (J&M 3rd ed. Ch. 8, as well as Goldberg Ch. 3 & 5.1)
  - Multilayer neural nets.
  - Different activation functions (sigmoid, ReLU, tanh)
  - Computation Graph.
  - Softmax.
  - Backpropagation (*) (know the idea of propagating back partial derivatives of the loss with respect to each weight; you won't have to do calculus on the final).
- Neural language model (Goldberg Ch. 9.4, J&M SLP 3rd ed. Ch. 6.2-6.5)
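A minimal forward pass for a feed-forward net of the shape used in a neural language model: concatenated context-word embeddings in, a ReLU hidden layer, and a softmax distribution over the vocabulary out. All weights are random and the vocabulary is invented, so this sketch only shows the shapes and operations, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny feed-forward net, shaped like a neural trigram LM:
# input  = concatenated embeddings of the context words,
# output = softmax distribution over the vocabulary.
vocab = ["the", "dog", "barks", "</s>"]
emb_dim, context, hidden = 4, 2, 8

E = rng.normal(size=(len(vocab), emb_dim))            # embedding matrix (random here)
W1 = rng.normal(size=(context * emb_dim, hidden))     # input  -> hidden weights
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, len(vocab)))            # hidden -> output scores
b2 = np.zeros(len(vocab))

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(context_words):
    x = np.concatenate([E[vocab.index(w)] for w in context_words])
    h = np.maximum(0, x @ W1 + b1)  # ReLU hidden layer
    return softmax(h @ W2 + b2)     # probability distribution over the vocabulary

p = next_word_distribution(["the", "dog"])
print(dict(zip(vocab, p.round(3))))  # arbitrary values, since the weights are random
```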