Lecture 6
Natural Language Processing - Part 2
Machine Learning: Observe patterns of features and attempt to imitate them in some way
Language = vocabulary and its usage in a specific context captured by textual data
Measure how important (or descriptive) a word is in a given document collection
e.g., find the set of words that best describe multiple clusters (see Assignment 2)
Predict how likely a sequence of words is to occur in a given context
e.g., find the words that are most likely to occur next
Assign numbers to words so that semantically related words are close to each other
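Predicting the next word can be sketched with a minimal bigram language model. The toy corpus below is made up for illustration; real models are trained on far larger data.

```python
from collections import defaultdict

# Toy corpus (made up for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams (w1, w2) and how often each w1 occurs as a left context.
bigram_counts = defaultdict(int)
context_counts = defaultdict(int)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[(w1, w2)] += 1
    context_counts[w1] += 1

def next_word_probs(word):
    """Estimate P(w2 | w1 = word) from bigram counts."""
    total = context_counts[word]
    return {w2: c / total for (w1, w2), c in bigram_counts.items() if w1 == word}

# Words most likely to follow "the", with their estimated probabilities.
print(next_word_probs("the"))
```

Each candidate continuation of "the" occurs once out of four contexts here, so each gets probability 0.25.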
Assign multiple numbers (a vector) to words
cat =[4,2]
dog =[3,3]
pizza =[1,1]
cat =[0,0,0,0,0,0,0,0,0,0,1,0,0,0,…,0]
dog =[0,0,0,0,0,0,0,1,0,0,0,0,0,0,…,0]
pizza =[1,0,0,0,0,0,0,0,0,0,0,0,0,0,…,0]
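One-hot vectors like the ones above can be built directly from a vocabulary index. The vocabulary below is a made-up example; only the position of the single 1 distinguishes words.

```python
# Made-up toy vocabulary; each word gets one index.
vocab = ["pizza", "sat", "on", "mat", "run", "eat", "the", "dog", "tree", "sky", "cat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A vector of zeros with a single 1 at the word's index.
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot("cat"))
```

Note that every pair of distinct one-hot vectors is equally far apart: this encoding carries no notion of semantic similarity, which motivates the dense vectors discussed next.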
Raw frequency
tf(t,d) = f_{t,d}
Log normalisation
tf(t,d) = log(1 + f_{t,d})
Normalised frequency
tf(t,d) = 0.5 + 0.5 · f_{t,d} / f_max(d), where f_max(d) is the highest raw frequency of any term in d
IDF(t,D) = log( N / |{d ∈ D : t ∈ d}| ), where N is the number of documents in the collection D
tf-IDF(t,d,D) = tf(t,d) × IDF(t,D)
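The tf-IDF weighting can be sketched in a few lines. This uses the log-normalised tf variant and a made-up three-document collection.

```python
import math

# Toy document collection (made up for illustration).
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "and", "the", "dog"],
]

def tf(t, d):
    # Log normalisation: tf(t,d) = log(1 + f_{t,d})
    return math.log(1 + d.count(t))

def idf(t, D):
    # IDF(t,D) = log(N / |{d in D : t in d}|)
    n_containing = sum(1 for d in D if t in d)
    return math.log(len(D) / n_containing)

def tfidf(t, d, D):
    return tf(t, d) * idf(t, D)

# "the" occurs in every document, so its IDF (hence tf-IDF) is 0.
print(tfidf("the", docs[0], docs))  # 0.0
# "cat" occurs in only two documents, so it is more descriptive of doc 0.
print(tfidf("cat", docs[0], docs))
```

This illustrates the intuition behind tf-IDF: terms that appear everywhere score 0 regardless of how often they occur in a document.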
“You shall know a word by the company it keeps” - The distributional hypothesis, John Firth (1957)
cat =[0.7,0.5,0.1]
dog =[0.8,0.3,0.1]
pizza =[0.1,0.2,0.8]
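With dense vectors, semantic relatedness can be measured with cosine similarity. The sketch below uses the three vectors from the slide.

```python
import math

# Dense word vectors (values from the slide).
vecs = {
    "cat":   [0.7, 0.5, 0.1],
    "dog":   [0.8, 0.3, 0.1],
    "pizza": [0.1, 0.2, 0.8],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vecs["cat"], vecs["dog"]))    # high: semantically related
print(cosine(vecs["cat"], vecs["pizza"]))  # lower: unrelated
```

Unlike one-hot vectors, these dense representations place cat and dog close together while keeping pizza further away.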
Word vector analogies
a:b=c:?
man : woman = king : ?
Biases in word vectors might leak through to downstream models, producing unexpected, hard-to-predict behaviour
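The analogy a : b = c : ? is typically solved by finding the word whose vector is nearest to vec(b) − vec(a) + vec(c). The tiny 2-d vectors below are made up to keep the example readable; real embeddings have hundreds of dimensions.

```python
# Made-up 2-d vectors; dimension 0 loosely tracks "royalty", dimension 1 "gender".
vecs = {
    "man":   [1.0, 0.2],
    "woman": [1.0, 0.8],
    "king":  [3.0, 0.2],
    "queen": [3.0, 0.8],
    "pizza": [0.1, 0.1],
}

def analogy(a, b, c):
    # Target point: vec(b) - vec(a) + vec(c).
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    # Nearest neighbour by squared Euclidean distance, excluding the query words.
    candidates = [w for w in vecs if w not in (a, b, c)]
    return min(candidates,
               key=lambda w: sum((x - y) ** 2 for x, y in zip(vecs[w], target)))

print(analogy("man", "woman", "king"))  # queen
```

The same arithmetic is what surfaces learned biases: if the training data associates occupations with genders, the analogy offsets encode and reproduce those associations.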
CIS 419/519 Applied Machine Learning. Eric Eaton, Dinesh Jayaraman. https://www.seas.upenn.edu/~cis519/spring2020/
EECS498: Conversational AI. Kevin Leach. https://dijkstra.eecs.umich.edu/eecs498/
CS 4650/7650: Natural Language Processing. Diyi Yang. https://www.cc.gatech.edu/classes/AY2020/cs7650_spring/
Natural Language Processing. Alan W Black and David Mortensen. http://demo.clab.cs.cmu.edu/NLP/
IN4325 Information Retrieval. Jie Yang.
Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Third Edition. Daniel Jurafsky, James H. Martin.
Natural Language Processing, Jacob Eisenstein, 2018.