Tuesday 23 May 2017
Previous sessions with all resources (in nr 1)
a practical exercise
softmax exploration
Following assignment 1, from PSET 1
and listening back to course 3, Simplest window classifier: Softmax
assignment 1:
PSET 1 overview:
lecture notes on the softmax:
wiki page (with example):
a classifier for classification problems
"Logistic regression = Softmax classification on word vector x to obtain probability for class y" (from slides lecture 3)
softmax, cross entropy and logistic regression: closely related terms (not synonyms, but they do similar things)
softmax = for multiclass problems
logistic regression = for binary class problems
cross entropy = loss function for softmax
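The relation between the multiclass and the binary case listed above can be checked numerically. This is a minimal sketch with made-up scores (1.5 and -0.5 are assumptions, not values from the lecture): a softmax over two classes gives the same probability as a sigmoid applied to the difference of the two scores.

```python
import math

def sigmoid(z):
    # Logistic function used in binary logistic regression.
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # Exponentiate every score and normalize so the results sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for two classes (illustrative values only).
score_a, score_b = 1.5, -0.5

p_softmax = softmax([score_a, score_b])[0]   # probability of class a
p_sigmoid = sigmoid(score_a - score_b)       # same probability via sigmoid

print(p_softmax, p_sigmoid)
```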
'The softmax classifier is a linear classifier that uses the cross-entropy loss function. In other words, the gradient of the above function tells a softmax classifier how exactly to update its weights using something like gradient descent.'
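A minimal sketch of the update described in the quote, assuming one training example, made-up scores and a hypothetical learning rate of 0.1. It relies on the standard result that the gradient of the cross-entropy loss with respect to the class scores is the predicted probabilities minus the one-hot target. To keep it short, it updates the scores directly instead of a weight matrix.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[target])

def score_gradient(probs, target):
    # Gradient of the loss w.r.t. the scores: probabilities minus one-hot target.
    return [p - (1.0 if c == target else 0.0) for c, p in enumerate(probs)]

scores = [1.0, 2.0, 0.5]   # made-up scores for 3 classes
target = 0                 # index of the correct class (assumption)
learning_rate = 0.1        # hypothetical step size

loss_before = cross_entropy(softmax(scores), target)
grad = score_gradient(softmax(scores), target)

# One gradient descent step: move against the gradient.
new_scores = [s - learning_rate * g for s, g in zip(scores, grad)]
loss_after = cross_entropy(softmax(new_scores), target)

print(loss_before, loss_after)
```

After the step, the loss for the correct class should be lower than before.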
example from lecture 3
problem: Named Entity Recognition, location detection
sample sentence: museums in paris are amazing
window size = 2
center word = "paris"
resulting vector is a column vector = a concatenation of the 5 word vectors in the window (each with d dimensions) = a 5d-dimensional column vector
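The window from this example can be sketched as follows. The word vectors are an assumption: hypothetical 3-dimensional vectors filled with made-up numbers, where real ones would come from a trained model. With d-dimensional word vectors, concatenating the 5 words in the window gives 5 × d numbers.

```python
sentence = ["museums", "in", "paris", "are", "amazing"]
window_size = 2
center = sentence.index("paris")

d = 3  # hypothetical word vector dimension
# Made-up word vectors: word number i gets the vector [i+1, i+1, i+1].
word_vectors = {word: [float(i + 1)] * d for i, word in enumerate(sentence)}

# Take window_size words on each side of the center word.
window_words = sentence[center - window_size: center + window_size + 1]

# Concatenate the word vectors of the window into one long vector.
x_window = []
for word in window_words:
    x_window.extend(word_vectors[word])

print(window_words)
print(len(x_window))
```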
how to take derivatives with respect to the word vectors for the next layers: softmax is considered one layer, the word vectors could be considered another layer
step 1:
define all variables
create wordvectors
then softmax is a simple next step, that is not a common task but a good simple example for now
class y/word vector x, we take the y'th row of our matrix, x is the column(?)
we normalize this over all the classes, so that the sum of probabilities is 1
(sigmoid function works with 2 classes)
as we train our softmax, we use a loss/cost/objective function (minimize or maximize cost)
loss for softmax = cross entropy
we compute probability of word for certain class y: take y'th row of W and multiply that row with x
f(y) = feature vector, for the y'th class
C = compute all fs for different classes
loss wants to maximize the probability of the correct class y given x
all this comes back in information theory
when training a softmax classifier we try to minimize the cross-entropy error
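Written out, the notes above correspond to the following formulas. This is a reconstruction, assuming f_c stands for the score of class c (row c of W multiplied with x), C for the number of classes, and J for the loss of one example with correct class y.

```latex
f_c = W_c \, x
\qquad
p(y \mid x) = \frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}
\qquad
J = -\log p(y \mid x)
```

Minimizing J is the same as maximizing the probability of the correct class.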
our previous notes on this part of the course
slide 38
Next step:
softmax > classifying vectors into multiple classes (sigmoid is the version for 2 classes)
The formula builds upon the previous results/matrix, and uses the data to classify groups of word vectors
P(y|x) = given vector x, ask what the probability is that vector x belongs to class y
word vector x
Wy: we take the y-th row of matrix W
C = the number of classes you have
c is a specific class, a row (because the row is already a row vector, we do not transpose)
d columns/dimensions
normalize for all classes (all probabilities of y, notated as C)
number of window size = number of dimensions
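The notation above can be sketched in a few lines. Everything numeric here is an assumption for illustration: a made-up matrix W with C = 3 rows (classes) and d = 4 columns, and a made-up vector x. The function name class_probabilities is hypothetical, not taken from the course material.

```python
import math

def class_probabilities(W, x):
    # Score per class: row W_c of the matrix times the vector x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    # Exponentiate and normalize over all C classes.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up weights: C = 3 classes (rows), d = 4 dimensions (columns).
W = [
    [0.2, -0.1, 0.0, 0.5],
    [0.1, 0.3, -0.2, 0.0],
    [-0.3, 0.2, 0.1, 0.1],
]
x = [1.0, 2.0, 0.5, -1.0]  # made-up input vector with d = 4 entries

probs = class_probabilities(W, x)
y = 1  # the class we ask about (assumption)

print(probs[y])     # P(y|x) for class y
print(sum(probs))   # the probabilities over all classes sum to 1
```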
Softmax function scripts in Python
wiki softmax function (expanded version with many options that do the same (or approximations to the very same))
following the Wikipedia example:
import math
print '~~'
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
# ORIGINAL CODE -----------------------------------
# x_exp = [math.exp(i) for i in x]
# STUDY CODE -----------------------------------
x_exp = []
for i in x:
    # exp = math.exp(i)
    exp = math.e**i
    # math.e = 2.718281828459045 # (this is an approximation) [~~]
    # exp = 2.718281828459045**i # (this is an approximation) [~~]
    # store the result, otherwise x_exp stays empty
    x_exp.append(exp)
print x_exp
# Result: [2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
print '~~'
# ORIGINAL CODE -----------------------------------
sum_x_exp = sum(x_exp)
print sum_x_exp # Result: 114.98
# STUDY CODE -----------------------------------
# add up all exponentials by hand (same result as sum(x_exp))
sumofall = 0.0
for value in x_exp:
    sumofall += value
print sumofall
print '~~'
# ORIGINAL CODE -----------------------------------
softmax = [round(i / sum_x_exp, 3) for i in x_exp]
print softmax
# Result: [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
# STUDY CODE -----------------------------------
# [~~] remember!!! 5/3=1 (integer division in Python 2)
print 5/3
# [~~] remember!!! 3/5=0 (integer division in Python 2)
print 3/5
# [~~] remember!!! 3.0/5.0=0.6
print 3.0/5.0
softmax = []
for y in x_exp:
    s = round(y / sumofall, 3)
    # round() = round(number [, ndigits])
    softmax.append(s)
    print 'y:', y
    print 'y/sum:', y/sumofall
    # [~~] i = 1.0, Result: 2.71(...) / 114.9(...) = 0.0236405430216
    # [~~] i = 2.0, Result: 7.38(...) / 114.9(...) = 0.0642616585105
    # [~~] i = 3.0, Result: 20.08(...) / 114.9(...) = 0.174681298596
    # [~~] i = 4.0, Result: 54.59(...) / 114.9(...) = 0.474832999744
print softmax
print '~~'
# ~~~~~~~~~~~~~~~~~~~~~~~~
# links:
# ~~~~~~~~~~~~~~~~~~~~~~~~
# source:
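The steps of the script can also be collected into one reusable function. This is a sketch, written with print() calls so that it also runs on Python 3, unlike the Python 2 script above. Subtracting the largest input before exponentiating is an addition that is not in the pad: it is a common way to avoid overflow for large inputs and does not change the result.

```python
import math

def softmax(values):
    # Shift by the maximum for numerical stability (does not change the result).
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
result = [round(p, 3) for p in softmax(x)]
print(result)
# matches the Wikipedia example used above:
# [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
```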