Tuesday 23 May 2017
Previous sessions with all resources (in nr 1)
http://pad.constantvzw.org/public_pad/neural_networks_6
http://pad.constantvzw.org/public_pad/neural_networks_5
http://pad.constantvzw.org/public_pad/neural_networks_4
http://pad.constantvzw.org/public_pad/neural_networks_3
http://pad.constantvzw.org/public_pad/neural_networks_2
http://pad.constantvzw.org/public_pad/neural_networks_1
http://pad.constantvzw.org/public_pad/neural_networks_algolit_extensions
http://pad.constantvzw.org/public_pad/neural_networks_small_dict
http://pad.constantvzw.org/public_pad/neural_networks_maisondulivre
a practical exercise
softmax exploration
Following assignment 1, from PSET 1
and listening back to lecture 3, Simplest window classifier: Softmax
assignment 1:
http://web.stanford.edu/class/cs224d/assignment1/assignment1.pdf
PSET 1 overview:
http://web.stanford.edu/class/cs224d/assignment1/index.html
lecture notes on the softmax:
http://web.stanford.edu/class/cs224d/lecture_notes/notes2.pdf
wiki page (with example):
https://en.wikipedia.org/wiki/Softmax_function
softmax: a classifier for multi-class classification problems
"Logistic regression = Softmax classification on word vector x to obtain probability for class y" (from slides lecture 3)
softmax = cross entropy = logistic regression (not synonyms, but they do similar things and are often used interchangeably)
softmax = for multiclass problems
logistic regression = for binary class problems
cross entropy = loss function for softmax
'The softmax classifier is a linear classifier that uses the cross-entropy loss function. In other words, the gradient of the above function tells a softmax classifier how exactly to update its weights using something like gradient descent.'
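a small sketch (not from the course, with made-up scores) to see how the three terms above relate: the softmax of two scores gives the same probability as the sigmoid of their difference, and the cross-entropy loss is the negative log probability of the correct class
import math

def softmax(scores):
    # exponentiate each score and normalize so the probabilities sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # logistic function, the 2-class special case
    return 1.0 / (1.0 + math.exp(-z))

scores = [2.0, 0.5]                      # made-up scores for class 0 and class 1
p = softmax(scores)
print(p[0])                              # probability of class 0 via the softmax
print(sigmoid(scores[0] - scores[1]))    # the same probability via the sigmoid of the score difference
print(-math.log(p[0]))                   # cross-entropy loss if class 0 is the correct class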
example from lecture 3
problem: Named Entity Recognition, location detection
sample sentence: museums in paris are amazing
window size = 2
center word = "paris"
resulting vector is a column vector = a concatenation (stacking) of the 5 word vectors in the window = a 5d-dimensional column vector
how to take derivatives with respect to the words for the next layers: the softmax is considered one layer, the word vectors could be considered as another layer
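a small sketch of building the window vector by concatenation (the 4-dimensional word vectors here are made up, just to show the shape):
# sentence: museums in paris are amazing, window size 2, center word "paris"
# the word vectors are made-up 4-dimensional examples (d = 4)
wordvectors = {
    'museums': [0.1, 0.2, 0.3, 0.4],
    'in':      [0.5, 0.1, 0.0, 0.2],
    'paris':   [0.9, 0.3, 0.7, 0.1],
    'are':     [0.2, 0.2, 0.2, 0.2],
    'amazing': [0.4, 0.8, 0.1, 0.6],
}
window = ['museums', 'in', 'paris', 'are', 'amazing']
x = []
for word in window:
    x = x + wordvectors[word]    # concatenate the 5 word vectors into one column vector
print(len(x))                    # 5 * 4 = 20: five words times d dimensions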
step 1:
define all variables
create word vectors
then the softmax is a simple next step; on its own it is not a common task, but it is a good simple example for now
p(y|x)
for class y given word vector x: we take the y-th row of our weight matrix W and multiply it with the column vector x
we normalize this over all the classes, so that the sum of the probabilities is 1
(sigmoid function works with 2 classes)
as we train our softmax, we use a loss/cost/objective function (we minimize the cost, which is the same as maximizing the objective)
loss for softmax = cross entropy
we compute the probability of word vector x for a certain class y: take the y-th row of W and multiply that row with x
f_y = the score for the y-th class (the y-th element of f = Wx)
C = the number of classes; we compute f_c for each of the C classes
the loss wants to maximize the probability of the correct class y given x
all this comes back in information theory
when training a softmax classifier we try to minimize the cross-entropy error
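a small sketch of the loss (made-up scores, not from the course): the cross-entropy is the negative log of the softmax probability of the correct class, so maximizing that probability = minimizing the loss
import math

f = [1.2, 3.4, 0.3]                      # made-up scores f_c for C = 3 classes
y = 1                                    # the correct class
exps = [math.exp(fc) for fc in f]
p = [e / sum(exps) for e in exps]        # softmax: probabilities over the 3 classes
loss = -math.log(p[y])                   # cross-entropy loss for the correct class y
print(p)
print(loss)                              # minimizing this maximizes p[y]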
our previous notes on this part of the course
http://pad.constantvzw.org/public_pad/neural_networks_4
slide 38
Next step:
softmax > classifying vectors into multiple classes (sigmoid is the version for 2 classes)
The formula builds upon the previous results/matrix, and uses the data to classify groups of word vectors
P(y|x) = given vector x, ask what the probability is that vector x belongs to class y
word vector x
W_y: we take the y-th row of matrix W
C = the number of classes you have
c = a specific class, i.e. a row of W (because the row is already a row vector, we do not transpose)
d = the number of columns/dimensions of W (the dimension of x)
normalize over all classes (sum over all C classes, so the probabilities of y add up to 1)
input dimension = number of words in the window times the word vector dimension (5 times d in the example)
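a small sketch of the formula P(y|x) = exp(W_y . x) / sum_c exp(W_c . x), with a made-up C x d matrix W and a made-up d-dimensional x:
import math

C = 3                                    # number of classes
d = 4                                    # dimensions of the word/window vector
W = [[ 0.2, -0.1,  0.4, 0.0],            # made-up weight matrix with C rows and d columns
     [-0.3,  0.8,  0.1, 0.5],
     [ 0.6,  0.2, -0.2, 0.3]]
x = [0.9, 0.3, 0.7, 0.1]                 # made-up word vector with d entries

def dot(row, vec):
    # multiply a row of W with the column vector x
    return sum(a * b for a, b in zip(row, vec))

scores = [dot(W[c], x) for c in range(C)]    # W_c . x for every class c
exps = [math.exp(s) for s in scores]
P = [e / sum(exps) for e in exps]            # normalize over all C classes
print(P)
print(sum(P))                                # the probabilities add up to 1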
Softmax function scripts in Python
https://martin-thoma.com/softmax/
https://stackoverflow.com/questions/34968722/softmax-function-python
wiki softmax function (expanded version with many options that do the same (or approximations to the very same))
following the Wikipedia example:
https://en.wikipedia.org/wiki/Softmax_function
import math
print '~~'
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
# ORIGINAL CODE -----------------------------------
# x_exp = [math.exp(i) for i in x]
# STUDY CODE -----------------------------------
x_exp = []
for i in x:
    # exp = math.exp(i)
    exp = math.e**i
    # math.e = 2.718281828459045 # (this is an approximation) [~~]
    # exp = 2.718281828459045**i # (this is an approximation) [~~]
    x_exp.append(exp)
print x_exp
# Result: [2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
print '~~'
# ORIGINAL CODE -----------------------------------
sum_x_exp = sum(x_exp)
print sum_x_exp # Result: 114.98
# STUDY CODE -----------------------------------
sumofall=0
for value in x_exp:
    sumofall = sumofall + value
print sumofall
print '~~'
# ORIGINAL CODE -----------------------------------
softmax = [round(i / sum_x_exp, 3) for i in x_exp]
print softmax
# Result: [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
# STUDY CODE -----------------------------------
# [~~] remember!!! 5/3=1 (integer division in Python 2)
print 5/3
# [~~] remember!!! 3/5=0
print 3/5
# [~~] remember!!! 3.0/5.0=0.6
print 3.0/5.0
softmax = []
for y in x_exp:
    s = round(y / sumofall, 3)
    # round() = round(number [, ndigits])
    print 'y:', y
    print 'y/sum:', y/sumofall
    # [~~] for i = 1.0, Result: 2.71(...) / 114.9(...) = 0.0236405430216
    # [~~] for i = 2.0, Result: 7.38(...) / 114.9(...) = 0.0642616585105
    # [~~] for i = 3.0, Result: 20.08(...) / 114.9(...) = 0.174681298596
    # [~~] for i = 4.0, Result: 54.59(...) / 114.9(...) = 0.474832999744
    softmax.append(s)
print softmax
print '~~'
# ~~~~~~~~~~~~~~~~~~~~~~~~
# links:
# ~~~~~~~~~~~~~~~~~~~~~~~~
# source:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html
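a vectorized version of the same Wikipedia example, using numpy.exp from the link above (a sketch, assuming numpy is installed):
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0])
x_exp = np.exp(x)                  # exponentiate all entries at once
softmax = x_exp / x_exp.sum()      # normalize so the entries sum to 1
print(np.round(softmax, 3))        # [0.024 0.064 0.175 0.475 0.024 0.064 0.175]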