----------------------------------------------------------------------------
# KDD step 4: interpretation
gold standard
*--> annotated test set
*
10-fold cross-validation
*taking 1000 tweets (10 folds of 100 tweets each)
*training 800 tweets
*test 100 tweets
*validation 100 tweets
*rotate the folds so that each one is the test set once
*
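A minimal sketch of the 800/100/100 split above, in plain Python. It is not part of the workshop material. The function name ten_fold_splits and the placeholder tweets are made up for illustration, and using the next fold as the validation set is an assumption; the notes only give the sizes.

```python
def ten_fold_splits(items, k=10):
    """Yield (train, val, test) lists for each of the k rounds.

    Items left over when len(items) is not divisible by k are dropped.
    """
    fold_size = len(items) // k
    folds = [items[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    for i in range(k):
        val_index = (i + 1) % k  # assumption: the next fold is the validation set
        test = folds[i]
        val = folds[val_index]
        train = [x for j, fold in enumerate(folds)
                 if j != i and j != val_index
                 for x in fold]
        yield train, val, test


tweets = ["tweet %d" % n for n in range(1000)]  # placeholder data
splits = list(ten_fold_splits(tweets))
train, val, test = splits[0]
print(len(splits), len(train), len(val), len(test))  # 10 800 100 100
```

In practice the tweets would be shuffled before splitting, and the final score is the average over the 10 rounds.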
compare to baseline scores
*- frequency baseline
*you expect 80% of the tweets to be neutral(?), so always guessing "neutral" already scores about 80%
*- informative baseline
*e.g. I have a 60% chance that it will rain tomorrow
*--> your result needs to be higher than the baseline
*otherwise --> why do all the work?
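A small sketch of the frequency-baseline check, not taken from the workshop. The label counts and the "predicted" list are invented for illustration, and the helper names majority_baseline and accuracy are hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always guessing it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def accuracy(gold, predicted):
    """Fraction of positions where the prediction matches the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# made-up gold standard: 80 neutral, 12 positive, 8 negative tweets
gold = ["neutral"] * 80 + ["positive"] * 12 + ["negative"] * 8

# made-up classifier output: all neutral tweets right,
# half of the positive and half of the negative tweets right
predicted = (["neutral"] * 80
             + ["positive"] * 6 + ["neutral"] * 6
             + ["negative"] * 4 + ["neutral"] * 4)

label, baseline = majority_baseline(gold)
acc = accuracy(gold, predicted)
print(label, baseline, acc, acc > baseline)  # neutral 0.8 0.9 True
```

If acc is not above baseline, the classifier does no better than always answering with the most frequent label.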
from : CLiPS Guy de Pauw, Pattern workshop — Cqrrelations, January 2015
----------------------------------------------------------------------------