Certainty detection
'Commitment' classifier?

Logbook: http://pad.constantvzw.org/p/certainty_logbook
Questions: http://pad.constantvzw.org/p/certainty_questions
modality.py close reading: http://pad.constantvzw.org/p/certainty_modality.py_close_reading
Modality paper notes: http://pad.constantvzw.org/public_pad/certainty_notes_Modality-and-Negation

- Can the modality function be used to measure modality in the English language in general? Or is modality context-sensitive?

investigate nuance at different levels:
* between different ideologies of text-mining practices (explanatory/non-explanatory + interested in thematic/syntactic analyses)
*World Well Being Project http://wwbp.org
*CLiPS
* Alexander Hogenboom, PhD student Erasmus Universiteit http://www.erim.eur.nl/research/news/detail/3720-phd-defence-alexander-cornelis-hogenboom/
*detect grey areas within our text-mining process:
*statistical assumptions
*threshold
*annotation systems
*scoring of vocabulary (0.75, 1, -1)
*background of annotators
*software black boxes
*e.g. scikit-learn, where you can select the algorithm (kNN, SVM, ...) in one line of code: how was it made, and by whom?
*trial-and-error process of selecting a classifier
*in the language use of popular speakers
*to detect: epistemic modality, which expresses the speaker’s degree of commitment to the truth of a proposition.

- definition of types of certainty
*- We understand modality in a broad sense, which involves related concepts like “subjectivity”, “hedging”, “evidentiality”, “uncertainty”, “committed belief,” and “factuality”. (Modality and Negation: An Introduction to the Special Issue)
- corpus (videos)
*> requirement: high-quality subtitles, spoken English
*- TED talks / most popular
*- most popular talks: http://www.ted.com/talks?sort=popular
*- (historical) speeches van leiders (Obama, Steve Jobs, Merkel...)
*- talks / lectures (a transcribed archive)
*- artists
*- time 10 most famous speeches (text): http://content.time.com/time/specials/packages/completelist/0,29569,1841228,00.html
*- Marie-Claire selection of 20 most famous speeches: http://www.marieclaire.co.uk/blogs/544196/25-iconic-speeches-you-ll-want-to-watch-on-repeat.html
*- http://www.americanrhetoric.com/top100speechesall.html

Deadline An: 9 May
Deadline Manetta: 30 May

plan:
*Study the state of the art in the automatic assignment of negation and modality (and their scope) in text mining. A good place to start is:
*Morante, Roser, and Caroline Sporleder. "Modality and Negation: An Introduction to the Special Issue." Computational Linguistics 38.2 (2012) http://www.anthology.aclweb.org/J/J12/J12-2001.pdf
*Collect a corpus with texts of a register and domain of interest. 
*https://ted2srt.org/#/talks/al_gore_the_case_for_optimism_on_climate_change
*how many videos?
*how many sentences?
*selection = most popular (most viewed: of all time, 20 / per year, 15) + availability on http://ted2srt.org
*Apply an existing negation and modality analyzer to your corpus.
*how to integrate unsupervised ML in process?
*Pattern modality.py
*others?
*Use Machine Learning techniques to learn a model from this automatically annotated data.
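
A rough sketch of the corpus step above: turning a downloaded .srt subtitle file into plain sentences, so that videos and sentences can be counted before annotation. The function names and the sample subtitle text are made up for illustration, and the sentence splitter is deliberately naive (it will trip over abbreviations such as "Dr.").

```python
import re

# Matches SRT timing lines such as "00:00:12,500 --> 00:00:15,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_text(srt):
    """Strip cue numbers and timing lines from SRT subtitles; return plain text."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        # Caveat: a subtitle line made only of digits (e.g. "2016") is
        # indistinguishable from a cue number here and is dropped as well.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Hypothetical two-cue subtitle fragment, not taken from a real talk.
sample = """1
00:00:12,500 --> 00:00:15,000
I have three questions.

2
00:00:15,000 --> 00:00:18,200
Must we change? Can we change?
"""

sentences = split_sentences(srt_to_text(sample))
print(len(sentences), sentences)
```

Each sentence in the resulting list could then be passed to an analyzer such as Pattern's modality().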


Resources
* "Modality and Negation: An Introduction to the Special Issue." (2012) - Morante, Roser, and Caroline Sporleder.
http://www.anthology.aclweb.org/J/J12/J12-2001.pdf
--> notes: http://pad.constantvzw.org/public_pad/Certainty_notes_Modality-and-Negation

* Tom De Smedt, Modeling Creativity (Pattern PhD)
http://www.clips.ua.ac.be/sites/default/files/modeling-creativity.pdf
p.123
"The modality() function returns a value between -1.0 and +1.0, expressing the degree of certainty based on modal verbs and adverbs in the sentence. For example, “I wish it would stop raining” scores -0.75 while “It will surely stop raining soon” scores +0.75. In Wikipedia terms, modality is sometimes referred to as weaseling when the impression is raised that something important is said, but what is really vague and misleading (Farkas et al., 2010). For example: “some people claim that” or “common sense dictates that”."
from pattern.en import parsetree
from pattern.en import modality
# print() with a single argument works in both Python 2 and Python 3.
print(modality(parsetree('some people claim that'))) # 0.120

* Morante, R., & Daelemans, W. (2011). Annotating Modality and Negation for a Machine Reading Evaluation. QA4MRE at CLEF 2011: http://clef2011.org/resources/proceedings/Overview_QA4MRE_Pilot_Clef2011.pdf

* Palmer (1986) makes a three-way distinction between dynamic, deontic, and epistemic modality.

* Memory-Based Resolution of In-Sentence Scopes of Hedge Cues (2010) - Roser Morante, Vincent Van Asch, Walter Daelemans
http://www.aclweb.org/anthology/W/W10/W10-3006.pdf 
(CLiPS paper, 2010, part of the CoNLL-2010 shared task, for which modality.py was *probably* made.)
*about hedging:
*The term hedging is originally due to Lakoff (1972). 
*Palmer (1986) defines a term related to hedging, epistemic modality, which expresses the speaker’s degree of commitment to the truth of a proposition. 
*Hyland (1998) focuses specifically on scientific texts. He proposes a pragmatic classification of hedge expressions based on an exhaustive analysis of a corpus.
* Stating with Certainty or Stating with Doubt: Intercoder Reliability Results for Manual Annotation of Epistemically Modalized Statements (2007) - Victoria L. Rubin, Faculty of Information and Media Studies, University of Western Ontario 
http://aclweb.org/anthology/N/N07/N07-2036.pdf
(computational linguistics research on the certainty levels of statements in written news discourse)
*ABSOLUTE (defined as a stated unambiguous indisputable conviction or reassurance)
*HIGH (i.e., high probability or firm knowledge)
*MODERATE (i.e., estimation of an average likelihood or reasonable chances)
*LOW CERTAINTY (i.e., distant possibility)
*UNCERTAINTY (defined as hesitancy or stated lack of clarity or knowledge)
in three pragmatic news contexts (three contextual dimensions relevant to news discourse):
*perspective
*attributes explicit certainty either to the writer or two types of reported sources – direct participants and experts in a field.
*focus
*separates certainty in facts and opinions
*time
*is an organizing principle of news production and presentation, and if relevant, is separated into past, present, or future.


Pattern classifier 
modality.py

def modality(sentence, type=EPISTEMIC):
    """ Returns the sentence's modality as a weight between -1.0 and +1.0.
        Currently, the only type implemented is EPISTEMIC.
        Epistemic modality is used to express possibility (i.e. how truthful is what is being said).
    """

# "likely" => weight 1, "very likely" => weight 2
# "likely" => score 0.25 (neutral inclining towards positive).

# Numbers, citations, explanations make the sentence more factual.

if m == 0:
    return 1.0 # No modal verbs/adverbs used, so statement must be true.
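
The excerpts above suggest a weighted average: each modal cue contributes its score times a weight, an intensifier before the cue raises the weight, and a sentence without any cue defaults to 1.0. Below is a simplified sketch of that idea, not the actual modality.py code. Only "likely" => 0.25 comes from the comments above; the other lexicon entries, the modifier list and the 0.75 bonus for numbers are assumptions made for illustration, and the real function works on a parsed sentence with part-of-speech tags rather than a plain word list.

```python
# Illustrative lexicon: only "likely" => 0.25 is taken from the notes;
# the other words and values are made-up assumptions.
TOY_LEXICON = {"likely": 0.25, "surely": 0.75, "maybe": -0.25}
MODIFIERS = {"very", "extremely"}  # hypothetical list of intensifiers

def toy_modality(words):
    """Weighted average of cue scores, clamped to [-1.0, +1.0]."""
    n, m = 0.0, 0  # n: weighted sum of scores, m: sum of weights
    for i, word in enumerate(words):
        word = word.lower()
        if word in TOY_LEXICON:
            weight = 1
            # "likely" => weight 1, "very likely" => weight 2
            if i > 0 and words[i - 1].lower() in MODIFIERS:
                weight += 1
            n += weight * TOY_LEXICON[word]
            m += weight
        elif word.isdigit():
            # Numbers make the sentence more factual (bonus value assumed).
            n += 0.75
            m += 1
    if m == 0:
        return 1.0  # no cues found, so the sentence is treated as certain
    return max(-1.0, min(n / m, 1.0))

print(toy_modality(["it", "is", "likely"]))                 # 0.25
print(toy_modality(["the", "sky", "is", "blue"]))           # 1.0
print(toy_modality(["very", "likely", "but", "maybe"]))
```

The second call shows the grey area flagged in the notes: a sentence without modal cues is scored as fully certain by default, regardless of its content.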

#---------------------------------------------------------------------------------------------------

# Celle, A. (2009). Hearsay adverbs and modality, in: Modality in English, Mouton.
# Allegedly, presumably, purportedly, ... are in the negative range because
# they introduce a fictitious point of view by referring to an unclear source.

#---------------------------------------------------------------------------------------------------

# Tseronis, A. (2009). Qualifying standpoints. LOT Dissertation Series: 233.
# Following adverbs are not epistemic but indicate the way in which things are said.
# 1) actually, admittedly, avowedly, basically, bluntly, briefly, broadly, candidly, 
#    confidentially, factually, figuratively, frankly, generally, honestly, hypothetically, 
#    in effect, in fact, in reality, indeed, literally, metaphorically, naturally, 
#    of course, objectively, personally, really, roughly, seriously, simply, sincerely, 
#    strictly, truly, truthfully.
# 2) bizarrely, commendably, conveniently, curiously, disappointingly, fortunately, funnily, 
#    happily, hopefully, illogically, interestingly, ironically, justifiably, justly, luckily, 
#    oddly, paradoxically, preferably, regretfully, regrettably, sadly, significantly, 
#    strangely, surprisingly, tragically, unaccountably, unfortunately, unhappily, unreasonably

#---------------------------------------------------------------------------------------------------

# The modality() function was tested with BioScope and Wikipedia training data from CoNLL2010 Shared Task 1.
# See for example Morante, R., Van Asch, V., Daelemans, W. (2010): 
# Memory-Based Resolution of In-Sentence Scopes of Hedge Cues
# http://www.aclweb.org/anthology/W/W10/W10-3006.pdf
# Sentences in the training corpus are labelled as "certain" or "uncertain".
# For Wikipedia sentences, 2000 "certain" and 2000 "uncertain":
# modality(sentence) > 0.5 => A 0.70 P 0.73 R 0.64 F1 0.68
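
A sketch of how figures like these (A = accuracy, P = precision, R = recall, F1) could be computed for a threshold on the modality score. It assumes that "certain" is the positive class and that a score above the threshold predicts "certain"; the comment above does not say which class was counted as positive. The scores and labels in the example are invented, not CoNLL-2010 data.

```python
def evaluate(scores, labels, threshold=0.5):
    """Return (accuracy, precision, recall, F1) for: score > threshold => 'certain'."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_certain = score > threshold
        actually_certain = (label == "certain")
        if predicted_certain and actually_certain:
            tp += 1
        elif predicted_certain and not actually_certain:
            fp += 1
        elif not predicted_certain and actually_certain:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Invented example: six modality scores with gold labels.
scores = [1.0, 0.75, 0.6, 0.25, -0.5, 0.9]
labels = ["certain", "certain", "uncertain", "certain", "uncertain", "uncertain"]
accuracy, precision, recall, f1 = evaluate(scores, labels)
print("A %.2f P %.2f R %.2f F1 %.2f" % (accuracy, precision, recall, f1))
```

Re-running such an evaluation with different threshold values is one way to make the "threshold" grey area from the notes visible.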