Notes: Modality and Negation: An Introduction to the Special Issue (2012)
by Roser Morante, University of Antwerp & Caroline Sporleder, Saarland University
http://www.anthology.aclweb.org/J/J12/J12-2001.pdf

“Certainty”? & “Modality”?
*Proposition: the container of truth or falsity value of declarative sentences. [...]
*Propositional aspects of meaning: elements in sentences that are presented as factual.
*Extra-propositional aspects of meaning:
  *“a further step towards text understanding” (223)
  *linguistic constructions that give an indication of the degree of commitment of the speaker to the truth of a proposition.
  *“there is more to meaning than just propositional content is a long-held view” (223)
  *the attitude of the speaker towards her statements in terms of degree of certainty, reliability, subjectivity, sources of information, and perspective. (225)
*Epistemic modality:
  *“expresses the speaker’s degree of commitment to the truth of a proposition”. (http://www.aclweb.org/anthology/W/W10/W10-3006.pdf)
  *Epistemic modals are used to indicate the possibility or necessity of some piece of knowledge. (Wikipedia, Linguistic_modality)

Traditionally, most research in NLP has focused on propositional aspects of meaning. To truly understand language, however, extra-propositional aspects are equally important. Modality and negation typically contribute significantly to these extra-propositional meaning aspects. Researchers have started to work on modeling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we will provide an overview of how modality and negation have been modeled in computational linguistics.

1.
Introduction

grammatical phenomena

One of the first categorizations of modality is proposed by Otto Jespersen (1924, The Philosophy of Grammar) in the chapter about Mood, where the grammarian distinguishes between “categories containing an element of will” and categories “containing no element of will.”
*from The Philosophy of Grammar:
  *Would it be possible to place all "moods" in a logically consistent system? This was attempted by grammarians more than a hundred years ago on the basis of first Wolff's and then Kant's philosophy. The former in his Ontology had the three categories, possibility, necessity and contingency, and the latter under the head of "modality" the three of possibility, existence, and necessity; Gottfried Hermann then gave the further subdivisions: objective possibility (conjunctive), subjective possibility (optative), objective necessity (Greek verbal adjectives in -teos) and subjective necessity (imperative). (Jespersen, 1924)

Extra-propositional meanings added to the event LAY OFF(GM,workers):
*a. GM will lay off workers.
*b. A spokesman for GM said GM will lay off workers.
*c. GM may lay off workers.
*d. The politician claimed that GM will lay off workers.
*e. Some wish GM would lay off workers.
*f. Will GM lay off workers?
*g. Many wonder whether GM will lay off workers.

Generally speaking, modality is a grammatical category that allows the expression of aspects related to the attitude of the speaker towards her statements in terms of degree of certainty, reliability, subjectivity, sources of information, and perspective. We understand modality in a broad sense, which involves related concepts like “subjectivity”, “hedging”, “evidentiality”, “uncertainty”, “committed belief,” and “factuality”.
So far, computational linguistics has addressed two main tasks:
*detecting modality
  *lexical based, but the lexical markers are varied/heterogeneous
    *for example: 'might', 'this brings us to the largest of all mysteries', 'little was known'
  *interacts with mood and tense markers
  *and so do discourse factors(?) (224)
*the resolution of the scope of modality

Modality recognition is used for:
*textual entailment (meaningful relations)
*machine translation
*trustworthiness detection
*classification of citations
*clinical and biomedical text processing
*identification of text structure

Most of the work in this area has been carried out at the sentence or predicate level.

2. Modality

From a theoretical perspective, modality can be defined:
*as a philosophical concept,
*as a subject of the study of logic,
*as a grammatical category.

...modality is a big intrigue. Questions erstwhile considered solved become open questions again. New observations and hypotheses come to light, not least because the subject matter is changing. (Salkie, Busuttil, and van der Auwera 2009, page 7)

To mention some examples, research focuses on
*categorizing modality,
*on committed belief tagging,
*on resolving the scope of hedge cues,
*on detecting speculative language,
*and on computing factuality.
These concepts are related to the attitude of the speaker towards her statements in terms of degree of
*certainty,
*reliability,
*subjectivity,
*sources of information,
*and perspective

Theoretical linguistic background of 'modality':
*Jespersen (1924, page 329) attempts to place all moods in a logically consistent system, distinguishing between “categories containing an element of will” and “categories containing no element of will”
*Lyons (1977, page 793) describes epistemic modality as concerned with matters of knowledge and belief,
  *“the speaker’s opinion or attitude towards the proposition that the sentence expresses or the situation that the proposition describes.”
*Palmer (1986, page 8) distinguishes propositional and event modality
  *propositional modality
    *Kate must be at home now.
    *the speaker’s attitude to the truth-value or factual status of the proposition
    *divided up in:
      *epistemic, used by speakers “to express their judgement about the factual status of the proposition,”
      *evidential, used “to indicate the evidence that they have for its factual status” (Palmer 1986, 8–9).
  *event modality
    *Kate must come in now.
    *events that are not actualized, events that have not taken place but are merely potential
    *divided up in:
      *deontic, which relates to obligation or permission and to conditional factors “that are external to the relevant individual,”
      *dynamic, where the factors are internal to the individual (Palmer 1986, pages 9–13).

Additionally, Palmer indicates other categories that may be marked as irrealis and may be found in the mood system:
*future
*negative
*interrogative
*imperative-jussive
*presupposed
*conditional
*purposive
*resultative
*wishes
*fears

According to von Fintel (2006), modality as a philosophical category deals with
*possibility
*necessity

The term hedging is originally due to Lakoff (1972, page 195), who describes hedges as “words whose job is to make things more or less fuzzy.”(...)
Lakoff starts from the observation that “natural language concepts have vague boundaries and fuzzy edges and that, consequently, natural language sentences will very often be neither true, nor false, nor nonsensical, but rather true to a certain extent and false to a certain extent, true in certain aspects and false in certain aspects” (Lakoff 1972, page 183). In order to deal with this aspect of language, he extends the classical propositional and predicate logic to fuzzy logic and focuses on the study of hedges. (227)

Hyland (1998) studies hedging in scientific texts. He proposes a pragmatic classification of hedge expressions based on an exhaustive analysis of a corpus. The catalogue of hedging cues includes modal auxiliaries, epistemic lexical verbs, epistemic adjectives, adverbs, nouns, and a variety of non-lexical cues.

Certainty is a type of subjective information that can be conceived of as a variety of epistemic modality (Rubin, Liddy, and Kando 2005). Here we take their definition (page 65):
. . . certainty is viewed as a type of subjective information available in texts and a form of epistemic modality expressed through explicitly-coded linguistic means (what are linguistic means?). Such devices [...] explicitly signal presence of certainty information that covers a full continuum of writer’s confidence, ranging from uncertain possibility and withholding full commitment to statements.

Modality and evidentiality are grammatical categories, whereas certainty, hedging, and subjectivity are pragmatic positions, and event factuality is a level of information. (228)

Modality-related phenomena are not rare.
*11% of sentences in MEDLINE contain speculative language. (According to Light, Qiu, and Srinivasan (2004))
*around 18% of sentences occurring in biomedical abstracts are speculative. (Vincze et al. (2008) report)
*20% of the events in a biomedical corpus belong to speculative sentences and 7% of the events are expressed with some degree of speculation.
(Nawaz, Thompson, and Ananiadou (2010))
*a significant proportion of the gene names mentioned in a corpus of biomedical articles appear in speculative sentences (638 occurrences out of a total of 1,968). This means that approximately 1 in every 3 genes should be excluded from the interaction detection process. (Szarvas (2008))
*59% of the sentences in a corpus of 80 articles from The New York Times were identified as epistemically modalized. (Rubin (2006))

4. Categorizing and Annotating Modality and Negation

categorization schemes
annotated corpora

'modality attributes': OntoSem project (Nirenburg and Raskin 2004):
modality type
*polarity - whether a proposition is positive or negated
*volition - the extent to which someone wants or does not want the event/state to occur
*obligation - the extent to which someone considers the event/state to be necessary
*belief - the extent to which someone believes the content of the proposition
*potential - the extent to which someone believes that the event/state is possible
*permission - the extent to which someone believes that the event/state is permitted
*evaluative - the extent to which someone believes the event/state is a good thing
value
*0-1
scope
*predicate that is affected by the modality of the sentence
attributed-to
*to whom the modality is assigned (default = speaker)

(9) Entrance to the tower should be totally camouflaged

In Example (9), should is identified as a modality cue and characterized with:
*type obligative,
*value 0.8,
*scope camouflage,
*and is attributed to the speaker.
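
To make the four OntoSem attributes concrete for myself, here is a minimal sketch of how the annotation of Example (9) could be held in a record. This is not OntoSem's actual format: the class name `ModalityAnnotation`, the field names, and the validation rules are my own assumptions, and I map the "obligative" label of the example onto the "obligation" type from the list above.

```python
from dataclasses import dataclass

# Modality types as listed in these notes for the OntoSem scheme.
MODALITY_TYPES = frozenset({
    "polarity", "volition", "obligation", "belief",
    "potential", "permission", "evaluative",
})


@dataclass(frozen=True)
class ModalityAnnotation:
    """One modality annotation; class and field names are hypothetical."""
    cue: str                        # word that signals the modality
    modality_type: str              # one of MODALITY_TYPES
    value: float                    # strength on the 0-1 scale
    scope: str                      # predicate affected by the modality
    attributed_to: str = "speaker"  # default holder, as in the notes

    def __post_init__(self) -> None:
        # Reject types outside the list and values outside the 0-1 range.
        if self.modality_type not in MODALITY_TYPES:
            raise ValueError(f"unknown modality type: {self.modality_type!r}")
        if not 0.0 <= self.value <= 1.0:
            raise ValueError(f"value must lie in [0, 1], got {self.value}")


# Example (9): "Entrance to the tower should be totally camouflaged"
example_9 = ModalityAnnotation(
    cue="should", modality_type="obligation", value=0.8, scope="camouflage"
)
print(example_9)
```

Leaving `attributed_to` out gives the default holder (the speaker), which matches the "default = speaker" remark above.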
MPQA Opinion Corpus (Wiebe, Wilson, and Cardie 2005)
10,657 sentences in 535 documents of English newswire
private state frames
*SOURCE of the private state, whose private state is being expressed;
*the TARGET, what the private state is about;
properties
*INTENSITY
*SIGNIFICANCE
*TYPE OF ATTITUDE

Automatic Content Extraction 2008 corpus (Linguistic Data Consortium 2008)
English and Arabic texts from a variety of resources including radio and TV broadcast news, talk shows, newswire articles, Internet news groups, Web logs, and conversational telephone speech
*modality
  *asserted
    *Asserted relations pertain to situations in the real world
    *If the entities constituting the arguments of a relation are hypothetical, then the relation can still be understood as asserted
    *We are afraid Al-Qaeda terrorists will be in Baghdad. --> relation between Al-Qaeda and terrorists
  *other
    *other relations pertain to situations in “some other world defined by counterfactual constraints elsewhere in the context.”
    *We are afraid Al-Qaeda terrorists will be in Baghdad. --> unsure relation between terrorists and Baghdad
*tense
  *past
  *future
  *present
  *unspecified

TimeML (Pustejovsky et al. 2005)
language for events and temporal expressions, used for TimeBank
Situation Selecting Predicates (SSPs):
*actions
  *Companies such as Microsoft or a combined world com MCI are trying to monopolize Internet access.
*states
  *Analysts also suspect suppliers have fallen victim to their own success.
*perception
  *Some neighbors told Birmingham police they saw a man running.
*reporting
  *No injuries were reported over the weekend.

FactBank: (Saurí and Pustejovsky 2009)
a corpus of events annotated with factuality information, by using the Square of Opposition (Aristotle)
degrees of factuality:
*fact
*counterfact
*probable
*not probable
*possible
*not certain
*certain but unknown output
*unknown or uncommitted
--> but in the paper there is another set of categories
*certain
*not certain
*possible
*impossible

modality lexicon (Baker et al., 2010) http://www.umiacs.umd.edu/~bonnie/ModalityLexicon.txt
to automatically annotate a corpus with modality information
the lexicon entries structure:
*cue sequence of modal words
*POS for each word
*modality type
*a head word
*one or more subcategorization codes
three components are identified in a sentence:
*trigger - the word or sequence of words that expresses modality
*target - the event, state, or relation that the modality scopes over
*holder - the experiencer or cognizer of the modality
eight modalities:
*requirement - (does H require P?)
*permissive - (does H allow P?)
*success - (does H succeed in P?)
*effort - (does H try to do P?)
*intention - (does H intend P?)
*ability - (can H do P?)
*want - (does H want P?)
*belief - (with what strength does H believe P?)

The annotation work by Wilbur, Rzhetsky, and Shatkay (2006) is motivated by the need to identify and characterize parts of scientific documents where reliable information can be found. They define five dimensions to characterize scientific sentences:
*FOCUS (scientific versus general)
*POLARITY (positive versus negative statement)
*LEVEL OF CERTAINTY in the range 0–3
*STRENGTH of evidence
*DIRECTION / TREND (increase or decrease in certain measurement).

Scientific language makes use of speculation and hedging to express lack of definite belief.

5.
Detection of Speculative Sentences

Three types of text analysis seem to be able to detect speculation:

From the research presented in this section it seems that classifying sentences as to whether they are speculative or not can be performed by using knowledge-poor machine learning approaches as well as by linguistically motivated methods. It has also been shown that it is feasible to build a hedge classifier in an unsupervised manner. (241)

10. Final Remarks

Open question: which aspects of extra-propositional meaning need to be modeled for which applications. Outside sentiment analysis, relatively little research has been carried out in this area so far.

Most research so far has been carried out on English and on selected domains and genres (biomedical, reviews, newswire). It would also be good to broaden the set of domains and genres (including fiction, scientific texts, weblogs, etc.) since extra-propositional meaning is particularly susceptible to domain and genre effects.