General: http://pad.constantvzw.org/p/certainty
Questions: http://pad.constantvzw.org/p/certainty_questions
modality.py close reading: http://pad.constantvzw.org/p/certainty_modality.py_close_reading
Modality paper notes: http://pad.constantvzw.org/public_pad/certainty_notes_Modality-and-Negation

*BIOSCOPE CORPUS*
The BioScope corpus is used to test and/or train (unclear which) the modality.py script, in the context of CoNLL-2010 shared task 1.

links
* Description
CoNLL-2010 shared task 1 description: http://rgai.inf.u-szeged.hu/conll2010st/tasks.html#task1
official bioscope corpus page: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S11-S9
bioscope annotation guidelines: http://rgai.inf.u-szeged.hu/project/nlp/bioscope/Annotation%20guidelines2.1.pdf
paper about the annotation of the bioscope corpus: http://www.clips.ua.ac.be/NeSpNLP2010/nespnlp2010-proceedings.pdf#page=40
Made in 2008
"This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus)... The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope."

"The annotation process was carried out by two independent linguist annotators and a chief linguist – also responsible for setting up the annotation guidelines – who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20.000 sentences that were considered for annotation and over 10% of them actually contain one (or more) linguistic annotation suggesting negation or uncertainty. ... The corpus consists of texts taken from 4 different sources and 3 different types in order to ensure that it captures the heterogeneity of language use in the biomedical domain. We decided to add clinical free-texts (radiology reports), biological full papers and biological paper abstracts (texts from Genia)."

"Apart from the intended goal of serving as a common resource for the training, testing and comparing of biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts."
see also paper as pdf: http://rgai.inf.u-szeged.hu/project/nlp/bioscope/bioscope_cameraready.pdf

* Downloads
-> convert xml to csv: http://askubuntu.com/questions/174143/convert-xml-to-csv-shell-command-line
annotated bioscope corpus - abstracts only: http://rgai.inf.u-szeged.hu/project/nlp/bioscope/abstracts_pmid.xml (annotated for negation and speculation cues (at token level) and for the linguistic scope of these cues (at sentence level))
annotated bioscope corpus - full articles: http://rgai.inf.u-szeged.hu/project/nlp/bioscope/full_papers.xml (annotated for negation and speculation cues (at token level) and for the linguistic scope of these cues (at sentence level))
a sample of the bioscope dataset that is used for task 1 of the CoNLL-2010 competition: http://rgai.inf.u-szeged.hu/conll2010st/trial_Task1.zip (each sentence is annotated as certain or uncertain; specific cues are marked only in the uncertain sentences) --> is this a test set or a training set? can be both :-)

example of the annotated corpus (from the 'abstracts only' version):
*<sentence id="S1.6">When U937 cells were infected with HIV-1, <xcope id="X1.6.3"><cue type="negation" ref="X1.6.3">no</cue> induction of NF-KB factor was detected</xcope>, whereas high level of progeny virions was produced, <xcope id="X1.6.2"><cue type="speculation" ref="X1.6.2">suggesting</cue> that this factor was <xcope id="X1.6.1"><cue type="negation" ref="X1.6.1">not</cue> required for viral replication</xcope></xcope>.</sentence>
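
To get a feel for how the cue and scope annotation fits together, here is a minimal Python sketch that reads the example sentence above and pairs each cue with the text of the xcope its ref attribute points to. It is not taken from modality.py. The function name extract_cues and the SAMPLE constant are made up for illustration, and the closing </sentence> tag is added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# The example sentence from the 'abstracts only' file, with a closing
# </sentence> tag added (assumption) so it parses as a standalone fragment.
SAMPLE = (
    '<sentence id="S1.6">When U937 cells were infected with HIV-1, '
    '<xcope id="X1.6.3"><cue type="negation" ref="X1.6.3">no</cue> '
    'induction of NF-KB factor was detected</xcope>, whereas high level of '
    'progeny virions was produced, <xcope id="X1.6.2">'
    '<cue type="speculation" ref="X1.6.2">suggesting</cue> that this factor '
    'was <xcope id="X1.6.1"><cue type="negation" ref="X1.6.1">not</cue> '
    'required for viral replication</xcope></xcope>.</sentence>'
)


def extract_cues(sentence_xml):
    """Return (cue_type, cue_text, scope_text) for every cue in one sentence.

    extract_cues is a hypothetical helper name, not part of any library.
    """
    sentence = ET.fromstring(sentence_xml)
    # Index every scope by its id so that a cue's ref can be looked up.
    scopes = {x.get("id"): x for x in sentence.iter("xcope")}
    rows = []
    for cue in sentence.iter("cue"):
        scope = scopes.get(cue.get("ref"))
        # itertext() flattens nested scopes and cues into plain text.
        scope_text = "".join(scope.itertext()) if scope is not None else ""
        rows.append((cue.get("type"), cue.text, scope_text))
    return rows


if __name__ == "__main__":
    for cue_type, cue_text, scope_text in extract_cues(SAMPLE):
        print(cue_type, "|", cue_text, "|", scope_text)
```

Note that scopes can be nested: the scope of "suggesting" contains the whole scope of "not", so the same words can fall under a speculation cue and a negation cue at once.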

example of the CoNLL sample dataset:
*<sentence id="S7.105" certainty="uncertain">To distinguish which tissues require ADGF-A expression for proper development, we <ccue>tested</ccue> for rescue of adgf-a lethality by expressing ADGF-A in specific subsets of larval tissues.</sentence>
*<sentence id="S7.205" certainty="certain">We produced a loss-of-function mutation in the ADGF-A gene, which produces a product (ADGF-A) with ADA activity.</sentence>
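
As an alternative to the shell route for xml to csv, a rough Python sketch of how sentences in this format could be flattened into CSV rows (id, certainty label, marked cues, plain text). Assumptions: the <sentences> wrapper element, the column names and the function names sentences_to_rows and rows_to_csv are invented for this example; the real trial file may nest its sentences differently, which is why the code searches for sentence elements at any depth.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The two example sentences above, wrapped in a made-up <sentences> root
# (assumption) so that the snippet is a well-formed document.
SAMPLE = """<sentences>
<sentence id="S7.105" certainty="uncertain">To distinguish which tissues require ADGF-A expression for proper development, we <ccue>tested</ccue> for rescue of adgf-a lethality by expressing ADGF-A in specific subsets of larval tissues.</sentence>
<sentence id="S7.205" certainty="certain">We produced a loss-of-function mutation in the ADGF-A gene, which produces a product (ADGF-A) with ADA activity.</sentence>
</sentences>"""

FIELDS = ["id", "certainty", "cues", "text"]  # hypothetical column names


def sentences_to_rows(xml_text):
    """Yield one dict per <sentence> element, found at any nesting depth."""
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        yield {
            "id": sentence.get("id"),
            "certainty": sentence.get("certainty"),
            # Several cues in one sentence are joined with "|".
            "cues": "|".join(c.text or "" for c in sentence.iter("ccue")),
            # itertext() drops the <ccue> tags but keeps their words.
            "text": "".join(sentence.itertext()),
        }


def rows_to_csv(rows):
    """Serialise the row dicts to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(sentences_to_rows(SAMPLE)))
```

The csv module quotes the text column automatically, which matters here because the sentences themselves contain commas.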