commonsense_kdd

----------------------------------------------------------------------------

# KDD step 2: data preparation

syntax:

tokenization/normalization (98%)*
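the simplest step, but an important one:
identifying the units (tokens) in your text
- deciding how to read the punctuation, e.g.:
  - the period in "dr." belongs to an abbreviation
  - the period in "This is a sentence." ends the sentence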
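
A minimal sketch of the punctuation problem above, using only the Python standard library. The abbreviation list and the `tokenize` helper are hypothetical and deliberately tiny; this is not how Pattern tokenizes, just an illustration of why "dr." and "sentence." need different treatment.

```python
import re

# Hypothetical, deliberately tiny abbreviation list; real tokenizers ship much larger ones.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e."}

def tokenize(text):
    """Split text into word and punctuation tokens, keeping known abbreviations whole."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            # The period is part of the abbreviation, not a sentence boundary.
            tokens.append(chunk)
        else:
            # Separate runs of word characters from individual punctuation marks.
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens

print(tokenize("dr. Smith is here. This is a sentence."))
```

Here "dr." stays one token, while the periods after "here" and "sentence" become tokens of their own.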

lemmatization:

reduce word forms to their dictionary form (lemma)
- is/been/was/be
  - --> all belong to 'to be'
- also: plurals --> singulars
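
The two cases above (irregular verb forms and plurals) can be sketched as a lookup table plus a naive suffix rule. Both the `IRREGULAR` table and the `lemmatize` helper are assumptions made for illustration; a real lemmatizer relies on a full lexicon and the part of speech of each word.

```python
# Hypothetical lookup table for irregular forms; a real lemmatizer uses a full lexicon.
IRREGULAR = {"is": "be", "was": "be", "been": "be", "are": "be", "were": "be", "am": "be"}

def lemmatize(word):
    """Return a rough dictionary form for a single English word."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Naive plural rules (assumption: the word is a regular English noun).
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

print([lemmatize(w) for w in ["is", "been", "was", "be", "banks", "cities"]])
```

The suffix rule is intentionally crude: it would wrongly shorten a word like "this", which is why real systems check the part of speech first.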

syntactical:

part-of-speech tagging
- important elements for objective text-mining
  - --> nouns
- for subjective text-mining
  - --> adjectives
word sense disambiguation
- bank / bank
  - --> river bank / money bank
semantic role labeling
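
Once a text is part-of-speech tagged, selecting nouns or adjectives is a simple filter on the tags. The sketch below assumes Penn Treebank tags (NN* for nouns, JJ* for adjectives) and a hand-tagged example sentence; in practice a tagger would produce the tags, and `words_with_tag_prefix` is a hypothetical helper name.

```python
# Hand-tagged example sentence (Penn Treebank tags); a tagger would normally produce this.
TAGGED = [("The", "DT"), ("new", "JJ"), ("camera", "NN"), ("takes", "VBZ"),
          ("beautiful", "JJ"), ("pictures", "NNS")]

def words_with_tag_prefix(tagged, prefix):
    """Keep the words whose tag starts with the given prefix, e.g. 'NN' or 'JJ'."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_tag_prefix(TAGGED, "NN")       # what the text is about
adjectives = words_with_tag_prefix(TAGGED, "JJ")  # how it is described
print(nouns, adjectives)
```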

pragmatics: (?)

named entity recognition
co-reference resolution (50%)*
- <-- meaning output

*(% refers to accuracy)
from : CLiPS Guy de Pauw, Pattern workshop — Cqrrelations, January 2015

----------------------------------------------------------------------------

pattern.en |es|de|fr|it|nl

- text preparation
- sentiment analysis tool
- WordNet interface
- wordlists interface

pattern.search

- a pattern matching system, similar to regular expressions, that can be used to search a string by syntax (word function) or by semantics (word meaning).
- e.g.: '{NP} be * than {NP}'
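
To make the idea of searching by syntax concrete, here is a toy matcher written from scratch with plain Python. It is not Pattern's implementation or API: the sentence is annotated by hand, and the names `match_at`, `search`, `SENTENCE` and `PATTERN` are hypothetical. It only shows how a pattern like '{NP} be * than {NP}' could be matched against chunks and lemmas, with the two {NP} groups captured.

```python
def match_at(pattern, items, i=0, j=0, groups=()):
    """Try to match pattern[i:] against items[j:]; return captured groups or None."""
    if i == len(pattern):
        return list(groups)
    kind, value, capture = pattern[i]
    if kind == "any":
        # '*' may consume zero or more items: try every possible length (backtracking).
        for k in range(j, len(items) + 1):
            result = match_at(pattern, items, i + 1, k, groups)
            if result is not None:
                return result
        return None
    if j == len(items) or items[j][kind] != value:
        return None
    new_groups = groups + (items[j]["text"],) if capture else groups
    return match_at(pattern, items, i + 1, j + 1, new_groups)

def search(pattern, items):
    """Return the captured groups of the first match at any start position, else None."""
    for start in range(len(items)):
        result = match_at(pattern, items, 0, start)
        if result is not None:
            return result
    return None

# Hand-annotated chunks for "The cat is bigger than the mouse" (assumed annotations).
SENTENCE = [
    {"text": "The cat", "chunk": "NP", "lemma": "the cat"},
    {"text": "is", "chunk": "VP", "lemma": "be"},
    {"text": "bigger", "chunk": "ADJP", "lemma": "big"},
    {"text": "than", "chunk": "PP", "lemma": "than"},
    {"text": "the mouse", "chunk": "NP", "lemma": "the mouse"},
]

# '{NP} be * than {NP}' as (field to test, expected value, capture?) triples.
PATTERN = [("chunk", "NP", True), ("lemma", "be", False), ("any", None, False),
           ("lemma", "than", False), ("chunk", "NP", True)]

print(search(PATTERN, SENTENCE))
```

Matching on the lemma "be" is what lets one pattern cover "is", "was" and "been", which is why lemmatization comes before this step.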