----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

# Introduction


Training Common Sense (proposed by Manetta Berends, Femke Snelting)
Where and how can we find difference, ambiguity and dissent in pattern-recognition?


"Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves." Chris Anderson (2008) http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

-> This track is co-organised in close collaboration with the Text generation project, http://pad.constantvzw.org/p/text_generation , and will partially overlap with it.

What kinds of assumptions do we encounter when evaluating information from the point of view of an algorithm? In what way does the introduction of pattern-recognition allow (or make impossible) difference, ambiguity and dissent? Through exploring the actual math and processes of pattern-recognition together, and by studying and experimenting with software packages (Pattern, ...), methods and reference libraries (WordNet, ...), we would like to better understand what agency human and computational actors might have in the co-production of 'common sense'.
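To make this concrete: below is a minimal sketch using NLTK's WordNet interface (NLTK is our choice of access route; the text above only names WordNet itself). It lists the senses WordNet assigns to one ambiguous word, which is exactly where ambiguity gets fixed into a closed taxonomy:

```python
# A minimal sketch, assuming NLTK and its WordNet corpus are installed:
#   pip install nltk; then in Python: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# 'bank' is ambiguous; WordNet resolves that ambiguity into a closed,
# numbered list of senses -- a small act of normalization in itself.
for synset in wn.synsets('bank'):
    print(synset.name(), '->', synset.definition())
```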

Pattern-recognition is a method applied in all kinds of data-mining applications. Data mining is an industry aimed at producing predictable, conventional and plausible patterns within a dataset; in other words, it is about avoiding exceptions, uncertainties and surprises. It promises to have overcome ideology and the need for models by letting the data 'speak' for itself, but it relies on the extrapolation of the common sense of human actors (e.g. mining-software developers, designers of mining methods, dataset annotators, ...). Before patterns can be recognized in a set of data, normalization is applied on many interconnected levels: arranging categories, annotating a training set, and comparing against a so-called (preset) Gold Standard are all part of training a mining algorithm, and each of these steps contains acts of normalization. Is the information in such a process evaluated on its regularity, or rather on its average?
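As an illustration of where these acts of normalization sit in code, here is a minimal supervised-classification sketch using scikit-learn (our choice of library, not one named in the text); the example texts and labels are invented:

```python
# A sketch of where human decisions enter a supervised text-mining pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Human act 1: choosing the categories and annotating a training set.
texts  = ["great product, works fine", "terrible, broke after a day",
          "love it", "waste of money"]
labels = [1, 0, 1, 0]          # 1 = positive, 0 = negative -- a human decision

# Human act 2: normalization (here: lowercasing, tokenization) is baked
# into the vectorizer before the algorithm ever sees the data.
vectorizer = CountVectorizer(lowercase=True)
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)

# Human act 3: the 'gold standard' against which the model is scored
# is itself a set of human annotations.
gold_texts, gold_labels = ["works great"], [1]
print(model.score(vectorizer.transform(gold_texts), gold_labels))
```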

Training Common Sense is inspired by discoveries we made during Cqrrelations (January 2015), but we'll focus on pattern-recognition not just for text but also for images, 3D objects etc.



----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

# Knowledge Discovery in Data (KDD) steps

http://pad.constantvzw.org/p/commonsense_kdd_step-1 --> data collection
http://pad.constantvzw.org/p/commonsense_kdd_step-2 --> data preparation
http://pad.constantvzw.org/p/commonsense_kdd_step-3 --> data mining
http://pad.constantvzw.org/p/commonsense_kdd_step-4 --> interpretation
http://pad.constantvzw.org/p/commonsense_kdd_step-5 --> determine actions
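As an illustrative skeleton (not taken from the pads above; all function names and the toy data are our own), the five steps could be chained like this:

```python
# Mapping the five KDD steps onto one toy pipeline.
def collect(sources):            # step 1: data collection
    return [record for source in sources for record in source]

def prepare(records):            # step 2: data preparation (normalization!)
    return [r.strip().lower() for r in records if r]

def mine(records):               # step 3: data mining, here a toy frequency count
    counts = {}
    for r in records:
        counts[r] = counts.get(r, 0) + 1
    return counts

def interpret(patterns):         # step 4: interpretation
    return max(patterns, key=patterns.get)

def act(conclusion):             # step 5: determine actions
    print("most common record:", conclusion)

act(interpret(mine(prepare(collect([["Spam", "ham ", "spam"]])))))
```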


----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

# Text mining application examples


"Online social media such as Facebook are a particularly promising resource for the study of people, as “status” updates are self-descriptive, personal, and have emotional content [7]. Language use is objective and quantifiable behavioral data [96],  and unlike surveys and questionnaires, Facebook language allows  researchers to observe individuals as they freely present themselves in their own words. Differential language analysis (DLA)  in social media is an unobtrusive and non-reactive window into the social and psychological characteristics of people's everyday concerns."

"This method can complement traditional assessments, and can quickly and cheaply assess many people with minimal burden."

"Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions."

---

* Facebook messages --> Gender/Age profiles


* Hedonometer --> http://hedonometer.org/api.html (see the sketch after this list)

* CLiPS --> AMiCA
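Relating to the Hedonometer entry above: the averaging idea behind hedonometer-style scoring is that a text's happiness is the mean of per-word happiness ratings. A sketch with invented ratings (the real instrument uses crowd-rated word lists and excludes near-neutral words):

```python
# Toy per-word happiness ratings; the real list is crowd-sourced.
happiness = {"laughter": 8.5, "rain": 5.0, "funeral": 1.5, "the": 5.0}

def score(text, default=5.0):
    # Unknown words fall back to a neutral default -- another normalization.
    words = text.lower().split()
    return sum(happiness.get(w, default) for w in words) / len(words)

print(score("laughter in the rain"))   # -> 5.875
```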



----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

# Notes:


common sense = "?"


"In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output." (Mitchell, 1980; desJardins and Gordon, 1995). — http://en.wikipedia.org/wiki/Inductive_bias
A trained algorithm has 'learned to predict', which already contains a speculative act within it. What if we use the fact that our predictions don't necessarily need to find a truth in the near future? We could stretch and scale the type of training elements we would like to work with. This element of fiction could help us to show the absurdity of annotating a certain 'truth'/term/concept with 0s or 1s.
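A small sketch of this point: when two (invented) annotators assign 0s and 1s to the same items, their 'truths' diverge, and someone must decide whose labels become the Gold Standard:

```python
# Two annotators labelling the same six items with 0s and 1s.
annotator_a = [1, 0, 1, 1, 0, 1]
annotator_b = [1, 1, 1, 0, 0, 1]

agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
print("raw agreement:", agreement)   # 4/6 -- whose labels count as 'truth'?
```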

* machine-training elements we could replace:
  (that could touch the problem of common sense)


* problems related to the common-sense results:

* data mining methods:
"Several data mining methods are particularly suitable for profiling. For instance, classification and clustering may be used to identify groups. Regression is more useful for making predictions about a known individual or group." Discrimination and Privacy in the Information Society (2013), Bart Custers, Toon Calders, Bart Schermer, Tal Zarsky, (eds.) — page 13 

* "Supervised learning is the machine learning task of inferring a function from labeled training data." The term 'supervised learning' does quite nicely higlight the position of the human in an machine training process. ( http://en.wikipedia.org/wiki/Supervised_learning)

Occam's razor, simplification
Galileo Galilei lampooned the misuse of Occam's razor in his Dialogue. The principle is represented in the dialogue by Simplicio. The telling point that Galileo presented ironically was that if one really wanted to start from a small number of entities, one could always consider the letters of the alphabet as the fundamental entities, since one could construct the whole of human knowledge out of them.



----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

# Links


text-mining software
data-mining training-sets 
data-mining model types

other

texts

datasets

----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------

Subject: [Cqrrelations] The Annotator, report and afterthoughts.
Date: Thu, 07 May 2015 10:44:42 +0200
From: Roel Roscam Abbing <roel@roelroscamabbing.nl>
To: cqrrelations@lists.constantvzw.org


Dear Cqrrelators,

Femke and I finished the report on The Annotator, which we worked on
during Cqrrelations together with a group of Annotators: http://pad.constantvzw.org/p/the_annotator

A few months after Cqrrelations we had digested some impressions and
intuitions and wrote these into the report. The focus of this was the
idea of how 'common sense' is being produced by the self-referential
system of data-selection, the Gold Standard, parsing, desired outcomes
and training. Text-mining requires normalization on all levels of the
process, which for us was exemplified by 'The Removal of Pascal'.

Although the report is a way to round this project up, it is not the
end. Rather, we would see it as a beginning for looking deeper into these
technological processes of normalization. Perhaps Relearn 2015 is a good
opportunity to continue thinking along these lines. So if you are
interested in collaborating on that, please don't hesitate to get in touch!

all the best,

R
-------------------

http://www.imprint.co.uk/data-driven-narcissism-how-will-big-data-feed-back-on-us/
https://en.wikipedia.org/wiki/Abductive_reasoning