----

TITLE 
Conscients: Hans, Manetta, Femke (Roel)


Text generation project
DESCRIPTION

more: http://pad.constantvzw.org/p/text_generation


Training Common Sense
Where and how can we find difference, ambiguity and dissent in pattern-recognition?

"Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves." Chris Anderson (2008) http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

-> This track is co-organised in close collaboration with the Text generation project [LINK], and will partially overlap.

What kind of assumptions do we encounter when evaluating information from the point of view of an algorithm? In what way does the introduction of pattern-recognition allow (or make impossible) difference, ambiguity and dissent? By exploring the actual math and processes of pattern-recognition together, and by studying and experimenting with software packages (Pattern, ...), methods and reference libraries (WordNet, ...), we would like to better understand what agency human and computational actors might have in the co-production of 'common sense'.

Pattern-recognition is a method applied in all kinds of data-mining applications, in an industry aimed at producing predictable, conventional and plausible patterns within a dataset. In other words, it is about avoiding exceptions, uncertainties and surprises. It promises to have overcome ideology and the need for models by letting the data 'speak' for itself, but it relies on the extrapolation of the common sense of human actors. Before patterns can be recognized in a set of data, normalization is applied on many interconnected levels: while arranging categories, while annotating a training set, and while comparing results to a so-called (preset) 'Gold Standard'. All of these steps, through which mining algorithms are trained, contain acts of normalization. Is the information in such a process valued for its regularity, or rather for its average?
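As a deliberately tiny illustration of how annotation is extrapolated into 'common sense', here is a sketch in Python. The annotated examples and the word-counting scheme are invented for illustration; they are not taken from Pattern or any of the other packages mentioned.

```python
from collections import Counter

# Hypothetical miniature of a training pipeline: the "annotations"
# below stand in for human labelling work; real datasets contain
# thousands of such judgements.
annotated = [
    ("great product works fine", "positive"),
    ("love it great value", "positive"),
    ("broken on arrival terrible", "negative"),
    ("terrible waste of money", "negative"),
]

def normalize(text):
    # One of the many acts of normalization: lowercase, split on whitespace.
    return text.lower().split()

# "Training": count how often each word co-occurs with each label.
counts = {"positive": Counter(), "negative": Counter()}
for text, label in annotated:
    counts[label].update(normalize(text))

def classify(text):
    # Score a new text by how strongly the annotators associated
    # its words with each label; ties fall back to the first label.
    scores = {label: sum(c[w] for w in normalize(text))
              for label, c in counts.items()}
    return max(scores, key=scores.get)

print(classify("great value"))       # extrapolates the annotators' judgements
print(classify("terrible product"))
```

The point of the sketch is that the classifier has no access to meaning at all; it only redistributes the annotators' decisions over new inputs.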

Training Common Sense is inspired by discoveries we made during Cqrrelations (January 2015), but we'll focus on pattern-recognition, not just for text but also for images, 3D objects, etc.

more: http://pad.constantvzw.org/p/commonsense


----------------------------



Text generation project (proposed by Hans L.).

Language practices are to a large part ritualistic. That aspect also makes them predictable and open to automation. An interesting question is what happens to such language practices when they indeed get automated. Before automation they have some magic-like qualities, in the sense that words can create realities or make things happen. Do they retain this quality when they become algorithmically reproducible? What qualitative change do such language practices undergo when automated?
Do they become nonsense? Famous is Alan Sokal's hoax article, which got published in an academic journal even though it was deliberately composed as nonsense. The aim was to show that postmodern language was nonsense, as it was impossible to discern meaningful content from nonsense. Nowadays, articles on computer science or mathematics can also be produced automatically, and sometimes get published in academic journals.
In fact, a lot of legal and administrative language practices are already automated: when you buy something on the internet, sign a license agreement, and so on. The financial markets thrive on automated trade contracting, as with flash trading. Examples of automated administrative practices can be found in all sorts of e-government. So what happens when you start to automate your side of the relation and automatically produce grant and tender applications, file requests for permits, and so on?
Ideological language is also very ritualistic. In Everything Was Forever, Until It Was No More: The Last Soviet Generation, Alexei Yurchak describes the production of political speeches, which were constructed from a set of citations of earlier official texts and developed into a speech industry of its own. Any meaningful content was avoided in favour of a hegemony of the form. We can ask how far the hegemony of the frame creates a similar situation in some of our current political speech. Can we investigate this by automating it? What would an attempt to automate this form of speech reveal about this language practice? And in reverse: does the normalizing of language allow its automation, and what effect does that have on the language practice itself?

This project is about trying to construct our own text generator(s). The idea is to look at several existing text generators, their code, and how language has been modelled in them. An interesting example is SCIgen, the generator of academic computer-science articles. On http://pdos.csail.mit.edu/scigen/ you can try it out and find links to its code and to other text generators. Other examples are the Dada Engine http://dev.null.org/dadaengine/, used for postmodern articles in http://www.elsewhere.org/pomo/ and erotic texts in http://xwray.com/fiftyshades. Further: http://thatsmathematics.com/mathgen/, http://projects.haykranen.nl/markov/, http://rubberducky.org/cgi-bin/chomsky.pl, https://twitter.com/letkanyefinish, and more can be found.
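To give a taste of the simplest of these methods, here is a minimal word-level Markov chain generator in Python, in the spirit of http://projects.haykranen.nl/markov/. The corpus and parameters are invented for illustration.

```python
import random
from collections import defaultdict

# Toy corpus; a real generator would be fed a large body of text.
corpus = ("the point is they do it and we can track it "
          "the point is the numbers speak for themselves")

def build_chain(words, order=1):
    # Map each sequence of `order` words to the words observed after it.
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=10, seed=None):
    # Start from a random state and keep sampling observed followers.
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    order = len(key)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: no observed continuation
            break
        out.append(rng.choice(followers))
    return " ".join(out)

words = corpus.split()
chain = build_chain(words)
print(generate(chain, seed=0))
```

Raising `order` makes the output more faithful to the corpus and less surprising, which is exactly the trade-off the workshop could explore.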
Based on the methods used in these text generators, and on other methods proposed in the literature, we can try to develop our own generators and explore their uses. This project has a strong coding part, but people without a coding background can also participate: by constructing corpora of texts, by drafting templates and text structures for use with these generators, or by developing uses and projects for such text generators.
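The templates and text structures that non-coders could draft can be sketched as a recursive grammar, in the style of the Dada Engine. The grammar below is our own made-up example, not taken from any of the linked projects.

```python
import random

# Toy grammar: UPPERCASE symbols are rewritten until only words remain.
grammar = {
    "SENTENCE": ["the NOUN VERB the NOUN"],
    "NOUN": ["algorithm", "dataset", "annotator", "pattern"],
    "VERB": ["normalizes", "predicts", "validates"],
}

def expand(symbol, rng):
    # Terminals (plain words) are returned as-is; non-terminals pick
    # a random production and expand each of its tokens in turn.
    if symbol not in grammar:
        return symbol
    production = rng.choice(grammar[symbol])
    return " ".join(expand(tok, rng) for tok in production.split())

print(expand("SENTENCE", random.Random(42)))
```

Drafting new productions is pure text work, which is what makes this style of generator accessible to participants who do not code.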

The results of experiments in automated text generation are an artistic research in themselves and can raise a lot of questions about the status of text and language. Further artistic use of text generators can be aimed directly at literary texts, automated theatre on Twitter (tweatre?), ... They can also be used as building blocks for artistic and activist intervention in political, administrative and social practices: from the automated filing of all sorts of requests, via automated artistic responses on social media, to a qualitative upgrade of the noble art of spamming. Text generation can be integrated with other code, like text analysis of social media to guide responses, or text-to-speech for automated speeches.
These wider uses are probably too ambitious to develop at Relearn. More realistic goals are to develop a simple framework which can be used by a lot of people, to do some basic experiments, and to further develop and share a lot of ideas for its use.


Training common sense


Data-mining is an industry aimed at producing predictable, conventional and plausible results. In other words it is about avoiding exceptions, uncertainties and surprises. It promises to have overcome ideology and the need for models, but relies on the extrapolation of the 'common sense' of human actors.

For this to work, normalization is applied on many interconnected levels. The available dataset needs to be aligned with the desired outcome (or the desired outcome needs to be aligned with the available sources), and a so-called 'Gold Standard' validates the training data, while the training data is in turn used to validate the 'Gold Standard'. Available sources include, for example, online reviews of goods; desired outcomes include sentiment analysis of what people think of products.
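The role of the Gold Standard can be sketched in a few lines: trained output is simply scored against hand-annotated reference labels, and the agreement rate becomes the measure of 'correctness'. The labels below are invented for illustration.

```python
# Hand-annotated reference labels (the 'Gold Standard') versus the
# output of a trained system; both lists are made up for this sketch.
gold      = ["positive", "negative", "positive", "negative", "positive"]
predicted = ["positive", "negative", "negative", "negative", "positive"]

# Accuracy: the share of items where the system agrees with the annotators.
agreements = sum(g == p for g, p in zip(gold, predicted))
accuracy = agreements / len(gold)
print(f"accuracy against the gold standard: {accuracy:.0%}")  # 80%
```

Note that the score says nothing about the world, only about agreement with the annotators: the Gold Standard is itself a product of the same human common sense it is used to validate.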

The Relearn thread we would like to propose focuses on the process of 'training' that is part of most data-mining practices. As far as we understand by now, this process includes human annotation, which is extrapolated to validate larger datasets.

In what way does the nature of this process allow (or make impossible) difference, ambiguity and dissent? By exploring the actual math and processes of data-mining together, and by studying and experimenting with reference libraries and technologies such as WordNet, we would like to better understand what agency human and computational actors might have in the co-production of 'common sense'.
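To get a feeling for how a reference library like WordNet encodes taxonomic decisions, here is a hand-made miniature of its hypernym ('is-a') hierarchy. The dictionary below is our own toy example, not actual WordNet data (for the real thing, see nltk.corpus.wordnet).

```python
# Hypothetical miniature taxonomy: each word points to its hypernym.
# Every arrow is a classification decision someone made.
hypernym = {
    "poodle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}

def hypernym_chain(word):
    # Walk up the taxonomy until we reach a root term.
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

print(" -> ".join(hypernym_chain("poodle")))
# poodle -> dog -> canine -> mammal -> animal
```

When a mining algorithm consults such a hierarchy, it inherits all of these decisions; this is one of the places where 'common sense' is baked in before any training happens.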

Prepared by: (Roel), Manetta, Femke


common sense = "?"


"In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output." (Mitchell, 1980; desJardins and Gordon, 1995). — http://en.wikipedia.org/wiki/Inductive_bias
A trained algorithm has 'learned to predict', which already contains a speculative act. What if we use the fact that our predictions don't necessarily need to find a truth in the near future? We could stretch and scale the type of training elements we would like to work with. This element of fiction could help us show the absurdity of annotating a certain 'truth'/term/concept with 0's and 1's.
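The point that a trained algorithm merely extrapolates its annotations can be made concrete: relabel the same data and the same procedure 'learns' the opposite truth. The texts and labels below are invented for this sketch.

```python
# Four texts, annotated twice with 0's and 1's by two imaginary annotators.
texts = ["sunny day", "rainy day", "sunny week", "rainy week"]

def train(labels):
    # Associate each word with every label it was annotated with.
    votes = {}
    for text, label in zip(texts, labels):
        for word in text.split():
            votes.setdefault(word, []).append(label)
    return votes

def predict(votes, text):
    # Predict by majority vote over the labels attached to the words.
    collected = [l for w in text.split() for l in votes.get(w, [])]
    return max(set(collected), key=collected.count)

votes_a = train([1, 0, 1, 0])   # annotator A decides: sunny = 1
votes_b = train([0, 1, 0, 1])   # annotator B decides: sunny = 0
print(predict(votes_a, "sunny"), predict(votes_b, "sunny"))  # 1 0
```

Nothing in the data changed between the two runs; only the annotation did, and with it the 'truth' the algorithm predicts.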

* machine-training elements we could replace:
  (that could touch the problem of common sense)


* problems related to the common-sense results:

Links:
http://groups.csail.mit.edu/vision/SUN/
http://www.cqrrelations.constantvzw.org/1x0/the-annotator/
http://test.manettaberends.nl/machine-training/plot_multioutput_face_completion_001.png
https://en.wikipedia.org/wiki/Bag-of-words_model
http://ooteoote.nl/2015/03/de-dichter-als-informatiemanager/
https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Words_to_watch#Unsupported_attributions
http://sicv.activearchives.org/logbook/template-after-the-fact/

Texts:
Household words, Stephanie A. Smith (U. of Minnesota Press, 2005)
Bernhard E. Harcourt, Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age (U. of Chicago Press, 2007) http://libgen.org/book/index.php?md5=cc10cea0de40bfd17dc6dbc202f80cc3
Henri Lefebvre, Rhythmanalysis: Space, Time and Everyday Life, trans. Stuart Elden and Gerald Moore (Continuum, 2004) http://libgen.org/book/index.php?md5=4D8E81ABDF0AF9055887C40ED0DFEB39
Matteo Pasquinelli, Anomaly Detection: The Mathematization of the Abnormal in the Metadata Society (2015) http://matteopasquinelli.com/anomaly-detection/
Nathan Jurgenson, View From Nowhere: On the Cultural Ideology of Big Data, Oct 2014, http://thenewinquiry.com/essays/view-from-nowhere/

Notes on the side:
- "Supervised learning is the machine learning task of inferring a function from labeled training data." The term 'supervised learning' quite nicely highlights the position of the human in a machine-training process. (http://en.wikipedia.org/wiki/Supervised_learning)

--------------------
Subject: [Cqrrelations] The Annotator, report and afterthoughts.
Date: Thu, 07 May 2015 10:44:42 +0200
From: Roel Roscam Abbing <roel@roelroscamabbing.nl>
To: cqrrelations@lists.constantvzw.org


Dear Cqrrelators,

Femke and I have finished the report on The Annotator, which we worked on
together with a group of Annotators during Cqrrelations: http://pad.constantvzw.org/p/the_annotator

A few months after Cqrrelations we had digested some impressions and
intuitions and wrote these into the report. The focus of this was the
idea of how 'common sense' is being produced by the self-referential
system of data-selection, the Gold Standard, parsing, desired outcomes
and training. Text-mining requires normalization on all levels of the
process, which for us was exemplified by 'The Removal of Pascal'.

Although the report is a way to round this project off, it is not the
end. Rather, we see it as a beginning for looking deeper into these
technological processes of normalization. Perhaps Relearn 2015 is a good
opportunity to continue thinking along these lines. So if you are
interested in collaborating on that, please don't hesitate to get in touch!

all the best,

R
-------------------