ALGOLIT 7-10-16
Piero, Gijs, Manetta, An, Hans, Natacha
Works
___Writing with Film (Gijs) in exhibition in Groningen
will be in another exhibition on 22-10
website (published soon)
continuation of workshop of last year
voice recognition software to listen to video & make transcription automatically
with the aim to generate speeches of Obama
started using Markov Chains (Relearn)
the current version includes tools to scrape content (downloading Obama's weekly addresses from YouTube: 340 speeches between 2010 and 2016, 3-5 min each, and getting the transcriptions from thewhitehouse.gov)
total video material: 340 * 4 minutes = 1360 min ≈ 22.7 hours
created 2 databases:
* one with clips; Gijs uses Robomongo as a GUI for the MongoDB database that is used - it recognises words, speech, silence <sil> and noise, using PocketSphinx for the speech-to-text conversion (see the sketch below)
* another with the transcriptions that are available at thewhitehouse.gov
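A minimal sketch of the speech-to-text step; this assumes the speech_recognition wrapper around PocketSphinx (the project may drive PocketSphinx directly) and a hypothetical file name:

    # speech-to-text via the speech_recognition wrapper around PocketSphinx
    # (a sketch; the project may call PocketSphinx directly)
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile('weekly_address.wav') as source:  # hypothetical file
        audio = recognizer.record(source)               # read the whole file

    # recognize_sphinx runs PocketSphinx locally, no network needed
    print(recognizer.recognize_sphinx(audio))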
-> is there much difference between the PocketSphinx transcriptions & the official transcriptions?
NN: grab official transcriptions, feed them to NN, recreate videos from that -> not very interesting
Gijs made different interfaces into the database
* DISCOVERIES:
ngram interface: Obama uses the same sentences in his weekly addresses; the interface shows ngrams based on the automatic transcriptions that occur in different videos of Obama speaking them, sometimes with 4 years of difference & the exact same sentences
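A minimal sketch of how such recurring ngrams can be found across two transcriptions (the example sentences and names are illustrative):

    # find ngrams shared between two transcriptions, as in the ngram interface
    def ngrams(text, n=5):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    transcript_2010 = "we will rebuild the economy and we will not rest"
    transcript_2014 = "and we will not rest until every american finds work"

    for gram in ngrams(transcript_2010) & ngrams(transcript_2014):
        print(' '.join(gram))  # prints: and we will not rest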
comparing the PocketSphinx output with the transcriptions
generates a scored db of clips based on the Levenshtein Distance algorithm, with points: if the distance is 0, the match is very precise
it splits sentences for each source (official transcript on the left / generated transcription on the right) and looks for words that agree; if more words agree, it adds a white space in the right list
-> words get penalised or rewarded...
*short sidetrack on the Levenshtein Distance algorithm - how precise are the generated transcriptions? (a reference implementation follows below)
*-> shows the difference between written & spoken language, reading from an autocue
*- 4 elements on the left, 4 elements on the right. the left list of elements (the official transcription) is the correct one and will not change; the right list (the PocketSphinx output) needs to adjust to the words in the left, to become as similar as possible
*- the first step is to calculate all the options to order these elements
*- an element is a word; listing the possible orders is needed in order to merge multiple words into one (which happened for example with 500.000 / parthunderdthousand)
*- it creates an anchor between 2 similar words & moves the word up in the list ('fold them up')
*- numbers are code!
*- repetition creates noise: it recombines with words that appear further on
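For reference, a standard dynamic-programming implementation of the Levenshtein distance, here over word lists rather than characters (a sketch; the example strings are illustrative):

    # Levenshtein distance by dynamic programming, over word lists
    def levenshtein(a, b):
        # dist[i][j] = edits needed to turn a[:i] into b[:j]
        dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dist[i][0] = i  # deletions
        for j in range(len(b) + 1):
            dist[0][j] = j  # insertions
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                                 dist[i][j - 1] + 1,         # insertion
                                 dist[i - 1][j - 1] + cost)  # substitution
        return dist[-1][-1]

    official = "five hundred thousand new jobs".split()
    generated = "parthunderdthousand new jobs".split()
    print(levenshtein(official, generated))  # 0 would mean a perfect match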
an automated Obama speeches generator, an artificial Obama version 0.1
- Obama only being silent
- Markov chain generator, no rhythm yet: uses a limited set of videos / not every video has a title (a minimal sketch of such a chain follows below)
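A minimal word-level Markov chain generator, as a sketch of the technique behind version 0.1 (the corpus here is illustrative):

    # word-level Markov chain: map each word to its possible successors,
    # then walk the chain from a random starting word
    import random
    from collections import defaultdict

    corpus = ("good morning everybody this week i spoke about jobs and "
              "this week we made progress on jobs and health care").split()

    chain = defaultdict(list)
    for current, following in zip(corpus, corpus[1:]):
        chain[current].append(following)

    word = random.choice(corpus)
    speech = [word]
    for _ in range(15):
        if word not in chain:  # dead end: no known successor
            break
        word = random.choice(chain[word])
        speech.append(word)
    print(' '.join(speech))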
-> turn Obama into a chatbot you can talk to
-> create hologram: https://pbs.twimg.com/media/Ct7eiAGXYAAADzH.jpg
-> publish code online, and perhaps a log text explaining the different steps
-> interface: 15 ngrams with 2 months difference or 7 years apart
-> comparison between Obama's speeches and other people reusing parts of his texts
-> 1-1-17: Obama forever! refrains & chorus
try to understand what refrains are; a time lapse could help & the context of the existing videos; show the technical team behind him, he is on 'autopilot'
-> include the YouTube comments or ranking rates
-> compare with interviews with Obama on similar topics
-> create a hologram choir / cf. Transcendence
quotes:
"Donald Trump is a markov chain"
___I could have written that: algorithmic writing/reading machine based on Pattern sentiment/opinion analysis (Manetta)
long way from Cqrrelations to graduation project
how to go on
quote of Joseph Weizenbaum (Eliza): machines start to act in wondrous ways; be sceptical, invite people to look into how it is made & give them the feeling 'I could have written that'
* workshop
* poster series
* thesis
interested to see how text mining works, find ways to communicate it
bridge the way text mining is presented (by corporations/academics) & the processes that happen in the back
used Pattern / looked into rhetorics (presenting the technique as objective truth) / looked into metaphors (ex. mining machine, raw)
looked into modality.py: -1 is negative, +1 is positive (very certain), in between is uncertain
beautiful topic, because outcomes of textmining are not absolute
used this script to let it speak about itself
beautiful interface!
-> not automatically generated; used modality.py to add colour codes (from clear red to clear green), underlining for weasel words (Wikipedia / 'to weasel your way out of a situation', a practice of lawyers & politicians), exact numbers vs floaty certainties/confidence
-> would be interesting to apply weasel words to legal texts
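A minimal example of Pattern's modality score as used in modality.py, assuming the pattern.en module (the example sentences are illustrative; a colour code could be mapped onto the returned value):

    # modality scores with Pattern's pattern.en module:
    # values run from -1 (very uncertain) to +1 (very certain)
    from pattern.en import parse, Sentence, modality

    for text in ("The report proves the results are correct.",
                 "The results might perhaps be correct."):
        sentence = Sentence(parse(text, lemmata=True))
        print(round(modality(sentence), 2), text)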
in the workshop the posters served as an introduction on the side
workshop: introduction on text mining & interface for people without technical background
make your own oracle
* pick binary opposition
* copy your own text or use api to scrape Twitter & co
* different options to produce a rule-based system; ranks words from -7 to +7 using TF/TF-IDF (see the sketch after this list)
* print results of scored words, outputs matrix
* use the ranked list to write new sentences & add the average of the numbers
-> a way to check if your language fits a certain context: ex. job letter, Google ranking, legal texts, parodies
-> use output for other algorithms
next step: turn it into a supervised model, correct & improve your results, annotation (now done by Twitter or manually)
-> visualise how the reading goes, how the machine sees the language
-> last step: generate similar posters based on their scores
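A sketch of the word-scoring step under these assumptions: two small corpora stand in for the binary opposition, plain term frequency stands in for TF/TF-IDF, and scores are rescaled to the -7..+7 range (all names and corpora are illustrative):

    # rule-based oracle sketch: score words between two opposed corpora
    # by term frequency, rescale to -7..+7, then score a new sentence
    # as the average of its known word scores
    from collections import Counter

    positive = "calm calm bright open warm bright".split()  # one pole
    negative = "dark cold closed dark heavy cold".split()   # other pole

    pos_tf, neg_tf = Counter(positive), Counter(negative)

    scores = {}
    for word in set(positive) | set(negative):
        # difference of relative frequencies, in -1..+1
        diff = pos_tf[word] / len(positive) - neg_tf[word] / len(negative)
        scores[word] = round(diff * 7, 1)  # rescale to -7..+7

    sentence = "a bright but cold morning".split()
    known = [scores[w] for w in sentence if w in scores]
    print(scores)                   # the scored word matrix
    print(sum(known) / len(known))  # average score of the new sentence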
examples of metaphors: http://www.wwbp.org/ f.ex. look into the demos linked to Big Five personality tests (done by 75,000 people); not clear what they do with it
test is reductionist, but becomes decision making tool
how to contextualize it in the right way / show the reduction / judge in a quantified way with nuance -> algorithmic critique
how to avoid the binary
evaluation takes time
people behave in accordance to judging system
___Frankenstein Botparade: IRC bots using natural language techniques & publication (Piero, An)
the start of the project was Python for beginners
Needed a context to work on the project
curator Roland Fischer invited the group to take part in the Mad Scientist Festival in Bern, Switzerland.
The topic of the festival was Artificial Intelligence. The curator proposed to work on Frankenstein.
the context that the Frankenstein story was written 200 years ago, in Switzerland, was a motivation
Many participants were beginning programmers. James was active as a programmer and the most experienced.
each member of the group contributed to the project: An, Piero, James, An, ?, ?,
and various bots were made: some were more metaphorical, others more technical
The project was presented in the form of a workshop in the Natural History Museum. In the end there were no participants: not many freelancers around, and the students were still on holidays. Instead of a workshop it became a residency with a publication at the end; originally this publication would have been a reflection on / report of the workshop. In the preparations for the workshop the focus shifted to the book.
http://researchcatalogue.net/view/297607/297608
The publication focuses on chapters 1 - 5 in which the monster in the story grows.
The 4 letters contextualize the publication.
Sarah Garcin brought a PJ-machine: a machine to design spreads. Using buttons, text and images could be selected and modified in size or line-height. During one public evening this machine was used a lot.
Make-up bot: a very visual analysis; in this bot all the o's are replaced by zeros (see the sketch after this list)
ugly bot: as with Frankenstein's monster, everybody was running away
spoiler bot: learning definitions. Based on the chapter where Frankenstein finds dictionaries in a shed.
Cow-bot: an old ASCII bot. The drawing changes based on the chats it observes
Annotating bot: comments on texts in the chat
Neural-network bot: generates texts based on the first 5 chapters, because of limited time, but also because in these chapters the monster speaks himself.
Bot which kills all the participants in the chat, as the monster is so bitter
and bots that had physical responses: one bot gave electrical impulses to a device attached to James' arm, and another bot produced sound through a speaker, on which [non-Newtonian?] material responded to the movements and changed into various forms.
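A minimal sketch of an IRC bot in the spirit of the make-up bot, using only the standard library; the server, channel and nick are hypothetical, and a real bot would need error handling and flood control:

    # IRC bot that echoes chat lines with every 'o' replaced by '0'
    # (make-up bot sketch; connection details are hypothetical)
    import socket

    server, channel, nick = 'irc.example.org', '#frankenstein', 'makeupbot'

    irc = socket.socket()
    irc.connect((server, 6667))
    irc.send(f'NICK {nick}\r\nUSER {nick} 0 * :{nick}\r\n'.encode())
    irc.send(f'JOIN {channel}\r\n'.encode())

    while True:
        line = irc.recv(4096).decode(errors='ignore').strip()
        if line.startswith('PING'):  # keep the connection alive
            irc.send(line.replace('PING', 'PONG').encode() + b'\r\n')
        elif 'PRIVMSG' in line:
            text = line.split(':', 2)[-1]  # the chat message body
            reply = text.replace('o', '0').replace('O', '0')
            irc.send(f'PRIVMSG {channel} :{reply}\r\n'.encode())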
On the 27th of October there will be a vernissage at Constant, where the spying bot will be taken as a starting point
There is an interest to spend more time with the neural network bot.
for this, we could use different frameworks for neural networks: karpathy's char-rnn (https://github.com/karpathy/char-rnn) / tensorflow (which is easier)
Scripts & publication:
http://www.algolit.net/frankenstein/
___On Journey with Hovelbot (Constant V) - IRC chat at 16h
http://pad.constantvzw.org/public_pad/hovelbot_vitrine
------------------------------------------------------
Agenda & topics for 2016/7
* Proposal to invite - as a programme for a 2-day 'algolit seminar':
- Marc Matter, history of algorithmic art & concrete poetry
- Publishing House 0x0a, from Berlin http://0x0a.li/en/page/2/
- specialist on Generative Grammar
* Algorithmic Models for text analysis & connect to context where it is used & metaphors around it & visualisation
- neural networks, karpathy char-rnn (https://github.com/karpathy/char-rnn) or tensorflow (which is easier)
- Vladimir Propp & the dada engine / Claude Lévi-Strauss on myths (structuralism) / generative grammar
- supervised ML - Uncertainty Detected
- word2vec
- Levenshtein Distance: word folding / expanding
- writing interfaces: rule-based / supervised / unsupervised - visualizing a machine learning process
- bootstrapping (unsupervised)
- Algorithmic agents / bots - Franco Moretti's book on Distant Reading / harvesting the bot family / bottarium
- XMPP bots - Immaterial Labour Zine: ILZ on XMPP http://ilu.servus.at/ & the call for bots http://lurk.org/groups/hsc/messages/topic/5WW8CUSyl5fEdBwIfxF5Wu/
- WordNet - connecting to a storytelling device / generative grammar
- metaphor analysis through text analysis tools - Metaphor Lab's VU Amsterdam Metaphor Corpus http://www.vismet.org/metcor/search/
- use mailinglists archives as input for analysis
* making a collection of algorithmic literary works on the algolit wiki
algorithmic agents can report back on the wiki
* references
Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin: https://web.stanford.edu/~jurafsky/slp3/
Dates
3 November - Uncertainty Detected / supervised ML using scikit-learn, look into visualisation using matplotlib/js & context on Supervised ML
___Uncertainty Detected: supervised ML software to detect uncertainty in scientific papers & visualisations (Gijs & An)
25 November - intro on Neural Networks (and build a neural-net dedicated computer / XMPP bots)
16 December - neural network
20 January - neural network
10 February
17 March
21 April
19 May
23 June