Reduction
Intro Constant FS 3min
What is Constant, what is this set of scripts (from 2000 words (input) to 1000 words (output) in many machinic ways), and why did we prepare them
compression -- lossy lossless -- value -- perception
different cultures, ways of thinking reduction
FILTERS
[5 min, website, no images or one image?]
stopwords.py AM 3min
"""
Input texts are checked against occurrences of certain words included in a list of "stopwords" established by NLTK (Natural Language Toolkit). These words are then removed.
In data mining, text processing and machine learning, these so-called high frequency words are filtered out before or after natural language data is processed. Relational words such as 'the', 'is', 'at', 'which', and 'on' are considered redundant because they are too frequent, and meaningless once the word order is removed.
http://www.ranks.nl/stopwords
http://etherbox.local/var/www/nltk_data/corpora/stopwords/
"""
[list of stopwords, cgi -- "A bag but is language nothing of words"]
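A minimal sketch of the filter described above; the actual script draws its list from NLTK's stopwords corpus, while here a few entries are hardcoded for brevity (function name and word list are illustrative, not taken from stopwords.py):

```python
# Minimal stopword filter sketch; the real script uses NLTK's stopwords list.
STOPWORDS = {"the", "is", "at", "which", "on", "a", "of"}

def remove_stopwords(text):
    # keep only the words that are not in the stopword list
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords("A bag of words which is language"))
# -> bag words language
```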
disappearance.py FS 3min
"""
This script goes through the input text word by word. Every subsequent occurrence of a word that has already appeared is removed, until the desired reduction is reached.
Disappearance is inspired by a script by the same name developed by Stephanie Villayphiou and Alex Leray, that takes a .srt file as an input. It appeared for the first time in the context of Timed Text, a workshop that considered writing, reading and listening as parallel but interacting tracks (Constant, 2010).
http://activearchives.org/wiki/Disappearance
http://activearchives.org/wiki/Kitchen_table_workshop
"""
rewrite.py AM min
'''
Keep your summary in style with Rewrite.py!
Using Markov chains, texts are rewritten and reduced based on the word pairs (n-grams) present in the original text. Markov chains are widely used in attempts to pass messages through spam filters; anti-spam software uses Bayesian analysis based on the Markov chain to keep up with spam techniques. The Markov generator begins by organizing the words of a source text into a dictionary, gathering all possible words that follow each pair into a list. Then the generator recomposes sentences by randomly picking a starting pair and choosing a word that follows it. The chain is then shifted one word to the right, another lookup takes place, and so on until the document is complete. This allows for humanly readable sentences, but does not exclude the kinds of errors we recognize when reading spam.
This script is based on a workshop by Sebastian Luetgert in the context of »(Re-)Constructing Authorship« (Stuttgart, 2015). Markov Chain was also performed as a reading/writing game by Brendan Howell, Catherine Lenoble and An Mertens, by then members of Algolit, the Constant working group around F/LOSS literature, texts & code:
http://algolit.constantvzw.org
https://en.wikipedia.org/wiki/Markov_chain
[command line, let it run, "typing" -> Death of the Author, PD etc.]
'''
[image; interest in indexing systems, grey literature]
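The pair-based generation described above can be sketched like this (function names are illustrative, not taken from rewrite.py):

```python
import random

# Sketch of a word-level Markov generator: pairs of words are the keys,
# the values are all the words seen following each pair.
def build_chain(words):
    chain = {}
    for a, b, c in zip(words, words[1:], words[2:]):
        chain.setdefault((a, b), []).append(c)
    return chain

def generate(chain, length=30, seed=None):
    rng = random.Random(seed)
    pair = rng.choice(list(chain))
    out = list(pair)
    # shift the pair one word to the right after each lookup
    while len(out) < length and pair in chain:
        nxt = rng.choice(chain[pair])
        out.append(nxt)
        pair = (pair[1], nxt)
    return " ".join(out)

chain = build_chain("the cat sat on the mat where the cat slept".split())
print(generate(chain, length=8, seed=1))
```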
automatic_summary.py FS 3min
"""
To produce automatic summaries is a useful machinic task that many people have already had a go at. This relatively simple script uses the ubiquitous Natural Language Toolkit (NLTK) and does a reasonably good job. The summarizer first determines the frequencies of words in the document and splits the document into a series of sentences, then builds a summary by including, for each of the most frequent words, the first sentence that contains it. Finally the chosen sentences are reordered back into the order of the original document.
https://github.com/thavelick/summarize
Requires nltk and numpy with the stopwords corpora installed
"""
[ -> ]
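The procedure just described could be sketched in plain Python as follows (the actual script relies on NLTK for tokenization and stopword removal; this version uses a crude regex split instead, and the names are illustrative):

```python
import re
from collections import Counter

# Rough sketch of the frequency-based summarizer described above.
def summarize(text, num_words=5):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    chosen = set()
    # for each of the most frequent words, keep the first sentence containing it
    for word, _ in freq.most_common(num_words):
        for i, s in enumerate(sentences):
            if word in (w.lower() for w in re.findall(r"\w+", s)):
                chosen.add(i)
                break
    # reorder the chosen sentences back into document order
    return " ".join(sentences[i] for i in sorted(chosen))

print(summarize("Cats are great. Dogs bark. Cats sleep.", num_words=1))
# -> Cats are great.
```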
Rewriting with only uncertain or certain sentences
AM 3min
Sentiment analysis
http://www.cqrrelations.constantvzw.org/0x0/
[show the script -> machine learning, cqrrelations]
//////////////////////////////////////////////////////////////////////////////////////////////
encryption_text_md5.py FS 2min
'''
encryption_text_md5.py provides the ultimate reduction (although at the expense of human as well as machine legibility) by reducing your text to a 128-bit hash value. The result can be reversed by a 'brute-force attack', in this case trying to match your hashed text with all existing texts.
The MD5 algorithm is a widely used hash function producing a 128-bit hash value. It is one in a series of message digest algorithms designed by Professor Ronald Rivest of MIT (Rivest, 1992). Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption. Like most hash functions, MD5 is neither encryption nor encoding.
'''
[??]
encryption_lines_md5.py
'''
encryption_lines_md5.py provides the ultimate reduction (although at the expense of human as well as machine legibility) by reducing every line of your text to a 128-bit hash value. Each hash value can of course be reversed again if you try to match it with every single line of every existing text.
The MD5 algorithm is a widely used hash function producing a 128-bit hash value. It is one in a series of message digest algorithms designed by Professor Ronald Rivest of MIT (Rivest, 1992). Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption. Like most hash functions, MD5 is neither encryption nor encoding.
'''
encryption_text_sha1.py
'''
encryption_text_sha1.py provides the ultimate reduction (although at the expense of human as well as machine legibility) by reducing your text to a 160-bit hash value. The result can be reversed by a 'brute-force attack', in this case trying to match your hashed text with all existing texts.
In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function designed by the United States National Security Agency and is a U.S. Federal Information Processing Standard published by the United States NIST in 1993. SHA-1 produces a 160-bit (20-byte) hash value known as a message digest. A SHA-1 hash value is typically rendered as a hexadecimal number, 40 digits long.
'''
encryption_lines_sha1.py
'''
encryption_lines_sha1.py provides the ultimate reduction (although at the expense of human as well as machine legibility) by reducing every line of your text to a 160-bit hash value. Each hash value can of course be reversed again if you try to match it with every single line of every existing text.
In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function designed by the United States National Security Agency and is a U.S. Federal Information Processing Standard published by the United States NIST in 1993. SHA-1 produces a 160-bit (20-byte) hash value known as a message digest. A SHA-1 hash value is typically rendered as a hexadecimal number, 40 digits long.
'''
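The reduction performed by the four scripts above can be sketched with Python's standard-library hashlib (function names here are illustrative, not the scripts' own):

```python
import hashlib

# Hash a whole text, or each of its lines, with a chosen algorithm.
def digest_text(text, algorithm="md5"):
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()

def digest_lines(text, algorithm="sha1"):
    return [hashlib.new(algorithm, line.encode("utf-8")).hexdigest()
            for line in text.splitlines()]

print(digest_text("reduction"))          # 32 hex digits: MD5's 128 bits
print(digest_text("reduction", "sha1"))  # 40 hex digits: SHA-1's 160 bits
```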
=============================================
Automatic keyword extraction
"A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons."
https://github.com/aneesha/RAKE
Perverse index
Indexing text, search index
Whoosh
Posting list
Bi-grams/trigrams
bigrams.py
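Extracting bigrams or trigrams from a word list can be sketched as follows (illustrative only, not the contents of bigrams.py):

```python
# Slide a window of n words over the text to collect its n-grams.
def ngrams(words, n=2):
    return list(zip(*(words[i:] for i in range(n))))

words = "reduction of a text to pairs".split()
print(ngrams(words, 2))   # bigrams
print(ngrams(words, 3))   # trigrams
```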