__NOPUBLISH__

Welcome to Constant Etherpad!

These pads are public. To prevent them from appearing in the archive and the RSS feed on Constant, put the word __NOPUBLISH__ (including surrounding double underscores) anywhere in your pad.
  Pads are archived every night around 04:00 CET at http://etherdump.constantvzw.org

  To stay informed about Constant infrastructures, please subscribe to this mailinglist: https://tumulte.domainepublic.net/cgi-bin/mailman/listinfo/infrastructures
Algolit Session, 13 December 2019

Speech recognition and voice assistants:
https://github.com/mozilla/DeepSpeech
https://rhasspy.readthedocs.io/en/latest/
https://susi.ai/about


Runway ML:
    https://runwayml.com/

Articles on Bert:
https://www.quantamagazine.org/machines-beat-humans-on-a-reading-test-but-do-they-understand-20191017/
Describes what BERT is, but also explains what GLUE is (a collection of tests, with training data). Gives a short overview
of the architecture of BERT and the reasoning behind it: try to analyse text through a tree structure rather than as a sequence
of words.

BERT performs very well on GLUE, making it the state-of-the-art algorithm. The article raises the question whether BERT performs
so well because it truly understands language or whether it has discovered some unintentional patterns in the training data. One of
the solutions proposed by the field is to come up with a better test: SuperGLUE. But still, if the model also passes this test:
"...does it mean that machines can really understand language any better than before? Or does it just mean that science has gotten better at teaching machines to the test?"

https://towardsml.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/

Cristina ref: The National Algorithm
https://sjef.nu/portfolio/the-national-algorithm/
Gijs: a recipe for nationalistic camouflage, heavily relying on a Photoshop filter ( https://en.wikipedia.org/wiki/Perlin_noise ).
     
Morning: BERT + Maison du Livre
Afternoon: Experiment with BERT

OMISSUM

Reading on the pad for omissum:
    - https://pad.constantvzw.org/p/omissum
    
    Hans: What does the "although milder in tone" refer to?
    - 
    
https://credo.library.umass.edu/view/pageturn/mums312-b015-i002   -> this is quite shocking
Classification as complexity reduction.


Gutenberg: the texts are mostly 70+ years old, so the spirit of the times is from 1970 and earlier -> this has repercussions on the place given to women and to different races.

Sentence suggestions: -> black vs white: a recommendation, but it does not take the context into account.
Is the dataset a good sample of the population?
American vs European: should the population be classified by race? It makes discrimination visible, but also creates discrimination and reduces complexity as well.

NLP TASKS for benchmarking

https://gluebenchmark.com/ - GLUE tasks and explanation 
one of the tasks is CoLA: https://nyu-mll.github.io/CoLA/
Where it becomes interesting is that CoLA checks for semantic violations; the example they give:

        Kim persuaded it to rain.
   - we can see how such a model would prevent some poetic use of language and straitjacket it into a fixed thing (see the acceptability-check sketch below).
   Also: how does a model account for the mutation of language?
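
A minimal sketch of how such an acceptability judgement could be run with the Hugging Face transformers pipeline; the checkpoint name is only a placeholder for any BERT model fine-tuned on CoLA, not a tested reference:

from transformers import pipeline

# text classification with a (hypothetical / placeholder) CoLA-finetuned checkpoint:
# the model labels each sentence as acceptable or not, with a confidence score
cola = pipeline('text-classification', model='some-bert-finetuned-on-cola')
for sentence in ['Kim persuaded it to rain.', 'Kim persuaded Sandy to leave.']:
    print(sentence, cola(sentence))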
         
https://rajpurkar.github.io/SQuAD-explorer/ -> allows exploring the tasks the models are trying to solve: questions and ground truths, and how the different models answer them.
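
To get a feel for the SQuAD-style task itself, a small sketch with the transformers question-answering pipeline (the default checkpoint is whatever the library ships with, so answers will vary):

from transformers import pipeline

# extractive question answering in the style of SQuAD:
# the model picks the answer span out of the given context
qa = pipeline('question-answering')
result = qa(question='Where does the session take place?',
            context='The Algolit session of 19 December takes place at the Maison du Livre in Saint-Gilles.')
print(result['answer'], result['score'])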

BERT and GPT-2

What are the tasks of BERT?


What is the exact dataset of BERT (can people read it or is it just too voluminous?)


Hello world of BERT:
    GloVe -> BERT is more diverse; do you have to train it again?
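
A possible hello world, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint: let BERT fill in a masked word.

from transformers import pipeline

# fill-mask: BERT predicts the token hidden behind [MASK]
unmasker = pipeline('fill-mask', model='bert-base-uncased')
for prediction in unmasker('Algorithms will [MASK] the way we write.'):
    print(prediction['sequence'], prediction['score'])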

-> distilbert:
    https://arxiv.org/abs/1910.01108
    
BERT and Sentiment Analysis:
    -> https://medium.com/southpigalle/how-to-perform-better-sentiment-analysis-with-bert-ba127081eda
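
A minimal sketch of sentiment analysis with the transformers pipeline; by default it downloads a DistilBERT checkpoint fine-tuned on movie reviews (SST-2), which is an assumption about the library defaults rather than the setup from the article above:

from transformers import pipeline

# sentiment analysis: positive / negative label with a confidence score
sentiment = pipeline('sentiment-analysis')
print(sentiment('Classification as complexity reduction is a worrying idea.'))
print(sentiment('This pad is a wonderful collective memory.'))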


Write With Transformer:
    https://transformer.huggingface.co/doc/distil-gpt2
    -> maybe a possibility to reverse engineer the algorithm and check with the names

Hugging Face Transformers: looks like a one-stop resource for NLP models; they also work on making them lighter (and more eco-conscious):
    https://github.com/huggingface/transformers



some links on how to practically use the GPT-2 model:
https://medium.com/@mapmeld/deciphering-explainable-ai-with-gpt-2-528611a3c75
https://github.com/huggingface/transformers
https://lambdalabs.com/blog/run-openais-new-gpt-2-text-generator-code-with-your-gpu/
https://minimaxir.com/2019/09/howto-gpt2/ - https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=H7LoMj4GA4n_
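
A short sketch of generating text with GPT-2 through the transformers library (assuming a version recent enough to have model.generate); the prompt and sampling parameters are just illustrative:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

prompt = 'The writers of the future will'
input_ids = tokenizer.encode(prompt, return_tensors='pt')
# sample a continuation of at most 50 tokens
output = model.generate(input_ids, max_length=50, do_sample=True, top_k=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))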


A nice explanation of the techniques behind the model: http://jalammar.github.io/illustrated-gpt2/

To get the tensors ( https://github.com/huggingface/transformers/issues/1458      ):
    
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2')  # or any other checkpoint
word_embeddings = model.transformer.wte.weight  # Word Token Embeddings 
position_embeddings = model.transformer.wpe.weight  # Word Position Embeddings 

To get the coordinates of the word 'human' (GPT2Tokenizer needs to be imported as well):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text_index = tokenizer.encode('human', add_prefix_space=True)
vector = model.transformer.wte.weight[text_index, :]
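
A small follow-up (an assumption about how one might use these vectors, not part of the issue above): compare two word embeddings with a cosine similarity.

import torch

def embedding_of(word):
    # look up the (sub)token embedding(s) for a word and average them into one vector
    ids = tokenizer.encode(word, add_prefix_space=True)
    return model.transformer.wte.weight[ids, :].mean(dim=0)

print(torch.cosine_similarity(embedding_of('human'), embedding_of('machine'), dim=0).item())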

https://huggingface.co/transformers/model_doc/gpt2.html -> classes and methods Transformer package
When defining a model, you can choose the dimensionality of the embeddings. The default is 768 for the GPT-2 model.
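
For example, a sketch with the GPT2Config class from transformers (the smaller numbers are arbitrary):

from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config()   # defaults: n_embd=768, n_layer=12, n_head=12
print(config.n_embd)    # 768

# an untrained, smaller model with 256-dimensional embeddings
small_model = GPT2LMHeadModel(GPT2Config(n_embd=256, n_layer=4, n_head=4))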



    
TRANSFORMER:
    
MAISON DU LIVRE

19/12 at 10:00, at the Maison du Livre, 28 rue de Rome, Saint-Gilles

La Maison du Livre:

From the document they sent:
    "  Beyond the questions of support and dissemination, digitalization opens up new ways of writing. Will the architecture of the story free itself from linearity? Will the stories write themselves? Will the spectator be able to intervene in the course of the story?  "
    
Ideas for Maison du Livre:
    - growing a tree with object recognition and Google image search.
    - help GPT-2 write its political manifesto (possibility to vote for the next generated sentence) / possibility to add prompts?
    - How to convince an algorithm: two computers with GPT-2 text generation, the audience tries to nudge them into a right-leaning or left-leaning manifesto by writing prompts.
    - A classifier for left- or right-leaning text? Liberal or conservative? (I suggest taking other categories than left/right or liberal/conservative, and rather using current, concrete and conflicting political issues.)
    
    To do next steps:
        - Figuring out who is interested and making sure it is diverse (how can we include more women? Would Elodie be interested in participating? Are Christina and An in?)
        - Ask Maison du Livre how they curated the artists (did they talk with Constant?)
        - Can a work be a workshop by Constant?
        - Set up another meeting for participants (can be a shorter one as well)
        - Call An for organisation check-up
        
        
IDEAS for next sessions:
    
    - speech to text/ text to speech
    - text to sing
    

Just for fun: https://ai-adventure.appspot.com/   -  https://www.theverge.com/tldr/2019/12/6/20998993/ai-dungeon-2-choose-your-own-adventure-game-text-nick-walton-gpt-machine-learning