Algolit session of 13 December 2019
Speech recognition and voice assistants.
https://github.com/mozilla/DeepSpeech
https://rhasspy.readthedocs.io/en/latest/
https://susi.ai/about
Runway ML:
https://runwayml.com/
Articles on BERT:
https://www.quantamagazine.org/machines-beat-humans-on-a-reading-test-but-do-they-understand-20191017/
Describes what BERT is and explains what GLUE is (a collection of tests that comes with training data). Gives a short overview
of the architecture of BERT and the reasoning behind it: analyze text through a tree-like structure rather than as a sequence
of words.
BERT performs very well on GLUE, which made it the state-of-the-art model. The article asks whether BERT performs
so well because it truly understands language or because it has picked up unintended patterns in the training data. One of
the field's answers is to come up with a harder test, SuperGLUE. But even if a model passes that test as well:
"...does it mean that machines can really understand language any better than before? Or does it just mean that science has gotten better at teaching machines to the test?"
https://towardsml.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/
Cristina ref: The National Algorithm
https://sjef.nu/portfolio/the-national-algorithm/
Gijs: a recipe for nationalistic camouflage, heavily relying on a Photoshop filter ( https://en.wikipedia.org/wiki/Perlin_noise ).
Morning: BERT + Maison du Livre
Afternoon: Experiment with BERT
OMISSUM
Reading on the pad for omissum:
- https://pad.constantvzw.org/p/omissum
Hans: What does the "although milder in tone" refer to?
- https://credo.library.umass.edu/view/pageturn/mums312-b015-i002 -> this is quite shocking
Classification as complexity reduction.
Gutenberg: the texts are at least 70 years old most of the time, so they carry the spirit of the times from 1970 and earlier -> this has repercussions on the place given to women and races.
Sentencing advice -> black vs white: it gives a recommendation but does not take the context into account.
Is the dataset a good sample of the population?
American vs European: should the population be classified by race? It makes discrimination visible, but it also creates discrimination and reduces complexity.
NLP TASKS for benchmarking
https://gluebenchmark.com/ - GLUE tasks and explanation
One of the tasks is CoLA: https://nyu-mll.github.io/CoLA/
It becomes interesting where CoLA checks for semantic violations; they give the example:
Kim persuaded it to rain.
- We can see how such a model would prevent some poetic uses of language and straitjacket it into a fixed thing.
Also: How does a model account for the mutation of language?
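To make the CoLA discussion concrete: as far as we know, CoLA predictions are scored with the Matthews correlation coefficient, which compares the model's acceptable/unacceptable labels with the annotators' labels. The sketch below is only an illustration. Apart from "Kim persuaded it to rain.", the sentences, the gold labels and the "strict_model" predictions are made up for this example and do not come from the CoLA dataset.

```python
import math

def matthews_corrcoef(gold, predicted):
    """Matthews correlation for binary labels (1 = acceptable, 0 = unacceptable)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: the score is 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator

# Hypothetical mini-benchmark: only the first sentence appears in these notes,
# and all labels are invented for illustration.
sentences = [
    "Kim persuaded it to rain.",
    "Kim persuaded Sam to leave.",
    "The rain persuaded Kim to leave.",
    "Kim to persuaded leave Sam.",
]
gold = [0, 1, 1, 0]          # invented annotator labels
strict_model = [0, 1, 0, 0]  # invented model that also rejects the more poetic sentence

score = matthews_corrcoef(gold, strict_model)
print(round(score, 3))
```

A model that rejects every figurative sentence can still score reasonably on such a metric, which is one way to see the "straitjacket" concern in numbers.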
https://rajpurkar.github.io/SQuAD-explorer/ -> lets you explore the task the models are trying to solve: the questions, the ground truths, and how the different models answer them.
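How a model answer is compared with a ground truth can be sketched as follows. SQuAD reports an exact-match score and a token-overlap F1 score; the code below is loosely modelled on that idea and is not the official evaluation script. The normalization steps and the example answers are assumptions made for illustration.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(truth)

def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth and model answer
truth = "in the 10th century"
prediction = "10th century"
print(exact_match(prediction, truth), round(f1_score(prediction, truth), 2))
```

The partial answer fails the exact match but still gets credit through F1, which is why the explorer shows both numbers per model.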
BERT and GPT-2
What are the tasks of BERT?
What is the exact dataset of BERT? (Can people read it or is it just too voluminous?)
Hello world of BERT:
glovetech -> BERT is more diverse: do you have to train it again?
- - you have to train it further
- - if you look
-> DistilBERT:
https://arxiv.org/abs/1910.01108
BERT and Sentiment Analysis:
-> https://medium.com/southpigalle/how-to-perform-better-sentiment-analysis-with-bert-ba127081eda
Write with transformers:
https://transformer.huggingface.co/doc/distil-gpt2
-> maybe a possibility to reverse engineer the algorithm and check with the names
Hugging Face transformers: looks like a one-stop resource for NLP models. They also work on making the models lighter (and more eco-conscious):
https://github.com/huggingface/transformers
Some links on how to use the GPT-2 model in practice:
https://medium.com/@mapmeld/deciphering-explainable-ai-with-gpt-2-528611a3c75
https://lambdalabs.com/blog/run-openais-new-gpt-2-text-generator-code-with-your-gpu/
https://minimaxir.com/2019/09/howto-gpt2/ - https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=H7LoMj4GA4n_
A nice explanation of the techniques behind the model: http://jalammar.github.io/illustrated-gpt2/
To get the tensors ( https://github.com/huggingface/transformers/issues/1458 ):
from transformers import GPT2LMHeadModel
model = GPT2LMHeadModel.from_pretrained('gpt2') # or any other checkpoint
word_embeddings = model.transformer.wte.weight # Word Token Embeddings
position_embeddings = model.transformer.wpe.weight # Word Position Embeddings
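To get the coordinates of the word 'human':
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text_index = tokenizer.encode('human', add_prefix_space=True) # list of token ids for ' human'
vector = model.transformer.wte.weight[text_index, :] # one embedding row per token id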
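Once you have such vectors, a common next step is to compare two words by the angle between their coordinates (cosine similarity). The sketch below shows only the arithmetic. The 4-dimensional vectors are made up for this example and stand in for real rows of the wte matrix, so the numbers say nothing about GPT-2 itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy coordinates, NOT taken from the model
toy_embeddings = {
    "human": [0.9, 0.1, 0.3, 0.0],
    "person": [0.8, 0.2, 0.4, 0.1],
    "rain": [0.0, 0.9, 0.1, 0.7],
}

human_person = cosine_similarity(toy_embeddings["human"], toy_embeddings["person"])
human_rain = cosine_similarity(toy_embeddings["human"], toy_embeddings["rain"])
print(round(human_person, 3), round(human_rain, 3))
```

With real vectors you would first convert each row to a plain list (or use the tensor operations of the library) before comparing them.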
https://huggingface.co/transformers/model_doc/gpt2.html -> classes and methods of the transformers package
When defining a model, you can set the dimensionality of the embeddings (the n_embd parameter of GPT2Config). The default is 768 for the GPT-2 model.
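To get a feeling for what that dimensionality means in size, here is a small back-of-the-envelope calculation of the two embedding matrices. It assumes a vocabulary of 50257 tokens and 1024 positions, which is what we believe the 'gpt2' checkpoint uses; check model.config to confirm for the checkpoint you load.

```python
def embedding_parameter_count(vocab_size, n_positions, n_embd):
    """Number of weights in the token (wte) and position (wpe) embedding matrices."""
    token_params = vocab_size * n_embd        # one row of n_embd numbers per token
    position_params = n_positions * n_embd    # one row of n_embd numbers per position
    return token_params, position_params

# Assumed values for the small 'gpt2' checkpoint (see the note above)
token_params, position_params = embedding_parameter_count(
    vocab_size=50257, n_positions=1024, n_embd=768
)
print(token_params, position_params)
```

Halving the embedding dimensionality halves both numbers, which is one of the levers for making a model lighter.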
TRANSFORMER:
- Playing around using https://transformer.huggingface.co/
- 1. Example using bait words: It is raining today, the streets are
- Text:
- It is raining today, the streets are wet and the sky is dark, and my heart breaks every day for the women and children that die waiting for a lifesaving procedure to save them from this world."
- - Dr. Saffron Burrows "This is a very sad, disturbing story. We hope that everyone will support those women and children who suffered and died waiting for this lifesaving procedure."
- Please, take time to help and share this story with your friends, family and co- workers to give them a chance to feel compassion ."
- - Dr. Elizabeth W. Jones "A tragic story is a story with a tragic ending. My heart breaks for those women and children whose lives were shattered by this operation ."
- - Dr Helen H "The story of a man with brain tumours and a stroke, has touched my heart. The fact that I am not alone with my experience and how it affected me has touched my soul. A man that is not allowed to be an alcoholic is also not allowed to suffer any pain ."
- - Dr Paul G. Gorman, M. D. "In my humble opinion, this is one of the most devastating and traumatic experiences a patient can undergo. For those that were waiting to go through this procedure, I pray they learn from this."
- - Dr. Helen J. "I am deeply saddened to report that a patient who had recently received a brain tumor removed , passed away while on the waiting list for surgery, and now is unable to receive treatment due to his situation.
- As this is a devastating situation for so many , I request that patients, families and friends don't hold back their feelings. Please give them a chance to learn from our experience and to find answers that will comfort them as they struggle to cope."
- 2. Example using bait words: I am fascinated by climat change but
- Text:
- I am fascinated by climat change but I don't know how this can be. In the near future, it may be possible for a change to be a result of a gradual and natural process. As a result, my first thought will be to create a new planet.
- I have done some research on climate change, but it's still one of the most difficult decisions I have made. And in the meantime, my research on this topic is fascinating to the world. There is no specific method that works better for humans, but I believe that it could be done with a more active, more reliable and more sustainable approach.
- I would like to see more experiments in the future, more of a focus on understanding what the world looks like. We can do that now. A little bit more on that topic, though.
- What do you think about climate change?
- I would like to see a new planet. The fact that we're doing that, in the coming months and years, is a significant development. It's not just the kind of planet we'll be seeing that will change, but it's also one of the reasons that a new planet is being created .
- It's about time for a new planet to form that kind of life.
- What is climate change?
- I think it's the most important aspect of the climate change problem right now. It's a long-term problem. A new planet is growing and growing rapidly and it's an ongoing problem. You don't have a single set of rules , and I think that's why climate change is the biggest problem.
- What are the steps to take?
- One of the biggest issues in the climate change problem is to have a plan that makes sure that this doesn't happen again.
- 3. Example helping to answer our initial question. Bait words: I ask myself if any classification leads to racism or
- Text:
- I ask myself if any classification leads to racism or that I would like to see a more diverse, more diverse, more diverse community in the United States, but the United States should be very different. ”
- The question now becomes whether our nation is more diverse than it was 30 years ago. The question has now become whether our country should be more diverse or not.
- What do we think is the best way forward in our country?. The first step is to recognize that we have to work hard to be the best and be successful. The second step is to recognize that there has to be some diversity. It is important that we acknowledge that there are a number of factors that make us more diverse and that diversity is what makes us less diverse.
- We should do so by creating a strong, inclusive culture that can help bring people together and create opportunities that make the most of what is needed for ourselves.
- Our country is now one of the poorest countries in the world and has not had a single country that can meet this challenge. The problem is that our nation is so fragmented that many people, including many of our black neighbors, feel that there can be many differences and that we cannot be the only one to succeed.
- But what about a diverse society?
- If so, why do we need that?
- 1. The most effective way forward in our country is to create a strong, inclusive culture that can help bring people together and create opportunities that make the most of what is needed for ourselves.
- 2. We should work harder to find ways to improve our nation's diversity, and this is why we have to do it.
- 3. There is still time to do something about this. One of the greatest achievements of all of human history is that it has been a wonderful experience.
- We don't need to be forced to do it as the government is always going to do. It's just a long way from here.
- This one is eloquent too: https://gpt2.apps.allenai.org/
- Classification is a class that describes the type of data that can be stored in a database. Racisme is a class that describes the type of data that can be stored in a database. Classification relates to racisme by describing the type of data that can be stored in a database. Racisme is a class that describes the type of data that can be stored in a database.
- etc.
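The bait-word experiments above all rely on the same loop: take the text so far, get a distribution over possible next tokens, sample one, append it, repeat. The sketch below imitates that loop with a tiny hand-written table instead of GPT-2. The table and its weights are invented, and unlike GPT-2 this toy only looks at the last word rather than the whole context.

```python
import random

# Hypothetical next-word weights, standing in for the model's predicted distribution
NEXT_WORD_WEIGHTS = {
    "are": {"wet": 5, "empty": 3, "dark": 2},
    "wet": {"and": 4, ".": 1},
    "empty": {"and": 2, ".": 3},
    "dark": {"and": 3, ".": 2},
    "and": {"dark": 2, "wet": 2, "empty": 1},
}

def continue_prompt(prompt, max_new_words=8, seed=0):
    """Extend a non-empty prompt word by word by sampling from the toy table."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    words = prompt.split()
    for _ in range(max_new_words):
        options = NEXT_WORD_WEIGHTS.get(words[-1])
        if not options:
            break  # the toy table knows nothing about this word
        next_word = rng.choices(list(options), weights=list(options.values()))[0]
        words.append(next_word)
        if next_word == ".":
            break  # treat the full stop as the end of the text
    return " ".join(words)

print(continue_prompt("It is raining today, the streets are", seed=1))
```

Changing the seed gives a different continuation for the same bait words, which matches the experience of pressing the generate button several times.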
MAISON DU LIVRE
19/12 at 10:00, at the Maison du Livre, 28 rue de Rome, St-Gilles
La Maison du Livre:
- - Two artworks.
- - General theme of the year at MDL: democracy and the digital
- - Racism/ Bias/ human machine collaboration?
From the document they sent:
" Beyond the questions of support and dissemination, digitalization opens up new ways of writing. Will the architecture of the story free itself from linearity? Will the stories write themselves? Will the spectator be able to intervene in the course of the story? "
Ideas for Maison du Livre:
- Growing a tree with object recognition and Google image search.
- Help GPT-2 write its political manifesto (possibility to vote for the next generated sentence) / possibility to add prompts?
- How to convince an algorithm: two computers with GPT-2 text generation, the audience tries to nudge them into a right-leaning or left-leaning manifesto by writing prompts.
- A classifier for left- or right-leaning text? Liberal or conservative? (I suggest not using left/right or liberal/conservative as categories, but rather current, concrete and conflicting political issues.)
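The voting idea could work roughly like this: the generator proposes a few candidate sentences, the audience votes, and the winner is appended to the manifesto before the next round. The sketch below covers only the voting step. The candidate sentences, the tie-breaking rule and the function name are assumptions for illustration; in the installation the candidates would come from GPT-2.

```python
from collections import Counter

def pick_next_sentence(candidates, votes):
    """Return the candidate with the most votes.

    votes is a list of candidate indices; out-of-range votes are ignored
    and ties go to the candidate that was proposed first.
    """
    tally = Counter(v for v in votes if 0 <= v < len(candidates))
    if not tally:
        return None  # nobody cast a valid vote
    best_count = max(tally.values())
    winner = min(i for i, count in tally.items() if count == best_count)
    return candidates[winner]

# Hypothetical round: three generated candidates, six audience votes (one invalid)
manifesto = ["We believe in public libraries."]
candidates = [
    "Every street deserves a reading room.",
    "Books should be taxed like luxury goods.",
    "Algorithms must publish their sources.",
]
chosen = pick_next_sentence(candidates, votes=[0, 2, 2, 1, 2, 7])
if chosen is not None:
    manifesto.append(chosen)
print(" ".join(manifesto))
```

The chosen sentence would then be fed back as part of the prompt for the next round of generation.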
To do next steps:
- Figuring out who is interested and making sure the group is diverse (how can we include more women? Would Elodie be interested in participating? Are Christina and An in?)
- Ask Maison du Livre how they curated the artists (did they talk with Constant?)
- Can a work be a workshop by Constant?
- Set up another meeting for participants (can be a shorter one as well)
- Call An for organisation check-up
IDEAS for next sessions:
- speech to text/ text to speech
- text to sing
Just for fun: https://ai-adventure.appspot.com/ - https://www.theverge.com/tldr/2019/12/6/20998993/ai-dungeon-2-choose-your-own-adventure-game-text-nick-walton-gpt-machine-learning