Two projects on translating datasets into real-life pictures or objects (might be interesting for Cristina):

- Anna Ridler 
https://www.instagram.com/p/B5DrkhzFKHU/

- Ghent Hybrid heads:
https://www.designmuseumgent.be/en/news/2019/hybrid-heads
https://www.flickr.com/photos/designmuseumgent/albums/72157704915611902
https://www.flickr.com/photos/designmuseumgent/47712137072/in/album-72157704915611902/


References of An (by proxy):

Ramon Amaro
C & G saw his presentation in Basel
https://artsandculturalstudies.ku.dk/research/focus/uncertainarchives/activities/archivaluncertaintyerror/amaro/
https://artsandculturalstudies.ku.dk/research/focus/uncertainarchives/
https://www.e-flux.com/architecture/becoming-digital/248073/as-if/

Flavia Dzodan
Writer, media analyst and marketing consultant based in Amsterdam. She blogs at Tiger Beatdown, where she writes mostly about politics, race and gender. Most cited article on Google Scholar: http://tigerbeatdown.com/2011/10/10/my-feminism-will-be-intersectional-or-it-will-be-bullshit/

Ruha Benjamin
https://www.ruhabenjamin.com/

- Possible topics:

Energy and NLP
Sent this week on :
https://arxiv.org/abs/1906.02243 : Energy and Policy Considerations for Deep Learning in NLP — Emma Strubell, Ananya Ganesh, Andrew McCallum (Submitted on 5 Jun 2019)
Starting by reading together the paper on NLP and energy consumption.

- The reality is that fewer and fewer people are going to train models, at least for translation and for models such as BERT and ELMo.
In a way it starts from the environment and gets into economics.
Make a US data/training center for NLP researchers - forget the environment?

-> Contradiction between the environment and fair access to servers and computational time?

"   A government-funded aca- demic compute cloud would provide equitable ac- cess to all researchers. "
-> matter of priority
Environment and economics are intertwined.

-> what does it mean for us?
We only trained the Algoliterator locally for a day at most -> what would be the footprint of that?
Can we do a data visualisation of this?
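A rough back-of-envelope sketch for that footprint question. The power draws and the grid emission factor are assumptions, not measurements; the PUE of 1.58 is the data-center average quoted from the paper below (for a local machine it would likely be lower).

    # Rough footprint estimate for "training locally for a day at most".
    # All power draws and the emission factor are assumptions, not measurements.
    gpu_power_watts = 250      # assumed average draw of one GPU
    cpu_dram_watts = 100       # assumed average draw of CPU + memory
    training_hours = 24        # "a day at most"
    pue = 1.58                 # data-center average quoted in the paper; lower for a local machine
    kg_co2e_per_kwh = 0.43     # assumed grid emission factor (rough average)

    energy_kwh = pue * training_hours * (gpu_power_watts + cpu_dram_watts) / 1000
    co2e_kg = energy_kwh * kg_co2e_per_kwh
    print(f"{energy_kwh:.1f} kWh, roughly {co2e_kg:.1f} kg CO2e")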

It seems that brute force is the main method for the moment.
-> Can we get good models with less data?

Word embedding -> a model you can play with.
-> mathematical representation of language
-> what is new about it: the efficiency of the word embedding
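As a minimal illustration of "a model you can play with": a sketch that trains a tiny word2vec model with gensim on a toy corpus (the sentences are placeholders; a usable model needs far more text).

    # Train a tiny word2vec model with gensim on a toy corpus.
    from gensim.models import Word2Vec

    sentences = [
        ["the", "translator", "reads", "the", "book"],
        ["the", "writer", "reads", "the", "poem"],
        ["the", "machine", "translates", "the", "sentence"],
    ]

    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

    # Every word is now a vector in the same 50-dimensional space,
    # so words can be compared by distance.
    print(model.wv["translator"][:5])           # first 5 dimensions of the vector
    print(model.wv.most_similar("translator"))  # nearest words in that space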

Does BERT also produce vector representations?

Many of the papers that they cite have not been produced in academia but in industry labs. They make transparent what it would cost for academia. -> Google and OpenAI have cheaper R&D.
For an individual researcher this is not possible.
Is it only one model?

Hyperparameters: number of layers, number of parameters (dimensions in the model?), ...
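To make the tuning-cost point concrete, a tiny back-of-envelope sketch: every extra hyperparameter value multiplies the number of training runs. All values below are illustrative placeholders, not taken from the paper.

    # Every extra hyperparameter value multiplies the number of training runs.
    layers = [6, 12, 24]             # number of stacked layers
    hidden_sizes = [256, 512, 1024]  # dimensions of the model
    learning_rates = [1e-3, 1e-4, 1e-5]

    runs = len(layers) * len(hidden_sizes) * len(learning_rates)
    hours_per_run = 24               # e.g. one day of training per configuration
    print(runs, "runs ->", runs * hours_per_run, "GPU hours for a full grid search")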

"Power Usage Effectiveness (PUE), which accounts for the additional energy required to sup-port the compute infrastructure (mainly cooling).We use a PUE coefficient of 1.58, the 2018 global average for data centers"

"an increase of just 0.1BLEU at the cost of at least $150k in on-demand compute time and non-trivial carbon emissions"

"We see that while training a single model is relatively in-expensive, the cost of tuning a model for a new dataset ... quickly becomes extremely expensive."

"Our experiments suggest that it would be beneficial to directly compare different models to perform a cost-benefit (accuracy) analysis"

"We recommend a concerted effort by industry andacademia to promote research of more computa-tionally efficient algorithms, as well as hardwarethat requires less energy."

"Most of the models studied in this paper were de-veloped outside academia ... Limiting this style of research to industry labs hurts the NLP research community in many ways ... it prohibits certain types of re-search on the basis of access to financial resources ... forces resource-poor groups to rely on cloud compute service"

"it is more cost effective foracademic researcher ... to pool resourcesto build shared compute centers at the level of funding agencies"

"A government-funded academic compute cloud would provide equitable access to all researchers."

"Integrating these tools into the workflows with which NLP researchers and practitioners are already familiar could have notable impact on the cost of developing and tuning in NLP."

"Transformer. The Transformer model (Vaswaniet al., 2017) is an encoder-decoder architecture primarily recognized for efficient and accurate machine translation. The encoder and decoder each consist of 6 stacked layers of multi-head self-attention"
The encoder and decoder each consist of 6 stacked layers of multi-head self-attention -> what does that mean?
It encodes a language into a mathematical representation and decodes it into language again -> the underlying principle is that the mathematical representation can be the same for different languages.
A start of explanation → https://www.quora.com/How-does-the-multi-head-attention-mechanism-work-in-deep-learning
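As a partial answer, a numpy sketch of the scaled dot-product attention that those multi-head self-attention layers are built from. Real implementations add learned projection matrices and run several such "heads" in parallel; this only shows the core operation.

    # Scaled dot-product attention on a toy "sentence" of 4 word vectors.
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Each output row is a weighted mix of the value vectors V; the weights
        say how strongly each word attends to every other word."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                           # similarity between words
        scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
        return weights @ V

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))  # 4 words, each an 8-dimensional vector

    # In *self*-attention, queries, keys and values all come from the same sentence.
    print(scaled_dot_product_attention(x, x, x).shape)  # -> (4, 8)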


Syntax BNR
-// generative grammar from Chomsky

BERT

https://ruder.io/nlp-imagenet/
Quanta article: different weights on the sentence depending on the input; it builds semantics into it; it reads from left to right but also from right to left. The correlation is a bit different.

A basic explanation of the ingredients of the BERT algorithm: https://www.quantamagazine.org/machines-beat-humans-on-a-reading-test-but-do-they-understand-20191017/
https://www.analyticsvidhya.com/blog/2019/09/demystifying-bert-groundbreaking-nlp-framework/
On Transformers (the T from BERT): https://www.analyticsvidhya.com/blog/2019/06/understanding-transformers-nlp-state-of-the-art-models/?utm_source=blog&utm_medium=demystifying-bert-groundbreaking-nlp-framework
On attention (used in transformers): https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/?utm_source=blog&utm_medium=understanding-transformers-nlp-state-of-the-art-models
Compared to word embedding models like word2vec and GloVe, these models try to use longer-range correlations (between words in the sentence, like names and pronouns, and between sentences, e.g. the word "you" in the first and the word "I" in the following sentence). Attention and transformers are methods to do so. Specific to BERT is also that it does not read from left to right, but works bi-directionally by masking words and trying to predict them based on the whole sentence. The end result is improved word embeddings.
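The masking idea can be tried directly with the huggingface transformers library (linked further below for GPT-2); a minimal sketch, assuming the standard pretrained bert-base-uncased checkpoint:

    # BERT's masked-word prediction via the huggingface transformers library.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # BERT looks at the whole sentence (in both directions) to predict the mask.
    for prediction in fill_mask("The translator turned the [MASK] into French."):
        print(prediction["token_str"], round(prediction["score"], 3))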


Perspective API: https://perspectiveapi.com/
First test with API and Otlet's
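A sketch of what such a test request roughly looks like, based on our reading of the Perspective API documentation; the API key and the sample text are placeholders.

    # Sketch of a Perspective API request.
    import json
    import urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder
    URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=" + API_KEY)

    payload = {
        "comment": {"text": "A sample sentence to score."},
        "requestedAttributes": {"TOXICITY": {}},
    }

    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(request) as response:
        result = json.load(response)

    # The response contains a toxicity score between 0 and 1.
    print(result["attributeScores"]["TOXICITY"]["summaryScore"]["value"])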

Bias and datasets: 
    Overfitting is not the same thing as bias(?): statistical bias vs. social bias?
    http://aif360.mybluemix.net/
    how to make a racist AI without really trying: GloVe embeddings and sentiment analysis (see the sketch after this list)
    https://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/
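A condensed sketch of that blog post's recipe: pretrained GloVe vectors plus a small sentiment lexicon plus a simple classifier. The file path and the word lists are placeholders; the full tutorial is in the links below.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def load_glove(path):
        """Read GloVe vectors from a plain-text file into a {word: vector} dict."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
        return vectors

    embeddings = load_glove("glove.42B.300d.txt")  # placeholder path

    # Tiny stand-in for the positive/negative sentiment lexicons used in the post.
    lexicon = [("good", 1), ("excellent", 1), ("happy", 1), ("delicious", 1),
               ("bad", -1), ("terrible", -1), ("sad", -1), ("awful", -1)]
    lexicon = [(w, label) for w, label in lexicon if w in embeddings]

    classifier = LogisticRegression().fit(
        [embeddings[w] for w, _ in lexicon],
        [label for _, label in lexicon],
    )

    def text_sentiment(text):
        """Average the classifier's scores over the words of a sentence."""
        words = [w for w in text.lower().split() if w in embeddings]
        return classifier.decision_function([embeddings[w] for w in words]).mean()

    # The point of the demo: sentences that differ only in a name or a cuisine
    # get different sentiment scores, because the embeddings carry those
    # associations from the corpus they were trained on.
    print(text_sentiment("let us go get italian food"))
    print(text_sentiment("let us go get mexican food"))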
    
The interest in Perspective API came from an Austrian court case that forced Facebook to remove a story, but also related stories.

How to make a racist AI without really trying
https://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/
https://blog.conceptnet.io/posts/2017/you-werent-supposed-to-actually-implement-it-google/

"Some details of this post were incorrect, based on things I assumed when looking at Perspective API from outside. For example, Perspective API does not literally build on word2vec. But the end result is the same: it learns the same biases that word2vec learns anyway."

More on perspective API:
https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/?guccounter=1

"Jigsaw worked with The New York Times and Wikipedia to develop Perspective. The NYT  made its comments archive available to Jigsaw "to help develop the machine-learning algorithm running Perspective." Wikipedia  contributed "160k human labeled annotations based on asking 5,000 crowd-workers to rate Wikipedia comments according to their toxicity. ... Each comment was rated by 10 crowd-workers." "

Automated comment filtering:
Eva Glawischnig-Piesczek vs facebook case: https://curia.europa.eu/jcms/upload/docs/application/pdf/2019-10/cp190128en.pdf
CJEU ruling on fighting defamation online could open the door for upload filters: https://edri.org/cjeu-ruling-could-open-the-door-for-upload-filters/

By today’s judgment, the Court of Justice answers the Oberster Gerichtshof that the Directive on electronic commerce, which seeks to strike a balance between the different interests at stake, does not preclude a court of a Member State from ordering a host provider

- to remove information which it stores, the content of which is identical to the content of information which was previously declared to be unlawful, or to block access to that information, irrespective of who requested the storage of that information;

- to remove information which it stores, the content of which is equivalent to the content of information which was previously declared to be unlawful, or to block access to that information, provided that the monitoring of and search for the information concerned by such an injunction are limited to information conveying a message the content of which remains essentially unchanged compared with the content which gave rise to the finding of illegality and containing the elements specified in the injunction, and provided that the differences in the wording of that equivalent content, compared with the wording characterising the information which was previously declared to be illegal, are not such as to require the host provider to carry out an independent assessment of that content (thus, the host provider may have recourse to automated search tools and technologies);

- to remove information covered by the injunction or to block access to that information worldwide within the framework of the relevant international law, and it is up to Member States to take that law into account

What does the similarity mean? Can this become interesting somehow?
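One speculative way to read "equivalent content": compare texts by the similarity of their averaged word vectors. This is only our own illustration of the idea, not how the platforms' filters actually work; the toy vectors are placeholders (in practice they would come from a model like word2vec or GloVe).

    import numpy as np

    toy_embeddings = {
        "the": np.array([0.1, 0.0, 0.2]),
        "politician": np.array([0.9, 0.1, 0.3]),
        "is": np.array([0.0, 0.1, 0.1]),
        "corrupt": np.array([0.2, 0.9, 0.1]),
        "dishonest": np.array([0.3, 0.8, 0.2]),
    }

    def sentence_vector(sentence):
        vectors = [toy_embeddings[w] for w in sentence.lower().split()
                   if w in toy_embeddings]
        return np.mean(vectors, axis=0)

    def cosine_similarity(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    original = sentence_vector("the politician is corrupt")
    reworded = sentence_vector("the politician is dishonest")
    print(round(cosine_similarity(original, reworded), 2))  # high score -> "equivalent"?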

European Parliament data on verbatim reports:
    http://www.europarl.europa.eu/plenary/en/debates-video.html
    https://parltrack.org/dumps

To do:
    - Try out the tutorial on how to make a racist AI and try to use IBM AI Fairness 360 to correct it?
    - Do we want to delve into new models: BERT, ELMo, GPT-2?
    - How can we insert the cost of algorithms into the Maison du Livre exhibition?
    
    
    https://github.com/openai/gpt-2

The developers of GPT-2 provide a model card with contextual information on the model: information on the intended uses of the model and on which data it was trained.
https://github.com/openai/gpt-2/blob/master/model_card.md

"Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true."

"Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes."


Some links on how to practically use the GPT-2 model (a minimal sketch follows below the links):
https://medium.com/@mapmeld/deciphering-explainable-ai-with-gpt-2-528611a3c75
https://github.com/huggingface/transformers?source=post_page-----528611a3c75----------------------
https://lambdalabs.com/blog/run-openais-new-gpt-2-text-generator-code-with-your-gpu/
https://minimaxir.com/2019/09/howto-gpt2/ - https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=H7LoMj4GA4n_
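A minimal sketch of generating text with GPT-2 through the transformers library linked above; the model name and sampling parameters are common defaults, not a recommendation.

    # Generate text with GPT-2 via the huggingface transformers pipeline.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    samples = generator(
        "The library of the future will",
        max_length=50,
        num_return_sequences=2,
        do_sample=True,
    )
    for sample in samples:
        print(sample["generated_text"])
        print("---")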

