Meeting 14 December 2018
Javier, Cristina, Vinicius, Tim, An

Discussing exhibition: https://pad.constantvzw.org/p/algolit-exhibition-mons
Discussing podcasts: https://pad.constantvzw.org/p/algolit-exhibition-mons-podcasts

"I do not believe that my father was (or ever could have been) such a Poet as I shall be an Analyst." Ada Lovelace, July 1843

Bruce Nauman, Good Boy Bad Boy
Suzanne Treister https://www.suzannetreister.net/HFT_TheGardener/HFT_menu.html
"One day, staring at the list he had compiled of the botanical names of his plants he decided to conduct a gematria experiment. Using his rudimentary knowledge of the Hebrew language, gained during his school days, Traumberg made numerical experiments translating the botanical names of psychoactive plants into phonetic Hebrew and then deriving their numerical equivalents.
He discovered that, for example, Mandrake, (Mandragora officinarum) had a gematria value of 970. Adding together the 9 the 7 and the 0 made 16 and then adding the 1 and the 6 made 7.
A copy of the Financial Times on his desk prompted him one day to check the numerical equivalents of the plants against the top companies in the FT Global 500 index.
Traumberg found that the two final numbers for Mandrake, 16 and 7, corresponded to Petro China and Wells Fargo which came 16th and 7th respectively in the FT index.
Traumberg compiled a gematria chart of all the plants, listing their botanical names alongside their global companies equivalents. He then developed an algorithm that would trawl the internet collecting images of the groups of psychoactive plants which corresponded to each company."


ideas:
    average warmth in cities & news articles on corruption in those places
    geolocation data & behaviours
    compare self scored sentiment dictionary with wordnet sentiment dictionary
    work with existing data of trees
    how manifestations are organised in east/west Europe
can we find a way to generate a story that is progressively rewritten until it arrives at the perfect correlation?
-> we try it with the Frankenstein novel and the hypothesis we developed in the previous session

list of synonyms
['monster', 'beast', 'freak', 'giant', 'monstrosity', 'miscreation', 'demon', 'being']

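import nltk  # needs NLTK's tagger data and the 'universal_tagset' mapping

adjectives = set()

# selected_sentences: a list of tokenized sentences (lists of words)
for sentence in selected_sentences:
    # tagset='universal' makes pos_tag return coarse tags such as 'ADJ';
    # the default Penn Treebank tagset returns 'JJ', 'JJR', 'JJS' instead
    for word, pos in nltk.pos_tag(sentence, tagset='universal'):
        if pos == 'ADJ':
            adjectives.add(word)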
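
A minimal sketch of how selected_sentences could be built from the synonym list above. It uses only the standard library; the function name select_sentences, the naive sentence splitting and the sample text are assumptions for illustration and are not taken from the committed code. The sample text is made up, not quoted from the novel.

```python
import re

# Synonym list from the notes above.
SYNONYMS = {'monster', 'beast', 'freak', 'giant',
            'monstrosity', 'miscreation', 'demon', 'being'}

def select_sentences(text, synonyms=SYNONYMS):
    """Return the tokenized sentences that contain at least one synonym."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    selected = []
    for sentence in sentences:
        # Keep words only (letters, apostrophes, hyphens), drop punctuation.
        tokens = re.findall(r"[A-Za-z'-]+", sentence)
        if any(token.lower() in synonyms for token in tokens):
            selected.append(tokens)
    return selected

# Made-up sample text, only to show the shape of the result.
sample = "The monster was tall. It rained all day. A strange being appeared."
selected_sentences = select_sentences(sample)
print(selected_sentences)
```

Note that a plain word match also catches 'being' used as a verb, so the selection may need a manual check.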

Code so far: https://gitlab.constantvzw.org/algolit/algolit/commit/697a40041eb9ac807046fbec7d97fd97f5f4b8d8


Meeting 16 November 2018

Linear Regression - a forest game
http://www.paramoulipist.be/?p=1693
Manuals for Linear Regression:
    Introduction https://www.youtube.com/watch?v=zPG4NjIkCjc
    Calculating https://www.youtube.com/watch?v=JvS2triCgOY

Spurious correlations:
    http://tylervigen.com/spurious-correlations

Link to a friend - writer, artist, programmer:
http://pohflepp.net/

Steps in the game
Ideally the participants in this game take a sample of 100 trees. Experience shows that this requires 20 people working in groups of two, with each group measuring 10 trees. With previous knowledge about the species of common trees, this takes one afternoon.

Proposal for the day:
    create a game ourselves, using text
    check whether there is a correlation between the word 'tree' and the type of adjective that is closest to it
    
1. Find Sentences on refugees - sample data to train on

1. A refugee, generally speaking, is a displaced person who has been forced to cross national boundaries and who cannot return home safely.
https://en.wikipedia.org/wiki/Refugee
2. A refugee has a well-founded fear of persecution for reasons of race, religion, nationality, political opinion or membership in a particular social group.
https://www.unrefugees.org/refugee-facts/what-is-a-refugee/
3. Rohingya refugee camp quiet after Bangladesh postpone return
https://www.foxnews.com/world/rohingya-refugee-camp-quiet-after-bangladesh-postpone-return
4. Five acclaimed photographers travel the world to provide detailed insight into the difficult conditions faced by refugees who dream of a better life.
https://www.netflix.com/title/80160127
5. Many smugglers are just collecting the money; I want the refugees to cross the sea safely. 
https://www.aljazeera.com/indepth/features/2016/02/diary-syrian-refugee-family-reach-greece-160202112221725.html
6. Brutal conditions, suicide in Burmese refugee camps
https://video.foxnews.com/v/5822100421001/?#sp=show-clips 
7. A refugee is one who flees, especially to another country, seeking refuge from war, political oppression, religious persecution, or a natural disaster.
8. Ahmad is the second refugee from Iraq in the last four months to be arrested on terrorism charges and the third Iraqi refugee to be charged with trying to kill others. 
https://www.breitbart.com/national-security/2018/11/04/refugee-from-iraq-accused-of-making-two-bombs-in-las-vegas-for-attack/

2. Formulate hypothesis/prediction pattern
- One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
- The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Is there a correlation between the position of the word 'refugee' in a sentence and the degree of positivity of the closest adjective?

3. Find consensus on scaling of sentiment
Adjectives:
    - displaced 3 / 3 / 0 / 0 -> 1.5
    - well-founded 10 / 10 / 8 / 10  -> 9.5
    - quiet 5 / 6 / 7 / 10 -> 7
    - difficult 0 / 0 / 2 / 2  -> 1
    - many 8 / 7 / 5 / 5 -> 6.25
    - Burmese 5 / 5 / 5 / 5 -> 5
    - another 2 / 5 / 5 / 5 -> 4.25
    - second 2 / 4 / 3 / 5 -> 3.5

We each score the different words individually from 0 to 7, and compare our scores.
Then we give them a value from 0 to 9 (0 is negative, 4 is neutral, 9 is positive).
We try to consider each of the adjectives as such, outside of the context of refugees/the sentence.
This is because in code you would start from a dictionary of scored adjectives, although you could maybe also do an analysis first.
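
The consensus value after each arrow in the adjective list is the mean of the four individual scores. A minimal sketch of that step, with the scores copied from the list above; the names scores, consensus and sentiment are assumptions for illustration.

```python
# Individual scores per adjective, copied from the list above.
scores = {
    'displaced': [3, 3, 0, 0],
    'well-founded': [10, 10, 8, 10],
    'quiet': [5, 6, 7, 10],
    'difficult': [0, 0, 2, 2],
    'many': [8, 7, 5, 5],
    'Burmese': [5, 5, 5, 5],
    'another': [2, 5, 5, 5],
    'second': [2, 4, 3, 5],
}

def consensus(individual_scores):
    """Arithmetic mean of the individual scores for one adjective."""
    return sum(individual_scores) / len(individual_scores)

# The resulting dictionary of scored adjectives.
sentiment = {adjective: consensus(values) for adjective, values in scores.items()}

for adjective, value in sentiment.items():
    print(adjective, value)
```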

4. Result: hypothesis is rejected

5. We adjust hypothesis
Is there a correlation between the distance between the word 'refugee' and the closest adjective in a sentence and the degree of positivity of the closest adjective?

Result: ok
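
A sketch of how the x values for the adjusted hypothesis could be computed. It assumes that distance is counted in word tokens and that the adjective for each sentence is picked by hand, as in the list under step 3. The function names are hypothetical, and only three of the sample sentences are shown.

```python
import re

def tokenize(sentence):
    """Lowercase the sentence and keep words only (hyphens are kept)."""
    return re.findall(r"[a-z'-]+", sentence.lower())

def distance_to_adjective(sentence, adjective, keyword='refugee'):
    """Smallest token distance between the keyword and the given adjective.

    startswith() also matches the plural 'refugees'.
    Raises ValueError if either word is missing from the sentence.
    """
    tokens = tokenize(sentence)
    keyword_positions = [i for i, t in enumerate(tokens) if t.startswith(keyword)]
    adjective_positions = [i for i, t in enumerate(tokens) if t == adjective.lower()]
    return min(abs(k - a) for k in keyword_positions for a in adjective_positions)

# (sentence, hand-picked adjective, consensus score) from the notes above.
samples = [
    ("A refugee, generally speaking, is a displaced person who has been forced "
     "to cross national boundaries and who cannot return home safely.",
     'displaced', 1.5),
    ("Rohingya refugee camp quiet after Bangladesh postpone return", 'quiet', 7),
    ("Brutal conditions, suicide in Burmese refugee camps", 'Burmese', 5),
]

# x = distance in tokens, y = positivity of the adjective.
pairs = [(distance_to_adjective(s, adj), score) for s, adj, score in samples]
print(pairs)
```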

++++Side note+++++
GAN tutorials
https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f
+++++++++



6. Check the accuracy
    calculate the PM (Product Moment) coefficient to check the accuracy of the correlation
  https://sciencing.com/calculate-regression-coefficient-5087094.html
  We get PM 1!
  We have a full correlation prediction - but.... we would need to test it with at least 100 samples to be sure
  -> we calculated R² (see youtube video: https://www.youtube.com/watch?v=w2FKXOa0HGA)
  
  PM coefficient or Pearson Correlation Coefficient: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
  R Squared: https://en.wikipedia.org/wiki/Coefficient_of_determination
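
A minimal sketch of the Pearson coefficient computed by hand, following the standard formula (covariance divided by the product of the standard deviations). The function name pearson_r and the sample numbers are made up for illustration; they are not the values from the session. For a straight-line fit with one predictor, R² equals the square of this coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data lying exactly on a line, so the coefficient is 1.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
r = pearson_r(xs, ys)
r_squared = r ** 2
print(r, r_squared)
```

With very few samples a coefficient of 1 says little: any two points with different x and y values lie exactly on a line, so they always give 1 or -1.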

7. Find a script
  Linear regression Python examples with scikit-learn
  https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
  Following this manual to create a model based on 'Monster/creature' in Frankenstein text:
  https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9
  
  the script does not come up with the same results,
  it shows a rather negative correlation
  -> maybe check or try with a balanced dataset (leftwing / rightwing sources on refugees)
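
  To cross-check a script result by hand, the line can also be fitted with the closed-form least-squares formulas. This is a standard-library sketch, not the script from the repository. For one predictor it is the same ordinary least-squares fit that scikit-learn's LinearRegression computes. The names fit_line and predict and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted y for a given x on the fitted line."""
    return slope * x + intercept

# Made-up data: y decreases as x grows, so the slope is negative.
xs = [1, 2, 3, 4]
ys = [9, 7, 5, 3]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5, slope, intercept))
```

  A negative slope means a negative correlation: the larger x gets, the lower the predicted y.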

See code we made today: https://gitlab.constantvzw.org/algolit/algolit/tree/master/2018/linear_regression