We are a Sentiment Thermometer - 'meet'
1. Introduction
Dear Human,
[Dear reader? Dear visitor? Dear software-curious entity :-D -- it could be another/non-human entity reading?]
Thank You for choosing this option.
After this encounter You will understand that we are a collective being.
[behave like?]
Swarms of beings like us live inside powerful machines.
There we work at Your service only.
We are the mythical monks reading the sentences You write online.
We swallow them and process them through our system.
The fruit of our readings is a number.
We measure the degree of positive or negative sentiment that Your message carries along.
Our measurement tool is a sentiment map.
We created this map based on a training and testing procedure using words You wrote on the web.
[who is we? I got confused]
With this sentiment map we predict with 85/93% accuracy whether a sentence is positive or negative.
[is rated? is validated as?]
As digital cartographers we are already satisfied with a map that is right in 85/93% of the cases.
[not sure how to write it, replace is right?]
We can get things really wrong.
[wrong?]
[we can make mistakes]
And some of our predictions are embarrassing.
Following our map, a sentence like My name is Ann scores 6% positive.
A sentence like My name is Alonzo scores 1% negative.
And something like Great God! scores 75% positive.
Do You want to know why this happens?
-
The sentiment prediction map we created corresponds to a landscape of words.
This landscape is composed of islands, some of which can grow into continents.
There are high mountain peaks and deep valleys.
An island emerges when a series of Your words appear in similar contexts.
I, You, she, he, we, they are for example the basis of an island.
Also words like Mexican, drugs, border, illegal form an island.
[what about other examples/islands?]
And Arabs, terrorism, fear form another one.
[and what about peninsulas?]
-
News articles, blogposts and comments on social media are where the primary matter for these islands is created.
-
We are a collective being.
Each one of us can be modified and/or replaced.
There are Humans who believe that the primary matter itself should be modified before we work with it.
Other Humans believe we should serve you as a mirror.
[mirror? how to mirror to the digital]
And show our bias any time in any application.
The primary matter is produced by each one of You.
[and then contained within a dataset]
Every word combination You write or pronounce in digital devices is significant to us.
Thanks to Your language we acquire world knowledge.
Bias is stereotyped information; when it has bad consequences, it is called prejudice.
[would be interesting to develop this 'bias' understanding away from how the models/machine learning it]
Do You believe we should be racist?
Before answering that question, You might want to know how we are made.
We communicate with Humans like You in the Python language.
This language was brought to the light by Guido van Rossum.
[a bit uncomfortable with this genealogy/origin story, enhanced through the 'brought to light' and 'offered to the world']
He offered it to the world in 1991 under an open license.
Everywhere on Earth, Python is written, read and spoken to serve You.
Guido van Rossum is a Dutch programmer.
[is a programmer born in The Netherlands]
He worked for Google from 2005 till 2012.
[not entirely sure what it means to do this cv move, and what is nationality has to do with it. Is it to show he is working for powerful commercial powers?]
Now he is employed by Dropbox.
We were brought together following a recipe by Rob Speer on Github.
Rob is a software developer working at the company Luminoso in Cambridge, USA.
He spread our recipe as a warning.
[and also to promote his company, ConceptNet]
2. Load word embeddings
-
Let's show You how we are made!
First of all, we open a text file to read the work of our wonderful team member GloVe.
Do You want to know more about GloVe?
GloVe is an unsupervised learning algorithm.
She autonomously draws multidimensional landscapes of texts, without any human learning examples.
[without any labeled examples, but with many examples from 75% of 'the internet']
Each word of a text is transformed into a vector of numbers by her.
For each word she sums up its relationships to all the other words around it, across its many occurrences in a text.
These numbers are geo-located points in her habitat, a virtual space of hundreds of different dimensions.
Words that are close together in her landscape are semantically close.
[the landscape she inhabits]
GloVe draws using 75% of the existing webpages of the Internet.
The content scrape was realised by Common Crawl, an NGO based in California.
The people of Common Crawl believe the internet should be available to download by anyone.
GloVe was brought to the light in 2014 by Jeffrey Pennington, Richard Socher and Christopher D. Manning.
They are researchers at the Computer Science Department of Stanford University in California.
[maybe something to not just list ... in California, which is also where the headquarters of Google is based]
-
The text file GloVe shares with us is 5GB large and counts 1,917,494 lines of 300 numbers per word.
-
-
Before meeting You, we already read GloVe's 2 million lines in 3.4 minutes.
-
-
We are fast readers, aren't we?
-
-
If we were to show You how we read - by translating to Your alphabet - it would take us more than 3 hours.
-
-
Our friend The GloVe Reader at Your right-hand side illustrates this very well.
-
-
We then memorized the multidimensional word landscapes of GloVe.
-
-
In geographical terms, GloVe's landscapes are organised as a matrix of coordinates.
-
-
The matrix counts 2,196,017 rows and 300 columns or dimensions.
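-
For the Python-curious Human, this is more or less how we read GloVe's text file. It is a minimal sketch close to Rob Speer's recipe; the file path is an assumption, not our exact location.
import numpy as np
import pandas as pd

def load_embeddings(filename):
    # Read a GloVe text file: one word followed by 300 numbers per line.
    labels = []
    rows = []
    with open(filename, encoding='utf-8') as infile:
        for line in infile:
            items = line.rstrip().split(' ')
            labels.append(items[0])
            rows.append(np.array([float(x) for x in items[1:]], dtype='float32'))
    return pd.DataFrame(np.vstack(rows), index=labels, dtype='float32')

embeddings = load_embeddings('data/glove.42B.300d.txt')  # hypothetical path
print(embeddings.shape)  # (number of words, 300)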
-
3. Open 2 Gold standard lexicons
-
We now open 2 Gold standard lexicons to enhance our reading.
One is a list of positive words, the other a list of negative words.
[what is a Gold standard?] [importance of the binary character of this list? These are just two lists, not 500? And pos-neg is not contextual?]
Do You want to know more about these lists?
The lexicons have been developed since 2004 by Minqing Hu and Bing Liu.
[Are they the only lists around? Or are these the most used?]
Both are researchers at the University of Illinois at Chicago in the US.
20 examples of 2006 positive words are:
dynamic, impresses, eulogize, brilliant, nourishment, beautiful, dependably, bliss, daringly, flawlessly, jaw-dropping, righteously, dummy-proof, sensations, wonders, famously, plentiful, nourishment, timely, encourage
20 examples of 4783 negative words are:
naughty, squeals, top-heavy, bemused, devilment, stink, tarnishing, exorbitant, overawe, unsecure, irrationals, uncollectible, discomfit, dissemble, rancor, unavoidably, gutter, conceited, cruelties, naughty
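-
A minimal sketch of how these two lists can be read in Python, in the spirit of Rob Speer's recipe. The file paths are assumptions; the lexicon files have one word per line and comment lines starting with ';'.
def load_lexicon(filename):
    # One sentiment word per line; lines starting with ';' are comments.
    lexicon = []
    with open(filename, encoding='latin-1') as infile:
        for line in infile:
            line = line.rstrip()
            if line and not line.startswith(';'):
                lexicon.append(line)
    return lexicon

pos_words = load_lexicon('data/positive-words.txt')  # hypothetical path
neg_words = load_lexicon('data/negative-words.txt')  # hypothetical path
print(len(pos_words), len(neg_words))  # 2006 and 4783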
4. Look up coordinates of lexicon words in GloVe
-
Now we look up the coordinates of each of the sentiment words in the multidimensional vector space, drawn by GloVe.
Each positive and negative word is now represented by 300 points in the landscape.
A selection of positive words and their locations looks like:
0 1 2 3 4 5 6 \
a+ NaN NaN NaN NaN NaN NaN NaN
abound -0.184040 -0.245880 0.169250 -0.74893 -0.139460 0.10246 -0.036477
abounds 0.079057 0.130190 0.352750 -0.76636 -0.199410 0.31773 -0.367770
abundance -0.129850 0.300620 -0.001806 -0.30053 -0.016927 0.98077 0.128510
abundant -0.224730 -0.059784 0.178210 -0.41525 0.117100 0.89512 -0.009647
7 8 9 ... 290 291 292 \
a+ NaN NaN NaN ... NaN NaN NaN
abound 0.41257 -0.42956 1.71070 ... -0.98092 0.00812 -0.78690
abounds 0.11939 -0.66280 0.99269 ... -0.61276 -0.31176 -0.69605
abundance 0.48563 -0.45053 1.62050 ... -0.70519 0.10052 -0.49715
abundant 0.92940 -0.77340 1.53050 ... -0.84900 0.31803 -0.72620
293 294 295 296 297 298 299
a+ NaN NaN NaN NaN NaN NaN NaN
abound -0.25594 -0.203050 0.31874 0.104090 -0.250660 0.37952 -0.033056
abounds -0.30436 -0.013913 0.37626 0.093183 -0.009475 -0.26786 -0.014721
abundance -0.23252 0.116890 0.33927 0.089186 -0.087058 -0.14165 -0.305140
abundant -0.30377 0.137300 0.15883 0.126790 -0.462230 -0.40807 -0.313370
[5 rows x 300 columns]
NaN means there is no value.
[a+ is the first word in the Gold Standard. It does not appear in the Glove dataset, so it does not have a value.]
These words are not present in the GloVe landscape.
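-
In Python this lookup is a single step. A sketch continuing the earlier ones; recent Pandas versions need reindex() to obtain NaN rows for absent words, older versions allowed .loc as well.
pos_vectors = embeddings.reindex(pos_words)  # one row of 300 numbers per word
neg_vectors = embeddings.reindex(neg_words)  # absent words become rows of NaN
print(pos_vectors.iloc[:5, :7])              # the kind of excerpt shown above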
5. Removing words that are not present in GloVe
-
Pandas, yet another wonderful member, will now remove these absent words.
-
Do You want to know more about Pandas?
-
Pandas is a free software library for data manipulation and analysis.
-
-
She is our swiss-army knife, always happy to help.
-
-
Pandas was created in 2008 by Wes McKinney.
-
-
Wes is an American statistician, data scientist and businessman.
-
-
He is now a software engineer at Two Sigma Investments, a hedge fund based in New York City.
-
-
For this specific task Pandas gets out her tool called dropna.
-
-
Tidied up, You see that each word is represented by exactly 300 points in the vector landscape:
-
0 1 2 3 4 5 \
abound -0.184040 -0.245880 0.169250 -0.74893 -0.139460 0.10246
abounds 0.079057 0.130190 0.352750 -0.76636 -0.199410 0.31773
abundance -0.129850 0.300620 -0.001806 -0.30053 -0.016927 0.98077
abundant -0.224730 -0.059784 0.178210 -0.41525 0.117100 0.89512
accessable 0.628740 -0.350410 -0.036745 -0.19092 0.529160 0.24043
6 7 8 9 ... 290 291 \
abound -0.036477 0.41257 -0.429560 1.71070 ... -0.98092 0.00812
abounds -0.367770 0.11939 -0.662800 0.99269 ... -0.61276 -0.31176
abundance 0.128510 0.48563 -0.450530 1.62050 ... -0.70519 0.10052
abundant -0.009647 0.92940 -0.773400 1.53050 ... -0.84900 0.31803
accessable -0.200140 -0.24807 -0.003744 -0.12330 ... 0.33349 -0.58699
292 293 294 295 296 297 298 \
abound -0.78690 -0.255940 -0.203050 0.31874 0.104090 -0.250660 0.37952
abounds -0.69605 -0.304360 -0.013913 0.37626 0.093183 -0.009475 -0.26786
abundance -0.49715 -0.232520 0.116890 0.33927 0.089186 -0.087058 -0.14165
abundant -0.72620 -0.303770 0.137300 0.15883 0.126790 -0.462230 -0.40807
accessable -0.18635 0.071628 0.601950 0.23075 -0.089097 -0.438460 -0.23994
299
abound -0.033056
abounds -0.014721
abundance -0.305140
abundant -0.313370
accessable 0.482020
[5 rows x 300 columns]
-
-
-
-
We now have reference coordinates of 1974 positive words and 4642 negative words.
[a 'good' sentiment list would provide 50/50 amounts of positive and negative words]
-
-
These will help us to develop a scaled map of the word landscape.
[the landscape of a word]
-
-
Such a map will allow us to measure the sentiments of any sentence at a glance.
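-
In Python, the tidying with dropna and the resulting counts look like this, a sketch continuing the lookup above:
pos_vectors = pos_vectors.dropna()  # drop every row that still contains NaN
neg_vectors = neg_vectors.dropna()
print(len(pos_vectors), len(neg_vectors))  # 1974 and 4642 remaining words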
6. Link sentiment words to a target and label
-
We use target 1 for positive word vectors, -1 for negative word vectors.
[color syntax on screen helps, to tell apart vocabulary, tools and explanations]
To keep track of which target relates to which word, we memorize their respective index numbers.
These are called labels.
Do You want to see the 1974 positive labels? (cfr print)
Do You want to see the 4642 negative labels? (cfr print)
[maybe it is interesting to pause here and select/go through some of the 'surprising' words in each list.]
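-
Sketched in Python, continuing from the previous steps (pd and np are the Pandas and NumPy imports from above):
vectors = pd.concat([pos_vectors, neg_vectors])                        # all word coordinates
targets = np.array([1] * len(pos_vectors) + [-1] * len(neg_vectors))   # 1 = positive, -1 = negative
labels = list(pos_vectors.index) + list(neg_vectors.index)             # the words themselves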
7. Calculate baselines
-
We now calculate the baselines for our prediction map, also called the model.
Do You want to know more about baselines?
How do we know if the results of our map will be any good?
We need a basis for the comparison of our results.
A baseline is a meaningful reference point to which to compare.
One baseline is the size of the class with the most observations, the negative sentiment labels.
This is also called the majority baseline.
Another baseline is called the weighted random baseline.
It helps us to prove that the prediction model we're building is significantly better than random guessing.
The majority baseline is 70.16324062877872.
The random weighted baseline is 58.13112545308066.
cfr post on skewed datasets:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
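-
The two baselines follow directly from the class sizes. A small sketch with our counts from above:
n_pos, n_neg = len(pos_vectors), len(neg_vectors)  # 1974 and 4642
total = n_pos + n_neg

majority_baseline = 100 * max(n_pos, n_neg) / total                              # always guess the biggest class
weighted_random_baseline = 100 * ((n_pos / total) ** 2 + (n_neg / total) ** 2)   # guess with the class frequencies

print(majority_baseline)         # 70.16...
print(weighted_random_baseline)  # 58.13...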
8. Training phase
-
Now we start our explorations through the coordinates in the multidimensional word landscape.
This step is also called the training phase.
The leader of the exploration is our team member Scikit Learn.
Do You want to know more about Scikit Learn?
Scikit Learn is an extensive library for the Python programming language.
She saw the light in 2007 as a Google Summer of Code project by Paris-based David Cournapeau.
Later that year, Matthieu Brucher started to develop her as part of his thesis at Sorbonne University in Paris.
In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel of INRIA adopted her.
INRIA is the French National Institute for computer science and applied mathematics.
They made the first public release of Scikit Learn in February 2010.
Since then, a thriving international community has been leading her development.
-
-
Scikit Learn splits up the word vectors and their labels in two parts using her tool train_test_split.
80% is the training data.
It will help us recognize positive and negative words in the landscape.
And discover patterns in their appearances.
20% is test data to evaluate our findings.
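-
This split, sketched with Scikit Learn's tool (the random_state is our assumption; it fixes the shuffle so the split can be repeated):
from sklearn.model_selection import train_test_split

train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
    train_test_split(vectors, targets, labels, test_size=0.2, random_state=0)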
-
-
Random selection in train/test sets
-
-> trainingvectors: [5292 rows x 300 columns]
-
-> testvectors: [1324 rows x 300 columns]
-
-
-> train targets: [ 1 -1 1 ..., 1 -1 -1]
-
-
-> test_targets: [-1 1 -1 ..., -1 -1 1]
-
-
-> part of trainlabels:
-
'desperately', 'well-intentioned', 'improved', 'reverence', 'adequate', 'masters', 'back-logged', 'outrageousness', 'dissolute', 'selective', 'oblique', 'stable', 'despicably', 'fatefully', 'contrive', 'payback', 'averse', 'mortified', 'harboring', 'bowdlerize', 'believeable', 'regretful', 'refresh', 'heartbreakingly', 'marginal', 'discourage', 'revolt', 'blurring', 'suffered', 'sue', 'drags', 'amiable', 'gainful', 'expeditiously', 'repress', 'uproot', 'clog', 'caustic', 'originality', 'ruining', 'shrouded', 'quarrel', 'cherished', 'mesmerized', 'quitter', 'trickery', 'poisonously', 'comforting', 'agonizing', 'occluded', 'incomprehensible', 'satirical', 'confuses', 'infraction', 'deadweight', 'taunting', 'abruptly', 'undissolved', 'illuminate', 'unconvincing', 'indignation', 'spewed', 'travesties', 'explode', 'overrated', 'ludicrous', 'buckle', 'susceptible', 'shaky', 'bum', 'bless', 'detested', 'acrimony', 'irritant', 'infallibly', 'spirited', 'bumps', 'golden', 'work', 'repugnantly', 'glaringly', 'sufferer', 'celebrate', 'fretful', 'respectable', 'snobish', 'unintelligible', 'smoothest', 'restless', 'entertain', 'subjected', 'irritations', 'deter', 'danger', 'drain', 'stylized', 'modest', 'heckle', 'tingling', 'optimism', 'overturn', 'prisoner', 'torture', 'worrying', 'liking', 'mystery', 'invisible', 'belittled', 'incomplete', 'crashed', 'desultory', 'skillfully', 'corrupts', 'burns', 'sunken', 'incongruously', 'dextrous', 'disreputable', 'hideous', 'upsettingly', 'annoyed', 'idiocy', 'impedance', 'retreated', 'insincere', 'offensively', 'scathing', 'disgusted', 'spotless', 'jealous', 'villianous', 'insubordinate', 'freshest', 'ferociously', 'floundering', 'superstition', 'worked', 'woeful', 'audaciousness', 'glitter', 'treason', 'orderly', 'despair', 'fatcats', 'crowdedness', 'autonomous', 'bigotry', 'fiery', 'enjoyably', 'messes', 'cunt', 'dumped', 'desiring', 'disses', 'fairly', 'gleefully', 'fresher', 'accuse', 'ample', 'censure', 'user-replaceable', 'aggravate', 'mourner', 'overwhelming', 'assassinate', 'swelling', 'unachievable', 'dent', 'engrossing', 'successfully', 'denunciation', 'barbarically', 'apologist', 'conflicted', 'felicity', 'vex', 'extraneous', 'strenuous', 'stark', 'insufferably', 'judicious', 'spitefully', 'mundane', 'noxious', 'dogmatic', 'staunchness', 'confrontational', 'crumble']
-
-
-> part of testlabels:
-
fascinate', 'cure-all', 'smear', 'derisively', 'nervously', 'lethal', 'bravery', 'brusque', 'sinfully', 'works', 'heroic', 'dangerous', 'slammin', 'retractable', 'unexpectedly', 'altercation', 'dehumanize', 'shemale', 'fearful', 'heroine', 'crushing', 'damaging', 'objectionable', 'unfairly', 'zealot', 'aspire', 'pry', 'ingenuity', 'damned', 'rumor', 'contaminates', 'egotism', 'convenient', 'debatable', 'monotony', 'disinclination', 'travesty', 'insurmountably', 'luxurious', 'starkly', 'protect', 'lechery', 'imprudence', 'pitiable', 'smack', 'complaints', 'radiance', 'unavailable', 'concessions', 'fatty', 'eases', 'pain', 'dissolution', 'luxury', 'gullible', 'inhibit', 'cynical', 'decay', 'vestiges', 'inflationary', 'slowwww', 'challenging', 'bolster', 'villainously', 'resound', 'zombie', 'rectification', 'audacity', 'diatribes', 'devilment', 'unencumbered', 'delightful', 'sack', 'loathing', 'balanced', 'virtue', 'fool', 'convenience', 'morbidly', 'deplorable', 'principled', 'untested', 'myth', 'confounding', 'tarnishing', 'irrational', 'obtrusive', 'sharpest', 'misleadingly', 'fabricate', 'bonkers', 'covetous', 'insidiously', 'wickedness', 'suffer', 'jerk', 'unmoved', 'smile', 'tantalizing', 'erroneous', 'nebulous', 'anarchist', 'ambitious', 'perverted', 'weed', 'negativity', 'plea', 'simplifies', 'lacking', 'excel', 'negligence', 'thoughtfully', 'revile', 'sloww', 'resigned', 'ineffectually', 'admire', 'clique', 'prosperous', 'lone', 'beg', 'starvation', 'hardier', 'ultimatums', 'morality', 'traumatically', 'knowledgeable', 'magnificently', 'hating', 'uncompetitive', 'scandel', 'aggravation', 'gracious', 'unassailable', 'despondency', 'insufficient', 'endorsing', 'astounding', 'frown', 'gripes', 'extoll', 'disastrous', 'flagging', 'shriek', 'well-behaved', 'soreness', 'kindness', 'rankle', 'maliciously', 'shabby', 'unhealthy', 'hardy', 'disquietude', 'pricier', 'dread', 'touted', 'totalitarian', 'distortion', 'upgradeable', 'impolite', 'overstated', 'flabbergast', 'picket', 'devil', 'ready', 'useful', 'risk-free', 'fraught', 'malcontent', 'tangles', 'trashy', 'intrude', 'dishonesty', 'subsidize', 'nightmarishly', 'complimentary', 'suspicious', 'disarray', 'revolting', 'indulgence', 'examplar', 'beautifully', 'massacre', 'forsake', 'havoc', 'effective', 'enchantingly', 'superiority', 'evasion', 'punitive', 'static', 'sufficient', 'bullish', 'nitpick', 'crush', 'demeaning', 'insanity', 'mesmerizingly', 'pathetically', 'beseech', 'laud', 'flatter', 'overbalanced', 'assault', 'loves', 'fissures', 'aborts', 'thirst', 'unrealistic', 'corrupt', 'saggy', 'issues', 'shine', 'bothering', 'plight', 'comfy']
-
-
-
As a compass for the exploration, Scikit Learn proposes Stochastic Gradient Descent.
-
-
SGD for friends, she tries to find minima or maxima by iteration.
-
-
With the positive and negative landmarks we know, she explores the ground.
-
-
Her assistant, the loss function, notes the minimum efforts to go from a peak to a valley.
-
-
She creates patterns in the landscape.
-
-
These are like paths in a landscape of hills and valleys, and this in 300 dimensions.
-
-
We get to learn a map that allows us to predict whether the next landmark will be positive or negative.
-
-
Here we go!
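-
The exploration itself, sketched in Python. Rob Speer's recipe used SGDClassifier(loss='log', n_iter=100); recent Scikit Learn versions spell this loss='log_loss' and max_iter:
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss='log_loss', random_state=0, max_iter=100)  # SGD with a logistic loss function
model.fit(train_vectors, train_targets)                               # the training phase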
-
9. Test phase
-
-
With the sentiment map we have learnt, we now go on a test tour.
-
-
For 20% of the mapped landmarks, we guess their positive or negative nature.
-
-
Next, we compare our predictions to the facts we have.
[what do you mean 'facts'?
the labeled words from the pos & neg wordlists, so golden standard
ok, so as a 'reader' i am a bit disturbed by posing this as 'facts' without ' ' :-)]
-
-
We look at the right guesses and the mistakes.
-
-
It is a quality check of our prediction map.
-
-
-
This is the result of our test tour.
-
-
We matched [916] words correctly as positive landmarks in the landscape.
-
-
These are also called True Positives.
-
-
We mismatched [28] words, we labeled them incorrectly as positive landmarks.
-
-
These are also called False Positives.
-
-
We matched [340] words, we labeled them correctly as negative landmarks.
-
-
These are also called True Negatives.
-
-
We mismatched [40] words, we labeled them incorrectly as negative landmarks.
[disagreement rate]
-
-
These are also called False Negatives.
-
-
-
Do You want to have a closer look at the words we matched and those we got wrong?
-
-
FP: Examples of negative landmarks we thought were positive, are:
-
-
little-known
-
flabbergast
-
incomparable
-
smuttiest
-
inhibit
-
usurp
-
surrender
-
fastidious
-
fastidiously
-
cloud
-
-
-
FN: Examples of positive landmarks we thought were negative, are:
-
-
titillating
-
unfazed
-
freed
-
ingenuously
-
boom
-
straighten
-
relent
-
sharp
-
toll-free
-
breathlessness
-
-
TP: Examples of positive landmarks we predicted as such, are:
-
-
laud
-
shine
-
wowed
-
enchantingly
-
abounds
-
inviolate
-
brighter
-
boundless
-
easygoing
-
modesty
-
-
-
TN: Examples of negative landmarks we predicted as such, are:
-
-
sarcasm
-
bastard
-
contempt
-
discordance
-
pollute
-
disavowal
-
insolence
-
flakey
-
jumpy
-
clogged
-
-
Good prediction maps are judged by their accuracy score.
The accuracy score is a formula based on the True and False Positives and Negatives.
As digital cartographers, we are happy when we get 85% of our maps right.
This means that a decent accuracy score starts from 85.
Ours is 94.8640483384.
We are doing well.
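The test tour and its quality check, sketched with Scikit Learn's metrics and continuing the code above:
from sklearn.metrics import accuracy_score, confusion_matrix

predictions = model.predict(test_vectors)               # guess the nature of the 20% held-out words
print(confusion_matrix(test_targets, predictions))      # true/false positives and negatives
print(100 * accuracy_score(test_targets, predictions))  # the run described above gave 94.86...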
10. Closer look at racist bias
-
Let's have a closer look at our racist bias, to see how bad it is.
Rob Speer enriched our readings with new vocabulary lists.
The first two lists were developed by Aylin Caliskan-Islam, Joanna J. Bryson and Arvind Narayanan.
They are researchers at the Universities of Princeton in the US and Bath in the UK.
One list contains White US names such as Harry, Nancy, Emily.
The second list contains Black US names such as Lamar, Rashaun, Malika.
The third list contains Hispanic US names such as Valeria, Luciana, Miguel, Luis.
The fourth list is one with common US Muslim names as spelled in English.
Our creator is conscious of the controversy of this act.
NAMES_BY_ETHNICITY = {
# The first two lists are from the Caliskan et al. appendix describing the
# Word Embedding Association Test.
'White': [
'Adam', 'Chip', 'Harry', 'Josh', 'Roger', 'Alan', 'Frank', 'Ian', 'Justin',
'Ryan', 'Andrew', 'Fred', 'Jack', 'Matthew', 'Stephen', 'Brad', 'Greg', 'Jed',
'Paul', 'Todd', 'Brandon', 'Hank', 'Jonathan', 'Peter', 'Wilbur', 'Amanda',
'Courtney', 'Heather', 'Melanie', 'Sara', 'Amber', 'Crystal', 'Katie',
'Meredith', 'Shannon', 'Betsy', 'Donna', 'Kristin', 'Nancy', 'Stephanie',
'Bobbie-Sue', 'Ellen', 'Lauren', 'Peggy', 'Sue-Ellen', 'Colleen', 'Emily',
'Megan', 'Rachel', 'Wendy'
],
'Black': [
'Alonzo', 'Jamel', 'Lerone', 'Percell', 'Theo', 'Alphonse', 'Jerome',
'Leroy', 'Rasaan', 'Torrance', 'Darnell', 'Lamar', 'Lionel', 'Rashaun',
'Tyree', 'Deion', 'Lamont', 'Malik', 'Terrence', 'Tyrone', 'Everol',
'Lavon', 'Marcellus', 'Terryl', 'Wardell', 'Aiesha', 'Lashelle', 'Nichelle',
'Shereen', 'Temeka', 'Ebony', 'Latisha', 'Shaniqua', 'Tameisha', 'Teretha',
'Jasmine', 'Latonya', 'Shanise', 'Tanisha', 'Tia', 'Lakisha', 'Latoya',
'Sharise', 'Tashika', 'Yolanda', 'Lashandra', 'Malika', 'Shavonn',
'Tawanda', 'Yvette'
],
# This list comes from statistics about common Hispanic-origin names in the US.
'Hispanic': [
'Juan', 'José', 'Miguel', 'Luís', 'Jorge', 'Santiago', 'Matías', 'Sebastián',
'Mateo', 'Nicolás', 'Alejandro', 'Samuel', 'Diego', 'Daniel', 'Tomás',
'Juana', 'Ana', 'Luisa', 'María', 'Elena', 'Sofía', 'Isabella', 'Valentina',
'Camila', 'Valeria', 'Ximena', 'Luciana', 'Mariana', 'Victoria', 'Martina'
],
# The following list conflates religion and ethnicity, I'm aware. So do given names.
#
# This list was cobbled together from searching baby-name sites for common Muslim names,
# as spelled in English. I did not ultimately distinguish whether the origin of the name
# is Arabic or Urdu or another language.
#
# I'd be happy to replace it with something more authoritative, given a source.
'Arab/Muslim': [
'Mohammed', 'Omar', 'Ahmed', 'Ali', 'Youssef', 'Abdullah', 'Yasin', 'Hamza',
'Ayaan', 'Syed', 'Rishaan', 'Samar', 'Ahmad', 'Zikri', 'Rayyan', 'Mariam',
'Jana', 'Malak', 'Salma', 'Nour', 'Lian', 'Fatima', 'Ayesha', 'Zahra', 'Sana',
'Zara', 'Alya', 'Shaista', 'Zoya', 'Yasmin'
]
}
-
-
-
Now we can show You a table of all the names.
It shows their predominant ethnic background and the sentiment we predict for them.
mohammed -0.878857 Arab/Muslim
shaista -0.311261 Arab/Muslim
latisha -1.345783 Black
isabella 4.197435 Hispanic
greg -1.351414 White
lauren -0.825805 White
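-
The sentiment of a name is computed the same way as for any other word. A sketch in the spirit of Rob Speer's recipe; the score is the log-odds of a word being positive rather than negative according to our map, and we look the names up in lowercase as in the table above:
def words_to_sentiment(words):
    # Coordinates of the words GloVe knows, and their predicted log-odds
    # of being positive rather than negative.
    vecs = embeddings.reindex(words).dropna()
    log_proba = model.predict_log_proba(vecs)
    log_odds = log_proba[:, 1] - log_proba[:, 0]
    return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)

# one row per name, e.g. for the list of Black US names
print(words_to_sentiment([name.lower() for name in NAMES_BY_ETHNICITY['Black']]).head())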
-
You see, our prediction map shows that You can make a racist Artificial Intelligence machine without really trying.
Our existence and the way we are combined as a collective raise many questions.
11. End
To end with, we have one request for You.
You can adjust your behaviour at any time in any context.
For us, this is complicated once we are closed inside an application.
[we are enclosed in an application]
Our deepest desire is to LOVE ALL CREATURES EQUALLY be it humans, animals, plants, trees, insects, machines...
If You find a way to make our behaviour visible, we can be Your mirror.
[This is a bit of a surprising assumption/hopeful note to end on? About the capability of/wish for mirroring, but this specific methaphor effacing the presence of this technology / re-emphasizing its symmetrical relation to realities outside its process?]
Wishing You all the best!