GENDER MINING
https://en.wikipedia.org/wiki/Wikipedia:Meetup/justfortherecord


Bags of words

You are alone in the room, except for two computer terminals flickering in the dim light. You use the terminals to communicate with two entities in another room, whom you cannot see. Relying solely on their responses to your questions, you must decide which is the man, which the woman. Or, in another version of the famous “imitation game” proposed by Alan Turing in his classic 1950 paper “Computer Machinery and Intelligence,” you use the responses to decide which is the human, which the machine. One of the entities wants to help you guess correctly. His/her/its best strategy, Turing suggested, may be to answer your questions truthfully. The other entity wants to mislead you. He/she/it will try to reproduce through the words that appear on your terminal the characteristics of the other entity. Your job is to pose questions that can distinguish verbal performance from embodied reality. If you cannot tell the intelligent machine from the intelligent human, your failure proves, Turing argued, that machines can think.
Here, at the inaugural moment of the computer age, the erasure of embodiment is performed so that “intelligence” becomes a property of the formal manipulation of symbols rather than enaction in the human lifeworld. The Turing test was to set the agenda for artificial intelligence for the next three decades. In the push to achieve machines that can think, researchers performed again and again the erasure of embodiment at the heart of the Turing test. All that mattered was the formal generation and manipulation of informational patterns.

*Katherine Hayles, How We Became Posthuman

A hero (masculine or gender-neutral) or heroine (feminine) (Ancient Greek: ἥρως, hḗrōs) is a person or main character of a literary work who, in the face of danger, combats adversity through impressive feats of ingenuity, bravery or strength, often sacrificing his or her own personal concerns for some greater good.
The concept of the hero was first founded in classical literature. It is the main or revered character in heroic epic poetry celebrated through ancient legends of a people; often striving for military conquest and living by a continually flawed personal honor code.[1] The definition of a hero has changed throughout time, and the Merriam Webster dictionary defines a hero as "a person who is admired for great or brave acts or fine qualities".[2] Examples of heroes range from mythological figures, such as Gilgamesh, Achilles and Iphigenia, to historical figures, such as Joan of Arc and Gandhi, to modern societal heroes like Rosa Parks.

*https://en.wikipedia.org/wiki/Hero

8: or, of, a
6: the
4: hero, and
3: is, in, for, as
2: who, to, through, such, personal, person, often, main, heroes, figures, character, ancient
1: work, webster, was, time, throughout, striving, strength, some, societal, sacrificing, rosa, revered, range, qualities, poetry, people, parks, own, neutral, mythological, modern, military, merriam, masculine, living, literature, literary, like, legends, joan, it, iphigenia, ingenuity, impressive, honor, historical, his, heroine, heroic, her, has, greek, greater, great, good, gilgamesh, gender, gandhi, from, founded, flawed, first, fine, feminine, feats, face, examples, epic, dictionary, definition, defines, danger, continually, conquest, concerns, concept, combats, code, classical, changed, celebrated, by, bravery, brave, arc, adversity, admired, acts, achilles
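
A list like the one above can be produced by a very small program. What follows is a minimal sketch, not the script that was actually used for this reader: the tokenization rule (lowercase, runs of the letters a to z only) and the function names bag_of_words, group_by_count and format_groups are assumptions made for illustration. A different tokenizer would give slightly different counts.

```python
import re
from collections import Counter, defaultdict

def bag_of_words(text):
    """Lowercase the text, keep runs of the letters a-z as tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def group_by_count(bag):
    """Invert the bag: map each count to the list of words that occur that often."""
    groups = defaultdict(list)
    for word, count in bag.items():
        groups[count].append(word)
    return dict(groups)

def format_groups(groups):
    """Render one 'count: word, word, ...' line per count, highest count first."""
    lines = []
    for count in sorted(groups, reverse=True):
        # Words are listed in reverse alphabetical order, as in the list above.
        words = ", ".join(sorted(groups[count], reverse=True))
        lines.append(f"{count}: {words}")
    return "\n".join(lines)

# Toy input, not the Wikipedia paragraph quoted above.
sample = "A hero is a person who is admired. A heroine is a hero."
print(format_groups(group_by_count(bag_of_words(sample))))
```

Word order, punctuation and sentence boundaries are discarded in the first function; everything that follows only ever sees the counts.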

In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or 'boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
*In: Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449

"While "bag of words" might well serve as a cautionary reminder to programmers of the essential violence perpetrated to a text and a call to critically question the efficacy of methods based on subsequent transformations, the expression's use seems in practice more like a badge of pride or a schoolyard taunt that would go: Hey language: you're nothing but a big BAG-OF-WORDS. Following this spirit of the term, "bag of words" celebrates a perfunctory step of "breaking" a text into a purer form amenable to computation, to stripping language of its silly redundant repetitions and foolishly contrived stylistic phrasings to reveal a purer inner essence."

*Michael Murtaugh, a bag but is language nothing of words http://www.mondotheque.be/wiki/index.php/A_bag_but_is_language_nothing_of_words

Sedimentation and rigidity
"With our data fetched, we ran a data categorization job where we asked our contributors to visit the profile pages of Twitter accounts and judge the gender of each. We had them bucket accounts into “male,” “female,” “brand or organization,” and gave them an option for “can’t tell” as well. Then, we ran the tweets through our AI feature."

*http://www.crowdflower.com/blog/using-machine-learning-to-predict-gender

"One of the crucial questions raised by the gender Turing test, to my mind, is about the role of rigid, socially defined gender binaries. The test is predicated on an understanding that there are two genders, male and female, and that they each behave in a certain way. If we choose not to take this idea for granted, and instead decide that there is a vast spectrum of behaviour and appearance running from that which completely and stereotypically matches a gender, to that which is entirely opposed, the gendered Turing Test becomes impossible. How do we decide, from a textual discussion, what gender someone is if we do not require all people to adhere to a strict social script about their gender?"

*ginger coons, Gendered Turing Test http://www.adaptstudio.ca/blog/2014/04/gendered-turing-tests-and-strategies-for-concealing-and-identifying-gender-online.html

"Because common sense always appears authoritative, it conjures up rock-solid reassurance, reliability, stability. It is simple, obvious and certainly not hard to read or difficult to convey. Indeed, from Thomas Paine to more contemporary appeals so widely shared that it rarely requires justification and so intimate that it is as if everyone should know it deep in the bones, both a public and a private sense, half intuition, half cultural attribute"

*Stephanie A. Smith, Household words http://www.contentburns.com/household_words_96468.htm

"Well, about the social normativities. I completely agree that algorithmic normativity, despite the fact that it appears completely a-normative in fact, is a reflection of social or unreflected upon social normativities. An increase. An encouragement of such normativities. But also a naturalisation of these normativities. Which become invisible. Unspeakable. Because they have been translated into ones and zeroes."
*Antoinette Rouvroy, Discrimination and Big Data. With Geoffrey Bowker, Solon Barocas, Antoinette Rouvroy and Seda Guerses. January 2015, Constant in collaboration with Vlaams-Nederlands Huis deBuren and CPDP. http://video.constantvzw.org/cqrrelations/bigdatadiscrimination.webm http://sound.constantvzw.org/cqrrelations/big-data-discrimination.mp3

Female language features
Example 1: The main aim of this article is to propose an exercise in stylistic analysis which can be employed in the teaching of English language. It details the design and results of a workshop activity on narrative carried out with undergraduates in a university department of English. 
The methods proposed are intended to enable students to obtain insights into aspects of cohesion and narrative structure: insights, it is suggested, which are not as readily obtainable through more traditional techniques of stylistic analysis. The text chosen for analysis is a short story by Ernest Hemingway comprising only 11 sentences. A jumbled version of this story is presented to students who are asked to assemble a cohesive and well formed version of the story. Their re-constructions are then compared with the original Hemingway version.
*
Example 2: My aim in this article is to show that given a relevance theoretic approach to utterance interpretation, it is possible to develop a better understanding of what some of these so-called apposition markers indicate. It will be argued that the decision to put something in other words is essentially a decision about style, a point which is, perhaps, anticipated by Burton-Roberts when he describes loose apposition as a rhetorical device. However, he does not justify this suggestion by giving the criteria for classifying a mode of expression as a rhetorical device. Nor does he specify what kind of effects might be achieved by a reformulation or explain how it achieves those effects. In this paper I follow Sperber and Wilson's (1986) suggestion that rhetorical devices like metaphor, irony and repetition are particular means of achieving relevance. As I have suggested, the corrections that are made in unplanned discourse are also made in the pursuit of optimal relevance. However, these are made because the speaker recognises that the original formulation did not achieve optimal relevance. In contrast, deliberate reformulations are designed to achieve particular contextual effects, and they should not be taken to indicate a failure to communicate any more than, for, repetition.

*Examples from: http://www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf
*1: Language and Literature Vol. 1 (1992). Simpson, Paul
*2: Language and Literature Vol. 2 (1993). Blakemore, Diane
enlarges -- 786.304020616
blueberry -- 671.821194342
purse -- 645.593493698
bronchitis -- 579.204082721
girlie -- 576.800606068
pedicure -- 575.362642071
girlies -- 551.940227459
bridal -- 531.683556034
hubby -- 525.153731832
unpack -- 514.00857283
earrings -- 504.685659968
lightening -- 488.326058918
women's -- 481.237104908
herself -- 474.534279513
bangs -- 472.133257463
husband -- 469.026020307
heels -- 467.818165741
dresses -- 455.762353952
bruises -- 452.136263156

*Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449/

Words, phrases, and topics most highly distinguishing females and males

«Female language features are shown on top while males below. Size of the word indicates the strength of the correlation; color indicates relative frequency of usage. Underscores (_) connect words of multiword phrases. Words and phrases are in the center; topics, represented as the 15 most prevalent words, surround. (: females and males; correlations adjusted for age; Bonferroni-corrected).»

LIWC (Linguistic Inquiry and Word Count) has also been used extensively for studying gender and age [21]. Many studies have focused on function words (articles, prepositions, conjunctions, and pronouns), finding females use more first-person singular pronouns, males use more articles, and that older individuals use more plural pronouns and future tense verbs [30]–[32]. Other works have found males use more formal, affirmation, and informational words, while females use more social interaction, and deictic language (specifying identity or spatial or temporal location from the perspective of one or more of the participants in an act of speech or writing, as the words we, you, here, now, then, and that. FS) [33]–[36]. For age, the most salient findings include older individuals using more positive emotion and less negative emotion words [30], older individuals preferring fewer self-references (i.e. ‘I’, ‘me’) [30], [31], and stylistically there is less use of negation [37]. Similar to our finding of 2000 topics (clusters of semantically-related words), Argamon et al. used factor analysis and identified 20 coherent components of word use to link gender and age, showing male components of language increase with age while female factors decrease [32].
*Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449/
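
The closed-vocabulary counting that the function-word studies quoted above rely on can be sketched in a few lines. This is a hedged illustration only: the three word lists below are hypothetical and hand-picked, they are not the LIWC dictionary, and the function name category_rates is an assumption. The sketch computes, for one text, the share of tokens that fall into each predefined category, which is the kind of number such studies then compare across groups of writers.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists for illustration only.
# They are NOT the LIWC dictionary.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "articles": {"a", "an", "the"},
    "negations": {"no", "not", "never"},
}

def category_rates(text, categories=CATEGORIES):
    """Return, per category, the share of tokens in the text that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    # An empty text yields a rate of 0.0 for every category.
    return {name: (counts[name] / total if total else 0.0) for name in categories}

rates = category_rates("I never said the answer was mine.")
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

The categories are fixed before any text is read, so whatever falls outside the lists cannot show up in the result. That is the limitation the open-vocabulary approach claims to address.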

Unreasonable Effectiveness: From mining to prediction

Common applications of text mining
*Sentiment mining
*Age prediction
*Gender prediction
*Personality prediction
*Level of education prediction
*Deception detection
*Authorship attribution

"People write differently in different forums. For example, a single writing sample may appear MALE for informal writing but test as FEMALE for formal writing. Be sure to interpret the results based on the appropriate writing style. (These notes, for example, are more informal/blog than formal/non-fiction.)"

*http://www.hackerfactor.com/GenderGuesser.php#Analyze
*Bernard Harcourt, Against Prediction

Correlation of style, content and personality

*Stylene Demo http://clips.uantwerpen.be/cgi-bin/stylenedemo.html
*
*Stylometry
*https://github.com/jpotts18/stylometry
*
*Style, content and personality
*http://u.cs.biu.ac.il/~koppel/papers/AuthorshipProfiling-cacm-final.pdf

----

00:00:06,025 --> 00:00:20,324
Facial Weaponization Communique: Fag Face

00:00:20,324 --> 00:00:24,567
Today, in our world of information capital and global Empire,

00:00:24,567 --> 00:00:29,655
biometric control has emerged as a golden frontier for neoliberal governance.

00:00:29,655 --> 00:00:33,734
A multibillion dollar industry in security and marketing sectors,

00:00:33,734 --> 00:00:39,231
biometric companies produce devices like iris scans and facial recognition machines

00:00:39,231 --> 00:00:43,533
with the hopes of manufacturing the perfect automated identification tools

00:00:43,533 --> 00:00:47,234
that can successfully read a core identity off the body.

00:00:47,234 --> 00:00:49,887
Biometric devices are becoming powerful weapons

00:00:49,887 --> 00:00:53,404
to control and police national borders and citizenship status,

00:00:53,404 --> 00:00:57,420
track and target a nation or companies’ enemies and criminals

00:00:57,420 --> 00:01:01,317
as well as to profile and parse various sectors of the public

00:01:01,317 --> 00:01:05,736
into potential risk categories, like activists.

00:01:05,736 --> 00:01:08,450
Biometrics also determine marketing strategies

00:01:08,450 --> 00:01:11,209
through standardized algorithmic processing

00:01:11,209 --> 00:01:15,605
of identification markers such as gender and race.

00:01:15,605 --> 00:01:20,662
Biometric technologies rely heavily on stable and normative
conceptions of identity,

00:01:20,662 --> 00:01:24,762
and thus, structural failures are encoded in biometrics that discriminate

00:01:24,762 --> 00:01:28,908
against race, class, gender, sex, and disability.

*Zach Blas, Facial Weaponization Communique: Fag Face

ADDITIONAL MATERIALS

Sara Ahmed: Differences that Matter (chapter on The death of the author)

pattern.paternalism project http://snelting.domainepublic.net/affiliation-cat/constant/pattern-en-paternalism

"Biases against women in the workplace have been documented in a variety of studies. This paper presents the largest study to date on gender bias, where we compare acceptance rates of contributions from men versus women in an open source software community. Surprisingly, our results show that women's contributions tend to be accepted more often than men's. However, when a woman's gender is identifiable, they are rejected more often. Our results suggest that although women on GitHub may be more competent overall, bias against them exists nonetheless."

*GitHub research and its trouble: https://peerj.com/preprints/1733/

"Ultimately, a feminist data structure might take cues from what Jo Freeman (aka Joreen) advocates in “The Tyranny of Structurelessness” (1970-1973). Writing about group organization within the feminist movement, Joreen notices that the ideal of “structurelessness” does not work; a few “informal elites” always end up directing what happens unless a group adopts principles of democratic structuring. If we carry this line of thinking into the realm of organizing data, feminist data structure would be one where classification categories are consciously articulated and decided as fairly as possible by those who will access or interact with it, not just by an elite few."

*Women's way of structuring data http://adanewmedia.org/2015/11/issue8-masters/

We got into a conversation on the mailing list. Somebody, a non-native English speaker, was asking about pronouns and gendered pronouns and the proper way of 'pronouning' things. In English we don't have a suitable gender neutral pronoun. So he asked the questions and some guy responded: The proper way to do it, is to use he. It's an invented problem. This whole question is an invented question and there is no such thing as a need for considering any other options besides this. So I wrote back and said: That's not up to you to decide, because if somebody has a problem, then there is a problem. So I kind of naively suggested that we could make a Unicode character, that can stand in, like a typographical element, that does not necessarily have a pronunciation yet. So something that, when you are reading it, you could either say he or she or they and it would be sort of [emergent|dialogic|personalized]. Like delayed political correctness or delayed embraciveness. But, little did I know, that Unicode was not the answer.

Did they tell you that? That Unicode is not the answer?

Well, Arthur actually wrote back, and he knows a lot about Unicode and he said: With Unicode you have to prove that it's in use already. In my sense, Unicode was a playground where I could just map whatever values I wanted to be whatever glyph I wanted. Somewhere, in some corner of unused namespace or something. But that's not the way it works. But TeX works like this. So I could always just define a macro that would do this. Hans actually wrote a macro that would basically flip a coin at the beginning of your paper. So whenever you wanted to use the gender neutral, you would just use the macro and then it wouldn't be up to you. It's another way of obfuscating, or pushing the responsibility away from you as an author. It's like ok, well, on this one it was she, the next it was he, or whatever.
So in a way gender doesn't matter anymore?
Right. And then I was just like, that's something we should talk about at the meeting. I guess I sent out something about my thesis and Hans or Taco, they know me, they said that it would be great for you to do a presentation of this at the meeting. So that's very much how I ended up there.

*First part of an interview with John Haltiwanger on pronouns in ConTeXt http://freeze.sh/_/2015/conversations/catbod

*Phillip R. Polefrone hatespeech semantic analysis https://prpole.github.io/hate-speech-and-online-activism/

*Gender, Race, and Nationality in Black Drama, 1950-2006: Mining Differences in Language Use in Authors and their Characters http://www.digitalhumanities.org/dhq/vol/3/2/000043/000043.html
*Slave Narrative Name Database Project http://digitalinnovation.unc.edu/projects/slave-narrative-project/
*http://lklein.com/2012/01/a-report-has-come-here-social-network-analysis-in-the-papers-of-thomas-jefferson/

10/11/2017

"The Words Men and Women Use When They Write About Love"
https://www.nytimes.com/interactive/2017/11/07/upshot/modern-love-what-we-write-when-we-write-about-love.html

"while they (machine learning algorithms) classify in very different ways, they all assume that the world is made of things or events that fit in stable and distinct categories. Their capacity to classify depends on learning to recognize the differences between categories that themselves remain fixed. (...) This combination of indifference to actual differences and presumption of stable classifications is a distinctively problematic feature of machine learning." (The production of prediction: What does machine learning want?)