Notes workshop Nicolas Maleve: Variations on a glance
http://constantvzw.org/site/Variations-on-a-Glance.html

Notes from yesterday's lectures: http://pad.constantvzw.org/p/algoliterary.lectures
Notes from workshop Algolit: http://pad.constantvzw.org/p/algoliterary.workshop.collective-gentleness

--

cultural assumptions from the algorithm: it is not just a bedroom, it is a hotel room
economy & social world around machine learning
a lot of manual work

LabelMe as a tool to include cultural knowledge into machine vision techniques (http://labelme.csail.mit.edu/Release3.0/)
the work of an annotator
The computer analyses the configurations in the labeled shapes and connects them to the label.
the annotation process needs to happen before the code can be run
rarely discussed as work, while it gives maths a root in a daily world/cheap labour
"sometimes it seems that algorithms exist high above the earth, but no, the actual work is very much on the ground"

Notes on Image Annotation - Adela Barriuso & Antonio Torralba, 2012
https://arxiv.org/abs/1210.3448
Torralba, pioneer of image sets for annotation
Adela Barriuso, shop owner in Mallorca, does annotation work during the low season
- was a champion of the annotation world / nowadays it would be considered normal
Now a lot of ML work is done in a commercial context, but LabelMe was still developed in a context of research

Adela Barriuso: labeling the images during work in a shop
"gives you a different perspective on the act of seeing"
"you are especially bothered by occlusions"
occlusion is a central theme, because in labeling the contour asks for full objects
do you then label a bed as "a part of a bed"?
There are many papers and discussions about this, and no consensus.

SUN database (ancestor of these databases) http://groups.csail.mit.edu/vision/SUN/
SUN = Scene UNderstanding [oh!]
it is happening at huge scale
make annotations & adopt classification or the labels (making sense of the labels)
-> questions notion of image + words + classes...

ImageNet http://image-net.org/explore
current state of the art
linked to WordNet
WordNet as a system to standardize connections between objects, words, etc.
The limit of the dataset is the limit of what can be said in the tools that are built on top of it.
14 million images found on the internet with crawlers
> algorithmic curation relies on what Google thinks, f.ex. that an image is 'a flower'
the algorithm is already inside the selection (through the crawler)

Visual Genome http://visualgenome.org/
on top of ImageNet
33000 unique workers, doing annotations and validating the annotations of the others
4 cents ($) per annotation
(Adela Barriuso added 250,000 labels)
if you want to make money with this, your rhythm will need to be fast

why and how is it possible that computer vision needs to rely so much on this labour system? This was the question for Nicolas to start his PhD research
how to relate the 4 cents to a gold standard, two ways to refer to the same
the Gold standard vs the 4 cents

Li Fei-Fei, person behind ImageNet - superstar of computer vision
you need a common benchmark to judge algorithms
ImageNet = benchmark
but it always stays behind the scenes
"The real star is Google."
Fei-Fei in the beginning: did experiments without knowing what they would do with the practices/information
also the intention was not focused on machine learning, but came more from a psychology context
'seeing' - hot topic in psychology (she was part of the psychology team who worked on this project)
immediate perception: what can you say of a building when you open a door and look
description after half a second
part of the experiment is to explore what half a second (500ms) is

experiment
the score
an experiment dating from 2007
two stages
cfr paper that is on the table, a copy for each one of you
nowadays it is more difficult to get "naive test subjects" because administrative bodies require you to describe in detail what will happen and ask for permission
2007: cameras appeared that had a built-in face-recognition function
PT: presentation time
stage 1: 22 students from California Institute
unpaid job
stage 2: scoring of the descriptions
this is a paid position
executed by 5 volunteer students from schools in the LA area (18 - 35 years old)

this experiment inspired people who wanted to build models for computer vision
it breaks from previous experiments in that:
Micro-timing: perception is not uniform during the first 500ms of vision
Images: they come from Google images
Free recall: the subjects describe the images with their own words vs multiple choice before
Taxonomy: a hierarchical tree of terms is produced and used to evaluate the descriptions

images from "the internet", not especially created for the experiment
researchers: "it means the images are less biased"
there is an assumption that the images from the internet are a product of collective knowledge (do not contain bias?)
'it is all free but in the end it must match'

we will replay the experiment with a difference: small modifications in different steps
we will try to sense where it resists

Questions:
Vision?
The context is cognitive science, computer science, optics, neurology
Within the cognitive science discourse: the vision community, which excludes artists, media theorists, etc.
interesting to note that copyright is waived when it comes to gathering material to train algorithms, but only for academics and commercial researchers (so not artists, activists, ...)
The worlds of computer vision and cognition refer to each other to validate their work.
"cognition works this way, as algorithms work the same way" - and the other way around

===END OF PART ONE===

EXPERIMENT

stimulus = what is happening on the screen
rectangular helps to focus / cross as fixation
orienting the gaze to the task
cancellation image
"an image to cancel the memory" (blue rubber) (sometimes turns green)
duration of the exposure to the image
27ms to 500ms
looking at vision as if looking is a camera operation
this set-up is an 'analog distributed camera'
there is a script to create the random noise images
27 milliseconds: maximum refresh rate of screens in 2007! [so vision is measured against the capacity of the machine]
("don't worry" the images will be different this time)
It's not about being good or bad.
keep in mind your level of comfort in doing this within this setup
(Ready?)
(X_X)
27ms is much too fast!!

first: individual descriptions of 3 images
then: descriptions in a group of 3
Interesting how the descriptions float between three perspectives.
There is also much more cultural interpretation included in the descriptions.
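
A rough sketch of the set-up described under EXPERIMENT (fixation cross, image shown for 27ms to 500ms, then a random noise image to cancel the memory). This is not the script used in the workshop or in the 2007 paper. The function names, the 75 Hz refresh rate, the 250ms fixation and mask times and the mask size are all assumptions made for illustration; only the 27ms and 500ms exposure values come from the notes.

```python
import random


def make_noise_mask(width, height, seed=None):
    """Return a height x width grid of random grey values (0-255)."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def frames_for(duration_ms, refresh_hz):
    """Whole number of screen refreshes closest to the requested duration."""
    return max(1, round(duration_ms * refresh_hz / 1000))


def build_trial(image_name, duration_ms, refresh_hz=75, mask_size=(64, 48), seed=None):
    """Describe one trial as an ordered list of (phase, content, frames).

    refresh_hz, mask_size and the 250 ms fixation/mask times are
    hypothetical values, not taken from the notes.
    """
    width, height = mask_size
    return [
        ("fixation", "+", frames_for(250, refresh_hz)),
        ("stimulus", image_name, frames_for(duration_ms, refresh_hz)),
        ("mask", make_noise_mask(width, height, seed), frames_for(250, refresh_hz)),
    ]


if __name__ == "__main__":
    # Shortest exposure mentioned in the notes: 27 ms.
    for phase, _content, frames in build_trial("image_01.jpg", 27):
        print(phase, frames)
```

The exposure is counted in screen refreshes rather than milliseconds, which makes the remark in the notes concrete: the shortest glance that can be tested is bounded by what the screen can display.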
experiences of the experiment

normally the experiment is executed individually in a dark room, excluded from any other form of input
different reaction when in a group
less authentic in a group
reports of others are built into your memory
group discussions create false memories
interpretations depend on time and on the group
always a mention of the absence of people in a picture
in collective descriptions, the construction of the image became more important and included more details, lighting, quality, position of the photographer
could also relate to time, and getting used to the procedure, could be a combination
also because we are not looking at vision, but representation. We're not testing our eyes.
False memories started forming in a collective setting
a detail was not noticed in an individual setting, but when it was mentioned, it was possible to form an opinion
triggered to think of aspects

individual part
*training part was important, i went first
*afterwards you realise there are no people in the image
*the more time passes, the more expectation there is

collective
*convinced opinions
*descriptions don't come over as correct or not
it defines the memory, but also models the memory
description of the image became better, but does not relate anymore to what you saw
different opinions are all correct
you could recognize each other's interpretations in the image afterwards

individual: focused on specific elements (focal)
collective: focus on peripheral (?)
you think more about the story
concept of vision
concept of imagination (especially when doing collective observations)
role of errors was different
collective: errors as a way to identify the ambiguous elements
errors signify an ambiguous image
more interesting to do it collectively
it was fun to do it alone
it was not fun to do it alone
in general, my visual observation is not so good, lower than most people
My landscapes are very vague. I can describe sounds and conversations.
So i know, i probably have it wrong. I just don't see it, i can't even make anything out of it.
Speed of transcription can influence.
Is this experiment done with sounds?
When i would type myself, i would write lists of keywords.
The role of the transcriber is very active.

Relation to training a machine
The next step is to reduce the descriptions to a list of words to feed to the computer.
In the original example, nearly 2000 descriptions were made, and then reduced to 60 words.
For ImageNet, they applied this experiment to all the images at least 3 times.
They didn't use the different intervals.
They use Mechanical Turk workers without any time constraints, but the time constraint is implied by the work conditions. Fast work is the only way to make money.
Thesis: there is a series of relations that you establish, and you need to deal with its consequences.
The same kind of relations are at work for 500ms to interpret the images.
But the experiment shows that time constraint is actually not a problem.
When there is no agreement, the image falls out of the process.
Reduction of language, reduction of operable images.
Models are learning on cliches/stereotypes/most common images (and words)
We created a lot of bias during the experiment

Discussion of bias -- producing biased descriptions and unbiasing it after
Root of the word bias: related to 'grain' used in textile
when you cut on the bias, you cut on ???
bias = cut diagonally on the grain of the thread (so it makes the textile more bendable, used to finish round edges for example)
http://www.fabrics-store.com/blog/wp-content/uploads/2016/05/bandbinding_body10.jpg
bias as (not) an on/off or good/bad thing
Threadsmagazine.com "To become a grain rebel, you first need to identify and understand fabric grain (...) that is truly unique"
http://www.threadsmagazine.com/2008/11/23/go-against-the-grain
recognizing bias as part and parcel of the plasticity of the description, so how to work WITH bias.
"To become a grain rebel, you first need to identify and understand fabric grain (see the box below). Then take a fabric and tug it in all directions to test its stability and stretch. Once you know how different fabrics stretch and drape, you can start playing with grain to make a garment that is truly unique."
(sidenote: textile techniques are generous sources of metaphors. it seems to me that the two main metaphors in technology are to war machines and weaving machines. and the brain)
"if we want the description to be plastic, elastic, we need to work with a bias"

3 ways to approach the experiment
1. Follow the grain direction: "it is science. It is tech"
2. crosswise, against the grain: it is bullshit, old manipulation, not engage with it (AI will never be as good as actual humans)
3. on the bias, "not technically a grain, refers to any line diagonal to the lengthwise and crosswise grains"
going 45 degrees with the problem (?)
So to take the bias into account, but to go with it, use its plasticity without erasing it -- using its strength.
'learn to ride/write on the bias, make it productive, develop oblique relations, not just frontal ones.'
find a way to not follow it as it is
- ... objective ....
- stand outside and looking at it, and say that it is biased and therefore bad
- complex way: accept bias
Take bias into account as an interesting dimension.
- follow the grain
- 45 degrees
- "collapse of air pockets"
Allows a square piece of fabric to morph into a diamond shape

ref to conversation with Mike Kestemont before this event: machine learning is bias, the bias is doing the work
[but how to process multiple biases into a system that often outputs one result? And also, how racism would be a strength as a bias ... so very touched by this 'third way' but not sure how/where to begin -- maybe it is about opening up spaces for conversation (NM: "making a variety of responses possible"), rather than to 'make efficient', ref. Zach Blas.
But how to be fed from this conversation if you are only confronted with a machine learning output that is prepared for your 'profile', a form of isolation/segregation that is immediately operative as a consequence]
MK: "precise bias is precise tasks" (?)

Nicolas is trying to formulate a critique by engaging in a process. Not a truth from a distance. That is why we need to experience the process, like today.
To find what is possible to do with it, you need to do a session like this; only reading the paper is not enough. You wouldn't experience the richness of the descriptions.

http://www.zachblas.info/works/facial-weaponization-suite/
http://www.zachblas.info/writings/facial-weaponization-suite/
http://median.newmediacaucus.org/caa-conference-edition-2013/escaping-the-face-biometric-facial-recognition-and-the-facial-weaponization-suite/
Subtitles: http://possiblebodies.constantvzw.org/inventory/?023

paid students have to say for each description:
1. whether it is correct
2. what class the image is part of
3. only the keywords stay
[hands out a description of the process of further processing an image after labeling]
replacement exercise: only words in the pre-defined vocabulary are kept.
Abstraction process = reduction process. Needed to train the machine.
This is less precise than ImageNet; this experiment was more crude.
This is not WordNet but there are similarities https://en.wikipedia.org/wiki/WordNet

Eleanor Rosch, work on basic categories https://en.wikipedia.org/wiki/Eleanor_Rosch !!! http://psychology.berkeley.edu/people/eleanor-h-rosch
example: parents, teenager, teenager comes back after going out, parents: what did you do?, teenager: 'something'. The parents do not expect to hear about many details.
There is a level of expectations.
Basic categories of the economy of language. You would describe something that takes less effort to describe.
expecting certain levels of description, economy of language -- less effort to describe.
First: chair, then: it is made of plastic.
Ref to information theory, where the least amount of coding effort is connected to the letter 'e' (most common letter).
Making difference between harp, piano, saxophone ... they are all instruments.
great difference between what goes through and what is left
all sensuous experience is reduced to 5 elements, while furniture is very extended (sink/toilet)
Because the distribution of specificity is out of balance in the graph of categories, there is already a bias in the categorization.
Specifically, everything that is related to the senses/body is already reduced.
Reasons for this? Cartesian split.
coco database (?)
It's not an exception that we have a small vocabulary for body related experiences / abstractions(...?)
WordNet: Transsexual under anomaly -- there are specific visions, obviously.

Q: isn't there any org that surveys this?
A: This is the widespread database used in MANY applications. There are not a lot of people addressing the problems with vocabularies. It is also often related to what is available, what starts to circulate. Different contributions from different universities. Long history, slowly growing very big.
WordNet is the standard classification for computing
its thickness becomes a reason for many people to see it as something neutral
How you describe something ..., is complex. The filters are very crude.
WordNet becomes the filter of exclusion of physical impressions and ambiguities.
An: some things are not updated since 1985
Q: what about law, should this be surveyed? First it should be in line with human rights to include it in the system?
This example is terrible, but it is a sign of even greater horror of flattening. Showing and hiding.
Bias in itself is not the problem, but how can we engage with it!
The process is about flattening. And highlighting certain things. The process is about biasing.
The question is what is the nature of a bias.
Perceptual performance.
Detaching perception from the subject, and attaching it to the taxonomy.
The 'perceptual performance of a term', but never of the people.
perceptual performance of a term. not the perception of people, but in the paper they refer to perception as a response.
response - stimulus (so it is a mechanical view on vision, again).
It means 'rocks' have more perceptual performance than 'gay'.
only when the response is translated into a list of terms, perception comes in.
chair, rock have a different grade of performance .... emphasis towards unambiguously described objects
so when recognizing something ambiguous, unambiguous objects are easier to describe
think of perceptual performance of the word 'arab', 'gay' versus 'chair', 'table', 'AirBnB room' ;)

Q: what are the applications of this biasing system?
A: security, attention economy, advertising (google ads)
HansL: economical use of language. but the difficulties become obvious in security related usage
Facebook advertisements are an interesting case study to relate to.
NM: "ontological gymnastics"
taking perception out of the human, and attaching it to classification?
HansL: it is correct from an information theory point of view: most simple code to get the most efficiency out, these simplified categories make sense.
Everything goes towards/matches an economy of language, and also the economy of science (note: earlier AM mentioned that only academics and commercial researchers have access to this amount of info).

Where is the knowledge?
knowledge in the room [indoor household]
knowledge in the hands of the academic researcher
the annotators are anonymized so they cannot claim the knowledge produced.
NM: "the knowledge is not an object, it's a process"
if you want to perform the knowledge, you need to recreate the situation.
How can the room migrate in the different settings where you want to use this knowledge?
We just don't care about the room.
Or, you need to migrate to (?) the room once you want to access and use the knowledge again. [situatedness]
reproducing conditions of work, singular subjects, ...
it is how algorithms become concrete / matter.
And so you can think of algorithms in terms of
*setups
*practical conditions
*labour
if we target the researcher, we miss what is necessary to create/make these algorithms?

Femke: In the narration of these technologies, it is often said that with more data, computing power and so on, we will overcome these threats. Can we follow the dataset mantra?
NM: i don't think the quantity of data can overcome this.
economy of language is not going away
all the circumstances in which relationships existing here are reproduced, are proliferating
we could address the different settings in which the knowledge is produced
For example: What was different between doing the experiment of this afternoon alone or together?
find different ways to find convergences/keep differences/diversity (?) in the descriptions
Hans: partially disagrees. Greater computing power means other levels of language. More complicated models of language.
To work with letters instead of words is in a way similar to working with simplified categories instead of having many categories to work with.
We have to find ways to assume political agency.
Always needed: find ways to take responsibility to make these decisions. How do you take your responsibility? And not delegate it to the system
filter & bias are necessary to be able to express something.
Hans: reference to legal language, where very abstract language is also used.
Q: something concrete in front of you, but also there is a situation in which this concrete thing appears, what is more important? the concrete thing or the situation which permits it to exist?
... what does allow the thing to exist.
Compare industrial food. "It smells good"
The critique is not enough. They of course need to be addressed. You're only able to catch the most obvious problems.
It is not just in 'their' hands, but how can it be in ours.
[how can we connect to 'their' hands ... that is what we are trying these days/algolit is trying? How to actually change these extractive relationships?]
NM: bias is a way of expression

What about nouns that do not have clear visual representations? "Like 'patriot'"
Things that cannot be imaged, so they are discarded from the taxonomy. A double reduction.
"fluffy nouns" are not "physical entities".
taking out the disease subtree.
Expectation of deep serious concerns in these tech, but is it really serious to take out the disease subtree?
3000 people that work on WordNet, and they decide that patriot is not imaginable.

FS: very often the response to concerns is "we are still at beta level" "and some day we will grow up"
now we can recognize a face in the crowd. Have we grown up? Or are we still at baby level and will it become better?
NM: It always comes back to the question: What counts as knowledge?
If it (the applied algorithm?) is the product, there is no reason it will improve. Because you need to hide too much of the cultural decisions that were taken where (and how) the knowledge is produced. Only when you take them seriously and start from there, a change is possible.

verbs are thrown out, unless they indicate action
the problem is often delegated to the annotator
and if they say it is not/if there is disagreement then the object will be discarded
Is this fascist -- economic language without ambiguity. It takes the conflict out, but opens space for resistance?
NM: as an annotator, if you are too much in disagreement, you will be out of a job.
So the question remains: how to make processes/tech that value disagreement, and how it can enrich performance. (annotator note: Maybe emphasise the bias?)
it is not all bad if we place it in a different context where we value disagreement
We need to be on the bias.
NM: 19th century -- no reservation to make claims about the world.
Hans: Are we not reading too much into it?
NM: the classification systems are universalist ... they are plugged into these systems.
https://www.researchgate.net/publication/283356710_An_Analysis_of_WordNet%27s_Coverage_of_Gender_Identity_Using_Twitter_and_The_National_Transgender_Discrimination_Survey
Hans: cybernetic-like systems that look at behavior of users and apply their results to for example search results, are less crude? ref to Google Analytics
"it is so available"
the pervasiveness is dangerous
Hans: are we too dark? It does not have that many applications? It will take a while, but more data WILL help in the end.
Making bias political. We are just at the start.
constructing an interesting political relation with the positioning of bias.

Femke: paying attention to the creation of knowledge, bias is interesting, and inherent to language and communication
but to see the machineries at work that create biases and clichés
how to work with biases, but also with racism/sexism
concern about reiteration of convention, and the inability to deal with differences, but at the same time being super excited to work with the diagonal and the grey

NM: Where do you put the emphasis in the process:
- separation between data and algorithm is a problem
So: think of practices where the intimate relation between the two is positive.
Resist the separation, the pressure that they can operate independently. they are symbiotic.
- insisting on the embodiment
computer vision exists because there are bodies that do this work of vision. "It's not a piece of software that does it all".
> the eyes of the human subjects that make up computer vision
there are many eyes, of people that have been trained to see in certain ways.

"Adobe guy": a 'normal' picture (not much happens in a 'normal picture')
"in a normal picture, the most important thing that happens stands in the middle"
Pierre: political project, as it puts the problem of class at the center
artist 'elites' can create not-normal images, with complexity. because of having knowledge on image making histories
FS: normal vision and class vision are connected, but how to go from there?
NM: stretch the cloth
Pierre's grandma: "oh, the bias, so difficult but it's so interesting"
HL: do you have an idea on current annotation practices? eg: CAPTCHAs
NM: synthetic annotation: from little data, the algorithm expands the annotation
eg: Mike's example of throwing away the decoding part of the decoding/encoding process