# 2048 (important)
Kurenniemi didn't leave algorithms, he left a dataset.
>As work began on the project, it became immediately evident that Kurenniemi had documented his life, but not archived it in a traditional sense. Despite the lack of a consequent or single ordering however, multiple orderings of the material still existed. As work continued a number of core problematics became evident related to giving "direct access" to the digital materials of the archive: (1) Legality: how to address potential restrictions on its visibility due to questions of privacy, copyright, and a frequently intimate sexual nature; (2) Quantity: how to take on the vast amount and diversity of the material; (3) Fidelity; how whatever form of access given would then relate to Kurenniemi's ideal of an "artificial consciousness". In each case, the problems were embraced as central themes, and part of a unique opportunity to situate the work before the archive had fully formed, somewhat in the spirit of Kurenniemi's intention to archive for future use.
> ... it’s possible that those living a couple of hundred years into the future will regret not having been able to live now, when it was all taking off: building the first ENIAC computer or mapping the DNA. And they may have a genuine interest to reconstruct the twentieth or twenty-first century based on their archived material. What else will they have to do, sitting in their golf balls in space and looking forward to a hundred thousand years of humdrum life? They'll have to just go through old archives and maybe make new interactive video art or compose music. I don’t know what they will fill their time with. I think they’ll just watch porn. (Writing and Unwriting, pp.302-303)
http://kurenniemi.activearchives.org/logbook/wp-content/uploads/2012/05/overview_faces_800.jpg
http://sicv.activearchives.org/video/2faces.720p.webm
# 54–68 AD (important)
Recarving Nero, the third face
http://sicv.activearchives.org/logbook/recarving-nero/
http://sicv.activearchives.org/logbook/damnatio-memoriae/
Since the advent of face detection algorithms, a face is always ready to morph into another. The difference between two faces is one of distance; it is never absolute.
But long before face detection was implemented in an algorithm, ever since rules of proportion have existed, faces have lived in a mathematical space where their differences could be computed. For a Greek sculptor, every face is derived from the canon. In the Roman Empire, when an emperor was overthrown, his sculpted portraits were physically altered into the likeness of his successor. With a change of political regime, a vast process of face morphing took place. Far from being a trivial exercise, this process, called damnatio memoriae, involved the recarving of all the emperor's portraits in public and sacred spaces. After Nero's death, his face morphed into Augustus', bearing traces of removed facial features and incoherent planar elements. The transformation of one face into another was only possible because there was sufficient continuity in the regimes of representation. Even if the styles were changing, the rules of computation of the canon remained relatively compatible.
Any morphing operation implies not two but three elements: the two elements to be compared and the third, which makes the comparison possible. In the case of the Roman faces, the canon is this third element. The role of the canon in digital techniques of face morphing is played by the model used by the algorithm. An algorithm like the venerable Viola-Jones detects the presence of a face by “learning” from a series of images the regularities that make up a face. The same algorithm can be trained to recognize any other object given the proper training set. The algorithm can recognize bananas if trained with enough examples. The algorithm is a recognizer. It knows how to recognize a face but doesn't “know” anything about the face. The knowledge lies in the model it is using. And, in turn, the production of this model necessitates images of faces. The model is produced by extracting the common features of a large number of images of faces. Its production requires a considerable amount of manual labour: someone has to draw a box around each face, manually highlight the facial features and check for the mistakes made along the way. When two faces are morphed into one another, a whole collection of human heads is making the transition possible. It takes thousands of eyes to morph one eye into another.
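The "third element" can be made concrete with a small sketch. It assumes both faces have already been annotated with the same landmark scheme; the point values and the names `FACE_A`, `FACE_B`, `distance` and `morph` are made up for illustration and belong to no particular library.

```python
import math

# Hypothetical landmark sets. Both faces are described with the SAME scheme
# (left eye, right eye, mouth); that shared scheme is the "third element".
FACE_A = [(30.0, 40.0), (70.0, 40.0), (50.0, 80.0)]
FACE_B = [(34.0, 44.0), (66.0, 44.0), (50.0, 74.0)]


def distance(a, b):
    """Mean Euclidean distance between corresponding landmarks."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def morph(a, b, t):
    """Linear interpolation between two faces: t=0 gives a, t=1 gives b."""
    return [((1 - t) * px + t * qx, (1 - t) * py + t * qy)
            for (px, py), (qx, qy) in zip(a, b)]


halfway = morph(FACE_A, FACE_B, 0.5)
print(halfway[0])                 # (32.0, 42.0)
print(distance(FACE_A, FACE_A))   # 0.0
```

Without the shared scheme there is nothing to zip together: the two lists only become comparable because a model decided beforehand which point counts as an eye.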
# 1999 (important)
Faces 1999
http://www.vision.caltech.edu/archive.html
A dataset in Computer Vision is a collection of digital photographs that developers use as material to train and test their algorithms. Using the same dataset makes it possible for different developers to compare their work.
1. The readme
001-readme
A text file, the readme, describes the content of the dataset. As the first line announces, Faces1999 contains pictures of people photographed frontally. This file is written with a mix of precision (it states the institutional affiliation, credits, dates) and approximation (“27 or so unique people”, “people under with different lighting/expressions/backgrounds”). The word “bike” instead of “face” in the sentence “Each column of this matrix hold the coordinates of the bike within the image” comes from a copy-paste from another readme file.
Datasets are key to computer vision, yet making them is treated as a marginal practice. The mix of precision and approximation in the readme tells us: we are in the kitchen, not in the dining room, of Computer Vision.
2. The lab and the house
002-004 intro
The people photographed in the dataset are computer scientists working in the Caltech Vision lab and their colleagues. This is the lab taking a selfie. It photographs itself at work.
The photographer is a member of the community. He is the measure of the dataset. It is a dataset at his scale. At his spatial scale, his surroundings. Where he can easily move and recruit people, he has bonds with the “subjects”. He can ask them “come with me”, “please smile”, as expressions are part of the variations of the faces. In this sense, it is specific. With the dataset, we also enter the family circle.
005-006 family
It is difficult to ascertain the details of the family relations, but it is a house interior and it is a family with a woman and children. In the house there are no male subjects anymore. It is the world the photographer has access to, his universe.
3. The background
007-009 inscriptions
There is a sense of conversation happening in the backgrounds. They are densely covered with texts of different sorts, and overlaid with commentaries that give a clue to the mixed nature of research activity. Work regulation documents (a summary of the Employee Polygraph Protection Act), staff emails, address directories, a map of the building, invitations to conferences and parties, job ads, administrative announcements, a calendar page for October 1999: all suggest that more than code and mathematics is happening in this environment. A mix of bureaucratic injunctions: do this, not that, forbidden, required, etc. Interpellation, the language of advertising, invitation, suggestion. On a door, a sign: “Please do not disturb”. A note signed Jean-Yves: “Please do NOT put your fingers on the screen. Thanks.” There are networks of colleagues in the lab and outside. But the intertwining of the social dimension with the maths and the code is nowhere more palpable than in the picture of a whiteboard where complex mathematical equations cohabit with a note partially masked by a head: Sony Call Ma... your car is … 553-1. The same surface of inscription is used both for sketching the outline of an idea and for internal communication.
4. Stitching the space
*stitch
The spatial continuity between the photographs makes me briefly consider reconstructing a panorama from the images. I watch the software, Hugin, struggle to stitch these images together. Hugin looks for identical features in different images in order to assemble them smoothly. Hugin doesn't know how to distinguish a face from a background, and rewrites their relations.
5. Closed loop
zzz-photographer
For the photographer and the person who annotated the dataset and traced the bounding boxes around the faces, the sense of familiarity with those depicted and with the environment was strong. They were colleagues or even family. The dataset maker could be present at all stages in the creation of the dataset: he would select the people and the backgrounds, press the shutter, assemble and rename the pictures, trace the bounding boxes, write the readme, compress the files and upload them to the website. Ten years later, when Fei-Fei Li initiates a project with 14 million images, it is simply logistically impossible to have the same person doing this work. The closed loop between lab, engineers, photographs and dataset is broken.
# 2007 (important)
First commercial cameras with face detection? A contract with a rectangle
Before I press the button, it is already there on my camera screen. The rectangle. I remember the first time I saw it, I wondered why it was there. Why would you highlight a face on the camera screen? Does the camera manufacturer think its customers have lost the ability to recognize a face? Or is it a warning? A way for the algorithm to show its presence? I am inside the camera, look what I am capable of. That day, the camera screen became a shared surface where the face detection program and I entered into dialogue. I would have to accept the company of something that has the cold eye of a scientist and a maniacal obsession with overpainting. And it would do its best to adjust the focus on the faces in the picture.
http://sicv.activearchives.org/share/nkwafia/face-detect-camera.png
# 2013
The Networked image ... new scales of datasets
ImageNet and Scale, 14 million images (important)
http://image-net.org/synset?wnid=n09618957
Based on the heritage of WordNet, a linguistic dataset with deep roots in AI research
Generic dataset, totalizing view
FER2013 (important)
Diversification of datasets ...
Phenomenon of the open call competitions, such as FER2013, a competition to train the best model to detect emotions.
> You may use additional training data, but please restrict yourself to publicly available datasets. Do not manually label the test data and train your classifier on it.
>
> You do not need to use a representation learning algorithm. Algorithms that use hand-designed features such as HOG, SIFT, etc. are perfectly acceptable, as are algorithms that augment the dataset with synthetic transformations. However, your final system must be able to autonomously classify the test data without a human in the loop.
https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/rules
http://sicv.activearchives.org/video/fer2013.html
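The competition rules accept "algorithms that augment the dataset with synthetic transformations". A minimal sketch of what that can mean, using a toy 2x3 grid in place of a real 48x48 grayscale FER2013 image; the function names and the label "happy" are illustrative assumptions, not part of the competition material.

```python
def flip_horizontal(image):
    """Mirror a grayscale image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row to the right, padding the left edge with `fill`."""
    shifted = []
    for row in image:
        n = max(0, min(pixels, len(row)))
        shifted.append([fill] * n + row[:len(row) - n])
    return shifted


def augment(samples):
    """Return each labelled sample plus two synthetic variants.

    The variants inherit the label of the original: the transformation is
    automatic, but the label is still the one a person assigned.
    """
    out = []
    for image, label in samples:
        out.append((image, label))
        out.append((flip_horizontal(image), label))
        out.append((shift_right(image, 1), label))
    return out


# Toy 2x3 "image" with a hypothetical label.
SAMPLES = [([[1, 2, 3], [4, 5, 6]], "happy")]
AUGMENTED = augment(SAMPLES)
print(len(AUGMENTED))  # 3
```

One labelled face becomes three training examples, yet no new labelling has taken place. The "human in the loop" is not removed, only multiplied.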
# 2015
IMDB-WIKI
https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
http://chalearnlap.cvc.uab.es/
# 2015
WIDER FACE (important)
Multimedia Laboratory, Department of Information Engineering, The Chinese University of Hong Kong
# 2017
Categories
https://twitter.com/GambleLee/status/862307447276544000/photo/1
Francois Chollet on the Future of Deep Learning (important)
https://blog.keras.io/the-future-of-deep-learning.html
The new stack...
Inversion of (Data is passive, program as active ...) ...
Training a convnet from scratch on a small dataset
Having to train an image-classification model using very little data is a common situation, which you’ll likely encounter in practice if you ever do computer vision in a professional context. A “few” samples can mean anywhere from a few hundred to a few tens of thousands of images. As a practical example, we’ll focus on classifying images as dogs or cats, in a dataset containing 4,000 pictures of cats and dogs (2,000 cats, 2,000 dogs). We’ll use 2,000 pictures for training, 1,000 for validation, and 1,000 for testing.
In this section, we’ll review one basic strategy to tackle this problem: training a new model from scratch using what little data you have. You’ll start by naively training a small convnet on the 2,000 training samples, without any regularization, to set a baseline for what can be achieved. This will get you to a classification accuracy of 71%. At that point, the main issue will be overfitting. Then we’ll introduce data augmentation, a powerful technique for mitigating overfitting in computer vision. By using data augmentation, you’ll improve the network to reach an accuracy of 82%.
In the next section, we’ll review two more essential techniques for applying deep learning to small datasets: feature extraction with a pretrained network (which will get you to an accuracy of 90% to 96%) and fine-tuning a pretrained network (this will get you to a final accuracy of 97%). Together, these three strategies (training a small model from scratch, doing feature extraction using a pretrained model, and fine-tuning a pretrained model) will constitute your future toolbox for tackling the problem of performing image classification with small datasets.
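The 2,000 / 1,000 / 1,000 split described above can be sketched in a few lines. This is only an illustration: the filenames are placeholders, and splitting each class separately (so that cats and dogs stay balanced) is an assumption the passage does not spell out.

```python
import random


def split_class(items, n_train, n_val, n_test, seed=0):
    """Shuffle one class and cut it into train / validation / test lists."""
    if n_train + n_val + n_test > len(items):
        raise ValueError("not enough items for the requested split")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    a, b = n_train, n_train + n_val
    return shuffled[:a], shuffled[a:b], shuffled[b:b + n_test]


# Placeholder filenames standing in for the 2,000 cats and 2,000 dogs.
cats = [f"cat.{i}.jpg" for i in range(2000)]
dogs = [f"dog.{i}.jpg" for i in range(2000)]

train, val, test = [], [], []
for pictures in (cats, dogs):
    tr, va, te = split_class(pictures, 1000, 500, 500)
    train += tr
    val += va
    test += te

print(len(train), len(val), len(test))  # 2000 1000 1000
```

Even in this "small" case the dataset is already an organised object: someone named the files, sorted them by class and decided which pictures the model is never allowed to see during training.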
Face as code
# SWIFT Faces
> The periphery of a Taylor Swift concert is as thought out as the show she presents on stage. Beyond the traditional merchandise stands, there are often dedicated selfie-staging points and staff distributing light-up bracelets. When Swift performed at the Los Angeles Rose Bowl venue on 18 May, fans could watch rehearsal clips at a special kiosk.
>
> What they didn’t know was that a facial recognition camera inside the structure was taking their photographs and cross-referencing the images with a database held in Nashville of hundreds of Swift’s known stalkers, according to a Rolling Stone report.
https://www.theguardian.com/music/2018/dec/13/taylor-swift-facial-recognition-stalkers-rose-bowl-concert
OTHER MATERIAL TO USE? ??????
Wendy Chun
> This intersection of data and methods designed to identify individuals and those to identify larger trends suspends the traditional separation between the two archival logics to incorporate the body that Allan Sekula influentially theorized in relation to the production of photographic evidence. The first, derived from the work of criminologist Alphonse Bertillon, focused on identifying the individual, on inscribing the body in the archive (figure 3.5). The other archival logic, derived from the work of the eugenicist Sir Francis Galton, sought to identify the hidden type driving the body and thus to embed the archive in the photograph (figure 3.6). Currently, these processes have become inseparable at the level of data capture and storage. The same process captures the data necessary to identify individuals as singular and to identify their relation to certain groups. Amazon.com, for instance, tracks individual purchases not only to create a record of a user (a digital fingerprint), but also so that it can connect that user's actions with those of others in order to make suggestions for further purchases, that is, so it can predict and encourage future behavior that conforms to, and confirms and optimizes, statistical network analyses. (Updating to Remain the Same, p. 120)
Aim: Face masks of speakers
Take picture of audience + run tiny face ?!
Joy Buolamwini
https://www.ajlunited.org/the-coded-gaze
"off the shelf parts"
The process of augmenting existing models (Inception v3) ... quote from Francois Chollet ...
<!> Complexifying the concept of removing the "human in the loop" ...
The image of kicking away the scaffolding and watching the model "just work" ..
Loops of Feeding the data/work back
In the end the process results in an "algorithm" that seems to stand and function alone, but which is in fact the result of endless loops of cleaning, aligning, cropping, retraining, tweaking, augmenting, mixing, cascading, curating, acquiring, ...
Management of the workforces involved
There are 172 corners present. This level indicates that the picture may have been taken in a built environment. There are 3 faces positioned at (xx,xx), (xx,xx) and (xx,xx). They occupy xx% of the image: one in the lower left corner and two in the upper half of the image. The distance between the three rectangles is on average xxx. The people are close without touching each other. The proximity of the photographer and the position of the faces in other pictures prove that the photographer moves easily among the circle of people photographed.
It is dark now. It is the end of an evening in November 2004. The face in the center of the image receives more light than the one next to it on the right. Two hands are close to the second, poorly lit face. An image begins to form in the back of your mind, and we have only started counting.
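The counting in the description above (the share of the image occupied by faces, the average distance between rectangles) can be sketched as follows. The image size and the three rectangles are hypothetical stand-ins, not values from the archive; rectangles are written as `(x, y, width, height)` with the origin at the top left.

```python
import math
from itertools import combinations


def coverage_percent(rects, image_w, image_h):
    """Share of the image area covered by the rectangles, assuming no overlap."""
    area = sum(w * h for (_x, _y, w, h) in rects)
    return 100.0 * area / (image_w * image_h)


def center(rect):
    """Center point of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def mean_center_distance(rects):
    """Average distance between the centers of every pair of rectangles."""
    pairs = list(combinations(rects, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(center(a), center(b)) for a, b in pairs) / len(pairs)


# Hypothetical 640x480 image: one face lower left, two in the upper half.
IMAGE_W, IMAGE_H = 640, 480
FACES = [(40, 340, 100, 100), (200, 60, 100, 100), (400, 60, 100, 100)]

print(coverage_percent(FACES, IMAGE_W, IMAGE_H))  # 9.765625
print(mean_center_distance(FACES[1:]))            # 200.0
```

The numbers say nothing about who is close to whom; that reading only appears once the figures are put back into a sentence.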
http://kurenniemi.activearchives.org/logbook/?page_id=521
## erkki
http://sicv.activearchives.org/video/2faces.720p.webm
## doppelgänger
https://twitter.com/emilycarey_/status/952313214729060352
https://twitter.com/CarolineWazer/status/952025975692447744
## Important point to make
While deep learning emphasizes a certain removal of the "human in the loop" in the form of "manually created features" ... the process is still entirely dependent on a set of training data with (manually) applied labels. Male/Female. Angry/Sad. Black/White/Other
Datasets come from somewhere and fit into particular economies / histories.
All form is a face looking at us
http://sicv.activearchives.org/logbook/all-form-is-a-face-looking-at-us/
Sliding into a face
http://sicv.activearchives.org/logbook/like-four-eye-machines-made-of-elementary-faces-linked-together-two-by-two/
The dawning of an aspect
http://sicv.activearchives.org/logbook/the-dawning-of-an-aspect/
http://sicv.activearchives.org/logbook/francis-galtons-composite-portraiture-meets-wittgensteins-camera/
# 1879 (optional)
Galton
English gentleman who invented the composite (average) portrait / also a pioneer of fingerprint identification
needles and alignment
http://galton.org/composite.htm
http://galton.org/essays/1870-1879/galton-1879-jaigi-composite-portraits.pdf
# 1961 (optional)
Bochmann, the segmentation of the face, face as passport
The Major Bochmann was head of the passport division on the Eastern side of Checkpoint Charlie […]. He developed a facial recognition system, designed to teach the border guards to scrutinize faces and look for features that cannot be altered. The aim was to assess the authenticity of passport photos for those who were trying to leave East Berlin.
A flashcard used for training at the Friedrichstrasse-Zimmerstrasse border crossing (the east side of Checkpoint Charlie). The guards had to answer the question: Does the photo on the right represent the same person as the photo on the left, and why?
http://sicv.activearchives.org/logbook/bochmans-face-recognition-system/
https://www.atlasobscura.com/articles/see-the-flashcards-the-stasi-used-for-facial-recognition
# 2001 (optional)
The Viola–Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection.
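Part of what makes the framework fast is that it evaluates simple rectangular (Haar-like) features on an integral image, so that the sum of any rectangle costs four lookups. A minimal sketch of that idea on a toy 4x4 patch; the function names and the patch are illustrative, and the boosting and cascade stages of the framework are left out.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in a rectangle, using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Toy 4x4 patch: a bright band above a dark band.
PATCH = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
II = integral_image(PATCH)
print(rect_sum(II, 0, 0, 4, 4))          # 80
print(two_rect_feature(II, 0, 0, 4, 4))  # 64
```

A single feature like this knows nothing about faces. Which features matter, and with what thresholds, is decided during training on labelled examples.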
[video](http://sicv.activearchives.org/share/vec/vec-faces.ogv)
Where it all begins: http://wearables.cc.gatech.edu/paper_of_week/viola01rapid.pdf
Images: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
# 2011 (optional)
http://activearchives.org/mw/images/b/bb/Dataset-02.jpg
What is a dataset, training, the process
http://activearchives.org/wiki/Cloning_into_orderings
# 2014
This paper addresses the problem of Face Alignment for a single image. We show how an ensemble of regression trees can be used to estimate the face’s landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high quality predictions. We present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of square error loss and naturally handles missing or partially labelled data. We show how using appropriate priors exploiting the structure of image data helps with efficient feature selection. Different regularization strategies and its importance to combat overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on the accuracy of the predictions and explore the effect of data augmentation using synthesized data.
One Millisecond Face Alignment with an Ensemble of Regression Trees
Regression ... Face landmarks (quote)>..
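The ensemble of regression trees in the abstract sits inside a cascade. As a hedged summary of how I read the paper, not a transcription of it: each stage $t$ adds a correction $r_t$ to the current shape estimate $\hat{S}^{(t)}$ (the vector of landmark coordinates) for image $I$, and each $r_t$ is itself built up by gradient boosting from weak regression trees $g_k$ with a shrinkage factor $\nu$. The notation should be checked against the paper before quoting.

```latex
\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t\bigl(I, \hat{S}^{(t)}\bigr)
\qquad
f_k = f_{k-1} + \nu\, g_k, \quad 0 < \nu \le 1
```

The initial estimate $\hat{S}^{(0)}$ is typically a mean shape computed from the annotated training faces: once again the average of many hand-labelled faces is what lets a single new face be measured.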
# IMDB-Wiki