Notes from the Workshop Anyware - Location Privacy at VUB:
13 October 2016
Organized by Mireille Hildebrandt and Irina Baraliuc
Also in celebration of Katja De Vries' defense
https://vublsts.wordpress.com/2016/09/28/anyware-privacy-and-location-data-in-the-era-of-machine-learning/
Solon: ask not what your algorithm can do for you: the ethics of autonomous experimentation
waze:
outsmarting traffic together
was an app on smartphones
routing instructions for driving
have the people using the app report back what the driving conditions are
police officer here, certain things happening on this road
learning about road conditions: from the people using the app
by using the app you were reporting
purchased by google
now under alphabet
what does this have to do with autonomous experimentation?
this seems to be the optimal path
based on what we know about traffic conditions
learning from other experiences
if the service begins to direct everyone to location a, they may no longer know what happens in location b, where they diverted people from
to deal with this problem
there is a known technique: you send people to some other paths, uncertainty about the driving conditions, to see how the conditions are there
users are being used to collect information that you have told some other users to avoid
there are different terms for this:
explore/exploit algorithms
for any given instance: routing information, you can exploit what you know about driving conditions
or you can use drivers to explore and see what they discover
it optimizes for the system overall so that in general you have an optimal solution for most users
at the expense of individual users
one user bears a risk, the others share the benefit
machine learning: observational data, historical information that you have
explore exploit: it is experimental, you are proposing alternatives and looking at the consequences of the alternatives
varying treatments and comparing effects
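The explore/exploit trade-off described above can be sketched as an epsilon-greedy bandit over candidate routes; the function names, the reward convention, and the value of epsilon are illustrative assumptions, not from the talk:

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """Choose a route index: with probability epsilon explore a random
    route, otherwise exploit the one with the best current estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])      # exploit

def update(estimates, counts, route, reward):
    """Incremental-mean update after observing the outcome for a route."""
    counts[route] += 1
    estimates[route] += (reward - estimates[route]) / counts[route]
```

With epsilon = 0 the system only exploits and, as the talk notes, may never learn what happens on the roads it diverted everyone away from.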
these things are now being merged
online learning:
ethical implications at the intersection of the two procedures
credit: you make a credit decision, it affects the world and who is getting credit
and you want to know what happened to your assessment of people
we deploy the model, and see the effect, and retrain the model
online makes that continuous
A/B testing:
Multi-Arm Bandits
in optimization problems, you think you are getting to an optimal solution
you have confirming evidence that this is the best solution
but you may notice that there is another local maximum that is better
using randomness
reinforcement learning:
machine learning and experimentation mixed
alpha go:
this is the success of AI
alpha go had two primary steps:
look at previous go games
then have the computer play itself, using reinforcement techniques, which includes exploring other areas not explored by humans
same methods to deal with things that policy people are concerned with:
you deploy more police in one part of the city
as a consequence you are deploying less police to other parts of the city
so you cannot see what is going on elsewhere
bandits, it is proposed, could solve this problem
help you avoid being confronted only with predictions
driven by uncertainty:
if there is something you don't know much about, that is where exploration is directed
uncertain effect is not equal to worse treatment
but there is uncertainty
should uncertainty be borne by individual users
and who is the person selected to explore the uncertain area
who gets to exploit that information
why is there greater uncertainty to certain solutions to a problem?
ethics:
belmont report:
autonomy
beneficence: do no harm
justice: unjust for prisoners, if they were all the subjects of risky experiments where the welfare flows to others
what is the relevance:
autonomy issue
consent issue
beneficence: users are being knowingly put into uncertain conditions where they may incur extra cost
maybe there is a reason why we have certain historical data, because humans know that it is risky or inconvenient
justice: you can imagine that there will be much more uncertainty about less common solutions
if a minority exhibits different behavior than the majority group
being less common, that population will be subject to more experimentation
the direct beneficiaries will be that population, so maybe it is ok
but the question still stands: is this the appropriate way to learn the information
could it be that they would benefit from a different solution
how do we avoid subjecting them to significant risk
and that is what beneficence is telling us to do
baseline:
what is the baseline?
humans sometimes don't have an intuition about what is the right solution
the argument has to be that there will be many circumstances where there is knowledge of preferable solutions which are historically not well known to certain populations, like google?
we need to ask about why there is uncertainty in certain areas?
providing greater social context to what is currently known by these platforms
naive and obvious things to answer:
does the person know they are part of an experiment?
what information is the experiment intended to discover?
could that information have been obtained otherwise?
judith simon:
philosophy of sts: copenhagen
location based data and privacy: some epistemological considerations
is there a paradigm shift in science (due to changes in methods)?
target case:
invasion of privacy:
illegitimate access to data versus informed consent through payback cards
invasion of privacy not due to the gathering of data, but due to data processing and inferences
big data practices as epistemic practices
proposals:
big data practices looked through epistemics and politics
location data:
thomas hoffman
eth zurich: data analytics
hermeneutics of location data
zur Hermeneutik digitaler Daten (on the hermeneutics of digital data)
person-based raw data + common background data = enriched person-based data
semantic maps + social information + predictive models
problems: regarding privacy, discrimination, ...
what is the epistemic difference between consumer and location based data
location data has non-inferential relevance
location data: data on presence and movement (this is banal)
if i know where your mobile phone is, i know where you live, your workplaces, if i zoom in, i know where you spend time, whether you have a child
i know where you are
secret endeavors
data of movement: mode, route and speed of transportation, when, from where, to where
it gives us a deep description of a sample of one
versus inferential statistics
you make a relationship between an individual and aggregated individual and make inferences
this is not new
you can do this with location data but not necessary for all usages
how do you ensure location privacy if it is so expressive:
solution: restrict data usage to aggregate level
solution 2: restrict data gathering, data sparsity
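A minimal sketch of solution 1 (restricting usage to the aggregate level); the k-threshold suppression is my own illustrative addition in the spirit of k-anonymity, not something proposed in the talk:

```python
from collections import Counter

def aggregate_visits(records, k=5):
    """Release only per-location visit counts, suppressing locations
    visited by fewer than k distinct users."""
    counts = Counter()
    seen = set()
    for user, loc in records:
        if (user, loc) not in seen:      # count each user once per location
            seen.add((user, loc))
            counts[loc] += 1
    return {loc: n for loc, n in counts.items() if n >= k}
```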
jean paul van bendegem
plato in the background (or in the machine)?
i want to explain simple point:
lots of space to explore
philosopher of mathematics
when i read stuff on ml: to my taste, there is not so often a questioning of the applicability of the mathematics
but not of the mathematics itself
in 10 minutes i will question the whole of mathematics
Niccolo Tartaglia (1499 - 1557)
beginning of the 16th century
i am fond of this kind of etching
we today would interpret this as a superposition
you have trees, a cannon, the smoke of the cannonball, and we are tempted to say there is a superimposed geometry
model of a cannonball's path
introduction of a machine learning/location thesis
we model the data "as if it were generated by a State Space Model....
we will make it discrete
for simplicity's sake, we use the uniform distribution as the start model
there are different places, and as a start position, we assume it can be anywhere
you have a probability distribution:
you take it discrete, so that you have the same probability of being everywhere
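The discrete uniform start model quoted from the thesis can be written down directly; the location names here are placeholders, not from the thesis:

```python
# Uniform prior over a discrete set of candidate locations: before any
# observation, the person is assumed equally likely to be anywhere.
locations = ["home", "work", "gym", "shop"]
prior = {loc: 1.0 / len(locations) for loc in locations}

assert abs(sum(prior.values()) - 1.0) < 1e-9  # a proper distribution
```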
what has happened between the tartaglia etching and the phd thesis
important role of purification of mathematics
i dare to use that term, since we talk about "pure mathematics" vs. applied
if you look at the historical development
if you start with tartaglia
the term they used was not the opposition between pure and applied but mixed mathematics
if you forget the background
wait, tartaglia is one picture: as if you could pull the two apart and have a mixture of the two
they would only consider arithmetic as being pure
but back then they did not have infinity of numbers, they only had finite numbers
it takes some time for the distinction between pure and applied maths to emerge
whenever part of what was considered to be pure math, became infected with applications, the consequence was elimination
Hardy: a mathematician's apology
what i have been doing all this time could have caused no harm whatsoever, because it has no use whatsoever
i can't have done anything wrong
that goes together with an ontology and epistemology
and it is related to platonism
in a sense platonism has been created!
in opposition to the (neo)platonist view is the constructivist view
mathematical objects are created
involves procedures and notations
also applies to identity
you say either things are the same, they are different or nothing can be said about it
major difference between
location (in pure sense): that being there
location + procedure (program): a program
and always think of the two together
if you think that is the basic unit, you...
i need to insert wittgenstein
if there is no basic unit, what location is, to extend this as a basic unit, is it interesting?
in road maps, some places are interesting?
being intentional: what are the purposes?
basic unit: location, procedure , and all these other things
why did the ancient egyptians need right angles to measure the banks of the nile after the flooding?
it is easy to calculate rectangles, and that is great for taxes
and if you include the taxes: that is why in tartaglia it is not an accident that it is a cannon; it could have been a ball, but it would not have been so useful
a paper explaining random walks
two pictures
a person going 45 degrees left or right with equal probability; at the end you get a drunk person's path
later you see: a graph of the USD and Euro exchange rate and it looks like a random walk
why is there no drunk person at the end of that curve
this means something else
the pictures are similar
mathematically they are the same, but you treat them differently because of all the other elements in the basic unit
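The 45-degrees-left-or-right walk from that paper reduces, for the lateral position, to a one-dimensional random walk; a minimal simulation (step size and seeding are my assumptions):

```python
import random

def random_walk(steps, seed=None):
    """One-dimensional drunkard's walk: each step goes left (-1) or
    right (+1) with equal probability; returns the whole path."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice([-1, 1])
        path.append(position)
    return path
```

Mathematically the same curve could describe a drunk person or an exchange rate; as the talk stresses, what differs is the rest of the basic unit, not the mathematics.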
Q: is it better to be treated as member of a class or an individual?
maybe for some people it is better to be treated as an individual and for others as a group?
Q: is this related to the uncertainty?
so that we can experiment to find out what it is where we want to be?
insurance: if you know for certain that somebody would get sick, insurance system would unravel: charge them the exact amount
it is the uncertainty that allows us to socialize the cost
uncertainty enforces group categorizations which, if we treated people as individuals, we would lose
the way you aggregate, what if there is an outlier, is that the counterpart to the individual
or is it not an interesting feature anymore?
outlier would not be at the level of the person but to events
a person leaves his cell phone at home
it doesn't move the entire day
does that make me something special?
Q: what would your opinion be, if one of these systems, an ethical reflection, attempts in AI, to code ethical reflection through the belmont principles to automated decision making?
the paper gave me appreciation for people critical of singularity
this is often what they have in mind: bostrom
the scenario is, you teach a machine to make paper clips
and it tries to make every atom in the universe into a paper clip
now i understand where these ideas are coming from
but particular way of designing a machine that intervenes, learns, ... that is oblivious to a bunch of things in the world
goal alignment is what you are describing
as much as i still find those people obnoxious, there is some legitimacy
separate work: can we design systems that take this into account
how to use multi-arm bandits so that they do not engage populations that cannot bear the burden
Q: is it the balancing choice that the developers are making, of how many users they can screw before they lose their audience
to what degree of experimentation would be acceptable
they are mathematically designed to be very efficient: so they can involve a small number of experiments
but there is an independent business decision to be made about it
q: location data is more direct than web data
i was surprised by that statement
certain web pages also give a lot of intimate information of things that i am interested in
if i look for something: it is indicative of something i am doing but it is not directly telling you
you cannot really know for sure, i am looking for someone else,
where i am is where i am
the more you dive in, the shadier the distinctions will be
i asked my father to forward me the post-factual emails he receives
what merkel is doing, paranoid emails
what does this tell you: if you just have this information, you don't know if i oppose these news or not
web behavior is only probabilistically indicative
unlike in location
q: why the producers of these applications
would respect principles of bioethics, they are not producing science, but exploiting a business opportunity?
facebook contagion study
people rely on ethical practice to do corporate practice
the paper that we have written is far too simple, it does not acknowledge
that this is a simplistic way of dealing with a practical problem
there is no expectation that people will adopt these principles
encourage people to think about how you might design systems that take ethics into consideration
that gives some source to the argument
q: contrasting very routine location data, with very marginal web data
i received some stuff i am not interested about
i have my routines on the web, random places are of less importance to me
Lydia Nicholas, NESTA
most of the stuff you are talking about doesn't happen in local governments
medium data
basic statistics
who can push back against applications like waze
case workers may not be able to look at certain data
and there can be a system that looks at the different databases and provide risk scores
then you don't break confidentiality
smart places
traffic management and whether there are icy patches on the road
that is all that is there in the much of UK
optimizing resources: making your bins get collected effectively
governments collect a lot of data
most of the data interactions that i have with the government is at a life event
the times i may apply for benefits, crime
significant life events: but they don't know what i am doing in between
benefit system: requires knowing who you are living with, how long, your relationship with them
you may oversurveil certain populations
apart from your communications data,
which is not part of governance
case workers report about what was going on in a troubled family where a child is being abused
a lot of the data entered into systems is garbled, it is sensitive
it will take three years to get the permission to get into the room where the information is introduced
but then i was told the system doesn't work so they are writing it down
so what is going on in data practice with governments and governance
resistance from people working on the ground to data capture
you generally take this job because you want to help people
building relationships of trust
and help them transform their situation
you don't want to make them legible to the state
you are seeing the impossibility of doing big data work
people don't have the language, but they are thinking about the privacy issues
they want to protect their job, don't want to get fired
i find that extremely interesting as a point of focus
you get to what we are actually trying to do
you are creating a social and healthcare system that looks like an amazon warehouse
it is perfectly efficient, but there is no time to have a cup of tea
steps forward:
AI will get very good at parsing unstructured data
are you going to make that deliberately obfuscated
or, one of the interesting projects, they are embedding data analysts in the small teams
in place
when you try to identify families that need support
there are concern markers, attending school, drug use
usually you have to tick above a threshold
to go to a general social worker
ML use here: to segment the most significant risks and paths forward for those families
and put them into specialist social groups
and get data analysts to work with them
and find out not only what patterns there are
but which ones are important
end goal: idealized social landscape
what our government could do is turn this whole thing into an amazon warehouse
or you can find patterns that can transform
governments transforming from providing services to commissioning services
this idea of whether you are trying to watch, to tick boxes and to sustain
or identify patterns to transform
produces a very different relationships between capture and machine learning
than google and apple's predictive models
it involves a lot of human and data expertise
difficulty of getting care workers, data analysts in the place
and improving statistical knowledge
people reject correlations: are you god!
basic denial of correlation is something that you may need to battle with here.
i have questions about the data analysis/care work
northern city: where to put cycle lanes
give away bikes with gps on them
30 million pounds
they were proud of the strategy
someone knocked on the door: do you realize that the bike hub would have given you the location data and routes for free?
they are people who chose to bike and engage with the app
there are basic issues in education and infrastructure
day by day work of taking state knowledge and how it impacts people
Arjen de Vries
who controls your search log data
information retrieval
i did a project on IR for children
improve it for children
best way to do that was to use log data from yahoo
i was enthusiastic about the profiling you can do
you can do cool things for normal people
somebody wanted to teach high school kids about profiling and digital identity
to teach them to think about what they do online
i was asked for advice on the tech aspects
they know everything about our online searches and what your interests are
at the same time: there is this web getting more centralized
web was intended as a decentralized thing
decentralized web summit
too much centralization: too much power put together in a few parties that control the info online
the photos of the event are on a google drive
but they are meant to be shared! :::::>>>> centralization is not the only way to share?
mobile makes things only worse
there is one way to earn money as a company: sell data
what can we do for seach
decentralized and localize
for search that might be harder
the agile turn is causing the line to flatten: because these are bought by data center parties and people who illegally download films
we are used to this: central heating, doing something for our house that way
we can store the whole web at home??
the data that i need from the web is much smaller than the web
it is a naive proposal
two problems:
how to get the data on the personal search engine?
how to replace the lack of usage data from many?
getting the data:
idea: organize the data in bundles and use techniques inspired by query obfuscation to hide the real user's interests when downloading bundles
web archive to the rescue?
we ship the part that is only of your interest
but does this mean that the web archive gets all my information instead of google?
but search gets slow
query obfuscation to hide a bit your profile and still get the chunks that would be most useful for you
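A toy version of the query-obfuscation idea sketched above: hide the real bundle request among decoys so the archive cannot tell which one reflects the user's interest. The decoy pool, function name, and counts are illustrative assumptions:

```python
import random

def obfuscated_requests(real_query, decoy_pool, n_decoys=3, seed=None):
    """Return the real request mixed with n_decoys decoy requests,
    in random order, so the server sees them as equally plausible."""
    rng = random.Random(seed)
    requests = rng.sample(decoy_pool, n_decoys) + [real_query]
    rng.shuffle(requests)
    return requests
```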
how do you keep the data fresh?
make use of the fact that you use certain sites frequently
no log data: is that bad? yes!
all sorts of google search functionality
predict query intent/rank verticals etc. etc.
this is only available in exchange for the data that we give to these technologies
they do something good with the mass surveillance
without log data:
hinders retrieval experiments in academia
related problem: reproducibility vs. representativeness of research results?
we can't study these companies, but only from the outside or as interns
alternative sources of clicks:
bitly api lists how often links are shortened and clicked
wikistats
google trends
anchor text and timestamps
anchor text with timestamps can be used to capture and trace entity evolution
or to find popular topics!
trade log data:
data markets for usage data
behavior data turns into something valuable for us
so i am much less concerned
challenges: how to select the part of your log data you are willing to share?
how to estimate the value of log data?
truly personal search:
safely gain access to rich personal data including email, browsing history, documents read and contents of the user's home directory
can high quality evidence about an individual's recurring long-term interests replace the shallow information of many?
alison:
slightly different ways of talking about our shared interests
rather than privacy: talk about communication as something people have always claimed a right to do
privacy and communication are corollaries with each other
to communicate and not to be forced to communicate
data and citizenship
in the broader framework of communication
seda: sketch out a transformation from a paradigm of communication about access to a network
to a communication paradigm of production of data (a compulsion to produce data for someone to take action)
how we organize our institutions is affected by this problem:
the second one is not about rights
the shift towards producing data for the purposes of action: rights claims become more contested
sketch the shift
in the communication paradigm of access to a network
15 years ago
info society discourse: was about getting online and connecting with other people
glorification of a global network of peers
it was electric
speak to each other without intermediation
that network grew in which many entities, people or not, are on the network
totalizing
increasingly people have access to the network :::::::>>>>NO, they continue to exist for other people!
that is the end of the electric vision of globalization and equality
and the end of a certain way of making communication based rights claims
the right to communicate is now a compulsion to create consistent data
the network cannot continue to generate value, unless people continue to produce data
business models have shifted, from a whole bunch of companies providing access to a network
to intermediary giants, whose business is data processing and calculation
shift from rights claims to data acts
data acts happen through intermediation
location is a good example here
a coordinate: long/lat is nothing in itself, it needs to be constructed, you must be placed on a map, and positioned in relation to other information
next argument:
this intermediation creates what is meaningful about data
and the way that data can be connected to action
my concerns about citizenship
and the ability to act on things that people care about in the world
the intermediation happens through a framework that is first applied by the corporate and then public sector
public sector often responds, not knowing if they should repeat these things
frames:
consumption
optimization
the corporate actors make the conceptual space: what sorts of things get to be data
i work with civic orgs
bottom up orgs
they reappropriate the same kind of frameworks
data is valuable because you can link it to consumption and optimization
there are some interesting responses in the realm of ethics
normative
critical
we have this notion of data citizenship
taking information and calculating it
and making a calculative judgement
that produces a consumer model
what is location data good for: things that have consistent identities in space
based on the location of an object that gets information
which entities will create the first structured data: the ones that want to sell you something
classic consumer model
second optimization model
let's use data to make things more efficient
this rests on the idea of an expansion of the things that are computable
expansion of areas of life that can become more optimal
comes with the datafication
commuting applications
two sided business model
you buy app as a commuter
"don't bother taking the central line"
the info depends on the availability of fully up to date streaming information
transit authority
take the potential of that mass data and turn it into something that an individual can act on
the applications sell the data back to the transport authorities
this positions the ideal citizen as producer and consumer data
public institution as also free producer of data and consumer of data
and optimization
what's interesting: these are the early low hanging fruit frameworks for how to use location data for an improved civic experience
the civic groups are doing the same thing
fixmystreet
aggregate citizen identification of problem issues in local areas
how do i demonstrate to a government that they should not close the local library?
this illustrated to me that there is a particular kind of computable civic action
for people who are working with the data
which does not fundamentally challenge the consumer paradigm of data production for consumption and optimization
normative response:
justice: inclusion/exclusion in relation to the results of the calculation
existing frameworks: protected classes, abstractions and contextual integrity
critical response:
boltanski's notion of critique: questioning the nature of things
can we question this fundamental shift in citizenship
how it has been redefined through datafication
what it means to have only spaces of action
instead of rights claims as a way to claim civic positions
play of uncertainty: make things unoptimal
you may think about optimization resistant behavior
good research that shows that if you seek to datafy an organization, there is always information that cannot be turned into data
one possible way of thinking about this is to shift from an ex-post framework, at the end of the technology
to thinking about the construction of ethical behaviors all the way through developing a technology
Irina shklovski
mutable stories: the shifting accountability of data interpretation
the conditions we are discussing here are not about tech per se, but a blend of human decision making and tech practice
not about algorithms and data itself
but about the output and its interpretation
the connection between data and action
information as thing: -> deserves careful examination
michael buckland
data is information as thing
giving attention to data is now strange: people used to think this is worth scholarly attention
information as thing: it is just an object, may or may not be informative
information as thing has become central to the worries and criticisms
we worry about data and services
what seems to have been said over and over
there is a difference between describing patterns in data and the attempts to understand and explain these patterns
what must be known for data to be interpretable?
how does one tell a story with/from data?
how do data shift from thing to knowledge?
what is the minimum necessary dataset for interpretation?
given enough data, the output can be interpreted in a way that is actionable?
location is commodity:
commodification: removing something from its context of production such that its value is determined by its context of use
commodity fetishism: when the commodity's value is entirely determined by other means, such that the richness of its context of production is lost, and its value, which once came from social relations, is invested in the object itself
commodification here shifts the process of interpreting by changing the available minimum necessary dataset
location data is commodified, and it displaces the practices for social meaning
interpretations of data can reshape expectation, accountability etc
interpretation of data is about managing uncertainty
when data used for decision making are interpreted through the lens of the social relations and contexts of its production - the minimum necessary dataset is inflected with the consequence of the decision taken
commodified data are never the entire story but form a partial basis for it...
Katja de Vries
Thesis presentation
Baroque
17th century
time of scientific revolution
cartesian separation between object and subject
newtonian society: difference between subjective beliefs and objective facts
baroque: in reaction to protestantism
endless theatrical art style
17th century merchantile capitalism
the first financial bubbles
first insurances and probability calculus
a very baroque work of art
fresco ceiling of a church in rome
visual illusion
if you stand in this jesuit church in rome
you have to stand in a specific marble stone
it is as if the sky opens above your head
if you step down, the visual effect stops
as if the building collapses on you
andrea ... had a problem with perspective painting that works in more than one stop
another interpretation
is that pozzo exactly wanted to show that this is an illusion
it is good to rejoice in the performativity and artificiality of the painting
you step down from the marble stone and it collapses on your head
it rejoices in the work of making something work
baroque is not only an artistic style but a style of thinking
what is it to make something work and in general
it is a style, the process of making, and how to work with differences and probabilities
leibniz: great thinker of the baroque
working with princess sophie and something happens
sophie says: is everything really different
and he says, let's do empirical philosophy
and find leaves that are identical
they walk around for a while, and they say we cannot find identical leaves
hegel says this is like children looking for the same snowflakes
but the question remains: how do we deal with difference and sameness
chapter 2:
traces
if an animal walks through the snow, it leaves a trace
the trace is open for interpretation
a deer, a wolf, and what is it that you are after
scientific, hunter, tourist
do you follow the trace, or turn around and run for it
we give interpretations to traces and act on it
interpretations can be dead serious
it is what decides who is in the stew: the deer, or the hunter
interpretation is a word i don't use in my thesis
it sounds voluntaristic
as if you can choose what you want to see
but it is limited by your body, the space that has constituted you
this is a tick: it has three sensations
sweaty or not, hairy skin or not, 27 degrees or not
is the experience of the tick wrong, or not?
perceptions or percepts is the word
perceptions are not deterministic
things can appear in different ways
picture that can be interpreted as a young girl or an old woman
you can extend bodies in ...
with a dog, glasses, google glasses, laptop
what the body can see and perceive will not be the same
train a body in a particular way of doing things
practice of law, science or medicine
again, this body will not be the same, and what the body can see and perceive will not be the same
what makes someone act or perceive
mapping what makes someone act
latour calls this the mapping of the network
or work-net
the work that goes into the perception is getting foregrounded
i look at two worknets
eu informational fundamental rights
and network of ML algorithms that are applied to human behavior and characteristics
we return to the baroque
the sameness of those two networks
practices that escape the opposition between art and science
these are not completely baroque
they are a bit modern and a bit baroque
and they could even become more baroque
i juxtaposed these two networks and how they contaminate each other
the exercise of studying these networks has two effects on two other levels
what is the best way to study the practice of making: law making, how do you study that as a philosopher of technology
do you look at it as a general way
or do you look at the specificities of making
the analysis is also used to read philosophy against the grain as to how identity and difference are related to each other
a recurrent theme is the heigh ho heigh ho song
what does it mean to make something work today
digital traces: not only animals leave traces in the snow
the amount of traces we leave has exploded
the traceability is not limited to what we do behind the screen
but the signals emitted by mobile phones
and footage captured by smart cameras
footage from a smart camera: how we stroll through a real shop
chapter 4
machine perceptions
machines categorize
they give interpretation
separating male from female faces
google making a mistake
smart face analytics
the tech that is used is probabilistic
this face is mostly happy and a little surprised, sad
summarized in clearly understandable labels
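the summarization step can be sketched minimally (scores invented for illustration; real face-analytics APIs differ):

```python
# a face is scored against several emotion labels; the probabilistic
# result is summarized as the dominant, human-readable label.
# the scores below are invented for illustration.
scores = {"happy": 0.72, "surprised": 0.18, "sad": 0.06, "angry": 0.04}

dominant = max(scores, key=scores.get)  # label with the highest score
print(f"mostly {dominant}")             # the "clearly understandable" summary
```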
a trace
it is nothing without an interpretation
also nothing without acting upon that information
the question with modern tech is the same as with the trace in the snow: who ends up in the stew
let's look at the face again
a startup:
if you are angry, our app will offer you a whiskey
or if we recognize you are angry we will not respond at all
what is the worknet of the network of machine learning
i looked at 11 specific algorithms
how they construct identity and difference
is this face angry, sad, male, female
main idea:
classical programming:
explicit instruction:
machine learning:
examples, instructions how to extract patterns, sometimes feedback (you classified wrong)
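the contrast can be sketched with a toy example (all names and numbers invented): a hand-written rule versus a rule extracted from labelled examples.

```python
# classical programming: the rule is written explicitly by a human.
def classify_by_rule(value):
    return "positive" if value > 10 else "negative"

# machine learning: the rule is extracted from labelled examples.
# a trivial "learner" here picks the threshold midway between the
# largest negative example and the smallest positive example.
def learn_threshold(examples):
    pos = [v for v, label in examples if label == "positive"]
    neg = [v for v, label in examples if label == "negative"]
    return (max(neg) + min(pos)) / 2

examples = [(2, "negative"), (6, "negative"), (14, "positive"), (20, "positive")]
threshold = learn_threshold(examples)  # learned from data, not hand-coded

def classify_learned(value):
    return "positive" if value > threshold else "negative"
```

the point of the contrast: in the second case the human supplies examples and a pattern-extraction procedure, and feedback ("you classified wrong") can shift the threshold.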
chapter 6
percept creation by eu fundamental rights
privacy, data protection and anti-discrimination rights
the choice of these rights allows us to see something about these rights
what is important in understanding fundamental rights is that they are embedded in a certain political constitution
end of 18th century: the state is no longer leviathan, but a pastoral quality, the state is more like a shepherd
why do i compare these networks
they are both
they create similarities between cases
they are not scientific at all
but with the case of fundamental rights, there is something else at stake
it is not just a baroque practice
but there is also a political idea behind it
can pastoral power be included in machine learning
chapter 7
contamination of ways of being
can ML be expected to make a political option
so that the sheep and people are given the possibility to resist
one of the possibilities is to have some transparency about how the algorithms work
how do we co-exist with ml being applied to us
one is that we would be able to see what the algorithms do and think about alternatives
chapter 8
what is making, and how to study it?
looking at two ways of making
making through informational rights and ML
what is actually to make something
can we by looking at two different ways of making, say something about the nature of making
solon: you say we need more baroqueness: what problems would baroqueness help us solve?
i think it is good to say that ml is both baroque and not
it is not baroque in that it comes from a tradition of statistics
which is often about presenting an objective reality
the classical statistical approach is: we look at how people with psychosis behave, we create a model and present reality in some way
the ml applications are applied at a quick pace to reality
this means that, suddenly it is not just knowledge from paper but it is knowledge that is being applied
when we are aware that this is performative interventions into reality
then it is important for the practitioners who create these algorithms to realize
all the human effort that goes into it
what is the goal in creating these algorithms
better world, optimizing profit
how do we create the variables
construct validity
these are choices that are being made
how do we test what is the standard against which we measure the algorithms
they are being tested often on standardized databases
how well can an algorithm recognize those in a standard database (in comparison to others)
this is all very constructivist
it would be good if the practitioners acknowledged that
there is no ultimate standard of whether this is a model that represents reality or not
ML person once said: there is no best, the question is does it work
a plane doesn't represent reality
but if it flies from a to b without crashing, then it is a good plane
so what is it that we want algorithms to do
in the case of the plane: not destroy the environment
it is important to clarify to practitioners it is more an art
creating an algorithm that cannot be measured with standards of reality and truth
but to make clear to them, you are making something that is like recipes
one recipe has certain advantages over others
solon: i like the idea of recognizing the constructivist aspect of ml. do the practitioners not recognize it? who has this mistaken belief: practitioners or others?
two answers:
on a high level, the top level ml engineers are concerned about the implications that ml has
when you look at the concerns, they are mainly ethical concerns
cathy o'neil: she gave a course to students
she asked: can you develop an algorithm to grade your own essays
the algorithm affects the people themselves
an ethical awareness of their real life effects
in my thesis, i extend the view: it is not only ethical concerns, it is also a legal concern
it is also a political concern about the constitution
in lower levels of ml there is little understanding: when a company does not have high-quality know-how
and you are throwing in data, and something will come out
this will be based on what the black box puts out
outside these highly educated ml circles
there is a great belief: this is data, so this model must be true
there is a lot of work to do in making clear that this is not about reality
judith simon:
reads her review: summarizes and says it is great!
how do you bring power asymmetry matter to the ml engineers?
law can change the obligations companies have towards users
once there is a legal situation where industry has to incorporate a certain power balance
between users and more powerful info tech structures
it will be necessary to incorporate in the pragmatism of ml
if you are a big it company
it is much easier to think: let's look at the dp requirements
what to do
and how to comply with the anti-discrimination rule
as a more general way
there should be power equality - article 8
force companies to think about power balances in a broader way
this will be a way in which machine learning pragmatism can be affected
engineers can be compelled to do something about it
but i acknowledge that the economic incentives are big
thomas heskens:
i call myself a machine learner
you said ml is not science, what do you count as science: statistics, etc.
all modern science is disappearing
all of it is turning into engineering
i draw a line at the 17th century and experimental science
a researcher doing an experiment and representing reality
so it is about the experimental part and what the truth is
the idea that nature is discovered
science is really biology, physics, but then when you start making models
then most of the models are completely wrong but useful
that is the line
if they are wrong they are not science
if they are right it is engineering?
this is one of the quotes
all models are wrong and some are useful
in fact, this is the criterion that should hold for everything that is called science now
there is not a representation of reality but models that are very useful
but ml is more upfront about it
it can be more upfront about the performative
ibm watson
medical data, systems for medical diagnosis
you can go to watson or your physician
what would you choose? watson has seen much more data
i currently would ask both
ask them both
they give different answers?
even the people who make them cannot tell exactly
but you can't tell what the physician's reasoning is
but you can ask the physician
i can make a rule of a neural network to respond to such a question...
neither can the doctor give you more
medical practitioners will know that there is a model on top of the neural network
they can have some assessment of what is going on in the black box
what you see in those debates is that for medical practitioners it is not clear what input you should put into the machine
a doctor may sometimes not be able to explain either, but decides based on his experience
that is weird:
jean paul van bendegem
as a mathematician and philosopher i have one question
i am grateful to tom for getting my question started
there was a nice exchange about what is science and not
if i understood right, math is not science
i am not a mathematician
i write about probability
and sometimes i collide logic with mathematics
math is a formal system with a cleaned-up space where certain manipulations are possible
in some ways, mathematics can be very useful
you can build bridges with it
you can predict fluctuations on the financial market
it can be useful as a tool, some times
i have the impression when reading the thesis
you have an idea of pure mathematics
i still have the feeling that you are acknowledging this purity
i have used mathematics as a counter side
as a contrast
to show what baroque sciences can be
this question about science, i contrast law and machine learning with modern science
i acknowledge that modern science does not really exist
if you are going to disentangle that
in ml baroqueness was more visible
if you would not leave math behind but baroquify it, too, that would be great
our whole approach is similar to
we are standing at the right stone to see the illusion of law
if i want to show you the same illusion with machine learning, we need to make a move
that is your rhetorical strategy
there are two answers
contrast positions
when you speak two languages
if you discuss an issue in another language, you see the limits of the first language
if you only speak english, you don't know how a language shapes you
2) the front cover has the illusion by pozzo
it is important in law and in machine learning to be able to make it noticeable
that you are standing on the stone and to step down and see it is artificial
gloria gonzalez
my battery died!!!
========== NOTES IN PREPARATION OF TALK AT ANYWARE ===================
we are and always have been continuously reshaped by the artifacts we shape, to which we ask: who designed the lives we live today? What are the forms of life we inhabit, and what new forms are currently being designed? Where are the sites, and what are the techniques, to design others?
location determines who you are: you are where you live
you make up location
The first is the ZIP (Zone Improvement Plan) code. Mandated as an element of President Kennedy’s attempt to rationalize government, the ZIP code allowed for the first time the quantification and thereby the easy organization of both residence and business addresses
Under the ZIP code system, households were aggregated into units served by a single post office, serving at most perhaps 15,000 people, each indicated by a five-digit number. The US Postal Service at the same time established a system of numbering of postal carrier routes, the routes traveled by each letter carrier, and they received two-digit numbers, so that ZIP codes could in turn be divided into units of perhaps 800 people. As an incentive to using the systems the Postal Service (or, actually, its predecessor, the Post Office Department) gave a discount to mass mailers who sorted their mail by carrier route, so that that geographical unit, defined by the daily path of the individual letter carrier, came to be the preferred unit of division
The US Bureau of the Census had begun the establishment of first the GBF-DIME files and then the (currently used) TIGER files. First for urban areas and then for the entire country, these files were the basis of a computerized mapping system to be used for the first time in the 1970 decennial census
More to the point here, these computerized files, consisting in part of latitude and longitude values for the four corners of every block in every city in the US, allowed - through a process of matching with the Postal Service’s ZIP code files - the determination of the geographical coordinates of every mailing address in every city in the US. (Rural addresses created special problems, which have only recently begun to be resolved through the development of new rural addressing systems, the impetus for which has been the perceived need to rationalize and support emergency response, or 911, systems.) This in turn allowed the creation not merely of lists, but also of maps of ZIP codes, postal carrier routes, and so on
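the matching process described above can be sketched as a toy lookup (addresses, block ids, and coordinates invented for illustration; the real GBF-DIME/TIGER and ZIP-code files are far richer):

```python
# a (hypothetical, tiny) geographic base file maps street blocks to
# coordinates, and a postal file maps addresses to those blocks,
# so any mailing address resolves to latitude/longitude.
block_coords = {              # block id -> (lat, lon) of block centroid
    "B001": (38.8977, -77.0365),
    "B002": (38.8990, -77.0400),
}
address_to_block = {          # mailing address -> block id
    "1600 Pennsylvania Ave NW 20500": "B001",
}

def geocode(address):
    # address -> block -> coordinates; None if no match is found
    block = address_to_block.get(address)
    return block_coords.get(block) if block else None
```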
Numerical taxonomy is a sophisticated way of clustering similar individuals by imagining them to be in an N-dimensional space, where “n” is the number of socioeconomic variables. So using the roughly 600 socioeconomic variables available at the block-group level, the creators of the geodemographic systems determined the distance of each of 230,000 block groups to all the others in 600-dimensional space. The ones that were “closest” were characterized as being most alike, just as the ones farthest from one another were seen as least alike
These earliest systems, developed in an era before the advent of desktop computing, relied on computationally intensive numerical taxonomy, and ran on mainframe computers.
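the clustering idea can be sketched at toy scale (three invented block groups with four variables instead of 600; real systems used far heavier numerical taxonomy):

```python
import math
from itertools import combinations

# each block group is a point in n-dimensional space, one axis per
# socioeconomic variable; the "closest" pair counts as most alike.
# the profiles below are invented for illustration.
block_groups = {
    "A": [0.6, 0.1, 0.8, 0.3],
    "B": [0.5, 0.2, 0.7, 0.3],
    "C": [0.1, 0.9, 0.2, 0.8],
}

def most_alike(groups):
    # euclidean distance between every pair; smallest distance wins
    return min(combinations(groups, 2),
               key=lambda pair: math.dist(groups[pair[0]], groups[pair[1]]))
```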
****Software was inevitably written specifically for the project at hand, and customized results were as a consequence expensive****
At the same time, the systems began to direct their attention to smaller and smaller units of measure, to the individual and household level.
The geodemographic industry was founded upon resources generated by public programs. These programs, ZIP codes, census data, and 911 address standardization, produced standardized regions as well as locational data (that is, latitude and longitude information) for particular entities. These standard regions were not developed for the specific requirements of the geodemographic industry. Nevertheless, the industry used these legacy regions as the foundation of their marketing analysis. System improvement was marked by the ability to define, categorize, and target smaller and smaller fixed regions
What changed at the end of the 90s?
1) The first is a new attention to what are viewed as temporally fluid regions
People's behavior can be understood only by understanding that people are mobile, and that they routinely move from home to work (school, markets etc.) and back
2) The second trend is the development of location based services (LBS). These services are in the first instance designed to coordinate or assist the activities of mobile individuals as they pass through stable regions
the recognition of temporal changes in the character of regions and the interest in tracking mobile individuals - are in an obvious sense closely connected
Two factors have been especially influential in the latest development of geodemographic systems. The first is the role of government-subsidized information. Especially critical here have been global positioning systems, the development of which was originally financed and implemented by the U.S. Department of Defense. Also, U.S. Federal legislation now mandates that mobile telecommunications devices transmit, under certain circumstances, their location
BUT OF COURSE, THIS MATTER HAS BEEN ECLIPSED BY THE GSM INFRASTRUCTURE, LOCATION FINGERPRINTING, AND SERVICES
IMPORTANT ANALYSIS:
we wish to trace the ways in which the sociological and geographical understandings of the developers of these systems and the systems themselves have been mutually influential.
NOTICE THE GREAT POINTERS HERE TO CLASS AND RACE!
Indeed, the today-familiar forms of horizontal segregation, emblematic of the idea that you are where you live, developed only through the nineteenth century, as a result of a growing middle class wishing to express its new status, and at the same time wishing to attain a degree of separateness from the less well-off people whom they had economically left behind (Johnson 1978). Each small region or neighborhood came to be seen as a place wherein
The ideal, if not the practice, of locational marketing can be found in this historical moment. Regional space was considered a container which people occupied without really affecting
Each of these neighborhoods could be conceptualized as a container, within which there were households and residents who occupied that neighborhood much like sardines in a can. That is, their inhabiting a neighborhood was just a matter of being there; it was fundamentally passive. Computational energy was spent, not in redefining regions, but in defining a set of classificatory categories, and assigning the extant regions to those categories
But with the 1960s, even as the flight to the suburbs continued, and even as this sociological ideal was being implemented in working geodemographic systems, the nation was increasingly rent by schism. The notion of the suburbs, or of any segment of American life, as united in a set of core values or ideals, was increasingly challenged. The very premise of locational marketing - the social cohesion of neighborhoods - became increasingly questionable. Marketers responded with the technological means at their command, and within a conservative ideological framework. They made their locational analysis more and more precise in the seemingly desperate belief that at some level - if not 40,000 people then 1,000 people, and if not there, well, then 40 people - the ideal refuge of a like-minded group of neighbors could be resuscitated
The increasing availability of ever more precise locational data as well as ever more abundant personal information, and marketers’ sense of devolving social cohesion, have gradually led to an ever more narrow definition of the “where” of “you are where you live,” until it is now thought of as the skin that marks the boundaries of your physical extension
SO IMPORTANT:
In carrying “You are where you live” to its technological and analytic extreme, it has turned on itself, and begun to recognize that the spatial container cannot be the primary definer of its individual contents
MOST PRIVACY WORK ASSUMES THAT THIS MODEL IS STILL VALID: FROM WHERE YOU ARE I CAN INFER WHAT YOU ARE DOING AND WHAT YOUR INTENTIONS ARE.
marketers and demographers have begun to understand regions themselves as constituted by the patterns of activity of individuals
TEMPORALITY OF HOW THE UTILITY OF THE SYSTEM IS MEASURED
The individual is an active geographical agent, making decisions on the fly, as opportunities arise. And here those decisions seem inevitably to occasion responses on the part of the users of the systems, just because the systems for the first time allow immediate validation of their worth. If a store uses a geodemographic system to offer electronic coupons to people walking by, or if a digital sign promoting a sale is set to appeal to an especially large group of people with certain tastes, again known to be walking or driving by, the utility of the system is immediately evident.
AND THIS IS THE POINT: BUT THE NEGATIVE SPACES ARE WHERE THIS AMBITION OF COURSE TURNS WEIRD
So just as from a philosophical point of view the new systems are fulfillments of the desire for a richer way of understanding people’s geographical behavior, they at the same time are themselves active agents in manipulating that behavior to create “ideal” geographies
WHAT ARE DIFFERENCES IN WHICH POWER MANAGES SPACE: TIME AND VISIBILITY
There is another sense in which the public domain becomes privatized through new developments in geodemographic systems. That is the degree to which the character of lived regions becomes the product of the goals and strategies of ever fewer, more interlinked, well-capitalized, and private corporate interests. Corporate and state actors have always been significant actors in the social construction of place. Historically, though, the mechanisms of those actions have at least been visible and to a degree opposable. Highway projects loom before they are built. The effects of redlines are enduring, relatively stable, and noticeable. However, new systems will potentially allow the instantaneous reconfiguring of spatial elements toward any emergent strategic end. The spatial contours of places will become more fluid, and the means by which the existence, the meaning, and the social importance of places are negotiated will become more fast-paced, and less visible to their inhabitants
The sociological belief that “You are where you live” has fostered a drive to understand “where you live”
BUT NOW THE YOU IS ALSO DISSECTED, YOU ARE NOT REALLY SEEN AS ONE BUT MULTIPLE, AS A SET OF GESTURES, EMOTIONS, AND OTHER THINGS
Since regions are created by the behavior of individual inhabitants, the goal becomes to influence those behaviors through direct, persuasive appeals. Regions are managed by managing individuals
creation of temporal spaces:
predicting action
gaming action
simulation:
simulate locations so that you can plan for them
apple xcode
android emulator: mock location data
not just used by app providers, but also pokemon go users
the world needs to be legible
in place and time
so that it can simulate things
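a minimal sketch of such simulated location data (coordinates approximate, interpolation linear; real tools like Xcode and the Android emulator replay fixes like these to the device):

```python
# simulate a walk between two points as a series of mock GPS fixes,
# by linear interpolation between start and end coordinates.
def simulate_route(start, end, steps):
    (lat0, lon0), (lat1, lon1) = start, end
    return [(lat0 + (lat1 - lat0) * i / steps,
             lon0 + (lon1 - lon0) * i / steps)
            for i in range(steps + 1)]

# e.g. a short simulated stroll across brussels (coordinates approximate)
fixes = simulate_route((50.8466, 4.3528), (50.8503, 4.3517), steps=4)
```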
creation of negative spaces:
the meeting spot of couriers in brussels (femke snelting)
or the airplane tickets that become cheaper because they fall outside the predictive profile of the masses