We thank the four reviewers for their time and their very helpful comments and feedback. Below we list our proposals for addressing all of the comments; we believe that revising the paper along these lines will make it stronger.
**Reviewer A:**
*Cost*
We agree that costs are not limited to financial ones; our model of cost is deliberately broader, and we tried to emphasize this in our examples. In the credit-scoring example the cost is the number of poisoning inputs, and in the routing case study the cost is the number of changes that need to be made to the town (a proxy for the impact on the average travel time through town). We will further emphasize our broader definition of cost, since this is a very important distinction.
*Word choice*
We agree that careful word choice is critical and can affect how a concept is interpreted. We will read over the paper with this in mind and make sure that the many concepts we introduce are presented consistently and inclusively. For instance, we will change "citizen" to "resident".
**Reviewers A & C:**
*Other POTs*
We would have loved to expand on existing tools that can be used as POTs, as well as on existing POT-like artistic projects, but we omitted this discussion due to space constraints. We see great value in a comparative evaluation of the examples in Table 1 and hope to carry one out in future work.
**Reviewer B:**
*Novelty*
We think that the critique of the novelty of POTs is fair. Indeed, we are closer to repurposing existing concepts, for example by redefining the trust assumptions of adversarial ML, than to proposing new ones. We will rephrase to make this clearer and ensure that our claims reflect this point.
*Comparison of POTs and fairness-by-design*
We do not argue that POTs will outperform fair-by-design systems. Fairness by design (FbD) proposes, à la Jackson [12], a specification-centric approach to addressing a subset of the harms that systems can cause. POTs are based on a conceptualization that captures broader harms, independent of whether the service provider recognizes those harms. Further, FbD aims for solutions that apply to (all) system users, whereas POTs aim at an outcome for an affected environment or sub-population, only some of whom may be the system's users. Therefore, unless the harms and the goals are the same for an FbD instantiation and a POT instantiation, their results cannot be compared. To illustrate the impossibility of this comparison, consider the credit-scoring case, in which the POT aims to ensure that a target group that could repay a loan gets a high score, regardless of demographics and of the scores assigned to members of other subgroups. This contrasts with traditional FbD approaches that seek equality across demographic subgroups. These two goals are simply not comparable.
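For concreteness, a minimal sketch in our own illustrative notation (not verbatim from the paper), with $f_\theta$ the scoring model, $T$ the target group, $y=1$ meaning "repays", $g$ the demographic group, and $\tau$ a hypothetical acceptance threshold:

$$\text{POT:}\quad \forall x \in T,\; y(x)=1 \implies f_\theta(x) \ge \tau \qquad\quad \text{FbD (equalized odds):}\quad \Pr[f_\theta(x) \ge \tau \mid y{=}1, g{=}a] = \Pr[f_\theta(x) \ge \tau \mid y{=}1, g{=}b] \;\;\forall a,b.$$

The first is an absolute guarantee for one sub-population that says nothing about parity; the second is a parity constraint that says nothing about any group's absolute acceptance rate. Neither implies the other, hence the impossibility of a direct comparison.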
What we could compare is the performance of POTs applied from the outside against a credit-scoring model that incorporates the POT's objective during training or in post-processing (as in FbD approaches). However, this "by-design" approach is equivalent to the service provider we already consider, which minimizes the error for the whole population (including the target group). Thus, our evaluation already covers this case.
All in all, we thank the reviewer for this comment, which made us reflect on the similarities and differences of the two approaches in a new way. We will include this discussion in the paper.
*Focus on fairness by design in theory, but not in evaluation*
The reviewer observes that our theory focuses on FbD service providers, while the evaluation targets service providers that do not necessarily satisfy any notion of fairness. Indeed, Section 3 focuses mostly on FbD systems, except for Section 3.1.2, where we discuss that not all providers can be trusted to implement FbD. We focus on FbD to emphasize that, even if the provider is benevolent, fairness cannot address all the issues caused by introducing a system into an environment.
Since the theoretical part establishes that fairness is insufficient even with a benevolent service provider, we focus our evaluation on harms that service providers will not or cannot mitigate. We note, however, that our analysis makes no assumptions about the fairness of the service providers in the case studies.
Suppose the systems did satisfy some notion of fairness, e.g., Waze distributed routes equally across gender/race/age, or the credit-scoring model satisfied equality of odds across genders/races. Even then, the harms that our POTs address would not be accounted for by standard notions of fairness, and the objective and implementation of our POTs would remain the same.
*POTs are defined as changing inputs to the training data*
POTs are _not_ conceptualized to apply only in the training phase, as detailed in the definition in Section 4, paragraph 2. Nor are they specific to machine-learning systems with explicit training and testing phases, as the Waze case study demonstrates. We will make this clearer in the final version. In any case, as the reviewer hints, a test-time POT could, for example, be based on adversarial examples. In fact, we studied such an example in the credit-scoring setting, but omitted it due to space constraints.
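To give a flavor of what we mean, here is an illustrative sketch (not the omitted case study; the model, weights, features, and budget are all hypothetical) of a test-time POT that applies an FGSM-style perturbation to an applicant's features to raise the score a linear credit-scoring model assigns:

```python
# Illustrative sketch only, not the case study omitted from the paper.
# A test-time POT perturbs a loan applicant's features with an FGSM-style
# step so that a linear credit-scoring model assigns a higher score.
# All names and values (w, b, x, budget) are hypothetical.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pot_perturb(x, w, budget):
    """Move x in the direction that increases f(x) = sigmoid(w.x + b),
    under an L-infinity budget on the change. For a linear model the
    input gradient is proportional to w, so sign(w) gives the ascent
    direction (the FGSM step)."""
    return x + budget * np.sign(w)

# Hypothetical model (assumed known or estimated by the POT designer).
w = np.array([0.8, -0.5, 0.3])
b = -0.2
x = np.array([0.1, 0.6, 0.4])  # hypothetical applicant features

x_adv = pot_perturb(x, w, budget=0.05)
print("score before:", sigmoid(w @ x + b))
print("score after: ", sigmoid(w @ x_adv + b))
```

In practice such a perturbation would be restricted to mutable features and to changes the applicant can actually realize; the sketch omits these constraints for brevity.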
*Mathematical derivations*
We are very grateful to the reviewer for their thorough look at the mathematical derivations, and we thank them for spotting the notational ambiguities and the typo in which we omitted the nabla ($\nabla$) symbol.
We largely agree that the main mathematical statement of Section 3 (Statement 3.1) is not essential, and that we could have simply claimed that it is always the case that $B \neq \hat B$. However, we believe that the approximation, which shows that the social-utility gap grows quadratically as the (informal) error of the model increases, adds significant nuance to the otherwise trivial statement. Moreover, we think this approximation is relevant because it holds for the generic form of the optimization problem considered in fairness (learning with fairness constraints or regularization). We propose to restructure this part: state the result informally in the paper and move the details and some instrumental definitions to the appendix for the sake of clarity and readability.
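Informally, and in illustrative notation only (the constant and the error measure are placeholders, not taken from the paper), the statement has the shape

$$B - \hat B \;\approx\; c\,\epsilon^2, \qquad c > 0,$$

where $\epsilon$ is an informal measure of the model's error: the social-utility gap does not merely exist, it grows quadratically with $\epsilon$.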
*Epsilon procedure*
The method used to derive the approximation is not standard in fairness or economics; it comes from the recent machine-learning literature [36]. It is useful for characterizing properties of solutions to optimization problems in their most generic form, which is exactly what we needed here.
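In rough, illustrative terms (our paraphrase of the technique, not the exact construction in [36]): one perturbs the objective $L(\theta)$ to $L(\theta) + \epsilon\, g(\theta)$ and studies how the minimizer $\theta^*_\epsilon$ and its value behave as $\epsilon \to 0$, which yields properties of the solution without committing to a specific functional form of $L$.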
*Externalities exist regardless of which particular parameter setting the service provider chooses*
The parametric form is inspired by the definition of externalities in economics, but there is more to it. That externalities exist regardless of system parameters often holds in practice. However, as mentioned above, in Section 3.1.1 we consider the best-case FbD service provider, and hence we make the same assumption as the FbD paradigm: there exist configurations of the system that fulfill its intended goal while minimizing "harms". To capture the fact that different system configurations exist, we parameterize our definitions on $\theta$, which represents a system configuration.
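In a minimal, hedged form (our notation here, not the paper's definitions): for a configuration $\theta$ and a sub-population $s$ of the environment, one can write the externality as $e_s(\theta) = u_s^{\text{pre}} - u_s(\theta)$, the drop in $s$'s utility once the system is deployed with configuration $\theta$; the FbD assumption is then that some $\theta$ both meets the provider's goal and keeps $e_s(\theta)$ small for all $s$.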
*The limitations of POTs: asocial goals, and externalities of POTs themselves*
We acknowledge that this is a limitation of POTs. We address some of these points in the discussion and will expand on them. We also note that much of the relevant discussion can be found in prior work [103,104]; we will make the link to these works explicit.
**Reviewer D:**
Thank you for your supportive comments and notes about additional related work. We will add the additional critiques of fairness to the discussion.