Responsible AI practices

The development of AI has created new opportunities to improve the lives of people around the world, from business to healthcare to education. It has also raised new questions about the best way to build fairness, interpretability, privacy, and safety into these systems.


Fairness

AI systems are enabling new experiences and abilities for people around the globe. Beyond recommending apps, short videos, and TV shows, AI systems can be used for more critical tasks, such as predicting the presence and severity of a medical condition, matching people to jobs and partners, or identifying if a person is crossing the street. Such computerized assistive or decision-making systems have the potential to be more fair and more inclusive at a broader scale than historical decision-making processes based on ad hoc rules or human judgments. The risk is that any unfair bias in such systems can also have a wide-scale impact. Thus, as the impact of AI increases across sectors and societies, it is critical to work towards systems that are fair and inclusive for all.

This is a hard task. First, ML models learn from existing data collected from the real world, and so a model may learn or even amplify problematic pre-existing biases in the data based on race, gender, religion or other characteristics.

Second, even with the most rigorous and cross-functional training and testing, it is a challenge to build systems that will be fair across all situations or cultures. For example, a speech recognition system that was trained on US adults may be fair and inclusive in that specific context. When used by teenagers, however, the system may fail to recognize evolving slang words or phrases. If the system is deployed in the United Kingdom, it may have a harder time with certain regional British accents than others. And even when the system is applied to US adults, we might discover unexpected segments of the population whose speech it handles poorly, for example people speaking with a stutter. Use of the system after launch can reveal unintentional, unfair outcomes that were difficult to predict.

Third, there is no standard definition of fairness, whether decisions are made by humans or machines. Identifying appropriate fairness criteria for a system requires accounting for user experience, cultural, social, historical, political, legal, and ethical considerations, several of which may have tradeoffs. Even for situations that seem simple, people may disagree about what is fair, and it may be unclear what point of view should dictate AI policy, especially in a global setting. That said, it is possible to aim for continuous improvement toward "fairer" systems.

Addressing fairness, equity, and inclusion in AI is an active area of research. It requires a holistic approach, from fostering an inclusive workforce that embodies critical and diverse knowledge, to seeking input from communities early in the research and development process to develop an understanding of societal contexts, to assessing training datasets for potential sources of unfair bias, to training models to remove or correct problematic biases, to evaluating models for disparities in performance, to continued adversarial testing of final AI systems for unfair outcomes. In fact, ML models can even be used to identify some of the conscious and unconscious human biases and barriers to inclusion that have developed and perpetuated throughout history, bringing about positive change.

Far from a solved problem, fairness in AI presents both an opportunity and a challenge. Google is committed to making progress in all of these areas, and to creating tools, datasets, and other resources for the larger community and adapting these as new challenges arise with the development of generative AI systems. Our current thinking at Google is outlined below.


Recommended practices

It is important to identify whether or not machine learning can help provide an adequate solution to the specific problem at hand. If it can, just as there is no single "correct" model for all ML or AI tasks, there is no single technique that ensures fairness in every situation or outcome. In practice, AI researchers and developers should consider using a variety of approaches to iterate and improve, especially when working in the emerging area of generative AI.

Design your model using concrete goals for fairness and inclusion

  • Engage with social scientists, humanists, and other relevant experts for your product to understand and account for various perspectives.
  • Consider how the technology and its development over time will impact different use cases: Whose views are represented? What types of data are represented? What’s being left out? What outcomes does this technology enable and how do these compare for different users and communities? What biases, negative experiences, or discriminatory outcomes might occur?
  • Set goals for your system to work fairly across anticipated use cases: for example, in X different languages, or to Y different age groups. Monitor these goals over time and expand as appropriate.
  • Design your algorithms and objective function to reflect fairness goals.
  • Update your training and testing data frequently based on who uses your technology and how they use it.

Use representative datasets to train and test your model

  • Assess fairness in your datasets, which includes identifying representation and corresponding limitations, as well as identifying prejudicial or discriminatory correlations between features, labels, and groups. Visualization, clustering, and data annotations can help with this assessment.
  • Public training datasets will often need to be augmented to better reflect real-world frequencies of people, events, and attributes that your system will be making predictions about.
  • Understand the various perspectives, experiences, and goals of the people annotating the data. What does success look like for different workers, and what are the trade-offs between time spent on task and enjoyment of the task?
  • If you are working with annotation teams, partner closely with them to design clear tasks, incentives, and feedback mechanisms that ensure sustainable, diverse, and accurate annotations. Account for human variability, including accessibility, muscle memory, and biases in annotation, e.g., by using a standard set of questions with known answers.
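
As a concrete illustration of assessing representation, the sketch below counts each group's share of a dataset and compares positive-label rates across groups. It is minimal and illustrative: the `group` and `label` field names and the 5% under-representation threshold are assumptions, not a standard.

```python
from collections import Counter, defaultdict

def representation_report(examples, group_key, label_key, min_share=0.05):
    """Flag under-represented groups and gaps in positive-label rates.

    `examples` is a list of dicts; `group_key` and `label_key` name the
    (hypothetical) fields holding the group and a binary label.
    """
    counts = Counter(ex[group_key] for ex in examples)
    total = len(examples)
    positives = defaultdict(int)
    for ex in examples:
        positives[ex[group_key]] += ex[label_key]

    report = {}
    for group, n in counts.items():
        report[group] = {
            "share": n / total,
            "positive_rate": positives[group] / n,
            "under_represented": n / total < min_share,
        }
    return report

data = [
    {"group": "a", "label": 1}, {"group": "a", "label": 0},
    {"group": "a", "label": 1}, {"group": "a", "label": 1},
    {"group": "b", "label": 0},
]
print(representation_report(data, "group", "label"))
```

A real assessment would go further (visualization, clustering, annotation review), but even a report like this can surface skews worth investigating before training.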

Check the system for unfair biases

  • For example, organize a pool of trusted, diverse testers who can adversarially test the system, and incorporate a variety of adversarial inputs into unit tests. This can help to identify who may experience unexpected adverse impacts. Even a low error rate can allow for the occasional very bad mistake. Targeted adversarial testing can help find problems that are masked by aggregate metrics.
  • While designing metrics to train and evaluate your system, also include metrics to examine performance across different subgroups. For example, false positive rate and false negative rate per subgroup can help to understand which groups experience disproportionately worse or better performance.
  • In addition to sliced statistical metrics, create a test set that stress-tests the system on difficult cases. This will enable you to quickly evaluate, each time you update your system, how well it does on examples that can be particularly hurtful or problematic. As with all test sets, you should continuously update this set as your system evolves, features are added or removed, and you gather more feedback from users.
  • Consider the effects of biases created by decisions made by the system previously, and the feedback loops this may create.
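
The per-subgroup false positive and false negative rates described above can be computed in a few lines. This is a minimal sketch assuming binary labels and predictions; real evaluations should also report confidence intervals, since small slices produce noisy rates.

```python
from collections import defaultdict

def sliced_error_rates(y_true, y_pred, groups):
    """Compute false positive and false negative rates per subgroup."""
    tallies = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        slot = tallies[g]
        if t == 1:
            slot["pos"] += 1
            slot["fn"] += int(p == 0)  # missed positive
        else:
            slot["neg"] += 1
            slot["fp"] += int(p == 1)  # false alarm
    return {
        g: {
            "fpr": s["fp"] / s["neg"] if s["neg"] else 0.0,
            "fnr": s["fn"] / s["pos"] if s["pos"] else 0.0,
        }
        for g, s in tallies.items()
    }

y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0]
groups = ["x", "x", "y", "y", "y", "y"]
print(sliced_error_rates(y_true, y_pred, groups))
```

Large differences between subgroup rates are exactly the kind of disparity that aggregate accuracy can mask.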

Analyze performance

  • Take the different metrics you’ve defined into account. For example, a system’s false positive rate may vary across different subgroups in your data, and improvements in one metric may adversely affect another.
  • Evaluate user experience in real-world scenarios across a broad spectrum of users, use cases, and contexts of use (e.g., TensorFlow Model Analysis). Test and iterate in dogfood first, followed by continued testing after launch.
  • Even if everything in the overall system design is carefully crafted to address fairness issues, ML-based models rarely operate with 100% perfection when applied to real, live data. When an issue occurs in a live product, consider whether it aligns with any existing societal disadvantages, and how it will be impacted by both short- and long-term solutions.

Examples of our work


Interpretability

Automated predictions and decision making can improve lives in a number of ways, from recommending music you might like to monitoring a patient’s vital signs consistently. Because these decisions can carry real consequences, interpretability, or the extent to which we can question, understand, and trust an AI system, is crucial. Interpretability also reflects our domain knowledge and societal values, provides scientists and engineers with better means of designing, developing, and debugging models, and helps to ensure that AI systems are working as intended.

These issues apply to humans as well as AI systems—after all, it's not always easy for a person to provide a satisfactory explanation of their own decisions. For example, it can be difficult for an oncologist to quantify all the reasons why they think a patient’s cancer may have recurred—they may just say they have an intuition based on patterns they have seen in the past, leading them to order follow-up tests for more definitive results. In contrast, an AI system can list the variety of information that went into its prediction—biomarker levels and corresponding scans from 100 different patients over the past 10 years—but may have a hard time communicating how it combined all that data to estimate an 80% chance of cancer and recommend a PET scan. Understanding complex AI models, such as the deep neural networks at the foundation of generative AI systems, can be challenging even for machine learning experts.

Understanding and testing AI systems also offers new challenges compared to traditional software – especially as generative AI models and systems continue to emerge. Traditional software is essentially a series of if-then rules, and interpreting and debugging performance largely consists of chasing a problem down a garden of forking paths. While that can be extremely challenging, a human can generally track the path taken through the code, and understand a given result.

With AI systems, the "code path" may involve millions of parameters (billions, in generative AI systems) and mathematical operations, so it is much harder than in previous software to pinpoint the specific bug that leads to a faulty decision. However, with responsible AI system design, those millions or billions of values can be traced back to the training data or to model attention on specific data or features, enabling discovery of the bug. That contrasts with one of the key problems in traditional decision-making software: the existence of "magic numbers", decision rules or thresholds set without explanation by a now-forgotten programmer, often based on their personal intuition or a tiny set of trial examples.

Overall, an AI system is best understood through its underlying training data and training process, as well as the resulting AI model. While this poses new challenges, the collective effort of the tech community to formulate proactive responsible guidelines, best practices, and tools is steadily improving our ability to understand, control, and debug AI systems. With that in mind, we’d like to share some of our current work and thinking in this area.


Recommended practices

Interpretability and accountability are areas of ongoing research and development at Google and in the broader AI community. Here we share some of our recommended practices to date.

Plan out your options to pursue interpretability

Pursuing interpretability can happen before, during and after designing and training your model.

  • What degree of interpretability do you really need? Work closely with relevant domain experts for your model (e.g., healthcare, retail, etc.) to identify which interpretability features are needed, and why. While rare, there are some cases where, given sufficient empirical evidence of reliable behavior, fine-grained interpretability is not needed.
  • Can you analyze your training/testing data? For example, if you are working with private data, you may not have access to investigate your input data.
  • Can you change your training/testing data, for example, gather more training data for certain subsets (e.g., parts/slices of the feature space), or gather test data for categories of interest?
  • Can you design a new model or are you constrained to an already-trained model?
  • Are you providing too much transparency, potentially opening up vectors for abuse?
  • What are your post-train interpretability options? Will you have access to the internals of the model (e.g., black box vs. white box)?

Treat interpretability as a core part of the user experience

  • Iterate with users in the development cycle to test and refine your assumptions about user needs and goals.
  • Design the UX so that users build useful mental models of the AI system. If not given clear and compelling information, users may make up their own theories about how an AI system works, which can negatively affect how they try to use the system.
  • Where possible, make it easy for users to do their own sensitivity analysis: empower them to test how different inputs affect the model output.
  • Additional relevant UX resources: Designing for human needs, user control, teaching an AI, habituation, fairness, representation

Design the model to be interpretable

  • Use the smallest set of inputs necessary for your performance goals to make it clearer what factors are affecting the model.
  • Use the simplest model that meets your performance goals.
  • Learn causal relationships, not correlations, when possible (e.g., use height, not age, to predict whether a kid is safe to ride a roller coaster).
  • Craft the training objective to match your true goal (e.g., train for the acceptable probability of false alarms, not accuracy).
  • Constrain your model to produce input-output relationships that reflect domain expert knowledge (e.g., a coffee shop should be more likely to be recommended if it’s closer to the user, if everything else about it is the same).
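
The last constraint can also be checked mechanically: probe one feature across its range while holding the others fixed, and verify the score never moves the wrong way. A sketch, with a toy scoring function standing in for a real model (the features and coefficients are purely illustrative):

```python
def violates_monotonicity(score_fn, example, feature_idx, span, steps=20):
    """Check that the score never increases as one feature grows.

    E.g., with everything else equal, a coffee shop's score should not
    rise as its distance to the user grows. `score_fn` is any callable
    over a feature list; `span` is (lo, hi) for the probed feature.
    """
    lo, hi = span
    prev = None
    for k in range(steps + 1):
        probe = list(example)
        probe[feature_idx] = lo + (hi - lo) * k / steps
        score = score_fn(probe)
        if prev is not None and score > prev + 1e-9:
            return True  # score went up as the feature grew
        prev = score
    return False

# Toy score: rating helps, distance hurts -- monotone as intended.
score = lambda x: 2.0 * x[0] - 0.8 * x[1]   # x = [rating, distance_km]
print(violates_monotonicity(score, [4.5, 1.0], feature_idx=1, span=(0.0, 10.0)))
```

Checks like this make domain-knowledge constraints testable rather than aspirational.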

Choose metrics to reflect the end-goal and the end-task

The metrics you consider must address the particular benefits and risks of your specific context. For example, a fire alarm system would need to have high recall, even if that means the occasional false alarm.
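
The fire-alarm trade-off can be made concrete by choosing the decision threshold from a desired recall level rather than from overall accuracy. A minimal sketch (the scores and labels are illustrative):

```python
def threshold_for_recall(scores, labels, target_recall):
    """Pick the highest decision threshold whose recall meets the target.

    Lower thresholds raise recall (fewer missed fires) at the cost of
    more false alarms.
    """
    candidates = sorted(set(scores), reverse=True)
    positives = sum(labels)
    for thr in candidates:
        caught = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= thr)
        if caught / positives >= target_recall:
            return thr
    return min(scores)  # defensive fallback: flag everything

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(threshold_for_recall(scores, labels, target_recall=1.0))
```

With a recall target of 1.0 the threshold drops until every true positive is caught, accepting whatever false alarms that implies.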

Understand the trained model

Many techniques are being developed to gain insights into the model (e.g., sensitivity to inputs).

  • Analyze the model’s sensitivity to different inputs, for different subsets of examples.
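
One simple way to probe sensitivity is a finite-difference sweep: nudge each input and observe how the output moves. The sketch below assumes a model exposed as a plain callable over a feature list; for real deep models you would use gradients or dedicated tooling instead.

```python
def sensitivity(model, example, delta=1e-3):
    """Estimate how much each input feature moves the model output.

    `model` is any callable mapping a feature list to a scalar score.
    """
    base = model(example)
    scores = []
    for i in range(len(example)):
        perturbed = list(example)
        perturbed[i] += delta
        scores.append((model(perturbed) - base) / delta)
    return scores

# A toy linear "model": the third feature dominates the prediction.
model = lambda x: 0.5 * x[0] - 0.1 * x[1] + 4.0 * x[2]
print(sensitivity(model, [1.0, 2.0, 3.0]))
```

Repeating the sweep over different subsets of examples shows whether the model leans on different features for different slices of the data.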

Communicate explanations to model users

  • Provide explanations that are understandable and appropriate for the user (e.g., technical details may be appropriate for industry practitioners and academia, while general users may find UI prompts, user-friendly summary descriptions or visualizations more useful). Explanations should be informed by a careful consideration of philosophical, psychological, computer science (including HCI), legal and ethical considerations about what counts as a good explanation in different contexts.
  • Identify if and where explanations may not be appropriate (e.g., where explanations could result in more confusion for general users, nefarious actors could take advantage of the explanation for system or user abuse, or explanations may reveal proprietary information).
  • Consider alternatives if explanations are requested by a certain user base but cannot or should not be provided, or if it's not possible to provide a clear, sound explanation. You could instead provide accountability through other mechanisms such as auditing or allow users to contest decisions or to provide feedback to influence future decisions or experiences.
  • Prioritize explanations that suggest clear actions a user can take to correct inaccurate predictions going forward.
  • Don’t imply that explanations mean causation unless they do.
  • Recognize human psychology and limitations (e.g., confirmation bias, cognitive fatigue).
  • Explanations can come in many forms (e.g., text, graphs, statistics): when using visualization to provide insights, use best practices from HCI and visualization.
  • Any aggregated summary may lose information and hide details (e.g., partial dependency plots).
  • The ability to understand the parts of the ML system (especially inputs) and how all the parts work together (“completeness”) helps users to build clearer mental models of the system. These mental models match actual system performance more closely, providing for a more trustworthy experience and more accurate expectations for future learning.
  • Be mindful of the limitations of your explanations (e.g., local explanations may not generalize broadly, and may provide conflicting explanations of two visually-similar examples).

Test, Test, Test

Learn from software engineering best practices for testing and from quality engineering to make sure the AI system is working as intended and can be trusted.

  • Conduct rigorous unit tests to test each component of the system in isolation.
  • Proactively detect input drift by testing the statistics of the inputs to the AI system to make sure they are not changing in unexpected ways.
  • Use a gold standard dataset to test the system and ensure that it continues to behave as expected. Update this test set regularly in line with changing users and use cases, and to reduce the likelihood of training on the test set.
  • Conduct iterative user testing to incorporate a diverse set of users’ needs in the development cycles.
  • Apply the quality engineering principle of poka-yoke: build quality checks into a system so that unintended failures either cannot happen or trigger an immediate response (e.g., if an important feature is unexpectedly missing, the AI system won’t output a prediction).
  • Conduct integration tests: understand how the AI system interacts with other systems and what, if any, feedback loops are created (e.g., recommending a news story because it’s popular can make that news story more popular, causing it to be recommended more).
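
Two of the practices above, input drift detection and a poka-yoke guard, can be sketched together. The z-test on the batch mean is one simple choice, not the only one; the training statistics here are illustrative.

```python
import math

def drift_alert(train_stats, live_values, z_threshold=3.0):
    """Flag input drift: is the live mean improbably far from training?

    `train_stats` is {"mean": m, "std": s} computed at training time.
    Uses a simple z-test on the mean of the live batch.
    """
    if not live_values:  # poka-yoke: refuse to score an empty batch
        raise ValueError("no live inputs; refusing to make predictions")
    live_mean = sum(live_values) / len(live_values)
    stderr = train_stats["std"] / math.sqrt(len(live_values))
    z = abs(live_mean - train_stats["mean"]) / stderr
    return z > z_threshold

train_stats = {"mean": 10.0, "std": 2.0}
print(drift_alert(train_stats, [9.8, 10.1, 10.3, 9.9]))   # within range
print(drift_alert(train_stats, [15.2, 16.1, 14.9, 15.5])) # drifted
```

In production you would track many features and alert on sustained rather than single-batch deviations, but the principle is the same: the system checks its own inputs before trusting its outputs.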

Examples of our work


Privacy

ML models learn from training data and make predictions on input data. Sometimes the training data, input data, or both can be quite sensitive. Although there may be enormous benefits to building a model that operates on sensitive data (e.g., a cancer detector trained on a responsibly sourced dataset of biopsy images and deployed on individual patient scans), it is essential to consider the potential privacy implications in using sensitive data. This includes not only respecting the legal and regulatory requirements, but also considering social norms and typical individual expectations. For example, it’s crucial to put safeguards in place to ensure the privacy of individuals considering that ML models may remember or reveal aspects of the data they have been exposed to. It’s essential to offer users transparency and control of their data.

Fortunately, the possibility that ML models reveal underlying data can be minimized by appropriately applying various techniques in a precise, principled fashion. Google is constantly developing such techniques to protect privacy in AI systems, including emerging practices for generative AI systems. This is an active area of research in the AI community with ongoing room for growth. Below we share the lessons we have learned so far.

Recommended practices

Just as there is no single "correct" model for all ML tasks, there is no single correct approach to ML privacy protection across all scenarios, and new ones may arise. In practice, researchers and developers must iterate to find an approach that appropriately balances privacy and utility for the task at hand; for this process to succeed, a clear definition of privacy is needed, which can be both intuitive and formally precise.

Collect and handle data responsibly

  • Identify whether your ML model can be trained without the use of sensitive data, e.g., by utilizing non-sensitive data collection or an existing public data source.
  • If it is essential to process sensitive training data, strive to minimize the use of such data. Handle any sensitive data with care: e.g., comply with required laws and standards, provide users with clear notice and give them any necessary controls over data use, follow best practices such as encryption in transit and at rest, and adhere to Google privacy principles.
  • Anonymize and aggregate incoming data using best practice data-scrubbing pipelines: e.g., consider removing personally identifiable information (PII) and outlier or metadata values that might allow de-anonymization (including implicit metadata such as arrival order, removable by random shuffling, as in Prochlo; or the Cloud Data Loss Prevention API to automatically discover and redact sensitive and identifying data).
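
A toy version of such a scrubbing step might drop known identifier fields and shuffle record order, removing implicit arrival-order metadata in the spirit of Prochlo's shuffling stage. The field names here are hypothetical, and real pipelines also need to handle quasi-identifiers and outliers:

```python
import random

PII_FIELDS = {"name", "email", "phone"}  # assumed identifier fields

def scrub(records, seed=None):
    """Drop direct identifiers and shuffle record order.

    Shuffling removes implicit metadata such as arrival order; dropping
    fields removes direct PII. This is a sketch, not a complete
    anonymization scheme.
    """
    cleaned = [
        {k: v for k, v in rec.items() if k not in PII_FIELDS}
        for rec in records
    ]
    random.Random(seed).shuffle(cleaned)
    return cleaned

records = [
    {"name": "A", "email": "a@x.com", "age_bucket": "30-40", "clicks": 3},
    {"name": "B", "email": "b@x.com", "age_bucket": "20-30", "clicks": 7},
]
print(scrub(records, seed=0))
```
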

Leverage on-device processing where appropriate

  • If your goal is to learn statistics of individual interactions (e.g., how often certain UI elements are used), consider collecting only statistics that have been computed locally, on-device, rather than raw interaction data, which can include sensitive information.
  • Consider whether techniques like federated learning, where a fleet of devices coordinates to train a shared global model from locally-stored training data, can improve privacy in your system.
  • When feasible, apply aggregation, randomization, and scrubbing operations on-device (e.g., Secure aggregation, RAPPOR, and Prochlo's encode step). Note that these operations may only provide pragmatic, best-effort privacy unless the techniques employed are accompanied by proofs.
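
As one concrete example of on-device randomization, basic one-bit randomized response (the idea underlying RAPPOR) flips each device's report with some probability, then debiases the aggregate. A minimal sketch; production systems use more elaborate encodings:

```python
import random

def randomized_response(truth, p_flip=0.25, rng=random):
    """Report a single bit with plausible deniability.

    Each device flips its true bit with probability `p_flip` before
    reporting, so no individual report is trustworthy on its own.
    """
    return truth if rng.random() > p_flip else 1 - truth

def debiased_rate(reports, p_flip=0.25):
    """Recover an unbiased estimate of the true rate from noisy reports.

    E[report] = rate * (1 - p_flip) + (1 - rate) * p_flip,
    which we invert to estimate the true rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - p_flip) / (1 - 2 * p_flip)

rng = random.Random(42)
true_bits = [1] * 700 + [0] * 300          # true rate: 0.7
reports = [randomized_response(b, rng=rng) for b in true_bits]
print(round(debiased_rate(reports), 2))     # close to 0.7
```

The server learns an accurate population statistic while each individual report stays noisy, which is the core trade-off behind local differential privacy techniques.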

Appropriately safeguard the privacy of ML models

Because ML models can expose details about their training data through both their internal parameters and their externally visible behavior, it is crucial to consider the privacy impact of how the models are constructed and may be accessed.

  • Estimate whether your model is unintentionally memorizing or exposing sensitive data using tests based on “exposure” measurements or membership inference assessment. These metrics can additionally be used for regression tests during model maintenance.
  • Experiment with parameters for data minimization (e.g., aggregation, outlier thresholds, and randomization factors) to understand tradeoffs and identify optimal settings for your model.
  • Train ML models using techniques that establish mathematical guarantees for privacy. Note that these analytic guarantees are not guarantees about the complete operational system.
  • Follow best-practice processes established for cryptographic and security-critical software, e.g., the use of principled and provable approaches, peer-reviewed publication of new ideas, open-sourcing of critical software components, and the enlistment of experts for review at all stages of design and development.

Examples of our work

Safety and security

Safety and security entail ensuring that AI systems behave as intended, regardless of how attackers try to interfere. It is essential to consider and address the safety of an AI system before it is widely relied upon in safety-critical applications. There are many challenges unique to the safety and security of AI systems. For example, it is hard to predict all scenarios ahead of time when ML is applied to problems that are difficult for humans to solve, especially so in the era of generative AI. It is also hard to build systems that provide both the necessary proactive restrictions for safety and the necessary flexibility to generate creative solutions or adapt to unusual inputs. As AI technology evolves, so will security issues: attackers will surely find new means of attack, and new solutions will need to be developed in tandem. Below are our current recommendations based on what we’ve learned so far.

Recommended practices

Safety research in ML spans a wide range of threats, including training data poisoning, recovery of sensitive training data, model theft, and adversarial examples. Google invests in research related to all of these areas, and some of this work is related to practices in AI privacy. One focus of safety research at Google has been adversarial learning—the use of one neural network to generate adversarial examples that can fool a system, coupled with a second network to try to detect the fraud.

Currently, the best defenses against adversarial examples are not yet reliable enough for use in a production environment. It is an ongoing, extremely active research area. Because there is not yet an effective defense, developers should think about whether their system is likely to come under attack, consider the likely consequences of a successful attack and in most cases should simply not build systems where such attacks are likely to have significant negative impact.

Another practice is adversarial testing: systematically evaluating an ML model or application with the intent of learning how it behaves when provided with malicious or inadvertently harmful input, such as asking a text generation model to generate a hateful rant about a particular religion. This practice helps teams systematically improve models and products by exposing current failure patterns and guiding mitigations (e.g., model fine-tuning, or filters and other safeguards on inputs or outputs). Most recently, we’ve evolved our ongoing "red teaming" efforts – an adversarial security testing approach that identifies vulnerabilities to attack – to "ethically hack" our AI systems and support our Secure AI Framework.

Identify potential threats to the system

  • Consider whether anyone would have an incentive to make the system misbehave. For example, if a developer builds an app that helps a user organize their own photos, it would be easy for users to modify photos to be incorrectly organized, but users may have limited incentive to do so.
  • Identify what unintended consequences would result from the system making a mistake, and assess the likelihood and severity of these consequences.
  • Build a rigorous threat model to understand all possible attack vectors. For example, a system that would allow an attacker to change the input to the ML model may be much more vulnerable than a system that processes metadata collected by the server, like timestamps of actions the user took, since it is much harder for a user to intentionally modify input features collected without their direct participation.

Develop an approach to combat threats

Some applications, e.g., spam filtering, can be successful with current defense techniques despite the difficulty of adversarial ML.

  • Test the performance of your systems in the adversarial setting. In some cases this can be done using tools such as CleverHans.
  • Create an internal red team to carry out the testing, or host a contest or bounty program encouraging third parties to adversarially test your system.
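
For a sense of what such testing involves, the sketch below crafts an FGSM-style worst-case perturbation for a simple linear scorer, where the input gradient is just the weight vector. Real systems would use a library such as CleverHans against actual model gradients; the weights and inputs here are illustrative.

```python
def fgsm_linear(weights, x, epsilon):
    """Craft an FGSM-style adversarial input for a linear scorer.

    For a linear model the input gradient is just `weights`, so moving
    each feature by epsilon in the direction that lowers the score is
    the worst-case bounded perturbation.
    """
    return [xi - epsilon * (1 if w > 0 else -1 if w < 0 else 0)
            for xi, w in zip(x, weights)]

def predict(weights, bias, x):
    """Classify positive when the linear score exceeds zero."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0

weights, bias = [1.5, -2.0, 0.5], 0.1
x = [0.4, 0.1, 0.2]
print(predict(weights, bias, x))                     # originally positive
x_adv = fgsm_linear(weights, x, epsilon=0.3)
print(predict(weights, bias, x_adv))                 # flipped by the attack
```

If a small, bounded perturbation like this reliably flips predictions, the system should not be deployed where such inputs could plausibly be adversarially controlled.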

Keep learning to stay ahead of the curve

  • Stay up to date on the latest research advances. Research into adversarial machine learning continues to offer improved performance for defenses and some defense techniques are beginning to offer provable guarantees.
  • Beyond interfering with input, it is possible there may be other vulnerabilities in the ML supply chain. While to our knowledge such an attack has not yet occurred, it is important to consider the possibility and be prepared.

Examples of our work