Federal law enforcement must now conduct transparent, standardized AI field testing
A White House advisory panel voted to approve a 24-page report that sets forth specific actions that all federal law enforcement agencies must now undertake when performing real-world testing of AI tools in the field, including AI-enhanced facial recognition technologies.
The report and its recommendations now go to the White House and the National AI Initiative Office. President Joe Biden’s October 2023 executive order on the safe, secure, and trustworthy development and use of AI “explicitly requires real-world testing” of AI tools. The problem for federal law enforcement agencies, though, had been that there was no standardized approach to testing AI in “the wild.” The subcommittee’s approved recommendations for AI field testing fix that problem.
The adopted report and findings also say that “the results of the real-world testing should be made public so that they may contribute to an informed conversation and debate about the responsible use of AI.”
Additionally, the subcommittee called for increased funding to support research at the state and local law enforcement level.
The Law Enforcement Subcommittee of the National Artificial Intelligence Advisory Committee (NAIAC), which voted to approve the findings, explained that “very few resources [had been] available to help guide the AI industry, law enforcement departments, and independent researchers through the process of testing AI tools when they are provisionally used in the field. This report and set of recommendations provide the infrastructure for AI field testing in the context of policing.”
The report puts forth a checklist for law enforcement agencies when carrying out tests of the performance of an AI tool before it is fully adopted and integrated into normal use. It synthesizes a range of empirical testing methods and adapts them to the context of policing using the National Institute of Standards and Technology (NIST) AI Risk Management Framework.
Specifically, field test designers are guided “through best practices for the ‘map’ and ‘measure’ stages of AI risk management.” The “management” phase of trustworthy implementation of AI is not addressed in this report, “but the evidence derived from field testing will allow decision-makers to make informed decisions as they manage and tradeoff multiple risks and objectives.”
The approved report states “field testing is essential to the government and the public’s understanding of AI applications in law enforcement. However, a good field test will need to be designed carefully to fit the context, needs, and practical limitations of a particular AI application. Researchers, police departments, and technology vendors will have to work together to create the conditions for high-quality field testing.”
Jane Bambauer, chair of the NAIAC subcommittee which approved the measure, said the report’s findings and recommendations represent only “one iteration and by no means [are] the sole means of understanding how an AI performs its function in the field.”
Bambauer pointed out that it’s “still pretty early in this idea of testing AI in the field” and that procedures that were approved by the subcommittee certainly can and likely will “be improved on.”
“The responsible use of AI in law enforcement requires AI developers to train, test, and audit their AI tools to ensure that the results of a predictive tool are sufficiently accurate, non-discriminatory, rights-respecting, and cost-effective,” the adopted report says, noting that “the true value and risks of an AI tool will depend on how it operates in the real world.”
The report points out that “when law enforcement agencies adopt a new technology, they often have to rely on testing performed under relatively sterile conditions,” and that they “may be justifiably concerned that their particular use of the tool in its operational context will lead to different performance characteristics than either published tests or as reported by other agencies.”
The report noted that “the testing performed by producers of an AI tool sometimes have not been independently verified, and this simultaneously can create too much optimism for a poor-performing tool or too much skepticism of a useful tool,” and “as a result, law enforcement (as well as the public) often don’t have good information about whether the tool is as accurate, fair, high-performing, and cost-saving as expected.”
The panel’s recommended testing methods “are listed in the order that is typically associated with validity, from most rigorous (blind randomized controlled trials) to least (matched case studies).”
All of the tests, “when designed properly, can produce useful information that improves the available evidence base, but the methods listed higher in the hierarchy are more likely to suggest causal relationships by removing the influence of external factors (“confounders”),” the report notes.
The report says “any time a field test is designed in advance, it creates an opportunity to discover information about a wide range of effects,” and that “each output metric typically adds only a minimal amount of extra cost or effort. For this reason,” the panel approved the recommendation to consider and collect data “on the widest range of outcomes that could plausibly be useful.”
The subcommittee’s three approved recommendations are:
- The Office of Management and Budget (OMB) should require federal law enforcement agencies to undergo a form of field testing consistent with the checklist provided. The field-testing requirement may be waived if the agency’s use policy restricts the tool to the same uses, and substantially similar conditions, under which it has previously been field tested by another agency.
- OMB should revise its guidance to clarify that field testing plans and results must be published in the relevant AI inventory or on another public government website. This should occur even if the AI application is not adopted following the field test.
- Consistent with White House policy for “Removing Barriers to the Responsible Use of AI,” Congress should create special-purpose grants, to be awarded by the Bureau of Justice Assistance, to support collaborations between police agencies, technology producers, and independent researchers for the specific purpose of conducting independent field testing of AI law enforcement tools. Review of proposals should be based in part on consistency with the Field Test Checklist.