DNA processing carried out by DHS S&T for FBI, intel agencies
DNA “bio-forensic” data is being used by the FBI, along with other law enforcement and intelligence agencies, to determine whether DNA can be analyzed in enough detail to identify, or at least zero in on, unidentified subjects of interest. The work compares a subject’s DNA, or DNA acquired from known relatives, against a database of DNA from known and unknown persons. It is carried out as part of Top Secret/Sensitive Compartmented Information (TS/SCI) development projects and law enforcement casework intended to advance lead generation and bio-crime attribution.
The DNA analysis is being done by the Department of Homeland Security’s (DHS) Science and Technology Directorate’s (S&T) National Biodefense Analysis and Countermeasures Center’s (NBACC) newly created Classified Data Network and Analysis system (CDNA).
CDNA is operated within the National Bioforensic Analysis Center (NBFAC) program accessible only through the Joint Worldwide Intelligence Communications System (JWICS), which is used exclusively by the government for transmission of information and intelligence classified Top Secret, TS/SCI, codeword access only, Special Access Programs, etc.
CDNA is limited, though, to processing DNA and amino-acid sequences transferred to it from external sources, like the FBI, for the analysis of bio-forensic data that includes the DNA or amino-acid sequences of humans.
However, the recent Privacy Impact Assessment (PIA) of the CDNA by the S&T CDNA System Project Manager, NBACC, and DHS’s Chief Privacy Officer noted that CDNA is not used to identify individuals and that the identity of an individual cannot be inferred from a DNA sequence or other information in the CDNA database. Even so, NBACC can use the CDNA system for classified human forensic casework if requested by the FBI, for instance if human DNA is found at the scene of a crime, particularly one involving terrorism or counterintelligence operations.
“Comparing an unknown DNA genetic sequence against a known sequence may not create a one-to-one match,” but, “if sufficient human DNA is sequenced at specific marker locations, a DNA profile for the unknown sample could be generated and then used to identify a relationship to known DNA profiles,” including profiles of terrorists, especially those who have been involved in developing or handling biological and other weapons of mass destruction; terrorists involved in any sort of bomb-making; or foreign intelligence operatives who’ve left DNA during classified intelligence or covert military operations, according to sources familiar with how the Top Secret program is being used.
Commercial DNA services establish genealogical relationships between two individuals by comparing a person’s autosomal DNA (chromosomes 1-22) and X chromosome(s) with those of other customers of the same service who have also opted in to learning about possible DNA relatives. Autosomal DNA is inherited in the same way by both sexes, equally from both parents, but the X chromosome is not. The X chromosome is a sex chromosome: women inherit one from each parent, while men receive an X chromosome only from their mother. In men, the X chromosome is paired with the Y chromosome, which is inherited only from the father.
23andMe says, “We say that two individuals share DNA when both individuals inherited the same DNA from the same ancestor. For example, you and your sister share DNA that you both inherited from the same parent. You and your first cousin share DNA inherited from your mutual grandparents. The 23andMe DNA Relatives feature uses patterns of DNA sharing to estimate relationships.” Continuing, the company says, “Our simulations have concluded that we can confidently detect related individuals if they have at least one continuous region of matching SNPs (Single Nucleotide Polymorphisms) that is longer than our minimum threshold of 7cM (centiMorgans) long and at least 700 SNPs.”
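The detection rule 23andMe describes can be sketched in a few lines. This is only an illustration, assuming each shared region is summarized by its genetic length in centimorgans and its count of matching SNPs. The `Segment` class and both function names are hypothetical and are not part of any 23andMe interface. Treating a segment of exactly 7 cM as passing is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds quoted by 23andMe in the passage above.
MIN_CM = 7.0
MIN_SNPS = 700


@dataclass(frozen=True)
class Segment:
    """One continuous region of matching SNPs (hypothetical structure)."""
    length_cm: float
    snp_count: int


def passes_threshold(segment: Segment) -> bool:
    # Both conditions must hold for the same segment.
    # Assumption: a segment of exactly 7 cM counts as passing.
    return segment.length_cm >= MIN_CM and segment.snp_count >= MIN_SNPS


def is_detectable_relative(segments: Iterable[Segment]) -> bool:
    # "At least one continuous region" is enough to flag a possible relative.
    return any(passes_threshold(s) for s in segments)
```

Under this sketch, a pair sharing one 12.3 cM segment with 1,500 matching SNPs would be flagged, while a pair whose only long segment has 300 SNPs would not.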
However, tests taken by individuals across commercial DNA testing businesses have produced significant disparities in results. Consequently, it’s believed that NBACC is part of an FBI initiative to find methodologies for refining SNP analysis to the point where disparate acquired DNA SNPs can yield nearly 100 percent connections between people, such as a terrorist related to someone in the government’s DNA databases, thus providing a nearly certain “lead” to the suspect’s identity, according to intelligence sources familiar with the Top Secret programs.
In the obscure paper, Evaluation of the Precision ID Identity Panel for the Ion Torrent PGM Sequencer, by two scientists working at the FBI Laboratory Division’s Counterterrorism and Forensic Science Research Unit at Quantico, Virginia, the authors stated: “In cases where only a partial or incomplete STR profile is obtained from a sample, information contained in single nucleotide polymorphisms can prove informative for human identification.”
Short tandem repeats (STRs) are among the most informative polymorphic markers in the human genome. STR analysis is a forensic tool that evaluates specific STR regions found on nuclear DNA. The variable (polymorphic) nature of the STR regions analyzed in forensic testing sharpens the discrimination between one DNA profile and another.
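To make the idea concrete: an STR marker is characterized by how many times a short motif repeats back to back at a given location, and that count varies between people. The sketch below counts the longest run of a motif in a sequence string. It is a simplified illustration, not forensic software. The function name and the example sequences are hypothetical, and real STR genotyping reports two alleles per locus from laboratory measurements rather than from a plain string scan.

```python
def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    step = len(motif)
    # Try every start position so that overlapping candidates are not missed.
    for start in range(len(sequence)):
        count = 0
        position = start
        while sequence.startswith(motif, position):
            count += 1
            position += step
        best = max(best, count)
    return best


# Two made-up samples that differ only in repeat count at one locus.
sample_a = "TT" + "GATA" * 3 + "CC"
sample_b = "TT" + "GATA" * 5 + "CC"
same_allele = (
    longest_tandem_repeat(sample_a, "GATA")
    == longest_tandem_repeat(sample_b, "GATA")
)
```

Here `same_allele` is false because the two samples carry different repeat counts, which is the kind of difference that lets STR profiles discriminate between individuals.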
The authors, Kelly A. Meiklejohn, PhD, Assistant Professor of Forensic Science at North Carolina State University’s Department of Population Health and Pathobiology; and Rae M. Robertson-Anderson, PhD, Associate Professor and Chair of the Physics and Biophysics Department at the University of San Diego, reported that their testing using Thermo Fisher Scientific’s Precision ID Identity Panel, a multiplex SNP panel for human identity, “demonstrate[d] that it is possible to obtain reliable and reproducible genotypes using the Precision ID Identity Panel, when using low quantities (≥0.2ng) of either pure native DNA or forensic type DNA samples,” and found “100 percent congruence among genotype calls …”
In another paper, Forensically Relevant SNaPshot Assays for Human DNA SNP Analysis: A Review, by scientists at the National Centre for Forensic Studies, Faculty of Education, Science, Technology and Mathematics (ESTeM), University of Canberra; Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Macleod, Australia; and the Forensic Genetics Unit, Institute of Forensic Sciences, at the University of Santiago de Compostela in Spain, they stated, “Short tandem repeats are the gold standard for human identification, but are not informative for forensic DNA phenotyping (FDP). Single-nucleotide polymorphisms as genetic markers can be applied to both identification and FDP. The concept of DNA intelligence emerged with the potential for SNPs to infer biogeographical ancestry (BGA) and externally visible characteristics (EVCs), which together enable the FDP process. For more than a decade, the SNaPshot technique has been utilized to analyze identity and FDP-associated SNPs in forensic DNA analysis. SNaPshot is a single-base extension (SBE) assay with capillary electrophoresis as its detection system. This multiplexing technique offers the advantage of easy integration into operational forensic laboratories without the requirement for any additional equipment. Further, the SNP panels from SNaPshot assays can be incorporated into customized panels for massively parallel sequencing (MPS). Many SNaPshot assays are available for identity, BGA and EVC profiling with examples including the well-known SNPforID 52-plex identity assay, the SNPforID 34-plex BGA assay, and the HIrisPlex EVC assay. This review lists the major forensically relevant SNaPshot assays for human DNA SNP analysis and can be used as a guide for selecting the appropriate assay for specific identity and FDP applications.”
In 2017, the FBI’s DNA Casework Unit (DCU), located at Quantico, Virginia, requested numerous female Homo sapiens protein and nucleotide genome sequences from BioProject as part of the forensic genetic sexing of a 4,000-year-old Egyptian mummy head “to assess the potential of nuclear DNA recovery from the most damaged and limited forensic specimens.”
Interestingly, DCU noted that, “Mitochondrial DNA (mtDNA) is a form of DNA that is transmitted from mother to child in a complete set; therefore, anyone in the maternal lineage will have the same mtDNA profile. This type of DNA testing can be useful on evidence items such as naturally shed hairs, hair fragments, bones, and teeth. MtDNA analysis is highly sensitive and may allow scientists to obtain information from items of evidence associated with cold cases, missing persons, samples from mass disasters, and small pieces of evidence containing little biological material …”
According to the FBI, DCU “provides forensic DNA examinations to the FBI and other duly constituted law enforcement agencies in support of criminal, missing persons, and intelligence cases through evidence testing using forensic serological, mitochondrial DNA, and nuclear DNA methodologies.”
The DNA Support Unit (DSU) also “ensures the DCU remains flexible and responsive to … evolving intelligence threats.”
For its part, S&T conducted the PIA only to analyze the specific potential privacy risks in the collection, analysis, and storage of human DNA sequences by CDNA. The PIA does not address what the FBI and other federal agencies do with the information.
The PIA explained that, “The NBFAC is designated by Homeland Security Presidential Directive-10 (HSPD-10) to be the lead federal facility in the technical forensic analysis of materials recovered following a biological attack in support of the appropriate lead federal agency.”
NBFAC is one of two biological laboratory programs under the NBACC, which “was established to fill critical, biodefense-related shortfalls in the nation’s scientific knowledge of biological agents that could be used to cause harm to the public.” The NBACC laboratory is a government-owned facility operated and managed as a Federally Funded Research and Development Center (FFRDC) by Battelle National Biodefense Institute, LLC (BNBI) for DHS S&T.
As the PIA notes, “CDNA conducts only DNA sequence and amino acid sequence analysis, which cannot be used to identify individuals. It is not currently used for ‘DNA profiling,’ which is the process of determining an individual’s DNA characteristics (which are as unique as fingerprints). CDNA is only used for analyzing and processing data from classified projects, and is not used for unclassified casework; therefore, analysis conducted through CDNA has a nexus to national security matters.”
Pursuant to a Memorandum of Agreement (MOA) between NBACC and FBI agreed to on September 17, 2018, “the FBI submits unknown DNA and amino-acid sequences (FBI Sequences) it obtains to NBACC for matching against the Known Sequence Database,” the PIA disclosed. “The FBI Sequences are DNA and amino-acid sequences collected by the FBI in connection with its authorized responsibilities, and are considered to be classified information. For example, CDNA may analyze FBI Sequences related to possible construction of a biological weapon of mass destruction. Human sequences submitted by the FBI could be human DNA found at the scene of constructing a chemical, biological, radiological, or nuclear weapon.”
The FBI Sequences are submitted to NBACC “through the FBI Liaison who is assigned to, and is onsite at, NBACC. These sequences are assigned anonymized alphanumeric identifiers by the FBI prior to submission to NBACC. This PIA only addresses the human DNA and amino-acid sequences that CDNA uses for the analysis of bio-forensic data. The only human DNA and amino-acid sequences in CDNA are those received from the FBI or downloaded from [the National Institutes of Health (NIH)] Known Sequence Database.”
Continuing, the PIA explained that, “Upon receipt of an FBI Sequence through the FBI Liaison, NBACC uses CDNA to characterize the sequence by first searching against the publicly available Known Sequence Database using the Basic Local Alignment Search Tool (BLAST) software managed by the National Center for Biotechnology Information (NCBI).
“The results of BLAST software sequence similarity searches are made available to human analysts who then develop a final report analyzing the search findings for each sequence in a case,” the PIA disclosed. “A compiled final report is sent to the FBI that includes, for each input sequence, the alphanumeric identifier and the report of findings regarding the sequence, including whether it matches a sequence in the Known Sequence Database. If the input sequence matches one or more sequences in the Known Sequence Database, the report also includes the National Institutes of Health [NIH] assigned identifier(s) for the matching sequence(s). The report, together with the corresponding FBI identifier, is provided to the FBI Liaison, who is responsible for submitting the report and data to the FBI. If an FBI Sequence does not match against the Known Sequence Database, the FBI is provided an Unknown Sample Final Report. The NIH does not maintain associated identifying information on human sequences in the Known Sequence Database.”
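The reporting step the PIA describes can be pictured as a small data-shaping routine. The sketch below is an illustration only and does not represent CDNA’s actual software. It assumes the similarity search has already been run and reviewed, and that its outcome is available as a mapping from each FBI-assigned alphanumeric identifier to the NIH identifiers of any matching known sequences. The class, function, and identifier strings are all hypothetical, as is the “Final Report” label used for matched sequences.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceFinding:
    """Findings for one input sequence (hypothetical structure)."""
    fbi_identifier: str  # anonymized alphanumeric ID assigned by the FBI
    matched_nih_ids: List[str] = field(default_factory=list)

    @property
    def report_type(self) -> str:
        # The PIA says unmatched sequences yield an "Unknown Sample Final Report".
        # The label for matched sequences is an assumption.
        if self.matched_nih_ids:
            return "Final Report"
        return "Unknown Sample Final Report"


def compile_case_report(search_results: Dict[str, List[str]]) -> List[SequenceFinding]:
    """Build one finding per input sequence, ordered by FBI identifier."""
    return [
        SequenceFinding(fbi_id, list(nih_ids))
        for fbi_id, nih_ids in sorted(search_results.items())
    ]


# Placeholder identifiers, not real FBI or NIH values.
report = compile_case_report({"SAMPLE-B": [], "SAMPLE-A": ["KNOWN-001"]})
```

Note that nothing in this structure carries a name or other personal identifier. Each finding pairs an anonymized FBI identifier with zero or more NIH sequence identifiers, which matches the PIA’s description of the report’s contents.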
Technically, a submitter of data to GenBank [NIH’s Known Sequence Database] is allowed to submit data derived from sequencing human DNA, but submitters are not supposed to submit “human sequences to GenBank” that “include[s] any data that could reveal the personal identity of the source. GenBank assumes that the submitter has received any necessary informed consent authorizations required prior to submitting sequences.”
The CDNA system is so sensitive with regard to the DNA biometric information it contains that even its IP address, which is required to log into the system, is classified Top Secret. According to the PIA, “CDNA system user password complexity, password expiration, and system logging requirements are based on the Security Technical Implementation Guides (STIGs) from the Defense Information Systems Agency,” which include biometric access control guidelines. “The CDNA system logs the following events – Successful/Failed login, privilege escalation, tasks performed under privilege escalation, cron (recurring) jobs, bash jobs, every attempted network access, and external media connected to any CDNA server. CDNA system logs are reviewed periodically by the CDNA system administrator. The CDNA system servers are housed in a SCIF server room at NBACC. JWICS workstations that can access CDNA are all located in a SCIF office area at NBACC, and both SCIFs are guarded by armed guards and monitored by 24-hour video surveillance.”
CDNA system users must possess TS/SCI clearance and Top Secret suitability, have been “read in” to specific TS/SCI classified projects, and have a “need-to-know” in order to access casework data on CDNA. There are two types of roles. Bioinformatics staff can log into CDNA, transfer FBI unknown sequence data from JWICS workstations to the CDNA system server, perform searches against known sequence databases, and transfer search results back to JWICS workstations. Analysis staff are subject matter experts who “review the search results and develop text describing the characterization of the unknown sequences in order to develop a final case report for the FBI. Remote access to the CDNA system is not allowed. External storage and communications devices are forbidden to be connected to the CDNA system,” the PIA says, giving clues to just how sensitive the data is.
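The four access conditions listed above combine as a simple conjunction: a user who fails any one of them is denied. The sketch below illustrates that logic only. It is not drawn from the PIA or from any government system, and the `User` fields, function name, and project labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    """Hypothetical record of one user's access attributes."""
    has_ts_sci_clearance: bool
    has_top_secret_suitability: bool
    read_in_projects: FrozenSet[str]
    need_to_know_projects: FrozenSet[str]


def may_access_casework(user: User, project: str) -> bool:
    # All four conditions named in the article must hold at once.
    return (
        user.has_ts_sci_clearance
        and user.has_top_secret_suitability
        and project in user.read_in_projects
        and project in user.need_to_know_projects
    )


# A cleared analyst who is read in to a project but lacks need-to-know for it.
analyst = User(True, True, frozenset({"PROJECT-X"}), frozenset())
allowed = may_access_casework(analyst, "PROJECT-X")
```

In this example `allowed` is false: clearance and being read in are not enough without a need-to-know for the specific project.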
Continuing, the PIA says, “CDNA periodically downloads from the NIH its publicly available database of known DNA and amino-acid sequences (the Known Sequence Database), which includes both human and non-human sequences. The human sequences in the Known Sequence Database are identified as ‘human,’” but “no associated identifying information is provided or available regarding who is the source of a human sequence … The NIH sequence identifier for a human sequence does not contain any personal identifiers that would permit NBACC to link a human sequence to an individual.”
The PIA asserts that, “A CDNA report to the FBI on an FBI-provided sequence could not be used to match a human sequence to an individual because the CDNA Report only indicates whether the sequences are thought to be from a human source … The type of biological forensic data may be used by the FBI, and other agencies who submitted sequences to the FBI, for lead generation and bio-crime attribution. Other agencies do not submit directly to NBACC.”
“NBACC does not directly receive any DNA or amino-acid sequences other than the downloaded Known Sequence Database and FBI Sequences,” the PIA says, noting that, “If a federal, state, or local government agency or unit other than the FBI has DNA sequences it wishes to test through CDNA, that agency or unit cannot submit its sequences directly to NBACC. Rather, it submits those sequences to the FBI, which in turn follows its own evidentiary and privacy process and procedures prior to submitting to NBACC,” whose “reports on such sequences are returned to the FBI through the FBI Liaison; NBACC does not submit reports or communicate directly with any other federal, state, or local government agency or unit.”
Finally, “After the final CDNA Report is received and accepted by the FBI, and at the direction of the FBI, all corresponding casework data created by NBACC is deleted from the CDNA system. Neither CDNA nor NBACC maintains a copy of reports provided to the FBI.”