Clearview’s new storage system reduces stress on CPUs for database scalability
Clearview AI has introduced an update to its NNDB (Nearest Neighbor Database) system for indexing and searching billions of face vectors. Building on its predecessor, the upgrade makes data processing lighter and more efficient, promising an 80 percent reduction in computing costs and a tenfold improvement in throughput.
The system uses an SSD-based on-disk graph index structure that optimizes the placement and retrieval of vectors by assigning queries to data "buckets." The vectors themselves are stored on disk in these buckets, while the graph index remains representative enough of the underlying data for searches to run in CPU memory.
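The bucket-routing idea described above can be sketched in a few lines. The following is a minimal, illustrative example (not Clearview's actual code): an in-memory set of centroids routes each query to a small number of buckets, and only the vectors stored in those buckets are scanned. All names are assumptions for illustration, and a dict of arrays stands in for on-disk storage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1,000 vectors of dimension 16.
vectors = rng.standard_normal((1000, 16)).astype("float32")

# "Coarse index": a handful of centroids kept in memory. Real systems
# train these with k-means; random picks suffice for a sketch.
n_buckets = 8
centroids = vectors[rng.choice(len(vectors), n_buckets, replace=False)]

# Assign each vector to its nearest centroid's bucket. A dict of index
# arrays simulates the on-disk bucket storage.
assignments = np.argmin(
    ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)
buckets = {b: np.where(assignments == b)[0] for b in range(n_buckets)}

def search(query, n_probe=2, k=5):
    """Route the query to the n_probe nearest buckets, then scan only those."""
    dists_to_centroids = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(dists_to_centroids)[:n_probe]
    candidate_ids = np.concatenate([buckets[b] for b in probe])
    dists = ((vectors[candidate_ids] - query) ** 2).sum(-1)
    return candidate_ids[np.argsort(dists)[:k]]

hits = search(vectors[0])
```

Because only a few buckets are ever scanned per query, most of the dataset stays untouched on disk, which is what allows the working set to stay small enough for the routing step to live in memory.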
Clearview says its index is “carefully constructed to represent the demographic diversity of human faces,” and will improve accuracy in search at the deca-billion scale.
The technology harnesses two open source tools, Faiss and RocksDB. Writes Clearview’s Vice President of Machine Learning and Research, Terence Liu, “We created the binding layer that enabled Faiss’s inverted file structure to read from and write to RocksDB in parallel. By leveraging the strengths of both open source projects, we were able to create a scalable vector database, a winning duo forged into one.”
Liu discussed the company’s use of similarity calculations when building its databases in an interview last August.
The BBC recently reported that Clearview had reached a milestone, enabling American law enforcement to conduct one million facial recognition searches to investigate crimes.