Research ethics in machine vision need some attention
The AI community talks big about ethics, but skeptics want to see it walk the walk. For machine vision researchers, at least, there is room for improvement in demonstrating responsible behavior.
An article published this month by Princeton University’s Center for IT Policy singles out a controversial — though influential — video data set that was taken down in June 2019 but which lives on.
At least 135 research papers have used the data set in question, Duke MTMC (multi-target, multi-camera), a product of Duke University, since it was withdrawn.
The Princeton blog post uses the Duke MTMC set to illustrate a broader problem of people readily accessing sketchy vision data sets that have been withdrawn.
The same problem can be seen with MS-Celeb. Microsoft withdrew that data set after criticism, but it reportedly can still be found on Academic Torrents, according to Arvind Narayanan, one of the co-authors of the Princeton post.
Narayanan writes that the MS-Celeb data set is also part of MS1M-IBUG, MS1M-ArcFace and MS1M-RetinaFace, all of which remain openly available.
Meanwhile, the Duke data is surveillance video shot in 2014. According to the facial recognition blog Megapixels, the data set is most often used for developing person re-identification, video tracking and low-resolution facial recognition systems.
(Megapixels’ 2016 takedown of the incident is detailed and worth a read.)
It is unclear how many of the 2,000 Duke students captured in the data set knew they would appear in the multi-camera footage that has become Duke MTMC. There are other reasons it remains controversial, too.
Researchers working for biometrics vendors SenseNets and SenseTime used the Duke MTMC data set, and wrote about their re-identification experiments in 2018. The vendors are alleged to be helping the Chinese government repress Uighur Muslims. Dozens more Chinese companies, as well as China's military, are known to have used the data set.
The Center for IT Policy article states that the video data set continues to be copied by other researchers and used, with some modifications, in new data sets. Some of those so-called derived data sets have themselves been withdrawn, but not all.
“Together,” according to the center, “the availability of the data, and the willingness of researchers and reviewers to allow its use, has made the removal of Duke MTMC only a cosmetic response to ethical concerns.”
The center’s article calls for more ethical decision-making in research and development. It also says there needs to be a way to prevent derived sets from being used in unethical research, and that data set authors should consider licenses that address this problem.