- Can an object-detection model trained on artificial images detect objects in real images?
- Do models trained on purely real images, purely artificial images, and hybrid data sets (combinations of artificial and real images) differ in performance?
Automated target recognition (ATR) is one of the most important potential military applications of the many recent advances in artificial intelligence and machine learning. A key obstacle to creating a successful ATR system with machine learning is the collection of high-quality labeled data sets. The authors investigated whether this obstacle could be sidestepped by training object-detection algorithms on data sets made up of high-resolution, realistic artificial images. They generated large quantities of artificial images of a high-mobility multipurpose wheeled vehicle (HMMWV) and investigated whether models trained on these images could then identify HMMWVs in real images.

The authors obtained a clear negative result: Models trained on the artificial images performed very poorly on real images. However, they found that using the artificial images to supplement an existing data set of real images consistently resulted in a performance boost. Interestingly, the improvement was greatest when only a small number of real images was available. On this basis, the authors suggest a novel method for boosting the performance of ATR systems in contexts where training data are scarce.

Many organizations, including the U.S. government and military, are now interested in using synthetic or simulated data to improve machine learning models for a wide variety of tasks. One of the main motivations is that, in times of conflict, there may be a need to quickly create labeled data sets of adversaries' military assets in previously unencountered environments or contexts.
- Although the authors found that artificial images cannot replace real images, artificial images can supplement an existing data set of real images to boost performance.
- Models trained on artificial images performed very poorly on real images.
- Models trained on hybrid data sets (those combining artificial and real images) performed better than models trained on real images alone.
- The improvements were most noticeable when the number of real images was severely limited.
- By boosting a data set of five real images with ten artificial ones, the authors were able to improve the precision and recall of the model by 54 percent and 29 percent, respectively.
- More research would be needed to determine under what conditions, if any, models trained successfully on artificial images might perform well on real images.
- More work is also needed to better understand the ability of models trained on hybrid data sets to perform well on purely real images.
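
The findings above report gains in precision and recall. As a reminder of what those metrics measure, the following is a minimal Python sketch, not the authors' evaluation code. All counts are hypothetical and chosen only for illustration, and the helper names are assumptions. The summary does not state whether the reported 54 percent and 29 percent improvements are relative or absolute, so the sketch computes both kinds of gain to show how they differ.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted detections that match a real object."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real objects that the model detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def absolute_gain(before: float, after: float) -> float:
    """Difference between two metric values (in metric units)."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Improvement expressed as a fraction of the starting value."""
    return (after - before) / before


# Hypothetical detection counts (NOT taken from the report):
# true positives, false positives, false negatives on the same test set.
real_only = {"tp": 25, "fp": 75, "fn": 25}
hybrid = {"tp": 40, "fp": 40, "fn": 10}

p_before = precision(real_only["tp"], real_only["fp"])
p_after = precision(hybrid["tp"], hybrid["fp"])
r_before = recall(real_only["tp"], real_only["fn"])
r_after = recall(hybrid["tp"], hybrid["fn"])

print(f"precision: {p_before:.2f} -> {p_after:.2f} "
      f"(absolute {absolute_gain(p_before, p_after):+.2f}, "
      f"relative {relative_gain(p_before, p_after):+.0%})")
print(f"recall:    {r_before:.2f} -> {r_after:.2f} "
      f"(absolute {absolute_gain(r_before, r_after):+.2f}, "
      f"relative {relative_gain(r_before, r_after):+.0%})")
```

Because a relative gain is scaled by the starting value, the same absolute improvement looks much larger when the baseline is weak, which is worth keeping in mind when the baseline model was trained on only a handful of real images.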
Table of Contents
Operationally Relevant Data
The U.S. Military's Use of Bohemia Interactive Products
Additional Model and Hyperparameter Details