Our Research

Our goal was to develop a design optimized to increase the likelihood of successful identification of pedestrians by a wide variety of machine learning object detectors. To achieve this, we built a dataset of images of pedestrians in difficult-to-discern situation. These images are obfuscated through various means such as fog, shadow, or image manipulation processes, including distortion, contrast and brightness variation, noise insertion, tint changing, and color saturation variation. The dataset is labeled and bounding boxes indicating the presence of a pedestrian are computed for each image, forming our test dataset.


We then digitally add a starting image roughly in the shape of a t shirt to the people in the test dataset. For this, we use a virtual try on pipeline to fit the “shirt” to the person in the image, which ensures that we’re optimizing a shirt-shaped surface and not a flat, 2d image. The resulting combination dataset is presented to one or more machine learning object detectors trained to recognize pedestrians.


In each iteration, we use the likelihood that the pedestrian in the test dataset has been accurately labeled (found within their bounding box) to compute a loss function. The loss function values are used to guide the manipulation of pixels in the workpiece image for combination with the image dataset in subsequent iterations.


Our loss function is the inverse of the likelihood that a pedestrian is accurately spotted. We minimize the loss function to increase the probability that the object detector can find the obfuscated pedestrian. We’ve also experimented with including the total variation of pixels in the image and other design-oriented parameters such as similarity to a given target design image or set of images and different measures of image simplicity.


We then compute the gradients of the starting image pixel values with respect to the loss function, apply gradient descent to minimize the loss gradient, and update the starting image's pixels accordingly. The modified image is again combined with the test dataset, and the resulting combination dataset is presented to one or more machine learning object detectors in a new iteration. This process is repeatedly iterated to increase the likelihood that the pixels in the workpiece image are optimized to maximize the likelihood that the pedestrian(s) is perceived accurately by the machine learning object detector(s) and that the resulting design meets any applicable appearance constraints, such as color, shape, similarity to a given target image or set of images, different measures of image simplicity, or other potential desirable characteristics. We generally see a plateau after a few hundred epochs of optimization, but we are working to update our optimizer to improve our results further.


When the design resulting from the optimized manipulation of the starting image is applied to a pedestrian, the likelihood that a trained machine learning object detector will correctly locate and recognize the pedestrian is increased.