System learns to identify objects from partial ‘viewlets’
Current artificial intelligence (AI) computer vision systems can recognize images, but they are highly task-specific. Until now, such systems could not confidently guess what they were being shown after seeing only part of an image. Even today's best computer vision systems can be fooled when an object is shown in an unusual setting.
Engineers from the UCLA Samueli School of Engineering and Stanford University aim to build computer vision systems that can identify whole objects from partial glimpses, just as a person can tell they are looking at a cat even when the animal is hiding behind a door and only its tail and paws are visible. Humans can also easily infer where the cat's head and the rest of its body are, but this ability remains out of reach for most AI computer vision systems. AI-based systems do not build an internal picture, or common-sense model, of the objects they observe in the way humans do.
In a paper published in the Proceedings of the National Academy of Sciences on January 2, 2019, the researchers, led by Vwani Roychowdhury, describe a new method that offers a way to overcome these shortcomings of AI systems.
The approach has three main steps. First, the computer breaks an image into small pieces called “viewlets.” Second, the system learns how these pieces fit together to form the original object. Finally, it looks at what other objects are in the surrounding scene and decides whether information about them is needed to describe and identify the primary object.
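For readers curious what the first step might look like in code, here is a toy sketch under the simplest possible reading of "breaking an image into small pieces": cutting a grayscale image, stored as a list of pixel rows, into non-overlapping square tiles and remembering where each tile came from. The function names and the `patch_size` parameter are hypothetical, and this is not the authors' viewlet algorithm; it does not attempt the learning described in steps two and three.

```python
# Toy illustration only: split a 2D grayscale "image" (a list of rows)
# into non-overlapping square patches, keeping each patch's grid position.
# This is NOT the method from the PNAS paper; names are hypothetical.
from typing import Dict, List, Tuple

Image = List[List[int]]
Patch = List[List[int]]


def split_into_patches(image: Image, patch_size: int) -> Dict[Tuple[int, int], Patch]:
    """Return {(grid_row, grid_col): patch} for every full patch in the image.

    Pixels along the right or bottom edge that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    patches: Dict[Tuple[int, int], Patch] = {}
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches[(top // patch_size, left // patch_size)] = patch
    return patches


def reassemble(patches: Dict[Tuple[int, int], Patch], patch_size: int) -> Image:
    """Rebuild the area covered by the patches using their recorded grid positions."""
    n_rows = max(r for r, _ in patches) + 1
    n_cols = max(c for _, c in patches) + 1
    image: Image = [[0] * (n_cols * patch_size) for _ in range(n_rows * patch_size)]
    for (r, c), patch in patches.items():
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                image[r * patch_size + dy][c * patch_size + dx] = value
    return image


if __name__ == "__main__":
    # A 4x4 "image" whose pixel values are 0..15, row by row.
    img = [[y * 4 + x for x in range(4)] for y in range(4)]
    parts = split_into_patches(img, 2)
    print(len(parts))      # 4
    print(parts[(1, 0)])   # [[8, 9], [12, 13]]
    assert reassemble(parts, 2) == img
```

Here the tiles go back together trivially because each one carries its exact position. The system described in the article has no such shortcut: it has to learn, from unlabeled images, how viewlets relate to one another.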
The engineers tested their system on about 9,000 images showing people surrounded by other objects. The system built a detailed model of the human body without outside help and without the images being labeled.
The engineers ran similar tests with images of cars, airplanes and motorcycles. In every case, the new system performed as well as or better than traditional computer vision systems developed over many years of training.
V. Roychowdhury commented that “contextual learning is a key feature of our brains, and it helps us build robust models of objects that are part of an integrated worldview where everything is functionally connected.”
Author: Alena Snezhnaya