Computer vision: how computers see and understand the world
Computer vision, in simple terms, is the science of programming computers to understand what they see. The technology is used in applications that need to make sense of the pixels in an image. An AI algorithm automatically analyses images and produces metadata based on them, such as date, time, camera type and geographical location, among other things. These algorithms can also detect objects such as people, animals, cars or buildings.
It differs from AI-based image recognition in that computer vision essentially ‘talks’ to the camera via an interface, whereas image recognition simply identifies objects within a digital image.
Deep learning is what enables computer vision technology. It is an approach to artificial intelligence that employs algorithms inspired by the structure and function of neurons in animal brains. Deep learning stacks multiple processing layers, each building on the output of the last, allowing these systems to process visual information more quickly than traditional computer programmes.
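To make the layered idea concrete, here is a minimal sketch in plain Python of a two-layer network forward pass. The weights and inputs are made-up illustrative values; a real network would have many more layers and learn its weights from data.

```python
# A tiny two-layer "network": each layer transforms its input and passes
# the result on, so later layers can build on features earlier ones found.
# Weights here are fixed for illustration; real networks learn them.

def relu(x):
    """Non-linear activation: keeps positive signals, zeroes out the rest."""
    return [max(0.0, v) for v in x]

def dense(inputs, weights, biases):
    """One fully-connected layer: weighted sums of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def forward(pixels):
    # Layer 1: turns raw pixel values into simple intermediate "features".
    hidden = relu(dense(pixels, [[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]))
    # Layer 2: combines those features into a single output score.
    (score,) = dense(hidden, [[1.0, -0.5]], [0.0])
    return score

print(forward([0.6, 0.9]))
```

The point of the sketch is the structure, not the numbers: stacking `dense` layers with non-linearities between them is what lets deep models build progressively richer representations of an image.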
It is the continuous evolution of deep learning for computer vision that has made it a must-have for many organisations today, providing a new way for computers to ‘see’ and understand the world around them.
Industry use cases
Berlin-based Mobius Labs says it is developing a new generation of AI-powered computer vision which can be deployed on an organisation’s own servers, as well as apps on mobile devices. This, the company believes, will disrupt how the world works with visual content by enabling users to turn it into immediate insight and value.
Its computer vision technology is already being used by the European press agency ANP and the stock photography community EyeEm. We asked Mobius’ CEO and Chief Scientist, Appu Shaji, to share his thoughts on computer vision and how businesses can employ it to their advantage.
“Computer vision tries to understand from a physiological sense how our brains are able to perceive our visual world. And as mentioned, one of the most popular and effective glues connecting these two fields is machine learning techniques, which encode the act of learning - and eventually understanding - into computer algorithms,” he explains.
“Computer vision technology has a role to play in nearly every imaginable walk of life. In the media sector, the technology can not only detect the content of an image, but grade the style and quality of the visuals. The aesthetic score can be determined in a couple of seconds, assisting marketing, advertising or editorial departments to select the most pleasing photographs. It can also scrutinise thousands of video clips to provide relevant recommendations, plus flag and/or block inappropriate content. It can also be trained to match influencers with brands to grow new client bases.”
Shaji also gave the example of the geospatial industry. Lightweight solutions can be installed directly on-board orbiting satellites so collected satellite imagery is processed and analysed in space and only the usable imagery is downloaded back to Earth. The result of this, he says, is a massive reduction in downlinking costs. He adds the same method can be used in identifying infestations in crops and monitoring critical events, like the expansion of riverfronts and detecting waterborne shipping containers.
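The on-board approach Shaji describes amounts to scoring each capture for usability and transmitting only what passes. A hypothetical sketch, where the cloud-cover score and the 0.3 threshold are purely illustrative assumptions:

```python
# Hypothetical on-board filtering: score each captured image for usability
# (here, an estimated cloud-cover fraction) and queue only the clear ones
# for downlink, cutting transmission costs. Scores and threshold are
# illustrative; a real system would run a model on the raw pixels.

def estimated_cloud_cover(image):
    """Stand-in for an on-board model; 'image' is a dict carrying a
    precomputed cloud-cover fraction for illustration."""
    return image["cloud_cover"]

def select_for_downlink(images, max_cloud_cover=0.3):
    """Keep only images clear enough to be worth transmitting to Earth."""
    return [img for img in images
            if estimated_cloud_cover(img) <= max_cloud_cover]

captures = [
    {"id": "A", "cloud_cover": 0.10},  # clear: downlink
    {"id": "B", "cloud_cover": 0.90},  # overcast: discard on board
    {"id": "C", "cloud_cover": 0.25},  # mostly clear: downlink
]
usable = select_for_downlink(captures)
print([img["id"] for img in usable])
```

In this toy run, two of the three captures are worth downlinking; at satellite scale, discarding the overcast majority before transmission is where the cost saving comes from.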
Kasia Borowska, Founder and MD at Brainpool.ai, said other use cases include healthcare, where it is becoming popular in early diagnosis, such as spotting early signs of cancer, and construction, where it can be used to continuously monitor progress on construction sites and identify associated risks.
She says: “Many of us also have facial recognition unlocking our phones, and every time we tag a picture of someone on Facebook or Instagram, for example, we also use computer vision - it is applied everywhere around us.”
The main adopters and the challenges they face
Borowska cites IBM Watson, which has built one of the most widely known pieces of computer vision software, called Power AI. In healthcare, the software has set records in skin cancer detection. She adds that Big Tech giants such as Google, Amazon, Facebook and Apple are leading the development and application of computer vision research.
Shaji adds Clarifai to that list, but says that newer companies in the market are set to disrupt the space by making computer vision understandable and usable by anyone, regardless of position or job title. He adds they can offer on-premise solutions that run locally on client systems, with no data sent back to vendors, giving clients total data privacy.
When it comes to the challenges of adopting any new technology, Borowska says that, as with any AI, systems are only as good as the data fed into them. “If the data is not representative of the total population being analysed, we are likely to get a biased outcome. If you train a system to recognise a shoe but only feed it pictures of trainers and boots, when a pair of heels comes along it is unlikely to recognise it as a shoe. That’s why Apple’s facial recognition famously failed to work on black faces. Statistical analysis of the data is required to ensure all relevant cases are represented in the data input,” she states.
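The statistical check Borowska describes can be as simple as counting examples per class and flagging anything underrepresented. A minimal sketch, where the 10% minimum share is an illustrative assumption rather than a standard value:

```python
# Representativeness check for a labelled training set: count examples per
# class and flag classes whose share falls below a minimum threshold.
# The min_share value is an illustrative assumption.

from collections import Counter

def underrepresented_classes(labels, min_share=0.1):
    """Return classes whose share of the dataset is below min_share."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(c for c, n in counts.items() if n / total < min_share)

# A 'shoe' dataset dominated by trainers and boots, with almost no heels -
# exactly the kind of skew that produces the failure described above.
labels = ["trainer"] * 50 + ["boot"] * 45 + ["heel"] * 5
print(underrepresented_classes(labels))
```

A check like this only catches imbalance in the labels you already track; biases along unlabelled attributes (lighting, skin tone, camera type) need a deliberate audit of those attributes too.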
According to Shaji, there is also the issue of ‘bottlenecks’ and the need for expert AI scientists and technical support teams to both train and install solutions: “Mass adoption of computer vision technology hasn't happened yet because most solutions still require expert AI scientists and technical support teams. Furthermore, huge data sets and heavy computation are required to train most machine learning models. Most of these are not user-friendly and remain incomprehensible to users beyond the technical field.”
Democratising the technology
Shaji went on to say that vendors are now looking to democratise the technology so people can build their own applications.
“Such technologies employ a technique called ‘few shot learning’, allowing the training of very specific concepts using only small data sets. The training is no-code, which allows everyone to easily navigate the machine learning process. These solutions are lightweight and easy to install on-premises, on mobile phones, laptops and even satellites,” explained Shaji.
He concludes that the more companies pick up on the ‘few shot learning’ approach, the more widely adopted computer vision will become, spreading its use cases for the benefit of more businesses and industries.