Creating the next generation of AI with Datagen’s platform
Yesterday Datagen, the data and computer vision company, announced it had secured US$50mn in Series B funding led by Scale Venture Partners.
This latest round of funding brings Datagen’s total financing to over US$70mn. These funds will help the company to continue to bolster its leadership position in the nascent computer vision (CV) space.
Today, Datagen is defining a new data-as-code category that will serve as the next big frontier following the model- and data-centric approaches to computer vision AI development.
By removing the need to source and manually annotate training data, Datagen is helping CV teams get to market faster with applications in augmented reality, smart offices, automotive in-cabin monitoring and home security.
To learn more about the company following this announcement, AI Magazine caught up with CEO and Co-Founder Ofir Zuk (Chakon) who shared his insights into AI, car manufacturing and the company’s plans for the future.
Tell me about Datagen, your role and your responsibilities there.
Datagen helps computer vision teams solve the #1 problem in modern AI: acquiring data at scale for the development of deep learning applications. We reinvent the way data is acquired by creating high-performance synthetic data for training and testing purposes. With our self-service platform, engineers generate high-fidelity 2D/3D synthetic data with rich ground truth, in a seamless and scalable manner. Datagen customers include Fortune 500 companies across a variety of industries. Gil Elbaz (CTO) and I co-founded Datagen in 2018 with the specific goal of enabling computer vision teams to create the next generation of visual AI. Today, I serve as Chief Executive Officer. My day-to-day consists of managing and expanding our team, directing the strategy for the company, and working with our top clients and VC partners.
How does your self-service platform utilise AI?
Datagen’s self-service platform uses Generative Adversarial Networks (GANs), along with some other leading-edge machine learning methods, to generate 3D simulations of subjects, with a human-centric focus. And because variations are built on a foundation of high-quality scanned 3D data, results retain realism and plausibility. We use those machine learning methods to generate a huge database of assets, including everything from human bodies with dynamic interactions, to home office environments, to vehicle interiors for in-cabin passenger monitoring. For each asset type, Datagen uses these ML algorithms to introduce variance across countless discrete axes, giving engineers the control to create the data their AI algorithms need.
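As a rough illustration of what varying a scene along discrete axes could look like in practice, here is a minimal Python sketch. It is not Datagen’s API: the axis names, their values, and the functions `all_variations` and `sample_variations` are hypothetical and chosen only for this example. It shows two ways of covering the axes, either enumerating the full grid of combinations or drawing a reproducible random sample.

```python
import itertools
import random

# Hypothetical variation axes; names and values are illustrative only
# and do not come from Datagen's platform.
AXES = {
    "environment": ["home_office", "vehicle_cabin"],
    "lighting": ["daylight", "dusk", "night"],
    "camera_angle_deg": [0, 15, 30],
    "body_shape": ["small", "medium", "large"],
}


def all_variations(axes):
    """Yield every combination of axis values as a dict (the full grid)."""
    names = list(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def sample_variations(axes, n, seed=0):
    """Draw n random scene specs, one value per axis, reproducibly."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in axes.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    grid = list(all_variations(AXES))
    print(f"full grid size: {len(grid)}")
    for spec in sample_variations(AXES, 3, seed=42):
        print(spec)
```

The full grid grows multiplicatively with each added axis, which is why a seeded random sample is often the more practical way to request a fixed number of varied scenes.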
In what ways will AI transform car manufacturing?
The EU has already introduced legislation that will require Driver Monitoring Systems (DMS) in all newly manufactured cars, and similar legislation has been proposed in the United States. These systems use AI-enabled cameras to detect when a driver is drowsy, or distracted by a cell phone rather than keeping their eyes on the road. Moving forward, these systems will be equipped with all manner of capabilities, well beyond simple safety monitoring. We’ll soon see cars which deploy airbags intelligently, based on the size, shape, and physical orientation of the occupants at the time of impact.
This is another reason that synthetic data is so important: it plays a pivotal role in feeding those driver safety systems, which need diverse datasets that represent the broadest range of scenarios a DMS will face in production.
What benefits will this bring?
Human-centric visual intelligence is one of computer vision’s fastest-growing fields. But it’s also one of the most challenging. Datagen’s unparalleled domain expertise in human simulation puts us in pole position to enable the next generation of computer vision applications.
Not all synthetic data is created equal. The most effective, highest-performance synthetic data is invariably grounded in real-world imagery. That’s why we’ve built the world’s largest 3D procedural content database, indexed and enriched with metadata like object materials, skin textures, high-poly meshes and interactable surfaces, to name a few.
AI, and computer vision in particular, will completely redefine the way we engage with vehicles (and the way vehicles engage with us!). And right now, one of the biggest issues auto manufacturers have in getting these AI-enabled vehicles to market is insufficient training data. In the ways explained above, synthetic data not only makes these AI systems more effective, it also expedites their development by orders of magnitude. We can’t share names, but Datagen is already working with some of the world’s 10 largest auto manufacturers to get their computer vision systems ready for production.
While we’ve had a great deal of success in the in-cabin automotive space, that’s just one of a growing number of domains that Datagen serves. As the rest of the industry grows increasingly fragmented, we’ve designed our product infrastructure with modularity and expandability in mind, which enables us to expand our offerings into additional domains and use cases at close to zero marginal cost.
What can we expect from Datagen in the future?
As a domain-agnostic company, we aren’t beholden to a single industry or use case. We’ve already seen considerable success with our application-specific offerings, such as our in-cabin automotive solution. Moving forward, we’ll be extending our human-centric offering to additional domains that cater to our customers’ needs as we continue to build out our platform’s capabilities.
In fact, we’re already seeing rapid growth in other domains, especially around the Metaverse. As interest and demand continue to outpace development, we see a huge opportunity for synthetic data to serve as a significant enabler of the Metaverse. Lastly, we’re actively developing additional tools and solutions on top of data generation, with the goal of establishing a comprehensive, streamlined infrastructure for computer vision.
The industry matured a lot over 2021, moving from a handful of companies being the early adopters of synthetic technologies to a place where every AI team now understands that they need to start consuming synthetic data sooner rather than later. There is only one way to go about this: as simulation gets more advanced and overcomes the challenges we see with manual data today, synthetic data will start to replace manual data as the new industry standard.