AI Readiness Requires Tackling Data Integration Challenges
Whilst more than 90% of data is unstructured, the majority of innovation around bringing data to AI has focused on structured data. However, work is underway to help data stakeholders leverage significant unstructured data sets for new value or to reduce security risks.
AI data workflows are becoming invaluable to business development. They can help automate the discovery and classification of data, and integrate with AI tools to enrich unstructured data so that people can more easily find what they need when curating data sets for projects.
With this in mind, AI Magazine speaks with Krishna Subramanian, co-founder and COO of Komprise, about how businesses can best utilise data of all types to innovate more successfully. She explains how AI data workflows could speed time-to-value and expand the number of use cases that businesses can deploy on large data sets.
Confronting data challenges
Research by MIT finds that the top challenge preventing AI readiness is data integration or data pipelines. In particular, managing data volume, moving data from on-premises to the cloud, enabling real-time access and managing changes to data have all been cited as business concerns.
Krishna observes that managing data volume is a barrier to using data with AI, specifically because data is scattered across many storage silos.
“IT organisations are being crushed by the weight of unstructured data and how to not only integrate it but how to access, migrate, store, manage and protect it while being cost efficient,” she says. “Searching across these vast data estates to find relevant data for AI is a challenge. So having systems and processes to analyse data across multi-vendor and multi-cloud storage is important.
“You can then easily search to find data that you want to use with AI using both metadata characteristics such as filenames, directory paths, who created the file and when it was last modified, as well as by searching based on data contents. Then you can systematically mobilise the data of interest by copying it or moving it to AI services.
“Therefore, first understanding data, classifying it and moving to locations where it can be used in affordable AI tools is the foundation of AI success, and it requires systematic data management.”
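The kind of metadata-driven search and mobilisation Krishna describes can be illustrated with a minimal sketch. This is not Komprise's product or API; it is a hypothetical example using only the standard library, where files are filtered by simple metadata (file extension and last-modified date) and copied to a staging area for an AI service.

```python
import shutil
import time
from pathlib import Path

def find_candidate_files(root: str, suffixes: set, max_age_days: int) -> list:
    """Scan a directory tree, keeping files that match simple metadata criteria
    (file extension and last-modified date) -- a toy stand-in for searching a
    large data estate by metadata characteristics."""
    cutoff = time.time() - max_age_days * 86400
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and path.stat().st_mtime >= cutoff:
            matches.append(path)
    return matches

def stage_for_ai(files: list, staging_dir: str) -> None:
    """Mobilise the curated files by copying them into a staging area
    from which an AI pipeline could read them."""
    dest = Path(staging_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, dest / f.name)  # copy2 preserves timestamps
```

In a real environment this filtering would run against an index spanning multi-vendor and multi-cloud storage rather than a single directory tree, but the principle of selecting by metadata before moving data is the same.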
Real-time access to data can be challenging to achieve, particularly if a business has billions of files distributed across many data silos that are not easily accessible. On this, Krishna suggests that enterprise IT requires a “unified data index” with search capabilities.
“The ability to search by metadata tags and enrich those tags using AI services is ideal,” she explains. “Tagging delivers context and some structure to unstructured data, which makes it easier for researchers and data scientists to find what they need when they want to run a project. Otherwise, it could take them months to find the right data sets.
“We must apply automation to the discovery, classification and workflow orchestration of data to ensure the right data is being delivered to AI tools.”
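The idea of a unified index whose tags are enriched by AI services can be sketched in a few lines. The class and method names below are hypothetical, and the enrichment step is stubbed where a real system would call a classifier or other AI service.

```python
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    path: str
    tags: set = field(default_factory=set)

class UnifiedIndex:
    """A toy 'unified data index': files from any silo keyed by path,
    searchable by metadata tags."""

    def __init__(self):
        self.entries = {}

    def add(self, path, tags=None):
        """Register a file discovered in some storage silo."""
        self.entries[path] = IndexEntry(path, set(tags or ()))

    def enrich(self, path, new_tags):
        """Attach extra tags -- in practice these might come from an
        AI service that classifies the file's contents."""
        self.entries[path].tags |= set(new_tags)

    def search(self, tag):
        """Return all indexed paths carrying the given tag."""
        return sorted(p for p, e in self.entries.items() if tag in e.tags)
```

Tagging in this way gives researchers a single place to query, rather than months of hunting through silos for the right data sets.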
Keeping our systems safe
As far as AI development is concerned, data governance and security are inevitably large concerns for businesses. In order to maintain trust and use AI responsibly, businesses around the world harnessing AI are being advised to set clear policies for how they use the technology.
“Most organisations are concerned about safeguarding their sensitive corporate data,” Krishna explains. “They don’t want private data getting into public LLMs. If sensitive customer data is exposed and then lands in public content somewhere, that can bring a company down. Leaking corporate data is easy to do, and once it reaches the public domain it can cause real harm.”
As digital transformation continues to accelerate innovation, technology plays a significant role in auditing data movement and segmenting sensitive data.
“There must be ways to avoid harm while still allowing the use of AI platforms by employees and benefiting from the ways it can drive innovation and better productivity,” Krishna says. “We are still in the early stages. Technology and government leaders must come together and determine the risks and how we can strike the right balance.”
Leveraging unstructured data
Data quality is often discussed in terms of structured data; for unstructured data the challenge is very different, as it lacks a unifying structure.
Krishna explains: “Poor data quality of unstructured data comes from using too much irrelevant or obsolete data to train LLMs. This can be improved by leveraging data analysis solutions that surface obsolete data and help you search based on both metadata and data contents to curate the appropriate data set to use.”
She also advocates that businesses discover and qualify their data sets in order to harness the valuable information they contain.
“Obviously, there are other aspects of data quality to consider and there are many tools on the market,” she adds. “Start by adding some structure to your unstructured data and then determine the quality of your data before it is fed to AI.”
When it comes to bridging the gap between AI and corporate data in this way, Krishna is optimistic.
She suggests: “These solutions will become platforms through which you can connect any AI service to any corporate data and have the automation take care of data security, data governance, auditing, and data movement to AI – all without any coding or specialised expertise.
“We are starting to see unstructured data management solutions offer data workflow solutions for AI.”
******
Make sure you check out the latest edition of AI Magazine and also sign up to our global conference series - Tech & AI LIVE 2024
******
AI Magazine is a BizClik brand