With artificial intelligence (AI) capabilities evolving at such an astonishing pace, one of the most pressing challenges facing data teams and engineers is how to handle the mass of unstructured and heterogeneous data sources.
Unlike structured data, which fits neatly into tables and databases, unstructured data comes in a wide range of formats, including video, text, and images. Each of these formats has its own complexities, and the heterogeneity of the data sources adds further layers of complexity.
With this in mind, can teams find a way to optimize their data collection and analysis to maximize the impact of AI on their business? Judging by current trends, agent-based systems and agent-to-agent communication look like the idea that will take the AI movement to the next level.
Senior Partner and Global Leader of AI and Data at Kearney.
The historical challenge of unstructured data
Historically, unstructured data such as audio, video and social media interactions has presented a substantial challenge for businesses trying to interpret it and convert it into structured formats suited to analytics and AI applications. For many organizations, the sheer complexity and cost of processing this unstructured data has meant that, until recently, it has remained largely under-utilized.
As a result, even though unstructured data comprises the majority of available data and holds significant unrealized potential, organizations have tended to turn to structured data, such as Excel files and search engine optimization (SEO) tags.
However, in recent years, advances in AI, including generative AI, have transformed the way unstructured data can be interpreted and extracted.
For example, major cloud services companies including Microsoft and Google have expanded their cloud services to enable the creation of “data lakes” from unstructured data. Microsoft’s Azure AI now uses a combination of text analytics, optical character recognition, speech recognition, and computer vision to interpret an unstructured data set that could include text or images. Thanks to this advancement, companies can now access this richer data resource and finally unlock its value.
What are the current problems with unstructured data?
Organizations can now access a wealth of information that was previously inaccessible.
However, this is not without its challenges. For example, navigating the different levels of quality, scope, and detail of content in this unstructured data can be a major hurdle. Unstructured data often contains far more irrelevant noise, and if there is too much of it, it can be difficult even for AI to identify accurate answers while analyzing the information.
Additionally, a lack of regulation around how unstructured data is created can limit its usefulness. While larger data sets generally offer higher levels of consistency, it remains a challenge to adapt them for use by AI and, therefore, for organizations to leverage them effectively.
To use unstructured data effectively, it usually has to be incorporated into an organization’s existing data framework. This integration requires a deep understanding of the properties, connections, and potential uses of the data. A major challenge for many unstructured data projects is simply defining a clear goal so that the models can be accurately trained.
Many organizations still struggle to leverage these existing data assets to generate business value.
So while the above problem of unlocking and obtaining data has largely been solved, being able to hypothesize about its potential value and applications remains a major hurdle.
What is expected from the GenAI movement in the future?
In the future, we should expect human involvement in data acquisition and interpretation to decline. Instead, we are likely to see a rise in agent-based systems and agent-to-agent communication, which minimize the need for human intervention in data handling. The rise of generative AI has paved the way for specialized agents, including:
- “Engineering agents” for code generation
- “Data generation agents” to create synthetic data for testing
- “Code testing agents” to validate and test code
- “Documentation agents” to generate documentation for various aspects, such as code, use cases, and processes
There is no doubt that a system in which specialized AI agents interact with each other can speed up development and make it more accurate and more consistent.
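To make the hand-off between specialized agents concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's product: the `Artifact` class, the four agent functions, and `run_pipeline` are all hypothetical names, and each "agent" is a plain function with hard-coded behavior standing in for what would, in practice, be a call to a generative model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Artifact:
    """Shared work item that each agent reads and enriches (hypothetical)."""
    requirement: str
    code: str = ""
    test_data: List[int] = field(default_factory=list)
    tests_passed: bool = False
    docs: str = ""
    log: List[str] = field(default_factory=list)


Agent = Callable[[Artifact], Artifact]


def engineering_agent(a: Artifact) -> Artifact:
    # Stand-in for code generation; a real agent would call a model here.
    a.code = "def double(x):\n    return x * 2\n"
    a.log.append("engineering")
    return a


def data_generation_agent(a: Artifact) -> Artifact:
    # Stand-in for synthetic test data generation.
    a.test_data = [0, 1, 5, -3]
    a.log.append("data_generation")
    return a


def code_testing_agent(a: Artifact) -> Artifact:
    # Runs the generated code against the synthetic data.
    # exec() is tolerable here only because the code is hard-coded above.
    namespace: dict = {}
    exec(a.code, namespace)
    fn = namespace["double"]
    a.tests_passed = all(fn(x) == x + x for x in a.test_data)
    a.log.append("code_testing")
    return a


def documentation_agent(a: Artifact) -> Artifact:
    # Summarizes what the earlier agents produced.
    status = "passed" if a.tests_passed else "failed"
    a.docs = (
        f"Requirement: {a.requirement}. "
        f"Tests {status} on {len(a.test_data)} inputs."
    )
    a.log.append("documentation")
    return a


def run_pipeline(requirement: str, agents: List[Agent]) -> Artifact:
    # Each agent receives the previous agent's output; no human step between.
    artifact = Artifact(requirement=requirement)
    for agent in agents:
        artifact = agent(artifact)
    return artifact


PIPELINE: List[Agent] = [
    engineering_agent,
    data_generation_agent,
    code_testing_agent,
    documentation_agent,
]

result = run_pipeline("double a number", PIPELINE)
print(result.docs)
```

The design choice worth noting is the single shared work item: because every agent takes and returns the same structure, agents can be reordered, replaced, or supplied by an outside provider without changing the rest of the chain.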
Organizations can now devote more resources to using data rather than preparing it. It is very likely that in the near future we will see these AI agents offered as a product by service providers. These providers could take a company’s requirements and have AI agents produce fully tested, compliant code.
By outsourcing these technical tasks, companies would significantly reduce the time it takes to complete them, as well as the need for large in-house development teams. It seems the time has come for companies to consider the specific roles generative AI can play in order to maximize the value of their data programs and ultimately get much better results from their investment in these newly expanded areas.
Generative AI has long been known to have the potential to revolutionize the way organizations operate. However, its weaknesses will need to be overcome before organizations can implement it effectively and it can reach its full potential.
Organizations have yet to fully embrace AI-enabled data acquisition and integration. Those that adapt will be able to maximize the value of their investment and improve their fortunes.
This article was produced as part of TechRadarPro's Expert Insights channel, where we showcase the best and brightest minds in the tech industry today. The views expressed here are those of the author, and not necessarily those of TechRadarPro or Future plc.