Semantic data platforms: How AI gives context to your data
Numbers are dull.
They're useful, but still dull.
Consider your typical data research:
"89% of enterprise customers chose platform X over Y."
"There was a 50% increase in ad spending QoQ in vertical Z"
"The use of semantic data layers is growing 3x year over year"
This tells me that something is more popular than something else.
That there may be an opportunity to sell ads in a vertical.
Or that a concept is experiencing strong growth.
But it leaves what truly matters aside.
Why processing data without context is dangerous
Your business needs data to make informed decisions about where to go next. But that data is often skewed towards numerical analysis rather than knowledge, experience, and real-world facts.
Historically, computers have processed numbers more easily than “content” (e.g. the format of a presentation design, the text and structure of a Word file, spreadsheet cells, and more).
But using raw data to make decisions without context means that:
- You might push for a project against a trend that dies down a few months later
- There may be valuable internal resources you're not leveraging
- You could overlook knowledge that already exists
There are a ton of gaps in the way organisations process and use their data.
Those gaps are hard to identify unless you give the data context.
Why should you care about data semantics?
Data semantics is a deep technical concept.
But there's a single key takeaway you need to know:
– The primary concern of semantics is to give context (and therefore meaning) to something –
And there’s a fundamental question to ask in this regard:
"How do I turn my data into actionable knowledge?"
This is not a trivial question, and it can only be answered with proper data semantics, especially if you have large volumes of data sitting in varied formats across many sources.
What value do you get from semantic data?
Extracting context from potentially millions of documents is mind-numbing enough that, in 90%+ of cases, the effort doesn't even seem worth thinking about.
Your competitors know this.
They know that you have a ton of data to work with.
And they know that it's hard to manage.
That's why they try to undermine your business with "The next big thing in healthcare," or "Enhanced trade reporting," or even "The digital bank that makes traditional banks obsolete."
But these are just claims.
As a large organisation, what you have is:
- Decades of research, studies, documentation, and knowledge from senior consultants.
- Experience that spans across a variety of industries, verticals, and areas of study.
- Tools and resources to bring this knowledge together into innovative products.
It's this vastness of information that forms the backbone of your business, and it's the biggest advantage you have against startups, disruptors, and competitors.
The value that you get from implementing a well-structured semantic layer in your data platform isn't just incremental; it unearths growth insights that were previously inaccessible to the business.
Semantics help you discover knowledge that you may never have thought you had.
Not because you hadn't done the research, but because connecting the dots between R&D, product development, and everything revolving around the customer is an immense task.
One that a semantic data layer solves for you.
The semantic data platform: Why it matters
Your organisation already leverages "data platforms," whether that be a CRM, an ERP, an on-premise solution—you name it. But the key issue of extracting real-world insights remains.
Some examples of questions that often arise in a business:
- Customer X churned after a year of working with us.
Why? What's the context around them leaving?
- 70%+ of employees say they don't feel close to their colleagues.
Why? What are they experiencing that management can't see?
- Team Y is 200% more productive than team Z despite the projects being similar.
Why? What is team Y doing that team Z hasn't implemented yet?
- What active ingredients did we research in the past?
What were the findings? Can we use them in today's research?
These examples are concerned with both quantitative and qualitative aspects. Yet most businesses stop at the quantitative part—because that's what traditional data platforms do best.
So you're left with a binary choice:
- Continue down the path of simply "trusting the numbers", or
- Provide real context around those numbers and enable 360-degree analysis.
The numbers will provide you with value.
But without the proper context, they can be misinterpreted.
The great thing is—there is a solution.
And the cost and necessary skills are becoming much more accessible.
How semantic data platforms work: A brief overview
The semantic data platform—also known as a “knowledge engine”—is a piece of software made specifically with the purpose of extracting real-world meaning from all your data.
Not just your customer records, nor just your teams’ productivity.
With a true data platform, migrating data from multiple sources is straightforward, and the real-world context that comes with it can be extracted, analyzed, and refined at any time.
I say “migrating” because having a dedicated platform specifically meant for knowledge discovery is much more effective than implementing a semantic layer for each of your data sources.
Let me break it down in 6 simple steps:
1. Breaking through data silos
The problem of data silos has been talked about for years, but it’s still foundational to a good knowledge discovery implementation. Your first question with semantics should be:
“Where does my data reside?”
You likely have a large body of information available to you, some of which is hidden in old legacy systems. Combining disparate data often yields the greatest insight.
Before you even think about what context to extract from your data, you should know where your data is and figure out its composition (spreadsheets, Word files, presentations, etc).
A good data platform will do the latter for you.
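To make that inventory step concrete, here is a minimal sketch of counting files by format. The function names and the sample file names are my own assumptions for illustration, not part of any particular platform, and a real inventory would also cover databases and legacy systems rather than just file shares.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def summarise_formats(paths: Iterable[str]) -> Counter:
    """Count files per lower-cased extension; files without one go under '(none)'."""
    counts: Counter = Counter()
    for p in paths:
        suffix = Path(p).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


def inventory(root: str) -> Counter:
    """Walk a directory tree and summarise the formats found in it."""
    files = (str(p) for p in Path(root).rglob("*") if p.is_file())
    return summarise_formats(files)


if __name__ == "__main__":
    # Hypothetical file names standing in for a real share or archive.
    sample = ["reports/q1.XLSX", "reports/q2.xlsx", "decks/pitch.pptx", "notes/README"]
    for ext, n in summarise_formats(sample).most_common():
        print(f"{ext}: {n}")
```

Calling inventory() on each known location gives a rough picture of what you hold before any context is extracted.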
2. Ontologies & taxonomies
Identifying your entire body of available data is a great step towards achieving knowledge discovery capabilities. But the core value of a semantic platform is to give that data context.
Ontologies and taxonomies are how that context gets defined. These massive “vocabularies” provide formal, technical descriptions of a domain with its classes, relationships, and any real-world context that comes with that combination.
For example, here’s a (very simple) ontology:
Here, company World Inc. has contracted company Hello, Inc. to work on a project called “Semantic layer,” with Janett Reid being the customer’s spokesperson, Kelly Vaughn being the project owner, and Robert Zeist working under Kelly Vaughn to complete the project successfully.
Apart from the entities (Company, Employee, Freelancer, and Project), what truly matters here is the relationship between them: “Has a business partnership with”, “Works for”, “Works with”, etc.
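One common way to write such an ontology down is as subject-predicate-object triples. The sketch below encodes the example that way. Several details are my assumptions rather than facts from the example: that Robert Zeist is the freelancer, that Kelly Vaughn works for Hello, Inc., and the "owns" label for project ownership.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Entity types from the example. Who the freelancer is remains an assumption.
ENTITY_TYPES = {
    "World Inc.": "Company",
    "Hello, Inc.": "Company",
    "Semantic layer": "Project",
    "Janett Reid": "Employee",
    "Kelly Vaughn": "Employee",
    "Robert Zeist": "Freelancer",  # assumption
}

TRIPLES = [
    Triple("World Inc.", "has a business partnership with", "Hello, Inc."),
    Triple("Janett Reid", "works for", "World Inc."),
    Triple("Kelly Vaughn", "works for", "Hello, Inc."),  # assumption
    Triple("Robert Zeist", "works with", "Kelly Vaughn"),
    Triple("Kelly Vaughn", "owns", "Semantic layer"),  # hypothetical predicate label
]


def related(entity: str) -> List[Triple]:
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in TRIPLES if entity in (t.subject, t.obj)]


if __name__ == "__main__":
    for t in related("Kelly Vaughn"):
        print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

The point of the structure is that the relationships become queryable data instead of staying implicit in a diagram.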
Deriving these relationships manually is a tedious, time-consuming task that is hard to justify due to its sheer complexity. But machines are now capable of doing this quickly and effectively.
That’s where a data model comes in.
3. Creating a semantic data model
A semantic data model is a “repository” of concepts and relationships.
You can think of it as a building block of your organisation’s collective knowledge.
For example, a data model could focus around:
- Experimental drugs
- Customer journeys
- Financial trades
Each model is made up of a collection of entities with relationships, similar to ontologies. The key difference is that data models are applied directly to your database.
Ontologies represent the real-world business specification.
Models represent the actual database structure.
So if you were to extract information about a specific drug from your data, you’d create a model off of your existing ontologies—or you’d create one from scratch if you didn’t have any.
Through a series of machine learning algorithms, that model would be matched against the text and metadata available in your documents, extracting the entities relevant to you.
The sheer size and variety of your data works for you, providing the volume of training data necessary to create refined algorithms. This enables the foundational value of a semantic data platform.
From here onwards, you can effectively use the extracted information to achieve a variety of business use cases: knowledge graphs, cognitive search, automated compliance checks, and more.
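To illustrate what "matching a model against text" means, here is a deliberately simplified sketch. It swaps the machine learning algorithms for a plain dictionary lookup with regular expressions, so treat it as a stand-in for the idea rather than how a production platform does it. The model contents and function name are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical model: entity type -> known surface forms.
DRUG_MODEL: Dict[str, List[str]] = {
    "Drug": ["aspirin", "ibuprofen"],
    "Condition": ["headache", "inflammation"],
}


def extract_entities(text: str, model: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    hits = []
    for entity_type, terms in model.items():
        # Word boundaries avoid matches inside longer words; terms are escaped.
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        for m in pattern.finditer(text):
            hits.append((entity_type, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    for hit in extract_entities("Aspirin reduced headache severity.", DRUG_MODEL):
        print(hit)
```

A trained model would also catch misspellings, synonyms, and entities it has never seen listed, which a fixed dictionary cannot.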
4. Data lakes or data warehouses?
We’ve discussed the building blocks of your semantic platform.
But there’s a key operational part missing…
Where does this new knowledge go?
In this area, you’ll hear about 2 types of data patterns:
- Data lake / Data warehouse
- Data hub
You’ll even hear about “data lakehouses” which are a combination of data lakes and warehouses.
In short, a data lake is a huge repository of data that can store any format: structured, semi-structured, and unstructured. A warehouse aggregates data into a standard format that is analytics-ready.
Both have a role in a semantic data platform.
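The difference between the two is easier to see in a toy example. In this sketch the "lake" is a dictionary that accepts any raw payload, while the "warehouse" only accepts rows in one fixed schema. All names, fields, and sample payloads are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TradeRow:
    """Warehouse row: one fixed, analytics-ready schema (hypothetical)."""
    trade_id: str
    amount: float
    currency: str


# The "lake": raw payloads of any format, kept as-is with a format tag.
lake: Dict[str, dict] = {}


def put_raw(key: str, payload: bytes, fmt: str) -> None:
    """Store a payload untouched; no schema is required."""
    lake[key] = {"format": fmt, "payload": payload}


def to_warehouse(key: str) -> TradeRow:
    """Normalise one raw JSON object from the lake into the warehouse schema."""
    entry = lake[key]
    if entry["format"] != "json":
        raise ValueError(f"no loader for format {entry['format']!r}")
    record = json.loads(entry["payload"])
    return TradeRow(str(record["id"]), float(record["amount"]), record["ccy"].upper())


put_raw("trade-1", b'{"id": 1, "amount": "250.5", "ccy": "eur"}', "json")
put_raw("memo-1", b"raw bytes of a scanned memo", "pdf")

if __name__ == "__main__":
    print(to_warehouse("trade-1"))
```

The memo sits in the lake with no structure at all, which is exactly the kind of content a semantic layer is meant to make sense of.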
But the true semantic layer sits between the data hub and the business user.
The architecture of a semantic layer involves many moving parts, most of which are invisible to the business user. Ontologies, models, entities, machine learning algorithms, and so on.
These all live in the semantic layer, which is more of an architecture than a single tool. Here you’ll find multiple technologies dancing in unison to achieve your desired business outcome.
5. Implementing an effective semantic layer
There’s no hiding that implementing a semantic layer is a complex process. There are architectural challenges, business challenges, and technical challenges to overcome all at once.
Here, you have two options:
- Implement your own semantic data platform from scratch
- Use an existing framework to speed up development
Both are relevant depending on your needs, but the latter is the simpler option. There are many frameworks available—and even fully-fledged, plug-and-play data discovery platforms.
Where a proprietary solution isn’t an option, it’s important that you cover each concept one at a time rather than all at once. Start with listing the data, know what you want to extract (ontologies), store the data in a data warehouse, create the models (machine learning), and then develop the user interface.
Each of these parts requires consulting from experts in the respective fields as there are many pitfalls you’ll want to avoid during the design and development processes.
6. Presenting your data as knowledge
One of the key things to avoid with semantic data platforms is to start from the consumption end: the knowledge discovery interface. This is where a business user will perform their tasks.
To offer an amazing user experience, you have to get the underlying data engine right first.
If the semantic layer isn’t properly architected in the first place, it’ll be very hard to inform how the user interface should be designed. You won’t have a clear Application Programming Interface (API) for it.
A good semantic layer should expose an API that:
- Is relevant to the business specifications
- Provides thorough functionality against those specifications
- Is modular, secure, flexible, and can be easily modified
There’s a whole aspect to API design that I won’t touch in this article, but it’s important to know that the user experience is as much about the API as it is about the buttons the user will click.
“A proper semantic layer must have an exposed programmatic and data access application programming interface (API).”
Once the API is exposed, you can build the user interaction on top of it:
- Your API allows for uploading new data models?
Create a user flow that offers step-by-step instructions on how to do so.
- What if the user wants to modify an entity?
The user interface should reflect that API method for each model.
- Have a knowledge graph endpoint?
Build the visual representation for a user to consume.
It’s important that the API informs the user experience and not vice versa because it’s easier to implement thoughtful user interaction when you have documented functionality to back it up.
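As a rough illustration of the three interactions above, here is an in-memory sketch of what such an API surface might look like. The class and method names are hypothetical, and a real semantic layer would expose them over a network protocol with authentication rather than as a local Python object.

```python
from typing import Dict, List, Tuple


class SemanticLayerAPI:
    """Hypothetical in-memory stand-in for a semantic layer's API surface."""

    def __init__(self) -> None:
        # model name -> {entity name -> attributes}
        self._models: Dict[str, Dict[str, dict]] = {}
        self._edges: List[Tuple[str, str, str]] = []

    def upload_model(self, name: str, entities: Dict[str, dict]) -> None:
        """Register a data model; refuse silent overwrites."""
        if name in self._models:
            raise ValueError(f"model {name!r} already exists")
        self._models[name] = {k: dict(v) for k, v in entities.items()}

    def update_entity(self, model: str, entity: str, **changes) -> dict:
        """Modify one entity of one model and return its new state."""
        attrs = self._models[model][entity]  # KeyError if either is unknown
        attrs.update(changes)
        return dict(attrs)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Record a relationship between two entities."""
        self._edges.append((subject, predicate, obj))

    def knowledge_graph(self) -> Dict[str, list]:
        """Return nodes and edges in a shape a user interface could render."""
        nodes = sorted({n for s, _, o in self._edges for n in (s, o)})
        return {"nodes": nodes, "edges": list(self._edges)}
```

Each method maps to one user flow: a guided upload, an edit form per entity, and a graph view fed by knowledge_graph().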
What are the challenges of implementing a semantic layer?
There are important challenges to overcome both at a business and at an architectural level when implementing a semantic layer. The major challenge is dealing with data in the right order.
Depending on your case, you have to consider:
- What problem you are trying to solve (lowering costs, speeding up research, etc)
- How much data you currently have and where it’s stored
- The information you’re trying to extract from it
- What steps are necessary to get there
Following a strict, iterative process, you can go from business requirements to end product in a few months, provided that you have a team of experts in certain technologies.
See an example of a complete technical architecture below:
Does implementation present any risks?
Any complex project involves risk, and semantic data platforms are no different.
Four common risks you’ll be facing are:
- Architectural failure – When the data platform is architected upon false assumptions around what the business needs to extract knowledge from its data, leading to quality issues.
- Technical failure – When the implementation doesn’t match business expectations or it fails to provide the value it promised due to bugs, errors, and a bad user experience.
- Failure due to complexity – When the team becomes overwhelmed with both architectural and technical complexities and the project comes to a halt before it gets deployed.
- Low return on investment – When the semantic data platform is technically deployed, but you run into high costs, lower than expected business value, or both.
These can be mitigated by ensuring that your team is made up of roles relevant to this type of project: a senior data architect, a business analyst, a database expert, and 3-4 developers.
The risk can also be reduced by contracting an external partner with whom you can draft a Service-Level Agreement (SLA) which will feature clauses on indemnification.
Semantic data platforms: 3 powerful use cases
Going through the effort of investing in data semantics clearly needs to provide significant returns for the business. The good news is that there are many proven use cases already:
Use case #1: Research & development
Thanks to improved data discovery, your R&D team can find the information they need faster and more accurately than with traditional data analysis methods.
It’s one of the biggest benefits of semantic platforms, allowing for:
- Contextual search, which can evolve into “cognitive” search.
- Unstructured document analysis (.docx, .ppt, .xlsx, etc).
- Real-time entity extraction and data monitoring.
Forrester has a clear graph showing the use cases possible in this domain.
Use case #2: Compliance reporting
Another big area where semantic platforms shine is compliance, allowing you to clearly track the lineage of your data (where data first originated and how it moved over time).
Enterprises that have faced regulatory scrutiny for their data practices know how hard it is to retroactively infer where the data comes from, who created it, and why it was stored.
A well-implemented semantic layer can do this at scale.
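In its simplest form, lineage is an ordered log of what happened to a piece of data. The sketch below records where a document originated, who touched it, and why. The class names, fields, and sample values are assumptions for illustration, not a description of any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class LineageEvent:
    action: str   # e.g. "created", "copied", "transformed"
    system: str   # where the action happened
    actor: str    # who or what performed it
    reason: str   # why the data was created or moved
    at: datetime


@dataclass
class TrackedDocument:
    doc_id: str
    events: List[LineageEvent] = field(default_factory=list)

    def record(self, action: str, system: str, actor: str, reason: str) -> None:
        """Append one lineage event stamped with the current UTC time."""
        self.events.append(
            LineageEvent(action, system, actor, reason, datetime.now(timezone.utc))
        )

    def origin(self) -> LineageEvent:
        """The first recorded event: where the data originated."""
        return self.events[0]

    def path(self) -> List[str]:
        """Systems the document passed through, in order."""
        return [e.system for e in self.events]


if __name__ == "__main__":
    doc = TrackedDocument("contract-42")
    doc.record("created", "crm", "alice", "customer onboarding")
    doc.record("copied", "warehouse", "etl-job", "quarterly reporting")
    print(doc.origin().actor, "->", " -> ".join(doc.path()))
```

Recording these events as the data moves is far cheaper than reconstructing them after a regulator asks.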
Use case #3: Process automation
Logistics companies in particular (but other industries too) need to manually approve a lot of documents at specific intervals, requiring roles almost exclusively dedicated to quality control.
By extracting entities automatically and providing all the information necessary for auditors to check that a process is being run as expected, these companies can save a lot of time.
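Here is a small sketch of that kind of check, assuming entities have already been extracted into a dictionary. The document type, required fields, and function name are all hypothetical. The check only flags what is missing, and the final decision stays with the auditor.

```python
from typing import Dict, List

# Hypothetical rule set: fields an auditor needs for each document type.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "bill_of_lading": ["shipper", "consignee", "carrier", "gross_weight_kg"],
}


def review(doc_type: str, extracted: Dict[str, str]) -> Dict[str, object]:
    """Report which required fields are missing or blank in the extracted data."""
    required = REQUIRED_FIELDS[doc_type]  # KeyError for unknown document types
    missing = [f for f in required if not extracted.get(f, "").strip()]
    return {"ready_for_audit": not missing, "missing": missing}


if __name__ == "__main__":
    incomplete = {"shipper": "A", "consignee": "B", "carrier": " "}
    print(review("bill_of_lading", incomplete))
```

Documents that pass can go straight to the auditor with everything in one place, while incomplete ones are sent back before anyone spends time on them.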
These are just three of the possible use cases.
Depending on your needs, you can build fully-custom enterprise applications on top of a semantic layer, and deliver value that’s unique to your customers and team members.
Should you invest in a semantic data platform?
The choice to invest in semantics is entirely up to you.
My goal with this article is not to give you an overly generic view, but to help you understand why so many organisations are looking at semantic data layers as value drivers.
The core strength of a semantic layer is that it provides qualitative insight into your data.
This is particularly relevant for organisations in:
- Healthcare, pharmaceuticals, and medical devices
- Financial trading, insurance, and banking
- Life sciences, publishing, and media
One of the key capabilities I’ve seen while running a data consulting company is “historical search”: unearthing information that was researched decades ago but is still relevant today.
Traditional enterprise search can only do so much in this area.
What you’ll get is a huge list of results to filter through.
By implementing a semantic layer, your data platform can flip this scenario on its head and only provide results that are relevant to that specific concept you want to know about.
It’s the difference between knowledge and data.
It provides context, not just numbers.