Preparing data for AI often requires a cleanup, and AI can help with that
This article was first published on TU.no on November 6.
Successful AI depends on access to relevant, high-quality data. However, Anders Elton, Director of Data and Analytics at Computas, suggests there is hope for organizations with unstructured data they cannot leverage. Perhaps AI itself can help refine the inaccessible data in your business?
Big data, data platforms, data lakes, and data-driven processes have been on the agenda of every self-respecting business in recent years. Data has been called the new gold, and many have eagerly set out to mine it, hoping for insights and automation and, for those who dared to think ahead, the possibility of using data in AI solutions.
However, not everyone who embarked on this journey struck gold. Becoming data-driven and implementing AI requires good data quality: the data must be accurate, complete, up to date, and consistent. Often there is a lot of cleaning up to do before you can even press the AI button.
“It requires good housekeeping. Historically, data has often been a byproduct where no one is responsible for correcting errors and deficiencies, especially when it comes to historical data,” says data expert Anders Elton at Computas.
“Even in large companies, where they may have many datasets, there is rarely someone managing these datasets. If you discover something that needs to be fixed, you probably have no one to ask or report the issue to.”
1995, 95, or ninety-five?
Elton uses seismic data as an example. In oil and gas exploration, sound cannons are used to shoot low-frequency sound waves into the seabed. When a seismic company shoots seismic data, it records when the data was shot and when it was processed. The data is then sold to oil companies.
“Seismic companies have been in business since the early days of the data age. Much of the early work involved manual input and lacked the focus on data quality we expect today. Geologists sometimes tagged the data with, for example, ‘1995’, other times with ‘-95’, or with ‘ninety-five’ spelled out in letters. That makes the data hard for machines to work with unless you do significant formatting work first,” Elton explains.
Through his role as an advisor at Computas, he has helped numerous clients in both the private and public sectors gain control of their data and become more data-driven.
“The larger the company, the more challenging it can be. They often have too much data and a structure that isn’t conducive to handling it. Lack of trust in the data can be a significant barrier to innovation,” says Elton.
However, he remains optimistic for those who are still far from being data-driven, and even further from using AI.
“Think big, but start small. Begin by identifying datasets with satisfactory quality, tag them, and make them accessible within the organization,” he advises.
“The goal is to trigger the desire to explore new opportunities. Suddenly, someone may spot potential in a data source that no one else noticed. A data-driven culture can prove to be an invaluable competitive advantage, especially for those who can adapt quickly.”
Ethical and legal considerations
Ethics and legality receive a lot of attention when it comes to adopting AI. Elton encourages starting with datasets where there are few ethical and legal pitfalls.
“Of course, you must ask whether the data contains personal information or trade secrets, whether it’s okay to send it to the cloud, whether it must then be stored in the EU, and what it may and may not be used for. There is also the ethical aspect: should you use it at all? But focus primarily on datasets where these questions are relatively easy to answer, rather than getting stuck on the issues.”
Once an ethical and legal assessment has been made for a dataset, Elton recommends publishing the results in a data catalog. That way, others in the organization can use the data without having to reassess it every time.
“In large organizations, it can be a challenge to access datasets from various internal sources, but this is where much of the innovation opportunity lies. It’s useful if you can look up in a catalog and see what data you have and what guidelines are associated with it, such as whether it can be used for AI.”
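A catalog entry of this kind can be sketched as a small record that stores the outcome of the one-time assessment, plus a lookup that answers “can this be used for AI?”. This is a minimal illustration under our own assumptions, not a description of any particular catalog product: `CatalogEntry`, its fields, `usable_for_ai`, and the example datasets are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One dataset in a hypothetical internal data catalog.

    The flags record the result of a legal and ethical assessment made
    once, so later users can look it up instead of redoing it.
    """
    name: str
    owner: str                    # who to ask or report data issues to
    contains_personal_data: bool
    contains_trade_secrets: bool
    cloud_allowed: bool           # may the data be sent to the cloud?
    eu_storage_required: bool     # must it then be stored in the EU?
    approved_for_ai: bool         # cleared for use in AI solutions


def usable_for_ai(catalog: list[CatalogEntry], *, need_cloud: bool = False) -> list[str]:
    """Return names of datasets already cleared for AI use.

    If need_cloud is True, only datasets that may be sent to the cloud
    are returned.
    """
    return [
        entry.name
        for entry in catalog
        if entry.approved_for_ai and (entry.cloud_allowed or not need_cloud)
    ]


# Hypothetical example catalog.
catalog = [
    CatalogEntry(name="seismic_surveys", owner="Exploration data team",
                 contains_personal_data=False, contains_trade_secrets=False,
                 cloud_allowed=True, eu_storage_required=True, approved_for_ai=True),
    CatalogEntry(name="customer_contracts", owner="Legal",
                 contains_personal_data=True, contains_trade_secrets=True,
                 cloud_allowed=False, eu_storage_required=True, approved_for_ai=False),
    CatalogEntry(name="well_logs_onprem", owner="Subsurface IT",
                 contains_personal_data=False, contains_trade_secrets=True,
                 cloud_allowed=False, eu_storage_required=False, approved_for_ai=True),
]

print(usable_for_ai(catalog))                   # ['seismic_surveys', 'well_logs_onprem']
print(usable_for_ai(catalog, need_cloud=True))  # ['seismic_surveys']
```

The `owner` field reflects Elton's earlier point that there is often no one to ask when something in a dataset needs fixing.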
AI as goal – and means
Elton suggests that AI, besides being the goal for many who want to refine their data, can also be a means of refining it.
“Today, we can use generative AI and large language models to address challenges like tagging years, as mentioned in the seismic example. AI can be used to improve data quality where it falls short of today’s standards.”
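To show what the year-tagging problem looks like in practice, here is a minimal Python sketch of a deterministic pass that resolves the variants from the seismic example (‘1995’, ‘-95’, ‘ninety-five’). It is a rule-based technique swapped in for illustration, not the LLM approach Elton describes. The idea is that the easy cases get resolved by rules and anything left over (returned as `None`) is routed to a language model or a human for review. The function names and the two-digit pivot convention are our own assumptions.

```python
import re
from typing import Optional

# Number words, enough to cover years written like "ninety-five".
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def words_to_two_digits(text: str) -> Optional[int]:
    """Parse 'ninety-five', 'ninety five', 'eighty' or 'twelve' into 1-99."""
    parts = text.replace("-", " ").split()
    if len(parts) == 1:
        word = parts[0]
        return UNITS.get(word, TEENS.get(word, TENS.get(word)))
    if len(parts) == 2 and parts[0] in TENS and parts[1] in UNITS:
        return TENS[parts[0]] + UNITS[parts[1]]
    return None


def normalize_year(raw: str, pivot: int = 30) -> Optional[int]:
    """Map '1995', '-95', "'95", '95' or 'ninety-five' to 1995.

    Two-digit years below `pivot` are read as 20xx, the rest as 19xx
    (an assumed convention; choose one that fits your archive).
    Returns None when the value cannot be parsed, so it can be sent
    to an LLM or a person for review.
    """
    s = raw.strip().lower()
    if re.fullmatch(r"(19|20)\d{2}", s):
        return int(s)
    match = re.fullmatch(r"[-'’]?(\d{2})", s)
    if match:
        yy = int(match.group(1))
    else:
        yy = words_to_two_digits(s)
        if yy is None:
            return None
    return 2000 + yy if yy < pivot else 1900 + yy


samples = ["1995", "-95", "ninety-five", "'03", "mid-nineties"]
results = [normalize_year(s) for s in samples]
# results == [1995, 1995, 1995, 2003, None]; "mid-nineties" needs review
```

Rules like these are cheap and predictable for the common formats. A language model is most useful for the long tail of free-text values the rules leave unresolved, and its answers can be checked against the same expected range.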
Elton says a good starting point is to look at the short-term possibilities available, so you can get started, test, and learn.
Computas offers what they call an AI Discovery Workshop. Together with the client’s key personnel, they examine the possibilities, the business needs, and the available data. Computas also assists in assessing laws and regulations and finding avenues for utilizing the data.
“Customers don’t need to have a clear idea of this before such a workshop. Right now, many are eager to get started with AI without a clear idea of what to do or how. Our experience is that this workshop helps clients get started,” says Elton.
“Just having clarity on which data can be moved to the cloud is incredibly valuable. When these things fall into place, good ideas often follow.”