Building a petabyte-scale data platform
PGS is gearing up its digitalisation efforts by cloud-enabling its massive subsurface library. Together with cloud experts from Computas, they are developing the foundations for a secure and robust digital marketplace for subsurface data.
PGS offers geophysical services to support exploration and production of energy resources offshore. For 30 years, PGS has been collecting subsurface data using purpose-built vessels towing seismic listening cables. The cables, packed with hydrophones and geophones, collect reflected sound waves like a giant subsurface ultrasound. By processing the several hundred gigabytes of field data each survey can yield, PGS produces 3D images and data describing structures and reservoirs, which they sell to a global market of energy companies.
With tens of petabytes of data collected over three decades, stored in multiple databases and formats, PGS naturally faces challenges with varying data quality and standardisation when moving the data to the cloud. PGS and Computas are building the data platform on Google Cloud Platform in a secure, scalable and robust way, with special consideration for data quality. Liberating the data from tapes will open up new commercial and technical possibilities.
Easier for both PGS and the customer
– PGS is building a digital marketplace for seismic data. The aim is to get rid of tedious manual processes and cut turnaround time, allowing us to present seismic data seamlessly in our digital showroom and offer our customers fast, secure, 24/7 access to the data they have licensed. This requires new solutions and a new culture, so getting it right is a journey, both technical and organisational, says Espen Grimstad, Project Manager Digitalisation at PGS.
PGS is working with Computas, a Google Cloud Premier Partner experienced in the digitalisation of companies.
– In terms of technology and the scale of data acquisition, networks, signal processing and 3D imaging, the seismic and energy sector is high-tech. However, in some areas, like cloud adoption, we still see great potential to improve. Therefore, we are relying on partners to make the cloud journey a successful one. Computas has both the expertise on Google Cloud Platform and the experience to give us valuable advice, says Grimstad.
A marketplace for the oil and gas industry
Due to the vast scale of the datasets, storing and delivering data on physical media, like tapes and disks, was until recently the norm across the seismic industry, including PGS. Handling the physical media could stretch a sale out over months and involve a lot of back-and-forth shipping. Thanks to digitalisation and automation of the value chain, the cloud-based data platform currently under construction can serve relevant data to clients in a fraction of a second, saving time and reducing the carbon footprint of a sale.
– Traditionally, the client would buy a licence and then have the seismic data delivered on physical media. Now, it is possible to offer new products, like subscription models, and deliver the data directly to the client, giving them access to their areas of interest on their own computer, Espen Grimstad explains.
– With PGS data in the cloud, the process of cutting tapes and shipping seismic data to the client is obsolete. With no physical handling of the data, clients get instant, 24/7 access, enabling them to respond to opportunities faster and develop new solutions. Having Computas’ cloud and data platform expertise within the team is invaluable for PGS in this process. Computas is in many ways helping us create this marketplace: they design and implement a modern cloud architecture, while challenging existing processes and solutions and acting as good advisors, says Grimstad.
Erik Ewig, Senior VP Technology & Digitalization at PGS, shares Grimstad’s experience.
– The competence, knowledge and positive attitude of Computas and their consultants have been a decisive factor in PGS reaching its goal of making our massive datasets digitally available to our clients. We will continue our close relationship with Computas as a partner in our digital transformation moving forward, says Ewig.
Big Data
Anders Elton, senior advisor for Data and Insight at Computas, is one of the cloud experts working with PGS.
– You need a lot of compute and storage resources to handle Big Data, and doing this in a cost-effective way is tricky, he explains.
– There are thousands of seismic files and petabytes of data with varying metadata quality. For example, if the seismic metadata mismatches the coordinate system actually used in the map projection of the seismic data, the data is of lesser value to the client. Imagine opening a Google Maps view of Oslo where all the street names are correct, but the actual drawing of the streets is from Stockholm, Sweden – you would certainly have a hard time finding what you are looking for! Fortunately, identifying issues like these is a primary focus for PGS and the project, enabling us to develop solutions that handle them well, using the entire toolbox of Google Cloud Platform, Elton continues.
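Elton’s map analogy corresponds to a concrete, automatable check. As a simplified illustration only (not PGS’s actual pipeline; the EPSG code and coordinates below are hypothetical), a few lines of Python with the pyproj library can flag a survey whose coordinates fall outside the declared coordinate system’s published area of use:

```python
from pyproj import CRS, Transformer

def crs_mismatch(declared_epsg: int, sample_xy: tuple) -> bool:
    """True if a sample trace coordinate reprojects outside the
    declared CRS's published area of use - a strong mismatch signal."""
    crs = CRS.from_epsg(declared_epsg)
    # Reproject the sample point to WGS84 longitude/latitude.
    to_wgs84 = Transformer.from_crs(crs, "EPSG:4326", always_xy=True)
    lon, lat = to_wgs84.transform(*sample_xy)
    # area_of_use.bounds is (west, south, east, north) in degrees.
    west, south, east, north = crs.area_of_use.bounds
    return not (west <= lon <= east and south <= lat <= north)

# Hypothetical survey declaring UTM zone 31N (EPSG:32631): if its
# coordinates reproject outside that zone's area of use, flag it for QC.
if crs_mismatch(32631, (431_000.0, 6_650_000.0)):
    print("Flag for manual QC: declared CRS does not fit the data")
```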
– Data will not reach the end user before it has gone through several data quality checkpoints – and we are building automation to correct the data quality issues. We use BigQuery as a central integration hub for metadata, so we can look at all the different databases combined in ways we have not been able to before, he says.
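The integration-hub idea can also be made concrete. The snippet below is a hypothetical sketch using the google-cloud-bigquery Python client; the dataset and table names are invented stand-ins, not PGS’s actual schema, but it shows how metadata landed from two source systems can be joined to surface disagreements for the quality checkpoints:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Hypothetical tables holding metadata landed from two legacy systems.
query = """
    SELECT a.survey_id,
           a.epsg_code AS epsg_in_catalog,
           b.epsg_code AS epsg_in_archive
    FROM `metadata_hub.survey_catalog` AS a
    JOIN `metadata_hub.tape_archive` AS b USING (survey_id)
    WHERE a.epsg_code != b.epsg_code
"""
for row in client.query(query).result():
    print(f"QC mismatch for survey {row.survey_id}: "
          f"{row.epsg_in_catalog} vs {row.epsg_in_archive}")
```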
Modern Tech Stack
The project harnesses the latest technologies the cloud has to offer: serverless, scalable pipelines on both Dataflow and Cloud Run; modern CI/CD using GitLab; and, of course, fully scripted infrastructure using Terraform.
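To give a flavour of what such a pipeline looks like, here is a minimal Apache Beam sketch of the kind Dataflow executes. It is illustrative only: the bucket, table and validation rule are invented stand-ins, not the project’s actual code, and the target table is assumed to already exist:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def tag_quality(record: dict) -> dict:
    # Hypothetical checkpoint: does the declared CRS match the detected one?
    record["crs_ok"] = record.get("declared_epsg") == record.get("detected_epsg")
    return record

# The same code runs locally or on Dataflow, depending on the runner
# selected in the pipeline options.
with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "ReadMetadata" >> beam.io.ReadFromText("gs://example-bucket/metadata.jsonl")
        | "Parse" >> beam.Map(json.loads)
        | "TagQuality" >> beam.Map(tag_quality)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:metadata_hub.survey_metadata",
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```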
Cognite Data Fusion (CDF) is a key component in serving and interpreting the seismic data. It manages the seismic data in cloud storage and makes it available through a modern API for clients to use, while enforcing access policies.
– Having a platform like CDF, which correctly contextualises seismic data, really opens up new opportunities, allowing you to do data-near analytics in the cloud instead of moving petabytes of data on-premises for further workflows.
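From a client’s perspective, data-near access might look like the following sketch using Cognite’s Python SDK (the cognite-sdk package). It is an assumption-laden illustration: authentication setup is elided, and the metadata filter and file choice are hypothetical:

```python
from cognite.client import CogniteClient

# Assumes a client already configured with project credentials; the
# exact auth setup depends on the SDK version and the tenant's policy.
client = CogniteClient()

# Discover seismic files registered in CDF (filter values are
# hypothetical), working against the API rather than raw tapes.
files = client.files.list(metadata={"type": "seismic"}, limit=10)
for f in files:
    print(f.name, f.id)

# Download bytes locally only when a workflow truly needs them.
client.files.download(directory="downloads", id=files[0].id)
```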
The project also delivers human-readable representations of seismic data and metadata, utilising various off-the-shelf visualisation tools like Google Data Studio and Microsoft Power BI. In addition, the project has built a customised application to give insight where off-the-shelf solutions fall short.
– I really enjoy working with PGS. They are open to trying any new technology to reach their goal of a fully liberated dataset. PGS has a mix of greenfield and legacy software, making it both challenging and fun to work with this client, Anders Elton says.
Technology used in the project
- Terraform
- GitLab
- Cognite Data Fusion
- SQL Server
- Python
- Google Data Studio
- Microsoft Power BI
- GCP Stack
  - BigQuery
  - Dataflow
  - Dataform
  - Cloud Run
  - Cloud Functions
  - Cloud Scheduler
  - Cloud Workflows
  - Cloud Storage
  - Cloud Alerting
  - Cloud VPC (Virtual Private Cloud)
  - Pub/Sub
  - Identity-Aware Proxy