In Australia, the Peter MacCallum Cancer Centre and the John Holland Group, an infrastructure and construction company, have turned to cloud data and AI platform Databricks to solve significant data fragmentation issues that were hampering their ability to extract insights from business data.
Speaking at Databricks’ Data + AI World Tour in Sydney, Australia, last month, technology leaders from both organizations described challenges such as siloed data, competing business areas, data integration issues, and legacy systems, which led them to seek a cloud data solution.
Peter MacCallum Cancer Centre consolidates data to use AI
Peter Mac’s legacy data infrastructure limited its ability to effectively leverage big data and AI across its extensive clinical and research operations. Legacy technology also jeopardized its mission to improve the lives of people with cancer, including using AI to improve clinical decision-making and accelerate biological insights and drug discovery.
Problems with data infrastructure
During the conference, Jason Li, director of the bioinformatics center in the Peter Mac Division of Cancer Research, said:
- Peter Mac was dealing with a variety of data silos and legacy systems.
- The complexity and volume of clinical and research data in cancer center operations posed challenges in areas such as data storage and analysis.
- Ethical, privacy and security concerns were key drivers for Peter Mac’s data governance and implementation of any future AI use cases.
- Integration between clinical and research departments complicated the data governance challenge because each had different data requirements.
Li said Peter Mac selected Databricks to help it harmonize data across the facility and support advanced analytics, including artificial intelligence, while meeting data security and privacy requirements in healthcare.
Expanding to new AI use cases
Peter Mac first tested the AI potential of the Databricks platform with an AI transformation pilot project:
- The center created an end-to-end AI lifecycle, which involved applying deep learning to analyze gigapixel whole-slide images to quantify a novel biomarker for breast cancer prognosis.
- Databricks supported the AI lifecycle (from initial data ingestion to model deployment and monitoring), which Li said made the project time- and cost-efficient.
- The results of the project could be “very promising” for improving the prognosis of breast cancer.
Li said the speed of the project was a huge advantage: “We estimate that with Databricks, we have accelerated the development process fivefold and reduced communication costs between stakeholders tenfold, allowing us to bring innovations to market sooner to benefit patients.”
AI strategy now includes future projects
AI has become an increasingly important part of Peter Mac’s strategy. Databricks is supporting the cancer center in three additional use cases: genomics, radiation oncology, and cancer imaging. In addition, Peter Mac is:
- Expanding the AI program to include mainstream bioinformatics, including population genetics projects involving large sample sizes and vast amounts of genomic data.
- Applying advances in large language models and retrieval-augmented generation to extract knowledge from clinical and radiology reports.
- Planning to implement LLMs for research in genomics and transcriptomics (the analysis of RNA transcripts, collectively known as the transcriptome) to remain competitive in cancer research.
John Holland aims to unify data from all construction operations
Meanwhile, by 2023, John Holland was managing 80 large-scale infrastructure projects worth A$13.2 billion. However, Travis Rousell, the company's chief data and analytics officer, said its legacy data storage environment was fragmented and difficult to integrate.
“We have all the typical issues that we’ve all had historically with data warehouses and data problems,” Rousell said. “Our legacy data warehouse environment was built incrementally over 20 years. It’s evolved and developed slowly, and we’ve created this really swampy set of data silos.”
Rousell added: “We could build BI [Business Intelligence] and reporting on the front end of those, but bringing that data together to be able to create insights into the flow of activities and behaviors that are happening so that we can drive change across our business has been a really difficult process for us.”
A unified data platform to deliver actionable insights
John Holland set out to create a unified data platform to leverage data and generate business value. The effort forms part of a broader digital transformation drive, through which the group aims to use modern data and digital practices to drive innovation and gain a competitive advantage in its industry.
The organization has sought to:
- Provide a unified, integrated view of data across the enterprise.
- Manage data governance across separately managed projects.
- Focus on data engineering rather than platform engineering.
Cost savings come from better data management
To date, John Holland has delivered several core business processes to the Databricks data lake, including project management, project operations, project controls, security, and fleet analytics.
As a result of using Databricks, Rousell said John Holland had:
- Reduced platform infrastructure costs by 46% on comparable workflows compared with its traditional environments.
- Reduced the data engineering time and effort needed to create new data products and models by 30%.
- Migrated over 600 users to data products delivered through the Databricks data lake.
IT becomes an enabler for John Holland's business
Rousell said Databricks ensures that IT and technology do not limit business progress.
“I think the most important thing we're achieving with this is that we're creating a 'yes' data culture within John Holland,” Rousell explained. “Historically, the difficulty in delivering new and innovative products has meant we've had to take on large, slow projects and not meet business expectations.
“Now, if the company has an idea, we can say yes; we can deploy a data workspace that gives them access to all the capabilities and tools they’ll need, and they can develop it at the speed they want.”