How One Life Sciences Company Wrangled Their Data Sprawl
Scientists set out to build a revolutionary map of tumor tissues — but their data volume threatened the results.
Written by Alex Chang | 3 min • April 03, 2025
Scientists at the National Physical Laboratory (NPL) are creating what they call a "Google Earth of cancer" – a revolutionary molecular-level map of tumor tissues that promises to transform our understanding of the disease. But behind this groundbreaking research lies a challenge that's becoming increasingly common in modern scientific institutions: how to manage the tsunami of data generated by today's scientific instruments.
"We need to be able to protect and manage vast amounts of information to support research initiatives," explains Nigel Budd, Science Support Leader for NPL's IT Services Unit. "If there [is] an issue with data integrity or availability, it could have a massive impact on the quality and continuity of scientific research."
This project alone will generate around 500 terabytes of data for NPL, equivalent to the content of roughly a quarter of all US academic research libraries.
"The amount of data generated by genomics research alone will generate between 2 and 40 exabytes of data within the next decade. "
The exponential growth in scientific data isn't unique to NPL. Genomics research alone will generate between 2 and 40 exabytes of data within the next decade, according to the National Human Genome Research Institute. This acceleration is driven by increasingly sophisticated instruments and imaging systems that capture more detailed information in less time than ever before.
For NPL, which has been a cornerstone of British scientific research since 1900, the data management challenge threatened to impede its mission of advancing scientific discovery. The organization's legacy system relied on weekly tape backups, a process so time-consuming it required a full day of staff time every week. With hundreds of employees around the U.K. collaborating with hundreds of scientific organizations globally, this manual approach was becoming untenable.
More critically, the outdated approach was ill-suited for modern scientific collaboration, which increasingly requires seamless data sharing across institutions and continents.
"People need to be able [to] search [not just] their own data but also across data from other research projects as there might be a critical nugget of information that is needed further down the line," says Budd. The challenge was particularly acute for NPL's advanced high-resolution imaging systems, which generate enormous data sets in a fraction of the time older instruments required.
To support this ambitious goal, NPL needed a fundamentally new approach to data management. The solution came in the form of a hybrid approach that combines cloud storage for historical data with local storage for active research projects. Working with Hitachi Vantara, NPL implemented an object storage system that makes data searchable and accessible to authorized users regardless of location or device. This was particularly crucial for the cancer mapping project, which involves collaboration across multiple research institutions.
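As an illustration of how that hybrid split might work in practice, the sketch below archives a completed project's files from local storage to an S3-compatible object store, a common interface for object storage platforms. This is a minimal sketch under that assumption; the endpoint, bucket, and project layout are hypothetical placeholders, not NPL's actual configuration.

```python
# Minimal sketch: archive a finished project's files from local storage
# to an S3-compatible object store. All names below are hypothetical.
from pathlib import Path

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.org",  # hypothetical endpoint
)

def archive_project(local_dir: str, bucket: str, project_id: str) -> None:
    """Upload every file under local_dir, keyed by project ID so
    historical data stays organized and discoverable after archiving."""
    root = Path(local_dir)
    for path in root.rglob("*"):
        if path.is_file():
            key = f"{project_id}/{path.relative_to(root)}"
            s3.upload_file(str(path), bucket, key)

archive_project("/data/tumor-map/run-042", "archive", "tumor-map")
```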
The solution includes specialized data migrator software to ingest data from partner organizations and upload granular metadata, a critical feature in the scientific sector, where detailed information such as experimental conditions and parameters must be preserved alongside the raw data. The system also provides detailed utilization metrics and reports that simplify capacity planning and help NPL manage storage resources across different research projects.
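Object stores commonly let callers attach key-value metadata at upload time, which is one way to keep experimental context bound to the raw files. A brief sketch using the same S3-style API as above; the field names are invented examples of the kind of context a lab might record, not NPL's actual schema.

```python
# Sketch: attach experimental metadata to an object at upload time so
# conditions and parameters travel with the raw data. Field names and
# values are hypothetical examples.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.org",  # hypothetical endpoint
)

s3.upload_file(
    "scan-0042.tiff",
    "archive",
    "tumor-map/scan-0042.tiff",
    ExtraArgs={
        "Metadata": {
            "instrument": "high-res-imager-3",
            "resolution-um": "0.5",
            "sample-id": "T-1187",
            "acquired": "2025-03-14T09:30:00Z",
        }
    },
)
```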
The new system addresses a unique aspect of scientific data management: the long-term value of research data.
"Our scientific data often has a value beyond the life of the project, instrument, or even scientist who created it," Budd explains. This longevity requirement adds another layer of complexity to scientific data management. Not only must the data be preserved, but it must remain discoverable and usable decades into the future.
The impact of this digital transformation extends beyond NPL's walls. As one of hundreds of scientific organizations collaborating globally, NPL's improved data management capabilities contribute to the broader scientific ecosystem. Researchers can now search not only their own data but also across data from other research projects, potentially uncovering valuable insights that might otherwise remain hidden in siloed storage systems.
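Conceptually, that kind of cross-project search is a query over harvested metadata rather than over the objects themselves. Here is a toy sketch of the idea, assuming metadata like that in the earlier example has been collected into an index; a real deployment would rely on the storage platform's own search or query engine.

```python
# Toy sketch: find objects across projects whose harvested metadata
# matches every given criterion. The index contents are invented.
def find_objects(index: dict[str, dict[str, str]], **criteria: str) -> list[str]:
    """Return keys of objects whose metadata matches all criteria."""
    return [
        key
        for key, meta in index.items()
        if all(meta.get(field) == value for field, value in criteria.items())
    ]

index = {
    "tumor-map/scan-0042.tiff": {"instrument": "high-res-imager-3", "sample-id": "T-1187"},
    "proteomics/run-007.raw": {"instrument": "high-res-imager-3", "sample-id": "P-0031"},
}

# Every object captured on the same instrument, regardless of project.
print(find_objects(index, instrument="high-res-imager-3"))
```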
The success of NPL's data management overhaul offers valuable lessons for other research institutions facing similar challenges. As scientific instruments continue to generate ever-larger datasets, the ability to efficiently store, protect, and share this information becomes as crucial to scientific progress as the research itself.
"Fast, secure and reliable access to data can make all the difference to the success of a research program," Budd observes.
The true scale of what this modernization enables becomes clear when Budd describes the cancer mapping project's ultimate goal: "By the time this project is finished, you'll be able to zoom into a tumor like you're zooming into a country on Google Earth, getting so detailed you could see someone sitting in a chair and read the book they're holding."
It's the kind of ambitious scientific undertaking that would have been impossible without a fundamental rethinking of how research data is stored, protected, and shared. In an era where scientific breakthroughs increasingly depend on our ability to analyze vast amounts of data, how we manage that data may well determine the pace of scientific progress itself.