
Private Health Records from UK Biobank Leaked Online Dozens of Times
A major investigation has revealed that confidential health data belonging to hundreds of thousands of British volunteers has been repeatedly leaked online, casting a shadow over one of the United Kingdom's most celebrated medical research initiatives.
UK Biobank, a vast repository holding the medical records, genetic data, and lifestyle information of 500,000 volunteers, has long been praised for its contributions to breakthroughs in cancer, dementia, and diabetes research. However, a Guardian investigation has found that sensitive data from the project has been inadvertently published online on dozens of occasions, prompting serious concerns among privacy experts.
How the Data Ended Up Online
The leaks appear to stem from a growing requirement by academic journals and research funders for scientists to publish the code they use when analyzing large datasets. In the process of uploading this code to GitHub — a widely used online platform for sharing programming work — some researchers accidentally included partial or complete Biobank datasets alongside it.
UK Biobank strictly prohibits researchers from distributing data outside their own secure systems. In response to the breaches, the organization says it has introduced additional training for all approved researchers. Between July and December 2025 alone, Biobank issued 80 legal notices to GitHub, resulting in the removal of approximately 500 repositories. Nevertheless, a significant amount of data reportedly remained accessible online.
What the Leaked Data Contains
The exposed files vary considerably in scope. Some contain little more than anonymous patient identification numbers or a handful of test results, while others are far more comprehensive. One dataset uncovered by the Guardian in January contained hospital diagnoses, corresponding diagnosis dates, sex, and month and year of birth for approximately 413,000 participants.
A data expert who reviewed the file described the experience as deeply unsettling. "It sent shivers down my spine to even open," the expert said. "I deleted the file immediately. It was very detailed and felt like a gross invasion of privacy even to glance at."
While the files do not include names or home addresses, experts warn this does not necessarily protect individuals from being identified.
Re-Identification: A Real and Demonstrated Risk
To assess the genuine risk of re-identification, the Guardian worked with Biobank volunteers and an independent data scientist. Two volunteers who had undergone medical procedures within the timeframe covered by the data agreed to participate in the test.
One volunteer, whose records related to a fracture and a seizure, could not be located within the dataset. However, a second volunteer — a woman in her seventies — was successfully identified using only her month and year of birth and the approximate timing of a hysterectomy. Just one individual in the entire dataset matched those details. Strikingly, five additional diagnoses from her medical history were also present in the record, none of which she had initially shared with the research team.
"Effectively you were rehearsing the main parts of my medical history to me without me having given you any information at all. I didn't expect that," the volunteer said.
Despite the unsettling experience, the woman said she intended to remain a participant in UK Biobank, viewing its research mission as genuinely important. However, she raised concerns about whether the organization had upheld its commitments. "They said they would hold our data securely," she noted. "I just feel as though that has to come into the equation."
UK Biobank Pushes Back on Privacy Concerns
UK Biobank has largely rejected the criticism, maintaining that no directly identifying information — such as names or addresses — is provided to researchers. In a statement, Professor Sir Rory Collins, the organization's chief executive, said there had never been any evidence of a Biobank participant being re-identified by outside parties.
The organization also argued that the re-identification scenario demonstrated by the Guardian was only possible because the volunteer had effectively provided key personal details. A spokesperson pointed to guidance the organization gives participants, warning them not to post health-related information publicly online, as doing so could theoretically allow their Biobank records to be cross-referenced.
"You have simply demonstrated why we tell participants not to do this," the spokesperson said.
Experts Warn the Approach Is Out of Step With Modern Reality
Privacy and data science experts have pushed back sharply against UK Biobank's position, arguing that it places an unreasonable burden on volunteers and fails to account for the realities of life in the digital age.
"Are these people aware that the internet exists?" said Professor Felix Ritchie, an economist at the University of the West of England. "The idea that they can rely on their volunteers never putting any other information out there about themselves is an entirely unreasonable thing to expect."
Dr. Luc Rocher, an associate professor at the Oxford Internet Institute who examined several of the leaked datasets, explained that stripping away obvious identifiers is rarely sufficient to guarantee true anonymity. Knowing only a person's birthday and the approximate date of a medical event — such as a broken bone — could be enough to identify them with a high degree of confidence, he warned.
"Once identified, that record could reveal sensitive information such as a psychiatric diagnosis, an HIV test result, or a history of drug abuse," Dr. Rocher said.
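The kind of uniqueness check the experts describe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are entirely synthetic, the field layout and the helper names `count_matches` and `unique_fraction` are hypothetical, and nothing here reflects the actual Biobank file format or the method used in the Guardian's test.

```python
from collections import Counter

# Entirely synthetic, hypothetical records (not real data).
# Layout assumed for illustration:
# (birth_year, birth_month, diagnosis, diagnosis_year)
RECORDS = [
    (1950, 3, "hysterectomy", 2004),
    (1950, 3, "fracture", 2010),
    (1950, 7, "hysterectomy", 2004),
    (1962, 3, "hysterectomy", 2005),
    (1962, 3, "hysterectomy", 2005),
]


def count_matches(records, birth_year, birth_month, diagnosis, year_from, year_to):
    """Count records consistent with what an outsider might know:
    month and year of birth plus an approximate date for one medical event."""
    return sum(
        1
        for (b_year, b_month, diag, d_year) in records
        if b_year == birth_year
        and b_month == birth_month
        and diag == diagnosis
        and year_from <= d_year <= year_to
    )


def unique_fraction(records):
    """Fraction of records whose full combination of fields appears exactly once."""
    counts = Counter(records)
    return sum(1 for record in records if counts[record] == 1) / len(records)


# One match means the known details single out a record in this toy table.
single = count_matches(RECORDS, 1950, 3, "hysterectomy", 2003, 2005)
# Two matches means the same kind of details leave some ambiguity.
ambiguous = count_matches(RECORDS, 1962, 3, "hysterectomy", 2004, 2006)

print(single, ambiguous, unique_fraction(RECORDS))
```

In this toy table the first query returns a single record and the second returns two. The same counting logic, applied to a file covering hundreds of thousands of people, is what makes a birth month and one roughly dated procedure potentially identifying even when names and addresses are absent.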
Scale of the Problem Described as 'Shocking'
Professor Niels Peek, a data science and healthcare expert at the University of Cambridge, acknowledged that UK Biobank had taken the issue seriously and acted within reasonable expectations. Nevertheless, he described the overall scale of the problem as deeply troubling.
"If it had happened once or 10 times I'd probably say it's not great but zero risk is impossible," he said. "Hundreds. That's a little bit too much."
In his view, the situation highlights a fundamental tension at the heart of large-scale health research: the ambition to harness data for medical progress sits in uncomfortable conflict with the legal and ethical duty to safeguard individual privacy.
Questions Remain Over Whether Control Can Be Fully Restored
Experts have also raised doubts about whether UK Biobank can realistically recover full control over all the data that has already been exposed. Despite the removal of numerous repositories by both researchers and GitHub, some files were still reportedly accessible through an independent code archive website until shortly before the Guardian's investigation was published.
UK Biobank was founded in 2003 with backing from the Department of Health and several medical research charities. Until late 2024, approved researchers were permitted to download data directly onto their own computer systems. Last month, the government extended Biobank's data access rights to include volunteers' GP records — a development that may intensify scrutiny of how the organization manages data security going forward.

