In late December 2019, eight pages of genetic code were sent to computers at the National Institutes of Health in Bethesda, Maryland.
Unbeknownst to U.S. officials at the time, the genetic map that had arrived on their doorstep contained critical clues about the virus that would soon trigger a pandemic.
The genetic code, submitted by Chinese scientists to a vast public repository of sequencing data run by the U.S. government, described a mysterious new virus that had infected a 65-year-old man weeks earlier in Wuhan. At the time the code was sent, Chinese officials had not yet warned about the unexplained pneumonia that was sickening patients in the central city of Wuhan.
But the U.S. repository, which was designed to help scientists share common research data, never added the submission it received on December 28, 2019, to its database. Instead, it asked the Chinese scientists three days later to resubmit the code with certain additional technical details. That request went unanswered.
It took almost another two weeks for a pair of virologists, one Australian and one Chinese, to work together to publish the genetic code of the new coronavirus online, setting off a frantic global effort to save lives by creating tests and vaccines.
The initial attempt by Chinese scientists to publish the crucial code was first revealed in documents released Wednesday by House Republicans investigating the origins of Covid. The documents reinforced questions circulating since early 2020 about when China learned of the virus that was causing its unexplained outbreak, and also drew attention to gaps in the U.S. system for tracking dangerous new pathogens.
The Chinese government has said it quickly shared the virus's genetic code with global health officials. House Republicans said the new documents suggested that was not true. Chinese news accounts and social media posts have long reported that the virus was first sequenced in late December 2019.
But lawmakers and independent scientists said the documents did offer tantalizing new details about when and how scientists first attempted to share those sequences globally, illustrating the difficulty the United States has in sifting pathogens of concern from the thousands of routine genetic sequences that are submitted to its repository every day.
“You would never have an ambulance sitting in normal traffic at 3 p.m.,” said Jeremy Kamil, a virologist at Louisiana State University Health Sciences Center in Shreveport. Referring to the 2019 coronavirus code, he said: “Why would you allow this sequence to sit there under the same process as a sequence I just got from a new species of snail I found in a ravine?”
A spokeswoman for the Department of Health and Human Services, which includes the NIH, said in a statement Wednesday that the genetic code was not released because it “could not be verified,” even though the NIH followed up with the Chinese scientist for more information and a response.
In an earlier letter to House Republicans, Melanie Anne Egorin, a senior Health Department official, said the sequence had initially undergone “technical, but not scientific or public health” review, as was typical. After receiving no response from the Chinese scientists regarding the requested corrections, the database, known as GenBank, automatically removed the submission from its queue of unpublished sequences on January 16, 2020.
It is unclear why Chinese scientists did not respond. One of the submitters, Lili Ren, who worked at a pathogens institute within the state-affiliated Chinese Academy of Medical Sciences in Beijing, did not respond to a request for comment. The Chinese embassy said China's response was “based on science, effective and consistent with China's national realities.”
But the same sequence that Dr. Ren's group submitted to GenBank was made public in a different online database, known as GISAID, on January 12, 2020, shortly after other scientists published the first coronavirus code. Dr. Ren's group also resubmitted a corrected version of the code to GenBank in early February and published a paper describing its work.
The two-week gap between the code first being submitted to the U.S. database and the moment China shared the sequence with global health officials “underscores why we cannot trust any of the so-called 'facts' or data” from the Chinese government, Republican leaders of the House Energy and Commerce Committee said.
Jesse Bloom, a virologist at Seattle's Fred Hutchinson Cancer Center, said the genetic sequence would have strongly suggested to anyone who looked at it in late December 2019 that a new coronavirus was causing the mysterious pneumonia cases in Wuhan. Instead, official Chinese timelines indicate that the government did not make that diagnosis until early January.
“If this sequence had been available, probably the prototype vaccines could have been started right away, and that was two weeks earlier than they started,” Dr. Bloom said.
The documents, first reported by The Wall Street Journal, provide no information about the origins of the virus, Dr. Bloom and other scientists said, since the sequence contained no special clues about the virus's evolution and was later made publicly available anyway.
But they do offer new details about the pace at which Dr. Ren's team worked to sequence the virus. The swab containing the virus they analyzed was taken from the 65-year-old patient, a vendor at the large market where the spread of the disease was first seen, on December 24, 2019. Within four days, the scientists sent the genetic data of that virus to GenBank.
“That's incredibly fast,” said Kristian Andersen, a virologist at the Scripps Research Institute.
At that time, finding a new coronavirus in the patient's sample would not have proven that it was that pathogen, and not a different virus or bacteria, that was causing his illness, Dr. Andersen said, although it would have been a reasonable hypothesis.
That consideration seemed to weigh on Chinese scientists studying samples from the first patients. A researcher at a Chinese commercial laboratory who worked with Dr. Ren wrote in a blog post in late January 2020 that while she had identified a new virus in hospital samples, that alone did not prove that the virus was causing cases of pneumonia, which slowed down an official investigation.
In early 2020, the Chinese government also issued directives discouraging certain lines of scientific research and restricted the release of data on the virus.
Even after the virus's genetic code was sent to the U.S. repository, it would have been difficult for U.S. officials working on the research-oriented database to notice it. The repository contains hundreds of millions of genetic sequences. Much of the selection process is automated.
And at least until Chinese officials started sounding the alarm in late December 2019, almost no one would have known to look for a new coronavirus among the piles of submissions.
“At the time, there was no way for anyone at NCBI to realize the importance of that,” said Alexander Crits-Christoph, a computational biologist, referring to the NIH center that runs GenBank. Beyond that, he said, genetic repositories like GenBank must be careful about publishing sequences, since researchers often use the same data to prepare journal articles.
Still, some scientists believe that U.S. and global health officials have been slow to modernize databases like GenBank to allow them to tap into sequences that could have critical public health implications.
Such a database could, for example, automatically search for new pathogens whose genetic codes overlap with those known to be dangerous, Dr. Kamil said. And it could ensure that those sequences circulate more widely, even as health officials wait for missing details or reviews.
“Give concierge attention to those sequences, my goodness,” he said. “Why haven't the agencies in charge of public health or global health stepped up their game and said, 'This is the year 2024, we need to be safer so things like this don't happen again'?”