Volume 51 | May 26, 2022
A program of NIH’s National Center for Advancing Translational Sciences
N3C Privacy Preserving Record Linkage (PPRL)
2022 FedHealthIT Awardee!
The 8th Annual FedHealthIT Innovation Awards recognize and honor the Federal Health technology and consulting community by celebrating programs nominated and selected by their peers for DRIVING INNOVATION and RESULTS across the Department of Veterans Affairs, Military Health, Health and Human Services, and Centers for Medicare and Medicaid Services.

National COVID Cohort Collaborative Privacy Preserving Record Linkage (N3C PPRL)
National Center for Advancing Translational Sciences, National Institutes of Health

The National COVID Cohort Collaborative (N3C) holds clinical data originating from 72+ health systems throughout the US. The N3C was established as an open science repository in response to the pandemic. As data accumulated and re-infections occurred, NCATS realized it needed a more extensive method for data de-duplication (same patient, different hospitals) and a multi-modal data enrichment strategy that could leverage various NIH repositories. Using a PPRL architecture combined with a Linkage Honest Broker data governance model, N3C is able to support a broad array of data activities in a de-identified manner and under a rapid timeline, including data de-duplication, cohort discovery, and data enrichment with Viral Variant sequence data, mortality data, CMS claims data, and imaging data from other NIH repositories. These capabilities underpin the N3C data strategy and its ability to answer increasingly complex questions about COVID, including Long COVID.
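
For readers curious about how privacy-preserving de-duplication can work in principle, here is a minimal Python sketch. It is not the Datavant tokenization or the Linkage Honest Broker workflow used by N3C; the chosen fields, the `make_token` and `find_duplicates` helpers, and the shared key are all hypothetical. The idea it illustrates is that each site normalizes identifiers and shares only a keyed hash, so the same patient can be matched across hospitals without names or birth dates leaving the site.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical shared secret; a real deployment would manage keys securely.
SECRET_KEY = b"example-key-not-for-production"


def make_token(first_name: str, last_name: str, dob: str) -> str:
    """Return a keyed hash of normalized identifiers (illustrative only)."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, dob)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def find_duplicates(records):
    """Group (site, local_id, token) records by token.

    Only tokens that appear at two or more distinct sites are returned.
    """
    by_token = defaultdict(set)
    for site, local_id, token in records:
        by_token[token].add((site, local_id))
    return {
        token: sorted(ids)
        for token, ids in by_token.items()
        if len({site for site, _ in ids}) > 1
    }


# Each site tokenizes locally and shares only (site, local_id, token).
site_a = [("A", "a-001", make_token("Ana", "Lopez", "1980-02-14"))]
site_b = [
    ("B", "b-917", make_token(" ana ", "LOPEZ", "1980-02-14")),
    ("B", "b-918", make_token("Ben", "Wu", "1975-07-01")),
]

duplicates = find_duplicates(site_a + site_b)
```

In this toy data, the two differently formatted "Ana Lopez" records produce the same token and are grouped together, while the unrelated record is left alone. Exact matching on a single token is a simplification; production linkage typically combines several tokens and handles typos and missing fields.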

To learn more about N3C PPRL click Here. To see all the FedHealthIT winners click Here.
What the Research Community is Saying about N3C
A. Jerrod Anzalone, MS
Clinical Research Informatics Specialist
PhD Candidate in Biomedical Informatics
University of Nebraska Medical Center
Omaha, Nebraska
"N3C has represented a paradigm shift in connecting disparate health data and supporting rapid translation of data to knowledge, but it’s more than a platform. The N3C community has demonstrated the potential of Team Science at scale and how much more we can accomplish as a scientific community when physical proximity and institutional affiliation are less important than shared goals. The novel organization around Domain Teams has accelerated knowledge transfer and relationship building in a way that will extend beyond the COVID-19 pandemic. The N3C Rural Health Domain Team is one example where researchers from across the country have had an opportunity to share and coalesce their expertise on a medically underserved population that faces dramatically different regional challenges and has been historically underrepresented in the scientific literature."
The Next Phase of N3C

Even though the world is potentially transitioning to a more normal state, the COVID virus is still impacting our families, friends, colleagues, and our communities. There are many questions that need answers, especially around variants, vaccines, and Long COVID. The National COVID Cohort Collaborative (N3C) is a robust resource of medical record data that will continue to provide insights and information to researchers and citizen scientists across the country and beyond. 

In addition to the EHR data from more than 72 medical centers and community clinics around the country, we also have approximately 50 publicly available and acquired external datasets to enhance the medical record data.

Acquired External Datasets
  • Mortality Data
  • Viral Variant
  • Centers for Medicare and Medicaid Services (CMS)

Publicly Available External Datasets (examples) Link to External Dataset list 
  • Environmental Quality Index
  • Social Deprivation Index (SDI)
  • Food Access Research Atlas

Site Provided Data Enhancements
  • Adding Ventilator Settings and Respiratory Support Levels for Inpatients
  • Adding ADT (Admission-Discharge-Transfer) Timestamps to Indicate Inpatient Movement to/from the ED, ICU, and Floor
  • Adding Structured Social Determinants of Health Data (SDoH)

NCATS will continue full and uninterrupted support of the National COVID Cohort Collaborative (N3C) for the foreseeable future. During the next phase, we will focus on enhancing centralized resources and will continue to provide training and community support. If you have a COVID research question, want to test an ML/AI algorithm, or want to advance COVID research, let us know. Reach out to cd2h@cuanschutz.edu to learn how to take advantage of this rich data source.
Remember These 3 Rules if You Are Obtaining Data Results From N3C

You have analyzed the data for your project and now you want to present or publish your results. Please remember the following 3 rules when requesting permission to produce your results.

  1. Results must be requested through the Results Download Committee
  2. N3C data results cell size must be >20
  3. Row Level Data must not be requested and will not be approved by the committee

Please read and follow the policy, available Here.
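
As an illustration of rule 2, the following Python sketch masks small cells in a table of aggregate counts before a download request is prepared. It is only a sketch under stated assumptions: the `suppress_small_cells` helper, the masking label, and the example counts are hypothetical and are not part of any N3C tool, and the Results Download Committee policy remains the authoritative reference.

```python
MIN_CELL_SIZE = 20  # threshold from rule 2 above: reported counts must be > 20


def suppress_small_cells(counts: dict[str, int], threshold: int = MIN_CELL_SIZE) -> dict[str, str]:
    """Replace any count at or below the threshold with a masked label."""
    masked = {}
    for group, n in counts.items():
        # Keep counts strictly above the threshold; mask everything else.
        masked[group] = str(n) if n > threshold else f"<={threshold}"
    return masked


# Hypothetical aggregate counts, not real N3C results.
example_counts = {"18-29": 154, "30-49": 20, "50-64": 21, "65+": 7}
safe_counts = suppress_small_cells(example_counts)
```

Here the groups with 20 and 7 participants are masked, while the groups with 154 and 21 are kept. Note that masking a single cell may not be enough if it can be recovered from row or column totals, so totals deserve the same check.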
NIH Needs Your Feedback!
1-minute Survey!
We would like to hear from you! As part of the N3C program evaluation, we are collecting information about your experience with N3C, as well as your utilization of N3C resources. Your feedback is extremely valuable to understanding what is working well in N3C and what can be improved.  

Please provide feedback through the N3C User Satisfaction Survey by Monday, June 6, at 9:00 pm PT.
Ready to Interact with Summary Data?
N3C Public Health Browser

  • N3C recently launched the Public Health Browser https://covid.cd2h.org/dashboard/public-health , making aggregate summary data available for all stakeholders to access sample sizes of high-value COVID-19 data cohorts without requesting access to the N3C Data Enclave. 
  • The data shown in the N3C Public Health Browser comes from the N3C Enclave, which is the largest collection of real-world data in the USA. The Enclave draws on 72 health care institutions in 49 of the 50 US states and, as of April 2022, holds over 14 billion rows of clinical information on 13 million participants, including 5 million COVID+ individuals. In addition to the real-world data (RWD), the N3C Enclave has a library of over 30 external data sets, ranging from mortality data to pollution indices, that can be linked to the clinical data. A full list of available external data sets can be found at https://discovery.biothings.io/dataset?guide=/guide/n3c/dataset
  • The Public Health Browser currently includes 14 interactive dashboards that provide a snapshot of N3C participant data ranging from Long COVID and Pediatrics to mortality and medications. Among the dashboards of high interest, the Long COVID dashboard https://covid.cd2h.org/dashboard/public-health/summary/long-covid has multiple cohorts of interest, including ~17,000 participants diagnosed with the ICD-10 code, ~3,000 participants referred to specialty clinics, and over 400,000 participants with potentially related symptoms.
  • The N3C dashboards assist government officials and policymakers by providing near real-time longitudinal data on patients from 49 of the 50 states, offering granular information on population-based outcomes, resource utilization, treatment variations, and outcomes of care across the country.
  • Clinicians can review the summary counts and use these as a base for their decisions to study a specific cohort.
  • Investigators/Researchers can utilize these snapshots of the clinical status of COVID as a signal for further hypothesis and non-hypothesis-based investigation in the N3C open science enclave.
  • Policymakers and congressional staff can leverage these high-value health data to report on current advances while influencing broad policies and federal legislation.
New Enclave Feature
Now You Can Remove Yourself from a Data Use Request (DUR)
New Feature: Collaborators are now allowed to remove themselves from the project/workspace and DUR.
You can access this feature from your Enclave project dashboard.
Note: Make sure you enter the exact project name. If you want to rejoin the project, you must go through the entire DUR process again.

Published works using N3C data are listed on the N3C Cohort Exploration Dashboard. The Publications tab displays titles and links to fully published articles, articles online ahead of print, and published preprints, as well as a list of accepted conference presentations and posters.

Recent N3C Articles Accepted for Journal Publication

Help Us Track N3C Publications!

When you have a research product that is ready for publication or accepted for presentation, please submit it via the N3C Publication Intent Form, which will notify the Publication Committee of N3C output to be registered. (Research products include: manuscripts, posters, conference papers, blogs, press releases, podium presentations, etc.)

Per the N3C Attribution and Publication Principles, all manuscripts using N3C community resources must be reviewed by the Publication Committee. Non-manuscript products do not require review but should be submitted after they have been accepted by the conference to allow for promotion and tracking of collaborator accomplishments.

View the Publication Review web page for more details.
N3C In the News

SAN FRANCISCO, May 24, 2022 (GLOBE NEWSWIRE) -- Datavant, the leader in helping organizations securely connect health data, today announced that the National COVID Cohort Collaborative Privacy-Preserving Record Linkage (N3C PPRL), powered by Datavant technology and Regenstrief Institute’s Linkage Honest Broker services, has been recognized with a 2022 FedHealthIT Innovation Award.

Scientists may have found a way to identify who is susceptible to long COVID, thanks to machine learning and artificial intelligence. 

News Medical Life Sciences: Identification of Long COVID Patients Through Machine Learning (May 20, 2022)
In a recent study posted to Preprints with The Lancet*, researchers developed a machine learning approach to identify patients with long coronavirus disease (COVID).

A research team supported by the National Institutes of Health has identified characteristics of people with long COVID and those likely to have it. Scientists, using machine learning techniques, analysed an unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better identify who has long COVID. Exploring de-identified EHR data in the National COVID Cohort Collaborative (N3C), a national, centralised public database led by NIH’s National Center for Advancing Translational Sciences (NCATS), the team used the data to find more than 100,000 likely long COVID cases as of October 2021 (as of May 2022, the count is more than 200,000). The findings appear in The Lancet Digital Health.

Community 99: Impact of NSAIDS on COVID-19 Severity (May 19, 2022)
Since its emergence in late 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected over 525 million people and caused more than 6.28 million deaths globally. SARS-CoV-2 infection results in coronavirus disease 2019 (COVID-19), which is characterized by a wide range of symptoms with severe effects, including pneumonia and hypoxemic respiratory failure.

Researchers analyzed the largest database of private insurance claims in the United States in the first four months after a diagnostic code for long Covid was created.

Since the early months of the COVID-19 pandemic, doctors and scientists have been mystified by the occurrence of what’s come to be known as long-haul COVID or simply long COVID, in which individuals experience symptoms that last for weeks or months after the initial coronavirus infection has passed.

WEDNESDAY, May 18, 2022 (HealthDay News) -- Loss of smell and taste are less likely with new COVID-19 variants when compared with the initial untyped COVID-19, according to a study published online May 3 in Otolaryngology-Head and Neck Surgery.

Issues in Science and Technology: Building a Data Infrastructure for the Bioeconomy (May 18, 2022)
While the development of vaccines for COVID-19 has been widely lauded, other successful components of the national response to the pandemic have not received as much attention. The National COVID Cohort Collaborative (N3C), for example, flew under the public’s radar, even though it aggregated crucial US public health data about the new disease through cross-institutional collaborations among government, private, and nonprofit health and research organizations.

National Institutes of Health: Scientists Identify Characteristics to Better Define Long COVID (May 16, 2022)
A research team supported by the National Institutes of Health has identified characteristics of people with long COVID and those likely to have it. Scientists, using machine learning techniques, analyzed an unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better identify who has long COVID. 

People infected with the COVID-19 omicron variant are significantly less likely to develop smell and taste loss compared to those infected by delta and earlier COVID-19 variants, according to results published this month by Virginia Commonwealth University researchers in the journal Otolaryngology — Head and Neck Surgery.

When the tangle of data locked in EHRs is decoded and shared between organizations and institutions, important health insights can be discovered, Nature reported May 3.

Nature: Health Data for All (May 3, 2022)
For the gastrointestinal condition known as ulcerative colitis, some physicians recommend using a particular drug twice a day, others, three times. But which protocol is the best way to help people with the condition to avoid surgery? Instead of launching a clinical trial, Peter Higgins, a gastroenterologist at the University of Michigan at Ann Arbor, examined the data.

Long Covid, with its constellation of symptoms, is proving a challenging moving target for researchers trying to conduct large studies of the syndrome. As they take aim, they’re debating how to responsibly use growing piles of real-world data — drawing from the full experiences of long Covid patients, not just their participation in stewarded clinical trials.

Research led by Washington University School of Medicine in St. Louis has demonstrated that analyzing synthetic data generated from real COVID-19 patients accurately replicates the results of the same analyses conducted on the real patient data.
Sign Up for the Community Forum!

Earlier this month, we removed the N3C Community Forum Outlook calendar invite.
Please register for your Community Forum meeting link. After registering, you will receive a confirmation email from cd2h@uw.edu containing information about joining the meeting and you will also have the ability to download the .ics file to create calendar reminders for future Community Forums.

N3Community Forum

Presentations take place on select Mondays from 5–6 p.m. ET/2–3 p.m. PT. To attend these and future N3Community Forum presentations, please register here.

Missed an N3Community Forum or want to revisit a past Forum? You can find all the videos on our YouTube page.

June 6, 2022

Topic: Highly Effective Protocol for Treating Long COVID Patients that Either Completely Resolves or Significantly Improves Symptom Burden
Presenter: William Cornwell
University of Colorado, Anschutz

June 13, 2022

Topic: BARDA Challenge
Presenter: Tim Bergquist

June 27, 2022

Topic: PPRL Validation Process
Presenter: Shaun Grannis
Regenstrief Institute

July 11, 2022

Topic: Temporal Events Detector for Pregnancy Care (TED-PC): A Rule-based Algorithm to Infer Gestational Age and Delivery Date from Electronic Health Records of Pregnant Women with and without COVID-19
Presenters: Tianchu Lyu, MPH; Chen Liang, PhD; and Jihong Liu, ScD
University of South Carolina

July 18, 2022

Topic: Harmonizing Units and Values of Quantitative Data Elements in a Very Large Nationally-Pooled EHR Dataset (DOI: ocac054) in Journal of the American Medical Informatics Association
Presenters: Kate Bradwell, PhD and Richard Moffitt, PhD
Palantir and Stony Brook University

July 25, 2022

Topic: Vaccine Effectiveness Domain Team
Presenter: Nasia Safdar, MD, PhD
University of Wisconsin

August 15, 2022

Topic: COVID-19 Patients with Documented Alcohol Use Disorder or Alcohol-related Complications are More Likely to be Hospitalized and have Higher All-cause Mortality
Presenter: Kristina Bailey, MD
University of Nebraska

August 22, 2022

Topic: Acute Upper Airway Disease in Children With the Omicron (B.1.1.529) Variant of SARS-CoV-2: A Report From the US National COVID Cohort Collaborative
Presenter: Blake Martin, MD
University of Colorado
Congratulations to our Newest PhD!

Dr. Sunyang Fu
Bioinformatics and Computational Biology
University of Minnesota

My research focuses on (i) designing and validating Natural Language Processing (NLP) techniques for clinical information extraction, (ii) developing informatics frameworks and processes to accelerate the secondary use of EHRs for clinical research and (iii) discovering EHR heterogeneity and information quality through quantitative and qualitative methods.

I had my cat (Kabi) when I started the PhD and it’s nice to have a furry companion throughout the journey.

Role on N3C
  • Leading CD2H NLP Scoping Review
  • Co-leading CD2H Playbook NLP Chapter
  • Supporting multiple N3C NLP projects
New Team Member
Jessica Mitchell
BIDS Project Manager
Johns Hopkins University
N3C Domain Teams

N3C Domain Teams enable researchers with shared interests to analyze data within the N3C Data Enclave and collaborate more efficiently in a team science environment. They include multidisciplinary Clinical Domains composed of subject matter experts, statisticians, informaticists, and machine learning specialists who focus on clinical questions surrounding COVID-19's impact on health. Cross-Cutting Domains have a varied focus that applies to multiple domains. These teams provide an opportunity to collect pilot data for grant submissions, train algorithms on larger datasets, inform clinical trial design, learn how to use tools for large-scale COVID-19 data, and validate results. N3C encourages researchers of all levels to join a Domain Team that represents their interests, or to suggest new clinical areas to explore.

No Meeting Weeks - Mark Your Calendars!

To help our community maintain a positive and productive workload, CD2H-N3C will schedule several “No Meeting Weeks” throughout the year.

Most meetings will be canceled. Impromptu meetings can still occur as needed during a No Meeting Week to push action items forward. Workgroups and Domain Teams should check with their Leads to determine meeting schedules for that week.

N3C support will continue with regular operations. If you have any trouble logging on to the enclave, please contact NCATSAuthSupport@mail.nih.gov. For all other issues please use the Support Desk.

2nd Quarter: May 30-June 3
3rd Quarter: July 4-8
4th Quarter: November 7-11
No Community Forum during these weeks

As we have recently crossed the 2-year mark for the COVID-19 pandemic, it is a good inflection point to identify colleagues who have completed their efforts with CD2H/N3C projects and have transitioned to other great opportunities. If this is you, or perhaps your colleague, we ask that you complete this 2-minute form (Bit.ly/cd2h-offboarding-form) to offboard from CD2H and/or N3C projects. You can continue to receive the newsletter if you wish.

We thank you for your tireless efforts in CD2H/N3C projects and look forward to working with you on many other projects.
Reporting Concerns

If you come across activities that are misaligned with the principles outlined in the Community Guiding Principles for the National COVID Cohort Collaborative (N3C), you can privately notify us using the Report Conduct Concerns form located on the N3C website under the SUPPORT menu. Your feedback is important and we will take prompt and confidential action to address your concerns. All data management incidents should also be reported to NCATS. Thank you for your contribution!
The National COVID Cohort Collaborative (N3C) is a complementary and synergistic partnership among the Clinical and Translational Science Awards (CTSA) Program hubs, the National Center for Data to Health (CD2H), distributed clinical data networks (PCORnet, OHDSI, ACT, TriNetX), and other partner organizations, with overall stewardship by NIH’s National Center for Advancing Translational Sciences (NCATS). The N3C aims to improve the efficiency and accessibility of analyses using a very large row-level (patient-level) COVID-19 clinical dataset, demonstrate a novel approach for collaborative pandemic data sharing, and speed understanding of and treatments for COVID-19.
CD2H is supported by the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (Grant U24TR002306).