Bakar Institute’s Responses to, & Resources for, COVID-19
Message from the Director
Dear Members,

I hope you're staying safe and healthy during these challenging times. I want to assure you that the Bakar Computational Health Sciences Institute is working hard to do our part in tackling COVID-19. Our staff are rapidly assembling a central research database holding all clinical data on all SARS-CoV-2 tested patients at UCSF, to be usable by all researchers at UCSF. As a show of support for this project, we were able to get IRB approval for this work within hours, over a weekend!
In addition, we've compiled a list of studies conducted by our Institute and faculty, along with a list of resources and events in this special edition newsletter. With national datasets becoming readily available and data-driven competitions taking place, I encourage you to apply your expertise and take action.
UCSF and the entire University of California Health system have released an online dashboard with our clinical data to help researchers see the number of positive tests and the age distribution of confirmed cases. The University of California Health system now sends out daily updates on COVID-19 patient counts via its @UofCAHealth Twitter account.
My own lab wanted to get into the fight, and developed a COVID-19 County Tracker app. It visualizes COVID-19 cases across all 3,142 counties in the United States. The site features plots of total cases, broken down by state and county, with interactive exploration to compare both. Several thousand people have already visited the site in its first few days. If your lab is doing any work against COVID-19, please do let us know!
The White House Office of Science and Technology Policy, U.S. Department of Energy and IBM are announcing an unprecedented volume of access to computing power to aid COVID-19 researchers. They’ve also launched the 2020 Call for Code Global Challenge to pursue and create COVID-19 solutions.
The COVID-19 Open Research Dataset (CORD-19) from the White House and a coalition of leading research groups is now freely available. It was created so that the global community of researchers can apply natural language processing and artificial intelligence technology to the literature. CORD-19 contains over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19, SARS-CoV-2, and other coronaviruses.
Our NLP@UCSF group is taking on the COVID-19 Open Research Dataset Challenge (CORD-19). This call to action is to develop text and data mining tools to support the biomedical community. If you would like to participate, please see more details below.
Now, more than ever, we need to come together and show that, as a community, we are better able to tackle this pandemic. The Bakar Computational Health Sciences Institute will continue our mission of advancing computational health sciences in research, practice, and education, in support of Precision Medicine for all.

Butte Lab - Douglas Arneson, Paul Bleicher, Atul Butte, Matthew Elliot, Arman Mosenia, Boris Oskotsky, Vivek Rudrapatna, Rohit Vashisht and Travis Zack

The Butte Lab has been developing a data visualization app to help all of us better understand the impact of the COVID-19 pandemic at the local level. In collaboration with Paul Bleicher, MD, PhD, former CEO of OptumLabs, we rapidly developed and deployed a new web app for the public. The app features plots of total cases and deaths by county, with customizable views for real-time, interactive exploration to compare both counties and states. It also offers options to normalize by population size and rescale by time after the first 10 cases.
The app was released last Friday, and we’re pleased to say that in just the first few days it has already been used by thousands of unique users!
What did we learn? Much more than we can say here, but here are three findings:
1) The COVID-19 pandemic has spread to an overwhelming majority of counties in the US.
2) NYC has the highest new case rate but is about average when considering its large population. There are early signs that the rate of new cases there is starting to slow down.
3) LA and SF are currently doubling cases every 3 and 4 days, respectively.
Future features may include looking at the impacts of policies like shelter-in-place on case rates, comparisons with cities abroad that may be ahead of us from an infection dynamics standpoint, and predictive models involving time to peak and anticipated hospitalization/ICU use.
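For readers curious about the arithmetic behind views like these, here is a minimal Python sketch of per-capita normalization, alignment to the day a county first reached 10 cases, and a doubling-time estimate. It is an illustration only and not the app's actual code: the function names, the three-day window, and the assumption of roughly exponential growth within that window are all ours.

```python
import math

def per_100k(cases, population):
    """Normalize cumulative case counts to cases per 100,000 residents."""
    return [c * 100_000 / population for c in cases]

def align_to_threshold(cases, threshold=10):
    """Drop days before the cumulative count first reached `threshold`,
    so that different places can be compared by days since that point."""
    for i, c in enumerate(cases):
        if c >= threshold:
            return cases[i:]
    return []  # threshold never reached

def doubling_time(cases, window=3):
    """Estimate days for cumulative cases to double, using growth over
    the last `window` days and assuming exponential growth in that window."""
    if len(cases) <= window:
        raise ValueError("need more than `window` days of data")
    start, end = cases[-1 - window], cases[-1]
    if start <= 0 or end <= start:
        return math.inf  # no measurable growth in the window
    return window * math.log(2) / math.log(end / start)

# Made-up cumulative counts for a hypothetical county (not real data).
example_cases = [1, 4, 10, 20, 40, 80]
aligned = align_to_threshold(example_cases)
days_to_double = doubling_time(example_cases)
```

With these made-up numbers the count doubles every day, so the estimate comes out at about one day. Real case series are noisier, and a longer window or a fitted growth curve would likely give a steadier estimate.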
Butte Lab - Sanchita Bhattacharya and Zicheng Hu

Butte Lab researchers Sanchita Bhattacharya and Zicheng Hu are working with the BCHSI Computational group to explore opportunities to integrate ImmPort, an NIAID-funded immunology data portal, into the Information Commons Ecosystem. The proposal aims to facilitate clinical and molecular data integration for the NIAID-funded COMET (COVID-19 Multi-phenotyping for Effective Therapies) study led by UCSF investigators (Carolyn Calfee, David Erle, Max Krummel, Chaz Langelier, and Prescott Woodruff).

BCHSI Data Team - Creating a COVID-19 Data Mart. BCHSI has been seeing many research initiatives directed at studying the extent, patient progress, treatment protocols, and drug alternatives for the ongoing COVID pandemic. One of these was from the U.S. Department of Health and Human Services. To serve as many of these as quickly as possible, BCHSI is constructing a Data Mart with fresh data on UCSF COVID patients, based on our Information Commons. The goal is to refresh the structured records and associated clinical notes daily, and images weekly. An IRB application to access this data was approved in one business day - a record time! The next steps are to extract the data from CDW and Clarity, and to prepare the secure AWS environment to host all this. IT, which is still gearing up to provide full support on AWS, gave us special permission to host this, as long as all support is provided by BCHSI. We thank our colleagues in IT and the IRB for coming together so rapidly. Contact:

We are exploring, with both Mt. Sinai and the University of Central Florida - in two COVID hotspot regions - how to "share" our COVID data via a federated model. This will be made easier by both centers' decisions to emulate our Information Commons!

NLP@UCSF - Organizing to take on the COVID-19 Open Research Dataset Challenge (CORD-19). Plans include potential Kaggle competition submissions and other NLP projects to process COVID-related literature.

Rima Arnaout - Aiding her cardiology colleagues Greg Marcus MD and Tommy Dewland MD in a digital health study to gather data on arrhythmia burden in COVID-positive patients sent to convalesce at home. Anecdotal reports suggest that myocarditis and arrhythmias may be affecting these patients, but more data is needed. This study is in collaboration with iRhythm Technologies, makers of a wearable EKG sensor. It will leverage the ongoing COVID-19 Citizen Science Study, a mobile app-based study aiming to enroll every adult with a smartphone to help combat the disease (text COVID to 41411 to join), and is open to ideas and collaborations with other interested scientists. The study now has more than 10,000 participants!

Sergio Baranzini - The Scalable Precision-medicine Oriented Knowledge Engine (SPOKE) is a comprehensive biomedical knowledge graph connecting a wealth of information from basic molecular research, clinical insights, and many other databases. The SPOKE Neighborhood Explorer tool allows anyone to interact with the knowledge graph in a hypothesis-driven manner and browse connections between genes, drugs, diseases, and more. Recently, the SPOKE team added SARS-CoV-2 data from the Krogan Lab's work examining the viral proteins. Pre-loaded queries in the Neighborhood Explorer let you explore the viral-human protein interactions and how they are connected to other data elements within SPOKE. The team is using SPOKE to inform candidates for drug repurposing, and also in combination with EHRs to identify pre-existing conditions that can put people at higher risk of hospitalization.

Katie Pollard - A Bioinformatics PhD student in her lab, Calla Martyn, is volunteering at the Biohub testing center. Alex Pico, Core Director of Bioinformatics at Gladstone Institutes, and colleagues are curating pathway data resources for SARS-CoV and COVID pathway figures.

Ida Sim - Vivli is launching a portal for sharing participant-level data from COVID clinical trials.

Marina Sirota - The Sirota Lab is applying a computational drug repositioning pipeline based on transcriptomics to identify therapeutic candidates to combat COVID-19. While transcriptomics data is still rare, the tenOever Laboratory out of Mt. Sinai has published (on bioRxiv) a preliminary differential gene expression signature for COVID-19 by infecting human alveolar adenocarcinoma cells. In contrast to target-based approaches, by considering a larger signature instead of focusing on single targets, we hope to incorporate genome-wide effects into the predictions of therapeutics. If successful, the focus on drugs that are already FDA-approved for other indications could greatly accelerate the timeline for developing new therapies.
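To illustrate the general idea of signature-based repositioning, here is a minimal Python sketch of one common approach: score each drug by how strongly its expression profile runs opposite to the disease signature, then rank the drugs with the most negative scores first. This is not the Sirota Lab's pipeline. The use of Pearson correlation as the score, the function names, the gene and drug labels, and all of the numbers are assumptions made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def reversal_score(disease_sig, drug_sig):
    """Correlate disease and drug log fold changes over their shared genes.
    A strongly negative score suggests the drug profile opposes the disease signature."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    if len(genes) < 2:
        return 0.0
    return pearson([disease_sig[g] for g in genes],
                   [drug_sig[g] for g in genes])

def rank_candidates(disease_sig, drug_sigs):
    """Return (drug, score) pairs, most negative score first."""
    scores = {name: reversal_score(disease_sig, sig)
              for name, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical log fold changes; gene and drug names are placeholders.
disease = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
drugs = {
    "drug_x": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5},  # opposes the signature
    "drug_y": {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5},   # mimics the signature
}
ranked = rank_candidates(disease, drugs)
```

In this toy case the placeholder drug_x ranks first because its profile is the mirror image of the disease signature. Using the whole signature in this way, and not a single target, is what lets genome-wide effects enter the prediction.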
COVID-19 County Tracker - COVID-19 cases across all 3,142 counties in the US.

COVID-19 Open Research Dataset (CORD-19) - free resource with over 45,000 scholarly articles and preprints, including over 33,000 with full text, about COVID-19 and the coronavirus.

Search COVID-related literature - curated by PubMed’s LitCovid, the World Health Organization, and the Dimensions research analytics database.

Elsevier’s Novel Coronavirus Information Center - opened access to over 20,000 full-text, machine-readable articles related to COVID-19, “for as long as needed.” Clinical and patient resources are also provided, as well as access to selected textbooks.

Other publisher portals for relevant content – see “COVID19 Research” section in Resource Category column.

COVID-19 dataset resources - compiled by the UCSF Library.

UCSF Emotional Health & Wellbeing Resources - for employees, trainees, and students.

SF Food Friends - Helping to connect low-risk shoppers to people in need.

Popular Sites for COVID-19 Case Reporting and Predictions

Many people are taking advantage of their time at home to learn a new skill, and we encourage you all to explore the resources available to you for this purpose. The UCSF Library's Data Science Initiative has several trainings available online, and if you need help, they are available for consultations. The Gladstone Institutes Bioinformatics Core also has an excellent set of online training resources available, and we are working with these partners to organize open office hours to help get your questions answered.

If you want to learn Python, SQL or Unix programming, there will be a special virtual drop-in office hour every Tuesday in April, from 2:00-4:00 pm. You can join remotely and get your questions answered in a group setting. 

You can find introductory instruction materials here:

We will communicate more resources as they become available.

Registration is now open for Intermediate R Data Visualization, a two-part webinar taking place on April 13-14, 2:30 - 4:00 pm. The high-level language R is considered one of the most powerful languages for quantitative analysis, statistics, and graphics. The webinar is designed for folks who have some experience using R and are looking to take their R skills to the next level. Attendance at the first part on April 13th is required to attend the second part on April 14th.
Community Events
April 10, 2020, 4-5 pm.
Watch this week’s Town Hall for presentations from Bakar Institute director Atul Butte MD, PhD, on UC-wide COVID data efforts and BCHSI-affiliated faculty member Vivek Rudrapatna MD, PhD, on the new COVID-19 tracker. These Town Halls are weekly meetings, every Friday, 4-5 pm, through May 1, 2020.

April 13, 2020, 4:30 - 6:00 pm. Online live talks connecting leading experts with the public.

The following events have been postponed:
Funding Calls
There are several agencies that are offering special funding opportunities for COVID-directed research. For instance:
Private funding can also be found, such as:
Welcome to the Bakar Institute

Daniela Ushizima

Faculty Affiliate, UCSF
Staff Scientist, Lawrence Berkeley National Lab
Data Scientist, the Berkeley Institute for Data Science (BIDS) at UC Berkeley
  • William Brown - published a paper in the Journal of Disaster Medicine and Public Health Preparedness titled "Text Messaging and Disaster Preparedness Aids Engagement, Re-Engagement, Retention, and Communication Among Puerto Rican Participants in a Human Immunodeficiency Virus (HIV) Self-Testing Study After Hurricanes Irma and Maria." He also published several papers in AIDS and Behavior titled “Broaching the Topic of HIV Self-testing with Potential Sexual Partners Among Men and Transgender Women Who Have Sex with Men in New York and Puerto Rico”, “Then We Looked at His Results: Men Who Have Sex With Men from New York City and Puerto Rico Report Their Sexual Partner's Reactions to Receiving Reactive HIV Self-Test Results”, “Few Aggressive or Violent Incidents are Associated with the Use of HIV Self-tests to Screen Sexual Partners Among Key Populations”, and “Use of HIV Self-Testing Kits to Screen Clients Among Transgender Female Sex Workers in New York and Puerto Rico”

  • William Brown, Maria Glymour - received a new T32 for "UCSF Data Science Training to Advance Behavioral and Social Science Expertise for Health Research (DaTABASE) Program"

  • Atul Butte - published a paper in ASCPT, Clinical Pharmacology and Therapeutics, in collaboration with Google titled "Predicting inpatient medication orders from electronic health record data"

  • Ida Sim + Atul Butte - published a paper in Science on NIH’s data sharing policy titled "Time for NIH to lead on data sharing"

Photo: Mike Luckovich/The Atlanta Journal-Constitution