Manhattan plot showing association between common genetic variants (MAF > 1%) and BMI.


Source publication
Article
The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitat...

Context in source publication

Context 1
... then conducted cross-ancestry meta-analysis by integrating GWAS summary statistics from each ancestry group using PLINK. Our meta-analysis identified 201 genome-wide significant SNP associations with BMI (p < 5 × 10⁻⁸, Figure 5), replicating several previously reported associations in published GWAS of BMI. The strongest association in our PMBB analysis was with FTO variant rs55872725 (p = 4.7 × 10⁻²⁸, beta = 0.271), which has been previously reported. ...
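
The excerpt above says only that per-ancestry GWAS summary statistics were combined in PLINK; it does not state which model was used. As a rough illustration of how such a combination can work, the sketch below assumes a fixed-effect, inverse-variance-weighted meta-analysis of one variant and then applies the genome-wide significance threshold quoted in the excerpt (p < 5 × 10⁻⁸). The function names `ivw_meta` and `is_genome_wide_significant` are hypothetical and are not taken from the source publication or from PLINK.

```python
import math

# Genome-wide significance threshold quoted in the excerpt.
GENOME_WIDE_ALPHA = 5e-8


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-group effect estimates (hypothetical inputs).
    ses:   matching standard errors, all > 0.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each group is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p


def is_genome_wide_significant(p, alpha=GENOME_WIDE_ALPHA):
    """True when p falls below the genome-wide threshold."""
    return p < alpha
```

For example, `ivw_meta([0.2, 0.4], [0.1, 0.1])` pools two made-up estimates with equal weight; passing the resulting p-value to `is_genome_wide_significant` shows whether the pooled signal would clear the threshold. Real analyses also need allele harmonization across groups and heterogeneity checks, which this sketch leaves out.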

Similar publications

Article
Aims Understanding atypical forms of diabetes (AD) may advance precision medicine, but methods to identify such patients are needed. We propose an electronic health record (EHR)-based algorithmic approach to identify patients who may have AD, specifically those with insulin-sufficient, non-metabolic diabetes, in order to improve feasibility of ident...

Citations

... The UKB is composed of >500,000 participating individuals aged 37-73 years at the time of recruitment, who underwent various questionnaires, physical measurements, biological sampling (blood and urine), and genome sequencing across 22 assessment centers in the UK 30 . A subset of participants were invited to complete an additional examination that included magnetic resonance imaging of the heart 10 . The PMBB is composed of 174,712 consenting patients of the Penn Medicine health network, with a subset of 44,000 participants with available genotyping data 11 . Additionally, all participants' medical records, including imaging results, are de-identified and linked to their identifier. ...
Preprint
Aortic structure and function impact cardiovascular health through multiple mechanisms. Aortic structural degeneration increases left ventricular afterload, pulse pressure and promotes target organ damage. Despite the impact of aortic structure on cardiovascular health, aortic 3D-geometry has yet to be comprehensively assessed. Using a convolutional neural network (U-Net) combined with morphological operations, we quantified aortic 3D-geometric phenotypes (AGPs) from 53,612 participants in the UK Biobank and 8,066 participants in the Penn Medicine Biobank. AGPs reflective of structural aortic degeneration, characterized by arch unfolding, descending aortic lengthening and luminal dilation exhibited cross-sectional associations with hypertension and cardiac diseases, and were predictive for new-onset hypertension, heart failure, cardiomyopathy, and atrial fibrillation. We identified 237 novel genetic loci associated with 3D-AGPs. Fibrillin-2 gene polymorphisms were identified as key determinants of aortic arch-3D structure. Mendelian randomization identified putative causal effects of aortic geometry on the risk of chronic kidney disease and stroke.
... The Penn Medicine Biobank (PMBB) is a large academic medical biobank in which participants are agnostically recruited from the outpatient setting and consented for access to their EHR data and permission to generate genomic and biomarker data [22]. The study flowchart is illustrated in Additional file 1: Fig. S1. ...
Article
Background Previous studies have shown that lifestyle/environmental factors could accelerate the development of age-related hearing loss (ARHL). However, there has not yet been a study investigating the joint association among genetics, lifestyle/environmental factors, and adherence to healthy lifestyle for risk of ARHL. We aimed to assess the association between ARHL genetic variants, lifestyle/environmental factors, and adherence to healthy lifestyle as pertains to risk of ARHL. Methods This case–control study included 376,464 European individuals aged 40 to 69 years, enrolled between 2006 and 2010 in the UK Biobank (UKBB). As a replication set, we also included a total of 26,523 individuals considered of European ancestry and 9834 individuals considered of African-American ancestry through the Penn Medicine Biobank (PMBB). The polygenic risk score (PRS) for ARHL was derived from a sensorineural hearing loss genome-wide association study from the FinnGen Consortium and categorized as low, intermediate, high, and very high. We selected lifestyle/environmental factors that have been previously studied in association with hearing loss. A composite healthy lifestyle score was determined using seven selected lifestyle behaviors and one environmental factor. Results Of the 376,464 participants, 87,066 (23.1%) cases belonged to the ARHL group, and 289,398 (76.9%) individuals comprised the control group in the UKBB. A very high PRS for ARHL had a 49% higher risk of ARHL than those with low PRS (adjusted OR, 1.49; 95% CI, 1.36–1.62; P < .001), which was replicated in the PMBB cohort. A very poor lifestyle was also associated with risk of ARHL (adjusted OR, 3.03; 95% CI, 2.75–3.35; P < .001). These risk factors showed joint effects with the risk of ARHL. Conversely, adherence to healthy lifestyle in relation to hearing mostly attenuated the risk of ARHL even in individuals with very high PRS (adjusted OR, 0.21; 95% CI, 0.09–0.52; P < .001). 
Conclusions Our findings demonstrated a significant joint association between genetic and lifestyle factors regarding ARHL. In addition, our analysis suggested that lifestyle adherence in individuals with high genetic risk could reduce the risk of ARHL.
... The Penn Medicine Biobank (PMBB) is a large academic medical biobank in which participants are agnostically recruited from the outpatient setting and consented for access to their EHR data and permission to generate genomic and biomarker data [12]. The study flowchart is illustrated in Additional file 1: Fig. S1. ...
Article
Background Numerous observational studies have highlighted associations of genetic predisposition of head and neck squamous cell carcinoma (HNSCC) with diverse risk factors, but these findings are constrained by design limitations of observational studies. In this study, we utilized a phenome-wide association study (PheWAS) approach, incorporating a polygenic risk score (PRS) derived from a wide array of genomic variants, to systematically investigate phenotypes associated with genetic predisposition to HNSCC. Furthermore, we validated our findings across heterogeneous cohorts, enhancing the robustness and generalizability of our results. Methods We derived PRSs for HNSCC and its subgroups, oropharyngeal cancer and oral cancer, using large-scale genome-wide association study summary statistics from the Genetic Associations and Mechanisms in Oncology Network. We conducted a comprehensive investigation, leveraging genotyping data and electronic health records from 308,492 individuals in the UK Biobank and 38,401 individuals in the Penn Medicine Biobank (PMBB), and subsequently performed PheWAS to elucidate the associations between PRS and a wide spectrum of phenotypes. Results We revealed the HNSCC PRS showed significant association with phenotypes related to tobacco use disorder (OR, 1.06; 95% CI, 1.05–1.08; P = 3.50 × 10⁻¹⁵), alcoholism (OR, 1.06; 95% CI, 1.04–1.09; P = 6.14 × 10⁻⁹), alcohol-related disorders (OR, 1.08; 95% CI, 1.05–1.11; P = 1.09 × 10⁻⁸), emphysema (OR, 1.11; 95% CI, 1.06–1.16; P = 5.48 × 10⁻⁶), chronic airway obstruction (OR, 1.05; 95% CI, 1.03–1.07; P = 2.64 × 10⁻⁵), and cancer of bronchus (OR, 1.08; 95% CI, 1.04–1.13; P = 4.68 × 10⁻⁵). These findings were replicated in the PMBB cohort, and sensitivity analyses, including the exclusion of HNSCC cases and the major histocompatibility complex locus, confirmed the robustness of these associations. 
Additionally, we identified significant associations between HNSCC PRS and lifestyle factors related to smoking and alcohol consumption. Conclusions The study demonstrated the potential of PRS-based PheWAS in revealing associations between genetic risk factors for HNSCC and various phenotypic traits. The findings emphasized the importance of considering genetic susceptibility in understanding HNSCC and highlighted shared genetic bases between HNSCC and other health conditions and lifestyles.
... Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank research program at the University of Pennsylvania 48 . PMBB participants included in this study provided consent for research including access to their medical records, blood sample collection, and generation of genetic data 48 . Individuals with both imputed genotype data from PMBB v2.0 and with lymphocyte count data were included in PGS analysis as a positive control. ...
... Approximately 80% of samples were genotyped by the Regeneron Genomics Center (RGC) using an Illumina Global Screening Array v.2.0 (GSAv2) 48 , while the remaining 20% were genotyped by the Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia using the GSAv1 and GSAv2 genotyping array 48 . ...
Article
Access to safe and effective antiretroviral therapy (ART) is a cornerstone in the global response to the HIV pandemic. Among people living with HIV, there is considerable interindividual variability in absolute CD4 T-cell recovery following initiation of virally suppressive ART. The contribution of host genetics to this variability is not well understood. We explored the contribution of a polygenic score which was derived from large, publicly available summary statistics for absolute lymphocyte count from individuals in the general population (PGSlymph) due to a lack of publicly available summary statistics for CD4 T-cell count. We explored associations with baseline CD4 T-cell count prior to ART initiation (n=4959) and change from baseline to week 48 on ART (n=3274) among treatment-naïve participants in prospective, randomized ART studies of the AIDS Clinical Trials Group. We separately examined an African-ancestry-derived and a European-ancestry-derived PGSlymph, and evaluated their performance across all participants, and also in the African and European ancestral groups separately. Multivariate models that included PGSlymph, baseline plasma HIV-1 RNA, age, sex, and 15 principal components (PCs) of genetic similarity explained ~26-27% of variability in baseline CD4 T-cell count, but PGSlymph accounted for <1% of this variability. Models that also included baseline CD4 T-cell count explained ~7-9% of variability in CD4 T-cell count increase on ART, but PGSlymph accounted for <1% of this variability. In univariate analyses, PGSlymph was not significantly associated with baseline or change in CD4 T-cell count. Among individuals of African ancestry, the African PGSlymph term in the multivariate model was significantly associated with change in CD4 T-cell count while not significant in the univariate model. 
When applied to lymphocyte count in a general medical biobank population (Penn Medicine BioBank), PGSlymph explained ~6-10% of variability in multivariate models (including age, sex, and PCs) but only ~1% in univariate models. In summary, a lymphocyte count PGS derived from the general population was not consistently associated with CD4 T-cell recovery on ART. Nonetheless, adjusting for clinical covariates is quite important when estimating such polygenic effects.
... We calculated PRS using PRS-CS for PAU (based on the EUR meta-analysis of PAU) in 131,500 individuals of EUR ancestry, and PRS for AUD (based on the AFR meta-analysis of AUD) in 27,494 individuals of AFR ancestry in four independent datasets (Vanderbilt University Medical Center's Biobank, Mount Sinai (BioMe), Mass General Brigham Biobank (MGBB) 84 and Penn Medicine Biobank (PMBB) 85 ) from the PsycheMERGE Network 86 , followed by PheWAS. Details for each dataset are described below. ...
Article
Problematic alcohol use (PAU), a trait that combines alcohol use disorder and alcohol-related problems assessed with a questionnaire, is a leading cause of death and morbidity worldwide. Here we conducted a large cross-ancestry meta-analysis of PAU in 1,079,947 individuals (European, N = 903,147; African, N = 122,571; Latin American, N = 38,962; East Asian, N = 13,551; and South Asian, N = 1,716 ancestries). We observed a high degree of cross-ancestral similarity in the genetic architecture of PAU and identified 110 independent risk variants in within- and cross-ancestry analyses. Cross-ancestry fine mapping improved the identification of likely causal variants. Prioritizing genes through gene expression and chromatin interaction in brain tissues identified multiple genes associated with PAU. We identified existing medications for potential pharmacological studies by a computational drug repurposing analysis. Cross-ancestry polygenic risk scores showed better performance of association in independent samples than single-ancestry polygenic risk scores. Genetic correlations between PAU and other traits were observed in multiple ancestries, with other substance use traits having the highest correlations. This study advances our knowledge of the genetic etiology of PAU, and these findings may bring possible clinical applicability of genetics insights—together with neuroscience, biology and data science—closer.
... Sequences are detailed in Fig. 1. Before the sequences begin, we will use electronic phenotyping procedures to identify a cohort of patients receiving care through two large OB-GYN clinics who are eligible for genetic testing for breast and ovarian cancer predisposition (based on information in the EHR), but who have no documentation of testing in the EHR [71]. Next, in the first sequence, all patients will be contacted via the MyPennMedicine (MPM) patient portal. ...
... Most patients at Radnor are white (83.7% white, 6.5% Black, 3.0% Asian, 6.8% other/unknown), while the Dickens Center predominantly serves Black patients (73.7% Black, 18.9% white, 2.1% Asian, 5.3% other/unknown). Patients seen at these two sites since January 1, 2009 will be selected by an EHR-based algorithm established previously [71] using the following eligibility criteria: (1) serous ovarian cancer diagnosed more than two years prior to study contact; (2) breast cancer diagnosed at under 50 years of age more than two years prior to study contact; (3) triple-negative breast cancer diagnosed at any age more than two years prior to study contact; (4) unaffected individuals reporting a family history of ovarian cancer; (6) at least two Penn Medicine appointments within the last three years. Utilizing electronic phenotyping in the EHR, participants who have previously received genetic counseling and testing will be excluded. ...
Article
Background Germline genetic testing is recommended by the National Comprehensive Cancer Network (NCCN) for individuals including, but not limited to, those with a personal history of ovarian cancer, young-onset (< 50 years) breast cancer, and a family history of ovarian cancer or male breast cancer. Genetic testing is underused overall, and rates are consistently lower among Black and Hispanic populations. Behavioral economics-informed implementation strategies, or nudges, directed towards patients and clinicians may increase the use of this evidence-based clinical practice. Methods Patients meeting eligibility for germline genetic testing for breast and ovarian cancer will be identified using electronic phenotyping algorithms. A pragmatic cohort study will test three sequential strategies to promote genetic testing, two directed at patients and one directed at clinicians, deployed in the electronic health record (EHR) for patients in OB-GYN clinics across a diverse academic medical center. We will use rapid cycle approaches informed by relevant clinician and patient experiences, health equity, and behavioral economics to optimize and de-risk our strategies and methods before trial initiation. Step 1 will send patients messages through the health system patient portal. For non-responders, step 2 will reach out to patients via text message. For non-responders, Step 3 will contact patients’ clinicians using a novel “pend and send” tool in the EHR. The primary implementation outcome is engagement with germline genetic testing for breast and ovarian cancer predisposition, defined as a scheduled genetic counseling appointment. Patient data collected through the EHR (e.g., race/ethnicity, geocoded address) will be examined as moderators of the impact of the strategies. 
Discussion This study will be one of the first to sequentially examine the effects of patient- and clinician-directed strategies informed by behavioral economics on engagement with breast and ovarian cancer genetic testing. The pragmatic and sequential design will facilitate a large and diverse patient sample, allow for the assessment of incremental gains from different implementation strategies, and permit the assessment of moderators of strategy effectiveness. The findings may help determine the impact of low-cost, highly transportable implementation strategies that can be integrated into healthcare systems to improve the use of genomic medicine. Trial registration ClinicalTrials.gov. NCT05721326. Registered February 10, 2023. https://www.clinicaltrials.gov/study/NCT05721326
... Globally, a majority of trials across diverse health disciplines were suspended due to the COVID-19 pandemic (17,18). The use of eIC allowed for remote enrolment of participants (19,20) consequently ensuring continuation of trials (13). Although current findings on use of eIC are largely positive, they represent limited geographies and practice settings (21) with insufficient guidelines (22). ...
Preprint
Background: Technological advancements have facilitated increased use of virtual interactions in public health research between investigators and study participants. This includes electronic informed consent (eIC) as a feasible alternative to traditional paper-based, in-person consenting processes. The COVID-19 pandemic impacted a large number of studies globally and processes like eIC enabled continued recruitment of participants into trials. Although current evidence on use of eIC is largely positive, further research is required from diverse contexts. This paper presents the processes of development and implementation of eIC in a large RCT on autism from India. Method: Cognitive interviews with 12 community members and 51 pilots were conducted to develop the eIC standard operating procedure for the RCT. The eIC implementation process included 5 steps and all interactions between participants and researchers were done over calls. This eIC procedure was used to recruit 220 participants to the trial between January 2021 and December 2022. 14 researchers of the trial’s evaluation team used the eIC procedures and their feedback was routinely incorporated into the eIC implementation. All qualitative data was thematically analysed to identify strengths and limitations of the eIC procedure and descriptive analysis of quantitative data was done for population characteristics, eIC rates and duration of eIC. Results: 76.4% (n=220) of participants approached for eIC were found eligible for inclusion and gave consent for participation in the trial. The eIC calls took an average of 20 minutes (Range: 18-30 minutes) including the audio recording of participant responses to consenting statements read out by researchers. Key strengths of the eIC process as shared by researchers were time flexibility of conducting eIC calls and comprehension of trial information among participants. 
Major limitations were around establishing trust and rapport with participants during virtual interactions and appointment scheduling. Conclusion: The rate of consenting achieved in this trial using the eIC procedure and the feedback from researchers have provided further evidence supporting the use of eIC in complex trials in low- and middle-income countries. Trial registration: ISRCTN ID: 21454676; https://www.isrctn.com/ISRCTN21454676?q=21454676; Registration date: 22.06.2018
... Exome sequencing has been conducted at RGC for a total of 43,731 participants. 17 Variant and phenotype quality control Complete description on variant and phenotype quality control can be found in the Supplement. In brief, exonic variants in the canonical TP53 transcript NM_000546.5 with minor allele frequency (MAF) less than 1% in any population from the gnomAD v.2.1.1 non-cancer dataset 18 were annotated using ANNOVAR 19 and SnpEff. ...
... All 84 individuals with P/LP germline TP53 variants from UKB and Geisinger were self-reported or documented as White (Geisinger: 59/ 59 ''White''; UKB: 23/25 ''British'', 1/25 ''Irish'', and 1/25 ''any other White background''). Ancestry was genetically inferred in PMBB 17 and included 23 individuals of European ancestry, two African American individuals, and one Asian (Table 1). We carefully assessed the age at blood samples collection within each cohort given the possibility of CH-related TP53 variants at older ages. ...
... However, participation in these biobanks requires previous enrollment in the respective health care systems, which may have led to a higher number of cancer-affected volunteers. PMBB has been reported to be slightly skewed toward older males, 17 which is consistent with the proportionally greater number of males we identified in PMBB compared with Geisinger. Importantly, the prevalence estimates of P/LP germline TP53 variants in these two cohorts (1:2,983-3,790) resemble the conservative estimates between 1:3,555-5,476 previously reported by our group using the gnomAD dataset, 7 a compilation of several sequencing projects, including many with different disease-based recruitments. ...
Article
Pathogenic or likely pathogenic (P/LP) germline TP53 variants are the primary cause of Li-Fraumeni syndrome (LFS), a hereditary cancer predisposition disorder characterized by early-onset cancers. The population prevalence of P/LP germline TP53 variants is estimated to be approximately one in every 3,500 to 20,000 individuals. However, these estimates are likely impacted by ascertainment biases and lack of clinical and genetic data to account for potential confounding factors, such as clonal hematopoiesis. Genome-first approaches of cohorts linked to phenotype data can further refine these estimates by identifying individuals with variants of interest and then assessing their phenotypes. This study evaluated P/LP germline (variant allele fraction ≥30%) TP53 variants in three cohorts: UK Biobank (UKB, n = 200,590), Geisinger (n = 170,503), and Penn Medicine Biobank (PMBB, n = 43,731). A total of 109 individuals were identified with P/LP germline TP53 variants across the three databases. The TP53 p.R181H variant was the most frequently identified (9 of 109 individuals, 8%). A total of 110 cancers, including 47 hematologic cancers (47 of 110, 43%), were reported in 71 individuals. The prevalence of P/LP germline TP53 variants was conservatively estimated as 1:10,439 in UKB, 1:3,790 in Geisinger, and 1:2,983 in PMBB. These estimates were calculated after excluding related individuals and accounting for the potential impact of clonal hematopoiesis by excluding heterozygotes who ever developed a hematologic cancer. These varying estimates likely reflect intrinsic selection biases of each database, such as healthcare or population-based contexts. Prospective studies of diverse, young cohorts are required to better understand the population prevalence of germline TP53 variants and their associated cancer penetrance.
... To investigate the frequency and penetrance of NF1 PVs in a more unbiased way, we utilized the Penn Medicine BioBank (PMBB), a large academic medical biobank with exome sequencing data on 43,731 individuals, all patients of the University of Pennsylvania Health System (UPHS). 25 We identified 58 individuals heterozygous for any of 50 unique NF1 PVs: 43 predicted loss of function (pLOF) variants, five missense variants, and two deletions involving the entire NF1 gene ( Figure 1A, Table S2). This prevalence of 1 in 752 (0.13%) is four-fold greater than the reported prevalence of 1 in 2,500 to 3,500 for NF1. ...
Preprint
Loss of function variants in the NF1 gene cause neurofibromatosis type 1 (NF1), a genetic disorder characterized by complete penetrance, prevalence of 1 in 3,000, characteristic physical exam findings, and a substantially increased risk for malignancy. However, our understanding of the disorder is entirely based on patients ascertained through phenotype-first approaches. Leveraging a genotype-first approach in two large patient cohorts, we demonstrate unexpectedly high prevalence (1 in 450-750) of NF1 pathogenic variants. Half were identified in individuals lacking clinical features of NF1, with many appearing to have post-zygotic mosaicism for the identified variant. Incidentally discovered variants were not associated with classic NF1 features but were associated with an increased incidence of malignancy compared to a control population. Our findings suggest that NF1 pathogenic variants are substantially more common than previously thought, often characterized by somatic mosaicism and reduced penetrance, and are important contributors to cancer risk in the general population.
... The Penn Medicine BioBank (PMBB) has enrolled 174,712 racially diverse participants (56% female; 17% Black, 71% White, 4% Asian, 3% Other, and 6% Unknown) with a median of seven years of prospective health and disease data mapped to ICD diagnostic codes as well as the complete electronic medical record data (imaging, clinical laboratory measures, procedures, (23) . The PMBB also includes a biorepository of blood and tissue samples for genetic and other 'omics assays; to date, whole exome sequence and genome-wide genotype data are available for approximately 45,000 participants. ...
Article
Though diet quality is widely recognized as linked to risk of chronic disease, health systems have been challenged to find a user-friendly, efficient way to obtain information about diet. The Penn Healthy Diet (PHD) survey was designed to fill this void. The purposes of this pilot project were to assess the patient experience with the PHD, to validate the accuracy of the PHD against related items in a diet recall, and to explore scoring algorithms with relationship to the Healthy Eating Index (HEI)-2015 computed from the recall data. A convenience sample of participants in the Penn Health BioBank was surveyed with the PHD, the Automated Self-Administered 24-hour recall (ASA24), and experience questions. Kappa scores and Spearman correlations were used to compare related questions in the PHD to the ASA24. Numerical scoring, regression tree, and weighted regressions were computed for scoring. Participants assessed the PHD as easy to use and were willing to repeat the survey at least annually. The three scoring algorithms were strongly associated with HEI-2015 scores using National Health and Nutrition Examination Survey (NHANES) 2017-18 data from which the PHD was developed, and moderately associated with the pilot replication data. The PHD is acceptable to participants and at least moderately correlated with the HEI-2015. Further validation in a larger sample will enable the selection of the strongest scoring approach.