Data Mining in Biomedicine
2024
Summary
This document presents an overview of data mining in biomedicine, focusing on key biomedical data resources. It covers NCBI GEO, GTEx, TCGA, Ensembl, the Protein Data Bank, ClinVar, the Human Protein Atlas, and LINCS, highlighting their applications in biological analysis and disease research.
Full Transcript
Data Mining in Biomedicine
Understanding Biomedical Data Sources
November 2024

Key Biomedical Data Sources
- NCBI Gene Expression Omnibus
- GTEx
- The Cancer Genome Atlas
- Ensembl
- Protein Data Bank
- ClinVar
- Human Protein Atlas
- Open Targets
- BioGRID
- LINCS

NCBI GEO
- NCBI GEO (Gene Expression Omnibus): https://www.ncbi.nlm.nih.gov/geo/
- One of the most widely used repositories for biological data
- Contains data from a variety of experimental conditions, including gene expression profiles from microarray and RNA sequencing (RNA-seq) experiments
- Researchers submit their datasets to GEO, where they can be freely accessed for data mining and analysis
- Data are linked to associated clinical, experimental, and environmental metadata, making GEO an invaluable tool for understanding how gene expression varies under different biological or disease conditions
- GEO has been used extensively in cancer research to explore gene expression changes in different cancer types, such as breast cancer, lung cancer, and glioblastoma.
- A study by Kanehisa et al. (2017) used GEO to analyze differentially expressed genes in pancreatic cancer, ultimately identifying novel biomarkers and therapeutic targets.
- Researchers can also integrate multiple GEO datasets into larger analyses to increase the statistical power of their findings and to validate results across cohorts.

GEO2R - Hands-On Exercise
GEO2R, an online tool provided by NCBI, lets users compare two or more groups of samples within a single GEO Series, streamlining the search for genes that are differentially expressed between experimental conditions. It is heavily used in functional genomics, drug development, and disease biomarker discovery.
1. Search for datasets on NCBI GEO
2. Select a dataset of interest
3. Assign samples to groups, adjust the settings, and run the test
4. View the results

GEO2R - Questions
- What are the caveats of using these data?
- Can you select more than one dataset? Why?
- Why use it at all?
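Step 1 of the exercise (searching GEO) can also be scripted. The sketch below, which assumes only the Python standard library, builds a request for NCBI's E-utilities ESearch endpoint against the GEO DataSets database (`db=gds`), parses the JSON reply, and classifies GEO accession prefixes (GSE = Series, GSM = Sample, GPL = Platform, GDS = DataSet). The helper names are hypothetical, `fetch_ids` needs network access and is not called here, and NCBI's usage policies (rate limits, optional API key) are not handled.

```python
from __future__ import annotations

import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# GEO accession prefixes and the record type each one denotes
GEO_PREFIXES = {"GSE": "Series", "GSM": "Sample", "GPL": "Platform", "GDS": "DataSet"}


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that searches GEO DataSets (db=gds) and requests JSON."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(payload: dict) -> tuple[int, list[str]]:
    """Return (total hit count, list of UIDs) from a parsed ESearch JSON reply."""
    result = payload["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def fetch_ids(term: str, retmax: int = 20) -> tuple[int, list[str]]:
    """Hypothetical helper that runs the search over the network (not executed here)."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        return parse_esearch(json.load(response))


def accession_type(accession: str) -> str | None:
    """Map a GEO accession such as 'GSE12345' to its record type, or None if unrecognised."""
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)\d+", accession.strip().upper())
    return GEO_PREFIXES[match.group(1)] if match else None
```

For example, `build_esearch_url("pancreatic cancer", retmax=5)` produces a URL ending in `db=gds&term=pancreatic+cancer&retmax=5&retmode=json`. The `gds` database returns numeric UIDs rather than GSE accessions, so a follow-up ESummary request would typically be needed to map hits back to accessions.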
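The "run the test" step of GEO2R boils down to comparing two sample groups gene by gene and then correcting for multiple testing. GEO2R itself relies on the limma R package (DESeq2 for RNA-seq counts); the stdlib-only sketch below swaps in a simpler approach to show the idea: a log2 fold change and Welch's t-statistic per gene, assuming the values are already log2-transformed, plus a Benjamini-Hochberg adjustment (GEO2R's default correction) for p-values obtained elsewhere. The function names and the toy matrix layout (gene ID mapped to per-sample values) are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def log2_fold_change(group_a: list[float], group_b: list[float]) -> float:
    """Difference of group means; equals log2(B/A) when the inputs are log2 values."""
    return mean(group_b) - mean(group_a)


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t-statistic (unequal variances); each group needs at least 2 samples."""
    diff = mean(group_b) - mean(group_a)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # No within-group spread: 0 for no difference, otherwise +/- infinity
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expr: dict[str, list[float]], group_a: list[int], group_b: list[int]):
    """Return (gene, logFC, t) tuples sorted by |t|, largest first.

    expr maps a gene ID to its values in sample order; group_a and group_b are column indices.
    """
    rows = []
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        rows.append((gene, log2_fold_change(a, b), welch_t(a, b)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The t-statistic alone is not a p-value; converting it requires the t distribution (for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`), and limma's moderated statistics are generally better behaved than plain t-tests when each group has only a few replicates.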
GTEx
- The Adult Genotype Tissue Expression (GTEx) Project is a comprehensive public resource for studying human gene expression and regulation, and their relationship to genetic variation, across many diverse tissues and individuals.
- GTEx donor recruitment and molecular data generation are complete for the core molecular assays, including whole-genome sequencing (WGS), whole-exome sequencing (WES), and RNA-seq.
- Atlas of normal gene expression
- Programmatic access: https://gtexportal.org/api/v2/redoc#tag/GTEx-Portal-API-Info

TCGA
- The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types.
- Data are available through the GDC (Genomic Data Commons) Portal: https://portal.gdc.cancer.gov/
- Programmatic access:
  ○ https://gdc.cancer.gov/access-data/data-access-processes-and-tools
  ○ https://docs.gdc.cancer.gov/API/Users_Guide/Getting_Started/

Ensembl
- The Ensembl database provides a wealth of genomic, transcriptomic, and proteomic information that can be mined for various biological and computational analyses.
- Offers data and tools covering:
  ○ Genomics (gene annotation, variants, non-coding RNAs)
  ○ Transcriptomics (expression, splicing)
  ○ Proteomics (domains, post-translational modifications (PTMs))
  ○ Comparative genomics (orthology, synteny, evolutionary analysis)
  ○ Regulatory genomics (enhancers, promoters, transcription factor binding sites (TFBS))
- Mining tools: BioMart, REST API, FTP, MySQL dumps

Protein Data Bank
- Data mining from the Protein Data Bank (PDB) focuses on structural, functional, and interaction data for biomolecules, primarily proteins, DNA, RNA, and their complexes.
- The PDB provides rich datasets that can be explored for insights into molecular biology, drug discovery, and bioinformatics.
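For TCGA, the GDC API accepts JSON filters passed as a query parameter. The sketch below only builds a request for the `files` endpoint (open-access files of one data type in one TCGA project) and does not send it. The `op`/`content` filter structure and the `filters`, `fields`, `format`, and `size` parameters follow the GDC API's documented conventions, but the specific field names and values used here (`cases.project.project_id`, `data_type`, `access`, "Gene Expression Quantification") are assumptions to check against the GDC field list before use; the helper names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def in_filter(field: str, values: list[str]) -> dict:
    """GDC-style 'in' clause: the field must match one of the values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def and_filter(*clauses: dict) -> dict:
    """Combine clauses so that all of them must hold."""
    return {"op": "and", "content": list(clauses)}


def build_files_params(project_id: str, data_type: str, size: int = 10) -> dict[str, str]:
    """Query parameters for open-access files of one data type in one project."""
    filters = and_filter(
        in_filter("cases.project.project_id", [project_id]),  # assumed field name
        in_filter("data_type", [data_type]),                  # assumed field name
        in_filter("access", ["open"]),                        # assumed field name
    )
    return {
        "filters": json.dumps(filters),  # filters travel as a JSON string
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }


def build_files_url(params: dict[str, str]) -> str:
    """Encode the parameters into a GET URL for the files endpoint."""
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
```

For example, `build_files_url(build_files_params("TCGA-BRCA", "Gene Expression Quantification"))` yields a GET URL that could be sent with `urllib.request.urlopen`; the GDC returns matching records as JSON, with the hits listed under `data.hits`.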
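Of the Ensembl mining routes listed above, the REST API is the simplest to script. The sketch below builds URLs for the `lookup/symbol` and `lookup/id` endpoints of https://rest.ensembl.org (requesting JSON through the `content-type` query parameter) and classifies human stable IDs by prefix (ENSG gene, ENST transcript, ENSP protein, ENSE exon). It does not perform any request, the helper names are hypothetical, and the ID pattern is an assumption that only covers human IDs; other species use different prefixes (for example ENSMUSG for mouse genes).

```python
from __future__ import annotations

import re
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"

# Human stable IDs: "ENS" + feature letter + 11 digits, with an optional ".version" suffix
HUMAN_ID_PATTERN = re.compile(r"ENS([GTPE])\d{11}(?:\.\d+)?")
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def lookup_symbol_url(symbol: str, species: str = "homo_sapiens") -> str:
    """URL for the lookup/symbol endpoint, asking for a JSON response."""
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}?content-type=application/json"


def lookup_id_url(stable_id: str) -> str:
    """URL for the lookup/id endpoint, asking for a JSON response."""
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?content-type=application/json"


def human_feature_type(stable_id: str) -> str | None:
    """Return 'gene', 'transcript', 'protein' or 'exon' for a human stable ID, else None."""
    match = HUMAN_ID_PATTERN.fullmatch(stable_id.strip())
    return FEATURE_TYPES[match.group(1)] if match else None
```

Classifying IDs locally before querying avoids sending malformed requests; for bulk retrieval of many genes at once, BioMart or the FTP dumps are usually a better fit than one REST call per ID.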
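Much PDB mining starts from coordinate files. The sketch below reads ATOM records from legacy PDB-format text using that format's fixed column positions (atom name in columns 13-16, residue name 18-20, chain ID 22, residue number 23-26, x/y/z in 31-54) and counts alpha-carbon (CA) atoms per chain as a rough residue count. It assumes a well-formed, single-model file and ignores HETATM records, alternate locations, and the PDBx/mmCIF format, which the PDB now uses as its standard archive format; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Parse ATOM records from legacy PDB text using fixed column slices (0-based)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HEADER, HETATM, REMARK and other record types
        atoms.append(
            Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            )
        )
    return atoms


def ca_count_per_chain(atoms: list[Atom]) -> dict[str, int]:
    """Count alpha carbons per chain, a rough proxy for the number of modelled residues."""
    return dict(Counter(atom.chain for atom in atoms if atom.name == "CA"))
```

Fixed-width slicing matters here because adjacent PDB fields can run together without whitespace (for example large negative coordinates), so splitting on spaces is unreliable.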
ClinVar
- Public archive of reports of human variations classified for diseases and drug responses, with supporting evidence
- Covers variants in any part of the genome, from single nucleotide variants and small insertions/deletions to large copy number variants
- Data available on the website, on the FTP site, and through an API

Human Protein Atlas
- The Human Protein Atlas is a program that aims to map all the human proteins in cells, tissues, and organs using an integration of various omics technologies (antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology).
- All the data in the knowledge resource are open access, allowing scientists in both academia and industry to freely explore the human proteome.

LINCS
- The Library of Integrated Network-based Cellular Signatures (LINCS) program aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents.
- All LINCS-funded L1000 data, which are used to populate the Connectivity Map (CMap) dataset, are deposited in the NCBI Gene Expression Omnibus.
- LINCS data from L1000 and proteomics assays are integrated with perturbational datasets generated with other funding sources and made available for analysis at clue.io.
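ClinVar's FTP site offers tab-delimited summary files that suit bulk mining. The sketch below filters such a file with the standard `csv` module, keeping rows for one gene on one genome assembly whose clinical significance includes a wanted label. The default column names (`GeneSymbol`, `ClinicalSignificance`, `Assembly`) are assumptions modelled on ClinVar's variant summary file and are exposed as parameters so they can be matched to the actual header; the function names are hypothetical, and combined labels are split on `/`, `;` and `,`, which is a simplification.

```python
from __future__ import annotations

import csv
import re
from typing import Iterable

DEFAULT_LABELS = frozenset({"pathogenic", "likely pathogenic"})


def significance_terms(value: str) -> set[str]:
    """Split a combined label such as 'Pathogenic/Likely pathogenic' into lowercase terms."""
    return {part.strip().lower() for part in re.split(r"[/;,]", value) if part.strip()}


def filter_variants(
    lines: Iterable[str],
    gene: str,
    assembly: str = "GRCh38",
    labels: Iterable[str] = DEFAULT_LABELS,
    gene_col: str = "GeneSymbol",                   # assumed column name
    significance_col: str = "ClinicalSignificance",  # assumed column name
    assembly_col: str = "Assembly",                 # assumed column name
) -> list[dict[str, str]]:
    """Return rows for `gene` on `assembly` whose significance includes a wanted label.

    `lines` can be an open file handle or any iterable of tab-separated lines with a header.
    """
    wanted = {label.lower() for label in labels}
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row
        for row in reader
        if row[gene_col] == gene
        and row[assembly_col] == assembly
        and significance_terms(row[significance_col]) & wanted
    ]
```

Matching whole terms rather than substrings keeps labels such as "Conflicting interpretations of pathogenicity" from being counted as pathogenic. For the full summary file (which is large and usually gzip-compressed), the same function can be fed a handle from `gzip.open(path, "rt")`.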
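The idea of connecting a query signature to perturbation profiles, which underlies CMap, can be illustrated with the unweighted, Kolmogorov-Smirnov-style connectivity score used in the original Connectivity Map. The query's up- and down-regulated genes are located in a reference profile's ranked gene list, an enrichment score is computed for each set, and the two are combined only when they point in opposite directions. This is a simplified illustration under that assumption, not the weighted scoring that clue.io currently applies to L1000 data, and the function names are hypothetical.

```python
from __future__ import annotations


def positions(ranked_genes: list[str], signature: list[str]) -> list[int]:
    """1-based ranks of signature genes within a ranked profile (missing genes skipped)."""
    index = {gene: rank for rank, gene in enumerate(ranked_genes, start=1)}
    return [index[gene] for gene in signature if gene in index]


def ks_score(tag_positions: list[int], n: int) -> float:
    """Unweighted KS enrichment of tags in a list of n genes; positive means near the top."""
    v = sorted(tag_positions)
    t = len(v)
    if t == 0:
        raise ValueError("no signature genes found in the ranked list")
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity(ranked_genes: list[str], up: list[str], down: list[str]) -> float:
    """Combine up/down enrichments; 0 when both point in the same direction."""
    n = len(ranked_genes)
    ks_up = ks_score(positions(ranked_genes, up), n)
    ks_down = ks_score(positions(ranked_genes, down), n)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down
```

A positive score means the reference profile mimics the query signature, while a negative score means it reverses it; the latter is the basis of signature-reversal approaches to drug repurposing. In the original method, raw scores were then rescaled across all reference profiles to lie between -1 and 1.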