Bioinformatics - Science topic

Explore the latest questions and answers in Bioinformatics, and find Bioinformatics experts.
Questions related to Bioinformatics
  • asked a question related to Bioinformatics
Question
2 answers
I am wondering what your tool of choice is for powerful statistical analyses and beautiful, publication-ready plots of genomic interval overlaps, usually starting from two or more .bed files. Over the last few days I have been playing with BedSect (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00003/full), but I would also like to try some alternatives :)
Thank you in advance for your tips and tricks!
Relevant answer
Answer
GenomicRanges is an R package for representing and manipulating genomic intervals. Used together with the rtracklayer R package, it makes working with the BED format really convenient.
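As a language-neutral illustration of what an interval-overlap query computes, here is a minimal pure-Python sketch. The function names and the tiny in-memory BED records are hypothetical and only meant to show the idea; for real analyses an established library (such as GenomicRanges in R) is the better choice.

```python
from collections import defaultdict

def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = defaultdict(list)
    for line in lines:
        # Skip blank lines and the optional BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals

def overlaps(a, b):
    """Return (chrom, start, end) for every pairwise intersection of a and b."""
    hits = []
    for chrom in a:
        for s1, e1 in a[chrom]:
            for s2, e2 in b.get(chrom, []):
                lo, hi = max(s1, s2), min(e1, e2)
                if lo < hi:  # half-open intervals overlap only if lo < hi
                    hits.append((chrom, lo, hi))
    return sorted(hits)

# Hypothetical example records standing in for two .bed files
bed_a = parse_bed(["chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t80"])
bed_b = parse_bed(["chr1\t150\t350", "chr2\t80\t120"])
shared = overlaps(bed_a, bed_b)
print(shared)
```

The nested loops are quadratic per chromosome, which is fine for a demonstration but not for genome-scale files; dedicated tools use sorted sweeps or interval trees instead.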
  • asked a question related to Bioinformatics
Question
3 answers
Constant touchstones following the protocol introduces a new dimension of bioethics to the bioinformatics about big ideas, modes of inquiry, intellectual traits and habits of mind, but it is relativity of shared economy that transcends attention span to specific evidence of integration.
Relevant answer
Answer
San Diego Supercomputer Center (sdsc.edu)
Math of where
where industry anchors innovation.
where industry strategizes innovation..
where industry constellates innovation...
  • asked a question related to Bioinformatics
Question
1 answer
Hello,
In the literature, there are some MS/MS results that include hypothetical proteins, which can be shorter than 40 amino acids. I can also find these when I search for an organism in the protein section of NCBI. My question is: would it be absurd to chemically synthesize these peptides, annotated as hypothetical proteins, and test them as drug candidates in certain disease models? Or are studies like the one I describe feasible and already being conducted? If so, what procedure should I follow? For example, when I find a hypothetical protein, should I first run a BLAST search and then synthesize and use the peptide if it meets certain conditions?
Is there any chance you could share some references with me that have been done in this manner?
I hope I have been able to convey what I want to ask.
Thank you for your answers.
Relevant answer
Answer
It is hypothetical because its existence is presumed on the basis of transcriptomic data rather than being directly detected by proteomics or Western blot (WB). You can try it as a lead molecule, either based on some a priori knowledge about it or entirely at random. Nothing wrong with that. Also, 'polypeptide' is a better term for an amino acid chain this short.
  • asked a question related to Bioinformatics
Question
1 answer
Fatal error:
Atom HD1 in residue HIS 822 was not found in rtp entry HSE with 17 atoms while sorting atoms.
For a hydrogen, this can be a different protonation state, or it
might have had a different number in the PDB file and was rebuilt
(it might for instance have been H3, and we only expected H1 & H2).
Note that hydrogens might have been added to the entry for the N-terminus.
Remove this hydrogen or choose a different protonation state to solve it.
Option -ignh will ignore all hydrogens in the input.
I also followed the suggestion to use -ignh in the command, but it gave me this error:
Fatal error:
Atom OXT in residue VAL 961 was not found in rtp entry VAL with 16 atoms
while sorting atoms.
Would anyone please help me? I can't concentrate on my studies unless I solve them.
Relevant answer
Answer
Histidine can be protonated at either the delta or the epsilon nitrogen, and it is the user's responsibility to check the protonation state. Before running gmx pdb2gmx, you should decide on this protonation state and change the residue name accordingly, e.g. to HIE (epsilon-protonated) or HID (delta-protonated); the exact names depend on the force field (the error above refers to the entry HSE). That said, pdb2gmx can guess the protonation state on its own, and its guesses are fairly good.
Since you ignore the hydrogen atoms, pdb2gmx will rebuild all of them based on the residue entries of the force field.
OXT seems to be the terminal atom. Is Val961 the terminal residue? Do you have a proper TER record in your PDB file?
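To check which protonation state each histidine in the input PDB actually carries before running pdb2gmx, a small script can list the ring hydrogens present per residue. This is a rough sketch that assumes standard fixed-column PDB ATOM records and the usual hydrogen names HD1 (delta) and HE2 (epsilon); the function names, the `atom_line` demo helper and the labels are hypothetical.

```python
HIS_NAMES = {"HIS", "HID", "HIE", "HIP", "HSD", "HSE", "HSP"}

def histidine_states(pdb_lines):
    """Map (chain, residue number) -> 'delta', 'epsilon', 'both' or 'unknown'."""
    ring_h = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() not in HIS_NAMES:  # residue name, columns 18-20
            continue
        key = (line[21], int(line[22:26]))        # chain ID and residue number
        atom = line[12:16].strip()                # atom name, columns 13-16
        ring_h.setdefault(key, set())
        if atom in {"HD1", "HE2"}:
            ring_h[key].add(atom)
    labels = {
        frozenset({"HD1"}): "delta",
        frozenset({"HE2"}): "epsilon",
        frozenset({"HD1", "HE2"}): "both",
    }
    return {k: labels.get(frozenset(v), "unknown") for k, v in ring_h.items()}

def atom_line(serial, name, res, chain, resseq):
    """Hypothetical demo helper: build a minimal fixed-column ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

demo = [
    atom_line(1, "HD1", "HIS", "A", 822),
    atom_line(2, "CA", "HIS", "A", 822),
    atom_line(3, "HE2", "HIS", "A", 900),
]
states = histidine_states(demo)
print(states)
```

A residue reported as "delta" carries HD1, which matches the situation in the error message: an epsilon-only entry such as HSE has no HD1 atom, so either the residue should be assigned a delta-protonated entry or the hydrogens should be rebuilt.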
  • asked a question related to Bioinformatics
Question
2 answers
Can anyone assist with running the Linkage Analysis Tool 'Easylinkage' or any alternative tool for conducting linkage analysis and calculating LOD scores?
Relevant answer
Answer
Linkage analysis is a method used to map genetic loci associated with specific traits or diseases within families. It helps identify regions of the genome that may contain genetic variants contributing to the trait of interest. There are several software tools commonly used for linkage analysis, such as:
1. MERLIN (Multipoint Engine for Rapid Likelihood Inference)
2. Allegro
3. GeneHunter
4. SOLAR (Sequential Oligogenic Linkage Analysis Routines)
5. PLINK (PLatform for INtegrated Knowledge)
These are just a few examples of the linkage analysis tools available. The choice of tool depends on the specific requirements of your study, the type of genetic data you have, and the analysis methods you intend to use.
  • asked a question related to Bioinformatics
Question
1 answer
Hi, I'm new to bioinformatics. I have weighted UniFrac distances generated from some bacterial samples, and I want to test the effect of two factors, A and B (both binary), on the composition using adonis2 in R. When I include only factor A, the result is not significant. But if I include the interaction of A and B, the A*B interaction is not significant, while the individual effects of A and B are both significant. I'm wondering why the effect of A becomes significant, because in the PCoA plot the four groups (i.e., A1*B1, A1*B2, A2*B1, A2*B2) overlap to some extent.
Relevant answer
Answer
Hey, there's an important point to remember about ordination techniques like PCoA and NMDS. While they're great for visualizing complex data in 2D or 3D, they're just a simplified snapshot.
Look at the percentage contribution of each axis, or at the stress value (for NMDS). It can sometimes help to try a different ordination technique.
Another point to consider is the number of samples per level of A and B; with more samples, the separation may become more visible.
  • asked a question related to Bioinformatics
Question
4 answers
Hi all,
I have just started to learn about bioinformatics and I need help with it.
I have enriched some microbes from wastewater anaerobic sludge and sent them for 16S rRNA sequencing.
Based on the QC result I got after running trimmomatic, I am still not able to get a good quality sequence. The following is the code I ran for trimmomatic. Can you all help me with this?
trimmomatic PE -threads 2 -phred33 \
Raw160823_1.fastq.gz Raw160823_2.fastq.gz \
Raw160823_1P.fastq.gz Raw160823_1F.fastq.gz Raw160823_2P.fastq.gz Raw160823_2F.fastq.gz \
HEADCROP:10 SLIDINGWINDOW:4:30 MINLEN:50
Thank you very much!
Regards,
Kai
Relevant answer
Answer
You can use cutadapt or prinseq-lite in a bash terminal; both have quality-trimming options.
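To see why a setting like SLIDINGWINDOW:4:30 in the question's command can discard a large part of each read, here is a simplified illustration of sliding-window quality trimming: scan the read with a fixed window and cut it where the window's mean quality first drops below the threshold. This is a sketch of the idea only, not Trimmomatic's exact implementation; the function name and the example quality scores are made up.

```python
def sliding_window_trim(quals, window=4, min_mean=30):
    """Return the number of leading bases kept after sliding-window trimming."""
    for start in range(0, max(len(quals) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            return start  # cut at the start of the first failing window
    return len(quals)     # no window failed: keep the whole read

# Hypothetical Phred scores for a read whose quality decays towards the 3' end
quals = [38, 38, 37, 36, 35, 30, 28, 25, 20, 15]
kept_30 = sliding_window_trim(quals, window=4, min_mean=30)
kept_20 = sliding_window_trim(quals, window=4, min_mean=20)
print(kept_30, kept_20)
```

With these example scores the threshold of 30 keeps only 4 bases, while a threshold of 20 keeps all 10. Combined with a minimum-length filter such as MINLEN:50, a strict threshold can therefore remove many reads entirely, so trying a lower threshold may be worth testing.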
  • asked a question related to Bioinformatics
Question
1 answer
2024 IEEE 7th International Conference on Computer Information Science and Application Technology (CISAT 2024) will be held on July 12-14, 2024 in Hangzhou, China.
---Call For Papers---
The topics of interest for submission include, but are not limited to:
◕ Computational Science and Algorithms
· Algorithms
· Automated Software Engineering
· Bioinformatics and Scientific Computing
......
◕ Intelligent Computing and Artificial Intelligence
· Basic Theory and Application of Artificial Intelligence
· Big Data Analysis and Processing
· Biometric Identification
......
◕ Software Process and Data Mining
· Software Engineering Practice
· Web Engineering
· Multimedia and Visual Software Engineering
......
◕ Intelligent Transportation
· Intelligent Transportation Systems
· Vehicular Networks
· Edge Computing
· Spatiotemporal Data
All accepted papers, both invited and contributed, will be published and submitted for inclusion in IEEE Xplore, subject to meeting IEEE Xplore's scope and quality requirements, and will also be submitted to EI Compendex and Scopus for indexing. Conference proceedings papers must be at least 4 pages long.
Important Dates:
Full Paper Submission Date: April 14, 2024
Submission Date: May 12, 2024
Registration Deadline: June 14, 2024
Conference Dates: July 12-14, 2024
For More Details please visit:
Invitation code: AISCONF
*Using the invitation code on submission system/registration can get priority review and feedback
Relevant answer
Please let me know if anyone is interested to o
  • asked a question related to Bioinformatics
Question
6 answers
Please suggest bioinformatics journals which do not need wet lab experiments.
Relevant answer
Answer
Your first suggestion is an excellent suggestion:
Your second suggestion, though interesting, has no longer been free of charge since 1 January 2024:
Possibly interesting, but they state that they are free of charge in 2023 (not sure what the situation is now in 2024):
Best regards.
  • asked a question related to Bioinformatics
Question
4 answers
I have several pairs of parameters (obtained from females and males) and want to find the difference in correlation between the two sexes for each parameter, but I also want to apply a weight so that parameters showing the highest correlation with survival in either females or males have a greater weight. This way, I hope to find factors that combine a strong difference in correlation between females and males (with regard to survival) with a strong positive correlation with survival in either sex (which I will resolve further).
To do this, if the correlation of parameter 1 is A for males and B for females, I plan to compute (A-B) multiplied by A or B (whichever is higher), to give weight to the highest positive correlation with survival. For the next parameter, where the correlation is C for males and D for females, I will compute (C-D) × C or D (whichever is higher). The final aim is to rank the parameters that differ most between females and males and that also correlate most strongly with survival in either sex. Do you think this is a reasonable idea?
I would be very very grateful for your advice, suggestions and tips.
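The proposed score can be written down in a few lines, which makes its behaviour easier to inspect. This is a minimal sketch of the formula as described in the question; the function name, parameter names and correlation values are hypothetical.

```python
def sex_difference_score(r_male, r_female):
    """(A - B) multiplied by the larger of A and B, as proposed in the question."""
    return (r_male - r_female) * max(r_male, r_female)

# Hypothetical correlations with survival: (male, female)
correlations = {
    "param1": (0.8, 0.2),
    "param2": (0.3, 0.7),
    "param3": (-0.6, -0.1),
}

scores = {name: sex_difference_score(*pair) for name, pair in correlations.items()}

# Rank by magnitude; the sign tells which sex has the higher correlation
ranking = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(scores, ranking)
```

Two properties are worth noting. The sign of the score only encodes which sex has the higher correlation, so ranking by absolute value may be what is intended. And when both correlations are negative (as in param3), the weight is the value closer to zero, so strongly negative associations receive little weight, which is consistent with the stated focus on positive correlations but should be a deliberate choice.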
Relevant answer
Answer
Thank you very much :)
  • asked a question related to Bioinformatics
Question
3 answers
I want to find the UTR sequence of the mRNA of a bacterial protein. Can anyone suggest an in silico approach for that?
Relevant answer
Answer
BLAST the full ORF against the genome; you can then manually mine the promoter region and, for validation, check for the presence of regulatory elements associated with the gene's function.
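Once the BLAST hit gives the ORF coordinates on the genome, the upstream region can be sliced out by position. The following is a minimal sketch under stated assumptions: coordinates are 1-based and inclusive with start < end, the strand is given as "+" or "-", and the genome is treated as linear (no wrap-around for circular chromosomes). The function names and the toy genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(genome, orf_start, orf_end, strand, length=100):
    """Return up to `length` bases upstream of an ORF (1-based inclusive coords)."""
    if strand == "+":
        begin = max(orf_start - 1 - length, 0)
        return genome[begin:orf_start - 1]
    # Minus strand: upstream lies to the right of orf_end on the forward strand
    return reverse_complement(genome[orf_end:orf_end + length])

# Toy genome for demonstration only
genome = "AAAACCCCGGGATTTT"
plus_up = upstream_region(genome, 9, 12, "+", length=4)
minus_up = upstream_region(genome, 5, 8, "-", length=4)
print(plus_up, minus_up)
```

The extracted sequence is only a candidate region: where the transcript actually starts within it still has to be inferred, for example from promoter motifs or transcriptomic evidence.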
  • asked a question related to Bioinformatics
Question
1 answer
I am new to Desmond simulations and want to know how I can find the estimated time left for a simulation to complete. My second query is how to perform B-factor analysis after running a simulation in Desmond. Any help in this regard will be highly appreciated.
Thanks
Relevant answer
Answer
I hope someone finds this useful; I have been struggling with the same problem myself. My solution is to look at the chemical time and ns/day data, which are constantly updated.
To check this, go to MONITOR -> double-click the running MD job -> check the last entry (status: running).
Double-click the last entry. For a default 100 ns simulation, a chemical time of 100000 means 100 ns, so this gives a rough idea of the percentage progress of the job.
The ns/day figure gives a rough idea of simulation speed: at 11 ns/day, for example, a 100 ns simulation takes about 9 days (100 / 11 ≈ 9.1) and finishes on the 10th day of the run.
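The arithmetic behind this estimate is simple enough to script. Here is a small sketch, assuming the chemical time is reported with 1000 units per ns (as implied by 100000 corresponding to 100 ns) and that the ns/day figure stays roughly constant; the function and argument names are hypothetical.

```python
def remaining_days(total_ns, done_chemical_time, ns_per_day):
    """Estimate days left from the chemical time completed and the throughput."""
    done_ns = done_chemical_time / 1000.0  # assumed: 1000 chemical-time units per ns
    return max(total_ns - done_ns, 0.0) / ns_per_day

# Whole run: 100 ns at 11 ns/day
full_run = remaining_days(total_ns=100, done_chemical_time=0, ns_per_day=11)

# Partway through: chemical time 45000 reached, i.e. 45 ns done
left = remaining_days(total_ns=100, done_chemical_time=45000, ns_per_day=11)
print(round(full_run, 1), round(left, 1))
```

Because throughput can fluctuate with system load, the result is best read as a rough estimate rather than a deadline.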
  • asked a question related to Bioinformatics
Question
1 answer
I have learnt to add one sequence at a time but couldn't find an option to upload a batch at once. I'm still a tyro at bioinformatics; any leads?
Relevant answer
Answer
Hi
Javeria Tanveer,
before uploading, maybe concatenate your sequences into a single file.
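If the sequences are stored as separate FASTA files, concatenating them into one multi-FASTA file takes only a few lines. This is a minimal sketch; the function name and the file names in the usage comment are hypothetical, and whether the target service accepts multi-FASTA uploads is an assumption to verify.

```python
from pathlib import Path

def concatenate_fasta(paths, out_path):
    """Write all records from `paths` into one multi-FASTA file; return record count."""
    count = 0
    with open(out_path, "w") as out:
        for path in paths:
            text = Path(path).read_text().strip()
            if not text:
                continue  # skip empty files
            count += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text + "\n")  # ensure each file ends with a newline
    return count

# Usage (hypothetical file names):
# n = concatenate_fasta(["seq1.fasta", "seq2.fasta"], "all_sequences.fasta")
```

Stripping each file and re-adding a single newline avoids the common problem of a header line being glued onto the end of the previous sequence.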
  • asked a question related to Bioinformatics
Question
1 answer
Here are some examples of software that can be used for each step of RNA-seq data analysis:
  1. Quality Control: FastQC, PRINSEQ, Sickle
  2. Read Trimming: Trimmomatic, Cutadapt, AdapterRemoval
  3. Alignment: STAR, HISAT2, TopHat
  4. Quality Control of Alignment: Qualimap, RSeQC, Picard
  5. Assembly: Trinity, Oases, Trans-ABySS
  6. Quantification: RSEM, Kallisto, eXpress
  7. Differential Expression Analysis: DESeq2, edgeR, limma
  8. Functional Annotation: Blast2GO, KEGG, Reactome
  9. Pathway Analysis: KEGG Pathway, Reactome, Enrichr
  10. Network Analysis: Cytoscape, STRING, ClueGO
  11. Visualization: IGV, GenomeBrowse, JBrowse
  12. Interpretation: GSEA, DAVID, IPA
Relevant answer
Answer
For the alignment step, I think it's important to mention pseudo alignment (Salmon, Sailfish, Kallisto) and RUM for hybrid alignment to both genome and transcriptome.
  • asked a question related to Bioinformatics
Question
1 answer
Dear Scientists and Researchers,
I'm thrilled to highlight a significant update from PeptiCloud: new no-code data analysis capabilities specifically designed for researchers. Now, at www.pepticloud.com, you can leverage these powerful tools to enhance your research without the need for coding expertise.
Key Features:
PeptiCloud's latest update lets you:
  • Create Plots: Easily visualize your data for insightful analysis.
  • Conduct Numerical Analysis: Analyze datasets with precision, no coding required.
  • Utilize Advanced Models: Access regression models (linear, polynomial, logistic, lasso, ridge) and machine learning algorithms (KNN and SVM) through a straightforward interface.
The Impact:
This innovation aims to remove the technological hurdles of data analysis, enabling researchers to concentrate on their scientific discoveries. By minimizing the need for programming skills, PeptiCloud is paving the way for more accessible and efficient bioinformatics research.
Join the Conversation:
  1. How do you envision no-code data analysis transforming your research?
  2. Are there any other no-code features you would like to see on PeptiCloud?
  3. If you've used no-code platforms before, how have they impacted your research productivity?
PeptiCloud is dedicated to empowering the bioinformatics community. Your insights and feedback are invaluable to us as we strive to enhance our platform. Visit us at www.pepticloud.com to explore these new features, and don't hesitate to reach out at [email protected] with your thoughts, suggestions, or questions.
Together, let's embark on a journey towards more accessible and impactful research.
Warm regards,
Chris Lee
Bioinformatics Advocate & PeptiCloud Founder
Relevant answer
Answer
I think they remove the need for programming skills and make data analysis much quicker and more efficient. For the future, I look forward to more no-code functions being added to meet a wider range of research needs. As with the no-code platforms I have used before, a lot of time is otherwise spent on data processing and analysis, and no-code tools will make that work easier.
  • asked a question related to Bioinformatics
Question
2 answers
How do AMRFinderPlus and CARD differ from each other in predicting AMR genes from bacterial genomic sequences?
How much overlap is there between the AMRFinderPlus and CARD databases?
Relevant answer
Answer
Anuradha Goswami Thank you for the information.
  • asked a question related to Bioinformatics
Question
3 answers
My question relates to the implicit assumption that topologically associating domains (TADs) have to be contiguous along the genome. This seems odd to me, given that the DNA molecule exists in 3D space while this contiguity criterion relates only to the 1D genome coordinate, which might not be appropriate for delimiting interactions in 3D space.
Consequently, I am wondering whether I'm missing an obvious reason for imposing such a criterion to characterise TADs.
Thanking you in advance.
Relevant answer
Answer
How do I determine boundaries for the manual curation of chromosome pseudomolecules?
I created the chromosome pseudomolecules using Hi-C, a long-read assembly, HapHiC, 3D-DNA, and Juicebox, and I have manually curated them. Could anyone explain this to me?
  • asked a question related to Bioinformatics
Question
3 answers
I am currently learning about PyMol to utilize in my project. I used PyMol to visualize potential H-bond interactions in specific amino acid residues. However, I have discovered that Arg465 and Ser461 show a distinct interaction, as shown.
Please help identify this interaction.
Relevant answer
Answer
The broken yellow line with the distance indicator (6.2) looks like a simple distance monitor, which you generate with a "measure" command, although I do not know how you generated the blue tubes around it. At 6.2 Å, the Ca-Ca distance indicated by the broken line is far larger than the sum of the carbon van der Waals radii (3.4 Å). It is just about short enough that you might classify the contact as a solvent-excluding contact (hydrophobic interaction).
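The reasoning above amounts to comparing a measured distance with a cutoff. Here is a small sketch of that check; the coordinates are hypothetical, and the 3.5 Å hydrogen-bond cutoff is an assumed, commonly used heuristic rather than a value from the thread.

```python
import math

HBOND_MAX = 3.5  # assumed donor-acceptor cutoff in Å (a common heuristic)

def describe_distance(atom_a, atom_b):
    """Return the rounded distance in Å and a coarse label for the contact."""
    d = math.dist(atom_a, atom_b)
    if d <= HBOND_MAX:
        label = "within hydrogen-bond range"
    else:
        label = "too long for a hydrogen bond"
    return round(d, 1), label

# Hypothetical coordinates placed 6.2 Å apart, standing in for Arg465 and Ser461 atoms
arg465_atom = (0.0, 0.0, 0.0)
ser461_atom = (6.2, 0.0, 0.0)
result = describe_distance(arg465_atom, ser461_atom)
print(result)
```

A distance alone does not establish a hydrogen bond even when it is below the cutoff, since donor and acceptor identity and geometry matter as well; the check is only a first filter.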
  • asked a question related to Bioinformatics
Question
3 answers
Hello,
I am getting an IDLE 1.2.2 error in AutoDock 1.5.6 when I try to open my PDB file 1j5e; the file is not even displayed on my screen, so I am getting an error at the very first step of docking.
Please respond.
Thank you.
Relevant answer
Answer
Did you install MGLTools before installing AutoDock? It contains certain commands that support AutoDock visualizations. You should try installing MGLTools and then reinstalling AutoDock, using updated versions of both.
  • asked a question related to Bioinformatics
Question
7 answers
2024 3rd International Conference on Biomedical and Intelligent Systems (IC-BIS 2024) will be held from April 26 to 28, 2024, in Nanchang, China.
It is a comprehensive conference which focuses on Biomedical Engineering and Artificial Intelligent Systems. The main objective of IC-BIS 2024 is to address and deliberate on the latest technical status and recent trends in the research and applications of Biomedical Engineering and Bioinformatics. IC-BIS 2024 provides an opportunity for the scientists, engineers, industrialists, scholars and other professionals from all over the world to interact and exchange their new ideas and research outcomes in related fields and develop possible chances for future collaboration. The conference also aims at motivating the next generation of researchers to promote their interests in Biomedical Engineering and Artificial Intelligent Systems.
Important Dates:
Registration Deadline: March 26, 2024
Final Paper Submission Date: April 22, 2024
Conference Dates: April 26-28, 2024
---Call For Papers---
The topics of interest for submission include, but are not limited to:
- Biomedical Signal Processing and Medical Information
· Biomedical signal processing
· Medical big data and machine learning
· Application of artificial intelligence for biomedical signal processing
......
- Bioinformatics & Intelligent Computing
· Algorithms and Software Tools
· Algorithms, models, software, and tools in Bioinformatics
· Biostatistics and Stochastic Models
......
- Gene regulation, expression, identification and network
· High-performance computational systems biology and parallel implementations
· Image Analysis
· Inference from high-throughput experimental data
......
For More Details please visit:
Relevant answer
Answer
Very nice and interesting.
  • asked a question related to Bioinformatics
Question
1 answer
I am interested in analyzing the correlation between the expression of a set of genes and transposable elements (TEs) in cancer. However, although there are multiple online databases for gene expression in cancer, including TCGA, they do not include repetitive elements. Although I have found some papers that analyze transposable elements and quantify their expression in different cancers using TCGA data, the supplemental tables only provide the fold changes and p-values for differentially expressed TEs. Also, identifying and quantifying TEs would require the raw sequencing data, which have controlled access. Therefore, I was wondering whether there is a database or published resource where I could find per-sample TE expression for the TCGA cohorts. Does anyone know of something like that? Alternatively, if someone has analyzed this type of data and has a worksheet with pre-processed data that could be shared, I would be deeply grateful.
Relevant answer
Answer
Hi Glauco,
Are you aware of the Xena browser (https://xenabrowser.net/)? There are at least gene expression, mutation and methylation data per patient for the TCGA cohorts. Unfortunately I am not sure about the TE data, since I am not working with that but I think that Xena would be your best bet.
  • asked a question related to Bioinformatics
Question
5 answers
Dear ResearchGate network,
Recently I received an invitation to act as Associate Section Editor of Current Bioinformatics (Bentham Science) for a period of 2 years, depending on performance. Because I do not have sufficient experience as an editor, they asked me to provide the name of a senior researcher with an h-index of at least 15 and knowledge of bioinformatics to act together with me as Section Editor (co-editor). The role of this co-editor is to propose at least one special issue per year on a relevant theme in bioinformatics. Does anyone here know someone who fits this profile and could recommend him or her to me, please?
I thank you in advance.
Pedro Paulo Gattai Gomes, PhD
Relevant answer
Answer
Hello, Mr. Pedro Paulo Gattai,
I would suggest Mohsin Saleet Jafri from George Mason University.
  • asked a question related to Bioinformatics
Question
4 answers
What to do if ChimeraX software doesn't recognise the .chimerax file downloaded from SwissDock after docking?
Besides, the zip file of prediction done was empty.
Thank you.
Relevant answer
Answer
I had this issue. I used 7-Zip to unpack the folder instead, and then it was fine.
  • asked a question related to Bioinformatics
Question
2 answers
dfsfdg dd
Relevant answer
Answer
Are there any updates on the book chapter? We have not received any update on the abstract that we sent.
  • asked a question related to Bioinformatics
Question
1 answer
Explore the synergistic impact of machine learning on improving the precision of predicting protein structures in bioinformatics. Seeking insights into the specific methodologies and advancements that contribute to enhanced accuracy.
Relevant answer
Answer
Machine learning algorithms have been shown to enhance the accuracy of protein structure prediction in bioinformatics. Traditional methods for protein structure prediction rely on energy minimization and molecular dynamics simulations, which can be computationally expensive and time-consuming. Machine learning algorithms can be used to predict protein structure more efficiently and accurately by learning from large datasets of known protein structures and their corresponding sequences.
Machine learning algorithms can be used to predict protein structure by analyzing the relationships between amino acid sequences and protein structures. These algorithms can identify patterns in the data and use them to predict the structure of unknown proteins. Machine learning algorithms can also be used to predict the stability of protein structures and to identify potential drug targets.
  • asked a question related to Bioinformatics
Question
4 answers
In the rapidly evolving landscape of the Internet of Things (IoT), the integration of blockchain, machine learning, and natural language processing (NLP) holds promise for strengthening cybersecurity measures. This question explores the potential synergies among these technologies in detecting anomalies, ensuring data integrity, and fortifying the security of interconnected devices.
Relevant answer
Answer
Imagine we're talking about a superhero team-up in the world of tech, with blockchain, machine learning (ML), and natural language processing (NLP) joining forces to beef up cybersecurity in IoT environments.
First up, blockchain. It's like the trusty sidekick ensuring data integrity. By nature, it's transparent and tamper-proof. So, when you have a bunch of IoT devices communicating, blockchain can help keep that data exchange secure and verifiable. It's like having a digital ledger that says, "Yep, this data is legit and hasn't been messed with."
Then, enter machine learning. ML is the brains of the operation, constantly learning and adapting. It can analyze data patterns from IoT devices to spot anything unusual. Think of it as a detective that's always on the lookout for anomalies or suspicious activities.
And finally, there's NLP. It's a bit like the communicator of the group. In this context, NLP can be used to sift through tons of textual data from IoT devices or networks, helping to identify potential security threats or unusual patterns that might not be obvious at first glance.
Put them all together, and you've got a powerful team. Blockchain keeps the data trustworthy, ML hunts down anomalies, and NLP digs deeper into the data narrative. This combo can seriously level up cybersecurity in IoT, making it harder for bad actors to sneak in and cause havoc. Cool, right?
  • asked a question related to Bioinformatics
Question
4 answers
I've tried trimAl and Gblocks, but I'm unable to access the programs through the provided links. Thank you.
:)
Relevant answer
Answer
Debora Santos, here is a link for Gblocks; I have just opened it.
  • asked a question related to Bioinformatics
Question
3 answers
This question blends various emerging technologies to spark discussion. It asks if sophisticated image recognition AI, trained on leaked bioinformatics data (e.g., genetic profiles), could identify vulnerabilities in medical devices connected to the Internet of Things (IoT). These vulnerabilities could then be exploited through "quantum-resistant backdoors" – hidden flaws that remain secure even against potential future advances in quantum computing. This scenario raises concerns for cybersecurity, ethical hacking practices, and the responsible development of both AI and medical technology.
Relevant answer
Answer
Combining image-trained neural networks, bioinformatics breaches, and quantum-resistant backdoors has major limitations.
Moving from image-trained neural networks to bioinformatics data requires significant domain transfer, which is not straightforward due to the distinct nature of these data types and tasks.
Secure IoT medical devices are designed and deployed with robust security features in mind. A successful attack requires exploiting a specific vulnerability in the implementation of those security measures, rather than relying on neural network capabilities.
Deliberately inserting backdoors, even quantum-resistant ones, poses ethical and legal questions and would go against the norms and standards of cybersecurity practitioners. Such actions would violate privacy rights at the federal level as well as ethical standards and codes of conduct, and would carry severe legal consequences. Those would be the domestic ones, assuming we're keeping the products in the US.
Quantum computers with sufficient power to break current cryptographic systems are not yet available. Developing quantum-resistant backdoors presupposes a future scenario that today is still largely theoretical and unproven.
  • asked a question related to Bioinformatics
Question
6 answers
Dear all, it is with great pleasure that I make public my latest exploration of openAI APIs. On this prototype, I have tested a medical chatbot.
Hope you enjoy the reading!
#bioinformatics #healthinformatics #medicine #chatbots #largelanguagemodels #openai #computervision #deeplearning #medicalimaging
You can leave a public review on
Relevant answer
Answer
Dear Loubna Youssar, there are several cases, and they are not hard to replicate, as I have shown in my preprint.
It is similar to the self-driving car: it is no longer a technology limitation, rather a question of time.
Best regards,
Jorge
  • asked a question related to Bioinformatics
Question
2 answers
Recently, I installed Modeller 10.4 on my Windows 10 laptop (64-bit, 10 GB RAM) to predict the 3D structure of a membrane protein (574 amino acids long).
In this case, I used the advanced modelling option for the prediction, because it allows multiple templates to be used. But from the start I got errors when running the Python script.
1) May I know the maximum number of templates that can be used for advanced modelling?
Relevant answer
Answer
Here you can see the format better. If you want, you can send me your code so I can check it.
  • asked a question related to Bioinformatics
Question
6 answers
What are good resources for an undergraduate student to start getting familiar with bioinformatics and, if possible, get some practical experience? Any favorite websites, blogs, videos, etc?
Thanks!
Relevant answer
Answer
Some recommendations to explore:
1. Online Courses and Tutorials:
- Coursera (https://www.coursera.org/): Coursera offers a wide range of bioinformatics courses, including introductory courses and specialized topics. Some popular courses include "Bioinformatics Specialization" by the University of California, San Diego and "Genomic Data Science" by Johns Hopkins University.
- edX (https://www.edx.org/): edX provides bioinformatics courses from renowned universities and institutions. "Bioinformatics: Introduction and Methods" by the University of Toronto and "Introduction to Bioinformatics" by the University of California, San Diego are highly recommended.
- Rosalind (http://rosalind.info/): Rosalind is an online platform that offers bioinformatics problems and challenges with interactive tutorials in various programming languages. It's a great resource to practice and apply bioinformatics concepts.
2. Bioinformatics Websites and Databases:
- National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/): NCBI provides a vast collection of biological databases, tools, and resources. It includes the GenBank database, PubMed, BLAST, and more.
- Ensembl (https://www.ensembl.org/): Ensembl offers genome annotation and analysis tools for various species. It provides access to genomic data, gene annotations, and comparative genomics.
- UniProt (https://www.uniprot.org/): UniProt is a comprehensive resource for protein sequence and functional information. It includes protein databases, tools for sequence analysis, and protein annotation.
3. Online Communities and Forums:
- BioStars (https://www.biostars.org/): BioStars is a popular Q&A platform for bioinformatics and computational biology. It's a great place to ask questions, find solutions, and learn from the community.
- SeqAnswers (https://www.seqanswers.com/): SeqAnswers is a forum dedicated to next-generation sequencing (NGS) technologies and data analysis. It covers a wide range of topics related to NGS and bioinformatics.
4. Books and Textbooks:
- "Bioinformatics Data Skills" by Vince Buffalo: This book provides practical guidance on bioinformatics data analysis using command-line tools and scripting.
- "Bioinformatics for Dummies" by Jean-Michel Claverie and Cedric Notredame: This beginner-friendly book covers the basics of bioinformatics and provides an introduction to various tools and techniques.
5. Research Internships and Projects:
- Reach out to professors or researchers at your university who are involved in bioinformatics or computational biology research. Inquire about opportunities to join their labs as an undergraduate research assistant or intern. This can provide hands-on experience and exposure to real-world bioinformatics projects.
Good luck: partial credit AI
  • asked a question related to Bioinformatics
Question
3 answers
Hello, I've recently started exploring molecular docking applications, and I'm still in the early stages. I'd like to ask: can I specify a ligand by giving its amino acid sequence and then do docking? Which applications would you suggest?
Thank you
Relevant answer
Answer
Hello. To dock a protein whose 3D structure is not available on the UniProt site, you must first model its 3D structure. There are many servers for this, such as I-TASSER, QUARK, and Robetta. After evaluating the quality of your 3D model, you can do the docking with servers like ClusPro, ZDOCK, or HDOCK.
  • asked a question related to Bioinformatics
Question
4 answers
Hello, I've recently started exploring molecular docking applications, and I'm still in the early stages. I'd like to ask which proteins should be considered when examining the antimicrobial effects of certain molecules.
Is there a list of these proteins (ones I should use as docking targets), or are there general rules for proteins that should definitely be examined?
Also, can I perform docking not with a molecule but directly with an organism? If so, what should I look for to predict antimicrobial effects?
Could you please guide me on this?
Thank you.
Relevant answer
Answer
It's important to consider specific proteins that play crucial roles in the survival and reproduction of microorganisms:
- Enzymes involved in cell wall synthesis: Proteins like penicillin-binding proteins (PBPs) are crucial for bacterial cell wall formation.
- DNA gyrase and topoisomerases: Involved in DNA replication and repair, these are essential targets for antimicrobial compounds.
- Ribosomal proteins: Targeting bacterial ribosomes can disrupt protein synthesis.
Use databases like the Protein Data Bank (PDB) to find crystal structures of your selected proteins. Molecular docking predictions should be validated through in vitro and in vivo experiments. For in vitro evaluation you can use the microorganisms directly.
  • asked a question related to Bioinformatics
Question
3 answers
Hello. We understand that a volcano plot is a graphical representation of differential values (proteins or genes), and that it requires two parameters: fold change and p-value. However, in IP-MS (immunoprecipitation-mass spectrometry) data, many proteins are identified with an intensity in the IP (immunoprecipitation) group but are not detected at all in the IgG (control) group (the data are blank). This means we cannot calculate a p-value or fold change for these "present (IP) / absent (IgG)" proteins, and therefore cannot plot them on a volcano plot. Yet in many articles these proteins are successfully plotted on a volcano plot. How did the authors accomplish this? Are there data-fitting methods that help with the plotting? Is imputation needed? And if so, does the result still reflect the real degree of interaction?
Relevant answer
Answer
Albert Lee: the issue with doing this is that it makes the fold changes entirely arbitrary. Imagine I have a protein I detect in my test samples at "arbitrary value 10" but do not detect in my control samples at all.
If I call the ctrl value 0.5, then 0.5 vs 10.5 = 21-fold increase.
If I call the ctrl value 0.1, then 0.1 vs 10.1 = 101-fold increase.
If I call the ctrl value 0.0001, then 0.0001 vs 10.0001 = roughly 100,000-fold increase.
In reality, the increase is effectively "infinite fold", but what this is really highlighting is that fold changes are not an appropriate metric here.
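To make the point concrete, here is a minimal Python sketch of the same arithmetic. The intensity of 10 and the placeholder values are the illustrative numbers from this answer, and `fold_change` is a hypothetical helper, not part of any proteomics package.

```python
import math


def fold_change(test_value, placeholder):
    """Fold change when an undetected control is replaced by `placeholder`
    and the same placeholder is added to the test value."""
    return (test_value + placeholder) / placeholder


test_value = 10.0  # arbitrary intensity measured in the test (IP) samples

# The smaller the made-up control value, the larger the "fold change".
for placeholder in (0.5, 0.1, 0.0001):
    fc = fold_change(test_value, placeholder)
    print(f"placeholder={placeholder}: fold change={fc:.0f}, log2FC={math.log2(fc):.2f}")
```

The x-axis position on a volcano plot is then set entirely by the chosen placeholder, not by the measurement.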
A lot (most) of statistical analysis is predicated on the measurement of change in values, not "present/absent" scenarios.
For disease biomarkers, for example, something that is present/absent is of use as a diagnostic biomarker, but not as a monitoring biomarker: you can say "if you see this marker at all, you have the disease", but you cannot really use it to track therapeutic efficacy, because all values of this marker other than "N/A" are indicative of disease.
For monitoring biomarkers you really want "healthy" and "diseased" values such that you can track the shift from one to the other.
David Genisys: I agree with Jochen Wilhelm, and would not plot my data in this manner.
A lot will depend on the kind of reviewers you get, and the type of paper you're trying to produce, but it would be more appropriate to note that these markers are entirely absent in one group, and then to comment on the robustness of their detection in the other. You wouldn't necessarily run stats, because as noted, stats are horrible for yes/no markers, but you could use the combination of presence/absence and the actual level in the group where the marker is detected to make inferences as to biological effect. If a marker goes from "not detected" to "detected but barely", then it might be indicative of dysregulated, aberrant expression behaviour, or perhaps stochastic low-level damage. Interesting, but perhaps not of biological import or diagnostic utility. If instead it goes from "not detected" to "readily detected, at high levels", then it's probably very useful as a diagnostic biomarker, and also indicative of some active biological process, be it widespread damage/release, or active expression of novel targets.
In either case you can make biological inferences without resorting to making up numbers so you can stick them on a volcano plot (and to be honest, if you get the kind of reviewers that demand volcano plots, you can always use the trick Albert suggests).
Volcano plots are primarily a way to take BIG DATA and present it in a manner that allows you to highlight the most interesting targets that have changed between groups: if you have whole swathes of genes that are instead present/absent, then those could be presented as a table, perhaps sorted by GO terms or something (if it looks like there are shared ontological categories you could use to infer underlying biology).
  • asked a question related to Bioinformatics
Question
3 answers
Hi, I am working on protein-protein interaction studies, specifically on antibody-antigen interaction. I would like to observe the changes in interaction when a mutation occurs in the protein. Could anyone suggest a tool that can be used to introduce a substitution mutation at a targeted amino acid of a 3D protein structure, and tools to validate that the mutation is not a nonsense mutation that produces a truncated protein?
Relevant answer
Answer
Hey,
You need to consider a few things:
  1. Nonsense Mutations: Regarding your concern about nonsense mutations leading to truncated proteins, it's important to note that you don't need 3D modeling tools for this. Nonsense mutations, AKA stop-gain mutations, can be identified through basic sequence analysis since they involve a codon change that introduces a premature stop codon. Therefore, any sequence analysis tool that can read and interpret genetic codes can be used to identify if a mutation is a nonsense mutation.
  2. Mutation Induction: To induce substitution mutations at targeted amino acids in a 3D protein model, you can use software like UCSF Chimera (or ChimeraX). These tools allow you to manipulate amino acid residues.
  3. Protein Folding Prediction: If you're interested in how these mutations might affect protein folding, ChimeraX can integrate with AlphaFold. This integration can help predict how the altered amino acid sequence might fold. However, it's important to remember that structural predictions may not provide direct insights into the functional impact of the mutations. I'm not sure how informative this approach would be, but you can check out this video: https://www.youtube.com/watch?v=H-pDs9rZtkw
  4. Functional Analysis of Missense Mutations: For a more reliable approach to missense mutations, it's advisable to consult databases and tools that provide functional insights. As of 2023, a valuable resource for this is AlphaMissense. AlphaMissense is specifically designed to predict the functional impact of missense mutations, offering a more targeted approach to understanding whether these changes alter the function of the protein. The authors have probably already scored your mutations, and you can find the scores in the tables attached to the article.
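The sequence-level check from point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coding sequence is in frame, the standard genetic code applies (some organisms use other translation tables), and `introduces_premature_stop` and the example CDS are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def introduces_premature_stop(cds, position, new_base):
    """Return True if substituting `new_base` at 0-based `position` of an
    in-frame coding sequence turns a sense codon into a stop codon."""
    cds = cds.upper()
    codon_start = (position // 3) * 3
    old_codon = cds[codon_start:codon_start + 3]
    if len(old_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = position - codon_start
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    return old_codon not in STOP_CODONS and new_codon in STOP_CODONS


cds = "ATGCAAGGTTAA"  # hypothetical CDS: Met-Gln-Gly-stop
print(introduces_premature_stop(cds, 3, "T"))  # CAA -> TAA, a nonsense mutation
print(introduces_premature_stop(cds, 5, "G"))  # CAA -> CAG, still glutamine
```

If you only mutate residues directly in the 3D model (one amino acid swapped for another), a stop codon cannot arise; the check matters when you start from a nucleotide change.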
  • asked a question related to Bioinformatics
Question
3 answers
I want to find the UTR sequence of the mRNA for a bacterial protein. Can anyone suggest an in silico process for that?
Relevant answer
Answer
Hi Harshita
There are lots of possibilities, but the main ones are to go to the NCBI or UCSC database (for instance, just type "NCBI XXXX YYYY UTR region" into Google, where XXXX is your bacterium and YYYY your gene).
Or just give the species and target here on ResearchGate... maybe someone could answer ;)
all the best
fred
  • asked a question related to Bioinformatics
Question
3 answers
As of now, there is no public database available for this kind of sample to take as a control.
Relevant answer
Answer
To gain insights from your proteomic data in the context of pathways:
1. Protein-protein interaction networks: Construct protein-protein interaction (PPI) networks using available databases or tools. These networks represent the physical interactions between proteins and can provide insights into functional relationships and pathway associations. Analyze the network topology, identify highly connected proteins (hubs), and explore protein clusters or modules that may represent enriched pathways.
2. Functional enrichment analysis: Perform functional enrichment analysis using tools such as DAVID, Enrichr, or g:Profiler. These tools allow you to input a list of proteins and assess enrichment of Gene Ontology (GO) terms, biological pathways, or other functional annotations. This analysis can help identify overrepresented functions or pathways in your protein dataset.
3. Cross-referencing with gene-level data: If available, consider integrating your proteomic data with gene-level or transcriptomic data from the same samples or a related study. By mapping proteins to corresponding genes, you can leverage gene-level pathway analysis methods and identify pathways enriched with differentially expressed genes associated with the proteins of interest.
4. Literature-based analysis: Conduct a literature search to explore existing knowledge and studies related to the proteins identified in your proteomic dataset. Look for studies that have investigated the functions, interactions, or pathways associated with these proteins. This qualitative analysis can provide valuable insights into the potential involvement of specific pathways in your disease sample.
5. Pathway databases: Explore curated pathway databases such as Reactome, KEGG, or WikiPathways. These databases provide well-annotated pathways and can serve as a reference to investigate potential connections between your identified proteins and known pathways. Look for proteins within your dataset that are annotated to specific pathways of interest.
Remember that pathway analysis based solely on proteomic data has limitations.
Hope it helps (credit: AI)
  • asked a question related to Bioinformatics
Question
4 answers
I'm on the lookout for remote bioinformatics and computational biology opportunities where I can actively contribute to research projects. Compensation is not a priority for me; my main focus is to gain hands-on experience in these fields.
#biopython
#computational_biology
#bioinformatics
#biology
#R
Relevant answer
Answer
Avenues you can explore to find such opportunities:
1. Academic research institutions: Many universities and research institutions offer remote research positions or internships in bioinformatics and computational biology. Check their websites, job boards, and reach out to individual researchers or research groups who align with your interests.
2. Online job portals and platforms: Websites and platforms dedicated to remote work, such as LinkedIn, Indeed, and Upwork, often have listings for bioinformatics and computational biology projects. You can search for specific keywords like "remote bioinformatics," "computational biology," or "bioinformatics internships" to find relevant opportunities.
3. Open-source projects: Contributing to open-source bioinformatics projects can provide valuable hands-on experience. Explore bioinformatics software and libraries like Biopython, Bioconductor (for R), or other popular tools on platforms like GitHub. Contribute to their development, report issues, or collaborate with the community.
4. Online communities and forums: Engage with online communities and forums focused on bioinformatics and computational biology. These platforms, such as Bioinformatics Stack Exchange, BioStars, or community forums associated with specific software packages, often have job boards or project collaboration opportunities shared by researchers or organizations.
5. Networking: Attend virtual conferences, webinars, and workshops related to bioinformatics and computational biology. Connect with researchers, presenters, and fellow attendees to express your interest in remote research opportunities. Networking can often lead to potential collaborations or recommendations for available positions.
When searching for opportunities, it's important to tailor your search keywords to include relevant terms like "remote," "internship," "volunteer," or "project-based." Additionally, clearly communicate your enthusiasm, willingness to contribute, and desire for hands-on experience in your application materials or when reaching out to potential mentors or supervisors.
Hope it helps (credit: AI).
  • asked a question related to Bioinformatics
Question
2 answers
Hi,
I am a beginner in bioinformatics and want to learn how to analyse bacterial and fungal genomic data. Could you suggest some materials and sources so that I can develop my skills?
Note: My current interest is bacterial and fungal genome and proteome analysis using bioinformatics.
Relevant answer
Answer
There are a few R packages, such as vegan and phyloseq, which you can use for your initial analysis. Besides these, ANCOM-BC and QIIME 2 are also available; QIIME 2 is basically Linux-based.
  • asked a question related to Bioinformatics
Question
4 answers
Hello,
I am trying to construct a phylogenetic tree of HIV-1. I downloaded sequences from a few neighboring countries from the Los Alamos HIV database. After aligning and trimming, the sequences are usually 722 nucleotides long. I can't trim less, because there are a lot of gaps within the alignment file. When I construct a Maximum Likelihood tree in FastTree or PhyML, the branches look very short. What could be a possible reason for this?
Can sequences 722 nucleotides long be used to construct a reliable phylogenetic tree?
Thank you!
Relevant answer
You can also give it a try with the MEGA Platform.
  • asked a question related to Bioinformatics
Question
4 answers
Hi,
I am a beginner in bioinformatics and want to learn how to analyse bacterial and fungal genomic data. Could you suggest some materials and sources so that I can develop my skills?
Note: My current interest is bacterial and fungal genome and proteome analysis using bioinformatics.
Relevant answer
Answer
You can start by taking courses in bioinformatics from Coursera.
  • asked a question related to Bioinformatics
Question
5 answers
Hello fellow researchers,
I wanted to start a discussion on the exciting topic of the future of bioinformatics and its evolution. Bioinformatics has come a long way in recent years, but there are undoubtedly new frontiers to explore and challenges to overcome. What are your thoughts on the current trends, emerging technologies, and the potential impact of bioinformatics in the years to come? I'm eager to hear your insights and predictions on the future of this rapidly evolving field.
Relevant answer
Answer
I think the most relevant avenue to pursue is eliminating the term 'informatics', which constrains the field to a purely technical and ancillary role of 'software development', whereas what is needed is to develop a new 'biological statistical mechanics' that allows us to face the complexity of biological systems after the total failure of the deterministic 'gene-centric' era.
  • asked a question related to Bioinformatics
Question
4 answers
I am reaching out to #researchers in the fields of #Biochemistry, #Biophysics and #Bioinformatics for a collaborative partnership in scientific research. The researcher should be a member of academic staff at a tertiary institution in one of the following countries:
#Afghanistan
#Angola
#Bangladesh
#Belarus
#Belize
#Benin
#Bhutan
#Burkina Faso
#Burma
#Burundi
#CaboVerde
#Cambodia
#Cameroon
#CentralAfricanRepublic
#Chad
#Comoros
#Congo
#CookIslands
#Cuba
#Democratic People's Republic of Korea
#Democratic Republic of the Congo
#Djibouti
#Dominica
#EquatorialGuinea
#Eritrea
#Eswatini
#Ethiopia
#Gambia
#Ghana
#Grenada
#Guinea
#Guinea-Bissau
#Guyana
#Haiti
#Iran
#IvoryCoast
#Kenya
#Kiribati
#Kyrgyzstan
#Lao People's Democratic Republic
#Lebanon
#Lesotho
#Liberia
#Madagascar
#Malawi
#Maldives
#Mali
#Marshall Islands
#Mauritania
#Micronesia (Federated States of)
#Mozambique
#Myanmar
#Nauru
#Nepal
#Nicaragua
#Niger
#Niue
#Palau
#PapuaNewGuinea
#Moldova (Republic of)
#Rwanda
#SaintHelena
#SaintLucia
#SaintVincent and the #Grenadines
#Samoa
#SaoTome and #Principe
#Senegal
#Sierra Leone
#SolomonIslands
#Somalia
#SouthSudan
#Sudan
#Suriname
#Syrian Arab Republic
#Tajikistan
#Timor-Leste
#Togo
#Tokelau
#Tonga
#Tuvalu
#Uganda
#Ukraine
#Tanzania (United Republic of)
#Vanuatu
#Yemen
#Zambia
#Zimbabwe
Interested researchers should kindly email [email protected] with the subject: Research Collaboration from "your country".
Thanks.
Toluwase H. Fatoki
Visionary @ Heze-Sapience International, Nigeria.
Lecturer @ Department of Biochemistry, Federal University Oye-Ekiti, Nigeria.
Relevant answer
Answer
And why don’t you want any collaboration from Nigeria?
  • asked a question related to Bioinformatics
Question
2 answers
Our lab has a bioinformatics project on developing functional enrichment software. We have several ideas, but we realize we also need real feedback from wet lab researchers to make sure our functional enrichment web application will be reliable and useful for all of you.
Therefore, if you are a wet lab scientist with experience using functional enrichment software (such as Metascape, DAVID, etc.), what kinds of questions do you want to address with the functional enrichment results? Is there any information that these tools are still unable to give you?
Relevant answer
Answer
So, what is your experience when you use a functional enrichment analysis application? Do you think its results satisfy your expectations, or do you think it still has room for improvement?
  • asked a question related to Bioinformatics
Question
3 answers
I already know the pathway but want to identify the upstream lncRNAs that regulate it, using datasets and bioinformatics.
Relevant answer
Answer
I think you can do a correlation analysis of expression. For each gene within the pathway, calculate the correlation coefficient between its expression and the expression of every known lncRNA (a list can be obtained from LNCipedia, etc.). Applying an appropriate threshold then narrows down the list.
Alternatively, there is another method.
Since the pathway is known, get the gene list for that pathway and search for lncRNAs located near those genes, which may act as cis-regulatory elements. For trans-acting candidates, use tools that predict whether a lncRNA can interact with genes at the transcriptional or post-transcriptional level, such as LncTar (https://www.cuilab.cn/lnctar).
Hope that helps.
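The correlation approach from this answer could look roughly like the Python sketch below. Everything in it is illustrative: the expression values, the gene and lncRNA names, and the 0.8 cutoff are made up, and `pearson` and `candidate_lncrnas` are hypothetical helpers written with the standard library only. A real analysis would also need many more samples and a multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant values assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def candidate_lncrnas(lncrna_expr, gene_expr, threshold=0.8):
    """List (lncRNA, gene, r) pairs whose |r| reaches the threshold."""
    hits = []
    for lnc, lnc_values in lncrna_expr.items():
        for gene, gene_values in gene_expr.items():
            r = pearson(lnc_values, gene_values)
            if abs(r) >= threshold:
                hits.append((lnc, gene, round(r, 3)))
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# Hypothetical expression across five samples (same sample order everywhere).
lncrna_expr = {"lncA": [1, 2, 3, 4, 5], "lncB": [5, 1, 4, 2, 3]}
gene_expr = {"GENE1": [2, 4, 6, 8, 10], "GENE2": [9, 7, 6, 3, 1]}

for hit in candidate_lncrnas(lncrna_expr, gene_expr):
    print(hit)
```

Keep in mind that a strong correlation only nominates a candidate; it does not show that the lncRNA regulates the pathway.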
  • asked a question related to Bioinformatics
Question
1 answer
On their website they mention that its IF is 5.8, but I could not find it in the JIF 2022 report. Is that because it is included in the Emerging Sources Citation Index and not in the Science Citation Index Expanded? Please help.
Where can I get a valid IF? One more thing: this journal is not included in BioxBio; I have checked.
Relevant answer
Answer
Hi,
Check the ISSN of the journal in the Master Journal List (https://mjl.clarivate.com/home) or on the SCImago Journal Rank (SJR) site (https://www.scimagojr.com/journalrank.php) to see its IF and Q ranking.
If it is a valid journal, you should find its information on these sites.
BW
  • asked a question related to Bioinformatics
Question
1 answer
bioinformatics
Relevant answer
Answer
I guess it would depend on the context, but generally "frame" refers to a sequence that encodes a peptide/protein. "Query" is the user input (i.e. the sequences you enter), and "subject" refers to the reference sequence.
  • asked a question related to Bioinformatics
Question
6 answers
Could someone explain to me why the p-value in the right column of the forest plot is different than the p-value in the test for effect in the subgroup?
I thought that these two p.values should be the same.
Relevant answer
Answer
Coming to your table: the p-value in the right column of the forest plot is the p-value for the overall test of the treatment effect across all subgroups. It is calculated by combining the results of the individual studies in the meta-analysis. In this case, the p-value is 0.56, which is not statistically significant.
The p-value for the test for effect in the subgroup is the p-value for the test of the null hypothesis that the treatment effect in the subgroup is equal to zero. It is calculated using only the data from the studies in the subgroup. In this case, the p-value for the test for effect in the subgroup is 0.094035, which is statistically significant.
The two p-values are different because of the heterogeneity between the studies in the meta-analysis. The heterogeneity statistic (0.5) is very high, which indicates that there is a lot of variability in the treatment effects across studies. This variability could be due to a number of factors, such as different study designs, different populations of patients, and different treatment regimens.
When there is heterogeneity in the treatment effects across studies, it is more difficult to detect a significant overall treatment effect. This is because the variability in the treatment effects across studies can mask the true effect of the treatment.
In this case, the p-value for the overall test of the treatment effect is not statistically significant, but the p-value for the test for effect in the subgroup is statistically significant. This suggests that the treatment may be effective in the subgroup, but it is not possible to draw a definitive conclusion without further research.
It is important to note that a statistically significant p-value for the test for effect in a subgroup does not necessarily mean that the treatment is clinically effective in that subgroup. It is possible that the difference in the treatment effect is small or that it is not clinically meaningful.
To determine whether the treatment is clinically effective in a subgroup, it is important to consider the magnitude of the difference in the treatment effect and the clinical implications of that difference.
  • asked a question related to Bioinformatics
Question
4 answers
Hello, I've recently been studying Ancestral Sequence Reconstruction (ASR), attempting to infer ancestral sequences of viruses. I understand that this inference is constrained by factors like sample size and models, and represents a plausible sequence that may have existed. However, I'm curious about whether directly comparing these inferred ancestral sequences holds biological significance. Can they reflect the differences among the extant sequences from various lineages that were used to infer them?
Relevant answer
Answer
Hongzhuang Chen, I am afraid that you can lose a lot of information in such a comparison. But it can be applied (and be very useful) to illustrate differences that are statistically supported by analysis of the original data (sequences).
  • asked a question related to Bioinformatics
Question
3 answers
Dear All,
Ph.D. full-time position in Bangalore with fellowship:
Eligibility: M.Sc. Chemistry/Biochemistry/Biotechnology/Microbiology/Bioinformatics with first class of 60%.
Candidates should have qualified GATE, UGC-NET, UGC-CSIR, SLET, or JRF.
Rs. 25,000 per month will be given for the full three years.
For further details, contact me on +919182864256 (call or WhatsApp).
Relevant answer
I apologize and excuse the owner of the post. I would like to invite you to read my ebook and discover why microorganisms are so fantastic. https://www.amazon.com.br/dp/B0CF1VKKK8
  • asked a question related to Bioinformatics
Question
4 answers
I am trying to analyse mutation data for endometrial cancer obtained from different studies within several databases (COSMIC, cBioportal, Intogen). I have collated the data and grouped the mutations by gene. The focus of the analysis are non-synonymous coding mutations - because these mutations are most likely to cause a change in the normal protein function.
The aim of the study is to understand the mutational landscape of Endometrial cancer. The main objectives of the study are to find the commonly mutated genes in endometrial cancer, to find significantly damaging gene mutations in endometrial cancer and to create an updated list of genes comparable to commercial gene panels.
I have created this table with the collated data:
  1. Gene name
  2. Number of samples with coding mutations
  3. Frequency (number of samples with coding mutations / total number of samples with coding mutations)
  4. CDS length
  5. Total number of unique coding mutations
  6. Number of unique coding: synonymous mutations
  7. Number of unique coding: non-synonymous mutations
  8. Mutation burden (number of unique coding: non-synonymous mutations / CDS length)
  9. Composite score [(frequency of samples * 0.7) + (mutation burden * 0.3)]
The idea here is to use mutation burden to imply damaging effects of the genes' mutations in endometrial cancer. We then created a composite score to use as a comparable figure between the genes.
At the moment, our list of genes is at 16,000+. We are currently trying to think of a way to narrow down the list of genes to only focus on those significantly mutated compared to the other genes by way of statistics. Any advice is greatly appreciated.
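For readers following along, the three derived columns (frequency, mutation burden, composite score) can be written out as a small Python sketch. The 0.7 and 0.3 weights are the ones stated in the table description; the gene name and all counts are hypothetical, and `gene_metrics` is just an illustrative helper.

```python
def gene_metrics(samples_mutated, total_samples, nonsyn_unique, cds_length,
                 w_freq=0.7, w_burden=0.3):
    """Return (frequency, mutation burden, composite score) for one gene."""
    frequency = samples_mutated / total_samples   # column 3
    burden = nonsyn_unique / cds_length           # column 8
    composite = w_freq * frequency + w_burden * burden  # column 9
    return frequency, burden, composite


# Hypothetical gene: mutated in 300 of 1000 samples, 150 unique
# non-synonymous mutations, CDS length 1200 nt.
frequency, burden, composite = gene_metrics(300, 1000, 150, 1200)
print(f"GENE_A frequency={frequency:.3f} burden={burden:.3f} composite={composite:.4f}")
```

Since frequency and burden are measured on different scales, it may be worth checking how much each term actually contributes to the composite before ranking genes with it.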
Relevant answer
Answer
The significance of gene mutation burden in endometrial cancer data collated from different studies can be assessed using statistical methods such as Fisher’s exact test and logistic regression.
  • asked a question related to Bioinformatics
Question
2 answers
We sent some phytoplankton samples for sequencing and have just received the generated sequences. The next step is to run BLAST to identify which phytoplankton we sent: basically, DNA barcoding.
To give some context, when we send our samples for sequencing to the sequencing facility, they send us back two files, one for the forward sequence and another for the reverse sequence, based on the primers (forward and reverse) we gave.
So, the initial step involves us checking the quality of the sequences, specifically looking for any signs of low quality, ambiguity, or overlapping signals in the chromatogram.
Now, I'm a bit uncertain about the next steps.
The following step would be sequence trimming. To do this, I need to identify the start of each sequence by locating the primer sequence. This means finding the forward primer sequence in the generated forward sequence and doing the same for the reverse primer in the reverse sequence.
Afterward, I perform reverse complementation on the reverse sequence.
Following that, I conduct a pairwise alignment between the generated forward and reverse sequences and subsequently generate the consensus sequence.
My questions are, as I am a bit stumped with this (I apologize in advance, I'm a bit new with bioinformatics), (1) what if neither of the generated sequences have the primer sequences? Would that mean the sequences generated were of bad/low quality? and (2) Is this approach correct, or have I missed a crucial step?
Thank you!
Relevant answer
Answer
With some sequencing technologies up to the first 50 bases read tend to be unreliable so do not pass quality control. This means that often your primers are already cropped from the 5' end. I find it best to just align the forward and reverse sequences and see how much overlap you are getting.
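A quick way to see how much overlap you are getting, sketched in Python with the standard library only. The sequences are made up, and `reverse_complement` and `longest_overlap` are hypothetical helpers; for real reads you would normally use a proper aligner that tolerates mismatches, since this sketch only finds the longest exact shared stretch.

```python
from difflib import SequenceMatcher

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(forward, reverse):
    """Longest exact stretch shared by the forward read and the
    reverse-complemented reverse read, with its start in each."""
    rc = reverse_complement(reverse)
    matcher = SequenceMatcher(None, forward, rc, autojunk=False)
    m = matcher.find_longest_match(0, len(forward), 0, len(rc))
    return forward[m.a:m.a + m.size], m.a, m.b


# Hypothetical amplicon; the reads cover its two ends and share 10 bases.
amplicon = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
forward_read = amplicon[:25]
reverse_read = reverse_complement(amplicon[15:])

overlap, start_in_forward, start_in_rc = longest_overlap(forward_read, reverse_read)
print(overlap, len(overlap), start_in_forward, start_in_rc)
```

If the shared stretch is long and sits at the end of the forward read and the start of the reverse-complemented read, building a consensus is reasonable even when the primer sequences themselves have been trimmed away.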
  • asked a question related to Bioinformatics
Question
6 answers
I have extensively searched google scholar but I am struggling to find any groups who have previously used Rosetta to conduct ab-initio structure modelling of single-pass or membrane anchored proteins and I'm specifically not talking about homology modelling just ab-initio.
Please let me know if you have read any papers or know anyone who has done this,
thanks.
2nd year PhD student at University of Liverpool.
Relevant answer
Answer
Waqas Abbasi, hi. No, Rosetta-Membrane did not work well at all for this task, and I would not waste your time attempting it unless you have preliminary signs that it works for your protein (or your membrane-anchored protein has some other specific attributes that mean it may work better). I would suggest first thoroughly reading Anne Marie Honegger's post above and the link/paper they kindly provided there, though.
I would strongly suggest you instead try the new-generation methods RoseTTAFold and AlphaFold2, as they seem better able to position single/double TMHs away from the membrane-associated globular regions/domains of the chains, albeit still far from perfect. There are some signs that similar methods better suited to TMH-anchored membrane proteins may be released in the near-ish future, but it remains to be seen.
Best of luck,
David
  • asked a question related to Bioinformatics
Question
1 answer
Dear all,
I'm working on the finer details of my experimental design, and have some questions regarding bridging channels for TMT based experiments.
I have two conditions to test, across nine biological replicates, in order to run as one 18-plex TMT-pro experiment.
I am aware of the use of one or more bridging channels being used with pooled samples to combine multiple TMT mixtures, however a colleague has mentioned that a bridging channel should also be considered for normalisation if only one set is used.
Does anyone have any experience using a bridging channel for normalisation in a single mixture? Is it worth sacrificing one or more biological replicates for?
I will be using MSstatsTMT for normalisation and summarisation.
Sam
Relevant answer
Answer
As an update to this discussion, I have decided to reduce my sample size and incorporate a pooled reference channel. Mostly to open up the possibility of integrating additional samples and conditions in the future.
Sam
  • asked a question related to Bioinformatics
Question
1 answer
Hello there,
I'm searching for reliable bioinformatics/immunoinformatics tools for predicting the immunogenicity of B-cell epitopes. Your expertise is invaluable! Could you kindly recommend any tools that have proven effective in this area? Your insights will significantly contribute to advancing our understanding of immunogenicity prediction.
Thank you in advance for your suggestions!
  • asked a question related to Bioinformatics
Question
8 answers
Molecular dynamics simulation, bioinformatics, molecular docking
Relevant answer
Answer
RAM= 32 GB or higher
Processor= Intel core i7 or higher
High-end GPU instead of CPU
Linux OS
I would suggest using a workstation instead of a laptop.
  • asked a question related to Bioinformatics
Question
5 answers
Are you familiar with Research4Life? It's a program that provides free or low-cost access to scientific research in low-income countries. Research4Life has two eligibility lists: Group A and Group B. Group A includes countries with the lowest gross domestic product, lowest human development index, and other factors that indicate lower-income countries. As an immunoinformatics, Bioinformatics and Molecular Modelling researcher, I'm calling on researchers from Research4Life's Group A countries to join me in collaborative research efforts. By working together and utilizing the program's valuable resources, we can advance our research and make a difference in the world. Best of all, with this collaboration, it will be completely free. #Research4Life #immunoinformatics #bioinformatics #molecularmodelling #collaboration
Relevant answer
Answer
I’m interested
  • asked a question related to Bioinformatics
Question
4 answers
Hello everyone, I am new to R programming. I want to calculate the Firmicutes-to-Bacteroidetes ratio from my OTU table. I couldn't find the command and don't know how to do it. Please guide me on this.
I put an example of my OTU table.
Relevant answer
Answer
Thank you for this...
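For readers with the same question, here is a minimal sketch of the calculation: sum the counts of all OTUs assigned to each phylum per sample, then divide. It is written in Python rather than R, and the table layout, sample names and counts are invented for illustration; a real OTU table usually needs the phylum extracted from a taxonomy string first.

```python
# Hypothetical OTU table: one (phylum, counts-per-sample) entry per OTU.
otu_table = [
    ("Firmicutes",     {"S1": 120, "S2": 30}),
    ("Firmicutes",     {"S1": 80,  "S2": 10}),
    ("Bacteroidetes",  {"S1": 50,  "S2": 80}),
    ("Proteobacteria", {"S1": 25,  "S2": 5}),
]

def phylum_totals(table, phylum):
    """Sum counts per sample over all OTUs assigned to one phylum."""
    totals = {}
    for name, counts in table:
        if name == phylum:
            for sample, n in counts.items():
                totals[sample] = totals.get(sample, 0) + n
    return totals

def fb_ratio(table):
    """Firmicutes total divided by Bacteroidetes total, per sample."""
    firm = phylum_totals(table, "Firmicutes")
    bact = phylum_totals(table, "Bacteroidetes")
    ratios = {}
    for sample, f in firm.items():
        b = bact.get(sample, 0)
        # Undefined when a sample has no Bacteroidetes reads.
        ratios[sample] = f / b if b else float("nan")
    return ratios

ratios = fb_ratio(otu_table)
print(ratios)
```

The same group-then-divide logic applies in R once the table has a phylum column.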
  • asked a question related to Bioinformatics
Question
1 answer
Hello,
I measured the distance between two centers of mass during an MD run using gmx distance. Even though the -oall file shows that the distance changed over time, the histogram file (-oh) puts 100% of the probability in the last bin.
As this makes no sense, does anyone have an idea of what happened?
Both files are attached
Thank you very much in advance and have a nice day!
Relevant answer
Answer
Try adding the -len flag with the mean distance you expect, and the -binw flag to set a larger bin width so that you have fewer bins. It seems that gmx distance only creates a limited number of bins, and the last bin ends up with 100% probability if all the earlier bins are unfilled. In my case I had a distance of 5.1 nm, and I set the mean to 3 and the bin width to 0.1, like this: -len 3 -binw 0.1
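The behaviour described here can be imitated with a small Python sketch. It assumes that the histogram spans len × (1 ± tol) and that out-of-range values are clamped into the edge bins; this is my reading of the -len, -tol and -binw options, and the real binning in gmx distance may differ in detail. The distances are invented.

```python
def clamped_histogram(values, mean_len, tol, bin_width):
    """Histogram over [mean_len*(1-tol), mean_len*(1+tol)].

    Values outside the range are counted in the first or last bin,
    which is the assumed (not verified) behaviour being illustrated.
    """
    lo = (1.0 - tol) * mean_len
    hi = (1.0 + tol) * mean_len
    n_bins = max(1, round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / bin_width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    return lo, hi, counts

distances = [4.8, 5.0, 5.1, 5.3, 5.4]  # made-up distances in nm

# Assumed defaults (-len 0.1 -tol 1 -binw 0.001): the range is 0 to 0.2 nm.
_, _, default_hist = clamped_histogram(distances, 0.1, 1.0, 0.001)

# Settings from the answer (-len 3 -binw 0.1): the range is 0 to 6 nm.
_, _, wide_hist = clamped_histogram(distances, 3.0, 1.0, 0.1)

print(default_hist[-1], sum(default_hist[:-1]))
print(sum(1 for c in wide_hist if c))
```

With the narrow range every distance lands in the last bin, while the wider range spreads them over several bins.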
  • asked a question related to Bioinformatics
Question
10 answers
I have been trying to dock a certain protein with an Nd ion I downloaded from RCSB, but after I add it to PyRx and try to convert it to a ligand I get the following error. I tried converting the SDF file to PDB using PyMOL, ChimeraX, Avogadro and Open Babel, but even then, when I open the file, it gives me this error: ligand: :UNK0:Nd and ligand: :UNK0:Nd have the same coordinates. Could someone please help?
Update: I want to dock an unbound protein with the neodymium metal ion, which I downloaded from RCSB in SDF format and then tried to convert to PDB using the aforementioned software so that AutoDock would accept it, but I can't get AutoDock to accept it as a proper ligand. Apparently I am unable to get any of the rare earth elements accepted as ligands.
Relevant answer
Answer
Hello Piyush. I don't completely understand your problem. Did you download a protein with a bound "Nd" ion that you want to re-dock using PyRx? Or did you download the ion file separately and want to dock it with the unbound protein?
  • asked a question related to Bioinformatics
Question
4 answers
I am in urgent need of a list of bioinformatics journals without APCs.
Relevant answer
Answer
Thank you, Maryam.
I have used this finder a few times but was not aware of these settings. I'll use it soon. Thanks.
Is there any other option available for the same purpose?
  • asked a question related to Bioinformatics
Question
3 answers
I know many websites have simple tools like transcription and translation available, but are there any analysis tools that researchers need that either do not exist or are not publicly available? It could be anything from algorithms to visuals. Thanks!
Relevant answer
Answer
Abhijeet Singh Thank you for your response and mentioning my earlier post! My belief is that researchers would know tools that are missing based on the fact that they would run into such problem often during their research. If there is some manual analysis task that researchers can automate, I believe that PeptiCloud can be the perfect platform to develop and make those tools publicly available. (For instance, PeptiCloud has a unique feature that allows users to further alter codon sequence of each amino acid after codon optimization with respect to a specific bacterial strain). With that being said, if you could check out PeptiCloud for yourself and see if anything could be added or improved, that would be greatly appreciated!
  • asked a question related to Bioinformatics
Question
2 answers
Hello All,
I am very new to bioinformatics and biological data, so please bear with my question.
I have differential expression data for three parental cell lines (drug-sensitive) and 10 isoforms (made resistant to the drug) derived from these three parental lines.
Is this data enough to generate a coexpression network?
I have tried constructing one using GWENA, and was successful, but I am not confident about it for two reasons: first, the number of samples, and second, whether isoforms can be treated as samples or not.
I would really appreciate any suggestions and any reading resources that could be helpful in this regard.
Thank you
Relevant answer
Answer
Thank you so much, Susanta, for your reply.
Can you suggest a way of doing network analysis on this data, or any good resource to read on this?
  • asked a question related to Bioinformatics
Question
4 answers
In recent years, a number of vaccines have been approved to fight COVID-19; the list of approved vaccines is available on the FDA site. We are looking for the sequences of these vaccines (the RNA sequence in the case of mRNA vaccines and the amino acid sequence in the case of protein-based vaccines). I would highly appreciate the community's help in finding these sequences.
Relevant answer
Answer
  • asked a question related to Bioinformatics
Question
4 answers
Greetings,
I have recently isolated a new E. coli phage, and during the assessment of its host range I discovered that this particular phage was effective against Pseudomonas aeruginosa and Staphylococcus aureus in wet lab experiments. However, upon examining the complete genome of the phage on NCBI, I noticed that it did not exhibit any similarities with known P. aeruginosa and S. aureus phages. Additionally, when I performed a blastp analysis on all the phage proteins in NCBI, I could not identify any homology with the aforementioned P. aeruginosa and S. aureus phages. Normally, I would expect to observe some degree of homology, especially in proteins responsible for recognition, such as tail proteins or lytic proteins.
My question is how I can determine the wide host range of the phage based on its genome. I would expect bioinformatic tools to provide some information about the extent of a phage's host range. I would greatly appreciate your comments and recommendations on this matter.
Thank you.
Relevant answer
Answer
I don't think you can predict host range using bioinformatics tools. There are so many subtleties that impact host range, in terms of gene expression, chaperones that influence the folding of proteins, and, most importantly, the interactions with the host receptor for phage binding and injection. We don't yet know enough to predict it except in a very few well-studied examples.
  • asked a question related to Bioinformatics
Question
24 answers
Here is the list of 2023 impact factors.
Journal Citation Reports 2023
Relevant answer
Answer
This is not the complete list ... where are all the Human Resource Management journals, for example?
  • asked a question related to Bioinformatics
Question
4 answers
Has any of you ever done research in the field of bioinformatics?
Relevant answer
Answer
I am a bioinformatician, let me know what specific query you have. Bioinformatics has many subfields like genomics, proteomics etc.
  • asked a question related to Bioinformatics
Question
5 answers
I want to annotate each gene in the Homo sapiens taxon with its respective GO terms and its hierarchical parent terms in the GO database. How can I systematically do that? While I am aware that the obo file contains information such as "is a," "part of," and "regulates," it lacks a comprehensive hierarchy from child GO terms to all their parent terms. Is there an existing method available to achieve this systematic annotation, or do I need to develop a custom script to extract this information from the obo file?
Relevant answer
Answer
Mohammad Shahbaz Khan Certainly! Although the data is currently presented in Gene Ontology (GO) format, I want to create a comprehensive graph that visualizes the entire information. Further, I intend to annotate each gene with its corresponding GO term, including all parent terms associated with each gene.
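A minimal Python sketch of the propagation being asked about: given a child-to-parents map (for example, built from the is_a lines of the OBO file), collect every ancestor of each directly annotated term and attach the full set to the gene. The term identifiers, gene names and edges below are invented for illustration. Existing libraries such as goatools are designed for this kind of traversal and may be preferable to a custom script.

```python
# Toy child -> parents map; identifiers are hypothetical, not real GO terms.
parents = {
    "T:child": ["T:mid1", "T:mid2"],
    "T:mid1": ["T:root"],
    "T:mid2": ["T:root"],
    "T:root": [],
}

# Toy direct annotations: gene -> directly assigned terms.
gene_to_terms = {"GENE1": ["T:child"], "GENE2": ["T:mid2"]}

def ancestors(term, parents):
    """Return all terms reachable by repeatedly following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(gene_to_terms, parents):
    """Attach each gene's direct terms plus all of their ancestors."""
    annotated = {}
    for gene, terms in gene_to_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        annotated[gene] = full
    return annotated

annotated = propagate(gene_to_terms, parents)
for gene, terms in sorted(annotated.items()):
    print(gene, sorted(terms))
```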
  • asked a question related to Bioinformatics
Question
4 answers
I have been experimenting with machine learning in JavaScript, please, let me know also your experience! 😎🤗😍
In attachment a preprint!
Relevant answer
Answer
Feel free to add more details on your perspective!
  • asked a question related to Bioinformatics
Question
5 answers
Dear ResearchGate Community,
I am currently engaged in single-cell analysis for my research project and would greatly appreciate your insights and experiences regarding the use of Seurat and ScanPy.
I have been exploring both Seurat and ScanPy as tools for analyzing single-cell RNA sequencing (scRNA-seq) data. However, I would like to gather more information about these packages directly from researchers who have bioinformatic hands-on experience with them.
Specifically, I would be grateful if you could share your thoughts on the following:
1. Which package (Seurat or ScanPy) have you used for scRNA-seq analysis, and what were your primary reasons for choosing it? Does it depend on familiarity with programming languages (R for Seurat and Python for ScanPy)?
2. What are the notable features, strengths, or advantages of the packages you have worked with?
3. Were there any challenges or limitations you encountered while using the packages, and how did you address them?
4. Have you encountered any specific use cases or applications where one platform outperformed the other?
5. Are there any particular resources, tutorials, or best practices you found helpful when working with Seurat or ScanPy?
Your firsthand experiences and insights would be immensely valuable in helping me make an informed decision about which package to choose and understanding potential considerations for my single-cell analysis workflows.
Thank you in advance for taking the time to share your expertise. I look forward to hearing from you and benefiting from your valuable insights.
Best regards,
Emil Lagumdzic Institute of Immunology Department of Pathobiology
University of Veterinary Medicine Vienna
Relevant answer
Answer
Thank you, ChatGPT.
  • asked a question related to Bioinformatics
Question
4 answers
Is the hierarchical structure observed in the Gene Ontology (GO) OBO-basic file limited to the 'is a' relationship, or do the relationships 'has part' and 'regulates' also exhibit a similar hierarchical nature and can be propagated to the root?
Relevant answer
Answer
The hierarchical structure observed in the Gene Ontology (GO) OBO-basic file is primarily based on the "is a" relationship, which represents the parent-child relationship between terms. The "is a" relationship defines a broader term (parent term) and a more specific term (child term), indicating a hierarchical structure.
However, the GO also incorporates other types of relationships beyond "is a" to capture additional aspects of gene function. Two such relationships are "has part" and "regulates":
  1. "Has part" relationship: This relationship indicates that a term represents a part of another term. It describes a physical or functional subcomponent of a larger entity. While the "has part" relationship does not strictly follow a hierarchical structure, it provides additional information about the organization and composition of biological processes or structures.
  2. "Regulates" relationship: This relationship describes the regulatory interactions between terms. It indicates that a term controls or influences the activity or expression of another term. Similar to the "has part" relationship, the "regulates" relationship does not strictly conform to a hierarchical structure.
Although the "has part" and "regulates" relationships do not exhibit a hierarchical nature like the "is a" relationship, they can still be informative for understanding the functional relationships between terms. However, the propagation of these relationships to the root of the hierarchy is not as straightforward as it is with the "is a" relationship. The propagation of relationships to the root may require additional analysis and considerations based on the specific research context or requirements.
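Whether a relationship type is propagated is ultimately a choice made in the traversal code. The Python sketch below parses a tiny OBO-style snippet and follows only the relation types passed in, so the effect of including or excluding part_of and regulates is visible directly. The stanzas and identifiers are invented; only the line layout (id:, is_a:, relationship:) follows the OBO flat-file format.

```python
# Invented OBO-style text; X:000n identifiers are hypothetical.
OBO_TEXT = """\
[Term]
id: X:0001
name: root process

[Term]
id: X:0002
name: whole
is_a: X:0001 ! root process

[Term]
id: X:0003
name: part
relationship: part_of X:0002 ! whole

[Term]
id: X:0004
name: regulator
relationship: regulates X:0003 ! part
"""

def parse_edges(obo_text):
    """Map each term id to a list of (relation, target id) pairs."""
    edges, current, in_term = {}, None, False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_term, current = (line == "[Term]"), None
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):]
            edges.setdefault(current, [])
        elif current and line.startswith("is_a: "):
            target = line[len("is_a: "):].split("!")[0].strip()
            edges[current].append(("is_a", target))
        elif current and line.startswith("relationship: "):
            body = line[len("relationship: "):].split("!")[0]
            relation, target = body.split()[:2]
            edges[current].append((relation, target))
    return edges

def reachable(term, edges, allowed):
    """Terms reachable from `term` using only the allowed relation types."""
    seen, stack = set(), [term]
    while stack:
        node = stack.pop()
        for relation, target in edges.get(node, []):
            if relation in allowed and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

edges = parse_edges(OBO_TEXT)
print(sorted(reachable("X:0003", edges, {"is_a"})))
print(sorted(reachable("X:0003", edges, {"is_a", "part_of"})))
print(sorted(reachable("X:0004", edges, {"is_a", "part_of", "regulates"})))
```

Which relation set is appropriate depends on the analysis. A term that regulates a process is not itself an instance of that process, so treating regulates edges like is_a edges changes what an annotation means.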
  • asked a question related to Bioinformatics
Question
5 answers
I am looking for data from mammals ideally, but I will take anything to be honest. I am getting to grips with bioinformatics and need a practice data set with which I can go through the steps of filtering, trimming, mapping to a reference genome, etc.
If anyone also has any advice on tools used subsequently for analysis such as MethylKit that would be awesome.
Thank you
Relevant answer
Answer
Most bioinformatics tools come with sample data, which you can use for practice. For instance, test data can be found at https://github.com/FelixKrueger/Bismark.
  • asked a question related to Bioinformatics
Question
7 answers
I prefer to join 2 drug molecules (cocktail) using bioinformatics approach. Are there any tools available for it? Any software available where one can submit the individual structure of the drug molecules and receive the merged drug molecules?
Relevant answer
Answer
Susanta Roy Very many thanks for answering my query. I just have one more question. Is there any way to check for synergism between 2 ligands when docked with a protein, that is, multiple-ligand docking? Do you know of any protocols for that?
  • asked a question related to Bioinformatics
Question
4 answers
I have a protein sequence with two cysteine residues and I would like to predict whether those cysteines will form disulfide bonds.
I am looking for user-friendly tools to do this, either online tools or some other kind of easy to use software, since I am not well-versed in bioinformatics.
  • asked a question related to Bioinformatics
Question
7 answers
Please provide useful insights and the general experiments required for designing a lab manual.
Relevant answer
Answer
Sanjay Nagar Mohammad Shahbaz Khan Sabine Strehl Sanjay, with the addition of Mohammad's suggestions it looks to me that you now have information to make the best manual ever. - Add my "thank you" to Mohammed & Sabine.
  • asked a question related to Bioinformatics
Question
1 answer
Hi there,
I'm comparing the arrangement of a gene complex across different species to try to find clues about its evolutionary history. In some cases genes appear to have jumped around and switched positions, but I do not know if this is the result of recombination or due to the orientation in which the chromosome has been assembled.
I'm taking data from the NCBI genome browser using RefSeq chromosome-level assemblies in each case. Does anyone know if there is a standard direction that homologous chromosomes have to be uploaded in?
I imagine this is perfectly possible to do if you consider the positions of conserved genes at each end of the chromosome, but I would rather not have to do this myself if I know that it has already been accounted for...
Thanks,
Jake
Relevant answer
Answer
Yes, there is a standard orientation for chromosomes to be assembled in most genome sequencing and assembly projects. Chromosomes are typically assembled with a consistent orientation known as the "forward" or "+" orientation. In this orientation, the DNA sequence is aligned from the 5' end (start) to the 3' end (end) of the chromosome. This convention ensures consistency in the representation of genomic information across different studies and facilitates comparisons between different genomes.
The forward orientation is determined based on the directionality of DNA replication during genome sequencing and assembly processes. The DNA strands are typically sequenced in both directions, and the resulting reads are then aligned and assembled into contigs, which are further scaffolded to construct chromosome-level assemblies.
It's worth noting that in some cases, certain regions or genes within a chromosome may be inverted or have a reverse orientation due to specific biological features, such as gene rearrangements or evolutionary events. However, the majority of the chromosome is assembled and represented in the forward orientation to maintain consistency and standardization in genome research.
  • asked a question related to Bioinformatics
Question
6 answers
I have a sequence (genome.fasta) and I want to check the gene located at 400-500 nt.
What bash script should I use (I have WSL on Windows), or are there any conda packages for this?
Thank you in advance
Relevant answer
Answer
To extract a sequence from a larger genome file based on a specific location, you can use various command-line tools available in Bash. You can achieve this using the samtools and bedtools utilities, which can be installed via conda.
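On the command line, `samtools faidx genome.fasta NAME:400-500` prints that region, where NAME is the sequence name from the FASTA header. For readers who prefer a script, the same slice can be sketched in plain Python. The record name and toy sequence below are invented, and coordinates are treated as 1-based and inclusive. Note that this only returns the nucleotides; identifying which gene lies there still needs an annotation file or a similarity search.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record name: sequence}."""
    records, name, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def subsequence(seq, start, end):
    """Return positions start..end, 1-based and inclusive."""
    return seq[start - 1:end]

# Toy 600-nt record wrapped at 60 columns, standing in for genome.fasta.
toy_seq = "ACGT" * 150
fasta_lines = [">contig1 toy record"] + [
    toy_seq[i:i + 60] for i in range(0, len(toy_seq), 60)
]

records = read_fasta(fasta_lines)
region = subsequence(records["contig1"], 400, 500)
print(len(region), region[:8])
```

To read a real file, pass an open file handle instead of the list, for example `read_fasta(open("genome.fasta"))`.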
  • asked a question related to Bioinformatics
Question
3 answers
Is there any server or tool (bioconda, Java, etc.) that exclusively annotates membrane proteins (similar to dbCAN for polysaccharides) from a bacterial genome?
Thank you in advance!
  • asked a question related to Bioinformatics
Question
3 answers
Hi - I'm currently working with two RNA-Seq studies; one has RNA extracted from whole blood, the other PBMCs. Eventually we want to combine these data and perform some cell-specific deconvolution to look at DEGs.
Are there any recommended methods for batch correcting these data from different sources?
Mari
Relevant answer
Answer
It is better to consider batch as a factor in the design formula. The tximport pipeline proposed by Michael Love himself offers the most useful solution. Please have a look.
  • asked a question related to Bioinformatics
Question
3 answers
I am interested in predicting the protein structure of my protein of interest. Using NCBI BLAST, I found an experimental structure that corresponds to a domain of my protein, showing 24% query coverage and 100% similarity. My question is whether I can confidently use this experimental structure as a template for homology modeling, or if I should explore alternative techniques such as threading, ab initio modeling, or any other suitable approach. I would also appreciate recommendations for relevant servers or software that can assist in this case.
Thank you for your insights and suggestions.
Relevant answer
Answer
Quite honestly, if your protein isn't too large (i.e., doesn't have too many amino acids), I would just use AlphaFold or ESMFold and compare the best model with the resolved one by aligning on this region. I think the models (or variants of them) listed in the previous post all had lower performance in the last CASP competitions than AF did, although I haven't checked this ^^
RosettaFold would also be a good option.
Of course homology modeling can still work pretty well, but usually only if you have good templates and ideally many of them. But if you have regions that basically are missing in your templates and those are significant it usually doesn't really work that well.
  • asked a question related to Bioinformatics
Question
3 answers
I'm looking for an online bioinformatics course that awards a certificate.
Relevant answer
Answer
There are several online courses in bioinformatics that offer a certificate upon completion. Here are a few reputable platforms where you can find such courses:
1. Coursera (www.coursera.org): Coursera offers a wide range of bioinformatics courses from universities and institutions worldwide. Some popular courses include "Bioinformatics Specialization" by the University of California, San Diego, and "Genomic Data Science Specialization" by Johns Hopkins University. These courses provide certificates upon completion.
2. edX (www.edx.org): edX offers bioinformatics courses from renowned universities such as MIT, Harvard, and the University of Toronto. Courses like "Introduction to Bioinformatics" by UC San Diego and "Data Analysis for Life Sciences" by Harvard University provide certificates upon successful completion.
3. Udemy (www.udemy.com): Udemy has various bioinformatics courses taught by instructors from different backgrounds. While the quality may vary, you can find courses like "Bioinformatics: Introduction & Methods" and "Bioinformatics with Python" that provide completion certificates.
4. Bioinformatics.org (www.bioinformatics.org): Bioinformatics.org offers online courses and tutorials in bioinformatics. Though they may not provide certificates, their courses cover a wide range of bioinformatics topics and can be a valuable learning resource.
5. Rosalind (rosalind.info): Rosalind is an online platform that focuses on bioinformatics through problem-solving. While they don't offer certificates, their interactive exercises and challenges can enhance your bioinformatics skills.
Remember to review the course syllabus, duration, and prerequisites before enrolling to ensure it aligns with your needs and skill level. Additionally, consider checking if the certificate offered by the course is recognized or accredited by relevant institutions or organizations.
  • asked a question related to Bioinformatics
Question
4 answers
Greetings!
I have an issue that drives me crazy this evening...
I have a list of gene vectors, downregulated in different transgenic plants and I want to make a Venn diagram to visualize it and to show the intersections between plants.
But! The results from any package I used (in R) gave me something like this (the uploaded picture 1)...
What's bothering me:
1. The numbers on the "clear" (non-intersecting) parts of the diagram are lower than the gene sets I have. I tried using factors instead of character vectors, removing possible duplicates, and removing symbols (like spaces) that could confuse the software, but nothing helped... same result.
2. The intersection of vectors is not right: in the picture you can see that the intersection of 2 datasets (of 365 and 154 genes) is 1133 genes!! How could that be?
Manually using the intersect function on the same dataset gives correct results.
Maybe I am misunderstanding Venn diagrams? On the web I found many examples of such strange mistakes: in the second picture, from Datanovia, you can see that the intersection of the red ellipse (of 58) and the yellow one (of 144) is 66!
It seems logical to me that the intersection of 2 vectors cannot be greater than the length of the smaller vector. What am I doing wrong or misunderstanding?
Relevant answer
Answer
I believe Rob is correct.
Since you are using the intersect function, the numbers in your figure (e.g. 365 and 154) are the number of genes without any intersection.
The total number of genes in each set (e.g. OE21) is the sum of all the numbers in each of its intersections plus the genes with no intersection. I couldn't do the full sum for you as the core intersection number is missing.
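The point about region counts can be checked with a small Python sketch: each number printed on a Venn diagram counts the genes belonging to exactly that combination of sets, so a set's total is the sum over all regions that include it. The gene names and set names below are made up.

```python
from collections import Counter

# Hypothetical gene sets for three lines.
sets = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g4", "g5", "g6"},
    "C": {"g5", "g7"},
}

def venn_regions(sets):
    """Count genes per exclusive region (exact combination of sets)."""
    regions = Counter()
    for gene in set().union(*sets.values()):
        membership = frozenset(name for name, s in sets.items() if gene in s)
        regions[membership] += 1
    return regions

regions = venn_regions(sets)

# Number drawn in the "A only" area of the diagram.
only_a = regions[frozenset({"A"})]

# Total size of A: sum of every region that includes A.
total_a = sum(n for members, n in regions.items() if "A" in members)

# Full pairwise intersection, as returned by an intersect function.
a_and_b = len(sets["A"] & sets["B"])

print(only_a, total_a, a_and_b)
```

Here the "A only" region holds 3 genes although A has 5, and the full A-B intersection (2 genes) is split between the "A and B only" region and the three-way core.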
  • asked a question related to Bioinformatics
Question
6 answers
Hello everyone,
I am not good at R so I am trying to find solutions for my problems through the internet. I have been stuck on a problem. I couldn't find a way to compare the means of groups separated by facet function. Maybe I should not have put x axis as it is now but I wanna make sure. Here is the shorter version of my code for you to have a look at:
my_comparisons <- list( c("Hybrid","Single"))
ggplot(data = rpkms_new2, aes(x = strand, y = log2(RPKM), fill = strand, label = strand)) +
  geom_violin(scale = "count", alpha = 0.5) +
  facet_grid(~Trans, switch = "x", scales = "free_x", space = "free_x") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(panel.spacing = unit(0, "lines"),
        strip.background = element_blank(),
        strip.placement = "outside") +
  stat_compare_means(ref.group = "None", aes(label = ..p.signif..), method = "wilcox") +
  stat_compare_means(comparisons = my_comparisons, aes(label = ..p.signif..), method = "wilcox") +
  geom_text(data = mean_ranks, aes(x = strand, y = -Inf, label = round(rank, 0)), size = 3, vjust = -1)
How should I modify my code to be able to compare all the subgroups(single and hybrid) with the "None" group ?
My data looks like below:
STRAND TRANS VALUES:
sense hybrid 2
sense hybrid 2
sense single 3
sense single 7
antisense hybrid 10
antisense hybrid 12
antisense single 1
antisense single 2
none none 1
none none 4
Relevant answer
Answer
Thumbs up Tuba Sena Ogurlu
  • asked a question related to Bioinformatics
Question
3 answers
I am currently an Indonesian high school student passionate about bioinformatics and its potential to drive impactful innovations in the fields of biology and medicine. I am eager to participate in the Regeneron International Science and Engineering Fair and showcase a research project that can make a significant contribution to the scientific community.
Considering the vast possibilities within the realm of bioinformatics, I would greatly appreciate any suggestions, ideas, or insights for a research project that aligns with the following criteria:
  1. Impactful Innovation: I am looking for a research topic that has the potential to make a significant impact in the biology or medical world. It could involve the development of new algorithms, computational tools, or methodologies that address critical challenges in these domains.
  2. Bioinformatics Focus: The research should predominantly involve bioinformatics techniques, such as data analysis, data mining, machine learning, genomics, proteomics, or other computational approaches. It should leverage the power of data and computational tools to gain insights into biological processes or contribute to medical advancements.
  3. Feasibility for a High School Student: As a high school student, I have certain limitations in terms of resources, time, and expertise. Therefore, I am seeking research ideas that are feasible for a high school-level project. While the topic should be challenging enough to meet the standards of the Regeneron ISEF, it should also be manageable within the scope of a high school research project.
Thank you in advance for your valuable suggestions and insights.
Relevant answer
Answer
If you are passionate about bioinformatics and its application in the medical industry, there is a lot of research going on in molecular and functional genomics nowadays. You will certainly find a diverse range of research areas, from metagenomics to single-cell RNA sequencing. If you like, you can also try to develop a computational pipeline to analyze publicly available cancer genomics data, such as The Cancer Genome Atlas (TCGA) dataset. Focus on identifying potential biomarkers, genetic variants, or gene expression patterns associated with specific types of cancer, aiming to contribute to personalized medicine and targeted therapies. You should read about this and build a clear understanding first.
  • asked a question related to Bioinformatics
Question
3 answers
Hello everybody, I'm a master's degree student. I'm working with 16S data on some environmental samples. After all the cleaning, denoising, etc., I now have an object that stores my sequences, their taxonomic classification, and a table of ASV counts per sample linked to their taxonomic classification.
The question is, what should I do with the counts when assessing diversity metrics? Should I transform them before calculating the indexes, or should I transform them according to the index/distance I want to assess? Where can I find some resources on these and related problems to study?
I know that these questions may be very simple ones, but I'm lost.
As far as I know there is no consensus on how to transform the data, but I cannot leave the counts raw because of the compositional nature of the data.
Please help
Relevant answer
Answer
Assessing diversity metrics in 16S data is an important step in analyzing microbial communities. Handling count data in this context can be challenging due to the compositional nature of the data, as you mentioned. While there is no one-size-fits-all approach, there are several techniques and considerations you can explore. Here are some suggestions:
  1. Transformations for diversity metrics: The choice of transformation depends on the diversity metric you want to assess. Common transformations include rarefaction, normalization (e.g., by library size or cumulative sum scaling), or transformations that aim to address compositionality, such as log-ratio transformations (e.g., centered log-ratio, clr transformation) or Hellinger transformation. Different transformations may be more suitable for specific diversity metrics, so it's essential to consider the metric's assumptions and properties.
  2. Compositional data analysis (CoDA): Compositional data analysis provides a statistical framework to analyze and interpret compositional data. It accounts for the constrained nature of relative abundance data by working on transformed data. CoDA methods, such as ALDEx2 or ANCOM, can help identify differentially abundant features between groups while considering the compositional structure.
  3. Multivariate analyses: If you want to explore the overall community structure and relationships, multivariate techniques like principal component analysis (PCA), correspondence analysis (CA), or non-metric multidimensional scaling (NMDS) can be employed. It's advisable to perform these analyses on transformed data to mitigate the effects of compositionality.
  4. Research articles and resources: To delve deeper into the subject, you can refer to scientific articles and resources that discuss the statistical analysis of 16S data. Some useful references include: "Microbiome Analysis Methods" by Paul J. McMurdie and Susan Holmes. "A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses" by Egoitz Martínez-Costa et al. "Statistical analysis of microbiome data with R" by Yinglin Xia et al. "MicrobiomeSeq: An R package for analysis of microbial communities in an environmental context" by Paul McMurdie and Susan Holmes. These resources provide insights into various statistical approaches, transformations, and analysis techniques for 16S data.
Remember that there is ongoing research in the field, and best practices continue to evolve. It's important to critically evaluate the methods, consider the specific characteristics of your data, and consult with your advisor or peers with expertise in microbiome analysis to make informed decisions about data transformations and diversity metric assessment.
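To make the "transform according to the metric" idea concrete, here is a minimal Python sketch of two of the operations named above: a Shannon index computed from proportions, and a centered log-ratio (clr) transform of the kind used before compositional distance calculations. The counts are invented, and the pseudocount of 0.5 used to handle zeros is an arbitrary assumption rather than a recommendation.

```python
import math

def shannon(counts):
    """Shannon index (natural log) from raw counts of one sample."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each count minus the mean log.

    The pseudocount avoids log(0); its value is an assumption here.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

even_sample = [10, 10, 10, 10]    # made-up ASV counts
skewed_sample = [37, 2, 1, 0]

print(shannon(even_sample), shannon(skewed_sample))
print(clr(skewed_sample))
```

A perfectly even sample of four ASVs gives a Shannon index of ln 4, and clr values always sum to zero within a sample, which is why the transform is applied per sample before computing distances.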
  • asked a question related to Bioinformatics
Question
7 answers
I'm interested in studying specific missense mutations in a human gene. My goal is to determine whether the mutated region of the protein is conserved across various species. Could you please guide me on how I can use in silico tools to find homologous protein sequences and identify their conserved regions?
Thank you very much
Relevant answer
Answer
That's a good approach, Susanta Roy. I would add that once you are working with your multiple sequence alignment (MSA) in Jalview (https://www.jalview.org), you can load an experimental 3D protein structure or an AlphaFold model (all possible from Jalview, just right-click on a sequence label) and visualise the mutations and conservation scores on the structure too. Jalview makes this easy by colouring the structure by the sequence, so you can choose to colour by conservation and add features to represent your mutations, and they will instantly be viewable on the structure.
The other thing I would add is that in addition to BLASTing the full-length protein, have a look at it on InterPro and see what domains it has. Then you can work with curated MSAs from the individual domains too.
Great question Muhammad Abrar Yousaf !
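As a rough illustration of what a per-column conservation score measures, the Python sketch below takes a toy alignment and reports, for each column, the fraction of sequences sharing the most common residue. This is a deliberately simple score and not the one Jalview computes; the species labels and sequences are invented.

```python
from collections import Counter

# Toy aligned sequences of equal length ("-" marks a gap).
msa = {
    "human": "MKTAYIAK",
    "mouse": "MKTAYLAK",
    "fish":  "MRTA-IAK",
}

def column_conservation(seqs):
    """Fraction of sequences carrying the most common residue per column."""
    scores = []
    for column in zip(*seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Gaps count against conservation because we divide by all rows.
        scores.append(top_count / len(column))
    return scores

scores = column_conservation(list(msa.values()))

# Conservation at a mutated position, given as a 1-based alignment column.
mutated_column = 2
print(round(scores[mutated_column - 1], 2))
print([round(s, 2) for s in scores])
```

For a real gene, the mutated residue number has to be mapped to the alignment column first, since gaps shift the positions.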
  • asked a question related to Bioinformatics
Question
3 answers
Hi, I am a beginner in bioinformatics and I would like to identify CRISPRs in my MAGs fasta files. Can someone recommend an up-to-date good tool that can be easily installed through the Conda environment, please? Thank You in advance
Relevant answer
Answer
CRISPRCasFinder finds CRISPRs and cas genes in your MAG/FASTA files. You can upload them on this website: https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index. If your files are too big and the web server does not work, you may need to use the downloadable version instead: https://crisprcas.i2bc.paris-saclay.fr/Home/Download
Reference: David Couvin, Aude Bernheim, Claire Toffano-Nioche, Marie Touchon, Juraj Michalik, Bertrand Néron, Eduardo P C Rocha, Gilles Vergnaud, Daniel Gautheret, Christine Pourcel, CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W246–W251, https://doi.org/10.1093/nar/gky425
  • asked a question related to Bioinformatics
Question
2 answers
Dear Researchers,
If anyone is interested in reviewing a manuscript on multiepitope vaccine design, please provide the following details:
Note: Reviewers from India, Pakistan, Egypt & Saudi Arabia are not eligible for this manuscript.
First Name:
Last Name:
Degree:
Position:
Institution:
Department:
Institutional E-mail id:
Relevant answer
Answer
Hi, I am in
For what journal do you need reviewers?
  • asked a question related to Bioinformatics
Question
3 answers
Can an MD simulation be performed by adding other salts by varying their concentration inside the box?
Relevant answer
Answer
It is possible to add ions other than NaCl and MgCl2 to the solvent of the system in an MD simulation. However, it is important to consider the effect of the additional ions on the simulation results, and the appropriate ion concentration should be chosen based on the studied system.
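To turn a target salt concentration into a number of ions for the box, the arithmetic is N = c × V × N_A. The sketch below is a minimal Python version of that calculation. It assumes a rectangular box and uses the full box volume as the solvent volume, which overestimates slightly when the solute is large; the function names are hypothetical, and any counterions needed to neutralise the system come on top of these numbers.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L

def salt_units(molar, box_nm):
    """Formula units of salt for `molar` (mol/L) in a box with edges in nm."""
    x, y, z = box_nm
    volume_litres = x * y * z * LITRES_PER_NM3
    return round(molar * volume_litres * AVOGADRO)

def ion_counts(molar, box_nm, cations_per_unit=1, anions_per_unit=1):
    """Return (n_cations, n_anions) for a salt with the given stoichiometry."""
    units = salt_units(molar, box_nm)
    return units * cations_per_unit, units * anions_per_unit

box = (8.0, 8.0, 8.0)  # nm, an assumed example box

# 1:1 salt such as KCl at 0.15 M
kcl = ion_counts(0.15, box)

# 1:2 salt such as CaCl2 at 0.05 M
cacl2 = ion_counts(0.05, box, cations_per_unit=1, anions_per_unit=2)

print("KCl   (K+, Cl-):  ", kcl)
print("CaCl2 (Ca2+, Cl-):", cacl2)
```

Repeating the calculation for each concentration of interest gives the ion numbers to request from the ion-placement tool of your MD package. Check that your force field actually has parameters for the ion species you want to add.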
  • asked a question related to Bioinformatics
Question
3 answers
"The result shows absence of intragenomic variation among 16S rDNA gene and presence of variable regions among the 16S rDNA sequences (intergenomic variation), noticing for example high variability around 800, 900, and 1000 bp and a large conserved region between 1150 and 1350 bp. This information allowed us to discard the restriction enzymes FnuII, AsuI, FokI, Eco57I that recognized some restriction sites contained within variable regions, since they are more susceptible of acquiring future nucleotidic variations and with this, the potential generation of different band patterns." [1]
I add that the article mentioned that these discarded enzymes were targeting conserved sites in the study species.
[1]Mandakovic D, Glasner B, Maldonado J, Aravena P, González M, Cambiazo V, Pulgar R. Genomic-Based Restriction Enzyme Selection for Specific Detection of Piscirickettsia salmonis by 16S rDNA PCR-RFLP. Front Microbiol. 2016 May 9;7:643. doi: 10.3389/fmicb.2016.00643. PMID: 27242682; PMCID: PMC4860512.
Is my reading right that the article implies that there is such potential? If yes, what are the possible mechanisms?
More important, what's the time frame of this "future nucleotidic variation", is it an evolutionary time frame that could take thousands of years?
Edit: I think my question can be thought of as: How common are new 16S rRNA gene variants in bacterial species?
Relevant answer
Answer
Yes, your reading is correct. The article implies that there is potential for future nucleotide variations within the conserved restriction sites that are located in variable regions of the 16S rDNA gene.
The possible mechanisms for such variations are mutations, insertions, deletions, or recombinations, which can occur spontaneously or as a result of exposure to environmental factors, such as UV radiation, chemicals, or antibiotics. These changes can accumulate over time and result in differences in the sequence and/or length of the conserved restriction sites, leading to the generation of different band patterns upon restriction digestion.
The time frame for such variations can vary depending on the bacterial species, its population size, its growth rate, and the selective pressures it faces. Some bacterial species have high mutation rates and/or frequent horizontal gene transfer events, which can result in rapid evolution and diversification. Others have lower mutation rates and/or stable environments, which can lead to slower evolution and conservation of certain traits. However, even slow evolution can accumulate changes over time, and it is difficult to predict the exact time frame for future nucleotide variations within conserved restriction sites.
Regarding your edited question, the frequency of new 16S rRNA gene variants in bacterial species can also vary depending on the factors mentioned above. Some bacterial species have high genetic diversity and high rates of recombination and horizontal gene transfer, leading to frequent emergence of new variants. Others have low genetic diversity and low rates of recombination and horizontal gene transfer, resulting in slower emergence of new variants. However, the 16S rRNA gene is generally considered to be a stable and conserved marker for bacterial identification and classification, and many conserved regions within this gene are used as targets for PCR amplification and sequencing.
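To make the "different band patterns" point concrete, here is a minimal Python sketch of an in-silico digest. It shows how a single substitution inside a recognition site removes a cut and changes the fragment lengths. The amplicon is an invented toy sequence, and EcoRI (GAATTC, cutting after the first G) is used only because its site is simple; it is not one of the enzymes discussed in the article. `digest` is a hypothetical helper that scans one strand of a linear fragment.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting linear `seq` at every `site`.

    `cut_offset` is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Toy 68 bp amplicon with two EcoRI sites (invented sequence).
amplicon = "ACGT" * 5 + "GAATTC" + "TTGACC" * 4 + "GAATTC" + "CATG" * 3

# Same amplicon with one A>G substitution inside the second site
# (GAATTC becomes GAGTTC), as a point mutation might produce.
mutant = amplicon[:52] + "G" + amplicon[53:]

wild_type_bands = digest(amplicon, "GAATTC", 1)
mutant_bands = digest(mutant, "GAATTC", 1)

print("wild type:", wild_type_bands)
print("mutant:   ", mutant_bands)
```

A site that sits in a variable region of the gene is more exposed to this kind of change than one in a conserved region, which is the reasoning for preferring enzymes whose sites fall in conserved stretches.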
  • asked a question related to Bioinformatics
Question
3 answers
Dear friends and connections,
I believe in the power of community, so I am posting this.
I am excited to explore the possibility of collaborating with someone who works on network pharmacology. Network pharmacology is an interdisciplinary field that combines principles of network analysis, bioinformatics, and pharmacology to investigate drug-target interactions and predict the therapeutic effects of drugs.
I have some projects related to bioinformatics and I believe that our collaboration can result in significant progress in this exciting field.
I am looking forward to hearing from you and exploring our collaboration for network pharmacology.
Regards
Shopnil Akash
WhatsApp: +8801935567417
Relevant answer
Answer
Network pharmacology, a systematic analytical method, can analyze the interaction network of multiple factors such as drugs, protein target, diseases, and genes.
Regards,
Shafagat
  • asked a question related to Bioinformatics
Question
5 answers
I've recently been using the NCI's Cancer Genome Atlas to find datasets and perform basic clinical correlation analyses. I think it's a fantastic tool, even for people with a limited bioinformatics background, so it made me curious if there are similar resources for people who study non-cancer diseases.
I was wondering if people are aware of any other databases/repositories/webtools that serve a similar purpose for non-cancer diseases. If anyone has recommendations/suggestions, please comment/link them down below.
Thanks in advance for your input!
Relevant answer
Answer
There are several databases and repositories available that provide genomic data and tools for the study of non-cancer diseases. Here are a few examples:
  1. The Genetic Association Database (GAD) - GAD is a database that collects data from published studies investigating genetic associations with various diseases, including autoimmune disorders, cardiovascular diseases, and neurological disorders. The database includes information on single nucleotide polymorphisms (SNPs), genes, and diseases.
  2. The National Institute of Neurological Disorders and Stroke (NINDS) Repository - The NINDS Repository provides access to biospecimens and genetic data from patients with neurological disorders. Researchers can use this resource to investigate the genetic basis of neurological diseases and to develop potential treatments.
  3. The Online Mendelian Inheritance in Man (OMIM) - OMIM is a database that catalogs genes and genetic disorders. The database includes information on the genetic basis of various diseases, including cardiovascular diseases, neurological disorders, and rare genetic disorders.
  4. The Comparative Toxicogenomics Database (CTD) - The CTD provides information on how environmental chemicals can affect human health. The database includes information on the genes and proteins that are impacted by toxic substances and their associated diseases.
These are just a few examples of the many databases and repositories available for the study of non-cancer diseases. It is important to choose the appropriate resource based on your research question and to make sure that the data and tools are reliable and validated.
  • asked a question related to Bioinformatics
Question
4 answers
"Is there any in-silico methods for studying the effect of up-regulation and down-regulation of the same genes?"
If yes, please suggest the name of the method or a relevant article. Thank you.
Relevant answer
Answer
Luke V Schneider Thank You...
  • asked a question related to Bioinformatics
Question
4 answers
What bioinformatics tools are available to help analyze and interpret large-scale molecular data generated from crop research?
Relevant answer
Answer
I highly recommend you to focus on your education and understanding the basics and fundamentals and not to spam here by posting questions and answering yourself.
Further, since you don't have the proper education to understand what software can be used for what it is totally illogical to talk about bioinformatics tools.
  • asked a question related to Bioinformatics
Question
2 answers
We all know that nanobody development is time- and money-consuming; it nearly needs a grant of its own. I'm wondering if there is any bioinformatics tool or method to predict a nanobody sequence against a certain antigen using the antigen sequence as input. Something where you put in the antigen sequence and the tool predicts what a nanobody against this antigen could look like in terms of sequence, structure, etc.?
Relevant answer
Answer
Yes, there are several bioinformatics tools available to predict nanobody sequences based on antigen sequences. One such tool is called "AbDesign," which is a web-based server that predicts the sequence and structure of nanobodies based on the input antigen sequence.
AbDesign uses a computational algorithm to predict the amino acid sequence of nanobodies that can bind to the input antigen sequence. The algorithm takes into account the physicochemical properties of the antigen and the CDRs (complementarity-determining regions) of the nanobody.
Other bioinformatics tools that can be used for nanobody sequence prediction include "Nanobody Mapper," "VHHDB," and "Nanobodies.org." These tools use a variety of algorithms and techniques to predict nanobody sequences, and some also provide additional features, such as database searches and visualization tools.
It's important to note that while bioinformatics tools can be useful for predicting nanobody sequences, experimental validation is still necessary to confirm the predicted sequence and determine its binding properties.
  • asked a question related to Bioinformatics
Question
3 answers
Hi, I would like to ask if anybody has had positive experiences with single-primer PCR. Can you recommend any proven protocol for this type of PCR? Thank you for all recommendations. Bohuš
Relevant answer
Answer
Hi, it works easily for the selection of mismatches (SNPs). Coupling fluorescent dyes to such primers can convert the PCR into real-time (RT) PCR.
  • asked a question related to Bioinformatics
Question
4 answers
I am running an MD simulation on a protein-protein complex.
After seeing a similar question on ResearchGate, I checked the amino acids rtp file in my force field folder, and as expected from this error, the HD1 atom was not present in the HSE entry. The atom HD2 is, however, present in that entry. So I figured replacing the HD1 atoms in my PDB file with HD2 should solve the error.
And it did. For the time being.
To reaffirm, I made changes in Histidine's hydrogen atoms in the PDB file. When I went ahead with the energy minimization step, I got an error that said there's an Infinite Force on an atom. It turns out that the atom was "HD2" of some Histidine in the PDB file.
I saw online that the reason behind this error was due to atom overlap. Hence, just for seeing if that was the case for me, I changed the coordinates of that atom a little bit (this was just for checking, I can't do this for the actual work). When I ran the EM step again, I got the same error, but for a HD2 of a different Histidine molecule. So yes, overlapping of the atoms is the reason for this particular error. I cannot solve it by changing coordinates of all the HD2 atoms of the Histidines. So it all boils down to the main fatal error that I mentioned.
How do I approach this?
1. Changing the atom name (as in HD1 -> HD2 is not working due to the subsequent error)
2. I do not know if I should add the atom HD1 in the HSE entry in the rtp file (I tried this and got several warnings).
3. I cannot (or should I?) use -ignh because mine is not an NMR structure. I have modelled my proteins on Modeller and refined them online.
Any suggestions/solutions will help me a lot. Thank you in advance!
Relevant answer
Answer
Hi, a crude measure is to use -ignh during pdb2gmx; it will ignore the hydrogens in your input file and rebuild all the H-atoms based on the force field you are using. Most of the time this is a reasonable choice (though not always), as H-atoms are mostly absent from crystallographic structures anyway (it is difficult to resolve H-atom positions).
Histidine is unique in the sense that its side chain can form multiple H-bonds at physiological pH. The better procedure is to check which heavy atom of the His side chain is forming an H-bond in the protein (delta or epsilon) and rename your His residues accordingly (e.g. HIS, HIE, HID; please check how these residues are named in your FF).
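Before renaming anything, it can help to list which ring hydrogens each histidine in the PDB file actually carries. The following is a minimal Python sketch that does this by reading the fixed-column ATOM records. The mapping to HSD/HSE/HSP follows the usual CHARMM-style names and is an assumption to verify against your own rtp file (AMBER-style force fields use HID/HIE/HIP); `histidine_states` and `atom_line` are hypothetical helper names, and the example records are made up.

```python
RING_H = {"HD1", "HE2"}  # hydrogens on the delta and epsilon ring nitrogens
HIS_NAMES = {"HIS", "HID", "HIE", "HIP", "HSD", "HSE", "HSP"}

def histidine_states(pdb_lines):
    """Map (chain, residue number) -> 'delta', 'epsilon', 'both' or 'none'."""
    seen = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20] not in HIS_NAMES:
            continue
        key = (line[21], int(line[22:26]))
        hydrogens = seen.setdefault(key, set())
        atom = line[12:16].strip()
        if atom in RING_H:
            hydrogens.add(atom)
    labels = {}
    for key, hydrogens in seen.items():
        if hydrogens == RING_H:
            labels[key] = "both"
        elif hydrogens == {"HD1"}:
            labels[key] = "delta"
        elif hydrogens == {"HE2"}:
            labels[key] = "epsilon"
        else:
            labels[key] = "none"
    return labels

# Assumed CHARMM-style residue names; check your force field's rtp file.
CHARMM_NAMES = {"delta": "HSD", "epsilon": "HSE", "both": "HSP"}

def atom_line(name, resname, chain, resseq):
    # Hypothetical helper: a fixed-column ATOM record with dummy coordinates.
    return (f"ATOM  {1:5d} {name:<4s} {resname:3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

example = [
    atom_line("ND1", "HSE", "A", 10), atom_line("HD1", "HSE", "A", 10),
    atom_line("NE2", "HSE", "A", 25), atom_line("HE2", "HSE", "A", 25),
    atom_line("HD1", "HIS", "A", 40), atom_line("HE2", "HIS", "A", 40),
]
states = histidine_states(example)
for (chain, resseq), state in sorted(states.items()):
    print(chain, resseq, state, "->", CHARMM_NAMES.get(state, "rebuild hydrogens"))
```

In this toy input, residue 10 is labelled HSE but carries HD1, which is the kind of mismatch that makes pdb2gmx complain about an atom missing from the rtp entry. Keep in mind that hydrogens placed by a modelling program may be arbitrary, so the labels describe the file, not necessarily the chemically best protonation state.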
"2. I do not know if I should add the atom HD1 in the HSE entry in the rtp file (I tried this and got several warnings)." > Try not to mess with rtp entry at this stage (as you are very new), and if you like to play around, just make a backup of FF directory and do as you like.
  • asked a question related to Bioinformatics
Question
2 answers
I've been trying to learn more about bioinformatics pipelines for whole-genome shotgun sequencing data, to study the diversity of microbes in animal fecal samples and to identify pathogenic microorganisms (both DNA and RNA).
Relevant answer
Answer
Dear Dr Abhijeet Singh, thank you so much.
  • asked a question related to Bioinformatics
Question
3 answers
I have tried to separate a direct coculture of MSCs (mesenchymal stromal cells) and macrophages to do bulk RNA-seq on the macrophages, as I want to find out how MSCs change gene expression in macrophages. I have tried different methods to separate the coculture as much as possible, but I can only manage to retrieve a cell population with 95% macrophages, with 5% MSCs still present.
Therefore, I want to know if anyone has experience with analyzing data when the population is not completely pure with one cell type and how do I handle such data?
Is it wise to proceed with bulk RNA seq when 5% of my cells are still MSCs, well aware that the expressed genes observed could come from the 5% MSCs?
Relevant answer
Answer
Dear Kian,
Have you tried improving your purity by FACS? It's fairly easy to choose markers to distinguish MSCs and macrophages and to sort highly pure populations.
  • asked a question related to Bioinformatics
Question
8 answers
Risk of bias assessment (sometimes called "quality assessment" or "critical appraisal") helps to establish the transparency of evidence synthesis results and findings, and it is mandatory to include it in your systematic review!
If you know of any tools, or have used any, can you please share them with me?
Or, if you have additional information regarding risk of bias assessments, can you share it with me?
Relevant answer
Answer
Systematic reviews and meta-analyses are proliferating, as they are an important building block to inform evidence-based guidelines and decision-making. Enforcement of best practice in clinical trials is firmly on the research agenda of good clinical practice, but there is less clarity as to how evidence syntheses that combine these studies can be influenced by bad practice. Our aim was to conduct a living systematic review of articles that highlight flaws in published systematic reviews to formally document and understand these problems...
Many hundreds of articles highlight that there are many flaws in the conduct, methods and reporting of published systematic reviews, despite the existence and frequent application of guidelines. Considering the pivotal role that systematic reviews have in medical decision-making due to having apparently transparent, objective and replicable processes, a failure to appreciate and regulate problems with these highly cited research designs is a threat to credible science...