
Bioinformatics and Computational Biology - Science topic

Questions related to Bioinformatics and Computational Biology
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I'm using AutoDock Vina in Python to dock multiple proteins and ligands, but I'm having trouble setting the docking parameters for each protein. How can I do this in Python? (I have attached my Python code; in it, I assumed the same parameters for all proteins.)
Relevant answer
Answer
With the above code, the grid box is 20x20x20 regardless of protein size. At the end of the Vina run, most complexes show a binding affinity of "0" or a much weaker value, because the active site lies outside the grid box. It is better to increase the grid box size (SIZE_X, SIZE_Y, SIZE_Z) to 60 or even 120 each, depending on the size of the largest protein (chains in the PDB entry) in each complex, and run Vina again. You may then get binding energies for most of the protein-ligand complexes (sometimes for all).
However, this will not reproduce the experimental structures correctly, since you are docking many protein-ligand complexes (each separately) with one common configuration file at the same time.
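A minimal Python sketch of giving each protein its own grid box instead of one shared 20x20x20 box. It assumes each receptor is available as PDB/PDBQT text and that you pass in the atoms the box should enclose: the whole receptor for blind docking, or only the known binding-site residues. The helper names (`atom_coords`, `grid_box`, `vina_config`) and the 5 Å padding are illustrative choices, not part of Vina itself; the output uses Vina's standard configuration keys.

```python
def atom_coords(pdb_text):
    """Return (x, y, z) tuples for ATOM/HETATM records (fixed PDB columns 31-54)."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def grid_box(coords, padding=5.0):
    """Center a box on the given atoms and extend it by `padding` Å on every side."""
    if not coords:
        raise ValueError("no atoms found to build a grid box")
    axes = list(zip(*coords))  # [(all x), (all y), (all z)]
    center = tuple((min(a) + max(a)) / 2 for a in axes)
    size = tuple(max(a) - min(a) + 2 * padding for a in axes)
    return center, size


def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Build the text of a Vina configuration file for one receptor/ligand pair."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {c:.3f}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s:.3f}" for axis, s in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"
```

For each protein, read its file, call `grid_box` on its atoms (or only its binding-site atoms), write the `vina_config(...)` text to a per-protein file, and run `vina --config` on that file. Restricting the box to the binding site generally keeps the search more thorough for a given exhaustiveness than a box around the whole protein.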
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I'm on the lookout for remote bioinformatics and computational biology opportunities where I can actively contribute to research projects. Compensation is not a priority for me; my main focus is to gain hands-on experience in these fields.
#biopython
#computational_biology
#bioinformatics
#biology
#R
Relevant answer
Answer
Avenues you can explore to find such opportunities:
1. Academic research institutions: Many universities and research institutions offer remote research positions or internships in bioinformatics and computational biology. Check their websites, job boards, and reach out to individual researchers or research groups who align with your interests.
2. Online job portals and platforms: Websites and platforms dedicated to remote work, such as LinkedIn, Indeed, and Upwork, often have listings for bioinformatics and computational biology projects. You can search for specific keywords like "remote bioinformatics," "computational biology," or "bioinformatics internships" to find relevant opportunities.
3. Open-source projects: Contributing to open-source bioinformatics projects can provide valuable hands-on experience. Explore bioinformatics software and libraries like Biopython, Bioconductor (for R), or other popular tools on platforms like GitHub. Contribute to their development, report issues, or collaborate with the community.
4. Online communities and forums: Engage with online communities and forums focused on bioinformatics and computational biology. These platforms, such as Bioinformatics Stack Exchange, BioStars, or community forums associated with specific software packages, often have job boards or project collaboration opportunities shared by researchers or organizations.
5. Networking: Attend virtual conferences, webinars, and workshops related to bioinformatics and computational biology. Connect with researchers, presenters, and fellow attendees to express your interest in remote research opportunities. Networking can often lead to potential collaborations or recommendations for available positions.
When searching for opportunities, it's important to tailor your search keywords to include relevant terms like "remote," "internship," "volunteer," or "project-based." Additionally, clearly communicate your enthusiasm, willingness to contribute, and desire for hands-on experience in your application materials or when reaching out to potential mentors or supervisors.
Hope it helps. (Credit: AI.)
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
Relevant answer
Answer
Dear Yahaya,
Thank you for your insightful suggestion regarding the active site prediction tool from the Supercomputing Facility for Bioinformatics & Computational Biology at IIT Delhi. I appreciate your thoroughness in exploring the potential utility of this tool.
I'd like to draw attention to the paper by Tanya Singh, D. Biswas, and B. Jayaram titled "AADS - An automated active site identification, docking and scoring protocol for protein targets based on physico-chemical descriptors." The fact that this paper has been cited by 181 researchers highlights its significance in the field. While I haven't personally used the server yet, the suggestion to use it to gain a preliminary understanding of the active sites of proteins of interest is intriguing.
Cross-checking the tool's predictions against the known active sites of at least 10 proteins from the same organism is a thoughtful approach. This comparative validation would provide valuable insight into the reliability and accuracy of the tool's predictions. If the results align well with established knowledge, that would lend substantial credibility to the tool.
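The cross-checking idea above can be quantified per protein by comparing the predicted pocket residues with the known active-site residues. This Python sketch is only an illustration: it assumes both are available as sets of residue identifiers (such as "A:HIS57"), and the protein names and residues in the example are hypothetical.

```python
def site_agreement(predicted, known):
    """Precision, recall and Jaccard overlap between predicted and known residue sets."""
    predicted, known = set(predicted), set(known)
    overlap = predicted & known
    precision = len(overlap) / len(predicted) if predicted else 0.0
    recall = len(overlap) / len(known) if known else 0.0
    union = predicted | known
    jaccard = len(overlap) / len(union) if union else 0.0
    return {"precision": precision, "recall": recall, "jaccard": jaccard}


def summarize(benchmark):
    """Average each metric over a {protein: (predicted, known)} mapping."""
    scores = [site_agreement(p, k) for p, k in benchmark.values()]
    return {metric: sum(s[metric] for s in scores) / len(scores) for metric in scores[0]}


# Hypothetical example with two proteins (a real check would use at least 10).
benchmark = {
    "protein_1": ({"A:HIS57", "A:ASP102", "A:SER195", "A:GLY193"},
                  {"A:HIS57", "A:ASP102", "A:SER195"}),
    "protein_2": ({"A:CYS25", "A:TRP177"},
                  {"A:CYS25", "A:HIS159"}),
}
print(summarize(benchmark))
```

Recall shows how many true active-site residues the tool finds, while precision shows how much of its predicted pocket is correct. Reporting both avoids rewarding a tool that simply predicts very large pockets.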
Considering the extensive citations of the paper and the potential of the tool, I believe that exploring its feasibility is a worthwhile endeavor. The ability to gain a preliminary idea about the active site of proteins of interest could greatly inform and streamline any research efforts.
Additionally, there are several other active site prediction tools available that you might find useful for your research. Here are a few notable options:
CASTp (Computed Atlas of Surface Topography of Proteins), SiteMap, LIGSITE, DoGSiteScorer, PockDrug-Server, etc.
Hope this answer will be helpful for you.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I know many websites have simple tools like transcription and translation available, but are there any analysis tools that researchers need that either do not exist or are not publicly available? It could be anything from algorithms to visuals. Thanks!
Relevant answer
Answer
Abhijeet Singh Thank you for your response and for mentioning my earlier post! I believe researchers know which tools are missing, because they run into such problems often during their research. If there is a manual analysis task that researchers could automate, I believe PeptiCloud can be the perfect platform to develop those tools and make them publicly available. (For instance, PeptiCloud has a unique feature that lets users further alter the codon for each amino acid after codon optimization for a specific bacterial strain.) With that said, if you could check out PeptiCloud for yourself and see whether anything could be added or improved, that would be greatly appreciated!
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
How can I dock more than one protein with more than one ligand? I know that PyRx can dock one protein with multiple ligands, but how can I do it for multiple proteins with multiple ligands?
Relevant answer
Answer
For batch docking, I have used the vLifeMDS tool.
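Outside PyRx, one option for multiple proteins and multiple ligands is to script AutoDock Vina's command-line program over every protein-ligand combination. This is a minimal Python sketch. It assumes one Vina configuration file per receptor that holds only that receptor's grid box; the file names are hypothetical. The sketch only builds the commands, and the commented-out `subprocess.run` line would execute them.

```python
import itertools
import subprocess  # only needed if you actually launch the jobs
from pathlib import Path


def vina_jobs(receptor_configs, ligands, out_dir="results"):
    """Build one Vina command per (receptor, ligand) pair.

    receptor_configs maps a receptor .pdbqt path to the config file holding
    that receptor's own grid box (center/size), so boxes can differ per protein.
    """
    jobs = []
    for (receptor, config), ligand in itertools.product(receptor_configs.items(), ligands):
        out = Path(out_dir) / f"{Path(receptor).stem}__{Path(ligand).stem}.pdbqt"
        jobs.append(["vina", "--config", str(config), "--receptor", str(receptor),
                     "--ligand", str(ligand), "--out", str(out)])
    return jobs


if __name__ == "__main__":
    configs = {"prot1.pdbqt": "prot1_box.txt", "prot2.pdbqt": "prot2_box.txt"}
    ligands = ["lig1.pdbqt", "lig2.pdbqt", "lig3.pdbqt"]
    for cmd in vina_jobs(configs, ligands):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # requires vina on PATH and an existing results/ folder
```

Two receptors and three ligands give six jobs. Each output file is named after its receptor and ligand, so results are easy to collect afterwards.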
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Is there any server or tool (Bioconda package, Java program, etc.) that annotates only membrane proteins in a bacterial genome (similar to dbCAN for polysaccharides)?
Thank you in advance!
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I am attempting to use the Seurat FindAllMarkers function to validate markers for rice taken from the plantsSCRNA-db. I want to use the ROC test in order to get a good idea of how effective any of the markers are. While doing a bit of research, different stats forums say: "If we must label certain scores as good or bad, we can reference the following rule of thumb from Hosmer and Lemeshow in Applied Logistic Regression (p. 177):
0.5 = No discrimination
0.5-0.7 = Poor discrimination
0.7-0.8 = Acceptable discrimination
0.8-0.9 = Excellent discrimination
>0.9 = Outstanding discrimination"
For more background, the function returns a data frame with a row for each gene, showing myAUC (the area under the receiver operating characteristic curve) and power (the absolute value of myAUC - 0.5, multiplied by 2). Other statistics are included as well, such as the average log2FC and the percentage of cells expressing the gene in one cluster versus all other clusters.
With this said, I would assume a myAUC of 0.7 or above implies that the marker is effective. However, given the formula used to calculate power, a myAUC of 0.7 corresponds to a power of 0.4. Would it then be fair to assume that myAUC should be ignored for the purpose of validating markers? Or should both values be taken into account somehow?
Relevant answer
Answer
In the Seurat R package for analyzing single-cell RNA-seq data, "power" and "myAUC" are both reported by the ROC test (FindMarkers or FindAllMarkers with test.use = "roc"). They are not separate feature-selection functions: both come from the same per-gene ROC analysis and differ only in how the result is expressed.
  1. Power: "power" is a column in the output of the ROC test, not a separate Seurat function. Seurat defines it as the predictive power abs(myAUC - 0.5) * 2 and ranks the returned genes by it. It rescales the AUC to a 0-1 scale regardless of direction. A gene that perfectly separates the groups scores 1 whether it is higher (AUC = 1) or lower (AUC = 0) in the group of interest, and a gene with no discriminatory value (AUC = 0.5) scores 0. It has nothing to do with highly variable genes, which are selected separately with FindVariableFeatures.
  2. myAUC: also an output column of the ROC test. For each gene, Seurat evaluates a classifier built on that gene's expression alone to distinguish the cells in one group from those in the other. myAUC is the area under the resulting receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across expression thresholds. An AUC of 1 means the gene alone classifies the two groups perfectly, 0.5 means it has no predictive value, and 0 also means perfect classification, but in the opposite direction.
In summary, myAUC records both how well a gene discriminates between the groups and in which direction, while power keeps only the strength of discrimination. Because power is a fixed transformation of myAUC, the two are not competing criteria. For a marker that is higher in the cluster of interest, a myAUC cut-off of 0.7 is exactly the same filter as a power cut-off of 0.4. Rule-of-thumb AUC thresholds should therefore be applied to myAUC (or translated to the power scale) rather than read directly off power.
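This small Python sketch shows why myAUC and power carry the same strength information. It computes a ROC AUC for one gene as the probability that a randomly chosen cell from the group of interest has higher expression than a randomly chosen other cell, with ties counted as half; this is the standard Mann-Whitney interpretation of the AUC. Power then uses the abs(AUC - 0.5) * 2 definition quoted in the question. This is not Seurat's own code, and the expression values are made up.

```python
def roc_auc(group, rest):
    """AUC = P(expression in `group` > expression in `rest`), ties counted as 0.5."""
    wins = 0.0
    for g in group:
        for r in rest:
            if g > r:
                wins += 1.0
            elif g == r:
                wins += 0.5
    return wins / (len(group) * len(rest))


def predictive_power(auc):
    """Predictive power as defined for the ROC test: abs(AUC - 0.5) * 2."""
    return abs(auc - 0.5) * 2


# Made-up expression values for one gene
cluster = [3.1, 2.4, 0.0, 2.9]
others = [0.0, 0.0, 1.2, 0.3, 2.5]
auc = roc_auc(cluster, others)
print(auc, predictive_power(auc))

# The Hosmer-Lemeshow AUC cut-offs expressed on the power scale
for cutoff in (0.7, 0.8, 0.9):
    print(cutoff, round(predictive_power(cutoff), 2))
```

A down-regulated gene with AUC 0.3 also gets power 0.4. Filtering on power alone therefore hides the direction of the difference, which is why it helps to keep myAUC (or avg_log2FC) alongside it.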
  • asked a question related to Bioinformatics and Computational Biology
Question
25 answers
I want to do a simulation with Molecular Dynamics but lack the facilities.
Relevant answer
Answer
You can use Visual Dynamics for simulations. It is freely available at https://visualdynamics.fiocruz.br/login
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
RNA docking with AutoDock requires a different approach. What steps are required to compute Gasteiger charges in particular?
Relevant answer
Answer
I have the same question.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
There are so many docking programs, but which one is best? Which one should we rely on?
Relevant answer
Answer
It depends on the software's algorithm.
The tools listed below might be useful:
AutoDock Vina, AutoDock
  • asked a question related to Bioinformatics and Computational Biology
Question
17 answers
These datasets will be used for data classification and for predicting new information.
Relevant answer
Answer
The Aidos-x system is an excellent intelligent system for diagnosing and classifying human diseases. The methods of using the Aidos-x system to diagnose human diseases are presented in the audio lectures "Using automated system-cognitive analysis for the classification of human organ tumors" and "Intelligent system for diagnosing early stages of chronic kidney disease", which can be downloaded right now from https://www.patreon.com/user?u=87599532 (creator's title: «Lectures on Electronic Medicine»). After subscribing to this site, you will receive databases for medical research to identify the diseases discussed in the lectures. The skills you acquire in working with the Aidos-x system will allow you to apply for grants to carry out scientific research in the field of medicine.
To subscribe to https://www.patreon.com/user?u=87599532 you do not need to go to the bank; you can use PayPal. Send a transfer in your own currency, and it will be converted to dollars automatically. After subscribing, you will receive the Aidos-x system with an English user interface for free.
Thank you.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I currently use .csv files to work with pandas dataframes and perform UMAP analyses and I would like to use Scanpy moving forward. Can anyone help me with converting .csv files into Anndata files for Scanpy?
Relevant answer
Answer
Myles Joshua Toledo Tan Converting a .csv file to an AnnData file for use in Scanpy is a simple procedure. Here's an example of how it may be done:
1. You must first install the anndata package, which can be done by typing pip install anndata into your command line.
2. Following that, import the relevant libraries, such as pandas and anndata.
import pandas as pd
import anndata
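3. Then, read your .csv file into a pandas DataFrame with pd.read_csv(). If the first column holds the cell (or gene) names, pass index_col=0 so that they become the row index.
data = pd.read_csv("your_file.csv", index_col=0)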
4. After that, you can use the anndata.AnnData() constructor to convert the DataFrame to an AnnData object.
adata = anndata.AnnData(data)
5. Finally, you may use the scanpy library's different methods to conduct any extra processing or analysis on your AnnData object.
It's worth noting that when you convert a DataFrame to an AnnData object, the rows are treated as observations (cells) and the columns as variables (genes). If your data are the other way around, use .T to transpose the DataFrame first.
Alternatively, you can use scanpy.read_csv() to read the csv file directly into an AnnData object and get the same result:
import scanpy as sc
adata = sc.read_csv("your_file.csv")
Please let me know if there is anything else I can do for you.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hello!!!
I would like you to help me with information about full predoctoral or doctoral fellowships in areas such as bioinformatics, computational biology, microbiology or related fields to which I can apply.
I would be very grateful if you could recommend some of them.
Fausto Cabezas-Mera
Greetings from Ecuador
Relevant answer
Answer
Hi,
Where are you from? Are you willing to travel?
This website lists many opportunities matching what you are looking for:
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I created this R package for easy visual analysis of VCF files; it lets you investigate mutation rates per chromosome, per gene, and much more: https://github.com/cccnrc/plot-VCF
The package is divided into 3 main sections, based on analysis target:
  1. variant Manhattan-style plots: visualize all/specific variants in your VCF file. You can plot subgroups based on position, sample, gene and/or exon
  2. chromosome summary plots: visualize the distribution of variants across (selectable) chromosomes in your VCF file
  3. gene summary plots: visualize the distribution of variants across (selectable) genes in your VCF file
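As a rough illustration of what a chromosome summary involves (not plot-VCF's implementation, which is in R), this Python sketch counts variant records per chromosome in plain-text VCF content. If chromosome lengths are supplied, it also converts the counts to variants per megabase. The chromosome lengths and records in the example are made-up placeholders.

```python
from collections import Counter


def variants_per_chromosome(vcf_text):
    """Count non-header VCF records by the CHROM column."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if line and not line.startswith("#"):
            counts[line.split("\t", 1)[0]] += 1
    return counts


def variants_per_mb(counts, chrom_lengths):
    """Normalize counts by chromosome length (bp) to variants per Mb."""
    return {c: counts[c] / (chrom_lengths[c] / 1e6) for c in counts if c in chrom_lengths}


vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t5000\t.\tC\tT\t50\tPASS\t.\n"
    "chr2\t2000\t.\tG\tA\t50\tPASS\t.\n"
)
counts = variants_per_chromosome(vcf)
print(counts)
print(variants_per_mb(counts, {"chr1": 2_000_000, "chr2": 1_000_000}))
```

Normalizing by length matters when comparing chromosomes, because raw counts mostly track chromosome size.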
Take a look at how many different things you can achieve in just one line of code!
It is extremely easy to install and use, well documented on the GitHub page: https://github.com/cccnrc/plot-VCF
I'd love to have your opinion, bugs you might find etc.
Relevant answer
Answer
I use the TASSEL software for genome analysis. It needs PLINK-format .map and .ped files as input. You can try and explore this software.
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I want to parameterize a Zn ion coordinated by a CCCH site (three Cys residues and one His residue). I followed the MCPB tutorial, but during side-chain modeling I got errors and was unable to fix the problem. I have attached my PDB file, the sidechain.bcl file, and the sidechain.bcl log files.
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I uploaded a genome to check with BUSCO via the Galaxy server. It has now been 2 days and the result is not finished yet.
Did I miss something, or is there a problem?
Thank you in advance
Relevant answer
Answer
Dear Dr. Ryan Gourlie
In my case it takes 5 days to generate the BUSCO result
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I have a PDB file of a branched polymer chain, and I want to simulate it solvated in water using OpenMM.
My problem is finding an AMBER force field that fits this branched chain.
By the way, I have the force field file (XML) that fits the linear polymer chain (attached).
Relevant answer
Answer
Thanks for your reply! I modified the parameters to be close to the paper's parameters, but it still gives me the same error: resName = resname_prefix+residue.attrib['name']
KeyError: 'name'
This paper contains the force field for alkanes, but my polymer contains oxygen.
By the way, I am using OpenMM.
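The line in the traceback reads the `name` attribute of a `<Residue>` template. A `KeyError: 'name'` therefore suggests that at least one `<Residue>` element in the XML has no `name` attribute, for example because the attribute is misspelled or a residue block was added during editing. This minimal diagnostic sketch uses only Python's standard library. It assumes the usual OpenMM layout `<ForceField><Residues><Residue name="..."><Atom name="..." type="..."/>`, and the file name in the usage comment is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_template_problems(xml_text):
    """List Residue/Atom elements that lack the attributes a residue template needs."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, residue in enumerate(root.iter("Residue")):
        res_name = residue.get("name")
        if res_name is None:
            problems.append(f"Residue #{i + 1} has no 'name' attribute")
        for atom in residue.iter("Atom"):
            for attr in ("name", "type"):
                if atom.get(attr) is None:
                    problems.append(f"Atom in residue {res_name or '#' + str(i + 1)} has no '{attr}'")
    return problems


# Usage (hypothetical file name):
# with open("branched_polymer.xml") as fh:
#     for problem in find_template_problems(fh.read()):
#         print(problem)
```

For a branched chain, the branch-point monomer usually also needs its own residue template, with one `<ExternalBond>` per connection to a neighboring residue, because a linear-chain template allows only two.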
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
Good day! The question is quite complex, since CRISPR arrays do not have an exact sequence. So the question is: what is the probability that a random sequence contains 2 repeat units, each 23-55 bp long, each containing a short palindromic sequence, with at most 20% mismatches between them, and separated by a spacer that is 0.6-2.5 times the repeat size and that does not match the left and right flanks of the whole sequence?
Relevant answer
Answer
First, I'd restate the question to make sure I understood it correctly. A nucleotide sequence of length l contains a palindrome with units of length k. The palindrome is not exact; there can be from kmin to k matches between the units. The distance between the units can be from smin to smax. The first and last subsequences of length k are not exact matches of any palindrome unit.
My solution. Let's omit the last condition for now. How do we search for a palindrome with units of length k? Take any subsequence of length k and search for a 'match'. Searching for a 'match' amounts to checking (l-k-smin) subsequences, because the unit itself occupies k nucleotides and a spacer can't be shorter than smin nucleotides. In each window, the probability of a hit is (1/4)^kmin if every nucleotide has equal probability of occurrence. The probability of having 1 or more hits is then the upper tail of the binomial distribution with l-k-smin trials and success probability (1/4)^kmin, i.e. 1 - (1 - (1/4)^kmin)^(l-k-smin). For example, the GSL function gsl_cdf_binom_Q(0, pow(0.25, kmin), l-k-smin) gives this value. The first argument is zero because the function computes the probability of more than that many successes, i.e. 1 or more in this case; the number of trials is the last argument.
Now, let's include the last condition. It is important to define what 'does not match' means. I suppose it means that we can't find the second palindrome unit at positions 1 and l-k. So, the number of windows that we check has to be decreased by 2. The final answer would be:
P = Q(0; 0.25^kmin, l-k-smin-2) = 1 - (1 - 0.25^kmin)^(l-k-smin-2), where Q is the upper tail of the binomial distribution (the probability of more than 0 successes).
For varying lengths, the answer would be a weighted sum of these probabilities, with weights equal to the probability of observing each length. So, if all lengths have equal probability, this is the mean.
I checked the answer on a synthetic set and it seems it is correct or close to being so.
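The estimate above is P = 1 - (1 - 0.25^kmin)^N with N = l - k - smin - 2 windows: the probability of at least one success in N independent trials is one minus the probability of none. This Python sketch implements that formula and the equal-weight average over unit lengths. Mapping the question's constraints to kmin = k minus at most 20% mismatches and smin = 0.6k (rounded up) is an assumption here. The sketch also keeps the answer's simplifications: equal base frequencies, independent windows, and exact matches at kmin positions.

```python
import math


def hit_probability(l, k, s_min, k_min, exclude_flanks=True):
    """P(at least one repeat-unit match) = 1 - (1 - 0.25**k_min) ** N.

    N = l - k - s_min windows, minus 2 when the two flank positions are excluded.
    """
    n_windows = l - k - s_min - (2 if exclude_flanks else 0)
    if n_windows <= 0:
        return 0.0
    p = 0.25 ** k_min
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**N, computed stably when p is tiny
    return -math.expm1(n_windows * math.log1p(-p))


def mean_hit_probability(l, unit_lengths, max_mismatch_pct=20, spacer_min_pct=60):
    """Equal-weight average over unit lengths k (the equal weights are an assumption)."""
    probs = []
    for k in unit_lengths:
        k_min = k - (k * max_mismatch_pct) // 100   # matches required
        s_min = -(-k * spacer_min_pct // 100)        # ceil(0.6 * k) in integer arithmetic
        probs.append(hit_probability(l, k, s_min, k_min))
    return sum(probs) / len(probs)


print(hit_probability(l=10, k=2, s_min=0, k_min=1))    # equals 1 - 0.75**6
print(mean_hit_probability(l=1000, unit_lengths=range(23, 56)))
```

The `expm1`/`log1p` form matters here. For realistic repeat lengths, 0.25^kmin is around 1e-11, and computing 1 - (1 - p)^N directly would lose most of its significant digits.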
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I want to dig into machine learning for drug discovery. Can anyone suggest some good reading on where and how to start, what prerequisites are needed, and whether there is publicly available material online?
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hello,
Please recommend companies that provide biotechnology services such as the design of different types of primers, NGS, RNA-seq, etc.
Relevant answer
Answer
The following companies have offices in Tehran:
Sanofi - Roche - Novo Nordisk - Novartis - Bayer - Johnson & Johnson - Merck KGaA - Zoetis - TCI - BioHorizons Implant Systems. In the US, many are located in Philadelphia, Pennsylvania; Boston, Massachusetts; and San Francisco, California. However, Iran has made working with US companies nearly impossible, so for ease of working outside Iran, I'd look to China. In general, be cautious about sharing new ideas with companies you have not worked with in the past.
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
The workstation I have been using takes nearly 15-20 days for an SPC-model simulation. Also, I could not run on the HPC cluster, because my simulation generates a huge amount of data (which takes up a lot of memory). So, I am planning to buy a system to run simulations at home. What specifications would you suggest?
PS: I already have a PC with 4 GB RAM and an Intel Core 2 Duo. Can I add an external workstation motherboard?
Relevant answer
Answer
Here is the hardware information:
Hardware with GPGPU
Intel Xeon 4C or above
500GB or above Class 20 Solid State Drive
2TB SATA 7200 rpm
16GB or above DDR4
Nvidia Quadro RTX5000, 16GB
Ubuntu 20.04 LTS
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Certain programs and websites can calculate the Tm of a DNA hairpin from the loop size and the stem sequence, for example Gene Runner. However, the calculation method or citation is not provided. Is there a formula that could help?
Relevant answer
Answer
DOI: 10.1039/b804675c
This paper explains very well how DNA hairpins unfold and melt. Kindly have a look.
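One point helps when comparing tools. A hairpin folds intramolecularly, so in the usual two-state approximation its Tm does not depend on strand concentration. Setting ΔG = ΔH - TΔS to zero at the midpoint gives Tm = ΔH/ΔS. A bimolecular duplex of two non-self-complementary strands instead gives Tm = ΔH/(ΔS + R ln(CT/4)). The Python sketch below only evaluates these two expressions. The ΔH and ΔS values are hypothetical placeholders; in practice they would come from nearest-neighbor parameters for the stem plus loop terms, which are not reproduced here.

```python
import math

R = 1.987  # gas constant, cal/(mol*K)


def hairpin_tm_celsius(dH_kcal, dS_cal):
    """Unimolecular two-state melting: Tm = dH / dS (independent of concentration)."""
    return dH_kcal * 1000.0 / dS_cal - 273.15


def duplex_tm_celsius(dH_kcal, dS_cal, total_strand_molar):
    """Bimolecular, non-self-complementary duplex: Tm = dH / (dS + R ln(CT/4))."""
    return dH_kcal * 1000.0 / (dS_cal + R * math.log(total_strand_molar / 4)) - 273.15


# Hypothetical thermodynamic values, for illustration only
dH, dS = -40.0, -120.0   # kcal/mol, cal/(mol*K)
print(round(hairpin_tm_celsius(dH, dS), 2))          # 60.18
print(round(duplex_tm_celsius(dH, dS, 1e-6), 2))
```

With the same ΔH and ΔS, the intramolecular hairpin melts well above the corresponding bimolecular duplex at micromolar concentration. This is one reason hairpin calculators do not ask for oligo concentration.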
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hello everyone,
I have designed this guide on bioinformatics for the University of Florida and would greatly appreciate your suggestions/comments on what else I am missing and should include. Thanks, Rolando
Relevant answer
Answer
I cannot open the link. I tried two different browsers without success. Is it possible that the page is not working? Is there any other site for viewing it?
Thanks in advance!
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I have a protein (PDB ID 1ZK4) that contains NADP as a ligand in its structure. When I tried to dock to it, I got an error while creating the PDBQT file for this protein in AutoDock, Error: "Non-integral charge on residues", and a second error during the grid run, Error: "Found an H-Bonding atom with three bonded atoms, atom serial 1903". When I removed NADP from my receptor protein, I got proper docking results. Can anyone explain why this happened?
Relevant answer
Answer
Hello,
I have the same problem. Did you fix it?
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
Dear all,
Enclosed here is the graph of the PMF of a small drug molecule in two different lipid systems, obtained with GROMACS steered MD and umbrella sampling.
The PMFs for both systems show negative energy values. The PMF comparison indicates a significant difference at the hydropholic region for my drug's transport.
Is it right to say that drug transport is more spontaneous in System I (black graph) than in System II (green graph)?
I hope to receive your comments and suggestions for a fruitful interpretation.
Thanking you
Sincerely
Bikash
Relevant answer
Answer
Bikash Ranjan Sahoo, what was the reason for the negative values?
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I'm trying to find the CpG methylation percentage of a specific gene promoter in mouse ESCs. I found http://imethyl.iwate-megabank.org and http://www.methdb.net as available databases; however, after entering my gene's ID, I was unable to interpret all the peaks shown on those websites. Does anybody know how to interpret and present this information? Or is there another way to find the methylation status of a gene? Thanks,
Relevant answer
Answer
Hi Azin
You can also go to the UCSC Genome Browser, set the right parameters (species/genome/gene name), and look at what the relevant tracks show for your gene with the right display options.
UCSC (http://genome.ucsc.edu) provides additional tools that could also be useful for your research; take some time to get used to them.
keep safe
fred
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
While adding Na and Cl ions to the solvated system, the program threw an error stating "no line with molecule 'SOL' found in the [molecules] section of file 'topol.top'".
However, topol.top does have that entry. Please suggest how to fix the error.
Thanks in advance.
Regards,
Vinay
Relevant answer
Answer
Hi,
If it still persists, just add a newline character after the SOL line in the topol.top file. Sometimes gmx behaves strangely when reading the SOL line.
HTH.
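A quick check of the two things discussed here is to read the [ molecules ] section and confirm that a SOL entry is present and that the file ends with a newline. This Python sketch does that on the text of topol.top. Treating a missing trailing newline as the likely cause follows the answer above and is not a documented GROMACS rule; the file path in the usage comment is an assumption.

```python
import re


def molecules_section(top_text):
    """Return (name, count) entries from the last [ molecules ] section."""
    entries, in_molecules = [], False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop comments
        if not line:
            continue
        header = re.fullmatch(r"\[\s*(\w+)\s*\]", line)
        if header:
            in_molecules = header.group(1) == "molecules"
            if in_molecules:
                entries = []                   # keep only the last such section
            continue
        if in_molecules:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                entries.append((fields[0], int(fields[1])))
    return entries


def check_topology(top_text):
    """Report a missing SOL entry and a missing trailing newline."""
    problems = []
    if "SOL" not in [name for name, _ in molecules_section(top_text)]:
        problems.append("no SOL entry in [ molecules ]")
    if not top_text.endswith("\n"):
        problems.append("file does not end with a newline")
    return problems


# Usage (path is an assumption):
# with open("topol.top") as fh:
#     print(check_topology(fh.read()) or "looks fine")
```

The checks also catch a SOL line that has been commented out or placed outside [ molecules ].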
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
During topology generation for the protein (PDB ID: 6LU7), a fatal error occurred. How can I resolve this error, and what precautions should be taken to avoid such errors?
Thanks in advance.
Regards,
Vinay
Relevant answer
Answer
I suspect it has to do with the non-canonical residues in chain C: 02J in position C1 ( 5-methyl-1,2-oxazole-3-carboxylic acid, https://www.rcsb.org/ligand/02J ), PJE in position C5 ( (E,4S)-4-azanyl-5-[(3S)-2-oxidanylidenepyrrolidin-3-yl]pent-2-enoic acid, https://www.rcsb.org/ligand/PJE ) and 010 in position C6 ( phenylmethanol, https://www.rcsb.org/ligand/010 ). Chain C is a ligand to the protein. If you are interested in just the protein, e.g. in preparation for docking, you should remove chain C. If you actually are interested in the complex, you need to supply the topology and parameter files for these hetero-compounds as you would for non-peptidomimetic ligands.
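If only the protein is needed, chain C can be removed in any molecule viewer. For scripting, this minimal Python sketch drops every coordinate record of a given chain (PDB column 22) and keeps everything else. It is a plain text filter that assumes a standard fixed-column PDB file. The file names in the usage comment are hypothetical, and CONECT records that refer to the removed atoms are not cleaned up.

```python
COORD_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def drop_chain(pdb_text, chain_id):
    """Return PDB text without coordinate/TER records belonging to `chain_id`."""
    kept = []
    for line in pdb_text.splitlines(keepends=True):
        if line.startswith(COORD_RECORDS) and len(line) > 21 and line[21] == chain_id:
            continue  # record belongs to the chain being removed
        kept.append(line)
    return "".join(kept)


# Usage (hypothetical file names):
# with open("6lu7.pdb") as src, open("6lu7_protein_only.pdb", "w") as dst:
#     dst.write(drop_chain(src.read(), "C"))
```

After filtering, it is still worth checking the file in a viewer for remaining HETATM groups (waters, ions) before generating the topology.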
To avoid such problems, start by looking at the structure in a molecule viewer to look for non-proteinic components ( e.g. PyMOL selection "organic" ) and/or look carefully at the annotations at rcsb.org ( https://www.rcsb.org/structure/6LU7 ), listing Entity ID 2: N-[(5-METHYLISOXAZOL-3-YL)CARBONYL]ALANYL-L-VALYL-N~1~-((1R,2Z)-4-(BENZYLOXY)-4-OXO-1-{[(3R)-2-OXOPYRROLIDIN-3-YL]METHYL}BUT-2-ENYL)-L-LEUCINAMIDE
In addition, the biological unit of the protein is a homodimer. However, since the dimer symmetry coincides with crystallographic symmetry, the asymmetric unit only contains one monomer. Depending on the questions you wish to address with your computational analysis, you might want to use the biological assembly rather than the asymmetric unit of the structure.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I have performed a simulation with Desmond, and now I want to perform MM/PBSA. I don't think Desmond has any functionality for calculating MM/PBSA. How should I move forward? Your guidance will be highly appreciated.
Thanks
Relevant answer
Answer
Hi, I think you should look into this thread for details:
From "Bowen Tang"
Of course you can do this.
thermal_mmgbsa.py is in $SCHRODINGER/mmshare-vxxxxx/python/common
Run it as follows:
$SCHRODINGER/run thermal_mmgbsa.py <your_trajectory.cms>
Good luck.
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
After finishing the simulation of the cyclic peptide, I tried to find the most populated structure using the density peak clustering algorithm. In the literature, the representative structure was chosen as the structure with maximal $\rho_{sum}$, the sum of the local densities of all residues in one structure, $\rho_{sum} = \sum_{i=1}^{n_{res}} \rho_i$. How can I extract the structure with the highest density summed over all residues?
ref: Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492–1496
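In the referenced density peak method, a point's local density is the number of neighbors within a cutoff distance d_c. This minimal NumPy sketch assumes that each residue in each frame is described by its backbone (φ, ψ) dihedrals. It also assumes that a residue's neighbors are the other frames whose dihedrals for that residue lie within d_c degrees, with periodic wrapping. The cited study's exact descriptor and cutoff may differ, and the default d_c here is an arbitrary placeholder. ρ_sum for each frame is the sum over residues, and the representative structure is the frame with the largest ρ_sum.

```python
import numpy as np


def periodic_diff(a, b):
    """Smallest absolute angular difference in degrees (handles the ±180° wrap)."""
    d = np.abs(a - b) % 360.0
    return np.minimum(d, 360.0 - d)


def rho_sum(dihedrals, d_c):
    """ρ_sum per frame from an array of shape (n_frames, n_residues, 2) of (φ, ψ).

    ρ_i(frame) counts the other frames whose (φ, ψ) for residue i lie within d_c.
    Memory grows as n_frames**2 * n_residues, so subsample long trajectories.
    """
    diff = periodic_diff(dihedrals[:, None, :, :], dihedrals[None, :, :, :])
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # (n_frames, n_frames, n_residues)
    within = dist < d_c
    frames = np.arange(dihedrals.shape[0])
    within[frames, frames, :] = False             # a frame is not its own neighbor
    rho = within.sum(axis=1)                      # (n_frames, n_residues): ρ_i per frame
    return rho.sum(axis=1)


def representative_frame(dihedrals, d_c=30.0):  # d_c = 30° is a placeholder, not from the paper
    """Index of the frame with maximal ρ_sum (the first one if tied)."""
    return int(np.argmax(rho_sum(dihedrals, d_c)))
```

The returned index can then be used to write out that frame from the trajectory with whatever MD analysis tool you already use.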
Relevant answer
Answer
Dear Sam Mohel ,
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases if the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.
Regards,
Shafagat
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hi,
GO and KEGG functional analysis of a gene set was performed using the DAVID database (https://david.ncifcrf.gov/). However, the adjusted p-values (Bonferroni and Benjamini) of the enriched GO terms and KEGG pathways were more than 0.5. Meanwhile, a PPI network was constructed using the STRING database (https://string-db.org), with a confidence score of 0.4 as the cutoff and no more than ten interactors in the first shell. This step added a few more genes to the gene list, and genes with no interactions were removed. When the updated gene list was used for GO and KEGG functional analysis, the enriched GO terms and KEGG pathways were now significant (p-value < 0.05). Is this workflow valid?
Relevant answer
Answer
Thank you, Dr Giovanni Colonna, for taking the time to answer the question. I concur with your explanation.
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
How can I calculate, at each reference position, the ratio of the number of mismatches between the reference and the reads to the number of all mapped bases, given a BAM file? Comments on any program or script, or any other suggestions, are welcome.
Relevant answer
Answer
Sorry, I think you have already got your answer. Anyway, for others with similar questions: you can use the Integrative Genomics Viewer (https://software.broadinstitute.org/software/igv/) to visualize your BAM file against a reference sequence. After loading it, if you click on a specific position in the coverage track, IGV shows the read counts for each base at that position, from which you can read off the mismatch rate at that point of the genome.
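As a scripted alternative to clicking positions in IGV, the per-position ratio can be computed from the text output of samtools mpileup (for example `samtools mpileup -f reference.fa aligned.bam > pileup.txt`). In that output, the fifth column encodes each read base as '.' or ',' for a reference match and as a letter for a mismatch. This minimal Python sketch parses that column. It counts only matches and mismatches as mapped bases, skips indels and read start/end markers, and applies no base-quality filtering beyond what mpileup itself applied.

```python
def count_pileup_bases(bases):
    """Return (matches, mismatches) from an mpileup read-bases string."""
    matches = mismatches = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":              # start of a read; the next char is its mapping quality
            i += 2
        elif c == "$":            # end of a read
            i += 1
        elif c in "+-":           # indel: sign, length, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if c in ".,":
                matches += 1
            elif c.upper() in "ACGTN":
                mismatches += 1
            # other symbols ('*' deletion, '<'/'>' reference skip) are not counted
            i += 1
    return matches, mismatches


def mismatch_ratios(pileup_lines):
    """Yield (chrom, pos, mismatches / mapped bases) for each mpileup line."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos, bases = fields[0], int(fields[1]), fields[4]
        m, mm = count_pileup_bases(bases)
        total = m + mm
        yield chrom, pos, (mm / total if total else 0.0)


# Usage: for chrom, pos, ratio in mismatch_ratios(open("pileup.txt")): print(chrom, pos, ratio)
```

Positions with no mapped bases get a ratio of 0.0 here. You may prefer to skip them or report them separately.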
  • asked a question related to Bioinformatics and Computational Biology
Question
28 answers
Is it possible to use Artificial Intelligence (AI) in Biological and Medical Sciences to search databases for potential candidate drugs/genes to solve global problems without first performing animal studies?
Relevant answer
Answer
Yes, AI has already been heavily deployed for biological diagnosis through predictions and classifications.
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
I don't have much experience in bioinformatics, and I need to find the genes shared across several gene expression datasets; in other words, the genes that appear in all (or some) of my datasets. I am looking for a tool that gives me Venn diagrams of the shared genes. Any suggestions (free software, please) will be very much appreciated.
Relevant answer
Answer
For a list of Venn diagram tools, their features, and references, you may check out the link below.
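Whatever Venn tool you choose, the underlying operation is a set intersection. A minimal Python sketch, assuming each dataset has already been reduced to a set of gene symbols (the dataset names and genes below are made up), finds the genes shared by at least k datasets and the Venn region each gene falls in:

```python
from collections import Counter


def genes_in_at_least(datasets: dict[str, set[str]], k: int) -> set[str]:
    """Genes present in at least k datasets; k = len(datasets) gives the full intersection."""
    counts = Counter(gene for genes in datasets.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= k}


def venn_regions(datasets: dict[str, set[str]]) -> dict[str, tuple[str, ...]]:
    """Map each gene to the sorted names of the datasets that contain it (its Venn region)."""
    regions: dict[str, list[str]] = {}
    for name, genes in datasets.items():
        for gene in set(genes):
            regions.setdefault(gene, []).append(name)
    return {gene: tuple(sorted(names)) for gene, names in regions.items()}


# Made-up example datasets
datasets = {
    "study_A": {"TP53", "EGFR", "MYC"},
    "study_B": {"TP53", "MYC", "KRAS"},
    "study_C": {"TP53", "BRCA1"},
}
print(sorted(genes_in_at_least(datasets, 3)))  # ['TP53']
print(sorted(genes_in_at_least(datasets, 2)))  # ['MYC', 'TP53']
```

Before comparing, the identifiers should be made consistent across datasets (same identifier type, same case, aliases resolved); otherwise genuinely shared genes will be missed.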
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I'm working on using PSO (particle swarm optimization) for local alignment of DNA sequences, but I couldn't find a way to represent the alignment or the gaps in it.
Any opinion will be useful. Thanks
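One possible encoding (an assumption, not something established in this thread) is the random-key scheme: each particle holds one real value per alignment column for each sequence, and the columns with the largest values become gaps. Decoding then always yields a valid gapped row, and the fitness is an ordinary column score. The match/mismatch/gap values below are illustrative, and for local alignment the particle would additionally need start and end positions, which this sketch leaves out.

```python
def gap_columns(keys: list[float], n_gaps: int) -> list[int]:
    """Random-key decoding: the n_gaps columns with the largest particle values become gaps."""
    order = sorted(range(len(keys)), key=lambda c: keys[c], reverse=True)
    return sorted(order[:n_gaps])


def decode_row(seq: str, gap_cols: list[int], width: int) -> str:
    """Place '-' at gap_cols and the residues of seq, in order, in all other columns."""
    gaps = set(gap_cols)
    if len(gaps) != width - len(seq) or any(not 0 <= c < width for c in gaps):
        raise ValueError("need exactly width - len(seq) distinct gap columns inside the row")
    residues = iter(seq)
    return "".join("-" if c in gaps else next(residues) for c in range(width))


def column_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Fitness of a pairwise alignment: sum of column scores (gap-gap columns score 0)."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


# A particle for "AGT" in a 4-column alignment: one gap, placed at the highest key
row = decode_row("AGT", gap_columns([0.2, 0.9, 0.1, 0.4], 1), 4)
print(row, column_score("ACGT", row))  # A-GT 1
```

The PSO velocity and position updates then act on the real-valued keys only, which is why this encoding is convenient: any key vector decodes to a legal alignment.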
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I would like to perform large-scale virtual screening with PyRx, docking libraries of compounds to the active site of a protein. To do that, the software needs a separate .sdf (or .pdb, or any other coordinate format) file for each small molecule I'd like to try. In the libraries available online, all the molecules are usually listed in a single file containing thousands of molecules. Is there a fast way to extract the individual .sdf files from that kind of file? Is there a tool to obtain all the individual .sdf files from the parent file?
Thank you!
Relevant answer
Answer
Donato Calabrese, split the .sdf file into individual files using Open Babel, with something like:
babel -i sdf bigfile.sdf -o sdf file.sdf -m
(the -m option should produce the output files file1.sdf, file2.sdf, file3.sdf, etc.)
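If Open Babel is not available, the split is also easy to do directly, because each record in a multi-molecule SD file ends with a line containing only `$$$$`. A minimal Python sketch; the output names mol1.sdf, mol2.sdf, ... are arbitrary choices:

```python
from pathlib import Path


def split_sdf(text: str) -> list[str]:
    """Split multi-molecule SD file text into records, each ending with its '$$$$' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    if any(line.strip() for line in current):  # last record without a trailing '$$$$'
        records.append("".join(current))
    return records


def write_split(sdf_path: str, out_dir: str, prefix: str = "mol") -> list[Path]:
    """Write each record of sdf_path to out_dir/<prefix>1.sdf, <prefix>2.sdf, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, record in enumerate(split_sdf(Path(sdf_path).read_text()), start=1):
        path = out / f"{prefix}{i}.sdf"
        path.write_text(record)
        paths.append(path)
    return paths
```

For example, `write_split("bigfile.sdf", "ligands")` would fill a ligands/ directory. If you prefer files named after the molecules, the title is the first line of each record.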
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Dear RG members,
I am trying to install AMBER in parallel on a cluster with the ifort compiler, but I get an error saying that the MPIF90 command was not found. I read the configure2 file in AmberTools/src, which says the serial version must be installed first.
For the serial build:
setenv AMBERHOME "amber path"
./configure -noX11 intel
make install 
 
It runs perfectly.
How can I proceed to add the MPI build? Kindly suggest the next commands.
Should I run
./configure -mpi intel
make install
or is some other trick needed?
I fail each time with a few errors.
Kindly share the complete commands that follow the serial installation steps.
Relevant answer
Answer
For parallel installation of Amber, you can visit my web page. There I summarized our experiences on Amber installation on a local Linux machine.
For a successful MPI run, prefix the command with ‘mpirun -bind-to core -np 4’ and append ‘.MPI’ to the module name. The number after -np is the number of cores you want to allocate to the parallel calculation and can be changed accordingly. For example, to run the sander module of Amber, we use the following command on our local system:
$ mpirun -bind-to core -np 4 sander.MPI -O -i sander.in1 -o sander.out1 -r mdrest1 -c prmcrd -ref prmcrd
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I would like to study the apo (lipid-free) form of a protein that has only been crystallized with lipids. I want to explore whether it is possible to generate a reasonable structure with molecular dynamics by removing the lipids in several steps until the apo form is obtained. Likewise, I don't know whether lipids can be made to disappear during a molecular dynamics trajectory. I am thinking of using programs like GROMACS, AMBER, etc.
Relevant answer
Answer
You need to remove the lipids before the MD simulation. You cannot delete or add any atom/residue/molecule during or after an MD simulation, as doing so will corrupt your trajectory data.
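Removing the lipids is therefore a preprocessing step on the starting structure. A minimal sketch for a PDB file, assuming the lipids are ATOM/HETATM records identifiable by residue name; the names in LIPID_RESNAMES are placeholders, so substitute the ones actually used in your file. It also drops CONECT records that refer to removed atoms.

```python
# Hypothetical lipid residue names: replace with the ones in your PDB file.
LIPID_RESNAMES = {"PLM", "OLA", "POPC"}


def conect_serials(line: str) -> set[str]:
    """Atom serial numbers listed on a CONECT record (fixed-width 5-character fields)."""
    body = line.rstrip("\n")
    return {body[i:i + 5].strip() for i in range(6, len(body), 5)} - {""}


def strip_residues(pdb_lines: list[str], resnames: set[str] = LIPID_RESNAMES) -> list[str]:
    """Remove ATOM/HETATM/ANISOU records of the given residues and CONECTs that touch them."""
    removed, kept = set(), []
    for line in pdb_lines:
        record = line[:6].strip()
        # residue name sits in columns 18-21 (slice 17:21 also covers 4-letter names)
        if record in ("ATOM", "HETATM", "ANISOU") and line[17:21].strip() in resnames:
            if record != "ANISOU":
                removed.add(line[6:11].strip())  # atom serial number
            continue
        kept.append(line)
    return [l for l in kept if not (l.startswith("CONECT") and conect_serials(l) & removed)]
```

Usage would be something like `open("apo.pdb", "w").writelines(strip_residues(open("complex.pdb").readlines()))`. After stripping, the topology (for example the `[ molecules ]` section in GROMACS) must no longer list the lipids, and the apo structure should be re-solvated, minimized and equilibrated before production MD.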
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
Is it possible to do 3D-QSAR without commercial software? If so, can anybody suggest a workflow for 3D-QSAR with suitable free software for each step?
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I have used P2Rank in the PrankWeb software and the CASTp tool to analyze the refined structures of some proteins to predict protein pockets and cavities. But now I cannot find any way to visualize them in PyMOL.
Relevant answer
Answer
There’s CASTpyMOL, a PyMOL plugin to visualize CASTp results:
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I am currently trying to find homologues of a protein I am working with, but BLAST has been giving me nothing usable. I have now found a dataset of 1500 protein sequences of potential candidates that I want to align to my reference sequence. I have tried Clustal, MEGA, MUSCLE, MAFFT and pretty much everything under the sun, but with this many sequences and only limited experience, I am having trouble achieving what I want, as the programs simply crash or lock up after a few minutes.
Instead of the traditional multiple sequence alignment, where every sequence gets aligned to every other sequence with multiple iterations, I want all of the sequences from the dataset to only be aligned to my one reference sequence. Think of it as doing 1500 pairwise alignments only. What would be the best way to perform this kind of alignment?
Relevant answer
Answer
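What is being asked for is essentially 1500 independent pairwise alignments against one reference. A minimal pure-Python sketch of that one-vs-many loop, using global (Needleman-Wunsch) scores with illustrative match/mismatch/gap values. For protein sequences, a substitution matrix such as BLOSUM62 with affine gaps would be more appropriate, and a compiled aligner (for example Biopython's PairwiseAligner) would be far faster than this loop.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score of a vs b (Needleman-Wunsch, linear gap penalty, two-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def rank_against_reference(reference: str, candidates: dict[str, str]) -> list[tuple[str, int]]:
    """Align every candidate to the reference only (no all-vs-all), best score first."""
    scores = {name: nw_score(reference, seq) for name, seq in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Raw scores grow with sequence length, so when candidates differ a lot in size it may be better to rank by a normalized score (for example score divided by reference length) or by percent identity.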
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
After installing the CASTp plugin in PyMOL, I tried to use it to view a protein and its pockets from the CASTp server via the job ID, but this error message always appears.
Relevant answer
Answer
import urllib
  • asked a question related to Bioinformatics and Computational Biology
Question
31 answers
Applications of bioinformatics in medicine are a key factor in the technological advancement of modern medical technologies.
In which areas of medical technology are the technological achievements of bioinformatics used?
What are the applications of bioinformatics in medicine?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Please have a look at our (Eminent Biosciences (EMBS)) collaborations, and let me know if you are interested in associating with us.
Our recent publications in collaboration with industries and academia in India and worldwide:
Our Lab EMBS's Publication In collaboration with Universidad Tecnológica Metropolitana, Santiago, Chile. Publication Link: https://pubmed.ncbi.nlm.nih.gov/33397265/
Our Lab EMBS's Publication In collaboration with Moscow State University , Russia. Publication Link: https://pubmed.ncbi.nlm.nih.gov/32967475/
Our Lab EMBS's Publication In collaboration with Icahn Institute of Genomics and Multiscale Biology, Mount Sinai Health System, Manhattan, NY, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with University of Missouri, St. Louis, MO, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30457050
Our Lab EMBS's Publication In collaboration with Virginia Commonwealth University, Richmond, Virginia, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with ICMR- NIN(National Institute of Nutrition), Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with University of Minnesota Duluth, Duluth MN 55811 USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with University of Yaounde I, PO Box 812, Yaoundé, Cameroon. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Federal University of Paraíba, João Pessoa, PB, Brazil. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30693065
Our Lab EMBS's Publication In collaboration with University of Yaoundé I, Yaoundé, Cameroon. Publication Link: https://pubmed.ncbi.nlm.nih.gov/31210847/
Our Lab EMBS's Publication In collaboration with University of the Basque Country UPV/EHU, 48080, Leioa, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852204
Our Lab EMBS's Publication In collaboration with King Saud University, Riyadh, Saudi Arabia. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with NIPER , Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Our Lab EMBS's Publication In collaboration with Alagappa University, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Jawaharlal Nehru Technological University, Hyderabad , India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with C.S.I.R – CRISAT, Karaikudi, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237676
Our Lab EMBS's Publication In collaboration with Karpagam academy of higher education, Eachinary, Coimbatore , Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Ballets Olaeta Kalea, 4, 48014 Bilbao, Bizkaia, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with Hospital for Genetic Diseases, Osmania University, Hyderabad - 500 016, Telangana, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with School of Ocean Science and Technology, Kerala University of Fisheries and Ocean Studies, Panangad-682 506, Cochin, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27964704
Our Lab EMBS's Publication In collaboration with CODEWEL Nireekshana-ACET, Hyderabad, Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26770024
Our Lab EMBS's Publication In collaboration with Bharathiyar University, Coimbatore-641046, Tamilnadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27919211
Our Lab EMBS's Publication In collaboration with LPU University, Phagwara, Punjab, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/31030499
Our Lab EMBS's Publication In collaboration with Department of Bioinformatics, Kerala University, Kerala. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with Gandhi Medical College and Osmania Medical College, Hyderabad 500 038, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27450915
Our Lab EMBS's Publication In collaboration with National College (Affiliated to Bharathidasan University), Tiruchirapalli, 620 001 Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27266485
Our Lab EMBS's Publication In collaboration with University of Calicut - 673635, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with NIPER, Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Our Lab EMBS's Publication In collaboration with King George's Medical University, (Erstwhile C.S.M. Medical University), Lucknow-226 003, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579575
Our Lab EMBS's Publication In collaboration with School of Chemical & Biotechnology, SASTRA University, Thanjavur, India Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579569
Our Lab EMBS's Publication In collaboration with Safi center for scientific research, Malappuram, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Dept of Genetics, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25248957
Our Lab EMBS's Publication In collaboration with Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26229292
Sincerely,
Dr. Anuraj Nayarisseri
Principal Scientist & Director,
Eminent Biosciences.
Mob :+91 97522 95342
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
My final-year project is on drug efflux pumps and persistence in methicillin-resistant Staphylococcus aureus, and we are going to focus on persister cells to study the pathway of antimicrobial resistance. My question is: how can I link bioinformatics and some coding to this project without whole-genome sequencing (WGS), since that is not an option in our lab? I need a small-scale yet useful technique or tool that I can learn and implement by myself. PS: I love programming in general, but I'm still new to bioinformatics, so I need help linking my passion for coding with my field, biotechnology.
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I have used P2Rank and the CASTp tool to analyze the refined structures of some proteins to predict protein pockets and cavities. But now I am not finding any way of visualizing them in PyMOL.
Relevant answer
Answer
You can install the CASTp plugin for PyMOL.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
How can I identify differentially expressed genes from a particular gene family using GEO datasets? Which R package is best for differential gene expression analysis?
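A common approach is to run the differential-expression test on all genes (for example with limma, which GEO2R also uses) and only then subset the results to the family of interest. A minimal sketch of that last step; the column names `symbol`, `log2fc` and `padj`, the thresholds, and the example family are hypothetical, and a curated family list (e.g. from HGNC gene groups) is safer than matching symbol prefixes.

```python
def family_de_genes(results: list[dict], family: set[str],
                    padj_max: float = 0.05, min_abs_lfc: float = 1.0) -> list[dict]:
    """Rows whose gene symbol is in `family` and that pass the adjusted-p and fold-change cutoffs."""
    family = {symbol.upper() for symbol in family}
    hits = [row for row in results
            if row["symbol"].upper() in family
            and row["padj"] < padj_max
            and abs(row["log2fc"]) >= min_abs_lfc]
    return sorted(hits, key=lambda row: row["padj"])


# Hypothetical genome-wide results table and family list
results = [
    {"symbol": "HOXA9", "log2fc": 2.1, "padj": 0.001},
    {"symbol": "HOXB13", "log2fc": -0.4, "padj": 0.03},
    {"symbol": "HOXC10", "log2fc": -1.6, "padj": 0.02},
    {"symbol": "GAPDH", "log2fc": 3.0, "padj": 1e-6},
]
hox = {"HOXA9", "HOXB13", "HOXC10"}
print([row["symbol"] for row in family_de_genes(results, hox)])  # ['HOXA9', 'HOXC10']
```

Testing genome-wide and filtering afterwards keeps the multiple-testing correction comparable to a standard analysis; the same filter can be written in R with `subset()` on the limma `topTable` output.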
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
Is there any open-source software I can use to generate the images needed for NEB calculations? I will be using NEB as implemented by DMol3. Thanks in advance!
Relevant answer
Answer
You can use the Perl scripts provided by VTST Tools (https://theory.cm.utexas.edu/vtsttools/scripts.html) to generate linearly interpolated images between the initial and final states. Even with generated images, it is always better to check them with chemical intuition, which will ease convergence.
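For reference, the linear interpolation those scripts perform is simple: image k of n lies a fraction k/(n+1) of the way from the initial to the final coordinates. A minimal sketch for Cartesian coordinates, assuming both end points list the same atoms in the same order. It does not handle atoms that cross a periodic cell boundary (which needs minimum-image wrapping) and does not write DMol3 input, so converting the frames to your input format is left to you.

```python
Coords = list[tuple[float, float, float]]


def interpolate_images(initial: Coords, final: Coords, n_images: int) -> list[Coords]:
    """Return n_images frames linearly interpolated between initial and final (end points excluded)."""
    if len(initial) != len(final):
        raise ValueError("initial and final structures must contain the same atoms")
    frames = []
    for k in range(1, n_images + 1):
        t = k / (n_images + 1)  # fraction of the way along the straight-line path
        frames.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(initial, final)
        ])
    return frames
```

Because a straight-line path can push atoms unphysically close together, inspecting the generated images (as suggested above) before starting the NEB run is worthwhile.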
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I have more than 10 sequences and I want to search for homologs of each one. I would like to use PSI-BLAST to retrieve this information from a database, but I would like to do it in one go rather than one sequence at a time.
Could anyone suggest any web-based software or online tool for batch BLAST searches?
Relevant answer
Answer
Thank you for your post!
I'm in a similar situation and have been searching for the right option for a week! The above-mentioned online tool is no longer available to the public.
Can you suggest a method I could follow to search approximately 400 unknown sequences?
Thank you very much.
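Where web tools disappear, the standalone BLAST+ `psiblast` program can be run locally over many queries. A minimal sketch that reads a multi-FASTA file and prepares one `psiblast` job per sequence (each PSI-BLAST search builds its own profile, so the queries are run separately). The database name `swissprot`, the output directory, and the iteration and E-value settings are assumptions, and a BLAST database must already be installed locally; `-query`, `-db`, `-num_iterations`, `-evalue` and `-out` are standard `psiblast` options.

```python
import subprocess
from pathlib import Path


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs; identifier = first word of the header."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def psiblast_jobs(records, db="swissprot", out_dir="psiblast_out", iterations=3, evalue=1e-5):
    """Build (query file, FASTA text, command) for one psiblast run per sequence."""
    out = Path(out_dir)
    jobs = []
    for i, (name, seq) in enumerate(records, start=1):
        query = out / f"query{i}.fasta"
        cmd = ["psiblast", "-query", str(query), "-db", db,
               "-num_iterations", str(iterations), "-evalue", str(evalue),
               "-out", str(out / f"query{i}_psiblast.txt")]
        jobs.append((query, f">{name}\n{seq}\n", cmd))
    return jobs


def run_jobs(jobs):
    """Write each query file and run psiblast on it (requires BLAST+ on PATH)."""
    for query, fasta, cmd in jobs:
        query.parent.mkdir(parents=True, exist_ok=True)
        query.write_text(fasta)
        subprocess.run(cmd, check=True)
```

For ~400 sequences, `run_jobs(psiblast_jobs(read_fasta(open("unknowns.fasta").read())))` would run them one after another; the jobs are independent, so they could also be spread over several cores or cluster nodes.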
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
Hello! I'm trying to use ModelX for my final-year thesis research. All requirements for this script are satisfied and I'm using a MySQL server, but when I run the command I get the following error:
Your dnaX time has expired on 2021-Jan-31
Academic licenses: just download it again
Commercial licenses: contact us
even though I have the latest version of ModelX and everything recommended by the developers. Does anyone have a solution to this, please?
Relevant answer
Answer
Hi,
Kindly check this link:
Best wishes.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I have some files in BED and bedGraph format to analyze with IGV. My team and I tried to load them into IGV following the tutorials on the IGV site, but it hasn't worked. The bedGraph files are large (5157), and we converted them to the binary .tdf format using the igvtools "Count" command, but that hasn't worked either. With some files we only see a single flat line on the IGV screen without any information. With FilexT we can see that the BED and bedGraph files are not damaged.
We think the problem is at the step where we select the "Load from File" option in IGV. What can we do?
We use the IGV_2.10.3
Relevant answer
Answer
Take a look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I am using DAVID (https://david.ncifcrf.gov/home.jsp) to cluster some genes I found upregulated in my RNA-seq data. I am using only the official gene symbols, without any quantitative data. However, the KEGG pathway results give me p-values that are extremely high, which does not make sense to me. How can a p-value be calculated without any numbers? Can the p-value be significant?
Relevant answer
Answer
DAVID adopts Fisher's Exact test to measure the gene-enrichment in annotation terms. It is just a matter of making a 2x2 contingency table, as in the example here: https://david.ncifcrf.gov/content.jsp?file=functional_annotation.html (section 2.2).
---------------------------------------
A Hypothetical Example: In the human genome background (30,000 genes total; Population Total (PT)), 40 genes are involved in the p53 signaling pathway (Population Hits (PH)). A given gene list has found that three genes (List Hits (LH)) out of 300 total genes in the list (List Total (LT)) belong to the p53 signaling pathway. Then we ask if 3/300 is more than a random chance compared to the human background of 40/30000. A 2 x 2 contingency table is built based on the above numbers:
List Hits (LH) = 3
List Total (LT) = 300
Population Hits (PH) = 40
Population Total (PT) = 30,000
Exact p-value = 0.007. Since p-value < 0.05, this user's gene list is specifically associated (enriched) in the p53 signaling pathway by more than random chance.
---------------------------------------
Hence, quantitative data are not considered in such enrichment analyses, unless you want to calculate an additional activation/inhibition score, as computed by Ingenuity Pathway Analysis, for example.
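The p53 example above can be checked directly. The one-sided Fisher's exact test for this table is a hypergeometric tail: the probability of drawing at least LH pathway genes when LT genes are picked at random from PT genes, PH of which are in the pathway. A minimal sketch using only the standard library; the `ease` flag mimics DAVID's EASE score, a more conservative variant that, as I understand DAVID's documentation, removes one gene from the list hits before testing.

```python
from math import comb


def enrichment_p(list_hits: int, list_total: int, pop_hits: int, pop_total: int,
                 ease: bool = False) -> float:
    """One-sided (over-representation) hypergeometric p-value, i.e. P(X >= list_hits)."""
    start = max(list_hits - 1, 0) if ease else list_hits  # EASE: penalize by one list hit
    upper = min(pop_hits, list_total)
    tail = sum(comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
               for k in range(start, upper + 1))
    return tail / comb(pop_total, list_total)


print(round(enrichment_p(3, 300, 40, 30000), 3))  # 0.007, as in the DAVID example
```

The four counts are all the test needs, which is why a plain list of gene symbols is enough to obtain a p-value. With `ease=True` the same example gives roughly 0.06, illustrating how much more conservative the EASE score is for terms with only a few hits.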
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
When observing elastic modes of proteins, one of the results files shows a deformation energy plot. What is the significance of deformation energy when studying protein dynamics?
Relevant answer
Answer
Deformation energy measures the rigidity of protein residues: the higher the deformation energy, the more rigid the residue. It also gives an idea of the protein's local flexibility.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I am looking for journals that publish newly developed tools/servers/web applications/pipelines that are useful in biology, or newly curated databases of biological significance.
Can anyone kindly suggest journals in bioinformatics and computational biology that publish:
  • Bioinformatics Tools/Servers (Machine Learning, Deep Learning based or else)
  • Text Mining
  • Databases
  • Datasets
  • Pipeline etc.
I know a few such as:
  1. Bioinformatics
  2. Nucleic Acids Research
  3. Database
  4. GigaScience
  5. Nature Scientific Data
  6. Nature Computational Science
  7. Briefings in Bioinformatics
  8. BMC Bioinformatics
  9. PLOS Computational Biology
  10. Journal of Cheminformatics
If you know more, kindly suggest the journal names. Thank you in advance.
Relevant answer
Answer
I think my suggestion still holds. If your tool is useful for, say, people working on viruses, journals focused on that topic might be interested.
Perhaps a good way might be to 'just' search for "bioinformatics tool" in Google Scholar to see which type of journals 'pop up' (besides the ones you already know).
Again good luck.
Best regards.
  • asked a question related to Bioinformatics and Computational Biology
Question
15 answers
Can anyone help me convert Maestro format to PDB?
Thanks in advance.
Relevant answer
Answer
Thank you Rahul sir.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I have been working on a protein-ligand complex simulation. Although I have been careful throughout in preparing the necessary files, including the .top and .gro files, I came across the error "2 particles communicated to PME rank 4 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x" while running mdrun. An initial look into this issue suggested that the system is blowing up. I first tried to troubleshoot by reducing the time step, as suggested in the GROMACS documentation, but that did not resolve the issue. Could anybody give suggestions regarding this issue?
Thanks
Relevant answer
Answer
From the page:
Possible causes include:
  • you didn't minimize well enough,
  • you have a bad starting structure, perhaps with steric clashes,
  • you are using too large a timestep (particularly given your choice of constraints),
  • you are doing particle insertion in free energy calculations without using soft core,
  • you are using inappropriate pressure coupling (e.g. when you are not in equilibrium, Berendsen can be best while relaxing the volume, but you will need to switch to a more accurate pressure-coupling algorithm later),
  • ...
There is a troubleshooting list here on how to diagnose the problem. I suggest trying the list first.
I'm sorry I don't have a better answer. I'm having similar issues and if I solve them I'll return with more info.
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
How can I determine which bacterial virulence factors (bacterial toxins or cell wall components) relevant to human sepsis or bacterial infection will interact with or regulate my target protein of interest? I have tested LPS treatment in a dose- and time-dependent manner, but I did not observe any difference in expression. Is there any panel of bacterial virulence factors available commercially, or can this be done bioinformatically?
Relevant answer
Answer
But it doesn't contain data on all pathogenic bacteria: I was searching for toxins of Gardnerella vaginalis and couldn't find them.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Relevant answer
Answer
Thank you for your answer. If you mean the university library, I have no access to it currently due to the pandemic, Hidetoshi Shimizu
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hi,
I am studying a simultaneous proton transfer, bond breakage and nucleophilic attack (by a water molecule) using an umbrella sampling (US) approach, for which I have already performed a 5 ns QM/MM simulation.
All three reactions take place in a single step (inversion mechanism of a glycoside hydrolase). Now I am confused about defining the restraint variables.
I have selected 4 Reaction Coordinates:
1. RC1: Proton transfer from Base residue to leaving group
OE1-HE1 -> C----O4 (this glycosidic bond breaks and HE1 is transferred to O4 )
So, the reaction coordinate for this reaction is difference in distance between OE1-HE1 and O4--HE1.
2. RC2: Glycosidic bond breakage:
C-----O4 -> C O4. The reaction coordinate for this reaction is the distance between C and O4
3. RC3: Nucleophilic attack by water:
H(i)O(w)H(w) [this is nucleophilic water] ---- C (anomeric carbon of the broken glycosidic bond). The reaction coordinate for this reaction is the distance between C and O(w).
4. RC4: Proton transfer from water (H(i)) to Acid Residue
H(i)O(w)H(w) -- OD1 (Acid residue). The reaction coordinate for this step is difference in distance between O(w)-H(i) and OD1-H(i).
For the RC2, I have made the following restraint file:
# distance restraint
&rst iat=8122,8132 r1=0, r2=1.8, r3=1.8, r4=5, rstwt=1,-1, rk2 = 500.0, rk3 = 500.0, /
I have increased the values of r2 and r3 in steps of 0.2, up to 3.4. I do not understand what the values of r1 and r4 should be. Could anyone comment on this and explain it briefly?
I am also not able to understand how to write the restraint file for a difference in distances between two sets of atoms, as in RC1 and RC4. It would be helpful if somebody could explain this too, with an example.
I also want to visualize all four reaction steps, so which trajectory files from the four RCs should I look at?
Since I am new to US, it would be a great help if somebody could guide me through this.
Regards
BHARAT
Relevant answer
Answer
Bharat Gupta, how were you able to write your umbrella sampling input DISANG file (as described in the Amber tutorial) using the LCOD method? Can you tell me, please?
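For the r1/r4 and difference-of-distances questions above, a sketch may help. As I understand the Amber NMR-restraint description, the energy is zero between r2 and r3, harmonic between r1 and r2 (rk2*(r - r2)^2) and between r3 and r4 (rk3*(r - r3)^2, with no factor of 1/2), and linear beyond r1 and r4. With r2 = r3 at each window centre, r1 and r4 are usually placed far outside the sampled range so the bias stays harmonic; the 0.0 and 99.0 defaults below are assumptions, and for a difference coordinate, which can be negative, r1 must be negative too. The difference coordinate uses four atoms with `rstwt=1.0,-1.0`, giving d(iat1, iat2) - d(iat3, iat4), so the `rstwt=1,-1` on the two-atom RC2 line is probably not doing anything useful. Atoms 8122/8132 come from the question; 100, 101 and 200 are hypothetical placeholders.

```python
def flat_bottom_energy(r, r1, r2, r3, r4, rk2, rk3):
    """Amber-style restraint energy for coordinate value r (flat bottom, harmonic, then linear)."""
    if r < r1:   # linear wall, slope matched to the parabola at r1
        return rk2 * (r1 - r2) ** 2 + 2 * rk2 * (r1 - r2) * (r - r1)
    if r < r2:   # harmonic region below the flat bottom
        return rk2 * (r - r2) ** 2
    if r <= r3:  # flat bottom (zero width when r2 == r3)
        return 0.0
    if r <= r4:  # harmonic region above the flat bottom
        return rk3 * (r - r3) ** 2
    return rk3 * (r4 - r3) ** 2 + 2 * rk3 * (r4 - r3) * (r - r4)


def rst_line(iat, center, k=500.0, r1=0.0, r4=99.0, rstwt=None):
    """One &rst namelist for an umbrella window centred at `center` (r2 = r3 = center)."""
    line = (f"&rst iat={','.join(str(a) for a in iat)}, r1={r1}, r2={center}, r3={center}, "
            f"r4={r4}, rk2={k}, rk3={k},")
    if rstwt is not None:  # generalized distance: rstwt[0]*d(iat1,iat2) + rstwt[1]*d(iat3,iat4)
        line += f" rstwt={','.join(str(w) for w in rstwt)},"
    return line + " /"


def umbrella_windows(iat, start, stop, step, **kwargs):
    """&rst lines for windows start, start + step, ..., stop."""
    n = int(round((stop - start) / step)) + 1
    return [rst_line(iat, round(start + i * step, 3), **kwargs) for i in range(n)]


# RC2: C...O4 distance, windows 1.8 to 3.4 in 0.2 steps
rc2 = umbrella_windows([8122, 8132], 1.8, 3.4, 0.2)
# RC4: d(Ow-Hi) - d(OD1-Hi); atom numbers are hypothetical placeholders
rc4 = umbrella_windows([100, 101, 200, 101], -1.0, 1.0, 0.2, r1=-99.0, rstwt=(1.0, -1.0))
print(rc2[1])  # &rst iat=8122,8132, r1=0.0, r2=2.0, r3=2.0, r4=99.0, rk2=500.0, rk3=500.0, /
```

Each line would go into the DISANG file of its own window. Because Amber's restraint energy has no 1/2 factor, rk = 500 corresponds to a force constant of 1000 in the k/2 convention used by many WHAM implementations, which matters when unbiasing.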
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I have RNA-Seq data for different cell lines and I'm looking for lncRNAs that may be differentially expressed.
Relevant answer
Answer
Is there any method to work with NONCODE in R?
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Cell free system
Relevant answer
Answer
I was looking for the definition of the cell-free system and thanks to you I found what I need.
Thank you.
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hi everyone! I'm trying to acquire Raman spectra of a leaf section using confocal Raman spectroscopy. The samples to be used are pure, dried, and powdered leaf samples. I am going to use a 785 nm laser source.
However, all I get is a spectrum with no peaks, or one strongly masked by fluorescence. Do you have any tricks or sample preparations to avoid the fluorescence, which I'm afraid covers the Raman signal, or to enhance the Raman signal, since the compounds might have a relatively weak Raman signal compared to the background and the fluorescence? Are there any sample preparations that can be done without water or an immersion objective, such as solid matrices that can be mixed with the sample? Thank you.
Relevant answer
Answer
Not all of the suggestions may be feasible for John. I guess a Raman microscope is what he has, and he needs to obtain Raman peaks out of the strong fluorescence background.
John, may I ask what model your equipment is? In general, you could try different configurations to enhance the Raman peaks and reduce the fluorescence background. First, the objective lens: generally, the higher the magnification the better. Second, use shorter exposure times with a larger number of acquisitions. Third, adjust the sampling focal point, since the best Raman response is not always obtained from the focused sample. Finally, sample larger particles, since larger particles have a smaller surface area and weaker fluorescence.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi,
We were carrying out in vacuo energy minimization studies of a protein dimer (which has been experimentally proven to be a dimer). Earlier, the same work was done in our lab using an older version of GROMACS (4.5.5) with the group cutoff scheme, coulombtype = cut-off, and no PBC.
When we restarted the work, we had to use GROMACS 5.0.4, in which the default cutoff scheme has changed to Verlet. We observe that with the Verlet cutoff scheme the monomers dissociate from each other. This does not happen in the same version when we use the group cutoff scheme.
I searched the literature and found that the differences probably lie in pair-list generation. In my graduate courses I read about energy drift in molecular dynamics simulations. I am aware, though not in detail, that the Verlet algorithm has something to do with it.
Can anyone elucidate this problem? The minimization runs fine and the protein remains dimerized when using the group cutoff scheme. This happens even after solvation. We used xyz PBC and grid neighbour searching. The Fourier spacing and rlist were left at their defaults, since we did not set them explicitly in the mdp file.
I want to know the theory in play behind this. Please help.
Relevant answer
Answer
I believe the answer to this question is covered here:
  • asked a question related to Bioinformatics and Computational Biology
Question
23 answers
I have performed RAPD for V. cholerae isolates with 1281 and 1283 random primers and found a distinct band pattern. I have attached a picture.
Relevant answer
Answer
You can use GelJ: it is Java-based, free, and user-friendly, and it can draw dendrograms using different analysis methods.
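GelJ builds its dendrograms from the similarity between band profiles. As an illustration of that first step, the sketch below scores a presence/absence band matrix with the Dice coefficient, 2a/(2a + b + c). This is a common choice for RAPD fingerprints, though not necessarily GelJ's default. The isolate names and band calls are hypothetical.

```python
# Sketch: Dice similarity between RAPD band profiles (1 = band present, 0 = absent).

def dice_similarity(a, b):
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same band positions")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0  # two empty lanes count as identical


def similarity_matrix(profiles):
    """Return isolate names and the square Dice similarity matrix."""
    names = list(profiles)
    matrix = [[dice_similarity(profiles[p], profiles[q]) for q in names] for p in names]
    return names, matrix


# Hypothetical band calls for three isolates (same primer, same band positions).
profiles = {
    "iso1": [1, 1, 0, 1],
    "iso2": [1, 1, 0, 0],
    "iso3": [0, 0, 1, 0],
}
names, sim = similarity_matrix(profiles)
```

Converting this to distances (1 - similarity) and clustering with UPGMA would give the dendrogram. One way to do that is scipy.cluster.hierarchy.linkage with method="average" on a condensed distance matrix.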
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I am looking for a command that will merge the 3 chains in the original PDB into a single chain and then renumber all of the residues. I have tried using the alter command, but when I export the PDB I get only one chain (of the initial trimer) and not the merged chain.
Relevant answer
Answer
You can first renumber the residues using the alter command (https://pymolwiki.org/index.php?title=Alter&redirect=no) so that each residue has a unique residue number,
e.g.
alter (chain B), resi=str(int(resi)+100)
alter (chain C), resi=str(int(resi)+200)
to give chains B and C offsets of 100 and 200, respectively (each offset must be larger than the residue numbers of the preceding chains).
Then use the alter command again to change the chain label, re-sort the atoms, and save,
e.g.
alter (all), chain='A'
sort
save merged.pdb
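If PyMOL is not at hand, the same merge can be done directly on the PDB text. This minimal sketch assumes a standard fixed-column PDB file. It sets the chain ID (column 22) of every ATOM/HETATM record to A and renumbers residues consecutively in file order. It also drops TER records so the chains are no longer split. The function name is hypothetical.

```python
# Sketch: merge all chains of a PDB file into one chain and renumber residues.
# Assumes fixed-column PDB records: chain ID in column 22, resSeq in 23-26,
# insertion code in 27 (0-based slices 21, 22:26 and 26).

def merge_chains(pdb_lines, new_chain="A"):
    merged = []
    new_resseq = 0
    previous = None
    for line in pdb_lines:
        if line.startswith("TER"):
            continue  # TER records would split the merged chain again
        if line.startswith(("ATOM", "HETATM")):
            # A residue is identified by chain + number + insertion code.
            key = (line[21], line[22:26], line[26])
            if key != previous:
                new_resseq += 1
                previous = key
            if new_resseq > 9999:
                raise ValueError("more than 9999 residues do not fit the PDB format")
            line = line[:21] + new_chain + f"{new_resseq:>4}" + " " + line[27:]
        merged.append(line)
    return merged
```

Usage would be along the lines of reading trimer.pdb with splitlines(), passing the lines to merge_chains, and writing the result back out joined with newlines (file names hypothetical).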
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I am looking for recent approaches to diagnosing chikungunya virus using computational biology techniques.
Relevant answer
Answer
Molecular docking
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hope everyone is having a good day.
I want to learn computational biology. I have a PhD in pharmacology. I have often heard about computational biology/bioinformatics, but never had any guidance on how to learn it or how to start in this interesting field of research.
It would be very helpful if you can guide me through this.
Have a nice day.
Relevant answer
Answer
Dear Apu, bioinformatics is a means, not an end in itself. Confusing a means with an end is the reason for the dramatic crisis that science (not technology) is experiencing these days (see for example https://www.pnas.org/content/113/34/9384.short).
Thus, first of all, you must acquire a 'quantitative sensibility' for biological problems. Faced with a biological problem, you should know how to restate it so that you can easily recognize the statistical units, the variables of interest and the most interesting scale at which to look. You should also judge whether you can provide a suitable metric that preserves the original biological meaning.
Then the informatics will come by itself. This means you must learn statistics, with special emphasis on multidimensional descriptive methods like PCA, cluster analysis and MDS. You also need complex network analysis, the fundamentals of non-linear dynamics (what an attractor is, what a transition is), and the fundamentals of probability.
Attached you will find a sketchy representation of the quantitative needs for facing biological problems.
  • asked a question related to Bioinformatics and Computational Biology
Question
28 answers
I'd also like to know the recent data sets used in research for the above domain.
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
(For survival tests and probit analysis)
Relevant answer
Answer
Dear Mr. Kapoor,
You will have to email Dr. Ehab Mostafa Bakr the registration number of your LdP Line software to get the key number. The registration number can be found in the registration window when you open LdP Line in the trial version. His email address is [email protected]. I got mine for free.
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
Please suggest some protein-ligand docking servers for doing docking online.
I also need some web servers that allow docking multiple ligands at a time.
Thanks to all the knowledgeable people.
Relevant answer
Answer
You might try Webina too: https://durrantlab.pitt.edu/webina/ . It runs AutoDock Vina in your browser, without having to install anything.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
I am curious if there exist any bioinformatics tools that can predict changes in secondary structure for proteins and/or nucleic acids. For instance, say a C-terminal loop on a protein reorganizes into a helix in response to binding to RNA. Or say an intrinsically unstructured segment of RNA forms a transient stem-loop in order to bind to a protein. Are there any computational tools that could predict such a change short of performing in-lab experiments?
Relevant answer
Answer
As indicated by Farhana Rumzum Bhuiyan, you most likely have to look at program packages like GROMACS (which requires quite some effort to understand).
It sounds to me like you need to enter the field of molecular docking:
A variety of programs (and approaches like molecular dynamics as used by GROMACS and Monte Carlo etc.) can be used:
Hope this helps you further.
Best regards.
PS. Suggestions like PSIPRED are excellent ways to get a fairly reliable prediction for the protein itself. However, they do not allow you to combine this with a prediction of the same protein in the presence of some kind of ligand.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hello, I'm a PhD student. In my master's thesis, I investigated the cytotoxic, apoptotic and cell cycle effects of an anticancer drug (Danusertib) on pancreatic cancer cells (CFPAC-1 and MIA PaCa-2). I used xCELLigence and flow cytometry in a cell culture lab.
However, I want to do my PhD thesis with virtual experiments, because of insufficient funding and because I like spending time with computers. I plan to use databases (OMIM, COSMIC, GAD, TCGA) and computing power (maybe on Amazon Web Services, Google Cloud or Azure). I don't know where to start researching these things. Can I do meaningful research with these databases? Can anyone give a tip or advice?
Relevant answer
Answer
Yes, you can use these datasets for research work equivalent to a PhD thesis. As a reference, you can check the publications by TCGA and by other groups that used TCGA data. A series of these publications have been published by Cell Press as the TCGA PanCancer Atlas.
You can see that purely in silico work has been published in Cell Press journals. But before you think of relying entirely on these datasets, think about what novel question you can address. If you have a highly relevant question, you can go for it. Otherwise, a simple and safe plan is to generate hypotheses from the datasets and then validate them with in vitro studies, or vice versa. This type of combined work is regularly published and will be more acceptable to most universities and individuals. All the best.
You can also check our papers, where we have used simple tools to analyze TCGA data.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Besides gene essentiality and non-homology with human proteins.
Relevant answer
Answer
Receptors are good targets, because the drug does not have to enter the cell. Proteins expressed in large quantity are poor targets, because you need a lot of drug. Proteins with fewer variants are better targets, because it may be harder for the pathogen to evolve resistance.
  • asked a question related to Bioinformatics and Computational Biology
Question
12 answers
INVdock software has been used to predict the primary receptors of low-molecular-weight drugs and to find or predict cellular targets. I would be thankful to anybody who could tell me how to get the software and the procedure for working with it.
How popular is INVdock?
Is this software free, and what is the procedure for working with it?
Relevant answer
Answer
After all this time, unfortunately, it is still inaccessible.
Kindly see the attached file and the link below:
Regards
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I'm trying to use GridMAT-MD to get the area per lipid and the bilayer thickness. I installed ActivePerl on my laptop (Windows), put the three necessary files in place, and ran this command:
> perl GridMAT-MD.pl param_example
It doesn't give me any valuable output.
I have attached a screenshot of the Perl window and its error.
Could you please help me get the desired results?
Relevant answer
Answer
In case you are still facing the problem, can you please post your param example file?
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Genetic association analysis usually involves lots of SNPs, but we generally select a few tag SNPs based on the LD value (r2). How can we calculate the r2 value to know which SNPs to choose?
Relevant answer
Answer
Plink v.1.9 - Pairwise LD measures for multiple SNPs (genome-wide)
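PLINK computes r2 for you, but the quantity itself is simple to state. One common way to obtain r2 from unphased data is the squared Pearson correlation of genotypes coded as 0/1/2 copies of one allele. The sketch below computes that quantity and then picks tag SNPs greedily. The greedy pass keeps a SNP only if no already-chosen tag reaches r2 >= 0.8 with it. This is a simplification, not the algorithm used by dedicated tagging tools. The 0.8 cut-off, SNP IDs and genotypes are hypothetical.

```python
# Sketch: genotype-based LD r^2 and a simple greedy tag-SNP selection.

def genotype_r2(g1, g2):
    """Squared Pearson correlation of two 0/1/2 genotype vectors (None = missing)."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    sxy = sum((a - mean1) * (b - mean2) for a, b in pairs)
    sxx = sum((a - mean1) ** 2 for a, _ in pairs)
    syy = sum((b - mean2) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP in this sample: r^2 undefined
    return sxy * sxy / (sxx * syy)


def greedy_tags(genotypes, threshold=0.8):
    """Keep a SNP as a tag unless an already chosen tag has r^2 >= threshold with it."""
    tags = []
    for snp, g in genotypes.items():
        if not any(genotype_r2(g, genotypes[t]) >= threshold for t in tags):
            tags.append(snp)
    return tags
```

Note that an allele coded in the opposite direction still gives r2 = 1, because the correlation is squared.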
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I have docked a ligand to a protein. I want to save the protein-ligand complex as a PDB file in AutoDock so that a protein viewer can display it. Alternatively, why does PyMOL not display the AutoDock result I saved through "write complex"? Thanks
Relevant answer
Answer
The reason might be the file type: when browsing for the docking complex, change the file format accordingly, and PyMOL will visualize your structure.
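If the viewer still does not show the complex, one option is to write a plain PDB complex yourself. This sketch assumes PDBQT atom records follow PDB columns 1-66 and are followed by the partial charge and the AutoDock atom type. Under that assumption, it keeps one docked pose (one MODEL of a Vina-style output) truncated to 66 columns and appends it after the receptor atoms. Function names are hypothetical.

```python
# Sketch: receptor PDB + one docked ligand pose from a PDBQT file -> complex PDB text.

def pdbqt_pose_to_pdb(pdbqt_lines, model=1):
    """Return PDB-style ATOM/HETATM lines for the pose with MODEL number `model`."""
    current = 1  # files without MODEL records are treated as a single pose
    out = []
    for line in pdbqt_lines:
        if line.startswith("MODEL"):
            current = int(line.split()[1])
            continue
        if line.startswith(("ATOM", "HETATM")) and current == model:
            # Keep columns 1-66 only (drops the PDBQT charge and atom-type columns);
            # viewers generally guess the element from the atom name.
            out.append(line[:66].rstrip())
    return out


def write_complex(receptor_pdb_lines, ligand_pdbqt_lines, model=1):
    """Receptor atoms, a TER record, the chosen ligand pose, then END."""
    receptor = [l.rstrip("\n") for l in receptor_pdb_lines
                if l.startswith(("ATOM", "HETATM"))]
    ligand = pdbqt_pose_to_pdb(ligand_pdbqt_lines, model)
    if not ligand:
        raise ValueError(f"no atoms found for pose {model}")
    return "\n".join(receptor + ["TER"] + ligand + ["END"]) + "\n"
```

Recent PyMOL versions can also open PDBQT files directly, so this is mainly useful for viewers that cannot.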
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
In the FASTA output of Prokka listing the gene names, some genes do not have any name ("gene: NA"). My question is whether these genes are hypothetical or simply do not have a name.
If the former is the case, how does Prokka determine them?
Relevant answer
Answer
Sebawe Syaj, if you mean the gene feature, you can use Prokka's --addgenes option.
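To see whether unnamed genes are hypothetical, one approach is to compare the gene name with the product annotation in Prokka's tab-separated feature table. This sketch assumes a header with ftype, gene and product columns, as in the .tsv written by recent Prokka versions; check your own header, since columns differ between versions. It counts CDS features with a name, without a name but with a product, and without a name and labelled "hypothetical protein".

```python
import csv
import io

def summarize_cds_names(tsv_text):
    """Count Prokka CDS features by whether they have a gene name and a product."""
    counts = {"named": 0, "unnamed_hypothetical": 0, "unnamed_with_product": 0}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row.get("ftype") != "CDS":
            continue  # skip tRNA, rRNA, gene and other feature types
        gene = (row.get("gene") or "").strip()
        product = (row.get("product") or "").strip().lower()
        if gene:
            counts["named"] += 1
        elif product == "hypothetical protein":
            counts["unnamed_hypothetical"] += 1
        else:
            counts["unnamed_with_product"] += 1
    return counts
```

A non-zero unnamed_with_product count would indicate that a missing gene name does not by itself mean the protein is hypothetical.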
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Hi everyone,
I need to run 50 ns MD simulations of the wild type and ten variants. I am looking for a low-cost cloud service/simulation environment. Could you please suggest any?
Thanks in advance.
Relevant answer
Answer
You can check Docker: https://www.docker.com/
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Can anybody help me submit 8 partial Leishmania donovani gene sequences to GenBank (NCBI)?
I have eight sequences with small differences.
The promoter region of the ldmdr1 gene of L. donovani was amplified, and the PCR products were sequenced using the Sanger dideoxy method. Now I have eight different sequences (450-490 nucleotides long), but I don't know the "features" of those sequences, as I am not very good at molecular biology. Could somebody kindly help me submit my sequences? I shall be highly obliged and will acknowledge that person in my PhD thesis :-)
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
I have a query file from the GeneMarkHMM tool, in nucleotide format. I want to run BLASTx, but it gives me a "No alias or index file found for protein database" error. My database is a protein database. I am stuck here; please help me, as this work is very important to me.
Relevant answer
Answer
I had the same problem... fixed it using the formatdb command ;) (formatdb is from legacy BLAST; in BLAST+ the equivalent is makeblastdb with -dbtype prot.)
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I'm a molecular biologist, and I have a few projects coming up in transcriptome and small RNA analysis. Can I get by without knowing any programming by using user-friendly software such as Geneious Prime, or another program you can suggest? Or is programming absolutely a must?
Relevant answer
Answer
Hi,
To be an efficient bioinformatician, you need to learn at least one programming language. You need not be a high-end developer, but you should at least know how to do your own bits. You can also be comfortable using various user-friendly GUIs, but they need more time and space, whereas with code you can customize the analysis to your needs.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
When I load the repeat simulation of my mutated protein (10 ns, 20 ns, 30 ns), its RMSD graph looks like the attached picture. When I watched my DCD, I saw that my protein goes out of the water box. I tried to put it back inside the box with the command "pbc wrap -centersel "protein" -center com -compound residue -all". The protein entered the box, but the RMSD values do not change. How can I solve this problem?
Relevant answer
Answer
I assume you use the GROMACS package for the RMSD calculation.
So first you should remove all PBC issues from your trajectory file. Try the following steps:
1. If your system contains protein+ligand, make an index file with a protein+ligand group.
2. gmx trjconv -f xxx.xtc -s xxx.tpr -n index.ndx -o xxx_nowater.xtc -pbc cluster
Select the group number of protein+ligand.
3. Now calculate the RMSD using xxx_nowater.xtc.
Hope it will work
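Why making the molecule whole matters more than putting it back in the box: PBC wrapping moves atoms by whole box vectors. If the RMSD is computed after a least-squares fit, rigidly translating an intact protein does not change the value at all. A protein split across the box boundary, however, gives a large spurious RMSD. That is why -pbc cluster (or -pbc mol) in gmx trjconv is the important step. The NumPy sketch below illustrates both cases with a Kabsch superposition on random, hypothetical coordinates.

```python
import numpy as np

def plain_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets without any fitting."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))


def fitted_rmsd(P, Q):
    """RMSD after optimal superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

With a rigid shift of the whole protein, fitted_rmsd stays at zero. If half the atoms jump by a box length, it becomes large, which mimics a molecule broken by PBC.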
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Hello,
I trimmed and assembled (reverse and forward) my sequences using CLC Main Workbench. The trimming I did aims to remove the poor-quality 3’ and 5’ ends.
I'm going to perform multiple alignment, haplotype diversity and phylogenetic tree analyses using other software (DnaSP, Arlequin and MEGA).
These programs need sequences of uniform size (the same length).
I have about 120 COI sequences of different sizes (640-750 bp). The expected size was 710 bp.
Is there any software or method to trim sequences to the same length, either one by one or all together? Which length should I choose for this trimming in my case?
Relevant answer
Answer
Could someone please give me a step-by-step procedure for trimming poor-quality regions from nucleotide sequences in either MEGA or BioEdit? Thanks.
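A common approach is to align first and then cut every sequence to the block of columns that all of them cover, so that the retained region is homologous across samples. The sketch below assumes the alignment is already loaded as a dict of equal-length gapped strings, with '-' or '?' marking missing ends; the names are hypothetical. It finds that shared block. Internal gaps are kept.

```python
# Sketch: cut aligned sequences to the block of columns that every sequence covers.

MISSING = set("-?")

def shared_block(aligned):
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("align the sequences first: lengths differ")
    starts, ends = [], []
    for name, seq in aligned.items():
        covered = [i for i, c in enumerate(seq) if c not in MISSING]
        if not covered:
            raise ValueError(f"{name} contains no bases")
        starts.append(covered[0])       # first column with a base
        ends.append(covered[-1] + 1)    # one past the last column with a base
    start, end = max(starts), min(ends)
    if start >= end:
        raise ValueError("the sequences do not share an overlapping region")
    return {name: seq[start:end] for name, seq in aligned.items()}
```

The final length is not chosen in advance: it is set by the sequences with the least coverage at each end. Excluding a few unusually short reads may therefore keep a longer common block.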
  • asked a question related to Bioinformatics and Computational Biology
Question
12 answers
Hi to all,
I'm using the HADDOCK web tool for the first time. I got the username and password for the Easy interface.
I'd like to know whether I'm on the right track.
Once I've uploaded the PDB files to be docked, I have to specify both the active and the passive residues.
In order to determine the active residues, I performed an NMR titration of the unlabelled protein with the labelled ligand and vice versa. Then I calculated the chemical shift perturbations.
Now I have to determine which of these residues are the active residues in the protein-ligand interaction.
So, should I submit the PDB to a SASA (solvent accessible surface area) calculation program? I would then choose the residues with chemical shift perturbations that the SASA program reports as solvent accessible.
Is this correct?
Do you advise any software/web tool? (I know NACCESS, but there is a very tedious procedure I have to follow to get the codes to decrypt the RAR files.)
Thank you.
What should I do about the passive residues? Is the HADDOCK option that determines them automatically reliable?
Bye
Relevant answer
Answer
HADDOCK is a very good protein-protein docking server, and the new upgraded version, HADDOCK 2.4, is much more advanced. Besides, the results from the server come refined and energy-minimized by default. To find the active and passive interacting residues you don't even need other software. A server named CPORT, from the same developers, can give you a list of active and passive residues for the PDB files you upload, and these residues can then be used for docking in HADDOCK 2.4.
Here are the links for the both these servers:
Link for HADDOCK 2.4 server:
Link for CPORT server:
Hope it helps.
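The filter proposed in the question (residues with significant chemical shift perturbation that are also solvent accessible) can be written down in a few lines. In this sketch, "significant" means CSP above the mean plus one standard deviation, and "accessible" means relative SASA of at least 40%. Both cut-offs are illustrative assumptions to adjust for your data, and the residue numbers and values are hypothetical. The relative SASA values would come from whichever SASA program you use.

```python
from statistics import mean, stdev

def select_active(csp, rel_sasa, n_sd=1.0, sasa_cutoff=40.0):
    """Residues with CSP > mean + n_sd * SD and relative SASA (%) >= sasa_cutoff."""
    values = list(csp.values())
    threshold = mean(values) + n_sd * stdev(values)
    active = sorted(
        res for res, shift in csp.items()
        if shift > threshold and rel_sasa.get(res, 0.0) >= sasa_cutoff
    )
    return active, threshold


# Hypothetical CSP values (ppm) and relative SASA (%) per residue number.
csp = {10: 0.05, 11: 0.06, 12: 0.30, 13: 0.04, 14: 0.28, 15: 0.05}
rel_sasa = {10: 60.0, 11: 20.0, 12: 55.0, 13: 70.0, 14: 10.0, 15: 45.0}
active, threshold = select_active(csp, rel_sasa)
```

Here residue 14 is perturbed but buried, so it is not proposed as active. Such residues may reflect indirect effects rather than direct contact.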
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
Can you suggest some free journals in the field of bioinformatics and computational biology that provide green open access? If anyone can recommend a free-to-publish journal with a relevant scope, it will be greatly appreciated!
Relevant answer
Answer
I have extracted and compiled necessary information about all academic journals in excel format here. It also mentions which journals are free (subscription available) or paid open access:
Please request and recommend it at the link above.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
Mainly in B cells
Relevant answer
Answer
A good protein design is required if you expect it to induce a strong immune response.
Here are some of the criteria you can follow:
1. Choosing the epitopes, i.e., the epitopes with good score and good binding affinities
2. The chosen epitopes should be promiscuous in nature, i.e., efficient in covering a wide range of immune molecules
3. The protein should have good antigenic score
4. The protein should be highly immunogenic and efficient to evoke the hypothesized immune response
5. The protein should contain sequences and conformation that can be easily identified by molecules such as TLRs
If the protein is not sufficiently immunogenic, its immunogenicity can be enhanced by connecting it to an adjuvant molecule via suitable linkers.
Hope it helps.
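The linker step mentioned above amounts to sequence assembly, sketched below. The linkers used here are choices frequently seen in published multi-epitope designs, but they are assumptions: EAAAK after the adjuvant, AAY between CTL epitopes, GPGPG between HTL epitopes, and KK between B-cell epitopes. The adjuvant and epitope sequences are placeholders, not a validated design.

```python
# Sketch: assemble a multi-epitope construct with linkers (illustrative choices only).

LINKERS = {"adjuvant": "EAAAK", "ctl": "AAY", "htl": "GPGPG", "bcell": "KK"}

def assemble_construct(adjuvant, ctl, htl, bcell):
    """adjuvant-EAAAK-CTL(AAY)-GPGPG-HTL(GPGPG)-KK-Bcell(KK); empty groups are skipped."""
    parts = [adjuvant]
    for group, epitopes in (("ctl", ctl), ("htl", htl), ("bcell", bcell)):
        if not epitopes:
            continue
        # The first block after the adjuvant gets EAAAK; later blocks are joined
        # to the previous one with the linker of the block being added.
        parts.append(LINKERS["adjuvant"] if len(parts) == 1 else LINKERS[group])
        parts.append(LINKERS[group].join(epitopes))
    return "".join(parts)
```

The assembled sequence would then go back through the antigenicity and immunogenicity checks listed above.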
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I downloaded a database from BindingDB. It contains a lot of duplicate structures. How can I remove these duplicate structures?
Relevant answer
Answer
I realize this is a very old post. Nevertheless, I think that many people might encounter the problem of removing duplicates from a multi-structure file. One convenient way to do this is to use OpenBabel. Suppose you have an SDF file ("oldfile.sdf") that contains multiple structures and you would like to generate a new SDF file ("new_noduplicates.sdf") with duplicates removed. In a terminal or command prompt, do the following:
> obabel -i sdf oldfile.sdf -o sdf -O new_noduplicates.sdf --unique
The new file is generated and the output in the terminal window shows a listing of all the duplicates that were identified in the process.
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
Suppose I have a DNA sequence and I want to find the transcription start site, CDS, poly(A) signal, etc. Which software would be useful for this?
Relevant answer
Answer
GlimmerHMM is a gene finder based on a Generalized Hidden Markov Model (GHMM). It helps you annotate a draft genome by running a few simple commands.
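A gene finder such as GlimmerHMM models splice sites, exons and codon statistics. As a much simpler illustration of what is being searched for, this sketch scans the forward strand for ATG...stop open reading frames. It also reports the canonical AATAAA/ATTAAA polyadenylation signal hexamers. It ignores introns, the reverse strand and promoter/TSS prediction, so it is a toy, not a substitute for real annotation tools; the 30-codon minimum is an arbitrary assumption.

```python
# Toy scan: forward-strand ORFs and canonical poly(A) signal hexamers.

STOPS = {"TAA", "TAG", "TGA"}
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for ATG...stop ORFs; end includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, frame))
                        i = j  # continue scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the end of the sequence
            i += 3
    return sorted(orfs)


def find_polya_signals(seq):
    """Return sorted (position, motif) hits for the poly(A) signal hexamers."""
    seq = seq.upper()
    hits = []
    for motif in POLYA_SIGNALS:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)
```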
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
In the context of mapping next generation sequence reads (of RNA->cDNA) to a reference genome to estimate allele specific (AS) expression:
Allelic imbalance (AI: more reads mapping to one allele than the other) can be due to a variety of technical and biological factors. It is therefore important to control for non-biological causes of AI if you want to estimate AS expression. Several strategies have been developed to address these problems, including read masking and genomic blacklists.
What is the difference between read masking and genomic blacklists?
Thanks!!!
Relevant answer
Answer
Do you know about the Blacklist of Biological Factors?
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hi everyone!
I've been studying the expression of 5 microRNAs using TaqMan assays in several cell lines (the microRNAs were selected through a literature review). I obtained statistically significant results, although they were not the expected ones.
Is it possible now to perform gene ontology studies and construct networks with Cytoscape to better understand the role of these microRNAs in my samples, without a prior global microRNA expression analysis?
Relevant answer
Answer
Hi everyone!
I've been using the Cytoscape app BiNGO, but my output is always empty when I select my network for analysis. Can anyone help me resolve this?
Attached is the output that I obtain.
Thank you. :)
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
A few of us wanted to create a Discord server for biophysics. We intend it to be a common place for discussions/numerical experiments, and possibly to document the results in the form of blogs or other media.
I believe there are many biophysics/computational biophysics/molecular dynamics enthusiasts here. Here is the server link: https://discord.gg/qRQRq2k
Come and join us. Let us learn together.
Relevant answer
Answer
Dear Devanand,
Can you explain more about the purpose of this post and what Discord is?
Bog
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
I was trying to design a nanocluster of 10 nm diameter using Materials Studio (MS). Because Materials Studio has no "nm" size option, I used the 50 Å (angstrom) radius option available in MS to construct the nanocluster. I was confused when I found 60000 atoms in the generated nanocluster of 10 nm (100 Å) diameter. Is my conversion (nm to Å) correct, or does the Å in MS differ from my conversion? I doubt that a 10 nm nanoparticle would have 60000 atoms in its cluster form. Please help me with this issue.
Thanking you
Hari Prasath
Relevant answer
Answer
What is the material that you are using for cluster?
Do you apply PBC or without?
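The conversion itself is right: 1 nm = 10 Å, so a 10 nm diameter is a 50 Å radius. Whether about 60000 atoms is plausible then depends only on the material's atomic number density, which is why the material matters. The rough estimate below multiplies the sphere volume by the number density of a cubic crystal. Fcc gold (a = 4.078 Å, 4 atoms per conventional cell) is used purely as an example assumption; surface truncation effects are ignored.

```python
import math

def atoms_in_sphere(diameter_nm, lattice_constant_A, atoms_per_cell):
    """Rough atom count: sphere volume times the crystal's atomic number density."""
    radius_A = diameter_nm * 10.0 / 2.0  # 1 nm = 10 Å, so 10 nm -> 50 Å radius
    volume_A3 = 4.0 / 3.0 * math.pi * radius_A ** 3
    number_density = atoms_per_cell / lattice_constant_A ** 3  # atoms per Å^3
    return volume_A3 * number_density


# Example assumption: fcc gold.
gold_10nm = atoms_in_sphere(10.0, 4.078, 4)
```

For gold this gives roughly 31,000 atoms. A denser lattice (smaller lattice constant or more atoms per cell) gives proportionally more. Because the count scales with the cube of the diameter, doubling the diameter multiplies it by eight.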
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I actually have two queries. Biochemical studies suggested the presence of an enzyme in an organism, but the gene encoding the enzyme is not known. I would like to find candidate genes by homology search/sequence similarity, using sequences of similar enzymes from other organisms. My questions are:
1. What points should I consider when selecting an already known gene/protein for use in a homology search?
2. To find the orthologue of the known gene/protein by bioinformatics, which database/software should I use?
Please suggest papers/websites/softwares for beginners.
Thanks!
Relevant answer
Answer
1. It is recommended that you use genes/proteins whose function has been verified, taken from a database of closely related species for which information is available.
2. You can use the NCBI database to obtain the complete sequences of the genes, or the InterPro database to obtain the domains (recommended). Then align your sequences against one of the two databases using BLAST.
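Once BLAST results are in hand, a widely used (though not the only) heuristic for orthologue candidates is the reciprocal best hit. Gene A in one species and gene B in the other are paired when each is the other's top-scoring hit. The sketch assumes two BLAST runs, known proteins vs. candidate proteome and the reverse, saved in tabular format (-outfmt 6, where column 12 is the bit score). The IDs are hypothetical.

```python
# Sketch: reciprocal best hits from two BLAST tabular (-outfmt 6) result sets.

def best_hits(blast_lines):
    """Map each query to its highest-bit-score subject."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```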
  • asked a question related to Bioinformatics and Computational Biology
Question
17 answers
We have a dataset containing a large number of proteins from PDB, SwissProt, TargetDB and unknown sources. We have no idea about their nucleotide sequences, but we are interested in understanding codon preference in these proteins. I would highly appreciate it if you could advise me on how to extract the original nucleotide sequences of these proteins.
Relevant answer
Answer
Hi,
Just in case you are still interested, this may be the answer:
1. Use your amino acid sequences as BLASTP queries in NCBI, selecting the appropriate organism or database;
2. Keep a list of the protein IDs of the 100% match hits for each of your query sequences, and save it in a text file;
3. Upload the text list file to NCBI Batch Entrez to retrieve the records for your protein sequences;
4. On the results page, click the "Send to" option; there you can download the "FASTA CDS" for all your protein queries.
I found this option by accident, Hope it helps :)
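Step 2 can be scripted if the BLASTP results are saved in tabular format (-outfmt 6, where column 3 is the percent identity). This sketch keeps the first 100%-identity subject per query, relying on BLAST listing the best hits first. It then writes one accession per line for Batch Entrez. Note that 100% identity can hold over a partial alignment. Checking full-length coverage would need extra columns such as qlen and length in a custom -outfmt, which this sketch does not do. Function names are hypothetical.

```python
def perfect_hit_ids(blast_tabular_lines):
    """First 100%-identity subject per query from BLAST -outfmt 6 lines."""
    chosen = {}
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        if pident == 100.0 and query not in chosen:
            chosen[query] = subject  # hits are listed best-first for each query
    return chosen


def batch_entrez_list(chosen):
    """One accession per line, ready to save as the text file for Batch Entrez."""
    return "\n".join(sorted(set(chosen.values()))) + "\n"
```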
  • asked a question related to Bioinformatics and Computational Biology
Question
29 answers
I have 14 miRNAs that are related to a particular disease. I want to draw a network like a gene network (GeneMANIA). I can draw a network easily by entering gene names in GeneMANIA, but which software can take miRNA names as input like this? Which software is better? I was trying to use Cytoscape, but it requires pre-built network data (if I am not wrong), and I am not sure whether I can get such data for miRNAs. Some of the miRNAs are quite new, and some older versions of the software can't recognize them.
Please help me with how I can get a network like GeneMANIA's. I can only input the different miRNA names and the particular disease. Thanks.
Relevant answer
Answer
We recently developed miRViz to interpret microRNA datasets using microRNA networks:
To build miR-mRNA network, a Canadian group has developed miRNet: https://www.mirnet.ca/miRNet/home.xhtml
Both may be useful for you. No need for computational skills. I would suggest first miRViz, and then miRNet.
If you want more information, we also published in 2015 a paper building microRNA networks (we use these networks in miRViz):
'hope this will help you.
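To use Cytoscape itself, you need an edge list. Cytoscape can import a plain SIF file with a source, an interaction type and a target on each line. So once miRNA-target pairs have been exported from a resource such as miRNet, the network file takes only a few lines of Python. The pairs below are hypothetical placeholders, not real interactions.

```python
def to_sif(pairs, interaction="targets"):
    """Write miRNA-target pairs as tab-separated SIF lines, skipping duplicates."""
    seen = set()
    lines = []
    for mirna, target in pairs:
        if (mirna, target) in seen:
            continue
        seen.add((mirna, target))
        lines.append(f"{mirna}\t{interaction}\t{target}")
    return "\n".join(lines) + "\n"


# Hypothetical pairs, e.g. exported from a target database.
pairs = [("miR_1", "GENE_A"), ("miR_1", "GENE_B"), ("miR_2", "GENE_A"), ("miR_1", "GENE_A")]
sif_text = to_sif(pairs)
```

Saved with a .sif extension, the text can be imported as a network, and genes shared by several miRNAs will appear as hubs.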
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I am interested in analysing a set of differentially expressed genes. We used to have access to Ingenuity, where I discovered the "Upstream Regulator" tool, which identifies upstream regulatory proteins enriched for your dataset. 
Since we no longer have access to Ingenuity, I have been trying to find an alternative, preferably free. 
I am not looking for pathway analysis tools, but specifically for this upstream regulator tool.  
Thank you, 
Relevant answer
Answer
We settled on Enrichr (transcription factor binding), though it is not ideal, since no experiment-specific background can be uploaded.
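If the missing custom background is the main limitation, the core overlap test can be run locally with your own background. The sketch below uses a one-sided hypergeometric (Fisher) test for the overlap between the DE genes and each regulator's target set, with all sets restricted to a background you supply. It is a simplification: it does not reproduce IPA's Upstream Regulator method, which also scores activation direction. The gene and regulator names, and wherever the target sets come from, are hypothetical.

```python
from math import comb

def overlap_pvalue(de_genes, targets, background):
    """One-sided hypergeometric P(overlap >= observed) within a custom background."""
    bg = set(background)
    de = set(de_genes) & bg
    tg = set(targets) & bg
    N, K, n = len(bg), len(tg), len(de)
    k = len(de & tg)
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


def rank_regulators(de_genes, regulator_targets, background):
    """Sort regulators by overlap p-value (smallest first)."""
    results = []
    for regulator, targets in regulator_targets.items():
        k, p = overlap_pvalue(de_genes, targets, background)
        results.append((p, regulator, k))
    return sorted(results)
```

With many regulators, the p-values would still need multiple-testing correction, for example Benjamini-Hochberg.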
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I am working on FIV. I have sequenced each gene individually and retrieved multiple complete genomes and individual genes from GenBank. I am trying to align all the different genes to the complete genome. I need a program that can align each individual sequence to one reference sequence. I have tried MEGA, but from what I understand its pairwise alignment aligns 2 consecutive sequences in the list, then the next pair and so forth. This won't work, as I don't want to align different genes to each other. Multiple sequence alignment, in turn, aligns each sequence to all the other sequences, which won't work for the same reason. Is there a setting in MEGA that lets me set a reference sequence to which all other sequences are compared, or another program that will allow me to do this?
The only other option I can think of is to compare each sequence individually to the reference sequence and that will be a nightmare as I have 4289 sequences. Please if you have any suggestions let me know. Thank you.
Relevant answer
Answer
Hi,
You can use MUSCLE. It allows you to align two alignments with each other (known as profile-profile alignment).
For a single sequence as one of the profiles:
muscle -profile -in1 first.fasta -in2 second.fasta -out combined_result.fasta
This aligns the alignments in the two files first.fasta and second.fasta to each other and writes them to combined_result.fasta.
You can also do the same profile alignment with MAFFT.
With ClustalW2:
clustalw2 -profile1=data.fa -sequences -profile2=first.fa
assuming one sequence is in first.fa and the prealigned others are in data.fa.
You can also look at AVID:
Bray, N., Dubchak, I. and Pachter, L. (2003) AVID: A Global Alignment Program. Genome Res., 13:97
  • asked a question related to Bioinformatics and Computational Biology
Question
34 answers
Which book should I read to understand bioinformatics from the very beginning?
Relevant answer
Answer
I have attached two books. I hope they would be useful.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi everyone, I'm currently doing the course "Whole-genome sequencing and its applications" from the Technical University of Denmark, and I'm working on the final project.
I have five unknown samples. They are genome assemblies in FASTA format (already assembled), not FASTQ or raw reads. Three of these five genomes are outbreak strains, and I have to find out which three they are.
Because the samples are unknown, I first have to find out what they are, and then how to treat them. For this we were given some hints. I can use KmerFinder to identify the species, and once I know the species, I can find the sequence type using the MLST tool. Then I can check whether my samples contain any plasmids using PlasmidFinder. If a sample contains a plasmid, I can do pMLST (plasmid typing) to find out what kind of plasmid it is.
Relevant answer
Answer
Your question is not quite clear enough for me to understand. It sounds like you are asking about bacteria infecting humans, but it could possibly be an agricultural (pigs, cattle, horses, chickens, fish, etc.) or other type of outbreak. Even within human bacterial infections or "outbreaks", there is a large difference between different types of bacteria. The Escherichia coli that carry a Shiga toxin gene on a plasmid (STEC), for example, are quite different from Klebsiella pneumoniae or other "outbreak" bacteria.
Some plasmids (or phages carried on plasmids) are rather promiscuous and can travel between rather diverse host bacterial "species", while others are usually associated with a single bacterial lineage. Sometimes it is a drug resistance issue, making the infection more difficult to treat, and sometimes it is a bacterial toxin causing a more acute issue. Some bacteria are spread by wastewater or other unsanitary conditions, and some are food-borne or require close human-to-human contact.
Anyway, the type of "detective work" needed in the databases to sort this out can be quite different for the different bacteria.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I am trying to run SANDER in parallel in cluster. In Gromacs, I used to run
dplace -c 0-59 mdrun -v -s em.tpr -c em.pdb -nt 60 (here I was assigning CPUs 0-59, and -nt set the number of threads).
For Sander in Amber, I tried
mpirun -np 10 sander -O -i min.in -o min.out -p test-solv.prmtop -c test-solv.inpcrd -r min.rst -ref test.inpcrd
I have also tried
mpirun -np 10 pmemd -O -i heat.in -o heat.out -p test-solv.prmtop -c min.rst -r heat.rst -x heat.mdcrd -ref min.rst -inf heat.mdinfo &
However, it runs on only one CPU. I have also tried using -nt 10, -t 10, etc., but these give errors.
In this regard, can somebody help me find the correct command to assign 10 CPUs to an MD calculation with sander? I have consulted the mpirun instructions in the Amber FAQ and group discussions, but they do not work on my cluster.
Relevant answer
Answer
Here are my notes on running Amber in parallel. They are in Turkish, but all you need are the commands, so I think you can take advantage of them: https://www.evernote.com/shard/s685/sh/cf35c8ff-cba9-0f18-d20a-e664cd0d14d9/16b27a12b90c39faf16ea63e11c79d50