Science topic

Data Analysis - Science topic

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Questions related to Data Analysis
  • asked a question related to Data Analysis
Question
6 answers
I'm currently analysing the results of my survey but I'm encountering the problem that my quantitative data is too similar and I'm not sure how to interpret it.
Does anyone have any advice for me or can recommend any reading about this issue?
Relevant answer
Answer
I would examine correlation matrices for your various scales. If the correlations are all uniformly high (say .8 and above), then that would indeed be a problem.
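A minimal R sketch of that check, assuming a hypothetical data frame survey_df with one numeric column per scale (the name and columns are illustrative, not from the question):
# Inspect pairwise correlations among scale scores
cors <- cor(survey_df, use = "pairwise.complete.obs")
round(cors, 2)   # values uniformly >= .8 suggest the scales overlap heavily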
  • asked a question related to Data Analysis
Question
4 answers
Dear colleagues
Could you tell me please, how is it possible to construct a boxplot from a dataframe in RStudio?
df9 <- data.frame(Kmeans= c(1,0.45,0.52,0.54,0.34,0.39,0.57,0.72,0.48,0.29,0.78,0.48,0.59),hdbscan= c(0.64,1,0.32,0.28,0.33,0.56,0.71,0.56,0.33,0.19,0.53,0.45,0.39),sectralpam=c(0.64,0.31,1,0.48,0.24,0.32,0.52,0.66,0.32,0.44,0.28,0.25,0.47),fanny=c(0.64,0.31,0.38,1,0.44,0.33,0.48,0.73,0.55,0.51,0.32,0.39,0.57),FKM=c(0.64,0.31,0.38,0.75,1,0.26,0.55,0.44,0.71,0.38,0.39,0.52,0.53), FKMnoise=c(0.64,0.31,0.38,0.75,0.28,1,0.42,0.45,0.62,0.31,0.25,0.66,0.67), Mclust=c(0.64,0.31,0.38,0.75,0.28,0.46,1,0.36,0.31,0.42,0.47,0.66,0.53), PAM=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,1,0.73,0.43,0.39,0.26,0.41) ,
AGNES=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,1,0.31,0.48,0.79,0.31), Diana=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,1,0.67,0.51,0.43),
zones2=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,1,0.69,0.35),
zones3=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,0.59,1,0.41),
gsa=c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,0.59,0.36,1), method=c("kmeans", "hdbscan", "spectralpam", "fanny", "FKM","FKMnoise", "Mclust", "PAM", "AGNES", "DIANA","zones2","zones3","gsa"))
head(df9)
library(dplyr)
df9 <- df9 %>% mutate(across(-method, ~ as.numeric(as.character(.))))  # keep 'method' as character labels; across(everything()) would turn it into NAs
Thank you very much
Relevant answer
Answer
Dear Valeriia Bondarenko
First you need to install the "ggplot2" and "reshape2" packages and load both libraries.
library(ggplot2)
library(reshape2)
# Then melt the data frame so all method columns become a single value column
df9_melted <- melt(df9, id.vars = "method")
# For the boxplot
ggplot(df9_melted, aes(x = method, y = value)) +
  geom_boxplot() +
  labs(x = "Method", y = "Value", title = "Boxplot of methods")
  • asked a question related to Data Analysis
Question
3 answers
Dear Experts,
I have a question. I am analyzing data for a meta-analysis: is it possible for the SMD to be greater than 1 for any study?
I have a study in my data that indicates an SMD of 2.03.
Relevant answer
Answer
I don't know your specific situation, so I hesitate to comment in detail, but you need to reason from the basic concept of the SMD.
  • asked a question related to Data Analysis
Question
3 answers
Hi!
I want to use the ADL model for my data analysis. However, after performing a stationarity test, the dependent variable and 6 of the 8 independent variables are stationary only in differences. The other two are stationary in levels.
Is the cointegration test always necessary?
If so, I found on the Internet that I can only use the Pesaran Bounds test because I have a mix of I(0) and I(1) variables. Is it true? I am not sure.
And how do you perform that test?
Thanks a lot for your suggestions.
Relevant answer
Answer
After performing a stationarity test for your data analysis using the ADL model, you have found that the dependent variable and 6 out of 8 independent variables are stationary only in their differences, while the other two are stationary in levels. In this scenario, you can proceed with modeling your data using an Autoregressive Distributed Lag (ADL) model.
The ADL model is suitable for situations where variables exhibit different stationarity properties, such as some being stationary in levels and others in differences. It allows for the inclusion of lagged values of both the dependent and independent variables, accommodating the mixed stationarity properties of your variables. By incorporating lagged values of the variables that are stationary in differences, you can capture the short-term dynamics and relationships in your data. At the same time, including the variables that are stationary in levels enables you to account for long-term equilibrium relationships.
This approach aligns with the flexibility of the ADL model, which can handle variables with diverse stationarity characteristics, making it a suitable choice for your data analysis scenario. By appropriately specifying the model with lagged terms of the variables based on their stationarity properties, you can effectively capture the dynamics and relationships within your dataset.
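On the practical question of how to perform the bounds test: a minimal R sketch using the CRAN ARDL package follows; the data frame macro_df and the variable names are assumptions for illustration, not from the question.
library(ARDL)  # CRAN package implementing ARDL estimation and bounds testing
# Hypothetical data frame 'macro_df' with dependent variable y and regressors x1..x3
fit <- auto_ardl(y ~ x1 + x2 + x3, data = macro_df, max_order = 4)  # lag selection by information criterion
model <- fit$best_model
# Pesaran, Shin & Smith (2001) bounds F-test; case 3 = unrestricted intercept, no trend
bounds_f_test(model, case = 3)
If the F-statistic exceeds the upper I(1) bound, you can conclude a levels (cointegrating) relationship despite the mix of I(0) and I(1) regressors.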
  • asked a question related to Data Analysis
Question
3 answers
AI in research offers tremendous potential, but ethical considerations are crucial. Biases in data or algorithms can lead to discriminatory or unfair results. The "black box" nature of some AI models makes it difficult to understand their reasoning, raising concerns about accountability. Ensuring data privacy, transparency in research methods, and maintaining human oversight are all essential for ethical AI-powered research.
Relevant answer
Answer
Using AI technologies for data analysis and study interpretation raises a number of ethical concerns. There is the issue of bias: since AI algorithms can inherit biases from the data on which they are trained, they can produce unjust conclusions.
Transparency is essential for data analysis; understanding how AI makes its decisions enables accountability.
Privacy risks arise from the massive volumes of data that AI demands, requiring strict data protection procedures.
The social implications of artificial intelligence adoption raise further concerns, such as its effects on employment and inequality, which must be examined.
Responsible AI usage means anticipating unintended outcomes and prioritizing ethical principles, so that research helps society without harming individuals or aggravating existing imbalances.
Balancing innovation with ethical integrity is critical to the ethical use of AI technologies in research and data analysis.
There are other pros and cons as well, but when using AI in research we should always attend to the ethical concerns. It is better to produce less and work honestly than to produce false results or engage in malpractice and falsification of data.
Thanks.
  • asked a question related to Data Analysis
Question
3 answers
I am studying leadership style's impact on job satisfaction. In the data collection instrument, there are 13 questions on leadership style, divided among a couple of leadership styles. On the other hand, there are only four questions for job satisfaction. How do I run correlational tests on these variables? What values do I select to analyze in Excel?
Relevant answer
Answer
First, compute the correlation between your target variable and each of your potential independent variables; as mentioned earlier, a correlation coefficient closest to -1 or +1 indicates the strongest association. Once you have decided, based on these correlation coefficients, which variables to include in your model, you need to ensure that there is no multicollinearity. To check this, run correlation tests between each pair of independent variables. If two independent variables are too highly correlated, you should introduce only one into your model (e.g., the variable with the higher correlation with your dependent variable).
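In Excel this is the CORREL function (or the Analysis ToolPak's Correlation tool) applied to scale scores. As an alternative, a minimal R sketch of the same procedure, where the item column names (ls1..ls6, js1..js4) and the two style scales are hypothetical:
# Average the items belonging to each scale into one score per respondent
dat$style_a <- rowMeans(dat[, c("ls1", "ls2", "ls3")])
dat$style_b <- rowMeans(dat[, c("ls4", "ls5", "ls6")])
dat$job_satisfaction <- rowMeans(dat[, c("js1", "js2", "js3", "js4")])
# Step 1: correlation of each leadership style with job satisfaction
cor(dat[, c("style_a", "style_b")], dat$job_satisfaction)
# Step 2: inter-correlation between the predictors (multicollinearity check)
cor(dat$style_a, dat$style_b)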
  • asked a question related to Data Analysis
Question
1 answer
Based on your expertise, which software packages are best for data analysis and graphing in quantitative studies of microbial biofilms? Pros and cons?
Relevant answer
Answer
Microbial biofilms are structured communities of microorganisms that attach to surfaces and produce a protective matrix. These communities are complex and can include bacteria, fungi, and protists. Biofilms are significant in both environmental and clinical settings because they can protect microbes from antibiotics and disinfectants, making infections difficult to treat.
Here are several highly regarded options for the quantitative study of microbial biofilms:
1. ImageJ/FIJI
Pros:
Adaptability: Excellent for image processing, crucial for analyzing biofilm structures.
Community Support: Being open-source, it benefits from a wide range of plugins developed by the community, enhancing its capabilities.
Cons:
User Experience: New users might find it challenging due to its extensive features and capabilities.
2. COMSTAT
Pros:
Specialization: Designed specifically for biofilm analysis, capable of processing image stacks to quantify biofilm thickness and coverage.
Cons:
Narrow Focus: Primarily focused on image analysis, which may require supplementary tools for broader data analyses.
3. R with Bioconductor
Pros:
Extensive Analysis Features: Offers robust statistical tools and is capable of handling diverse datasets, including genomic and transcriptional data.
Flexibility: Extensive package options and strong community support for troubleshooting and development.
Cons:
Complexity: The learning curve can be steep for those new to programming or statistical analysis.
4. MATLAB
Pros:
Versatility: Well-suited for numerical computing and managing large datasets, with strong capabilities in both data analysis and visualization.
Specialized Toolboxes: Offers specific toolboxes for image processing and statistical analysis, enhancing its utility.
Cons:
Cost: It is a proprietary software, which might be a barrier for some researchers due to its cost.
The choice of software for studying microbial biofilms depends on the specific needs of your research, such as the type of data you are analyzing and your level of expertise in data analysis. ImageJ/FIJI is optimal for detailed image analysis, while COMSTAT offers specialized biofilm quantification tools. For more comprehensive data analysis, R with Bioconductor is excellent, though it requires familiarity with statistical concepts. MATLAB provides a broad array of tools but at a higher financial cost. In many cases, researchers might find it beneficial to use a combination of these tools to fully address their analytical needs.
  • asked a question related to Data Analysis
Question
3 answers
In the process of drafting a new research article, which structure is most effective? Should one follow the order of Introduction, Materials and Methods, Data Analysis, Results, and Discussion, or is it better to write the Materials and Methods, Results, and Discussion first and leave the Introduction for the end? What is the approach commonly adopted by other scholars/researchers around the world?
Relevant answer
Answer
In terms of the writing process, I prefer to begin with the Results section because it is usually the most concrete, and the Methods section follows more or less automatically from that. Once I have those completed, I have a better idea of what the Introduction and Background sections should look like, which then leads me to the most appropriate Discussion and Conclusions.
  • asked a question related to Data Analysis
Question
3 answers
Hey there!
I want to learn about correlation. I recently worked on a project related to rice genotypic trials, using a one-factorial RCBD design. While I know how to statistically analyze phenotypic and genotypic correlation, I specifically want to learn how to analyze environmental correlation using R. Could anyone help me out?
Thank you in advance. :)
Relevant answer
Answer
See the videos on YouTube on the subject of genotypic and phenotypic correlation in R.
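In case it helps with the environmental part specifically: environmental covariance is commonly estimated as the phenotypic covariance minus the genotypic covariance (with the corresponding variances obtained from the mean squares and mean cross-products of the RCBD ANOVA). A minimal R sketch under that assumption, with hypothetical variance-component values:
# Environmental correlation between two traits x and y, given phenotypic (p)
# and genotypic (g) variance/covariance estimates
env_correlation <- function(cov_p, cov_g, var_px, var_gx, var_py, var_gy) {
  cov_e  <- cov_p  - cov_g    # environmental covariance
  var_ex <- var_px - var_gx   # environmental variance, trait x
  var_ey <- var_py - var_gy   # environmental variance, trait y
  cov_e / sqrt(var_ex * var_ey)
}
# Hypothetical values for illustration:
env_correlation(cov_p = 1.8, cov_g = 1.1,
                var_px = 4.0, var_gx = 2.5,
                var_py = 3.2, var_gy = 1.9)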
  • asked a question related to Data Analysis
Question
3 answers
Imagine you join a new research lab and are immediately assigned a dataset that contains the same variables as those in the substance abuse dataset but comprising a new sample. You are told the lab originally collected this data set with one particular research question in mind: “Does satisfaction with life predict health outcomes?” Before doing any statistical tests, you decide to browse through the dataset and make some graphs of the results. It seems to you that in your data, satisfaction with life predicts health outcomes more strongly in males than in females, and on reflection, you can think of several theoretical reasons why that should be the case. You disregard the data on females and investigate the hypothesis that low levels of satisfaction with life (using the “swl” variable) will be positively predictive of mental ill health (using the “psych6” variable) in males. You finally do a statistical test, and obtain a very low p-value (less than .001) associated with the regression coefficient. You write a paper using this single result, concluding that there is strong evidence for your hypothesis.
Question: What is a term for the practice that you are engaging in?
Is this practice p-hacking or the garden of forking paths?
Relevant answer
Answer
There is ongoing debate in the scientific community about the validity of p-values in scientific research. Many scientists and statisticians are calling for abandoning statistical significance tests and p-values. The only way to avoid p-hacking is to not use p-values. You should make scientific inference based on some descriptive statistics and your domain knowledge (not p-values).
  • asked a question related to Data Analysis
Question
3 answers
I have a case-control study that I would like to publish, but there has already been a meta-analysis of observational studies on the same topic, albeit in a different population (Iranians) and during a different time period (2000 to 2016). Mine is in the USA and will analyze data from 2016 onwards. Will my study be novel enough for a shot at a good journal?
Relevant answer
Answer
Yes. But it's important to highlight the differences in population, time period, and potentially methodology in your submission to emphasize the novelty and relevance of your findings.
  • asked a question related to Data Analysis
Question
4 answers
The 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering (MLISE 2024) will be held on June 28-30, 2024 in Zhuhai, China.
MLISE is conducting an exciting series of symposium programs that connect researchers, scholars, and students with industry leaders and highly relevant information. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world. MLISE proposes new ideas, strategies, and structures, innovating the public sector, promoting technical innovation, and fostering creativity in the development of services.
---Call For Papers---
The topics of interest for submission include, but are not limited to:
1. Machine Learning
- Deep and Reinforcement learning
- Pattern recognition and classification for networks
- Machine learning for network slicing optimization
- Machine learning for 5G system
- Machine learning for user behavior prediction
......
2. Intelligent Systems Engineering
- Intelligent control theory
- Intelligent control system
- Intelligent information systems
- Intelligent data mining
- AI and evolutionary algorithms
......
All papers, both invited and contributed, will be reviewed by two or three experts from the committees. After a careful reviewing process, all accepted papers of MLISE 2024 will be published in the MLISE 2024 Conference Proceedings by IEEE (ISBN: 979-8-3503-7507-7), which will be submitted to IEEE Xplore, EI Compendex, Scopus for indexing.
Important Dates:
Submission Deadline: April 26, 2024
Registration Deadline: May 26, 2024
Conference Dates: June 28-30, 2024
For More Details please visit:
Invitation code: AISCONF
*Using the invitation code on the submission/registration system gets you priority review and feedback.
Relevant answer
Answer
Yes, the conference is in hybrid format; both online and offline participation are accepted.
Submitting your paper to the system is free. Once your paper is accepted, you will need to pay the registration fee, which can be found on the website: http://mlise.org/registration
  • asked a question related to Data Analysis
Question
6 answers
Apart from the CASP tool, which I can only use to appraise articles with qualitative studies, I am looking for a tool to appraise articles that equally use quantitative methodology for their research.
Relevant answer
Answer
You could look at the Mixed Methods Appraisal Tool.
Hong, Q. et al. (2019). Improving the content validity of the mixed methods appraisal tool: a modified e-Delphi study. Journal of clinical epidemiology, 111, 49-59.
  • asked a question related to Data Analysis
Question
3 answers
Would it be considered academic dishonesty if a PhD student hires a data analyst to conduct the data analysis for his/her thesis?
Relevant answer
Answer
Absolutely not. However, the responsibility for answering all questions from the review professors rests with the student. It is ultimately your work.
  • asked a question related to Data Analysis
Question
3 answers
Hello guys
I want to employ fMRI for conducting research.
As a first step, I want to know whether fMRI data is an image, like MRI,
or whether I should treat fMRI data as a time series when analyzing it.
thank you
Relevant answer
Answer
MRI datasets typically result in high-resolution three-dimensional images representing anatomical structures. These images are often stored in formats such as DICOM (Digital Imaging and Communications in Medicine) or NIfTI (Neuroimaging Informatics Technology Initiative). fMRI datasets produce time-series data representing changes in brain activity over time. These data are often stored in formats compatible with neuroimaging software packages, such as NIfTI, Analyze, or MINC (Medical Imaging NetCDF). fMRI data can be conceptualized and analyzed both as images and time-series. The choice of representation depends on the specific research question and analysis techniques being employed. For many analyses, researchers will use both approaches, leveraging the spatial information provided by the image-like representation and the temporal dynamics captured in the time-series data.
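To illustrate both views in R: a minimal sketch, assuming the oro.nifti package and a hypothetical 4D BOLD file name (both are assumptions, not from the question):
library(oro.nifti)
img <- readNIfTI("sub01_task_bold.nii.gz")  # 4D array: x, y, z, time
dim(img)
# Image view: one axial slice at a single time point
slice <- img[, , 20, 1]
# Time-series view: the BOLD signal of a single voxel across all volumes
voxel_ts <- img[30, 30, 20, ]
plot(voxel_ts, type = "l", xlab = "Volume", ylab = "BOLD signal")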
  • asked a question related to Data Analysis
Question
3 answers
I need help analyzing enzyme kinetic data.
I have data from the Octet K2 system. In my experiment, I load the sensor with our protein of interest (6XHis tag on my recombinant protein to Ni-NTA sensors) and then expose this sensor to increasing concentrations of the candidate binding protein (five concentrations per experiment and each experiment is replicated four times). Each association step is followed by a dissociation step in buffer. A control sensor is used in each experiment where a sensor is loaded with the protein of interest but only exposed to buffer. (See picture, Part 1)
I have separate data where I loaded smaller recombinant domains of the protein of interest to the sensor and exposed it to the candidate binding protein. I would like to combine this data (the binding of the full-length protein and the binding of the domains) on the same graph.
My problem: In trying to analyze the data with the software provided with the Octet system (HT 11.1), the data misaligns. (See picture, Part 2)
My goal is to determine kinetic constants (KD) of the full-length protein and its separate domains to the protein of interest.
Suggestions for correctly aligning the data in the Octet software HT11.1? (I think the misalignment is because the program is trying to align the y axis to baseline 1 instead of baseline 2, which is the baseline right before the association step. If so, can you change this label after the fact?)
If the glitch with the Octet software cannot be fixed, then is there a manual/tutorial for the enzyme kinetic module for Sigma Plot?
I found I can extract the raw data from the Octet system. I can remove the background from the control sensor and manually assign concentrations. I uploaded this into Sigma plot 15, which has an enzyme kinetic module. I found the embedded help guide, but I have specific questions. For example:
*My candidate binding protein does not change, but how do you take into account the change in the kilodaltons of the proteins that are loaded to the sensor, full length vs. the smaller domain proteins? This is automatically taken care of in the Octet software.
*How do I differentiate between the association and dissociation phases?
I am new to Octet biolayer analysis and the Enzyme Kinetic Module analysis in Sigma Plot.
Any help will be greatly appreciated! I am happy to provide any more information.
Relevant answer
Answer
Awaiting the analysis results' success 😊
  • asked a question related to Data Analysis
Question
3 answers
Hi all!
I've been collecting data on a group of 8 chimpanzees at Chester Zoo for my dissertation. The group consists of 4 males and 4 females, all of which have different hierarchical statuses and ages.
I have been doing random focal observations with a checksheet consisting of 4 state behaviours (timed) and 6 behaviours (frequencies). I would start a random focal observation when a stressful context arose (such as high visitor numbers, anticipation of feeding, or feeding time) and record the durations or frequencies of behaviours exhibited by that individual for 15 minutes. Then at the following visit, I would observe the same individual at the same time but under a non-stressful context (thereby utilising the Matched Control Method).
This process repeated for 4 months and I now have a complete data set.
I am <really> struggling with 1. how to use SPSS, and 2. what tests would be ideal to use. As you can imagine, there is quite a lot of data holding different values, so you can hopefully see my confusion.
Ideally, the statistical analysis of my data will reveal which contexts in the zoo most precipitate an increase in stress (e.g. high visitor numbers, anticipation of feeding, feeding). I also want to be able to compare this data with the hierarchical statuses and ages of the individuals.
Any help would be so appreciated. Thanks in advance!
Relevant answer
Answer
Hi Amber, I would recommend the book SPSS: Analysis Without Anguish.
  • asked a question related to Data Analysis
Question
1 answer
Compliments of the day. Please, how do I go about the data analysis for my PhD work on human health risk assessment of heavy metals? It involves biomarker and heavy metal analysis of human and environmental samples.
#biomarker data analysis.
#Heavy metal data analysis
#environmental samples data analysis
Relevant answer
Answer
Hello.
You may have to factor a lot into the question.
1. What is your PhD all about?
2. What problems are you trying to solve?
3. The message you want to communicate will influence your data collection, processing, and analysis.
Being specific about these points will help your audience give you relatable answers.
  • asked a question related to Data Analysis
Question
5 answers
Hi,
Can anyone explain whether there is a better system available to analyze data than IBM SPSS?
Thank you,
Ameenah
Relevant answer
Answer
R is in many ways more flexible, and it is free.
  • asked a question related to Data Analysis
Question
2 answers
The EcoPlate assay is used to assess the metabolic diversity of soil at the community level.
Relevant answer
Answer
Hello. You can check the following article. I hope this helps.
  • asked a question related to Data Analysis
Question
4 answers
I have several pairs of parameters (obtained from females and males) and want to find the difference in correlation between the two sexes for each parameter. I also want to apply a weight, so that the parameter showing the highest correlation with survival in either females or males receives greater weight. This way, I hope to find factors that show a combination of strong correlation differences between females and males (with regard to survival) and the strongest positive correlation with survival for either sex (which I will resolve further).
To do this, if the correlation of parameter 1 with survival is A for males and B for females, I plan to compute (A-B) multiplied by A or B (whichever is higher), to acknowledge the weight of the highest positive correlation with survival. For the next parameter, with correlation C for males and D for females, I will compute (C-D) multiplied by C or D (whichever is higher). The final aim is to rank the parameters that differ most between females and males and that correlate most with survival in either sex. Do you think this is a reasonable idea?
I would be very very grateful for your advice, suggestions and tips.
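A minimal R sketch of this weighting, with hypothetical correlation values (one might prefer the absolute difference if the sign of the male-female gap does not matter):
# a = male correlations with survival, b = female correlations, per parameter
a <- c(p1 = 0.62, p2 = 0.15, p3 = 0.48)
b <- c(p1 = 0.20, p2 = 0.55, p3 = 0.45)
score <- (a - b) * pmax(a, b)   # sex difference weighted by the larger correlation
sort(score, decreasing = TRUE)  # rank parameters by the combined criterion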
Relevant answer
Answer
Thank you very much :)
  • asked a question related to Data Analysis
Question
2 answers
I am a researcher looking to buy a laptop suitable for my simulation and data analysis work.
Please suggest one.
Relevant answer
Answer
There is no real way to suggest specific products, as it depends on affordability, location, availability, etc. If you are likely to work with large amounts of data, data processing, and data analysis, you would likely benefit from ample RAM, a strong multi-core CPU, and an SSD drive with fast read/write speeds.
  • asked a question related to Data Analysis
Question
2 answers
I am checking whether there are any systematic differences in physical activity among different income groups and education levels for my master's thesis. Physical activity has been assessed through a questionnaire in a municipality, covering three different dimensions (intensity, duration, and frequency). I wonder if there is any way to integrate all three dimensions into one new variable that could provide a more reliable value for physical activity. If that is not possible and I have to select only one measure, which one would be more reliable?
Relevant answer
Answer
Meena Pokhrel A single composite score could also be extracted via factor analysis.
  • asked a question related to Data Analysis
Question
1 answer
Hello everyone, for my dissertation I have two predictor variables and one criterion variable. One of the predictor variables has 5 domains and no global score. In that case, can I use multiple regression, or do I have to perform stepwise linear regression separately for the 6 predictors (5 domains and the other predictor), keeping in mind the assumption of multicollinearity?
Relevant answer
Answer
There are two different issues here. The first is with regard to stepwise regression, which is a very old-fashioned technique that is no longer widely accepted. Instead, you should indeed use multiple regression.
The other issue is with regard to multicollinearity. Since your predictors will almost certainly be inter-correlated, you will have some degree of multicollinearity. But this goes back to your wanting to keep the 5 domains separate, since it is their degree of inter-correlation that creates the multicollinearity.
Have you considered using structural equation modeling, or exploratory factor analysis, to clarify whether your 5 domains truly are statistically distinct, as opposed to indicators of a single larger domain?
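If you proceed with multiple regression, multicollinearity can be checked directly via variance inflation factors. A minimal sketch, assuming the car package and hypothetical variable names in a data frame dat:
library(car)
fit <- lm(criterion ~ domain1 + domain2 + domain3 + domain4 + domain5 + other_pred,
          data = dat)
vif(fit)   # variance inflation factors; values above roughly 5-10 flag multicollinearity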
  • asked a question related to Data Analysis
Question
1 answer
Here are some examples of software that can be used for each step of RNA-seq data analysis:
  1. Quality Control: FastQC, PRINSEQ, Sickle
  2. Read Trimming: Trimmomatic, Cutadapt, AdapterRemoval
  3. Alignment: STAR, HISAT2, TopHat
  4. Quality Control of Alignment: Qualimap, RSeQC, Picard
  5. Assembly: Trinity, Oases, Trans-ABySS
  6. Quantification: RSEM, Kallisto, eXpress
  7. Differential Expression Analysis: DESeq2, EdgeR, limma
  8. Functional Annotation: Blast2GO, KEGG, Reactome
  9. Pathway Analysis: KEGG Pathway, Reactome, Enrichr
  10. Network Analysis: Cytoscape, STRING, ClueGO
  11. Visualization: IGV, GenomeBrowse, JBrowse
  12. Interpretation: GSEA, DAVID, IPA
Relevant answer
Answer
For the alignment step, I think it's important to mention pseudo alignment (Salmon, Sailfish, Kallisto) and RUM for hybrid alignment to both genome and transcriptome.
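To make step 7 (differential expression) concrete, a minimal DESeq2 sketch; the count matrix counts and the four-sample design are assumptions for illustration:
library(DESeq2)
# 'counts' is a hypothetical gene-by-sample integer matrix
coldata <- data.frame(condition = factor(c("control", "control", "treated", "treated")))
rownames(coldata) <- colnames(counts)
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)            # normalization, dispersion estimation, testing
res <- results(dds)          # log2 fold changes and adjusted p-values
head(res[order(res$padj), ])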
  • asked a question related to Data Analysis
Question
1 answer
Dear Scientists and Researchers,
I'm thrilled to highlight a significant update from PeptiCloud: new no-code data analysis capabilities specifically designed for researchers. Now, at www.pepticloud.com, you can leverage these powerful tools to enhance your research without the need for coding expertise.
Key Features:
PeptiCloud's latest update lets you:
  • Create Plots: Easily visualize your data for insightful analysis.
  • Conduct Numerical Analysis: Analyze datasets with precision, no coding required.
  • Utilize Advanced Models: Access regression models (linear, polynomial, logistic, lasso, ridge) and machine learning algorithms (KNN and SVM) through a straightforward interface.
The Impact:
This innovation aims to remove the technological hurdles of data analysis, enabling researchers to concentrate on their scientific discoveries. By minimizing the need for programming skills, PeptiCloud is paving the way for more accessible and efficient bioinformatics research.
Join the Conversation:
  1. How do you envision no-code data analysis transforming your research?
  2. Are there any other no-code features you would like to see on PeptiCloud?
  3. If you've used no-code platforms before, how have they impacted your research productivity?
PeptiCloud is dedicated to empowering the bioinformatics community. Your insights and feedback are invaluable to us as we strive to enhance our platform. Visit us at www.pepticloud.com to explore these new features, and don't hesitate to reach out at [email protected] with your thoughts, suggestions, or questions.
Together, let's embark on a journey towards more accessible and impactful research.
Warm regards,
Chris Lee
Bioinformatics Advocate & PeptiCloud Founder
Relevant answer
Answer
I think these tools remove the need for programming skills and make data analysis much quicker and more efficient! Going forward, I look forward to more no-code functions being added to meet a wider range of research needs. As with the no-code platforms I have used before, a lot of time is spent on data processing and analysis, and no-code tools will make that work ever easier.
  • asked a question related to Data Analysis
Question
2 answers
I'm currently trying to perform an RNA-seq data analysis, and at the first step a few questions came to mind that I would like to understand. Please help me understand these questions.
1) In the 1st image, the raw data from NCBI-SRA have 1 and 2 marked at the ends of the reads. What does this mean? Are these the forward and reverse reads?
2) In the second image I was trying to run Trimmomatic on this data set. I chose "paired-end as a collection" but it does not take any input, even though my data was there in "fastqsanger.gz" format. Why is that? Should I treat this paired-end data as single-end data when running Trimmomatic?
3) In the 3rd and 4th images, I collected the same data from ENA, where they give two separate files for the 1- and 2-marked data in SRA. I then tried to process them in Trimmomatic using "Paired-end as individual dataset" and ran it. Trimmomatic gives me 4 files. Why is that? Which ones will be useful for alignment?
A big thank you in advance :)
Relevant answer
Answer
For NGS sequencers that have paired-end capability, those 1's and 2's refer to which reads they originate from: Read 1 & Read 2, or Forward & Reverse Reads (https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/paired-end-vs-single-read.html). That makes it convenient to see what kind of reads you are dealing with before processing them. In a similar fashion, within the FASTQ format specification, you also specify whether a particular read belongs to Read 1 or Read 2 (see Illumina sequence identifiers: https://en.wikipedia.org/wiki/FASTQ_format). Regarding your "fastqsanger.gz" format data, is this Sanger sequencing related data? These tools are developed for NGS applications. Regarding the output files, check https://www.biostars.org/p/199938/ and search for "Trimmomatic output files" on Google.
  • asked a question related to Data Analysis
Question
3 answers
I'm performing RNA-seq data analysis. I want to do healthy vs disease_stage_1, Healthy vs disease_stage_2, and Healthy vs disease_stage_3. In the case of healthy, disease_stage_1, disease_stage_2, and disease_stage_3 data sets, I have 19, 7, 8, and 15 biological replicates respectively.
Does this uneven number of replicates affect the data analysis?
Should I use an even number of replicates, e.g., 7 biological replicates for every dataset (as the lowest number of replicates here is 7)?
Relevant answer
Answer
I agree with Alwala; in general, fewer than 3 samples per group is not even worth considering. There is an app online that will help you estimate sample size and/or power given certain parameters (https://cqs-vumc.shinyapps.io/rnaseqsamplesizeweb/), which is a useful estimation tool. Regarding your uneven biological replicates, check whether the statistical method used for differential expression and library normalization can tolerate uneven sample sizes. In general, IMO, a minimum of 8-10 is a pretty good starting point.
  • asked a question related to Data Analysis
Question
5 answers
Kindly research how AI is going to impact legal services, and in particular arbitration. Right now some firms have started using AI for research, due diligence, and data analytics.
Relevant answer
Answer
AI stands for Artificial Intelligence: machine or computer systems that can perform in minutes tasks that take a human several hours. Today AI is evolving and taking over a variety of jobs, such as content writing, video production, image generation, and graphic design.
  • asked a question related to Data Analysis
Question
1 answer
It prompts for an explanation regarding which method is more suitable and why, aiming to enhance understanding of the selection process between these two techniques in the context of mixed data analysis. Is there any other ordination technique suitable for mixed data types?
Relevant answer
Answer
For mixed data types (nominal, ordinal, continuous), Non-Metric Multidimensional Scaling (NMDS) or Multiple Correspondence Analysis (MCA) are often more appropriate.
NMDS (Non-Metric Multidimensional Scaling):
Pros:
Suitable for various types of dissimilarity measures, making it flexible for mixed data.
Non-metric nature allows it to handle ordinal data well.
Cons:
Computationally intensive for large datasets.
Requires defining a dissimilarity measure.
PCAMix (Principal Component Analysis for Mixed Data):
This method specifically addresses the issue of mixed data.
Pros:
Handles mixed data types, including nominal and ordinal variables.
Cons:
Might not perform as well as NMDS for certain datasets.
The choice between NMDS and PCAMix often depends on the characteristics of your data and the specific goals of your analysis. If dissimilarities between observations are meaningful and you want to preserve ordinal information, NMDS might be more suitable. PCAMix, on the other hand, is designed for mixed data and can handle nominal and ordinal variables in addition to continuous ones.
Consider the nature of your data, the assumptions of each method, and the goals of your analysis when choosing an ordination technique for reducing dimensionality in mixed data. It may also be beneficial to perform exploratory data analysis and compare the results of different methods to determine which one best captures the patterns in your dataset.
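As a concrete illustration of the NMDS route on mixed data, a minimal R sketch; the cluster and vegan packages are real, while the data frame mixed_df (numeric, ordinal, and nominal columns) is a hypothetical placeholder:
library(cluster)
library(vegan)
d <- daisy(mixed_df, metric = "gower")   # Gower dissimilarity handles mixed variable types
ord <- metaMDS(d, k = 2, trymax = 50)    # NMDS on the dissimilarity matrix
ord$stress                               # stress below ~0.2 is commonly considered usable
plot(ord, type = "t")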
  • asked a question related to Data Analysis
Question
6 answers
Cosine similarity, Soft Cosine similarity or SBERT?
Relevant answer
Answer
First, you should use SBERT to create embeddings of the sentences. Then utilize an algorithm like UMAP to reduce dimensionality. If you want to surface latent topics, use HDBSCAN for density-based clustering. Finally, use the TensorFlow Projector to check for sentences that are similar in cosine similarity. I explain this in this publication: https://www.linkedin.com/feed/update/urn:li:ugcPost:7076949962558705664?commentUrn=urn%3Ali%3Acomment%3A%28ugcPost%3A7076949962558705664%2C7077287447948070912%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287077287447948070912%2Curn%3Ali%3AugcPost%3A7076949962558705664%29
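For the final similarity check, cosine similarity itself is a one-liner. A minimal R sketch on hypothetical embedding vectors (in practice these would be SBERT embeddings exported from your pipeline):
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
emb1 <- c(0.12, -0.40, 0.33, 0.08)
emb2 <- c(0.10, -0.35, 0.30, 0.12)
cosine_sim(emb1, emb2)   # 1 = identical direction, 0 = orthogonal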
  • asked a question related to Data Analysis
Question
11 answers
We usually use Excel in our lab to analyse data, but I would like to take a course on a more sophisticated tool. Can you share which one is most common in molecular biology?
Thank you :)
Relevant answer
Answer
SPSS and R programming
  • asked a question related to Data Analysis
Question
3 answers
Could this be due to an error in Mass Spec calibration or data analysis? I have 2 technical repeats that are fine, but the 3rd repeat is far away in the PCA plot and clusters with replicates of a different sample.
Relevant answer
Answer
It is always better to include an internal standard (deuterated isotopes or any analog compound) to uncover any system or operator-related bias...
If you don't have any IS in your experiment, an alternative is to track the intensity (and RT) of an inherently present compound (think of this as analogous to using housekeeping proteins as normalization targets in western blot analysis).
If the system is an Orbitrap, EASY-IC calibrant performance may also provide some hints. A baffle-based ESI fluidics system infusing leucine enkephalin, as in Waters instruments, may also be beneficial...
A shifted position for a sample in PCA is more common for biological replicates than for technical replicates. If the instrument performance was stable during analysis, it is probably caused by either sample prep or an autosampler operation failure.
  • asked a question related to Data Analysis
Question
6 answers
What is the short new way for you to solve this problem of data analysis in time series? Suppose you have time series data. What steps do you take and how do you analyse this data? How do you solve it in your work? Follow me and share your post and your personal experience.
  • asked a question related to Data Analysis
Question
2 answers
What are some innovative approaches to data analysis and visualization in phonology research?
Relevant answer
Answer
Thank you very much
  • asked a question related to Data Analysis
Question
3 answers
Dear experts,
I have noticed that researchers who are able to publish in first-tier journals often use advanced data analysis methods, which usually involve numerical forms such as Confirmatory Factor Analysis (CFA), Structural Equation Modelling, and Comparative Analysis.
While I acknowledge the significance of using advanced data analysis methods like CFA and structural equation modelling to answer specific research questions, I am interested to know why there is a preference for these methods over qualitative studies.
Look forward to hearing from you.
Relevant answer
Answer
One reason may be that the analysis of data obtained in qualitative studies is more complex than in quantitative studies and that there is more subjectivity involved in the analysis of qualitative data. In addition, many (most?) phenomena in the social sciences are quantitative in nature (i.e., they can be measured on a continuum from low to high). Therefore, quantitative analysis of attributes such as intelligence, anxiety, personality, attitudes, etc. seems natural.
  • asked a question related to Data Analysis
Question
3 answers
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics and generative artificial intelligence to business entities to improve business entity management processes?
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal business intelligence information systems supporting the management processes of a company, enterprise, corporation or other type of business entity?
In recent years, there has been a growing scale of implementation of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal information systems of the Business Intelligence type supporting the management processes of a company, enterprise, corporation or other type of business entity. The Covid-19 pandemic has accelerated the processes of digitizing the economy.
The importance and application of analytics conducted via the Internet and/or using data downloaded from the Internet is also growing. An example is sentiment analysis conducted on data downloaded from the Internet implemented on Big Data Analytics platforms, being an additional research instrument of conducted market research and marketing research, as an additional source of data for conducted Business Intelligence type analysis. This is particularly important because in recent years the importance of Internet marketing, including viral marketing and Real-Time marketing carried out on social media sites, is increasing.
Accordingly, in many industries and sectors of the economy, there is already an increase in the application of certain Industry 4.0 technologies, i.e., such as Big Data Analytics, Data Science, cloud computing, machine learning, personal and industrial Internet of Things, artificial intelligence, Business Intelligence, autonomous robots, horizontal and vertical data system integration, multi-criteria simulation models, additive manufacturing, Blockchain, cybersecurity instruments, Virtual and Augmented Reality and other advanced Data Mining technologies.
Besides, using Big Data Analytics, interesting research is being conducted in the field of the issue: analysis of changes in the relationship of consumer behavior in the markets for goods and services caused by the impact of advertising campaigns conducted on the Internet, applying new Internet marketing tools used in new online media, including primarily social media. This includes the growth of behavioral economics and finance, including the analysis of the determinants of media formation of consumer opinions on the recognition of the company's brand, product and service offerings, etc., through the growth of Internet information services, including social media portals.
Currently, online viral marketing based on social media portals and customer data collected and processed in Big Data Analytics databases is developing rapidly. In recent years, new online marketing instruments have also been developed, applied mainly on social media portals and also used by e-commerce companies. Internet technology companies and fintechs are also emerging, offering online information services to assist marketing management, including in planning advertising campaigns for products sold via the Internet. For this purpose, the aforementioned sentiment analyses are used to study the opinions of Internet users regarding the prevailing awareness, recognition, brand image, mission, and offerings of certain companies. Sentiment analysis is carried out on large data sets taken from various websites, including millions of social media pages, collected in Big Data systems. The analytical data collected in this way is very helpful in the process of planning advertising campaigns carried out in new media, including social media sites.
These campaigns advertise, among other things, products and services sold via the Internet, available in online stores. In view of the above, the development of e-commerce is mainly determined by technological advances in ICT information technology and advanced Industry 4.0 data processing technology, as well as new technologies used in securing financial transactions carried out over the Internet, including transactions related to e-commerce, i.e. blockchain technology, for example.
In my opinion, ongoing scientific research confirms the strong correlation occurring between the development of Big Data technologies, Data Science, Data Analytics and the efficiency of the use of knowledge resources. I believe that the development of Big Data technology, Data Science, Data Analytics and other ICT information technologies, multi-criteria technology, advanced processing of large sets of information, and Industry 4.0 technology increases the efficiency of the use of knowledge resources, including in the field of economics, finance and organizational management. In recent years, ICT information technologies, Industry 4.0, etc., have been developing particularly rapidly and are being applied in knowledge-based economies. These technologies are being applied in scientific research and business applications in commercially operating enterprises and in financial and public institutions.
In view of the growing importance of this issue in knowledge-based economies, it is important to analyze the correlation between the development of Big Data technologies and analytics of Data Science, Data Analytics, Business Intelligence and the efficiency of using knowledge resources to solve key problems of civilization development. Analytics based on Business Intelligence, in addition to Data Science and Big Data Analytics, is increasingly being used in improving business management processes. The development of this analytics, based on the implementation of ICT information technologies and Industry 4.0 into analytical processes, has a great future in the years to come. In addition, the application of artificial intelligence technologies can increase the efficiency of the use of Big Data Analytics and other Industry 4.0/5.0 technologies, which are used to support business management processes.
I have described the issues of application of Big Data and Business Intelligence technologies in the context of enterprise risk management in the following article:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
In addition, I described the issues of opportunities and threats to the development of AI technology applications in my following article:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal business intelligence information systems supporting the management processes of a company, enterprise, corporation or other type of business entity?
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics and generative artificial intelligence to business entities to improve business entity management processes?
How does Big Data Analytics and generative artificial intelligence support business entity management processes?
What do you think on this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
Big Data Analytics: Big Data Analytics uses large volumes of data and machine learning technologies to discover patterns that allow organisations to make effective decisions.
  • asked a question related to Data Analysis
Question
2 answers
What is the short new way for you to solve this problem of data analysis in time series? Suppose you have time series data. What steps do you take and how do you analyse this data? How do you solve it in your work? Follow me and share your post and your personal experience.
Relevant answer
Answer
In time series data analysis, a concise approach involves several key steps:
1. Data Exploration:
- Understand the characteristics of the time series.
- Check for trends, seasonality, and outliers.
2. Data Preprocessing:
- Handle missing values and outliers appropriately.
- Consider normalization or scaling if needed.
3. Visualization:
- Plot the time series to gain insights.
- Use tools like line plots, histograms, or box plots.
4. Feature Engineering:
- Extract relevant features, such as rolling statistics or lag features.
- Consider transformations for stationarity.
5. Model Selection:
- Choose an appropriate model based on characteristics (ARIMA, SARIMA, LSTM, etc.).
- Split data into training and testing sets.
6. Training:
- Train the selected model on the training set.
7. Evaluation:
- Evaluate model performance on the test set.
- Use metrics like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).
8. Optimization:
- Fine-tune model parameters if needed.
- Consider ensemble methods or hybrid models.
9. Prediction:
- Use the trained model to make predictions on new data.
10. Validation:
- Validate results against actual outcomes.
- Adjust the model or methodology if necessary.
Each work involves implementing these steps using statistical and machine learning techniques, and it often requires adapting strategies based on the specific characteristics of the data. It's crucial to stay mindful of the context and the goals of the analysis throughout the process.
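As a compact illustration of steps 5 through 9, a minimal R sketch using the forecast package and the built-in AirPassengers series (the train/test split dates are arbitrary choices):
library(forecast)
y     <- AirPassengers
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))
fit <- auto.arima(train)             # automatic ARIMA order selection
fc  <- forecast(fit, h = length(test))
accuracy(fc, test)                   # includes MAE and RMSE on the test set
plot(fc)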
  • asked a question related to Data Analysis
Question
3 answers
XPS Data Analysis
Relevant answer
Answer
1) A spectrum gets corrected exactly once, you don't shift every peak individually.
2) Before you shift around stuff, please check whether you need correction at all - a well-conducting sample should give a signal that can be used as it is. The C1s correction is more of a last resort method, if possible, other methods should be preferred. Since it is quite controversial in the community, here are four references that should cover the range of valid points quite well:
  • asked a question related to Data Analysis
Question
3 answers
What is univariate data analysis?
Relevant answer
Answer
Summarize the characteristics of a single variable (e.g., age or gender) by frequency, mean, median, SD, IQR, etc.
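A quick base-R illustration on a hypothetical numeric variable:
age <- c(23, 31, 35, 41, 44, 52, 58, 63)
mean(age); median(age); sd(age); IQR(age)
table(cut(age, breaks = c(20, 40, 60, 80)))   # frequency table for grouped ages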
  • asked a question related to Data Analysis
Question
3 answers
What is bivariate data analysis?
Relevant answer
Answer
Bivariate analysis examines the relationship between two variables, for example via cross-tabulation, correlation, or simple regression. If you start studying any descriptive statistics text, you will find it explained early on.
Regards...
  • asked a question related to Data Analysis
Question
4 answers
data collection methodology
data analysis methods
Relevant answer
Answer
You need to provide additional information on the nature of the data and your dependent variable(s). Both determine what method or methodology to use in empirical research.
  • asked a question related to Data Analysis
Question
7 answers
Hello,
I'm conducting a qualitative piece of research which I've positioned within a phenomenological grounding. I'm also utilising IPA as the framework for my data analysis. However, I've been asked to detail my analytical framework separately from this, and I'm struggling to understand the difference between it and my data analysis approach. Any help would be greatly appreciated.
Relevant answer
Answer
Sorry for accidentally deleting the information. Here it is again:
Let's break down the concepts of the analytical framework and data analysis approach in the context of your qualitative phenomenological research using Interpretative Phenomenological Analysis (IPA).
Analytical Framework:
-Definition:
The analytical framework is the overarching theoretical and conceptual structure that guides your study. It sets the stage for how you approach your research questions and interpret your findings.
-Purpose:
It provides a lens through which you view and understand your research. In phenomenological research, your analytical framework is grounded in phenomenology, emphasizing the exploration of individuals' lived experiences.
-Components:
It encompasses the philosophical underpinnings, theoretical perspectives, and broader concepts that shape your research design. For your study, the analytical framework is rooted in phenomenology, focusing on the essence of experiences.
-Influence on Research Design:
Your analytical framework informs the formulation of research questions, the selection of participants, and the overall design of your study. It answers the "why" of your research.
Data Analysis Approach (IPA):
-Definition:
The data analysis approach is a more specific and detailed aspect of your research process. It involves the systematic examination of your qualitative data to identify patterns, themes, and meanings.
-Purpose:
It is the methodological technique you use to analyze and interpret the qualitative data you've collected. In your case, you've chosen Interpretative Phenomenological Analysis (IPA) as your data analysis approach.
-Components:
IPA involves a detailed and iterative process of analyzing textual data, such as interview transcripts, to identify emergent themes. It emphasizes the interpretation of participants' lived experiences and perceptions.
-Influence on Data Interpretation:
Your data analysis approach guides how you make sense of the rich, qualitative data you've gathered. It helps you uncover the underlying meanings and patterns in participants' narratives.
Key Difference:
-Scope:
· The analytical framework is broad and conceptual, setting the philosophical and theoretical foundation for your entire study.
· The data analysis approach is more specific and methodological, focusing on the techniques used to analyze and interpret the collected data.
-Timing:
· The analytical framework is established at the beginning of your research and influences the entire study.
· The data analysis approach comes into play after data collection and informs how you process and interpret the gathered information.
The analytical framework is like the big-picture guide for your entire research, while the data analysis approach is the specific method you use to analyze and interpret the detailed qualitative data within that broader framework. The two are interrelated, with the analytical framework influencing the overall design, and the data analysis approach guiding the processing and interpretation of your collected data.
  • asked a question related to Data Analysis
Question
1 answer
I have selected two deep learning models, a CNN and an SAE, for the analysis of a 1-D digitized data set. I need to justify the choice of these two DL models in comparison to other DL and standard ML models. I am using a GA to optimize the hyperparameter values of the two DL models. Can you give some input on this query? Thanks.
Relevant answer
Answer
Typically, the rationale for choosing a model can be training time, prediction time, and the value of the metric itself, either on a validation set or in cross-validation, depending on what you are using. It is better, of course, to use more than one metric, as well as a confusion matrix with recall and precision, or simply F1 or F-beta, depending on the problem you are solving.
  • asked a question related to Data Analysis
Question
7 answers
What aspects of working with data are the most time-consuming in your research activities?
  1. Data collection
  2. Data processing and cleaning
  3. Data analysis
  4. Data visualization
What functional capabilities would you like to see in an ideal data work platform?
Relevant answer
Answer
Yes, I don't mind; I am interested in everything related to statistics because it is my specialty.
I would be glad to hear the details.
Thank you.
  • asked a question related to Data Analysis
Question
1 answer
Colleagues, good day!
We would like to reach out to you for assistance in verifying the results we have obtained.
We employ our own method for performing deduplication, clustering, and data matching tasks. This method allows us to obtain a numerical value of the similarity between text excerpts (including data table rows) without the need for model training. Based on this similarity score, we can determine whether records match or not, and perform deduplication and clustering accordingly.
This is a direct-action algorithm, relatively fast and resource-efficient, requiring no specific configuration (it is versatile). It can be used for quickly assessing previously unexplored data or in environments where data formats change rapidly (but not the core data content), and retraining models is too costly. It can serve as the foundation for creating personalized desktop data processing systems on consumer-grade computers.
We would like to evaluate the quality of this algorithm in quantitative terms, but we cannot find widely accepted methods for such an assessment. Additionally, we lack well-annotated datasets for evaluating the quality of matching.
If anyone is willing and able to contribute to the development of this topic, please step forward.
Sincerely, The KnoDL Team
Relevant answer
Answer
Dear teammates,
I am highly experienced in clustering with optimization algorithms such as genetic algorithms, simulated annealing, particle swarm optimization, etc., so I think I am well suited to join your group. Please let me know if you agree.
Thank you
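On the original question of quantitative evaluation: two widely used options for scoring deduplication/clustering output against a labelled gold standard are the Adjusted Rand Index and pairwise precision/recall/F1 over record pairs. A minimal sketch with scikit-learn, using made-up labels purely for illustration:
```python
from itertools import combinations
from sklearn.metrics import adjusted_rand_score, precision_recall_fscore_support

# Hypothetical gold-standard duplicate-cluster labels vs. the algorithm's output
gold = [0, 0, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 2, 2, 2, 2]

# Cluster-level agreement, corrected for chance
print("Adjusted Rand Index:", adjusted_rand_score(gold, pred))

# Pairwise view: a "match" is any pair of records placed in the same cluster
pairs = list(combinations(range(len(gold)), 2))
y_true = [int(gold[i] == gold[j]) for i, j in pairs]
y_pred = [int(pred[i] == pred[j]) for i, j in pairs]
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"pairwise precision = {p:.2f}, recall = {r:.2f}, F1 = {f:.2f}")
```
The pairwise view is the convention in entity-resolution benchmarks, so it may help with comparability even without a large annotated dataset.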
  • asked a question related to Data Analysis
Question
1 answer
During my RNA-seq data analysis, I encountered a problem: the MultiQC statistics for the STAR step showed that 70% of my trimmed reads were aligned, but when I ran featureCounts on the same BAM files, it reported that only 3% of my reads were assigned.
Why is that? Should I proceed further, or do I need to perform additional checks? For reference, I used the HG38.p14 assembly from NCBI.
Relevant answer
Answer
Hi Pratanu,
have you checked whether the featureCounts parameters were set properly (correct annotation, strand orientation, etc.)?
I would also look at the BAM files in a genome viewer such as IGV and check whether there are reads at genes where you would definitely expect them.
  • asked a question related to Data Analysis
Question
3 answers
  1. How is healthcare data collected and organized for analysis?
  2. What statistical and machine learning algorithms can be used for healthcare data analysis?
  3. How to address missing values and outliers in healthcare data?
  4. What feature selection and feature engineering methods should be used in healthcare data analysis?
Relevant answer
Answer
Hello, I can recommend the book by Armitage & Berry, Statistical Methods in Medical Research.
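On question 3 specifically (missing values and outliers), here is a minimal pandas sketch with hypothetical patient data, showing two common choices: median imputation for gaps and the 1.5 * IQR rule for flagging outliers. The column names and values are invented for illustration:
```python
import numpy as np
import pandas as pd

# Hypothetical patient records: one missing age, one implausible blood pressure
df = pd.DataFrame({"age": [34, 51, np.nan, 47, 62],
                   "systolic_bp": [122, 135, 128, 480, 118]})

# Median imputation for missing values (robust to skewed clinical data)
df["age"] = df["age"].fillna(df["age"].median())

# Flag outliers with the 1.5 * IQR rule
q1, q3 = df["systolic_bp"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["systolic_bp"] < q1 - 1.5 * iqr) | (df["systolic_bp"] > q3 + 1.5 * iqr)
print(df[is_outlier])
```
Whether flagged values should be removed, winsorized, or investigated clinically depends on the study; the code only identifies them.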
  • asked a question related to Data Analysis
Question
2 answers
I have made a proposal, but I have not found anyone who can help me properly and work with me consistently. I need someone as a co-author.
My project:
1. Social science related
2. In the Bangladeshi context
3. You have to work with me as a team member. Skills needed: Zotero, SPSS data analysis, Google Forms
4. Three more members with specific skill sets are needed.
Relevant answer
Answer
Yes, you are right; that is exactly what I am looking for. I would add that you would need to work with our team as a top contributor on this article, as if you were doing the research for your own purposes.
  • asked a question related to Data Analysis
Question
4 answers
How are researchers leveraging artificial intelligence and machine learning algorithms to enhance data analysis and prediction in their studies?
Relevant answer
Answer
In language studies, researchers harness artificial intelligence and machine learning algorithms to revolutionize data analysis and prediction. Through natural language processing (NLP), these tools sift through vast textual data, extracting patterns, sentiments, and semantic relationships within language corpora. AI-powered sentiment analysis and emotion detection unveil nuanced emotional tones and public opinions from text, while language models generated by machine learning aid in synthetic language creation for research purposes. These technologies also enable cross-linguistic studies, assisting in translation tasks and comparative analyses among languages, uncovering linguistic structures and language evolution trends. Predictive analytics driven by AI predict language shifts and societal changes, contributing to sociolinguistic inquiries and enabling researchers to delve deeper into discourse analysis and topic modeling.
  • asked a question related to Data Analysis
Question
2 answers
Could someone help me with access to the Scopus database? I need information about articles in the journal Geoderma for a review. I want to compare the use of remote sensing and proximal sensing methods for soil data analysis over the last 30-40 years. In Russia, access is disabled.
  • asked a question related to Data Analysis
Question
12 answers
Hi RG family. I'm trying to get a foothold on qualitative data analysis using NVivo. First off, I must admit my addiction to quantitative methods throughout much of my career. But recently, I'm getting obsessed with qualitative approaches to research because of their potential to generate more detailed and comprehensive insights.
However, I'm not familiar with the NVivo software. Please fill me in on any, and I mean any, detail you know about qualitative data analysis via NVivo, from transcription and data entry to analysis, results visualization, and interpretation. I look forward to learning massively from your immensely invaluable contributions to this discussion.
Over to you, fam!! I'm reading you!!
Relevant answer
Answer
NVivo has a series of 43 tutorials at:
I also like the series of NVivo videos on YouTube by Philip Adu (but personally, I prefer MAXQDA for my own work, just because I find its interface more comfortable).
  • asked a question related to Data Analysis
Question
4 answers
What are the most effective and widely-used software programs for analyzing data collected through observations in research studies?
Relevant answer
Answer
If you are referring to observation as in qualitative data, then you could opt for NVivo.
  • asked a question related to Data Analysis
Question
4 answers
I searched on Google but couldn't understand it properly.
Relevant answer
Answer
Correlation measures the strength and direction of a linear relationship between two variables, while regression goes a step further by modeling and predicting the impact of one or more independent variables on a dependent variable. Correlation does not imply causation, merely showing association, whereas regression can provide insights into potential cause-and-effect relationships.
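A small illustration of the difference, using synthetic data with SciPy: the correlation coefficient is a single number summarizing the strength of the linear association, while the regression fit yields a predictive equation. All numbers here are simulated:
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)   # y depends on x, plus noise

# Correlation: one number for the strength/direction of the linear association
r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.2f} (p = {p:.3g})")

# Regression: a fitted model that predicts y from x
res = stats.linregress(x, y)
print(f"fitted line: y = {res.slope:.2f} * x + {res.intercept:.2f}")
```
Note that the high correlation here says nothing about which variable causes which; the regression direction is a modeling choice.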
  • asked a question related to Data Analysis
Question
2 answers
Hello dear scientists
How can we distinguish maternal contamination from triploidy in QF-PCR analysis?
Relevant answer
Answer
Raghad Mouhamad Thank you dear doctor. In one case, some markers were the same as the mother's, so it was thought to be from the mother. But the karyotype result actually showed triploidy.
  • asked a question related to Data Analysis
Question
2 answers
Looking for somebody to share their experience of using ChatGPT-4 as an assisting tool (in combination with either Excel or Perseus/MaxQuant) for analyzing data such as OMICs. Concretely, I'm dealing with proteomics from human blood samples.
Grateful for any feedback!
Relevant answer
Answer
Use this Python program for basic OMICs data analysis using the pandas library:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load OMICs data from a CSV file
data = pd.read_csv('omic_data.csv')

# Basic data exploration
print("Data shape:", data.shape)
print("Column names:", data.columns)

# Summary statistics
summary_stats = data.describe()
print("Summary statistics:")
print(summary_stats)

# Filter rows whose gene expression exceeds a threshold
filtered_data = data[data['Gene Expression'] > 10]
print("Filtered data:")
print(filtered_data)

# Aggregate: mean of the numeric columns per sample type
# (numeric_only=True avoids errors on text columns in recent pandas)
aggregated_data = data.groupby('Sample Type').mean(numeric_only=True)
print("Aggregated data:")
print(aggregated_data)

# Box plot of gene expression by sample type
data.boxplot(column='Gene Expression', by='Sample Type')
plt.title('Gene Expression by Sample Type')
plt.xlabel('Sample Type')
plt.ylabel('Gene Expression')
plt.show()
```
In this example, we assume that the OMICs data is stored in a CSV file named 'omic_data.csv' and that it has columns such as 'Gene Expression' and 'Sample Type'. The program demonstrates basic data exploration, summary statistics, filtering, aggregation, and visualization using the pandas and matplotlib libraries.
Hope it helps
  • asked a question related to Data Analysis
Question
3 answers
Are there methods to evaluate studies using medical data?
Relevant answer
Answer
I am NOT a doctor. This information was collected with the help of AI; I hope it helps you.
Yes, there are several scales, checklists, and other methods available to assess the quality and transparency of research that utilizes patient data, including imaging data. These tools aim to evaluate various aspects of research methodology, data reporting, and transparency. Here are a few examples:
1. STARD (Standards for Reporting Diagnostic Accuracy Studies): STARD is a checklist designed to assess the reporting quality of diagnostic accuracy studies. While it is not specific to patient data or imaging, it can be applicable to studies that use imaging data for diagnostic purposes.
2. QUADAS (Quality Assessment of Diagnostic Accuracy Studies): QUADAS is a tool specifically developed to assess the quality of diagnostic accuracy studies. It focuses on the methodological aspects of the study design, patient selection, index test, reference standard, and flow of participants.
3. CONSORT (Consolidated Standards of Reporting Trials): CONSORT is a widely used guideline for reporting randomized controlled trials (RCTs). While not specific to patient data or imaging, it provides a comprehensive checklist for assessing the transparency and quality of trial reporting.
4. TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis): TRIPOD is a guideline specifically designed for reporting prediction model studies. It provides a checklist for assessing the transparency, quality, and risk of bias in studies that develop or validate prediction models using patient data.
5. QIBA (Quantitative Imaging Biomarkers Alliance): QIBA, an initiative by the Radiological Society of North America (RSNA), aims to improve the reliability of quantitative imaging biomarkers. While not strictly a checklist, QIBA provides a framework and guidelines for assessing and improving the quality and standardization of quantitative imaging.
  • asked a question related to Data Analysis
Question
5 answers
Can anybody suggest some state-of-the-art research problems for a Ph.D. on "AI in the academic sector"?
Relevant answer
Answer
AI adoption and technology readiness within organizations is one of the problems that needs to be solved.
  • asked a question related to Data Analysis
Question
6 answers
Scenario - There is an IV and a DV. The IV is measured with 5-point Likert-scale questions and the DV with 7-point Likert-scale questions.
Doubts -
01. Can we run a test like regression analysis directly, irrespective of the difference in measures?
02. If not, what transformation techniques are available to bring the data onto the same scale?
Relevant answer
Answer
If you have only one item for each of your measures, then you have two ordinal measures and you should not use a procedure such as regression that assumes you have interval-level data. However, if you actually have multi-item measures that you use to form a scale, then you can create interval-variables and use correlation and regression.
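On the second doubt specifically: if the items are to be put on a common footing before analysis, two common rescaling options are z-standardisation and the POMP (Percent Of Maximum Possible) transformation. A minimal sketch with made-up scores, assuming the scales run 1-5 and 1-7:
```python
import numpy as np

def pomp(scores, lo, hi):
    """Rescale to Percent Of Maximum Possible (0-100)."""
    scores = np.asarray(scores, dtype=float)
    return 100 * (scores - lo) / (hi - lo)

iv_5pt = [2, 4, 5, 3, 1]     # hypothetical 5-point Likert scores
dv_7pt = [3, 6, 7, 4, 2]     # hypothetical 7-point Likert scores

print(pomp(iv_5pt, 1, 5))    # both variables now run 0-100
print(pomp(dv_7pt, 1, 7))

def zscore(scores):
    """Alternative: standardise to mean 0, SD 1."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std(ddof=1)

print(zscore(iv_5pt), zscore(dv_7pt))
```
For an ordinary regression of DV on IV the differing ranges are not themselves a problem (the slope simply absorbs the units); rescaling mainly helps when coefficients must be compared or combined.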
  • asked a question related to Data Analysis
Question
5 answers
I have been doing my calculations in SPSS; is that the better option?
Relevant answer
Answer
There is a saying that "one tool can't fix every problem." As already pointed out, most of these data analytics packages have their strengths and weaknesses. However, I think 🤔 that the open-source tools have larger communities to help. So if you are starting now, learn Python for data analysis; there are incredible packages for doing great analysis.
  • asked a question related to Data Analysis
Question
4 answers
In my thesis I calculated both of these, correlation and regression, but I can't understand which is better for data analysis.
Relevant answer
Answer
The application of both statistical tools depends on the context of your thesis, the problem statement, and the related hypotheses addressed in your research.
Correlation evaluates and plots the relationship between data in general, and between variables within the data specifically.
Regression brings out the nature of the relationship between the data/variables and is also used to make a predictive plot of this relationship.
The research objective, problem statement, and hypotheses should direct the application of both of these statistical tools.
  • asked a question related to Data Analysis
Question
14 answers
I am writing my bachelor thesis and I'm stuck with the Data Analysis and wonder if I am doing something wrong?
I have four independent variables and one dependent variable, all measured on a five point likert scale and thus ordinal data.
I cannot use an ordinary type of regression (since my data is ordinal, is not normally distributed and never will be (transformations could not change that), and also violates homoscedasticity), so I opted for ordinal logistic regression. Everything worked out perfectly, but the test of parallel lines in SPSS was significant and thus the assumption of proportional odds was violated. So, I am now considering multinomial logistic regression as an alternative.
However, here I could not find out how to test the assumption on SPSS: Linear relationship between continuous variables and the logit transformation of the outcome variable. Does somebody know how to do this???
Plus, I have a more fundamental question about my data. To get the data on my variables, I asked respondents several questions. My dependent variable, for example, is turnover intention, and I used 4 questions on a 5-point Likert scale, so I got 4 different values from everyone about their turnover intention. For the analysis I took the average, since I only want one value of turnover intention per respondent (not four). However, the data no longer takes only the values 1, 2, 3, 4 and 5 as with the original five-point Likert scale; since I took the average, I now have decimals like 1.25 or 1.75. This leaves me with many distinct data points, and I was wondering if my approach makes sense. I was thinking of grouping them together, since my analysis is biased by having so many different categories due to the decimals.
Can somebody provide any sort of guidance on this??
Relevant answer
Answer
Lisa Ss it doesn't make sense to pool the data that way if you believe that you have ordinal data. You cannot simply calculate a mean or a sum score from the items, since ordinal data doesn't provide that information; that demands a metric scale. Therefore, your "average" score is not appropriate, and hence neither is the "grouping".
In my opinion you have several options:
1) Use an ordinal multilevel model to account for the repeated measures and the ordinality.
2) Conduct an ordinal confirmatory factor analysis, calculate the factor score for the latent variable and use this as a dependent variable in an OLS regression.
3) Do everything with an ordinal SEM, covering both the structural and the measurement model.
4) Treat the ordinal items as metric (not recommended).
Maybe others have different approaches, please share.
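For readers who want to see what an ordinal model looks like in code, here is a minimal sketch of a proportional-odds (cumulative logit) fit on a single synthetic ordinal item, using statsmodels' OrderedModel. All data here are fabricated for illustration; this is not the poster's SPSS workflow:
```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)                      # a single continuous predictor
latent = 1.2 * x + rng.logistic(size=n)     # latent propensity driving the item

# Cut the latent variable into an ordinal 1-5 item (fabricated data)
item = pd.cut(latent, bins=[-np.inf, -2, -0.5, 0.5, 2, np.inf],
              labels=[1, 2, 3, 4, 5]).astype(int)

# Proportional-odds (cumulative logit) model
model = OrderedModel(item, x.reshape(-1, 1), distr='logit')
res = model.fit(method='bfgs', disp=False)
print(res.summary())
```
Such a model is fit per item; the multilevel and SEM options above are the principled ways to combine several items.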
  • asked a question related to Data Analysis
Question
3 answers
Which free software is suitable for XRD data analysis and how can I get it?
Relevant answer
Answer
Eleanor Bakker Thank you for your response. I am looking for quantitative phase analysis.
  • asked a question related to Data Analysis
Question
2 answers
Hi,
Kindly help me understand when we should use AMOS versus SMART-PLS for data analysis. Thanks.
Regards,
Relevant answer
Answer
The two programs use different statistical techniques: AMOS uses covariance-based SEM, whereas SMART-PLS uses partial least squares. There is some critique concerning PLS, and I would stick with the covariance-based approach (but not necessarily with AMOS). But you have to know what you are doing and what is more common in your field.
  • asked a question related to Data Analysis
Question
3 answers
I want to ask you a question about data analysis in psychology. I have two independent variables, one being group (between subjects) and one being age (continuous), and the dependent variable is a 6-point Likert score. I intend to use regression for the data analysis, and I have three questions:
1. Should the subject ID (the number of each subject) be included in the model as a random effect? If it is included, the model is a linear mixed model (LMM); if it is not, it is multiple linear regression, right?
2. In the case of multiple linear regression, should I directly build the full model and look at the influence of each independent variable, or should I build the full model, compare it with the null model, and then analyze with stepwise elimination?
3. When I do my analysis, do I need to center both age (the continuous variable) and the rating, or only age?
Relevant answer
Answer
you are welcome
  • asked a question related to Data Analysis
Question
1 answer
Hello,
I am currently working on the data analysis for my Ph.D. project comparing the probability of occurrence of species density and richness (in hectare basis) between three different land use types using count data. Due to the design of the field study, I decided to use GLMM with Poisson distribution as I have various random effects and sites as a random effect that need to be accounted for. The model seems to be doing the job, however, I am not really sure how to report the results. I am using the lme4 package in the R console to analyze my data.
Thank you
Relevant answer
Answer
You would report it similarly to how you would report an OLS ANOVA.
Usually, people won't report anything about the random effects, except to mention that they were treated as random effects in the model. But you can also do a hypothesis test on the random effects.
BTW, the last time I checked, lme4 doesn't have methods to report p-values for groups. The authors take that stance on principle. But you can use the lmerTest package to get an ANOVA-like table.
  • asked a question related to Data Analysis
Question
2 answers
Can anyone please explain how to analyse data collected using the Brief COPE 28-item scale?
As I understand it, I won't get a total score but rather a score for each subscale, and one study mentions using normative data from a heart failure study to calculate percentile ranks. Can anyone please help? I don't understand the data analysis part after collection with this scale.
Relevant answer
Answer
I have the same question. How to interpret the result?
  • asked a question related to Data Analysis
Question
5 answers
Hi,
I have just installed and used spss 29. I was using spss 27.
I am analyzing data with a crossed random effects mixed model.
I am using syntax for this type of analysis. With the exact same syntax and database, I obtain different results in SPSS 29 and SPSS 27!
Specifically, the same model (that I call model 3) run with spss 27 was not giving me a warning whereas with spss 29 I get a warning (The final Hessian matrix is not positive definite although all convergence criteria are satisfied. The MIXED procedure continues despite this warning. Validity of subsequent results cannot be ascertained.).
Another case: with a slightly simpler model that I call model 2, I have no warnings but the results with spss 27 and spss 29 are not identical (e.g. BIC is different).
Is anyone experiencing the same or similar ?
Relevant answer
Answer
Thanks. I am running with SPSS29 a syntax written with SPSS27.
  • asked a question related to Data Analysis
Question
4 answers
Discuss the study design, the relevant data to be collected, how two animal species (i.e. cattle and pigs) can be incorporated into one paper for discussion, and the most relevant data analysis techniques for data that spans 5 years.
Relevant answer
Answer
Abattoirs are a unique location for a retrospective study of Taenia solium, Taenia saginata and Echinococcus granulosus, as all the affected hosts (pigs, cattle, dogs and man) may be found in abattoirs, especially in Nigeria, where stray dogs in abattoirs are not uncommon.
But proper record keeping must be in place; without excellent records, it may be difficult to carry out your research.
  • asked a question related to Data Analysis
Question
9 answers
I'm wondering how I can, in a simple way, quantitatively analyze data that has been reported in HH:MM format but whose output is text/character (e.g. "09:00AM", "9AM", "10am").
Relevant answer
Answer
Actually, it's clear that Uzair Essa Kori doesn't care whether the chat bot's answer is right or wrong, since I bet he couldn't write that Python code himself if his honour depended on it.
ResearchGate is failing dismally in taking frauds like him seriously. The whole value of a community of knowledge sharing is being rapidly undermined by plagiarists like him.
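Setting that dispute aside, the underlying task is mechanical enough to answer directly. A minimal pandas sketch (the example strings are hypothetical, modeled on the question's "09:00AM"/"9AM"/"10am" formats) that normalises the mixed text before parsing:
```python
import pandas as pd

raw = pd.Series(["09:00AM", "9AM", "10am", " 7:15 pm "])   # hypothetical inputs

# Normalise: trim, upper-case, drop inner spaces, add ":00" when minutes are missing
clean = (raw.str.strip()
            .str.upper()
            .str.replace(" ", "", regex=False)
            .str.replace(r"^(\d{1,2})(AM|PM)$", r"\1:00\2", regex=True))

times = pd.to_datetime(clean, format="%I:%M%p")
print(times.dt.hour)   # numeric hours, ready for quantitative analysis
```
Converting to a datetime first, rather than stripping "AM"/"PM" by hand, keeps the 12-hour arithmetic correct.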
  • asked a question related to Data Analysis
Question
3 answers
I was performing an RNA-seq data analysis. I did my alignment using RNA-STAR and then ran featureCounts. I used the latest assembly of the human genome, i.e. HG38.p14. But after the featureCounts step I noticed that some genes were counted abnormally: as the screenshot shows, the ABO gene appears twice, once as 'ABO' and then as 'ABO_1', and many more genes appear like this. In featureCounts I selected the option "count them as single fragment". The dataset was Illumina paired-end reads.
1. Does anyone know the reason behind this?
2. Did I make a mistake during the process that I didn't notice?
3. What should I do in this situation?
Thank you very much for your time.
Relevant answer
Answer
I think the gene IDs in the GTF or GFF3 file you used for building the alignment index might not be plain gene IDs but transcript IDs (including splice variants), which causes multiple entries for one gene. You'd better use the genome annotation and sequence files (GTF and FASTA) from Ensembl (with the GTF available at https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz and the FASTA at https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz), or download a pre-built index from the website of your alignment tool.
  • asked a question related to Data Analysis
Question
3 answers
Data generation (collection) is a key and critical component of a qualitative research project. The question is, how can one make sure that sufficient data have been generated/collected?
Relevant answer
Answer
The very simple way to know that you have collected sufficient data is when you get the same answers from respondents again and again, i.e. you are no longer getting any new information from respondents.
  • asked a question related to Data Analysis
Question
3 answers
Let's find the most essential and reliable no-code data science tools to speed up the elaboration of the research results. Thanks to Avi Chawla (source: LinkedIn post), I have some suggestions for you here. Let us know your tips.
Gigasheet
  • Browser-based no-code tool to analyze data at scale
  • Use AI to conduct data analysis
  • It's like a combination of Excel + Pandas with no scale limitations
  • Analyze up to 1B rows
Mito
  • Create a spreadsheet interface in Jupyter Notebook
  • Use Mito AI to conduct data analysis
  • Automatically generates Python code for each analysis
PivotTableJS
  • Create Pivot tables, aggregations, and charts using drag-and-drop
  • Add heatmaps to tables
  • Works within Jupyter notebook
Drawdata
  • Draw any 2D scatter dataset by dragging the mouse
  • Export the data as DataFrame, CSV, or JSON
  • Create a histogram and line plot by dragging the mouse
PyGWalker
  • Open a tableau-style interface in Jupyter notebook
  • Analyze a DataFrame as you would in Tableau
Visual Python
  • A GUI-based Python code generator
  • Import libraries, perform data I/O, create plots, and write code for ML models by clicking buttons
Tensorflow Playground
  • Provides an elegant UI to build, train, and visualize neural networks
  • Browser-based tool
  • Change data, model architecture, hyperparameters, etc. by clicking buttons
ydata-profiling
  • Generate a standardized EDA report for your dataset
  • Works in a Jupyter notebook
  • Covers info about missing values, data statistics, correlation, and data interactions
Relevant answer
Answer
I can certainly tell you about some popular no-code data science tools that people often use:
1. **Tableau**: It's known for its data visualization capabilities, making it easy to create interactive charts and dashboards.
2. **Google Data Studio**: Great for creating custom, shareable reports and dashboards using data from various sources.
3. **IBM Watson Studio**: Offers a wide range of data science and machine learning tools with a user-friendly interface.
4. **RapidMiner**: Known for its powerful data preparation and machine learning features without the need for coding.
5. **KNIME**: A visual platform for data analytics, reporting, and integration.
6. **DataRobot**: Focuses on automated machine learning, making it easier to build predictive models.
7. **Alteryx**: Combines data preparation, data blending, and analytics into a single platform.
The choice of tool depends on your specific needs and preferences. These tools can be valuable for those who want to perform data science tasks without extensive coding knowledge.
  • asked a question related to Data Analysis
Question
13 answers
My areas of interest are sustainable business and data analysis
Relevant answer
Answer
According to the data provided, the most valuable thing to research is the tariff and customs barriers of the country itself and of the other markets it wants to reach, as well as consumer tastes.
  • asked a question related to Data Analysis
Question
3 answers
If I have a matrix of 16x12 and I want to create 3 classes, is there any machine learning technique that can identify the lower and upper boundary levels for each of the classes?
Relevant answer
Answer
The answer of Qamar Ul Islam is obviously AI-generated.
I would recommend that you use one of the many clustering algorithms available in the literature, k-means for example. However, if you already know which samples belong to which classes, what you want to find is a threshold between them; for that you can use a PCA approach, and there are several algorithms for it, such as confidence ellipses or Voronoi tessellations... It depends on what exactly you want.
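A minimal sketch of the k-means suggestion, with class boundaries read off afterwards. The 16x12 matrix here is random, and the "boundary" is computed on the row means purely for illustration; any other per-row score could be substituted:
```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((16, 12))                 # a 16 x 12 matrix, as in the question

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Read off per-class lower/upper boundaries on the rows' mean values
row_scores = X.mean(axis=1)
for k in range(3):
    members = row_scores[km.labels_ == k]
    print(f"class {k}: lower = {members.min():.3f}, upper = {members.max():.3f}")
```
With only 16 rows the cluster assignments can be unstable, so inspecting several random initialisations is advisable.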
  • asked a question related to Data Analysis
Question
3 answers
which one is the best online course for data analysis?
Relevant answer
Answer
s. Rama Gokula Krishnan has a point. This question is too vast for a simple answer, so the answer defaults to YouTube. Besides SPSS, Stata has a good YouTube channel, Orange has a brilliant one (it helps that Orange has a brilliant object-based interface), and there's lots on jamovi and JASP too.
Only you, Nasantogtokh Erdenebileg , can judge which is the best, because only you know what's best for your needs.
  • asked a question related to Data Analysis
Question
5 answers
Please mention the names of open-source software.
Relevant answer
Answer
Claude Kiki - I don't think that Analyst can be considered Open Source.
  • asked a question related to Data Analysis
Question
1 answer
I am working in TVET in India and want to do a research study on the earning and skill outcomes of TVET using data analytics. We have an active database of 15,000 apprentices whom we are currently engaging, so data collection should not be a problem.
Additionally, I would like to develop a framework for a Dual System of Training for implementation in India. I am seeking reference papers and general advice...
Relevant answer
Answer
Ideally, you should think about the variables in your data set and how you can use them to answer your research question.
  • asked a question related to Data Analysis
Question
3 answers
I'm currently trying to find a fit for my data, but I'm struggling to find the correct equivalent circuit. I'm working with a two-electrode system, and my electrolyte is my analyte.
I can't use the Randles circuit, and I wondered why it describes just the electrolyte, the double layer, and one electrode.
In my understanding, the circuit (in the case of a two-electrode system) should consist of the resistor and capacitor of my working electrode, the resistor of my electrolyte, and again a capacitor and resistor for my counter electrode.
I'm quite new to the topic, so if someone has an answer or an idea, it would be very helpful!
Relevant answer
We usually consider only one electrode (the working electrode, WE) because the measurement is referenced to the reference electrode (RE); thus, all the interfacial processes of interest take place at the WE. The equivalent circuit we propose therefore considers all the current pathways contributing to the WE vs. RE impedance.
In the case of a two-electrode cell, you must consider the equivalent circuit that describes the current pathways contributing to the impedance between your anode and cathode.
Suppose charge transfer is absent in your system. In that case, the equivalent circuit representing the interconnection of anode and cathode immersed in an electrolyte is a resistance in series with a double-layer capacitance (probably a CPE). In your proposal, two resistors and two capacitors are equivalent to only one resistor and one capacitance; this is in agreement with Kirchhoff's laws.
If there is charge transfer in your system, you must include an additional resistor describing the global charge transfer between the anode and the cathode.
Best regards!
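The series-reduction point can be checked numerically. A minimal sketch with hypothetical component values, showing that two series double-layer capacitances plus the electrolyte resistance produce exactly the same impedance spectrum as one equivalent capacitance in series with one resistance:
```python
import numpy as np

# Hypothetical values: double-layer capacitances of the two electrodes
# plus the electrolyte resistance, all in series (no charge transfer)
C_anode, C_cathode, Rs = 2e-6, 5e-6, 10.0

w = 2 * np.pi * np.logspace(-1, 5, 7)              # angular frequencies (rad/s)
Z_two = Rs + 1 / (1j * w * C_anode) + 1 / (1j * w * C_cathode)

# Series capacitances combine reciprocally into one equivalent capacitance
C_eq = 1 / (1 / C_anode + 1 / C_cathode)
Z_one = Rs + 1 / (1j * w * C_eq)

print(np.allclose(Z_two, Z_one))                   # True: the spectra are identical
```
This is why impedance data alone cannot separate the two interfaces in a two-electrode cell; they are mathematically indistinguishable from a single equivalent interface.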
  • asked a question related to Data Analysis
Question
2 answers
He has sufficient knowledge of ready-made statistical applications and data analysis methods
Relevant answer
Answer
I think they should have done a subject in statistics at University or if not University, at least High School level.
  • asked a question related to Data Analysis
Question
3 answers
Dear all,
Are there any linking variables between, let's say, TIMSS and PIRLS (both IEA studies), or studies of this kind, which could enable a combined analysis of these data sets when students participate in both studies?
I have seen some papers using data from different large studies, yet I'm not sure what approaches can be taken in this kind of analysis... What are your thoughts on this?
Relevant answer
No, not possible. TIMSS, PIRLS and ICCS are conducted by the same organization (IEA). PIRLS (grade 4) has some overlap with TIMSS (grades 4 and 8), and ICCS (grade 8) has some overlap with TIMSS. However, these are not the same students taking the test. These studies are cross-sectional and are interested in results at the population level; in every cycle the samples are drawn independently. So, although the target populations are always the same, the sampled students are always different. This applies not only across cycles of the same study, but also across studies. There is simply no match between students across the studies and across any studies' cycles. PISA is conducted by a different organization (OECD), which has nothing to do with the IEA. The sampling strategy of PISA is not even grade-based but age-based, and the samples are drawn from an independent sampling frame. So, there is no way to match them to the rest of the studies. In addition, the cycles of all studies happen in different years, so it is impossible to sample the same students across the different studies.
Saying all this, TIMSS and PIRLS coincided in 2011. For the grade 4 assessment in both studies they used the same sample. This is the only case where you have match of students. This is why the IEA named this "Joint TIMSS and PIRLS" or TiPi. However, the data from this joint study is already more than a decade old and I don't know if it will be of any relevance for you.
  • asked a question related to Data Analysis
Question
6 answers
Exploring the role of AI and data analytics in improving our ability to predict and manage pandemics
Relevant answer
Answer
Enhancing Pandemic Forecasting and Response Through AI and Big Data Analytics
In the contemporary technological epoch, the synergistic confluence of Artificial Intelligence (AI) and Big Data Analytics (BDA) offers prodigious potentialities in the domain of epidemiology, specifically in the prognostication and management of pandemics. Below is an elucidative exegesis delineating the role of AI and BDA in this imperative juncture of public health.
  1. Temporal and Spatial Epidemiological Trend Detection: Heterogeneous Data Integration: AI methodologies, particularly deep learning architectures like convolutional neural networks (CNNs), can seamlessly amalgamate variegated data streams, ranging from climatic datasets to population mobility patterns. This facilitates the discernment of latent epidemiological trajectories. Geospatial Analytics: Leveraging geospatial big data, AI models can undertake spatial clustering and hotspot detection and generate spatial epidemiological landscapes, thereby optimizing surveillance operations.
  2. Genomic Epidemiology and Phylogenetics: Pathogen Genomic Sequence Analysis: Deep learning frameworks, coupled with recurrent neural networks (RNNs) and long short-term memory (LSTM) units, can decode nucleotide sequences, enabling real-time tracking of pathogenic mutations and the subsequent epidemiological repercussions. Phylodynamic Modeling: The integration of phylogenetic trees with epidemiological data enhances pathogen transmission chain detection, assisting in the early intercession of superspreading events.
  3. Predictive Analytics and Forecasting: Epidemic Trajectory Forecasting: Leveraging techniques such as time series analysis, Gaussian processes, and Bayesian inference models, AI delineates potential epidemic trajectories, enhancing proactive pandemic management strategies. Sentinel Surveillance Augmentation: By harnessing natural language processing (NLP) and sentiment analysis on digital platforms, it's plausible to detect epidemiological anomalies and incipient outbreaks, thereby amplifying sentinel surveillance efficacy.
  4. Optimization of Resource Allocation: Reinforcement Learning for Policy Decisions: AI-driven reinforcement learning algorithms can simulate various pandemic response strategies, thereby elucidating optimal policy frameworks and resource allocations that minimize societal and economic ramifications. Supply Chain Analytics: Through BDA, the healthcare supply chain can be optimized in real time, ensuring efficacious distribution of essential commodities like personal protective equipment (PPE) and vaccines.
  5. Socio-behavioral Analytics and Public Engagement: Sentiment Analysis on Public Discourse: By applying NLP to social media feeds and public discourse platforms, AI can gauge public sentiment, facilitating the development of targeted communication strategies and ensuring efficacious public engagement. Epidemiological Simulation Models: Agent-based modeling and cellular automata, driven by AI, can simulate various socio-behavioral scenarios, shedding light on potential transmission dynamics in diverse sociocultural milieus.
In summation, the concomitant integration of AI and BDA transcends traditional epidemiological paradigms, proffering an enhanced acumen in pandemic forecasting and response. As we embark upon the Fourth Industrial Revolution, the quintessential role of technologically-driven methodologies in public health resilience becomes incontrovertibly manifest.
  • asked a question related to Data Analysis
Question
3 answers
This question explores the potential of cutting-edge technology to provide early warnings for future pandemics and revolutionize our approach to pandemic preparedness.
Relevant answer
Answer
The prediction of novel pandemics is a complex challenge, and while advanced machine learning algorithms and AI-driven data analysis can play a role in pandemic preparedness, they cannot reliably predict the emergence of entirely new and unforeseen pandemics with high precision. Here are some key considerations:
  1. Lack of Historical Data: Predictive models, including machine learning and AI, typically rely on historical data to identify patterns and make predictions. However, novel pandemics by definition involve new pathogens that have not been previously observed in human populations. Therefore, there may be limited or no relevant historical data to analyze.
  2. Complexity and Unpredictability: The emergence of a novel pandemic involves a multitude of complex factors, including the mutation and transmission dynamics of pathogens, zoonotic spillover events, human behavior, international travel patterns, and more. These factors interact in unpredictable ways, making it difficult to build accurate predictive models.
  3. Data Limitations: While diverse datasets can provide valuable information for monitoring and responding to known diseases, they may not capture all the relevant factors associated with the emergence of a new pandemic. The data might also be incomplete, biased, or subject to reporting delays.
  4. Rare Events: Novel pandemics are rare events with significant societal impacts. Predictive models struggle with rare events because they lack sufficient examples to learn from. Most data-driven models are better suited for more common, recurring events.
  5. Ethical and Privacy Concerns: Collecting and analyzing data for the purpose of predicting novel pandemics could raise ethical and privacy concerns. Balancing the need for public health preparedness with individual rights and privacy is a challenging issue.
  6. Expertise and Collaboration: Effective pandemic preparedness and response require collaboration between AI/ML experts, epidemiologists, virologists, public health officials, and other domain experts. Expert judgment and insights are critical in interpreting model outputs and making informed decisions.
While AI and machine learning can't reliably predict the emergence of novel pandemics, they can contribute to pandemic preparedness and response in several ways:
  • Early Warning Systems: AI algorithms can analyze diverse datasets (e.g., social media, medical reports, environmental data) to detect unusual patterns or signals that might indicate the early stages of an outbreak.
  • Epidemiological Modeling: AI can help build more accurate and dynamic epidemiological models that assist in scenario planning and resource allocation during a pandemic.
  • Drug Discovery and Vaccine Design: AI can accelerate drug discovery and vaccine design by simulating molecular interactions and predicting potential candidates.
  • Healthcare Resource Allocation: Machine learning can help hospitals and healthcare systems optimize resource allocation during a pandemic, such as ICU bed availability and staff scheduling.
In summary, while advanced AI and machine learning techniques can enhance pandemic preparedness and response, predicting entirely novel pandemics remains a highly challenging task due to the inherent complexity, unpredictability, and data limitations associated with such events. Efforts should focus on a holistic approach that combines data-driven analysis with expert knowledge, surveillance systems, and international collaboration to mitigate the impact of pandemics.
  • asked a question related to Data Analysis
Question
3 answers
I apologize in advance as I am new to community data analysis.
I am currently analyzing differences in fish assemblages due to temperature extremes. I am looking at data over multiple years (the years were chosen based on the average temperature of each year, taking the warmest and coldest years from a larger dataset).
I chose three sampling sites within a single bay; the sites are used as replicates for my analysis.
I am interested in determining whether the assemblages differ significantly between years. The response variable is the count of each fish species collected. The ultimate goal is to determine whether there is any difference between assemblages in a given year and whether the significant differences occur between warm and cold years. Univariate biodiversity indices (richness, Shannon, etc.) were calculated, as well as SIMPER, to determine which species contribute most to the results.
I am also looking at counts in the given years over multiple months but will keep the months separate from one another (i.e. the August 2000-2010 analysis is separate from the September 2000-2010 analysis).
I will be using R as my statistical program, but my question is which analysis is appropriate to test for significance. My only factor seems to be year, so I am not sure whether this is even considered multivariate in this case and whether ANOSIM is appropriate.
I would appreciate any feedback or constructive criticism at this point as I am pretty new to multivariate studies.
Relevant answer
Answer
Thank You! Engr. Tufail
  • asked a question related to Data Analysis
Question
4 answers
Dear all,
I have the following data and questions related to the issues in the results.
Independent variable
I did an online survey and asked parents questions about neighbourhood-built environment factors like neighbourhood type, accessibility, and neighbourhood safety. These are major factors that have sub-questions inside them. These questions are on a five-point Likert scale. Therefore, the parents’ responses are on a scale of 1 to 5. These are my independent variables.
I have calculated the average score based on the response to each question by the parents under the major factor. For example, in the neighbourhood type factor, I asked five questions. Each question was on a five-point Likert scale. So, If Q1 = 5 (response given by parent), Q2 = 3, Q3 = 2, Q4 = 1, Q5 = 1.
Then the subscale score for neighbourhood type was (5 + 3 + 2 + 1 + 1) / 5 = 2.4.
A similar procedure was adopted for the other factors. The final score for each factor was converted into a binary categorical variable, with a cut-off value of 2.5.
Dependent variable
I also asked questions about the physical activity of their eldest adolescent kid. Adolescents' response was asked in the number of days in a week. This is my dependent variable, and I have taken it as a continuous variable.
Total data
I have collected this data for three cities in India. So, for the final dataset of the model, I have combined all the data points into a single Excel sheet.
Aim
I aim to understand the effect of the parents' physical activity and built-environment factors on adolescent physical activity. Therefore, I have used a multilevel linear regression model to investigate the relationship between children's PA and the neighbourhood built environment. In this model, I have taken the city variable as a random effect and the adolescent's age and neighbourhood residency duration as fixed effects.
Results
The results are unexpected, and I am not convinced. They are also statistically insignificant.
My questions are as follows:
  1. Is averaging and reporting the Likert scale as a subscale score correct?
  2. Is the binary categorisation of the factors a correct approach?
3. Is it correct to use a multilevel linear regression model for this type of data, given that all the independent variables are fixed?
I would be grateful for the discussion and input to help me in this analysis.
warm regards
Laxman
Relevant answer
Answer
  1. Averaging Likert scale responses to create subscale scores is a valid practice, but ensure question validity and consider the distribution of your scores. Binary categorization of factors simplifies analysis but may oversimplify complex relationships. Multilevel linear regression is suitable for nested data, yet since all your independent variables are fixed, reconsider its appropriateness and explore adding covariates.
  • asked a question related to Data Analysis
Question
3 answers
In a causal model (such as multiple IVs and single DV) with presence of a mediator or moderator, do we have to consider such mediator or moderator when assessing the parametric assumptions or do we have ignore them and consider only the IV/s and DV in the model?
Relevant answer
Answer
Since you are going to involve a third variable that will eventually impact your results, you need to take that third variable into account and check for normality and other assumptions before you carry out your final analysis. However, while analysing the IV and DV, if the data is not found to be normally distributed, then a mediator or moderator is less likely to help ensure normality. In such a scenario, you could simply opt for non-parametric tests.
  • asked a question related to Data Analysis
Question
17 answers
Hi everyone
I'm facing a real problem when trying to export data results from imageJ (fiji) to excel to process it later.
The problem is that I have to change the dots (.) and commas (,) manually, even after changing the properties in Excel (from , to .), so that the numbers are not read as thousands: say I have 1,302 (one point three zero two); it is read as 1302 (one thousand three hundred and two) when I transfer it to Excel...
Recently I found a nice plugin (Localized copy...) that can change the number format locally in ImageJ so the values can be used easily by Excel.
Unfortunately, this plugin has a bug: it can only copy one line of my huge data set, and only once (so I have to close and reopen the image each time).
Has anyone faced this problem? Can anyone suggest another solution, please?
Thanks in advance
Problem finally solved... I got the new version of the 'Localized copy' plugin from its author, Mr Wolfgang Gross (not sure if I have permission to upload it here).
Relevant answer
Answer
Jonas Petersen cool! some answers after years XD
  • asked a question related to Data Analysis
Question
7 answers
The fourth technological revolution currently underway is characterised by rapidly advancing ICT information technologies and Industry 4.0, including but not limited to machine learning, deep learning, artificial intelligence, ... what's next? Intelligent thinking autonomous robots?
The fourth technological revolution currently under way is characterised by rapidly advancing ICT and Industry 4.0 technologies, including but not limited to machine learning, deep learning and artificial intelligence. Machine learning (also called machine self-learning or machine learning systems) denotes the field of artificial intelligence concerned with algorithms that improve automatically through experience, that is, through exposure to large data sets. Machine learning algorithms build a mathematical model of data processing from sample data, called a training set, in order to make predictions or decisions without being explicitly programmed by a human to do so. They are used in a wide variety of applications, such as spam filtering or image recognition, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Deep learning is a subcategory of machine learning that involves deep neural networks, i.e. networks with multiple layers of artificial neurons. Deep learning techniques are designed to improve, among other things, automatic speech processing, image recognition and natural language processing. Simple neural networks can be designed manually so that a specific layer detects specific features and performs specific data processing, while learning consists of setting appropriate weights on the basis of large amounts of data. In large neural networks, the deep learning process is automated and self-contained to a certain extent: the network is not designed to detect specific features but detects them from appropriately labelled data sets. Both such data sets and the networks themselves must be prepared by specialists, but the features are then detected by the programme itself. Large amounts of data can therefore be processed, and the network can automatically learn higher-level feature representations, which means it can detect complex patterns in the input data. Accordingly, deep learning systems are built on Big Data Analytics platforms so that learning is performed on a sufficiently large amount of data. Artificial intelligence (AI) is the 'intelligent', multi-criteria, advanced, automated processing of complex, large amounts of data carried out in a way that alludes to certain characteristics of human intelligence exhibited by thought processes. As such, it is the intelligence exhibited by artificial devices, including certain advanced ICT and Industry 4.0 systems and devices equipped with these technological solutions. The concept of artificial intelligence is contrasted with natural intelligence, i.e. that of humans. Artificial intelligence thus has two basic meanings. On the one hand, it is a hypothetical intelligence realised through a technical rather than a natural process.
On the other hand, it is the name of a technology and a research field of computer science and cognitive science that also draws on the achievements of psychology, neurology, mathematics and philosophy. In computer science and cognitive science, artificial intelligence refers to the creation of models and programmes that simulate at least partially intelligent behaviour. Artificial intelligence is also considered in philosophy, within which a theory concerning the philosophy of artificial intelligence is developed, and it is a subject of interest in the social sciences. The main task of research and development work on artificial intelligence and its new applications is the construction of machines and computer programmes capable of performing selected functions analogously to the human mind, including processes that do not lend themselves to numerical algorithmisation. Such problems are sometimes referred to as AI-hard and include decision-making in the absence of complete data, analysis and synthesis of natural languages, logical (rational) reasoning, automatic theorem proving, computer logic games such as chess, intelligent robots, and expert and diagnostic systems. Artificial intelligence can be developed and improved by integrating it with machine learning, fuzzy logic, computer vision, evolutionary computing, neural networks, robotics and artificial life. AI technologies have developed rapidly in recent years, driven by their combination with other Industry 4.0 technologies, by microprocessors and digital computing devices with ever-increasing capacity for multi-criteria processing of ever-larger amounts of data, and by the emergence of new fields of application. Recently, the development of artificial intelligence has become a topic of discussion in various media due to ChatGPT, an open-access, automated, AI-based solution with which Internet users can hold a kind of conversation. The solution is based on, and learns from, a collection of large amounts of data extracted in 2021 from specific data and information resources on the Internet. The development of artificial intelligence applications is so rapid that it is ahead of the process of adapting regulations to the situation, and the new applications being developed do not always generate exclusively positive impacts. Potential negative effects include the generation of disinformation on the Internet: information crafted using artificial intelligence that is not in line with the facts and is disseminated on social media sites. This raises a number of questions regarding the development of artificial intelligence and its new applications, the possibilities that will arise under the next generations of artificial intelligence, and the possibility of teaching artificial intelligence to think, i.e. to realise artificial thought processes in a manner analogous or similar to those of the human mind.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
The fourth technological revolution currently taking place is characterised by rapidly advancing ICT information technologies and Industry 4.0, including but not limited to machine learning technologies, deep learning, artificial intelligence, .... what's next? Intelligent thinking autonomous robots?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Relevant answer
Answer
The drive to build autonomous, thinking, intelligent robots, androids raises many ethical controversies and potential risks. In addition to this, the drive to build artificial consciousness as a kind of continuation of the development of artificial intelligence is also controversial.
What is your opinion on this topic?
Best regards,
Dariusz Prokopowicz
  • asked a question related to Data Analysis
Question
2 answers
Hello everyone,
I am Danillo Souza, currently a post-doc researcher at the Basque Center for Applied Mathematics (BCAM), working in the Mathematical, Computational and Experimental Neuroscience group (MCEN). One of the challenges of my work is to derive optimal tools to extract topological and/or geometrical information from big data.
I am trying to submit a work to arXiv and, unfortunately, an endorsement in Physics - Data Analysis and Statistics is required. I was wondering whether some researcher could be my endorser in this area.
Beforehand, I appreciate your efforts in trying to help me.
With kind regards,
Danillo
Danillo Barros De Souza requests your endorsement to submit an article to the physics.data-an section of arXiv. To tell us that you would (or would not) like to endorse this person, please visit the following URL: https://arxiv.org/auth/endorse?x=UOKIX3 If that URL does not work for you, please visit http://arxiv.org/auth/endorse.php and enter the following six-digit alphanumeric string: Endorsement Code: UOKIX3
Relevant answer
Answer
Publish your paper for free
_________________________
Dear Researchers and postgraduate students
MESOPOTAMIAN JOURNAL OF BIG DATA (MJBD), issued by Mesopotamian Academic Press, welcomes original research articles, short papers, long papers, and review papers for publication in the next issue. The journal does not require any publication fee or article processing charge, and all papers are published for free.
Journal info.
1 -Publication fee: free
2- Frequency: 1 issues per year
3- Subject: computer science, Big data, Parallel Processing, Parallel Computing and any related fields
4- ISSN: 2958-6453
5- Published by: Mesopotamian Academic Press.
6- Contact: email: [email protected]
Managing Editor: Dr. Ahmed Ali
The journal indexed in
1- Crossref
2- DOAJ
3- Google scholar
4- Research gate
  • asked a question related to Data Analysis
Question
4 answers
Does anyone have a data analysis software setup to share with me?
Relevant answer
Answer
Okay, noted. Thanks, Brian.
  • asked a question related to Data Analysis
Question
1 answer
If, when performing data analysis, the coefficient of variation (CV) turns out to be 34, what is the implication?
Relevant answer
Answer
The implication is that the standard deviation is 34% of the arithmetic mean (assuming the CV was computed as 100 × SD/mean and the calculations were done correctly).
There is no universal "optimal" value of the CV; what counts as acceptable depends on the field and the type of measurement.
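As a minimal illustration in R (the numbers here are made up, not taken from any real dataset):
# Hypothetical measurements; replace with your own data
x <- c(85, 120, 64, 150, 98, 132, 71, 110)
# Coefficient of variation: standard deviation as a percentage of the mean
cv <- 100 * sd(x) / mean(x)
cv
A CV of 34 would simply mean that the typical spread of the data is about one third of its average level.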
  • asked a question related to Data Analysis
Question
3 answers
Suppose I want to analyse 100 years of monthly mean flow data for a particular month.
Relevant answer
Answer
For skewed data, it is better to apply a log-transform first to see whether it improves the distributional assumptions (typically normality) that many changepoint algorithms require. Here is an example showing the log-transformation of a COVID infection time series: https://stats.stackexchange.com/questions/434990/changepoint-analysis-with-missing-data
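A minimal R sketch of that workflow, assuming the 'changepoint' package is installed and using simulated skewed data in place of real flow records:
library(changepoint)
set.seed(1)
# Hypothetical skewed 100-year series of mean flows for one month
flow <- rlnorm(100, meanlog = 4, sdlog = 0.6)
# Log-transform first: skewed hydrological data is often closer to normal
# on the log scale, which many changepoint methods assume
fit <- cpt.mean(log(flow), method = "PELT")
cpts(fit)  # estimated changepoint locations
plot(fit)  # visual check of the segmentation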
  • asked a question related to Data Analysis
Question
3 answers
What are the potential research implications of employing entropy for one application and ARCH GARCH models for another application when applied to the same dataset? How might the utilization of these two different approaches influence the outcomes and insights derived from the data analysis?
Relevant answer
Answer
Many thanks for your response, Lindly, but let me ask two follow-up questions:
  1. In what ways do entropy and ARCH/GARCH models complement each other when applied to the same dataset, and how do these applications affect the outcomes of the analysis?
  2. How can entropy be employed to identify patterns in data, and how does this differ from the application of ARCH/GARCH models in data analysis?
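For concreteness, a rough R sketch of applying both tools to one simulated return series; the 'fGarch' package and every number below are assumptions, not a prescription:
library(fGarch)
set.seed(42)
ret <- rnorm(1000, mean = 0, sd = 0.01)  # hypothetical daily returns
# Shannon entropy of the binned return distribution:
# higher entropy indicates a more uniform, less predictable distribution
p <- prop.table(table(cut(ret, breaks = 20)))
p <- p[p > 0]
entropy <- -sum(p * log2(p))
# GARCH(1,1) for time-varying volatility (conditional heteroskedasticity)
fit <- garchFit(~ garch(1, 1), data = ret, trace = FALSE)
summary(fit)
entropy
Entropy summarises the shape of the unconditional distribution, while the GARCH fit models the dynamics of volatility over time, so the two answer different questions about the same data.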
  • asked a question related to Data Analysis
Question
3 answers
Today more and more people are gaining data analytics skills, for example to fill empty vacancies. Cities and communities are also shaped by this trend: there could be more people available to help maintain reproducible code bases for public policy analysis. The infrastructure needs to be in place to support broader public engagement. What do you think about this phenomenon?
Relevant answer
Answer
Participatory analytics skills are becoming increasingly important in the workforce. These skills can help individuals and organizations make better decisions by analyzing data and identifying patterns and trends.
  • asked a question related to Data Analysis
Question
1 answer
This pertains to the transcendental aspect of phenomenology.
Relevant answer
Answer
Hi,
In phenomenology, one employs the 'epoché' to 'bracket out' or suspend judgements about the external world, concentrating instead on subjective experience. During data analysis, this facilitates the isolation of phenomena for scrutiny, devoid of preconceptions or theoretical frameworks. This is in line with the transcendental aspect of phenomenology, which emphasises consciousness and subjective experience.
Hope this helps.
  • asked a question related to Data Analysis
Question
4 answers
Is it possible to run Sharpe's Index, Value at Risk (historical or Monte Carlo model) and Monte Carlo simulation simultaneously on the same data set to get a detailed understanding of the five-year performance of individual banks versus the Bank Nifty Index?
Literature support exists for the individual methods, but is it feasible to apply them in a combined manner?
Can you let me know your suggestions on this?
Relevant answer
Answer
Hi Alan! Yes, but with caution, because combining multiple finance models in a single research project is doable and helpful if the research objectives, data compatibility, and model assumptions are carefully considered. However, some epistemological issues must be addressed.
1. You need to ensure that each model's data requirements are compatible and integrate well. Different models have different data needs, and if the data are not directly comparable or matched, the findings may be inaccurate or biased.
2. No model is developed without assumptions, so the compatibility of the models' assumptions must be thoroughly assessed. Mismatched assumptions may produce conflicting outcomes or interpretations.
3. Combining models can produce complex outcomes that are hard to understand and communicate. To gain relevant insights, techniques and results must be clearly explained.
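To make the combination concrete, here is a minimal R sketch computing all three measures on one simulated series; the risk-free rate and every number below are placeholders, not real bank or Bank Nifty data:
set.seed(7)
ret <- rnorm(1250, mean = 0.0004, sd = 0.012)  # roughly 5 years of daily returns
rf <- 0.0002                                   # hypothetical daily risk-free rate
# Sharpe ratio: mean excess return per unit of total risk
sharpe <- (mean(ret) - rf) / sd(ret)
# Historical VaR at 95%: the empirical 5th percentile of returns
var_hist <- unname(quantile(ret, 0.05))
# Monte Carlo VaR: simulate returns from a normal model fitted to the sample
sim <- rnorm(10000, mean = mean(ret), sd = sd(ret))
var_mc <- unname(quantile(sim, 0.05))
c(sharpe = sharpe, var_hist = var_hist, var_mc = var_mc)
Because all three measures consume the same return series, running them side by side is mostly a matter of keeping the data window and frequency identical across methods.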
  • asked a question related to Data Analysis
Question
2 answers
Communication between the machine and the computer is OK, and when it is started in the normal way the green LED becomes stable. Then, when you click the Data Collection software, the Service Console window opens and the first indicator, "Messaging Service", turns green, but the next "Data Service" indicator remains yellow and there is no further progress. Can someone suggest what the reason may be and a possible solution, please? Thank you.
Relevant answer
Answer
One thing I should have mentioned: before this issue appeared, the computer was behaving strangely. Sometimes it would turn on easily, and sometimes only after several attempts; once on, it would work smoothly, without any issue with the machine or the sequencing electrophoresis process. We had it checked by a repair expert, who noticed short-circuiting on the motherboard. He re-soldered two or three loose points, and there was no longer any issue with turning it on or off, but now this new issue (mentioned in the question) has developed. Is it related to the motherboard problem? If yes, is there any solution that does not require replacing the motherboard, as it is a very old model?
  • asked a question related to Data Analysis
Question
13 answers
Analysis of qualitative data requires intensive reading of transcripts, field reports, diaries, journals, and other documents. It is a continuous, to-and-fro process. What challenges do you, as a qualitative researcher, face during data analysis, and how do you overcome those challenges?
Relevant answer
Answer
I have recently been experimenting with artificial intelligence for the analysis of qualitative data via ChatGPT, and I am very impressed with the results. In particular, I started by re-analyzing data from two of my previous studies, and I was surprised by how rapidly the program produced the main concepts from those studies.
Just asking a few general questions produced the important key dimensions, and asking follow-up questions gave more detailed information about each of those dimensions. Of course, the program cannot literally "interpret" the results for you, but it certainly could replace a laborious coding process as a tool for locating the core content that you need to interpret.
Like any other approach to qualitative analysis, it does require familiarity with your data (you can't just throw anything at it), but beyond that, the program has a strong potential for being an alternative to existing techniques for the initial stages of working with qualitative data.
  • asked a question related to Data Analysis
Question
3 answers
What is data analytics and its 4 types for research?
Relevant answer
Answer
I agree with the above, except that descriptive analytics can also describe present data, not just historical data. But maybe I'm wrong.
  • asked a question related to Data Analysis
Question
3 answers
I have planned to conduct a study of an antenatal education programme using a one-group pretest-posttest design.
I used statistics from a prior study to calculate the sample size, but it came out to just 4.
Can I really use this number? Please find the details in the attached file.
Relevant answer
Answer
The assumed effect size is huge (the group standard deviations are very small relative to the mean difference) and the correlation (.5) is substantial. That's why your estimated sample size is quite small. I would check to make sure you can really expect such a huge effect/difference between the groups/matched pairs.
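As a quick check, base R's power.t.test() reproduces this behaviour; the numbers below are placeholders to be replaced with the figures from the prior study (for a paired design, sd is the standard deviation of the differences):
# Huge standardized effect (delta/sd = 2.5): the required n is tiny
power.t.test(delta = 10, sd = 4, sig.level = 0.05, power = 0.80, type = "paired")
# More modest effect (delta/sd = 0.5): a much larger sample is needed
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80, type = "paired")
If the realistic effect size is smaller than the prior statistics suggest, the required sample will grow well beyond 4.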