Data Analysis - Science topic
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Questions related to Data Analysis
I'm currently analysing the results of my survey, but I'm encountering the problem that my quantitative data is too similar, and I'm not sure how to interpret it.
Does anyone have any advice for me or can recommend any reading about this issue?
Dear colleagues,
Could you tell me, please, how to construct a boxplot from a dataframe in RStudio?
library(dplyr)

df9 <- data.frame(
  Kmeans      = c(1,0.45,0.52,0.54,0.34,0.39,0.57,0.72,0.48,0.29,0.78,0.48,0.59),
  hdbscan     = c(0.64,1,0.32,0.28,0.33,0.56,0.71,0.56,0.33,0.19,0.53,0.45,0.39),
  spectralpam = c(0.64,0.31,1,0.48,0.24,0.32,0.52,0.66,0.32,0.44,0.28,0.25,0.47),
  fanny       = c(0.64,0.31,0.38,1,0.44,0.33,0.48,0.73,0.55,0.51,0.32,0.39,0.57),
  FKM         = c(0.64,0.31,0.38,0.75,1,0.26,0.55,0.44,0.71,0.38,0.39,0.52,0.53),
  FKMnoise    = c(0.64,0.31,0.38,0.75,0.28,1,0.42,0.45,0.62,0.31,0.25,0.66,0.67),
  Mclust      = c(0.64,0.31,0.38,0.75,0.28,0.46,1,0.36,0.31,0.42,0.47,0.66,0.53),
  PAM         = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,1,0.73,0.43,0.39,0.26,0.41),
  AGNES       = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,1,0.31,0.48,0.79,0.31),
  Diana       = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,1,0.67,0.51,0.43),
  zones2      = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,1,0.69,0.35),
  zones3      = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,0.59,1,0.41),
  gsa         = c(0.64,0.31,0.37,0.75,0.28,0.46,0.58,0.55,0.42,0.45,0.59,0.36,1),
  method      = c("kmeans","hdbscan","spectralpam","fanny","FKM","FKMnoise",
                  "Mclust","PAM","AGNES","DIANA","zones2","zones3","gsa")
)
head(df9)

# Convert only the numeric columns; "method" is a character label and
# would become NA under as.numeric().
df9 <- df9 %>% mutate(across(-method, ~ as.numeric(as.character(.))))

# One box per clustering method:
boxplot(df9[, names(df9) != "method"], las = 2)
Thank you very much.
Dear Experts,
I have a question: when analyzing data for a meta-analysis, is it possible for the SMD to be greater than 1 for any study?
I have a study in my data indicating an SMD of 2.03.
Hi!
I want to use the ADL model for my data analysis. However, after performing stationarity tests, the dependent variable and 6 of the 8 independent variables are stationary only in first differences; the other two are stationary in levels.
Is the cointegration test always necessary?
If so, I read that I can only use the Pesaran bounds test because I have a mix of I(0) and I(1) variables.
Is that true? I am not sure.
And how do you perform that test?
Thanks a lot for your suggestions.
AI in research offers tremendous potential, but ethical considerations are crucial. Biases in data or algorithms can lead to discriminatory or unfair results. The "black box" nature of some AI models makes it difficult to understand their reasoning, raising concerns about accountability. Ensuring data privacy, transparency in research methods, and maintaining human oversight are all essential for ethical AI-powered research.
I am studying the impact of leadership style on job satisfaction. In the data collection instrument, there are 13 questions on leadership style, divided among a couple of leadership styles. On the other hand, there are only four questions for job satisfaction. How do I run correlational tests on these variables? What values do I select to analyze in Excel?
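A minimal sketch of the usual approach: average each set of Likert items into one scale score per respondent, then correlate the scale scores (in Excel, the equivalent is AVERAGE per row followed by CORREL). The responses below are made-up illustrative values, not data from the study, and the single four-item leadership scale stands in for whichever style subscale is being tested:

```python
import math

# Hypothetical responses: each row is one respondent's 1-5 Likert answers.
leadership_items   = [[4, 5, 4, 4], [2, 3, 2, 3], [5, 5, 4, 5], [3, 3, 3, 2]]
satisfaction_items = [[4, 4, 5, 4], [2, 2, 3, 2], [5, 4, 5, 5], [3, 2, 3, 3]]

def scale_score(items):
    """Average a respondent's item responses into one scale score."""
    return sum(items) / len(items)

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

lead_scores = [scale_score(r) for r in leadership_items]
sat_scores  = [scale_score(r) for r in satisfaction_items]
r = pearson(lead_scores, sat_scores)
print(round(r, 3))
```

With one score per style subscale, you would repeat the correlation once per leadership style against the job-satisfaction score.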
Based on your expertise, which software packages are best for data analysis and graphing in quantitative studies of microbial biofilms? Pros and cons?
In the process of drafting a new research article, which structure is most effective? Should one follow the order of Introduction, Materials and Methods, Data Analysis, Results, and Discussion, or is it better to write the Materials and Methods, Results, and Discussion first and leave the Introduction for the end? What is the approach commonly adopted by other scholars/researchers around the world?
Hey there!
I want to learn about correlation. I recently worked on a project related to rice genotypic trials, using a one-factorial RCBD design. While I know how to statistically analyze phenotypic and genotypic correlation, I specifically want to learn how to analyze environmental correlation using R. Could anyone help me out?
Thank you in advance. :)
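One common route, sketched below with made-up variance components: the environmental covariance between two traits is the phenotypic covariance minus the genotypic covariance, and the environmental correlation scales it by the environmental variances. All numbers here are illustrative, not from a real trial; in R the same arithmetic applies to components estimated from the RCBD ANOVA:

```python
import math

def environmental_correlation(cov_p, cov_g, vp1, vg1, vp2, vg2):
    """r_e = (Cov_p - Cov_g) / sqrt((Vp1 - Vg1) * (Vp2 - Vg2))."""
    cov_e = cov_p - cov_g            # environmental covariance
    ve1, ve2 = vp1 - vg1, vp2 - vg2  # environmental variances
    return cov_e / math.sqrt(ve1 * ve2)

# Made-up phenotypic/genotypic (co)variance components for two traits:
r_e = environmental_correlation(cov_p=0.80, cov_g=0.50,
                                vp1=2.0, vg1=1.2, vp2=1.5, vg2=0.9)
print(round(r_e, 3))
```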
Imagine you join a new research lab and are immediately assigned a dataset that contains the same variables as those in the substance abuse dataset but comprising a new sample. You are told the lab originally collected this data set with one particular research question in mind: “Does satisfaction with life predict health outcomes?” Before doing any statistical tests, you decide to browse through the dataset and make some graphs of the results. It seems to you that in your data, satisfaction with life predicts health outcomes more strongly in males than in females, and on reflection, you can think of several theoretical reasons why that should be the case. You disregard the data on females and investigate the hypothesis that low levels of satisfaction with life (using the “swl” variable) will be positively predictive of mental ill health (using the “psych6” variable) in males. You finally do a statistical test, and obtain a very low p-value (less than .001) associated with the regression coefficient. You write a paper using this single result, concluding that there is strong evidence for your hypothesis.
Question: What is a term for the practice that you are engaging in?
Is this practice p-hacking or the garden of forking paths?
I have a case-control study that I would like to publish, but there has already been a meta-analysis of observational studies on the same topic, albeit in a different population (Iranians) and during a different time period (2000 to 2016). Mine is in the USA and will analyze data from 2016 onwards. Will my study be novel enough for a shot at a good journal?
The 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering (MLISE 2024) will be held on June 28-30, 2024 in Zhuhai, China.
MLISE is conducting an exciting series of symposium programs that connect researchers, scholars and students to industry leaders and highly relevant information. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world. MLISE proposes new ideas, strategies and structures, innovating the public sector, promoting technical innovation and fostering creativity in the development of services.
---Call For Papers---
The topics of interest for submission include, but are not limited to:
1. Machine Learning
- Deep and Reinforcement learning
- Pattern recognition and classification for networks
- Machine learning for network slicing optimization
- Machine learning for 5G system
- Machine learning for user behavior prediction
......
2. Intelligent Systems Engineering
- Intelligent control theory
- Intelligent control system
- Intelligent information systems
- Intelligent data mining
- AI and evolutionary algorithms
......
All papers, both invited and contributed, will be reviewed by two or three experts from the committees. After a careful reviewing process, all accepted papers of MLISE 2024 will be published in the MLISE 2024 Conference Proceedings by IEEE (ISBN: 979-8-3503-7507-7), which will be submitted to IEEE Xplore, EI Compendex, Scopus for indexing.
Important Dates:
Submission Deadline: April 26, 2024
Registration Deadline: May 26, 2024
Conference Dates: June 28-30, 2024
For More Details please visit:
Invitation code: AISCONF
*Using the invitation code on the submission/registration system gets you priority review and feedback.
Apart from the CASP tool, which I can only use to review articles with qualitative studies, I am looking for a tool to review articles that use quantitative methodology for their research.
Would it be considered academic dishonesty if a PhD student hired a data analyst to conduct the data analysis for his/her thesis?
Hello everyone,
I want to employ fMRI for conducting research.
As a first step, I want to know whether fMRI data is an image like MRI data,
or whether I should treat fMRI data as a time series when it comes to analysis.
Thank you.
I need help analyzing enzyme kinetic data.
I have data from the Octet K2 system. In my experiment, I load the sensor with our protein of interest (6XHis tag on my recombinant protein to Ni-NTA sensors) and then expose this sensor to increasing concentrations of the candidate binding protein (five concentrations per experiment and each experiment is replicated four times). Each association step is followed by a dissociation step in buffer. A control sensor is used in each experiment where a sensor is loaded with the protein of interest but only exposed to buffer. (See picture, Part 1)
I have separate data where I loaded smaller recombinant domains of the protein of interest to the sensor and exposed it to the candidate binding protein. I would like to combine this data (the binding of the full-length protein and the binding of the domains) on the same graph.
My problem: In trying to analyze the data with the software provided with the Octet system (HT 11.1), the data misaligns. (See picture, Part 2)
My goal is to determine kinetic constants (KD) of the full-length protein and its separate domains to the protein of interest.
Suggestions for correctly aligning the data in the Octet software HT11.1? (I think the misalignment is because the program is trying to align the y axis to baseline 1 instead of baseline 2, which is the baseline right before the association step. If so, can you change this label after the fact?)
If the glitch with the Octet software cannot be fixed, then is there a manual/tutorial for the enzyme kinetic module for Sigma Plot?
I found I can extract the raw data from the Octet system. I can remove the background from the control sensor and manually assign concentrations. I uploaded this into Sigma plot 15, which has an enzyme kinetic module. I found the embedded help guide, but I have specific questions. For example:
*My candidate binding protein does not change, but how do you take into account the change in the kilodaltons of the proteins that are loaded to the sensor, full length vs. the smaller domain proteins? This is automatically taken care of in the Octet software.
*How do I differentiate between the association and dissociation phases?
I am new to Octet biolayer analysis and the Enzyme Kinetic Module analysis in Sigma Plot.
Any help will be greatly appreciated! I am happy to provide any more information.
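On the molecular-weight question, a sketch may help: BLI software typically fits the 1:1 Langmuir binding model, and the fitted KD = koff/kon does not depend on the size of the loaded ligand; ligand size affects only the loading signal and Rmax. The rate constants, Rmax and concentrations below are made-up illustrative values, not numbers from the Octet experiment:

```python
import math

kon, koff, rmax = 1e5, 1e-3, 1.0   # 1/(M*s), 1/s, response units (made up)

def association(conc, t):
    """Response during the association phase at analyte concentration conc (M)."""
    kobs = kon * conc + koff
    req = rmax * conc / (conc + koff / kon)   # equilibrium response
    return req * (1 - math.exp(-kobs * t))

def dissociation(r0, t):
    """Response during the dissociation phase, starting from response r0."""
    return r0 * math.exp(-koff * t)

kd = koff / kon                      # 1e-8 M = 10 nM
r_end = association(100e-9, 300)     # 100 nM analyte, 300 s association
r_dis = dissociation(r_end, 300)     # after 300 s of dissociation
print(kd, round(r_end, 3), round(r_dis, 3))
```

The two phases are distinguished by which equation applies: rising single-exponential during association, decaying single-exponential during dissociation, which is also how curve-fitting software expects the data to be segmented.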
Hi all!
I've been collecting data on a group of 8 chimpanzees at Chester Zoo for my dissertation. The group consists of 4 males and 4 females, all of which have different hierarchical statuses and ages.
I have been doing random focal observations with a checksheet consisting of 4 state behaviours (timed) and 6 behaviours (frequencies). I would start a random focal observation when a stressful context arose (such as high visitor numbers, anticipation of feeding, or feeding time) and note the durations or frequencies of behaviours exhibited by that individual for 15 minutes. Then, at the following visit, I would observe the same individual at the same time but under a non-stressful context (thereby utilising the Matched Control Method).
This process repeated for 4 months and I now have a complete data set.
I am really struggling with 1. how to use SPSS, and 2. what tests would be ideal to use. As you can imagine, there is quite a lot of data holding different values, so you can hopefully see my confusion.
Ideally, the statistical analysis of my data will reveal which contexts in the zoo most precipitate an increase in stress (e.g. high visitor numbers, anticipation of feeding, feeding). I also want to be able to compare this data to the hierarchical statuses and ages of the individuals.
Any help would be so appreciated. Thanks in advance!
Compliments of the day. Please, how do I go about the data analysis of my PhD work? It is a human health risk assessment of heavy metals: biomarker and heavy metal analysis of human and environmental samples.
#biomarker data analysis.
#Heavy metal data analysis
#environmental samples data analysis
Hi,
Can anyone explain whether there is a better system available for analyzing data than IBM SPSS?
Thank you,
Ameenah
The EcoPlate assay is used to determine the metabolic diversity of soil at the community level.
I have several pairs of parameters (obtained from females and males) and want to find the difference in correlation between the two sexes for each parameter. I also want to apply a weight, so that a parameter showing the highest correlation with survival in either females or males carries greater weight. This way, I hope to find factors that combine strong correlation differences between females and males (with regard to survival) with the strongest positive correlation with survival for either sex (which I will resolve further).
To do this, if parameter 1 has a correlation with survival of A for males and B for females, I plan to compute (A − B) multiplied by A or B (whichever is higher), to acknowledge the weight of the highest positive correlation with survival. For the next parameter, with correlation C for males and D for females, I will compute (C − D) × C or D (whichever is higher). The final aim is to rank the parameters that differ most between females and males while also correlating most with survival in either sex. Do you think this is a reasonable idea?
I would be very very grateful for your advice, suggestions and tips.
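The proposed ranking can be sketched in a few lines; the correlations below are made-up illustrative values, not real data:

```python
# Score each parameter as (r_male - r_female) * max(r_male, r_female),
# then rank by absolute score, per the proposal above.
params = {
    "param1": (0.70, 0.20),   # (r with survival in males, in females)
    "param2": (0.10, 0.60),
    "param3": (0.40, 0.35),
}

def score(r_m, r_f):
    return (r_m - r_f) * max(r_m, r_f)

ranked = sorted(params.items(),
                key=lambda kv: abs(score(*kv[1])), reverse=True)
for name, (r_m, r_f) in ranked:
    print(name, round(score(r_m, r_f), 3))
```

One caveat worth weighing: an ad hoc product like this has no sampling distribution, so a formal test of the male-female difference in correlations (e.g. a Fisher z comparison) may be a useful complement to the ranking.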
I am a researcher who has decided to buy a laptop suitable for my work, namely simulation and data analysis.
Please suggest one.
I am checking whether there are any systematic differences in physical activity among different income groups and education levels for my master's thesis. Physical activity has been assessed through a questionnaire in a municipality, covering three different dimensions (intensity, duration and frequency). I wonder if there is any way to integrate all three dimensions into one new variable that could provide a more reliable value for physical activity. If that is not possible and I have to select only one measure, which one would be more reliable?
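One common way to integrate the three dimensions is a z-score composite: standardise each dimension and average them, so no single unit dominates. A minimal sketch with made-up questionnaire values for five respondents:

```python
import math

intensity = [2, 3, 1, 3, 2]        # e.g. ordinal intensity rating
duration  = [30, 60, 15, 45, 20]   # minutes per session
frequency = [3, 5, 1, 4, 2]        # sessions per week

def zscores(x):
    """Standardise a list to mean 0, SD 1 (sample SD)."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))
    return [(v - m) / sd for v in x]

zi, zd, zf = zscores(intensity), zscores(duration), zscores(frequency)
composite = [(a + b + c) / 3 for a, b, c in zip(zi, zd, zf)]
print([round(c, 2) for c in composite])
```

An alternative with a direct physiological interpretation is a MET-minutes-style product (intensity × duration × frequency), as used by instruments such as the IPAQ; which is preferable depends on how the questionnaire scales each dimension.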
Hello everyone. For my dissertation I have two predictor variables and one criterion variable. One of the predictor variables further has 5 domains and no global score. In that case, can I use multiple regression, or do I have to perform stepwise linear regression separately for the 6 predictors (the 5 domains plus the other predictor), keeping in mind the assumption of multicollinearity?
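Entering all predictors simultaneously in one multiple regression is usually the default; stepwise selection is a separate (and often criticized) procedure. A pure-Python sketch of simultaneous entry via the normal equations, with two stand-in predictors and made-up data (in practice the same call would take all six):

```python
x1 = [1, 2, 3, 4, 5]          # stand-in for one domain score
x2 = [2, 1, 4, 3, 5]          # stand-in for another predictor
y  = [5, 6, 11, 12, 16]       # criterion; generated as 1 + 2*x1 + 1*x2

X = [[1.0, a, b] for a, b in zip(x1, x2)]   # leading 1s = intercept column

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Normal equations: (X'X) beta = X'y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(3)]
beta = solve(XtX, Xty)        # [intercept, b1, b2]
print([round(b, 2) for b in beta])
```

With six predictors, checking variance inflation factors (VIFs) addresses the multicollinearity concern directly, rather than switching to separate stepwise runs.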
Here are some examples of software that can be used for each step of RNA-seq data analysis:
- Quality Control: FastQC, PRINSEQ, Sickle
- Read Trimming: Trimmomatic, Cutadapt, AdapterRemoval
- Alignment: STAR, HISAT2, TopHat
- Quality Control of Alignment: Qualimap, RSeQC, Picard
- Assembly: Trinity, Oases, Trans-ABySS
- Quantification: RSEM, Kallisto, eXpress
- Differential Expression Analysis: DESeq2, EdgeR, limma
- Functional Annotation: Blast2GO, KEGG, Reactome
- Pathway Analysis: KEGG Pathway, Reactome, Enrichr
- Network Analysis: Cytoscape, STRING, ClueGO
- Visualization: IGV, GenomeBrowse, JBrowse
- Interpretation: GSEA, DAVID, IPA
Dear Scientists and Researchers,
I'm thrilled to highlight a significant update from PeptiCloud: new no-code data analysis capabilities specifically designed for researchers. Now, at www.pepticloud.com, you can leverage these powerful tools to enhance your research without the need for coding expertise.
Key Features:
PeptiCloud's latest update lets you:
- Create Plots: Easily visualize your data for insightful analysis.
- Conduct Numerical Analysis: Analyze datasets with precision, no coding required.
- Utilize Advanced Models: Access regression models (linear, polynomial, logistic, lasso, ridge) and machine learning algorithms (KNN and SVM) through a straightforward interface.
The Impact:
This innovation aims to remove the technological hurdles of data analysis, enabling researchers to concentrate on their scientific discoveries. By minimizing the need for programming skills, PeptiCloud is paving the way for more accessible and efficient bioinformatics research.
Join the Conversation:
- How do you envision no-code data analysis transforming your research?
- Are there any other no-code features you would like to see on PeptiCloud?
- If you've used no-code platforms before, how have they impacted your research productivity?
PeptiCloud is dedicated to empowering the bioinformatics community. Your insights and feedback are invaluable to us as we strive to enhance our platform. Visit us at www.pepticloud.com to explore these new features, and don't hesitate to reach out at [email protected] with your thoughts, suggestions, or questions.
Together, let's embark on a journey towards more accessible and impactful research.
Warm regards,
Chris Lee
Bioinformatics Advocate & PeptiCloud Founder
I'm currently trying to perform an RNA-seq data analysis, and at the first step a few questions came to mind that I would like to understand. Please help me with these questions.
1) In the 1st image, the raw data from NCBI-SRA have reads marked 1 and 2 at their ends. What does this mean? Are these forward and reverse reads?
2) In the second image, I was trying to run Trimmomatic on this data set. I chose "paired-end as a collection", but it does not accept any input even though my data is in "fastqsanger.gz" format. Why is that? Should I treat this paired-end data as single-end data when running Trimmomatic?
3) In the 3rd and 4th images, I collected the same data from ENA, where they provide two separate files for the 1- and 2-marked data in SRA. I then tried to process them in Trimmomatic using "paired-end as individual datasets" and ran it. Trimmomatic gives me 4 files. Why is that, and which ones should be used for alignment?
A big thank you in advance :)
I'm performing RNA-seq data analysis. I want to compare healthy vs disease_stage_1, healthy vs disease_stage_2, and healthy vs disease_stage_3. For the healthy, disease_stage_1, disease_stage_2, and disease_stage_3 data sets, I have 19, 7, 8, and 15 biological replicates respectively.
Does this uneven number of replicates affect the data analysis?
Should I use an even number of replicates, e.g. 7 biological replicates for every dataset (as the lowest number of replicates here is 7)?
Kindly research how AI is going to impact legal services, and in particular arbitration. Right now some firms have started using AI for research, due diligence and data analytics.
It prompts for an explanation of which method is more suitable and why, aiming to enhance understanding of how to select between these two techniques in the context of mixed data analysis. Is there any other ordination technique suitable for mixed data types?
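One widely used option for mixed data types is the Gower distance, which can then feed an ordination such as PCoA (principal coordinates analysis) or NMDS. A minimal sketch of the distance itself; the records below (two numeric variables plus one categorical) are made-up examples:

```python
# Gower distance: range-scaled difference for numeric variables,
# simple 0/1 mismatch for categorical ones, averaged over variables.
rows = [(6.5, 0.30, "forest"),
        (7.2, 0.10, "arable"),
        (6.8, 0.25, "forest")]

NUMERIC = (0, 1)     # indices of the numeric variables

def gower(a, b, ranges):
    d = []
    for i, (x, y) in enumerate(zip(a, b)):
        if i in NUMERIC:
            d.append(abs(x - y) / ranges[i])   # scaled to [0, 1]
        else:
            d.append(0.0 if x == y else 1.0)   # category mismatch
    return sum(d) / len(d)

ranges = {i: max(r[i] for r in rows) - min(r[i] for r in rows) for i in NUMERIC}
d01 = gower(rows[0], rows[1], ranges)
d02 = gower(rows[0], rows[2], ranges)
print(round(d01, 3), round(d02, 3))
```

PCoA on the resulting distance matrix then gives an ordination that respects both variable types.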
Cosine similarity, Soft Cosine similarity or SBERT?
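For orientation, plain cosine similarity on bag-of-words count vectors is the baseline the other two build on: soft cosine additionally weights term-term similarity, and SBERT compares dense sentence embeddings instead. A minimal sketch with made-up sentences:

```python
import math

def bow(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

vocab = ["data", "analysis", "is", "fun", "hard"]
s1 = bow("data analysis is fun", vocab)
s2 = bow("data analysis is hard", vocab)
print(round(cosine(s1, s2), 3))
```

The choice mostly depends on whether near-synonyms must count as similar (soft cosine or SBERT) or exact vocabulary overlap suffices (plain cosine).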
We usually use Excel in our lab to analyse data, but I would like to take a course on a more sophisticated tool. Can you share which one is most common in molecular biology?
Thank you :)
Could this be due to an error in Mass Spec calibration or data analysis? I have 2 technical repeats that are fine, but the 3rd repeat is far away in the PCA plot and clusters with replicates of a different sample.
What is your short, new way to solve this problem of data analysis in time series? Suppose you have time series data. What steps do you take, and how do you analyse this data? How do you solve it in your work? Follow me and share your post and your personal experience.
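A minimal first pass can be sketched as: difference the series to remove trend, smooth with a moving average, and inspect autocorrelation. The series below is a made-up trend-plus-noise example:

```python
series = [10, 12, 13, 15, 18, 20, 23, 25, 28, 31]

def difference(x):
    """First differences: removes a linear trend."""
    return [b - a for a, b in zip(x, x[1:])]

def moving_average(x, w):
    """Simple moving average with window w."""
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]

def autocorr_lag1(x):
    """Lag-1 autocorrelation; high values suggest strong serial dependence."""
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

diffed = difference(series)
smooth = moving_average(series, 3)
print(diffed, round(autocorr_lag1(series), 3))
```

From there, formal modelling (ARIMA, exponential smoothing, etc.) follows the usual identify-fit-diagnose cycle.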
What are some innovative approaches to data analysis and visualization in phonology research?
Dear experts,
I have noticed that researchers who are able to publish in first-tier journals often use advanced data analysis methods, which usually involve numerical forms such as Confirmatory Factor Analysis (CFA), Structural Equation Modelling, and Comparative Analysis.
While I acknowledge the significance of using advanced data analysis methods like CFA and structural equation modelling to answer specific research questions, I am interested to know why there is a preference for these methods over qualitative studies.
Look forward to hearing from you.
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics and generative artificial intelligence to business entities to improve business entity management processes?
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal business intelligence information systems supporting the management processes of a company, enterprise, corporation or other type of business entity?
In recent years, there has been a growing scale of implementation of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal information systems of the Business Intelligence type supporting the management processes of a company, enterprise, corporation or other type of business entity. The Covid-19 pandemic has accelerated the processes of digitizing the economy. The importance and application of analytics conducted via the Internet and/or using data downloaded from the Internet is also growing. An example is sentiment analysis conducted on data downloaded from the Internet implemented on Big Data Analytics platforms being an additional research instrument of conducted market research, marketing research as an additional source of data for conducted Business Intelligence type analysis. This is particularly important because in recent years the importance of Internet marketing, including viral marketing, Real-Time marketing carried out on social media sites is increasing. Accordingly, in many industries and sectors of the economy, there is already an increase in the application of certain Industry 4.0 technologies, i.e., such as Big Data Analytics, Data Science, cloud computing, machine learning, personal and industrial Internet of Things, artificial intelligence, Business Intelligence, autonomous robots, horizontal and vertical data system integration, multi-criteria simulation models, additive manufacturing, Blockchain, cybersecurity instruments, Virtual and Augmented Reality and other advanced data processing technologies Data Mining. 
Besides, using Big Data Analytics, interesting research is being conducted in the field of the issue: Analysis of changes in the relationship of consumer behavior in the markets for goods and services caused by the impact of advertising campaigns conducted on the Internet, applying new Internet marketing tools used in new online media, including primarily social media. The growth of behavioral economics and finance, including the analysis of the determinants of media formation of consumer opinions on the recognition of the company's brand, product and service offerings, etc., through the growth of Internet information services, including social media portals. Currently, online viral marketing based on social media portals and customer data collected and processed in Big Data Analytics databases is developing rapidly. In recent years, new online marketing instruments have also been developed, applied mainly on social media portals and are also used by e-commerce companies. Internet technology companies and fintechs are also emerging, offering online information services to assist marketing management, including in planning advertising campaigns for products sold via the Internet. For this purpose, the aforementioned sentiment analyses are used to study the opinions of Internet users regarding the prevailing awareness, recognition, brand image, mission, offerings of certain companies. Sentiment analysis is carried out on large data sets taken from various websites, including millions of social media pages, collected in Big Data systems. The analytical data collected in this way is very helpful in the process of planning advertising campaigns carried out in new media, including social media sites. These campaigns advertise, among other things, products and services sold via the Internet, available in online stores. 
In view of the above, the development of e-commerce is mainly determined by technological advances in ICT information technology and advanced data processing technology Industry 4.0, as well as new technologies used in securing financial transactions carried out over the Internet, including transactions related to e-commerce, i.e. blockchain technology, for example. In my opinion, ongoing scientific research confirms the strong correlation occurring between the development of Big Data technologies, Data Science, Data Analytics and the efficiency of the use of knowledge resources. I believe that the development of Big Data technology and Data Science, Data Analytics and other ICT information technologies, multi-criteria technology, advanced processing of large sets of information, Industry 4.0 technology increases the efficiency of the use of knowledge resources, including in the field of economics, finance and organizational management. In recent years, ICT information technologies, Industry 4.0, etc., have been developing particularly rapidly and are being applied in knowledge-based economies. These technologies are being applied in scientific research and business applications in commercially operating enterprises and in financial and public institutions. In view of the growing importance of this issue in knowledge-based economies, it is important to analyze the correlation between the development of Big Data technologies and analytics of Data Science, Data Analytics, Business Intelligence and the efficiency of using knowledge resources to solve key problems of civilization development. Analytics based on Business Intelligence, in addition to Data Science, Big Data Analytics are increasingly being used in improving business management processes. The development of this analytics based on the implementation of ICT information technologies and Industry 4.0 into analytical processes has a great future in the years to come. 
In recent years, ICT information technologies, Industry 4.0, etc., have been developing particularly rapidly and are being applied in knowledge-based economies. In addition, the application of artificial intelligence technologies can increase the efficiency of the use of Big Data Analytics and other Industry 4.0/5.0 technologies, which are used to support business management processes.
I have described the issues of application of Big Data and Business Intelligence technologies in the context of enterprise risk management in the following article:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
In addition, I described the issues of opportunities and threats to the development of AI technology applications in my following article:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics, Data Science, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies and also generative artificial intelligence to business entities in order to improve internal business intelligence information systems supporting the management processes of a company, enterprise, corporation or other type of business entity?
What are the applications of Industry 4.0/5.0 technologies, including Big Data Analytics and generative artificial intelligence to business entities to improve business entity management processes?
How does Big Data Analytics and generative artificial intelligence support business entity management processes?
What do you think on this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Hello,
I'm conducting a qualitative piece of research which I've positioned within a phenomenological grounding. I'm also utilising IPA as the framework for my data analysis. However, I've been asked to detail my analytical framework separately from this, and I'm struggling to understand the difference between that and my data analysis approach. Any help would be greatly appreciated.
I have selected two deep learning models, a CNN and an SAE, for data analysis of a 1-D digitized data set. I need to justify the choice of these two DL models in comparison to other DL and standard ML models. I am using a GA to optimize the hyperparameter values of the two DL models. Can you give some input on this query? Thanks.
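The GA part can be sketched independently of the models: candidates are hyperparameter settings, fitness is validation performance, and selection/crossover/mutation evolve the population. The fitness function below is a made-up surrogate (a smooth function peaking near lr = 0.01, units = 128) standing in for actually training and evaluating the CNN or SAE, and the ranges are illustrative:

```python
import random

random.seed(0)
LR_RANGE, UNITS_RANGE = (1e-4, 1e-1), (16, 256)

def random_candidate():
    return {"lr": random.uniform(*LR_RANGE),
            "units": random.randint(*UNITS_RANGE)}

def fitness(c):
    # Toy surrogate for validation accuracy; replace with a real train/eval.
    return -((c["lr"] - 0.01) ** 2) * 1e4 - ((c["units"] - 128) / 128) ** 2

def crossover(a, b):
    return {"lr": random.choice([a["lr"], b["lr"]]),
            "units": random.choice([a["units"], b["units"]])}

def mutate(c):
    if random.random() < 0.3:
        c["lr"] = min(max(c["lr"] * random.uniform(0.5, 2.0),
                          LR_RANGE[0]), LR_RANGE[1])
    if random.random() < 0.3:
        c["units"] = random.randint(*UNITS_RANGE)
    return c

pop = [random_candidate() for _ in range(20)]
for _ in range(30):                          # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                       # truncation selection
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]
best = max(pop, key=fitness)
print(best)
```

For the justification itself, comparing the GA-tuned CNN/SAE against a simple baseline (e.g. a classical ML model with default settings) on the same splits is the usual evidence.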
What aspects of working with data are the most time-consuming in your research activities?
- Data collection
- Data processing and cleaning
- Data analysis
- Data visualization
What functional capabilities would you like to see in an ideal data work platform?
Colleagues, good day!
We would like to reach out to you for assistance in verifying the results we have obtained.
We employ our own method for performing deduplication, clustering, and data matching tasks. This method allows us to obtain a numerical value of the similarity between text excerpts (including data table rows) without the need for model training. Based on this similarity score, we can determine whether records match or not, and perform deduplication and clustering accordingly.
This is a direct-action algorithm, relatively fast and resource-efficient, requiring no specific configuration (it is versatile). It can be used for quickly assessing previously unexplored data or in environments where data formats change rapidly (but not the core data content), and retraining models is too costly. It can serve as the foundation for creating personalized desktop data processing systems on consumer-grade computers.
We would like to evaluate the quality of this algorithm in quantitative terms, but we cannot find widely accepted methods for such an assessment. Additionally, we lack well-annotated datasets for evaluating the quality of matching.
If anyone is willing and able to contribute to the development of this topic, please step forward.
Sincerely, The KnoDL Team
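A widely accepted quantitative evaluation for deduplication and record linkage is pairwise precision, recall and F1 against a gold-standard clustering: count record pairs placed in the same cluster by the algorithm versus by the ground truth. A minimal sketch; the cluster labels below are made-up examples:

```python
from itertools import combinations

gold      = {"r1": 0, "r2": 0, "r3": 1, "r4": 1, "r5": 2}  # ground truth
predicted = {"r1": 0, "r2": 0, "r3": 0, "r4": 1, "r5": 2}  # algorithm output

def pairs(labels):
    """Set of record pairs placed in the same cluster."""
    return {frozenset(p) for p in combinations(labels, 2)
            if labels[p[0]] == labels[p[1]]}

g, p = pairs(gold), pairs(predicted)
precision = len(g & p) / len(p)
recall    = len(g & p) / len(g)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

For annotated benchmarks, datasets such as Cora and Restaurant are often used in the record-linkage literature and come with gold-standard match labels.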
During my RNA-seq data analysis, I encountered a problem: the MultiQC statistics from STAR showed that 70% of my trimmed reads were aligned, but when I ran read counting on the same BAM files, it said that only 3% of my reads were assigned.
Why is that? Should I proceed further, or do I need to perform additional checks? For reference, I used the HG38.p14 assembly from NCBI.
- How is healthcare data collected and organized for analysis?
- What statistical and machine learning algorithms can be used for healthcare data analysis?
- How to address missing values and outliers in healthcare data?
- What feature selection and feature engineering methods should be used in healthcare data analysis?
I have made a proposal, but I haven't found anyone who can help me and work with me properly. I need someone as a co-author.
My project:
1. Social science related
2. In the Bangladeshi context
3. You have to work with me as a team member. Skills needed: Zotero, SPSS data analysis, Google Forms
4. I need three other members with specific skillsets.
How are researchers leveraging artificial intelligence and machine learning algorithms to enhance data analysis and prediction in their studies?
Could someone help me with access to the Scopus database? I need information about articles in the journal Geoderma for a review. I want to compare the use of remote sensing and proximal sensing methods for soil data analysis over the last 30-40 years. In Russia, access is disabled.
Hi RG family. I'm trying to get a foothold on qualitative data analysis using NVivo. First off, I must admit my addiction to quantitative methods throughout much of my career. But recently, I'm getting obsessed with qualitative approaches to research because of their potential to generate more detailed and comprehensive insights.
However, I'm not familiar with the NVivo software. Please fill me in on any, and I mean any, detail you know about qualitative data analysis via NVivo: from transcription and data entry to analysis, results visualization, and interpretation. I look forward to learning massively from your immensely invaluable contributions to this discussion.
Over to you, fam!! I’m reading you!!
What are the most effective and widely-used software programs for analyzing data collected through observations in research studies?
I searched on Google but could not understand it properly.
Hello dear scientists
How can we distinguish maternal contamination from triploidy in QF-PCR analysis?
Looking for somebody to share experience on using ChatGPT 4 as an assisting tool (in combination with either Excel or Perseus MaxQuant) for analyzing data such as OMICs. Concretely I'm dealing with proteomics from human blood samples.
Grateful for any feedback!
Are there methods to evaluate studies using medical data?
can anybody suggest some state-of-the-art research problems for Ph.D. on "AI in the academic sector"?
Scenario: there is an IV and a DV. The IV is measured with five-point Likert scale questions, and the DV is measured with seven-point Likert scale questions.
Doubts -
01. Can we run a test like regression analysis directly, irrespective of the difference in measures?
02. If not, what transformation techniques are available to transform the data onto the same scale?
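On question 01: ordinary least-squares regression does not require the IV and DV to share a scale, so mixing five-point and seven-point items is usually fine; rescaling only changes how the coefficients are read. If a common scale is still wanted, here is a minimal Python sketch (with hypothetical responses) of linear min-max rescaling:

```python
import numpy as np

def rescale(x, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map scores from [old_min, old_max] onto [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    return (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

iv_5pt = [1, 3, 5, 2]   # hypothetical 5-point Likert responses
dv_7pt = [1, 4, 7, 3]   # hypothetical 7-point Likert responses

iv_01 = rescale(iv_5pt, 1, 5)   # 0.0, 0.5, 1.0, 0.25
dv_01 = rescale(dv_7pt, 1, 7)   # 0.0, 0.5, 1.0, 0.333...
```

Standardizing to z-scores (subtracting the mean and dividing by the standard deviation) is a common alternative and yields standardized regression coefficients.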
In my thesis I calculated two of these methods: correlation and regression. But I can't understand which is better for data analysis.
I am writing my bachelor thesis and I'm stuck with the Data Analysis and wonder if I am doing something wrong?
I have four independent variables and one dependent variable, all measured on a five point likert scale and thus ordinal data.
I cannot use a normal type of regression (since my data is ordinal, is not normally distributed and never will be (transformations could not change that), and also violates homoscedasticity), so I figured ordinal logistic regression. Everything worked out perfectly, but the test of parallel lines in SPSS was significant, and thus the assumption of proportional odds was violated. So, I am now considering multinomial logistic regression as an alternative.
However, here I could not find out how to test one assumption in SPSS: a linear relationship between the continuous variables and the logit transformation of the outcome variable. Does somebody know how to do this?
Plus, I have a more fundamental question about my data. To measure my variables, I asked respondents several questions. My dependent variable, for example, is Turnover Intention, and I used four questions on a five-point Likert scale, so I got four values from each respondent about their Turnover Intention. For my analysis I took the average, since I only want one value of Turnover Intention per respondent (and not four). However, the data no longer takes only the values 1, 2, 3, 4, and 5 as with the original five-point Likert scale; taking the average introduces decimals such as 1.25 or 1.75. This leaves me with very many distinct data points, and I was wondering whether my approach makes sense. I was thinking of grouping them together, since my analysis is biased by having so many different categories due to the many decimals.
Can somebody provide any sort of guidance on this??
Which free software is suitable for XRD data analysis and how can I get it?
Hi,
Kindly help me to understand when we should use AMOS versus SMART-PLS for data analysis. Thanks.
Regards,
I want to ask you a question about data analysis in psychology. I have two independent variables, one is group (between subjects), one is age (continuous data), and the dependent variable is a 6-point Likert score. I intend to use regression for data analysis, with three questions:
1. Should the subject ID (the number of each subject) be included in the model as a random variable? If it is included, it should be a linear mixed model (LMM); if it is not, it should be multiple linear regression, right?
2. In the case of multiple linear regression, should I directly build the full model and examine the influence of each independent variable, or should I build the full model, compare it with the null model, and then analyze with stepwise elimination?
3. When I do my analysis, do I need to center both age (the continuous variable) and the rating, or only age?
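On question 3: a common convention is to mean-center only the continuous predictor (age), which makes the intercept interpretable as the prediction at the average age; centering the DV only shifts the intercept and is usually unnecessary. A minimal sketch with hypothetical values:

```python
import numpy as np

age = np.array([21.0, 35.0, 48.0, 62.0])   # hypothetical ages
rating = np.array([2.0, 4.0, 5.0, 3.0])    # hypothetical 6-point Likert ratings (DV)

# Mean-center the continuous predictor; leave the DV on its original scale.
age_c = age - age.mean()                   # now has mean exactly 0
```

Centering matters most if the model includes an interaction (e.g. group × age), where it reduces collinearity between the main effects and the interaction term.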
Hello,
I am currently working on the data analysis for my Ph.D. project comparing the probability of occurrence of species density and richness (in hectare basis) between three different land use types using count data. Due to the design of the field study, I decided to use GLMM with Poisson distribution as I have various random effects and sites as a random effect that need to be accounted for. The model seems to be doing the job, however, I am not really sure how to report the results. I am using the lme4 package in the R console to analyze my data.
Thank you
Can anyone please explain how to analyse data collected using brief cope 28 item scale?
For example, I won't get a total score, but rather a score for each subscale; and then one study mentions using normative data from a heart failure study to calculate percentile ranks. Can anyone please help, as I don't get the data analysis part after collection with this scale?
Hi,
I have just installed and used spss 29. I was using spss 27.
I am analyzing data with a crossed random effects mixed model.
I am using syntax for this type of analysis. With the exact same syntax and database, I obtain different results in SPSS 29 and SPSS 27!
Specifically, the same model (that I call model 3) run with spss 27 was not giving me a warning whereas with spss 29 I get a warning (The final Hessian matrix is not positive definite although all convergence criteria are satisfied. The MIXED procedure continues despite this warning. Validity of subsequent results cannot be ascertained.).
Another case: with a slightly simpler model that I call model 2, I have no warnings but the results with spss 27 and spss 29 are not identical (e.g. BIC is different).
Is anyone experiencing the same or similar ?
Discuss the study design, the relevant data to be collected, how two animal species (cattle and pigs) can be incorporated into one paper for discussion, and the most relevant data analysis techniques for data that spans 5 years.
I'm wondering how I can, in a simple way, quantitatively analyze data that has been reported in HH:MM format but whose values are stored as text/characters (e.g. "09:00AM", "9AM", "10am").
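One approach is to normalize the strings and then try a couple of `strptime` patterns. A minimal Python sketch (the example strings come from the question; the function name is mine):

```python
from datetime import datetime

def parse_clock(text):
    """Parse loosely formatted clock strings such as '09:00AM', '9AM', '10am'."""
    cleaned = text.strip().upper().replace(" ", "")
    for fmt in ("%I:%M%p", "%I%p"):          # try with and without minutes
        try:
            t = datetime.strptime(cleaned, fmt)
            return t.hour + t.minute / 60    # hours as a decimal number
        except ValueError:
            continue
    return None                              # unparseable entry

print(parse_clock("09:00AM"))  # 9.0
print(parse_clock("9AM"))      # 9.0
print(parse_clock("10am"))     # 10.0
```

Returning hours as a decimal makes the values immediately usable for means, differences, and plotting; entries that fail both patterns come back as `None` so they can be inspected by hand.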
I was performing an RNA-seq data analysis. I did my alignment using RNA-STAR and then ran featureCounts, using the latest assembly of the human genome, i.e. HG38.p14. After the featureCounts step I noticed that some genes were counted abnormally: as the screenshot I shared shows, the ABO gene appears twice, once as 'ABO' and once as 'ABO_1', and many more genes appear like this. In featureCounts I selected the option "count them as a single fragment". The dataset was Illumina paired-end reads.
1. Does anyone know the reason behind this?
2. Did I make any mistake during the processing that I didn't notice?
3. What should I do in this situation?
Thank you very much for your time.
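On question 1: one common cause (an assumption worth checking, not a certainty) is that the NCBI annotation for a patch release such as HG38.p14 contains the same gene symbol at multiple loci, e.g. on alternate scaffolds, and the pipeline disambiguates the duplicates with `_1`-style suffixes. If, after checking, the duplicates do refer to the same gene, a hedged pandas sketch for collapsing them (toy numbers, hypothetical column names):

```python
import pandas as pd

# Hypothetical featureCounts-style table with '_1'-suffixed duplicate symbols
counts = pd.DataFrame({
    "Geneid": ["ABO", "ABO_1", "TP53", "BRCA1"],
    "sample1": [120, 5, 300, 80],
})

# Strip a trailing '_<number>' suffix, then sum counts sharing the same base symbol.
counts["base"] = counts["Geneid"].str.replace(r"_\d+$", "", regex=True)
collapsed = counts.groupby("base", as_index=False)["sample1"].sum()
```

Whether summing is appropriate depends on why the symbol is duplicated; restricting the annotation to the primary assembly before counting avoids the issue altogether.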
Data generation (collection) is a key and critical component of a qualitative research project. The question is, how can one make sure that sufficient data have been generated/collected?
Let's find the most essential and reliable no-code data science tools to speed up the elaboration of the research results. Thanks to Avi Chawla (source: LinkedIn post), I have some suggestions for you here. Let us know your tips.
Gigasheet
- Browser-based no-code tool to analyze data at scale
- Use AI to conduct data analysis
- It's like a combination of Excel + Pandas with no scale limitations
- Analyze up to 1B rows
Mito
- Create a spreadsheet interface in Jupyter Notebook
- Use Mito AI to conduct data analysis
- Automatically generates Python code for each analysis
PivotTableJS
- Create Pivot tables, aggregations, and charts using drag-and-drop
- Add heatmaps to tables
- Works within Jupyter notebook
Drawdata
- Draw any 2D scatter dataset by dragging the mouse
- Export the data as DataFrame, CSV, or JSON
- Create a histogram and line plot by dragging the mouse
PyGWalker
- Open a Tableau-style interface in a Jupyter notebook
- Analyze a DataFrame as you would in Tableau
Visual Python
- A GUI-based Python code generator
- Import libraries, perform data I/O, create plots, and write code for ML models by clicking buttons
Tensorflow Playground
- Provides an elegant UI to build, train, and visualize neural networks
- Browser-based tool
- Change data, model architecture, hyperparameters, etc. by clicking buttons
ydata-profiling
- Generate a standardized EDA report for your dataset
- Works in a Jupyter notebook
- Covers info about missing values, data statistics, correlation, and data interactions
My areas of interest are sustainable business and data analysis.
If I have a matrix of 16x12 and I want to create 3 classes, is there any machine learning technique that can identify the lower and upper boundary levels for each of the classes?
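If "classes" just means low/medium/high value bands, quantile (tertile) binning already yields the two boundaries. A minimal NumPy sketch on a random stand-in matrix (the real 16x12 data would go in its place):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((16, 12))   # stand-in for the real 16x12 matrix

# Tertile binning: class boundaries at the 33rd and 67th percentiles of all values
lower, upper = np.quantile(data, [1 / 3, 2 / 3])
classes = np.digitize(data, [lower, upper])   # 0 = low, 1 = medium, 2 = high
```

One-dimensional k-means (e.g. scikit-learn's `KMeans` on the flattened values) or Jenks natural breaks are data-driven alternatives that place the boundaries at gaps in the value distribution instead of at fixed quantiles.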
Which one is the best online course for data analysis?
Please mention names of open-source software.
I am working in TVET in India and want to do a research study on the earning and skill outcomes of TVET using data analytics. We have an active database of 15,000 apprentices whom we are currently engaging, so data collection should not be a problem.
Additionally would like to develop a framework for Dual System of Training for implementation in India. Seeking reference papers and general advice...
I'm currently trying to find a fit for my data, but I'm struggling to find the correct equivalent circuit. I'm working with a two-electrode system, and my electrolyte is my analyte.
I can't use the Randles circuit, and I wondered why it describes just the electrolyte, the double layer, and one electrode.
In my understanding, the circuit (in the case of a two-electrode system) should consist of the resistor and capacitor of my working electrode, the resistor of my electrolyte, and again a capacitor and resistor for my counter electrode.
I'm quite new to the topic, so if someone has an answer or an idea, it would be very helpful!
He has sufficient knowledge of ready-made statistical applications and data analysis methods
Dear all,
Are there any linking variables between, let's say TIMSS and PIRLS (both studies from IES), or studies of such kind, which could lead to a data analysis combining these sets of data when there are students participating in both studies?
I have seen some papers using data from different large studies, yet I'm not sure what approaches can be taken in this kind of analysis... What are your thoughts on this?
Exploring the role of AI and data analytics in improving our ability to predict and manage pandemics
This question explores the potential of cutting-edge technology to provide early warnings for future pandemics and revolutionize our approach to pandemic preparedness.
I apologize in advance as I am new to community data analysis.
I am currently analyzing differences in fish assemblages due to temperature extremes. I am looking at data over multiple years (the years were chosen based on the average temperature of each given year, taking the warmest and coldest years from a larger dataset).
I chose three sampling sites within a single bay; the sites are used as replicates for my analysis.
I am interested in determining whether the assemblages differ significantly between years. The response variable is the count of each species of fish collected. The ultimate goal is to determine whether there is any difference between assemblages in a given year, and whether the significant difference occurs between a warm and a cold year. Univariate biodiversity indices (richness, Shannon, etc.) were calculated, as well as SIMPER to determine which species contribute most to the results.
I am also looking at counts in the given years over multiple months, but will be keeping months separate from one another (i.e. the August 2000-2010 data analysis is separate from the September 2000-2010 data analysis).
I will be using R as my statistical program, but my question is which analysis is appropriate to test for significance. My only factor seems to be year, so I am not sure whether this is considered multivariate in this case and whether ANOSIM is even appropriate.
I would appreciate any feedback or constructive criticism at this point as I am pretty new to multivariate studies.
Dear all,
I have the following data and questions related to the issues in the results.
Independent variable
I did an online survey and asked parents questions about neighbourhood-built environment factors like neighbourhood type, accessibility, and neighbourhood safety. These are major factors that have sub-questions inside them. These questions are on a five-point Likert scale. Therefore, the parents’ responses are on a scale of 1 to 5. These are my independent variables.
I have calculated the average score based on the responses to each question by the parents under each major factor. For example, for the neighbourhood type factor, I asked five questions, each on a five-point Likert scale. So, if Q1 = 5 (response given by the parent), Q2 = 3, Q3 = 2, Q4 = 1, and Q5 = 1,
then the subscale score for neighbourhood type was (5 + 3 + 2 + 1 + 1) / 5 = 2.4.
A similar procedure was adopted for other factors. The final score for each factor was taken as the binary categorical variable, with a cut-off value = 2.5.
Dependent variable
I also asked questions about the physical activity of their eldest adolescent child. The adolescents' responses were recorded as the number of days per week. This is my dependent variable, and I have treated it as a continuous variable.
Total data
I have collected this data for three cities in India. So, for the final dataset of the model, I have combined all the data points into a single Excel sheet.
Aim
I aim to understand the effect on adolescent physical activity due to the parents’ physical activity and built-environment factors. Therefore, I have used a multilevel linear regression model to investigate the relationship between children's PA and neighborhood-built environment. In this model, I have taken the city variable as a random effect variable and the adolescent's age & neighbourhood residency duration as a fixed effect.
Results
The results are unexpected, and I am not convinced. They are also statistically insignificant.
My questions are as follows:
- Is averaging and reporting the Likert scale as a subscale score correct?
- Is the binary categorisation of the factors a correct approach?
- Is it correct to use a multilevel linear regression model for this type of data, given that all the independent variables are fixed?
I would be grateful for the discussion and input to help me in this analysis.
warm regards
Laxman
In a causal model (such as multiple IVs and a single DV) with a mediator or moderator present, do we have to consider the mediator or moderator when assessing the parametric assumptions, or do we ignore them and consider only the IV(s) and DV in the model?
Hi everyone
I'm facing a real problem when trying to export data results from imageJ (fiji) to excel to process it later.
The problem is that I have to manually change the dots (.) and commas (,). Even when changing the properties in Excel (from , to .) so that the numbers are not counted as thousands, a value like 1,302 (one point three zero two) is counted as 1302 (one thousand three hundred and two) when I transfer it to Excel...
Lately I found a nice plugin (Localized copy...) that can change the numbers format locally in imageJ so it can be used easily by excel.
Unfortunately, this plugin has some bugs: it can only copy one line of the huge data that I have, and only once (so I have to close and reopen the image again).
Has anyone else faced this problem? Can anyone please suggest another solution?
Thanks in advance
Problem finally solved... I got the new version of the 'Localized copy' plugin from its author, Mr Wolfgang Gross (not sure if I have permission to upload it here).
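For anyone hitting the same locale issue outside ImageJ: if the results table can be saved to a text file, pandas can parse comma decimals directly. A small sketch with made-up values mimicking a semicolon-delimited export:

```python
import io
import pandas as pd

# Made-up, semicolon-delimited export that uses ',' as the decimal mark
raw = "Area;Mean\n1,302;4,5\n2,75;3,25\n"

df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
# df["Area"] now holds the floats 1.302 and 2.75 (not 1302 and 275)
```

The `decimal=","` argument tells the parser which character is the decimal mark, so no find-and-replace in Excel is needed; the cleaned table can then be written back out with `df.to_csv(...)` using dots.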
The fourth technological revolution currently underway is characterised by rapidly advancing ICT information technologies and Industry 4.0, including but not limited to machine learning, deep learning, artificial intelligence, ... what's next? Intelligent thinking autonomous robots?
The fourth technological revolution currently underway is characterised by rapidly advancing ICT information technologies and Industry 4.0, including but not limited to machine learning, deep learning and artificial intelligence. Machine learning, machine self-learning and machine learning systems are synonymous terms relating to the field of artificial intelligence, with a particular focus on algorithms that can improve themselves, improving automatically through the action of an experience factor within exposure to large data sets. Algorithms operating within the framework of machine learning build a mathematical model of data processing from sample data, called a learning set, in order to make predictions or decisions without being explicitly programmed by a human to do so. Machine learning algorithms are used in a wide variety of applications, such as spam protection, i.e. filtering internet messages for unwanted correspondence, or image recognition, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Deep learning is a subcategory of machine learning which involves the creation of deep neural networks, i.e. networks with multiple levels of neurons. Deep learning techniques are designed to improve, among other things, automatic speech processing, image recognition and natural language processing. The structure of deep neural networks consists of multiple layers of artificial neurons. Simple neural networks can be designed manually so that a specific layer detects specific features and performs specific data processing, while learning consists of setting appropriate weights, significance levels and value systems for components of specific issues defined on the basis of processing and learning from large amounts of data. In large neural networks, the deep learning process is automated and self-contained to a certain extent.
In this situation, the network is not designed to detect specific features, but detects them on the basis of the processing of appropriately labelled data sets. Both such datasets and the operation of neural networks themselves should be prepared by specialists, but the features are already detected by the programme itself. Therefore, large amounts of data can be processed and the network can automatically learn higher-level feature representations, which means that they can detect complex patterns in the input data. In view of the above, deep learning systems are built on Big Data Analytics platforms built in such a way that the deep learning process is performed on a sufficiently large amount of data. Artificial intelligence, denoted by the acronym AI (artificial intelligence), is respectively the 'intelligent', multi-criteria, advanced, automated processing of complex, large amounts of data carried out in a way that alludes to certain characteristics of human intelligence exhibited by thought processes. As such, it is the intelligence exhibited by artificial devices, including certain advanced ICT and Industry 4.0 information technology systems and devices equipped with these technological solutions. The concept of artificial intelligence is contrasted with the concept of natural intelligence, i.e. that which pertains to humans. In view of the above, artificial intelligence thus has two basic meanings. On the one hand, it is a hypothetical intelligence realised through a technical rather than a natural process. On the other hand, it is the name of a technology and a research field of computer science and cognitive science that also draws on the achievements of psychology, neurology, mathematics and philosophy. In computer science and cognitive science, artificial intelligence also refers to the creation of models and programmes that simulate at least partially intelligent behaviour. 
Artificial intelligence is also considered in the field of philosophy, within which a theory is developed concerning the philosophy of artificial intelligence. In addition, artificial intelligence is also a subject of interest in the social sciences. The main task of research and development work on the development of artificial intelligence technology and its new applications is the construction of machines and computer programmes capable of performing selected functions analogously to those performed by the human mind functioning with the human senses, including processes that do not lend themselves to numerical algorithmisation. Such problems are sometimes referred to as AI-difficult and include such processes as decision-making in the absence of all data, analysis and synthesis of natural languages, logical reasoning also referred to as rational reasoning, automatic proof of assertions, computer logic games e.g. chess, intelligent robots, expert and diagnostic systems, among others. Artificial intelligence can be developed and improved by integrating it with the areas of machine learning, fuzzy logic, computer vision, evolutionary computing, neural networks, robotics and artificial life. Artificial intelligence (AI) technologies have been developing rapidly in recent years, which is determined by its combination with other Industry 4.0 technologies, the use of microprocessors, digital machines and computing devices characterised by their ever-increasing capacity for multi-criteria processing of ever-increasing amounts of data, and the emergence of new fields of application. Recently, the development of artificial intelligence has become a topic of discussion in various media due to the open-access, automated and AI-enabled solution ChatGPT, with which Internet users can have a kind of conversation. The solution is based and learns from a collection of large amounts of data extracted in 2021 from specific data and information resources on the Internet. 
The development of artificial intelligence applications is so rapid that it is ahead of the process of adapting regulations to the situation. The new applications being developed do not always generate exclusively positive impacts. These potentially negative effects include the potential for the generation of disinformation on the Internet, information crafted using artificial intelligence, not in line with the facts and disseminated on social media sites. This raises a number of questions regarding the development of artificial intelligence and its new applications, the possibilities that will arise in the future under the next generation of artificial intelligence, the possibility of teaching artificial intelligence to think, i.e. to realise artificial thought processes in a manner analogous or similar to the thought processes realised in the human mind.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
The fourth technological revolution currently taking place is characterised by rapidly advancing ICT information technologies and Industry 4.0, including but not limited to machine learning technologies, deep learning, artificial intelligence, .... what's next? Intelligent thinking autonomous robots?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Hello everyone,
I am Danillo Souza, and I am currently a Post-Doc Researcher at the Basque Center for Applied Mathematics (BCAM), working in the Mathematical, Computational and Experimental Neuroscience Group (MCEN). One of the challenges of my work is to derive optimal tools to extract topological and/or geometrical information from big data.
I am trying to submit a work to arXiv and unfortunately, an endorsement in Physics - Data Analysis and Statistics is required. I was wondering if some researcher could be my endorser in this area.
Beforehand, I appreciate your efforts in trying to help me.
With kind regards,
Danillo
Email: [email protected]
Danillo Barros De Souza requests your endorsement to submit an article
to the physics.data-an section of arXiv. To tell us that you would (or
would not) like to endorse this person, please visit the following URL:
https://arxiv.org/auth/endorse?x=UOKIX3
If that URL does not work for you, please visit
http://arxiv.org/auth/endorse.php
and enter the following six-digit alphanumeric string:
Endorsement Code: UOKIX3
Does anyone have a data analysis software setup to share with me?
In case you perform data analysis and the CV is 34, what is the implication?
Suppose I want to analyse 100 year of monthly mean flow data of a particular month.
What are the potential research implications of employing entropy for one application and ARCH GARCH models for another application when applied to the same dataset? How might the utilization of these two different approaches influence the outcomes and insights derived from the data analysis?
Today more and more people are gaining data analytics skills, partly to fill vacant positions. Cities and communities are also leveraged by these trends: there could be more people to help maintain public policy analysts' reproducible code bases. The infrastructure needs to be in place to support more public engagement. What do you think about this phenomenon?
This pertains to the transcendental aspect of phenomenology.
Is it possible to run the Sharpe index, Value at Risk (historical model or Monte Carlo model), and Monte Carlo simulation simultaneously on the same data set to get a detailed understanding of the 5-year performance of individual banks vs. the Bank Nifty index?
Literature support is there for individual studies, but is it feasible to look at the same in a combined manner?
Can you let me know your suggestions on this?
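Computationally, running all three on one dataset is straightforward: they can each be derived from the same return series. A hedged Python sketch on simulated (not real) daily returns, assuming a zero risk-free rate and 252 trading days per year:

```python
import numpy as np

rng = np.random.default_rng(42)
daily_returns = rng.normal(0.0005, 0.01, 1250)   # simulated ~5 years of daily returns

risk_free_daily = 0.0                            # assumed risk-free rate for illustration
# Annualized Sharpe ratio from daily excess returns
sharpe = (daily_returns.mean() - risk_free_daily) / daily_returns.std(ddof=1) * np.sqrt(252)

# Historical 95% Value at Risk: the 5th-percentile return, reported as a positive loss
hist_var_95 = -np.quantile(daily_returns, 0.05)

# Monte Carlo VaR by bootstrapping the same return history
sims = rng.choice(daily_returns, size=100_000, replace=True)
mc_var_95 = -np.quantile(sims, 0.05)
```

Since all three measures share one return series, computing them together is consistent; the methodological question is whether the literature supports interpreting them jointly, which reviewers may ask about.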
Communication between the machine and the computer is OK, and when started in the normal way the green LED becomes stable. Then, when you click the Data Collection software, the Service Console window opens and the first indicator, "Messaging Service", turns green, but the next one, "Data Service", remains yellow and there is no further progress. Can someone suggest what the reason may be and a possible solution, please? Thank you.
Analysis of qualitative data requires intensive reading of transcripts, field reports, diaries, journals, and other documents. It is a continuous, to-and-fro process. What challenges do you, as a qualitative researcher, face during data analysis, and how do you overcome those challenges?
I have planned to conduct a study of Antenatal Education program using "One Group Pretest Posttest" for data analysis.
I used statistics from a prior study to calculate the sample size, but it came out as just 4.
Can I really use this number? Please find the details on file attached.