Statistics - Science topic
Statistical theory and its application.
Questions related to Statistics
Hello all,
I am running into a problem I have not encountered before in my mediation analyses. I am running a simple mediation, X > M > Y, in R.
Generally, I agree that the total effect does not have to be significant for there to be a mediation effect, and in the case I am describing this would be a logical occurrence, since the a and b paths are both significant (-.142 and .140, respectively), resulting in a 'null effect' for the total effect.
However, my c path (X > Y) is not merely non-significant as I would expect; rather, the regression does not fit at all (see below):
(Residual standard error: 0.281 on 196 degrees of freedom
Multiple R-squared: 0.005521, Adjusted R-squared: 0.0004468
F-statistic: 1.088 on 1 and 196 DF, p-value: 0.2982).
Usually I would say you cannot interpret models that do not fit, and since this path is part of my model, I hesitate to interpret the mediation at all. However, the other paths do fit and are significant. Could the non-fitting also be a result of the paths cancelling one another?
Note: I am running bootstrapped results for the indirect effects, but the code does utilize the total-effect path, which does not fit on its own, hence my concern.
Note 2: I am working with a clinical sample, so the sample size is not as large as I would like (group 1: 119; group 2: 79; N = 198).
Please let me know if additional information is needed and thank you in advance!
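For what it's worth, one way to sidestep the poorly fitting total-effect model is to bootstrap the indirect effect a*b directly, so that inference never conditions on the c-path regression. Below is a minimal base-R sketch with simulated data; the sample size and path coefficients are borrowed from the post, but everything else is a stand-in, not the real data or model:

```r
# Illustrative sketch (simulated, not the real data): bootstrap the
# indirect effect a*b directly, so inference does not hinge on the fit
# of the total-effect (c-path) model.
set.seed(42)
n <- 198
X <- rnorm(n)
M <- -0.142 * X + rnorm(n)   # path a
Y <-  0.140 * M + rnorm(n)   # path b
dat <- data.frame(X, M, Y)

boot_ab <- replicate(5000, {
  d <- dat[sample(nrow(dat), replace = TRUE), ]
  a <- coef(lm(M ~ X, data = d))["X"]
  b <- coef(lm(Y ~ M + X, data = d))["M"]
  a * b
})
quantile(boot_ab, c(0.025, 0.975))  # percentile CI for the indirect effect
```

If the percentile interval for a*b excludes zero, the indirect effect is supported even when the total effect is near zero, since opposing paths can cancel in c.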
Assume this is my hypothetical data set (attached figure), in which the thickness of a structure was evaluated at the defined positions (1-3) in 2 groups (control and treated). I emphasize that the structure normally increases and decreases in thickness from position 1 to 3. I would also like to point out that each position has data from 2 individuals (samples).
I would like to check if there is a statistical difference in the distribution of points (thickness) depending on the position. Suggestions were to use the 2-sample Kolmogorov-Smirnov test.
However, my data do not meet the test's assumptions well: the position of the measurement matters in this case, and the test ignores this factor, simply ordering all values from smallest to largest and computing the statistic.
In this case, is the 2-sample Kolmogorov-Smirnov test misleading? Is there another type of statistical analysis that could be performed in this case?
Thanks in advance!
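One alternative, sketched below with made-up numbers rather than the attached data, is a permutation test stratified by position: group labels are shuffled only within each position, so the position structure that the pooled KS test ignores is respected. This is a hedged sketch of the idea, not a prescription:

```r
# Made-up data: 2 groups, 3 positions, 2 samples per cell.
set.seed(1)
thick <- data.frame(
  pos   = rep(1:3, each = 4),
  group = rep(c("control", "treated"), times = 6),
  y     = c(2.1, 2.3, 2.0, 2.4,  3.5, 3.9, 3.3, 4.1,  2.2, 2.6, 2.1, 2.8)
)

# Test statistic: summed absolute group-mean difference across positions.
stat <- function(d) {
  m <- tapply(d$y, list(d$pos, d$group), mean)
  sum(abs(m[, "control"] - m[, "treated"]))
}

obs <- stat(thick)
perm <- replicate(2000, {
  d <- thick
  d$group <- ave(d$group, d$pos, FUN = sample)  # shuffle only within position
  stat(d)
})
mean(perm >= obs)  # permutation p-value
```

With such a small data set the permutation p-value is coarse, but the stratified shuffling keeps position as a fixed design factor rather than pooling all values as the KS test does.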
Dear colleagues
Could you please tell me how to construct a boxplot from a data frame in RStudio?
library(dplyr)

df9 <- data.frame(
  Kmeans      = c(1, 0.45, 0.52, 0.54, 0.34, 0.39, 0.57, 0.72, 0.48, 0.29, 0.78, 0.48, 0.59),
  hdbscan     = c(0.64, 1, 0.32, 0.28, 0.33, 0.56, 0.71, 0.56, 0.33, 0.19, 0.53, 0.45, 0.39),
  spectralpam = c(0.64, 0.31, 1, 0.48, 0.24, 0.32, 0.52, 0.66, 0.32, 0.44, 0.28, 0.25, 0.47),
  fanny       = c(0.64, 0.31, 0.38, 1, 0.44, 0.33, 0.48, 0.73, 0.55, 0.51, 0.32, 0.39, 0.57),
  FKM         = c(0.64, 0.31, 0.38, 0.75, 1, 0.26, 0.55, 0.44, 0.71, 0.38, 0.39, 0.52, 0.53),
  FKMnoise    = c(0.64, 0.31, 0.38, 0.75, 0.28, 1, 0.42, 0.45, 0.62, 0.31, 0.25, 0.66, 0.67),
  Mclust      = c(0.64, 0.31, 0.38, 0.75, 0.28, 0.46, 1, 0.36, 0.31, 0.42, 0.47, 0.66, 0.53),
  PAM         = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 1, 0.73, 0.43, 0.39, 0.26, 0.41),
  AGNES       = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 0.55, 1, 0.31, 0.48, 0.79, 0.31),
  Diana       = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 0.55, 0.42, 1, 0.67, 0.51, 0.43),
  zones2      = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 0.55, 0.42, 0.45, 1, 0.69, 0.35),
  zones3      = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 0.55, 0.42, 0.45, 0.59, 1, 0.41),
  gsa         = c(0.64, 0.31, 0.37, 0.75, 0.28, 0.46, 0.58, 0.55, 0.42, 0.45, 0.59, 0.36, 1),
  method      = c("kmeans", "hdbscan", "spectralpam", "fanny", "FKM", "FKMnoise",
                  "Mclust", "PAM", "AGNES", "DIANA", "zones2", "zones3", "gsa")
)
head(df9)
# Coerce only the numeric columns: `method` is a character label column
# and would become all NA if forced to numeric.
df9 <- df9 %>% mutate(across(-method, ~ as.numeric(as.character(.))))
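Here is a minimal base-R sketch of the boxplot itself. The stand-in data frame below just mimics the shape of df9 (numeric score columns plus a character `method` column); the same two lines work on df9 directly:

```r
# Stand-in with the same shape as df9: numeric columns + a label column.
set.seed(1)
df <- data.frame(Kmeans  = runif(13),
                 hdbscan = runif(13),
                 method  = paste0("m", 1:13))

num <- df[ , sapply(df, is.numeric)]   # drop non-numeric columns
boxplot(num, las = 2, ylab = "similarity score")
```

For df9 that would be `boxplot(df9[ , sapply(df9, is.numeric)], las = 2)`, giving one box per algorithm column.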
Thank you very much.
In the domain of clinical research, where the stakes are as high as the complexities of the data, a new statistical aid emerges: bayer (https://github.com/cccnrc/bayer).
This R package is not just an advancement in analytics - it's a revolution in how researchers can approach data, infer significance, and derive conclusions.
What Makes `Bayer` Stand Out?
At its heart, bayer is about making Bayesian analysis robust yet accessible. Born from the powerful synergy with the wonderful brms::brm() function, it simplifies the complex, making the potent Bayesian methods a tool for every researcher’s arsenal.
Streamlined Workflow
bayer offers a seamless experience, from model specification to result interpretation, ensuring that researchers can focus on the science, not the syntax.
Rich Visual Insights
Understanding the impact of variables is no longer a trudge through tables. bayer brings you rich visualizations, like the one above, providing a clear and intuitive understanding of posterior distributions and trace plots.
Big Insights
Clinical trials, especially in rare diseases, often grapple with small sample sizes. `Bayer` rises to the challenge, effectively leveraging prior knowledge to bring out the significance that other methods miss.
Prior Knowledge as a Pillar
Every study builds on the shoulders of giants. `Bayer` respects this, allowing the integration of existing expertise and findings to refine models and enhance the precision of predictions.
From Zero to Bayesian Hero
The bayer package ensures that installation and application are as straightforward as possible. With just a few lines of R code, you’re on your way from data to decision:
# Installation
devtools::install_github("cccnrc/bayer")

# Example usage: Bayesian logistic regression
library(bayer)
model_logistic <- bayer_logistic(
  data       = mtcars,
  outcome    = 'am',
  covariates = c('mpg', 'cyl', 'vs', 'carb')
)
You then have plenty of functions to further analyze your model; take a look at the bayer repository.
Analytics with An Edge
bayer isn’t just a tool; it’s your research partner. It opens the door to advanced analyses like IPTW, ensuring that the effects you measure are the effects that matter. With bayer, your insights are no longer just a hypothesis — they’re a narrative grounded in data and powered by Bayesian precision.
Join the Brigade
bayer is open-source and community-driven. Whether you’re contributing code, documentation, or discussions, your insights are invaluable. Together, we can push the boundaries of what’s possible in clinical research.
Try bayer Now
Embark on your journey to clearer, more accurate Bayesian analysis. Install `bayer`, explore its capabilities, and join a growing community dedicated to the advancement of clinical research.
bayer is more than a package — it’s a promise that every researcher can harness the full potential of their data.
Explore bayer today and transform your data into decisions that drive the future of clinical research: bayer - https://github.com/cccnrc/bayer
What may be a good, strong and convincing example demonstrating the power of copulas by uncovering some not obvious statistical dependencies?
I am especially interested in the example contrasting copula vs a simple calculation of a correlation coefficient for the original distributions.
Something like this: the (properly normalized) correlation coefficient of the components of a bivariate distribution does not suggest a strong statistical dependence between them, but the copula of these two components shows a clear dependence (possibly manifested in the value of a correlation coefficient calculated on the copula scale). Or the opposite: the correlation coefficient of the original bivariate distribution suggests strong dependence, but its copula shows that the statistical dependence is weak, or absent altogether.
Mostly interested in an example described in terms of formulae (so that the samples could be generated, e.g. in MATLAB), but if somebody can point to the specific pre-generated bivariate distribution dataset (or its plots), that will work too.
Thank you!
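One simple concrete contrast along these lines, sketched in R rather than MATLAB (the parameters are my own choice): take X ~ N(0,1) and Y = exp(3X). The pair is perfectly monotonically dependent, so its copula is the comonotone copula and rank (copula-scale) correlation is exactly 1, yet the Pearson coefficient on the original scale is theoretically only 3/sqrt(e^9 - 1), about 0.033:

```r
# X and Y = exp(3*X) are perfectly monotonically dependent: their copula
# is the comonotone copula, so copula-scale (rank) correlation is 1.
# Pearson correlation on the original scale is nevertheless tiny,
# because the relationship is strongly nonlinear and heavy-tailed.
set.seed(1)
x <- rnorm(1e5)
y <- exp(3 * x)

cor(x, y)                       # small, despite perfect dependence
cor(x, y, method = "spearman")  # exactly 1: dependence at the copula level

# The empirical copula sample is just the pair of normalized ranks:
u <- rank(x) / (length(x) + 1)
v <- rank(y) / (length(y) + 1)
cor(u, v)                       # Pearson on the copula scale: ~1
```

The same generator is trivial to port to MATLAB (`x = randn(n,1); y = exp(3*x);` with `corr(x, y)` vs `corr(x, y, 'type', 'Spearman')`).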
I have come across some ideas which I hope will be pursued by academics in the associated disciplines:
1) Topics broadly outlined in the following articles:
Fathabadi OS (2022) Voluntary Selection; Bringing Evolution at the Service of Humanity. Scientific J Genet Gene Ther 8(1): 009-015.DOI: http://dx.doi.org/10.17352/sjggt.000021
Fathabadi OS (2023) The way of future through voluntary selection. Glob J Ecol 8(1): 034-041. DOI: 10.17352/gje.000079
2) An explanation of how violence is the link between evolution and society. What is not clear in the articles above is that, just as a disease shows symptoms such as fever in its early stages, violence caused by the lack of evolutionary order in society also shows itself in the form of problems like bullying, harassment, discrimination, passive aggression, and coercive control; it may then appear as an increase in the rate of crimes and social riots, and finally as naked violence, civil wars, genocides, and foreign wars. The purpose of the topics raised in both articles is to realize the evolutionary order through the methods proposed in control engineering and, by achieving the desired statistical goals, not only to prevent violence but also to spread satisfaction in society and to provide internal and external security.
3) Interpretation of historical events including genocides and world wars through the lens of insights provided above for example how passive aggression became widespread prior to such events.
4) The articles above argue that control theory, in conjunction with data science, can help establish and maintain democracies with an unprecedented level of stability and provide optimal living standards by regulating the evolutionary health of societies and the relationships between them. This goal, however, requires translating theory widely used in engineering for application in the humanities. Control engineering remains a challenging field to learn, and its theoreticians and practitioners are mostly active in areas other than the humanities - most critically politics. Despite the large number of engineering graduates who study systems and control theory as part of their curriculum, the practical pathway for applying the theory to problems in the humanities remains unclear; the field is also largely inaccessible to practitioners of the humanities, even to the extent necessary to let them express their problems in terms approachable by control engineering experts. Projects should therefore be defined to develop highly targeted educational materials and tools that provide a shortcut to applying the theory to practical problems in the humanities. It is particularly important to understand that the type of control theory applicable to the humanities would be non-linear, time-varying control, an extension of the control theory relevant to most engineering problems; the specific theory will need to be developed beyond what a classically educated expert in control engineering, even one with a PhD, would be comfortably equipped with.
For example, the types of time constants and uncertainties relevant to the humanities, the types of irregularities that can be created by the collective mis-intentions of social groups, and the moral implications of such techniques for human populations extend to matters associated with politics, biology, culture, education, and media. It is important to understand that the knowledge and tools will not act as a pill that fixes problems; rather, through insight, they provide directions and decision support systems with local applicability and temporal relevance which, despite their limitations, offer unprecedented power to provide better living standards for all populations and individuals. Such tools need to be maintained in order to remain relevant as the system under control changes over time. The developed materials will help students, researchers, and practitioners get up to speed with minimal training or self-study, and the impact, I believe, would be revolutionary. In addition to systems and control theory, the mentioned books, educational materials, and tools may also cover topics such as programming, differential equations, numerical optimisation, system identification, artificial intelligence, and, to some extent, statistics. This set of topics may one day help establish an independent field of study, namely "Humanities Engineering".
5) The ultimate goal is to apply the theory, educational materials, and tools mentioned above to solve practical problems in sociology, psychology, politics, and other fields of the humanities; such projects make great topics for research in universities, think tanks, and governments. It is important to mention that using the mean values and standard deviations that define the phenotypic profiles of populations is a way of taking into account the differences between populations when pursuing desirable results within them and when regulating the relationships between them.
6) It is also possible to characterise discourses and broadcast content by defining relevant indices that quantify various aspects of them, and then to model and predict their relationship with outcomes in society. These models can then act as decision support systems to identify and implement adjustments for achieving the desired social outcomes. If sufficiently predictive, they can also be used, in combination with other models or in isolation, as part of the control loops of control theory, in order to achieve the desired outcomes in terms of sustainability, social stability, freedoms, economic welfare, health, national security, psychological security, gender equality, and optimal levels of happiness for all individuals and social groups.
7) It is possible to compile a set of contents, including the matters mentioned in the articles above, to act as a mental anchorage for people: something scientifically grounded, convincing, and understandable by those who put in the effort, which allows them to remain motivated, morally directed, socially responsible and supported, and mentally healthy. I believe it is necessary to add content starting from genetics, explaining how life is a complex product of nature and how survival and reproduction are complex interpretations of the laws of nature. Topics such as cosmology and quantum physics can also be included.
8) It is possible to define a project on optimal forms of democracy for different populations, with emphasis on the fact that people's choices represent their interests as they understand them as individuals, while much of what comprises our existing living standards, or is necessary for achieving higher ones, results from societies and the mechanisms maintaining them. Societies were formed by cultures and religions in the aftermath of painful evolutionary events, which occurred when people pursued their personal interests, and they emerged as optimal ways of achieving a better average standard of living for larger numbers of individuals over a larger proportion of their lives. In other words, many aspects of our existing living standards are by-products of societies and could not be achieved or maintained by pursuing our individual choices alone, which is what a democracy guarantees. Democracies should be pursued for optimal living standards and to prevent abuse; however, democracies must also be in place to guarantee the maintenance of society itself (what one might call an evolutionary order) and to realise and maintain its desired standards of living, and this should not be compromised by the choices of individuals.
9) The ethics of voluntary selection and of applying the methods of control theory and AI to problems in the humanities, especially to prevent abuse of individuals and minorities under the flag of the interests of society, and to prevent the creation of senseless scientific approvals for imposing disadvantage on individuals and social groups. It is also necessary to minimise the pain imposed on individuals and social groups during the transitions that result from adjustments.
10) While the first article introduces the concept of "Voluntary Selection" and a methodology for using it in a calculated way, it is only a beginning; if it is to be implemented, a huge methodological and experimental effort is needed to identify relevant phenotypes, develop phenotypic maps for distinct populations, identify the results of choosing donor and receiver populations, and develop tools to predict and monitor the progress of such programs, besides studying the implications for society, the economy, and beyond. Research can also dig deeper and take genotype-phenotype relationships into account in achieving the desired results.
I want to estimate the half-life value for the virus as a function of strain and concentration, and as a continuous function of temperature.
Could anybody tell me how to calculate the half-life value in R?
I have attached a CSV file of the data
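Since the CSV's structure isn't described here, below is a hedged sketch with simulated data and assumed column names (`time`, `temp`, `titer`): fit a log-linear decay model, let the decay rate vary with temperature through an interaction, and convert the fitted rate to a half-life via half-life = ln(2)/k. Strain and concentration could enter the same way, e.g. `log(titer) ~ time * (strain + conc + temp)`:

```r
# Hypothetical sketch (simulated data; your column names may differ).
# Exponential decay: titer(t) = titer0 * exp(-k * t), so half-life = ln(2)/k.
set.seed(1)
dat <- expand.grid(time = 0:5, temp = c(4, 20, 37))
k_true <- 0.05 + 0.01 * dat$temp                 # decay rate rises with temperature
dat$titer <- 1e6 * exp(-k_true * dat$time) * exp(rnorm(nrow(dat), 0, 0.05))

# Slope of log(titer) on time is -k; the interaction lets k depend on temp.
fit <- lm(log(titer) ~ time * temp, data = dat)
k_hat <- function(temperature)
  -(coef(fit)["time"] + coef(fit)["time:temp"] * temperature)
half_life <- function(temperature) log(2) / k_hat(temperature)

half_life(20)   # estimated half-life at 20 degrees, in the units of `time`
```

For your real data, replace the simulated `dat` with `read.csv("yourfile.csv")` and adjust the formula to include strain and concentration as factors.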
I am studying the impact of leadership style on job satisfaction. In the data collection instrument, there are 13 questions on leadership style, divided among a couple of leadership styles. On the other hand, there are only four questions for job satisfaction. How do I run correlational tests on these variables? Which values do I select for the analysis in Excel?
I explain here the connection between the pre-scientific Law of Universal Causality and all sorts of statistical explanations in the physical sciences. The route this takes may look strange, but it should be interesting enough to consider.
To repeat in short what is already said a few times: by all possible assumptions, to exist (which is To Be with respect to Reality-in-total) is non-vacuous. Hence, any existent must have Extension, have finite-content parts. These parts, by the only other possible assumption, must yield impacts on other parts both external and internal. This is Change.
These impacts are always finite in the content and measured extents. The measured extents of Extension and Change are space and time. Without measurements we cannot speak of space and time as existing or as pertaining to existents. What pertain to all existents as most essential are Extension and Change. Existence in Extension and Change means that finitely extended objects give origin to finite impacts. This is Causality. Every existent is Extension-Change-wise existent, and hence everything is causal.
As pertinents to existents, Extension and Change are the most applicable qualities / universals of the group of all entities, i.e., Reality-in-total, because they belong to all that exists. Since Extension and Change are not primarily in our minds, let us call them ontological universals. As is clear now, Extension and Change are the widest possible and most general ontological universals. All universals are pure qualities. All qualities other than ontological universals are mixtures of pure qualities.
There are physical-ontological universals / qualities that are not as universal as Extension and Change. ‘Colouredness’ / ‘being coloured’, ‘redness’, ‘unity’ / ‘being a unit’, ‘being malleable’, ‘being rigid’, etc. are also pure qualities. These are pertinents not merely of one existent process. They belong to many. These many are a group of existent processes of one kind, based on the one classification quality. Such groups of Extension-Change-wise existent entities are termed natural kinds.
Ontological universals can be reflected in minds too, but in very meagre ways, not always, and not always to the same extent of correspondence with ontological universals, because they are primarily in existent processes. A direct reflection is impossible. The many individuals who get them reflected meagrely formulate them differently.
The supposed common core of ontological universals in minds is a pure notion, but such cores are mere notions idealized by minds. These ideals are also not inherited from the pertinent ontological universals of all relevant existent things, but arise at least by way of absorption from some existents, in whatever manner of correspondence with ontological universals. I call them connotative universals, because they are the pure aspects of the conceptual activity of noting objectual processes together.
In brains connotative universals can show themselves only as a mixture of the relevant connotative universals and the relevant brain elements. Please note that this is not a brain-scientific statement. It is the best imaginable philosophical common-sense on the brain-scientific aspect of the formation of connotative universals, and hence it is acceptable to all brain scientists. In brains there are processes that define such activities. But it needs only to be accepted that these processes too are basically of Extension-Change-wise existence, and hence are causal in all senses.
Connotatives are just representations of all kinds of ontological universals. Connotatives are concatenated in various ways in connection with brain elements – in every case highly conceptually and symbolically. These concatenations of connotatives among themselves are imaginations, emotions, reflections, theories, etc., as considered exclusively in the mind.
Note here also that the lack of exact correspondence between ontological and connotative universals is what makes ALL our statements essentially statistical and non-exact at the formation of premises and at the jump from premises into conclusions. The statistical aspect here is part of the process of formation, by brains, of connotatives from ontological universals. This is the case in every part of imaginations, emotions, reflections, theories, etc., even when statistical measurements are not actually being made part of the inquiry as a matter of mentally guided linguistic and mathematical procedures.
Further, connotative universals are formulated in words expressed as terms, connected with connectives of processes, and concatenated in statements. These are the results of the symbolic functioning of various languages including mathematics. These are called denotative universals and their concatenations. All symbolic activities function at this level.
Now coming to statistics as an applied expression of mathematics: it is nothing but denotative universals concatenated in a quantitatively qualitative manner. Even here there is much inexactness, known as uncertainty, randomness, etc. Note that language, mathematics, and its statistical part work at the level of denotative universals and their concatenations. These are naturally derived from the conglomerations of ontological universals via concatenations of connotatives, and then translated, with further uncertainties, into denotative concatenations.
Causation works at the level of the conglomerations of ontological universals, which are in existent things themselves. That is, statistical connections appear not at the ontological level, but at the denotative level. When I say that this laptop is in front of me, there is a directness of acceptance of images from the ontological universals and their conglomerations into the connotative realm of connotations and from there into the denotative realm of connotations. But in roundabout conclusions regarding causal processes at the physical-ontological level into the statistical level, the amount or extent of directness of judgement is very much lacking.
What is the specific importance of a bachelor’s degree in the hiring process?
Parsimoniously, why does fertility correlate negatively with socioeconomic status? And how?
Hi,
I am hoping to get some help on what type of statistical test to run to validate my data. I have run 2 ELISAs with the same samples for each test. I did perform a Mann-Whitney U-test to compare the groups, and my results were good.
However, my PI wants me to also run a statistical test to determine that there wasn't any significant difference in the measurement of each sample between the runs. He wants to know that my results are concordant/reproducible.
I am trying to compare each sample individually, and since I don't have 3 data points per sample, I can't run an ANOVA. What types of statistical tests will give me that information? Also, is there a test that will run all the samples simultaneously but only compare within the same sample?
For example, if my data looked like this.
A: 5, 5.7
B: 6, 8
C: 10, 20
I need a test to determine if there is any significant difference between the values for samples A, B, and C separately and not compare the group variance between A-C.
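With only one measurement per sample per run, a per-sample significance test isn't possible (n = 1 per cell), but a systematic between-run difference can be tested by pairing the runs across samples, and agreement can be summarised Bland-Altman style. A base-R sketch using the toy numbers above (a real analysis would use all samples, not three):

```r
# Toy numbers from the question: one value per sample per run.
run1 <- c(A = 5,   B = 6, C = 10)
run2 <- c(A = 5.7, B = 8, C = 20)

# Paired tests for a systematic shift between runs:
t.test(run1, run2, paired = TRUE)
wilcox.test(run1, run2, paired = TRUE)   # non-parametric alternative

# Agreement rather than difference: Bland-Altman style summary.
avg <- (run1 + run2) / 2
d   <- run2 - run1
mean(d)                                  # bias between runs
mean(d) + c(-1.96, 1.96) * sd(d)         # limits of agreement
```

An intraclass correlation (e.g. via `irr::icc()` or `psych::ICC()`, if those packages suit you) is another common way to report run-to-run reproducibility; note that in the toy data, sample C drives most of the disagreement.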
Hello everyone,
I am currently undertaking a research project that aims to assess the effectiveness of an intervention program. However, I am encountering difficulties in locating suitable resources for my study.
Specifically, I am in search of papers and tutorials on multivariate multigroup latent change modelling. My research involves evaluating the impact of the intervention program in the absence of a control group, while also investigating the influence of pre-test scores on subsequent changes. Additionally, I am keen to explore how the scores differ across various demographic groups, such as age, gender, and knowledge level (all measured as categorical variables).
Although I have come across several resources on univariate/bivariate latent change modelling with more than three time points, I have been unable to find papers that specifically address my requirements—namely, studies focusing on two time points, multiple latent variables (n >= 3), and multiple indicators for each latent variable (n >= 2).
I would greatly appreciate your assistance and guidance in recommending any relevant papers, tutorials, or alternative resources that pertain to my research objectives.
Best,
V. P.
I want to examine the relationship between school grades and self-esteem and was planning to do a linear regression analysis.
Here's where my problem is. I have three more variables: socioeconomic status, age, and sex. I wanted to treat those as moderator variables, but I'm not sure that's the best solution. Maybe a multiple regression analysis would be enough? Or should I simply control for those variables?
Also, if I go for a moderation analysis, how do I do that in SPSS? I can find a lot of videos about moderation analysis, but I can't seem to find cases with more than one moderator.
I've researched a lot already but can't seem to find an answer. My statistics skills aren't the best, so maybe that's why.
I'd be really thankful for your input!
I recently had a strange question about confidence intervals from a non-statistician. His understanding is that all the sample values used to calculate the confidence interval should fall within that interval. I tried my best to answer him, but couldn't convince him. Is there a good way to explain why this need not be the case, and that this is not what the interval is for? How would you handle this question?
Thanks in advance.
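A small simulation can make the distinction vivid: the confidence interval for the mean shrinks like sd/sqrt(n), while the spread of the raw observations does not, so most data points fall outside the CI even though the interval does exactly its job for the mean:

```r
# A confidence interval for the MEAN is not a range for the observations.
set.seed(1)
x  <- rnorm(30, mean = 50, sd = 10)
ci <- t.test(x)$conf.int
ci                             # width ~ 2 * t * sd(x)/sqrt(30): much
                               # narrower than the spread of the data
mean(x >= ci[1] & x <= ci[2])  # fraction of raw observations inside the
                               # CI: well below 100%
```

Repeating this with larger n makes the CI arbitrarily narrow while the data keep their spread, which usually settles the argument.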
Respectfully, across reincarnation belief and scientific materialism, why is considering the individual self as an illusion a commonality?
Dear all,
I am sharing the model below that illustrates the connection between attitudes, intentions, and behavior, moderated by prior knowledge and personal impact perceptions. I am seeking your input on the preferred testing approach, as I've come across information suggesting one may be more favorable than the other in specific scenarios.
Version 1 - Step-by-Step Testing
Step 1: Test the relationship between attitudes and intentions, moderated by prior knowledge and personal impact perceptions.
Step 2: Test the relationship between intentions and behavior, moderated by prior knowledge and personal impact perceptions.
Step 3: Examine the regression between intentions and behavior.
Version 2 - Structural Equation Modeling (SEM)
Conduct SEM with all variables considered together.
I appreciate your insights on which version might be more suitable and under what circumstances. Your help is invaluable!
Regards,
Ilia
ResearchGate does a pretty good job of tracking publication analytics such as reads and citations over time. The recommendations feature can also be an interesting indicator for a publication's resonance with the scholarly community.
This progress allows ideas to be developed about how to make the analytics features even better in the future. Here are some ideas I have been thinking about:
- something equivalent to Altmetric that tracks social media mentions across multiple platforms and mentions in news articles, conference proceedings, etc.
- more longitudinal data for individual publications by month and year
- the ability to compare the performance of one's own publications, with perhaps a way to rank them in analytic reports by reads, citations, etc.
- More specific analytics to allow for comparisons within and between departments on an individual and collective basis, which can be sorted by discipline, field, etc.
Are there any additional analytics features that you would like to see on ResearchGate?
Meta-analyses and systematic reviews seem a shortcut to academic success, as they usually have a better chance of getting published in accredited journals, are read more, and bring home a lot of citations. Interestingly enough, apart from being time-consuming, they are quite easy; they are essentially nothing but carefully followed protocols of online data collection and statistical analysis, if any.
The point is that most of this can, at least in theory, be done by a simple computer algorithm. A combination of if/then statements would let the software decide on the statistical parameters to be used, not to mention the more advanced approaches available to expert systems.
The only part needing a much more advanced algorithm like a very good artificial intelligence is the part that is supposed to search the articles, read them, accurately understand them, include/exclude them accordingly, and extract data from them. It seems that today’s level of AI is becoming more and more sufficient for this purpose. AI can now easily read papers and understand them quite accurately. So AI programs that can either do the whole meta-analysis themselves, or do the heavy lifting and let the human check and polish/correct the final results are on the rise. All needed would be the topic of the meta-analysis. The rest is done automatically or semi-automatically.
We can even have search engines that actively monitor academic literature, and simply generate the end results (i.e., forest plots, effect sizes, risk of bias assessments, result interpretations, etc.), as if it is some very easily done “search result”. Humans then can get back to doing more difficult research instead of putting time on searching and doing statistical analyses and writing the final meta-analysis paper. At least, such search engines can give a pretty good initial draft for humans to check and polish them.
When we ask a medical question of such a search engine, it will not only give us a summary of relevant results (the way currently available LLM chatbots do) but will also calculate and produce an initial meta-analysis based on the available scientific literature. It will also warn the reader that the results are generated by AI and should not be deeply trusted, but can be used as a rough first estimate. This is of course needed until the accuracy of generative AI surpasses that of humans.
It just needs some enthusiasts with enough free time and resources on their hands to train some available open-source, open-parameter LLMs for this specific task. Maybe the big players are already working on this concept behind the scenes, optimizing their proprietary LLMs for meta-analysis generation.
Any thoughts would be most welcome.
Vahid Rakhshan
Hi, I'm currently writing my psychology dissertation, in which I am investigating "how child-oriented perfectionism relates to behavioural intentions and attitudes towards children in a chaotic versus calm virtual reality environment".
Therefore I have my predictor/independent variables: environment (calm vs. chaotic, one within-subjects factor with two levels) and child-oriented perfectionism (a continuous measure).
My outcome/dependent variables are: behavioural intentions and attitudes towards children.
My hypotheses are:
- Participants will have more negative behavioural intentions and attitudes towards children in the chaotic environment than in the calm environment.
- These differences will be magnified in participants high in child-oriented perfectionism compared to participants low in child-oriented perfectionism.
I used a questionnaire measuring child-oriented perfectionism, which yields a score. Participants then watched the calm environment video and answered the behavioural intentions and attitudes towards children questionnaires in relation to the children shown in that video; they then watched the chaotic environment video and answered the same questionnaires in relation to the children in that video.
I am unsure whether to use a multiple linear regression or a repeated-measures ANOVA with a continuous moderator (child-oriented perfectionism) to answer my research question and hypotheses. Please, can someone help?
RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
res = hypotest_fun_out(*samples, **kwds)
The warning above occurred in Python. The dataset was first normalised, and the warning then appeared while performing the t-test, although the output was still displayed. Kindly suggest some methods to avoid this warning.
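For what it's worth, the warning means the two samples are nearly constant, so computing their variance loses almost all floating-point precision. One defensive pattern, sketched below with made-up values and a hypothetical helper name, is to re-centre the data before testing; subtracting a shared offset removes the large common constant that causes the cancellation (and leaves the t statistic unchanged). It is also worth asking whether a t-test on near-identical data is meaningful at all.

```python
import numpy as np
from scipy import stats

# Made-up, nearly identical samples of the kind that trigger the warning.
a = np.array([0.5000001, 0.5000002, 0.5000001, 0.5000003])
b = np.array([0.5000002, 0.5000001, 0.5000002, 0.5000001])

def safe_ttest(x, y, tol=1e-12):
    """Hypothetical helper: re-centre near-constant data before testing."""
    if np.var(x) < tol or np.var(y) < tol:
        # Subtracting a shared offset leaves the t statistic unchanged in
        # exact arithmetic but avoids catastrophic cancellation in floats.
        shift = np.mean(np.concatenate([x, y]))
        x, y = x - shift, y - shift
    return stats.ttest_ind(x, y)

t, p = safe_ttest(a, b)
```

The tolerance `tol` is an arbitrary placeholder; choose it relative to the scale of your data.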
I am somewhat Hegelian because I do not believe in martyrdom, and or dying on a hill, and usually the popular, and or traditional, opinion has a deeper less obvious reason.
Can anyone help me with one biostatistics question? It is about finding the sample size from a power analysis. I have the variables; I just need assistance with the calculations.
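As a starting point, a power-based sample-size calculation for a two-group comparison can be done in one call. The sketch below uses statsmodels with placeholder inputs (a medium effect of d = 0.5, alpha = .05, power = .80) that you would replace with your own values.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the per-group sample size of an independent-samples t-test.
n_per_group = analysis.solve_power(effect_size=0.5,   # assumed Cohen's d
                                   alpha=0.05,
                                   power=0.80,
                                   alternative='two-sided')
# For these inputs, roughly 64 participants per group are needed.
```

Analogous classes exist for paired designs (`TTestPower`) and ANOVA (`FTestAnovaPower`) in the same module.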
As a Computer Science student inexperienced in statistics, I'm looking for some advice on selecting the appropriate statistical test for my dataset.
My data, derived from brain scans, is structured into columns: subject, channels, freqbands, measures, value, and group. It involves recording multiple channels (electrodes) per patient, dividing the signal into various frequency bands (freqbands), and calculating measures like Shannon entropy for each. So each signal gets broken down to one data point. This results in 1425 data points per subject (19 channels x 5 freqbands x 15 measures), totalling around 170 subjects.
I aim to determine if there's a significant difference in values (linked to specific channel, freqband, and measure combinations) between two groups. Additionally, I'm interested in identifying any significant differences at the channel, measure or freqband level.
What would be a suitable statistical test for this scenario?
Thanks in advance for any help!
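One common approach, sketched below with invented data and a smaller layout than the one described (3 channels x 2 freqbands instead of 19 x 5 x 15), is a per-combination two-sample test followed by a multiple-comparisons correction, since running 1425 uncorrected tests would all but guarantee false positives.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Toy stand-in for the described long-format table:
# one value per subject x channel x freqband, plus a group label.
rows = []
for subj in range(40):
    group = 'A' if subj < 20 else 'B'
    for ch in range(3):
        for fb in range(2):
            # Inject a true group difference in one combination only.
            shift = 0.5 if (group == 'B' and ch == 0 and fb == 0) else 0.0
            rows.append((subj, ch, fb, group, rng.normal(shift, 1)))
df = pd.DataFrame(rows, columns=['subject', 'channel', 'freqband',
                                 'group', 'value'])

# One Mann-Whitney U test per channel x freqband combination...
pvals = []
for _, sub in df.groupby(['channel', 'freqband']):
    a = sub.loc[sub.group == 'A', 'value']
    b = sub.loc[sub.group == 'B', 'value']
    pvals.append(stats.mannwhitneyu(a, b).pvalue)

# ...then control the false discovery rate across all comparisons.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
```

A mixed-effects model with subject as a random effect is a more powerful alternative, since it uses all the data at once rather than slicing it into independent tests.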
Has anyone gone through the Wohlers Report 2023 yet? What are its pros and cons, and what are the ways to obtain an e-copy? Its subscription is very costly for an ordinary researcher (around 750 USD per user). Are there any alternatives that provide similar data?
I'm excited to speak at this FREE conference for anyone interested in statistics in clinical research. 👇🏼👇🏼
The Effective Statistician conference features a lineup of scholars and practitioners who will speak about professional & technical issues affecting statisticians in the workplace.
I'll be giving a gentle introduction to structural equation modeling! I hope to see you there.
Sign up here:
Is it correct to choose the principal components method in order to show the relationship of species with biotopes?
My answer: Yes; in order to interpret history, disincentives are the most rigorous guide. How? Due to the many assumptions of inductive logic, deductive logic is more rigorous. Throughout history, incentives are less rigorous because no entity (besides God) is completely rational and/or self-interested; thus what incentivizes an act is less rigorous than what disincentivizes the same action. And, as a heuristic, all entities (besides God) have a finite existence before their energy (eternal consciousness) goes to the afterlife (paraphrased from these sources: 1)
2) )
, thus interpretation through disincentives is more rigorous than interpreting through incentives.
Who agrees life is more about preventing tragedies than performing miracles? I welcome elaborations.
If you're using a number such as a statistic from a reference study you want to cite, should you write the number with the confidence interval? And how to effectively prevent plagiarism when dealing with numbers?
Thank you!
Are people more likely to mix up words if they are fluent in more languages? How? Why?
Hi!
This might be a bit of a stupid question, but I am currently writing my master's thesis. One of the things I am doing is a factor analysis on a scale developed in Canada. This scale has only been validated on the Canadian workforce (the developers ran one exploratory factor analysis and two confirmatory factor analyses). I am running an exploratory and a confirmatory factor analysis in the Norwegian workforce to see what factor structure I find here, and whether it is the same as in Canada. As this is only one of three things I am doing in my master's, and I have hypotheses for all the other findings, my supervisor would like me to have a hypothesis for the factor structure as well. Whenever I try to come up with arguments, I feel like I am arguing for the same attitudes in both countries rather than for the factor structure.
My question is: How do you make a hypothesis for this where you argue for the same/a different factor structure without arguing for the same/different attitudes?
Thank you in advance! :)
I came across a commentary titled 'Tests of Statistical Significance - their (ab)use in the social sciences', and it made me reflect on the validity of using my sample for statistical testing. I have a sample of 24 banks that were not randomly selected: they came from the top 50 banks ranked by The Banker, narrowed down to the 24 that were usable for my study. I wanted to test the association between these banks using McNemar's test, but any result I obtain (I obtained non-significant results) would be meaningless, right? Because they are not a random selection. I did not want to generalise, but I wanted to know whether I could still comment on the non-significance of their association.
Hello. We understand that a volcano plot is a graphical representation of differential values (proteins or genes) and requires two parameters: fold change and p-value. However, in IP-MS (immunoprecipitation-mass spectrometry) data, many proteins are identified with intensity in the IP (immunoprecipitation) group but are not detected at all in the IgG (control) group, i.e., the data are blank. This means we cannot calculate a p-value or fold change for these "present in IP, absent in IgG" proteins and therefore cannot plot them on a volcano plot. Yet in many articles these proteins are successfully plotted. How is that accomplished? Are there data-fitting methods available to assist, i.e., is imputation needed? And does the result then reflect the real degree of interaction?
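One common (and debatable) workaround is indeed imputation: replacing the missing IgG values with a small constant, or with draws from a down-shifted distribution (as in Perseus-style imputation), so that a finite fold change and p-value exist. The sketch below uses invented intensities and the simplest half-minimum variant. Note that the resulting fold changes are driven by the imputation choice rather than by measured IgG signal, so they mark "confidently enriched" proteins but should not be read as quantitative interaction strengths.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy IP-MS intensities: rows = proteins, columns = replicates.
ip  = rng.lognormal(10, 1, size=(5, 3))
igg = rng.lognormal(8, 1, size=(5, 3))
igg[0, :] = np.nan   # a "present in IP, absent in IgG" protein

# Half-minimum imputation: replace missing control values with half the
# smallest observed intensity so finite statistics can be computed.
floor = np.nanmin(np.concatenate([ip.ravel(), igg.ravel()])) / 2
igg_imputed = np.where(np.isnan(igg), floor, igg)

# Now every protein has a finite log2 fold change for the volcano plot.
log2_fc = np.log2(ip.mean(axis=1) / igg_imputed.mean(axis=1))
```

Many published volcano plots use the more elaborate down-shifted-normal imputation; either way, it is good practice to state the imputation method in the figure legend.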
We measured three aspects (i.e. variables) of self-regulation. We have 2 groups and our sample size is ~30 in each group. We anticipate that three variables will each contribute unique variance to a self-regulation composite. How do we compare if there are group differences in the structure/weighting of the composite? What analysis should be conducted?
I have a paper that proposed a hypothesis test that is heavily based on existing tests (so it is pretty much a procedure built on existing statistical tests). It was rejected by a few journals claiming that it was not innovative, although I demonstrated that it outperforms some commonly used tests.
Are there any journals that take this sort of papers?
I want to ask about the usage of parametrical and non-parametrical tests if we have an enormous sample size.
Let me describe a case for discussion:
- I have two groups of samples of a continuous variable (let's say: Pulse Pressure, so the difference between systolic and diastolic pressure at a given time), let's say from a) healthy individuals (50 subjects) and b) patients with hypertension (also 50 subjects).
- there are approx. 1000 samples of the measured variable from each subject; thus, we have 50*1000 = 50000 samples for group a) and the same for group b).
My null hypothesis is: that there is no difference in distributions of the measured variable between analysed groups.
I calculated two different approaches, providing me with a p-value:
Option A:
- I took all samples from group a) and b) (so, 50000 samples vs 50000 samples),
- I checked the normality in both groups using the Shapiro-Wilk test; both distributions were not normal
- I used the Mann-Whitney test and found significant differences between distributions (p<0.001), although the median value in group a) was 43.0 (Q1-Q3: 33.0-53.0) and in group b) 41.0 (Q1-Q3: 34.0-53.0).
Option B:
- I averaged the variable's values within each participant (giving 50 values in group a) and 50 values in group b)),
- I checked the normality in both groups using the Shapiro-Wilk test; both distributions were normal,
- I used Student's t-test and obtained a p-value of 0.914, with median values 43.1 (Q1-Q3: 33.3-54.1) in group a) and 41.8 (Q1-Q3: 35.3-53.1) in group b).
My intuition is that I should use option B and average the signal before testing. Otherwise I reject the null hypothesis despite a very small difference in medians (and large Q1-Q3 ranges), which is quite impractical: visually, the box plots look very similar and overlap each other.
What is your opinion about these two options? Are both correct but should be used depending on the hypothesis?
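To see why the two options disagree, one can simulate the design: with 1000 correlated samples per subject, pooling treats 50,000 dependent values as if they were independent, which shrinks the apparent standard error and lets tiny differences reach "significance". A toy sketch (all numbers invented); the unit of independent observation here is the subject, which favours option B.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two groups with NO true group-level difference, but strong
# within-subject correlation: each subject has their own mean,
# measured 1000 times with noise.
def simulate_group(n_subjects=50, n_samples=1000):
    subject_means = rng.normal(40, 8, n_subjects)
    return np.array([rng.normal(m, 3, n_samples) for m in subject_means])

a, b = simulate_group(), simulate_group()

# Option A: pool all samples as if independent (n = 50,000 per group).
# This p-value is anti-conservative because the samples are not independent.
p_pooled = stats.mannwhitneyu(a.ravel(), b.ravel()).pvalue

# Option B: one mean per subject (n = 50 per group), respecting the
# true number of independent observations.
p_subject = stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue
```

Repeating this simulation many times shows option A rejecting the true null far more often than 5%; a mixed-effects model with subject as a random effect is the general-purpose alternative if you do not want to average.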
Neurons were treated with four different types of drugs, and then a full transcriptome was produced. I am interested in looking at the effects of these drugs on two specific pathways, each with around 20 genes. Would it be appropriate for me to just set up a simple comparative test (like a t-test) and run it for each gene? Or should I still use a differential gene expression package like DESeq2, even though only a few genes are going to be analysed? The aim of my experiment is a very targeted analysis, with the hopes that I may be able to uncover interesting relationships by cutting out the noise (i.e., the rest of the genes that are not of interest).
During writing a review, usually published articles are collected from the popular data source like PubMed, google scholar, Scopus etc.
My questions are
1. How can we confirm that all the articles published in a certain period (e.g., 2000 to 2020) have been collected and considered in the sorting process (inclusion and exclusion criteria)?
2. When the articles are not open access, how can we minimise the challenges of understanding the data for the meta-analysis?
I would like to compare the percentage of slices showing gamma oscillations across different conditions, taking into consideration that the groups might have different sample sizes. Thanks.
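Since the outcome per slice is binary (gamma or not), this is a comparison of proportions, and unequal group sizes are handled naturally by tests on the counts themselves. A sketch with invented counts; Fisher's exact test is a safe default for small slice numbers, with the chi-square test as the large-sample alternative.

```python
from scipy import stats

# Hypothetical counts: slices showing gamma vs. not, per condition.
gamma_a, total_a = 18, 30   # condition A: 18 of 30 slices oscillate
gamma_b, total_b = 9, 24    # condition B: 9 of 24 slices oscillate

# 2x2 table: rows = conditions, columns = (gamma, no gamma).
table = [[gamma_a, total_a - gamma_a],
         [gamma_b, total_b - gamma_b]]

# Fisher's exact test handles small and unequal sample sizes directly.
odds_ratio, p = stats.fisher_exact(table)
```

If slices come from multiple animals, consider that slices within an animal are not independent; a per-animal summary or a mixed-effects logistic model would then be more defensible.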
Each sorbate is only adsorbed onto 1 sorbent at a time, in different sorption experiments. The research question is to determine which sorbate was the most sorbed and the fastest sorbed onto which sorbent. Is a test to measure the normality distribution of data needed? Thank you.
When am I supposed to use each of the following tests to check homogeneity of variance?
1) O'Brien
2) Bartlett
3) F test
4) Levene's and its variations
5) Brown-Forsythe
Can anyone help me?
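As a rough rule of thumb: Bartlett's test (and the two-sample F ratio test) are most powerful when the data are normal but very sensitive to departures from normality; Levene's test is robust to moderate non-normality; the Brown-Forsythe variant (Levene's computed around medians) is the safest for skewed or heavy-tailed data; and O'Brien's test is another robust variant offered by some packages (e.g. SAS/JMP). In SciPy, the last three of the available ones are a single function with a `center` argument, sketched here on invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0, 1, 30)   # toy group with SD 1
b = rng.normal(0, 2, 30)   # toy group with SD 2

# Bartlett's test: most powerful under normality, fragile otherwise.
stat_bartlett, p_bartlett = stats.bartlett(a, b)

# Levene's test (deviations from the mean): robust to mild non-normality.
stat_levene, p_levene = stats.levene(a, b, center='mean')

# Brown-Forsythe (deviations from the median): the most robust choice
# for skewed or heavy-tailed distributions.
stat_bf, p_bf = stats.levene(a, b, center='median')
```

A practical default is Brown-Forsythe unless you have good evidence of normality.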
I want to know whether the type of litter found differs significantly between sites. I have 7 stations with polymer IDs for the litter from each station. Can I compare two categorical variables (station, n=7; polymer type, n=11)?
Can anyone share advice on what stats to use, and any code in R?
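If both variables are categorical, the standard choice is a chi-square test of independence on the station x polymer contingency table (in R, `chisq.test` on a table built with `table()`). A Python sketch with invented counts, including the expected-count check that matters when you have 77 cells:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical 7 stations x 11 polymer types table of litter counts.
counts = rng.integers(1, 15, size=(7, 11))

# Chi-square test of independence between station and polymer type.
chi2, p, dof, expected = stats.chi2_contingency(counts)

# The chi-square approximation is unreliable when many expected counts
# fall below 5 - common with 77 cells - so check the fraction and
# consider pooling rare polymer types or using a permutation test.
low_expected = (expected < 5).mean()
```

With real litter data, rare polymers often need to be merged into an "other" category before the approximation is trustworthy.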
I am trying to analyse mutation data for endometrial cancer obtained from different studies within several databases (COSMIC, cBioportal, Intogen). I have collated the data and grouped the mutations by gene. The focus of the analysis are non-synonymous coding mutations - because these mutations are most likely to cause a change in the normal protein function.
The aim of the study is to understand the mutational landscape of Endometrial cancer. The main objectives of the study are to find the commonly mutated genes in endometrial cancer, to find significantly damaging gene mutations in endometrial cancer and to create an updated list of genes comparable to commercial gene panels.
I have created this table with the collated data:
- Gene name
- Number of samples with coding mutations
- Frequency (number of samples with coding mutations / total number of samples with coding mutations)
- CDS length
- Total number of unique coding mutations
- Number of unique coding: synonymous mutations
- Number of unique coding: non-synonymous mutations
- Mutation burden (number of unique coding: non-synonymous mutations / CDS length)
- Composite score [(frequency of samples * 0.7) + (mutation burden * 0.3)]
The idea here is to use mutation burden to imply damaging effects of the genes' mutations in endometrial cancer. We then created a composite score to use as a comparable figure between the genes.
At the moment, our list contains 16,000+ genes. We are trying to find a way to narrow the list down, by way of statistics, to only those genes that are significantly mutated compared to the others. Any advice is greatly appreciated.
First, I sent these numbers to the Bard chatbot:
6020 5603 54830 6750 23000 12370 10000
It responded with this:
You have listed 7 numbers.
- 6020
- 5603
- 54830
- 6750
- 23000
- 12370
- 10000
The sum of these numbers is 103573.
The average of these numbers is 14825.
The median of these numbers is 6750.
The mode of these numbers is 10000, which appears twice.
The range of these numbers is 23000 - 10000 = 13000.
The standard deviation of these numbers is 3844.74.
Is there anything else I can help you with?
Then I asked it to standardize them, and the response was this:
Sure. Standardizing the numbers will mean subtracting the mean and then dividing by the standard deviation. This will transform the numbers so that they have a mean of 0 and a standard deviation of 1.
The standardized numbers are:
- -2.18
- -2.88
- 2.44
- -1.38
- 3.22
- -0.78
- 0.00
----------------------------------------
Bing Chat responded to the same question with this:
I can standardize the numbers you provided. Standardization is the process of transforming data to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean from each data point and then dividing by the standard deviation.
Here are the standardized values for the numbers you provided:
- 6020: -0.705
- 5603: -0.801
- 54830: 2.891
- 6750: -0.524
- 23000: 0.693
- 12370: -0.191
- 10000: -0.363
Is there anything else I can help with? 😊
------------------------------------------
And Perplexity's response was this:
Based on the search results, it is possible to standardize the numbers provided by converting them into z-scores. Standardizing data is a common practice in statistics to compare variables that have different scales or units of measurement. To standardize the numbers provided, one would need to calculate the mean and standard deviation of the data and then transform each value to its corresponding z-score
Here are the steps to standardize the numbers:
- Calculate the mean of the data: Mean = (6020 + 5603 + 54830 + 6750 + 23000 + 12370 + 10000) / 7 = 15038.43
- Calculate the standard deviation of the data: Standard deviation = sqrt(((6020 - 15038.43)^2 + (5603 - 15038.43)^2 + (54830 - 15038.43)^2 + (6750 - 15038.43)^2 + (23000 - 15038.43)^2 + (12370 - 15038.43)^2 + (10000 - 15038.43)^2) / 6) = 17523.22
- Calculate the z-score for each value: Z-score = (X - Mean) / Standard deviationZ-score for 6020 = (6020 - 15038.43) / 17523.22 = -0.706 Z-score for 5603 = (5603 - 15038.43) / 17523.22 = -0.768 Z-score for 54830 = (54830 - 15038.43) / 17523.22 = 2.238 Z-score for 6750 = (6750 - 15038.43) / 17523.22 = -0.426 Z-score for 23000 = (23000 - 15038.43) / 17523.22 = 0.455 Z-score for 12370 = (12370 - 15038.43) / 17523.22 = -0.152 Z-score for 10000 = (10000 - 15038.43) / 17523.22 = -0.853
By standardizing the numbers, we have transformed them into a common scale that allows for easier comparison and analysis. It is important to note that the interpretation of z-scores depends on the context and the distribution of the data.
-----------------------------------------------------
So, as you can see, they can't even get the sum of the data correct. My question is: why can't these models get the sum of these numbers right, even when they apply all the correct steps?
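For reference, the correct values can be computed in a few lines: the true sum is 118,573 (not Bard's 103,573), which then propagates into every downstream statistic; the median of the sorted values is 10,000 (not 6,750); and there is no mode, since no value repeats. As to the "why": these chatbots generate digits token by token from learned patterns rather than executing arithmetic, so the described procedure can be right while the numbers are wrong (which is also why newer systems hand such requests off to a code interpreter).

```python
import statistics

data = [6020, 5603, 54830, 6750, 23000, 12370, 10000]

total = sum(data)                    # 118573, not the 103573 Bard reported
mean = total / len(data)             # 16939.0
median = statistics.median(data)     # 10000 (middle of the sorted values)
# No mode: every value appears exactly once.

stdev = statistics.stdev(data)       # sample standard deviation (n - 1)
z_scores = [(x - mean) / stdev for x in data]   # standardized values
```

None of the three chatbots' z-score sets matches these, because each started from a wrong mean and/or standard deviation.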
I'm expecting to use stock prices from the pre-COVID period up to now to build a model for stock price prediction, but I have doubts about which periods to include in my training and test sets. Should I use the pre-COVID period as the training set and the COVID period as the test set? Or should I include the pre-COVID period plus part of the current period in the training set, and keep the rest of the current period as the test set?
There are 2 groups: one has a mean of 305.15 and a standard deviation of 241.83, while the second has a mean of 198.1 and a standard deviation of 98.1. Given the large standard deviation of the first group, I expected its mean not to differ significantly from the second group's, but the independent-samples t-test was significant, which doesn't make sense to me. Is there another test I can use to analyse these (quantitative) data?
The data are monthly averages of solid waste generation, and I am comparing March and April with the other months. The sample sizes are also unequal, e.g., fewer days in February than in December.
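As a side note, a large SD does not by itself rule out significance: what drives the t statistic is the standard error, SD/sqrt(n), so a 107-unit mean difference can easily be significant at these SDs and sample sizes. If the concern is the unequal variances, Welch's t-test is the usual remedy; a rank-based test is an alternative when normality is doubtful. A sketch with invented daily values matching the reported summary statistics:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical daily waste values matching the reported summaries.
group1 = rng.normal(305.15, 241.83, 31)   # e.g. a 31-day month
group2 = rng.normal(198.10, 98.10, 28)    # e.g. a 28-day month

# Welch's t-test (equal_var=False) does not assume equal variances,
# so the much larger SD of group 1 is handled correctly.
t, p = stats.ttest_ind(group1, group2, equal_var=False)

# Rank-based alternative if the data are skewed (waste data often are):
u, p_mw = stats.mannwhitneyu(group1, group2)
```

If the t-test remains significant under Welch's correction, the result is likely genuine rather than an artefact of the variance difference.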
Hi everyone. I have searched all around the internet and literature for the answer to this question but haven’t been able to find any info regarding my specific situation.
I have multiple experiments consisting of qPCR data but can't figure out how best to analyse them. I have WT and KO cells, to which I apply 3 treatments plus a no-treatment control for both genotypes, and I check 15 genes. What I really want to show is whether the genes are up- or downregulated by a treatment in KO vs WT, so I want to make my comparison between the genotypes. But I can't compare them directly, because at baseline they already have quite different expression levels, so I want to take each genotype's own control into consideration.

Before, I plotted -dCt (normalised to the housekeeping gene only) and compared each treatment in each genotype to its own control. My group didn't like this, which I understand, because the graphs are cluttered and don't show the comparison I'm really trying to make. I then worked with a bioinformatician on my idea: normalise the dCt for each genotype/treatment to its own control to obtain a ddCt, and compare the -ddCt between genotypes for each treatment using an unpaired t-test (I don't use fold change). These graphs are much nicer to look at, but my supervisor says it doesn't make statistical sense this way and wants to keep the graphs the original way.
can anyone help me out? What is the best way to analyse and graph my data?
Hi!
My project includes 2 cell lines (developed from a stock culture) then each cell line was subcultured into 18 flasks. These 18 flasks are then separated into 3 temperatures and incubated for 6 time points. At each time point, I would analyze 2 flasks (one from each cell line), and this flask is discarded after.
Right now, we have the data analyzed with a mixed ANOVA model (time vs temperature as within factors) and tukey (to determine any differences between temperature, time point, and cell line). I was wondering if this is correct? Or should we do it differently because each time point uses a different flask compared to the next (meaning different cells from the stock culture).
thanks!
For some smaller and less-known "statistics", no option to calculate the error or confidence intervals is provided; however, these might be obtained by bootstrapping. In addition, both McElreath and Kruschke have used grid approximation as an example in their well-written books. Although given as an example, I have never seen it used in practice; while it is understandably difficult for higher-dimensional problems, for estimating single, less critical parameters it might be appropriate.
Consider that we apply a multivariate analysis and calculate "R" from ANOSIM (see the R package vegan, vegan::anosim, and https://sites.google.com/site/mb3gustame/hypothesis-tests/anosim). I am not interested in testing whether the data are compatible with R = 0 (non-"random" versus biased, which might be one and the same thing). I want to make a statement about the meaningfulness of the posterior of R by adjusting the likelihood (which is all we do).
ANOSIM's "R" does not come with an error estimate; however, we can bootstrap the data and estimate the alpha and beta parameters of a beta distribution under the assumption that this "R" is a random variable. Knowing alpha and beta, we could use this as the likelihood, introduce a prior, and obtain the posterior estimate (see the figure and example). I am just not aware of whether this is a reasonable approach, as there is not much documentation on grid approximation.
Thank you in advance!
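For what it's worth, here is a minimal sketch of the described pipeline under stated assumptions: the bootstrapped R values are assumed to lie in (0, 1) (ANOSIM's R can be negative, so it would need rescaling first), the beta parameters are fitted by the method of moments, and the Beta(2, 2) prior is an arbitrary placeholder.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Stand-in for bootstrapped ANOSIM R values, assumed rescaled into (0, 1).
boot_R = rng.beta(8, 4, size=500)

# Method-of-moments fit of the Beta likelihood parameters.
m, v = boot_R.mean(), boot_R.var()
common = m * (1 - m) / v - 1
alpha_hat, beta_hat = m * common, (1 - m) * common

# Grid approximation: posterior is prior * likelihood, renormalised
# over a fine grid spanning (0, 1).
grid = np.linspace(0.001, 0.999, 999)
dx = grid[1] - grid[0]
prior = stats.beta.pdf(grid, 2, 2)        # placeholder weakly informative prior
likelihood = stats.beta.pdf(grid, alpha_hat, beta_hat)
posterior = prior * likelihood
posterior /= posterior.sum() * dx         # normalise to integrate to 1

posterior_mean = (grid * posterior).sum() * dx
```

Conceptually this treats the bootstrap distribution as the likelihood of R, which is the debatable step; it is a pragmatic approximation rather than a fully principled Bayesian model of the raw dissimilarities.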
Hello everyone! I am currently doing moderation/mediation analyses with Hayes Process.
As you can see, model 3 is significant, with R² = .48.
The independent variables have no significant direct effect on the dependent variable, but there are significant interaction effects. The curious thing is that toptMEAN does not correlate with any of the variables, yet it still fits into the regression model. Should I take this as confirmation that toptMEAN has an effect on the dependent variable even though it does not correlate? Or am I missing something in the interpretation of these results?
(Maybe you could also suggest a different model for me to run; model 3 is currently the one with the highest R² I have found.)
I have 23,500 points. I sorted them in Excel from lowest to highest and plotted them as a scatter chart. Now I want to find the point after which the chart's slope becomes very steep (approaching 90 degrees), in other words, the point where the curve starts growing much faster.
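One simple, widely used heuristic (the idea behind the "Kneedle" algorithm) is to find the point on the sorted curve with the maximum perpendicular distance from the straight line joining its first and last points. A sketch with synthetic data standing in for the 23,500 sorted values:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in: slow growth, then a sharp rise (like a sorted curve
# with an elbow). The real data would be the 23,500 sorted values.
y = np.sort(np.concatenate([rng.uniform(0, 10, 2000),
                            rng.uniform(50, 500, 300)]))
x = np.arange(len(y))

# Chord from the first to the last point of the curve.
x1, y1, x2, y2 = x[0], y[0], x[-1], y[-1]

# Perpendicular distance from each point to that chord; the elbow is
# where this distance is largest.
dist = np.abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) \
       / np.hypot(y2 - y1, x2 - x1)
knee_index = int(np.argmax(dist))
```

Because the data are already sorted, `knee_index` directly marks the value after which the curve turns steep; smoothing `y` first makes the result more stable against noise.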
What steps should be taken in determining whether there is causal relationship between two variables?
Hello there, I would like to correlate the mRNA expression of a gene obtained from patient blood samples with surrogate parameters like HbA1c and data like body weight, age, etc. However, I am unsure and confused about the correct way to do it. I did not find a solution on the web so far.
Which qPCR data should I use (dCt, ddCt, 2^-ddCt)?
Is it favourable to use log-transformed data, or does that distort the results?
Is it statistically correct to simply use an xy-graph and do a correlation analysis?
Regards!
I'm working on my PhD thesis and I'm stuck around expected analysis.
I'll briefly explain the context then write the question.
I'm studying moral judgment in the cross-context between Moral Foundations Theory and Dual Process theory.
Simplified: MFT states that moral judgments are almost always intuitive, while DPT states that better reasoners (those higher on cognitive-capability measures) make moral judgments through analytic processes.
I have another idea: people make moral judgments intuitively only for their primary moral values (e.g., for conservatives, the binding foundations - respecting authority, ingroup loyalty, and purity), while for values they are not much concerned about, they have to use analytical processes to figure out what judgment to make.
To test this idea, I'm giving participants:
- a few moral vignettes to judge (one concerning progressive values and one concerning conservative values) on 1-7 scale (7 meaning completely morally wrong)
- moral foundations questionnaire (measuring 5 aspects of moral values)
- CTSQ (Comprehensive Thinking Styles Questionnaire), CRT and belief bias tasks (8 syllogisms)
My hypothesis is therefore that cognitive measures of intuition (such as intuition preference from CTSQ) will predict moral judgment only in the situations where it concerns primary moral values.
My study design is correlational. All participants are answering all of the questions and vignettes. So I'm not quite sure how to analyse the findings to test the hypothesis.
I was advised to run a regression analysis in which moral values (the 5 from the MFQ) or the moral judgments from the two vignettes are predictors, and the intuition measure is the dependent variable.
My concern is that this analysis is the wrong choice, because the sample will contain both progressives and conservatives, which means both groups of values should predict intuition if my assumption is correct.
I think I need to either split people into groups based on their MFQ scores and then run this analysis, or introduce some kind of multi-step analysis or control, but I don't know what the right approach would be.
If anyone has any ideas please help me out.
How would you test the given hypothesis with available variables?
In many biostatistics books, the negative sign is ignored in the calculated t value.
In a left-tailed t test, we include a minus sign in the critical value.
eg.
result of paired t test left tailed
calculated t value = -2.57
critical value = -1.833 (df = 9; level of significance 5%) (minus sign included since it is a left-tailed test)
now, we can accept or reject the null hypothesis.
if we keep the negative sign, we must also compare against the negative critical value: -2.57 < -1.833, so the null hypothesis is rejected.
if we ignore the signs, we compare |t| = 2.57 with 1.833: since 2.57 > 1.833, the null hypothesis is again rejected. Applied consistently, both conventions give the same conclusion.
Hi!
I am performing PCR quantification of 5 inflammatory markers on 180 samples. As you can imagine, I therefore work on several 384-well plates. To compare them, I introduced a duplicate amplification control per primer for each plate to check if my amplification is indeed the same from one run to another.
Now that I have run all my plates, I would like to apply a statistical test to verify that the experiment is comparable from one plate to another. What statistical test should I use?
I'll let the stats pros answer! :)
Hello, I plan to use the I-distance statistical method to rank some countries based on a set of indicators. I found many papers that used this method, but I could not find any work showing exactly how the method is applied to the data and variables, e.g., calculating the partial correlation coefficient between two variables. Can anybody please suggest a good resource for this?
Hello,
I am currently analyzing data from a study and am running into some issues. I have two independent variables (low vs high intensity & protein vs no protein intervention) and 5 dependent variables that I measured on two separate occasions (pre intervention and post intervention). So technically I have 4 groups a) low intensity, no protein b) low intensity, protein c) high intensity, no protein and d) high intensity, protein.
Originally I was going to do a two-way MANOVA as I wanted to know the interaction between the two independent variables on multiple dependent variables however I forgot about the fact I have two measurements of each of the dependent variables and want to include how they changed over time.
I can't seem to find a test that incorporates all these factors; it seems I would need a three-way MANOVA, but I can't find anything on that. So I am thinking of either a) calculating the difference in the dependent variables between the two time points and using that for the MANOVA, or b) running the MANOVA on the post-test measurements and then a separate test for how each dependent variable changed over time. Is this the right line of thinking, or am I missing something? When researching this I kept finding doubly multivariate analysis for repeated observations, but that seems to allow only time plus one other independent variable, not two.
Any guidance or feedback would be greatly appreciated :)
Dear ResearchGate community,
I have a statistical question which has given me a lot of headache (statistics usually do, but this is worse!). I have designed a survey in which participants read sentences, evaluate whether the sentences are meaningful and then select a response among 3 possible interpretations. I have 3 types of sentences (let's say language 1, language 2, language 3) and for each language there are 8 sentences. In total, the participants read 24 sentences (3x8).
What I'm interested in is accuracy. Say a participant has accepted all 8 sentences in Language 1, but selected the correct meaning for only 5 of them: the accuracy is then 62.5%. In Language 2, on the other hand, the participant accepted only 5 sentences and found the correct meaning of only 3 of those. This means that my denominator (my "100%") keeps changing.
Does anybody know how I can calculate mean accuracy with these kinds of numbers? The goal is to examine accuracy by language (1, 2, or 3). I have a feeling it has to do with ratios, but I'm quite lost at the moment!
Hey there!
I have data that includes the following arms:
- Healthy individuals who did not receive the drug
- Ill individuals who did not receive the drug
- Ill individuals who did receive the drug
- Healthy individuals who did receive the drug
I want to perform a meta-analysis and show all my results for one outcome in a single forest plot. My effect size is SMD.
Can you tell me if it's possible and how to do it?
And, by the way, does Stata have my back? If yes, what command should I use?
Has anyone conducted a meta-analysis with Comprehensive Meta-Analysis (CMA) software?
I have selected: comparison of two groups > means > Continuous (means) > unmatched groups (pre-post data) > means, SD pre and post, N in each group, Pre/post corr > finish
However, it is asking for pre/post correlations which none of my studies report. Is there a way to calculate this manually or estimate it somehow?
Thanks!
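A common workaround, recommended for instance in the Cochrane Handbook, is to assume a plausible pre/post correlation (often r = 0.5), run a sensitivity analysis over a range of values, or back-calculate r from any study that does report the SD of the change score. The algebra is just the variance formula for a difference of correlated variables, sketched here:

```python
import math

def sd_change(sd_pre, sd_post, r):
    """SD of the pre-post change score implied by correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)

def r_from_sd_change(sd_pre, sd_post, sd_diff):
    """Back-calculate r from a study that reports the change-score SD."""
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2 * sd_pre * sd_post)

# Sensitivity analysis: recompute the change-score SD under a range of
# assumed correlations and check whether conclusions are stable.
sds = {r: sd_change(10.0, 12.0, r) for r in (0.3, 0.5, 0.7)}
```

Whichever value you assume, report it explicitly and show that the pooled result is not sensitive to the choice.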
I have been assigned the task of performing business sales forecasting using time series analysis. However, before starting the forecasting process, I need to identify and treat the outliers in the dataset.
To achieve this, I have decided to use Seasonal Trend Decomposition (STL) with LOESS.
I would appreciate your assistance in implementing this technique using Python or R programming language.
Hello dear colleagues!
I want to present my results for water vapour transmission rate. I have 5 samples and worked in triplicate. Do I apply the formula to each replicate (1a, 1b, 1c, 2a, 2b, 2c, etc.) and then calculate the mean and SD, or do I calculate the mean value for each sample (1, 2, 3, etc.) first and then apply the formula?
the formula used is (initial weight - final weight) / (area × 24)
So should I use (initial weight of sample 1a - final weight of sample 1a) / (area × 24) and calculate the mean and SD for sample 1,
or calculate the mean and SD of samples 1a, 1b, 1c and apply the formula as (mean initial weight of sample 1 - mean final weight of sample 1) / (area × 24)?
I want to present my results as the number given by the formula ± SD.
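The usual convention is the first option: apply the formula to each replicate (1a, 1b, 1c), then report the mean ± SD of those three WVTR values, since the replicate-level values are what carry the measurement variability (averaging the weights first would hide it). A sketch with hypothetical weights and area:

```python
import statistics

AREA = 0.005  # exposed film area in m^2 (hypothetical)

def wvtr(initial_g, final_g, area_m2=AREA):
    """Water vapour transmission rate: (initial - final) / (area * 24 h)."""
    return (initial_g - final_g) / (area_m2 * 24)

# Sample 1, triplicates a/b/c (hypothetical weights in g)
reps = [wvtr(10.00, 9.52), wvtr(10.05, 9.55), wvtr(9.98, 9.51)]
mean_wvtr = statistics.mean(reps)
sd_wvtr = statistics.stdev(reps)
print(f"sample 1: {mean_wvtr:.2f} ± {sd_wvtr:.2f}")
```

Repeating this per sample gives one mean ± SD for each of the 5 samples.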
For context, the study I am running is a between-participants vignette experimental research design.
My variables include:
1 moderator variable: social dominance orientation (SDO)
1 IV: target (Muslim woman = 0, woman = 1) <-- these represent the vignette 'targets' and 2 experimental conditions, dummy-coded in SPSS as written here
1 DV: bystander helping intentions
I ran a moderation analysis with Hayes PROCESS macro plug-in on SPSS, using model 1.
As you can see in my moderation output (first image), I have a significant interaction effect. Am I correct in saying that there is no direct interpretation of the b value for the interaction effect (hence we run simple slope analyses)? So all it tells us is that SDO significantly moderates the relationship between the target and bystander helping intentions.
Moving onto the conditional effects output (second image) - I'm wondering which value tells us information about X (my dichotomous IV) in the interaction, and how a dichotomous variable should be interpreted?
So if there was a significant effect for high SDO per se...
How would the IV be interpreted?
" At high SDO levels, the vignette target ___ led to lesser bystander helping intentions; b = -.20,t (88) = -1.65, p = .04. "
(Note: even though my simple slope analyses showed no significant effect for high SDO, I want to be clear on how my IV should be interpreted as it is relevant for the discussion section of the lab report I am writing!)
Refer File:
The table provides a snapshot of the literacy rates of a selection of countries across the globe, based on 2011-2021 census data collected by the UNESCO Institute for Statistics. Literacy rate refers to the percentage of people aged 15 years and above who can read and write. The table includes 18 countries, with literacy rates ranging from a low of 43.0% in Afghanistan to a high of 99.7% in Russia. Some of the world's most populous countries, such as China, India, and Nigeria, have literacy rates below 80%. On the other hand, many developed nations, such as Canada, France, Germany, Japan, the United Kingdom, and the United States, have literacy rates above 98%. The data can be used to gain insight into global education levels and to compare literacy rates across countries.
Table excerpt (first row): Country | Total literacy rate; Afghanistan | 43.0%
Any Additions/Questions/Results/Ideas Etc?
Propensity score matching (PSM) and endogenous switching regression (ESR) by full information maximum likelihood (FIML) are the most commonly applied models in impact evaluation when there are no baseline data. Sometimes the results from these two methods differ. In such cases, which one should be trusted more, given that both models have their own drawbacks?
Hi all,
I have categorical data from 2 different experimental conditions (see the stacked bar graphs as an example). I can use a chi-square test for association to see if there is a statistically significant difference between the frequencies of the categories in these two datasets. However, this does not tell me whether the change in a particular category is significant (i.e., is the decrease in the 'red' category from 37% to 26% significant?). I believe I need a post hoc test for such a pairwise comparison, but I couldn't figure out which post hoc test can be used. I have the percentage values and the actual sample sizes for each category.
Any suggestion is greatly appreciated.
Thanks!
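One common follow-up is a two-proportion z-test per category, with a Bonferroni (or similar) correction for the number of categories tested; examining adjusted standardized residuals is another option. A sketch with hypothetical counts (the 37% vs 26% example, assuming n = 100 per condition):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for the 'red' category: 37/100 in condition A, 26/100 in B
red_counts = [37, 26]
n_obs = [100, 100]

z, p = proportions_ztest(red_counts, n_obs)
n_categories = 4                           # one pairwise test per category
p_adjusted = min(1.0, p * n_categories)    # Bonferroni correction
print(z, p, p_adjusted)
```

With these made-up counts the red-category decrease alone is not significant, illustrating how an overall chi-square association can be driven by several categories jointly.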
I have 3 groups:
- Intervention 1 (n=6)
- Intervention 2 (n=9)
- Treatment as usual (n=12)
And I have 2 time points (pre and post intervention)
What kind of test do you recommend? Thank you!
Given 𝑛 independent Normally distributed random variables 𝑋ᵢ ∼ 𝑁(𝜇ᵢ,𝜎²ᵢ) and 𝑛 real constants 𝑎ᵢ∈ℝ, I need to find an acceptable Normal approximation of the distribution of 𝑌 random variable (assuming Pr[𝑋ᵢ≤0]≈0, to avoid divisions by zero)
Y = ∑aᵢXᵢ / ∑Xᵢ
I thought to split 𝑌 into single components
Y = a₁X₁ / ∑Xᵢ + a₂X₂ / ∑Xᵢ + ... + aₙXₙ / ∑Xᵢ
Y = a₁Y₁ + a₂Y₂ + ... + aₙYₙ
where the distribution of each 𝑌ᵢ can be found noting that
Yᵢ = Xᵢ / (Xᵢ + ∑Xⱼ) , for j≠i
and that
1/Yᵢ = (Xᵢ + ∑Xⱼ) / Xᵢ = 1 + ∑Xⱼ / Xᵢ
so, calling ∑Xⱼ = Uᵢ, we can say that Xᵢ and Uᵢ are independent and, according to Díaz-Francés et al. (2012), a Normal approximation of 1/Yᵢ is the one in figure 1; treating the constant 1 as N(1,0), the r.v. Yᵢ can be approximated as in figure 2. Thus, the approximation of each aᵢYᵢ is the one in figure 3.
But now... I'm stuck at the sum of aᵢYᵢ because, not being independent, I don't know how to approximate the variance of their sum.
Any advice? Any more straightforward or more efficient method?
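One pragmatic route is a first-order delta-method (Taylor) expansion of Y around the means, which handles the dependence between the aᵢYᵢ terms automatically, paired with a Monte Carlo check of its accuracy. A sketch with hypothetical μᵢ, σᵢ, aᵢ (reasonable when the coefficients of variation σᵢ/μᵢ are small, which also keeps Pr[Xᵢ ≤ 0] ≈ 0):

```python
import numpy as np

# Hypothetical parameters (assumes small coefficients of variation sigma/mu)
mu = np.array([10.0, 15.0, 20.0])
sigma = np.array([1.0, 1.5, 2.0])
a = np.array([0.2, 0.5, 0.9])

# First-order delta method for Y = sum(a_i X_i) / sum(X_i), expanded at X = mu
S = mu.sum()
mean_approx = (a * mu).sum() / S
grad = (a * S - (a * mu).sum()) / S**2        # dY/dX_i evaluated at the means
var_approx = (grad**2 * sigma**2).sum()       # X_i independent

# Monte Carlo check of the N(mean_approx, var_approx) approximation
rng = np.random.default_rng(42)
X = rng.normal(mu, sigma, size=(200_000, 3))
Y = (X * a).sum(axis=1) / X.sum(axis=1)
print(mean_approx, Y.mean())
print(var_approx**0.5, Y.std())
```

The gradient already encodes the negative covariance between each Yᵢ and the others (the terms aᵢS - ∑aⱼμⱼ), which is the part that is hard to bolt onto the componentwise approximation afterwards.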
I have a data set of peak areas from gas chromatography I would like to run on a PLSR model. Generally, for PLSR I would center and scale the data, is that appropriate here?
As the peak areas differ between compounds by around two orders of magnitude, running the model on unscaled data is unfeasible.
Is it standard to scale these peak areas? Is there a scaling method that will reduce overfitting the model and avoid introducing extra noise?
Hi,
I have data from a study that included 3 centers. I will conduct a multiple regression (10 IVs, 1 non-normally distributed DV) but I am unsure how to handle the variable "center" in these regressions. Should I:
1) Include "center" as one predictor along with the other 10 IVs.
2) Utilize multilevel regression
Thanks in advance for any input
Kind regards
Hey guys,
So I am planning an experiment where I will check for reactions between two different compounds with distinct UV spectral curves. I was wondering how I might go about doing stats on the results?
My instinct is to just run t-tests on the peak values over time, but that seems extremely crude. I'm sure there must be better ways of doing things. What do you think? Does anyone have any experience or suggestions with this?
Hello, I'm currently analyzing antibody data with repeated measures ANOVA and have run into problems. I built a model with age, gender, vaccine background, and around 7 genetic polymorphisms. When I run multiple comparisons afterward, JMP gives the error "All pairwise comparisons mean-mean scatterplot cannot be shown because confidence intervals cannot be computed." I don't know how to solve this. Does anybody know what the problem is? Someone suggested that the degrees of freedom are running out, but I do not understand what that means. I appreciate your help!
Dear Statisticians
We are a team of engineers working on an assistive device for stroke patients. We designed a questionnaire to ensure that the technology will meet patients' needs. I have a question regarding the sample size for our survey, and I hope you can help me with this. I used the following formula, with the number of stroke patients in the UK as the population size:
https://lnkd.in/evJbP7R5
However, we also want to know how different stages of the disease affect the answers. Thus, we have 3 subgroups (early subacute, late subacute, chronic) and we want to know how each group answers our questions. Please note that the population sizes of the subgroups are not the same (e.g. early subacute 1,000; late subacute 10,000; chronic 1,000,000). Could you please help me calculate the sample size in this case?
Many thanks for your help
Ghazal
I have data from our experimental model, in which we analyze the immune response following BCG vaccination, and then the responses and clinical outcome following Mtb infection of our vaccinated models. Because we cannot experimentally follow the very same entity from the post-vaccination response through the post-vaccination-plus-post-infection studies, we have such data from different batches. Is it possible to calculate a correlation between post-vaccination responses of 5 replicates in one batch (across different vaccine candidates) and 4-5 replicates with vaccination and infection from another batch? I ask this because we are not following up the same replicates for post-vaccination and post-infection measurements (as it is not experimentally feasible). If correlation is not the best method, are there other ways to analyze the patterns, such as the strength of association between the T cell response in BCG-vaccinated models and increased survival of BCG-vaccinated models (where the two measurements come from different batches)? We have several groups like that, with a variety of parameters measured per group in different sets of experiments.
Thanks for your responses and help.
Dear all,
I want to calculate the effect of treatments (qualitative) on quantitative variables (e.g. plant growth, % infestation by nematodes, ...) compared to a control in an experimental setup. For the control and for each treatment, I have n=5.
Rather than comparing means between each treatments, including the control, I would like to to see whether each treatment has a positive/negative effect on my variable compared to the control.
For that purpose, I wanted to calculate the log Response Ratio (lnRR) that would show the actual effect of my treatment.
1) Method for the calculation of the LnRR
a) During my thesis, I calculated the mean of the control values (Xc, out of n=5) and then compared it to each of the values of my treatments (Xti). Thereby, I ended up with 5 lnRR values (ln(Xt1/Xc); ln(Xt2/Xc); ln(Xt3/Xc); ln(Xt4/Xc); ln(Xt5/Xc)) for each treatment, calculated the mean of those lnRR values (n=5), and ran the following stats: i) comparison of the mean effects between all my treatments ("does one treatment have a larger effect than another?") and ii) comparison of the effect to 0 ("is the effect significantly positive/negative?")
==> Does this method seem correct to you?
b) While searching the literature, most studies consider data from original studies and calculate lnRR from mean values within the studies. Hence, they end up with n>30. This is not our case here, as data are from 1 experimental setup...
I also found this: "we followed the methods of Hedges et al. to evaluate the responses of gas exchange and water status to drought. A response ratio (lnRR, the natural log of the ratio of the mean value of a variable of interest in the drought treatment to that in the control) was used to represent the magnitude of the effects of drought as follows:
LnRR = ln(Xe/Xc) = lnXe - lnXc,
where Xe and Xc are the response values of each individual observation in the treatment and control, respectively."
==> This is confusing to me because the authors say that they use mean values of treatment / mean values of control, but in their calculation they use "individual observations". Are the individual observations means within each study?
==> Can you confirm that I CANNOT compare each observation of replicate 1 of control with replicate 1 of treatment; then replicate 2 of control with replicate 2 of treatment and so on? (i.e. ln(Xt1/Xc1); ln(Xt2/Xc2); ln(Xt3/Xc3); ln(Xt4/Xc4); ln(Xt5/Xc5)). This sounds wrong to me as each replicate is independent.
2) Statistical use of LnRR
Taking my example in 1a), I did a t-test comparing the mean lnRR value with 0.
However, n<30, so it would probably be best not to use a parametric test:
=> any advice on that?
=> Would you stick with a comparison of means from raw values, without trying to calculate the lnRR to justify an effect?
Thank you very much for your advice on the use of LnRR within single experimental studies.
Best wishes,
Julia
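On the lnRR question above: for what it's worth, the small-n concern bears on the normality of the lnRR values rather than on an n>30 rule, so reporting both a t-test and a rank-based fallback is a reasonable compromise. A sketch of method 1a with hypothetical values for one treatment:

```python
import numpy as np
from scipy import stats

control = np.array([12.1, 10.8, 11.5, 13.0, 12.4])    # control replicates (hypothetical)
treatment = np.array([14.2, 15.1, 13.8, 14.9, 13.5])  # one treatment, n = 5

# Method 1a: one lnRR per treatment replicate, against the control mean
lnrr = np.log(treatment / control.mean())

t_stat, p_t = stats.ttest_1samp(lnrr, 0.0)   # parametric: mean lnRR vs 0
w_stat, p_w = stats.wilcoxon(lnrr)           # nonparametric fallback for small n
print(lnrr.mean(), p_t, p_w)
```

Note that with n = 5 the exact two-sided Wilcoxon p-value can never go below 0.0625, which is one concrete cost of the nonparametric route at this sample size.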
I have a vector based on a signal for which I need to calculate the log-likelihood and maximize it using maximum likelihood estimation. Is there any way to do this in MATLAB using the built-in function mle()?
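MATLAB's mle() does accept a custom (negative) log-likelihood via its name-value arguments (see the mle documentation for 'nloglf'). I can only sketch the equivalent pattern here in Python with scipy, assuming for illustration that the signal is modelled as i.i.d. Normal:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
signal = rng.normal(3.0, 1.5, size=500)   # placeholder for the real signal vector

def neg_log_likelihood(params, data):
    mu, log_sigma = params                # optimise log(sigma) so sigma stays > 0
    return -norm.logpdf(data, mu, np.exp(log_sigma)).sum()

# Minimising the negative log-likelihood == maximising the likelihood
res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(signal,))
mu_hat, sigma_hat = res.x[0], float(np.exp(res.x[1]))
print(mu_hat, sigma_hat)
```

The same structure carries over to MATLAB: write the negative log-likelihood as a function handle and hand it to mle() (or fminsearch/fminunc directly).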
Hi,
There is an article for which I would like to know which statistical method was used: regression or Pearson correlation.
However, the authors don't say which one. They report the correlation coefficient and standard error.
Based on these two parameters, can I tell whether they used regression or Pearson correlation?
I am examining whether the sex and religion of a defendant may impact their perceived guilt, risk, possibility of rehabilitation, and the harshness of sentencing. I have done this by creating 4 different case studies in which a defendant of differing sex and religion was suspected of committing a crime. There were 200 participants, and 50 each were given one of the 4 case studies. Participants then had to answer a number of questions about the case study, such as "What sentence do you think is fair?" All data is ordinal. I've been advised to use different statistical analyses, so I'm confused and would like some advice on which one to use.
Example Scenario: 1 categorical variable (Yes/No), 3 continuous dependent variables
- 3 independent sample's t tests are conducted (testing for group differences on three variables); let's assume 2 of the 3 variables are significant with medium-large effect sizes
- a binomial logistic regression is conducted for the significant predictors (for classification purposes and predictor strength via conversion of unstandardized beta weights to standardized weights)
Since 4 tests are planned, the alpha would be set at .0125 (.05/4) via the Bonferroni correction. Should the adjusted alpha also be applied to the p-values for the Wald tests in the "Variables in the Equation" output?
Thank you in advance!
For my master's thesis I have conducted visitor surveys at two different immersive experiences. To analyze the extent to which visitors feel immersed, I used three different conceptualizations of immersion (narrative transportation scale (6 items), short immersion scale (3 items), and self-location scale (5 items)). The items are all on a 5-point Likert scale.
What is the best way to compare the 2 case studies on the 3 separate scales, so that I can draw conclusions about how immersed visitors of case study 1 feel versus how immersed visitors feel in case study 2?
So that it looks like this:
- narrative transportation: case 1 mean versus case 2 mean
- immersion: case 1 mean versus case 2 mean
- self-location: case 1 mean versus 2 mean
Is comparing means even the way to go about this?
I've looked at:
- a simple independent-samples t-test, but this is not advised for Likert scales and my data is non-normally distributed, which is typical for Likert data
- the Mann-Whitney U test, but my data is around 70 respondents per group
But I've also heard that comparing scales on individual questions is fine, but not on groups of questions, like with my subscales of immersion.
At this point, I'm at a loss as to how to tackle this, so any suggestion or comment is much appreciated...
Hello,
I'm running a one-way ANCOVA to compare a ratio variable between two groups and adjust for some confounding variables. The software I use (SPSS) reports partial eta squared as effect size statistic. However, I would like to know if it's valid to calculate Cohen's d for this analysis.
I first thought to use the least squares means, standard error of the mean (SEM), and n reported for the estimated marginal means from the ANCOVA analysis: first, multiply SEM * SQRT(n) to get each group's standard deviation (SD), then calculate the pooled SD to compute Cohen's d.
However, I also found that Fritz et al. reported an equation to calculate Cohen's d from eta squared as d = 2 * SQRT(eta squared) / SQRT(1 - eta squared).
Would Cohen's d calculated from the procedures above be valid for ANCOVA? Or should I report eta/partial eta squared?
Thank you in advance.
Alejandro.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. https://doi.org/10.1037/a0024338
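Both routes can be sketched numerically. One caveat: with covariates in the model, a d built from partial eta squared or from adjusted (least-squares) means describes adjusted rather than raw group differences, so it should be labelled as such. A sketch with hypothetical inputs:

```python
import math

def d_from_eta_squared(eta2):
    """Fritz, Morris & Richler (2012): d = 2*sqrt(eta2 / (1 - eta2))."""
    return 2 * math.sqrt(eta2 / (1 - eta2))

def pooled_sd_from_sem(sem1, n1, sem2, n2):
    """Recover each group's SD from its SEM, then pool (for d from adjusted means)."""
    sd1, sd2 = sem1 * math.sqrt(n1), sem2 * math.sqrt(n2)
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

print(d_from_eta_squared(0.06))               # hypothetical eta squared
print(pooled_sd_from_sem(2.0, 25, 3.0, 25))   # hypothetical SEMs and group sizes
```

Cohen's d would then be the difference in (adjusted) means divided by the pooled SD; reporting it alongside partial eta squared, with the adjustment noted, covers both conventions.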
Hi! I'm looking for an open-source program dealing with exploratory techniques of multivariate data analysis, in particular those based on Euclidean vector spaces.
Ideally, it should be capable of handling databases as a set of k matrices.
There is a software package known as ACT-STATIS (or an older version named SPAD) which performs this task, but as far as I know they are not open source. Thanks!
I have ten regions and I created dummy variables for them. When I run my model, one of them shows as omitted, so I had to exclude the one showing omitted, and then I got a significant result. But I need the coefficient value of the excluded variable to estimate total factor productivity.
If possible, could you please explain how I can calculate it, dear colleagues?
I am currently writing up my PhD thesis, and I have shown that even with a sampling resolution of 1 sample per cm, the black shale I am studying shows evidence of brief fluctuations between oxic and anoxic states (probably decades to centuries in duration) in the form of in-situ benthic macrofauna.
The issue is that many of the geochemical sampling techniques I used can only resolve proxy records at a 1 cm scale, due to sample weight requirements (e.g. total lipid extraction for biomarker analysis), and multiple redox oscillations become time-averaged in these samples.
Is it possible to use some sort of model based on Bayesian statistics to estimate the likely true frequency of oxia/anoxia in a given sampling interval (i.e. using the 1 cm scale proxy data and the <1 cm scale lithological data as priors)? Have there been any studies that have used some sort of Bayesian model to estimate true frequencies between samples (in any field of study)?
How can I add robust 97.5% confidence ellipses to the variation diagrams (XY, ilr-transformed) with the robCompositions or compositions packages?
Best
Azzeddine
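I can't vouch for the exact robCompositions call, but the ellipse itself is straightforward to compute: ilr-transform the composition, estimate a robust (MCD) location and scatter, and draw the contour at the chi-square 97.5% quantile. A Python sketch of that computation on hypothetical data (in R, the same ingredients are available via robustbase::covMcd):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
comp = rng.dirichlet([5.0, 3.0, 2.0], size=100)   # hypothetical 3-part composition

# ilr transform (one standard orthonormal basis for 3 parts)
z1 = np.sqrt(0.5) * np.log(comp[:, 0] / comp[:, 1])
z2 = np.sqrt(2 / 3) * np.log(np.sqrt(comp[:, 0] * comp[:, 1]) / comp[:, 2])
Z = np.column_stack([z1, z2])

# Robust (MCD) location and scatter in ilr coordinates
mcd = MinCovDet(random_state=0).fit(Z)

# Ellipse at squared Mahalanobis radius chi2(df=2, 0.975)
r2 = chi2.ppf(0.975, df=2)
vals, vecs = np.linalg.eigh(mcd.covariance_)
theta = np.linspace(0, 2 * np.pi, 200)
unit = np.column_stack([np.cos(theta), np.sin(theta)])
ellipse = mcd.location_ + (unit * np.sqrt(vals * r2)) @ vecs.T
print(ellipse.shape)
```

Plotting the rows of `ellipse` over the scatter of Z gives the 97.5% robust tolerance ellipse on the ilr diagram.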
Hi all!
Does anyone use endophenotypes for research purposes? I found very interesting papers explaining the value of endophenotypes (mostly in psychiatry), but I think the concept is perfectly applicable to any medical condition. Unfortunately, I can't find a methodological paper explaining the process of constructing an endophenotype.
Is there a formal statistical/methodological approach to do this? or more than the process of making them, is there an evidence-based process to probe their suitability?
What test is appropriate for a data set with 10 continuous dependent variables and one dichotomous independent variable? Is it possible to perform 10 separate independent t-tests or some sort of ANOVA (MANOVA)? The sample size is 1022.
How can I calculate an ANOVA table for a quadratic model in Python?
I want to calculate a table like the one I uploaded in the image.
We are using a continuous scoring system for the Eating Disorder Diagnostic Scale that gives a total score of disordered eating symptomatology with a score range from 0 to 109. We have a sample size of 48 and the scores range from 8 to 81. We want to see if participants' scores on the EDDS, along with whether they received intranasal oxytocin or placebo in a repeated-measures study design, had a significant effect on our dependent variables including performance on the Emotion Recognition Task and visual analogue scales taken at different time points during their visit, hence why we are using a repeated-measures ANOVA.
Any thoughts or advice would be greatly appreciated, thank you!
In my time series dataset, I have 1 dependent variable and 5 independent variables and I need to find the independent variable that affects the dependent variable the most (the independent variable that explains most variations in the dependent variable). Consider all 5 variables are economic factors.
For my master's thesis, I conducted an infection assay experiment on wheat plants in pots to test several treatments for their effectiveness in controlling the pathogen. I had 10 variants in total. Each variant consisted of 5 pots, 2 of which were put in a frame and measured each day for another, rather unrelated experiment (hyperspectral imaging). As the pots in the frames were moved daily to a measuring chamber, lay under lights for several minutes, and were fixed in the frame, we initially decided to score those pots separately and only use the 3 other pots per variant to test the effectiveness of the treatment.
As the data contained many zeros and didn't follow a Gaussian distribution, I conducted a Kruskal-Wallis test with Wilcoxon as the post hoc test. With only 3 replicates per variant now, I get very high p-values. Now I want to test whether being put in the frame had a significant effect on the plants/pathogens, because if not, I can combine the scoring values of the 2 pots in the frame with the other 3 to have a total of 5 replicates per variant. For this, I plan to introduce a factor called 'frame' with values yes/no. However, I don't know which test to conduct here to evaluate the effect.
Do I have to conduct a confirmatory factor analysis?
Thanks in advance!
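Confirmatory factor analysis is for latent constructs measured by multiple indicators, so it doesn't seem to be what's needed here. Given the zero-inflated, non-Gaussian scores, one simple check of the frame effect is a rank-based two-sample comparison of framed vs. non-framed pots (within a variant, or stratified across variants). A sketch with hypothetical scores:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical disease scores: pots kept in the frame vs. the other pots
frame = np.array([0, 1, 3, 0, 2, 1])
no_frame = np.array([0, 2, 1, 0, 1, 0, 3, 1, 0])

# Rank-based test tolerates the many zeros and non-Gaussian distribution
u_stat, p = mannwhitneyu(frame, no_frame, alternative="two-sided")
print(u_stat, p)
```

A non-significant result with such small groups is weak evidence of "no effect", so it may be worth also reporting an effect-size estimate (e.g. the rank-biserial correlation) before deciding to pool the pots.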