Advanced Statistical Analysis - Science topic

Chi Square Test Alternatives for Repeated Measures?

0 Recommendations

Adam Clifton

asked a question related to Advanced Statistical Analysis

Question

4 answers

Mar 10, 2024

Hello everyone, I hope you're doing well.

I recently conducted a test on simulating near-field reflections. As the hidden reference, I used a measured dataset of OBRIRs from a KEMAR HATS in an anechoic chamber, facing a reflective surface at distances of 0.25 m and 0.5 m. I then created a simulated room and generated OBRIRs using the AKTools room simulation software, using various HRTFs from near-field measurements (matching the 0.25 m and 0.5 m distances) and far-field measurements (an overall 2 m measurement).

These were then presented to listeners over headphones with head tracking, convolved with separate male and female voice stimuli that had been modelled to come out of the listener's mouth, and the listener had to imagine that it was. For each comparison, they were asked to pick which of the 3 options (the measured reference, the near-field HRTF, the far-field HRTF) they thought was the most real/believable/plausible and then rate it on a scale from 1 to 6, 1 being not at all plausible and 6 being very plausible. For each comparison, the options were randomised so that the listener wouldn't get used to picking the same one. This was repeated 3 times for each voice, and then another 3 times for the other distance. This gave a total of 12 measurements per listener (3 male 0.25 m, 3 female 0.25 m, 3 male 0.5 m, 3 female 0.5 m).

My hypothesis was that each of the options would be equally plausible, so the listeners' choices would be split equally overall, i.e. a presumed split of 1/3 for each option. I thought a Chi Square test would be suitable; however, it is not, as the data holds multiple answers from each listener.

I can't seem to find any data analysis methods that work for this setup. I thought about just taking each listener's initial response for the male 0.25, then female 0.25, then male 0.5, then female 0.5 and comparing those... somehow using Chi Square?

I was also curious whether the distance and the voice had an effect on which option the listeners liked the most.

It does seem like there is a slight difference in which option was preferred. From a total of 22 listeners, the far-field HRTF had the highest frequency at 105, compared to 71 for the reference and 88 for the near-field HRTF. I'm mostly looking for tests that can say whether this is statistically significant or not, but with a sample size of 22, I doubt I'll be able to make any strong judgements. Also, some listeners caught onto which option they preferred and chose the same one each time. I'm not sure whether I should exclude these responses or keep them.

Any advice you can provide is greatly appreciated, any further questions or information you need please let me know!

Thank you!

Relevant answer

Daniel Jacob Bilar

Mar 13, 2024

Answer

IYH Dear Adam Clifton

Based on your description, I suggest applying Generalized Linear Mixed Models (GLMM) followed by post hoc pairwise comparisons with Bonferroni adjustment. This approach allows you to incorporate multiple responses from the same participant, handle categorical variables, and determine statistically significant differences between conditions.

An example of how you could structure the GLMM:

Set up the dependent variable as plausibility ratings on a continuous scale (e.g., 1-6).
Define fixed effects, namely the type of HRTF (near-field 0.25 m, near-field 0.5 m, far-field 2 m), distance (0.25 m, 0.5 m), and voice (male, female). Including interaction terms between HRTF types and voices might prove beneficial.
Add random effects to account for participants repeating trials and contributing idiosyncrasies, nesting trial repetitions within participants.

Once fitted, execute post hoc pairwise comparisons between HRTF types to investigate significant preferences. As suggested, I would adjust p-values accordingly using Bonferroni corrections to maintain strict control over family-wise error rates.

Additionally, calculate the proportion of consistent selections made by each participant for each condition, excluding those who consistently chose the same option regardless of presentation. Then run chi-square goodness-of-fit tests comparing expected frequencies (equal distribution) versus observed distributions derived from participants' consistent preferences.
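As a rough illustration of the goodness-of-fit step, here is a minimal Python sketch that applies the test to the pooled counts quoted in the question (105, 71, 88) against an equal 1/3 split. The function names are made up for this example. Note that pooling all 264 responses treats them as independent, which they are not (12 per listener), so the resulting p-value should only be read as a descriptive first look, not as the final analysis.

```python
import math

def chi_square_gof_equal(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)          # same expected count per option
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return statistic, df

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df.

    For 2 degrees of freedom the survival function has the closed form
    exp(-x / 2), so no external library is needed here.
    """
    return math.exp(-x / 2.0)

# Counts from the question: far-field, measured reference, near-field.
counts = [105, 71, 88]
stat, df = chi_square_gof_equal(counts)
p_value = chi2_sf_df2(stat)  # valid only because df == 2 for three options

print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

A per-listener version (one choice per listener and condition) would avoid the dependence problem at the cost of discarding data, which is why the mixed-model route is usually preferred.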

Kindly be aware (or your prof may tell you :D) that although your current sample size is relatively small, drawing definitive conclusions is still possible after properly accounting for dependencies and handling multiple comparisons.

Maybe a word on Bonferroni: The so-called Bonferroni adjustment serves to counteract the issue of multiple comparisons, reducing the chance of falsely rejecting null hypotheses. Essentially, it establishes stricter criteria for concluding statistical significance, thereby maintaining appropriate error rates in the presence of multiple tests.

Why is this needed? When executing post hoc pairwise comparisons after fitting a GLMM, running multiple tests simultaneously increases the risk of encountering false positives. To mitigate this risk, we apply a corrected alpha level determined by dividing the desired family-wise error rate (the Type I error rate applicable to the entire set of comparisons) by the number of tests performed.

Typically, you would choose α=0.05 or α=0.01 as the baseline significance threshold. With multiple comparisons, we set α_bonferroni=(α/number of tests). For example, suppose you intend to run six pairwise comparisons (as you would among four means) with a baseline α=0.05. In that situation, you'd compute α_bonferroni=0.05/6≈0.0083, meaning that you now require stronger evidence (a smaller p-value) to claim significance.
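The arithmetic above can be sketched in a few lines of Python. The p-values below are hypothetical and only serve to show the mechanics; the helper names are not from any particular package.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Six tests at a family-wise alpha of 0.05, as in the example above.
alpha_per_test = bonferroni_alpha(0.05, 6)

# Hypothetical raw p-values from six pairwise comparisons.
raw_p = [0.001, 0.012, 0.030, 0.047, 0.200, 0.650]
adjusted = bonferroni_adjust(raw_p)

# Comparing raw p to alpha/m is equivalent to comparing adjusted p to alpha.
significant = [p <= alpha_per_test for p in raw_p]

print(f"per-test alpha = {alpha_per_test:.4f}")
print("adjusted p-values:", [round(p, 3) for p in adjusted])
print("significant:", significant)
```

Either view works in practice: shrink the threshold, or inflate the p-values and keep the original α.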

Bonferroni is a safe, conservative choice that guarantees strict control of the family-wise error rate, although it is sometimes too conservative, especially when the tests are strongly positively correlated. Alternatives are Šídák and Holm–Šídák: compared to Bonferroni, both are less conservative, i.e. they lose less statistical power. In practical applications, choosing between Šídák and Holm–Šídák depends on the degree of dependence amongst the tests, the cost of committing Type I or Type II errors, and the complexity of computations tolerated.
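To make the comparison concrete, here is a small Python sketch of the Šídák per-test threshold and of step-down Holm–Šídák adjusted p-values, written from the textbook definitions. The p-values are hypothetical, the function names are invented for this example, and the Šídák formulas assume independent (or at least not negatively dependent) tests.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test threshold under the Sidak correction: 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

def holm_sidak_adjust(p_values):
    """Step-down Holm-Sidak adjusted p-values, returned in the input order.

    The smallest p-value is corrected for m tests, the next for m - 1, and
    so on; a running maximum keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank                      # tests still in play
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from six pairwise comparisons.
raw_p = [0.001, 0.012, 0.030, 0.047, 0.200, 0.650]

print(f"Sidak per-test alpha      = {sidak_alpha(0.05, 6):.5f}")
print(f"Bonferroni per-test alpha = {0.05 / 6:.5f}")
print("Holm-Sidak adjusted:", [round(p, 4) for p in holm_sidak_adjust(raw_p)])
```

The Šídák threshold is always slightly larger than the Bonferroni one, and the step-down adjusted p-values are never larger than the Bonferroni-adjusted ones, which is where the gain in power comes from.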

Can I use the same variable twice as both a mediator and an independent variable in SEM?

6 Recommendations

Ceyhun Karabıyık

asked a question related to Advanced Statistical Analysis

Question

2 answers

Mar 5, 2024

Suppose that we have three variables (X, Y, Z). According to past literature, Y mediates the relationship between X and Z, while X mediates the relationship between Y and Z. Can I analyze these interrelationships in a single SEM using a duplicate variable for either X (i.e., Xiv and Xdv) or Y (Yiv and Ydv)?

Relevant answer

Faheem Uddin Syed

Mar 6, 2024

Answer

It is possible to use the same variable twice, once as a mediator and once as an independent variable. This methodology enables a more comprehensive examination of the connections inside the model.

For Reference:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1350511-testing-mediation-between-10-independent-variables-and-1-dv-with-mixed

What is lack of fit test in statistical analysis in simple terms and why is it undesirable when it is significant?

5 Recommendations

Adarsh Shetty

asked a question related to Advanced Statistical Analysis

Question

3 answers

Feb 3, 2024

What are the possible ways of rectifying a lack of fit test showing up as significant? Context: optimization of dilute-acid hydrolysis of lignocellulosic biomass mediated by nanoparticles.

Relevant answer

https://stats.libretexts.org/Bookshelves/Computing_and_Modeling/Supplemental_Modules_%28Computing_and_Modeling%29/Regression_Analysis/Simple_linear_regression/Test_for_Lack_of_Fit

Feb 4, 2024

Answer

Check this link.

How to combine multiple probability distributions of measurements?

10 Recommendations

Henry Ra

asked a question related to Advanced Statistical Analysis

Question

6 answers

Oct 10, 2023

Hello,

I have the following problem. I have made three measurements of the same event under the same measurement conditions.

Each measurement has a unique probability distribution. I have already calculated the mean and standard deviation for each measurement.

My goal is to combine my three measurements to get a general result of my experiment.

I know how to calculate the combined mean: (x_comb = (x1_mean+x2_mean+x3_mean)/3)

I don't know how to calculate the combined standard deviation.

Please let me know if you can help me. If you have any other questions, don't hesitate to ask me.

Thank you very much! :)

Relevant answer

Alan Carvalho Dias

Jan 18, 2024

Answer

What is the pooled standard deviation?

The pooled standard deviation is a method for estimating a single standard deviation to represent all independent samples or groups in your study when they are assumed to come from populations with a common standard deviation. The pooled standard deviation is the average spread of all data points about their group mean (not the overall mean). It is a weighted average of each group's standard deviation.

Attached is the formula.

SD pooled_formula.png (29.23 KB)
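Since the attached formula is not reproduced in the text, here is a minimal Python sketch of the standard pooled standard deviation, sqrt(sum((n_i - 1) * s_i^2) / sum(n_i - 1)). The sample sizes and standard deviations below are hypothetical placeholders, not values from the question, and the sketch assumes the three measurements share a common underlying spread.

```python
import math

def pooled_sd(sample_sizes, sds):
    """Pooled standard deviation of several groups.

    Each group's variance is weighted by its degrees of freedom (n - 1).
    """
    if len(sample_sizes) != len(sds):
        raise ValueError("need one standard deviation per sample size")
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sample_sizes, sds))
    denominator = sum(n - 1 for n in sample_sizes)
    return math.sqrt(numerator / denominator)

# Hypothetical example: three measurements with 20 readings each.
n = [20, 20, 20]
s = [1.2, 0.9, 1.5]
print(f"pooled SD = {pooled_sd(n, s):.4f}")
```

With equal sample sizes this reduces to the square root of the mean of the three variances. It describes the within-measurement spread only; if the three means differ noticeably, that between-measurement variation is not captured by this quantity.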

Clarity on Partial Least Square Regression?

0 Recommendations

Naveed Ahmed

asked a question related to Advanced Statistical Analysis

Question

1 answer

Nov 29, 2023

Dear Community,

I would like to ask a question regarding the use of Partial Least Squares Regression (PLSR) analysis. Basically, I am confused about units. For example, I have Year 1, Year 2 and Year 3 land cover and water balance components. The units of the water balance components are "mm", while the units of each landcover type for years 1, 2 and 3 are square km. I am confused about how PLSR will handle the different units.

Do I have to use the % difference in each year, or the % of each landcover type relative to the total area of the basin, and similarly convert the water balance variables from "mm" to percentages?

I am looking for guidance. Please advise.

Regards

Relevant answer

Suraj Shah

Jan 11, 2024

Answer

In my opinion, normalizing each variable will solve the problem!

How to choose between all the possible p value adjustments ?

0 Recommendations

Jérémy Chantrel

asked a question related to Advanced Statistical Analysis

Question

1 answer

May 26, 2020

Hello everyone,

I am performing multiple comparisons at the same time (post hoc tests), but among all the possible p value adjustments available (Bonferroni, Holm, Hochberg, Sidak, Bonferroni-Sidak, Benjamini-Hochberg, Benjamini-Yekutieli, Hommel, Tukey, etc.), I don't know which one to choose... And I want to be statistically correct for the comparisons that I am making in my experiment.

In my experiment, there are 4 groups (let say A, B, C, D), but I want to compare A vs B, and C vs D. That's all. So, after performing wilcoxon tests, the non-parametrical equivalent of a t test (because I have such a low amount of repeat per group (n=6) + non-normality for some groups), for A vs B, and C vs D, I don't know which p value adjustment should be performed here.

I would like to understand 1. which adjustment I should perform here. 2. how to decide which test I should perform for any other analysis (what is the reasoning).

Thanks in advance for your response,

Relevant answer

Priyadarshini Chatterjee

Dec 31, 2023

Answer

Hi! Was your query answered? I am confused about a similar set up of mine!

How can I calculate the ICC1, ICC2 for each company?

0 Recommendations

Yang Gu

asked a question related to Advanced Statistical Analysis

Question

5 answers

Dec 14, 2023

I have a dataset that includes 1900 companies, and I surveyed 10 employees in each company. One question concerns the risk preference of each employee. I now need to calculate the ICC1 and ICC2 values for each company. Each company has a unique company_id, so the employee dataset has 19,000 rows and each employee is matched to a company through company_id. How can I get the ICC1 and ICC2 values for each company in R? I have been trying for a few days and hope someone can help resolve my problem.

Relevant answer

Rainer Duesing

Dec 15, 2023

Answer

P.S.: Paul Bliese has a multilevel tutorial for R, where he shows how to calculate the above-mentioned indices as well as others. All of them have their specific problems, and discussing those here would lead too far.

https://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf
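For orientation, here is a minimal Python sketch of the usual one-way ANOVA definitions, ICC1 = (MSB - MSW) / (MSB + (k - 1) * MSW) and ICC2 = (MSB - MSW) / MSB, where k is the number of respondents per group. The toy data and the function name are invented for this example, and equal group sizes are assumed. Both indices are computed across all groups at once, so they describe the grouping structure of the whole dataset rather than a single company.

```python
def icc1_icc2(groups):
    """ICC(1) and ICC(2) from a one-way ANOVA decomposition.

    groups: dict mapping a group id to a list of individual scores.
    This sketch assumes every group has the same number of scores.
    """
    sizes = {len(scores) for scores in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    k = sizes.pop()                     # respondents per group
    g = len(groups)                     # number of groups
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = {gid: sum(scores) / k for gid, scores in groups.items()}

    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means.values())
    ss_within = sum(
        (x - group_means[gid]) ** 2
        for gid, scores in groups.items()
        for x in scores
    )
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))

    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2

# Toy data: three hypothetical companies with three employees each.
toy = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
icc1, icc2 = icc1_icc2(toy)
print(f"ICC1 = {icc1:.3f}, ICC2 = {icc2:.3f}")
```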

Variance Inflation Factor (VIF) for constant coefficient?

5 Recommendations

Adam Jonson

asked a question related to Advanced Statistical Analysis

Question

3 answers

Nov 29, 2023

In the case of a constant coefficient, where the VIF is greater than 10, what does that mean? Do all the variables in the model exhibit multicollinearity? How can multicollinearity be reduced? Multicollinearity could be reduced by removing variables with VIF >10. But I don't know what to do with the constant coefficient.

Thank you very much

Relevant answer

Thom Baguley

Nov 30, 2023

Answer

Looking further - your package may be reporting an uncentred VIF in place of or in addition to a centred VIF. There is an apparently unresolved debate in the literature about when or why that's useful. For practical purpose in most regressions it seems likely that high uncentred VIF may not be problematic. I've never seen uncentred VIF used in a published paper ...

5 Recommendations

How can I run a for loop for statistical functions in R?

asked a question related to Advanced Statistical Analysis

Question

5 answers

Nov 14, 2023

I want to repeat a statistical function (like -lm-, -glm- or -glmgee-) for a lot of variables. But, it does not work for statistical functions (example 1) but works for simple functions (example 2).

Important: I do not mean multivariate regression and using cbind()!

Example 1:

a = rnorm(10, 5, 1)
b = rnorm(10, 7, 1)
c = rnorm(10, 9, 1)
d = rnorm(10, 10, 1)
i = list(a, b, c)
for (x in i) {
  lm(x~d)
}

Example 2:

a = rnorm(10, 5, 1)
b = rnorm(10, 7, 1)
c = rnorm(10, 9, 1)
d = rnorm(10, 10, 1)
i = list(a, b, c)
for (x in i) {
  plot(x+d)
}

You can check in this site: https://rdrr.io/snippets/

Relevant answer

Nov 15, 2023

Answer

You need to save the output in an object. In the case of lm(), the best is to save the results in a list. This is achieved automatically if you'd use lapply() instead of for().

models <- lapply(i, function(x) lm(x~d))

Looking for host in Germany for Alexander von Humboldt Foundation fellowship

15 Recommendations

Jie Zhang

asked a question related to Advanced Statistical Analysis

Question

3 answers

Nov 5, 2023

Dear all,

I hope this message finds you well. I am currently in the process of applying for an Alexander von Humboldt Foundation fellowship, and I am actively seeking a host professor in Germany who shares my research interests and expertise.

As an experienced epidemiologist, my primary research focus lies in the fields of obesity and diabetes from a life course perspective. Over the years, I have honed my skills in the intricate handling of complex data and advanced statistical analysis, including the application of multilevel growth models and causal mediation analysis.

I would be honored to explore the possibility of collaborating with you as my host professor in Germany. Your expertise and research interests align well with my background, making you an ideal candidate for this partnership.

If you are open to discussing the potential of hosting me as a fellow in your research group, I would greatly appreciate the opportunity to engage in a more detailed conversation about our research synergies.

Thank you for considering my inquiry, and I look forward to your response.

Best,

Jie

Relevant answer

Hossein Safar Yousefifard

Nov 9, 2023

Answer

Dear Jie Zhang

You're welcome. Wish you the best.

Regards.

Haque Mobassir Imtiyazul Haque Shaikh

0 Recommendations

asked a question related to Advanced Statistical Analysis

Crazy thinker with technical background Vs Methodical researcher with technical background by the book? who will you back for hitting Gold?

Question

10 answers

Oct 28, 2023

There are a lot of researchers who go by the book: they follow the right approach and write up results and observations in their field of work, confirming existing information or suggesting improvements to the experiment for better analysis, and so on. They are very hard working. Then there are others who are crazy thinkers, always suggesting things with little backup from existing experiments or known facts and always radical in their understanding of results. These people mostly get dismissed as a blip by the first category of researchers.

So if I ask for your opinion, who will you back for hitting gold: the one who is methodical and hardworking, or the one who is a crazy thinker?

Relevant answer

Thomas Cuff

Oct 29, 2023

Answer

Hello Haque Mobassir Imtiyazul Haque Shaikh ,

I agree with your contention that some ideas initially strike most people as 'crazy' both in technical and nontechnical fields. Examples from nontechnical fields include: opposing slavery, gun control, democracy, women voting, environmentalism, climate change, etc. Examples from technical fields include: mRNA vaccines (COVID-19 vaccines from Moderna and Pfizer), prions (self-replicating proteins), continental drift, quasicrystals, Josephson junctions (SQUIDs), quantum mechanics, the personal computer, the Internet, the airplane, radio, TV, electricity, etc. One person's 'crazy' idea may eventually become widely accepted, and even commercially important. And don't forget, many 'crazy' ideas originated from by-the-book investigations: the idea of the quantum of energy arose from Max Planck's tireless attempts at trying to explain the shape of the blackbody curve using classical thermodynamics, and superconductivity in some metals was the result of a rather pedestrian checking of electrical conductivity of metals at liquid helium temperatures - no one expected superconductivity and no theory predicted it.

I really like your question.

Regards,

Tom Cuff

Can Primary and Secondary data be subjected to regression process in the same model?

30 Recommendations

Jerry Kwarbai

asked a question related to Advanced Statistical Analysis

Question

10 answers

Jun 23, 2016

Is it possible to run a regression on both secondary and primary data in the same model? I mean, when the dependent variable is primary data to be sourced via questionnaire and the independent variable is secondary data to be gathered from published financial statements?

For example, suppose the topic is capital budgeting moderators and shareholders' wealth (SHW). Capital budgeting moderators are proxied by inflation, management attitude to risk, economic condition and political instability, while SHW is proxied by market value, profitability and retained earnings.

Relevant answer

John Maccarthy

Oct 11, 2023

Answer

There should be a causal effect of the independent variables on the dependent variable in regression analysis. Primary data gathered through a questionnaire for the dependent variable would be influenced by current happenings, while the independent variables based on secondary data were influenced by past or historical happenings. Therefore, there would not be true linkages between the independent variables and the dependent variable, and running a regression with both secondary and primary data in the same model would not give you the best outcome.

How to calculate the tangential and axial curvature for cornea ?

3 Recommendations

Nithin Rayudu

asked a question related to Advanced Statistical Analysis

Question

3 answers

Oct 4, 2023

Hi all,

I am trying to calculate the curvatures of the cornea and compare them with Pentacam values. I have the Zernike equation in polar coordinates (Zfit = f(r, theta)). Can anybody let me know the equations for calculating the curvatures?

Thanks & Regards.

Nithin

Relevant answer

Rodolfo Felipe De Oliveira Costa

Oct 5, 2023

Answer

I think you can try something like this

Demo - main.pdf (27.66 KB)

Which test should I use to compare biodiversity of different sites?

20 Recommendations

Anna Reboa

asked a question related to Advanced Statistical Analysis

Question

4 answers

Oct 2, 2023

Hi! I have a dataset with a list of species of sponges (Porifera) and the number of specimens found for each species at three different sites. I attach a sample from my dataset here. Which test should I use to compare the three sites, showing both which species were found at each site and their abundance? I was also thinking of a visual representation showing just the difference between sites in terms of species diversity (and not abundance), so that it is possible to see which species were found at only one site and which were found at more than one. For this last purpose I thought about doing an MDS, but I am not sure whether it is the right method, how to do it in R, or how to set up the dataset. Can you help me find a script that also shows the required shape of the dataset? Any advice in general would be great! Thank you!

example.xlsx (9.41 KB)

Relevant answer

Miriam Simma

Oct 4, 2023

Answer

Hi,

I wonder why you would like to ignore the abundance information?

Based on a species-site abundance matrix, you could calculate a dissimilarity matrix (if the abundance data should be considered, I would use Bray-Curtis dissimilarity) and conduct a Mantel test between all three dissimilarity matrices to test for correlations between the species compositions of the three sites.
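As a small illustration of the first step, here is a Python sketch of the Bray-Curtis dissimilarity between two sites, computed as the sum of absolute abundance differences divided by the total abundance of both sites. The species counts and site names are hypothetical, not taken from the attached file.

```python
from itertools import combinations

def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same species in the same order.
    0 means identical composition, 1 means no shared species.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same species list")
    total = sum(site_a) + sum(site_b)
    if total == 0:
        raise ValueError("both sites are empty")
    return sum(abs(a - b) for a, b in zip(site_a, site_b)) / total

# Hypothetical abundances of three sponge species at three sites.
sites = {
    "site1": [10, 0, 5],
    "site2": [5, 5, 0],
    "site3": [0, 8, 2],
}

# Pairwise dissimilarities, i.e. the off-diagonal entries of the matrix.
for (name_a, a), (name_b, b) in combinations(sites.items(), 2):
    print(f"{name_a} vs {name_b}: {bray_curtis(a, b):.3f}")
```

For a presence/absence comparison, the same function can be applied after converting every non-zero count to 1, which makes it equivalent to the Sørensen dissimilarity.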

Is it possible to test variation in a DV for 2x2x2 design for 2 manipulated and 1 measured variable?

14 Recommendations

Sundas Azim

asked a question related to Advanced Statistical Analysis

Question

4 answers

Sep 28, 2023

Is it possible to test a 2x2x2 design where the first two variables are manipulated high/low categories and the third variable is a measured continuous variable?

Would it be suitable to convert the measured continuous variable to a categorical variable to create a 2x2x2 design?

If so, I would now have 8 categories with multiple high/low combinations.

What test would I use to identify the differences across these groups in a dependent variable if I want to hypothesize that the DV would vary as a function of high/low categorical variable (3rd variable) values?

Relevant answer

Oct 1, 2023

Answer

In addition to the loss of power that Kelvyn Jones mentioned, when you carve a quantitative variable into categories, the fitted values are forced to follow an artificial step function. The attached image shows the relationship between age (X) and total cholesterol (Y). Notice that when X is carved into 3 categories, the fitted values are forced to follow a step function. Notice that people near a cut-point who have tiny differences in age have quite large differences in fitted values. And notice that people who fall at opposite ends of an age category have the same fitted value, despite having fairly large age differences. With that in mind, the fitted values from the linear regression model make a lot more sense, I think! HTH.

anova_v_regression_step_function.png (119.15 KB)

Choosing the appropriate statistical analysis.

7 Recommendations

Sundas Azim

asked a question related to Advanced Statistical Analysis

Question

7 answers

Sep 30, 2023

I am conducting a study which has 3 IVs (POP, SOE, PI) and 1 DV (COO). Two of the IVs (POP and SOE) are manipulated variables with high/low levels, making 4 groups. However, the third IV (PI) is a measured, continuous variable. This means I cannot manipulate it to create high/low conditions.

Should I convert the continuous IV (PI) to high/low conditions to make a 2x2x2 design?

If yes, what values of the high/low aspects will I enter into my data sheet?

If no, what options do I have for my analysis?

Someone told me it is not a good idea to convert the continuous third IV to a categorical variable. They told me the options I have are either hierarchical regression analysis or multiple regression analysis with interaction terms.

I would like to mention that I would also like to see the interactive effects of all three IVs, not only combinations of 2 IVs, on the DV. I want to hypothesize that COO will be highest for the combination of high POP, high SOE, and high PI. Alternatively, the COO outcome should vary for high PI, when POP and SOE are high.

I would like suggestions to gain clarity on the best approach I should follow, and the tests my study needs. For any analysis, what values do I enter in my data sheet for high/low values of the two categorical IVs?

Relevant answer

Engr. Tufail

Sep 30, 2023

Answer

Sundas Azim, in your study with three independent variables (IVs) and one dependent variable (DV), it's generally not recommended to convert a continuous IV like PI into a categorical variable, as doing so can lead to a loss of information and statistical power. Instead, you can employ hierarchical regression analysis or multiple regression analysis with interaction terms to examine the interactive effects of all three IVs on the DV. To investigate your hypothesis that COO will be highest for the combination of high POP, high SOE, and high PI, you can create interaction terms and include them in your regression model. For instance, the three-way interaction term is the product of the POP, SOE, and PI values, and the two-way terms are the products of each pair. This approach allows you to assess the impact of each IV while considering their interactive effects, avoiding the need to categorize the continuous variable PI. In your data sheet, you would enter numeric codes for the manipulated factors POP and SOE (for example, 0 = low and 1 = high), and for PI you would enter the measured values. Ensure you center the continuous IV (subtract the mean from each score) to aid in the interpretation of interaction effects. This approach will provide a more robust and informative analysis for your research question.
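To show what the data sheet could look like, here is a minimal Python sketch that codes the two manipulated factors as 0/1, mean-centers PI, and builds the two-way and three-way product terms. The four data rows and the column names (PI_c, POPxSOE, and so on) are hypothetical and only illustrate the layout; the 0/1 coding is one common choice among several.

```python
def center(values):
    """Subtract the mean so the variable has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def design_rows(pop, soe, pi):
    """Build predictor rows with main effects and all interaction products.

    pop, soe: 0/1 codes for the manipulated low/high conditions.
    pi: measured continuous scores (centered inside this function).
    """
    pi_centered = center(pi)
    rows = []
    for p, s, c in zip(pop, soe, pi_centered):
        rows.append({
            "POP": p,
            "SOE": s,
            "PI_c": c,
            "POPxSOE": p * s,
            "POPxPI": p * c,
            "SOExPI": s * c,
            "POPxSOExPI": p * s * c,   # three-way interaction term
        })
    return rows

# Hypothetical participants, one per cell of the 2x2 manipulation.
pop = [0, 0, 1, 1]
soe = [0, 1, 0, 1]
pi = [2.0, 4.0, 6.0, 8.0]

rows = design_rows(pop, soe, pi)
for row in rows:
    print(row)
```

Most regression software builds these product columns automatically from a formula, so in practice only POP, SOE, PI and COO need to be typed into the data sheet.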

What factors should I consider when deciding between the Diagonally Weighted Least Squares (DWLS) and Unweighted Least Squares (ULS) estimators?

12 Recommendations

Matyáš Strašík

asked a question related to Advanced Statistical Analysis

Question

5 answers

Sep 18, 2023

Greetings,

I am currently in the process of conducting a Confirmatory Factor Analysis (CFA) on a dataset consisting of 658 observations, using a 4-point Likert scale. As I delve into this analysis, I have encountered an interesting dilemma related to the choice of estimation method.

Upon examining my data, I observed a slight negative kurtosis of approximately -0.0492 and a slight negative skewness of approximately -0.243 (please refer to the attached file for details). Considering these properties, I initially leaned towards utilizing the Diagonally Weighted Least Squares (DWLS) estimation method, as existing literature suggests that it takes into account the non-normal distribution of observed variables and is less sensitive to outliers.

However, to my surprise, when I applied the Unweighted Least Squares (ULS) estimation method, it yielded significantly better fit indices for all three factor solutions I am testing. In fact, it even produced a solution that seemed to align with the feedback provided by the respondents. In contrast, DWLS showed no acceptable fit for this specific solution, leaving me to question whether the assumptions of ULS are being violated.

In my quest for guidance, I came across a paper authored by Forero et al. (2009; DOI: 10.1080/10705510903203573), which suggests that if ULS provides a better fit, it may be a valid choice. However, I remain uncertain about the potential violations of assumptions associated with ULS.

I would greatly appreciate your insights, opinions, and suggestions regarding this predicament, as well as any relevant literature or references that can shed light on the suitability of ULS in this context.

Thank you in advance for your valuable contributions to this discussion.

Best regards, Matyas

pic.svg (52.88 KB)

Relevant answer

E.A. Gawad

Sep 20, 2023

Answer

Thank you for your question. I have searched the web for information about the Diagonally Weighted Least Squares (DWLS) and Unweighted Least Squares (ULS) estimators, and I have found some relevant sources that may help you with your decision.

One of the factors that you should consider when choosing between DWLS and ULS is the sample size. According to Forero et al. (2009) [1], DWLS tends to perform better than ULS when the sample size is small (less than 200), but ULS tends to perform better than DWLS when the sample size is large (more than 1000). Since your sample size is 658, it falls in the intermediate range, where both methods may provide similar results.

Another factor that you should consider is the degree of non-normality of your data. According to Finney and DiStefano (2006), DWLS is more robust to non-normality than ULS, especially when the data are highly skewed or kurtotic. However, ULS may be more efficient than DWLS when the data are moderately non-normal or close to normal. Since your data have slight negative skewness and kurtosis, it may not be a serious violation of the ULS assumptions.

A third factor that you should consider is the model fit and parameter estimates. According to Forero et al. (2009) [1], both methods provide accurate and similar results overall, but ULS tends to provide more accurate and less variable parameter estimates, as well as more precise standard errors and better coverage rates. However, DWLS has higher convergence rates than ULS, which means that it is less likely to encounter numerical problems or estimation errors.

Based on these factors, it seems that both DWLS and ULS are reasonable choices for your data and model, but ULS may have some advantages over DWLS in terms of efficiency and accuracy. However, you should also check the sensitivity of your results to different estimation methods, and compare them with other criteria such as theoretical plausibility, parsimony, and interpretability.

I hope this answer helps you with your analysis. If you need more information, you can refer to the sources that I have cited below.

1: Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation by Carlos G. Forero, Alberto Maydeu-Olivares & David Gallardo-Pujol in Structural Equation Modeling: A Multidisciplinary Journal (2009)

2: Non-normal and categorical data in structural equation modeling by Sara J. Finney & Christine DiStefano in Structural equation modeling: A second course (2006)

https://psycnet.apa.org/record/2010-12087-004

https://stats.stackexchange.com/questions/331543/fa-estimation-gls-wls-dwls-uls-and-robustness-of-these-estimators

https://lavaan.ugent.be/tutorial/est.html

Good luck

Autoregressive coefficients in longitudinal SEM indicating model bias?

33 Recommendations

Kaitlin Riegel

asked a question related to Advanced Statistical Analysis

Question

4 answers

Sep 13, 2023

I have a longitudinal model and the stability coefficients for one construct change dramatically from the first and second time point (.04) to the second and third time point (.89). I have offered a theoretical explanation for why this occurs, but have been asked about potential model bias.

Why would this indicate model bias? (A link to research would be helpful).

How can I determine whether the model is biased or not? (A link to research would be helpful).

Thanks!

Relevant answer

Sep 14, 2023

Answer

That makes sense. Are you comparing the cross-lagged panel (auto)regression (path) coefficients to zero-order correlations? This could be part of the issue (explain the "discrepancy"/low autoregressive stability coefficient). Regression coefficients are not equal to zero-order (bivariate) correlations. The regression coefficients take the correlation with other independent variables into account. This may explain why the autoregressive "stability" coefficients in your model look very different from the zero-order correlations. It is impossible to know without looking at your data and model in more detail.

The model fit does not look completely horrible at first sight but the chi-square test is significant and the RMSEA value is a bit high. I would take a look at model residuals and/or modification indices to find out where the model may be misspecified.

Relevant to the significance level of a test ?

9 Recommendations

Amine Allaoui

asked a question related to Advanced Statistical Analysis

Question

1 answer

Aug 5, 2023

Hello !

Please, relevant to the significance level of a test, I would like to know when can we increase the critical value α to 0.1.

Thank you in advance.

Relevant answer

Abedin Keshavarz

Aug 28, 2023

Answer

The significance level of a test is a predetermined threshold that is used to determine if the evidence from the sample provides enough support to reject the null hypothesis. It is typically denoted by the symbol α. The significance level is relevant in several ways: Type I error rate, confidence level, decision-making, and sample size determination.

In summary, the significance level guides the researcher in interpreting the test results and making informed decisions regarding the null and alternative hypotheses.
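To make the link between α and the decision rule concrete, here is a minimal sketch using only Python's standard library. It assumes a two-sided z-test, which the thread does not specify; the function name `two_sided_critical_z` is made up for illustration. Note that raising α from 0.05 to 0.1 lowers the critical value, so the null hypothesis is rejected more easily and the Type I error rate rises accordingly.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha: float) -> float:
    """Critical |z| for a two-sided z-test at significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Compare a few common significance levels (illustrative choices only).
for alpha in (0.01, 0.05, 0.10):
    z_crit = two_sided_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 when |z| > {z_crit:.3f}")
```

Whether α = 0.1 is acceptable is a design decision that should be fixed before looking at the data, for example in exploratory work where missing a real effect is judged more costly than a false positive.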

Can anyone explain why this large lang model like (gpt 4 and lambda) messed up a simple statistic question?

0 Recommendations

Mohammed Looti

asked a question related to Advanced Statistical Analysis

Question

4 answers

Aug 28, 2023

First I sent these numbers to the Bard chatbot:

6020 5603 54830 6750 23000 12370 10000

It responded with this:

You have listed 7 numbers.

6020
5603
54830
6750
23000
12370
10000

The sum of these numbers is 103573.

The average of these numbers is 14825.

The median of these numbers is 6750.

The mode of these numbers is 10000, which appears twice.

The range of these numbers is 23000 - 10000 = 13000.

The standard deviation of these numbers is 3844.74.

Is there anything else I can help you with?

Then I asked it to standardize them, and the response was this:

Sure. Standardizing the numbers will mean subtracting the mean and then dividing by the standard deviation. This will transform the numbers so that they have a mean of 0 and a standard deviation of 1.

The standardized numbers are:

-2.18
-2.88
2.44
-1.38
3.22
-0.78
0.00

----------------------------------------

Bing Chat's response to the same question was this:

I can standardize the numbers you provided. Standardization is the process of transforming data to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean from each data point and then dividing by the standard deviation.

Here are the standardized values for the numbers you provided:

6020: -0.705
5603: -0.801
54830: 2.891
6750: -0.524
23000: 0.693
12370: -0.191
10000: -0.363

Is there anything else I can help with? 😊

------------------------------------------

And Perplexity's response was this:

Based on the search results, it is possible to standardize the numbers provided by converting them into z-scores. Standardizing data is a common practice in statistics to compare variables that have different scales or units of measurement. To standardize the numbers provided, one would need to calculate the mean and standard deviation of the data and then transform each value to its corresponding z-score

Here are the steps to standardize the numbers:

Calculate the mean of the data: Mean = (6020 + 5603 + 54830 + 6750 + 23000 + 12370 + 10000) / 7 = 15038.43
Calculate the standard deviation of the data: Standard deviation = sqrt(((6020 - 15038.43)^2 + (5603 - 15038.43)^2 + (54830 - 15038.43)^2 + (6750 - 15038.43)^2 + (23000 - 15038.43)^2 + (12370 - 15038.43)^2 + (10000 - 15038.43)^2) / 6) = 17523.22
Calculate the z-score for each value: Z-score = (X - Mean) / Standard deviation
Z-score for 6020 = (6020 - 15038.43) / 17523.22 = -0.706
Z-score for 5603 = (5603 - 15038.43) / 17523.22 = -0.768
Z-score for 54830 = (54830 - 15038.43) / 17523.22 = 2.238
Z-score for 6750 = (6750 - 15038.43) / 17523.22 = -0.426
Z-score for 23000 = (23000 - 15038.43) / 17523.22 = 0.455
Z-score for 12370 = (12370 - 15038.43) / 17523.22 = -0.152
Z-score for 10000 = (10000 - 15038.43) / 17523.22 = -0.853

By standardizing the numbers, we have transformed them into a common scale that allows for easier comparison and analysis. It is important to note that the interpretation of z-scores depends on the context and the distribution of the data.

-----------------------------------------------------

So, as you can see, they can't even get the sum of the data correct. My question is: why can't these models get the sum of these numbers right, even when they apply all the correct steps?

Relevant answer

Zoran Jasak

Aug 28, 2023

Answer

There are seven numbers and you stated that 10000 appears twice, which would mean that there are eight numbers. In the calculation of the average the denominator is 7, which means that 10000 cannot appear twice. The range is calculated as 23000 - 10000 instead of 54830 - 5603 = 49227. The sum of those numbers is 118573, not 103573. Are you sure about those numbers?
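The summary statistics discussed in this thread can be recomputed directly instead of being taken from a chatbot or from a reply. A minimal sketch with Python's standard library follows; it assumes the sample standard deviation (n - 1 denominator) is the one wanted, which the question does not state.

```python
from statistics import mean, median, stdev

values = [6020, 5603, 54830, 6750, 23000, 12370, 10000]

total = sum(values)
avg = mean(values)
med = median(values)
value_range = max(values) - min(values)
sd = stdev(values)  # sample SD; use statistics.pstdev for the population SD

# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(v - avg) / sd for v in values]

print("sum:", total)
print("mean:", avg)
print("median:", med)
print("range:", value_range)
print("sample SD:", round(sd, 2))
for v, z in zip(values, z_scores):
    print(f"{v}: {z:.3f}")
```

A quick sanity check on any set of z-scores is that they sum to zero (up to rounding), which several of the chatbot outputs quoted above do not.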

Coulibaly Wawogninlin Brice

4 Recommendations

asked a question related to Advanced Statistical Analysis

How to interpret multiple overlaps in the sequential Mann-Kendall Sneyer test?

Question

2 answers

Aug 22, 2023

Hello, could someone assist me in interpreting the results of the sequential Mann-Kendall Sneyer test? Indeed, according to Dufek (2008: Precipitation variability in São Paulo State, Brazil), "In the absence of any trend, the graphical representation of the direct series (u(t)) and the backward series (u'(t)) obtained with this method yields curves that overlap several times." In my case, I observe two to three overlaps, often with sequences that exhibit significant trends. Should I also conclude that there is an absence of trends in my dataset?

MKS2.jpg
220.20 KB

Relevant answer

Ma'Mon Abu Hammad

Aug 25, 2023

Answer

The sequential Mann-Kendall test, also known as the Mann-Kendall-Sneyers (MKS) test, is a variation of the Mann-Kendall test that aims to detect trends in time series data. The test involves comparing the original time series to its reverse version to identify potential trends. The graphical representation of the direct series (u(t)) and the backward series (u'(t)) can provide insights into the presence or absence of trends. However, the interpretation can be nuanced.

Dufek (2008) suggests that if there is no trend, the curves of the direct and backward series will overlap several times. In your case, you observe two to three overlaps, often with sequences that exhibit significant trends. This situation requires careful consideration:

Overlaps: The fact that you observe overlaps in the curves suggests that there might be a lack of consistent and significant trends in your dataset. If you're seeing two to three overlaps, it could indicate a certain level of fluctuation without a clear upward or downward trend. However, it's important to consider the magnitude and duration of these overlaps. Short overlaps might be less indicative of a lack of trend than longer ones.
Significant Trends: The presence of sequences with significant trends might complicate the interpretation. Significant trends imply that some portions of the data are exhibiting systematic changes over time. The presence of these trends could be in contrast to the overlaps you observe.
Complex Patterns: Time series data can exhibit complex patterns that might not be captured by a single test or method. Overlapping curves and significant trends suggest that the behavior of your data might be more intricate than a simple upward or downward trend.
Data Context: Consider the context of your data and the subject matter. Sometimes, fluctuations and variations might be inherent to the process being studied, and these might not necessarily indicate a clear trend.

In conclusion, while the observation of overlaps in the graphical representation of the sequential Mann-Kendall test might suggest a lack of clear trends, the presence of significant trends in some segments complicates the interpretation. It's important to analyze the trends, magnitudes, and durations of both overlaps and significant trends while considering the broader context of your data and the subject matter you're studying. If possible, consulting with a statistician or subject-matter expert might help you make a more informed interpretation of your findings.
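For readers who want to see where the u(t) and u'(t) curves come from, here is a minimal sketch of the sequential statistic as it is commonly described. The function names are hypothetical, and the construction of the backward series (apply the same procedure to the reversed series, then reverse and negate) is one common convention that is assumed here; published implementations differ in detail.

```python
from math import sqrt


def sequential_mk(x):
    """Progressive series u(t) of the sequential Mann-Kendall statistic."""
    u, t = [], 0
    for k in range(1, len(x) + 1):
        # Count earlier observations that are smaller than the k-th one.
        t += sum(1 for j in range(k - 1) if x[j] < x[k - 1])
        expected = k * (k - 1) / 4
        variance = k * (k - 1) * (2 * k + 5) / 72
        u.append((t - expected) / sqrt(variance) if variance > 0 else 0.0)
    return u


def backward_mk(x):
    """Retrograde series u'(t): same statistic on the reversed series (assumed convention)."""
    u_reversed = sequential_mk(x[::-1])
    return [-v for v in u_reversed[::-1]]


def crossings(u, u_back):
    """Indices where the two curves cross (the difference changes sign)."""
    d = [a - b for a, b in zip(u, u_back)]
    return [i for i in range(1, len(d)) if d[i - 1] * d[i] < 0]


# Hypothetical, strictly increasing series used only to exercise the functions.
series = [float(v) for v in range(10)]
u_forward = sequential_mk(series)
u_backward = backward_mk(series)
print(crossings(u_forward, u_backward))
```

A crossing is usually read as a candidate change point only when it lies inside the confidence band (|u| below about 1.96 at the 5% level), while repeated crossings with both curves staying inside the band are the pattern Dufek associates with no trend.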

What is the most important use of the discrimination function in plant breeding and the economy?

3 Recommendations

Abbas Salehi

asked a question related to Advanced Statistical Analysis

Question

6 answers

Aug 22, 2014

In plant breeding, what are the uses of the discriminant function?

Relevant answer

Aug 13, 2023

Answer

Discriminant function technique involves the development of selection criteria on a combination of various characters and aids the breeder in indirect selection for genetic improvement in yield. In plant breeding, the selection index refers to a linear combination of characters associated with yield.

9 Recommendations

Is it possible to design a graphical and interactive application as an output of R programming?

asked a question related to Advanced Statistical Analysis

Question

5 answers

Aug 10, 2023

I am looking for a graphical tool like visual basic software to define R codes for interactive graphical buttons and text boxes.

For example, I want to design a Windows application with a graphical interface for calculating body mass index (BMI). I want to have two boxes for entering weight and height and a button to run the calculation. When the button is clicked, I want the code below to run.

BMI <- box1/(box2^2)

Relevant answer

Yves Payette

Aug 11, 2023

Answer

R in Power BI ?

It should not contain complex R syntaxes though.

https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-r-visuals

Why complicate the answer to those who need support?

7 Recommendations

Carlos Jimenez-Gallardo

asked a question related to Advanced Statistical Analysis

Question

5 answers

Jul 30, 2023

Some of the people who consult here are only users of statistics, while others are the ones who develop statistics, and we would love for people to use it correctly.

But "I believe" that many arrive late, always after the experiment has been run, asking "what statistical process can I do or apply". Perhaps they do not know that they should always consult beforehand, with the question or the hypothesis that they wish to answer or verify, since that would allow a better answer. On the other hand, some come with simple queries, but usually a statistics class is given as an answer, which I feel in some cases comes too late. In some cases it is extremely necessary, but in others it opens a debate that leads to serendipity. Wouldn't it be better to try to advise them in a more precise way? I read them:

Relevant answer

Gioacchino de Candia

Aug 1, 2023

Answer

Hi Carlos Jimenez-Gallardo ,

precisely: two sides of the same coin.

How to calculate R2 for existing linear regression equation?

3 Recommendations

Pavel Makovsky

asked a question related to Advanced Statistical Analysis

Question

10 answers

Jul 4, 2023

I’ve got a data set and I want to calculate R2 for linear regression equation from another study.

For example, I have derived an equation from my data (with R2) and I want to test how other equations perform on my data (and thus calculate R2 for them). Then, I want to compare R2 from my data set with R2 from derivation studies.

Do you have any software for this? Any common statistical software could cope with this task (e.g. SPSS or SAS)? Maybe you have any tutorials on YouTube for this?

Relevant answer

Jul 5, 2023

Answer

Hello Pavel Makovsky. In that case, modify the code I posted earlier. With your dataset open and active:

REGRESSION

/STATISTICS COEFF OUTS CI(95) R ANOVA

/DEPENDENT YourDV

/METHOD=ENTER your explanatory variables

/SAVE PRED(yhat1).

* Now use another set of coefficients to generate yhat2.

COMPUTE yhat2 = use the other regression coefficients here but with the variables in your dataset.

REGRESSION

/STATISTICS R

/DEPENDENT YourDV

/METHOD=ENTER yhat1.

REGRESSION

/STATISTICS R

/DEPENDENT YourDV

/METHOD=ENTER yhat2.
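One point worth checking when applying this approach: regressing the outcome on a set of predicted values returns the squared correlation between the two, which is unaffected by any constant shift or rescaling in the external equation. If the question is how well the published equation predicts as it stands, 1 - SSE/SST is the stricter measure. The sketch below shows both; the data and the "published" coefficients are hypothetical and chosen only to make the difference visible.

```python
def r2_correlation(y, yhat):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    syy = sum((a - my) ** 2 for a in y)
    spp = sum((b - mp) ** 2 for b in yhat)
    return sxy ** 2 / (syy * spp)


def r2_prediction(y, yhat):
    """1 - SSE/SST, taking the external predictions exactly as given."""
    my = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst


# Hypothetical data: y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

# Hypothetical external equation with the right slope but a biased intercept.
yhat2 = [5.0 + 2.0 * xi for xi in x]

print("squared correlation:", round(r2_correlation(y, yhat2), 3))
print("1 - SSE/SST:", round(r2_prediction(y, yhat2), 3))
```

The second measure can be negative, which simply means the external equation predicts worse on this dataset than the sample mean would.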

How to specify the significance of the marginal effects in the binary logistic regression analysis?

6 Recommendations

Mohammad Jaber

asked a question related to Advanced Statistical Analysis

Question

3 answers

Jun 12, 2023

Dear colleagues,

I analyzed my survey data using binary logistic regression, and I am trying to assess the results by looking at the p-value, B, and Exp(B) values. However, the task is also to specify the significance of the marginal effects. How to interpret the results of binary logistic regression considering the significance of the marginal effects?

Best,

Relevant answer

Samira Akter Tumpa

Jun 13, 2023

Answer

To specify the significance of the marginal effects in binary logistic regression analysis, you can interpret the results by examining the p-values, B (coefficient estimates), and Exp(B) (exponentiated coefficient estimates) values. The p-value indicates the statistical significance of each predictor variable's effect on the outcome variable. A low p-value (typically less than 0.05) suggests a significant effect. The B values represent the estimated change in the log-odds of the outcome for a one-unit change in the predictor, with positive values indicating a positive association and negative values indicating a negative association. Exp(B) provides the odds ratio, which quantifies the change in odds for a one-unit increase in the predictor. An Exp(B) greater than 1 indicates an increased odds of the outcome, while a value less than 1 implies a decreased odds. By considering the significance of the marginal effects, you can determine the direction, magnitude, and statistical significance of the predictor variables' impacts on the binary outcome variable in your logistic regression analysis.
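Note that B and Exp(B) are on the log-odds and odds scales, whereas a marginal effect is usually reported on the probability scale. A minimal sketch of the average marginal effect (AME) for continuous predictors follows, assuming the coefficients have already been estimated elsewhere; the coefficient values, the data rows, and the function names are all hypothetical.

```python
from math import exp


def predicted_prob(intercept, coefs, row):
    """Predicted probability from a fitted logistic model."""
    eta = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1 / (1 + exp(-eta))


def average_marginal_effects(intercept, coefs, rows):
    """AME of each continuous predictor: the mean over cases of b_j * p_i * (1 - p_i)."""
    probs = [predicted_prob(intercept, coefs, r) for r in rows]
    mean_weight = sum(p * (1 - p) for p in probs) / len(probs)
    return [b * mean_weight for b in coefs]


# Hypothetical fitted coefficients and observed predictor values.
intercept = -1.0
coefs = [0.8, -0.5]
rows = [(0.2, 1.0), (1.5, 0.3), (2.1, 2.2), (0.7, 0.9), (3.0, 1.4)]

ame = average_marginal_effects(intercept, coefs, rows)
print([round(v, 4) for v in ame])
```

Each value is the average change in the predicted probability per one-unit increase in that predictor. The significance of a marginal effect needs its own standard error, typically obtained with the delta method or a bootstrap, which this sketch does not compute.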

How to compare between conditions in LME in Matlab?

30 Recommendations

Chen Gafni

asked a question related to Advanced Statistical Analysis

Question

6 answers

Jun 17, 2016

I constructed a linear mixed-effects model in Matlab with several categorical fixed factors, each having several levels. Fitlme calculates confidence intervals and p values for n-1 levels of each fixed factor compared to a selected reference. How can I get these values for other combinations of factor levels? (e.g., level 1 vs. level 2, level 1 vs. level 3, level 2 vs. level 3).

Thanks,

Chen

Relevant answer

Vsevolod A Lyakhovetskii

May 26, 2023

Answer

First, to change the reference level, you can specify the order of the categories when creating the categorical array:

categorical(A,[1, 2, 3],{'red', 'green', 'blue'}) or

categorical(A,[3, 2, 1],{'blue', 'green', 'red'})

Second, you can specify the appropriate hypothesis matrix for the coefTest function to compare any pair of categories.
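To illustrate what such a hypothesis matrix encodes, here is a minimal language-neutral sketch written in Python. With reference coding, the coefficients for levels 2 and 3 are each differences from level 1, so the row [0, 1, -1] tests level 2 against level 3. The coefficient vector and covariance matrix below are hypothetical, and the sketch uses a normal approximation, whereas MATLAB's coefTest reports an F-test with denominator degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def wald_contrast(beta, cov, h):
    """Estimate, z statistic and two-sided p-value for the contrast h' * beta."""
    k = len(h)
    estimate = sum(hi * bi for hi, bi in zip(h, beta))
    variance = sum(h[i] * cov[i][j] * h[j] for i in range(k) for j in range(k))
    z = estimate / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, z, p


# Hypothetical fixed effects: intercept (level 1), level 2 vs 1, level 3 vs 1.
beta = [10.0, 2.0, 5.0]
cov = [
    [1.00, 0.00, 0.00],
    [0.00, 0.50, 0.25],
    [0.00, 0.25, 0.50],
]

# Level 2 vs level 3: (b2 - b3), the intercept drops out.
estimate, z, p = wald_contrast(beta, cov, [0, 1, -1])
print(estimate, round(z, 3), round(p, 5))
```

When several pairwise contrasts are tested, the resulting p-values should be adjusted for multiple comparisons.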

Comprehensive Meta Analysis (CMA)

0 Recommendations

Ravisha Jayawickrama

asked a question related to Advanced Statistical Analysis

Question

5 answers

Apr 25, 2023

Has anyone conducted a meta-analysis with Comprehensive Meta-Analysis (CMA) software?

I have selected: comparison of two groups > means > Continuous (means) > unmatched groups (pre-post data) > means, SD pre and post, N in each group, Pre/post corr > finish

However, it is asking for pre/post correlations which none of my studies report. Is there a way to calculate this manually or estimate it somehow?

Thanks!

Relevant answer

Ma'Mon Abu Hammad

Apr 26, 2023

Answer

Yes, it is possible to estimate the pre-post correlation coefficient in a meta-analysis using various methods, such as imputing a value or using a range of plausible values. Here are a few options:

Imputing a value: If none of your studies report the pre-post correlation, you can impute a value based on previous research or assumptions. A commonly used estimate is a correlation coefficient of 0.5, which assumes a moderate positive relationship between the pre and post-measures. However, it is important to note that this value may not be appropriate for all studies or research questions.
Using a range of plausible values: Another option is to use a range of plausible correlation coefficients in the analysis, rather than a single value. This can help to account for the uncertainty and variability in the data. A common range is 0 to 0.8, which covers a wide range of possible correlations.
Contacting study authors: If possible, you can try to contact the authors of the included studies to request the missing information or clarification about the pre-post correlation coefficient. This can help to ensure that the analysis is based on accurate and complete data.

Once you have estimated the pre-post correlation coefficient, you can enter it into the appropriate field in the CMA software and proceed with the analysis. It is important to carefully consider the implications of the chosen correlation coefficient and to conduct sensitivity analyses to test the robustness of the results.
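The sensitivity analysis can be sketched with the standard formula for the standard deviation of a change score, SD_change = sqrt(SD_pre^2 + SD_post^2 - 2 * r * SD_pre * SD_post). The SD values and the grid of correlations below are hypothetical; the point is only to show how strongly the imputed r drives the result.

```python
from math import sqrt


def sd_change(sd_pre, sd_post, r):
    """SD of the pre-post change score for an assumed pre-post correlation r."""
    return sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)


# Hypothetical study values; repeat the meta-analysis for each assumed r.
sd_pre, sd_post = 10.0, 10.0
for r in (0.0, 0.2, 0.5, 0.8):
    print(f"r = {r:.1f}: SD of change = {sd_change(sd_pre, sd_post, r):.2f}")
```

If the pooled effect and its confidence interval lead to the same conclusion across the plausible range of r, the choice of imputed correlation is not driving the findings.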

Intervention (psychological) meta-analysis

4 Recommendations

Ravisha Jayawickrama

asked a question related to Advanced Statistical Analysis

Question

2 answers

Apr 16, 2023

Hello everyone,

I'm going to conduct a meta-analysis of psychological interventions relevant to a topic via Comprehensive Meta-Analysis (CMA) software. I have a few questions/points for clarification:

- From my understanding, I should only meta-analyse interventions that have used a pre-test, post-test (with and/or without follow-up) design, as meta-analysing post-test only designs with the others is not effective. Is my understanding correct?

- Can I combine between-subjects and within-subjects designs together or do I need to meta-analyse them separately?

Thanks in advance!

Relevant answer

David Morse

Apr 17, 2023

Answer

Hello Ravisha,

If cases are randomly assigned to treatment condition, there's no reason that post-only design results should be considered uninformative.

Designs with pre-post measures can offer the added benefits of: (a) allowing for estimation of change (though unless scores are completely reliable, the change scores will be less reliable than either the pre- or post- score by itself); or (b) pre-scores can be used as a covariate, to adjust for randomly occurring differences across groups.

One noted threat to pre-post designs is that if the interval separating them is too short, the post-results, and therefore group comparisons, can be biased, especially with measures of affect.

Ultimately, the answer depends on what your target ES might be: If it is post-treatment differences across groups/conditions, then either design can contribute. You could estimate ES separately by study type to see whether inclusion of pre-test appears to account for differences.

If it is strictly pre-post change, then post-only designs can't contribute (again, though, note the caveats above).

Good luck with your work.

How to analyse and compare a correlation between two variables over time between two groups?

5 Recommendations

Tuur Smolders

asked a question related to Advanced Statistical Analysis

Question

4 answers

Apr 14, 2023

I have ordinal data on happiness of citizens from multiple countries (from the European Value Study) and I have continuous data on the GDP per capita of multiple countries from the World Bank. Both of these variables are measured at multiple time points.

I want to test the hypothesis that countries with a low GDP per capita will see more of an increase in happiness with an increase in GDP per capita than countries that already have a high GDP per capita.

My first thought to approach this is that I need to make two groups; 1) countries with low GDP per capita, 2) countries with high GDP per capita. Then, for both groups I need to calculate the correlation between (change in) happiness and (change in) GDP per capita. Lastly, I need to compare the two correlations to check for a significant difference.

I am stuck, however, on how to approach the correlation analysis. For example, I don't know how to (and if I even have to) include the repeated measures from the different time points at which the data was collected. If I just base my correlations on one time point, I feel like I am not really testing my research question, considering I am talking about an increase in happiness and an increase in GDP, which is a change over time.

If anyone has any suggestions on the right approach, I would be very thankful! Maybe I am overcomplicating it (wouldn't be the first time)!

Relevant answer

Yanping Wang

Apr 17, 2023

Answer

Collect the data for both variables at the same time point and treat each pair as one sample. After collecting N samples over time, perform a regression analysis on them; the correlation coefficient will be obtained from it.

What post-hoc test should I use for nominal variables, after chi square test?

7 Recommendations

Daria Berezovska

asked a question related to Advanced Statistical Analysis

Question

4 answers

Apr 2, 2023

Hello! I would like to address the experts regarding a question about conducting a statistical analysis using only nominal variables. Specifically, I would like to compare the responses of survey participants who answered the question of whether they take certain medications ("Yes" or "No"), and analyze the data by different criteria such as education level, economic status, marital status, etc. I have conducted a Chi-squared test to determine if there is a significant difference between the variables, but now I would like to compare the answers on whether or not this medicine is taken between the groups, for example within the education variable (higher, secondary, vocational and basic education). Is there a statistical test similar to Tukey's test that is suitable for nominal variables? I would also like to know if it is possible to create a column chart with asterisks above the columns indicating the significant differences between them based on this test for nominal variables.

I usually use Statistica (StatSoft) and RStudio, but none of my attempts to run a post-hoc analysis for nominal variables in either of them were successful. In RStudio I tried pairwise.prop.test(cont_table, p.adjust.method = "bonferroni")

But I got an error:

Error in pairwise.prop.test(cont_table, p.adjust.method = "bonferroni") :

'x' must have 2 columns

I assume that this is due to the fact that I have groups in one of the variables and not two.

What should I do?

Thank you in advance for your help!

Relevant answer

Jos Feys

Apr 9, 2023

Answer

In attachment an R script with the BH post-hoc test based on Benjamini & Hochberg (1995). You could replace this with Bonferroni, but in my opinion this last method is too conservative.

chi-square_Esmail_PostHoc.R
1.31 KB
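The attached script is not reproduced in the thread, so the following is only a minimal sketch of the general idea, not the author's code: run a 2x2 chi-square test for every pair of groups, then adjust the p-values with the Benjamini-Hochberg procedure. The counts are hypothetical. On the R error quoted in the question: pairwise.prop.test expects either a vector of success counts together with n, or a matrix with exactly two columns (successes and failures), with one row per group.

```python
from itertools import combinations
from math import erfc, sqrt


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (yes, no) counts per education level.
counts = {
    "higher": (30, 70),
    "secondary": (45, 55),
    "vocational": (50, 50),
    "basic": (60, 40),
}

pairs = list(combinations(counts, 2))
raw = [chi2_2x2_pvalue(*counts[g1], *counts[g2]) for g1, g2 in pairs]
adj = benjamini_hochberg(raw)
for (g1, g2), p_raw, p_adj in zip(pairs, raw, adj):
    print(f"{g1} vs {g2}: p = {p_raw:.4f}, BH-adjusted p = {p_adj:.4f}")
```

The adjusted p-values are what would determine the asterisks on a column chart. With small expected counts, Fisher's exact test would be a safer choice for each pair than the chi-square approximation.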

Can we do a simple linear regression for vegetation index (normal data) and disease severity scoring (non-normal data)?

5 Recommendations

Rahul Raman

asked a question related to Advanced Statistical Analysis

Question

4 answers

Feb 12, 2023

The variables I have- vegetation index and plant disease severity scores, were not normal. So, I did log10(y+2) transformation of vegetation index and sqrt(log10(y+2)) transformation of plant disease severity score. Plant disease severity is on the scale of 0, 10, 20, 30,..., 100 and were scored based on visual observations. Even after combined transformation, disease severity scoring data is non-normal but it improves the CV in simple linear regression.

Can I proceed with the parametric test, a simple linear regression between the log transformed vegetation index (normally distributed) and combined transformed (non-normal) disease severity data?

Relevant answer

Koen Van de Moortel

Mar 25, 2023

Answer

Why would these variables have to be normal? As far as I understand your problem, a logistic model might do well. You can try it with my software "FittingKVdm", but if you can send me some data, I can try it for you.

How to insert categorical predictors in path analysis?

8 Recommendations

Angel Tabullo

asked a question related to Advanced Statistical Analysis

Question

6 answers

Mar 12, 2023

Hi everyone! I need to examine interactions between categorical and continuous predictors in a path analysis model. Which strategy would be more accurate: 1) including the categorical variable, the continuous one and the interaction as separate terms, or 2) running a multigroup analysis?

I have the same problem with several models. For instance, examining potential differences in the effects of executive function (continuous predictor) on reading comprehension (outcome variable) among children from different grades (categorical predictor).

Thank you so much for your help!

Relevant answer

Wadie Abu Dahoud

Mar 19, 2023

Answer

Very helpful paper with references:

https://web.pdx.edu/~newsomj/semclass/ho_moderation.pdf

Best,

Wadie

6 Recommendations

Mellis So

asked a question related to Advanced Statistical Analysis

GEE or mixed models?

Question

3 answers

Mar 17, 2023

I want to study the relationship between parameters for physical activity over a lifespan and the outcome of pain (binary). I have longitudinal data with four measurements, hence repeated measures.

Should I use GEE or a mixed model? And does anyone have guidance on how to rearrange my dataset so it fits these methods? I have tried GEE with both long and wide data, but I keep getting errors.

To clarify, my outcome is binary (at the last measurement) and further my independent variables are measured at four times (with the risk of them being correlated).

Relevant answer

Brian Kelly

Mar 17, 2023

Answer

Yes, that would be correct.

As your outcome/ dependent measure is only at one time point you would not have to consider time in relation to the outcome, so not a longitudinal model (no variation over time to model).

That is not to say that time may or may not be important in your research question. If trends/ differences/ averages in the repeated measures of independent variables are important in relation to your outcome then you can find ways to incorporate these things into your modelling strategy (in the way that you choose to use your repeated independent measures - being guided by research questions).

0 Recommendations

How can I define a graphics space to make plots like the attached figure below using the graphics package in r?

asked a question related to Advanced Statistical Analysis

Question

4 answers

Mar 13, 2023

How can I define a graphics space to make plots like the attached figure below using the graphics package in r?

I need help locating each position (centering) using the "mar" argument.

Reghais.a

Capture.JPG
32.35 KB

Relevant answer

Mar 13, 2023

Answer

You can use layout() to define a matrix of plots with different heights/widths. In your case, this will produce a layout similar to your picture:

m <- rbind(c(0,1,0), c(2:4))

layout(m, widths = c(1,1,1.5), heights = c(1,1))

par(oma = c(3,3,3,3), mar = c(0,0,0,0), las = 1, xaxs = "i", yaxs="i")

plot(NA, xlim = c(-1, 9), ylim = c(-1, 4), xaxt="n", yaxt="n")

axis(2, at = 0:4)

axis(3, at = 0:4 * 2)

plot(NA, xlim = c(0, 4), ylim = c(0, 3.5), xaxt="n", yaxt="n")

axis(1, at = 0:4)

axis(2)

plot(NA, xlim = c(-1, 9), ylim = c(0, 3.5), xaxt="n", yaxt="n")

plot(NA, xlim = c(0, 7), ylim = c(0, 3.5), xaxt="n", yaxt="n")

axis(1, at = 0:7)

axis(4)

Rplot.pdf
4.80 KB

Same code, different responses in R and Python

21 Recommendations

Hasan Misaii

asked a question related to Advanced Statistical Analysis

Question

6 answers

Feb 28, 2023

I am stuck on the problem below, and it would be a pleasure to have your ideas.

I've written a coding program in two languages, Python and R, but each came to a completely different result. Before jumping to a conclusion, I declare that:

- Every word of the code in two languages has multiple checks and is correct and represents the same thing.

- The used packages in two languages are the same version.

So, what do you think?

The code is about applying deep neural networks for time series data.

Relevant answer

Inès François

Mar 1, 2023

Answer

Good morning. Without the code it is difficult to know where the difference lies. I do not use Python, I work in R, but maybe the difference comes from the stage where the dataset is split. Did you try setting the same seed for the random number generator, for example set.seed(1234) in R? (If my memory is good, a similar function is also available in Python.) Were your results and evaluation metrics totally different? In that case, maybe there is a reliability issue in your model. You should check your data preparation and feature selection.
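A minimal sketch of a seeded split in Python follows; the function name and sizes are hypothetical. One caveat: the same seed value generally does not produce the same random stream in R and in Python, so fixing the seed makes each language reproducible on its own but does not make the two splits identical. To compare the two implementations fairly, create the split once, save the row indices to a file, and load that file in both languages.

```python
import random


def shuffled_split(n_rows, train_fraction, seed):
    """Reproducible train/test split of row indices 0 .. n_rows - 1."""
    rng = random.Random(seed)  # local generator, independent of global state
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


# Hypothetical sizes; the same seed gives the same split on every run.
train_idx, test_idx = shuffled_split(n_rows=100, train_fraction=0.8, seed=1234)
print(len(train_idx), len(test_idx))
```

For neural networks the weight initialization, batch shuffling and dropout are further sources of randomness, each controlled by the deep learning library's own seed, so some run-to-run difference is expected even with identical data splits.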

Looking for a way to derive standard deviations from estimated marginal means using mixed linear models with SPSS?

18 Recommendations

Julian Wienert

asked a question related to Advanced Statistical Analysis

Question

3 answers

Aug 1, 2016

Hi, I am looking for a way to derive standard deviations from estimated marginal means using mixed linear models with SPSS. I already figured where SPSS provides the pooled SD to calculate the SMD, however, I still need the SD of the means. Any help is appreciated!

Relevant answer

Nathaniel Allen

Feb 24, 2023

Answer

I was unsure how to pool SD from the SE without knowing N. A method I found used the "baseline SD" for each group.

I have an query in Gradient descent Algorithm which helps to minimize the values?

0 Recommendations

Naresh Bhimchand

asked a question related to Advanced Statistical Analysis

Question

5 answers

Feb 13, 2023

I have data in a 30 x 1 matrix. Is it possible to find the best optimized value using the gradient descent algorithm? If yes, please share the procedure or a link to the detailed background theory behind it. It would help me proceed further with my research.

Relevant answer

Roberto Vega

Feb 21, 2023

Answer

It depends on the cost function and the model that you are using. Gradient descent will converge to the optimal value (or very close to it) of the training loss function, given a properly set learning rate, if the optimization problem is convex with respect to the parameters. That is the case for linear regression using the mean squared error loss, or logistic regression using cross entropy. For the case of neural networks with several layers and non-linearities none of these loss functions make the problem convex, therefore there is no guarantee that you will find the optimal value. The same would happen if you used logistic regression with the mean squared error instead of cross entropy.

An important thing to note is that when I talk about the optimal value, I mean the value that minimizes the loss in your training set. It is always possible to overfit, which means that you find the optimal parameters for your training set, but those parameters make inaccurate predictions on the test set.
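As a concrete instance of the convex case, here is a minimal sketch that fits a single constant to 30 values by gradient descent on the mean squared error. The data, learning rate and step count are hypothetical choices. Because this loss is convex, the iterations converge to its unique minimizer, which is the sample mean.

```python
def gradient_descent_constant(values, learning_rate=0.1, steps=200):
    """Minimise mean((v - c)^2) over a single constant c by gradient descent."""
    c = 0.0
    n = len(values)
    for _ in range(steps):
        # Derivative of the mean squared error with respect to c.
        grad = -2 / n * sum(v - c for v in values)
        c -= learning_rate * grad
    return c


# Hypothetical 30 x 1 data vector.
data = [float(i % 7) + 0.5 * i for i in range(30)]

c_hat = gradient_descent_constant(data)
print(round(c_hat, 6), round(sum(data) / len(data), 6))
```

With a learning rate that is too large (above 1.0 for this particular loss) the iterations diverge, which is the "properly set learning rate" condition mentioned above.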

Plotting bivariate distribution from the few known parameters?

0 Recommendations

Rudolf Gaško

asked a question related to Advanced Statistical Analysis

Question

4 answers

Feb 15, 2023

I want to display the bivariate distribution of two (laboratory) parameters in sets of patients. I have available the data of N, mean +- SD of the first and second parameters. I am looking for software that could draw a bivariate distribution = ellipse from the given parameters. Can someone help me? Thank you.

Relevant answer

Yasser Al Zaim

Feb 20, 2023

Answer

Dear Dr. Gaško,

I'm glad to hear that. You are very welcome.

Best wishes.

Does "correlation coefficient" mean the authors use pearson correlation?

0 Recommendations

Bar Friedman

asked a question related to Advanced Statistical Analysis

Question

10 answers

Feb 7, 2023

Hi,

There is an article that I want to know which statistical method has been used, regression or Pearson correlation.

However, they don't say which one. They show the correlation coefficient and standard error.

Based on these two parameters, can I know if they use regression or Pearson correlation?

Relevant answer

Daniel Wright

Feb 7, 2023

Answer

Not sure I understand your question. If there is a single predictor and by regression you mean linear OLS regression, then the r is the same. Can you provide more details?
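To illustrate the point that r is the same with a single predictor, here is a small Python sketch showing that the Pearson correlation equals the OLS slope rescaled by the two standard deviations. The data are made up for the example.

```python
import math


def _centered(values):
    """Return the values with their mean subtracted."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    dx, dy = _centered(xs), _centered(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ols_slope(xs, ys):
    """Slope of the simple linear regression of y on x."""
    dx, dy = _centered(xs), _centered(ys)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    d = _centered(values)
    return math.sqrt(sum(a * a for a in d) / (len(values) - 1))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]
r = pearson_r(x, y)
standardized_slope = ols_slope(x, y) * sample_sd(x) / sample_sd(y)
print(r, standardized_slope)
```

So a reported "correlation coefficient" alone does not distinguish the two analyses when there is only one predictor.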

21 Recommendations

Bootstrap method to estimate the error rate in linear discriminant analysis ?

asked a question related to Advanced Statistical Analysis

Question

8 answers

Jan 24, 2023

How can I run the bootstrap method to estimate the error rate in linear discriminant analysis using R code?

Best

reghais.A

Relevant answer

Mayur Wanjari

Jan 25, 2023

Answer

Using R code, the bootstrap method can estimate the error rate in linear discriminant analysis. First, the data must be split into a training set and a test set and then normalized. The lda() function can then be used to run the calculations twice, with CV=TRUE for the first run to get predictions of class membership derived from leave-one-out cross-validation. The second run should use CV=FALSE to get predictions of class membership based on the entire training set. The true error rate estimator BT2 of the restricted linear or quadratic discriminant analysis can be calculated using the dawai package in R. Finally, resampling methods such as bootstrapping can be used to estimate the test error rate.
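The resampling idea can be sketched independently of any R package. The Python snippet below is a minimal illustration under stated assumptions: a single feature, LDA with a pooled variance, and the plain out-of-bag bootstrap error (fit on each resample, score on the cases left out). It is not the BT2 estimator from the dawai package, and the function names and data are hypothetical.

```python
import math
import random
from statistics import mean


def fit_lda_1d(xs, ys):
    """Fit LDA on one feature: per-class mean and prior, plus pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    params, pooled_ss = {}, 0.0
    for c in classes:
        xc = [x for x, y in zip(xs, ys) if y == c]
        mu = mean(xc)
        pooled_ss += sum((x - mu) ** 2 for x in xc)
        params[c] = (mu, len(xc) / n)
    return params, pooled_ss / (n - len(classes))


def predict_lda_1d(model, x):
    """Return the class with the largest linear discriminant score."""
    params, var = model

    def score(c):
        mu, prior = params[c]
        return x * mu / var - mu * mu / (2.0 * var) + math.log(prior)

    return max(params, key=score)


def bootstrap_oob_error(xs, ys, n_boot=200, seed=1):
    """Average misclassification rate on out-of-bag cases over resamples."""
    rng = random.Random(seed)
    n, n_classes = len(xs), len(set(ys))
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        # Skip resamples that cannot be fitted or evaluated.
        if not oob or len(set(by)) < n_classes:
            continue
        model = fit_lda_1d(bx, by)
        if model[1] <= 0.0:
            continue
        wrong = sum(predict_lda_1d(model, xs[i]) != ys[i] for i in oob)
        rates.append(wrong / len(oob))
    return mean(rates)


# Hypothetical, well-separated toy data.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
error = bootstrap_oob_error(xs, ys)
print(error)
```

The out-of-bag estimate tends to be pessimistic because each model sees fewer distinct cases than the full sample; corrections such as the .632 estimator exist but are not implemented here.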

6 Recommendations

Is it common to use continuity correction toward our hypothesis instead of toward punishment?

asked a question related to Advanced Statistical Analysis

Question

5 answers

Jan 21, 2023

Dear all,

I want to know your opinions

Also, there is good paper here

Article Critical review and comparison of continuity correction meth...

Also,

https://www.ncbi.nlm.nih.gov/books/NBK115736/pdf/Bookshelf_NBK115736.pdf

Relevant answer

https://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity

Jan 22, 2023

Answer

I've only glanced quickly at those two resources, but are you sure they are addressing the same thing? Yates' (continuity) correction as typically described entails subtracting 0.5 from |O-E| before squaring in the usual equation for Pearson's Chi2, i.e., Chi2 = sum over cells of (|O-E| - 0.5)^2 / E.

But adding 0.5 to each cell in a 2x2 table is generally done to avoid division by 0 (e.g., when computing an odds ratio), not to correct for continuity (AFAIK). This is what makes me wonder if your two resources are really addressing the same issues. But as I said, I only had time for a very quick glance at each. HTH.
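The two adjustments discussed above can be put side by side in a short Python sketch. The table counts are hypothetical, and the guard that stops the corrected difference from going below zero is an added assumption rather than part of the textbook formula.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink |O - E| by 0.5 before squaring.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def odds_ratio(a, b, c, d, add_half=False):
    """Odds ratio (a*d)/(b*c); optionally add 0.5 to every cell first."""
    if add_half:
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
plain = chi2_2x2(12, 5, 3, 10)
corrected = chi2_2x2(12, 5, 3, 10, yates=True)
# A table with a zero cell: the raw odds ratio is undefined,
# while the 0.5-adjusted version can be computed.
adjusted_or = odds_ratio(4, 0, 2, 6, add_half=True)
print(plain, corrected, adjusted_or)
```

The first function changes the test statistic; the second only makes a ratio computable, which is why the two should not be treated as the same correction.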

12 Recommendations

The robust confidence ellipses of 97.5% in robCompositions or compositions packages?

asked a question related to Advanced Statistical Analysis

Question

1 answer

Jan 8, 2023

How can I add the robust 97.5% confidence ellipses to the variation diagrams (XY, ilr-transformed) in the robCompositions or compositions packages?

Best

Azzeddine

Relevant answer

Jan 13, 2023

Answer

For everyone's benefit, I have checked a group of packages that can add the robust 97.5% confidence ellipses.

Here they are, by package and function:

1- ellipses () using the package 'ellipse'

2- ellipses () using the package 'rrcov'

3- ellipses () using the package 'cluster'

Looking for academic research collaboration and collaborative research article writing

0 Recommendations

Vikas Ramteke

asked a question related to Advanced Statistical Analysis

Question

8 answers

Jan 9, 2023

Res. Sir/ Madam,

I am working as Scientist (Horticulture) and my research focus is improvement of tropical and semi arid fruits. I am also interested in working out role of nutrients in fruit based cropping systems.

Looking for collaborators from the field of Genetics and Plant Breeding, Horticulture, Agricultural Statistics, Soil Science and Agronomy.

Currently working on Genetic analysis for fruit traits in Jamun (Indian Blackberry).

Relevant answer

Andrew Paul McKenzie Pegman

Jan 11, 2023

Answer

Try to publish on your own then you have complete control. Collaborators will steal your data and treat you badly :)

Testing relationship between variables : Single or Multiple Regression?

27 Recommendations

Tebogo Joel Ramantswana

asked a question related to Advanced Statistical Analysis

Question

10 answers

Jan 26, 2017

I am testing hypotheses about the relationships between CEA and Innovation Performance (IP). If I am testing the relationship of one construct, say Management support, to IP, is it OK to use simple linear regression? Or should I be testing it in a multiple regression with all the constructs?

Relevant answer

https://statisticasoftware.wordpress.com/2012/07/23/how-to-find-relationship-between-variables-multiple-regression/

Jan 10, 2023

Answer

Which effect size measure should one report after performing repeated measures conditional random intercept model analysis?

4 Recommendations

Lukasz Derdowski

asked a question related to Advanced Statistical Analysis

Question

2 answers

Jan 6, 2023

What are current recommendations for reporting effect size measures from repeated measures multilevel model?

Concerning analytical approach, I have followed procedure by Garson (2020) with matrix for repeated measures: diagonal, and matrix for random effects: variance components.

In advance, thank you for your contributions.

Relevant answer

Kelvyn Jones

Jan 8, 2023

Answer

You can use standard procedures for the fixed effects estimates, as they are akin to regression model estimates if the response is continuous. Things are more complicated if the response is categorical.

Low Cronbach’s Alpha Value

0 Recommendations

Ravisha Jayawickrama

asked a question related to Advanced Statistical Analysis

Question

4 answers

Dec 25, 2022

Merry Christmas everyone!

I used the Interpersonal Reactivity Index (IRI) subscales Empathic Concern (EC), Perspective Taking (PT) and Personal Distress (PD) in my study (N = 900). When I calculated Cronbach's alpha for each subscale, I got .71 for EC, .69 for PT and .39 for PD. The value for PD is very low. The analysis indicated that if I deleted one item, the alpha would increase to .53, which is still low but better than .39. However, as my study does not focus mainly on the psychometric properties of the IRI, what kind of arguments can I make to say the results are still valid? I did say findings (for the PD) should be taken with caution, but what else can I say?

Relevant answer

Dec 25, 2022

Answer

A scale reliability of .39 (and even .53!) is very low. Even if your main focus is not on the psychometric properties of your measures, you should still care about those properties. Inadequate reliability and validity can jeopardize your substantive results.

My recommendation would be to examine why you get such a low alpha value. Most importantly, you should first check whether each scale (item set) can be seen as unidimensional (measuring a single factor). This is usually done by running a confirmatory factor analysis (CFA) or item response theory analysis. Unidimensionality is a prerequisite for a meaningful interpretation of Cronbach's alpha (alpha is a composite reliability index for essentially tau-equivalent measures). CFA allows you to test the assumption of unidimensionality/essential tau equivalence and to examine the item loadings.

Also, you can take a look at the item intercorrelations. If some items have low correlations with others, this may indicate that they do not measure the same factor (and/or that they contain a lot of measurement error). Another reason for a low alpha value can be an insufficient number of items.
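For reference, Cronbach's alpha can be computed directly from item and total-score variances, which makes it easier to see how low inter-item correlations pull the value down. The sketch below is a minimal Python illustration with made-up scores; it does not test unidimensionality, which still requires something like the CFA mentioned above.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, one per item, each holding one score
    per respondent (same respondent order in every list).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / variance(totals))


# Hypothetical scores: three items that move together ...
consistent = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# ... and three items that barely agree with each other.
inconsistent = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 1, 3]]
alpha_high = cronbach_alpha(consistent)
alpha_low = cronbach_alpha(inconsistent)
print(alpha_high, alpha_low)
```

Alpha can even become negative when items correlate negatively on average, which is one reason to inspect the item intercorrelations first.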

Statistical Analysis of RT-qPCR data, what test do I select?

63 Recommendations

Panagiotis N. Stoikos

asked a question related to Advanced Statistical Analysis

Question

4 answers

Dec 22, 2022

I have done my qPCR experiments and obtained some results. I used the DDCt method, calculated 2^(-DDCt), log10-transformed the data, and separated my samples into controls and patients. If I see, for example, a fold change 4 times higher in patients for my gene of interest, should I use a one-tailed or a two-tailed t-test? And if the distribution is not normal, should I use a non-parametric test, or can I drop the outliers and do the t-test? I am very confused by this statistical conundrum.

Relevant answer

Mohamed Khashan

Dec 24, 2022

Answer

Dear Panagiotis N. Stoikos

If your data are not normally distributed, you can use a non-parametric test such as the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) to compare the expression levels between the two groups.

Regarding one-tailed versus two-tailed: a one-tailed test specifies the direction of the effect (positive or negative) in advance, whereas a two-tailed test detects a difference in either direction.
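As a rough illustration of what the rank-based test measures, the U statistic can be computed by counting pairs. This Python sketch computes only the statistic, not a p-value, and the expression values are invented for the example.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y, ties count 1/2."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical log10 fold-change values, for illustration only.
controls = [-0.10, 0.05, 0.00, 0.12, -0.03]
patients = [0.55, 0.61, 0.40, 0.72, 0.48]
u_patients = mann_whitney_u(patients, controls)
u_controls = mann_whitney_u(controls, patients)
print(u_patients, u_controls)
```

The two U values always add up to the number of pairs, so a value near that maximum means almost every patient value exceeds almost every control value.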

Best...

Generalized linear mixed models or PERMANOVA?

0 Recommendations

Kostadin Andonov

asked a question related to Advanced Statistical Analysis

Question

13 answers

Oct 2, 2022

Dear all,

I have conducted a research about snake chemical communication where I test the reaction of a few adult snake individuals (both males and females) to different chemical compounds. Every individual is tested 3 times with each of the compounds. Basically, I put a soaked paper towel in each of the individual terrariums and record the behavior for 10 minutes with a camera. The compounds are presented to the individuals in random order.

My grouping variable represents the reactions to each of the compounds for each of the sexes. For example, in the grouping variable I have categories titled “male reactions to compound X”, “male reactions to compound Y” etc. I have three dependent variables as follows: 1) whether there is an interest towards the compound presented or not (binary), 2) chin rubbing behavior recorded (I record how many times this behavior is exhibited) and 3) tongue-flick rate (average tongue-flicks per minute). The distribution is not normal.

What I would like to test is 1) whether there is a difference in the behavior between males and females, 2) whether there is a difference between the behavior of males snakes to the different compounds (basically if males react more to compound X, rather than to compound Y) and the same goes for females, and finally 3) whether males exhibit different behavior to different types of compounds (I want to combine for example compounds X, Y and Z, because they are lipids and A, B and C, because they are alkanes and check difference in male responses).

I thought that PERMANOVA will be enough, since it is a multivariate non-parametric test, but two reviewers wrote that I have to use Generalized linear mixed models, because of the repeated measures (as mentioned, I test each individual with each of the compounds 3 times). They think there might be some individual differences that could affect the results if not taken into consideration.

Unfortunately, I am a newbie in GLMM, and I do not really see how such model can help me answer my questions and test the respective hypotheses. Could you, please, advise me on that? And how should I build the data matrix in order to test for such differences?

Isn’t it also possible to check for differences between individuals with the Friedman test and then use PERMANOVA?

Thank you very much in advance!

Relevant answer

Congcong Wang

Dec 22, 2022

Answer

In general, PERMANOVA is a test of the effect of two parallel variables on the organism. It is equivalent to two one-way ANOVAs. A GLMM, by contrast, is equivalent to the combined effect of all factors; in a GLMM you can derive the contribution of each variable to determine the magnitude of the contribution of each environmental factor. You can think of PERMANOVA as the parallel effect of several factors, while a GLMM is the combined effect. A GLMM is simple and easy to run in R.

Who is familiar with the oblique multiple group method of factor analysis?

9 Recommendations

Peter Prudon

asked a question related to Advanced Statistical Analysis

Question

7 answers

Oct 8, 2013

Holzinger (Psychometrika, vol. 9, no. 4, Dec. 1944) and Thurstone (Psychometrika, vol. 10, no. 2, June 1945; vol. 14, no. 1, March, 1949) discussed an alternative method for factoring a correlation matrix. The idea was to enter several clusters of items (tests) in the computer program beforehand, and then test them, optimize them and produce the residual matrix (which may show the necessity of further factoring). These clusters could stem from theoretical and substantive considerations, or from an inspection of the correlation matrix. It was an alternative to producing one factor at a time until the residual matrix becomes negligible, and was attractive because it spared much calculation time for the computers in that era. That reason soon lapsed but the method is still interesting as an alternative kind of confirmatory factor analysis.

My problem is: I would like to know the exact procedure (especially the one by Holzinger) but I cannot get hold of these three original publications (except the first two pages), unless against big expenses, nor can I find a thorough discussion of it in another publication, except perhaps in H.H. Harman (1976): Modern factor analysis, Section 11.5, but that book has disappeared from the university library, while on Google-books it is incomplete. Has anyone a copy of these publications, or is he/she familiar with this type of factor analysis?

Relevant answer

Peter Prudon

Dec 16, 2022

Answer

In the last few months, a colleague of mine has written a version of the PCO-program in R. The first impressions are good, but we need a few more months to test it and prepare a publication about it.

What statistical tests are used for case control study with sample size of 5 patients?

0 Recommendations

Shahnawaz Ahmad Wani

asked a question related to Advanced Statistical Analysis

Question

14 answers

Dec 10, 2022

Please share this question with an expert in statistics if you don't know the answer.

I am stuck here. I am working on a therapy and trying to evaluate the changes in biomarker levels. I selected 5 patients and analysed their biomarker levels prior to therapy, after the first therapy, and after the second therapy. When I apply ANOVA, the mean values differ noticeably, but because of the large standard deviations I am getting non-significant results,

as in the table below.

Group     Sample Size   Mean     Standard Deviation   SE of Mean
vb bio    5             314.24   223.53627            99.96846
cb1 bio   5             329.7    215.54712            96.3956
CB II     5             371.6    280.77869            125.56805

So I would like to hear from statisticians who are familiar with clinical trial studies.

Please advise:

Am I performing the statistics correctly?

Should I worry about the non-significant results?

What are the statistical tests I should use?

How will I represent my data for publication purposes?

Please be elaborate in your answers.

Try to explain as if you were teaching a newcomer to this field.

Relevant answer

Jan Homolak

Dec 10, 2022

Answer

Massimo Sivo, it is very nice of you to want to help your colleague. However, as was mentioned earlier, you need to understand the experimental design very well before you have sufficient information to construct an appropriate statistical model (that can then provide meaningful insights). To understand the experimental design of a clinical trial, it is not enough to understand some statistics; you should also understand some medicine. For example, from what Shahnawaz Ahmad Wani described, one can by no means understand what was actually measured, how, and why. It is super challenging to sample 5 patients with appropriate controls and make any kind of biologically meaningful inference about the population of patients: tons of confounders and no power to take them into account. From how the problem was presented, I can imagine such (confounder) data were not collected. Shoving such data into, e.g., a mixed model to account for repeated measures (and dependence of some parameters) will most likely give meaningless results. Even if everything was done right but you did not account for a biologically meaningful confounder, it would still be meaningless. For example, Shahnawaz Ahmad Wani told us he is measuring some kind of biochemical parameter in the blood of female patients. Many blood parameters change with the phase of the menstrual cycle, and it is not clear whether the authors analysed this. Without accounting for such a serious confounder the study makes no sense. Furthermore, how would you construct a model around the dependence of venous blood and "tissue blood"? The variables are most definitely related, but we cannot imagine the nature of their dependence without having much more information.
So, before asking for data, you should really ask for an explanation of the design to make sure all the confounders were ruled out at the level of design, as this is the only scenario in which several lines of code or several clicks in some statistical software will help you in any way to make some kind of inference.

Please be mindful about this kind of stuff, uncertain and misleading conclusions can be very dangerous in medicine and it is healthier for the community to talk about them than to just generate some numbers from the data.

Best,

jan

How can I examine the relationship between a data with only 1 values and the predictive values?

44 Recommendations

Serkan Özdemir

asked a question related to Advanced Statistical Analysis

Question

5 answers

Dec 5, 2022

I have a distribution map produced with only presence data. And there is a certain number of presence data that is in no way included in the model. How can I evaluate the compatibility of the presence data not included in the model I have with the predictive values corresponding to these points in the potential distribution map? So we can also think like this: I have two columns. The first column has only 1 values, the second column has the predictive values. Which method would be the best approach to examine the relationship between these two columns?

Relevant answer

Dec 5, 2022

Answer

I'm not sure I fully understand your question, but when you have a column with a constant value (e.g., 1), this constant by definition cannot covary with another column/variable. A constant does not have any variance; therefore, its covariance with another variable is zero by definition, and the correlation is undefined (it would require dividing by a zero standard deviation).
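The point about a constant column can be checked numerically. In this Python sketch the presence column and predicted values are hypothetical; the helper returns None rather than a number when a correlation cannot be computed.

```python
def sample_covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson_or_none(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    vx = sample_covariance(xs, xs)
    vy = sample_covariance(ys, ys)
    if vx == 0.0 or vy == 0.0:
        return None
    return sample_covariance(xs, ys) / (vx * vy) ** 0.5


# Hypothetical data: presence-only column and model predictions.
presence = [1, 1, 1, 1, 1]
predicted = [0.91, 0.45, 0.78, 0.66, 0.83]
cov = sample_covariance(presence, predicted)
corr = pearson_or_none(presence, predicted)
print(cov, corr)
```

With presence-only points, summarising the predicted values themselves (for example their distribution) is therefore more informative than any correlation with the constant column.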

How to recode my variables using SPSS syntax?

30 Recommendations

Ange H.

asked a question related to Advanced Statistical Analysis

Question

3 answers

Nov 25, 2022

Hello, I currently have a set of categorical variables, coded as Variable A,B,C,etc... (Yes = 1, No = 0). I would like to create a new variable called severity. To create severity, I know I'll need to create a coding scheme like so:

if Variable A = 1 and all other variables = 0, then severity = 1.

if Variable B = 1 and all other variables = 0, then severity = 2.

So on, and so forth, until I have five categories for severity.

How would you suggest I write a syntax in SPSS for something like this?

Relevant answer

Nov 27, 2022

Answer

* Create a toy dataset to illustrate.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / A B C D E (5F1).
BEGIN DATA
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 1 0 0 0
0 1 1 0 0
0 0 1 1 0
0 0 0 1 1
1 0 2 0 0
END DATA.

* Assign severity only when exactly one variable equals 1 and all the others equal 0.
* MIN and MAX both equal to 0 means every other variable is 0.
* Rows that match no rule (two variables set, or an invalid code such as 2) stay system-missing.
IF A EQ 1 AND MIN(B,C,D,E) EQ 0 AND MAX(B,C,D,E) EQ 0 severity = 1.
IF B EQ 1 AND MIN(A,C,D,E) EQ 0 AND MAX(A,C,D,E) EQ 0 severity = 2.
IF C EQ 1 AND MIN(B,A,D,E) EQ 0 AND MAX(B,A,D,E) EQ 0 severity = 3.
IF D EQ 1 AND MIN(B,C,A,E) EQ 0 AND MAX(B,C,A,E) EQ 0 severity = 4.
IF E EQ 1 AND MIN(B,C,D,A) EQ 0 AND MAX(B,C,D,A) EQ 0 severity = 5.
FORMATS severity (F1).
LIST.
* End of code.

Q. Is it possible for any of the variables A to E to be missing? If so, what do you want to do in that case?
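For readers who want to check the rule outside SPSS, the same logic can be sketched in Python. The function name is hypothetical, and returning None for rows with missing or invalid codes is an assumption that mirrors the system-missing result in the syntax above.

```python
def severity(flags):
    """Severity 1..5 when exactly one of A..E is 1 and the rest are 0.

    flags: sequence of five indicators in the order A, B, C, D, E.
    Returns None when no rule applies (several flags set, none set,
    a missing value, or a code other than 0/1).
    """
    if any(f not in (0, 1) for f in flags):
        return None
    if sum(flags) != 1:
        return None
    return list(flags).index(1) + 1


# The same toy rows as in the SPSS example.
rows = [
    (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0),
    (0, 0, 0, 1, 0), (0, 0, 0, 0, 1), (1, 1, 0, 0, 0),
    (0, 1, 1, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 1, 1),
    (1, 0, 2, 0, 0),
]
results = [severity(row) for row in rows]
print(results)
```

How rows with a missing indicator should be coded is exactly the open question raised above, so the None branch is a placeholder for that decision.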

14 Recommendations

How to convert simulated normal data to bernoulli distributed data in Stata?

asked a question related to Advanced Statistical Analysis

Question

9 answers

Nov 25, 2022

I am using -corr2data- to simulate raw data from a correlation matrix. However, some of the variables that I need should be binary. How can I convert them?

Is it possible to convert the higher values to 1 (and the others to 0) in a way that preserves the mean? How should I do it?

Is there a way in R?

(I want to perform a GSEM on a correlation matrix)

(I know -faux- package in R. But my problem is that just some of [not all of] my variables are binary.)
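One common way to do what the question describes is to cut each simulated normal variable at the quantile that leaves the desired proportion above it. The Python sketch below illustrates that thresholding idea under the assumption that the simulated variable is normal with known mean and SD; the function name and target proportion are hypothetical, and the same idea carries over to Stata or R.

```python
import random
from statistics import NormalDist


def dichotomize(values, p, mu=0.0, sigma=1.0):
    """Map normal draws to 0/1 so that the expected share of 1s is p.

    Values above the (1 - p) quantile of N(mu, sigma) become 1.
    """
    cut = NormalDist(mu, sigma).inv_cdf(1.0 - p)
    return [1 if v > cut else 0 for v in values]


# Simulated standard-normal variable (seeded for reproducibility).
rng = random.Random(42)
latent = [rng.gauss(0.0, 1.0) for _ in range(20000)]
binary = dichotomize(latent, p=0.3)
share = sum(binary) / len(binary)
print(share)
```

Note that dichotomizing attenuates correlations: the correlations among the resulting binary variables will generally be smaller than those specified for the underlying normal variables.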

Relevant answer

David Eugene Booth

Nov 27, 2022

Answer

Maybe the attached is what you mean. David Booth

Attachment: Screenshot_20221127-005032.png (330.38 KB)

Data cleaning: How to recode my variable in SPSS by writing syntax?

8 Recommendations

Ange H.

asked a question related to Advanced Statistical Analysis

Question

3 answers

Nov 26, 2022

if Variable A = 1 and all other variables = 0, then severity = 1.

if Variable B = 1 and all other variables = 0, then severity = 2.

So on, and so forth, until I have five categories for severity.

How would you suggest I write a syntax in SPSS for something like this? Thank you in advance!

Relevant answer

Robert Trevethan

Nov 26, 2022

Answer

Ange, I think the easiest way for you to find an answer to your question would be to google something such as "SPSS recode variables YouTube". You'll probably find several sites that demonstrate what you want to do.

All the best with your research.

Which statistical test is appropriate for my data?

14 Recommendations

Daisy Westion

asked a question related to Advanced Statistical Analysis

Question

5 answers

Nov 25, 2022

I am creating a hypothetical study in which two drugs are being tested. I have taken 60 participants and randomly split them into three groups: drug A, drug B and a control group. A YBOCS score will be taken before the trial, after the trial has ended, and then again at a 3-month follow-up. Which statistical test should I use to compare the three groups and to find out which was most effective?

Relevant answer

Daniel Wright

Nov 25, 2022

Answer

What do you mean "hypothetical study?" Is this a homework question?

How to compare a presence-absence dataset of species over time?

28 Recommendations

Marie Dankworth

asked a question related to Advanced Statistical Analysis

Question

4 answers

Oct 24, 2022

For example:

If there are 40 species identical between two sites, they are the same. However, two sites can each have 40 species each, but none in common. So by species number they are identical but by species composition they are 0% alike.

How can I calculate or show the species composition of the two sites over time?

Relevant answer

Andrew Paul McKenzie Pegman

Oct 26, 2022

Answer

You use beta diversity (β), which is a measure of the difference in composition of species between locations :)
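Two widely used pairwise measures of this kind for presence-absence data are the Jaccard and Sørensen similarity indices. The sketch below computes both per survey year in Python; the species codes and years are invented for the example.

```python
def jaccard_similarity(a, b):
    """Shared species divided by all species found at either site."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sorensen_similarity(a, b):
    """Twice the shared species divided by the sum of both richness values."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical species lists for two sites in three survey years.
surveys = {
    2019: ({"a", "b", "c", "d"}, {"a", "b", "c", "d"}),
    2020: ({"a", "b", "c", "d"}, {"a", "b", "e", "f"}),
    2021: ({"a", "b", "c", "d"}, {"e", "f", "g", "h"}),
}
trend = {
    year: (jaccard_similarity(s1, s2), sorensen_similarity(s1, s2))
    for year, (s1, s2) in sorted(surveys.items())
}
print(trend)
```

Both sites have four species in every year, yet the similarity falls from complete overlap to none, which is the distinction between richness and composition raised in the question.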

Can the properties of frequentist estimators be proved mathematically?

8 Recommendations

Jianhong Xu

asked a question related to Advanced Statistical Analysis

Question

4 answers

Oct 19, 2022

During the lecture, the lecturer mentioned the properties of frequentist statistics, as follows:

Unbiasedness is only one of the frequentist properties — arguably, the most compelling from a frequentist perspective and possibly one of the easiest to verify empirically (and, often, analytically).

There are however many others, including:

1. Bias-variance trade-off: we would consider as optimal an estimator with little (or no) bias; but we would also value ones with small variance (i.e. more precision in the estimate). So when choosing between two estimators, we may prefer one with very little bias and small variance to one that is unbiased but with large variance;

2. Consistency: we would like an estimator to become more and more precise and less and less biased as we collect more data (technically, when n → ∞).

3. Efficiency: as the sample size increases indefinitely (n → ∞), we expect an estimator to become increasingly precise (i.e. its variance to reduce to 0, in the limit).

Why do frequentist methods have these kinds of properties, and can we prove it? I think these properties can be applied to many other statistical approaches.

Relevant answer

Maik Kschischo

Oct 19, 2022

Answer

Sorry, Jianhong, but I think you have misunderstood something in the lecture. Frequentist statistics rests on an interpretation of probability as assigned on the basis of many repeated random experiments.

In this setting, one designs functions of the data (also called statistics) which estimate certain quantities from the data. For example, the probability p of a coin landing heads is estimated from n independent trials with the same coin by just counting the fraction of heads. This fraction is then an estimator for the parameter p.

Each estimator should have desirable properties, such as unbiasedness, consistency, efficiency, low variance, and so on. Not every estimator has these properties. But, in principle, one can prove whether a given estimator has these properties.

So, it is not a characteristic of frequentist statistics, but a property of an individual estimator based on frequentist statistics.
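The coin example can also be explored by simulation, which complements (but does not replace) a mathematical proof. This Python sketch repeats the experiment many times for a small and a large n and looks at the average and spread of the estimates; the true p, sample sizes and seed are arbitrary choices for illustration.

```python
import random


def estimate_p(flips):
    """Estimator of p: the fraction of heads (1s) among the flips."""
    return sum(flips) / len(flips)


def simulate(p_true, n, reps, seed=0):
    """Mean and variance of the estimator over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        flips = [1 if rng.random() < p_true else 0 for _ in range(n)]
        estimates.append(estimate_p(flips))
    avg = sum(estimates) / reps
    var = sum((e - avg) ** 2 for e in estimates) / (reps - 1)
    return avg, var


avg_small, var_small = simulate(p_true=0.3, n=10, reps=1000)
avg_large, var_large = simulate(p_true=0.3, n=1000, reps=1000)
print(avg_small, var_small)
print(avg_large, var_large)
```

The averages stay close to the true p in both settings (consistent with unbiasedness), while the variance shrinks as n grows (consistent with the analytical variance p(1 - p)/n).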

Is it possible to understand population distribution from sampling distribution?

30 Recommendations

Brajaballav Kar

asked a question related to Advanced Statistical Analysis

Question

4 answers

Nov 27, 2019

Assuming that a researcher does not know the nature of the population distribution (the parameters or the type, e.g. normal, exponential, etc.), is it possible that the sampling distribution can indicate the nature of the population distribution?

According to the central limit theorem, the sampling distribution is likely to be normal. So, the exact population distribution cannot be known. Is the shape of the distribution for a large sample size enough, or does it have to be inferred logically based on different factors?

Am I missing some points? Any lead or literature will help.

Thank you

Relevant answer

https://bookdown.org/cquirk/LetsExploreStatistics/lets-explore-sampling-distributions.html

Oct 9, 2022

Answer

Is there a function in R to count the number of words each participant used in a variable?

14 Recommendations

Serra Baskurt

asked a question related to Advanced Statistical Analysis

Question

4 answers

Sep 23, 2022

Hello,

I have a variable in a dataset with ~500 answers; it essentially represents participants' answers to an essay question. I am interested in how many words each individual has used in the task and I cannot seem to find a function in R to calculate/count each participant's words in that particular variable.

Is there a way to do this in R? Any packages you think could help me do this? Thank you so much in advance!

Relevant answer

Serra Baskurt

Sep 24, 2022

Answer

Thank you so much, Daniel Wright , Jochen Wilhelm , Richard Johansen ! Your answers were very helpful. I was able to do it through the string package.
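The underlying idea is language-neutral: split each response on whitespace and count the pieces. A minimal Python sketch follows, with made-up responses; treating a missing response as zero words is an assumption.

```python
def word_count(text):
    """Number of whitespace-separated words; missing responses count as 0."""
    if text is None:
        return 0
    return len(text.split())


# Hypothetical essay answers, one per participant.
answers = [
    "I think  this is fine.",
    "",
    None,
    "Short answer",
]
counts = [word_count(a) for a in answers]
print(counts)
```

Splitting on whitespace treats hyphenated terms and numbers as single words, so the counts may differ slightly from those of a word processor.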

Hosmer-Lemeshow goodness of fit test

3 Recommendations

Rolando Garcia

asked a question related to Advanced Statistical Analysis

Question

3 answers

Mar 31, 2014

Can you still have a good model despite a p-value < .05 for the H-L goodness of fit test? Any alternative testing in SAS or R?

Relevant answer

Siti Khuzaiyah

Sep 16, 2022

Answer

What if the p-value of the HL test doesn't appear? It just appeared as the code ".". What does that mean? Thank you.

If one variable is not normally distributed but the other is, what test should be used to compare them?

0 Recommendations

Mirna K. Faiq

asked a question related to Advanced Statistical Analysis

Question

7 answers

Sep 3, 2022

For example, suppose I want to compare BMI between two groups.

I used the Shapiro-Wilk test to check the normality of BMI in each group:

one group is normally distributed

and the other is not.

Which test should I use, the t-test or the Mann-Whitney test?

Relevant answer

Sal Mangiafico

Sep 4, 2022

Answer

Javier Ernesto Vilaso Cadre

12 Recommendations

asked a question related to Advanced Statistical Analysis

Percentage variables in ANOVA

Question

8 answers

Sep 3, 2022

A few years ago, in a conversation that remained unfinished, a statistics expert told me that percentage variables should not be taken as a response variable in an ANOVA. Does anyone know if this is true, and why?

Relevant answer

Sep 3, 2022

Answer

Javier Ernesto Vilaso Cadre, tests do not corroborate that a distribution is normal. They may only fail to corroborate that a distribution is not normal, and that may simply be due to the sample size. Actually, such tests only tell you if your sample size is already large enough to see that the normal distribution model (an idealized model!) does not account for all features of a real distribution. So they don't actually give you any useful information (you may fail to see relevant discrepancies because the sample size is too small, or you may get blinded by "statistically significant" discrepancies that are irrelevant for your problem). The only sensible way is to understand the variable and have some theoretical justification of its distribution, and then to judge whether the presumed discrepancies are relevant for your problem. One may then certainly have a look at the empirical distribution of the observed data: if it screams at you that your thoughts and arguments are very likely very wrong, you may go back and refine or deepen your understanding of the data-generating process you would like to study.

Statistical approaches for curve shape as a function of time

12 Recommendations

Matthew Borrelli

asked a question related to Advanced Statistical Analysis

Question

2 answers

Aug 26, 2022

Hi all,

As a non-statistician, I have a (seemingly) complicated statistical question on my hands that I'm hoping to gather some guidance on.

For background, I am studying the spatial organization of a biological process over time (14 days), in a roughly-spherical structure. Starting with fluorescence images (single plane, no stacks), I generate one curve per experimental day that corresponds to the average intensity of the process as I pass through the structure; this is in the vein of line intensity profiling for immunofluorescence colocalization. I have one curve per day (see attached) and I'm wondering if there are any methods that can be used to compare these curves to check for statistical differences.

Any direction to specific methods or relevant literature is deeply appreciated, thank you!

Cheers,

Matt

Edit to add some additional information: the curves to be analyzed will be averages of curves generated from multiple biological replicates, and therefore will have error associated with them. Across the various time points and conditions, the number of values per curve ranges roughly from 200 -- 1000 (one per pixel).

Attachment: Profile stats example.png (275.56 KB)

Relevant answer

Matthew Borrelli

Aug 31, 2022

Answer

Hi Mairtin Mac Siurtain, thanks so much for your reply!

This is great information. I'll take a look first at MANOVA and then at RBSCIs if indicated. Many thanks!

Best,

Matt

Are these variations to the analytical methods producing the same results or significantly different results?

0 Recommendations

Jeremy Polreis

asked a question related to Advanced Statistical Analysis

Question

3 answers

Aug 29, 2022

We are altering our original analytical method to save time and cost, and we are trying to find a sound basis for saying whether the method results are the same or significantly different. We are taking actual samples and analyzing them through both the original analytical method that we validated and the alterations we made. I am not very knowledgeable about statistics, but is there a statistical way to say whether the methods are producing results that are the same or significantly different? Or is there a more common method to determine if two analytical methods are the same? I have attached the results from each of the variations we tried along with the original method results.

Data Set for Method Variations.xlsx (13.63 KB)

Relevant answer

Hasan Issa Mirza

Aug 31, 2022

Answer

Yes, this leads to different results, because each statistical tool is used according to its own specific criteria; several statistical methods cannot be used for the same purpose. David Morse

Meta analysis by R studios?

3 Recommendations

Mirza Mienur Meher

asked a question related to Advanced Statistical Analysis

Question

4 answers

Jun 8, 2022

Different steps and procedure with complete example of a meta analysis.

What is pooled prevalence?

Relevant answer

Muhammad Irfan Malik

Aug 28, 2022

Answer

You should try this text.

Harrer, M., Cuijpers, P., Furukawa, T.A., & Ebert, D.D. (2021). Doing Meta-Analysis with R: A Hands-On Guide. Boca Raton, FL and London: Chapman & Hall/CRC Press. ISBN 978-0-367-61007-4.

How do we calculate 95% confidence intervals for RMSE?

0 Recommendations

Samiksha S. V.

asked a question related to Advanced Statistical Analysis

Question

12 answers

Feb 5, 2021

I have two datasets (measured and modelled).

I want to calculate 95% confidence intervals for RMSE.

Can anyone please help me with this?

Thank you in advance

Relevant answer

Sal Mangiafico

Aug 21, 2022

Answer

I found the code linked to by Jeff Rothschild to be a little difficult to follow. I adapted the code to make it a little more straightforward. Below is some R code which calculates the confidence interval by this method and by bootstrap. RMSE is calculated from a vector of Actual values and a vector of Predicted values.

### rmseCI function adapted from

### https://gist.github.com/brshallo/7eed49c743ac165ced2294a70e73e65e

### Bootstrapped confidence interval adapted from

### https://rcompanion.org/handbook/G_10.html

if(!require(rcompanion)){install.packages("rcompanion")}

### Chi-square-based confidence interval for an RMSE computed from n observations

rmseCI = function(rmse, n, conf, digits=3){

  ### Tail probabilities, e.g. 0.025 and 0.975 for conf = 0.95

  p_lower = 0.5-conf/2

  p_upper = 0.5+conf/2

  DF = data.frame(

    RMSE = rmse,

    lower.ci = signif(sqrt(n / qchisq(p_upper, df = n)) * rmse,

                      digits=digits),

    upper.ci = signif(sqrt(n / qchisq(p_lower, df = n)) * rmse,

                      digits=digits))

  row.names(DF)=1

  return(DF)}

#######################

Actual = 1:24

Predicted = c(1,3,4,5,2,3,6,3,4,7,8,9,11,12,15,16,18,19,20,22,25,24,23,24)

library(rcompanion)

RMSE = efronRSquared(actual=Actual, predicted=Predicted, statistic="RMSE")

RMSE

#######################

N = length(Actual)

rmseCI(RMSE, n=N, 0.95)

###############################

library(boot)

Data = data.frame(Actual, Predicted)

Function = function(input, index){

Input = input[index,]

Result = efronRSquared(actual = Input$Actual,

predicted = Input$Predicted,

statistic = "RMSE")

return(Result)}

Boot = boot(Data, Function, R=5000)

boot.ci(Boot, conf = 0.95, type = "bca")

RMSE

hist(Boot$t[,1])

Any recommendations for Cox Regression analysis, cases dropped due to missing values?

4 Recommendations

Sarang Jokhio

asked a question related to Advanced Statistical Analysis

Question

2 answers

Aug 18, 2022

Greetings Fellow Researchers,

I am a newbie in using survival analysis (Cox regression). In my data set, 10-40% of cases have missing values (depending on the variables I include in my analysis). Based on this I have two questions:

1- Are there any recommendations on the acceptable percentage of cases dropped (due to missing values) from the analysis?

2- Should I impute the missing values of all the cases that were dropped (let's say a maximum of 40%)?

Thank you so much for your time and kind consideration.

Best,

Sarang

Relevant answer

Aug 20, 2022

Answer

I have no first-hand experience to offer you either. But this systematic review article may give you some ideas.

Article How are missing data in covariates handled in observational ...

What's the best statistical analysis for germination rate with different hormonal treatments?

4 Recommendations

Nicolò M. Villa

asked a question related to Advanced Statistical Analysis

Question

6 answers

Aug 5, 2022

I'm doing a germination assay of 6 Arabidopsis mutants under 3 different ABA concentrations in solid medium. I have 4 batches. Each batch has 2 plates for each mutant and 3 for the wild type, and each plate contains 8-13 seeds. Some seeds and plates are lost to contamination, so I don't have the same sample size for each mutant in each batch. In some cases the mutant is no longer present in the batch. I recorded the germination rate per mutant after a week and expressed it as a percentage. I'm using R. How can I best analyse the data to test whether the mutations affect the germination rate in the presence of ABA?

I've two main questions:

1. Do I consider each seed as a biological replicate with a categorical result (germinated/not germinated), or each plate with a numerical result (% germination)?

2. I compare treatments within the genotype. Should I compare mutant against wild type within the treatment, the treatment against itself within mutant, or both?

Relevant answer

Aug 10, 2022

Answer

I suggest using mosaic plots rather than (stacked) barplots to visualize your data.

The chi²- and p-values can be calculated simply via chi²-tests (one for each ABA conc) -- assuming the data are all independent (again, please note that seedlings on the same plate are not independent). If you have no possibility to account for this (using a hierarchical/multilevel/mixed model), you may ignore this in the analysis but then interpret the results more carefully (e.g., use a more stringent level of significance than usual).

A binomial model (including genotype and ABA conc as well as their interaction) would allow you to analyse the difference between genotypes in conjunction with ABA conc. However, due to the given experimental design (only three different conc values) this is cumbersome to interpret (because you cannot establish a meaningful functional relationship between conc and probability of germination).
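A minimal R sketch of the binomial model described above, assuming plate-level counts. The data frame `plates`, its column names and all numbers are made up for illustration. Using `cbind(germinated, total - germinated)` as the response lets plates with different numbers of seeds contribute according to their size. Batch and plate effects are ignored here; accounting for them would need a mixed model.

```r
# Hypothetical plate-level data: one row per plate
plates <- data.frame(
  genotype   = factor(rep(c("WT", "mut1"), each = 6), levels = c("WT", "mut1")),
  aba        = factor(rep(c("0", "0.5", "1"), times = 4)),
  germinated = c(12, 9, 5, 11, 8, 6, 12, 5, 1, 10, 4, 2),
  total      = c(13, 12, 11, 12, 12, 13, 13, 12, 10, 11, 12, 12)
)

# Binomial GLM with genotype, ABA concentration and their interaction
fit <- glm(cbind(germinated, total - germinated) ~ genotype * aba,
           family = binomial(link = "logit"), data = plates)

# Sequential likelihood-ratio tests for genotype, ABA and the interaction
anova(fit, test = "Chisq")

# Fitted germination probability for each genotype x ABA combination
newdat <- expand.grid(genotype = levels(plates$genotype),
                      aba      = levels(plates$aba))
newdat$p_hat <- predict(fit, newdata = newdat, type = "response")
newdat
```

Here ABA concentration is treated as a factor with three levels, which matches the point above that three concentrations are too few to establish a functional relationship.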

How to calculate the standard value of Mahalanobis distance to check multivariate outliers?

9 Recommendations

Bhuvanesh Kumar Sharma

asked a question related to Advanced Statistical Analysis

Question

9 answers

Feb 26, 2017

For individual responses I can calculate the Mahalanobis distance, but I don't know the critical value against which to check for outliers.

Please help in this regard.

Relevant answer

Prosper Yeng

Aug 4, 2022

Answer

Kindly follow the SPSS procedure for determining the critical value described here; it is pretty simple and intuitive:

https://www.statology.org/mahalanobis-distance-spss/
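For readers working in R rather than SPSS, a rough sketch of the same idea follows. It assumes the common convention of comparing squared Mahalanobis distances with a chi-square quantile whose degrees of freedom equal the number of variables, at alpha = 0.001; both the convention and the simulated data are assumptions, not something taken from the linked page.

```r
# Simulated data for illustration only: 200 cases, 3 variables
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)

# Squared Mahalanobis distance of each case from the centroid
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Critical value: chi-square quantile with df = number of variables
alpha  <- 0.001
cutoff <- qchisq(1 - alpha, df = ncol(X))   # about 16.27 for 3 variables

# Cases whose distance exceeds the cutoff are flagged as potential outliers
which(d2 > cutoff)
```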

0 Recommendations

Applications of discriminant analysis in groundwater assessment?

asked a question related to Advanced Statistical Analysis

Question

23 answers

Jul 31, 2022

I need suggestions for groundwater assessment-related articles that used discriminant analysis in their study, as well as how to apply this analysis in R.

Reghais.A

Thanks

Relevant answer

Faraed Salman

Jul 31, 2022

Answer

This link may be useful: https://www.sciencedirect.com/science/article/am/pii/S0048969717330814

What happens if in binary logistic regression, the intercept(b0) is statistically not significant?

27 Recommendations

Cliff Richardo

asked a question related to Advanced Statistical Analysis

Question

5 answers

Jul 28, 2022

I'm trying to construct a binary logistic regression model. The first model includes 4 predictor variables and the intercept is not statistically significant. Meanwhile, in the second model, I exclude one variable from the first model and the intercept is significant.

The consideration that I take here is that:

The pseudo R² of the first model is higher than that of the second model.

Any suggestion which model should I use?

Relevant answer

Jul 28, 2022

Answer

You should use the model that makes more sense, practically and/or theoretically. A high R² is not an indication of the "goodness" of the model. A higher R² can also mean that the model makes more wrong predictions with a higher precision.

Do not build your model based on observed data. Build your model based on understanding (theory) and the targeted purpose (simple prediction, extrapolation (e.g. forecast), testing meaningful hypotheses etc.).

Removing a variable from the model changes the meaning of the intercept. The intercepts in the two models have different meanings. They are (very usually) not comparable. The hypothesis tests of the intercepts of the two models test very different hypotheses.

PS: a "non-significant" intercept term just means that the data are not sufficient to statistically distinguish the estimated value (the log odds given all X=0) from 0, which means that you cannot distinguish the probability of the event (given all X=0) from 0.5 (the data are compatible with probabilities above and below 0.5). This is rarely a sensible hypothesis to test.
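To make the point about the intercept concrete, here is a small R sketch on simulated data (all variable names and numbers are invented). It fits a model with two predictors and a reduced model with one, then back-transforms each intercept to a probability with `plogis()`. The two intercepts answer different questions, because "all X = 0" refers to a different set of predictors in each model.

```r
# Simulated data for illustration only
set.seed(42)
n  <- 500
x1 <- rnorm(n, mean = 2)
x2 <- rnorm(n, mean = -1)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x1 + 0.5 * x2))

full    <- glm(y ~ x1 + x2, family = binomial)
reduced <- glm(y ~ x1,      family = binomial)

# Intercept = log odds of the event when all predictors in THAT model are 0
b0_full    <- unname(coef(full)["(Intercept)"])
b0_reduced <- unname(coef(reduced)["(Intercept)"])

# Back-transform to a probability; an intercept of exactly 0 corresponds to 0.5
plogis(c(full = b0_full, reduced = b0_reduced, zero = 0))
```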

Is there any alternative when cochran formula is not applicable?

16 Recommendations

Samran Asiabani

asked a question related to Advanced Statistical Analysis

Question

2 answers

Jun 19, 2022

I need a statistical result to test my hypothesis, but my N is too small to use in the Cochran formula!

Besides that, I cannot collect more data to solve this issue. Do you know any other reliable method that fits this situation?

Relevant answer

Samran Asiabani

Jul 20, 2022

Answer

Hi dear Mario,

I'm trying to classify hard-wired human settlement preferences based on archaeological evidence from the Paleolithic era; this is an effort to update the biophilia hypothesis.

But to do hypothesis testing I need to quantify the data. The first step is to build a database from as many Paleolithic geolocations as possible, then put them through GIS analysis and test my hypothesis based on the data acquired in SEM (mostly, which environmental attribute has how much impact on our decision to select a settlement in which period of paleohistory).

PS. I tested a small database, but I was wondering whether there is a geolocation database of archaeological sites that I haven't found yet.

Bests

SAM

How to understand these very large odds ratios (multilevel logistic regression)?

0 Recommendations

Sarah Woods

asked a question related to Advanced Statistical Analysis

Question

3 answers

Jul 13, 2022

300 participants in my study viewed 66 different moral photos and had to make a binary choice (yes/no) in response to each. There were 3 moral photo categories (22 positive images, 22 neutral images and 22 negative images). I am running a multilevel logistic regression (we manipulated two other aspects of the images) and have found unnaturally high odds ratios (see below). We have no missing values. Could anyone please help me understand what the below might mean? I understand I need to approach this with extreme caution, so any advice would be highly appreciated.

Yes choice: morally negative compared to morally positive (OR=441.11; 95% CI [271.07,717.81]; p<.001)

Yes choice: morally neutral compared to morally positive (OR=0.94; 95% CI [0.47,1.87]; p=0.86)

It should be noted that when I plot the data, very very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images.

Relevant answer

Thom Baguley

Jul 14, 2022

Answer

I think you have answered your question: "It should be noted that when I plot the data, very very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images."

This is what you'd expect even in a simple 2x2 design. If the probability of a yes response is very high in one condition (here, the negative images) and very low in the other (the positive images), then the OR can be very large, as it's the ratio of large odds to very small odds.

This isn't unnatural unless the raw probabilities don't reflect this pattern. (There might still be issues but not from what you described).
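A quick arithmetic sketch in R shows how an odds ratio of this size arises. The two proportions are invented for illustration and are not taken from the study; they are simply one pair of values that produces an OR of about 441.

```r
# Hypothetical proportions of "yes" responses, for illustration only
p_negative <- 0.90   # most negative images get a "yes"
p_positive <- 0.02   # very few positive images get a "yes"

# Odds = probability of "yes" divided by probability of "no"
odds <- function(p) p / (1 - p)

odds_ratio <- odds(p_negative) / odds(p_positive)
odds_ratio   # about 441
```

Odds of 9 against odds of roughly 0.02 give a ratio in the hundreds, so a large OR on its own does not signal a problem when the raw proportions look like this.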

What statistical test should I use?

50 Recommendations

Noramira Nozmi

asked a question related to Advanced Statistical Analysis

Question

17 answers

Jun 27, 2022

Hi,

I have 4 animals (A, B, C &D), and did behavioural observation for 90 days. The parameter that I want to test is the temperature and time (whether the temperature/time is affecting their behaviour or not). I categorized the temperature as cool temperature and hot temperature. Time is categorized between morning and afternoon.

The questions are -

1) Is it correct to use a non-parametric test since the number of subjects is small (n<10)?

2) if I want to differentiate the behaviour between the individual animals, do I use paired t-test or independent t-test?

3) if I want to see the interaction between individual subjects X temperature X time ; is the data considered as dependent or independent?

Relevant answer

Pichakoon Auttawechasakoon

Jul 5, 2022

Answer

Thank you Noramira Nozmi for the question and professors for the useful answers.

How to calculate within subject correlation (in SPSS)?

36 Recommendations

Anna Gaidosch

asked a question related to Advanced Statistical Analysis

Question

4 answers

Jun 29, 2022

In my research I need to calculate the correlation between two variables. Each variable was measured 100 times for each subjects (N=20), so in total, I have 2000 data points, of which every 100 belongs to the same participant.

I have calculated a simple bivariate correlation on these 2000 data points without taking the participant effects into account. I know that this is wrong and the participant effects should be accounted for, but I didn't manage to find a way to do this (in SPSS). Any help would be highly appreciated.

Relevant answer

Jun 29, 2022

Answer

You have a nested (multilevel, hierarchical) design where observations/time points (Level 1) are nested within participants (Level 2). This nested structure should be taken into account because it causes dependencies in the data. Perhaps you could use software for multilevel (hierarchical linear) modeling to compute the within-person and between-person correlation matrices. The software Mplus, for example, allows the calculation of these two matrices with the analysis option TWOLEVEL BASIC. See my YouTube video about this here (I discuss the matrices in the second half of the video):

https://www.youtube.com/watch?v=cWlkPF1UoHs&t=15s

I'm not sure if and how this could be done in SPSS as well.
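As a rough illustration of the within-person versus between-person distinction, here is a base R sketch. It is not the Mplus procedure and not an SPSS solution; it is a simple person-mean centering approach, shown on simulated data with invented names.

```r
# Simulated long-format data: 20 participants x 100 measurements (illustration only)
set.seed(7)
n_subj <- 20
n_obs  <- 100
id     <- rep(seq_len(n_subj), each = n_obs)
subj_x <- rnorm(n_subj, sd = 2)[id]        # participant-level offsets for x
subj_y <- rnorm(n_subj, sd = 2)[id]        # participant-level offsets for y
x <- subj_x + rnorm(n_subj * n_obs)
y <- subj_y + 0.5 * (x - subj_x) + rnorm(n_subj * n_obs)

# Within-person part: deviation of each observation from that person's mean
x_within <- x - ave(x, id)
y_within <- y - ave(y, id)
r_within <- cor(x_within, y_within)

# Between-person part: correlation of the participant means
r_between <- cor(tapply(x, id, mean), tapply(y, id, mean))

c(within = r_within, between = r_between)
```

Note that a naive significance test on the centered values would overstate the degrees of freedom, since one mean per participant has been estimated; a multilevel model handles this properly.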

I was wondering where is the ECM (the error correction term) in this data? I wanted to figure out the ECT(-1) location?

10 Recommendations

Sadik Aden Dirir

asked a question related to Advanced Statistical Analysis

Question

5 answers

Jun 14, 2022

I am using an ARDL model however I am having some difficulties interpreting the results. I found out that there is a cointegration in the long run. I provided pictures below.

123.PNG (6.55 KB)
111.PNG (20.90 KB)
333.PNG (19.84 KB)
5555.PNG (18.66 KB)
777.PNG (11.42 KB)

Relevant answer

Kehinde Mary Bello

Jun 29, 2022

Answer

Mr a. D.

The ECT(-1) is always the lagged value of your dependent variable.

Regards

15 Recommendations

Which variable should be constrained in confirmatory factor analysis in stata? Is it possible to have a model without a constrained variable?

asked a question related to Advanced Statistical Analysis

Question

7 answers

Jun 29, 2022

In confirmatory factor analysis (CFA) in Stata, the first observed variable is constrained by default (beta coefficient = 1, mean of latent variable = constant).

I don't understand this, because other software packages report beta coefficients for all observed variables.

So, I have two questions.

1- Which variable should be constrained in confirmatory factor analysis in Stata?

2- Is it possible to have a model without a constrained variable like other software packages?

Relevant answer

Holger Steinmetz

Jun 29, 2022

Answer

Hello Seyyed,

I guess by beta you mean the factor loading? Traditionally, these are denoted with lambda, but Stata probably treats these differently.

The fixation of the "marker variable" is needed a) to assign a metric to the latent variable (that of the marker) and b) to identify the equation system.

As far as I know, it does not matter which variable you choose, as long as it is a valid indicator of the latent variable.

HTH

Holger

How can I analyze the relationship between a quantitative and a categorical variable in a within-subject design?

20 Recommendations

Anna Gaidosch

asked a question related to Advanced Statistical Analysis

Question

6 answers

Jun 27, 2022

For my bachelor thesis I'm conducting a study on the relationship between eye movements and memory. One of the hypotheses is that the number of fixations made during the viewing of a movie clip will be positively related to memory of that movie clip.

Each participant viewed 100 movie clips, and the number of fixations was counted for each movie clip for each participant. Later, participants' memory of the clips was tested and each movie was categorized as "remembered" or "forgotten" for each participant.

So, for each participant there are 100 trials with the corresponding number of fixations and categorization as "remembered" or "forgotten".

My first idea was to do a paired-samples t-test (to compare the number of fixations between "remembered" and "forgotten"), but I didn't find a way to do that in SPSS with this file format, as there are 100 rows for each participant. I thought of calculating the average number of fixations for the remembered vs forgotten movies per participant and doing a t-test on these means (one mean per participant for both categories), but this way the means get distorted because some subjects remember far more clips than others (so the "mean of the means" is not the same as the overall mean).

Now I'm thinking that a t-test might not be appropriate at all, and that logistic regression would be a better choice (to see how well the number of fixations predicts whether a clip will be remembered vs forgotten), but I didn't manage to find out how to do this in SPSS for a within-subject design with multiple trials per participant. Any help/suggestions would be highly appreciated.

Relevant answer

Jun 27, 2022

Answer

I believe Blaine Tomkins meant to describe the data as having a LONG format, not a wide format. Apart from that, I concur with his advice. SPSS can estimate that model. Look up the GENLINMIXED command:

https://www.ibm.com/docs/en/spss-statistics/28.0.0?topic=reference-genlinmixed

A good resource is the book by Heck, Thomas & Tabata (if you can get your hands on it):

https://www.routledge.com/Multilevel-Modeling-of-Categorical-Outcomes-Using-IBM-SPSS/Heck-Thomas-Tabata/p/book/9781848729568

HTH.

Comparison of Conditional Independence Testings?

5 Recommendations

Mahsa Lotfi

asked a question related to Advanced Statistical Analysis

Question

9 answers

Jun 24, 2022

I have applied several conditional independence testing methods:

1- Fisher's exact test

2- Monte-Carlo Chi-sq

3- Yates correction of Chi-sq

4- CMH

The number of distinct feature segments that reject the independence (null H) is different in each method. Which method is more reliable and why?

(The data satisfies the prerequisite of all of these methods)

Relevant answer

Ronán Michael Conroy

Jun 25, 2022

Answer

With low expected cell counts there are various suggestions as to when Chi-sq is no longer trustworthy. I like the Np15 rule because it's simple and seems to work well. In this rule, N is the total number of observations, and p is the proportion in the smaller group. Suppose you have a table that contains 50 observations, and 10% of them fall into the smaller group (so p = 0.1). Then Np is 50 x 0.1 = 5. The table fails the Np > 15 test: there is not enough data to do a Chi-squared test.
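The rule above is easy to wrap in a small helper. This is only a sketch; the function name `np15_ok` is made up for illustration.

```r
# N = total number of observations in the table
# p = proportion of observations in the smaller group
np15_ok <- function(N, p) {
  N * p > 15
}

np15_ok(N = 50,  p = 0.1)   # FALSE: Np = 5, as in the example above
np15_ok(N = 200, p = 0.1)   # TRUE:  Np = 20
```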

What correlation test is the most suitable for analyzing two data activities from tested compounds?

7 Recommendations

Harto Widodo

asked a question related to Advanced Statistical Analysis

Question

3 answers

Jun 24, 2022

I have six kinds of compounds which I then tested for antioxidant activity using the DPPH assay and also anticancer activity on five types of cell lines, so I got two types of data groups:

1. Antioxidant activity data

2. Anticancer activity (5 types of cancer cell line)

Each data consisted of 3 replications. Which correlation test is the most appropriate to determine whether there is a relationship between the two activities?

Relevant answer

David Eugene Booth

Jun 24, 2022

Answer

Logistic regression is what I had in mind. The DV might be anticancer activity (yes/no), and the same for antioxidant activity. Best wishes, David Booth

Observed vs predicted probabilities graph in R?

11 Recommendations

Zuffain Hussan

asked a question related to Advanced Statistical Analysis

Question

3 answers

Jun 23, 2022

I want to draw a graph of predicted probabilities vs observed probabilities. For the predicted probabilities I use this R code (see below). Is this code OK or not?

Could anyone tell me how I can get the observed probabilities and draw a graph of predicted vs observed probabilities?

analysis10 <- glm(Response ~ Strain + Temp + Time + Conc.Log10

+ Strain:Conc.Log10 + Temp:Time,

family = binomial(link = logit), data = df)

# Predicted probability of the response for each row of df

predicted_probs = data.frame(probs = predict(analysis10, type="response"))

I have attached that data file

Household flies.csv (3.75 KB)

Relevant answer

Jun 23, 2022

Answer

Plotting observed vs predicted is not sensible here.

You don't have observed probabilities; you have observed events. You might use "Temp", "Time", and "Conc.Log10" as factors (with 4 levels each) and define 128 different "groups" (all combinations of all levels of all factors) and use the proportion of observed events within each of these 128 groups. But you have only 171 observations in total. There is no chance of getting any reasonable proportions (you would need some tens or hundreds of observations per group for this to work reasonably well).

Interpreting Conditional Independence Testing Results?

0 Recommendations

Mahsa Lotfi

asked a question related to Advanced Statistical Analysis

Question

7 answers

Jun 8, 2022

I have a huge dataset for which I'd like to assess the independence of two categorical variables (x,y) given a third categorical variable (z).

My assumption: I have to do the independence tests per each unique "z" and even if one of these experiments shows the rejection of null hypothesis (independence), it would be rejected for the whole data.

Results: I have done Chi-Sq, Chi with Yates correction, Monte Carlo and Fisher.

- Chi-Sq is not a good method for my data due to sparse contingency table

- Yates and Monte carlo show the rejection of null hypothesis

- For Fisher, all the p values are equal to 1

1) I would like to know if there is something I'm missing or not.

2) I have already discarded the "z"s that have DOF = 0. If I keep them how could I interpret the independence?

3) Why does Fisher's test result in pval=1 all the time?

4) Any suggestion?

#### Apply Fisher exact test

fish = fisher.test(cont_table,workspace = 6e8,simulate.p.value=T)

#### Apply Chi^2 method

chi_cor = chisq.test(cont_table,correct=T); ### Yates correction of the Chi^2

chi = chisq.test(cont_table,correct=F);

chi_monte = chisq.test(cont_table,simulate.p.value=T, B=3000);

Relevant answer

David Morse

Jun 8, 2022

Answer

Hello Mahsa,

Why not use the Mantel-Haenszel test across all the z-level 2x2 tables for which there is some data? This allows you to estimate the aggregate odds ratio (and its standard error), thus you can easily determine whether a confidence interval includes 1 (no difference in odds, and hence, no relationship between the two variables in each table) or not.

That seems simpler than having to run a bunch of tests, and by so doing, increase the aggregate risk of a type I error (false positive).

Good luck with your work.
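A minimal R sketch of the Mantel-Haenszel suggestion, using the base function `mantelhaen.test()`. The counts are hypothetical, and the sketch assumes every stratum is a 2 x 2 table; for larger tables the function runs a generalized test but does not report a common odds ratio.

```r
# Hypothetical counts: a 2 x 2 table of x by y for each of 3 levels of z
tab <- array(
  c(12,  5,  7, 14,    # z = 1
     9,  6,  4, 11,    # z = 2
    15,  8, 10, 18),   # z = 3
  dim = c(2, 2, 3),
  dimnames = list(x = c("x1", "x2"), y = c("y1", "y2"), z = c("z1", "z2", "z3"))
)

# Cochran-Mantel-Haenszel test of conditional independence of x and y given z
cmh <- mantelhaen.test(tab)

cmh$estimate   # common (Mantel-Haenszel) odds ratio across strata
cmh$conf.int   # confidence interval; independence corresponds to an OR of 1
cmh$p.value
```

This gives one test and one interval for the whole data set instead of a separate test per level of z.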

Hypothesis definition for one-way ANOVA (2 levels): Is it possible to define as null hypothesis, that there is *no* difference between the groups?

11 Recommendations

Johanna Schulz

asked a question related to Advanced Statistical Analysis

Question

17 answers

Jun 1, 2022

Hello,

I am writing a research proposal in the field of Marketing Theory right now. The hypothesis must be developed based on the theory (of a paper by Ofek et al. (2011)), that multichannel retailers must increase in-store assistance levels to decrease product returns. Therefore, the authors propose that retailers with a high level of in-store assistance have lower returns than vice versa.

I want to test this relationship. My hypothesis is based on my belief (based on previous literature), that this relationship as described by the authors has changed. Therefore, I expect both groups (retailers with high vs. low in-store assistance levels) to have the same product return rates.

Now my question:

Is this hypothesis in a statistical context correct?

"The average product return rate of a B&C retailer with a high level of in-store assistance is similar to a B&C retailer with a low level of in-store assistance in the clothing market."

My problem is: If I conduct, for example, an ANOVA test and the results are significant, I would normally conclude with "there is a significant difference in the group means, therefore I can reject the null hypothesis".

In my case, with a significant test result, I would then need to *reject* my hypothesis. Is this allowed or statistically incorrect?

Thank you so much for any help or feedback. I would appreciate any thoughts on this.

Kind regards,

Johanna

Relevant answer

Jun 2, 2022

Answer

Just a note to the post of Ronán Michael Conroy : TOST is equivalent to interpreting a (two-sided) confidence interval, something that might be easier to understand.

Johanna Schulz, maybe you can contact Ofek et al. to get the data or discuss directly what they would consider a trivial effect. Getting the data would be ideal because then you could directly test the hypothesis that the effect in their study equals the effect in your study. If the estimated difference of these effects is negative (the estimate for your study is smaller than for theirs) and you can reject the hypothesis, then you have reason to believe that your effect is smaller than theirs (there may still be an effect, but not as large as that found by Ofek et al.).

PS: This seems to be a nice example demonstrating how worthless publications are that do not report and test meaningful models and don't give any effect quantification, and don't provide the actual data.
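The note above that TOST is equivalent to interpreting a confidence interval can be sketched in R as follows. The data are simulated and the equivalence margin of 3 percentage points is an arbitrary assumption; in practice the margin has to come from subject-matter reasoning about what counts as a trivial difference.

```r
# Simulated return rates (%) for illustration only; the margin is an assumption
set.seed(3)
high_assist <- rnorm(40, mean = 20,   sd = 4)
low_assist  <- rnorm(40, mean = 20.5, sd = 4)
margin <- 3   # largest difference regarded as practically irrelevant

# TOST at alpha = 0.05 corresponds to a two-sided 90% confidence interval
ci <- t.test(high_assist, low_assist, conf.level = 0.90)$conf.int

# Equivalence is supported only if the whole interval lies inside (-margin, margin)
equivalent <- ci[1] > -margin && ci[2] < margin
ci
equivalent
```

With this framing, "the groups have similar return rates" becomes a claim that can be supported by data, rather than a null hypothesis that can only fail to be rejected.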

Questions about Lavaan (in R studio). Can we use lavaan for categorical variable (e.g. high and low ethnic diversity)?

20 Recommendations

Edita Kristofora

asked a question related to Advanced Statistical Analysis

Question

3 answers

Jun 1, 2022

Dear fellow researchers,

Usually we use lavaan for continuous variable, so can we still use lavaan for categorical variable (e.g. high and low ethnic diversity composition)?

Thank you very much!

Best,

Edita

Relevant answer

David Morse

Jun 1, 2022

Answer

Hello Edita,

A categorical variable having only two levels (e.g., coded 0/1) can be used in any linear model as an IV or antecedent variable.

If such a variable is the DV, however, it likely makes more sense to switch from linear to logistic models.

Good luck with your work.

Which statistical test?? Think the answers really simple?

27 Recommendations

Alana Hunt

asked a question related to Advanced Statistical Analysis

Question

6 answers

May 23, 2022

I am looking at gender equality in sports media. I have collected two screen time measures from TV coverage of a sport event - one time for male athletes and one time for female athletes.

I am looking for a statistical test to give evidence that one gender is favoured. I assume I have to compare each gender's time against the EXPECTED time given a 50/50 split (so (male time + female time) / 2), as this would be the time if no gender was favoured.

My first thought was chi square? But I’m not sure that works because there’s really only one category. I am pregnant and so my brain is not working at the moment lol. I think the answer is really simple but I just can’t think of anything.

Relevant answer

Meseret Mesfin Bambo

May 31, 2022

Answer

An independent-samples t-test would be best.

Logistic regression: should the variables need to be normalized before analysis?

12 Recommendations

Mateusz Soliński

asked a question related to Advanced Statistical Analysis

Question

7 answers

May 25, 2022

My question concerns the problem of calculating odds ratios in logistic regression analysis when the input variables are on different scales (e.g., 0.01-0.1, 0-1, 0-1000). Although the coefficients of the logistic regression look fine, the odds ratio values are, in some cases, enormous (see example below).

In the example there were no outlier values in any of the input variables.

What is the general rule: should we normalize all input variables before analysis to obtain reliable OR values?

Sincerely

Mateusz Soliński

OR_question.png (8.41 KB)
log_question.png (34.78 KB)

Relevant answer

Anuraj Nayarisseri

May 30, 2022

Answer

You need to interpret the OR as the exponential of the estimated coefficient.
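A short R sketch of why the OR depends on the unit of the predictor. The coefficient value of 45 is hypothetical and only chosen to mimic a predictor that ranges over 0.01-0.1. The OR is the exponential of the coefficient times the size of the change considered, so a "per 1 unit" OR can be enormous when a 1-unit change lies far outside the observed range.

```r
# Hypothetical coefficient for a predictor measured on a 0.01-0.1 scale
beta <- 45            # change in log odds per 1 unit of the predictor

exp(beta)             # OR per 1 unit: huge, but a 1-unit change never occurs
exp(beta * 0.01)      # OR per 0.01 units: about 1.57

# Rescaling the predictor (here x * 100) divides the coefficient by 100,
# so the OR per rescaled unit equals the OR per 0.01 original units
beta_rescaled <- beta / 100
exp(beta_rescaled)
```

Rescaling changes only the unit in which the OR is expressed; the fitted probabilities and the model fit stay the same.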

Is it possible to perform ANOVA on percentage data over 100?

20 Recommendations

Kamyar Amirhosseini

asked a question related to Advanced Statistical Analysis

Question

11 answers

May 29, 2022

I have a data set that includes percentages of algal biomass increase in a growth medium over time. In the majority of the groups, algal biomass has more than doubled and therefore percentage increase is well over the 100% mark. Moreover, my current data does not follow the equality of variance assumption of ANOVA. Will I be able to perform ANOVA with percentage data that are over 100? And if yes, what data transformations do you suggest?

Relevant answer

Aref Wazwaz

May 29, 2022

Answer

Dear Kamyar Amirhosseini . See the following useful RG link:

Article Application of student's t-test, analysis of variance, and covariance

Does anyone know how to perform ANOVA and tukey-HSD in R with a dataset containing NAs?

20 Recommendations

Nicolò M. Villa

asked a question related to Advanced Statistical Analysis

Question

5 answers

May 11, 2022

I need to run an ART ANOVA and Tukey HSD for the interactions among the treatments, but my dataset has a few NAs due to experimental errors.

When I run :

anova(model<- art(X ~ Y, data = d.f))

I get the error:

Error in (function (object) :

Aligned Rank Transform cannot be performed when fixed effects have missing data (NAs).

Manually lifting is not an option because each row is a sample and it would keep NAs, simply in wrong samples.

Relevant answer

Thom Baguley

May 25, 2022

Answer

The issue is that you are using art() from ARTool to fit the model, and that can't handle missing values. You could use listwise deletion by passing na.omit(d.f) to the art() function, though this could potentially bias results (though no more than the listwise deletion that lm() applies by default via na.action = na.omit).

A better solution is to use multiple imputation (e.g., with the mice package in R), though I'm not sure if that works directly with art() models or to use a different approach to handle your data (which presumably aren't suitable for linear models). You could use a transformation, a generalized linear model, robust regression etc. depending on the nature of the data.
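The listwise-deletion step can be sketched in base R as below. Note the swap: this example uses an ordinary `aov()` fit followed by `TukeyHSD()` instead of ARTool's `art()`, so that it runs without extra packages. The data and column names are invented; with ARTool, the same complete-case data frame would be what gets passed to `art()`.

```r
# Hypothetical data with a few missing values
set.seed(11)
d.f <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 10)),
  response  = rnorm(30, mean = rep(c(5, 6, 8), each = 10))
)
d.f$response[c(3, 17)] <- NA
d.f$treatment[25]      <- NA

# Listwise deletion: keep only rows with no NA in the variables used in the model
d.complete <- d.f[complete.cases(d.f[, c("treatment", "response")]), ]
nrow(d.complete)   # 27 of the 30 rows remain

# Ordinary one-way ANOVA and Tukey HSD on the complete cases
fit <- aov(response ~ treatment, data = d.complete)
summary(fit)
TukeyHSD(fit)
```

Restricting `complete.cases()` to the model variables avoids dropping rows that are only missing in columns the analysis does not use.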

Replicating an EViews plot using Stata commands?

9 Recommendations

Aliu Adebiyi

asked a question related to Advanced Statistical Analysis

Question

1 answer

May 23, 2022

Dear all, I want to replicate an EViews plot (attached as Plot 1) in Stata after performing a time series regression. I made an effort and produced this Stata plot (attached as Plot 2). However, I want Plot 2 to look exactly like Plot 1.

Please kindly help me out. Below is the Stata code I ran to produce Plot 2. What exactly do I need to include?

The codes:

twoway (tsline Residual, yaxis(1) ylabel(-0.3(0.1)0.3)) (tsline Actual, yaxis(2)) (tsline Fitted, yaxis(2)),legend(on)

[Attachments: Plot 1 (.jpg, 46.85 KB); Plot 2 (.png, 74.61 KB)]

Relevant answer

Aliu Adebiyi

May 23, 2022

Answer

OlaOluwa Simon Yaya, Olusanya E. Olubusoye, Oluwaseun Aramide Otekunrin, Adedayo Adepoju, Hammed Abiola Olayinka, Toheeb Aduramomi Jimoh

Which model will be most appropriate for prediction in the given data set?

0 Recommendations

Vijay Kumar Koli

asked a question related to Advanced Statistical Analysis

Question

3 answers

May 22, 2022

One dependent variable (continuous) ~ two continuous and two categorical (nominal) independent variables

I'm looking for the best method for prediction on a data set covering more than 100 sites. None of the continuous variables is normally distributed.

Relevant answer

Gioacchino de Candia

May 23, 2022

Answer

Beyond the scarcity of information, are you sure of the relationship between variables?

The mediation analysis using PROCESS model 4 macro software?

11 Recommendations

Esmaeel Saemi

asked a question related to Advanced Statistical Analysis

Question

8 answers

May 21, 2022

Hi all,

Please let me know if you know anything about my question. I recently conducted a mediation analysis using the PROCESS macro (model 4) with the bootstrapping method (Hayes, 2017). As you know, under these conditions we have three paths: A, B, and C.

Path A is between the IV and M (the mediator), path B is between M and the DV, and path C is between the IV and the DV when M is also in the model. When can we say the indirect effect is significant? Must all three paths be significant first? For example, when path C isn't significant, or when paths C and A both aren't significant, can we still say the indirect effect is significant? (I know that when zero isn't in the bootstrap confidence interval, i.e. between BootLLCI and BootULCI, the indirect effect is significant.) And what about when we have two or three mediators in the model?

All the best,

Esmaeel,

Relevant answer

May 22, 2022

Answer

Stefan Poier Yes, a single-step test of the indirect effect with bootstrap CIs is what I would also recommend. MacKinnon et al. showed that this provides for the most powerful test. That's why I think path analysis with simultaneous estimation of all effects in a single model is the way to go. Nonetheless, I find the logic of the Baron and Kenny approach to still be worthwhile for beginners to think about in order to better understand what mediation actually is.

MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83-104.

MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99-128.
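As a rough illustration of the single-step bootstrap test described above, the following Python sketch estimates the indirect effect a*b on simulated data and forms a percentile bootstrap confidence interval. It is not the PROCESS macro: the simulated data, the sample size, the path values, and the helper names indirect_effect and bootstrap_ci are all assumptions made for this example.

```python
import random
from statistics import mean


def indirect_effect(x, m, y):
    """Return a*b: a is the slope of M on X, b is the slope of Y on M controlling for X."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx                                          # path A: M ~ X
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)   # path B: Y ~ M + X
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Simulated data (assumption): X affects Y only through M, so the direct path is zero.
_rng = random.Random(1)
x = [_rng.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + _rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + _rng.gauss(0, 1) for mi in m]

if __name__ == "__main__":
    low, high = bootstrap_ci(x, m, y)
    print(f"indirect effect = {indirect_effect(x, m, y):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

In this simulation X has no direct effect on Y, yet the interval for a*b is expected to exclude zero, which is the situation the question asks about. The indirect effect is judged by its own interval rather than by the significance of each separate path.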

How can I calculate POD and FAR for different rainfall intensities using R or Python?

9 Recommendations

Hadir Abdelmoneim Bassyouni

asked a question related to Advanced Statistical Analysis

Question

2 answers

May 14, 2022

Hello,

I have a large amount of data and need to calculate categorical verification indices (e.g., POD, FAR, CSI, ETS) using Python or R. I would be thankful for any kind of help.

Regards,

Relevant answer

Anton Vrdoljak

May 14, 2022

Answer

Binary Forecast Verification is included within SwirlsPy - open-source Python lib:

https://docs.com-swirls.org/auto_examples/ver_deterministic.html
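For readers who only need the basic indices, here is a minimal standard-library Python sketch based on the usual 2x2 contingency table (hits, false alarms, misses, correct negatives). It does not use SwirlsPy. The function names contingency and scores, the sample values, and the choice to treat each rainfall intensity as a "greater than or equal to threshold" event are assumptions made for this example.

```python
def contingency(obs, fcst, threshold):
    """Count (hits, false_alarms, misses, correct_negatives) for events >= threshold."""
    hits = false_alarms = misses = correct_neg = 0
    for o, f in zip(obs, fcst):
        observed, forecast = o >= threshold, f >= threshold
        if observed and forecast:
            hits += 1
        elif forecast:
            false_alarms += 1
        elif observed:
            misses += 1
        else:
            correct_neg += 1
    return hits, false_alarms, misses, correct_neg


def scores(hits, false_alarms, misses, correct_neg):
    """Return POD, FAR, CSI and ETS; a score is NaN when its denominator is zero."""
    nan = float("nan")
    n = hits + false_alarms + misses + correct_neg
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    csi = hits / (hits + false_alarms + misses) if hits + false_alarms + misses else nan
    # Hits expected by chance, used by the Equitable Threat Score.
    hits_random = (hits + misses) * (hits + false_alarms) / n if n else nan
    denom = hits + false_alarms + misses - hits_random
    ets = (hits - hits_random) / denom if denom else nan
    return {"POD": pod, "FAR": far, "CSI": csi, "ETS": ets}


# Illustrative values only (assumed to be rainfall amounts in mm).
OBS = [0.0, 2.5, 12.0, 30.0, 0.5, 8.0, 0.0, 15.0]
FCST = [0.2, 0.0, 9.0, 22.0, 3.0, 11.0, 0.0, 4.0]

if __name__ == "__main__":
    for threshold in (1.0, 10.0):  # hypothetical intensity thresholds
        print(threshold, scores(*contingency(OBS, FCST, threshold)))
```

Looping over thresholds gives one set of scores per intensity class. For a large data set the same counts could be computed with vectorised array operations instead of a Python loop.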

What is the best statistical test to assess adverse drug reactions? Also, what is the best method to compare ADRs in two different drugs?

8 Recommendations

Rohan Bir Singh

asked a question related to Advanced Statistical Analysis

Question

3 answers

May 8, 2022

I am working on reporting adverse outcomes of drugs. So, I wanted to know what would be the ideal tests to assess and compare ADRs between two drugs? The data are limited to the clinical presentation of patients with ADRs.

Relevant answer

Barry Turner

May 10, 2022

Answer

The ideal test for determining adverse reactions is to look at the biochemistry at the earliest opportunity.

Many of the ADRs that are supposedly discovered statistically after yellow card reporting were eminently predictable.

[Attachment: Dangerous Drugs Lecture.pptx, 1.91 MB]

How to Identify a pattern based on given feature(s) in time-series data?

12 Recommendations

Rajesh K Ahir

asked a question related to Advanced Statistical Analysis

Question

10 answers

May 5, 2022

I am working with electricity time-series data collected at 15-minute intervals. I am looking for a procedure or theory for finding a pattern or sequence in the time series based on given features. Since the problem I am solving is identifying solar PV from these data, the given features would be:

1. There is a fall in electricity consumption during 7am-8am as generation from the PV starts.

2. There is a rise in electricity consumption during 5pm-6pm as generation from the PV ends.

See the attached figure to understand the above two features.

I have gone through the literature on this and found the following options:

1. Gaussian prior: This combines prior knowledge with evidence. In this case, the prior knowledge would be the two features above, and the evidence would be the time-series data.

2. Cross-correlation: This basically looks for the relation between two patterns.

3. Various ML techniques: Different ML techniques can be applied, such as clustering, HMM, DTW, etc.

I am not looking to solve this problem with option 3. Can anyone guide me on option 1, as it looks more relevant to my problem? I cannot understand how a Gaussian prior fits into the problem. In summary, I want to use the two features above as prior knowledge and the given data (the electricity time series) as evidence to establish whether a solar PV panel is present or not.

[Attachment: 20220505_104257 (1).jpg, 186.55 KB]

Relevant answer

Anuraj Nayarisseri

May 7, 2022

Answer

Refer to the Python example with seasonality at this link; it should resolve the issue: https://www.bounteous.com/insights/2020/09/15/forecasting-time-series-model-using-python-part-two/?fbclid=IwAR1Z0V6jS4ruRrpbneHqqzrUA-_fu8nCV_ZXD4GdXSx4UP8WmcEbeCWpbsU
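The two features described in the question (a fall in consumption around 7am-8am and a rise around 5pm-6pm) can also be measured directly before any modelling. The Python sketch below does only that, for one day of 96 readings at 15-minute intervals. It is not a Gaussian-prior or forecasting method. The function names, the one-hour comparison windows, and the synthetic load profile are assumptions made for this example.

```python
from statistics import mean

SAMPLES_PER_HOUR = 4  # 15-minute readings


def hour_mean(day, hour):
    """Mean of the four readings that start within the given hour (0-23)."""
    start = hour * SAMPLES_PER_HOUR
    return mean(day[start:start + SAMPLES_PER_HOUR])


def pv_features(day):
    """Return (morning_fall, evening_rise) for one day of 96 readings."""
    if len(day) != 24 * SAMPLES_PER_HOUR:
        raise ValueError("expected 96 readings for one day")
    morning_fall = hour_mean(day, 6) - hour_mean(day, 8)    # change across 7am-8am
    evening_rise = hour_mean(day, 18) - hour_mean(day, 16)  # change across 5pm-6pm
    return morning_fall, evening_rise


def synthetic_day(pv_kw):
    """Flat 2 kW load minus a trapezoidal PV output (illustrative data only)."""
    day = []
    for i in range(24 * SAMPLES_PER_HOUR):
        t = i / SAMPLES_PER_HOUR  # time of day in hours
        if 8 <= t < 17:
            pv = pv_kw
        elif 7 <= t < 8:
            pv = pv_kw * (t - 7)    # generation ramps up
        elif 17 <= t < 18:
            pv = pv_kw * (18 - t)   # generation ramps down
        else:
            pv = 0.0
        day.append(2.0 - pv)
    return day


if __name__ == "__main__":
    print("with PV:   ", pv_features(synthetic_day(1.5)))
    print("without PV:", pv_features(synthetic_day(0.0)))
```

Averaging these two values over many days would give summary statistics that could then serve as the evidence in whichever prior-based approach is chosen. Any decision threshold on them would have to be set from real data.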

How to compare two independent factors on 3 different crops statistically? Which test should I use?

63 Recommendations

Mohamed Badawy

asked a question related to Advanced Statistical Analysis

Question

9 answers

May 3, 2022

I have two different formulations of one active ingredient that were tested on 3 crops (A, B, C) using the same concentration and environmental conditions. I want to check whether these two formulations act the same on those crops or differ significantly. n = 10 and the data are normally distributed. I would like to compare the means for each of the 3 crops between the two formulations, for example the mean of crop A treated with formulation X against the mean of crop A treated with formulation Y of the same active ingredient. The aim is to see whether they act similarly in terms of the residue detected on those 3 crops.

I have the feeling that using an ordinary ANOVA is not correct. Any advice?

Relevant answer

Valentine Joseph Owan

May 3, 2022

Answer

It sounds like an independent t-test case. You can use an independent t-test to compare the mean of crop A treated with formulation X and the mean of crop A treated with formulation Y of the same active ingredient. Then you repeat the comparison for crops B and C. That makes three separate independent t-tests in this case.
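A minimal Python sketch of the three comparisons, using only the standard library, might look like the following. The residue values are invented for illustration, the function name pooled_t is hypothetical, and equal variances are assumed. The critical value 2.101 is the two-sided 5% point of the t distribution with 18 degrees of freedom (two groups of n = 10).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_x, sample_y):
    """Student's two-sample t statistic and degrees of freedom (equal variances assumed)."""
    nx, ny = len(sample_x), len(sample_y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(sample_x) + (ny - 1) * variance(sample_y)) / df
    t = (mean(sample_x) - mean(sample_y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


# Hypothetical residue measurements: crop -> (formulation X, formulation Y), n = 10 each.
RESIDUES = {
    "A": ([0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51, 0.54, 0.46],
          [0.61, 0.58, 0.63, 0.60, 0.57, 0.62, 0.59, 0.64, 0.56, 0.60]),
    "B": ([1.10, 1.05, 1.12, 1.08, 1.15, 1.07, 1.11, 1.09, 1.13, 1.06],
          [1.09, 1.12, 1.06, 1.10, 1.14, 1.08, 1.11, 1.07, 1.13, 1.05]),
    "C": ([0.30, 0.28, 0.33, 0.31, 0.29, 0.32, 0.27, 0.34, 0.30, 0.31],
          [0.35, 0.33, 0.37, 0.36, 0.32, 0.38, 0.34, 0.36, 0.35, 0.33]),
}

T_CRIT_DF18 = 2.101  # two-sided 5% critical value for 18 degrees of freedom

if __name__ == "__main__":
    for crop, (form_x, form_y) in RESIDUES.items():
        t, df = pooled_t(form_x, form_y)
        verdict = "differ" if abs(t) > T_CRIT_DF18 else "no significant difference"
        print(f"crop {crop}: t = {t:.2f}, df = {df}, {verdict} at the 5% level")
```

Because three tests are run on the same study, a multiple-comparison correction (for example Bonferroni, testing each at alpha/3) may be worth considering. That would require a larger critical value than the one used here.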

Analysing data about diet habits regarding BMI?

6 Recommendations

Aleksandar P. Medarevic

asked a question related to Advanced Statistical Analysis

Question

11 answers

Apr 23, 2022

Dear researchers,

I need to analyse my data about nutrition habits. My aim is to examine differences with respect to BMI. Therefore, I split the population into three groups: <5th, 5th to 85th, and >85th percentile of BMI. I calculated BMI on the entire population of 7th-grade children (perhaps I need to calculate it for boys and girls separately). An example of the data is attached.

The questions are:

Should I calculate BMI for each gender?
How can I produce meaningful results? It is hard to explain findings with a large number of groups. My data set is large, so almost every chi-square test gives p < .05.
Could Pearson residuals be a solution?
Any insight is welcome.

Thanks.

[Attachment: data.xlsx, 8.87 KB]

Relevant answer

D. Eastern Kang Sim

Apr 25, 2022

Answer

If it's BMI for 7th graders, the index must be standardized by norms. Researchers tend to use the BMI z-score based on the CDC norms (google search BMIz CDC), but there are other standards as well (WHO, IOTF, etc.). The BMI z-score is adjusted for age and gender. Note that it's not a simple standardization, as the z-score is based on growth trajectories. Typically, you need to run a SAS macro to compute BMI z-scores, but a Canadian pediatric group has developed a Shiny app to compute these scores for you (https://cpeg-gcep.shinyapps.io/who2007_cpeg/ ; they have CDC options as well). If you are studying nutrition or feeding 'habits', I assume you have repeated observations. In that case, you would run a linear mixed-effects model adjusting for baseline BMI and other factors (google search for LME or analysis of repeated observations). Depending on how the data were collected, things can get quite complicated. I encourage you to consult a biostatistician for further guidance.
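On the Pearson residuals raised in the question: for a contingency table they are (observed - expected) / sqrt(expected) per cell, and their squares sum to the chi-square statistic, so they show which cells drive a significant result. The Python sketch below is only an illustration. The table of counts (three BMI groups by a yes/no habit) is invented, and the function name pearson_residuals is hypothetical.

```python
from math import sqrt


def pearson_residuals(table):
    """Return per-cell Pearson residuals (O - E) / sqrt(E) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    return residuals


def chi_square(table):
    """Chi-square statistic as the sum of squared Pearson residuals."""
    return sum(r * r for row in pearson_residuals(table) for r in row)


# Hypothetical counts: rows = BMI group (<5th, 5th to 85th, >85th percentile),
# columns = habit reported (yes, no).
COUNTS = [
    [40, 60],
    [900, 600],
    [150, 250],
]

if __name__ == "__main__":
    for label, row in zip(("<5th", "5th-85th", ">85th"), pearson_residuals(COUNTS)):
        print(label, [round(r, 2) for r in row])
    print("chi-square:", round(chi_square(COUNTS), 2))
```

Cells with residuals that are large in absolute value are the ones departing most from independence. With a very large sample even small departures become significant, so the residuals are more useful for locating differences than for judging their practical importance.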

Network Analysis Question: what is the minimum sample size (for each group) for comparing two networks?

28 Recommendations

Mohamad reza Davoudi

asked a question related to Advanced Statistical Analysis

Question

7 answers

Apr 22, 2022

Hello

I want to compare two psychiatric groups in terms of their connections.

However, because of our financial limitations, we need to know the minimum sample size.

Relevant answer

Gioacchino de Candia

Apr 24, 2022

Answer

The sample size always depends on the reference population.

Also, you need to clearly set the goals of the analysis.

Which one of these multilevel models is better? Should the random equation variables also be added as covariates?

12 Recommendations

asked a question related to Advanced Statistical Analysis

Question

4 answers

Apr 21, 2022

Which one of these multilevel models is better? Should the random equation variables also be added as covariates?

Model A: with random equation variables as covariates

Model B: without random equation variables as covariates

* Model A gave the same results as a routine ologit. So, if model A is better than model B, what is the point of using multilevel mixed models (given that the result is the same as with ologit)?

[Attachment: question.png, 29.87 KB]

Relevant answer

Kelvyn Jones

Apr 22, 2022

Answer

You may want to deepen your understanding of multilevel random coefficient models by using the resources given here:

http://www.bristol.ac.uk/cmm/software/mlwin/mlwin-resources.html

and

http://www.bristol.ac.uk/cmm/learning/online-course/

The online course includes instructions for Stata.

Modules

Using quantitative data in research
Introduction to quantitative data analysis
Multiple regression
Multilevel structures and classifications
Introduction to multilevel modelling
Regression models for binary responses
Multilevel models for binary responses
Multilevel modelling in practice: Research questions, data preparation and analysis
Single-level and multilevel models for ordinal responses
Single-level and multilevel models for nominal responses
Three-level multilevel models
Cross-classified multilevel models
Multiple membership multilevel models
Missing Data
Multilevel Modelling of Repeated Measures Data