
Regression - Science topic

Explore the latest questions and answers in Regression, and find Regression experts.
Questions related to Regression
  • asked a question related to Regression
Question
2 answers
How do we evaluate the importance of individual features for a specific property using ML algorithms (say, GBR), and how do we construct an optimal feature set for our problem?
image taken from: 10.1038/s41467-018-05761-w
Relevant answer
Answer
It all depends on the endpoint you wish to analyze or evaluate. Linear regression analysis is fine if you have a linear process.
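Since the question mentions GBR specifically, here is a minimal sketch (using scikit-learn, with a synthetic dataset standing in for the real descriptors) of how impurity-based and permutation importances can be compared and a reduced feature set selected; the threshold and data are placeholders, not a prescription:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real descriptor matrix
X, y = make_regression(n_samples=500, n_features=20, n_informative=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Impurity-based importances (fast, but can be biased)
impurity_imp = gbr.feature_importances_

# Permutation importances on held-out data (slower, usually more trustworthy)
perm = permutation_importance(gbr, X_test, y_test, n_repeats=20, random_state=0)

# Keep features whose permutation importance is clearly above zero
selected = np.where(perm.importances_mean > 2 * perm.importances_std)[0]
print("Selected feature indices:", selected)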
  • asked a question related to Regression
Question
4 answers
Hello all,
I am running into a problem I have not encountered before with my mediation analyses. I am running a simple mediation X > M > Y in R.
Generally, I concur that the total effect does not have to be significant for there to be a mediation effect, and in the case I am describing this would be a logical occurrence, since the effects of paths a and b are both significant (-.142 and .140, respectively), thus resulting in a 'null effect' for the total effect.
However, my c path (X > Y) is not 'non-significant' as I would expect; rather, the regression does not fit (see below):
(Residual standard error: 0.281 on 196 degrees of freedom Multiple R-squared: 0.005521, Adjusted R-squared: 0.0004468 F-statistic: 1.088 on 1 and 196 DF, p-value: 0.2982).
Usually I would say you cannot interpret models that do not fit, and since this path is part of my model, I hesitate to interpret the mediation at all. However, the other paths do fit and are significant. Could the non-fitting also be a result of the paths cancelling one another?
Note: I am running bootstrapped results for the indirect effects, but the code does utilize the 'total effect' path, which does not fit on its own, therefore I am concerned.
Note 2: I am working with a clinical sample, therefore the sample size is not as large as I'd like (group 1: 119; group 2: 79; N = 198).
Please let me know if additional information is needed and thank you in advance!
Relevant answer
Answer
Somehow it is not clear to me what you mean by "does not fit". Could you please provide the output of the whole analysis? I think this would be helpful.
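For context, a minimal sketch of the kind of analysis under discussion (two regressions for paths a and b plus a percentile bootstrap of the indirect effect a*b), written in Python with statsmodels; the column names X, M, Y and the simulated data are only placeholders for the real clinical variables:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated stand-in for the clinical sample (N = 198)
n = 198
X = rng.normal(size=n)
M = -0.14 * X + rng.normal(size=n)
Y = 0.14 * M + rng.normal(size=n)
df = pd.DataFrame({"X": X, "M": M, "Y": Y})

def indirect_effect(d):
    a = smf.ols("M ~ X", d).fit().params["X"]        # path a
    b = smf.ols("Y ~ M + X", d).fit().params["M"]    # path b
    return a * b

# Percentile bootstrap of the indirect effect
boot = [indirect_effect(df.sample(n, replace=True)) for _ in range(2000)]
ci = np.percentile(boot, [2.5, 97.5])
print("Indirect effect:", indirect_effect(df), "95% CI:", ci)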
  • asked a question related to Regression
Question
2 answers
I am evaluating the impacts of major health financing policy changes in Georgia (the country). The database is at the household level and is not panel data. The continuous outcome variable is out-of-pocket health spending (OOPs), and it exhibits a skewed distribution as well as seasonality. The residuals are positively autocorrelated. The regression also takes into account independent variables connected with each household's characteristics. My goal is to evaluate the impact of health policies on the financial wellbeing of the population in connection with health care utilization determinants. Should I aggregate the dataset or keep it as it is?
Relevant answer
Answer
Thank you for the information
  • asked a question related to Regression
Question
9 answers
Here is the case: as I said, I am working on how macroeconomic variables affect REIT index returns. To understand how macroeconomic variables affect REITs, which tests or estimation methods should I use?
I know I can use OLS, but is there any other method to use? All my time series are stationary at I(0).
Relevant answer
Answer
You can use econometric methods such as regression analysis, Vector Autoregression (VAR), or Granger causality tests to analyze how macroeconomic variables affect REIT index returns.
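A minimal sketch of the VAR and Granger-causality approach mentioned above, using statsmodels in Python; the file name, column names (reit_return, inflation, interest_rate) and lag orders are hypothetical and would need to be replaced with the actual series:
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import grangercausalitytests

# df is assumed to hold stationary I(0) series (hypothetical names)
df = pd.read_csv("macro_reit.csv", parse_dates=["date"], index_col="date")

# Fit a VAR, letting an information criterion pick the lag order
var_res = VAR(df[["reit_return", "inflation", "interest_rate"]]).fit(maxlags=8, ic="aic")
print(var_res.summary())

# Granger causality: does inflation help predict REIT returns?
grangercausalitytests(df[["reit_return", "inflation"]], maxlag=4)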
  • asked a question related to Regression
Question
10 answers
In the domain of clinical research, where the stakes are as high as the complexities of the data, a new statistical aid emerges: bayer: https://github.com/cccnrc/bayer
This R package is not just an advancement in analytics - it’s a revolution in how researchers can approach data, infer significance, and derive conclusions
What Makes `Bayer` Stand Out?
At its heart, bayer is about making Bayesian analysis robust yet accessible. Born from the powerful synergy with the wonderful brms::brm() function, it simplifies the complex, making the potent Bayesian methods a tool for every researcher’s arsenal.
Streamlined Workflow
bayer offers a seamless experience, from model specification to result interpretation, ensuring that researchers can focus on the science, not the syntax.
Rich Visual Insights
Understanding the impact of variables is no longer a trudge through tables. bayer brings you rich visualizations, like the one above, providing a clear and intuitive understanding of posterior distributions and trace plots.
Big Insights
Clinical trials, especially in rare diseases, often grapple with small sample sizes. `Bayer` rises to the challenge, effectively leveraging prior knowledge to bring out the significance that other methods miss.
Prior Knowledge as a Pillar
Every study builds on the shoulders of giants. `Bayer` respects this, allowing the integration of existing expertise and findings to refine models and enhance the precision of predictions.
From Zero to Bayesian Hero
The bayer package ensures that installation and application are as straightforward as possible. With just a few lines of R code, you’re on your way from data to decision:
# Installation
devtools::install_github("cccnrc/bayer")

# Example usage: Bayesian logistic regression
library(bayer)
model_logistic <- bayer_logistic(
  data = mtcars,
  outcome = 'am',
  covariates = c('mpg', 'cyl', 'vs', 'carb')
)
You then have plenty of functions to further analyze your model; take a look at bayer.
Analytics with An Edge
bayer isn’t just a tool; it’s your research partner. It opens the door to advanced analyses like IPTW, ensuring that the effects you measure are the effects that matter. With bayer, your insights are no longer just a hypothesis — they’re a narrative grounded in data and powered by Bayesian precision.
Join the Brigade
bayer is open-source and community-driven. Whether you’re contributing code, documentation, or discussions, your insights are invaluable. Together, we can push the boundaries of what’s possible in clinical research.
Try bayer Now
Embark on your journey to clearer, more accurate Bayesian analysis. Install `bayer`, explore its capabilities, and join a growing community dedicated to the advancement of clinical research.
bayer is more than a package — it’s a promise that every researcher can harness the full potential of their data.
Explore bayer today and transform your data into decisions that drive the future of clinical research: bayer - https://github.com/cccnrc/bayer
Relevant answer
Answer
Many thanks for your efforts!!! I will try it out as soon as possible and will provide feedback on github!
All the best,
Rainer
  • asked a question related to Regression
Question
2 answers
SVM regression based on PSO optimisation
Relevant answer
Answer
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from pyswarm import pso

# Create a simple dataset
X, y = make_classification(n_samples=100, n_features=5, n_informative=3, n_redundant=2, random_state=42)

# Define the objective function to be minimized by PSO
def svm_pso_loss(params):
    C, gamma = params
    # Ensure the parameters are positive and within a reasonable range
    if C <= 0 or gamma <= 0:
        return float('inf')  # return infinity if parameters are out of bounds
    # Define the SVM classifier
    model = SVC(C=C, gamma=gamma)
    # Negative mean cross-validated score (we need to minimize it)
    neg_accuracy = -cross_val_score(model, X, y, cv=5).mean()
    return neg_accuracy

# Set bounds for C and gamma
lb = [0.01, 0.001]  # lower bounds of C and gamma
ub = [100, 10]      # upper bounds of C and gamma

# Run PSO
best_params, best_score = pso(svm_pso_loss, lb, ub, swarmsize=50, maxiter=100)
print("Best Parameters: C={}, gamma={}".format(best_params[0], best_params[1]))
print("Best Score: {}".format(-best_score))
  • asked a question related to Regression
Question
1 answer
I have three variables (A, B, C) and do a multilevel SEM with R - Lavaan.
I do not understand why the following two models render different regression coefficients:
in the first one I use the already aggregated latent variables from the sheet directly; in the second one I define them within the model, but the underlying data are of course the same.
Could anybody please explain why that is and which model would be the right one to use?
1.) "
level: 1
A ~ B + C
level: 2
A ~ B + C
"
2.)"
level: 1
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
C =~ c1 + c2 + c3
A ~ B + C
level: 2
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
C =~ c1 + c2 + c3
A ~ B + C
"
thanks so much for any help!
Relevant answer
Answer
Hello! I have the same question - did you find out the cause of the problem? I get really weird results when defining my latent variables in the SEM model, but just fine results when I use the aggregated variables.
  • asked a question related to Regression
Question
2 answers
Dear All,
I have imagery with a single fish species within each image, along with a list of morphometric measurements of the fish (length, width, length of tail, etc.). I would like to train a CNN model that will predict these measurements using only the images as input. Any ideas what kind of architecture is ideal for this task? I have read about multi-output learning, but I haven't found a practical implementation in Python.
Thank you for your time.
Relevant answer
Answer
Thank you Aldo for your suggestion. I can see the general framework.
Cheers!
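For reference, a minimal sketch of a multi-output CNN regressor in Keras, assuming fixed-size RGB images and, say, four morphometric targets per image; the architecture, input size and variable names are placeholders rather than a recommended design:
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MEASUREMENTS = 4  # e.g. length, width, tail length, body depth (placeholder)

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(N_MEASUREMENTS, activation="linear")(x)  # one regression output per measurement

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(images, measurements, validation_split=0.2, epochs=50)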
  • asked a question related to Regression
Question
2 answers
I have collected data at the community level using cluster sampling. The ICC shows >10% variability at the cluster level. However, I don't have any relevant variable at the cluster level (all variables are at the household and individual levels).
In that case, can I run a multilevel regression without having any cluster-level variable?
Thanks!
Relevant answer
Answer
Yes, you can run a multilevel model without level 2 predictors. This is sometimes referred to as a random coefficient regression analysis. In that analysis, you would simply model potential variability in the level-1 regression intercept and/or slope coefficients across clusters (level-2 units) without explaining that variability by level-2 predictors. This allows you to properly take into account the non-independence that arises from cluster sampling that is shown by your ICC.
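A minimal sketch of such a random-intercept model in Python with statsmodels, using hypothetical file and column names; only level-1 (household/individual) predictors enter the fixed part, while the intercept varies across clusters:
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to have household-level rows with a 'cluster' identifier
df = pd.read_csv("household_survey.csv")

# Random-intercept model: level-1 predictors only, intercept varies across clusters
model = smf.mixedlm("outcome ~ hh_income + age + education", data=df, groups=df["cluster"])
result = model.fit()
print(result.summary())

# A random slope for a level-1 predictor could be added via re_formula, e.g.:
# smf.mixedlm("outcome ~ hh_income", df, groups=df["cluster"], re_formula="~hh_income")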
  • asked a question related to Regression
Question
1 answer
My topic is the study of the energy status of construction materials, which is why all of their parameters are needed: calculation, comparison, regression, correlation, etc.
If you have any ideas about this, please share them.
Thank you all.
Relevant answer
Answer
Certainly! Conducting a study on the energy status of construction materials is a fascinating and relevant topic. Here are some reference databases and research articles that you can explore to gather information on energy status, efficiency, and related parameters of construction materials:
Research Articles and Journals:
  1. Energy and Buildings - A peer-reviewed journal focusing on the energy efficiency, sustainability, and environmental impacts of buildings and construction materials.
  2. Construction and Building Materials - A journal covering research on the properties, performance, and sustainability of construction materials, including energy-related aspects.
  3. Journal of Cleaner Production - A multidisciplinary journal publishing research on cleaner production, sustainable development, and energy efficiency in various industries, including construction.
  4. Applied Energy - A journal focusing on energy engineering, energy efficiency, and sustainable energy systems. It includes articles related to the energy status and efficiency of construction materials.
  5. Building Research & Information - A journal publishing research on building science, construction technology, and sustainable building practices, including energy-efficient materials and systems.
Keywords to Search:
When searching these databases and journals, consider using the following keywords and phrases to find relevant articles and research papers:
  • Energy efficiency in construction materials
  • Energy status of building materials
  • Thermal properties of construction materials
  • Life cycle energy analysis of materials
  • Embodied energy of building materials
  • Energy-efficient construction materials
  • Sustainable construction materials
  • Energy consumption and efficiency in construction
  • asked a question related to Regression
Question
2 answers
Dear Colleagues,
Does anyone know about Universities that are offering (a) Ph.D. by prior publication (b) Ph.D. by portfolio?
I have two publications viz."Regression Testing in Era of Internet of Things and Machine Learning" and "Regression Testing and Machine Learning". The former has touched 1k+ copies and has a rating of 4.04 and the latter is a recent publication with 200+ copies with a rating of 4.04. This data is as per BookAuthority.org.
Also, the former is indexed in prestigious searches such as Deutsche Nationalbibliothek (DNB), GND Network, Crossref Metadata Search, and OpenAIRE Explore.
Any leads or pointers would be greatly appreciated.
Best Regards,
Abhinandan(919886406214).
References
Relevant answer
Answer
Thanks for the information and insight.
Best Regards,
Abhinandan.
  • asked a question related to Regression
Question
2 answers
Hello everyone and thank you for reading my question.
I have a data set with around 2,000 data points. It has 5 inputs (4 well rates, and the 5th is time) and 2 outputs (cumulative oil and cumulative water). See the attached image.
I want to build a proxy model to simulate the cumulative oil & water.
I have built 5 models (ANN, Extreme Gradient Boosting, Gradient Boosting, Random Forest, SVM) and used GridSearch to tune the hyperparameters, and the training results are good. Of course, I split the data into training, test, and validation sets.
I also have another data set that I did not include in the training, test, or validation sets, and when I use the models to predict the output for this data set the results are bad (the models fail to predict).
I think the problem lies in the data itself, because the only input parameter that changes is the (days) parameter while the others remain constant.
But I can't remove the well rates or join them into a single variable, because after the proxy model has been built I want to optimize the well rates to maximize cumulative oil and minimize cumulative water.
Is there a solution to such an issue?
Relevant answer
Answer
To everyone who faced this problem: this type of data is called time-series data, and it requires specific algorithms for building the proxy models (e.g., RNN, LSTM).
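A minimal sketch of the LSTM idea in Keras, assuming the inputs have been arranged into sliding windows of shape (samples, timesteps, 5 features) and the targets are the two cumulative outputs; all shapes and names are placeholders:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

TIMESTEPS, N_FEATURES, N_OUTPUTS = 30, 5, 2  # placeholder dimensions

inputs = layers.Input(shape=(TIMESTEPS, N_FEATURES))
x = layers.LSTM(64)(inputs)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(N_OUTPUTS)(x)  # cumulative oil and cumulative water

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# X_windows: (n_samples, TIMESTEPS, N_FEATURES), y: (n_samples, N_OUTPUTS)
# model.fit(X_windows, y, validation_split=0.2, epochs=100)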
  • asked a question related to Regression
Question
2 answers
Hi,
I am trying to evaluate the impact of gender quotas on women's political engagement. I am using the World Values Survey data on different countries over the period 1981-2009. I wish to run a country and time fixed-effects regression of gender quotas on a variable while controlling for age. However, age in the survey is divided into categories; how can I recode it for my regression? Should I use binning to control for age, or should I use the mean values of the categories?
Relevant answer
Answer
If age is an ordinal (ordered categorical) variable in your data set, you can simply add that variable as another predictor to your regression model.
  • asked a question related to Regression
Question
12 answers
It is known that we can use regression analysis to limit the effect of confounding variables on the main outcome. But what if the entire sample has a confounding variable affecting the main outcome; will regression analysis still be applicable and reliable?
For example, a study was done to investigate the role of a certain intervention in cognitive impairment. The entire population included was old-aged (more than 60 years old), which means that age here is a risk factor (covariate) in the entire sample, and it is well known that age is a significant independent risk factor for cognitive impairment.
My questions here are: Will the regression here be of real value? Will it totally remove the effect of age and give us the clear effect of the intervention on cognitive impairment?
Relevant answer
Answer
Yes, of course, adjusting for age will remove the confounding factor of age. Actually, adjusting the model for a confounding factor is one of two ways to remove its effect when checking the effect of another variable.
Look at this example:
if the equation of model with only cognitive score as x and the outcome as y is like this:
y = 1 + 5*x1
and equation of a model with only age:
y = 2 + 3*x2
Assuming that both variables have an additive effect (which may not be true),
then the final equation should be something like this:
2*y = 1 + 5*x1 + 2 + 3*x2 = 3 + 5*x1 + 3*x2
and it can be written like this: 2 * y/2 = (3 + 5*x1 + 3*x2)/2
y = 1.5 + 2.5* x1 + 1.5 * x2
And adding more and more variables to the model will definitely affect the coefficients
The other way is by stratification: you can just select a homogeneous sub-sample and fit a model, and so on, as I mentioned in the previous comment.
There is another way, "propensity score matching / propensity-weighted analysis", which uses the same model as the previous regression and adds weights to the patients; these can be used in a weighted analysis or a scoring analysis (but I don't encourage it; in most cases it doesn't work unless you have a large sample).
This is the general picture, but the question remains:
Are these results conclusive? Absolutely not; the only way to eliminate confounding factors is randomized controlled trials or a very large sample of the population, like big clinical registries or a census.
In my opinion, and from experience, most studies group patients like this: "<18", "18-45", "46-65", "> 65".
In your case it deserves a good letter to the editor to criticize an obvious mistake, and I would ask the authors to send me their data so I could replicate their results; there is a very big mistake if they did so.
>>> Tooth loss and age are correlated variables and can't be included in the same model, as they will produce a collinearity problem. <<<
  • asked a question related to Regression
Question
4 answers
In the use of spectral indices for estimating corn yield, why is it that when I put the average of the total index at the farm level into the equation generated from the regression, the predicted yield is closer to the actual yield even though the coefficient of determination is weak?
# spectralindices
#predictedyield
#RS
Relevant answer
Answer
Thank you. You helped a lot.
  • asked a question related to Regression
Question
13 answers
I am doing land-use projection using the Dyna-CLUE model, but I am stuck with the error "Regression can not be calculated due to a large value in cell 0,478". I would appreciate any advice you can provide to solve this error.
Relevant answer
Answer
Thanks to your suggestion I could identify my problem, since I found the variable that was causing it.
Do you have any idea about this error?
ERROR: no solution; program terminated
thanks for your help.
  • asked a question related to Regression
Question
3 answers
I am conducting a meta-analysis and I want to use nonlinear polynomial regression and spline functions to model the dose-response relationship between the parameters of interest.
I would appreciate any help or suggestions.
Thank you very much.
Relevant answer
Answer
You can use this:
but polynomial regression is very dangerous to use since it doesn't explain anything and it can never be extrapolated.
  • asked a question related to Regression
Question
10 answers
My question looks at the influence of simulation on student attitudes. My professor would like me to do regression analysis, but he says to do two regressions. I have my pre-test data and post-test data; the only other information I have is the students' college. What I found in my class materials seems to indicate that I can run a regression using the post-test as my dependent variable and the pre-test as my independent variable in SPSS. How would I do another regression? Should I work the colleges in as another variable, and if so, do I enter them as a single grouping variable or do I need to create a variable for each college?
Relevant answer
Answer
I have some questions.
1) Was there some treatment (or intervention) between the baseline and followup scores? If so, did all subjects receive it, or only some of them? And if so to that, how were they allocated to intervention vs control?
2) How many colleges are there? If the number is fairly large, it may be preferable to estimate a multilevel model with subjects at level 1 clustered within colleges at level 2.
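As an illustration of the two options raised above (colleges as dummy-coded predictors versus colleges as a clustering level), a minimal sketch in Python with statsmodels and hypothetical file/column names:
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("attitudes.csv")  # columns assumed: post, pre, college

# Option 1: colleges as dummy-coded predictors (workable when there are only a few colleges)
ols_fit = smf.ols("post ~ pre + C(college)", data=df).fit()
print(ols_fit.summary())

# Option 2: multilevel model with students nested in colleges (preferable with many colleges)
ml_fit = smf.mixedlm("post ~ pre", data=df, groups=df["college"]).fit()
print(ml_fit.summary())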
  • asked a question related to Regression
Question
1 answer
I regress Y on X: direct effect (c)
M is the mediator: I regress M on X (path a) and Y on M (path b)
Total effect = c + a*b
Now I introduce a moderator of the effect between X and Y.
How do I calculate the total effect with both the moderator and mediator effects?
Relevant answer
Answer
If the moderation effect is different from zero (i.e., if there is a moderation/interaction effect), the c path would differ for different values of the moderator variable (Z). Consequently, also the total effect would differ for different values of Z.
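To make this concrete (a sketch, assuming the moderator Z only moderates the X > Y path): the outcome model becomes Y = b0 + c1*X + c2*Z + c3*X*Z + b*M + e, with M = a0 + a*X + e as before. The direct effect of X is then (c1 + c3*Z) and the total effect is (c1 + c3*Z) + a*b, so it has to be reported at chosen values of Z (for example at the mean of Z and one SD above and below it).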
  • asked a question related to Regression
Question
1 answer
I have daily sales data and stock availability for items in a supermarket chain. My goal is to estimate the sales quantity elasticity with respect to availability (represented as a percentage). With this model, I want to understand how a 1% change in availability impacts sales. Currently, single-variable regressions yield low R-squared values. Should I include lagged sales values in the regression to account for other endogenous factors influencing sales? This would isolate availability as the primary exogenous variable
Relevant answer
Answer
Well, I don't get a clear picture of the variables you are considering for your study. Since you are considering daily data and you have more than one variable you can either apply non-linear ARDL(s) or multivariate volatility models depending on the objectives of your study. Bear in mind that the pre-tests are highly instrumental to the choice of models.
Best wishes!
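Independent of the model family chosen, a minimal sketch of the log-log (elasticity) specification with a lagged sales term, in Python with statsmodels and hypothetical file/column names:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("daily_sales.csv")  # columns assumed: date, item, sales, availability (in %)

df = df.sort_values(["item", "date"])
df["log_sales"] = np.log(df["sales"].clip(lower=1))        # guard against log(0)
df["log_avail"] = np.log(df["availability"].clip(lower=1))
df["log_sales_lag"] = df.groupby("item")["log_sales"].shift(1)

# The coefficient on log_avail is the short-run elasticity of sales w.r.t. availability
fit = smf.ols("log_sales ~ log_avail + log_sales_lag + C(item)", data=df.dropna()).fit()
print(fit.params["log_avail"])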
  • asked a question related to Regression
Question
1 answer
I am doing a study focusing on analyzing differences in fish assemblages due to temperature extremes. I calculated Shannon diversity, evenness, richness, and total abundance for each year sampled. The years are grouped into 2 temperature periods essentially as well, which is what I want to overall compare.
On viewing results, there appears to be consistency across years, and when comparing the two groupings. I do have multivariate tests to follow this after for community composition, but when describing univariate results, are there any statistical tests that can be followed up with to better show there is no difference, rather than simply describing the numbers and their mean differences?
Relevant answer
Answer
Hi Alana Barton It would be good to have more information here to be able to help more, but have you tried a GLM with both years and temperatures included in the model? Perhaps you'd also need to add an interaction effect between temperature and year (as from what you said there seems to be an interaction). Further explanatory variables could be added to the model if you have measured them.
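A minimal sketch of the suggested model in Python with statsmodels, assuming one row per year, a 'period' factor for the two temperature groupings, and Shannon diversity as the response (names are placeholders); the interaction term tests whether the year trend differs between the two periods:
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("diversity_by_year.csv")  # columns assumed: year, period, shannon

# Main effects plus a year-by-period interaction
fit = smf.ols("shannon ~ year * C(period)", data=df).fit()
print(fit.summary())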
  • asked a question related to Regression
Question
7 answers
Dear all,
I am sharing the model below that illustrates the connection between attitudes, intentions, and behavior, moderated by prior knowledge and personal impact perceptions. I am seeking your input on the preferred testing approach, as I've come across information suggesting one may be more favorable than the other in specific scenarios.
Version 1 - Step-by-Step Testing
Step 1: Test the relationship between attitudes and intentions, moderated by prior knowledge and personal impact perceptions.
Step 2: Test the relationship between intentions and behavior, moderated by prior knowledge and personal impact perceptions.
Step 3: Examine the regression between intentions and behavior.
Version 2 - Structural Equation Modeling (SEM)
Conduct SEM with all variables considered together.
I appreciate your insights on which version might be more suitable and under what circumstances. Your help is invaluable!
Regards,
Ilia
Relevant answer
Answer
Ilia, some thoughts on your model. According to your path diagram you have 4 moderator effects. For such a large model, you need a large sample size to detect all moderator effects simultaneously. Do you have a justification for all of these nonlinear relationships?
Some relationships in the path diagram are missing. First, prior knowledge, personal impact, and attitude should be correlated - these are the predictor variables. Second, prior knowledge and personal impact should have direct effects on the dependent variables behavioral intentions and behavior (this is necessary).
As this model is quite complex, I would suggest starting by analyzing the linear model. If this model fits the data well, then I would include the interaction effects one by one. Keep in mind that you need to use a robust estimation method for parameter estimation because of the interaction effects. If these effects exist in the population, then behavioral intentions and behavior should be non-normally distributed.
Kind regards, Karin
  • asked a question related to Regression
Question
1 answer
Hello everyone, for my dissertation I have two predictor variables and one criterion variable. One of the predictor variables has 5 domains and no global score, so in that case can I use multiple regression, or do I have to perform stepwise linear regression separately for the 6 predictors (5 domains plus the other predictor), keeping in mind the assumption of multicollinearity?
Relevant answer
Answer
There are two different issues here. The first is with regard to stepwise regression, which is a very old-fashioned technique that is no longer widely accepted. Instead, you should indeed use multiple regression.
The other issue is with regard to multicollinearity. Since your predictors will almost certainly be inter-correlated, you will have some degree of multicollinearity. But this goes back to your wanting to keep the 5 domains separate, since it is their degree of inter-correlation that creates the multicollinearity.
Have you considered using structural equation modeling, or exploratory factor analysis, to clarify whether your 5 domains truly are statistically distinct, as opposed to being indicators of a single larger domain?
  • asked a question related to Regression
Question
1 answer
Dear Scientists and Researchers,
I'm thrilled to highlight a significant update from PeptiCloud: new no-code data analysis capabilities specifically designed for researchers. Now, at www.pepticloud.com, you can leverage these powerful tools to enhance your research without the need for coding expertise.
Key Features:
PeptiCloud's latest update lets you:
  • Create Plots: Easily visualize your data for insightful analysis.
  • Conduct Numerical Analysis: Analyze datasets with precision, no coding required.
  • Utilize Advanced Models: Access regression models (linear, polynomial, logistic, lasso, ridge) and machine learning algorithms (KNN and SVM) through a straightforward interface.
The Impact:
This innovation aims to remove the technological hurdles of data analysis, enabling researchers to concentrate on their scientific discoveries. By minimizing the need for programming skills, PeptiCloud is paving the way for more accessible and efficient bioinformatics research.
Join the Conversation:
  1. How do you envision no-code data analysis transforming your research?
  2. Are there any other no-code features you would like to see on PeptiCloud?
  3. If you've used no-code platforms before, how have they impacted your research productivity?
PeptiCloud is dedicated to empowering the bioinformatics community. Your insights and feedback are invaluable to us as we strive to enhance our platform. Visit us at www.pepticloud.com to explore these new features, and don't hesitate to reach out at [email protected] with your thoughts, suggestions, or questions.
Together, let's embark on a journey towards more accessible and impactful research.
Warm regards,
Chris Lee
Bioinformatics Advocate & PeptiCloud Founder
Relevant answer
Answer
I think they remove the need for programming skills and make data analysis much easier to do quickly and efficiently! For the future, I look forward to more no-code functions being added to meet a wider range of research needs. As with the no-code platforms I have used before, a lot of time is spent on data processing and analysis, and no-code tools will make that work easier and easier.
  • asked a question related to Regression
Question
3 answers
Hi, I'm currently writing my psychology dissertation, in which I am investigating "how child-oriented perfectionism relates to behavioural intentions and attitudes towards children in a chaotic versus calm virtual reality environment".
Therefore I have 3 predictor/independent variables: calm environment, chaotic environment, and child-oriented perfectionism.
My outcome/dependent variables are: behavioural intentions and attitudes towards children.
My hypotheses are:
  1. Participants will have more negative behavioural intentions and attitudes towards children in the chaotic environment than in the calm environment.
  2. These differences (highlighted above) will be magnified in participants high in child-oriented perfectionism compared to participants low in child-oriented perfectionism.
I used a questionnaire measuring child-oriented perfectionism, which yields a score. Participants then watched the calm environment video and answered the behavioural intentions and attitudes towards children questionnaires in relation to the children shown in that video. Participants then watched the chaotic environment video and answered the same questionnaires in relation to the children in that video.
I am unsure whether to use multiple linear regression or a repeated-measures ANOVA with a continuous moderator (child-oriented perfectionism) to answer my research question and hypotheses. Please, can someone help?
Relevant answer
Answer
1. participants will have more negative behavioural intentions and attitudes towards children in the chaotic environment than in the calm environment.
--- because there were only two conditions (levels of your factor), you can use a paired t-test (or wilcoxon if nonparametric) to compare the behavioral intentions/attitudes between the calm and chaotic environment where the same participants were subjected to both environments.
2. these differences (highlighted above) will be magnified in participants high in child-oriented perfectionism compared to participants low in child oriented perfectionism.
--- indeed this is a simple linear regression (not a multiple one). You can start by creating a new dependent variable (y) as the difference in behavioural intentions/attitudes between the calm and chaotic environments, and then regress it on the perfectionism score (x).
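A minimal sketch of those two steps in Python (scipy and statsmodels), with hypothetical column names intent_calm, intent_chaotic and perfectionism:
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("vr_study.csv")  # columns assumed: intent_calm, intent_chaotic, perfectionism

# Hypothesis 1: paired comparison of the two environments
t, p = stats.ttest_rel(df["intent_chaotic"], df["intent_calm"])
print("paired t-test:", t, p)

# Hypothesis 2: does perfectionism predict the size of the chaotic-minus-calm difference?
df["diff"] = df["intent_chaotic"] - df["intent_calm"]
fit = smf.ols("diff ~ perfectionism", data=df).fit()
print(fit.summary())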
  • asked a question related to Regression
Question
3 answers
How can I interpret the two examples below in a mediation analysis? Please help me.
1) with negative indirect and total effect, positive direct effect
Healthy pattern (X)
Sodium Consumption (M)
Gastric Cancer (Y)
Total Effect: Negative (-0.29)
Indirect Effect: Negative (-0.44)
Direct Effect: Positive (0.14)
Mediation percentage: 100%
2) With total and direct negative effect, positive indirect effect
Healthy pattern (x)
Sugar consumption (m)
Gastric Cancer (Y)
Total Effect: Negative (-0.42)
Indirect Effect: Positive (0.03)
Direct Effect: Negative (-0.29)
Mediation percentage: 10.3%
Relevant answer
Answer
The interpretation depends on all aspects, whether positive or negative; simply put, the advantages and disadvantages.
  • asked a question related to Regression
Question
2 answers
I ran an OLS regression on panel data in EViews, and then 2SLS and GMM regressions.
I introduced all the independent variables of the OLS as instrumental variables.
I am getting exactly the same results under the three methods.
Is there any mistake in running the models?
I am also attaching the results.
Thanks in advance.
Relevant answer
Answer
OLS (Ordinary Least Squares), 2SLS (Two-Stage Least Squares), and GMM (Generalized Method of Moments) are all statistical methods used in econometrics.
These tools are used in different contexts and under different assumptions, so they do not generally produce identical results. Note, however, that if the instrument set is exactly the set of OLS regressors, as you describe, 2SLS reduces algebraically to OLS, which would explain obtaining the same estimates.
  • asked a question related to Regression
Question
2 answers
In his 1992 paper, (Psychological Assessment 1992, Vol.4, No. 2,145-155) Tellegen proposed a formula to calculate the uniform T score.
UT = B0 + B1*X + B2*X^2 + B3*X^3
B0 is the intercept, X the raw score, and B1, B2, and B3 the regression coefficients of the linear, squared, and cubic terms, respectively.
What is the intercept? How do you calculate the intercept (B0)?
How do you calculate the regression coefficients? Are they between the raw score and the percentile? Why 3 different regression coefficients?
Relevant answer
Answer
Yes, the formula to obtain a linear T score is quite simple to apply. The question is more about the uniform T score or normalized T score, in order to address the skewness and kurtosis of the different scales.
  • asked a question related to Regression
Question
2 answers
Suppose I compute a least squares regression with the growth rate of y against the growth rate of x and a constant. How do I recover the elasticity of the level of y against the level of x from the estimated coefficient?
Relevant answer
Answer
The elasticity of y with respect to x is defined as the percentage change in y resulting from a one-percent change in x, holding all else constant. In the context of your regression model, where you have regressed the growth rate of y (which can be thought of as the percentage change in y) against the growth rate of x (the percentage change in x), the estimated coefficient on the growth rate of x is an estimate of this elasticity directly.
Here's why: If you run the following regression:
Δ%y=a+b(Δ%x)+ϵ
where Δ%y is the growth rate of y (dependent variable), Δ%x is the growth rate of x (independent variable), a is the constant term, b is the slope coefficient, and ϵ is the error term, the coefficient b represents the change in Δ%y for a one-unit change in Δ%x. Because Δ%y and Δ%x are already in percentage terms, the coefficient b is the elasticity of y with respect to x.
So, if you have estimated the coefficient b from this regression, you have already estimated the elasticity. There is no need to recover or transform the coefficient further; the estimated coefficient b is the elasticity of y with respect to x.
It's important to note that this interpretation assumes that the relationship between y and x is log-linear, meaning the natural logarithm of y is a linear function of the natural logarithm of x, and the model is correctly specified without omitted variable bias or other issues that could affect the estimator's consistency.
  • asked a question related to Regression
Question
2 answers
In most of the studies tobit regression is used, but in the tobit model my independent variable is not significant. Is fractional logistic regression also an appropriate technique to explore the determinants of efficiency?
Relevant answer
Answer
When using efficiency scores as a dependent variable in subsequent regression analysis, researchers often encounter the issue of these scores being bounded between 0 and 1, which violates the assumption of unboundedness in standard linear regression models. To address this issue, fractional regression models, such as the fractional logistic regression, are employed as they are designed specifically for dependent variables that are proportions or percentages confined to the (0,1) interval.
Fractional logistic regression, based on the quasi-likelihood estimation, can be used to model relationships where the dependent variable is a fraction or proportion, which is exactly the nature of technical efficiency scores resulting from DEA. Therefore, it is suitable to apply fractional logistic regression in a two-stage DEA analysis where the first stage involves calculating the efficiency scores, and the second stage seeks to regress these scores on other explanatory variables to investigate what might influence the efficiency of the DMUs.
This two-stage approach, where the DEA is used first to compute efficiency scores and then fractional logistic regression is used in the second stage, helps to avoid the potential biases and inconsistencies that might arise if standard linear regression techniques were used with bounded dependent variables. It is an appropriate statistical technique for dealing with the special characteristics of efficiency scores and can provide more reliable insights into the factors influencing DMU efficiency.
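A minimal sketch of that second-stage fractional logit in Python with statsmodels, using hypothetical file and column names (DEA efficiency scores in (0,1) regressed on candidate determinants), with robust standard errors:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("dea_scores.csv")  # columns assumed: efficiency (0-1), size, age, ownership

# Fractional logit: GLM with a binomial family and logit link applied to a proportion outcome
model = smf.glm("efficiency ~ size + age + C(ownership)", data=df,
                family=sm.families.Binomial())
result = model.fit(cov_type="HC1")  # robust (sandwich) standard errors
print(result.summary())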
  • asked a question related to Regression
Question
2 answers
If I want to carry out innovative research based on Wasserstein regression, from what other perspectives can I pursue statistical innovation? Specifically: (1) combining it with a Bayesian framework, introducing prior distributions and performing parameter estimation based on Bayes' rule to obtain more reliable estimates; (2) introducing variable selection techniques to automatically select the predictor distributions that have explanatory power for the response distribution, in order to obtain a sparse interpretation.
Can the above questions be regarded as a highly innovative research direction?
Relevant answer
Answer
Hi,
Incorporating a Bayesian framework and variable selection into Wasserstein Regression could be a valuable contribution to the field. These methods could enhance the robustness and clarity of the models, offering a meaningful advancement in statistical analysis, particularly for complex data.
Just share my thoughts.
  • asked a question related to Regression
Question
1 answer
I would like to utilise the correct regression equation for conducting Objective Optimisations using MATLAB's Optimisation Tool.
When using Design Expert, I'm presented with the Actual factors or Coded factors for the regression equation. However, with the Actual Factors, I'm presented with multiple regression equations since one of my input factors was a categoric value. In this categoric value, the factors were, Linear, Triangular, Hexagonal and Gyroid. As a result, I'm unsure which Regression equation to utilise from the actual factors image.
Otherwise, should I utilise the single regression equation which incorporates all of them? I feel like I'm answering my own question and I really should be using the Coded Factors for the regression equation, but I would like some confirmation.
I used one of the regression equations under "Actual Factors" where Linear is seen, but I fear that this did not incorporate all of the information from the experiment. So any advice would be most appreciated.
Most appreciated!
Relevant answer
Answer
Multi-objective optimisation is appropriate.
  • asked a question related to Regression
Question
4 answers
Can one independent variable in a multilevel regression be derived from two other independent variables, such as type, token, and the type/token ratio?
Relevant answer
Answer
You can see the following books for more details
1. Using Multivariate Statistics by Tabachnick and Fidell
2. Hair - Multivariate Data Analysis, 7e
3. Applied Multivariate Techniques - Subhash Sharma
  • asked a question related to Regression
Question
13 answers
Can we use inferential statistics like correlation and regression if we use a non-probability sampling technique like convenience or judgement sampling?
Relevant answer
Answer
Yes, it is possible.
Statistical tests such as relationship tests, regression, comparisons, etc., do not require randomness of the sample.
Each test has its own specific conditions.
They can be applied to any block of data that meets the conditions, regardless of the type of sampling method used.
However, the sampling method affects the generalizability of the results.
Best wishes
  • asked a question related to Regression
Question
1 answer
Phonetics - What is progressive and regressive assimilation and dissimilation in the Romance languages (especially Spanish), and how do you recognize it?
I am searching for an explanation, a good source recommendation where I could read more about this topic, and some examples of assimilation or dissimilation in the Romance languages, especially in Spanish. Thank you for helping me!
Relevant answer
Answer
The assimilation phenomena are the influences that one sound exerts on another. We have progressive assimilation when the preceding sound changes the characteristics of the following one: in [aθ't̪̟eka], the voiceless interdental fricative consonant interdentalized the voiceless dental occlusive consonant. On the other hand, regressive assimilation takes place when the following sound influences the preceding one: in [an̪dalu'θia], the voiced dental occlusive consonant dentalized the alveolar nasal, but here we also have progressive assimilation, because the alveolar nasal made the voiced dental occlusive consonant maintain its occlusion. Finally, dissimilation is the process by which two sounds vary their characteristics in order to emphasise the difference between them: an example could be the evolution of the medieval Spanish sibilants.
  • asked a question related to Regression
Question
4 answers
I found explanations on Google but can't understand them properly.
Relevant answer
Answer
Correlation measures the strength and direction of a linear relationship between two variables, while regression goes a step further by modeling and predicting the impact of one or more independent variables on a dependent variable. Correlation does not imply causation, merely showing association, whereas regression can provide insights into potential cause-and-effect relationships.
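A small numerical illustration of that difference (a sketch in Python): the Pearson correlation r is symmetric and unitless, while the regression slope rescales r by the ratio of standard deviations and therefore depends on which variable is treated as the outcome.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]                       # correlation: strength and direction only
slope_yx = r * y.std(ddof=1) / x.std(ddof=1)      # regression slope of y on x
slope_xy = r * x.std(ddof=1) / y.std(ddof=1)      # regression slope of x on y (different!)

print(round(r, 3), round(slope_yx, 3), round(slope_xy, 3))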
  • asked a question related to Regression
Question
3 answers
Hi,
I have some confusion as to whether a binomial regression or a logistic regression model is better for my outcomes. Currently I am working on judicial decisions (outcomes) in tax courts, where the cases go either in favour of the assessee or the taxman. The factors influencing the judges, as reflected in the cases, are represented by presence (1) or absence (0): if a factor is not considered in the final judgment it takes '0', else '1'. If the outcome is favourable to the assessee it is '1', else '0'. Which would be the best approach to put this into a regression model showing the relationship between the outcome (dependent) and the independent factors (maybe 5-6 variables)? I need some guidance on this. Can I use any other, better model for forecasting, after I perform a bootstrap run of, say, 1000 simulations and then compute average outcomes and related statistics?
Relevant answer
Answer
In your case, the appropriate choice would be logistic regression rather than binomial regression. Logistic regression is specifically designed for binary outcomes, which seems to align with your scenario where the judicial decisions can go either in favor of the assessee (1) or the taxman (0).
Logistic regression models the probability of a binary outcome, and it's well-suited for situations where the dependent variable is categorical and has two possible outcomes. Binomial regression, on the other hand, is a more general term that can encompass logistic regression as a special case, but it's not the same thing. Logistic regression is a type of binomial regression.
Given that you have a binary outcome and you want to model the relationship between this outcome and several independent variables (factors), logistic regression would be the more appropriate choice.
As for incorporating bootstrapping for more robust estimates, that's a good approach. By running simulations and generating multiple bootstrap samples, you can assess the stability of your model and obtain more reliable estimates of model parameters. This can be especially helpful when dealing with a limited number of observations.
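A minimal sketch of the recommended logistic regression in Python with statsmodels, using hypothetical file and column names (a binary outcome plus a handful of 0/1 factor indicators); odds ratios are obtained by exponentiating the coefficients:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tax_cases.csv")  # columns assumed: outcome (0/1), factor1 ... factor5 (0/1)

fit = smf.logit("outcome ~ factor1 + factor2 + factor3 + factor4 + factor5", data=df).fit()
print(fit.summary())

# Odds ratios are often easier to communicate than raw coefficients
print(np.exp(fit.params))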
  • asked a question related to Regression
Question
4 answers
I've often seen the following steps used to test for moderating effects in a number of papers, and I don't quite understand the usefulness of Model 1 (which only tests the effect of the control variable on the dependent variable) and Model 4 (which only adds the cross-multiplier term of one of the moderating variables with the independent variable). These two models seem redundant.
Relevant answer
Answer
Burke D. Grandjean Thank you very much for your patience! I agree with you, though I still don't think there is much value in performing this manipulation in the results section.
If the difference between the two coefficients is not large, I cannot be sure if this is because the risk of confounding is low in my study or because my control variables are null (not eliminating the risk of confounding). Conversely, if the difference between the two coefficients is large, again I cannot determine whether this is because I am controlling for confounding well or because of some other problem (such as the multicollinearity problem you describe).
What is the point of this manipulation, given my difficulty in determining the reason for a particular situation? I think the relevant discussion should focus on the relationship of the control variables to the independent and dependent variables, with a dedicated section for detailed discussion and further testing.
  • asked a question related to Regression
Question
1 answer
Suppose we want to model the following by regression:
1- First degree equation
2- 7th-degree polynomial equation
3- A non-linear differential equation
4- A system of two variables and 2 equations
5- A system of second-order differential equations
6- A system of nonlinear equations
And
7- The next step is the non-linear differential equation
If we can model all of these by regression and get the output correctly or with good accuracy, then we can have an approximate model of the system using only data, and this can be a starting point for control or for troubleshooting complex systems for which exact models are not available.
As a test, I will start with 1 and 2, but is it possible to achieve good accuracy with regression for the rest of the cases?
Relevant answer
Answer
While regression analysis is a powerful tool for modeling relationships in data, its applicability to different types of equations varies. Let's explore each case:
First-degree equation:
Regression is highly effective for linear relationships. You can achieve good accuracy modeling a first-degree equation through linear regression.
7th-degree polynomial equation:
Polynomial regression can be used to model higher-degree polynomials. However, as the degree increases, there's a risk of overfitting, and the model may not generalize well to new data. Careful consideration of model complexity is crucial.
Non-linear differential equation:
Non-linear differential equations often involve dynamic systems. While regression may provide some insights, more specialized techniques like differential equation modeling or system identification methods would be more appropriate to capture the dynamic behavior accurately.
System of two variables and 2 equations:
Linear regression can be extended to multiple variables, making it suitable for simple systems. However, for more complex systems, you might need to explore system identification methods or machine learning techniques tailored for system modeling.
System of second-order differential equations:
Modeling second-order differential equations involves understanding the system's dynamics. Traditional regression may not capture these complexities well. System identification methods or dynamic modeling approaches are more suitable.
System of nonlinear equations:
Nonlinear regression techniques can be employed for systems represented by nonlinear equations. However, the accuracy depends on the nature of nonlinearity and the quality and quantity of data.
The next step is the non-linear differential equation:
Similar to the third case, modeling non-linear differential equations requires specialized methods. Ordinary Differential Equation (ODE) solvers or system identification techniques may be more suitable than standard regression.
In summary, while regression is valuable for simple relationships, more complex systems often require advanced techniques tailored to the specific characteristics of the equations involved. Always validate the model's accuracy and generalizability, and consider consulting experts in the relevant field for a comprehensive understanding of the system.
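As a quick illustration of the first two cases (and of the overfitting warning for the 7th-degree polynomial), a sketch with numpy's polynomial fitting: comparing the error on held-out points with the training error shows how a high-degree fit can look excellent in-sample yet generalize poorly.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 2.0 + 3.0 * x + rng.normal(scale=0.2, size=x.size)   # data generated by a first-degree law

x_train, y_train = x[::2], y[::2]      # half the points for fitting
x_test, y_test = x[1::2], y[1::2]      # the rest held out

for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")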
  • asked a question related to Regression
Question
3 answers
In the case of the constant (intercept) coefficient having a VIF greater than 10, what does that mean? Do all the variables in the model exhibit multicollinearity? How can multicollinearity be reduced? It could be reduced by removing variables with VIF > 10, but I don't know what to do with the constant coefficient.
Thank you very much
Relevant answer
Answer
Looking further - your package may be reporting an uncentred VIF in place of or in addition to a centred VIF. There is an apparently unresolved debate in the literature about when or why that's useful. For practical purpose in most regressions it seems likely that high uncentred VIF may not be problematic. I've never seen uncentred VIF used in a published paper ...
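For reference, a minimal sketch of how centred VIFs are typically computed in Python with statsmodels, using hypothetical file and column names; the constant is included in the design matrix so the predictor VIFs are the centred ones, and the VIF reported for the constant column itself is usually not interpreted:
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("predictors.csv")      # columns assumed: x1, x2, x3
X = sm.add_constant(df[["x1", "x2", "x3"]])

vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)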
  • asked a question related to Regression
Question
5 answers
Hello,
I am measuring the thermal stability of a small protein (131 aa) using circular dichroism, following the loss of its secondary structure. The data obtained are normalized to be within 0 and 1, where 0 is the folded protein and 1 is the completely unfolded protein. The CD of the fully unfolded state was calculated from a different experiment on the same batch and taken as the reference. After plotting my data in GraphPad Prism 9, I fit a standard 4PL curve using non-linear regression, constraining the regression to use 0 as the bottom value and 1 as the top value (see attached file). The Tm is reported as IC50 in this screenshot because this formula is often used for calculating IC50 and EC50. However, the resulting fitted line does not seem able to represent my data correctly. I performed this experiment twice, and the replicate test shows that the model is inadequately representing the data. Should I look for a different equation to model my data? Or am I making a mistake in performing this regression? Thank you for the help!
Relevant answer
Answer
To ascertain that your model is applicable to your data, plot the residuals (y-\hat{y}) against the independent variable, they should be randomly distributed. You can perform a “runs-test” as a non-parametric test, or (more work, but also more powerful) a t-test to compare the residuals with the standard deviation of the data. Both methods test for H0: The data are reasonably well described by the regression curve, against H1: The data significantly deviate from the regression curve.
Most commercial software performs non-linear curve fitting with the Marquardt-Levenberg algorithm. I had this fail me on occasions and found that Nelder-Mead's simplex-algorithm is more reliable (DOI:10.1093/comjnl/7.4.308). Disadvantage: you need to get the errors of the fitting parameters from bootstrapping (10.1016/0076-6879(92)10009-3), ML gives them directly.
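As a cross-check outside Prism, a minimal sketch of fitting the constrained sigmoid with scipy, with the bottom and top fixed at 0 and 1 as described in the question, followed by the residual check suggested above; the temperature/fraction data here are simulated placeholders, not real measurements:
import numpy as np
from scipy.optimize import curve_fit

# 4PL with the bottom and top constrained to 0 and 1 (as in the Prism fit described)
def four_pl(T, Tm, hill):
    return 1.0 / (1.0 + (Tm / T) ** hill)

# Placeholder data: temperature (degrees C) and normalized unfolded fraction
T = np.linspace(25, 95, 36)
frac = 1.0 / (1.0 + (68.0 / T) ** 25) + np.random.default_rng(0).normal(0, 0.02, T.size)

params, cov = curve_fit(four_pl, T, frac, p0=[65.0, 20.0])
Tm, hill = params
print("Tm =", Tm)

# Residual check suggested above: residuals should scatter randomly around zero
residuals = frac - four_pl(T, *params)
print("mean residual:", residuals.mean(), "max |residual|:", np.abs(residuals).max())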
  • asked a question related to Regression
Question
16 answers
There are two ways we can go about testing the moderating effect of a variable (assuming the moderating variable is a dummy variable). One is to add an interaction term to the regression equation, Y=b0+b1*D+b2*M+b3*D*M+u, and test whether the coefficient of the interaction term is significant; an alternative approach is to treat the interaction-term model as equivalent to a grouped regression (assuming the moderating variable is a dummy variable), which has the advantage of directly showing the causal effects in the two groups. However, we still need to test the statistical significance of the estimated D*M coefficient by means of the interaction-term model. Such tests are always necessary because between-group heterogeneity cannot be judged by intuition alone.
One of the technical details is that if the group regression model includes control variables, the corresponding interaction term model must include all the interaction terms between the control variables and the moderator variables in order to ensure the equivalence of the two estimates.
If in equation Y=b0+b1*D+b2*M+b3*D*M+u I do not add the cross-multiplication terms of the moderator and control variables, but only the control variables alone, is the estimate of the coefficient on the interaction term still accurate at this point? At this point, can b1 still be interpreted as the average effect of D on Y when M = 0?
In other words, when I want to test the moderating effect of M in the causal effect of D on Y, should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+u or should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+b5*M*C+u?
Reference: 江艇.因果推断经验研究中的中介效应与调节效应[J].中国工业经济,2022(05):100-120.DOI:10.19581/j.cnki.ciejournal.2022.05.005.
Relevant answer
Answer
You are welcome! Don't hesitate to ask further questions.
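To illustrate the equivalence discussed in the question, a small sketch in Python with statsmodels and simulated data: a model in which the treatment D and the control C are both interacted with the dummy moderator M reproduces the coefficients of the two group-wise regressions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
D = rng.integers(0, 2, n)
M = rng.integers(0, 2, n)
C = rng.normal(size=n)
Y = 1 + 0.5 * D + 0.3 * M + 0.4 * D * M + 0.2 * C + rng.normal(size=n)
df = pd.DataFrame({"Y": Y, "D": D, "M": M, "C": C})

# Fully interacted model (the control is also interacted with the moderator)
full = smf.ols("Y ~ D * M + C * M", data=df).fit()

# Group-wise regressions
g0 = smf.ols("Y ~ D + C", data=df[df.M == 0]).fit()
g1 = smf.ols("Y ~ D + C", data=df[df.M == 1]).fit()

print(full.params["D"], g0.params["D"])                        # effect of D when M = 0
print(full.params["D"] + full.params["D:M"], g1.params["D"])   # effect of D when M = 1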
  • asked a question related to Regression
Question
14 answers
I am writing my bachelor thesis and I'm stuck with the Data Analysis and wonder if I am doing something wrong?
I have four independent variables and one dependent variable, all measured on a five point likert scale and thus ordinal data.
I cannot use a normal type of regression (since my data are ordinal, not normally distributed and never will be (transformations could not change that), and also violate homoscedasticity), so I figured ordinal logistic regression. Everything worked out perfectly, but the test of parallel lines in SPSS was significant and thus the assumption of proportional odds was violated. So, I am now considering multinomial logistic regression as an alternative.
However, here I could not find out how to test the following assumption in SPSS: a linear relationship between the continuous variables and the logit transformation of the outcome variable. Does somebody know how to do this???
Plus, I have a more profound question about my data. To get the data on my variables, I asked respondents several questions. My dependent variable, for example, is Turnover Intention, and I used 4 questions on a 5-point Likert scale, so I got 4 different values from everyone about their Turnover Intention. For my analysis I took the average, since I only want one value of Turnover Intention per respondent (and not four). However, the data no longer range over 1, 2, 3, 4, and 5 as with the original 5-point Likert scale; since I took the average I now have decimals like 1.25 or 1.75. This leaves me with many more distinct data points, and I was wondering if my approach makes sense? I was thinking of grouping them together, since my analysis is biased by having so many different categories due to the many decimals.
Can somebody provide any sort of guidance on this??
Relevant answer
Answer
Lisa Ss it doesn't make sense to pool the data that way if you believe that you have ordinal data. You cannot simply calculate a mean or a sum score from the items, since ordinal data don't provide that information; this demands a metric scale. Therefore, your "average" score is not appropriate, and hence neither is the "grouping".
In my opinion you have several options:
1) Use an ordinal multilevel model to account for the repeated measures and the ordinality.
2) Conduct an ordinal confirmatory factor analysis, calculate the factor score for the latent variable, and use this as a dependent variable in an OLS regression.
3) Do everything with an ordinal SEM, the structural and the measurement model.
4) Treat the ordinal items as metric (not recommended).
Maybe others have different approaches, please share.
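To make option 2 concrete, here is a minimal R sketch using lavaan (the item names ti1-ti4 and the data frame dat are hypothetical; adapt them to your Turnover Intention items):

library(lavaan)

# One latent Turnover Intention factor measured by four Likert items
model <- 'TI =~ ti1 + ti2 + ti3 + ti4'

# Declaring the items as ordered makes lavaan use a categorical (WLSMV-type) estimator
fit <- cfa(model, data = dat, ordered = c("ti1", "ti2", "ti3", "ti4"), std.lv = TRUE)
summary(fit, fit.measures = TRUE)

# Factor scores that can then serve as the dependent variable in a follow-up regression
dat$TI_score <- lavPredict(fit)[, "TI"]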
  • asked a question related to Regression
Question
5 answers
In my Ms thesis, I calculated my data by the process of linear regression but my supervisor added me also step-wise linear regression.
Relevant answer
Answer
stepwise methods seem like the answer to the problem of having a lot of possible predictors and not knowing which ones to put into your model. In fact, the problem is that you don’t know which variables to put into your model. Specifying the variables in the model should be done based on your hypothesis, which should build on previous research. Collecting data without a planned model is like shopping without having a recipe in mind. You end up with half the ingredients needed for half a dozen recipes. Science – same thing.
If you don’t have a hypothesised model and you’ve gone ahead anyway and accumulated data, there are still very important reasons for not letting stepwise methods act as a substitute for having a theory and a hypothesis. Briefly, these are :
1. The p-values for the variables in a stepwise model do not have the interpretation you think they do. It’s hard to define what hypothesis they actually test, or the chances that they are false-positive or false-negative.
2. The variables selected may not be the best subset of variables either. There may be other equally good, or even better, combinations of variables. One simple solution is to test all possible subsets of variables. And, like all simple solutions to complex problems, it's wrong. You end up with an unreproducible, atheoretical model that has sacrificed any generalisability to the task you gave it, which was fitting a particular sample of data.
3. The overall model fit statistics are wrong. The adjusted R2 is too big, and if there were a lot of variables not included in the final model, the adjusted R2 will be a massive overestimate. R2 should be adjusted based on the number of variables entered into the process, not on the number actually selected.
4. Stepwise models produce unreproducible results. A different dataset will, most likely, give a different model, and a stepwise model from one dataset fitted to a new dataset will fit badly.
5. But the most important argument is that stepwise models break a fundamental assumption of statistics, which is that the model is specified in advance and then the model coefficients are calculated from the data. If you allow the data to specify the model, as well as the coefficients, all bets are off. See the Stata FAQ :
I can do no better than quote Kelvyn Jones, a geography researcher significant enough to have his own Wikipedia page: "There is no escaping of the need to think; you just cannot press a button."
Essentially, stepwise methods break the first rule of data analysis :
The software should work; the analyst should think.
  • asked a question related to Regression
Question
3 answers
I have retrieved a study that reports a logistic regression; the OR for the dichotomous outcome is 1.4 for the continuous variable ln(troponin). This means the odds increase by 40% for every roughly 2.7-fold (e-fold) increase in troponin; but is there any way of calculating the OR for a 1-unit increase in the troponin variable?
I want to meta-analyze many logistic regressions, for which I need them to be in the same format (i.e., some use the variable ln(troponin) and others troponin). (No individual patient data is available.)
Relevant answer
Answer
Just for the sake of completeness: it might be possible if there is a meaningful reference concentration of troponin you could refer to, but I doubt that there is such a value.
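If the goal is only to put the studies on a common scale, one option (given that no individual patient data are available) is to re-express every study's OR per multiplicative change in troponin rather than per absolute unit; a small R sketch of the algebra:

or_per_ln_unit <- 1.4              # OR for a 1-unit increase in ln(troponin)
b <- log(or_per_ln_unit)           # regression coefficient on ln(troponin)

# OR for a k-fold increase in troponin is exp(b * log(k)) = or_per_ln_unit^log(k)
or_per_doubling <- exp(b * log(2)) # about 1.26 for a doubling of troponin
or_per_doubling

An OR per 1 absolute unit of troponin, by contrast, is not constant under a log model and cannot be recovered without choosing a reference concentration, which is the point made in the answer above.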
  • asked a question related to Regression
Question
3 answers
For instance, when using OLS, the objective of the study could be
# to determine the effect of A on B
could this kind of objective hold when using threshold regression?
Relevant answer
Answer
Thank you
  • asked a question related to Regression
Question
2 answers
Hi folks!
Let's say that I have two lists / vectors "t_list" and "y_list" representing the relationship y(t). I also have numerically computed dy/dt and stored it into "dy_dt_list".
The problem is that "dy_dt_list" contains a lot of fluctuations, even though I know from physical theory that it MONOTONICALLY DECREASES.
1) Is there a simple way in R or Python to carry out a spline regression that reproduces the numerical values of dy/dt(t) in "dy_dt_list" as best it can UNDER THE CONSTRAINT that it keeps decreasing? I thus want to get a monotonically decreasing (dy/dt)_spline as the output.
2) Is there is a simple way in R or Python to carry out a spline regression that reproduces the numerical values of y(t) as best it can UNDER THE CONSTRAINT that (dy/dt)spline keeps decreasing? I thus want to get y_spline as the output, given that the above constraint is fulfilled.
I'd like to avoid having to reinvent the wheel!
P.S: I added an example to clarify things!
Relevant answer
Answer
Hi!
There is the C library "GNU Scientific Library" (GSL): see chapter 29, "Numerical Differentiation". It is free software.
There is also the IMSL library (with Fortran, C, ..., Python interfaces). Perhaps its documentation will be sufficient support? The documentation is free.
Best regards.
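If R is acceptable, a minimal sketch of part 1 using the cobs package (constrained B-splines) is given below; the vector names t_vec and dy_dt_vec are hypothetical stand-ins for the question's lists, and the exact arguments should be checked against the package documentation:

library(cobs)

# Fit a spline to the noisy derivative under a monotone-decreasing constraint
fit <- cobs(t_vec, dy_dt_vec, constraint = "decrease")
dy_dt_smooth <- predict(fit, t_vec)[, "fit"]   # constrained, smoothed dy/dt values

For part 2, one possible route is a shape-constrained additive model (the scam package), since requiring a decreasing derivative of y(t) is the same as requiring y(t) to be concave; scam offers concave smooth bases for that purpose.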
  • asked a question related to Regression
Question
4 answers
I have the OR of a logistic regresion that used the independent variable as continuous. I also have the ORs of 2x2 tables that dichotomized the variable (high if >0.1, low if < 0.1).
Is there any way I can merge them for a meta-analysis, i.e., can the OR of the regression (the OR for a 1-unit increase) be converted to an OR for High vs Low?
Relevant answer
Answer
Hello Santiago Ferriere Steinert. These two ORs are from different studies, right? How many ORs do you have in total? If I had only the two ORs you describe, I think I would just report them separately. If they were two ORs of a much larger number of ORs, and all but that one were from models that treated the X-variable as continuous, I might compare the OR from the 2x2 table to the pooled estimate of the OR from the other studies. But I think more information is needed. HTH.
  • asked a question related to Regression
Question
4 answers
#QuestionForGroup Good day, I'm using the Lovibond photometer for water analysis, but I noticed in its analysis handbook that the calibration function is as shown in the attached photo. Is it the inverted equation of Beer's law, and why does it use polynomial regression? Can you clarify the derivation and purpose of this equation?
Relevant answer
Answer
Apparently they use linear regression. The higher order coefficients are 0.
  • asked a question related to Regression
Question
3 answers
We know that bone is an active tissue with continuous remodelling (bone growth and resorption). Is atherosclerosis a static process once it has formed, or can it regress? If the conditions for lipid oxidation stopped, could an atheroma regress spontaneously?
Relevant answer
Answer
Dear Dr. Ahmed Mahdy
When one is diagnosed with atherosclerosis, the most one can do is prevent progression and further complications. Medical treatment, regular exercise, and dietary changes can be used to keep atherosclerosis from getting worse and to stabilize the plaque, but they are not able to reverse the disease. For instance, aspirin's blood-thinning qualities are beneficial in reducing blood clots and thus preventing strokes and heart attacks, but it has no effect in reducing arterial plaque.
Or, the use of statins which are the most effective and commonly used cholesterol-lowering medications. They work by blocking the protein in the liver that the body uses to make low-density lipoprotein (LDL), or bad cholesterol. The lower one knocks the LDL down, the more likely it is that one will get the plaque to stop growing.
Best.
  • asked a question related to Regression
Question
4 answers
Suppose one has 40 or 50 survey questions for an exploratory analysis of a phenomenon, several of which are intended to be dependent variables, but most independent. A MLR is conducted with e.g. 15 IVs to explain the DV, and maybe half turn out to be significant. Now suppose an interesting IV warrants further investigation, and you think you have collected enough data to at least partially explain what makes this IV so important to the primary DV. Perhaps another, secondary model is in order... i.e. you'd like to turn a significant IV from the primary model into the DV in a new model.
Is there a name for this regression or model approach? It is not exactly nested, hierarchical, or multilevel (I think). The idea, again, is simply to explore what variables explain the presence of IV.a in Model 1, by building Model 2 with IV.a as the DV, and employing additional IVs that were not included in Model 1 to explain this new DV.
I am imagining this as a sort of post-hoc follow up to Model 1, which might sound silly, but this is an exploratory social science study, so some flexibility is warranted, imo.
Relevant answer
Answer
If you have coherent subsets of your variables (i.e., they all measure essentially the same thing), then you can create scales that are stronger measures than any of the variables taken alone.
I have consolidated a set of references on this approach here:
  • asked a question related to Regression
Question
13 answers
When I run a regression analysis, the Model Summary table shows a very weak R-square, such as 0.001 or 0.052, and the sig. value in the ANOVA table is greater than 0.05. How can I fix this?
Relevant answer
Answer
Unless you have an error in your data, this may just simply be the result of the analysis (i.e., that your predictor(s) is/are only weakly related to, and do not significantly predict, the dependent variable).
  • asked a question related to Regression
Question
4 answers
I have a data set with six categorical variables, with responses on a scale of 1-5; the reliability test for the individual variables is very strong, but when all variables are combined the reliability test gives very low figures. What could be the problem? Also, what would be an appropriate regression for this analysis?
Relevant answer
Answer
@Christian I used Cronbach's Alpha in both instances
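In case it helps others reading along, a minimal R sketch of the same check (items is a hypothetical data frame holding the 1-5 scored items intended to form one scale):

library(psych)

alpha(items)   # Cronbach's alpha plus item-total statistics

A very low alpha when the items are combined usually means they do not form a single homogeneous scale; the item-total ("r.drop") column and an exploratory factor analysis can help locate the items that do not belong together.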
  • asked a question related to Regression
Question
7 answers
If we have a research study (analysis of factors affecting sustainable agriculture, ...), most previous studies have used techniques such as regression to analyze its data. To identify effective factors, is it possible to use the exploratory factor analysis technique?
Relevant answer
Answer
Yes you can use exploratory factor analysis
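A minimal base-R sketch of such an exploratory factor analysis (dat is a hypothetical data frame holding the candidate indicators as numeric columns):

efa <- factanal(dat, factors = 2, rotation = "promax", scores = "regression")

print(efa, cutoff = 0.3)   # show loadings above 0.3
head(efa$scores)           # factor scores usable in a follow-up regression

The number of factors is an assumption here; inspect a scree plot or fit statistics before settling on it.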
  • asked a question related to Regression
Question
2 answers
I am planning to assess the effect of different income diversification strategies on rural household welfare. Considering simultaneous causality between the different livelihood strategies and welfare indicators, the Two-Stage Least Squares (2SLS) method with instrumental variables will be applied to estimate the impact of the strategies on household welfare.
Please check the attached file also. I just need to know which regression was used in table 4 of this paper and which tool (SPSS, STATA, R, etc.) I need to use to analyse the data.
Relevant answer
Answer
Thank you so much Ahlam Hanash Gatea .Can you please give an example of how to perform 2SLS methods in R?
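While waiting for a reply, here is a minimal R sketch of 2SLS with the AER package; all variable names are hypothetical placeholders, not the specification used in the attached paper:

library(AER)

# welfare: outcome; diversification: endogenous regressor; z1, z2: instruments;
# x1, x2: exogenous controls (the part after | must include the exogenous regressors)
fit <- ivreg(welfare ~ diversification + x1 + x2 | z1 + z2 + x1 + x2, data = dat)

summary(fit, diagnostics = TRUE)   # weak-instrument, Wu-Hausman and Sargan diagnostics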
  • asked a question related to Regression
Question
2 answers
I performed 2SLS.
In the robust version I found endogeneity, but I did not find it in the non-robust version.
Are the results from the robust version valid? I need your help.
Non Robust options
Tests of endogeneity
H0: Variables are exogenous
Durbin (score) chi2(1) = .242302 (p = 0.6225)
Wu-Hausman F(1,613) = .227544 (p = 0.6335)
. estat overid
Tests of overidentifying restrictions:
Sargan (score) chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial
Variable | R-sq. R-sq. R-sq. F(2,613) Prob > F
-------------+------------------------------------------------------------
TURN_1 | 0.1681 0.1152 0.0632 20.6714 0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
| Shea's Shea's
Variable | partial R-sq. adj. partial R-sq.
-------------+------------------------------------
TURN_1 | 0.0632 0.0036
--------------------------------------------------
Minimum eigenvalue statistic = 20.6714
Critical Values # of endogenous regressors: 1
H0: Instruments are weak # of excluded instruments: 2
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS size of nominal 5% Wald test | 19.93 11.59 8.75 7.25
LIML size of nominal 5% Wald test | 8.68 5.33 4.42 3.92
---------------------------------------------------------------------
Robust options
Tests of endogeneity
H0: Variables are exogenous
Robust score chi2(1) = 2.99494 (p = 0.0835)
Robust regression F(1,613) = 2.77036 (p = 0.0965)
. estat overid, forcenonrobust
Tests of overidentifying restrictions:
Sargan chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
Score chi2(1) = .514465 (p = 0.4732)
. estat overid
Test of overidentifying restrictions:
Score chi2(1) = .514465 (p = 0.4732)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(2,613) Prob > F
-------------+------------------------------------------------------------
TURN_1 | 0.1681 0.1152 0.0632 13.7239 0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
| Shea's Shea's
Variable | partial R-sq. adj. partial R-sq.
-------------+------------------------------------
TURN_1 | 0.0632 0.0036
--------------------------------------------------
Relevant answer
Answer
The estat endogenous command in Stata can be used to test for the endogeneity of regressors in an instrumental variables (IV) model. The robust version of the test is designed to be more reliable in the presence of heteroskedasticity and autocorrelation in the errors.
The non-robust version of the test is based on the assumption that the errors are i.i.d. (independently and identically distributed). This assumption may not be valid in many real-world applications. If the errors are heteroskedastic or autocorrelated, the non-robust test may be unreliable and may produce false positives.
The robust version of the test is based on a different set of assumptions that are more likely to be valid in real-world applications. The robust test is therefore more reliable than the non-robust test, especially in the presence of heteroskedasticity and autocorrelation in the errors.
Which version of the test should you use?
It is generally recommended to use the robust version of the estat endogenous test, unless you are confident that the errors in your model are i.i.d. The robust test is more reliable and is less likely to produce false positives.
How do you interpret the results of the test?
The estat endogenous test produces a chi-squared statistic and a p-value. If the p-value is less than your chosen significance level (e.g., 0.05), then you can reject the null hypothesis of exogeneity. This means that there is evidence of endogeneity in the regressor(s) being tested.
Example
The following Stata code shows how to use the estat endogenous command to test for the endogeneity of the regressor x1:
ivregress 2sls y x2 (x1 = z1 z2), vce(robust)
estat endogenous
The output of the estat endogenous command will include a chi-squared statistic and a p-value. If the p-value is less than your chosen significance level, then you can reject the null hypothesis of exogeneity for x1.
Conclusion
The robust version of the estat endogenous test is more reliable than the non-robust version, especially in the presence of heteroskedasticity and autocorrelation in the errors. It is generally recommended to use the robust version of the test.
  • asked a question related to Regression
Question
2 answers
What does the unstandardized regression coefficient in simple linear regression mean?
Whereas in multiple linear regression, unstandardized regression coefficients tell how much change in Y is predicted to occur per unit change in that independent variable (X), *when all other IVs are held constant*. But my question is: in simple linear regression we have only one independent variable, so how should I interpret it?
Relevant answer
Answer
The same, just that there is no "*when all other IVs are held constant*".
It's simply the (expected) change in Y per unit change of X. There are no other variables involved that would need to be "held constant".
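A tiny worked example with R's built-in cars data makes the interpretation concrete:

fit <- lm(dist ~ speed, data = cars)
coef(fit)["speed"]   # about 3.93: each additional unit of speed predicts ~3.9 more units of stopping distance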
  • asked a question related to Regression
Question
4 answers
Hi,
How do I interpret a significant interaction effect between my moderator (Coh) and independent variable (Hos)? The literature states Hos and my dependent variable (PDm) has a negative relationship. The literature also states the moderator (Coh) has a positive relationship with the DV (PDm). My regression co-efficient for the interaction effect is negative. Does this mean Coh is exacerbating the negative effect (i.e., making it worse) or weakening the effect (i.e., making it better)?
I have attached the SPSS output and simple slopes graph.
Thank you!
Relevant answer
Answer
Dear Np No
First, check the sign of the correlation coefficient (Pearson, Spearman, or another, depending on the nature of your data).
Before studying the regression model, you should calculate the correlation coefficients, which tell you the direction and strength of the relationship.
Regarding the interpretation of negative regression or correlation coefficients:
An inverse relationship means one of two things.
First: an increase in the value of the independent variable is associated with a decrease in the value of the dependent variable.
Second: a decrease in the value of the independent variable is associated with an increase in the value of the dependent variable.
Whether an increase or a decrease is desirable depends fundamentally on the variable being studied and its logic.
Example:
A shorter treatment period means a faster recovery, which is a positive thing, whereas a drop in vital indicators below a certain level, such as blood pressure, is negative.
The strength of the effect is expressed by the magnitude of the coefficient once it has been calculated. If you need more detailed help, you can contact me; you are welcome.
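To see the direction of the moderation directly, a minimal R sketch of the simple slopes implied by the fitted interaction (PDm, Hos, Coh and dat are the question's variable names, used here hypothetically):

fit <- lm(PDm ~ Hos * Coh, data = dat)
b <- coef(fit)

# Simple slope of Hos at a given level of Coh: b["Hos"] + b["Hos:Coh"] * Coh
coh_levels <- mean(dat$Coh) + c(-1, 0, 1) * sd(dat$Coh)
b["Hos"] + b["Hos:Coh"] * coh_levels

If the slope becomes more negative as Coh increases (which a negative interaction coefficient implies when the main effect of Hos is negative), Coh exacerbates the negative Hos -> PDm relationship; if it moves toward zero, Coh buffers it.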
  • asked a question related to Regression
Question
2 answers
Hello, I am trying to analyze factors that influence the adoption of technology, and while doing that, I am facing issues with rbiprobit estimation. I have seven years (2015-2021) of balanced panel data containing 2,835 observations. The dependent variable y1 (Adopt2cat), the endogenous variable "BothTechKnowledge," and the instrumental variable "SKinfoAdoptNew" all take values 0 and 1. Although the regression works, I am unsure how to include panel effects in the model. I am using the following commands:
rbiprobit Adopt2cat ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn, endog(BothTechKnowledge = ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn SKinfoAdoptNew)
rbiprobit tmeffects, tmeff(ate)
rbiprobit margdec, dydx(*) effect(total) predict(p11)
If we do not add time variables (year dummies), can we say we have obtained a pooled panel estimation? I kindly request you to guide me through both panel and pooled panel estimation procedures. I have attached the data file for your kind consideration. Thank you very much in advance. Kind regards, Faruque
Relevant answer
Answer
Thank you very much Mr. Usman for your kind reply. It would be great help if you could kindly share the the code.
  • asked a question related to Regression
Question
5 answers
I have 4 groups in my study and I want to analyse the effect of treatment in 4 groups at 20 time points. Which test should I chose?
Relevant answer
Answer
If I understand your question correctly, I suggest you use an RCBD analysis; at the same time, you still have the chance to analyze the data with regression, for each of the 20 time points and/or for all 80 observations collected together. Regards.
  • asked a question related to Regression
Question
7 answers
I did principal component analysis on several variables to generate one component measuring compliance to medication but need understanding on how to use the regression scores generated for that component.
Relevant answer
Answer
Nicco Lopez Tan thanks so much
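For readers doing this in R rather than SPSS, a minimal sketch of the same idea (items, outcome, age, sex and dat are hypothetical names):

pca <- prcomp(items, center = TRUE, scale. = TRUE)
summary(pca)                  # variance explained per component

dat$compliance <- pca$x[, 1]  # scores on the first principal component

fit <- lm(outcome ~ compliance + age + sex, data = dat)   # use the scores like any other predictor
summary(fit)

The component scores are standardized linear combinations of the original items, so their scale is arbitrary; interpret the coefficient relative to the spread of the scores, or standardize them first.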
  • asked a question related to Regression
Question
2 answers
How can I ensure random sampling for customer surveys when a sampling frame is unavailable, given that I need to run a regression?
Relevant answer
Hi,
You can try this,
SD for the Average = SQRT[(SD1^2 + SD2^2 + SD3^2) / 3]
Thanks.
  • asked a question related to Regression
Question
1 answer
I had a few quick questions regarding the output generated by FEAT statistics. I'm currently working with resting-state data and attempting to perform nuisance regression of CSF, WM, Global Signal, motion parameters (standard + extended), and also scrub volumes that exceed a specific threshold of motion using FEAT Statistics.
To scrub specific volumes with excessive motion I generated a confound.txt file that includes columns of 0 each with a single 1 indicating the specific volume that needs to be scrubbed. I selected Standard + Extended Motion Parameters to apply the motion parameters generated during FEAT preprocessing. Additionally, I applied CSF, WM, and Global signal nuisance regressors under full model setup by selecting Custom (1 entry per volume) and including three separate .txt files, each including 1 column of average values per volume (for CSF, WM, or Global). Doing so generated the attached design.png and res4d image.
Is this the correct way to perform nuisance regression? If so, does the output res4d image look correct? It is very difficult to see the actual image relative to the background. Furthermore, is res4d the right image that I should be using if my goal is to extract the time series of ROIs within this fully processed resting state data?
Any help is very much appreciated!
Best,
Brandon
Relevant answer
Answer
Dear Brandon,
The reason your residual map is hard to visualize is that typically the data is demeaned when fitting the GLM. So one strategy to visualize your data as a brain could be to save the mean of the data in time before running the denoising and then adding it back to the residuals afterwards for visualization purposes.
Regarding your denoising strategy, it seems fine to me, even if there are some cons of scrubbing (see https://neurostars.org/t/despiking-vs-scrubbing/2157/9 ). What are regressors 3-6 in your design matrix? They look surprisingly regular to me.
Also, usually you might want to also apply a bandpass filter in this process, which you can do in FEAT.
I hope this helps!
  • asked a question related to Regression
Question
5 answers
The objective here is to determine factor sensitivities or slope coefficients in a multiple ols regression model.
Relevant answer
Answer
I would certainly do that, just to check if your independent variables have a linear influence to the DV or not!
  • asked a question related to Regression
Question
5 answers
When the results of correlation and regression are different, which one should I rely on more? For example, if the correlation of two variables is negative, but the direction is positive in regression or path analysis, how should I interpret the results?
Relevant answer
Answer
Could you give us some data please, so we can have a look what's going on?
  • asked a question related to Regression
Question
1 answer
Global Project: Should we start developing the SIT-USE?
Software Immune Testing: Unified Software Engine (SIT-USE)
Toward Software Immune Testing Environment
Would you like to be part of the funding proposal for SIT-USE?
Would you like to participate in the development of the SIT-USE?
Would you like to support the development of HR SIT-USE?
Keywords: Funding Proposal or Funding, Participation, Support
If you answer yes to any of the questions, don't hesitate to get in touch with me at
[email protected] and write in the subject – The keyword(s)
Despite much progress and research in software technology, testing is still today's primary quality assurance technique. Currently, significant issues in software testing are:
1) Developing and testing software is necessary to meet the new economy market. In this new market, delivering the software on time is essential to capture the market. Software must be produced on time and be good enough to meet the customer's needs.
2) The existing software requirements keep changing as the project progresses, and in some projects, the rate of requirement changes can grow exponentially as the deadline approaches. This kind of rapid software change imposes significant constraints on testing because once a software program changes, the corresponding test cases/scripts may have to be updated. Furthermore, regression testing may have to be performed to ensure that those parts that are supposed to remain unchanged are indeed unchanged.
3) The number of test cases needed is enormous; however, the cost of developing test cases is extremely high.
4) Software development technologies, such as object-oriented techniques, design patterns (such as Decorator, Factory, Strategy), components (such as CORBA, Java's EJB and J2EE, and Microsoft's .NET), agents, application frameworks, client-server computing (such as socket programming, RMI, CORBA, Internet protocols), and software architecture (such as MVC, agent architecture, and N-tier architecture), progress rapidly, while designing and programming towards dynamic and runtime behavior. Dynamic behavior makes software flexible but also makes it difficult to test. Objects can now send a message to another entity without knowing the type of object that will receive the message. The receiver may have just been downloaded from the Internet with no interface definition and implementation. Numerous testing techniques have been proposed to test object-oriented software. However, testing technology is still far behind software development technology.
5) Conventional software testing is generally application-specific, rarely reusable, and is not extensible. Even within a software development organization, software development, and test artifacts are developed by different teams and are described in separate documents. These make test reuse difficult.
As a part of this research, we plan to work toward an automated and immune software testing environment that includes 1. Unified Component-Based Testing (U-CBT); 2. Unified Built-In Test (U-BIT); 3. Unified-End-to-End (U-E2E) Testing; 4. Unified Agent-Based Testing U-ABT); 5. Unified Automatic Test Case Generators (U-ATCG); and 6. Unified Smart Testing Framework (U-STF). The development of this environment is based on the software stability model (SSM), knowledge map (KM): Unified Software Testing (KM-UST), and the notion of software agents. An agent is a computational entity evolving in an environment with autonomous behavior, capable of perceiving and acting on this environment and communicating with other agents.
You are invited to join Unified Software Engineering (USWE)
Relevant answer
Answer
It could help improve the detection and prevention of software vulnerabilities. Few factors:
1. Research and feasibility: Conduct thorough research to understand existing approaches, tools, and techniques related to software immune testing. Evaluate the feasibility of developing a unified software engine and consider the potential challenges and limitations that may arise.
2. Market demand and competition: Assess the market demand for a software immune testing tool. Investigate if similar tools or solutions already exist and analyze their features and limitations. Consider whether there is a need for a unified software engine like SIT-USE and how it would differentiate itself from existing solutions.
3. Resources and expertise: Determine if you have the necessary resources, including skilled developers, researchers, and domain expertise, to undertake the development of SIT-USE. Developing a robust and effective software testing tool requires significant time, effort, and expertise in areas such as software security, testing methodologies, and programming.
4. Collaboration and partnerships: Consider collaborating with experts or organizations specializing in software security or immune system-inspired testing. Partnering with experts in the field can provide valuable insights, guidance, and potential support for the development process.
5. Sustainability and maintenance: Evaluate the long-term sustainability and maintenance of the software immune testing tool. Consider factors such as updates, bug fixes, support, and staying up-to-date with emerging security threats and technologies.
Good luck
  • asked a question related to Regression
Question
15 answers
I am using Stata for testing my hypotheses. What is the right command to get stars above significant coefficients at the 10%, 5%, and 1% levels (two-tailed) and (one-tailed)?
I enclosed an attachment of a regression with stars on significant coefficients at the 10%, 5%, and 1% levels (two-tailed).
What about one-tailed?
Relevant answer
Answer
Use: estimates table, star(.05 .01 .001)
  • asked a question related to Regression
Question
4 answers
I am doing a research project to study the determinants of capital structure. However, I've run into two issues.
After downloading data from Compustat, I noticed there are a lot of missing values amongst the data, and I wonder how I can deal with this data? How is it usually done in finance literature?
The other problem I came across is strange to me: one of my variables, interest expense, includes zero values and sometimes also negative values, which not only does not make sense but also poses issues in calculating the coverage ratio. What do you suggest I do in this case?
I highly appreicate your response.
Best
Saeed
Relevant answer
Answer
Missing data is a common problem in financial data. There are several methods to handle missing data in financial data. One approach is to replace the missing value with a constant value. This can be a good approach when used in discussion with the domain expert for the data we are dealing with. Another approach is to replace the missing value with the mean or median. This is a decent approach when the data size is small but it does add bias. A third approach is to replace the missing value with values by using information from other columns.
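A minimal R sketch of the simplest of those options, median imputation with a missingness flag (dat and the column name xint for interest expense are hypothetical):

med <- median(dat$xint, na.rm = TRUE)

dat$xint_missing <- as.integer(is.na(dat$xint))        # flag the imputed rows
dat$xint_imputed <- ifelse(is.na(dat$xint), med, dat$xint)

Keeping the flag lets later regressions control for, or exclude, the imputed observations; for the zero and negative interest-expense values, a similar flag is safer than silently recoding them.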
  • asked a question related to Regression
Question
1 answer
Recently, I was contacted by a professor who wanted to utilize my PyCaret book in his research. Considering that I support scientific advancement in every way possible, I was happy to collaborate with that person. Furthermore, I have decided to freely provide my book to other researchers interested in utilizing it. Here are the topics covered in the book:
• Regression
• Classification
• Clustering
• Anomaly Detection
• Natural Language Processing
• Time Series Forecasting
• Developing Machine Learning Apps with Streamlit
If you want to acquire the book for research purposes, I encourage you to send some information about your project, so we can discuss this further. You can check the link below for more information, and leave a comment below if you have any questions!
𝗦𝗶𝗺𝗽𝗹𝗶𝗳𝘆𝗶𝗻𝗴 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗣𝘆𝗰𝗮𝗿𝗲𝘁: https://leanpub.com/pycaretbook/
Relevant answer
Answer
There are several machine learning books that can provide you with the theoretical foundations, practical techniques and recent development of machine learning.
While selecting, look out for:
- A book that covers a wide range of topics, from the basics to the advanced, and from the classical to the modern. This way, you can have a comprehensive overview of the field and explore different aspects of machine learning.
- A book that provides clear explanations, proofs, and relevant examples, to help you understand the concepts, methods, and results of machine learning and apply them to your own research problems.
  • asked a question related to Regression
Question
1 answer
In G Power - which should I use for a hierarchical regression with continuous variables? t test - linear multiple regression: fixed model, single regression coefficient OR OR f test - linear multiple regression: fixed model, R2 deviation from zero. What is the difference?
Relevant answer
Answer
Hello Bonnie,
The answer depends on which step in the hierarchy you're trying to evaluate.
1. If your base model includes only the target variable(s) (as in an "unadjusted," "raw," or "crude" model), then use: Fixed model, R2 deviation from zero.
2. If your base model includes only "control" variables, then again, use Fixed model, R2 deviation from zero.
3. If your second-tier model is intended to look at the added explanatory power of one or more target variables, given the "control" or previously entered variables, then use Fixed model, R2 increase.
Good luck with your work.
  • asked a question related to Regression
Question
3 answers
THANKS
Relevant answer
Answer
The criteria to choose between seemingly unrelated regression (SUR) and generalized method of moments (GMM) in EViews 13 depends on the research question you are trying to answer and the assumptions you are willing to make. The SUR method estimates the parameters of the system, accounting for heteroskedasticity and contemporaneous correlation in the errors across equations. The GMM method is a general method for estimating parameters in models where some of the assumptions made by maximum likelihood estimation are not met.
  • asked a question related to Regression
Question
10 answers
I am running an instrumental variable regression.
EViews provides two different estimators for instrumental variables, i.e., two-stage least squares and the generalized method of moments.
How do I choose between the two models?
Thanks in advance
Relevant answer
Answer
The least squares method is, in my experience, more convenient than the method of moments.
In the method of moments, first you have to derive the theoretical moments up to order p if the regression equation consists of p parameters in order to obtain p equations and then to solve the p equations substituting the values of the moments obtained from the sample in the equations.
Derivation of theoretical moments is more difficult than to set up normal equations (required in least squares method) since it depends upon the nature of the probability distribution followed by the parent population of the sample.
  • asked a question related to Regression
Question
2 answers
I need help on how to run RIDGE REGRESSION in EViews. I have installed the add-in in EViews, but am having problems running the regression. Could someone, please help me with the step-by-step video (or even explanation, on how to do this. I am facing deadline.
I will sincerely appreciate a timely response.
Shalom.
Monday
Relevant answer
Answer
Install EViews, then:
  • Open it and load your data.
  • Click on 'Quick' and select 'Estimate Equation'.
  • Select 'Ridge Regression' from the list of estimation methods.
  • Choose the dependent and independent variables.
  • Set the value of the ridge parameter.
  • Click on OK to run the regression.
  • asked a question related to Regression
Question
6 answers
I am working on a SEM model using Mplus. The model includes 2 latent factors each with about 4 dichotomous indicators. The latent factors are regressed onto 5 exogenous predictors (also dichotomous). A dichotomous outcome is, in turn, regressed onto the 2 latent factors. I used WLSMV to estimate the model, which is recommended when the latent factor indicators are dichotomous.
The model fits well but my understanding is that Mplus uses probit regression for the DV and latent factors. And I am not very familiar with how to interpret probit results. So I do not know how to interpret the parameter estimates (the indicator coefficients for each latent factor; the exogenous coefficients for those variables after regressing the latent factor on them; and the coefficients for the DV regressed onto the latent risk factors).
Can anyone point me towards reference material that might walk me through how to interpret (and write-up) the results of this modeling?
Thanks for any help.
James
  • asked a question related to Regression
Question
2 answers
Regression Analysis
Relevant answer
Answer
What do you mean by categorical with more than 2 levels (ordinal)?
In any case, if a variable is categorical and has more than two categories, you should enter it as a set of dummy (indicator) variables. Mahfooz Alam
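An R illustration of that point (y, x1, region and dat are hypothetical names; region has more than two categories):

dat$region <- factor(dat$region)         # declare the variable as categorical

fit <- lm(y ~ x1 + region, data = dat)   # R expands region into k-1 dummy variables
summary(fit)                             # each dummy is contrasted with the reference level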
  • asked a question related to Regression
Question
14 answers
Hello
I am searching for the Panel smooth transition regression Stata code.
Does anyone know of any available code for Stata?
Thank you
Relevant answer
Answer
You can use the “PSTR” package to implement the Panel Smooth Transition Regression model in Stata. The package offers tools for conducting model specification tests, estimating the PSTR model, and evaluating the results
  • asked a question related to Regression
Question
3 answers
Hi everyone
I am using package "XTENDOTHRESDPD" to run a Dynamic panel threshold regression in Stata which is provided here: https://econpapers.repec.org/software/bocbocode/s458745.htm
However, I have the following issue which I could not solve.
To see whether the threshold effect is statistically significant, I am running "xtendothresdpdtest", function after the regression result and I am getting this Error:  "inferieurbt_result not found."
I would really appreciate it if you could guide me in case you have any experience with this function.
Relevant answer
Answer
You can run “xtendothresdpdtest” after using “XTENDOTHRESDPD” in Stata by typing the following command in the Stata command window:
xtendothresdpdtest
This command will test for the statistical significance of the threshold effect in your regression model. If you are getting an error message when running this command, it may be due to a problem with your data or your model specification. You may want to check your data and model specification to ensure that they are correct.
  • asked a question related to Regression
Question
7 answers
Using Stata or R, how can we extract intraclass correlation coefficients (ICCs) for multilevel Poisson and multilevel negative binomial regression?
Relevant answer
Answer
Thank you Mosharop Hossian for your details
  • asked a question related to Regression
Question
2 answers
Dear all
I have a set of balance panel data, i:6, t: 21 which is it overall 126 observation. I decided that 1 dependent variable (y) and 6 independents variables (x1,x2......).
First, I ran unit root tests, which show:
y I(I)
x1 I(0)
x2 I(I)
x3 I(I)
x4 I(0)
X5 I(I)
x6 I(0)
If I would like to run a panel data regression (Pooled, Fixed Effect and Random Effect), is this the correct form for entering the model in EViews:
d(y) c x1 d(x2) d(x3) x4 d(x5) x6
or
Should I put all variables at the same difference level, adding "d()" to all of them?
Please correct me if I am wrong; these are the steps I would like to follow for the statistical part of the panel data analysis:
1. Test Unit Root
2. Panel Regression?
3. ARDL
Relevant answer
Answer
If the variables are stationary at different levels of integration, you can still write the model in EViews by following these steps:
  1. Open the Eviews program.
  2. Load the data you want to use for your model.
  3. Click on the “Quick” menu and select “Estimate Equation”.
  4. In the Equation Specification window, select the variables you want to include in your model.
  5. Click on the “Options” button.
  6. In the Options window, select the appropriate option for handling non-stationary variables (e.g., first differences).
  7. Click on the “OK” button to close the Options window.
  8. Click on the “OK” button to run the model.
In summary, even if the variables are stationary at different levels, you can still estimate the model in EViews by following the steps above and selecting an appropriate transformation (e.g., first differences) for the non-stationary variables.
  • asked a question related to Regression
Question
3 answers
Hello everyone,
In order to compare two clinical methods, we usually use Passing & Bablok (PABA) regression. Most of the time, our samples are larger than n=50, but for the comparison I'm interested in today (method A vs method B), the samples are small (n = 10-15).
The PABA regression validates the equivalence between the two methods (method A vs method B). Indeed, the CI intercept crosses 0 and CI slope crosses 1 :
  • Intercept = -6 and confidence interval (CI) = [-56; 31]
  • Slope = 2. and confidence interval (CI) = [0.5; 4]
However, I have a few points of concern about these results because :
  • The Pearson coefficient is low (r = 0.63),
  • The size of the CI is very large,
  • The coefficient of variation (CV) between the two methods is high (CV > 20%).
Do you know of any criteria or rules that I could add to the analysis of PABA regression that would enable me to improve our validation method ?
Thanks in advance for your help ! :)
Relevant answer
Answer
Hello Morgane,
Be careful: the Passing-Bablok method gives rather approximate confidence intervals, especially for small samples. Moreover, checking whether they contain 0 (for the intercept) and 1 (for the slope) does not prove the equivalence of the methods: if they do not contain these values, you can reasonably reject equivalence, but otherwise you cannot conclude anything (see the theory of equivalence tests).
In particular, if the intervals are very wide (which will happen with small, fairly noisy data, or data not at all aligned on a straight line), you will almost never manage to reject equivalence, simply for lack of power. That is probably what is happening in your case.
What does a graphical representation show?
With a bit more context, we can guide you on methods for agreement studies...
  • asked a question related to Regression
Question
1 answer
I discovered that three independent variables have standardized multiple coefficients (SMC) equal to 1.02 on a dependent variable. Are there any approaches to be considered for modifying a high R2 in a regression?
Relevant answer
Answer
HI,
Do you mean that each independent variable has the same value of 1.02, or that the overall multiple linear regression model has this beta value? In regression, check both the predicted R-squared and the adjusted R-squared, especially when there are many predictor variables.
Check your residual plots; without seeing the residual plots it is hard to recommend a way to better fit your data to the model.
  • asked a question related to Regression
Question
3 answers
I have conducted some ordinal logistic regressions, however, some of my tests have not met the proportional odds assumptions so I need to run multinomial regressions. What would I have to do to the ordinal DV to use it in this model? I'm doing this in SPSS by the way.
Relevant answer
Answer
Hello Hannah Belcher. How did you determine that the proportional odds (aka., parallel lines) assumption was too severely violated? And what is your sample size?
I ask those questions, because the test of proportional odds is known to be too liberal (i.e., it rejects the null too easily), particularly as n increases. You can find some relevant discussion of this (and many other issues) in this nice tutorial:
HTH.
  • asked a question related to Regression
Question
6 answers
Using SPSS, I've studied linear regression between two continuous variables (having 53 values each). I got a p-value of 0.000, which I take to mean the data are not normally distributed; should I use another type of regression?
Relevant answer
Answer
Mohamed Amine Ferradji , you might clarify your question. You got a p-value of 0.000 for what test ? And to what did you apply this test ?
  • asked a question related to Regression
Question
8 answers
In carrying out panel data regression analysis, it is required that Hausman Specification Test be carried out to choose from Fixed Effect or Random Effect estimation approaches. Another theory holds that Breusch-Pagan Lagrange multiplier (LM) test for panel data is also required to choose between Random Effect estimation and Pooled Effect Estimation.
Which of the preliminary tests should come first? Are these tests the final determinants of which estimation approach to deploy?
Relevant answer
Answer
John C Frain Thanks a great deal, Professor. This is helpful.
However, aside a recourse to economic theory, other statisticians have suggested that economic theory should be combined with statistical analysis in determining the choice among Fixed Effect/Random Effect/Pooled Effect.
The above formed part of the major reason I needed clarification on "What are the preliminary tests that will determine the choice among fixed effect regression, random effect regression and pooled effect regression?"
  • asked a question related to Regression
Question
1 answer
Hello community!
I am running a CFA for a within and between subjects design. The research involves the study of students on variables before and after taking an entrepreneurship course. The dependent variables are self-efficacy, with 5 subconstructs, and entrepreneurial intent. The independent variable is the course. Covariates are a continuous age variable, exposure (0, 1, 2, or 3), and experience (0, 1, or 2) (I used dummy variables for these in the ANCOVAs).
I don't have experience with repeated measures CFA and want to make sure I'm doing it correctly. I have attached a picture for the CFA model I have tested. I correlated error terms for the errors of the corresponding measured items. I set the regression weights to equal for both times. I also correlated the latent variables. This study also has multiple groups (female and male), but I believe it does not change anything to the factor structure (I just added separate data sets for those groups in Amos). Please let me know if this assumption is wrong.
  1. Does the model reflect an appropriate way to test whether the factor structure holds across time?
  2. Is it OK that I did not include the covariates or should I?
  3. The model where I constrain the regressions weights to be equal for both times has significantly lower model fit according to the chi square difference test. The model fit is otherwise good for both models TLI & CFI > .9 and RMSEA < .04. Can I argue that theoretically the model should hold and since model fit is good for the constrained model, that it is OK to use it across time? I know the chi square difference test is sensitive to sample size (n > 3,000) but does that matter for the chi square difference test?
  4. Chi square constrained model – Chi square less constrained model -> 11814.262-11644.759=169.503 and df (2ndmodel) – df (1st model) = 2345-2288=57. p < .001
I would appreciate your insight very much!
Thank you,
Heidi
Relevant answer
Answer
Hello Heidi,
Assuming your principal concern (research question) is whether the factor structure holds for this batch of respondents over the length of the course in question, then:
1. Your illustration would correspond to the restricted model, in which respective variable-factor (and second-order) loadings were constrained to be equal. In the unrestricted model, you would remove such constraints. As well, in the unrestricted model, you'd likely not start by assuming correlated error terms.
2. I don't see that inclusion of covariates would help...again, the driving force here is the specific research question you're trying to address.
3. Yes, N makes a difference (that's how one arrives at a value of 11,814). However, given N and your data, your results would appear to suggest that: (a) the constrained model (equal structures) is significantly different from the unrestricted model; (b) both models appear to fit well enough (to your implied criteria) that one could consider the difference to be statistically significant but not of such magnitude as to be of practical significance.
The fact of the matter is, sample to sample variance may often be the only real "culprit" for why one research team claims support for structure "A" for a measure while another claims support for structure "B."
4. See #3.
Males vs. females: Was this a research question? I couldn't tell.
Good luck with your work.
  • asked a question related to Regression
Question
1 answer
In addition to Oaxaca-Blinder decomposition, does exogenous switching regression is applicable to see gender gap in market participation of agricultural product?
Relevant answer
Answer
Both Oaxaca-Blinder decomposition and exogenous switching regression can be used to see the gender gap in market participation of agricultural product, but they have different assumptions and interpretations. Oaxaca-Blinder decomposition assumes that the treatment variable (e.g., gender) is exogenous and does not affect the outcome variable (e.g., market participation) through unobserved factors. It decomposes the mean difference in the outcome variable between the two groups into an explained component (due to differences in observable characteristics, such as education, land size, etc.) and an unexplained component (due to differences in coefficients or discrimination). Exogenous switching regression also assumes that the treatment variable is exogenous, but it allows for heterogeneity in the outcome variable across the two groups. It estimates two regression models for the outcome variable, one for each group, and a selection equation for the treatment variable. It can estimate the average treatment effect (ATE) and the average treatment effect on the treated (ATT), which measure the difference in the expected outcome between the two groups and between the treated group and their counterfactual outcome, respectively.
The choice between Oaxaca-Blinder decomposition and exogenous switching regression depends on the research question and the data availability. Oaxaca-Blinder decomposition is simpler to implement and interpret, but it requires a common set of predictors for both groups and a linear specification of the outcome variable. Exogenous switching regression is more flexible and can account for nonlinearities and interactions in the outcome variable, but it requires a set of exogenous variables that affect only the treatment choice and not the outcome variable. Both methods can provide useful insights into the sources and magnitude of the gender gap in market participation of agricultural product.
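For the Oaxaca-Blinder part, a minimal R sketch with the oaxaca package (variable names are hypothetical, the binary outcome is treated linearly purely for illustration, and the output structure should be checked against the package vignette):

library(oaxaca)

# Decompose the outcome gap across the binary group indicator after the |
res <- oaxaca(market_participation ~ education + land_size + age | female, data = dat)

res$twofold$overall   # explained ("endowments") vs unexplained ("coefficients") parts
plot(res)             # graphical decomposition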
  • asked a question related to Regression
Question
1 answer
Hi,
I have a set of studies that looked at the association of sex w.r.t to multiple variables. The majority of the studies reported regression variables such as beta, b values, t-stats, and standard errors. Is it possible to run a meta-analysis using any of the above-mentioned variables? If so, which software would be more meaningful to perform a meta-analysis? I did a wee bit of research and found out that Metafor in R would be the better choice to perform these kinds of meta-analyses.
Any help would be highly appreciated!
Thanks!
Relevant answer
Answer
Hello Vital,
As sex is typically coded to be a dichotomous variable, you could use:
1. ordinary Pearson r;
2. Cohen's d (probably derived from r via the formula d = 2r / sqrt(1 - r^2));
3. for single IV regression models, the "beta" (standardized regression coefficient) = Pearson r;
Both r and d are common ES metrics in meta-analytic studies. If most of your sources are correlational in form, then I'd suggest sticking with r.
The problem with multiple regression models is, unless each model has exactly the same assortment and number of IVs, and the same DV, beta coefficients aren't meaningfully comparable for a given IV (for your aims, sex) across studies.
Good luck with your work.
  • asked a question related to Regression
Question
1 answer
Hi,
I have a set of studies that looked at the association of sex w.r.t to multiple variables. The majority of the studies reported regression variables such as beta, b values, t-stats, and standard errors. Is it possible to run a meta-analysis using any of the above-mentioned variables? If so, which software would be more meaningful to perform a meta-analysis? I did a wee bit of research and found out that Metafor in R would be the better choice to perform these kinds of meta-analyses.
Any help would be highly appreciated!
Thanks!
Relevant answer
Answer
Hi,
you may meta-analyze the bivariate correlation coefficients that are depicted in most studies. If not, write an email to the author.
You may convert the relationship between an IV and the DV from the regression analysis in a semi-partial or partial correlation but the problem is that these don't follow a defined sampling distribution. The reason is that the context (i.e., the set of other predictors and covariates) affect the regression coefficient.
Recently, a paper by Aloe proposed to solve this in a meta-regression where the sets of control variables are represented as dummies. I have not yet tested this idea, but I think the studies will differ to such a large degree (with respect to the controls) that you'll end up with one dummy per study... Perhaps an extension could be to create a dummy for each covariate used and control for these...
Aloe, A. M. (2015). Inaccuracy of regression results in replacing bivariate correlations. Research Synthesis Methods, 6(1), 21-27.
Best,
Holger
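On the software question: a minimal metafor sketch for pooling bivariate correlations (the ri and ni values below are placeholders, not real data):

library(metafor)

dat <- escalc(measure = "ZCOR", ri = c(0.12, 0.08, 0.20), ni = c(150, 300, 90))  # Fisher's z transforms

res <- rma(yi, vi, data = dat)          # random-effects model
predict(res, transf = transf.ztor)      # pooled estimate back-transformed to r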
  • asked a question related to Regression
Question
3 answers
can we apply regression on moderate correlation? Please recommended an easy book to understand for non-statistical readers.
Relevant answer
Answer
If you have enough observations, such as 40 or 50 or more, and have tested your data for correlation, it is recommended to run a regression, for it will give you a better understanding of your data. You can check any standard statistics textbook to understand this better. Regards.
  • asked a question related to Regression
Question
3 answers
Hello everyone. The p value of the path estimate regression weight (B=0.198) from A to C is 0.014 in my model in the figure. After bootstrapping, the coefficient from A to C (B=0.198) has a p value of 0.043 as a direct effect. What causes this difference in p value? Many thanks for your comments.
Relevant answer
Answer
My guess is that the first p value is based on a regular theoretical/asymptotic standard error, whereas the second one is based on bootstrapping, which is a different methodology for finding a p value empirically based on resampling rather than asymptotic theory.
  • asked a question related to Regression
Question
9 answers
Hello spatial analysis experts
Hope you're all good.
I need urgently the commands and R codes of performing spatial binomial regression in RStudio. Please if someone has already worked on it, share the codes from start to end.
Thanks and regards
Dr. Sami
Relevant answer
Answer
Hi Dr. Sami,
To perform spatial binomial regression in RStudio, you can use the 'spdep' package, which provides functions for spatial dependency analysis. Here's a step-by-step guide to performing spatial binomial regression:
1. Install and load the 'spdep' package in RStudio:
install.packages("spdep")
library(spdep)
2. Load your spatial data into RStudio. Make sure your data includes a dependent binary variable and spatial coordinates (e.g., longitude and latitude).
3. Create a spatial weights matrix using the 'nb2listw' function. This function will generate a neighbor list and transform it into a spatial weights matrix. For example:
nb <- dnearneigh(coordinates(data), 0, d1) # Create neighbor list (dnearneigh needs lower and upper distance bounds)
w <- nb2listw(nb) # Create spatial weights matrix
4. Perform spatial binomial regression using the 'spglm' function. This function fits spatial generalized linear models. Specify the family argument as 'binomial' to indicate binomial regression. For example:
model <- spglm(dependent ~ independent, data = data, family = "binomial", weights = w)
5. Extract and analyze the regression results. You can use the 'summary' function to obtain a summary of the model coefficients and their significance. For example:
summary(model)
Remember to replace 'dependent' and 'independent' with the appropriate variable names in your dataset. Additionally, adjust the value of 'd1' in step 3 to define the distance threshold for identifying neighboring observations.
I hope this helps you with performing spatial binomial regression in RStudio. If you encounter any issues or have further questions, feel free to ask.
  • asked a question related to Regression
Question
4 answers
Hi there!
I am currently running SPSS AMOS 24
But the SEM results don't show the p values for the regression weights of my three main paths in the Estimates output.
The Estimates only show a value of 1 for each of these paths; S.E., C.R., and P are all empty.
(The rest of the variables are normal; only the three main paths are affected.)
How can I resolve this question?
Looking forward to kind assistance in this regard, wish everyone well :)
Relevant answer
Answer
It seems like you are encountering an issue in SPSS AMOS 24 where the p values for some regression weights are not displayed. When AMOS shows an estimate of exactly 1 with empty S.E., C.R., and P columns, the most common cause is that those paths are fixed (constrained) parameters, for example regression weights set to 1 to scale a latent variable; AMOS does not compute standard errors or p values for fixed parameters. Check in Object Properties whether a value of 1 has been entered as the regression weight for your three main paths, and remove that constraint (or move it to another parameter) if you want them to be freely estimated. Beyond that, you can try the following:
1. Check your data for missing values or outliers that may affect the estimation process, and make sure all variables used in the analysis have complete data.
2. Verify that your model is correctly specified, including the variable labels, measurement scales, path definitions, and model constraints.
3. Consider increasing the sample size if it is small, as this improves the accuracy of the parameter estimates.
4. If the issue persists, update SPSS AMOS to the latest version or consult the AMOS user community or support team, who can provide guidance specific to the software version you are using.
  • asked a question related to Regression
Question
2 answers
My research topic is ROLE OF TEACHERS' ENTREPRENEURIAL ORIENTATION IN DEVELOPING ENTREPRENEURIAL MIND-SET OF STUDENTS IN HEIs. The research constructs that I am using are ENTREPRENEURIAL ORIENTATION and ENTREPRENEURIAL MIND-SET; both are psychological and behavioral. The variables that I will be measuring are the INNOVATIVENESS, PRO-ACTIVENESS and RISK-TAKING ABILITY of teachers.
I will be checking the strength of the relationship between these constructs using regression, and I would like to use the THEORY OF PLANNED BEHAVIOUR by Ajzen in support of my research argument without TESTING or BUILDING the theory. I would seek expert advice on how this can be done and whether it is an acceptable practice. Thank you.
Relevant answer
Answer
Grounded Theory develops based on data.
  • asked a question related to Regression
Question
4 answers
Since OLS and fixed effects estimation differ, for a panel data model estimated using a fixed effects (within) regression, which assumptions (for example, no heteroskedasticity, linearity) do I need to test before I can run the regression?
I'm using the xtreg, fe and xtscc, fe commands in Stata.
Relevant answer
Answer
Before performing a fixed effects regression on panel data, several assumptions should be tested to ensure the validity of the results. These assumptions include:
  1. Time-invariant individual effects: The fixed effects (within) estimator removes unobserved heterogeneity only if the individual-specific effects are constant over time. Unlike the random effects model, fixed effects does allow these effects to be correlated with the regressors; the usual way to choose between the two estimators is the Hausman test, which compares the fixed and random effects estimates.
  2. No perfect multicollinearity: The independent variables in the regression should not exhibit perfect multicollinearity, which occurs when one or more independent variables are perfectly linearly dependent on others. Perfect multicollinearity can lead to unreliable coefficient estimates and inflated standard errors.
  3. No endogeneity: The assumption of exogeneity implies that the independent variables are not correlated with the error term. Endogeneity can arise when there are omitted variables, measurement errors, or simultaneity issues. Various tests, such as instrumental variable approaches or tests for correlation between the residuals and the independent variables, can be employed to check for endogeneity.
  4. Homoscedasticity: Homoscedasticity assumes that the error term has constant variance across all observations. Heteroscedasticity, where the variance of the error term varies systematically, can lead to inefficient coefficient estimates. Graphical methods, such as plotting residuals against predicted values or conducting formal tests like the White test, can be used to diagnose heteroscedasticity.
  5. No serial correlation: Serial correlation, also known as autocorrelation, assumes that the error terms are not correlated with each other over time. If there is serial correlation, it violates the assumption of independence of observations. Diagnostic tests like the Durbin-Watson test or plotting residuals against time can help identify serial correlation.
  6. Normality of errors: The assumption of normality assumes that the error term follows a normal distribution. Departures from normality can affect the reliability of hypothesis tests and confidence intervals. Graphical methods, such as histograms or Q-Q plots of residuals, can help assess normality.
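The question uses Stata, where the Hausman test after xtreg and commonly used user-written commands such as xttest3 (groupwise heteroskedasticity) and xtserial (serial correlation) cover several of these checks. As a minimal illustration of the same diagnostics in R with the plm package (using its built-in Grunfeld example data; object names are illustrative):
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")                       # example panel data set shipped with plm
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")  # fixed effects (within) estimator
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")  # random effects, for comparison
phtest(fe, re)                                          # Hausman test: fixed vs random effects
pbgtest(fe)                                             # serial correlation in the idiosyncratic errors
pcdtest(fe, test = "cd")                                # cross-sectional dependence
coeftest(fe, vcov = vcovHC(fe, method = "arellano", cluster = "group"))  # robust (Arellano) standard errors
If heteroskedasticity or serial correlation is present, reporting cluster-robust (or, in Stata, Driscoll-Kraay via xtscc) standard errors is the usual remedy rather than dropping the fixed effects model.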
  • asked a question related to Regression
Question
4 answers
In what situations will we use it? Please tell me; I face difficulty understanding this.
Relevant answer
Answer
ARDL stands for Autoregressive Distributed Lag model, a regression model for time series data in which the dependent variable is explained by its own lags and by current and lagged values of the explanatory variables. The ARDL error correction model is a re-parameterisation of the ARDL model that includes an error correction term; this term captures how quickly short-run deviations from the long-run equilibrium are corrected. The ARDL bounds test (on the long-run form) is used to determine whether a long-run, cointegrating relationship exists between the variables in an ARDL model, and it can be applied whether the series are I(0), I(1), or a mixture of the two.
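As a minimal sketch of what the error-correction form looks like in R (simulated, illustrative series; dedicated packages such as ARDL automate lag selection and the Pesaran bounds F-test):
library(dynlm)
set.seed(42)
n <- 120
x <- ts(cumsum(rnorm(n)))                      # an I(1) explanatory series
y <- ts(0.5 * as.numeric(x) + rnorm(n))        # illustrative series related to x in the long run
ardl_11 <- dynlm(y ~ L(y, 1) + x + L(x, 1))    # ARDL(1,1) in levels
ecm <- dynlm(d(y) ~ L(y, 1) + L(x, 1) + d(x))  # error-correction form of the same model
summary(ecm)                                   # a negative, significant coefficient on L(y, 1) indicates
                                               # adjustment back towards the long-run relationship
The bounds test then asks whether the lagged-level terms (here L(y, 1) and L(x, 1)) are jointly significant, comparing the F statistic with Pesaran's lower and upper critical bounds.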
  • asked a question related to Regression
Question
6 answers
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers. 
Relevant answer
Answer
I would like to restart this question.
I have noted a few papers on cutoff or quasi-cutoff sampling other than the many I have written, but in general, I do not think those others have had much application. Further, it may be common to ignore the part of the finite population which is not covered, and to only consider the coverage, but I do not see that as satisfactory, so I would like to concentrate on those doing inference. I found one such paper by Guadarrama, Molina, and Tillé which I will mention later below.
Following is a tutorial i wrote on quasi-cutoff (multiple item survey) sampling with ratio modeling for inference, which can be highly useful for repeated official establishment surveys:
"Application of Efficient Sampling with Prediction for Skewed Data," JSM 2022: 
This is what I did for the US Energy Information Administration (EIA) where I led application of this methodology to various establishment surveys which still produce perhaps tens of thousands of aggregate inferences or more each year from monthly and/or weekly quasi-cutoff sample surveys. This also helped in data editing where data collected in the wrong units or provided to the EIA from the wrong files often showed early in the data processing. Various members of the energy data user community have eagerly consumed this information and analyzed it for many years. (You might find the addenda nonfiction short stories to be amusing.)
There is a section in the above paper on an article by Guadarrama, Molina, and Tillé(2020) in Survey Methodology, "Small area estimation methods under cut-off sampling," which might be of interest, where they found that regression modeling appears to perform better than calibration, looking at small domains, for cutoff sampling. Their article, which I recommend in general, is referenced and linked in my paper.
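For readers who want to see the mechanics of the ratio-model prediction mentioned above, here is a minimal sketch of the classical model-based estimator for the part of the frame not covered by a cutoff sample, under the usual working assumption that the residual variance is proportional to the size measure x (all numbers are simulated and illustrative; this is not the EIA implementation itself):
set.seed(7)
x_all  <- rgamma(500, shape = 1.2, scale = 50)               # size measure for the whole frame
cutoff <- quantile(x_all, 0.7)                               # keep only the largest units
s      <- x_all >= cutoff                                    # cutoff sample indicator
y_s    <- 2 * x_all[s] + rnorm(sum(s), sd = sqrt(x_all[s]))  # observed y for the sampled units
b      <- sum(y_s) / sum(x_all[s])                           # ratio estimate of the slope
X_s    <- sum(x_all[s]); X_r <- sum(x_all[!s])
T_hat  <- sum(y_s) + b * X_r                                 # predicted population total
sigma2 <- sum((y_s - b * x_all[s])^2 / x_all[s]) / (sum(s) - 1)  # residual variance per unit of x
V_hat  <- sigma2 * (X_r^2 / X_s + X_r)                       # model-based variance of the prediction error
c(total = T_hat, se = sqrt(V_hat))
The estimated total is the observed sum plus the model prediction for the uncovered units, and the variance term reflects both the uncertainty in the estimated slope and the residual variation in the remainder.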
There are researchers looking into inference from nonprobability sampling cases which are not so well-behaved as what I did for the EIA, where multiple covariates may be needed for pseudo-weights, or for modeling, or both. (See Valliant, R.(2019)*.) But when many covariates are needed for modeling, I think the chances of a good result are greatly diminished. (For multiple regression, from an article I wrote, one might not see the heteroscedasticity that should theoretically appear, which I attribute to the difficulty in forming a good predicted-y 'formula'. For pseudo-inclusion probabilities, if many covariates are needed, I suspect it may be hard to do this well either, though perhaps that case is more hopeful. However, in Brewer, K.R.W.(2013)**, he noted an early case where failure using what appears to be an early version of that approach helped convince people that probability sampling was a must.)
At any rate, there is research on inference from nonprobability sampling which would generally be far less accurate than what I led development for at the EIA.
So, the US Energy Information Administration makes a great deal of use of quasi-cutoff sampling with prediction, and I believe other agencies could make good use of this too, but in all my many years of experience and study/exploration, I have not seen much evidence of such applications elsewhere. If you do, please respond to this discussion.
Thank you - Jim Knaub
..........
*Valliant, R.(2019), "Comparing Alternatives for Estimation from Nonprobability Samples," Journal of Survey Statistics and Methodology, Volume 8, Issue 2, April 2020, Pages 231–263, preprint at 
**Brewer, K.R.W.(2013), "Three controversies in the history of survey sampling," Survey Methodology, Dec 2013 -  Ken Brewer - Waksberg Award article: 
  • asked a question related to Regression
Question
4 answers
I'm working on my PhD thesis and I'm stuck around expected analysis.
I'll briefly explain the context then write the question.
I'm studying moral judgment in the cross-context between Moral Foundations Theory and Dual Process theory.
Simplified: MFT states that moral judgments are almost always intuitive, while DPT states that better reasoners (higher on cognitive capability measures) will make moral judgments through analytic processes.
I have another idea - people will make moral judgments intuitively only for their primary moral values (e.g., for conservatives those are the binding foundations - respecting authority, ingroup loyalty and purity), while for the values they aren't much concerned about they'll have to use analytical processes to figure out what judgment to make.
To test this idea, I'm giving participants:
- a few moral vignettes to judge (one concerning progressive values and one concerning conservative values) on 1-7 scale (7 meaning completely morally wrong)
- moral foundations questionnaire (measuring 5 aspects of moral values)
- CTSQ (Comprehensive Thinking Styles Questionnaire), CRT and belief bias tasks (8 syllogisms)
My hypothesis is therefore that cognitive measures of intuition (such as intuition preference from CTSQ) will predict moral judgment only in the situations where it concerns primary moral values.
My study design is correlational. All participants are answering all of the questions and vignettes. So I'm not quite sure how to analyse the findings to test the hypothesis.
I was advised to do a regression analysis where moral values (the 5 from the MFQ) or the moral judgments from the two different vignettes would be predictors, and the intuition measure would be the dependent variable.
My concern is that this analysis is the wrong choice because I'll have both progressives and conservatives in the sample, which means both groups of values should predict intuition if my assumption is correct.
I think I need to either split people into groups based on their MFQ scores and then do this analysis, or introduce some kind of multi-step analysis or control, but I don't know what the right approach would be.
If anyone has any ideas please help me out.
How would you test the given hypothesis with available variables?
Relevant answer
Answer
There are several statistical analysis techniques available, and the choice of method depends on various factors such as the type of data, research question, and the hypothesis being tested. Here is a step-by-step guide on how to approach hypothesis testing:
  1. Formulate your research question and null hypothesis: Start by clearly defining your research question and the hypothesis you want to test. The null hypothesis (H0) represents the default position, stating that there is no significant relationship or difference between variables.
  2. Select an appropriate statistical test: The choice of statistical test depends on the nature of your data and the research question. Here are a few common examples: Student's t-test, used to compare means between two groups; Analysis of Variance (ANOVA), used to compare means among more than two groups; the chi-square test, used to analyze categorical data and test for independence or association between variables; correlation analysis, used to examine the relationship between two continuous variables; and regression analysis, used to model the relationship between a dependent variable and one or more independent variables.
  3. Set your significance level and determine the test statistic: Specify your desired level of significance, often denoted as α (e.g., 0.05). This value represents the probability of rejecting the null hypothesis when it is true. Based on your selected test, identify the appropriate test statistic to calculate.
  4. Collect and analyze your data: Gather the necessary data for your analysis. Perform the chosen statistical test using statistical software or programming languages like R or Python. The specific steps for analysis depend on the chosen test and software you are using.
  5. Calculate the p-value: The p-value represents the probability of obtaining the observed results (or more extreme) if the null hypothesis is true. Compare the p-value to your significance level (α). If the p-value is less than α, you reject the null hypothesis and conclude that there is evidence for the alternative hypothesis (Ha). Otherwise, you fail to reject the null hypothesis.
  6. Interpret the results: Based on the outcome of your analysis, interpret the results in the context of your research question. Consider the effect size, confidence intervals, and any other relevant statistical measures.
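Coming back to the specific hypothesis in the question: rather than splitting the sample by MFQ scores, one option is to model the moderation directly with an interaction term in a regression. A minimal sketch with simulated data and illustrative variable names:
set.seed(123)
n <- 200
d <- data.frame(intuition = rnorm(n),        # e.g. CTSQ intuition preference
                binding_values = rnorm(n))   # e.g. MFQ binding-foundations score
d$judgment <- 0.3 * d$intuition * d$binding_values + rnorm(n)  # simulated moral judgment
m <- lm(judgment ~ intuition * binding_values, data = d)
summary(m)   # the interaction term tests whether intuition predicts judgment
             # more strongly at higher levels of the relevant moral values
A significant interaction would indicate that the link between intuition and moral judgment depends on how strongly the relevant (primary) values are endorsed, which is the pattern the hypothesis predicts; plotting simple slopes at high and low values of the moderator then makes the result easier to interpret.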
  • asked a question related to Regression
Question
3 answers
This question is for beginner students only.
Relevant answer
Answer
You can check any statistics textbook to get acquainted with this topic and understand the answer. Regards.