Science topic
Regression - Science topic
Explore the latest questions and answers in Regression, and find Regression experts.
Questions related to Regression
How do we evaluate the importance of individual features for a specific property using ML algorithms (say, GBR), and construct an optimal feature set for our problem?
image taken from: 10.1038/s41467-018-05761-w
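For the GBR case specifically, scikit-learn exposes feature importances directly; a minimal sketch on synthetic data (illustrative only, not the poster's dataset) that ranks features two ways and keeps a candidate subset:

```python
# Sketch: ranking features with gradient boosting (scikit-learn),
# then keeping only the most informative ones. Synthetic data is a
# stand-in for a real property/feature table.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances (fast, but can be biased toward
# features with many split points) ...
impurity_rank = np.argsort(model.feature_importances_)[::-1]

# ... so cross-check with permutation importance.
perm = permutation_importance(model, X, y, n_repeats=5, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

top_features = perm_rank[:3]  # candidate "optimal" feature subset
```

In practice one would re-fit on the reduced set and compare cross-validated error against the full set before calling the subset optimal.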
Hello all,
I am running into a problem I have not encountered before with my mediation analyses. I am running a simple mediation X > M > Y in R.
Generally, I concur that the total effect does not have to be significant for there to be a mediation effect, and in the case I am describing this would be a logical occurrence, since the effects of paths a and b are both significant (-.142 and .140, respectively), thus resulting in a 'null effect' for the total effect.
However, my c path (X > Y) is not merely 'non-significant' as I would expect; rather, the regression does not fit (see below):
(Residual standard error: 0.281 on 196 degrees of freedom
Multiple R-squared: 0.005521, Adjusted R-squared: 0.0004468
F-statistic: 1.088 on 1 and 196 DF, p-value: 0.2982).
Usually I would say you cannot interpret models that do not fit, and since this path is part of my model, I hesitate to interpret the mediation at all. However, the other paths do fit and are significant. Could the non-fitting also be a result of the paths cancelling one another?
Note: I am running bootstrapped results for the indirect effects, but the code does utilize the 'total effect' path, which does not fit on its own, therefore I am concerned.
Note 2: I am working with a clinical sample, therefore the sample size is not as large as I'd like: group 1: 119; group 2: 79 (N = 198).
Please let me know if additional information is needed and thank you in advance!
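The situation described (significant a and b paths of opposite-signed contributions, flat total effect) can be illustrated with a percentile bootstrap of the indirect effect. A language-agnostic sketch in Python on synthetic data (the poster's analysis is in R; variable names and effect sizes here are made up to mimic the cancellation pattern):

```python
# Hedged sketch: percentile bootstrap of the indirect effect a*b,
# showing that the indirect effect can be clearly non-zero even when
# the total-effect (c) regression does not fit. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 198
X = rng.normal(size=n)
M = -0.5 * X + rng.normal(size=n)              # path a < 0
Y = 0.5 * M + 0.25 * X + rng.normal(size=n)    # path b > 0; total effect ~ 0

def indirect(X, M, Y):
    # a: slope of M ~ X;  b: partial slope of Y ~ M controlling X
    a = np.polyfit(X, M, 1)[0]
    Z = np.column_stack([np.ones_like(X), M, X])
    b = np.linalg.lstsq(Z, Y, rcond=None)[0][1]
    return a * b

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(indirect(X[idx], M[idx], Y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
# CI excluding zero -> indirect effect supported, even though the
# X -> Y regression on its own is flat (the paths cancel).
```

This mirrors the standard argument that the bootstrap CI for a*b, not the fit of the total-effect regression, is the relevant test for mediation.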
I am evaluating the impacts of major health financing policy changes in Georgia (the country). The database is at household level and is not panel data. The continuous outcome variable is out-of-pocket health spending (OOPs), which exhibits a skewed distribution as well as seasonality. The residuals are positively autocorrelated. The regression also takes into account independent variables connected with each household's characteristics. My goal is to evaluate the impact of health policies on the financial wellbeing of the population in connection with health care utilization determinants. Should I aggregate the dataset or keep it as it is?
Here is the case: as I said, I am working on how macroeconomic variables affect REIT index returns. Which tests or estimation methods should I use to understand how macroeconomic variables affect REITs?
I know I can use OLS but is there any other method to use? All my time series are stationary at I(0).
In the domain of clinical research, where the stakes are as high as the complexities of the data, a new statistical aid emerges: bayer: https://github.com/cccnrc/bayer
This R package is not just an advancement in analytics - it's a revolution in how researchers can approach data, infer significance, and derive conclusions.
What Makes `Bayer` Stand Out?
At its heart, bayer is about making Bayesian analysis robust yet accessible. Born from the powerful synergy with the wonderful brms::brm() function, it simplifies the complex, making the potent Bayesian methods a tool for every researcher’s arsenal.
Streamlined Workflow
bayer offers a seamless experience, from model specification to result interpretation, ensuring that researchers can focus on the science, not the syntax.
Rich Visual Insights
Understanding the impact of variables is no longer a trudge through tables. bayer brings you rich visualizations, like the one above, providing a clear and intuitive understanding of posterior distributions and trace plots.
Big Insights
Clinical trials, especially in rare diseases, often grapple with small sample sizes. bayer rises to the challenge, effectively leveraging prior knowledge to bring out significance that other methods miss.
Prior Knowledge as a Pillar
Every study builds on the shoulders of giants. bayer respects this, allowing the integration of existing expertise and findings to refine models and enhance the precision of predictions.
From Zero to Bayesian Hero
The bayer package ensures that installation and application are as straightforward as possible. With just a few lines of R code, you’re on your way from data to decision:
# Installation
devtools::install_github("cccnrc/bayer")

# Example Usage: Bayesian Logistic Regression
library(bayer)
model_logistic <- bayer_logistic(
  data = mtcars,
  outcome = "am",
  covariates = c("mpg", "cyl", "vs", "carb")
)
You then have plenty of functions to further analyze your model - take a look at bayer.
Analytics with An Edge
bayer isn’t just a tool; it’s your research partner. It opens the door to advanced analyses like IPTW, ensuring that the effects you measure are the effects that matter. With bayer, your insights are no longer just a hypothesis — they’re a narrative grounded in data and powered by Bayesian precision.
Join the Brigade
bayer is open-source and community-driven. Whether you’re contributing code, documentation, or discussions, your insights are invaluable. Together, we can push the boundaries of what’s possible in clinical research.
Try bayer Now
Embark on your journey to clearer, more accurate Bayesian analysis. Install `bayer`, explore its capabilities, and join a growing community dedicated to the advancement of clinical research.
bayer is more than a package — it’s a promise that every researcher can harness the full potential of their data.
Explore bayer today and transform your data into decisions that drive the future of clinical research: bayer - https://github.com/cccnrc/bayer
I have three variables (A, B, C) and do a multilevel SEM with R - Lavaan.
I do not understand why the following two models render different regression coefficients:
In the first one I use the ready-aggregated latent variables from the sheet directly; in the second one I define them within the model, but the underlying data is of course the same.
Could anybody please explain why that is and which model would be the right one to use?
1.) "
level: 1
A ~ B + C
level: 2
A ~ B + C
"
2.)"
level: 1
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
c =~ c1 + c2 + c3
A ~ B + C
level: 2
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
C =~ c1 + c2 + c3
A ~ B + C
"
thanks so much for any help!
Dear All,
I have imagery with a single fish species in each image, along with a list of morphometric measurements of the fish (length, width, tail length, etc.). I would like to train a CNN model that predicts these measurements using only the images as input. Any ideas what kind of architecture is ideal for this task? I have read about multi-output learning, but I haven't found a practical implementation in Python.
Thank you for your time.
I have collected data at community level using cluster sampling. The ICC shows >10% variability at cluster level. However, I don't have a relevant variable at cluster level (all variables are at household and individual levels).
Can I run a multilevel regression without having a cluster-level variable?
Thanks!
My topic is the study of the energy status of construction materials, which is why all of their parameters are needed: calculation, comparison, regression, correlation, etc.
If you have any ideas about them, please share.
Thank you all.
Dear Colleagues,
Does anyone know about Universities that are offering (a) Ph.D. by prior publication (b) Ph.D. by portfolio?
I have two publications viz."Regression Testing in Era of Internet of Things and Machine Learning" and "Regression Testing and Machine Learning". The former has touched 1k+ copies and has a rating of 4.04 and the latter is a recent publication with 200+ copies with a rating of 4.04. This data is as per BookAuthority.org.
Also, the former is indexed in prestigious searches such as Deutsche Nationalbibliothek (DNB), GND Network, Crossref Metadata Search, and OpenAIRE Explore.
Any leads or pointers would be greatly appreciated.
Best Regards,
Abhinandan (919886406214).
Hello everyone and thank you for reading my question.
I have a data set with around 2000 data points. It has 5 inputs (4 well rates, the 5th being time) and 2 outputs (cumulative oil and cumulative water). See the attached image.
I want to build a proxy model to simulate the cumulative oil and water.
I have made 5 models (ANN, Extreme Gradient Boosting, Gradient Boosting, Random Forest, SVM) and used GridSearch to tune the hyperparameters, and the training results are good. Of course, I split the data into training, test and validation sets.
I have another data set that I did not include in the training, test or validation sets, and when I use the models to predict the output for this data set the results are bad (they fail to predict).
I think the problem lies in the data itself, because the only input parameter that changes is the (days) parameter while the others remain constant.
But the problem is that I can't remove the well rates or join them into a single variable, because after the proxy model has been built I want to optimize the well rates to maximize cumulative oil and minimize cumulative water, respectively.
Is there a solution to an issue like this?
Hi,
I am trying to evaluate the impact of gender quotas on women's political engagement. I am using World Values Survey data on different countries over the period 1981-2009. I wish to run a country and time fixed-effects regression of a variable on gender quotas while controlling for age. However, age in the survey is divided into categories; how can I recode it for my regression? Should I use binning to control for age, or should I use the mean values of the categories?
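The midpoint approach mentioned in the question can be sketched in a few lines; note that the band labels and bounds below are illustrative stand-ins, not the actual WVS age coding, and the open-ended top band needs an assumed midpoint:

```python
# Sketch: recoding survey age bands to band midpoints so age can
# enter a fixed-effects regression as a single numeric control.
# Band labels/bounds are hypothetical, not the real WVS scheme.
midpoint = {
    "16-24": 20.0, "25-34": 29.5, "35-44": 39.5,
    "45-54": 49.5, "55-64": 59.5, "65+": 70.0,  # "65+": assumed midpoint
}
ages = ["25-34", "65+", "16-24"]
age_numeric = [midpoint[a] for a in ages]
```

The alternative of keeping the bands as dummies (one indicator per category) is more flexible, since it does not assume a linear age effect within or across bands.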
It is known that we can use regression analysis to limit the confounding variables affecting the main outcome. But what if the entire sample has a confounding variable affecting the main outcome; will regression analysis still be applicable and reliable?
For example, a study was done to investigate the role of a certain intervention in cognitive impairment, and the entire population included was old-aged (more than 60 years old), which means that age here is a risk factor (covariate) in the entire sample, and it is well known that age is a significant independent risk factor for cognitive impairment.
My question here is: will the regression here be of real value? Will it totally remove the effect of age and give us the clear effect of the intervention on cognitive impairment?
In the use of spectral indices for estimating corn yield, why is it that when I put the average of the total index at the farm level into the equation generated from the regression, the predicted yield is closer to the actual yield even though the coefficient of determination is weak?
#spectralindices #predictedyield #RS
I am doing land-use projection using the Dyna-CLUE model, but I am stuck with the error "Regression can not be calculated due to a large value in cell 0,478". I would appreciate any advice you can provide to solve this error.
I am conducting a meta-analysis and I want to use the nonlinear polynomial regression and splines functions to model the dose-response relationship between the parameters of interest.
I would appreciate any help or suggestions.
Thank you very much.
My question looks at the influence of simulation on student attitudes. My professor would like me to do a regression analysis, but he says to do two regressions. I have my pre-test data and post-test data; the only other information I have is the student's college. What I found in my class materials seems to indicate that I can run a regression using the post-test as my dependent variable and the pre-test as my independent variable in SPSS. How would I do another regression? Should I work the colleges in as another variable, and if so, do I enter them as one group or do I need to create a variable for each college?
I regress Y on X: direct effect (c).
M is the mediator: I regress M on X (path a) and Y on M (path b).
Total effect = c + a*b.
Now I introduce a moderator of the effect between X and Y.
How do I calculate the total effect with both the moderator and the mediator effect?
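A hedged sketch of one common answer, keeping the poster's notation (c = direct effect, a*b = indirect effect) and assuming the moderator W acts only on the direct X → Y path:

```latex
% Mediator model:   M = a_0 + a X + u
% Outcome model:    Y = b_0 + c X + g W + d\,(X \cdot W) + b M + e
%
% Direct effect at a given W:  c(W) = c + d\,W
% Total effect at a given W:   \mathrm{TE}(W) = c + d\,W + a\,b
```

The total effect is then conditional on W and is typically reported at the mean of W and at plus/minus one SD. If W also moderates the a or b paths, the indirect effect a*b itself becomes a function of W (moderated mediation), and the product terms change accordingly.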
I have daily sales data and stock availability for items in a supermarket chain. My goal is to estimate the elasticity of sales quantity with respect to availability (represented as a percentage). With this model, I want to understand how a 1% change in availability impacts sales. Currently, single-variable regressions yield low R-squared values. Should I include lagged sales values in the regression to account for other endogenous factors influencing sales? This would isolate availability as the primary exogenous variable.
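One hedged sketch of the lagged-sales idea: a log-log regression with a lagged dependent variable, in which the availability coefficient reads directly as a short-run elasticity. Synthetic data, not the poster's; note that a lagged dependent variable combined with serially correlated errors can bias the estimates, so this is a starting point rather than a final specification:

```python
# Log-log regression of sales on availability with a lagged DV.
# Synthetic daily data; the true short-run elasticity is 1.5.
import numpy as np

rng = np.random.default_rng(1)
T = 400
avail = np.clip(rng.normal(0.9, 0.05, T), 0.5, 1.0)  # availability share
log_sales = np.zeros(T)
for t in range(1, T):
    log_sales[t] = (0.5 * log_sales[t - 1]
                    + 1.5 * np.log(avail[t])
                    + rng.normal(0, 0.1))

y = log_sales[1:]
Xmat = np.column_stack([np.ones(T - 1), log_sales[:-1], np.log(avail[1:])])
coef, *_ = np.linalg.lstsq(Xmat, y, rcond=None)

short_run = coef[2]                  # elasticity holding last period fixed
long_run = coef[2] / (1 - coef[1])   # elasticity after dynamics settle
```

The long-run elasticity divides the short-run coefficient by one minus the persistence coefficient, which is usually the quantity of commercial interest.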
I am doing a study focusing on analyzing differences in fish assemblages due to temperature extremes. I calculated Shannon diversity, evenness, richness, and total abundance for each year sampled. The years are also grouped into 2 temperature periods, which is what I want to compare overall.
On viewing the results, there appears to be consistency across years and when comparing the two groupings. I do have multivariate tests to follow for community composition, but when describing the univariate results, are there any statistical tests that can be used to better show there is no difference, rather than simply describing the numbers and their mean differences?
Dear all,
I am sharing the model below that illustrates the connection between attitudes, intentions, and behavior, moderated by prior knowledge and personal impact perceptions. I am seeking your input on the preferred testing approach, as I've come across information suggesting one may be more favorable than the other in specific scenarios.
Version 1 - Step-by-Step Testing
Step 1: Test the relationship between attitudes and intentions, moderated by prior knowledge and personal impact perceptions.
Step 2: Test the relationship between intentions and behavior, moderated by prior knowledge and personal impact perceptions.
Step 3: Examine the regression between intentions and behavior.
Version 2 - Structural Equation Modeling (SEM)
Conduct SEM with all variables considered together.
I appreciate your insights on which version might be more suitable and under what circumstances. Your help is invaluable!
Regards,
Ilia
Hello everyone, for my dissertation I have two predictor variables and one criterion variable. One of the predictor variables further has 5 domains and no global score, so in that case can I use multiple regression, or do I have to perform stepwise linear regression separately for the 6 predictors (5 domains and the other predictor), keeping in mind the assumption of multicollinearity?
Dear Scientists and Researchers,
I'm thrilled to highlight a significant update from PeptiCloud: new no-code data analysis capabilities specifically designed for researchers. Now, at www.pepticloud.com, you can leverage these powerful tools to enhance your research without the need for coding expertise.
Key Features:
PeptiCloud's latest update lets you:
- Create Plots: Easily visualize your data for insightful analysis.
- Conduct Numerical Analysis: Analyze datasets with precision, no coding required.
- Utilize Advanced Models: Access regression models (linear, polynomial, logistic, lasso, ridge) and machine learning algorithms (KNN and SVM) through a straightforward interface.
The Impact:
This innovation aims to remove the technological hurdles of data analysis, enabling researchers to concentrate on their scientific discoveries. By minimizing the need for programming skills, PeptiCloud is paving the way for more accessible and efficient bioinformatics research.
Join the Conversation:
- How do you envision no-code data analysis transforming your research?
- Are there any other no-code features you would like to see on PeptiCloud?
- If you've used no-code platforms before, how have they impacted your research productivity?
PeptiCloud is dedicated to empowering the bioinformatics community. Your insights and feedback are invaluable to us as we strive to enhance our platform. Visit us at www.pepticloud.com to explore these new features, and don't hesitate to reach out at [email protected] with your thoughts, suggestions, or questions.
Together, let's embark on a journey towards more accessible and impactful research.
Warm regards,
Chris Lee
Bioinformatics Advocate & PeptiCloud Founder
Hi, I'm currently writing my psychology dissertation, in which I am investigating "how child-oriented perfectionism relates to behavioural intentions and attitudes towards children in a chaotic versus calm virtual reality environment".
I therefore have 3 predictor/independent variables: calm environment, chaotic environment and child-oriented perfectionism.
My outcome/dependent variables are: behavioural intentions and attitudes towards children.
My hypotheses are:
- participants will have more negative behavioural intentions and attitudes towards children in the chaotic environment than in the calm environment.
- these differences will be magnified in participants high in child-oriented perfectionism compared to participants low in child-oriented perfectionism.
I used a questionnaire measuring child-oriented perfectionism, which calculates a score. Participants then watched the calm environment video and answered the behavioural intentions and attitudes questionnaires in relation to the children shown in that video. Participants then watched the chaotic environment video and answered the same questionnaires in relation to the children in that video.
I am unsure whether to use a multiple linear regression or a repeated-measures ANOVA with a continuous moderator (child-oriented perfectionism) to answer my research question and hypotheses. Please can someone help!
How can I interpret the two examples below in the mediation analysis? Please help me.
1) with negative indirect and total effect, positive direct effect
Healthy pattern (X)
Sodium Consumption (M)
Gastric Cancer (Y)
Total Effect: Negative (-0.29)
Indirect Effect: Negative (-0.44)
Direct Effect: Positive (0.14)
Mediation percentage: 100%
2) With total and direct negative effect, positive indirect effect
Healthy pattern (x)
Sugar consumption (m)
Gastric Cancer (Y)
Total Effect: Negative (-0.42)
Indirect Effect: Positive (0.03)
Direct Effect: Negative (-0.29)
Mediation percentage: 10.3%
I ran an OLS regression on panel data in EViews, and then 2SLS and GMM regressions.
I introduced all the independent variables of the OLS as instrumental variables.
I am getting exactly the same results under the three methods.
Is there any mistake in how I ran the models?
I am also attaching the results.
Thanks in advance.
In his 1992 paper, (Psychological Assessment 1992, Vol.4, No. 2,145-155) Tellegen proposed a formula to calculate the uniform T score.
UT = B0 + B1·X + B2·X² + B3·X³,
where B0 is the intercept, X the raw score, and B1, B2 and B3 different regression coefficients; X² is the squared and X³ the cubed raw score.
What is the intercept? How do you calculate the intercept (B0)?
How do you calculate the regression coefficients? Are they between the raw score and the percentile? Why 3 different regression coefficients?
Suppose I compute a least squares regression with the growth rate of y against the growth rate of x and a constant. How do I recover the elasticity of the level of y against the level of x from the estimated coefficient?
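If the "growth rates" are log-differences, no recovery step is needed: for y = A·x^β, d ln(y) = β · d ln(x), so the slope of the growth-on-growth regression is already the level elasticity. A small numeric check (deterministic toy data; with discrete percentage growth rates instead of log-differences the equality is only approximate):

```python
# Numeric check: regressing the log-difference (growth rate) of y on
# that of x recovers the level elasticity beta directly.
import numpy as np

rng = np.random.default_rng(2)
beta = 0.7                                   # true elasticity
x = np.cumprod(1 + rng.normal(0.02, 0.01, 200))
y = 3.0 * x**beta

gx = np.diff(np.log(x))                      # growth rate of x
gy = np.diff(np.log(y))                      # growth rate of y
slope, intercept = np.polyfit(gx, gy, 1)
# slope == beta: the coefficient needs no transformation
```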
In most of the studies Tobit regression is used, but in the Tobit model my independent variable is not significant. Is fractional logistic regression also an appropriate technique to explore the determinants of efficiency?
I want to analyse data on body measurements of Osmanabadi goats.
If I want to carry out innovative research based on Wasserstein regression, from what other perspectives can I pursue statistical innovation? Specifically: (1) combining it with a Bayesian framework, introducing a prior distribution and performing parameter estimation based on Bayes' rule to obtain more reliable estimates; (2) introducing variable-selection techniques to automatically select the predictor distributions that have explanatory power for the response distribution, to obtain a sparse interpretation.
Can the above questions be regarded as a highly innovative research direction?
I would like to utilise the correct regression equation for conducting Objective Optimisations using MATLAB's Optimisation Tool.
When using Design-Expert, I'm presented with the Actual factors or Coded factors for the regression equation. However, with the Actual factors, I'm presented with multiple regression equations, since one of my input factors was categoric. In this categoric factor, the levels were Linear, Triangular, Hexagonal and Gyroid. As a result, I'm unsure which regression equation to utilise from the actual-factors image.
Otherwise, should I utilise the single regression equation which incorporates all of them? I feel like I'm answering my own question and I really should be using the Coded Factors for the regression equation, but I would like some confirmation.
I used one of the regression equations under "Actual Factors" where Linear is seen, but I fear that this did not incorporate all of the information from the experiment. So any advice would be most appreciated.
Most appreciated!
In multilevel regression, can one independent variable be a function of two other independent variables, such as type, token, and the type/token ratio?
Can we use inferential statistics like correlation and regression if we use a non-probability sampling technique like convenience or judgement sampling?
Phonetics - What is progressive and regressive assimilation and dissimilation in the Romance languages (especially Spanish), and how do you recognize it?
I am searching for an explanation, a good source recommendation where I could read more about this topic, and some examples of assimilation or dissimilation in the Romance languages, especially in Spanish. Thank you for helping me!
I searched on Google but can't understand it properly.
Hi,
I have some confusion as to which model is better for my outcomes: binomial regression or logistic regression. I am currently working on judicial decisions (outcomes in tax courts), where cases go either in favour of the assessee or the taxman. The factors influencing the judges, as reflected in the cases, are represented by presence (1) or absence (0). If a factor is not considered in the final judgment it takes '0', else '1'. If the outcome is favourable to the assessee it is '1', else '0'. Now, which would be the best approach to put this into a regression model showing the relationship between the outcome (dependent) and the independent factors (maybe 5-6 variables)? I need some guidance on this. Can I use any other, better model to forecast, after I perform a bootstrap run of, say, 1000 simulations and then compute average outcomes and related statistics?
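For a 0/1 outcome with 0/1 factor indicators, binomial regression with a logit link and logistic regression are the same model, so the setup described reduces to one sketch. Synthetic data below stands in for the case coding (the factor effects are invented for illustration):

```python
# Hedged sketch: logistic regression of a binary case outcome on
# binary presence/absence factor indicators. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
factors = rng.integers(0, 2, size=(n, 5))   # presence/absence of 5 factors
logit = -0.5 + factors @ np.array([1.2, -0.8, 0.5, 0.0, 0.3])
outcome = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(factors, outcome)
odds_ratios = np.exp(model.coef_[0])        # per-factor effect on the odds
p_favourable = model.predict_proba(factors)[:, 1]
```

Each odds ratio answers "how do the odds of a favourable outcome change when this factor is present?", which maps directly onto the question's 0/1 coding; bootstrap resampling of the cases can then be layered on top for the simulation the poster describes.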
I've often seen the following steps used to test for moderating effects in a number of papers, and I don't quite understand the usefulness of Model 1 (which only tests the effect of the control variables on the dependent variable) and Model 4 (which only adds the interaction term of one of the moderating variables with the independent variable). These two models seem redundant.
Can we model the following by regression?
1. A first-degree equation
2. A 7th-degree polynomial equation
3. A non-linear differential equation
4. A system of two variables and 2 equations
5. A system of second-order differential equations
6. A system of nonlinear equations
And
7. The next step: the non-linear differential equation.
If we can model all of these by regression and get the output correctly or with good accuracy, then we can have an approximate model of the system using only data, and this can be a start for applying control or troubleshooting to complex systems for which exact models are not available.
For the test, I will start with 1 and 2, but is it possible to achieve good accuracy with regression for the rest of the cases?
In the case of the constant's coefficient, where the VIF is greater than 10, what does that mean? Do all the variables in the model exhibit multicollinearity? How can multicollinearity be reduced? It could be reduced by removing variables with VIF > 10, but I don't know what to do with the constant's coefficient.
Thank you very much
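It may help to see what a VIF actually computes: each predictor's VIF is 1/(1 - R²) from regressing that predictor on all the others, so a "VIF of the constant" is not a collinearity diagnostic in the usual sense. A from-scratch sketch on synthetic data (illustrative only):

```python
# Sketch: computing VIFs by hand. VIFs are defined for the
# predictors, not for the intercept.
import numpy as np

def vif(X):
    """VIF for each column of X (design matrix WITHOUT the constant)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
# vifs[0] and vifs[1] are large; vifs[2] is near 1
```

Dropping or combining one member of each highly collinear pair (or centering variables before forming interaction terms) is the usual remedy; the intercept is left alone.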
Hello,
I am measuring the thermal stability of a small protein (131 aa) using circular dichroism, following the loss of its secondary structure. The data obtained are normalized to be within 0 and 1, where 0 is folded protein and 1 is completely unfolded. The CD of the fully unfolded state was calculated from a different experiment on the same batch and taken as reference. When plotting my data in GraphPad Prism 9, I fit a standard 4PL curve using non-linear regression, constraining the regression to use 0 as the bottom value and 1 as the top value (see attached file). The Tm is reported as IC50 in this screenshot because this formula is often used for calculating IC50 and EC50. However, the resulting fitted line seems unable to represent my data correctly. I performed this experiment twice, and the replicate test shows that the model inadequately represents the data. Should I look for a different equation to model my data, or am I making a mistake in performing this regression? Thank you for the help!
There are two ways to test the moderating effect of a variable (assuming the moderating variable is a dummy). One is to add an interaction term to the regression equation, Y = b0 + b1*D + b2*M + b3*D*M + u, and test whether the coefficient of the interaction term is significant. An alternative is to treat the interaction-term model as equivalent to a grouped regression, which has the advantage of directly showing the causal effects in the two groups. However, we still need to test the statistical significance of the estimated D*M coefficient by means of the interaction-term model; such tests are always necessary because between-group heterogeneity cannot be left to intuitive judgement.
One of the technical details is that if the group regression model includes control variables, the corresponding interaction term model must include all the interaction terms between the control variables and the moderator variables in order to ensure the equivalence of the two estimates.
If in equation Y=b0+b1*D+b2*M+b3*D*M+u I do not add the cross-multiplication terms of the moderator and control variables, but only the control variables alone, is the estimate of the coefficient on the interaction term still accurate at this point? At this point, can b1 still be interpreted as the average effect of D on Y when M = 0?
In other words, when I want to test the moderating effect of M in the causal effect of D on Y, should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+u or should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+b5*M*C+u?
Reference: Jiang Ting. Mediation and moderation effects in empirical research on causal inference [J]. China Industrial Economics, 2022(05): 100-120. DOI: 10.19581/j.cnki.ciejournal.2022.05.005.
I am writing my bachelor thesis and I'm stuck with the Data Analysis and wonder if I am doing something wrong?
I have four independent variables and one dependent variable, all measured on a five point likert scale and thus ordinal data.
I cannot use a normal type of regression (since my data is ordinal, is not normally distributed and never will be (transformations could not change that), and also violates homoscedasticity), so I figured ordinal logistic regression. Everything worked out perfectly, but the test of parallel lines in SPSS was significant and thus the assumption of proportional odds violated. So I am now considering multinomial logistic regression as an alternative.
However, here I could not find out how to test this assumption in SPSS: a linear relationship between the continuous variables and the logit transformation of the outcome variable. Does somebody know how to do this?
Plus, I have a more profound question about my data. To get the data on my variables, I asked respondents several questions. My dependent variable, for example, is Turnover Intention, and I used 4 questions on a 5-point Likert scale, so I got 4 different values from everyone about their Turnover Intention. For my analysis I took the average, since I only want one value of Turnover Intention per respondent (not four). However, now the data no longer ranges over 1, 2, 3, 4 and 5 like the five-point Likert scale, but has decimals like 1.25 or 1.75, since I took the average. This leaves me with endless data points, and I was wondering if my approach makes sense? I was thinking of grouping them together, since my analysis is biased by having so many different categories due to the many decimals.
Can somebody provide any sort of guidance on this??
In my MS thesis I analysed my data using linear regression, but my supervisor asked me to also run stepwise linear regression.
I have retrieved a study that reports a logistic regression; the OR for the dichotomous outcome is 1.4 for the continuous variable ln(troponin). This means the odds increase by 40% for every 2.7-fold (e-fold) increase in troponin; but is there any way of calculating the OR for a 1-unit increase in the troponin variable?
I want to meta-analyze many logistic regressions, for which I need them to be in the same format (i.e. some use the variable ln(troponin) and others troponin). (No individual patient data is available.)
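On the arithmetic itself, a hedged note: with a log-transformed predictor, the OR per k-fold change in troponin is OR^ln(k), but an OR per 1-unit increase on the raw scale is not constant (it depends on the starting troponin level), so studies can usually only be harmonised on a multiplicative scale, e.g. per doubling or per 10-fold increase:

```python
# Rescaling an OR reported per 1-unit increase of ln(troponin).
# OR per k-fold change in troponin = OR ** ln(k).
import math

or_per_ln_unit = 1.4                      # reported: per e-fold (~2.72x) increase
or_per_doubling = or_per_ln_unit ** math.log(2)
or_per_tenfold = or_per_ln_unit ** math.log(10)
```

Converting studies that used raw troponin to this multiplicative scale requires knowing the troponin range, so those papers may need to be handled separately or via their reported category contrasts.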
For instance, when using OLS, the objective of the study could be to determine the effect of A on B.
Could this kind of objective hold when using threshold regression?
Hi folks!
Let's say that I have two lists / vectors "t_list" and "y_list" representing the relationship y(t). I also have numerically computed dy/dt and stored it into "dy_dt_list".
The problem is that "dy_dt_list" contains a lot of fluctuations, and I know from physical theory that it MONOTONICALLY DECREASES.
1) Is there a simple way in R or Python to carry out a spline regression that reproduces the numerical values of dy/dt(t) in "dy_dt_list" as best it can UNDER THE CONSTRAINT that it keeps decreasing? I thus want to get a monotonically decreasing (dy/dt)_spline as the output.
2) Is there a simple way in R or Python to carry out a spline regression that reproduces the numerical values of y(t) as best it can UNDER THE CONSTRAINT that (dy/dt)_spline keeps decreasing? I thus want to get y_spline as the output, given that the above constraint is fulfilled.
I'd like to avoid having to reinvent the wheel!
P.S: I added an example to clarify things!
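One off-the-shelf Python answer to question 1 is isotonic regression, which gives the least-squares fit under a monotone-decreasing constraint (it is piecewise constant rather than a smooth spline, so a smoothing pass afterwards may still be wanted; constrained smoothing splines, e.g. via I-spline bases with non-negativity constraints, are the smoother but more involved route). Question 2 can then be handled by integrating the constrained derivative. Synthetic data below:

```python
# Sketch: monotone-decreasing fit of a noisy dy/dt, then numerical
# integration to rebuild y(t). Synthetic stand-ins for t_list and
# dy_dt_list.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)
t_list = np.linspace(0, 10, 200)
dy_dt_list = np.exp(-0.5 * t_list) + rng.normal(0, 0.05, 200)  # noisy, truly decreasing

iso = IsotonicRegression(increasing=False)        # enforce dy/dt decreasing
dy_dt_fit = iso.fit_transform(t_list, dy_dt_list)

# Question 2: trapezoidal integration of the constrained derivative
y_fit = np.concatenate([[0.0], np.cumsum(
    0.5 * (dy_dt_fit[1:] + dy_dt_fit[:-1]) * np.diff(t_list))])
```

In R, `isoreg()` (with a sign flip for the decreasing case) or the `scam` package for shape-constrained additive models play the analogous roles.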
I have the OR of a logistic regression that used the independent variable as continuous. I also have the ORs of 2x2 tables that dichotomized the variable (high if >0.1, low if <0.1).
Is there any way I can merge them for a meta-analysis? I.e., can the OR of the regression (OR for a 1-unit increase) be converted to an OR for high vs. low?
#QuestionForGroup
Good day,
I'm using the Lovibond photometer for water analysis, but I noticed in its handbook that the calibration function is as in the attached photo.
Is it the inverted equation of Beer's law, and why does it use polynomial regression?
Can you clarify the derivation and purpose of this equation?
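A hedged sketch of the likely rationale (I cannot see the attached photo, so the exact handbook equation is unknown): Beer's law (A = ε·l·c) is linear, but real photometers show stray-light and chemistry non-linearities at higher absorbance, so manufacturers often fit concentration as a polynomial in absorbance, i.e. an empirical inverse of the nearly linear Beer's-law relation:

```python
# Illustrative inverse calibration: fit concentration = p(absorbance)
# as a polynomial. Synthetic standards with slight curvature added
# to the ideal Beer's-law line.
import numpy as np

conc = np.linspace(0, 5, 20)                 # known standards
absorb = 0.4 * conc - 0.01 * conc**2         # slight deviation from Beer's law

p = np.polyfit(absorb, conc, deg=2)          # inverse calibration polynomial
conc_pred = np.polyval(p, absorb)            # concentration read from absorbance
```

The polynomial thus serves a practical purpose: it lets the instrument report concentration directly from a measured absorbance over the full working range, including where the Beer's-law line bends.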
We know that bone is an active tissue with continuous remodelling (bone growth and resorption). Is atherosclerosis a static process once formed, or can it regress? If the conditions for lipid oxidation stopped, could an atheroma regress spontaneously?
Suppose one has 40 or 50 survey questions for an exploratory analysis of a phenomenon, several of which are intended to be dependent variables, but most independent. An MLR is conducted with, e.g., 15 IVs to explain the DV, and maybe half turn out to be significant. Now suppose an interesting IV warrants further investigation, and you think you have collected enough data to at least partially explain what makes this IV so important to the primary DV. Perhaps another, secondary model is in order... i.e., you'd like to turn a significant IV from the primary model into the DV in a new model.
Is there a name for this regression or model approach? It is not exactly nested, hierarchical, or multilevel (I think). The idea, again, is simply to explore what variables explain the presence of IV.a in Model 1, by building Model 2 with IV.a as the DV, and employing additional IVs that were not included in Model 1 to explain this new DV.
I am imagining this as a sort of post-hoc follow up to Model 1, which might sound silly, but this is an exploratory social science study, so some flexibility is warranted, imo.
When I run a regression analysis, the Model Summary table shows a very weak R-square, such as 0.001 or 0.052, and the Sig. value in the ANOVA table is greater than 0.05. How can I fix this?
I have a data set with six categorical variables, with responses on a scale of 1-5. The reliability test for each individual variable is very strong, but when all variables are combined the reliability test gives very low figures. What could be the problem? Also, what would be an appropriate regression for this analysis?
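One way to diagnose this kind of result is to compute Cronbach's alpha by hand and inspect the inter-item covariances: combining subscales that measure different things drives the average inter-item correlation, and hence alpha, down. A sketch with simulated 1-5 responses (all numbers made up):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Simulated 1-5 Likert items that share a common signal, so alpha is high;
# mixing in items driven by a different signal would pull it down.
rng = np.random.default_rng(1)
base = rng.integers(1, 6, size=100)
noisy = lambda: np.clip(base + rng.integers(-1, 2, size=100), 1, 5)
data = np.column_stack([base, noisy(), noisy()])
alpha = cronbach_alpha(data)
```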
Suppose we have a study (an analysis of factors affecting sustainable agriculture). To analyze its data, most previous studies have used techniques such as regression. To identify the effective factors, is it possible to use the exploratory factor analysis technique?
I am planning to assess the effect of different income diversification strategies on rural household welfare. Considering the simultaneous causality between livelihood strategies and welfare indicators, the two-stage least squares (2SLS) method with instrumental variables will be applied to estimate the impact of the strategies on household welfare.
Please check the attached file also. I just need to know which regression was used in table 4 of this paper and which tool (SPSS, STATA, R, etc.) I need to use to analyse the data.
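For orientation, the two stages of 2SLS can be written out explicitly. Below is a simulated sketch (all variables and numbers invented) in which naive OLS is biased by unobserved confounding but the instrument recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulated setting: z is a valid instrument, u is unobserved confounding
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)        # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true effect of x is 2.0

def ols(X, yv):
    return np.linalg.lstsq(X, yv, rcond=None)[0]

add_const = lambda v: np.column_stack([np.ones(len(v)), v])

# Stage 1: project the endogenous x onto the instrument
x_hat = add_const(z) @ ols(add_const(z), x)
# Stage 2: regress y on the fitted values
beta_2sls = ols(add_const(x_hat), y)

# Naive OLS is biased upward here because u drives both x and y
beta_ols = ols(add_const(x), y)
```

In practice one would use a packaged routine (e.g., ivregress in Stata or AER::ivreg in R) so that the second-stage standard errors are corrected.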
I performed 2SLS.
In the robust version I found endogeneity, but not in the non-robust version.
Are the results of the robust version valid? I need your help.
Non-robust options
Tests of endogeneity
H0: Variables are exogenous
Durbin (score) chi2(1) = .242302 (p = 0.6225)
Wu-Hausman F(1,613) = .227544 (p = 0.6335)
. estat overid
Tests of overidentifying restrictions:
Sargan (score) chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial
Variable | R-sq. R-sq. R-sq. F(2,613) Prob > F
-------------+------------------------------------------------------------
TURN_1 | 0.1681 0.1152 0.0632 20.6714 0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
| Shea's Shea's
Variable | partial R-sq. adj. partial R-sq.
-------------+------------------------------------
TURN_1 | 0.0632 0.0036
--------------------------------------------------
Minimum eigenvalue statistic = 20.6714
Critical Values # of endogenous regressors: 1
H0: Instruments are weak # of excluded instruments: 2
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS size of nominal 5% Wald test | 19.93 11.59 8.75 7.25
LIML size of nominal 5% Wald test | 8.68 5.33 4.42 3.92
---------------------------------------------------------------------
Robust options
Tests of endogeneity
H0: Variables are exogenous
Robust score chi2(1) = 2.99494 (p = 0.0835)
Robust regression F(1,613) = 2.77036 (p = 0.0965)
. estat overid, forcenonrobust
Tests of overidentifying restrictions:
Sargan chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
Score chi2(1) = .514465 (p = 0.4732)
. estat overid
Test of overidentifying restrictions:
Score chi2(1) = .514465 (p = 0.4732)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
             |           Adjusted    Partial                Robust
Variable     | R-sq.     R-sq.       R-sq.      F(2,613)    Prob > F
-------------+------------------------------------------------------------
TURN_1       | 0.1681    0.1152      0.0632     13.7239     0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
             | Shea's            Shea's
Variable     | partial R-sq.     adj. partial R-sq.
-------------+------------------------------------
TURN_1       | 0.0632            0.0036
--------------------------------------------------
What does the unstandardized regression coefficient in simple linear regression mean?
In multiple linear regression, unstandardized regression coefficients tell how much change in Y is predicted to occur per unit change in that independent variable (X), *when all other IVs are held constant*. But in simple linear regression we have only one independent variable, so how should I interpret it?
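A worked toy example (made-up numbers) may make this concrete: with a single predictor, the unstandardized coefficient is simply the predicted change in Y per one-unit change in X, and no "holding constant" clause is needed because there is nothing else to hold constant:

```python
import numpy as np

# Made-up data: exam score vs. hours studied
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

slope, intercept = np.polyfit(hours, score, 1)
# slope = 4.5: each extra hour studied predicts 4.5 more points
# intercept = 46.9: predicted score at zero hours (an extrapolation)
```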
Hi,
How do I interpret a significant interaction effect between my moderator (Coh) and independent variable (Hos)? The literature states Hos and my dependent variable (PDm) has a negative relationship. The literature also states the moderator (Coh) has a positive relationship with the DV (PDm). My regression co-efficient for the interaction effect is negative. Does this mean Coh is exacerbating the negative effect (i.e., making it worse) or weakening the effect (i.e., making it better)?
I have attached the SPSS output and simple slopes graph.
Thank you!
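A quick way to answer this kind of question for any such output is to compute the simple slope of Hos at low and high values of Coh, slope = b_Hos + b_interaction * Coh. With a negative main effect and a negative interaction (the coefficients below are illustrative, not from the attached output), the negative Hos-PDm relationship becomes steeper, i.e. is exacerbated, at higher Coh:

```python
# Hypothetical coefficients mirroring the pattern described
b_hos = -0.30          # negative main effect of Hos on PDm
b_interaction = -0.15  # negative Hos x Coh interaction

coh_sd = 1.0
for coh in (-coh_sd, 0.0, coh_sd):   # low, mean, high moderator
    simple_slope = b_hos + b_interaction * coh
    print(f"Coh={coh:+.1f}: slope of Hos on PDm = {simple_slope:+.2f}")
# slopes: -0.15 at low Coh, -0.30 at the mean, -0.45 at high Coh
```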
Hello, I am trying to analyze the factors that influence the adoption of technology, and while doing that, I am facing issues with rbiprobit estimation. I have seven years (2015-2021) of balanced panel data containing 2,835 observations. The dependent variable y1 (Adopt2cat), the endogenous variable "BothTechKnowledge," and the instrumental variable "SKinfoAdoptNew" all take values 0 and 1. Although the regression runs, I am unsure how to include panel effects in the model.
I am using the following commands:
rbiprobit Adopt2cat ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn, endog(BothTechKnowledge = ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn SKinfoAdoptNew)
rbiprobit tmeffects, tmeff(ate)
rbiprobit margdec, dydx(*) effect(total) predict(p11)
If we do not add time variables (year dummies), can we say we have obtained a pooled panel estimation? I kindly request you to guide me through both the panel and pooled panel estimation procedures. I have attached the data file for your kind consideration.
Thank you very much in advance.
Kind regards
Faruque
I have 4 groups in my study and I want to analyse the effect of treatment in the 4 groups at 20 time points. Which test should I choose?
I did principal component analysis on several variables to generate one component measuring compliance with medication, but I need help understanding how to use the regression scores generated for that component.
How can I ensure random sampling for customer surveys when no sampling frame is available but I need to run a regression?
I had a few quick questions regarding the output generated by FEAT statistics. I'm currently working with resting-state data and attempting to perform nuisance regression of CSF, WM, Global Signal, motion parameters (standard + extended), and also scrub volumes that exceed a specific threshold of motion using FEAT Statistics. To scrub specific volumes with excessive motion I generated a confound.txt file that includes columns of 0 each with a single 1 indicating the specific volume that needs to be scrubbed. I selected Standard + Extended Motion Parameters to apply the motion parameters generated during FEAT preprocessing. Additionally, I applied CSF, WM, and Global signal nuisance regressors under full model setup by selecting Custom (1 entry per volume) and including three separate .txt files, each including 1 column of average values per volume (for CSF, WM, or Global). Doing so generated the attached design.png and res4d image. Is this the correct way to perform nuisance regression? If so, does the output res4d image look correct? It is very difficult to see the actual image relative to the background. Furthermore, is res4d the right image that I should be using if my goal is to extract the time series of ROIs within this fully processed resting state data?
Any help is very much appreciated!
Best,
Brandon
The objective here is to determine factor sensitivities, or slope coefficients, in a multiple OLS regression model.
When the results of correlation and regression are different, which one should I rely on more? For example, if the correlation of two variables is negative, but the direction is positive in regression or path analysis, how should I interpret the results?
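Such a sign flip can arise legitimately through confounding or suppression: a third variable can make the bivariate correlation negative even when the partial regression coefficient is positive, so the estimate to trust is the one whose conditioning matches your research question. A simulated sketch (made-up data-generating process):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# A confounder c pushes x up and y down strongly; x itself raises y.
c = rng.normal(size=n)
x = c + rng.normal(size=n)
y = 1.0 * x - 3.0 * c + rng.normal(size=n)

# Bivariate correlation of x and y is negative...
r_xy = np.corrcoef(x, y)[0, 1]

# ...but the partial coefficient of x, controlling for c, is positive.
X = np.column_stack([np.ones(n), x, c])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
```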
Global Project: Should we start developing the SIT-USE?
Software Immune Testing: Unified Software Engine (SIT-USE)
Toward Software Immune Testing Environment
Would you like to be part of the funding proposal for SIT-USE?
Would you like to participate in the development of the SIT-USE?
Would you like to support the development of HR SIT-USE?
Keywords: Funding Proposal or Funding, Participation, Support
If you answer yes to any of the questions, don't hesitate to get in touch with me at
[email protected] and write in the subject – The keyword(s)
Despite much progress and research in software technology, testing is still today's primary quality assurance technique. Currently, significant issues in software testing are:
1) Developing and testing software is necessary to meet the new economy market. In this new market, delivering the software on time is essential to capture the market. Software must be produced on time and be good enough to meet the customer's needs.
2) The existing software requirements keep changing as the project progresses, and in some projects, the rate of requirement changes can grow exponentially as the deadline approaches. This kind of rapid software change imposes significant constraints on testing because once a software program changes, the corresponding test cases/scripts may have to be updated. Furthermore, regression testing may have to be performed to ensure that those parts that are supposed to remain unchanged are indeed unchanged.
3) The number of test cases needed is enormous; however, the cost of developing test cases is extremely high.
4) Software development technologies, such as object-oriented techniques, design patterns (such as Decorator, Factory, Strategy), components (such as CORBA, Java's EJB and J2EE, and Microsoft's .NET), agents, application frameworks, client-server computing (such as socket programming, RMI, CORBA, Internet protocols), and software architecture (such as MVC, agent architecture, and N-tier architecture), progress rapidly, with design and programming moving toward dynamic and runtime behavior. Dynamic behavior makes software flexible but also makes it difficult to test. Objects can now send a message to another entity without knowing the type of object that will receive the message. The receiver may have just been downloaded from the Internet, with no interface definition and implementation available. Numerous testing techniques have been proposed to test object-oriented software. However, testing technology is still far behind software development technology.
5) Conventional software testing is generally application-specific, rarely reusable, and not extensible. Even within a software development organization, software development and test artifacts are developed by different teams and described in separate documents. This makes test reuse difficult.
As a part of this research, we plan to work toward an automated and immune software testing environment that includes 1. Unified Component-Based Testing (U-CBT); 2. Unified Built-In Test (U-BIT); 3. Unified End-to-End (U-E2E) Testing; 4. Unified Agent-Based Testing (U-ABT); 5. Unified Automatic Test Case Generators (U-ATCG); and 6. Unified Smart Testing Framework (U-STF). The development of this environment is based on the software stability model (SSM), the knowledge map (KM): Unified Software Testing (KM-UST), and the notion of software agents. An agent is a computational entity evolving in an environment with autonomous behavior, capable of perceiving and acting on this environment and communicating with other agents.
You are invited to join Unified Software Engineering (USWE)
I am using Stata to test my hypotheses. What is the correct command to place stars above coefficients significant at the 10%, 5%, and 1% levels (two-tailed) and (one-tailed)?
I enclosed an attachment of a regression with stars on coefficients significant at the 10%, 5%, and 1% levels (two-tailed).
What about one-tailed?
I am doing a research project to study the determinants of capital structure. However, I've run into two issues.
After downloading data from Compustat, I noticed there are a lot of missing values in the data, and I wonder how I can deal with this. How is it usually done in the finance literature?
The other problem I came across is strange to me: one of my variables, interest expense, includes zero values and sometimes also negative values, which not only does not make sense but also poses issues in calculating the coverage ratio. What do you suggest I do in this case?
I highly appreciate your response.
Best
Saeed
Recently, I was contacted by a professor who wanted to utilize my PyCaret book in his research. Considering that I support scientific advancement in every way possible, I was happy to collaborate with that person. Furthermore, I have decided to freely provide my book to other researchers interested in utilizing it. Here are the topics covered in the book:
• Regression
• Classification
• Clustering
• Anomaly Detection
• Natural Language Processing
• Time Series Forecasting
• Developing Machine Learning Apps with Streamlit
If you want to acquire the book for research purposes, I encourage you to send some information about your project, so we can discuss this further. You can check the link below for more information, and leave a comment below if you have any questions!
Simplifying Machine Learning with PyCaret: https://leanpub.com/pycaretbook/
In G*Power, which should I use for a hierarchical regression with continuous variables: the t test (linear multiple regression: fixed model, single regression coefficient) or the F test (linear multiple regression: fixed model, R² deviation from zero)? What is the difference?
I am running an instrumental variable regression.
EViews provides two different models for instrumental variables, i.e., two-stage least squares and the generalized method of moments.
How do I choose between the two models?
Thanks in advance.
I need help on how to run ridge regression in EViews. I have installed the add-in in EViews, but I am having problems running the regression. Could someone please help me with a step-by-step video (or even an explanation) of how to do this? I am facing a deadline.
I will sincerely appreciate a timely response.
Shalom.
I am working on a SEM model using Mplus. The model includes 2 latent factors each with about 4 dichotomous indicators. The latent factors are regressed onto 5 exogenous predictors (also dichotomous). A dichotomous outcome is, in turn, regressed onto the 2 latent factors. I used WLSMV to estimate the model, which is recommended when the latent factor indicators are dichotomous.
The model fits well but my understanding is that Mplus uses probit regression for the DV and latent factors. And I am not very familiar with how to interpret probit results. So I do not know how to interpret the parameter estimates (the indicator coefficients for each latent factor; the exogenous coefficients for those variables after regressing the latent factor on them; and the coefficients for the DV regressed onto the latent risk factors).
Can anyone point me towards reference material that might walk me through how to interpret (and write-up) the results of this modeling?
Thanks for any help.
James
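As a general pointer while looking for references: probit coefficients live on a latent standard-normal (z-score) scale, so a coefficient b shifts the predicted latent score by b per unit of the predictor, and implied probability changes come from pushing that through the normal CDF. A tiny sketch with invented estimates:

```python
from scipy.stats import norm

# Hypothetical probit estimates: intercept and one coefficient
b0, b1 = -0.50, 0.40

# Predicted probability that the outcome = 1 at x = 0 and x = 1
p_at_0 = norm.cdf(b0)        # Phi(-0.50)
p_at_1 = norm.cdf(b0 + b1)   # Phi(-0.10), larger because b1 > 0

# Marginal effect of x evaluated at x = 0: phi(x*b) * b1
me_at_0 = norm.pdf(b0) * b1
```

The sign and significance of a probit coefficient read like any regression coefficient; only the magnitude needs the CDF transformation to become a probability statement.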
Hello
I am searching for the Panel smooth transition regression Stata code.
Does anyone know of any available code for Stata?
Thank you
Hi everyone
I am using package "XTENDOTHRESDPD" to run a Dynamic panel threshold regression in Stata which is provided here: https://econpapers.repec.org/software/bocbocode/s458745.htm
However, I have the following issue which I could not solve.
To see whether the threshold effect is statistically significant, I run the "xtendothresdpdtest" function after the regression, and I get this error: "inferieurbt_result not found."
I would really appreciate it if you could guide me in case you have any experience with this function.
Using Stata or R, how can we extract intraclass correlation coefficients (ICCs) for multilevel Poisson and multilevel negative binomial regression?
Dear all
I have a set of balanced panel data, i: 6, t: 21, which is 126 observations overall. I have decided on 1 dependent variable (y) and 6 independent variables (x1, x2, ...).
First, I ran unit root tests, which show:
y I(1)
x1 I(0)
x2 I(1)
x3 I(1)
x4 I(0)
x5 I(1)
x6 I(0)
If I would like to run panel data regressions (pooled, fixed effects, and random effects), is this the correct form for entering the model in EViews:
d(y) c x1 d(x2) d(x3) x4 d(x5) x6
or
should I put all variables at the same difference level, adding "d" to all of them?
Please correct me if I am wrong; these are the steps by which I would like to conduct the statistical part of a panel data analysis:
1. Unit root tests
2. Panel regression?
3. ARDL
Hello everyone,
In order to compare two clinical methods, we usually use Passing & Bablok (PABA) regression. Most of the time, our samples are larger than n=50, but for the comparison I'm interested in today (method A vs method B), the samples are small (n = 10-15).
The PABA regression validates the equivalence between the two methods (method A vs method B). Indeed, the CI of the intercept crosses 0 and the CI of the slope crosses 1:
- Intercept = -6, with confidence interval (CI) = [-56; 31]
- Slope = 2, with confidence interval (CI) = [0.5; 4]
However, I have a few concerns about these results because:
- The Pearson coefficient is low (r = 0.63),
- The CIs are very wide,
- The coefficient of variation (CV) between the two methods is high (CV > 20%).
Do you know of any criteria or rules that I could add to the PABA regression analysis that would enable me to improve our validation method?
Thanks in advance for your help! :)
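With n this small, it can also help to see how few pairwise slopes the PB estimate actually rests on; the point estimates are simple to compute directly. The sketch below uses a simplified median convention (the published procedure uses a geometric-mean rule for even slope counts) and omits the rank-based confidence intervals:

```python
import numpy as np

def passing_bablok(x, y):
    """Point estimates of the Passing-Bablok slope and intercept."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx != 0:
                s = dy / dx
                if s != -1.0:          # slopes of exactly -1 are discarded
                    slopes.append(s)
    slopes = np.sort(slopes)
    k = int(np.sum(slopes < -1.0))     # offset making the estimate
    n_s = len(slopes)                  # invariant to swapping the methods
    idx = (n_s - 1) / 2 + k            # shifted median position
    lo, hi = int(np.floor(idx)), int(np.ceil(idx))
    slope = 0.5 * (slopes[lo] + slopes[hi])
    intercept = np.median(y - slope * x)
    return slope, intercept

# Demo on made-up proportional data y = 2x + 1
x_demo = [10.2, 11.5, 12.1, 13.8, 14.4, 15.9, 17.0, 18.3, 19.1, 20.6]
y_demo = [2 * v + 1 for v in x_demo]
slope, intercept = passing_bablok(x_demo, y_demo)
```

Bootstrapping a function like this is one way to obtain CIs whose width honestly reflects n = 10-15.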
I discovered that three independent variables have standardized multiple coefficients (SMC) equal to 1.02 on a dependent variable. Are there any approaches to consider for handling such a high R² in a regression?
I have conducted some ordinal logistic regressions; however, some of my tests have not met the proportional odds assumption, so I need to run multinomial regressions. What would I have to do to the ordinal DV to use it in this model? I'm doing this in SPSS, by the way.
Using SPSS, I've studied the linear regression between two continuous variables (having 53 values each). I got a p-value of 0.000, which means no normal distribution. Should I use another type of regression?
In carrying out panel data regression analysis, the Hausman specification test is used to choose between the fixed effects and random effects estimation approaches. Another line of theory holds that the Breusch-Pagan Lagrange multiplier (LM) test is also required, to choose between random effects estimation and pooled estimation.
Which of these preliminary tests should come first? Are they the final determinants of which estimation approach to deploy?
Hello community!
I am running a CFA for a within and between subjects design. The research involves the study of students on variables before and after taking an entrepreneurship course. The dependent variables are self-efficacy, with 5 subconstructs, and entrepreneurial intent. The independent variable is the course. Covariates are a continuous age variable, exposure (0, 1, 2, or 3), and experience (0, 1, or 2) (I used dummy variables for these in the ANCOVAs).
I don't have experience with repeated measures CFA and want to make sure I'm doing it correctly. I have attached a picture for the CFA model I have tested. I correlated error terms for the errors of the corresponding measured items. I set the regression weights to equal for both times. I also correlated the latent variables. This study also has multiple groups (female and male), but I believe it does not change anything to the factor structure (I just added separate data sets for those groups in Amos). Please let me know if this assumption is wrong.
- Does the model reflect an appropriate way to test whether the factor structure holds across time?
- Is it OK that I did not include the covariates or should I?
- The model where I constrain the regression weights to be equal for both times has a significantly lower model fit according to the chi-square difference test. The model fit is otherwise good for both models: TLI & CFI > .9 and RMSEA < .04. Can I argue that theoretically the model should hold and, since model fit is good for the constrained model, that it is OK to use it across time? I know the chi-square difference test is sensitive to sample size (n > 3,000), but does that matter for the chi-square difference test?
- Chi-square of the constrained model minus chi-square of the less constrained model: 11814.262 - 11644.759 = 169.503, and df (2nd model) - df (1st model): 2345 - 2288 = 57, p < .001.
I would appreciate your insight very much!
Thank you,
Heidi
In addition to the Oaxaca-Blinder decomposition, is exogenous switching regression applicable for examining the gender gap in market participation for agricultural products?
Hi,
I have a set of studies that looked at the association of sex with respect to multiple variables. The majority of the studies reported regression statistics such as beta, b values, t-statistics, and standard errors. Is it possible to run a meta-analysis using any of the above-mentioned statistics? If so, which software would be most suitable? I did a wee bit of research and found that metafor in R might be the better choice for these kinds of meta-analyses.
Any help would be highly appreciated!
Thanks!
Can we apply regression when the correlation is only moderate? Please recommend an easy-to-understand book for non-statistical readers.
Hello everyone. The p-value of the path estimate regression weight (B = 0.198) from A to C is 0.014 in my model in the figure. After bootstrapping, the coefficient from A to C (B = 0.198) has a p-value of 0.043 as a direct effect. What causes this difference in p-value? Many thanks for your comments.
Hello spatial analysis experts
Hope you're all good.
I urgently need the commands and R code for performing spatial binomial regression in RStudio. If someone has already worked on this, please share the code from start to end.
Thanks and regards
Dr. Sami
Hi there!
I am currently running SPSS AMOS 24
But the SEM results don't show the p-values for the regression weights in the estimates for my three main paths.
The estimates only show a score of 1 for each; S.E., C.R., and p-value are all empty.
(The rest of the variables are normal; only the three main ones are affected.)
How can I resolve this?
Looking forward to kind assistance in this regard, wish everyone well :)
My research topic is ROLE OF TEACHERS' ENTREPRENEURIAL ORIENTATION IN DEVELOPING ENTREPRENEURIAL MIND-SET OF STUDENTS IN HEIs. The research constructs that I am using are ENTREPRENEURIAL ORIENTATION and ENTREPRENEURIAL MIND-SET, both are psychological and behavioral. The variables that I will be measuring are INNOVATIVENESS, PRO-ACTIVENESS and RISK TAKING ABILITY of Teachers.
I will be checking the strength of the relationship between these constructs using regression and would like to use the THEORY OF PLANNED BEHAVIOUR by Ajzen in support of my research argument, without TESTING or BUILDING the theory. I would seek expert advice on how this can be done and whether it is an acceptable practice.
Thank You
Since OLS and fixed effects estimation differ, for a panel data model estimated using fixed effects (within) regression, what assumptions (for example, no heteroskedasticity, linearity) do I need to test before I can run the regression?
I'm using the xtreg, fe and xtscc, fe commands in Stata.
In which situations should each be used? Please advise; I am having difficulty.
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers.
I'm working on my PhD thesis and I'm stuck around expected analysis.
I'll briefly explain the context then write the question.
I'm studying moral judgment in the cross-context between Moral Foundations Theory and Dual Process theory.
Simplified: MFT states that moral judgments are almost always intuitive, while DPT states that better reasoners (those higher on cognitive capability measures) will make moral judgments through analytic processes.
I have another idea: people will make moral judgments intuitively only for their primary moral values (e.g., for conservatives those are the binding foundations: respecting authority, ingroup loyalty, and purity), while for values they aren't much concerned about, they'll have to use analytic processes to figure out what judgment to make.
To test this idea, I'm giving participants:
- a few moral vignettes to judge (one concerning progressive values and one concerning conservative values) on 1-7 scale (7 meaning completely morally wrong)
- moral foundations questionnaire (measuring 5 aspects of moral values)
- CTSQ (Comprehensive Thinking Styles Questionnaire), CRT and belief bias tasks (8 syllogisms)
My hypothesis is therefore that cognitive measures of intuition (such as intuition preference from CTSQ) will predict moral judgment only in the situations where it concerns primary moral values.
My study design is correlational. All participants are answering all of the questions and vignettes. So I'm not quite sure how to analyse the findings to test the hypothesis.
I was advised to do a regression analysis where the moral values (5 from the MFQ) or the moral judgments from the two vignettes would be predictors, and the intuition measure would be the dependent variable.
My concern is that this analysis is the wrong choice because I'll have both progressives and conservatives in the sample, which means both groups of values should predict intuition if my assumption is correct.
I think I need to either split people into groups based on their MFQ scores and then do this analysis, or introduce some kind of multi-step analysis or control, but I don't know what the right approach would be.
If anyone has any ideas please help me out.
How would you test the given hypothesis with available variables?
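One option consistent with the concern above is to keep the full sample and model the moderation directly, with an interaction term between the intuition measure and value endorsement, rather than splitting into groups. A simulated sketch (all variable names and effect sizes invented) showing that the interaction coefficient recovers an "intuition matters only when the value is endorsed" pattern:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Made-up variables: intuition preference, endorsement of the vignette's
# value domain (e.g., an MFQ score), and a simulated judgment in which
# intuition matters only when endorsement is high.
intuition = rng.normal(size=n)
endorsement = rng.normal(size=n)
judgment = 0.5 * intuition * endorsement + rng.normal(size=n)

X = np.column_stack([np.ones(n), intuition, endorsement,
                     intuition * endorsement])
b, *_ = np.linalg.lstsq(X, judgment, rcond=None)
# b[3] recovers the moderation: intuition's slope grows with endorsement,
# while the main effect b[1] stays near zero, as in the simulated truth.
```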
This question is for beginner students only.