Which statistical technique should I use?

A chi-square test is used when you want to see if there is a relationship between two categorical variables. In SPSS, the chisq option is used on the statistics subcommand of the crosstabs command to obtain the test statistic and its associated p-value.

Remember that the chi-square test assumes that the expected value for each cell is five or higher. This assumption is easily met in the examples below.
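As a rough illustration of the expected-count assumption, the sketch below computes the Pearson chi-square statistic and the expected cell counts for a two-way table in plain Python. The function names and the counts are made up for illustration; they are not the hsb2 data or SPSS output.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a
    two-way table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def expected_counts_ok(expected, minimum=5):
    """Check the rule of thumb that every expected cell count is >= minimum."""
    return all(e >= minimum for row in expected for e in row)


# Hypothetical 2 x 3 table (two levels by three levels); counts are invented.
observed = [[20, 40, 30], [30, 50, 30]]
stat, df, expected = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, df = {df}, "
      f"assumption met: {expected_counts_ok(expected)}")
```

The statistic would still need to be compared against a chi-square distribution with the given degrees of freedom to obtain a p-value, which a statistical package does for you.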

The point of this example is that one or both variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high). Please see the results from the chi-square example above.

A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable with two or more categories and a normally distributed interval dependent variable, and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable.

For example, using the hsb2 data file, say we wish to test whether the mean of write differs between the three program types (prog). The mean of the dependent variable differs significantly among the levels of program type.

However, we do not know if the difference is between only two of the levels or all three of the levels. The F test for the Model is the same as the F test for prog because prog was the only variable entered into the model. If other variables had also been entered, the F test for the Model would have been different from the F test for prog. Looking at the mean of write for each level of program type, we can see that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest.
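To make the F test concrete, here is a minimal sketch of the one-way ANOVA F statistic computed from between-group and within-group sums of squares. The function name and the scores are hypothetical and only stand in for a variable like write split by a grouping variable like prog.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: how far each score is from its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented scores for three groups; not taken from the hsb2 file.
scores = [[52, 57, 61, 49], [44, 50, 47, 53], [41, 39, 46, 45]]
f, df1, df2 = one_way_anova_f(scores)
print(f"F({df1}, {df2}) = {f:.3f}")
```

A significant F only says that at least one group mean differs; it does not say which ones, which is why the group means are inspected afterwards.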

The Kruskal-Wallis test is used when you have one independent variable with two or more levels and an ordinal dependent variable. In other words, it is the non-parametric version of ANOVA and a generalized form of the Mann-Whitney test, since it permits two or more groups. We will use the same data file as the one-way ANOVA example above (the hsb2 data file) and the same variables as in the example above, but we will not assume that write is a normally distributed interval variable.
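The following sketch shows one way the Kruskal-Wallis H statistic could be computed from ranks, with and without the usual correction for tied ranks. The helper names and the data are assumptions made for illustration, not part of any package.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, H corrected for ties) for a list of groups of scores."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h, h / correction


# Invented scores for three groups.
h, h_tied = kruskal_wallis_h([[52, 57, 61], [44, 50, 47], [41, 39, 44]])
print(f"H = {h:.3f}, H corrected for ties = {h_tied:.3f}")
```

With no ties the correction factor is 1 and both values coincide; with ties the corrected value is slightly larger.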

If some of the scores receive tied ranks, then a correction factor is used, yielding a slightly different value of chi-squared. With or without ties, the results indicate that there is a statistically significant difference among the three types of programs.

A paired samples t-test is used when you have two related observations (i.e., two observations per subject). For example, using the hsb2 data file we will test whether the mean of read is equal to the mean of write.

The Wilcoxon signed rank sum test is the non-parametric version of a paired samples t-test. You use the Wilcoxon signed rank sum test when you do not wish to assume that the difference between the two variables is interval and normally distributed but you do assume the difference is ordinal.

We will use the same example as above, but we will not assume that the difference between read and write is interval and normally distributed. The results suggest that there is not a statistically significant difference between read and write. If you believe the differences between read and write were not ordinal but could merely be classified as positive and negative, then you may want to consider a sign test in lieu of a signed rank test.
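Because the sign test only uses whether each difference is positive or negative, it reduces to an exact binomial test. The sketch below is one way to write it; the function name and the paired scores are hypothetical.

```python
from math import comb


def sign_test(x, y):
    """Two-sided exact sign test for paired observations.

    Returns (number of positive differences, number of negative
    differences, p-value). Pairs with zero difference are dropped.
    """
    diffs = [a - b for a, b in zip(x, y)]
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n = pos + neg
    if n == 0:
        raise ValueError("all differences are zero")
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Invented paired scores standing in for two measurements per student.
first = [57, 68, 44, 63, 47, 50, 60, 52]
second = [52, 59, 33, 44, 52, 52, 59, 46]
pos, neg, p = sign_test(first, second)
print(f"positive = {pos}, negative = {neg}, p = {p:.4f}")
```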

Again, we will use the same variables in this example and assume that this difference is not ordinal.

McNemar's test is used when you are interested in the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable on matched pairs (as in a case-control study) or two outcome variables from a single group. Continuing with the hsb2 dataset used in several of the examples above, let us create two binary outcomes in our dataset: himath and hiread.

These outcomes can be considered in a two-way contingency table. The null hypothesis is that the proportion of students in the himath group is the same as the proportion of students in the hiread group.

You would perform a one-way repeated measures analysis of variance if you had one categorical independent variable and a normally distributed interval dependent variable that was repeated at least twice for each subject.

This is the equivalent of the paired samples t-test, but allows for two or more levels of the categorical variable. This tests whether the mean of the dependent variable differs by the categorical variable.

In this data set, y is the dependent variable, a is the repeated measure and s is the variable that indicates the subject number. You will notice that this output gives four different p-values. No matter which p-value you use, our results indicate that we have a statistically significant effect of a.

If you have a binary outcome measured repeatedly for each subject and you wish to run a logistic regression that accounts for the effect of multiple measures from single subjects, you can perform a repeated measures logistic regression.

The exercise data file contains 3 pulse measurements from each of 30 people assigned to 2 different diet regimens and 3 different exercise regimens.

A factorial ANOVA has two or more categorical independent variables (either with or without the interactions) and a single normally distributed interval dependent variable. For example, using the hsb2 data file we will look at writing scores (write) as the dependent variable and gender (female) and socio-economic status (ses) as independent variables, and we will include an interaction of female by ses.

Note that in SPSS, you do not need to have the interaction term(s) in your data set.

You perform a Friedman test when you have one within-subjects independent variable with two or more levels and a dependent variable that is not interval and normally distributed but at least ordinal.

We will use this test to determine if there is a difference in the reading, writing and math scores. The null hypothesis in this test is that the distribution of the ranks of each type of score is the same. To conduct a Friedman test, the data need to be in a long format.

SPSS handles this for you, but in other statistical packages you will have to reshape the data before you can conduct this test. Hence, there is no evidence that the distributions of the three types of scores are different.
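For readers working outside SPSS, the sketch below illustrates both ideas in plain Python: reshaping one-row-per-subject (wide) data into long format, and computing the Friedman statistic by ranking the scores within each subject. The function names and scores are hypothetical, and this simple version does not apply a correction for ties within a subject.

```python
def wide_to_long(rows, names):
    """Turn one row per subject into (subject, measure, score) records."""
    return [(subject, name, score)
            for subject, row in enumerate(rows)
            for name, score in zip(names, row)]


def within_subject_ranks(row):
    """Rank one subject's scores from 1..k, averaging ranks for ties."""
    ranks = []
    for value in row:
        smaller = sum(v < value for v in row)
        equal = sum(v == value for v in row)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (no tie correction) and its df."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(within_subject_ranks(row)):
            rank_sums[j] += r
    q = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return q, k - 1


# Invented scores: each row is one student's three scores.
scores = [[57, 52, 41], [68, 59, 53], [44, 33, 54], [63, 44, 47]]
print(wide_to_long(scores, ["read", "write", "math"])[:3])
q, df = friedman_statistic(scores)
print(f"Friedman chi-square = {q:.3f}, df = {df}")
```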

Ordered logistic regression is used when the dependent variable is ordered, but not continuous. For example, using the hsb2 data file we will create an ordered variable called write3. This variable will have the values 1, 2 and 3, indicating a low, medium or high writing score.

We do not generally recommend categorizing a continuous variable in this way; we are simply creating a variable to use for this example. We will use gender (female), reading score (read) and social studies score (socst) as predictor variables in this model. We will use a logit link and on the print subcommand we have requested the parameter estimates, the model summary statistics and the test of the parallel lines assumption.

There are two thresholds for this model because there are three levels of the outcome variable. One of the assumptions underlying ordinal logistic and ordinal probit regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc.

This is called the proportional odds assumption or the parallel regression assumption. Because the relationship between all pairs of groups is the same, there is only one set of coefficients (only one model).

If this were not the case, we would need different models (such as a generalized ordered logit model) to describe the relationship between each pair of outcome groups.

A factorial logistic regression is used when you have two or more categorical independent variables but a dichotomous dependent variable. For example, using the hsb2 data file we will use female as our dependent variable, because it is the only dichotomous variable in our data set; certainly not because it is common practice to use gender as an outcome variable.

We will use type of program (prog) and school type (schtyp) as our predictor variables. Because prog is a categorical variable (it has three levels), we need to create dummy codes for it.
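Dummy coding a three-level variable means replacing it with two 0/1 indicator columns, leaving one level out as the reference. A minimal sketch follows, with a hypothetical function name and invented values coded 1, 2 and 3; the choice of reference level is an assumption.

```python
def dummy_code(values, reference):
    """Return (kept levels, rows of 0/1 indicators), omitting the reference."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level does not occur in the data")
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Invented program-type codes for six students; level 1 is the reference.
prog = [1, 2, 3, 2, 1, 3]
levels, indicators = dummy_code(prog, reference=1)
for value, row in zip(prog, indicators):
    print(value, dict(zip(levels, row)))
```

A student in the reference level has zeros in every indicator column, so each coefficient is interpreted relative to that level.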

Statistical tests: which one should you use?

Statistical tests can be used to determine whether a predictor variable has a statistically significant relationship with an outcome variable.

What are the main assumptions of statistical tests?

Statistical tests commonly assume that the data are normally distributed, that the groups being compared have similar variance, and that the data are independent. If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

What is the difference between discrete and continuous variables?

Discrete and continuous variables are two types of quantitative variables: discrete variables represent counts, and continuous variables represent measurable amounts.

Rebecca Bevans

Rebecca is working on her PhD in soil ecology and spends her free time writing. She's very happy to be able to nerd out about statistics with all of you.

Examples of research questions that statistical tests can address:

What is the effect of income on longevity?
What is the effect of income and minutes of exercise per day on longevity?
What is the effect of drug dosage on the survival of a test subject?
What is the effect of two different test prep programs on the average exam scores for students from the same class?
What is the difference in average exam scores for students from two different schools?
What is the difference in average pain levels among post-surgical patients given three different painkillers?
What is the effect of flower species on petal length, petal width, and stem length?

If the sample size is smaller than actually required, the study will be underpowered to detect the given difference, and the result may be statistically nonsignificant.

There are specific statistical methods for each situation. Failing to select an appropriate statistical method affects both the significance level and the conclusions. For example, an incorrect choice of test may show a statistically significant difference between groups when no difference actually exists. Selecting appropriate statistical methods is very important for quality research. It is important that a researcher knows the basic concepts of the statistical methods used to conduct a research study so that it produces valid and reliable results.

There are various statistical methods that can be used in different situations. Each test makes particular assumptions about the data. These assumptions should be taken into consideration when deciding which test is the most appropriate.

Wrong or inappropriate use of statistical methods may lead to defective conclusions and ultimately harm evidence-based practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important for improving and producing quality biomedical research. However, it is extremely difficult for a biomedical researcher or academician to learn all of the statistical methods. Many software packages are available, online as well as offline, for analyzing data, but understanding which set of statistical tests is appropriate for the given data and study objective remains very difficult for researchers.

Therefore, from the planning of the study through data collection, analysis and finally the review process, consultation with statistical experts may be an alternative option. It can relieve clinicians of the burden of going in depth into statistics, which requires a lot of time and effort and ultimately affects their clinical work. These practices not only ensure the correct and appropriate use of biostatistical methods in research but also ensure the highest quality of statistical reporting in research and journals.

Authors would like to express their deep and sincere gratitude to Dr. His critical reviews and suggestions were very useful for improvement in the article.

Ann Card Anaesth.

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.

Abstract

In biostatistics, statistical methods are available for the analysis and interpretation of the data in each specific situation.

Keywords: diagnostic accuracy, parametric and nonparametric methods, regression analysis, statistical method, survival analysis.

Introduction

Selection of an appropriate statistical method is a very important step in the analysis of biomedical data.

Aim and objective of the study

Selection of the statistical test depends upon the aim and objective of the study.

Type and distribution of the data used

For the same objective, the selection of the statistical test varies with the data type.

Observations are paired or unpaired

Another important point in the selection of the statistical test is to assess whether the data are paired (the same subjects are measured at different time points or using different methods) or unpaired (each group has different subjects).

Concept of Parametric and Nonparametric Methods

Inferential statistical methods fall into two possible categorizations: parametric and nonparametric.

Selection between Parametric and Nonparametric Methods

All types of the t-test and the F test are considered parametric tests.
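The selection criteria above (number of groups, paired or unpaired observations, parametric or nonparametric) can be sketched as a small lookup for comparing groups on one outcome. This is only an illustrative simplification with a hypothetical function name; the pairings follow the standard correspondence between the tests discussed earlier, with the independent samples t-test added as the parametric counterpart of the Mann-Whitney test.

```python
def suggest_test(n_groups, paired, parametric):
    """Suggest a test for comparing n_groups groups on a single outcome.

    paired: True when the same subjects are measured in every condition.
    parametric: True when the outcome can be treated as a normally
    distributed interval variable.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if paired:
            return ("paired samples t-test" if parametric
                    else "Wilcoxon signed rank test")
        return ("independent samples t-test" if parametric
                else "Mann-Whitney test")
    if paired:
        return ("one-way repeated measures ANOVA" if parametric
                else "Friedman test")
    return "one-way ANOVA" if parametric else "Kruskal-Wallis test"


print(suggest_test(n_groups=3, paired=False, parametric=False))
print(suggest_test(n_groups=2, paired=True, parametric=True))
```

A real choice of test also depends on the study objective and the type of outcome (for example proportions or survival times), which this sketch does not cover.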

Table 1 Parametric and their Alternative Nonparametric Methods.

Statistical Methods to Compare the Proportions

The statistical methods used to compare the proportions are considered nonparametric methods, and these methods have no alternative parametric methods.

Table 2 Statistical Methods to Compare the Proportions.

Other Statistical Methods

The intraclass correlation coefficient is calculated when both pre-post data are on a continuous scale.

Table 3 Semi-parametric and non-parametric methods.

Advantage and Disadvantages of Nonparametric Methods over Parametric Methods and Sample Size Issues

Parametric methods are more powerful tests for detecting differences between groups than their nonparametric counterparts. However, because of some strict assumptions, including normality of the data and sample size, parametric tests cannot be used in every situation, and their alternative nonparametric methods are used instead.

Conflicts of interest

There are no conflicts of interest.
