Instructions

Note: Caution should be used at all stages when interpreting GDs. The fact that GDs indicate possible concerns does not mean that the results should immediately be thrown out. Rather, investigation, understanding, and the advancement of knowledge should be the main goals of using GDs. The major aims of this process are to identify possible concerns and alternate explanations for the results and to test the robustness of the results to these concerns and alternate explanations. Overall, we posit that the discussions and ideas resulting from the inclusion of GDs within our research processes will help ensure that we are reporting accurate and replicable findings within our science.


Scatterplot Matrices

  • Assess the shape of the univariate densities
    • Are the densities approximately normal?
      • Skewness – are the distributions asymmetric? Are they more heavily concentrated on one side of the distribution? If so, skewness may be an issue, and variable transformations (logarithm, square-root, etc.) and/or the use of robust statistics might be warranted.
      • Kurtosis – how pointed (or flat) are the distributions? If the distributions deviate strongly from normality, transformations such as the cube-root and/or the use of robust statistics might be warranted.
    • Are the densities unimodal (i.e., a single peak in the distribution)?
      • If the density is not unimodal (i.e., multiple modes are visible), one should attempt to identify the underlying mechanism driving this multimodality. For example, a community sample of college students might return a bimodal distribution of SAT scores. This pattern might reflect the fact that a research university and a community college both exist within the community and that their admissions standards are quite different. These subsamples could be analyzed separately, or analytic techniques that can account for this clustering, such as linear mixed-effects models, could be used.
  • Are potential univariate or multivariate outliers present?
    • Are there points within the data that lie far away from the majority of the observations?
      • If possible outliers exist, then their removal, transformation, and/or the use of robust statistics might be warranted.
  • Assess the strength, direction, and form of the relationships among the variables
    • Strength of the relationship
      • How consistent (i.e., tightly clustered) are the observations along the trend of the data? Is there wide variability or is there a clearly-defined relationship?
    • Direction of the relationship
      • Is the relationship generally moving upward (positive correlation), downward (negative correlation), or horizontal (no correlation)?
    • What is the form of the relationship?
      • Is the relationship linear, as is assumed by many statistical analyses, or is it more complex (quadratic, cubic, etc.)?
      • If non-linear, consideration for the inclusion of additional terms in the analytic model (quadratic, cubic, etc.) might be warranted.
  • Are there any other abnormalities in the data?
    • In general, visualizing one’s data allows for the identification of a broad range of abnormalities. Examples include, but are not limited to, miscoded data, the accidental removal of values (e.g., 3.0 on a 5-point response scale being coded as ‘Missing’), range restriction (e.g., a strong ceiling and/or floor effect might be found in university admissions), and/or values falling outside of the possible range for the variable.
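
As a numeric complement to the visual check of skewness and kurtosis described above, the following is a minimal sketch in Python using only the standard library. The sample values and the function names are hypothetical and chosen purely for illustration; the moment-based formulas are one common definition and are not necessarily what any particular GD software reports.

```python
import math

def central_moment(xs, k):
    """k-th central moment of a sample (population form, divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution is near 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

# Hypothetical right-skewed sample (illustration only).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 55]

# A logarithmic transformation, one of the options mentioned above.
logged = [math.log(x) for x in raw]

print("skewness before log:", round(skewness(raw), 2))
print("skewness after log: ", round(skewness(logged), 2))
print("excess kurtosis before log:", round(excess_kurtosis(raw), 2))
```

In this made-up sample the logarithm reduces the skewness but does not remove it, which is a reminder that a transformation should be re-checked graphically rather than assumed to have worked.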

Group-Means Plot

  • Are the subgroup densities approximately normal?
    • Skewness – are the distributions asymmetric? Are they more heavily concentrated on one side of the distribution? If so, skewness may be an issue, and variable transformations (logarithm, square-root, etc.) and/or the use of robust statistics might be warranted.
    • Kurtosis – how pointed (or flat) are the distributions? If the distributions deviate strongly from normality, transformations such as the cube-root and/or the use of robust statistics might be warranted.
  • Are potential outliers present for each subgroup?
    • If possible outliers are identified, then their removal, transformation, and/or the use of robust statistics might be warranted. The mean differences among groups might simply be an artifact of a few influential data points.
  • Are the subgroup densities unimodal (i.e., a single peak in the distribution)?
    • If the densities are not unimodal (i.e., multiple peaks are visible), one should attempt to identify the underlying mechanism driving this multimodality. Additional Group-Means plots can be constructed to further assess these concerns.
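
To illustrate how a few influential points can create an apparent mean difference between groups, here is a minimal sketch in Python. It flags points with Tukey's 1.5 × IQR fences, which is a swapped-in rule of thumb and not a method prescribed by these instructions; the group names, data values, and function name are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical scores for two subgroups (illustration only).
groups = {
    "A": [10, 11, 12, 13, 14, 15, 60],
    "B": [9, 10, 11, 12, 13, 14, 15],
}

for name, scores in groups.items():
    flagged = iqr_outliers(scores)
    kept = [s for s in scores if s not in flagged]
    print(
        name,
        "mean:", round(statistics.mean(scores), 2),
        "mean without flagged points:", round(statistics.mean(kept), 2),
        "flagged:", flagged,
    )
```

Here the gap between the two group means shrinks sharply once the single flagged point in group A is set aside. Whether such a point should actually be removed remains a substantive decision, in keeping with the note of caution at the top of these instructions.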

Moderator Plots

  • Are there outliers or extreme values driving or attenuating predicted moderation effects?
    • If using a categorical moderator, this should be assessed for each level of the categorical moderator.
  • It is assumed that a linear relationship holds between the independent and dependent variable across groups. Is this the case?
  • To what extent are the grouping variable and the independent variable related? A strong relationship may imply that the independent variable distributions do not overlap across groups, in which case non-linearity between the independent variable and the dependent variable could be an alternative explanation for an apparent moderation effect.
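
The overlap concern in the last point can be sketched numerically. The example below, in Python with hypothetical data and function names, fits a separate least-squares slope within each level of a categorical moderator and checks whether the ranges of the independent variable overlap. It is an illustration under those assumptions, not a substitute for inspecting the moderator plot.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for one group."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def ranges_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical data: (independent variable, dependent variable) per group.
groups = {
    "low": ([1, 2, 3, 4, 5], [2.0, 2.9, 4.1, 5.0, 6.1]),
    "high": ([6, 7, 8, 9, 10], [9.0, 11.1, 12.9, 15.0, 17.1]),
}

slopes = {name: ols_slope(x, y) for name, (x, y) in groups.items()}
ranges = {name: (min(x), max(x)) for name, (x, _) in groups.items()}
overlap = ranges_overlap(ranges["low"], ranges["high"])

print("per-group slopes:", slopes)
print("independent-variable ranges overlap:", overlap)
```

In this made-up case the slopes differ by group, yet the two groups occupy disjoint ranges of the independent variable, so a single curved relationship would fit the same pattern as a moderation effect.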

Regression Assumptions Check

  • Correct Form of Relationship is Specified
    • Does the relationship between the fitted values and the residuals exhibit a curved or otherwise non-linear relationship (as shown by the red loess line in the top-left plot)? If so, this might indicate a violation of regression assumptions. Non-linear terms, data transformations, and/or the use of alternate analytic techniques might be warranted.
  • Homogeneity of Variance
    • Is there a constant spread of the residuals at all levels of the fitted values, or is there a systematic restriction or expansion (e.g., a "shotgun" effect) across the fitted values in the top-left plot? Another way to assess this is to see whether the rescaled residuals (bottom-left plot) have more spread at higher or lower fitted values. This can be evaluated by seeing whether the red loess line is sloped. A positive slope would indicate that more variance exists at higher fitted values, and a negative slope would indicate that more variance exists at lower fitted values. A flat line indicates relatively constant residual variance across the fitted values. If the variance is not constant, data transformations and/or the use of alternate analytic techniques might be warranted.
  • Normality of the Residuals
    • Do the residuals fall fairly well onto the straight, dashed line in the top-right plot? If they deviate largely from the dashed line, then non-normality and/or outliers may be indicated and data transformations, the removal of outliers, and/or the use of alternate analytic techniques might be warranted.
  • Outliers
    • Do any individual residuals fall far from the straight, dashed line in the top-right plot? If so, outliers may be indicated, and the removal of outliers, data transformations, and/or the use of alternate analytic techniques might be warranted.
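
The homogeneity-of-variance check can also be approximated numerically. The sketch below, in Python with a hypothetical data set whose noise grows with the predictor, fits a simple regression, computes the fitted values and residuals, and compares the average absolute residual in the lower and upper halves of the fitted values. This half-split comparison is a crude illustrative device assumed for this example, not a formal test and not part of the plots described above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: the disturbance alternates in sign and grows with x,
# so the residual spread should widen at higher fitted values.
xs = list(range(1, 11))
ys = [2 * x + 0.5 * x * (-1) ** x for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Order the residuals by fitted value and split them into two halves.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
lower_spread = sum(abs(r) for _, r in pairs[:half]) / half
upper_spread = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)

print("mean |residual|, lower half of fitted values:", round(lower_spread, 3))
print("mean |residual|, upper half of fitted values:", round(upper_spread, 3))
```

A clearly larger spread in one half corresponds to the sloped line one would expect to see in the rescaled-residuals plot.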