[quote]LoRez wrote:
I don’t know a whole lot about study design and the things that are often screwed up. What kind of things do people often screw up? What kinds of things are done wrong?[/quote]
Oh, jeez. Where do I begin?
- Study designs that don’t actually answer the question:
Mr. Investigator wants to know if a certain lab assay is reproducible in two different laboratories. He recruits a sample of men in Lab A, draws blood, and measures Biomarker Z using Magical Assay X. His colleague recruits a sample of men in Lab B across the country, draws blood, and measures Biomarker Z using Magical Assay X.
Problem: if we don't have measurements taken on the same PEOPLE, there is no way to know whether the variability is due to differences between the lab measurements or due to actual differences in the people! It would be like weighing 20 men in one city, weighing 20 different men in another city, and then concluding that the scales were different because we got a different average weight. I was absolutely floored that they had not thought of this in the design. That doesn't even take a statistician to figure out; that takes common sense.
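If it helps to see it in numbers, here's a tiny simulation sketch. All of the numbers are made up, and Python/NumPy is just my choice of tool: even with two "labs" that measure perfectly, recruiting different people at each one produces different averages, so you can't tell person variability apart from lab variability unless the same people get measured in both places.
[code]
import numpy as np

rng = np.random.default_rng(0)

# 20 men recruited at Lab A and 20 DIFFERENT men at Lab B; pretend both labs measure perfectly
people_at_a = rng.normal(100, 15, size=20)   # true biomarker levels, arbitrary units
people_at_b = rng.normal(100, 15, size=20)

# Even with zero lab bias, the averages differ just because the people differ
print("different people in each lab:", people_at_a.mean() - people_at_b.mean())

# The fix: measure the SAME people in both labs, so person-to-person variability
# cancels out and what's left reflects the assay/labs
same_people = rng.normal(100, 15, size=20)
measured_at_a = same_people + rng.normal(0, 2, size=20)   # assay noise only
measured_at_b = same_people + rng.normal(0, 2, size=20)
print("same people, paired:", (measured_at_a - measured_at_b).mean())
[/code]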
- Not really understanding how analysis works:
I want to compare the distribution of a particular variable (say, Max Bench Press) between two groups (say, two football teams). Team A has a bunch of kids who all bench between 225 and 315 pounds. Team B has a bunch of kids who bench between 225 and 315, plus one kid who benches 500 pounds.
If I do a standard t-test comparing the means of the two groups, I will probably get a significant difference and conclude that Team B has a higher average bench press than Team A.
That’s a flawed conclusion. What’s really going on is that they have essentially the same distribution with one monstrously strong kid (see below on outliers). You should not just “throw out” that kid; you should use a different kind of test (rank-sum test, known as either the Wilcoxon test or the Mann-Whitney test) that is less sensitive to extreme observations but still tests whether the distributions have the same center.
Most people don’t have a clue that this test even exists.
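In case it's useful, here's what that comparison looks like side by side. This is just a sketch with made-up bench-press numbers (Python with SciPy assumed; any stats package has both tests): the one 500-pound kid drags Team B's mean up, while the rank-based test barely notices him.
[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

team_a = rng.uniform(225, 315, size=30)                  # everyone between 225 and 315
team_b = np.append(rng.uniform(225, 315, size=29), 500)  # same range plus one 500-lb kid

print("means:  ", team_a.mean(), team_b.mean())           # the 500 pulls Team B's mean up
print("medians:", np.median(team_a), np.median(team_b))   # the medians stay close

t_stat, t_p = stats.ttest_ind(team_a, team_b)             # compares means
u_stat, u_p = stats.mannwhitneyu(team_a, team_b, alternative="two-sided")  # rank-based
print("t-test p:", t_p, "  Mann-Whitney p:", u_p)
[/code]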
- Doing the right type of analysis, but doing it wrong:
I want to create a regression model because I think several variables influence the relationship between my exposure of interest and my outcome. Model building is a VERY inexact science, but suppose my main question of interest is whether meat consumption is associated with cancer risk. I make a regression model with cancer as the outcome, meat consumption as the primary predictor, and then I decide to add more variables to the model based on whether they have significant relationships with cancer.
Wrong, chief. I should be adding variables to the model based on whether they have any effect on the meat-cancer relationship, since THAT is my primary research question. Most investigators just put everything that’s significantly related to their outcome of interest into the model.
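Here's a rough sketch of what I mean, sometimes called the "change-in-estimate" approach. The data and variable names are completely fabricated, and Python with statsmodels is just one way to do it: you keep a covariate because it moves the coefficient on your exposure (meat), not because it has a small p-value against the outcome.
[code]
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500

# Fabricated data purely for illustration -- NOT real epidemiology
age = rng.normal(55, 10, size=n)
meat = rng.normal(3, 1, size=n) + 0.02 * (age - 55)        # servings/week, nudged by age
risk = 1 / (1 + np.exp(-(-3 + 0.2 * meat + 0.05 * (age - 55))))
df = pd.DataFrame({"cancer": rng.binomial(1, risk), "meat": meat, "age": age})

def meat_coef(extra_covariates):
    """Logistic model for cancer; return the coefficient on meat."""
    X = sm.add_constant(df[["meat"] + extra_covariates])
    return sm.Logit(df["cancer"], X).fit(disp=0).params["meat"]

crude = meat_coef([])
adjusted = meat_coef(["age"])
change = 100 * abs(adjusted - crude) / abs(crude)
print(f"crude: {crude:.3f}  age-adjusted: {adjusted:.3f}  change: {change:.1f}%")
# Keep age in the model if it meaningfully shifts the meat coefficient
# (a common rule of thumb is a change of more than ~10%) -- regardless of
# whether age itself is "significantly" related to cancer.
[/code]
The exact cutoff is a judgment call; the point is that the screening criterion is the effect on the exposure-outcome estimate, since THAT is the research question.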
- Inappropriate treatment of outliers:
Scientists should note the presence of outliers in a dataset and POSSIBLY remove them from the analysis, depending on why they occurred. But people often make that decision based on how much the outlier affects the study results. Wrong answer, homeboy. That outlier is one of the most valuable data points, and taking it out will bias your results. You ONLY take it out if there's some biological reason to do so (like the value was impossible, or the lab assay was done wrong). If it's making your statistical results unstable, then you might consider a different statistical technique (transforming the data, or a nonparametric test), but you absolutely should NOT remove it just because it's an outlier and you don't like the way it influenced your results.
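A toy example of what "keep it in, but tame it" looks like (made-up numbers, Python/NumPy): the extreme value stays in the data, but you summarize or transform in a way it can't dominate.
[code]
import numpy as np

# Made-up biomarker values with one extreme (but real) observation
values = np.array([4.1, 4.8, 5.2, 5.5, 6.0, 6.3, 6.9, 7.4, 52.0])

print("mean:  ", values.mean())       # dragged way up by the 52.0
print("median:", np.median(values))   # barely notices it

# A log transform keeps the point in the analysis but pulls in the long tail
log_values = np.log(values)
print("geometric mean:", np.exp(log_values.mean()))
[/code]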