I think Jeromy's answer is sufficient if you are examining two experimental studies or an actual meta-analysis. But often we are faced with two non-experimental studies, and are tasked with assessing the validity of their disparate findings.
As Cyrus's grocery list of questions suggests, the topic itself is not amenable to a short response, and whole books are, in essence, devoted to addressing such a question. For anyone interested in conducting research on non-experimental data, I would highly recommend reading
Experimental and quasi-experimental designs for generalized causal inference by William R. Shadish, Thomas D. Cook, Donald Thomas Campbell (Also I have heard that the older versions of this text are just as good).
Several items Jeromy referred to (bigger sample sizes and greater methodological rigour), and everything that Cyrus mentions, would fall under what Campbell and Cook refer to as "internal validity". These include aspects of the research design and the statistical methods used to assess the relationship between X and Y. In particular, as critics we are concerned about aspects of either that could bias the results and diminish the reliability of the findings. As this is a forum devoted to statistical analysis, many of the answers centre on statistical methods for ensuring unbiased estimates of whatever relationship you are assessing. But there are other aspects of the research design, unrelated to statistical analysis, that diminish the validity of the findings no matter what rigorous lengths one goes to in the statistical analysis (for example, the several aspects of experiment fidelity that Cyrus mentions can be addressed but not solved with statistical methods, and if they occur they will always diminish the validity of the study's results). There are many other aspects of internal validity, not mentioned here, that become crucial when comparing the results of non-experimental studies, as well as aspects of research design that distinguish how reliable the findings are. I don't think it is quite appropriate to go into too much detail here, but I would often take the results of a quasi-experimental study (such as an interrupted time series or a matched case-control) more seriously than those of a study that is not quasi-experimental, regardless of the other aspects Jeromy or Cyrus mentioned (within reason, of course).
Campbell and Cook also refer to the "external validity" of studies. This aspect of research design is often much smaller in scope, and does not deserve as much attention as internal validity. External validity essentially deals with the generalizability of the findings, and I would say laymen can often assess external validity reasonably well as long as they are familiar with the subject. Long story short: read Shadish, Cook, and Campbell's book.