Simpson's paradox

Simpson's paradox, also called the Yule–Simpson effect, is a phenomenon in probability and statistics in which a trend appears in several subgroups of data but disappears or reverses when those groups are combined. It is sometimes also called the reversal paradox or the amalgamation paradox.^[1]
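The definition can be made concrete with a small Python sketch. The counts below are illustrative values picked so that a reversal occurs, and the names `COUNTS`, `subgroup_rates`, `pooled_rates` and `shows_reversal` are hypothetical. The check follows the definition above: one treatment is better in every subgroup but worse once the subgroups are merged.

```python
from fractions import Fraction

# Illustrative (hypothetical) counts: (successes, trials) for each treatment
# within each subgroup, chosen so that the reversal appears.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(counts):
    """Success rate of each treatment inside each subgroup, as exact fractions."""
    return {
        treatment: {group: Fraction(s, n) for group, (s, n) in groups.items()}
        for treatment, groups in counts.items()
    }


def pooled_rates(counts):
    """Success rate of each treatment after merging all of its subgroups."""
    pooled = {}
    for treatment, groups in counts.items():
        successes = sum(s for s, _ in groups.values())
        trials = sum(n for _, n in groups.values())
        pooled[treatment] = Fraction(successes, trials)
    return pooled


def shows_reversal(counts, first="A", second="B"):
    """True if one treatment wins in every subgroup but loses after pooling."""
    sub = subgroup_rates(counts)
    pooled = pooled_rates(counts)
    groups = counts[first].keys()
    first_wins_all = all(sub[first][g] > sub[second][g] for g in groups)
    second_wins_all = all(sub[second][g] > sub[first][g] for g in groups)
    return (first_wins_all and pooled[first] < pooled[second]) or (
        second_wins_all and pooled[second] < pooled[first]
    )


if __name__ == "__main__":
    for treatment, groups in subgroup_rates(COUNTS).items():
        for group, rate in groups.items():
            print(f"{treatment} / {group}: {float(rate):.1%}")
    for treatment, rate in pooled_rates(COUNTS).items():
        print(f"{treatment} / pooled: {float(rate):.1%}")
    print("reversal:", shows_reversal(COUNTS))
```

With these counts, treatment A has the higher success rate in both subgroups, yet B has the higher pooled rate, because A was applied mostly in the "large" subgroup, where both treatments do worse. Exact `Fraction` arithmetic keeps the comparisons free of floating-point rounding.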

This result is often encountered in social-science and medical statistics^[2] and is especially misleading when frequency data are inappropriately given a causal interpretation.^[3] The paradox can be resolved when confounding variables and causal relations are properly accounted for in the statistical model^[3]^[4] (for example, through cluster analysis^[5]).
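One standard way to account for a confounder, sketched here under the assumption that a single stratifying variable Z is the only confounder of a treatment X and an outcome Y (an assumption, not something established above), is the adjustment (standardization) formula from Pearl's causal framework. The pooled rate weights each stratum by the treatment group's own stratum mix, P(Z=z | X=x); the adjusted rate weights every treatment by the same overall mix, P(Z=z), so a treatment assigned mostly to a harder stratum is no longer penalized for it.

```latex
% Pooled (crude) rate: the weights depend on x
P(Y=1 \mid X=x) = \sum_{z} P(Y=1 \mid X=x, Z=z)\, P(Z=z \mid X=x)

% Adjusted rate, valid if Z is the only confounder of X and Y
P\bigl(Y=1 \mid \operatorname{do}(X=x)\bigr) = \sum_{z} P(Y=1 \mid X=x, Z=z)\, P(Z=z)
```

If one treatment has the higher within-stratum rate in every stratum, the adjusted comparison preserves that ordering, because both treatments are averaged with identical positive weights. Which comparison is appropriate depends on the causal structure: if Z were instead a consequence of X, the pooled rate would be the relevant one.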

Edward H. Simpson first described this phenomenon in a technical paper in 1951,^[6] but the statisticians Karl Pearson (in 1899)^[7] and Udny Yule (in 1903)^[8] had mentioned similar effects earlier. The name "Simpson's paradox" was introduced by Colin R. Blyth in 1972.^[9]

Mathematician Jordan Ellenberg argues that Simpson's paradox is misnamed because "there's no contradiction involved, just two different ways to think about the same data," and suggests that its lesson "isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."^[10]

Probability of occurrence

A paper by Pavlides and Perlman (2009) presents a proof that in a random 2 × 2 × 2 table with uniform distribution, Simpson's paradox occurs with probability exactly 1/60.^[11] A study by Kock suggests that the probability that Simpson's paradox occurs at random in path models (that is, models generated by path analysis) with two independent variables and one dependent variable is approximately 12.8%.^[12]
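The 1/60 figure can be checked roughly by simulation. This Python sketch assumes that "uniform distribution" means the eight cell probabilities are drawn uniformly from the probability simplex (a flat Dirichlet distribution) and that a reversal counts in either direction; under those assumptions the estimate should land near 1/60 ≈ 0.0167. The function names are hypothetical, and the sketch does not attempt the path-model setting behind the 12.8% figure.

```python
import random

# A 2 x 2 x 2 table is stored as {(x, y, z): weight}, where
# x = treatment (0/1), y = outcome (1 = success), z = stratum (0/1).
CELLS = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]


def random_table(rng):
    """Draw cell probabilities uniformly from the simplex (flat Dirichlet).

    Normalised i.i.d. Exp(1) draws follow Dirichlet(1, ..., 1); the
    normalisation step is skipped because only ratios of cells are used below.
    """
    return {cell: rng.expovariate(1.0) for cell in CELLS}


def success_rate(table, x, strata):
    """P(y = 1 | x, z in strata), computed from the (unnormalised) weights."""
    success = sum(table[(x, 1, z)] for z in strata)
    total = success + sum(table[(x, 0, z)] for z in strata)
    return success / total


def is_simpson_reversal(table):
    """Same sign of the x-effect in both strata, opposite sign after pooling."""
    d0 = success_rate(table, 1, (0,)) - success_rate(table, 0, (0,))
    d1 = success_rate(table, 1, (1,)) - success_rate(table, 0, (1,))
    pooled = success_rate(table, 1, (0, 1)) - success_rate(table, 0, (0, 1))
    return (d0 > 0 and d1 > 0 and pooled < 0) or (d0 < 0 and d1 < 0 and pooled > 0)


def estimate_probability(trials=100_000, seed=1):
    """Monte Carlo estimate of how often a random table shows the paradox."""
    rng = random.Random(seed)
    hits = sum(is_simpson_reversal(random_table(rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    p = estimate_probability()
    print(f"estimated: {p:.4f}   1/60 = {1 / 60:.4f}")
```

Because the two strata use disjoint cells, the signs of the two within-stratum effects are independent, so they agree only half the time; the paradox additionally requires the pooled effect to flip, which is why it is rare.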

References

^ I. J. Good, Y. Mittal (June 1987). “The Amalgamation and Geometry of Two-by-Two Contingency Tables”. The Annals of Statistics. 15 (2): 694–711. doi:10.1214/aos/1176350369. ISSN 0090-5364. JSTOR 2241334.
^ Clifford H. Wagner (February 1982). “Simpson's Paradox in Real Life”. The American Statistician. 36 (1): 46–48. doi:10.2307/2684093. JSTOR 2684093.
^ ^a ^b Judea Pearl. Causality: Models, Reasoning, and Inference, Cambridge University Press (2000, 2nd edition 2009). ISBN 0-521-77362-8.
^ Kock, N., & Gaskins, L. (2016). Simpson's paradox, moderation and the emergence of quadratic relationships in path models: An information systems illustration. International Journal of Applied Nonlinear Science, 2(3), 200–234.
^ Rogier A. Kievit, Willem E. Frankenhuis, Lourens J. Waldorp and Denny Borsboom (2013). “Simpson's paradox in psychological science: a practical guide”. Front. Psychol. 4 (513): 1–14. doi:10.3389/fpsyg.2013.00513.
^ Simpson, Edward H. (1951). “The Interpretation of Interaction in Contingency Tables”. Journal of the Royal Statistical Society, Series B. 13 (2): 238–241. doi:10.1111/j.2517-6161.1951.tb00088.x.
^ Pearson, Karl; Lee, Alice; Bramley-Moore, Lesley (1899). “Genetic (reproductive) selection: Inheritance of fertility in man, and of fecundity in thoroughbred racehorses”. Philosophical Transactions of the Royal Society A. 192: 257–330. doi:10.1098/rsta.1899.0006.
^ G. U. Yule (1903). “Notes on the Theory of Association of Attributes in Statistics”. Biometrika. 2 (2): 121–134. doi:10.1093/biomet/2.2.121.
^ Colin R. Blyth (June 1972). “On Simpson's Paradox and the Sure-Thing Principle”. Journal of the American Statistical Association. 67 (338): 364–366. doi:10.2307/2284382. JSTOR 2284382.
^ Ellenberg, Jordan (25 May 2021). Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy and Everything Else. New York: Penguin Press. p. 228. ISBN 978-1-9848-7905-9. OCLC 1226171979.
^ Marios G. Pavlides & Michael D. Perlman (August 2009). “How Likely is Simpson's Paradox?”. The American Statistician. 63 (3): 226–233. doi:10.1198/tast.2009.09007. S2CID 17481510.
^ Kock, N. (2015). “How likely is Simpson's paradox in path models?” (PDF). International Journal of e-Collaboration. 11 (1): 1–7.

Bibliography

Leila Schneps and Coralie Colmez, Math on trial. How numbers get used and abused in the courtroom, Basic Books, 2013. ISBN 978-0-465-03292-1. (Sixth chapter: "Math error number 6: Simpson's paradox. The Berkeley sex bias case: discrimination detection").

External links

Cẩn thận với con số phần trăm! (Beware of percentages!) - Nguyễn Văn Tuấn^{[dead link]}. Kinh tế Sài Gòn online. 22 June 2010.
How statistics can be misleading - Mark Liddell—TED-Ed video and lesson.
Stanford Encyclopedia of Philosophy: "Simpson's Paradox" – by Gary Malinas.
Earliest known uses of some of the words of mathematics: S
- For a brief history of the origins of the paradox see the entries "Simpson's Paradox" and "Spurious Correlation"
Pearl, Judea, "The Art and Science of Cause and Effect." A slide show and tutorial lecture.
Pearl, Judea, "Simpson's Paradox: An Anatomy" (PDF)
Simpson's Paradox Visualized - an interactive demonstration of Simpson's paradox.
Short articles by Alexander Bogomolny at cut-the-knot:
- "Mediant Fractions."
- "Simpson's Paradox."
The Wall Street Journal column "The Numbers Guy" for 2 December 2009 dealt with recent instances of Simpson's paradox in the news, notably a Simpson's paradox in the comparison of unemployment rates in the 2009 recession with those of the 1983 recession, by Cari Tuna (substituting for regular columnist Carl Bialik).
