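[Full Text] Frauds in the Korea 2020 Parliamentary Election - Walter R. Mebane, Jr.

As controversy grows over allegations that early voting in the 21st National Assembly election was manipulated, a document written by an expert regarded as a world authority in the field has been made public.

The document, titled "Frauds in the Korea 2020 Parliamentary Election," was written by Walter R. Mebane, Jr., a professor of political science at the University of Michigan in the United States. This newspaper obtained the document and is publishing it in full in the interest of the public's right to know.

Professor Walter R. Mebane, Jr. has also developed and supports a program for detecting election fraud. He has written numerous reports, up to the present, that accurately detected elements of fraud in elections around the world, and he is counted among the foremost scholars in this field.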

Frauds in the Korea 2020 Parliamentary Election

Walter R. Mebane, Jr.

April 28, 2020

The statistical model implemented in eforensics offers evidence that fraudulent votes occurred in the election that may have changed some election outcomes. The statistical model operationalizes the idea that “frauds” occur when one party gains votes by a combination of manufacturing votes from abstentions and stealing votes from opposing parties.
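The idea of "manufactured" and "stolen" votes can be illustrated with a toy calculation. The sketch below is not the eforensics model (which is a Bayesian model estimated from data); it only shows, under assumed rates, how moving votes from abstentions and from opposing parties changes the observed counts of one unit. The class `UnitCounts`, the function `apply_toy_frauds`, the two rate parameters and the example numbers are all hypothetical, and invalid ballots are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCounts:
    """Counts for one aggregation unit (hypothetical container, not part of eforensics)."""
    eligible: int
    leader: int      # votes for the party assumed to benefit from frauds
    opposition: int  # valid votes for all other parties

    @property
    def abstentions(self) -> int:
        # Simplifying assumption: everyone who did not cast a valid vote abstained.
        return self.eligible - self.leader - self.opposition


def apply_toy_frauds(true: UnitCounts, manufacture_rate: float, steal_rate: float):
    """Shift a share of abstentions and of opposition votes to the leader.

    Both rates are illustrative assumptions, not estimated parameters.
    """
    manufactured = round(true.abstentions * manufacture_rate)  # abstentions turned into votes
    stolen = round(true.opposition * steal_rate)               # opposition votes recounted for leader
    observed = UnitCounts(
        eligible=true.eligible,
        leader=true.leader + manufactured + stolen,
        opposition=true.opposition - stolen,
    )
    return observed, manufactured, stolen


# Made-up unit: 1000 eligible voters, 300 votes each for leader and opposition.
true_counts = UnitCounts(eligible=1000, leader=300, opposition=300)
observed, manufactured, stolen = apply_toy_frauds(true_counts, 0.10, 0.05)
print(observed, manufactured, stolen)
```

In this toy unit, manufactured votes raise turnout and the leader's share together, while stolen votes raise the leader's share without changing turnout.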

The Bayesian specification allows posterior means and credible intervals for counts of “fraudulent” votes to be determined both for the entire election and for observed individual aggregation units. It is important to keep in mind that “frauds” according to the eforensics model may or may not be results of malfeasance and bad actions.

How much estimated “frauds” may be produced by normal political activity, and in particular by strategic behavior, is an open question that is the focus of current research. Statistical findings such as are reported here should be followed up with additional information and further investigation into what happened.

The statistical findings alone cannot stand as definitive evidence about what happened in an election.

Figure 1 shows the distribution of turnout and vote proportions across aggregation units. Each turnout proportion is (Number Valid)/(Number Eligible), and each vote proportion is (Number Voting for Party)/(Number Eligible).

The data include counts for n = 19072 units. 328 “abroad office” observations have zero eligible voters but often a small number of votes (the largest number is 23) and are omitted from the plots.
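The two proportions defined above, and the omission of units with zero eligible voters, can be sketched as follows. The function name `unit_proportions` and the example rows are hypothetical; the paper's own data and code are not reproduced here.

```python
from typing import Optional, Tuple


def unit_proportions(eligible: int, valid: int, party_votes: int) -> Optional[Tuple[float, float]]:
    """Return (turnout proportion, vote proportion) for one aggregation unit.

    Units with no eligible voters have no defined proportions and return None,
    mirroring the omission of the "abroad office" observations from the plots.
    """
    if eligible <= 0:
        return None
    return valid / eligible, party_votes / eligible


# Made-up rows: (Number Eligible, Number Valid, Number Voting for Party)
units = [(1000, 600, 300), (0, 23, 10), (500, 400, 390)]
plotted = [p for p in (unit_proportions(*u) for u in units) if p is not None]
print(plotted)
```

Because both proportions share Number Eligible as the denominator, the vote proportion can never exceed the turnout proportion for any unit.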

Figure 1(a) uses vote proportions defined based on Democratic Party votes, and Figure 1(b) uses vote proportions defined based on the votes received by the party with the most votes in each constituency.

Fraud allegations have focused on the Democratic Party, but a principled way to analyze the single-member district election data is to consider that frauds potentially benefited the leading candidate in each constituency.

In the figure, differences between the two distributions are apparent, but both share a distinctive multimodal pattern. There appear to be clusters of observations that share distinctive levels of turnout and votes, some with low, medium, high and very high turnout.

The diagonal edge feature in the plots results from using Number Eligible as the denominator for both proportions: when the party receives nearly all the valid votes, then the observation is near that diagonal.
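The diagonal edge can be written out in a line of algebra. The symbols E, N, V, t and v are shorthand introduced here for illustration and do not appear in the paper.

```latex
% E = Number Eligible, N = Number Valid, V = Number Voting for Party
t = \frac{N}{E}, \qquad v = \frac{V}{E}, \qquad 0 \le V \le N
\;\Longrightarrow\; 0 \le v \le t, \qquad v = t \iff V = N .
```

Every unit therefore lies on or to one side of the line v = t, and a unit sits on that line only when the party received all of the valid votes.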

Figure 1: Korea 2020 Parliamentary Election Data Plots

(a) Democratic Party (b) Constituency leaders

Figures 2 and 3 show that the different clusters in Figure 1 correspond with observations that are administratively distinctive.

Figure 2 displays data for Democratic Party votes, and Figure 3 shows data for constituency leader votes. The four sets of units that have distinctive distributions are district-level, election-day units that are not abroad (Figures 2(a) and 3(a)), postal, election-day units (Figures 2(b) and 3(b)), abroad units (Figures 2(c) and 3(c)) and pre-vote units (Figures 2(d) and 3(d)). Each subset of units (a), (b) and (d) has a mostly unimodal distribution: the marginal histograms are mostly near symmetric. But exceptional points are evident in each of these subsets. Abroad units are more distinctively bimodal when constituency leaders are considered than when the Democratic Party is in focus.

Figure 2: Korea 2020 Parliamentary Election Data Plots, Democratic Party

Note: plots show turnout (number voting/number eligible) and vote proportions (number voting for Democratic Party/number eligible) for four subsets of observations: (a) district-level, election-day, not abroad; (b) postal election-day; (c) abroad; (d) pre-vote. Plots show scatterplots with estimated bivariate densities overlaid, with histograms along the axes. 328 “abroad office” observations reported with zero eligible voters but often with a positive number of votes are omitted.

Figure 3: Korea 2020 Parliamentary Election Data Plots, Constituency Leaders

Note: plots show turnout (number voting/number eligible) and vote proportions (number voting for constituency-leading party/number eligible) for four subsets of observations: (a) district-level, election-day, not abroad; (b) postal election-day; (c) abroad; (d) pre-vote. Plots show scatterplots with estimated bivariate densities overlaid, with histograms along the axes. 328 “abroad office” observations reported with zero eligible voters but often with a positive number of votes are omitted.

I estimate the eforensics model separately for the two definitions of leading party votes. Covariates for turnout and vote choice include indicators for pre-vote, postal, abroad and disabled-ship status and fixed effects for the 252 constituencies included in the data.

The two specifications agree that 418 aggregation units are fraudulent, but 869 additional units are fraudulent in the Democratic Party specification and 745 additional units are fraudulent in the constituency-leading party specification. As Table 1 shows, key parameter estimates are similar in the models. Parameters for the probabilities of frauds (π1, π2, π3) are about the same between specifications, and coefficients for the turnout equation (τ1–τ5) are similar. Coefficients for vote choice (β1–β4) differ, reflecting the differences in vote proportions being modeled.

Figure 4 uses plots by subset of Democratic Party focused observations to illustrate which observations are fraudulent according to the eforensics model with the Democratic Party focused specification. Nonfraudulent observations are plotted in blue and fraudulent observations appear in red. The frequencies of fraudulent and not fraudulent units appear in the note at the bottom of the figure. Visually and by the numbers, frauds occur most frequently for pre-vote units (43.1% are fraudulent), next most frequently for district-level, election-day, not abroad units (3.14% fraudulent), then next most frequently for postal election-day units (.925% are fraudulent). None of the abroad units are fraudulent.

Figure 5 uses plots by subset of constituency-leader focused observations to illustrate which observations are fraudulent according to the eforensics model with the constituency-leader focused specification. Nonfraudulent observations are plotted in blue and fraudulent observations appear in red. The frequencies of fraudulent and not fraudulent units appear in the note at the bottom of the figure. Visually and by the numbers, frauds occur most frequently for pre-vote units (22.6% are fraudulent), next most frequently for postal election-day units (2.09% are fraudulent), then next most frequently for district-level, election-day, not abroad units (.920% fraudulent). None of the abroad units are fraudulent.

Table 1: Korea 2020 Parliamentary eforensics Estimates

Figure 4: Korea 2020 Fraud Plots, Democratic Party

Note: plots show turnout (number voting/number eligible) and vote proportions (number voting for Democratic Party/number eligible) for four subsets of observations: (a) district-level, election-day, not abroad (10 fraudulent, 318 not); (b) postal election-day (131 fraudulent, 14155 not); (c) abroad (0 fraudulent, 328 not); (d) pre-vote (1146 fraudulent, 2656 not). Plots show scatterplots with nonfraudulent observations in blue and fraudulent observations in red. 328 “abroad office” observations reported with zero eligible voters but often with a positive number of votes are omitted.

Figure 5: Korea 2020 Fraud Plots, Constituency Leaders

Note: plots show turnout (number voting/number eligible) and vote proportions (number voting for constituency-leading party/number eligible) for four subsets of observations: (a) district-level, election-day, not abroad (5 fraudulent, 323 not); (b) postal election-day (298 fraudulent, 13988 not); (c) abroad (0 fraudulent, 328 not); (d) pre-vote (860 fraudulent, 2942 not). Plots show scatterplots with nonfraudulent observations in blue and fraudulent observations in red. 328 “abroad office” observations reported with zero eligible voters but often with a positive number of votes are omitted.
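The unit counts in the notes to Figures 4 and 5 can be tabulated to see how the quoted percentages relate to them. The sketch below uses only the counts printed in those notes. The dictionary names and the two helper functions are hypothetical, and it is an assumption of this sketch that either the share of all units or the fraudulent-to-nonfraudulent ratio is what the percentages in the text represent.

```python
# (fraudulent, not fraudulent) unit counts copied from the notes to Figures 4 and 5
FIG4_DEMOCRATIC = {
    "district-level, election-day, not abroad": (10, 318),
    "postal election-day": (131, 14155),
    "abroad": (0, 328),
    "pre-vote": (1146, 2656),
}
FIG5_LEADERS = {
    "district-level, election-day, not abroad": (5, 323),
    "postal election-day": (298, 13988),
    "abroad": (0, 328),
    "pre-vote": (860, 2942),
}


def share_of_units(fraud: int, not_fraud: int) -> float:
    """Fraudulent units as a fraction of all units in the subset."""
    return fraud / (fraud + not_fraud)


def ratio_to_nonfraudulent(fraud: int, not_fraud: int) -> float:
    """Fraudulent units relative to nonfraudulent units in the subset."""
    return fraud / not_fraud


for label, table in (("Democratic Party", FIG4_DEMOCRATIC), ("Constituency leaders", FIG5_LEADERS)):
    total_fraud = sum(f for f, _ in table.values())
    print(f"{label}: {total_fraud} fraudulent units in total")
    for subset, (f, n) in table.items():
        print(f"  {subset}: share {100 * share_of_units(f, n):.3g}%, "
              f"ratio {100 * ratio_to_nonfraudulent(f, n):.3g}%")
```

The totals of 1287 and 1163 fraudulent units agree with 418 + 869 and 418 + 745 in the text. On these counts, the Democratic Party percentages in the text (43.1%, 3.14%, .925%) match the ratio to nonfraudulent units, while the pre-vote and postal percentages for constituency leaders (22.6%, 2.09%) match the share of all units; the .920% figure for district-level units is not reproduced by either calculation.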

I use a counterfactual method to calculate how many votes are fraudulent. Table 2 reports the observed counts of eligible voters, valid votes and votes for the (a) Democratic Party and (b) constituency-leading party totaled over all units in the analysis, along with fraudulent vote count totals. The total of “manufactured” votes is reported separately from the total number of fraudulent votes: manufactured votes are votes that the model estimates should have been abstentions but instead were observed as votes for the leading party.

Both posterior means and 95% and 99.5% credible intervals are reported. The results show that for the Democratic Party focused specification overall about 1,491,548 votes are fraudulent, and of the fraudulent votes about 1,122,169 are manufactured (the remaining 369,379 are stolen, that is, counted for the leading party when they should have been counted for a different party).

Overall, according to the eforensics model, about 10.43% of the votes for the Democratic Party candidates are fraudulent. The results show that for the constituency-leading focused specification overall about 1,171,734 votes are fraudulent, and of the fraudulent votes about 910,444 are manufactured (the remaining 261,290 are stolen, that is, counted for the leading party when they should have been counted for a different party). Overall, according to the eforensics model, about 7.26% of the votes for the constituency-leading candidates are fraudulent.
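The split of fraudulent votes into manufactured and stolen votes is simple arithmetic on the posterior means quoted above. The sketch below only re-derives the stolen counts and the manufactured share from those quoted totals; the function and variable names are hypothetical, and the credible intervals in Table 2 are not used.

```python
def stolen_votes(total_fraudulent: int, manufactured: int) -> int:
    """Stolen votes are the fraudulent votes that were not manufactured from abstentions."""
    return total_fraudulent - manufactured


# Posterior means quoted in the text: (total fraudulent, manufactured)
REPORTED = {
    "Democratic Party": (1_491_548, 1_122_169),
    "Constituency leaders": (1_171_734, 910_444),
}

for label, (total, manufactured) in REPORTED.items():
    stolen = stolen_votes(total, manufactured)
    share = 100 * manufactured / total
    print(f"{label}: {stolen:,} stolen; {share:.1f}% of fraudulent votes are manufactured")
```

In both specifications roughly three quarters of the estimated fraudulent votes are manufactured rather than stolen.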

Fraudulent vote occurrence varies over constituencies.

Counts of frauds by aggregation unit appear in a supplemental file, but I use the unit-specific fraudulent vote counts from the constituency-leader focused specification to assess whether the number of fraudulent votes is ever large enough apparently to change the winner of a constituency contest. For 236 constituencies it is not, but for 16 constituencies the number of fraudulent votes is large enough apparently to change the winner of the constituency contest. In 9 instances the apparently fraudulently winning party is the Democratic Party, in 6 instances it is the United Future Party and in the remaining instance it is an Independent candidate.
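The text does not spell out the exact rule used to decide whether fraudulent votes are "large enough apparently to change the winner". One simple reading, stated here purely as an assumption, is that the outcome could change if the winner would fall behind the runner-up once the estimated fraudulent votes are removed from the winner's total. The function name and the example vote counts below are hypothetical; only the tallies 236, 16, 9, 6 and 1 come from the text.

```python
def fraud_could_change_winner(leader_votes: int, runner_up_votes: int, fraudulent_votes: int) -> bool:
    """Assumed rule: the winner loses the lead once fraudulent votes are subtracted.

    A fuller counterfactual would also return stolen votes to the parties
    they were taken from; that refinement is omitted in this sketch.
    """
    return leader_votes - fraudulent_votes < runner_up_votes


# Made-up constituencies, not taken from the data.
print(fraud_could_change_winner(50_000, 48_500, 2_000))  # margin 1,500 < 2,000 fraudulent
print(fraud_could_change_winner(60_000, 40_000, 2_000))  # margin 20,000 > 2,000 fraudulent

# Tallies reported in the text are internally consistent.
changed, unchanged = 16, 236
assert changed + unchanged == 252   # constituencies in the data
assert 9 + 6 + 1 == changed         # Democratic, United Future, Independent
```

Under this reading, only constituencies whose winning margin is smaller than the estimated fraudulent vote count for the winner would be flagged.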

Given two specifications, which one is better?

Probably neither model is correct, strictly speaking, even beyond the generality that no model is ever correct, but some are useful. If frauds only ever benefit the Democratic Party, then those frauds may have induced apparent frauds when we constrain frauds to benefit only constituency-leading candidates, because many of these do not affiliate with the Democratic Party.

Table 2: Korea 2020 eforensics Estimated Fraudulent Vote Counts

Similarly, if only constituency-leading candidates benefit from frauds, then eforensics may be producing misleading results when we constrain frauds to benefit only the Democratic Party. Or perhaps other candidates, or several in each constituency, benefit from frauds and both specifications are producing misleading results. Possibly, of course, there are no frauds and something else is going on.

Caveats are many. The most basic caution is to keep in mind that “frauds” according to the eforensics model may or may not be results of malfeasance and bad actions.

If some normal political situation makes the apparently fraudulent aggregation units appear fraudulent to the eforensics model and estimation procedure, then the frauds estimates may be signaling that “frauds” occur where in fact something else is happening. In particular there may be something benign that leads many of the pre-vote units to have a turnout and vote choice distribution that differs so much especially from the distribution for election-day postal units, the latter comprising the bulk of the data.

Likewise something benign may distinguish the election-day postal units that the eforensics model identifies as fraudulent.

Beyond that general caution, there may be something about the particular data used for the analysis that triggers the “fraud” findings: for instance, the data appear to be missing about 100,000 votes and one entire constituency, and the vote totals in the data for constituency-leading candidates do not always match totals reported in “lists of winners.”

And there may be something about the model specification that should be improved that would produce different results.

Statistical findings such as are reported here should be followed up with additional information and further investigation into what happened. The statistical findings alone cannot stand as definitive evidence about what happened in the election.

--------------------------------------------------
References
Ferrari, Diogo, Kevin McAlister and Walter R. Mebane, Jr. 2018. “Developments in Positive Empirical Models of Election Frauds: Dimensions and Decisions.” Presented at the 2018 Summer Meeting of the Political Methodology Society, Provo, UT, July 16–18. (End of document)

Software Available for Downloading, with Documentation

Election Forensics R Package (eforensics tarball) and (eforensics GitHub). Diogo Ferrari, Kevin McAlister, Walter Mebane and Patrick Wu, 2019.

Robust Estimation Software (multinomRob). Walter R. Mebane, Jr., and Jasjeet S. Sekhon, 2003.

Genetic Optimization Using Derivatives for R (RGENOUD). Walter R. Mebane, Jr., and Jasjeet S. Sekhon, 2001. (The ancestral GENOUD C program from 1997 is here.)

Genetic Optimization and Bootstrapping of Linear Structures (GENBLIS). Walter R. Mebane, Jr., and Jasjeet S. Sekhon, 1998.

Papers Available for Downloading

Walter R. Mebane, Jr. 2020. “Frauds in the Korea 2020 Parliamentary Election.”

Walter R. Mebane, Jr. 2019. “Evidence Against Fraudulent Votes Being Decisive in the Bolivia 2019 Election.”

Walter R. Mebane, Jr. 2019. “eforensics: A Bayesian Implementation of A Positive Empirical Model of Election Frauds.”

Patrick Y. Wu, Walter R. Mebane, Jr., Logan Woods, Joseph Klaver, and Preston Due. 2019. “Partisan Associations of Twitter Users Based on Their Self-descriptions and Word Embeddings.” Prepared for presentation at the 2018 Annual Meeting of the American Political Science Association, Washington, DC, Aug 29–Sep 1. And many others.
