The Mass Media bias: Analysing and comparing the time series of
polls and news articles during the 2016 USA presidential election.
Federico Albanese ([email protected])
Director: Pablo Balenzuela
Codirector: Viktoriya Semeshenko
Departamento de Física, FCEyN-UBA
Objectives
1) Do the mass media influence society?
2) Does negative propaganda have a positive or a negative effect on a candidate?
3) Is there bias in the mass media?
Polls
- 263 polls (an average of 2.7 polls per day)
- Made by: NBC, New York Times, LA Times, CBS, Fox News, Gravis, ABC, IBD (among others)
Figure: poll difference ∆(Clinton - Trump) [%] vs. time [month]
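Since several polls can fall on the same day, the ∆(Clinton - Trump) series needs some per-day aggregation. A minimal Python sketch, assuming each poll is a hypothetical (day, clinton_pct, trump_pct) tuple and that polls on the same day are simply averaged; the slides do not say how the author assigned polls to dates or whether days without polls were interpolated, so `daily_poll_difference` is illustrative only.

```python
from collections import defaultdict
from statistics import mean


def daily_poll_difference(polls):
    """Average the Clinton - Trump margin of all polls assigned to the same day.

    polls: iterable of (day, clinton_pct, trump_pct) tuples (hypothetical schema);
    day can be any sortable key, e.g. a datetime.date or an ISO string.
    Returns {day: mean margin}, ordered by day. Days without polls are absent.
    """
    margins_by_day = defaultdict(list)
    for day, clinton_pct, trump_pct in polls:
        margins_by_day[day].append(clinton_pct - trump_pct)
    return {day: mean(margins) for day, margins in sorted(margins_by_day.items())}
```

Averaging keeps days with many polls from dominating the series; a rolling window over several days would be an alternative smoothing choice.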
Media: New York Times, Fox News, Breitbart
- New York Times: the most consumed and most googled newspaper in the USA [1].
- Fox News and Breitbart: the most Republican media, according to a study made at the University of California, Berkeley (2013) [2].
- Fox News is more conservative, whereas Breitbart was exclusively pro-Trump from the very first day (article by A. J. Delgado, Oct. 22, 2015).
[1] Google Trends in the USA among the most important newspapers.
[2] https://datascience.berkeley.edu/data-media-map-bitly/
First look into the data
Figure: number of mentions per article in the New York Times (panels: Clinton, Trump)
Clinton was mentioned fewer than 5 times in most of the articles. In contrast, Trump was mentioned more than 80 times in some articles.
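Counting mentions per article can be sketched with regular expressions. This assumes a mention is any whole-word occurrence of the surname ("Clinton" or "Trump"); the `PATTERNS` table and both function names are hypothetical, and the slides do not say which surface forms (first names, nicknames) were counted.

```python
import re
from collections import Counter

# Hypothetical mention patterns: whole-word, case-sensitive surnames.
# Note that "Bill Clinton" would also be counted under this rule.
PATTERNS = {
    "Clinton": re.compile(r"\bClinton\b"),
    "Trump": re.compile(r"\bTrump\b"),
}


def mentions_per_article(articles):
    """Return one {candidate: number of mentions} dict per article text."""
    return [{name: len(p.findall(text)) for name, p in PATTERNS.items()}
            for text in articles]


def mention_histogram(counts, candidate):
    """Histogram {mentions per article: number of articles} for one candidate."""
    return Counter(c[candidate] for c in counts)
```

The word boundaries keep "Trump's" as a mention while rejecting words such as "Trumpet".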
Sentiment Analysis
Stanford NLP: the algorithm builds a binary parse tree for each sentence, taking semantic composition into account.
Example sentence: "There are slow and repetitive parts, but it has just enough spice to keep it interesting."
Going from the children (leaves) up to the root, a sentiment value (positive, negative, or neutral) is assigned to each node.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642).
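To make the bottom-up labelling concrete, here is a toy Python sketch. It is not the Stanford model: the lexicon, the additive `compose` rule, and the `Node`/`leaf`/`branch` helpers are hypothetical. The Recursive Neural Tensor Network of Socher et al. (2013) learns its composition function and classifies every node with a trained softmax layer; only the children-before-parent traversal is shared with this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon; the real model learns word vectors instead.
LEXICON = {"slow": -1.0, "repetitive": -1.0, "enough": 0.5,
           "spice": 1.0, "interesting": 1.0}


@dataclass
class Node:
    word: Optional[str] = None        # set for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    score: float = 0.0
    label: str = "neutral"


def leaf(word):
    return Node(word=word)


def branch(left, right):
    return Node(left=left, right=right)


def to_label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def compose(left, right):
    # Toy rule: add the children's scores (the RNTN uses a learned tensor layer).
    return left.score + right.score


def annotate(node):
    """Post-order traversal: label both children before their parent."""
    if node.word is not None:
        node.score = LEXICON.get(node.word.lower(), 0.0)
    else:
        annotate(node.left)
        annotate(node.right)
        node.score = compose(node.left, node.right)
    node.label = to_label(node.score)
    return node
```

For example, `annotate(branch(leaf("just"), branch(leaf("enough"), leaf("spice"))))` labels the leaf "just" neutral and both internal nodes positive.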
Sentiment Analysis: Time Series
Figure: number of phrases about Clinton and Trump (one panel each) vs. date, split into positive, neutral, negative, and total phrases. Marked events: (1) Republican National Convention, (2) First Debate, (3) Election Day.
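The time series in the figure are counts of labelled phrases per date and candidate. A minimal sketch, assuming each phrase has already been reduced to a hypothetical (day, candidate, label) tuple, i.e. that each phrase is attributed to exactly one candidate; the slides do not describe how phrases mentioning both candidates were handled.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def sentiment_time_series(phrases):
    """phrases: iterable of (day, candidate, label) tuples (hypothetical schema).

    Returns {candidate: {day: {"positive": n, "neutral": n, "negative": n,
    "total": n}}}, with days in sorted order.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for day, candidate, label in phrases:
        counts[candidate][day][label] += 1
    series = {}
    for candidate, days in counts.items():
        series[candidate] = {
            day: {**{lab: c[lab] for lab in LABELS}, "total": sum(c.values())}
            for day, c in sorted(days.items())
        }
    return series
```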
Linear Correlation
Linear correlation with a 14-day lag:

| Mentions | Coefficient | p-value | Coefficient | p-value | Coefficient | p-value |
|---|---|---|---|---|---|---|
| Clinton’s positive mentions | 0.485 | 3.43e-6 | -0.213 | 0.05 | 0.060 | 0.590 |
| Clinton’s negative mentions | 0.394 | 2.24e-4 | -0.682 | 1.29e-12 | -0.319 | 0.3 |
| Clinton’s total mentions | 0.453 | 1.70e-5 | -0.616 | 5.54e-10 | -0.174 | 0.116 |
| Trump’s positive mentions | 0.554 | 5.64e-8 | -0.395 | 2.20e-4 | 0.160 | 0.149 |
| Trump’s negative mentions | 0.476 | 5.39e-6 | -0.470 | 7.54e-6 | -0.021 | 0.853 |
| Trump’s total mentions | 0.518 | 5.31e-7 | -0.437 | 3.62e-5 | 0.082 | 0.460 |
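The lagged coefficients above can be sketched with NumPy. This assumes both series are already aligned on a common daily grid and that a 14-day lag compares the media series at day t with the poll difference at day t + 14; the slides do not state the direction of the lag, and `lagged_pearson` is a hypothetical name. For p-values like those in the table, `scipy.stats.pearsonr(x, y)` returns the coefficient together with a two-sided p-value.

```python
import numpy as np


def lagged_pearson(x, y, lag=0):
    """Pearson correlation between x[t] and y[t + lag].

    A positive lag means x (e.g. daily mentions) leads y (e.g. the poll
    difference); a negative lag means y leads x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    if len(x) < 3:
        raise ValueError("lag too large for the series length")
    return float(np.corrcoef(x, y)[0, 1])
```

With `lag=14`, the first 14 days of the poll series and the last 14 days of the media series are dropped, so each lag uses slightly fewer points than the zero-lag correlation.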
- The more phrases the New York Times published, the larger the difference in the polls in Clinton's favor.
- The more phrases Fox News published, the more Trump rose in the polls and the smaller the difference in the polls.
Mutual Information of the symbolized time series
Mutual Information (MI) measures the dependency between two time series. Here X and Y are two random variables, x_i and y_j are their possible values, and n and m are the number of possible values of X and Y, respectively. The value of MI ranges from 0 (no mutual information) to 1 (perfect relation between the variables).
- A permutation test was used to measure the statistical significance of the results [1].
- All the time series were symbolized for this analysis [2].
[1] François, D., Wertz, V., & Verleysen, M. (2006, April). The permutation test for feature selection by mutual information. In ESANN (pp. 239-244).
[2] Bandt, C., & Pompe, B. (2002). Permutation entropy: a natural complexity measure for time series. Physical Review Letters, 88(17), 174102.
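The three steps above (symbolization, MI, permutation test) can be sketched together in Python. Assumptions: Bandt-Pompe ordinal patterns with embedding dimension d = 3 (the dimension used in the talk is not stated), the standard MI, I(X;Y) = Σ_i Σ_j p(x_i, y_j) log[p(x_i, y_j) / (p(x_i) p(y_j))], and a scaling to [0, 1] by dividing by min(H(X), H(Y)), which is one common choice; the slides do not say which normalization was used. All function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def symbolize(series, d=3):
    """Bandt-Pompe symbolization: replace each window of d consecutive values
    by its ordinal pattern (the index permutation that sorts the window)."""
    x = np.asarray(series, dtype=float)
    return [tuple(int(i) for i in np.argsort(x[t:t + d], kind="stable"))
            for t in range(len(x) - d + 1)]


def entropy(symbols):
    """Shannon entropy (nats) of a symbol sequence."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """I(X;Y) in nats, estimated from symbol frequencies."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x,y) / (p(x) p(y)) = c * n / (count(x) * count(y))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def normalized_mi(a, b):
    """MI divided by min(H(X), H(Y)); this normalization is an assumption."""
    h = min(entropy(a), entropy(b))
    return mutual_information(a, b) / h if h > 0 else 0.0


def permutation_test(a, b, n_perm=1000, seed=0):
    """Compare the observed normalized MI with values obtained after shuffling b,
    which destroys any dependency between the two sequences."""
    rng = np.random.default_rng(seed)
    observed = normalized_mi(a, b)
    b = list(b)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [b[i] for i in rng.permutation(len(b))]
        if normalized_mi(a, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because ordinal patterns only depend on the ranks inside each window, any strictly increasing transformation of a series leaves its symbols unchanged, which makes the analysis insensitive to the different scales of mention counts and poll percentages.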
Mutual Information of the symbolized time series
Figure panels: Donald Trump, Hillary Clinton, and the polls of Hillary Clinton
The sentiment of the phrases turned out to be important: it is related to the time series of the polls.
Topic Detection: Dimensionality Reduction
Topic Detection
How could you mathematically represent a document?
- As a vector: V = [ ..., TF(t)*IDF(t), ... ], with dim = # words,
  where $\mathrm{IDF}(t) = \log\left(N / n_t\right)$, N is the number of documents, and n_t is the number of documents in which the word t appears.
Combining the vectors of all the documents, we obtain a matrix M.
Dimensionality reduction: NMF is an algorithm in which the matrix M is factorized into two matrices W and H (M ≈ W*H), with the property that all three matrices have no negative elements.
Advantages:
- The vectors have non-negative components (easy interpretation).
- Orthogonality is not imposed.
Disadvantages:
- The number of topics is an input, not an output, of the algorithm.
Ramos, J. (2003, December). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133-142).
Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 267-273). ACM.
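The TF-IDF matrix and its NMF can be sketched with NumPy alone. Assumptions: whitespace tokenization, TF(t) taken as the count of t divided by the document length (the slides do not define TF), IDF(t) = log(N / n_t) as above, and the Lee-Seung multiplicative updates for the factorization; the author may have used a library implementation instead (for example scikit-learn's `NMF`). `tfidf_matrix` and `nmf` are hypothetical names.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Rows = documents, columns = vocabulary words, entries = TF(t) * IDF(t)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    N = len(docs)
    doc_freq = Counter(w for toks in tokenized for w in set(toks))  # n_t
    M = np.zeros((N, len(vocab)))
    for i, toks in enumerate(tokenized):
        if not toks:
            continue
        for w, c in Counter(toks).items():
            tf = c / len(toks)  # assumed TF definition
            M[i, index[w]] = tf * math.log(N / doc_freq[w])
    return M, vocab


def nmf(M, k, n_iter=500, seed=0, eps=1e-10):
    """Factorize M (non-negative) as W @ H with W, H >= 0 by minimizing
    ||M - W H||_F^2 with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each row of H is a topic (weights over words) and each row of W gives how strongly a document belongs to each topic. A word that appears in every document gets IDF = 0, so it cannot dominate any topic.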
Non-Negative Matrix Factorization (NMF)
Example topics: Economy; Social Issues: Immigration
Topic detection for each media outlet separately
Topics:
- Social issues (immigration and racism)
- Economy
- Week in review
- Clinton’s and Trump’s scandals
- Art
- Foreign affairs

Topics:
- Elections
- Clinton’s email scandal
- Social issues (immigration)
- Economy
- Foreign affairs
- Clinton Foundation scandals

Topics:
- Social issues (racism)
- FBI investigation of Clinton’s emails
- Third party
- Clinton Foundation scandals
- Social issues (immigration)
- Clinton’s email scandal
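Topic lists like these are typically obtained by running the factorization separately on each outlet's articles and reading off the highest-weight words of each row of H, which a person then names ("Economy", "Clinton's email scandal", ...). The slides do not describe the labelling step, so this helper is a hypothetical sketch of that reading-off, assuming an H matrix and vocabulary per outlet.

```python
import numpy as np


def top_words(H, vocab, n=5):
    """For each topic (row of H), return its n highest-weight words."""
    H = np.asarray(H, dtype=float)
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]


def topics_by_outlet(factorizations, n=5):
    """factorizations: {outlet: (H, vocab)}, one NMF per outlet (hypothetical)."""
    return {outlet: top_words(H, vocab, n) for outlet, (H, vocab) in factorizations.items()}
```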
<< [email protected] >>