Fake news: An exploratory dive into ways to identify misinformation in a network

Fake news is a false piece of information purposely created to deceive. The effect on a single individual can be devastating; worse, modern trends show that actors deploy fake news at scale to influence whole groups of users. Fake news is difficult to detect by manual inspection, since some of it takes the form of half-truths: claims that contain some truth but have been deceptively manipulated to fool the reader. The credibility of fake news varies from outright insanity to plausibly real. A crafty attacker will stick to creating news that has the semblance of reality, as it is likely to yield the greatest utility. A typical piece of fake news can take the form of innuendo, rumour, or gossip, among others. It is critical to note that unpleasant news that is true cannot be categorized as fake news.

In this blog post, we will work to understand fake news from both perspectives (attacker, victim). As part of setting the stage for this research, we will pretend to be the attacker and simulate their reasoning. Fake news is a subjective term that means different things to different people; a political operative, for example, sees fake news as anything that makes them lose an election. We will stay apolitical for the remainder of this blog post. A few motivations for creating fake news include comic and malicious intent. It can take the form of a misinformation campaign, news with doctored artifacts, half-truths, or extreme sensationalism, among others. Fake news can also be weaponized to serve geopolitical interests: a classic example comes from the Cold War, when the US used the pretext of deep-sea mining to hunt for a lost Soviet submarine [[1]]().

The explosive growth of the Internet has produced a sizable population that socializes, trades, studies, and gets entertained on the World Wide Web. Things have changed in modern times: anyone can be a citizen journalist by virtue of social media, and the World Wide Web has become harder to police. The potential reach of the Internet has made it an attractive avenue for special interest groups and government-sanctioned actors [[2]]() to deceive the public by distributing misinformation in a viral spread. The problem is exacerbated by the ease of creating and disseminating news on social media. Not all fake news is created with malignant intent; there is also false news made for entertainment purposes [[3]]().

The detection of fake news is an interesting research topic, as there are significant economic and political benefits. Experience has shown that the impact of fake news can be devastating: it can topple businesses and government officials, and false information can cause psychological and emotional distress to its audience. The line is further blurred by news organizations that disseminate news with either a conservative or liberal bias.

In no specific order, this blog post will attempt to answer the following questions as part of this research:

- What are the motives of the creators of fake news?
- How can the authors of fake news optimize their rewards?
- What is the cost of staging an attack?
- How can fake news be detected?
- How does fake news spread on a network?

There is an old saying: "If you want to catch a thief, then you have to think like one."
We would like to follow the reasoning of the attacker. There are benefits to disseminating fake news, and as such it is sensible to optimize the reward of an attack. Unfortunately, the future is bright for bad actors: advances in machine learning models, driven by the wide availability of research papers and open-source packages, have enabled the realistic generation of false information. Furthermore, the adversary can dedicate few computing resources to the task, because the cost of staging a successful attack has fallen.

### Content vs Context

In addition to textual content, raw news can include video, audio, and images. The news content is converted into a vector representation that can be fed into a machine learning model. Extra metadata can be derived (spam, sentiment, subjectivity, stance) to provide more attributes for feature engineering. We must guard against error propagation, in which a poor prediction in the derived metadata degrades the performance of the downstream fake news classifier. Any news classifier focused solely on textual content is likely to be fooled, especially if the system is gamed.

It is desirable to incorporate the context of the news in the modelling stage. The context is the circumstances around the news: the amount of shock or amazement generated by a particular news item can be influenced by, or conditioned on, prior related news. Both content and context have to be incorporated to produce a robust classifier. For a fake news classifier to be useful, it must consider context such as geographical information, metadata, the reputation of the news organization & journalist, and the temporal nature of the news. The location where news originates can affect how the facts are embellished. Metaphors can disguise a piece of news: such information can look like misinformation to people outside the cultural context, while the targeted group understands the message. This must also be taken into account when modelling; otherwise, the model will perform poorly. Spatio-temporal modelling categorizes news by geographical location with time-sensitive properties, as the relevance of a news item decays with time. In this way, we can create time series data by aggregating news items into a suitable time frame and geographical location for further analysis.

In this blog post, we focus on the spread of news over a network, because inaccurate information in traditional news outlets is easier to fix by legislation. A fake news item in isolation does not spread: conventional wisdom tells us that your information remains unknown to the public as long as you do not share it with a trusted friend and there is no side-channel attack. There are existing works on graph analytics that discuss concepts such as PageRank, eigenvector centrality, and Laplacian centrality in the context of a graph, as shown in this [paper](https://www.researchgate.net/profile/Muhammad_Rezaul_Karim/publication/263314701_Robust_Features_for_Detecting_Evasive_Spammers_in_Twitter/links/552208710cf2f9c130529b10.pdf); we did not investigate this path further in this blog post. Contextual information can incorporate the actions of an actor using ideas from game theory. A minimal sketch combining content and context features is shown below.
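To make the content-plus-context idea concrete, here is a minimal sketch of a feature pipeline that concatenates TF-IDF content features with context metadata. The dataset, column semantics, and numbers are all invented for illustration; this is a sketch, not a production classifier:

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical news items: raw text plus context metadata.
texts = ["Miracle cure discovered, doctors stunned",
         "Central bank raises interest rates by 25 basis points"]
# Context features (illustrative): [source reputation in 0..1, hours since related event].
context = np.array([[0.1, 2.0],
                    [0.9, 6.0]])
labels = [1, 0]  # 1 = fake, 0 = legitimate (toy labels)

# Content features: bag-of-words weighted by TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_content = vectorizer.fit_transform(texts)

# Concatenate content and context into a single feature matrix.
X = hstack([X_content, csr_matrix(context)])

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```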
The process of staging an effective attack requires understanding the problem of "when to post": it is in an attacker's best interests to figure out the most appropriate time to send fake news onto the network to inflict the greatest devastation. The naive solution is to post at the time when most social media users are online. However, if the world were that simple, we would not need to do research. The world spans multiple time zones, so when it is day somewhere, it is night elsewhere. Another approach would be to identify nodes in the network that are articulation points and trick those users into retweeting the post. The position of a tweet in the news feed also affects how likely it is to spread. There is a rigorous [paper](https://arxiv.org/abs/1610.05773) on the subject; more information is provided in the upcoming section on Bayesian modelling.

### Detecting Fake News

A model in the wild is only useful if its evaluation incorporates sufficient error analysis: error bars, confidence levels, and uncertainty quantification. My emphasis is on a generic classifier that works for every kind of fake news; this does not rule out the existence of specialized detectors that work well for particular classes of fake news. Read more about the limitations of current AI systems and the definition of [intelligence](https://arxiv.org/pdf/1911.01547.pdf). The likelihood of gaming a fake news classifier is high, so defences against adversarial attacks must be incorporated. Unbalanced data sets will always pose a problem, which would necessitate future advances in one-shot learning in NLP. That is a long way from reality, as current models are data-hungry. There is still hope, as transfer learning and distillation techniques can be carefully applied to mitigate some problems with the data. In this blog post, however, we make use of Bayesian modelling for content analysis and the SIR model for relationship (context) analysis. The justification for Bayesian modelling is discussed in the next section.

We present two mathematical frameworks:

- Bayesian modelling (for content analysis)
- SIR (for context analysis)

##### Bayesian Modelling

Bayesian statistics is a probabilistic framework that allows for adjusting the degree of belief as more data becomes available. At the beginning of the analysis, one specifies a prior: one's belief about the underlying phenomenon being modelled. This prior is updated as more information becomes available. The choice of prior can influence the analysis, which has led to discussion about what constitutes a suitable prior; some schools of thought categorize priors as informative or uninformative based on the influence they have on the results of the analysis. For a discussion of the subject, read this [tutorial](http://www.scholarpedia.org/article/Bayesian_statistics) and [blog](https://sempwn.github.io/blog/).

The Bayesian approach to modelling the utility of fake news in this blog post was inspired by Evan Miller's work on deriving the Reddit formula [[4]]() and inferring tweet quality [[5]]() using the expected utility hypothesis [[6]](). Our formulation is similar to that work, but we add the modelling of fake news, which was missing in the original formulation [[4, 5]](). If we knew for certain that a news item was fake, there would be no problem left to solve and no need to write another blog post.
This work relies on a quantifiable like/dislike metric, in the form of upvotes and downvotes, which is a requirement for news to become viral; as such, it should be included in our model of the spread of fake news in social networks. When using the expected utility theorem [[6]](), we have to decide on the values of the utility vector, and human experience can help us here. As a consequence, we have manually assigned arbitrary values for the utility of several cases and reasonably excluded a few that are unlikely to be encountered in the wild. The ideal approach would be to use counterfactual analysis to estimate the utility vector, but it is imperative to understand the blurred line between ethical and legal conduct: deliberately feeding people fake news as part of an experiment is problematic, and it would become a public relations issue for the firm if it became public. The goal of our analysis is to obtain a closed-form solution to the problem, as it is easier to implement.

The arrival of fake news will be captured by modelling it as a Poisson process. Let us set the stage by describing the Poisson distribution. The Poisson distribution is ideal for modelling the occurrence of rare events in a given population. It is suitable when the average time between events is observable and constant, but the actual timing between events is stochastic, and each event is independent of past events. Fake news occurs randomly in the news stream, so the Poisson distribution fits our problem. For more information on the Poisson distribution, see this [blog](https://towardsdatascience.com/the-poisson-distribution-and-poisson-process-explained-4e2cb17d459).

The expected utility of a news item can be used for ranking news items, as we can penalize fake news in a list of available news. Visitors reload the page every $s$ seconds on average:

\begin{equation} \lambda = \frac{1}{s} \end{equation}

Users reload their page either implicitly (by default in the web browser) or explicitly. $\lambda$ is the reload rate, the reciprocal of the average time between reloads. We can now see why the Poisson distribution is a reasonable distribution to use here. In a social network, users can like stories.

The list of events:

+ $P$: the news is fake
+ $Q$: the news is liked
+ $R$: the news is new (recent)

The probability of each event:

+ $p$: probability a story is fake
+ $q$: probability a story is liked
+ $r$: probability a story is new

The table of possibilities can be seen below.

| $P$ | $Q$ | $R$ | Probability | Payoff |
|----|----|-----|-------------|-------:|
|T|T|T|$pqr$|a|
|T|T|F|$pq(1-r)$|b|
|T|F|T|$p(1-q)r$|c|
|T|F|F|$p(1-q)(1-r)$|d|
|F|T|T|$(1-p)qr$|e|
|F|T|F|$(1-p)q(1-r)$|f|
|F|F|T|$(1-p)(1-q)r$|g|
|F|F|F|$(1-p)(1-q)(1-r)$|h|

Table 1: list of possibilities

It is common practice to ignore the cases that are unlikely to happen by setting their payoff to 0.

\begin{equation} u(p, q, r) = pqr \times a + pq(1-r) \times b + p(1-q)r \times c + p(1-q)(1-r) \times d + (1-p)qr \times e + (1-p)q(1-r) \times f + (1-p)(1-q)r \times g + (1-p)(1-q)(1-r)\times h \end{equation}

where $u(p, q, r)$ is the expected utility.

\begin{equation} E[q] = \frac{U+1}{U+D+2} \end{equation}

\begin{equation} E[1-q] = \frac{D+1}{U+D+2} \end{equation}

where $U$, $D$, $E[q]$, and $E[1-q]$ are the number of upvotes, the number of downvotes, the proportion of upvotes, and the proportion of downvotes of the story, respectively. The $+1$ and $+2$ come from the mean of a beta distribution (a uniform $\mathrm{Beta}(1,1)$ prior updated with $U$ upvotes and $D$ downvotes).
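The expected utility in Table 1 is simply a probability-weighted sum over the eight outcomes. Here is a minimal sketch, with toy vote counts and the payoff assignment used later in the derivation (all numbers are illustrative):

```python
from itertools import product

def expected_utility(p, q, r, payoffs):
    """Expected utility over the eight (P, Q, R) outcomes in Table 1.
    `payoffs` maps a truth assignment such as (True, True, False)
    to its payoff (the a..h column of the table)."""
    total = 0.0
    for outcome in product([True, False], repeat=3):
        prob = 1.0
        for happened, prob_true in zip(outcome, (p, q, r)):
            prob *= prob_true if happened else (1.0 - prob_true)
        total += prob * payoffs[outcome]
    return total

# Posterior estimates of the like probability from a Beta(1, 1) prior:
U, D = 12, 3                     # observed upvotes and downvotes (toy values)
q = (U + 1) / (U + D + 2)        # E[q], proportion of upvotes
print(round(q, 3), round(1 - q, 3))

# Payoffs as chosen later in the post: a..f = 1, g = h = 0
# (g and h are the two outcomes where P = F and Q = F).
payoffs = {outcome: 0.0 if outcome[:2] == (False, False) else 1.0
           for outcome in product([True, False], repeat=3)}
print(expected_utility(p=0.2, q=q, r=0.5, payoffs=payoffs))
```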
\begin{equation} r = e^{-\lambda s} \end{equation}

\begin{equation} p = \frac{\lambda^{k}e^{-\lambda}}{k!} \end{equation}

where $r$ is the probability that at least $s$ seconds have passed since the last reload, and $p$ is the Poisson probability that $k$ fake stories have arrived. The value of $k$ can be determined empirically and is application-specific.

Let us rewrite $u(p, q, r)$ in terms of $U$ and $D$. In our formulation, we chose $a=1$, $b=1$, $c=1$, $d=1$, $e=1$, $f=1$, $g=0$, and $h=0$.

\begin{equation} u(p, q, r) = p(qr \times a + q(1-r) \times b + (1-q)r \times c + (1-q)(1-r) \times d) + (1-p)(qr \times e + q(1-r) \times f) \end{equation}

Substituting the values of the probabilities:

\begin{equation} u(p, q, r) = \left(\frac{\lambda^{k}e^{-\lambda}}{k!}\right) \left( \frac{U+1}{U+D+2} e^{-\lambda s} + \frac{U+1}{U+D+2} \left(1-e^{-\lambda s}\right) + \frac{D+1}{U+D+2} e^{-\lambda s} + \frac{D+1}{U+D+2} \left(1-e^{-\lambda s}\right) \right) + \left(1 - \frac{\lambda^{k}e^{-\lambda}}{k!}\right) \left(\frac{U+1}{U+D+2}\right) \end{equation}

The equation reduces to the product of the Poisson probability and the proportion of downvotes: the model scores a story by the probability that at least $k$ stories are fake, weighted by the story's downvote ratio. The full details of the proof can be found [here](/static/attachment/fake_news_proof.pdf).

\begin{equation} u(p, q, r) = \frac{\lambda^{k}e^{-\lambda}}{k!} \frac{D+1}{U+D+2} \end{equation}

The expected utility, $u(p, q, r)$, provides a scoring function for ordering news in a list. A higher score indicates a higher chance that the news is fake, while a lower score indicates a lower chance. This is only useful if the nodes in the network are acting in good faith, without compromise. Given how we decided on the utilities of some cases, we ended up with a biased closed-form expression for the expected utility: our derived measure considers fake news to be anything that users have downvoted. Experience shows that this is not the case in the real world, as fake news is designed to fool readers. However, this issue in our derived formulation can be remedied by giving more weight to the more probable cases; this is not a challenging proposition, and as such it is left as an exercise for the reader.

The Bayesian framework has been used to model the most appropriate time to make a post on social media for maximum impact. Bad actors can infer the best time to post fake news so that it has the greatest effect on their followers, using ideas similar to those discussed in this [paper](https://arxiv.org/abs/1610.05773).
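The closed-form score is straightforward to implement. Below is a minimal sketch that scores and ranks a toy feed; the vote counts, $\lambda$, and $k$ are invented values, not measurements:

```python
import math

def fake_news_score(upvotes, downvotes, lam, k):
    """Closed-form expected utility from the derivation above:
    Poisson probability of k fake stories times the downvote proportion."""
    poisson_k = (lam ** k) * math.exp(-lam) / math.factorial(k)
    downvote_share = (downvotes + 1) / (upvotes + downvotes + 2)
    return poisson_k * downvote_share

# Rank a toy feed: a higher score means more likely fake, so we sort
# ascending to push suspected fake items to the bottom of the list.
lam, k = 1 / 30.0, 1             # reload every 30 s on average; k chosen empirically
feed = {"story_a": (120, 4), "story_b": (10, 60), "story_c": (45, 45)}
ranked = sorted(feed, key=lambda s: fake_news_score(*feed[s], lam, k))
print(ranked)                    # least suspicious first
```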
##### Spread of fake news

Social influence can be estimated using several techniques, including experimental generalization, agent-based modelling, and compartmental modelling [[7]](). Experimental generalization makes sense of the dynamics of spreading behaviour by finding patterns that match historical data, but it provides no individual-level insight. Agent-based modelling builds individual agents and captures the dynamics of spreading behaviour by observing the aggregate behaviour of interacting agents, but it is difficult to fit its parameters properly to the data [[7]](). Compartmental modelling categorizes individuals into several states, with transitions between states occurring with given probabilities. In contrast to agent-based models [[7]](), this provides more control over the dynamics. In this blog post we will use SIR modelling, a form of compartmental model.

The relationship between entities can be more informative than the content of the entities, and this has motivated our research into the spread of news on social networks. The edges (links) and nodes of the graph can encode relation-specific information, including the strength of a link between nodes as a measure of the propensity of a node to influence its neighbouring nodes. Not all nodes are created equal in a typical graph; some will be more prominent than others. In our formulation, a node in the graph represents a user. Users differ in their degrees of influence on their peers; in social science circles, this is called peer pressure. In social network analysis, the PageRank algorithm is used to measure the influence of nodes in the graph, and nodes with higher influence on their peers are called influencers.

In the original formulation [[4, 5]](), the author estimated the number of people who saw a retweet. We re-purpose the existing formulation to suit our use case and provide a closed-form solution to the spread of fake news:

- $u$: fraction of users who reshare a fake news item, between 0 and 1 (0 means nobody, 1 means everybody).
- $n_d$: number of users at distance $d$.
- $q_d(t)$: fraction of users at distance $d$ who saw the initial (fake) news.
- $R_d(t)$: number of users at distance $d$ who reshared the fake news within time $t$.

\begin{equation} R_d(t) = n_d q_d(t)u \end{equation}

\begin{equation} q_d(t) = u^{d-1}\left(1-\sum_{n=0}^{d-1} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\right) \end{equation}

\begin{equation} R(t) = \sum_d R_d(t) \end{equation}

These equations model the flow of fake news through the network by depth (hops), with the time taken from the source to the current node in the graph. We obtain a function for the number of users who reshared fake news at depth $d$ (a numerical sketch of these equations appears at the end of this subsection). The goal is to estimate the spread of news given that we know it is fake. We begin by representing the problem as a graph of relationships between users on the network, essentially capturing the followers and followings of each user in context. A node is a user and an edge is a connection between a user and its neighbours; this is typically a directed graph, to capture followers and followings appropriately.

SIR is a well-known method for modelling the spread of infectious diseases. We adapt it [[8]]() to capture the dynamics of the spread of fake news [[9]](). Traditional SIR has limitations in this setting, including the assumption that people who read fabricated news recover after a set time; at the time of reading, the reader does not even know whether the news is fake.

+ $S$: number of susceptible people
+ $I$: number of infected people
+ $R$: number of recovered people (some may have immunity)
+ $N$: total population
+ $\beta$: infection rate
+ $v$: recovery rate

\begin{equation} S + I + R = N \end{equation}

\begin{equation} \frac{dS}{dt} = \frac{-\beta IS}{N} \end{equation}

\begin{equation} \frac{dI}{dt} = \frac{\beta IS}{N} - vI \end{equation}

\begin{equation} \frac{dR}{dt} = vI \end{equation}

We get a time series of occurrences of the news at aggregated time intervals. The next step is to estimate the equation of the curve, $I(t)$.
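Here is the promised numerical sketch of the reshare-cascade equations. The depth counts, arrival rate, and reshare fraction are invented toy values, not estimates from real data:

```python
import math

def q_d(d, t, u, lam):
    """Fraction of users at distance d who saw the fake item by time t
    (Poisson arrivals at rate lam, reshare probability u per hop)."""
    poisson_cdf = sum((lam * t) ** n * math.exp(-lam * t) / math.factorial(n)
                      for n in range(d))          # n = 0 .. d-1
    return (u ** (d - 1)) * (1.0 - poisson_cdf)

def total_reshares(t, n_by_depth, u, lam):
    """R(t) = sum over depths d of R_d(t) = n_d * q_d(t) * u."""
    return sum(n_d * q_d(d, t, u, lam) * u
               for d, n_d in n_by_depth.items())

# Toy network: 50 followers one hop out, 400 at two hops, 3000 at three.
n_by_depth = {1: 50, 2: 400, 3: 3000}
for hour in (1, 6, 24):
    print(hour, round(total_reshares(hour, n_by_depth, u=0.05, lam=0.5), 2))
```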
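The SIR dynamics above can be simulated by integrating the ODEs numerically. Below is a minimal sketch using `scipy.integrate.odeint`; the population size, seed count, and rates are arbitrary toy values, not fitted to any real outbreak of fake news:

```python
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, v, N):
    """Right-hand side of the SIR system: dS/dt, dI/dt, dR/dt."""
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - v * I
    dR = v * I
    return [dS, dI, dR]

N = 10_000                       # users in the network (toy value)
y0 = [N - 10, 10, 0]             # 10 initial believers of the fake story
t = np.linspace(0, 60, 61)       # days
beta, v = 0.4, 0.1               # belief (infection) and debunk (recovery) rates

S, I, R = odeint(sir, y0, t, args=(beta, v, N)).T
print(f"peak believers: {I.max():.0f} on day {t[I.argmax()]:.0f}")
```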
The parameters $S$, $I$, $R$, $N$, $\beta$, and $v$ at different time steps $t$ can be inferred by fitting the solution of the ODE system to the observed data.

Percolation theory models how a liquid passes through a porous material, and it can also be used to model the spread of news. After some investigation, I found it conceptually similar to the SIR model; one may be considered a specialization or generalization of the other. For a primer, see this [introduction to percolation theory](http://www.math.chalmers.se/~steif/perc.pdf). There is a relationship between the thresholds shown in [Equation 4]() of this [paper](https://arxiv.org/pdf/1401.4208.pdf) and the critical probability in percolation theory, which demonstrates a connection to SIR. The critical probability of percolation theory could be used to estimate the severity of the epidemic, and hence the severity of fake news spreading over a social network. Influential nodes efficiently propagate information to many other nodes; the influence of a node can be estimated by PageRank (a short sketch appears later in this section). The higher the PageRank of a node, the more severe the epidemic would be if that node were activated.

The spread of fake news is analogous to the spread of an infectious disease and recovery from it in a population; however, it is imperative to clarify that every metaphor is flawed in some way. Infection is like being taken in by fake news, while recovery is like being vaccinated against it or having it debunked. The disease can only spread by interpersonal contact, which is analogous to how the neighbours of a node can be infected.

### Discussion

The Bayesian formulation described in the subsection on Bayesian modelling treats fake news as starting from a source node and travelling along paths through the graph, stopping at the leaves. This modelling requires an explicit graph to store the relationships between nodes: as fake news spreads to the nearest nodes, it needs neighbourhood relations in a graph. We chose a model that allows fine-grained, localized control of news propagation in a depth-wise manner. Our formulation has a built-in mechanism for modelling the aging of news items; recent news items tend to be more relevant to users, since they are time-sensitive.

The SIR model described in the subsection on the spread of fake news, by contrast, does not require an explicit graph to store the relationships between nodes; it uses a state model instead. The compartments $S$, $I$, $R$ are disjoint subsets of the universal set of users in the graph, and each compartment is modelled as a state. These states undergo transitions: people who come to believe the fake news represent a transition from $S$ to $I$, and an informative campaign that debunks the fake news results in a transition from $I$ to $R$. Once a person has recovered from the effect of a fake news item, it is unlikely that they will fall victim to the same item again. SIR does not provide fine-grained control over the spread of news in a layer-wise manner, as it models the spread on a global scale by considering the entire graph at once. The default SIR model also does not model the aging of news items, even though, given their time-sensitive nature, recent news items tend to be more relevant to users.
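Picking up the PageRank-based influence estimate from the percolation discussion above, here is a minimal `networkx` sketch on an invented toy follow graph (names and edges are illustrative):

```python
import networkx as nx

# Toy follower graph: an edge (u, v) means "u follows v", so PageRank
# accumulates on accounts that are followed by well-followed accounts.
G = nx.DiGraph([("bob", "alice"), ("carol", "alice"), ("carol", "bob"),
                ("dave", "carol"), ("alice", "eve"), ("bob", "eve")])

# PageRank as a proxy for influence: seeding fake news at a
# high-PageRank node should produce a more severe outbreak.
influence = nx.pagerank(G, alpha=0.85)
for node, score in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```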
The Bayesian model is equivalent to the SI model if the depth-of-nodes constraint in the formulation is relaxed: the news spreads from the source node to the leaves of the directed graph, and there is no recovery in our modelling design, as it is a one-way spread. The rates $\beta$ and $v$ in the SIR equations play a role analogous to the rates in the Bayesian formulation, and the parameter $u$ helps select a subset of the total population $N$. The cumulative effect of the rates when $\beta \gg v$ is a discounting effect on $S$: the size of $S$ shrinks towards 0 while the size of $I$ increases. Similarly, when $v \gg \beta$, there is a discounting effect on $I$: the size of $I$ shrinks towards 0 while the size of $R$ increases.

There are challenges with deploying an SIR model in production. These include detecting the boundaries of the compartments, as the sizes of $S$, $I$, and $R$ are unknown, and the transition rates $\beta$ and $v$ are also unknown. The challenges are similar for the Bayesian formulation, whose parameters ($R_d(t)$, $q_d(t)$) are unknown. These unknown parameters must be well chosen to ensure the model works properly.

There are possible issues with incorporating vaccination into SIR. In the short term, prior vaccination reduces the population of the susceptible, and with it the likelihood that they will fall victim to fake news in the future, because vaccinated users seek out facts to verify, thereby truncating the spread of fake news. Prior vaccination can be likened to educating the masses about the prevalence of fake news in society. In reality, however, empirical evidence indicates that despite a prior flu vaccination, one can still succumb to the same disease. If we can determine whether the susceptibles have been vaccinated or not, we arrive at VSIR, a model that we are proposing. One may argue that vaccination would simply reduce the susceptible count; we can claim that VSIR is a specialization of the SIR model under the condition that the vaccination works as intended in all cases, which is unrealistic in the real world.

### Conclusions

The current push for election integrity in developed economies has created momentum for increased investment in research advancing the state of fake news detection. We have tried to convince the reader that detecting fake news is a multidisciplinary effort geared towards solving ill-posed challenges. Bayesian modelling is a convenient framework for modelling fake news using Bayes' theorem. It is pertinent to note that several aspects of fake news analytics require a human factor; the multifaceted nature of news analytics is therefore better assessed through visual analytics.

### Acknowledgements

I would like to thank Dr. Ziyuan Gao and Dr. Eugene Yablonsky, from the National University of Singapore and Fraser Valley University, BC, respectively, for providing technical support during the writing of this manuscript. I would also like to thank Aleksey Nozdryn-Plotnicki for a thorough review of this work that led to a complete rewrite of the article. Also, I would like to thank Dr. Dave Campbell and Dr. Mike Irvine for introducing me to Bayesian modelling by suggesting several research papers on the subject. However, I might have misused Bayesian concepts in my modelling choices despite my sincere intentions.
Furthermore, I am grateful to the advanced reading group under the umbrella of the [Learn Data Science](https://www.meetup.com/LearnDataScience/) meetup in Vancouver for the lively intellectual conversations on several machine learning topics.

### References

- [[1]]() https://www.bbc.co.uk/news/resources/idt-sh/deep_sea_mining
- [[2]]() https://www.bbc.com/news/world-us-canada-49100778
- [[3]]() https://www.theonion.com/
- [[4]]() http://www.evanmiller.org/deriving-the-reddit-formula.html
- [[5]]() http://www.evanmiller.org/inferring-tweet-quality-from-retweets.html
- [[6]]() https://en.wikipedia.org/wiki/Expected_utility_hypothesis
- [[7]]() Mønsted, B. M., Sapiezynski, P., Ferrara, E., & Jørgensen, S. L. (2017). Evidence of complex contagion of information in social media: An experiment using Twitter bots. PLoS One, 12(9), e0184148. https://doi.org/10.1371/journal.pone.0184148
- [[8]]() Cannarella, J., & Spechler, J. A. (2014). Epidemiological modeling of online social network dynamics. arXiv, abs/1401.4208.
- [[9]]() http://people.cs.vt.edu/naren/papers/news-rumor-epi-snakdd13.pdf

### **How to Cite this Article**

```
BibTeX Citation

@article{kodoh2019a,
  author = {Odoh, Kenneth},
  title = {Fake news: An exploratory dive into ways to identify misinformation in a network},
  year = {2019},
  note = {https://kenluck2001.github.io/blog_post/fake_news_an_exploratory_dive_into_ways_to_identify_misinformation_in_a_network.html}
}
```
