Centrality in actor graph and popularity. Are they linked?

We want to use wikipedia pageviews as a proxy for popularity, and try to find a correlation the centrality of actors in the actor graph. For this we used the dataset of pageviews from Homework 2. Since Wikipedia was founded at the start of the century, pageviews might not be relevant for older actors. To eliminate this possible bias we considered only actors having played in recent movies and formed the actor-graph only based on the recent movies.

We therefore consider 49481 actors accross 41039 movies, 4631 of which (the actors) we have the pageview count.

Computing centrality

We focused on three metrics of centrality: - Degree centrality: With how many other actors have the actors played - Eigenvector centrality - Betweenness: how much an actor bridges communities of actors

Who are the most central actors

For those interested, we list here the most central actors.

degree_centrality actor_name
70 0.011601 Anupam Kher
744 0.010853 Jane Lynch
748 0.010287 Samuel L. Jackson
710 0.009903 David Koechner
202 0.009863 Justin Long
eigenvector_centrality actor_name
1150 0.095907 David Strathairn
6434 0.095699 Nicole Kidman
4889 0.095603 Clive Owen
6102 0.094613 Parker Posey
4467 0.094151 Rodrigo Santoro
betweenness_centrality actor_name
70 0.022059 Anupam Kher
313 0.013711 Michael Madsen
5728 0.013613 Lee Byung-Hun
5257 0.011191 Vera Farmiga
10 0.009740 Nassar

Comparing centrality to pageviews

Naïve visualisation

We first try to visualise the two values together.

The correlations we found were very modest with R^2 values of 0.016, 0.013, and 0.007 respectively

The impact of movie count

For movie count.
    We find an average increase of 50.742% in pageviews per 10x increase in movie count
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       4.7326      0.015    324.735      0.000       4.704       4.761
movie_count     0.1782      0.018      9.787      0.000       0.143       0.214
===============================================================================

For degree centrality.
    We find an average increase of 3.7x in centrality per 10x increase in movie count
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      -3.6732      0.007   -529.463      0.000      -3.687      -3.660
movie_count     0.5656      0.009     65.250      0.000       0.549       0.583
===============================================================================

For eigenvector centrality.
    We find an average increase of 19.8x in centrality per 10x increase in movie count
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      -4.8026      0.026   -187.939      0.000      -4.853      -4.753
movie_count     1.2970      0.032     40.620      0.000       1.234       1.360
===============================================================================

For betweenness centrality.
    We find an average increase of 10.7x in centrality per 10x increase in movie count
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      -5.3659      0.042   -126.906      0.000      -5.449      -5.283
movie_count     1.0284      0.040     25.452      0.000       0.949       1.108
===============================================================================

Here the values are better correlated, with R^2 values of 0.479, 0.263, and 0.212 respectively. Movie count having a significant correlation with pageviews as well as with centrality, it acts as a confounder. We therefore aim to isolate the effects of centrality from those of movie count by using A/B testing

The A/B tests

The A/B test is done by making pairs of actors with similar numbers of movies (Here similar means that their number of movies are less than 5% different) such that the first actor has a lower centrality than the second.

For degree centrality.
    We find an average increase of -2.43% in pageviews per 2x increase in centrality
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.0007      0.020      0.033      0.974      -0.038       0.040
logratio_centrality    -0.0355      0.008     -4.231      0.000      -0.052      -0.019
=======================================================================================

For eigenvector centrality.
    We find an average increase of 4.73% in pageviews per 2x increase in centrality
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept              -0.0351      0.019     -1.884      0.060      -0.072       0.001
logratio_centrality     0.0667      0.003     23.301      0.000       0.061       0.072
=======================================================================================

For betweenness centrality.
    We find an average increase of 2.58% in pageviews per 2x increase in centrality
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.2812      0.026     10.966      0.000       0.231       0.331
logratio_centrality     0.0367      0.004      9.174      0.000       0.029       0.045
=======================================================================================

In the end we find very little increases in pageviews when varying the centrality of actors. There is even a slight decrease concerning the degree centrality, although it is not statistically significant. The largest increase is with the eigenvalue centrality, with an average increase of 4.73% in pageviews per 2x increase in centrality, but with an R^2 value of 0.007. We conclude that if there is a relationship between the centrality of actors and their popularity, it is barely noticeable and not interesting.