In late 2018, user u/Godspeedme made a post measuring how popular each member of the K-pop group TWICE was on r/twice and r/twicemedia. They took all the posts from each promotion period, counted the number of posts and the accumulated upvotes for each member, and used those totals to rank the members within that period. I referenced it in my first post, where I tried to find reasons behind the popularity of members in r/twicememes.
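To make the counting step concrete, here is a minimal sketch of tallying posts and upvotes per member from post titles. This is not u/Godspeedme's actual code: the `tally_posts` and `rank_by` helpers and the `(title, upvotes)` input shape are assumptions made for illustration, and the scraping step itself is left out.

```python
import re

MEMBERS = ["Nayeon", "Jeongyeon", "Momo", "Sana", "Jihyo",
           "Mina", "Dahyun", "Chaeyoung", "Tzuyu"]

def tally_posts(posts):
    """Count posts and sum upvotes per member.

    `posts` is an iterable of (title, upvotes) pairs (a hypothetical shape;
    a real scrape would return richer objects). A post counts toward every
    member whose name appears as a whole word in its title.
    """
    stats = {m: {"posts": 0, "upvotes": 0} for m in MEMBERS}
    for title, upvotes in posts:
        for member in MEMBERS:
            # Whole-word match so "Mina" does not fire on "Minatozaki".
            if re.search(r"\b" + re.escape(member) + r"\b", title, re.IGNORECASE):
                stats[member]["posts"] += 1
                stats[member]["upvotes"] += upvotes
    return stats

def rank_by(stats, key):
    """Return member names ordered from highest to lowest on `key`."""
    return sorted(stats, key=lambda m: stats[m][key], reverse=True)
```

Calling `rank_by(tally_posts(posts), "upvotes")` on one promotion period's posts gives that period's upvote ranking, and `"posts"` gives the post-count ranking. This only works because the media subreddits require member names in titles.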
But I actually wanted to learn how to do my own Reddit scrape, and decided to use their post as a basis. I’ve taken the same code and extended the timeline to measure the popularity of TWICE members in three more comeback periods (The Best Thing I Ever Did, Fancy, and Feel Special). I present the updated popularity rankings below:
It was fun to troubleshoot and learn how this code runs, but I also wanted to add something of my own. Given that I had already built a Reddit scraping program, I turned my attention to using it on r/twicememes. If I could scrape posts by date, I could compare the popularity of TWICE members between these subreddits. Does popularity on TWICE’s more traditional subreddits (r/twice and r/twicemedia) correspond to more popular memes?
The big issue with this is that r/twicememes does not have any rules on titling or flairing. We could easily attribute a post to a member of TWICE in u/Godspeedme’s code because the rules of the subreddit force the name of the member pictured to be included in the title. Given that we don’t have the same luxury with memes, I decided to take the top 50 posts from each promotion period and count the members featured in each meme. This leaves some room for error and judgment, since memes aren’t always clearly about or relevant to all members pictured. 50 is also not a great sample size compared to the hundreds or thousands of posts analyzed on r/twice and r/twicemedia, so take any correlations with a grain of salt. Now with that limitation addressed, here are the popularity rankings for each member in r/twicememes compared to the rankings presented earlier:
Just looking at the average ranking between Media (r/twice and r/twicemedia) and Memes (r/twicememes), the standouts are Tzuyu and Dahyun. Tzuyu’s ranking drops four places from media to memes. On the other hand, Dahyun ranks surprisingly low on media (despite being voted the most popular member in the 2018 Mega survey!) while averaging as the most meme-able member across eras. Her meme ranking sits about four places above her media ranking. This isn’t too surprising, as most ONCE will acknowledge that Dahyun is one of the members most comfortable with being silly and participating in variety shows. Meanwhile, Tzuyu is the quietest and very rarely contributes beyond TwiceTV. Her success in the media subreddits also matches her role as the visual of the group.
In terms of similarity between these two rankings, the J-line dominates all the subreddits, making up the top 4 in both lists. Mina reaches her lowest popularity ranking during the Feel Special era: her break from activities for mental health means there are fewer pictures of her, but that doesn’t stop her from beating Jeongyeon and Jihyo in upvotes. This ranking also matches very closely with my earlier analysis of the top 200 posts on r/twicememes, in that it puts Tzuyu, Jeongyeon, and Jihyo at the bottom and the J-line + Dahyun near the top.
But these are observations. We want to measure how similar these lists are to decide if popularity in the media subreddits significantly corresponds to popularity of memes. A rank correlation coefficient can be determined in order to identify if there is a statistically significant degree of similarity between these two rankings. Because we also establish rankings for each era, we can compare how well these rankings match for each comeback period. In addition, we compare the averages of these eras to see if the cumulative rankings are similar. Three ranking correlation results are shown below:
If we go era by era, the correlation coefficients are noticeably worse. In fact, the only era that reaches statistical significance is the Fancy era (R value of 0.67, which has p-values <0.10 for Spearman and >0.10 for Pearson). The Spearman coefficient would suggest there is likely some correlation, while the Pearson coefficient would say the two sets are independent. The Likey era has better correlation than some of the other eras, but the sample size for Likey memes is so small that we can consider this an anomaly.
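For readers who want to reproduce this kind of comparison, here is a minimal pure-Python sketch of both coefficients. The helper names (`pearson`, `to_ranks`, `spearman`) are my own for illustration and this is not necessarily how the numbers above were produced. Spearman is computed here as Pearson applied to the ranks of the values, with tied values sharing their average rank.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def to_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(to_ranks(xs), to_ranks(ys))
```

One consequence worth noting: when both inputs are already tie-free rankings from 1 to 9, as in a single era, `pearson` and `spearman` return the same number, and the two tests differ only in which critical-value table that number is checked against. The two coefficients can diverge for the all-era averages, where the inputs are averaged ranks rather than whole-number positions.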
For the last statistical test, we determine the Hoeffding D correlation coefficient, which is good for measuring nonlinear and non-monotonic relationships. We used this technique on just two sets of data because they had the greatest Pearson and Spearman coefficient values: the average ranking for all eras and the ranking in Fancy. For both datasets, the D value falls very close to 0, indicating that there is no strong nonlinear relationship between media and memes.
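As a rough illustration of what this statistic computes, here is a minimal sketch of Hoeffding's D using the common scaling by 30, under which independent data gives values near 0 and a perfect dependence gives 1. The function name `hoeffding_d` is my own, the sketch assumes there are no tied values in either list, and it is not necessarily the implementation used for the results above.

```python
def hoeffding_d(xs, ys):
    """Hoeffding's D for paired data with no ties (scaled by 30)."""
    n = len(xs)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 paired observations")

    # R_i and S_i: rank of each observation within xs and within ys.
    r = [1 + sum(1 for v in xs if v < x) for x in xs]
    s = [1 + sum(1 for v in ys if v < y) for y in ys]
    # Q_i: one plus the number of points smaller than point i in BOTH lists.
    q = [
        1 + sum(1 for xj, yj in zip(xs, ys) if xj < xi and yj < yi)
        for xi, yi in zip(xs, ys)
    ]

    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))

    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator
```

Unlike Pearson or Spearman, D carries no direction: with nine members, two identical rankings score 1, and so does a ranking paired with its exact reverse, because both are perfectly dependent.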
So what can we conclude? There is some relation between these two rankings. Ultimately, memes are a reflection of popularity, so there is naturally going to be some level of correlation. Each era on its own does not allow you to predict the popularity of a member on one subreddit vs. the other, but cumulatively there seems to be a fairly understandable correlation based on a few consistent findings. The J-line occupies the top slots. Jeongyeon and Jihyo are the least popular members. Chaeyoung and Nayeon occupy the middle tier. Dahyun and Tzuyu are more dependent on the type of content, with Dahyun being the most meme-able member in most eras and Tzuyu doing best with more normal forms of media. The rankings fluctuate somewhat within these observations, but the result is fairly clear: one subreddit’s ranking cannot be used to predict popularity in another subreddit on an era-by-era basis, yet there is a correlation in the overall popularity of members between the media subreddits and the meme subreddit.
Sources
- for critical values of Pearson coefficient: https://researchbasics.education.uconn.edu/r_critical_value_table/
- for Spearman coefficient: https://www.york.ac.uk/depts/maths/tables/spearman.pdf
- for Hoeffding D correlation: https://www.princeton.edu/~dtakahas/publications/Brief%20Bioinform-2013-de%20Siqueira%20Santos
- u/Godspeedme’s reddit post: https://old.reddit.com/r/twice/comments/ad0y53/popularity_ranking_of_members_based_on_rtwice_and/
Originally published at http://paukshop.wordpress.com on December 16, 2019.