Data decay and the continuous present in social media research

Social media platforms are constantly shifting beneath our feet. Sometimes these changes are visible and jarring: the rollout of a new feature, for example, or a change in the way a feed appears or is ordered. Sometimes they are less visible: backend changes to the way content is algorithmically structured, or shifts in moderation policies to de-platform misinformation.

I want to think about what this means for the accumulation of longitudinal data on how social media operates. At the moment, much social media research focuses on snapshot data from a single moment in time. As social media is both generative (e.g. generating social movements, news, memes, trends) and reactive (e.g. responding to social change, politics, news and popular culture), this approach makes sense. Longer-term studies pull data from a few years, a few months or a few weeks. The longer a study stretches across time, the more likely it is to suffer the effects of what I call ‘data decay’.

While data may seem commensurable on its surface, the almost continuous micro-adjustments companies make to their social media platforms mean it becomes harder, for example, to compare tweets collected at point A to tweets collected at point B. The challenge of collecting social media data over time becomes a contemporary version of the Ship of Theseus problem. As described in Plutarch’s Lives of the Noble Greeks and Romans, the problem is thus:

“The ship wherein Theseus and the youth of Athens returned had thirty oars, and was preserved by the Athenians down even to the time of Demetrius Phalereus, for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.”

If the parts of a social media platform are constantly being changed, swapped out, rearranged and tweaked, are we still studying the same social media platform?

At present, researchers largely avoid this problem by collecting data in one swoop, which sidesteps the difficulty of comparing datasets collected at different points in time, under different conditions. However, this means that we, as researchers, lean towards living in the ‘continuous present’, forever re-engaging with the now. Of course, we do what we can to contextualise our findings in and through other scholarly work, and to trace shifts and trends over time, but empirically we are still stuck in a continuous present.

Platforms that are always changing the stakes of visibility, attention and virality make it more complex to make sense of data collected over time. This is further complicated by the fact that researchers operate on a platform’s good will and good grace (a problem most recently illustrated by Elon Musk’s takeover of Twitter).

This also limits researchers’ ability to collaborate over time. Quantitative survey data is often provided via data commons and other mechanisms to help researchers compare results across time and context. By contrast, the data we collect from social media exists in a legal grey area and is not strictly ‘ours’, which limits our ability to pool data over time.

Over the past seven years, Meta (formerly Facebook) has slowly but surely shut down researcher access to its social media platforms, Instagram and Facebook. Almost as quickly as researchers develop and finesse work-arounds (for example, Instamancer), platforms make them redundant. Platforms like TikTok have no formal researcher or API access at all. Researchers scraping data through second-party APIs have suggested that the data produced by these approaches is often inaccurate on key metrics.

As social media research has become dominated by metricised forms of data gathering and analysis, this has serious implications for what we can know, over time, about the platforms we study. Broadly, this trend reflects our preference for, and privileging of, metrics and quantitative data more generally as seemingly objective ways of measuring the heart or truth of platforms and, by extension, the social world. As Jenny Davis and Tony P. Love have carefully argued, we must resist the temptation to empirically generalise from social media data to the broader social world. This is true for research of social media and with social media. Empirical social media research captures slices of time. When scientists sample arctic ice, they are able to drill down thousands of metres to analyse how the environment has changed over time. In this metaphor, social media research is examining each of these layers as they are formed, in the present.

To create social media research that moves beyond the continuous present, we need to look carefully and critically at how we can layer data over time to create satisfying and robust insights into this part of our social world (like a really good sandwich). This requires melding deeply qualitative sensibilities with computational approaches, as well as sticking with topics and communities that may fall in and out of visibility. An approach that weights temporality has the capacity to deepen and enhance our ability to theorise social media and speak more fully to what was, what is, and what is to come.

With thanks to Clare Southerton and Jenny Davis for additional insights and feedback.