In a recent collaboration with a government department, our team at QuantSpark tackled an interesting challenge at the heart of a critical process for a national security customer: how to improve the performance of an existing predictive model that wasn't delivering the accuracy needed for effective decision-making.
The department's existing predictive model was neither accurate enough to provide useful forecasts nor flexible enough to enable the vital scenario planning that informs decision-making. As a result, the department was frequently surprised both by real-world events and by the outcomes of its own decisions, incurring costs significant enough to be a major concern at a national level.
When predictive models underperform, the usual approach is to refine the existing data inputs or adjust the model's parameters. However, we proposed a different hypothesis: what if the model was missing a critical data source entirely? Specifically, we suggested that social media conversations might contain valuable signals that could enhance the model's predictive power.
To test this hypothesis, we developed a methodology to systematically analyse social media discussions across various dimensions, based on the following activities:

- initial OSINT research to identify the platforms and communities where the relevant conversations take place;
- systematic collection of posts from those sources;
- categorisation of each collected post, using generative AI, into a structured record (a simplified sketch of this step follows the list);
- aggregation of those records into a single dataset, broken down over time and across geographies.
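To give a flavour of the categorisation step, here is a minimal sketch assuming an OpenAI-style chat-completions API. The model name, prompt and category schema below are illustrative placeholders, not the ones used in the project.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-style LLM API works similarly

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative schema -- the categories used in the real project are not public.
PROMPT = """You are labelling social media posts for a forecasting dataset.
Return JSON with keys: "topic" (short string), "sentiment" ("positive",
"neutral" or "negative"), and "region" (a region name, or null if unclear).

Post: {post}"""

def categorise_post(post: str) -> dict:
    """Turn one free-text post into a structured record via the LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(post=post)}],
        response_format={"type": "json_object"},  # ask for machine-readable output
    )
    return json.loads(response.choices[0].message.content)

# The aggregated dataset is then built one structured record at a time.
records = [categorise_post(p) for p in ["Power cuts again in our area tonight..."]]
```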
With our dataset in place, we conducted rigorous statistical analysis through a series of hypothesis tests. These tests were designed to evaluate whether the social media signals we had identified could actually improve predictive power for the outcomes the government department was interested in.
The results were compelling: we discovered statistically significant relationships between our aggregated social media dataset and the real-world outcomes the department needed to forecast, with clear correlations both in changes over time and in differences between geographies. This confirmed our initial hypothesis: social conversations contain valuable predictive information that had been missing from the department's model.
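To make the shape of this analysis concrete, here is a toy version of such a test in Python. The data is synthetic, with a relationship deliberately built in; the department's actual variables, test choices and results are not public.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: a weekly aggregated social-media signal per geography,
# and the real-world outcome the department wants to forecast.
n_weeks, n_regions = 52, 10
signal = rng.normal(size=(n_weeks, n_regions))
outcome = 0.6 * signal + rng.normal(scale=0.8, size=signal.shape)  # built-in link

# Over time: does the signal track the outcome within each geography?
for region in range(n_regions):
    r, p = pearsonr(signal[:, region], outcome[:, region])
    print(f"region {region}: r={r:.2f}, p={p:.3g}")

# Across geographies: do regions with a stronger average signal differ in outcome?
r, p = pearsonr(signal.mean(axis=0), outcome.mean(axis=0))
print(f"cross-sectional: r={r:.2f}, p={p:.3g}")
```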
We also developed a roadmap detailing how such an approach could be scaled, automated and extended. One key learning was that, while AI processing costs are falling, a limited budget still makes it necessary to pre-filter content with less computationally intensive techniques, so that the expensive generative-AI queries are only run on posts likely to be relevant. This is why the initial OSINT research was so crucial to our process in this project; a sketch of this two-stage pattern follows.
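The sketch below shows the cost-control pattern just described: a cheap keyword pre-filter in front of the expensive model. The keyword list is purely illustrative, and categorise_post here is a stand-in for the LLM step sketched earlier.

```python
import re
from typing import Iterable

# Cheap first pass: terms surfaced by the OSINT research (illustrative examples).
CANDIDATE_PATTERN = re.compile(r"\b(outage|protest|shortage|queue)\b", re.IGNORECASE)

def prefilter(posts: Iterable[str]) -> list[str]:
    """Keep only posts worth sending to the expensive model."""
    return [p for p in posts if CANDIDATE_PATTERN.search(p)]

def categorise_post(post: str) -> dict:
    """Stand-in for the LLM categorisation step sketched earlier."""
    return {"post": post}  # the real version returns the structured LLM output

def build_dataset(posts: Iterable[str]) -> list[dict]:
    candidates = prefilter(posts)
    # Only the surviving candidates incur per-token LLM costs;
    # everything else is discarded essentially for free.
    return [categorise_post(p) for p in candidates]
```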
This project demonstrates the untapped potential of social media data, in combination with rapidly advancing generative AI technology, for enhancing predictive analytics. As the per-token processing cost at a given level of intelligence continually decreases, it is becoming increasingly feasible to affordably analyse vast amounts of content (whether focusing on text content or extending the analysis to other media types) to create highly valuable structured datasets.
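The economics can be made concrete with some back-of-the-envelope arithmetic. Every figure below is an assumption chosen for illustration, but the structure of the calculation shows why the pre-filter dominates the cost.

```python
# Illustrative cost model -- every number here is an assumption.
posts_per_day = 1_000_000
prefilter_pass_rate = 0.02          # cheap filter keeps 2% of posts
tokens_per_query = 500              # prompt + post + structured response
price_per_million_tokens = 0.50     # USD; varies by model and falls over time

filtered_cost = (posts_per_day * prefilter_pass_rate
                 * tokens_per_query / 1_000_000 * price_per_million_tokens)
unfiltered_cost = (posts_per_day
                   * tokens_per_query / 1_000_000 * price_per_million_tokens)

print(f"with pre-filter: ~${filtered_cost:,.2f}/day")     # ~$5.00/day
print(f"without pre-filter: ~${unfiltered_cost:,.2f}/day")  # ~$250.00/day
```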
By systematically collecting, categorising and analysing these social media conversations, we've laid the groundwork for incorporating these insights into the department's predictive framework, in a way that would enable much-improved forecasting and scenario planning capabilities. While we can't share specific details about the model's applications due to the sensitive nature of the work, this methodology has broad implications for how government agencies and other organisations can leverage publicly available data in new ways, thanks to recent advances in AI.
If your organisation is exploring ways to enhance predictive models or unlock value from alternative data sources, we’d love to share more about how this methodology could apply to your use case.
Get in touch to explore how QuantSpark’s AI-led approach can help you make more informed, future-ready decisions.