In a recent collaboration with a government department, our team at QuantSpark tackled an interesting challenge at the heart of a critical process for a national security customer: how to improve the performance of an existing predictive model that wasn't delivering the accuracy needed for effective decision-making.
The department's existing predictive model was neither accurate enough to provide useful forecasts nor flexible enough to enable the vital scenario planning that informs decision-making. As a result, the department was frequently surprised both by real-world events and by the outcomes of its own decisions, incurring costs significant enough to be a major concern at a national level.
When predictive models underperform, the usual approach is to refine the existing data inputs or adjust the model's parameters. However, we proposed a different hypothesis: what if the model was missing a critical data source entirely? Specifically, we suggested that social media conversations might contain valuable signals that could enhance the model's predictive power.
To test this hypothesis, we developed a methodology to systematically analyse social media discussions across various dimensions, based on the following activities:

- initial OSINT research to identify the platforms and communities where the relevant conversations take place;
- systematic collection of posts from those sources;
- categorisation of each collected post, using generative AI, into a structured record (a simplified sketch of this step follows the list);
- aggregation of those records into a single dataset, broken down over time and across geographies.
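To give a flavour of the categorisation step, here is a minimal sketch assuming an OpenAI-style chat-completions API. The model name, prompt and category schema below are illustrative placeholders, not the ones used in the project.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-style LLM API works similarly

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative schema -- the categories used in the real project are not public.
PROMPT = """You are labelling social media posts for a forecasting dataset.
Return JSON with keys: "topic" (short string), "sentiment" ("positive",
"neutral" or "negative"), and "region" (a region name, or null if unclear).

Post: {post}"""

def categorise_post(post: str) -> dict:
    """Turn one free-text post into a structured record via the LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(post=post)}],
        response_format={"type": "json_object"},  # ask for machine-readable output
    )
    return json.loads(response.choices[0].message.content)

# The aggregated dataset is then built one structured record at a time.
records = [categorise_post(p) for p in ["Power cuts again in our area tonight..."]]
```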
With our dataset in place, we conducted rigorous statistical analysis through a series of hypothesis tests. These tests were designed to evaluate whether the social media signals we had identified could actually improve predictive power for the outcomes the government department was interested in.
The results were compelling: we discovered statistically significant relationships between our aggregated social media dataset and the real-world outcomes the department needed to forecast, with clear correlations both in changes over time and in differences between geographies. This confirmed our initial hypothesis: social conversations contain valuable predictive information that had been missing from the department's model.
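To make the shape of this analysis concrete, here is a toy version of such a test in Python. The data is synthetic, with a relationship deliberately built in; the department's actual variables, test choices and results are not public.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: a weekly aggregated social-media signal per geography,
# and the real-world outcome the department wants to forecast.
n_weeks, n_regions = 52, 10
signal = rng.normal(size=(n_weeks, n_regions))
outcome = 0.6 * signal + rng.normal(scale=0.8, size=signal.shape)  # built-in link

# Over time: does the signal track the outcome within each geography?
for region in range(n_regions):
    r, p = pearsonr(signal[:, region], outcome[:, region])
    print(f"region {region}: r={r:.2f}, p={p:.3g}")

# Across geographies: do regions with a stronger average signal differ in outcome?
r, p = pearsonr(signal.mean(axis=0), outcome.mean(axis=0))
print(f"cross-sectional: r={r:.2f}, p={p:.3g}")
```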
We also developed a roadmap detailing how such an approach could be scaled, automated and extended. One key learning was that, while AI processing costs are falling, a limited budget still makes it necessary to pre-filter content with less computationally intensive techniques, so that the expensive generative-AI queries are only run on posts likely to be relevant. This is why the initial OSINT research was so crucial to our process in this project; a sketch of this two-stage pattern follows.
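The sketch below shows the cost-control pattern just described: a cheap keyword pre-filter in front of the expensive model. The keyword list is purely illustrative, and categorise_post here is a stand-in for the LLM step sketched earlier.

```python
import re
from typing import Iterable

# Cheap first pass: terms surfaced by the OSINT research (illustrative examples).
CANDIDATE_PATTERN = re.compile(r"\b(outage|protest|shortage|queue)\b", re.IGNORECASE)

def prefilter(posts: Iterable[str]) -> list[str]:
    """Keep only posts worth sending to the expensive model."""
    return [p for p in posts if CANDIDATE_PATTERN.search(p)]

def categorise_post(post: str) -> dict:
    """Stand-in for the LLM categorisation step sketched earlier."""
    return {"post": post}  # the real version returns the structured LLM output

def build_dataset(posts: Iterable[str]) -> list[dict]:
    candidates = prefilter(posts)
    # Only the surviving candidates incur per-token LLM costs;
    # everything else is discarded essentially for free.
    return [categorise_post(p) for p in candidates]
```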
This project demonstrates the untapped potential of social media data, in combination with rapidly advancing generative AI technology, for enhancing predictive analytics. As the per-token processing cost at a given level of intelligence continually decreases, it is becoming increasingly feasible to affordably analyse vast amounts of content (whether focusing on text content or extending the analysis to other media types) to create highly valuable structured datasets.
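The economics can be made concrete with some back-of-the-envelope arithmetic. Every figure below is an assumption chosen for illustration, but the structure of the calculation shows why the pre-filter dominates the cost.

```python
# Illustrative cost model -- every number here is an assumption.
posts_per_day = 1_000_000
prefilter_pass_rate = 0.02          # cheap filter keeps 2% of posts
tokens_per_query = 500              # prompt + post + structured response
price_per_million_tokens = 0.50     # USD; varies by model and falls over time

filtered_cost = (posts_per_day * prefilter_pass_rate
                 * tokens_per_query / 1_000_000 * price_per_million_tokens)
unfiltered_cost = (posts_per_day
                   * tokens_per_query / 1_000_000 * price_per_million_tokens)

print(f"with pre-filter: ~${filtered_cost:,.2f}/day")     # ~$5.00/day
print(f"without pre-filter: ~${unfiltered_cost:,.2f}/day")  # ~$250.00/day
```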
By systematically collecting, categorising and analysing these social media conversations, we've laid the groundwork for incorporating these insights into the department's predictive framework, in a way that would enable much-improved forecasting and scenario planning capabilities. While we can't share specific details about the model's applications due to the sensitive nature of the work, this methodology has broad implications for how government agencies and other organisations can leverage publicly available data in new ways, thanks to recent advances in AI.
If your organisation is exploring ways to enhance predictive models or unlock value from alternative data sources, we’d love to share more about how this methodology could apply to your use case.
Get in touch to explore how QuantSpark’s AI-led approach can help you make more informed, future-ready decisions.