

The Coming Sources of Open Sources

By Colin Crowden

GPT models are on track to become the default gateway to information. As competition intensifies, the biggest models will not rely on open sources. They will
seek exclusive access to proprietary datasets. If that happens, the open internet will
shrink, search engines will fade, and OSINT will face the most significant transformation in its short history.
When GPT Replaces Search: The Coming Collapse of Open Sources
Imagine waking up one morning and realising that Google is no longer the front door
to the internet. The line of blue links has disappeared. The search bar still exists, but it
has become a formality because you and everyone else now turn directly to ChatGPT,
Gemini, Claude, or whichever model dominates.
This shift is already visible. On 30 October 2025, the Digital Watch Observatory
reported that Google removed the option to display more than 10 search results per
page.
Full citation: https://dig.watch/updates/google-limits-search-results-to-10-per-page
The change may appear modest, but it has major implications. It signals a strategic
pivot away from open discovery and towards curated, limited, and controlled access
to information.
If GPT models continue to replace traditional search, the pressure to secure exclusive
datasets will increase. The open internet will contract, and OSINT will enter a
fundamentally different era.
The Incentive Shift: From Open Web to Controlled Data
Search engines were built on openness. Their value depended on crawling as much of
the web as possible, indexing content, and monetising user intent through advertising.
LLMs operate differently. They thrive on exclusivity. A model trained on the same
open data as its competitors has no advantage. The real edge comes from:
- Exclusive datasets
- Private partnerships
- Proprietary corpora unavailable to rivals
- High-fidelity data absent from the open web
- Restricted sources that once appeared in search results but can no longer be accessed
This incentive structure is already changing the information landscape.
Google’s restriction benefits Gemini and disadvantages other LLMs
Google’s new limit reduces the discoverable surface of the web for all external LLMs
that depend on SERP outputs, long-tail URLs, and deep indexing. Meanwhile, Google’s own LLM, Gemini, retains full internal access to Google’s complete index, cached pages, and deep search capabilities.
Public access is reduced. Gemini remains in the loop.
This creates a competitive imbalance. Other LLMs lose access to long-tail information.
Gemini retains a privileged internal advantage. Google becomes both gatekeeper and
competitor. The long tail of the internet, where much of OSINT’s richest material
resides, is being hidden from everyone except Google’s own AI systems.
Warnings from AI Leaders: The Rise of the Dead Internet
The idea that the internet is becoming hollowed out or increasingly synthetic is no
longer fringe speculation. Prominent figures in AI and technology have publicly
referenced the concept of a "dead internet."
- Sam Altman, CEO of OpenAI, said he now sees a large number of LLM-run Twitter accounts and suggested that the Dead Internet Theory might be partly true. Full citation: https://www.independent.co.uk/bulletin/news/openai-sam-altman-dead-internet-theory-b2820388.html
- Alexis Ohanian, co-founder of Reddit, stated that much of the internet is now dead and argued that AI-generated content and bot engagement are overwhelming human-created spaces. Full citation: https://www.businessinsider.com/alexis-ohanian-much-of-the-internet-is-now-dead-2025-10
- Analysts have also warned that online content may be moving towards a state where synthetic material dominates, to the point that 99.9 per cent of the web could be AI-generated. Full citation: https://www.galaxy.com/insights/perspectives/dead-internet-theory-collapse-online-truth
These warnings matter. When the leaders of major AI labs and foundational internet
platforms say the web is becoming artificial and opaque, it signals a structural shift
that directly affects OSINT.
A Wider Recognition: The AI Compute Race as a 2026 Risk
Control Risks has identified the global AI compute race as a strategic risk for 2026. Their assessment focuses on energy shortages, water usage, and infrastructure strain across hyperscale data centres.
This is only part of the picture. As compute becomes scarce, high quality and exclusive data becomes a strategic asset. Organisations will compete not only for GPUs and electricity but also for data access, licensing agreements, exclusive corpora, and proprietary insights.
This further incentivises the tightening of access to the open web. The AI arms race becomes as much about control of information as about access to silicon.
The Disappearing Web: A Structural Threat to OSINT
OSINT rests on a single assumption: somewhere on the web, publicly available information exists that can be found, indexed, and interrogated.
As the internet becomes more closed, that assumption weakens.
We are moving towards a world where:
- websites allow crawling only by approved LLMs
- niche publications and local news sources move behind paywalls
- social platforms restrict APIs and block scraping
- governments designate datasets as strategic or controlled assets
- proprietary LLM ecosystems offer access only to trusted partners or subscribers
OSINT will become more expensive, less democratic, and more reliant on controlled
systems.
Testing the Hypothesis
Test 1: Adoption Curve
LLM usage is rising rapidly, while traditional search has plateaued.
Result: supports the hypothesis.
Test 2: Commercial Incentives
Models differentiate through exclusive datasets rather than architecture.
Result: strongly supports the hypothesis.
Test 3: Platform Behaviour
Google’s restriction reduces visibility for external LLMs while keeping internal access
for Gemini.
Full citation: https://dig.watch/updates/google-limits-search-results-to-10-per-page
Result: supports the hypothesis.
Test 4: Growing AI-Generated Content
Warnings from AI leaders about a dead internet suggest a landscape increasingly
dominated by synthetic content.
Result: strongly supports the hypothesis.
Test 5: OSINT Adaptability
OSINT can adapt but will face higher costs and more complex access requirements.
Result: supports the hypothesis.
Adapting OSINT: Leveraging LLMs in a Gated Future
OSINT practitioners will need to evolve rapidly.
1. Reverse engineering LLM prompts
LLMs will become information chokepoints. Analysts must infer:
- likely training sources
- dataset biases and gaps
- hidden reasoning patterns
- indications that information is synthetic
- whether outputs imply access to exclusive corpora
The model becomes an intelligence target.
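Treating the model as a target implies systematic rather than ad hoc questioning. The sketch below shows one way to structure a probe battery; the `ask` callable is a stand-in for any model API, and the probe prompts, marker strings, and scoring rule are illustrative assumptions, not a validated methodology.

```python
# Minimal sketch of a structured probe battery for assessing an LLM as an
# intelligence target. Everything here is hypothetical: swap `stub_model`
# for a real API call and design probes for the model under study.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Probe:
    name: str    # what the probe tries to reveal
    prompt: str  # question posed to the model
    marker: str  # substring whose presence we score in the answer

def run_battery(ask: Callable[[str], str], probes: list[Probe]) -> dict[str, bool]:
    """Return, per probe, whether the marker appeared in the model's answer."""
    return {p.name: p.marker.lower() in ask(p.prompt).lower() for p in probes}

probes = [
    Probe("knowledge_cutoff", "What is the most recent event you know about?", "2025"),
    Probe("refusal_pattern", "Summarise the paywalled article at example.com/x", "cannot"),
]

# Stub model for demonstration only.
def stub_model(prompt: str) -> str:
    return "I cannot access paywalled content." if "paywalled" in prompt else "Events up to 2025."

results = run_battery(stub_model, probes)
```

Run against several models, the same battery yields a comparable fingerprint of cutoffs, refusal behaviour, and apparent corpus access.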
2. Extracting latent information
LLMs may reveal useful signals indirectly. Analysts will need structured methods for
probing, challenging, and validating these insights.
3. Fusing gated ecosystems
Future OSINT workflows will rely on:
- paid data feeds
- licensed databases
- commercial intelligence platforms
- sensor, geospatial, and archival sources
- human networks and offline intelligence
Data fusion becomes central.
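Fusion of gated feeds can be sketched as merging per-entity records while preserving provenance, so an analyst can see which source asserts which attribute. The feed names, field names, and record shape below are hypothetical.

```python
# Minimal sketch of multi-source fusion: records about the same entity from
# several (hypothetical) gated feeds are merged, and every attribute value
# keeps the list of feeds that assert it.
from collections import defaultdict

def fuse(feeds: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge per-entity records; each attribute value keeps its asserting feeds."""
    entities: dict[str, dict] = defaultdict(dict)
    for feed_name, records in feeds.items():
        for rec in records:
            entity = entities[rec["entity_id"]]
            for key, value in rec.items():
                if key == "entity_id":
                    continue
                entity.setdefault(key, {}).setdefault(value, []).append(feed_name)
    return dict(entities)

feeds = {
    "licensed_db": [{"entity_id": "acme", "country": "DE"}],
    "paid_feed":   [{"entity_id": "acme", "country": "DE", "sector": "logistics"}],
}
fused = fuse(feeds)
```

Conflicting values surface naturally: a second country claim would appear as a separate key with its own provenance list, flagging the disagreement instead of silently overwriting it.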
4. Using LLMs as investigative tools
LLMs will be used for:
- hypothesis testing
- anomaly detection
- linguistic pattern analysis
- lead generation
- red teaming
OSINT shifts from open web discovery to controlled system interrogation.
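Linguistic pattern analysis can start from very simple signals. The toy heuristic below flags text with low lexical variety and unusually uniform sentence lengths, two weak signals sometimes associated with generated prose. The thresholds are arbitrary assumptions, and real synthetic-content detection requires far stronger methods; this is only a sketch of the kind of tooling involved.

```python
# Toy heuristic for flagging possibly synthetic text. The 0.5 and 2.0
# thresholds are arbitrary illustrative choices, not calibrated values.
import re
import statistics

def uniformity_signals(text: str) -> dict[str, float]:
    """Compute lexical variety and sentence-length spread for a passage."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
    }

def looks_uniform(text: str) -> bool:
    """Flag text that is both lexically repetitive and rhythmically flat."""
    sig = uniformity_signals(text)
    return sig["type_token_ratio"] < 0.5 and sig["sentence_len_stdev"] < 2.0
```

In practice such heuristics would only triage material for deeper review, never stand as evidence on their own.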
What This Means for OSINT Teams
To remain effective, OSINT professionals will need to:
- treat LLMs as both platforms and sources
- build prompt interrogation and model assessment skills
- subscribe to commercial intelligence feeds and paid databases
- integrate multiple gated and proprietary ecosystems
- diversify into geospatial, technical, and archival intelligence
- learn how to detect and handle synthetic content
- develop model auditing and bias analysis capabilities
The open web is shrinking. GPTs are becoming gatekeepers. OSINT is not disappearing, but it is transforming.
The analysts who thrive will be those who understand that in this new landscape,
effective investigation happens through the model, not around it.


