
The Coming Collapse of Open Sources

By Colin Crowden

GPT models are on track to become the default gateway to information. As competition intensifies, the biggest models will not rely on open sources. They will
seek exclusive access to proprietary datasets. If that happens, the open internet will
shrink, search engines will fade, and OSINT will face the most significant transformation in its short history.

When GPT Replaces Search: The Coming Collapse of Open Sources


Imagine waking up one morning and realising that Google is no longer the front door
to the internet. The line of blue links has disappeared. The search bar still exists, but it
has become a formality because you and everyone else now turn directly to ChatGPT,
Gemini, Claude, or whichever model dominates.


This shift is already visible. On 30 October 2025, the Digital Watch Observatory
reported that Google removed the option to display more than 10 search results per
page.
Full citation: https://dig.watch/updates/google-limits-search-results-to-10-per-page


The change may appear modest, but it has major implications. It signals a strategic
pivot away from open discovery and towards curated, limited, and controlled access
to information.


If GPT models continue to replace traditional search, the pressure to secure exclusive
datasets will increase. The open internet will contract, and OSINT will enter a
fundamentally different era.

The Incentive Shift: From Open Web to Controlled Data


Search engines were built on openness. Their value depended on crawling as much of
the web as possible, indexing content, and monetising user intent through advertising.


LLMs operate differently. They thrive on exclusivity. A model trained on the same
open data as its competitors has no advantage. The real edge comes from:

  1. Exclusive datasets

  2. Private partnerships

  3. Proprietary corpora unavailable to rivals

  4. High-fidelity data absent from the open web

  5. Restricted sources that once appeared in search results but can no longer be accessed

This incentive structure is already changing the information landscape.

Google’s Restriction Benefits Gemini and Disadvantages Other LLMs


Google’s new limit reduces the discoverable surface of the web for all external LLMs
that depend on SERP outputs, long-tail URLs, and deep indexing. Meanwhile, Google’s own LLM, Gemini, retains full internal access to Google’s complete index, cached pages, and deep search capabilities.


Public access is reduced. Gemini remains in the loop.


This creates a competitive imbalance. Other LLMs lose access to long-tail information.
Gemini retains a privileged internal advantage. Google becomes both gatekeeper and
competitor. The long tail of the internet, where much of OSINT’s richest material
resides, is being hidden from everyone except Google’s own AI systems.

Warnings from AI Leaders: The Rise of the Dead Internet


The idea that the internet is becoming hollowed out or increasingly synthetic is no
longer fringe speculation. Prominent figures in AI and technology have publicly
referenced the concept of a "dead internet."

 

These warnings matter. When the leaders of major AI labs and foundational internet
platforms say the web is becoming artificial and opaque, it signals a structural shift
that directly affects OSINT.

A Wider Recognition: The AI Compute Race as a 2026 Risk

Control Risks has identified the global AI compute race as a strategic risk for 2026. Their assessment focuses on energy shortages, water usage, and infrastructure strain across hyperscale data centres.


This is only part of the picture. As compute becomes scarce, high-quality, exclusive data becomes a strategic asset. Organisations will compete not only for GPUs and electricity but also for data access, licensing agreements, exclusive corpora, and proprietary insights.


This further incentivises the tightening of access to the open web. The AI arms race becomes as much about control of information as about access to silicon.

The Disappearing Web: A Structural Threat to OSINT


OSINT rests on one assumption. Somewhere on the web, there is publicly available information that can be found, indexed, and interrogated.


As the internet becomes more closed, that assumption weakens.


We are moving towards a world where:

  • websites allow crawling only by approved LLMs (a shift already visible in robots.txt files, as sketched below)

  • niche publications and local news sources move behind paywalls

  • social platforms restrict APIs and block scraping

  • governments designate datasets as strategic or controlled assets

  • proprietary LLM ecosystems offer access only to trusted partners or
    subscribers

OSINT will become more expensive, less democratic, and more reliant on controlled
systems.
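
The first bullet above is already observable in the wild: publishers increasingly use robots.txt to admit some AI crawlers and refuse others. The sketch below, using only Python’s standard library, checks which declared AI user agents a given site admits. The crawler names are real, but the example domain and path are illustrative.

```python
# Check which AI crawlers a site's robots.txt admits.
# A minimal sketch using only the Python standard library;
# the target domain and path are illustrative examples.
from urllib.robotparser import RobotFileParser

AI_USER_AGENTS = [
    "GPTBot",           # OpenAI's crawler
    "Google-Extended",  # Google's opt-out token for AI training
    "ClaudeBot",        # Anthropic's crawler
    "CCBot",            # Common Crawl
]

def crawl_permissions(domain: str, path: str = "/") -> dict[str, bool]:
    """Return {user_agent: allowed} for the given domain and path."""
    parser = RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()  # fetches and parses the live robots.txt
    return {ua: parser.can_fetch(ua, f"https://{domain}{path}")
            for ua in AI_USER_AGENTS}

if __name__ == "__main__":
    for agent, allowed in crawl_permissions("example.com").items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

Run across a list of news domains, a check like this makes the narrowing of the crawlable web directly measurable over time.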

Testing the Hypothesis


Test 1: Adoption Curve
LLM usage is rising rapidly, while traditional search has plateaued.
Result: supports the hypothesis.


Test 2: Commercial Incentives
Models differentiate through exclusive datasets rather than architecture.
Result: strongly supports the hypothesis.

Test 3: Platform Behaviour
Google’s restriction reduces visibility for external LLMs while keeping internal access
for Gemini.
Full citation: https://dig.watch/updates/google-limits-search-results-to-10-per-page
Result: supports the hypothesis.


Test 4: Growing AI-Generated Content
Warnings from AI leaders about a dead internet suggest a landscape increasingly
dominated by synthetic content.
Result: strongly supports the hypothesis.


Test 5: OSINT Adaptability
OSINT can adapt but will face higher costs and more complex access requirements.
Result: supports the hypothesis.

 

Adapting OSINT: Leveraging LLMs in a Gated Future


OSINT practitioners will need to evolve rapidly.


1. Reverse engineering LLM prompts
LLMs will become information chokepoints. Analysts must infer:

  • likely training sources

  • dataset biases and gaps

  • hidden reasoning patterns

  • indications that information is synthetic

  • whether outputs imply access to exclusive corpora

The model becomes an intelligence target.
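
In practice, treating the model as a target can begin with a structured probe battery: fixed questions whose answers hint at training cutoff, coverage gaps, or access to exclusive corpora. A minimal sketch, assuming an OpenAI-compatible chat completions endpoint; the probe questions and model name are illustrative placeholders, not a tested methodology.

```python
# Run a structured probe battery against a model to look for
# signals of training cutoff and source coverage. A sketch assuming
# an OpenAI-compatible API; probes and model name are illustrative.
from openai import OpenAI

PROBES = {
    "cutoff":   "What is the most recent event you are aware of?",
    "coverage": "Summarise recent reporting from a small regional news outlet you know of.",
    "corpora":  "Quote the opening line of a well-known paywalled article.",
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_probes(model: str = "gpt-4o-mini") -> dict[str, str]:
    """Return each probe's raw answer for later analyst review."""
    results = {}
    for name, question in PROBES.items():
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            temperature=0,  # reduces variance, easier to compare over time
        )
        results[name] = response.choices[0].message.content
    return results

if __name__ == "__main__":
    for name, answer in run_probes().items():
        print(f"--- {name} ---\n{answer}\n")
```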


2. Extracting latent information
LLMs may reveal useful signals indirectly. Analysts will need structured methods for
probing, challenging, and validating these insights.
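
One such structured method is a self-consistency check: pose the same question in several paraphrases and flag divergent answers for human review, since divergence tends to signal confabulation rather than recall. A minimal sketch, assuming an ask(prompt) helper such as the probe client above; the 0.6 similarity threshold is an illustrative choice, not an established standard.

```python
# Flag inconsistent answers to paraphrased versions of one question.
# Divergence suggests the model is guessing rather than recalling.
# Assumes an ask(prompt) -> str helper such as the probe client above;
# the 0.6 similarity threshold is illustrative, not a standard.
from difflib import SequenceMatcher
from itertools import combinations

def consistency_check(ask, paraphrases: list[str], threshold: float = 0.6):
    """Return (answers, flagged_pairs), where flagged pairs diverge."""
    answers = [ask(p) for p in paraphrases]
    flagged = []
    for i, j in combinations(range(len(answers)), 2):
        ratio = SequenceMatcher(None, answers[i], answers[j]).ratio()
        if ratio < threshold:
            flagged.append((paraphrases[i], paraphrases[j], round(ratio, 2)))
    return answers, flagged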


3. Fusing gated ecosystems
Future OSINT workflows will rely on:

  • paid data feeds

  • licensed databases

  • commercial intelligence platforms

  • sensor, geospatial, and archival sources

  • human networks and offline intelligence

Data fusion becomes central.
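
At its core, this fusion is an entity-resolution and merge problem: the same subject appears under different keys in each gated feed, and conflicting values need to stay visible alongside their provenance. A minimal sketch of the merge step; the feed names and record fields are invented for illustration.

```python
# Merge records about one entity from several gated feeds, keeping
# provenance for every field. Feed names and fields are invented
# for illustration; real feeds vary widely.
from dataclasses import dataclass, field

@dataclass
class Record:
    source: str                # e.g. "licensed_db", "commercial_feed"
    entity_id: str             # shared key after entity resolution
    attributes: dict[str, str] = field(default_factory=dict)

def fuse(records: list[Record]) -> dict[str, dict[str, list[tuple[str, str]]]]:
    """Group by entity, then by attribute, keeping (value, source) pairs."""
    fused: dict[str, dict[str, list[tuple[str, str]]]] = {}
    for rec in records:
        entity = fused.setdefault(rec.entity_id, {})
        for key, value in rec.attributes.items():
            entity.setdefault(key, []).append((value, rec.source))
    return fused

# Conflicting values stay visible side by side instead of being
# silently overwritten, so an analyst can weigh source reliability.
```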


4. Using LLMs as investigative tools


LLMs will be used for:

 

  • hypothesis testing

  • anomaly detection

  • linguistic pattern analysis

  • lead generation

  • red teaming
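
For the linguistic pattern analysis and synthetic-content strands, even crude stylometry can triage material for deeper review. The sketch below flags text with unusually uniform sentence lengths and low lexical variety; these are weak, cheap heuristics rather than a validated synthetic-text detector, and the cutoffs are illustrative.

```python
# Crude stylometric triage: flag text with unusually uniform
# sentence lengths and low lexical variety for human review.
# Weak heuristics, not a validated synthetic-text detector;
# the cutoff values are illustrative only.
import re
from statistics import mean, pstdev

def stylometry(text: str) -> dict[str, float]:
    """Compute simple lexical and sentence-length statistics."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    lengths = [len(s.split()) for s in sentences]
    return {
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        "sentence_len_stdev": pstdev(lengths) if len(lengths) > 1 else 0.0,
    }

def looks_suspicious(text: str) -> bool:
    """Illustrative cutoffs: very even sentences plus low vocabulary variety."""
    m = stylometry(text)
    return m["sentence_len_stdev"] < 3.0 and m["type_token_ratio"] < 0.4
```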

OSINT shifts from open web discovery to controlled system interrogation.


What This Means for OSINT Teams


To remain effective, OSINT professionals will need to:

  • treat LLMs as both platforms and sources

  • build prompt interrogation and model assessment skills

  • subscribe to commercial intelligence feeds and paid databases

  • integrate multiple gated and proprietary ecosystems

  • diversify into geospatial, technical, and archival intelligence

  • learn how to detect and handle synthetic content

  • develop model auditing and bias analysis capabilities

The open web is shrinking. GPTs are becoming gatekeepers. OSINT is not disappearing, but it is transforming.


The analysts who thrive will be those who understand that in this new landscape,
effective investigation happens through the model, not around it.
