Using GDELT for Geopolitical Event Detection

What GDELT Is

The Global Database of Events, Language, and Tone -- GDELT -- is an open research project supported by Google Jigsaw that monitors broadcast, print, and online news media in over 150 languages, translates and processes each article through a battery of natural language processing algorithms, and publishes the resulting structured data every fifteen minutes. It has been running continuously since February 2015 in its current form (GDELT 2.0), with a back-catalog of events coded from digitized archives stretching to January 1, 1979.

GDELT is not a news aggregator in the way Google News is. It does not display headlines or article text. Instead, it reduces each article to a set of structured records: who did what to whom, where, when, and how the media framed the event. An article about Iran seizing a tanker in the Strait of Hormuz becomes a coded event record with Actor1 (Iran/government/military), Actor2 (the vessel/flag state), an event type (seizure of property, CAMEO code 171), a geographic location (latitude/longitude in the Strait), a timestamp, and a tone score reflecting the emotional tenor of the coverage.

The underlying coding taxonomy is called CAMEO -- Conflict and Mediation Event Observations. It defines over 300 event types organized in a hierarchy from cooperative actions (diplomatic meetings, trade agreements, humanitarian aid) to conflictive actions (threats, sanctions, military force). Each event type has a numeric code and a Goldstein Scale value that measures its theoretical impact on country-to-country stability, ranging from -10 (most conflictive) to +10 (most cooperative). An event coded as "military attack" scores -10.0 on the Goldstein Scale. A "formal diplomatic meeting" scores +5.4. The scale is not perfect -- no single number can capture the complexity of a geopolitical event -- but it provides a consistent, machine-readable measure of conflict intensity that is available for every event in the database.

Why GDELT Matters for Shipping Analysis

Maritime shipping is a geopolitical business. Freight rates, insurance premiums, and routing decisions respond to events -- military confrontations in chokepoints, sanctions announcements, port closures, diplomatic breakthroughs -- that first appear in news media before they show up in vessel tracking data or commodity prices. GDELT captures these events at the point of first reporting, often hours before markets react.

Consider the sequence that plays out during a chokepoint disruption. A Houthi missile strike on a commercial vessel in the Red Sea is reported by local and international media within minutes. GDELT processes those articles in its next 15-minute cycle, coding the event as a military attack with a Goldstein score of -10.0 and geolocating it to the Bab el-Mandeb strait. Within the same cycle, the tone scores on all Red Sea-related coverage shift negative. The volume of coded events in the region spikes. All of this is queryable before the vessel has even diverted, before insurance underwriters have repriced war-risk premiums, and before container lines have announced surcharges.

This is not to say that GDELT alone can predict freight rate movements. It cannot. But it provides a structured, machine-readable signal of geopolitical disruption that complements vessel tracking data (which shows physical effects after they happen) and commodity price data (which reflects market pricing after participants have digested the information). GDELT sits at the front of the information chain.

Tone Scores as Sentiment Indicators

Every article processed by GDELT receives a tone score, computed from the full text of the article using a sentiment analysis algorithm. The score ranges from -100 (extremely negative) to +100 (extremely positive), with most news articles falling between -10 and +10. The average tone of global news coverage is slightly negative -- roughly -1.5 to -2.0 -- because conflict, disaster, and crisis receive disproportionate media attention.

For shipping analysis, absolute tone scores are less useful than changes in tone over time for a specific geography or topic. The baseline tone of news coverage about the Strait of Hormuz is always negative -- it is a region defined by military tension, sanctions, and strategic competition. What matters is whether the tone is becoming more negative, how quickly, and whether the rate of change exceeds historical norms.

Risk and Route tracks rolling 7-day average tone scores for each of the five major chokepoints. When the 7-day average drops more than two standard deviations below the 90-day mean for a given chokepoint, it triggers an alert. This approach filters out background noise -- the constant low-grade negativity of coverage about contested waterways -- and highlights statistically significant shifts that correlate with actual operational disruptions.
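The alert rule above can be sketched in a few lines of Python. This is a minimal illustration, not Risk and Route's actual implementation: the function name and parameters are hypothetical, the input is assumed to be one average tone value per day for a single chokepoint, and the standard deviation is assumed to be taken over the 90 daily values that precede the latest 7-day window.

```python
from statistics import mean, stdev

def tone_alert(daily_tone, short=7, long=90, n_sigma=2.0):
    """Return True when the latest `short`-day average tone is more than
    `n_sigma` standard deviations below the mean of the preceding `long` days.

    daily_tone: list of daily average tone values, oldest first (assumed input).
    """
    if len(daily_tone) < short + long:
        return False  # not enough history to form a baseline
    baseline = daily_tone[-(short + long):-short]  # the 90 days before the window
    recent = daily_tone[-short:]                   # the latest 7 days
    threshold = mean(baseline) - n_sigma * stdev(baseline)
    return mean(recent) < threshold

# Hypothetical data: 90 days of routine negativity, then a sharp drop.
calm = [-2.0, -3.0] * 45
print(tone_alert(calm + [-6.0] * 7))  # sharp drop relative to the baseline
print(tone_alert(calm + [-2.8] * 7))  # within normal variation
```

Keeping the latest week out of the baseline is a design choice made here so that a developing crisis does not inflate its own reference level.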

Tone Score Interpretation

● +3 to +10: Strongly positive coverage. Diplomatic breakthroughs, trade agreements, de-escalation signals. Rare for chokepoint regions.
● 0 to +3: Neutral to mildly positive. Normal operations, routine diplomatic contacts. The baseline for most non-crisis periods.
● -3 to 0: Mildly negative. The normal background level for most geopolitically active regions. Persistent but not alarming.
● Below -5: Significantly negative. Active military operations, crisis escalation, direct threats to commercial shipping. Historically correlated with freight rate spikes within 5 to 14 days.

How Risk and Route Uses GDELT

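Risk and Route queries GDELT through two primary interfaces: the public GDELT dataset on Google BigQuery (which runs SQL queries against the full event database) and the GDELT DOC API (which searches article metadata and tone scores). The DOC API is free and requires no authentication. BigQuery requires a Google Cloud project and credentials, and queries are billed by the volume of data scanned, although a free monthly query allowance covers light use.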

The platform monitors three categories of GDELT data for shipping-relevant signals.

1. Event Volume by Chokepoint

The first signal is simply the count of coded events geolocated within a defined bounding box around each chokepoint, filtered by conflictive event types (CAMEO root codes 13 through 20, which cover threats, protests, military posturing, coercion, and armed force). A sustained increase in conflictive event volume -- say, three or more consecutive days above the 90th percentile of the trailing 180-day distribution -- suggests escalating tension in the region. This metric detected the Houthi escalation in November 2023 roughly 48 hours before the first confirmed attack on commercial shipping.
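As a rough sketch of the sustained-volume rule, the Python below checks whether each of the last three days exceeds the 90th percentile of the 180 days before it. The function names are hypothetical, the input is assumed to be a list of daily conflictive-event counts for one bounding box, and the linear-interpolation percentile is one reasonable choice among several.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def sustained_spike(daily_counts, lookback=180, pct=90.0, min_days=3):
    """True if each of the last `min_days` counts exceeds the `pct`-th
    percentile of the `lookback` days immediately preceding it."""
    if len(daily_counts) < lookback + min_days:
        return False  # not enough history
    for offset in range(min_days):
        idx = len(daily_counts) - 1 - offset
        history = daily_counts[idx - lookback:idx]  # trailing window, excludes day idx
        if daily_counts[idx] <= percentile(history, pct):
            return False
    return True

# Hypothetical data: 180 quiet days, then three busy days.
quiet = [i % 10 for i in range(180)]
print(sustained_spike(quiet + [20, 20, 20]))
print(sustained_spike(quiet + [20, 20, 5]))
```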

2. Actor-Pair Monitoring

The second signal tracks specific actor pairs -- Iran and the United States, Iran and Israel, China and Taiwan, Russia and Ukraine -- and monitors both the volume and the Goldstein Scale distribution of events involving those pairs. When the proportion of events scoring below -5.0 on the Goldstein Scale exceeds 40% of all events for a given actor pair in a rolling 14-day window, it signals a deterioration in the bilateral relationship that is likely to have shipping implications.
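The actor-pair rule reduces to a proportion over a rolling window. The sketch below assumes events for a single actor pair arrive as (date, Goldstein score) tuples; the function names and the input shape are illustrative assumptions, not part of GDELT or of the platform's code.

```python
from datetime import date, timedelta

def hostile_share(events, as_of, window_days=14, cutoff=-5.0):
    """Fraction of events in the trailing window scoring below `cutoff`.

    events: iterable of (event_date, goldstein_score) for one actor pair.
    """
    start = as_of - timedelta(days=window_days - 1)
    scores = [g for d, g in events if start <= d <= as_of]
    if not scores:
        return 0.0  # no events in the window, so nothing to flag
    return sum(1 for g in scores if g < cutoff) / len(scores)

def pair_deteriorating(events, as_of, threshold=0.40):
    """True when more than 40% of the window's events fall below -5.0."""
    return hostile_share(events, as_of) > threshold

# Hypothetical events: five hostile and five cooperative in the window.
as_of = date(2024, 1, 14)
sample = [(date(2024, 1, d), -10.0) for d in range(1, 6)]
sample += [(date(2024, 1, d), 1.0) for d in range(6, 11)]
print(hostile_share(sample, as_of), pair_deteriorating(sample, as_of))
```

In practice a minimum event count per window would probably be worth adding, since a proportion computed from a handful of events is unstable.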

3. Tone Velocity

The third signal measures the rate of change of average tone across all coverage mentioning specific keywords: "Strait of Hormuz," "Suez Canal," "Bab el-Mandeb," "Panama Canal," "Malacca Strait," "shipping attack," "tanker seizure," "maritime blockade." When tone velocity -- the first derivative of the 7-day rolling average -- exceeds a threshold calibrated against historical disruption events, the system flags a potential emerging risk.
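Tone velocity can be approximated as the day-over-day difference of the 7-day rolling average. The sketch below is illustrative only: the function names are hypothetical, and the threshold of -0.5 tone points per day is a placeholder, since the calibrated value is not given here.

```python
from statistics import mean

def rolling_average(values, window=7):
    """Trailing rolling mean; the output starts once a full window is available."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]

def tone_velocity(daily_tone, window=7):
    """Discrete first derivative of the rolling average (change per day)."""
    smooth = rolling_average(daily_tone, window)
    return [b - a for a, b in zip(smooth, smooth[1:])]

def velocity_flag(daily_tone, threshold=-0.5, window=7):
    """True when the latest velocity is more negative than `threshold`.
    The threshold here is a placeholder, not a calibrated figure."""
    velocity = tone_velocity(daily_tone, window)
    return bool(velocity) and velocity[-1] < threshold

# Hypothetical series: a stable week followed by one sharply negative day.
series = [-2.0] * 7 + [-9.0]
print(tone_velocity(series), velocity_flag(series))
```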

Example: GDELT BigQuery Query

-- SQLDATE is stored as an integer in YYYYMMDD form, so parse it into a DATE.
SELECT PARSE_DATE('%Y%m%d', CAST(SQLDATE AS STRING)) AS event_date,
  Actor1CountryCode, Actor2CountryCode,
  AVG(GoldsteinScale) AS avg_goldstein,
  AVG(AvgTone) AS avg_tone,
  COUNT(*) AS event_count
FROM `gdelt-bq.gdeltv2.events`
-- Bounding box around the Strait of Hormuz.
WHERE ActionGeo_Lat BETWEEN 26.0 AND 27.5
  AND ActionGeo_Long BETWEEN 55.5 AND 57.0
  AND QuadClass IN (3, 4)  -- 3 = verbal conflict, 4 = material conflict
  AND Year >= 2023
GROUP BY event_date, Actor1CountryCode, Actor2CountryCode
ORDER BY event_date DESC

Retrieves conflictive events (QuadClass 3 = verbal conflict, 4 = material conflict) geolocated within the Strait of Hormuz bounding box, grouped by day and actor pair. The avg_goldstein and avg_tone columns provide daily conflict intensity and media sentiment.

The GDELT DOC API for Keyword Monitoring

The GDELT DOC API is simpler and faster than BigQuery for real-time monitoring. It searches article metadata and returns results with tone scores, source country, language, and article URLs. It requires no API key and returns JSON directly.

For shipping analysis, the DOC API is most useful for tracking breaking events -- a tanker seizure, a missile strike, a sanctions announcement -- where you want to know the volume, geographic spread, and sentiment of coverage within minutes of the first reports. The API supports keyword queries with Boolean logic (terms are ANDed by default, OR groups go inside parentheses, and a leading minus sign excludes a term), date filtering, source country filtering, and tone filtering.

Example: GDELT DOC API Call

GET https://api.gdeltproject.org/api/v2/doc/doc?query=("strait of hormuz" OR "tanker seizure")&mode=ArtList&maxrecords=250&format=json&timespan=7d&sort=DateDesc

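Returns up to 250 of the most recent articles from the past 7 days mentioning "strait of hormuz" or "tanker seizure," sorted by date descending, with source metadata (domain, language, source country) for each article. Tone for the same query and window is available from the API's tone-oriented modes, such as mode=TimelineTone.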
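When the call is built programmatically, the query string needs URL encoding, because the phrases contain spaces and quotation marks. The following Python sketch builds such a URL with the standard library only. The helper name and its defaults are assumptions made for illustration; the parameter names are the ones used in the example call above.

```python
from urllib.parse import urlencode

DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

def build_doc_url(phrases, timespan="7d", max_records=250):
    """Build a DOC API article-list URL for one or more exact phrases.

    Multiple phrases are joined with OR and wrapped in parentheses.
    """
    quoted = " OR ".join(f'"{p}"' for p in phrases)
    query = f"({quoted})" if len(phrases) > 1 else quoted
    params = {
        "query": query,
        "mode": "ArtList",
        "maxrecords": max_records,
        "format": "json",
        "timespan": timespan,
        "sort": "DateDesc",
    }
    return f"{DOC_API}?{urlencode(params)}"  # urlencode percent-escapes the query

print(build_doc_url(["strait of hormuz", "tanker seizure"]))
```

The resulting URL can be fetched with any HTTP client. The sketch stops short of making the request so that it runs without network access.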

Combining GDELT with Other Data Sources

GDELT is most powerful when joined to other data streams. On its own, it tells you that media coverage of a chokepoint is becoming more negative. Combined with AIS vessel tracking data, it tells you whether that coverage corresponds to actual changes in vessel behavior -- diversions, slow-steaming, anchor queues. Combined with commodity price data from FRED and freight rate indices, it tells you whether the geopolitical signal has already been priced in or whether the market is still digesting the information.

Risk and Route's signal pipeline runs in sequence: GDELT event detection first (within 15 minutes of media reporting), followed by AIS pattern confirmation (within hours), followed by price correlation analysis (within days). This layered approach reduces false positives. A GDELT tone spike in the Strait of Hormuz that is not confirmed by AIS anomalies within 24 hours is likely media noise rather than an operational disruption. A GDELT spike that is confirmed by AIS diversions but not yet reflected in freight rates or commodity prices represents a potential leading signal for price movement.
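The layered logic described above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, the label strings, and the idea of passing a single AIS anomaly timestamp and a boolean price check are all hypothetical simplifications of what a real pipeline would carry.

```python
from datetime import datetime, timedelta

def classify_signal(spike_at, ais_anomaly_at, priced_in, now,
                    confirm_window=timedelta(hours=24)):
    """Classify a GDELT tone spike using AIS and price confirmation.

    spike_at:       when the GDELT spike was detected
    ais_anomaly_at: when an AIS anomaly was first seen, or None
    priced_in:      whether freight rates or commodity prices have already moved
    now:            the time of evaluation
    """
    deadline = spike_at + confirm_window
    confirmed = ais_anomaly_at is not None and spike_at <= ais_anomaly_at <= deadline
    if not confirmed:
        # Still inside the window: wait. Past it: treat as probable noise.
        return "pending" if now < deadline else "likely_media_noise"
    return "confirmed_priced_in" if priced_in else "potential_leading_signal"

# Hypothetical example: AIS diversions appear six hours after the spike.
t0 = datetime(2024, 1, 1, 12, 0)
print(classify_signal(t0, t0 + timedelta(hours=6), False, t0 + timedelta(hours=7)))
```

The explicit "pending" state is an addition made here so that a spike is not dismissed as noise before the 24-hour confirmation window has closed.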

Limitations and Caveats

GDELT has real limitations that users should understand before relying on it.

Media bias is baked in. GDELT processes whatever the world's media publishes. If a region receives disproportionate coverage (the Middle East, for instance), events there will generate more data points than equivalent events in underreported regions. A minor border skirmish in the Persian Gulf generates hundreds of GDELT records; a similar incident in the Strait of Malacca might generate a dozen. This is a media attention bias, not a risk bias, and treating the two as equivalent leads to miscalibrated alerts.

Tone is not truth. The tone score reflects how the media frames an event, not the severity of the event itself. A dramatic but operationally minor incident can generate extremely negative tone scores if coverage is sensationalized. Conversely, a serious but slowly developing threat -- such as the gradual increase in Houthi attack capability through late 2023 -- can produce moderate tone scores until a spectacular attack crystallizes media attention.

Event coding is imperfect. GDELT's automated coding system occasionally miscodes events -- assigning the wrong CAMEO code, geolocating an event to the wrong location, or misidentifying actors. The error rate is estimated at roughly 10-15% for individual events. For aggregate analysis (rolling averages, volume trends, tone distributions), individual coding errors wash out. For individual event detection, manual verification against source articles is necessary.

Coverage is English-weighted. Although GDELT processes 152 languages, English-language sources are overrepresented in the database because they are the most digitally available. Events in regions with limited English-language media coverage may appear with a lag or at lower volume. Machine translation of non-English sources adds a further source of tone measurement error.

No causal claims. GDELT tells you what the media is saying about a region. It does not tell you what is actually happening on the ground. A correlation between GDELT tone scores and freight rate movements is an empirical pattern, not evidence that the events GDELT records move prices directly. Any causal link runs through market participants who read the same news that GDELT processes -- GDELT is measuring the information environment, not the physical reality.

Key Takeaways

1. GDELT is the world's largest open geopolitical event database. It processes global news media every 15 minutes, codes events into structured records using the CAMEO taxonomy, and publishes tone scores reflecting media sentiment. The data is open: the DOC API requires no API key, while BigQuery access requires a Google Cloud project.
2. Tone velocity matters more than absolute tone. Chokepoint regions always have negative media coverage. What signals a disruption is the rate of change -- a rapid drop in rolling average tone that exceeds historical norms, not the baseline negativity.
3. Event volume spikes precede operational disruptions. A sustained increase in conflictive events geolocated near a chokepoint -- filtered by CAMEO codes for threats and military action -- has historically led AIS-confirmed shipping diversions by 24 to 72 hours.
4. GDELT is a leading indicator, not a standalone signal. It detects disruptions at the point of media reporting, before vessel tracking or market data reflect the event. But it must be confirmed against AIS patterns and commodity prices to filter out media noise.
5. Media bias, coding errors, and English-language overweighting are real limitations. GDELT measures the information environment, not physical reality. Aggregate analysis (rolling averages, volume trends) is more reliable than individual event records.
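6. The BigQuery interface and DOC API are complementary. BigQuery supports complex historical analysis with SQL. The DOC API supports real-time keyword monitoring with JSON output. The DOC API is free and keyless; BigQuery requires a Google Cloud project and bills by data scanned beyond its free monthly allowance.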