Automate checking and requesting indexing of URLs using Google's Indexing API. This code-focused tutorial covers Python implementation, OAuth 2.0 setup, quota management, and batch processing for enterprise-scale sites.
The Google Indexing API lets you programmatically notify Google about new or updated URLs and check their index status. For agencies and site owners managing thousands of pages, manually verifying indexing is impractical. The API replaces tedious spreadsheet checks with automated scripts that can run on a cron job.
In practice, when you monitor 50,000 backlinks or guest post pages weekly, you need a reliable system. The API returns a UrlNotificationMetadata object whose latestUpdate field carries the notifyTime of the most recent update notification. If there is no record (the API answers 404, or latestUpdate is missing), the URL was never submitted through the API. You can then decide to submit it for indexing.
A common situation we see: a site has 10,000+ pages, but only 40% are indexed. The developer scripts a daily check using the API, filters URLs with no notifyTime, and submits them in batches. This workflow cuts manual effort by hours and surfaces blocked or excluded URLs fast.
1. Set up an OAuth 2.0 service account with the scope https://www.googleapis.com/auth/indexing.
2. Send a GET request to https://indexing.googleapis.com/v3/urlNotifications/metadata?url=ENCODED_URL.
3. Extract latestUpdate and its notifyTime. A 404 response or a missing latestUpdate means no record.
4. Separate the URLs with no notifyTime. These need a submission request.
5. POST to https://indexing.googleapis.com/v3/urlNotifications:publish with type URL_UPDATED. Keep each run small, for example 80 URLs.
6. Track used quota as you go and pause when the daily limit of 200 is reached.
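The request shapes used in the steps above can be sketched as follows. This is a minimal illustration using only the standard library; the helper names `metadata_request_url` and `publish_body` are made up for this example, and the HTTP call itself (including the `Authorization: Bearer` header) is left out.

```python
import json
from urllib.parse import quote

BASE = "https://indexing.googleapis.com/v3/urlNotifications"

def metadata_request_url(page_url: str) -> str:
    """Build the metadata lookup URL with the page URL fully percent-encoded."""
    # safe="" also encodes ":" and "/", so the page URL survives as one query value.
    return f"{BASE}/metadata?url={quote(page_url, safe='')}"

def publish_body(page_url: str, kind: str = "URL_UPDATED") -> str:
    """Build the JSON body for a urlNotifications:publish call."""
    if kind not in ("URL_UPDATED", "URL_DELETED"):
        raise ValueError(f"unsupported notification type: {kind}")
    return json.dumps({"url": page_url, "type": kind})
```

Encoding the whole page URL matters most when it carries its own query string, which would otherwise be mistaken for extra parameters of the metadata request.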
| Endpoint | Purpose | Method | Response Fields | Failure Mode |
|---|---|---|---|---|
| /urlNotifications/metadata | Check index status | GET | latestUpdate, including its notifyTime timestamp | 404 means URL never submitted. 403 means auth error or quota exceeded. |
| /urlNotifications:publish | Submit URL for indexing | POST | urlNotificationMetadata object with timestamps | 400 Bad Request if URL invalid or type wrong. 429 Too Many Requests if quota exceeded or rate limited. |
| /urlNotifications:batch | Not available for Indexing API | N/A | N/A | This path does not exist. Send individual requests with async concurrency and delays, or group up to 100 calls in one multipart request to https://indexing.googleapis.com/batch. Batched calls still count individually against quota. |
| OAuth 2.0 token endpoint | Generate access token | POST | access_token with expiry in seconds | Invalid grant if service account key is expired or misconfigured. Check sub field for impersonation. |
Assume you have a CSV with 1,000 backlink URLs to check. Your Indexing API daily quota is 200 requests. You cannot check all 1,000 in one day.
Step 1: Filter URLs to only those from domains with high priority (e.g., DA > 30). This reduces the list to 550 URLs.
Step 2: Day 1: Run the check script for the first 200 URLs. Parse responses: 120 have notifyTime (already submitted), 80 have no notifyTime. Submit those 80 via publish. That uses 80 of the day's 200 publish requests.
Step 3: Day 2: Check the next 200 URLs. 150 have notifyTime, 50 are new. Submit the 50. That leaves 150 publish slots unused for the day.
Step 4: Day 3: Check the remaining 150 URLs. 100 have timestamps, 50 are new. Submit the 50. All 550 prioritized URLs are now checked across 3 days, with 180 publish requests used in total.
Edge case: 15 URLs returned HTTP 403 because the service account lacks permission for those sites. Those are logged separately for manual investigation.
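The day-by-day split in this example is plain chunking. A minimal sketch, assuming the prioritized URLs are already in a list; `plan_days` is a hypothetical helper name, and the default of 200 is the daily budget used in this walkthrough.

```python
def plan_days(urls, daily_limit=200):
    """Split a URL list into consecutive per-day batches of at most daily_limit."""
    if daily_limit < 1:
        raise ValueError("daily_limit must be at least 1")
    return [urls[i:i + daily_limit] for i in range(0, len(urls), daily_limit)]

# Example: 550 placeholder URLs produce three daily batches.
example_urls = [f"https://example.com/page-{n}" for n in range(550)]
example_plan = plan_days(example_urls)
```

Here `example_plan` holds batches of 200, 200, and 150 URLs, one per day.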
You need a Google Cloud project with the Indexing API enabled. Create a service account and download the JSON key. Grant the service account the 'Owner' role on the Search Console property for each site you want to check. Without this, the API returns 403 errors.
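If you impersonate a user through domain-wide delegation, the sub field in the JWT claim must be the verified owner's email. We often see developers forget the sub field in that setup, leading to hard-to-diagnose auth failures. If you are not impersonating anyone, double-check that the service account itself is listed as an owner of the Search Console property.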
For Python, use the google-auth and requests libraries. The access token expires after 3600 seconds. Refresh it programmatically before each batch. Store the token in memory, not on disk, to avoid stale credentials.
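One way to keep the token in memory and refresh it shortly before expiry is a small cache around whatever function actually obtains the token. The sketch below is library-agnostic: `TokenCache` and its `fetch` callable are hypothetical names. With google-auth, `fetch` would typically wrap `service_account.Credentials.from_service_account_file(...)` followed by `credentials.refresh(Request())`, but that wiring is an assumption and is not shown here.

```python
import time
from typing import Callable, Optional, Tuple

class TokenCache:
    """Hold an access token in memory and refetch it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # returns (token, lifetime_in_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Calling `cache.get()` before each batch gives the same effect as refreshing before each batch, without requesting a new token when the current one still has time left.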
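This guide works with a daily quota of 200 publish requests and 200 metadata requests. These are separate quotas, so you can check 200 URLs and submit 200 URLs per day. Google applies Indexing API quota per Cloud project rather than per service account, and default limits can change, so confirm the current values on the Quotas page of your project in Google Cloud Console.
For large sites, either distribute checks across days or use several Cloud projects, each with its own service account. Splitting metadata checks and publish calls across two accounts keeps credentials and logs separate, but it only adds quota if the accounts live in different projects. The simplest strategy is to stagger checks: Day 1 check URLs 1-200, Day 2 check 201-400, etc.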
We often see developers hit the 429 rate limit because they send requests too fast. Add a delay of at least 1 second between requests. Use exponential backoff on 429 responses. Log each retry to avoid silent failures.
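A retry loop with exponential backoff might look like the sketch below. It assumes a caller-supplied `send` callable that performs one request and returns `(status_code, body)`; that callable, the function name `send_with_backoff`, and the default of 3 retries are illustrative choices, not part of any library.

```python
import time

def send_with_backoff(send, max_retries=3, base_delay=1.0,
                      sleep=time.sleep, log=print):
    """Call send() and retry on HTTP 429 or 5xx, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429 and status < 500:
            return status, body          # success or a non-retryable error
        if attempt == max_retries:
            break                        # out of retries, return the last response
        delay = base_delay * 2 ** attempt
        log(f"retry {attempt + 1}/{max_retries} after HTTP {status}, sleeping {delay:.0f}s")
        sleep(delay)
    return status, body
```

Passing `sleep` and `log` as parameters keeps the function easy to test and makes every retry visible in whatever logger the script already uses.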
An edge case: the metadata endpoint has no special error for blocked URLs (e.g., disallowed by robots.txt or requiring authentication). A blocked URL that was never submitted returns the same 404 as any other unsubmitted URL. You must cross-reference with your crawl logs to confirm blockage.
Use one service account per Google Cloud project. Add the service account email as an owner of each site's Search Console property. Loop through sites in your script, switching the base URL and auth context. Monitor total quota across all sites to avoid exceeding 200 daily requests per account. For 50+ sites, consider multiple projects with separate quotas.
The API does not directly show whether a backlink is indexed. Instead, check the linking page (the page that contains the backlink) with the metadata endpoint. A latestUpdate record means a notification for that page has been recorded, which suggests but does not prove that the page is indexed. Pair this with a crawl of the linking page to confirm the link exists. The API only tells you about the page, not the link itself.
Write a Python script that reads a CSV with one column 'url'. For each row, call the metadata endpoint. Append results as new columns: notifyTime, latestUpdate, status. Use pandas for DataFrame operations. Handle 429 errors with time.sleep(60) and retry. Output a new CSV with index status. This script typically runs under 5 minutes for 200 URLs with 1-second delays.
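A dependency-free sketch of that script's core loop is shown below. It uses the standard `csv` module instead of pandas so it runs anywhere, and it takes the actual API call as a `check` callable; `annotate_csv`, `check`, and the demo data are assumptions made for illustration.

```python
import csv
import io

RESULT_COLUMNS = ["notifyTime", "latestUpdate", "status"]

def annotate_csv(src, dst, check):
    """Copy rows from src to dst, appending the result columns for each URL.

    check(url) must return a dict that may contain any of RESULT_COLUMNS.
    """
    reader = csv.DictReader(src)
    if not reader.fieldnames or "url" not in reader.fieldnames:
        raise ValueError("input CSV needs a 'url' column")
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + RESULT_COLUMNS)
    writer.writeheader()
    for row in reader:
        result = check(row["url"])
        for col in RESULT_COLUMNS:
            row[col] = result.get(col, "")
        writer.writerow(row)

# Demo with in-memory files and a stub checker (no network access).
demo_in = io.StringIO("url\nhttps://example.com/a\n")
demo_out = io.StringIO()
annotate_csv(demo_in, demo_out, lambda url: {"status": "no_record"})
```

In a real run, `src` and `dst` would be files opened with `newline=""`, and `check` would call the metadata endpoint with the delay and retry handling described above.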
- 403: service account lacks Search Console owner role. Fix: verify ownership and add the service account email.
- 400: invalid URL (must start with http or https). Fix: URL-encode and validate.
- 429: rate limit exceeded. Fix: add delay or reduce concurrency.
- 404: URL never submitted. Not an error, just means no record.
- 500: transient Google error. Retry with exponential backoff up to 3 times.
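The status codes above can be turned into a small dispatch function so the main loop stays readable. The action labels returned here are arbitrary names chosen for this sketch.

```python
def classify(status: int) -> str:
    """Map an HTTP status code to the follow-up action described above."""
    if 200 <= status < 300:
        return "ok"
    if status == 404:
        return "no_record"       # not an error: the URL was never submitted
    if status == 429 or status >= 500:
        return "retry"           # back off, then try again
    if status == 403:
        return "fix_ownership"   # service account is not an owner
    if status == 400:
        return "fix_url"         # validate and URL-encode the input
    return "investigate"
```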
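The quota used throughout this guide is 200 metadata requests and 200 publish requests per day. These are separate counters. Metadata requests check status; publish requests submit. You can check 200 URLs and submit 200 URLs daily. Google enforces quota per Cloud project, so extra service accounts in the same project share it. Quota resets at midnight Pacific Time. Track usage with a counter in your script or on the Quotas page in Google Cloud Console rather than relying on a response header.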
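A local counter is a simple way to stay inside a daily budget. The sketch below assumes the 200-per-day figure from this guide and a caller-supplied `today` function; to match a midnight Pacific reset, `today` would need to return the current date in that time zone, which is left to the caller. `DailyBudget` is a hypothetical name.

```python
from datetime import date

class DailyBudget:
    """Count requests against a per-day limit and reset when the date changes."""

    def __init__(self, limit=200, today=date.today):
        self.limit = limit
        self._today = today
        self._day = today()
        self.used = 0

    def _roll_over(self):
        current = self._today()
        if current != self._day:
            self._day = current
            self.used = 0

    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self.used

    def try_spend(self, n=1) -> bool:
        """Reserve n requests; return False (and reserve nothing) if over budget."""
        self._roll_over()
        if self.used + n > self.limit:
            return False
        self.used += n
        return True
```

Keeping one `DailyBudget` for metadata calls and another for publish calls mirrors the two separate counters.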
Search Console API gives aggregate index coverage data (indexed, excluded, errors) per property. Indexing API gives per-URL notification and status. Use Search Console for dashboard-level checks. Use Indexing API for programmatic per-URL workflows. They complement each other. The Indexing API is faster for individual URL checks but has lower quota.
Schedule a cron job daily that runs your Python script. Store the list of URLs to check in a database table with a 'last_checked' column. Each run picks up URLs that have not been checked in the last 7 days. Respect daily quota by limiting batch size to 200. Log results to a separate table for auditing. Use a lock file to prevent overlapping runs.
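The "pick up URLs not checked in the last 7 days" step could be a single query. This sketch uses SQLite for illustration and assumes a table named `urls` with a `url` column and a `last_checked` column holding UTC ISO 8601 strings; the table name, schema, and function name are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def due_urls(conn, now, max_age_days=7, limit=200):
    """Return up to `limit` URLs never checked or last checked before the cutoff."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT url FROM urls "
        "WHERE last_checked IS NULL OR last_checked < ? "
        # SQLite sorts NULL first in ascending order, so unchecked URLs lead.
        "ORDER BY last_checked, url LIMIT ?",
        (cutoff, limit),
    ).fetchall()
    return [row[0] for row in rows]

def mark_checked(conn, url, now):
    """Record that a URL was checked at `now`."""
    conn.execute(
        "UPDATE urls SET last_checked = ? WHERE url = ?",
        (now.isoformat(timespec="seconds"), url),
    )
    conn.commit()
```

Comparing ISO strings works here only because every timestamp is written in the same format and time zone, so it is worth passing a UTC `now` consistently.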
Alternatives: 1) Search Console API URL Inspection endpoint (official, subject to its own per-property quota). 2) Google Custom Search API (returns indexed status indirectly via search results). 3) Scraping Google search results with 'site:URL' (against ToS). Of these, only the URL Inspection endpoint is an official way to check a single URL. The Indexing API is designed for notifications, not bulk checking, but works for small batches.
Deduplicate your URL list before processing. The API does not reject duplicates, but you waste quota. Use a Python set or pandas drop_duplicates(). Also check for URL variants: trailing slash vs no slash, http vs https, www vs non-www. Normalize all URLs to a canonical form (e.g., lowercase host, https, no trailing slash) before checking, and leave the path's case alone because paths can be case-sensitive. This reduces false negatives.
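A minimal normalization and deduplication sketch using only the standard library. The canonical form chosen here (https, lowercase host, no trailing slash, optional removal of `www.`) follows the example in the text and should be adjusted to whatever form the site really serves. It assumes absolute URLs; the function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str, strip_www: bool = False) -> str:
    """Return a canonical form: https scheme, lowercase host, no trailing slash."""
    parts = urlsplit(url.strip())
    if not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    host = parts.netloc.lower()
    if strip_www and host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")   # path case is preserved on purpose
    return urlunsplit(("https", host, path, parts.query, ""))

def dedupe(urls, strip_www: bool = False):
    """Normalize every URL and drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(normalize(u, strip_www) for u in urls))
```

Using `dict.fromkeys` instead of a set keeps the original ordering, which helps when the list is already sorted by priority.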
The metadata response nests notifyTime inside latestUpdate: latestUpdate describes the most recent URL_UPDATED notification sent through the API, and its notifyTime is the time of that notification. A latestRemove object plays the same role for URL_DELETED notifications. If there is no record, the URL was never submitted through the API. The record reflects notifications only, so treat latestUpdate as evidence that the page was submitted and confirm actual indexing separately.
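Parsing the metadata response might look like the sketch below. It assumes the documented response shape in which `notifyTime` and `type` sit inside a `latestUpdate` object, and it treats a 404 as "no record"; the function name and the returned keys are choices made for this example.

```python
import json

def summarize_metadata(status: int, body: str) -> dict:
    """Reduce a metadata response to the fields the workflow needs."""
    if status == 404:
        # No notification on record for this URL.
        return {"has_record": False, "notifyTime": None, "type": None}
    if status != 200:
        raise ValueError(f"unexpected HTTP status {status}")
    latest = json.loads(body).get("latestUpdate") or {}
    return {
        "has_record": bool(latest),
        "notifyTime": latest.get("notifyTime"),
        "type": latest.get("type"),
    }
```

URLs where `has_record` is false are the ones to queue for a publish call.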