Building a Custom API for Bulk Domain Analysis and Tracking
Understanding your domain portfolio’s performance at scale is critical for SEO, competitive intelligence, and digital asset management. This step-by-step guide shows you how to build a custom API for bulk domain analysis and tracking using modern web technologies. By the end, you will have a robust system for automating domain data retrieval, health checks, and historical tracking.
1. Define Your API Requirements and Data Sources
Start by listing the metrics you need: domain registration status, SSL expiration, DNS records, HTTP response codes, page speed, and backlink counts. For each metric, identify a free or paid data source (e.g., WhoisXML API for WHOIS data, Google PageSpeed Insights API for performance, and a DNS library such as dns2 for DNS queries). Your custom API will act as an aggregator over these sources.
Key considerations: rate limits, authentication methods, and data freshness. Plan to cache results to avoid redundant calls.
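One way to avoid redundant calls is a small time-to-live cache in front of each external lookup. The sketch below is a minimal in-process version; the class name TtlCache and its methods are illustrative, and a shared store such as Redis is the more likely choice once several workers are involved.

```javascript
// Minimal in-process TTL cache (illustrative; all names are hypothetical).
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which makes expiry easy to test
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale data lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or call loader() once and cache its result.
  async getOrLoad(key, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await loader();
    this.set(key, value);
    return value;
  }
}
```

Keying the cache by data source and domain (for example "whois:example.com") lets each source have its own freshness window.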
2. Set Up the Development Environment
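Choose a backend framework. This guide uses Node.js with Express, which suits I/O-heavy work because requests to external services run concurrently without blocking. Initialize your project and install Express along with these dependencies: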
- axios – for HTTP requests to external APIs
- node-whois – raw WHOIS lookup
- dns2 – DNS resolution
- morgan – logging
- node-cron – scheduled jobs for periodic tracking
Set up environment variables for API keys and database connection strings (use PostgreSQL or MongoDB for storing historical domain data).
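A small loader that fails fast on missing settings keeps configuration problems out of request handling. This is a sketch only: the variable names (DATABASE_URL, WHOIS_API_KEY, PAGESPEED_API_KEY, PORT) are assumptions, not names required by any of the libraries above.

```javascript
// Read required settings from the environment and fail fast when one is missing.
// The variable names below are hypothetical examples.
const REQUIRED_VARS = ["DATABASE_URL", "WHOIS_API_KEY", "PAGESPEED_API_KEY"];

function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL,
    whoisApiKey: env.WHOIS_API_KEY,
    pageSpeedApiKey: env.PAGESPEED_API_KEY,
    port: Number(env.PORT) || 3000, // fall back to 3000 when PORT is unset
  };
}
```

Calling loadConfig() once at startup means a missing key stops the process immediately instead of surfacing as a failed lookup hours later.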
3. Create the Core Endpoint: Bulk Domain Input
Build a POST endpoint /api/domains/bulk that accepts a JSON array of domain names (e.g., ["example.com", "test.org"]). Validate the input by removing duplicates and rejecting malformed strings. Return a job ID immediately so the client can poll for results while processing continues in the background.
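The validation and job-ID step might look like the following. It is a sketch under stated assumptions: the function names and response shape are made up for illustration, and the regular expression is a deliberately simple hostname check that does not handle internationalized domain names.

```javascript
const { randomUUID } = require("node:crypto");

// Simple hostname check: dot-separated labels of letters, digits and hyphens,
// ending in an alphabetic TLD. It does not handle IDNs.
const DOMAIN_RE = /^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/;

function validateDomains(input) {
  if (!Array.isArray(input)) {
    throw new TypeError("Request body must be a JSON array of domain names");
  }
  const valid = new Set(); // a Set drops duplicates while keeping insertion order
  const rejected = [];
  for (const item of input) {
    const name = typeof item === "string" ? item.trim().toLowerCase() : "";
    if (DOMAIN_RE.test(name)) valid.add(name);
    else rejected.push(item);
  }
  return { valid: [...valid], rejected };
}

// What the POST handler could do: validate, create a job ID, respond at once.
function createBulkJob(input) {
  const { valid, rejected } = validateDomains(input);
  return { jobId: randomUUID(), accepted: valid, rejected, status: "queued" };
}
```

In an Express route, the handler would call createBulkJob(req.body), enqueue the accepted domains, and reply with 202 Accepted and the job ID.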
Use a job queue such as bull or bee-queue (both are backed by Redis rather than held in process memory) to process domains in batches and stay within external API rate limits. Each job triggers parallel requests for the different data points.
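The batching logic a queue worker would run can be sketched in plain JavaScript. This does not use the bull or bee-queue APIs; the function names, the shape of the checks object, and the default batch size and pause are all assumptions for illustration.

```javascript
// Split a list into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run every check for one domain in parallel. A failed check is recorded
// under its own key instead of aborting the others.
async function enrichDomain(domain, checks) {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    names.map(async (name) => checks[name](domain))
  );
  const result = { domain };
  settled.forEach((outcome, i) => {
    result[names[i]] =
      outcome.status === "fulfilled"
        ? outcome.value
        : { error: String(outcome.reason) };
  });
  return result;
}

// Process batches one after another, pausing between them to ease rate limits.
async function processInBatches(domains, checks, { batchSize = 10, pauseMs = 1000 } = {}) {
  const batches = chunk(domains, batchSize);
  const results = [];
  for (let i = 0; i < batches.length; i++) {
    const done = await Promise.all(batches[i].map((d) => enrichDomain(d, checks)));
    results.push(...done);
    if (pauseMs > 0 && i < batches.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return results;
}
```

Here checks would be an object such as { whois: lookupWhois, http: checkHttp }, where each value is an async function taking a domain name; those function names are placeholders.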
4. Implement Domain Data Enrichment Functions
Write modular functions for each data source:
WHOIS Lookup
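Use node-whois to fetch the raw WHOIS record, then parse out the creation date, expiry date, and registrar. node-whois returns plain text rather than JSON, and field labels vary by registry, so store the raw response alongside the parsed fields for later analysis.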
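WHOIS responses are free-form text, so extracting the three fields comes down to matching "Label: value" lines. The parser below is a rough sketch: the label lists are assumptions that cover a common gTLD layout and will need extending for other registries, and dates that are not in an ISO-like format may not parse.

```javascript
// Field labels differ between registries; these lists are assumptions
// to extend as other formats turn up.
const FIELD_LABELS = {
  createdAt: ["Creation Date", "Created On", "created"],
  expiresAt: ["Registry Expiry Date", "Registrar Registration Expiration Date", "Expiry Date", "paid-till"],
  registrar: ["Registrar", "Sponsoring Registrar"],
};

function parseWhois(rawText) {
  // Build a map of lowercase label -> first value seen for that label.
  const fields = new Map();
  for (const line of rawText.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const label = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (label && value && !fields.has(label)) fields.set(label, value);
  }

  const pick = (labels) => {
    for (const label of labels) {
      const value = fields.get(label.toLowerCase());
      if (value) return value;
    }
    return null;
  };

  // Normalize to ISO 8601, or null when the value is missing or unparseable.
  const toIso = (value) => {
    if (!value) return null;
    const date = new Date(value);
    return Number.isNaN(date.getTime()) ? null : date.toISOString();
  };

  return {
    createdAt: toIso(pick(FIELD_LABELS.createdAt)),
    expiresAt: toIso(pick(FIELD_LABELS.expiresAt)),
    registrar: pick(FIELD_LABELS.registrar),
    raw: rawText, // keep the original text for later re-parsing
  };
}
```

Keeping the raw text in the result means a better parser can be run over historical records later without repeating the lookups.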
DNS & HTTP Checks
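Resolve A/AAAA records with dns2 and request the site over HTTPS with axios, recording the status code (e.g., 200, 301, 404) and the response time. axios follows redirects and rejects on non-2xx responses by default, so set maxRedirects: 0 and a permissive validateStatus if you want to record the original 301 or 404.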
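The shape of such a check is sketched below using Node's built-in dns module and global fetch (Node 18+) as stand-ins, so that the example runs without extra packages; the same flow applies with dns2 and axios. The function names, the injectable deps parameter, the HEAD method, and the 10-second timeout are assumptions.

```javascript
const dns = require("node:dns").promises;

// Resolve one record type, returning [] when the lookup fails
// (for example ENODATA or ENOTFOUND).
async function safeResolve(resolver, domain) {
  try {
    return await resolver(domain);
  } catch {
    return [];
  }
}

// deps lets tests or callers swap in their own resolver and HTTP client.
async function checkDomain(domain, deps = {}) {
  const resolve4 = deps.resolve4 || ((d) => dns.resolve4(d));
  const resolve6 = deps.resolve6 || ((d) => dns.resolve6(d));
  const fetchImpl = deps.fetch || fetch; // global fetch needs Node 18+

  const [a, aaaa] = await Promise.all([
    safeResolve(resolve4, domain),
    safeResolve(resolve6, domain),
  ]);

  const started = Date.now();
  let httpStatus = null;
  try {
    // redirect: "manual" asks fetch not to follow redirects, so a 301 is
    // recorded rather than the status of the final page.
    const res = await fetchImpl(`https://${domain}/`, {
      method: "HEAD",
      redirect: "manual",
      signal: AbortSignal.timeout(10000),
    });
    httpStatus = res.status;
  } catch {
    httpStatus = null; // connection refused, TLS failure, timeout, ...
  }

  return { domain, a, aaaa, httpStatus, responseTimeMs: Date.now() - started };
}
```

A null httpStatus is kept distinct from any numeric code so that unreachable hosts can be told apart from hosts that answered with an error.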
Performance Metrics
Call the Google PageSpeed Insights API (strategy=mobile) to get Lighthouse scores. Cache results for 24 hours to stay within the free-tier quota.
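Building the request URL and pulling the performance score out of the response could be sketched as follows. The endpoint and the lighthouseResult.categories.performance.score path follow version 5 of the PageSpeed Insights API; the helper names are made up, and scanning the https:// homepage of each domain is an assumption.

```javascript
const PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";

// Build the request URL for one domain. The API key is optional for light use.
function buildPageSpeedUrl(domain, apiKey, strategy = "mobile") {
  const url = new URL(PSI_ENDPOINT);
  url.searchParams.set("url", `https://${domain}/`);
  url.searchParams.set("strategy", strategy);
  if (apiKey) url.searchParams.set("key", apiKey);
  return url.toString();
}

// Lighthouse reports category scores in the range 0..1; convert to 0..100.
// Returns null when the response does not contain a performance score.
function extractPerformanceScore(psiResponse) {
  const score = psiResponse?.lighthouseResult?.categories?.performance?.score;
  return typeof score === "number" ? Math.round(score * 100) : null;
}
```

The integer returned by extractPerformanceScore is what would be stored as pagespeed_score in a snapshot.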
5. Build Historical Tracking with a Database Schema
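Design a schema with two main tables: domains (id, name, created_at) and domain_snapshots (id, domain_id, snapshot_date, whois_expiry, http_status, pagespeed_score, ssl_valid). Add a unique composite index on (domain_id, snapshot_date) so that history queries for a single domain are fast and duplicate snapshots are rejected. Store snapshot_date as a full timestamp, since several snapshots per day would otherwise collide on that index.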
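For PostgreSQL, the two tables might be declared as below, kept as a string that a migration script can execute. The column types, the NOT NULL and UNIQUE constraints, the cascade rule, and the index name are assumptions layered on the column list above.

```javascript
// PostgreSQL-flavoured DDL kept as a string so a migration script can run it.
// Types and constraints are assumptions; adjust them to your own needs.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS domains (
  id          BIGSERIAL PRIMARY KEY,
  name        TEXT NOT NULL UNIQUE,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS domain_snapshots (
  id               BIGSERIAL PRIMARY KEY,
  domain_id        BIGINT NOT NULL REFERENCES domains(id) ON DELETE CASCADE,
  snapshot_date    TIMESTAMPTZ NOT NULL DEFAULT now(),
  whois_expiry     TIMESTAMPTZ,
  http_status      INTEGER,
  pagespeed_score  INTEGER,
  ssl_valid        BOOLEAN
);

CREATE UNIQUE INDEX IF NOT EXISTS domain_snapshots_domain_date_idx
  ON domain_snapshots (domain_id, snapshot_date);
`;
```

The measurement columns are left nullable so that a snapshot can still be stored when one of the external checks fails.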
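Add a cron job (e.g., every 6 hours) that re-checks all active domains and inserts a new snapshot. Track a last_checked timestamp per domain and process the stalest domains first in small batches, so checks are spread evenly across the interval instead of arriving in one burst.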
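One way to spread the work is to run the scheduler more often than the re-check interval and have each run pick only the domains that are due. The selection step is sketched here; the function name, the lastChecked field (milliseconds since the epoch, or null if never checked), and the default batch size are assumptions.

```javascript
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Pick the domains whose last check is older than the interval, stalest first,
// capped at batchSize so a single run never re-checks the whole portfolio.
function pickDueDomains(domains, now, { intervalMs = SIX_HOURS_MS, batchSize = 100 } = {}) {
  return domains
    .filter((d) => d.lastChecked === null || now - d.lastChecked >= intervalMs)
    .sort((a, b) => (a.lastChecked ?? 0) - (b.lastChecked ?? 0)) // null sorts first
    .slice(0, batchSize);
}
```

A node-cron task scheduled every few minutes could call pickDueDomains with the current time, enqueue the returned domains, and update their last_checked values once the snapshots are written.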
6. Expose Tracking Data via GET Endpoints
Create two reporting endpoints:
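- GET /api/domains/:name/history – returns all snapshots between start_date and end_date (query params). Include trend and warning indicators (e.g., SSL expiry within 30 days, which requires storing the certificate's expiry date in each snapshot, not only ssl_valid).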
- GET /api/domains/status – provides a bulk summary of current health (green/yellow/red based on HTTP and SSL).
Use pagination (offset/limit) for large portfolios.
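The health summary and pagination logic behind these endpoints could look like this. The traffic-light thresholds are assumptions chosen for illustration, as are the function names, the snapshot field names, and the sslExpiresAt field (a timestamp in milliseconds that the snapshot would need to carry).

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Map a snapshot to green / yellow / red. The rules are assumptions:
// red    = unreachable, HTTP 4xx/5xx, or invalid certificate
// yellow = redirect, or certificate expiring within 30 days
function classifyHealth(snapshot, now = Date.now()) {
  const { httpStatus, sslValid, sslExpiresAt } = snapshot;
  if (httpStatus == null || httpStatus >= 400 || sslValid === false) return "red";
  const expiringSoon = sslExpiresAt != null && sslExpiresAt - now <= THIRTY_DAYS_MS;
  if (httpStatus >= 300 || expiringSoon) return "yellow";
  return "green";
}

// Parse offset/limit query parameters, clamping them to safe bounds.
function parsePagination(query, { defaultLimit = 50, maxLimit = 500 } = {}) {
  const offset = Math.max(0, Number.parseInt(query.offset, 10) || 0);
  const requested = Number.parseInt(query.limit, 10) || defaultLimit;
  const limit = Math.min(maxLimit, Math.max(1, requested));
  return { offset, limit };
}
```

Clamping limit on the server keeps a single request from pulling an entire portfolio's history in one response.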
7. Optimize for Performance and Scalability
Implement rate limiting (express-rate-limit) on your own API's endpoints. For outbound requests to third-party APIs, use a token bucket to cap the request rate per provider. Store frequently accessed data in Redis (e.g., the latest snapshot for your 100 most-requested domains). Log all errors to an external service (Sentry or Papertrail) to monitor third-party API failures.
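A token bucket allows short bursts up to a fixed capacity while holding the long-run rate to the refill speed. A minimal single-process sketch follows; the class and option names are illustrative, and a bucket shared between several workers would need its state in a common store such as Redis.

```javascript
// Minimal token bucket (illustrative; single process only).
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = () => Date.now() }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock for testing
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, up to capacity.
  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
  }

  // Take one token if available; callers that get false should wait and retry.
  tryRemove() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Creating one bucket per external provider, sized to that provider's published limit, keeps a slow or strict API from throttling calls to the others.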
8. Secure and Document Your API
Require an API key on every request, sent as a Bearer token in the Authorization header. Document all endpoints with OpenAPI/Swagger, including example requests and error codes (e.g., 401 for a missing or invalid key, 429 for rate limit exceeded). Provide a webhook option for real-time notifications of domain status changes (e.g., an SSL certificate nearing expiry).
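A Bearer-token check can be written as Express-style middleware with the (req, res, next) signature. The version below is a sketch: the function names and error message are assumptions, and holding the valid keys in an in-memory array stands in for whatever key store you actually use.

```javascript
const { timingSafeEqual } = require("node:crypto");

// Compare two strings without leaking how many leading characters match.
function safeEqual(a, b) {
  const left = Buffer.from(a);
  const right = Buffer.from(b);
  return left.length === right.length && timingSafeEqual(left, right);
}

// Returns Express-style middleware that accepts "Authorization: Bearer <key>".
function requireApiKey(validKeys) {
  return function apiKeyMiddleware(req, res, next) {
    const header = req.headers.authorization || "";
    const match = /^Bearer\s+(\S+)$/i.exec(header);
    const ok =
      match !== null && validKeys.some((key) => safeEqual(match[1], key));
    if (!ok) {
      res.status(401).json({ error: "Missing or invalid API key" });
      return;
    }
    next();
  };
}
```

It would be mounted with something like app.use("/api", requireApiKey(keys)), ahead of the route handlers.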
Deployment tip: Use Docker containers and run the API on a cloud provider like AWS ECS or Heroku. Set up monitoring dashboards (Grafana) to track job queue depth and response times.