HTTP Retry and Resilience

Fromager retries failed HTTP requests to recover from network failures, server timeouts, and rate limiting that can occur while downloading packages and metadata.

Features

The retry system provides:

  • Exponential backoff with jitter to avoid thundering herd problems

  • Configurable retry attempts (default: 8 retries)

  • Smart error handling for common network issues:

    • HTTP 5xx server errors (500, 502, 503, 504)

    • HTTP 429 rate limiting

    • Connection timeouts and broken connections

    • Incomplete reads during large downloads

    • DNS resolution failures

  • GitHub API rate limit handling that waits until the reported reset time

  • Automatic authentication for GitHub and GitLab APIs (see Authentication)

  • Temporary file handling to prevent partial downloads
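The first feature, exponential backoff with jitter, can be sketched as follows. This is a generic formulation, not Fromager's actual code: the function name `backoff_delay`, the `jitter` fraction, and the exact formula are assumptions made for illustration. The 1.5 default mirrors the documented backoff factor, and the 60-second cap is one of the documented cap values.

```python
import random


def backoff_delay(attempt, backoff_factor=1.5, max_backoff=60.0,
                  jitter=0.25, uniform=random.uniform):
    """Return the delay in seconds before retry number `attempt` (1-based).

    Hypothetical helper: the delay doubles with every attempt, is capped at
    `max_backoff`, and is then shifted by up to +/- `jitter` of its value.
    """
    base = min(backoff_factor * (2 ** (attempt - 1)), max_backoff)
    # Randomize the delay so many clients do not retry at the same instant.
    spread = base * jitter
    return max(0.0, base + uniform(-spread, spread))


# Without jitter the schedule is deterministic and easy to inspect.
schedule = [backoff_delay(n, jitter=0.0) for n in range(1, 5)]
```

With jitter disabled the schedule grows as 1.5, 3.0, 6.0, 12.0 seconds. With jitter enabled each value moves randomly within 25% of that base, which is what keeps many clients from retrying in lockstep.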

Configuration

You can customize retry behavior using environment variables:

# Number of retry attempts (default: 8)
export FROMAGER_HTTP_RETRIES=10

# Backoff factor for exponential delay (default: 1.5)
export FROMAGER_HTTP_BACKOFF_FACTOR=2.0

# Request timeout in seconds (default: 120)
export FROMAGER_HTTP_TIMEOUT=180

Authentication credentials (GITHUB_TOKEN, GITLAB_PRIVATE_TOKEN, etc.) are documented in Authentication.
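To check which values are in effect for a given environment, the three variables can be read back with a few lines of Python. This is a minimal sketch: `load_retry_settings` is a hypothetical helper, not part of Fromager, and it only assumes the variable names and defaults documented above.

```python
import os


def load_retry_settings(env=None):
    """Read the documented retry variables, falling back to their defaults.

    Hypothetical helper for inspection only. Pass a dict as `env` to test
    values without touching the real environment.
    """
    env = os.environ if env is None else env
    return {
        "retries": int(env.get("FROMAGER_HTTP_RETRIES", "8")),
        "backoff_factor": float(env.get("FROMAGER_HTTP_BACKOFF_FACTOR", "1.5")),
        "timeout": float(env.get("FROMAGER_HTTP_TIMEOUT", "120")),
    }


defaults = load_retry_settings({})
custom = load_retry_settings({
    "FROMAGER_HTTP_RETRIES": "10",
    "FROMAGER_HTTP_BACKOFF_FACTOR": "2.0",
    "FROMAGER_HTTP_TIMEOUT": "180",
})
```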

Error Types Handled

The retry mechanism specifically handles these error conditions:

Server Errors

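  • 408 Request Timeout - The server gave up waiting for the complete request (a 4xx status that is retried like the server errors below)

  • 504 Gateway Timeout - A gateway or proxy did not receive a timely response from the upstream server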

  • 502 Bad Gateway - Proxy/gateway errors

  • 503 Service Unavailable - Temporary server overload

  • 500 Internal Server Error - General server errors

Rate Limiting

  • 429 Too Many Requests - General rate limiting

  • GitHub API rate limits - Waits until the reset time reported by the API
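How long to wait after a rate limit response can be derived from response headers. The sketch below assumes the headers GitHub documents for its REST API (`X-RateLimit-Remaining`, `X-RateLimit-Reset` as epoch seconds) plus the standard `Retry-After` header. The function `seconds_until_reset` is hypothetical, and Fromager's own detection logic may differ.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, or None if not rate limited.

    `headers` is a plain dict here, so lookups are case-sensitive. HTTP
    client libraries usually provide a case-insensitive mapping instead.
    """
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        # Retry-After given as a number of seconds.
        return float(retry_after)
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None and headers.get("X-RateLimit-Remaining") == "0":
        # The reset time is a Unix timestamp; never return a negative wait.
        return max(0.0, float(reset) - now)
    return None


wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700001247"},
    now=1700000000,
)
```

In this example the quota is exhausted and the reset lies 1247 seconds after `now`, so `wait` is 1247.0.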

Network Errors

  • ConnectionError - Network connectivity issues

  • ChunkedEncodingError - Broken connections during transfer

  • IncompleteRead - Partial data received

  • ProtocolError - Low-level protocol issues

  • Timeout - Request timeouts
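The status codes listed in this section can be collected into a single lookup, which is convenient when plugin code needs to decide whether a response is worth retrying. The names `RETRYABLE_STATUS_CODES` and `is_retryable_status` are illustrative and are not taken from Fromager.

```python
from http import HTTPStatus

# Status codes this section describes as retryable (illustrative constant).
RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})


def is_retryable_status(status_code):
    """Return True if the status code signals a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES


# Print each retryable code with its standard reason phrase.
for code in sorted(RETRYABLE_STATUS_CODES):
    print(code, HTTPStatus(code).phrase)
```

Client errors such as 404 are deliberately absent from the set: repeating the request would return the same result.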

Usage

The retry functionality is automatically enabled for all HTTP operations in Fromager. No code changes are required for existing functionality.

For Plugin Developers

If you’re writing plugins that need HTTP functionality, use the shared session from request_session. It includes retry handling and automatic authentication for GitHub and GitLab:

from fromager.request_session import session

# Use it like a normal requests session
response = session.get("https://pkg.test/api/data")
response.raise_for_status()

To register authentication for additional hosts:

from fromager.request_session import session_auth

def _resolve_my_auth(scheme: str, hostname: str) -> dict[str, str]:
    return {"Authorization": "Bearer my-token"}

session_auth.add("https://my-registry.test", _resolve_my_auth)

Decorating Functions with Retry Logic

For functions that might fail due to transient errors:

from fromager.http_retry import retry_on_exception, RETRYABLE_EXCEPTIONS

@retry_on_exception(
    exceptions=RETRYABLE_EXCEPTIONS,
    max_attempts=3,
    backoff_factor=1.0,
    max_backoff=30.0,
)
def download_metadata(url):
    # Your download logic here
    pass
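To make the decorator's parameters concrete, here is a self-contained sketch of how a retry decorator of this shape could work. It is not Fromager's implementation: `retry_sketch`, its `sleep` parameter, and the 10% jitter are assumptions, and only the parameter names `exceptions`, `max_attempts`, `backoff_factor`, and `max_backoff` follow the example above.

```python
import functools
import random
import time


def retry_sketch(exceptions, max_attempts=3, backoff_factor=1.0,
                 max_backoff=30.0, sleep=time.sleep):
    """Retry the wrapped function when it raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # Attempts exhausted: surface the last error.
                    delay = min(backoff_factor * 2 ** (attempt - 1), max_backoff)
                    # Add up to 10% random jitter on top of the base delay.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator


attempts = []
recorded_sleeps = []  # Stand-in for time.sleep so the demo runs instantly.


@retry_sketch((ConnectionError,), max_attempts=3, sleep=recorded_sleeps.append)
def flaky_download():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = flaky_download()
```

The demo function fails twice and succeeds on the third call, so two delays are recorded before `result` is set. Exceptions outside the `exceptions` tuple propagate immediately without any retry.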

Logging

The retry system logs important events:

  • WARNING: Each retry attempt, including the backoff delay before the next try

  • ERROR: When all retry attempts are exhausted

  • DEBUG: Detailed retry configuration and authentication resolution

Example log output:

WARNING Request failed for https://api.github.com/repos/owner/repo/tags: 504 Server Error. Retrying in 2.3 seconds (attempt 2/5)
WARNING GitHub API rate limit hit for https://api.github.com/repos/owner/repo/tags. Waiting 1247 seconds until reset.
INFO saved /path/to/package.tar.gz
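When Fromager modules are called from your own script, the DEBUG messages can be enabled with the standard library logging module. The logger name "fromager" is an assumption based on the common `logging.getLogger(__name__)` convention, so adjust it if the package names its loggers differently.

```python
import logging

# Keep third-party loggers at WARNING while sending output to stderr.
logging.basicConfig(
    level=logging.WARNING,
    format="%(levelname)s %(name)s: %(message)s",
)

# Assumption: Fromager's loggers live under the "fromager" namespace.
fromager_logger = logging.getLogger("fromager")
fromager_logger.setLevel(logging.DEBUG)
```

Child loggers such as "fromager.http_retry" inherit the DEBUG level, and their records still reach the root handler installed by `basicConfig`.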

Performance Considerations

  • Chunk size: Downloads are read in 64 KB chunks for better error recovery

  • Temporary files: Downloads are written to .tmp files first, so an interrupted transfer does not leave a partial file at the final path

  • Jitter: Random delays prevent synchronized retry storms

  • Max backoff: Delays are capped at 60-120 seconds depending on context
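The chunked, temporary-file download pattern described above can be sketched with the standard library alone. The function `save_stream` is hypothetical and reads from any file-like object rather than a real HTTP response. Only the 64 KB chunk size and the .tmp suffix come from this page.

```python
import io
import os
import pathlib
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KB, as described above


def save_stream(source, destination, chunk_size=CHUNK_SIZE):
    """Copy `source` to `destination` through a sibling .tmp file."""
    destination = pathlib.Path(destination)
    tmp_path = destination.with_name(destination.name + ".tmp")
    try:
        with open(tmp_path, "wb") as tmp:
            while chunk := source.read(chunk_size):
                tmp.write(chunk)
        # Rename only after the full payload is on disk.
        os.replace(tmp_path, destination)
    except BaseException:
        # Remove the partial file so a failed transfer leaves nothing behind.
        tmp_path.unlink(missing_ok=True)
        raise
    return destination


with tempfile.TemporaryDirectory() as workdir:
    target = pathlib.Path(workdir) / "package.tar.gz"
    payload = b"x" * (150 * 1024)  # Spans three chunks.
    save_stream(io.BytesIO(payload), target)
    saved_size = target.stat().st_size
    leftover_tmp = target.with_name(target.name + ".tmp").exists()
```

Because the rename happens only after the last chunk is written, readers of the final path see either the complete file or no file at all.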

Troubleshooting

High Retry Rates

If you’re seeing many retries, consider:

  • Configuring authentication credentials (see Authentication) to avoid API rate limits

  • Increasing timeout values for slow connections

  • Checking network connectivity and DNS resolution

API Rate Limiting

  • Configure credentials via netrc or environment variables (see Authentication)

  • Consider using a local package mirror for PyPI

  • Monitor API usage if using private registries

Connection Issues

  • Verify firewall and proxy settings

  • Check if specific URLs are being blocked

  • Consider network-level retry/redundancy