# HTTP Retry and Resilience

Fromager includes enhanced HTTP retry functionality to handle network failures, server timeouts, and rate limiting that can occur when downloading packages and metadata.

## Features

The retry system provides:

- **Exponential backoff with jitter** to avoid thundering herd problems

- **Configurable retry attempts** (default: 8 retries)

- **Smart error handling** for common network issues:

  - HTTP 5xx server errors (500, 502, 503, 504)
  - HTTP 429 rate limiting
  - Connection timeouts and broken connections
  - Incomplete reads during large downloads
  - DNS resolution failures

- **GitHub API rate limit handling** with proper reset time detection

- **Automatic authentication** for GitHub and GitLab APIs (see {doc}`how-tos/authentication`)

- **Temporary file handling** to prevent partial downloads

## Configuration

You can customize retry behavior using environment variables:

```bash
# Number of retry attempts (default: 8)
export FROMAGER_HTTP_RETRIES=10

# Backoff factor for exponential delay (default: 1.5)
export FROMAGER_HTTP_BACKOFF_FACTOR=2.0

# Request timeout in seconds (default: 120)
export FROMAGER_HTTP_TIMEOUT=180
```

Authentication credentials (`GITHUB_TOKEN`, `GITLAB_PRIVATE_TOKEN`,
etc.) are documented in {doc}`how-tos/authentication`.

## Error Types Handled

The retry mechanism specifically handles these error conditions:

### Server Errors

- `408 Request Timeout` - Request took too long to complete
- `504 Gateway Timeout` - Server overwhelmed or upstream timeout
- `502 Bad Gateway` - Proxy/gateway errors
- `503 Service Unavailable` - Temporary server overload
- `500 Internal Server Error` - General server errors

### Rate Limiting

- `429 Too Many Requests` - General rate limiting
- GitHub API rate limits with proper reset time handling

### Network Errors

- `ConnectionError` - Network connectivity issues
- `ChunkedEncodingError` - Broken connections during transfer
- `IncompleteRead` - Partial data received
- `ProtocolError` - Low-level protocol issues
- `Timeout` - Request timeouts

## Usage

The retry functionality is automatically enabled for all HTTP operations in Fromager. No code changes are required for existing functionality.

### For Plugin Developers

If you're writing plugins that need HTTP functionality, use the
shared session from `request_session`. It includes retry handling
and automatic authentication for GitHub and GitLab:

```python
from fromager.request_session import session

# Use it like a normal requests session
response = session.get("https://pkg.test/api/data")
response.raise_for_status()
```

To register authentication for additional hosts:

```python
from fromager.request_session import session_auth

def _resolve_my_auth(scheme: str, hostname: str) -> dict[str, str]:
    return {"Authorization": "Bearer my-token"}

session_auth.add("https://my-registry.test", _resolve_my_auth)
```

### Decorating Functions with Retry Logic

For functions that might fail due to transient errors:

```python
from fromager.http_retry import retry_on_exception, RETRYABLE_EXCEPTIONS

@retry_on_exception(
    exceptions=RETRYABLE_EXCEPTIONS,
    max_attempts=3,
    backoff_factor=1.0,
    max_backoff=30.0,
)
def download_metadata(url):
    # Your download logic here
    pass
```

## Logging

The retry system logs important events:

- **WARNING**: When retries are attempted with backoff times
- **ERROR**: When all retry attempts are exhausted
- **DEBUG**: Detailed retry configuration and authentication resolution

Example log output:

```text
WARNING Request failed for https://api.github.com/repos/owner/repo/tags: 504 Server Error. Retrying in 2.3 seconds (attempt 2/5)
WARNING GitHub API rate limit hit for https://api.github.com/repos/owner/repo/tags. Waiting 1247 seconds until reset.
INFO saved /path/to/package.tar.gz
```

## Performance Considerations

- **Chunk size**: Downloads use 64KB chunks for better error recovery
- **Temporary files**: Partial downloads are written to `.tmp` files first
- **Jitter**: Random delays prevent synchronized retry storms
- **Max backoff**: Delays are capped at 60-120 seconds depending on context

## Troubleshooting

### High Retry Rates

If you're seeing many retries, consider:

- Configuring authentication credentials (see {doc}`how-tos/authentication`) to avoid API rate limits
- Increasing timeout values for slow connections
- Checking network connectivity and DNS resolution

### API Rate Limiting

- Configure credentials via netrc or environment variables (see {doc}`how-tos/authentication`)
- Consider using a local package mirror for PyPI
- Monitor API usage if using private registries

### Connection Issues

- Verify firewall and proxy settings
- Check if specific URLs are being blocked
- Consider network-level retry/redundancy