Resilience and Transparency Features¶
This document describes the resilience patterns and transparency features implemented in the MacumbaTravel backend to provide a reliable, user-friendly experience.
Circuit Breaker Pattern¶
The application implements circuit breakers to handle external service failures gracefully and prevent cascading failures.
Implementation¶
Circuit breakers are configured for each critical external service:
# Initialize circuit breakers in TravelService
self.flights_circuit_breaker = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=30,
    name="flights_service"
)
self.maps_circuit_breaker = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=30,
    name="google_maps_service"
)
self.image_circuit_breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
    name="image_service"
)
self.accommodation_circuit_breaker = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=30,
    name="accommodation_service"
)
Circuit Breaker States¶
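- CLOSED (Normal): All requests go through
- OPEN (Failing): Entered once the failure threshold is reached; requests fail fast and fallback data is used
- HALF_OPEN (Testing): Entered after the recovery time elapses; a single request tests whether the service has recovered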
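The state transitions above can be illustrated with a minimal sketch. This is not the project's `CircuitBreaker` class: the `call` method, the `State` enum, `CircuitOpenError`, and the injectable `clock` argument are assumptions made for illustration, and the sketch is synchronous and not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical sketch; the real class in the backend may differ.
    def __init__(self, failure_threshold, recovery_timeout, name, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.name = name
        self._clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self.opened_at < self.recovery_timeout:
                # Fail fast: the caller is expected to use fallback data.
                raise CircuitOpenError(f"{self.name}: circuit open")
            self.state = State.HALF_OPEN  # recovery time elapsed, allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self._clock()

    def _on_success(self):
        self.failures = 0
        self.state = State.CLOSED
```

A caller would typically wrap the external request in `call` and catch `CircuitOpenError` to switch to the fallback path.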
Service-Specific Configuration¶
| Service | Threshold | Recovery Time | Criticality | Fallback Strategy |
|---|---|---|---|---|
| Flights Service | 3 failures | 30 seconds | Critical | AI-estimated pricing |
| Google Maps | 3 failures | 30 seconds | Important | AI transportation options |
| Image Service | 5 failures | 60 seconds | Low | Default/generic images |
| Accommodations | 3 failures | 30 seconds | Important | Budget estimates |
Benefits¶
- Prevent Cascading Failures: Stop calling failing services immediately
- Faster Response Times: Fail fast instead of waiting for timeouts
- Graceful Degradation: Provide fallback data when services are unavailable
- Automatic Recovery: Test service health and recover when possible
Progressive Enhancement¶
Enrichment Status Tracking¶
The system tracks the status of each enrichment component for transparency and progressive loading:
{
  "enrichment_components": {
    "transportation": {"status": "success", "error": null},
    "flights": {"status": "success", "error": null},
    "accommodations": {"status": "fallback", "error": "Circuit breaker open"},
    "seasonal_info": {"status": "success", "error": null},
    "payment_info": {"status": "success", "error": null}
  }
}
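One way such a status map could be produced is to wrap each enrichment step so that a failure is recorded instead of propagated. The sketch below is an assumption about the mechanics, not the backend's code; `run_component`, `enrich`, and the shape of the `components` argument are hypothetical names.

```python
import asyncio


async def run_component(name, fetch, fallback):
    """Run one enrichment step and build its status entry."""
    try:
        data = await fetch()
        return name, data, {"status": "success", "error": None}
    except Exception as exc:
        # Any failure degrades to the fallback value and is recorded.
        return name, fallback, {"status": "fallback", "error": str(exc)}


async def enrich(components):
    """components maps a name to (async fetch function, fallback value)."""
    results = await asyncio.gather(
        *(run_component(name, fetch, fallback)
          for name, (fetch, fallback) in components.items())
    )
    data = {name: value for name, value, _ in results}
    status = {name: entry for name, _, entry in results}
    return data, {"enrichment_components": status}
```

Because each component reports independently, one failing service changes only its own entry.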
Status Endpoint¶
The /recommendations/{id}/status endpoint allows clients to poll enrichment progress:
Response States:
- pending: Basic recommendation available, enrichment in progress
- partial: Some components failed, others successful
- completed: All components successfully enriched
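The overall state could be derived from the per-component entries roughly as follows. This is a sketch under stated assumptions: a component with no entry yet is treated as pending, and any status other than `success` (including `fallback`) makes the result `partial`. The function name and the `expected` parameter are hypothetical.

```python
def overall_status(components, expected):
    """Derive pending / partial / completed from per-component entries."""
    statuses = [
        components.get(name, {}).get("status", "pending")  # missing entry: still running
        for name in expected
    ]
    if any(s == "pending" for s in statuses):
        return "pending"
    if all(s == "success" for s in statuses):
        return "completed"
    return "partial"
```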
Example Response:
{
  "recommendation_id": "rec_123456",
  "status": "completed",
  "has_basic_data": true,
  "has_enriched_data": true,
  "enrichment_components": {
    "transportation": {"status": "success", "error": null},
    "flights": {"status": "success", "error": null},
    "accommodations": {"status": "success", "error": null}
  },
  "message": "Recommendation fully enriched and ready"
}
Data Transparency¶
Provenance Tracking¶
Every piece of data includes its source information:
{
  "provenance": {
    "destination_data": "ai_generated",
    "transportation": "google_maps_api",
    "flights_data": "amadeus_api",
    "accommodations_data": "booking_api",
    "seasonal_info": "seasonal_service",
    "payment_info": "payment_service"
  }
}
Data Freshness¶
Timestamps show when each component was last updated:
{
  "data_freshness": {
    "transportation_fetched_at": "2025-01-15T10:30:00Z",
    "flights_fetched_at": "2025-01-15T10:30:15Z",
    "accommodations_fetched_at": "2025-01-15T10:30:30Z",
    "seasonal_data_fetched_at": "2025-01-15T10:30:45Z"
  }
}
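Provenance and freshness can be stamped at the moment a component is fetched. The helper below is a hypothetical sketch: `record_fetch` and its parameters are not taken from the backend, and it assumes timezone-aware datetimes formatted as UTC with a trailing `Z`, matching the samples above.

```python
from datetime import datetime, timezone


def record_fetch(recommendation, provenance_key, freshness_key, source, now=None):
    """Stamp a component with its data source and fetch time (UTC, ISO 8601)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    recommendation.setdefault("provenance", {})[provenance_key] = source
    recommendation.setdefault("data_freshness", {})[freshness_key] = stamp
    return recommendation
```

Passing `now` explicitly keeps the function easy to test; in normal use it would be omitted.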
Cost Breakdown Transparency¶
Detailed cost breakdowns help users understand pricing:
{
  "cost_breakdown": {
    "base_cost": 1200,
    "transport_cost": 450,
    "accommodation_estimate": 420,
    "total_per_person": 1800,
    "total_for_travelers": 3600
  }
}
Timeout and Partial Returns¶
Enrichment Timeout¶
The enrichment process has a 30-second timeout to prevent hanging requests:
import asyncio

async def enrich_recommendation(
    self, recommendation, departure_city, budget, max_travel_time, travelers, duration
):
    try:
        # Cap the whole enrichment step at 30 seconds
        enriched = await asyncio.wait_for(
            self._enrich_recommendation_internal(
                recommendation, departure_city, budget, max_travel_time, travelers, duration
            ),
            timeout=30.0
        )
        return enriched
    except asyncio.TimeoutError:
        # Fall back to a minimal recommendation (remaining arguments elided in the source)
        return self._create_minimal_recommendation(recommendation, ...)
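For partial returns, an alternative to an all-or-nothing timeout is to keep whichever components finished before the deadline. The following is a sketch of that idea using `asyncio.wait`; `gather_partial` and its arguments are hypothetical and do not describe how the backend assembles partial results.

```python
import asyncio


async def gather_partial(coros_by_name, timeout):
    """Run components concurrently and keep what finished before the deadline."""
    if not coros_by_name:
        return {}, []
    tasks = {asyncio.create_task(coro): name for name, coro in coros_by_name.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    for task in pending:
        task.cancel()  # stop work that missed the deadline
    # Wait for the cancellations to settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    results = {tasks[t]: t.result() for t in done if t.exception() is None}
    missing = sorted(set(coros_by_name) - set(results))  # timed out or failed
    return results, missing
```

The `missing` list tells the caller which components need fallback data.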
Fallback Strategies¶
Each service has defined fallback behavior:
Transportation Fallback¶
- Use AI-provided transportation options
- Fall back to estimated drive times
- Default to a 2-hour drive time if no data is available
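The three levels above form a simple fallback chain. The sketch below assumes each AI-provided option is a dict with a `duration_hours` field; that field name, the function name, and the returned source labels are hypothetical.

```python
DEFAULT_DRIVE_HOURS = 2.0  # last-resort default from the list above


def resolve_drive_time(ai_options=None, estimated_hours=None):
    """Return (hours, source) from the first fallback level that has data."""
    for option in ai_options or []:
        hours = option.get("duration_hours")  # hypothetical field name
        if hours is not None:
            return float(hours), "ai_generated"
    if estimated_hours is not None:
        return float(estimated_hours), "estimated"
    return DEFAULT_DRIVE_HOURS, "default"
```

Returning the source alongside the value makes it possible to tell the user which level of fallback was used.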
Flight Fallback¶
- Use distance-based pricing estimates
- Apply standard aviation pricing models
- Include check-in time estimates
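A distance-based estimate might look like the sketch below, which computes the great-circle distance with the haversine formula and applies a linear fare model. The base fare and per-kilometre rate are placeholder numbers chosen for illustration, not values from the backend, and the linear model is an assumption rather than the pricing model the service uses.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_fare(distance_km, base_fare=50.0, rate_per_km=0.10):
    """Linear fare estimate; both default rates are placeholders."""
    return round(base_fare + rate_per_km * distance_km, 2)
```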
Accommodation Fallback¶
- Provide budget-based estimates
- Use regional pricing averages
- Include different accommodation tiers
Image Fallback¶
- Use cached destination images
- Fall back to generic travel images
- Avoid duplicate images across destinations
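The image rules above could be combined in a single selection function. This is a hypothetical sketch: `pick_image`, the `cache` mapping, the `generic_pool` list, and the shared `used` set are illustrative names, not the backend's API.

```python
def pick_image(destination, cache, generic_pool, used):
    """Choose an image URL, preferring the cache and avoiding repeats."""
    cached = cache.get(destination)
    if cached and cached not in used:
        used.add(cached)
        return cached
    for url in generic_pool:
        if url not in used:
            used.add(url)
            return url
    # Every generic image is already in use: repeat one rather than return nothing.
    return generic_pool[0] if generic_pool else None
```

The caller passes the same `used` set for every destination in one response so that duplicates are tracked across the whole result.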
Caching Headers¶
ETag Implementation¶
Static endpoints include ETags for efficient caching:
import hashlib

# Create ETag from content hash (MD5 serves as a fingerprint here, not for security)
content_hash = hashlib.md5(str(recommendations).encode()).hexdigest()
etag = f'"{content_hash}"'
# Check if client has cached version
if request.headers.get("If-None-Match") == etag:
    # Not Modified: repeat the ETag so the client can keep revalidating
    return Response(status_code=304, headers={"ETag": etag})
# Set caching headers
response.headers["ETag"] = etag
response.headers["Cache-Control"] = "public, max-age=3600"
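Clients may send several entity tags, weak tags prefixed with `W/`, or `*` in `If-None-Match`, which a plain string comparison does not cover. The helpers below are a framework-independent sketch of more tolerant matching; `make_etag` and `if_none_match_hits` are hypothetical names, and hashing a sorted JSON dump instead of `str(...)` is a swapped-in choice to make the tag independent of key order.

```python
import hashlib
import json


def make_etag(payload):
    """Build a quoted ETag from a canonical JSON rendering of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return '"' + hashlib.md5(body).hexdigest() + '"'


def if_none_match_hits(header_value, etag):
    """Weak comparison of an If-None-Match header against the current ETag."""
    if not header_value:
        return False
    if header_value.strip() == "*":
        return True

    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag  # ignore the weak prefix

    # Splitting on commas is safe for hex-digest tags like the ones built here.
    return any(opaque(c) == opaque(etag) for c in header_value.split(","))
```

When `if_none_match_hits` returns true, the handler would answer with 304 and skip the response body.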
Cache-Control Headers¶
Different endpoints have appropriate caching policies:
| Endpoint | Cache Duration | Policy |
|---|---|---|
| Random Recommendations | 1 hour | public, max-age=3600 |
| Rate Limit Status | 1 minute | public, max-age=60 |
| User Profile | 5 minutes | private, max-age=300 |
Security Enhancements¶
Token Hashing¶
Email verification and password reset tokens are securely hashed before storage:
@staticmethod
def hash_token(token: str) -> str:
    """Hash a token for secure storage"""
    return pwd_context.hash(token)

def verify_verification_token(self, plain_token: str) -> bool:
    """Verify a verification token against the stored hash"""
    if not self.verification_token:
        return False
    return pwd_context.verify(plain_token, self.verification_token)
Benefits¶
- Secure Storage: Stored hashes cannot be reversed into usable tokens if the database is compromised
- Constant-Time Verification: Prevents timing attacks
- Proper Lifecycle: Tokens are invalidated after use
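The full token lifecycle (issue, store only a hash, verify in constant time, invalidate after use) can be sketched with the standard library alone. Note that this swaps in SHA-256 and `hmac.compare_digest` in place of the `pwd_context` calls shown earlier, which is a reasonable choice only because the token is long and random; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_token():
    """Return (plain_token, stored_hash); only the hash should be persisted."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()


def consume_token(plain_token, stored_hash):
    """Return (is_valid, new_stored_hash); a valid token is cleared after use."""
    if not stored_hash:
        return False, stored_hash
    candidate = hashlib.sha256(plain_token.encode()).hexdigest()
    if hmac.compare_digest(candidate, stored_hash):  # constant-time comparison
        return True, None  # invalidate so the token cannot be replayed
    return False, stored_hash
```

The plain token goes into the email link, and the caller persists the second return value of `consume_token` back to the user record.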
Monitoring and Observability¶
Circuit Breaker Metrics¶
Circuit breakers expose metrics for monitoring:
- Circuit breaker state (CLOSED/OPEN/HALF_OPEN)
- Failure count per service
- Recovery attempts
- Success/failure rates
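A per-breaker metrics record covering the items above might be shaped like this. It is a hypothetical sketch: the `BreakerMetrics` class and its field names are assumptions, not the metrics interface the backend exposes.

```python
from dataclasses import asdict, dataclass


@dataclass
class BreakerMetrics:
    name: str
    state: str = "CLOSED"
    successes: int = 0
    failures: int = 0
    recovery_attempts: int = 0

    def record_call(self, ok: bool) -> None:
        """Count one call outcome."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def snapshot(self) -> dict:
        """Return a plain dict suitable for logging or a metrics endpoint."""
        total = self.successes + self.failures
        data = asdict(self)
        data["failure_rate"] = self.failures / total if total else 0.0
        return data
```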
Component Status Tracking¶
Enrichment components provide detailed status information:
- Success/failure rates per component
- Error categorization (timeout, service down, invalid data)
- Performance metrics (response times, cache hit rates)
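Error categorization can be as simple as mapping exception types to the categories named above. The mapping below is an assumption about which exceptions belong where, and `categorize_error` is a hypothetical name; a real client library would likely raise its own exception types that need adding.

```python
import asyncio


def categorize_error(exc):
    """Map an exception to one of the coarse error categories."""
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "service_down"
    if isinstance(exc, (ValueError, KeyError, TypeError)):
        return "invalid_data"
    return "unknown"
```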
Best Practices¶
When to Use Circuit Breakers¶
- External Service Calls: Any call to services outside your control
- Expensive Operations: Operations that consume significant resources
- Non-Critical Features: Features that can gracefully degrade
Fallback Strategy Guidelines¶
- Data Quality: Fallback data should be reasonably accurate
- User Communication: Clearly indicate when fallback data is used
- Gradual Degradation: Prefer partial functionality over complete failure
- Recovery Testing: Regularly test service recovery scenarios
Progressive Enhancement Tips¶
- Status Endpoints: Provide real-time status for long-running operations
- Granular Tracking: Track individual components, not just overall status
- Client-Friendly: Design status responses for easy frontend consumption
- Error Context: Include enough error information for debugging
Impact on User Experience¶
These resilience patterns improve the user experience in several ways:
- Faster Perceived Performance: Progressive loading shows basic results before enrichment finishes
- Higher Reliability: Circuit breakers and fallbacks keep one failing service from failing the whole request
- Greater Transparency: Users understand data sources and freshness
- Better Trust: Cost breakdowns and provenance build user confidence
- Improved Availability: Graceful degradation ensures core functionality remains available
Together, these patterns keep the travel planning system usable and transparent even when external services are experiencing issues.