Understanding Database Configuration and Alembic Migrations: A Guide for Junior Developers¶
This guide provides a detailed explanation of the database configuration and Alembic migration system in our project. It's designed to help developers understand how our database is set up, how we manage schema changes, and best practices to follow.
Table of Contents¶
- 1. Database Configuration
- 2. What is Alembic?
- 3. Alembic Setup
- 4. Alembic Migrations
- 5. Database Models
- 6. Common Alembic Commands
- 7. Best Practices
- 8. Common Pitfalls
- 9. FastAPI Database Connection
- 10. Development vs. Production
- 11. Troubleshooting Alembic Migration Issues
1. What is Your Database Configuration?¶
The Database Setup (app/core/db.py)¶
What: - Our application uses PostgreSQL as its database system - SQLAlchemy is used as the ORM (Object-Relational Mapper) - We've created a centralized configuration for database access
How:
# Create base class for models
Base = declarative_base()
# Create async engine and session
engine = create_async_engine(settings.DATABASE_URL)
AsyncSessionLocal = async_sessionmaker(
autocommit=False,
autoflush=False,
bind=engine,
class_=AsyncSession
)
async def get_db():
async with AsyncSessionLocal() as session:
try:
yield session
finally:
await session.close()
Why: - Base class: Serves as the foundation for all your database models - Async Engine: Establishes non-blocking connections to your database for better performance - AsyncSessionLocal: Creates a factory for async database sessions - get_db function: An async dependency that provides and safely closes database sessions - Async Context Manager: Ensures proper session cleanup and connection pooling
Database Configuration (app/core/config.py)¶
What: - Our database settings are defined in the Settings class - We load database credentials from environment variables
How:
Why: - Environment variables: Keep sensitive credentials out of code - Settings class: Centralizes configuration and makes it easier to access - DATABASE_URL: Contains the full connection string for your database
2. What is Alembic?¶
What: - Alembic is a database migration tool for SQLAlchemy - It helps you manage changes to your database schema over time - It creates and applies "migrations" - scripts that modify your database structure
Why: - Version control for your database: Track changes to your schema like you track code changes - Collaborative development: Team members can apply the same changes to their databases - Safe deployment: Apply schema changes to production safely - Backward compatibility: Can roll back changes if needed
3. How Alembic is Set Up in Your Project¶
Alembic Configuration (alembic.ini)¶
What: - The main configuration file for Alembic - Defines settings like migration script location and logging
How:
[alembic]
# path to migration scripts
script_location = alembic
# ... other settings ...
sqlalchemy.url = postgresql://postgres:carnepalapicadora@db:5432/traveldb
Why: - script_location: Tells Alembic where to find migration scripts - sqlalchemy.url: The database connection string (though you override this in env.py)
Alembic Environment (alembic/env.py)¶
What: - Sets up the environment for Alembic to run - Links Alembic to your SQLAlchemy models - Provides context for migrations
How:
# Import your models
from app.core.db import Base
from app.models.destination import Destination, DestinationCost, Activity, WeatherRecord
from app.models.user import User
from app.models.saved_trip import SavedTrip
# ... other imports ...
# Get the SQLAlchemy URL from environment
def get_url():
user = os.getenv("POSTGRES_USER", "postgres")
password = os.getenv("POSTGRES_PASSWORD", "carnepalapicadora")
server = os.getenv("POSTGRES_SERVER", "db")
db = os.getenv("POSTGRES_DB", "traveldb")
return f"postgresql://{user}:{password}@{server}/{db}"
Why: - Import models: Alembic needs to know all your models to generate migrations - get_url function: Gets database connection from environment variables instead of hardcoding - target_metadata = Base.metadata: Links Alembic to your SQLAlchemy models
4. Alembic Migrations in Your Project¶
Migration Structure¶
What: - Each migration is a Python file in the alembic/versions/ directory - Migrations have an "upgrade" function (apply changes) and a "downgrade" function (reverse changes) - Migrations are applied in sequence based on their dependencies
How:
# Example migration: 6d8c09c0ea13_create_initial_schema.py
def upgrade() -> None:
op.create_table('destinations',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('name', sa.String(), nullable=True),
# ... other columns ...
sa.PrimaryKeyConstraint('id')
)
# ... other tables ...
def downgrade() -> None:
op.drop_table('weather_records')
op.drop_table('destination_costs')
# ... other tables ...
Why: - upgrade: Describes how to modify the database to reach the new state - downgrade: Describes how to undo those changes - revision identifiers: Track the order of migrations
Migration Types in Your Project¶
What: You have several types of migrations: 1. Initial schema migration: Sets up the first version of your database 2. Feature migrations: Add new tables or columns for features (e.g., user authentication) 3. Improvement migrations: Modify existing tables to improve the schema
How:
alembic/versions/
├── 6d8c09c0ea13_create_initial_schema.py # Initial setup
├── 8e9625799d4e_add_destinations_table.py # Add new tables
├── xxxx_add_user_authentication.py # Add auth features
└── xxxx_add_user_profile_fields.py # Add additional fields
Why: - Incremental changes: Break changes into manageable pieces - Feature-based organization: Link migrations to specific features - Developmental history: Show how the database evolved
5. Your Database Models¶
Model Structure¶
What: - Models are Python classes that inherit from the Base class - Each model corresponds to a database table - Models define columns and relationships
How:
class Destination(Base):
__tablename__ = "destinations"
id = Column(Integer, primary_key=True, index=True)
name = Column(String, nullable=True, index=True)
country = Column(String, nullable=True, index=True)
# ... other columns ...
# Relationships
costs = relationship("DestinationCost", back_populates="destination", cascade="all, delete-orphan")
activities = relationship("Activity", back_populates="destination", cascade="all, delete-orphan")
weather_records = relationship("WeatherRecord", back_populates="destination", cascade="all, delete-orphan")
Why: - tablename: Specifies the database table name - Columns: Define the data structure - Relationships: Define connections between tables - Indexes: Improve query performance
Model Relationships¶
What: - Your models have various types of relationships: - One-to-many (a destination has many activities) - Many-to-many (users can share trips with multiple users)
How:
# One-to-many relationship
class Destination(Base):
# ...
activities = relationship("Activity", back_populates="destination", cascade="all, delete-orphan")
class Activity(Base):
# ...
destination_id = Column(Integer, ForeignKey("destinations.id"))
destination = relationship("Destination", back_populates="activities")
# Many-to-many relationship
class User(Base):
# ...
shared_with_me = relationship(
"SavedTrip",
secondary=user_shared_trips,
back_populates="shared_with"
)
Why: - Foreign keys: Establish relationships between tables - Cascades: Automatically handle related records (e.g., delete activities when a destination is deleted) - back_populates: Create bidirectional relationships - secondary: Use a junction table for many-to-many relationships
6. Common Alembic Commands and How to Use Them¶
Creating Migrations¶
What: - You can create migrations manually or automatically - Auto-generated migrations detect changes between your models and database
How:
# Create a new empty migration
alembic revision -m "description_of_changes"
# Generate a migration based on model changes
alembic revision --autogenerate -m "description_of_changes"
Why: - Manual migrations: For complex changes that need custom logic - Autogenerate: Quick and less error-prone for simple changes - -m flag: Adds a descriptive message to help identify the migration
Applying Migrations¶
What: - Apply migrations to update your database schema - You can apply all pending migrations or go to a specific version
How:
# Apply all pending migrations
alembic upgrade head
# Apply next migration only
alembic upgrade +1
# Go to a specific migration
alembic upgrade 8e9625799d4e
Why: - upgrade head: Most common - brings database to latest version - upgrade +1: Useful for testing migrations one at a time - specific revision: Roll forward or backward to a specific point
Viewing Migration Status¶
What: - Check what migrations are applied and what are pending - See migration history
How:
# Show current revision
alembic current
# Show history
alembic history
# Show pending migrations
alembic history --indicate-current
Why: - current: Quickly see what version your database is at - history: See the full migration timeline - indicate-current: Helps identify what needs to be applied
7. Best Practices for Database Management (as shown in your project)¶
1. Keep Database Credentials Secure¶
What: - Your project stores database credentials in environment variables - Default values are provided for development but not production secrets
How:
def get_url():
user = os.getenv("POSTGRES_USER", "postgres")
password = os.getenv("POSTGRES_PASSWORD", "carnepalapicadora")
server = os.getenv("POSTGRES_SERVER", "db")
db = os.getenv("POSTGRES_DB", "traveldb")
return f"postgresql://{user}:{password}@{server}/{db}"
Why: - Environment variables: Keep sensitive data out of code - Default values: Make development easier while keeping production secure
2. Use Appropriate Data Types¶
What: - Your models use specific data types for different kinds of data - Special types like JSON for structured data
How:
name = Column(String, nullable=True, index=True)
amount = Column(Float, nullable=False)
coordinates = Column(JSON, nullable=True)
date = Column(DateTime(timezone=True), nullable=False)
Why: - Type safety: Prevents storing the wrong kind of data - Performance: Different types have different storage and query characteristics - Functionality: Special types like JSON enable complex data operations
3. Define Relationships Properly¶
What: - Your models use well-defined relationships with proper cascades - Bidirectional relationships using back_populates
How:
# In Destination model
activities = relationship("Activity", back_populates="destination", cascade="all, delete-orphan")
# In Activity model
destination = relationship("Destination", back_populates="activities")
Why: - Data integrity: Ensure related records stay consistent - Ease of use: Access related records easily in code - Cascade behavior: Automatically handle related records when a parent is modified
4. Use Indexes Strategically¶
What: - Your models add indexes to columns that are frequently searched - Primary keys are automatically indexed
How:
id = Column(Integer, primary_key=True, index=True)
name = Column(String, nullable=True, index=True)
country = Column(String, nullable=True, index=True)
category = Column(String, index=True)
Why: - Query performance: Indexes speed up searches - Strategic indexing: Only index columns that need it (indexes have overhead) - Compound indexes: For queries that filter on multiple columns
5. Make Migrations Reversible¶
What: - Your migrations include both upgrade and downgrade functions - Downgrades exactly reverse the corresponding upgrades
How:
def upgrade() -> None:
op.create_table('destinations', ...)
def downgrade() -> None:
op.drop_table('destinations')
Why: - Rollback capability: Can undo changes if problems occur - Testing: Can test migrations in both directions - Development flexibility: Can switch between different versions easily
8. Common Pitfalls and How to Avoid Them¶
1. Forgetting to Import Models¶
What: - Alembic needs to know about all your models to generate migrations - Missing imports can lead to incomplete migrations
How to avoid: - In env.py, import all models explicitly:
from app.models.destination import Destination, DestinationCost, Activity, WeatherRecord
from app.models.user import User
from app.models.saved_trip import SavedTrip
# ... other imports ...
2. Modifying Applied Migrations¶
What: - Once a migration is applied to any database, it should never be modified - Changes should be made in new migrations
How to avoid: - Treat migrations like Git commits - create new ones instead of modifying old ones - If you need to correct a mistake, create a new migration that fixes it
3. Ignoring Migration Conflicts¶
What: - Multiple developers creating migrations simultaneously can cause conflicts - Branches with different migrations can be hard to merge
How to avoid: - Coordinate database changes among team members - Run alembic heads to check for multiple head revisions - Use alembic merge to resolve multiple heads
4. Large Migrations¶
What: - Big schema changes in a single migration can be risky and hard to review - They can also lock tables for too long in production
How to avoid: - Break large changes into smaller, more focused migrations - Consider data migrations separately from schema migrations - Use background migrations for large production databases
9. Connecting FastAPI to Your Database¶
What: - Your FastAPI application needs access to database sessions - You use dependency injection to provide database access
How:
def get_db():
db = SessionLocal()
try:
yield db
finally:
db.close()
# In an endpoint
@app.get("/users/")
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = crud.get_users(db, skip=skip, limit=limit)
return users
Why: - Dependency injection: Clean way to provide database access - Automatic cleanup: Sessions are always closed, even if errors occur - Testability: Easy to mock for testing
10. Database Migrations in Development vs. Production¶
Development Workflow¶
What: - In development, you frequently make schema changes - You might reset your database often
How: 1. Modify your SQLAlchemy models 2. Generate a migration: alembic revision --autogenerate -m "description" 3. Apply it: alembic upgrade head 4. Test your changes
Why: - Rapid iteration: Quickly test schema changes - Clean state: Can reset database if needed - Autogenerate: Save time with automatic migration generation
Production Workflow¶
What: - In production, migrations must be handled carefully - Schema auto-creation is disabled for safety - You can never lose or corrupt production data
How: 1. Create and test migrations in development 2. Review migrations carefully before applying to production 3. Back up production database 4. Apply migrations during low-traffic periods 5. Monitor application after migrations
Production Safety Features: - Environment Detection: Automatic detection based on ENVIRONMENT configuration - Schema Auto-Creation Disabled: Production environments require explicit migrations - Migration-Only Changes: All schema changes must go through Alembic migrations - Rollback Support: Migration versioning allows for safe rollbacks
Why: - Data safety: Production data is valuable and must be protected - Controlled Changes: Prevents accidental schema modifications - Minimal downtime: Users should experience little or no disruption - Verification: Ensure migrations work correctly before affecting users - Audit Trail: Migration history provides clear record of all schema changes
11. Troubleshooting Alembic Migration Issues¶
Common Migration Problems¶
While Alembic migrations are powerful, you may encounter some common issues:
1. Multiple Head Revisions
What: - When different developers create migrations in parallel or on different branches - After merging, you end up with multiple "head" revisions that Alembic doesn't know how to resolve
How to detect:
How to fix:
# Create a merge migration that combines multiple heads
alembic merge -m "merge migrations" <revision1> <revision2>
2. Conflicting Migrations
What: - When migrations attempt to perform incompatible operations - For example, two migrations both try to create the same table
How to detect: - Alembic will raise errors like DuplicateTable: relation "table_name" already exists
How to fix: - Modify one of the conflicting migrations to check if objects exist before creating them:
def upgrade():
# Check if table exists before creating it
conn = op.get_bind()
inspector = Inspector.from_engine(conn)
tables = inspector.get_table_names()
if 'my_table' not in tables:
op.create_table('my_table',
# table definition
)
else:
print("Table 'my_table' already exists, skipping creation")
Alternative Approach: Direct Schema Creation¶
When migration issues become too complex to resolve (particularly in development environments), an alternative approach is to bypass migrations completely and create the schema directly from SQLAlchemy models.
What: - Avoid running alembic upgrade head and instead use SQLAlchemy's create_all() to generate the schema - Set the Alembic version manually to a known head
How: We've implemented this approach in entrypoint.sh:
# Fresh DB initialization approach instead of migrations
echo "Creating database schema directly from models..."
python <<EOF
from sqlalchemy import create_engine, inspect, text
from app.core.db import Base
from app.models.user import User
from app.models.destination import Destination, DestinationCost, Activity, WeatherRecord
from app.models.saved_trip import SavedTrip
from app.models.associations import user_shared_trips
from app.models.iata_codes import IATACode
from app.models.airport_mappings import AirportMapping
from app.models.cache_metadata import CacheMetadata
import os
# Get the database URL from environment
db_url = os.environ.get("DATABASE_URL")
print(f"Connecting to database: {db_url[:25]}...")
# Create engine
engine = create_engine(db_url)
# Check if any of our tables already exist
inspector = inspect(engine)
existing_tables = inspector.get_table_names()
our_tables = ['users', 'destinations', 'saved_trips', 'user_shared_trips',
'destination_costs', 'activities', 'weather_records',
'iata_codes', 'airport_mappings', 'cache_metadata']
tables_exist = any(table in existing_tables for table in our_tables)
if tables_exist:
print("Some tables already exist. Skipping schema creation.")
else:
print("Creating all tables from SQLAlchemy models")
Base.metadata.create_all(engine)
print("Tables created successfully")
# Create a current alembic version stamp without running migrations
with engine.begin() as conn:
# Check if alembic_version table exists
if 'alembic_version' in existing_tables:
# Get the current head revision from any migration file
print("Setting alembic_version to current head")
conn.execute(text("DELETE FROM alembic_version"))
conn.execute(text("INSERT INTO alembic_version VALUES ('78f58cb0eda7')"))
else:
print("Creating alembic_version table and setting current head")
conn.execute(text("CREATE TABLE IF NOT EXISTS alembic_version (version_num VARCHAR(32) NOT NULL)"))
conn.execute(text("INSERT INTO alembic_version VALUES ('78f58cb0eda7')"))
print("Database schema is now ready and alembic version is set")
EOF
Why: - Recovery from migration issues: Useful when migration history gets corrupted - Clean slate: Creates a pristine database state based on current models - Development simplicity: Avoids complex migration troubleshooting for development environments - Version control: Still sets the alembic_version to allow future migrations
Important Note: After using this approach, you'll still need to populate initial data using our setup scripts:
# Run these after direct schema creation
python scripts/setup_db.py # Adds initial airport mappings
python scripts/populate_airports.py # Populates IATA airport codes
When to Use: - Development environments: When you need a quick reset - Initial setup: For new installations where migration history doesn't matter - Migration recovery: When migrations are broken and need a fresh start
When NOT to Use: - Production environments with existing data: This approach doesn't migrate data - When migrations are working correctly: Regular Alembic workflows are preferable when they work
In Summary¶
Our database configuration and Alembic setup follow best practices for a modern FastAPI application:
- Clean separation of database configuration, models, and migrations
- Environment-based configuration for flexibility and security
- Fully async database operations for optimal performance and scalability
- Production-safe schema management with environment-aware auto-creation controls
- Well-defined models with appropriate relationships and indexes
- Structured migrations that track database schema evolution
- Dependency injection for clean async database access in our API
- Comprehensive monitoring with health checks and metrics
This structure will serve us well as the application grows, allowing us to evolve our database schema safely while maintaining data integrity and optimal performance through async operations.