Data Migration Generator

Generate CQL data migrations for transforming and populating data independently of schema migrations.

Synopsis

azu generate data_migration <name>

Description

Data migrations are specialized scripts for transforming existing data, seeding databases, or performing one-time data operations. Unlike schema migrations, which modify the database structure, data migrations work with the data itself: updating records, moving data between tables, importing external data, or performing complex transformations.

Key Differences: Schema vs. Data Migrations

| Aspect | Schema Migration | Data Migration |
| --- | --- | --- |
| Purpose | Modify database structure | Transform existing data |
| Examples | Add columns, create tables | Update records, import data |
| Reversibility | Usually reversible | Often irreversible |
| When to Run | On deploy | On-demand or scheduled |
| Dependencies | Database schema | Application models |
| Location | db/migrations/ | db/data_migrations/ |

Usage

Basic Usage

Generate a data migration:

This creates:

Common Scenarios

Backfill New Column

Migrate Data Between Tables

Import External Data

Transform Existing Data

Archive Old Data

Generated File Structure

File Location

File Content

Common Data Migration Patterns

1. Backfill Column Values

When you add a new column and need to populate it with calculated or default values:
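As a language-neutral sketch (Python with the standard-library sqlite3 module, not the CQL API), a backfill can be a single guarded UPDATE. The users table and its first_name, last_name, and full_name columns are hypothetical.

```python
import sqlite3


def backfill_full_name(conn: sqlite3.Connection) -> int:
    """Fill the hypothetical full_name column for rows that do not have it yet."""
    cur = conn.execute(
        "UPDATE users "
        "SET full_name = first_name || ' ' || last_name "
        "WHERE full_name IS NULL"  # the guard keeps existing values untouched
    )
    conn.commit()
    return cur.rowcount  # number of rows that were backfilled
```

Because of the IS NULL guard, a second run finds nothing left to update.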

2. Migrate Data Between Tables

When restructuring your schema and moving data:
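One way to picture this, sketched in Python with sqlite3 rather than CQL, is an INSERT ... SELECT that copies rows into the new table and skips rows already copied. The users and addresses tables and their columns are assumptions made for illustration.

```python
import sqlite3


def copy_addresses(conn: sqlite3.Connection) -> int:
    """Copy address fields from users into a separate addresses table."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO addresses (user_id, street, city) "
            "SELECT u.id, u.street, u.city FROM users AS u "
            "WHERE u.street IS NOT NULL "
            "AND NOT EXISTS (SELECT 1 FROM addresses AS a WHERE a.user_id = u.id)"
        )
    return cur.rowcount  # number of rows copied in this run
```

The source columns are left in place here; dropping them would be a separate schema migration once the copy has been verified.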

3. Import External Data

When importing data from CSV, JSON, or external APIs:
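For the CSV case, a minimal sketch in Python (sqlite3 and csv from the standard library, not the CQL API) might look like the following. The products table, its unique sku column, and the CSV header names are hypothetical.

```python
import csv
import sqlite3


def import_products(conn: sqlite3.Connection, csv_file) -> int:
    """Load rows from an open CSV file object into the products table."""
    reader = csv.DictReader(csv_file)
    rows = [(r["sku"], r["name"], float(r["price"])) for r in reader]
    with conn:  # one transaction for the whole import
        conn.executemany(
            # OR IGNORE skips rows whose sku already exists (assumes sku is unique)
            "INSERT OR IGNORE INTO products (sku, name, price) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)  # rows read from the file, not necessarily inserted
```

Passing a file object instead of a path keeps the function easy to exercise with in-memory data.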

4. Transform and Clean Data

When fixing data inconsistencies or normalizing values:
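A cleanup of this kind can often be expressed as one UPDATE that touches only rows that are not already normalized. The sketch below uses Python with sqlite3 instead of CQL, and assumes a hypothetical users.email column.

```python
import sqlite3


def normalize_emails(conn: sqlite3.Connection) -> int:
    """Trim whitespace and lowercase email addresses that are not yet normalized."""
    with conn:
        cur = conn.execute(
            "UPDATE users SET email = lower(trim(email)) "
            "WHERE email IS NOT NULL AND email <> lower(trim(email))"
        )
    return cur.rowcount  # number of rows that needed cleaning
```

Note that SQLite's built-in lower() only folds ASCII characters; other databases may behave differently.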

5. Archive Old Data

When moving old records to archive tables:
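Archiving is usually a copy followed by a delete inside one transaction, so a failure cannot lose or duplicate rows. This is a sketch in Python with sqlite3, not CQL; the orders and orders_archive tables, their columns, and ISO-8601 text dates are assumptions.

```python
import sqlite3


def archive_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders created before `cutoff` (ISO date string) into orders_archive."""
    with conn:  # copy and delete succeed or fail together
        conn.execute(
            "INSERT INTO orders_archive (id, total, created_at) "
            "SELECT id, total, created_at FROM orders WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of orders moved
```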

Running Data Migrations

Manual Execution

Run a specific data migration:

With Database Connection

Ensure the database is configured:

In Production

As Part of Deployment

Add the data migration step to your deployment script:

Best Practices

1. Make Migrations Idempotent

Data migrations should be safe to run multiple times:
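The usual trick is to make the statement record that it has been applied, so a rerun matches no rows. A minimal sketch in Python with sqlite3 (not CQL), assuming a hypothetical accounts table with a bonus_applied flag:

```python
import sqlite3


def apply_welcome_bonus(conn: sqlite3.Connection) -> int:
    """Credit each account once; the flag prevents double-crediting on reruns."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts "
            "SET balance = balance + 10, bonus_applied = 1 "
            "WHERE bonus_applied = 0"  # already-credited rows are skipped
        )
    return cur.rowcount
```

Without the flag, `balance = balance + 10` would add the bonus again every time the migration ran.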

2. Use Transactions

Wrap operations in transactions when possible:
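To illustrate the effect, here is a sketch in Python with sqlite3 (not the CQL transaction API): the connection's context manager commits when the block succeeds and rolls back when it raises. The accounts table and its CHECK constraint are hypothetical.

```python
import sqlite3


def move_credits(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> None:
    """Move credits between two accounts atomically."""
    with conn:  # rolls back both statements if either one fails
        conn.execute(
            "UPDATE accounts SET credits = credits + ? WHERE id = ?", (amount, to_id)
        )
        # Assumes CHECK (credits >= 0); overdrawing raises sqlite3.IntegrityError,
        # which also undoes the credit applied by the statement above.
        conn.execute(
            "UPDATE accounts SET credits = credits - ? WHERE id = ?", (amount, from_id)
        )
```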

3. Batch Processing

Process large datasets in batches:
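One common batching approach is keyset pagination: fetch a fixed number of rows ordered by primary key, process them, and continue from the last id seen. The sketch below uses Python with sqlite3 rather than CQL; the users table with email and email_domain columns is hypothetical.

```python
import sqlite3


def backfill_email_domain(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Populate email_domain in batches so only one batch is held in memory."""
    last_id, total = 0, 0
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users "
            "WHERE id > ? AND email IS NOT NULL ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        with conn:  # one short transaction per batch
            conn.executemany(
                "UPDATE users SET email_domain = ? WHERE id = ?",
                [(email.split("@")[-1], uid) for uid, email in rows],
            )
        last_id = rows[-1][0]  # resume after the last processed id
        total += len(rows)
    return total
```

Paging by `id > last_id` avoids the cost of large OFFSET values and stays correct even if earlier rows change during the run.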

4. Progress Reporting

Provide feedback for long-running migrations:
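A simple form of progress reporting is to count the rows up front and emit a message after each batch. This sketch is in Python with sqlite3, not CQL; the tags table and the `report` callback parameter are assumptions.

```python
import sqlite3


def lowercase_tags(conn: sqlite3.Connection, batch_size: int = 500, report=print) -> int:
    """Lowercase tags.name in batches, reporting progress after each batch."""
    total = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
    done, last_id = 0, 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM tags WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        with conn:
            conn.execute(
                "UPDATE tags SET name = lower(name) WHERE id BETWEEN ? AND ?",
                (ids[0], ids[-1]),
            )
        last_id = ids[-1]
        done += len(ids)
        report(f"{done}/{total} rows processed")
    return done
```

Taking the reporter as a parameter lets the same code write to the console, a logger, or a list in tests.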

5. Error Handling

Handle errors gracefully:
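One possible reading of "gracefully" is to skip and record bad rows instead of aborting the whole run. A sketch in Python with sqlite3 (not CQL), assuming a hypothetical products table with a free-text price_text column and a new price_cents column:

```python
import sqlite3


def convert_prices(conn: sqlite3.Connection) -> list:
    """Convert price_text to integer cents; return (id, error) for rows that fail."""
    failures = []
    rows = conn.execute("SELECT id, price_text FROM products").fetchall()
    with conn:
        for pid, text in rows:
            try:
                cents = round(float(text) * 100)
            except (TypeError, ValueError) as exc:
                failures.append((pid, str(exc)))  # record the problem and keep going
                continue
            conn.execute(
                "UPDATE products SET price_cents = ? WHERE id = ?", (cents, pid)
            )
    return failures
```

The returned list can be logged or written to a report so the failed rows can be fixed and retried. Whether to continue or abort on the first error depends on the migration; aborting is safer when rows depend on each other.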

6. Verification

Verify results after migration:
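A verification step can be as small as counting the rows that still violate the migration's goal and failing loudly if any remain. The sketch below is Python with sqlite3 rather than CQL, and checks a hypothetical users.full_name backfill.

```python
import sqlite3


def verify_full_name_backfill(conn: sqlite3.Connection) -> None:
    """Raise if any user is still missing the hypothetical full_name column."""
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE full_name IS NULL"
    ).fetchone()[0]
    if remaining:
        raise RuntimeError(f"{remaining} users still have no full_name")
```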

Testing Data Migrations

Create tests for data migrations:
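The shape of such a test is the same in most frameworks: seed a small database, run the migration, and assert on the result, including a second run. This sketch uses Python's unittest and sqlite3 instead of Crystal's spec framework; the normalize_status migration and the users.status column are hypothetical.

```python
import sqlite3
import unittest


def normalize_status(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: map legacy status values onto 'active'."""
    with conn:
        conn.execute(
            "UPDATE users SET status = 'active' WHERE status IN ('enabled', 'ok')"
        )


class NormalizeStatusTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh in-memory database with known seed data.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
        self.conn.executemany(
            "INSERT INTO users (status) VALUES (?)", [("enabled",), ("disabled",)]
        )

    def tearDown(self):
        self.conn.close()

    def statuses(self):
        rows = self.conn.execute("SELECT status FROM users ORDER BY id")
        return [row[0] for row in rows]

    def test_maps_legacy_values(self):
        normalize_status(self.conn)
        self.assertEqual(self.statuses(), ["active", "disabled"])

    def test_is_idempotent(self):
        normalize_status(self.conn)
        normalize_status(self.conn)
        self.assertEqual(self.statuses(), ["active", "disabled"])
```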

Tracking Data Migrations

Migration Log

Create a table to track data migrations:
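A tracking table needs little more than the migration name and a timestamp. The following is a sketch in Python with sqlite3, not a CQL schema definition; the table name data_migration_log and its columns are assumptions.

```python
import sqlite3


def ensure_migration_log(conn: sqlite3.Connection) -> None:
    """Create the hypothetical tracking table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_migration_log ("
        "  name TEXT PRIMARY KEY,"       # one row per data migration
        "  executed_at TEXT NOT NULL"    # ISO-8601 timestamp of the run
        ")"
    )
    conn.commit()
```

Using the name as the primary key means the database itself rejects a second record for the same migration.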

Record Execution
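Recording can be folded into a small wrapper that skips migrations already logged and writes the log entry in the same transaction as the data change. This is a sketch in Python with sqlite3, not the CQL API; the data_migration_log table and the run_once helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def run_once(conn: sqlite3.Connection, name: str, migrate) -> bool:
    """Run `migrate(conn)` unless `name` is already logged; return True if it ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_migration_log "
        "(name TEXT PRIMARY KEY, executed_at TEXT NOT NULL)"
    )
    seen = conn.execute(
        "SELECT 1 FROM data_migration_log WHERE name = ?", (name,)
    ).fetchone()
    if seen:
        return False
    with conn:  # data change and log entry commit together
        migrate(conn)
        conn.execute(
            "INSERT INTO data_migration_log (name, executed_at) VALUES (?, ?)",
            (name, datetime.now(timezone.utc).isoformat()),
        )
    return True
```

For the log entry and the data change to stay atomic, the `migrate` callable should not commit on its own.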

Troubleshooting

Migration Fails Midway

Problem: Large migration fails after processing many records

Solutions:

  • Implement checkpoints

  • Use batching with offset tracking

  • Make migrations resumable
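These three ideas combine naturally: process in batches, and after each batch store the last processed id so a rerun picks up where the previous one stopped. The sketch below is Python with sqlite3, not CQL; the items table and the migration_checkpoints table are hypothetical.

```python
import sqlite3


def upcase_codes(conn: sqlite3.Connection, name: str, batch_size: int = 100) -> int:
    """Uppercase items.code in batches, resuming from a stored checkpoint."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migration_checkpoints "
        "(name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT last_id FROM migration_checkpoints WHERE name = ?", (name,)
    ).fetchone()
    last_id = row[0] if row else 0  # start from scratch if no checkpoint exists
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        with conn:  # batch update and checkpoint commit together
            conn.execute(
                "UPDATE items SET code = upper(code) WHERE id BETWEEN ? AND ?",
                (ids[0], ids[-1]),
            )
            last_id = ids[-1]
            conn.execute(
                "INSERT OR REPLACE INTO migration_checkpoints (name, last_id) "
                "VALUES (?, ?)",
                (name, last_id),
            )
    return last_id
```

Because the checkpoint is written in the same transaction as the batch, a crash leaves the stored id pointing at the last batch that was fully committed.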

Out of Memory

Problem: Processing too many records at once

Solutions:

  • Use find_each or batching

  • Process in smaller chunks

  • Clear object caches

Slow Performance

Problem: Migration takes too long

Solutions:

  • Add database indexes

  • Use bulk operations

  • Optimize queries

  • Process in parallel (if safe)

See Also
