Complex Queries

CQL provides a powerful, type-safe query builder that allows you to compose sophisticated SQL queries using idiomatic Crystal code. This guide demonstrates how to build complex queries using Model.query and the CQL::Query interface.


Join Types and Strategies

Implicit vs Explicit Joins

CQL supports both implicit and explicit join syntax for different use cases:

# Implicit JOIN - CQL automatically detects relationships based on foreign keys
users_with_posts = User.query.joins(:posts).all(User)
# Generates: SELECT * FROM users INNER JOIN posts ON users.id = posts.user_id

# Explicit JOIN - Full control over join conditions
users_with_posts = User.query.join(Post) { users.id == posts.user_id }
                             .all(User)

Foreign Key Requirements for Implicit Joins

Important: Implicit joins in CQL require foreign key relationships to be explicitly defined in your schema. CQL uses these foreign key definitions to automatically infer the correct JOIN conditions.

Schema Definition with Foreign Keys

For implicit joins to work, you must define foreign key constraints in your schema:

AppDB = CQL::Schema.define(:app, adapter: CQL::Adapter::Postgres, uri: ENV["DATABASE_URL"]) do
  table :users do
    primary :id, Int32
    column :name, String
    column :email, String
    timestamps
  end

  table :posts do
    primary :id, Int32
    column :title, String
    column :body, String
    column :user_id, Int32  # Foreign key column
    timestamps

    # ✅ Required: Foreign key definition for implicit joins
    foreign_key [:user_id], references: :users, references_columns: [:id]
  end

  table :comments do
    primary :id, Int32
    column :content, String
    column :post_id, Int32
    column :user_id, Int32
    timestamps

    # Multiple foreign keys
    foreign_key [:post_id], references: :posts, references_columns: [:id]
    foreign_key [:user_id], references: :users, references_columns: [:id]
  end

  # Many-to-many join table
  table :post_tags do
    primary :id, Int32
    column :post_id, Int32
    column :tag_id, Int32

    # Both foreign keys defined for many-to-many relationships
    foreign_key [:post_id], references: :posts, references_columns: [:id]
    foreign_key [:tag_id], references: :tags, references_columns: [:id]
  end

  table :tags do
    primary :id, Int32
    column :name, String
  end
end

How CQL Uses Foreign Keys for Implicit Joins

When you use implicit joins, CQL:

  1. Looks up the foreign key definition in the schema

  2. Automatically determines the join condition based on the foreign key relationship

  3. Generates the appropriate SQL JOIN clause

# CQL analyzes the schema and finds:
# posts table has foreign_key [:user_id] references :users [:id]

users_with_posts = User.query.joins(:posts).all(User)
# CQL automatically generates:
# SELECT * FROM users INNER JOIN posts ON users.id = posts.user_id

Multi-Level Implicit Joins

With proper foreign key definitions, CQL can handle complex multi-level joins:

# Schema relationships: users -> posts -> comments
users_with_post_comments = User.query.joins(posts: :comments).all(User)
# Generates:
# SELECT * FROM users
# INNER JOIN posts ON users.id = posts.user_id
# INNER JOIN comments ON posts.id = comments.post_id

Without Foreign Keys - Explicit Joins Required

If foreign keys are not defined in the schema, you must use explicit joins:

# ❌ This won't work without foreign key definitions
User.query.joins(:posts).all(User)  # Error: Cannot infer join condition

# ✅ Use explicit joins when foreign keys are missing
User.query.join(Post) { |j| j.users.id.eq(j.posts.user_id) }.all(User)

Composite Foreign Keys

CQL supports composite foreign keys for complex relationships:

AppDB = CQL::Schema.define(:app, adapter: CQL::Adapter::Postgres, uri: ENV["DATABASE_URL"]) do
  table :order_items do
    primary :id, Int32
    column :order_id, Int32
    column :product_id, Int32
    column :quantity, Int32

    # Composite foreign key
    foreign_key [:order_id, :product_id],
                references: :order_products,
                references_columns: [:order_id, :product_id]
  end

  table :order_products do
    column :order_id, Int32
    column :product_id, Int32
    column :price, Float64

    primary [:order_id, :product_id]  # Composite primary key
  end
end

# CQL can handle composite foreign key joins
order_items_with_details = OrderItem.query.joins(:order_products).all(OrderItem)
# Generates: SELECT * FROM order_items
#           INNER JOIN order_products ON (order_items.order_id = order_products.order_id
#                                        AND order_items.product_id = order_products.product_id)

Foreign Key vs Column Reference

There's an important distinction between having a column that references another table and having a proper foreign key constraint:

# ❌ Column exists but no foreign key defined
table :posts do
  primary :id, Int32
  column :title, String
  column :user_id, Int32  # Just a column, not a foreign key
end

# ❌ Implicit joins won't work
User.query.joins(:posts).all(User)  # Error: No foreign key relationship found

# ✅ Proper foreign key definition
table :posts do
  primary :id, Int32
  column :title, String
  column :user_id, Int32

  # This tells CQL about the relationship
  foreign_key [:user_id], references: :users, references_columns: [:id]
end

# ✅ Now implicit joins work
User.query.joins(:posts).all(User)  # Success!

Checking Schema Foreign Keys

You can inspect your schema's foreign key definitions:

# Get foreign keys for a table
posts_foreign_keys = AppDB.posts.foreign_keys
posts_foreign_keys.each do |fk|
  puts "Foreign key: #{fk.name}"
  puts "Columns: #{fk.columns}"
  puts "References: #{fk.references_table}.#{fk.references_columns}"
end

# Check if a foreign key exists
has_user_fk = AppDB.posts.foreign_keys.any? { |fk| fk.references_table == :users }

Best Practices for Foreign Keys and Joins

  1. Always define foreign keys in your schema for relationships you'll query

  2. Use consistent naming for foreign key columns (e.g., user_id, post_id)

  3. Define foreign keys before creating tables in your schema

  4. Use explicit joins when you need custom join conditions beyond foreign key relationships

  5. Document complex relationships in your schema comments

AppDB = CQL::Schema.define(:app, adapter: CQL::Adapter::Postgres, uri: ENV["DATABASE_URL"]) do
  table :users do
    primary :id, Int32
    column :name, String
    column :email, String
    column :department_id, Int32
    timestamps

    # Foreign key for department relationship
    foreign_key [:department_id], references: :departments, references_columns: [:id]
  end

  table :posts do
    primary :id, Int32
    column :title, String
    column :body, String
    column :author_id, Int32      # Could be user_id, but using descriptive name
    column :category_id, Int32
    timestamps

    # Clear foreign key definitions
    foreign_key [:author_id], references: :users, references_columns: [:id]
    foreign_key [:category_id], references: :categories, references_columns: [:id]
  end
end

# Now all these implicit joins work seamlessly
users_with_posts = User.query.joins(:posts).all(User)
posts_with_authors = Post.query.joins(:author).all(Post)  # Uses author_id -> users.id
posts_with_categories = Post.query.joins(:category).all(Post)

This foreign key requirement ensures that CQL can provide type-safe, automatic join generation while maintaining clear relationships in your database schema.

Inner Joins (Default)

Inner joins return only records that have matching records in both tables:

# Automatic inner join based on foreign key relationship
active_users_with_posts = User.query.joins(:posts)
                                   .where { users.active == true }
                                   .all(User)

# Explicit inner join with custom conditions
users_with_recent_posts = User.query.join(Post) { users.id == posts.user_id }
                                   .where { posts.created_at > 1.month.ago }
                                   .all(User)

# Multiple inner joins
User.query.joins(posts: :comments)
         .where { comments.approved == true }
         .all(User)

Left Joins

Left joins return all records from the left table and matching records from the right table:

# All users including those without posts
all_users_with_optional_posts = User.query.left_joins(:posts).all(User)

# Explicit left join with conditions
User.query.left_join(Post) { users.id == posts.user_id }
         .where { (posts.id.is_null) | (posts.published == true) }
         .all(User)

# Find users who have no posts
users_without_posts = User.query.left_joins(:posts)
                                .where { posts.id.is_null }
                                .all(User)

Right Joins

Right joins return all records from the right table and matching records from the left table:

# All posts including orphaned ones (if any)
all_posts_with_optional_users = User.query.right_joins(:posts).all(User)

# Explicit right join
Post.query.right_join(User) { posts.user_id == users.id }
         .where { users.active == true }
         .all(Post)

Self Joins

Join a table to itself for hierarchical or comparative queries:

# Find users and their managers (assuming users table has manager_id)
User.query.from(users: :u)
         .join({users: :m}) { u.manager_id == m.id }
         .select { [u.name.as("employee"), m.name.as("manager")] }
         .all(User)

# Find users in the same city
User.query.from(users: :u1)
         .join({users: :u2}) { (u1.city == u2.city) & (u1.id != u2.id) }
         .select { [u1.name, u2.name, u1.city] }
         .all(User)

Cross Joins

Generate Cartesian product of two tables (use with caution):

# Cross join using explicit FROM syntax
User.query.from(users: :u, posts: :p)
         .where { u.active == true }
         .select { [u.name, p.title] }
         .all(User)

Advanced Filtering

Combine multiple conditions, logical operators, and expressions:

# Multiple AND/OR conditions
users = User.query.where { (age >= 18) & (active == true) }
                 .where { (name.like("A%")) | (email.like("%@gmail.com")) }
                 .all(User)

# Nested conditions with parentheses
admins = User.query.where { (role == "admin") & ((status == "active") | (status == "pending")) }
                   .all(User)

# IN and NOT IN operations
moderators = User.query.where { role.in(["admin", "moderator"]) }
                       .all(User)

excluded = User.query.where { id.not_in([1, 2, 3]) }
                     .all(User)

# NULL checks
users_with_profiles = User.query.where { profile_id.is_not_null }
                                .all(User)

# Range and comparison operations
adults = User.query.where { age.between(18, 65) }
                  .all(User)

recent_users = User.query.where { created_at > 1.week.ago }
                        .all(User)

Ordering and Sorting

Basic Ordering

# Single column ascending (default)
users = User.query.order(:name).all(User)

# Single column descending
users = User.query.order(:created_at, :desc).all(User)

# Multiple columns
users = User.query.order(:role)
                 .order(:name)
                 .all(User)

Advanced Ordering

# Order by calculated expressions
users = User.query.order { age * 2 }
                 .all(User)

# Order by joined table columns
users = User.query.joins(:posts)
                 .order { posts.created_at }
                 .desc
                 .all(User)

# Mixed ordering directions
users = User.query.order(:role)           # ASC
                 .order(:created_at, :desc) # DESC
                 .order(:name)              # ASC
                 .all(User)

# Reverse current ordering
users = User.query.order(:name)
                 .reverse_order  # Changes to DESC
                 .all(User)

Pagination and Limiting

Basic Pagination

# LIMIT and OFFSET
page_size = 20
page_number = 2
users = User.query.limit(page_size)
                 .offset((page_number - 1) * page_size)
                 .all(User)

# First N records
top_users = User.query.order(:points, :desc)
                     .limit(10)
                     .all(User)

Cursor-Based Pagination

# More efficient for large datasets
last_id = 1000
users = User.query.where { id > last_id }
                 .order(:id)
                 .limit(20)
                 .all(User)

Grouping, Aggregation, and Having

Use grouping and aggregate functions for reporting and analytics:

# Group by and aggregate
role_stats = User.query.select { [role, CQL.count(id).as("total")] }
                      .group_by(:role)
                      .all(User)

# Group by multiple columns
stats = Post.query.select { [user_id, status, CQL.count(id).as("post_count")] }
                  .group_by(:user_id, :status)
                  .all(Post)

# Having conditions on aggregates
active_users = User.query.select { [user_id, CQL.count(posts.id).as("post_count")] }
                        .joins(:posts)
                        .group_by(:user_id)
                        .having { CQL.count(posts.id) > 5 }
                        .all(User)

# Multiple aggregates
summary = Order.query.select { [
  CQL.sum(:total).as("total_sum"),
  CQL.avg(:total).as("avg_total"),
  CQL.min(:total).as("min_total"),
  CQL.max(:total).as("max_total"),
  CQL.count(:id).as("order_count")
] }.where { status == "paid" }
  .all(Order)

DISTINCT Queries

Remove duplicate results from your queries:

# Select distinct values
unique_cities = User.query.select(:city)
                         .distinct
                         .all(User)

# Distinct with multiple columns
unique_combinations = User.query.select(:role, :department)
                                .distinct
                                .all(User)

# Distinct with joins
active_post_authors = User.query.joins(:posts)
                               .where { posts.published == true }
                               .select(:id, :name)
                               .distinct
                               .all(User)

Subqueries

Use subqueries for advanced filtering and data retrieval:

# EXISTS subqueries
users_with_posts = User.query.where {
  CQL.exists(
    Post.query.select(:id)
        .where { posts.user_id == users.id }
        .where { posts.published == true }
  )
}.all(User)

# IN subqueries
prolific_users = User.query.where { id.in(
  Post.query.select(:user_id)
      .group_by(:user_id)
      .having { CQL.count(id) > 5 }
) }.all(User)

# Correlated subqueries with OR conditions
users_with_recent_activity = User.query.where {
  id.in(Post.query.select(:user_id).where { created_at > 1.week.ago }) |
  id.in(Comment.query.select(:user_id).where { created_at > 1.week.ago })
}.all(User)

# Scalar subqueries
users_with_post_count = User.query.select { [
  users.*,
  Post.query.select { CQL.count(:id) }
      .where { posts.user_id == users.id }
      .as("post_count")
] }.all(User)

Raw SQL Integration

Inspecting Generated SQL

# View the generated SQL and parameters
query = User.query.joins(:posts)
              .where { users.active == true }
              .where { posts.published == true }
              .order(:users.name)

sql, params = query.to_sql
puts "SQL: #{sql}"
puts "Params: #{params.inspect}"
# SQL: SELECT * FROM users INNER JOIN posts ON users.id = posts.user_id WHERE (users.active = ?) AND (posts.published = ?) ORDER BY users.name ASC
# Params: [true, true]

Executing Raw SQL

# Execute raw SQL through the schema
results = UserDB.exec("SELECT COUNT(*) as total FROM users WHERE active = ?", [true])

# Complex raw queries when query builder limitations are reached
sql = <<-SQL
  SELECT u.*,
         COUNT(p.id) as post_count,
         AVG(p.view_count) as avg_views
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
  WHERE u.active = true
  GROUP BY u.id, u.name, u.email
  HAVING COUNT(p.id) > 5
  ORDER BY post_count DESC, u.name
SQL

results = UserDB.exec(sql)

N+1 Query Prevention

One of the most common performance issues in ORMs is the N+1 query problem, where loading a collection of records results in 1 query for the main records plus N additional queries for each associated record. CQL provides several mechanisms to prevent this issue.

Understanding the N+1 Problem

# ❌ N+1 Query Problem - Inefficient
users = User.query.all                    # 1 query: SELECT * FROM users
users.each do |user|
  puts user.posts.size                    # N queries: SELECT * FROM posts WHERE user_id = ?
end
# Total: 1 + N queries (where N = number of users)

CQL's JOIN-Based Approach

CQL automatically uses efficient JOIN-based queries instead of separate queries for each association:

Has Many Associations

# ✅ Efficient: Single query with WHERE clause
user = User.find(1)
posts = user.posts.all
# Generates: SELECT * FROM posts WHERE posts.user_id = 1

Many-to-Many Associations

# ✅ Efficient: Single query with JOIN
movie = Movie.find(1)
actors = movie.actors.all
# Generates: SELECT actors.* FROM actors
#           INNER JOIN movies_actors ON actors.id = movies_actors.actor_id
#           WHERE movies_actors.movie_id = 1

Lazy Loading with Caching

CQL implements intelligent lazy loading that prevents duplicate queries:

user = User.find(1)

# First access - executes query and caches results
posts = user.posts.all                  # Query executed
puts posts.size                         # Uses cached data

# Subsequent access - uses cached data
user.posts.each { |post| puts post.title }  # No additional query

Efficient Association Operations

CQL provides methods to perform common operations without loading full records:

Count Without Loading

# ✅ Efficient: COUNT query without loading records
user = User.find(1)
post_count = user.posts_count
# Generates: SELECT COUNT(*) FROM posts WHERE posts.user_id = 1

# Many-to-many count
movie = Movie.find(1)
actor_count = movie.actors_count
# Generates: SELECT COUNT(*) FROM movies_actors WHERE movies_actors.movie_id = 1

Existence Checks Without Loading

# ✅ Efficient: EXISTS query
user = User.find(1)
has_posts = user.posts_any?
# Generates: SELECT 1 FROM posts WHERE posts.user_id = 1 LIMIT 1

# Check if specific record exists in association
actor = Actor.find(1)
movie = Movie.find(1)
is_associated = movie.actors_include?(actor)
# Generates: SELECT 1 FROM movies_actors
#           WHERE movies_actors.movie_id = 1 AND movies_actors.actor_id = 1 LIMIT 1

ID-Only Operations

# ✅ Efficient: Get IDs without loading full records
user = User.find(1)
post_ids = user.posts_ids
# Generates: SELECT posts.id FROM posts WHERE posts.user_id = 1

# Set associations by IDs (many-to-many)
movie = Movie.find(1)
movie.actors_ids = [1, 2, 3]  # Replaces current associations efficiently

Query Optimization Strategies

Batch Operations

# ✅ Efficient: Batch loading patterns
user_ids = [1, 2, 3, 4, 5]

# Load all posts for multiple users in one query
posts_by_user = Post.query.where(user_id: user_ids)
                          .group_by(&.user_id)

# Now access posts without additional queries
user_ids.each do |user_id|
  user_posts = posts_by_user[user_id]? || [] of Post
  puts "User #{user_id} has #{user_posts.size} posts"
end

Strategic Use of Joins

# ✅ Efficient: Use joins for filtering and aggregation
users_with_post_counts = User.query
  .joins(:posts)
  .select("users.*, COUNT(posts.id) as post_count")
  .group_by("users.id")
  .all

# No N+1 problem - all data loaded in single query
users_with_post_counts.each do |user|
  puts "#{user.name} has #{user.post_count} posts"
end

Performance Best Practices

# 1. Use specific column selection instead of SELECT *
users = User.query.select(:id, :name, :email)
                 .where { active == true }
                 .all(User)

# 2. Use EXISTS instead of IN for large subqueries
users = User.query.where {
  CQL.exists(Post.query.where { posts.user_id == users.id })
}.all(User)

# 3. Use LIMIT to prevent accidentally loading large datasets
recent_users = User.query.order(:created_at, :desc)
                        .limit(100)
                        .all(User)

# 4. Use indexes effectively with proper WHERE clause ordering
users = User.query.where { email == "user@example.com" }  # Assuming email is indexed
                 .where { active == true }
                 .all(User)

# 5. ✅ Prefer efficient association methods over loading full collections
# Bad: Loads all posts just to count them
users = User.query.all
users.each { |user| puts user.posts.size }  # N+1 queries + memory overhead

# Good: Use count methods
users = User.query.all
users.each { |user| puts user.posts_count }  # N COUNT queries (still not ideal)

# Better: Use joins for aggregation
users_with_counts = User.query
  .left_joins(:posts)
  .select("users.*, COUNT(posts.id) as post_count")
  .group_by("users.id")
  .all
users_with_counts.each { |user| puts user.post_count }  # Single query

Advanced N+1 Prevention Patterns

Manual Preloading

# For complex scenarios, manually preload associations
def load_users_with_posts(user_ids)
  # Load users
  users = User.query.where(id: user_ids).all

  # Preload posts in batch
  posts_by_user = Post.query.where(user_id: user_ids)
                            .group_by(&.user_id)

  # Create a lookup structure
  users.each do |user|
    # Associate posts with users in memory
    user_posts = posts_by_user[user.id]? || [] of Post
    # Store in instance variable or similar mechanism
  end

  users
end

Caching Association Counts

# Use database-level caching for frequently accessed counts
class User
  # Add posts_count column to users table
  property posts_count : Int32 = 0

  # Update count when posts are added/removed
  def update_posts_count!
    self.posts_count = posts.count
    save!
  end
end

# Now you can access post counts without queries
users = User.query.all
users.each { |user| puts user.posts_count }  # No additional queries

Monitoring and Debugging

Query Analysis

# Monitor query patterns to identify N+1 issues
def with_query_logging(&block)
  query_count = 0
  start_time = Time.utc

  # This would require database-level query logging
  # Implementation depends on your database adapter

  result = yield

  end_time = Time.utc
  puts "Executed #{query_count} queries in #{end_time - start_time} seconds"
  result
end

# Usage
with_query_logging do
  users = User.query.limit(10).all
  users.each { |user| puts user.posts_count }  # Monitor query count here
end

Performance Testing

# Test different approaches to verify performance
def benchmark_association_access
  require "benchmark"

  user_ids = User.query.limit(100).ids

  Benchmark.ips do |x|
    x.report("N+1 pattern") do
      users = User.query.where(id: user_ids).all
      users.each { |user| user.posts.size }
    end

    x.report("Count method") do
      users = User.query.where(id: user_ids).all
      users.each { |user| user.posts_count }
    end

    x.report("JOIN aggregation") do
      User.query.where(id: user_ids)
               .joins(:posts)
               .select("users.*, COUNT(posts.id) as post_count")
               .group_by("users.id")
               .all
    end

    x.compare!
  end
end

Query Optimization Tips

Query Analysis

# Analyze query performance by examining generated SQL
def analyze_query(query)
  sql, params = query.to_sql
  puts "=== Query Analysis ==="
  puts "SQL: #{sql}"
  puts "Parameters: #{params.inspect}"
  puts "Character length: #{sql.size}"
  puts "Join count: #{sql.scan(/JOIN/i).size}"
  puts "Where conditions: #{sql.scan(/WHERE|AND|OR/i).size}"
  puts "====================="
end

# Usage
query = User.query.joins(posts: :comments)
              .where { users.active == true }
              .where { posts.published == true }
              .where { comments.approved == true }

analyze_query(query)

Advanced Query Patterns

Conditional Queries

def build_user_search(name_filter = nil, email_filter = nil, active_only = false)
  query = User.query

  if name_filter
    query = query.where { name.like("%#{name_filter}%") }
  end

  if email_filter
    query = query.where { email.like("%#{email_filter}%") }
  end

  if active_only
    query = query.where { active == true }
  end

  query.order(:name)
end

# Usage
users = build_user_search(name_filter: "John", active_only: true).all(User)

Complex Aggregations

# Monthly user registration statistics
monthly_stats = User.query.select { [
  CQL.date_trunc("month", created_at).as("month"),
  CQL.count(:id).as("registrations"),
  CQL.count(CQL.case {
    when(active == true) { id }
  }).as("active_users")
] }
.where { created_at > 1.year.ago }
.group_by { CQL.date_trunc("month", created_at) }
.order { CQL.date_trunc("month", created_at) }
.all(User)

Window Functions (if supported by your database)

# User ranking by posts within each role
ranked_users = User.query.select { [
  users.*,
  CQL.count(posts.id).as("post_count"),
  CQL.row_number.over(
    partition_by: :role,
    order_by: CQL.count(posts.id).desc
  ).as("rank_in_role")
] }
.left_join(Post) { users.id == posts.user_id }
.group_by(:users.id, :users.role)
.all(User)

Best Practices Summary

Query Construction

  • Use table aliases for clarity in multi-join queries

  • Use block syntax for complex conditions to improve readability

  • Leverage automatic relationship detection for simpler joins

  • Use explicit joins when you need custom conditions

  • Define foreign keys in your schema to enable implicit joins and automatic relationship detection

Performance

  • Always use .select to limit columns and improve performance

  • Use .limit and .offset for pagination

  • Prefer EXISTS over IN for subqueries with large result sets

  • Use proper indexing strategy for WHERE clause columns

  • Leverage association count and existence methods to avoid N+1 queries

  • Use JOIN-based aggregation instead of iterating over associations

  • Monitor query patterns and optimize based on actual usage patterns

Maintainability

  • Use meaningful variable names and method extraction for complex queries

  • Use .to_sql to inspect and verify generated SQL

  • Test query performance with realistic data volumes

  • Document complex business logic in query methods

Security

  • Always use parameterized queries (CQL handles this automatically)

  • Validate user input before using in dynamic query construction

  • Use proper escaping for LIKE patterns


For more information on related query topics:

Last updated

Was this helpful?