Cost Tracking Guide

This guide explains how to use Flujo's integrated cost and token usage tracking features to monitor and control spending on LLM operations.

Quick Start

Configure pricing in your flujo.toml:

[cost.providers.openai.gpt-4o]
prompt_tokens_per_1k = 0.005
completion_tokens_per_1k = 0.015

Run a pipeline with automatic cost tracking:

from flujo import Step, Flujo

pipeline = Step.solution(my_agent)
runner = Flujo(pipeline)
result = runner.run("Your prompt")

# Access cost information
for step in result.step_history:
    print(f"{step.name}: ${step.cost_usd:.4f} ({step.token_counts} tokens)")

Set usage limits to prevent excessive spending:

from flujo import UsageLimits

limits = UsageLimits(total_cost_usd_limit=1.0, total_tokens_limit=5000)
runner = Flujo(pipeline, usage_limits=limits)

Configuration

Provider Pricing

Configure pricing for your LLM providers in flujo.toml:

[cost]
[cost.providers]

# OpenAI Models (Pricing: https://openai.com/pricing)
[cost.providers.openai.gpt-4o]
prompt_tokens_per_1k = 0.005
completion_tokens_per_1k = 0.015

[cost.providers.openai.gpt-4o-mini]
prompt_tokens_per_1k = 0.00015
completion_tokens_per_1k = 0.0006

[cost.providers.openai.gpt-3.5-turbo]
prompt_tokens_per_1k = 0.0005
completion_tokens_per_1k = 0.0015

# Anthropic Models (Pricing: https://www.anthropic.com/pricing)
[cost.providers.anthropic.claude-3-sonnet]
prompt_tokens_per_1k = 0.003
completion_tokens_per_1k = 0.015

[cost.providers.anthropic.claude-3-haiku]
prompt_tokens_per_1k = 0.00025
completion_tokens_per_1k = 0.00125

# Google Models (Pricing: https://ai.google.dev/pricing)
[cost.providers.google.gemini-1.5-pro]
prompt_tokens_per_1k = 0.0035
completion_tokens_per_1k = 0.0105

[cost.providers.google.gemini-1.5-flash]
prompt_tokens_per_1k = 0.000075
completion_tokens_per_1k = 0.0003

Pricing Structure

Provider: openai, anthropic, google, etc.
Model: Specific model name (e.g., gpt-4o, claude-3-sonnet)
Prompt tokens: Cost per 1,000 input tokens (prompt_tokens_per_1k)
Completion tokens: Cost per 1,000 output tokens (completion_tokens_per_1k)

Cost Calculation

Costs are calculated using the formula:

cost = (prompt_tokens / 1000) * prompt_tokens_per_1k +
       (completion_tokens / 1000) * completion_tokens_per_1k

Note: The pricing units use _per_1k (per 1,000 tokens) rather than _per_million_tokens to align with common provider pricing pages and provide more intuitive configuration values.

Image Generation Cost Tracking

Flujo supports automatic cost tracking for image generation models like DALL-E 3. The system automatically detects image models and attaches cost calculation post-processors.

Image Model Configuration

Configure image generation pricing in your flujo.toml:

# OpenAI image generation models
[cost.providers.openai."dall-e-3"]
prompt_tokens_per_1k = 0.0  # No token costs for image generation
completion_tokens_per_1k = 0.0  # No token costs for image generation
price_per_image_standard_1024x1024 = 0.040
price_per_image_hd_1024x1024 = 0.080
price_per_image_standard_1792x1024 = 0.080
price_per_image_hd_1792x1024 = 0.120
price_per_image_standard_1024x1792 = 0.080
price_per_image_hd_1024x1792 = 0.120

Supported Image Models

Flujo automatically detects and configures cost tracking for these image generation models:

OpenAI DALL-E: dall-e-2, dall-e-3
Midjourney: midjourney:v6
Stable Diffusion: stable-diffusion:xl
Google Imagen: imagen-2

Image Cost Calculation

Image costs are calculated based on: - Quality: standard or hd - Size: 1024x1024, 1792x1024, 1024x1792 - Number of images: Reported in the usage details

The cost formula is:

cost = number_of_images * price_per_image_{quality}_{size}

Using Image Models

Image models work seamlessly with the existing Flujo API:

from flujo import Step, Flujo
from flujo.infra.agents import make_agent_async

# Create a DALL-E 3 agent
dalle_agent = make_agent_async(
    model="openai:dall-e-3",
    system_prompt="Generate beautiful images",
    output_type=str,
)

# Create a pipeline
pipeline = Step.solution(dalle_agent)
runner = Flujo(pipeline)

# Run the pipeline
result = runner.run("Generate a landscape")

# Access cost information
for step in result.step_history:
    print(f"{step.name}: ${step.cost_usd:.4f}")

Image Model Features

Automatic Detection: Image models are automatically detected and configured
Quality Support: Different pricing for standard and HD quality
Size Support: Different pricing for various image sizes
Usage Limits: Image costs integrate with existing usage limits
Backward Compatibility: Chat models continue to work normally

Strict Pricing Mode

For production environments where cost accuracy is critical, Flujo provides a Strict Pricing Mode that ensures all cost calculations are based on your explicit configuration.

Enabling Strict Mode

Add the strict = true flag to your flujo.toml:

[cost]
strict = true  # <-- Enable strict pricing mode

[cost.providers.openai.gpt-4o]
prompt_tokens_per_1k = 0.005
completion_tokens_per_1k = 0.015

How Strict Mode Works

When strict mode is enabled:

Explicit Configuration Required: Every model used in your pipeline must have explicit pricing configured in flujo.toml
No Fallback to Hardcoded Defaults: The system will not use hardcoded default prices, even for common models
Immediate Failure: If a model is used without explicit pricing, the pipeline will fail immediately with a PricingNotConfiguredError

Example: Strict Mode Success

# flujo.toml
[cost]
strict = true

[cost.providers.openai.gpt-4o]
prompt_tokens_per_1k = 0.005
completion_tokens_per_1k = 0.015

[cost.providers.openai.gpt-3.5-turbo]
prompt_tokens_per_1k = 0.0005
completion_tokens_per_1k = 0.0015

[cost.providers.openai."dall-e-3"]
prompt_tokens_per_1k = 0.0
completion_tokens_per_1k = 0.0
price_per_image_standard_1024x1024 = 0.040

from flujo import Step, Flujo
from flujo.infra.agents import make_agent_async

# These agents will work with strict mode
chat_agent = make_agent_async("openai:gpt-4o", "You are helpful", str)
image_agent = make_agent_async("openai:dall-e-3", "Generate images", str)

pipeline = Step.solution(chat_agent) >> Step.validate(image_agent)
runner = Flujo(pipeline)

# This will work because all models have explicit pricing
result = runner.run("Generate a response and an image")

Example: Strict Mode Failure

# flujo.toml
[cost]
strict = true

[cost.providers.openai.gpt-4o]
prompt_tokens_per_1k = 0.005
completion_tokens_per_1k = 0.015

# Missing pricing for dall-e-3

from flujo import Step, Flujo
from flujo.infra.agents import make_agent_async
from flujo.exceptions import PricingNotConfiguredError

# This will fail because dall-e-3 has no pricing
image_agent = make_agent_async("openai:dall-e-3", "Generate images", str)
pipeline = Step.solution(image_agent)
runner = Flujo(pipeline)

try:
    result = runner.run("Generate an image")
except PricingNotConfiguredError as e:
    print(f"Pipeline failed: {e}")
    # Output: Pipeline failed: Pricing not configured for provider=openai, model=dall-e-3

Using Cost Tracking in Pipelines

Once configured, cost tracking is automatically enabled for all pipeline steps that use LLM agents.

Accessing Cost Information

Cost and token information is available in pipeline results:

from flujo import Step, Flujo

# Create a pipeline with cost tracking
pipeline = Step.solution(my_agent) >> Step.validate(validator_agent)
runner = Flujo(pipeline)

# Run the pipeline
result = runner.run("Your prompt")

# Access cost information
for step_result in result.step_history:
    print(f"Step: {step_result.name}")
    print(f"  Cost: ${step_result.cost_usd:.4f}")
    print(f"  Tokens: {step_result.token_counts}")
    print(f"  Success: {step_result.success}")

Setting Usage Limits

You can set cost and token limits to prevent excessive spending:

from flujo import Flujo, Step, UsageLimits

# Define usage limits
limits = UsageLimits(
    total_cost_usd_limit=1.0,    # Maximum $1.00 total cost
    total_tokens_limit=5000       # Maximum 5,000 tokens
)

# Apply limits to pipeline
runner = Flujo(pipeline, usage_limits=limits)

try:
    result = runner.run("Your prompt")
except UsageLimitExceededError as e:
    print(f"Pipeline failed due to usage limits: {e}")

Cost Tracking with Different Model Types

Flujo automatically handles different types of models:

Chat Models (Token-based)

# GPT-4, Claude, etc. - cost calculated from tokens
chat_agent = make_agent_async("openai:gpt-4o", "You are helpful", str)

Image Models (Unit-based)

# DALL-E 3 - cost calculated per image
image_agent = make_agent_async("openai:dall-e-3", "Generate images", str)

Embedding Models (Token-based)

# Text embeddings - cost calculated from tokens
embedding_agent = make_agent_async("openai:text-embedding-3-large", "Embed text", str)

Advanced Features

Explicit Cost Reporting

For custom operations, you can implement the ExplicitCostReporter protocol:

class CustomImageResult:
    def __init__(self, cost_usd: float, token_counts: int = 0):
        self.cost_usd = cost_usd
        self.token_counts = token_counts

# This object will automatically be recognized for cost tracking
result = CustomImageResult(cost_usd=0.25, token_counts=0)

Cost Tracking in Complex Pipelines

Cost tracking works seamlessly in complex pipeline scenarios:

from flujo import Step, Flujo, UsageLimits

# Create agents for different tasks
chat_agent = make_agent_async("openai:gpt-4o", "You are helpful", str)
image_agent = make_agent_async("openai:dall-e-3", "Generate images", str)
validator_agent = make_agent_async("openai:gpt-4o", "Validate responses", str)

# Create a complex pipeline
pipeline = (
    Step.solution(chat_agent) >> 
    Step.validate(validator_agent) >> 
    Step.reflect(image_agent)
)

# Set usage limits
limits = UsageLimits(total_cost_usd_limit=2.0, total_tokens_limit=10000)
runner = Flujo(pipeline, usage_limits=limits)

# Run the pipeline
result = runner.run("Complex task with multiple model types")

# Analyze costs
total_cost = sum(step.cost_usd for step in result.step_history)
total_tokens = sum(step.token_counts for step in result.step_history)

print(f"Total cost: ${total_cost:.4f}")
print(f"Total tokens: {total_tokens}")

Troubleshooting

Common Issues

No Cost Reported: Check that your model has pricing configured in flujo.toml
Incorrect Costs: Verify pricing values match the current provider rates
Missing Image Costs: Ensure image models have the correct pricing fields configured
Strict Mode Failures: Add explicit pricing for all models used in your pipeline

Debugging Cost Calculation

Enable debug logging to see cost calculation details:

import logging
logging.basicConfig(level=logging.INFO)

# Run your pipeline and check the logs for cost calculation details

Testing Cost Configuration

Use the provided examples to test your cost configuration:

# Test basic cost tracking
python examples/cost_tracking_demo.py

# Test image cost tracking
python examples/image_cost_tracking_demo.py