Agentic refinement

The core of this example is an agentic refinement loop: generate content, critique it, revise based on feedback, and repeat until quality meets a threshold. This pattern is fundamental to building self-improving AI systems.

The agentic pattern

Traditional pipelines are linear: input → process → output. Agentic workflows are iterative: they evaluate their own output and improve it through multiple cycles.

flowchart TD
    A[Generate] --> B[Critique]
    B -->|score >= threshold| C[Done]
    B -->|score < threshold| D[Revise]
    D --> B

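Stripped of the Flyte-specific details shown later on this page, the loop inside an async task looks roughly like this (a minimal sketch; the function names mirror the ones used in this example):

draft = await generate_initial_draft(topic)                      # Generate
for _ in range(max_iterations):
    critique = await critique_content(draft)                     # Critique
    if critique.score >= quality_threshold:                      # Done
        break
    draft = await revise_content(draft, critique.improvements)   # Revise
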
Critique function

The critique function evaluates the current draft and returns structured feedback. It’s a traced function (not a separate task) that runs inside refine_report:

generate.py
@flyte.trace
async def critique_content(draft: str) -> Critique:
    """
    Critique the current draft and return structured feedback.

    Uses Pydantic models to parse the LLM's JSON response into
    a typed object for reliable downstream processing.

    Args:
        draft: The current draft to critique

    Returns:
        Structured critique with score, strengths, and improvements
    """
    print("Critiquing current draft...")

    response = await call_llm(
        f"Please critique the following report:\n\n{draft}",
        CRITIC_SYSTEM_PROMPT,
        json_mode=True,
    )

    # Parse the JSON response into our Pydantic model
    critique_data = json.loads(response)
    critique = Critique(**critique_data)

    print(f"Critique score: {critique.score}/10")
    print(f"Strengths: {len(critique.strengths)}, Improvements: {len(critique.improvements)}")

    return critique

Key points:

  • Uses json_mode=True to ensure the LLM returns valid JSON
  • Parses the response into a Pydantic Critique model (sketched after this list)
  • Returns a typed object for reliable downstream processing
  • @flyte.trace provides checkpointing—if the task retries, completed critiques aren’t re-run
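
The Critique model itself is defined elsewhere in the example. Based on how it is used here (a numeric score plus lists of strengths and improvements), it is roughly a Pydantic model along these lines; treat the exact field types as an assumption:

from pydantic import BaseModel

class Critique(BaseModel):
    score: int                  # Overall quality on a 1-10 scale
    strengths: list[str]        # What the current draft does well
    improvements: list[str]     # Specific changes for the next revision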

Revise function

The revise function takes the current draft and specific improvements to address:

generate.py
@flyte.trace
async def revise_content(draft: str, improvements: list[str]) -> str:
    """
    Revise the draft based on critique feedback.

    Args:
        draft: The current draft to revise
        improvements: List of specific improvements to address

    Returns:
        The revised draft
    """
    print(f"Revising draft to address {len(improvements)} improvements...")

    improvements_text = "\n".join(f"- {imp}" for imp in improvements)
    prompt = f"""Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
{improvements_text}

CURRENT DRAFT:
{draft}"""

    revised = await call_llm(prompt, REVISER_SYSTEM_PROMPT)

    print(f"Revision complete ({len(revised)} characters)")
    return revised

The prompt includes:

  1. The list of improvements from the critique
  2. The current draft to revise

This focused approach helps the LLM make targeted changes rather than rewriting from scratch.
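
For example, with two hypothetical improvement strings, the rendered prompt would read:

Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
- Add concrete figures to support the main claims
- Tighten the conclusion to a single paragraph

CURRENT DRAFT:
<the current draft text>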

The refinement loop

The refine_report task orchestrates the iterative refinement. It runs in the reusable llm_env because it makes multiple LLM calls through traced functions:

generate.py
@llm_env.task(retries=3)
async def refine_report(
    topic: str,
    max_iterations: int = 3,
    quality_threshold: int = 8,
) -> str:
    """
    Iteratively refine a report until it meets the quality threshold.

    This task runs in a reusable container because it makes multiple LLM calls
    in a loop. The traced helper functions provide checkpointing, so if the
    task fails mid-loop, completed LLM calls won't be re-run on retry.

    Args:
        topic: The topic to write about
        max_iterations: Maximum refinement cycles (default: 3)
        quality_threshold: Minimum score to accept (default: 8)

    Returns:
        The final refined report
    """
    # Generate initial draft
    draft = await generate_initial_draft(topic)

    # Iterative refinement loop
    for i in range(max_iterations):
        with flyte.group(f"refinement_{i + 1}"):
            # Get critique
            critique = await critique_content(draft)

            # Check if we've met the quality threshold
            if critique.score >= quality_threshold:
                print(f"Quality threshold met at iteration {i + 1}!")
                print(f"Final score: {critique.score}/10")
                break

            # Revise based on feedback
            print(f"Score {critique.score} < {quality_threshold}, revising...")
            draft = await revise_content(draft, critique.improvements)
    else:
        print(f"Reached max iterations ({max_iterations})")

    return draft

How it works

  1. Generate initial draft: Creates the first version of the report
  2. Enter refinement loop: Iterates up to max_iterations times
  3. Critique: Evaluates the current draft and assigns a score
  4. Check threshold: If score meets quality_threshold, exit early
  5. Revise: If below threshold, revise based on improvements
  6. Repeat: Continue until threshold met or iterations exhausted

All the LLM calls (generate, critique, revise) are traced functions inside this single task. This keeps the task graph simple while the reusable container handles the actual LLM work efficiently.

Early exit

The if critique.score >= quality_threshold: break pattern enables early exit when quality is sufficient. This saves compute costs and time—no need to run all iterations if the first draft is already good.
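
Note that refine_report pairs the break with Python's for ... else construct: the else block runs only when the loop completes every iteration without a break, which is why "Reached max iterations" is printed only when the threshold was never met. In isolation (with a hypothetical quality_met() check standing in for the critique):

for i in range(max_iterations):
    if quality_met():    # threshold reached
        break            # skips the else block below
else:
    print(f"Reached max iterations ({max_iterations})")  # runs only if the loop never broke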

Grouping iterations with flyte.group

Each refinement iteration is wrapped in flyte.group:

for i in range(max_iterations):
    with flyte.group(f"refinement_{i + 1}"):
        critique = await critique_content(draft)
        # ...

Why use flyte.group?

Groups provide hierarchical organization in the Flyte UI. Since critique and revise are traced functions (not separate tasks), groups help organize them:

refine_report
├── generate_initial_draft (traced)
├── refinement_1
│   ├── critique_content (traced)
│   └── revise_content (traced)
├── refinement_2
│   ├── critique_content (traced)
│   └── revise_content (traced)
└── [returns refined report]

Benefits:

  • Clarity: See exactly how many iterations occurred
  • Debugging: Quickly find which iteration had issues
  • Observability: Track time spent in each refinement cycle

Group context

Groups are implemented as context managers. All traced calls and nested groups within the with flyte.group(...) block are associated with that group.
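
For illustration, a group can contain traced calls as well as other groups; the nested "revision" group below is illustrative and not part of the example:

with flyte.group("refinement_1"):
    critique = await critique_content(draft)
    with flyte.group("revision"):    # nested group, shown under refinement_1 in the UI
        draft = await revise_content(draft, critique.improvements)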

Configuring the loop

The refinement loop accepts parameters to tune its behavior:

Parameter          Default  Description
-----------------  -------  --------------------------------
max_iterations     3        Upper bound on refinement cycles
quality_threshold  8        Minimum score (1-10) to accept

Choosing thresholds

  • Higher threshold (9-10): More refinement cycles, higher quality, more API costs
  • Lower threshold (6-7): Faster completion, may accept lower quality
  • More iterations: Safety net for difficult topics
  • Fewer iterations: Cost control, faster turnaround

A good starting point is quality_threshold=8 with max_iterations=3. Adjust based on your quality requirements and budget.
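
For example, a run that prioritizes quality over cost might look like this (the call is illustrative; the topic string is hypothetical, and how you launch the task depends on the rest of your project):

report = await refine_report(
    topic="State of post-quantum cryptography adoption",
    max_iterations=5,        # more headroom for a difficult topic
    quality_threshold=9,     # accept only near-perfect drafts
)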

Best practices for agentic loops

  1. Always set max iterations: Prevent infinite loops if the quality threshold is never reached.

  2. Use structured critiques: Pydantic models ensure you can reliably extract the score and improvements from LLM responses.

  3. Log iteration progress: Print statements help debug when reviewing logs:

    print(f"Iteration {i + 1}: score={critique.score}")
  4. Consider diminishing returns: After 3-4 iterations, improvements often become marginal. Set max_iterations accordingly.

  5. Use groups for observability: flyte.group makes the iterative nature visible in the UI, essential for debugging and monitoring.

Next steps

With the agentic refinement loop complete, learn how to generate multiple outputs in parallel.