Agentic refinement

The core of this example is an agentic refinement loop: generate content, critique it, revise based on feedback, and repeat until quality meets a threshold. This pattern is fundamental to building self-improving AI systems.

The agentic pattern

Traditional pipelines are linear: input → process → output. Agentic workflows are iterative: they evaluate their own output and improve it through multiple cycles.

flowchart TD
    A[Generate] --> B[Critique]
    B -->|score >= threshold| C[Done]
    B -->|score < threshold| D[Revise]
    D --> B

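Stripped of the Flyte-specific details shown later on this page, the loop inside an async task looks roughly like this (a minimal sketch; the function names mirror the ones used in this example):

draft = await generate_initial_draft(topic)                      # Generate
for _ in range(max_iterations):
    critique = await critique_content(draft)                     # Critique
    if critique.score >= quality_threshold:                      # Done
        break
    draft = await revise_content(draft, critique.improvements)   # Revise
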
Critique function

The critique function evaluates the current draft and returns structured feedback. It’s a traced function (not a separate task) that runs inside refine_report:

generate.py
@flyte.trace
async def critique_content(draft: str) -> Critique:
    """
    Critique the current draft and return structured feedback.

    Uses Pydantic models to parse the LLM's JSON response into
    a typed object for reliable downstream processing.

    Args:
        draft: The current draft to critique

    Returns:
        Structured critique with score, strengths, and improvements
    """
    print("Critiquing current draft...")

    response = await call_llm(
        f"Please critique the following report:\n\n{draft}",
        CRITIC_SYSTEM_PROMPT,
        json_mode=True,
    )

    # Parse the JSON response into our Pydantic model
    critique_data = json.loads(response)
    critique = Critique(**critique_data)

    print(f"Critique score: {critique.score}/10")
    print(f"Strengths: {len(critique.strengths)}, Improvements: {len(critique.improvements)}")

    return critique

Key points:

  • Uses json_mode=True to ensure the LLM returns valid JSON
  • Parses the response into a Pydantic Critique model (sketched after this list)
  • Returns a typed object for reliable downstream processing
  • @flyte.trace provides checkpointing—if the task retries, completed critiques aren’t re-run
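
The Critique model itself is defined elsewhere in the example. Based on how it is used here (a numeric score plus lists of strengths and improvements), it is roughly a Pydantic model along these lines; treat the exact field types as an assumption:

from pydantic import BaseModel

class Critique(BaseModel):
    score: int                  # Overall quality on a 1-10 scale
    strengths: list[str]        # What the current draft does well
    improvements: list[str]     # Specific changes for the next revision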

Revise function

The revise function takes the current draft and specific improvements to address:

generate.py
@flyte.trace
async def revise_content(draft: str, improvements: list[str]) -> str:
    """
    Revise the draft based on critique feedback.

    Args:
        draft: The current draft to revise
        improvements: List of specific improvements to address

    Returns:
        The revised draft
    """
    print(f"Revising draft to address {len(improvements)} improvements...")

    improvements_text = "\n".join(f"- {imp}" for imp in improvements)
    prompt = f"""Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
{improvements_text}

CURRENT DRAFT:
{draft}"""

    revised = await call_llm(prompt, REVISER_SYSTEM_PROMPT)

    print(f"Revision complete ({len(revised)} characters)")
    return revised

The prompt includes:

  1. The list of improvements from the critique
  2. The current draft to revise

This focused approach helps the LLM make targeted changes rather than rewriting from scratch.
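
For example, with two hypothetical improvement strings, the rendered prompt would read:

Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
- Add concrete figures to support the main claims
- Tighten the conclusion to a single paragraph

CURRENT DRAFT:
<the current draft text>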

The refinement loop

The refine_report task orchestrates the iterative refinement. It runs in the reusable llm_env because it makes multiple LLM calls through traced functions:

generate.py
@llm_env.task(retries=3)
async def refine_report(
    topic: str,
    max_iterations: int = 3,
    quality_threshold: int = 8,
) -> str:
    """
    Iteratively refine a report until it meets the quality threshold.

    This task runs in a reusable container because it makes multiple LLM calls
    in a loop. The traced helper functions provide checkpointing, so if the
    task fails mid-loop, completed LLM calls won't be re-run on retry.

    Args:
        topic: The topic to write about
        max_iterations: Maximum refinement cycles (default: 3)
        quality_threshold: Minimum score to accept (default: 8)

    Returns:
        The final refined report
    """
    # Generate initial draft
    draft = await generate_initial_draft(topic)

    # Iterative refinement loop
    for i in range(max_iterations):
        with flyte.group(f"refinement_{i + 1}"):
            # Get critique
            critique = await critique_content(draft)

            # Check if we've met the quality threshold
            if critique.score >= quality_threshold:
                print(f"Quality threshold met at iteration {i + 1}!")
                print(f"Final score: {critique.score}/10")
                break

            # Revise based on feedback
            print(f"Score {critique.score} < {quality_threshold}, revising...")
            draft = await revise_content(draft, critique.improvements)
    else:
        print(f"Reached max iterations ({max_iterations})")

    return draft

How it works

  1. Generate initial draft: Creates the first version of the report
  2. Enter refinement loop: Iterates up to max_iterations times
  3. Critique: Evaluates the current draft and assigns a score
  4. Check threshold: If score meets quality_threshold, exit early
  5. Revise: If below threshold, revise based on improvements
  6. Repeat: Continue until threshold met or iterations exhausted

All the LLM calls (generate, critique, revise) are traced functions inside this single task. This keeps the task graph simple while the reusable container handles the actual LLM work efficiently.

Early exit

The if critique.score >= quality_threshold: break pattern enables early exit when quality is sufficient. This saves compute costs and time—no need to run all iterations if the first draft is already good.
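
Note that refine_report pairs the break with Python's for ... else construct: the else block runs only when the loop completes every iteration without a break, which is why "Reached max iterations" is printed only when the threshold was never met. In isolation (with a hypothetical quality_met() check standing in for the critique):

for i in range(max_iterations):
    if quality_met():    # threshold reached
        break            # skips the else block below
else:
    print(f"Reached max iterations ({max_iterations})")  # runs only if the loop never broke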

Grouping iterations with flyte.group

Each refinement iteration is wrapped in flyte.group:

for i in range(max_iterations):
    with flyte.group(f"refinement_{i + 1}"):
        critique = await critique_content(draft)
        # ...

Why use flyte.group?

Groups provide hierarchical organization in the Flyte UI. Since critique and revise are traced functions (not separate tasks), groups help organize them:

refine_report
├── generate_initial_draft (traced)
├── refinement_1
│   ├── critique_content (traced)
│   └── revise_content (traced)
├── refinement_2
│   ├── critique_content (traced)
│   └── revise_content (traced)
└── [returns refined report]

Benefits:

  • Clarity: See exactly how many iterations occurred
  • Debugging: Quickly find which iteration had issues
  • Observability: Track time spent in each refinement cycle

Group context

Groups are implemented as context managers. All traced calls and nested groups within the with flyte.group(...) block are associated with that group.
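
For illustration, a group can contain traced calls as well as other groups; the nested "revision" group below is illustrative and not part of the example:

with flyte.group("refinement_1"):
    critique = await critique_content(draft)
    with flyte.group("revision"):    # nested group, shown under refinement_1 in the UI
        draft = await revise_content(draft, critique.improvements)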

Configuring the loop

The refinement loop accepts parameters to tune its behavior:

Parameter          Default  Description
-----------------  -------  --------------------------------
max_iterations     3        Upper bound on refinement cycles
quality_threshold  8        Minimum score (1-10) to accept

Choosing thresholds

  • Higher threshold (9-10): More refinement cycles, higher quality, more API costs
  • Lower threshold (6-7): Faster completion, may accept lower quality
  • More iterations: Safety net for difficult topics
  • Fewer iterations: Cost control, faster turnaround

A good starting point is quality_threshold=8 with max_iterations=3. Adjust based on your quality requirements and budget.
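
For example, a run that prioritizes quality over cost might look like this (the call is illustrative; the topic string is hypothetical, and how you launch the task depends on the rest of your project):

report = await refine_report(
    topic="State of post-quantum cryptography adoption",
    max_iterations=5,        # more headroom for a difficult topic
    quality_threshold=9,     # accept only near-perfect drafts
)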

Best practices for agentic loops

  1. Always set max iterations: Prevent infinite loops if the quality threshold is never reached.

  2. Use structured critiques: Pydantic models ensure you can reliably extract the score and improvements from LLM responses.

  3. Log iteration progress: Print statements help debug when reviewing logs:

    print(f"Iteration {i + 1}: score={critique.score}")
  4. Consider diminishing returns: After 3-4 iterations, improvements often become marginal. Set max_iterations accordingly.

  5. Use groups for observability: flyte.group makes the iterative nature visible in the UI, essential for debugging and monitoring.

Next steps

With the agentic refinement loop complete, learn how to generate multiple outputs in parallel.