Agentic refinement
The core of this example is an agentic refinement loop: generate content, critique it, revise based on feedback, and repeat until quality meets a threshold. This pattern is fundamental to building self-improving AI systems.
The agentic pattern
Traditional pipelines are linear: input → process → output. Agentic workflows are iterative: they evaluate their own output and improve it through multiple cycles.
```mermaid
flowchart TD
    A[Generate] --> B[Critique]
    B -->|score >= threshold| C[Done]
    B -->|score < threshold| D[Revise]
    D --> B
```
Critique function
The critique function evaluates the current draft and returns structured feedback.
It’s a traced function (not a separate task) that runs inside refine_report:
```python
@flyte.trace
async def critique_content(draft: str) -> Critique:
    """
    Critique the current draft and return structured feedback.

    Uses Pydantic models to parse the LLM's JSON response into
    a typed object for reliable downstream processing.

    Args:
        draft: The current draft to critique

    Returns:
        Structured critique with score, strengths, and improvements
    """
    print("Critiquing current draft...")

    response = await call_llm(
        f"Please critique the following report:\n\n{draft}",
        CRITIC_SYSTEM_PROMPT,
        json_mode=True,
    )

    # Parse the JSON response into our Pydantic model
    critique_data = json.loads(response)
    critique = Critique(**critique_data)

    print(f"Critique score: {critique.score}/10")
    print(f"Strengths: {len(critique.strengths)}, Improvements: {len(critique.improvements)}")

    return critique
```
Key points:
- Uses `json_mode=True` to ensure the LLM returns valid JSON
- Parses the response into a Pydantic `Critique` model
- Returns a typed object for reliable downstream processing
- `@flyte.trace` provides checkpointing: if the task retries, completed critiques aren't re-run
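The `Critique` model itself is defined elsewhere in the example. A minimal sketch consistent with the fields used above (`score`, `strengths`, `improvements`) might look like this:

```python
from pydantic import BaseModel, Field


class Critique(BaseModel):
    """Structured feedback from the critic LLM (illustrative sketch, not the original definition)."""

    score: int = Field(ge=1, le=10, description="Overall quality score from 1 to 10")
    strengths: list[str] = Field(default_factory=list, description="What the draft does well")
    improvements: list[str] = Field(default_factory=list, description="Specific changes to make")
```

Because `call_llm` is invoked with `json_mode=True`, the critic's response can be parsed directly into this model with `Critique(**critique_data)`.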
Revise function
The revise function takes the current draft and specific improvements to address:
```python
@flyte.trace
async def revise_content(draft: str, improvements: list[str]) -> str:
    """
    Revise the draft based on critique feedback.

    Args:
        draft: The current draft to revise
        improvements: List of specific improvements to address

    Returns:
        The revised draft
    """
    print(f"Revising draft to address {len(improvements)} improvements...")

    improvements_text = "\n".join(f"- {imp}" for imp in improvements)

    prompt = f"""Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
{improvements_text}

CURRENT DRAFT:
{draft}"""

    revised = await call_llm(prompt, REVISER_SYSTEM_PROMPT)

    print(f"Revision complete ({len(revised)} characters)")

    return revised
```
The prompt includes:
- The list of improvements from the critique
- The current draft to revise
This focused approach helps the LLM make targeted changes rather than rewriting from scratch.
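The `call_llm` helper and the two system prompts are defined elsewhere in the example and aren't shown here. Purely as a sketch, assuming the OpenAI async client (the client choice and model name are assumptions, not part of the original example), it might look something like this:

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment


async def call_llm(prompt: str, system_prompt: str, json_mode: bool = False) -> str:
    """Send a prompt to the LLM and return the raw text reply (illustrative sketch)."""
    response = await client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        # Request a JSON object when the caller needs structured output (e.g., critiques)
        response_format={"type": "json_object"} if json_mode else {"type": "text"},
    )
    return response.choices[0].message.content or ""
```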
The refinement loop
The refine_report task orchestrates the iterative refinement. It runs in the
reusable llm_env because it makes multiple LLM calls through traced functions:
```python
@llm_env.task(retries=3)
async def refine_report(
    topic: str,
    max_iterations: int = 3,
    quality_threshold: int = 8,
) -> str:
    """
    Iteratively refine a report until it meets the quality threshold.

    This task runs in a reusable container because it makes multiple LLM calls
    in a loop. The traced helper functions provide checkpointing, so if the
    task fails mid-loop, completed LLM calls won't be re-run on retry.

    Args:
        topic: The topic to write about
        max_iterations: Maximum refinement cycles (default: 3)
        quality_threshold: Minimum score to accept (default: 8)

    Returns:
        The final refined report
    """
    # Generate initial draft
    draft = await generate_initial_draft(topic)

    # Iterative refinement loop
    for i in range(max_iterations):
        with flyte.group(f"refinement_{i + 1}"):
            # Get critique
            critique = await critique_content(draft)

            # Check if we've met the quality threshold
            if critique.score >= quality_threshold:
                print(f"Quality threshold met at iteration {i + 1}!")
                print(f"Final score: {critique.score}/10")
                break

            # Revise based on feedback
            print(f"Score {critique.score} < {quality_threshold}, revising...")
            draft = await revise_content(draft, critique.improvements)
    else:
        print(f"Reached max iterations ({max_iterations})")

    return draft
```
How it works
- Generate initial draft: Creates the first version of the report
- Enter refinement loop: Iterates up to `max_iterations` times
- Critique: Evaluates the current draft and assigns a score
- Check threshold: If score meets `quality_threshold`, exit early
- Revise: If below threshold, revise based on improvements
- Repeat: Continue until threshold met or iterations exhausted
All the LLM calls (generate, critique, revise) are traced functions inside this single task. This keeps the task graph simple while the reusable container handles the actual LLM work efficiently.
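The `llm_env` environment is defined elsewhere in the example. Purely as an illustration (the resource sizes and reuse settings below are assumptions, not the example's actual configuration), a reusable task environment might be declared roughly like this:

```python
import flyte

# Illustrative only: the real example defines llm_env with its own image,
# secrets, resources, and reuse settings.
llm_env = flyte.TaskEnvironment(
    name="llm_env",
    resources=flyte.Resources(cpu=1, memory="1Gi"),
    reusable=flyte.ReusePolicy(replicas=2, idle_ttl=300),
)
```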
Early exit
The `if critique.score >= quality_threshold: break` pattern enables early exit
when quality is sufficient. This saves compute costs and time: there is no need to run
all iterations if the first draft is already good.
Grouping iterations with flyte.group
Each refinement iteration is wrapped in `flyte.group`:

```python
for i in range(max_iterations):
    with flyte.group(f"refinement_{i + 1}"):
        critique = await critique_content(draft)
        # ...
```

Why use flyte.group?
Groups provide hierarchical organization in the Flyte UI. Since critique and revise are traced functions (not separate tasks), groups help organize them:
```
refine_report
├── generate_initial_draft (traced)
├── refinement_1
│   ├── critique_content (traced)
│   └── revise_content (traced)
├── refinement_2
│   ├── critique_content (traced)
│   └── revise_content (traced)
└── [returns refined report]
```

Benefits:
- Clarity: See exactly how many iterations occurred
- Debugging: Quickly find which iteration had issues
- Observability: Track time spent in each refinement cycle
Group context
Groups are implemented as context managers. All traced calls and nested groups
within the `with flyte.group(...)` block are associated with that group.
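For example, a nested group rolls up under its enclosing group in the UI. The sketch below uses hypothetical group and function names to show the shape:

```python
async def refine_once(draft: str) -> str:
    # Everything traced inside this block is attributed to "refinement_1"
    with flyte.group("refinement_1"):
        critique = await critique_content(draft)
        # A nested group appears as a child of refinement_1
        with flyte.group("polish"):
            draft = await revise_content(draft, critique.improvements)
    return draft
```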
Configuring the loop
The refinement loop accepts parameters to tune its behavior:
| Parameter | Default | Description |
|---|---|---|
| `max_iterations` | `3` | Upper bound on refinement cycles |
| `quality_threshold` | `8` | Minimum score (1-10) to accept |
Choosing thresholds
- Higher threshold (9-10): More refinement cycles, higher quality, more API costs
- Lower threshold (6-7): Faster completion, may accept lower quality
- More iterations: Safety net for difficult topics
- Fewer iterations: Cost control, faster turnaround
A good starting point is `quality_threshold=8` with `max_iterations=3`. Adjust
based on your quality requirements and budget.
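For example, a run with stricter settings could be launched roughly as follows; the config path and topic are placeholders, and launching via `flyte.run` is an assumption about how the example is executed:

```python
import flyte

flyte.init_from_config("config.yaml")  # placeholder config path

run = flyte.run(
    refine_report,
    topic="Reusable containers for LLM pipelines",  # placeholder topic
    max_iterations=5,      # allow extra cycles for a harder topic
    quality_threshold=9,   # demand a higher score before accepting
)
print(run.url)
```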
Best practices for agentic loops
- Always set max iterations: Prevent infinite loops if the quality threshold is never reached.
- Use structured critiques: Pydantic models ensure you can reliably extract the score and improvements from LLM responses.
- Log iteration progress: Print statements help debug when reviewing logs: `print(f"Iteration {i + 1}: score={critique.score}")`
- Consider diminishing returns: After 3-4 iterations, improvements often become marginal. Set `max_iterations` accordingly.
- Use groups for observability: `flyte.group` makes the iterative nature visible in the UI, essential for debugging and monitoring.
Next steps
With the agentic refinement loop complete, learn how to generate multiple outputs in parallel.