Build Your Own AI File Editor
A practical guide with code examples. Implement fallback cascades, matching algorithms, and the verification loop that makes AI file editing robust.
The Golden Rule: Never Fail on First Try
LLMs are non-deterministic. They will hallucinate whitespace, forget context, and misquote code. Your file editing system must handle these variations gracefully.
The Fallback Ladder
Implement your editing logic as a waterfall. Each tier is a fallback for the previous one:
- Exact Match β Fast, safe, preferred
- Whitespace Flexible β Handle indentation differences
- Anchor Matching β Match first/last lines, fuzzy middle
- Diff/Patch β Use diff-match-patch libraries
- Full Overwrite β Nuclear option, always works
Alternative approach β Hash-Backed Anchors: To eliminate the fallback cascade entirely, move the model onto an anchor language. Dirac does this with persistent word anchors; Oh My Pi does it with compact line-plus-hash markers. In both cases the model references stable anchors directly instead of describing code to find.
Step 1: Define Your Tools
Provide your AI with two distinct tools: one for surgical edits, one for full rewrites.
Tool A: replace_in_file (Preferred)
<replace_in_file>
<path>src/app.py</path>
<diff>
<<<< SEARCH
def calculate_total(items):
return sum(item.price for item in items)
====
def calculate_total(items):
subtotal = sum(item.price for item in items)
tax = subtotal * 0.1
return subtotal + tax
>>>> REPLACE
</diff>
</replace_in_file>
Tool B: write_to_file (Fallback / Creation)
<write_to_file>
<path>src/utils.py</path>
<content>
def helper():
"""Helper function."""
return True
</content>
</write_to_file>
Why XML Over JSON?
Code inside JSON strings requires escaping (\",
\n, \\). This is error-prone for LLMs
and wastes tokens. XML lets you embed code content directly
without escaping.
Step 2: Implement Matching Algorithms
Tier 1: Exact Match
The happy path. Fast and reliable when it works.
def exact_match(file_content: str, search: str, replace: str) -> str | None:
"""Try exact string replacement."""
if search in file_content:
return file_content.replace(search, replace, 1) # Replace first occurrence
return None # Failed, try next tier
Tier 2: Whitespace Flexible
Normalize whitespace before matching, but preserve original indentation in output.
def whitespace_flexible_match(file_content: str, search: str, replace: str) -> str | None:
"""Match after normalizing whitespace."""
file_lines = file_content.split('\n')
search_lines = search.split('\n')
# Trim each line for comparison
search_trimmed = [line.strip() for line in search_lines]
for i in range(len(file_lines) - len(search_lines) + 1):
file_chunk = file_lines[i:i + len(search_lines)]
file_trimmed = [line.strip() for line in file_chunk]
if file_trimmed == search_trimmed:
# Match found! Calculate indentation from original
base_indent = len(file_lines[i]) - len(file_lines[i].lstrip())
indent = ' ' * base_indent
# Apply indentation to replacement
replace_lines = replace.split('\n')
indented_replace = [indent + line if line.strip() else line
for line in replace_lines]
# Reconstruct file
return '\n'.join(
file_lines[:i] +
indented_replace +
file_lines[i + len(search_lines):]
)
return None # Failed, try next tier
Tier 3: Anchor Matching
Match first and last lines as "anchors", fuzzy-match the middle. Based on OpenCode/Cline.
def anchor_match(file_content: str, search: str, replace: str,
similarity_threshold: float = 0.5) -> str | None:
"""Match using first/last lines as anchors."""
file_lines = file_content.split('\n')
search_lines = search.split('\n')
if len(search_lines) < 3:
return None # Need at least 3 lines for anchor matching
start_anchor = search_lines[0].strip()
end_anchor = search_lines[-1].strip()
expected_length = len(search_lines)
# Find start anchor
for i, line in enumerate(file_lines):
if line.strip() != start_anchor:
continue
# Look for end anchor within reasonable range
for j in range(i + 2, min(i + expected_length * 2, len(file_lines))):
if file_lines[j].strip() != end_anchor:
continue
# Found potential match. Verify middle similarity.
file_middle = file_lines[i+1:j]
search_middle = search_lines[1:-1]
if calculate_similarity(file_middle, search_middle) >= similarity_threshold:
# Match confirmed! Replace the block.
replace_lines = replace.split('\n')
return '\n'.join(
file_lines[:i] +
replace_lines +
file_lines[j+1:]
)
return None # Failed, try next tier
def calculate_similarity(lines_a: list, lines_b: list) -> float:
"""Calculate similarity between two line lists (0.0 to 1.0)."""
if not lines_a or not lines_b:
return 0.0
# Simple token-based Jaccard similarity
tokens_a = set(' '.join(lines_a).split())
tokens_b = set(' '.join(lines_b).split())
intersection = len(tokens_a & tokens_b)
union = len(tokens_a | tokens_b)
return intersection / union if union > 0 else 0.0
Tier 4: Diff-Match-Patch
Use Google's diff-match-patch library for fuzzy patching.
from diff_match_patch import diff_match_patch
def dmp_match(file_content: str, search: str, replace: str) -> str | None:
"""Use diff-match-patch for fuzzy matching."""
dmp = diff_match_patch()
# Create a patch from search β replace
patches = dmp.patch_make(search, replace)
# Try to apply it to the file
result, success = dmp.patch_apply(patches, file_content)
if all(success):
return result
return None # Some patches failed
The Master Function
Chain all tiers together:
def apply_edit(file_content: str, search: str, replace: str) -> tuple[str, str]:
"""Apply edit with fallback cascade. Returns (new_content, method_used)."""
# Tier 1: Exact match
result = exact_match(file_content, search, replace)
if result is not None:
return result, "exact_match"
# Tier 2: Whitespace flexible
result = whitespace_flexible_match(file_content, search, replace)
if result is not None:
return result, "whitespace_flexible"
# Tier 3: Anchor matching
result = anchor_match(file_content, search, replace)
if result is not None:
return result, "anchor_match"
# Tier 4: Diff-match-patch
result = dmp_match(file_content, search, replace)
if result is not None:
return result, "diff_match_patch"
# All tiers failed
raise EditFailedError(
f"Could not match search block. "
f"Tried: exact, whitespace, anchor, dmp. "
f"Search block:\n{search[:200]}..."
)
Alternative Path: Strict Exact-Match Contracts
You do not always need a fallback ladder. ADK-Rust shows the opposite design: keep the editor contract sharp, fail immediately on ambiguous matches, and let the model retry with better context.
def strict_replace_once(file_content: str, old: str, new: str) -> str:
matches = file_content.count(old)
if matches == 0:
raise EditFailedError("old_str not found in file")
if matches > 1:
raise EditFailedError(
"old_str appears multiple times; provide a more specific match"
)
return file_content.replace(old, new, 1)
When this works well
Frameworks that wrap provider-native tools often prefer this contract. It reduces local complexity, avoids risky guesses, and gives the model a crisp error to recover from.
Alternative Path: Hashline Anchors
If you want something more drift-resistant than exact old/new text, but lighter than a persistent anchor database, Oh My Pi's hashline pattern is a strong middle path. Read the file in an anchored format, let the model target those anchors, and allow a small local rebase window before you declare failure.
BIGRAMS = load_anchor_bigrams() # 647 compact tokens
def hashline_for(line_no: int, text: str) -> str:
normalized = significant_content(text)
short_hash = xxhash32(normalized) % len(BIGRAMS)
return f"{line_no}~{BIGRAMS[short_hash]}|{text}"
def apply_hashline_edit(lines: list[str], anchor: str, new_text: str) -> list[str]:
line_no, short_hash, _original = parse_hashline(anchor)
candidate_indexes = nearby_indexes(line_no - 1, radius=5)
for idx in candidate_indexes:
if hashline_for(idx + 1, lines[idx]).split("|", 1)[0] == f"{idx + 1}~{short_hash}":
updated = lines[:]
updated[idx] = new_text
return updated
raise EditFailedError("hashline anchor no longer matches nearby lines")
When this works well
Use hashline anchors when your model often misquotes code or your files are moving under active edits, but you still want a compact text protocol instead of a full AST or persistent anchor store.
Step 3: The Verification Loop
Don't assume an edit works. Validate it immediately using LSP or a linter.
import subprocess
import json
def verify_edit(file_path: str) -> list[dict]:
"""Run linter and return any errors."""
# Example: Use pylint for Python files
if file_path.endswith('.py'):
result = subprocess.run(
['pylint', '--output-format=json', file_path],
capture_output=True, text=True
)
if result.stdout:
return json.loads(result.stdout)
# Example: Use eslint for JS/TS
elif file_path.endswith(('.js', '.ts', '.jsx', '.tsx')):
result = subprocess.run(
['eslint', '--format=json', file_path],
capture_output=True, text=True
)
if result.stdout:
data = json.loads(result.stdout)
return data[0].get('messages', []) if data else []
return []
def apply_and_verify(file_path: str, search: str, replace: str) -> str:
"""Apply edit and return result with diagnostics."""
# Read current file
with open(file_path, 'r') as f:
content = f.read()
# Apply edit
new_content, method = apply_edit(content, search, replace)
# Write file
with open(file_path, 'w') as f:
f.write(new_content)
# Verify
errors = verify_edit(file_path)
# Build response
response = f"Edit applied successfully using {method}."
if errors:
response += "\n\n<file_diagnostics>\n"
for err in errors:
line = err.get('line', '?')
msg = err.get('message', str(err))
response += f"Line {line}: {msg}\n"
response += "</file_diagnostics>"
return response
When the AI sees syntax errors in the response, it can self-correct without waiting for the user to run the code.
Step 4: Design Good Error Messages
When an edit fails, give the AI enough context to self-correct.
def format_edit_error(file_path: str, search: str, actual_content: str) -> str:
"""Format a helpful error message for the AI."""
# Find similar content in the file
search_first_line = search.split('\n')[0].strip()
similar_lines = []
for i, line in enumerate(actual_content.split('\n'), 1):
if search_first_line[:20] in line:
# Found potential match location
context_start = max(0, i - 3)
context_end = min(len(actual_content.split('\n')), i + 5)
similar_lines.append((i, actual_content.split('\n')[context_start:context_end]))
error_msg = f"""
# SEARCH block failed to match!
The following SEARCH block was not found in {file_path}:
```
{search[:500]}{'...' if len(search) > 500 else ''}
```
"""
if similar_lines:
error_msg += "## Did you mean one of these sections?\n\n"
for line_num, context in similar_lines[:3]:
error_msg += f"### Near line {line_num}:\n```\n"
error_msg += '\n'.join(context)
error_msg += "\n```\n\n"
else:
error_msg += """
## No similar content found.
Consider:
1. Use `read_file` to get the current file contents
2. Check if the file path is correct
3. If the file has changed, your context may be stale
"""
error_msg += """
## Remember:
- SEARCH must match EXACTLY (character-for-character)
- Include all whitespace, comments, and indentation
- Use 2-3 lines of context before and after for uniqueness
"""
return error_msg
Step 5: Write Your System Prompt
Here's a template based on the patterns we've seen in production agents:
You are an AI coding assistant with file editing capabilities.
## Available Tools
### replace_in_file
Make targeted edits to existing files using SEARCH/REPLACE blocks.
**Format:**
```xml
<replace_in_file>
<path>relative/path/to/file</path>
<diff>
<<<< SEARCH
[exact content to find]
====
[new content to replace with]
>>>> REPLACE
</diff>
</replace_in_file>
```
**Critical Rules:**
1. SEARCH must match EXACTLY (character-for-character, including whitespace)
2. Include 2-3 lines of context before/after for unique matching
3. Multiple SEARCH/REPLACE blocks must appear in file order
4. To delete code: Leave REPLACE section empty
5. To add code: Include surrounding context in SEARCH, add new lines in REPLACE
### write_to_file
Create new files or completely rewrite existing files.
**Format:**
```xml
<write_to_file>
<path>relative/path/to/file</path>
<content>
[entire file content]
</content>
</write_to_file>
```
**When to use:**
- Creating new files
- File is small (< 50 lines) and changing most of it
- replace_in_file has failed 3+ times
## Workflow
1. **Before editing:** Use read_file to get current content
2. **Prefer replace_in_file:** It's safer and more precise
3. **After editing:** Check the response for <file_diagnostics>
4. **If diagnostics show errors:** Fix them immediately
5. **If edit fails:** Re-read the file and try again with exact content
6. **After 3 failures:** Fall back to write_to_file
## Anti-Patterns
- NEVER use write_to_file on existing files without reading them first
- NEVER guess at file contentβread it first
- NEVER use placeholder comments like "... rest of code ..."
Architecture Overview
Extract path, search/replace content from AI output
Get actual content for matching
Try exact β whitespace β anchor β dmp
Save changes to disk
Check for syntax errors immediately
Success with any errors, or failure with suggestions
Best Practices Checklist
β Do
- Implement multiple fallback tiers
- Verify edits with LSP/linter
- Return helpful error messages
- Use 1-indexed line numbers for AI
- Support both surgical and full-file edits
- Handle Unicode normalization (smart quotes, em-dashes)
- Show diff previews before writing
- Track edit history for undo
β Don't
- Pass code inside JSON strings (escaping nightmare)
- Trust AI whitespace to be perfect
- Fail silentlyβalways explain what went wrong
- Skip verification after edits
- Allow overwrites without reading first
- Use 0-indexed lines in AI prompts
- Ignore model-specific quirks
Final Summary: What We Learned
| Concept | Recommendation | Source Inspiration |
|---|---|---|
| Primary edit method | Search/Replace with fallbacks | Cline, Aider, OpenCode |
| Fallback depth | 4-5 tiers minimum | OpenCode (9), Aider (5) |
| Output format | XML or custom markers | Cline (XML), Codex (custom) |
| Error handling | Suggest similar content + recovery path | Aider's error templates |
| Verification | LSP/linter check after every edit | OpenCode's diagnostics |
| User approval | Diff preview with configurable auto-approve | Grok CLI's confirmation system |
| Multi-file edits | Patch format or sequential operations | Codex patch syntax |
| Unicode handling | Normalize smart quotes, dashes, whitespace | Codex's seek_sequence |
Ready to Build?
You now have the patterns, code examples, and best practices from the top AI coding agents. Go build something amazing.