Build Your Own AI File Editor

A practical guide with code examples. Implement fallback cascades, matching algorithms, and the verification loop that makes AI file editing robust.

The Golden Rule: Never Fail on First Try

LLMs are non-deterministic. They will hallucinate whitespace, forget context, and misquote code. Your file editing system must handle these variations gracefully.

🔑

The Fallback Ladder

Implement your editing logic as a waterfall. Each tier is a fallback for the previous one:

Exact Match — Fast, safe, preferred
Whitespace Flexible — Handle indentation differences
Anchor Matching — Match first/last lines, fuzzy middle
Diff/Patch — Use diff-match-patch libraries
Full Overwrite — Nuclear option, always works

Alternative approach — Hash-Backed Anchors: To eliminate the fallback cascade entirely, move the model onto an anchor language. Dirac does this with persistent word anchors; Oh My Pi does it with compact line-plus-hash markers. In both cases the model references stable anchors directly instead of describing code to find.

Step 1: Define Your Tools

Provide your AI with two distinct tools: one for surgical edits, one for full rewrites.

Tool A: replace_in_file (Preferred)

XML Format

<replace_in_file>
    <path>src/app.py</path>
    <diff>
        <<<< SEARCH
        def calculate_total(items):
            return sum(item.price for item in items)
        ====
        def calculate_total(items):
            subtotal = sum(item.price for item in items)
            tax = subtotal * 0.1
            return subtotal + tax
        >>>> REPLACE
    </diff>
</replace_in_file>

Tool B: write_to_file (Fallback / Creation)

XML Format

<write_to_file>
    <path>src/utils.py</path>
    <content>
def helper():
    """Helper function."""
    return True
    </content>
</write_to_file>

💡

Why XML Over JSON?

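Code inside JSON strings requires escaping (\", \n, \\). This is error-prone for LLMs and wastes tokens. XML-style tags let you embed code directly without escaping, provided you extract the tag contents leniently (string or regex scanning) rather than with a strict XML parser, which would reject the raw < and & characters that code contains.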
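
A lenient parser for the replace_in_file call might look like the sketch below. It assumes the four-character markers used in this guide and that the diff body may be indented inside the tags. The name parse_replace_in_file and the returned (path, blocks) shape are illustrative, not taken from any particular agent.

```python
import re
import textwrap

# Outer tool call: path plus raw diff body (regex, not a strict XML parser)
TOOL_RE = re.compile(
    r"<replace_in_file>\s*<path>(?P<path>.*?)</path>\s*"
    r"<diff>\n?(?P<diff>.*?)</diff>\s*</replace_in_file>",
    re.DOTALL,
)
# One SEARCH/REPLACE block; each marker must start a line
BLOCK_RE = re.compile(
    r"^<<<< SEARCH\n(?P<search>.*?)\n?^====\n(?P<replace>.*?)\n?^>>>> REPLACE[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def parse_replace_in_file(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (path, [(search, replace), ...]) from a model response."""
    call = TOOL_RE.search(text)
    if call is None:
        raise ValueError("no <replace_in_file> call found")
    # Remove indentation that comes from nesting inside the tags
    diff = textwrap.dedent(call.group("diff"))
    blocks = [(b.group("search"), b.group("replace")) for b in BLOCK_RE.finditer(diff)]
    if not blocks:
        raise ValueError("no SEARCH/REPLACE blocks found in <diff>")
    return call.group("path").strip(), blocks
```

An empty REPLACE section parses as an empty string, which lets the same path handle deletions.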

Step 2: Implement Matching Algorithms

Tier 1: Exact Match

The happy path. Fast and reliable when it works.

Python

def exact_match(file_content: str, search: str, replace: str) -> str | None:
    """Try exact string replacement."""
    # Guard against an empty search, which would "match" at offset 0
    if search and search in file_content:
        return file_content.replace(search, replace, 1)  # Replace first occurrence only
    return None  # Failed, try next tier

Tier 2: Whitespace Flexible

Normalize whitespace before matching, but preserve original indentation in output.

Python

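def whitespace_flexible_match(file_content: str, search: str, replace: str) -> str | None:
    """Match after normalizing whitespace."""
    file_lines = file_content.split('\n')
    search_lines = search.split('\n')

    # Trim each line for comparison
    search_trimmed = [line.strip() for line in search_lines]
    if not any(search_trimmed):
        return None  # A blank search would match any blank line

    for i in range(len(file_lines) - len(search_lines) + 1):
        file_chunk = file_lines[i:i + len(search_lines)]
        file_trimmed = [line.strip() for line in file_chunk]

        if file_trimmed == search_trimmed:
            # Match found! Compare the file's real indent (tabs or spaces)
            # with the indent the model used on the first SEARCH line.
            file_indent = file_lines[i][:len(file_lines[i]) - len(file_lines[i].lstrip())]
            model_indent = search_lines[0][:len(search_lines[0]) - len(search_lines[0].lstrip())]

            # Swap the model's base indent for the file's on each replacement
            # line, so indentation the model already supplied is not doubled
            indented_replace = []
            for line in replace.split('\n'):
                if not line.strip():
                    indented_replace.append(line)
                elif line.startswith(model_indent):
                    indented_replace.append(file_indent + line[len(model_indent):])
                else:
                    indented_replace.append(file_indent + line.lstrip())

            # Reconstruct file
            return '\n'.join(
                file_lines[:i] +
                indented_replace +
                file_lines[i + len(search_lines):]
            )

    return None  # Failed, try next tier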

Tier 3: Anchor Matching

Treat the first and last lines of the search block as anchors and fuzzy-match the lines between them. Based on the approach used by OpenCode and Cline.

Python

def anchor_match(file_content: str, search: str, replace: str, 
                 similarity_threshold: float = 0.5) -> str | None:
    """Match using first/last lines as anchors."""
    file_lines = file_content.split('\n')
    search_lines = search.split('\n')
    
    if len(search_lines) < 3:
        return None  # Need at least 3 lines for anchor matching
    
    start_anchor = search_lines[0].strip()
    end_anchor = search_lines[-1].strip()
    expected_length = len(search_lines)
    
    # Find start anchor
    for i, line in enumerate(file_lines):
        if line.strip() != start_anchor:
            continue
            
        # Look for end anchor within reasonable range
        for j in range(i + 2, min(i + expected_length * 2, len(file_lines))):
            if file_lines[j].strip() != end_anchor:
                continue
            
            # Found potential match. Verify middle similarity.
            file_middle = file_lines[i+1:j]
            search_middle = search_lines[1:-1]
            
            if calculate_similarity(file_middle, search_middle) >= similarity_threshold:
                # Match confirmed! Replace the block.
                replace_lines = replace.split('\n')
                return '\n'.join(
                    file_lines[:i] + 
                    replace_lines + 
                    file_lines[j+1:]
                )
    
    return None  # Failed, try next tier


def calculate_similarity(lines_a: list, lines_b: list) -> float:
    """Calculate similarity between two line lists (0.0 to 1.0)."""
    if not lines_a or not lines_b:
        return 0.0
    
    # Simple token-based Jaccard similarity
    tokens_a = set(' '.join(lines_a).split())
    tokens_b = set(' '.join(lines_b).split())
    
    intersection = len(tokens_a & tokens_b)
    union = len(tokens_a | tokens_b)
    
    return intersection / union if union > 0 else 0.0
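
Jaccard similarity ignores token order and repetition, so a middle section with the same words in a different arrangement still scores 1.0. If that is too permissive, an order-aware score from the standard library's difflib is one alternative. The sketch below is a possible drop-in for calculate_similarity; the name sequence_similarity is hypothetical, and any threshold would need retuning because the two scores are not on the same scale.

```python
from difflib import SequenceMatcher


def sequence_similarity(lines_a: list[str], lines_b: list[str]) -> float:
    """Order-aware similarity between two line lists (0.0 to 1.0)."""
    if not lines_a or not lines_b:
        return 0.0
    # Compare trimmed lines so indentation drift does not lower the score
    text_a = '\n'.join(line.strip() for line in lines_a)
    text_b = '\n'.join(line.strip() for line in lines_b)
    # autojunk=False keeps the heuristic from discarding frequent characters
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
```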

Tier 4: Diff-Match-Patch

Use Google's diff-match-patch library for fuzzy patching.

Python

from diff_match_patch import diff_match_patch

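def dmp_match(file_content: str, search: str, replace: str) -> str | None:
    """Use diff-match-patch for fuzzy matching."""
    dmp = diff_match_patch()

    # Patch offsets are relative to the search text, so with the default
    # Match_Distance (1000) only blocks near the start of the file are found.
    # Widening it lets the block match anywhere; treat the factor as a
    # starting point to tune, not a recommended value.
    dmp.Match_Distance = max(1000, len(file_content) * 4)

    # Create a patch from search → replace
    patches = dmp.patch_make(search, replace)
    if not patches:
        return None  # search == replace, nothing to apply

    # Try to apply it to the file
    result, success = dmp.patch_apply(patches, file_content)

    if all(success):
        return result
    return None  # Some patches failed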

The Master Function

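Chain the four matching tiers together. The fifth tier, full overwrite, is not tried automatically: when every matcher fails, the function raises and the model falls back to write_to_file.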

Python

class EditFailedError(Exception):
    """Raised when no tier can locate the search block."""


def apply_edit(file_content: str, search: str, replace: str) -> tuple[str, str]:
    """Apply edit with fallback cascade. Returns (new_content, method_used)."""
    
    # Tier 1: Exact match
    result = exact_match(file_content, search, replace)
    if result is not None:
        return result, "exact_match"
    
    # Tier 2: Whitespace flexible
    result = whitespace_flexible_match(file_content, search, replace)
    if result is not None:
        return result, "whitespace_flexible"
    
    # Tier 3: Anchor matching
    result = anchor_match(file_content, search, replace)
    if result is not None:
        return result, "anchor_match"
    
    # Tier 4: Diff-match-patch
    result = dmp_match(file_content, search, replace)
    if result is not None:
        return result, "diff_match_patch"
    
    # All tiers failed
    raise EditFailedError(
        f"Could not match search block. "
        f"Tried: exact, whitespace, anchor, dmp. "
        f"Search block:\n{search[:200]}..."
    )
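
The write_to_file tool can be backed by something as small as the sketch below. It assumes you want the overwrite to be atomic, so a crash mid-write never leaves a half-written file. The name write_whole_file is hypothetical, and note that the replaced file takes on the temporary file's permissions.

```python
import os
import tempfile


def write_whole_file(path: str, content: str) -> None:
    """Replace the file at `path` with `content` in one atomic step."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" writes the content's line endings unchanged
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)  # Atomic swap of old and new content
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```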

Alternative Path: Strict Exact-Match Contracts

You do not always need a fallback ladder. ADK-Rust shows the opposite design: keep the editor contract sharp, fail immediately on ambiguous matches, and let the model retry with better context.

Python

def strict_replace_once(file_content: str, old: str, new: str) -> str:
    matches = file_content.count(old)

    if matches == 0:
        raise EditFailedError("old_str not found in file")

    if matches > 1:
        raise EditFailedError(
            "old_str appears multiple times; provide a more specific match"
        )

    return file_content.replace(old, new, 1)

🧭

When this works well

Frameworks that wrap provider-native tools often prefer this contract. It reduces local complexity, avoids risky guesses, and gives the model a crisp error to recover from.

Alternative Path: Hashline Anchors

If you want something more drift-resistant than exact old/new text, but lighter than a persistent anchor database, Oh My Pi's hashline pattern is a strong middle path. Read the file in an anchored format, let the model target those anchors, and allow a small local rebase window before you declare failure.

Python (illustrative)

BIGRAMS = load_anchor_bigrams()  # 647 compact tokens

def hashline_for(line_no: int, text: str) -> str:
    normalized = significant_content(text)
    short_hash = xxhash32(normalized) % len(BIGRAMS)
    return f"{line_no}~{BIGRAMS[short_hash]}|{text}"


def apply_hashline_edit(lines: list[str], anchor: str, new_text: str) -> list[str]:
    line_no, short_hash, _original = parse_hashline(anchor)
    candidate_indexes = nearby_indexes(line_no - 1, radius=5)

    for idx in candidate_indexes:
        if hashline_for(idx + 1, lines[idx]).split("|", 1)[0] == f"{idx + 1}~{short_hash}":
            updated = lines[:]
            updated[idx] = new_text
            return updated

    raise EditFailedError("hashline anchor no longer matches nearby lines")

🧪

When this works well

Use hashline anchors when your model often misquotes code or your files are moving under active edits, but you still want a compact text protocol instead of a full AST or persistent anchor store.
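
For experimenting with the idea, here is a self-contained variant of the illustrative code above. It is a sketch under stated assumptions, not Oh My Pi's implementation: a truncated SHA-1 digest stands in for the xxhash-plus-bigram scheme, and line_tag, render_anchored, and apply_anchor_edit are hypothetical names.

```python
import hashlib


class EditFailedError(Exception):
    """Raised when an anchor cannot be resolved."""


def line_tag(text: str, tag_len: int = 4) -> str:
    """Short content tag: SHA-1 of the line with all whitespace removed, truncated."""
    normalized = "".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:tag_len]


def render_anchored(lines: list[str]) -> list[str]:
    """Format lines as 'N~tag|text' for the model to read."""
    return [f"{n}~{line_tag(t)}|{t}" for n, t in enumerate(lines, 1)]


def apply_anchor_edit(lines: list[str], anchor: str, new_text: str,
                      radius: int = 5) -> list[str]:
    """Replace the line named by 'N~tag', searching up to `radius` lines away."""
    number, tag = anchor.split("~", 1)
    target = int(number) - 1
    # Check the stated line first, then widen one line at a time
    for offset in range(radius + 1):
        for idx in sorted({target - offset, target + offset}):
            if 0 <= idx < len(lines) and line_tag(lines[idx], len(tag)) == tag:
                return lines[:idx] + [new_text] + lines[idx + 1:]
    raise EditFailedError(f"anchor {anchor} no longer matches nearby lines")
```

Short tags can collide, so the nearest match wins. A longer tag_len trades a few tokens for fewer false matches.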

Step 3: The Verification Loop

Don't assume an edit works. Validate it immediately using LSP or a linter.

Python

import subprocess
import json

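def verify_edit(file_path: str) -> list[dict]:
    """Run linter and return any errors."""

    if file_path.endswith('.py'):
        # Example: Use pylint for Python files
        command = ['pylint', '--output-format=json', file_path]
    elif file_path.endswith(('.js', '.ts', '.jsx', '.tsx')):
        # Example: Use eslint for JS/TS
        command = ['eslint', '--format=json', file_path]
    else:
        return []

    try:
        result = subprocess.run(command, capture_output=True, text=True)
        data = json.loads(result.stdout) if result.stdout else []
    except FileNotFoundError:
        # Linter is not installed: report it rather than crashing the edit
        return [{'line': '?', 'message': f'{command[0]} is not installed; edit not verified'}]
    except json.JSONDecodeError:
        return [{'line': '?', 'message': f'{command[0]} produced unreadable output'}]

    if command[0] == 'eslint':
        # eslint reports one entry per file, each with its own message list
        return data[0].get('messages', []) if data else []
    return data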


def apply_and_verify(file_path: str, search: str, replace: str) -> str:
    """Apply edit and return result with diagnostics."""
    
    # Read current file
    with open(file_path, 'r') as f:
        content = f.read()
    
    # Apply edit
    new_content, method = apply_edit(content, search, replace)
    
    # Write file
    with open(file_path, 'w') as f:
        f.write(new_content)
    
    # Verify
    errors = verify_edit(file_path)
    
    # Build response
    response = f"Edit applied successfully using {method}."
    
    if errors:
        response += "\n\n<file_diagnostics>\n"
        for err in errors:
            line = err.get('line', '?')
            msg = err.get('message', str(err))
            response += f"Line {line}: {msg}\n"
        response += "</file_diagnostics>"
    
    return response

When the AI sees syntax errors or other lint diagnostics in the response, it can self-correct without waiting for the user to run the code.

Step 4: Design Good Error Messages

When an edit fails, give the AI enough context to self-correct.

Python

def format_edit_error(file_path: str, search: str, actual_content: str) -> str:
    """Format a helpful error message for the AI."""
    
    # Find similar content in the file
    file_lines = actual_content.split('\n')
    probe = search.strip().split('\n')[0].strip()[:20]
    
    similar_lines = []
    if probe:  # An empty probe would "match" every line
        for i, line in enumerate(file_lines, 1):
            if probe in line:
                # Found potential match location: show 2 lines before, 5 after
                context_start = max(0, i - 3)
                context_end = min(len(file_lines), i + 5)
                similar_lines.append((i, file_lines[context_start:context_end]))
    
    error_msg = f"""
# SEARCH block failed to match!

The following SEARCH block was not found in {file_path}:

```
{search[:500]}{'...' if len(search) > 500 else ''}
```

"""
    
    if similar_lines:
        error_msg += "## Did you mean one of these sections?\n\n"
        for line_num, context in similar_lines[:3]:
            error_msg += f"### Near line {line_num}:\n```\n"
            error_msg += '\n'.join(context)
            error_msg += "\n```\n\n"
    else:
        error_msg += """
## No similar content found.

Consider:
1. Use `read_file` to get the current file contents
2. Check if the file path is correct
3. If the file has changed, your context may be stale
"""
    
    error_msg += """
## Remember:
- SEARCH must match EXACTLY (character-for-character)
- Include all whitespace, comments, and indentation
- Use 2-3 lines of context before and after for uniqueness
"""
    
    return error_msg
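
Substring probing misses near-misses such as a renamed parameter or a typo inside the first 20 characters. A fuzzy lookup with the standard library's difflib.get_close_matches is one way to widen the net. This is a sketch: suggest_similar_lines is a hypothetical helper, and the 0.6 cutoff is difflib's default, not a tuned value.

```python
from difflib import get_close_matches


def suggest_similar_lines(search: str, actual_content: str,
                          limit: int = 3, cutoff: float = 0.6) -> list[tuple[int, str]]:
    """Return (1-indexed line number, line) pairs that resemble the first SEARCH line."""
    probe = search.strip().split('\n')[0].strip()
    if not probe:
        return []
    file_lines = actual_content.split('\n')
    stripped = [line.strip() for line in file_lines]
    # Fuzzy-match against trimmed lines so indentation does not affect the score
    close = set(get_close_matches(probe, stripped, n=limit, cutoff=cutoff))
    hits = [(i, file_lines[i - 1]) for i in range(1, len(file_lines) + 1)
            if stripped[i - 1] in close]
    return hits[:limit]
```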

Step 5: Write Your System Prompt

Here's a template based on the patterns we've seen in production agents:

System Prompt Template

You are an AI coding assistant with file editing capabilities.

## Available Tools

### replace_in_file
Make targeted edits to existing files using SEARCH/REPLACE blocks.

**Format:**
```xml
<replace_in_file>
<path>relative/path/to/file</path>
<diff>
<<<< SEARCH
[exact content to find]
====
[new content to replace with]
>>>> REPLACE
</diff>
</replace_in_file>
```

**Critical Rules:**
1. SEARCH must match EXACTLY (character-for-character, including whitespace)
2. Include 2-3 lines of context before/after for unique matching
3. Multiple SEARCH/REPLACE blocks must appear in file order
4. To delete code: Leave REPLACE section empty
5. To add code: Include surrounding context in SEARCH, add new lines in REPLACE

### write_to_file
Create new files or completely rewrite existing files.

**Format:**
```xml
<write_to_file>
<path>relative/path/to/file</path>
<content>
[entire file content]
</content>
</write_to_file>
```

**When to use:**
- Creating new files
- File is small (< 50 lines) and changing most of it
- replace_in_file has failed 3+ times

## Workflow

1. **Before editing:** Use read_file to get current content
2. **Prefer replace_in_file:** It's safer and more precise
3. **After editing:** Check the response for <file_diagnostics>
4. **If diagnostics show errors:** Fix them immediately
5. **If edit fails:** Re-read the file and try again with exact content
6. **After 3 failures:** Fall back to write_to_file

## Anti-Patterns

- NEVER use write_to_file on existing files without reading them first
- NEVER guess at file content—read it first
- NEVER use placeholder comments like "... rest of code ..."

Architecture Overview

Parse Tool Call

Extract path, search/replace content from AI output

↓

Read Current File

Get actual content for matching

↓

Apply Fallback Cascade

Try exact → whitespace → anchor → dmp

↓

Write Updated File

Save changes to disk

↓

Verify with LSP/Linter

Check for syntax errors immediately

↓

Return Result + Diagnostics

Success with any errors, or failure with suggestions
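
The flow above, including the failure branch, can be wired together roughly as follows. This is a sketch: each stage is passed in as a callable so the example stands alone. In practice you would pass functions such as apply_edit, verify_edit, and format_edit_error from this guide. The handle_edit name and its parameter names are hypothetical.

```python
from typing import Callable


class EditFailedError(Exception):
    """Raised by the matching cascade when no tier succeeds."""


def handle_edit(
    path: str,
    search: str,
    replace: str,
    read: Callable[[str], str],
    apply: Callable[[str, str, str], tuple[str, str]],
    write: Callable[[str, str], None],
    verify: Callable[[str], list[dict]],
    explain: Callable[[str, str, str], str],
) -> str:
    """Run one edit through the pipeline and return the text sent back to the model."""
    content = read(path)
    try:
        new_content, method = apply(content, search, replace)
    except EditFailedError:
        # Failure path: nothing is written, the model gets suggestions instead
        return explain(path, search, content)
    write(path, new_content)
    diagnostics = verify(path)
    lines = [f"Edit applied using {method}."]
    lines += [f"Line {d.get('line', '?')}: {d.get('message', d)}" for d in diagnostics]
    return "\n".join(lines)
```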

Best Practices Checklist

✅ Do

Implement multiple fallback tiers
Verify edits with LSP/linter
Return helpful error messages
Use 1-indexed line numbers for AI
Support both surgical and full-file edits
Handle Unicode normalization (smart quotes, em-dashes)
Show diff previews before writing
Track edit history for undo

❌ Don't

Pass code inside JSON strings (escaping nightmare)
Trust AI whitespace to be perfect
Fail silently—always explain what went wrong
Skip verification after edits
Allow overwrites without reading first
Use 0-indexed lines in AI prompts
Ignore model-specific quirks

Final Summary: What We Learned

Concept	Recommendation	Source Inspiration
Primary edit method	Search/Replace with fallbacks	Cline, Aider, OpenCode
Fallback depth	4-5 tiers minimum	OpenCode (9), Aider (5)
Output format	XML or custom markers	Cline (XML), Codex (custom)
Error handling	Suggest similar content + recovery path	Aider's error templates
Verification	LSP/linter check after every edit	OpenCode's diagnostics
User approval	Diff preview with configurable auto-approve	Grok CLI's confirmation system
Multi-file edits	Patch format or sequential operations	Codex patch syntax
Unicode handling	Normalize smart quotes, dashes, whitespace	Codex's seek_sequence
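
For the diff preview in the user approval row, the standard library's difflib.unified_diff is usually enough. A minimal sketch, where diff_preview is a hypothetical name and the a/ and b/ prefixes simply mimic git's convention:

```python
import difflib


def diff_preview(path: str, old: str, new: str, context: int = 3) -> str:
    """Return a unified diff of the pending change for the user to approve."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,  # Lines of unchanged context around each hunk
    ))
```

An empty string means the edit is a no-op, which is worth reporting back to the model instead of writing the file.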

Ready to Build?

You now have the patterns, code examples, and best practices from the top AI coding agents. Go build something amazing.
