Your documentation is stuck in an aging wiki, a clunky CMS, or scattered across multiple systems. You need to migrate. But how do you move thousands of pages without breaking everything? This guide walks you through the process.

Signs It's Time to Migrate#

You probably need to migrate if:

Developers avoid the docs because they're hard to navigate
Search doesn't work or returns irrelevant results
Updates are painful, requiring special tools or permissions
Mobile experience is broken or non-existent
Analytics are missing so you can't measure effectiveness
The platform is unsupported or reaching end-of-life

The cost of staying often exceeds the cost of migrating.

Migration Strategy Overview#

Text

┌─────────────────────────────────────────────────────────────┐
│                     Migration Timeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1        Phase 2        Phase 3        Phase 4       │
│  ────────       ────────       ────────       ────────      │
│  Audit &        Content        Technical      Launch &      │
│  Plan           Migration      Setup          Redirect      │
│                                                             │
│  [2-3 weeks]    [4-6 weeks]    [2-3 weeks]    [1-2 weeks]   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Phase 1: Audit and Planning#

Content Inventory#

First, know what you have:

Bash

# Export sitemap or crawl existing docs
wget --spider --recursive --no-verbose --output-file=urls.log \
  https://old-docs.example.com
 
# Extract URLs
grep -oP 'https://old-docs\.example\.com[^\s]+' urls.log | \
  sort -u > all-pages.txt

Create a spreadsheet:

URL	Title	Last Updated	Views	Keep/Archive/Delete	Priority
/docs/intro	Getting Started	2023-01-15	10,234	Keep	High
/docs/old-api	Legacy API v1	2019-08-22	123	Archive	Low
/docs/test-page	Test	2020-03-14	5	Delete	-
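
If you already have all-pages.txt from the crawl, a short script can seed that spreadsheet. The sketch below is one possible approach rather than a feature of any tool: build_inventory and inventory_to_csv are names made up for illustration, and the audit columns are left blank for a reviewer to fill in.

```python
import csv
import io
from urllib.parse import urlparse

COLUMNS = ["URL", "Title", "Last Updated", "Views", "Keep/Archive/Delete", "Priority"]


def build_inventory(urls):
    """Return one row per unique URL path, with the audit columns left blank."""
    seen = set()
    rows = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the URL list
        path = urlparse(url).path or "/"
        if path in seen:
            continue  # the crawl can list the same page more than once
        seen.add(path)
        rows.append({**dict.fromkeys(COLUMNS, ""), "URL": path})
    return rows


def inventory_to_csv(rows):
    """Serialize inventory rows to CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Feed it the lines of all-pages.txt, write the result to a .csv file, and fill in titles, dates, and view counts from your analytics export.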

Content Audit Questions#

For each page, ask:

Is it still accurate? Outdated content hurts more than it helps
Is it still needed? Some pages can be retired
Should it be merged? Consolidate related thin content
What's the traffic? Prioritize high-value pages
Are there external links? These need redirects

Classification Framework#

Markdown

## Content Categories
 
### Keep As-Is
- Current, accurate, well-structured
- High traffic
- Action: Migrate directly
 
### Keep with Updates
- Still relevant but needs refresh
- Outdated code examples, screenshots
- Action: Update during migration
 
### Consolidate
- Multiple pages covering similar topics
- Thin content that should be combined
- Action: Merge into comprehensive pages
 
### Archive
- Historical value but no longer current
- Legacy API versions, deprecated features
- Action: Move to /archive/ with notice
 
### Delete
- Test pages, duplicates, broken content
- No external links pointing to them
- Action: Don't migrate, add redirect to relevant page
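
The categories above can be applied in a fixed order so that every page lands in exactly one bucket. The sketch below assumes a reviewer has already answered a handful of yes/no questions per page; the PageAudit fields and the order of the checks are illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    url: str
    is_accurate: bool           # reviewer judgment
    needs_refresh: bool         # outdated code examples or screenshots
    overlaps_other_pages: bool  # thin content or a duplicated topic
    is_legacy: bool             # deprecated feature or old API version
    is_junk: bool               # test page, duplicate, or broken content


def classify(page: PageAudit) -> str:
    """Map audit answers to one of the five categories, most drastic first."""
    if page.is_junk:
        return "Delete"
    if page.is_legacy:
        return "Archive"
    if page.overlaps_other_pages:
        return "Consolidate"
    if page.needs_refresh or not page.is_accurate:
        return "Keep with Updates"
    return "Keep As-Is"
```

Checking the drastic outcomes first means a deprecated page that also has stale screenshots is archived rather than queued for a refresh.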

URL Strategy#

Plan your new URL structure before migrating:

Old URLs:

Text

/wiki/Documentation/API/REST/v2/Users/Create
/wiki/Documentation/API/REST/v2/Users/List
/wiki/Guides/Getting_Started_With_API

New URLs:

Text

/docs/api/users#create
/docs/api/users#list
/docs/getting-started

Document the mapping:

csv

old_url,new_url,redirect_type
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301
/wiki/Guides/Getting_Started_With_API,/docs/getting-started,301
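
A mapping file like this is easy to get subtly wrong, so it can help to check it before any redirects go live. The following sketch assumes the three-column CSV shown above; load_mapping and find_chains are hypothetical helper names. It flags old URLs that are listed twice with different targets and redirects that point at another redirected URL.

```python
import csv
import io


def load_mapping(csv_text):
    """Parse old_url,new_url,redirect_type rows into a dict of old -> new."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        old, new = row["old_url"].strip(), row["new_url"].strip()
        if old in mapping and mapping[old] != new:
            raise ValueError(f"Conflicting targets for {old}")
        mapping[old] = new
    return mapping


def find_chains(mapping):
    """Return old URLs whose target (ignoring any #fragment) is itself redirected."""
    chains = []
    for old, new in mapping.items():
        target_path = new.split("#", 1)[0]
        if target_path in mapping:
            chains.append(old)
    return sorted(chains)
```

Any URL reported by find_chains should be pointed straight at its final destination so visitors and crawlers follow a single hop.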

Phase 2: Content Migration#

Manual vs. Automated Migration#

Automate when:

Large volume (100+ pages)
Consistent source format
Structural changes are minimal

Manual when:

Small volume (fewer than 50 pages)
Content needs significant rewriting
Source format is inconsistent

Automated Migration Script#

Python

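import re
from pathlib import Path

import html2text    # third-party: pip install html2text
import frontmatter  # third-party: pip install python-frontmatter

# slugify, extract_description, and fix_tables are project-specific helpers
# that you need to define for your own content before running this script.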
 
def migrate_page(source_html: str, old_url: str) -> str:
    """Convert HTML documentation to MDX."""
 
    # Convert HTML to Markdown
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.ignore_images = False
    content = h.handle(source_html)
 
    # Clean up common issues
    content = clean_content(content)
 
    # Extract title from first heading
    title_match = re.search(r'^#\s+(.+)$', content, re.MULTILINE)
    title = title_match.group(1) if title_match else "Untitled"
 
    # Build frontmatter
    post = frontmatter.Post(content)
    post['title'] = title
    post['description'] = extract_description(content)
    post['old_url'] = old_url  # Keep for redirect mapping
 
    return frontmatter.dumps(post)
 
def clean_content(content: str) -> str:
    """Fix common conversion issues."""
 
    # Fix code blocks
    content = re.sub(
        r'```\n\n',
        '```\n',
        content
    )
 
    # Fix internal links
    content = re.sub(
        r'\(/wiki/([^)]+)\)',
        lambda m: f'(/docs/{slugify(m.group(1))})',
        content
    )
 
    # Remove empty headers
    content = re.sub(r'^#+\s*$', '', content, flags=re.MULTILINE)
 
    # Fix broken tables
    content = fix_tables(content)
 
    return content
 
def migrate_all(source_dir: str, dest_dir: str):
    """Migrate all documentation files."""
 
    for html_file in Path(source_dir).glob('**/*.html'):
        html_content = html_file.read_text(encoding='utf-8')

        # Derive the old URL from the path relative to the export root
        # (assumes the export mirrors the old site's URL layout)
        relative_path = html_file.relative_to(source_dir)
        old_url = '/' + relative_path.with_suffix('').as_posix()

        # Convert and save
        mdx_content = migrate_page(html_content, old_url)

        # Determine output path
        output_path = Path(dest_dir) / relative_path.with_suffix('.mdx')

        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(mdx_content, encoding='utf-8')

        print(f"Migrated: {html_file} -> {output_path}")
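
The script calls three helpers that it does not define: slugify, extract_description, and fix_tables. What they should do depends on your source content, so treat the versions below as a minimal sketch under stated assumptions: slugs are lowercase with hyphens, the description is the first plain paragraph, and the only table damage handled is blank lines between rows that start and end with a pipe.

```python
import re


def slugify(path: str) -> str:
    """Lowercase each path segment and replace runs of non-alphanumerics with '-'."""
    segments = []
    for segment in path.split("/"):
        slug = re.sub(r"[^a-z0-9]+", "-", segment.lower()).strip("-")
        if slug:
            segments.append(slug)
    return "/".join(segments)


def extract_description(content: str, max_length: int = 160) -> str:
    """Return the first block that is not a heading, list, quote, table, or fence."""
    for block in re.split(r"\n\s*\n", content):
        text = " ".join(block.split())
        if not text or text[0] in "#|>-*`":
            continue
        if len(text) <= max_length:
            return text
        # Cut at a word boundary and mark the truncation
        return text[:max_length].rsplit(" ", 1)[0] + "..."
    return ""


def fix_tables(content: str) -> str:
    """Remove blank lines between consecutive pipe-delimited table rows."""
    return re.sub(r"(?m)^(\|.*\|)[ \t]*\n(?:[ \t]*\n)+(?=\|)", r"\1\n", content)
```

With these in place, slugify("Guides/Getting_Started_With_API") yields guides/getting-started-with-api, which keeps the directory structure of the old wiki path. If your URL plan flattens that structure, look the target up in your redirect mapping instead.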

Handling Common Source Formats#

From Confluence:

Python

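# Export Confluence pages through the REST API (or use the built-in space export)

import requests

# Placeholders: replace with your own instance and credentials.
# On Confluence Cloud the base URL usually ends in /wiki.
CONFLUENCE_URL = 'https://confluence.example.com'
USERNAME = 'user@example.com'
API_TOKEN = 'your-api-token'

def export_confluence_space(space_key: str):
    response = requests.get(
        f'{CONFLUENCE_URL}/rest/api/content',
        params={'spaceKey': space_key, 'type': 'page', 'expand': 'body.storage'},
        auth=(USERNAME, API_TOKEN),
        timeout=30,
    )
    response.raise_for_status()

    # Results are paginated: repeat with the start/limit parameters for large spaces
    for page in response.json()['results']:
        content = page['body']['storage']['value']  # storage-format XHTML
        save_as_markdown(page['title'], content)  # your own conversion helper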

From GitBook:

Bash

# GitBook exports to Markdown natively
# Clone the repo and process
 
git clone https://github.com/org/gitbook-docs.git
 
# GitBook uses SUMMARY.md for structure
# Parse it to maintain hierarchy
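
Parsing SUMMARY.md takes only a few lines. The sketch below assumes the usual layout of nested bullet links with two spaces per nesting level; parse_summary is a hypothetical name, and you may need a different indent_width if your file uses four spaces or tabs.

```python
import re

# Matches "* [Title](path.md)" or "- [Title](path.md)" with optional leading indent
ENTRY = re.compile(r"^(?P<indent>\s*)[*-]\s+\[(?P<title>[^\]]+)\]\((?P<path>[^)]*)\)")


def parse_summary(text, indent_width=2):
    """Return (depth, title, path) tuples in reading order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.replace("\t", " " * indent_width))
        if match:
            depth = len(match.group("indent")) // indent_width
            entries.append((depth, match.group("title"), match.group("path")))
    return entries
```

The depth value is enough to rebuild the sidebar hierarchy in the new system, and the path tells you which Markdown file belongs at each position.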

From ReadTheDocs/Sphinx:

Python

# Sphinx RST to MDX conversion
import pypandoc
 
def rst_to_mdx(rst_file: str) -> str:
    # Convert RST to Markdown
    md = pypandoc.convert_file(rst_file, 'md', format='rst')
 
    # Sphinx-specific fixes
    md = fix_sphinx_directives(md)
    md = fix_sphinx_roles(md)
 
    return md
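
fix_sphinx_roles is left undefined above. As an illustration only: pandoc generally does not resolve Sphinx roles such as :doc: or :ref: and instead tends to emit them as inline code tagged with an interpreted-text attribute. Assuming that output shape (check a converted file from your own project first), one possible cleanup turns :doc: roles into links and every other role into plain inline code.

```python
import re

# Assumed pandoc output for a Sphinx role: `text`{.interpreted-text role="name"}
ROLE = re.compile(r'`([^`]+)`\{\.interpreted-text\s+role="([^"]+)"\}')
TITLED = re.compile(r"(.+?)\s*<([^>]+)>")  # the "Title <target>" form


def fix_sphinx_roles(md: str) -> str:
    """Rewrite leftover Sphinx roles into plain Markdown."""

    def replace(match):
        text, role = match.group(1), match.group(2)
        if role == "doc":
            titled = TITLED.fullmatch(text)
            title, target = titled.groups() if titled else (text, text)
            return f"[{title}]({target})"
        # Labels such as :ref: targets cannot be resolved without the
        # Sphinx build, so keep the text as inline code for manual review.
        return f"`{text}`"

    return ROLE.sub(replace, md)
```

Link targets produced this way are still Sphinx document names, so they need the same URL mapping as the rest of the migrated links.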

Content Quality Checks#

Run automated checks after migration:

Python

import re

# link_exists and image_exists are helpers you supply for your own site layout

def validate_migrated_content(mdx_file: str) -> list[str]:
    """Check for common migration issues."""
    issues = []
 
    with open(mdx_file) as f:
        content = f.read()
 
    # Check frontmatter
    if not content.startswith('---'):
        issues.append("Missing frontmatter")
 
    # Check for broken internal links
    internal_links = re.findall(r'\]\((/[^)]+)\)', content)
    for link in internal_links:
        if not link_exists(link):
            issues.append(f"Broken link: {link}")
 
    # Check for empty code blocks
    if '```\n```' in content:
        issues.append("Empty code block found")
 
    # Check for HTML remnants
    if re.search(r'<(?!br)[a-z]+', content):
        issues.append("HTML tags remaining")
 
    # Check for broken images
    images = re.findall(r'!\[.*?\]\((.*?)\)', content)
    for img in images:
        if not image_exists(img):
            issues.append(f"Missing image: {img}")
 
    return issues
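
The validator relies on link_exists and image_exists, which depend on how your new site maps URLs to files. Here is a minimal sketch, assuming pages live under docs/ as .mdx files (or index.mdx inside a folder) and static assets under public/. Both directory names are assumptions, so adjust them to your project.

```python
from pathlib import Path

DOCS_ROOT = Path("docs")      # assumed location of migrated .mdx files
PUBLIC_ROOT = Path("public")  # assumed location of static assets


def link_exists(link: str, docs_root: Path = DOCS_ROOT) -> bool:
    """True if an internal link such as /docs/api/users#create maps to a file."""
    path = link.split("#", 1)[0].split("?", 1)[0].strip("/")
    if path == "docs":
        path = ""
    elif path.startswith("docs/"):
        path = path[len("docs/"):]
    candidates = [docs_root / f"{path}.mdx", docs_root / path / "index.mdx"]
    return any(candidate.is_file() for candidate in candidates)


def image_exists(img: str, public_root: Path = PUBLIC_ROOT) -> bool:
    """True if a local image path resolves to a file; remote URLs are not checked."""
    if img.startswith(("http://", "https://", "//")):
        return True
    return (public_root / img.split("?", 1)[0].lstrip("/")).is_file()
```

Remote images are reported as present because checking them needs network access; run a separate link checker if you want those verified too.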

Phase 3: Technical Setup#

Setting Up Redirects#

Redirects are critical—never break incoming links:

Next.js (next.config.js):

JavaScript

module.exports = {
  async redirects() {
    return [
      // Individual redirects
      {
        source: '/wiki/Documentation/API/REST/v2/Users/Create',
        destination: '/docs/api/users#create',
        permanent: true,
      },
      // Pattern-based redirects
      {
        source: '/wiki/Documentation/:path*',
        destination: '/docs/:path*',
        permanent: true,
      },
      // Catch-all for deleted pages
      {
        source: '/wiki/:path*',
        destination: '/docs',
        permanent: false, // Temporary redirect for the catch-all (Next.js sends 307 here and 308 when permanent is true)
      },
    ];
  },
};

Nginx:

nginx

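# Bulk redirects from map file (the map block belongs in the http context)
map $request_uri $new_uri {
    default "";
    include /etc/nginx/redirects.map;
}

server {
    # $request_uri includes the query string, so entries match exact URLs only
    if ($new_uri) {
        return 301 $new_uri;
    }
}

# Contents of /etc/nginx/redirects.map (a separate file)
/wiki/Documentation/API/REST/v2/Users/Create /docs/api/users#create;
/wiki/Documentation/API/REST/v2/Users/List /docs/api/users#list;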
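
Rather than maintaining redirects.map by hand, you can generate it from the same old-to-new mapping you documented during planning. This is a small sketch with a made-up function name, to_nginx_map. It rejects values containing whitespace or characters that would need quoting in an nginx config instead of trying to escape them.

```python
def to_nginx_map(pairs):
    """Render (old_url, new_url) pairs as lines for an nginx map include file."""
    lines = []
    for old, new in sorted(pairs):
        for value in (old, new):
            if any(ch.isspace() or ch in ';"{}' for ch in value):
                raise ValueError(f"Value needs manual quoting: {value!r}")
        lines.append(f"{old} {new};")
    return "\n".join(lines) + "\n"
```

Write the returned text to the include file and run nginx -t before reloading, so a malformed entry never reaches production.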

Cloudflare (bulk redirects):

csv

# Upload to Cloudflare Bulk Redirects
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301

Search Index Migration#

Ensure search works from day one:

JavaScript

// Rebuild search index with new content
// searchClient and getAllDocuments stand in for your own search integration
import { searchClient, getAllDocuments } from './search';
 
async function rebuildSearchIndex() {
  // Clear old index
  await searchClient.clearIndex();
 
  // Index all migrated documents
  const documents = await getAllDocuments();
 
  for (const doc of documents) {
    await searchClient.index({
      id: doc.slug,
      title: doc.title,
      content: doc.content,
      url: doc.url,
      // Include old URLs for transition period
      aliases: doc.oldUrls
    });
  }
}

Asset Migration#

Don't forget images and files:

Bash

# Download all assets from old docs
# (-nH drops the hostname directory so files land under ./assets/)
wget --recursive --no-parent -nH --accept=png,jpg,gif,svg,pdf \
  https://old-docs.example.com/assets/

# Enable ** recursive globbing (bash 4+)
shopt -s globstar

# Organize and optimize
for file in assets/**/*; do
  [[ -f $file ]] || continue  # skip directories
  name=$(basename "$file")

  # Optimize images in place
  if [[ $file == *.png ]]; then
    pngquant --quality=65-80 --force --ext .png "$file"
  fi

  # Update references in migrated docs (GNU sed syntax)
  sed -i "s|/old-path/$name|/assets/$name|g" docs/**/*.mdx
done

Phase 4: Launch and Validation#

Pre-Launch Checklist#

Markdown

## Migration Launch Checklist
 
### Content
- [ ] All priority pages migrated
- [ ] Content reviewed for accuracy
- [ ] Images and assets working
- [ ] Internal links validated
- [ ] Code examples tested
 
### Technical
- [ ] Redirects configured and tested
- [ ] Search index populated
- [ ] Analytics installed
- [ ] SSL certificate valid
- [ ] Performance acceptable (under 3s load time)
 
### SEO
- [ ] Sitemap generated and submitted
- [ ] Meta descriptions present
- [ ] Canonical URLs set
- [ ] Robots.txt updated
 
### Monitoring
- [ ] 404 monitoring enabled
- [ ] Analytics dashboards ready
- [ ] Feedback mechanism in place
- [ ] Team notified of launch

Staged Rollout#

Don't migrate everything at once:

Text

Week 1: High-traffic pages (top 20%)
        - Getting Started
        - API Reference
        - Installation
 
Week 2: Medium-traffic pages (next 30%)
        - Guides
        - Tutorials
        - Concepts
 
Week 3: Long-tail content (remaining 50%)
        - Advanced topics
        - Archive content
        - Edge cases
 
Week 4: Final cleanup
        - Schedule removal of the old system (after the rollback window)
        - Update all external links
        - Announce migration complete
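
If your inventory includes view counts, the waves can be derived from the data instead of picked by hand. The sketch below ranks pages by traffic and slices them into the 20/30/50 split from the schedule above; plan_waves is an illustrative name and the input is assumed to be (url, monthly_views) pairs.

```python
def plan_waves(pages, shares=(0.2, 0.3, 0.5)):
    """Split pages into rollout waves by traffic rank, highest traffic first.

    pages:  iterable of (url, monthly_views) pairs
    shares: fraction of all pages assigned to each wave, in order
    """
    ranked = [url for url, _ in sorted(pages, key=lambda p: p[1], reverse=True)]
    waves = []
    start = 0
    cumulative = 0.0
    for i, share in enumerate(shares):
        cumulative += share
        # The last wave takes everything left, so rounding never drops a page
        is_last = i == len(shares) - 1
        end = len(ranked) if is_last else round(len(ranked) * cumulative)
        waves.append(ranked[start:end])
        start = end
    return waves
```

The split is by page count, not by share of traffic, which matches the "top 20%" wording in the schedule.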

Post-Launch Monitoring#

Track issues closely after launch:

JavaScript

// Monitor 404s
app.use((req, res, next) => {
  res.on('finish', () => {
    if (res.statusCode === 404) {
      analytics.track('404_error', {
        url: req.url,
        referrer: req.headers.referer,
        timestamp: new Date()
      });
    }
  });
  next();
});

Daily review:

Check 404 error logs
Review user feedback
Monitor search queries
Watch analytics for traffic drops
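
For the daily 404 review, a short script can turn the raw events into a ranked list. This sketch assumes each event is a dict with a url key, like the payload tracked above, and that old-system paths start with /wiki/. Both are assumptions to adapt, and summarize_404s is a hypothetical name.

```python
from collections import Counter


def summarize_404s(events, old_prefix="/wiki/", limit=10):
    """Count 404s per path and separate old-site paths from everything else."""
    counts = Counter(event["url"].split("?", 1)[0] for event in events)
    ranked = counts.most_common()
    return {
        # Old-site paths that 404 almost always mean a missing redirect
        "missed_redirects": [(u, n) for u, n in ranked if u.startswith(old_prefix)][:limit],
        # Everything else: typos, bots, or broken links inside the new docs
        "other": [(u, n) for u, n in ranked if not u.startswith(old_prefix)][:limit],
    }
```

Entries under missed_redirects are the ones to add to your redirect mapping first.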

Fixing Issues Fast#

Create a rapid response process:

Markdown

## Migration Issue Response
 
### P0 - Critical (fix in hours)
- Major page returning 404
- Search completely broken
- API docs inaccessible
 
### P1 - High (fix in 1 day)
- High-traffic page has broken content
- Important redirect missing
- Images not loading
 
### P2 - Medium (fix in 1 week)
- Minor formatting issues
- Low-traffic page issues
- Non-critical broken links
 
### P3 - Low (add to backlog)
- Cosmetic improvements
- Nice-to-have enhancements
- Archive content issues

Common Migration Pitfalls#

1. Breaking Incoming Links#

Problem: External sites link to your old URLs

Solution: Comprehensive redirect mapping + monitoring

Bash

# Find external backlinks before migration
# Use tools like Ahrefs, Moz, or Google Search Console
 
# Ensure ALL incoming URLs have redirects
# Monitor 404s for missed redirects

2. Losing Search Rankings#

Problem: SEO drops after migration

Solution:

Use 301 (permanent) redirects
Keep URL structure similar when possible
Submit new sitemap immediately
Monitor Search Console for issues

3. Migrating Outdated Content#

Problem: Old, wrong content in new system

Solution: Content audit before migration

Markdown

Don't migrate blindly. Ask:
- Is this still accurate?
- When was it last updated?
- Does anyone use this?
- Should it be rewritten instead?

4. Underestimating Manual Work#

Problem: Automation doesn't catch everything

Solution: Budget for manual review

Text

Realistic estimate:
- Automated migration: 60% of content
- Semi-automated (needs review): 25%
- Manual rewrite required: 15%
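
To turn those shares into a budget, multiply them by your page count and a per-page time for each bucket. The percentages below come from the estimate above; the minutes per page are purely hypothetical placeholders that you should replace with numbers from a small pilot batch.

```python
# Shares from the estimate above, as percentages of all pages
SHARES = {"automated": 60, "semi_automated": 25, "manual": 15}

# Hypothetical review/rewrite time per page, in minutes
MINUTES_PER_PAGE = {"automated": 2, "semi_automated": 15, "manual": 60}


def estimate_effort(total_pages, minutes_per_page=MINUTES_PER_PAGE):
    """Return (pages per bucket, total hours) for a migration of total_pages."""
    pages = {bucket: total_pages * pct / 100 for bucket, pct in SHARES.items()}
    hours = sum(pages[bucket] * minutes_per_page[bucket] for bucket in pages) / 60
    return pages, hours
```

With these placeholder timings, the 15% of pages that need a manual rewrite account for well over half of the total hours, which is why that bucket deserves its own line in the plan.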

5. No Rollback Plan#

Problem: Migration fails, can't recover

Solution: Keep old system running during transition

Text

# Keep old docs accessible during migration
old-docs.example.com -> Active (read-only)
docs.example.com -> New system
 
# Only decommission old system after:
# - 30 days with no critical issues
# - Traffic fully transitioned
# - Redirects verified working

Post-Migration Success#

Measure the Impact#

Compare before and after:

Metric	Before	After	Change
Monthly visitors	45,000	52,000	+16%
Search success rate	62%	84%	+22 pts
Avg. time on page	1:30	3:45	+150%
Support tickets (doc-related)	120/mo	75/mo	-38%
Developer satisfaction	3.2/5	4.4/5	+38%
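
When you fill in the Change column, keep relative change and percentage-point difference apart: for a metric that is already a percentage, such as search success rate, the two give different numbers. A small sketch of both calculations, with illustrative function names:

```python
def relative_change(before, after):
    """Relative change in percent, e.g. 45,000 -> 52,000 visitors."""
    return (after - before) / before * 100


def point_change(before_pct, after_pct):
    """Difference in percentage points between two rates."""
    return after_pct - before_pct
```

For the search success row, point_change(62, 84) is 22 points, while relative_change(62, 84) is roughly 35%. Either is fine to report as long as the column says which one it is.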

Communicate the Change#

Let users know:

Markdown

# We've Moved! 🎉
 
Our documentation has a new home with improved search,
better navigation, and updated content.
 
## What's New
- Faster, more accurate search
- Mobile-friendly design
- Updated code examples
- New tutorials and guides
 
## What You Need to Know
- All old URLs redirect automatically
- Bookmarks will still work
- Search results will update as search engines recrawl the site
 
## Found an Issue?
Help us improve! [Report a problem](link)

Conclusion#

Documentation migration is a significant undertaking, but it's worth doing right:

Audit thoroughly - Know what you have before moving it
Plan redirects carefully - Never break incoming links
Automate where possible - But expect manual work
Launch gradually - Staged rollout reduces risk
Monitor closely - Fix issues fast after launch

A successful migration isn't just moving content—it's an opportunity to improve your documentation foundation for years to come.

Ready to migrate to a modern documentation platform? Dokly makes migration easy with automatic URL redirects, built-in search, and a developer-friendly writing experience. Start your migration today.

Dokly

Migrating Legacy Documentation: A Practical Guide
