Dokly

Pro API documentation without the $300/mo price tag.

Migration
Documentation
Legacy Systems
Content Strategy
DevOps

Migrating Legacy Documentation: A Practical Guide

Moving from outdated documentation systems to modern platforms? Learn strategies for successful migration, content auditing, redirects, and minimizing disruption.

Dokly Team
11 min read

Your documentation is stuck in an aging wiki, a clunky CMS, or scattered across multiple systems. You need to migrate. But how do you move thousands of pages without breaking everything? This guide walks you through the process.

Signs It's Time to Migrate#

You probably need to migrate if:

  • Developers avoid the docs because they're hard to navigate
  • Search doesn't work or returns irrelevant results
  • Updates are painful, requiring special tools or permissions
  • Mobile experience is broken or non-existent
  • Analytics are missing, so you can't measure effectiveness
  • The platform is unsupported or reaching end-of-life

The cost of staying often exceeds the cost of migrating.

Migration Strategy Overview#

Text
┌─────────────────────────────────────────────────────────────┐
│                    Migration Timeline                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1        Phase 2        Phase 3        Phase 4      │
│  ────────       ────────       ────────       ────────     │
│  Audit &        Content        Technical      Launch &     │
│  Plan           Migration      Setup          Redirect     │
│                                                             │
│  [2-3 weeks]    [4-6 weeks]    [2-3 weeks]   [1-2 weeks]  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Phase 1: Audit and Planning#

Content Inventory#

First, know what you have:

Bash
# Export sitemap or crawl existing docs
wget --spider --recursive --no-verbose --output-file=urls.log \
  https://old-docs.example.com
 
# Extract URLs
grep -oP 'https://old-docs\.example\.com[^\s]+' urls.log | \
  sort -u > all-pages.txt

Create a spreadsheet:

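| URL | Title | Last Updated | Views | Keep/Archive/Delete | Priority |
|---|---|---|---|---|---|
| /docs/intro | Getting Started | 2023-01-15 | 10,234 | Keep | High |
| /docs/old-api | Legacy API v1 | 2019-08-22 | 123 | Archive | Low |
| /docs/test-page | Test | 2020-03-14 | 5 | Delete | - |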

Content Audit Questions#

For each page, ask:

  1. Is it still accurate? Outdated content hurts more than it helps.
  2. Is it still needed? Some pages can be retired.
  3. Should it be merged? Consolidate related thin content.
  4. What's the traffic? Prioritize high-value pages.
  5. Are there external links? These need redirects.

Classification Framework#

Markdown
## Content Categories
 
### Keep As-Is
- Current, accurate, well-structured
- High traffic
- Action: Migrate directly
 
### Keep with Updates
- Still relevant but needs refresh
- Outdated code examples, screenshots
- Action: Update during migration
 
### Consolidate
- Multiple pages covering similar topics
- Thin content that should be combined
- Action: Merge into comprehensive pages
 
### Archive
- Historical value but no longer current
- Legacy API versions, deprecated features
- Action: Move to /archive/ with notice
 
### Delete
- Test pages, duplicates, broken content
- No external links pointing to them
- Action: Don't migrate, add redirect to relevant page
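
The categories above can be turned into a first-pass suggestion for each inventory row. The following is a minimal sketch, not a definitive rule set: the `Page` fields, the `classify` function, and both thresholds (`stale_days`, `low_views`) are assumptions chosen for illustration, and "Consolidate" is left out because spotting overlapping pages needs human judgment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    last_updated: date
    views: int
    external_links: int = 0
    accurate: bool = True  # set by a human reviewer during the audit


def classify(page: Page, today: date, stale_days: int = 730, low_views: int = 50) -> str:
    """Suggest an audit category. Every threshold here is an assumption."""
    stale = (today - page.last_updated).days > stale_days
    if page.views < low_views and page.external_links == 0:
        return "Delete"
    if stale and not page.accurate:
        return "Archive"
    if stale or not page.accurate:
        return "Keep with Updates"
    return "Keep As-Is"
```

Treat the result as a starting point for the spreadsheet's Keep/Archive/Delete column, then review each suggestion by hand.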

URL Strategy#

Plan your new URL structure before migrating:

Old URLs:

Text
/wiki/Documentation/API/REST/v2/Users/Create
/wiki/Documentation/API/REST/v2/Users/List
/wiki/Guides/Getting_Started_With_API

New URLs:

Text
/docs/api/users#create
/docs/api/users#list
/docs/getting-started

Document the mapping:

csv
old_url,new_url,redirect_type
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301
/wiki/Guides/Getting_Started_With_API,/docs/getting-started,301
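
Before the mapping is turned into server configuration, it helps to check it for mistakes that are easy to miss in a spreadsheet. The sketch below assumes the three-column CSV shown above; `load_redirects` and `find_problems` are hypothetical helper names. It rejects duplicate source URLs and reports self-redirects and chains, where a redirect target is itself redirected.

```python
import csv
import io


def load_redirects(csv_text: str) -> dict[str, str]:
    """Parse old_url,new_url,redirect_type rows into an old -> new mapping."""
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        old, new = row["old_url"].strip(), row["new_url"].strip()
        if old in mapping:
            raise ValueError(f"Duplicate source URL: {old}")
        mapping[old] = new
    return mapping


def find_problems(mapping: dict[str, str]) -> list[str]:
    """Report self-redirects and chains (a target that is itself redirected)."""
    problems = []
    for old, new in mapping.items():
        target = new.split("#", 1)[0]  # the fragment is not sent to the server
        if target == old:
            problems.append(f"Self-redirect: {old}")
        elif target in mapping:
            problems.append(f"Chain: {old} -> {new} -> {mapping[target]}")
    return problems
```

An empty list from `find_problems` means every old URL reaches its destination in a single hop.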

Phase 2: Content Migration#

Manual vs. Automated Migration#

Automate when:

  • Large volume (100+ pages)
  • Consistent source format
  • Structural changes are minimal

Manual when:

  • Small volume (fewer than 50 pages)
  • Content needs significant rewriting
  • Source format is inconsistent

Automated Migration Script#

Python
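import re
from pathlib import Path

import frontmatter  # pip install python-frontmatter
import html2text    # pip install html2text


def slugify(path: str) -> str:
    """Lowercase a wiki path and hyphenate each segment (illustrative helper)."""
    segments = [re.sub(r'[^a-z0-9]+', '-', s.lower()).strip('-') for s in path.split('/')]
    return '/'.join(s for s in segments if s)


def extract_description(content: str, max_len: int = 160) -> str:
    """Use the first non-heading line as the description (illustrative helper)."""
    for line in content.splitlines():
        if line.strip() and not line.startswith('#'):
            return line.strip()[:max_len]
    return ""


def fix_tables(content: str) -> str:
    """Placeholder: table repair depends on your source format."""
    return content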
 
def migrate_page(source_html: str, old_url: str) -> str:
    """Convert HTML documentation to MDX."""
 
    # Convert HTML to Markdown
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.ignore_images = False
    content = h.handle(source_html)
 
    # Clean up common issues
    content = clean_content(content)
 
    # Extract title from first heading
    title_match = re.search(r'^#\s+(.+)$', content, re.MULTILINE)
    title = title_match.group(1) if title_match else "Untitled"
 
    # Build frontmatter
    post = frontmatter.Post(content)
    post['title'] = title
    post['description'] = extract_description(content)
    post['old_url'] = old_url  # Keep for redirect mapping
 
    return frontmatter.dumps(post)
 
def clean_content(content: str) -> str:
    """Fix common conversion issues."""
 
    # Fix code blocks
    content = re.sub(
        r'```\n\n',
        '```\n',
        content
    )
 
    # Fix internal links
    content = re.sub(
        r'\(/wiki/([^)]+)\)',
        lambda m: f'(/docs/{slugify(m.group(1))})',
        content
    )
 
    # Remove empty headers
    content = re.sub(r'^#+\s*$', '', content, flags=re.MULTILINE)
 
    # Fix broken tables
    content = fix_tables(content)
 
    return content
 
def migrate_all(source_dir: str, dest_dir: str):
    """Migrate all documentation files."""
 
    for html_file in Path(source_dir).glob('**/*.html'):
        html_content = html_file.read_text(encoding='utf-8')
 
        # Convert and save. The file path stands in for the old URL here;
        # map it to the real URL if your export layout differs from the site.
        mdx_content = migrate_page(html_content, str(html_file))
 
        # Determine output path
        relative_path = html_file.relative_to(source_dir)
        output_path = Path(dest_dir) / relative_path.with_suffix('.mdx')
 
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(mdx_content, encoding='utf-8')
 
        print(f"Migrated: {html_file} -> {output_path}")

Handling Common Source Formats#

From Confluence:

Python
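# Export a Confluence space via the REST API (or use the built-in HTML export)
 
import requests
 
# Placeholders: set these for your own instance
CONFLUENCE_URL = 'https://your-domain.atlassian.net/wiki'
USERNAME = 'you@example.com'
API_TOKEN = 'your-api-token'
 
def export_confluence_space(space_key: str):
    response = requests.get(
        f'{CONFLUENCE_URL}/rest/api/content',
        params={'spaceKey': space_key, 'type': 'page', 'expand': 'body.storage'},
        auth=(USERNAME, API_TOKEN),
        timeout=30,
    )
    response.raise_for_status()
 
    # Results are paginated; follow _links.next in the response for large spaces
    for page in response.json()['results']:
        content = page['body']['storage']['value']  # storage-format XHTML
        save_as_markdown(page['title'], content)    # your own converter/writer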

From GitBook:

Bash
# GitBook exports to Markdown natively
# Clone the repo and process
 
git clone https://github.com/org/gitbook-docs.git
 
# GitBook uses SUMMARY.md for structure
# Parse it to maintain hierarchy
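
Parsing SUMMARY.md can be sketched in a few lines of Python. This assumes the common layout of nested Markdown list items of the form `* [Title](path.md)`, indented two spaces per level; `parse_summary` is a hypothetical helper, and a summary that uses a different indent width needs the `indent` argument adjusted.

```python
import re

# A list entry: optional indentation, a bullet, then [Title](path)
ENTRY = re.compile(r"^(\s*)[*-]\s+\[(.+?)\]\((.*?)\)")


def parse_summary(text: str, indent: int = 2) -> list[dict]:
    """Turn SUMMARY.md list entries into a nested list of pages."""
    root: list[dict] = []
    stack = [(-1, root)]  # (depth, children list) for each open ancestor
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # headings, separators, blank lines
        depth = len(m.group(1).expandtabs(indent)) // indent
        node = {"title": m.group(2), "path": m.group(3), "children": []}
        while stack[-1][0] >= depth:
            stack.pop()  # close siblings and deeper levels
        stack[-1][1].append(node)
        stack.append((depth, node["children"]))
    return root
```

The nested result can then drive the new platform's navigation file, so the sidebar keeps the order readers already know.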

From ReadTheDocs/Sphinx:

Python
# Sphinx RST to MDX conversion
import pypandoc
 
def rst_to_mdx(rst_file: str) -> str:
    # Convert RST to Markdown
    md = pypandoc.convert_file(rst_file, 'md', format='rst')
 
    # Sphinx-specific fixes
    md = fix_sphinx_directives(md)
    md = fix_sphinx_roles(md)
 
    return md

Content Quality Checks#

Run automated checks after migration:

Python
import re
 
# link_exists() and image_exists() are project-specific helpers you supply
def validate_migrated_content(mdx_file: str) -> list[str]:
    """Check for common migration issues."""
    issues = []
 
    with open(mdx_file) as f:
        content = f.read()
 
    # Check frontmatter
    if not content.startswith('---'):
        issues.append("Missing frontmatter")
 
    # Check for broken internal links
    internal_links = re.findall(r'\]\((/[^)]+)\)', content)
    for link in internal_links:
        if not link_exists(link):
            issues.append(f"Broken link: {link}")
 
    # Check for empty code blocks
    if '```\n```' in content:
        issues.append("Empty code block found")
 
    # Check for HTML remnants
    if re.search(r'<(?!br)[a-z]+', content):
        issues.append("HTML tags remaining")
 
    # Check for broken images
    images = re.findall(r'!\[.*?\]\((.*?)\)', content)
    for img in images:
        if not image_exists(img):
            issues.append(f"Missing image: {img}")
 
    return issues
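
The validator above relies on a `link_exists` helper that it does not define. One possible version is sketched here, under the assumption that a URL such as `/docs/api/users` maps to either `docs/api/users.mdx` or `docs/api/users/index.mdx` on disk; the `docs_root` and `url_prefix` parameters are illustrative and should match your own site layout.

```python
from pathlib import Path


def link_exists(link: str, docs_root: str = "docs", url_prefix: str = "/docs") -> bool:
    """Check that an internal link resolves to a migrated .mdx file."""
    # Drop the fragment and query string; only the path maps to a file
    path = link.split("#", 1)[0].split("?", 1)[0]
    if path != url_prefix and not path.startswith(url_prefix + "/"):
        return False  # outside the docs tree; check those links separately
    relative = path[len(url_prefix):].strip("/")
    root = Path(docs_root)
    if relative:
        candidates = [root / f"{relative}.mdx", root / relative / "index.mdx"]
    else:
        candidates = [root / "index.mdx"]
    return any(c.is_file() for c in candidates)
```

This only confirms that the target page exists. Checking that an anchor such as `#create` matches a heading on that page would need a second pass over the file's headings.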

Phase 3: Technical Setup#

Setting Up Redirects#

Redirects are critical—never break incoming links:

Next.js (next.config.js):

JavaScript
module.exports = {
  async redirects() {
    return [
      // Individual redirects
      {
        source: '/wiki/Documentation/API/REST/v2/Users/Create',
        destination: '/docs/api/users#create',
        permanent: true, // Next.js sends 308, its permanent-redirect status
      },
      // Pattern-based redirects
      {
        source: '/wiki/Documentation/:path*',
        destination: '/docs/:path*',
        permanent: true,
      },
      // Catch-all for deleted pages
      {
        source: '/wiki/:path*',
        destination: '/docs',
        permanent: false, // Temporary redirect (Next.js sends 307) for catch-all
      },
    ];
  },
};

Nginx:

nginx
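# Bulk redirects from map file (the map block belongs in the http context)
map $request_uri $new_uri {
    include /etc/nginx/redirects.map;
}
 
server {
    # $new_uri is empty when the request matches no entry in the map
    if ($new_uri) {
        return 301 $new_uri;
    }
}
 
# Contents of /etc/nginx/redirects.map (a separate file)
/wiki/Documentation/API/REST/v2/Users/Create /docs/api/users#create;
/wiki/Documentation/API/REST/v2/Users/List /docs/api/users#list;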

Cloudflare (bulk redirects):

csv
# Upload to Cloudflare Bulk Redirects
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301
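
Maintaining these formats by hand invites drift, so one option is to generate them from the single mapping CSV created during planning. The sketch below assumes the `old_url,new_url,redirect_type` header used earlier; the function names are hypothetical. It emits Nginx map lines and a JSON array shaped like the entries returned by `redirects()` in next.config.js.

```python
import csv
import io
import json


def read_mapping(csv_text: str) -> list[tuple[str, str]]:
    """Read (old_url, new_url) pairs from the mapping CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["old_url"], r["new_url"]) for r in rows]


def to_nginx_map(pairs: list[tuple[str, str]]) -> str:
    """One 'source destination;' line per redirect, for the map include file."""
    return "\n".join(f"{old} {new};" for old, new in pairs) + "\n"


def to_next_redirects(pairs: list[tuple[str, str]]) -> str:
    """JSON array of redirect entries for next.config.js."""
    entries = [{"source": old, "destination": new, "permanent": True}
               for old, new in pairs]
    return json.dumps(entries, indent=2)
```

URLs containing spaces, semicolons, or path-pattern characters such as `:` and `(` would need quoting or escaping before use; this sketch does not handle them.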

Search Index Migration#

Ensure search works from day one:

JavaScript
// Rebuild search index with new content
// searchClient and getAllDocuments come from your own search module
import { searchClient, getAllDocuments } from './search';
 
async function rebuildSearchIndex() {
  // Clear old index
  await searchClient.clearIndex();
 
  // Index all migrated documents
  const documents = await getAllDocuments();
 
  for (const doc of documents) {
    await searchClient.index({
      id: doc.slug,
      title: doc.title,
      content: doc.content,
      url: doc.url,
      // Include old URLs for transition period
      aliases: doc.oldUrls
    });
  }
}

Asset Migration#

Don't forget images and files:

Bash
# Download all assets from old docs (-nH drops the hostname directory)
wget --recursive --no-parent -nH --accept=png,jpg,gif,svg,pdf \
  https://old-docs.example.com/assets/
 
# Enable ** recursive globbing (bash 4+)
shopt -s globstar
 
# Organize and optimize
for file in assets/**/*; do
  [[ -f $file ]] || continue
  name=$(basename "$file")
 
  # Optimize PNGs in place so existing references keep working
  if [[ $file == *.png ]]; then
    pngquant --quality=65-80 --force --ext .png "$file"
  fi
 
  # Update references in migrated docs (GNU sed; use -i '' on macOS)
  sed -i "s|/old-path/$name|/assets/$name|g" docs/**/*.mdx
done

Phase 4: Launch and Validation#

Pre-Launch Checklist#

Markdown
## Migration Launch Checklist
 
### Content
- [ ] All priority pages migrated
- [ ] Content reviewed for accuracy
- [ ] Images and assets working
- [ ] Internal links validated
- [ ] Code examples tested
 
### Technical
- [ ] Redirects configured and tested
- [ ] Search index populated
- [ ] Analytics installed
- [ ] SSL certificate valid
- [ ] Performance acceptable (under 3s load time)
 
### SEO
- [ ] Sitemap generated and submitted
- [ ] Meta descriptions present
- [ ] Canonical URLs set
- [ ] Robots.txt updated
 
### Monitoring
- [ ] 404 monitoring enabled
- [ ] Analytics dashboards ready
- [ ] Feedback mechanism in place
- [ ] Team notified of launch

Staged Rollout#

Don't migrate everything at once:

Text
Week 1: High-traffic pages (top 20%)
        - Getting Started
        - API Reference
        - Installation
 
Week 2: Medium-traffic pages (next 30%)
        - Guides
        - Tutorials
        - Concepts
 
Week 3: Long-tail content (remaining 50%)
        - Advanced topics
        - Archive content
        - Edge cases
 
Week 4: Final cleanup
        - Set old system to read-only (decommission after 30 stable days)
        - Update all external links
        - Announce migration complete
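
The 20/30/50 split above can be derived directly from the Views column of the content inventory. A minimal sketch, assuming a dictionary of page views keyed by URL; `plan_waves` is a hypothetical helper name.

```python
def plan_waves(views_by_url: dict[str, int], shares=(0.2, 0.3, 0.5)) -> list[list[str]]:
    """Split pages into rollout waves by descending traffic."""
    ranked = sorted(views_by_url, key=views_by_url.get, reverse=True)
    waves, start, cumulative = [], 0, 0.0
    for share in shares:
        cumulative += share
        end = round(len(ranked) * cumulative)
        waves.append(ranked[start:end])
        start = end
    waves[-1].extend(ranked[start:])  # guard against rounding leftovers
    return waves
```

Each returned list is one week's batch, with the highest-traffic pages in the first list.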

Post-Launch Monitoring#

Track issues closely after launch:

JavaScript
// Monitor 404s
app.use((req, res, next) => {
  res.on('finish', () => {
    if (res.statusCode === 404) {
      analytics.track('404_error', {
        url: req.url,
        referrer: req.headers.referer,
        timestamp: new Date()
      });
    }
  });
  next();
});

Daily review:

  • Check 404 error logs
  • Review user feedback
  • Monitor search queries
  • Watch analytics for traffic drops
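
For the 404 log check, a short script can rank the missing URLs so the most-requested ones get redirects first. This sketch assumes access logs in the common combined format, where the request line is quoted and followed by the status code; `top_404s` is a hypothetical helper and the regular expression needs adjusting for other log formats.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) ')


def top_404s(log_lines, limit: int = 10) -> list[tuple[str, int]]:
    """Count 404 responses per path, most frequent first."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and m.group(2) == "404":
            counts[m.group(1).split("?", 1)[0]] += 1  # ignore query strings
    return counts.most_common(limit)
```

Pass it an open log file, for example `top_404s(open("access.log"))`, and add the top entries to the redirect mapping.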

Fixing Issues Fast#

Create a rapid response process:

Markdown
## Migration Issue Response
 
### P0 - Critical (fix in hours)
- Major page returning 404
- Search completely broken
- API docs inaccessible
 
### P1 - High (fix in 1 day)
- High-traffic page has broken content
- Important redirect missing
- Images not loading
 
### P2 - Medium (fix in 1 week)
- Minor formatting issues
- Low-traffic page issues
- Non-critical broken links
 
### P3 - Low (add to backlog)
- Cosmetic improvements
- Nice-to-have enhancements
- Archive content issues

Common Migration Pitfalls#

1. Breaking External Links#

Problem: External sites link to your old URLs

Solution: Comprehensive redirect mapping + monitoring

Bash
# Find external backlinks before migration
# Use tools like Ahrefs, Moz, or Google Search Console
 
# Ensure ALL incoming URLs have redirects
# Monitor 404s for missed redirects

2. Losing Search Rankings#

Problem: SEO drops after migration

Solution:

  • Use 301 (permanent) redirects
  • Keep URL structure similar when possible
  • Submit new sitemap immediately
  • Monitor Search Console for issues

3. Migrating Outdated Content#

Problem: Old, wrong content in new system

Solution: Content audit before migration

Markdown
Don't migrate blindly. Ask:
- Is this still accurate?
- When was it last updated?
- Does anyone use this?
- Should it be rewritten instead?

4. Underestimating Manual Work#

Problem: Automation doesn't catch everything

Solution: Budget for manual review

Text
Realistic estimate:
- Automated migration: 60% of content
- Semi-automated (needs review): 25%
- Manual rewrite required: 15%
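
To turn that split into a budget, multiply each share by a per-page effort figure. In the sketch below only the 60/25/15 split comes from the estimate above; the minutes per page (5, 20, and 90) are placeholder assumptions to replace with numbers from a pilot batch, and `estimate_effort` is a hypothetical helper.

```python
def estimate_effort(pages: int, minutes_per_page=(5, 20, 90)) -> dict:
    """Split pages 60/25/15 and total the hours; the minutes are assumptions."""
    automated = pages * 60 // 100
    semi = pages * 25 // 100
    manual = pages - automated - semi  # the remainder, roughly 15%
    total_minutes = (automated * minutes_per_page[0]
                     + semi * minutes_per_page[1]
                     + manual * minutes_per_page[2])
    return {"automated": automated, "semi_automated": semi,
            "manual": manual, "hours": total_minutes / 60}
```

For 1,000 pages this gives 600, 250, and 150 pages. With the placeholder minutes, the 150 manual rewrites account for 13,500 of the 21,500 total minutes, which is why the manual share deserves its own line in the plan.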

5. No Rollback Plan#

Problem: Migration fails, can't recover

Solution: Keep old system running during transition

Text
# Keep old docs accessible during migration
old-docs.example.com -> Active (read-only)
docs.example.com -> New system
 
# Only decommission old system after:
# - 30 days with no critical issues
# - Traffic fully transitioned
# - Redirects verified working
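
The "redirects verified working" condition can be checked with a script instead of by hand. The sketch below is one possible approach using only the standard library: `fetch_redirect` sends a HEAD request without following redirects, and `verify_redirects` compares each response with the mapping. Both names are hypothetical, and it treats 301 and 308 as acceptable permanent redirects.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url: str) -> tuple[int, str]:
    """Request a URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    https = parts.scheme == "https"
    conn_cls = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location", "")
    finally:
        conn.close()


def verify_redirects(base_url: str, mapping: dict[str, str], fetch=fetch_redirect) -> list[str]:
    """Describe every old URL that does not redirect as the mapping says."""
    failures = []
    for old, new in mapping.items():
        status, location = fetch(base_url + old)
        if status not in (301, 308):
            failures.append(f"{old}: expected permanent redirect, got {status}")
        elif urlsplit(location).path != urlsplit(new).path:
            failures.append(f"{old}: redirects to {location}, expected {new}")
    return failures
```

Only the path of the Location header is compared, so absolute and relative redirect targets both pass. The `fetch` parameter exists so the check can be exercised against canned responses before pointing it at production.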

Post-Migration Success#

Measure the Impact#

Compare before and after:

| Metric | Before | After | Change |
|---|---|---|---|
| Monthly visitors | 45,000 | 52,000 | +16% |
| Search success rate | 62% | 84% | +22 pts |
| Avg. time on page | 1:30 | 3:45 | +150% |
| Support tickets (doc-related) | 120/mo | 75/mo | -38% |
| Developer satisfaction | 3.2/5 | 4.4/5 | +38% |
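
The Change column is a relative change, computed as (after - before) / before. A small helper makes the arithmetic explicit; `percent_change` is an illustrative name.

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


visitors = percent_change(45_000, 52_000)   # monthly visitors
time_on_page = percent_change(90, 225)      # 1:30 and 3:45 in seconds
tickets = percent_change(120, 75)           # doc-related support tickets
```

For metrics that are already percentages, such as search success rate, state whether a change is in percentage points or relative terms: going from 62% to 84% is a gain of 22 points, which is a relative increase of about 35%.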

Communicate the Change#

Let users know:

Markdown
# We've Moved! 🎉
 
Our documentation has a new home with improved search,
better navigation, and updated content.
 
## What's New
- Faster, more accurate search
- Mobile-friendly design
- Updated code examples
- New tutorials and guides
 
## What You Need to Know
- All old URLs redirect automatically
- Bookmarks will still work
- Search engines will update within days
 
## Found an Issue?
Help us improve! [Report a problem](link)

Conclusion#

Documentation migration is a significant undertaking, but it's worth doing right:

  1. Audit thoroughly - Know what you have before moving it
  2. Plan redirects carefully - Never break incoming links
  3. Automate where possible - But expect manual work
  4. Launch gradually - Staged rollout reduces risk
  5. Monitor closely - Fix issues fast after launch

A successful migration isn't just moving content—it's an opportunity to improve your documentation foundation for years to come.


Ready to migrate to a modern documentation platform? Dokly makes migration easy with automatic URL redirects, built-in search, and a developer-friendly writing experience. Start your migration today.
