Migrating Legacy Documentation: A Practical Guide
Moving from outdated documentation systems to modern platforms? Learn strategies for successful migration, content auditing, redirects, and minimizing disruption.
Your documentation is stuck in an aging wiki, a clunky CMS, or scattered across multiple systems. You need to migrate. But how do you move thousands of pages without breaking everything? This guide walks you through the process.
Signs It's Time to Migrate
You probably need to migrate if:
- Developers avoid the docs because they're hard to navigate
- Search doesn't work or returns irrelevant results
- Updates are painful, requiring special tools or permissions
- Mobile experience is broken or non-existent
- Analytics are missing so you can't measure effectiveness
- The platform is unsupported or reaching end-of-life
The cost of staying often exceeds the cost of migrating.
Migration Strategy Overview
┌─────────────────────────────────────────────────────────────┐
│                     Migration Timeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1        Phase 2        Phase 3        Phase 4       │
│  ────────       ────────       ────────       ────────      │
│  Audit &        Content        Technical      Launch &      │
│  Plan           Migration      Setup          Redirect      │
│                                                             │
│  [2-3 weeks]    [4-6 weeks]    [2-3 weeks]    [1-2 weeks]   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Phase 1: Audit and Planning
Content Inventory
First, know what you have:
# Export sitemap or crawl existing docs
wget --spider --recursive --no-verbose --output-file=urls.log \
  https://old-docs.example.com
# Extract URLs
grep -oP 'https://old-docs\.example\.com[^\s]+' urls.log | \
  sort -u > all-pages.txt
Create a spreadsheet:
| URL | Title | Last Updated | Views | Keep/Archive/Delete | Priority |
|---|---|---|---|---|---|
| /docs/intro | Getting Started | 2023-01-15 | 10,234 | Keep | High |
| /docs/old-api | Legacy API v1 | 2019-08-22 | 123 | Archive | Low |
| /docs/test-page | Test | 2020-03-14 | 5 | Delete | - |
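The spreadsheet can be seeded from the crawl output rather than typed by hand. The sketch below is one possible approach, not a prescribed tool: it assumes the `all-pages.txt` file produced above, and the function names and the `inventory.csv` output name are hypothetical. It drops query strings so that `/docs/intro` and `/docs/intro?x=1` become one row, and it leaves the audit columns blank for a human to fill in.

```python
import csv
from urllib.parse import urlparse

# Column names copied from the spreadsheet above.
COLUMNS = ["URL", "Title", "Last Updated", "Views", "Keep/Archive/Delete", "Priority"]


def build_inventory_rows(urls):
    """Return one row per unique URL path, with the audit columns left blank."""
    paths = sorted({urlparse(u.strip()).path or "/" for u in urls if u.strip()})
    return [{**dict.fromkeys(COLUMNS, ""), "URL": path} for path in paths]


def write_inventory(url_file, csv_file):
    """Read a file of URLs (one per line) and write the inventory CSV."""
    with open(url_file, encoding="utf-8") as f:
        rows = build_inventory_rows(f)
    with open(csv_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_inventory("all-pages.txt", "inventory.csv")` would then give a starting sheet; titles, dates, and view counts still have to come from the old system or your analytics export.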
Content Audit Questions
For each page, ask:
- Is it still accurate? Outdated content hurts more than helps
- Is it still needed? Some pages can be retired
- Should it be merged? Consolidate related thin content
- What's the traffic? Prioritize high-value pages
- Are there external links? These need redirects
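Some of these questions can be pre-answered from data before a person reviews each page. The following is a minimal first-pass triage sketch that fills the Keep/Archive/Delete column. The `Page` fields, the function name, and both thresholds (two years without an update, fewer than 50 views) are assumptions chosen for illustration, not recommendations from this guide. Accuracy still needs a human reviewer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    last_updated: date
    views: int
    has_external_links: bool


def triage(page, today, stale_after_days=730, low_traffic=50):
    """Suggest Keep, Archive, or Delete from age, traffic, and backlinks.

    Thresholds are illustrative assumptions; tune them to your own traffic.
    """
    stale = (today - page.last_updated).days > stale_after_days
    if stale and page.views < low_traffic and not page.has_external_links:
        return "Delete"
    if stale:
        return "Archive"
    return "Keep"


if __name__ == "__main__":
    today = date(2024, 1, 1)  # assumed audit date
    for page in [
        Page("/docs/intro", date(2023, 1, 15), 10234, True),
        Page("/docs/old-api", date(2019, 8, 22), 123, False),
        Page("/docs/test-page", date(2020, 3, 14), 5, False),
    ]:
        print(page.url, triage(page, today))
```

Treat the output as a suggestion to review. A stale page with external links is never suggested for deletion here, because those links need a redirect target.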
Classification Framework
## Content Categories
### Keep As-Is
- Current, accurate, well-structured
- High traffic
- Action: Migrate directly
### Keep with Updates
- Still relevant but needs refresh
- Outdated code examples, screenshots
- Action: Update during migration
### Consolidate
- Multiple pages covering similar topics
- Thin content that should be combined
- Action: Merge into comprehensive pages
### Archive
- Historical value but no longer current
- Legacy API versions, deprecated features
- Action: Move to /archive/ with notice
### Delete
- Test pages, duplicates, broken content
- No external links pointing to them
- Action: Don't migrate, add redirect to relevant page
URL Strategy
Plan your new URL structure before migrating:
Old URLs:
/wiki/Documentation/API/REST/v2/Users/Create
/wiki/Documentation/API/REST/v2/Users/List
/wiki/Guides/Getting_Started_With_API
New URLs:
/docs/api/users#create
/docs/api/users#list
/docs/getting-started
Document the mapping:
old_url,new_url,redirect_type
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301
/wiki/Guides/Getting_Started_With_API,/docs/getting-started,301
Phase 2: Content Migration
Manual vs. Automated Migration
Automate when:
- Large volume (100+ pages)
- Consistent source format
- Structural changes are minimal
Manual when:
- Small volume (fewer than 50 pages)
- Content needs significant rewriting
- Source format is inconsistent
Automated Migration Script
import re
from pathlib import Path

import html2text    # pip install html2text
import frontmatter  # pip install python-frontmatter

# slugify(), extract_description(), and fix_tables() are project-specific
# helpers that are not shown here.

def migrate_page(source_html: str, old_url: str) -> str:
    """Convert HTML documentation to MDX."""
    # Convert HTML to Markdown
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.ignore_images = False
    h.body_width = 0  # Disable hard wrapping, which can split links and table rows
    content = h.handle(source_html)
    # Clean up common issues
    content = clean_content(content)
    # Extract title from first heading
    title_match = re.search(r'^#\s+(.+)$', content, re.MULTILINE)
    title = title_match.group(1) if title_match else "Untitled"
    # Build frontmatter
    post = frontmatter.Post(content)
    post['title'] = title
    post['description'] = extract_description(content)
    post['old_url'] = old_url  # Keep for redirect mapping
    return frontmatter.dumps(post)

def clean_content(content: str) -> str:
    """Fix common conversion issues."""
    # Fix code blocks
    content = re.sub(
        r'```\n\n',
        '```\n',
        content
    )
    # Fix internal links
    content = re.sub(
        r'\(/wiki/([^)]+)\)',
        lambda m: f'(/docs/{slugify(m.group(1))})',
        content
    )
    # Remove empty headers
    content = re.sub(r'^#+\s*$', '', content, flags=re.MULTILINE)
    # Fix broken tables
    content = fix_tables(content)
    return content

def migrate_all(source_dir: str, dest_dir: str) -> None:
    """Migrate all documentation files."""
    for html_file in Path(source_dir).glob('**/*.html'):
        with open(html_file, encoding='utf-8') as f:
            html_content = f.read()
        # Convert and save. The source file path stands in for the old URL
        # here; translate it to the real URL if the two differ.
        mdx_content = migrate_page(html_content, str(html_file))
        # Determine output path
        relative_path = html_file.relative_to(source_dir)
        output_path = Path(dest_dir) / relative_path.with_suffix('.mdx')
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(mdx_content)
        print(f"Migrated: {html_file} -> {output_path}")
Handling Common Source Formats
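From Confluence:
# Export Confluence space as HTML
# Use Confluence REST API or built-in export
import requests

# CONFLUENCE_URL, USERNAME, and API_TOKEN are your own settings;
# get_page_content() and save_as_markdown() are helpers not shown here.
def export_confluence_space(space_key: str):
    response = requests.get(
        f'{CONFLUENCE_URL}/rest/api/space/{space_key}/content',
        auth=(USERNAME, API_TOKEN),
        timeout=30,
    )
    response.raise_for_status()
    # This endpoint groups content by type, so pages sit under the "page" key.
    # Results are paginated; large spaces need additional requests.
    for page in response.json()['page']['results']:
        content = get_page_content(page['id'])
        save_as_markdown(page['title'], content)
From GitBook: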
# GitBook exports to Markdown natively
# Clone the repo and process
git clone https://github.com/org/gitbook-docs.git
# GitBook uses SUMMARY.md for structure
# Parse it to maintain hierarchy
From ReadTheDocs/Sphinx:
# Sphinx RST to MDX conversion
import pypandoc

# fix_sphinx_directives() and fix_sphinx_roles() are helpers not shown here.
def rst_to_mdx(rst_file: str) -> str:
    # Convert RST to Markdown
    md = pypandoc.convert_file(rst_file, 'md', format='rst')
    # Sphinx-specific fixes
    md = fix_sphinx_directives(md)
    md = fix_sphinx_roles(md)
    return md
Content Quality Checks
Run automated checks after migration:
import re

# link_exists() and image_exists() are project-specific helpers not shown here.
def validate_migrated_content(mdx_file: str) -> list[str]:
    """Check for common migration issues."""
    issues = []
    with open(mdx_file, encoding='utf-8') as f:
        content = f.read()
    # Check frontmatter
    if not content.startswith('---'):
        issues.append("Missing frontmatter")
    # Check for broken internal links
    internal_links = re.findall(r'\]\((/[^)]+)\)', content)
    for link in internal_links:
        if not link_exists(link):
            issues.append(f"Broken link: {link}")
    # Check for empty code blocks
    if '```\n```' in content:
        issues.append("Empty code block found")
    # Check for HTML remnants (this also flags tags shown inside code samples)
    if re.search(r'<(?!br)[a-z]+', content):
        issues.append("HTML tags remaining")
    # Check for broken images
    images = re.findall(r'!\[.*?\]\((.*?)\)', content)
    for img in images:
        if not image_exists(img):
            issues.append(f"Missing image: {img}")
    return issues
Phase 3: Technical Setup
Setting Up Redirects
Redirects are critical—never break incoming links:
Next.js (next.config.js):
module.exports = {
  async redirects() {
    return [
      // Individual redirects
      {
        source: '/wiki/Documentation/API/REST/v2/Users/Create',
        destination: '/docs/api/users#create',
        permanent: true, // Next.js sends 308, its permanent-redirect status
      },
      // Pattern-based redirects
      {
        source: '/wiki/Documentation/:path*',
        destination: '/docs/:path*',
        permanent: true,
      },
      // Catch-all for deleted pages
      {
        source: '/wiki/:path*',
        destination: '/docs',
        permanent: false, // Temporary redirect for the catch-all (Next.js sends 307)
      },
    ];
  },
};
Nginx:
# Bulk redirects from map file (the map block belongs in the http context)
# $request_uri includes the query string, so /path?x=1 will not match /path
map $request_uri $new_uri {
    include /etc/nginx/redirects.map;
}
server {
    if ($new_uri) {
        return 301 $new_uri;
    }
}
# redirects.map
/wiki/Documentation/API/REST/v2/Users/Create /docs/api/users#create;
/wiki/Documentation/API/REST/v2/Users/List /docs/api/users#list;
Cloudflare (bulk redirects):
# Upload to Cloudflare Bulk Redirects
/wiki/Documentation/API/REST/v2/Users/Create,/docs/api/users#create,301
/wiki/Documentation/API/REST/v2/Users/List,/docs/api/users#list,301
Search Index Migration
Ensure search works from day one:
// Rebuild search index with new content
// searchClient and getAllDocuments are assumed to come from your own search module
import { searchClient, getAllDocuments } from './search';

async function rebuildSearchIndex() {
  // Clear old index
  await searchClient.clearIndex();
  // Index all migrated documents
  const documents = await getAllDocuments();
  for (const doc of documents) {
    await searchClient.index({
      id: doc.slug,
      title: doc.title,
      content: doc.content,
      url: doc.url,
      // Include old URLs for transition period
      aliases: doc.oldUrls
    });
  }
}
Asset Migration
Don't forget images and files:
# Download all assets from old docs into ./assets/
wget --recursive --no-parent --no-host-directories \
  --accept=png,jpg,gif,svg,pdf \
  https://old-docs.example.com/assets/
# Organize and optimize
shopt -s globstar  # Bash 4+: lets ** match nested directories
for file in assets/**/*; do
  [[ -f $file ]] || continue
  # Optimize PNGs in place so references to the original file name stay valid
  if [[ $file == *.png ]]; then
    pngquant --quality=65-80 --force --ext .png "$file"
  fi
  # Update references in migrated docs (GNU sed syntax)
  name=$(basename "$file")
  sed -i "s|/old-path/$name|/assets/$name|g" docs/**/*.mdx
done
Phase 4: Launch and Validation
Pre-Launch Checklist
## Migration Launch Checklist
### Content
- [ ] All priority pages migrated
- [ ] Content reviewed for accuracy
- [ ] Images and assets working
- [ ] Internal links validated
- [ ] Code examples tested
### Technical
- [ ] Redirects configured and tested
- [ ] Search index populated
- [ ] Analytics installed
- [ ] SSL certificate valid
- [ ] Performance acceptable (under 3s load time)
### SEO
- [ ] Sitemap generated and submitted
- [ ] Meta descriptions present
- [ ] Canonical URLs set
- [ ] Robots.txt updated
### Monitoring
- [ ] 404 monitoring enabled
- [ ] Analytics dashboards ready
- [ ] Feedback mechanism in place
- [ ] Team notified of launch
Staged Rollout
Don't migrate everything at once:
Week 1: High-traffic pages (top 20%)
- Getting Started
- API Reference
- Installation
Week 2: Medium-traffic pages (next 30%)
- Guides
- Tutorials
- Concepts
Week 3: Long-tail content (remaining 50%)
- Advanced topics
- Archive content
- Edge cases
Week 4: Final cleanup
- Schedule removal of the old system (only once the rollback criteria under pitfall 5 are met)
- Update all external links
- Announce migration complete
Post-Launch Monitoring
Track issues closely after launch:
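// Monitor 404s (register this before your routes so every response is observed)
app.use((req, res, next) => {
  res.on('finish', () => {
    if (res.statusCode === 404) {
      analytics.track('404_error', {
        url: req.originalUrl, // Full requested URL, unaffected by router mounting
        referrer: req.headers.referer,
        timestamp: new Date()
      });
    }
  });
  next();
});
Daily review: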
- Check 404 error logs
- Review user feedback
- Monitor search queries
- Watch analytics for traffic drops
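The daily 404 review goes faster when the tracked events are ranked. The sketch below assumes the `404_error` events from the middleware above have been exported as a list of dictionaries with `url` and `referrer` keys. The export step and the function name are hypothetical. It returns the most-requested missing URLs along with the referrer that sends the most traffic to each, which helps decide which redirects to add first.

```python
from collections import Counter


def top_missing_urls(events, limit=10):
    """Rank 404 URLs by hit count and report each one's most common referrer.

    `events` is an iterable of dicts with a "url" key and an optional
    "referrer" key (assumed export format, matching the tracked fields).
    """
    hits = Counter()
    referrers = {}
    for event in events:
        url = event["url"]
        hits[url] += 1
        referrer = event.get("referrer")
        if referrer:
            referrers.setdefault(url, Counter())[referrer] += 1

    report = []
    for url, count in hits.most_common(limit):
        top_referrer = referrers[url].most_common(1)[0][0] if url in referrers else None
        report.append((url, count, top_referrer))
    return report
```

A URL that appears near the top with an external referrer is a strong candidate for a new entry in the redirect mapping.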
Fixing Issues Fast
Create a rapid response process:
## Migration Issue Response
### P0 - Critical (fix in hours)
- Major page returning 404
- Search completely broken
- API docs inaccessible
### P1 - High (fix in 1 day)
- High-traffic page has broken content
- Important redirect missing
- Images not loading
### P2 - Medium (fix in 1 week)
- Minor formatting issues
- Low-traffic page issues
- Non-critical broken links
### P3 - Low (add to backlog)
- Cosmetic improvements
- Nice-to-have enhancements
- Archive content issues
Common Migration Pitfalls
1. Breaking Incoming Links
Problem: External sites link to your old URLs
Solution: Comprehensive redirect mapping + monitoring
# Find external backlinks before migration
# Use tools like Ahrefs, Moz, or Google Search Console
# Ensure ALL incoming URLs have redirects
# Monitor 404s for missed redirects
2. Losing Search Rankings
Problem: SEO drops after migration
Solution:
- Use 301 (permanent) redirects
- Keep URL structure similar when possible
- Submit new sitemap immediately
- Monitor Search Console for issues
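Part of this can be checked before launch by linting the redirect mapping itself. The sketch below assumes the `old_url,new_url,redirect_type` CSV layout from the URL Strategy section. The function name and the specific checks are illustrative choices. It flags rows that are not 301s, duplicate sources, self-redirects, and chains where a target is itself redirected, since every extra hop is another request for users and crawlers.

```python
import csv
import io


def check_redirect_map(rows):
    """Return a list of problems found in redirect mapping rows.

    Each row is a dict with old_url, new_url, and redirect_type keys.
    """
    problems = []
    targets = {}
    for row in rows:
        old, new = row["old_url"], row["new_url"]
        if old in targets:
            problems.append(f"duplicate source: {old}")
        targets[old] = new
        if row["redirect_type"] != "301":
            problems.append(f"not permanent: {old} ({row['redirect_type']})")

    for old, new in targets.items():
        path = new.split("#", 1)[0]  # compare paths without the fragment
        if path == old:
            problems.append(f"self-redirect: {old}")
        elif path in targets:
            problems.append(f"chain: {old} -> {new} -> {targets[path]}")
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "old_url,new_url,redirect_type\n"
        "/wiki/A,/docs/a,301\n"
        "/wiki/B,/wiki/A,301\n"
        "/wiki/C,/docs/c,302\n"
    )
    for problem in check_redirect_map(csv.DictReader(sample)):
        print(problem)
```

A chain is usually fixed by pointing the first URL straight at the final destination. This lint only inspects the mapping file; it does not replace testing the deployed redirects.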
3. Migrating Outdated Content
Problem: Old, wrong content in new system
Solution: Content audit before migration
Don't migrate blindly. Ask:
- Is this still accurate?
- When was it last updated?
- Does anyone use this?
- Should it be rewritten instead?
4. Underestimating Manual Work
Problem: Automation doesn't catch everything
Solution: Budget for manual review
Realistic estimate:
- Automated migration: 60% of content
- Semi-automated (needs review): 25%
- Manual rewrite required: 15%
5. No Rollback Plan
Problem: Migration fails, can't recover
Solution: Keep old system running during transition
# Keep old docs accessible during migration
old-docs.example.com -> Active (read-only)
docs.example.com -> New system
# Only decommission old system after:
# - 30 days with no critical issues
# - Traffic fully transitioned
# - Redirects verified working
Post-Migration Success
Measure the Impact
Compare before and after:
| Metric | Before | After | Change |
|---|---|---|---|
| Monthly visitors | 45,000 | 52,000 | +16% |
| Search success rate | 62% | 84% | +22 pts |
| Avg. time on page | 1:30 | 3:45 | +150% |
| Support tickets (doc-related) | 120/mo | 75/mo | -38% |
| Developer satisfaction | 3.2/5 | 4.4/5 | +38% |
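The Change column can be computed in two different ways, and it helps to be explicit about which one applies. Counts, durations, and scores are normally compared as relative change. A metric that is already a percentage, such as search success rate, is clearer as a percentage-point difference. A minimal sketch, with hypothetical function names:

```python
def relative_change(before, after):
    """Percent change relative to the starting value."""
    return (after - before) / before * 100


def point_change(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


if __name__ == "__main__":
    # Monthly visitors: relative change
    print(f"{relative_change(45_000, 52_000):+.0f}%")
    # Avg. time on page, in seconds (1:30 -> 3:45): relative change
    print(f"{relative_change(90, 225):+.0f}%")
    # Search success rate: percentage points
    print(f"{point_change(62, 84):+d} pts")
```

For the search success row, the two measures differ noticeably: 62% to 84% is a gain of 22 percentage points, which is roughly a 35% relative improvement.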
Communicate the Change
Let users know:
# We've Moved! 🎉
Our documentation has a new home with improved search,
better navigation, and updated content.
## What's New
- Faster, more accurate search
- Mobile-friendly design
- Updated code examples
- New tutorials and guides
## What You Need to Know
- All old URLs redirect automatically
- Bookmarks will still work
- Search results will update as search engines recrawl the site
## Found an Issue?
Help us improve! [Report a problem](link)
Conclusion
Documentation migration is a significant undertaking, but it's worth doing right:
- Audit thoroughly - Know what you have before moving it
- Plan redirects carefully - Never break incoming links
- Automate where possible - But expect manual work
- Launch gradually - Staged rollout reduces risk
- Monitor closely - Fix issues fast after launch
A successful migration isn't just moving content—it's an opportunity to improve your documentation foundation for years to come.
