
Production Best Practices

Introduction

In this lesson, we cover best practices for deploying and maintaining the backup system in production.


Error Handling

Standardized Error Objects

typescript
export const errorSnapshotLimitExceeded = {
  error: 'Snapshot limit exceeded',
  code: 'SNAPSHOT_LIMIT_EXCEEDED',
  type: 'VALIDATION_ERROR',
};

export const errorSnapshotLimitByScheduledBackup = {
  error: 'Cannot create manual snapshot - would exceed limit for scheduled backup',
  code: 'SNAPSHOT_LIMIT_BY_SCHEDULED_BACKUP',
  type: 'VALIDATION_ERROR',
};

Error Handling Pattern

typescript
try {
  await checkSnapshotLimit(serverId, limit);
  await createSnapshot();
} catch (error: any) {
  // Check if it's a known error object
  if (error?.error && error?.code && error?.type) {
    throw error; // Preserve error structure
  }
  // Handle unexpected errors
  logger.error(`Unexpected error: ${error?.message}`);
  throw new Error('Failed to create snapshot');
}
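The `catch` block above types the error as `any` and checks the three fields for truthiness. As a minimal sketch, the same check can be written as a type guard so the compiler narrows the value. The names `KnownBackupError`, `isKnownBackupError`, and `normalizeSnapshotError` are hypothetical and not part of the codebase; the guard is also slightly stricter than the original because it requires all three fields to be strings.

```typescript
// Shape shared by the standardized error objects (assumed from the examples above).
interface KnownBackupError {
  error: string;
  code: string;
  type: string;
}

// Hypothetical helper: narrows an unknown thrown value to KnownBackupError.
function isKnownBackupError(value: unknown): value is KnownBackupError {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate['error'] === 'string' &&
    typeof candidate['code'] === 'string' &&
    typeof candidate['type'] === 'string'
  );
}

// Hypothetical helper: keep known errors untouched, replace anything else
// with a generic Error so internal details do not leak to the caller.
function normalizeSnapshotError(caught: unknown): KnownBackupError | Error {
  if (isKnownBackupError(caught)) return caught;
  return new Error('Failed to create snapshot');
}
```

Inside the `catch` block this would reduce to `throw normalizeSnapshotError(error);`, with the unexpected-error logging kept next to it.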

Logging Strategy

Winston Logger

typescript
import { logger } from '../../../logger/winston';

// Info logs
logger.info(`✅ Backup created successfully: ${imageId}`);
logger.info(`🔄 Processing backup job: ${resourceId}`);

// Warning logs
logger.warn(`⚠️ Schedule disabled, skipping: ${scheduleId}`);

// Error logs
logger.error(`❌ Backup failed: ${error.message}`);

Log Levels

  • Info: Normal operations, status updates
  • Warn: Recoverable issues, fallbacks
  • Error: Failures, exceptions

Monitoring

Queue Statistics

typescript
// Monitor queue health
const stats = await backupQueueManager.getQueueStats();

if (stats.waiting > 100) {
  // Alert: Too many waiting jobs
}

if (stats.failed > 50) {
  // Alert: Too many failed jobs
}

Health Checks

typescript
// Periodic health check
const health = await backupQueueManager.healthCheck();

if (!health.healthy) {
  // Alert: Queue health issues
  logger.error(`Queue health issues: ${health.issues.join(', ')}`);
}
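The source does not show how `healthCheck()` builds its result. As a hedged sketch, a result of the same `{ healthy, issues }` shape could be derived from the queue statistics and the thresholds used earlier (100 waiting, 50 failed). The interfaces and the `evaluateQueueHealth` function are assumptions for illustration only, not the actual `backupQueueManager` implementation.

```typescript
// Assumed subset of the stats returned by getQueueStats().
interface QueueStats {
  waiting: number;
  failed: number;
}

// Same shape as the health object used above.
interface QueueHealth {
  healthy: boolean;
  issues: string[];
}

// Hypothetical thresholds; defaults mirror the values in the stats example.
interface HealthThresholds {
  maxWaiting: number;
  maxFailed: number;
}

const defaultThresholds: HealthThresholds = { maxWaiting: 100, maxFailed: 50 };

// Collects one human-readable issue per exceeded threshold.
function evaluateQueueHealth(
  stats: QueueStats,
  thresholds: HealthThresholds = defaultThresholds,
): QueueHealth {
  const issues: string[] = [];
  if (stats.waiting > thresholds.maxWaiting) {
    issues.push(`waiting jobs: ${stats.waiting} (limit ${thresholds.maxWaiting})`);
  }
  if (stats.failed > thresholds.maxFailed) {
    issues.push(`failed jobs: ${stats.failed} (limit ${thresholds.maxFailed})`);
  }
  return { healthy: issues.length === 0, issues };
}
```

Keeping the evaluation a pure function makes the thresholds easy to unit test and to tune per environment.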

Graceful Shutdown

Shutdown Handlers

typescript
export async function stopBackupScheduler() {
  logger.info('🔄 Stopping backup scheduler...');
  
  // Stop cron jobs
  if (schedulerTask) {
    schedulerTask.stop();
    schedulerTask = null;
  }
  
  if (cleanupTask) {
    cleanupTask.stop();
    cleanupTask = null;
  }
  
  // Close queue manager
  await backupQueueManager.close();
  
  logger.info('📴 Backup scheduler stopped');
}

// Graceful shutdown handlers
process.on('SIGTERM', async () => {
  logger.info('🔄 SIGTERM received, stopping backup scheduler...');
  await stopBackupScheduler();
  process.exit(0);
});

process.on('SIGINT', async () => {
  logger.info('🔄 SIGINT received, stopping backup scheduler...');
  await stopBackupScheduler();
  process.exit(0);
});
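Two details are worth guarding against in the handlers above: a second signal arriving while shutdown is already in progress, and `stopBackupScheduler()` rejecting, which would skip `process.exit`. A minimal sketch of one way to handle both follows; `createShutdownHandler` is a hypothetical helper, and the stop and exit functions are injected so the logic can be tested without killing the process.

```typescript
type StopFn = () => Promise<void>;
type ExitFn = (code: number) => void;

// Hypothetical helper: returns a signal handler that runs `stop` at most once
// and always calls `exit`, with code 1 if the stop routine fails.
function createShutdownHandler(
  stop: StopFn,
  exit: ExitFn,
): (signal: string) => Promise<void> {
  let shuttingDown = false;

  return async (signal: string): Promise<void> => {
    if (shuttingDown) return; // Ignore repeated signals during shutdown
    shuttingDown = true;

    try {
      await stop();
      exit(0);
    } catch (err) {
      console.error(`Shutdown after ${signal} failed:`, err);
      exit(1);
    }
  };
}

// Usage, assuming stopBackupScheduler from the block above:
// const onSignal = createShutdownHandler(stopBackupScheduler, (code) => process.exit(code));
// process.on('SIGTERM', onSignal);
// process.on('SIGINT', onSignal);
```

Sharing one handler between both signals also removes the duplicated handler bodies.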

Job Recovery

Recover Interrupted Jobs

typescript
export async function recoverInterruptedJobs(): Promise<void> {
  // Find jobs that were running when app stopped
  const interruptedJobs = await BackupSchedule.find({
    status: 'running',
  });
  
  for (const job of interruptedJobs) {
    // Reset status
    await BackupSchedule.findByIdAndUpdate(job._id, {
      $set: {
        status: 'idle',
        runningJobId: null,
        lastError: null,
      }
    });
    
    // Reschedule if due
    if (shouldReschedule(job)) {
      await backupQueueManager.addBackupJob(job, 5000);
    }
  }
}
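`shouldReschedule` is called above but not defined in this lesson. One plausible reading, sketched here under assumptions, is that a recovered schedule is re-queued only if it is still enabled and its `nextRunAt` has already passed. The `ScheduleLike` interface and the exact rule are hypothetical; the real implementation may apply different criteria.

```typescript
// Assumed subset of the BackupSchedule document used for the decision.
interface ScheduleLike {
  enabled: boolean;
  nextRunAt: Date | null;
}

// Hypothetical rule: reschedule only enabled schedules whose next run is due.
// `now` is injectable so the function stays deterministic in tests.
function shouldReschedule(job: ScheduleLike, now: Date = new Date()): boolean {
  if (!job.enabled) return false;
  if (job.nextRunAt === null) return false;
  return job.nextRunAt.getTime() <= now.getTime();
}
```

With this rule, a schedule that was interrupted but is not yet due is only reset to `idle` and left for the regular scheduler tick.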

Performance Optimization

Batch Operations

typescript
// ✅ DO: Process in batches
const batchSize = 100;
const cursor = BackupSchedule.find({...}).batchSize(batchSize).cursor();

for await (const schedule of cursor) {
  // Process schedule
}

// ❌ DON'T: Load all at once
const schedules = await BackupSchedule.find({...}); // May be too large

Efficient Queries

typescript
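// ✅ DO: Use indexes
// Mongoose models have no static createIndex(); call it on the underlying
// driver collection (or declare the indexes on the schema with schema.index()).
await BackupSchedule.collection.createIndex({ nextRunAt: 1, status: 1 });
await BackupSchedule.collection.createIndex({ enabled: 1, status: 1 });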

// ✅ DO: Project only needed fields
BackupSchedule.find({...}, { nextRunAt: 1, status: 1 });

Troubleshooting Guide

Common Issues

1. Queue Not Processing

bash
# Check Redis connection
redis-cli ping

# Check queue stats
curl http://localhost:3000/api/v1/backup/queue/stats

2. Schedules Not Running

bash
# Check scheduler logs
tail -f logs/backup-scheduler.log

# Check MongoDB schedules (run this query inside mongosh, not in bash)
db.backupschedules.find({ enabled: true, status: 'idle' })

3. OpenStack Errors

bash
# Check OpenStack credentials
echo $OPENSTACK_ENDPOINT_HCM
echo $OPENSTACK_USERNAME_HCM

# Test authentication
curl -X POST $OPENSTACK_ENDPOINT_HCM/identity/v3/auth/tokens ...

Summary

Key Points

  1. Error Handling

    • Standardized error objects
    • Preserve error structure
    • Clear error messages
  2. Logging

    • Use Winston logger
    • Appropriate log levels
    • Context in logs
  3. Monitoring

    • Queue statistics
    • Health checks
    • Alerting
  4. Graceful Shutdown

    • Handle SIGTERM/SIGINT
    • Close connections
    • Save state
  5. Job Recovery

    • Recover interrupted jobs
    • Reschedule if needed
    • Reset status

Next Steps

You have completed the course!


Last Updated: 2025-01-25
Previous: 08. Snapshot Limits
Back to: Index

Internal documentation for iNET Portal