Postmortem: Unveiling the Story of a Database Disaster

Postmortem: Inventory Management System Outage

Issue Summary:

  • Date: October 30, 2023 (resolved in approximately three hours)
  • Impact:

  • Service Affected: Web-based Inventory Management System (Under Development)
  • User Experience: I could not access data through the interface or add items to the database.

  • Root Cause: Database Corruption due to Unforeseen MySQL Issue

Timeline:

  • Detection Time: October 30, 2023, [Time of detection]

  • Detection Method: Discovered the issue during routine usage of the system interface.

  • Actions Taken:

  • Investigated server logs for anomalies.
  • Assumed Root Cause: Initially suspected a network issue impacting database connectivity.
  • Checked server resources and connectivity.
  • Finding: Database server load was within normal limits.
  • Investigated MySQL logs for errors.
  • Unable to access MySQL as root or other users.
  • Attempted safe mode access and user recreation without success.

  • Misleading Paths:

  • Explored Network Connectivity Issues - ruled out after investigation.
  • Explored High Server Load - not the cause.
  • Explored MySQL Safe Mode for User Recreation - unsuccessful.

  • Resolution:

  • Identified and rectified a corrupted database table.
  • Applied necessary fixes to MySQL configuration.
  • Restored the system from the last available backup.
  • MySQL was uninstalled and reinstalled on the Ubuntu 23 droplet.
  • Recreated two databases, tables, and users.
  • Granted user privileges and added missing columns.
  • Resolution Time: Approximately three hours.

Root Cause and Resolution:

Root Cause:

  • The primary cause was identified as a corruption in a key database table, likely due to an unexpected MySQL glitch.

Resolution:

  • The corrupted table was repaired through MySQL tools.
  • Configuration adjustments were made to prevent a recurrence.
  • A full system restore was performed from the most recent backup.
  • MySQL was uninstalled and reinstalled on the Ubuntu 23 droplet.
  • Databases, tables, and users were recreated, and missing columns added.

Backup Implementation:

  • Backups were scheduled every two days using a cron job and a bash script.

Script Unveiled: Exploring the intricacies of our database backup magic.

This script captures the current date, sets the database name, defines the backup path, and then uses mysqldump to create a dump file. Finally, it compresses the dump using tar.
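A minimal sketch of a script that follows these steps could look like the one below. It is not the original stock_backup.sh: the function name, the database name, the backup directory, and the choice to read MySQL credentials from ~/.my.cnf are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the backup steps; names and paths are assumptions.

backup_database() {
    db_name="$1"        # database to dump (assumed to be passed by the caller)
    backup_dir="$2"     # directory that receives the archive

    # Capture the current date for the file name, e.g. 2023-10-30
    today="$(date +%F)"
    dump_file="${db_name}_${today}.sql"

    # Create the backup directory if it does not exist yet
    mkdir -p "$backup_dir" || return 1

    # Create the dump; credentials are assumed to live in ~/.my.cnf,
    # not on the command line
    mysqldump --single-transaction "$db_name" > "$backup_dir/$dump_file" || return 1

    # Compress the dump with tar, then remove the uncompressed copy
    tar -czf "$backup_dir/$dump_file.tar.gz" -C "$backup_dir" "$dump_file" || return 1
    rm -f "$backup_dir/$dump_file"

    # Print the archive path so the caller (or the cron log) records it
    echo "$backup_dir/$dump_file.tar.gz"
}
```

Calling `backup_database stock_db "$HOME/backups"` at the end of the script would produce one dated archive per run; both arguments here are placeholder values.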

File Permissions Unveiled: Witnessing the access rights of our crucial backup script.

To make the script executable, run:

chmod +x stock_backup.sh

Backup Rhythm: How this cron job ensures our data is securely backed up every 48 hours.


The cron job executes the stock_backup.sh Bash script every two days at midnight. This script creates a MySQL database dump and saves it in a backup directory. The execution output, along with any errors, is appended to a log file (cron_log.txt).
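As a rough illustration, a crontab entry matching that description might be built as follows. The script and log locations are assumptions, since the real paths are not given here.

```shell
# Assumed locations; the real paths are not stated in the article.
SCRIPT_PATH="$HOME/stock_backup.sh"
LOG_PATH="$HOME/cron_log.txt"

# Fields: minute hour day-of-month month day-of-week command
# "0 0 */2 * *" runs at 00:00 on every second day of the month.
# ">> ... 2>&1" appends both standard output and errors to the log file.
CRON_LINE="0 0 */2 * * $SCRIPT_PATH >> $LOG_PATH 2>&1"

# Print the entry; it can then be pasted into the editor opened by `crontab -e`.
echo "$CRON_LINE"
```

One caveat with this form: `*/2` in the day-of-month field fires on days 1, 3, 5, and so on, so after a 31-day month the job runs on the 31st and again on the 1st. That makes it "every other day" rather than a strict 48-hour interval.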

Corrective and Preventative Measures:

  • Improvements/Fixes:

  • Implement regular database health checks in the monitoring system.
  • Establish a more robust backup and recovery strategy.
  • Perform backups every two days.
  • Consider implementing database version control for schema changes.
  • Implement an automated backup script and directory creation.
  • Create a MySQL database replica on another server.
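The health-check item above could start from something as small as the sketch below. It assumes the standard `mysqladmin` and `mysqlcheck` client tools are installed and that credentials come from ~/.my.cnf; the function name and the exit codes are my own choices, not part of any existing monitoring setup.

```shell
# Hypothetical health check; function name and exit codes are assumptions.
check_database_health() {
    # mysqladmin ping exits non-zero when the server cannot be reached
    if ! mysqladmin ping --silent >/dev/null 2>&1; then
        echo "CRITICAL: MySQL server is not responding"
        return 2
    fi

    # With --silent, mysqlcheck prints only error messages, so any output
    # (or a non-zero exit status) is treated as a table-level problem
    report="$(mysqlcheck --check --all-databases --silent 2>&1)" || report="${report:-mysqlcheck failed}"
    if [ -n "$report" ]; then
        echo "WARNING: mysqlcheck reported problems"
        echo "$report"
        return 1
    fi

    echo "OK: server reachable, no table errors reported"
}
```

Run from cron or a monitoring agent, the exit code (0, 1, or 2) gives a simple signal to alert on before users notice a problem.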

  • Tasks:

  • TODO: Schedule regular database maintenance to address potential issues proactively.
  • TODO: Review and update backup procedures to ensure quick system recovery.
  • TODO: Monitor and improve database replication for real-time backup.
  • TODO: Conduct regular drills to test the effectiveness of the backup and recovery process.
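For the recovery drill in particular, one possible shape is to load the latest archive into a throwaway database and confirm that tables come back. The sketch below assumes an archive containing a single .sql dump (as a tar-compressed mysqldump would) and a scratch database name chosen by the caller; both are illustrative.

```shell
# Hypothetical restore drill; the scratch database name is supplied by the caller.
restore_drill() {
    archive="$1"       # path to a .tar.gz backup archive
    scratch_db="$2"    # throwaway database used only for the drill

    # Unpack the archive into a temporary working directory
    work_dir="$(mktemp -d)" || return 1
    tar -xzf "$archive" -C "$work_dir" || return 1

    # Locate the dump file inside the unpacked archive
    dump_file="$(find "$work_dir" -name '*.sql' | head -n 1)"
    if [ -z "$dump_file" ]; then
        echo "no .sql file found in $archive"
        return 1
    fi

    # Load the dump into the scratch database
    mysql -e "CREATE DATABASE IF NOT EXISTS $scratch_db" || return 1
    mysql "$scratch_db" < "$dump_file" || return 1

    # Report how many tables were restored, then clean up
    mysql -N -e "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = '$scratch_db'"
    status=$?
    rm -rf "$work_dir"
    return "$status"
}
```

A drill passes only if the load succeeds and the table count matches what the live schema is expected to contain; dropping the scratch database afterwards is left to the operator.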

Conclusion

In reflecting on the development and fortification of our database backup system, the journey has been both a challenge and a revelation. I am immensely grateful to ALX Africa for affording me the incredible opportunity to partake in their Software Engineering Program. The unwavering support from the Mastercard Foundation, who sponsored this program, has been a beacon throughout this learning odyssey.

Every line of code written and every solution derived bears the imprint of the skills acquired during my eight months in the ALX Africa program. This project stands as a testament to the transformative impact of quality education and mentorship. The credit provided by ALX Africa also empowered me to leverage DigitalOcean resources, a pivotal element in the success of this endeavor.

Special thanks to ALX Africa and the Mastercard Foundation, not only for making this learning journey possible but also for shaping it into an enriching experience. As I continue to grow in my software engineering endeavors, I am fueled by the knowledge and skills fostered in me by these remarkable organizations.

#TechGratitude #ALXAfrica #MastercardFoundation #DigitalOcean #SoftwareEngineering



More articles by ALEX RUGEMA
