Home Database Mysql Tutorial 一次惊心动魄的ASM磁盘头损坏故障处理过程带来的深思

一次惊心动魄的ASM磁盘头损坏故障处理过程带来的深思

Jun 07, 2016 pm 05:17 PM

Oracle数据库,为了防止数据丢失以及构建高可用环境给出了多种架构方式。例如,为了防止Oracle实例级别的单点故障提供了RAC技术(

数据通常比喻为企业的血液和生命,数据安全一直是大家非常重视的话题。

Oracle数据库,为了防止数据丢失以及构建高可用环境给出了多种架构方式。例如,为了防止Oracle实例级别的单点故障提供了RAC技术(Real Application Clusters,真正的应用集群),RAC以Share Everything的架构方式使多个主机实例可以共享一套存储上的数据,从而避免了由于个别实例出现故障导致数据库不可用;RAC技术仅仅给出了实例层面的高可用解决方案,为了防止存储层面的单点故障,Oracle又提出了Data Guard(数据卫士)技术,无论是逻辑Data Guard还是物理Data Guard都从存储层面解决了单点故障,同时也是灾备技术的最佳选择。基于RAC和Data Guard技术,Oracle进一步又推出了MAA架构方式,即主站点是RAC架构方式,备用站点也是RAC架构方式,主备站点之间通过Data Guard技术使用redo传输变化的数据,确保备站点与主站点之间达到实时或者准实时的数据一致。

除此之外,Oracle还提供了各种备份恢复工具,比如物理备份恢复工具RMAN、逻辑备份恢复工具EXP/IMP EXPDP/IMPDP。基于这些工具便可以定制一套有效的备份恢复策略,以便防止数据丢失。

以上技术手段都是确保数据不丢失的必要条件,绝非充分条件!这些技术固然重要,但是与之相比,更加重要的是“人”的因素。再优秀的技术,如果没有人来定期做健康检查并排查潜在问题的话,这些都是“浮云”。这里给大家分享一个最近刚刚为客户处理完的一个Case。起到警示的作用。

【数据库环境描述】:

数据库类型:    某政府核心生产系统

影响范围:      全国性

数据量:        8T

主机类型:      IBM 570

数据库版本:    10.2.0.4.0

ASM版本:       10.2.0.4.0

数据库架构方式:两节点RAC架构方式;存储使用ASM技术,并且ASM磁盘头没有备份;未部署Data Guard灾备站点;归档模式,,使用RMAN做全库及增量备份。

【故障现象】:

在手工为表空间添加数据文件的时候,触发ASM磁盘头损坏,ASM的alert日志中记录了如下信息:

Sat Jun  9 01:45:51 2012

WARNING: cache read a corrupted block gn=1 dsk=39 blk=18 from disk 39

NOTE: a corrupted block was dumped to the trace file

ERROR: cache failed to read dsk=39 blk=18 from disk(s): 39

ORA-15196: invalid ASM block header [kfc.c:8033] [check_kfbh] [2147483687] [18] [2154781313 != 2634714205]

System State dumped to trace file /home/oracle/admin/+ASM/bdump/+asm1_arb0_602136.trc

NOTE: cache initiating offline of disk 39  group 1

WARNING: offlining disk 39.3734428818 (BDC_DATA_0039) with mask 0x3

NOTE: PST update: grp = 1, dsk = 39, mode = 0x6

【艰难的数据恢复过程】:

第一次尝试:直接恢复ASM磁盘头数据

尝试使用Oracle KFED(Kernel Files Editor)工具修改ASM磁盘头,如果这种方式能够顺利的恢复ASM磁盘头的话,将是一种完美的结局,然而事与愿违,此时的ASM磁盘头损坏非一般类型的损坏(故障原因中给出分析),使用KFED无法完成恢复。第一次梦魇不期而遇。

第二次尝试:使用RMAN进行数据恢复

既然每天都做RMAN的备份,正常情况下便可以使用RMAN进行数据恢复。因此,找来设备上尝试数据恢复(提醒:千万不要在生产环境上尝试恢复,保留现场很重要!),8T的数据拷贝以及恢复时间都是不可想象的,经过漫长的17小时的恢复,梦魇再一次来袭,在尝试恢复的过程中突然发现,RAC的第二节点上的归档日志不完整,仅剩半个月之前的归档日志,这是不可饶恕的,这也就意味着,使用RMAN工具最多只能恢复到15天前的数据,最近半个月的数据将荡然无存。这便是典型的“无人值守”导致的灾难。

第三次尝试:尽最大努力挽回数据

由于RAC第二节点归档日志的丢失导致最多可以恢复到15天前的数据,但也不要放弃希望,尽一切努力进行数据恢复。再次尝试使用RMAN恢复数据到15天前。正如小说中常见的情景,此时,梦魇又一次降临到这套可怜的数据库!即便恢复到了15天前的数据,发现数据库依然无法正常open。尝试各种手段,启用隐含参数等方法,亦不奏效。使用各种手段强制open数据库后alert日志中频现ORA-00600错误,即使在逻辑导出数据的过程中,都在频繁的抛出 ORA-00600错误。最终以备份介质无效无法完美恢复而终止。

第四次终极处理方法:使用工具直接抽取ASM磁盘组中的数据

在客户几近崩溃的时候,最终选择了直接数据抽取方法进行恢复,直接抽取ASM磁盘组中的数据,构造出数据文件的全貌,又是一个10多小时的漫长数据抽取恢复时间。经过漫长的等待之后,经验证,数据完美恢复完毕,没有让客户丢失任何一条重要数据!

【故障原因】:

此次故障推测是由于底层磁盘的映射混乱导致的,比如主机重启后导致disk number变化,导致Oracle认为ASM磁盘组的某块盘是voting disk,进而错误的写入了心跳信息,覆盖了原来位置上的ASM元数据ALT,这样一旦有大规模的reblance操作需要改上述ALT时,ASM便出现了上述故障。这种故障是无法通过简单的KFED工具进行恢复的。

【数据安全故障总结】:

这个Case中的故障本身并不可怕,可怕的是这个过程中出现的各种险情,发人深思。我们经常提到“备份重于一切”、“有备无患”等DBA职业操守。我认为最佳的诠释应该再加一条:在可信的架构方式下,定期对备份介质进行有效性验证,及灾备环境DRP演练的前提下!

针对此次故障的前因后果,给出以下建议:

1.给出高可用解决方案;建议使用Data Guard技术做远程灾备;

2.RMAN物理备份以及逻辑备份介质,要定期做备份介质有效性验证;

3.“人”的因素,制定严格的备份恢复检查机制,对备份以及灾备环境进行日常检查;

4.前期的架构设计很重要;

5.……

linux

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1664
14
PHP Tutorial
1269
29
C# Tutorial
1248
24
MySQL's Role: Databases in Web Applications MySQL's Role: Databases in Web Applications Apr 17, 2025 am 12:23 AM

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

Explain the role of InnoDB redo logs and undo logs. Explain the role of InnoDB redo logs and undo logs. Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

MySQL: An Introduction to the World's Most Popular Database MySQL: An Introduction to the World's Most Popular Database Apr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

MySQL's Place: Databases and Programming MySQL's Place: Databases and Programming Apr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

Why Use MySQL? Benefits and Advantages Why Use MySQL? Benefits and Advantages Apr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

MySQL vs. Other Programming Languages: A Comparison MySQL vs. Other Programming Languages: A Comparison Apr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages ​​such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages ​​have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

MySQL: From Small Businesses to Large Enterprises MySQL: From Small Businesses to Large Enterprises Apr 13, 2025 am 12:17 AM

MySQL is suitable for small and large enterprises. 1) Small businesses can use MySQL for basic data management, such as storing customer information. 2) Large enterprises can use MySQL to process massive data and complex business logic to optimize query performance and transaction processing.

How does MySQL index cardinality affect query performance? How does MySQL index cardinality affect query performance? Apr 14, 2025 am 12:18 AM

MySQL index cardinality has a significant impact on query performance: 1. High cardinality index can more effectively narrow the data range and improve query efficiency; 2. Low cardinality index may lead to full table scanning and reduce query performance; 3. In joint index, high cardinality sequences should be placed in front to optimize query.

See all articles