Node 1 alert log:
PDB(17):Transaction recovery: lock conflict caught and ignored
PDB(17):Transaction recovery: lock conflict caught and ignored
PDB(17):Transaction recovery: lock conflict caught and ignored
...
Node 2 alert log:
PDB(17):minact-scn: useg scan erroring out with error e:12751
PDB(17):minact-scn: useg scan erroring out with error e:12751
PDB(17):minact-scn: useg scan erroring out with error e:12751
...
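Cause:
The table is partitioned and set to NOLOGGING at the table level. Data was being inserted in parallel in batches of 40 million rows (the table holds more than 600 million rows in total), and this ended in a deadlock. I generated kill statements for the sessions holding locks with this query: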
select 'alter system kill session '||chr(39)||t2.sid||','||t2.serial#||chr(39)||';'
  from v$locked_object t1, v$session t2
 where t1.session_id = t2.sid
 order by t2.logon_time;
alter system kill session '4843,29019';
...
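Every row pointed to the same session, and there were more than 3,000 of them.
Killing the session did not release the locks, so I killed the server process with kill -9. The warnings shown above started appearing in the alert log right after that.
I then searched around for a fix. Some posts suggest dumping the data blocks, but that is beyond my skills. Luckily this table is only used for stress testing, so I stopped inserting into it and waited for the recovery to finish on its own.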
1. Check which undo segment the recovery is using
select b.name useg, b.inst# instid, b.status$ status,
       a.ktuxeusn xid_usn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq,
       a.ktuxesiz undoblocks, a.ktuxesta txstatus
  from x$ktuxe a, undo$ b
 where a.ktuxecfl like '%DEAD%'
   and a.ktuxeusn = b.us#;
USEG                      INSTID     STATUS    XID_USN   XID_SLOT    XID_SEQ UNDOBLOCKS TXSTATUS
--------------------- ---------- ---------- ---------- ---------- ---------- ---------- --------
_SYSSMU30_2947991045$          1          3         30         10      50494    3572115 ACTIVE
2. Check the recovery progress
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
  from x$ktuxe
 where ktuxesta <> 'INACTIVE'
   and ktuxecfl like '%DEAD%'
 order by ktuxesiz asc;
       USN       SLOT        SEQ STATE          UNDO
---------- ---------- ---------- -------- ----------
        30         10      50494 ACTIVE      2815649
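All that was left was to wait. The rollback progressed at roughly 10,000 blocks per minute, so the 3,572,115 undo blocks needed about 6 hours. When I checked the next day, UNDO was back to normal and the warnings had stopped.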
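
The 6-hour figure is simple arithmetic on the UNDO column, and it can be redone whenever the progress query is rerun. Below is a minimal Python sketch of that estimate. The helper names are hypothetical, and it assumes the rollback rate stays roughly constant, which may not hold on a busy system.

```python
# Rough estimate of dead-transaction rollback time, using figures from this post.
# Assumption: the rollback rate is roughly constant over the whole recovery.

UNDO_BLOCKS_AT_START = 3_572_115  # UNDOBLOCKS (ktuxesiz) from the first query
UNDO_BLOCKS_LATER = 2_815_649     # UNDO (ktuxesiz) from the later progress query
BLOCKS_PER_MINUTE = 10_000        # approximate rate observed in this post


def rollback_rate(blocks_before, blocks_after, minutes_elapsed):
    """Blocks rolled back per minute between two samples of ktuxesiz."""
    if minutes_elapsed <= 0:
        raise ValueError("minutes_elapsed must be positive")
    return (blocks_before - blocks_after) / minutes_elapsed


def minutes_remaining(undo_blocks, blocks_per_minute):
    """Minutes left if the rollback keeps running at the given rate."""
    if blocks_per_minute <= 0:
        raise ValueError("blocks_per_minute must be positive")
    return undo_blocks / blocks_per_minute


# Blocks already rolled back between the two queries shown above.
blocks_done = UNDO_BLOCKS_AT_START - UNDO_BLOCKS_LATER

# Total time for the full transaction, and time left at the second sample.
total_hours = minutes_remaining(UNDO_BLOCKS_AT_START, BLOCKS_PER_MINUTE) / 60
hours_left = minutes_remaining(UNDO_BLOCKS_LATER, BLOCKS_PER_MINUTE) / 60

print(f"blocks rolled back so far: {blocks_done}")
print(f"estimated total: {total_hours:.2f} h, remaining: {hours_left:.2f} h")
```

To measure the rate yourself instead of using the rough 10,000 figure, run the progress query twice, note the minutes between the two runs, and pass both UNDO values to rollback_rate.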