OLTP系统中一个批处理经常遇到的性能问题

摘录了yang的一个优化案例,我在以前的项目中遇到过多次,很简单,但是值得大家的注意。yang在查询视图的时候,使用的sql不是很直观,大家可以替换一下,查询一下相关的视图。

数据库是Oracle 10.2.0.3 Rac环境,不过现在处于单节点工作状态。当前的问题是页面访问很慢,没有其他进一步的信息。

根据描述,登陆服务器后首先使用top检查了系统运行状况,结果发现系统并不是很忙:

bash-3.00$ /usr/local/bin/top
load averages: 1.76, 1.63, 1.59; up 19+23:45:48 16:12:20
194 processes: 189 sleeping, 5 on cpu
CPU states: 78.6% idle, 17.5% user, 3.9% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 8755M free mem, 31G swap, 31G free swap

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
1151 oracle 1 1 0 20G 20G cpu 0:19 27.59% oracle
26096 oracle 11 32 0 20G 20G sleep 3:07 24.00% oracle
1177 oracle 11 30 0 20G 20G cpu 2:17 20.57% oracle
1187 oracle 11 44 0 20G 20G cpu 2:03 18.42% oracle
25298 oracle 11 50 0 20G 20G sleep 3:06 15.12% oracle
1557 oracle 1 52 0 20G 20G sleep 1:57 13.64% oracle
27086 oracle 12 52 0 20G 20G sleep 3:15 10.45% oracle
1175 oracle 1 24 0 20G 20G sleep 0:19 6.12% oracle
1551 oracle 1 29 0 20G 20G cpu 1:28 4.56% oracle
27076 oracle 1 100 -20 20G 20G sleep 14:56 4.02% oracle
27080 oracle 1 100 -20 20G 20G sleep 14:29 3.20% oracle
27090 oracle 2 59 0 20G 20G sleep 2:48 2.49% oracle
10672 oracle 1 59 0 3800K 2536K cpu 0:00 0.41% top
27131 oracle 1 59 0 20G 20G sleep 0:19 0.34% oracle
27123 oracle 11 59 0 20G 20G sleep 0:59 0.32% oracle

IOWAIT为0,用户部分只有17.5%,而IDLE达到了将近80%,这种情况应该是系统很轻闲才对。

根据判断问题和系统压力没有关系,问题应该发生在数据库层面上,而操作系统层面工作正常。

bash-3.00$ sqlplus / as sysDBA

SQL*Plus: Release 10.2.0.3.0 - Production on 星期三 9月 15 16:12:28 2010

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> select event, count(*)
2 from v$session
3 where username != user
4 group by event
5 having count(*) > 4;

EVENT COUNT(*)
---------------------------------------------------------------- ----------
enq: TX - row lock contention 7
SQL*Net message from client 57

既然前台页面反映系统很慢,肯定存在等待,检查一下当前系统中主要的等待事件是什么。发现绝大部分会话都处于空闲状态,但是有一些会话在等待锁。

检查V$LOCK视图:

SQL> select * from v$lock;

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---- -- -------- ---- ----- ------- ----- -----
00000008D43C7918 00000008D43C7938 548 XR 0 0 1 0 12409 2
00000008D43C79B0 00000008D43C79D0 551 RT 2 2 6 0 12291 2
00000008D43C7A48 00000008D43C7A68 484 TX 11272251 19 0 6 229 0
00000008D43C7AE0 00000008D43C7B00 551 MR 1 0 4 0 12291 2
00000008D43C7B78 00000008D43C7B98 551 MR 2 0 4 0 12291 2
.
.
.
00000008D43CA4D0 00000008D43CA4F0 551 MR 205 0 4 0 12291 2
00000008D43CA568 00000008D43CA588 459 TX 11272251 19 0 6 130 0
00000008D43CA600 00000008D43CA620 551 DM 1 0 4 0 12287 2
00000008D43CA698 00000008D43CA6B8 442 TX 11272251 19 0 6 104 0
00000008D43CA730 00000008D43CA750 475 TX 11272251 19 0 6 38 0
00000006943C6468 00000006943C6488 1097 XR 4 0 1 0 0 2
00000006943C6500 00000006943C6520 1029 TX 11272251 19 0 6 638 0
00000006943C6598 00000006943C65B8 989 TX 11272251 19 0 6 390 0
00000006943C6630 00000006943C6650 1097 CF 0 0 2 0 12403 2
00000006943C66C8 00000006943C66E8 1097 RS 25 1 2 0 12399 2
00000006943C67F8 00000006943C6818 1098 RT 2 0 6 0 12399 2
00000006943C69C0 00000006943C69E0 1098 RT 2 1 6 0 12291 2
00000006943C6A58 00000006943C6A78 998 TX 11272251 19 0 6 378 0
00000006943C6AF0 00000006943C6B10 1096 TS 3 2 3 0 12277 2
00000008D7C13778 00000008D7C137A0 484 TM 222937 0 2 0 229 2
00000008D7C13878 00000008D7C138A0 484 TM 222971 0 3 0 229 2
00000008D7C13978 00000008D7C139A0 484 TM 223250 0 2 0 229 2
00000008D7C13A78 00000008D7C13AA0 484 TM 243441 0 3 0 229 2
00000008D7C13B78 00000008D7C13BA0 484 TM 223188 0 3 0 229 2
00000008D7C13C78 00000008D7C13CA0 459 TM 222903 0 2 0 141 2
00000008D7C13D78 00000008D7C13DA0 459 TM 222971 0 3 0 130 2
00000008D7C13E78 00000008D7C13EA0 459 TM 223250 0 3 0 141 2
00000008D7C13F90 00000008D7C13FB8 459 TM 222937 0 2 0 130 2
00000008D7C14090 00000008D7C140B8 459 TM 243441 0 3 0 130 2
00000008D7C14190 00000008D7C141B8 459 TM 223188 0 3 0 130 2
00000008D7C14290 00000008D7C142B8 442 TM 222937 0 2 0 104 2
00000008D7C14390 00000008D7C143B8 442 TM 222971 0 3 0 104 2
00000008D7C14490 00000008D7C144B8 442 TM 223250 0 2 0 104 2
00000008D7C14590 00000008D7C145B8 442 TM 243441 0 3 0 104 2
00000008D7C14690 00000008D7C146B8 442 TM 223188 0 3 0 104 2
00000008D7C147A8 00000008D7C147D0 475 TM 222937 0 2 0 38 2
00000008D7C148A8 00000008D7C148D0 475 TM 222971 0 3 0 38 2
00000008D7C149A8 00000008D7C149D0 475 TM 223250 0 2 0 38 2
00000008D7C14AA8 00000008D7C14AD0 475 TM 243441 0 3 0 38 2
00000008D7C14BA8 00000008D7C14BD0 475 TM 223188 0 3 0 38 2
00000008D7C14CA8 00000008D7C14CD0 476 TM 223477 0 2 0 0 2
00000008D7C14DA8 00000008D7C14DD0 476 TM 223483 0 3 0 0 2
00000008D7C14EA8 00000008D7C14ED0 476 TM 223482 0 3 0 0 2
00000008D7C14FC0 00000008D7C14FE8 476 TM 223485 0 3 0 0 2
00000008D7C162F0 00000008D7C16318 434 TM 223188 0 3 0 3392 2
0000000697C12138 0000000697C12160 1029 TM 222937 0 2 0 638 2
0000000697C12238 0000000697C12260 1029 TM 222971 0 3 0 638 2
0000000697C12338 0000000697C12360 1029 TM 223250 0 2 0 638 2
0000000697C12438 0000000697C12460 1029 TM 243441 0 3 0 638 2
0000000697C12538 0000000697C12560 1029 TM 223188 0 3 0 638 2
0000000697C12638 0000000697C12660 989 TM 222937 0 2 0 390 2
0000000697C12738 0000000697C12760 989 TM 222971 0 3 0 390 2
0000000697C12838 0000000697C12860 989 TM 223250 0 2 0 390 2
0000000697C12950 0000000697C12978 989 TM 243441 0 3 0 390 2
0000000697C12A50 0000000697C12A78 989 TM 223188 0 3 0 390 2
0000000697C12B50 0000000697C12B78 998 TM 222937 0 2 0 378 2
0000000697C12C50 0000000697C12C78 998 TM 222971 0 3 0 378 2
0000000697C12D50 0000000697C12D78 998 TM 223250 0 2 0 378 2
0000000697C12E50 0000000697C12E78 998 TM 243441 0 3 0 378 2
0000000697C12F50 0000000697C12F78 998 TM 223188 0 3 0 378 2
00000008D7CAEA98 00000008D7CAEAD0 434 TX 11272251 19 6 0 1457 1
00000008D7CCE1B8 00000008D7CCE1F0 476 TX 17694738 16 6 0 0 2
00000008D7D256F8 00000008D7D25730 459 TX 14876753 15 6 0 141 2
00000008D7D7EEB8 00000008D7D7EEF0 442 TX 18022408 20 6 0 104 2
00000008D7D7FFF8 00000008D7D80030 475 TX 16056394 10 6 0 38 2
00000008D7D9EE78 00000008D7D9EEB0 484 TX 2490446 5278 6 0 229 2
0000000697D9BE58 0000000697D9BE90 989 TX 5767249 353 6 0 390 2
0000000697DB9B98 0000000697DB9BD0 998 TX 10354767 11 6 0 378 2
0000000697DD78D8 0000000697DD7910 1029 TX 13041703 13 6 0 638 2

已选择137行。

可以看到,V$LOCK视图中存在大量的锁信息,多个会话处于被锁定的状态,进一步检查这个视图:

SQL> select * from v$lock where block = 1;

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---- -- -------- --- ----- ------- ----- -----
00000008D7CAEA98 00000008D7CAEAD0 434 TX 11272251 19 6 0 1491 1

SQL> select * from v$lock where sid = 434;

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---- -- --------- ---- ----- ------- ----- -----
00000008D7C162F0 00000008D7C16318 434 TM 223188 0 3 0 3506 2
00000008D7CAEA98 00000008D7CAEAD0 434 TX 11272251 19 6 0 1571 1

找到锁定其他会话的会话,查询锁定的对象:

SQL> select owner, object_name, object_type
2 from dba_objects
3 where object_id = 223188;


OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ ------------------------------ ------------------------------
ZHEJIANG ORD_HIT_COMM TABLE

这张表是系统中数据量最大的一张表,也是前台页面经常需要访问的一张表,这张表被锁定肯定会导致众多的会话处于等待状态。

SQL> select sql_text
2 from v$sql
3 where hash_value =
4 (select sql_hash_value from v$session where sid = 434);

SQL_TEXT
--------------------------------------------------------------------------------------------
update ord_hit_comm h set h.enable_flag = '2', h.last_update_date = sysdate where h.plat_id in ('DATA100000000000028
24837', 'DATA10000000000012870015', 'DATA10000000000012690072',
'DATA10000000000012390004', 'DATA10000000000012870014', 'DATA1000000000
0012870016') and h.sender_orgid = 'DATA10000000000013061647' and h.contract_id in ('ZJJY10000000000044867537',
'ZJJY10000000000040142580', 'ZJJY10000000000040142582',
'ZJJY10000000000040142584', 'ZJJY10000000000040142586', 'ZJJY1
0000000000040142588', 'ZJJY10000000000040142590', 'ZJJY100000000000401
42592', 'ZJJY10000000000040142594', 'ZJJY10000000000040142596',
'ZJJY1000000000004014

这个会话执行的操作显然是一个大批量的更新操作,按道理来说,这种操作应该是夜间运行的后台JOB才对,莫非是JOB运行的时间出现了偏差,检查详细的会话信息:

SQL> select username, program, terminal, action
2 from v$session where sid = 434;

USERNAME PROGRAM TERMINAL ACTION
------------------------------ --------------- --------------- --------------------
ZHEJIANG_PROJ plsqldev.exe KYLIN-PC SQL Window - New

显然是有人在处理批量更新,正是这个操作导致了大量会话被锁:

SQL> select * from v$lock where id1 = 223188;

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------ -- ------- ---- ----- ------- ----- -----
00000008D7C13B78 00000008D7C13BA0 484 TM 223188 0 3 0 624 2
00000008D7C14190 00000008D7C141B8 459 TM 223188 0 3 0 525 2
00000008D7C14690 00000008D7C146B8 442 TM 223188 0 3 0 499 2
00000008D7C14BA8 00000008D7C14BD0 475 TM 223188 0 3 0 433 2
00000008D7C150C0 00000008D7C150E8 456 TM 223188 0 3 0 346 2
00000008D7C155C0 00000008D7C155E8 476 TM 223188 0 3 0 324 2
00000008D7C15AD8 00000008D7C15B00 547 TM 223188 0 3 0 185 2
00000008D7C162F0 00000008D7C16318 434 TM 223188 0 3 0 3787 2
0000000697C12538 0000000697C12560 1029 TM 223188 0 3 0 1033 2
0000000697C12A50 0000000697C12A78 989 TM 223188 0 3 0 785 2
0000000697C12F50 0000000697C12F78 998 TM 223188 0 3 0 773 2
0000000697C13468 0000000697C13490 1036 TM 223188 0 3 0 392 2
0000000697C13C80 0000000697C13CA8 976 TM 223188 0 3 0 95 2
0000000697C13F80 0000000697C13FA8 990 TM 223188 0 3 0 46 2
0000000697C14898 0000000697C148C0 995 TM 223188 0 3 0 24 2

已选择15行。

除了在进行批处理的会话外,还有14个会话被锁,这些会话在得不到锁之前,一直处于等待状态,因此前台页面的感觉就是系统响应很慢,出不来结果。

SQL> select num_rows, blocks from dba_tables where table_name = 'ORD_HIT_COMM' and owner = 'ZHEJIANG';

NUM_ROWS BLOCKS
---------- ----------
13301787 963930

对一个上千万的表的批量更新,这种操作不但会占用大量的系统资源,更重要的是会长时间的锁住大量的数据,导致系统平均响应时间迅速增加,甚至使得系统处于不可用的状态。因此这种操作对于OLTP系统而言是非常危险的,应该在合理的时间利用合理的方法进行操作。

标签: 暂无标签
oraunix

写了 199 篇文章,拥有财富 1026,被 339 人关注

转播转播 分享分享 分享淘帖
回复

使用道具

成为第一个吐槽的人

您需要登录后才可以回帖 登录 | 加入社区

本版积分规则

意见
反馈