案例:是谁用了我的临时表空间?
环境:RHEL 6.5 + Oracle 11.2.0.4 RAC + ADG
起初发现自己的ADG测试环境不再同步,进一步分析是DATA磁盘组空间耗尽导致的,可是最近在磁盘组上的数据库都没有做过什么测试,且测试磁盘组一直都留有2G+剩余空间,那是什么导致突然没空间了呢? 经过查询dba_data_files发现数据文件的确没有什么增长,但查询dba_temp_files发现临时文件空间增长严重。
sys@DEMO> set lines 180
sys@DEMO> select file_id,file_name,tablespace_name,bytes/1048576 "MB" from dba_temp_files;
FILE_ID FILE_NAME TABLESPACE_NAME MB
---------- ------------------------------------------------------- ------------------------------ ----------
1 +DATA/demo/tempfile/temp.264.1018829761 TEMP 2701
2 +DATA/demo/tempfile/temp_jingyu.258.1018830415 TEMP_JINGYU 30
sys@DEMO>
或者直接从asmcmd中也可以清楚看到TEMP文件占用大量空间:
ASMCMD> pwd
+data/demo/tempfile
ASMCMD> ls -ls
Type Redund Striped Time Sys Block_Size Blocks Bytes Space Name
TEMPFILE UNPROT COARSE NOV 16 09:00:00 Y 8192 345729 2832211968 2834300928 TEMP.264.1018829761
TEMPFILE UNPROT COARSE NOV 04 10:00:00 Y 8192 3841 31465472 32505856 TEMP_JINGYU.258.1018830415
ASMCMD> du
Used_MB Mirror_used_MB
2734 2734
基本已经确认了就是temp文件占用了空间,导致DATA磁盘组空间耗尽,那么是谁使用了临时表空间呢?根据DG不同步的时间点初步定位是在11-16号这天,我们可以直接根据DBA_HIST_ACTIVE_SESS_HISTORY中的TEMP_SPACE_ALLOCATED字段进一步定位:
DBA_HIST_ACTIVE_SESS_HISTORY TEMP_SPACE_ALLOCATED Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken
select instance_number,
to_char(sample_time,'mm-dd hh24:mi'),
sum(TEMP_SPACE_ALLOCATED)
from dba_hist_active_sess_history
where sample_time > to_date('11-16 00:00','mm-dd hh24:mi')
and sample_time < to_date('11-17 00:00','mm-dd hh24:mi')
group by instance_number, to_char(sample_time,'mm-dd hh24:mi')
order by 1, 2;
省略部分输出..
2 11-16 08:43
2 11-16 08:44
2 11-16 08:50 574619648
2 11-16 08:51 1743781888
2 11-16 08:52 3379560448
2 11-16 08:53 5015339008
2 11-16 08:54 6727663616
2 11-16 08:55 8554283008
2 11-16 08:56 1.0105E+10
2 11-16 08:57 1.1897E+10
2 11-16 08:58 1.3597E+10
2 11-16 08:59 1.4906E+10
2 11-16 09:00 2787115008
2 11-16 09:02
2 11-16 09:04
进一步细化时间,秒级别(注意DBA_HIST_ACTIVE_SESS_HISTORY默认采样数据间隔为10s)
select instance_number,
to_char(sample_time,'mm-dd hh24:mi:ss'),
sum(TEMP_SPACE_ALLOCATED)
from dba_hist_active_sess_history
where sample_time > to_date('11-16 08:43','mm-dd hh24:mi') and sample_time < to_date('11-16 09:02','mm-dd hh24:mi')
group by instance_number, to_char(sample_time,'mm-dd hh24:mi:ss')
order by 1, 2;
INSTANCE_NUMBER TO_CHAR(SAMPLE SUM(TEMP_SPACE_ALLOCATED)
--------------- -------------- -------------------------
省略部分输出..
2 11-16 08:44:29
2 11-16 08:50:29 35651584
2 11-16 08:50:39 134217728
2 11-16 08:53:21 819986432
2 11-16 08:53:31 866123776
2 11-16 08:53:41 897581056
2 11-16 08:53:51 936378368
2 11-16 08:54:01 991952896
2 11-16 08:58:53 2356150272
2 11-16 08:59:03 2383413248
2 11-16 08:59:13 2403336192
2 11-16 08:59:24 2443182080
2 11-16 08:59:34 2487222272
2 11-16 08:59:44 2550136832
2 11-16 08:59:54 2638217216
2 11-16 09:00:04 2786066432
2 11-16 09:00:14 1048576
2 11-16 09:00:24
81 rows selected.
实例2从11-16 08:50:29 开始,到 09:00:04 结束,temp增长到2G+(2786066432),那么具体是哪些SQL消耗的呢? 上面的查询可以直接加入sql_id字段定位,发现都是同一个SQL导致的:
select instance_number,
to_char(sample_time,'mm-dd hh24:mi:ss'),
sql_id,
sum(TEMP_SPACE_ALLOCATED)
from dba_hist_active_sess_history
where sample_time > to_date('11-16 08:43','mm-dd hh24:mi')
and sample_time < to_date('11-16 09:02','mm-dd hh24:mi')
group by instance_number, to_char(sample_time,'mm-dd hh24:mi:ss'), sql_id
order by 1, 2;
2 11-16 08:59:24 auyf8px9ywc6j 2443182080
2 11-16 08:59:34 auyf8px9ywc6j 2487222272
2 11-16 08:59:44 auyf8px9ywc6j 2550136832
2 11-16 08:59:54 auyf8px9ywc6j 2638217216
2 11-16 09:00:04 auyf8px9ywc6j 2786066432
SQL_ID auyf8px9ywc6j 对应的文本为:
SQL_TEXT
--------------------------------------------------------------------------------
WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STAR
TUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID)
AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_I
NTERVAL_TIME AS DATE)) AS MAX_DATE FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST
WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRU
NC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROU
P BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA
_DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP
_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALU
E) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 ,
WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANC
E_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR
.STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND
TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID ) SELECT STAT_NAME ,ROUN
D(SUM(DELTA_TIME/1000000),2) AS SECS FROM DELTA_DATA GROUP BY STAT_NAME
SQL执行计划为:
如果进一步获取SQL的awr报告还可以看到SQL执行的统计信息:
Stat Name Statement Total Per Execution % Snap Total
Elapsed Time (ms) 582,925 582,924.68 1.84
CPU Time (ms) 64,811 64,811.15 5.89
Executions 1
Buffer Gets 2,472,595 2,472,595.00 8.88
Disk Reads 1 1.00 0.00
Parse Calls 1 1.00 0.00
Rows 0 0.00
User I/O Wait Time (ms) 65,348
Cluster Wait Time (ms) 17
Application Wait Time (ms) 0
Concurrency Wait Time (ms) 1
Invalidations 0
Version Count 1
Sharable Mem(KB) 64
查到这里其实已经确认这个事情并不是人为,因为如果拿这个SQL_ID去网络搜索,会发现有人提到过这个SQL_ID,说这是由MMON发起的SQL
It looks like something made by a DBA, but it comes from the MMON.
因为只是临时文件,且目前没有被持续使用到,又是测试环境,可以直接按照测试需求resize为较小值,然后为避免这样的事情,再关闭其自动扩展的功能:
sys@DEMO> alter database tempfile 1 resize 500M;
sys@DEMO> alter database tempfile 1 autoextend off;
修改完再次查询:
sys@DEMO> select file_id, file_name, tablespace_name, bytes/1048576 "MB" from dba_temp_files;
FILE_ID FILE_NAME TABLESPACE_NAME MB
---------- ------------------------------------------------------- ------------------------------ ----------
1 +DATA/demo/tempfile/temp.264.1018829761 TEMP 500
2 +DATA/demo/tempfile/temp_jingyu.258.1018830415 TEMP_JINGYU 30
DG备库启动应用,观察确认恢复正常同步:
sys@ORCL> recover managed standby database using current logfile disconnect;
Media recovery complete.
sys@ORCL> select * from v$dataguard_stats;
NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
-------------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:00:00 day(2) to second(0) interval 11/18/2019 23:10:55 11/18/2019 23:10:54
apply lag +00 00:00:00 day(2) to second(0) interval 11/18/2019 23:10:55 11/18/2019 23:10:54
apply finish time +00 00:00:00.000 day(2) to second(3) interval 11/18/2019 23:10:55
estimated startup time 24 second 11/18/2019 23:10:55
一般确认时间正确,各指标的value都为0即可。若不放心可以再手工去主库切换几次日志,看下备库的同步表现。
.png?raw=1)