[toc]

概念描述

ORACLE 11g 之后的自动统计信息的阈值默认是10%,也就是表变化超过10%才会进行搜集。如果分区表的某个分区的数据变化超过了10%,但是没有超过整个表的10%,默认情况下会不会进行自动的统计信息搜集呢。

测试验证

  1. 完了方便测试,这里修改当前的 NLS_DATE_FORMAT 显示时分秒。
1
2
3
SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

Session altered.
  1. 创建分区表

1
2
3
4
5
6
7
8
9
10
11
12
13
SQL>  CREATE TABLE TEST_PARTITION2
(
id NUMBER,
name VARCHAR2(50),
register_date DATE
)
PARTITION BY RANGE (register_date)
(
PARTITION p_2019 VALUES LESS THAN (TO_DATE('01-01-2020', 'dd-mm-yyyy')),
PARTITION p_2020 VALUES LESS THAN (TO_DATE('01-01-2021', 'dd-mm-yyyy')),
PARTITION p_2021 VALUES LESS THAN (TO_DATE('01-01-2022', 'dd-mm-yyyy'))
);
Table created.
  1. 每个分区插入100000 条数据
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
SQL> INSERT /*+ APPEND */ INTO TEST_PARTITION2
SELECT rownum, 'name_'||rownum, DATE '2019-01-01' + MOD(rownum, 365)
FROM dual CONNECT BY LEVEL <= 100000; 2 3

100000 rows created.

SQL> commit;

Commit complete.

SQL> INSERT /*+ APPEND */ INTO TEST_PARTITION2
SELECT rownum+100000, 'name_'||(rownum+100000), DATE '2020-01-01' + MOD(rownum, 365)
FROM dual CONNECT BY LEVEL <= 100000; 2 3

100000 rows created.

SQL> commit;

Commit complete.

SQL> INSERT /*+ APPEND */ INTO TEST_PARTITION2
SELECT rownum+100000, 'name_'||(rownum+100000), DATE '2021-01-01' + MOD(rownum, 365)
FROM dual CONNECT BY LEVEL <= 100000; 2 3

100000 rows created.

SQL> commit;

Commit complete.

SQL> SELECT COUNT(*) FROM TEST_PARTITION2;

COUNT(*)
----------
300000
  1. 验证数据各个分区的数据分布

总共 300000 条数据,每个分区100000 条数据

1
2
3
4
5
6
7
8
9
10
SQL> SELECT 'p_2019' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2019)
UNION ALL
SELECT 'p_2020' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2020)
UNION ALL
SELECT 'p_2021' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2021);
PARTED COUNT(*)
------------ ----------
p_2019 100000
p_2020 100000
p_2021 100000
  1. 搜集统计信息并查看
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS','TEST_PARTITION2',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> set lines 200
col PARTITION_NAME for a40
select partition_name,object_type,num_rows,to_char(last_analyzed,'yyyy-mm-dd hh24:mi:ss') last_analyzed from dba_tab_statistics where table_name=upper('TEST_PARTITION2');SQL> SQL>

PARTITION_NAME OBJECT_TYPE NUM_ROWS LAST_ANALYZED
-------------------------------- ------------------------
TABLE 300000 2023-08-04 11:49:28
P_2019 PARTITION 100000 2023-08-04 11:49:27
P_2020 PARTITION 100000 2023-08-04 11:49:28
P_2021 PARTITION 100000 2023-08-04 11:49:28

统计信息是11:49分搜集的

  1. 分区p_2019 插入16000条数据,超过了当前分区的10%,不超过当前表的总数据看的10%
1
2
3
4
5
6
7
8
9
SQL> INSERT /*+ APPEND */ INTO TEST_PARTITION2
SELECT rownum, 'name_'||rownum, DATE '2019-01-01' + MOD(rownum, 365)
FROM dual CONNECT BY LEVEL <= 16000; 2 3

16000 rows created.

SQL> commit;

Commit complete.
  1. 分区p_2020 插入 6000 条数据,数据不超过当前分区的10%,变化的总数据也不超过当前表总数据的10%
1
2
3
4
5
6
7
8
9
SQL> INSERT /*+ APPEND */ INTO TEST_PARTITION2
SELECT rownum+100000, 'name_'||(rownum+100000), DATE '2020-01-01' + MOD(rownum, 365)
FROM dual CONNECT BY LEVEL <= 6000;
commit; 2 3
6000 rows created.

SQL>

Commit complete.
  1. 确认当前数据和统计信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SQL>  SELECT 'p_2019' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2019)
UNION ALL
SELECT 'p_2020' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2020)
UNION ALL
SELECT 'p_2021' as parted, COUNT(*) FROM TEST_PARTITION2 PARTITION(p_2021);

PARTED COUNT(*)
------------ ----------
p_2019 116000
p_2020 106000
p_2021 100000
SQL> set lines 200
col PARTITION_NAME for a20
select partition_name,object_type,num_rows,to_char(last_analyzed,'yyyy-mm-dd hh24:mi:ss') last_analyzed from dba_tab_statistics where table_name=upper('TEST_PARTITION2');SQL> SQL>

PARTITION_NAME OBJECT_TYPE NUM_ROWS LAST_ANALYZED
---------------- ------------------------ ---------- ---------------
TABLE 300000 2023-08-04 11:49:28
P_2019 PARTITION 100000 2023-08-04 11:49:27
P_2020 PARTITION 100000 2023-08-04 11:49:28
P_2021 PARTITION 100000 2023-08-04 11:49:28
  1. 调整自动统计信息的任务运行时间为12:10
1
2
3
4
5
6
7
BEGIN
DBMS_SCHEDULER.SET_ATTRIBUTE(
name => '"SYS"."THURSDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
VALUE => 'freq=daily;byday=THU;byhour=12;byminute=10; bysecond=0');
END;
/
  1. 等自动统计信息的任务执行之后,检查统计信息如下
1
2
3
4
5
6
7
8
9
10
11
SQL>set lines 200
col PARTITION_NAME for a20
select partition_name,object_type,num_rows,to_char(last_analyzed,'yyyy-mm-dd hh24:mi:ss') last_analyzed from dba_tab_statistics where table_name=upper('TEST_PARTITION2');SQL> SQL>

PARTITION_NAME OBJECT_TYPE NUM_ROWS LAST_ANALYZED

------------- --------------- ---------- ----------------------
TABLE 300000 2023-08-04 11:49:28
P_2019 PARTITION 116000 2023-08-04 12:10:38
P_2020 PARTITION 100000 2023-08-04 11:49:28
P_2021 PARTITION 100000 2023-08-04 11:49:28

– 可以看到分区 P_2019 的统计信息在12:10进行了重新搜集

知识总结

经过以上验证,自动搜集统计信息任务的分区表 10% 的阈值,是基于当前分区的。

也就是如果分区表的某个分区的数据变化超过当前分区的10%,而不超过当前表总数据的10%,自动搜集统计信息任务会搜集当前分区的统计信息。

参考文档

原文作者: liups.com

原文链接: http://liups.com/posts/aa64b78/

许可协议: 知识共享署名-非商业性使用 4.0 国际许可协议