postgresl系列之NoSQL特性

最新推荐文章于 2025-07-08 17:43:01 发布

原创最新推荐文章于 2025-07-08 17:43:01 发布 · 1.9k 阅读

0 ·

CC 4.0 BY-SA版权

PostgreSQL 专栏收录该内容

56 篇文章

订阅专栏

本文探讨PostgreSQL中JSON和JSONB类型的性能差异，包括写入速度、读取效率及全文检索支持。通过实验对比，揭示不同场景下两者的优劣。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

本文是《PostgreSQL实战》的读书笔记，感兴趣的可以参看本书对应章节。

一、PostgreSQL的NoSQL特性

PostgreSQL不只是一个关系型数据库，同时支持非关系特性，而且逐步增加对非关系特性的支持。

1.1 为jsonb类型创建索引

jsonb数据类型支持GIN索引。

{
 "id":1,
 "user_id":122,
 "user_name":"1_joker",
 "create_time":"2018-11-25 20:48:10"
}

表tbl_user_jsonb中的user_info字段类型是jsonb 。在user_info创建GIN索引。

create index idx_gin ON tbl_user_jsonb USING gin(user_info);

jsonb上的GIN支持 @> 、？、 ？& 、？| 操作符。例如

-- 使用索引查询
select * from tbl_user_jsonb where user_info @> '{"user_name":"1_joker"}'

-- 但是以下基于jsonb键值的查询不会走索引idx_gin:
select * from tbl_user_jsonb where user_info ->>'user_name' = '1_joker';

-- 如果要提升基于json类型的键值检索效率，可以在jsonb类型对应的键值上创建索引：
create index idx_gin_user_infob_user_name ON tbl_user_jsonb Using btree(user_info->>'user_name')

1.2 json、jsonb读写性能测试

json写入时比jsonb快，但检索时比jsonb慢。原因：json存储格式为文本而jsonb为二进制，存储格式的不同使的两种json数据类型的处理效率不一样。json类型存储的内容和输入数据一样，当检索json数据时必须重新解析；而jsonb以二进制存储已解析好的数据，当检索jsonb数据时不需要重新解析。

通过简答的例子测试json、jsonb的读写性能差异。

--1.创建user_ini表并插入200w数据。
create table user_ini(
	id int4,
	user_id int8,
	user_name character varying(64),
	create_time timestamp(6) with time zone default clock_timestamp()
);
--1.1插入数据
INSERT INTO user_ini(id,user_id,user_name)
SELECT r,round(random()*2000000),r||'_frances'
FROM generate_series(1,2000000) AS r;

--2.使用user_ini表生成json、jsonb数据，创建user_ini_json，user_ini_jsonb表。
CREATE TABLE tbl_user_json(id serial,user_info json);

CREATE TABLE tbl_user_jsonb(id serial,user_info jsonb);

json、 jsonb 表写性能测试

根据user_ini数据通过 row_to_json函数向表user_ini_json插入200万json数据。

INSERT INTO tbl_user_json(user_info) SELECT row_to_json(user_ini) FROM user_ini;

在这里插入图片描述

INSERT INTO tbl_user_jsonb(user_info) SELECT row_to_json(user_ini) FROM user_ini;

在这里插入图片描述
可以明显看出来，json的写性能要比jsonb好 。从占用空间来看，同样数量jsonb数据类型占用空间比json稍大。

json、jsonb表读性能测试

根据user_info字段的user_name键的值查询

注意：关闭并行查询 SET max_parallel_workers_per_gather = 0

EXPLAIN ANALYSE SELECT * FROM tbl_user_jsonb WHERE user_info ->>'user_name' = '1_frances';

-- 执行计划
Seq Scan on tbl_user_jsonb  (cost=0.00..73500.32 rows=10000 width=144) (actual time=0.800..591.076 rows=1 loops=1)
  Filter: ((user_info ->> 'user_name'::text) = '1_frances'::text)
  Rows Removed by Filter: 1999999
Planning time: 0.056 ms
Execution time: 591.101 ms

基于user_info字段的user_name 键值创建btree索引

CREATE INDEX idx_jsonb ON tbl_user_jsonb USING btree((user_info->>'user_name'));

再次执行上述查询：

EXPLAIN ANALYSE SELECT * FROM tbl_user_jsonb WHERE user_info ->>'user_name' = '1_frances';

-- 执行计划
Bitmap Heap Scan on tbl_user_jsonb  (cost=233.93..24043.95 rows=10000 width=144) (actual time=0.372..0.373 rows=1 loops=1)
  Recheck Cond: ((user_info ->> 'user_name'::text) = '1_frances'::text)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx_jsonb  (cost=0.00..231.43 rows=10000 width=0) (actual time=0.328..0.328 rows=1 loops=1)
        Index Cond: ((user_info ->> 'user_name'::text) = '1_frances'::text)
Planning time: 0.629 ms
Execution time: 0.429 ms

通过比较执行计划可以看出走了索引，并且SQL时间下降。

为了更好地对比 tbl_user_json、tbl_user_jsonb 表基于键值查询的效率，对user_info字段id键进行范围扫描以对比性能。

CREATE  INDEX idx_gin_user_info_id ON tbl_user_json USING btree(((user_info ->>'id')::integer));

CREATE  INDEX idx_gin_user_infob_id ON tbl_user_jsonb USING btree(((user_info ->>'id')::integer));

--索引创建后，查询tbl_user_json表
EXPLAIN ANALYSE SELECT id,user_info->'id',user_info->'user_name' FROM tbl_user_json 
WHERE(user_info->>'id')::int4 > 1 AND (user_info->>'id')::int4 < 10000;

-- 执行计划
Bitmap Heap Scan on tbl_user_json  (cost=214.93..22741.47 rows=10000 width=68) (actual time=1.830..41.511 rows=9998 loops=1)
  Recheck Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
  Heap Blocks: exact=173
  ->  Bitmap Index Scan on idx_gin_user_info_id  (cost=0.00..212.43 rows=10000 width=0) (actual time=1.762..1.762 rows=9998 loops=1)
        Index Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
Planning time: 0.389 ms
Execution time: 42.467 ms

查询表tbl_user_json的user_info字段id键值在1到10000范围内走了索引。

EXPLAIN ANALYSE SELECT id,user_info->'id',user_info->'user_name' FROM tbl_user_jsonb
WHERE(user_info->>'id')::int4 > 1 AND (user_info->>'id')::int4 < 10000;

-- 执行计划
Bitmap Heap Scan on tbl_user_jsonb  (cost=214.93..24224.95 rows=10000 width=68) (actual time=1.269..12.417 rows=9998 loops=1)
  Recheck Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
  Heap Blocks: exact=213
  ->  Bitmap Index Scan on idx_gin_user_infob_id  (cost=0.00..212.43 rows=10000 width=0) (actual time=1.175..1.175 rows=9998 loops=1)
        Index Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
Planning time: 0.271 ms
Execution time: 13.074 ms

执行时间明显变短了。

从上面的测试看出，json写入比jsonb快，但检索时比jsonb慢。

二、全文索引对json和jsonb数据类型的支持

对于多数应用来说全文检索很少在数据库中实现，一般使用单独的全文检索引擎，例如基于SQL的全文检索引擎Sphinx。PostgreSQL支持全文检索，对于规模不大的应用如果不想搭建专门的搜索引擎，PostgreSQL的全文检索也可满足需求。

大部分检索需要通过数据库like操作匹配，这种检索方式的主要缺点：

不能很好地支持索引，通常需要全表扫描检索数据，数据量大时检索性能很低。
不能提供检索结果排序，当输出结果数据量非常大时表现更加明显

PostgreSQL全文检索有效地解决这个问题。PostgreSQL全文检索通过以下两种数据类型来实现。

tsvector

tsvector 全文检索数据类型代表一个被优化的可以基于搜索的文档，要将一串字符串转换成tsvector全文检索数据类型。

-- tsvector

SELECT 'Hello,cat,how are u? cat is smiling!'::tsvector;

-- 'Hello,cat,how' 'are' 'cat' 'is' 'smiling!' 'u?'
--可以看到，字符串的内容被分隔成好几段，但通过::tsvector只是做类型转换，没有进行数据标准处理
--对于英文全文检索可以通过函数 to_tsvecotr进行标准化：

SELECT to_tsvector('english' ,'Hello cat,')
-- 'cat':2 'hello':1

tsquery

tsquery 表示一个文本查询，存储用于搜索的词，并且支持布尔操作 & 、|、!

--将字符串转换成 tsquery
SELECT 'hello&cat'::tsquery;
-- 'hello' & 'cat'

--上述只是转换成tsquery类型，而并没有做标准化，使用to_tsquery函数可以执行标准化

SELECT to_tsquery('hello&cat');
-- 'hello' & 'cat'

--用于检索字符串是否包括"hello" 和 "cat"字符。
SELECT to_tsvector('english','Hello cat, how are u') @@ to_tsquery('hello&cat');
-- 结果：t

--2. 检索字符串是否包含字符"hello" 和 "dog" 
SELECT to_tsvector('english','Hello cat, how are u') @@ to_tsquery('hello&dog');
--结果：f