Keeping Deleted Cells

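This post explores HBase's ability to keep deleted cells and what that implies. By default, delete markers extend back to the beginning of time, so Get or Scan operations cannot see deleted cells. By setting the KEEP_DELETED_CELLS attribute, point-in-time queries remain possible even in the presence of deletes: deleted cells can still be retrieved as long as the query's time range ends before the timestamp of any delete that would affect them.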

By default, delete markers extend back to the beginning of time. Therefore, Get or Scan operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed.

ColumnFamilies can optionally keep deleted cells. In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any delete that would affect the cells. This allows for point-in-time queries even in the presence of deletes.
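The time-range rule above can be sketched as a toy model. This is not HBase code: the class `PointInTimeReadSketch`, the method `visible`, and the simplification that every delete is a DeleteColumn marker masking all versions at or below its own timestamp are assumptions made purely for illustration. The time range is treated as the half-open interval [min, max), which matches how HBase's `TimeRange` is defined.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not part of the HBase client API.
final class PointInTimeReadSketch {

    /**
     * Returns the timestamps of the cells that a read over
     * [minInclusive, maxExclusive) would see.
     * Assumption: every delete is a DeleteColumn marker that masks all
     * versions with a timestamp less than or equal to its own.
     */
    static List<Long> visible(long[] cellTimestamps, long[] deleteTimestamps,
                              long minInclusive, long maxExclusive,
                              boolean keepDeletedCells) {
        List<Long> result = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (ts < minInclusive || ts >= maxExclusive) {
                continue; // cell is outside the requested time range
            }
            boolean masked = false;
            for (long deleteTs : deleteTimestamps) {
                // Default: a marker always applies, even if it lies after the range.
                // KEEP_DELETED_CELLS: a marker at or after the end of the range is ignored.
                boolean markerApplies = !keepDeletedCells || deleteTs < maxExclusive;
                if (markerApplies && deleteTs >= ts) {
                    masked = true;
                    break;
                }
            }
            if (!masked) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] cells = {14L, 12L, 10L};
        long[] deletes = {11L};
        System.out.println(visible(cells, deletes, 0L, 11L, false));
        System.out.println(visible(cells, deletes, 0L, 11L, true));
        System.out.println(visible(cells, deletes, 0L, 15L, true));
    }
}
```

With cells at timestamps 14, 12, and 10 and a delete at 11, a read over [0, 11) sees nothing in the default mode but sees the cell at 10 when deleted cells are kept. A read over [0, 15) includes the marker, so the cell at 10 stays hidden in both modes.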

Deleted cells are still subject to TTL, and there will never be more than “maximum number of versions” deleted cells. A “raw” scan option returns all deleted rows along with the delete markers.

Change the Value of KEEP_DELETED_CELLS Using HBase Shell

hbase> alter 't1', NAME => 'f1', KEEP_DELETED_CELLS => true

Example 13. Change the Value of KEEP_DELETED_CELLS Using the API


HColumnDescriptor.setKeepDeletedCells(true);

Let us illustrate the basic effect of setting the KEEP_DELETED_CELLS attribute on a table.

First, without:

create 'test', {NAME=>'e', VERSIONS=>2147483647}
put 'test', 'r1', 'e:c1', 'value', 10
put 'test', 'r1', 'e:c1', 'value', 12
put 'test', 'r1', 'e:c1', 'value', 14
delete 'test', 'r1', 'e:c1', 11

hbase(main):017:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value
1 row(s) in 0.0120 seconds

hbase(main):018:0> flush 'test'
0 row(s) in 0.0350 seconds

hbase(main):019:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
1 row(s) in 0.0120 seconds

hbase(main):020:0> major_compact 'test'
0 row(s) in 0.0260 seconds

hbase(main):021:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
1 row(s) in 0.0120 seconds

Notice how the deleted cell (timestamp 10) is dropped at the flush and the delete marker is dropped at the major compaction.

Now let’s run the same test, this time with KEEP_DELETED_CELLS set on the table (you can set it per table or per column family):

hbase(main):005:0> create 'test', {NAME=>'e', VERSIONS=>2147483647, KEEP_DELETED_CELLS => true}
0 row(s) in 0.2160 seconds

=> Hbase::Table - test
hbase(main):006:0> put 'test', 'r1', 'e:c1', 'value', 10
0 row(s) in 0.1070 seconds

hbase(main):007:0> put 'test', 'r1', 'e:c1', 'value', 12
0 row(s) in 0.0140 seconds

hbase(main):008:0> put 'test', 'r1', 'e:c1', 'value', 14
0 row(s) in 0.0160 seconds

hbase(main):009:0> delete 'test', 'r1', 'e:c1', 11
0 row(s) in 0.0290 seconds

hbase(main):010:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value
1 row(s) in 0.0550 seconds

hbase(main):011:0> flush 'test'
0 row(s) in 0.2780 seconds

hbase(main):012:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value
1 row(s) in 0.0620 seconds

hbase(main):013:0> major_compact 'test'
0 row(s) in 0.0530 seconds

hbase(main):014:0> scan 'test', {RAW=>true, VERSIONS=>1000}
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value
1 row(s) in 0.0650 seconds

KEEP_DELETED_CELLS prevents cells from being removed from HBase when the only reason to remove them is a delete marker. With KEEP_DELETED_CELLS enabled, deleted cells are still removed if you write more versions than the configured maximum, or if a TTL is set and the cells are older than the configured timeout.
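The removal rules just described can be written down as a small toy model of what survives a major compaction. This is a sketch under stated assumptions, not HBase's compaction logic: `CompactionRetentionSketch` and `survivors` are hypothetical names, every delete is assumed to be a DeleteColumn marker covering all versions at or below its timestamp, versions are simply counted newest first, and the fate of the delete markers themselves is not modeled. HBase's real version counting around delete markers is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not part of HBase.
final class CompactionRetentionSketch {

    /**
     * Returns the timestamps of the cells that remain after a major compaction.
     * cellsNewestFirst must be sorted from newest to oldest timestamp.
     */
    static List<Long> survivors(long[] cellsNewestFirst, long[] deleteTimestamps,
                                int maxVersions, long ttl, long now,
                                boolean keepDeletedCells) {
        List<Long> kept = new ArrayList<>();
        for (long ts : cellsNewestFirst) {
            if (now - ts > ttl) {
                continue; // older than the TTL: removed in either mode
            }
            boolean deleted = false;
            for (long deleteTs : deleteTimestamps) {
                if (deleteTs >= ts) {
                    deleted = true; // covered by a delete marker
                    break;
                }
            }
            if (deleted && !keepDeletedCells) {
                continue; // default mode: the marker alone is reason enough to drop the cell
            }
            if (kept.size() >= maxVersions) {
                continue; // in excess of the configured maximum number of versions
            }
            kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] cells = {14L, 12L, 10L};
        long[] deletes = {11L};
        // No TTL and effectively unlimited versions, as in the shell example.
        System.out.println(survivors(cells, deletes, Integer.MAX_VALUE, Long.MAX_VALUE, 20L, false));
        System.out.println(survivors(cells, deletes, Integer.MAX_VALUE, Long.MAX_VALUE, 20L, true));
    }
}
```

For the cells used in the shell transcripts (timestamps 14, 12, 10 with a delete at 11), the model keeps 14 and 12 in the default mode and all three cells with KEEP_DELETED_CELLS, which agrees with the value cells shown in the raw scans after major compaction. Lowering `maxVersions` to 2 or setting a short `ttl` drops the deleted cell again even in keep mode.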
