Heap only tuples mechanism in PostgreSQL

Postgresql differs from other DBMSs in that during the UPDATE operation, changes to the existing row do not occur, and instead a copy of the row is made that differs from the original with the values ​​of the columns affected by the update - in the original they are old, and in the copy they are changed. This approach, on the one hand, allows you to avoid blocking while executing read and write requests, and on the other hand, it creates the need to constantly clean old versions of strings that no one will ever read. In connection with this architectural feature, the question often arises of what will happen if you need to store in the database something like the time of the last access to data that otherwise does not change. Will it respond to performance? Will it lead to a constant restructuring of the indices?


In short, yes, Copy On Write will not go anywhere, but in many cases indexes can not be rebuilt, thanks to HOT.


Heap only tuples, also known as HOT, is the optimization Postgres uses to reduce the amount of I / O required for updates. Due to MVCC, an update in Postgres consists of finding a row for the update and inserting new versions of the row into the database. The main drawback of this procedure is the need to re-add a row to each index.
This requires a lot more I / O because the row needs to be reinserted into each index in the table. The need for reinsertion arises because the physical position of the new version of a line on disk is different from the physical position of the old version.


To reduce the amount of I / O required for UPDATE, the Postgres team added HOT to Postgres. The idea behind HOT is relatively simple. When updating the line, if possible, Postgres will put a new copy of the line immediately after the old copy of the line. Plus, in the old copy of the string, a special label is affixed so that Postgres knows that the new copy of the string is immediately after the old one. Therefore, updating all indexes is not necessary.


During the scan by index for which a new copy of the line passes, the Postgres filter will find the old copy of the line. Since there is a special label on the old copy of the line, Postgres will understand that the new copy of the line is located immediately after the old one and will find the new version and use it. It turns out that Postgres in such cases can behave as if all indexes point to a new copy of the line, and they do not need to be rebuilt.


Now HOT is involved only when only non-indexable columns are involved in the update. If at least one column participating in the update is included in the index, HOT cannot be applied. In this case, there are several problems with using HOT. For example, when you need to make a scan index by the index on the column that you updated and the old copy of the line falls into the predicate of the scan, but the new one does not. In this situation, Postgres will try to use the index to quickly find all rows that are suitable for the query predicate, and in the case of columns that were updated using HOT, it will produce a new copy of the row that does not match the query predicate. Due to this limitation (that HOT does not work when indexable columns are included in the update), Postgres can guarantee that when he tries to find rows that are suitable for the predicate through which the index is passed, then if the predicate matches the old version of the row, then the new version of the row also suits him and vice versa.


Currently under development is a HOT extension called WARM, which also works when updating columns on which indexes are created. The idea behind WARM is to put a new row immediately after the old and update the row for indexes in which the columns were changed. This greatly complicates the described situation, because now Postgres needs a way to somehow determine whether the row passes the filter for the index or not.


PS In the original article, the HOT mechanism is described, but here we have in mind the mechanism in which heap only tuples are involved, and the term itself has a separate meaning.


Heap only tuple is just a new version of the line. Strange as it may seem, a Heap is a table, and Heap only means that this line can only be found by the chain that leads from the older version of the line called the root.



Source: https://habr.com/ru/post/475642/


All Articles