PostgreSQL Architecture 2
For every user or connection that is established, a server/backend process is created for it. You then have a number of processes that interact with the files on disk, and some of the processes interact with the shared memory in order to process the transactions that are being sent by users and applications.
The backend or server process then works on behalf of that connected user/app and interacts with the buffers in the shared memory.
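As a quick way to see this one-backend-per-connection model for yourself, you can ask your own session which process is serving it (pg_backend_pid() is a built-in function):

    -- Each connected session is served by its own backend process.
    SELECT pg_backend_pid();   -- OS process ID of this session's backend

    -- The server caps how many backends can exist at once.
    SHOW max_connections;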
Every database system caches data in memory for fast access; read or write operations against the physical disk are always slower than operations in memory.
So, most operations or work take place in memory and are later flushed to disk. Your data and transactions reside in the shared buffers until they are permanently written to disk.
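A small sketch of how you might peek at this cache: shared_buffers is the setting that sizes it, and the optional pg_buffercache contrib extension (it may need to be installed first) shows what is currently cached:

    -- Size of the shared buffer cache.
    SHOW shared_buffers;

    -- Optional contrib extension for inspecting the cache contents.
    CREATE EXTENSION IF NOT EXISTS pg_buffercache;
    SELECT count(*) AS cached_pages,
           count(*) FILTER (WHERE isdirty) AS dirty_pages
    FROM pg_buffercache;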
The WAL (write-ahead log) is what helps the database system maintain data integrity in the ACID model: every transaction/action is recorded in the transaction log before the corresponding changes are written to the data files on disk.
It's like the court stenographer/transcriptionist who takes notes in court of everything that is said. At any given time the stenographer can be called upon to replay the conversation and bring everyone up to speed on the state of the argument in court. Or think of a secretary who takes meeting minutes and might have to recap the previous meeting by reading back those notes to remind everyone of what took place.
So, in a nutshell, the WAL/transaction log records every transaction that is made against the data, so that if there is a failure, or you need to restore to a different system, those transactions can be replayed to bring the data back up to the point in time that we want.
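As a rough sketch of what that replay looks like in practice (PostgreSQL 12 or later; the archive path here is just an assumed example), you point the server at the archived WAL, give it a target time, and create an empty recovery.signal file in the data directory before starting it:

    # postgresql.conf -- point-in-time recovery sketch, paths are placeholders
    restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'   # fetch archived WAL segments
    recovery_target_time = '2024-01-15 12:00:00'            # replay up to this moment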
Just like the data pages on disk use the shared buffers as their primary work area before being flushed to disk, the WAL logs or transaction logs also use the WAL buffers up front, and the information in those buffers is flushed to disk later.
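You can watch the WAL side the same way: wal_buffers sizes that in-memory area, and pg_current_wal_lsn() (PostgreSQL 10 and later) reports how far the log has advanced:

    -- Size of the in-memory WAL buffer area.
    SHOW wal_buffers;

    -- Current write position in the write-ahead log.
    SELECT pg_current_wal_lsn();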
Temp buffers are session-local buffers used for temporary objects, such as temp tables.
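A minimal illustration (the table here is made up): temp tables live in these session-local buffers, whose size you can check with SHOW:

    -- Session-local buffer area used for temporary tables.
    SHOW temp_buffers;

    -- A temp table exists only for this session and uses the temp buffers.
    CREATE TEMP TABLE scratch (id int, note text);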
There are a number of utility processes that make the database system operate smoothly.
The writer process (background writer) occasionally wakes up, looks in the shared buffers for data that has been changed or modified (so-called dirty pages), and writes that data to the data files on disk.
So, it flushes dirty pages from the shared buffers, takes them out of memory, and stores the pages permanently on disk in the data files. This also helps to free up space in the shared memory area.
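The pg_stat_bgwriter view tracks this activity; the exact column names vary a bit between PostgreSQL versions, so this shape (valid through version 16) is only a sketch:

    -- Buffers cleaned by the background writer vs. written at checkpoints.
    SELECT buffers_clean, buffers_checkpoint, buffers_backend
    FROM pg_stat_bgwriter;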
The WAL writer does the same for the log: it takes whatever has been written or transcribed into the WAL buffers, removes it from memory, and writes it permanently to the WAL files on disk.
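Its cadence is controlled by a single setting you can check from any session:

    -- How often the WAL writer flushes the WAL buffers to disk.
    SHOW wal_writer_delay;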
A checkpoint is a point in the transaction log sequence at which all the data files have been updated to reflect the information in the log. So, at that point, all data files that have been modified in memory are flushed to disk.
So, in order not to wait for the checkpointer to do all the work each time it is scheduled to checkpoint, the background writer keeps working in the background to keep the memory area freed up and to write modified data to disk.
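Checkpoint behavior is driven by a couple of settings, and you can also force one by hand (the CHECKPOINT command requires superuser or equivalent privileges):

    -- What triggers a checkpoint: elapsed time or accumulated WAL volume.
    SHOW checkpoint_timeout;
    SHOW max_wal_size;

    -- Force an immediate checkpoint.
    CHECKPOINT;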
The archiver does not interact with or go through the memory area. Its job is simply to copy and archive the transaction logs, which are written to in a circular fashion. So, it copies the completed WAL files and puts them into archive files.
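Enabling the archiver is a two-setting change; in this sketch the destination directory is an assumed example, while %p and %f are real placeholders PostgreSQL fills in with the WAL file's path and name (archive_mode needs a server restart to take effect):

    -- Turn on archiving and tell PostgreSQL how to copy each finished WAL file.
    ALTER SYSTEM SET archive_mode = on;
    ALTER SYSTEM SET archive_command = 'cp %p /var/lib/pgsql/archive/%f';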
The logging collector collects and stores the system log files. So, troubleshooting information from the system is also written to disk in the log files to help us debug and troubleshoot/investigate.
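Where those logs end up is controlled by a few settings you can check from any session:

    -- Is the logging collector running, and where does it write?
    SHOW logging_collector;
    SHOW log_directory;
    SHOW log_filename;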
The stats collector process collects information about activity on the server, such as what is going on, who is connected, and what they are doing.
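The most direct window into that information is the pg_stat_activity view, which has one row per connected backend:

    -- Who is connected and what each session is doing right now.
    SELECT pid, usename, state, query
    FROM pg_stat_activity;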
Any database system needs a mechanism for handling concurrent activity. To deal smoothly with locks and deadlocks, PostgreSQL implements a mechanism called MVCC (multi-version concurrency control). This gives the DB system snapshots of the database, so that reading, updating, and inserting by many users does not become so contentious.
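One small way to see MVCC's row versioning at work (the accounts table here is hypothetical): every row carries hidden xmin/xmax system columns recording which transactions created and expired that row version:

    -- xmin = transaction that created this row version,
    -- xmax = transaction that deleted/updated it (0 if still live).
    SELECT xmin, xmax, *
    FROM accounts;   -- hypothetical table, for illustration only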
Autovacuum helps with garbage collection and also runs ANALYZE; this keeps table statistics up to date. The stats collector works on system activity statistics, while autovacuum works on table/DB statistics.
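You can confirm it is enabled and see when it last visited each table via the standard pg_stat_user_tables view:

    -- Is autovacuum on?
    SHOW autovacuum;

    -- When each table was last vacuumed/analyzed, and how many dead rows remain.
    SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
    FROM pg_stat_user_tables;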
Every DB system has a query engine, a kind of GPS system, that tries to decide the best way to process a query, i.e. the best path. So, it formulates a query plan based on up-to-date and accurate information, or statistics. If the plan is based on outdated or inaccurate information, then your system will not work efficiently and your queries will suffer.
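You can ask the planner to show the route it picked with EXPLAIN (reusing the hypothetical accounts table from above):

    -- Show the plan the optimizer chose for this query.
    EXPLAIN SELECT * FROM accounts WHERE id = 42;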
The ANALYZE operation allows you to regularly update your database's table statistics. The query engine will then give you the best or most optimized query plan, and your queries will run with the best plan.
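Refreshing those statistics is a one-line command, and the collected numbers are visible in the pg_stats view (again with the hypothetical accounts table):

    -- Refresh the planner's statistics for one table (plain ANALYZE; does all).
    ANALYZE accounts;

    -- Inspect what the planner now knows about each column.
    SELECT attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'accounts';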