You are here

Identifying The Hottest Tables (PostgreSQL)

In recent versions of PostgreSQL there is a magical view called pg_stat_user_tables which provides per table information on usage; there is one row per table and eight counters per row.

Fields

  • relname - The name of the table.
  • seq_scan - The number of sequential scans that have been performed on the table. A sequential scan is a read of the table from beginning to end, either because the table is very small or no indexes were available that could satisfy the filter criteria in an efficient way. Sequential scans are probably the most expensive operation the database server performs, some are however unavoidable. If proper indexing cannot resolve the need to sequentially scan a table it is imperative that the PostrgeSQL configuration provide enough resources to maintain a high cache rate.
  • seq_tup_read - The number of rows processed through sequential scans. This is not the number of records returned to the applications as results but the number of records processed in order to create the result set, which is probably a significant subset of this number. For example, if a query returns ten records but requires a sequential scan of the table then this value will increase by the number of records in the table, not by ten.
  • idx_scans - The number of indexes scans of the tables.
  • idx_tup_fetch - The number of rows processes through indexed scans. As with seq_tup_read this is not the count of records returned as the results of queries but those evaluated for queries due to index entries.
  • seq_tup_read - The number of records processed in order to create the result set of a query, not the number of records returned to the applications.
  • n_tup_ins - The number of rows inserted into the table.
  • n_tup_upd - The number of rows updated.
  • n_tup_del - The number of rows deleted.

Using this view an administrator can isolate the ten busiest tables in the database.

    SELECT relname AS table_name,
    seq_tup_read, idx_tup_fetch
    FROM pg_stat_user_tables
    WHERE (seq_tup_read + idx_tup_fetch) > 0
    ORDER BY records DESC LIMIT 10

These results will reveal both table usage and the effectiveness of your indexes. If you have lots of sequential scans occurring then the query engine isn't finding indexes that match the queries being performed.

         table_name                           seq_tup_read      idx_tup_fetch
    doc                                  1,423,407,729,074    349,028,985,971
    job_history                                 71,378,301      4,213,364,118
    job_history_info                            74,454,363      4,207,594,850
    date_company_assignment                     31,059,671      1,305,469,897
    enterprise                               3,551,311,871      1,083,015,878
    date_x                                      12,884,498        982,418,723
    object_acl                              15,942,621,939        137,179,721
    job                                     39,956,712,914         46,912,825
    project_info                             1,709,329,011                 23
    team                                     1,141,035,688                  0

In these example results it is apparent that the table doc is one of the hottest objects and while many records are being identified using index entries there is also a very large number of sequential processes occurring. This may be because either the indexes do not match the queries being performed or the cardinality of the indexed values is too low. Now we know where to look.

So don't grope about speculating about how to improve database performance or scalability - ask where to look, PostgreSQL wants to help you. Much more information can be found at the PostgreSQL stats monitoring documentation.

Tags: 

Theme by Danetsoft and Danang Probo Sayekti inspired by Maksimer