In this post, we’ll examine why in an initial flushing analysis we find that Performance Schema data is incomplete.
Having shown the performance impact of Percona Server 5.7 patches , we can now discuss their technical reasoning and details. Let’s revisit the MySQL 5.7.11 performance schema synch wait graph from the previouspost, for the case of unlimited InnoDB concurrency:
First of all, this graph is a little “nicer” than reality, which limits its diagnostic value. There are two reasons for this. The first one is that page cleaner worker threads are invisible to Performance Schema (see bug 79894 ). This alone limits PFS value in 5.7 if, for example, one tries to select only the events in the page cleaner threads or monitors low concurrency where the cleaner thread count is non-negligible part of the total threads.
To understand the second reason, let’s look into PMP for the same setting. Note that selected intermediate stack frames were removed for clarity, especially in the InnoDB mutex implementation.
660 pthread_cond_wait,enter(ib0mutex.h:850),buf_dblwr_write_single_page(ib0mutex.h:850),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),... 631 pthread_cond_wait,buf_dblwr_write_single_page(buf0dblwr.cc:1213),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),... 337 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),get_next_redo_rseg(trx0trx.cc:1185),trx_assign_rseg_low(trx0trx.cc:1278),trx_set_rw_mode(trx0trx.cc:1278),lock_table(lock0lock.cc:4076),... 324 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2448),LinuxAIOHandler::poll(os0file.cc:2594),... 241 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),trx_write_serialisation_history(trx0trx.cc:1578),trx_commit_low(trx0trx.cc:2135),... 147 pthread_cond_wait,enter(ib0mutex.h:850),trx_undo_assign_undo(ib0mutex.h:850),trx_undo_report_row_operation(trx0rec.cc:1918),... 112 pthread_cod_wait,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(btr0cur.cc:1008),... 83 poll(libc.so.6),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:935),handle_connection(connection_handler_per_thread.cc:301),... 64 pthread_cond_wait,Per_thread_connection_handler::block_until_new_connection(thr_cond.h:136),...
The top wait in both PMP and the graph is the 660 samples of enter mutex in buf_dblwr_write_single_pages , which is the doublewrite mutex. Now try to find the nearly as hot 631 samples of event wait in buf_dblwr_write_single_page in the PFS output. You won’t find it because InnoDB OS event waits are not annotated in Performance Schema. In most cases this is correct, as OS event waits tend to be used when there is no work to do. The thread waits for work to appear, or for time to pass. But in the report above, the waiting thread is blocked from proceeding with useful work (see bug 80979 ).
Now that we’ve shown the two reasons why PFS data is not telling the whole server story, let’s take PMP data instead and consider how to proceed. Those top two PMP waits suggest 1) the server is performing a lot of single page flushes, and 2) those single page flushes have their concurrency limited by the eight doublewrite single-page flush slots available, and that the wait for a free slot to appear is significant.
Two options become apparent at this point: either make the single-page flush doublewrite more parallel or reduce the single-page flushing in the first place. We’re big fans of the latter option since version 5.6 performance work, where we configured Percona Server to not perform single-page flushes at all by introducing the innodb_empty_free_list_algorithm option, with the “backoff” default.
The next post in the series will describe how we removed single-page flushing in 5.7.
Laurynas is a software engineer and Percona Server lead whose primary interest is InnoDB performance. In the past he worked in industry, interned in Google as a compiler software engineer, as well as academia where he researched physical database indexes, including large-scale spatial models of the brain.
Alexey Stroganov is a Performance Engineer at Percona, where he works on improvements and features that makes Percona Server even more flexible, faster and scalable. Before joining Percona he worked on the performance testings/analysis of MySQL server and it components at MySQL AB/Sun/Oracle for more than ten years. During this time he was focused on performance evaluations, benchmarks, analysis, profiling, various optimizations and tunings.
转载本站任何文章请注明：转载至神刀安全网，谢谢神刀安全网 » MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete