martedì 2 aprile 2019

How the power query cache works?

Let's see how the power query cache works (in both Excel and Power BI Desktop). A "persistent cache" stored on disk, when refreshing queries. But the persistent cache does not store the results of your M Queries. Instead, it stores the results of the behind-the-scenes requests sent to data sources. The warning like "This preview may be up to .... old" happens because there is a persistent cache used for interactive previewing of query results. This cache is long-lived and is intended to make the experience of working in the editor faster.

So Query1 queries a SQL Server database and returns a single unfiltered table. In this case, the persistent cache will now know the result of sending. If another M query (whether through referencing Query1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won't have to be fetched a second time from the SQL Server

We will see that this does not work and this sometimes creates performance problems


Let's try to understand how the Power Query cache works. To do this we need to prepare the test environment.Now let's start by uploading data from the sql server and in detail from the contoso database using the following query t-sql

Insert the t-sql query in the dialog box of a direct query connection as shown in the image below
After entering the query we click on ok. When we import the data we enter the query modification mode to be able to handle the query just created. Rename the query with the name "test"
Next we create three references of the "test" query. The references we call them R1, R2, R3 as shown in the image below:
Now you can close the power query window with "close and apply". We have almost completed the preparation of the test environment. We need to go into the power bi settings and make sure the flag is set to "Enabled parallel loading to tables" as shown in the image below
Now let's keep the sql server profiler open and refresh it in power bI. As we see the power query engine executes 4 queries to SQL Server that are run in parallel.


I thought the Power Query engine first evaluates "Test", and after that happened the data goes from there to R1, R2, R3. Actually, what happens is the opposite: R1, R2 and R3 are evaluated in parallel independently because each of them refers to "test". Power Query persistent cache is updated through a background thread. And the separate assessments running at the same time are not coordinated; when the R1 evaluation accesses the persistent cache this does not prevent the R2 or R3 evaluation from accessing the cache. This means that even when using a shared persistent cache, PQ can potentially require the same data twice. It depends on the timing of the various requests.

This happens because we have enabled the "Enable parallel table loading" flag if instead we set the "Enable parallel loading of tables" on off,

so the queries above are evaluated in series, the SQL Profiler shows that the sql server is called a only once.

The Profiler shows that only one of the queries on R1, R2 and R3 is requested from the SQL Server while the other two are taken from the cache. In fact, even if the three queries cause the "test" evaluation, every time "test" is evaluated, it is possible to reuse the data stored in the persistent cache.

On the contrary, with the three queries updated in parallel, each "test" evaluation takes place in the cache. This is slower, but in this case not much slower. In other cases, the difference between parallel and sequential query execution could be much wider.

to understand how the cache works, I used a beautiful chris webb article. About this feature in detail here https://blog.crossjoin.co.uk/2019/03/26/power-bi-caching-parallelism-and-power-query-refresh-performance/

Nessun commento:

Posta un commento

WPC Days - Power BI Datamart