Sunday, August 7, 2016

What is PLINQ?

Parallel LINQ (PLINQ) is a parallel implementation of the LINQ pattern. A PLINQ query in many ways resembles a non-parallel LINQ to Objects query. PLINQ queries, just like sequential LINQ queries, operate on any in-memory IEnumerable or IEnumerable<T> data source, and have deferred execution, which means they do not begin executing until the query is enumerated. The primary difference is that PLINQ attempts to make full use of all the processors on the system. It does this by partitioning the data source into segments, and then executing the query on each segment on separate worker threads in parallel on multiple processors. In many cases, parallel execution means that the query runs significantly faster. [Ref: https://msdn.microsoft.com]
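As a minimal sketch of the description above, the following example opts a LINQ to Objects query into PLINQ with AsParallel() and shows where deferred execution ends. The class name PlinqSketch and the helper SumOfEvenSquares are made up for illustration; only AsParallel(), Where(), Select() and Sum() come from the framework.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical helper: sums the squares of the even numbers in [1, count].
    public static long SumOfEvenSquares(int count)
    {
        // Building the query does no work yet (deferred execution).
        ParallelQuery<long> query = Enumerable.Range(1, count)
            .AsParallel()                  // opt in to PLINQ
            .Where(n => n % 2 == 0)        // may run on several worker threads
            .Select(n => (long)n * n);

        // Sum() enumerates the query, so execution starts here.
        return query.Sum();
    }

    public static void Main()
    {
        Console.WriteLine(SumOfEvenSquares(10)); // 4 + 16 + 36 + 64 + 100 = 220
    }
}
```

For an input this small, the sequential query is likely to be faster; the parallel version only tends to pay off when the data source is large or the per-element work is expensive.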

In .NET Framework, there is a subset of libraries that is called Parallel Framework, often referred to as Parallel Framework Extensions (PFX), which was the name of the very first version of these libraries. Parallel Framework was released with .NET Framework 4.0 and consists of three major parts:

    The Task Parallel Library (TPL)
    Concurrent collections
    Parallel LINQ or PLINQ

One way to use these libraries is to run several tasks in parallel and synchronize them with one another: the program is partitioned into a set of tasks, and different threads run different tasks. This approach is called task parallelism.

Imagine that we have a program that performs some heavy calculations over a big set of data. The easiest way to parallelize this program is to partition this set of data into smaller chunks, run the calculations needed over these chunks of data in parallel, and then aggregate the results of these calculations. This programming model is called data parallelism.
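The partition, compute, aggregate steps described above can be sketched by hand as follows. This is an illustration under assumptions, not how PLINQ partitions data internally: the class DataParallelSketch, the helper SumOfSquares and its chunkCount parameter are hypothetical, and chunkCount is assumed to be at least 1.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class DataParallelSketch
{
    // Hypothetical helper: splits data into chunkCount chunks, sums the
    // squares of each chunk in parallel, then aggregates the partial sums.
    public static long SumOfSquares(int[] data, int chunkCount)
    {
        int chunkSize = (data.Length + chunkCount - 1) / chunkCount; // ceiling division
        var partials = new long[chunkCount];

        Parallel.For(0, chunkCount, chunk =>
        {
            int start = chunk * chunkSize;
            int end = Math.Min(start + chunkSize, data.Length);
            long sum = 0;
            for (int i = start; i < end; i++)
            {
                sum += (long)data[i] * data[i];
            }
            partials[chunk] = sum; // each chunk writes only its own slot
        });

        // Aggregate the per-chunk results sequentially.
        return partials.Sum();
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 100).ToArray();
        Console.WriteLine(SumOfSquares(data, 4)); // 338350
    }
}
```

Because every chunk writes to a separate array slot, no locking is needed; the only shared step is the final aggregation.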

Task parallelism has the lowest abstraction level. We define a program as a combination of tasks, explicitly defining how they are combined. A program composed in this way could be very complex and detailed. Parallel operations are defined in different places in this program, and as it grows, the program becomes harder to understand and maintain. This way of making the program parallel is called unstructured parallelism. It is the price we have to pay if we have complex parallelization logic.
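To make the "explicitly defining how they are combined" part concrete, here is a small sketch in which two tasks are started and a third is wired up by hand to depend on both. The class TaskParallelSketch and the constant values are invented for illustration, and the example assumes .NET Framework 4.5 or later, where Task.Run and Task.WhenAll are available.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskParallelSketch
{
    // Hypothetical example: two independent tasks plus an explicit combination step.
    public static int Run()
    {
        Task<int> first = Task.Run(() => 20);   // stands in for some real work
        Task<int> second = Task.Run(() => 22);  // stands in for other real work

        // The dependency between tasks is spelled out by the programmer.
        Task<int> combined = Task.WhenAll(first, second)
            .ContinueWith(t => t.Result[0] + t.Result[1]);

        return combined.Result; // blocks until the whole chain has finished
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 42
    }
}
```

Even in this tiny case the wiring between tasks is part of the program text, which is what makes larger programs written this way harder to follow.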

However, when we have simpler program logic, we can try to offload more parallelization details to the PFX libraries and the C# compiler. For example, we could say, "I would like to run those three methods in parallel, and I do not care how exactly this parallelization happens; let the .NET infrastructure decide the details". This raises the abstraction level as we do not have to provide a detailed description of how exactly we are parallelizing this. This approach is referred to as structured parallelism since the parallelization is usually a sort of declaration and each case of parallelization is defined in exactly one place in the program. [Ref: Multithreading with C# Cookbook - Second Edition - By: Eugene Agafonov - Print ISBN-13: 978-1-78588-125-1]
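The "run those three methods in parallel" request maps fairly directly onto Parallel.Invoke from the Task Parallel Library. The sketch below is one possible illustration; the class StructuredSketch and the three lambdas are hypothetical stand-ins for real methods, and a ConcurrentBag is used so the actions can record their results safely from different threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class StructuredSketch
{
    // Hypothetical example: three actions run in parallel, scheduling left to the runtime.
    public static string[] RunThree()
    {
        var log = new ConcurrentBag<string>();

        // One declaration, in one place; how the work is scheduled is not specified here.
        Parallel.Invoke(
            () => log.Add("first"),
            () => log.Add("second"),
            () => log.Add("third"));

        // Parallel.Invoke returns only after all three actions have completed.
        string[] result = log.ToArray();
        Array.Sort(result, StringComparer.Ordinal); // completion order is not guaranteed
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RunThree())); // first, second, third
    }
}
```

The sort is there only to make the output deterministic; the order in which the three actions actually run is up to the runtime.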
