Parallel Programming Notes - Parallel Loops
use Parallel Loop pattern when performing same independent operation for each element of a collection or for a fixed number of iterations
steps of a loop are independent if they don't write to memory locations or files that are read by other steps
similar to for and foreach loops except order of execution isn't defined
only guarantee is all iterations will be run
degree of parallelism isn't specified in code
run-time environment executes loop using as many cores as possible
Parallel For Loops
use Parallel.For method to iterate range of integer indices
sequential for loop
    int n = ...
    for (int i = 0; i < n; i++)
    {
        ...
    }
parallel for loop
    int n = ...
    Parallel.For (0, n, i =>
    {
        // lambda expression
        ...
    });
Parallel.For is static method with overloads
signature used above is
Parallel.For(int fromInclusive, int toExclusive, Action<int> body);
the body param can be a
lambda expression
instance of a delegate type
an anonymous method using delegate keyword
named method (sketch below)
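for example, the body can be a named method passed as a method group (a minimal sketch; ProcessIndex & n are hypothetical)
    static void ProcessIndex(int i)
    {
        // ... work for iteration i
    }

    // method group converts to Action<int>
    Parallel.For(0, n, ProcessIndex);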
Parallel ForEach Loops
use Parallel.ForEach method to iterate user-provided values
sequential foreach loop
    IEnumerable<MyObject> myEnumerable = ...
    foreach (var obj in myEnumerable)
    {
        ...
    }
parallel foreach loop
    IEnumerable<MyObject> myEnumerable = ...
    Parallel.ForEach (myEnumerable, obj =>
    {
        // lambda expression
        ...
    });
Parallel.ForEach is static method with overloads
signature used above is
Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body);
the body param can be a
lambda expression
instance of a delegate type
an anonymous method using delegate keyword
named method
Parallel LINQ (PLINQ)
almost all LINQ-to-Objects expressions have parallel counterpart
    IEnumerable<MyObject> source = ...
    // LINQ
    var query1 = from i in source select Normalize(i);
    // PLINQ
    var query2 = from i in source.AsParallel() select Normalize(i);
Expectations
degree of parallelism depends upon number of available cores
Amdahl's Law predicts a point of diminishing returns
Parallel.For, Parallel.ForEach and PLINQ methods collect exceptions into an AggregateException object which is rethrown in the context of the calling thread
many variations provide configuration options
Parallel.For has 12 overloads
Parallel.ForEach has 20 overloads
PLINQ has close to 200 extension methods
common examples of dependent loop bodies are covered under Anti-Patterns below
Breaking Out of Loops Early
the need to break a sequential loop is a common occurrence
    int n = ...;
    for(int i = 1; i < n; i++)
    {
       bool someCondition = ...;
       if (someCondition)
       {
            break;
       }
    }
more complicated with parallel loops
multiple steps may be active & steps are not executed in predetermined manner
Parallel break
allows all steps with lower indices to run before breaking the loop
    int n = ...;
    Parallel.For(0, n, (i, loopState) =>
    {
       bool someCondition = ...;
       if (someCondition)
       {
             loopState.Break();
            return;
       }
    });
method signature is
    Parallel.For(int fromInclusive, int toExclusive, Action<int, ParallelLoopState> body);
calling ParallelLoopState.Break begins orderly shutdown of the loop processing
long-running loop bodies may want to check for break condition by evaluating the loop state (sketch below) using
LowestBreakIteration property - returns a nullable long (long?) whose HasValue property is true if loop has been broken
ShouldExitCurrentIteration property - checks for breaks & other stopping conditions
possible for more than one step to call Break method, lowest index rules
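a minimal sketch of a long-running body polling the loop state (DoLongRunningWork & n are hypothetical)
    int n = ...;
    Parallel.For(0, n, (i, loopState) =>
    {
        // exit early if another step has called Break or Stop
        if (loopState.ShouldExitCurrentIteration)
        {
            return;
        }
        DoLongRunningWork(i);
    });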
ParallelLoopResult
both Parallel.For & Parallel.ForEach return a ParallelLoopResult object
determine if Break was called by examining values of IsCompleted & LowestBreakIteration properties; if IsCompleted is false and LowestBreakIteration's HasValue property is true the loop was broken
specific index where loop was terminated can be found in the LowestBreakIteration's Value property
    int n = ...;
    var loopResult = Parallel.For(0, n, (i, loopState) =>
    {
       bool someCondition = ...;
       if (someCondition)
       {
            loopState.Break();
            return;
       }
    });
    
    if(!loopResult.IsCompleted && loopResult.LowestBreakIteration.HasValue)
    {
        Console.WriteLine("Loop broke at index {0}", loopResult.LowestBreakIteration.Value);
    }
Parallel stop
calling the ParallelLoopState.Stop method terminates the loop, allowing no new steps to begin
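a minimal sketch of Stop; unlike Break, steps with lower indices are not guaranteed to run
    int n = ...;
    Parallel.For(0, n, (i, loopState) =>
    {
       bool someCondition = ...;
       if (someCondition)
       {
            loopState.Stop();
            return;
       }
    });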
External Loop Cancellation
no Stop method for PLINQ query
overloaded For & ForEach methods
    Parallel.For(int fromInclusive, int toExclusive, ParallelOptions parallelOptions, Action<int> body);
use CancellationTokenSource type to signal cancellation and CancellationToken structure to detect & respond to a cancellation request
    void doLoop(CancellationTokenSource cts)
    {
        int n = ...
        CancellationToken token = cts.Token;
        var options = new ParallelOptions{ CancellationToken = token };
        try
        {
            Parallel.For(0, n, options, (i) =>
            {
                ...
                if(token.IsCancellationRequested)
                {
                    ...
                    return;
                }
            });
        }
        catch (OperationCanceledException)
        {
            // ... handle loop cancellation
        }
    }
use WithCancellation extension method to add external cancellation capabilities to a PLINQ query
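a minimal sketch of PLINQ cancellation, reusing the source & Normalize placeholders from the PLINQ section above
    var cts = new CancellationTokenSource();
    try
    {
        var results = source.AsParallel()
                            .WithCancellation(cts.Token)
                            .Select(i => Normalize(i))
                            .ToList();
    }
    catch (OperationCanceledException)
    {
        // ... handle query cancellation
    }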
Exception Handling
if an unhandled exception is thrown no new steps are started
by default steps that are running are allowed to complete
after running steps complete the parallel loop will throw an exception in the context of calling thread
long-running steps can check the ParallelLoopState's IsExceptional property, which returns true if an exception is pending
multiple exceptions can be thrown, up to the number of concurrent steps; AggregateException's InnerExceptions property contains a collection of all exceptions thrown in the loop
exceptions take priority over Break and Stop methods of ParallelLoopState
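a minimal sketch of catching the rethrown AggregateException (n is assumed in scope)
    try
    {
        Parallel.For(0, n, i =>
        {
            ...
        });
    }
    catch (AggregateException ae)
    {
        foreach (var ex in ae.InnerExceptions)
        {
            // ... handle or log each exception thrown by the loop steps
        }
    }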
Special Handling of Small Loop Bodies
if loop body performs only a small amount of work better performance can be achieved by partitioning iterations into larger units of work
Partitioner object divides indices into non-overlapping intervals called ranges
    int n = ...;
    double[] result = new double[n];
    Parallel.ForEach(Partitioner.Create(0, n), (range) =>
    {
       for(int i = range.Item1; i < range.Item2; i++)
       {
            result[i] = (double)(i * i);
       }
    });
the Partitioner.Create method (in the System.Collections.Concurrent namespace) returns an OrderablePartitioner<Tuple<int, int>> that Parallel.ForEach accepts as a source
each tuple represents index values to be used in loop iterations
number of ranges depends upon number of available cores
default number of ranges is ~3 times number of cores
overload of Create method allows size of ranges to be specified
    int n = 1000000;
    double[] result = new double[n];
    Parallel.ForEach(Partitioner.Create(0, n, 50000), (range) =>
    {
       for(int i = range.Item1; i < range.Item2; i++)
       {
            result[i] = (double)(i * i);
       }
    });
in the example each range will span 50000 index values
Controlling the Degree of Parallelism
generally let system manage cores but may want additional control
reduce degree of parallelism to simulate less capable hardware while performance testing
increase degree of parallelism to a number larger than the number of cores when iterations wait a lot of time on I/O operations
degree of parallelism can be considered two ways
1 - refers to number of cores used for simultaneous iterations
2 - the number of tasks that can be used simultaneously by the parallel loop; the ParallelOptions.MaxDegreeOfParallelism property sets the maximum number of worker tasks that will be scheduled at any one time
    int n = ...;
    var options = new ParallelOptions(){ MaxDegreeOfParallelism = 2 };
    Parallel.For(0, n, options, i =>
    {
        ...
    });
maximum number of worker threads for PLINQ queries can be set with the ParallelQuery<T>.WithDegreeOfParallelism extension method
    IEnumerable<T> myCollection = ...
    myCollection.AsParallel().WithDegreeOfParallelism(8).ForAll( obj => ... );
when a large degree of parallelism is specified, use the ThreadPool type's SetMinThreads method so threads are created without delay
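a minimal sketch; 16 is an arbitrary illustrative value
    // read current minimums, raise only the worker thread count
    ThreadPool.GetMinThreads(out int workerThreads, out int completionPortThreads);
    ThreadPool.SetMinThreads(16, completionPortThreads);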
Using Task-Level State in a Loop Body
overloaded methods accept delegates that initialize & finalize task-local state
in the example below each worker task has its own Random object
    double[] result = new double[NumberOfSteps];
    // int fromInclusive, int toExclusive
    Parallel.For(0, NumberOfSteps,
                // parallel options
                new ParallelOptions(),
                // Func<TLocal> localInit
                () => { return new Random(MakeRandomSeed()); },
                // Func<int, ParallelLoopState, TLocal, TLocal> body
                (i, loopState, random) =>
                {
                    result[i] = random.NextDouble();
                    return random;
                },
                // Action<TLocal> localFinally
                _ => { });
Using a Custom Task Scheduler for a Parallel Loop
substitute custom task scheduling logic for default
    int n = ...
    TaskScheduler myTaskScheduler = ...
    var options = new ParallelOptions(){ TaskScheduler = myTaskScheduler };
    Parallel.For(0, n, options, i =>
    {
        ...
    });
signature of method used
    Parallel.For(int fromInclusive,
                 int toExclusive,
                 ParallelOptions options,
                 Action<int> body);
cannot specify custom task scheduler for PLINQ queries
Anti-Patterns
Step Size Other Than One
often indicates a data dependency
negative step size indicates ordering is significant
before converting sequential for loop reverse the order of indices; if the code still runs correctly the steps are likely independent
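a minimal sketch of the reversal test (n and the loop body are assumed)
    // original order
    for (int i = 0; i < n; i++) { ... }
    // reversed order - if results are unchanged, steps are likely independent
    for (int i = n - 1; i >= 0; i--) { ... }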
Hidden Loop Body Dependencies
sharing instance of type that is not thread-safe
shared variables must be protected & synchronized
synchronization can reduce performance
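a minimal sketch of the cost of protecting a shared, non-thread-safe object (here a shared Random); the task-local state example above avoids the lock entirely
    int n = ...;
    double[] result = new double[n];
    var random = new Random();
    var lockObj = new object();
    Parallel.For(0, n, i =>
    {
        double value;
        lock (lockObj)          // protects the non-thread-safe Random
        {
            value = random.NextDouble();
        }
        result[i] = value;      // lock contention can erase the parallel speedup
    });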
Small Loop Bodies With Few Iterations
overhead dominates benefits
Processor Oversubscription & Undersubscription
oversubscription
more worker threads than cores
generally optimum number of worker threads is number of available cores divided by average fraction of core utilization per task
higher number incurs overhead of context switching
undersubscription
having too few tasks misses opportunity to effectively use available processor cores
generally most efficient to use built-in load balancing algorithms in .NET framework
Mixing the Parallel Class & PLINQ
can use PLINQ query as source collection for a Parallel.For loop
introduces unnecessary repetition
Duplicates in the Input Enumeration
duplicate objects in an enumeration can lead to race condition
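a minimal sketch; MyObject & its Count property are hypothetical
    var item = new MyObject();
    var items = new[] { item, item };   // same instance appears twice
    Parallel.ForEach(items, obj =>
    {
        obj.Count++;    // two steps may mutate the same object concurrently - a race
    });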
Design Notes
Adaptive Partitioning
useful when size of units of work increases over time
able to meet needs of both small & large input ranges
Adaptive Concurrency
PLINQ uses fixed number of tasks to execute a query; by default creates same number of tasks as there are logical cores, or uses value passed to WithDegreeOfParallelism method if specified
Parallel.For & Parallel.ForEach can use variable number of tasks, thread pool adapts dynamically to changing workloads
Support For Nested Loops & Server Applications
with nested parallel loops each loop uses fewer threads
Parallel class attempts to deal with multiple App Domains in server applications the same way