
This example benchmarks the `parfor` construct by repeatedly playing the
card game of blackjack, also known as 21. We use `parfor` to play the
card game multiple times in parallel, varying the number of MATLAB®
workers while keeping the number of players per worker and the number
of hands per player fixed.


The basic parallel algorithm uses the `parfor` construct to execute
independent passes through a loop. It is a part of the MATLAB®
language, but behaves essentially like a regular `for`-loop if you do not
have access to the Parallel Computing Toolbox™ product. Thus, our
initial step is to convert a loop of the form

    for i = 1:numPlayers
        S(:, i) = playBlackjack();
    end

into the equivalent `parfor` loop:

    parfor i = 1:numPlayers
        S(:, i) = playBlackjack();
    end

We modify this slightly by specifying an optional argument to `parfor`,
instructing it to limit to `n` the number of workers it uses for the
computations. The actual code is as follows:

```
dbtype pctdemo_aux_parforbench
```

    function S = pctdemo_aux_parforbench(numHands, numPlayers, n)
    %PCTDEMO_AUX_PARFORBENCH Use parfor to play blackjack.
    %   S = pctdemo_aux_parforbench(numHands, numPlayers, n) plays
    %   numHands hands of blackjack numPlayers times, and uses no
    %   more than n MATLAB(R) workers for the computations.

    %   Copyright 2007-2011 The MathWorks, Inc.

    S = zeros(numHands, numPlayers);
    parfor (i = 1:numPlayers, n)
        S(:, i) = pctdemo_task_blackjack(numHands, 1);
    end

**Check the Status of the Parallel Pool**

We will use the parallel pool to allow the body of the `parfor` loop to
run in parallel, so we start by checking whether the pool is open. We
will then run the benchmark using anywhere between 2 and `poolSize`
workers from this pool.

    p = gcp;
    if isempty(p)
        error('pctexample:parforbench:poolClosed', ...
            ['This example requires a parallel pool. ' ...
             'Manually start a pool using the parpool command or set ' ...
             'your parallel preferences to automatically start a pool.']);
    end
    poolSize = p.NumWorkers;
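As a variation on the check above, the following minimal sketch queries the pool without triggering automatic creation and, instead of raising an error, starts a pool when none is running. It assumes that the Parallel Computing Toolbox is installed and that the default cluster profile can start workers on your machine; it is an illustration, not part of the original benchmark.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and a usable default profile.
% gcp('nocreate') returns the current pool, or empty if no pool is running,
% without starting a new one.
p = gcp('nocreate');
if isempty(p)
    % No pool is running, so start one with the default profile.
    p = parpool;
end
poolSize = p.NumWorkers;
fprintf('Parallel pool has %d workers.\n', poolSize);
```

Starting the pool before the timed section keeps the pool startup time out of the benchmark measurements.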

**Run the Benchmark: Weak Scaling**

We time the execution of our benchmark calculations using 2 to `poolSize`
workers. We use weak scaling, that is, we increase the problem size with
the number of workers.

    numHands = 2000;
    numPlayers = 6;
    fprintf('Simulating each player playing %d hands.\n', numHands);
    t1 = zeros(1, poolSize);
    for n = 2:poolSize
        tic;
        pctdemo_aux_parforbench(numHands, n*numPlayers, n);
        t1(n) = toc;
        fprintf('%d workers simulated %d players in %3.2f seconds.\n', ...
            n, n*numPlayers, t1(n));
    end

    Simulating each player playing 2000 hands.
    2 workers simulated 12 players in 7.09 seconds.
    3 workers simulated 18 players in 6.98 seconds.
    4 workers simulated 24 players in 7.06 seconds.
    5 workers simulated 30 players in 7.41 seconds.
    6 workers simulated 36 players in 7.02 seconds.
    7 workers simulated 42 players in 7.59 seconds.
    8 workers simulated 48 players in 7.08 seconds.
    9 workers simulated 54 players in 7.58 seconds.
    10 workers simulated 60 players in 7.45 seconds.
    11 workers simulated 66 players in 7.21 seconds.
    12 workers simulated 72 players in 7.14 seconds.
    13 workers simulated 78 players in 7.32 seconds.
    14 workers simulated 84 players in 7.49 seconds.
    15 workers simulated 90 players in 7.21 seconds.
    16 workers simulated 96 players in 7.44 seconds.
    17 workers simulated 102 players in 7.45 seconds.
    18 workers simulated 108 players in 7.75 seconds.
    19 workers simulated 114 players in 7.38 seconds.
    20 workers simulated 120 players in 7.11 seconds.
    21 workers simulated 126 players in 7.32 seconds.
    22 workers simulated 132 players in 7.59 seconds.
    23 workers simulated 138 players in 7.41 seconds.
    24 workers simulated 144 players in 7.19 seconds.
    25 workers simulated 150 players in 7.41 seconds.
    26 workers simulated 156 players in 8.06 seconds.
    27 workers simulated 162 players in 7.45 seconds.
    28 workers simulated 168 players in 7.48 seconds.
    29 workers simulated 174 players in 7.55 seconds.
    30 workers simulated 180 players in 7.42 seconds.
    31 workers simulated 186 players in 7.25 seconds.
    32 workers simulated 192 players in 7.49 seconds.
    33 workers simulated 198 players in 7.40 seconds.
    34 workers simulated 204 players in 8.08 seconds.
    35 workers simulated 210 players in 7.63 seconds.
    36 workers simulated 216 players in 8.10 seconds.
    37 workers simulated 222 players in 8.15 seconds.
    38 workers simulated 228 players in 7.55 seconds.
    39 workers simulated 234 players in 7.60 seconds.
    40 workers simulated 240 players in 7.22 seconds.
    41 workers simulated 246 players in 7.51 seconds.
    42 workers simulated 252 players in 7.44 seconds.
    43 workers simulated 258 players in 7.31 seconds.
    44 workers simulated 264 players in 7.41 seconds.
    45 workers simulated 270 players in 7.46 seconds.
    46 workers simulated 276 players in 7.45 seconds.
    47 workers simulated 282 players in 7.44 seconds.
    48 workers simulated 288 players in 7.55 seconds.
    49 workers simulated 294 players in 7.30 seconds.
    50 workers simulated 300 players in 7.49 seconds.
    51 workers simulated 306 players in 7.52 seconds.
    52 workers simulated 312 players in 7.27 seconds.
    53 workers simulated 318 players in 7.35 seconds.
    54 workers simulated 324 players in 8.09 seconds.
    55 workers simulated 330 players in 7.49 seconds.
    56 workers simulated 336 players in 7.52 seconds.
    57 workers simulated 342 players in 8.04 seconds.
    58 workers simulated 348 players in 7.59 seconds.
    59 workers simulated 354 players in 8.07 seconds.
    60 workers simulated 360 players in 7.35 seconds.
    61 workers simulated 366 players in 7.38 seconds.
    62 workers simulated 372 players in 7.50 seconds.
    63 workers simulated 378 players in 8.06 seconds.
    64 workers simulated 384 players in 7.65 seconds.

We compare this against the execution using a regular `for`-loop in
MATLAB®.

    tic;
    S = zeros(numHands, numPlayers);
    for i = 1:numPlayers
        S(:, i) = pctdemo_task_blackjack(numHands, 1);
    end
    t1(1) = toc;
    fprintf('Ran in %3.2f seconds using a sequential for-loop.\n', t1(1));

    Ran in 6.46 seconds using a sequential for-loop.

We compare the speedup using `parfor` with different numbers of workers
to the perfectly linear speedup curve. The speedup achieved by using
`parfor` depends on the problem size as well as the underlying hardware
and networking infrastructure.

    speedup = (1:poolSize).*t1(1)./t1;
    fig = pctdemo_setup_blackjack(1.0);
    set(fig, 'Visible', 'on');
    ax = axes('parent', fig);
    x = plot(ax, 1:poolSize, 1:poolSize, '--', ...
        1:poolSize, speedup, 's', 'MarkerFaceColor', 'b');
    t = get(ax, 'XTick');
    t(t ~= round(t)) = []; % Remove all non-integer x-axis ticks.
    set(ax, 'XTick', t);
    legend(x, 'Linear Speedup', 'Measured Speedup', 'Location', 'NorthWest');
    xlabel(ax, 'Number of MATLAB workers participating in computations');
    ylabel(ax, 'Speedup');
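Written out, the `speedup` assignment above corresponds to the following weak-scaling definition. The symbols are notation introduced here for illustration: $t_1$ is the sequential time for `numPlayers` players, $t_n$ is the time $n$ workers need to simulate $n \cdot$ `numPlayers` players, $S(n)$ is the speedup, and $E(n)$ is the efficiency.

$$
S(n) = \frac{n \, t_1}{t_n}, \qquad E(n) = \frac{S(n)}{n} = \frac{t_1}{t_n}
$$

For instance, with the timings printed above ($t_1 = 6.46$ s and $t_{64} = 7.65$ s), this gives $S(64) = 64 \times 6.46 / 7.65 \approx 54.04$ and $E(64) \approx 0.84$. Perfectly linear speedup corresponds to $S(n) = n$, that is, $t_n = t_1$ for every $n$.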

**Measure the Speedup Distribution**

A single timing is subject to noise, so one run is not enough to get
reliable benchmark numbers. We therefore repeat the benchmark several
times with all `poolSize` workers and look at the spread of the speedup.

    numIter = 100;
    t2 = zeros(1, numIter);
    for i = 1:numIter
        tic;
        pctdemo_aux_parforbench(numHands, poolSize*numPlayers, poolSize);
        t2(i) = toc;
        if mod(i,20) == 0
            fprintf('Benchmark has run %d out of %d times.\n',i,numIter);
        end
    end

    Benchmark has run 20 out of 100 times.
    Benchmark has run 40 out of 100 times.
    Benchmark has run 60 out of 100 times.
    Benchmark has run 80 out of 100 times.
    Benchmark has run 100 out of 100 times.

We take a close look at the speedup of our simple parallel program when using the maximum number of workers. The histogram of the speedup allows us to distinguish between outliers and the average speedup.

    speedup = t1(1)./t2*poolSize;
    clf(fig);
    ax = axes('parent', fig);
    hist(speedup, 5);
    a = axis(ax);
    a(4) = 5*ceil(a(4)/5); % Round y-axis to nearest multiple of 5.
    axis(ax, a)
    xlabel(ax, 'Speedup');
    ylabel(ax, 'Frequency');
    title(ax, sprintf('Speedup of parfor with %d workers', poolSize));
    m = median(speedup);
    fprintf(['Median speedup is %3.2f, which corresponds to '...
        'efficiency of %3.2f.\n'], m, m/poolSize);

    Median speedup is 54.23, which corresponds to efficiency of 0.85.
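To complement the histogram with numbers, the spread can also be summarized with base MATLAB functions. The following is a minimal sketch. The values of `tSeq`, `tPar`, and `poolSize` are hypothetical placeholders chosen for illustration, not measurements from this benchmark; in practice you would use `t1(1)`, `t2`, and the pool size from the code above.

```matlab
% Hypothetical inputs for illustration only; these are not measurements.
tSeq = 6.5;                      % assumed sequential time in seconds
tPar = [7.4 7.6 7.5 8.1 7.5];    % assumed parallel times in seconds
poolSize = 64;                   % assumed number of workers

% Weak-scaling speedup of each repetition, computed as in the example.
speedup = tSeq ./ tPar * poolSize;

% The median is robust against a few slow runs; the range shows how far
% the fastest and slowest repetitions lie apart.
medSpeedup = median(speedup);
spread = max(speedup) - min(speedup);
efficiency = medSpeedup / poolSize;

fprintf('Median speedup %.2f (efficiency %.2f), range %.2f.\n', ...
    medSpeedup, efficiency, spread);
```

Reporting the median rather than the mean keeps a single slow repetition from dominating the summary.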
