Parallel Processing

Parallel Processing is a way to improve performance on high-end machines.

What is Parallel Processing?

Each FME translation is usually a single process on your computer. Parallel processing is when you decide to transform your data as several simultaneous processes. The fact that they run simultaneously means the whole translation will run several times quicker than it used to.

The idea is to make use of multiple cores on a computer. There are four levels of parallel processing, and each maps to the number of cores in this way:

So, for example, on a quad core machine, minimal parallelism will result in two simultaneous FME processes. Extreme parallelism would result in eight. There is also a hard cap for each license level:

So, if you have a Base Edition license you are never going to get more than four processes at one time, regardless of machine type and the parallelism parameter.

Jake Speedie says…

“Parallel Processing is very effective when you are offloading a task elsewhere – for example calling a Server with the HTTPFetcher – as each process is a tiny impact on the FME system resources. However, be aware, each parallel process involves starting and stopping an FME engine, and this takes time. So, don’t parallelize your processes when the task already takes less than the time to stop/start FME!”

Transformers and Parallel Processing

There are a number of basic FME transformers that have built in options for parallel processing. Parallel processes work on groups of features, so the transformer must have a group-by parameter in order for the user to be able to define the parallel processing groups.

For example, this Bufferer transformer is set up to buffer a set of street features. Each type of street (highways, roads, lanes, etc.) will be processed as a separate group. To speed up the translation, each group is being handled as a separate process (sadly the user cannot confirm the source data is already ordered by group, which would improve performance even more). When you run a translation in parallel mode, then you’ll see a number of “worker” processes appear in your process manager:

Parallel Processing Groups

Best performance gains are when you have a small number of groups with a large amount of data. When there are many groups with only a few features then any performance gain will not be great and, in fact, the whole process might even be slower. Disk access can be a big bottleneck there.

Because each group gets processed independently, there can be no relationship between features in different groups. If features are related, and their results dependent on each other, then they must be in the same group.

However, if all data is unrelated and the contents of the group are unimportant, then it’s possible to make artificial groups using a ModuloCounter or RandomNumberGenerator transformer.

For example, here the user has millions of lines to buffer (separately) and uses a ModuloCounter to assign them to one of four groups for parallel processing. Note the "GroupBy" parameter in the Bufferer is set to the _modulo_count attribute:

Included here are a number of workspaces generated by your colleagues. As the resident FME expert you have been asked to assess the performance of each workspace.

1) Start Workbench

Start Workbench and open the workspaces in turn.

Assess each workspace for its ability to employ parallel processing. You should answer the following questions for each workspace.

Do any transformers permit parallel processing?
Is there an existing group I can parallel process by?
Is there an artificial group I can create to parallel process by?
Will parallel processing speed up the workspace performance, or make it worse?

The answers to these questions can be found in the finished workspaces.