Nested structures are challenging to read; a pipeline is usually easier to read and think about.
Nested structures have that “arrow of doom” feeling, where you work your way in multiple levels then out. It’s hard to tell what’s being operated on because you need to manage the context of all the parent structures simultaneously. A pipeline is usually linear.
Many languages have added “functional pipeline” style, building on map-filter-reduce first explored in Lisp, oh, about 50 years ago, and in Unix and Smalltalk about 40 years ago:)
What I’m describing below applies conceptually to Java, C#, Kotlin, Python, etc. I’ll use examples in Swift, and explain any Swift-specific constructs. See the “Sequences” article below for Apple’s documentation of the available functions.
Transformation Approaches
I use all three of these approaches:
- Tool: If your tool does the job, use the tool! For example, when IntelliJ IDEA sees a loop it knows how to transform, it pops up a yellow light bulb that offers to do it.
- Extract a new collection: When a loop iterates over a collection, extract the collection into a variable as the seed of the new pipeline, and gradually move parts of the loop into it until the original loop is gone. Martin Fowler explores this approach in his excellent book and article (see References), so I won’t explore this further.
- Transform in place: Transform the loop to a pipeline in place, one nesting level at a time. We’ll use this approach below.
In each case, I’ll show a pattern of a nested loop, then a refactoring that peels off a layer of nesting and uses pipeline methods instead.
Basic Iteration
Sometimes code iterates directly through a collection. We can switch to a pipeline with forEach()
:
1. Direct use of collection
for item in collection { print(item) }
=>
collection .forEach { item in print(item) }
In Swift, we can replace lambda arguments with $n, like this:
collection .forEach { print($0) }
As the body gets shorter, this shorthand becomes more convenient.
2. Index only used to get collection members
The iteration may be even more old-school:
for index in 0..<collection.count { let item = collection[index] print(item) // body, with no references to i }
=>
collection .forEach { item in print(item) }
3. Index used to get collection member and used in body
In some cases, the index is used for purposes other than fetching the next collection item.
for index in 0..<collection.count { let item = collection[index] print("index: \(index) item:\(item)") // body, w/refs to i }
=>
collection .enumerated() .forEach { index, item in print("index: \(index) item:\(item)") // body, w/refs to i }
4. Index used to get current and next collection members
You may need access to the sequential pairs of items in the collection:
for index in 0..<(collection.count - 1) { let current = collection[index] let next = collection[index+1] print("\(current):\(next)") // body, references both }
=>
zip(collection, collection.dropFirst()) .forEach { current, next in print("\(current):\(next)") // body, references both }
The zip()
function combines two sequences into a pair, up to the length of the shorter sequence. In this case, we’re using it on the same collection twice.
Transforming Collections
Sometimes we’re transforming a collection to a new collection, of the same or different type.
var result: [Int] = [] collection .forEach { item in let transformed = item.count result.append(transformed) }
=>
let result2 = collection .map { item in item.count }
Don’t confuse the pipeline function map() with the map data structure.
Filtering
Some loops only return a subset of the initial values.
var result1 : [String] = [] collection.forEach { item in if item.contains("a") { result1.append(item) // body using item } }
=>
let result2 = collection .filter { item in item.contains("a") }
Note that filter()
uses a predicate that’s true for items you want to keep.
Some other filters:
dropFirst()
,dropLast()
, anddrop(while:)
– skip the front or end of a sequenceprefix()
,prefix(while:)
,suffix()
– return the front or end of a sequence
Transform Plus Filter
Sometimes you want to transform a value, then filter it. The compactMap()
method lets you do both, filtering out null transformed values. (The “if let
” syntax tries to convert an optional value to a “real” one: if it can, it executes the body; if not, it skips it.)
let collection = ["42", "x", "314", "217", ""] var result1: [Int] = [] collection .forEach { item in let transformed = Int(item) // can fail if let transformed = transformed { result1.append(transformed) } }
=>
let collection = ["42", "x", "314", "217", ""] let result2 = collection .compactMap { item in Int(item) }
Aggregating a Value
You don’t always just transform a sequence to a sequence. Sometimes you want to compute an aggregate value from the sequence.
The core function for doing this is reduce()
. It takes two arguments: the initial value, and a function that given the previous result and the next value can compute the next result. For example, reduce(0, +)
will add up the list of numbers; reduce(false, |)
can “or” together a list of booleans; reduce(Int.min, { $0 < $1 ? $0 : $1})
will compute a maximum if the list is guaranteed to contain at least one item.
let collection = [42, 314, 504] var result1 = 0 // initial value collection .forEach { item in result1 = result1 + item }
=>
let result2 = collection .reduce(0, { result, item in result + item }) print(result2)
Again, we can refactor the latter to:
let result3 = collection .reduce(0, { $0 + $1 })
or even:
let result4 = collection.reduce(0, +)
Depending on your language, you may have specialized versions such as min(),
max()
, or sum()
.
If your reduce()
function is associative, you may have the option to (somewhat) parallelize by splitting the sequence into parts, and running reduce()
on the parts, the result of those, etc., until you’re down to a single value. The MapReduce system (see References) is built on this idea.
Extended Example
Let’s go through an example. We’ll turn the loop below into a functional pipeline one step at a time. The collection is an array, containing dictionaries (maps).
var points = [ ["x":"17", "y":"23"], ["x": "x12", "y": "y100"], ["x": "3", "y": "2", "z": "11"], ["w":"21"]]
The goal is to compute the average of the y coordinates for any entry that has one.
var sum = 0 var count = 0 for i in 0..<points.count { let item = points[i] if let y = item["y"] { if let theInt = Int(y) { sum += theInt count += 1 } } } print(sum / count)
=> Use forEach()
on collection; “i
” is not used further
var sum = 0 var count = 0 points.forEach { item in if let y = item["y"] { if let theInt = Int(y) { sum += theInt count += 1 } } } print(sum / count)
=> Use compactMap()
to handle the case where the coordinate is not found
var sum = 0 var count = 0 points .compactMap { $0["y"] } .forEach { y in if let theInt = Int(y) { sum += theInt count += 1 } } print(sum / count)
=> Use compactMap()
again to ignore ill-formed integers
var sum = 0 var count = 0 points .compactMap { $0["y"] } .compactMap { Int($0) } .forEach { theInt in sum += theInt count += 1 } print(sum / count)
Path 1: Split the Loop
We might look and realize our code is computing two things, and split the loops. To do this, we’ll save the common computation into a new collection, the duplicate the iteration to handle sum
and count
separately.
let values = points .compactMap { $0["y"] } .compactMap { Int($0) } var sum = 0 values .forEach { theInt in sum += theInt } var count = 0 values .forEach { theInt in count += 1 } print(sum / count)
=> Use reduce() on the two loops
let values = points .compactMap { $0["y"] } .compactMap { Int($0) } let sum = values.reduce(0, +) let count = values .map { _ in 1} .reduce(0, +) print(sum / count)
We can compute the count even more easily:
let count = values.count
Path 2: Single pipeline
We may recognize that sum and count can be kept together in a tuple (a mostly-anonymous object):
var tuple = (sum: 0, count: 0) points .compactMap { $0["y"] } .compactMap { Int($0) } .forEach { theInt in tuple = (sum: tuple.sum + theInt, count: tuple.count + 1) } print(tuple.sum / tuple.count)
If you’re like me, that tuple seems a little ugly and perhaps confusing. I’ll spare you the reduce() call that would use it. Instead, we see that sum and count have to be coordinated together. Sounds like a good place for an actual object:
class Average { var sum = 0 var count = 0 var value : Int? { if count == 0 { return nil } return sum / count } func add(_ item: Int) -> Average { sum += item count += 1 return self } }
=> Now, a reduce call looks good:
let average = points .compactMap { $0["y"] } .compactMap { Int($0) } .reduce(Average(), { $0.add($1) }) print(average.value!)
Comparison of the Two Paths
Splitting a loop can be a good move when it clarifies the code (and maybe lets you redistribute behavior among several objects) It has a downside though – if you store the partial work in a collection, you may force a real collection into existence, with all the memory management that entails.
By contrast, keeping it as a pipeline means there may never be a collection. Sure, our example had an array constant, but the same pipeline works for a stream of objects, never needing them all at once.
In this case, the new object is the win for me. I wasn’t expecting it, but this small object definitely improves the code.
Conclusion
Functional pipelines are often a win over nested structures (while – if – while -if -if etc:).
- Easier to understand: you have less context to maintain.
- Potentially more memory-efficient: a pipeline is just as happy to work on a stream as a collection.
- Easier to parallelize: you can imagine each stage as its own computer. (Real-life parallelization is harder than this, unfortunately.)
There’s a lot of overlap between languages that have these pipelines: they often provide similar features, even if they change the names a bit.
If your language supports functional pipelines, give them a try!
References
“MapReduce” – https://en.wikipedia.org/wiki/MapReduce . Retrieved 2021-08-15.
“Replace Loop with Pipeline” – in Refactoring (2/e), by Martin Fowler, 2019.
“Refactoring with Loops and Collection Pipelines”, https://martinfowler.com/articles/refactoring-pipelines.html, by Martin Fowler. Retrieved 2021-07-24.
“Sequence”. https://developer.apple.com/documentation/swift/sequence/ . Retrieved 2021-08-15