Nested Loops to Functional Pipeline

Nested structures are challenging to read; a pipeline is usually easier to read and think about.

Nested loops go deeper; functional pipelines stay at the same level

Nested structures have that “arrow of doom” feeling, where you work your way in multiple levels then out. It’s hard to tell what’s being operated on because you need to manage the context of all the parent structures simultaneously. A pipeline is usually linear. 

Many languages have added “functional pipeline” style, building on map-filter-reduce first explored in Lisp, oh, about 50 years ago, and in Unix and Smalltalk about 40 years ago:)

What I’m describing below applies conceptually to Java, C#, Kotlin, Python, etc. I’ll use examples in Swift, and explain any Swift-specific constructs. See the “Sequences” article below for Apple’s documentation of the available functions. 

Transformation Approaches

I use all three of these approaches:

  1. Tool: If your tool does the job, use the tool! For example, when IntelliJ IDEA sees a loop it knows how to transform, it pops up a yellow light bulb that offers to do it. 
  2. Extract a new collection: When a loop iterates over a collection, extract the collection into a variable as the seed of the new pipeline, and gradually move parts of the loop into it until the original loop is gone. Martin Fowler explores this approach in his excellent book and article (see References), so I won’t explore this further.
  3. Transform in place: Transform the loop to a pipeline in place, one nesting level at a time. We’ll use this approach below. 

In each case, I’ll show a pattern of a nested loop, then a refactoring that peels off a layer of nesting and uses pipeline methods instead.

Basic Iteration

Sometimes code iterates directly through a collection. We can switch to a pipeline with forEach():

1. Direct use of collection

    for item in collection {
      print(item)
    }

=>

    collection
      .forEach { item in
        print(item)
      }

In Swift, we can replace lambda arguments with $n, like this:

    collection
      .forEach { print($0) }

As the body gets shorter, this shorthand becomes more convenient. 

2. Index only used to get collection members

The iteration may be even more old-school:

    for index in 0..<collection.count {
      let item = collection[index]
      print(item)     //   body, with no references to i
    }

    =>

    collection
      .forEach { item in
        print(item)
      }

3. Index used to get collection member and used in body

In some cases, the index is used for purposes other than fetching the next collection item.

    for index in 0..<collection.count {
      let item = collection[index]
      print("index: \(index) item:\(item)")   //   body, w/refs to i
    }

=> 

    collection
      .enumerated()
      .forEach { index, item in
        print("index: \(index) item:\(item)")  //  body, w/refs to i
      }

4. Index used to get current and next collection members 

You may need access to the sequential pairs of items in the collection:

    for index in 0..<(collection.count - 1) {
      let current = collection[index]
      let next = collection[index+1]
      print("\(current):\(next)")     // body, references both
    }

=> 

    zip(collection, collection.dropFirst())
      .forEach { current, next in
        print("\(current):\(next)")     // body, references both
      }

The zip() function combines two sequences into a pair, up to the length of the shorter sequence. In this case, we’re using it on the same collection twice.

Transforming Collections

Sometimes we’re transforming a collection to a new collection, of the same or different type.

    var result: [Int] = []
    collection
      .forEach { item in
        let transformed = item.count
        result.append(transformed)
      }

    =>

    let result2 = collection
      .map { item in
        item.count
      }

Don’t confuse the pipeline function map() with the map data structure. 

Filtering

Some loops only return a subset of the initial values. 

    var result1 : [String] = []
    collection.forEach { item in
      if item.contains("a") {
        result1.append(item)  // body using item
      }
    }

=> 

    let result2 = collection
      .filter { item in
        item.contains("a")
      }

Note that filter() uses a predicate that’s true for items you want to keep. 

Some other filters: 

  • dropFirst(), dropLast(), and drop(while:) – skip the front or end of a sequence
  • prefix(), prefix(while:), suffix() – return the front or end of a sequence 

Transform Plus Filter

Sometimes you want to transform a value, then filter it. The compactMap() method lets you do both, filtering out null transformed values. (The “if let” syntax tries to convert an optional value to a “real” one: if it can, it executes the body; if not, it skips it.)

    let collection = ["42", "x", "314", "217", ""]
    
    var result1: [Int] = []
    
    collection
      .forEach { item in
        let transformed = Int(item)  // can fail
        if let transformed = transformed {
          result1.append(transformed)
        }
      }

=> 

    let collection = ["42", "x", "314", "217", ""]

    let result2 = collection
      .compactMap { item in
        Int(item)
      }

Aggregating a Value

You don’t always just transform a sequence to a sequence. Sometimes you want to compute an aggregate value from the sequence. 

The core function for doing this is reduce(). It takes two arguments: the initial value, and a function that given the previous result and the next value can compute the next result. For example, reduce(0, +) will add up the list of numbers; reduce(false, |) can “or” together a list of booleans; reduce(Int.min, { $0 < $1 ? $0 : $1}) will compute a maximum if the list is guaranteed to contain at least one item.

    let collection = [42, 314, 504]
    
    var result1 = 0    // initial value
    collection
      .forEach { item in
        result1 = result1 + item
      }

=>

    let result2 = collection
      .reduce(0, { result, item in result + item })
    print(result2)

Again, we can refactor the latter to:

    let result3 = collection
      .reduce(0, { $0 + $1 })

or even:

    let result4 = collection.reduce(0, +)

Depending on your language, you may have specialized versions such as min(), max(), or sum().

If your reduce() function is associative, you may have the option to (somewhat) parallelize by splitting the sequence into parts, and running reduce() on the parts, the result of those, etc., until you’re down to a single value. The MapReduce system (see References) is built on this idea. 

Extended Example

Let’s go through an example. We’ll turn the loop below into a functional pipeline one step at a time. The collection is an array, containing dictionaries (maps).  

  var points = [ ["x":"17", "y":"23"], ["x": "x12", "y": "y100"], ["x": "3", "y": "2", "z": "11"], ["w":"21"]]

The goal is to compute the average of the y coordinates for any entry that has one.

    var sum = 0
    var count = 0

    for i in 0..<points.count {
      let item = points[i]
      if let y = item["y"] {
        if let theInt = Int(y) {
          sum += theInt
          count += 1
        }
      }
    }
    
    print(sum / count)

=>   Use forEach() on collection; “i” is not used further

    var sum = 0
    var count = 0
    
    points.forEach { item in
      if let y = item["y"] {
        if let theInt = Int(y) {
          sum += theInt
          count += 1
        }
      }
    }
    
    print(sum / count)

=> Use compactMap() to handle the case where the coordinate is not found

    var sum = 0
    var count = 0
    
    points
      .compactMap { $0["y"] }
      .forEach { y in
        if let theInt = Int(y) {
          sum += theInt
          count += 1
        }
      }
    
    print(sum / count)

=> Use compactMap() again to ignore ill-formed integers

    var sum = 0
    var count = 0

    points
      .compactMap { $0["y"] }
      .compactMap { Int($0) }
      .forEach { theInt in
        sum += theInt
        count += 1
      }

    print(sum / count)

Path 1: Split the Loop

We might look and realize our code is computing two things, and split the loops. To do this, we’ll save the common computation into a new collection, the duplicate the iteration to handle sum and count separately.

    let values = points
      .compactMap { $0["y"] }
      .compactMap { Int($0) }

    var sum = 0
    values
      .forEach { theInt in
        sum += theInt
      }

    var count = 0
    values
      .forEach { theInt in
        count += 1
      }

    print(sum / count)

=> Use reduce() on the two loops

    let values = points
      .compactMap { $0["y"] }
      .compactMap { Int($0) }

    let sum = values.reduce(0, +)

    let count = values
      .map { _ in 1}
      .reduce(0, +)

    print(sum / count)

We can compute the count even more easily:

    let count = values.count

Path 2: Single pipeline

We may recognize that sum and count can be kept together in a tuple (a mostly-anonymous object):

    var tuple = (sum: 0, count: 0)

    points
      .compactMap { $0["y"] }
      .compactMap { Int($0) }
      .forEach { theInt in
        tuple = (sum: tuple.sum + theInt,
                 count: tuple.count + 1)
      }

    print(tuple.sum / tuple.count)

If you’re like me, that tuple seems a little ugly and perhaps confusing. I’ll spare you the reduce() call that would use it. Instead, we see that sum and count have to be coordinated together. Sounds like a good place for an actual object:

  class Average {
    var sum = 0
    var count = 0

    var value : Int? {
      if count == 0 { return nil }
      return sum / count
    }

    func add(_ item: Int) -> Average {
      sum += item
      count += 1
      return self
    }
  }

=> Now, a reduce call looks good:

    let average = points
      .compactMap { $0["y"] }
      .compactMap { Int($0) }
      .reduce(Average(), { $0.add($1) })

    print(average.value!)

Comparison of the Two Paths

Splitting a loop can be a good move when it clarifies the code (and maybe lets you redistribute behavior among several objects) It has a downside though – if you store the partial work in a collection, you may force a real collection into existence, with all the memory management that entails.

By contrast, keeping it as a pipeline means there may never be a collection. Sure, our example had an array constant, but the same pipeline works for a stream of objects, never needing them all at once.

In this case, the new object is the win for me. I wasn’t expecting it, but this small object definitely improves the code.

Conclusion

Functional pipelines are often a win over nested structures (while – if – while -if -if etc:).

  • Easier to understand: you have less context to maintain.
  • Potentially more memory-efficient: a pipeline is just as happy to work on a stream as a collection.
  • Easier to parallelize: you can imagine each stage as its own computer. (Real-life parallelization is harder than this, unfortunately.)

There’s a lot of overlap between languages that have these pipelines: they often provide similar features, even if they change the names a bit.

If your language supports functional pipelines, give them a try!

References

“MapReduce” – https://en.wikipedia.org/wiki/MapReduce . Retrieved 2021-08-15. 

“Replace Loop with Pipeline” – in Refactoring (2/e), by Martin Fowler, 2019.

“Refactoring with Loops and Collection Pipelines”, https://martinfowler.com/articles/refactoring-pipelines.html, by Martin Fowler. Retrieved 2021-07-24. 

“Sequence”. https://developer.apple.com/documentation/swift/sequence/ . Retrieved 2021-08-15