Refactoring Iterator Adapters in Rust

Refactoring the creation of multiple consecutive iterator adapter chains

Once chains of consecutive iterator adapters get sufficiently complex, and one wants to reuse them in multiple places, the natural instinct is to refactor them into something reusable. Initially, it was not obvious to me how to approach this, so I set out to explore various options.

Function returning the complete adapter

Make sure to include lifetimes for (generic) functions to pass along to returned iterator wrappers

Return impl Iterator

fn parse_iterable<'a, T>(
    iter: impl Iterator<Item = T> + 'a
) -> impl Iterator<Item = T> + 'a {
    todo!();
}

One limitation of returning impl Iterator is that any variations of behaviour must occur within the same adapter types in all branches of logic. E.g.

fn fork_iterable_behaviour_impl<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
    limit: Option<usize>,
) -> impl Iterator<Item = T> + 'a {
    if let Some(n) = limit {
        iter.take(n)
    } else {
        iter.take(usize::MAX) // forced to implement a limit that might not have wanted
    }
}

Additionally, one cannot use this approach if any of the adapters used take different closures (or other inputs that are part of their type) as input. E.g. the following will not compile as the return value is a different type in each branch, because for the returned Map<I, F> struct, the closure (F) is part of the type definition

fn fork_iterable_behaviour_impl_map<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
    condition: bool,
) -> impl Iterator<Item = T> + 'a {
    if condition {
        iter.map(|x| {x})
    } else {
        iter.map(|x| {x}) // mismatched type error
    }
}

We would instead have to move the evaluation of the condition inside of the closure, so we only return one type.

fn fork_iterable_behaviour_impl_inside_map<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
    condition: bool,
) -> impl Iterator<Item = T> + 'a {
    iter.map(move |x| {
        if condition {
            x
        } else {
            x
        }
    })
}

However, due to the logic being evaluated on every iteration, it might be less performant.

Return Box<dyn Iterator>

More flexible forking behaviour can be achieved by returning Box<dyn Iterator<Item = T> + 'a> instead of impl Iterator<Item = T> + 'a

fn fork_iterable_behaviour_box<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
    limit: Option<usize>,
) -> Box<dyn Iterator<Item = T> + 'a> {
    if let Some(n) = limit {
        Box::new(iter.take(n))
    } else {
        Box::new(iter) // could continue forever
    }
}

fn fork_iterable_behaviour_box_map<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
    condition: bool,
) -> Box<dyn Iterator<Item = T> + 'a> {
    if condition {
        Box::new(iter.map(|x| {x}))
    } else {
        Box::new(iter.map(|x| {x}))
    }
}

General tips

Adapters with closures

When trying to return iterators, remember to use move closures (e.g. in .map()) when you need to capture variables constructed in the function, or otherwise owned by the current function (e.g. owned function arguments). Sometimes the compiler will recommend this, other times it seems to miss it.

fn capture_variables<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
) -> impl Iterator<Item = (&'a str, T)> + 'a {
    let captured = "I'm captured";
    iter.map(move |t| (captured, t))
}

Construct a new Iterator wrapping the complete adapter

Another approach is to create an entirely new Iterator that internally contains your iterator adapter chain.

As of writing this, the rust documentation on implementing Iterator, does not have an example covering iterator adapters. However, there are instances in the standard library.

Intersperse, provides an example of creating an Iterator by using another iterator adapter stored in its struct.

So, one can create our own Iterator like so:

Using a struct wrapping the Iterator

pub struct MyIterator<I: Iterator> {
    iter: Take<Enumerate<I>>,
}

impl<T, I: Iterator<Item = T>> MyIterator<I> {
    pub fn new(iter: I) -> Self {
        Self { iter: iter.enumerate().take(5) }
    }
}

impl<I> Iterator for MyIterator<I>
where
    I: Iterator,
{
    type Item = (usize, I::Item);
    
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }
}

let mut initial_iterator = [1,2,3,4,5].iter();
let mut i = MyIterator::new(initial_iterator);
i.next();

There are alternative ways to store the Iterator in the struct with different trade-offs discussed in Storing Iterator adapters

The ergonomics of this approach can be further improved by defining a Trait that provides a method to create this Iterator and doing a blanket implementation on relevant upstream Iterators as explored later in this post

Using std::iter::from_fn

The standard library provides a helpful function for creating iterators from functions (or closures) std::iter::from_fn. This requires a lot less boilerplate than creating our own struct implementing Iterator

let mut initial_iterator = [1,2,3,4,5].iter();
// Can produce iterators that produce more or less values than the upsteam iterator,
// more general purpose than filter or flatmap
let mut can_loop = std::iter::from_fn(move || {
    loop {
        let n = initial_iterator.next();
        if let Some(i) = n {
            if i % 2 == 2 {
                break Some(i.to_owned())
            } else {
                continue;
            }
        } else {
            break None
        }
    }
});
can_loop.next();

Storing Iterator adapters

As seen above, the way the Iterator adapter chain is stored has a significant impact.

Store an iterator adapter in a struct

If you know the adapters are always the same types of adapter (and hence the same size), you can define a struct containing the specific adapter struct (potentially nested)

pub struct MyStruct<I> {
    iter: Take<Enumerate<I>>,
}

impl<T, I: Iterator<Item = T>> MyStruct<I> {
    pub fn new(iter: I) -> Self {
        Self { iter: iter.enumerate().take(5) }
    }
}

Alternatively, for more flexibility of the types, but potentially less runtime performance, store a Box of the iterater in the struct

pub struct IterWrapper<'a, T> {
    iter: Box<dyn Iterator<Item = (usize, T)> + 'a>,
}

impl<'a, T> IterWrapper<'a, T> {
    pub fn new(iter: impl Iterator<Item = T> + 'a) -> Self {
        Self { iter: Box::new(iter.enumerate().take(5)) }
    }
}

Store an iterator adapter without a struct

fn to_generic_container<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
) -> Box<dyn Iterator<Item = T> + 'a> {
    Box::new(iter.take(5).skip(3))
}

This approach can be made a little more reusable via named types for reference downstream:

type MyIterType<'a, T> = Box<dyn Iterator<Item = T> + 'a>;

fn to_generic_container<'a, T>(
    iter: impl Iterator<Item = T> + 'a,
) -> MyIterType<'a, T> {
    Box::new(iter.take(5).skip(3))
}

fn from_generic_container<'a, T>(
    iter: MyIterType<'a, T>,
) {
    todo!();
}

Refactor multiple consecutive iterator adapters into a new iterator adapter method

Bringing together the different approaches for adapter chain creation and storage, one can define convenience Traits for creating any of the new wrappers:

Trait (method) Creating Struct

One could use either the specific Iterator adapter type struct or the Box struct, they use the same Trait definition structure

// Copied from earlier
pub struct MyIterator<I: Iterator> {
    iter: Take<Enumerate<I>>,  // could be Box version
}

impl<T, I: Iterator<Item = T>> MyIterator<I> {
    pub fn new(iter: I) -> Self {
        Self { iter: iter.enumerate().take(5) }
    }
}

impl<I> Iterator for MyIterator<I>
where
    I: Iterator,
{
    type Item = (usize, I::Item);
    
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }
}

// Additional code here
pub trait ExampleStructIterator<'a, T>:
    Iterator<Item = T> + Sized + 'a
{
    fn to_struct_wrapper(
        self,
    ) -> MyIterator<Self> {
        MyIterator::new(self)
    }
}

impl<'a, T, I> ExampleStructIterator<'a, T> for I where I: Iterator<Item = T> + 'a {}

Trait (method) Creating Box

pub trait ExampleBoxIterator<'a, T>:
    Iterator<Item = T> + Sized + 'a
where Self: Clone
{
    fn to_box_wrapper(
        self,
    ) -> Box<dyn Iterator<Item = T> + 'a> {
        Box::new(self.take(5).cycle().skip(3))
    }
}

impl<'a, T, I> ExampleBoxIterator<'a, T> for I where I: Iterator<Item = T> + Clone + 'a {}

Trait extending the Iterator trait

This option is quite simple, performant, and adaptable. It avoids the need to implement the Iterator trait on a struct, while enabling one to create a short hand method for the adapters you are calling, without resorting to using the heap, by maintaining the inner types

trait Indexed: Iterator {
    fn indexed(self) -> Zip<RangeFrom<usize>, Self>
    where
        Self: Sized,
    {
        (1usize..).zip(self)
    }
}

impl<T> Indexed for T where T: Iterator {}

Alteratively with type erasure, one can more easily adapt the internals without changing the method signature.

trait Indexed: Iterator {
    fn indexed(self) -> Box<dyn Iterator<Item = (usize, Self::Item)>> 
    where
        Self: Sized,
    {
        Box::new((1usize..).zip(self))
    }
}

impl<T> Indexed for T where T: Iterator {}

Rayon - ParallelIterator considerations

With Rayon’s ParallelIterators there is less flexibility as:

So generally it is easier to constrain ourselves to what is possible within returning impl ParallelIterator and storage in structs containing specific nested adapter structs.

Forking behaviour

Forking behaviour can still be achieved via logic inside a closure.

fn fork_par_iterable_behaviour<'a, T>(
    iter: impl ParallelIterator<Item = T> + 'a,
    repeats: Option<usize>,
) -> impl ParallelIterator<Item = T> + 'a
where T: Clone + Send
{
    iter.flat_map_iter(move |item| {
        if let Some(r) = repeats {
            std::iter::repeat(item).take(r)
        } else {
            std::iter::repeat(item).take(1)
        }
    })
}