An iterable is an object whose class defines an __iter__(self) method; an iterator is an object that also defines __next__(self). __iter__() is called automatically when iter(inst) is called, where inst is an instance of the class. After performing any beginning-of-iteration initialization, __iter__() returns an object that implements a __next__() method. Often it just returns self, because the class itself implements __next__(). __iter__() is a good place to set the iterator's initial values.

The __next__() method is called when next() is called on the iterator. To produce the next value, simply return it (do not use yield!). To stop generating values, raise a StopIteration exception.

An example:

class Fib:
	''' Generate Fibonacci series '''

	def __init__(self, max):
		self.max = max # instance attribute

	def __iter__(self):
		self.a = 0 # perform initialization
		self.b = 1
		return self

	def __next__(self):
		fib = self.a
		if fib > self.max:
			raise StopIteration # stop
		self.a, self.b = self.b, self.a + self.b
		return fib # next value

To use the iterator in a for loop:

for n in Fib(1000):
	print(n) # do something with each value

The for loop automatically calls iter() to create the iterator, calls next() on it repeatedly, and stops when StopIteration is raised.
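What the for loop does behind the scenes can be sketched by hand with iter() and next(). The snippet below restates the Fib class so that it runs on its own; manual_loop is a hypothetical helper name used only for illustration.

```python
class Fib:
    """Fibonacci numbers up to max (same idea as the class above)."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a, self.b = 0, 1    # reset state at the start of each iteration
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


def manual_loop(iterable):
    """Collect values roughly the way a for loop would."""
    values = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            value = next(it)     # calls it.__next__()
        except StopIteration:    # raised when the iterator is exhausted
            break
        values.append(value)
    return values


print(manual_loop(Fib(20)))      # [0, 1, 1, 2, 3, 5, 8, 13]
print(list(Fib(20)))             # same values through the built-in machinery
```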

Several special methods and built-in functions work with iterators:

  • __reversed__: returns a reversed iterator. The built-in reversed() function calls it. Reversed iteration only works if __reversed__ is implemented or the object supports the sequence protocol (__len__() and __getitem__() with integer indexes).
  • enumerate(iterable) returns an enumerate object, an iterator that yields successive tuples of a counter (index) and the next value from the iterable you passed in. It is useful for keeping track of indexes during iteration.
  • zip(a, b) creates an iterator that produces tuples (x, y), where x is taken from a and y is taken from b, so you can iterate over multiple iterables at the same time. Iteration stops as soon as one of the inputs is exhausted. If this behavior is not desired, use itertools.zip_longest() instead. zip can also be used to create a dictionary from two iterables: dict(zip(headers, values)).
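A short sketch of the three items above. Countdown is a hypothetical class invented for this example, and the headers and values lists are made-up sample data.

```python
from itertools import zip_longest


class Countdown:
    """Hypothetical class supporting forward and reversed iteration."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Forward: start, start - 1, ..., 1
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # Reversed: 1, 2, ..., start
        return iter(range(1, self.start + 1))


forward = list(Countdown(3))               # [3, 2, 1]
backward = list(reversed(Countdown(3)))    # [1, 2, 3], via __reversed__

headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]

indexed = list(enumerate(headers, 1))      # counter starts at 1 here
record = dict(zip(headers, values))        # pair keys with values

short = list(zip([1, 2, 3], 'ab'))             # stops at the shorter input
padded = list(zip_longest([1, 2, 3], 'ab'))    # pads the shorter input with None

print(forward, backward)
print(indexed)
print(record)
print(short, padded)
```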

iter function

It returns an iterator object.

  • iter(obj): obj must be a collection object that supports the iteration protocol or the sequence protocol, or TypeError is raised.
  • iter(callable, sentinel): the returned iterator calls callable with no arguments on each call to its __next__() method; if the value returned equals sentinel, StopIteration is raised, otherwise the value is returned.

One useful application of the second form is to read lines of a file until a certain line is reached or, with '' as the sentinel, until the end of the file.

with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)
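To stop at a specific marker line rather than at end of file, the sentinel can be that line itself. The sketch below uses io.StringIO in place of a real file so that it runs on its own; read_until and the "END" marker are assumptions made for illustration. Note that readline() keeps returning '' at end of file, so the loop also checks for that case in order to terminate when the marker is missing.

```python
import io


def read_until(fp, stop_line):
    """Return the lines read from fp before stop_line (or before EOF)."""
    lines = []
    for line in iter(fp.readline, stop_line):
        if line == '':           # end of file reached without seeing stop_line
            break
        lines.append(line)
    return lines


fp = io.StringIO("alpha\nbeta\nEND\ngamma\n")
print(read_until(fp, "END\n"))   # ['alpha\n', 'beta\n']
```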

itertools module

It provides many useful functions on iterators.

  • itertools.islice(iter, start, end) slices an iterator. It does this by consuming the iterator and discarding the unwanted items. end can be None to return everything from start onward.
  • itertools.dropwhile(func, iter) discards the leading items of iter for as long as the supplied function returns True, then yields everything that remains.
  • itertools.permutations(items, num) generates permutations of items with length num. If num is not specified, each permutation has the same length as items.
  • itertools.combinations(items, num) generates combinations. itertools.combinations_with_replacement() allows the same item in items to be chosen more than once.
  • itertools.chain() chains multiple iterables together. It hides the actual type of each underlying iterable, and it is more efficient than combining the sequences first and then iterating, because nothing is copied.
  • itertools.product(a_list, b_list): returns an iterator over the Cartesian product of the two sequences.
  • itertools.groupby(a_list, key): returns an iterator of (key, group) pairs, where each group is itself an iterator over consecutive elements of a_list that share the same value of the key function. Because only adjacent elements are grouped, a_list should be sorted by the same key first.
  • itertools.zip_longest(): does the same thing as the built-in zip() except that it stops at the end of the longest sequence, padding shorter sequences with None (or with a fillvalue you supply).
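A few of the functions above in action. The naturals() generator and the sample lists are hypothetical, chosen only to keep the results small.

```python
import itertools


def naturals():
    """Hypothetical endless generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# islice works on iterators that cannot be indexed, even endless ones.
sliced = list(itertools.islice(naturals(), 3, 6))                     # [3, 4, 5]

# dropwhile only skips the leading run; the later 1 is kept.
dropped = list(itertools.dropwhile(lambda x: x < 3, [1, 2, 3, 1]))    # [3, 1]

perms = list(itertools.permutations('abc', 2))     # order matters: 6 tuples
combos = list(itertools.combinations('abc', 2))    # order ignored: 3 tuples
chained = list(itertools.chain([1, 2], 'ab'))      # [1, 2, 'a', 'b']
pairs = list(itertools.product([1, 2], 'xy'))      # 4 tuples

# groupby groups adjacent items, so sort by the same key first.
words = ['banana', 'apple', 'cherry', 'avocado', 'blueberry']
words.sort(key=lambda w: w[0])
grouped = {k: list(g) for k, g in itertools.groupby(words, key=lambda w: w[0])}

print(sliced, dropped, chained)
print(grouped)
```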

Unpacking

Any sequence (or iterable) can be unpacked into variables using a simple assignment operation. The only requirement is that the number of variables and structure match the sequence.

x, y = (4, 5)
a, b, c = [1, 2, 3]
a, b, c = {1, 2, 3} # works, but a set has no guaranteed order
m, n = {1: "1", 2: "2"} # m = 1, n = 2

Unpacking actually works with any object that happens to be iterable, not just tuples or lists. This includes strings, files, iterators, and generators. When unpacking, you may sometimes want to discard certain values. Python has no special syntax for this, but you can often just pick a throwaway variable name for it. For example:

x, y, _ = [1, 2, 3]
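A few more cases, as a rough sketch: nested targets that mirror the structure of the data, unpacking from a string and a generator, and the ValueError raised when the counts do not match. The record tuple is made-up sample data.

```python
# A hypothetical record with a nested tuple; the targets mirror its structure.
record = ('ACME', 50, 91.1, (2012, 12, 21))
name, shares, price, (year, month, day) = record

# Strings and generators are iterable, so they unpack too.
a, b, c = 'xyz'
sq0, sq1, sq2 = (n * n for n in range(3))

# Throwaway names discard the values you do not need.
_, shares_only, _, _ = record

# A mismatch between the number of targets and items raises ValueError.
try:
    p, q = (1, 2, 3)
    mismatch = False
except ValueError:
    mismatch = True

print(name, year, a, sq2, shares_only, mismatch)
```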

To unpack an iterable longer than the number of variables, use a “star expression”:

first, *middle, last = iterable

It’s worth noting that the middle variable will always be a list, regardless of how many items are unpacked (including none). There is a certain similarity between star unpacking and list-processing features of various functional languages. For example, if you have a list, you can easily split it into head and tail components:

head, *tail = items
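The following sketch shows that the starred name is always a list, even when it captures nothing, and uses the head/tail split in a small recursive function. The total function is a hypothetical example, not an efficient way to sum a list.

```python
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)       # 1 [2, 3, 4] 5

# With only two items, the starred name is an empty list.
start, *between, end = 'ab'
print(between)                   # []

# The source can be any iterable; the starred part is still a list.
head, *tail = (n * n for n in range(4))
print(head, tail)                # 0 [1, 4, 9]


def total(items):
    """Sum a list recursively by splitting it into head and tail."""
    if not items:
        return 0
    head, *tail = items
    return head + total(tail)


print(total([1, 10, 7, 4]))      # 22
```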