I’m writing a load balancing algorithm for this cluster I’m working on, and have a bunch of things in a Python list sequence. So came the issue, how to divide the list into sublists for each node in the cluster to compute? I came up with the algorithm:
def split_seq_stride(l, n):
"""Splits a sequence l into a list of subsequences containing at least n
elements each, not preserving order. The first few subsequences may contain
n+1 elements, containing the last few elements. (Algorithm is good for load
balancing)"""
r=[]
k=len(l)/n
[r.append(l[i::k]) for i in range(k)]
return r
which behaves like:
>>> l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> print split_seq_stride(l, 3)
[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
Assuming each task is an equal amount of work, this algorithm is a good load balancing algorithm. it prevents any node from having to do any less significant work than the others.
Probably useful for someone: what about splitting up a sequence in-order, with the last subsequence containing fewer elements? This code segment does just that:
def split_seq(l, n):
"""Splits a sequence l into a list of subsequences containing at most n
elements each, preserving order. The last subsequence may contain less
than n elements."""
r=[]
[r.append(l[s:s+n]) for s in range(0, len(l), n)]
return r
which behaves like:
>>> l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> print split_seq(l, 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
Thanks to yason and others on EFnet’s #python for pointers.
Recent comments
7 weeks 2 days ago
9 weeks 6 days ago
10 weeks 2 days ago
28 weeks 6 hours ago
32 weeks 2 days ago
32 weeks 3 days ago
34 weeks 3 days ago
36 weeks 1 day ago
36 weeks 2 days ago
37 weeks 6 days ago