Flocc (functional language on compute clusters) is a high-level language for Big Data and data-parallel programming on clusters. Its compiler showcases a new technique for automatically optimizing how Big Data collections are stored on clusters; the technique works for distributed arrays, maps, and lists. It is much more flexible than existing approaches such as HPF and MapReduce, which do not optimize their distributed data layouts and typically support only one collection type. The compiler considers alternative distributed-memory implementations of a program's high-level data-parallel operators (encoded as higher-order functions), and uses a type system and type inference algorithm to automatically derive distributed data layout information for these operators. It then generates C++ MPI programs from the candidate plans, and uses a performance-feedback-based search to look for optimal cluster implementations of input programs.
Its primary purpose is to research a type-driven technique for automatically synthesising cluster implementations of data-parallel programs that is more flexible than existing collection-specific approaches such as polyhedral auto-parallelization.
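The idea of choosing between alternative implementations of one high-level operator by measuring them can be illustrated with a small single-process sketch. This is not Flocc's search algorithm, operator set, or generated code: the function names (`plan_hash_grouping`, `plan_sort_then_scan`, `pick_fastest`) and the "group by key and sum" operator are assumptions made purely for illustration, and real Flocc plans are distributed C++ MPI programs rather than Python functions.

```python
import time
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def plan_hash_grouping(pairs):
    """Hypothetical plan A: accumulate per-key sums in a hash table."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def plan_sort_then_scan(pairs):
    """Hypothetical plan B: sort by key, then sum each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def pick_fastest(plans, sample, repeats=3):
    """Time every candidate plan on sample data and return the fastest.

    This is a brute-force stand-in for a performance-feedback search:
    each candidate is run, its best wall-clock time is recorded, and the
    candidate with the smallest time wins.
    """
    timings = {}
    for name, plan in plans.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            plan(sample)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    winner = min(timings, key=timings.get)
    return winner, timings


# Both plans implement the same operator, so they must agree on the result.
PLANS = {
    "hash_grouping": plan_hash_grouping,
    "sort_then_scan": plan_sort_then_scan,
}
sample = [(i % 7, i) for i in range(10_000)]
winner, timings = pick_fastest(PLANS, sample)
```

Which plan wins depends on the input data and the machine, which is why the choice is driven by measurement rather than fixed in advance. On a cluster, the candidates would also differ in how the collection is partitioned across nodes, which this sketch does not model.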
For more information, please contact Tristan Aubrey-Jones in the ESS research group, ECS, University of Southampton, UK.
Additional information for HLPP14 paper:
Flocc compiler prototype:
Flocc is an experimental programming language. There is no production compiler for Flocc, only a proof-of-concept code generator that targets C++ and MPI. This code generator is not actively maintained. However, if you are interested in working on, continuing to research, or adapting Flocc, the code is available under the Apache 2.0 open-source licence:
git clone https://github.com/flocc-net/flocc.git
PhD Thesis (2015)