Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Indirect addressing is known to be slow on conventional architectures because data must be gathered before any computation can be done. Many methods for optimizing indirect addressing have been proposed, but almost all of them merely change the order in which data is accessed so as to make better use of the cache. Furthermore, vector instructions cannot be used because the data is not accessed contiguously, so valuable processing power goes unexploited. The Cell/B.E. architecture has multiple powerful DMA engines that are suitable for gathering scattered data. Unfortunately, at fine data granularity, these engines are subject to several constraints that make them inefficient. This paper explores a novel solution called DMA list Interlacing (DLI), which overcomes the DMA constraints and enables the use of vector instructions without any extra effort. It is shown that DLI can achieve speedups of several orders of magnitude compared to conventional processors.
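To make the gathering step concrete, the following is a minimal sketch in C of indirect addressing on a conventional processor. It is an illustration only and is not the DLI method; the function names `gather` and `scale_contiguous` and their parameters are hypothetical and do not come from the paper. The first loop reads through an index array, so its accesses are scattered; only after the data has been packed into a contiguous buffer does the second loop run with unit stride.

```c
#include <stddef.h>

/* Hypothetical helper: copy the entries of `values` selected by `index`
   into the contiguous buffer `packed`. Each read goes through the index
   array, so consecutive iterations may touch unrelated memory locations. */
void gather(const double *values, const size_t *index, size_t n, double *packed)
{
    for (size_t i = 0; i < n; ++i) {
        packed[i] = values[index[i]];
    }
}

/* Hypothetical helper: scale the gathered data in place. The buffer is
   contiguous, so the loop accesses memory with unit stride. */
void scale_contiguous(double *packed, size_t n, double factor)
{
    for (size_t i = 0; i < n; ++i) {
        packed[i] *= factor;
    }
}
```

In a finite element code, `index` would typically play the role of an element-to-node connectivity table, which is an assumption made here for illustration.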