The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity to bring 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that entail a high computational burden.
The development of new flexible X-ray systems may benefit from computer simulations, which allow performance to be assessed before expensive real systems are built. Developing simulation/reconstruction algorithms in this context poses three main difficulties. First, these algorithms deal with large data volumes and are computationally expensive, leading to the need for hardware and software optimizations. Second, these optimizations are constrained by the high flexibility required to explore new scanning geometries, including fully configurable positioning of the source and detector elements. Finally, the evolution of hardware platforms increases the effort required to maintain and adapt the implementations to current and future programming models.
Previous works lack support for completely flexible geometries and/or compatibility with multiple programming models and platforms.
We present FUX-Sim, a novel X-ray simulation/reconstruction framework designed to be both flexible and fast. An optimized implementation for different families of GPUs (CUDA and OpenCL) and for CPUs was achieved through a modular approach based on a layered architecture and parallel implementations of the algorithms on both GPU and CPU.
A detailed performance evaluation across different system configurations and hardware platforms demonstrates that FUX-Sim achieves its best performance with the CUDA programming model (five times faster than other state-of-the-art implementations). Furthermore, the CPU and OpenCL programming models allow FUX-Sim to be executed on a wider range of hardware platforms.
In many scientific research fields, Matlab has become the de facto tool for application design. It offers multiple advantages, such as rapid prototyping and access to high-performance linear algebra routines, among others. However, the resulting applications are highly dependent on the Matlab runtime, which limits their deployment on heterogeneous platforms. In this paper we present the migration of a Matlab application to the C++ programming language, enabling parallelization on GPUs. In particular, we have chosen RUMBA-SD, a spherical deconvolution algorithm that estimates intravoxel white-matter fiber orientations from diffusion MRI data. We describe the methodology used, along with the tools and libraries leveraged during the translation. To demonstrate the benefits of the migration, we perform a series of experiments on different high-performance heterogeneous computing platforms and with different linear algebra libraries. This work aims to serve as a guide for future efforts to migrate applications out of Matlab. The results show that the C++ version attains, on average, a speedup of 8x over the Matlab one.