Towards heterogeneous parallelism for SPHinXsys (Part 1)

Xiangyu Hu and Alberto Guarnieri

Introduction

This post is based on the presentation of the same title to be given at the 19th SPHERIC World Conference 2025.

Because smoothed particle hydrodynamics (SPH) is a typical particle-based method, many SPH libraries suffer from high computational cost due to intensive particle interactions. Compared with multi-core CPU systems, many-core devices (typically GPUs) offer a much higher performance/cost ratio for intermediate-scale simulations (several million particles), which are frequently encountered in industrial applications. On the other hand, given the versatile functionality of CPU systems, SPH libraries are expected to use heterogeneous parallelism so that they can take advantage of both.

Currently, some SPH libraries already offer GPU support, either through vendor-specific implementations, such as DualSPHysics, or through heterogeneous programming with the OpenCL standard, such as AQUAgpusph and PySPH. An important issue with these implementations is that computing kernels must be defined as low-level specifications, with specific directives and separately from the rest of the code, which makes the code harder to program and maintain.

Open-source multi-physics library SPHinXsys

SPHinXsys is an open-source C++ SPH multi-physics simulation library. It addresses the complexities of fluid dynamics, structural mechanics, fluid-structure interactions, thermal analysis, chemical reactions and AI-aware optimizations. SPHinXsys already offers CPU parallelism through Intel’s Threading Building Blocks (TBB) library, together with several features that suit open-source development.

First, since parallel execution is encapsulated in low-level classes decoupled from the SPH method, the developers, who are typically mechanical engineers, only need to work on the numerical discretization without being concerned with parallelization. Second, various testing approaches, including unit tests, Google Test and regression tests, have been incorporated so that refactoring, adding new features and other modifications can be carried out without worrying about breaking established functionality. Third, automated cross-platform continuous integration and delivery (CI/CD) tests are run on the open-source development platform for every modification of the main branch of the SPHinXsys repository. In addition, SPHinXsys makes use of several third-party libraries, such as Simbody for multi-body dynamics and pybind11 for generating the Python interface used in machine learning and optimization applications.
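The decoupling described in the first point can be illustrated with a minimal sketch in standard C++. All names here (`SequencedExec`, `BlockedExec`, `for_each_particle`, `RelaxTowardsReference`, `exec_dynamics`) are hypothetical and chosen for illustration; they are not the SPHinXsys API. `BlockedExec` only stands in for a parallel backend: it walks the particles block by block, as a thread pool such as TBB might schedule them, but runs the blocks one after another so that the sketch has no threading dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical execution tags (assumption, not the SPHinXsys API).
struct SequencedExec {};
struct BlockedExec { std::size_t block_size = 2; };

// Low-level layer: the only code that knows how the particle loop is traversed.
template <class Body>
void for_each_particle(SequencedExec, std::size_t n, Body body)
{
    for (std::size_t i = 0; i < n; ++i) body(i);
}

template <class Body>
void for_each_particle(BlockedExec exec, std::size_t n, Body body)
{
    assert(exec.block_size > 0);
    // Each block is the unit a thread pool would hand to a worker.
    // Here the blocks simply run in turn.
    for (std::size_t begin = 0; begin < n; begin += exec.block_size)
    {
        const std::size_t end = std::min(begin + exec.block_size, n);
        for (std::size_t i = begin; i < end; ++i) body(i);
    }
}

// Numerical layer: a toy per-particle update with no knowledge of execution.
struct RelaxTowardsReference
{
    std::vector<double>& rho; // per-particle values, updated in place
    double rho0;              // reference value
    double rate;              // relaxation factor
    void update(std::size_t i) const { rho[i] += rate * (rho0 - rho[i]); }
};

// Glue: combines any execution tag with any dynamics that provides update(i).
template <class Exec, class Dynamics>
void exec_dynamics(Exec exec, std::size_t n, const Dynamics& dynamics)
{
    for_each_particle(exec, n, [&](std::size_t i) { dynamics.update(i); });
}
```

Calling `exec_dynamics(SequencedExec{}, rho.size(), RelaxTowardsReference{rho, 1.0, 0.5})` and the same call with `BlockedExec{}` gives identical results, because each particle is updated independently. Under this kind of design, changing the backend means adding another `for_each_particle` overload, while the numerical class stays untouched.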

SYCL Programming Standard

Unlike the other GPU-capable SPH libraries, our work is based on the SYCL standard for parallel computing. The SYCL specification defines a single-source, ISO C++ compliant standard for compute acceleration. It aims to be a high-level abstraction that can be programmed like classical CPU code, without accelerator-specific directives or application programming interface (API) calls. In this work, the Intel SYCL implementation, i.e. Data Parallel C++ (DPC++), is chosen because it is part of Intel’s oneAPI suite, which also includes the TBB library already used in SPHinXsys.

A SYCL computing kernel (which shares its name with, but is unrelated to, the smoothing kernel used in SPH) is a scoped block of code executed on a SYCL device. It runs within a global context, the queue, and a local one, the command_group, both initiated by the host. Note that, since the SYCL host and device can both be a CPU, CI/CD jobs can test SYCL kernels on machines without a GPU, such as the standard runners provided by the GitHub platform.
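As a generic illustration of the queue, command group and kernel, the following is a minimal SYCL 2020 sketch; it is not taken from SPHinXsys. The function `advance_positions` and the macro `SKETCH_HAS_SYCL` are hypothetical names. The default-constructed queue lets the runtime choose the device, which can be a CPU when no GPU is present. So that the sketch also builds with a plain C++ compiler, a serial fallback is used when the SYCL header is not found; that fallback is an assumption of this sketch, not a SYCL feature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define SKETCH_HAS_SYCL 1
#else
#define SKETCH_HAS_SYCL 0
#endif

// Hypothetical helper (not part of SPHinXsys): one explicit Euler step,
// pos[i] += vel[i] * dt, for every particle i.
std::vector<float> advance_positions(std::vector<float> pos,
                                     const std::vector<float>& vel, float dt)
{
    assert(pos.size() == vel.size());
    const std::size_t n = pos.size();
    if (n == 0) return pos;
#if SKETCH_HAS_SYCL
    // Global context: a queue bound to the device picked by the runtime
    // (a GPU if one is available, otherwise possibly the CPU).
    sycl::queue q;
    {
        sycl::buffer<float, 1> pos_buf(pos.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> vel_buf(vel.data(), sycl::range<1>(n));
        // Local context: a command group submitted by the host.
        q.submit([&](sycl::handler& cgh) {
            sycl::accessor pos_acc(pos_buf, cgh, sycl::read_write);
            sycl::accessor vel_acc(vel_buf, cgh, sycl::read_only);
            // The computing kernel: ordinary C++ executed once per index.
            cgh.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                pos_acc[i] += vel_acc[i] * dt;
            });
        });
    } // Buffers are destroyed here, which copies the result back into pos.
#else
    // Serial fallback used only when no SYCL header is available.
    for (std::size_t i = 0; i < n; ++i) pos[i] += vel[i] * dt;
#endif
    return pos;
}
```

For example, `advance_positions({0.0f, 1.0f, 2.0f}, {1.0f, 1.0f, -2.0f}, 0.5f)` advances three particles by half a time unit. The kernel body is single-source C++ sitting next to the host code, with no separate kernel file or vendor-specific directive.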

In the next post, I will give details of the implementation of SYCL kernels in SPHinXsys.