- computational finance
- scientific computing on GPUs
- random number generation
- finite difference applications
- current developments

Analysts, scientists, engineers, and multimedia professionals require massive processing power for tasks such as analyzing financial trends, creating test simulations, modeling climate, compiling code, rendering video, and decoding genomes. Although these groups could use specialized supercomputers, the custom development time and hardware costs are prohibitive. This paper describes how we applied the Zircon adaptive high-performance computing software platform and tools, together with the NAG C Library, to substantially improve the performance of a representative complex computational finance application via distribution and parallelization, reducing the total computation time from several hours to several minutes.

**Benedikt Wilbertz**
*PMA - Laboratoire de Probabilités et Modèles Aléatoires*

**Abstract**

The pricing of American-style and multiple-exercise options is a very challenging problem in mathematical finance. One usually employs a Least-Squares Monte Carlo approach (the Longstaff-Schwartz method) for the evaluation of the conditional expectations that arise in the Backward Dynamic Programming principle for such optimal stopping or stochastic control problems in a Markovian framework. Unfortunately, these Least-Squares Monte Carlo approaches are rather slow and, due to the dependency structure in the Backward Dynamic Programming principle, permit no parallel implementation, neither on the Monte Carlo level nor on the time-layer level of the problem.
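For reference, in generic notation (the symbols are not fixed by the abstract), the Backward Dynamic Programming principle for an optimal stopping problem with payoff $f$ and Markovian state $(X_{t_k})$ reads

$$
V_{t_n}(x) = f(t_n, x), \qquad
V_{t_k}(x) = \max\Bigl( f(t_k, x),\; \mathbb{E}\bigl[\, V_{t_{k+1}}(X_{t_{k+1}}) \,\big|\, X_{t_k} = x \,\bigr] \Bigr), \quad k = n-1, \dots, 0.
$$

Each time layer $k$ requires the value function of layer $k+1$, which is the sequential dependency the abstract refers to; the Longstaff-Schwartz method estimates the conditional expectation by a least-squares regression over simulated paths.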

We therefore present in this paper a quantization method for the computation of the conditional expectations that allows a straightforward parallelization on the Monte Carlo level. Moreover, for AR(1) processes we are able to develop a further parallelization in the time domain, which makes use of faster memory structures and thereby maximizes parallel execution.
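To illustrate the idea, here is a minimal NumPy sketch (not the paper's implementation; function names and the nearest-neighbor projection are illustrative assumptions). Quantization replaces each time layer's state space by a finite grid; transition weights between consecutive grids are estimated by projecting Monte Carlo samples onto their nearest grid points, a step that is independent per path and hence trivially parallel. The backward step then reduces to a matrix-vector product.

```python
import numpy as np

def estimate_transition_weights(x_grid, y_grid, x_samples, y_samples):
    """Estimate quantized transition weights p[i, j] ~ P(Y in cell j | X in cell i)
    from joint Monte Carlo samples (x_samples[m], y_samples[m]) by projecting each
    sample onto its nearest grid point. The projection of each path is independent
    of all other paths, which is the parallelism on the Monte Carlo level."""
    i = np.abs(x_samples[:, None] - x_grid[None, :]).argmin(axis=1)  # project X layer
    j = np.abs(y_samples[:, None] - y_grid[None, :]).argmin(axis=1)  # project Y layer
    counts = np.zeros((len(x_grid), len(y_grid)))
    np.add.at(counts, (i, j), 1.0)  # unbuffered accumulation of transition counts
    row = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)

def backward_step(weights, v_next):
    """One Backward Dynamic Programming step on the grid: the conditional
    expectation of the next layer's values is a matrix-vector product."""
    return weights @ v_next
```

On a GPU, both the projection of the samples and the matrix-vector product map naturally onto massively parallel kernels, which is what makes the quantization approach attractive compared to the regression in Least-Squares Monte Carlo.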

Finally, we present numerical results for a CUDA implementation of these methods. It turns out that such an implementation leads to an impressive speed-up compared to a serial CPU implementation.
