Ray Casting Volume Rendering on Shared-Memory Architectures:
Memory Hierarchy Considerations

IEEE Concurrency, Spring, 1998.

Authors

Abstract

Two algorithmic factors have the greatest impact on attainable frame rates for ray casting of regular datasets on shared-memory architectures: 1) selection of a parallel partitioning and load balancing algorithm, which has been thoroughly researched; and 2) efficient exploitation of the memory hierarchy, which has been identified as important but has not been well characterized.

In ray casting through regular data, cache miss rates depend strongly on the pattern of memory accesses as sets of rays traverse the data voxels, causing performance to depend on view direction. Controlling this dependence is important for interactive volume rendering. We explore memory hierarchy performance using three tools: 1) algorithmic modifications designed to isolate the portion of time per frame attributable to memory hierarchy effects; 2) a hardware bus-snooping board, which keeps a detailed log of bus traffic; and 3) a software cache miss simulator, which separates the total penalty into two types of cache misses. Complex interpixel and intrapixel effects are responsible for most of the cache effects observed.
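The view-direction dependence described above can be illustrated with a small sketch. It is not the authors' simulator: it assumes a fully associative LRU cache, a volume stored in x-fastest order with one byte per voxel, and axis-aligned orthographic rays that sample every voxel once. All names and parameter values (`voxel_address`, `count_misses`, `ray_addresses`, the 64-cubed volume, the cache size) are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def voxel_address(x, y, z, dims, bytes_per_voxel=1):
    """Byte offset of voxel (x, y, z) in an x-fastest linear layout (assumed)."""
    nx, ny, _ = dims
    return ((z * ny + y) * nx + x) * bytes_per_voxel


def count_misses(addresses, cache_lines=32, line_bytes=32):
    """Count misses in a fully associative LRU cache (a simplifying assumption)."""
    cache = OrderedDict()  # keys are cache-line numbers, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses


def ray_addresses(dims, axis):
    """Yield voxel addresses for one frame of axis-aligned rays.

    axis="x": each ray walks along x, the direction of contiguous storage.
    axis="z": each ray walks along z, striding nx * ny voxels per step.
    """
    nx, ny, nz = dims
    if axis == "x":
        for z in range(nz):
            for y in range(ny):
                for x in range(nx):
                    yield voxel_address(x, y, z, dims)
    elif axis == "z":
        for y in range(ny):
            for x in range(nx):
                for z in range(nz):
                    yield voxel_address(x, y, z, dims)
    else:
        raise ValueError("axis must be 'x' or 'z'")


if __name__ == "__main__":
    dims = (64, 64, 64)
    for axis in ("x", "z"):
        print(axis, count_misses(ray_addresses(dims, axis)))
```

Under these assumptions, rays along x touch each cache line once, while each ray along z touches 64 distinct lines, more than the 32-line cache can hold, so every sample misses. The two views read exactly the same voxels, yet their miss counts differ by a factor of 32 in this toy configuration. Real caches, perspective rays, and interpolation footprints would change the numbers but not the basic effect.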

We investigate two parallel partitioning methods, exploring the tradeoff between their memory hierarchy performance and the other algorithmic optimizations that each method does or does not permit.

Our focus on parallel memory hierarchy effects yields high frame rates. We render a 1 GB dataset at an average of 1.0 frames per second on a 16-processor Power Challenge, faster than any rate previously reported in the literature for a dataset of this size. Extending our methods to a cluster of eight machines with eight processors each, we attain rates of up to 10 frames per second with a 357 MB dataset.

Our analytical framework is applicable to other problems that, like ray casting, contain coherence that can be exploited to improve memory locality.

Copyright notice

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Download

This file is in Adobe PDF format.

  • The Full paper (color, 16 pages).

Acknowledgements

Machine resources for this work were provided by Silicon Graphics Corporation, the National Center for Supercomputing Applications (NCSA), and Peter Schröder at Caltech. This research is sponsored by the Defense Advanced Research Projects Agency (DARPA) under contract number DABT63-95-C-0116, and AASERT award number N0014-93-1-0843. The Visible Male and Visible Female datasets were provided courtesy of the National Library of Medicine. The Vorticity dataset was provided courtesy of the Laboratory for Computational Science and Engineering at the University of Minnesota.