Ray Casting Volume Rendering on Shared-Memory Architectures:
Memory Hierarchy Considerations
IEEE Concurrency, Spring, 1998.
- Michael E. Palmer
- Brian Totty
- Stephen Taylor
Two algorithmic factors have the greatest impact on attainable frame
rates for ray casting of regular datasets on shared-memory
architectures: 1) selection of a parallel partitioning and load
balancing algorithm, which has been thoroughly researched; and 2)
efficient exploitation of the memory hierarchy, which has been
identified as important but is not well characterized.
In ray casting through regular data, cache miss rates depend strongly
on the pattern of memory accesses as sets of rays traverse the data
voxels, causing performance to depend on view direction. Controlling
this dependence is important in the context of interactive volume
rendering. We explore memory hierarchy performance using three tools:
1) algorithmic modifications designed to isolate the portion of
time per frame attributable to memory hierarchy effects; 2) a hardware
bus-snooping board, which keeps a detailed log of bus traffic; and 3) a
software cache miss simulator, which separates the total penalty into
two types of cache misses. Complex interpixel and intrapixel effects
are responsible for most of the cache effects observed.
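The view-direction dependence described above can be illustrated with a
toy software cache miss simulator. This is a minimal sketch under stated
assumptions, not the simulator used in this work: it assumes a
direct-mapped cache, a row-major volume with one byte per voxel, and
axis-aligned rays, and the names simulate and ray_addresses are
hypothetical. The split into first-touch ("cold") misses and all other
misses is an assumed stand-in, since the abstract does not name its two
miss types.

```python
def simulate(addresses, cache_bytes=1024, line_bytes=32):
    """Toy direct-mapped cache model (assumed parameters, not the paper's).

    Returns (cold_misses, other_misses, total_accesses).
    """
    n_slots = cache_bytes // line_bytes
    tags = [None] * n_slots   # memory line currently held in each slot
    seen = set()              # every memory line fetched at least once
    cold = other = total = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % n_slots
        total += 1
        if tags[slot] != line:        # miss: the slot holds a different line
            if line in seen:
                other += 1            # line was cached before and evicted
            else:
                cold += 1             # first touch of this line
                seen.add(line)
            tags[slot] = line
    return cold, other, total


def ray_addresses(n, along):
    """Byte addresses touched by n*n axis-aligned rays through an n^3 volume.

    Assumes row-major storage with x as the fastest-varying index.
    """
    for a in range(n):
        for b in range(n):
            for t in range(n):        # t steps along the ray
                if along == "x":
                    x, y, z = t, b, a
                else:                 # rays travel along z
                    x, y, z = b, a, t
                yield (z * n + y) * n + x


if __name__ == "__main__":
    for direction in ("x", "z"):
        cold, other, total = simulate(ray_addresses(32, direction))
        print(direction, cold, other, total)
```

With these assumed sizes, both view directions incur the same 1024 cold
misses over 32768 accesses, but rays along x incur no further misses
while rays along z miss on every access (31744 additional misses). A
direct-mapped model exaggerates the effect compared with a
set-associative cache; it is chosen here only to keep the sketch short.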
We investigate two parallel partitioning methods, exploring the
tradeoff between their memory hierarchy performance and the other
algorithmic optimizations that each method does or does not permit.
Our focus on parallel memory hierarchy effects yields extremely good
performance. We render a 1 GB dataset at an average of 1.0 frames per
second on a 16-processor Power Challenge, faster than any rate
previously reported in the literature for a dataset of this size.
Extending our methods to a cluster of eight machines with eight
processors each, we attain rates of up to 10 frames per second with a
357 MB dataset.
Our analytical framework is applicable to other problems that, like
ray casting, contain coherence that can be exploited to improve memory
hierarchy performance.
This material is presented to ensure timely dissemination of scholarly
and technical work. Copyright and all rights therein are retained by
authors or by other copyright holders. All persons copying this
information are expected to adhere to the terms and constraints
invoked by each author's copyright. In most cases, these works may not
be reposted without the explicit permission of the copyright holder.
Personal use of this material is permitted. However, permission to
reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or
redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works must be obtained from the IEEE.
The full paper (color, 16 pages) is available in Adobe PDF format.
Machine resources for this work were provided by Silicon Graphics
Corporation, the National Center for Supercomputing Applications
(NCSA), and Peter Schröder at Caltech. This research is sponsored
by the Defense Advanced Research Projects Agency (DARPA) under
contract number DABT63-95-C-0116, and AASERT award number
N0014-93-1-0843. The Visible Male and Visible Female datasets were
courtesy of the National Library of Medicine. The Vorticity dataset
was courtesy of the Laboratory for Computational Science and
Engineering at the University of Minnesota.