Fall 2007 Final Project and Resources

Please feel free to look around and find an application group that may be of interest for your final project. If you would like to use your own idea for your final project, please submit a Final Project Application Information sheet (doc) to one of the TAs.

The objective of ECE498AL final projects is to provide students with first-hand experience in accelerating real-world applications. These projects should allow students to apply their knowledge in writing high-performance massively parallel applications to applications whose acceleration can enable significant advancement of a research area. The areas of interest include science, engineering, medicine, and financing. In order to save your time in search for a project, we have created a recommended list of projects that are of interest to world-class researchers. These researchers will be interested in help mentoring the student teams pursuing their projects.

Due: 10/31/07: Please submit your team's Final Project Design Documentation (doc) by Oct. 31.
Due: 12/14/07: Final Project Presentation template slide (ppt).
Due: 12/14/07: Final Project Report template (doc).

Leaders Themes
Jim Phillips, John Stone

1. Molecular Dynamics (UIUC Beckman Institute)
Theoretical and Computational Biophysics (CUDA Projects Webpage)

Many aspects of molecular dynamics simulation can be drastically accelerated by massively parallel processors. The Theoretical and Computational Biophysics Lab at the Beckman Institute is the home of NAMD/VMD, a suite of computational molecular dynamics tools that are used by a very large user community worldwide.

The projects include Molecular Surface Computation (pdf), Lattice Cutoff Computation for Multilevel Summation (pdf), Grid Generation and Matching for Small Molecule Docking (pdf), Monte Carlo Sampling of Protein Conformations (pdf).

New! GPU Acceleration of spatial decomposition and pairlist maintenance for Molecular Dynamics (pdf)

Molecular dynamics (MD) is a computational technique used to study biopolymers and
other related systems using classical mechanics. MD calculations consist of a large series
(>10^7) timesteps in which the energy of and forces on the system are evaluated using a
given potential, and the system's motion over a short period of time (~1 fs) is propagated
using the previous velocities and current forces using Hamilton's equations of motion.
Molecular dynamics is used, among other things, for investigating the atomicscale
mechanisms of biological systems, designing and analyzing pharmaceutical compounds,
and designing nanodevices that interact with biomolecules. Because of the massive
computational requirements of MD, the ability to accelerate MD using GPUs would be
greatly beneficial to research using this technique.

Behzad Sharif
Sam Stone

2. Magnetic Resonance Imaging (UIUC)

Fast MRI image reconstruction can enable production of a movie of beating heart and allows early detection of cardio-vascular disease. The project here uses a model-based adaptive approach to construct detailed real-time 3D movies that can be used in clinic settings.

PARADISE - MRI (pdf)
The MRI project that we are working on is entitled Patient-Adaptive Reconstruction and Acquisition in Dynamic Imaging with Sensitivity Encoding (PARADISE). PARADISE promises to increase the spatial and temporal resolutions of current cardiac MRI methods by 20 folds or more. It has been tested both in simulation and also on real human subjects. (I can provide links to publications and results later if you're interested).

Ed Seidel
Fluid Dynamics
3. Computational Fluid Dynamics and Ocean simulation (Louisiana State University)

Finite-Element coastal ocean model (pdf).

Robert J. Brunner
Volodymyr Kindratenko
Dave Semeraro
NCSA

4. Computational Astrophysics (UIUC)

Fast Power Spectrum Estimation (pdf)
The last few decades have seen dramatic changes in our understanding of nature
on the grandest of scales. The current theoretical picture is that the universe began in a
Big Bang, after which a period of exponential growth, perhaps due to the decay of a
scalar field, inflated quantum fluctuations in the matter-radiation plasma into actual
perturbations. These perturbations grew under the influence of gravity to become cosmic
structures, including galaxies, clusters of galaxies, and superclusters. To understand how
these structures developed and evolve as well as to quantify fundamental cosmological
parameters, we must statistically analyze large samples of cosmic sources. One of the
fundamental statistics used in cosmology is the power spectrum.

Spherical 3D Discrete Elements Method (DEM) (doc)
The application allows to simulate a large number of spherical particles moving due to the gravity and contact forces. See http://en.wikipedia.org/wiki/Discrete_element_method for a general overview of the method and its applications.

Dennis J Lin
Thomas Huang
5. Computer Vision for HCI (UIUC)

Gestures are an integral part of human-human communication, and automated gesture recognition systems will go a long way to improving human-computer interactions. Vision-based gesture interfaces are useful in virtual-reality and clean-room environments, where carrying an input device can be cumbersome or dangerous. Also, advances in gesture recognition may translate into better whole-body tracking and eventually lead to automated sign-language recognition. However, the human hand is a versatile instrument, and it can take on a wide variety of shapes. Unlike faces, which change shape by deforming smoothly, the joints on the fingers produce articulated motion that is difficult to characterize. Also, the fingers are relatively textureless and prone to self-occlusion, which means that a single view is often too ambiguous to identify the hand pose. Finally, driven by relatively large muscle in the forearm, fingers are capable of incredibly fast motion, posing an additional challenge for interactive applications.

Computer Vision for HCI (pdf)

Michael Garland 6. Sparse Matrix Computation (NVIDIA)

The ability to perform efficient computations on linear systems and matrices is crucial in a host of applications. For linear systems where all (or most) entries of the matrix are non-zero, we would use efficient dense matrix routines like those provided by BLAS and LAPACK. However, many of the linear systems that arise in practice have relatively few non-zero entries. For instance, a system of equations defined over a triangle mesh might form a matrix with 1 million rows and columns, but for which on average only 7 entries per row are non-zero. Such sparse matrices require rather different routines to achieve efficiency. The goal of this project is to implement such routines.

Sparse Matrix Computation (doc)

Stuart Levy, UIUC/NCSA 7. Computer Animation for Scientific Visualization (UIUC/NCSA)

Current CPUs are a bit too slow to decompress high quality, very high resolution animations (for example, 1920x1080 stereoscopic, or 4Kx2K monoscopic) at full frame rates (for example, 30 Hz). But this should be well within the capability of GPUs, which can thus replace what would otherwise be quite expensive hardware. The core of the application would be a GPU-accelerated JPEG decompressor, whose output would go to GPU textures or to the framebuffer. Most of the arithmetic needed for JPEG (Discrete Cosine Transform and YCbCr â†' RGB conversion, on nice compact 8x8-pixel blocks) should be well-suited for GPU acceleration. It's less clear whether the Huffman decoding phase can be accelerated too, but may be worth a try. JPEG offers some features not needed here: progressive, hierarchical, and lossless extensions. We might also specialize for a particular style of chroma subsampling, e.g. 4:2:0.

Wen-mei Hwu, Shane Ryoo

8. More Projects
Physics / Quantum Chromodynamics (doc)

Russell Hewett (preferred contact)
Mark Butala (preferred contact)
Farzad Kamalabadi

9. State Estimation with the Localized Ensemble Kalman Filter (doc)

The goal of state estimation is to make a best estimate of a time evolving object given a statistical model for the state evolution and a set of statistically related measurements. State estimation of large-scale dynamic objects is a computationally enormous problem. Even relatively small state estimation problems require lengthy execution times and large amounts of computational storage. The Localized Ensemble Kalman Filter is method that scales much better in memory usage compared to most state estimation techniques, but significant computational speedup is still needed to make large-scale reconstructions possible.

An example of a large scale dynamic state estimation problem is time dependent tomography where the goal is to reconstruct a time evolving object from noisy line integral measurements. Fast algorithms such as filtered back projection exist for large-scale tomography problems, but such methods assume that the object is unchanging during the measurement interval. Dynamic tomography problems such as the time dependent reconstruction of the electron density in the solar atmosphere or functional MRI scanning of the human brain are of such large-scale that state estimation, even with the LEnKF, is computationally intractable.

Lu Wan 10. Design Space Exploration for Implicitly Parallel Programming Models on Massive Processor Array (doc)

Implicitly Parallel Programming Models have recently been proposed. To reduce the cost of parallel programming, new compiler modules need be constructed to deal with the complexity automatically. Code-Gen is one important phase in that the hardware constraints and software runtime requirements need be considered in a unified framework. The quality of the generated code largely depends on efficient space exploration during Code-Gen phase. Besides, hardware setting such as granularity of the parallel execution hardware, sram / register size, execution speed, etc, should be modeled accordingly to facilitate space exploration. Similarities exist between Code-Gen problem and circuit-level synthesis. This project will try to exploit circuit-level CAD techniques to deal with problems in the emerging Implicitly Parallel Programming domain.
Michael Connor

11. Machine Learning: SVMs on GPU (doc)

A popular and very powerful machine learning algorithm is Support Vector Machine (SVM). While this technique is both theoretically interesting and empirically quite accurate, it is also quite slow. A GPU implementation should be able to speed up training that once took many hours or days (and tuning runs that could take weeks) into something manageable, while retaining global optimization lost by other approximations.

Xiaohuang Huang
Changrui Yuan
Shanxiang Qi
12. HDR Image Processing Library (doc)

Our aim is going to implement the HDR image processing library. Because the floating-point calculation is not as fast as integer calculation. Once the HDR image processing in use, the GPU accelerate is required.
   
Resources:
NVIDIA Official CUDA Page http://developer.nvidia.com/object/cuda.html
CUDA Occupancy Calculator (xls) - A programmer's tool that allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The multiprocessor occupancy is the ratio of active warps to the maximum number of warps supported on a multiprocessor of the GPU, and is helpful in determining how efficient the kernel will be on the GPU.

 

  Archived final project listings from previous semester(s):  
  Spring 2007 - First-time course offering by Prof. Hwu (UIUC) and Prof. Kirk (NVIDIA)!