News: OpenCL Standard Speeds Compute-Intensive Applications

  1. OpenCL, an emerging standard for distributing applications across processor cores and GPUs, promises to dramatically speed up processing time for parallelizable tasks. One Mac programmer was stunned to see a 492-second audio processing task shrink to a mere 14.1 seconds. (http://www.supermegaultragroovy.com/blog/2009/11/12/swimming-in-opencl/) In this case, the application was only accelerating the code across the many CPU cores in his Mac.

But OpenCL can also be used to run many types of computationally intense applications on the Graphics Processing Units (GPUs) found in many PCs. These engines were initially designed for texture mapping and for rendering photorealistic imagery in computer games, but it turns out they are also well suited to protein folding, DNA analysis, and other floating-point calculations. The Folding@Home project found that a custom-built application could speed up its calculations by a factor of 30 by taking advantage of a PC's GPU. OpenCL hopes to make this style of programming accessible to a much wider audience. Ben Bergen, of the Evolving Applications and Architectures Team at Los Alamos National Laboratory, wrote that OpenCL will help them address several areas where there are no obvious tools or techniques for maintaining portability across the variety of platforms they must routinely support.

The current OpenCL trailblazers are AMD, NVIDIA, and Apple, all of which have released OpenCL-compliant products. AMD makes multi-core chips, ATI graphics cards, and Stream processors for accelerating floating-point calculations. NVIDIA specializes in graphics cards and chips, which are increasingly being used in next-generation supercomputers. Apple sells a lot of media workstations and servers. Much broader support is expected, however, as more than 33 members have signed on to support the OpenCL standard, including video game makers, chip vendors, cell phone vendors, and software developers.
The technology promises to speed up many types of computationally intense applications across multiple cores and special-purpose DSPs in servers and workstations. New products are likely to reach the market more quickly thanks to last May's release of the OpenCL conformance test suite, which checks both functionality and computational accuracy. Read the press release here: http://www.khronos.org/news/press/releases/khronos-demonstrates-opencl-momentum-at-sc09 A list of OpenCL-conformant products can be found here: http://www.khronos.org/adopters/conformant-products/#topencl Alpha-level OpenCL drivers are also available for IBM’s POWER6 and Cell/B.E. Linux systems: http://www.alphaworks.ibm.com/tech/opencl

    Threaded Messages (4)

  2. CPUs vs GPUs

    It's worth noting that the linked article itself states that the task at hand may not have been well suited to GPU-based acceleration: the test took over a minute on a GeForce 8800GT. The author was actually using OpenCL as a way to parallelize work on an 8-core Mac Pro, which shows that it has advantages beyond just 'doing work on the GPU'.
  3. Java OpenCL bindings

    There are already some really good Java OpenCL bindings available. I have used JOpenCL (http://sourceforge.net/projects/jopencl), and for the appropriate (data-parallel) workloads OpenCL can offer some impressive performance numbers. Other possible bindings are OpenCL4Java (http://code.google.com/p/nativelibs4java/wiki/OpenCL) and its associated Scala bindings, ScalaCL (a really great example of Scala's ability to host domain-specific languages). I also think JOCL (http://www.jocl.org) is worth tracking.
  4. maybe this is a dumb question

    Why can't the JVM just decide when to use the GPU, for example when a user does floating-point math, works with BigDecimal, or needs high-speed locking around queues? I think OpenCL is great, but I'm just wondering why GPUs can't be leveraged automatically in some cases. Thanks, ~Matt
  5. Re: maybe this is a dumb question

    I think it would be a real challenge for the JVM to do this in the general case. However, where the Java application developer has already indicated that their algorithm is a data-parallel problem and has refactored the code to use something like ParallelArray (sadly no longer in the JDK 7 plans), I think it may be a lot easier to imagine the work being offloaded.
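    To make "data parallel" a bit more concrete, here is a minimal sketch (a hypothetical example, not code from any of the posts) in which every output element depends only on the input element at the same index, so the iterations can run on different cores with no locking. Since ParallelArray never shipped in the JDK, this assumes Java 8 or later and uses the standard Arrays.parallelSetAll as a stand-in; the class name DataParallelSketch and the applyGain method are made up for illustration.

```java
import java.util.Arrays;

class DataParallelSketch {
    // Hypothetical element-wise operation: scale every sample by a gain factor.
    // Each output element depends only on the input element at the same index,
    // so the iterations are independent: this is what makes the loop data parallel.
    static double[] applyGain(double[] samples, double gain) {
        double[] out = new double[samples.length];
        // Arrays.parallelSetAll (Java 8+) fills the array "in parallel",
        // which may split the index range across the available cores.
        Arrays.parallelSetAll(out, i -> samples[i] * gain);
        return out;
    }

    public static void main(String[] args) {
        double[] input = {0.5, -0.25, 1.0, 0.0};
        System.out.println(Arrays.toString(applyGain(input, 2.0)));
        // prints [1.0, -0.5, 2.0, 0.0]
    }
}
```

    A typical OpenCL kernel for a loop like this would have each work-item compute one output element, which is why code already written in this shape is the natural candidate for offloading. For arbitrary code, the JVM would first have to prove that no such cross-element dependencies exist, which is the hard part of doing it automatically.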