Questions tagged [gpgpu]

For questions about using graphics processing units (GPUs) for general-purpose computation outside the traditional graphics pipeline, where the work is still related to computer graphics.

Common APIs for GPGPU include CUDA, OpenCL, and compute shaders (e.g., via DirectX, OpenGL, and Vulkan).

23 questions
19 votes, 1 answer

Why is recursion forbidden in OpenCL?

I'd like to use OpenCL to accelerate rendering of raytraced images, but I notice that the Wikipedia page claims that recursion is forbidden in OpenCL. Is this true? As I make extensive use of recursion when raytracing, this will require a…
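The OpenCL C kernel language does not support recursion, so recursive ray tracing is usually restructured as a loop. For a path with one continuation ray per hit (for example, mirror reflection only), the recursive call can be replaced by an accumulated "throughput" factor. The C++ sketch below shows that transformation on a toy model; `Bounce` and its fields are hypothetical stand-ins for real scene intersection results, and both functions are plain CPU code, not OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-bounce surface data: what a real tracer would get
// from intersecting the scene at bounce depth d.
struct Bounce {
    double emitted;      // radiance emitted at this hit
    double reflectance;  // fraction of incoming light reflected onward
};

// Recursive form (not allowed in OpenCL C kernels):
// L(d) = emitted_d + reflectance_d * L(d + 1)
double shade_recursive(const std::vector<Bounce>& path, std::size_t d) {
    if (d >= path.size()) return 0.0;  // ray escaped or depth limit reached
    return path[d].emitted + path[d].reflectance * shade_recursive(path, d + 1);
}

// Iterative form: the running product of reflectances ("throughput")
// replaces the call stack, so no recursion is needed.
double shade_iterative(const std::vector<Bounce>& path) {
    double radiance = 0.0;
    double throughput = 1.0;
    for (const Bounce& b : path) {
        radiance += throughput * b.emitted;
        throughput *= b.reflectance;
    }
    return radiance;
}
```

Both functions return the same value for the same path. When a hit spawns several rays (reflection plus refraction), a small fixed-size array used as an explicit stack can play the role of the call stack instead.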
15 votes, 1 answer

Synchronizing successive OpenGL Compute Shader invocations

I have a couple of compute shaders that need to be executed in a certain order and whose outputs depend on previous inputs. Ideally, I'd never need to copy a buffer client-side and could do all of my work on the GPU. Suppose I have two compute shaders…
Mokosha (1,124 reputation)
11 votes, 1 answer

What is my GPU waiting on?

I am writing an OpenCL program for use with my AMD Radeon HD 7800 series GPU. According to AMD's OpenCL programming guide, this generation of GPU has two hardware queues that can operate asynchronously. Quoting section 5.5.6, Command Queue For Southern Islands and…
Mokosha (1,124 reputation)
10 votes, 1 answer

Per-Vertex Computation in OpenGL Tessellation

I am trying to implement a position-based cloth simulation using hardware tessellation. This means I want to upload just a control quad to the graphics card and then use tessellation and geometry shading to create the nodes in the cloth. This idea follows…
Dragonseel (1,790 reputation)
10 votes, 1 answer

Optimal memory access when using lookup tables on GPU?

I'm exploring isosurface algorithms on the GPU for a bachelor's project (concentrating only on binary in/out voxel data rather than real-valued fields). So I have a CPU implementation of good old marching cubes up and running in…
russ (2,332 reputation)
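With binary in/out voxels, each marching-cubes cell is classified by an 8-bit case index with one bit per corner, and that index selects the entry in the lookup tables. A minimal C++ sketch, assuming corner i maps to bit i; the corner numbering is an assumption of this example and must match whatever edge and triangle tables are in use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the marching-cubes case index for one cell from its 8 corner
// samples. Bit i is set when corner i is inside the surface. The corner
// order here is an assumption and must match the lookup tables in use.
std::uint8_t cube_case_index(const std::array<bool, 8>& inside) {
    std::uint8_t index = 0;
    for (int i = 0; i < 8; ++i) {
        if (inside[i]) index |= static_cast<std::uint8_t>(1u << i);
    }
    return index;  // 0..255, used to index the case tables
}
```

An all-outside or all-inside cell yields index 0 or 255, the two cases that produce no triangles.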
7 votes, 2 answers

GPU branching: if without else

It's common knowledge that branching in a GPU program is costly because the GPU may have to run both the if and the else logic for every pixel evaluated in the same wave, applying each result only to the appropriate pixels. I was curious if…
Alan Wolfe (7,711 reputation)
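The cost argument above can be made concrete with a deliberately simplified SIMT model: a wave runs a branch body if at least one of its lanes takes that branch, and the other lanes are masked off while it runs. The C++ sketch below is a toy model only (it ignores the cost of evaluating the condition, reconvergence overhead, and compiler choices such as predication); the wave contents and per-branch cycle costs are hypothetical parameters.

```cpp
#include <cassert>
#include <vector>

// Toy SIMT model: one wave of lanes evaluates `cond` per lane.
// A branch body costs cycles for the whole wave if ANY lane needs it;
// inactive lanes are masked off and simply wait.
int wave_cost(const std::vector<bool>& cond, int if_cost, int else_cost) {
    bool any_true = false;
    bool any_false = false;
    for (bool c : cond) {
        if (c) any_true = true; else any_false = true;
    }
    int cost = 0;
    if (any_true) cost += if_cost;     // at least one lane runs the if-body
    if (any_false) cost += else_cost;  // at least one lane runs the else-body
    return cost;
}
```

In this model an `if` without an `else` is simply the case `else_cost == 0`: a wave whose lanes all skip the branch pays nothing, but the whole wave still pays `if_cost` as soon as any single lane takes it.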
5 votes, 0 answers

CUDA cuMemcpyHtoD vs cuMemcpy2D

I'm asking here rather than on SO because this seems to be an appropriate question for CG. I am learning the NVIDIA NVENC API. The SDK supplies a sample called "NvEncoderCudaInterop". There is a chunk of code that copies YUV plane arrays from CPU to GPU buffers. This…
Michael IV (259 reputation)
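The practical difference between the two calls is row pitch. `cuMemcpyHtoD` copies one contiguous range of bytes, while `cuMemcpy2D` takes a `CUDA_MEMCPY2D` description and copies `Height` rows of `WidthInBytes` bytes, stepping through source and destination by their own `srcPitch` and `dstPitch` (for example, when the device buffer was allocated with `cuMemAllocPitch`). The host-only C++ sketch below emulates that row-by-row behavior with `std::memcpy`; `copy_2d` is a hypothetical helper, not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-side emulation of a pitched 2D copy (hypothetical helper).
// Copies `height` rows of `width_bytes` bytes; rows start `src_pitch`
// bytes apart in the source and `dst_pitch` bytes apart in the
// destination. Padding bytes beyond width_bytes are left untouched.
void copy_2d(unsigned char* dst, std::size_t dst_pitch,
             const unsigned char* src, std::size_t src_pitch,
             std::size_t width_bytes, std::size_t height) {
    assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
    }
}
```

A single contiguous copy gives the same result only when `src_pitch == dst_pitch == width_bytes`, that is, when neither side has row padding.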
4 votes, 1 answer

Using multiply and accumulate of 4x4 matrices for ray-triangle intersection tests on GPU

Is it possible to gain a performance boost by using the new 4x4 MAD from NVIDIA's tensor cores for ray-triangle intersection tests? Really there are two questions: Is it possible to modify some of the ray-triangle algorithms (say, the Möller-Trumbore algorithm)…
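For reference, here is the scalar Möller-Trumbore ray-triangle test that such a reformulation would start from, written in C++. It is a plain CPU sketch: `Vec3`, the helper functions, and the epsilon are assumptions of this example, and it makes no claim about how the test maps onto tensor-core operations.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore: returns the ray parameter t of the hit, if any.
std::optional<double> intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    const double kEps = 1e-12;  // assumed tolerance for near-parallel rays
    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < kEps) return std::nullopt;  // ray parallel to triangle
    double inv_det = 1.0 / det;
    Vec3 s = sub(orig, v0);
    double u = dot(s, p) * inv_det;  // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return std::nullopt;
    Vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv_det;  // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return std::nullopt;
    double t = dot(e2, q) * inv_det;  // distance along the ray
    if (t <= kEps) return std::nullopt;  // hit lies behind the ray origin
    return t;
}
```

The test comes down to two cross products, four dot products, and one division per ray-triangle pair, which is the arithmetic any matrix-hardware reformulation would need to express.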
3 votes, 1 answer

How to avoid slowdown with 25-30 students running simple GPU kernels on 4 GeForce GTX 650 Ti GPUs?

I'm teaching a crash course in CUDA that shows students how to write good GPU code (CUDA 7.5 in this case). The kernels they will be running will do matrix multiplication on 2048x2048 floating-point matrices, some kernels involving multiple blocks and…
lil' wing (33 reputation)
3 votes, 0 answers

What aspects of GPU architecture are computer graphics programmers expected to be intimately familiar with?

I am an aspiring CG programmer and would like to know what some of the more nuanced aspects of computer architecture are. I have already taken several introductory arch courses where we've covered things like cache levels, basic multi-staged…
2 votes, 0 answers

Using GPU instead of CPU in Scala

I wrote a program that displays points expressed in 3D in a 2D canvas, using perspective projection. The aim is to display a cube. Each face of the cube is drawn by linearly interpolating the points that define it. My cube can be translated, scaled…
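The projection step described here usually comes down to a perspective divide by depth. A minimal C++ sketch, assuming a pinhole camera at the origin looking along +z, a hypothetical focal length `f` in pixels, and a canvas of `w` by `h` pixels with (0, 0) at the top-left corner; the question's own Scala code is not shown, so these conventions are illustrative.

```cpp
#include <cassert>
#include <optional>

struct Point3 { double x, y, z; };
struct Point2 { double x, y; };

// Pinhole perspective projection under the assumed conventions above.
// Returns no value for points at or behind the camera plane.
std::optional<Point2> project(Point3 p, double f, double w, double h) {
    if (p.z <= 0.0) return std::nullopt;  // not in front of the camera
    double sx = f * p.x / p.z;            // perspective divide
    double sy = f * p.y / p.z;
    // Center the image on the canvas and flip y so that +y points up.
    return Point2{w / 2.0 + sx, h / 2.0 - sy};
}
```

For example, with `f = 100` on a 400 by 300 canvas, the point (1, 1, 2) lands at pixel (250, 100).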
2 votes, 0 answers

Optimization Strategies for FFT sound transformations using GPGPU

I want to run audio FFT transformations on a GPU, possibly using OpenCL. What are the best optimization strategies for: converting audio signals with an FFT; transferring them to the graphics card; computing them; and returning the FFT output to the audio signal domain if…
tmm88 (21 reputation)
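Before porting anything to OpenCL, a CPU reference for the forward transform and its inverse is useful for checking GPU results. The sketch below is a textbook recursive radix-2 Cooley-Tukey FFT in C++, assuming the block length is a power of two; it is a correctness reference, not an optimized or GPU-oriented implementation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT. a.size() must be a power of two.
// invert = true computes the inverse transform, including the 1/N scale
// (applied as a factor of 1/2 at each of the log2(N) levels).
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double sign = invert ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        cd w = std::polar(1.0, sign * 2.0 * pi * k / n);  // twiddle factor
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
        if (invert) {
            a[k] /= 2.0;
            a[k + n / 2] /= 2.0;
        }
    }
}
```

Running `fft(x, false)` followed by `fft(x, true)` returns the original samples up to rounding error. GPU FFTs are typically written in an iterative, in-place form instead, not least because OpenCL C kernels cannot recurse.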
1 vote, 0 answers

Why does the per-multiprocessor cache working set for texture memory on NVIDIA GPUs have a variable size?

I saw it here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability I don't know whether the same happens on AMD.
alvaro9650 (11 reputation)
1 vote, 1 answer

Is `groupshared` memory stored in L2 cache of GPU?

The article says that the L1 cache is shared by work items in the same work group (aka SM) and the L2 cache is shared by different work groups. In Direct3D, it seems that a thread group (which is specified by numthreads(x, y, z)) is mapped to multi work…
Cu2S (167 reputation)
1 vote, 0 answers

Why don't discretization errors occur with compute-shaded kernel filters?

An efficient compute-shaded image filter would be dispatched with (screenX / [kernel width], screenY / [kernel height], 1) groups and one kernel in each group, allowing texels to pass into groupshared memory and reducing the total number of samples at…
Paul Ferris (437 reputation)
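One place where discretization can creep into this dispatch is the group count itself: `screenX / [kernel width]` with integer division drops the remainder, so a partial tile at the right or bottom edge would never be processed. A common approach is to round the group count up and have the shader skip threads whose texel lies outside the image. A minimal C++ sketch of the host-side arithmetic; `GroupCount`, `div_round_up`, and `groups_for_image` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct GroupCount { std::uint32_t x, y, z; };

// Integer ceiling division: smallest n with n * divisor >= value.
constexpr std::uint32_t div_round_up(std::uint32_t value, std::uint32_t divisor) {
    return (value + divisor - 1) / divisor;
}

// Number of thread groups needed so every pixel of a width x height image
// is covered by a tile of tile_w x tile_h threads. Threads whose pixel
// falls outside the image must be skipped inside the shader.
GroupCount groups_for_image(std::uint32_t width, std::uint32_t height,
                            std::uint32_t tile_w, std::uint32_t tile_h) {
    return {div_round_up(width, tile_w), div_round_up(height, tile_h), 1};
}
```

For a 1920 by 1080 image with 16 by 16 tiles, this gives 120 by 68 groups, whereas truncating division would give 120 by 67 and leave the bottom 8 rows of pixels unprocessed.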