OpenCL Copy-Once Share a lot
I am implementing a solution using OpenCL and want to do the following: I have a large array of data that I want to copy to the GPU once, and then have many kernels process batches of it and store their results in specific output buffers.
The actual question is which way is faster: enqueue each kernel with only the portion of the array it needs, or pass the whole array beforehand and let each kernel (in the same context) process its required batch, since they would all have the same address space and could each map the array concurrently? Of course, the array is read-only but not constant, since it changes every time I execute the kernel(s) (so I could cache it in a global memory buffer).
Also, if the second way is faster, could you point me in the right direction on how it could be implemented? I haven't found anything concrete yet (although I am still searching :)).
Cheers.
I normally use the second method. Sharing memory is easy: you just pass the same buffer to each kernel. I do this in my real-time ray tracer, where I render with one kernel and post-process (image processing) with another.
Using the C++ bindings, it looks something like this:
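    // One buffer shared by both kernels. CL_MEM_READ_WRITE is used because the
    // render kernel writes it and the post-process kernel reads it back.
    cl_input_mem = cl::Buffer(context, CL_MEM_READ_WRITE, sizeof(cl_uchar4) * npixels, NULL, &err);
    kernel_render.setArg(0, cl_input_mem);
    kernel_postprocess.setArg(0, cl_input_mem);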
If you want a kernel to operate on a different segment of the array/memory, you can pass an offset value as a kernel argument and add it to, for example, the global memory pointer in each kernel.
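A minimal sketch of the offset idea, assuming a hypothetical kernel named `scale_batch` that doubles each element of its batch. The OpenCL C source appears only in a comment; the function `run_scale_batch` is a CPU stand-in for launching that kernel, so the indexing can be checked without a device. Neither name comes from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenCL C kernel that this sketch models (never compiled here):
//
//   __kernel void scale_batch(__global const float* data,  // whole shared array
//                             const uint offset,           // first element of the batch
//                             __global float* out)         // per-batch output buffer
//   {
//       const size_t gid = get_global_id(0);
//       out[gid] = 2.0f * data[offset + gid];
//   }

// CPU stand-in for enqueueing scale_batch with a global work size of `count`.
// Each loop iteration plays the role of one work-item.
std::vector<float> run_scale_batch(const std::vector<float>& data,
                                   std::size_t offset,
                                   std::size_t count) {
    assert(offset + count <= data.size());  // the batch must lie inside the array
    std::vector<float> out(count);
    for (std::size_t gid = 0; gid < count; ++gid) {
        out[gid] = 2.0f * data[offset + gid];  // same indexing as the kernel
    }
    return out;
}
```

With an 8-element array, `run_scale_batch(data, 0, 4)` and `run_scale_batch(data, 4, 4)` process the two halves independently while reading the same input. On a real device the offset would presumably be set with `setArg`, just like the shared buffer above.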
I would use the first method if the array (actually the sum of all buffers, including the output) does not fit in memory. Another reason to use the first method is if you're running on multiple devices. In my ray tracer I use the first method when I render on multiple devices. For example, I have one GTX 580 render the upper half of the screen and the other GTX 580 render the lower half (actually I do this dynamically, so one device may render 30% while the other renders 70%, but that's beside the point). I have each device render its fraction of the output and then assemble the output on the CPU. With PCIe 3.0, the transfer back and forth between CPU and GPU (multiple times) has a negligible effect on the frame rate for 1920x1080 images.
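The split-and-assemble step can be sketched on the host side as follows. This is only an illustration under assumptions: `RowRange`, `split_rows` and `assemble_frame` are hypothetical names, the weights are integer shares (for example 30 and 70), and a pixel is simplified to one byte. The actual per-device rendering and the OpenCL transfers are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rows [first_row, first_row + row_count) of the frame, assigned to one device.
struct RowRange {
    std::size_t first_row;
    std::size_t row_count;
};

// Split `height` rows among devices in proportion to integer `weights`.
// Boundaries come from cumulative weights, so the ranges cover every row once.
std::vector<RowRange> split_rows(std::size_t height,
                                 const std::vector<unsigned>& weights) {
    std::size_t total = 0;
    for (unsigned w : weights) total += w;
    assert(total > 0);

    std::vector<RowRange> ranges;
    std::size_t cumulative = 0;
    std::size_t start = 0;
    for (unsigned w : weights) {
        cumulative += w;
        const std::size_t end = height * cumulative / total;  // last end == height
        ranges.push_back({start, end - start});
        start = end;
    }
    return ranges;
}

// Copy each device's rendered rows into its place in the full frame (on the CPU).
std::vector<unsigned char> assemble_frame(
        std::size_t width, std::size_t height,
        const std::vector<RowRange>& ranges,
        const std::vector<std::vector<unsigned char>>& parts) {
    assert(ranges.size() == parts.size());
    std::vector<unsigned char> frame(width * height);
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        assert(parts[i].size() == ranges[i].row_count * width);
        const auto dest = static_cast<std::ptrdiff_t>(ranges[i].first_row * width);
        std::copy(parts[i].begin(), parts[i].end(), frame.begin() + dest);
    }
    return frame;
}
```

For a 1080-row image, `split_rows(1080, {30, 70})` gives the first device rows 0 to 323 and the second device the remaining 756 rows. Changing the weights between frames would correspond to the dynamic balancing described above.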