id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc
28,reduced performance of small types,tmcdonell,,"CUDA devices do not coalesce global memory transfers of 8- and 16-bit types. Rather than providing alternate skeletons that process multiple elements per thread (vec4 and vec2 types respectively), we may be able to promote these accesses to 32-bit transactions and mask off the irrelevant data. Similar issues exist for shared memory bank conflicts.",defect,new,minor,,CUDA backend,0.8.1.0,,,
