Ticket #28 (new defect)

Opened 7 years ago

Last modified 7 years ago

reduced performance of small types

Reported by: tmcdonell
Owned by:
Priority: minor
Milestone:
Component: CUDA backend
Version:
Keywords:
Cc:


CUDA devices do not coalesce global memory transfers of 8- and 16-bit types. Instead of providing alternate skeletons that process multiple elements per thread (using vec4 and vec2 types, respectively), we may be able to promote these accesses to 32-bit transactions and mask off the irrelevant data. Similar issues exist with shared memory bank conflicts.
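A minimal host-side sketch of the "promote to 32-bit and mask" idea, written in plain C so the arithmetic can be checked in isolation. It is not code from the Accelerate CUDA backend; the helper names `pack_to_words`, `load_u8_via_u32` and `load_u16_via_u32` are hypothetical. It assumes little-endian byte order (as on NVIDIA GPUs and x86 hosts) and word-aligned arrays padded to a whole number of 32-bit words. In a kernel, each thread would issue one 32-bit load for its element and then shift and mask out the neighbouring bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an array of small elements into a zero-padded
   array of 32-bit words, mimicking how it sits in word-aligned memory. */
static void pack_to_words(uint32_t *words, size_t nwords,
                          const void *src, size_t nbytes)
{
    assert(nbytes <= nwords * sizeof(uint32_t));
    memset(words, 0, nwords * sizeof(uint32_t));
    memcpy(words, src, nbytes);
}

/* Read 8-bit element i using a single 32-bit load, then shift and mask
   away the three neighbouring bytes. Assumes little-endian byte order. */
static uint8_t load_u8_via_u32(const uint32_t *words, size_t i)
{
    uint32_t w = words[i / 4];                /* whole-word load */
    unsigned shift = (unsigned)(i % 4) * 8u;  /* bit offset of byte i */
    return (uint8_t)((w >> shift) & 0xFFu);
}

/* Same idea for 16-bit elements: two elements per 32-bit word. */
static uint16_t load_u16_via_u32(const uint32_t *words, size_t i)
{
    uint32_t w = words[i / 2];                /* whole-word load */
    unsigned shift = (unsigned)(i % 2) * 16u; /* bit offset of half i */
    return (uint16_t)((w >> shift) & 0xFFFFu);
}
```

Loads are the easy case. Stores would likely need more care: neighbouring threads share a 32-bit word, so a masked read-modify-write from several threads would race unless it used atomics or one thread owned each whole word, which is effectively the vec4/vec2 approach.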

Change History

Changed 7 years ago by tmcdonell

  • owner tmcdonell deleted

Changed 7 years ago by chak

  • version changed from to