Ticket #28 (new defect)
reduced performance of small types
| Reported by: | tmcdonell | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | CUDA backend | Version: | 0.8.1.0 |
| Keywords: | | Cc: | |
Description
CUDA devices do not coalesce global memory transfers of 8- and 16-bit types. Rather than providing alternate skeletons that process multiple elements per thread (vec4 and vec2 types, respectively), we may be able to promote these accesses to 32-bit transactions and mask off the irrelevant data. Similar issues exist for shared memory bank conflicts.
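A minimal sketch of the promote-and-mask idea for 8-bit elements, written as host-side C++ so it can be run and checked without a device. It is not taken from the backend: the helper names `load_u8_via_word` and `store_u8_via_word` are hypothetical, and the byte-lane order (element 0 in the low byte) is an assumption. The same shifts and masks would apply in a kernel, with each access to `words` becoming a single 32-bit transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read the 8-bit element at logical index i from a
// buffer that is only ever touched as aligned 32-bit words.
inline std::uint8_t load_u8_via_word(const std::uint32_t* words, std::size_t i) {
    assert(words != nullptr);
    const std::uint32_t word  = words[i / 4];                       // one 32-bit load
    const unsigned      shift = static_cast<unsigned>(i % 4) * 8u;  // byte lane within the word
    return static_cast<std::uint8_t>((word >> shift) & 0xFFu);      // mask off the other three bytes
}

// Hypothetical helper: write an 8-bit element with a 32-bit read-modify-write.
inline void store_u8_via_word(std::uint32_t* words, std::size_t i, std::uint8_t value) {
    assert(words != nullptr);
    const unsigned      shift = static_cast<unsigned>(i % 4) * 8u;
    const std::uint32_t mask  = 0xFFu << shift;                     // selects the target byte lane
    const std::uint32_t old   = words[i / 4];                       // one 32-bit load
    words[i / 4] = (old & ~mask)                                    // keep the neighbouring bytes
                 | (static_cast<std::uint32_t>(value) << shift);    // one 32-bit store
}
```

The load side is straightforward, but the store is a non-atomic read-modify-write: if several threads each wrote a different byte of the same word this way, the updates would race. A real kernel would likely need one thread to own each word, which is close to the multiple-elements-per-thread approach this ticket hopes to avoid.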
Change History
