Commit 0e5e07b2, authored by Justin Luitjens, committed by Daniel Povey

[src] Add interfaces to nnet-batch-compute that expect device input. (#3311)

This avoids ping-ponging data between device and host memory.

The implementation now assumes device memory.  Interfaces that receive
host data will allocate device memory and copy the data to it.

Add a CUDA matrix copy function that clamps row indices.  This is much
faster than copying one row at a time, and the kernel can handle the
clamping for free.
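
For illustration only, the semantics of a row-clamped copy can be sketched
as a CPU reference in C++.  This is not the code from this commit: the
function name CopyRowsClamped, its parameters, and the flat row-major
layout are assumptions made for the example.  A GPU kernel would
presumably have each thread compute the clamped source row itself, which
would explain why the clamping adds no cost over a plain copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a row-clamped copy (hypothetical helper, not Kaldi's API).
// src is a row-major matrix with num_cols columns.  For each logical row r in
// [start, end), the source row index is clamped into [clamp_low, clamp_high]
// and that source row is copied to output row (r - start).
std::vector<float> CopyRowsClamped(const std::vector<float> &src,
                                   int num_cols, int start, int end,
                                   int clamp_low, int clamp_high) {
  assert(num_cols > 0 && start <= end && clamp_low <= clamp_high);
  const int src_rows = static_cast<int>(src.size()) / num_cols;
  assert(clamp_low >= 0 && clamp_high < src_rows);

  std::vector<float> dst(static_cast<std::size_t>(end - start) * num_cols);
  for (int r = start; r < end; ++r) {
    // Out-of-range logical rows map to the nearest valid source row.
    const int src_row = std::min(std::max(r, clamp_low), clamp_high);
    std::copy(src.begin() + static_cast<std::ptrdiff_t>(src_row) * num_cols,
              src.begin() + static_cast<std::ptrdiff_t>(src_row + 1) * num_cols,
              dst.begin() + static_cast<std::ptrdiff_t>(r - start) * num_cols);
  }
  return dst;
}
```

With a 3-row source, calling CopyRowsClamped(src, num_cols, -2, 5, 0, 2)
yields 7 output rows: the first source row repeated for logical rows -2
and -1, the three source rows in order, and the last source row repeated
for logical rows 3 and 4.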
parent 9e0a7f60