- Aug 30, 2016
Shiyin Kang authored
Shiyin Kang authored
Speed of CuMatrix::LogSoftmax in gigaflops, new vs. old:

  <float>:
    dim     new         old
    16      0.0138019   0.0133804
    32      0.056202    0.052121
    64      0.227829    0.186255
    128     0.65638     0.65072
    256     2.15268     1.64888
    512     5.1179      3.85136
    1024    10.8209     6.76963

  <double>:
    dim     new         old
    16      0.0133584   0.011373
    32      0.0533796   0.0528196
    64      0.202721    0.170107
    128     0.627234    0.722198
    256     1.89987     1.44478
    512     4.14807     3.37973
    1024    6.70849     4.96657
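For reference, the kernel being benchmarked computes a numerically stable log-softmax over each matrix row, y = x - max(x) - log(sum(exp(x - max(x)))). A minimal CPU-side sketch of that math (illustration only, not the optimized CUDA kernel this commit changes; the function name is hypothetical):

```cpp
#include <cmath>
#include <vector>

// Per-row log-softmax in its numerically stable form:
//   y_j = x_j - m - log(sum_k exp(x_k - m)),  where m = max_k x_k.
// Assumes every row is non-empty.
void LogSoftmaxPerRow(std::vector<std::vector<double> > *mat) {
  for (std::vector<double> &row : *mat) {
    double m = row[0];
    for (double v : row)
      if (v > m) m = v;
    double sum = 0.0;
    for (double v : row)
      sum += std::exp(v - m);
    double log_sum = std::log(sum);
    for (double &v : row)
      v = v - m - log_sum;
  }
}
```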
- Aug 27, 2016
Shiyin Kang authored
- Aug 26, 2016
Shiyin Kang authored
Shiyin Kang authored
- Aug 24, 2016
Shiyin Kang authored
Shiyin Kang authored
Shiyin Kang authored
- Aug 23, 2016
Shiyin Kang authored
Shiyin Kang authored
- Aug 21, 2016
Daniel Povey authored
- Aug 17, 2016
freewym authored
- Aug 04, 2016
Daniel Povey authored
Minor fix to queue.pl regarding its workaround for bugs in NFS; change when the CuDevice warning about device exclusive mode is printed.
- Jul 30, 2016
Shiyin Kang authored
Speed of CuVector::AddDiagMatMat in gigaflops; columns are dim, old, new, speedup (new/old):

  <double>[no-trans],[no-trans]:
    16      0.0138   0.0172   1.24x
    32      0.0581   0.0646   1.11x
    64      0.2201   0.2271   1.03x
    128     0.7907   0.7302   0.92x
    256     1.9197   2.0379   1.06x
    512     3.8760   3.9739   1.03x
    1024    5.3297   7.2730   1.36x
    2048    4.7379   7.2775   1.54x
    4096    4.1652   8.7746   2.11x
    8192    2.7393   9.6129   3.51x

  <double>[trans],[trans]:
    16      0.0137   0.0175   1.28x
    32      0.0576   0.0639   1.11x
    64      0.2209   0.2254   1.02x
    128     0.8055   0.7418   0.92x
    256     1.9017   2.0358   1.07x
    512     3.8703   3.9644   1.02x
    1024    5.2985   7.3149   1.38x
    2048    4.9325   7.2759   1.48x
    4096    4.1638   8.7515   2.10x
    8192    2.6703   9.6149   3.60x

  <float>[no-trans],[no-trans]:
    16      0.0137   0.0174   1.28x
    32      0.0576   0.0614   1.07x
    64      0.2150   0.2367   1.10x
    128     0.8098   0.7457   0.92x
    256     1.9851   2.1878   1.10x
    512     4.1400   4.3129   1.04x
    1024    6.2485   8.0504   1.29x
    2048    6.7869   12.2660  1.81x
    4096    5.8144   12.1037  2.08x
    8192    3.2519   15.0645  4.63x

  <float>[trans],[trans]:
    16      0.0137   0.0180   1.31x
    32      0.0568   0.0672   1.18x
    64      0.2193   0.2263   1.03x
    128     0.8132   0.7751   0.95x
    256     1.9621   2.1918   1.12x
    512     4.2527   4.3181   1.02x
    1024    6.3149   8.0543   1.28x
    2048    6.7934   12.3520  1.82x
    4096    5.8246   12.0940  2.08x
    8192    3.2314   15.0555  4.66x

Reformat code.
Shiyin Kang authored
Speed of CuVector::AddDiagMatMat in gigaflops; columns are dim, old, new, speedup (new/old):

  <float>[trans],[no-trans]:
    16      0.0150   0.0172   1.15x
    32      0.0593   0.0666   1.12x
    64      0.2161   0.2533   1.17x
    128     0.6925   0.9069   1.31x
    256     1.7409   2.9110   1.67x
    512     3.5518   6.7235   1.89x
    1024    5.5328   13.3136  2.41x

  <double>[trans],[no-trans]:
    16      0.0157   0.0179   1.14x
    32      0.0578   0.0693   1.20x
    64      0.2088   0.2620   1.25x
    128     0.7430   0.9503   1.28x
    256     1.7494   3.0979   1.77x
    512     3.0646   6.1060   1.99x
    1024    5.0206   9.4023   1.87x
Shiyin Kang authored
Speed of CuVector::AddDiagMatMat in gigaflops; columns are dim, old, new, speedup (new/old):

  <float>[no-trans],[trans]:
    16      0.0132   0.0188   1.42x
    32      0.0563   0.0738   1.31x
    64      0.2220   0.2846   1.28x
    128     0.8277   0.9890   1.19x
    256     3.4564   3.3012   0.96x
    512     7.8546   8.6339   1.10x
    1024    14.4238  16.4371  1.14x

  <double>[no-trans],[trans]:
    16      0.0138   0.0175   1.27x
    32      0.0561   0.0715   1.27x
    64      0.2280   0.2765   1.21x
    128     0.9059   0.9130   1.01x
    256     3.2346   2.9633   0.92x
    512     5.7313   6.6734   1.16x
    1024    9.2105   10.1042  1.10x
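For context, AddDiagMatMat accumulates the diagonal of a matrix product into a vector: in the [no-trans],[no-trans] case, v(i) = beta * v(i) + alpha * sum_k M(i,k) * N(k,i). A plain reference sketch of that case (raw row-major arrays, hypothetical names, not the CUDA kernels benchmarked above; the transposed variants only change the indexing):

```cpp
// v(i) = beta * v(i) + alpha * sum_k M(i,k) * N(k,i), i.e. v += alpha * diag(M * N).
// M is rows x inner, N is inner x rows, both row-major with the given strides.
void AddDiagMatMatRef(double alpha, const double *M, int m_stride,
                      const double *N, int n_stride, int rows, int inner,
                      double beta, double *v) {
  for (int i = 0; i < rows; i++) {
    double sum = 0.0;
    for (int k = 0; k < inner; k++)
      sum += M[i * m_stride + k] * N[k * n_stride + i];
    v[i] = beta * v[i] + alpha * sum;
  }
}
```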
- Jul 24, 2016
Ke Li authored
- Jul 16, 2016
Shiyin Kang authored
Bench result for CuMatrix::DiffGroupPnorm, in gigaflops; columns are dim, new, old, speedup (new/old):

  <float>:
    16      0.019    0.009   2.11x
    32      0.074    0.036   2.06x
    64      0.297    0.142   2.10x
    128     1.142    0.520   2.20x
    256     3.442    1.553   2.22x
    512     6.856    2.943   2.33x
    1024    11.653   3.915   2.98x
    2048    13.812   4.263   3.24x
    4096    14.431   4.381   3.29x

  <double>:
    16      0.019    0.009   2.17x
    32      0.073    0.033   2.20x
    64      0.296    0.133   2.22x
    128     1.068    0.457   2.34x
    256     2.999    1.159   2.59x
    512     4.921    1.705   2.89x
    1024    6.932    1.993   3.48x
    2048    7.499    2.087   3.59x
    4096    7.684    2.104   3.65x

Also: fix bug; add unit test for diff group pnorm (easy test for now, then back to full test); fix p = inf for MatrixBase::GroupPnormDeriv.
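For context, the group p-norm of a group x_1..x_g is y = (sum_j |x_j|^p)^(1/p); its derivative is dy/dx_j = x_j * |x_j|^(p-2) / y^(p-1) for finite p, and for p = inf (where y = max_j |x_j|) the gradient flows only through the maximal element(s). A rough per-group sketch of that derivative (illustration only; the actual GroupPnormDeriv works on whole matrices and its handling of ties at p = inf may differ):

```cpp
#include <cmath>

// dy/dx_j for one group, where y = (sum_j |x_j|^p)^(1/p).
// Illustration only; not the actual MatrixBase::GroupPnormDeriv code.
void GroupPnormDerivSketch(const double *x, int group_size, double p,
                           double y, double *deriv) {
  for (int j = 0; j < group_size; j++) {
    double ax = std::fabs(x[j]);
    if (y == 0.0 || ax == 0.0) {
      deriv[j] = 0.0;
    } else if (std::isinf(p)) {
      // p = inf: y is the group max of |x_j|; only maximal elements get gradient.
      deriv[j] = (ax == y) ? (x[j] > 0.0 ? 1.0 : -1.0) : 0.0;
    } else {
      // Finite p: dy/dx_j = x_j * |x_j|^(p-2) / y^(p-1).
      deriv[j] = x[j] * std::pow(ax, p - 2.0) / std::pow(y, p - 1.0);
    }
  }
}
```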
Shiyin Kang authored
Use standard inf; delete TODO.
Shiyin Kang authored
fix bug
Daniel Povey authored
Add script to automatically put the Kaldi libraries we link with in the right order; use it to modify the Makefiles. Minor top-level Makefile fix.
- Jul 15, 2016
Yiming Wang authored
Shrink in-value of ClipGradientComponent toward some smaller value when clipping proportion exceeds some threshold (#803) (also minor bug fix in profiling in cu-vector.cc)
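The idea in #803, roughly: the component clips each backpropagated gradient element to a fixed range, and when the proportion of clipped elements in a minibatch exceeds a threshold it also shrinks the corresponding input values toward smaller magnitudes. A hedged sketch of that decision logic only (the names, signature and exact shrinkage rule are illustrative, not the actual ClipGradientComponent code):

```cpp
#include <cmath>
#include <cstddef>

// Clip each gradient element to [-clip_threshold, clip_threshold] and return
// the scale (<= 1.0) to apply to the corresponding input values: 1.0 normally,
// shrink_scale when the clipping proportion exceeded the allowed limit.
double ClipGradientsAndGetInputScale(double *grad, std::size_t n,
                                     double clip_threshold,      // max allowed |gradient|
                                     double clip_proportion_max, // e.g. 0.05 (illustrative)
                                     double shrink_scale) {      // e.g. 0.99 (illustrative)
  std::size_t num_clipped = 0;
  for (std::size_t i = 0; i < n; i++) {
    if (std::fabs(grad[i]) > clip_threshold) {
      grad[i] = (grad[i] > 0.0 ? clip_threshold : -clip_threshold);
      num_clipped++;
    }
  }
  double clipped_proportion = static_cast<double>(num_clipped) / n;
  return (clipped_proportion > clip_proportion_max) ? shrink_scale : 1.0;
}
```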
- Jul 08, 2016
Shiyin Kang authored
Speed of CuMatrix::Softmax in gigaflops, new vs. old:

  <float>:
    dim     new         old
    16      0.0153621   0.0138999
    32      0.0614275   0.0507328
    64      0.235765    0.203548
    128     0.729239    0.725481
    256     2.30126     1.71863
    512     5.0565      3.69659
    1024    10.2482     6.38335

  <double>:
    dim     new         old
    16      0.0143354   0.013143
    32      0.0590478   0.0495458
    64      0.228611    0.193465
    128     0.668961    0.676449
    256     2.1013      1.51862
    512     4.13055     3.1547
    1024    6.43429     5.02974

Minor changes.
- Jun 26, 2016
Shiyin Kang authored
- Jun 25, 2016
Shiyin Kang authored
Speed of CuMatrix::DiffSoftmaxPerRow in gigaflops, new vs. old:

  <float>:
    dim     new         old
    16      0.0165568   0.00355242
    32      0.0678791   0.0145515
    64      0.24739     0.0583246
    128     0.898427    0.225076
    256     2.89009     0.834096
    512     6.72164     1.92722
    1024    10.4916     2.78281

  <double>:
    dim     new         old
    16      0.0148584   0.00260567
    32      0.0586865   0.0121077
    64      0.22893     0.0527767
    128     0.763462    0.175736
    256     2.40457     0.58351
    512     4.55165     1.42464
    1024    4.36421     1.94971
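For reference, the standard per-row softmax backprop behind this kernel is: given the softmax output y and the derivative e at the output, the derivative at the input is d_j = y_j * (e_j - sum_k e_k * y_k). A one-row sketch in plain C++ (hypothetical helper, illustration of the math only):

```cpp
#include <cstddef>
#include <vector>

// Softmax backprop for one row: d[j] = y[j] * (e[j] - dot(e, y)),
// where y is the softmax output and e is the derivative w.r.t. the output.
std::vector<double> DiffSoftmaxRow(const std::vector<double> &y,
                                   const std::vector<double> &e) {
  double dot = 0.0;
  for (std::size_t k = 0; k < y.size(); k++) dot += e[k] * y[k];
  std::vector<double> d(y.size());
  for (std::size_t j = 0; j < y.size(); j++) d[j] = y[j] * (e[j] - dot);
  return d;
}
```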
Shiyin Kang authored
Shiyin Kang authored
- Jun 23, 2016
Dan Povey authored
Modify error message and documentation to suggest nvidia-smi -c 3 (process exclusive mode), as thread exclusive mode is now deprecated.
- Jun 08, 2016
Shiyin Kang authored
stronger unit test
Shiyin Kang authored
Speed of CuMatrix::GroupPnorm in gigaflops, new vs. old:

  <float>:
    dim     new         old
    16      0.014416    0.0138561
    32      0.0616648   0.0542906
    64      0.241291    0.213442
    128     0.869675    0.821949
    256     3.07193     2.90466
    512     8.8404      6.48644
    1024    16.7489     9.3791

  <double>:
    dim     new         old
    16      0.0159731   0.0101083
    32      0.0605624   0.0393037
    64      0.249944    0.153672
    128     0.840825    0.598191
    256     3.13722     1.78274
    512     6.86864     2.96384
    1024    12.5614     3.79237
Shiyin Kang authored
Loop unroll by template; generalize to group transform-reduce; _transform_reduce for vec, mat-col and group. Fix min bug; fix bug; fix template param bug.
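On "loop unroll by template": making the reduction size a compile-time template parameter lets the compiler fully unroll the halving loop. A simplified host-side illustration of the idea (the actual change is inside the CUDA transform-reduce kernels, which this sketch does not reproduce):

```cpp
// Tree reduction whose trip counts are known at compile time, so the compiler
// can fully unroll it. N must be a power of two. Host-side illustration only.
template <int N>
inline double UnrolledSum(const double *buf) {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");
  double tmp[N];
  for (int i = 0; i < N; i++) tmp[i] = buf[i];
  for (int stride = N / 2; stride > 0; stride /= 2)  // unrollable: N is a constant
    for (int i = 0; i < stride; i++)
      tmp[i] += tmp[i + stride];
  return tmp[0];
}

// Example (hypothetical): double s = UnrolledSum<256>(data);  // data has >= 256 elements
```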
- Jun 04, 2016
Shiyin Kang authored
Good performance on large group sizes (>10). Speed of CuMatrix::GroupMax in gigaflops, new vs. old:

  GroupMax<float>:
    dim     new         old
    16      0.0190836   0.0193129
    32      0.0791846   0.0768508
    64      0.311131    0.299519
    128     1.13589     1.14847
    256     4.22264     3.92072
    512     12.2629     10.0812
    1024    21.6979     16.5123

  GroupMax (all group sizes)<float>:
    dim     new         old
    16      0.0188551   0.0163827
    32      0.0701613   0.0620238
    64      0.271106    0.215268
    128     0.931745    0.723582
    256     3.53189     1.9751
    512     9.95109     3.91183
    1024    17.2099     4.92671

  GroupMax<double>:
    dim     new         old
    16      0.0199497   0.0148693
    32      0.079538    0.0718237
    64      0.314509    0.237838
    128     1.08104     0.788395
    256     3.7741      2.87856
    512     8.65988     5.87111
    1024    14.0373     8.88655

  GroupMax (all group sizes)<double>:
    dim     new         old
    16      0.0174585   0.0136057
    32      0.0694617   0.0500527
    64      0.265809    0.177945
    128     0.973417    0.588654
    256     3.43166     1.57864
    512     8.26032     3.14173
    1024    12.1338     3.05406

Fix typo; rename and comment.
Shiyin Kang authored
fix space
- May 30, 2016
Shiyin Kang authored
Speed of CuMatrix::DivRowsVec in gigaflops, new vs. old:

  <float>:
    dim     new         old
    16      0.0180391   0.017677
    32      0.0686798   0.0682798
    64      0.290613    0.273113
    128     1.12576     1.08792
    256     3.79354     3.48151
    512     9.247       8.70703
    1024    16.535      12.8467
    2048    21.0912     14.6946
    4096    21.8187     15.1197
    8192    20.9238     15.2273

  <double>:
    dim     new         old
    16      0.0171395   0.0173988
    32      0.0708914   0.0745867
    64      0.302615    0.279866
    128     1.12123     1.15183
    256     3.73959     3.61588
    512     6.75394     6.86088
    1024    10.2967     9.63553
    2048    11.3301     10.9322
    4096    11.063      10.7829
    8192    10.6967     10.6246
Shiyin Kang authored
- May 29, 2016
Shiyin Kang authored
New sum (log output from TestCuMatrixSum(), cu-matrix-speed-test.cc:57); speed in gigaflops:

  CuMatrix::TestCuMatrixSum<float>:
    dim     speed        result
    16      0.00954969   26.2034
    32      0.0400381    17.9455
    64      0.152595     31.6159
    128     0.546459     81.3117
    256     1.94432      572.224
    512     6.23377      39.6669
    1024    16.0119      518.841

  CuMatrix::TestCuMatrixSum<double>:
    dim     speed        result
    16      0.00916145   -2.71724
    32      0.0366853    43.261
    64      0.144912     30.3323
    128     0.501765     -152.665
    256     1.83353      -355.256
    512     5.609        744.185
    1024    12.6693      -857.049
Shiyin Kang authored
Old sum (log output from TestCuMatrixSum(), cu-matrix-speed-test.cc:57); speed in gigaflops:

  CuMatrix::TestCuMatrixSum<float>:
    dim     speed        result
    16      0.00340611   26.2034
    32      0.0141018    17.9455
    64      0.0575425    31.6159
    128     0.229418     81.3117
    256     0.778943     572.224
    512     3.11055      39.6668
    1024    7.50506      518.842

  CuMatrix::TestCuMatrixSum<double>:
    dim     speed        result
    16      0.00216499   -2.71724
    32      0.00863257   43.261
    64      0.0513208    30.3323
    128     0.20313      -152.665
    256     0.759338     -355.256
    512     2.60159      744.185
    1024    7.2258       -857.049
- May 28, 2016
Shiyin Kang authored
vec max/min/sum tests cover both long and short vector cases; use the smallest length threshold tested on different machines.
- May 27, 2016
Shiyin Kang authored
Add test to choose the minimum length of vectors to be reduced on the GPU.

Speed of CuVector::Sum in gigaflops (log output from TestCuVectorSum(), cu-vector-speed-test.cc:72), new vs. old:

  dim       float new     float old     double new    double old
  16        0.000886179   0.000461866   0.00116685    0.000406633
  32        0.00119834    0.000936284   0.00229885    0.000836551
  64        0.00182674    0.00180461    0.00430313    0.00167463
  128       0.00721178    0.00350883    0.00840191    0.00338708
  256       0.0166563     0.00700597    0.0156417     0.00668978
  1024      0.0626621     0.0273135     0.051799      0.0253556
  2048      0.108495      0.0529984     0.09064       0.0510465
  4096      0.162914      0.0930953     0.122844      0.081494
  8192      0.248687      0.149376      0.241084      0.156451
  16384     0.491677      0.197131      0.468114      0.311666
  32768     0.931507      0.492249      0.859946      0.545834
  65536     1.75797       0.657485      1.53817       0.914985

Also: fix vector sum bug; delete old kernels; correct way for inline; only do this when we have CUDA.
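The point of the threshold test above: for short vectors, kernel-launch and transfer overhead makes the GPU reduction slower than simply summing on the host, so a minimum length is chosen below which the sum stays on the CPU. A sketch of that kind of dispatch (the threshold constant and names are placeholders, not the values the test selects):

```cpp
// Dispatch a vector sum to the CPU when the vector is too short for the GPU
// reduction to pay off. The threshold is machine-dependent and chosen
// empirically (see the speed test above); 4096 is only a placeholder.
double SumWithThreshold(const float *data, int dim,
                        double (*gpu_sum)(const float *, int)) {
  const int kMinLengthForGpu = 4096;  // hypothetical, tuned per machine
  if (dim < kMinLengthForGpu) {
    double sum = 0.0;
    for (int i = 0; i < dim; i++) sum += data[i];  // host-side reduction
    return sum;
  }
  return gpu_sum(data, dim);  // caller-provided GPU reduction
}
```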
- May 19, 2016
Shiyin Kang authored