#include "plugin/device/gpu/kernel/random/random_poisson_gpu_kernel.h"
#include <functional>
#include <utility>
#include <memory>
#include <string>
#include <algorithm>
#include <numeric>  // std::accumulate
#include <vector>
#include <map>
#include "ir/anf.h"
#include "utils/log_adapter.h"
#include "kernel/common_utils.h"
#include "include/cuda_fp16.h"

namespace mindspore {
namespace kernel {
namespace {
using KernelRunFunc = RandomPoissonGpuKernelMod::KernelRunFunc;
#define ADD_KERNEL(shape_dtype, rate_dtype, output_dtype, rate_type, output_type) \
  {                                                                               \
    KernelAttr()                                                                  \
      .AddInputAttr(kNumberType##shape_dtype)                                     \
      .AddInputAttr(kNumberType##rate_dtype)                                      \
      .AddOutputAttr(kNumberType##output_dtype),                                  \
      &RandomPoissonGpuKernelMod::LaunchKernel<rate_type, output_type>            \
  }
}  // namespace

bool RandomPoissonGpuKernelMod::Init(const BaseOperatorPtr &base_operator, const std::vector<KernelTensorPtr> &inputs,
                                     const std::vector<KernelTensorPtr> &outputs) {
  MS_EXCEPTION_IF_NULL(base_operator);
  kernel_name_ = base_operator->name();
  if (!MatchKernelFunc(base_operator, inputs, outputs)) {
    return false;
  }
  auto kernel_attr = GetKernelAttrFromTensors(inputs, outputs);
  unit_shape_size_ = abstract::TypeIdSize(kernel_attr.GetInputAttr(0).first);
  unit_rate_size_ = abstract::TypeIdSize(kernel_attr.GetInputAttr(1).first);
  unit_output_size_ = abstract::TypeIdSize(kernel_attr.GetOutputAttr(0).first);
  auto kernel_ptr = std::make_shared<ops::RandomPoisson>(base_operator->GetPrim());
  seed_ = static_cast<int64_t>(kernel_ptr->get_seed());
  seed2_ = static_cast<int64_t>(kernel_ptr->get_seed2());
  return true;
}

int RandomPoissonGpuKernelMod::Resize(const BaseOperatorPtr &base_operator, const std::vector<KernelTensorPtr> &inputs,
                                      const std::vector<KernelTensorPtr> &outputs,
                                      const std::map<uint32_t, tensor::TensorPtr> &) {
  for (const auto &input : inputs) {
    auto input_shape = input->GetShapeVector();
    if (!IsValidShape(input_shape)) {
      return KRET_UNKNOWN_SHAPE;
    }
  }
  ResetResource();
  // Fetch each shape once so that begin() and end() always refer to the same container,
  // which matters if GetDeviceShapeAdaptively() returns by value.
  const auto shape_dims = inputs.at(kIndex0)->GetDeviceShapeAdaptively();
  const auto rate_dims = inputs.at(kIndex1)->GetDeviceShapeAdaptively();
  const auto output_dims = outputs.at(kIndex0)->GetDeviceShapeAdaptively();
  std::vector<int64_t> shape_shape(shape_dims.begin(), shape_dims.end());
  std::vector<int64_t> rate_shape(rate_dims.begin(), rate_dims.end());
  std::vector<int64_t> output_shape(output_dims.begin(), output_dims.end());
  // Seed each product with an int64_t: a plain `1` would make std::accumulate carry the running product as int.
  int64_t shape_elements =
    std::accumulate(shape_shape.begin(), shape_shape.end(), int64_t{1}, std::multiplies<int64_t>());
  rate_elements_ = std::accumulate(rate_shape.begin(), rate_shape.end(), int64_t{1}, std::multiplies<int64_t>());
  output_elements_ = std::accumulate(output_shape.begin(), output_shape.end(), int64_t{1}, std::multiplies<int64_t>());
  if (output_elements_ == 0) {
    is_null_input_ = true;
  }
  input_size_list_.emplace_back(shape_elements * unit_shape_size_);
  input_size_list_.emplace_back(rate_elements_ * unit_rate_size_);
  output_size_list_.emplace_back(output_elements_ * unit_output_size_);
  // One curandState per output element.
  workspace_size_list_.push_back(output_elements_ * sizeof(curandState));
  return KRET_OK;
}

template <typename R, typename T>
bool RandomPoissonGpuKernelMod::LaunchKernel(const std::vector<kernel::AddressPtr> &inputs,
                                             const std::vector<AddressPtr> &workspace,
                                             const std::vector<kernel::AddressPtr> &outputs) {
  R *rate_addr = GetDeviceAddress<R>(inputs, 1);
  T *output = GetDeviceAddress<T>(outputs, 0);
  curandState *devStates = nullptr;
  void *workspace_addr = GetDeviceAddress<void *>(workspace, 0);
  devStates = reinterpret_cast<curandState *>(workspace_addr);
  RandomPoisson(seed_, seed2_, devStates, rate_addr, rate_elements_, output, output_elements_,
                reinterpret_cast<cudaStream_t>(cuda_stream_));
  return true;
}

const std::vector<std::pair<KernelAttr, KernelRunFunc>> &RandomPoissonGpuKernelMod::GetFuncList() const {
  static const std::vector<std::pair<KernelAttr, KernelRunFunc>> func_list = {
    ADD_KERNEL(Int32, Float16, Float16, half, half),
    ADD_KERNEL(Int32, Float16, Float32, half, float),
    ADD_KERNEL(Int32, Float16, Float64, half, double),
    ADD_KERNEL(Int32, Float16, Int32, half, int),
    ADD_KERNEL(Int32, Float16, Int64, half, int64_t),

    ADD_KERNEL(Int32, Float32, Float16, float, half),
    ADD_KERNEL(Int32, Float32, Float32, float, float),
    ADD_KERNEL(Int32, Float32, Float64, float, double),
    ADD_KERNEL(Int32, Float32, Int32, float, int),
    ADD_KERNEL(Int32, Float32, Int64, float, int64_t),

    ADD_KERNEL(Int32, Float64, Float16, double, half),
    ADD_KERNEL(Int32, Float64, Float32, double, float),
    ADD_KERNEL(Int32, Float64, Float64, double, double),
    ADD_KERNEL(Int32, Float64, Int32, double, int),
    ADD_KERNEL(Int32, Float64, Int64, double, int64_t),

    ADD_KERNEL(Int32, Int32, Float16, int, half),
    ADD_KERNEL(Int32, Int32, Float32, int, float),
    ADD_KERNEL(Int32, Int32, Float64, int, double),
    ADD_KERNEL(Int32, Int32, Int32, int, int),
    ADD_KERNEL(Int32, Int32, Int64, int, int64_t),

    ADD_KERNEL(Int32, Int64, Float16, int64_t, half),
    ADD_KERNEL(Int32, Int64, Float32, int64_t, float),
    ADD_KERNEL(Int32, Int64, Float64, int64_t, double),
    ADD_KERNEL(Int32, Int64, Int32, int64_t, int),
    ADD_KERNEL(Int32, Int64, Int64, int64_t, int64_t),

    ADD_KERNEL(Int64, Float16, Float16, half, half),
    ADD_KERNEL(Int64, Float16, Float32, half, float),
    ADD_KERNEL(Int64, Float16, Float64, half, double),
    ADD_KERNEL(Int64, Float16, Int32, half, int),
    ADD_KERNEL(Int64, Float16, Int64, half, int64_t),

    ADD_KERNEL(Int64, Float32, Float16, float, half),
    ADD_KERNEL(Int64, Float32, Float32, float, float),
    ADD_KERNEL(Int64, Float32, Float64, float, double),
    ADD_KERNEL(Int64, Float32, Int32, float, int),
    ADD_KERNEL(Int64, Float32, Int64, float, int64_t),

    ADD_KERNEL(Int64, Float64, Float16, double, half),
    ADD_KERNEL(Int64, Float64, Float32, double, float),
    ADD_KERNEL(Int64, Float64, Float64, double, double),
    ADD_KERNEL(Int64, Float64, Int32, double, int),
    ADD_KERNEL(Int64, Float64, Int64, double, int64_t),

    ADD_KERNEL(Int64, Int32, Float16, int, half),
    ADD_KERNEL(Int64, Int32, Float32, int, float),
    ADD_KERNEL(Int64, Int32, Float64, int, double),
    ADD_KERNEL(Int64, Int32, Int32, int, int),
    ADD_KERNEL(Int64, Int32, Int64, int, int64_t),

    ADD_KERNEL(Int64, Int64, Float16, int64_t, half),
    ADD_KERNEL(Int64, Int64, Float32, int64_t, float),
    ADD_KERNEL(Int64, Int64, Float64, int64_t, double),
    ADD_KERNEL(Int64, Int64, Int32, int64_t, int),
    ADD_KERNEL(Int64, Int64, Int64, int64_t, int64_t)};
  return func_list;
}

MS_KERNEL_FACTORY_REG(NativeGpuKernelMod, RandomPoisson, RandomPoissonGpuKernelMod);
}  // namespace kernel
}  // namespace mindspore