Class: Ikra::Translator::CommandTranslator::KernelLauncher
Defined in: lib/translator/kernel_launcher/kernel_launcher.rb
Overview
Builds the code that launches a kernel. This class is responsible for generating the kernel invocation.
For example: kernel<<<…, …>>>(env, result, d_a, …);
Direct Known Subclasses
Class Attribute Summary
-
.debug_free_previous_input_immediately ⇒ Object
Debug flag only: Frees all input after launching kernel.
Instance Attribute Summary
-
#additional_arguments ⇒ Object
Additional parameters that this kernel should accept (to access the result of previous kernels).
-
#block_dim ⇒ Object
Returns the value of attribute block_dim.
-
#cached_results ⇒ Object
IDs and types of commands whose results are kept on the GPU.
-
#grid_dim ⇒ Object
Block/grid dimensions (should be 1D).
-
#kernel_builder ⇒ Object
Returns the value of attribute kernel_builder.
-
#kernel_result_var_name ⇒ Object
readonly
Pointer to the resulting array (device memory).
-
#num_threads ⇒ Object
Number of threads (elements to be processed).
-
#previous_kernel_input ⇒ Object
Additional parameters that this kernel should accept (to access the result of previous kernels).
-
#previously_cached_results ⇒ Object
readonly
IDs and types of commands that were previously computed and shall now be used in this kernel as input.
-
#reuse_memory ⇒ Object
Whether the launch allocates new memory beforehand or uses previous memory.
Instance Method Summary
-
#add_additional_arguments(*arguments) ⇒ Object
Add additional arguments to the kernel function that might be needed for some computations.
-
#add_cached_result(result_id, type) ⇒ Object
Adds command whose result will be kept on GPU.
- #add_previous_kernel_parameter(parameter) ⇒ Object
- #assert_ready_to_build ⇒ Object
- #build_device_memory_free ⇒ Object
-
#build_device_memory_free_in_host_section ⇒ Object
Same as #build_device_memory_free, but also removes the item from the list of allocated memory chunks.
-
#build_kernel_launcher ⇒ Object
Build the code that launches this kernel.
-
#configure_grid(size, block_size: 256) ⇒ Object
Configures grid size and block size.
-
#initialize(kernel_builder) ⇒ KernelLauncher
constructor
A new instance of KernelLauncher.
- #kernel_builders ⇒ Object
-
#prepare_additional_args_for_launch(command) ⇒ Object
Some of the values stored in `@additional_arguments` might be blocks, because not all information was known when adding something to that list.
-
#result_size ⇒ Object
The size of the result array is the number of threads.
-
#result_type ⇒ Object
The result type of this kernel launcher.
- #reuse_memory!(parameter_name) ⇒ Object
- #reuse_memory? ⇒ Boolean
-
#use_cached_result(result_id, type) ⇒ Object
Adds a previously computed result which will be used in this launch as input.
Constructor Details
#initialize(kernel_builder) ⇒ KernelLauncher
Returns a new instance of KernelLauncher.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 46

def initialize(kernel_builder)
    @kernel_builder = kernel_builder
    @additional_arguments = []
    @previous_kernel_input = []
    @reuse_memory = false
    @kernel_result_var_name = "_kernel_result_" + CommandTranslator.next_unique_id.to_s
    @cached_results = {}
    @previously_cached_results = {}
end
Class Attribute Details
.debug_free_previous_input_immediately ⇒ Object
Debug flag only: Frees all input after launching kernel. This causes an error if data is used twice or kept (using the `keep` flag).

# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 14

def debug_free_previous_input_immediately
    @debug_free_previous_input_immediately
end
Instance Attribute Details
#additional_arguments ⇒ Object
Additional parameters that this kernel should accept (to access the result of previous kernels)
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 25

def additional_arguments
    @additional_arguments
end
#block_dim ⇒ Object
Returns the value of attribute block_dim.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 32

def block_dim
    @block_dim
end
#cached_results ⇒ Object
IDs and types of commands whose results are kept on the GPU
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 41

def cached_results
    @cached_results
end
#grid_dim ⇒ Object
Block/grid dimensions (should be 1D)
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 31

def grid_dim
    @grid_dim
end
#kernel_builder ⇒ Object
Returns the value of attribute kernel_builder.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 17

def kernel_builder
    @kernel_builder
end
#kernel_result_var_name ⇒ Object (readonly)
Pointer to the resulting array (device memory)
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 38

def kernel_result_var_name
    @kernel_result_var_name
end
#num_threads ⇒ Object
Number of threads (elements to be processed)
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 28

def num_threads
    @num_threads
end
#previous_kernel_input ⇒ Object
Additional parameters that this kernel should accept (to access the result of previous kernels)
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 21

def previous_kernel_input
    @previous_kernel_input
end
#previously_cached_results ⇒ Object (readonly)
IDs and types of commands that were previously computed and shall now be used in this kernel as input
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 44

def previously_cached_results
    @previously_cached_results
end
#reuse_memory ⇒ Object
Whether the launch allocates new memory beforehand or uses previous memory
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 35

def reuse_memory
    @reuse_memory
end
Instance Method Details
#add_additional_arguments(*arguments) ⇒ Object
Add additional arguments to the kernel function that might be needed for some computations
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 101

def add_additional_arguments(*arguments)
    @additional_arguments.push(*arguments)
end
#add_cached_result(result_id, type) ⇒ Object
Adds command whose result will be kept on GPU
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 78

def add_cached_result(result_id, type)
    @cached_results[result_id] = type
end
#add_previous_kernel_parameter(parameter) ⇒ Object
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 96

def add_previous_kernel_parameter(parameter)
    kernel_builder.add_previous_kernel_parameter(parameter)
end
#assert_ready_to_build ⇒ Object
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 139

def assert_ready_to_build
    required_values = [:num_threads, :grid_dim, :block_dim]

    for selector in required_values
        if send(selector) == nil
            raise AssertionError.new(
                "Not ready to build (KernelBuilder): #{selector} is not set")
        end
    end
end
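The readiness check can be illustrated in isolation. The following is a minimal sketch, not Ikra code: `DemoLauncher` and `missing_values` are hypothetical names, and `ArgumentError` stands in for Ikra's `AssertionError`. Unlike the method above, which raises on the first unset attribute, this variant collects every missing attribute before raising.

```ruby
# Hypothetical stand-in for a launcher; not part of Ikra.
class DemoLauncher
    attr_accessor :num_threads, :grid_dim, :block_dim

    # Returns the names of the required attributes that are still nil.
    def missing_values
        [:num_threads, :grid_dim, :block_dim].select do |selector|
            public_send(selector).nil?
        end
    end

    # Raises if any required attribute is unset.
    def assert_ready_to_build
        missing = missing_values
        unless missing.empty?
            raise ArgumentError, "Not ready to build: #{missing.join(', ')} not set"
        end
    end
end

launcher = DemoLauncher.new
launcher.num_threads = 1000
p launcher.missing_values   # => [:grid_dim, :block_dim]
```

Looking the attributes up by symbol keeps the list of required values in one place, which is the same design the original method uses with `send`.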
#build_device_memory_free ⇒ Object
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 229

def build_device_memory_free
    Log.info("Building kernel post-launch CUDA free")

    assert_ready_to_build

    if KernelLauncher.debug_free_previous_input_immediately == true
        Log.warn("Debug flag set... Freeing input memory immediately and some memory not at all!")
        return ""
    end

    return Translator.read_file(file_name: "free_device_memory.cpp", replacements: {
        "name" => kernel_result_var_name})
end
#build_device_memory_free_in_host_section ⇒ Object
Same as #build_device_memory_free, but also removes the item from the list of allocated memory chunks.

# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 244

def build_device_memory_free_in_host_section
    Log.info("Building kernel post-launch CUDA free (host section")

    assert_ready_to_build

    return Translator.read_file(file_name: "host_section_free_device_memory.cpp", replacements: {
        "name" => kernel_result_var_name})
end
#build_kernel_launcher ⇒ Object
Build the code that launches this kernel. The generated code performs the following steps:
- Allocate device memory for the result.
- If result should be written back: Allocate host memory for the result.
- Launch the kernel (+ error checking, synchronization)
- If result should be written back: Copy result back to host memory.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 158

def build_kernel_launcher
    Log.info("Building kernel launcher")

    assert_ready_to_build

    result = ""
    if !reuse_memory
        # Allocate device memory for kernel result
        result = result + Translator.read_file(file_name: "allocate_device_memory.cpp", replacements: {
            "name" => kernel_result_var_name,
            "bytes" => "(sizeof(#{kernel_builder.result_type.to_c_type}) * #{num_threads})",
            "type" => kernel_builder.result_type.to_c_type})
    end

    previously_cached_results.each do |result_id, type|
        result = result + " #{type.to_c_type} *prev_" + result_id.to_s + " = (#{type.to_c_type} *) " + Constants::ENV_HOST_IDENTIFIER + "->prev_" + result_id.to_s + ";\n"
    end

    # Allocate device memory for cached results
    cached_results.each do |result_id, type|
        result = result + Translator.read_file(file_name: "allocate_device_memory.cpp", replacements: {
            "name" => Constants::RESULT_IDENTIFIER + result_id,
            "bytes" => "(#{type.c_size} * #{num_threads})",
            "type" => type.to_c_type})
    end

    # Build arguments
    a_env = Constants::ENV_DEVICE_IDENTIFIER
    a_result = kernel_result_var_name

    previous_kernel_args = []
    for var in kernel_builder.previous_kernel_input
        previous_kernel_args.push(var.name.to_s)
    end

    a_cached_results = cached_results.map do |result_id, type|
        Constants::RESULT_IDENTIFIER + result_id
    end

    if reuse_memory
        previous_kernel_args[0] = a_result
    end

    arguments = ([a_env, num_threads, a_result] + a_cached_results + previous_kernel_args + additional_arguments).join(", ")

    # Launch kernel
    result = result + Translator.read_file(file_name: "launch_kernel.cpp", replacements: {
        "kernel_name" => kernel_builder.kernel_name,
        "arguments" => arguments,
        "grid_dim" => grid_dim,
        "block_dim" => block_dim})

    # ---- DEBUG ONLY: Free input after computation so that we can process larger
    # data sets in benchmarks without running out of memory
    # TODO: Implement analysis and do this automatically
    if KernelLauncher.debug_free_previous_input_immediately == true
        for var in kernel_builder.previous_kernel_input
            result = result + Translator.read_file(file_name: "free_device_memory.cpp", replacements: {
                "name" => var.name.to_s})
        end
    end
    # ---- END DEBUG ONLY

    cached_results.each do |result_id, type|
        result = result + " " + Constants::ENV_HOST_IDENTIFIER + "->prev_" + result_id + " = " + Constants::RESULT_IDENTIFIER + result_id + ";\n"
    end

    return result
end
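The argument ordering used by build_kernel_launcher can be sketched on its own. The helper below is hypothetical (`format_launch` and all values passed to it are illustrative), and the launch string is an assumption based on the overview example: the contents of the real `launch_kernel.cpp` template, which also covers error checking and synchronization, are not shown on this page.

```ruby
# Hypothetical helper: joins the argument groups in the same order as
# build_kernel_launcher and formats a bare CUDA launch statement.
def format_launch(kernel_name:, grid_dim:, block_dim:, env:, num_threads:, result:,
                  cached: [], previous: [], additional: [])
    arguments = ([env, num_threads, result] + cached + previous + additional).join(", ")
    "#{kernel_name}<<<#{grid_dim}, #{block_dim}>>>(#{arguments});"
end

puts format_launch(
    kernel_name: "kernel_1",          # illustrative name
    grid_dim: "4",
    block_dim: "256",
    env: "dev_env",                   # stands in for Constants::ENV_DEVICE_IDENTIFIER
    num_threads: 1000,
    result: "_kernel_result_2",
    previous: ["_kernel_result_1"])
# prints: kernel_1<<<4, 256>>>(dev_env, 1000, _kernel_result_2, _kernel_result_1);
```

The fixed prefix (environment, thread count, result pointer) comes first, followed by cached results, inputs from previous kernels, and finally the additional arguments.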
#configure_grid(size, block_size: 256) ⇒ Object
Configures grid size and block size. Also sets number of threads.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 117

def configure_grid(size, block_size: 256)
    if block_size == nil
        block_size = 256
    end

    if size.is_a?(Fixnum)
        # Precompute constants
        @grid_dim = [size.fdiv(block_size).ceil, 1].max.to_s
        @block_dim = (size >= block_size ? block_size : size).to_s
        @num_threads = size
    else
        if !size.is_a?(String)
            raise AssertionError.new("Fixnum or String expected")
        end

        # Source code string determines the size
        @grid_dim = "max((int) ceil(((float) #{size}) / #{block_size}), 1)"
        @block_dim = "(#{size} >= #{block_size} ? #{block_size} : #{size})"
        @num_threads = size
    end
end
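The arithmetic of the integer branch is easier to follow with concrete numbers. This is a standalone sketch under the assumption that `size` is a non-negative integer; `grid_for` is a hypothetical name, and it returns integers rather than the strings that configure_grid stores.

```ruby
# Hypothetical standalone version of the integer branch of configure_grid.
def grid_for(size, block_size = 256)
    # Number of blocks: size / block_size rounded up, but never less than 1.
    grid_dim = [size.fdiv(block_size).ceil, 1].max
    # Threads per block: a full block, or fewer if the input is smaller.
    block_dim = size >= block_size ? block_size : size
    [grid_dim, block_dim]
end

p grid_for(1000)   # => [4, 256]
p grid_for(100)    # => [1, 100]
p grid_for(256)    # => [1, 256]
```

For 1000 elements the launch covers 4 * 256 = 1024 threads, which is more than the number of elements. This is presumably why `num_threads` is also passed to the kernel as an argument, so that surplus threads can be skipped. Note that the original method tests for `Fixnum`, which only exists in older Ruby versions; the sketch accepts any `Integer`.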
#kernel_builders ⇒ Object
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 70

def kernel_builders
    # The program builder accesses kernel builders via kernel launchers through
    # this method, because some specialized launchers might have multiple kernel
    # builders.
    return [kernel_builder]
end
#prepare_additional_args_for_launch(command) ⇒ Object
Some of the values stored in `@additional_arguments` might be blocks, because not all information was known when adding something to that list. This method evaluates those blocks and replaces them with actual strings, based on the command that is being launched.

# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 60

def prepare_additional_args_for_launch(command)
    @additional_arguments = @additional_arguments.map do |arg|
        if arg.is_a?(String)
            arg
        else
            arg.call(command)
        end
    end
end
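The deferred-argument pattern can be tried without Ikra. In this minimal sketch, `resolve_arguments` is a hypothetical name and the command is modeled as a plain Hash with an `:id` key, which is an assumption made only for illustration; real commands are Ikra objects.

```ruby
# Strings are kept as they are; anything else is assumed to respond to #call
# and is evaluated with the command being launched.
def resolve_arguments(arguments, command)
    arguments.map do |arg|
        arg.is_a?(String) ? arg : arg.call(command)
    end
end

# One argument is known up front, the other depends on the command.
pending = ["dev_env", ->(cmd) { "prev_#{cmd[:id]}" }]
p resolve_arguments(pending, { id: 7 })   # => ["dev_env", "prev_7"]
```

Storing a lambda instead of a string lets callers register an argument before the command it depends on is known.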
#result_size ⇒ Object
The size of the result array is the number of threads.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 112

def result_size
    return num_threads
end
#result_type ⇒ Object
The result type of this kernel launcher. Same as the result type of its kernel builder.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 107

def result_type
    return kernel_builder.result_type
end
#reuse_memory!(parameter_name) ⇒ Object
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 87

def reuse_memory!(parameter_name)
    @reuse_memory = true
    @kernel_result_var_name = parameter_name
end
#reuse_memory? ⇒ Boolean
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 92

def reuse_memory?
    return @reuse_memory
end
#use_cached_result(result_id, type) ⇒ Object
Adds a previously computed result which will be used in this launch as input.
# File 'lib/translator/kernel_launcher/kernel_launcher.rb', line 83

def use_cached_result(result_id, type)
    @previously_cached_results[result_id] = type
end
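Entries recorded through use_cached_result are later turned into pointer declarations by build_kernel_launcher. The sketch below imitates that step under simplifying assumptions: types are plain C type strings instead of Ikra type objects with `to_c_type`, `"host_env"` stands in for `Constants::ENV_HOST_IDENTIFIER`, and `cached_input_declarations` is a hypothetical name.

```ruby
# Builds one C declaration per previously cached result, reading the
# pointer from the host environment struct.
def cached_input_declarations(previously_cached_results, host_env = "host_env")
    previously_cached_results.map do |result_id, c_type|
        "#{c_type} *prev_#{result_id} = (#{c_type} *) #{host_env}->prev_#{result_id};\n"
    end.join
end

previously_cached = {}
previously_cached["12"] = "int"   # what use_cached_result would record
print cached_input_declarations(previously_cached)
# prints: int *prev_12 = (int *) host_env->prev_12;
```

Keeping IDs and types in a Hash means a result registered twice is declared only once in the generated code.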