Class: Maatkit::ParallelDump
- Inherits:
- Object
- Defined in:
- lib/maatkit-ruby/mk-parallel-dump.rb
Overview
Dump MySQL tables in parallel.
Maatkit::ParallelDump.new( array, str, array)
Instance Attribute Summary
-
#ask_pass ⇒ Object
Prompt for a password when connecting to MySQL.
-
#base_dir ⇒ Object
type: string The base directory in which files will be stored.
-
#biggest_first ⇒ Object
default: yes Process tables in descending order of size (biggest to smallest).
-
#bin_log_position ⇒ Object
default: yes Dump the master/slave position.
-
#charset ⇒ Object
short form: -A; type: string Default character set.
-
#chunk_size ⇒ Object
type: string Number of rows or data size to dump per file.
-
#client_side_buffering ⇒ Object
Fetch and buffer results in memory on client.
-
#config ⇒ Object
type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.
-
#csv ⇒ Object
Do “--tab” dump in CSV format (implies “--tab”).
-
#databases ⇒ Object
short form: -d; type: hash Dump only this comma-separated list of databases.
-
#databases_regex ⇒ Object
type: string Dump only databases whose names match this Perl regex.
-
#defaults_file ⇒ Object
short form: -F; type: string Only read mysql options from the given file.
-
#dry_run ⇒ Object
Print commands instead of executing them.
-
#engines ⇒ Object
short form: -e; type: hash Dump only tables that use this comma-separated list of storage engines.
-
#flush_lock ⇒ Object
Use “FLUSH TABLES WITH READ LOCK”.
-
#flush_log ⇒ Object
Execute “FLUSH LOGS” when getting binlog positions.
-
#gzip ⇒ Object
default: yes Compress (gzip) SQL dump files; does not work with “--tab”.
-
#help ⇒ Object
Show help and exit.
-
#host ⇒ Object
short form: -h; type: string Connect to host.
-
#ignore_databases ⇒ Object
type: Hash Ignore this comma-separated list of databases.
-
#ignore_databases_regex ⇒ Object
type: string Ignore databases whose names match this Perl regex.
-
#ignore_engines ⇒ Object
type: Hash; default: FEDERATED,MRG_MyISAM Do not dump tables that use this comma-separated list of storage engines.
-
#ignore_tables ⇒ Object
type: Hash Ignore this comma-separated list of table names.
-
#ignore_tables_regex ⇒ Object
type: string Ignore tables whose names match the Perl regex.
-
#lock_tables ⇒ Object
Use “LOCK TABLES” (disables “--[no]flush-lock”).
-
#lossless_floats ⇒ Object
Dump float types with extra precision for lossless restore (requires “--tab”).
-
#password ⇒ Object
short form: -p; type: string Password to use when connecting.
-
#path_to_mk_parallel_dump ⇒ Object
Sets the executable path, otherwise the environment path will be used.
-
#pid ⇒ Object
type: string Create the given PID file.
-
#port ⇒ Object
short form: -P; type: int Port number to use for connection.
-
#progress ⇒ Object
Display progress reports.
-
#quiet ⇒ Object
short form: -q Quiet output; disables “--verbose”.
-
#resume ⇒ Object
default: yes Resume dumps.
-
#set_vars ⇒ Object
type: string; default: wait_timeout=10000 Set these MySQL variables.
-
#socket ⇒ Object
short form: -S; type: string Socket file to use for connection.
-
#stop_slave ⇒ Object
Issue “STOP SLAVE” on server before dumping data.
-
#tab ⇒ Object
Dump tab-separated (sets “--umask” 0).
-
#tables ⇒ Object
short form: -t; type: hash Dump only this comma-separated list of table names.
-
#tables_regex ⇒ Object
type: string Dump only tables whose names match this Perl regex.
-
#threads ⇒ Object
type: int; default: 2 Number of threads to dump concurrently.
-
#tz_utc ⇒ Object
default: yes Enable TIMESTAMP columns to be dumped and reloaded between different time zones.
-
#umask ⇒ Object
type: string Set the program’s “umask” to this octal value.
-
#user ⇒ Object
short form: -u; type: string User for login if not current user.
-
#verbose ⇒ Object
short form: -v; cumulative: yes Be verbose; can specify multiple times.
-
#version ⇒ Object
Show version and exit.
-
#wait ⇒ Object
short form: -w; type: time; default: 5m Wait limit when the server is down.
-
#zero_chunk ⇒ Object
default: yes Add a chunk for rows with zero or zero-equivalent values.
Instance Method Summary
-
#initialize ⇒ ParallelDump
constructor
Returns a new ParallelDump Object.
-
#start(options = nil) ⇒ Object
Execute the command.
Constructor Details
#initialize ⇒ ParallelDump
Returns a new ParallelDump Object
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 343
def initialize()
end
Instance Attribute Details
#ask_pass ⇒ Object
Prompt for a password when connecting to MySQL.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 17
def ask_pass
  @ask_pass
end
#base_dir ⇒ Object
type: string The base directory in which files will be stored. The default is the current working directory. Each database gets its own directory under the base directory. So if the base directory is “/tmp” and database “foo” is dumped, then the directory “/tmp/foo” is created which contains all the table dump files for “foo”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 25
def base_dir
  @base_dir
end
#biggest_first ⇒ Object
default: yes Process tables in descending order of size (biggest to smallest). This strategy gives better parallelization. Suppose there are 8 threads and the last table is huge. We will finish everything else and then be running single-threaded while that one finishes. If that one runs first, then we will have the max number of threads running at a time for as long as possible.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 33
def biggest_first
  @biggest_first
end
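The scheduling argument above can be checked with a small simulation. This is only a sketch: the `makespan` helper and the table sizes are hypothetical, and it assumes each table is handed to whichever worker becomes free first, which may differ from how mk-parallel-dump actually dispatches work.

```ruby
# Simulate handing tables, in the given order, to a fixed pool of workers.
# Each table goes to the worker that becomes free first.
# Returns the time at which the last worker finishes.
def makespan(table_sizes, workers)
  finish_times = Array.new(workers, 0)
  table_sizes.each do |size|
    idx = finish_times.index(finish_times.min)
    finish_times[idx] += size
  end
  finish_times.max
end

sizes   = [1, 1, 1, 1, 1, 1, 10] # hypothetical dump costs; one table is huge
threads = 2

puts "smallest first: #{makespan(sizes.sort, threads)}"
puts "biggest first:  #{makespan(sizes.sort.reverse, threads)}"
```

With these made-up sizes, the smallest-first order leaves one worker idle while the huge table finishes alone, so the biggest-first order ends sooner.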
#bin_log_position ⇒ Object
default: yes Dump the master/slave position. Dump binary log positions from both “SHOW MASTER STATUS” and “SHOW SLAVE STATUS”, whichever can be retrieved from the server. The data is dumped to a file named 00_master_data.sql in the “--base-dir”. The file also contains details of each table dumped, including the WHERE clauses used to dump it in chunks.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 42
def bin_log_position
  @bin_log_position
end
#charset ⇒ Object
short form: -A; type: string Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 49
def charset
  @charset
end
#chunk_size ⇒ Object
type: string Number of rows or data size to dump per file. Specifies that the table should be dumped in segments of approximately the size given. The syntax is either a plain integer, which is interpreted as a number of rows per chunk, or an integer with a suffix of G, M, or k, which is interpreted as the size of the data to be dumped in each chunk. See “CHUNKS” for more details.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 58
def chunk_size
  @chunk_size
end
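The chunk-size syntax described above can be illustrated with a small parser. This is a sketch, not the tool's own parsing code: `parse_chunk_size` and `ChunkSpec` are hypothetical names, and treating the suffixes as powers of 1024 is an assumption the description does not confirm.

```ruby
# Result of interpreting a chunk-size value: either a row count or a byte size.
ChunkSpec = Struct.new(:unit, :amount)

# Plain integer      => rows per chunk.
# Integer + G, M, k  => approximate data size per chunk.
# ASSUMPTION: suffixes are powers of 1024.
def parse_chunk_size(value)
  match = /\A(\d+)([GMk])?\z/.match(value.to_s)
  raise ArgumentError, "invalid chunk size: #{value.inspect}" unless match

  number = match[1].to_i
  case match[2]
  when nil then ChunkSpec.new(:rows, number)
  when 'k' then ChunkSpec.new(:bytes, number * 1024)
  when 'M' then ChunkSpec.new(:bytes, number * 1024**2)
  when 'G' then ChunkSpec.new(:bytes, number * 1024**3)
  end
end

p parse_chunk_size('1000')
p parse_chunk_size('5M')
```

Because the attribute is typed as a string, a value such as `'5M'` would be assigned as-is and interpreted by the tool itself; the parser only shows how the two forms differ.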
#client_side_buffering ⇒ Object
Fetch and buffer results in memory on client. By default this option is not enabled because it causes data to be completely fetched from the server then buffered in-memory on the client. For large dumps this can require a lot of memory. Instead, the default (when this option is not specified) is to fetch and dump rows one-by-one from the server. This requires a lot less memory on the client but can keep the tables on the server locked longer. Use this option only if you’re sure that the data being dumped is relatively small and the client has sufficient memory. Remember that, if this option is specified, all “--threads” will buffer their results in-memory, so memory consumption can increase by a factor of N “--threads”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 70
def client_side_buffering
  @client_side_buffering
end
#config ⇒ Object
type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 76
def config
  @config
end
#csv ⇒ Object
Do “--tab” dump in CSV format (implies “--tab”). Changes “--tab” options so the dump file is in comma-separated values (CSV) format. The SELECT INTO OUTFILE statement looks like the following, and can be re-loaded with the same options:
  SELECT * INTO OUTFILE %D.%N.%6C.txt
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n' FROM %D.%N;
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 85
def csv
  @csv
end
#databases ⇒ Object
short form: -d; type: hash Dump only this comma-separated list of databases.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 90
def databases
  @databases
end
#databases_regex ⇒ Object
type: string Dump only databases whose names match this Perl regex.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 95
def databases_regex
  @databases_regex
end
#defaults_file ⇒ Object
short form: -F; type: string Only read mysql options from the given file. You must give an absolute pathname.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 100
def defaults_file
  @defaults_file
end
#dry_run ⇒ Object
Print commands instead of executing them.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 104
def dry_run
  @dry_run
end
#engines ⇒ Object
short form: -e; type: hash Dump only tables that use this comma-separated list of storage engines.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 109
def engines
  @engines
end
#flush_lock ⇒ Object
Use “FLUSH TABLES WITH READ LOCK”. This is enabled by default. The lock is taken once, at the beginning of the whole process and is released after all tables have been dumped. If you want to lock only the tables you’re dumping, use “--lock-tables”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 116
def flush_lock
  @flush_lock
end
#flush_log ⇒ Object
Execute “FLUSH LOGS” when getting binlog positions. This option is NOT enabled by default because it causes the MySQL server to rotate its error log, potentially overwriting error messages.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 122
def flush_log
  @flush_log
end
#gzip ⇒ Object
default: yes Compress (gzip) SQL dump files; does not work with “--tab”. The IO::Compress::Gzip Perl module is used to compress SQL dump files as they are written to disk. The resulting dump files have a “.gz” extension, like “table.000000.sql.gz”. They can be uncompressed with gzip. mk-parallel-restore will automatically uncompress them, too, when restoring. This option does not work with “--tab” because the MySQL server writes the tab dump files directly using “SELECT INTO OUTFILE”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 132
def gzip
  @gzip
end
#help ⇒ Object
Show help and exit.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 136
def help
  @help
end
#host ⇒ Object
short form: -h; type: string Connect to host.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 141
def host
  @host
end
#ignore_databases ⇒ Object
type: Hash Ignore this comma-separated list of databases.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 146
def ignore_databases
  @ignore_databases
end
#ignore_databases_regex ⇒ Object
type: string Ignore databases whose names match this Perl regex.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 151
def ignore_databases_regex
  @ignore_databases_regex
end
#ignore_engines ⇒ Object
type: Hash; default: FEDERATED,MRG_MyISAM Do not dump tables that use this comma-separated list of storage engines. The schema file will be dumped as usual. This prevents dumping data for Federated tables and Merge tables.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 158
def ignore_engines
  @ignore_engines
end
#ignore_tables ⇒ Object
type: Hash Ignore this comma-separated list of table names. Table names may be qualified with the database name.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 164
def ignore_tables
  @ignore_tables
end
#ignore_tables_regex ⇒ Object
type: string Ignore tables whose names match the Perl regex.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 169
def ignore_tables_regex
  @ignore_tables_regex
end
#lock_tables ⇒ Object
Use “LOCK TABLES” (disables “--[no]flush-lock”). Disables “--[no]flush-lock” (unless it was explicitly set) and locks tables with “LOCK TABLES READ”. The lock is taken and released for every table as it is dumped.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 175
def lock_tables
  @lock_tables
end
#lossless_floats ⇒ Object
Dump float types with extra precision for lossless restore (requires “--tab”). Wraps these types with a call to “FORMAT()” with 17 digits of precision. According to the comments in Google’s patches, this will give lossless dumping and reloading in most cases. (I shamelessly stole this technique from them. I don’t know enough about floating-point math to have an opinion). This works only with “--tab”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 183
def lossless_floats
  @lossless_floats
end
#password ⇒ Object
short form: -p; type: string Password to use when connecting.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 188
def password
  @password
end
#path_to_mk_parallel_dump ⇒ Object
Sets the executable path, otherwise the environment path will be used.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 338
def path_to_mk_parallel_dump
  @path_to_mk_parallel_dump
end
#pid ⇒ Object
type: string Create the given PID file. The file contains the process ID of the script. The PID file is removed when the script exits. Before starting, the script checks if the PID file already exists. If it does not, then the script creates and writes its own PID to it. If it does, then the script checks the following: if the file contains a PID and a process is running with that PID, then the script dies; or, if there is no process running with that PID, then the script overwrites the file with its own PID and starts; else, if the file contains no PID, then the script dies.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 198
def pid
  @pid
end
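The PID file rules described above can be sketched in Ruby as follows. This is an illustration of the decision logic only: `claim_pid_file` and `process_running?` are hypothetical helpers, the real check is done by the Perl script rather than by this wrapper, and the sketch makes no attempt to be atomic.

```ruby
require 'tmpdir'

# True if a process with this PID exists (signal 0 only tests for existence).
def process_running?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Apply the rules from the description, then write our own PID to the file.
def claim_pid_file(path)
  if File.exist?(path)
    contents = File.read(path).strip
    raise "PID file #{path} contains no PID" unless contents.match?(/\A\d+\z/)
    raise "already running with PID #{contents}" if process_running?(contents.to_i)
  end
  File.write(path, Process.pid.to_s)
end

Dir.mktmpdir do |dir|
  pid_file = File.join(dir, 'example.pid')
  claim_pid_file(pid_file)
  begin
    puts "holding #{pid_file} as PID #{File.read(pid_file)}"
  ensure
    File.delete(pid_file) # the PID file is removed on exit
  end
end
```

The existence check and the write are separate steps here, so two processes starting at the same moment could both claim the file.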
#port ⇒ Object
short form: -P; type: int Port number to use for connection.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 203
def port
  @port
end
#progress ⇒ Object
Display progress reports. Progress is displayed each time a table or chunk of a table finishes dumping. Progress is calculated by measuring the average data size of each full chunk and assuming all bytes are created equal. The output is the completed and total bytes, the percent completed, estimated time remaining, and estimated completion time. For example:
40.72k/112.00k 36.36% ETA 00:00 (2009-10-27T19:17:53)
If “--chunk-size” is not specified then each table is effectively one big chunk and the progress reports are pretty accurate. When “--chunk-size” is specified the progress reports can be skewed because of averaging. Progress reports are inaccurate when a dump is resumed. This is a known issue and will be fixed in a later release.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 217
def progress
  @progress
end
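A progress line in the shape of the example above could be computed roughly like this. It is a sketch under several assumptions that the description does not confirm: `progress_line` is a hypothetical helper, “k” is taken to mean 1024 bytes, the ETA field is read as minutes:seconds, and the remaining time is a simple linear extrapolation from the elapsed time.

```ruby
# Build a progress line from byte counts and elapsed time.
# ASSUMPTIONS: k = 1024 bytes, ETA is MM:SS, remaining time is extrapolated
# linearly ("all bytes are created equal").
def progress_line(done_bytes, total_bytes, elapsed_seconds, now: Time.now)
  fraction  = total_bytes.zero? ? 1.0 : done_bytes.to_f / total_bytes
  remaining = fraction.zero? ? 0 : (elapsed_seconds * (1 - fraction) / fraction).round
  eta       = now + remaining
  format('%.2fk/%.2fk %.2f%% ETA %02d:%02d (%s)',
         done_bytes / 1024.0, total_bytes / 1024.0, fraction * 100,
         remaining / 60, remaining % 60, eta.strftime('%Y-%m-%dT%H:%M:%S'))
end

puts progress_line(41_697, 114_688, 0, now: Time.utc(2009, 10, 27, 19, 17, 53))
```

Linear extrapolation is exactly the kind of averaging that the description warns can skew the report when chunks differ in size.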
#quiet ⇒ Object
short form: -q Quiet output; disables “--verbose”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 222
def quiet
  @quiet
end
#resume ⇒ Object
default: yes Resume dumps.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 227
def resume
  @resume
end
#set_vars ⇒ Object
type: string; default: wait_timeout=10000 Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 233
def set_vars
  @set_vars
end
#socket ⇒ Object
short form: -S; type: string Socket file to use for connection.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 238
def socket
  @socket
end
#stop_slave ⇒ Object
Issue “STOP SLAVE” on server before dumping data. This ensures that the data is not changing during the dump. Issues “START SLAVE” after the dump is complete. If the slave is not running, throws an error and exits. This is to prevent possibly bad things from happening if the slave is not running because of a problem, or because someone intentionally stopped the slave for maintenance or some other purpose.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 247
def stop_slave
  @stop_slave
end
#tab ⇒ Object
Dump tab-separated (sets “--umask” 0). Dump via “SELECT INTO OUTFILE”, which is similar to what “mysqldump” does with the “--tab” option, but you’re not constrained to a single database at a time. Before you use this option, make sure you know what “SELECT INTO OUTFILE” does! I recommend using it only if you’re running mk-parallel-dump on the same machine as the MySQL server, but there is no protection if you don’t. This option sets “--umask” to zero so auto-created directories are writable by the MySQL server.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 257
def tab
  @tab
end
#tables ⇒ Object
short form: -t; type: hash Dump only this comma-separated list of table names. Table names may be qualified with the database name.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 263
def tables
  @tables
end
#tables_regex ⇒ Object
type: string Dump only tables whose names match this Perl regex.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 268
def tables_regex
  @tables_regex
end
#threads ⇒ Object
type: int; default: 2 Number of threads to dump concurrently. Specifies the number of parallel processes to run. The default is 2 (this is mk-parallel-dump, after all – 1 is not parallel). On GNU/Linux machines, the default is the number of times ‘processor’ appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there’s only a single processor.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 277
def threads
  @threads
end
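The GNU/Linux rule described above (count the occurrences of “processor” in /proc/cpuinfo, but never go below 2) can be sketched as follows. `default_thread_count` is a hypothetical helper written for illustration, and the sample text is made up; the actual default is computed inside the Perl script.

```ruby
# Count lines beginning with "processor" in cpuinfo-style text and
# apply the documented floor of 2.
def default_thread_count(cpuinfo_text)
  processors = cpuinfo_text.scan(/^processor/).size
  [processors, 2].max
end

sample = "processor\t: 0\nmodel name\t: Example CPU\n" \
         "processor\t: 1\nmodel name\t: Example CPU\n" \
         "processor\t: 2\nmodel name\t: Example CPU\n"

puts default_thread_count(sample)
puts default_thread_count("processor\t: 0\n")
```

On a real machine the text would come from `File.read('/proc/cpuinfo')`; setting the `threads` attribute overrides the default either way.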
#tz_utc ⇒ Object
default: yes Enable TIMESTAMP columns to be dumped and reloaded between different time zones. mk-parallel-dump sets its connection time zone to UTC and adds “SET TIME_ZONE=‘+00:00’” to the dump file. Without this option, TIMESTAMP columns are dumped and reloaded in the time zones local to the source and destination servers, which can cause the values to change. This option also protects against changes due to daylight saving time. This option is identical to “mysqldump --tz-utc”. In fact, the above text was copied from mysqldump’s man page.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 288
def tz_utc
  @tz_utc
end
#umask ⇒ Object
type: string Set the program’s “umask” to this octal value. This is useful when you want created files and directories to be readable or writable by other users (for example, the MySQL server itself).
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 295
def umask
  @umask
end
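To see how an octal umask string affects the permissions of newly created files, a short calculation helps. This is a generic illustration of umask arithmetic, not code from the wrapper; `mode_after_umask` is a hypothetical helper.

```ruby
# A umask clears permission bits from the mode a program requests.
# String#oct interprets the string as an octal number.
def mode_after_umask(requested_mode, umask_string)
  requested_mode & ~umask_string.oct
end

# Files are typically requested with mode 0666.
%w[022 077 0].each do |mask|
  printf("umask %-3s => file mode %04o\n", mask, mode_after_umask(0o666, mask))
end
```

A umask of 0 leaves every requested bit in place, which is why that value lets another user, such as the MySQL server, write into directories the tool creates.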
#user ⇒ Object
short form: -u; type: string User for login if not current user.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 300
def user
  @user
end
#verbose ⇒ Object
short form: -v; cumulative: yes Be verbose; can specify multiple times. See “OUTPUT”.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 306
def verbose
  @verbose
end
#version ⇒ Object
Show version and exit.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 310
def version
  @version
end
#wait ⇒ Object
short form: -w; type: time; default: 5m Wait limit when the server is down. If the MySQL server crashes during dumping, waits until the server comes back and then continues with the rest of the tables. “mk-parallel-dump” will check the server every second until this time is exhausted, at which point it will give up and exit. This implements Peter Zaitsev’s “safe dump” request: sometimes a dump on a server that has corrupt data will kill the server. mk-parallel-dump will wait for the server to restart, then keep going. It’s hard to say which table killed the server, so no tables will be retried. Tables that were being concurrently dumped when the crash happened will not be retried. No additional locks will be taken after the server restarts; it’s assumed this behavior is useful only on a server you’re not trying to dump while it’s in production.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 324
def wait
  @wait
end
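The wait behaviour described above (check the server every second until the limit is exhausted, then give up) amounts to a bounded polling loop. The sketch below is hypothetical: `parse_wait` and `wait_for_server` are illustrative names, the s/m/h/d suffixes are an assumption about the “time” type, and the block stands in for whatever connectivity check the tool really performs.

```ruby
# ASSUMPTION: a time value is an integer with an optional s, m, h, or d suffix.
WAIT_UNITS = { 's' => 1, 'm' => 60, 'h' => 3600, 'd' => 86_400 }.freeze

def parse_wait(value)
  match = /\A(\d+)([smhd])?\z/.match(value.to_s)
  raise ArgumentError, "invalid time value: #{value.inspect}" unless match

  match[1].to_i * WAIT_UNITS.fetch(match[2] || 's')
end

# Poll the block once per interval until it returns true or the limit is spent.
# Returns true if the server came back, false if we gave up.
def wait_for_server(limit_seconds, interval: 1, sleeper: method(:sleep))
  waited = 0
  until yield
    return false if waited >= limit_seconds

    sleeper.call(interval)
    waited += interval
  end
  true
end

attempts = 0
# Simulated server that answers on the third check; no real sleeping here.
came_back = wait_for_server(parse_wait('5m'), sleeper: ->(_seconds) {}) do
  attempts += 1
  attempts >= 3
end
puts "server back: #{came_back} after #{attempts} checks"
```

Passing the sleeper in as an argument keeps the loop easy to exercise without actually waiting five minutes.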
#zero_chunk ⇒ Object
default: yes Add a chunk for rows with zero or zero-equivalent values. This only has an effect when “--chunk-size” is specified. The purpose of the zero chunk is to capture a potentially large number of zero values that would imbalance the size of the first chunk. For example, if a lot of negative numbers were inserted into an unsigned integer column causing them to be stored as zeros, then these zero values are captured by the zero chunk instead of the first chunk and all its non-zero values.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 333
def zero_chunk
  @zero_chunk
end
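The effect of the zero chunk can be pictured by comparing the WHERE clauses with and without it. This is a hypothetical sketch for an unsigned integer column: `chunk_clauses`, the column name, and the boundaries are made up for illustration, and the clauses the tool actually generates may look different.

```ruby
# Build one WHERE clause per chunk for an unsigned integer column.
# With zero_chunk enabled, rows equal to 0 get a chunk of their own and
# the first range starts above zero.
def chunk_clauses(column, boundaries, zero_chunk: true)
  clauses = []
  lower = nil
  if zero_chunk
    clauses << "#{column} = 0"
    lower = "#{column} > 0"
  end
  boundaries.each do |upper|
    clauses << [lower, "#{column} < #{upper}"].compact.join(' AND ')
    lower = "#{column} >= #{upper}"
  end
  clauses << lower if lower
  clauses
end

puts chunk_clauses('id', [100, 200])
puts '--'
puts chunk_clauses('id', [100, 200], zero_chunk: false)
```

Without the zero chunk, every zero-valued row falls into the first range, which is the imbalance the option is meant to avoid.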
Instance Method Details
#start(options = nil) ⇒ Object
Execute the command.
# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 349
def start(options = nil)
  tmp = Tempfile.new('tmp')
  # Redirect the tool's stderr to a temp file so it can be read back.
  command = option_string() + options.to_s + " 2> " + tmp.path
  success = system(command)
  if success
    begin
      # Read to the end of the file, keeping only the last line.
      while (line = tmp.readline)
        line.chomp
        selected_string = line
      end
    rescue EOFError
      tmp.close
    end
    return selected_string
  else
    tmp.close!
    return success
  end
end
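The pattern used by #start (run a shell command with stderr redirected to a temporary file, then return the last line written there, or the failed status) can be tried in isolation. The sketch below is a standalone approximation, not the library's code: `last_stderr_line` is a hypothetical helper, the `sh -c` command is only a stand-in for mk-parallel-dump, and unlike the listing above it strips the trailing newline.

```ruby
require 'tempfile'
require 'shellwords'

# Run a command through the shell with stderr captured in a temp file.
# Returns the last captured line on success, or system's false/nil on failure.
def last_stderr_line(command)
  tmp = Tempfile.new('stderr')
  success = system("#{command} 2> #{Shellwords.escape(tmp.path)}")
  return success unless success

  last = File.readlines(tmp.path).last
  last && last.chomp
ensure
  tmp.close! if tmp # close and delete the temp file in every case
end

# Stand-in command that writes two lines to stderr.
puts last_stderr_line("sh -c 'echo first >&2; echo second >&2'")
```

As in #start, a caller has to tell a returned string apart from `false` or `nil` to know whether the command succeeded.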