Class: Maatkit::ParallelDump

Inherits: Object
Defined in: lib/maatkit-ruby/mk-parallel-dump.rb

Overview

Dump MySQL tables in parallel.

Maatkit::ParallelDump.new

Constructor Details

#initialize ⇒ ParallelDump

Returns a new ParallelDump object.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 343

def initialize()
end

Instance Attribute Details

#ask_pass ⇒ Object

Prompt for a password when connecting to MySQL.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 17

def ask_pass
  @ask_pass
end

#base_dir ⇒ Object

type: string

The base directory in which files will be stored. The default is the current working directory. Each database gets its own directory under the base directory. So if the base directory is “/tmp” and database “foo” is dumped, then the directory “/tmp/foo” is created, which contains all the table dump files for “foo”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 25

def base_dir
  @base_dir
end

#biggest_first ⇒ Object

default: yes

Process tables in descending order of size (biggest to smallest). This strategy gives better parallelization. Suppose there are 8 threads and the last table is huge. Everything else finishes first, and then the dump runs single-threaded while that one table finishes. If the huge table runs first, the maximum number of threads stays busy for as long as possible.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 33

def biggest_first
  @biggest_first
end
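
The scheduling argument above can be checked with a small simulation. This is only a sketch: the `makespan` helper and the table sizes are hypothetical, and it assumes each table is handed to whichever worker becomes free first, which may differ from what mk-parallel-dump does internally.

```ruby
# Simulate a greedy scheduler: each table goes to the worker that frees up first.
# Returns the total elapsed time (makespan) for the given table order.
def makespan(table_sizes, threads)
  workers = Array.new(threads, 0)      # time at which each worker becomes free
  table_sizes.each do |size|
    idx = workers.index(workers.min)   # the next free worker takes the next table
    workers[idx] += size
  end
  workers.max
end

sizes = [1, 1, 1, 1, 1, 1, 1, 1, 8]    # hypothetical sizes; the last table is huge

small_first = makespan(sizes, 8)               # huge table starts last
big_first   = makespan(sizes.sort.reverse, 8)  # huge table starts first

puts "smallest first: #{small_first}, biggest first: #{big_first}"
```

With these made-up sizes, starting the huge table first shortens the total run because the small tables are dumped alongside it rather than before it.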

#bin_log_position ⇒ Object

default: yes

Dump the master/slave position. Dump binary log positions from both “SHOW MASTER STATUS” and “SHOW SLAVE STATUS”, whichever can be retrieved from the server. The data is dumped to a file named 00_master_data.sql in the “--base-dir”. The file also contains details of each table dumped, including the WHERE clauses used to dump it in chunks.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 42

def bin_log_position
  @bin_log_position
end

#charset ⇒ Object

short form: -A; type: string

Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 49

def charset
  @charset
end

#chunk_size ⇒ Object

type: string

Number of rows or data size to dump per file. Specifies that the table should be dumped in segments of approximately the size given. The syntax is either a plain integer, which is interpreted as a number of rows per chunk, or an integer with a suffix of G, M, or k, which is interpreted as the size of the data to be dumped in each chunk. See “CHUNKS” for more details.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 58

def chunk_size
  @chunk_size
end
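
To make the two forms of the value concrete, here is a minimal sketch of how such a string could be interpreted. The `parse_chunk_size` helper is hypothetical and not part of this class, and the use of binary multiples (1k = 1024 bytes) is an assumption; the text above does not say which multiples the tool uses.

```ruby
# Hypothetical helper: interpret a chunk-size string.
# Assumption: suffixes are binary multiples (k = 1024, M = 1024**2, G = 1024**3).
CHUNK_UNITS = { 'k' => 1024, 'M' => 1024**2, 'G' => 1024**3 }.freeze

# Returns [:rows, n] for a plain integer, or [:bytes, n] for a suffixed value.
def parse_chunk_size(value)
  match = /\A(\d+)([kMG])?\z/.match(value.to_s)
  raise ArgumentError, "invalid chunk size: #{value.inspect}" unless match

  number = match[1].to_i
  suffix = match[2]
  suffix ? [:bytes, number * CHUNK_UNITS[suffix]] : [:rows, number]
end

p parse_chunk_size('100000')  # a row count
p parse_chunk_size('5M')      # a data size
```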

#client_side_buffering ⇒ Object

Fetch and buffer results in memory on the client. By default this option is not enabled because it causes data to be completely fetched from the server and then buffered in memory on the client. For large dumps this can require a lot of memory. Instead, the default (when this option is not specified) is to fetch and dump rows one by one from the server. This requires much less memory on the client but can keep the tables on the server locked longer. Use this option only if you’re sure that the data being dumped is relatively small and the client has sufficient memory. Remember that, if this option is specified, all “--threads” will buffer their results in memory, so memory consumption can increase by a factor of N “--threads”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 70

def client_side_buffering
  @client_side_buffering
end

#config ⇒ Object

type: Array

Read this comma-separated list of config files; if specified, this must be the first option on the command line.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 76

def config
  @config
end

#csv ⇒ Object

Do “--tab” dump in CSV format (implies “--tab”). Changes “--tab” options so the dump file is in comma-separated values (CSV) format. The SELECT INTO OUTFILE statement looks like the following, and can be re-loaded with the same options:

  SELECT * INTO OUTFILE %D.%N.%6C.txt
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n' FROM %D.%N;

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 85

def csv
  @csv
end

#databases ⇒ Object

short form: -d; type: hash

Dump only this comma-separated list of databases.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 90

def databases
  @databases
end

#databases_regex ⇒ Object

type: string

Dump only databases whose names match this Perl regex.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 95

def databases_regex
  @databases_regex
end

#defaults_file ⇒ Object

short form: -F; type: string

Only read mysql options from the given file. You must give an absolute pathname.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 100

def defaults_file
  @defaults_file
end

#dry_run ⇒ Object

Print commands instead of executing them.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 104

def dry_run
  @dry_run
end

#engines ⇒ Object

short form: -e; type: hash

Dump only tables that use this comma-separated list of storage engines.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 109

def engines
  @engines
end

#flush_lock ⇒ Object

Use “FLUSH TABLES WITH READ LOCK”. This is enabled by default. The lock is taken once, at the beginning of the whole process, and is released after all tables have been dumped. If you want to lock only the tables you’re dumping, use “--lock-tables”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 116

def flush_lock
  @flush_lock
end

#flush_log ⇒ Object

Execute “FLUSH LOGS” when getting binlog positions. This option is NOT enabled by default because it causes the MySQL server to rotate its error log, potentially overwriting error messages.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 122

def flush_log
  @flush_log
end

#gzip ⇒ Object

default: yes

Compress (gzip) SQL dump files; does not work with “--tab”. The IO::Compress::Gzip Perl module is used to compress SQL dump files as they are written to disk. The resulting dump files have a “.gz” extension, like “table.000000.sql.gz”. They can be uncompressed with gzip. mk-parallel-restore will automatically uncompress them, too, when restoring. This option does not work with “--tab” because the MySQL server writes the tab dump files directly using “SELECT INTO OUTFILE”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 132

def gzip
  @gzip
end

#help ⇒ Object

Show help and exit.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 136

def help
  @help
end

#host ⇒ Object

short form: -h; type: string

Connect to host.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 141

def host
  @host
end

#ignore_databases ⇒ Object

type: Hash

Ignore this comma-separated list of databases.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 146

def ignore_databases
  @ignore_databases
end

#ignore_databases_regex ⇒ Object

type: string

Ignore databases whose names match this Perl regex.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 151

def ignore_databases_regex
  @ignore_databases_regex
end

#ignore_engines ⇒ Object

type: Hash; default: FEDERATED,MRG_MyISAM

Do not dump tables that use this comma-separated list of storage engines. The schema file will be dumped as usual. This prevents dumping data for Federated tables and Merge tables.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 158

def ignore_engines
  @ignore_engines
end

#ignore_tables ⇒ Object

type: Hash

Ignore this comma-separated list of table names. Table names may be qualified with the database name.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 164

def ignore_tables
  @ignore_tables
end

#ignore_tables_regex ⇒ Object

type: string

Ignore tables whose names match this Perl regex.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 169

def ignore_tables_regex
  @ignore_tables_regex
end

#lock_tables ⇒ Object

Use “LOCK TABLES” instead of a global lock. Disables “--[no]flush-lock” (unless it was explicitly set) and locks tables with “LOCK TABLES READ”. The lock is taken and released for every table as it is dumped.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 175

def lock_tables
  @lock_tables
end

#lossless_floats ⇒ Object

Dump float types with extra precision for lossless restore (requires “--tab”). Wraps these types with a call to “FORMAT()” with 17 digits of precision. According to the comments in Google’s patches, this will give lossless dumping and reloading in most cases. (I shamelessly stole this technique from them. I don’t know enough about floating-point math to have an opinion.) This works only with “--tab”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 183

def lossless_floats
  @lossless_floats
end

#password ⇒ Object

short form: -p; type: string

Password to use when connecting.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 188

def password
  @password
end

#path_to_mk_parallel_dump ⇒ Object

Sets the path to the mk-parallel-dump executable. If it is not set, the executable is looked up through the environment path.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 338

def path_to_mk_parallel_dump
  @path_to_mk_parallel_dump
end

#pid ⇒ Object

type: string

Create the given PID file. The file contains the process ID of the script. The PID file is removed when the script exits. Before starting, the script checks whether the PID file already exists. If it does not, the script creates it and writes its own PID to it. If it does, the script checks the following: if the file contains a PID and a process is running with that PID, the script dies; if the file contains a PID but no process is running with that PID, the script overwrites the file with its own PID and starts; if the file contains no PID, the script dies.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 198

def pid
  @pid
end
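
The PID-file checks described above can be written down as a short decision routine. This is a sketch in Ruby of the described behaviour, not the tool's own code: `claim_pid_file` and `process_alive?` are hypothetical names, and removing the file on exit is left out.

```ruby
# Hypothetical helper: true if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid)   # signal 0 performs the existence check without signalling
  true
rescue Errno::ESRCH
  false                  # no such process
rescue Errno::EPERM
  true                   # the process exists but belongs to another user
end

# Hypothetical helper mirroring the described checks.
# Returns :started after writing our PID; raises when the script should die.
def claim_pid_file(path, own_pid = Process.pid)
  if File.exist?(path)
    content = File.read(path).strip
    raise "PID file #{path} contains no PID" unless content.match?(/\A\d+\z/)
    raise "already running with PID #{content}" if process_alive?(content.to_i)
  end
  File.write(path, own_pid.to_s)   # create the file, or overwrite a stale one
  :started
end
```

Note that checking and then writing is not atomic, so two scripts started at the same instant could both pass the check; the sketch ignores that race.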

#port ⇒ Object

short form: -P; type: int

Port number to use for connection.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 203

def port
  @port
end

#progress ⇒ Object

Display progress reports. Progress is displayed each time a table or chunk of a table finishes dumping. Progress is calculated by measuring the average data size of each full chunk and assuming all bytes are created equal. The output is the completed and total bytes, the percent completed, estimated time remaining, and estimated completion time. For example:

40.72k/112.00k  36.36% ETA 00:00 (2009-10-27T19:17:53)

If “--chunk-size” is not specified, then each table is effectively one big chunk and the progress reports are fairly accurate. When “--chunk-size” is specified, the progress reports can be skewed because of averaging. Progress reports are inaccurate when a dump is resumed. This is a known issue and will be fixed in a later release.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 217

def progress
  @progress
end

#quiet ⇒ Object

short form: -q

Quiet output; disables “--verbose”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 222

def quiet
  @quiet
end

#resume ⇒ Object

default: yes

Resume dumps.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 227

def resume
  @resume
end

#set_vars ⇒ Object

type: string; default: wait_timeout=10000

Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 233

def set_vars
  @set_vars
end

#socket ⇒ Object

short form: -S; type: string

Socket file to use for connection.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 238

def socket
  @socket
end

#stop_slave ⇒ Object

Issue “STOP SLAVE” on the server before dumping data. This ensures that the data is not changing during the dump. Issues “START SLAVE” after the dump is complete. If the slave is not running, throws an error and exits. This guards against restarting a slave that was stopped because of a problem, or that someone stopped intentionally for maintenance or some other purpose.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 247

def stop_slave
  @stop_slave
end

#tab ⇒ Object

Dump tab-separated (sets “--umask” 0). Dump via “SELECT INTO OUTFILE”, which is similar to what “mysqldump” does with the “--tab” option, but you’re not constrained to a single database at a time. Before you use this option, make sure you know what “SELECT INTO OUTFILE” does! I recommend using it only if you’re running mk-parallel-dump on the same machine as the MySQL server, but there is no protection if you don’t. This option sets “--umask” to zero so auto-created directories are writable by the MySQL server.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 257

def tab
  @tab
end

#tables ⇒ Object

short form: -t; type: hash

Dump only this comma-separated list of table names. Table names may be qualified with the database name.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 263

def tables
  @tables
end

#tables_regex ⇒ Object

type: string

Dump only tables whose names match this Perl regex.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 268

def tables_regex
  @tables_regex
end

#threads ⇒ Object

type: int; default: 2

Number of threads to dump concurrently. Specifies the number of parallel processes to run. On GNU/Linux machines, the default is the number of times ‘processor’ appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there’s only a single processor (this is mk-parallel-dump, after all; 1 is not parallel).

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 277

def threads
  @threads
end
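
The GNU/Linux rule above (count the ‘processor’ entries, but never go below 2) is small enough to sketch. The `default_threads` helper is hypothetical, and the sample text stands in for the contents of /proc/cpuinfo; the Windows case is not covered.

```ruby
# Hypothetical helper: derive a default thread count from cpuinfo-style text.
# Counts lines that start with "processor" and enforces a minimum of 2.
def default_threads(cpuinfo_text)
  processors = cpuinfo_text.scan(/^processor\b/).length
  [processors, 2].max
end

# Sample text shaped like /proc/cpuinfo on a four-core machine (made up).
sample = (0..3).map { |n| "processor\t: #{n}\nmodel name\t: Example CPU\n" }.join("\n")

puts default_threads(sample)             # four entries in the sample
puts default_threads("processor\t: 0\n") # a single processor still yields 2
```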

#tz_utc ⇒ Object

default: yes

Enable TIMESTAMP columns to be dumped and reloaded between different time zones. mk-parallel-dump sets its connection time zone to UTC and adds “SET TIME_ZONE=‘+00:00’” to the dump file. Without this option, TIMESTAMP columns are dumped and reloaded in the time zones local to the source and destination servers, which can cause the values to change. This option also protects against changes due to daylight saving time. This option is identical to “mysqldump --tz-utc”. In fact, the above text was copied from mysqldump’s man page.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 288

def tz_utc
  @tz_utc
end

#umask ⇒ Object

type: string

Set the program’s “umask” to this octal value. This is useful when you want created files and directories to be readable or writable by other users (for example, the MySQL server itself).

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 295

def umask
  @umask
end
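
For readers unfamiliar with umasks, the following sketch shows what applying an octal string such as “022” or “0” means at the process level. It uses Ruby's own `File.umask` for illustration; the `with_umask` helper is hypothetical and is not how the wrapped Perl script applies the value.

```ruby
# Hypothetical helper: run a block under the given octal umask string,
# then restore the previous umask.
def with_umask(octal_string)
  previous = File.umask(Integer(octal_string, 8))  # File.umask returns the old mask
  yield
ensure
  File.umask(previous) if previous
end

# A umask of 0 removes no permission bits from newly created files and directories.
with_umask('0') do
  puts format('%03o', File.umask)
end
```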

#user ⇒ Object

short form: -u; type: string

User for login if not current user.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 300

def user
  @user
end

#verbose ⇒ Object

short form: -v; cumulative: yes

Be verbose; can specify multiple times. See “OUTPUT”.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 306

def verbose
  @verbose
end

#version ⇒ Object

Show version and exit.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 310

def version
  @version
end

#wait ⇒ Object

short form: -w; type: time; default: 5m

Wait limit when the server is down. If the MySQL server crashes during dumping, waits until the server comes back and then continues with the rest of the tables. “mk-parallel-dump” will check the server every second until this time is exhausted, at which point it will give up and exit. This implements Peter Zaitsev’s “safe dump” request: sometimes a dump on a server that has corrupt data will kill the server. mk-parallel-dump will wait for the server to restart, then keep going. It’s hard to say which table killed the server, so tables that were being dumped concurrently when the crash happened will not be retried. No additional locks will be taken after the server restarts; it’s assumed this behavior is useful only on a server you’re not trying to dump while it’s in production.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 324

def wait
  @wait
end
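
The check-every-second-until-the-limit behaviour can be sketched as a polling loop. This is an illustration only: `wait_for_server` is a hypothetical helper, the limit is given in seconds rather than as a value like “5m”, and the block stands in for whatever check determines that MySQL is reachable again.

```ruby
# Hypothetical helper: poll until the block reports the server is up,
# or give up once the wait limit (in seconds) is exhausted.
# The sleeper is injectable so the loop can be exercised without real delays.
def wait_for_server(limit_seconds, interval: 1, sleeper: ->(s) { sleep(s) })
  waited = 0
  loop do
    return true if yield                      # server is back: continue dumping
    return false if waited >= limit_seconds   # limit exhausted: give up
    sleeper.call(interval)
    waited += interval
  end
end

# Usage with a stand-in probe that succeeds on the third check.
checks = 0
back = wait_for_server(300, sleeper: ->(_s) {}) { (checks += 1) >= 3 }
puts back
```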

#zero_chunk ⇒ Object

default: yes

Add a chunk for rows with zero or zero-equivalent values. This only has an effect when “--chunk-size” is specified. The purpose of the zero chunk is to capture a potentially large number of zero values that would imbalance the size of the first chunk. For example, if a lot of negative numbers were inserted into an unsigned integer column, causing them to be stored as zeros, then these zero values are captured by the zero chunk instead of the first chunk and all its non-zero values.

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 333

def zero_chunk
  @zero_chunk
end

Instance Method Details

#start(options = nil) ⇒ Object

Execute the command



# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 349

# Relies on the tempfile standard library (require 'tempfile').
def start(options = nil)
  tmp = Tempfile.new('tmp')
  # Run the command with stderr redirected into the temp file.
  command = option_string() + options.to_s + " 2> " + tmp.path
  success = system(command)
  # system returns false (or nil) when the command fails; pass that through.
  return success unless success

  # Return the last line written to stderr without its newline, or nil if there was none.
  last_line = tmp.readlines.last
  last_line ? last_line.chomp : nil
ensure
  # Close and delete the temp file on every path.
  tmp.close! if tmp
end
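
The `option_string` method called by `start` is not shown on this page. As a rough idea of what such a method has to do, the sketch below turns a hash of settings into a command line. Everything here is an assumption made for illustration: the `build_command` name, the underscore-to-hyphen flag naming (chosen to match options such as “--base-dir” in the descriptions above), and the choice to omit false or unset options rather than emit “--no-” forms.

```ruby
require 'shellwords'

# Hypothetical stand-in for option_string; the real implementation may differ.
def build_command(executable, options)
  flags = options.map do |name, value|
    flag = "--#{name.to_s.tr('_', '-')}"    # base_dir becomes --base-dir
    case value
    when true       then flag               # boolean switch
    when false, nil then nil                # omitted in this sketch
    else "#{flag} #{Shellwords.escape(value.to_s)}"
    end
  end
  ([executable] + flags.compact).join(' ')
end

puts build_command('mk-parallel-dump',
                   base_dir: '/tmp/my dumps', threads: 4, dry_run: true, quiet: false)
```

Escaping each value matters because `start` passes the assembled string to a shell through `system`.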