Class: Maatkit::ParallelDump

Inherits: Object

Defined in: lib/maatkit-ruby/mk-parallel-dump.rb

Overview

Dump MySQL tables in parallel.

Maatkit::ParallelDump.new

Instance Attribute Summary

#ask_pass ⇒ Object

Prompt for a password when connecting to MySQL.
#base_dir ⇒ Object

type: string The base directory in which files will be stored.
#biggest_first ⇒ Object

default: yes Process tables in descending order of size (biggest to smallest).
#bin_log_position ⇒ Object

default: yes Dump the master/slave position.
#charset ⇒ Object

short form: -A; type: string Default character set.
#chunk_size ⇒ Object

type: string Number of rows or data size to dump per file.
#client_side_buffering ⇒ Object

Fetch and buffer results in memory on client.
#config ⇒ Object

type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.
#csv ⇒ Object

Do “--tab” dump in CSV format (implies “--tab”).
#databases ⇒ Object

short form: -d; type: hash Dump only this comma-separated list of databases.
#databases_regex ⇒ Object

type: string Dump only databases whose names match this Perl regex.
#defaults_file ⇒ Object

short form: -F; type: string Only read mysql options from the given file.
#dry_run ⇒ Object

Print commands instead of executing them.
#engines ⇒ Object

short form: -e; type: hash Dump only tables that use this comma-separated list of storage engines.
#flush_lock ⇒ Object

Use “FLUSH TABLES WITH READ LOCK”.
#flush_log ⇒ Object

Execute “FLUSH LOGS” when getting binlog positions.
#gzip ⇒ Object

default: yes Compress (gzip) SQL dump files; does not work with “--tab”.
#help ⇒ Object

Show help and exit.
#host ⇒ Object

short form: -h; type: string Connect to host.
#ignore_databases ⇒ Object

type: Hash Ignore this comma-separated list of databases.
#ignore_databases_regex ⇒ Object

type: string Ignore databases whose names match this Perl regex.
#ignore_engines ⇒ Object

type: Hash; default: FEDERATED,MRG_MyISAM Do not dump tables that use this comma-separated list of storage engines.
#ignore_tables ⇒ Object

type: Hash Ignore this comma-separated list of table names.
#ignore_tables_regex ⇒ Object

type: string Ignore tables whose names match the Perl regex.
#lock_tables ⇒ Object

Use “LOCK TABLES” (disables “--[no]flush-lock”).
#lossless_floats ⇒ Object

Dump float types with extra precision for lossless restore (requires “--tab”).
#password ⇒ Object

short form: -p; type: string Password to use when connecting.
#path_to_mk_parallel_dump ⇒ Object

Sets the executable path, otherwise the environment path will be used.
#pid ⇒ Object

type: string Create the given PID file.
#port ⇒ Object

short form: -P; type: int Port number to use for connection.
#progress ⇒ Object

Display progress reports.
#quiet ⇒ Object

short form: -q Quiet output; disables “--verbose”.
#resume ⇒ Object

default: yes Resume dumps.
#set_vars ⇒ Object

type: string; default: wait_timeout=10000 Set these MySQL variables.
#socket ⇒ Object

short form: -S; type: string Socket file to use for connection.
#stop_slave ⇒ Object

Issue “STOP SLAVE” on server before dumping data.
#tab ⇒ Object

Dump tab-separated (sets “--umask” 0).
#tables ⇒ Object

short form: -t; type: hash Dump only this comma-separated list of table names.
#tables_regex ⇒ Object

type: string Dump only tables whose names match this Perl regex.
#threads ⇒ Object

type: int; default: 2 Number of threads to dump concurrently.
#tz_utc ⇒ Object

default: yes Enable TIMESTAMP columns to be dumped and reloaded between different time zones.
#umask ⇒ Object

type: string Set the program’s “umask” to this octal value.
#user ⇒ Object

short form: -u; type: string User for login if not current user.
#verbose ⇒ Object

short form: -v; cumulative: yes Be verbose; can specify multiple times.
#version ⇒ Object

Show version and exit.
#wait ⇒ Object

short form: -w; type: time; default: 5m Wait limit when the server is down.
#zero_chunk ⇒ Object

default: yes Add a chunk for rows with zero or zero-equivalent values.

Instance Method Summary

#initialize ⇒ ParallelDump constructor

Returns a new ParallelDump Object.
#start(options = nil) ⇒ Object

Execute the command.

Constructor Details

#initialize ⇒ `ParallelDump`

Returns a new ParallelDump Object




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 343

def initialize()
end

Instance Attribute Details

#ask_pass ⇒ `Object`

Prompt for a password when connecting to MySQL.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 17

def ask_pass
  @ask_pass
end

#base_dir ⇒ `Object`

type: string The base directory in which files will be stored. The default is the current working directory. Each database gets its own directory under the base directory. So if the base directory is “/tmp” and database “foo” is dumped, then the directory “/tmp/foo” is created which contains all the table dump files for “foo”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 25

def base_dir
  @base_dir
end

#biggest_first ⇒ `Object`

default: yes Process tables in descending order of size (biggest to smallest). This strategy gives better parallelization. Suppose there are 8 threads and the last table is huge. We will finish everything else and then be running single-threaded while that one finishes. If that one runs first, then we will have the max number of threads running at a time for as long as possible.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 33

def biggest_first
  @biggest_first
end
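
The effect of the ordering can be illustrated with a small simulation. This is a minimal sketch, not mk-parallel-dump's scheduler: `simulate_makespan` is a hypothetical helper that assumes dump time is proportional to table size and that a free thread always takes the next table in the list.

```ruby
# Hypothetical helper: returns the total wall-clock time needed when
# `threads` workers pull tables from `sizes` in the given order.
# Assumes dump time is proportional to table size.
def simulate_makespan(sizes, threads)
  finish_times = Array.new(threads, 0)
  sizes.each do |size|
    # The worker that becomes free first takes the next table.
    idx = finish_times.index(finish_times.min)
    finish_times[idx] += size
  end
  finish_times.max
end

sizes = [1, 1, 1, 1, 1, 1, 1, 1, 8] # eight small tables, one huge table last

huge_last  = simulate_makespan(sizes, 8)
huge_first = simulate_makespan(sizes.sort.reverse, 8)

puts "huge table last:  #{huge_last} time units"
puts "huge table first: #{huge_first} time units"
```

With the huge table last, it starts only after a thread has finished a small table, so the run ends later than when the huge table starts immediately.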

#bin_log_position ⇒ `Object`

default: yes Dump the master/slave position. Dump binary log positions from both “SHOW MASTER STATUS” and “SHOW SLAVE STATUS”, whichever can be retrieved from the server. The data is dumped to a file named 00_master_data.sql in the “--base-dir”. The file also contains details of each table dumped, including the WHERE clauses used to dump it in chunks.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 42

def bin_log_position
  @bin_log_position
end

#charset ⇒ `Object`

short form: -A; type: string Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 49

def charset
  @charset
end

#chunk_size ⇒ `Object`

type: string Number of rows or data size to dump per file. Specifies that the table should be dumped in segments of approximately the size given. The syntax is either a plain integer, which is interpreted as a number of rows per chunk, or an integer with a suffix of G, M, or k, which is interpreted as the size of the data to be dumped in each chunk. See “CHUNKS” for more details.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 58

def chunk_size
  @chunk_size
end
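
The two forms of the value can be told apart with a small parser. The sketch below is illustrative only: `parse_chunk_size` and `SUFFIX_MULTIPLIERS` are hypothetical names, and treating k, M and G as powers of 1024 is an assumption that the text above does not confirm.

```ruby
# Hypothetical parser for the chunk-size syntax described above.
# Returns [:rows, n] for a plain integer and [:bytes, n] for a suffixed value.
# Assumption: k, M and G are powers of 1024; the source does not state this.
SUFFIX_MULTIPLIERS = { 'k' => 1024, 'M' => 1024**2, 'G' => 1024**3 }.freeze

def parse_chunk_size(value)
  match = /\A(\d+)([kMG])?\z/.match(value.to_s)
  raise ArgumentError, "invalid chunk size: #{value.inspect}" unless match

  number = match[1].to_i
  suffix = match[2]
  suffix ? [:bytes, number * SUFFIX_MULTIPLIERS[suffix]] : [:rows, number]
end

p parse_chunk_size('100000') # plain integer: rows per chunk
p parse_chunk_size('10M')    # suffixed integer: data size per chunk
```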

#client_side_buffering ⇒ `Object`

Fetch and buffer results in memory on client. By default this option is not enabled because it causes data to be completely fetched from the server and then buffered in memory on the client. For large dumps this can require a lot of memory. Instead, the default (when this option is not specified) is to fetch and dump rows one by one from the server. This requires a lot less memory on the client but can keep the tables on the server locked longer. Use this option only if you’re sure that the data being dumped is relatively small and the client has sufficient memory. Remember that, if this option is specified, all “--threads” will buffer their results in memory, so memory consumption can increase by a factor of N “--threads”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 70

def client_side_buffering
  @client_side_buffering
end

#config ⇒ `Object`

type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 76

def config
  @config
end

#csv ⇒ `Object`

Do “--tab” dump in CSV format (implies “--tab”). Changes “--tab” options so the dump file is in comma-separated values (CSV) format. The SELECT INTO OUTFILE statement looks like the following, and can be re-loaded with the same options:

  SELECT * INTO OUTFILE %D.%N.%6C.txt
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n' FROM %D.%N;




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 85

def csv
  @csv
end

#databases ⇒ `Object`

short form: -d; type: hash Dump only this comma-separated list of databases.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 90

def databases
  @databases
end

#databases_regex ⇒ `Object`

type: string Dump only databases whose names match this Perl regex.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 95

def databases_regex
  @databases_regex
end

#defaults_file ⇒ `Object`

short form: -F; type: string Only read mysql options from the given file. You must give an absolute pathname.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 100

def defaults_file
  @defaults_file
end

#dry_run ⇒ `Object`

Print commands instead of executing them.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 104

def dry_run
  @dry_run
end

#engines ⇒ `Object`

short form: -e; type: hash Dump only tables that use this comma-separated list of storage engines.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 109

def engines
  @engines
end

#flush_lock ⇒ `Object`

Use “FLUSH TABLES WITH READ LOCK”. This is enabled by default. The lock is taken once, at the beginning of the whole process, and is released after all tables have been dumped. If you want to lock only the tables you’re dumping, use “--lock-tables”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 116

def flush_lock
  @flush_lock
end

#flush_log ⇒ `Object`

Execute “FLUSH LOGS” when getting binlog positions. This option is NOT enabled by default because it causes the MySQL server to rotate its error log, potentially overwriting error messages.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 122

def flush_log
  @flush_log
end

#gzip ⇒ `Object`

default: yes Compress (gzip) SQL dump files; does not work with “--tab”. The IO::Compress::Gzip Perl module is used to compress SQL dump files as they are written to disk. The resulting dump files have a “.gz” extension, like “table.000000.sql.gz”. They can be uncompressed with gzip. mk-parallel-restore will automatically uncompress them, too, when restoring. This option does not work with “--tab” because the MySQL server writes the tab dump files directly using “SELECT INTO OUTFILE”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 132

def gzip
  @gzip
end

#help ⇒ `Object`

Show help and exit.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 136

def help
  @help
end

#host ⇒ `Object`

short form: -h; type: string Connect to host.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 141

def host
  @host
end

#ignore_databases ⇒ `Object`

type: Hash Ignore this comma-separated list of databases.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 146

def ignore_databases
  @ignore_databases
end

#ignore_databases_regex ⇒ `Object`

type: string Ignore databases whose names match this Perl regex.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 151

def ignore_databases_regex
  @ignore_databases_regex
end

#ignore_engines ⇒ `Object`

type: Hash; default: FEDERATED,MRG_MyISAM Do not dump tables that use this comma-separated list of storage engines. The schema file will be dumped as usual. This prevents dumping data for Federated tables and Merge tables.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 158

def ignore_engines
  @ignore_engines
end

#ignore_tables ⇒ `Object`

type: Hash Ignore this comma-separated list of table names. Table names may be qualified with the database name.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 164

def ignore_tables
  @ignore_tables
end

#ignore_tables_regex ⇒ `Object`

type: string Ignore tables whose names match the Perl regex.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 169

def ignore_tables_regex
  @ignore_tables_regex
end

#lock_tables ⇒ `Object`

Use “LOCK TABLES” (disables “--[no]flush-lock”). Disables “--[no]flush-lock” (unless it was explicitly set) and locks tables with “LOCK TABLES READ”. The lock is taken and released for every table as it is dumped.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 175

def lock_tables
  @lock_tables
end

#lossless_floats ⇒ `Object`

Dump float types with extra precision for lossless restore (requires “--tab”). Wraps these types with a call to “FORMAT()” with 17 digits of precision. According to the comments in Google’s patches, this will give lossless dumping and reloading in most cases. (I shamelessly stole this technique from them. I don’t know enough about floating-point math to have an opinion.) This works only with “--tab”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 183

def lossless_floats
  @lossless_floats
end
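
Why 17 digits matter can be shown in Ruby rather than SQL. This is a sketch under stated assumptions: it uses Ruby's `format` with significant digits as a stand-in and does not reproduce the behavior of MySQL's “FORMAT()”; it only shows that a double-precision value survives a text round trip at 17 significant digits where a shorter rendering may not.

```ruby
x = 0.1 + 0.2 # a double that is not exactly 0.3

short = format('%.15g', x) # 15 significant digits
long  = format('%.17g', x) # 17 significant digits

# Parse each string back and compare with the original value.
puts "15 digits: #{short} round-trips? #{short.to_f == x}"
puts "17 digits: #{long} round-trips? #{long.to_f == x}"
```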

#password ⇒ `Object`

short form: -p; type: string Password to use when connecting.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 188

def password
  @password
end

#path_to_mk_parallel_dump ⇒ `Object`

Sets the executable path, otherwise the environment path will be used.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 338

def path_to_mk_parallel_dump
  @path_to_mk_parallel_dump
end

#pid ⇒ `Object`

type: string Create the given PID file. The file contains the process ID of the script. The PID file is removed when the script exits. Before starting, the script checks if the PID file already exists. If it does not, then the script creates and writes its own PID to it. If it does, then the script checks the following: if the file contains a PID and a process is running with that PID, then the script dies; or, if there is no process running with that PID, then the script overwrites the file with its own PID and starts; else, if the file contains no PID, then the script dies.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 198

def pid
  @pid
end
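
The PID-file rules above form a small decision table, sketched below. `pid_file_decision` is a hypothetical helper, not part of maatkit-ruby or mk-parallel-dump; the process check is passed in as a callable (an assumption made so the logic can be tried without real processes).

```ruby
require 'tempfile'

# Hypothetical sketch of the PID-file decision described above.
# Returns :start when the script may run and :die when it must exit.
def pid_file_decision(path, process_running:)
  return :start unless File.exist?(path)        # no file: create it and start

  content = File.read(path).strip
  return :die unless content.match?(/\A\d+\z/)  # file exists but holds no PID
  return :die if process_running.call(content.to_i)

  :start                                        # stale PID: overwrite and start
end

Tempfile.create('pid') do |f|
  f.write("12345\n")
  f.flush
  puts pid_file_decision(f.path, process_running: ->(_pid) { false })
  puts pid_file_decision(f.path, process_running: ->(_pid) { true })
end
```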

#port ⇒ `Object`

short form: -P; type: int Port number to use for connection.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 203

def port
  @port
end

#progress ⇒ `Object`

Display progress reports. Progress is displayed each time a table or chunk of a table finishes dumping. Progress is calculated by measuring the average data size of each full chunk and assuming all bytes are created equal. The output is the completed and total bytes, the percent completed, estimated time remaining, and estimated completion time. For example:

40.72k/112.00k  36.36% ETA 00:00 (2009-10-27T19:17:53)

If “--chunk-size” is not specified then each table is effectively one big chunk and the progress reports are pretty accurate. When “--chunk-size” is specified the progress reports can be skewed because of averaging. Progress reports are inaccurate when a dump is resumed. This is a known issue and will be fixed in a later release.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 217

def progress
  @progress
end
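
The fields of the progress report above can be reproduced with a short formatter. This is a sketch, not the tool's code: `progress_line` is a hypothetical helper, and it assumes sizes are shown in units of 1024 bytes, that the ETA is rendered as minutes and seconds, and that the remaining bytes dump at the average rate seen so far.

```ruby
# Hypothetical formatter that mimics the report layout shown above.
def progress_line(done_bytes, total_bytes, elapsed_seconds, now)
  fraction = done_bytes.to_f / total_bytes
  # Estimate remaining time from the average rate so far.
  remaining = fraction > 0 ? (elapsed_seconds * (1 - fraction) / fraction).round : 0
  eta    = format('%02d:%02d', remaining / 60, remaining % 60)
  finish = (now + remaining).strftime('%Y-%m-%dT%H:%M:%S')
  format('%.2fk/%.2fk %6.2f%% ETA %s (%s)',
         done_bytes / 1024.0, total_bytes / 1024.0, fraction * 100, eta, finish)
end

puts progress_line(41_697, 114_688, 0, Time.utc(2009, 10, 27, 19, 17, 53))
```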

#quiet ⇒ `Object`

short form: -q Quiet output; disables “--verbose”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 222

def quiet
  @quiet
end

#resume ⇒ `Object`

default: yes Resume dumps.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 227

def resume
  @resume
end

#set_vars ⇒ `Object`

type: string; default: wait_timeout=10000 Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 233

def set_vars
  @set_vars
end

#socket ⇒ `Object`

short form: -S; type: string Socket file to use for connection.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 238

def socket
  @socket
end

#stop_slave ⇒ `Object`

Issue “STOP SLAVE” on server before dumping data. This ensures that the data is not changing during the dump. Issues “START SLAVE” after the dump is complete. If the slave is not running, throws an error and exits. This is to prevent possibly bad things from happening if the slave is not running because of a problem, or because someone intentionally stopped the slave for maintenance or some other purpose.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 247

def stop_slave
  @stop_slave
end

#tab ⇒ `Object`

Dump tab-separated (sets “--umask” 0). Dump via “SELECT INTO OUTFILE”, which is similar to what “mysqldump” does with the “--tab” option, but you’re not constrained to a single database at a time. Before you use this option, make sure you know what “SELECT INTO OUTFILE” does! I recommend using it only if you’re running mk-parallel-dump on the same machine as the MySQL server, but there is no protection if you don’t. This option sets “--umask” to zero so auto-created directories are writable by the MySQL server.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 257

def tab
  @tab
end

#tables ⇒ `Object`

short form: -t; type: hash Dump only this comma-separated list of table names. Table names may be qualified with the database name.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 263

def tables
  @tables
end

#tables_regex ⇒ `Object`

type: string Dump only tables whose names match this Perl regex.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 268

def tables_regex
  @tables_regex
end

#threads ⇒ `Object`

type: int; default: 2 Number of threads to dump concurrently. Specifies the number of parallel processes to run. The default is 2 (this is mk-parallel-dump, after all – 1 is not parallel). On GNU/Linux machines, the default is the number of times ‘processor’ appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there’s only a single processor.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 277

def threads
  @threads
end
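
The GNU/Linux default described above can be sketched as follows. `default_threads` is a hypothetical helper written for illustration; it takes the text of /proc/cpuinfo as an argument so it can be tried on any platform.

```ruby
# Hypothetical helper reproducing the default described above for GNU/Linux:
# count the "processor" entries in /proc/cpuinfo text, but never go below 2.
def default_threads(cpuinfo_text)
  processors = cpuinfo_text.scan(/^processor\b/).size
  [processors, 2].max
end

single_cpu = "processor\t: 0\nmodel name\t: Example CPU\n"
puts default_threads(single_cpu) # a single processor still yields 2
```

On a real GNU/Linux host the argument could be `File.read('/proc/cpuinfo')`.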

#tz_utc ⇒ `Object`

default: yes Enable TIMESTAMP columns to be dumped and reloaded between different time zones. mk-parallel-dump sets its connection time zone to UTC and adds “SET TIME_ZONE=‘+00:00’” to the dump file. Without this option, TIMESTAMP columns are dumped and reloaded in the time zones local to the source and destination servers, which can cause the values to change. This option also protects against changes due to daylight saving time. This option is identical to “mysqldump --tz-utc”. In fact, the above text was copied from mysqldump’s man page.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 288

def tz_utc
  @tz_utc
end

#umask ⇒ `Object`

type: string Set the program’s “umask” to this octal value. This is useful when you want created files and directories to be readable or writable by other users (for example, the MySQL server itself).




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 295

def umask
  @umask
end

#user ⇒ `Object`

short form: -u; type: string User for login if not current user.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 300

def user
  @user
end

#verbose ⇒ `Object`

short form: -v; cumulative: yes Be verbose; can specify multiple times. See “OUTPUT”.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 306

def verbose
  @verbose
end

#version ⇒ `Object`

Show version and exit.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 310

def version
  @version
end

#wait ⇒ `Object`

short form: -w; type: time; default: 5m Wait limit when the server is down. If the MySQL server crashes during dumping, waits until the server comes back and then continues with the rest of the tables. “mk-parallel-dump” will check the server every second until this time is exhausted, at which point it will give up and exit. This implements Peter Zaitsev’s “safe dump” request: sometimes a dump on a server that has corrupt data will kill the server. mk-parallel-dump will wait for the server to restart, then keep going. It’s hard to say which table killed the server, so no tables will be retried. Tables that were being concurrently dumped when the crash happened will not be retried. No additional locks will be taken after the server restarts; it’s assumed this behavior is useful only on a server you’re not trying to dump while it’s in production.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 324

def wait
  @wait
end
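
The wait limit and the once-per-second check can be sketched as a polling loop. Both helpers below are hypothetical and not taken from mk-parallel-dump; the suffix meanings (s, m, h, d for seconds, minutes, hours, days) are an assumption about “type: time” values such as the default “5m”.

```ruby
# Hypothetical parser for time values such as "5m".
# Assumption: a bare number means seconds.
TIME_UNITS = { 's' => 1, 'm' => 60, 'h' => 3600, 'd' => 86_400 }.freeze

def parse_wait(value)
  match = /\A(\d+)([smhd])?\z/.match(value.to_s)
  raise ArgumentError, "invalid time value: #{value.inspect}" unless match

  match[1].to_i * TIME_UNITS.fetch(match[2] || 's')
end

# Calls the block every `interval` seconds until it returns true or the
# limit is used up. Returns true if the server came back in time.
def wait_for_server(limit_seconds, interval: 1, sleeper: method(:sleep), &server_up)
  waited = 0
  until server_up.call
    return false if waited >= limit_seconds

    sleeper.call(interval)
    waited += interval
  end
  true
end

puts parse_wait('5m')
```

A caller would pass a block that tries to connect, for example `wait_for_server(parse_wait('5m')) { server_reachable? }`, where `server_reachable?` is a placeholder for a real connection check.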

#zero_chunk ⇒ `Object`

default: yes Add a chunk for rows with zero or zero-equivalent values. This only has an effect when “--chunk-size” is specified. The purpose of the zero chunk is to capture a potentially large number of zero values that would imbalance the size of the first chunk. For example, if a lot of negative numbers were inserted into an unsigned integer column, causing them to be stored as zeros, then these zero values are captured by the zero chunk instead of the first chunk and all its non-zero values.




# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 333

def zero_chunk
  @zero_chunk
end
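
The imbalance that the zero chunk avoids can be illustrated with made-up data. This sketch is not how mk-parallel-dump computes chunks: `chunk_counts` is a hypothetical helper that splits values into fixed-width ranges and, optionally, pulls zeros into their own chunk first.

```ruby
# Hypothetical illustration: count rows per chunk when values are split into
# ranges of `width`, with or without a separate chunk for zero values.
def chunk_counts(values, width, zero_chunk:)
  zeros, rest = zero_chunk ? values.partition(&:zero?) : [[], values]
  counts = rest.group_by { |v| v / width }.sort.map { |_, group| group.size }
  zero_chunk ? [zeros.size] + counts : counts
end

values = [0] * 1000 + (1..300).to_a # many zeros plus a spread of other values

p chunk_counts(values, 100, zero_chunk: false)
p chunk_counts(values, 100, zero_chunk: true)
```

Without the zero chunk, the first range absorbs every zero on top of its ordinary values; with it, the zeros sit in a chunk of their own and the remaining ranges stay comparable in size.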

Instance Method Details

#start(options = nil) ⇒ `Object`

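Execute the command. On success, returns the last line the command wrote to standard error; otherwise returns the falsy result of Kernel#system.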

# File 'lib/maatkit-ruby/mk-parallel-dump.rb', line 349

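# Needs the standard library's Tempfile (require 'tempfile').
def start(options = nil)
  tmp = Tempfile.new('tmp')
  # Redirect the command's stderr to the temp file so it can be read back.
  command = option_string() + options.to_s + " 2> " + tmp.path
  success = system(command)
  if success
    selected_string = nil
    begin
      while (line = tmp.readline)
        # Keep the most recent line, without its trailing newline.
        selected_string = line.chomp
      end
    rescue EOFError
      # readline raises EOFError once the whole file has been read.
      tmp.close
    end
    return selected_string
  else
    tmp.close!
    return success
  end
end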
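
The source of `option_string` is not shown on this page. The sketch below shows one way such a helper might turn attributes into a command line; `build_option_string` is a hypothetical name, and the mapping rules (underscores become hyphens, `true` becomes a bare flag, `nil` and `false` are skipped) are assumptions rather than the library's documented behavior.

```ruby
require 'shellwords'

# Hypothetical sketch of an option_string-style helper.
def build_option_string(executable, attributes)
  parts = [executable]
  attributes.each do |name, value|
    next if value.nil? || value == false

    flag = "--#{name.to_s.tr('_', '-')}"
    # Escape values so spaces or shell metacharacters stay inside one argument.
    parts << (value == true ? flag : "#{flag} #{Shellwords.escape(value.to_s)}")
  end
  parts.join(' ')
end

puts build_option_string('mk-parallel-dump',
                         base_dir: '/tmp', threads: 4, tab: true, quiet: nil)
```

Escaping matters here because `start` hands the whole string to a shell through `system`.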
