Class: Maatkit::Archiver

Inherits:

Object


Defined in: lib/maatkit-ruby/mk-archiver.rb

Overview

Archive rows from a MySQL table into another table or a file.

Maatkit::Archiver.new( array, str, array)

Instance Attribute Summary

#analyze ⇒ Object

type: string Run ANALYZE TABLE afterwards on --source and/or --dest.
#ascend_first ⇒ Object

Ascend only first column of index.
#ask_pass ⇒ Object

Prompt for a password when connecting to MySQL.
#buffer ⇒ Object

Buffer output to --file and flush at commit.
#bulk_delete ⇒ Object

Delete each chunk with a single statement (implies --commit-each).
#bulk_insert ⇒ Object

Insert each chunk with LOAD DATA INFILE (implies --bulk-delete --commit-each).
#charset ⇒ Object

short form: -A; type: string Default character set.
#check_columns ⇒ Object

default: yes Ensure --source and --dest have same columns.
#check_interval ⇒ Object

type: time; default: 1s How often to check for slave lag if --check-slave-lag is given.
#check_slave_lag ⇒ Object

type: string Pause archiving until the specified DSN’s slave lag is less than --max-lag.
#columns ⇒ Object

short form: -c; type: array Comma-separated list of columns to archive.
#commit_each ⇒ Object

Commit each set of fetched and archived rows (disables --txn-size).
#config ⇒ Object

type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.
#delayed_insert ⇒ Object

Add the DELAYED modifier to INSERT statements.
#dest ⇒ Object

type: DSN DSN specifying the table to archive to.
#dry_run ⇒ Object

Print queries and exit without doing anything.
#file ⇒ Object

type: string File to archive to, with DATE_FORMAT()-like formatting.
#for_update ⇒ Object

Adds the FOR UPDATE modifier to SELECT statements.
#header ⇒ Object

Print column header at top of --file.
#help ⇒ Object

Show help and exit.
#high_priority_select ⇒ Object

Adds the HIGH_PRIORITY modifier to SELECT statements.
#host ⇒ Object

short form: -h; type: string Connect to host.
#ignore ⇒ Object

Use IGNORE for INSERT statements.
#limit ⇒ Object

type: int; default: 1 Number of rows to fetch and archive per statement.
#local ⇒ Object

Do not write OPTIMIZE or ANALYZE queries to binlog.
#low_priority_delete ⇒ Object

Adds the LOW_PRIORITY modifier to DELETE statements.
#low_priority_insert ⇒ Object

Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
#max_lag ⇒ Object

type: time; default: 1s Pause archiving if the slave given by --check-slave-lag lags.
#no_ascend ⇒ Object

Do not use ascending index optimization.
#no_delete ⇒ Object

Do not delete archived rows.
#nobulk_delete_limit ⇒ Object

default: yes Add --limit to --bulk-delete statement.
#optimize ⇒ Object

type: string Run OPTIMIZE TABLE afterwards on --source and/or --dest.
#password ⇒ Object

short form: -p; type: string Password to use when connecting.
#path_to_mk_archiver ⇒ Object

Sets the executable path, otherwise the environment path will be used.
#pid ⇒ Object

type: string Create the given PID file when daemonized.
#plugin ⇒ Object

type: string Perl module name to use as a generic plugin.
#port ⇒ Object

short form: -P; type: int Port number to use for connection.
#primary_key_only ⇒ Object

Primary key columns only.
#progress ⇒ Object

type: int Print progress information every X rows.
#purge ⇒ Object

Purge instead of archiving; allows omitting --file and --dest.
#quick_delete ⇒ Object

Adds the QUICK modifier to DELETE statements.
#quiet ⇒ Object

short form: -q Do not print any output, such as for --statistics.
#replace ⇒ Object

Causes INSERTs into --dest to be written as REPLACE.
#retries ⇒ Object

type: int; default: 1 Number of retries per timeout or deadlock.
#run_time ⇒ Object

type: time Time to run before exiting.
#safe_auto_increment ⇒ Object

default: yes Do not archive row with max AUTO_INCREMENT.
#sentinel ⇒ Object

type: string; default: /tmp/mk-archiver-sentinel Exit if this file exists.
#set_vars ⇒ Object

type: string; default: wait_timeout=10000 Set these MySQL variables.
#share_lock ⇒ Object

Adds the LOCK IN SHARE MODE modifier to SELECT statements.
#skip_foreign_key_checks ⇒ Object

Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
#sleep ⇒ Object

type: int Sleep time between fetches.
#sleep_coef ⇒ Object

type: float Calculate --sleep as a multiple of the last SELECT time.
#socket ⇒ Object

short form: -S; type: string Socket file to use for connection.
#source ⇒ Object

type: DSN DSN specifying the table to archive from (required).
#statistics ⇒ Object

Collect and print timing statistics.
#stop ⇒ Object

Stop running instances by creating the sentinel file.
#txn_size ⇒ Object

type: int; default: 1 Number of rows per transaction.
#user ⇒ Object

short form: -u; type: string User for login if not current user.
#version ⇒ Object

Show version and exit.
#where ⇒ Object

type: string WHERE clause to limit which rows to archive (required).
#why_quit ⇒ Object

Print reason for exiting unless rows exhausted.

Instance Method Summary

#initialize ⇒ Archiver constructor

Returns a new Archiver Object.
#start(options = nil) ⇒ Object

Execute the command.
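The attributes above correspond to mk-archiver's long options, and #start runs the command. This page does not show how the gem builds the command line, so the following is only a sketch of one plausible mapping, not the gem's code. `build_archiver_argv` is a hypothetical helper, and its rules (underscores become hyphens, `true` becomes a bare flag, `nil` and `false` are omitted) are assumptions.

```ruby
# Hypothetical sketch, not part of maatkit-ruby: map option values to an
# mk-archiver argument vector. All mapping rules here are assumptions.
def build_archiver_argv(options, path_to_mk_archiver: nil)
  # Like #path_to_mk_archiver: use the given path, else rely on PATH lookup.
  argv = [path_to_mk_archiver || "mk-archiver"]
  options.each do |name, value|
    next if value.nil? || value == false       # unset options are left out
    flag = "--#{name.to_s.tr('_', '-')}"       # :commit_each -> "--commit-each"
    if value == true
      argv << flag                             # switch such as --purge
    else
      argv << flag << value.to_s               # valued option such as --limit 1000
    end
  end
  argv
end

argv = build_archiver_argv(
  { source: "h=localhost,D=db,t=tbl",
    where: "created < NOW() - INTERVAL 30 DAY",
    purge: true, limit: 1000, commit_each: true }
)
# system(*argv) would run the tool without going through a shell.
```

A real mapping would also have to handle negatable options such as --no-check-columns, which this sketch does not attempt.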

Constructor Details

#initialize ⇒ `Archiver`

Returns a new Archiver Object



# File 'lib/maatkit-ruby/mk-archiver.rb', line 416

def initialize()
end

Instance Attribute Details

#analyze ⇒ `Object`

type: string Run ANALYZE TABLE afterwards on --source and/or --dest. Runs ANALYZE TABLE after finishing. The argument is an arbitrary string. If it contains the letter ‘s’, the source will be analyzed. If it contains ‘d’, the destination will be analyzed. You can specify either or both. For example, the following will analyze both:

--analyze=ds

See dev.mysql.com/doc/en/analyze-table.html for details on ANALYZE TABLE.
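The letter rule is easy to model. The helper below is a hypothetical sketch (the name `maintenance_statements` is invented) that turns an --analyze or --optimize argument into statements using MySQL's `ANALYZE | OPTIMIZE [NO_WRITE_TO_BINLOG] TABLE` syntax, adding the modifier when --local is set. The exact SQL text mk-archiver emits is an assumption.

```ruby
# Hypothetical sketch of the --analyze/--optimize argument rule: an 's'
# anywhere selects the source table, a 'd' selects the destination table.
def maintenance_statements(verb, arg, source_table, dest_table = nil, local: false)
  modifier = local ? " NO_WRITE_TO_BINLOG" : ""   # what --local adds
  tables = []
  tables << source_table if arg.include?("s")
  tables << dest_table if dest_table && arg.include?("d")
  tables.map { |table| "#{verb}#{modifier} TABLE #{table}" }
end

maintenance_statements("ANALYZE", "ds", "db.tbl", "archive.tbl")
# two statements: ANALYZE TABLE db.tbl, then ANALYZE TABLE archive.tbl
```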



# File 'lib/maatkit-ruby/mk-archiver.rb', line 21

def analyze
  @analyze
end

#ascend_first ⇒ `Object`

Ascend only first column of index. If you do want to use the ascending index optimization (see --no-ascend), but do not want to incur the overhead of ascending a large multi-column index, you can use this option to tell mk-archiver to ascend only the leftmost column of the index. This can provide a significant performance boost over not ascending the index at all, while avoiding the cost of ascending the whole index. See EXTENDING for a discussion of how this interacts with plugins.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 27

def ascend_first
  @ascend_first
end

#ask_pass ⇒ `Object`

Prompt for a password when connecting to MySQL.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 31

def ask_pass
  @ask_pass
end

#buffer ⇒ `Object`

Buffer output to --file and flush at commit. Disables autoflushing to --file and flushes --file to disk only when a transaction commits. This typically means the file is block-flushed by the operating system, so there may be some implicit flushes to disk between commits as well. The default is to flush --file to disk after every row. The danger is that a crash might cause lost data. The performance increase I have seen from using --buffer is around 5 to 15 percent. Your mileage may vary.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 38

def buffer
  @buffer
end

#bulk_delete ⇒ `Object`

Delete each chunk with a single statement (implies --commit-each). Delete each chunk of rows in bulk with a single DELETE statement. The statement deletes every row between the first and last row of the chunk, inclusive. It implies --commit-each, since it would be a bad idea to INSERT rows one at a time and commit them before the bulk DELETE. The normal method is to delete every row by its primary key. Bulk deletes may be much faster, but they may not be if you have a complex WHERE clause. This option completely defers all DELETE processing until the chunk of rows is finished. If you have a plugin on the source, its before_delete method will not be called. Instead, its before_bulk_delete method is called later. WARNING: if you have a plugin on the source that sometimes doesn’t return true from is_archivable(), you should use this option only if you understand what it does. If the plugin instructs mk-archiver not to archive a row, it will still be deleted by the bulk delete!
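To make the WARNING concrete, here is a small in-memory simulation. It is a hypothetical model, not mk-archiver code: `Row` and `process_chunk` are invented names, the `archivable` lambda stands in for a plugin's is_archivable(), and a bulk delete is modeled as removing every row whose key lies between the chunk's first and last keys, inclusive.

```ruby
# Hypothetical model of one chunk; Row and process_chunk are illustrative names.
Row = Struct.new(:id)

def process_chunk(chunk, archivable, bulk_delete:)
  archived = chunk.select { |row| archivable.call(row) }.map(&:id)
  deleted =
    if bulk_delete
      first_id, last_id = chunk.first.id, chunk.last.id
      # One DELETE covering the whole key range of the chunk.
      chunk.map(&:id).select { |id| id.between?(first_id, last_id) }
    else
      # Normal method: one DELETE per archived row, by primary key.
      archived
    end
  [archived, deleted]
end

chunk = (1..5).map { |i| Row.new(i) }
skip_three = ->(row) { row.id != 3 }   # the "plugin" declines row 3
process_chunk(chunk, skip_three, bulk_delete: false)  # row 3 survives
process_chunk(chunk, skip_three, bulk_delete: true)   # row 3 is deleted anyway
```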



# File 'lib/maatkit-ruby/mk-archiver.rb', line 46

def bulk_delete
  @bulk_delete
end

#bulk_insert ⇒ `Object`

Insert each chunk with LOAD DATA INFILE (implies --bulk-delete --commit-each). Insert each chunk of rows with LOAD DATA LOCAL INFILE. This may be much faster than inserting a row at a time with INSERT statements. It is implemented by creating a temporary file for each chunk of rows, and writing the rows to this file instead of inserting them. When the chunk is finished, it uploads the rows. To protect the safety of your data, this option forces bulk deletes to be used. It would be unsafe to delete each row as it is found, before the rows have been inserted into the destination. Forcing bulk deletes guarantees that the deletion waits until the insertion is successful. The --low-priority-insert, --replace, and --ignore options work with this option, but --delayed-insert does not.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 59

def bulk_insert
  @bulk_insert
end

#charset ⇒ `Object`

short form: -A; type: string Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 64

def charset
  @charset
end

#check_columns ⇒ `Object`

default: yes Ensure --source and --dest have same columns. Enabled by default; causes mk-archiver to check that the source and destination tables have the same columns. It does not check column order, data type, etc. It just checks that all columns in the source exist in the destination and vice versa. If there are any differences, mk-archiver will exit with an error. To disable this check, specify --no-check-columns.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 71

def check_columns
  @check_columns
end

#check_interval ⇒ `Object`

type: time; default: 1s How often to check for slave lag if --check-slave-lag is given.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 76

def check_interval
  @check_interval
end

#check_slave_lag ⇒ `Object`

type: string Pause archiving until the specified DSN’s slave lag is less than --max-lag.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 81

def check_slave_lag
  @check_slave_lag
end

#columns ⇒ `Object`

short form: -c; type: array Comma-separated list of columns to archive. Specify a comma-separated list of columns to fetch, write to the file, and insert into the destination table. If specified, mk-archiver ignores other columns unless it needs to add them to the SELECT statement for ascending an index or deleting rows. It fetches and uses these extra columns internally, but does not write them to the file or to the destination table. It does pass them to plugins. See also --primary-key-only.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 88

def columns
  @columns
end

#commit_each ⇒ `Object`

Commit each set of fetched and archived rows (disables --txn-size). Commits transactions and flushes --file after each set of rows has been archived, before fetching the next set of rows, and before sleeping if --sleep is specified. Disables --txn-size; use --limit to control the transaction size with --commit-each. This option is useful as a shortcut to make --limit and --txn-size the same value, but more importantly it avoids transactions being held open while searching for more rows. For example, imagine you are archiving old rows from the beginning of a very large table, with --limit 1000 and --txn-size 1000. After some period of finding and archiving 1000 rows at a time, mk-archiver finds the last 999 rows and archives them, then executes the next SELECT to find more rows. This scans the rest of the table, but never finds any more rows. It has held open a transaction for a very long time, only to determine it is finished anyway. You can use --commit-each to avoid this.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 94

def commit_each
  @commit_each
end

#config ⇒ `Object`

type: Array Read this comma-separated list of config files; if specified, this must be the first option on the command line.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 99

def config
  @config
end

#delayed_insert ⇒ `Object`

Add the DELAYED modifier to INSERT statements. Adds the DELAYED modifier to INSERT or REPLACE statements. See dev.mysql.com/doc/en/insert.html for details.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 104

def delayed_insert
  @delayed_insert
end

#dest ⇒ `Object`

type: DSN DSN specifying the table to archive to. This item specifies a table into which mk-archiver will insert rows archived from --source. It uses the same key=val argument format as --source. Most missing values default to the same values as --source, so you don’t have to repeat options that are the same in --source and --dest. Use the --help option to see which values are copied from --source. WARNING: Using a default options file (F) DSN option that defines a socket for --source causes mk-archiver to connect to --dest using that socket unless another socket for --dest is specified. This means that mk-archiver may incorrectly connect to --source when it is meant to connect to --dest. For example:

--source F=host1.cnf,D=db,t=tbl --dest h=host2

When mk-archiver connects to --dest (host2), it will connect via the socket that host1.cnf defines for --source (host1).
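The defaulting rule and the socket pitfall can be modeled with plain hashes. This is a simplified, hypothetical sketch: `parse_dsn` and `dest_with_defaults` are invented names, and unlike mk-archiver it copies every missing key from --source (the tool copies only some keys, and --help lists which).

```ruby
# Hypothetical sketch: parse "key=value,key=value" DSNs into hashes.
def parse_dsn(str)
  str.split(",").to_h { |part| part.split("=", 2) }
end

# Fill keys missing from --dest with the values from --source.
# Simplification: every key is copied; values given for --dest win.
def dest_with_defaults(source, dest)
  parse_dsn(source).merge(parse_dsn(dest))
end

dest_with_defaults("h=host1,D=db,t=tbl", "h=host2,t=tbl_archive")
# h is host2 and t is tbl_archive (from --dest); D is db (copied from --source)

dest_with_defaults("F=host1.cnf,D=db,t=tbl", "h=host2")
# In this model F=host1.cnf carries over too, so host1.cnf (and any socket it
# sets) still applies to the --dest connection, which is the case the WARNING
# describes.
```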



# File 'lib/maatkit-ruby/mk-archiver.rb', line 113

def dest
  @dest
end

#dry_run ⇒ `Object`

Print queries and exit without doing anything. Causes mk-archiver to exit after printing the filename and SQL statements it will use.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 118

def dry_run
  @dry_run
end

#file ⇒ `Object`

type: string File to archive to, with DATE_FORMAT()-like formatting. Filename to write archived rows to. A subset of MySQL’s DATE_FORMAT() formatting codes are allowed in the filename, as follows:

%d    Day of the month, numeric (01..31)
%H    Hour (00..23)
%i    Minutes, numeric (00..59)
%m    Month, numeric (01..12)
%s    Seconds (00..59)
%Y    Year, numeric, four digits

You can use the following extra format codes too:

%D    Database name
%t    Table name

Example:

--file '/var/log/archive/%Y-%m-%d-%D.%t'
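The substitution itself is straightforward. The sketch below is hypothetical (the name `expand_archive_filename` is invented) and handles only the codes listed above; any other `%` sequence is left untouched, and no `%%` escape is implemented.

```ruby
# Hypothetical sketch of expanding the --file format codes listed above.
def expand_archive_filename(pattern, time, db, table)
  codes = {
    "d" => format("%02d", time.day),
    "H" => format("%02d", time.hour),
    "i" => format("%02d", time.min),
    "m" => format("%02d", time.month),
    "s" => format("%02d", time.sec),
    "Y" => time.year.to_s,
    "D" => db,      # database name
    "t" => table,   # table name
  }
  pattern.gsub(/%([dHimsYDt])/) { codes[Regexp.last_match(1)] }
end

expand_archive_filename("/var/log/archive/%Y-%m-%d-%D.%t",
                        Time.new(2008, 7, 18, 7, 18, 53), "db", "tbl")
# => "/var/log/archive/2008-07-18-db.tbl"
```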

The file’s contents are in the same format used by SELECT INTO OUTFILE, as documented in the MySQL manual: rows terminated by newlines, columns terminated by tabs, NULL values represented by \N, and special characters escaped by \ (backslash). This lets you reload a file with LOAD DATA INFILE’s default settings. If you want a column header at the top of the file, see --header. The file is auto-flushed by default; see --buffer.
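A row in that default format can be produced as follows. This is a hypothetical sketch (`outfile_field` and `outfile_row` are invented names) that follows the MySQL manual's description of the default FIELDS ESCAPED BY '\\' behavior: the backslash is written before itself, a tab, or a newline, and a NUL byte is written as a backslash followed by `0`. It is not taken from mk-archiver's source.

```ruby
# Hypothetical sketch of one SELECT ... INTO OUTFILE style row.
def outfile_field(value)
  return "\\N" if value.nil?                    # NULL is written as \N
  value.to_s.gsub(/[\\\t\n\x00]/) do |ch|
    ch == "\x00" ? "\\0" : "\\" + ch            # escape with a leading backslash
  end
end

def outfile_row(values)
  values.map { |v| outfile_field(v) }.join("\t") + "\n"  # tabs between columns
end

outfile_row([1, nil, "a\tb"])  # 1, \N, and a field whose tab is backslash-escaped
```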



# File 'lib/maatkit-ruby/mk-archiver.rb', line 137

def file
  @file
end

#for_update ⇒ `Object`

Adds the FOR UPDATE modifier to SELECT statements. For details, see dev.mysql.com/doc/en/innodb-locking-reads.html.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 142

def for_update
  @for_update
end

#header ⇒ `Object`

Print column header at top of --file. Writes column names as the first line in the file given by --file. If the file exists, does not write headers; this keeps the file loadable with LOAD DATA INFILE in case you append more output to it.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 147

def header
  @header
end

#help ⇒ `Object`

Show help and exit.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 151

def help
  @help
end

#high_priority_select ⇒ `Object`

Adds the HIGH_PRIORITY modifier to SELECT statements. See dev.mysql.com/doc/en/select.html for details.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 156

def high_priority_select
  @high_priority_select
end

#host ⇒ `Object`

short form: -h; type: string Connect to host.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 161

def host
  @host
end

#ignore ⇒ `Object`

Use IGNORE for INSERT statements. Causes INSERTs into --dest to be INSERT IGNORE.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 166

def ignore
  @ignore
end

#limit ⇒ `Object`

type: int; default: 1 Number of rows to fetch and archive per statement. Limits the number of rows returned by the SELECT statements that retrieve rows to archive. Default is one row. It may be more efficient to increase the limit, but be careful if you are archiving sparsely, skipping over many rows; this can potentially cause more contention with other queries, depending on the storage engine, transaction isolation level, and options such as --for-update.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 172

def limit
  @limit
end

#local ⇒ `Object`

Do not write OPTIMIZE or ANALYZE queries to binlog. Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE queries. See --analyze for details.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 177

def local
  @local
end

#low_priority_delete ⇒ `Object`

Adds the LOW_PRIORITY modifier to DELETE statements. See dev.mysql.com/doc/en/delete.html for details.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 182

def low_priority_delete
  @low_priority_delete
end

#low_priority_insert ⇒ `Object`

Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements. See dev.mysql.com/doc/en/insert.html for details.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 187

def low_priority_insert
  @low_priority_insert
end

#max_lag ⇒ `Object`

type: time; default: 1s Pause archiving if the slave given by --check-slave-lag lags. This option causes mk-archiver to look at the slave every time it’s about to fetch another row. If the slave’s lag is greater than the option’s value, or if the slave isn’t running (so its lag is NULL), mk-archiver sleeps for --check-interval seconds and then looks at the lag again. It repeats until the slave is caught up, then proceeds to fetch and archive the row. This option may eliminate the need for --sleep or --sleep-coef.
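The wait described above is a simple polling loop. The sketch below is hypothetical (`wait_for_slave` and its parameters are invented): `read_lag` stands in for however the tool reads the slave's lag, returning `nil` when the slave is not running, and the sleeper is injectable so the loop can be exercised without real waiting.

```ruby
# Hypothetical sketch of the --max-lag / --check-interval wait before a fetch.
def wait_for_slave(read_lag, max_lag:, check_interval:, sleeper: ->(secs) { sleep(secs) })
  pauses = 0
  loop do
    lag = read_lag.call
    break if lag && lag <= max_lag          # caught up: go fetch the next row
    sleeper.call(check_interval)            # lagging or not running: wait
    pauses += 1
  end
  pauses
end

lags = [5, nil, 0]                          # lagging, stopped, caught up
naps = []
wait_for_slave(-> { lags.shift }, max_lag: 1, check_interval: 1,
               sleeper: ->(secs) { naps << secs })
# returns 2, and naps is [1, 1]
```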



# File 'lib/maatkit-ruby/mk-archiver.rb', line 194

def max_lag
  @max_lag
end

#no_ascend ⇒ `Object`

Do not use ascending index optimization. The default ascending-index optimization causes mk-archiver to optimize repeated SELECT queries so they seek into the index where the previous query ended, then scan along it, rather than scanning from the beginning of the table every time. This is enabled by default because it is generally a good strategy for repeated accesses. Large, multiple-column indexes may cause the WHERE clause to be complex enough that this could actually be less efficient. Consider for example a four-column PRIMARY KEY on (a, b, c, d). The WHERE clause to start where the last query ended is as follows:

WHERE (a > ?)
   OR (a = ? AND b > ?)
   OR (a = ? AND b = ? AND c > ?)
   OR (a = ? AND b = ? AND c = ? AND d >= ?)

Populating the placeholders with values uses memory and CPU, adds network traffic and parsing overhead, and may make the query harder for MySQL to optimize. A four-column key isn’t a big deal, but a ten-column key in which every column allows NULL might be. Ascending the index might not be necessary if you know you are simply removing rows from the beginning of the table in chunks, but not leaving any holes, so starting at the beginning of the table is actually the most efficient thing to do. See also --ascend-first. See EXTENDING for a discussion of how this interacts with plugins.
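The growth in placeholders is easy to see by generating the predicate. This is a hypothetical sketch (`ascend_where` is an invented name) that reproduces the shape of the example above; it is not mk-archiver's own SQL generator, and NULL handling is ignored.

```ruby
# Hypothetical sketch that builds a predicate shaped like the example above:
# disjunct i fixes the first i columns with "=" and compares column i with ">",
# except the last column, which uses ">=" as in the example.
def ascend_where(columns)
  last = columns.size - 1
  columns.each_index.map { |i|
    prefix = columns[0...i].map { |col| "#{col} = ?" }
    op = i == last ? ">=" : ">"
    "(" + (prefix + ["#{columns[i]} #{op} ?"]).join(" AND ") + ")"
  }.join(" OR ")
end

where = ascend_where(%w[a b c d])
where.count("?")  # 10 placeholders: n(n + 1) / 2 for n = 4 columns
```

For a ten-column key the same construction needs 55 placeholders, which is the overhead the paragraph above warns about.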



# File 'lib/maatkit-ruby/mk-archiver.rb', line 207

def no_ascend
  @no_ascend
end

#no_delete ⇒ `Object`

Do not delete archived rows. Causes mk-archiver not to delete rows after processing them. This disallows --no-ascend, because enabling them both would cause an infinite loop. If there is a plugin on the source DSN, its before_delete method is called anyway, even though mk-archiver will not execute the delete. See EXTENDING for more on plugins.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 213

def no_delete
  @no_delete
end

#nobulk_delete_limit ⇒ `Object`

default: yes Add --limit to --bulk-delete statement. This is an advanced option and you should not disable it unless you know what you are doing and why! By default, --bulk-delete appends a --limit clause to the bulk delete SQL statement. In certain cases, this clause can be omitted by specifying --no-bulk-delete-limit. --limit must still be specified.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 52

def nobulk_delete_limit
  @nobulk_delete_limit
end

#optimize ⇒ `Object`

type: string Run OPTIMIZE TABLE afterwards on --source and/or --dest. Runs OPTIMIZE TABLE after finishing. See --analyze for the option syntax and dev.mysql.com/doc/en/optimize-table.html for details on OPTIMIZE TABLE.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 219

def optimize
  @optimize
end

#password ⇒ `Object`

short form: -p; type: string Password to use when connecting.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 224

def password
  @password
end

#path_to_mk_archiver ⇒ `Object`

Sets the executable path, otherwise the environment path will be used.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 411

def path_to_mk_archiver
  @path_to_mk_archiver
end

#pid ⇒ `Object`

type: string Create the given PID file when daemonized. The file contains the process ID of the daemonized instance. The PID file is removed when the daemonized instance exits. The program checks for the existence of the PID file when starting; if it exists and the process with the matching PID exists, the program exits.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 229

def pid
  @pid
end

#plugin ⇒ `Object`

type: string Perl module name to use as a generic plugin. Specify the Perl module name of a general-purpose plugin. It is currently used only for statistics (see --statistics) and must have new() and a statistics() method. The new(src => $src, dst => $dst, opts => $o) method gets the source and destination DSNs, and their database connections, just like the connection-specific plugins do. It also gets an OptionParser object ($o) for accessing command-line options (example: $o->get('purge');). The statistics(\%stats, $time) method gets a hashref of the statistics collected by the archiving job, and the time the whole job started.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 237

def plugin
  @plugin
end

#port ⇒ `Object`

short form: -P; type: int Port number to use for connection.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 242

def port
  @port
end

#primary_key_only ⇒ `Object`

Primary key columns only. A shortcut for specifying --columns with the primary key columns. This improves efficiency if you just want to purge rows: it avoids fetching the entire row when only the primary key columns are needed for DELETE statements. See also --purge.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 247

def primary_key_only
  @primary_key_only
end

#progress ⇒ `Object`

type: int Print progress information every X rows. Prints current time, elapsed time, and rows archived every X rows.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 253

def progress
  @progress
end

#purge ⇒ `Object`

Purge instead of archiving; allows omitting --file and --dest. Allows archiving without a --file or --dest argument, which is effectively a purge since the rows are just deleted. If you just want to purge rows, consider specifying the table’s primary key columns with --primary-key-only. This will prevent fetching all columns from the server for no reason.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 259

def purge
  @purge
end

#quick_delete ⇒ `Object`

Adds the QUICK modifier to DELETE statements. See dev.mysql.com/doc/en/delete.html for details. As stated in the documentation, in some cases it may be faster to use DELETE QUICK followed by OPTIMIZE TABLE. You can use --optimize for this.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 264

def quick_delete
  @quick_delete
end

#quiet ⇒ `Object`

short form: -q Do not print any output, such as for --statistics. Suppresses normal output, including the output of --statistics, but doesn’t suppress the output from --why-quit.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 270

def quiet
  @quiet
end

#replace ⇒ `Object`

Causes INSERTs into --dest to be written as REPLACE.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 274

def replace
  @replace
end

#retries ⇒ `Object`

type: int; default: 1 Number of retries per timeout or deadlock. Specifies the number of times mk-archiver should retry when there is an InnoDB lock wait timeout or deadlock. When retries are exhausted, mk-archiver will exit with an error. Consider carefully what you want to happen when you are archiving between a mixture of transactional and non-transactional storage engines. The INSERT to --dest and DELETE from --source are on separate connections, so they do not actually participate in the same transaction even if they’re on the same server. However, mk-archiver implements simple distributed transactions in code, so commits and rollbacks should happen as desired across the two connections. At this time I have not written any code to handle errors with transactional storage engines other than InnoDB. Request that feature if you need it.
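The retry rule can be sketched as a small wrapper. This is hypothetical: `with_retries` is an invented name, `LockError` stands in for whatever error the MySQL driver raises on a lock wait timeout or deadlock, and reading "1 retry" as "two attempts in total" is an interpretation of the text above.

```ruby
# Hypothetical sketch of the --retries rule.
class LockError < StandardError; end

def with_retries(retries)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue LockError
    retry if attempts <= retries   # assumption: 1 retry means 2 attempts in total
    raise                          # retries exhausted: give up with the error
  end
end

result = with_retries(1) do |attempt|
  raise LockError, "deadlock" if attempt == 1   # first try hits a deadlock
  :archived
end
# result is :archived after two attempts
```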



# File 'lib/maatkit-ruby/mk-archiver.rb', line 282

def retries
  @retries
end

#run_time ⇒ `Object`

type: time Time to run before exiting. Optional suffix s=seconds, m=minutes, h=hours, d=days; if no suffix, s is used.
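The suffix rule can be sketched as a small parser. It is hypothetical (`parse_time_option` is an invented name) and accepts only whole numbers, which may be stricter than the real tool; --check-interval and --max-lag are also listed as "type: time", but whether they accept exactly the same syntax is an assumption.

```ruby
# Hypothetical parser for the documented time format: an integer with an
# optional s/m/h/d suffix, seconds when no suffix is given.
TIME_SUFFIX_SECONDS = { "s" => 1, "m" => 60, "h" => 3600, "d" => 86_400 }.freeze

def parse_time_option(value)
  match = /\A(\d+)([smhd])?\z/.match(value.to_s.strip)
  raise ArgumentError, "invalid time value: #{value.inspect}" unless match
  match[1].to_i * TIME_SUFFIX_SECONDS.fetch(match[2] || "s")
end

parse_time_option("90")   # 90 seconds
parse_time_option("15m")  # 900 seconds
```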



# File 'lib/maatkit-ruby/mk-archiver.rb', line 288

def run_time
  @run_time
end

#safe_auto_increment ⇒ `Object`

default: yes Do not archive row with max AUTO_INCREMENT. Adds an extra WHERE clause to prevent mk-archiver from removing the newest row when ascending a single-column AUTO_INCREMENT key. This guards against re-using AUTO_INCREMENT values if the server restarts, and is enabled by default. The extra WHERE clause contains the maximum value of the auto-increment column as of the beginning of the archive or purge job. If new rows are inserted while mk-archiver is running, it will not see them.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 295

def safe_auto_increment
  @safe_auto_increment
end

#sentinel ⇒ `Object`

type: string; default: /tmp/mk-archiver-sentinel Exit if this file exists. The presence of the file specified by --sentinel will cause mk-archiver to stop archiving and exit. The default is /tmp/mk-archiver-sentinel. You might find this handy to stop cron jobs gracefully if necessary. See also --stop.
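The sentinel convention amounts to a file-existence check, with --stop creating the file. The helpers below are hypothetical (`stop_requested?` and `request_stop` are invented names) and only model the convention; they do not talk to a running mk-archiver.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch of the sentinel convention: jobs exit once the file
# exists (--sentinel), and --stop just creates it.
def stop_requested?(sentinel_path)
  File.exist?(sentinel_path)
end

def request_stop(sentinel_path)
  FileUtils.touch(sentinel_path)
end

Dir.mktmpdir do |dir|
  sentinel = File.join(dir, "mk-archiver-sentinel")
  puts stop_requested?(sentinel)  # prints false: archiving continues
  request_stop(sentinel)
  puts stop_requested?(sentinel)  # prints true: watching jobs should exit
end
```

Because a job exits whenever the file is present, the sentinel presumably has to be removed before the next scheduled run.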



# File 'lib/maatkit-ruby/mk-archiver.rb', line 301

def sentinel
  @sentinel
end

#set_vars ⇒ `Object`

type: string; default: wait_timeout=10000 Set these MySQL variables. Specify any variables you want to be set immediately after connecting to MySQL. These will be included in a SET command.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 307

def set_vars
  @set_vars
end

#share_lock ⇒ `Object`

Adds the LOCK IN SHARE MODE modifier to SELECT statements. See dev.mysql.com/doc/en/innodb-locking-reads.html.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 312

def share_lock
  @share_lock
end

#skip_foreign_key_checks ⇒ `Object`

Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 316

def skip_foreign_key_checks
  @skip_foreign_key_checks
end

#sleep ⇒ `Object`

type: int Sleep time between fetches. Specifies how long to sleep between SELECT statements. Default is not to sleep at all. Transactions are NOT committed, and the --file file is NOT flushed, before sleeping. See --txn-size to control that. If --commit-each is specified, committing and flushing happens before sleeping.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 323

def sleep
  @sleep
end

#sleep_coef ⇒ `Object`

type: float Calculate --sleep as a multiple of the last SELECT time. If this option is specified, mk-archiver will sleep for the query time of the last SELECT multiplied by the specified coefficient. This option is ignored if --sleep is specified. This is a slightly more sophisticated way to throttle the SELECTs: sleep a varying amount of time between each SELECT, depending on how long the SELECTs are taking.
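The precedence between --sleep and --sleep-coef fits in a few lines. This is a hypothetical sketch (`pause_seconds` and its keyword names are invented) of the rule as stated: a fixed --sleep wins, otherwise the coefficient scales the last SELECT's duration, otherwise there is no pause.

```ruby
# Hypothetical sketch of choosing the pause between SELECTs.
def pause_seconds(last_select_seconds:, fixed_sleep: nil, sleep_coef: nil)
  return fixed_sleep if fixed_sleep                         # --sleep overrides
  return last_select_seconds * sleep_coef if sleep_coef     # --sleep-coef
  0                                                         # default: no sleep
end

pause_seconds(last_select_seconds: 0.25, sleep_coef: 2.0)                  # 0.5
pause_seconds(last_select_seconds: 0.25, sleep_coef: 2.0, fixed_sleep: 3)  # 3
```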



# File 'lib/maatkit-ruby/mk-archiver.rb', line 330

def sleep_coef
  @sleep_coef
end

#socket ⇒ `Object`

short form: -S; type: string Socket file to use for connection.



# File 'lib/maatkit-ruby/mk-archiver.rb', line 335

def socket
  @socket
end

#source ⇒ `Object`

type: DSN DSN specifying the table to archive from (required). This argument is a DSN. See DSN OPTIONS for the syntax. Most options control how mk-archiver connects to MySQL, but there are some extended DSN options in this tool’s syntax. The D, t, and i options select a table to archive:

--source h=my_server,D=my_database,t=my_tbl

The a option specifies the database to set as the connection’s default with USE. If the b option is true, it disables binary logging with SQL_LOG_BIN. The m option specifies pluggable actions, which an external Perl module can provide. The only required part is the table; other parts may be read from various places in the environment (such as options files). The ‘i’ part deserves special mention. This tells mk-archiver which index it should scan to archive. This appears in a FORCE INDEX or USE INDEX hint in the SELECT statements used to fetch archivable rows. If you don’t specify anything, mk-archiver will auto-discover a good index, preferring a PRIMARY KEY if one exists. In my experience this usually works well, so most of the time you can probably just omit the ‘i’ part. The index is used to optimize repeated accesses to the table; mk-archiver remembers the last row it retrieves from each SELECT statement, and uses it to construct a WHERE clause, using the columns in the specified index, that should allow MySQL to start the next SELECT where the last one ended, rather than potentially scanning from the beginning of the table with each successive SELECT. If you are using external plugins, please see EXTENDING for a discussion of how they interact with ascending indexes. The ‘a’ and ‘b’ options allow you to control how statements flow through the binary log. If you specify the ‘b’ option, binary logging will be disabled on the specified connection. If you specify the ‘a’ option, the connection will USE the specified database, which you can use to prevent slaves from executing the binary log events with --replicate-ignore-db options. These two options can be used as different methods to achieve the same goal: archive data off the master, but leave it on the slave. For example, you can run a purge job on the master and prevent it from happening on the slave using your method of choice.
WARNING: Using a default options file (F) DSN option that defines a socket for --source causes mk-archiver to connect to --dest using that socket unless another socket for --dest is specified. This means that mk-archiver may incorrectly connect to --source when it is meant to connect to --dest. For example:

--source F=host1.cnf,D=db,t=tbl --dest h=host2

When mk-archiver connects to --dest (host2), it will connect via the socket that host1.cnf defines for --source (host1).
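The effect of the ‘a’ and ‘b’ parts can be sketched as the statements a connection would run right after connecting. The sketch is hypothetical: the helper name `binlog_setup_sql`, the statement order, and treating any ‘b’ value other than `0` as true are assumptions, not documented mk-archiver behavior.

```ruby
# Hypothetical sketch: statements implied by the 'a' and 'b' DSN parts.
def binlog_setup_sql(dsn)
  statements = []
  statements << "USE `#{dsn['a']}`" if dsn["a"]                     # a: default database
  statements << "SET SQL_LOG_BIN=0" if dsn["b"] && dsn["b"] != "0"  # b: no binlog
  statements
end

dsn = "h=master,D=db,t=tbl,a=scratch,b=1".split(",").to_h { |kv| kv.split("=", 2) }
binlog_setup_sql(dsn)  # ["USE `scratch`", "SET SQL_LOG_BIN=0"]
```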



# File 'lib/maatkit-ruby/mk-archiver.rb', line 348

def source
  @source
end

#statistics ⇒ `Object`

Collect and print timing statistics. Causes mk-archiver to collect timing statistics about what it does. These statistics are available to the plugin specified by --plugin. Unless you specify --quiet, mk-archiver prints the statistics when it exits. The statistics look like this:

Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
Source: D=db,t=table
SELECT 4
INSERT 4
DELETE 4
Action         Count       Time        Pct
commit            10     0.1079      88.27
select             5     0.0047       3.87
deleting           4     0.0028       2.29
inserting          4     0.0028       2.28
other              0     0.0040       3.29

The first two (or three) lines show times and the source and destination tables. The next three lines show how many rows were fetched, inserted, and deleted.

The remaining lines show counts and timing. The columns are the action, the total number of times that action was timed, the total time it took, and the percent of the program’s total runtime. The rows are sorted in order of descending total time, except the last row, which is the rest of the time not explicitly attributed to anything. Actions will vary depending on command-line options.

If --why-quit is given, its behavior is changed slightly: mk-archiver prints the reason for exiting even when it’s just because there are no more rows.

This option requires the standard Time::HiRes module, which is part of core Perl on reasonably new Perl releases.
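If you post-process --statistics output, the timing rows are easy to pick apart. The sketch below assumes the columns are separated by whitespace, as in the sample above; `parse_statistics` and `TIMING_ROW` are hypothetical names, not part of maatkit-ruby or mk-archiver.

```ruby
# Hypothetical parser for the timing rows of --statistics output (not part
# of maatkit-ruby). Assumes whitespace-separated columns: action, count,
# time in seconds, and percent of total runtime.
TIMING_ROW = /\A(\w+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\z/

def parse_statistics(text)
  text.each_line.map do |line|
    match = TIMING_ROW.match(line.strip)
    next nil unless match # skip the header, row counts, and "Started at" line
    { action: match[1], count: match[2].to_i,
      time: match[3].to_f, pct: match[4].to_f }
  end.compact
end

SAMPLE = <<~TEXT
  Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
  Source: D=db,t=table
  SELECT 4
  INSERT 4
  DELETE 4
  Action         Count       Time        Pct
  commit            10     0.1079      88.27
  select             5     0.0047       3.87
  deleting           4     0.0028       2.29
  inserting          4     0.0028       2.28
  other              0     0.0040       3.29
TEXT

rows = parse_statistics(SAMPLE)
slowest = rows.max_by { |row| row[:time] }
puts "#{slowest[:action]}: #{slowest[:pct]}% of runtime" # commit: 88.27% of runtime
```

In the sample the Pct column sums to 100.00 (88.27 + 3.87 + 2.29 + 2.28 + 3.29), because the other row absorbs the unattributed time.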




# File 'lib/maatkit-ruby/mk-archiver.rb', line 369

def statistics
  @statistics
end

#stop ⇒ `Object`

Stop running instances by creating the sentinel file. Causes mk-archiver to create the sentinel file specified by --sentinel and exit. This should have the effect of stopping all running instances which are watching the same sentinel file.
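The sentinel mechanism is simple file-based signaling: --stop creates the file, and running instances exit when they see it. The Ruby sketch below models both sides under that assumption; `create_sentinel` and `process_until_sentinel` are hypothetical names, and exactly when mk-archiver checks for the file may differ from this per-chunk loop.

```ruby
require "fileutils"
require "tmpdir"

# Roughly what --stop does: create the sentinel file and return.
def create_sentinel(path)
  FileUtils.touch(path)
end

# Illustrative worker loop: before each unit of work, stop if the
# sentinel file exists. Returns how many chunks were processed.
def process_until_sentinel(sentinel, chunks)
  processed = 0
  chunks.each do |chunk|
    break if File.exist?(sentinel)
    yield chunk
    processed += 1
  end
  processed
end

Dir.mktmpdir do |dir|
  sentinel = File.join(dir, "mk-archiver-sentinel")
  # Simulate another process running --stop after the second chunk.
  done = process_until_sentinel(sentinel, [1, 2, 3]) do |chunk|
    create_sentinel(sentinel) if chunk == 2
  end
  puts "processed #{done} chunks" # processed 2 chunks
end
```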




# File 'lib/maatkit-ruby/mk-archiver.rb', line 374

def stop
  @stop
end

#txn_size ⇒ `Object`

type: int; default: 1 Number of rows per transaction. Specifies the size, in number of rows, of each transaction. Zero disables transactions altogether. After mk-archiver processes this many rows, it commits both the --source and the --dest if given, and flushes the file given by --file.

This parameter is critical to performance. If you are archiving from a live server that is, for example, doing heavy OLTP work, you need to choose a good balance between transaction size and commit overhead. Larger transactions create the possibility of more lock contention and deadlocks, but smaller transactions cause more frequent commit overhead, which can be significant. To give an idea, on a small test set I worked with while writing mk-archiver, a value of 500 caused archiving to take about 2 seconds per 1000 rows on an otherwise quiet MySQL instance on my desktop machine, archiving to disk and to another table. Disabling transactions with a value of zero, which turns on autocommit, dropped performance to 38 seconds per thousand rows.

If you are not archiving from or to a transactional storage engine, you may want to disable transactions so mk-archiver doesn’t try to commit.
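The commit arithmetic behind --txn-size can be modeled in a few lines. This is a hypothetical sketch, not maatkit-ruby code, and it assumes a final partial transaction is also committed. It counts only explicit COMMITs: with a value of zero, autocommit commits every statement implicitly, which is why the text reports it as slower (38 s versus about 2 s per 1000 rows, roughly 19 times slower), not faster.

```ruby
# Hypothetical model, not maatkit-ruby code: how many explicit COMMITs a run
# of row_count rows issues when each transaction holds txn_size rows.
# txn_size = 0 means autocommit: no explicit COMMIT is sent, but the server
# then commits every statement implicitly.
def explicit_commits(row_count, txn_size)
  raise ArgumentError, "txn_size must be >= 0" if txn_size.negative?
  return 0 if txn_size.zero?
  (row_count + txn_size - 1) / txn_size # ceiling division
end

puts explicit_commits(1_000, 500) # 2
puts explicit_commits(1_001, 500) # 3 (last transaction holds 1 row)
puts explicit_commits(1_000, 1)   # 1000 (the default --txn-size)
puts explicit_commits(1_000, 0)   # 0 (autocommit)
```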




# File 'lib/maatkit-ruby/mk-archiver.rb', line 382

def txn_size
  @txn_size
end

#user ⇒ `Object`

short form: -u; type: string User for login if not current user.




# File 'lib/maatkit-ruby/mk-archiver.rb', line 387

def user
  @user
end

#version ⇒ `Object`

Show version and exit.




# File 'lib/maatkit-ruby/mk-archiver.rb', line 391

def version
  @version
end

#where ⇒ `Object`

type: string WHERE clause to limit which rows to archive (required). Specifies a WHERE clause to limit which rows are archived. Do not include the word WHERE. You may need to quote the argument to prevent your shell from interpreting it. For example:

--where 'ts < current_date - interval 90 day'

For safety, --where is required. If you do not require a WHERE clause, use --where 1=1.
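Because the --where clause usually contains shell metacharacters such as `<`, it must be quoted. A minimal Ruby sketch, assuming you build the command line as a single string for a POSIX shell, can use the standard `shellwords` library; `archiver_command` is a hypothetical helper, not part of maatkit-ruby.

```ruby
require "shellwords"

# Hypothetical helper, not part of maatkit-ruby: builds an mk-archiver
# command line whose --where clause is escaped for a POSIX shell, so
# characters such as "<", ">" and "*" reach mk-archiver unchanged.
def archiver_command(source_dsn, where, extra_args = [])
  if where.to_s.strip.empty?
    raise ArgumentError, "--where is required; use 1=1 to archive every row"
  end
  (["mk-archiver", "--source", source_dsn, "--where", where] + extra_args).shelljoin
end

cmd = archiver_command("h=host1,D=db,t=tbl",
                       "ts < current_date - interval 90 day", ["--purge"])
puts cmd
```

Passing the arguments as separate strings to `Kernel#system` (for example `system("mk-archiver", "--where", clause)`) skips the shell altogether, so no escaping is needed in that case.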




# File 'lib/maatkit-ruby/mk-archiver.rb', line 399

def where
  @where
end

#why_quit ⇒ `Object`

Print reason for exiting unless rows exhausted. Causes mk-archiver to print a message if it exits for any reason other than running out of rows to archive. This can be useful if you have a cron job with --run-time specified, for example, and you want to be sure mk-archiver is finishing before running out of time. If --statistics is given, the behavior is changed slightly: it prints the reason for exiting even when it’s just because there are no more rows. This output prints even if --quiet is given, so you can put mk-archiver in a cron job and get an email if there’s an abnormal exit.




# File 'lib/maatkit-ruby/mk-archiver.rb', line 406

def why_quit
  @why_quit
end

Instance Method Details

#start(options = nil) ⇒ `Object`

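Execute the command. Runs the command built from the configured options, followed by any extra `options` string, and redirects its standard error to a temporary file. On success it returns the last line written to standard error (nil if there was none); on failure it returns the result of `system` (`false` for a non-zero exit status, `nil` if the command could not be run).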

# File 'lib/maatkit-ruby/mk-archiver.rb', line 422

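def start(options = nil)
  # Redirect mk-archiver's stderr to a temporary file (needs require 'tempfile').
  tmp = Tempfile.new('tmp')
  command = option_string() + options.to_s + " 2> " + tmp.path
  success = system(command)
  if success
    selected_string = nil
    begin
      # Keep only the last line written to stderr, without its newline.
      while (line = tmp.readline)
        selected_string = line.chomp
      end
    rescue EOFError
      tmp.close! # close and delete the temp file
    end
    return selected_string
  else
    tmp.close!
    return success
  end
end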
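For comparison, the same capture-the-last-stderr-line idea can be written with Ruby's standard `open3` library, which runs the command without a shell and returns stderr as a string. This is a sketch, not the gem's implementation; `run_and_capture_last_stderr_line` is a hypothetical name. One behavioral difference: `system` returns `nil` when the command cannot be run, whereas `Open3.capture3` raises `Errno::ENOENT`.

```ruby
require "open3"

# Sketch only (not maatkit-ruby's code): runs argv without a shell and,
# like Archiver#start, returns the last line written to stderr on success
# (nil if nothing was written) and false on a non-zero exit status.
def run_and_capture_last_stderr_line(argv)
  _stdout, stderr, status = Open3.capture3(*argv)
  return false unless status.success?
  last_line = stderr.lines.last
  last_line && last_line.chomp
end

# Example (requires mk-archiver on PATH):
# run_and_capture_last_stderr_line(["mk-archiver", "--source", "h=host1,D=db,t=tbl",
#                                   "--where", "1=1", "--dry-run"])
```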
