Module: CodeZauker

Defined in:
lib/code_zauker.rb,
lib/code_zauker/cli.rb,
lib/code_zauker/version.rb,
lib/code_zauker/constants.rb

Overview

This module implements a simple reverse indexer based on Redis. The idea is inspired by swtch.com/~rsc/regexp/regexp4.html
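As a rough illustration of the trigram idea behind the indexer, the following sketch builds a tiny in-memory reverse index and looks up candidate files for a search term. The helper names (`trigrams_of`, `build_index`, `candidates`) are hypothetical and are not part of CodeZauker's API. A plain Hash stands in for Redis here, and the gram size of 3 mirrors GRAM_SIZE.

```ruby
require 'set'

# Hypothetical helper: split text into its distinct overlapping trigrams.
def trigrams_of(text, gram_size = 3)
  chars = text.chars
  return [] if chars.length < gram_size
  chars.each_cons(gram_size).map(&:join).uniq
end

# Hypothetical helper: map every trigram to the set of files containing it.
# A Hash stands in for the Redis store used by the real indexer.
def build_index(files)
  index = Hash.new { |hash, gram| hash[gram] = Set.new }
  files.each do |name, content|
    trigrams_of(content).each { |gram| index[gram] << name }
  end
  index
end

# Files that contain every trigram of the term. These are only candidates:
# a real search would still have to confirm the match inside each file.
def candidates(index, term)
  grams = trigrams_of(term)
  return Set.new if grams.empty?
  grams.map { |gram| index.fetch(gram, Set.new) }.reduce(:&)
end

index = build_index("a.rb" => "def hello", "b.rb" => "help me")
puts candidates(index, "hello").to_a.inspect
```

Intersecting the per-trigram sets narrows the search to files that could contain the term, which is why terms shorter than the gram size yield no candidates in this sketch.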

Defined Under Namespace

Classes: CliUtil, FileScanner, IndexManager, Util

Constant Summary

GRAM_SIZE =
3
SPACE_GUY =
" "*GRAM_SIZE
VERSION =
"0.1.0"
DB_VERSION =
1
MAX_PUSH_TRIGRAM_RETRIES =

Under Amazon AWS, many timeouts can occur, so a higher retry count is used here.

15
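A bounded retry loop is one way such a limit could be applied. The sketch below is an assumption about usage, not the library's actual code: `with_retries` is a hypothetical helper, and the rescued error class would in practice be whichever timeout error the Redis client raises.

```ruby
# Hypothetical helper: run the block, retrying up to max_retries times
# when one of the listed errors is raised, then re-raise the last error.
def with_retries(max_retries = 15, retry_on: [StandardError])
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *retry_on
    retry if attempts <= max_retries
    raise
  end
end

# Usage: the block fails twice with a simulated timeout, then succeeds.
result = with_retries(15) do |attempt|
  raise "simulated timeout" if attempt < 3
  "pushed on attempt #{attempt}"
end
puts result
```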
TRIGRAM_DEFAULT_PUSH_SIZE =

It is difficult to decide on the best trigram push size. A larger size keeps more of the processing in memory but can lead to longer transactions. 6000 is a heuristic value kept for historical reasons.

6000
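The trade-off can be pictured as pushing trigrams in fixed-size batches, one transaction per batch. The sketch below is only an illustration under that assumption: `push_in_batches` is a hypothetical helper, and the block stands in for whatever Redis transaction the indexer performs.

```ruby
# Hypothetical helper: hand the trigrams to the block in slices of
# push_size and return how many batches were sent.
def push_in_batches(trigrams, push_size = 6000)
  batches = 0
  trigrams.each_slice(push_size) do |batch|
    yield batch
    batches += 1
  end
  batches
end

# Usage: 13000 items with the default size produce three batches.
sizes = []
count = push_in_batches((1..13_000).to_a) { |batch| sizes << batch.size }
puts "#{count} batches: #{sizes.inspect}"
```

A larger push size means fewer, longer transactions; a smaller one means more round trips to the server.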
DEFAULT_EXCLUDED_EXTENSION =
[
# Documents                             
".xps",
".zip",".7z","rar",
# MS Office zip-like files...
".pptx",".docx",".xlsx",
# MS Visual Studio big bad files
".scc",".datasource",".pdb","vspscc",".settings",
#"Telerik.Web.UI.xml",
".Web.UI.xml",
# Auto-generated stuff: best avoided
".designer.cs",
# Avoid slurping text document too...
".doc",
".ppt",".xls",".rtf",".vsd", ".odf",
# Binary bad stuff
".dll",".exe",".out",".elf",".lib",".so",
# Redis db
".rdb",
# Ruby and java stuff-like
".gem",
".jar",".class",".ear",".war",
".mar",
".tar",
".gz",".Z",
".dropbox",
".svn-base",".cache", 
#IDE STUFF
".wlwLock",
# Music exclusion
".mp3",".mp4",".wav",
# Image exclusion
".png",".gif",".jpg",".bmp",
# Temp stuff and logs
".tmp","~",".log",".bar",
# Oracle exports...
".exp"
]
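Several entries above (".designer.cs", ".Web.UI.xml", "~") are file-name suffixes rather than plain extensions, so a suffix comparison is a natural way to apply the list. The sketch below assumes that approach. `excluded?` is a hypothetical helper, the short list is a sample of the constant, and the case-insensitive comparison is an assumption rather than documented behavior.

```ruby
# A small sample of the exclusion list, used for illustration only.
SAMPLE_EXCLUDED = [".zip", ".designer.cs", ".png", "~"].freeze

# Hypothetical helper: true when the path ends with any excluded suffix.
# Both sides are downcased, which is an assumption of this sketch.
def excluded?(path, excluded = SAMPLE_EXCLUDED)
  name = path.downcase
  excluded.any? { |suffix| name.end_with?(suffix.downcase) }
end

puts excluded?("Form1.Designer.cs")
puts excluded?("lib/code_zauker.rb")
```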