Module: CodeZauker
- Defined in:
- lib/code_zauker.rb,
lib/code_zauker/cli.rb,
lib/code_zauker/version.rb,
lib/code_zauker/constants.rb
Overview
This module implements a simple reverse indexer based on Redis. The idea is inspired by swtch.com/~rsc/regexp/regexp4.html
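As a rough illustration of the trigram idea behind a reverse index, the sketch below keeps everything in a plain in-memory Hash instead of Redis. The helper names `trigrams`, `build_index`, and `candidates` are hypothetical and are not part of the CodeZauker API; the gram size of 3 matches the GRAM_SIZE constant documented on this page.

```ruby
require "set"

# Hypothetical helper: split a string into its distinct overlapping grams.
def trigrams(text, size = 3)
  return [] if text.length < size
  (0..text.length - size).map { |i| text[i, size] }.uniq
end

# Hypothetical helper: map each trigram to the set of files containing it.
# `files` is a Hash of file name => file content.
def build_index(files)
  index = Hash.new { |hash, gram| hash[gram] = Set.new }
  files.each do |name, content|
    trigrams(content).each { |gram| index[gram] << name }
  end
  index
end

# Hypothetical helper: files that contain every trigram of the query.
# These are only candidates; a real search would still confirm the match.
def candidates(index, query)
  grams = trigrams(query)
  return Set.new if grams.empty?
  grams.map { |gram| index.fetch(gram, Set.new) }.reduce(:&)
end
```

A query is answered by intersecting the file sets of its trigrams, so only files that contain all of them need to be scanned afterwards.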
Defined Under Namespace
Classes: CliUtil, FileScanner, IndexManager, Util
Constant Summary
- GRAM_SIZE =
3
- SPACE_GUY =
" "*GRAM_SIZE
- VERSION =
"0.1.0"
- DB_VERSION =
1
- MAX_PUSH_TRIGRAM_RETRIES =
Under Amazon AWS, many timeouts can occur, so a higher retry count is used here.
15
- TRIGRAM_DEFAULT_PUSH_SIZE =
Stats: it is difficult to decide on the best trigram push size. A larger one keeps more of the processing in memory but can lead to longer transactions. 6000 is a heuristic value kept for historical reasons.
6000
- DEFAULT_EXCLUDED_EXTENSION =
[
  # Documents
  ".xps",
  ".zip",".7z","rar",
  # MS Office zip-like files...
  ".pptx",".docx",".xlsx",
  # MS Visual Studio big bad files
  ".scc",".datasource",".pdb","vspscc",".settings",
  #"Telerik.Web.UI.xml",
  ".Web.UI.xml",
  # Auto-generated stuff...is suggested to be avoided
  ".designer.cs",
  # Avoid slurping text document too...
  ".doc", ".ppt",".xls",".rtf",".vsd", ".odf",
  # Binary bad stuff
  ".dll",".exe",".out",".elf",".lib",".so",
  # Redis db
  ".rdb",
  # Ruby and java stuff-like
  ".gem",
  ".jar",".class",".ear",".war",
  ".mar",
  ".tar",
  ".gz",".Z",
  ".dropbox",
  ".svn-base",".cache",
  # IDE stuff
  ".wlwLock",
  # Music exclusion
  ".mp3",".mp4",".wav",
  # Image exclusion
  ".png",".gif",".jpg",".bmp",
  # Temp stuff and logs
  ".tmp","~",".log",".bar",
  # Oracle exports...
  ".exp"
]
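The constants above suggest how file filtering and batched pushes might fit together. The following is a minimal sketch under stated assumptions, not the library's implementation: the module `ZaukerSketch` and its methods are hypothetical, the exclusion list is a shortened copy, suffix matching is assumed to be case-insensitive, and the Redis push is replaced by a caller-supplied block.

```ruby
# Hypothetical module, for illustration only.
module ZaukerSketch
  EXCLUDED = [".zip", ".dll", ".designer.cs", "~", ".log"].freeze # shortened copy
  PUSH_SIZE = 6000   # mirrors TRIGRAM_DEFAULT_PUSH_SIZE
  MAX_RETRIES = 15   # mirrors MAX_PUSH_TRIGRAM_RETRIES

  # True when the path ends with one of the excluded suffixes.
  # Case-insensitive comparison is an assumption of this sketch.
  def self.excluded?(path, excluded = EXCLUDED)
    name = path.downcase
    excluded.any? { |suffix| name.end_with?(suffix.downcase) }
  end

  # Yields the trigrams in slices of push_size, retrying a failed slice
  # until max_retries attempts have been made. Returns the number pushed.
  # Rescuing StandardError is a simplification; real code would rescue
  # only the timeout errors raised by its Redis client.
  def self.push_in_batches(trigrams, push_size: PUSH_SIZE, max_retries: MAX_RETRIES)
    pushed = 0
    trigrams.each_slice(push_size) do |batch|
      attempts = 0
      begin
        attempts += 1
        yield batch
        pushed += batch.size
      rescue StandardError
        retry if attempts < max_retries
        raise
      end
    end
    pushed
  end
end
```

A larger `push_size` means fewer, longer transactions, which is the trade-off described for TRIGRAM_DEFAULT_PUSH_SIZE.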