This is the first part of a planned series giving a brief overview of how to search for different things on the computer. This post focuses on how to search for files on disk and for regular expressions (regexp) in files.
Finding files on your computer
Many times you need to find files on disk. In the terminal one can use locate or a similar tool. Another frequent use case is listing and filtering files in e.g. MATLAB or Python code.
A simple way to filter out some files is with shell wildcards/globbing, e.g.
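ls *.pdf       # PDF files in the current directory
ls **/*.pdf    # PDF files in all subdirectories as well (bash needs "shopt -s globstar" for this)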
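The same wildcard filtering is available from Python through the standard pathlib module. The following is only a minimal sketch; the function name list_pdfs and its arguments are placeholders chosen for illustration.

```python
from pathlib import Path


def list_pdfs(directory, recursive=False):
    """Return sorted PDF paths in `directory`, like `ls *.pdf` or `ls **/*.pdf`."""
    root = Path(directory)
    # rglob("*.pdf") behaves like the recursive pattern "**/*.pdf"
    matches = root.rglob("*.pdf") if recursive else root.glob("*.pdf")
    return sorted(p for p in matches if p.is_file())
```

Calling list_pdfs('.') lists only the current directory, while list_pdfs('.', recursive=True) also descends into subdirectories.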
The go-to tool is find, which searches files recursively in all subdirectories. To search for a certain file name, you just have to give it a directory and a shell pattern (a glob like the wildcards above, not a regexp; quote it so the shell does not expand it first).
find <dir> -name <pattern>
find . -name "*.txt" # finds all txt files below the current working directory "."
There are many other filters you can search with as well. Here are some:
find <dir> -iname <pattern> # like -name, but ignore case
find <dir> -atime 1 # access time: last time file was opened, exactly one day ago (age is rounded down to whole days)
find <dir> -mtime -7 # modification time: last time file was modified, less than a week ago
find <dir> -ctime +31 # change time: last time a file's meta-data changed, more than 31 days ago
# same as atime, mtime, ctime but with minutes as unit
find <dir> -amin 1
find <dir> -mmin -7
find <dir> -cmin +31
find <dir> -anewer <file> # accessed more recently than <file> was modified
find <dir> -newer <file> # modified more recently than <file> (BSD find also accepts -mnewer)
find <dir> -cnewer <file> # meta-data changed more recently than <file> was modified
find <dir> -newermt "<date>" # modified after the given date string
find <dir> -newermt "yyyy-mm-dd" # modified after "yyyy-mm-dd"
find <dir> -type f # file
find <dir> -type d # directory
find <dir> -type l # symbolic link
find <dir> -size 50k # file size exactly 50 KB
find <dir> -size -3M # file size less than 3 MB
find <dir> -size +4G # file size more than 4 GB
find <dir> -user <user> # file belongs to user
find <dir> -group <groupname> # file belongs to group
find <dir> -perm 755 # file permissions are exactly this octal number
find <dir> -maxdepth 3 # maximum subdirectory depth
find <dir> -mindepth 2 # minimum subdirectory depth
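When the same kind of filtering is needed inside a script, the time and size tests can be approximated with os.walk and os.stat. This is a rough sketch, not a reimplementation of find: the function name find_files and its parameters are made up for this example, and it compares exact seconds and bytes instead of rounding to whole days or blocks the way find does.

```python
import os
import time


def find_files(directory, max_age_days=None, min_size_bytes=None):
    """Roughly `find <dir> -type f -mtime -N -size +S`, without find's rounding."""
    now = time.time()
    hits = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                continue  # skips e.g. broken symbolic links
            info = os.stat(path)
            # keep only files modified less than max_age_days ago
            if max_age_days is not None and now - info.st_mtime >= max_age_days * 86400:
                continue
            # keep only files strictly larger than min_size_bytes
            if min_size_bytes is not None and info.st_size <= min_size_bytes:
                continue
            hits.append(path)
    return sorted(hits)
```

For example, find_files('.', max_age_days=7) corresponds loosely to find . -type f -mtime -7.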
locate is another tool to find files, which is a bit faster because it looks names up in a prebuilt database of the files on the file system (refreshed by updatedb, so very recent files may be missing). If you only look for files by name you can use
locate <pattern>
In Python, I most often use os.walk in a list comprehension to first list the desired files and later loop over them. Here is a simple example
import os
datadir = 'path/to/data/files'
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir) for f in fs if f.endswith('.nii')] # find all nifti files recursively
and one with regular expression matching
import os
import re
datadir = 'path/to/data/files'
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir) for f in fs if re.match('<regex>', f)] # find all files whose names match <regex>
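One detail worth keeping in mind with the regular expression variant: re.match only matches at the beginning of the file name, while re.search matches anywhere and re.fullmatch requires the whole name to match. A small sketch with made-up file names:

```python
import re

# hypothetical file names, only for illustration
names = ["sub01_T1.nii", "T1_sub02.nii", "notes.txt"]

# re.match anchors at the beginning of the string
starts_with_sub = [n for n in names if re.match(r"sub\d+", n)]

# re.search finds the pattern anywhere in the string
contains_sub = [n for n in names if re.search(r"sub\d+", n)]

# re.fullmatch requires the whole name to match
whole_name = [n for n in names if re.fullmatch(r"sub\d+_T1\.nii", n)]
```

Which of the three fits depends on whether the pattern describes a prefix, a fragment, or the complete file name.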
regexpdir in matlab
For the same use case in MATLAB, I came across the package regexpdir, which does the same as the above Python code.
files = regexpdir(datadir, '<regexp>');
Finding regular expressions in files
Given the task of finding a string in some files, the normal go-to tool is grep and its derivatives.
To find a specific pattern in a text file, grep is the tool of choice. Just enter the desired pattern and the file(s).
grep <pattern> <file1> <file2> ...
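grep already treats the pattern as a (basic) regular expression. To mark the regexp explicitly, e.g. when it starts with "-" or when you want to give several patterns, it's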
grep -e <regexp> <files>
There are many options (see “$ man grep”); here are some I use more often:
grep -o <pattern> <file> # only show matching parts instead of lines
grep -n <pattern> <files> # show line numbers
grep -i <pattern> <files> # ignore case
grep -I <pattern> <files> # ignore binary files
grep -r <pattern> <directory> # recursively search in directory
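To make clear what these options do, here is a small Python sketch of a recursive search that reports line numbers, roughly in the spirit of grep -rnI with an optional -i. The function name grep and its parameters are chosen for this example only; real grep is far more capable and much faster.

```python
import os
import re


def grep(pattern, directory, ignore_case=False):
    """Yield (path, line_number, line) for every matching line below `directory`."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                # crude stand-in for -I: give up on files that are not valid UTF-8 text
                continue
```

A call such as list(grep('todo', '.', ignore_case=True)) collects all hits below the current directory.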
There are also two other derivatives of the grep tool that I always mix up:
fgrep (the same as grep -F) is faster but can only search for fixed strings, and egrep is like
grep -E <regexp> <files>
which handles extended regular expressions (see “$ man re_format”, or “$ man 7 regex” on Linux). The differences between the base grep, egrep, and fgrep are described here: Difference Between Egrep and Fgrep
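The difference between a fixed-string search and a regexp search can be illustrated in Python, whose re syntax is close to (but not identical with) extended regular expressions. The sample strings are invented for this sketch.

```python
import re

line = "price: 3.50 (approx.)"

# fixed-string search, like fgrep / grep -F: special characters have no meaning
fixed = "3.50" in line

# the same needle used as a regexp: "." matches any character, so "3x50" matches too
loose = re.search("3.50", "3x50") is not None

# re.escape turns a literal string into a regexp that matches only itself
strict = re.search(re.escape("3.50"), "3x50") is not None

# grouping and alternation, as in extended regexps (grep -E)
alt = re.search(r"(approx|circa)\.", line) is not None
```

Here fixed, loose and alt are True while strict is False, which is why a fixed-string search is the safer choice when the needle contains characters such as "." or "*".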
ag (the silver searcher)
Another tool very similar to grep is the silver searcher. To search a whole path one needs to type:
ag <pattern> <dir>
Available options are almost the same as for grep, but ag is much faster, especially for large project directories, because ag skips files matched by ignore files such as “.gitignore” and can run several threads (GitHub - ggreer/the_silver_searcher: A code-searching tool similar to ack, bu…).
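Why skipping ignored files saves so much work can be sketched in a few lines of Python. This is not how ag is implemented; walk_unignored is a hypothetical helper, and the fnmatch patterns are a strong simplification of the real “.gitignore” syntax.

```python
import fnmatch
import os


def walk_unignored(directory, ignore_patterns):
    """Yield file paths below `directory`, skipping names that match any pattern."""

    def ignored(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns)

    for root, dirs, names in os.walk(directory):
        # prune ignored directories in place so os.walk never descends into them
        dirs[:] = [d for d in dirs if not ignored(d)]
        for name in names:
            if not ignored(name):
                yield os.path.join(root, name)
```

With patterns like ['build', '*.o'], a whole build directory is never even opened, so the search only touches files that are likely to be source code.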
To be continued …
Next, I will try to cover how to search for things in git repositories.