This is the first part of a series (that I plan to write) giving a brief overview of how to search for different things on the computer. This post focuses on how to find files on disk and how to find regular expressions (regexp) in files.
Finding files on your computer
Many times you need to find files on the computer's disk. In the terminal one can use find or locate or a similar tool. Another frequent use case is to list and filter files in e.g. matlab or python code.
ls
A simple way to filter out some files is with shell wildcards/globbing, e.g.
ls *.pdf
ls **/*.pdf # recursive; works in zsh, in bash only after "shopt -s globstar"
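The same glob patterns also work from python, which is handy when the file list is needed inside a script. Here is a minimal sketch using pathlib from the standard library; the function name find_by_suffix is made up for this example.

```python
from pathlib import Path


def find_by_suffix(root, suffix, recursive=True):
    """Return sorted paths of files under `root` whose names end with `suffix`.

    Roughly mirrors `ls *.pdf` (recursive=False) and `ls **/*.pdf`
    (recursive=True). The function name is hypothetical.
    """
    root = Path(root)
    pattern = "*" + suffix
    # rglob descends into all subdirectories, glob only looks at `root` itself
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

Calling find_by_suffix('.', '.pdf') would then list all pdf files below the current working directory.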
find
The go-to tool is find, which searches files recursively in all subdirectories. To search for a certain file name, you just have to give a directory and a name pattern (a shell glob, not a regexp; quote it so the shell does not expand it).
find <dir> -name <pattern>
for example:
find . -name "*.txt" # searches all txt files in the current working directory "."
There are many other filters you can search with as well. Here are some:
find <dir> -iname <pattern> # ignore case
find <dir> -atime 1 # access time: last time file was opened, exactly one day ago
find <dir> -mtime -7 # modification time: last time file was modified, less than a week ago
find <dir> -ctime +31 # change time: last time a file's meta-data changed, more than 31 days ago
# same as atime, mtime, ctime but with minutes as unit
find <dir> -amin 1
find <dir> -mmin -7
find <dir> -cmin +31
find <dir> -anewer <file> # accessed more recently than <file> was modified
find <dir> -newer <file> # modified more recently than <file> (BSD find also accepts -mnewer)
find <dir> -cnewer <file> # meta-data changed more recently than <file> was modified
find <dir> -newermt "<date>" # modified after the given date string
find <dir> -newermt "yyyy-mm-dd" # modified after "yyyy-mm-dd"
find <dir> -type f # file
find <dir> -type d # directory
find <dir> -type l # symbolic link
find <dir> -size 50k # file size exactly 50 KB (sizes are rounded up to whole units)
find <dir> -size -3M # file size less than 3 MB
find <dir> -size +4G # file size more than 4 GB
find <dir> -user <user> # file belongs to user
find <dir> -group <groupname> # file belongs to group
find <dir> -perm 755 # file permission as octal number
find <dir> -maxdepth 3 # maximal subdirectory depth
find <dir> -mindepth 2 # minimum subdirectory depth
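When such filters are needed inside a script, they can be approximated with os.walk and os.stat. The following is only a sketch under my own assumptions: the function find_files and its parameters are invented for illustration, and the rounding rules of find's -size and -mtime are not reproduced.

```python
import os
import time


def find_files(root, min_size=0, modified_within_days=None):
    """Yield files under `root` that pass the given filters.

    Hypothetical helper, roughly in the spirit of
    `find <root> -type f -size ... -mtime -<days>`:
    keeps files of at least `min_size` bytes that were modified
    less than `modified_within_days` days ago.
    """
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # e.g. broken symbolic links
                continue
            info = os.stat(path)
            if info.st_size < min_size:
                continue
            if modified_within_days is not None:
                age_days = (now - info.st_mtime) / 86400  # seconds per day
                if age_days >= modified_within_days:
                    continue
            yield path
```

For example, list(find_files('.', min_size=1024, modified_within_days=7)) would collect files of at least 1 KB that were modified during the last week.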
locate
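locate is another tool to find files by name. It is usually much faster than find, because it looks the names up in a prebuilt database of all files on the file system instead of walking the directories (the database is only refreshed periodically, e.g. by updatedb, so very recent files may be missing). If you only look for files by name you can use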
locate <file>
python’s os.walk
Most often I use os.walk in a list comprehension to first collect the desired files and later loop over them. Here is a simple example
import os
datadir = 'path/to/data/files'
# os.walk yields (root, dirnames, filenames) for every directory below datadir
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir)
         for f in fs
         if f.endswith('.nii')] # find all nifti files recursively
and one with regular expression matching
import os
import re
datadir = 'path/to/data/files'
# os.walk yields (root, dirnames, filenames) for every directory below datadir
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir)
         for f in fs
         if re.match('<regex>', f)] # find all files whose names match <regex>
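One detail worth knowing for the regular expression variant: re.match only anchors the pattern at the start of the file name, re.search matches anywhere in the name, and re.fullmatch requires the whole name to match. A short sketch with made-up file names:

```python
import re

# hypothetical file names, just for illustration
names = ["sub01_T1.nii", "sub02_T1.nii.gz", "notes_sub01.txt"]

pattern = r"sub\d+"

# re.match: the pattern has to match at the beginning of the name
starts_with = [n for n in names if re.match(pattern, n)]

# re.search: the pattern may match anywhere in the name
contains = [n for n in names if re.search(pattern, n)]

# re.fullmatch: the pattern has to cover the complete name
whole_name = [n for n in names if re.fullmatch(r"sub\d+_T1\.nii", n)]
```

So if the list comprehension above seems to miss files, it may be because the pattern does not start at the beginning of the name.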
regexpdir in matlab
For the same use case in matlab, I came across the package regexpdir, which does the same as the above python code.
files = regexpdir(datadir, '<regexp>');
Finding regular expressions in files
Given the task of finding a string in some files, the normal go-to tool is grep and its derivatives.
grep
To find a specific pattern in a text file, grep is the tool of choice. Just enter the desired pattern and the file(s).
grep <pattern> <file1> <file2> ...
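grep patterns are already (basic) regular expressions. The -e option marks the pattern explicitly, which helps when it starts with a dash or when you want to give several patterns: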
grep -e <regexp> <files>
There are many more options (see “$ man grep”). Here are some I use more often:
grep -o <pattern> <file> # only show matching parts instead of lines
grep -n <pattern> <files> # show line numbers
grep -i <pattern> <files> # ignore case
grep -I <pattern> <files> # ignore binary files
grep -r <pattern> <directory> # recursively search in directory
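To make clear what these options do, here is a small python sketch that imitates -n, -i and -o on a list of lines. It is not how grep is implemented; the function grep_lines is invented for this post, and python's re syntax differs in details from grep's basic regular expressions.

```python
import re


def grep_lines(pattern, lines, ignore_case=False, only_matching=False):
    """Return (line_number, text) pairs for the lines that match `pattern`.

    Hypothetical helper. Line numbers start at 1, like `grep -n`.
    ignore_case=True behaves like `grep -i`; with only_matching=True every
    non-empty match is returned on its own, like `grep -o`.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if only_matching:
            hits.extend((number, m.group(0))
                        for m in regex.finditer(line) if m.group(0))
        elif regex.search(line):
            hits.append((number, line))
    return hits
```

Since an open file iterates over its lines, something like grep_lines('<pattern>', open('<file>')) would search a file.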
There are also two other derivatives of the grep tool that I always mix up: egrep and fgrep. fgrep (the same as grep -F) is apparently faster but can only search for fixed strings, and egrep is like
grep -E <regexp> <files>
which handles extended regular expressions (see “$ man re_format”, or “$ man 7 regex” on Linux). The differences between the base grep, egrep, and fgrep are described here: Difference Between Egrep and Fgrep
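The difference between a fixed string and a regexp is easy to show in python. This is only an illustration of the two kinds of matching, with a made-up example line, not of how fgrep or egrep work internally.

```python
import re

line = "version 1.5 and version 105"  # made-up example text

# Regexp search: "." matches any character, so "1.5" and "105" are both hits
regex_hits = re.findall(r"1.5", line)

# Fixed-string search: re.escape makes "." match only a literal dot
fixed_hits = re.findall(re.escape("1.5"), line)

# For a plain yes/no answer no regexp engine is needed at all
contains_fixed = "1.5" in line
```

Here regex_hits contains two entries, while fixed_hits only contains the literal "1.5".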
ag (the silver searcher)
Another tool very similar to grep is the silver searcher. To search a whole path one needs to type:
ag <pattern> <dir>
Available options are almost the same as for grep, but ag is much faster, especially for large project directories, because ag is smart and skips files listed in e.g. “.gitignore” and can run several threads (GitHub - ggreer/the_silver_searcher: A code-searching tool similar to ack, bu…).
To be continued …
Next, I will try to cover how to search for things in git repositories.