How to effectively search stuff on your computer, Part I

Posted on August 3, 2016
Tags: howto, bash

This is the first part of a series (that I plan to write) giving a brief overview of how to search for different things on the computer. This post focuses on how to find files on disk and how to find regular expressions (regexps) in files.

Finding files on your computer

Often you need to find files on the computer's disk. In the terminal you can use find, locate, or a similar tool. Another frequent use case is to list and filter files from within e.g. MATLAB or Python code.

ls

A simple way to filter out some files is with shell wildcards/globbing, e.g.

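ls *.pdf        # PDF files in the current directory
ls **/*.pdf     # PDF files in subdirectories as well (bash needs "shopt -s globstar" first)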
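
The same two patterns can be used from Python with the standard glob module. This is only a minimal sketch; the function name list_pdfs is made up for illustration.

```python
import glob
import os

def list_pdfs(root="."):
    """Return (top_level, recursive) lists of PDF paths below root."""
    # Like `ls *.pdf`: only files directly inside root.
    top_level = glob.glob(os.path.join(root, "*.pdf"))
    # Like `ls **/*.pdf` with globstar: "**" matches zero or more
    # directories, so files directly inside root are included too.
    recursive = glob.glob(os.path.join(root, "**", "*.pdf"), recursive=True)
    return sorted(top_level), sorted(recursive)
```

Note that, like the shell, glob skips hidden files (names starting with a dot) unless the pattern asks for them explicitly.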

find

The go-to tool is find, which searches for files recursively in all subdirectories. To search for a certain file name, you just have to give a directory and a shell pattern (a glob such as "*.txt", not a regexp).

find <dir> -name <pattern>

for example:

find . -name "*.txt"            # finds all .txt files below the current working directory "."

There are many other filters you can use as well. Here are some:

find <dir> -iname <pattern>     # ignore case

find <dir> -atime 1             # access time: last time file was opened, one day ago (between 24 and 48 hours)
find <dir> -mtime -7            # modification time: last time file was modified, less than a week ago
find <dir> -ctime +31           # change time: last time a file's meta-data changed, more than 31 days ago

# same as atime, mtime, ctime but with minutes as unit
find <dir> -amin 1
find <dir> -mmin -7
find <dir> -cmin +31

find <dir> -anewer <file>        # accessed more recently than <file> was modified
find <dir> -newer <file>         # modified more recently than <file> (BSD find also accepts -mnewer)
find <dir> -cnewer <file>        # meta-data changed more recently than <file> was modified
find <dir> -newermt "<date>"     # modified after <date>, e.g. "$(date +%F)" for today
find <dir> -newermt "yyyy-mm-dd" # modified after the start of yyyy-mm-dd

find <dir> -type f              # file
find <dir> -type d              # directory
find <dir> -type l              # symbolic link

find <dir> -size 50k            # file size exactly 50 KB (sizes are rounded up to whole units)
find <dir> -size -3M            # file size less than 3 MB
find <dir> -size +4G            # file size more than 4 GB

find <dir> -user <user>         # file belongs to user
find <dir> -group <groupname>   # file belongs to group
find <dir> -perm 755            # file permissions as octal number (exact match)

find <dir> -maxdepth 3          # maximum subdirectory depth
find <dir> -mindepth 2          # minimum subdirectory depth
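
To make clear what a few of these filters actually test, here is a rough Python sketch that combines a name pattern, a modification-time limit and a size limit. The function find_files and its parameter names are hypothetical, and it only imitates a small part of what find does.

```python
import fnmatch
import os
import stat
import time

def find_files(root, name="*", newer_than_days=None, larger_than=None):
    """Yield regular files under root that pass every filter that is set.

    name            -- shell glob matched against the file name (like -name)
    newer_than_days -- keep files modified less than this many days ago
    larger_than     -- keep files bigger than this many bytes
    """
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not fnmatch.fnmatchcase(filename, name):
                continue
            path = os.path.join(dirpath, filename)
            info = os.lstat(path)               # lstat: do not follow symlinks
            if not stat.S_ISREG(info.st_mode):  # like -type f
                continue
            age_days = (now - info.st_mtime) / 86400
            if newer_than_days is not None and age_days >= newer_than_days:
                continue
            if larger_than is not None and info.st_size <= larger_than:
                continue
            yield path
```

For example, list(find_files(".", "*.txt", newer_than_days=7)) roughly corresponds to find . -type f -name "*.txt" -mtime -7, except that find counts age in whole days.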

locate

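locate is another tool to find files. It is faster than find because it looks names up in a prebuilt database of all files on the file system instead of walking the disk. That database is only refreshed periodically (typically by updatedb), so very recent files may be missing. If you only look for files by name you can use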

locate <file>
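
The idea of searching a prebuilt list instead of the disk can be sketched in a few lines of Python. This is a toy illustration and not how locate stores its database; build_index and lookup are names I made up.

```python
import os

def build_index(root):
    """Walk root once and return a list of all file paths (the "database")."""
    return [os.path.join(dirpath, name)
            for dirpath, _dirnames, filenames in os.walk(root)
            for name in filenames]

def lookup(index, text):
    """Return the indexed paths that contain text; the disk is not touched."""
    return [path for path in index if text in path]
```

Lookups are fast because only the list is scanned, but a file created after build_index ran will not show up until the index is rebuilt.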

python’s os.walk

Most often I use os.walk in a list comprehension to first collect the desired files and then loop over them later. Here is a simple example:

import os

datadir = 'path/to/data/files'
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir)
         for f in fs
         if f.endswith('.nii')]  # find all nifti files recursively

and one with regular expression matching

import os
import re

datadir = 'path/to/data/files'
files = [os.path.join(r, f) for r, d, fs in os.walk(datadir)
         for f in fs
         if re.match('<regex>', f)]  # find all files whose names match <regex> (from the start of the name)
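
One thing that is easy to trip over here: re.match only matches at the beginning of the file name, while re.search matches anywhere in it. A small sketch that makes the choice explicit (the function walk_matching and its anywhere flag are hypothetical):

```python
import os
import re

def walk_matching(datadir, regex, anywhere=False):
    """Return sorted paths under datadir whose file name matches regex.

    By default the pattern must match at the start of the name (re.match);
    with anywhere=True it may match at any position (re.search).
    """
    pattern = re.compile(regex)
    test = pattern.search if anywhere else pattern.match
    return sorted(os.path.join(root, name)
                  for root, _dirs, names in os.walk(datadir)
                  for name in names
                  if test(name))
```

With the pattern r"sub\d+", a file named sub01_T1.nii is found in both modes, whereas T1_sub02.nii is only found with anywhere=True.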

regexpdir in matlab

For the same use case in MATLAB, I came across the package regexpdir, which does the same as the Python code above.

files = regexpdir(datadir, '<regexp>');

Finding regular expressions in files

Given the task of finding a string in some files, the usual go-to tool is grep and its derivatives.

grep

To find a specific pattern in a text file, grep is the tool of choice. Just enter the desired pattern and the file(s).

grep <pattern> <file1> <file2> ...

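grep already treats the pattern as a (basic) regular expression. The -e option marks the next argument explicitly as the pattern, which helps when the pattern starts with a dash or when you want to give several patterns: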

grep -e <regexp> <files>

There are many more options (see “$ man grep”). Here are some that I use often:

grep -o <pattern> <file>        # only show matching parts instead of lines
grep -n <pattern> <files>       # show line numbers
grep -i <pattern> <files>       # ignore case
grep -I <pattern> <files>       # ignore binary files
grep -r <pattern> <directory>   # recursively search in directory
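
If you need the same kind of filtering inside a Python script, the -n, -i and -o options can be imitated roughly as follows. This is a sketch with a made-up function name (grep_lines), and Python's re syntax is not identical to grep's regular expressions.

```python
import re

def grep_lines(pattern, lines, ignore_case=False, only_matching=False):
    """Return (line_number, text) pairs for lines that contain pattern.

    Line numbers start at 1, as with grep -n. With only_matching=True
    there is one pair per non-empty match instead of one per line,
    similar to grep -o.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if only_matching:
            hits.extend((number, m.group(0))
                        for m in regex.finditer(line) if m.group(0))
        elif regex.search(line):
            hits.append((number, line))
    return hits
```

Since an open file is an iterable of lines, grep_lines("error", open("some.log"), ignore_case=True) works directly on a file (some.log is just a placeholder name).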

There are also two other variants of grep that I always mix up: egrep and fgrep. fgrep (the same as grep -F) is faster but can only search for fixed strings, and egrep is the same as

grep -E <regexp> <files>

which handles extended regular expressions, where operators such as +, ?, | and () work without a backslash (see “$ man re_format”). The differences between the base grep, egrep, and fgrep are described in “Difference Between Egrep and Fgrep” on the Difference Between site.
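
The difference between a fixed string and a regular expression is easy to demonstrate in Python. The two helper names below are made up, and the sketch only illustrates the idea behind fgrep versus grep, not their implementation.

```python
import re

def count_fixed(text, needle):
    """Count occurrences of needle taken literally (the idea behind fgrep)."""
    return text.count(needle)

def count_regex(text, pattern):
    """Count matches of pattern interpreted as a regular expression."""
    return len(re.findall(pattern, text))
```

For the text "a.b axb a+b", count_fixed with "a.b" finds one occurrence, while count_regex with the same pattern finds three, because the dot matches any character. re.escape("a.b") turns the pattern back into a literal one.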

ag (the silver searcher)

Another tool very similar to grep is the silver searcher (ag). To search a whole directory tree, type:

ag <pattern> <dir>

The available options are almost the same as for grep, but ag is much faster, especially in large project directories, because it skips files matched by ignore files such as “.gitignore” and can run several threads (GitHub - ggreer/the_silver_searcher: A code-searching tool similar to ack, bu…).
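
Skipping ignored directories is a big part of that speedup, and the same trick works with os.walk. The following is only a sketch of the idea, not how ag is implemented: walk_unignored is a hypothetical name, and real “.gitignore” rules (negation, anchored paths) are richer than the plain glob patterns used here.

```python
import fnmatch
import os

def walk_unignored(root, ignore_patterns):
    """Yield file paths under root, skipping names matching an ignore pattern."""
    def ignored(name):
        return any(fnmatch.fnmatchcase(name, p) for p in ignore_patterns)

    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] prunes the walk: os.walk never descends
        # into directories that were removed from this list.
        dirnames[:] = [d for d in dirnames if not ignored(d)]
        for name in filenames:
            if not ignored(name):
                yield os.path.join(dirpath, name)
```

Called as walk_unignored(".", [".git", "build", "*.log"]), it never even opens the .git or build directories, which is where most of the saved time comes from.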

To be continued …

Next, I will try to cover how to search for things in git repositories.