How to rename PDF files of scientific articles automatically according to bibtex information

Posted on February 11, 2023
Tags: howto, python

I mostly read scientific papers or technical articles, which I often download as PDFs from arXiv or journal websites. Most PDF files are not descriptively named, so I always end up with folders full of cryptically named files.

To rename the PDF files automatically according to their BibTeX information (authors, year, title, journal, …), I previously tried the Emacs org-ref functionality and also wrote my own “bibtex-helper (hbib)” tool in Haskell. However, I always wanted a simpler command-line tool for this job. Eventually I found the Python program pdf-renamer, which uses two other packages by the same author, pdf2doi and pdf2bib, to query reference information online and create BibTeX entries.

The options for the renaming format were not as freely customizable as I would have liked, but the underlying libraries are easy to use in Python scripts, so one can quickly write the desired functionality oneself.

I just installed the package via pip

pip install pdf2bib

… made the following script executable ($ chmod +x …) and copied it to ~/.local/bin/pdfrename to have it in the PATH.

#!/usr/bin/env ipython

import argparse
from pathlib import Path
import pdf2bib


parser = argparse.ArgumentParser(
    prog="pdfrename",
    description='Auto rename PDF files into "{FirstAuthorLastname}{Year}_{Title}.pdf"',
    epilog="Based on Michele Cotrufo's pdf2doi, pdf2bib, and pdf-renamer",
)
parser.add_argument("filename")


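def main():
    args = parser.parse_args()
    path = Path(args.filename)

    result = pdf2bib.pdf2bib_singlefile(str(path))
    # Assumption: metadata is missing or None when no identifier was found.
    metadata = result.get("metadata") if result else None
    if not metadata:
        raise SystemExit(f"No bibliographic metadata found for {path}")

    author = metadata["author"][0]["family"].lower()
    title = "-".join(w.lower() for w in metadata["title"].split())
    # A "/" in the title would otherwise be read as a directory separator.
    title = title.replace("/", "-")
    # with_name() keeps the renamed file in the directory of the original
    # instead of moving it to the current working directory.
    new_path = path.with_name(f"{author}{metadata['year']}_{title}.pdf")

    print(path)
    print("-->")
    print(new_path)

    path.rename(new_path)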


if __name__ == "__main__":
    main()
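
Titles often contain colons, slashes, accented letters, or other characters that are awkward in file names. Here is a minimal sketch of how the name could be built more robustly and with a customizable format. The helpers slugify and build_filename and the template and max_title_words parameters are hypothetical names of my own, not part of pdf2bib. They only assume a metadata dictionary shaped like the one used in the script above (an author list with a family key, plus year and title).

```python
import re
import unicodedata


def slugify(text, sep="-"):
    """Lowercase `text`, strip accents, and join alphanumeric runs with `sep`."""
    # Decompose accented characters and drop the combining marks (e.g. ö -> o).
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Keep only runs of letters and digits; punctuation and spaces are dropped.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return sep.join(words)


def build_filename(metadata, template="{author}{year}_{title}.pdf", max_title_words=12):
    """Build a file name from a metadata dict shaped like the one used above.

    Hypothetical helper: `template` and `max_title_words` are illustrative
    parameters, not options of pdf2bib or pdf-renamer.
    """
    author = slugify(metadata["author"][0]["family"], sep="")
    # Truncate very long titles so the file name stays manageable.
    title_words = slugify(metadata["title"]).split("-")[:max_title_words]
    return template.format(
        author=author, year=metadata["year"], title="-".join(title_words)
    )


# Hand-written example metadata, not the output of a real lookup.
example = {
    "author": [{"family": "Schrödinger", "given": "Erwin"}],
    "year": 1926,
    "title": "Quantisierung als Eigenwertproblem: Erste Mitteilung",
}
print(build_filename(example))
# -> schrodinger1926_quantisierung-als-eigenwertproblem-erste-mitteilung.pdf
```

In the script, the f-string that assembles the new file name could then be replaced by a call to build_filename(metadata), and a different naming scheme would only require another template string.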