
How to print table of contents of a pdf? [Resolved]

I have a book-like PDF file whose table of contents is stored as metadata in the file but is not printed on any page of the document. I want to print the file together with its table of contents, or print the table of contents separately. How can I do that?


Question Credit: CrabMan
Asked January 11, 2019
Tags: pdf
Posted Under: Unix Linux

pdftk can dump the "bookmarks" with, e.g., pdftk file.pdf dump_data_utf8; the Bookmark* entries are mixed in with the rest of the metadata, and grep can pick out just those lines:

$ pdftk whatever.pdf dump_data_utf8 | grep ^Bookmark
BookmarkBegin
BookmarkTitle: Cover
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: Agenda
BookmarkLevel: 1
BookmarkPageNumber: 2

The "level" is the indentation level (so a level-2 entry is indented under a level-1 entry). You can convert that output into whatever format you want for printing.
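As one possible way to do that formatting, here is a minimal sketch that turns the dump into an indented plain-text table of contents. The function name format_toc is made up for this example, and the sketch assumes each bookmark lists its fields in the order shown above (title, level, page number last), as in the sample output.

```shell
#!/bin/sh
# format_toc (hypothetical helper, not part of pdftk): reads the output of
# "pdftk file.pdf dump_data_utf8" on stdin and prints one line per bookmark,
# indented two spaces per level, followed by the page number.
format_toc() {
    awk '
        # "BookmarkTitle: " is 15 characters long; keep everything after it
        # so titles containing spaces or colons survive intact.
        /^BookmarkTitle: /      { title = substr($0, 16) }
        /^BookmarkLevel: /      { level = $2 }
        /^BookmarkPageNumber: / {
            # Assumption: the page number is the last field of a bookmark,
            # so the entry is complete once this line has been read.
            indent = ""
            for (i = 1; i < level; i++) indent = indent "  "
            printf "%s%s ... %s\n", indent, title, $2
        }
    '
}
```

Used as pdftk whatever.pdf dump_data_utf8 | format_toc, the two sample bookmarks above would come out as "Cover ... 1" and "Agenda ... 2"; the result can be redirected to a file or piped to a print command.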

Here is a Perl script that prints the bookmarks in LaTeX format, which can then be fed to, e.g., pdflatex to get a PDF file (which you could even prepend to your original PDF with pdftk). The script is also available at https://gitlab.com/derobert/random-toys/blob/master/pdf/pdftoc-to-latex (which is a good place to send pull requests if you want to improve it):

#!/usr/bin/perl
use 5.024;
use strict;
use warnings qw(all);
use IPC::Run3;
use LaTeX::Encode;
use Encode qw(decode);

my @levels
    = qw(chapter section subsection subsubsection paragraph subparagraph);
my @counters;

# Capture pdftk's metadata dump as raw bytes, then decode it strictly as UTF-8.
my ($data_enc, $data);
run3 ['pdftk', $ARGV[0], 'dump_data_utf8'], undef, \$data_enc;
$data = decode('UTF-8', $data_enc, Encode::FB_CROAK);

my @latex_bm;

my $bm;
foreach (split(/\n/, $data)) {
    /^Bookmark/ or next;
    if (/^BookmarkBegin$/) {
        add_latex_bm($bm) if $bm;
        $bm = {};
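    } elsif (/^BookmarkLevel: (\d+)$/a) {
        # Bump the counter for this depth, then truncate the array so that
        # deeper counters restart; joining gives numbers such as 2.1.3.
        ++$counters[$1 - 1];
        $#counters = $1 - 1;
        $bm->{number} = join(q{.}, @counters);
        $bm->{level} = $1 - 1;    # zero-based index into @levels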
    } elsif (/^BookmarkTitle: (.+)$/) {
        $bm->{title} = latex_encode($1);
    } elsif (/^BookmarkPageNumber: (\d+)$/a) {
        $bm->{page} = $1;
    } else {
        die "Unknown Bookmark tag in $_\n";
    }
}
add_latex_bm($bm) if $bm;

print <<LATEX;
\\documentclass{report}
\\begin{document}
${ \join('', @latex_bm) }
\\end{document}
LATEX

exit 0;

sub add_latex_bm {
    my $bm     = shift;
    my $level  = $levels[$bm->{level}];
    my $number = $bm->{number};
    my $title  = $bm->{title};
    my $page   = $bm->{page};

    push @latex_bm, <<LINE;
\\contentsline {$level}{\\numberline {$number}$title}{$page}%
LINE
}

Here is how to use this script:

  1. Download https://gitlab.com/derobert/random-toys/raw/master/pdf/pdftoc-to-latex?inline=false and save it as pdftoc-to-latex.pl.
  2. Make it executable by running chmod +x /path/to/pdftoc-to-latex.pl in a terminal.
  3. Install the LaTeX::Encode Perl package. On Debian Stretch you can do so with sudo apt install liblatex-encode-perl; on other distributions you will probably need to do something else. The script also loads IPC::Run3 and calls pdftk, so both need to be installed as well.
  4. Run the script like this: /path/to/pdftoc-to-latex.pl /path/to/pdf/file.pdf > /path/to/where/you/want/tex/file.tex
  5. Compile the resulting .tex file to PDF with your favorite LaTeX compiler.
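Steps 4 and 5, plus the optional prepending mentioned earlier, could be wrapped up roughly as follows. This is only a sketch: print_with_toc is a hypothetical name, pdflatex is assumed to be the LaTeX compiler, and the file names toc.tex and toc.pdf are chosen for illustration.

```shell
#!/bin/sh
# print_with_toc (hypothetical wrapper around the steps above)
#   $1 = path to pdftoc-to-latex.pl
#   $2 = input PDF that carries the bookmarks
#   $3 = output PDF: table of contents followed by the original pages
print_with_toc() {
    script=$1 input=$2 output=$3
    workdir=$(mktemp -d) || return 1

    # Step 4: turn the bookmarks into a LaTeX table of contents.
    "$script" "$input" > "$workdir/toc.tex" || return 1

    # Step 5: compile it; pdflatex writes toc.pdf into the work directory.
    pdflatex -interaction=nonstopmode -output-directory="$workdir" \
        "$workdir/toc.tex" > /dev/null || return 1

    # Prepend the table of contents to the original with pdftk's cat operation.
    pdftk "$workdir/toc.pdf" "$input" cat output "$output" || return 1

    rm -rf "$workdir"
}
```

It would be called as print_with_toc /path/to/pdftoc-to-latex.pl book.pdf book-with-toc.pdf. A temporary directory is used so that the auxiliary files pdflatex produces do not end up next to the original PDF.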

credit: CrabMan
Answered January 11, 2019