Saturday, December 5, 2020

Extract data from PDF in Debian

Most of us have, at some point, needed to extract the text from a PDF file. There are commercial solutions and some free solutions available on the web. On most Debian systems, Ghostscript is already installed by default, and it can extract the text from a PDF with no hassle. If it is not installed, you can install it with the Synaptic package manager or by running the following command as root:

apt-get install ghostscript

Now open your terminal and type the following command, replacing input.pdf with the path to your PDF:

/usr/bin/gs -sDEVICE=txtwrite -o output.txt input.pdf

That's it. The extracted text is now in output.txt in the working directory.
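If you have a whole folder of PDFs, the same command can be wrapped in a small loop. The following is only a minimal sketch: the function name extract_all and the choice to write each .txt file next to its PDF are my own assumptions, not something Ghostscript prescribes. It assumes gs is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (the name extract_all is an assumption for illustration):
# convert every PDF in a directory to a .txt file with the same base name.
extract_all() {
    dir="${1:-.}"                      # default to the current directory
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # skip when the glob matched nothing
        out="${pdf%.pdf}.txt"          # report.pdf -> report.txt
        # -q keeps Ghostscript's startup banner out of the terminal
        gs -q -sDEVICE=txtwrite -o "$out" "$pdf" || echo "failed: $pdf" >&2
    done
}
```

After loading the function into your shell, call it as extract_all /path/to/folder. If you only need part of a document, Ghostscript also accepts -dFirstPage=N and -dLastPage=M to limit the page range.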

Reference:

https://stackoverflow.com/questions/3650957/how-to-extract-text-from-a-pdf

