How To Read PDF In Python

Lerekoqholosha
2 min read · Oct 15, 2021
Image by Jonathan Simcoe on Unsplash

PDF is one of the most widely used file formats for sharing data digitally, so being able to read PDF files from Python is a handy skill.

In this tutorial we will learn how to read PDF files using Python, and then take a quick look at reading text and CSV files as well.

!pip install PyPDF2

# importing required modules
import PyPDF2

# creating a pdf file object
data = open('/content/beginner-guide-web-scraping-beautiful-soup-python.pdf', 'rb')

# creating a pdf reader object
reader = PyPDF2.PdfFileReader(data)

# printing number of pages in pdf file
print(reader.numPages)

Output:

8

Create a page object. Page indices start at 0, so getPage(1) returns the second page:

# creating a page object
page = reader.getPage(1)

Extract the text:

# extracting text from page
print(page.extractText())

# closing the pdf file object
data.close()

Output:

''

Working With Text Files In Python

text_data = open("/content/demofile.txt", "r")
print(text_data.read())
text_data.close()

Output:

Hello! Welcome to my page please follow me!!
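The same read can also be written with a `with` block, which closes the file automatically even if reading fails. This is a minimal sketch; the helper name `read_text_file` and the `utf-8` default encoding are assumptions, not part of the original example.

```python
def read_text_file(path, encoding="utf-8"):
    """Return the full contents of the text file at `path` as one string."""
    # The with block closes the file automatically, even if read() raises.
    with open(path, "r", encoding=encoding) as text_file:
        return text_file.read()
```

For instance, `print(read_text_file("/content/demofile.txt"))` would print the file contents without needing an explicit `close()` call.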

Working With CSV Files

#import libraries
import pandas as pd
import numpy as np

Read Data:

data=pd.read_csv('/content/Climate.csv')

Show data:

[Image: climate change data]

Column names:

data.columns

Output:

Index(['Unnamed: 0', 'Tweets', 'Date', 'len', 'ID', 'Source', 'Likes', 'RTs'], dtype='object')
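The `Unnamed: 0` column in the output above is the name pandas gives a column whose header cell is empty, which commonly happens when a CSV was saved together with its row index. One possible way to avoid it, assuming the first column really is just a saved index (an assumption about `Climate.csv` that is not confirmed here), is to pass `index_col=0`. The helper name `load_csv_with_index` and the tiny in-memory sample are hypothetical and only used for illustration.

```python
import io

import pandas as pd


def load_csv_with_index(source):
    """Read a CSV, treating its first column as the row index."""
    # index_col=0 tells pandas to use the first column as the index
    # instead of exposing it as a regular column named 'Unnamed: 0'.
    return pd.read_csv(source, index_col=0)


# Tiny in-memory stand-in for a CSV that was saved with its index.
sample = io.StringIO(",Tweets,Likes\n0,hello,3\n1,world,5\n")
df = load_csv_with_index(sample)
print(list(df.columns))  # → ['Tweets', 'Likes']
```

If the first column holds real data rather than a saved index, the original `pd.read_csv` call without `index_col` is the safer choice.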

Shape of dataset:

data.shape

Output:

(197, 8)

Summary Statistics:

data.describe()
[Image: summary of dataset]

Missing values:

data.isnull().sum()
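The inspection steps above (columns, shape, missing values) can be gathered into one small helper. This is only a sketch: the function name `summarize` and the two-row sample are hypothetical, and the sample stands in for the real dataset.

```python
import io

import pandas as pd


def summarize(df):
    """Collect basic facts about a DataFrame in a plain dictionary."""
    return {
        "columns": list(df.columns),
        "shape": df.shape,
        # isnull().sum() counts the missing values in each column.
        "missing": {col: int(n) for col, n in df.isnull().sum().items()},
    }


# Small in-memory sample with one missing value in the Likes column.
sample_csv = io.StringIO("Tweets,Likes\nhello,3\nworld,\n")
report = summarize(pd.read_csv(sample_csv))
print(report["missing"])  # → {'Tweets': 0, 'Likes': 1}
```

With the tutorial's data, `summarize(data)` would return the same kind of dictionary for the climate dataset.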

Summary

So this is how you can use Python to read different kinds of files. I hope you liked this tutorial on reading PDF files with Python. Feel free to ask your questions in the comments section below :).


Lerekoqholosha

I am a data scientist with 1 year of experience working with Python.