How to scrape articles from a website

Lerekoqholosha
Aug 10, 2021

Hello everyone. In this article we will look at how to scrape articles from a website using a Python package called newspaper.

Let's get started. This is a short article, and we will use one of my Medium stories to show how it is done. Feel free to use the same approach in your own NLP project.

Install newspaper3k library

! pip install newspaper3k

Article

from newspaper import Article
url = 'https://lerekoqholosha9.medium.com/data-preprocessing-with-pandas-23728a06cec5'
article = Article(url)
article.download()
print(article.html)

Parse article

article.parse()
article.authors
Author name
article.title
article title
article.publish_date
date
article.tags
tags
print(article.text)
print(article.top_image)
image
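The parsed fields above can be gathered into a single record, which is often easier to store or pass to later steps. The following is a minimal sketch, not part of the newspaper package: the helper name article_to_record is made up for illustration, and a SimpleNamespace stands in for a parsed article so the example runs without a network connection. It assumes publish_date is either a datetime or empty, which is what parse() normally leaves behind.

```python
from datetime import datetime
from types import SimpleNamespace


def article_to_record(article):
    """Collect the parsed attributes shown above into a plain dict.

    `article` is assumed to be a newspaper Article on which download()
    and parse() have already been called (hypothetical helper).
    """
    publish_date = article.publish_date
    return {
        "title": article.title,
        "authors": list(article.authors),
        # publish_date may be missing when the page exposes no date
        "publish_date": publish_date.isoformat() if publish_date else None,
        "top_image": article.top_image,
        "text": article.text,
    }


# Stand-in object with the same attribute names, used only for this demo.
fake_article = SimpleNamespace(
    title="Example title",
    authors=["Jane Doe"],
    publish_date=datetime(2021, 8, 10),
    top_image="https://example.com/cover.png",
    text="Body text.",
)
record = article_to_record(fake_article)
print(record["publish_date"])  # 2021-08-10T00:00:00
```

With a real article, the same call would be article_to_record(article) right after article.parse().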

NLP

import nltk
nltk.download('punkt')
article.nlp()
nltk
article.keywords
keywords
print(article.summary)
summary of article
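For an NLP project you will usually want more than one article. The sketch below repeats the download and parse steps over a list of URLs and skips any that fail. It is only an illustration: build_corpus and OfflineArticle are hypothetical names, and the stand-in class exists so the demo runs offline. The function takes the article class as an argument, so the real Article class can be passed in the same way.

```python
def build_corpus(urls, make_article):
    """Download and parse each URL, skipping the ones that fail.

    `make_article` is any callable that turns a URL into an object with
    download(), parse(), title and text, for example newspaper's Article.
    """
    corpus = []
    for url in urls:
        article = make_article(url)
        try:
            article.download()
            article.parse()
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Skipping {url}: {exc}")
            continue
        corpus.append({"url": url, "title": article.title, "text": article.text})
    return corpus


class OfflineArticle:
    """Stand-in with the same members used above, so the demo needs no network."""

    def __init__(self, url):
        self.url = url
        self.title = ""
        self.text = ""

    def download(self):
        # Simulate a failed download for one of the demo URLs.
        if "broken" in self.url:
            raise RuntimeError("download failed")

    def parse(self):
        self.title = "Title for " + self.url
        self.text = "Text for " + self.url


corpus = build_corpus(
    ["https://example.com/a", "https://example.com/broken"],
    OfflineArticle,
)
print(len(corpus))  # 1
```

With newspaper3k installed, the call would be build_corpus(urls, Article) after from newspaper import Article.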

Conclusion

We have now scraped the article with Python. You can look at the documentation for more features of this package.

Documentation GitHub

Thank you for reading.

Please let me know if you have any feedback.

Written by Lerekoqholosha

I am a data scientist with 3 years of experience working with Python.
