How to scrape articles from a website

Lerekoqholosha
Aug 10, 2021

Hello everyone. In this article we will look at how to scrape articles from a website using a Python package called newspaper.

Let's get started. This is a short article, and we will use one of my Medium stories to show how it is done. Feel free to use it for your own NLP project as well.

Install newspaper3k library

! pip install newspaper3k

Article

from newspaper import Article

url = 'https://lerekoqholosha9.medium.com/data-preprocessing-with-pandas-23728a06cec5'
article = Article(url)
article.download()
print(article.html)
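A download can fail quietly, for example when the URL is mistyped or the network is down. The sketch below wraps the download, together with the parse step covered in the next section, in a small helper. The name `fetch_article` and the URL check are my own additions for illustration and are not part of newspaper; the sketch assumes that `parse()` raises `ArticleException` when the download did not succeed.

```python
from urllib.parse import urlparse


def fetch_article(url):
    """Download and parse one article, returning the parsed Article object.

    Hypothetical helper for illustration; not part of the newspaper package.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")

    # Imported here so the URL check above works even before
    # newspaper3k is installed.
    from newspaper import Article
    from newspaper.article import ArticleException

    article = Article(url)
    try:
        article.download()
        article.parse()  # assumed to raise ArticleException if the download failed
    except ArticleException as exc:
        raise RuntimeError(f"could not fetch {url}") from exc
    return article
```

Calling `fetch_article(url)` with the URL above should return an object with the same attributes used in the rest of this article.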

Parse article

article.parse()
article.authors       # author names
article.title         # article title
article.publish_date  # publication date
article.tags          # tags
print(article.text)
print(article.top_image)  # URL of the top image
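If you want to save the parsed fields, for example to build a dataset for an NLP project, it can help to copy them into a plain dictionary. This is a minimal sketch: `article_to_record` is a hypothetical helper name, and the `demo` object is a stand-in with made-up values so the code runs without a network connection. It assumes `publish_date` may be `None` when no date is found on the page.

```python
from datetime import datetime
from types import SimpleNamespace


def article_to_record(article):
    """Copy the parsed fields of an Article-like object into a plain dict."""
    published = article.publish_date
    return {
        "title": article.title,
        "authors": list(article.authors),
        # publish_date is assumed to be None when no date was found
        "publish_date": published.isoformat() if published else None,
        "tags": sorted(article.tags),
        "top_image": article.top_image,
        "text": article.text,
    }


# Stand-in with made-up values so the sketch runs offline;
# in practice, pass the parsed `article` object instead.
demo = SimpleNamespace(
    title="Example title",
    authors=["Example Author"],
    publish_date=datetime(2021, 8, 10),
    tags={"python", "nlp"},
    top_image="https://example.com/image.png",
    text="Example body text.",
)
record = article_to_record(demo)
print(record["title"], record["publish_date"])
```

With a real article, call `article_to_record(article)` after `article.parse()`; the resulting dictionary can be passed to `json.dumps` or collected into a pandas DataFrame.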

NLP

import nltk
nltk.download('punkt')  # tokenizer data needed by article.nlp()
article.nlp()
article.keywords        # keywords
print(article.summary)  # summary of the article
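To get a feel for what keyword extraction does, here is a rough standard-library sketch of frequency-based keywords: count the words, skip common stopwords, and keep the most frequent ones. This is not newspaper's code, and its results will differ from `article.keywords`; the function name `top_keywords`, the tiny stopword list, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "with", "for", "on"}


def top_keywords(text, limit=5):
    """Return the most frequent non-stopword words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stopwords and very short words before counting.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


sample = "Pandas makes data cleaning easy. Data cleaning with pandas is a common data task."
print(top_keywords(sample, limit=3))
```

You could run `top_keywords(article.text)` next to `article.keywords` to compare the two lists.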

Conclusion

We have now scraped an article with Python. You can look at the documentation for more features of this package.

Documentation GitHub

Thank you for reading.

Please let me know if you have any feedback.

Lerekoqholosha

I am a data scientist with 1 year of experience working with Python.