Leveraging AI to stay in the know


In today's fast-paced digital world, staying on top of the latest news from multiple sources can be overwhelming. To simplify this, we show you a behind-the-scenes walkthrough of one of the process accelerators our customers get access to: a Python script that fetches news articles, summarizes them with OpenAI, and sends the summaries to Slack for easy sharing. Here's how it works in three simple steps:

  1. Fetching News from Blog RSS Feeds: The script uses a customizable list of blog RSS feed URLs from various tech- and AI-focused websites such as OpenAI, HuggingFace, Microsoft, and Salesforce, among others. It reads these RSS feeds to extract recent articles published within a specified time frame, typically the last 24 hours. This is achieved with the feedparser library, which parses each RSS feed, and the datetime module, which filters articles by publication date.

  2. Summarizing Articles: For each article fetched, the script summarizes its content to extract the main points. This is done with a language model from OpenAI, accessed through the openai library (an OpenAI API account is required). The script sends the article's content to the model, which returns a concise summary highlighting the key information. This is particularly useful for quickly grasping the essence of an article without reading it in full.

  3. Integration with Slack: Once the articles are summarized, the script formats the news and summaries for each source and sends them to a specified Slack channel using a webhook URL. This integration allows team members to stay informed about the latest news directly within their Slack workspace, facilitating easy access and discussion.
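If you want to confirm the Slack side works before wiring everything together, an incoming webhook can be exercised on its own. Below is a minimal sketch of such a test; the webhook URL is a placeholder you get when creating an Incoming Webhook in your Slack workspace, and the call is essentially the same one the send_to_slack function in the full script makes.

import requests

# Placeholder: replace with the Incoming Webhook URL from your Slack app settings.
webhook_url = "YOUR SLACK WEBHOOK"

# Slack incoming webhooks accept a JSON payload with a 'text' field.
response = requests.post(webhook_url, json={'text': 'Hello from the news summarizer!'})
print(response.status_code)  # 200 means Slack accepted the message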

This Python script is part of our library of process accelerators, providing customers with a solution for automating news aggregation, summarization, and sharing. By leveraging blog feeds, utilizing OpenAI language models for summarization, and integrating with Slack, the script offers a streamlined way to stay updated on the latest developments in any topic of interest. See the code below.
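Before running the script yourself, you will need the third-party packages it imports and an OpenAI API key available to the process. The snippet below is a minimal setup sketch; the package names are our assumption, inferred from the imports in the code (the langchain-community web loader also relies on beautifulsoup4), and openai.OpenAI() picks the key up from the OPENAI_API_KEY environment variable.

# Assumed dependencies, inferred from the imports in the script below:
#   pip install python-dotenv pytz feedparser requests openai langchain-community beautifulsoup4
#
# The script expects an OpenAI key in a .env file next to it, for example:
#   OPENAI_API_KEY=sk-...
import os
from dotenv import load_dotenv

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("Set OPENAI_API_KEY in your .env file before running the script.")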


Code:

from dotenv import load_dotenv
load_dotenv()

from datetime import datetime, timedelta
import pytz
import feedparser
import requests

URLS = {
    'OpenAI': 'https://openai.com/blog/rss.xml',
    'HuggingFace': 'https://huggingface.co/blog/rss.xml',
    'Microsoft': 'https://news.microsoft.com/source/topics/ai/feed/',
    'AWS': 'https://aws.amazon.com/blogs/machine-learning/feed/',
    'Google': 'https://blog.google/technology/ai/rss/',
    'Langchain': 'https://blog.langchain.dev/rss/',
    'Apple': 'https://machinelearning.apple.com/rss.xml',
    'Clarifai': 'https://www.clarifai.com/blog/rss.xml',
    'Machine Learning Mastery': 'https://machinelearningmastery.com/blog/feed/',
    'Salesforce Blog': 'https://www.salesforce.com/blog/feed/'
}

def read_rss(_url, days: int = 1):
    # Return the feed entries published within the last `days` days.
    feed = feedparser.parse(_url)
    now = datetime.now(pytz.timezone('GMT'))
    time_range = timedelta(days=days)

    articles = []
    for entry in feed.entries:
        try:
            entry_date = datetime.strptime(entry.published, "%a, %d %b %Y %H:%M:%S %z")
        except ValueError:
            # Some feeds spell out the timezone as "GMT" instead of a numeric offset.
            entry_date = datetime.strptime(entry.published, "%a, %d %b %Y %H:%M:%S GMT")
            entry_date = entry_date.replace(tzinfo=pytz.timezone('GMT'))
        if now - entry_date <= time_range:
            articles.append(entry)
    return articles

from langchain_community.document_loaders import WebBaseLoader

def summarize_article(url: str):
    loader = WebBaseLoader(url)
    data = loader.load()
    return chat_completions_article_summary(data[0].page_content)

import openai

client = openai.OpenAI()

def chat_completions_article_summary(web_article: str):
    prompt = f"""From the long article below, extract the three main points
    in bullet points.
    Ignore the parts of the text that do not contain the article.
    Ignore any HTML code snippets you find:
    {web_article}
    """
    role_content = """You are a professional news summarizer. Write in
    easy to understand terms the main ideas of the provided article"""

    try:
        response = client.chat.completions.create(
            model='gpt-3.5-turbo-1106',
            messages=[
                {"role": "system", "content": role_content},
                {"role": "user", "content": prompt}
            ]
        )
        completion_text = response.choices[0].message.content
    except Exception as e:
        completion_text = f"Error occurred: {str(e)}"
    return completion_text

def read_news(days):
    news = {}
    for key, url in URLS.items():
        news[key] = read_rss(url, days)
    return news

# The script remembers which links it has already posted in a small text file,
# so re-running it does not repost the same articles.
def load_posted_urls():
    try:
        with open('posted_urls.txt', 'r') as file:
            return set(file.read().splitlines())
    except FileNotFoundError:
        return set()

def save_posted_urls(urls):
    with open('posted_urls.txt', 'w') as file:
        for url in urls:
            file.write(url + '\n')

def format_news(report: bool = False, days: int = 1):
    entries = read_news(days=days)
    posted_urls = load_posted_urls()

    text = ""
    for company, articles in entries.items():
        if articles:
            company_text = f"*{company} news*:\n"
            has_new_articles = False
            for article in articles:
                if article.link not in posted_urls:
                    has_new_articles = True
                    company_text += f"<{article.link}|{article.title}>\n"
                    if report:
                        # Summarize the full article page.
                        summary = f"\n{summarize_article(url=article.link)}"
                    else:
                        # Summarize the shorter text included in the RSS entry itself.
                        summary = chat_completions_article_summary(article.summary)
                    company_text += f"Summary: {summary}\n"
                    posted_urls.add(article.link)
            company_text += "\n\n"
            # Only include sources that actually have unposted articles.
            if has_new_articles:
                text += company_text
    save_posted_urls(posted_urls)
    return text

def send_to_slack(message: str, webhook_url: str):
    payload = {'text': message}
    response = requests.post(webhook_url, json=payload)
    return response.status_code

if __name__ == '__main__':
    formatted_news = format_news(report=False, days=1)
    slack_webhook_url = "YOUR SLACK WEBHOOK"  # Replace with your Slack webhook URL
    send_to_slack(formatted_news, slack_webhook_url)
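A usage note: the report flag controls where the summary text comes from. With report=False the script summarizes the summary field already included in each RSS entry, while report=True loads the full article page through WebBaseLoader before summarizing, which is slower but usually more thorough. A quick sketch of running it that way, with the webhook URL still a placeholder:

# Same entry point as above, but summarizing full article pages and looking back a week.
formatted_news = format_news(report=True, days=7)
send_to_slack(formatted_news, "YOUR SLACK WEBHOOK")

To keep the channel current, you could schedule the script to run once a day with the scheduler of your choice, such as cron or a hosted job runner.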