Python – Reddit PRAW API : Extracting entire JSON format

I’m using PRAW, the Python Reddit API wrapper, for sentiment analysis. My code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import praw
from IPython import display
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
from pprint import pprint
import pandas as pd
import nltk
import seaborn as sns
import datetime

sns.set(style='darkgrid', context='talk', palette='Dark2')

reddit = praw.Reddit(client_id='XXXXXXXXXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXX',
                     user_agent='StackOverflow')

headlines = set()
results = []
sia = SIA()

for submission in reddit.subreddit('bitcoin').new(limit=None):
    pol_score = sia.polarity_scores(submission.title)
    pol_score['headline'] = submission.title
    readable = datetime.datetime.fromtimestamp(submission.created_utc).isoformat()
    results.append((submission.title, readable, pol_score["compound"]))
    display.clear_output()

Question A: With this code, I can only extract the title and a few other keys. I want to extract everything in JSON format, but I have searched the documentation and have not found whether that is possible.

If I just print submission while iterating over reddit.subreddit('bitcoin'), the result shows only the ID code. I want to extract all the information and save it in a JSON file.

Question B: How do I extract comments/messages for a specific date?

Solution

Question A:

You can simply add .json to the end of the full URL of the post to get the full JSON of that page, which includes the title, author, comments, votes, and everything else.

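Use submission.permalink to get the path of the post. It is relative to https://www.reddit.com, so prepend the site address to form the full URL. You can then use requests to get the JSON for that page.

import requests

# submission.permalink is a relative path such as /r/<subreddit>/comments/<id>/<title>/
url = 'https://www.reddit.com' + submission.permalink + '.json'
# Send a descriptive User-Agent; requests with the default one are often rejected or throttled
response = requests.get(url, headers={'User-Agent': 'StackOverflow'})
response.raise_for_status()
data = response.json()  # parsed JSON: the post followed by its comment tree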
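
Since the question also asks about saving the result to a file, here is a minimal sketch of that step. The helper names submission_json_url and save_json, the REDDIT_BASE constant, and the output file name are illustrative assumptions, not part of PRAW or requests. It assumes submission.permalink is a path relative to https://www.reddit.com.

```python
import json

REDDIT_BASE = "https://www.reddit.com"  # assumed base address for relative permalinks


def submission_json_url(permalink):
    """Build the .json URL for a post from its relative permalink (hypothetical helper)."""
    return REDDIT_BASE + permalink + ".json"


def save_json(data, path):
    """Write already-parsed JSON data (lists/dicts) to a file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)


# Intended use with requests (needs network access, so it is not run here):
#   response = requests.get(submission_json_url(submission.permalink),
#                           headers={"User-Agent": "StackOverflow"})
#   save_json(response.json(), "submission.json")
```

Writing the parsed data with json.dump, rather than the raw response bytes, keeps the file valid JSON and makes it easy to pretty-print.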

Question B:

Unfortunately, Reddit removed timestamp search from its search API sometime last year. The announcement post about the change says:

Besides some minor syntax differences, the most notable change is that searches by exact timestamp are no longer supported on the newer system. Limiting results to the past hour, day, week, month and year is still supported via the ?t= parameter (e.g. ?t=day)

Therefore, this cannot be done using PRAW at this time. However, you can take a look at the Pushshift API, which provides this functionality.
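
As a partial workaround that stays within PRAW, you could filter on the client side by each item's created_utc attribute. This is only a sketch, not a replacement for timestamp search: Reddit listings return a limited number of recent items (roughly the latest 1000), so older dates may simply be out of reach. The helper names created_on and filter_by_date are illustrative assumptions.

```python
import datetime


def created_on(item, day):
    """Return True if item.created_utc (epoch seconds) falls on `day` in UTC."""
    created = datetime.datetime.fromtimestamp(item.created_utc, tz=datetime.timezone.utc)
    return created.date() == day


def filter_by_date(items, day):
    """Keep only the items created on the given UTC calendar day."""
    return [item for item in items if created_on(item, day)]


# Intended use with the question's PRAW objects (needs credentials, so not run here):
#   day = datetime.date(2018, 1, 1)
#   posts = filter_by_date(reddit.subreddit('bitcoin').new(limit=None), day)
```

The same filter should work for comments, since PRAW comment objects also carry created_utc.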
