Why do these two APIs (twitter geo/search API) return different result sets?

I’m collecting tweets from a specific region, but two methods give me very different result sets. The first method passes a latitude/longitude and a radius: a point inside the city of Lahore, PK, with a radius of 5 km, which covers only a small part of the city. This way I get about 60,000 tweets a day.

Method 1

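import tweepy

consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'

# Build the OAuth handler that tweepy.API needs (tweepy 3.x)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Search a 5 km radius around a point in Lahore.
# api.search is tweepy 3.x; it was renamed api.search_tweets in tweepy 4.x.
public_tweets = tweepy.Cursor(api.search, count=100, geocode="31.578871,74.305184,5km",
                              since="2018-06-09", show_user=True, tweet_mode="extended").items()
for tweet in public_tweets:
    print(tweet.full_text)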

In the second method, I use the Twitter Geo Search API, querying for Lahore with granularity=”city”. Now I get tweets from the whole city, but only about 1,200 a day. I also extracted data for the last 7 days and got only 15,000 tweets. That is a huge difference: the whole city gives me about 1,200 tweets a day, while a small part of the same city gives me more than 60,000. I also printed the place ID to verify that the polygon I got is accurate. These are its corners (longitude, latitude), which form a bounding box:
74.4493870, 31.4512220
74.4493870, 31.6124170
74.2675860, 31.6124170
74.2675860, 31.4512220
I checked these on https://www.keene.edu/, and they are the polygon of the city of Lahore.
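A quick back-of-envelope comparison of the two areas supports the point that the gap cannot come from area alone: the 5 km circle from Method 1 covers roughly a quarter of the Lahore bounding box, yet returns far more tweets. This sketch assumes a flat (equirectangular) approximation, which is only reasonable at city scale; the constants and helper names are illustrative, not part of any API.

```python
import math

# Bounding box of Lahore as printed in the question (lon/lat corners)
LAHORE_BBOX = {"west": 74.2675860, "east": 74.4493870,
               "south": 31.4512220, "north": 31.6124170}

# Center and radius used in Method 1's geocode parameter
CENTER_LAT, CENTER_LON, RADIUS_KM = 31.578871, 74.305184, 5.0

KM_PER_DEG_LAT = 110.57  # approximate; varies slightly with latitude


def bbox_area_km2(bbox):
    """Approximate area of a small lon/lat box (equirectangular approximation)."""
    mid_lat = math.radians((bbox["south"] + bbox["north"]) / 2)
    km_per_deg_lon = 111.32 * math.cos(mid_lat)  # longitude degrees shrink with latitude
    width = (bbox["east"] - bbox["west"]) * km_per_deg_lon
    height = (bbox["north"] - bbox["south"]) * KM_PER_DEG_LAT
    return width * height


def point_in_bbox(lat, lon, bbox):
    """True if (lat, lon) lies inside the box."""
    return bbox["south"] <= lat <= bbox["north"] and bbox["west"] <= lon <= bbox["east"]


circle_area = math.pi * RADIUS_KM ** 2
box_area = bbox_area_km2(LAHORE_BBOX)
print("center inside box:", point_in_bbox(CENTER_LAT, CENTER_LON, LAHORE_BBOX))
print("circle %.1f km^2, box %.1f km^2, ratio %.2f"
      % (circle_area, box_area, circle_area / box_area))
```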

Method 2

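import tweepy

consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'

# Same OAuth setup as Method 1 (tweepy 3.x)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Look up city-level Places matching "Lahore" (geo_search is tweepy 3.x; search_geo in 4.x)
places = api.geo_search(query="Lahore", granularity="city")
for place in places:
    print("placeid: %s (%s)" % (place.id, place.full_name))

# Search by the first matching Place (assumes it is Lahore; check the printout above)
place = places[0]
public_tweets = tweepy.Cursor(api.search, count=100, q="place:%s" % place.id,
                              since="2018-06-09", show_user=True, tweet_mode="extended").items()
for tweet in public_tweets:
    print(tweet.full_text)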

So my first question is: why are the results so different? I’m using the standard API.

Second, how do these APIs actually find tweets for a region? Fewer than 1% of tweets are geotagged, and not every user gives an exact city and country in their profile; some users list places like “Mars” or “Earth”. So how does searching within a radius, or querying by city/country, work? I looked through the Twitter API documentation and the tweepy docs to find out how these APIs collect tweets for a specific region behind the scenes, but I didn’t find anything useful.

Solution

The first method returns more results because, when a tweet has no geographic information of its own, a geocode search falls back to the user’s profile location (as you already guessed) and tries to reverse geocode it into a latitude/longitude.

See the documentation here:

https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators.html

Geolocalization: the search operator “near” isn’t available in the
API, but there is a more precise way to restrict your query by a given
location using the geocode parameter specified with the template
“latitude,longitude,radius”, for example, “37.781157,-122.398720,1mi”.
When conducting geo searches, the search API will first attempt to
find Tweets which have lat/long within the queried geocode, and in
case of not having success, it will attempt to find Tweets created by
users whose profile location can be reverse geocoded into a lat/long
within the queried geocode, meaning that is possible to receive Tweets
which do not include lat/long information.
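To make the documented two-step behavior concrete, here is a simplified per-tweet model of it in Python. It is an illustration, not Twitter’s implementation: matches_geocode and the reverse_geocode callable are hypothetical, the tweet is a plain dict shaped like a v1.1 tweet, and the assumption that a geotagged tweet is judged only by its own point (with no fallback to the profile) is a reading of the documentation, not something it states.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matches_geocode(tweet, center_lat, center_lon, radius_km, reverse_geocode):
    """Hypothetical model of the documented geocode search.

    tweet: dict with optional "coordinates" (GeoJSON Point, [lon, lat]) and
    "user"["location"] (free text).
    reverse_geocode: hypothetical callable, profile text -> (lat, lon) or None.
    """
    point = tweet.get("coordinates")
    if point:
        # Step 1: the tweet carries its own lat/long, so use it
        lon, lat = point["coordinates"]
        return haversine_km(center_lat, center_lon, lat, lon) <= radius_km
    # Step 2: fall back to reverse geocoding the author's profile location
    guess = reverse_geocode((tweet.get("user") or {}).get("location"))
    if guess is None:
        return False  # e.g. "Mars": nothing to geocode
    return haversine_km(center_lat, center_lon, guess[0], guess[1]) <= radius_km


# Toy geocoder for illustration only: maps profile text to a fixed point
toy_geocoder = {"Lahore": (31.578871, 74.305184)}.get
print(matches_geocode({"user": {"location": "Lahore"}},
                      31.578871, 74.305184, 5, toy_geocoder))
```

In this model a tweet with no coordinates still matches when its author’s profile says “Lahore”, which is how a radius search can return many more tweets than the small fraction that are actually geotagged.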

A place: search, on the other hand, seems to look only for tweets tagged with that exact Place. The basic call syntax is described here:
https://developer.twitter.com/en/docs/tweets/search/guides/tweets-by-place

Places work very differently from latitude/longitude geocoding. The following page clarifies the differences between the two types of location data that can be associated with Tweets:

https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location

Tweet-specific location information falls into two general categories:
Tweets with a specific latitude/longitude “Point” coordinate
Tweets with a Twitter “Place” (see our blog post on Twitter Places: More Context For Your Tweets and our documentation on Twitter geo objects for more information).
…
Tweets with a Twitter “Place” contain a polygon, consisting of 4
lon-lat coordinates that define the general area (the “Place”) from
which the user is posting the Tweet. Additionally, the Place will have
a display name, type (e.g. city, neighborhood), and country code
corresponding to the country where the Place is located, among other
fields.
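A sketch of how the two categories show up in a v1.1 tweet dict (for example, tweet._json from a tweepy Status) may help when checking which tweets each method returns. The helper names are hypothetical; the field names follow the v1.1 tweet object (coordinates as a GeoJSON Point, place with a bounding_box polygon, user.location as free text). Tallying Method 1’s results this way should show how many matched only through the profile location.

```python
from collections import Counter


def location_source(tweet):
    """Classify where a v1.1 tweet dict's location comes from.

    "point"   -> tweet["coordinates"] holds an exact GeoJSON Point
    "place"   -> tweet["place"] holds a Twitter Place (polygon + metadata)
    "profile" -> only the free-text user.location is available
    None      -> no location information at all
    """
    if tweet.get("coordinates"):
        return "point"
    if tweet.get("place"):
        return "place"
    if (tweet.get("user") or {}).get("location"):
        return "profile"
    return None


def place_summary(tweet):
    """Return (full_name, place_type, country_code, corner count) of a tagged Place."""
    place = tweet["place"]
    ring = place["bounding_box"]["coordinates"][0]  # list of [lon, lat] corners
    return place["full_name"], place["place_type"], place["country_code"], len(ring)


def tally_sources(tweets):
    """Count location sources over an iterable of tweet dicts."""
    return Counter(location_source(t) for t in tweets)
```

Note that a tweet with a Point usually carries a Place too; this sketch simply reports the more precise of the two.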

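Also note this section, in particular the plural “Place IDs”: an area can have several associated Places, so a single place: query may miss Tweets tagged with other Places in the same area.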

place:
Filter for specific Places by their name or ID. To discover “Places”
associated with a specific area, use Twitter’s reverse_geocode
endpoint in the REST API. Then use the Place IDs you find with the
place: operator to track Tweets that include the specific Place being
referenced. If you use the Place name rather than the numeric ID,
ensure that you quote any names that include spaces or punctuation.
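Following that advice, several Places can be combined into one query. The helper below is a hypothetical sketch (build_place_query is not part of tweepy or the Twitter API), assuming standard search accepts OR between place: terms and that Place IDs are plain alphanumeric strings; anything else is treated as a name and quoted, as the documentation recommends.

```python
def build_place_query(places):
    """Join Place IDs or names into one query string using the place: operator.

    Hypothetical helper: alphanumeric values are treated as IDs and left bare;
    names with spaces or punctuation are wrapped in double quotes.
    """
    terms = []
    for p in places:
        if p.isalnum():
            terms.append("place:%s" % p)
        else:
            terms.append('place:"%s"' % p)
    return " OR ".join(terms)


print(build_place_query(["01a9a39529b27f36", "Model Town"]))
```

With tweepy 3.x, the IDs could come from api.reverse_geocode(lat=..., long=..., granularity="neighborhood") called on one or more points in the city, and the resulting string passed as q to api.search.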
