Today we'll dive into The Movie Database (TMDB) API with HTTPie, jq and jid, and learn how to navigate it efficiently.
When it comes to movie and TV show data, the first thing that comes to mind is likely to be Amazon's IMDb. IMDb lacks a public, supported REST API, though. Historically, people have scraped its pages or relied on well-known URL paths to get data out of IMDb, but that approach is fragile. The Movie Database, on the other hand, offers a REST API for developers (with free access) and commercial licensing for money-making projects, which makes it a useful resource.
Dive preparation
When you start working or playing with a new API from scratch, there are three important questions to answer up front:
1. Where's the documentation?
The answer to the first question is easy enough: developers.themoviedb.org.
2. What's the root endpoint?
This should be answered by the documentation, but isn't always. The MovieDB docs don't mention the root endpoint. Spoiler: it's api.themoviedb.org. You'll only find that out after you have signed up as a user and requested an API key. It's then that you are given an example URL with the API key:
https://api.themoviedb.org/3/movie/550?api_key=7e23cee5bfb742e781fccc26b9e9009f
3. Is there authentication and how does it work?
Yes, there is authentication, in the form of an API key passed in the api_key query parameter.
The key shown above is a fake, so remember to substitute your own API key if you are following along. You can obtain one by signing up for a TMDB account and requesting API access in your account settings.
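Putting the three answers together, the example URL breaks down into a root endpoint, an API version, a resource path and an api_key query parameter. The Python sketch below (standard library only) pulls those pieces apart. The helper name describe_tmdb_url is hypothetical, and reading the leading /3 as the API version is an assumption drawn from the example URL rather than something the docs state here.

```python
from urllib.parse import parse_qs, urlsplit

# The example URL from the TMDB sign-up flow (the key is the fake one used throughout).
EXAMPLE_URL = "https://api.themoviedb.org/3/movie/550?api_key=7e23cee5bfb742e781fccc26b9e9009f"


def describe_tmdb_url(url: str) -> dict:
    """Split a TMDB request URL into root endpoint, version, resource and key."""
    parts = urlsplit(url)
    # The path looks like "/3/movie/550"; assume the first segment is the API version.
    version, _, resource = parts.path.lstrip("/").partition("/")
    # parse_qs maps each parameter name to a list of values; take the first api_key.
    api_key = parse_qs(parts.query).get("api_key", [None])[0]
    return {
        "root": parts.netloc,
        "version": version,
        "resource": resource,
        "api_key": api_key,
    }


if __name__ == "__main__":
    print(describe_tmdb_url(EXAMPLE_URL))
```

The same split applies to every request in the rest of this dive: only the resource path and the extra query parameters change.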
Ready to HTTPie
(If you haven't already, now's the time to install HTTPie so that you can follow along.) Let's translate that URL into an HTTPie command first. We'll use the https command to match the protocol in the URL. The first argument is the host and path, taken from the example URL and as defined in the API docs. Let's try just that, with no authentication:
$ https api.themoviedb.org/3/movie/550
If you run that, you'll get:
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
    "status_code": 7,
    "status_message": "Invalid API key: You must be granted a valid key.",
    "success": false
}
The HTTP status tells us the request was unauthorised, and HTTPie also shows the response headers and the JSON error body. All because we didn't include the api_key value to authorise the request. That was deliberate: it's a good idea to see what error responses look like early on. With this API we get an HTTP status code for the type of error, plus a JSON body with the API's own status_code, an explicit error message and "success": false. The API's own error codes are listed in the documentation.
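If you later script against the API, those two signals are worth checking together. This is a minimal Python sketch, assuming a body shaped like the 401 example above. TMDBError and check_response are hypothetical names, and treating any non-2xx status or an explicit "success": false as a failure is this sketch's own rule.

```python
import json

# The error body returned by the unauthenticated request above.
ERROR_BODY = """{
    "status_code": 7,
    "status_message": "Invalid API key: You must be granted a valid key.",
    "success": false
}"""


class TMDBError(Exception):
    """Hypothetical exception carrying both the HTTP status and TMDB's own code."""

    def __init__(self, http_status, status_code, message):
        super().__init__(f"HTTP {http_status}, TMDB code {status_code}: {message}")
        self.http_status = http_status
        self.status_code = status_code


def check_response(http_status: int, body: str) -> dict:
    """Return the parsed JSON body, or raise TMDBError if the call failed."""
    data = json.loads(body)
    # This sketch's rule: a non-2xx status or an explicit "success": false is a failure.
    if not 200 <= http_status < 300 or data.get("success") is False:
        raise TMDBError(http_status, data.get("status_code"), data.get("status_message", ""))
    return data
```

Calling check_response(401, ERROR_BODY) raises a TMDBError whose status_code is 7, keeping TMDB's own code available alongside the HTTP status.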
Now we can add the required authentication. With HTTPie, you can list headers, data fields and query parameters after the URL as name-value pairs. What happens to each pair depends on the characters that separate the name from the value.
If they are separated with a :, they go into the HTTP request headers.
If they are separated with an =, they go into a JSON body (and, unless you specify a method, the request becomes a POST).
And, the one we are interested in here, if they are separated with ==, they are added to the query string parameters. That's what we want for the api_key.
$ https api.themoviedb.org/3/movie/550 \
api_key==7e23cee5bfb742e781fccc26b9e9009f
HTTP/1.1 200 OK
Content-Type: application/json
{
    "adult": false,
    "backdrop_path": "/rr7E0NoGKxvbkb89eR1GwfoYjpA.jpg",
    "belongs_to_collection": null,
    "budget": 63000000,
    "genres": [
        {
            "id": 18,
            "name": "Drama"
        }
    ],
    "homepage": "http://www.foxmovies.com/movies/fight-club",
    "id": 550,
    "imdb_id": "tt0137523",
    "original_language": "en",
    "original_title": "Fight Club",
    "overview": "A ticking-time-bomb insomniac and a slippery soap salesman channel primal male aggression into a shocking new form of therapy.",
    "popularity": 39.996,
    "poster_path": "/pB8BM7pdSp6B6Ih7QZ4DrQ3PmJK.jpg",
    "production_companies": [
        {
            "id": 711,
            "logo_path": "/tEiIH5QesdheJmDAqQwvtN60727.png",
            "name": "Fox 2000 Pictures",
            "origin_country": "US"
        },
        …
    ],
    "production_countries": [
        {
            "iso_3166_1": "US",
            "name": "United States of America"
        }
    ],
    "release_date": "1999-10-15",
    "revenue": 100853753,
    "runtime": 139,
    "spoken_languages": [
        {
            "english_name": "English",
            "iso_639_1": "en",
            "name": "English"
        }
    ],
    "status": "Released",
    "tagline": "Mischief. Mayhem. Soap.",
    "title": "Fight Club",
    "video": false,
    "vote_average": 8.4,
    "vote_count": 21654
}
Before we move on, a quick pro-tip: never forget that your shell can save you typing. Export the api_key request item as an environment variable like this:
$ export API_KEY="api_key==7e23cee5bfb742e781fccc26b9e9009f"
and then you can just do:
$ https api.themoviedb.org/3/movie/550 $API_KEY
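The same habit carries over to scripts: keep the key in the environment rather than in source code. A minimal Python sketch follows. TMDB_API_KEY and tmdb_url are hypothetical names, and unlike the API_KEY shell variable above, TMDB_API_KEY is assumed to hold only the key itself rather than the whole api_key==… request item.

```python
import os
from urllib.parse import urlencode

API_ROOT = "https://api.themoviedb.org/3"


def tmdb_url(path: str, env_var: str = "TMDB_API_KEY", **params) -> str:
    """Build a TMDB request URL, taking the API key from an environment variable.

    TMDB_API_KEY is a hypothetical variable name holding only the key itself.
    Any extra keyword arguments become additional query parameters.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your TMDB API key first")
    query = urlencode({"api_key": key, **params})
    return f"{API_ROOT}/{path.lstrip('/')}?{query}"
```

With TMDB_API_KEY exported, tmdb_url("movie/550") rebuilds the example URL from the sign-up page; failing early with a clear message beats sending a request that is bound to come back 401.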
Working with responses
There's a lot of JSON in the response we just got and, thanks to HTTPie, it is both formatted and syntax-coloured for easier reading. That formatted and coloured view is the default when HTTPie is outputting to a console. If you want to page through the results, you would usually pipe the output through the more or less commands.
$ https api.themoviedb.org/3/movie/550 $API_KEY | less
But if you do that, you'll see unformatted, colour-less output. That is because the HTTPie default for output to a pipe or anything that isn't a console is to leave the content untouched for other commands to consume.
If you want to force formatted output, add --pretty=format to the command:
$ https --pretty=format api.themoviedb.org/3/movie/550 \
$API_KEY | less
Now you can read the formatted results at your own pace. If you want the colour back too, use --pretty=all together with the -R flag of less, which passes the colour escape sequences through to the terminal:
$ https --pretty=all api.themoviedb.org/3/movie/550 \
$API_KEY | less -R
You'll now get formatted and colourised output you can page through.
There is one other difference between the defaults for console output and redirected output. Headers are output to the console, but skipped when the output goes to another program through a pipe. If you want only the headers output to a pipe, use the -h option.
You can learn more about HTTPie's default options for terminal and redirected output in the Terminal Output section of the documentation. If you want to take direct control of what is output, check out the Output Options. They control which parts of the request/response exchange are output.
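HTTPie's terminal-versus-pipe behaviour boils down to one check: is standard output a terminal? The Python sketch below imitates the formatting half of that decision with sys.stdout.isatty(). render_body is a hypothetical name, colour is left out entirely, and the four-space indent with sorted keys is an approximation of HTTPie's default JSON formatting rather than a guarantee.

```python
import json
import sys
from typing import Optional


def render_body(body: str, pretty: Optional[str] = None, is_tty: Optional[bool] = None) -> str:
    """Decide whether to reformat a JSON body, loosely imitating HTTPie's defaults.

    pretty=None means "auto": format when writing to a terminal, otherwise pass
    the body through untouched. pretty="format" always formats; pretty="none"
    never does. Colour ("all"/"colors") is deliberately left out of this sketch.
    """
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    if pretty is None:
        pretty = "format" if is_tty else "none"
    if pretty == "format":
        # Four-space indent and sorted keys approximate HTTPie's default JSON formatting.
        return json.dumps(json.loads(body), indent=4, sort_keys=True)
    return body


if __name__ == "__main__":
    # Run it directly, then again piped through cat, to see both defaults.
    print(render_body('{"title": "Fight Club", "id": 550}'))
```

Leaving piped output untouched is what makes HTTPie safe to combine with tools like jq: they receive exactly the bytes the server sent.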
Digging into responses
If you're trying to work out the structure and content of an API's JSON responses, you can keep paging through the documentation and the output of less, or you can reach for more precise JSON tools such as jq and jid.
With jq, a command-line JSON processor, you can write expressions to extract and format data from JSON streams. jid is an interactive JSON digger, which lets you explore JSON using some of jq's expressions plus auto-completion. For example, if we run:
$ https api.themoviedb.org/3/movie/550 $API_KEY | jid
Up comes a view of the JSON, formatted and colourised. Control-N and Control-P let you scroll down and up through the returned data, but that's just the start.
Say, for example, we want to find out which country the film was produced in. Start typing a field name such as pro, and you'll see jid offer production_co as a potential autocompletion. Tap Tab and you'll be able to alternate between production_companies and production_countries.
Hit return and jid displays the contents of the production_companies array. Tap Tab again and you'll be able to enter an array index, such as 1, then Tab again to display just that one entry from the array. Type . and you can auto-complete through that entry's field names. jid is great for locating data in nested structures, and the queries you build in it also work with jq on the command line.
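Because jid paths are jq paths, they are also easy to follow programmatically. The following Python sketch walks a simple path such as .production_companies[0].name through a parsed response. dig is a hypothetical helper that handles only object keys and array indexes, and the sample is trimmed from the Fight Club response above.

```python
import json
import re

# A trimmed copy of the Fight Club response shown earlier.
MOVIE = json.loads("""{
    "id": 550,
    "production_companies": [
        {"id": 711, "name": "Fox 2000 Pictures", "origin_country": "US"}
    ],
    "production_countries": [
        {"iso_3166_1": "US", "name": "United States of America"}
    ]
}""")

# One path segment: either ".field_name" or "[index]".
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def dig(doc, path: str):
    """Follow a simple jq-style path like ".production_companies[0].name".

    Only object keys and non-negative array indexes are supported. Where jq
    would print null for a missing key or index, this raises KeyError or
    IndexError instead.
    """
    if path == ".":
        return doc  # jq's identity filter
    value, pos = doc, 0
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = match.groups()
        value = value[key] if key is not None else value[int(index)]
        pos = match.end()
    return value
```

dig(MOVIE, ".production_countries[0].iso_3166_1") answers the "which country" question from the jid session with the same path you would type there.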
Now with these two tools to hand, you're ready to explore more of the MovieDB API.
Finding more about a movie
Every movie has its own endpoint under the /movie/ path, located by its id, and each movie entry also has subsections for release dates, keywords, ratings, credits and more, each with a path that comes after the id. So if you want the credits for a movie, you'd call:
$ https api.themoviedb.org/3/movie/550/credits $API_KEY
That's a whole request for what may be only a small snippet of data. That's where the MovieDB API's append_to_response parameter comes in. On particular endpoints, such as /movie, it lets you pull the various subsections into the main response. So, if we want a movie's credits and images embedded in our response, we do this:
$ https api.themoviedb.org/3/movie/550 \
$API_KEY \
append_to_response==credits,images
It only costs us one request and merges all the data together into one JSON object.
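In a script, append_to_response is just one more query parameter holding a comma-separated list. A minimal Python sketch of building that URL follows. movie_url is a hypothetical helper, and the api_key is passed in explicitly so the example stays self-contained.

```python
from typing import Iterable
from urllib.parse import urlencode

API_ROOT = "https://api.themoviedb.org/3"


def movie_url(movie_id: int, api_key: str, append: Iterable[str] = ()) -> str:
    """Build a /movie/{id} URL, optionally embedding sub-resources.

    Sub-resources such as "credits" and "images" are joined with commas into
    the append_to_response parameter, matching the HTTPie command above.
    """
    params = {"api_key": api_key}
    extras = list(append)
    if extras:
        params["append_to_response"] = ",".join(extras)
    # safe="," keeps the commas readable; an encoded %2C decodes to the same comma.
    return f"{API_ROOT}/movie/{movie_id}?{urlencode(params, safe=',')}"
```

movie_url(550, key, ["credits", "images"]) produces the single request from the HTTPie example instead of three separate ones.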
Finding a movie
What about finding a movie? For that, there's a search endpoint, /search/movie. It takes a query parameter which "must be URI-encoded". If your query value contains spaces, put quotes around it so the shell passes it as a single argument, and HTTPie will take care of URL-encoding it for you. No hand-encoding spaces as %20 or similar. All we need to do is remember to put quotes around our value like so:
$ https api.themoviedb.org/3/search/movie \
$API_KEY \
query=="O.C. and Stiggs"
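For comparison, Python's standard library shows the two common ways a space ends up encoded in a query string. This sketch only illustrates the encodings; encode_query is a hypothetical helper, and it does not claim to reproduce HTTPie's exact output.

```python
from urllib.parse import quote, quote_plus, urlencode


def encode_query(term: str, spaces_as_plus: bool = True) -> str:
    """Encode a search term as a query=... string.

    quote_plus (urlencode's default) turns spaces into "+"; quote turns them
    into "%20". Both are common encodings of a space in a query string.
    """
    quote_via = quote_plus if spaces_as_plus else quote
    return urlencode({"query": term}, quote_via=quote_via)


if __name__ == "__main__":
    print(encode_query("O.C. and Stiggs"))                        # query=O.C.+and+Stiggs
    print(encode_query("O.C. and Stiggs", spaces_as_plus=False))  # query=O.C.%20and%20Stiggs
```

Either way, the point stands: the client does the encoding, so you only have to get the quoting right for your shell.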
That'll return JSON data designed for paginated search results. If we wanted to look up more about the movie, we'd want the id from the first result. For that we can pipe the results to jq '.results[0].id' and we'll get the movie id. From there, it's a call to /movie/{id} to get the movie details:
$ MOVIE_ID=$(
  https api.themoviedb.org/3/search/movie \
    $API_KEY \
    query=='O.C. and Stiggs' \
  | jq '.results[0].id'
)
$ https api.themoviedb.org/3/movie/$MOVIE_ID $API_KEY
Or, if you want to pack it all into a single command:
$ https $(
https api.themoviedb.org/3/search/movie \
$API_KEY query=='O.C. and Stiggs' \
| jq -r '.results[0] | "api.themoviedb.org/3/movie/\(.id)"'
) $API_KEY
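The jq step in those pipelines has one sharp edge: if the search matches nothing, .results[0] is null, so the MOVIE_ID version stores null and the one-liner builds a …/movie/null URL. Below is a Python sketch of the same extraction that fails loudly instead. first_result_url is a hypothetical name, and the sample id 12345 and its title are placeholders, not real TMDB data.

```python
import json

# Hypothetical, trimmed search response: the id and title are placeholders,
# not real TMDB data for "O.C. and Stiggs".
SAMPLE_SEARCH = json.loads("""{
    "page": 1,
    "results": [{"id": 12345, "title": "Example Movie"}],
    "total_pages": 1,
    "total_results": 1
}""")


def first_result_url(search: dict) -> str:
    """Return the /movie/{id} URL for the first search result.

    Mirrors the jq one-liner above, but raises LookupError on an empty
    result list instead of producing a .../movie/null URL.
    """
    results = search.get("results") or []
    if not results:
        raise LookupError("search returned no results")
    return f"api.themoviedb.org/3/movie/{results[0]['id']}"
```

first_result_url(SAMPLE_SEARCH) returns api.themoviedb.org/3/movie/12345, while an empty search raises LookupError before any second request is sent.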
Discovering movies
One of the richest API endpoints in The Movie DB API is /discover/movie. It lets you specify over thirty different properties for searching across films.
So say you want the most popular science fiction movies on The Movie Database. Genres are represented in the API by an integer id, so we need to look up the id for "Science Fiction". The full genre list is available on another endpoint, /genre/movie/list:
$ https api.themoviedb.org/3/genre/movie/list $API_KEY
HTTP/1.1 200 OK
Content-Type: application/json
{
    "genres": [
        {
            "id": 28,
            "name": "Action"
        },
        {
            "id": 12,
            "name": "Adventure"
        },
        {
            "id": 16,
            "name": "Animation"
        },
        […]
You can work your way through the list manually or use jq to find the entry we are interested in:
$ https api.themoviedb.org/3/genre/movie/list $API_KEY \
| jq '.genres[] | select(.name=="Science Fiction")'
{
    "id": 878,
    "name": "Science Fiction"
}
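If you look genres up often, it can be worth fetching the list once and indexing it by name. A minimal Python sketch follows, using entries copied from the responses above; genre_id and genre_index are hypothetical helpers.

```python
import json

# Entries copied from the /genre/movie/list responses shown above.
GENRES = json.loads("""{
    "genres": [
        {"id": 28, "name": "Action"},
        {"id": 12, "name": "Adventure"},
        {"id": 16, "name": "Animation"},
        {"id": 878, "name": "Science Fiction"}
    ]
}""")


def genre_id(genre_list: dict, name: str) -> int:
    """Return the id of the genre called name, like the jq select() filter above."""
    for genre in genre_list["genres"]:
        if genre["name"] == name:
            return genre["id"]
    raise KeyError(name)


def genre_index(genre_list: dict) -> dict:
    """Map every genre name to its id, so repeated lookups are a dict access."""
    return {genre["name"]: genre["id"] for genre in genre_list["genres"]}
```

genre_id(GENRES, "Science Fiction") gives 878, the same answer as the jq query; genre_index is the better fit when a script needs several genres.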
So that tells us the genre id we are looking for is 878. Back at the /discover/movie endpoint, there's a property called with_genres, which we'll set to 878. Another property available on the endpoint is sort_by, which takes a field name and a sort order. We'll set that to popularity.desc:
$ https api.themoviedb.org/3/discover/movie \
    with_genres==878 \
    sort_by==popularity.desc \
    $API_KEY
HTTP/1.1 200 OK
Content-Type: application/json
{
    "page": 1,
    "results": [
        {
            "adult": false,
            "backdrop_path": "/9yBVqNruk6Ykrwc32qrK2TIE5xw.jpg",
            "genre_ids": [
                28,
                14,
                12,
                878
            ],
            "id": 460465,
            "original_language": "en",
            "original_title": "Mortal Kombat",
            "overview": "Washed-up MMA fighter Cole Young, unaware of his heritage, and hunted by Emperor Shang Tsung's best warrior, Sub-Zero, seeks out and trains with Earth's greatest champions as he prepares to stand against the enemies of Outworld in a high stakes battle for the universe.",
            "popularity": 5817.001,
            "poster_path": "/xGuOF1T3WmPsAcQEQJfnG7Ud9f8.jpg",
            "release_date": "2021-04-07",
            "title": "Mortal Kombat",
            "video": false,
            "vote_average": 7.7,
            "vote_count": 2266
        },
        …
    ],
    "total_pages": 500,
    "total_results": 10000
}
The results are paginated: this is page 1, with 20 results per page. To compress this down to a simple list, we can use jq again:
$ https api.themoviedb.org/3/discover/movie \
    with_genres==878 \
    sort_by==popularity.desc \
    $API_KEY \
  | jq -r '.results[] | "\(.title) \(.popularity)"'
Mortal Kombat 5817.001
Godzilla vs. Kong 3608.866
Tom Clancy's Without Remorse 4266.181
Nobody 2993.014
Vanquish 3156.355
Zack Snyder's Justice League 1992.158
…
That snippet of jq extracts and formats the title and popularity of each item in the results array. The -r (--raw-output) option makes jq print the resulting strings as-is rather than wrapped in JSON quotes. Popularity scores change over time, so the titles and numbers you get will differ from these.
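The whole discovery query can be sketched the same way in Python: build the /discover/movie URL from with_genres, sort_by and a page number, then format each result like the jq filter above. discover_url and title_lines are hypothetical helpers. The page parameter and the joining of several genre ids with commas are assumptions here (the text above only uses a single id), and the sample page is trimmed from the response shown earlier.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.themoviedb.org/3"

# A trimmed copy of page 1 of the discover response shown above.
SAMPLE_PAGE = {
    "page": 1,
    "results": [{"title": "Mortal Kombat", "popularity": 5817.001}],
    "total_pages": 500,
}


def discover_url(api_key: str, genre_ids, sort_by: str = "popularity.desc", page: int = 1) -> str:
    """Build a /discover/movie URL filtered by genre and sorted as requested."""
    params = {
        "api_key": api_key,
        # Assumption: several ids are joined with commas; the text only uses one.
        "with_genres": ",".join(str(g) for g in genre_ids),
        "sort_by": sort_by,
        "page": page,  # assumed page parameter for walking through results
    }
    return f"{API_ROOT}/discover/movie?{urlencode(params, safe=',')}"


def title_lines(page_json: dict) -> list:
    """One "title popularity" line per result, like the jq -r filter above."""
    return [f"{movie['title']} {movie['popularity']}" for movie in page_json["results"]]
```

title_lines(SAMPLE_PAGE) returns ["Mortal Kombat 5817.001"], matching the first line of the jq output; bumping page walks through the rest of the paginated results.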
Wrapping up
Creating queries against The MovieDB API is quick thanks to HTTPie, and with jq and jid you can make sense of the results with minimal fuss.
In the next part of this dive, we’ll look at how TMDB’s API now handles user-generated lists and how HTTPie makes it simpler to explore.