= """
query select actor_id, count(film_id) as num from film_actor group by film_id order by num limit 5
"""
run_sql(query, engine)
[(31, 1), (77, 1), (70, 1), (198, 1), (65, 1)]
Fundamentals of Data Science
There are a few additional tricks to know about.
sakila
database that appear in only one film. Remember that the relationship between films and actors is captured in the film_actor
table.First, we can make a table showing, for each actor_id, how many films that actor appears in.
= """
query select actor_id, count(film_id) as num from film_actor group by film_id order by num limit 5
"""
run_sql(query, engine)
[(31, 1), (77, 1), (70, 1), (198, 1), (65, 1)]
To pick out the ones where num
is one, we use the having
clause. (Notice that we include a limit clause to reduce the amount of output).
= """
query select actor_id, count(film_id) as num from film_actor group by film_id having count(film_id)=1 limit 5
"""
run_sql(query, engine)
[(70, 1), (65, 1), (77, 1), (198, 1), (31, 1)]
Exercise: How can you fill out the actor names and film names from this information?
Suppose you want to find the actors that appear in the most films, it’s a bit trickier One way is to first find the largest number of films, and then join with the table of film counts to pick out those actors having that number of films.
The with
clause allows you to nest queries like this.
= """
query with filmcounts as
(select actor_id, count(film_id) as num from film_actor group by film_id),
maxfilms as
(select max(num) as maxnum from filmcounts)
select actor_id, num from filmcounts join maxfilms on num=maxnum
"""
run_sql(query, engine)
[(28, 15)]
Excercise: Extend this to include the actor names for these most prolific actors. .
If you want the actors with the second largest number of films, you can try this.
= """
query with filmcounts as
(select actor_id, count(film_id) as num from film_actor group by film_id),
maxfilms as
(select max(num) as maxnum from filmcounts where num < (select max(num) from filmcounts))
select actor_id, num from filmcounts join maxfilms on num=maxnum
"""
run_sql(query, engine)
[(3, 13), (5, 13), (31, 13), (2, 13), (17, 13), (4, 13)]
If you want, say, the top 5, then you need other techniques.