Querying Pull Request Data from Github

Let's see how we can use MergeStat and Arctype to monitor contributions to our projects.

7 months ago   •   8 min read

By Bonnie
Learn how to use MergeStat, Arctype, and SQL to study your Github repositories
Table of contents

To make sure that bugs do not reach the end-user of your product, you need to do code reviews. In code review, metrics of pull requests matter a lot because they provide data on how well you are shipping. Software developers can use pull requests metrics to understand the team dynamics and act appropriately to correct behaviors before things get out of hand.

What is code review?

Code review is when a peer or a senior developer examines a developer’s code or a pull request. Code reviews help developers discover common bugs faster and reduce the work required to optimize code later.

Some of the benefits of code review include:

  1. Consistency in design and implementation.
  2. Optimizing code for better performance.
  3. Saving time and money.
  4. Monitoring project quality and requirements.
  5. Streamlining the development process.
  6. Collaboration and sharing of new techniques.

This article will help you learn how to use MergeStat to get pull request data from a GitHub repo.

We will start by fetching data from a Github repo using MergeStat. Then we will connect to SQLite using Arctype. Finally, we will explore the pull requests data by running some SQL queries on Arctype.

Arctype is a database management and visualization tool. Using Arctype, you will learn how to execute or run SQL queries to come up with metrics such as:

  • How many unanswered issues does your project have?
  • How many pull requests were merged in the last month?
  • How many unapproved merged pull requests have we had?

Before querying Github repo data in Arctype, we will get the repo data using the MergeStat web app by querying Github API. Let's get started.

Getting Github repo data by querying the Github API using MergeStat

In this section, you will learn how to use the MergeStat web app to query a Github API and get all the pull requests in a Github repository. To query the Github API, you must provide an authentication token. You can create a personal access token by following these instructions.

Before we proceed, there is something you need to know. All the API requests to a Github API are rate limited. The SQL query we will be running uses the Github GraphQL API (v4), which rate limits based on the “complexity” of GraphQL queries. The more fields/relations in your GraphQL, the higher the “cost” of a single API request, and the faster you may reach the rate limit.

Unfortunately, it is hard to know ahead of time what an excellent client-side rate limit is. By default, each SQL query will fetch 100 items per page and permit 2 API requests per second.

With that in mind, navigate to the MergeStat web app.

Screenshot of Mergestat
The MergeStat web app

To feel the magic of the MergeStat web app, click the blue "Execute" button at the bottom on the right side of the screenshot above. After the execution finishes running, scroll down a bit, and you should see the results below.

Screenshot of query results
Results of the query

Let's now run a SQL query that returns all the React Codebase repository pull requests on Github.  On the MergeStat web app, click the ‘Query Settings’ button at the bottom on the left side of this screenshot.

Screenshot of mergestat
Query settings

After clicking the ‘Query Settings’, make sure you are logged in to the MergeStat web app to add the Github repo you want to query. Follow these instructions to get GitHub Access Token and paste it in its field as shown below. Enter a "Github Per Page" limit of 10 to avoid running to errors when the query is executing. You can leave the Github Rate Limit field blank. Don’t forget to save your inputs.

Screenshot of mergestat
Connecting to the repository

After saving your inputs, execute the following SQL query on the MergeStat web app.

SELECT * FROM github_repo_prs('facebook/react');
Getting the first batch of data

You should get results once the execution is done, as shown below.

Screenshot of Mergestat
Successful query

Let us now repeat what we did above, but in this case, after installing the MergeStat app on our computers. We will then use the installed MergeStat app to query a Github API and get all the pull request data in React Codebase by Facebook. After that, we will use Arctype to visualize the pull request data using charts.

How to install the MergeStat desktop app

One of the easiest ways to install MergeStat on your system is by using Homebrew. First, make sure Homebrew is installed in your system, and then run the following commands on your command line.

brew tap mergestat/mergestat
brew install mergestat
Installing MergeStat

After MergeStat is installed, you must add your Github token and Github per page limit as environment variables. If you are on Linux, you can follow these instructions on adding environment variables.

The environment variables should look as shown below.

GITHUB_TOKEN=’your_github_token’
GITHUB_PER_PAGE=10 
Setting environment variables

MergeStat will look for a GITHUB_TOKEN environment variable when executing and use it for authentication. Additionally,GITHUB_PER_PAGE expects an integer between 1 and 100 to set how many items are fetched per page in API calls that paginate results.

Let us now query the Github API using the MergeStat app to get the pull request data of the React codebase by Facebook.

On your command line, run the command shown below.

mergestat export pr-file.db -e github_repo_prs \
	-e "SELECT * FROM github_repo_prs('facebook/react')"
Getting the data for the React codebase

The command above will create a file called pr-file.db in your current directory.

Exporting the React codebase pull requests data from Github

To query the React codebase pull requests data, we will use Arctype. You can install the Arctype app on your system here.

Once you have Arctype installed, you will be redirected to a page to add your database credentials. In this case, select SQLite as your connection.

Screenshot of Arctype
Choose SQLite for your connection with Arctype

Once you have selected the SQLite database as your connection, you will be redirected to a page to name your SQLite connection and add an SQLite file to Arctype.

Screenshot of Arctype
Connecting to SQLite in Arctype

The SQLite file that you need to add is the pr-file.db file, which was downloaded to your directory when you ran the command below on your command line.

mergestat export pr-file.db -e github_repo_prs \
	-e "SELECT * FROM github_repo_prs('facebook/react')"
We need the file that we generated when we ran this command.

Click the "Choose SQLite File" button and navigate to the folder that has the pr-file.db file. In my case, the file is in a folder called mysql-server-main, as shown below.

Screnshot
Finding the pr-file.db file

Once you have selected the file, click "Open" to add the file to Arctype. Then test the connection, and don’t forget to click the save button. If everything is okay, you will be redirected to your workspace. On your workspace, click the table to explore it.

Screenshot of Arctype
Select the added table

Once you click on the table name, you should see a table containing all pull requests data from the React codebase repo on Github.

Screenshot of Arctype
Seeing everyone who contributed to the React codebase

Let us now run some queries in Arctype to check out some different metrics from the data.

Identifying unanswered issues and pull requests

Unanswered issues and pull requests on Github repos are no fun if you are the one asking for help. However, identifying unanswered issues can also be valuable to maintainers to ensure that they are addressing user concerns.

In this step, we will run a query on Arctype to identify stale issues or pull requests in React Codebase repository on Github. In Arctype, run the following SQL query that will return the oldest 25 issues created more than 30 days ago, with no comments, and remain open.

SELECT
   title, author_login, comment_count, created_at, url
FROM github_repo_prs
WHERE
   created_at < date('now', '-30 days')
   AND (merged = 0 OR closed = 0)
   AND comment_count = 0
ORDER BY created_at ASC
LIMIT 25
Finding old and unanswered issues

Once the SQL query is done running, you should see a table like the one shown below, showing the oldest unanswered pull requests. "Add explanation if rendered a nonvalid component" was created in 2013 and never received any comments. It can probably be removed!

Screenshot of Arctype
Table of query results in Arctype

Finding unapproved merged pull requests

Allowing repository administrators to merge pull requests without a review can be a helpful shortcut for “emergency” hotfixes that avoid a potentially time-consuming code review cycle. However, too many unapproved merges are probably a sign of something wrong.

In this step, we will run a SQL query on Arctype that will list all unapproved pull requests merged into the main branch of React Codebase repository on Github.

SELECT
   number,
   title,
   date(created_at) AS created_at,
   date(merged_at) AS merged_at,
   author_login,
   review_decision,
   merged
FROM github_repo_prs -- set to your own repo
WHERE
   merged = true
   AND review_decision <> 'APPROVED'
   AND base_ref_name = 'main' -- set to your own branch
ORDER BY created_at DESC
Finding unapproved merged pull requests

Once the SQL query is done running, you should have a table as shown below. These issues both seem like behind-the-scenes internal stuff. To be sure, we can pop the full text of the title open with cell inspector.

Screenshot of Arctype
The result in Arctype

Pull requests merged in the past 30 days

When a project has more than a handful of regular contributors, it can be challenging to see all the changes.

In this step, we will run a query on Arctype that will show all pull requests that have been merged in the past 30 days on React codebase repository on Github.

SELECT
  base_repository_name,
  title,
  number,
  url,
  author_login,
  created_at,
  merged_at
FROM github_repo_prs
WHERE
   merged_at > date('now', '-30 days') -- replace with the time period you care about
   AND merged = true
Finding pull requests merged in the last 30 days

Once the SQL query is done running, you should have a table as shown below. Notice the 62 rows count at the very bottom. This is a very active repo, averaging 2 merges a day!

Screenshot of Arctype
Results of the query in Arctype

Conclusion

Every time someone writes code and wants to merge it into the main branch, they should request that another developer take a look and offer feedback. A code review helps developers to maintain consistency between design and implementation styles across man team members.

However, as a code reviewer, you might miss some pull requests. Using tools like Mergestat and Arctype can be a great way to review pull requests in depth and take action on your findings.

JOIN the Arctype Newsletter
Programming stories, tutorials, and database tips every 2 weeks
Follow Arctype's Development
Programming stories, tutorials, and database tips every 2 weeks

Spread the word

Keep reading