This blog post inspired the creators of Left on Read — a free and open source iMessage analyzer that uses SQLite under the hood!
SQLite is an often overlooked flavor of SQL engines. Some have suggested it is the most prolific SQL engine in existence due to its highly flexible nature and ability to run on almost any platform with limited resources. Unlike other SQL engines like MySQL, PostgreSQL, MSSQL, or Oracle, SQLite runs without a server. SQLite does not rely on a data directory or a constantly running daemon: a database is encapsulated in a single file.
SQLite and iMessage
iMessage is one of the most popular messaging platforms today, largely because it is built into iOS and Mac devices. Since its release, it has evolved significantly. But, at its core, it is simply an instant messaging platform. iMessage uses SQLite in the background to store relational data about messages, conversations, and their participants.
As a long-time Apple user, I have backed up and transferred my iPhone data since my first time using an iPhone, which was November 10, 2009. Because I have been digitally hoarding my text data for so long, my iMessage database is nearly 1GB in size.
Until a few years ago, the built-in search feature for iMessage was very limited and buggy. Although it has recently improved significantly, it is, like nearly any end-user tool, very limited in how you can query it. Those of us who frequently work with data that is trapped behind a limited front-end often wish we could get direct access to the SQL database. Fortunately, the iMessage database is not inaccessible - in fact, it is very easy to access.
Finding the iMessage SQL Database
On Your Mac
If you have iMessage enabled on your Mac as well as your iPhone, you have 2 different databases from which to choose. The database on your Mac is very easy to find, as it is simply under
~/Library/Messages/chat.db. If you do not use your Mac for iMessage, or, as in my case, your Mac iMessages do not go as far back, you can extract your iPhone's database by performing a backup to your Mac.
On Your iPhone
Follow these instructions to extract your iPhone's iMessage database:
- Open Finder and select your iPhone under "Locations".
- Find the "Backups" section and select "Back up all of the data on your iPhone to this Mac", then press Back Up Now to immediately create a new backup. This process may take a while.
- Once it is complete, you will find the SQLite file under
/Users/[username]/Library/Application Support/MobileSync/Backup/[backup name]/3d/3d0d7e5fb2ce288813306e4d4636395e047a3d28.
- If you plan to open this database with Arctype, you'll want to copy and rename the file with a
.dbextension to indicate that it is a SQLite file.
Getting Started with SQLite
Unlike most SQL servers, you do not need a connection string, host, or username to connect to a SQLite database. All you need to do is point your SQL client to the database file. Arctype makes it simple and convenient to load in SQLite databases within the same workspace as your other connections.
- Under the Connections dropdown, select "Add new data source"
- Select "SQLite"
- Find the SQLite database file. The file must have a .sqlite3 or .db extension for Arctype to open it.
More detailed instructions can be found in the Arctype Docs.
With Command Line
From a UNIX terminal, type
One of my favorite parts about Arctype is how easy it is to analyze database schema. I'm a long-time user of command-line tools and old-school editors, but sometimes having a more visually interactive tool is a lifesaver. Let's dig into the schema Apple has created for iMessage. Today we will focus on the
handle tables, as well as a few join tables to connect related records.
Note that I have created a custom view called
handle2 which adds a field
id2 that obfuscates the phone numbers and email addresses of my personal contacts, and you will see this view referenced in the examples in this article.
Digging Into iMessage
Let's write some queries and makes some observations that would not be possible without direct SQL access.
Pique Your Nostalgia with Old Messages
To get started, let's begin with a simple query to view your first 50 messages. If you have chat threads that go back years and years, there is no easy way to access early messages from your iPhone or Mac.
The interface on both platforms requires you to scroll back by about 25 messages at a time. This is prohibitively time-consuming and can result in a crash or reset if the user sends you a new message while you're scrolled back.
Fortunately, we have custom SQL to save us:
handle.id represents the readable identifier for the user. It will be either a phone number or an email address.
Rate Your Friendships with SQL
Let's use SQL to find out who our best friends are. Assuming you view the quality of friendships as a function of the quantity of sent text messages, this should be very accurate!
First, let's divide the number of messages that are
from_me by those that are not to produce a reply ratio. This query shows the top 10 people we have been messaging by the total amount of messages, as well as the reply ratio.
Multiplication by 1.0 casts to the
REAL data type to avoid integer division, which would result in 1 or 0 instead of a decimal. You can use the link here to see the rules for integer division in SQLite.
One issue with this analysis is that fewer sent messages does not necessarily imply fewer words sent. Let's add some more fields to get a better insight.
Here we can see the total amount of characters sent and received, the average length of text message sent and received, the total ratio of characters sent and received, and the reply ratio. In my case, people from whom I tend to receive more messages also send longer messages than me.
This query makes heavy use of aggregate filters. Aggregate filters allow you to use an aggregate function on only a part of the data by specifying a
WHERE clause to filter out unwanted records.
Examining iMessage Reactions
There are 2 newer iMessage features whose implementations, in the context of their schema design, are interesting to look into. Recently an announcement was made that Android phones will be able to show iMessage "reactions" properly. Historically, if you send an iMessage reaction to a non-Apple device, it will show up as a textual addition instead of an icon.
With the announcement of the new compatibility with Android devices, I was curious to learn how the current implementation of the feature works.
SELECTed a few records with and without a reaction and compared the results. I discovered that the
associated_message_type column was usually set to 0, but in messages with a reaction, it was an integer value between 2000-2005. I also noticed that
associated_message_guid was present. Apple appears to be using 2000-2005 for its 5 reaction types, 3000-3005 for when a user removed a reaction, and 3 for an Apple Pay request.
From this investigation, it appears that reactions are sent as iMessages with the reaction's textual equivalent appended and a foreign key relation to the parent message. This allows the messages to seamlessly be sent and received by non-Apple devices.
If the message is sent over SMS, the metadata linking the reaction to the message it references is simply lost. If the device is iMessage capable, Apple devices will ignore the
text part of the message, find the associated message and add the proper reaction as a visual overlay.
Note that the
message table includes both a
ROWID and a
ROWID is a typical auto-increment integer
id field, which is useful for joining on within the local database. However, the auto-incremented primary key will not be the same for the same message across devices. The
GUID is globally unique, generated by the author of the message, and sent to all of its recipients. This allows foreign key references across different databases, devices, and users. For more information about the utility of GUIDs, check out this article.
Find Your Most Popular Group Chats
Group chats are stored in the
chat table. Join tables
chat_message_join are used to associate users and messages, respectively, with group chats. Here's a query to find out most used group chats (chat with > 1 members) and the identities of their participants.
group_concat function, which is familiar from MySQL by the same name and familiar to PostgreSQL users as
string_agg, is an aggregate function that concatenates strings together. See more on how it can be used within SQLite here.
HAVING clause is similar to a
WHERE clause but operates on aggregate functions. If you've wanted to write a query conditional on an aggregate but are not able to inside of your
HAVING is there for you.
SQLite is a powerful tool whose prolific reach across devices and numerous use cases make it one of the most impressive software projects around. If you're curious about what's behind the scenes, SQLite's source code is well known to be well-organized and fun (well, to some of us) to peek into.
iMessage is just one of many pieces of software that rely on SQLite and are used by millions of end-users. You can try out a SQL client like Arctype for free and start exploring the databases that power the tools you use daily!