How many people are currently using Bitcoin? How fast is the Bitcoin network growing relative to other revolutionary technologies and top tech companies? Is Bitcoin a bubble? Is BTC overvalued or undervalued? Is Bitcoin dead? Or will it someday take over entire industries? Does Bitcoin have the potential to someday become integral to global finance, commerce, and society? If so, when?
What follows is a multi-part, comprehensive, statistical analysis of many relevant global technology trends, blockchain metrics, market penetration estimates, top tech companies' and BTC trading prices—all aimed at answering the above questions, which I believe are foremost on the minds of many people interested in the modern crypto-phenomenon that is Bitcoin.
* * *
Protocol and network trends
In part 1 on historical technology trends, we looked at how new technologies are adopted. In this section, we'll take a closer look at how Bitcoin adoption has proceeded thus far, as recorded in various blockchain metrics.
Besides satisfying my own curiosity, I wanted to conduct this statistical analysis of the Bitcoin protocol, network, and trading price because there seem to be surprisingly few published, thorough analyses of them. In the interest of full transparency, and so that other nerds may hopefully expand on this work, I’ve provided all my scraping, processing, analyzing, fitting, and organizing Python scripts and Matlab functions, along with resulting data and figures, in this GitHub repository.
Statistical and analytical techniques
The data sources used in my figures (Fig. 2, 4-15) are as follows:
- Blockchain metrics: oxt.me
- Bitstamp Bitcoin trading prices: bitcoincharts.com
- World development indicators: data.worldbank.org
- Tech companies trading prices: finance.yahoo.com
I chose to use the blockchain data from oxt.me because, short of running a full node and compiling the various metrics manually, it provides one of the most comprehensive, organized, and accessible sources publicly available for a wide range of blockchain metrics. Shout out to Laurent at oxt.me and Samourai Wallet for providing me beta access, fixing relevant data formatting bugs, and answering my specific queries regarding the database. I chose to use Bitstamp trading data mainly because it’s one of the longest running sources of reliable trading data. There is earlier trading data from Mt. Gox, starting July 2010, during which time the price basically moved from $0.10 to $10 USD/BTC. But I've not included this data in my analysis here, both because the relatively small price move during that time is largely extraneous for our purposes and because Mt. Gox trading has been shown to have been heavily manipulated by trading bots.
Equation 1, 'exp1': y = a * exp( b * t )
Because Bitcoin is a network, in the early stage of its initial expansion, most blockchain metrics, along with its trading price, are clearly following exponential growth curves. I've used the exponential Equation 1 to model the various growth curves because it's capable of fitting all the relevant exponential plots and is convenient for transposing data to logarithmic scales. In addition, Equation 1 has the following useful properties, specific for our purposes:
- time t units are normally measured in seconds and recorded as Unix timestamps
- i.e. seconds since the Unix epoch, Jan 1, 1970
- t units can easily be converted into years by multiplying by 1/(60s/m * 60m/h * 24h/d * 365d/y) = 1/(3.1536e+07s/y) = 3.171e-08y/s
- coefficient a = y(0) = y-intercept at t0 (t = 0)
- i.e. at time t0, y is equal to a
- function is stretched vertically by the factor a
- a price units: [USD/BTC]
- coefficient b = rate of growth
- when time t units are measured in seconds, growth rate b units are 1/second
- growth rate at any instant equals b times value of y at that instant
- i.e. b is the continuous (fractional) growth rate per unit of time: y grows by a factor of exp(b) per unit of t, which is roughly 100 * b % when b is small
- function is squeezed horizontally by the factor b
- b rate units: [1/second]
- slope of curve is everywhere equal to b * y, and the curve goes through the point (t = 0, y = a)
- e.g. on May 22, 2010 (t = 1.274e+09s), Laszlo Hanyecz traded 10,000BTC for 2 pizzas worth ~$41
- thus, y(1.274e+09) = 0.0041USD/BTC, and the slope of the curve at t = 1.274e+09 is b * 0.0041USD/BTC per second
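The properties of Equation 1 listed above can be checked with a few lines of Python. This is only a sketch: the values of a and b below are hypothetical placeholders chosen for illustration (b is picked so that y grows 50% per year), not fitted coefficients from Table 1, and the helper names are my own.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 3.1536e+07 s/y, as in the text


def exp1(t, a, b):
    """Equation 1: y = a * exp(b * t), with t in seconds and b in 1/second."""
    return a * math.exp(b * t)


def seconds_to_years(t_seconds):
    """Convert a duration or Unix timestamp from seconds to years."""
    return t_seconds / SECONDS_PER_YEAR


def annual_growth_percent(b_per_second):
    """Percent increase of y over one year for a continuous rate b [1/s]."""
    return (math.exp(b_per_second * SECONDS_PER_YEAR) - 1.0) * 100.0


# Hypothetical coefficients, for illustration only (not fitted values).
a = 1.0
b = math.log(1.5) / SECONDS_PER_YEAR  # chosen so that y grows 50% per year

t_pizza = 1.274e9  # May 22, 2010 as a Unix timestamp, from the text
y_pizza = exp1(t_pizza, a, b)
slope_pizza = b * y_pizza  # instantaneous growth rate is b times y

print(f"t = {seconds_to_years(t_pizza):.1f} years after t = 0")
print(f"growth = {annual_growth_percent(b):.1f} % per year")
```

Note that the per-second rate b is tiny, which is why it is usually easier to read after converting it to an annual percentage.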
Equation 2, 'poly1': ln(y) = p1 * t + p2
When the data is converted to the logarithmic scale by taking the natural log ln() of each data point, the exponential Equation 1 curve plots as a straight line on the transposed ln(y) axis and is modeled by the simple linear Equation 2. For our purposes, Equation 2 has the following properties:
- coefficient p1 = slope of ln(y) at time t
- p1 rate units: [1/second]
- coefficient p2 = ln(y)-intercept at t0 (t = 0)
- p2 price units: [ln(USD/BTC)]
In the associated Matlab functions, Equations 1 and 2 are respectively notated as 'exp1' and 'poly1', which are also the names of the corresponding Matlab best-fit functions. For our purposes, the fitting functions have the following properties:
- 'exp1' fit is biased to larger values, and thus to the more recent end of the curves
- i.e. more representative of most recent history, because more proximal values are exponentially larger than early values
- thus, 'exp1' fit is a good indicator for analysis of shorter term future trends
- 'poly1' fit is unbiased across all time periods, because it's fitted to the natural logarithm of values ln(y)
- i.e. logarithmic scale squashes data down to roughly the same order-of-magnitude
- thus, 'poly1' fit provides a good indicator for analysis of longer term future trends
- the coefficient pairs for the above fits are related:
- y = a * exp(b * t) --> ln(y) = ln(a * exp(b * t)) = b * t + ln(a)
- thus: p1 = b, p2 = ln(a), and a = exp(p2)
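The relationship between the two sets of coefficients can be illustrated with a small, dependency-free Python sketch. It is not the Matlab code from the repository: it performs an ordinary least-squares line fit to ln(y) (the idea behind 'poly1') on synthetic, noise-free data and converts the result back to Equation 1 coefficients. The data, the true_a and true_b values, and the function names are all assumptions made for illustration.

```python
import math


def fit_poly1(t, y):
    """Least-squares straight line through (t, ln(y)); returns (p1, p2)."""
    ln_y = [math.log(v) for v in y]
    n = len(t)
    t_mean = sum(t) / n
    ln_mean = sum(ln_y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (li - ln_mean) for ti, li in zip(t, ln_y))
    p1 = sxy / sxx                 # slope of ln(y)
    p2 = ln_mean - p1 * t_mean     # ln(y)-intercept at t = 0
    return p1, p2


def poly1_to_exp1(p1, p2):
    """Convert Equation 2 coefficients to Equation 1: a = exp(p2), b = p1."""
    return math.exp(p2), p1


# Synthetic, noise-free data (t in years here, purely for readability).
true_a, true_b = 0.05, 0.5
t = [float(i) for i in range(10)]
y = [true_a * math.exp(true_b * ti) for ti in t]

p1, p2 = fit_poly1(t, y)
a, b = poly1_to_exp1(p1, p2)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f} -> a = {a:.4f}, b = {b:.4f}")
```

On real, noisy data the log-space fit and a direct fit of Equation 1 generally give different coefficients, because the direct fit weights the largest (most recent) values most heavily.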
All fits use complete data sets, except for the partial price fit to July 2017 (Fig. 13). The July 2017 subset fit is included because it altogether ignores the most recent price spike of the past ~6 months, and as such may provide a more impartial long-term fit of trading price data.
So much for the boring (yet important) maths; let's get into crunching the datas!
Statistical analysis of Bitcoin blockchain metrics
For more information, or to audit the preparation of these metrics, please go to the associated bitcoin-analysis GitHub repository. All plots, except where otherwise noted, are based on weekly averages of daily data, i.e. each data point represents the average of seven days of data.
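As a rough illustration of the weekly averaging, here is a minimal Python sketch. It assumes consecutive, non-overlapping seven-day windows and drops any trailing partial week; the actual scripts in the repository may handle window boundaries differently, and the function name is hypothetical.

```python
def weekly_averages(daily_values):
    """Average consecutive, non-overlapping 7-day windows.

    Assumption: a trailing partial week is dropped rather than averaged.
    """
    n_full = len(daily_values) // 7 * 7
    return [sum(daily_values[i:i + 7]) / 7 for i in range(0, n_full, 7)]


# 15 hypothetical daily values: two full weeks plus one leftover day.
daily = list(range(1, 15)) + [100]
weekly = weekly_averages(daily)
print(weekly)
```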
Figure 4: Cumulative block size, i.e. cumulative total size of all blocks mined per day
The growth of the daily cumulative block size is by design slowing over time towards an upper limit (Fig. 4), despite an exponential increase in hashing power, because mining difficulty is regularly adjusted (and also growing exponentially) to hold block discovery times at approximately 10 minutes. Of course, the growing adoption of SegWit over the past six months or so has somewhat raised the block size limit and should help ease this bottleneck as more of the network implements the protocol update. Conversely, notice the dramatic decline in cumulative block size since the January ATH, which is likely not only due to lowered transaction volume but also because it appears that ~30% of hashing power has gone offline since then (which we'll discuss in more detail in a later section).
Figure 5: Total addresses included in Bitcoin blockchain
The plot of total addresses (Fig. 5) is much smoother and follows its exponential curve much more closely than other, more volatile metrics because it's one of the least constrained metrics (details below), is cumulative, and is also a few orders of magnitude larger than other metrics. The same also applies to the total UTXOs plot (Fig. 8).
Figure 6: New addresses included in Bitcoin blockchain
The new addresses metric (Fig. 6) provides a better representation of active usage of the network, as most of the dust-only and abandoned addresses, present in total addresses, are removed. Of course, even this metric cannot be used as a direct indication of users, because many addresses (ideally a new one for each transaction) are controlled by individual wallets, of which users may have many.
Figure 7: Transactions included in Bitcoin blockchain
A better proxy for current Bitcoin activity is perhaps accepted transactions (Fig. 7); however, this metric also isn't entirely representative of network activity, as each transaction may contain thousands of UTXOs (Fig. 8). In addition, accepted transactions give no indication of current bitcoin holdings (hodlers) or of investment and store-of-value use cases. Certainly such use cases currently comprise much of the utility of Bitcoin, as can be seen in the dramatic decline in transactions (Fig. 7) directly in line with the recent price decline (Fig. 13, 14). At this time, many users are choosing not to spend their bitcoin, as they believe it is currently undervalued, due to its long-term deflationary trait. Another interesting event is the 5x increase in transactions in summer 2012, associated with the rise in popularity of Satoshi Dice.
Figure 8: Unspent Transaction Outputs (UTXOs)
Unlike accepted transactions, the UTXO set includes all unspent balances, a.k.a. otherwise inactive hodlers' ‘savings accounts’; unfortunately, however, all dust outputs, unprofitable outputs, and other such unspendable outputs (e.g. outputs with permanently lost keys) are also included. Unprofitable outputs have been estimated to comprise a significant proportion of total UTXOs: anywhere from 20-70%, based on current mining fees. Hence, the UTXO set can provide us with some further insight into certain use cases, such as store-of-value, but is not without its own inherent noise that complicates data interpretation. Also, note the sudden order-of-magnitude increase in UTXOs in summer 2015, owing to a so-called ‘stress test’, when millions of UTXOs were spammed to the network within a few days.
Figure 9: Bitcoin days destroyed, BDD
In an attempt to filter such spam out of the data and provide a better estimate of actual economic volume, the BDD metric was proposed (Fig. 9), in which transactions are weighted by the age of their input UTXOs. The BDD metric is supposed to separate artificial noise transactions, sent back and forth within single entities (e.g. spam), from legitimate transactions. Some would argue that all transactions are legitimate and there is no such thing as spam on the blockchain, but for our purposes this debate is largely semantic.
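To make the weighting concrete, here is a minimal Python sketch of the usual bitcoin-days-destroyed calculation: each spent input contributes its BTC amount multiplied by the number of days since that output was created. The input data are hypothetical, and the exact bookkeeping used by oxt.me may differ.

```python
def bitcoin_days_destroyed(spent_inputs):
    """Sum of (BTC amount * age in days) over the inputs being spent."""
    return sum(amount_btc * age_days for amount_btc, age_days in spent_inputs)


# Hypothetical examples: (amount in BTC, age of the spent output in days).
long_held = [(10.0, 365.0)]   # 10 BTC that sat unmoved for a year
churn = [(10.0, 0.01)] * 50   # the same 10 BTC bounced 50 times within minutes

print(bitcoin_days_destroyed(long_held))
print(bitcoin_days_destroyed(churn))
```

Both examples move 10 BTC per transaction, but the rapid back-and-forth churn destroys almost no bitcoin days, which is why the metric dampens this kind of noise.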
Figure 10: Estimated Bitcoin payments
Another metric proposed to provide a better estimate of actual economic activity is payments (i.e. #UTXOsCreated - #Tx - #OpReturn). Essentially, the estimated payments metric (Fig. 10) accounts for all newly created UTXOs that are not change outputs. Unfortunately, the estimate does include so-called spam payments, as can be seen in the dramatic fluctuations surrounding the Satoshi Dice and 'stress-test' events noted above (Fig. 7, 8, 10). At this point, no single ideal metric exists as a direct proxy for network usage or number of users, although there have been attempts to estimate Bitcoin network usage and market penetration. Currently, such estimations are not very accurate, but they may at least provide us with a very rough idea of the percentage of the global population that are active Bitcoin users. Examples of such estimates range from ~3-20 million active users.
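The payments formula quoted above is simple enough to write out directly. In this sketch the function name and the daily counts are hypothetical; the reading that one output per transaction is treated as change, and that OP_RETURN outputs are treated as data rather than payments, is my interpretation of the formula.

```python
def estimated_payments(utxos_created, tx_count, op_return_count):
    """Estimated payments = #UTXOsCreated - #Tx - #OpReturn."""
    return utxos_created - tx_count - op_return_count


# Hypothetical daily counts, for illustration only.
payments = estimated_payments(
    utxos_created=700_000,
    tx_count=250_000,
    op_return_count=10_000,
)
print(payments)
```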
Owing to insufficient public KYC data (FTW!), such estimates must at present rely on proxies for users such as active wallets, unique addresses, or number of transactions. At least, such indirect proxies can certainly provide us with upper and lower bound estimates. Let's say there are 20 million active Bitcoin users; this would mean that less than ~0.3% (0.02B/7.6B) of people on Earth currently use Bitcoin. For our own cross-reference estimation, let's say the average global consumer conducts ~10 (1 is too low, 100 too high) economic transactions per day; that's ~76e+09 non-btc-txs/day. The largest non-spam payments volume there's ever been on the blockchain is ~6e+05payments/day (Fig. 10 inset), which is ~8e-06payments/non-btc-tx ~= 0.0008%. Let's also assume that an average user still conducts 100x more non-btc-txs than Bitcoin payments: 0.0008% * 100 = 0.08%. Thus, a reasonable order-of-magnitude estimation of the global percentage of Bitcoin users is likely ~0.2% (average of 0.3% and 0.08%). That said, one recent study of 2001 Americans found that the percentage of holders is perhaps as high as 5% in the most affluent nations.
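The back-of-the-envelope estimate above can be reproduced in a few lines of Python. Every input is one of the round-number assumptions stated in the paragraph, not measured data, and the variable names are my own.

```python
WORLD_POPULATION = 7.6e9          # people, as used in the text
ASSUMED_ACTIVE_USERS = 20e6       # upper-bound guess from the text
TX_PER_PERSON_PER_DAY = 10        # assumed everyday transactions per person
PEAK_PAYMENTS_PER_DAY = 6e5       # largest non-spam payments volume (Fig. 10 inset)
NON_BTC_TX_PER_BTC_PAYMENT = 100  # assumed ratio for a typical Bitcoin user

# Upper bound: assumed users as a share of the world population.
upper_pct = ASSUMED_ACTIVE_USERS / WORLD_POPULATION * 100

# Lower bound: Bitcoin payments as a share of all daily transactions,
# scaled up by the assumed 100x ratio of non-Bitcoin to Bitcoin payments.
global_tx_per_day = WORLD_POPULATION * TX_PER_PERSON_PER_DAY
payment_share_pct = PEAK_PAYMENTS_PER_DAY / global_tx_per_day * 100
lower_pct = payment_share_pct * NON_BTC_TX_PER_BTC_PAYMENT

estimate_pct = (upper_pct + lower_pct) / 2
print(f"upper {upper_pct:.2f}%, lower {lower_pct:.2f}%, estimate {estimate_pct:.2f}%")
```

With unrounded inputs the average comes out slightly below the figure obtained from the rounded 0.3% and 0.08%, but it still rounds to ~0.2%, which is all an order-of-magnitude estimate can claim.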
I think it's safe to say we haven't even come close to hitting the mainstream yet, which at least by one estimate of the current exponential growth may be reached by approximately 2024, at 200 million users, ~3% of the global population. Using the average (50%/year) of the recent 'exp1' blockchain growth rates (Tab. 1), we find this timeline agrees well with our estimated short-term growth rates: t3% = ln(0.03 / 0.002) / ln(1.5) ~= 7yrs.
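The timeline arithmetic can be written as a small helper. This is a sketch that assumes steady compound growth at the stated annual rate; the function name is hypothetical, and the inputs are the ~0.2% current share, 3% target share, and 50%/year growth quoted in the text.

```python
import math


def years_to_reach(current_share, target_share, annual_growth):
    """Years of compound growth needed to go from current_share to target_share."""
    return math.log(target_share / current_share) / math.log(1.0 + annual_growth)


years = years_to_reach(current_share=0.002, target_share=0.03, annual_growth=0.5)
print(f"{years:.1f} years")
```

Because the result depends on the logarithm of the share ratio, it is fairly insensitive to the exact starting share but quite sensitive to the assumed growth rate.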
Figure 11: Transacted BTC volumes
There are yet other metrics more focused on directly measuring the value associated with the blockchain, such as BTC volume (Fig. 11), mining fees (Fig. 12), and of course USD/BTC trading price (Fig. 13, 14), which we'll cover in the following section.
Figure 12: Bitcoin mining fees
Such metrics surely provide a better estimation of the actual value present in the network than other popular metrics, such as coin market cap, that are often falsely assumed to do the same. Using coin market caps as an estimate of their real-world value is problematic for a number of reasons. Firstly, market caps are not reflective of the actual economic usage of a particular coin. For example, BCash's market cap is a significant portion of Bitcoin's, but its actual usage (e.g. transactions per block) is far less significant relative to Bitcoin, i.e. all those big blocks are mostly empty. It's also likely, considering the most recent, obvious Bitcoin mempool spam attacks (Nov. 2017-Jan. 2018), which resulted in drastically increased mining fees (Fig. 12) and were suspiciously correlated with artificial BCash pumps, that a high percentage of the transactions present on BCash are also spam, meant to provide a facade of non-existent value and utility.
Secondly, anyone can fork their shitcoin off their coin of choice, tweak a couple of parameters with minimal effort, and instantly create a shitcoin with a market cap rivaling Bitcoin's own. For a good example of such, see Morgan Rockwell’s purported $400 Billion Counterparty Asset, BANKCOIN. If the shitcoin is a Bitcoin clone, this can be accomplished with even less effort because by their nature all Bitcoin forks inherit Bitcoin's associated market cap by default. Keeping this in mind, such metrics may be able to provide a very rough benchmark to compare Bitcoin against, as we'll discuss in more detail later.
Since the decline in price from January 2018's ATH, there has been a dramatic decline in volumes across many blockchain metrics during the ensuing months, indicating that the day-to-day activity of the network has fallen by at least half from the January ATH (Fig. 9-12)! We'll talk more about why the price is likely the primary driver of this trend later. Spoiler: it's almost as if hodlers be hodling!
Very recently, there has been a pronounced decline across most metrics, particularly new addresses, transactions, payments, BTC volumes, and mining fees (Fig. 6, 7, 10-12), most likely caused by the recent fall in trading price, which we'll cover in the next section. Such declines in transaction volumes are typically correlated closely with large declines in BTC price; apparently hodling really is a thing ;).
Table 1: Combined and averaged 'exp1', 'poly1' fit coefficients data spreadsheet
Note from Table 1 that the average BTC valuation is growing an order-of-magnitude faster than that of the top 5 tech companies, and faster than their combined totals! Approximate R2 goodness of fit values are provided along with block 0-intercepts (Tab. 1); a live version of the spreadsheet is available at the associated bitcoin-analytics GitHub repo.
Over the longer term, there appear to be a few overall Bitcoin usage trends reflected in the above blockchain metric plots that are more easily recognized when directly comparing their growth coefficients (Tab. 1). Firstly, the growth rates of the 'poly1' fits are greater than those of the 'exp1' fits, i.e. in general, blockchain growth has been slowing over time. Most metrics are still growing exponentially, just at a slower rate than very early growth rates. There has been a similar leveling-out trend across nearly all blockchain metrics, at least relative to their clear early exponential growth. This recent slowing could be interpreted as blockchain metrics stretching out at the top of the logistic S-curve, and although this is surely the case for certain constrained metrics (discussed below), it's unlikely to be the case for relatively unconstrained metrics, due to the relatively low market penetration percentage (~0.2%) to date.
Indeed, when we compare the recent 'exp1' Bitcoin growth rates to those of the historically fastest growing technologies in the US (~5%/year, Fig. 1-3), even the relatively slow Bitcoin growth of ~50%/year (Tab. 1) remains an order of magnitude greater. The long-term growth of Bitcoin aligns much more closely with the global adoption rate of Internet and mobile cellular subscriptions (~40%, Tab. 1). Also, keep in mind, as previously mentioned, that Bitcoin is potentially far less limited by the global population than these other communications technologies, because of the massive machine infrastructures we now have in place along with our rapidly advancing ability to automate these infrastructures: machines paying machines paying machines...
Of course, some blockchain metrics have inherent limits and are surely reaching their capacity saturation points. The cumulative blockchain size and accepted transactions are metrics of this kind, directly constrained by the Bitcoin protocol. Most directly constrained metrics are already approaching the upper bounds of their associated S-curves, i.e. their growth rates are leveling off according to the current blockchain specifications.
Other metrics, though not directly constrained at the protocol level, can nonetheless be indirectly constrained for a couple of reasons:
- they are dependent on directly constrained metrics
  - UTXOs:
    - technically little constrained by transaction processing rate, as batching allows thousands of UTXOs per transaction
    - however, owing to bad practices (non-batchers), UTXOs are likely somewhat constrained by the block size limit
  - payments, BDD:
    - derived in part from UTXOs, so they share similar indirect constraints with the latter
- they are indirectly constrained by the current scaling pressures on the network (discussed in detail later)
  - mining fees:
    - as the transaction capacity bottleneck tightens, fees are driven upwards
  - new, total addresses:
    - usability difficulties, such as high fees and obtuse UX/UI, slow the adoption/usage rates
    - though addresses can by no means be directly correlated with users, they can be somewhat reflective of usage rates
  - BTC volume:
    - as transactions can be for any legitimate amount, BTC volumes should be fairly unrelated to transaction volumes or speeds
    - although certainly usage rates will impact the amount of BTC being transacted
Overcoming these limitations is the current challenge to scaling Bitcoin, but have no fear, Lightning mainnet is here! Already we’re seeing the impacts of increased efficiencies as SegWit and batching become more widely used: when Coinbase and Bitfinex implemented SegWit, it had immediate easing effects on fees and the mempool.
In the next section, we'll apply these same analytical techniques to the trading price of Bitcoin and compare the results against the top 5 publicly traded tech companies.