Data Economics

The fundamentals behind buying and selling data

Jan 19, 2022

In Argentina, chronic inflation over the past years has destroyed all reference prices. What’s cheap and what’s expensive? You have to be a psychic to know. For the exact same product in the same neighborhood there are all kinds of price offerings that don’t carry any fundamental reason, and price speculation is all over the place.

Surprisingly, the global data market works in much the same fashion. How much does data cost? Hard to say.

The same data product that’s sold for a high fee might be free on another site, and what takes one data provider ages to pack and deliver might be bought off the shelf from another. Accessing high-quality data is as important as getting it in a timely manner, and data providers must tailor data packages to their customers, since each customer needs a specific data format and structure.

There’s no universal way of trading data.

Given this high volatility in pricing and data offerings, does it mean we should just accept whatever is out there? Of course not. Next, I present 3 sections to help you solve this challenge: how to value data, how to price data, and how to sell data.

How to value data

Data can generate monetary value directly (when data is sold, traded, or acquired) or indirectly (when a new product or service leveraging data is created, but the data itself is not sold), and there are multiple variables that condition the data market.

Different audiences require information in different ways. For example, executive management can review high-level data through dashboards to track critical performance metrics related to profitability, while analysts can dig deep into data about individual operational processes to identify areas in need of improvement, or uncover ways to maximize productivity.

There’s a whole spectrum in the data value chain, going from:

Enriched data: where organizations can take their own data and monetize it through a data-as-a-service model, either to intermediate companies or end customers who mine the data for insights. For example, transactions on an online e-commerce platform can generate a tremendous amount of data containing economic value to customers downstream or upstream in the value chain. Grocery retailer Kroger captures shopping data generated by its rewards card and sells it to consumer packaged-goods companies thirsty for a deeper understanding of their customers’ shopping habits and evolving tastes and preferences.
Transformed data: in which organizations can gain new value out of the information by combining it with other datasets, looking at different correlations and making inferences. For instance, an agricultural company can take weather data and georeference it with data on soil and crops. This way, it’s possible to make connections between the datasets to help the business figure out the best fertilizer and pesticide combinations that will optimize crop production in different regions. Also, banks can apply graph analytics to identify connections among documents or individuals in a dataset, looking at financial transactions to identify a possible network of money launderers. They can then combine this data with sensor data, such as point of sale charges and phone calls, and data scraped from the web, like product catalog searches, to find these individuals.
Applied data: where data is used to make predictive business decisions and anticipate customer needs. For instance, Amazon uses data on what consumers have previously bought and correlates the information with similar customers’ purchases to make new recommendations. By using Machine Learning models, the suggestions become more accurate as people make more purchases over time.
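The "transformed data" step of combining datasets can be sketched in a few lines of Python. This is only an illustration, not a real agronomic model: the region names, field names, and numbers are made up, and using `region` as the join key is an assumption about how the two datasets would be georeferenced.

```python
# Hypothetical weather and soil datasets, keyed by region (illustrative values only).
weather = [
    {"region": "north", "rainfall_mm": 820},
    {"region": "south", "rainfall_mm": 410},
]
soil = [
    {"region": "north", "ph": 6.4},
    {"region": "south", "ph": 7.8},
    {"region": "east", "ph": 5.9},  # no weather data exists for this region
]


def join_by_region(left, right):
    """Inner-join two lists of dicts on the 'region' key."""
    right_index = {row["region"]: row for row in right}
    joined = []
    for row in left:
        match = right_index.get(row["region"])
        if match is not None:
            # Merge both records into one combined observation.
            joined.append({**row, **match})
    return joined


combined = join_by_region(weather, soil)
for row in combined:
    print(row)
```

Only regions present in both datasets survive the join, which is a reminder that the combined dataset is only as complete as its weakest source.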

Business Model Spectrum. External data monetization models vary by level of value impact to customers, analytics sophistication, and revenue potential. Source: MIT Sloan Management Review

The data value spectrum shows that there are intermediate stages as data is collected, processed, integrated, combined, and transformed with context to produce actionable insights, which can lead to action and value. This concept assumes a natural flow, but some aspects of this model suggest data providers should adopt a “microservices” approach rather than a fixed, end-to-end process view because:

Although it would be expected that value increases as you move from disaggregated datasets to fully integrated data models, customers might not perceive it, demanding only disaggregated datasets (e.g. to build the models themselves).
Services like “data analysis” can be offered all across the components, so it doesn’t necessarily belong to a specific stage of the value chain.
There could be opportunities to monetize applied data models without including any proprietary data, only selling the modeling knowledge to the customer. In these scenarios, the value chain components need to be considered in isolation.

How to price data

Now, how do you price data? Data use is typically defined by the application and the frequency of use. The frequency of use, in turn, is defined by the application workload, the transaction rate, and the frequency of data access.

The frequency of data usage brings up an interesting aspect of data value. Conventional tangible assets generally exhibit decreasing returns to use (they decrease in value the more they are used). But data has the potential to increase in value the more it is used, that is, to exhibit increasing returns to use. For example, Google’s Waze navigation and traffic application integrates real-time data from drivers, so Waze mapping data becomes more valuable as more people use it.

2 dimensions for data pricing

Personally, I like to think about data pricing in 2 different dimensions: the first one related to general conditions that the data must meet in order to have basic value, and a second one related to quality aspects that may or may not increase that basic value:

1) General conditions

Data needs to fulfill some minimum requirements in order to be tradable:

Dependability: you must be able to verify that you gathered your data from a reliable source and in a manner that didn’t compromise validity. You also need enough data and the right data to have a representative sample. This element is important within an AI project to avoid bias.
Relevance: your data must be aligned with your clients’ business needs. Obviously, these needs will change between customers. As such, you must understand their goals and figure out how your dataset can help. Moreover, you should make sure it’s organized and in a format they can use.
Ownership: don’t assume that because you have access to some data, it automatically belongs to you. For instance, in healthcare, be sure that the patients in your dataset have signed a contract authorizing the commercial use of their data, images, etc. And when selling, be sure that the buyer signs a contract of exclusive use for research purposes, and add a clause prohibiting public disclosure of the data in the database.
Secure and anonymized: it’s important that your data is secure and anonymized. Techniques like encryption (which scrambles data so it can’t be read without the key if stolen) or tokenization (which replaces sensitive values with substitute tokens, optionally preserving certain elements such as the format or trailing digits) can keep the information secured at all times. You should also consider that a single data point that a data subject provides under anonymity has little value, but a collection of observations generates value by revealing statistical regularities.
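To make the tokenization idea more concrete, here is a minimal sketch in Python. It assumes a vault-style design in which each sensitive value is swapped for a random token and the mapping is kept privately. The `TokenVault` class, the `tok_` prefix, and the choice to keep the last four characters are all hypothetical, and a real vault would be a hardened, access-controlled service rather than an in-memory dictionary.

```python
import secrets


class TokenVault:
    """Toy in-memory token vault (hypothetical; for illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value, keep_last=4):
        """Replace a value with a random token, keeping its last `keep_last` characters."""
        if value in self._token_by_value:
            # Reuse the existing token so the same value always maps to the same token.
            return self._token_by_value[value]
        suffix = value[-keep_last:] if keep_last else ""
        token = "tok_" + secrets.token_hex(8) + "_" + suffix
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        """Recover the original value; only the vault holder can do this."""
        return self._value_by_token[token]


vault = TokenVault()
card = "4111111111111111"  # well-known test card number, not real data
token = vault.tokenize(card)
print(token)
print(vault.detokenize(token))
```

Unlike encryption, the random part of the token has no mathematical relationship to the original value, so a dataset shared with tokens reveals only what you chose to preserve (here, the trailing digits).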

2) Quality aspects

Data quality can be measured along a number of attributes like:

Accuracy, which refers to the exactness of the data, and relates to its intended use. Without understanding how the data will be consumed, ensuring accuracy could be off-target or more costly than necessary. For example, accuracy in healthcare might be more important than in another industry.
Completeness, since incomplete data is as dangerous as inaccurate data. Gaps in data collection lead to a partial view of the overall picture to be displayed.
Consistency, meaning that the same information stored and used in multiple places matches. This becomes critical when data is aggregated from multiple sources, and a lack of it can create unreliable datasets.
Accessibility, which can be tricky at times due to legal and regulatory constraints. Regardless of the challenge, it’s key to provide customers with the right level of data access and a proper delivery method.
Granularity. The level of detail at which data is collected is important, since aggregated, summarized and manipulated collections of data could offer a different meaning than the data implied at a lower level.
Timeliness, which means that data has to be collected at the right moment in time (doing it too soon or too late could misrepresent a situation), and have an appropriate latency.
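Some of these attributes can be turned into simple numbers. The sketch below shows one possible way to measure completeness, consistency, and timeliness on a small set of records. The sample records, the second source, and the exact definitions (shares between 0 and 1) are assumptions made for illustration, not standard formulas.

```python
from datetime import date

# Hypothetical customer records from one source (illustrative values only).
records = [
    {"id": 1, "email": "a@example.com", "country": "AR", "updated": date(2022, 1, 10)},
    {"id": 2, "email": None, "country": "AR", "updated": date(2021, 6, 1)},
    {"id": 3, "email": "c@example.com", "country": None, "updated": date(2022, 1, 15)},
    {"id": 4, "email": "d@example.com", "country": "BR", "updated": date(2020, 3, 2)},
]
# The country of the same customers as seen in a second (hypothetical) source.
other_source_country = {1: "AR", 2: "AR", 3: "UY", 4: "CL"}


def completeness(rows, fields):
    """Share of field values that are present (not None)."""
    total = len(rows) * len(fields)
    filled = sum(1 for row in rows for f in fields if row[f] is not None)
    return filled / total


def consistency(rows, reference, field):
    """Share of comparable rows whose value matches the reference source."""
    comparable = [r for r in rows if r[field] is not None and r["id"] in reference]
    matches = sum(1 for r in comparable if r[field] == reference[r["id"]])
    return matches / len(comparable)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows)


print(completeness(records, ["email", "country"]))
print(consistency(records, other_source_country, "country"))
print(timeliness(records, as_of=date(2022, 1, 19), max_age_days=365))
```

The functions assume non-empty inputs; a production version would need to handle empty datasets and agree with the customer on what counts as "fresh" or "matching".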

Different uses require different levels of quality along each of these attributes, and high-quality data for one use may be low-quality data for another. A failure to align expectations will hinder your ability to monetize. Data from the stock exchange that is cleaned and has outliers removed may be extremely valuable to long-term financial modelers, but inappropriate for fraud detection.

Provided all general data conditions are met, in principle, the higher the quality level, the higher the data value.
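One way to read the two dimensions together is as a gate followed by a score: if any general condition fails, the data is not tradable, and otherwise its relative value grows with quality. The sketch below is just that reading expressed in Python. The function name, the zero-value gate, and the per-use-case weights are assumptions of mine, not an established valuation method.

```python
GENERAL_CONDITIONS = ("dependability", "relevance", "ownership", "secure_and_anonymized")


def relative_value(conditions, quality_scores, weights):
    """Return 0.0 if any general condition fails; otherwise a weighted quality average."""
    if not all(conditions.get(name, False) for name in GENERAL_CONDITIONS):
        return 0.0
    total_weight = sum(weights.values())
    return sum(quality_scores[k] * w for k, w in weights.items()) / total_weight


# Hypothetical dataset: all general conditions met, quality scored between 0 and 1.
conditions = {name: True for name in GENERAL_CONDITIONS}
quality = {"accuracy": 0.9, "completeness": 0.75, "timeliness": 0.5}

# Different uses weigh the same attributes differently (weights are made up).
modeling_weights = {"accuracy": 2, "completeness": 1, "timeliness": 1}
monitoring_weights = {"accuracy": 1, "completeness": 1, "timeliness": 4}

print(relative_value(conditions, quality, modeling_weights))
print(relative_value(conditions, quality, monitoring_weights))
```

With these made-up numbers the same dataset scores higher for the modeling use case than for the monitoring one, which mirrors the point that high-quality data for one use may be low-quality data for another.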

Besides the dimensions mentioned above, you should also consider elements like data volume, the frequency with which the data is made available, the creation cost, the scarcity of the information in the market, potential substitutes, and its organization.

Finally, you need to consider that data value changes over time, not only in response to new regulations or other external factors, but also as a consequence of changes in customer behaviors and interests.

Pricing strategies

In simple terms, the most common pricing strategies are:

Cost-based: where the price is determined based on how much the data asset cost to create. The major cost factors are its capture, storage, and maintenance.
Market-based: in which the price is defined based on the market price of comparable “datasets” and services. However, in most cases comparable assets are non-existent.
Income-based: which is based on an estimate of the future cash flows expected from the data asset. This is a useful approach for valuing data for a specific use.
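The cost-based and income-based strategies lend themselves to a quick calculation. The sketch below is a simplified illustration: the 20% markup, the 10% discount rate, and all amounts are arbitrary assumptions, and the income-based function is a plain discounted cash flow with yearly flows, which is one common way (not the only one) to estimate the present value of future income.

```python
def cost_based_price(capture, storage, maintenance, markup=0.2):
    """Total creation cost plus a markup (the 20% default is an arbitrary assumption)."""
    return (capture + storage + maintenance) * (1 + markup)


def income_based_value(cash_flows, discount_rate):
    """Present value of expected yearly cash flows, the first arriving one year from now."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


# Hypothetical dataset: costs and expected income are made-up figures.
cost_price = cost_based_price(capture=50_000, storage=10_000, maintenance=20_000)
income_value = income_based_value([30_000, 30_000, 30_000], discount_rate=0.10)

print(round(cost_price, 2))
print(round(income_value, 2))
```

Comparing the two results is useful in itself: if the income-based value is below the cost-based price, the dataset may cost more to produce than a buyer can expect to earn from it.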

How to sell data

For data providers, there are different strategies to sell services and generate revenue:

Ad-Supported, providing content or services for free to one party while selling listeners, viewers, or “eyeballs” to another party.
Auction, allowing a market and its users to set the price for goods and services.
Bundled Pricing, selling in a single transaction two or more items that could be sold as standalone offerings.
Cost Leadership, keeping variable costs low and selling high volumes at low prices.
Disaggregated Pricing, which allows customers to buy exactly and only what they want.
Financing, capturing revenue not from the direct sale of a product but from structured payment plans and after-sale interest.
Flexible Pricing, varying prices for an offering based on demand.
Float, receiving payment prior to building the offering, and earning interest on that money prior to delivering the goods.
Forced Scarcity, limiting the supply of offerings available, by quantity, time frame, or access, to drive up demand and/or prices.
Freemium, offering basic services for free while charging a premium for advanced or special features.
Installed Base, offering a “core” product for slim margins (or even a loss) to drive demand and loyalty, to then realize profit on additional products and services.
Licensing, granting permission to a group or individual to use the offering in a defined way for a specified payment.
Membership, charging a time-based payment to allow access to locations, offerings, or services that non-members don’t have.
Metered Use, allowing customers to pay only for what they use.
Microtransactions, selling many items for as little as a dollar (or even only one cent) to drive impulse purchases.
Premium, pricing at a higher margin than competitors, usually for a superior product, offering, experience, service, or brand.
Risk Sharing, waiving standard fees or costs if certain metrics aren’t achieved, but receiving outsize gains when they are.
Scaled Transactions, maximizing margins by pursuing high-volume, large-scale transactions when unit costs are relatively fixed.
Subscription, creating predictable cash flows by charging customers upfront (a one-time or recurring fee) to have access to the product or service over time.
Switchboard, connecting multiple sellers with multiple buyers. The more buyers and sellers who join, the more valuable the switchboard becomes.
User-Defined, inviting customers to set the price they wish to pay.
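Several of these strategies can be compared with simple arithmetic. As an example, the sketch below contrasts Metered Use with a flat Subscription for a data API and finds the usage level at which both cost the same. The price per call, the monthly fee, and the function names are hypothetical.

```python
def metered_bill(api_calls, price_per_call):
    """Metered Use: the customer pays only for the calls made."""
    return api_calls * price_per_call


def break_even_calls(monthly_fee, price_per_call):
    """Number of calls at which metered use costs the same as the subscription."""
    return monthly_fee / price_per_call


def cheaper_plan(api_calls, monthly_fee, price_per_call):
    """Return which plan costs less for the given monthly usage."""
    if metered_bill(api_calls, price_per_call) < monthly_fee:
        return "metered"
    return "subscription"


# Hypothetical prices: 0.01 per call, or a flat 500 per month.
for calls in (10_000, 80_000):
    print(calls, cheaper_plan(calls, monthly_fee=500, price_per_call=0.01))
print(break_even_calls(monthly_fee=500, price_per_call=0.01))
```

Light users come out ahead with metered pricing and heavy users with the subscription, which is why many providers offer both and let customers self-select.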

Final remarks

If you’re providing data, it needs to be either faster or more accurate than what your customers are using, or it must offer a unique insight previously unavailable to them. Put yourself in their shoes. Think about what specific pieces of data might help fill a hole in their puzzle.

As a data consumer, make sure all basic data conditions are met, and be very specific about your objective to align the data quality aspects. Ask for samples, and test. Try to develop your proof of concept before engaging in any commercial activity.
