STORE is bringing Cloud 2.0 to the world with a zero-fee cryptocurrency and checks and balances governance.
STORE aims to tokenize data to make it open (easily discoverable), tradable (data can be purchased), and programmable (developers can build their applications using purchased data streams). STORE uses a ⅔ fault tolerant trust model to secure the data created on its platform. STORE miners become trusted oracles with this model, enhancing the transparency for data creators, data buyers, and developers on the STORE platform.
The business model for open data is created around a concept called datacoins. This document explains what datacoins are, why they are needed, and how they are used to create an economy around open data. The document is organized as a series of questions and answers to help simplify understanding the business model around open data.
Today’s data lives in silos. It is usually unstructured, or it is formatted by a specific company [1] to suit its own business needs. The data is not universally discoverable and hence not easily sellable to, tradable with, or usable by others. Its value goes unrealized because it cannot be easily accessed, combined with other data, or built upon by others to create better data with even greater value.
Additionally, scrubbing and cleaning data after the fact for Artificial Intelligence/Machine Learning (AI/ML) is expensive and error-prone.
“Open” data, on the other hand, is not siloed and is well structured and labeled, so it is easily discovered. Open data allows a company’s data to be efficiently sold, traded, and built upon. At STORE we call this process of opening up data the “tokenization of data”. The applications or systems that tokenize their data are called “tokenized apps” or tApps.
At STORE we use the term “data” to mean pre-created data as well as APIs and algorithms that produce data on-demand. These APIs and algorithms can be invoked with their supported properties to produce different sets of data. Tokenization includes APIs and algorithms as well.
All kinds! Any company owning or producing valuable data can tokenize its data and take advantage of STORE’s open data platform. This includes any web2.0 or mobile app, which can tokenize its data without expensive rewrites in STORE-specific languages. A lightweight interface allows existing applications or systems to open up their data on the STORE platform, so existing apps can migrate to STORE without expensive re-engineering.
An important distinction is that dApp platforms have limited capacity, which makes deploying real-world, data-rich apps on them extremely expensive. STORE, by contrast, is designed from the ground up to support data-rich apps and consume massive amounts of data in a trusted and decentralized network.
Today’s web2.0 and mobile apps consume and create large amounts of data. AI/ML services take this further, consuming and creating even larger amounts of data. We call these data-rich apps. In contrast, dApps built on top of web3.0 and similar platforms can be termed data-light apps. This distinction is important because most dApp platforms require app developers or users to pay for network resources: memory, CPU, bandwidth, and storage. Given the limited capacity of these decentralized platforms, network resources tend to be too expensive for deploying real-world, data-rich apps. STORE is designed from the ground up to support data-rich apps in a trusted and decentralized network.
For data to be open and tradable, it should have the following properties.
The process of tokenization involves securing the data created by tApps with private keys. Data can then be traded when the buyer purchases access to private keys. In order to prevent access leaks, the private keys are hashed with the buyer’s public key so only the authorized buyer can access data.
STORE has chosen 1 MB as the unit of data. Data is priced, and access is purchased, in these data units. Because these tradable and programmable 1 MB data units represent an asset (data), we call them datacoins.
1 MB of data = 1 datacoin
Each company’s tokenized app issues its own datacoins, which are created, purchased, and used in the context of their tokenized app and are unrelated to another company’s datacoins. For every MB of data created, 1 datacoin is created by that app. Fig. 1 illustrates the process of creating datacoins based on the data created by the tokenized apps.
For example, an IoT app that monitors activity around volcanoes and produces a live data feed will issue its own datacoins (say, ‘volcanocoin’ [2]), while an AI app that predicts traffic congestion based on real-time traffic data will issue its own datacoins (say, ‘trafficcoin’). On average, the volcanocoin app may produce 500 MB of data on a given day, so it creates 500 volcanocoins daily. The trafficcoin app, on the other hand, may produce 1 TB of data daily, so it creates 1M trafficcoins daily. Fig. 2 illustrates different tokenized apps issuing their own datacoins.
Datacoins are mined[3] based on the units of data created in a tokenized app, so they are mined on a continuous basis. For example, if a company creates 1 GB of data on a given day, then it will mine 1,000 datacoins on that day. This is different from Ethereum’s ERC20 tokens, which have their “total supply” predefined when the tokens are first created. The number of datacoins mined may vary widely from day to day, depending on the volume of data created.
You bet! The company’s data will first need to be structured, labeled, and priced like other data created in STORE. If a company wants to migrate 1 TB of its preexisting data, then 1,000,000 datacoins will be mined on day 1. From that point forward, new data will generate new datacoins at the rate of 1 datacoin per 1 MB of data.
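The mining rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the decimal definition of a megabyte is inferred from the "1 GB = 1,000 datacoins" example, and rounding partial megabytes down is an assumption that the text does not specify.

```python
MB = 1_000_000  # bytes per data unit (decimal units, inferred from "1 GB = 1,000 datacoins")


def datacoins_mined(bytes_created: int) -> int:
    """Return whole datacoins for a volume of data: 1 datacoin per full MB.

    Rounding down partial megabytes is an assumption of this sketch.
    """
    if bytes_created < 0:
        raise ValueError("bytes_created must be non-negative")
    return bytes_created // MB


# Daily volumes taken from the volcanocoin and trafficcoin examples.
daily_volume = {
    "volcanocoin": 500 * MB,         # 500 MB per day
    "trafficcoin": 1_000_000 * MB,   # 1 TB per day
}
mined = {coin: datacoins_mined(size) for coin, size in daily_volume.items()}
```

Because supply follows data volume, calling the function on each day's volume gives that day's issuance; there is no fixed total supply to configure.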
No. Different data have different values. Some data is more valuable and other data is less.
Of course. Companies can use different “classes” to identify and subsequently uniquely price different groups of their data.
Fig. 3 illustrates data classes. Tokenized apps can create data of different classes, but datacoins are issued purely based on the total data created by the apps.
Fig. 4 gives an example of how total number of datacoins is based on the total amount of data created by the tokenized apps.
Companies decide the pricing of their data. Different classes of data may be priced differently based on their value and demand. Fig. 5 illustrates an example of pricing different classes of data.
Data is priced in 1MB units (though pricing flexibility exists to accommodate events such as large or full buyouts and even ongoing live data streaming). Fig. 6 illustrates an example of a full buyout where the buyer purchases access to all the data created by a tokenized app. In this example, the app creates 3 classes of data at different prices per class, but the buyer negotiates a common price (1.5 datacoins per 1 MB of data) to purchase all the 3 classes of data.
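To make the buyout arithmetic concrete, here is a rough sketch that compares buying each class at its own price with a negotiated flat price. Only the 1.5 datacoins per MB buyout price comes from the example above; the class names, sizes, and per-class prices are hypothetical values chosen for illustration.

```python
from decimal import Decimal

# Hypothetical classes: sizes in MB and list prices in datacoins per MB.
classes = {
    "class_a": {"size_mb": 100, "price": Decimal("1")},
    "class_b": {"size_mb": 50, "price": Decimal("2")},
    "class_c": {"size_mb": 50, "price": Decimal("3")},
}


def list_price_total(classes: dict) -> Decimal:
    """Cost of buying every class at its own per-MB price."""
    return sum((c["size_mb"] * c["price"] for c in classes.values()), Decimal("0"))


def buyout_total(classes: dict, flat_price: Decimal) -> Decimal:
    """Cost of buying all classes at one negotiated per-MB price."""
    total_mb = sum(c["size_mb"] for c in classes.values())
    return total_mb * flat_price


per_class_cost = list_price_total(classes)              # 100*1 + 50*2 + 50*3
buyout_cost = buyout_total(classes, Decimal("1.5"))     # 200 MB * 1.5
```

With these made-up numbers the buyer pays 300 datacoins for the full buyout instead of 350 at list prices.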
As discussed earlier in this document, data discovery is done via data categorization. For a requested category, multiple data classes from multiple tokenized apps may match. Buyers use the search feature on STORE to discover the categories of data they want. Fig. 8 illustrates how generic search on STORE or specific APIs published by tokenized apps can be used to discover and filter data.
The search or the API result contains matching data classes, data sizes, and the price of data in respective datacoins and in $STORE. The result may also contain the preview data of matching classes to help buyers with making purchase decisions. The result has a unique result ID, which can be used later to purchase the data.
Fig. 8 illustrates how buyers purchase access to the data. As described before, buyers start with search and filtering to decide what they want to buy. The buyers then link the result ID to their STORE wallet to pay for the data access. The buyers pay in $STORE, so they should have sufficient balance in their wallet to purchase the access to the data. Once the purchase is completed, a smart contract is created with the following content.
The contract also contains other housekeeping data, such as the public keys of app developers. The contract can be “executed” by the buyer to access the data. This process is discussed next.
The buyer can execute the contract to get an authorization code that is tied to their identity, which prevents unauthorized access to the data. The authorization code works similarly to cookies used in websites and allows the STORE platform to enforce access rights. When executing the contract, the buyer signs the contract transaction with their private key to prove ownership of the authorization code. Fig. 9 illustrates the data access flow.
The authorization code embeds any restrictions associated with the data access such as the expiry date, concurrent access to data, anonymization schemes if any, and so on.
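One way to picture an authorization code and its embedded restrictions is as a small record that the platform checks on every access. The sketch below is hypothetical: the field names and the `access_allowed` function are not part of any published STORE API, and comparing public-key strings stands in for the signature verification a real platform would perform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuthorizationCode:
    """Hypothetical shape of an authorization code; all fields are assumptions."""
    buyer_public_key: str
    result_id: str
    expires_at: datetime
    max_concurrent_sessions: int
    anonymization_scheme: Optional[str] = None


def access_allowed(code: AuthorizationCode, requester_public_key: str,
                   active_sessions: int, now: datetime) -> bool:
    """Check identity, expiry date, and the concurrent-access limit."""
    return (
        requester_public_key == code.buyer_public_key  # stand-in for signature check
        and now < code.expires_at
        and active_sessions < code.max_concurrent_sessions
    )


code = AuthorizationCode(
    buyer_public_key="buyer-key",
    result_id="result-123",
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
    max_concurrent_sessions=2,
)
```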
The value of $STORE is determined based on the macroeconomics of the STORE network. It is the native currency of the STORE network. Datacoins, however, are app specific, and their value is based on the demand for the data of specific apps and set by the developers of those apps. Fig. 10 summarizes these differences.
Developers are responsible for pricing their data and creating demand for their data. So, the value of datacoins is primarily a function of the tokenized data. Since not all data is created equal, datacoins of some apps may be more valuable (in absolute terms) than others.
$STORE is the unit of account on the STORE platform. This means all datacoins are valued and priced in $STORE. Just like any commodity, the value of data in a tokenized app may fluctuate based on supply and demand, so the value of a datacoin will fluctuate with the value of the data. For example, 1 volcanocoin may be equal to 2 $STORE and 1 trafficcoin may be equal to 0.1 $STORE.
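As a worked example of $STORE as the unit of account, the following sketch converts datacoin amounts into $STORE using the illustrative rates above (2 $STORE per volcanocoin, 0.1 $STORE per trafficcoin). The function and dictionary names are hypothetical, and the rates are the document's examples, not real market values.

```python
from decimal import Decimal

# Illustrative rates from the text, expressed in $STORE per datacoin.
rates_in_store = {
    "volcanocoin": Decimal("2"),
    "trafficcoin": Decimal("0.1"),
}


def price_in_store(coin: str, datacoins: int) -> Decimal:
    """Value of a datacoin amount in $STORE at the current rate."""
    return rates_in_store[coin] * datacoins


# One day of data from each example app, valued in $STORE.
volcano_day = price_in_store("volcanocoin", 500)          # 500 MB of volcano data
traffic_day = price_in_store("trafficcoin", 1_000_000)    # 1 TB of traffic data
```

At these rates, a day of volcano data is worth 1,000 $STORE and a day of traffic data is worth 100,000 $STORE, even though a single volcanocoin is worth more than a single trafficcoin.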
Fig. 11 explains the difference between fungible and non-fungible tokens.[4] Datacoins can be either fungible or non-fungible, depending on how developers want to model access to their data; this choice is the developers’ responsibility. For example, if an app produces unique instances of data, such as digital art, access to each instance can be modeled as a non-fungible datacoin. The purchase flow described in Fig. 9 covers both types of coins. The smart contract generated to grant ownership of data describes the type of coin used to access the data. Since the purchase process starts with searching for specific data, access is granted for the searched data, which hides the differences between fungible and non-fungible datacoins from the buyer. The smart contracts used on STORE to purchase access to data can be compared to the ERC998[5] standard.
We estimate that the majority of tokenized apps will issue fungible tokens, which exhibit the following properties.
The cooperating apps negotiate an exchange rate between their datacoins. The purchase contract works exactly as described before, granting apps authorizations to each other’s data.
When one app accesses another’s data, any required payment is triggered automatically before data access is allowed. Fig. 13 illustrates an example of what happens when App1 wants to use App2’s data.
App developers can purchase network resources such as the memory, CPU, storage, and bandwidth in one of the following two ways:
Paid p2p cloud: In this option, app developers pay STORE miners for the compute resources they need for their apps. This is similar to paying AWS to rent EC2 instances, S3 storage, and other services. App developers may choose on-demand resources or enter into long-term contracts depending on their specific needs.
Zero-fee p2p cloud: In this option, app developers receive compute resources at no upfront cost from STORE miners. Not all apps may be eligible for zero-fee compute resources; STORE miners in the respective Markets evaluate apps for eligibility. In return for receiving zero-fee compute resources, app developers share datacoin revenue with STORE miners. The specifics of revenue sharing vary from one app to another, depending on the negotiation between STORE miners and app developers.
Depending on how developers purchase p2p compute resources, the revenue from selling datacoins may be paid out entirely to app developers or shared in part with STORE miners. Figures 14 and 15 illustrate these two cases. Note that in both cases, the buyer pays in $STORE to purchase datacoins and the revenue is realized in $STORE.
Buyers pay app developers in datacoins to request access to data. This results in datacoin revenue. Depending on how developers purchase p2p compute resources, datacoin revenue may be paid out entirely to app developers or shared in part with STORE miners. Figures 16 and 17 illustrate these two cases.
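The two payout cases can be sketched as a single split function. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the 30% miner share used for the zero-fee case is a made-up figure, since the actual percentage is negotiated per app.

```python
from decimal import Decimal


def split_revenue(revenue: Decimal, miner_share: Decimal = Decimal("0")) -> dict:
    """Split revenue between app developers and STORE miners.

    miner_share is the negotiated fraction paid to miners: 0 in the paid
    p2p cloud case, and some negotiated value in the zero-fee case.
    """
    if not Decimal("0") <= miner_share <= Decimal("1"):
        raise ValueError("miner_share must be between 0 and 1")
    miner_cut = revenue * miner_share
    return {"developer": revenue - miner_cut, "miners": miner_cut}


# Paid p2p cloud: the developer keeps all revenue.
paid = split_revenue(Decimal("1000"))

# Zero-fee p2p cloud with a hypothetical 30% miner share.
zero_fee = split_revenue(Decimal("1000"), miner_share=Decimal("0.3"))
```

The same split applies whether the revenue is denominated in $STORE or in datacoins.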
In practice, buying datacoins and using that purchase to access data are executed in a single step. So, $STORE revenue and datacoin revenue are also paid out in a single step. The exception may be when the buyer wants to buy datacoins for speculation purposes and doesn’t intend to purchase data with them. In this case, the $STORE revenue and the datacoin revenue will be paid out in two separate steps based on when or if the buyer eventually decides to purchase data.
Data creators (in many cases, end users of tokenized apps) may be incentivized by developers through revenue sharing contracts. So, while developers earn 100% of the datacoin revenue, they may share that revenue with data creators, as illustrated in figures 16 and 17. The ⅔ trust model used to secure the data increases transparency: data creators have visibility into how (and if) their data is sold, the number of times it is sold and accessed, and so on. This transparency helps create reasonable revenue sharing contracts where developers want to incentivize users to create more valuable data.
Datacoins have some similarities to ERC20 tokens, but they are different in many ways.
Similarities:
Differences:
Fig. 18 compares datacoins with ERC20 tokens, stablecoins, and other coin types.
STORE cannot prevent data reselling because the buyer can copy the data to their local infrastructure, repackage, and resell it. This is very similar to how products are sold in gray markets in other businesses such as electronic items, phones, computers, etc. What STORE guarantees is integrity of data when it is accessed directly on the STORE platform. The data is guaranteed to have passed schema enforcement and other validations and it is guaranteed to be anonymized of any sensitive information. So, if data quality and such guarantees are critical, buyers will stick to legal access to data on the STORE platform.
All data, including sensitive and private information, can be sold on the STORE platform. So, where is the user privacy then? A company’s tokenized apps are required to annotate sensitive and private data before submitting it to the platform for persistence and datacoin mining. The annotated segment is automatically encrypted at rest, so even unauthorized access to the data doesn’t leak sensitive information. When sensitive information is annotated, additional metadata is created in clear text that describes the classification and categorization of the sensitive data. This metadata assists in data discovery and in context-sensitive anonymization when the data is eventually accessed. If sensitive information is included in the query executed by a purchase contract, it is automatically anonymized using the information provided in the metadata. So, all queries always return safe data. This allows all types of data to be traded without sacrificing users’ privacy.
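The annotate-then-anonymize flow can be illustrated with a small sketch. Everything here is an assumption made for illustration: the field names, the annotation schema, and the "mask" and "drop" schemes are hypothetical, and encryption at rest is omitted entirely.

```python
# A record produced by a tokenized app (hypothetical fields).
record = {"user_email": "alice@example.com", "heart_rate": 72}

# Clear-text metadata describing which fields are sensitive and how to
# anonymize them at query time (hypothetical schema).
annotations = {
    "user_email": {"category": "contact", "anonymize": "mask"},
}


def anonymize(record: dict, annotations: dict) -> dict:
    """Return a copy of the record that is safe to hand to a buyer."""
    safe = {}
    for field, value in record.items():
        meta = annotations.get(field)
        if meta is None:
            safe[field] = value      # not annotated: returned as-is
        elif meta["anonymize"] == "mask":
            safe[field] = "***"      # keep the field, hide the value
        # any other scheme (e.g. "drop") omits the field entirely
    return safe


safe_record = anonymize(record, annotations)
```

The sketch also shows why annotation matters: a field that the app never annotates passes through untouched.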
If app developers fail to annotate sensitive information in their respective apps, the data created by those apps will not be attested by STORE miners. In other words, while the apps may create and sell their data on STORE, the data will not pass the validation rules for sensitive information and hence will not contain the proof of approval by STORE miners. It is a red flag for data buyers because it clearly implies that the apps haven’t fully followed protocol rules. Instead of taking punitive measures, STORE lets the markets decide if the data created by such apps have any value at all.
Cheating can happen in any environment and STORE is no exception. However, STORE’s data discovery APIs also describe if tokenized apps annotate any sensitive information and the schema used to annotate sensitive information. This transparency is what forces app developers to be honest. If the users of a tokenized app know that sensitive and private information is collected, but don’t see that information being annotated, they know that the app is not treating data privacy as it should. The market decides what should happen to the data (and the resulting datacoins) of that tokenized app.
[1] In this case, “company” could mean a large enterprise or a single developer.
[2] Don’t take the coin names too seriously! They are used here for illustration purposes only.
[3] We use the terms “mined”, “created”, and “issued” interchangeably. 1 datacoin is created, mined, or issued when 1MB of data is created by a tokenized app.
[4] Source: https://medium.com/0xcert/fungible-vs-non-fungible-tokens-on-the-blockchain-ab4b12e0181a
[5] https://github.com/ethereum/EIPs/blob/master/EIPS/eip-998.md