There has been increased interest in applying artificial intelligence (AI) in various settings to inform decision-making and facilitate predictive analytics. In recent times, there have also been attempts to utilize blockchain (a peer-to-peer distributed system) to facilitate AI applications, for example, in secure data sharing (for model training), preserving data privacy, and supporting trusted AI decision and decentralized AI. Hence, in this paper, we perform a comprehensive review of how blockchain can benefit AI from these four aspects. Our analysis of 27 English-language articles published between 2018 and 2021 identifies a number of research challenges and opportunities.
1. Introduction
Artificial intelligence (AI), an important branch of computer science, underpins the research and development of the theory(ies), method(s), technology(ies), and application(s) for simulating, extending, and expanding human intelligence. While AI was first proposed in 1956, the interest in AI probably increased significantly after AlphaGo (an AI-based computer program) defeated Lee Sedol, the world Go champion. AI has been applied in diverse settings, ranging from healthcare [1–4] to drug discovery [5] to (medical) image recognition [6–9] to automated driving [10, 11] and so on. McKinsey Global Institute, for example, predicted that the AI market will grow to 13 trillion dollars by 2030 [12].
There are three key elements in AI technology, namely, data, algorithm, and computing power, in the sense that significant data is required for training the algorithm to obtain a classification model, and the training process requires significant computing power. In our big data era, data can come from different sources (e.g., sensor systems, Internet of Things (IoT) devices and systems, and social media platforms) and/or be owned by different stakeholders. This may result in some challenges.
One of the key challenges is isolated data islands, where data from one source/stakeholder is not accessible to others for the training of the AI model, or it is too costly or impractical to collect the large volume of distributed data for centralized processing and training [13, 14]. There is also the risk of a single point of failure in centralized architectures [15], which can lead to data tampering.
In addition, data from different sources may be unstructured and differ in quality. It may also be difficult to determine the source and authenticity of the data. There is also the risk of invalid or malicious data. All these limitations can affect the accuracy of the prediction [16]. For example, in 2017, a group of researchers from MIT demonstrated how one can trick Google’s AI classifier into classifying a 3D-printed turtle as a rifle [17]. It has also been shown that fake biometric features can be used to taint recognition models, to facilitate impersonation and fraud [18]. Such attacks are also referred to as adversarial machine learning in the literature, and it is an ongoing research topic [19–22]. However, it is difficult for conventional AI architectures to screen data effectively and/or track malicious data providers. Conventional centralized AI architectures may also lead to privacy disclosure and data abuse. For example, AI training tasks may need to handle sensitive user data, such as users’ medical data, which may be leaked or even tampered with during the training process. There are also privacy regulations that may restrict the sharing of user data, even for model training. This motivates the design of data sharing/trading platforms [23].
In practice, AI models are created, trained, and used by different entities. The training process is opaque to the users, and users may not fully trust the model they are using. In addition, as AI algorithms become more and more complex, it is difficult for people to understand how the training result is obtained. Hence, a recent trend is to move away from centralized AI approaches to decentralized AI approaches.
Compared with AI, blockchain is a relatively young technology, as it was first proposed by Nakamoto in 2008 [24]. Blockchain, a peer-to-peer distributed system, ensures tamper-proofing through the underlying hash algorithm and timestamp technology. The privacy of data stored on the blockchain is protected through the use of cryptographic algorithms. Through the use of smart contracts, programs can be executed automatically to ensure the credibility of the execution results. Through the consensus mechanism and distributed ledger technology, all nodes can participate in bookkeeping and verify the transactions. The market capitalization of Bitcoin as of February 20, 2020, was approximately 175 billion dollars [25]. As shown in Table 1, these characteristics of blockchain could overcome the challenges faced by AI; hence, this forms the focus of this paper.
There has been some research on how blockchain and AI could be combined. In [26], the authors described how the integration of blockchain and AI can develop a new ecosystem for a decentralized economy through decentralized data storage and management, infrastructure, and AI applications. However, there is a lack of discussion on how the specific technology of blockchain is applied and how privacy is protected. The works in [27–29] focused on the integration of blockchain and AI and their mutual reinforcement. In [30], the authors discussed the feasibility and benefits of combining blockchain and AI for energy cloud management. However, that work does not categorize the available literature for discussion, and the application scenarios it covers are limited.
Many companies have also explored blockchain-based AI applications. SingularityNet [31] applied blockchain technology to build distributed AI trading marketplaces. TraneAi [32] has created a blockchain-based AI platform to speed up the AI training process in a decentralized manner. Neureal [33] is a distributed open source platform for AI that provides peer-to-peer super AI computing. In general, most of the applications are aimed at developing distributed ecosystems and infrastructures through blockchain.
In this paper, we survey the existing literature focusing on the applications of blockchain in AI. Specifically, we searched using keywords such as (blockchain and AI) on major academic databases (e.g., IEEE Xplore, ScienceDirect, ACM Digital Library, and SpringerLink) for articles published in English between February 2018 and January 2021. We located over 500 articles and excluded those that are not directly relevant. Eventually, we included 27 articles for discussion in this paper.
The remainder of the paper is organized as follows. In the next two sections, we briefly introduce blockchain and AI, prior to presenting our review of the applications of blockchain in AI (i.e., data sharing, privacy protection, trusted AI decision, and decentralized AI) in Section 4. Finally, we discuss the findings, research challenges, and identified research opportunities in the last section.
2. Blockchain Technology
The architecture of blockchain mainly comprises the data layer, network layer, consensus layer, incentive layer, contract layer, and application layer; see also Figure 1.
The data layer mainly focuses on the data structure, including the hash function, digital signature, Merkle tree, asymmetric encryption, and other technologies. The most important structure of the data layer is the block, and the block structure is shown in Figure 2. A block consists of both the block header and the block body. The block header contains the Merkle root, the timestamp, and the hash values of the current block and the previous block. The block body mainly contains the transaction information and the Merkle tree. Each transaction is signed by the transaction’s initiator and then processed and verified by the miner. The verified transaction is embedded in the block. The hash values of the transactions are combined in pairs and hashed, and the resulting hash values are again combined in pairs and hashed, until a single value remains: the Merkle root, which is recorded in the block header. Every change to the data of any transaction stored on the blockchain affects the Merkle root. In this way, the tamper-proofing of blockchain can be realized. Every block also stores the hash value of the previous block and a timestamp, resulting in a time-ordered chain.
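The pairwise hashing described above can be sketched in a few lines of Python. This is a minimal illustration and not the exact procedure of any particular blockchain: the single SHA-256 pass, the rule of duplicating the last hash when a level has an odd number of entries, and the sample transactions are all assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash transactions pairwise, level by level, until one hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last hash on odd-sized levels.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical transactions, used only for illustration.
original = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
tampered = [b"alice->bob:50", b"bob->carol:2", b"carol->dave:1"]

root = merkle_root(original)
tampered_root = merkle_root(tampered)
print(root.hex())
print(root != tampered_root)  # changing any transaction changes the root
```

Because only the root is stored in the block header, a verifier can detect a modified transaction by recomputing the root and comparing it with the recorded value.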
The network layer mainly contains the P2P network, the design of the data communication mechanism, and the data verification mechanism. There is no centralized server in the blockchain. All messages propagate between nodes in a peer-to-peer manner. All nodes maintain the blockchain together. One node generates a new block and transmits it to the other nodes. Other nodes store a copy of the block after verification. Subsequent blocks are then generated on the basis of this block. In this way, all nodes maintain a common underlying ledger.
The consensus layer mainly consists of various consensus algorithms. The consensus algorithm is used to determine which node can add new blocks to the main chain. Common consensus algorithms include PoW, PoS, and PBFT.
The incentive layer mainly consists of incentive measures. There is no centralized server in the blockchain, so the secure operation of the blockchain depends on the active participation of every node. At present, the commonly used incentive measures include (1) the reward that comes with each block’s bookkeeping right and (2) the service fee for every transaction. With the development of blockchain, the design of the incentive layer in the future will not be limited to financial rewards but may also aim to achieve common goals.
The contract layer encapsulates a number of scripts, algorithms, and smart contracts to support the programmable features of blockchain. Through the preset rules and conditions, a contract can be executed automatically without a third party. This is the foundation of blockchain trust. The final part is the application layer. It contains all kinds of blockchain applications, including finance, law, audit, and health care.
2.1. Smart Contract
The smart contract was first defined by Szabo in 1994 [34]. Before the introduction of blockchain technology, smart contracts were not adopted on a large scale due to the lack of a reliable execution environment. With the rapid development of distributed ledger technology, especially the large-scale deployment and application of blockchain, people are paying more attention to smart contracts, which are a key part of distributed ledger technology.
A smart contract is a kind of computer protocol that can self-execute, self-enforce, self-verify, and self-constrain the execution of its instructions. It allows transactions to be executed between untrusted or anonymous parties without the need for a trusted third party. These transactions are traceable and irreversible. A smart contract consists of value, address, function, and state. The transaction is taken as input, the corresponding code is executed, and an output event is triggered; then, the state changes according to the function logic [35]. All parties agree on the details of the smart contract in advance, including the scenarios that trigger contract execution, the state transition rules, and the responsibility for breach of contract. Then, the smart contract is deployed on the blockchain in the form of code. After that, when the requirements are satisfied, the smart contract is triggered and automatically executed.
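The input, execution, event, and state transition cycle described above can be illustrated with a small state machine. The sketch below is plain Python rather than code for any real smart contract platform; the `EscrowContract` class, its states, and its transaction fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EscrowContract:
    """Hypothetical escrow agreement modelled as a state machine."""
    buyer: str
    seller: str
    price: int
    state: str = "AWAITING_PAYMENT"
    balance: int = 0
    events: list = field(default_factory=list)

    def execute(self, tx: dict) -> None:
        """Take a transaction as input, run the matching rule, emit an event."""
        action = tx["action"]
        if self.state == "AWAITING_PAYMENT" and action == "pay":
            if tx["sender"] != self.buyer or tx["amount"] != self.price:
                raise ValueError("payment rejected")
            self.balance = tx["amount"]
            self.state = "AWAITING_DELIVERY"
            self.events.append("PaymentReceived")
        elif self.state == "AWAITING_DELIVERY" and action == "confirm_delivery":
            if tx["sender"] != self.buyer:
                raise ValueError("only the buyer can confirm delivery")
            self.balance = 0  # funds are released to the seller
            self.state = "COMPLETE"
            self.events.append("FundsReleased")
        else:
            raise ValueError(f"action {action!r} is not allowed in state {self.state}")


contract = EscrowContract(buyer="alice", seller="bob", price=10)
contract.execute({"sender": "alice", "action": "pay", "amount": 10})
contract.execute({"sender": "alice", "action": "confirm_delivery"})
print(contract.state, contract.events)
```

On a real blockchain, the rules would be deployed as contract code and every node would replay the same transactions, so all honest nodes arrive at the same state.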
Ethereum is the most popular platform for the development of smart contracts [36]. Ethereum smart contracts run as stack-based bytecode on the Ethereum Virtual Machine (EVM); high-level languages such as Solidity and Serpent are often used to write the contracts, which are then compiled to this bytecode. Hyperledger Fabric can also deploy smart contracts, which are called “chaincode.” Chaincode is the single interaction channel with the blockchain and the only source of transaction generation. It is usually developed using Go or Java.
2.2. Consensus Mechanism
There is no trust relationship between the nodes in the blockchain. It is necessary to coordinate these independent nodes in order to share information in such a network. Therefore, the network decides who is the next bookkeeper through relevant protocols so as to reach a consensus; this is the consensus mechanism. The essence of the consensus mechanism is to solve the problem of decentralized trust [37]. It is an important technology for the independent operation of the blockchain, and a good consensus mechanism plays a very important part in the stable functioning of a blockchain system. With an effective consensus mechanism, the nodes can negotiate and build a consistent blockchain structure.
There are two kinds of consensus algorithms. One is for non-Byzantine faults, such as RAFT and Paxos. The other is for the Byzantine generals problem [38], such as PoW, PoS, DPoS, and PBFT. There are two ways to deal with Byzantine faults. One is to limit the likelihood of malicious behavior by increasing the cost of misbehaving, as in PoW and PoS. The PoW algorithm counts computing power as the cost, and the PoS algorithm counts stakes as the cost. The other way is to design certain rules so that, even if there are some malicious nodes, all other nodes can still reach a consensus, as in the practical Byzantine fault tolerance (PBFT) algorithm. Several common consensus algorithms are described as follows:
(1) Proof of Work: the PoW algorithm was first proposed in Bitcoin, and its core idea is the competition of node computing power. A miner obtains the bookkeeping right by consuming a lot of computing power to calculate a hash value that meets the requirements [39]. The block is added to the blockchain after the other nodes validate it. Then, the bookkeeping node is rewarded. In the PoW consensus mechanism, it takes a lot of resources for malicious nodes to subvert the system (they must control more than 50 percent of the network’s computing power). Therefore, it can limit the malicious behavior of malicious nodes. PoW is decentralized, and nodes can enter and leave freely. However, it clearly causes a waste of resources and low efficiency.
(2) Proof of Stake: the PoS algorithm is an alternative that addresses the waste of resources in the PoW algorithm. It reduces the mining difficulty for each node according to the number of tokens the node holds and how long it has held them. To a certain extent, it shortens the time to reach consensus and avoids much of the waste of resources in the PoW algorithm. At the same time, however, PoS benefits wealthy miners and may lead to a near monopoly. Therefore, blockchain projects using the PoS algorithm often have to run the PoW consensus algorithm for a period of time and then convert to PoS, to prevent a large number of stakes from accumulating in a small number of nodes.
(3) Delegated Proof of Stake: the DPoS consensus algorithm is an improvement on the PoS algorithm. The consensus process no longer requires all participating nodes to compete for the bookkeeping rights; instead, some representatives are selected through voting. This greatly improves the efficiency of consensus.
There are also some other consensus algorithms. The comparison of consensus algorithms is shown in Table 2.
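The Proof of Work idea in item (1) above, consuming computing power to find a hash value that meets a requirement, can be sketched as a nonce search. This is a simplified illustration under stated assumptions: the functions `mine` and `verify`, the single SHA-256 pass, the 8-byte nonce encoding, and expressing difficulty as a number of leading zero bits are choices made for the example and do not reproduce Bitcoin’s actual block format.

```python
import hashlib


def block_hash(block_data: bytes, nonce: int) -> bytes:
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()


def mine(block_data: bytes, difficulty_bits: int) -> tuple[int, str]:
    """Try nonces until the hash, read as an integer, falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = block_hash(block_data, nonce)
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Other nodes check the claimed nonce with a single hash computation."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(block_hash(block_data, nonce), "big") < target


nonce, digest_hex = mine(b"example block", difficulty_bits=12)
print(nonce, digest_hex)
print(verify(b"example block", nonce, 12))
```

The asymmetry is the point: finding the nonce takes many hash evaluations on average, while checking it takes one, which is why validation by other nodes is cheap but rewriting history is expensive.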
2.3. Taxonomy
Generally, blockchain can be divided into three types according to the degree of access to blockchain data: public blockchain, private blockchain, and consortium blockchain [40]. The comparison of these three types of blockchain is shown in Table 3.
2.3.1. Public Blockchain
All data stored in the public blockchain are open and transparent to the public, and all nodes can join and leave the blockchain network freely. Everyone can check and verify transactions and also compete for bookkeeping rights. Bitcoin and Ethereum are both public blockchains.
2.3.2. Private Blockchain
The private blockchain is fully controlled by an organization. Not every node is allowed to participate in the blockchain. Only nodes from specific organizations are allowed to join the competition for bookkeeping rights. It has strict permission management for data access.
2.3.3. Consortium Blockchain
The consortium blockchain is a combination of public blockchain and private blockchain. The nodes with permission are selected in advance to take part in the consensus process of the consortium chain. Other nodes can participate in the transactions but cannot obtain bookkeeping rights. The data in the blockchain may be public or private. The consortium chain can be seen as partially decentralized. Hyperledger Fabric is a consortium blockchain.
3. Artificial Intelligence
The research of AI covers a wide range of topics, including machine learning, computer vision, and natural language processing. Among them, machine learning is a crucial technology that allows AI to mimic human thought and behavior, and most current AI applications are based on it. Machine learning has been developed over a long period of time, now has a relatively complete technical framework and mature algorithms, and has given rise to techniques such as deep learning, reinforcement learning, and federated learning.
3.1. Machine Learning
Machine learning was first defined by Samuel [41] in 1959 as “the field of study that gives computers the ability to learn without being explicitly programmed.” As shown in Figure 3, the typical workflow of machine learning involves training and testing. In the training phase, the original data is preprocessed first. Then, feature extraction and model training are carried out based on these data. In the test phase, data preprocessing and feature extraction are also required for the test dataset, after which the test data is analyzed and classified by the trained model.
Machine learning can generally be divided into supervised learning, semisupervised learning, and unsupervised learning. Supervised learning uses labeled data to train the model, which is then used to make predictions. K-nearest neighbor, decision tree, neural network, and SVM are all supervised learning algorithms. Unsupervised learning uses a training dataset with no labels. The key to unsupervised learning is to analyze the hidden structure of the data and find out whether it can be divided into separable groups. Semisupervised learning combines supervised learning with unsupervised learning, using a small amount of labeled data and a large amount of unlabeled data for training and classification.
3.2. Federated Learning
The model training of machine learning needs a large amount of sensitive data, and data privacy is a very important problem. At the same time, data is distributed across different organizations. These decentralized data are often heterogeneous and unbalanced, so it is difficult to integrate them. Google first proposed federated learning in 2016 [42], which combines machine learning with distributed computing. As shown in Figure 4, the data owners train on their local data to obtain their local submodels. Then, they upload the updated parameters to the coordinator, which aggregates the local submodels into the federated model. In federated learning, participants only have to share their own trained model parameters and do not need to share the original data, which can protect data privacy to a certain extent.
4.1. Data Sharing
Data is the most important resource of AI. The quantity and quality of data directly affect the accuracy of AI classification results. However, there are some problems in the process of sharing data. First, the data needed for training is controlled by different stakeholders, who cannot trust each other. It is difficult to authorize or verify the data. Second, there may be malicious users sharing malicious data for certain purposes. There are already some blockchain-based solutions to these problems. A detailed comparison is shown in Table 4.
The work in [43] proposed a blockchain-based, decentralized, and trustless data marketplace. IoT equipment providers and AI solution providers can interact and cooperate transparently through this platform, with user registration, data upload, data search, purchase, payment, and feedback realized through smart contracts.
The authors in [44] proposed the SecNet architecture, which realizes secure data sharing and ownership protection through blockchain. A data storage module and an access control module are included in each SecNet node. The use of the private data center (PDC) [48] allows users to monitor any operation on their own data and thus achieve fine-grained control of data access behavior. Any data ready to be shared is registered in the blockchain, and the access of other parties to the data is also verified and recorded in the blockchain. The economic incentive for sharing data or exchanging security services between different entities is realized through smart contracts.
Singh et al. [46] designed an IoT architecture based on blockchain and AI, called BlockIoTIntelligence. As shown in Figure 5, there are four layers in this framework: device intelligence (DI), edge intelligence (EI), fog intelligence (FI), and cloud intelligence (CI). Each intelligent device in DI is a node in the blockchain. Through the blockchain, data is transmitted between the various IoT devices in a distributed manner, and the IoT devices share their data with the EI. The EI transfers the processed data and the underlying computing tasks to the FI in a distributed manner. In FI, AI technology is mainly used to train models and make decisions. A number of AI-capable fog nodes are connected with the blockchain to share intermediate parameters or architecture information with the CI. The data center, the core component of the CI, also supports AI and is connected with the blockchain to provide decentralized and secure big data analysis services for IoT applications.
The work in [47] proposed a cognitive manufacturing mining process based on blockchain. This paper improves the traditional blockchain network and uses a sidechain-based distributed consensus blockchain network to alleviate the limited storage space of the traditional blockchain. This method stores the data of smart devices in a separate database and then maps the data into the sidechain transactions of blocks. Based on the traceability and tamper-proofing of blockchain, data loss can be effectively prevented at any stage of the cognitive production process.
Robots are used more and more widely nowadays, and they need to interact with each other to improve their abilities. Therefore, secure data sharing between robots is also important. In [45], the authors proposed a data and model sharing framework called RoboChain, which is used for secure data sharing between robots located in different places. Robots continually learn to improve their interaction ability in order to monitor and improve human health. All operations are carried out locally. Personal data does not leave the local hub. After the model is updated, the data is deleted. At the same time, the local repository publishes a “change” to the hub. The hub announces the update to the whole network. Robots at other sites learn through their own local hub that updates are available on the network. After each model update, they need to send a model update transaction to the blockchain. The transaction contains the parameters of the model update and the information of the robots participating in the consensus, so that other members can verify how the model was created.
4.2. Privacy Preserving
Privacy preserving is also a key issue. Protecting private, sensitive data during the sharing process is difficult, which may prevent users from sharing their data. Additionally, the data should be completely controlled by its owner, but at present users have to send their own data to the service provider when using a service, resulting in the abuse of personal data by some big companies. A detailed comparison of the solutions for privacy preserving is shown in Table 5.
The work in [49] proposed a distributed multilayer ledger named DeepLinQ, which enables privacy-preserving data sharing. Taking medical data as an example, the paper proposes a multilayer blockchain architecture that combines the advantages of blockchain (such as full decentralization, the consensus mechanism, and user anonymity) with the current status and actual needs of Electronic Health Records (EHRs): EHRs are already centralized; PoW cannot be used because of efficiency requirements; and a jury and a verification committee need to be introduced. The underlying blockchain retains the existing properties of Ethereum, and the branch layer meets these actual needs through the design of its functions. In the proposed architecture, medical data is kept off-chain, and the blockchain holds pointers that locate the data stored off-chain. The paper presents four ways to design branch layers: (1) add trusted advanced validators; (2) use subgroup signatures; (3) create trusted branches; (4) employ efficient consensus protocols. The proposed architecture gives patients control over their own health information, eliminates users’ concerns about privacy disclosure, and promotes data sharing between different hospitals.
In [50], the authors proposed to run machine learning algorithms on the blockchain, where different nodes in the blockchain each calculate part of the machine learning algorithm and cooperate to complete the whole machine learning task. In the smart home environment, the data of IoT devices is collected to predict users’ actions. For example, a device is automatically turned on when a specific user enters a room. A configuration file is generated for each user and device through association rule mining and the calculation of personalized parameters. The configuration file is stored in IPFS, and the hash of the configuration file is stored on the blockchain. At the same time, the blockchain calculates another hash value based on the transaction data and stores it on a smart hub. When the user enters the room, the smartphone logs into the smart hub. The transaction data in the blockchain is queried using the hash value stored in the smart hub. The user configuration file is then retrieved from IPFS using the hash value stored in the transaction, to determine the setting of a device. The confidentiality and authenticity of users’ data are guaranteed through the blockchain and smart contracts.
Kuo et al. [51] proposed to disseminate partial models and other meta-information through the metadata of transactions on a private blockchain. Each node can initialize, update, evaluate, and forward the model. Both the model and the hash of the model are included in the update transaction. Other transactions include only the hash of the model to reduce the blockchain size. The private chain used in this paper does not provide a mining reward; the motivation for block mining comes from using cross-institution data in a privacy-protected way to improve the accuracy of predictive models. At the same time, this paper proposed a proof-of-information algorithm based on the PoW algorithm. When a site trains the model with its local data and publishes it, the other sites each evaluate the model with their own local data. The site with the highest error wins the “information bid” and then trains the model with its own local data. When a site updates the model and finds that it still has the highest error, the online learning process ends. The current model is the consensus model. The proof-of-information algorithm can use all patients’ data to train the model without having to transmit personal protected health information, which effectively achieves the goal of privacy preserving.
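The “information bid” loop described above can be sketched as follows. This is a loose illustration of the selection rule only, not the algorithm of [51]: the constant-valued model, the mean absolute error, the learning rate, the `max_rounds` safeguard, and the three hypothetical sites are assumptions made for the example, and no blockchain transactions are modelled.

```python
def local_error(model: float, data: list[float]) -> float:
    """Mean absolute error of a constant predictor on one site's local data."""
    return sum(abs(model - y) for y in data) / len(data)


def local_train(model: float, data: list[float], lr: float = 0.5) -> float:
    """One online-learning step toward the local mean; raw data stays on site."""
    local_mean = sum(data) / len(data)
    return model + lr * (local_mean - model)


def proof_of_information(sites: dict, start: str, max_rounds: int = 100):
    """Pass the model to whichever site reports the highest error on it."""
    model = 0.0
    current = start
    trained_by = []
    for _ in range(max_rounds):
        model = local_train(model, sites[current])
        trained_by.append(current)
        # Every site evaluates the published model on its own data.
        errors = {name: local_error(model, data) for name, data in sites.items()}
        winner = max(errors, key=errors.get)
        if winner == current:
            break  # the updating site still has the highest error: stop
        current = winner
    return model, trained_by


# Hypothetical sites with private local observations.
sites = {"A": [1.0, 1.0], "B": [2.0, 2.0], "C": [6.0, 6.0]}
consensus_model, trained_by = proof_of_information(sites, start="A")
print(consensus_model, trained_by)
```

Only the model value and each site’s error are exchanged in this sketch; the local observations themselves are never transmitted.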
In addition to designing the blockchain architecture and consensus mechanisms, cryptography is also often used to realize data privacy preserving in AI. The work in [52] proposed a blockchain-enabled learning data management method. The tamper-proofing of blockchain can help prevent forgery and guarantee the integrity of data. In the schemes proposed in this paper, the original data and the hash of the original data are stored in the blockchain. The AI model verifies data integrity by comparing the hash value of the received original data with the hash value stored in the blockchain.
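The integrity check described above amounts to recomputing a hash and comparing it with the recorded one. The sketch below assumes SHA-256 and uses an in-memory dictionary as a stand-in for on-chain storage; `register`, `verify`, and the record identifier are hypothetical names, not part of the scheme in [52].

```python
import hashlib

# Stand-in for the blockchain: record id -> hash of the original data.
ledger: dict[str, str] = {}


def register(record_id: str, data: bytes) -> None:
    """Record the hash of the original data when it is first shared."""
    ledger[record_id] = hashlib.sha256(data).hexdigest()


def verify(record_id: str, received: bytes) -> bool:
    """Recompute the hash of the received data and compare with the ledger."""
    expected = ledger.get(record_id)
    return expected is not None and expected == hashlib.sha256(received).hexdigest()


register("sample-001", b"label=cat;pixels=...")
print(verify("sample-001", b"label=cat;pixels=..."))  # unmodified data
print(verify("sample-001", b"label=dog;pixels=..."))  # forged label
```

The check is only as trustworthy as the ledger entry, which is why the text relies on the tamper-proofing of blockchain for the stored hash.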
In [53], blockchain is used to build a trustworthy and secure data platform across numerous data sources, and IoT data is encrypted with Paillier encryption and stored on the blockchain. The sensitive information of IoT data providers and the SVM model parameters are kept confidential. Through homomorphic encryption, data providers can update the gradient without knowing the model parameters. In this way, collusion between data providers and data analysts can be avoided. Combined with the tamper-proofing of blockchain, a secure SVM training algorithm is established to address data integrity and privacy issues.
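The property that makes Paillier encryption useful here is that it is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so values can be combined without being decrypted. The toy sketch below illustrates only this property and is not the training protocol of [53]; the small fixed primes are insecure and chosen for readability, and the function names are hypothetical.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy key pair from two primes, using the generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) under public key n with fresh randomness."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n: int, private_key, c: int) -> int:
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


# Toy primes for illustration only; real deployments use much larger primes.
n, private_key = keygen(104729, 1299709)
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, private_key, c_sum))
```

A party holding only the public key can therefore accumulate encrypted contributions, while only the holder of the private key can read the aggregated result.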
In healthcare, both consumers and research companies require personal data to train their deep neural networks. Data security and privacy are two of the key difficulties they encounter. Mamoshina et al. [54] proposed applying blockchain to healthcare to deal with data privacy issues. In the blockchain architecture presented in that article, users can store and sell their own biological data and allow access only to organizations that have paid for it. Data validators are used to check the data and assure the quality of the data provided by users. All interactions (the user uploads the data, the data validator validates the data, and the customer buys the data) are carried out through transactions on the blockchain, so that all data usage activities are tracked fairly. This research also provides a threshold encryption scheme to assure data security and user privacy. Users employ symmetric encryption to encrypt the data and then split the key and distribute it to the blockchain nodes, which act as the key managers. After buying the data, clients obtain enough key shares from the key managers to decrypt it. In long-term data storage, the threshold encryption scheme can effectively address the single point of failure. The compromise of storage or of a single key manager does not lead to data leakage, which effectively ensures data security.
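Splitting a key so that only a sufficient number of key managers can rebuild it is commonly done with Shamir’s secret sharing. The source does not state which scheme [54] uses, so the sketch below is an assumption: a k-of-n split over a prime field, with the prime, the helper names, and the sample key chosen only for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for this toy key


def split_secret(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Create n shares; any k of them are enough to recover the secret."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


symmetric_key = 123456789  # stand-in for the user's data-encryption key
shares = split_secret(symmetric_key, n=5, k=3)  # five key managers, threshold 3
print(recover_secret(shares[:3]) == symmetric_key)
print(recover_secret(shares[2:5]) == symmetric_key)
```

With a threshold of three, the loss or compromise of a single key manager neither reveals the key nor prevents a paying client from reconstructing it.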
4.3. Trusted AI Decision
Different organizations create, train, and use machine learning and AI models. The entities that train the model are different from the entities that provide the data. Failure to pay attention to the data used in training the model may lead to improper results, and the trained model may have limitations if biased data is used. All of the above operations are opaque to users, and users cannot trust the model they are using. Therefore, we need a mechanism to record the whole process of AI (model creation, training data, and training process), and these records must not be changed or forged. Blockchain, a platform that allows numerous participants to share data in a trusted manner, has the characteristics of tamper-proofing and transparency, which are very suitable for recording the whole process of machine learning. A detailed comparison of the solutions for trusted AI decision is shown in Table 6.
The work in [55] proposed managing swarm robots with blockchain to deal with Byzantine robots (robots that exhibit arbitrary errors or malicious behavior). In the proposed scheme, every robot can act as a node or a miner on the blockchain. The operations for registration, publishing opinions, and querying opinions are implemented as smart contract functions: robots register on the blockchain, publish their own opinions, and then obtain the opinions of other robots. A further function reaches consensus among the robots' opinions by applying certain strategy functions. The experiments demonstrated that blockchain can be used to detect Byzantine robots and exclude them from the swarm, so that more accurate decisions are obtained.
In [56], a trusted routing scheme for wireless sensor networks based on blockchain and reinforcement learning was proposed. The routing information (destination address, transmitted data, etc.) exchanged between routing nodes is stored on the blockchain, and the tamper-proof nature of blockchain improves its credibility. First, different data packets are represented by different blockchain tokens. Second, when a routing node joins the blockchain network, it must register through the smart contract. Third, before sending data packets to the next hop, each routing node has to confirm the routing information stored on the blockchain, and the server nodes confirm the information through the consensus mechanism and publish it on the blockchain. Each routing node's learning model then collects information from the blockchain, and a routing scheduling scheme based on reinforcement learning uses this information to help routing nodes make better routing decisions. Thanks to the tamper-proofing, traceability, and consensus mechanism of blockchain, both the routing information and the resulting routing decisions become more credible.
In military operations centers, commanders often have to make timely decisions based on data from multiple intelligence sources. The work in [57] proposed a framework, shown in Figure 6, that combines AI, machine learning, and private blockchain to provide decision support for operations centers. The framework synthesizes multiple data sources through multiple AI agents to predict and evaluate the current decision. Blockchain plays two roles in this framework. First, each AI agent can verify that it is operating on the same blockchain state as the other agents, which ensures that all agents analyze the same dataset and allows the decision results to be evaluated and compared more reliably. Second, the AI agents are rewarded through the blockchain ledger, which encourages them to provide decision support.
The work in [58] took federated learning as an example and proposed using blockchain to record the important processes of AI in an immutable and verifiable manner. Every process is recorded as a transaction on the blockchain.
Wang [59] proposed a trusted ML evaluation framework based on blockchain. ML algorithms are executed automatically through smart contracts, and the credibility of machine learning is achieved by storing data permanently. Model initialization, training, validation, scoring, evaluation, and reward allocation are executed automatically by preset smart contracts, which greatly improves the credibility of the training results.
AI training consumes a large amount of computing power. Therefore, in [60], the authors proposed offloading a large part of the computation of a deep neural network from the cloud server to edge servers and then used blockchain to accomplish two things: first, to encourage the edge servers to accept and complete the computation; second, to ensure the credibility of the edge computing results. Both the embedded device and the edge computing server must pay a deposit to the blockchain, after which the embedded device publishes the computing task. Once an edge server has completed its computing task, it receives a reward from the embedded device, which gives edge servers an incentive to compute. At the same time, the smart contract guarantees the credibility of the edge computing results across all of these steps, thus ensuring the reliability of the trained deep neural network model. In addition, this work proposed using Node.js instead of the EVM to run smart contracts and carry out complex calculations. After Node.js execution, the modified program state and results are returned to the blockchain for storage.
Winnicka and Kęsik [61] also proposed using edge computing to reduce the load on the server when training on very large amounts of data. In addition, they discussed another problem of current artificial neural network training: if the server receives and processes an enormous amount of data in a short space of time, overtraining may occur. To address this problem, the authors applied the blockchain idea to AI. The scheme supports retraining classifiers in case of urgent need by assigning a priority to every task. All tasks are sent through blockchain transactions, and the higher the priority of a task, the higher its reward. When miners receive a task with a higher priority, they can choose to interrupt the current task in order to serve the urgent one. The interrupted task can be broadcast again later, and its training resumes until the task is completed, i.e., until the stop condition is reached. The authors also noted that the more rewards a device has earned, the more tasks it has performed; this reflects the wear of the device to a certain extent, so such a device is more likely to fail.
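The priority-based interruption described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: a single miner, one unit of training work per tick, and an interrupted task that is simply re-queued (standing in for rebroadcasting it). The function name `run_miner` and the tuple layout are hypothetical and do not come from [61].

```python
import heapq

def run_miner(arrivals):
    """Simulate one miner. arrivals: list of (arrival_tick, name, priority, steps_needed).

    Returns the task names in the order in which they finish.
    """
    arrivals = sorted(arrivals)
    queue = []       # heap of (-priority, arrival_tick, name, remaining_steps)
    finished = []
    current = None   # task being trained right now, same tuple layout as the heap
    tick, i = 0, 0
    while i < len(arrivals) or queue or current:
        # Receive every task whose transaction has arrived by this tick.
        while i < len(arrivals) and arrivals[i][0] <= tick:
            t, name, priority, steps = arrivals[i]
            heapq.heappush(queue, (-priority, t, name, steps))
            i += 1
        # Interrupt the current task if a strictly higher-priority task is waiting.
        if current and queue and queue[0][0] < current[0]:
            heapq.heappush(queue, current)  # re-queued, i.e. "broadcast again"
            current = None
        if current is None and queue:
            current = heapq.heappop(queue)
        # Do one unit of training work on the current task.
        if current:
            neg_priority, t, name, remaining = current
            remaining -= 1
            if remaining == 0:          # stop condition reached
                finished.append(name)
                current = None
            else:
                current = (neg_priority, t, name, remaining)
        tick += 1
    return finished
```

For example, if a routine task needing three steps arrives at tick 0 and an urgent task needing two steps arrives at tick 1, the urgent task interrupts the routine one, finishes first, and the routine task then resumes from where it stopped.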
4.4. Decentralized Intelligence
A vast amount of IoT data has been created with the rapid evolution of the IoT. Through AI services, we can obtain training results and models from this massive IoT data. Because IoT devices and edge computing devices are distributed, complex model training tasks generally require collaboration among multiple devices. There are two ways to collaborate. First, different IoT devices or edge devices share data for comprehensive data analysis and prediction (for example, in intelligent surveillance, monitors in different areas need to share data). Second, different IoT devices or edge devices share their own learning models and then aggregate the models, that is, federated learning. A detailed comparison of the solutions for decentralized intelligence is shown in Table 7.
Federated learning is a privacy-preserving distributed machine learning technology. A large number of nodes use their own local data to train their own local models in a distributed manner. They only need to share their models instead of the original data, which can prevent the leakage of sensitive data.
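As a minimal sketch of the model-sharing idea, the function below aggregates local models by a sample-weighted average of their parameters (the widely used federated averaging rule). It assumes each local model is a flat list of floats and that each node reports how many local samples it trained on; the name `federated_average` is hypothetical, and the schemes surveyed here may aggregate differently.

```python
def federated_average(local_models, sample_counts):
    """Weighted average of local model parameters.

    local_models: list of parameter vectors (lists of floats), one per node.
    sample_counts: number of local training samples behind each model.
    """
    total = sum(sample_counts)
    dim = len(local_models[0])
    return [
        sum(model[d] * n for model, n in zip(local_models, sample_counts)) / total
        for d in range(dim)
    ]
```

Only the parameter vectors and sample counts leave each node in this sketch; the raw training data never does, which is the property the paragraph above relies on.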
The work in [62] proposed LearningChain, a decentralized secure learning model. In the LearningChain network, there are two kinds of participants: data holders and computing nodes. Computing nodes help data holders train the learning model and are paid by the data holders. Ultimately, different data holders work together to train a global model. The entire training process is divided into three steps. The first step is the initialization of the blockchain and the establishment of the P2P network. The second step is local gradient computation. The third step is global gradient aggregation: the node that wins the PoW runs an aggregation algorithm, updates the model, and then creates a new block containing the local model and global model information and appends it to the blockchain. The scheme uses pseudo-identities to prevent the disclosure of the data owner's identity information. Data holders can keep changing their pseudo-identity over multiple generations to further protect their identity. When a computing node creates a new block, it checks the validity of the preceding block. If that block is not valid, it checks the block before it, and so on, until a valid block is found; this allows the scheme to resist attacks by Byzantine nodes.
In [63], the authors proposed BEMA, a secure multiparty learning system based on blockchain. The paper considered two types of Byzantine attacks: (1) malicious participants broadcast a malicious local model to other parties in order to change the classification results; (2) malicious participants send malicious calibration information to specific parties to mislead the update process of the local model. With the proposed system, the first type of Byzantine attack can be effectively prevented, and the second type can be kept within an acceptable range. BEMA consists of system initialization, off-chain sample mining, and on-chain mining. In the initialization phase, each participant broadcasts its local model parameters and stores them on the blockchain. Off-chain sample mining encourages every participant to test the learning models on the blockchain with their own local data. They may then find data samples that can calibrate these models and broadcast them to the blockchain. In the on-chain mining phase, miners update the corresponding model with the received calibration data samples after verification.
Although federated learning does not need to share the original data, the privacy of the training data still cannot be completely protected. Some research shows that important information about the original training data may be inferred from the intermediate gradients. Cryptographic techniques are needed to overcome these difficulties. For instance, homomorphic encryption is used to protect data privacy in [71], and secret sharing and symmetric encryption are used to protect data privacy in [72].
The work in [64] proposed a secure and decentralized privacy-preserving deep learning framework. In this framework, participants first reach a consensus on the initial parameters of the collaborative model. The parameters are encrypted and recorded on the blockchain through a transaction signed by all parties. Each participant trains on its local data to obtain an intermediate gradient, which is encrypted and recorded on the blockchain through a transaction. The workers then download these transactions to obtain the participants' intermediate gradients and calculate the new collaborative gradient for the current round through the smart contract. Homomorphic encryption is used here: it is not necessary to decrypt the previous collaborative gradient or the participants' intermediate gradients when calculating the new collaborative gradient. The encrypted weights of the next round can therefore be computed directly on the ciphertexts, and the result is recorded in turn. This work also proposed a consensus protocol called blockwise-BA, which consists of three steps: (1) a leader is chosen randomly to generate a new block by using the cryptographic sortition described in [73]; (2) the new block is verified and accepted by executing a Byzantine agreement protocol among a committee, which consists of the participants whose transactions are included in the new block; and (3) members of the committee inform their neighbors of the new block through a gossip protocol. Through these three steps, all participants reach a consensus.
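The idea of aggregating gradients without decrypting them can be illustrated with a toy additively homomorphic scheme. The sketch below assumes a textbook Paillier cryptosystem with deliberately tiny, insecure parameters and a fixed-point encoding of gradients; it is not the construction used in [64], and all names (`encrypt`, `add_encrypted`, `decrypt`, `SCALE`) are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy parameters: two small primes. Real deployments use primes of 1024+ bits.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                # inverse of LAM modulo N
SCALE = 1000                                        # fixed-point precision for gradients

def encrypt(value):
    """Encrypt one gradient value with generator g = N + 1."""
    m = round(value * SCALE) % N          # negative values wrap around modulo N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

def decrypt(c):
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    if m > N // 2:                        # map back to a signed value
        m -= N
    return m / SCALE
```

A worker holding only ciphertexts could combine them with `add_encrypted`, and only a holder of the private values `LAM` and `MU` could read the aggregated gradient with `decrypt`. Who holds the decryption key, and how it is shared among participants, is a design decision this sketch leaves open.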
With the continuous growth of data generated by the IoT, data processing and analysis need to be moved to edge computing devices to reduce the burden on the cloud. The work in [65] proposed an IoT knowledge market based on edge-AI, which uses a knowledge blockchain (K-chain) for knowledge management and trading. The knowledge market consists of edge-AI nodes and knowledge aggregators (KAGs). AI algorithms are deployed on the edge-AI nodes, which receive IoT data, analyze it, and extract knowledge; each node can be either a buyer or a seller in this market. Edge-AI nodes must upload their encrypted knowledge to nearby KAGs. A KAG is an enhanced base station (BS) with greater resources that aggregates the knowledge of the edge-AI nodes in its region.
K-chain is separated into two subchains: the knowledge management chain (KM-chain) and the knowledge trading chain (KT-chain). KAGs collect the knowledge uploaded within their coverage area, generate knowledge blocks, and store them in KM-chain. A knowledge management smart contract (KMSC) is deployed in KM-chain to automate knowledge management. The consensus algorithm used in KM-chain is called proof of capacity (PoC): the KAG that has contributed the most storage capacity over the preceding period becomes the leader and broadcasts the new knowledge block. KT-chain is used to record knowledge trading. The entire trading process is carried out through a knowledge trading smart contract (KTSC) to ensure trading efficiency and fairness. In addition, this work proposed a new consensus algorithm, Proof of Trading (PoT), which combines PoW and PoS. For a KAG, the total amount of knowledge trading currency (KC) it owns is taken as its stake. The difficulty of the hash puzzle to be solved in the consensus process is then adjusted dynamically according to the stakes owned by the KAGs: the more stake a KAG owns, the easier the hash puzzle it has to solve. Compared with PoW, PoT avoids wasting resources; compared with PoS, it resists certain attacks.
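The stake-dependent puzzle can be sketched as follows. The survey does not give the actual difficulty formula of PoT, so the linear rule in `target_for` is purely an assumption made for illustration, as are the constant `BASE_TARGET`, the factor 3, and the function names. The only property the sketch is meant to show is that a larger stake yields a larger hash target and therefore an easier puzzle.

```python
import hashlib

# Assumption: baseline target requiring roughly 12 leading zero bits of SHA-256.
BASE_TARGET = 1 << 244

def target_for(stake, total_stake):
    """Hypothetical rule: the target grows linearly with the node's share of stake."""
    return BASE_TARGET + BASE_TARGET * 3 * stake // total_stake

def mine(block_data, target):
    """Search for a nonce whose hash, read as an integer, falls below the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1
```

Under this rule a KAG holding 60% of the stake searches against a target 2.8 times the baseline, while a KAG holding 10% searches against 1.3 times the baseline, so the first needs fewer hash attempts on average.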
The work in [66] also deployed AI at the edge of the network. The authors combined blockchain and deep Q-learning and proposed collective Q-learning, which is used to allocate computing and network resources to users. First, each edge node learns locally and trains DNNs. Then, the parameters of the local model and other required data are encapsulated into a transaction, and the learning results are shared through the blockchain. This enables multiple nodes to work together to complete a complex task.
FLchain was proposed in [67] to enhance the security of federated learning and is implemented on Fabric. The scheme uses the concept of a Fabric channel, which provides communication isolation between two or more peers. Entities outside the channel cannot obtain the information inside it, which keeps transactions private. In FLchain, each global model is trained on a different channel. The specific steps are as follows: (1) the device queries the available channels and obtains the channel list; (2) the device selects a channel and registers on it; (3) the device downloads the current global model of the channel from the blockchain; (4) the device computes a local model with its local data; (5) the device sends the local model to the blockchain; (6) after a period of time, the members of the channel update the global model through the consensus algorithm and generate new blocks.
Federated learning can effectively protect privacy because it does not need to share the original data. However, some participants may behave maliciously, which affects the quality of the model, so selecting reliable workers is very important for task publishers. To solve this problem, the authors in [68] proposed using a consortium blockchain to manage workers' reputations. In the proposed architecture, the task publisher evaluates model quality and generates reputation opinions after receiving the local models uploaded by all workers. The reputation opinions are verified by the miners of the consortium blockchain, stored in blocks, and maintained by the consortium blockchain. Because blockchain is decentralized and tamper-proof, reputation opinions stored in blocks cannot be changed. Task publishers can then select reliable workers through the reputation blockchain to improve the quality of the trained model.
In addition to federated learning, there are other schemes for decentralized intelligence. The work in [69] proposed a framework in which anyone can openly query the model's predictions and contribute data to improve the model. That is, the model is trained collaboratively by many contributors. Verification of the incentive mechanism, data upload, and model training are all carried out through the execution of smart contracts. The paper describes in detail the incentive mechanism used to motivate participants to provide useful data. As shown in Figure 7, the model provider saves a deposit as a reward and defines a loss function $L(h, D)$, where $h$ represents the model and $D$ represents the dataset. Each participant must deposit 1 unit of currency. The smart contract initially pays $L(h^{0}, D)$ to the first participant, who updates the model to $h^{1}$ with their own data. The first participant must then pay $L(h^{1}, D)$ to the second participant, and so on, until the last participant pays $L(h^{T}, D)$ to the smart contract. The reward for the $t$-th participant is therefore their own deposit plus $L(h^{t-1}, D) - L(h^{t}, D)$. The better the model trained by participant $t$, the less it needs to pay to the next participant. This motivates participants to provide useful data.
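The payment chain can be made concrete with a short sketch. It assumes, consistent with the description above, that each participant receives the loss of the model they were handed and pays the loss of the model they produce, so a participant's net reward is the loss reduction they achieved. The function names and the example loss values are hypothetical.

```python
def net_rewards(losses):
    """losses[0] is the loss of the initial model; losses[t] is the loss after
    participant t's update. Returns each participant's net reward."""
    return [losses[t - 1] - losses[t] for t in range(1, len(losses))]

def payouts(losses, deposit=1):
    """Amount each participant ends up with: their deposit plus their net reward."""
    return [deposit + r for r in net_rewards(losses)]
```

With hypothetical losses `[10, 7, 8, 4]`, the first participant improves the model and gains 3, the second makes it worse and loses 1 from their deposit, and the third gains 4. The rewards always sum to the total loss reduction between the initial and final model, which is what the model provider's deposit has to cover.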
Kurtulmus and Daniel [70] proposed using blockchain technology and smart contracts to create an automated, anonymous marketplace for trading machine learning models. The process consists of four phases. The first phase is initialization: Bob, the organizer who wants to obtain a machine learning model, creates an Ethereum smart contract and supplies the hash of the data to the smart contract, which triggers a randomization function that generates the indices of the training and test data. Bob then sends the training data and nonce to the smart contract, and the data can be verified against the previously supplied hash. The second phase is submission: solution providers obtain the training data from the smart contract and submit their own trained models. The third phase is evaluation: Bob sends the test data to the smart contract; if Bob does not upload the test data within the specified time, the training data is used for testing. The submitting party calls the evaluation function to submit the evaluation score. The fourth phase is finalization: Bob pays the reward to the best model provider; if there is no best model, the deposit is returned to Bob. This scheme helps users obtain machine learning models at a certain price and automatically trains, evaluates, and trades models through smart contracts. Organizers can collect different models trained by decentralized submitters and select the best one, which greatly improves the efficiency and credibility of model trading.
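The initialization phase relies on a hash commitment: the organizer first publishes only a hash, and the data revealed later is checked against it. A minimal sketch of that check is shown below, assuming SHA-256 over the nonce concatenated with the data; the exact hashing layout used in [70] is not given here, and the names `commit` and `verify` are hypothetical. The real scheme runs this check inside an Ethereum smart contract rather than in Python.

```python
import hashlib
import secrets

def commit(data):
    """Organizer side: produce a commitment to `data` and the nonce needed to open it."""
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(nonce + data).hexdigest()
    return digest, nonce

def verify(digest, data, nonce):
    """Contract side: check that the revealed data and nonce match the commitment."""
    return hashlib.sha256(nonce + data).hexdigest() == digest
```

Because the commitment is fixed before any model is submitted, the organizer cannot later swap in different data to favor or disadvantage a particular submission without the check failing.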
5. Conclusion and Future Research
We surveyed the existing literature to understand the potential applications of blockchain in AI. For example, we explained how the different characteristics of blockchain can be used to support data sharing, privacy preservation, trusted AI decision, and decentralized intelligence.
(i) First, as a decentralized platform, blockchain allows data owners and data users to share or trade data in a peer-to-peer manner [23]. Because blockchain is transparent and immutable, it can minimize the potential for fraud in distributed data sharing or transactions.
(ii) In addition, the underlying cryptographic algorithms (hash algorithms, homomorphic encryption, threshold encryption, and so forth) used to process data stored on the blockchain help ensure the confidentiality, integrity, and authenticity of sensitive data.
(iii) The use of smart contracts to automate model creation, training, sharing, decision-making, and traceability on blockchain helps ensure the credibility of decision results.
(iv) Incentive mechanisms can be designed on blockchain to promote the cooperation of all participants in completing AI training tasks.
In addition, we identified a number of existing and emerging challenges, which will hopefully guide the future research agenda.
5.1. Identity Privacy
Blockchain privacy preservation covers both identity privacy and transaction privacy. Identity privacy preservation ensures that an attacker cannot match an address on the blockchain to a user's actual identity, and transaction privacy preservation protects sensitive data from being stolen or tampered with. Most existing schemes use blockchain to record the training data and the decision-making process and to protect the privacy of sensitive training data. However, identity privacy preservation is generally ignored.
Bitcoin and Ethereum provide anonymity by using pseudonyms instead of real names for managing and verifying transactions. However, a user's real identity can still be inferred by monitoring the user's transactions [74, 75]. Androulaki et al. [76], for instance, demonstrated how behavior-based clustering can be used to analyze Bitcoin transactions and consequently match 40 percent of student identities to the associated blockchain addresses, even when users had adopted the privacy measures recommended by Bitcoin. Monero uses several techniques to achieve anonymized transactions, such as stealth addresses and ring confidential transactions (RingCT) [77]. However, the number of transactions is limited due to the use of ring signatures.
In addition, achieving blockchain identity privacy preservation can be challenging once one takes into account blockchain and privacy regulation, deployment difficulty, the robustness of the underlying architecture of privacy-preserving schemes, and the impact of such schemes on performance.
This reinforces the importance of designing lightweight privacy-preserving schemes.
5.2. Performance
The scalability of blockchain can be considered in terms of both data storage and transaction rate. In AI applications, significant storage space is needed to record the training data and the generated transactions. However, due to the limited storage space of blockchain, it is impractical to store the complete training data. Some existing schemes use sharding [78], sidechains (see [79] for an overview of sidechains), and other techniques to mitigate the storage limitations of blockchain. In addition, the throughput of most public blockchains is very limited. For example, Bitcoin can handle only 7 transactions per second, whereas Ethereum can handle 7–15 transactions per second. Such rates do not meet the needs of time-sensitive tasks, for instance in a smart grid environment.
Potential solutions include designing a more efficient consensus mechanism, building blockchain-based AI applications on private or consortium blockchains (which can effectively improve throughput), and designing incentive mechanisms that encourage nodes in the network to participate in consensus (which can improve efficiency).
5.3. Security of Smart Contracts
Most blockchain-based AI applications rely on smart contracts to automate the training process. However, smart contracts may contain errors and loopholes [80–82]. For instance, vulnerabilities in the DAO smart contract built on the Ethereum platform were exploited in an attack in 2016, which resulted in the loss of 3.6 million ether [83].
Hence, designing secure smart contracts is a topic of ongoing importance [80, 81, 84]. For instance, can we also design AI-based approaches to identify and fix vulnerabilities in smart contracts?
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.