PoCo #2 — On the use of staking to prevent attacks

The PoCo series of articles discusses the Proof-of-Contribution protocol developed by iExec. The previous article gave a broad view of the protocol. In this second episode, we discuss staking: how it helps prevent attacks and how it fits into the token economy.

I didn’t quite get your last post about agent incentives, can you ELI5?

The iExec platform uses blockchain technology to create a marketplace where people can rent computing power to run ÐApps. In this platform, there are 4 types of agents:

  • Application developers design applications running on the Ethereum/iExec platform
  • Users want to run applications and are therefore buying computing power to execute them
  • Workers execute applications required by the user and are therefore selling computing power
  • Schedulers organize workers into working pools and manage the execution of tasks: handling work distribution, assigning tasks to workers, transferring data, and handling failures.

In the previous article, we have introduced the Proof-of-Contribution (PoCo) protocol, which ensures that the results computed by the workers are valid and can be trusted by the user who required them. The PoCo protocol controls the way multiple workers achieve consensus on a computation result by leveraging several mechanisms: user-adjustable confidence threshold, task replication and voting on results, worker reputation based on computations history, and worker security deposit (staking). Here are the main steps of the protocol, but you can refer to the first article to read a more detailed description:

  • Each application execution comes with a confidence threshold which determines the minimum level of confidence required for a result to be validated.
  • Before performing any computation, the workers have to commit a security deposit (stake).
  • Workers (randomly designated by the scheduler) perform the computation and provide a (signed) hash of the result they computed.
  • The contribution of each worker has a confidence level which depends on the worker’s reputation and stake.
  • When multiple workers provide the same result, their confidences are aggregated to compute this result’s confidence.
  • The task is duplicated until the confidence of one of the proposed results exceeds the confidence threshold. At this point, we have achieved a consensus on the application result:
    – Workers who computed an erroneous result will lose their stake
    – Workers who computed the validated result will share a reward composed of the application payments (from the user) and the stake lost by workers who published an erroneous result.
  • Workers’ reputation is adjusted (a simplified sketch of this consensus loop is given below).
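
To make these steps more concrete, here is a minimal Python sketch of a single consensus round. This is not iExec’s implementation: the way a result’s confidence is aggregated (here, 1 minus the product of each supporter’s residual uncertainty), the random selection of workers and the function and field names are all simplifying assumptions made for illustration.

```python
import hashlib
import random

def consensus_round(workers, compute, threshold=0.99):
    """One illustrative consensus round.

    workers: list of dicts, each with at least a 'confidence' key.
    compute: callback that runs the task on a worker and returns its result string.
    """
    contributions = {}              # result hash -> workers who contributed it
    remaining = list(workers)
    random.shuffle(remaining)       # simplified stand-in for scheduler selection

    while remaining:
        worker = remaining.pop()
        result = compute(worker)                          # worker runs the task
        h = hashlib.sha256(result.encode()).hexdigest()   # hash of the result
        contributions.setdefault(h, []).append(worker)

        # Aggregate the confidence of every proposed result (assumed formula).
        for result_hash, contribs in contributions.items():
            residual = 1.0
            for w in contribs:
                residual *= 1.0 - w["confidence"]
            if 1.0 - residual >= threshold:
                # Consensus reached: these workers are on the winning side.
                return result_hash, contribs

    return None, []                 # pool exhausted without reaching consensus
```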

“You are using a reputation mechanism, this makes you vulnerable to Sybil attacks”

Let me explain why this is not the case. Sybil attacks are attacks that use many identities to subvert the reputation mechanism. Preventing them requires a way to control the creation of new identities. At first glance, our PoCo mechanism could be attacked by many workers trying to take over a consensus.

It is the staking part of PoCo which prevents such attacks from being viable.

The PoCo controls the number of identities anyone can use by requiring them to hold tokens in order to be considered a valid worker. The coordination of workers trying to take over a consensus is limited by the scheduler randomly selecting which workers are working on a given task. In order to perform a Sybil attack, the attacker would have to create many identities and stake enough for those identities to actually be given work by the scheduler.

Therefore, in order to take over x% of the platform, an attacker would have to hold x% of the stake in the worker pool.
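
As a toy illustration of this point, assume the scheduler picks contributors with probability proportional to their stake (a simplification of the real scheduling policy). Splitting one stake across many Sybil identities then gives no advantage at all: only the total stake matters. The pool sizes and stake values below are made up for the example.

```python
import random

def pick_worker(workers):
    """Stake-weighted random choice. workers: list of (worker_id, stake)."""
    total = sum(stake for _, stake in workers)
    r = random.uniform(0, total)
    acc = 0.0
    for worker_id, stake in workers:
        acc += stake
        if r <= acc:
            return worker_id
    return workers[-1][0]

# One honest worker holding 90 tokens versus an attacker splitting 10 tokens
# across 100 Sybil identities: the attacker is still selected only ~10% of the time.
pool = [("honest", 90.0)] + [(f"sybil-{i}", 0.1) for i in range(100)]
picks = [pick_worker(pool) for _ in range(10_000)]
attacker_share = sum(1 for p in picks if p.startswith("sybil")) / len(picks)
print(f"attacker selected in {attacker_share:.1%} of draws")
```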

Why do I have to stake to participate as a worker?

The first question that might come to mind is “As a worker, how much am I getting for the work I do?”. A simple answer would be “your share based on how much you contributed to the consensus”. There are different ways to compute what a “fair share” is, and the choice of formula is made by the scheduler and could vary between working pools. This “fair share” results from the distribution of the application payment, paid by the user, among the workers. Therefore, the more workers are required to achieve consensus, the more the rewards get subdivided, leading to each worker getting less.

The use of staking avoids this negative effect. By taking tokens from bad contributors, we not only disincentivize wrong computations but also ensure that workers contributing to a disputed consensus can earn as much as they would on any other application.
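
As an illustration, here is one possible payout rule matching this description. It is only a sketch: the exact split is a scheduler’s choice, and we simply assume here that winners share the user’s payment plus the stakes seized from losers equally.

```python
def settle(payment, winners, losers, stake):
    """Equal split of payment + seized stakes among winners; losers forfeit their stake."""
    pot = payment + stake * len(losers)
    reward = pot / len(winners)
    return {w: reward for w in winners}, {l: -stake for l in losers}

# Example: application payment of 1.0, stake of 0.35, three honest winners, one attacker.
gains, losses = settle(1.0, ["w1", "w2", "w3"], ["a1"], 0.35)
print(gains)   # each winner gets (1.0 + 0.35) / 3 = 0.45
print(losses)  # the attacker loses its 0.35 stake
```

With this kind of rule, the seized stake is what compensates honest workers for the extra replication caused by bad contributions.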

In this article, we will evaluate this staking mechanism in the context of different attacks. The attacks described here are performed by coordinated malicious users. They are not willing to do any computation to produce the “good” result (otherwise they would just be honest workers) and instead agree on a false result to submit. By sending the same fake result, they attack the consensus, hoping for their result to reach the confidence threshold required by the application and collect the reward.

We only need to address coordinated attacks, where attackers work together. A similar fraction of uncoordinated attackers would be less threatening to the consensus. We can also notice that if attackers were to misbehave only occasionally, to pass as honest users, this would result in a smaller apparent fraction of attackers, reducing the number of bad contributions attacking the consensus and therefore the effectiveness of the attack. Because PoCo applies security through replication to an unbounded fraction of the tasks being executed, it leaves attackers no room to avoid detection, unlike approaches that rely on occasional spot-checking.

Can you show us what would happen without staking?

As discussed previously, without staking, we expect honest workers to get a smaller reward when contributing to a disputed consensus, making attacks an effective way to annoy them. Meanwhile, attackers would not lose anything and could even turn a profit if they ever managed to win a consensus.

The following graph shows the mean profit per contribution for both attackers (red) and honest workers (green) for various commitment values (to be read as a ratio of the reward given by the user). Along the horizontal axis are different ratios of attackers in the platform, ranging from 0.05% to 50%. For each value, we simulated the execution of 100,000 applications (100,000 independent consensuses) and plotted the mean reward per contribution on this platform.

In these simulations, all workers were given an identical reputation of 90%. The application payment was fixed to a value of 1.
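
For readers who want to play with these numbers, here is a deliberately crude Monte-Carlo sketch in the spirit of our simulations. All of its modelling choices are simplifying assumptions of ours: every worker has the same 90% confidence, attackers always submit the same wrong result, a result wins once its net lead in contributions pushes its aggregated confidence above the threshold, and payouts follow the equal-split rule sketched earlier. Its numbers will not match the graphs, only the general trends.

```python
import random

def simulate(attacker_ratio, stake, threshold=0.99, payment=1.0,
             confidence=0.90, runs=10_000):
    """Mean profit per contribution for honest workers and attackers (toy model)."""
    honest_gain = attacker_gain = 0.0
    honest_count = attacker_count = 0
    for _ in range(runs):
        votes = {"good": 0, "bad": 0}
        contributors = []                       # (side, is_attacker) per contribution
        winner = None
        while winner is None:
            is_attacker = random.random() < attacker_ratio
            side = "bad" if is_attacker else "good"
            votes[side] += 1
            contributors.append((side, is_attacker))
            other = "good" if side == "bad" else "bad"
            lead = votes[side] - votes[other]
            # Assumed aggregation: a net lead of k agreeing contributions of
            # confidence c gives a result confidence of 1 - (1 - c)^k.
            if lead > 0 and 1 - (1 - confidence) ** lead >= threshold:
                winner = side
        losers = sum(1 for s, _ in contributors if s != winner)
        winners = len(contributors) - losers
        reward = (payment + stake * losers) / winners
        for side, is_attacker in contributors:
            gain = reward if side == winner else -stake
            if is_attacker:
                attacker_gain += gain
                attacker_count += 1
            else:
                honest_gain += gain
                honest_count += 1
    return (honest_gain / max(honest_count, 1),
            attacker_gain / max(attacker_count, 1))

# Without staking (stake = 0): the mean attacker profit never drops below zero.
for ratio in (0.05, 0.25, 0.50):
    h, a = simulate(attacker_ratio=ratio, stake=0.0)
    print(f"stake 0.0, attackers {ratio:>4.0%}: honest {h:+.3f}, attacker {a:+.3f}")
```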

There are a few things to observe here:

  • If we increase the confidence threshold required for a consensus to be achieved, each contribution will receive a smaller reward. This is because, even without attackers, more contributions are required to achieve the consensus, and the rewards are therefore distributed among more workers. This is expected and can be solved by tweaking the cost based on the required credibility level.
  • Mean profit for attackers is not negative. While attackers barely get any reward, attacking could still be profitable and would definitely be effective if the only objective is to hurt honest workers.
  • Mean profit for honest workers decreases when there are more attackers. With bad contributions attacking the consensus, more honest contributions are required and the reward gets distributed among more workers.

Can you show us what changes when we add staking?

Using the same simulator, we can look at the effect of staking on the profitability of running a worker. The profitability shown in the following graphs combines the profit made when positively contributing to a consensus and the loss of stake when negatively contributing to one. With the same settings as for the previous graph, we gradually increased the stake committed by workers when submitting the result of an application. Contributions made to the winning side of a consensus keep their stake and are rewarded; contributions made to the losing side are not rewarded and lose the committed stake.

As previously, the cost of running an application is set to 1. The previous section showed what happens without any stake (staked amount is 0). In this section, we considered stakes ranging from 10% to 200% of the application running cost.
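
Re-using the simulate() sketch from the previous section, the same toy experiment can be run with a non-zero stake (the values below mirror the 10% to 200% range but are otherwise arbitrary):

```python
# With staking, the attackers' mean profit is negative for these stake values,
# while the seized stakes top up honest rewards in disputed consensuses
# (toy model, not the article's exact figures).
for stake in (0.35, 0.5, 1.0, 2.0):
    h, a = simulate(attacker_ratio=0.25, stake=stake)
    print(f"stake {stake:.2f}, attackers 25%: honest {h:+.3f}, attacker {a:+.3f}")
```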

When increasing the worker stake, we see a change in the shape of the curve. When the stake is high, the funds confiscated from bad contributors grow faster than the number of additional honest contributions required to reach consensus, so each bad contribution ends up increasing the reward per honest contributor.

In those cases, a high attacker ratio leads to a spike that breaks the equilibrium between contributions: it becomes more profitable to contribute to disputed consensuses, where you can collect the funds committed by bad workers, than to contribute to undisputed ones, where the only reward comes from the basic application payment.

The different graphs visible above show that, in order for all contributions by honest workers to be rewarded equally, without being affected by the presence of attackers, the stake committed with each contribution to a consensus should range between 20% and 50% of the application cost. This amount represents between 1 and 3 times the benefit expected, which we believe is low enough not to frighten workers while ensuring the good economics of the protocol.

How secure are you against coordinated attacks?

In this section, we evaluate how likely an attack is to succeed. We consider that an attack succeeds if the attackers get their fake result accepted by PoCo. In this case, the user obtains an erroneous result, and the honest workers who contributed to the task execution do not receive payment and have their stake seized by the attackers.

As in the previous experiments, we measured the fraction of consensuses won by the attackers, analysing the execution of 100,000 applications for each considered environment.

As expected, the capacity of PoCo to prevent attacks depends on the confidence threshold. A higher confidence threshold means lower susceptibility to attacks at the cost of more contributions required to achieve consensus.

For a confidence threshold of 99% (respectively 99.9%, 99.99%, 99.999%), the smallest ratio of attackers for which we observed a bad consensus (1 in 100,000) was 1.238% (respectively 1.811%, 5.376%, 11.513%).

Number of consensuses won by the attackers during our simulations

While an erroneous result achieving consensus is technically possible (it cannot be guaranteed never to happen), it is extremely unlikely. Meanwhile, attackers would lose a lot of money, as their committed funds would be redistributed to honest workers.

Figures in the previous section show that, in order to not lose money (red curve above 0), an attacker would have to control over ⅓ of the workers in a pool. Even if they ever did, they could earn much more by using all those agents (and staked funds) to do actual work for the platform.

While increasing the confidence threshold can provide protection against large attacks, it also has a price. In our simulations, requiring a confidence score of 99.999% to achieve consensus doubles the number of workers required to achieve consensus (on a platform without any attacker) compared to a targeted score of 99%. This means that, in order for the worker to get the same reward, the users would have to pay twice as much for the extra confidence.

In addition to security, changing the confidence threshold also impacts the execution pattern. Using a confidence threshold of 90% or less would probably allow a worker to achieve consensus alone, thus returning a result with minimum delay. This would provide a QoS (quality of service) similar to usual cloud providers.

On the other hand, using a confidence threshold above 99.99% would prevent any worker from achieving consensus alone and would lead to systematic replication, increasing confidence in the result but also the cost of running the application and the delay before consensus is achieved and the result can be sent back to the user.
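
Under the same simple aggregation assumption used in the sketches above (k agreeing contributions of individual confidence c give a result confidence of 1 - (1 - c)^k), the link between the threshold and the amount of replication can be estimated in a few lines:

```python
def contributions_needed(worker_confidence, threshold):
    """Smallest number of agreeing contributions reaching the threshold
    under the assumed aggregation 1 - (1 - c)^k."""
    k, aggregated = 0, 0.0
    while aggregated < threshold:
        k += 1
        aggregated = 1 - (1 - worker_confidence) ** k
    return k

# With 90%-confidence workers: a 90% threshold is met by a single worker
# (consensus alone), while 99% and 99.99% force replication
# (2 and 4 workers in this toy model).
for t in (0.90, 0.99, 0.9999):
    print(t, contributions_needed(0.90, t))
```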

What about erroneous ÐApps executions (bugs)?

In the previous section, all bad contributions were originating from attackers, and honest workers were all providing the same result. However, the complexity of application software, their runtime and the hardware they are executed on can lead to bugs or data corruption which do not cause the program to crash but could affect the results produced.

We will designate these potential events as “erroneous executions” of the application. In the worst case, where these erroneous executions were to systematically produce the same (bad) results, we would be facing something similar to the coordinated attack described previously. We showed that if bad results are provided less than 1% of the time, then we can be confident the consensus mechanism will perform correctly.

From an economic point of view, erroneous executions will only marginally redistribute some funds between lucky and unlucky workers. On average, workers will still be rewarded the same amount. And just as these erroneous executions do not affect the long-term profitability of running a worker, they should not have a lasting negative effect on reputation either: they should not prevent an honest worker from growing on the platform.

Therefore, rather than resetting the credibility to 0 when a bad answer is submitted, we should consider losing only part of the history, either through a ratio (e.g. losing 50% of your bonus) or through a flat rate (e.g. losing 50 points). Provided they still have enough stake, workers would be able to rapidly regain the lost reputation. For old workers, who have already gathered RLC through their work, such events would be virtually invisible.
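
Both options are straightforward to express; the snippet below is only an illustration of the two penalty schemes mentioned above, with made-up numbers, not a decided policy:

```python
def penalize_ratio(reputation, ratio=0.5):
    """Lose a fraction of the current reputation score (e.g. 50%)."""
    return reputation * (1 - ratio)

def penalize_flat(reputation, points=50):
    """Lose a fixed number of reputation points, never going below zero."""
    return max(reputation - points, 0)

print(penalize_ratio(400))  # 200.0: an established worker keeps half its history
print(penalize_flat(400))   # 350: barely noticeable for an established worker
print(penalize_flat(60))    # 10: a young worker is hit much harder
```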

The only circumstances where an erroneous execution would have a real impact on a worker would be if it were to strike a young worker, with a low initial stake, before they had time to grow and gather enough momentum (funds and reputation).

Still, efforts are expected from ÐApp developers to write robust code. Worker pools could also use a whitelist/blacklist mechanism to restrict which ÐApps are executed by their workers. While it is not realistic to audit the code of each application, it is possible to use replication to verify the application correctness. More details on that below.

What about attacker-ÐApp coordination?

A more sophisticated attack scheme, discussed by u/mreima on Slack and Reddit, would be for attackers to use a specific application to try to steal the stakes of other workers. The idea is that, if the application can detect whether it is running on one of the attacker’s workers or not, it could leverage this knowledge to perform an attack: if it is being executed on one of the attacker’s workers, the application would always provide the correct result, but if it is running on another worker, it could fail to provide the correct result with some probability.

While we should prevent the applications from accessing information identifying the workers, it might be possible for the attacker workers to intentionally leak information to the applications. Nevertheless, if the application runtime is the same during the application validation and later, during the execution on an honest worker, the application should not be able to evade the validation process and we should be able to detect a high probability of erroneous executions (signalling that the application is not behaving correctly).

But is such an attack more dangerous than the erroneous executions described before?

We discussed that bugs in applications cause variation in the profit of workers, but the mean profit stays very similar and should not discourage workers. However, this is only true if all workers are identical. If some workers (those owned by the attacker) are guaranteed to never be on the losing side, they get an “unfair advantage” and steal money from the other workers.

However, such an application, designed to perform an attack, would require users to pay for its execution. Simulations show that, for the stake values discussed earlier (0.2x to 0.5x the application cost), the cost of paying for those executions far outweighs the benefit of the attack. In the end, the honest workers would gain more from the payments made by the attacker to run the fraudulent app than the attacker would steal from them in stakes.

That attack is therefore not viable if the application is designed specifically for the attack. It could be viable in the context of a useful application used by many users. However, such an application already gets part of the reward, and its reputation among users of the platform would be ruined if such behaviour were ever proven. Open-source applications would be a plus for preventing this type of attack. Once again, it is the scheduler’s choice which applications can run on the working pool it manages.

Distribution of gains for a potential attacker owning ¼ of the platform.

As shown in the graph above, a potential attacker owning 25% of the platform would, on average, win 25% of the funds exchanged through the platform, regardless of the presence of bugs in the application (Good ÐApp/Bad ÐApp scenarios). On the other hand, neither attacking a good application nor attacking through a specifically designed bad application would be profitable in the long run.

How could you verify ÐApps?

A simple way to statistically detect bad application behaviour is to run the application n times in a controlled environment. In our case, we are focused on bad behaviour leading to the application returning an erroneous result. If, among n replicated runs (with the same input), we observe 2 or more distinct results, something went wrong and the application is obviously not bug-free.

However, what happens if all replicated runs gave us the same result? What does that say about the application? If we note p the probability that the application provides this result (works correctly), and x = 1 - p the probability of obtaining a different result, then the probability of obtaining the same “good” result n times in a row is p^n. That means that the probability y of detecting non-determinism by executing n runs of the application is:

y = 1 - p^n = 1 - (1 - x)^n

We can also deduce the number of repetitions required to achieve a given confidence level:

n ≥ ln(1 - y) / ln(1 - x)

These equations give us a good idea of the number of repetitions required before an application (and its runtime) can be validated. For example, if we want a 99.9% certainty of detecting an application that fails 1% of the time (x = 0.01, y = 0.999), we have to do at least:

n = ln(1 - 0.999) / ln(1 - 0.01) ≈ 688 runs
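
The same computation in a couple of lines of Python, as a sanity check of the formula above:

```python
import math

def required_runs(x, y):
    """Smallest n such that 1 - (1 - x)**n >= y, i.e. n >= ln(1 - y) / ln(1 - x)."""
    return math.ceil(math.log(1 - y) / math.log(1 - x))

print(required_runs(x=0.01, y=0.999))  # 688 runs for the example above
```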

Number of required repetitions depending on the required values for x and y.

Toward ÐApp ranking and reputation

As with all other agents contributing to the iExec platform, trust in the applications is paramount. Users want them to consistently provide the right result so that consensus can be reached and the result can be returned as soon as possible. Workers also want them to provide consistent results in order to avoid the loss they would suffer if they were to produce an erroneous result. They also want to make sure their system will not be hurt by a misbehaving application.

This means applications will have to be benchmarked so that their runtime can be predicted from the input they are provided. This is essential to determine the price of running an application. It will also be useful to protect workers against infinite loops. Applications should be designed to detect invalid inputs that may cause them to loop indefinitely. Yet, since the halting problem means termination can never be guaranteed in general, we must control the application runtime, and potentially stop it, in order to protect the workers.
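
As an illustration of such a runtime guard, a worker-side wrapper could look like the sketch below. The command name and the time budget are made up for the example; the point is simply that the worker, not the application, decides when to stop.

```python
import subprocess

def run_with_budget(command, budget_seconds):
    """Run an application with a hard timeout and report failure if it exceeds it."""
    try:
        completed = subprocess.run(command, capture_output=True,
                                   timeout=budget_seconds, check=False)
        return completed.returncode, completed.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; we report a failure instead
        # of letting the worker hang on a (possibly) non-terminating application.
        return None, b""

# Hypothetical usage: budget derived from the application's benchmarked runtime.
returncode, output = run_with_budget(["./dapp-binary", "input.dat"], budget_seconds=60)
```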

An application’s history of past behaviour, together with the results of the validation by replication discussed earlier, could be used to determine its reputation. We can even imagine this reputation being shown in the Ðapp Store using badges like Gold/Silver/Bronze. By achieving a high reputation, applications would gain visibility and be accepted by more working pools, which could use these badges as a whitelist mechanism.