Why Arweave is Doomed to Fail

Arweave is a new blockchain technology that aims to solve the ever-growing problem of storage space. It claims to be able to store information permanently in the cloud for a single fee charged at the start. This is in contrast to blockchains such as Siacoin, which charge a fee per month.

There’s one problem with this approach: suppose there is a total of 10TB of storage space in the world. Since the Arweave chain always increases in size and never decreases, it will eventually reach this limit. And once it does, it will no longer be possible to store the chain at all. The end result is that there is no option but to abandon it – since the entirety of the chain is needed to make it valid, it will become impossible to use. Arweave is a broken idea, and should be abandoned.

Alternative coins that charge fees per time period have a much better solution: the fee can simply increase as the available storage space decreases. This lets the market decide who gets to keep their data in the cloud – a much more natural solution. Instead of the entire thing imploding, those who no longer need their data in the cloud can simply delete it.

Amongst all the hype surrounding the launch of any new technology where those who say positive things about it are rewarded, you will find countless people who proclaim it will solve problems that it cannot solve. When financial incentives are given to those who say positive things but nothing is given to those who take a more critical approach, the public’s understanding of what can realistically be accomplished is warped.


How to prove P = NP

In this latest post, I will outline the progress I have made. I have realised that there is no simple way to do multiplication in a reversible computer, because the known multiplication algorithms all rely on the construction and destruction of bits. This cannot be done in a reversible computer without adding extra state to the machine – state that is potentially impossible to know if you want to start at the end, where you would have to guess where the excess bits ended up. It may be viable to do this to solve the Integer Factorization problem, but it would not be a flawless victory, and that’s what I’m looking for.

I’ve been focused more on proving directly that P = NP by tackling an NP-complete problem. For the longest time I thought the solution revolved around starting at the final state then reversing backwards to reach the initial state, but the real solution is even more profound than that. The key is to start at the beginning then reverse to the end. This is because in a reversible computer with a finite number of states, any given state is guaranteed to loop. Irreversible computers will also inevitably loop, but the state you input, unlike in reversible computers, is not necessarily part of that loop.

My main focus is on the boolean satisfiability problem. My idea is basically as follows: use a traditional algorithm with exponential time complexity to iterate through all potential inputs to check whether they are solutions, then once a solution is found, return the bits to their original state. It’s the final part that’s tricky. As a many-to-1 function is not possible in a reversible computer, returning the final correct output to the original state is difficult. There can be no simple sorting of the bits so they’re ordered as they were in the beginning, as that would be a many-to-1 function – it fundamentally cannot work. Therefore an algorithm must be used that is 1-to-1 and returns them, and my suspicion is that the boolean test itself is that algorithm. In other words, I need to reuse the boolean test in some way in order to sort the bits to their original state. I have yet to figure out precisely how to do this though. It may involve things like using it in its inverse form, or in its reflected form (so that whatever was input can be returned directly), or some combination of these things.

Efficiently Factoring Integers

Integer Factorization is thought to belong to a class of problems known as NP-intermediate. This class lies between the easy P class (polynomial class) and the difficult NP-complete class. Multiplication is thought to be a one-way function, which means it’s easy to compute in one direction, but very hard in the opposite direction… or so it seems.

Cellular automata are worlds with simple rules that can result in chaotic and complex behaviour. For example, rule 30. Its pyramid is produced by following the 8 rules of the rule table: the 3 cells above determine the cell below them – it is procedurally generated starting from the top and going down.
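As a concrete illustration, rule 30 can be sketched in a few lines of Python. The width and number of rows are arbitrary choices here; the rule table is exactly the 8 entries that the number 30 encodes in binary.

```python
def step(cells, rule=30):
    """Compute the next row from the current one (edges padded with 0s).
    Each new cell is the bit of `rule` selected by the 3 cells above it."""
    padded = [0] + cells + [0]
    return [
        (rule >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]

def pyramid(rows=8):
    """Start from a single live cell and print the familiar rule-30 pyramid."""
    row = [0] * rows + [1] + [0] * rows
    for _ in range(rows):
        print("".join("#" if c else "." for c in row))
        row = step(row)

pyramid()
```

The same `step` function covers rule 51 mentioned below: `step(row, rule=51)` simply flips every centre bit, which is why knowing the current state lets you recover the previous one.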

What’s even more interesting is that some of these rule sets are Turing complete. This means that given the right input (the right set of starting points), they can compute anything. What’s more, some cellular automata are reversible: if you know the current state, you can compute the previous state. An example of this is rule 51, in which bits are flipped. Unfortunately this particular rule isn’t both Turing complete and reversible, which is what is desired. But what’s so important about combining these two things? Consider the following thought experiment.

Person A multiplies two large primes to get a semiprime (a number which has exactly 2 prime factors), tells person B what the semiprime is, and asks him to figure out which two numbers he multiplied (i.e. to factor the semiprime). Person B is aware of the location and velocity of every atom in the universe. Trying to figure out what the 2 primes were in the traditional way would take exponential time, but were person B to simply compute the past by simulating the universe going backwards in time, it would only take as long as it took A to multiply those two numbers (in terms of the complexity class, given that reversing time for x seconds takes an amount of time proportional to x). In other words, by focusing on reversing time we can transform an exponentially difficult problem into a merely polynomially difficult one.

Taking this idea and applying it in a practical way can be done as follows: design a computer inside a Turing complete, reversible cellular automaton that takes 2 numbers as input, tests that they are prime using the AKS primality test, sorts the numbers in ascending order, then multiplies them and outputs the result. To factor a semiprime, you need only set the semiprime as the output and set it to run backwards in time, and the bits would flow back into the multiplying machine and 2 primes would form as if by magic.

Were this algorithm to work, it would not on its own prove that P = NP. Only a polynomial-time algorithm for an NP-complete problem could do this. However, this would be an important step forward, as it would be the first algorithm to solve what was thought to be an NP-intermediate problem. An interesting fact is that if P != NP there must be at least one problem that belongs in the class NP-intermediate, so revealing that one of these problems actually belongs in P would lower confidence that P != NP.

It would still be a significantly powerful algorithm to invent. If the totient function were vulnerable to being solved in the same way (and I suspect it is), it would become trivial to break RSA encryption. That means you could hack into virtually any system online with ease. You would be able to easily acquire the login details of any user, such as their Gmail account or their online banking account. Online banking systems would be totally insecure.

One seeming contradiction in all of this is that logic gates in computers are normally irreversible. NAND, XOR and OR gates all take 2 inputs and return 1 output. In an OR gate, the inputs (1, 1) and (1, 0) both produce (1), so how can a reversible machine magically know which input produced it? It can’t. This is why traditional logic gates cannot be used. Instead, a more advanced form of logic gate is used: the Fredkin gate. Its special property is that the number of 1s and 0s in the input exactly matches the number in the output – everything is conserved. These gates are Turing complete, so they can be used to do any arbitrary calculation.
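A minimal sketch of the Fredkin (controlled-swap) gate, assuming the usual convention that the first input is the control: it has 3 inputs and 3 outputs, swaps the two data bits when the control is 1, conserves the number of 1s, and is its own inverse.

```python
def fredkin(c, a, b):
    """Fredkin gate: swap the data bits a and b when the control c is 1."""
    return (c, b, a) if c == 1 else (c, a, b)

# Check the two properties the text relies on, over all 8 possible inputs.
for bits in [(c, a, b) for c in (0, 1) for a in (0, 1) for b in (0, 1)]:
    assert fredkin(*fredkin(*bits)) == bits   # reversible: own inverse
    assert sum(fredkin(*bits)) == sum(bits)   # conservative: 1s preserved

def AND(a, b):
    """AND built from a single Fredkin gate by fixing the third input to 0."""
    return fredkin(a, b, 0)[2]
```

Building AND this way is the standard trick showing the gate is universal: the irreversibility of ordinary AND is paid for with the extra "garbage" outputs that are kept around rather than destroyed.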


I have always wondered why setting goals is advantageous. Instead of saying “I want to lose 10 pounds this month”, why not just attempt to lose as much weight as possible? An empty argument for this is “but then if I did lose more than 10 pounds I wouldn’t feel satisfied, since I didn’t achieve a goal I set myself”. This is quite bizarre thinking. Goals make you happy because of the effect they have on the universe. You’re happy because you lost weight, not because you achieved a goal. But there is a small amount of truth to this. I have finally understood the nature of goals.

Goals are self-fulfilling prophecies. As you achieve more goals, the power of goals as self-fulfilling prophecies increases. This is because your confidence in them increases, and in the realm of the mind, your expectation of achieving a goal increases the probability that you will succeed at that goal. That’s the nature of it.

It would therefore be a bad idea to regularly set yourself goals that you can never achieve, such as losing 100 pounds in a month. The moment you set such a goal, you already strongly believe you won’t complete it, because you’ve never completed any of your previous goals.

This is why setting realistic goals is beneficial. Setting goals which are too easy is a waste of time. You want to be at the point where setting the goal makes the difference between achieving it or not. You would have lost 1 pound in a month without setting yourself that goal, making the process of setting such a goal wasteful. You need to set goals that you wouldn’t have achieved without setting them.

With this new understanding of goals, it surprisingly does make sense to be happy that you completed a goal, because it gives you more power over your decisions. Your belief in goals increases as a result of finishing a goal, so goals you set in the future will have a greater probability of success.

Optimizing the goals you set for yourself is key. I recommend setting goals which you think have a 50% chance of success to maximize your growth. The further you stray from 50% the less influence you have over whether it is completed or not. If you think your chance of success is 99% then it doesn’t really matter what you do, you’re going to succeed anyway. If your chance of success is 1% then it doesn’t really matter what you do, you’re going to fail anyway.

With this new understanding of goals, I began to value them a lot more, and I started to build a program to visualize goals. The basic idea is for it to visualize the probability that you succeed across a range of goals by analyzing goals you have input in the past, along with a record of whether each goal was completed. It uses the information in a very strict way. It can only work with goals that are time based, such as “jog for 30 hours in the next 2 weeks” or “spend 40 hours per week learning to program”.

Here is an image of the program.


This graph visualizes 2 goals, one was successful and the other was not. The goals shown were the following:

8 hours work in 24 hours: Success

12 hours work in 24 hours: Failure

Before I go further explaining what is shown, I need to explain the logical deductions about the ability of this person to work using the information from a single goal. Here are the simple deductions:

If a person does x hours of work in y hours, he must therefore be able to do x hours of work in z hours where z >= y. This is fairly intuitive – it wouldn’t make any sense for a person not to be able to do the same amount of work given more time.

If a person does x hours of work in y hours, he must therefore be able to do Maximum(0, x-z) hours of work in y-z hours where z <= y. This one’s slightly harder to grasp. The basic idea is to think about the worst-case scenario of how a worker might complete his work and plan for that. Perhaps there is a worker who always waits 16 hours before starting his goal. If he set a goal to do 8 hours of work in 24 hours, he would succeed. But this deduction also means he can do 7 hours of work in 23 hours, 6 hours of work in 22 hours, and so on.
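The two deductions above can be sketched as a single predicate. This is only an illustrative sketch with hypothetical names, assuming a completed goal is recorded as a (work hours, time limit) pair.

```python
def achievable(work, time, done_work, done_time):
    """True if (work, time) is implied achievable by one completed goal
    (done_work hours of work finished within done_time hours)."""
    # Deduction 1: the same (or less) work is achievable given at least
    # as much time.
    if work <= done_work and time >= done_time:
        return True
    # Deduction 2: with z fewer hours available, at least
    # max(0, done_work - z) hours of work are still achievable.
    if time <= done_time and work <= max(0, done_work - (done_time - time)):
        return True
    return False

# The example from the text: 8 hours of work in 24 hours succeeded,
# so 7 hours in 23 and 6 hours in 22 are also implied achievable.
assert achievable(7, 23, 8, 24)
assert achievable(6, 22, 8, 24)
# But 8 hours in 23 hours is NOT implied: the worst-case worker might
# always wait 16 hours before starting.
assert not achievable(8, 23, 8, 24)
```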

These logical deductions can basically never be wrong without weird assumptions (such as a person being able to do more work with a smaller time limit). It could be true that a person is more motivated when there’s less time left, but if you’re given more time then you’re inevitably going to end up with a small amount of time left at some point anyway.

Back to the graph. Here it is again with the goals annotated.


It could be that a worker does do more work given less time, but that isn’t calculated by the algorithm. It just assigns an unknown value for that amount of time.

If the areas overlap, they mix colors. If green overlaps red it becomes yellow.



This is how the graph looks once I’ve added the goal 14 hours work in 20 hours, success. The old success goal has been totally eclipsed (because the new goal does more work in less time) and there has been some overlap, creating a yellow area to represent a 50% chance of success. In reality the edge would never go straight from 50% to 100% – it would be a gradient – but this will only show once the user has completed many goals, so that it becomes a gradient naturally. Trying to produce the gradient with an algorithm would result in incorrect values: it’s impossible to know just how the transition from 0% success to 100% success will look. There’s not enough data to do this (and there never will be) without potentially creating bias. By bias I mean some users could look better than other users, even though they’re just as capable overall, as a result of how the algorithm might change their results.

I’m still working on this; it’s a growing project. The hardest part is trying to get goals to work together. Setting goals of different types is quite tricky, since one goal may be more important than another. The basic problem is that if you set two different goal types in the same period, it makes you look worse than if you had set just one of those goals alone, since you’d have had more time for it (you wouldn’t spend any time on a goal you didn’t set), but it shows up on the graph the same. This is why I haven’t yet fully figured out how to use multiple goals together.

I suspect the solution is for the user to input their ideal ratio of goals: for every 2 hours of programming, do 1 hour of jogging, and so on. And I suspect a person’s ideal ratio will change according to how much time they have. If it’s only 20 hours in the next week, all of that should be sleep (sleeping is a goal – you aim to sleep when you go to bed) – it would be crazy to try to get by on less sleep than that.

Programming Puzzle: Unbiased Selection Algorithms


There is a set of sets of characters. Your task is to produce an algorithm which outputs a set of characters by taking exactly one random character from each of the sets, with each possible output having probability 1/(total possible outputs). There is one rule: once you select a character from a set, that character can’t be selected from any of the remaining sets. For example, with
{{A, B, C} {A, B}}, if you choose A from the first set, the legal choices from the second set become {B} – choosing A would result in a collision, and this is not legal. It is assumed that there is always a solution – you never have to worry about there being no possible solution (so you will not encounter sets such as {{A},{A}}).
The potential combinations are {A,B},{B,A},{C,A},{C,B}. (Note that {B,C} and similar are not possible since C is not in the 2nd set; the output is in the same order as the sets.)
You must make an algorithm which picks these potential outputs with equal probability.
A naive and incorrect solution works as follows:

0. Create an empty set for the output.
1. Start at the first set of characters.
2. Generate a random number from 0 inclusive to the cardinality of the set of characters exclusive.
3. Select the character at that index and append it to the output set.
4. Remove that character from the remaining sets which have not yet had a character picked from them.
5. If the cardinality of the output set is the same as the cardinality of the set of sets, terminate. Else, move to the next set of characters and go to step 2.
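Transcribed as code, the naive procedure might look like this. A sketch only: it assumes the input arrives as an ordered list of sets, and it relies on the puzzle's guarantee that a legal choice always remains.

```python
import random
from collections import Counter

def naive_select(sets):
    """The naive algorithm: pick uniformly within each set in order,
    removing the chosen character from all sets that come later."""
    sets = [set(s) for s in sets]           # work on copies
    output = []
    for i, s in enumerate(sets):
        choice = random.choice(sorted(s))   # steps 2-3: uniform pick
        output.append(choice)
        for later in sets[i + 1:]:          # step 4: remove from the rest
            later.discard(choice)
    return output

# Tallying many runs exposes the skew: with the order {A,B,C},{A,B},
# the outputs (A,B) and (B,A) show up about twice as often as (C,A)
# and (C,B).
counts = Counter(tuple(naive_select([{"A", "B", "C"}, {"A", "B"}]))
                 for _ in range(12000))
print(counts)
```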

The problem with this is obvious: the order of the sets affects the probability of producing certain outputs, which makes it wrong. If the order is {A,B},{A,B,C}, the probabilities of each permutation are as follows:

A B 1/2 * 1/2 = 1/4
A C 1/2 * 1/2 = 1/4
B A 1/2 * 1/2 = 1/4
B C 1/2 * 1/2 = 1/4

But if the order is {A,B,C},{A,B}, the probabilities are:

A B 1/3 * 1 = 1/3
B A 1/3 * 1 = 1/3
C A 1/3 * 1/2 = 1/6
C B 1/3 * 1/2 = 1/6

As you can see, they don’t all have the same probability of being output. This is an error. What is the solution to this problem?
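One brute-force possibility, offered only as a sketch and not necessarily the intended answer: enumerate every legal combination first, then pick one uniformly among them. This is unbiased by construction, though it takes exponential time in the number of sets.

```python
import itertools
import random

def unbiased_select(sets):
    """Enumerate every legal output, then pick one uniformly at random."""
    legal = [combo for combo in itertools.product(*sets)
             if len(set(combo)) == len(combo)]   # no repeated character
    return random.choice(legal)

# For {A,B,C},{A,B} the legal outputs are exactly (A,B), (B,A), (C,A),
# (C,B), each now chosen with probability 1/4 regardless of set order.
print(unbiased_select([("A", "B", "C"), ("A", "B")]))
```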

General Intelligence Does Not Exist

There is a specific use of the word “general” that is entirely inappropriate: intelligence. The problem is that we don’t know – and it may be impossible to know – what constitutes general intelligence. We do not know the significance of what we don’t know.

To assert that an entity has general intelligence would require you to know everything. It’s sort of like how you need to know the entire Earth’s surface area before you can say that someone has explored most of it.

The field of Artificial Intelligence has many people working together towards a “General Intelligence”. What they are really doing is moving towards human-like intelligence. That term is a much more accurate description of what they are working towards.

The strange thing about intelligence is that you cannot understand all the thoughts of an entity smarter than yourself. This means that a human could never intentionally create an AI greater than themselves. It would require some kind of randomness to create the AI. If you keep generating AIs by producing random code, eventually you will produce something more intelligent than yourself.

Ants are smarter than humans when it comes to working in large crowds. Humans often crush each other to death in large crowds. It’s very surprising, to say the least, that ants behave more intelligently than humans here. They have almost no brain compared to us. We can build fusion reactors and smash particles into each other at nearly the speed of light, yet tasks that ants accomplish every day are too much for us.

Swarm intelligence is an intelligence we lack. When we’re being crushed to death by people running away from a fire in a crowded building, people scream, which only makes things harder. I’ve never really understood this response: screaming achieves nothing other than making it more difficult for those around you to communicate. If you were in isolation, screaming would make sense. Anyway, the simple rules that ants follow triumph over our neocortex. It reminds me of cellular automata, in which simple rules can create complex, beautiful behavior.

I strongly advise against using the phrase “general intelligence”. Use “human-like intelligence” instead. Perhaps it will turn out that there’s a huge space of problems, parts of everyday life, that we cannot comprehend – that we’re not even aware of. I sometimes get the idea that, just as we can only view a small part of the electromagnetic spectrum (and the non-visible spectrum has profound consequences on our bodies), perhaps our minds can only ‘see’ a very small space of problems.


Information Visualization Isn’t Easy

Knowing how best to represent a set of quantitative information is not easy. I strongly recommend a quick read of the Wikipedia entry on misleading graphs, and if you want to read a whole book, try The Visual Display of Quantitative Information by Edward R. Tufte. Instead of re-emphasizing the need to display information in the simplest way possible, I’m going to give examples of how not to display information. Sometimes the errors are only small. But those errors are always noticeable, and they affect how people interpret the information. First up, WordPress’s graphs.


The problem here is that when the view count is 0 the colour is grey – it should just be a lighter shade of yellow. The sudden shift from yellow at 1 to grey at 0 suggests there is a fundamental difference between 1 and 0. There isn’t, as I explained in https://emphatious.wordpress.com/2012/10/12/zero/. Wikipedia correctly uses a colour to show a semantic difference: in the following graph, white is used to represent the fact that no information is available.

However, this graph isn’t without its own problems. Can you figure out what the problem is? It’s incredibly difficult to spot, as it is a very deep one. Think about colours.

The most obvious problem is that the graph does not use a gradient. Groups of GDP are lumped together for no reason. There should be no groups: instead of $800-$1600 being one colour, $900 should be a little more red than $850, $950 a bit more red than $900, and so on. They are essentially rounding figures for no reason. If you had a graph showing $11.53 million profit in Q1 and $12.42 million profit in Q2, would you round them? Rounding serves no purpose. All it does is reduce accuracy… nothing is improved.
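The continuous alternative argued for here can be sketched as a direct linear mapping from value to colour intensity. The helper below is a hypothetical illustration, not any real mapping library’s API, and the scale is an arbitrary assumption.

```python
def gdp_to_colour(gdp, max_gdp):
    """Map a GDP value to an (r, g, b) tuple, linearly in the value.
    No bucketing: every distinct value gets its own shade."""
    intensity = int(255 * gdp / max_gdp)
    return (intensity, 0, 0)

# $900 comes out a little redder than $850 - no arbitrary grouping.
assert gdp_to_colour(900, 1600)[0] > gdp_to_colour(850, 1600)[0]
```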

There is a much more subtle assumption made by this graph. It might not be wrong but it isn’t explained. The assumption is that the utility of money increases logarithmically. Look at the groups, they increase logarithmically.

The designers of the graph took it upon themselves to change how the information is shown. Instead of something that’s 10x the size quantitatively being 10x brighter or bigger (10x more intense in some way represented visually) on the graphic, they’ve made it log_10 times brighter. That’s an arbitrary operation done without justification. I expect that if I were to ask the designer of the graphic why he did this, he would reply with the unforgivable line “because if it weren’t like that, it’d be hard to tell the difference between countries with low income – for example, almost all of Africa would be very similar in colour”.
If your goal is to let everyone know the exact GDP of countries, use a table, not a graphic.

It is not a dilemma that people would see only a small difference in the colours of African countries because there is only a small difference in the GDP of African countries.

One last puzzle. Can you spot what’s wrong with this graph?


The problem is that again they’ve grouped things together unnecessarily. But more importantly, look at the way “Barely Dem” and “Barely GOP” are coloured. They differ from the others in that their centers are not coloured. This destroys the semantic meaning of the colours of the graph. Originally the colour alone represented the strength of the likelihood, but now the colour AND the amount of white in the center both represent it, adding complexity without adding any information.

One last error in this graph is that they have changed the size of Alaska. This is only confusing. The graph has been warped by a designer who felt that the shape of Alaska was awkward to work with. This makes the graph entirely subjective. The designer could just as well have changed the shape of all the other states – the integrity of the graphic is entirely destroyed by this decision.