Programming Puzzle: Unbiased Selection Algorithms

Programming Challenge: Unbiased Results

There is a set of sets of characters. Your task is to produce an algorithm which outputs a set of characters by taking exactly one random character from each of the sets of characters with probability 1/(total possible outputs). There is one rule. Once you select a character from a set, that character can’t be selected all remaining sets. For example with
{{A, B, C} {A,B}} if you choose A from the first set, the legal choices from the second set becomes {B} – choosing A would result in a collision and this is not legal. It is assumed that there is always a solution – you never have to worry about there being no possible solutions (so you will not encounter sets such as {{A},{A}}).
The potential combinations are {A,B},{B,A},{C,A},{C,B}. (note that {B,C} and similar is not possible since C is not in the 2nd set (output is in the same order as the sets are in))
You must make an algorithm which can pick these potential outputs with equal probability.
A naive and incorrect solution works as follows:

0. create an empty set for the output
1. start at the first set of characters
2. generate a random number from 0 inclusive to the cardinality of the set of characters exclusive.
3. Select that character and append it to the output set.
4. remove that character from the remaining sets which have not yet had a character picked from them
5. if the cardinality of the output set is the same as the cardinality of the set of sets, terminate. Else, iterate to the next set of characters and go to step 2

The problem of this is obvious. The order of the sets affects the probability of producing certain outputs and makes it wrong. If the order is {A,B}{A,B,C}, the probabilities of each permutation are as follows

A B 1/2 * 1/2
A C 1/2 * 1/2
B A 1/2 * 1/2
B C 1/2 * 1/2

But if the order is {A,B,C},{A,B} the probabilities of output is

A B 1/3 * 1
B A 1/3 * 1
C A 1/3 * 1/2
C B 1/3 * 1/2

as you can see, they don’t all have the same probability of output. this is an error. What is the solution to this problem?


General Intelligence Does Not Exist

There is a specific case of the use of the word general that is entirely inapprpriate – intelligenece. The problem is that we don’t know and possible it may be impossible to know what constitutes as general intelligence. We do not the significance of what we don’t know.

To assert that an entity has general intelligence would require you to know everything. It’s sort of like how you need to know the entire Earth’s surface area before you can say that someone has explored most of it.

Artificial Intelligence has many people in it working together towards a “General Intelligence”. What they are really doing is moving towards human-like intelligence. This term is a much more accurate description of what they are working towards.

The strange thing about intelligence is that you can not understand all the thoughts of an entity smarter than yourself. This means that a human could never intentionally create an AI greater than themselves. It would require some kind of randomness to create the AI. If you keep generating AI by means of simply producing random code, eventually, you will produce something more intelligent than yourself.

Ants are smarter than humans when it comes to working in large crowds. Humans often crush themselves to death in large crowds. It’s very surprising to say the least that ants behave more intelligently than humans. They have almost no brain compared to us. We can build fusion reactors and smash particles into each other at the speed of light but the tasks that ants accomplish every day are too much for us.

Swarm intelligence is an intelligence we lack. When we’re being crushed to death by people running away from a fire in a crowded building, people scream. This makes it hard. I’ve never really understood this response, screaming achieves nothing other than making it more difficult for those around you to communicate. If you were in isolation screaming does make sense. Anyway, the simple rules that ants have triumph over our neocortex. It reminds me of cellular automata in which simple rules can create complex beautiful behavior.

I strongly advise against using the phrase “general intelligence” again. Use human-like intelligence instead. Perhaps it will turn out that there’s a huge space of problems that we cannot comprehend that are parts of everyday life. That we’re not even aware of. I sometimes get the idea that, perhaps just as we can only view a small part of the visible spectrum (and the non-visible spectrum does have profound consequences on our body) perhaps our mind can only ‘see’ a very small space of problems.


Information Visualization Isn’t Easy

Knowing how to best represent a set of quantitative information is not easy. I strongly recommend a quick read of the Wikipedia entry on Misleading Graphs and if you want to read a whole book try The Visual Display of Quantitative Information by Edward R. Tufte. Instead of remphasizing the need for displaying information in the simplest way possible, I’m going to give examples of how not to display information. Sometimes the errors are only small. But those errors will always be notiable, they affect how people interpret the information. First up, WordPress’s graphs


The problem here is that when the view count is 0 the colour is grey – it should just be a lighter shade of yellow. The sudden shift from 1 as yellow to grey as 0 suggests there is a fundamental difference between 1 and 0. There isn’t, as I explained in my my Wikipedia correctly uses a colour to show a semantic difference. White is used to represent the fact that there is no information available in the following graph.Image

However, this graph isn’t without its own problems. Can you figure out what the problem is? It’s incredibly difficult to realize the problem as it is a very deep one. Think about colours.

The most obvious problem is that the graph does not use a gradient. groups of GDP are lumped together for no reason. There should be no groups. instead of $800-$1600 being one colours, $900 should be a little more red than $850, $950 a bit more red than $900 and so on. They are essentially rounding figured for no reason. If you had a graph showing $11.53 million profit in Q1 and $12.42 million profit in Q2, would you round them? Rounding them serves no purpose. All it does is reduce accuracy… nothing is improved.

There is a much more subtle assumption made by this graph. It might not be wrong but it isn’t explained. The assumption is that the utility of money increases logarithmically. Look at the groups, they increase logarithmically.

The designers of the graph took it upon themselves to change how the information is shown. Instead of something that’s 10x the size quantitatively being 10x brighter or bigger (10x more intense in some way represented visually) on the graphic, they’ve made it log_10 times brighter. That’s an arbitrary operation done without justification. I do expect that if I were to ask the designer of the graphic why he did this, he would reply with the unforgivable line
“because if it weren’t like that, it’d be hardest to tell the difference between countries with low income. For example, almost all of Africa would be very similar in colour”.
If your goal is to let everyone know the exact GDP of countries, use a table not a graphic.

It is not a dilemma that people would see only a small difference in the colours of African countries because there is only a small difference in the GDP of African countries.

One last puzzle. Can you spot what’s wrong with this graph?


The problem is that again they’ve grouped things together when not necessary. But more importantly, look at the way the “Barely Dem” and “Barely GOP” are coloured. They are different to the others in that their centers are not coloured. This destroys the semantic meaning of the colours of the graph. It was originally that the colour represents the strength of the likelihood but not the colour AND how much white is in the center also represents it, adding complexity without adding any information.

One last error in this graph is that they have changed the size of Alaska. This is only confusing. The graph is being totally warped by the designer who felt that the shape of Alaska was awkward to work with.This makes the graph now entirely subjective. The designer could’ve also changed the shape of all other states – the integrity of the graphic is entirely destroyed by this decision.

Consciousness is not at the Core of Problem Solving

Learning to walk takes some time. There’s a reason for this: walking actually requires you to solve simultaneous equations. Yes, that’s right. It’s incredibly difficult math. It’s only recently that engineers have been able to build a robot that can walk on uneven terrain with four feet. I am yet to see one that can do so on TWO feet. There is Asimo but that only works on flat surfaces and so isn’t very impressive. To program a robot that can walk you need many gyroscopes in. Ray Kurzweil touches on this in Age Of Spiritual Machines
“To predict where the ball will go, and where the fielder should also go, would appear to require the solution of a rather overwhelming set of complex simultaneous equations. These equations need to be constantly recomputed as new visual data streams in. How does a ten-year-old Little Leaguer accomplish this, with no computer, no calculator, no pen and paper, having taken no calculus classes, and having only a few seconds of time? The answer is, she doesn’t. She uses her neural nets’ pattern-recognition abilities, which provide the foundation for much of skill formation.”
This assertion that she is not solving the equations is wrong. She is solving the equation but only in a way that doesn’t lend itself to being expressed in math. The solution to the simultaneous equations are in her brain unable to be expressed by her. Some people struggle with expressing emotions. I argue that we also struggle expressing the solutions of complex equations we solve.

To think it is pattern recognition is absurd. No two balls ever take the same path, how can it use pattern recognition? To transform the ball’s path into a previous pattern we’ve seen would be more computationally expensive than solving the simultaneous equation. It wouldn’t allow us to predict the path of a ball in windy conditions or when the ball has spin or anything of the sort.

The conscious mind does not give us unrestricted access to the computations our neurons are doing. Why would anyone believe it could? You can’t access the part of your brain that’s telling your heart to beat or your stomach to pump food. We can only observe a tiny fraction of our minds and how it functions. This makes many people uncomfortable because they identify with “their” conscious mind and enjoy living in the fantasy world in which they have absolute conscious control over their decisions, fueled by the illusion that is free will. If I say “why did you choose to play football instead of doing your homework?” saying “I don’t know” leaves many unsatisfied. As though not knowing is a reason to not have done that. There is a lack of consistency because if people were asked how they were walking (and express it in computer language or math) they would be unable to yet one wouldn’t expect that person to then stop walking.

I don’t fear doing something without a reason for it becoming consciously apparent to me and neither should you. You’re really blocking your own intuition and your greatest weapon – the unconscious (subconscious) mind. A recent article I read on this subject was about crows solving problems without planning their actions. I’m going to interpret the results differently. When we do things on the fly is called improvisation.  I think that what we call planning is actually just improvisation in a virtual world that we have constructed by imagining it. In other words, the same processes in our unconscious mind (the processes of solving the problem at hand) when we are in the real situation than the imagined one. It’s just that we are storing the improvisations we made in the virtual world in our memory so that once we encounter the real scenario we have already had a head start spending time improvising the answers. It is not that when we plan things it is our conscious mind overseeing this whole thing.

Now one mind say that my point is wrong because the conscious mind is the one that solves improvisation problems but this is not true. I see it as the unconscious mind solving problems then that solution rising up (“falling down “is perhaps the language I should have used) to the level of consciousness. The conscious mind then swiftly takes credit for solving this problem. It reminds me somewhat of the quote “good artists copy, great artists steal”. This quote was brought to my mind for unknown reasons but I still feel its relevant. I don’t need to be able to express justification for everything I do in order to be satisfied with it.

I’m guilty of separating the brain into conscious and unconscious parts. This isn’t quite right. Our brain is just one big mush of neurons and somehow it works.  To split it up in this way isn’t going to be very successful.

Our Displays Are 2D Not 1D: GUIs Need to Exploit This

Many applications do not take advantage of the fact that our displays can be arbitrarily wide. Almost all of them only extend their GUI in the vertical direction.

Of course, GUIs are shown in 2D. In 1D they would have a width of 0 and hence be invisible to the human eye. But the core of their design doesn’t take advantage of the horizontal dimension properly. Here are some examples.

Here Google uses only 1 column regardless of how wide your monitor is. It’s insane. The number of columns should increase just as the number of rows increases as the height of your monitor increases. Bing has the correct solution.


As you increase the width of your browser the number of videos per row increases unlike Google. This concept can be applied to other programs such as chat ones. Here’s what chat current looks like

Here’s how it could look.

There are two columns of text instead of one. The scroll bar would control both the left and the right windows. The left window is just an extension of the one on the right. If you were to scroll upwards 1 line of text, m would be at the top of the right panel and l would be at the bottom of the left panel.

The advantage of this is that the amount of information visible to the user is proportional to the area of the screen available rather than just proportional to the height of the screen. This is a fundamental improvement over the previous. One downside to having more columns of text is that the user may confuse the two but this is easily remedied by having a large space between columns and a border separating them.

This design could be applied to any list of text like Google’s search results. Though for that there is a better solution. Take a look at the website of Digg.


Digg does have multiple column but the numbers of columns is not dynamic. There are always three columns regardless of how wide your monitor is. They need to learn from Bing. This layout with a dynamic number of columns would be ideal for a site like Google. The top search result would be at the top left, the 2nd best would be on the next column in row 1. It would not be below it. This is because we read from left to right, top to bottom. Not top to bottom, left to right. It’s key that the items would be ordered correctly.

Accuracy is Everything

Accuracy is everything

If you want to become a good debator and an effective communicator, accuracy in your statements is essential. I make a conscious effort to avoid using words such as “really” or “actually” when it adds no meaning to the sentence. Instead of saying “1 + 1 is actually 2” I just say “1 + 1 is 2” because by using “actually” I undermine my sentences without it in, because “actually” acts an an intensifier in boolean statements (where as “very” is used in continuous statements “1 + 1 is very 2” wouldn’t make sense but “the sky is very bright” does make sense because the potential brightnesses take on a continous space) but this makes no sense. A statement is either true or false. It can’t be, no matter how much you want it to be, half true or mostly true.

I am not a pedant, but I am often called it. Another phrase I avoid like the plague is “too”. This word is misused to often. It should only be used for shortening sentences when the long form of the sentence is easily deduced. So the sentence “There is an amount of water being poured into that bucket such that water is flowing out of it and onto the floor” is simplified to “There is too much water being poured into the bucket”. But people rarely think about the long form of the sentence, they use the “too” shorthand so often that they don’t even bother to think about the longer form of the sentence they’re shortening. As a result, in their mind the long form of the sentence evolves to something like “there is an amount of water being poured into that bucket such that something bad is happening” and use too always in this sense, so it becomes meaningless when used in some cases. One example is “Don’t eat too much pie” which is short for “Don’t eat an amount of pie such that something bad will happen”. It’s a barren tautology almost. No one would ever do something that they thought was bad. By saying it, you do nothing to help the person – no information is contained in the sentence.

English is not rigorous but I take pride in meaning everything I say. I don’t use metaphors. I hate them. It’s just butchering my language. Stop it. Don’t make demonstrably false statements and then say “but it’s not false, it’s a metaphor”. Just because a sentence can be constructed without breaking the laws of grammar does not make it coherent. Metaphors are basically art but made it language rather than a paintbrush. I don’t hate analogies that help people understand things more by means of pattern recognition. I do find metaphors that are blatantly false annoying, such as “A lifetime is a day, death is sleep; a lifetime is a year, death is winter”. All of those are false statements.

Here’s an analogy that explains my position. It’s like I’m an air traffic controller and the local radio station decides that it will broadcast songs on the same frequncy I’m on. Sure it sounds great. But people will die as a result. I’m trying to communicate with pilots and you’re making this much more difficult by blasting your art over our communication. It’s sort of muddying the language I’m using. Take your art to another medium – it isn’t suitable to use English in this way. I don’t buy into the hand waving “it’s a metaphor” when I point out that the statement “Death is winter” is false… and neither should you.

“Literally” was a word that used to mean something. Sadly, so many people have begun to attempt to use it an an intensifier that it has started to lost all meaning. “She was literally the hottest girl I’ve ever seen” strictly means that the girl I saw had the highest temperature of all girls I’ve ever seen. It does NOT mean “She was the most attractive girl I’ve ever seen and I’m not lying”. This is basically the path English is going down by having so many people use metaphors. But at least people know when they’re using metaphors. So it doesn’t matter too much. Where as people who use ‘literally’ incorrectly are unaware of their errors.

The Nature of Computation


If you were building an AI for a game of chess it would be strange if you did anything other than modelling the game. But take a look at the universe we live in and the best equations to explain the reality. Some physicists will tell you that time does not exist. That time and space are really made of the same thing. It is my conception that if we put all the greatest minds together working on a chess AI, they would come to the conclusion that there is something more fundamental that the pieces and the squares. There is some more fundamental substance that they are both constructed from that would serve us better when creating an AI. Just as space-time  is named, we’ll call this fundamental matter square-piece.

This square-piece would allow a much more effective AI. An argument against this might be that we don’t know what reality is defined as, but chess *is* defined as pieces on a board obeying various rules therefore it can’t be anything else. This is wrong however. Just because a problem can be defined in a certain way does not mean it cannot be reduced to a simpler problem. Reducing the game so that the AI can make computations on the fundamental matter would be more effective.

I enjoy making wild statements that are difficult to verify (this seems to ruffle some feathers at times but I don’t care), so here’s another: our brains are actually operating on this fundamental square-piece when playing chess. This is how we can still compete current day chess AI that just iterates through potential scenarios on a weak processor that has a ‘mere’ 100 million transistors on it.

I think to exploit this square-piece a cellular automaton is needed. The space of algorithms that can be designed by humans is infinite in size but that does not mean it explores the entirety of the space of algorithms. There are many algorithms our brains could never comprehend that are essential to building intelligent systems. Instead of designing a CA, we need to search for one that does what we need. Play chess well.