Wow, my second blog post already! I'm itching to get this one out, because my current project is also pretty awesome, and I don't want to get behind now that I'm on a roll, so let's dive into another fun side project I decided to do, shall we? This one I'm just going to call CatchEmAll (also on GitHub) because, "All 152^2 Generation One Pokefusions Megaposter" doesn't quite roll off the tongue. In fact, neither does "First-Gen Pokefusions Webcrawler and Posterizer".
I was very much inspired by the Pokemon Fusion website I stumbled upon while I was grinding the Elite Four for experience in HeartGold. In fact, I had to put my game aside, as the waves of laughter that I was experiencing began to require my full attention. Needless to say, I spent the rest of the day bugging everyone claiming, "I found an even better one this time, I'm not even joking."
Fun and games aside, it quickly dawned on me that it would be difficult to peruse every fusion possible in its current form. Repeatedly clicking "random" can get tiring after a while, so I took it upon myself to think of a better way. I thought back to the poster of all the official Pokemon I had taped to my bedroom door when I was a kid. That's when I realized that I should make one too. After all, posters make great conversation pieces, and this one would be no exception.
Now, I heard today that the forecast for the rest of this post would be, "Technical, With A Chance Of Learning", so let's be sure to put on our fancy thinking caps, polymonocles, and Sunday moustaches before we leave this paragraph and venture out into the wild world of computer science.
But where should we begin? Ah, I know, let's learn about…
Making A Simple Web Crawler
Nope, not that kind of web crawler. I mean the other kind. You know, the one that scours the internet for all the memes and shit.
First, we need to figure out what language we would like to write this in. I'm a bit biased and lazy, so Python will do nicely. If Python isn't your thing, the process should still be the same no matter what language you use, so feel free to wander around and peruse the wares a bit while I talk.
You had me at #!/usr/bin/python
What you see above is the entire program I used to grab every fusion. It would be way more complicated if I had to grab the names of each fusion as well, but a little inspection of the html code on the Pokemon Fusion website yielded a portion of code that included every prefix, postfix, and full Pokemon name.
The site makes fusion work as follows: the Pokemon on the left contributes the prefix of its name, its head, and the colouring of the sprite to the final result. The Pokemon on the right contributes its body and the postfix of its name to the final result. Easy peasy.
Now we need to figure out how the site creates the names for its fusions, so we can get the correct fusion whenever we create the link to the file. The fusion image links all look like .../fused/<integer 1>/<integer 1>.<integer 2>.png, where the first integer is the Pokemon to the right side of the fusion. This makes things a little confusing when we download the files, as I sort them based on the Pokemon on the left side of the fusion (it's a little bit more appealing this way on the poster.)
Basically, when we fuse 001. Bulbasaur (left) with 002. Ivysaur (right), we should get Bulbysaur. Logically speaking, if the left makes the prefix and the right the postfix of the final name, then the file should be numbered something like 1.2.png, right? Nope. It's flipped on the site, remember? They sort them based on body first, head second, so Bulbysaur is actually 2.1.png. Drats. No matter, that just means we need to be careful when we rename them to reflect that switch. That's the reason we have a slightly confusing i and j usage in the code.
First, we want to use a handy url getter, which I call getter. "Very imaginative, Alex." Thanks, I guess. Anyway, whenever we feed getter a url and a filename, it will get the file located at the url and name it to that filename. Now we just need to build the url, the filename, get the file, then rinse and repeat until we have all the files saved inside our handy Pokemon folder.
To make things easy, I use two for loops that force variables i and j to step through all the values between 0 and 151, inclusive. Wait a minute, there aren't any Pokemon with a number 0, so what's with the 0? This is a special case: if you look at the arrays that store all the Pokemon names, prefixes, and postfixes, you see that there's a Missingno. hidden at index 0 (the first value in an array). I happened to notice that Missingno. was also a fusion contender on the website, so I just had to add it.
It is also a special case on the site, where any number outside the accepted 1-151 range will just use Missingno. in that Pokemon's stead. In the code you may have already seen me force i and j to 152 if they ever equal 0, and then back to 0 again. This is due to the fact that I want to place Missingno. at the far ends of the poster, so I need to make sure that the filename reflects this positioning. Another special case here is when the two Pokemon coming into the fusion are the same. We should just generate the same Pokemon again, so it takes the original name instead of mashing the postfix and the prefix together as we usually do (this can create a slightly different name than the original one, as sometimes the prefix and postfix share letters, e.g. Ivysaur's prefix ends with y, and the postfix begins with y.)
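The numbering rules above can be sketched in a few lines. Treat this as a rough sketch under assumptions, not the program from this post: the function names, the `base_url` argument, the local filename scheme, and the tiny name tables (including Bulbasaur's prefix and postfix) are made up for illustration. Only the body-first URL order, the same-Pokemon rule, and the 0-to-152 remapping come from the description above.

```python
# Tiny illustrative tables keyed by Pokedex number (assumed values, not the site's data).
NAMES = {1: "Bulbasaur", 2: "Ivysaur"}
PREFIXES = {1: "Bulb", 2: "Ivy"}
POSTFIXES = {1: "basaur", 2: "ysaur"}


def fusion_url(base_url, head, body):
    """Build the image URL; the site numbers files body first, head second."""
    return f"{base_url}/{body}/{body}.{head}.png"


def fusion_name(head, body):
    """Head gives the prefix, body gives the postfix; identical inputs keep the original name."""
    if head == body:
        return NAMES[head]
    return PREFIXES[head] + POSTFIXES[body]


def poster_index(number):
    """Move Missingno. (index 0) to position 152 so it lands at the far end of the poster."""
    return 152 if number == 0 else number


def local_filename(head, body):
    """Hypothetical local naming scheme: head first, so files sort by poster row."""
    return f"{poster_index(head)}.{poster_index(body)}.png"


# Bulbasaur (head, left) fused with Ivysaur (body, right).
# The base URL here is a placeholder, not the real site path.
print(fusion_url("https://example.com/fused", 1, 2))
print(fusion_name(1, 2))
print(local_filename(1, 2))
```

Keeping the flip inside `fusion_url` means the rest of the code can think in head-then-body order and only the download step has to care about how the site numbers things.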
Whew. The last thing we need to do here is run it in the command line/terminal. This might take a little while, so while that runs, we can finally get into the next section of this post.
Making A Huge, 2.11 Gigapixel Poster
That's 2,110,000,000 pixels! Wow, but why is it so big? Well, let's use some math to see why: we have 152 Pokemon fused with 152 other Pokemon, which results in 152 x 152, or just 152 squared fusions. Remember that squares grow really fast, so you shouldn't be surprised that 152 x 152 = 23,104 fusions!
That still doesn't tell us why we end up with 2.11 Gigapixels, so let's assume that every square a Pokemon takes up is exactly 300 pixels wide x 300 pixels high = 90,000 pixels/sprite. 90,000 pixels/sprite x 23,104 sprites = 2,079,360,000 pixels. Hmm, that's not right.
Oh, I forgot that we also need to include a topmost row and leftmost column for the original Pokemon (this way you can reference fusions just by following the row for the head and moving over to the column with the desired body.) Alright, now we end up with 153 rows x 153 columns = 23,409 sprites. 23,409 sprites x 90,000 pixels/sprite = 2,106,810,000 pixels. That's more like it! GigaPan likes to use only 3 significant digits, so it rounds up to 2,110,000,000 pixels.
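If you'd rather let the computer do the multiplication, here is the same arithmetic as a quick sanity check. The variable names are mine; the numbers are the ones from the paragraphs above.

```python
SPRITE_SIDE = 300   # pixels per side of one cell
POKEMON = 152       # 151 Pokemon plus Missingno.

pixels_per_cell = SPRITE_SIDE * SPRITE_SIDE
fusions_only = POKEMON * POKEMON
with_reference = (POKEMON + 1) * (POKEMON + 1)   # extra reference row and column

print(pixels_per_cell)                    # 90000
print(fusions_only)                       # 23104
print(fusions_only * pixels_per_cell)     # 2079360000
print(with_reference)                     # 23409
print(with_reference * pixels_per_cell)   # 2106810000
```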
PIL is the shiznit
Above, you will find some more code, but this time it's from the posterizer half of this project. Why don't we walk through all the variables and what they do: pokemonCount is pretty simple, it tells us how many Pokemon we want to have fused in the square. A value of 6 ends up making the miniature poster of the first six starter Pokemon that I have at the top of this post, for example.
My friend, Andrew Tinits, recommended that I use Python Imaging Library for this project. This was definitely the way to go, as I had already tried a thousand other ways of making the poster come together. All these other ways ended up either making files that were way too large to even be able to save them in other formats (I'm looking at you, Photoshop), or just didn't include the names or any other styling that I wanted. I couldn't settle for less, so that's why I went with Python Imaging Library. Shortly after I began reading the documentation, I started to realize just how convenient PIL would really be, and soon you will too!
As you can see, the first thing we use PIL for is to initialize the font that the text will have under the sprites. Next, we need to load the 300x300 background that we'll tile behind each sprite, and then initialize the final poster image based on the dimensions that we set. In order to draw objects or text onto an image object, PIL requires a draw object to be initialized with the image object that will be changed. The draw object renders any drawn shapes and text onto the image object for you.
Finally, the meat and gravy to this project's three-course meal.
For safety reasons, I've included a catch at the top so that we never actually create an invalid poster. It's not really necessary right now, at least not until I let it accept user input for how many fusions they want on the poster. Again, we can see that I used another double for loop, which just goes through all the fusions, plus one extra row and column for the original fusion ingredients. Inside this loop is a condition that skips the cell where i and j are both zero. We don't have a Pokemon for that corner, so we'll have to leave it blank until we add the logo I made a little later on.
The next condition handles the topmost row and leftmost column, which hold the reference Pokemon for the overall fusion poster. We set the red value for the text to the highest amount (used later when we draw the Pokemon name and number), which will distinguish the actual Pokemon from the fusions. If you were paying attention to the prefixes, postfixes and Pokemon names from the code before this snippet, you would have noticed that Missingno. isn't at the beginning of each of the arrays anymore. I moved it to the end, that way it corresponds to the proper index when i and/or j are 152.
The next condition handles a similar situation, where the fusion ingredients are the same Pokemon. The text will be red, and the name will be of the original Pokemon. The condition after this one handles the rest of the cases, where i and j are different. Now we need to concatenate the appropriate postfix and prefix, which again means we need to use i and j in opposite order. This is due to the way that we are displaying the fusions in order of their faces first (rows), and then their bodies (columns).
Cool, now we need to open the actual sprite file into an image object, paste the background into position, and then paste the sprite on top of it after offsetting it appropriately. For those wondering why I pass current into the paste method twice: the second current acts as the mask, which makes sure that the transparency is maintained. Otherwise our background will be cancelled out where the sprite is pasted. We definitely don't want that.
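To see what that second argument buys us, here is a small stand-alone experiment. It is a sketch, not the posterizer code: it assumes Pillow (the maintained fork of PIL) is installed, it builds two in-memory stand-ins instead of loading the real background and sprite files, and the variable names are my own. The `sprite` image plays the role of current.

```python
from PIL import Image

CELL = 300  # cell size in pixels, as in the post

# Stand-ins for the real files: a solid grey background tile and a sprite
# that is fully transparent except for a red square in the middle.
background = Image.new("RGBA", (CELL, CELL), (200, 200, 200, 255))
sprite = Image.new("RGBA", (CELL, CELL), (0, 0, 0, 0))
sprite.paste((255, 0, 0, 255), (100, 100, 200, 200))

poster = Image.new("RGBA", (CELL * 2, CELL), (255, 255, 255, 255))

# Left cell: background first, then the sprite with itself as the mask.
poster.paste(background, (0, 0))
poster.paste(sprite, (0, 0), sprite)

# Right cell: the same thing without the mask, for comparison.
poster.paste(background, (CELL, 0))
poster.paste(sprite, (CELL, 0))

print(poster.getpixel((10, 10)))          # background survives around the sprite
print(poster.getpixel((150, 150)))        # the sprite itself
print(poster.getpixel((CELL + 10, 10)))   # background wiped out by transparent pixels
```

In the left cell the grey tile shows through wherever the sprite is transparent. In the right cell the sprite's transparent pixels simply overwrite the tile, which is exactly the problem the mask avoids.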
Wait, let's not forget to label the fusion with its name! We first need to measure the dimensions of the text box using the textsize method, that way we can accurately draw the text so that it's centred horizontally, and vertically offset to just above the bottom of the cell. The colour is determined by how red our red will be (either 255 or 0), while the alpha is always full.
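The positioning itself is just a bit of arithmetic once the text has been measured. The helper below is a sketch under assumptions: the function names, the `MARGIN` value, and the row/column arguments are invented for illustration, and it takes the measured text width and height as plain numbers so it doesn't depend on any particular PIL version.

```python
CELL = 300    # cell width and height in pixels
MARGIN = 10   # assumed gap between the label and the bottom of the cell


def label_position(row, col, text_width, text_height):
    """Top-left corner for a label centred horizontally near the bottom of a cell."""
    cell_x = col * CELL
    cell_y = row * CELL
    x = cell_x + (CELL - text_width) // 2
    y = cell_y + CELL - text_height - MARGIN
    return x, y


def label_colour(is_original):
    """Red for original Pokemon, black for fusions; alpha is always full."""
    red = 255 if is_original else 0
    return (red, 0, 0, 255)


print(label_position(0, 1, 120, 20))   # (390, 270)
print(label_colour(True))              # (255, 0, 0, 255)
```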
Once all the sprites are finished being added, we can paste the logo to fill in the gap we left for it, and finally save the poster! For those curious about the resulting size: it ended up being around 270MB. This doesn't seem like much, but we will most likely want to put the result up onto the internets, which opens up a whole new can of worms.
Hosting An Incredibly Massive Poster
Massive is a bit of an understatement here. Each side is 300 pixels/sprite * 153 sprites / 72 pixels/inch = 637.5 inches = 16.19 meters. The area is 2,822.19 square feet (262.19 meters squared!) Where does one go with a poster whose size is larger than the average Australian house (214 meters squared = 2,303.48 square feet), or just over one tennis court in size? Don't bother with imgur, instagram, or any of the other traditional places you'd put your poster. Those sites don't support files this big, nor will they allow you to zoom in and look closely at the sprites. Bummer.
Not good. We can't even make out the individual sprites like this.
But wait, there has to be a solution to all of those things! There are a plethora of websites that host panoramas, including Gigapixel sized ones, so why don't we just put it up on one of those? Good idea! I'm glad we were wearing our fancy thinking caps today. After some research, I've concluded that the solution to this problem ends up (almost always) being GigaPan.
All in all, it took me over two weeks of on-off dedication to complete. But why so long? The first half was spent trying to find a way to put it together, which was before I decided to try using Python Imaging Library. Then it was smooth sailing until I hit another roadblock: where do I store it? That took me a few more days of trial and error, which included a few attempts at reducing the image size. But it was pretty awesome when I finally solved that problem; not only was I able to upload the full file in a lossless format, but I could also zoom into the poster relatively quickly and painlessly. High fives all around!
So for my first blog post I decided I would talk about a fun side project I did with my friend, Andrew Tinits, called Dumb Reviews.
Originally, we set out on a journey to pursue interesting things that nobody had really done before. We had ideas like: let's build an AI that plays QWOP (but this was already a thing * ), or what about a Twitch channel that plays a classic game? Ditto (pun intended ^^.) We were quickly running out of ideas here, so we decided to go with ol' faithful and go golfing. Who knew putting off a decision could lead to good ideas?
It was then that my friend mentioned that the internet would be soon graced by the presence of a multitude of new gTLDs, that is, "generic Top Level Domains". A mouthful, I know, but this is the magical acronym that would help us out of our procrastination session, and back into development mode. Simply put, Top Level Domains are the last few letters at the end of a domain name (e.g. .com, .gov, .org, etc.)
What makes a gTLD so generic? Well, picture our boring .com's and .org's and wonder, "What if we could end our website domain name with more appealing TLDs?" Then take a flood of new TLDs like .fish, .sex, .construction, or .restaurant for example, and set them free to roam the internet as they please (that's over-simplifying it a bit, but you get the idea.) Now we can really begin to have interesting domain names like crypto.fish, or dumb.domains (lots of inspiration drawn from this one.)
It was while we were busy laughing at dumb domain names that we decided that we should buy our own funny domain, and make a website of some sort. Easy, right? Now to just come up with a domain name and the purpose of the site! I decided that there aren't enough generators out on the web for my liking and that we needed to make one more, which posed another problem. What should it generate? We decided that we already did enough that day and started to procrastinate again by looking at the Video Game Name Generator, and Dumb Domains.
After some googling, we eventually decided that there aren't any movie review generators on the internet, and settled on that as our site's purpose. That's also when I attempted to make a logo for the site, and after a little searching, I eventually found Hipster Logo Generator. It's a delightful site that takes your hipster Abercrombie-and-Fitch-esque logo ideas and brings them to life. The only thing is that you have to shell out 5 bucks for a better-looking, higher-res result (I disagreed with that and stuck with the lower-res result.)
And now for the technical portion of this post, where I reveal how someone who has never before created an entire webpage in his life, helped throw one into existence.
Step 1: Register your Domain Name
Because nobody wants to implement their website around a specific name, only to find out when they finish that the smart domain name they had picked out is already taken. Nope, that's why there's a handy domain name registry lookup on any domain name registrar company's site. After you discover that your domain name is still available (congrats!), it's time to register it. We decided to go for GoDaddy as our site's host, but personal preferences and hosting costs may sway you to other hosting companies, so be sure to do a bit of research before you commit to one specific company.
What is Flask, you may ask? Flask is a very handy Python-oriented web framework that will save you a lot of trouble setting up your backend. It's so user-friendly that practically anyone with basic knowledge of Python, web development, and a bit of Google magic can write their own web server. Needless to say, I was pretty pleased with how painless the process was, and hope that all of you people thinking about making a site of your very own consider starting with Flask.
Pretty, isn't it?
Step 3: Style Is Everything
The first thing you should think about when styling your site is background tiles. If you don't intend on using one large background for your site, you should consider using a pattern that can be tiled seamlessly across the page. Subtle Patterns is a great site to find such patterns, as it contains many different styles that will appeal to a wide range of people. Triangular ended up being our final contender for the site (for those that are wondering what we used.)
Step 4: Figure Out The Mechanics
Right now you may be asking the question, "How does it come up with all those <insert appropriate adjective here> reviews?" That's a great question, which I will get to answering shortly, but first I need to briefly talk about something very core to the site. That something is an equally interesting and useful concept in artificial intelligence known as Markov Chains.
A Markov Chain can be simply described as "A series of independent events [where] the probability distribution of the next step depends nontrivially on the current state." * A simple example of this is the board game snakes and ladders. Your position on the board after the next roll depends only on where you are standing right now (plus the roll itself), and not on how you got there, creating a chain of events, or a Markov Chain.
There, now that we have that out of the way, I can start explaining what we use Markov Chains for! We perused the interwebs for a database of movie reviews, and happened upon the database that Stanford University includes on their list of resources for a certain computer science class. More specifically, it's a collection of 50,000 IMDB user reviews for an arbitrary list of movies. What we wanted to do was turn that very large database into something that will generate every movie review page that the site spits out, which is where Markov Chains come into the picture.
Luckily, there was already an implementation handy for us to use, known as PyMarkovChain. It takes a database of text files and turns them into arrays of probability distributions that calculate the likelihood of any sections of text unbroken by whitespace appearing immediately after another section of text. In other words, it figures out the likelihood of any word in the database appearing immediately after another one.
This is useful because now we can just ask it to generate a sentence word-by-word, randomly picking out the first, and then using that result to find the most likely word that occurs after that, and the next one after that word, and so on. We also managed to separate the database into positive and negative Markov Chains. That way we can generate sentences for positive, negative, or neutral sections of text. Each personal review will either be positive or negative, while the movie title and description will be a neutral mix of both.
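To make the idea concrete, here is a toy word-level Markov chain written with nothing but the standard library. It is a sketch of the general technique and not PyMarkovChain's API: the function names and the four sample sentences are invented, and it picks each next word in proportion to how often it followed the current one, which is one common way to do the "likely next word" step.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words seen right after it (repeats keep the counts)."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=12):
    """Start from a random word, then keep sampling a follower until none is left."""
    word = rng.choice(sorted(chain))
    output = [word]
    while word in chain and len(output) < max_words:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Two separate chains, mirroring the positive/negative split described above.
positive = build_chain(["this movie was great", "this film was a great surprise"])
negative = build_chain(["this movie was awful", "the plot was a mess"])

rng = random.Random(42)
print(generate(positive, rng))
print(generate(negative, rng))
```

Even with four sentences you can see the catch: each step only looks at the previous word, so the output is locally plausible but has no idea where the sentence started.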
While the potential for cohesive sentences is always there, Markov Chains are only as good as your training set is. Our training set is written by the general public, which means the Markov Chains reach into a mixed bag of potential spelling mistakes and bad grammar. Even if the entire set was perfect, the algorithm itself is not.
Any text generated using this method will only be as relevant as any of the words that came before. So while reading the text on the site, a sentence may start off well, but it quickly degrades into something else and loses any coherence it may have had as it goes. Then it goes without saying that shorter sentences will usually make more sense than longer ones. Keeping this in mind, hilarity still finds a way into the site on the off-chance the sentences manage to make sense grammatically.
That's great! But how are the rating, release date, runtime, genre, director, cast, reviewer names, and individual scores generated?
Step 5: Generate Everything Else
Now that we have the cool AI-related probability stuff behind us, we should probably start talking about some good old-fashioned non-AI-related probability stuff.
Let's begin with how the individual scores are generated. We decided that we wouldn't be smart and use some clever algorithm to detect what the rating will be based off of their reviews. No, we just went and gave each one a random score out of 5. Besides, we thought it would be funny if there was a negative review paired with a high score, and vice versa. In fact, it's pretty hard to tell what the reviewers are saying half of the time anyway, so it doesn't really detract from the experience.
Next, we should talk about how the release date and runtime are calculated. The release date is a randomly chosen date between January 1, 1970 and today's date. Nothing too special, I guess. Neither is the runtime, as it's just another random time between 30min and 300min.
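Both of those fit in a handful of lines. This is a sketch with made-up function names, assuming whole days and whole minutes; I pass in a `random.Random` object so the results can be reproduced.

```python
import random
from datetime import date, timedelta


def random_release_date(rng, today=None):
    """Pick a uniformly random day from 1970-01-01 through today, inclusive."""
    start = date(1970, 1, 1)
    today = today or date.today()
    return start + timedelta(days=rng.randint(0, (today - start).days))


def random_runtime(rng):
    """Pick a runtime in whole minutes between 30 and 300, inclusive."""
    return rng.randint(30, 300)


rng = random.Random(2014)
print(random_release_date(rng), random_runtime(rng))
```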
I guess I should top that last paragraph with a more interesting one. How about I explain how the overall score is generated? Yes, that sounds more interesting than the last two paragraphs combined. It is, I promise!
I decided that it would be way too boring to use another random number here, so why not get a bit fancy? I started by getting a long list of the top 500 actors according to an entry I found on IMDB, and two lists of directors, one being a 1785 person long list of directors from around the world, and the other being a top 100 list from IMDB. Sounds good, but when do we get to the overall score part? Fine, these two lists end up influencing the rating entirely, which is why I started talking about them.
It starts with the site choosing a cast of a random size between 1 and 13 from the list of top actors. It places the actors in order of rank, that way lower-ranked actors get less important roles when there are higher-ranked actors in the cast. The ranks of all the actors are then turned into a raw score out of 5 by first bracketing each rank to a value in [0, 1, 2, 3, 4]. This is done by using a for loop:
The loop checks if the rank is less than or equal to 100, 200, 300, 400, or 500. The first threshold that makes this true gives the actor a score of 0, 1, 2, 3, or 4, respectively. The code then reuses cast as a container for the list of actors' names, which is later displayed on the site. Immediately after that, the overall rating is calculated as 5 minus the average score. Pretty simple, no? Actually, no, there's a little bit more to it than that, but we're almost there.
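For anyone who wants to poke at the bracketing, here is a reconstruction from the description above. It is a sketch and not the site's code: the function names and the sample ranks are assumptions.

```python
def rank_to_score(rank):
    """Bracket a rank from 1 to 500: 1-100 -> 0, 101-200 -> 1, ..., 401-500 -> 4."""
    for score, upper in enumerate((100, 200, 300, 400, 500)):
        if rank <= upper:
            return score
    raise ValueError("rank must be between 1 and 500")


def base_rating(ranks):
    """Base rating is 5 minus the average bracket score of the cast."""
    scores = [rank_to_score(rank) for rank in ranks]
    return 5 - sum(scores) / len(scores)


cast_ranks = sorted([450, 12, 150])   # best-ranked actors first
print([rank_to_score(rank) for rank in cast_ranks])   # [0, 1, 4]
print(base_rating(cast_ranks))
```

A cast made up entirely of top-100 actors averages a score of 0 and starts at a perfect 5, while every actor from the bottom bracket drags the average down.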
The loop only spits out the base score based on the cast, but now the code has to decide what hand the director will have in all of this. So, I've included another excerpt of my code to show you:
This is where it starts to get complicated…
Here it gets interesting, as the director and the cast now have to play an intricate waltz in order to create the final rating. Wait a minute, this sounds just like real life! I told you it would get interesting ^^. Now let's work our way through the loop, shall we? The director will always make a hit movie if they appear in the top 100 list, which is usually a given anyway, but if that's not the case, then we have to introduce some new conditions.
We know the director is not in the top 100 list now, so I thought that I would be generous and give one of the better known actors a shot at directing their own movie, because why not. This is only possible if any of the cast is known to be a previous director, otherwise we'd have any joe putting together a movie, and that just isn't realistic enough. If the cast is lucky enough to be graced by the presence of a director, then that actor gets a generous 60% chance of being the new director. If the new director is in the top 100, they automatically score 5 stars because that's what happens in real life, and I said so. Otherwise they have a 40% chance of scoring it big. That's cool, but now we have to consider how some movies get their low ratings.
If all of these conditions return false, then we are quite possibly facing the most unlucky bastard to direct a movie, or just another not-too-well-known indie director that hasn't made it big yet. Your pick. This unfortunate soul has a 70% chance of failing, in which case the total score gets halved, meaning that our critics might be just too harsh.
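Since the rules are easier to follow as code than as prose, here is one possible reading of them. Take it as a guess at the structure rather than the real implementation: the function name, its arguments, the choice of 5.0 as the "hit" score, and the decision to fall through to the 70% check when an actor-director doesn't score big are all my assumptions.

```python
import random


def final_rating(base, director_in_top_100, cast_directors_in_top_100, rng):
    """One possible reading of the rules.

    cast_directors_in_top_100 holds one True/False flag per cast member
    who has directed before (True if they are on the top 100 list).
    """
    if director_in_top_100:
        return 5.0                      # top-100 directors always make a hit
    if cast_directors_in_top_100 and rng.random() < 0.6:
        # A cast member with directing experience takes over (60% chance).
        new_director_in_top_100 = cast_directors_in_top_100[0]
        if new_director_in_top_100 or rng.random() < 0.4:
            return 5.0                  # automatic hit, or a lucky 40%
    if rng.random() < 0.7:
        return base / 2                 # unlucky director: 70% chance to halve
    return base


rng = random.Random(1)
print(final_rating(3.4, False, [False], rng))
```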
One of my other friends was fascinated by this and wondered what the overall distribution of scores might look like, so I made a small script that generates 10,000 movie reviews, tallies up each score, and creates a handy histogram for our enjoyment:
The histogram code I used didn't know what styling was.
Aside from the obvious lack of styling, the final result is a bit striking. Over 60% of all the reviews generated will be given a score of 2 or lower, only 6% ever get the middle-of-the-road score of 3, and 34% ever make it above that.
Ok, well that just leaves us the genre and reviewer names. Reviewer names are simply random first and last names from the actor and director lists combined. Not too bad. For the genres, I compiled a list of popular genres and subgenres, and the code randomly chooses one to use. As a bonus, it has a 50% chance of grabbing another genre from the list to create a crossover genre. Anyone up for a wholesome Gay / Lesbian-Alien Invasion film?
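The crossover trick looks roughly like this. It's a sketch with a made-up four-item genre list; the hyphen separator is taken from the example above, and avoiding a repeat of the first genre is my own assumption.

```python
import random

GENRES = ["Alien Invasion", "Heist", "Romantic Comedy", "Western"]  # sample list


def random_genre(rng, genres=GENRES):
    """Pick one genre, then add a second, different one half of the time."""
    first = rng.choice(genres)
    if rng.random() < 0.5:
        second = rng.choice([g for g in genres if g != first])
        return f"{first}-{second}"
    return first


rng = random.Random(3)
print(random_genre(rng))
```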
Whew, that covers all the fun content generation stuff. Wait, I hear someone in the back row yelling, "If everything is generated randomly, how come we can go back and see movie reviews that we've generated in the past?" True, I almost forgot to mention how the site can magically remember each and every generated review.
Step 6: Remember That Seeds Exist
Wait, what are seeds? Whoops, well luckily I can answer that question, and no, it does not involve botany. Purely random number generators are a pipe dream that we can't achieve anytime soon without nature's help, so we have to make do with a pseudo-random number generator (there's a nice wiki page that talks about this topic in further detail if you are interested.)
Without a seed, one does not a pseudo-random number generator get. Those are wise words to live by, but what does a pseudo-random number generator need a seed for? A seed can be any old number, string of characters, etc. used to prime the pseudo-random number generator. Basically it's like meeting up with a bunch of your buddies and supplying the set of dice/deck of cards for that board game/card game you were going to play. Any old deck of cards/set of dice would have done just fine, but you brought that special set/lucky deck.
The subtle uniqueness of each deck of cards/set of dice influences the outcome slightly. The deck might be bent one way, the dice might have a slight difference in weight. Either way, the results that come out of rolling the dice/shuffling the deck will be unique to that individual set of dice/deck of cards. In our case, each seed will always generate the same set of outcomes when used the same way, hence why it's pseudo-random.
Now that we know what seeds do, how can we use them to our advantage? Conveniently, the Markov Chain implementation uses the exact same pseudo-random number generator as the other randomly generated content on the site, so now we can kill two birds with one seed, if you will. It's as easy as letting the seed equal the current time, in seconds since the epoch. This is technically what the seed is set to by default, but explicitly setting it here lets us know that the seed is being used to generate a new review.
To make each review publicly accessible after it's been generated, we need to hand out a permalink containing the seed. So when the server gets a request for any page that looks like "http://dumb.reviews/generate/<seed>", it will set the seed back to what it was for that page and regenerate the site using that seed. Neat, no?
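Here is the whole seed trick boiled down to the standard library. It's a sketch under assumptions: `generate_page` and the two fields it fills in are stand-ins for the real page generation, and the actual site does this inside a Flask route rather than a plain function. The permalink format is the one mentioned above.

```python
import random
import time


def generate_page(seed=None):
    """Generate the random parts of a page; the same seed always gives the same page."""
    if seed is None:
        seed = int(time.time())    # new review: seconds since the epoch
    random.seed(seed)              # primes the shared module-level generator
    return {
        "seed": seed,
        "score": random.randint(0, 5),       # assumed score range
        "runtime": random.randint(30, 300),
        "permalink": f"http://dumb.reviews/generate/{seed}",
    }


first = generate_page()
again = generate_page(first["seed"])
print(first == again)   # True
```

Because everything downstream draws from the same generator in the same order, resetting the seed is enough to rebuild the page. The flip side is that changing the order of the random calls later would change every old permalink's content.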
Step 7: Write A Blog Post About It
Aaand check. Now that I've done that, I hope that my journey making something not quite relevant to your daily lives has brought about cognitive growth of some sort, maybe. Not only was this a fun project to put together, but it also brings about a small sense of accomplishment when I can finally show someone something that I made happen, insignificant as it may be. I hope you too can find that insignificant accomplishment that you can show other people and laugh about, as it's not only good practice, but it also looks good on a resume!