In this tutorial, we'll walk through how to generate text using a Markov chain. First, to get a feel for the calculations involved, we'll build a simple one ourselves. Then we'll see how to use a Python library to do the same thing more robustly.
You can integrate this text-generation code into a Discord bot to make it an interactive experience.
All of the code explained here is covered in the excellent Coding Train video on the topic. The only difference is that this tutorial uses Python, while the video uses p5.js (JavaScript).
txt = "Oh yeah, good luck with that. Who's brave enough to fly into something we all keep calling a death sphere? Tell her you just want to talk. It has nothing to do with mating. Fry, you can't just sit here in the dark listening to classical music. Take me to your leader! Anyhoo, your net-suits will allow you to experience Fry's worm infested bowels as if you were actually wriggling through them. Okay, it's 500 dollars, you have no choice of carrier, the battery can't hold the charge and the reception isn't very… Then we'll go with that data file! Do a flip! Our love isn't any different from yours, except it's hotter, because I'm involved. I barely knew Philip, but as a clergyman I have no problem telling his most intimate friends all about him. Leela, are you alright? You got wanged on the head. We'll go deliver this crate like professionals, and then we'll go home. No, she'll probably make me do it. Yes! In your face, Gandhi! Then throw her in the laundry room, which will hereafter be referred to as the brig. Bite my shiny metal ass. You'll have all the Slurm you can drink when you're partying with Slurms McKenzie! I saw you with those two ladies of the evening at Elzars. Explain that."

The idea is to build a dictionary that maps each short run of characters (an n-gram) to a list of the characters that follow it in the text, along these lines:

'Oh ': [y, w, a, g]
'the': [a, r, i]
'hro': [a, a, r, i, i, w, w, w]
...

So let's make a variable called ngrams to hold our dictionary, and a variable called order to define how long each of the "words" in our dictionary should be.
ngrams = {}
order = 3

ngrams = {}
order = 3
for i in range(0, len(txt) - order):
    # do something
    pass

ngrams = {}
order = 3
for i in range(0, len(txt) - order):
    # 1. get the "word" for our dictionary
    gram = txt[i:i+order]
    # 2. If the word doesn't exist in our dictionary already, let's add it
    if gram not in ngrams:
        ngrams[gram] = []
    # 3. Let's add the following character to the list stored in the dictionary
    ngrams[gram].append(txt[i+order])
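
To see what the finished dictionary looks like, here is a minimal sketch that wraps the same loop in a function and runs it on a tiny string. The names build_ngrams, sample, and table are made up for this illustration and are not part of the original code.

```python
def build_ngrams(text, order):
    """Map each run of `order` characters to the characters that follow it."""
    ngrams = {}
    for i in range(len(text) - order):
        gram = text[i:i + order]
        # setdefault creates the empty list the first time a gram is seen
        ngrams.setdefault(gram, []).append(text[i + order])
    return ngrams


sample = "the theme then"
table = build_ngrams(sample, 3)
print(table["the"])    # [' ', 'm', 'n']
print("hen" in table)  # False: nothing follows the final gram
```

Note that the last three characters of the text never become a key, because no character follows them. That is worth keeping in mind when generating text from the dictionary.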
currentGram = txt[0:order]
result = currentGram

import random
currentGram = txt[0:order]
result = currentGram
possibilities = ngrams[currentGram]
result += random.choice(possibilities)
print(result)

import random
currentGram = txt[0:order]
result = currentGram
# range sets number of generated characters
for i in range(100):
    possibilities = ngrams[currentGram]
    result += random.choice(possibilities)
    # update currentGram to be the last `order` characters
    currentGram = result[-order:]
print(result)

import random
currentGram = txt[0:order]
result = currentGram
# range sets number of generated characters
for i in range(100):
    # Make sure the currentGram is in our dictionary
    if currentGram not in ngrams:
        break
    possibilities = ngrams[currentGram]
    result += random.choice(possibilities)
    # update currentGram to be the last `order` characters
    currentGram = result[-order:]
print(result)

All of the code together is below. Play around with different "order" values and see what kinds of results you get. There are clearly limitations with our simple implementation, for example treating uppercase and lowercase letters differently.
import random

# Build the dictionary of n-grams (txt is the source text defined earlier)
ngrams = {}
order = 3
for i in range(0, len(txt) - order):
    gram = txt[i:i+order]
    if gram not in ngrams:
        ngrams[gram] = []
    ngrams[gram].append(txt[i+order])

# Generate up to 100 new characters, starting from the first n-gram
currentGram = txt[0:order]
result = currentGram
for i in range(100):
    if currentGram not in ngrams:
        break
    possibilities = ngrams[currentGram]
    result += random.choice(possibilities)
    currentGram = result[-order:]
print(result)

There are many libraries that implement Markov chains for us. One such library is called Markovify. Install the package with pip (or in Replit using the package manager). Below are the basics of the library, but explore the documentation for more.
import markovify
text_model = markovify.Text(txt, state_size=1)
print(text_model.make_sentence())

By default, Markovify doesn't split the text into character n-grams. Instead, it splits it into words. The state_size parameter sets how many words make up each state (the default is 2, but for small amounts of text you get more interesting results with 1).
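
To make the word-based idea concrete, here is a rough pure-Python sketch of a word-level chain with a configurable state size. It is not how Markovify is implemented (the library does more, such as working sentence by sentence); the names build_word_chain and generate_words are made up for this illustration.

```python
import random


def build_word_chain(text, state_size=1):
    """Map each tuple of `state_size` consecutive words to the words that follow it."""
    words = text.split()
    chain = {}
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain.setdefault(state, []).append(words[i + state_size])
    return chain


def generate_words(chain, start, max_words=20, rng=random):
    """Walk the chain from `start` (a tuple of words) until a dead end or max_words."""
    state = tuple(start)
    output = list(state)
    while len(output) < max_words and state in chain:
        output.append(rng.choice(chain[state]))
        # the new state is the last `state_size` words generated so far
        state = tuple(output[-len(state):])
    return " ".join(output)


sample = "we will go home and then we will go with that"
chain = build_word_chain(sample, state_size=2)
print(chain[("we", "will")])  # ['go', 'go']
print(chain[("will", "go")])  # ['home', 'with']
print(generate_words(chain, ("we", "will"), max_words=8))
```

With a larger state size each state has fewer possible followers, so the output stays closer to the source text. That is one way to see why a small text tends to give more varied results with a state size of 1.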
If we want to generate sentences from a given starting point, Markovify has another method called make_sentence_with_start. You have to make sure that the "start" you provide actually appears in the source text.
import markovify
text_model = markovify.Text(txt, state_size=1)
print(text_model.make_sentence_with_start(beginning="Oh"))