Introduction
Markovian is a command line utility for generating fake words or names. You give it one or more lists of words and it analyses their structure, finding some of the underlying rules and then using those to generate new words.
Installation
Precompiled releases
Precompiled releases are available for macOS, Windows and Linux (untested). You can download the most recent release from the releases page.
If you're using Windows you'll need to have the Microsoft Visual C++ Redistributable libraries installed.
Compiling from source
The source code is available on GitHub in the mikeando/markovian repository.
You will need the nightly version of the Rust compiler and cargo installed.
Then run
cargo build --release
Quick Start
Installing the binary
Download the release for your platform from the releases page and place it in an appropriate location.
Install a few word-lists
We'll use Moby_Names_M_lc.txt, downloadable from the Markovian repository resources, and theological_angels.txt, from the large collection of word lists in the data of MarkovNameGenerator.
Generate some words
First we'll generate some names from the Moby_Names_M_lc.txt file
> markovian simple generate --encoding=string --count=5 \
Moby_Names_M_lc.txt
stergiramessey
barnan
bralph
jerrel
jord
Next we'll use both of the word lists
> markovian simple generate --encoding=string --count=5 \
Moby_Names_M_lc.txt theological_angels.txt
harutchel
gord
waylinton
cord
benatton
We can also request a specific prefix and/or suffix
> markovian simple generate --encoding=string --count=5 \
Moby_Names_M_lc.txt theological_angels.txt \
--prefix=jo
joenjamey
jophield
jon
jos
johnaterrius
> markovian simple generate --encoding=string --count=5 \
Moby_Names_M_lc.txt theological_angels.txt \
--suffix=io
alio
meodovio
hatrizio
antonio
allio
> markovian simple generate --encoding=string --count=5 \
Moby_Names_M_lc.txt theological_angels.txt \
--prefix=jo --suffix=io
johansalvandrio
jonancio
jorrizio
josidarrio
jonancilio
Speeding it up
On large word-lists the `simple generate` command can be slow, as it needs to process a lot of data. We can make the word generation run very quickly by precomputing everything it needs.
Conceptually, creating the preprocessed data has three steps:
- Determining the input symbols.
- Identifying and combining symbols that occur together.
- Building the triplet map.
Generating the initial symbol table
The first stage is
markovian symbol-table generate --encoding=string \
--output=A.symboltable --input=word-list-1.txt --input=word-list-2.txt
This generates a symbol table file called A.symboltable containing all the letters from the two input word lists. You'll want to use a better name for the output.
You can see the list of symbols it uses with
markovian symbol-table print --input=A.symboltable
For example
> markovian symbol-table generate --encoding=string --output=Moby_initial.symboltable --input=Moby_Names_M_lc.txt
using 3878 input strings
found 30 symbols
wrote Moby_initial.symboltable
> markovian symbol-table print --input=Moby_initial.symboltable
encoding: char
max symbol id: 30
0 => START
1 => END
2 => a
3 => r
4 => o
5 => n
...
24 => x
25 => z
26 => v
27 => '
28 =>
29 => q
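As a rough illustration of what an initial symbol table contains, here is a minimal Python sketch. It is not markovian's implementation: the function name build_symbol_table is hypothetical, and ordering the letters by frequency is an assumption made only so the output is deterministic. The one detail taken from the printed table above is that ids 0 and 1 are reserved for START and END.

```python
from collections import Counter

START, END = "START", "END"

def build_symbol_table(words):
    """Return a list of symbols; the list index serves as the symbol id."""
    counts = Counter(ch for word in words for ch in word)
    # Ids 0 and 1 are reserved for the markers, as in the printed table.
    # Ordering the letters by frequency (ties alphabetical) is an assumption.
    by_frequency = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return [START, END] + by_frequency

table = build_symbol_table(["anna", "aaron", "nora"])
for symbol_id, symbol in enumerate(table):
    print(f"{symbol_id} => {symbol}")
```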
Combining symbols
This step works on an existing symbol table file and looks for symbols that occur together frequently in the input and combines them into one compound symbol.
markovian symbol-table improve A.symboltable --output=B.symboltable word-list-1.txt word-list-2.txt
For example
> markovian symbol-table improve Moby_initial.symboltable --output Moby_50.symboltable resources/Moby_Names_M_lc.txt
...
> markovian symbol-table print --input=Moby_50.symboltable
encoding: char
max symbol id: 80
0 => START
1 => END
2 => a
3 => r
...
29 => q
30 => er
31 => ar
...
75 => em
76 => ab
77 => do
We can then see how this symbol-table breaks up words using
> markovian symbol-table symbolify --symbol-separator="." Moby_50.symboltable johnathon stephan arnold eric
johnathon => ["j.o.h.n.a.th.on"]
stephan => ["st.e.p.h.an"]
arnold => ["ar.n.ol.d"]
eric => ["er.i.c", "e.ri.c"]
Only the shortest symbol sequences that produce the given word are shown, but more than one sequence can have the same length. In the example above, eric can be written two ways.
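To make the idea of "shortest symbol sequences" concrete, the following Python sketch finds every way to split a word into the fewest symbols using dynamic programming. It is an illustration under assumptions, not the code behind symbolify: the function name shortest_segmentations and the small symbol set are made up for the example, and the order of the results may differ from the tool's.

```python
def shortest_segmentations(word, symbols):
    """Return every split of `word` into the fewest possible symbols."""
    n = len(word)
    inf = float("inf")
    # best[i] is the minimum number of symbols needed to spell word[i:].
    best = [inf] * (n + 1)
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if word[i:j] in symbols and best[j] + 1 < best[i]:
                best[i] = best[j] + 1

    def expand(i):
        # Rebuild all optimal splits of word[i:] by following optimal steps.
        if i == n:
            return [[]]
        results = []
        for j in range(i + 1, n + 1):
            if word[i:j] in symbols and best[j] + 1 == best[i]:
                results.extend([word[i:j]] + rest for rest in expand(j))
        return results

    if best[0] == inf:
        return []  # the word cannot be spelled with these symbols
    return [".".join(parts) for parts in expand(0)]

# Hypothetical symbol set: single letters plus a few compound symbols.
symbols = set("abcdefghijklmnopqrstuvwxyz") | {"er", "ri", "ar", "ol"}
print(shortest_segmentations("eric", symbols))
print(shortest_segmentations("arnold", symbols))
```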
At the moment this performs a fixed number (50) of symbol combining steps. This will become configurable in the future.
If you want to combine more symbols you can rerun this stage on the new symbol table file.
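One plausible way to combine symbols is to repeatedly merge the most frequent adjacent pair, in the style of byte-pair encoding. The Python sketch below shows that technique only; whether markovian selects pairs this way is an assumption, and the names merge_pair and combine_symbols are hypothetical.

```python
from collections import Counter

def merge_pair(tokens, pair):
    """Replace each adjacent occurrence of `pair` with one joined symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def combine_symbols(words, steps):
    """Run up to `steps` rounds of merging the most frequent adjacent pair."""
    tokenized = [list(word) for word in words]
    new_symbols = []
    for _ in range(steps):
        pairs = Counter()
        for tokens in tokenized:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Break ties alphabetically so the result is deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        new_symbols.append(best[0] + best[1])
        tokenized = [merge_pair(tokens, best) for tokens in tokenized]
    return new_symbols, tokenized

words = ["peter", "homer", "roger", "ryan", "anna"]
new_symbols, tokenized = combine_symbols(words, steps=2)
print(new_symbols)
print(tokenized)
```

Running the sketch again on its own output corresponds loosely to rerunning the improve stage on the new symbol table file.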
Generating the triplet maps / generator
We create the triplet maps / generator file using
> markovian generator create B.symboltable --output=A.generator word-list-1.txt word-list-2.txt
Generating words with the generator
markovian generator generate A.generator
You can add --prefix=prefix, --suffix=suffix and --count=N to this too.
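For readers curious about what a triplet map is, here is a small Python sketch of building one and generating words from it. It is a simplified illustration, not the generator file format or the tool's algorithm: it uses single characters as symbols rather than compound ones, the function names are hypothetical, and handling the suffix by discarding words that do not match is an assumption.

```python
import random
from collections import defaultdict

START, END = "START", "END"

def build_triplet_map(words):
    """Map each pair of consecutive symbols to the symbols seen after it."""
    followers = defaultdict(list)
    for word in words:
        symbols = [START, START] + list(word) + [END]
        for a, b, c in zip(symbols, symbols[1:], symbols[2:]):
            followers[(a, b)].append(c)
    return followers

def generate(followers, rng, prefix="", max_len=30):
    """Random-walk the map from `prefix`; return None if the walk fails."""
    symbols = [START, START] + list(prefix)
    while len(symbols) < max_len + 2:
        options = followers.get((symbols[-2], symbols[-1]))
        if not options:
            return None  # this pair never occurred in the input
        nxt = rng.choice(options)
        if nxt == END:
            return "".join(symbols[2:])
        symbols.append(nxt)
    return None  # word grew too long

def generate_many(followers, count, prefix="", suffix="", seed=0, max_tries=10000):
    """Collect up to `count` words, rejecting those without the suffix."""
    rng = random.Random(seed)
    words = []
    for _ in range(max_tries):
        if len(words) == count:
            break
        word = generate(followers, rng, prefix)
        if word is not None and word.endswith(suffix):
            words.append(word)
    return words

names = ["jon", "john", "jonas", "jordan", "joan", "antonio", "mario", "julio"]
followers = build_triplet_map(names)
print(generate_many(followers, 5, prefix="jo"))
print(generate_many(followers, 3, suffix="io"))
```

Precomputing the map once is what makes repeated generation cheap: each new word is only a sequence of dictionary lookups.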
Reference
Details about all commands are available using markovian --help
Contact Me
You can contact me with feedback or issues through the issues page on GitHub.
Word Lists
- MarkovNameGenerator has some great word lists in its embed directory.
- Moby Word Lists by Grady Ward, compiled from data in the Project Gutenberg library.
If you know of any other great word-lists I'd love to add them here.