Localipsum, a sample text generator

Posted on 23 Aug 2015 in programming • 3 min read

Localipsum

Localipsum is a simple python program that aim to generate sample text for designers and developers as the well known Lorem Ipsum website. The main difference is that it use your own dictionaries on your computer.

I worked on this program at the beginning of the year on the basis of a fun idea: use my own dictionaries to generate sample text. This was a quick program to write but the vim's dictionaries were (and still are) a problem (more on the dedicated section).

I have stopped to work on this program as it fulfil my current need.

The program is available on github under public license: github.com/maggick/localipsum

Usage

The documentation is display with the --help, -h option:

$ ./localipsum.py -h
usage: localipsum.py [-h] [--paragraphs] [--sentences] [--words] [--lorem] integer

Localispum is a simple program providing sample text generating from your own
dictionaries
...

The program take in argument a integer and a "type" of text. The available "type" of text are:

  • words (--words, -w)
  • sentences (--sentences, -s)
  • paragraphs (--paragraphs, -p)

The integer will indicate how many quantity of this object it should print.

An optional argument (--lorem, -l) will put "Lorem ipsum dolor sit amet," at the beginning of the generated text.

And for instance:

$ ./localipsum.py --sentences 5
Crutch's just dissociable pick, janitors contractions appropriating multipliers
beguiling patrimonial escorted opalescent penetrations coweringly forging await
furtive northland roves humidification relentlessly tomb's will. Styrofoam
pediatric exposer incubator's exam alibi's misnomer, vi scope armpits swung
commutative osteopathic serving chortle dreading rates composite chalked expose
agreer preconceive magnanimity railroad. Filmstrip gases flatulent incant
footstep purchasing, gyroscope's furtiveness nun putnam pungent genuineness
equator's far transference. Carla bhutan, soundings protect jabbing knower
pyramids reintroducing riven cumbersome. Siltstone consider amphibole,
incident's cherub clam's should carolingian expired abatement charlemagne.

Dictionaries

The program use the dictionaries from the ./locations file (one path per line). Some basic dictionaries are already in it.

/usr/share/dict/

This directory may contain dictionaries on Unix system, on my own it contain a dictionary for passwords cracking.

Firefox

The Firefox dictionaries are located in a file like: ~/.mozilla/firefox/xxxxxxxx.default/extensions/fr-dicollecte@dictionaries.addons.mozilla.org/dictionaries/ where xxxxxxxx is the identifier of your Firefox profile.

If you have more than one profile each one will have a different name.

vim

The vim dictionaries are spl and sug files. The main ones are located in a path like /usr/share/vim/vimXX/spell/ where XX is your version of vim (for me it is 74 right now).

Your personal dictionaries are located in ~/.vim/spell/.

This files are not directly readable in python. The command vim -c spelldump let you display the current dictionary in a vim buffer. From there it is quit simple to redirect it to a file. Nevertheless this suppose that vim is install on the system and the file are not clean and include lots of non word in them.

I choose not to include them in the program for now.

Technical stuff

The program is written in python 3, it is actually 128 lines written in 29 commits.

It supposes that that the paths are Unix like.

Random generation

The words are chosen with the random.randint(0, len(dictionaries)-1) function. We also put a comma if there is more than 7 words. This comma is placed randomly with more and more probability to be place as the words are generated.

In order to generate sentences, we suppose that one sentence is composed of 10 to 25 words.

In order to generate paragraphs, we suppose that one paragraph is composed from 4 to 8 sentences.