Solving wordle with grep
Grabbing the word list⌗
To narrow down our word guesses it would help to know what list wordle uses, apparently it’s hardcoded into the javascript on the site. I found this wget command to mirror an entire site offline and I used this to get the source code for wordle.
$ wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --no-if-modified-since --no-check-certificate https://www.nytimes.com/games/wordle/index.html
Turns out worlde is now owned by
the NewYork Times and https://www.powerlanguage.co.uk/wordle will now route you to the NewYork times domain.
Anyway, after running this command you should have a folder named www.nytimes.com
$ tree www.nytimes.com
www.nytimes.com
├── games
│ └── wordle
│ ├── fonts
│ │ ├── franklin-normal-500.woff
│ │ ├── franklin-normal-500.woff2
│ │ ├── franklin-normal-700.woff
│ │ ├── franklin-normal-700.woff2
│ │ ├── karnakcondensed-normal-700.woff
│ │ └── karnakcondensed-normal-700.woff2
│ ├── images
│ │ ├── nav-icons
│ │ │ ├── Crossword-Icon-Normalized-Color.svg
│ │ │ ├── Crossword-Icon-Normalized.svg
│ │ │ ├── LetterBoxed-Icon-Normalized-Color.svg
│ │ │ ├── LetterBoxed-Icon-Normalized.svg
│ │ │ ├── Mini-Icon-Normalized-Color.svg
│ │ │ ├── Mini-Icon-Normalized.svg
│ │ │ ├── SpellingBee-Icon-Normalized-Color.svg
│ │ │ ├── SpellingBee-Icon-Normalized.svg
│ │ │ ├── Sudoku-Icon-Normalized-Color.svg
│ │ │ ├── Sudoku-Icon-Normalized.svg
│ │ │ ├── Tiles-Icon-Normalized-Color.svg
│ │ │ ├── Tiles-Icon-Normalized.svg
│ │ │ ├── Vertex-Icon-Normalized-Color.svg
│ │ │ └── Vertex-Icon-Normalized.svg
│ │ ├── wordle_logo_192x192.png
│ │ └── wordle_logo_32x32.png
│ ├── index.html
│ ├── main.bd4cb59c.js
│ └── manifest.json
├── games-assets
│ ├── gdpr
│ │ └── cookie-notice-v2.1.2.min.js
│ └── v2
│ └── metadata
│ ├── nyt-apple-touch-icon.png?v=v2202021345
│ └── nyt-safari-pinned-tab.svg?v=v2202021345
└── robots.txt
Only thing we need are what’s in the www.nytimes.com/games/wordle
directory
Cleaning up the javascript⌗
main.bd4cb59c.js
is a minified js file with the wordlist used for the game
Using the Beutify plugin from vs code we can make this a bit more readable
Before⌗
After⌗
Making our own wordlist⌗
Scrolling down a little we can see that there are two variables
that contain large the full word list Ma
and Oa
We can copy these two variables to a new python file and treat them as a normal list.
Then we can print both of them out and count how many valid words are posible which is 12,947
for i in ma:
print(i)
for x in oa:
print(x)
python3 new.py | wc -l
12947
Now we can save this to a new file
python3 new.py >> wordlist.txt
Side note about the New York Times buyout⌗
I had a copy of this site when it was hosted on www.powerlanguage.co.uk and it looks like the NewYork times removed some unsavory words the word list. Probably a good idea considering this is family game. However they should probably do a couple more word searches
Anyway with the words saved to a list we can randomly select one for our first guess
shuf -n 1 wordlist.txt
varix
What does this tell us?
Well we now know that the word does not contain the letters v, a, r, or x and it must have the letter i in the 4th index.
to simulate this logic in grep we need to use the -v
option which allows us to invert our match, basically a NOT
operator.
to make this faster we can also add an OR
operator using \|
as our match string. so if we wanted all words from our list not containing
the characters v
or a
or r
or or x
we can use this
cat wordlist.txt | grep -v 'v\|a\|r\|x'
zoppo
zouks
zowee
zowie
zulus
zuzim
zygon
zymes
zymic
This narrows our list down to 4,947
cat wordlist.txt | grep -v 'v\|a\|r\|x' | wc -l
4947
We also need to filter out all of the words that don’t have an i
in
the 4th position which can be done with the -E
flag
that let’s us use more RegEx with grep I guess
this is grep command to match all words with i
as the 4th letter
grep -E "...i."
When combined with our previous commands it narrows our wordlist down to 341 words.
cat wordlist.txt | grep -v 'v\|a\|r\|x' | grep -E "...i." | wc -l
341
Let’s randomly select one word from this new list
cat wordlist.txt | grep -v 'v\|a\|r\|x' | grep -E "...i." | shuf -n 1
No luck but we do get some more hints
now we can add o,f and e to as new filters in our grep command which narrows our search down to 117 words.
cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e' | grep -E "...i." | wc -l
117
Let’s get another random word
cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e' | grep -E "...i." | shuf -n 1
tiyin
better luck
we now know that N and Y are in the word but Y is not in index 3 and N is not in index 5, we also know I only appears in the word once and T is not in the word. To get the best out of this we need an expression that can count how often I appears in the word and an expression that ensures Y and N can only appear in the first two indexes
we can the letter t
to our ignore command
grep -v 'v\|a\|r\|x\|o\|f\|e\|t'
this command uses regex quantifier to exclude all words that contain
an i
character more than once
grep -vE '([^i]*i){2,}'
Now we want to exclude all words that end with yin
grep -vE "..yin"
But we still only want words that contain y
and n
so we can string these two statements together and it
will work as an AND
operator. Putting it all together the commands looks like this
cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e\|t' | grep -E "...i." | grep -vE '([^i]*i){2,}' | grep -vE "..yin" | grep "y" | grep "n"
Leaving us with three words to pick from
cynic
kylin
lysin
I’ll chose cynic because it’s the most word like option. Perfect