Grabbing the word list

To narrow down our guesses, it would help to know which word list Wordle uses. Apparently it’s hardcoded into the JavaScript on the site. I found this wget command for mirroring an entire site offline and used it to get the source code for Wordle.

$ wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --no-if-modified-since --no-check-certificate https://www.nytimes.com/games/wordle/index.html

It turns out Wordle is now owned by the New York Times, and https://www.powerlanguage.co.uk/wordle now redirects to the New York Times domain. Anyway, after running this command you should have a folder named www.nytimes.com

$ tree www.nytimes.com
www.nytimes.com
├── games
│   └── wordle
│       ├── fonts
│       │   ├── franklin-normal-500.woff
│       │   ├── franklin-normal-500.woff2
│       │   ├── franklin-normal-700.woff
│       │   ├── franklin-normal-700.woff2
│       │   ├── karnakcondensed-normal-700.woff
│       │   └── karnakcondensed-normal-700.woff2
│       ├── images
│       │   ├── nav-icons
│       │   │   ├── Crossword-Icon-Normalized-Color.svg
│       │   │   ├── Crossword-Icon-Normalized.svg
│       │   │   ├── LetterBoxed-Icon-Normalized-Color.svg
│       │   │   ├── LetterBoxed-Icon-Normalized.svg
│       │   │   ├── Mini-Icon-Normalized-Color.svg
│       │   │   ├── Mini-Icon-Normalized.svg
│       │   │   ├── SpellingBee-Icon-Normalized-Color.svg
│       │   │   ├── SpellingBee-Icon-Normalized.svg
│       │   │   ├── Sudoku-Icon-Normalized-Color.svg
│       │   │   ├── Sudoku-Icon-Normalized.svg
│       │   │   ├── Tiles-Icon-Normalized-Color.svg
│       │   │   ├── Tiles-Icon-Normalized.svg
│       │   │   ├── Vertex-Icon-Normalized-Color.svg
│       │   │   └── Vertex-Icon-Normalized.svg
│       │   ├── wordle_logo_192x192.png
│       │   └── wordle_logo_32x32.png
│       ├── index.html
│       ├── main.bd4cb59c.js
│       └── manifest.json
├── games-assets
│   ├── gdpr
│   │   └── cookie-notice-v2.1.2.min.js
│   └── v2
│       └── metadata
│           ├── nyt-apple-touch-icon.png?v=v2202021345
│           └── nyt-safari-pinned-tab.svg?v=v2202021345
└── robots.txt

The only thing we need is what’s in the www.nytimes.com/games/wordle directory.

Cleaning up the javascript

main.bd4cb59c.js is a minified JS file that contains the word list used for the game.

Using the Beautify extension for VS Code, we can make this a bit more readable.

(Screenshots: the file before and after beautifying.)

Making our own wordlist

Scrolling down a little, we can see two variables, Ma and Oa, that together contain the full word list.

We can copy these two variables into a new Python file and treat them as normal lists.
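Copying by hand works, but the arrays could also be pulled out with a short script. This is only a minimal sketch: it assumes each list appears in the file as a plain array of double-quoted strings assigned to a name such as Ma or Oa, which I have not checked against the real file. The helper name extract_list and the sample string are made up for illustration.

```python
import json
import re


def extract_list(js_source, var_name):
    """Return the string array assigned to var_name in js_source.

    Assumes the array looks like: Ma = ["cigar", "rebut", ...]
    with only double-quoted strings inside (an assumption, not verified
    against the real file).
    """
    # Find `<name> = [ ... ]`; [^\]]* stops at the first closing bracket.
    pattern = r"\b" + re.escape(var_name) + r"\s*=\s*(\[[^\]]*\])"
    match = re.search(pattern, js_source)
    if match is None:
        raise ValueError(f"no array found for {var_name}")
    # A JS array of double-quoted strings is also valid JSON.
    return json.loads(match.group(1))


# Tiny stand-in for the real file contents (hypothetical sample).
sample = 'var Ma = ["cigar", "rebut"], Oa = ["aahed", "aalii"];'
words = extract_list(sample, "Ma") + extract_list(sample, "Oa")
print(len(words))
```

With the real file you would read its contents into js_source instead of using the sample string.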

Then we can print both of them out and count how many valid words are possible, which is 12,947.

for i in ma:
    print(i)
for x in oa:
    print(x)

python3 new.py | wc -l
   12947

Now we can save this to a new file

python3 new.py > wordlist.txt

Side note about the New York Times buyout

I had a copy of this site from when it was hosted on www.powerlanguage.co.uk, and it looks like the New York Times removed some unsavory words from the word list. Probably a good idea, considering this is a family game. However, they should probably do a couple more word searches.

Anyway, with the words saved to a list, we can randomly select one for our first guess.

shuf -n 1 wordlist.txt
varix

What does this tell us?

Well, we now know that the word does not contain the letters v, a, r, or x, and that it must have the letter i in the 4th position.
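To make the colour feedback concrete, here is a rough sketch of how a guess could be scored against an answer. It assumes the usual duplicate-letter handling (greens are settled first, then yellows are handed out left to right while unmatched copies remain). The function name score and the g/y/b letters are my own, and the answer used is the one that turns up at the end of this post.

```python
from collections import Counter


def score(guess, answer):
    """Return Wordle-style feedback: g = green, y = yellow, b = gray."""
    result = ["b"] * len(guess)
    # Letters of the answer that are not already matched in place.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
    for i, (g, a) in enumerate(zip(guess, answer)):
        # A misplaced letter is yellow only while unmatched copies remain.
        if g != a and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


print(score("varix", "cynic"))  # bbbgb
```

The single g in the 4th slot is the green i, and the four b's are the letters we can throw away.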

To simulate this logic in grep, we need the -v option, which inverts the match. It is basically a NOT operator.

To make this faster, we can also add an OR operator by using \| in our match string. So if we want all words from our list that do not contain the characters v, a, r, or x, we can use this:

cat wordlist.txt | grep -v 'v\|a\|r\|x'
zoppo
zouks
zowee
zowie
zulus
zuzim
zygon
zymes
zymic

This narrows our list down to 4,947 words.

cat wordlist.txt | grep -v 'v\|a\|r\|x' | wc -l                        
    4947

We also need to filter out all of the words that don’t have an i in the 4th position. This can be done with a regular expression. The -E flag turns on extended regular expressions in grep, although this particular pattern would also work without it.

This is the grep command to match all words with i as the 4th letter:

grep -E "...i."

When combined with our previous commands it narrows our wordlist down to 341 words.

cat wordlist.txt | grep -v 'v\|a\|r\|x' |  grep -E "...i." | wc -l 
     341
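The same two filters can be written in a few lines of Python, which may be easier to extend than a growing grep pipeline. This is just a sketch: filter_words is a made-up helper, and the sample list is a handful of words that appear in this post rather than the full word list.

```python
import re


def filter_words(words, absent, pattern):
    """Keep words with no letter from `absent` that fully match `pattern`."""
    banned = set(absent)
    regex = re.compile(pattern)
    return [
        w for w in words
        if not banned & set(w)       # same idea as grep -v 'v\|a\|r\|x'
        and regex.fullmatch(w)       # same idea as grep -E "...i."
    ]


# A few words taken from this post, not the full list.
sample = ["varix", "zymic", "zowie", "cynic", "zoppo"]
print(filter_words(sample, "varx", "...i."))  # ['zymic', 'zowie', 'cynic']
```

Using fullmatch means the pattern has to cover the whole word, so it keeps working even if the list ever contained words longer than five letters.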

Let’s randomly select one word from this new list

cat wordlist.txt | grep -v 'v\|a\|r\|x' |  grep -E "...i." | shuf -n 1

No luck, but we do get some more hints.

Now we can add o, f, and e as new filters in our grep command, which narrows our search down to 117 words.

cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e' |  grep -E "...i." | wc -l
117

Let’s get another random word

cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e' |  grep -E "...i." | shuf -n 1  
tiyin

Better luck this time.

We now know that N and Y are in the word, but Y is not in the 3rd position and N is not in the 5th. We also know that I appears in the word only once and that T is not in the word. To get the most out of this, we need an expression that can count how often I appears in the word and an expression that keeps Y and N out of the positions where they were just ruled out.

We can add the letter t to our exclusion filter:

grep -v 'v\|a\|r\|x\|o\|f\|e\|t'

This command uses a regex quantifier to exclude all words that contain the character i more than once:

grep -vE '([^i]*i){2,}' 
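If the quantifier looks cryptic, this small sketch compares the same pattern against a plain letter count in Python. The constant name TWO_OR_MORE_I is my own, and the three test words are just examples from this post.

```python
import re

# Same pattern as the grep command: "some non-i characters, then an i",
# repeated at least twice, so it only matches words with two or more i's.
TWO_OR_MORE_I = re.compile(r"([^i]*i){2,}")

for word in ["tiyin", "cynic", "zymes"]:
    by_regex = bool(TWO_OR_MORE_I.search(word))
    by_count = word.count("i") >= 2
    print(word, by_regex, by_count)
```

Both columns should agree for every word, which is a quick way to convince yourself the pattern does what the grep -v step relies on.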

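Now we want to exclude all words that end with yin, since Y cannot be 3rd and N cannot be last. Note that this only rejects words that have both letters in those spots; a stricter filter would test each position separately.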

grep -vE "..yin"

But we still only want words that contain y and n, so we can chain two more grep commands together, which works as an AND operator. Putting it all together, the command looks like this:

cat wordlist.txt | grep -v 'v\|a\|r\|x\|o\|f\|e\|t' | grep -E "...i." | grep -vE '([^i]*i){2,}' | grep -vE "..yin" | grep "y" |  grep "n"

Leaving us with three words to pick from

cynic
kylin
lysin

I’ll choose cynic because it’s the most word-like option. It is also the only one of the three that does not end in n, which the yellow N already ruled out. Perfect.
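For next time, all of the clues can be checked per position in one place instead of across several grep calls. This is a sketch under my own naming (consistent and the clues dictionary are hypothetical, and positions are 0-based here), applied only to the three finalists rather than the full word list.

```python
def consistent(word, green, yellow, absent, max_count):
    """Check one candidate against everything learned so far.

    green:     {position: letter} for letters known to be in place
    yellow:    {letter: positions where that letter must NOT be}
    absent:    letters that do not appear at all
    max_count: {letter: maximum number of times it may appear}
    Positions are 0-based.
    """
    if any(word[pos] != letter for pos, letter in green.items()):
        return False
    for letter, bad_positions in yellow.items():
        if letter not in word:
            return False
        if any(word[pos] == letter for pos in bad_positions):
            return False
    if set(absent) & set(word):
        return False
    return all(word.count(l) <= n for l, n in max_count.items())


candidates = ["cynic", "kylin", "lysin"]
clues = dict(
    green={3: "i"},                # i is in the 4th position
    yellow={"y": [2], "n": [4]},   # y is not 3rd, n is not 5th
    absent="varxofet",
    max_count={"i": 1},
)
print([w for w in candidates if consistent(w, **clues)])  # ['cynic']
```

Keeping each kind of clue in its own argument makes it easy to add the feedback from a new guess without rewriting the earlier filters.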