Natural language processing tool
This was among the more challenging problems I have tackled recently. It started on a bus ride home to visit family, when I was toying with a simple Python script that tried to interpret the meaning of sentences.
After some time hacking away, the script quickly grew in complexity. I soon realized I should tackle this in a stricter language like C#, so I rewrote my algorithm in C# and continued development. I hooked into the Merriam-Webster Inc. API to automatically define and categorize any words I didn’t have in my hand-written dictionary. This greatly expanded my program's vocabulary (by about 250k words!) and enabled me to experiment with totally novel phrases.
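To illustrate the idea of a hand-written dictionary backed by an external lookup, here is a minimal Python sketch. It is not the project's actual C# code: the `Lexicon` class, the category tags, and `fake_remote_lookup` are all hypothetical names, and the fallback is a local stand-in rather than a real Merriam-Webster API call.

```python
from typing import Callable, Dict, Optional


class Lexicon:
    """Maps words to category tags, asking a fallback about unknown words."""

    def __init__(self, known: Dict[str, str],
                 fallback: Callable[[str], Optional[str]]):
        self.known = dict(known)   # hand-written entries
        self.fallback = fallback   # e.g. a wrapper around a dictionary API

    def category(self, word: str) -> Optional[str]:
        key = word.lower()
        if key not in self.known:
            tag = self.fallback(key)
            if tag is None:
                return None        # neither source knows the word
            self.known[key] = tag  # cache so the fallback is asked only once
        return self.known[key]


def fake_remote_lookup(word: str) -> Optional[str]:
    # Hypothetical stand-in for a remote dictionary request.
    return {"zebra": "N"}.get(word)


lexicon = Lexicon({"the": "Det", "runs": "V"}, fake_remote_lookup)
```

Caching the fallback result means each unfamiliar word costs at most one remote lookup, which matters once the vocabulary grows by hundreds of thousands of entries.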
After a few weeks and some helpful tips from my linguistics professor, Dr. Reinholtz, my program was able to take an input phrase and recursively generate a tree representation of its structure.
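For readers who want a feel for what recursively building a phrase tree can look like, here is a small Python sketch of a recursive-descent parser. It is an assumption-laden toy, not the algorithm from the project: the tiny `LEXICON`, the grammar (S → NP VP, NP → Det? Adj* N PP?, PP → P NP, VP → V NP?), and every function name are made up for illustration.

```python
from typing import List, Optional, Tuple

# Toy hand-written lexicon (hypothetical entries).
LEXICON = {
    "the": "Det", "a": "Det", "small": "Adj",
    "dog": "N", "cat": "N", "park": "N",
    "chased": "V", "slept": "V", "in": "P",
}

Parsed = Optional[Tuple[tuple, int]]  # (tree, next token index) or None


def tag(tokens: List[str], i: int) -> Optional[str]:
    return LEXICON.get(tokens[i]) if i < len(tokens) else None


def parse_np(tokens: List[str], i: int) -> Parsed:
    children = []
    if tag(tokens, i) == "Det":
        children.append(("Det", tokens[i]))
        i += 1
    while tag(tokens, i) == "Adj":
        children.append(("Adj", tokens[i]))
        i += 1
    if tag(tokens, i) != "N":
        return None
    children.append(("N", tokens[i]))
    i += 1
    pp = parse_pp(tokens, i)  # recursion: an NP may contain a PP...
    if pp is not None:
        children.append(pp[0])
        i = pp[1]
    return ("NP", *children), i


def parse_pp(tokens: List[str], i: int) -> Parsed:
    if tag(tokens, i) != "P":
        return None
    inner = parse_np(tokens, i + 1)  # ...and a PP contains another NP
    if inner is None:
        return None
    return ("PP", ("P", tokens[i]), inner[0]), inner[1]


def parse_vp(tokens: List[str], i: int) -> Parsed:
    if tag(tokens, i) != "V":
        return None
    children = [("V", tokens[i])]
    i += 1
    obj = parse_np(tokens, i)  # optional object
    if obj is not None:
        children.append(obj[0])
        i = obj[1]
    return ("VP", *children), i


def parse_sentence(text: str) -> Optional[tuple]:
    tokens = text.lower().split()
    subject = parse_np(tokens, 0)
    if subject is None:
        return None
    predicate = parse_vp(tokens, subject[1])
    if predicate is None or predicate[1] != len(tokens):
        return None  # reject input with leftover or unparseable words
    return ("S", subject[0], predicate[0])
```

Calling `parse_sentence("the dog chased a cat in the park")` returns a nested tuple whose object noun phrase contains a prepositional phrase, which in turn contains another noun phrase. That nesting is where the recursion shows up.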
During a conversation with Dr. Reinholtz about the program, I mentioned that it had an issue where it was (so I thought) misclassifying part of the tree as a Determiner-Phrase rather than as a specifier for a Noun-Phrase. I was surprised to learn that this was not a bug, but rather a real feature of the English language! My algorithm had predicted a language feature that I initially hadn’t been aware of.
The code is up on GitLab for anyone interested.