benev0/english-parser

English lang parser

This project takes input from the user in the terminal and displays results in the terminal. The results are somewhat hard to read, as the output contains an autogenerated tree print.

The parser can handle nouns, verbs, prepositions, and infinitives; notably, adverbs and adjectives have been omitted.

The available words are hard-coded into the program, but others can be added by the user.

Organization of the program

Organization

The program is contained in one file and has three primary parts: a lexer, a parser, and a main method. There is one data structure in the program, XPhrase. Not all configurations of the tree are valid; the parser restricts which ones can be built.
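To make the idea of a single tree type with parser-enforced restrictions more concrete, here is a minimal Python sketch. It is not the project's code: the field names (`category`, `head`, `specifier`, `complement`), the category labels, and the one rule in `is_valid` are all assumptions chosen for illustration, and the real XPhrase may be shaped differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical category labels; the real program may use different ones.
CATEGORIES = {"N", "V", "P"}


@dataclass(frozen=True)
class XPhrase:
    """One X-bar style phrase: a head word plus optional specifier and complement."""
    category: str                            # "N", "V", or "P" (assumed labels)
    head: str                                # the word that heads the phrase
    specifier: Optional["XPhrase"] = None    # e.g. the subject of a verb phrase
    complement: Optional["XPhrase"] = None   # e.g. the object of a verb or preposition


def is_valid(phrase: XPhrase) -> bool:
    """Assumed example rule: a preposition phrase must take a noun phrase complement."""
    if phrase.category not in CATEGORIES:
        return False
    if phrase.category == "P":
        if phrase.complement is None or phrase.complement.category != "N":
            return False
    children = [c for c in (phrase.specifier, phrase.complement) if c is not None]
    return all(is_valid(child) for child in children)


def render(phrase: XPhrase, depth: int = 0) -> List[str]:
    """Return an indented, one-node-per-line view of the tree."""
    lines = ["  " * depth + f"{phrase.category}P {phrase.head}"]
    for child in (phrase.specifier, phrase.complement):
        if child is not None:
            lines.extend(render(child, depth + 1))
    return lines
```

The type itself allows any nesting, so a bare preposition with no complement can be constructed; a check like `is_valid` is where such a configuration would be rejected.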

Program

The main method accepts a few commands:

  • quit -- exit the program
  • noun -- add nouns to the context
  • verb -- add verbs to the context
  • prep -- add prepositions to the context
  • check -- try to generate parse trees for an input sentence
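The commands above could be dispatched roughly as in the following Python sketch. The argument syntax (words given on the same line as the command), the `new_context` and `handle_command` helpers, and the reply strings are assumptions for illustration. The `check` branch only reports unknown words, whereas the real program builds parse trees.

```python
from typing import Dict, Set, Tuple

Context = Dict[str, Set[str]]


def new_context() -> Context:
    """Empty word lists, one per part of speech the commands can extend."""
    return {"noun": set(), "verb": set(), "prep": set()}


def handle_command(line: str, context: Context) -> Tuple[bool, str]:
    """Process one input line; return (keep_running, message to print)."""
    parts = line.split()
    if not parts:
        return True, ""
    command, args = parts[0], parts[1:]
    if command == "quit":
        return False, "bye"
    if command in context:  # noun, verb, or prep
        context[command].update(args)
        return True, f"added {len(args)} {command}(s)"
    if command == "check":
        known = set().union(*context.values())
        unknown = [word for word in args if word not in known]
        if unknown:
            return True, "unknown words: " + ", ".join(unknown)
        return True, "all words known; a real parser would build trees here"
    return True, f"unknown command: {command}"


def repl() -> None:
    """Read commands from the terminal until `quit` is entered."""
    context = new_context()
    running = True
    while running:
        running, message = handle_command(input("> "), context)
        if message:
            print(message)
```

Calling `repl()` starts the loop. Keeping `handle_command` separate from the terminal I/O means each command can be exercised without a terminal.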

Research

This was based somewhat on X-bar tree structure. There was some difficulty locating resources that gave exact parsing rules, so the rules developed were based primarily on prior understanding.

My understanding of tree printing comes from this post.

Discussion

English has ambiguity. There are two sources of it: lexical and syntactic.

Lexical ambiguity is when one word has different meanings. This is resolved by simply generating every possible combination of the ambiguous words' readings when making the starting contexts.
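Generating every combination amounts to taking a Cartesian product over the possible parts of speech of each word. The Python sketch below shows that idea; the `starting_contexts` function, the lexicon layout, and the tag names are hypothetical and not taken from the project.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def starting_contexts(
    words: List[str], lexicon: Dict[str, Set[str]]
) -> List[List[Tuple[str, str]]]:
    """Return one tagged sentence per combination of possible parts of speech."""
    # Sort each word's tags so the result order is deterministic.
    options = [sorted(lexicon.get(word, set())) for word in words]
    # product() picks one tag per word; a word with no tags yields no contexts.
    return [list(zip(words, tags)) for tags in product(*options)]


# "walk" can be a noun or a verb, so two starting contexts are produced.
lexicon = {"dogs": {"noun"}, "walk": {"noun", "verb"}}
contexts = starting_contexts(["dogs", "walk"], lexicon)
```

The number of contexts is the product of the number of readings per word, so it grows quickly for sentences with many ambiguous words.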

Syntactic ambiguity is when a phrase could attach to more than one other phrase. This is not trivially resolved: phrases can only wander so far, and placing them correctly may require context beyond the part of speech. This program attempts to move these phrases around without accounting for that added context, which often results in strange behavior or breaks the program, including vanishing prepositions.

Avoiding infinite recursion is the most difficult task, as a small subset of the rules can loop indefinitely when not properly checked. To fully avoid this, the following change could be made to the program: store all valid ParseConfig values in a set rather than a list, and on each iteration extend the set with all possible parses while preserving the old ones; the loop terminates when the set no longer changes. This would only encounter an issue where there is also a leak.
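The proposed change is a fixed-point iteration over a set. Below is a minimal Python sketch of that loop, under the assumption that configurations are hashable; `saturate`, `step`, and the toy `rotate` rule are hypothetical names, and tuples of words stand in for ParseConfig.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def saturate(
    start: Iterable[Hashable],
    step: Callable[[Hashable], Iterable[Hashable]],
) -> Set[Hashable]:
    """Apply `step` to every known configuration until nothing new appears."""
    seen = set(start)
    while True:
        # Everything reachable in one step from anything already known.
        produced = {nxt for config in seen for nxt in step(config)}
        new = produced - seen
        if not new:  # the set no longer changes, so stop
            return seen
        seen |= new  # old configurations are preserved


def rotate(config: Tuple[str, ...]) -> Iterable[Tuple[str, ...]]:
    """Toy rule that cycles forever if results are appended to a list."""
    return [config[1:] + config[:1]]


reachable = saturate([("a", "b", "c")], rotate)
```

With a list, `rotate` would keep producing the same three tuples forever. With a set, the loop stops once all three have been seen. The loop still terminates only if the rules can reach a finite number of distinct configurations.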

Extensions

The program supports neither adjectives nor adverbs. As these do not significantly alter the structure of a sentence, adding them should be a small extension. For a more robust rule set, there are many things that could be added, for example wh-movement (that is, parsing questions) or linking verbs such as "is" and "are".

Conclusion

This project is a success, having accomplished the goal of writing a simple English parser. Some bugs and oversimplifications are present; however, the software is extendable in its current form.

What I learned

As this is a parsing project, I have a much better understanding of writing parsers. The most concrete piece of understanding that I gained is in IO, in particular how to write IO functions that return values.
