A Novel Approach to Semantic Parsing and Text Analysis
CLEAVER introduces a novel, hierarchical approach to categorizing and partitioning English text. Because it focuses on independent semantic value, it differs from traditional syntactic parsers and offers a wider range of linguistic units for text analysis. By identifying the largest meaningful span of tokens within a text, CLEAVER's hierarchical framework provides a segmentation system for use in text generation and analysis.
Recent years have seen Transformer-based neural networks replace rule-based and statistical models in Natural Language Processing (NLP) and semantic parsing. Despite these advances, there is still no comprehensive framework for semantic analysis, which makes it difficult to translate machine learning advances into educational technologies. Traditional syntactic parsers provide limited insight into the semantic structure of text, so researchers who need semantic analysis often construct custom partitioning systems or rely on specialized transformer models.
Figure 1: Visualization of CLEAVER's hierarchical text partitioning approach
CLEAVER addresses several critical limitations in current NLP approaches:
By providing a hierarchical semantic partitioning system, CLEAVER enables better analysis of text data itself, allowing language models to be specialized for specific writing tasks such as analyzing semantic patterns in student writing or argumentation.
CLEAVER draws from the principle of compositionality in Compositional Semantics, by which the aggregate meaning of a phrase is determined by the composition of its subphrases. The system includes:
The system begins partitioning at the highest level of compositional meaning (compound-complex sentences) and continues down to simple sentences, noun complexes, and eventually the word-token level.
Figure 2: Hierarchical structure of text partitioning in CLEAVER
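The top-down ordering described above can be illustrated with a small tree of labeled spans. This is only a minimal sketch under stated assumptions: the `Unit` class, the `walk_top_down` function, and the `TOKEN` label are hypothetical names chosen for illustration, not part of CLEAVER, and the tree is built by hand rather than produced by a partitioner.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Unit:
    """One node in a hypothetical partition tree: a label and the text it spans."""
    label: str  # e.g. "SS", "SH", "SB", or the illustrative "TOKEN"
    text: str
    children: List["Unit"] = field(default_factory=list)


def walk_top_down(unit: Unit, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, label, text), visiting each unit before its sub-units."""
    yield depth, unit.label, unit.text
    for child in unit.children:
        yield from walk_top_down(child, depth + 1)


# Hand-built example (assumption): a Simple Sentence split into a
# Sentence Head and a Sentence Body, then into word tokens.
tree = Unit("SS", "The dog barked", [
    Unit("SH", "The dog", [Unit("TOKEN", "The"), Unit("TOKEN", "dog")]),
    Unit("SB", "barked", [Unit("TOKEN", "barked")]),
])

for depth, label, text in walk_top_down(tree):
    print("  " * depth + f"{label}: {text}")
```

Each unit is visited before the smaller units it contains, mirroring the descent from sentence-level units to the word-token level.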
Attribute (Abbreviation) | Description |
---|---|
Compound-Complex Sentence (CCS) | Contains multiple Nominal Subjects (SH), an indefinite number of Verb-Object units (SB), and an indefinite number of Asides (A). |
Compound Sentence (CS) | Contains multiple Nominal Subjects (SH), an indefinite number of Verb-Object units (SB), and no Asides (A). |
Complex Sentence (XS) | Contains a single Nominal Subject (SH), an indefinite number of Verb-Object units (SB), and an indefinite number of Asides (A). |
Aside (A) | Extraneous information signified by surrounding punctuation. |
Sentence (S) | Composition of a Structural Phrase, Nominal Subject (SH), and an indefinite number of Verb-Object units. |
Structural Phrase (SP) | Adjectival, Adverbial, Prepositional, or Nominal information that precedes the Nominal Subject (SH). |
Simple Sentence (SS) | Composition of one Nominal Subject (SH) and an indefinite number of Verb-Object units. |
Sentence Head (SH) | Nominal Subject of a Simple Sentence. |
Sentence Body (SB) | The total span of Verb-Object units within a Simple Sentence; contains at least one Verb. |
Table 1: Core semantic idea-units in CLEAVER's partitioning system (showing 9 of 30 total units)
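The four sentence types in Table 1 that differ by their number of Sentence Heads and Asides can be told apart with a simple rule, sketched below. The function name `classify_sentence` and its parameters are hypothetical. The sketch also assumes that "indefinite Asides" means at least one Aside for the CCS and XS rows, since otherwise those rows would overlap with CS and SS; the source does not state this explicitly. The Sentence (S) unit, which depends on a Structural Phrase, is not covered.

```python
def classify_sentence(num_heads: int, num_asides: int) -> str:
    """Return the Table 1 abbreviation for a sentence, given unit counts.

    Assumption (not stated in the source): CCS and XS require at least
    one Aside, which is what separates them from CS and SS.
    """
    if num_heads < 1:
        raise ValueError("a sentence needs at least one Sentence Head (SH)")
    if num_asides < 0:
        raise ValueError("the number of Asides (A) cannot be negative")

    multiple_heads = num_heads > 1
    has_aside = num_asides > 0

    if multiple_heads and has_aside:
        return "CCS"  # Compound-Complex Sentence
    if multiple_heads:
        return "CS"   # Compound Sentence
    if has_aside:
        return "XS"   # Complex Sentence
    return "SS"       # Simple Sentence


print(classify_sentence(num_heads=2, num_asides=1))
print(classify_sentence(num_heads=1, num_asides=0))
```

Counting the Sentence Heads and Asides in real text is the hard part and is not addressed here; the sketch only shows how the table's definitions combine once those counts are known.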
Future development of CLEAVER will include integration with cognitive and psychological word categories from established lexical resources to enhance semantic analysis capabilities:
This integration will enable CLEAVER to not only partition text according to its hierarchical semantic structure but also to estimate underlying psychological and cognitive dimensions of the text, bridging computational linguistics with cognitive psychology.
CLEAVER's utility in text generation models and text analysis still needs to be validated through continued research in text prompting and categorization.