Linguistic Variation in Classical Hebrew: from Markov Models to Neural Networks

This project is a continuation of the AA 2017 project "Probabilistic Approach to Linguistic Variation and Change in Biblical Hebrew", which investigated whether the so-called Standard Biblical Hebrew [SBH] books and Late Biblical Hebrew [LBH] books exhibit enough internal consistency to confirm the traditional division into an SBH and an LBH corpus. Using a Markov Model [MM] of the clause, phrase, and part-of-speech tendencies of each book in the Hebrew Bible, distances between books were measured in order to cluster them by similarity. The results partly confirmed the scholarly consensus (e.g., the internal consistency of SBH) and partly corrected it (e.g., LBH is much more heterogeneous than traditionally assumed).

At the same time, both the potential and the limitations of MMs became increasingly visible. An MM predicts the next state based only on the current state. In linguistic utterances, however, earlier states also constrain later ones: in the sequence [Subject] [Verb] [X], the presence of [Subject] before [Verb] rules out that state [X] is again [Subject].

The new project will build on the previous one by applying more complex probabilistic models that do justice to the sequential structure of natural language: Recurrent Neural Networks (RNNs). An RNN predicts the next state from not only the current state but also the preceding states, so structural dependencies within a sentence can be modeled better than with MMs. The use of an RNN is therefore a natural next step toward understanding the linguistic variation. However, since RNNs have a more complex structure, their results are harder to interpret and explain; the results of the previous project can help in understanding the output of the RNN. We expect to obtain a better model that yields more precise results than the MMs.
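The limitation described above can be illustrated with a minimal sketch. The toy corpus of POS-tag sequences below is purely hypothetical (it is not the project's data), but it shows why a first-order Markov model cannot encode the [Subject] [Verb] [X] constraint: since the model conditions only on the current tag, it assigns a nonzero probability to [Subject] after [Verb] even in a context where a subject has already occurred.

```python
from collections import defaultdict, Counter

# Hypothetical toy corpus of POS-tag sequences (illustrative only).
corpus = [
    ["Subject", "Verb", "Object"],
    ["Verb", "Subject"],           # verb-initial clause
    ["Verb", "Object"],
    ["Subject", "Verb", "Complement"],
]

# First-order Markov model: transition counts conditioned on the current tag only.
transitions = defaultdict(Counter)
for seq in corpus:
    for cur, nxt in zip(seq, seq[1:]):
        transitions[cur][nxt] += 1

def next_prob(cur, nxt):
    """Estimated P(next tag | current tag) under the first-order model."""
    total = sum(transitions[cur].values())
    return transitions[cur][nxt] / total if total else 0.0

# Because "Verb" is sometimes followed by "Subject" in the corpus, the model
# gives P(Subject | Verb) > 0 even directly after [Subject] [Verb], although
# [Subject] [Verb] [Subject] never occurs in the data:
print(next_prob("Verb", "Subject"))  # → 0.25
```

An RNN, by contrast, carries a hidden state summarizing the whole preceding sequence, so it can in principle learn that the probability of [Subject] after [Verb] depends on whether a subject has already appeared.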