Linguistic Variation in Classical Hebrew: from Markov Models to Neural Networks

This project is a continuation of the AA 2017 project “Probabilistic Approach to
Linguistic Variation and Change in Biblical Hebrew”, which investigated whether the
so-called Standard Biblical Hebrew [SBH] books and Late Biblical Hebrew [LBH]
books exhibit enough internal consistency to confirm the traditional divisions into an
SBH and an LBH corpus. With a Markov Model [MM] of the clause, phrase, and partof-speech tendencies for each book in the Hebrew Bible, distances between books
were measured in order to cluster them based on similarity.
The results partly confirmed the scholarly consensus (e.g.: the internal consistency of
the SBH corpus) and partly corrected it (e.g.: LBH is much more heterogeneous). In addition, both
the potential and the limitations of MMs became increasingly visible. MMs predict the
next state based only on the current state. In linguistic utterances, however, earlier
states constrain later ones (e.g.: in the sequence [Subject] [Verb] [X], the presence of
[Subject] before [Verb] rules out that state [X] is another [Subject]).
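This limitation can be made concrete with a minimal sketch. The toy tag sequences and the bigram estimator below are illustrative assumptions, not the project's actual data or model: a first-order MM estimates P(next | current) from adjacent pairs only, so it cannot see whether a [Subject] already occurred earlier in the clause.

```python
from collections import defaultdict

def train_bigram_mm(sequences):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

# Toy clause-level tag sequences (hypothetical, for illustration only)
sequences = [
    ["Subject", "Verb", "Object"],
    ["Verb", "Subject", "Object"],
    ["Subject", "Verb", "Complement"],
]
model = train_bigram_mm(sequences)

# The distribution after "Verb" ignores everything before "Verb":
# the model still assigns probability to "Subject" even in clauses
# where a Subject already preceded the Verb.
print(model["Verb"])
```

Because the conditioning context is a single state, the transition table out of "Verb" is the same regardless of what came before it, which is exactly the shortcoming the example in the text points at.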
The new project will build upon the previous project by applying more complex
probabilistic models that are able to do justice to the sequential structure of natural
language: Recurrent Neural Networks (RNNs). RNNs can predict the next state based
not only on the current state but also on previous states. Hence, structural
dependencies within a sentence can be modeled better than with MMs, and applying
an RNN to this problem is a natural next step toward better
understanding the linguistic variation. However, since RNNs have a more complex
structure, their results are harder to interpret and to explain. The results of the
previous project can help in understanding the output of the RNN. We expect the
RNN to yield a better model with more precise results than the MMs.
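The key mechanism can be sketched in a few lines. The vocabulary, the hand-picked weights, and the Elman-style update below are illustrative assumptions, not the project's model: the point is only that the hidden state h_t = tanh(W_xh x_t + W_hh h_{t-1}) carries information about earlier tags, so two sequences ending in the same current state can still produce different predictions.

```python
import math

# Toy tag vocabulary (illustrative, not the project's actual tag set)
VOCAB = ["Subject", "Verb", "Object"]

def one_hot(tag):
    return [1.0 if t == tag else 0.0 for t in VOCAB]

def rnn_step(x, h, W_xh, W_hh):
    """Elman-style update: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    return [math.tanh(sum(W_xh[j][i] * x[i] for i in range(len(x)))
                      + sum(W_hh[j][k] * h[k] for k in range(len(h))))
            for j in range(len(h))]

# Fixed toy weights (hand-picked for the demonstration, not trained)
W_xh = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.2]]
W_hh = [[0.3, -0.1], [0.05, 0.25]]

def encode(sequence):
    h = [0.0, 0.0]  # initial hidden state
    for tag in sequence:
        h = rnn_step(one_hot(tag), h, W_xh, W_hh)
    return h

# Same current state ("Verb"), different histories -> different hidden states;
# an MM would treat both contexts identically.
h1 = encode(["Subject", "Verb"])
h2 = encode(["Object", "Verb"])
print(h1 != h2)
```

In a trained RNN these weights would be learned from the tagged corpus, and the hidden state would let the model rule out a second [Subject] after a [Subject] [Verb] prefix, which is precisely the dependency a first-order MM cannot represent.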