Core functions

Tokenization

luvina.backend.tokenize(sentence, lowercase=True)

Just your basic tokenization operation.

luv.tokenize('The cat sat on the mat.')
>>> ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

Get definitions from token using WordNet.

luvina.backend.get_definition(token)

luv.get_definition('car')
>>> 'a motor vehicle with four wheels; usually propelled by an internal combustion engine'

luvina.backend.get_synonyms(token)

Get synonyms of the given token using WordNet.

luv.get_synonyms('wallet')
>>> ['billfold', 'notecase', 'pocketbook']

luvina.backend.get_stop_words(language='english'):

Get a set of stop words (most common words) in the given language.

list(luv.get_stop_words())[:5]
>>> ['doing', 'than', 'up', 'wouldn', 'this']

luvina.backend.calculate_wordnet_similarity(token_1, token_2):

Returns a score based on the shortest path between the senses in the is-a (hypernym/hypnoym) taxonomy.

luv.calculate_wordnet_similarity('dog','cat')
>>> 0.8571428571428571

luvina.backend.lemmatize(token, pos='n'):

Basic lemmatization using a given part-of-speech.

luv.lemmatize('driving', pos='v')
>>> 'drive'

luvina.backend.expand_contractions(sentence):

Expand English contractions found in sentence/token:

luv.expand_contractions("don't eat it")
>>> 'do not eat it'

luvina.backend.tag_pos(tokens):

POS tagging using a pre-trained neural network from NLTK.

luv.tag_pos(luv.tokenize('a person is driving'))
>>> [('a', 'DT'), ('person', 'NN'), ('is', 'VBZ'), ('driving', 'VBG')]

luvina.backend.calculate_levenshtein_distance(token_1, token_2):

Returns number of characters that need to be substituted, inserted or deleted from "token_1" to get "token_2".

luv.calculate_levenshtein_distance('base','vase')
>>> 1

luvina.backend.in_dictionary(token)

Returns True if the given token is found in the enchant dictionary. Returns False otherwise.

luv.in_dictionary('book')
>>> True

luvina.backend.suggest_words(token)

Returns a list of similar tokens.

luv.suggest_words('better')[:5]
>>> ['better', 'netter', 'betters', 'bester', 'setter']

luvina.backend.get_vector(token)

Get glove word vector representation of a token.

luv.get_vector('space')

luvina.backend.get_vectors(tokens)

Returns a list of glove word vector representation of a list of tokens.

luv.get_vectors(['I','was', 'here'])

luvina.backend.calculate_norm(vector)

Calculates the norm of a vector.

luv.calculate_norm(luv.get_vector('mouse'))
>>> 7.2006297

luvina.backend.calculate_cosine_similarity(vector_1, vector_2)

Calculates the cosine similarity between two vectors.

vector_1 = luv.get_vector('cup')
vector_2 = luv.get_vector('plate')
luv.calculate_cosine_similarity(vector_1, vector_2)
>>> 0.4010278

luvina.backend.get_related_words(token, max_num=3)

Use glove embeddings to get closer word vectors.

luv.get_related_words('animal', 3)
>>> ['animals', 'pet', 'dog']