Homework
Your task is to develop a sentiment analyzer trained on the Stanford Sentiment Treebank:
- Create a vector_space_models.py file in the src/homework/ directory.
- Define a function named sentiment_analyzer() that takes two parameters, a list of training documents and a list of test documents for classification, and returns the predicted sentiment labels along with the respective similarity scores.
- Use the k-nearest neighbors algorithm for the classification. Find the optimal value of k using the development set, and then hardcode this value into your function before submission.
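As a rough illustration of the classification step described above, the following is a minimal sketch, not a reference solution. It assumes each document is already tokenized, represents documents as bag-of-words term-frequency vectors, and uses cosine similarity. The names `knn_classify`, `cosine`, and `accuracy`, the `(label, tokens)` document representation, and the choice to report the mean similarity of the neighbors that voted for the winning label are all assumptions made for this example; the assignment does not prescribe any of them.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical representation: a labeled document is a (label, tokens) pair.
LabeledDoc = tuple[int, list[str]]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(train: list[LabeledDoc], test: list[list[str]], k: int) -> list[tuple[int, float]]:
    """Return one (predicted label, similarity score) pair per test document."""
    train_vecs = [(label, Counter(tokens)) for label, tokens in train]
    predictions = []
    for tokens in test:
        query = Counter(tokens)
        # The k training documents most similar to the query.
        neighbors = heapq.nlargest(k, ((cosine(query, vec), label) for label, vec in train_vecs))
        sims = defaultdict(list)
        for sim, label in neighbors:
            sims[label].append(sim)
        # Majority vote; ties are broken by the larger summed similarity (an assumption).
        best = max(sims, key=lambda lab: (len(sims[lab]), sum(sims[lab])))
        predictions.append((best, sum(sims[best]) / len(sims[best])))
    return predictions


def accuracy(train: list[LabeledDoc], dev: list[LabeledDoc], k: int) -> float:
    """Fraction of development documents whose predicted label matches the gold label."""
    preds = knn_classify(train, [tokens for _, tokens in dev], k)
    correct = sum(1 for (gold, _), (pred, _) in zip(dev, preds) if gold == pred)
    return correct / len(dev)
```

Under these assumptions, one way to pick k is to call `accuracy` on the development set for a handful of candidate values and keep the best one. This sketch recomputes every similarity for each candidate k, which is simple but slow on the full training set; caching the ranked neighbors per development document would avoid the repeated work.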
Data
The sentiment_treebank directory contains the following two files:
- sst_trn.tst: a training set consisting of 8,544 labeled documents.
- sst_dev.tst: a development set consisting of 1,101 labeled documents.
Each line is a document, which is formatted as follows:
[Label]\t[Document]
Below are the explanations of what each label signifies:
- 0: Very negative
- 1: Negative
- 2: Neutral
- 3: Positive
- 4: Very positive
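The line format above can be parsed with a few lines of Python. The sketch below is one possible reader, assuming UTF-8 text, an integer label before the first tab, and whitespace tokenization; the function name `read_documents` and the `(label, tokens)` output shape are hypothetical choices for this example.

```python
from typing import Iterable


def read_documents(lines: Iterable[str]) -> list[tuple[int, list[str]]]:
    """Parse lines of the form '[Label]\\t[Document]' into (label, tokens) pairs."""
    documents = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, in case the document text contains tabs.
        label, text = line.split("\t", 1)
        documents.append((int(label), text.split()))
    return documents
```

Because the function accepts any iterable of lines, it can be applied directly to an open file, for example `with open("sentiment_treebank/sst_trn.tst", encoding="utf-8") as fin: trn = read_documents(fin)`.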
Submission
Commit and push the vector_space_models.py file to your GitHub repository.
Extra Credit
Define a function named sentiment_analyzer_extra() that implements an improved sentiment analyzer.
Rubric
- Code Submission (1 point)
- Program Execution (1 point)
- Development Set Accuracy (4 points)
- Evaluation Set Accuracy (4 points)
- Concept Quiz (2 points)