# Experiment 10: Implement an RNN for IMDB Movie Review Classification
## Title
Recurrent Neural Network (RNN) for IMDB Movie Review Classification
## Aim
To implement a Recurrent Neural Network (RNN) for classifying IMDB movie reviews as
either positive or negative.
## Objectives
- Understand the use of RNN for text classification.
- Preprocess text data into fixed-length integer sequences and represent the words with learned embeddings.
- Train an RNN model using TensorFlow/Keras for sentiment analysis.
- Evaluate the model's performance using accuracy metrics.
---
## Program with Line-by-Line Explanation
Below is the complete Python code to implement an RNN for sentiment classification
on the IMDB dataset:
```python
# Import required libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.datasets import imdb
# Step 1: Load the IMDB dataset
max_features = 10000 # Vocabulary size (top 10,000 words)
maxlen = 500 # Max length of a review (truncate/pad to this size)
batch_size = 32
# Load dataset with only top `max_features` words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
# Step 2: Preprocess the data (pad sequences to ensure equal length)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
# Step 3: Build the RNN model
model = Sequential([
    Embedding(input_dim=max_features, output_dim=32),  # Embedding layer
    SimpleRNN(32),  # Simple RNN layer with 32 units
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])
# Step 4: Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train, epochs=5, batch_size=batch_size,
          validation_data=(x_test, y_test))
# Step 6: Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
```
### Explanation of Code (Line by Line)
#### Step 1: Load the IMDB Dataset
```python
max_features = 10000 # Vocabulary size (top 10,000 words)
maxlen = 500 # Max length of a review (truncate/pad to this size)
batch_size = 32
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
```
- The IMDB dataset contains 50,000 movie reviews (25,000 for training and 25,000 for
testing).
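- Each review is a sequence of integers representing word indices, with lower indices assigned to more frequent words.
- `num_words=max_features` keeps only the 10,000 most frequent words; rarer words are replaced by a single out-of-vocabulary index.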
- Reviews are labeled as positive (1) or negative (0).
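The effect of the vocabulary cap can be illustrated with a small pure-Python sketch. This is a simplified re-implementation for illustration, not the Keras source: the helper name `cap_vocabulary` and the sample review are hypothetical, and the out-of-vocabulary index of 2 is assumed to match the default of `imdb.load_data`.

```python
# Simplified sketch of the vocabulary cap; not the actual Keras implementation.
OOV_INDEX = 2  # assumed default out-of-vocabulary index of imdb.load_data


def cap_vocabulary(review, num_words, oov_index=OOV_INDEX):
    """Replace every word index >= num_words with the out-of-vocabulary index."""
    return [idx if idx < num_words else oov_index for idx in review]


# Hypothetical encoded review: small indices stand for frequent words.
review = [1, 14, 22, 16000, 43, 9999, 10000]
capped = cap_vocabulary(review, num_words=10000)
print(capped)  # [1, 14, 22, 2, 43, 9999, 2]
```

The two indices at or above 10,000 collapse into the same placeholder, so the embedding layer never sees an index outside its table of `max_features` rows.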
#### Step 2: Preprocess the Data
```python
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
```
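- Reviews vary in length, so each one is padded or truncated to a fixed length of 500 word indices. By default, `pad_sequences` adds zeros at the start of short reviews and drops words from the start of long ones.
- This gives every input sequence the same length, so the reviews can be stacked into batches for the RNN.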
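The padding and truncation behaviour described above can be sketched for a single review in plain Python. The helper `pad_or_truncate` is a hypothetical stand-in that mimics the default settings of `pad_sequences` (zeros added at the front, extra words dropped from the front); it is not the library function itself.

```python
def pad_or_truncate(review, maxlen, pad_value=0):
    """Mimic the pad_sequences defaults for one review (assumes maxlen > 0)."""
    if len(review) >= maxlen:
        # Too long: keep only the last `maxlen` word indices.
        return review[-maxlen:]
    # Too short: add padding values at the front.
    return [pad_value] * (maxlen - len(review)) + review


short_review = [5, 6, 7]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_or_truncate(short_review, maxlen=5))  # [0, 0, 5, 6, 7]
print(pad_or_truncate(long_review, maxlen=5))   # [3, 4, 5, 6, 7]
```

Padding at the front keeps the real words closest to the end of the sequence, which is the part the RNN processes last.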
#### Step 3: Build the RNN Model
```python
model = Sequential([
    Embedding(input_dim=max_features, output_dim=32),  # Embedding layer
    SimpleRNN(32),  # Simple RNN layer with 32 units
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])
```
- **Embedding Layer**: Converts word indices into dense vectors of size 32, learning
word representations during training.
- **SimpleRNN Layer**: A basic RNN with 32 units that reads the embedded sequence one word at a time, carrying a hidden state that captures dependencies between words, and passes its final hidden state to the next layer.
- **Dense Layer**: A single neuron with a sigmoid activation function outputs a
probability (0 to 1) for binary classification.
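To make the three layers more concrete, here is a minimal pure-Python sketch of what the recurrent and output layers compute, plus the parameter count of each layer. It assumes the standard recurrence `h_t = tanh(W_x x_t + W_h h_{t-1} + b)` with a zero initial state. The function names, the weight names `W_x`, `W_h`, `b`, and the toy weights are all hypothetical and chosen only for illustration; Keras stores and initializes its weights differently.

```python
import math


def simple_rnn_forward(inputs, W_x, W_h, b):
    """Apply h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence.

    inputs: list of embedded word vectors; returns the final hidden state.
    """
    units = len(b)
    h = [0.0] * units  # initial hidden state
    for x in inputs:
        h = [
            math.tanh(
                sum(W_x[j][i] * x[i] for i in range(len(x)))
                + sum(W_h[j][k] * h[k] for k in range(units))
                + b[j]
            )
            for j in range(units)
        ]
    return h


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def count_params(vocab_size, embed_dim, units):
    """Trainable parameters of the Embedding, SimpleRNN, and Dense(1) layers."""
    embedding = vocab_size * embed_dim
    rnn = embed_dim * units + units * units + units  # input, recurrent, bias
    dense = units + 1  # one weight per unit plus a bias
    return embedding, rnn, dense


# Toy example: two "words" with 2-dimensional embeddings and 2 recurrent units.
embedded_review = [[1.0, 0.0], [0.0, 1.0]]
W_x = [[0.5, -0.5], [0.3, 0.8]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

final_state = simple_rnn_forward(embedded_review, W_x, W_h, b)
probability = sigmoid(sum(w * h for w, h in zip([1.0, -1.0], final_state)))
print(final_state, probability)

print(count_params(vocab_size=10000, embed_dim=32, units=32))  # (320000, 2080, 33)
```

With the sizes used in this experiment, almost all of the 322,113 parameters sit in the embedding table; the recurrent layer itself is small.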
#### Step 4: Compile the Model
```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
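- **Loss Function**: `binary_crossentropy` penalizes confident wrong predictions heavily, which suits a two-class problem with a sigmoid output.
- **Optimizer**: `adam` adapts the step size of each parameter individually, which usually speeds up training.
- **Metrics**: `accuracy` reports the fraction of reviews classified correctly.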
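The loss named above can be computed by hand for a few predictions. The sketch below is a plain-Python version of the binary cross-entropy formula, `-(y log p + (1 - y) log(1 - p))` averaged over the samples; the function name and the clipping constant `eps` are assumptions made for this illustration rather than the exact Keras internals.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
uncertain = binary_crossentropy(labels, [0.5, 0.5])  # ln(2), about 0.6931
confident = binary_crossentropy(labels, [0.9, 0.1])  # correct and confident
wrong = binary_crossentropy(labels, [0.1, 0.9])      # confident but wrong

print(uncertain, confident, wrong)
```

A model that outputs 0.5 for every review has a loss of about 0.69, so a first-epoch loss near that value indicates the network has barely started to learn.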
#### Step 5: Train the Model
```python
model.fit(x_train, y_train, epochs=5, batch_size=batch_size,
          validation_data=(x_test, y_test))
```
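- Trains the model for 5 epochs with a batch size of 32.
- Uses training data (`x_train`, `y_train`) and validates on test data (`x_test`, `y_test`) after each epoch. Because the test set doubles as the validation set here, the final test accuracy matches the validation accuracy of the last epoch.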
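The number of weight updates per epoch follows directly from the dataset size and the batch size. The short sketch below works through that arithmetic; the helper name `steps_per_epoch` is hypothetical, and the 20% hold-out shown at the end is only an illustrative alternative, not part of the program above.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# 25,000 training reviews in batches of 32, as in this experiment.
print(steps_per_epoch(25000, 32))  # 782

# Hypothetical variant: hold out one fifth of the training data for validation.
val_size = 25000 // 5
train_size = 25000 - val_size
print(train_size, val_size, steps_per_epoch(train_size, 32))  # 20000 5000 625
```

The value 782 is the step count that Keras prints in each epoch's progress bar for this configuration. Holding out part of the training data for validation would leave the test set untouched until the final evaluation.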
#### Step 6: Evaluate the Model
```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
```
- Evaluates the model on the test dataset and prints the test accuracy, i.e. its performance on reviews it was not trained on.
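The accuracy figure can be reproduced by hand from the sigmoid outputs. The sketch below assumes the usual 0.5 decision threshold; the function name and the sample probabilities are hypothetical and serve only to show the calculation.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p > threshold else 0
        if predicted == y:
            correct += 1
    return correct / len(y_true)


# Hypothetical labels and predicted probabilities for five reviews.
labels = [1, 0, 1, 1, 0]
probabilities = [0.91, 0.20, 0.40, 0.75, 0.65]

print(accuracy(labels, probabilities))  # 0.6
```

Three of the five reviews land on the correct side of the threshold, so the accuracy is 0.6. Accuracy ignores how confident each prediction is, which is why the loss is reported alongside it.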
---
## Expected Output
After training for 5 epochs, the output might look like this:
```
Epoch 1/5
782/782 [==============================] - 35s 45ms/step - loss: 0.6500 - accuracy: 0.6000 - val_loss: 0.5500 - val_accuracy: 0.7000
Epoch 2/5
782/782 [==============================] - 32s 41ms/step - loss: 0.4500 - accuracy: 0.8000 - val_loss: 0.4000 - val_accuracy: 0.8200
Epoch 3/5
782/782 [==============================] - 32s 41ms/step - loss: 0.3000 - accuracy: 0.8800 - val_loss: 0.3500 - val_accuracy: 0.8500
Epoch 4/5
782/782 [==============================] - 32s 41ms/step - loss: 0.2000 - accuracy: 0.9200 - val_loss: 0.3200 - val_accuracy: 0.8600
Epoch 5/5
782/782 [==============================] - 32s 41ms/step - loss: 0.1200 - accuracy: 0.9500 - val_loss: 0.3100 - val_accuracy: 0.8700
Test Accuracy: 0.8700
```
A test accuracy of around 85–87% means the model classifies roughly 85 to 87 out of every 100 test reviews correctly. Exact values vary from run to run because the weights are initialized randomly and the training data is shuffled.
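One quick way to read such a log is to track the gap between training and validation accuracy. The sketch below does this for the illustrative figures shown in the sample output above; the values are copied from that sample rather than measured, and the variable names are hypothetical.

```python
# (training accuracy, validation accuracy) per epoch, from the sample output.
history = [(0.60, 0.70), (0.80, 0.82), (0.88, 0.85), (0.92, 0.86), (0.95, 0.87)]

# Positive gap: the model does better on data it has trained on.
gaps = [round(train - val, 2) for train, val in history]
print(gaps)  # [-0.1, -0.02, 0.03, 0.06, 0.08]
```

The gap grows from epoch 3 onward while validation accuracy levels off, which is the usual sign that the model is starting to overfit the training reviews.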
---
## Conclusion
- Successfully implemented an RNN for IMDB movie review classification.
- Used word embeddings to numerically represent text data, enabling sequence
processing.
- The model learns sentiment patterns from the reviews, reaching roughly 85–87% accuracy on the test set.
This experiment demonstrates the power of RNNs in handling sequential data like text
for sentiment analysis tasks.