Lab Report No. 4
Topic: "Sequence Recognition"
Authors: Team 2, Machulina D.V., Biryukova A.S., group A-02-22
1. Creating the notebook and setting up the environment. Configuring the notebook to use a GPU hardware accelerator
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/is_lab4')
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
2. Loading the IMDb data. Configuring the dataset (seed = 4*2 - 1 = 7)
# load the dataset
from keras.datasets import imdb
vocabulary_size = 5000
index_from = 3
(X_train, y_train), (X_test, y_test) = imdb.load_data(
    path="imdb.npz",
    num_words=vocabulary_size,
    skip_top=0,
    maxlen=None,
    seed=7,
    start_char=1,
    oov_char=2,
    index_from=index_from,
)
print('Shape of X train:', X_train.shape)
print('Shape of y train:', y_train.shape)
print('Shape of X test:', X_test.shape)
print('Shape of y test:', y_test.shape)
Shape of X train: (25000,)
Shape of y train: (25000,)
Shape of X test: (25000,)
Shape of y test: (25000,)
3. Printing a review from the training set as a list of word indices. Converting the index list to text and printing the review as text. Printing the review length, the class label, and the class name.
# build the dictionaries used to convert indices into words
# load the "word: index" dictionary
word_to_id = imdb.get_word_index()
# shift the indices by index_from and add the special tokens
word_to_id = {key: (value + index_from) for key, value in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
word_to_id["<UNUSED>"] = 3
# build the reverse "index: word" dictionary
id_to_word = {value: key for key, value in word_to_id.items()}
# print a review from the training set as a list of word indices
some_number = 192
review_indices = X_train[some_number]
print("Word index list:")
print(review_indices)
# convert the index list to text
review_as_text = ' '.join(id_to_word[idx] for idx in review_indices)
print("\nReview as text:")
print(review_as_text)
Word index list: [1, 225, 164, 433, 74, 2753, 35, 2188, 20, 5, 397, 35, 298, 20, 585, 305, 10, 10, 45, 64, 61, 652, 21, 6, 52, 708, 9, 2, 725, 4, 2, 7, 1451, 1089, 105, 17, 230, 17, 2, 9, 1947, 4, 3238, 11, 135, 422, 26, 6, 2, 1021, 378, 1780, 224, 472, 36, 26, 379, 724, 2607, 387, 178, 1582, 4, 771, 36, 2, 112, 2, 34, 6, 2, 10, 10, 300, 103, 6, 2, 2, 8, 516, 25, 30, 252, 8, 376, 90, 51, 1455, 335, 4010, 33, 54, 25, 2440, 90, 125, 10, 10, 241, 1559, 4, 609, 46, 7, 4, 2, 11, 3826, 2, 5, 11, 1011, 7, 4193, 7, 4684, 2, 3498, 90, 8, 3534, 2, 7, 4779, 10, 10, 342, 92, 1414, 979, 4, 568, 44, 4, 2, 5, 331, 2237, 18, 57, 684, 52, 282, 15, 4, 1788, 71, 2, 34, 90, 10, 10, 470, 137, 269, 8, 1090, 387, 129, 761, 46, 7, 129, 1682, 17, 76, 17, 614, 8, 2, 15, 4, 2, 2, 41, 10, 10, 457, 103, 397, 339, 39, 294, 8, 169, 4, 2, 103, 2, 129, 322, 30, 252, 8, 2222, 98, 245, 17, 515, 17, 614, 38, 25, 70, 393, 90, 31, 23, 31, 57, 213, 11, 112, 2, 208, 10, 10, 150, 474, 115, 535, 15, 101, 415, 62, 30, 2, 8, 231, 6, 171, 2497, 467, 134, 2, 2, 21, 4, 105, 11, 135, 422, 26, 38, 2, 5, 97, 38, 111, 1297, 2497, 15, 45, 2620, 1167, 18, 4, 529, 8, 459, 44, 68, 4369, 237, 36, 26, 1484, 7, 68, 205, 399, 14, 1098, 4, 2, 7, 4, 436, 22, 10, 10, 11, 420, 25, 71, 1535, 4, 2, 161, 570, 19, 2, 2, 105, 237, 36, 533, 26, 1348, 2, 2, 18, 487, 14, 2, 36, 872, 8, 97, 1186, 38, 2, 2270, 15, 32, 281, 7, 635, 271, 46, 4, 2054, 10, 10, 300, 4, 2, 1098, 6, 1002, 1004, 6, 568, 1709, 475, 137, 4, 2311, 9, 2359, 57, 53, 74, 747, 2194, 245, 10, 10, 241, 4, 2, 2, 11, 32, 2580, 7, 2, 4983, 11, 3826, 2, 5, 187, 3400, 7, 84, 246, 57, 31, 85, 74, 4, 1021, 378, 186, 8, 1495, 27, 1032, 2003, 10, 10, 342, 4, 2, 2, 35, 1755, 1166, 7, 567, 15, 62, 28, 556, 101, 406, 112, 10, 10, 470, 4, 836, 139, 69, 57, 1546, 1689, 11, 192, 49, 139, 71, 1504, 1677, 2, 39, 298, 102, 10, 10, 4, 64, 1123, 9, 4, 2, 751, 4, 130, 63, 16, 6, 184, 1770, 136, 237, 12, 16, 2, 725, 4, 322, 45, 99, 78, 4, 1057, 1477, 12, 56, 19, 35, 2, 379, 277, 15, 266, 46, 7, 317, 1845, 10, 10, 371, 4, 2, 496, 4, 231, 7, 135, 422, 144, 30, 3013, 7, 533, 128, 246, 36, 144, 43, 847, 8, 2642, 5, 193, 2, 19, 84, 37, 97, 102, 19, 6, 729, 2, 18, 489, 5, 1663]
Review as text: <START> there's nothing worse than renting an asian movie and getting an american movie experience instead br br it's only my opinion but a good thriller is <UNK> upon the <UNK> of likable intelligent characters as far as <UNK> is concerned the protagonists in say yes are a <UNK> married couple nicely done unfortunately they are stupid beyond belief let us count the ways they <UNK> being <UNK> by a <UNK> br br 1 after a <UNK> <UNK> to kill you be sure to tell him what hotel you're staying at when you drop him off br br 2 beat the hell out of the <UNK> in broad <UNK> and in front of dozens of witnesses <UNK> allowing him to press <UNK> of assault br br 3 don't bother telling the police about the <UNK> and simply assume for no apparently good reason that the cops were <UNK> by him br br 4 while trying to escape let your lady out of your sight as much as possible to <UNK> that the <UNK> <UNK> her br br 5 after getting help from someone to find the <UNK> after <UNK> your wife be sure to send them away as soon as possible so you can face him one on one no point in being <UNK> right br br now i'd never expect that any person would be <UNK> to making a few mistakes under these <UNK> <UNK> but the characters in say yes are so <UNK> and make so many unbelievable mistakes that it's effectively impossible for the viewer to care about their safety since they are victims of their own doing this kills the <UNK> of the entire film br br in case you were wondering the <UNK> didn't stop with <UNK> <UNK> characters since they themselves are surely <UNK> <UNK> for writing this <UNK> they decided to make situations so <UNK> unrealistic that all sense of reality goes out the window br br 1 the <UNK> kills a cop inside a police station while the protagonist is asleep no more than ten feet away br br 2 the <UNK> <UNK> in all sorts of <UNK> activities in broad <UNK> and around tons of people yet no one other than the married couple seems to notice his odd behavior br br 3 the <UNK> <UNK> an absurd amount of violence that would have killed any human being br br 4 the suspense scenes had no imagination whatsoever in fact some scenes were direct rip <UNK> from american movies br br the only positive is the <UNK> near the end which was a pretty brutal scene since it was <UNK> upon the wife it's too bad the filmmakers followed it up with an <UNK> stupid ending that comes out of left field br br truly the <UNK> behind the making of say yes should be ashamed of themselves better yet they should just move to california and take <UNK> with people who make movies with a similar <UNK> for quality and intelligence
# print the class label and the class name
class_label = y_train[some_number]
class_name = "Positive" if class_label == 1 else "Negative"
print(f"\nClass label: {class_label} - {class_name}")
Class label: 0 - Negative
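The index-to-text conversion in this step relies on the `index_from` offset and the four reserved tokens. A minimal sketch on a toy vocabulary follows; `toy_word_index`, `build_id_to_word`, and `decode` are hypothetical names, and the three-word vocabulary is invented for illustration, not taken from the real IMDb index.

```python
INDEX_FROM = 3  # same offset as index_from in the report

# Hypothetical raw vocabulary (word -> frequency rank), NOT the real IMDb index.
toy_word_index = {"the": 1, "movie": 2, "great": 3}


def build_id_to_word(word_index, index_from=INDEX_FROM):
    """Shift every rank by index_from and reserve 0..3 for special tokens."""
    word_to_id = {word: rank + index_from for word, rank in word_index.items()}
    word_to_id.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
    return {idx: word for word, idx in word_to_id.items()}


def decode(indices, id_to_word):
    """Join the words for the given indices; unknown indices become <UNK>."""
    return " ".join(id_to_word.get(idx, "<UNK>") for idx in indices)


id_to_word_toy = build_id_to_word(toy_word_index)
decoded = decode([1, 4, 5, 2, 6], id_to_word_toy)
print(decoded)  # <START> the movie <UNK> great
```

Every encoded review starts with index 1 (`<START>`), and each index 2 in the list corresponds to an `<UNK>` token in the decoded text.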
4. Printing the maximum and minimum review lengths in the training set
# print the review lengths
max_review_length = len(max(X_train, key=len))
print(f"\nMaximum review length: {max_review_length}")
min_review_length = len(min(X_train, key=len))
print(f"\nMinimum review length: {min_review_length}")
review_length = len(review_indices)
print(f"\nReview length: {review_length}")
Maximum review length: 2494
Minimum review length: 11
Review length: 502
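Since review lengths range from 11 to 2494, it can help to check how many reviews a given length limit would cut before fixing that limit. The sketch below does this on invented data; `length_summary` and `toy_reviews` are hypothetical and not part of the lab code.

```python
def length_summary(reviews, max_words):
    """Summarise sequence lengths and the share that exceeds max_words."""
    lengths = [len(review) for review in reviews]
    too_long = sum(length > max_words for length in lengths)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "share_truncated": too_long / len(lengths),
    }


# Four made-up reviews of lengths 3, 5, 8 and 4.
toy_reviews = [[1] * 3, [1] * 5, [1] * 8, [1] * 4]
summary = length_summary(toy_reviews, max_words=5)
print(summary)
```

On the real data the same function could be called as `length_summary(X_train, 500)` before padding, assuming `X_train` still holds the variable-length lists.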
5. Data preprocessing
# pad or truncate every review to max_words indices
from tensorflow.keras.utils import pad_sequences
max_words = 500
X_train = pad_sequences(X_train, maxlen=max_words, value=0, padding='pre', truncating='post')
X_test = pad_sequences(X_test, maxlen=max_words, value=0, padding='pre', truncating='post')
6. Repeating steps 3 and 4. Conclusion on how the review changed after preprocessing
# print the review lengths
max_review_length = len(max(X_train, key=len))
print(f"\nMaximum review length: {max_review_length}")
min_review_length = len(min(X_train, key=len))
print(f"\nMinimum review length: {min_review_length}")
# review_indices still refers to the review as it was before padding,
# so this prints its original length rather than 500
review_length = len(review_indices)
print(f"\nReview length: {review_length}")
Maximum review length: 500
Minimum review length: 500
Review length: 502
# print the same review from the preprocessed training set as a list of word indices
some_number = 192
review_indices = X_train[some_number]
print("Word index list:")
print(review_indices)
# convert the index list to text
review_as_text = ' '.join(id_to_word[idx] for idx in review_indices)
print("\nReview as text:")
print(review_as_text)
Word index list: [ 1 225 164 433 74 2753 35 2188 20 5 397 35 298 20 585 305 10 10 45 64 61 652 21 6 52 708 9 2 725 4 2 7 1451 1089 105 17 230 17 2 9 1947 4 3238 11 135 422 26 6 2 1021 378 1780 224 472 36 26 379 724 2607 387 178 1582 4 771 36 2 112 2 34 6 2 10 10 300 103 6 2 2 8 516 25 30 252 8 376 90 51 1455 335 4010 33 54 25 2440 90 125 10 10 241 1559 4 609 46 7 4 2 11 3826 2 5 11 1011 7 4193 7 4684 2 3498 90 8 3534 2 7 4779 10 10 342 92 1414 979 4 568 44 4 2 5 331 2237 18 57 684 52 282 15 4 1788 71 2 34 90 10 10 470 137 269 8 1090 387 129 761 46 7 129 1682 17 76 17 614 8 2 15 4 2 2 41 10 10 457 103 397 339 39 294 8 169 4 2 103 2 129 322 30 252 8 2222 98 245 17 515 17 614 38 25 70 393 90 31 23 31 57 213 11 112 2 208 10 10 150 474 115 535 15 101 415 62 30 2 8 231 6 171 2497 467 134 2 2 21 4 105 11 135 422 26 38 2 5 97 38 111 1297 2497 15 45 2620 1167 18 4 529 8 459 44 68 4369 237 36 26 1484 7 68 205 399 14 1098 4 2 7 4 436 22 10 10 11 420 25 71 1535 4 2 161 570 19 2 2 105 237 36 533 26 1348 2 2 18 487 14 2 36 872 8 97 1186 38 2 2270 15 32 281 7 635 271 46 4 2054 10 10 300 4 2 1098 6 1002 1004 6 568 1709 475 137 4 2311 9 2359 57 53 74 747 2194 245 10 10 241 4 2 2 11 32 2580 7 2 4983 11 3826 2 5 187 3400 7 84 246 57 31 85 74 4 1021 378 186 8 1495 27 1032 2003 10 10 342 4 2 2 35 1755 1166 7 567 15 62 28 556 101 406 112 10 10 470 4 836 139 69 57 1546 1689 11 192 49 139 71 1504 1677 2 39 298 102 10 10 4 64 1123 9 4 2 751 4 130 63 16 6 184 1770 136 237 12 16 2 725 4 322 45 99 78 4 1057 1477 12 56 19 35 2 379 277 15 266 46 7 317 1845 10 10 371 4 2 496 4 231 7 135 422 144 30 3013 7 533 128 246 36 144 43 847 8 2642 5 193 2 19 84 37 97 102 19 6 729 2 18 489]
Review as text: <START> there's nothing worse than renting an asian movie and getting an american movie experience instead br br it's only my opinion but a good thriller is <UNK> upon the <UNK> of likable intelligent characters as far as <UNK> is concerned the protagonists in say yes are a <UNK> married couple nicely done unfortunately they are stupid beyond belief let us count the ways they <UNK> being <UNK> by a <UNK> br br 1 after a <UNK> <UNK> to kill you be sure to tell him what hotel you're staying at when you drop him off br br 2 beat the hell out of the <UNK> in broad <UNK> and in front of dozens of witnesses <UNK> allowing him to press <UNK> of assault br br 3 don't bother telling the police about the <UNK> and simply assume for no apparently good reason that the cops were <UNK> by him br br 4 while trying to escape let your lady out of your sight as much as possible to <UNK> that the <UNK> <UNK> her br br 5 after getting help from someone to find the <UNK> after <UNK> your wife be sure to send them away as soon as possible so you can face him one on one no point in being <UNK> right br br now i'd never expect that any person would be <UNK> to making a few mistakes under these <UNK> <UNK> but the characters in say yes are so <UNK> and make so many unbelievable mistakes that it's effectively impossible for the viewer to care about their safety since they are victims of their own doing this kills the <UNK> of the entire film br br in case you were wondering the <UNK> didn't stop with <UNK> <UNK> characters since they themselves are surely <UNK> <UNK> for writing this <UNK> they decided to make situations so <UNK> unrealistic that all sense of reality goes out the window br br 1 the <UNK> kills a cop inside a police station while the protagonist is asleep no more than ten feet away br br 2 the <UNK> <UNK> in all sorts of <UNK> activities in broad <UNK> and around tons of people yet no one other than the married couple seems to notice his odd behavior br br 3 the <UNK> <UNK> an absurd amount of violence that would have killed any human being br br 4 the suspense scenes had no imagination whatsoever in fact some scenes were direct rip <UNK> from american movies br br the only positive is the <UNK> near the end which was a pretty brutal scene since it was <UNK> upon the wife it's too bad the filmmakers followed it up with an <UNK> stupid ending that comes out of left field br br truly the <UNK> behind the making of say yes should be ashamed of themselves better yet they should just move to california and take <UNK> with people who make movies with a similar <UNK> for quality
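The review originally contained 502 indices. Because truncating='post' cuts from the end, the last two words ("and intelligence") were dropped so that the review length equals the specified value (500). No padding was added, since the review was already longer than 500.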
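The combination `padding='pre'`, `truncating='post'` can be mimicked in a few lines of plain Python. This is an illustrative sketch of the behaviour, not the Keras implementation; `pad_or_truncate` and the sample sequences are hypothetical.

```python
def pad_or_truncate(seq, maxlen, value=0, padding="pre", truncating="post"):
    """Return a list of exactly maxlen items, mimicking the options used above."""
    if len(seq) > maxlen:
        # 'post' drops items from the end, 'pre' drops them from the start
        seq = seq[:maxlen] if truncating == "post" else seq[-maxlen:]
    pad = [value] * (maxlen - len(seq))
    # 'pre' puts the padding in front of the sequence, 'post' after it
    return pad + list(seq) if padding == "pre" else list(seq) + pad


long_review = list(range(1, 8))  # 7 made-up indices: 1..7
short_review = [11, 12, 13]

truncated = pad_or_truncate(long_review, maxlen=5)
padded = pad_or_truncate(short_review, maxlen=5)
print(truncated)  # [1, 2, 3, 4, 5]
print(padded)     # [0, 0, 11, 12, 13]
```

A long review loses its tail, while a short one receives leading zeros, which matches the rows of zeros at the start of the preprocessed arrays.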
7. Printing the preprocessed training and test arrays and their shapes
# print the data
print('X train: \n', X_train)
print('X test: \n', X_test)
# print the shapes
print('Shape of X train:', X_train.shape)
print('Shape of X test:', X_test.shape)
X train:
[[ 0 0 0 ... 104 545 7]
[ 0 0 0 ... 2 262 372]
[ 0 0 0 ... 758 10 10]
...
[ 0 0 0 ... 2 27 375]
[ 0 0 0 ... 11 111 531]
[ 0 0 0 ... 152 1833 12]]
X test:
[[ 0 0 0 ... 2 126 3849]
[ 0 0 0 ... 25 1833 12]
[ 0 0 0 ... 129 249 4262]
...
[ 0 0 0 ... 2 24 1178]
[ 0 0 0 ... 61 278 145]
[ 0 0 0 ... 12 5 358]]
Shape of X train: (25000, 500)
Shape of X test: (25000, 500)
8. Implementing a recurrent neural network consisting of Embedding, LSTM, Dropout, and Dense layers, and training it. Printing the network architecture.
model = Sequential()
model.add(layers.Input(shape=(max_words,)))  # declare the input shape explicitly instead of input_length/input_shape
model.add(layers.Embedding(input_dim=vocabulary_size, output_dim=32))
model.add(layers.LSTM(76))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
| Layer (type) | Output Shape | Param # |
|---|---|---|
| embedding_1 (Embedding) | (None, 500, 32) | 160,000 |
| lstm_1 (LSTM) | (None, 76) | 33,136 |
| dropout_1 (Dropout) | (None, 76) | 0 |
| dense_1 (Dense) | (None, 1) | 77 |
Total params: 193,213 (754.74 KB)
Trainable params: 193,213 (754.74 KB)
Non-trainable params: 0 (0.00 B)
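The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for Embedding, LSTM (four gates, each with input weights, recurrent weights, and a bias), and Dense layers; the helper function names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    """A weight per input plus one bias, for every output unit."""
    return (input_dim + 1) * units


emb = embedding_params(5000, 32)
lstm = lstm_params(32, 76)
dense = dense_params(76, 1)
total = emb + lstm + dense
print(emb, lstm, dense, total)  # 160000 33136 77 193213
```

The Dropout layer has no weights, so the three values sum to the 193,213 trainable parameters reported above.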
batch_size = 64
epochs = 5
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"\nTest accuracy: {test_acc}")
Test accuracy: 0.8670799732208252
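For reference, the two quantities reported by `model.evaluate` can be reproduced on toy data. This is a simplified sketch of binary cross-entropy and thresholded accuracy, not the Keras implementation; the function names, the `eps` clipping constant, and the sample values are assumptions made for illustration.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under y_prob."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability equals the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.6]  # made-up sigmoid outputs

loss = binary_crossentropy(toy_labels, toy_probs)
acc = accuracy(toy_labels, toy_probs)
print(round(loss, 3), acc)
```

Confident correct predictions contribute little to the loss, while the two misclassified samples dominate it, which is why loss and accuracy are reported together.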
9. Evaluating the quality of the trained model on the test data
y_score = model.predict(X_test)
y_pred = [1 if y_score[i,0]>=0.5 else 0 for i in range(len(y_score))]
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, labels = [0, 1], target_names=['Negative', 'Positive']))
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative | 0.86 | 0.88 | 0.87 | 12500 |
| Positive | 0.88 | 0.86 | 0.87 | 12500 |
| accuracy | | | 0.87 | 25000 |
| macro avg | 0.87 | 0.87 | 0.87 | 25000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 25000 |
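The per-class columns of the report follow directly from counts of true positives, false positives, and false negatives. A minimal pure-Python sketch on invented labels is given below; `binary_report` and the toy data are hypothetical and only illustrate the definitions.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]

precision, recall, f1 = binary_report(toy_true, toy_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Calling the function with `positive=0` gives the row for the other class, which is how the Negative and Positive rows of the table relate to each other.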
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
fpr, tpr, thresholds = roc_curve(y_test, y_score)
plt.plot(fpr, tpr)
plt.grid()
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.show()
print('Area under ROC is', auc(fpr, tpr))
Area under ROC is 0.936850272
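The area under the ROC curve can also be read as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison on toy data; it is an illustrative alternative to `roc_curve` plus `auc`, not what the report's code runs, and `roc_auc_pairwise` is a hypothetical name.

```python
def roc_auc_pairwise(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the number of samples,
    so it is only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_true = [0, 0, 1, 1]
toy_score = [0.1, 0.4, 0.35, 0.8]  # made-up model scores

auc_value = roc_auc_pairwise(toy_true, toy_score)
print(auc_value)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 0.937 obtained above indicates that most positive reviews are scored above most negative ones.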
Conclusion
| Model | Number of trainable parameters | Number of training epochs | Classification quality on the test set |
|---|---|---|---|
| Recurrent | 193,213 | 5 | accuracy: 0.867, loss: 0.331, ROC AUC: 0.937 |
In this lab we studied the application of a recurrent neural network. The results in the table show that the model handled the sentiment classification task well: accuracy = 0.867, loss = 0.331. The accuracy exceeds the required threshold of 0.8. The area under the ROC curve exceeds 0.9, which indicates that the model separates the two classes, negative and positive reviews, well.
