# Lab Report No. 4

**Ильинцева Л.В., Коновалова А.А., group А-01-22**

---

## Task 1

### 1. We created a new notebook in Google Colab, imported the libraries and modules needed for the work, and configured the notebook to use a GPU hardware accelerator.

```python
# import modules
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/is_lab4')

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import numpy as np
```

```python
import tensorflow as tf

# stop early if no GPU is attached to the runtime
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
```

```
Found GPU at: /device:GPU:0
```

### 2. We loaded the IMDb dataset of digitized movie reviews labeled with two classes, positive and negative. The seed parameter was set to 4k - 1 = 31, where k = 8 is the team number. We printed the shapes of the resulting training and test arrays.

```python
# load the dataset
from keras.datasets import imdb

vocabulary_size = 5000
index_from = 3

(X_train, y_train), (X_test, y_test) = imdb.load_data(
    path="imdb.npz",
    num_words=vocabulary_size,
    skip_top=0,
    maxlen=None,
    seed=31,
    start_char=1,
    oov_char=2,
    index_from=index_from
)

# print the array shapes
print('Shape of X train:', X_train.shape)
print('Shape of y train:', y_train.shape)
print('Shape of X test:', X_test.shape)
print('Shape of y test:', y_test.shape)
```

```
Shape of X train: (25000,)
Shape of y train: (25000,)
Shape of X test: (25000,)
Shape of y test: (25000,)
```

### 3. We printed one review from the training set as a list of word indices, converted the list of indices to text, and printed the review as text. We also printed the review length, its class label, and the class name (1 = Positive, 0 = Negative).
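The index-to-text conversion described in this step depends on how the loader parameters from step 2 (`index_from`, `start_char`, `oov_char`, `num_words`) shape the indices. The following is a minimal sketch of that behaviour on a toy vocabulary, not the loader's actual implementation: the word ranks, the `NUM_WORDS` limit, and the helper names `encode` and `decode` are hypothetical, and the service-token names are the ones conventionally used with this dataset.

```python
# Simplified model of the IMDb index scheme (toy data, not the real word index).
INDEX_FROM = 3   # offset added to every word rank, as index_from in the notebook
START_CHAR = 1   # marks the beginning of a review
OOV_CHAR = 2     # replaces words outside the vocabulary limit
NUM_WORDS = 8    # hypothetical vocabulary limit (the notebook uses 5000)

# Hypothetical frequency ranks; the real ones come from imdb.get_word_index().
toy_ranks = {"the": 1, "film": 2, "is": 3, "good": 4, "mesmerizing": 5}

# Shift ranks by the offset and reserve the low indices for service tokens.
word_to_id = {word: rank + INDEX_FROM for word, rank in toy_ranks.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
word_to_id["<UNUSED>"] = 3
id_to_word = {idx: word for word, idx in word_to_id.items()}


def encode(words):
    """Map words to indices, replacing out-of-vocabulary ones with OOV_CHAR."""
    shifted = [toy_ranks[w] + INDEX_FROM for w in words]
    return [START_CHAR] + [i if i < NUM_WORDS else OOV_CHAR for i in shifted]


def decode(indices):
    """Map indices back to a space-separated string."""
    return " ".join(id_to_word[i] for i in indices)


encoded = encode(["the", "film", "is", "mesmerizing"])
decoded = decode(encoded)
print(encoded)
print(decoded)
```

In this toy setup the rare word falls outside the vocabulary limit, so it is stored as index 2 and cannot be recovered when decoding. The same effect explains the index 2 entries in the real review printed by the notebook cells that follow.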
```python
# build a dictionary for translating indices back to words
# load the "word: index" dictionary
word_to_id = imdb.get_word_index()
# shift the indices by index_from and add the service tokens
word_to_id = {key: (value + index_from) for key, value in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
word_to_id["<UNUSED>"] = 3
# build the inverse "index: word" dictionary
id_to_word = {value: key for key, value in word_to_id.items()}
```

```python
print(X_train[26])
print('len:', len(X_train[26]))
```

```
[1, 13, 805, 8, 40, 14, 1179, 40, 13, 353, 8, 358, 32, 1179, 108, 13, 384, 3091, 2, 1849, 19, 6, 117, 1006, 5, 49, 836, 89, 70, 25, 140, 355, 21, 2, 13, 104, 9, 35, 463, 7, 15, 2063, 170, 355, 4, 293, 1834, 9, 4, 527, 116, 7, 4, 293, 289, 539, 15, 2, 56, 11, 4, 313, 12, 16, 17, 48, 36, 71, 467, 2, 5, 12, 2230, 72, 39, 126, 397, 928, 11, 68, 4598, 4, 22, 2, 18, 836, 5, 2, 21, 4, 34, 4, 1396, 458, 2, 12, 7, 148, 5, 889, 4, 20, 184, 753, 45, 6, 902, 88, 48, 4, 20, 16, 128, 2142, 12, 62, 28, 28, 77, 2, 4, 65, 5, 105, 26, 184, 948, 5, 50, 26, 49, 465, 5, 2, 1984, 388, 7, 4347, 200, 4, 452, 4, 539, 5, 4, 577, 11, 4, 154, 313, 225, 49, 52, 1006, 5, 2552, 2, 2, 43, 24, 195, 8, 202, 4, 22, 4, 1968, 12, 887, 4, 1962, 9, 184, 2509, 5, 2, 5, 127, 202, 4, 22, 6, 194, 2, 21, 1038, 94, 99, 117, 99, 522, 38, 11, 61, 652, 31, 8, 798, 894, 25, 66, 119, 3720, 1179, 108, 225, 6, 1257, 1166, 7, 986, 21, 4, 22, 1545, 99, 117, 8, 30, 2640]
len: 220
```

```python
review_as_text = ' '.join(id_to_word[idx] for idx in X_train[26])
print(review_as_text)
print('len:', len(review_as_text))
print('Label:', y_train[26], '(', 'Positive' if y_train[26] == 1 else 'Negative', ')')
```

```
<START> i tried to like this slasher like i try to enjoy all slasher films i mean mindless <UNK> mixed with a little nudity and some suspense how can you go wrong but <UNK> i think is an example of that formula going wrong the main issue is the horrible acting of the main three girls that <UNK> up in the house it was as if they were under <UNK> and it stopped me from ever getting
interested in their plight the film <UNK> for suspense and <UNK> but the by the numbers direction <UNK> it of those and leaves the movie pretty dull it's a shame because if the movie was better executed it would have have been <UNK> the story and characters are pretty creepy and there are some dark and <UNK> humorous moments of interaction between the mother the girls and the daughter in the old house there's some good nudity and occasional <UNK> <UNK> just not enough to give the film the kick it needed the finale is pretty twisted and <UNK> and does give the film a big <UNK> but sadly its too little too late so in my opinion one to avoid unless you really love obscure slasher films there's a fair amount of potential but the film delivers too little to be worthwhile
len: 1159
Label: 0 ( Negative )
```

### 4. We printed the maximum and minimum review length in the training set.

```python
print('MAX Len: ', len(max(X_train, key=len)))
print('MIN Len: ', len(min(X_train, key=len)))
```

```
MAX Len: 2494
MIN Len: 11
```

### 5. We preprocessed the data. We chose a single length to which all reviews are brought: short reviews were padded with a special symbol (index 0) and long ones were truncated to the chosen length.

```python
# data preprocessing
from tensorflow.keras.utils import pad_sequences

max_words = 500
# pad at the front with zeros, cut the tail of reviews longer than max_words
X_train = pad_sequences(X_train, maxlen=max_words, value=0, padding='pre', truncating='post')
X_test = pad_sequences(X_test, maxlen=max_words, value=0, padding='pre', truncating='post')
```

### 6. We repeated step 4.

```python
print('MAX Len: ', len(max(X_train, key=len)))
print('MIN Len: ', len(min(X_train, key=len)))
```

```
MAX Len: 500
MIN Len: 500
```

### 7. We repeated step 3 and drew a conclusion about how the review changed after preprocessing.
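To make the effect of the step 5 settings (`padding='pre'`, `truncating='post'`, `value=0`) easier to see, here is a minimal plain-Python sketch of the same transformation on toy sequences. The helper name `pad_pre_truncate_post` and the sample lists are hypothetical; the sketch only mimics the documented behaviour for these two options and is not the Keras implementation.

```python
def pad_pre_truncate_post(seq, maxlen, value=0):
    """Bring a sequence to exactly `maxlen` items.

    Longer sequences lose their tail (truncating='post');
    shorter ones get `value` added in front (padding='pre').
    """
    kept = list(seq[:maxlen])                      # drop everything past maxlen
    return [value] * (maxlen - len(kept)) + kept   # left-pad up to maxlen


short_review = [1, 13, 805]                 # shorter than the target length
long_review = [1, 4, 78, 46, 304, 39, 2]    # longer than the target length

padded_short = pad_pre_truncate_post(short_review, maxlen=5)
padded_long = pad_pre_truncate_post(long_review, maxlen=5)

print(padded_short)  # [0, 0, 1, 13, 805]
print(padded_long)   # [1, 4, 78, 46, 304]
```

With these settings a short review keeps all of its tokens at the end of the row, preceded by zeros, while a long review keeps only its beginning.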
```python
print(X_train[26])
print('len:', len(X_train[26]))
```

```
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 4 78 46 304 39 2 7 968 2 295 209 101 147 65 10 10 2643 2 497 8 30 6 147
 284 5 996 174 10 10 11 4 130 4 2 4979 11 2 10 10 2]
len: 500
```

```python
review_as_text = ' '.join(id_to_word[idx] for idx in X_train[26])
print(review_as_text)
print('len:', len(review_as_text))
```

```