{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "gz18QPRz03Ec" }, "source": [ "### 1) Created a new notebook in Google Colab. Imported the libraries and modules needed for the work. Configured the notebook to use a GPU hardware accelerator." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "mr9IszuQ1ANG" }, "outputs": [], "source": [ "# import modules\n", "import os\n", "\n", "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "from tensorflow.keras.models import Sequential\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n" ] }, { "cell_type": "markdown", "metadata": { "id": "FFRtE0TN1AiA" }, "source": [ "### 2) Loaded the IMDb dataset, which contains movie reviews encoded as word indices and labeled with two classes: positive and negative. When loading the dataset, set the seed parameter to (4k – 1) = 31, where k = 8 is the team number. Printed the shapes of the resulting training and test arrays." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "Ixw5Sp0_1A-w" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m17464789/17464789\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m7s\u001b[0m 0us/step\n", "Shape of X train: (25000,)\n", "Shape of y train: (25000,)\n", "Shape of X test: (25000,)\n", "Shape of y test: (25000,)\n" ] } ], "source": [ "# load the dataset\n", "from keras.datasets import imdb\n", "\n", "vocabulary_size = 5000\n", "index_from = 3\n", "\n", "(X_train, y_train), (X_test, y_test) = imdb.load_data(\n", "    path=\"imdb.npz\",\n", "    num_words=vocabulary_size,\n", "    skip_top=0,\n", "    maxlen=None,\n", "    seed=31,\n", "    start_char=1,\n", "    oov_char=2,\n", "    index_from=index_from\n", ")\n", "\n", "# print the array shapes\n", "print('Shape of X 
train:', X_train.shape)\n", "print('Shape of y train:', y_train.shape)\n", "print('Shape of X test:', X_test.shape)\n", "print('Shape of y test:', y_test.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "aCo_lUXl1BPV" }, "source": [ "### 3) Printed one review from the training set as a list of word indices. Converted the list of indices to text and printed the review as text. Printed the length of the review. Printed the review's class label and class name (1 – Positive, 0 – Negative)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "9W3RklPcZyH0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json\n", "\u001b[1m1641221/1641221\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2us/step\n" ] } ], "source": [ "# build a dictionary for converting indices to words\n", "# load the \"word: index\" dictionary\n", "word_to_id = imdb.get_word_index()\n", "# shift every index by index_from and reserve 0-3 for the special tokens\n", "word_to_id = {key:(value + index_from) for key,value in word_to_id.items()}\n", "word_to_id[\"<PAD>\"] = 0\n", "word_to_id[\"<START>\"] = 1\n", "word_to_id[\"<UNK>\"] = 2\n", "word_to_id[\"<UNUSED>\"] = 3\n", "# build the inverse \"index: word\" dictionary\n", "id_to_word = {value:key for key,value in word_to_id.items()}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "Nu-Bs1jnaYhB" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 13, 805, 8, 40, 14, 1179, 40, 13, 353, 8, 358, 32, 1179, 108, 13, 384, 3091, 2, 1849, 19, 6, 117, 1006, 5, 49, 836, 89, 70, 25, 140, 355, 21, 2, 13, 104, 9, 35, 463, 7, 15, 2063, 170, 355, 4, 293, 1834, 9, 4, 527, 116, 7, 4, 293, 289, 539, 15, 2, 56, 11, 4, 313, 12, 16, 17, 48, 36, 71, 467, 2, 5, 12, 2230, 72, 39, 126, 397, 928, 11, 68, 4598, 4, 22, 2, 18, 836, 5, 2, 21, 4, 34, 4, 1396, 458, 2, 12, 7, 148, 5, 889, 4, 20, 184, 753, 45, 6, 902, 88, 48, 4, 20, 16, 128, 
2142, 12, 62, 28, 28, 77, 2, 4, 65, 5, 105, 26, 184, 948, 5, 50, 26, 49, 465, 5, 2, 1984, 388, 7, 4347, 200, 4, 452, 4, 539, 5, 4, 577, 11, 4, 154, 313, 225, 49, 52, 1006, 5, 2552, 2, 2, 43, 24, 195, 8, 202, 4, 22, 4, 1968, 12, 887, 4, 1962, 9, 184, 2509, 5, 2, 5, 127, 202, 4, 22, 6, 194, 2, 21, 1038, 94, 99, 117, 99, 522, 38, 11, 61, 652, 31, 8, 798, 894, 25, 66, 119, 3720, 1179, 108, 225, 6, 1257, 1166, 7, 986, 21, 4, 22, 1545, 99, 117, 8, 30, 2640]\n", "len: 220\n" ] } ], "source": [ "print(X_train[26])\n", "print('len:',len(X_train[26]))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "JhTwTurtZ6Sp" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " i tried to like this slasher like i try to enjoy all slasher films i mean mindless mixed with a little nudity and some suspense how can you go wrong but i think is an example of that formula going wrong the main issue is the horrible acting of the main three girls that up in the house it was as if they were under and it stopped me from ever getting interested in their plight the film for suspense and but the by the numbers direction it of those and leaves the movie pretty dull it's a shame because if the movie was better executed it would have have been the story and characters are pretty creepy and there are some dark and humorous moments of interaction between the mother the girls and the daughter in the old house there's some good nudity and occasional just not enough to give the film the kick it needed the finale is pretty twisted and and does give the film a big but sadly its too little too late so in my opinion one to avoid unless you really love obscure slasher films there's a fair amount of potential but the film delivers too little to be worthwhile\n", "len: 1159\n", "Label: 0 ( Negative )\n" ] } ], "source": [ "review_as_text = ' '.join(id_to_word[id] for id in X_train[26])\n", "print(review_as_text)\n", "print('len:',len(review_as_text))\n", "print('Label:', 
y_train[26], '(', 'Positive' if y_train[26] == 1 else 'Negative', ')')" ] }, { "cell_type": "markdown", "metadata": { "id": "4hclnNaD1BuB" }, "source": [ "### 4) Printed the maximum and minimum review length in the training set." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "xJH87ISq1B9h" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAX Len: 2494\n", "MIN Len: 11\n" ] } ], "source": [ "print('MAX Len: ',len(max(X_train, key=len)))\n", "print('MIN Len: ',len(min(X_train, key=len)))" ] }, { "cell_type": "markdown", "metadata": { "id": "7x99O8ig1CLh" }, "source": [ "### 5) Preprocessed the data. Chose a single length (500) to which all reviews are brought. Short reviews were padded at the start with the padding index 0, and long reviews were truncated at the end to the chosen length." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "lrF-B2aScR4t" }, "outputs": [], "source": [ "# preprocess the data: pad at the start with 0, truncate the tail to max_words\n", "from tensorflow.keras.utils import pad_sequences\n", "max_words = 500\n", "X_train = pad_sequences(X_train, maxlen=max_words, value=0, padding='pre', truncating='post')\n", "X_test = pad_sequences(X_test, maxlen=max_words, value=0, padding='pre', truncating='post')" ] }, { "cell_type": "markdown", "metadata": { "id": "HL2_LVga1C3l" }, "source": [ "### 6) Repeated step 4." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "81Cgq8dn9uL6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAX Len: 500\n", "MIN Len: 500\n" ] } ], "source": [ "print('MAX Len: ',len(max(X_train, key=len)))\n", "print('MIN Len: ',len(min(X_train, key=len)))" ] }, { "cell_type": "markdown", "metadata": { "id": "KzrVY1SR1DZh" }, "source": [ "### 7) Repeated step 3. Drew a conclusion about how the review changed after preprocessing."
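, "\n", "\n", "The effect of this preprocessing on a single review can be sketched in plain Python, without Keras. The helper `pad_pre_truncate_post` below is hypothetical (it is not part of Keras); under that assumption it only mimics what `pad_sequences(..., padding='pre', truncating='post')` does to one sequence, and the inputs are made-up toy lists:\n", "
```python
def pad_pre_truncate_post(seq, maxlen, value=0):
    # Hypothetical helper that mimics padding='pre', truncating='post'
    # for one sequence; it is not the Keras implementation.
    kept = list(seq)[:maxlen]             # truncating='post': drop the tail
    pad = [value] * (maxlen - len(kept))  # padding='pre': fill at the start
    return pad + kept


short_review = [1, 13, 805]                   # toy input, shorter than maxlen
long_review = [1, 14, 22, 16, 43, 530, 973]   # toy input, longer than maxlen

padded_short = pad_pre_truncate_post(short_review, maxlen=5)
padded_long = pad_pre_truncate_post(long_review, maxlen=5)

print(padded_short)  # [0, 0, 1, 13, 805]
print(padded_long)   # [1, 14, 22, 16, 43]
```
", "\n", "For review 26, which has 220 indices, the same rule gives 500 - 220 = 280 leading zeros followed by the original indices."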
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "vudlgqoCbjU1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 1 13 805 8 40 14 1179 40 13 353 8 358 32 1179\n", " 108 13 384 3091 2 1849 19 6 117 1006 5 49 836 89\n", " 70 25 140 355 21 2 13 104 9 35 463 7 15 2063\n", " 170 355 4 293 1834 9 4 527 116 7 4 293 289 539\n", " 15 2 56 11 4 313 12 16 17 48 36 71 467 2\n", " 5 12 2230 72 39 126 397 928 11 68 4598 4 22 2\n", " 18 836 5 2 21 4 34 4 1396 458 2 12 7 148\n", " 5 889 4 20 184 753 45 6 902 88 48 4 20 16\n", " 128 2142 12 62 28 28 77 2 4 65 5 105 26 184\n", " 948 5 50 26 49 465 5 2 1984 388 7 4347 200 4\n", " 452 4 539 5 4 577 11 4 154 313 225 49 52 1006\n", " 5 2552 2 2 43 24 195 8 202 4 22 4 1968 12\n", " 887 4 1962 9 184 2509 5 2 5 127 202 4 22 6\n", " 194 2 21 1038 94 99 117 99 522 38 11 61 652 31\n", " 8 798 894 25 66 119 3720 1179 108 225 6 1257 1166 7\n", " 986 21 4 22 1545 99 117 8 30 2640]\n", "len: 500\n" ] } ], "source": [ "print(X_train[26])\n", "print('len:',len(X_train[26]))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "dbfkWjDI1Dp7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " i tried to like this slasher like i try to enjoy all slasher films i mean mindless mixed with a little 
nudity and some suspense how can you go wrong but i think is an example of that formula going wrong the main issue is the horrible acting of the main three girls that up in the house it was as if they were under and it stopped me from ever getting interested in their plight the film for suspense and but the by the numbers direction it of those and leaves the movie pretty dull it's a shame because if the movie was better executed it would have have been the story and characters are pretty creepy and there are some dark and humorous moments of interaction between the mother the girls and the daughter in the old house there's some good nudity and occasional just not enough to give the film the kick it needed the finale is pretty twisted and and does give the film a big but sadly its too little too late so in my opinion one to avoid unless you really love obscure slasher films there's a fair amount of potential but the film delivers too little to be worthwhile\n", "len: 2839\n" ] } ], "source": [ "review_as_text = ' '.join(id_to_word[id] for id in X_train[26])\n", "print(review_as_text)\n", "print('len:',len(review_as_text))" ] }, { "cell_type": "markdown", "metadata": { "id": "mJNRXo5TdPAE" }, "source": [ "#### After preprocessing, the required number of `<PAD>` tokens (280 for this review of 220 indices) was added at the start of the review so that it is exactly 500 indices long." ] }, { "cell_type": "markdown", "metadata": { "id": "YgiVGr5_1D3u" }, "source": [ "### 8) Printed the preprocessed training and test arrays and their shapes." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "7MqcG_wl1EHI" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X train: \n", " [[ 0 0 0 ... 2 4050 2]\n", " [ 0 0 0 ... 721 90 180]\n", " [ 0 0 0 ... 1114 2 174]\n", " ...\n", " [ 1 1065 2022 ... 7 1514 2]\n", " [ 0 0 0 ... 6 879 132]\n", " [ 0 0 0 ... 12 152 157]]\n", "X train: \n", " [[ 0 0 0 ... 10 342 158]\n", " [ 0 0 0 ... 2 67 12]\n", " [ 0 0 0 ... 
1242 1095 1095]\n", " ...\n", " [ 0 0 0 ... 4 2 136]\n", " [ 0 0 0 ... 14 31 591]\n", " [ 0 0 0 ... 7 3923 212]]\n", "Shape of X train: (25000, 500)\n", "Shape of X test: (25000, 500)\n" ] } ], "source": [ "# print the data\n", "print('X train: \\n',X_train)\n", "print('X test: \\n',X_test)\n", "\n", "# print the array shapes\n", "print('Shape of X train:', X_train.shape)\n", "print('Shape of X test:', X_test.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "amaspXGW1EVy" }, "source": [ "### 9) Implemented a recurrent neural network model consisting of Embedding, LSTM, Dropout, and Dense layers, and trained it on the training data, holding out part of the training data for validation. Printed information about the network architecture. Achieved an accuracy of at least 0.8." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "ktWEeqWd1EyF" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\Admin\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\keras\\src\\layers\\core\\embedding.py:97: UserWarning: Argument `input_length` is deprecated. Just remove it.\n", " warnings.warn(\n", "c:\\Users\\Admin\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\keras\\src\\layers\\core\\embedding.py:100: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.\n", " super().__init__(**kwargs)\n" ] }, { "data": { "text/html": [ "
Model: \"sequential\"\n",
              "
\n" ], "text/plain": [ "\u001b[1mModel: \"sequential\"\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
              "┃ Layer (type)                     Output Shape                  Param # ┃\n",
              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
              "│ embedding (Embedding)           │ (None, 500, 32)        │       160,000 │\n",
              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
              "│ lstm (LSTM)                     │ (None, 64)             │        24,832 │\n",
              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
              "│ dropout (Dropout)               │ (None, 64)             │             0 │\n",
              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
              "│ dense (Dense)                   │ (None, 1)              │            65 │\n",
              "└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
              "
\n" ], "text/plain": [ "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", "│ embedding (\u001b[38;5;33mEmbedding\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m500\u001b[0m, \u001b[38;5;34m32\u001b[0m) │ \u001b[38;5;34m160,000\u001b[0m │\n", "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", "│ lstm (\u001b[38;5;33mLSTM\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m24,832\u001b[0m │\n", "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m1\u001b[0m) │ \u001b[38;5;34m65\u001b[0m │\n", "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Total params: 184,897 (722.25 KB)\n",
              "
\n" ], "text/plain": [ "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m184,897\u001b[0m (722.25 KB)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Trainable params: 184,897 (722.25 KB)\n",
              "
\n" ], "text/plain": [ "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m184,897\u001b[0m (722.25 KB)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Non-trainable params: 0 (0.00 B)\n",
              "
\n" ], "text/plain": [ "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "embed_dim = 32\n", "lstm_units = 64\n", "\n", "model = Sequential()\n", "# declare the input shape with an Input object instead of the deprecated\n", "# input_length / input_shape arguments of Embedding\n", "model.add(keras.Input(shape=(max_words,)))\n", "model.add(layers.Embedding(input_dim=vocabulary_size, output_dim=embed_dim))\n", "model.add(layers.LSTM(lstm_units))\n", "model.add(layers.Dropout(0.5))\n", "model.add(layers.Dense(1, activation='sigmoid'))\n", "\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "CuPqKpX0kQfP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m61s\u001b[0m 184ms/step - accuracy: 0.8464 - loss: 0.3649 - val_accuracy: 0.8366 - val_loss: 0.3726\n", "Epoch 2/5\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m58s\u001b[0m 184ms/step - accuracy: 0.8838 - loss: 0.2931 - val_accuracy: 0.8692 - val_loss: 0.3221\n", "Epoch 3/5\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m59s\u001b[0m 188ms/step - accuracy: 0.9015 - loss: 0.2519 - val_accuracy: 0.8652 - val_loss: 0.3294\n", "Epoch 4/5\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m58s\u001b[0m 185ms/step - accuracy: 0.9151 - loss: 0.2225 - val_accuracy: 0.8636 - val_loss: 0.3255\n", "Epoch 5/5\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m82s\u001b[0m 184ms/step - accuracy: 0.9162 - loss: 0.2174 - val_accuracy: 0.8660 - val_loss: 0.3360\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compile and train the model\n", "batch_size = 64\n", "epochs = 5\n", 
"model.compile(loss=\"binary_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n", "model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "hJIWinxymQjb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m38s\u001b[0m 49ms/step - accuracy: 0.8659 - loss: 0.3349\n", "\n", "Test accuracy: 0.865880012512207\n" ] } ], "source": [ "test_loss, test_acc = model.evaluate(X_test, y_test)\n", "print(f\"\\nTest accuracy: {test_acc}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "mgrihPd61E8w" }, "source": [ "### 10) Evaluated the trained model on the test data:\n", "### - printed the classification quality metric (accuracy) on the test data\n", "### - printed the classification report for the test set\n", "### - plotted the ROC curve from the predictions on the test set and computed the area under the ROC curve (AUC ROC)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "Rya5ABT8msha" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Test accuracy: 0.865880012512207\n" ] } ], "source": [ "# classification quality metric on the test data\n", "print(f\"\\nTest accuracy: {test_acc}\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "id": "2kHjcmnCmv0Y" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m40s\u001b[0m 50ms/step\n", " precision recall f1-score support\n", "\n", " Negative 0.91 0.82 0.86 12500\n", " Positive 0.83 0.92 0.87 12500\n", "\n", " accuracy 0.87 25000\n", " macro avg 0.87 0.87 0.87 25000\n", "weighted avg 0.87 0.87 0.87 25000\n", "\n" ] } ], "source": [ "# classification report for the test set\n", "y_score = 
model.predict(X_test)\n", "# threshold the sigmoid outputs at 0.5 to get class labels\n", "y_pred = (y_score[:, 0] >= 0.5).astype(int)\n", "\n", "from sklearn.metrics import classification_report\n", "print(classification_report(y_test, y_pred, labels=[0, 1], target_names=['Negative', 'Positive']))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kp4AQRbcmwAx" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "AUC ROC: 0.9420113727999999\n" ] } ], "source": [ "# plot the ROC curve and compute AUC ROC\n", "from sklearn.metrics import roc_curve, auc\n", "\n", "fpr, tpr, thresholds = roc_curve(y_test, y_score)\n", "plt.figure(figsize=(8, 6))\n", "plt.plot(fpr, tpr)\n", "plt.grid()\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC')\n", "plt.savefig('roc_curve.png', dpi=300, bbox_inches='tight')\n", "plt.show()\n", "print('AUC ROC:', auc(fpr, tpr))" ] }, { "cell_type": "markdown", "metadata": { "id": "MsM3ew3d1FYq" }, "source": [ "### 11) Drew conclusions from applying a recurrent neural network to the task of determining the sentiment of a text." ] }, { "cell_type": "markdown", "metadata": { "id": "xxFO4CXbIG88" }, "source": [ "Table 1:" ] }, { "cell_type": "markdown", "metadata": { "id": "xvoivjuNFlEf" }, "source": [ "| Model | Number of trainable parameters | Number of training epochs | Classification quality on the test set |\n", "|----------|-------------------------------------|---------------------------|-----------------------------------------|\n", "| Recurrent (LSTM) | 184,897 | 5 | accuracy: 0.8659; loss: 0.3349; AUC ROC: 0.9420 |\n" ] }, { "cell_type": "markdown", "metadata": { "id": "YctF8h_sIB-P" }, "source": [ "#### Based on the results of applying the recurrent neural network and on the data in Table 1, the model handled the sentiment classification task well. The test accuracy of 0.8659 exceeds the required threshold of 0.8. The AUC ROC of 0.9420 (> 0.9) indicates that the model separates the two classes (positive and negative reviews) well. Precision and recall are also good: for negative reviews precision = 0.91 and recall = 0.82; for positive reviews precision = 0.83 and recall = 0.92. The higher recall for positive reviews shows that, at the 0.5 threshold, the model labels a negative review as positive more often than the reverse."
] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 0 }