# Report on Laboratory Work No. 4
## Topic: "Text Sentiment Classification with a Recurrent Neural Network"
---
### Completed by: Team 2, Machulina D.V., Biryukova A.S., A-02-22
---
### 1. Creating a notebook and setting up the environment. Configuring the notebook to use a GPU hardware accelerator
```python
# Mount Google Drive and switch to the lab working directory
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/is_lab4')

# Keras, plotting, and metrics imports used throughout the notebook
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay
```
```python
# Verify that a GPU device is available
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
```
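The check above uses `tf.test.gpu_device_name()`. As an optional alternative (not part of the original notebook), the available GPUs can also be listed through `tf.config`:
```python
import tensorflow as tf

# Alternative GPU check (sketch): list the physical GPU devices directly
gpus = tf.config.list_physical_devices('GPU')
if not gpus:
    raise SystemError('GPU device not found')
print('GPUs available:', gpus)
```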
---
### 2. Loading the IMDb data. Configuring the dataset (4*2 - 1)
```python
# Load the IMDb dataset
from keras.datasets import imdb
vocabulary_size = 5000
index_from = 3
(X_train, y_train), (X_test, y_test) = imdb.load_data(path="imdb.npz",
                                                      num_words=vocabulary_size,
                                                      skip_top=0,
                                                      maxlen=None,
                                                      seed=7,
                                                      start_char=1,
                                                      oov_char=2,
                                                      index_from=index_from)
print('Shape of X train:', X_train.shape)
print('Shape of y train:', y_train.shape)
print('Shape of X test:', X_test.shape)
print('Shape of y test:', y_test.shape)
```
Shape of X train: (25000,) <br>
Shape of y train: (25000,) <br>
Shape of X test: (25000,) <br>
Shape of y test: (25000,) <br>
---
### 3. Printing a review from the training set as a list of word indices. Converting the index list to text and printing the review as text. Printing the review length, class label, and class name.
```python
# Build a dictionary for converting indices to words
# load the "word:index" dictionary
word_to_id = imdb.get_word_index()
# shift the indices by index_from and add the special tokens
word_to_id = {key: (value + index_from) for key, value in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
word_to_id["<UNUSED>"] = 3
# build the reverse "index:word" dictionary
id_to_word = {value: key for key, value in word_to_id.items()}
```
```python
# Print a review from the training set as a list of word indices
some_number = 192
review_indices = X_train[some_number]
print("List of word indices:")
print(review_indices)
# Convert the list of indices to text
review_as_text = ' '.join(id_to_word[id] for id in X_train[some_number])
print("\nReview as text:")
print(review_as_text)
```
List of word indices:
[1, 225, 164, 433, 74, 2753, 35, 2188, 20, 5, 397, 35, 298, 20, 585, 305, 10, 10, 45, 64, 61, 652, 21, 6, 52, 708, 9, 2, 725, 4, 2, 7, 1451, 1089, 105, 17, 230, 17, 2, 9, 1947, 4, 3238, 11, 135, 422, 26, 6, 2, 1021, 378, 1780, 224, 472, 36, 26, 379, 724, 2607, 387, 178, 1582, 4, 771, 36, 2, 112, 2, 34, 6, 2, 10, 10, 300, 103, 6, 2, 2, 8, 516, 25, 30, 252, 8, 376, 90, 51, 1455, 335, 4010, 33, 54, 25, 2440, 90, 125, 10, 10, 241, 1559, 4, 609, 46, 7, 4, 2, 11, 3826, 2, 5, 11, 1011, 7, 4193, 7, 4684, 2, 3498, 90, 8, 3534, 2, 7, 4779, 10, 10, 342, 92, 1414, 979, 4, 568, 44, 4, 2, 5, 331, 2237, 18, 57, 684, 52, 282, 15, 4, 1788, 71, 2, 34, 90, 10, 10, 470, 137, 269, 8, 1090, 387, 129, 761, 46, 7, 129, 1682, 17, 76, 17, 614, 8, 2, 15, 4, 2, 2, 41, 10, 10, 457, 103, 397, 339, 39, 294, 8, 169, 4, 2, 103, 2, 129, 322, 30, 252, 8, 2222, 98, 245, 17, 515, 17, 614, 38, 25, 70, 393, 90, 31, 23, 31, 57, 213, 11, 112, 2, 208, 10, 10, 150, 474, 115, 535, 15, 101, 415, 62, 30, 2, 8, 231, 6, 171, 2497, 467, 134, 2, 2, 21, 4, 105, 11, 135, 422, 26, 38, 2, 5, 97, 38, 111, 1297, 2497, 15, 45, 2620, 1167, 18, 4, 529, 8, 459, 44, 68, 4369, 237, 36, 26, 1484, 7, 68, 205, 399, 14, 1098, 4, 2, 7, 4, 436, 22, 10, 10, 11, 420, 25, 71, 1535, 4, 2, 161, 570, 19, 2, 2, 105, 237, 36, 533, 26, 1348, 2, 2, 18, 487, 14, 2, 36, 872, 8, 97, 1186, 38, 2, 2270, 15, 32, 281, 7, 635, 271, 46, 4, 2054, 10, 10, 300, 4, 2, 1098, 6, 1002, 1004, 6, 568, 1709, 475, 137, 4, 2311, 9, 2359, 57, 53, 74, 747, 2194, 245, 10, 10, 241, 4, 2, 2, 11, 32, 2580, 7, 2, 4983, 11, 3826, 2, 5, 187, 3400, 7, 84, 246, 57, 31, 85, 74, 4, 1021, 378, 186, 8, 1495, 27, 1032, 2003, 10, 10, 342, 4, 2, 2, 35, 1755, 1166, 7, 567, 15, 62, 28, 556, 101, 406, 112, 10, 10, 470, 4, 836, 139, 69, 57, 1546, 1689, 11, 192, 49, 139, 71, 1504, 1677, 2, 39, 298, 102, 10, 10, 4, 64, 1123, 9, 4, 2, 751, 4, 130, 63, 16, 6, 184, 1770, 136, 237, 12, 16, 2, 725, 4, 322, 45, 99, 78, 4, 1057, 1477, 12, 56, 19, 35, 2, 379, 277, 15, 266, 46, 7, 317, 1845, 10, 10, 371, 4, 2, 496, 4, 231, 7, 135, 422, 144, 30, 3013, 7, 533, 128, 246, 36, 144, 43, 847, 8, 2642, 5, 193, 2, 19, 84, 37, 97, 102, 19, 6, 729, 2, 18, 489, 5, 1663]
Review as text:
<START> there's nothing worse than renting an asian movie and getting an american movie experience instead br br it's only my opinion but a good thriller is <UNK> upon the <UNK> of likable intelligent characters as far as <UNK> is concerned the protagonists in say yes are a <UNK> married couple nicely done unfortunately they are stupid beyond belief let us count the ways they <UNK> being <UNK> by a <UNK> br br 1 after a <UNK> <UNK> to kill you be sure to tell him what hotel you're staying at when you drop him off br br 2 beat the hell out of the <UNK> in broad <UNK> and in front of dozens of witnesses <UNK> allowing him to press <UNK> of assault br br 3 don't bother telling the police about the <UNK> and simply assume for no apparently good reason that the cops were <UNK> by him br br 4 while trying to escape let your lady out of your sight as much as possible to <UNK> that the <UNK> <UNK> her br br 5 after getting help from someone to find the <UNK> after <UNK> your wife be sure to send them away as soon as possible so you can face him one on one no point in being <UNK> right br br now i'd never expect that any person would be <UNK> to making a few mistakes under these <UNK> <UNK> but the characters in say yes are so <UNK> and make so many unbelievable mistakes that it's effectively impossible for the viewer to care about their safety since they are victims of their own doing this kills the <UNK> of the entire film br br in case you were wondering the <UNK> didn't stop with <UNK> <UNK> characters since they themselves are surely <UNK> <UNK> for writing this <UNK> they decided to make situations so <UNK> unrealistic that all sense of reality goes out the window br br 1 the <UNK> kills a cop inside a police station – while the protagonist is asleep no more than ten feet away br br 2 the <UNK> <UNK> in all sorts of <UNK> activities in broad <UNK> and around tons of people yet no one other than the married couple seems to notice his odd behavior br br 3 the <UNK> <UNK> an absurd amount of violence that would have killed any human being br br 4 the suspense scenes had no imagination whatsoever in fact some scenes were direct rip <UNK> from american movies br br the only positive is the <UNK> near the end which was a pretty brutal scene since it was <UNK> upon the wife it's too bad the filmmakers followed it up with an <UNK> stupid ending that comes out of left field br br truly the <UNK> behind the making of say yes should be ashamed of themselves better yet they should just move to california and take <UNK> with people who make movies with a similar <UNK> for quality and intelligence
```python
# Print the class label and class name
class_label = y_train[some_number]
class_name = "Positive" if class_label == 1 else "Negative"
print(f"\nClass label: {class_label} - {class_name}")
```
Class label: 0 - Negative
---
### 4. Printing the maximum and minimum review length in the training set
```python
# Print review lengths
max_review_length = len(max(X_train, key=len))
print(f"\nMaximum review length: {max_review_length}")
min_review_length = len(min(X_train, key=len))
print(f"\nMinimum review length: {min_review_length}")
review_length = len(review_indices)
print(f"\nReview length: {review_length}")
```
Maximum review length: 2494 <br>
Minimum review length: 11 <br>
Review length: 502
---
### 5. Data preprocessing
```python
# Data preprocessing: pad/truncate every review to exactly max_words tokens
from tensorflow.keras.utils import pad_sequences
max_words = 500
X_train = pad_sequences(X_train, maxlen=max_words, value=0, padding='pre', truncating='post')
X_test = pad_sequences(X_test, maxlen=max_words, value=0, padding='pre', truncating='post')
```
---
### 6. Repeating steps 4 and 3. Conclusion on how the review changed after preprocessing
```python
# Print review lengths after preprocessing
max_review_length = len(max(X_train, key=len))
print(f"\nMaximum review length: {max_review_length}")
min_review_length = len(min(X_train, key=len))
print(f"\nMinimum review length: {min_review_length}")
# review_indices still holds the original (unpadded) review, so its length is unchanged
review_length = len(review_indices)
print(f"\nReview length: {review_length}")
```
Maximum review length: 500 <br>
Minimum review length: 500 <br>
Review length: 502
```python
# Print the same review from the training set as a list of word indices
some_number = 192
review_indices = X_train[some_number]
print("List of word indices:")
print(review_indices)
# Convert the list of indices to text
review_as_text = ' '.join(id_to_word[id] for id in X_train[some_number])
print("\nReview as text:")
print(review_as_text)
```
List of word indices:
[ 1 225 164 433 74 2753 35 2188 20 5 397 35 298 20
585 305 10 10 45 64 61 652 21 6 52 708 9 2
725 4 2 7 1451 1089 105 17 230 17 2 9 1947 4
3238 11 135 422 26 6 2 1021 378 1780 224 472 36 26
379 724 2607 387 178 1582 4 771 36 2 112 2 34 6
2 10 10 300 103 6 2 2 8 516 25 30 252 8
376 90 51 1455 335 4010 33 54 25 2440 90 125 10 10
241 1559 4 609 46 7 4 2 11 3826 2 5 11 1011
7 4193 7 4684 2 3498 90 8 3534 2 7 4779 10 10
342 92 1414 979 4 568 44 4 2 5 331 2237 18 57
684 52 282 15 4 1788 71 2 34 90 10 10 470 137
269 8 1090 387 129 761 46 7 129 1682 17 76 17 614
8 2 15 4 2 2 41 10 10 457 103 397 339 39
294 8 169 4 2 103 2 129 322 30 252 8 2222 98
245 17 515 17 614 38 25 70 393 90 31 23 31 57
213 11 112 2 208 10 10 150 474 115 535 15 101 415
62 30 2 8 231 6 171 2497 467 134 2 2 21 4
105 11 135 422 26 38 2 5 97 38 111 1297 2497 15
45 2620 1167 18 4 529 8 459 44 68 4369 237 36 26
1484 7 68 205 399 14 1098 4 2 7 4 436 22 10
10 11 420 25 71 1535 4 2 161 570 19 2 2 105
237 36 533 26 1348 2 2 18 487 14 2 36 872 8
97 1186 38 2 2270 15 32 281 7 635 271 46 4 2054
10 10 300 4 2 1098 6 1002 1004 6 568 1709 475 137
4 2311 9 2359 57 53 74 747 2194 245 10 10 241 4
2 2 11 32 2580 7 2 4983 11 3826 2 5 187 3400
7 84 246 57 31 85 74 4 1021 378 186 8 1495 27
1032 2003 10 10 342 4 2 2 35 1755 1166 7 567 15
62 28 556 101 406 112 10 10 470 4 836 139 69 57
1546 1689 11 192 49 139 71 1504 1677 2 39 298 102 10
10 4 64 1123 9 4 2 751 4 130 63 16 6 184
1770 136 237 12 16 2 725 4 322 45 99 78 4 1057
1477 12 56 19 35 2 379 277 15 266 46 7 317 1845
10 10 371 4 2 496 4 231 7 135 422 144 30 3013
7 533 128 246 36 144 43 847 8 2642 5 193 2 19
84 37 97 102 19 6 729 2 18 489]
Review as text:
<START> there's nothing worse than renting an asian movie and getting an american movie experience instead br br it's only my opinion but a good thriller is <UNK> upon the <UNK> of likable intelligent characters as far as <UNK> is concerned the protagonists in say yes are a <UNK> married couple nicely done unfortunately they are stupid beyond belief let us count the ways they <UNK> being <UNK> by a <UNK> br br 1 after a <UNK> <UNK> to kill you be sure to tell him what hotel you're staying at when you drop him off br br 2 beat the hell out of the <UNK> in broad <UNK> and in front of dozens of witnesses <UNK> allowing him to press <UNK> of assault br br 3 don't bother telling the police about the <UNK> and simply assume for no apparently good reason that the cops were <UNK> by him br br 4 while trying to escape let your lady out of your sight as much as possible to <UNK> that the <UNK> <UNK> her br br 5 after getting help from someone to find the <UNK> after <UNK> your wife be sure to send them away as soon as possible so you can face him one on one no point in being <UNK> right br br now i'd never expect that any person would be <UNK> to making a few mistakes under these <UNK> <UNK> but the characters in say yes are so <UNK> and make so many unbelievable mistakes that it's effectively impossible for the viewer to care about their safety since they are victims of their own doing this kills the <UNK> of the entire film br br in case you were wondering the <UNK> didn't stop with <UNK> <UNK> characters since they themselves are surely <UNK> <UNK> for writing this <UNK> they decided to make situations so <UNK> unrealistic that all sense of reality goes out the window br br 1 the <UNK> kills a cop inside a police station – while the protagonist is asleep no more than ten feet away br br 2 the <UNK> <UNK> in all sorts of <UNK> activities in broad <UNK> and around tons of people yet no one other than the married couple seems to notice his odd behavior br br 3 the <UNK> <UNK> an absurd amount of violence that would have killed any human being br br 4 the suspense scenes had no imagination whatsoever in fact some scenes were direct rip <UNK> from american movies br br the only positive is the <UNK> near the end which was a pretty brutal scene since it was <UNK> upon the wife it's too bad the filmmakers followed it up with an <UNK> stupid ending that comes out of left field br br truly the <UNK> behind the making of say yes should be ashamed of themselves better yet they should just move to california and take <UNK> with people who make movies with a similar <UNK> for quality
Since the original review had 502 tokens, which is more than max_words = 500, the last two words were cut off (truncating='post'), so the review length now equals the specified value (500).
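
To make the effect of the chosen `pad_sequences` arguments explicit, here is a small illustrative sketch on toy data (not part of the original notebook): sequences shorter than `maxlen` get zeros prepended (`padding='pre'`), and longer sequences lose their tail (`truncating='post'`), which is exactly why the last two words of this 502-token review were cut.
```python
from tensorflow.keras.utils import pad_sequences

# Toy sequences: one shorter and one longer than maxlen=5
toy = [[11, 12, 13], [21, 22, 23, 24, 25, 26, 27]]
print(pad_sequences(toy, maxlen=5, value=0, padding='pre', truncating='post'))
# [[ 0  0 11 12 13]   <- zeros prepended to the short sequence
#  [21 22 23 24 25]]  <- the tail (26, 27) of the long sequence is truncated
```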
---
### 7. Printing the preprocessed training and test data arrays and their shapes
```python
# Print the preprocessed data
print('X train: \n', X_train)
print('X test: \n', X_test)
# Print the shapes
print('Shape of X train:', X_train.shape)
print('Shape of X test:', X_test.shape)
```
X train: <br>
[[ 0 0 0 ... 104 545 7] <br>
[ 0 0 0 ... 2 262 372] <br>
[ 0 0 0 ... 758 10 10] <br>
... <br>
[ 0 0 0 ... 2 27 375] <br>
[ 0 0 0 ... 11 111 531] <br>
[ 0 0 0 ... 152 1833 12]] <br> <br>
X test: <br>
[[ 0 0 0 ... 2 126 3849] <br>
[ 0 0 0 ... 25 1833 12] <br>
[ 0 0 0 ... 129 249 4262] <br>
... <br>
[ 0 0 0 ... 2 24 1178] <br>
[ 0 0 0 ... 61 278 145] <br>
[ 0 0 0 ... 12 5 358]] <br>
Shape of X train: (25000, 500) <br>
Shape of X test: (25000, 500)
---
### 8. Implementing and training a recurrent neural network model consisting of Embedding, LSTM, Dropout, and Dense layers. Printing information about the network architecture.
```python
# Recurrent model: Embedding -> LSTM -> Dropout -> Dense (sigmoid)
model = Sequential()
model.add(layers.Embedding(input_dim=vocabulary_size, output_dim=32, input_length=max_words, input_shape=(max_words,)))
model.add(layers.LSTM(76))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
```
<table>
<thead>
<tr>
<th>Layer (type)</th>
<th>Output Shape</th>
<th>Param #</th>
</tr>
</thead>
<tbody>
<tr>
<td>embedding_1 (Embedding)</td>
<td>(None, 500, 32)</td>
<td>160,000</td>
</tr>
<tr>
<td>lstm_1 (LSTM)</td>
<td>(None, 76) </td>
<td>33,136</td>
</tr>
<tr>
<td>dropout_1 (Dropout)</td>
<td>(None, 76) </td>
<td>0</td>
</tr>
<tr>
<td>dense_1 (Dense)</td>
<td>(None, 1)</td>
<td>77</td>
</tr>
</tbody>
</table>
Total params: 193,213 (754.74 KB) <br>
Trainable params: 193,213 (754.74 KB) <br>
Non-trainable params: 0 (0.00 B)
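
The parameter counts in the summary are easy to verify by hand (a quick sanity check, not part of the original notebook): the Embedding layer stores `vocabulary_size * output_dim` weights, the LSTM stores `4 * (input_dim + units + 1) * units` weights for its four gates, and the Dense layer stores `units + 1`.
```python
# Sanity check of the parameter counts reported by model.summary()
embedding_params = 5000 * 32          # 160,000
lstm_params = 4 * (32 + 76 + 1) * 76  # 33,136 = 4 gates * (kernel + recurrent kernel + bias)
dense_params = 76 + 1                 # 77
print(embedding_params + lstm_params + dense_params)  # 193,213
```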
```python
batch_size = 64
epochs = 5
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)
```
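`model.fit` returns a `History` object; a possible extension (a sketch, assuming the same call as above) is to capture it and plot the training and validation accuracy with the already-imported matplotlib:
```python
# Sketch: capture the training history and plot the accuracy curves
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid()
plt.show()
```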
```python
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"\nTest accuracy: {test_acc}")
```
Test accuracy: 0.8670799732208252
---
### 9. Evaluating the training quality on the test data
```python
# Predict class probabilities and convert them to hard labels using a 0.5 threshold
y_score = model.predict(X_test)
y_pred = [1 if y_score[i, 0] >= 0.5 else 0 for i in range(len(y_score))]
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, labels=[0, 1], target_names=['Negative', 'Positive']))
```
<table>
<thead>
<tr>
<th> </th>
<th>precision</th>
<th>recall</th>
<th>f1-score</th>
<th>support</th>
</tr>
</thead>
<tbody>
<tr>
<td>Negative</td>
<td>0.86</td>
<td>0.88</td>
<td>0.87</td>
<td>12500</td>
</tr>
<tr>
<td>Positive</td>
<td>0.88</td>
<td>0.86</td>
<td>0.87</td>
<td>12500</td>
</tr>
<tr>
<td colspan = 5> </td>
</tr>
<tr>
<td>accuracy</td>
<td> </td>
<td> </td>
<td>0.87</td>
<td>25000</td>
</tr>
<tr>
<td>macro avg</td>
<td>0.87</td>
<td>0.87</td>
<td>0.87</td>
<td>25000</td>
</tr>
<tr>
<td>weighted avg</td>
<td>0.87</td>
<td>0.87</td>
<td>0.87</td>
<td>25000</td>
</tr>
</tbody>
</table>
```python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
fpr, tpr, thresholds = roc_curve(y_test, y_score)
plt.plot(fpr, tpr)
plt.grid()
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.show()
print('Area under ROC is', auc(fpr, tpr))
```
Area under ROC is 0.936850272
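
The notebook imports `confusion_matrix` and `ConfusionMatrixDisplay` at the top but does not use them; a minimal sketch of how they could complement the classification report, reusing `y_test` and `y_pred` from above:
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Sketch: visualize the confusion matrix of the test predictions
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Negative', 'Positive'])
disp.plot()
plt.show()
```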
**Conclusion**
<table>
<thead>
<tr>
<th>Model</th>
<th>Number of trainable parameters</th>
<th>Number of training epochs</th>
<th>Classification quality on the test set</th>
</tr>
</thead>
<tbody>
<tr>
<td>Recurrent (LSTM)</td>
<td><center>193,213</center></td>
<td><center>5</center></td>
<td>accuracy: 0.867<br> loss: 0.331 <br> ROC AUC: 0.937</td>
</tr>
</tbody>
</table>
In this laboratory work we studied the application of a recurrent neural network. Based on the results summarized in the table, the model handles the text sentiment classification task well: accuracy = 0.867 and loss = 0.331 on the test set, so the accuracy exceeds the required threshold of 0.8. The ROC AUC exceeds 0.9, which shows that the model separates the two classes, negative and positive reviews, well.