Andrey 7 months ago
Parent 6e007803d6
Commit 209ff00da3

3
.gitignore vendored

@ -0,0 +1,3 @@
invisible*
.venv*
.~lock*

@ -0,0 +1,11 @@
# Intelligent Information Systems
## Lectures
| Date       | Lecture                                                      |
|:----------:|:-------------------------------------------------------------|
| 05.09.2024 | [Introductory lecture](lectures/lec1.odp)                    |
| 12.09.2024 | [Environment isolation. Docker](lectures/lec2-docker.odp)    |
| 19.09.2024 | [Exploratory data analysis](lectures/lec3-eda.odp)           |
## Lab assignments

@ -0,0 +1,9 @@
FROM python:3.11-slim
COPY . /my_app
WORKDIR /my_app
RUN pip install tqdm
ENTRYPOINT [ "python", "main.py" ]

@ -0,0 +1,6 @@
import sys

def main(a = 3, b = 5):
    print(f"multiply {a} by {b} is {a * b}")

main(int(sys.argv[1]), int(sys.argv[2]))
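
A note on the script above: `main` declares the defaults `a = 3, b = 5`, but the call always reads `sys.argv[1]` and `sys.argv[2]`, so the defaults are never used and running the script without two arguments raises `IndexError`. The following is a minimal sketch of one way to make the defaults reachable, assuming `argparse` is acceptable here. The `parse_args` helper and the returned message are illustrative additions, not part of the commit.

```python
import argparse


def main(a=3, b=5):
    # Same message as the committed script; also returned so it can be checked.
    message = f"multiply {a} by {b} is {a * b}"
    print(message)
    return message


def parse_args(argv=None):
    # Hypothetical helper: both positional arguments are optional
    # and fall back to the same defaults that main() declares.
    parser = argparse.ArgumentParser(description="Multiply two integers.")
    parser.add_argument("a", type=int, nargs="?", default=3)
    parser.add_argument("b", type=int, nargs="?", default=5)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    main(args.a, args.b)
```

With this variant, `python main.py` prints the product of the defaults, and `python main.py 4 6` behaves like the original call.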

@ -0,0 +1,449 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "e312113e",
"metadata": {},
"outputs": [],
"source": [
"\n",
"import pandas as pd\n",
"import matplotlib as plt\n",
"import seaborn as sns\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "de2c028d",
"metadata": {},
"source": [
"# Loading and exploring the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5cd00195",
"metadata": {},
"outputs": [],
"source": [
"# dataset https://www.kaggle.com/datasets/mrdaniilak/russia-real-estate-20182021/data \n",
"\n",
"df = pd.read_csv('data/all_v2.csv')\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "05b57100",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" <th>date</th>\n",
" <th>time</th>\n",
" <th>geo_lat</th>\n",
" <th>geo_lon</th>\n",
" <th>region</th>\n",
" <th>building_type</th>\n",
" <th>level</th>\n",
" <th>levels</th>\n",
" <th>rooms</th>\n",
" <th>area</th>\n",
" <th>kitchen_area</th>\n",
" <th>object_type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>6050000</td>\n",
" <td>2018-02-19</td>\n",
" <td>20:00:21</td>\n",
" <td>59.805808</td>\n",
" <td>30.376141</td>\n",
" <td>2661</td>\n",
" <td>1</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" <td>3</td>\n",
" <td>82.6</td>\n",
" <td>10.8</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8650000</td>\n",
" <td>2018-02-27</td>\n",
" <td>12:04:54</td>\n",
" <td>55.683807</td>\n",
" <td>37.297405</td>\n",
" <td>81</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>24</td>\n",
" <td>2</td>\n",
" <td>69.1</td>\n",
" <td>12.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4000000</td>\n",
" <td>2018-02-28</td>\n",
" <td>15:44:00</td>\n",
" <td>56.295250</td>\n",
" <td>44.061637</td>\n",
" <td>2871</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>9</td>\n",
" <td>3</td>\n",
" <td>66.0</td>\n",
" <td>10.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1850000</td>\n",
" <td>2018-03-01</td>\n",
" <td>11:24:52</td>\n",
" <td>44.996132</td>\n",
" <td>39.074783</td>\n",
" <td>2843</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" <td>16</td>\n",
" <td>2</td>\n",
" <td>38.0</td>\n",
" <td>5.0</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5450000</td>\n",
" <td>2018-03-01</td>\n",
" <td>17:42:43</td>\n",
" <td>55.918767</td>\n",
" <td>37.984642</td>\n",
" <td>81</td>\n",
" <td>3</td>\n",
" <td>13</td>\n",
" <td>14</td>\n",
" <td>2</td>\n",
" <td>60.0</td>\n",
" <td>10.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3300000</td>\n",
" <td>2018-03-02</td>\n",
" <td>21:18:42</td>\n",
" <td>55.908253</td>\n",
" <td>37.726448</td>\n",
" <td>81</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>32.0</td>\n",
" <td>6.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4704280</td>\n",
" <td>2018-03-04</td>\n",
" <td>12:35:25</td>\n",
" <td>55.621097</td>\n",
" <td>37.431002</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>25</td>\n",
" <td>1</td>\n",
" <td>31.7</td>\n",
" <td>6.0</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>3600000</td>\n",
" <td>2018-03-04</td>\n",
" <td>20:52:38</td>\n",
" <td>59.875526</td>\n",
" <td>30.395457</td>\n",
" <td>2661</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>31.1</td>\n",
" <td>6.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>3390000</td>\n",
" <td>2018-03-05</td>\n",
" <td>07:07:05</td>\n",
" <td>53.195031</td>\n",
" <td>50.106952</td>\n",
" <td>3106</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>24</td>\n",
" <td>2</td>\n",
" <td>64.0</td>\n",
" <td>13.0</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2800000</td>\n",
" <td>2018-03-06</td>\n",
" <td>09:57:10</td>\n",
" <td>55.736972</td>\n",
" <td>38.846457</td>\n",
" <td>81</td>\n",
" <td>1</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>2</td>\n",
" <td>55.0</td>\n",
" <td>8.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price date time geo_lat geo_lon region building_type \\\n",
"0 6050000 2018-02-19 20:00:21 59.805808 30.376141 2661 1 \n",
"1 8650000 2018-02-27 12:04:54 55.683807 37.297405 81 3 \n",
"2 4000000 2018-02-28 15:44:00 56.295250 44.061637 2871 1 \n",
"3 1850000 2018-03-01 11:24:52 44.996132 39.074783 2843 4 \n",
"4 5450000 2018-03-01 17:42:43 55.918767 37.984642 81 3 \n",
"5 3300000 2018-03-02 21:18:42 55.908253 37.726448 81 1 \n",
"6 4704280 2018-03-04 12:35:25 55.621097 37.431002 3 2 \n",
"7 3600000 2018-03-04 20:52:38 59.875526 30.395457 2661 1 \n",
"8 3390000 2018-03-05 07:07:05 53.195031 50.106952 3106 2 \n",
"9 2800000 2018-03-06 09:57:10 55.736972 38.846457 81 1 \n",
"\n",
" level levels rooms area kitchen_area object_type \n",
"0 8 10 3 82.6 10.8 1 \n",
"1 5 24 2 69.1 12.0 1 \n",
"2 5 9 3 66.0 10.0 1 \n",
"3 12 16 2 38.0 5.0 11 \n",
"4 13 14 2 60.0 10.0 1 \n",
"5 4 5 1 32.0 6.0 1 \n",
"6 1 25 1 31.7 6.0 11 \n",
"7 2 5 1 31.1 6.0 1 \n",
"8 4 24 2 64.0 13.0 11 \n",
"9 9 10 2 55.0 8.0 1 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "6c892b3e",
"metadata": {},
"source": [
"# Data cleaning"
]
},
{
"cell_type": "markdown",
"id": "a3d3ad69",
"metadata": {},
"source": [
"# Feature analysis for the model\n",
"\n",
"https://seaborn.pydata.org/examples/index.html - example gallery"
]
},
{
"cell_type": "markdown",
"id": "78845b8b",
"metadata": {},
"source": [
"## histplot"
]
},
{
"cell_type": "markdown",
"id": "9318a819",
"metadata": {},
"source": [
"## heatmap"
]
},
{
"cell_type": "markdown",
"id": "f4ab2ef2",
"metadata": {},
"source": [
"# Group operations"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "11e4da4e",
"metadata": {},
"outputs": [],
"source": [
"def flat_index(df_stats):\n",
"    # Join the two column levels into names such as 'price_mean'\n",
"    df_stats.columns = df_stats.columns.get_level_values(0) + '_' + df_stats.columns.get_level_values(1)\n",
"    # Turn the group keys back into ordinary columns\n",
"    df_stats.reset_index(inplace=True)\n",
"    return df_stats"
]
},
{
"cell_type": "markdown",
"id": "0fbb62de",
"metadata": {},
"source": [
"## lineplot"
]
},
{
"cell_type": "markdown",
"id": "b8bc652d",
"metadata": {},
"source": [
"## subplots"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7fab88d2",
"metadata": {},
"outputs": [],
"source": [
"fig, axs = plt.pyplot.subplots(2,2)\n",
"fig.tight_layout(pad=1)\n",
"fig.set_size_inches(16.5, 14, forward=True)\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "ba7d6b7c",
"metadata": {},
"source": [
"## displot"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eca41c1e",
"metadata": {},
"outputs": [],
"source": [
"for col in categorial_cols:\n",
"    print(f'Unique categories in {col}: {df[col].nunique()}')"
]
},
{
"cell_type": "markdown",
"id": "ef2501d0",
"metadata": {},
"source": [
"## histplot"
]
},
{
"cell_type": "markdown",
"id": "5d64d58a",
"metadata": {},
"source": [
"# Bokeh\n",
"https://bokeh.org/"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fddb38a2",
"metadata": {},
"outputs": [],
"source": [
"from bokeh.plotting import figure, show\n",
"from bokeh.models import ColumnDataSource, HoverTool, Legend\n",
"from bokeh.io import output_notebook \n",
"output_notebook()"
]
},
{
"cell_type": "markdown",
"id": "3a7cbaaa",
"metadata": {},
"source": [
"# Conclusions after EDA"
]
},
{
"cell_type": "markdown",
"id": "695334e5",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv_sprint02",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
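
The notebook defines `flat_index` under "Group operations" but does not yet call it. A minimal sketch of the likely use case follows, assuming the helper is meant for the two-level columns that `groupby(...).agg(...)` produces when several aggregations are requested. The small frame reuses the first six rows of the `df.head(10)` output above; the choice of `rooms` as the grouping key and of the aggregations is an assumption made for illustration.

```python
import pandas as pd


def flat_index(df_stats):
    # Join the two column levels into names such as "price_mean".
    df_stats.columns = (
        df_stats.columns.get_level_values(0)
        + "_"
        + df_stats.columns.get_level_values(1)
    )
    # Turn the group key back into an ordinary column.
    df_stats.reset_index(inplace=True)
    return df_stats


# First six rows of the dataset preview (rooms, price, area only).
df = pd.DataFrame({
    "rooms": [3, 2, 3, 2, 2, 1],
    "price": [6050000, 8650000, 4000000, 1850000, 5450000, 3300000],
    "area": [82.6, 69.1, 66.0, 38.0, 60.0, 32.0],
})

# Several aggregations per column yield a two-level column index.
stats = df.groupby("rooms").agg({"price": ["mean", "max"], "area": ["mean"]})

flat = flat_index(stats)
print(list(flat.columns))
# → ['rooms', 'price_mean', 'price_max', 'area_mean']
```

Flat column names of this kind are easier to pass to plotting calls that expect a single string per column.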

@ -0,0 +1,4 @@
pandas
bokeh
matplotlib
seaborn

Binary data
lectures/lec1.odp

Binary file not shown.

Binary data
lectures/lec2-docker.odp

Binary file not shown.

Binary data
lectures/lec3-eda.odp

Binary file not shown.