Data Processing

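Jupyter with conda: to use a conda environment as a Jupyter kernel, activate that environment (it needs the `ipykernel` package) and register it, replacing `env_name` with the environment's name:

```shell
python -m ipykernel install --user --name env_name --display-name "env_name"
```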
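After switching a notebook to the new kernel, a quick way to confirm it really runs on the environment's interpreter is to inspect `sys.executable` and `sys.prefix` from inside the notebook. This is a minimal sketch: the helper name `current_env_name` is hypothetical, and it assumes the usual conda layout where an environment lives in a folder named after it (e.g. `.../envs/env_name`).

```python
import os
import sys


def current_env_name() -> str:
    """Name of the folder that holds the running interpreter's environment.

    Assumes the usual conda layout (.../envs/<env_name>); for the base
    environment this returns the name of the conda installation folder instead.
    """
    return os.path.basename(os.path.normpath(sys.prefix))


print(sys.executable)      # path of the Python binary behind the current kernel
print(current_env_name())  # expected to be env_name if the kernel was registered from that env
```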

1. import packages

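```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from icecream import ic  # ic(expr) prints the expression together with its value, for quick debugging
from tqdm import tqdm    # import the tqdm function (not the module) so that tqdm(...) is callable
```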

2. read files

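```python
# '|' is the field separator; nrows=100000 reads only the first 100,000 rows for a quick look
df_deviceactive = pd.read_csv('./2021_1_data/1_device_active.csv', sep="|", nrows=100000)
```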
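`nrows` only loads the beginning of a file; to go through the whole file without holding it in memory at once, `read_csv` also accepts `chunksize`, which returns an iterator of DataFrames. A small sketch, using a made-up in-memory pipe-delimited text as a stand-in for the real CSV (the column names and values are illustrative only):

```python
import io

import pandas as pd

# Made-up, pipe-delimited stand-in for ./2021_1_data/1_device_active.csv
raw = io.StringIO(
    "device_id|days|daynum\n"
    "a1|3|3\n"
    "b2|1|1\n"
    "c3|2|2\n"
)

# nrows: parse only the first N data rows (the header line is not counted)
head = pd.read_csv(raw, sep="|", nrows=2)

# chunksize: stream the whole file in pieces instead of loading it all at once
raw.seek(0)
total_rows = 0
for chunk in pd.read_csv(raw, sep="|", chunksize=2):
    total_rows += len(chunk)

print(head.shape)  # (2, 3)
print(total_rows)  # 3
```

With a real file, replace the `StringIO` object with the path; per-chunk results (counts, filtered rows) can be accumulated inside the loop.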

3. plotting

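```python
plt.scatter(X, Y)                         # scatter plot; X and Y are equal-length array-likes
plt.plot(X, Y)                            # line plot
plt.hist(df_active_info['age'].dropna())  # histogram of age; drop NaN first
```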
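The three calls above can be tried end to end on synthetic data. In this sketch `X`, `Y` and the `age` Series are made up (roughly every tenth age is set to NaN to mimic missing values), and the figure is saved to a hypothetical `drawings.png`. The `matplotlib.use("Agg")` line is only for running it as a plain script without a display; drop it in Jupyter so plots render inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up data standing in for X, Y and df_active_info['age']
X = np.arange(20)
Y = X ** 2
age = pd.Series(rng.integers(18, 60, size=200).astype(float), name="age")
age.iloc[::10] = np.nan  # simulate missing ages (20 of 200 values)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].scatter(X, Y)
axes[0].set_title("scatter")
axes[1].plot(X, Y)
axes[1].set_title("plot")
# hist returns the bin counts, the bin edges and the drawn patches
counts, bins, _ = axes[2].hist(age.dropna(), bins=10)
axes[2].set_title("hist of age")
fig.tight_layout()
fig.savefig("drawings.png")
```

Using `plt.subplots` keeps the three plots on separate axes; the bare `plt.scatter`/`plt.plot` calls in the notes would otherwise draw on the same current axes.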

4. statistics

```python
df_active_info.count()  # number of non-NaN values in each column
```
```
device_id 2000000
days 2000000
daynum 2000000
gender 1995726
age 1805973
device 2000000
city 2000000
is_vip 247046
topics 8689
dtype: int64
```
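`count()` reports what is present; for cleaning it is often more useful to see what is missing. In the output above, `is_vip` is non-NaN in only 247046 of 2000000 rows (about 12%). A minimal sketch on a small made-up frame (a stand-in for `df_active_info`) that puts non-null counts, missing counts and missing ratios side by side:

```python
import numpy as np
import pandas as pd

# Small made-up frame with some missing values (stand-in for df_active_info)
df = pd.DataFrame({
    "gender": [0.0, 1.0, np.nan, 0.0],
    "age": [23.0, np.nan, np.nan, 41.0],
    "is_vip": [np.nan, 1.0, np.nan, np.nan],
})

non_null = df.count()             # non-NaN values per column
missing = df.isna().sum()         # NaN values per column
missing_ratio = df.isna().mean()  # fraction of rows missing per column

print(pd.DataFrame({"non_null": non_null, "missing": missing, "missing_ratio": missing_ratio}))
```

For every column, `non_null + missing` equals the number of rows.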

```python
df_userinfo['gender'].value_counts()  # count how many times each value occurs
```
```
0.0 1360810
1.0 634916
Name: gender, dtype: int64
```
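By default `value_counts()` skips NaN, which is consistent with the output above: 1360810 + 634916 = 1995726, the same as the non-NaN count of `gender` in the `count()` output. A short sketch on a made-up Series showing two useful options, `dropna=False` (count NaN as its own value) and `normalize=True` (proportions instead of counts):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the gender column
gender = pd.Series([0.0, 1.0, 0.0, np.nan, 0.0], name="gender")

counts = gender.value_counts()                       # NaN excluded by default
counts_with_nan = gender.value_counts(dropna=False)  # NaN counted as its own value
shares = gender.value_counts(normalize=True)         # proportions of the non-NaN values

print(counts)
print(counts_with_nan)
print(shares)
```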

```python
ages = df_active_info['age'].dropna()  # drop NaN; dropna() returns a new Series, so assign the result
```
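Because `dropna()` returns a new object, calling it without assigning the result leaves the original column untouched. A small sketch on made-up ages showing that, plus two related options: filling NaN instead of dropping it, and dropping only the DataFrame rows where one specific column is missing (`subset=`):

```python
import numpy as np
import pandas as pd

# Made-up ages with gaps (stand-in for df_active_info['age'])
age = pd.Series([23.0, np.nan, 41.0, np.nan, 35.0], name="age")

clean = age.dropna()               # new Series without NaN; `age` is left unchanged
filled = age.fillna(age.median())  # alternative: fill NaN with the median (median skips NaN)

# On a DataFrame, drop only the rows where a given column is missing
df = pd.DataFrame({"age": age, "gender": [0.0, 1.0, np.nan, 0.0, 1.0]})
rows_with_age = df.dropna(subset=["age"])

print(len(age), len(clean), len(rows_with_age))  # 5 3 3
```

Whether to drop or fill depends on the analysis; dropping is simpler, filling keeps the row count.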

5. some snippets

```python
for i, days in enumerate(tqdm(df_deviceactive['days'])):  # iterate with a progress bar
    ...  # per-row processing goes here
```
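Where `tqdm` wraps the iterable matters: an `enumerate` object has no `len()`, so `tqdm(enumerate(...))` cannot show a total or an ETA unless `total=` is passed. This sketch, on a made-up stand-in for `df_deviceactive['days']`, shows both working forms and, for comparison, a vectorized pandas call that avoids the Python loop entirely:

```python
import pandas as pd
from tqdm import tqdm

# Made-up stand-in for df_deviceactive['days']
days_col = pd.Series([3, 1, 4, 1, 5], name="days")

# Form 1: wrap the Series, then enumerate; tqdm reads len() and shows progress
total = 0
for i, days in enumerate(tqdm(days_col, desc="rows")):
    total += days

# Form 2: keep enumerate inside tqdm, but pass the length explicitly
seen = []
for i, days in tqdm(enumerate(days_col), total=len(days_col)):
    seen.append(i)

# For simple column arithmetic, a vectorized call is usually much faster than a loop
vectorized_total = days_col.sum()

print(total, vectorized_total)  # 14 14
```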

Copyright © 2018-2022 Haojia Zhu