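Deploying a custom YOLOv8s model on the AXERA (爱芯元智) AX650N
This post shows you, step by step and from scratch, how to deploy your own YOLOv8s model (I use a handwritten-digit recognition model I trained myself). It covers the whole path: training the model, converting it into a form the Pulsar2 toolchain can quantize, and deploying it on the development board.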
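Training your own YOLOv8s model
Preparing a custom dataset
Your dataset does not have to use the structure below; it is simply the layout that worked for this test. VOC-style layouts are also common (VOC annotations must be converted to YOLO .txt labels before YOLOv8 can train on them), so pick whichever you prefer.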
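└─yolov8s_datasets: custom dataset
├─test
│ └─images   image files
│ └─labels   label files (one .txt per image)
├─train
│ └─images   image files
│ └─labels   label files (one .txt per image)
├─valid
│ └─images   image files
│ └─labels   label files (one .txt per image)
├─data.yaml   dataset paths and class names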
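Ultralytics does not read label paths from data.yaml; it derives each label path from the image path by replacing the last images directory with labels and the extension with .txt, which is why the label folder must be named labels. The sketch below mimics that convention so you can find images that have no label file before starting a long training run. label_path_for and missing_labels are hypothetical helpers written for this post, not Ultralytics' own code.

```python
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def label_path_for(image_path: str) -> Path:
    """Map .../<split>/images/<name>.<ext> to .../<split>/labels/<name>.txt."""
    parts = list(Path(image_path).parts)
    # Replace the LAST 'images' directory component with 'labels'
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")

def missing_labels(split_dir: str) -> list[str]:
    """Return the names of images in <split_dir>/images that have no label file."""
    missing = []
    for img in sorted((Path(split_dir) / "images").iterdir()):
        if img.suffix.lower() in IMG_EXTS and not label_path_for(str(img)).exists():
            missing.append(img.name)
    return missing
```

For example, missing_labels("yolov8s_datasets/train") returns an empty list when every training image has a matching file in train/labels.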
The data.yaml used in this post:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 10
names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
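Each label file contains one line per object, class_id x_center y_center width height, with the coordinates normalized to [0, 1] by the image width and height. class_id indexes into names, so with nc: 10 the valid ids are 0 to 9. A minimal checker for a single line, assuming that format (check_label_line is a hypothetical helper):

```python
def check_label_line(line: str, nc: int = 10) -> list[str]:
    """Return the problems found in one YOLO label line (empty list if it looks OK).

    Expected format: "<class_id> <x_center> <y_center> <width> <height>",
    coordinates normalized to [0, 1] by the image width/height.
    """
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]
    try:
        cls = int(fields[0])
    except ValueError:
        return [f"class id {fields[0]!r} is not an integer"]
    problems = []
    if not 0 <= cls < nc:
        problems.append(f"class id {cls} outside [0, {nc - 1}]")
    try:
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return problems + ["non-numeric coordinate"]
    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinates must be normalized to [0, 1]")
    return problems
```

Pixel coordinates (for example 320 240 50 60) are a common mistake when converting from other annotation tools, and this check flags them.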
Setting up the YOLOv8 training environment
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install -e .
Download the pretrained weights into the ultralytics directory:
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt
Quick test of the installation. The model path can be absolute, and source can also be an absolute path to an image:
yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'
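Training the model
- Train the model (Ultralytics officially supports two ways: CLI commands or the Python API)
I prefer the Python API for training and the CLI for testing, but that is a matter of taste.
Official YOLOv8 Python usage guide
Official YOLOv8 CLI usage guide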
cd ultralytics
touch my_train.py
Put the following into the .py file:
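from ultralytics import YOLO
# Start from the COCO-pretrained YOLOv8s weights downloaded earlier
model = YOLO('/root/ultralytics/yolov8s.pt')
# Fine-tune on the digit dataset: 80 epochs, batch 16, AMP off, validate during training, GPU 0
results = model.train(data='/root/data1/wxw/yolov8s_datasets/data.yaml', epochs=80, amp=False, batch=16, val=True, device=0)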
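Run python3 my_train.py in this directory. When training finishes, test the best weights (train17 is the output folder of this particular run; yours will differ):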
yolo predict model=/root/ultralytics/runs/detect/train17/weights/best.pt source='/root/ultralytics/ultralytics/assets/www.png' imgsz=640
Model deployment and on-board testing
Preparation
(1) Export the ONNX model (remember to add opset=11)
yolo task=detect mode=export model=/root/ultralytics/runs/detect/train17/weights/best.pt format=onnx opset=11
(2) Simplify the ONNX model with onnxsim
python3 -m onnxsim best.onnx yolov8s_number_sim.onnx
Terminal output:
Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add │ 9 │ 8 │
│ Concat │ 24 │ 19 │
│ Constant │ 153 │ 139 │
│ Conv │ 64 │ 64 │
│ Div │ 2 │ 1 │
│ Gather │ 4 │ 0 │
│ MaxPool │ 3 │ 3 │
│ Mul │ 60 │ 58 │
│ Reshape │ 5 │ 5 │
│ Resize │ 2 │ 2 │
│ Shape │ 4 │ 0 │
│ Sigmoid │ 58 │ 58 │
│ Slice │ 2 │ 2 │
│ Softmax │ 1 │ 1 │
│ Split │ 9 │ 9 │
│ Sub │ 2 │ 2 │
│ Transpose │ 2 │ 2 │
│ Unsqueeze │ 7 │ 0 │
│ Model Size │ 42.6MiB │ 42.6MiB │
└────────────┴────────────────┴──────────────────┘
(3) Extract a sub-model from the simplified ONNX model
touch zhuanhuan.py
Add the following, replacing the paths with your own model's:
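import onnx
# Simplified model produced in step (2)
input_path = "/root/ultralytics/runs/detect/train17/weights/yolov8s_number_sim.onnx"
output_path = "yolov8s_number_sim_sub.onnx"
# Keep the graph from the input tensor up to these two intermediate tensors;
# the names are specific to this export, so look yours up in a viewer such as Netron
input_names = ["images"]
output_names = ["400","433"]
onnx.utils.extract_model(input_path, output_path, input_names, output_names)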
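Cutting the graph at 400 and 433 drops the tail of the YOLOv8 head. The two kept tensors appear to be the raw box distributions and the class scores, so the box decoding that was cut off (YOLOv8's DFL step plus the distance-to-box conversion) has to be done in post-processing instead. The numpy sketch below shows that decoding under stated assumptions: reg_max = 16, strides 8/16/32 on a 640x640 input (80x80 + 40x40 + 20x20 = 8400 anchors), and box logits already reshaped to (num_anchors, 4, 16). make_anchors and dfl_decode are hypothetical helpers, not the ax-samples code.

```python
import numpy as np

def make_anchors(img_size=640, strides=(8, 16, 32)):
    """YOLOv8-style anchor centers (grid units, +0.5 offset) and per-anchor strides."""
    points, stride_list = [], []
    for s in strides:
        n = img_size // s
        # Row-major grid: index = y * n + x, matching YOLOv8's anchor order
        ys, xs = np.meshgrid(np.arange(n) + 0.5, np.arange(n) + 0.5, indexing="ij")
        points.append(np.stack([xs.ravel(), ys.ravel()], axis=1))
        stride_list.append(np.full(n * n, s, dtype=np.float32))
    return np.concatenate(points).astype(np.float32), np.concatenate(stride_list)

def dfl_decode(box_logits, anchors, strides, apply_softmax=True):
    """(N, 4, reg_max) box distributions -> (N, 4) boxes as x1, y1, x2, y2 in pixels."""
    x = box_logits.astype(np.float32)
    if apply_softmax:
        # Numerically stable softmax over the reg_max bins
        x = np.exp(x - x.max(axis=-1, keepdims=True))
        x /= x.sum(axis=-1, keepdims=True)
    bins = np.arange(x.shape[-1], dtype=np.float32)
    dist = (x * bins).sum(axis=-1)      # expected distance: left, top, right, bottom
    x1y1 = anchors - dist[:, :2]
    x2y2 = anchors + dist[:, 2:]
    return np.concatenate([x1y1, x2y2], axis=1) * strides[:, None]
```

Whether apply_softmax should be True depends on whether your cut point sits before or after the Softmax node in the exported graph; the decoded boxes then still need confidence filtering and NMS.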
The resulting sub-model is shown in the figure below.
Then organize the Pulsar2 working directory as follows:
└─data:
├─config
│ └─yolov8s_config_b1.json
├─dataset
│ └─calibration_data.tar   four images from the dataset
├─model
│ └─yolov8s_number_sim_sub.onnx
├─pulsar2-run-helper
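calibration_data.tar is an ordinary tar archive of sample images; as the build log shows, Pulsar2 extracts it to output/quant/dataset/images. A sketch for building it from your training images, assuming a flat layout (images at the archive root) and taking as many images as calibration_size in the config (4 here). make_calibration_tar is a hypothetical helper.

```python
import tarfile
from pathlib import Path

def make_calibration_tar(image_paths, out_path="dataset/calibration_data.tar", count=4):
    """Pack the first `count` images into an uncompressed tar, flat at the archive root."""
    chosen = list(image_paths)[:count]
    if len(chosen) < count:
        raise ValueError(f"need {count} images for calibration, got {len(chosen)}")
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(out, "w") as tar:
        for p in chosen:
            # arcname drops the source directories so every image sits at the root
            tar.add(str(p), arcname=Path(p).name)
    return out
```

Usage: make_calibration_tar(sorted(Path("yolov8s_datasets/train/images").glob("*.jpg"))). More calibration images generally give more representative activation ranges; if you add some, raise calibration_size to match.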
yolov8s_config_b1.json is configured as follows:
{
"model_type": "ONNX",
"npu_mode": "NPU1",
"quant": {
"input_configs": [
{
"tensor_name": "images",
"calibration_dataset": "./dataset/calibration_data.tar",
"calibration_size": 4,
"calibration_mean": [0, 0, 0],
"calibration_std": [255.0, 255.0, 255.0]
}
],
"calibration_method": "MinMax",
"precision_analysis": true,
"precision_analysis_method":"EndToEnd"
},
"input_processors": [
{
"tensor_name": "images",
"tensor_format": "BGR",
"src_format": "BGR",
"src_dtype": "U8",
"src_layout": "NHWC"
}
],
"output_processors": [
{
"tensor_name": "400",
"dst_perm": [0, 1, 3, 2]
},
{
"tensor_name": "433",
"dst_perm": [0, 2, 1]
}
],
"compiler": {
"check": 0
}
}
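With calibration_mean 0 and calibration_std 255, the normalization is (x - 0) / 255, which reproduces YOLOv8's own 0-1 input scaling. The pre_process entries in the build log confirm the order on the NPU: an AxNormalize with these values, then an AxTranspose with perm [0, 3, 1, 2], because src_layout is NHWC while the ONNX input is NCHW. A dst_perm on an output is likewise a plain axis permutation. The numpy sketch below is a reference re-implementation for checking results on a PC, not Pulsar2 code; the (1, 10, 8400) class-score shape in the check is an assumption for a 10-class, 640x640 model.

```python
import numpy as np

MEAN = np.array([0.0, 0.0, 0.0], dtype=np.float32)      # calibration_mean
STD = np.array([255.0, 255.0, 255.0], dtype=np.float32)  # calibration_std

def reference_preprocess(img_nhwc_u8):
    """uint8 NHWC image batch -> float32 NCHW tensor in [0, 1], as the ONNX model expects."""
    x = img_nhwc_u8.astype(np.float32)
    x = (x - MEAN) / STD                  # per-channel normalize along the last (C) axis
    return np.transpose(x, (0, 3, 1, 2))  # NHWC -> NCHW, same perm as pre_transpose_1

def apply_dst_perm(tensor, perm):
    """What an output_processors dst_perm does to an output tensor: permute its axes."""
    return np.transpose(tensor, perm)
```

Because src_format and tensor_format are both BGR, no color conversion is inserted (the first WARNING in the build log says the csc config is ignored for this reason), so the channel order you feed on the board is exactly the order the model sees.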
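Getting the axmodel
Enter the Docker environment (for how to set it up, see my post on deploying a custom YOLOv5 model) and copy the data folder into it.
Run the following commands: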
cd data/
pulsar2 build --input model/yolov8s_number_sim_sub.onnx --output_dir output --config config/yolov8s_config_b1.json
Terminal output:
root@1657ec5355e2:/data# pulsar2 build --input model/yolov8s_number_sim_sub.onnx --output_dir output --config config/yolov8s_config_b1.json
2023-11-24 17:00:31.661 | WARNING | yamain.command.build:fill_default:320 - ignore images csc config because of src_format is AutoColorSpace or src_format and tensor_format are the same
Building onnx ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2023-11-24 17:00:33.226 | INFO | yamain.command.build:build:444 - save optimized onnx to [output/frontend/optimized.onnx]
2023-11-24 17:00:33.229 | INFO | yamain.common.util:extract_archive:21 - extract [dataset/calibration_data.tar] to [output/quant/dataset/images]...
Quant Config Table
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Input ┃ Shape ┃ Dataset Directory ┃ Data Format ┃ Tensor Format ┃ Mean ┃ Std ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ images │ [1, 3, 640, 640] │ images │ Image │ BGR │ [0.0, 0.0, 0.0] │ [255.0, 255.0, 255.0] │
└────────┴──────────────────┴───────────────────┴─────────────┴───────────────┴─────────────────┴───────────────────────┘
Transformer optimize level: 0
4 File(s) Loaded.
[17:00:35] AX LSTM Operation Format Pass Running ... Finished.
[17:00:35] AX Set MixPrecision Pass Running ... Finished.
[17:00:35] AX Refine Operation Config Pass Running ... Finished.
[17:00:35] AX Reset Mul Config Pass Running ... Finished.
[17:00:35] AX Tanh Operation Format Pass Running ... Finished.
[17:00:35] AX Confused Op Refine Pass Running ... Finished.
[17:00:35] AX Quantization Fusion Pass Running ... Finished.
[17:00:35] AX Quantization Simplify Pass Running ... Finished.
[17:00:35] AX Parameter Quantization Pass Running ... Finished.
Calibration Progress(Phase 1): 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.54it/s]
Finished.
[17:00:38] AX Passive Parameter Quantization Running ... Finished.
[17:00:38] AX Parameter Baking Pass Running ... Finished.
[17:00:38] AX Refine Int Parameter Pass Running ... Finished.
[17:00:39] AX Refine Weight Parameter Pass Running ... Finished.
--------- Network Snapshot ---------
Num of Op: [166]
Num of Quantized Op: [166]
Num of Variable: [320]
Num of Quantized Var: [320]
------- Quantization Snapshot ------
Num of Quant Config: [521]
BAKED: [64]
OVERLAPPED: [230]
ACTIVATED: [147]
SOI: [17]
PASSIVE_BAKED: [63]
Network Quantization Finished.
quant.axmodel export success: output/quant/quant_axmodel.onnx
===>export per layer debug_data(float data) to folder: output/quant/debug/float
Writing npy... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
===>export input/output data to folder: output/quant/debug/test_data_set_0
Building native ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2023-11-24 17:00:43.829 | WARNING | yamain.command.load_model:pre_process:454 - preprocess tensor [images]
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - tensor: images, (1, 640, 640, 3), U8
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_dequant_1, AxDequantizeLinear, {'const_inputs': {'x_zeropoint': array(0, dtype=int32), 'x_scale': array(1., dtype=float32)}, 'output_dtype': <class 'numpy.float32'>, 'quant_method': 0}
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - tensor: tensor:pre_norm_1, (1, 640, 640, 3), FP32
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_norm_1, AxNormalize, {'dim': 3, 'mean': [0.0, 0.0, 0.0], 'std': [255.0, 255.0, 255.0]}
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - tensor: tensor:pre_transpose_1, (1, 640, 640, 3), FP32
2023-11-24 17:00:43.830 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
2023-11-24 17:00:43.831 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [400]
2023-11-24 17:00:43.831 | INFO | yamain.command.load_model:handle_postprocess:473 - op: op:post_transpose_1, AxTranspose
2023-11-24 17:00:43.831 | WARNING | yamain.command.load_model:post_process:475 - postprocess tensor [433]
2023-11-24 17:00:43.831 | INFO | yamain.command.load_model:handle_postprocess:473 - op: op:post_transpose_2, AxTranspose
tiling op... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 303/303 0:00:00
new_ddr_tensor = []
<frozen backend.ax650npu.oprimpl.normalize>:186: RuntimeWarning: divide by zero encountered in divide
<frozen backend.ax650npu.oprimpl.normalize>:187: RuntimeWarning: invalid value encountered in divide
build op... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1254/1254 0:00:06
add ddr swap... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2269/2269 0:00:00
calc input dependencies... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2659/2659 0:00:00
calc output dependencies... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2659/2659 0:00:00
assign eu heuristic ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2659/2659 0:00:00
assign eu onepass ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2659/2659 0:00:00
assign eu greedy ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2659/2659 0:00:00
2023-11-24 17:00:52.838 | INFO | yasched.test_onepass:results2model:2004 - max_cycle = 8,507,216
2023-11-24 17:00:53.860 | INFO | yamain.command.build:compile_npu_subgraph:1076 - QuantAxModel macs: 14,226,048,000
2023-11-24 17:00:53.862 | INFO | yamain.command.build:compile_npu_subgraph:1084 - use random data as gt input: images, uint8, (1, 640, 640, 3)
2023-11-24 17:00:58.726 | INFO | yamain.command.build:compile_ptq_model:1003 - fuse 1 subgraph(s)
root@1657ec5355e2:/data# ls
config dataset model output pulsar2-run-helper
root@1657ec5355e2:/data# cp -r output /mnt/
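After the axmodel has been generated successfully, you can append .onnx to its file name, as shown below:
Deploying to the development board
The board runs image version 1.27, and the samples are compiled natively on the board.
Download the source: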
git clone https://github.com/AXERA-TECH/ax-samples.git
In ax_yolov8s_steps.cc, change the class-name labels and the number of classes:
const char* CLASS_NAMES[] = {
"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
int NUM_CLASS = 10;
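NUM_CLASS should equal nc in data.yaml and the number of entries in CLASS_NAMES. To keep the C++ source and data.yaml from drifting apart, a small script can render both lines from the names list. Both helpers are hypothetical, and names_from_data_yaml only handles a single-line names: [...] entry like the one in this post's data.yaml (a multi-line YAML list would need a real YAML parser).

```python
import ast

def names_from_data_yaml(text: str) -> list[str]:
    """Extract a single-line `names: [...]` list from data.yaml text."""
    for line in text.splitlines():
        if line.strip().startswith("names:"):
            return ast.literal_eval(line.split(":", 1)[1].strip())
    raise ValueError("no single-line 'names:' entry found")

def cpp_class_block(names: list[str]) -> str:
    """Render the CLASS_NAMES / NUM_CLASS lines to paste into the .cc file."""
    quoted = ", ".join(f'"{n}"' for n in names)  # no escaping: assumes plain names
    return (
        "const char* CLASS_NAMES[] = {\n"
        f"    {quoted}}};\n"
        f"int NUM_CLASS = {len(names)};"
    )
```

Running print(cpp_class_block(names_from_data_yaml(open("data.yaml").read()))) prints the two declarations shown above for the ten digit classes.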
cd ax-samples
mkdir build && cd build
cmake -DBSP_MSP_DIR=/soc/ -DAXERA_TARGET_CHIP=ax650 ..
make -j6
make install
After compilation, the example executables are placed under ax-samples/build/install/ax650/:
ax-samples/build$ tree install
install
└── ax650
├── ax_classification
├── ax_detr
├── ax_dinov2
├── ax_glpdepth
├── ax_hrnet
├── ax_imgproc
├── ax_pfld
├── ax_pp_humanseg
├── ax_pp_liteseg_stdc2_cityscapes
├── ax_pp_ocr_rec
├── ax_pp_person_attribute
├── ax_pp_vehicle_attribute
├── ax_ppyoloe
├── ax_ppyoloe_obj365
├── ax_realesrgan
├── ax_rtmdet
├── ax_scrfd
├── ax_segformer
├── ax_simcc_pose
├── ax_yolo_nas
├── ax_yolov5_face
├── ax_yolov5s
├── ax_yolov5s_seg
├── ax_yolov6
├── ax_yolov7
├── ax_yolov7_tiny_face
├── ax_yolov8
├── ax_yolov8_pose
└── ax_yolox
Copy the axmodel and a test image into the same directory as the executable, then run:
root@maixbox:/home/ax-samples/build/install/ax650# ./ax_yolov8 -m yolov8snumber.axmodel -i 1.jpg
model file : yolov8snumber.axmodel
image file : 1.jpg
img_h, img_w : 640 640
WARN,Func(__is_valid_file),NOT find file = '/etc/ax_syslog.conf'
ERROR,Func(__syslog_parma_cfg_get), NOT find = '/etc/ax_syslog.conf'
Engine creating handle is done.
Engine creating context is done.
Engine get io info is done.
Engine alloc io is done.
Engine push input is done.
post process cost time:0.49 ms
Repeat 1 times, avg time 10.92 ms, max_time 10.92 ms, min_time 10.92 ms
detection num: 4
2: 94%, [ 275, 38, 362, 168], 2
3: 94%, [ 58, 47, 145, 175], 3
1: 92%, [ 75, 250, 140, 378], 1
1: 90%, [ 288, 249, 336, 378], 1
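Each detection line appears to follow the pattern class_id: score%, [x0, y0, x1, y1], class_name, with the box corners in pixels of the input image. If you want to collect results from the board in a script, a regex-based parser for this output format could look like the sketch below; Detection and parse_detections are hypothetical names, and the regex only matches lines in exactly this layout.

```python
import re
from typing import NamedTuple

class Detection(NamedTuple):
    class_id: int
    score: float   # 0..1, converted from the printed percentage
    box: tuple     # (x0, y0, x1, y1) in image pixels
    name: str

LINE_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)%,\s*\[\s*(-?\d+),\s*(-?\d+),\s*(-?\d+),\s*(-?\d+)\],\s*(\S+)\s*$"
)

def parse_detections(log_text: str) -> list[Detection]:
    """Pick the per-object result lines out of the ax_yolov8 console output."""
    dets = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            cid, pct, x0, y0, x1, y1, name = m.groups()
            box = (int(x0), int(y0), int(x1), int(y1))
            dets.append(Detection(int(cid), int(pct) / 100, box, name))
    return dets
```

Other log lines such as "detection num: 4" or the Engine status messages do not match the pattern and are skipped.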