Version: 1.5.0

4. TacoAI Applications

First, install the tps-test package to download the test resources:

sudo apt update
sudo apt-get install tps-test

The test resources are located under /usr/data:

root@taco-dk:~# ls /usr/data/
automation  ffmpeg    ids        otp               taco-pipeline-1.0.0  venc
deb         fio       libcv      performance_test  tps_pipeline
distro      gmac      multicore  sound             usb
dvfs        gpiozero  npu        suspend           vdec

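4.1 Encoding

The video encoding test command is taco_venc_simple_example, which encodes raw YUV data into an H.264 stream. The test resource is /usr/data/venc/1920x1080_420p_10f.yuv (10 frames of 1080p YUV420p data). Usage:

taco_venc_simple_example encode <threads> <cycles> <loops> <frames> <save>

Parameters:

encode: selects the encode operation
threads: number of encoding threads
cycles: number of encoder cycles each thread runs
loops: number of loops in each cycle
frames: number of frames encoded in each loop
save: whether to write the encoded stream to disk (the example below passes 1, which the log reports as "Save file: yes")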
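As a quick sanity check on the parameters above, the size of the raw test clip and the per-thread frame count can be estimated ahead of a run. The sketch below assumes 8-bit YUV420p samples and assumes that the total frame count is cycles x loops x frames (this matches the 1 x 1 x 10 = 10 reported in the sample log, but the tool's documentation does not state it explicitly). Both function names are hypothetical helpers, not part of the SDK.

```python
# Sketch only: helper names are illustrative and not part of the TACO SDK.

def yuv420p_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit YUV420p frame: a full-size Y plane plus two
    chroma planes subsampled by 2 in each direction."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


def total_frames_per_thread(cycles: int, loops: int, frames: int) -> int:
    """Assumed relation between the three counters (not confirmed by the tool)."""
    return cycles * loops * frames


frame_bytes = yuv420p_frame_bytes(1920, 1080)
print("bytes per 1080p frame:", frame_bytes)
print("bytes for a 10-frame clip:", frame_bytes * 10)
print("frames per thread for '1 1 10':", total_frames_per_thread(1, 1, 10))
```

If the file size of 1920x1080_420p_10f.yuv differs from the computed value, the clip is probably not 8-bit YUV420p and the assumption above does not hold.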

Example command:

root@taco-dk:~# taco_venc_simple_example encode 4 1 1 10 1
NOTE: Using useExternal=1 (direct buffer access) for encoding test
TACO VENC Multi-threaded Test Configuration:
  Threads: 4
  Encoder cycles: 1
  Loops per encoder: 1
  Frames per loop: 10
  Save file: yes
  Total frames per thread: 10
Thread 0 - Cycle 1 - Loop 1: FPS: 60.83, Frames: 10, Streams: 10
Thread 0: Completed
Thread 3 - Cycle 1 - Loop 1: FPS: 60.10, Frames: 10, Streams: 10
Thread 1 - Cycle 1 - Loop 1: FPS: 60.67, Frames: 10, Streams: 10
Thread 1: Completed
Thread 3: Completed
Thread 2 - Cycle 1 - Loop 1: FPS: 57.51, Frames: 10, Streams: 10
Thread 2: Completed
All threads completed

The output files are named output_thread_x.h264, where x is the thread index:

root@taco-dk:~# ls
a.out    da_pec_u0             output_thread_2.h264    test
core     output_thread_0.h264  output_thread_3.h264    test.c
da_gc_u  output_thread_1.h264  regress97_riscv.tar.gz

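Log output explained:

Encoding configuration: shows the multi-threaded encoding settings, including the thread count, encoder cycles, loops, and frame count.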

TACO VENC Multi-threaded Test Configuration:
  Threads: 4
  Encoder cycles: 1
  Loops per encoder: 1
  Frames per loop: 10
  Save file: yes
  Total frames per thread: 10

Encoding performance: shows the encoding frame rate (FPS) and the number of frames processed by each thread in each cycle and loop.

Thread 0 - Cycle 1 - Loop 1: FPS: 60.83, Frames: 10, Streams: 10
Thread 3 - Cycle 1 - Loop 1: FPS: 60.10, Frames: 10, Streams: 10
Thread 1 - Cycle 1 - Loop 1: FPS: 60.67, Frames: 10, Streams: 10
Thread 2 - Cycle 1 - Loop 1: FPS: 57.51, Frames: 10, Streams: 10

Encoding complete: the message printed once all threads have finished encoding.

All threads completed
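When several threads are used, it can be handy to collect the per-thread FPS lines into one summary. The following is a minimal sketch that parses the line format shown in the sample log with a regular expression; the function name is hypothetical, and treating the sum of per-thread FPS as aggregate throughput assumes the threads actually ran concurrently.

```python
import re
from statistics import mean

# Matches lines such as:
# "Thread 0 - Cycle 1 - Loop 1: FPS: 60.83, Frames: 10, Streams: 10"
LINE_RE = re.compile(
    r"Thread (\d+) - Cycle (\d+) - Loop (\d+): "
    r"FPS: ([\d.]+), Frames: (\d+), Streams: (\d+)"
)


def parse_venc_log(text: str) -> list:
    """Return one dict per performance line found in the log text."""
    rows = []
    for m in LINE_RE.finditer(text):
        thread, cycle, loop, fps, frames, streams = m.groups()
        rows.append({
            "thread": int(thread),
            "cycle": int(cycle),
            "loop": int(loop),
            "fps": float(fps),
            "frames": int(frames),
            "streams": int(streams),
        })
    return rows


SAMPLE_LOG = """\
Thread 0 - Cycle 1 - Loop 1: FPS: 60.83, Frames: 10, Streams: 10
Thread 0: Completed
Thread 3 - Cycle 1 - Loop 1: FPS: 60.10, Frames: 10, Streams: 10
Thread 1 - Cycle 1 - Loop 1: FPS: 60.67, Frames: 10, Streams: 10
Thread 2 - Cycle 1 - Loop 1: FPS: 57.51, Frames: 10, Streams: 10
All threads completed
"""

rows = parse_venc_log(SAMPLE_LOG)
fps_values = [r["fps"] for r in rows]
print(f"threads reporting:     {len(rows)}")
print(f"mean FPS per thread:   {mean(fps_values):.2f}")
print(f"sum of per-thread FPS: {sum(fps_values):.2f}")
```

In practice the log text would come from the captured standard output of taco_venc_simple_example rather than an embedded string.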

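4.2 Decoding

The video decoding test command is ppvdec, which decodes H.264, H.265, or JPEG data into YUV or RGB data. The test resources are located under /usr/data/vdec. Usage:

ppvdec decode_mode <threads> <loops> <cycles> <frames> <save> [codec] [resolution] [pp_mode]

Parameters:

threads: number of decoding threads (1-32)
loops: number of outer loops
cycles: number of decode cycles per thread
frames: number of frames to decode in each loop
save: 1 = save the YUV file, 0 = do not save
codec: 0 = H.264, 1 = H.265 (optional, default 0)
resolution: 0 = 1080p, 1 = 4K (optional, default 0)
pp_mode: 0 = PP0 only, 1 = PP1 only, 2 = dual channel (optional, default 0)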
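Because ppvdec takes its parameters positionally, it is easy to swap two of them by accident. The sketch below assembles the argument list in the order documented above and rejects values outside the documented ranges. The function name and the lower bound of 1 for loops, cycles, and frames are assumptions made for illustration; only the 1-32 thread range and the enumerated option values come from the parameter list.

```python
# Sketch only: build_ppvdec_args is a hypothetical helper, not an SDK API.

def build_ppvdec_args(threads, loops, cycles, frames, save,
                      codec=0, resolution=0, pp_mode=0):
    """Return the ppvdec argument list in the documented positional order."""
    if not 1 <= threads <= 32:
        raise ValueError("threads must be in 1-32")
    for name, value in (("loops", loops), ("cycles", cycles), ("frames", frames)):
        if value < 1:  # assumed lower bound
            raise ValueError(f"{name} must be >= 1")
    if codec not in (0, 1):        # 0 = H.264, 1 = H.265
        raise ValueError("codec must be 0 or 1")
    if resolution not in (0, 1):   # 0 = 1080p, 1 = 4K
        raise ValueError("resolution must be 0 or 1")
    if pp_mode not in (0, 1, 2):   # 0 = PP0, 1 = PP1, 2 = dual channel
        raise ValueError("pp_mode must be 0, 1, or 2")
    values = [threads, loops, cycles, frames, int(bool(save)),
              codec, resolution, pp_mode]
    return ["ppvdec", "decode_mode"] + [str(v) for v in values]


args = build_ppvdec_args(threads=1, loops=1, cycles=1, frames=10, save=True)
print(" ".join(args))
```

On the board, the returned list could be passed to subprocess.run(args, check=True) to launch the test.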

Example command:

root@taco-dk:~# ppvdec decode_mode 1 1 1 10 1 0 0
Signal handlers installed (SIGINT, SIGTERM)
=== [ppvdec running from test_taco_vdec.c] ===
=== Legacy Mode Configuration ===
  Codec: H.265
  Resolution: 1080p
  PP Channel: Channel0 only
  Save YUV: Enabled
  Stream Files Used:
    H.264 1080p: /usr/data/vdec/stream_1920x1080.h264
    H.264 4K:    /usr/data/vdec/stream_3840x2160.h264
    H.265 1080p: /usr/data/vdec/stream_1920x1080.h265
    H.265 4K:    /usr/data/vdec/stream_3840x2160.h265
Starting single decode thread (no pthread)...
Thread 1 - PP Channel Configuration: mode=0, ch0=enabled, ch1=disabled
Thread 1 - Parsing stream file: /usr/data/vdec/stream_1920x1080.h265
Parsing H.265/HEVC format
Parsed 1000 frames from video file
Thread 1 Loop 1: FPS: 9.42
Thread 1: 1 frames, FPS: 9.23
PASS

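Log output explained (the excerpts below come from a run on the H.264 stream, so the stream file and FPS values differ from the H.265 sample run above):

Decoder thread configuration: shows each thread's configuration, including the PP mode and channel status.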

Thread 1 - PP Channel Configuration: mode=0, ch0=enabled, ch1=disabled
Thread 1 - Parsing stream file: /usr/data/vdec/stream_1920x1080.h264

Frame parsing: shows the number of frames parsed from the video file.

Parsed 1000 frames from video file

Decoding performance: shows the decoding frame rate (FPS) of each thread.

Thread 1 Loop 1: FPS: 44.44
Thread 1: 10 frames, FPS: 43.90

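4.3 Running Models on the NPU

This sample is based on the official YOLO11 model, optimized and adapted to run efficiently on the EM20-DK hardware platform, and performs real-time detection of 80 common object classes. The precompiled model shipped with this sample is a UINT8-quantized model (.nb format), intended to balance detection accuracy and inference speed. The project code loads and executes the .nb model through taRuntime and uses OpenCV for image preprocessing and postprocessing.

4.3.1 Preparing the Test Environment

Prepare a PC as the host machine, with Ubuntu and a Python environment installed. The EM20-DK platform serves as the device and comes with Ubuntu and the SDK preinstalled.

Visit the official Model Zoo repository on Gitee or GitHub and download the officially provided algorithm samples.

Taking the YOLO11 model as an example, run the download.sh script in samples/YOLO11/scripts/ to fetch the models, data, and scripts the sample needs.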

chmod +x download.sh && ./download.sh

Downloaded contents:

models/
├── datasets.txt
├── yolo11s_float16.nb
├── yolo11s.onnx
├── yolo11s_int8.nb
├── yolo11s_config_fp16.json
└── yolo11s_config_int8.json
test_images/  # test images
├── input1.jpg
├── input2.jpg
├── input3.jpg
├── input4.jpg
└── input5.jpg
datasets/
├── val2017_1000  # 1000 samples randomly drawn from COCO val2017
└── instances_val2017_1000.json # annotations for the 1000 samples drawn from COCO val2017

Set up a cross-compilation environment with the TACO SDK, then build the executable yolo11s_det_soc with the cross-compilation toolchain:

cd cpp
mkdir build && cd build
cmake ..
make 

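On the EM20-DK board, create a model directory named yolo11 and use scp to copy the data from the host into it. After copying, the yolo11 directory looks like this:

yolo11
├── test_images    # test images
│   └── input1.jpg
├── models
│   ├── yolo11s_int8.nb   # .nb models
│   └── yolo11s_float16.nb
└── yolo11s_det_soc # sample executable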

4.3.2 Single-Image Inference

YOLO11s UINT8 Model

Run the UINT8 model from the yolo11 directory:

root@taco-dk:~/yolo11# ./yolo11s_det_soc --input=test_images/input1.jpg --model=models/yolo11s_int8.nb
--------------------------------------
Single Image Inference Mode
Model: models/yolo11s_int8.nb
Input: test_images/input1.jpg
Output: output.jpg
Conf thresh: 0.25
NMS thresh: 0.45
--------------------------------------
Input num: 1, Output num: 3
--------------------------------------------------------------
    Tensor Attribute index:      |  0
    dim_count:                   |  4
    dim_size:                    |  [640, 640, 3, 1]
    data_format:                 |  3
    quant_format:                |  2
    quant_data (dfp):            
        fixed_point_pos:         |  998277230
    quant_data (affine):
        tf_scale:                |  0.003922
        tf_zero_point:           |  -128
    name:                        |  uid_30000_out_0
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  0
    dim_count:                   |  3
    dim_size:                    |  [6400, 144, 1]
    data_format:                 |  3
    quant_format:                |  2
    quant_data (dfp):            
        fixed_point_pos:         |  1044353412
    quant_data (affine):
        tf_scale:                |  0.187079
        tf_zero_point:           |  23
    name:                        |  uid_30001_out_0
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  1
    dim_count:                   |  3
    dim_size:                    |  [1600, 144, 1]
    data_format:                 |  3
    quant_format:                |  2
    quant_data (dfp):            
        fixed_point_pos:         |  1048615530
    quant_data (affine):
        tf_scale:                |  0.251178
        tf_zero_point:           |  66
    name:                        |  uid_30002_out_0
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  2
    dim_count:                   |  3
    dim_size:                    |  [400, 144, 1]
    data_format:                 |  3
    quant_format:                |  2
    quant_data (dfp):            
        fixed_point_pos:         |  1046139268
    quant_data (affine):
        tf_scale:                |  0.213690
        tf_zero_point:           |  70
    name:                        |  uid_30003_out_0
--------------------------------------------------------------
Model initialized successfully
[INFO] Using : DMA-BUF Heaps
[TACO] INFO: taco system initialized successfully
--------------------------------------
Detected 16 objects

===== Time Statistics =====
Image read time:    113.72 ms
Preprocess time:    89.49 ms
Inference time:     15.48 ms
Postprocess time:   57.66 ms
Total time:         276.36 ms
============================
Output saved to: output.jpg
--------------------------------------
Model deinitialized
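The tf_scale and tf_zero_point values printed for each tensor in the log above describe an affine quantization. The sketch below shows how such parameters are commonly applied, assuming the usual convention real = scale x (q - zero_point); the log does not confirm that taRuntime uses exactly this convention, and the signed 8-bit range is an assumption suggested by the input tensor's zero point of -128. Both function names are hypothetical.

```python
# Sketch only: assumes real = scale * (q - zero_point) with a signed 8-bit range.

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to a quantized integer, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized integer back to an approximate real value."""
    return scale * (q - zero_point)


# Input tensor parameters from the log: tf_scale 0.003922 (about 1/255), zero point -128.
IN_SCALE, IN_ZP = 0.003922, -128
print(quantize(0.0, IN_SCALE, IN_ZP))   # darkest normalized pixel
print(quantize(1.0, IN_SCALE, IN_ZP))   # brightest normalized pixel

# First output tensor parameters from the log: tf_scale 0.187079, zero point 23.
OUT_SCALE, OUT_ZP = 0.187079, 23
print(dequantize(23, OUT_SCALE, OUT_ZP))
print(dequantize(127, OUT_SCALE, OUT_ZP))
```

Under these assumptions, a pixel normalized to [0, 1] spans the full signed 8-bit range on input, and each raw output value has to be dequantized with its own tensor's scale and zero point before box decoding.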

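Log output explained:

Model: models/yolo11s_int8.nb
Input image: test_images/input1.jpg (input1.jpg can be replaced with any custom JPEG file)
Output image: output.jpg
Input size: 640x640 pixels (after preprocessing)
Timing breakdown:
- Image read + decode: 113.72 ms (41.15%)
- Preprocessing: 89.49 ms (32.38%)
- NPU inference: 15.48 ms (5.60%)
- Postprocessing: 57.66 ms (20.86%)
- Total: 276.36 ms
Detections: 16 objects with confidence ≥ 0.25
Inference FPS: ≈ 64.6 fps (1000/15.48)
End-to-end FPS: ≈ 3.6 fps (1000/276.36)
Status: PASS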
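The percentages and FPS figures in this breakdown follow directly from the "Time Statistics" block of the log. A minimal sketch of the arithmetic, using the UINT8 timings and hypothetical helper names:

```python
# Sketch only: reproduces the share and FPS arithmetic from the logged timings.

def stage_shares(stages_ms: dict, total_ms: float) -> dict:
    """Percentage of the total time spent in each stage."""
    return {name: 100.0 * ms / total_ms for name, ms in stages_ms.items()}


def fps_from_ms(ms: float) -> float:
    """Frames per second if each frame takes `ms` milliseconds."""
    return 1000.0 / ms


uint8_stages = {
    "image read": 113.72,
    "preprocess": 89.49,
    "inference": 15.48,
    "postprocess": 57.66,
}
uint8_total = 276.36  # "Total time" as reported in the log

shares = stage_shares(uint8_stages, uint8_total)
for name, pct in shares.items():
    print(f"{name:<12} {uint8_stages[name]:>7.2f} ms  {pct:5.2f}%")
print(f"inference FPS:  {fps_from_ms(uint8_stages['inference']):.1f}")
print(f"end-to-end FPS: {fps_from_ms(uint8_total):.1f}")
```

The same two functions apply unchanged to the FP16 timings; only the dictionary values and the total differ.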

Generated result

YOLO11s FP16 Model

Run the FP16 model from the yolo11 directory:

root@taco-dk:~/yolo11# ./yolo11s_det_soc --input=test_images/input1.jpg --model=models/yolo11s_float16.nb
--------------------------------------
Single Image Inference Mode
Model: models/yolo11s_float16.nb
Input: test_images/input1.jpg
Output: output.jpg
Conf thresh: 0.25
NMS thresh: 0.45
--------------------------------------
Input num: 1, Output num: 3
--------------------------------------------------------------
    Tensor Attribute index:      |  0
    dim_count:                   |  4
    dim_size:                    |  [640, 640, 3, 1]
    data_format:                 |  1
    quant_format:                |  0
    quant_data (dfp):            
        fixed_point_pos:         |  0
    quant_data (affine):
        tf_scale:                |  0.000000
        tf_zero_point:           |  0
    name:                        |  input/output[0]
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  0
    dim_count:                   |  3
    dim_size:                    |  [6400, 144, 1]
    data_format:                 |  1
    quant_format:                |  0
    quant_data (dfp):            
        fixed_point_pos:         |  0
    quant_data (affine):
        tf_scale:                |  0.000000
        tf_zero_point:           |  0
    name:                        |  uid_5_out_0
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  1
    dim_count:                   |  3
    dim_size:                    |  [1600, 144, 1]
    data_format:                 |  1
    quant_format:                |  0
    quant_data (dfp):            
        fixed_point_pos:         |  0
    quant_data (affine):
        tf_scale:                |  0.000000
        tf_zero_point:           |  0
    name:                        |  uid_4_out_0
--------------------------------------------------------------
--------------------------------------------------------------
    Tensor Attribute index:      |  2
    dim_count:                   |  3
    dim_size:                    |  [400, 144, 1]
    data_format:                 |  1
    quant_format:                |  0
    quant_data (dfp):            
        fixed_point_pos:         |  0
    quant_data (affine):
        tf_scale:                |  0.000000
        tf_zero_point:           |  0
    name:                        |  uid_3_out_0
--------------------------------------------------------------
Model initialized successfully
[INFO] Using : DMA-BUF Heaps
[TACO] INFO: taco system initialized successfully
--------------------------------------
Detected 16 objects

===== Time Statistics =====
Image read time:    90.98 ms
Preprocess time:    44.19 ms
Inference time:     24.61 ms
Postprocess time:   51.13 ms
Total time:         210.91 ms
============================
Output saved to: output.jpg
--------------------------------------
Model deinitialized
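The three output tensors in the logs, [6400, 144, 1], [1600, 144, 1], and [400, 144, 1], are consistent with a three-scale YOLO detection head on a 640x640 input. The sketch below reproduces those shapes under the assumption of strides 8, 16, and 32, 80 classes, and a box-regression width of 16 bins per side (reg_max = 16); these head parameters are the usual YOLO11 defaults and are not stated in the logs themselves.

```python
# Sketch only: head parameters are assumed defaults, not read from the .nb model.

def head_shapes(input_size: int = 640, strides=(8, 16, 32),
                num_classes: int = 80, reg_max: int = 16) -> list:
    """Return (grid cells, channels) per detection scale."""
    channels = 4 * reg_max + num_classes  # 4 box sides x reg_max bins + class scores
    return [((input_size // s) ** 2, channels) for s in strides]


shapes = head_shapes()
for (cells, channels), stride in zip(shapes, (8, 16, 32)):
    print(f"stride {stride:>2}: {cells} cells x {channels} channels")
print("total candidate boxes:", sum(cells for cells, _ in shapes))
```

Under these assumptions, postprocessing has to decode and filter 8400 candidate boxes per image before non-maximum suppression.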

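Log output explained:

Model: models/yolo11s_float16.nb
Input image: test_images/input1.jpg (input1.jpg can be replaced with any custom JPEG file)
Output image: output.jpg
Input size: 640x640 pixels (after preprocessing and resizing)
Timing breakdown:
- Image read + decode: 90.98 ms (43.14%)
- Preprocessing: 44.19 ms (20.95%)
- NPU inference: 24.61 ms (11.67%)
- Postprocessing: 51.13 ms (24.24%)
- Total: 210.91 ms
Detections: 16 objects with confidence ≥ 0.25
Inference FPS: ≈ 40.6 fps (1000/24.61)
End-to-end FPS: ≈ 4.7 fps (1000/210.91)
Status: PASS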

Generated result

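4.3.3 Model Performance Evaluation

Based on the single-image inference results above, the two models compare as follows:

Performance comparison table

Model	Input image	Output image	Read + decode (ms)	Preprocess (ms)	NPU inference (ms)	Postprocess (ms)	Total (ms)	Inference FPS	End-to-end FPS
YOLO11s UINT8	input1.jpg	output.jpg	113.72	89.49	15.48	57.66	276.36	64.6	3.6
YOLO11s FP16	input1.jpg	output.jpg	90.98	44.19	24.61	51.13	210.91	40.6	4.7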

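Summary

INT8 NPU inference takes only 15.5 ms, about 9 ms less than FP16 (24.6 ms); FP16 inference takes ≈ 59% longer.
End to end, the FP16 run was ≈ 24% shorter in this measurement (210.91 ms vs. 276.36 ms); the gap comes from image reading and preprocessing, not from NPU compute.
Image reading, preprocessing, and postprocessing account for ≈ 88% of total time with FP16 and ≈ 94% with INT8 → further optimization should focus on pipeline acceleration (DMA-BUF, dual threads, lower input resolution).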
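The ratios quoted in this summary can be recomputed from the two logged runs. The following sketch uses only the inference and total times from the logs; the dictionary layout and variable names are illustrative.

```python
# Sketch only: recomputes the comparison figures from the two logged runs.

runs = {
    "YOLO11s UINT8": {"inference_ms": 15.48, "total_ms": 276.36},
    "YOLO11s FP16": {"inference_ms": 24.61, "total_ms": 210.91},
}

summary = {}
for name, t in runs.items():
    summary[name] = {
        "inference_fps": 1000.0 / t["inference_ms"],
        "end_to_end_fps": 1000.0 / t["total_ms"],
        # Share of the total spent outside the NPU (read, pre- and postprocessing).
        "non_inference_pct": 100.0 * (t["total_ms"] - t["inference_ms"]) / t["total_ms"],
    }

int8, fp16 = runs["YOLO11s UINT8"], runs["YOLO11s FP16"]
# How much longer FP16 inference takes relative to INT8, in percent.
fp16_extra_inference_pct = 100.0 * (fp16["inference_ms"] / int8["inference_ms"] - 1.0)
# How much shorter the FP16 run was end to end, in percent of the INT8 total.
fp16_total_saving_pct = 100.0 * (1.0 - fp16["total_ms"] / int8["total_ms"])

for name, s in summary.items():
    print(f"{name}: {s['inference_fps']:.1f} inference fps, "
          f"{s['end_to_end_fps']:.1f} end-to-end fps, "
          f"{s['non_inference_pct']:.1f}% outside the NPU")
print(f"FP16 inference takes {fp16_extra_inference_pct:.0f}% longer than INT8")
print(f"FP16 total time is {fp16_total_saving_pct:.0f}% shorter than INT8")
```

Since each figure comes from a single run on one image, repeating the measurement over several images would give a more reliable comparison, particularly for the image read and preprocessing stages.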
