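In the previous post, we covered how to load and test our saved model with C++ on a PC. This post covers cross-compiling the TensorFlow source for the aarch64 platform on a PC, which turned out to be far harder than I expected (it took more than 10 days, with plenty of sweat and tears). First, here is what the official documentation says:
Here I chose 2.12.0, the last version that TensorFlow officially tested with GCC. Next, the PC's configuration:
Then the versions of python3, bazel, and the cross compiler: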
▸ python3 --version
Python 3.11.9
▸ bazel-5.3.0-linux-x86_64 --version
bazel 5.3.0
▸ aarch64-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-gcc
COLLECT_LTO_WRAPPER=/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/../libexec/gcc/aarch64-none-linux-gnu/13.3.1/lto-wrapper
Target: aarch64-none-linux-gnu
Configured with: /data/jenkins/workspace/GNU-toolchain/arm-13/src/gcc/configure --target=aarch64-none-linux-gnu --prefix= --with-sysroot=/aarch64-none-linux-gnu/libc --with-build-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-13/build-aarch64-none-linux-gnu/install//aarch64-none-linux-gnu/libc --with-bugurl=https://bugs.linaro.org/ --enable-gnu-indirect-function --enable-shared --disable-libssp --disable-libmudflap --enable-checking=release --enable-languages=c,c++,fortran --with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-13/build-aarch64-none-linux-gnu/host-tools --with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-13/build-aarch64-none-linux-gnu/host-tools --with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-13/build-aarch64-none-linux-gnu/host-tools --with-isl=/data/jenkins/workspace/GNU-toolchain/arm-13/build-aarch64-none-linux-gnu/host-tools --enable-fix-cortex-a53-843419 --with-pkgversion='Arm GNU Toolchain 13.3.Rel1 (Build arm-13.24)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.3.1 20240614 (Arm GNU Toolchain 13.3.Rel1 (Build arm-13.24))
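Note in particular that the version of the cross-compilation toolchain also has to be chosen carefully, otherwise you run into baffling problems (even this version still has one, in TensorFlow's logging.h, sob).
With the basics out of the way, let's get to the main content.
The goal of cross-compiling is to produce the libraries and headers for the target platform. Why cross-compile? Because the PC is far more powerful (even on this PC, a build where everything goes smoothly takes 3+ hours, so you can imagine how long a native build on the device would take). First, note that TensorFlow is built with Bazel. So how do you select your own cross-compilation toolchain in a Bazel environment? Unlike a Makefile or CMake, where you can simply point at the compiler, Bazel is somewhat like GN in that a lot has to be configured, and personally I find it even more complicated than GN. It makes me want to cry.
At first I took a lot of detours, searching the web at random. The articles I found varied wildly in quality, so I wasted a lot of time and even began to doubt my own ability. In the end I went back to the official documentation and built the cross-compilation toolchain configuration files bit by bit, filling in the gaps as I went. First, here is the link to the official guide: [Bazel Tutorial: Configure C++ Toolchains](Configuring C++ toolchains - Bazel 5.3.0).
In the end, two files related to the cross-compilation toolchain were created:
Here is a brief introduction to some basic Bazel concepts: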
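@myrepo//my/app/main:app_binary
In this label, the first part, @myrepo, says that the repository is named myrepo; if you are already inside that repository, it can be shortened to //. The second part, my/app/main, is the package name; if you are already inside the @myrepo//my/app/main package, the label can be shortened to :app_binary. Going further, a file target can be abbreviated to just app_binary, whereas a rule target should keep the leading : symbol.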
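The shorthand rules above can be sketched as a small function that expands a label back into its full form. This is only an illustration of the naming convention, not Bazel's own parser; the function name `expand_label` and its parameters are made up for this example. It also covers one rule not mentioned above: a label with no `:target` part refers to the target named after the last package component.

```python
def expand_label(label: str, current_repo: str, current_package: str) -> str:
    """Expand a shorthand Bazel label into the full @repo//package:target form.

    Hypothetical helper for illustration only; Bazel's real parser handles
    more cases (for example a bare "@repo" label).
    """
    if label.startswith("@"):
        # Fully qualified: "@myrepo//my/app/main:app_binary"
        repo, _, rest = label[1:].partition("//")
    elif label.startswith("//"):
        # Same repository: "//my/app/main:app_binary"
        repo, rest = current_repo, label[2:]
    else:
        # Same package: ":app_binary" (rule style) or "app_binary" (file style)
        return f"@{current_repo}//{current_package}:{label.lstrip(':')}"

    package, sep, target = rest.partition(":")
    if not sep:
        # "//my/app/main" is shorthand for "//my/app/main:main"
        target = package.rsplit("/", 1)[-1]
    return f"@{repo}//{package}:{target}"


if __name__ == "__main__":
    for short in ("//my/app/main:app_binary", ":app_binary", "app_binary"):
        print(short, "->", expand_label(short, "myrepo", "my/app/main"))
```

All three shorthand forms in the loop expand to the same full label when evaluated from inside the my/app/main package of myrepo.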
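That sums up the key Bazel concepts involved. Looking at the files added to configure the cross-compilation toolchain, we can see that a new package named toolchain was created under the root repository. Here is this package's BUILD file: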
package(default_visibility = ["//visibility:public"])

# Define a target named cross_gcc_suite; this target is a collection of C++ toolchains
cc_toolchain_suite(
    # name and toolchains are mandatory
    name = "cross_gcc_suite",
    # Declare that the toolchain for the aarch64 platform is the
    # aarch64_toolchain target in the current package.
    # The toolchain is selected according to the --cpu and --compiler options.
    toolchains = {
        "aarch64": ":aarch64_toolchain",
    },
)

# Define an empty target named empty
filegroup(name = "empty")

# Define a C++ toolchain target named aarch64_toolchain
cc_toolchain(
    name = "aarch64_toolchain",
    toolchain_identifier = "aarch64-toolchain",
    # The toolchain's configuration
    toolchain_config = ":aarch64_toolchain_config",
    all_files = ":empty",
    compiler_files = ":empty",
    dwp_files = ":empty",
    linker_files = ":empty",
    objcopy_files = ":empty",
    strip_files = ":empty",
    supports_param_files = 0,
)

# Import the cc_toolchain_config function from cc_toolchain_config.bzl in the current package
load(":cc_toolchain_config.bzl", "cc_toolchain_config")

# Use the cc_toolchain_config function to define a target named aarch64_toolchain_config
cc_toolchain_config(name = "aarch64_toolchain_config")
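To make the comment about `--cpu` and `--compiler` concrete, here is a simplified model of how the `toolchains` dictionary of `cc_toolchain_suite` is consulted: the key is either `"<cpu>"` or `"<cpu>|<compiler>"`. This is a sketch of the lookup rule only, not Bazel's implementation, and `select_toolchain` is a hypothetical name.

```python
from typing import Dict, Optional


def select_toolchain(
    toolchains: Dict[str, str], cpu: str, compiler: Optional[str] = None
) -> str:
    """Return the cc_toolchain label registered for the given cpu/compiler.

    Simplified model: the key is "<cpu>" when only --cpu is given and
    "<cpu>|<compiler>" when --compiler is given as well.
    """
    key = f"{cpu}|{compiler}" if compiler else cpu
    try:
        return toolchains[key]
    except KeyError:
        raise LookupError(f"no toolchain registered for {key!r}") from None


if __name__ == "__main__":
    # Mirrors the toolchains attribute of the BUILD file above.
    suite = {"aarch64": ":aarch64_toolchain"}
    print(select_toolchain(suite, "aarch64"))
```

With the suite above, a build for any CPU other than aarch64 has no entry to fall back on, which is one reason the host toolchain has to be supplied separately.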
Now let's look at the contents of cc_toolchain_config.bzl in the toolchain package:
# toolchain/cc_toolchain_config.bzl:
# Load some helper functions
load("@bazel_tools//tools/build_defs/cc:action_names.bzl", "ACTION_NAMES")
load("@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl", "feature", "flag_group", "flag_set", "tool_path")

# Define variables
all_link_actions = [  # NEW
    ACTION_NAMES.cpp_link_executable,
    ACTION_NAMES.cpp_link_dynamic_library,
    ACTION_NAMES.cpp_link_nodeps_dynamic_library,
]

all_compile_actions = [
    ACTION_NAMES.assemble,
    ACTION_NAMES.c_compile,
    ACTION_NAMES.clif_match,
    ACTION_NAMES.cpp_compile,
    ACTION_NAMES.cpp_header_parsing,
    ACTION_NAMES.cpp_module_codegen,
    ACTION_NAMES.cpp_module_compile,
    ACTION_NAMES.linkstamp_compile,
    ACTION_NAMES.lto_backend,
    ACTION_NAMES.preprocess_assemble,
]

all_cpp_compile_actions = [
    ACTION_NAMES.cpp_compile,
    ACTION_NAMES.cpp_header_parsing,
    ACTION_NAMES.cpp_module_codegen,
    ACTION_NAMES.cpp_module_compile,
]

# Define the rule implementation function
def _impl(ctx):
    tool_paths = [  # NEW
        tool_path(
            name = "gcc",
            # Replace the paths below with the paths of your own toolchain
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-gcc",
        ),
        tool_path(
            name = "g++",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-g++",
        ),
        tool_path(
            name = "ld",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-ld",
        ),
        tool_path(
            name = "ar",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-ar",
        ),
        tool_path(
            name = "cpp",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-cpp",
        ),
        tool_path(
            name = "gcov",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-gcov",
        ),
        tool_path(
            name = "nm",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-nm",
        ),
        tool_path(
            name = "objdump",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-objdump",
        ),
        tool_path(
            name = "strip",
            path = "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-strip",
        ),
    ]

    features = [  # NEW
        feature(
            name = "default_linker_flags",
            enabled = True,
            flag_sets = [
                flag_set(
                    actions = all_link_actions,
                    flag_groups = ([
                        flag_group(
                            flags = [
                                # Recommended when cross-compiling; otherwise the aarch64 ELF
                                # tools cannot easily be run on the PC.
                                "-static",
                                "-lstdc++",
                            ],
                        ),
                    ]),
                ),
            ],
        ),
        feature(
            name = "default_compiler_flags",
            enabled = True,
            flag_sets = [
                flag_set(
                    actions = all_cpp_compile_actions,
                    flag_groups = ([
                        flag_group(
                            flags = [
                                "-fpermissive",
                            ],
                        ),
                    ]),
                ),
            ],
        ),
    ]

    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        features = features,
        cxx_builtin_include_directories = [  # NEW
            # Replace with your own header directories. This list can start out empty:
            # when a header is missing, the build error says which one, and it can be
            # appended then.
            "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/aarch64-none-linux-gnu/include/c++/13.3.1/bits",
            "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/aarch64-none-linux-gnu/include/c++/13.3.1",
            "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/aarch64-none-linux-gnu/libc/usr/include",
            "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/lib/gcc/aarch64-none-linux-gnu/13.3.1/include",
            "/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/lib/gcc/aarch64-none-linux-gnu/13.3.1/include-fixed",
        ],
        toolchain_identifier = "aarch64-linux",
        host_system_name = "x86_64",
        target_system_name = "aarch64",
        target_cpu = "aarch64",
        target_libc = "unknown",
        compiler = "g++",
        abi_version = "unknown",
        abi_libc_version = "unknown",
        tool_paths = tool_paths,  # NEW
    )

# Define a new rule
cc_toolchain_config = rule(
    # The rule's implementation function
    implementation = _impl,
    attrs = {},
    provides = [CcToolchainConfigInfo],
)
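As an alternative to collecting `cxx_builtin_include_directories` one build error at a time, GCC can be asked for its built-in search path: running the compiler with `-E -x c++ - -v` on empty input prints the directories between the `#include <...> search starts here:` and `End of search list.` markers on stderr. The helper below is a sketch of that approach, which is not something the steps above used; the function names are made up for this example, and the paths are normalized lexically, which may not be appropriate if the toolchain directory contains symlinks.

```python
import posixpath
import subprocess
from typing import List


def parse_include_dirs(verbose_output: str) -> List[str]:
    """Extract the <...> include search directories from `gcc -v` output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            # Collapse "bin/../lib" style components printed by GCC.
            dirs.append(posixpath.normpath(line.strip()))
    return dirs


def query_include_dirs(compiler: str) -> List[str]:
    """Run the given C++ compiler and return its built-in include directories."""
    result = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    # GCC writes the verbose search-path report to stderr.
    return parse_include_dirs(result.stderr)


# Example (path taken from the tool_paths above):
# query_include_dirs("/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-g++")
```

The returned list can be compared against the entries hard-coded in cc_toolchain_config.bzl before starting the multi-hour build.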
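With these two files in place, the cross-compilation toolchain is configured. To make it convenient to use, edit the .bazelrc file and append the following two lines, which attach the toolchain to the elinux_aarch64 config: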
build:elinux_aarch64 --crosstool_top=//toolchain:cross_gcc_suite
build:elinux_aarch64 --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
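The --crosstool_top and --host_crosstool_top options used here are the key Bazel parameters for specifying the cross-compilation toolchain and the host toolchain. Note that --host_crosstool_top must be set to a default toolchain for the PC (k8) platform, otherwise the build fails with an error.
Then, just as on the PC, start with ./configure
After that, trigger the build with the following command: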
bazel build --config=elinux_aarch64 --copt="-fPIC" --cxxopt="-fPIC" --verbose_failures //tensorflow:libtensorflow.so //tensorflow:install_headers
During the build, the toolchain I used required the following change:
diff --git a/tensorflow/tsl/platform/default/logging.h b/tensorflow/tsl/platform/default/logging.h
index 3578bedf0f1..24c74607a96 100644
--- a/tensorflow/tsl/platform/default/logging.h
+++ b/tensorflow/tsl/platform/default/logging.h
@@ -310,7 +310,7 @@ inline uint64 GetReferenceableValue(uint64 t) { return t; }
 // it uses the definition for operator<<, with a few special cases below.
 template <typename T>
 inline void MakeCheckOpValueString(std::ostream* os, const T& v) {
-  // (*os) << v;
+  //(*os) << v;
 }
 
 // Overrides for char types provide readable values for unprintable
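This workaround should eventually become unnecessary (I have not found a proper fix yet).
The most important issue comes later in the build: statically linking libtensorflow_framework.so.2.12.0 fails with an error:
At that point dynamic linking has to be forced. How? Here is the diff: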
diff --git a/tensorflow/BUILD b/tensorflow/BUILD
index 0d27a8294f5..adcdef8f8a9 100644
--- a/tensorflow/BUILD
+++ b/tensorflow/BUILD
@@ -1100,6 +1100,8 @@ tf_cc_shared_library(
         ],
         "//conditions:default": [
             "-Wl,--version-script,$(location //tensorflow:tf_framework_version_script.lds)",
+            "-Wl,-Bdynamic",
         ],
     }),
     linkstatic = 1,
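The next build then hits a new error: the dynamically linked, cross-compiled tools cannot run properly to generate some of the files that link against the libtensorflow library. I worked around this by running sudo chrpath -r on them. Remember to put the required libraries into a designated directory beforehand. It is all rather complicated.
The diff that integrates this into TensorFlow's build environment is: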
diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl
index 115ff76b414..2fb733de643 100644
--- a/tensorflow/tensorflow.bzl
+++ b/tensorflow/tensorflow.bzl
@@ -1113,12 +1113,13 @@ def tf_gen_op_wrapper_cc(
         ],
         srcs = srcs,
         tools = [":" + tool] + tf_binary_additional_srcs(),
-        cmd = ("$(location :" + tool + ") $(location :" + out_ops_file + ".h) " +
+        cmd = ("echo \"xxx\" | sudo -S chrpath -r /home/red/Rcc/lib64 " + "$(location :" + tool + ")" + ";" + "$(location :" + tool + ") $(location :" + out_ops_file + ".h) " +
                "$(location :" + out_ops_file + ".cc) " +
                str(include_internal_ops) + " " + api_def_args_str),
         compatible_with = compatible_with,
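Those are the key modifications. Any other changes needed during the build can be made by following the error messages.
Here is a screenshot of the final successful build:
With that, we have the libraries and headers needed for TensorFlow development on the aarch64 platform:
This library file is really huge, haha. Now let's test it, reusing the example from the previous post: C++ code that loads an illuminance model and prints the predicted value: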
#include <iostream>
#include <vector>

#include <tensorflow/cc/saved_model/loader.h>

using namespace tensorflow;
using namespace std;

int main() {
  SessionOptions options;
  RunOptions run_options;
  SavedModelBundle bundle;
  Status status = LoadSavedModel(
      options, run_options,
      "/home/red/Downloads/fivek_dataset/test_mark_illuminance_level/illu_v03",
      {"serve"}, &bundle);
  if (!status.ok()) {
    std::cerr << "Error loading model: " << status.ToString() << std::endl;
    return 1;
  }

  // Access the session
  Session* session = bundle.session.get();

  // Create input tensor
  Tensor input_tensor(DT_FLOAT, TensorShape({1, 255, 255, 3}));

  // Fill input tensor with data
  auto input_tensor_flat = input_tensor.flat<float>();
  std::cout << "size of input tensor is " << input_tensor_flat.size() << std::endl;
  for (int i = 0; i < input_tensor_flat.size(); ++i) {
    input_tensor_flat(i) = 255.0;
  }

  // Run inference
  std::vector<Tensor> outputs;
  Status run_status = session->Run(
      {{"serving_default_rescaling_input", input_tensor}},
      {"StatefulPartitionedCall"}, {}, &outputs);
  if (!run_status.ok()) {
    std::cerr << "Error running model: " << run_status.ToString() << std::endl;
    return 1;
  }

  const Eigen::TensorMap<Eigen::Tensor<float, 1, Eigen::RowMajor>, Eigen::Aligned>& prediction =
      outputs[0].flat<float>();
  const long count = prediction.size();
  for (int i = 0; i < count; ++i) {
    // value is the element at index i when the tensor is viewed as a 1-D array.
    const float value = prediction(i);
    std::cout << "hey hey " << value << std::endl;
  }

  // Process output tensor
  Tensor ans = outputs[0];
  // auto ans_value = ans.tensor<float, 1>();
  auto ans_value = ans.tensor<float, 2>();
  std::cout << ans_value(0, 0) << std::endl;
  return 0;
}
The corresponding Makefile needs to be changed as follows:
CROSS_COMPILE:=/home/red/Samba/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-
TARGET=tfcpp
CFLAGS:=-I/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/include/
CFLAGS+=-I/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/include/src
CFLAGS+=-I/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/include/_virtual_includes/float8/
CFLAGS+=-I/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/include/_virtual_includes/int4/
LDFLAGS:=-L/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow -ltensorflow_framework
LDFLAGS+=-L/home/red/.cache/bazel/_bazel_red/81f6b3978d226a63c6d017ab1c0efa9f/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow -ltensorflow

$(TARGET):$(TARGET).cpp
	$(CROSS_COMPILE)g++ $(CFLAGS) $(LDFLAGS) $^ -o $@

clean:
	rm -frv $(TARGET)
Build the test program and run it on an all-white image. The result:
As you can see, the result matches the PC result from the previous post closely.
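Before copying libtensorflow.so or the tfcpp test program to the board, it can be worth confirming that they really were built for aarch64 rather than for the host. The sketch below reads the ELF header directly: byte 5 of e_ident gives the byte order, and the 16-bit e_machine field at offset 18 is 183 for AArch64 and 62 for x86-64. The function names are made up for this example, and this check was not part of the workflow described above.

```python
import struct

EM_X86_64 = 62    # e_machine value for x86-64
EM_AARCH64 = 183  # e_machine value for AArch64


def elf_machine(header: bytes) -> int:
    """Return the e_machine field from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def is_aarch64_elf(path: str) -> bool:
    """Check whether the file at `path` is an ELF binary built for AArch64."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == EM_AARCH64


# Example: is_aarch64_elf("tfcpp") should be True for the cross-compiled test program.
```

The same information is available from the `file` or `readelf -h` commands; the script form is handy when checking many build outputs at once.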