TensorFlow + Sionna installation pitfalls (to be continued)

news/2025/2/26 5:42:23

1. My setup

A CPU-only laptop, connected through MobaXterm to a remote GPU server on which I do not have sudo privileges.

2. TensorFlow installation

Please open the official English installation guide at https://tensorflow.google.cn/install/pip; the Chinese version may be missing some of the notes.

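conda create -n tf_sionna python==3.8  # create a new virtual environment
conda activate tf_sionna  # activate the newly created environment
pip install tensorflow-gpu==2.10  # install TensorFlow; it seems tensorflow[and-cuda] might also work here?

Use pip list to check the installed TF packages, then test GPU visibility from Python: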

conda activate tf_sionna
python
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

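The result of checking my environment is shown below. The check reports that no GPU is available, yet the GPU can still be used when the code actually runs. I have not figured out why yet; to be continued.
Figure: checking whether TF was installed successfully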
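
One way to narrow this down is to check whether the machine where the test is typed has an NVIDIA driver at all. The following is a minimal sketch, assuming a Linux host; gpu_host_report is a hypothetical helper name, and it only looks at the two things the TensorFlow log in section 5 complains about (libcuda.so.1 and /proc/driver/nvidia/version). It uses the standard library only, so it runs even when TensorFlow itself is broken.

```python
import ctypes
import os
import socket


def gpu_host_report():
    """Collect basic facts about whether this host can expose an NVIDIA GPU.

    Hypothetical helper for illustration; not part of TensorFlow or Sionna.
    """
    report = {
        "hostname": socket.gethostname(),
        # On Linux the NVIDIA kernel driver publishes this file once loaded.
        "nvidia_driver_loaded": os.path.exists("/proc/driver/nvidia/version"),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        # libcuda.so.1 is normally installed by the NVIDIA driver itself.
        ctypes.CDLL("libcuda.so.1")
        report["libcuda_loadable"] = True
    except OSError:
        report["libcuda_loadable"] = False
    return report


if __name__ == "__main__":
    for key, value in gpu_host_report().items():
        print(f"{key}: {value}")
```

If nvidia_driver_loaded and libcuda_loadable are both False, an empty GPU list is expected on that host regardless of which TensorFlow package is installed.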

3. Sionna installation

pip install sionna

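Figure: environment packages after installing sionna
Figure: environment packages after installing sionna (continued)
Installing sionna also automatically pulled in tensorflow 2.13.1. Two versions existing side by side is a curious thing! tensorflow-gpu and tensorflow are separate pip distributions that install into the same tensorflow package directory, so the one installed last (2.13.1 here) is most likely what import tensorflow actually loads.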
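
To see which distributions are registered and which version actually gets imported, something like the following can be run inside the environment. It is a rough sketch: installed_versions is a hypothetical helper, and the three distribution names are simply the ones mentioned in this post.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each pip distribution name to its installed version, or None."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["tensorflow", "tensorflow-gpu", "sionna"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
    try:
        # The imported module reports the version that is really in use.
        import tensorflow as tf
        print("import tensorflow resolves to", tf.__version__, "at", tf.__file__)
    except ImportError as exc:
        print("tensorflow is not importable:", exc)
```

If the two distributions report different versions, the value of tf.__version__ shows which one won.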

4. Code I ran (to be continued)

import os, json
import tensorflow as tf
import numpy as np
import random
#import seaborn as sns
import sionna
import matplotlib.pyplot as plt
import pickle
from tensorflow.keras import Model
from tensorflow.keras.layers import Layer, Conv2D, LayerNormalization
from tensorflow.nn import relu

from sionna.channel.tr38901 import Antenna, AntennaArray, CDL, UMa
from sionna.channel import OFDMChannel, GenerateOFDMChannel, ApplyOFDMChannel
from sionna.ofdm import ResourceGrid

gpu_num = 0  # Use "" to use the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = f"{gpu_num}"  # use gpu_num rather than a hard-coded '0'
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
if gpus:
    try:
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        print(e)
# Avoid warnings from TensorFlow
tf.get_logger().setLevel('ERROR')

5. Results

(1) After running pip install sionna in the lab server's existing tf2.5 environment, the output is:

/gpu01/miniconda3/envs/tf2.5/bin/python3 /gpu03/gaosongling/RailwayScenario/gen_dataset.py 
2025-02-25 11:27:23.147936: I tensorflow/core/platform/cpu_feature_guard.cc:193] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2025-02-25 11:27:24.299367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] 
Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/clustertech/chess/ng/bin

2025-02-25 11:27:24.299498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64]
 Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/clustertech/chess/ng/bin
 
2025-02-25 11:27:24.299524: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38]
 TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
 
2025-02-25 11:27:26.051376: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] 
Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/clustertech/chess/ng/bin

2025-02-25 11:27:26.051438: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265]
 failed call to cuInit: UNKNOWN ERROR (303)
 
2025-02-25 11:27:26.051497: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] 
kernel driver does not appear to be running on this host (mgmt): /proc/driver/nvidia/version does not exist
[]
2025-02-25 11:27:26.052310: I tensorflow/core/platform/cpu_feature_guard.cc:193] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P0 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P1 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P2 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P3 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P4 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P5 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P6 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P7 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P8 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P9 finished

Process finished with exit code 0

(2) In my own tf_sionna environment, the output is:

2025-02-25 11:42:29.949793: I tensorflow/core/platform/cpu_feature_guard.cc:182] 
This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2025-02-25 11:42:33.111435: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] 
TF-TRT Warning: Could not find TensorRT

2025-02-25 11:42:43.428919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] 
Created device /job:localhost/replica:0/task:0/device:GPU:0 with 617 MB memory:  -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:c4:00.0, compute capability: 8.6
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P0 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P1 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P2 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P3 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P4 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P5 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P6 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P7 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P8 finished
(1, 256, 1, 2) (1, 256, 1, 2)
generate train data P9 finished

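Although the TensorFlow test reported no GPU, the GPU was still usable at run time, which drove me crazy. I rebuilt the environment several times and never got a positive answer from the TF GPU test; at one point I nearly gave up. I also find the official TensorFlow documentation not detailed enough, nowhere near as clear as PyTorch's.

Later I asked a labmate how the environment should be set up, and he said that adding sionna to the lab's existing tf2.5 environment would be enough. After adding it I could indeed run the code, so I went back to the virtual environment to test whether the GPU could be used, but it still was not detected. Perhaps setting up an environment on a server is not quite the same as on a local machine with its own GPU. One hint from the logs above: run (1) printed [] and reported that no kernel driver is running on host (mgmt), while run (2) found the NVIDIA RTX A4000, so the result may depend on which node the command runs on and not only on the environment.

In addition, the TensorFlow version in tf2.5 is not 2.5 but 2.11, as shown in the figure below. Because I did not record the package versions of the original tf2.5 environment beforehand, I cannot rule out that adding sionna upgraded TF.
Figure: packages in the tf2.5 environment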

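6. Possible errors

During my installation, if tensorflow-gpu was not specified explicitly, for example with a plain pip install tensorflow, running the code mainly failed with the following error: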

ImportError: jit_init_thread_state(): the LLVM backend is inactive because the LLVM shared library ("libLLVM.so") 
could not be found! Set the DRJIT_LIBLLVM_PATH environment variable to specify its path.

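Going by what I saw on the TensorFlow website, I tried downloading an archive from https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.0, but I did not know which one to pick. In the end I downloaded two of the archives with a linux suffix, but extracting them locally always failed with messages along the lines of "file XXX lacks permission", so this did not solve the problem. PS: https://github.com/llvm/llvm-project/releases/tag/llvmorg-17.0.2.
https://github.com/NVlabs/sionna/discussions/296 offers one answer to this problem, but it is not explained in detail, so I still could not solve it.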
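
The traceback below shows sionna/rt calling mi.set_variant('llvm_ad_rgb'), i.e. the CPU (LLVM) backend of Mitsuba and Dr.Jit, and the error message itself names the DRJIT_LIBLLVM_PATH environment variable. A minimal sketch of using that variable follows, assuming a libLLVM shared library has already been obtained somewhere readable without sudo. find_libllvm and the two search directories are hypothetical and only for illustration; whether a given LLVM version is accepted by Dr.Jit is not checked here.

```python
import glob
import os
import sys


def find_libllvm(search_dirs):
    """Return the first libLLVM shared library found in search_dirs, or None."""
    for directory in search_dirs:
        # Matches names such as libLLVM.so, libLLVM-17.so or libLLVM.so.17
        matches = sorted(glob.glob(os.path.join(directory, "libLLVM*.so*")))
        if matches:
            return matches[0]
    return None


if __name__ == "__main__":
    # Assumed locations: the active environment's lib/ directory and a
    # hypothetical folder where an LLVM release archive was extracted.
    candidates = [
        os.path.join(sys.prefix, "lib"),
        os.path.expanduser("~/llvm/lib"),
    ]
    path = find_libllvm(candidates)
    if path is None:
        print("No libLLVM shared library found in", candidates)
    else:
        # Must be set before `import sionna`, which loads Mitsuba and Dr.Jit.
        os.environ["DRJIT_LIBLLVM_PATH"] = path
        print("DRJIT_LIBLLVM_PATH =", path)
```

The same variable can instead be exported in the shell before starting Python; the point is only that it has to be in place before sionna is imported.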

Finally, the complete error output is as follows:

2025-02-24 23:11:59.500085: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-24 23:11:59.569298: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-24 23:11:59.569393: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-24 23:11:59.570989: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-24 23:11:59.582707: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-24 23:11:59.583154: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-24 23:12:02.438310: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-02-24 23:12:08.482323: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Traceback (most recent call last):
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/__init__.py", line 107, in __getattribute__
    _import('mitsuba.mitsuba_' + variant + '_ext'),
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1108, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: jit_init_thread_state(): the LLVM backend is inactive because the LLVM shared library ("libLLVM.so") could not be found! Set the DRJIT_LIBLLVM_PATH environment variable to specify its path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpu03/gaosongling/RailwayScenario/gen_dataset.py", line 6, in <module>
    import sionna
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/sionna/__init__.py", line 18, in <module>
    from . import rt
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/sionna/rt/__init__.py", line 29, in <module>
    mi.set_variant('llvm_ad_rgb')
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/__init__.py", line 317, in set_variant
    _import('mitsuba.ad.integrators')
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/python/ad/__init__.py", line 2, in <module>
    from .integrators import *
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/python/ad/integrators/__init__.py", line 25, in <module>
    importlib.import_module('mitsuba.ad.integrators.' + name)
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/python/ad/integrators/common.py", line 8, in <module>
    class ADIntegrator(mi.CppADIntegrator):
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/__init__.py", line 253, in __getattribute__
    result = module.__getattribute__(key)
  File "/home/gaosongling/.conda/envs/tf_sionna/lib/python3.9/site-packages/mitsuba/__init__.py", line 115, in __getattribute__
    raise AttributeError(e)
AttributeError: jit_init_thread_state(): the LLVM backend is inactive because the LLVM shared library ("libLLVM.so") could not be found! Set the DRJIT_LIBLLVM_PATH environment variable to specify its path.

7. References

TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. https://blog.csdn.net/m0_66233032/article/details/134606306?fromshare=blogdetail&sharetype=blogdetail&sharerId=134606306&sharerefer=PC&sharesource=m0_61175448&sharefrom=from_link
Fixing the TensorFlow warning "TF-TRT Warning: Could not find TensorRT". https://blog.csdn.net/weixin_45710350/article/details/140232873?fromshare=blogdetail&sharetype=blogdetail&sharerId=140232873&sharerefer=PC&sharesource=m0_61175448&sharefrom=from_link
import tensorflow as tf, but "Could not find TensorRT". https://blog.csdn.net/m0_54377950/article/details/145753869?fromshare=blogdetail&sharetype=blogdetail&sharerId=145753869&sharerefer=PC&sharesource=m0_61175448&sharefrom=from_link

