
I. Running/debugging torch distributed training in PyCharm

The overall process is fairly simple; it mirrors section II below (running/debugging deepspeed distributed training in PyCharm).
The key steps are:

Symlink the distributed package

By analyzing the command used to launch distributed training, we find that we first need to locate torch.distributed.launch and symlink it into the PyCharm project directory. Why a symlink rather than a copy? A symlink does not change the file's real path, so launch.py can import everything it needs without any modification.

On Ubuntu, create the symlink with:

ln -s /yourpython/lib/python3.6/site-packages/torch/distributed/ /yourprogram/
The command links the parent directory distributed rather than launch.py itself, because this makes it obvious that launch.py arrived via a symlink and keeps it from being confused with other files in the project.
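A quick sanity check (a sketch; /yourprogram and /yourpython are the placeholder paths from the command above): the link should point back into site-packages, and launch.py should be reachable through it.

ls -l /yourprogram/distributed        # should show ... -> /yourpython/lib/python3.6/site-packages/torch/distributed/
ls /yourprogram/distributed/launch.py # launch.py is reachable through the link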

Set the PyCharm run parameters
Open PyCharm and go to Run -> Edit Configurations to open the run configuration dialog.

All that is needed is to set Script path to the path of launch.py, and Parameters to the arguments launch.py should receive; mirroring the command-line invocation, they look like this:

--nproc_per_node=4
tools/train.py --cfg xxx.yaml
With these steps you can run distributed training from PyCharm. That said, when debugging a model it is better to modify train.py and debug in single-GPU mode. It is not that distributed mode cannot be debugged; it is just that with a single GPU the data flow is easier to follow, which cuts debugging time.
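For reference, a minimal single-GPU debug run that bypasses the launcher entirely (a sketch; whether tools/train.py accepts being run directly like this depends on your project):

CUDA_VISIBLE_DEVICES=0 python tools/train.py --cfg xxx.yaml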

II. Running/debugging deepspeed distributed training in PyCharm

1. PyCharm version

I am using 2020.1.

2. Environment

(1) The server needs the environment set up in a conda virtual environment, and must keep a copy of the code plus the corresponding model.
(2) Locally you only need a copy of the code and data; the model is not required.
(3) For the code, I am reproducing large-language-model pretraining with https://github.com/hiyouga/LLaMA-Factory; other codebases work the same way.
What matters most is the launch script; any other code is handled the same way, as long as it is launched with deepspeed.
pretrain.sh

deepspeed  --master_port=9901 src/train_bash.py \
    --deepspeed ./ds_config.json \
    --stage pt \
    --do_train \
    --model_name_or_path ../Yi-34B  \
    --dataset input_test \
    --finetuning_type full \
    --lora_target q_proj,v_proj \
    --output_dir Yi-34B_output_test \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 300 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --preprocessing_num_workers 20 \
    --plot_loss \
    --bf16

3. Local setup: installing deepspeed

Compress the deepspeed package from the server's virtual environment and copy it to the local machine.
On my server the virtual environment's copy lives at /home/centos/anaconda3/envs/factory/lib/python3.10/site-packages/deepspeed/.
Zip the deepspeed directory, copy the archive into the local project directory D:\code\LLaMA-Factory, and unzip it there, yielding D:\code\LLaMA-Factory\deepspeed.
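A sketch of the transfer (server path from above; use scp, an SFTP client, or any other transfer tool you prefer):

cd /home/centos/anaconda3/envs/factory/lib/python3.10/site-packages
zip -r /tmp/deepspeed.zip deepspeed/
# copy /tmp/deepspeed.zip to the local machine, then unzip it inside D:\code\LLaMA-Factory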

4. Remote setup: symlink

Inspect it with vim /home/centos/anaconda3/envs/factory/bin/deepspeed; this shows that the command actually runs the deepspeed.launcher.runner module.

By analyzing the launch command, we find that we first need to locate deepspeed.launcher.runner and symlink its package into the PyCharm project directory. Why a symlink rather than a copy? A symlink does not change the file's real path, so runner.py can import everything it needs without any modification.

On CentOS, create the symlink with:

ln -s /home/centos/anaconda3/envs/factory/lib/python3.10/site-packages/deepspeed/  /data/liulei/cpt/LLaMA-Factory/

To remove the symlink, use:

unlink /data/liulei/cpt/LLaMA-Factory/deepspeed

5. PyCharm configuration

Point the local code at the remote server's Python interpreter:

(1) Go in through Settings

(2) Add a new interpreter

(3) Add a remote interpreter over SSH: fill in the IP and username, then enter the password at the next step.

(4) Choose the interpreter from the remote server, and map the remote code directory to the local one.


(5) Configure the debug entry command


(6) Configure the script, parameters, Python interpreter, and code paths


Entry script: D:\code\LLaMA-Factory\deepspeed\launcher\runner.py
Parameters: note that the arguments here come from the pretrain.sh launch script above, but with the deepspeed command removed, the backslashes removed, and defaults that were implicitly True spelled out in full.


--master_port=9901  src/train_bash.py      --deepspeed ./ds_config.json      --stage pt     --do_train True      --model_name_or_path ../Yi-34B-Chat      --dataset input_test      --finetuning_type full      --lora_target q_proj,v_proj      --output_dir path_to_pt_checkp1      --overwrite_cache True      --per_device_train_batch_size 1      --gradient_accumulation_steps 1      --lr_scheduler_type cosine      --logging_steps 1     --save_steps 100      --learning_rate 5e-5      --num_train_epochs 1.0      --plot_loss True      --fp16

Once that is done you can start debugging, at which point you will find it still errors out at runtime. A few places in the code need changes.

(7) Corresponding local code changes

Problem 1:

ssh://centos@18:22/home/centos/anaconda3/envs/factory/bin/python -u /home/centos/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 34567 --file /data/liulei/cpt/LLaMA-Factory/deepspeed/launcher/runner.py --master_port=9901 src/train_bash.py --deepspeed ./ds_config.json --stage pt --do_train True --model_name_or_path ../Yi-34B-Chat --dataset input_test --finetuning_type full --lora_target q_proj,v_proj --output_dir path_to_pt_checkp1 --overwrite_cache True --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --logging_steps 1 --save_steps 100 --learning_rate 5e-5 --num_train_epochs 1.0 --plot_loss True --fp16
/home/centos/.pycharm_helpers/pydev/pydevd.py:1806: DeprecationWarning: currentThread() is deprecated, use current_thread() instead
  dummy_thread = threading.currentThread()
pydev debugger: process 232478 is connecting
Connected to pydev debugger (build 201.6668.115)
Traceback (most recent call last):
  File "/home/centos/.pycharm_helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/centos/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data/liulei/cpt/LLaMA-Factory/deepspeed/launcher/runner.py", line 24, in <module>
    from .multinode_runner import PDSHRunner, OpenMPIRunner, MVAPICHRunner, SlurmRunner, MPICHRunner, IMPIRunner
ImportError: attempted relative import with no known parent package

Fix:

Modify the local file D:\code\LLaMA-Factory\deepspeed\launcher\runner.py, which maps to the remote file /data/liulei/cpt/LLaMA-Factory/deepspeed/launcher/runner.py.

Before:

from .multinode_runner import PDSHRunner, OpenMPIRunner, MVAPICHRunner, SlurmRunner, MPICHRunner, IMPIRunner
from .constants import PDSH_LAUNCHER, OPENMPI_LAUNCHER, MVAPICH_LAUNCHER, SLURM_LAUNCHER, MPICH_LAUNCHER, IMPI_LAUNCHER
from ..constants import TORCH_DISTRIBUTED_DEFAULT_PORT
from ..nebula.constants import NEBULA_EXPORT_ENVS
from ..utils import logger

from ..autotuning import Autotuner
from deepspeed.accelerator import get_accelerator


After:
from deepspeed.launcher.multinode_runner import PDSHRunner, OpenMPIRunner, MVAPICHRunner, SlurmRunner, MPICHRunner, IMPIRunner
from deepspeed.launcher.constants import PDSH_LAUNCHER, OPENMPI_LAUNCHER, MVAPICH_LAUNCHER, SLURM_LAUNCHER, MPICH_LAUNCHER, IMPI_LAUNCHER
from deepspeed.constants import TORCH_DISTRIBUTED_DEFAULT_PORT
from deepspeed.nebula.constants import NEBULA_EXPORT_ENVS
from deepspeed.utils import logger

from deepspeed.autotuning import Autotuner

Re-run the debugger, and it now runs happily from the local machine.

Installing nginx on CentOS

yum -y install nginx 

On CentOS, Nginx's default installation puts its configuration under /etc/nginx.

To modify the configuration, open nginx.conf in that directory with vi, nano, or another editor.

For example, from the command line:

vim /etc/nginx/nginx.conf

Start, stop, and restart the Nginx service


systemctl start nginx   # start Nginx
systemctl stop nginx    # stop Nginx
systemctl restart nginx # restart Nginx

Nginx logs

/var/log/nginx/error.log 
/var/log/nginx/access.log
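While testing, it can help to watch both logs at once (standard tail usage):

tail -f /var/log/nginx/access.log /var/log/nginx/error.log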

Configuring WebSocket

vim /etc/nginx/nginx.conf
The configuration file is as follows:

# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user root;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;


    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' close;
    }

    upstream wsbackend {
        server 192.168.17.188:9005;
        server 192.168.17.188:9006;
        keepalive 1000;
    }

    server {
        listen       8009 default_server;
        #listen       [::]:80 default_server;
        server_name  localhost;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
            proxy_pass http://wsbackend;
            proxy_http_version 1.1;
            proxy_read_timeout   3600s; # timeout waiting for the upstream
            # enable WebSocket upgrade support
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }

        error_page 404 /404.html;
            location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
            location = /50x.html {
        }
    }

# Settings for a TLS enabled server.
#
#    server {
#        listen       443 ssl http2 default_server;
#        listen       [::]:443 ssl http2 default_server;
#        server_name  _;
#        root         /usr/share/nginx/html;
#
#        ssl_certificate "/etc/pki/nginx/server.crt";
#        ssl_certificate_key "/etc/pki/nginx/private/server.key";
#        ssl_session_cache shared:SSL:1m;
#        ssl_session_timeout  10m;
#        ssl_ciphers PROFILE=SYSTEM;
#        ssl_prefer_server_ciphers on;
#
#        # Load configuration files for the default server block.
#        include /etc/nginx/default.d/*.conf;
#
#        location / {
#        }
#
#        error_page 404 /404.html;
#            location = /40x.html {
#        }
#
#        error_page 500 502 503 504 /50x.html;
#            location = /50x.html {
#        }
#    }

}

The two lines that matter are the following: they tell Nginx that when a WebSocket connection comes in, the HTTP connection should be upgraded to a WebSocket connection.

proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

proxy_read_timeout sets how long to wait for the upstream's response once the connection is established; if unset it defaults to 60s.
proxy_http_version 1.1 makes the proxy speak HTTP/1.1, which the Upgrade handshake requires.
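A quick way to test the handshake from the shell (the port follows the listen directive above; 127.0.0.1 and the fixed sample Sec-WebSocket-Key are assumptions for illustration):

curl -i -N http://127.0.0.1:8009/ \
     -H "Connection: Upgrade" -H "Upgrade: websocket" \
     -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ=="
# a correctly proxied endpoint answers: HTTP/1.1 101 Switching Protocols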

A problem I ran into:

2023/12/18 10:59:30 [crit] 626773#0: *1 connect() to :9006 failed (13: Permission denied) while connecting to upstream, client: , server: localhost, request: "GET / HTTP/1.1", upstream: "http://192:9006/", host: ":8009

Fix:
1. Change the user directive at the top of nginx.conf to: user root;
2. Turn off SELinux.
Temporarily (no reboot required):

setenforce 0
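You can confirm the change took effect with getenforce, which should now print Permissive:

getenforce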

References:
https://www.jianshu.com/p/6205c8769e3c
https://blog.csdn.net/lazycheerup/article/details/117323466

I. Downloading the CUDA installer

The environment here is offline, so the CUDA installer has to be downloaded in advance; fetch the build matching your system from the official archive: https://developer.nvidia.com/cuda-toolkit-archive


Driver/CUDA version compatibility:
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Problem (CentOS 7)

Using built-in stream user interface
-> Detected 32 CPUs online; setting concurrency level to 32.
-> The file '/tmp/.X0-lock' exists and appears to contain the process ID '2647' of a running X server.
ERROR: You appear to be running an X server; please exit X before installing.  For further details, please see the section INSTALLING THE NVIDIA DRIVER in the README available on the Linux driver download page at www.nvidia.com.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Fix:
systemctl stop gdm.service

Problem (CentOS 8)

-> Detected 128 CPUs online; setting concurrency level to 32.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Fix:
The GPU is in use; close whatever is using it.

Problem (CentOS 8)

Using built-in stream user interface
-> Detected 128 CPUs online; setting concurrency level to 32.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Fix:
The GPU is in use; find and close the processes using it, via:
sudo lsof /dev/nvidia*
kill -9 pid

Problem:

sh ./cuda_11.6.0_510.39.01_linux.run
Extraction failed.
Ensure there is enough space in /tmp and that the installation package is not corrupt
Signal caught, cleaning up

No extraction tool was installed:
yum install tar

Problem (CentOS)

nvidia-smi shows the GPUs, but torch.cuda.is_available() fails with:

UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling

Error 802: system not yet initialized 

Fix on CentOS 8

Note: the fabric manager version must match the driver version.

wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-515.65.01-1.x86_64.rpm

sudo yum install nvidia-fabric-manager-515.65.01-1.x86_64.rpm

systemctl enable nvidia-fabricmanager

systemctl restart nvidia-fabricmanager

systemctl status nvidia-fabricmanager
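Once the fabric manager is running, re-check CUDA from Python (a one-liner; assumes torch is installed in the active environment):

python -c "import torch; print(torch.cuda.is_available())"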

II. Switching CUDA versions

Normally download from the official archive: https://developer.nvidia.com/cuda-toolkit-archive

During installation, take care not to install the CUDA driver (deselect it in the installer).

After installation, switch the symlink:

rm -rf /usr/local/cuda  # remove the previously created symlink
sudo ln -s /usr/local/cuda-11.3/  /usr/local/cuda/
nvcc --version # check the current CUDA version

If that still does not work, set the environment variables directly:

vim ~/.bashrc

# then append

export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.3/lib64
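Reload the file so the variables take effect in the current shell, then verify:

source ~/.bashrc
nvcc --version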

I have seen one case where even that was not enough:

which nvcc showed that the target had not changed, which probably means PATH was never actually updated.
Check the environment variable:

echo $PATH
## prints: /home/centos/anaconda3/bin:/home/centos/anaconda3/condabin:/home/centos/.local/bin:/home/centos/bin:/usr/local/cuda-12.2/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin

## PATH has not changed: it still contains the hard-coded cuda-12.2 entry, so rewrite it without that entry:

export PATH=/home/centos/anaconda3/bin:/home/centos/anaconda3/condabin:/home/centos/.local/bin:/home/centos/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin
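After rewriting PATH, confirm that resolution now goes through the symlink:

which nvcc       # should print /usr/local/cuda/bin/nvcc
nvcc --version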

I. Adding a user on CentOS

1. Add the user

useradd liulei   # add the user
passwd liulei    # set the password
# The following is only used on zhy's machines; ordinary servers don't need it
usermod -aG test01 liulei  # add user liulei to group test01; without this the user may be unable to log in

2. Change the password

# log in to a root-privileged account first
passwd  liulei  # username

3. Grant the user sudo

# switch to root first
# ordinary servers
vim /etc/sudoers
centos   ALL=(ALL)   ALL

# some of zhy's servers
vim /etc/sudoers.d/su
centos ALL=(ALL) NOPASSWD:/bin/su -
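To confirm what an account can now do, you can list its sudo rights as root (sudo's -U flag inspects another user):

sudo -l -U liulei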

II. Upgrading gcc

yum install centos-release-scl
yum install devtoolset-8
# activate the devtoolset; you can install several versions side by side and use this command to switch between them
scl enable devtoolset-8 bash

gcc -v

This activation only lasts for the current session; after the session restarts, gcc reverts to the stock 4.8.5. To switch at will, proceed as follows.
First, note that the devtoolsets are installed under the /opt/rh directory.

Each version's directory contains an enable file; to enable a version, just run

source ./enable

So switching to a given version is just

source /opt/rh/devtoolset-8/enable

To make it permanent:

vim /etc/profile
# append at the end
source /opt/rh/devtoolset-7/enable

# then reload
source /etc/profile

Alternatively:
Run source /opt/rh/devtoolset-7/enable whenever you need to switch,
or add that line to $USER_HOME/.bashrc to make the switch the default.

III. Killing processes

1. kill

ps -ef|grep python
kill -9 pid  // kill the process outright

2. killall

Linux killall (kill processes by name) kills processes, but unlike kill, killall kills every process with the given name. kill targets a specific PID and has to be combined with ps, whereas killall operates directly on the process name, which is more convenient.

killall -9 mysql         // kill all mysql processes

3. pkill
pkill is used the same way as killall: both kill a class of processes by name. Beyond that, pkill has one more important ability: it can kick out logged-in users by terminal number.

pkill -f mysql         // kill processes whose command line matches mysql
pkill -u mark,danny // kill all processes owned by users mark and danny
w  // list the users currently logged in to this machine
pkill -9 -t pts/1  // force-kill processes attached to virtual terminal pts/1

IV. Downloading packages for offline install

  1. pip download -d <directory for the package and its dependencies> <package name> -i <mirror for a faster download>
    e.g.: pip download -d ./websocket websocket-client -i https://pypi.tuna.tsinghua.edu.cn/simple
  2. pip install --no-index --find-links=<directory holding the downloaded packages> <package name>
    e.g.: pip install --no-index --find-links=./websocket websocket-client

Downloading on Ubuntu works the same way.

V. Disk space and OS information

1. Show disks, including unmounted ones:
fdisk -l

2. Show block devices and filesystems (reveals unpartitioned disks):
lsblk -f
3. Show mounted filesystems:
df -h
4. Size of hidden directories:
du -sh .[!.]*
5. System information:
CentOS release: cat /etc/redhat-release
Kernel build info: cat /proc/version
Kernel name: uname -s
Machine architecture: uname -m

VI. Inspecting CentOS hardware

1. The top -c command:

top -c
# to display memory in GB,
# press Shift+E

GiB Mem :    251.4 total,      0.6 free,    250.5 used,      0.3 buff/cache
GiB Swap:    128.0 total,     93.4 free,     34.6 used.      0.1 avail Mem 

GiB Mem: total physical memory (251.4G), free memory (0.6G), memory in use (250.5G), buffer/cache
GiB Swap: total swap (128.0G), free swap (93.4G), swap in use (34.6G), available memory

(1) Mem total: the amount of memory currently under the kernel's control.
(2) free: memory the kernel has not yet taken under its management. Memory under kernel management is not necessarily in active use; it also includes previously used memory that can be reused. The kernel does not hand such reusable memory back to free, so on Linux free keeps shrinking over time, but that is nothing to worry about.
(3) used: the amount of memory in use.
(4) buff/cache: memory used for kernel buffers and caches.

The swap area is disk-backed; the OS only moves temporarily unused data into swap when physical memory runs short. So when this number climbs, memory is genuinely running out.

Once physical memory is exhausted, or usage passes a certain ratio, swap serves as overflow memory. When both physical memory and swap are used up, the result is an out-of-memory error. The ratio at which swapping begins can be tuned through a parameter in the system configuration.

If swap's used figure keeps changing, the kernel is continuously moving data between memory and swap, and memory truly is insufficient.

2. Checking memory

free -h

VII. Finding a process's executable

1. Use ps -ef | grep xxxxx to get the process's PID.
2. Run ls -l on its /proc entry; the exe link in the output points to the executable's path:
ls -l /proc/18283
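readlink prints the target directly, which is handy in scripts (18283 is the example PID from above):

readlink -f /proc/18283/exe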

VIII. Deleting files with garbled names

The command line cannot type the characters in a garbled (e.g. Chinese-named) file or directory, so to delete one, look up its inode number and delete by inode instead.
Use the ls -i command to view inode numbers (the number shown in front of each file).

Show the inode

ls -i 


Delete by inode

find -inum 7516192896 -delete

IX. Ways to modify PATH

Method 1:
export PATH=/usr/local/mongodb/bin:$PATH
// afterwards, check the result with echo $PATH
Takes effect: immediately
Lifetime: temporary; valid only in the current terminal window, and the old PATH returns once the window is closed
Scope: current user only

Method 2:
Edit the .bashrc file:
vim ~/.bashrc
// append on the last line:
export PATH=/usr/local/mongodb/bin:$PATH
Takes effect: either of the following
1. Close the current terminal window and open a new one
2. Run "source ~/.bashrc" for immediate effect
Lifetime: permanent
Scope: current user only

Method 3:
Edit the profile file:
vim /etc/profile
/export PATH  // find the line that sets PATH, then add:
export PATH=/usr/local/mongodb/bin:$PATH
Takes effect: after a system restart
Lifetime: permanent
Scope: all users

Method 4:
Edit the environment file:
vim /etc/environment
Add ":/usr/local/mongodb/bin" inside PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
Takes effect: after a system restart
Lifetime: permanent

Firewall

1. Check the current firewall state:
firewall-cmd --state
2. Start the firewall:
systemctl start firewalld
service firewalld start
3. Stop the firewall:
systemctl stop firewalld
service firewalld stop