Segmentation fault after successful transfer #3

@drewhemm

Output on server:

# hdrdmacp -s -n 32 -m 8GB
Looking for IB devices ...

=============================================
Found 1 devices
---------------------------------------------
   device 0 : mlx4_0 : uverbs0 : IB : InfiniBand channel adapter : Num. ports=2 : port num=1 : lid=2
=============================================

Device mlx4_0 opened. num_comp_vectors=32
Port attributes:
           state: 4
         max_mtu: 5
      active_mtu: 5
  port_cap_flags: 38865002
      max_msg_sz: 1073741824
    active_width: 2
    active_speed: 4
      phys_state: 5
      link_layer: 1
buff_len_GB: 8
num_buff_sections: 32
We got this far...
Created 32 buffers of 250MB (8GB total)
Listening for connections on port ... 10470
=== [10 sec avg.] 0 GB/s  --  0 TB total received
=== [10 sec avg.] 0 GB/s  --  0 TB total received
=== [10 sec avg.] 0 GB/s  --  0 TB total received
Receiving file: /root/windows.iso
hi->flags: 0x1


Message from syslogd@HOSTNAME at May  9 16:45:46 ...
 kernel:[15768.841039] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
^C^C^C # tried to exit the program here, but to no avail
Message from syslogd@HOSTNAME at May  9 16:46:14 ...
 kernel:[15796.840997] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
Segmentation fault

Output on client:

# ./hdrdmacp windows.iso 192.168.19.1:/root/windows.iso
Looking for IB devices ...

=============================================
Found 1 devices
---------------------------------------------
   device 0 : mlx4_0 : uverbs0 : IB : InfiniBand channel adapter : Num. ports=2 : port num=1 : lid=1
=============================================

Device mlx4_0 opened. num_comp_vectors=96
Port attributes:
           state: 4
         max_mtu: 5
      active_mtu: 5
  port_cap_flags: 38865000
      max_msg_sz: 1073741824
    active_width: 2
    active_speed: 4
      phys_state: 5
      link_layer: 1
Created 4 buffers of 250MB (1GB total)
IP address: 192.168.19.1 (192.168.19.1)
Connected to 192.168.19.1:10470
Sending file: windows.iso-> (192.168.19.1:)/root/windows.iso   (5.50971 GB)
  queued 9MB (5509/5509 MB -- 100%  - 11.3267 Gbps)   ps)
  waiting for final 1 transfers to complete ...
  Transferred 5.50971 GB in 2.71587 sec  (16.2297 Gbps)
  I/O rate reading from file: 1.65955 sec  (26.56 Gbps)
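
For anyone reading the logs, the numeric port attributes and the reported rate can be decoded roughly as follows. This is a minimal Python sketch, not part of hdrdmacp: the lookup tables assume the standard libibverbs `ibv_port_attr` encodings (only the SDR/DDR/QDR speed codes are covered), and the helper names `decode_port` and `gbps` are made up for illustration.

```python
# Lookup tables for the numeric port attributes printed in the logs.
# Assumption: values follow the libibverbs ibv_port_attr encodings.
PORT_STATE = {1: "DOWN", 2: "INIT", 3: "ARMED", 4: "ACTIVE", 5: "ACTIVE_DEFER"}
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
WIDTH_LANES = {1: 1, 2: 4, 4: 8, 8: 12}
SPEED_GBPS_PER_LANE = {1: 2.5, 2: 5.0, 4: 10.0}  # SDR, DDR, QDR only
PHYS_STATE = {2: "Polling", 3: "Disabled", 5: "LinkUp"}
LINK_LAYER = {0: "unspecified", 1: "InfiniBand", 2: "Ethernet"}


def decode_port(attrs):
    """Translate the raw integers from the log into readable values."""
    lanes = WIDTH_LANES.get(attrs["active_width"])
    per_lane = SPEED_GBPS_PER_LANE.get(attrs["active_speed"])
    return {
        "state": PORT_STATE.get(attrs["state"], "unknown"),
        "active_mtu_bytes": MTU_BYTES.get(attrs["active_mtu"]),
        "phys_state": PHYS_STATE.get(attrs["phys_state"], "unknown"),
        "link_layer": LINK_LAYER.get(attrs["link_layer"], "unknown"),
        "lanes": lanes,
        # Signalling rate of the link; None if a code is not in the tables.
        "signalling_gbps": lanes * per_lane if lanes and per_lane else None,
    }


def gbps(gigabytes, seconds):
    """Throughput in Gbps, taking 1 GB = 10^9 bytes."""
    return gigabytes * 8 / seconds


if __name__ == "__main__":
    # Values copied from the "Port attributes" block in both logs.
    port = decode_port({"state": 4, "active_mtu": 5, "active_width": 2,
                        "active_speed": 4, "phys_state": 5, "link_layer": 1})
    print(port)
    # Size and elapsed time copied from the client's "Transferred" line.
    print(f"{gbps(5.50971, 2.71587):.4f} Gbps")
```

Under those assumptions, both ends report an active 4X InfiniBand link with a 4096-byte MTU, so the link itself looks healthy before the transfer starts.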

The transfer looked good from the client side, and I verified that the file size and md5sum of the destination file match the source. The only problem is the soft lockup and segmentation fault on the server. If I can find a solution for the seg fault, I'll be a happy man!
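
The size and md5sum check can be reproduced with something like the sketch below. It is a plain Python illustration, not part of hdrdmacp, and the function names `file_md5` and `same_file_contents` are hypothetical. It assumes both paths are reachable from one machine; with the source and destination on different hosts, run `file_md5` on each side and compare the digests by hand.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-GB files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(src, dst):
    """True if both files have the same size and the same MD5 digest."""
    # Cheap check first: different sizes can never match.
    if os.path.getsize(src) != os.path.getsize(dst):
        return False
    return file_md5(src) == file_md5(dst)
```

For example, `same_file_contents("windows.iso", "/root/windows.iso")` would compare the two paths from the commands above if both were visible on the same host.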

It looks like the IB connection itself, as seen by opensm, was interrupted:

# opensm
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048
Command Line Arguments:
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048

Using default GUID 0x2c903004bfc0b
Entering DISCOVERING state

Entering MASTER state


Message from syslogd@HOSTNAME at May  9 16:45:46 ...
 kernel:[15768.841039] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]

Message from syslogd@HOSTNAME at May  9 16:46:14 ...
 kernel:[15796.840997] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
SM port is down

Entering DISCOVERING state
