Cannot load CIC-IDS2017 CSV data set: DatasetNotFoundError #5429

@ajpotts

Description

When I load the Intrusion Detection Evaluation Dataset (CIC-IDS2017), which is distributed as CSV at https://www.unb.ca/cic/datasets/ids-2017.html, `ak.read_csv` fails with the following error:

df = ak.read_csv("TrafficLabelling/M*")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 df = ak.read_csv("/home/amandapotts/git/arkouda-netflow-demo/data/TrafficLabelling/M*")

File ~/git/arkouda/arkouda/pandas/io.py:1139, in read_csv(filenames, datasets, column_delim, allow_errors)
   1136 elif datasets is None:
   1137     datasets = get_columns(filenames, col_delim=column_delim, allow_errors=allow_errors)
-> 1139 rep_msg = generic_msg(
   1140     cmd="readcsv",
   1141     args={
   1142         "filenames": filenames,
   1143         "nfiles": len(filenames),
   1144         "datasets": datasets,
   1145         "num_dsets": len(datasets),
   1146         "col_delim": column_delim,
   1147         "allow_errors": allow_errors,
   1148     },
   1149 )
   1150 rep = json.loads(rep_msg)  # See GenSymIO._buildReadAllMsgJson for json structure
   1151 _parse_errors(rep, allow_errors)

File ~/git/arkouda/arkouda/core/client.py:1156, in generic_msg(cmd, args, payload, send_binary, recv_binary)
   1154     else:
   1155         assert payload is None
-> 1156         return cast(Channel, channel).send_string_message(
   1157             cmd=cmd, args=msg_args, size=size, recv_binary=recv_binary
   1158         )
   1159 except KeyboardInterrupt as e:
   1160     # Reset the socket before re-raising to keep the REQ/REP stream in sync.
   1161     cast(Channel, channel).connect(timeout=0)

File ~/git/arkouda/arkouda/core/client.py:577, in ZmqChannel.send_string_message(self, cmd, recv_binary, args, size, request_id)
    575 # raise errors or warnings sent back from the server
    576 if return_message.msgType == MessageType.ERROR:
--> 577     raise RuntimeError(return_message.msg)
    578 elif return_message.msgType == MessageType.WARNING:
    579     warnings.warn(return_message.msg)

RuntimeError: 1 errors: DatasetNotFoundError: DatasetNotFoundError Line 481 In CSVMsg.read_csv_pattern: The dataset Source IP was not found in /home/amandapotts/git/arkouda-netflow-demo/data/TrafficLabelling/Monday-WorkingHours.pcap_ISCX.csv

I suspect the cause is the spaces in the header names: the column named in the error, `Source IP`, contains one.
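
One way to test the whitespace hypothesis without involving the server is to look at the raw header line directly. The snippet below is a minimal sketch using only the Python standard library. The helper names (`split_header`, `padded_names`, `strip_header`) and the sample header are hypothetical and are not part of Arkouda or copied from the real file. It shows how to spot column names that carry leading or trailing whitespace and how to rebuild a header without it. Whether Arkouda's CSV reader treats such names differently is an assumption here, not something the traceback confirms.

```python
import csv
import io


def split_header(header_line, delimiter=","):
    """Return the raw column names from a CSV header line, whitespace preserved."""
    return next(csv.reader(io.StringIO(header_line), delimiter=delimiter))


def padded_names(names):
    """Return the names that carry leading or trailing whitespace."""
    return [name for name in names if name != name.strip()]


def strip_header(header_line, delimiter=","):
    """Rebuild the header line with surrounding whitespace removed from each name."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow([name.strip() for name in split_header(header_line, delimiter)])
    # csv.writer appends a line terminator; drop it so only the header text remains.
    return out.getvalue().rstrip("\r\n")


# Hypothetical header, for illustration only; not copied from the real file.
sample = "Flow ID, Source IP, Source Port,Label"
print(padded_names(split_header(sample)))
print(strip_header(sample))
```

To check a real file, pass its first line to `split_header` (for example, `open(path).readline().rstrip("\r\n")`). If `padded_names` returns a non-empty list, rewriting the header with `strip_header` before calling `ak.read_csv` may be worth trying as a workaround.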

Labels: File IO (Arkouda file IO capabilities), bug (Something isn't working)
