Write a program to replace all non-English characters with spaces #114

@tashrifbillah

This is the error message the validation tool spits out:

Validating files...
  0%|          | 0/1 [00:00<?, ?it/s]
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/data/predict1/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Validation.py", line 486, in run
    response = post_request(self.api_scope, data, timeout=self.validation_timeout, headers = {'content-type':'text/csv'}, auth=self.auth)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 281, in post_request
    return _send_prepared_request(req.prepare(), timeout=timeout, deserialize_handler=deserialize_handler, error_handler=error_handler)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 244, in _retry
    tmp = func(*args, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/NDATools/Utils.py", line 267, in _send_prepared_request
    tmp = session.send(prepped, timeout=timeout)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/data/predict1/miniconda3/lib/python3.10/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 1327, in _send_request
    body = _encode(body, 'body')
  File "/data/predict1/miniconda3/lib/python3.10/http/client.py", line 166, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode character '\u02bc' in position 389181: Body ('ʼ') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
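The last frames show where the failure comes from: when a request body is a `str`, Python's `http.client` encodes it as Latin-1, so any character above U+00FF (here U+02BC) aborts the upload. The following is a minimal sketch for locating such characters before running validation, reporting line and column so they are easy to find in an editor. The function name `find_non_latin1` is hypothetical and not part of NDATools, and the offset in the traceback refers to the request body NDATools built, which may not match the file character-for-character.

```python
import sys
from typing import Iterator, Tuple


def find_non_latin1(text: str) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (offset, line, column, char) for each character Latin-1 cannot encode.

    Latin-1 covers code points U+0000..U+00FF, which is what http.client
    accepts when a request body is passed as str.
    """
    line, col = 1, 1
    for offset, ch in enumerate(text):
        if ord(ch) > 0xFF:
            yield offset, line, col, ch
        # Track position like a text editor: 1-based line and column.
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_non_latin1.py file1.csv file2.csv
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", newline="") as f:
            content = f.read()
        for offset, line, col, ch in find_non_latin1(content):
            print(f"{path}:{line}:{col} offset={offset} U+{ord(ch):04X} {ch!r}")
```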

The files and the non-English characters they contain are as follows:

    /data/predict1/to_nda/nda-submissions/tbi01_Prescient_screening.csv
    消炎藥 (Chinese: "anti-inflammatory drug")
    
    /data/predict1/to_nda/nda-submissions/network_combined/socdem01.csv
    ʼ (U+02BC, modifier letter apostrophe)
    
    /data/predict1/to_nda/nda-submissions/ampscz_pps01_Pronet.csv
    Many lines; search for 경기도 (Korean: Gyeonggi-do, a province of South Korea) and scroll down the same column
    
    /data/predict1/to_nda/nda-submissions/vitas01_Prescient_baseline.csv
    科興 (Chinese: Kexing, the Chinese name of Sinovac)
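A minimal sketch of the requested program, assuming "non-English" means anything outside printable ASCII (U+0020 to U+007E). That rule also replaces accented Latin letters such as é; a looser rule that keeps Latin-1 would already be enough to get past the `http.client` error. Tabs, newlines, and carriage returns are kept so CSV rows and fields stay intact, and each character is replaced by exactly one space. The names `replace_non_ascii` and `clean_file` and the `_ascii` output suffix are hypothetical choices; the script writes a cleaned copy instead of overwriting the submission file.

```python
import sys
from pathlib import Path

# Whitespace control characters that must survive so CSV structure is unchanged.
_KEEP = {"\t", "\n", "\r"}


def replace_non_ascii(text: str, replacement: str = " ") -> str:
    """Replace every character outside printable ASCII with `replacement`."""
    return "".join(
        ch if (" " <= ch <= "~") or ch in _KEEP else replacement
        for ch in text
    )


def clean_file(src: Path) -> Path:
    """Write a cleaned copy of `src` next to it and return the new path."""
    # utf-8-sig strips a leading BOM if present, so the first header
    # name is not turned into " name"; newline="" keeps line endings as-is.
    with src.open(encoding="utf-8-sig", newline="") as f:
        text = f.read()
    dst = src.with_name(src.stem + "_ascii" + src.suffix)
    # Writing with the ascii codec fails loudly if anything slipped through.
    with dst.open("w", encoding="ascii", newline="") as f:
        f.write(replace_non_ascii(text))
    return dst


if __name__ == "__main__":
    # Usage (hypothetical script name): python clean_csv.py a.csv b.csv
    for arg in sys.argv[1:]:
        print(f"{arg} -> {clean_file(Path(arg))}")
```

For example, a field containing 消炎藥 becomes three spaces, and `Donʼt` becomes `Don t`, so column positions inside each line are preserved.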

Metadata

Labels

wontfix (This will not be worked on)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
