Python – Decoding EBCDIC

character-encoding, ebcdic, python

I'm being passed data that is EBCDIC-encoded. Something like:

s = u'@@@@@@@@@@@@@@@@@@@ÂÖÉâÅ@ÉÄ'

Calling .decode('cp500') on it is not the right approach, but what is? If I copy the string into something like Notepad++, I can convert it from EBCDIC to ASCII, but I can't find a viable way to do the same in Python. For what it's worth, the correct result is: BOISE ID (plus or minus space padding).

The information is being retrieved from a file of lines of JSON objects. That file looks like this:

{ "command": "flush-text", "text": "@@@@@[email protected]@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }
{ "command": "flush-text", "text": "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\u00C9\[email protected]\u00D5\u00A4\u0094\u0082\u0085\[email protected]@@@@@@@@@\u00D9\u00F5\u00F9\u00F7\u00F6\u00F8\u00F7\u00F2\u00F4" }
{ "command": "flush-text", "text": "@@@@@OmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmO" }
{ "command": "flush-text", "text": "@@@@@[email protected]@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }

And the processing loop looks something like:

import json

with open('myfile.txt', 'rb') as fh:
    for line in fh:
        data = json.loads(line)

Best Solution

If Notepad++ converts it correctly, then this should be all you need:

Python 2.7:

import io
import json

# io.open decodes each line from cp500 as it is read
with io.open('myfile.txt', 'r', encoding="cp500") as fh:
    for line in fh:
        data = json.loads(line)

Python 3.x:

import json

# the built-in open decodes each line from cp500 as it is read
with open('myfile.txt', 'r', encoding="cp500") as fh:
    for line in fh:
        data = json.loads(line)

This uses an io.TextIOWrapper to decode the file as it is read, using the given encoding. In Python 2.x, the io module provides the Python 3 open, including codec handling (via TextIOWrapper) and universal newline support.
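
Opening the file with encoding="cp500" only helps if the whole file is EBCDIC. The sample in the question looks like plain ASCII JSON in which only the "text" values carry EBCDIC byte values (as literal characters or \u00XX escapes). If that is the case, converting each field after json.loads may be closer to what is needed. The following is a minimal Python 3 sketch under that assumption: ebcdic_field_to_text is a hypothetical helper name, and the shortened JSON line is made up for illustration. It maps each code point back to a single byte with latin-1 and then decodes those bytes as cp500.

```python
import json


def ebcdic_field_to_text(value, codec="cp500"):
    """Hypothetical helper: treat each code point (U+0000 to U+00FF) as one
    EBCDIC byte, then decode those bytes with the given codec."""
    return value.encode("latin-1").decode(codec)


# Shortened form of the sample string from the question, written with escapes.
# '@' is 0x40, which is the space character in cp500.
sample = "@@@\u00c2\u00d6\u00c9\u00e2\u00c5@\u00c9\u00c4"
decoded = ebcdic_field_to_text(sample)
print(repr(decoded))

# The same conversion applied to one JSON line (illustrative, not from the file).
# json.loads turns the \u00XX escapes into code points before the helper runs.
line = '{ "command": "flush-text", "text": "@@\\u00C9\\u00C4" }'
record = json.loads(line)
text = ebcdic_field_to_text(record["text"])
print(repr(text))
```

The latin-1 step raises UnicodeEncodeError if a field contains a code point above U+00FF, which would suggest the field is not raw EBCDIC bytes after all. cp500 is the code page named in the question; other EBCDIC code pages differ in some punctuation, so the codec argument may need adjusting for the actual data source.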