Categories
python python-3.x string

Convert bytes to a string

3397

I captured the standard output of an external program into a bytes object:

>>> from subprocess import Popen, PIPE
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>>
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

I want to convert that to a normal Python string, so that I can print it like this:

>>> print(command_stdout)
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I tried the binascii.b2a_qp() function, but got the same bytes object back:

>>> import binascii
>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert the bytes object to a str with Python 3?


  • 133

    why doesn’t str(text_bytes) work? This seems bizarre to me.

    Mar 14, 2019 at 22:25

  • 53

    @CharlieParker Because str(text_bytes) doesn’t specify the encoding. Depending on what’s in text_bytes, text_bytes.decode('cp1250') might result in a very different string from text_bytes.decode('utf-8').

    Mar 31, 2019 at 17:32


  • 14

    So the str function no longer converts to a real string. You HAVE to specify an encoding explicitly, for some reason I am too lazy to read up on. Just decode it as UTF-8 and see if your code works, e.g. var = var.decode('utf-8')

    Apr 22, 2019 at 23:32


  • 15

    @CraigAnderson: unicode_text = str(bytestring, character_encoding) works as expected on Python 3. Still, unicode_text = bytestring.decode(character_encoding) is preferable, to avoid confusion with plain str(bytes_obj), which produces a text representation of bytes_obj instead of decoding it to text: str(b'\xb6', 'cp1252') == b'\xb6'.decode('cp1252') == '¶' and str(b'\xb6') == "b'\\xb6'" == repr(b'\xb6') != '¶'

    – jfs

    Apr 12, 2020 at 5:11
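To see the difference jfs describes, here is a quick Python 3 check. The byte values and the choice of cp1252 and cp1250 are just for illustration:

```python
raw = b'\xb6'

# Without an encoding, str() returns the printable representation of the bytes
as_repr = str(raw)
print(as_repr)                                      # b'\xb6'

# With an encoding, str() decodes, exactly like bytes.decode()
as_text = str(raw, 'cp1252')
print(as_text == raw.decode('cp1252') == '\u00b6')  # True (U+00B6 is the pilcrow sign)

# The same bytes decode to different text under different encodings
utf8_bytes = 'é'.encode('utf-8')                    # b'\xc3\xa9'
print(utf8_bytes.decode('utf-8') == 'é')            # True
print(utf8_bytes.decode('cp1250') == 'é')           # False: cp1250 maps each byte to its own character
```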

5231

Decode the bytes object to produce a string:

>>> b"abcde".decode("utf-8") 
'abcde'

The above example assumes that the bytes object is in UTF-8, because it is a common encoding. However, you should use the encoding your data is actually in!
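Applied to the question's situation, a minimal Python 3 sketch could look like this. It uses subprocess.run (capture_output needs Python 3.7+) instead of Popen, and runs the current Python interpreter via sys.executable as a stand-in for ls -l so the example works on any platform. The UTF-8 encoding is an assumption; substitute whatever your program actually writes:

```python
import subprocess
import sys

# Run a child process and capture its standard output as bytes.
# sys.executable is only a portable stand-in for an external program such as `ls -l`.
result = subprocess.run(
    [sys.executable, '-c', 'print("total 0")'],
    capture_output=True,
    check=True,  # raise CalledProcessError if the child exits with a non-zero status
)
command_stdout = result.stdout  # bytes, e.g. b'total 0\n' (b'total 0\r\n' on Windows)

# Decode with the encoding the child actually used (assumed UTF-8 here)
text = command_stdout.decode('utf-8')
for line in text.splitlines():
    print(line)
```

If you prefer to get a str straight away, subprocess.run also accepts an encoding argument (e.g. encoding='utf-8'), in which case result.stdout is already decoded.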


  • 85

    Using "windows-1252" is not reliable either (e.g., for other language versions of Windows). Wouldn’t it be best to use sys.stdout.encoding?

    – nikow

    Jan 3, 2012 at 15:20

  • 20

    Maybe this will help somebody further: sometimes you use a byte array for, e.g., TCP communication. If you want to convert the byte array to a string while cutting off trailing '\x00' characters, this answer is not enough. Use b'example\x00\x00'.decode('utf-8').rstrip('\x00') then.

    – Wookie88

    Apr 16, 2013 at 13:27

  • 3

    I’ve filed a bug about documenting it at bugs.python.org/issue17860. Feel free to propose a patch; if contributing is too hard, comments on how to improve the docs are welcome.

    Apr 28, 2013 at 14:40

  • 67

    This doesn’t handle arbitrary binary data. In Python 2.7.6, b"\x80\x02\x03".decode("utf-8") raises UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte. Python 3 raises the same error.

    – martineau

    May 18, 2014 at 20:12

  • 19

    If the content is random binary values, the UTF-8 conversion is likely to fail. Instead, see @techtonik’s answer (below): stackoverflow.com/a/27527728/198536

    – wallyk

    May 27, 2015 at 21:21
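The two problems raised in these comments, NUL padding and bytes that are not valid UTF-8, can be handled roughly like this. This is a sketch: the sample bytes come from the comments above, and which error handler is appropriate depends on your data:

```python
# Fixed-size buffers (e.g. from a socket) are often padded with NUL bytes.
# rstrip removes only the trailing NULs; strip would also remove leading ones.
padded = b'example\x00\x00'
print(padded.decode('utf-8').rstrip('\x00'))    # example

# Arbitrary binary data is usually not valid UTF-8, so strict decoding raises.
binary = b'\x80\x02\x03'
try:
    binary.decode('utf-8')
except UnicodeDecodeError as exc:
    print('strict decoding failed:', exc.reason)  # strict decoding failed: invalid start byte

# Error handlers keep going instead of raising.
# 'replace' substitutes U+FFFD for each bad byte (information is lost).
replaced = binary.decode('utf-8', errors='replace')
print(ascii(replaced))                           # '\ufffd\x02\x03'

# 'backslashreplace' keeps the original byte value visible as an escape sequence.
escaped = binary.decode('utf-8', errors='backslashreplace')
print(ascii(escaped))                            # '\\x80\x02\x03'
```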

374

Decode the byte string and turn it into a character (Unicode) string.


Python 3:

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

Python 2:

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)


  • 5

    On Python 3, what if the string is in a variable?

    – Alaa M.

    Feb 27, 2020 at 14:47

  • 2

    @AlaaM.: the same. If you have variable = b'hello', then unicode_text = variable.decode(character_encoding)

    – jfs

    Apr 12, 2020 at 5:03

  • 5

    for me, variable = variable.decode() automagically got it into a string format I wanted.

    – Alex Hall

    Jul 19, 2020 at 3:41


  • 5

    @AlexHall FWIW, you might be interested to know that the automagic uses UTF-8, which is the default value of the encoding argument if you do not supply one. See bytes.decode.

    – spectras

    Apr 17, 2021 at 11:12

  • Using any decoding gives me: AttributeError: 'str' object has no attribute 'decode'

    – Seth

    May 27 at 17:43
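The AttributeError in the last comment usually means the value was already a str, so there was nothing left to decode. A small helper can accept either type; to_text is a hypothetical name, not part of any library, and UTF-8 is assumed as the default, matching bytes.decode():

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as str, decoding only if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    if isinstance(value, str):
        return value  # already text; calling .decode() here would raise AttributeError
    raise TypeError(f'expected bytes or str, got {type(value).__name__}')


variable = b'hello'
print(variable.decode())   # hello  (encoding defaults to 'utf-8')
print(to_text(variable))   # hello
print(to_text('hello'))    # hello
```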

249

This joins a list of byte values (integers) into a string:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'


  • 6

    Thank you, your method worked for me when none other did. I had a non-encoded byte array that I needed turned into a string. Was trying to find a way to re-encode it so I could decode it into a string. This method works perfectly!

    May 10, 2014 at 0:28

  • 7

    @leetNightshade: yet it is terribly inefficient. If you have a byte array you only need to decode.

    – Martijn Pieters

    Sep 1, 2014 at 16:25

  • 20

    @Martijn Pieters I just did a simple benchmark against these other answers (stackoverflow.com/a/3646405/353094), running 10,000 iterations several times, and the above solution was actually much faster every single time. For 10,000 runs in Python 2.7.7 it takes 8 ms, versus 12 ms and 18 ms for the others. Granted, there could be some variation depending on input, Python version, etc. It doesn’t seem too slow to me.

    Sep 1, 2014 at 17:06

  • 10

    @Sasszem: this method is a perverted way to express: a.decode('latin-1') where a = bytearray([112, 52, 52]) (“There Ain’t No Such Thing as Plain Text”. If you’ve managed to convert bytes into a text string then you used some encoding—latin-1 in this case)

    – jfs

    Nov 16, 2016 at 3:16

  • 8

    For Python 3 this should be equivalent to bytes([112, 52, 52]).decode('latin-1'). By the way, bytes is a bad name for a local variable precisely because it’s a Python 3 builtin.

    Oct 11, 2017 at 15:14
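jfs’s point can be checked directly: for integers 0 to 255, chr() gives the code point with the same number, which is exactly what the Latin-1 codec does. This Python 3 sketch reuses the answer’s bytes_data list; the é example is added for illustration:

```python
bytes_data = [112, 52, 52]

# chr() maps each integer 0-255 to the code point with the same number,
# which is the same mapping the Latin-1 codec uses.
joined = ''.join(map(chr, bytes_data))
decoded = bytes(bytes_data).decode('latin-1')
print(joined, decoded, joined == decoded)          # p44 p44 True

# The chr() approach only matches UTF-8 decoding for ASCII data (all values below 128).
utf8_values = list('é'.encode('utf-8'))            # [195, 169]
print(''.join(map(chr, utf8_values)) == 'é')       # False
print(bytes(utf8_values).decode('utf-8') == 'é')   # True
```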