Python 3.0 urllib.parse.urlencode()

2 messages in this thread from Chicago Python Users Group in 2009-02

  1.   missing
  2.   Dan Mahn <dan.mahn@di...com> 02-26 17:17
  3.   David Beazley <d-beazley@sb...net> 02-27 11:28

This message appeared in a previous month, was never archived, or was lost.

Dan Mahn <dan.mahn@di...com>

2009-02-26 17:17:56
Hello,

I'm writing a test suite that uses URLs and HTTP requests.

I would like to use 'latin-1' to encode some of the query string
information.  I noticed in the Python 3.0 library documentation that
urllib.parse.quote_plus() accepts "bytes".  When I use quote_plus()
directly, I can encode my string in the encoding of my choice (in this
case "latin-1") and pass the resulting bytes to quote_plus(), and it
does exactly what I expect.
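
A minimal sketch of the behavior described above, assuming a sample value ("café") that is not in the original message. It shows that quote_plus() percent-encodes whatever bytes it is given, while a str argument is encoded as UTF-8 by default.

```python
from urllib.parse import quote_plus

# Sample value chosen for illustration; not taken from the original post.
text = "café"

# Encode to latin-1 first, then quote the resulting bytes.
latin1_quoted = quote_plus(text.encode("latin-1"))

# Passing a str lets quote_plus() encode it, using UTF-8 by default.
utf8_quoted = quote_plus(text)

print(latin1_quoted)  # caf%E9
print(utf8_quoted)    # caf%C3%A9
```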

Considering quote_plus() takes "bytes", I expected urlencode() to allow
the same, but it does not.  However, with a slight modification, it can.
For instance (I am using doseq=0):

changing:

            v = quote_plus(str(v))

to ...

            if isinstance(v, bytes):
                v = quote_plus(v)
            else:
                v = quote_plus(str(v))


Produces the result I would expect.
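
The change above can be tried without editing the standard library. The sketch below is a hypothetical stand-in (urlencode_bytes_aware is an invented name, not a library function) that mirrors only the non-doseq path, with the bytes check applied to values. It also shows what the unpatched line does to a bytes value: str() turns it into its repr, and that repr is what gets quoted.

```python
from urllib.parse import quote_plus


def urlencode_bytes_aware(query):
    """Hypothetical stand-in for the non-doseq path of urlencode()."""
    if hasattr(query, "items"):
        query = query.items()
    pairs = []
    for k, v in query:
        k = quote_plus(str(k))
        if isinstance(v, bytes):
            # Quote bytes as-is so the caller's chosen encoding survives.
            v = quote_plus(v)
        else:
            v = quote_plus(str(v))
        pairs.append(k + "=" + v)
    return "&".join(pairs)


# Sample data for illustration only.
latin1_value = "café".encode("latin-1")
patched = urlencode_bytes_aware({"name": latin1_value, "n": 1})

# The unpatched line quotes str(v), i.e. the repr of the bytes object.
unpatched_value = quote_plus(str(latin1_value))

print(patched)          # name=caf%E9&n=1
print(unpatched_value)
```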

Does this seem like something that has been overlooked, and should be 
the default behavior?  Python 3.0 has introduced some changes in the 
APIs here, and this appears to affect functionality that is unique to 3.0.

Additionally, it would seem to me that urlencode() itself should take an 
"encoding" parameter, and possibly other parameters, that would be 
passed on to quote_plus().  However, that could be worked around as long 
as urlencode() could use "bytes", as above.  (Actually, the above ought 
to be extended to "k", as well as the "doseq" side of the if.)
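
For readers on a later interpreter: as far as the standard library documentation states, urlencode() gained "encoding" and "errors" parameters and support for bytes values in Python 3.2. The sketch below assumes such a version and uses sample data that is not from the original message.

```python
from urllib.parse import urlencode

# Sample parameters for illustration only.
params = {"name": "café", "city": "Zürich"}

# str values are encoded with the requested encoding before quoting.
latin1_query = urlencode(params, encoding="latin-1")

# bytes values are quoted as given, with no further encoding step.
bytes_query = urlencode({"name": "café".encode("latin-1")})

print(latin1_query)  # name=caf%E9&city=Z%FCrich
print(bytes_query)   # name=caf%E9
```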

_______________________________________________
Chicago mailing list
Chicago@py...org
http://mail.python.org/mailman/listinfo/chicago

David Beazley <d-beazley@sb...net>

2009-02-27 11:28:32
> Does this seem like something that has been overlooked, and should be
> the default behavior?  Python 3.0 has introduced some changes in the
> APIs here, and this appears to affect functionality that is unique
> to 3.0.
>
> Additionally, it would seem to me that urlencode() itself should take
> an "encoding" parameter, and possibly other parameters, that would be
> passed on to quote_plus().  However, that could be worked around as
> long as urlencode() could use "bytes", as above.  (Actually, the above
> ought to be extended to "k", as well as the "doseq" side of the if.)

I don't know if this is something that's been overlooked or not.
However, I recently did a fairly thorough pass through the Python 3.0  
libraries when updating my book and found *numerous* problems/bugs  
related to inconsistent treatment of strings/bytes--especially with  
regard to encodings of binary data into text formats (Base 64, quopri,  
hex, etc.).  Just as an example:

>>> import base64
>>> base64.b64encode(b'hello')
b'aGVsbG8='

Discussion: The purpose of base64 is to encode data into text.  So,  
should that result be a byte string or a text string?
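
To make the question concrete, here is a small sketch of what a caller has to do today to get text out of base64. The variable names are illustrative. Since the base64 alphabet is pure ASCII, decoding the result as ASCII is safe.

```python
import base64

# b64encode() takes bytes and returns bytes.
encoded = base64.b64encode(b"hello")

# An explicit decode step is needed to obtain a text string.
as_text = encoded.decode("ascii")

print(type(encoded).__name__, encoded)  # bytes b'aGVsbG8='
print(type(as_text).__name__, as_text)  # str aGVsbG8=
```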

It wouldn't surprise me if there are inconsistencies in other places  
(like urlencode).  You might check the Python bug tracker and submit a  
report to see what happens (the developers will let you know if they  
think it's expected behavior or not).

Cheers,
Dave


