Export dump files as JSON

I want to export my mitmproxy dump files (captured using the -w switch) to JSON.

I wrote a script which does this, more or less, here:

from mitmproxy import flow
import sys
import jsonpickle

# Read every flow from the dump file given as the first argument.
items = []
with open(sys.argv[1], "rb") as logfile:
    freader = flow.FlowReader(logfile)
    try:
        for f in freader.stream():
            items.append(f)
    except Exception as e:
        print("Flow file corrupted: {}".format(e))

# Serialize the collected flow objects (this pickles whole Python objects).
print(jsonpickle.encode(items))

This works, but (as expected) it dumps the Python objects themselves. The output looks like this:

[[{"py/object":"mitmproxy.models.http.HTTPFlow","server_conn":{"py/object":"mitmproxy.models.connections.ServerConnection","server_certs":[],"via":null,"protocol":null,"timestamp_tcp_setup":1473286806.433876,"cert":{"py/object":"netlib.certutils.SSLCert","x509":{"py/object":"OpenSSL.crypto.X509","_x509":{"py/object":"_cffi_backend.CDataGCP"}}},"timestamp_ssl_setup":1473286806.68306,"_TCPClient__source_address":{"py/object":"netlib.tcp.Address","family":2,"address":{"py/tuple":["1...

I’d love to find a better way to dump the files, one that gets me closer to the raw HTTP request/response and further from the internal Python details. Perhaps convert to a HAR file first, and then to JSON? I’m not much of a Python programmer, so I’m unsure how to get started inspecting the objects and pulling out the important information.

My goal is to have:

  • the request: headers and body
  • the response: headers and body
  • host and path.
  • maybe timing information?

Any suggestions?

Hi @xrd,

I believe the mistake is using jsonpickle, which serializes whole Python objects, internals included. You probably just want the json module from the standard library. Then you can either call json.dumps(f.get_state()) on each flow f, or construct a dictionary yourself and pass that to json.dumps(). https://github.com/mitmproxy/mitmproxy/blob/master/examples/har_dump.py shows the second approach on a more advanced level!
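
To make the "construct the dictionary yourself" route concrete, here is a minimal sketch. It assumes Python 3 and that each flow exposes request.host, request.path, request.method, headers, content and the timestamp_start/timestamp_end attributes, as recent mitmproxy versions do; names may differ in older releases, so treat them as assumptions to check against your version. The helper names (body_to_text, message_to_dict, flow_to_dict) and the fake_flow stand-in are made up for illustration, and the stand-in only exists so the snippet runs without a dump file.

```python
import json
from types import SimpleNamespace


def body_to_text(content):
    # Bodies are raw bytes, which json cannot encode; decode them leniently.
    if content is None:
        return None
    return content.decode("utf-8", errors="replace")


def message_to_dict(message):
    # Shared by requests and responses: both carry headers and a body.
    return {
        "headers": [[name, value] for name, value in message.headers.items()],
        "body": body_to_text(message.content),
    }


def flow_to_dict(f):
    # Keep only the fields asked for: host, path, both messages, timing.
    request = message_to_dict(f.request)
    request["method"] = f.request.method
    entry = {
        "host": f.request.host,
        "path": f.request.path,
        "request": request,
        "response": None,  # stays None if the flow has no response
        "started": f.request.timestamp_start,
    }
    if f.response is not None:
        response = message_to_dict(f.response)
        response["status_code"] = f.response.status_code
        entry["response"] = response
        entry["finished"] = f.response.timestamp_end
    return entry


# Hypothetical stand-in for one captured flow (assumed attribute names).
fake_flow = SimpleNamespace(
    request=SimpleNamespace(
        host="example.com", path="/index.html", method="GET",
        headers={"Accept": "*/*"}, content=b"",
        timestamp_start=1473286806.4,
    ),
    response=SimpleNamespace(
        status_code=200, headers={"Content-Type": "text/plain"},
        content=b"hello", timestamp_end=1473286806.9,
    ),
)

print(json.dumps([flow_to_dict(fake_flow)], indent=2))
```

In your script, that would mean appending flow_to_dict(f) instead of f inside the loop and printing json.dumps(items) at the end. Decoding bodies as UTF-8 with replacement characters is a simplification; binary bodies would be better served by base64.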
