Handling high load on a multi-core machine with inline scripting

Hi Team,

I analyzed mitmproxy's performance and found that it is unable to handle many concurrent user connections, especially with inline scripting. CPU usage stays high during request processing even with the @parallel decorator present.

I also found that mitmproxy is not designed to use multi-core CPUs and is bound to a single core. I would like to know more details about this, and how to extend its functionality to utilize modern multi-core hardware and improve performance.

Please guide me on how to get started and on possible contributions.

Thank you,
Manoj M

Hi, I’ve just tested mitmproxy 0.18's performance myself and compared it to squid3. It turns out mitmproxy is 13 times slower.

I analyzed the source code, and the main issue seems to be the software architecture.
Specifically, the way requests are handled: http://docs.mitmproxy.org/en/stable/_images/architecture.png

There are two segments that process requests: network protocol handling, and request/response processing (custom scripts, etc.).
First of all, the networking segment uses a multithreaded architecture where a separate thread is created per TCP/IP connection.
Then, every request and response is processed by a flow controller called FlowMaster.
This master controller runs on a single thread.
The networking segment uses so-called channels to pass requests to the flow controller.
These channels are essentially synchronized Python queues.
So every request/response is synchronized and processed on a single thread.
And this is where the performance bottleneck must be.
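To make that concrete, here is a minimal, simplified sketch of the pattern (hypothetical names, not the actual mitmproxy classes): many connection threads produce onto one synchronized queue, and a single master thread consumes it.

```python
# Simplified sketch of the thread-per-connection / single-master pattern.
# The names are hypothetical; this is not mitmproxy code.
import queue
import threading

flow_channel = queue.Queue()  # stands in for the channel between proxy threads and the master

def connection_thread(conn_id):
    # One of these runs per TCP connection; parsing happens here in parallel...
    for request_no in range(3):
        flow_channel.put((conn_id, request_no))

def master_loop():
    # ...but every flow funnels through this single thread.
    while True:
        item = flow_channel.get()
        if item is None:
            break
        conn_id, request_no = item
        # inline scripts / hooks would run here, one flow at a time
        print(f"processed request {request_no} of connection {conn_id}")

master = threading.Thread(target=master_loop)
master.start()
workers = [threading.Thread(target=connection_thread, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
flow_channel.put(None)
master.join()
```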

I have a lot of motivation to improve performance.
I’m planning to do the necessary patches.
But it would also be nice to retain the debugging functionality and make it possible to choose which mode the proxy runs in.

Guys, would you be interested in merging these changes into the upstream repository?

Good analysis. mitmproxy has rich functionality, but because of fundamental architectural issues I believe it's unable to scale at this stage. Your attempt is good and many people will benefit. Check out the nginx and SEDA architectures for ideas.

I’m sorry to say that performance is currently not our primary design goal. There are a few things that can be done to speed things up (e.g. one instance per core), but it’s nothing that will easily scale to hundreds of users.
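For what it's worth, a rough sketch of the "one instance per core" approach: launch several mitmdump processes on consecutive ports and spread connections across them (the port range and the balancing mechanism are assumptions here).

```python
# Rough sketch: start one mitmdump process per CPU core on consecutive ports.
# An external load balancer (or client-side port selection) then spreads
# connections across them. Flags beyond -p are omitted; adjust as needed.
import multiprocessing
import subprocess

BASE_PORT = 8080  # assumed free port range

procs = []
for i in range(multiprocessing.cpu_count()):
    port = BASE_PORT + i
    procs.append(subprocess.Popen(["mitmdump", "-p", str(port)]))

for p in procs:
    p.wait()
```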

What’s your use case here?

My use case is a regular high-performance proxy with the ability to run custom scripts to modify requests/responses.
But usually I don’t need interactive debugging mode.
Although it would be nice to optionally have it.
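For concreteness, this is the kind of inline script I have in mind: a minimal sketch using the addon-style event hooks from the current docs (older releases used a slightly different script signature).

```python
# Minimal inline-script sketch: tag outgoing requests, strip a response header.
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    # mark every outgoing request
    flow.request.headers["x-proxied-by"] = "mitmproxy"

def response(flow: http.HTTPFlow) -> None:
    # remove a header from every response
    if "server" in flow.response.headers:
        del flow.response.headers["server"]
```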

Anyways, I successfully eliminated the channel communication between the network layer and flow controller.
But the performance gains were negligible: +40% requests/s for HTTP and +15% for HTTPS.
So I did some more profiling and located another performance issue.
It’s in the HTTP stream handling.
I’ve profiled the code while running benchmark.
mitmproxy was consuming 100% CPU on all cores.
One of the biggest problems is that it receives data from the socket byte by byte.
See: https://github.com/mitmproxy/mitmproxy/blob/master/netlib/tcp.py#L250

The issue lies in the readline method of netlib.tcp.Reader.
This method is used to read the first line and the headers of HTTP requests/responses.
readline loops on recv(1) until it gets a newline character.
Such behavior leads to a lot of context switches and high CPU usage.
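For illustration, a minimal sketch of the two approaches; this is not mitmproxy code, just the idea of replacing per-byte recv(1) calls with a chunked, buffered reader.

```python
# Unbuffered vs. buffered line reading. The unbuffered variant mirrors the
# linked readline conceptually; the buffered one reads in chunks and slices
# lines out of a local buffer, avoiding one syscall per byte.
import socket

def readline_unbuffered(sock: socket.socket) -> bytes:
    line = b""
    while not line.endswith(b"\n"):
        byte = sock.recv(1)  # one syscall per byte
        if not byte:
            break
        line += byte
    return line

class BufferedReader:
    def __init__(self, sock: socket.socket, chunk_size: int = 4096):
        self.sock = sock
        self.buf = b""
        self.chunk_size = chunk_size

    def readline(self) -> bytes:
        while b"\n" not in self.buf:
            chunk = self.sock.recv(self.chunk_size)  # one syscall per chunk
            if not chunk:
                break
            self.buf += chunk
        line, sep, rest = self.buf.partition(b"\n")
        self.buf = rest
        return line + sep
```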

Have you considered using a third-party HTTP parser? E.g.: https://github.com/benoitc/http-parser/
It has more advanced HTTP parsing techniques.
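A rough sketch of how such a parser would be fed data incrementally, adapted from my memory of that project's README; the exact module and method names should be verified against the installed version.

```python
# Incremental parsing sketch with benoitc/http-parser (API names assumed from
# the project's README; double-check before relying on them).
import socket
from http_parser.pyparser import HttpParser  # pure-Python parser; a C binding also exists

def read_response_body(sock: socket.socket) -> bytes:
    p = HttpParser()
    body = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        p.execute(data, len(data))       # incremental parse, no per-byte reads
        if p.is_partial_body():
            body.append(p.recv_body())
        if p.is_message_complete():
            break
    return b"".join(body)
```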

Thanks for playing around and reporting back your findings - that is super helpful!

We’re always grateful for patches that improve mitmproxy’s performance without adding significant complexity overhead (netlib.tcp.Reader is the most obvious thing that would need a buffer). As I said before, performance is just not something we’re putting a high emphasis on at the moment - I’m sorry that you are not finding what you are looking for.

We re-use our HTTP parsing bits in pathod, so we do need a lot of flexibility here. Also, mitmproxy supports a bunch of real-world invalid-per-RFC cases, which makes using external parsers even more difficult.

If I had a need for a very high performance proxy and some spare time, I would probably write a custom TCP transparent proxy in Go/Rust that just intercepts and relays TCP messages, combined with some fuzzy logic to detect/modify HTTP. Doesn’t that sound exciting to build? :wink:
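To illustrate the idea (in Python here just for brevity; the suggestion above is Go/Rust), a toy byte-pump relay with no HTTP awareness at all, assuming a fixed upstream:

```python
# Toy TCP relay: accept connections locally and pipe bytes both ways to a
# fixed upstream. No HTTP parsing or modification; purely illustrative.
import asyncio

UPSTREAM_HOST, UPSTREAM_PORT = "example.org", 80  # assumed fixed upstream

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_reader, client_writer):
    upstream_reader, upstream_writer = await asyncio.open_connection(UPSTREAM_HOST, UPSTREAM_PORT)
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )

async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```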