False Negatives occur when there is a legitimate flow of tainted data from a source to a sink, but Pysa fails to catch it.
Pysa relies on type information from Pyre to identify sources and sinks, and to build the call graph needed to follow the propagation of taint between the two. Just because type information is available somewhere in the code does not mean Pyre will know the type of an object in the exact place where Pysa needs it. See the documentation on Coverage Increasing Strategies for tips on how to increase type coverage. The following examples demonstrate how lost type information leads to lost flows.
HttpRequest.GET is a common source of UserControlled data in Django. If the request objects are not explicitly typed as HttpRequest, however, Pysa will fail to detect obvious issues:
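A minimal sketch of the difference, assuming a Django-style view. The HttpRequest class below is a small stand-in so the snippet runs without Django installed; in a real project it would be django.http.HttpRequest, with HttpRequest.GET modeled as a UserControlled source and eval modeled as a sink. The function names are hypothetical.

```python
from typing import Dict


class HttpRequest:
    """Stand-in for django.http.HttpRequest (assumption, for runnability)."""

    def __init__(self, get: Dict[str, str]) -> None:
        self.GET = get


def this_is_missed(request):
    # 'request' has no annotation, so Pysa cannot tell that request.GET
    # is HttpRequest.GET and therefore a source; the flow is not reported.
    return eval(request.GET["command"])


def this_is_caught(request: HttpRequest):
    # The annotation lets Pysa resolve request.GET to HttpRequest.GET,
    # so the flow into eval() can be reported.
    return eval(request.GET["command"])


def run(request: HttpRequest):
    # The type is known here, but that does not carry into the
    # unannotated parameter of this_is_missed.
    return this_is_missed(request), this_is_caught(request)
```

Both functions behave identically at runtime; only the annotation on the parameter changes what the analysis can see.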
Pysa relies on type information in order to build a call graph that accurately tracks a method call of foo.bar(x) to the def bar(self, x) implementation. Without type information on foo, the object the method is called on, Pysa will be unable to figure out how to dispatch the call and the flow will be lost:
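The following sketch illustrates the dispatch problem under simplifying assumptions: Runner and both functions are hypothetical names, and command is assumed to already carry tainted data from some source.

```python
class Runner:
    def run(self, command: str):
        # eval() stands in for a sink such as RemoteCodeExecution.
        return eval(command)


def this_is_missed(command: str, runner):
    # 'runner' is untyped, so Pysa cannot resolve runner.run to
    # Runner.run and never learns that 'command' reaches eval().
    return runner.run(command)


def this_is_caught(command: str, runner: Runner):
    # With the annotation, the call dispatches to Runner.run in the
    # call graph and the flow can be followed to the sink.
    return runner.run(command)
```

Annotating the receiver is usually enough; the method being called does not need to change.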
To allow for parallel processing, Pysa is limited in its ability to track taint flows through global variables. For example, Pysa will not detect an issue in the following code:
The best workaround is to avoid using globals in your code. If a refactor isn't
possible, but you do know what globals should be considered tainted, you can
explicitly declare the global tainted in your
- Identify the flow you expect to see
- Every function call/return that propagates the tainted data from the source to the sink
- Every variable that the tainted data passes through, within the identified functions. This usually includes the parameter which initially received the taint, and then 0 or more local variables that hold the tainted data as it is transformed in some way.
- Add a reveal_taint statement to each of the variables identified in the previous step
- Run Pysa using the same command you used when the false negative manifested, but also include the --noninteractive flag (e.g. pyre --noninteractive analyze)
- Start following the flow from source to sink in your code, and find the corresponding output for each reveal_taint statement
- Note that each time Pysa analyzes a function (could be many times), it will dump the latest taint information, so the last instance of reveal_taint output for a given line will be the most accurate and is the one you should look at.
- reveal_taint output exposes some of the implementation details of Pysa, by giving you Revealed forward taint and Revealed backward taint messages. Without going into those details, you should expect to see either the source name (e.g. UserControlled) you care about appearing in the Revealed forward taint output, or the sink name (e.g. RemoteCodeExecution) you care about in the Revealed backward taint output.
- For each reveal_taint, following the flow of tainted data from source to sink, locate the output in the logs that reveals the taint (e.g. integration_test.reveal_taint:20:4-20:16: Revealed forward taint for ``command``:).
- If you see your source or sink name in the output, then go back to 1) and carry on with the next reveal_taint statement. If you do not see the source or sink name, then the cause of the false negative is likely between your previous reveal_taint and the one you're currently looking at. Refer to the "Common Causes of False Negatives" section above for ideas on the cause, and how to fix it.
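Picking out the last reveal_taint output for each location by hand can be tedious in a long log. The helper below is a rough sketch of one way to do it. It assumes each dump starts with a header shaped like the example line quoted above and that any following lines belong to that dump; the function name and the sample log lines are hypothetical, and real Pysa output may differ.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed header shape, based on the example line quoted above. Using
# search() tolerates prefixes such as timestamps before the location.
HEADER = re.compile(
    r"(?P<location>\S+:\d+:\d+-\d+:\d+): "
    r"Revealed (?P<direction>forward|backward) taint for"
)


def last_revealed_taint(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Map (location, direction) to the text of its last dump in the log."""
    latest: Dict[Tuple[str, str], List[str]] = {}
    current: List[str] = []
    for line in lines:
        match = HEADER.search(line)
        if match:
            # A new dump starts; it replaces any earlier dump for this key.
            current = [line]
            latest[(match["location"], match["direction"])] = current
        else:
            # Continuation lines belong to the most recent dump.
            current.append(line)
    return {key: "\n".join(block) for key, block in latest.items()}


# Hypothetical log excerpt: the same line is dumped twice.
sample = [
    "integration_test.reveal_taint:20:4-20:16: Revealed forward taint for `command`:",
    "  (earlier pass)",
    "integration_test.reveal_taint:20:4-20:16: Revealed forward taint for `command`:",
    "  UserControlled",
]
dumps = last_revealed_taint(sample)
```

With the result in hand, checking a step comes down to testing whether the source or sink name appears in the entry for that location.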
Pysa will not be able to detect a vulnerability in the following code:
Following the above debugging steps, we identify the flow of data from beginning to end, and add debugging statements:
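As an illustration of what such instrumentation can look like, here is a sketch under stated assumptions rather than the original listing: HttpRequest is a stand-in for django.http.HttpRequest, the no-op reveal_taint exists only so the snippet also runs as ordinary Python (Pysa interprets reveal_taint calls itself during analysis), and the function names are chosen to match those mentioned in this walkthrough. Line and column numbers in the output discussed below need not match this sketch.

```python
from typing import Dict


def reveal_taint(value: object) -> None:
    """No-op stand-in so the sketch runs outside of Pysa (assumption)."""


class HttpRequest:
    """Stand-in for django.http.HttpRequest (assumption)."""

    def __init__(self, get: Dict[str, str]) -> None:
        self.GET = get


class Runner:
    def run(self, command: str):
        reveal_taint(command)  # inside the method that contains the sink
        return eval(command)


def get_command(request: HttpRequest) -> str:
    command = request.GET["command"]
    reveal_taint(command)  # right after reading from the source
    return command


def execute_command(runner, command: str):
    reveal_taint(command)  # on entry to the function that calls the sink
    # 'runner' is untyped, so the target of runner.run is unknown to Pysa.
    return runner.run(command)


def start(request: HttpRequest):
    command = get_command(request)
    reveal_taint(command)  # after the value returns to the caller
    return execute_command(Runner(), command)
```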
See the appendix for the full output of running
pyre --noninteractive analyze on this example.
Starting at 1), we see this in the output:
Removing the timestamps and other noise gives us:
For debugging false negatives, the only portion we care about is:
This confirms that on line 11 (characters 14-25), we did indeed detect that
command was tainted as UserControlled.
Moving on to 2, the
forward taint output again tells us that we have
UserControlled taint on
command at line 26 (characters 4-16).
Starting with 4, we notice that we no longer see
RemoteCodeExecution in our revealed forward or backwards taint:
This has helped us narrow down the problem to the
execute_command function. In
the end, the problem was that we did not have type information on
runner, so Pysa did not know where the definition of
runner.run was. Without knowing
where the definition was, Pysa couldn't know that
run contained a sink and
thus couldn't know that
command eventually reached that sink.
Subset of the output from running
pyre --noninteractive analyze on the