Pysa is aware of inheritance, so you can add taint annotations to a base class, and Pysa will detect when the tainted attribute or function is accessed via a child class. For example, this flow will be detected during static analysis:

    class Parent:
        def some_source(self):  # Annotated as a source
            pass

    class Child(Parent):
        pass

    child = Child()
    some_sink(child.some_source())  # Detected as a tainted flow

Additionally, Pysa is aware that child classes can be used anywhere a parent class's type is present. If you access a method on a parent class and the implementation on any child class returns taint, Pysa will detect that and treat the return value from the parent class as tainted. For example, this will be detected as a tainted flow during static analysis:

    class Parent:
        def some_fn(self):
            """Benign function with no annotations"""
            pass

    class Child(Parent):
        def some_fn(self):
            """Function returning a tainted value"""
            return get_some_tainted_value()

    def fn(obj: Parent):
        some_sink(obj.some_fn())  # Detected as a tainted flow

A huge caveat here is that Pysa needs to be aware of these inheritance relationships and function definitions for it to work. Code that lives outside the repo under analysis might not be visible to Pysa, so these inheritances/implementations may be missed. See the Stubs section below for more details.
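As a rough mental model of the second behavior: a call through a parent type may dispatch to any override in a subclass, so every override has to be considered. The sketch below is plain runtime Python, not Pysa's implementation; `classes_defining` is a hypothetical helper that lists the loaded classes whose own body defines a given method.

```python
from typing import List, Type


def classes_defining(cls: Type, method_name: str) -> List[Type]:
    """Collect cls and every subclass that defines method_name in its own body."""
    found = []
    stack = [cls]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # vars() only shows names defined directly on the class, not inherited ones.
        if method_name in vars(current):
            found.append(current)
        stack.extend(current.__subclasses__())
    return found


class Parent:
    def some_fn(self):
        return "benign"


class Child(Parent):
    def some_fn(self):
        return "tainted"


class Grandchild(Child):
    pass  # Inherits Child.some_fn without redefining it


# A call such as obj.some_fn() with obj: Parent may reach either implementation.
reachable = sorted(c.__name__ for c in classes_defining(Parent, "some_fn"))
print(reachable)
```

Note that this only works because all three classes are visible in the same process, which mirrors the caveat above: an override that the analysis cannot see cannot be taken into account.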
The concept of stubs is covered in general here, but this
section covers specific issues you may encounter with
.pyi stubs. These stubs can be used to prevent Pyre errors for types
that live outside the codebase you are running Pysa on. The simplest stubs are
just empty files in the root of the
stubs directory (assuming you have a
stubs directory specified in the
search_path list in your
.pyre_configuration file). An empty stub prevents all type checking
errors within the namespace of that stub. So with an empty
uwsgi.pyi in the stubs
directory, the following code would not raise Pyre errors (though it would
obviously fail to run):

    import uwsgi
    from uwsgi import asdf, ZXCV

    uwsgi.qwer()
    variable = ZXCV()
    variable.hjkl()

If you want to create
.pysa stubs (i.e. annotate sources, sinks,
etc.) for something that is outside your codebase, such as Django's
django.http.request.HttpRequest object, you need more than just an empty stub
file. You need a directory structure and a
.pyi file that match your import, such as
stubs/django/http/request.pyi. Within that
.pyi file, you
then need a stub of the class:

    class HttpRequest(BinaryIO):
        def __init__(self) -> None: ...
        COOKIES: Any = ...
        GET: QueryDict = ...
        # And a bunch more stuff...

Only at this point can you add
.pysa files with annotations such as these:

    django.http.request.HttpRequest.COOKIES: TaintSource[UserControlled] = ...
    django.http.request.HttpRequest.GET: TaintSource[UserControlled] = ...

There is a huge gotcha here: if you had both an empty
django.pyi file and the
stubs/django/http/request.pyi file shown above, Pyre will see the
django.pyi file first and ignore the
request.pyi file. This would mean
that your stub of
HttpRequest would be missed, and your
HttpRequest.GET annotations would cause errors when running Pysa. The fix
is simply to delete the
django.pyi file. After deleting that file, you may
suddenly see new typing errors for other types within Django, for which
you'll need to add new
.pyi files at the appropriate locations.
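To spot this situation before running Pysa, you could scan the stubs directory for any `X.pyi` file that sits next to a directory named `X`. The following is a minimal sketch using only the standard library; `find_shadowing_stubs` is a hypothetical helper written for this illustration, not part of Pyre, and the demo builds the layout described above in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import List


def find_shadowing_stubs(stubs_root: Path) -> List[Path]:
    """Return X.pyi files that sit next to a directory named X."""
    shadowing = []
    for stub in sorted(stubs_root.rglob("*.pyi")):
        package_dir = stub.with_suffix("")  # stubs/django.pyi -> stubs/django
        if package_dir.is_dir():
            shadowing.append(stub)
    return shadowing


# Demo on a throwaway directory that reproduces the problematic layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "stubs"
    (root / "django" / "http").mkdir(parents=True)
    (root / "django" / "http" / "request.pyi").write_text("class HttpRequest: ...\n")
    (root / "django.pyi").write_text("")  # Empty stub that hides the package
    (root / "uwsgi.pyi").write_text("")   # Harmless: there is no uwsgi/ directory

    conflicts = [p.relative_to(root).as_posix() for p in find_shadowing_stubs(root)]

print(conflicts)
```

Any path the helper reports is a candidate for deletion or for being replaced by more specific .pyi files.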
Missing types cause missed flows
Due to optimizations to allow parallelization, Pysa can be blind in some
scenarios that might be obvious to a human. Pysa needs to know the type of an
object that is a source/sink at the point at which it is accessed, in order
for it to detect tainted flows. For example, if you have a function that returns
a wrapper around a source, flows from that source will not be found unless the
return type of the function is specified. See below how one of the flows in the
run function is missed, simply because the return type on
get_wrapper_untyped is missing:

    from django.http import HttpRequest

    class RequestWrapper:
        request: HttpRequest

        def __init__(self, request: HttpRequest):
            self.request = request

        def get_request_data(self):
            return self.request.GET["data"]

    def get_wrapper_untyped(request: HttpRequest):
        return RequestWrapper(request)

    def get_wrapper_typed(request: HttpRequest) -> RequestWrapper:
        return RequestWrapper(request)

    def run(request: HttpRequest):
        # This flow WILL NOT be found
        wrapper = get_wrapper_untyped(request)
        eval(wrapper.get_request_data())

        # This flow WILL be found
        wrapper = get_wrapper_typed(request)
        eval(wrapper.get_request_data())

This illustrates how important typing is for ensuring that all flows are caught during static analysis.
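One way to look for candidates is to scan your code for functions that lack a return annotation. The following is a minimal sketch using Python's standard `ast` module; `functions_missing_return_type` and the `SAMPLE` source are hypothetical names made up for this illustration, and this is not a feature of Pyre or Pysa.

```python
import ast
from typing import List

# Stand-in source text; in practice you would read your own .py files.
SAMPLE = '''
def get_wrapper_untyped(request):
    return RequestWrapper(request)

def get_wrapper_typed(request) -> "RequestWrapper":
    return RequestWrapper(request)
'''


def functions_missing_return_type(source: str) -> List[str]:
    """Return the names of functions in source that have no return annotation."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.returns is None
    ]


missing = functions_missing_return_type(SAMPLE)
print(missing)
```

The sample is only parsed, never executed, so the undefined `RequestWrapper` name inside it does not matter.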
Globals cause missed flows
To allow for parallel processing, Pysa is limited in its ability to track taint flows through global variables. For example, Pysa will not detect an issue in the following code:
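
    from django.http import HttpRequest

    user_controlled_data = ""

    def load_data(request: HttpRequest) -> None:
        # Without the global statement this would only assign a local variable
        global user_controlled_data
        user_controlled_data = request.GET["data"]

    def run_command(request: HttpRequest) -> None:
        load_data(request)
        eval(user_controlled_data)
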
The best workaround is to avoid using globals in your code. If a refactor isn't
possible, but you do know what globals should be considered tainted, you can
explicitly declare the global tainted in your
Helpful Python knowledge
Pretty much all Python operators are reduced to double underscore ("dunder") functions.
For example, constructing an object results in a call to
and an asterisk operator results in a call to
__mul__(a, b). A full list of
these operators can be found
here. This is useful to
know when you need to add annotations to the usage of operators, such as the use
of square brackets to access a dictionary.
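The mapping from syntax to dunder methods can be observed directly at runtime. In this small sketch, `Recorder` is a hypothetical class written for this illustration that logs which special method each piece of syntax triggers.

```python
class Recorder:
    """Toy container that records which dunder methods get invoked."""

    def __init__(self, **items):
        self.calls = ["__init__"]
        self._items = dict(items)

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self._items[key]

    def __mul__(self, other):
        self.calls.append("__mul__")
        return other


params = Recorder(data="1 + 1")  # construction      -> __init__
value = params["data"]           # square brackets   -> __getitem__
product = params * 2             # asterisk operator -> __mul__

print(params.calls)
```

So if the thing you want to annotate is a dictionary-style lookup, the function to target is the container's `__getitem__`.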
You can insert a call to the (non-existent)
pyre_dump() function in your code
to make Pyre output a large amount of metadata about its current state when it
parses that function call. This can be useful as a starting point to figure
out why something is or isn't happening. The output is very verbose.
If you only want to check what Pyre knows about the types of variables, inject a
reveal_type(YOUR_VARIABLE) (no import needed) in your code. Running
Pyre on your code will then give you compact output indicating what Pyre thinks
the type of your variable is.
Similar to
reveal_type, if you only want to check what Pyre knows about the
taint on variables, inject a call to
reveal_taint(YOUR_VARIABLE) (no import
needed) in your code. Running Pysa on your code will then give you compact
output indicating what taint Pysa has discovered. Note that each time Pysa
analyzes the function (which could be many times) it will update its
understanding of the taint flowing into the function and output the current
state. The final output will be the most complete.
reveal_taint is a new feature, and may not always give correct results. A
simple debugging technique when
reveal_taint fails is to inject a call to a
known sink for your source, such as
eval, rather than
reveal_taint. You can
then run Pysa and see if your injected flow is detected.
Another strategy for getting a bit more metadata is adding a function into your
code, which simply constructs and returns the type you want to examine. You can
then run Pysa, and grep for the function's name in the
results.json file located wherever you pointed
--save-results-to= to when
running Pysa. You should then be able to see if that function is detected as
returning taint, plus a bit more metadata about it.
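If you prefer to do the search from Python rather than with grep, a plain substring scan is enough. The sketch below treats the results file as ordinary text and assumes nothing about its structure; `lines_mentioning`, `my_module.build_wrapper`, and the placeholder file contents are all hypothetical and do not reflect real Pysa output.

```python
import tempfile
from pathlib import Path
from typing import List


def lines_mentioning(path: Path, needle: str) -> List[str]:
    """Return the lines of a text file that contain needle (like grep)."""
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


# Demo with stand-in content; real Pysa output will look different.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results.json"
    results.write_text(
        "placeholder line about my_module.build_wrapper\n"
        "placeholder line about some other function\n",
        encoding="utf-8",
    )
    hits = lines_mentioning(results, "build_wrapper")

print(len(hits))
```

Reading line by line keeps memory use low even when the results file is large.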
The Static Analysis Post Processor (SAPP)
has access to the same information as
results.json. While SAPP doesn't display
all the information
results.json contains, it can display the information in a
more user-friendly gdb-style way. It's especially useful for exploring flows
which pass through many frames.
taint.config is a JSON file and
.pysa files use Python syntax. If you update
your editor to recognize those files as JSON and Python respectively, it'll make
Run Pysa faster with
On large projects, Pysa can take a long time to run. When you're iterating on a
single rule, pass the
--rule option to Pysa to speed things up a bit by
omitting processing on all other rules. For example:
pyre analyze --rule 5000