Feature Annotations
Features (sometimes called breadcrumbs) are additional metadata that are associated with taint flows. They can be useful for helping to filter out false positives, or for zeroing in on high-signal subsets of a rule. Some are automatically added during the analysis process, and there is a rich system for manually specifying additional features.
Manually Added Features
via Feature Using Via[]
The via feature indicates that a flow passed through a point in the code, such as a function parameter, that was annotated with the specified feature name. For example, via:getattr might indicate that the flow passed through a call to getattr.
Feature names are declared in your taint.config file (the same file as sources/sinks/rules) like this:
features: [
  {
    name: "getattr",
    comment: "via getattr first parameter"
  },
  {
    name: "request_files",
    comment: "via django request.FILES"
  }
]
The via feature can be appended to TaintSource and TaintSink annotations to add extra metadata to the specified source and sink flows. It can also be appended to TaintInTaintOut annotations, to add extra metadata to any flow that goes through that annotated function/parameter/attribute.
This is done by adding Via[FEATURE_NAME] within square brackets after the TaintXXXX annotation in a model file:
# Augmenting TaintSource
django.http.request.HttpRequest.FILES: TaintSource[UserControlled, Via[request_files]] = ...
# Augmenting TaintInTaintOut
def getattr(
    o: TaintInTaintOut[Via[getattr]],
    name: TaintSink[GetAttr],
    default: TaintInTaintOut[LocalReturn],
): ...
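Because every name used in Via[...] must also be declared in taint.config, a small pre-check can catch typos before running the analysis. The following is a minimal sketch, not part of Pysa: the helper name undeclared_features is hypothetical, and it assumes the configuration text is strict JSON with quoted keys (unlike the excerpt above, which shows unquoted keys).

```python
import json
import re

# Matches Via[name] but not ViaValueOf[...], ViaTypeOf[...] or ViaDynamicFeature[...].
VIA_PATTERN = re.compile(r"\bVia\[(\w+)\]")


def undeclared_features(config_text: str, model_text: str) -> set:
    """Return feature names used via Via[...] in a model file but not declared.

    Hypothetical helper for illustration; assumes config_text is valid JSON
    with a top-level "features" list of {"name": ..., "comment": ...} objects.
    """
    config = json.loads(config_text)
    declared = {feature["name"] for feature in config.get("features", [])}
    used = set(VIA_PATTERN.findall(model_text))
    return used - declared
```

Running this over each model file and failing when the result is non-empty gives an early warning about misspelled feature names.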
Pysa also supports attaching features to inferred flows, which allows you to filter flows passing through a function without having to annotate the taint yourself explicitly, and having the feature attached to all taint flowing through the function. This is done by adding the AttachToSource, AttachToSink, and AttachToTito annotations in a model file:
# Attaching taint to sources.
def get_signed_cookie() -> AttachToSource[Via[signed]]: ...
# Attaching taint to sinks.
def HttpResponseRedirect.__init__(self, redirect_to: AttachToSink[Via[redirect]], *args, **kwargs): ...
# Attaching taint to taint-in-taint-out models.
def attach_features.tito_and_sink(arg: AttachToTito[Via[some_feature_name]]): ...
Pysa additionally supports attaching features to flows irrespective of sources, sinks, and TITO, using the AddFeatureToArgument annotation:
def add_feature_to_argument.add_feature_to_first(
    first: AddFeatureToArgument[Via[string_concat_lhs]],
    second
): ...
Note that Pysa automatically adds some via features with special meaning. See the Automatic Features section for details.
via-value Feature Using ViaValueOf[]
The via-value feature is similar to the via feature; however, it captures the value of the specified argument rather than a feature name. Note that this only works for string literals, boolean literals, numeric literals, and enums. For example, via-value:Access-Control-Allow-Origin might indicate that the string literal Access-Control-Allow-Origin was used to set a header in a Django response.
The via-value feature can be added anywhere that the via feature can be added. It is added by specifying ViaValueOf[PARAMETER_NAME], where PARAMETER_NAME is the name of the function parameter for which you would like to capture the argument value. To continue the above example, this is how you would capture the name of a header being set on a Django HttpResponse:
def django.http.response.HttpResponse.__setitem__(
    self,
    header: TaintSink[ResponseHeaderName],
    value: TaintSink[ResponseHeaderValue, ViaValueOf[header]]
): ...
In cases where the argument is not a constant, the feature will appear as via-value:<unknown:ARGUMENT_TYPE>, where ARGUMENT_TYPE indicates how the argument value is provided at the call site. For a model such as this:
def f(first, second, third) -> TaintSource[Test, ViaValueOf[second]]: ...
The following function invocations will produce the features shown in the comments:
f(*args) # Generates via-value:<unknown:args>
f(**kwargs) # Generates via-value:<unknown:kwargs>
f(second=foo) # Generates via-value:<unknown:named>
f(foo, bar) # Generates via-value:<unknown:positional>
f(*args, **kwargs) # Generates via-value:<unknown:args_or_kwargs>
If the argument is not provided at the call site (e.g., using the default value), the feature will appear as via-value:<missing>.
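The rules above for constant, non-constant, and missing arguments can be approximated with a short standalone script. This is an illustrative sketch only, not Pysa's implementation: the functions via_value_feature and _describe are hypothetical, the rendering of non-string literals is an assumption, and enums are not handled.

```python
import ast


def _describe(node: ast.expr, how: str) -> str:
    """Render a literal argument, or fall back to <unknown:how>."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (str, bool, int, float)):
        # Assumption: literals are rendered with their plain string form.
        return f"via-value:{node.value}"
    return f"via-value:<unknown:{how}>"


def via_value_feature(call_source: str, param_name: str, param_index: int) -> str:
    """Approximate the via-value feature for one parameter at one call site."""
    call = ast.parse(call_source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single call expression")

    # An explicit keyword argument, e.g. f(second=foo).
    for keyword in call.keywords:
        if keyword.arg == param_name:
            return _describe(keyword.value, "named")

    # Plain positional arguments that appear before any *args unpacking.
    leading = []
    for arg in call.args:
        if isinstance(arg, ast.Starred):
            break
        leading.append(arg)
    if param_index < len(leading):
        return _describe(leading[param_index], "positional")

    # Otherwise the value can only come from unpacking, or it is absent.
    has_star_args = any(isinstance(arg, ast.Starred) for arg in call.args)
    has_star_kwargs = any(keyword.arg is None for keyword in call.keywords)
    if has_star_args and has_star_kwargs:
        return "via-value:<unknown:args_or_kwargs>"
    if has_star_args:
        return "via-value:<unknown:args>"
    if has_star_kwargs:
        return "via-value:<unknown:kwargs>"
    return "via-value:<missing>"
```

For the model of f shown earlier, the parameter of interest is second at index 1, so a call would look like via_value_feature("f(foo, bar)", "second", 1).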
You can also associate a tag with a via-value feature to ensure that different via-value annotations don't interfere with each other. Here's how you can retain the information that the name of the header was being set:
def django.http.response.HttpResponse.__setitem__(
    self,
    header: TaintSink[ResponseHeaderName],
    value: TaintSink[ResponseHeaderValue, ViaValueOf[header, WithTag["set-header"]]]
): ...
The feature would now appear as via-set-header-value:Access-Control-Allow-Origin.
via-type Feature Using ViaTypeOf[]
The via-type feature is nearly identical to the via-value feature; however, it captures the type of the specified argument rather than its value. Pysa will retrieve the type information for the argument from Pyre, and add a feature such as "via-type": "str", "via-type": "typing.List[str]", or "via-type": "typing.Any" (in the case Pyre doesn't have type information).
ViaTypeOf is useful for sinks such as subprocess.run, which accepts Union[bytes, str, Sequence] for its args parameter. The via-type feature can help identify which type the argument to args actually had. Knowing the type of the argument can help assess the severity of a given issue (user-controlled input in a str passed to args is much easier to exploit for RCE than user-controlled input in one element of a Sequence passed to args).
The via-type feature can be added anywhere that the via feature can be added. It is added by specifying ViaTypeOf[PARAMETER_NAME], where PARAMETER_NAME is the name of the function parameter for which you would like to capture the argument type:
def subprocess.run(
    args: TaintSink[RemoteCodeExecution, ViaTypeOf[args]],
): ...
The via-type feature can also be used on attribute or global models. For example:
my_module.MyClass.source: TaintSource[Test, ViaTypeOf] = ...
my_module.MyClass.sink: TaintSink[Test, ViaTypeOf] = ...
A standalone ViaTypeOf is also supported in this case, and is shorthand for TaintInTaintOut[ViaTypeOf]:
my_module.MyClass.my_attribute: ViaTypeOf = ...
The via-type feature also supports adding tags, using the same syntax as the via-value feature:
def subprocess.run(
    args: TaintSink[RemoteCodeExecution, ViaTypeOf[args, WithTag["my_tag"]]]
): ...
my_module.MyClass.sink: TaintSink[Test, ViaTypeOf[WithTag["my_tag"]]] = ...
my_module.MyClass.other_attribute: ViaTypeOf[WithTag["my_tag"]] = ...
Note that ViaTypeOf on Annotated types will not include the annotations after the first type specified. This is because Pyre does not store annotations as part of the type information. Consider the following code:
from typing import Annotated

class Foo:
    x: Annotated[int, "foo"]
If there is a ViaTypeOf on Foo.x here, the feature shown on traces will be via-type-of:typing.Annotated[int], not via-type-of:typing.Annotated[int, "foo"].
via-attribute Feature Using ViaAttributeName[]
The via-attribute feature is similar to the via-value feature; however, it can only be used to model attributes, and captures the name of the attribute being accessed.
For instance:
my_module.MyClass.my_attribute: ViaAttributeName = ...
Pysa will add the feature via-attribute:my_attribute when taint flows through the attribute.
This also supports tags, using the same syntax as via-value:
my_module.MyClass.my_attribute: ViaAttributeName[WithTag["example"]] = ...
Note that via-attribute is most useful in model queries, when the attribute name is not known in advance.
Supporting Features Dynamically Using ViaDynamicFeature[]
In general, Pysa requires you to specify the list of features that are allowed. This encourages features to be documented, and helps avoid typos when writing features, so that the features propagating in the analysis are consistent with filters you might have on issues.
However, there might be very specific cases where you want to dynamically generate features, depending on artifacts of the code. Most cases here can be handled by the via-type and via-value features; however, you might be dealing with dynamic code or metadata that the system can't detect. In these cases, Pysa allows skipping validation on features through the use of ViaDynamicFeature. This syntax behaves identically to Via[], except for the lack of validation. Here's an example:
def subprocess.run(
    args: TaintSink[RemoteCodeExecution, ViaDynamicFeature[subprocess_run_execution]]
): ...
Automatic Features
via Feature
In addition to the manually specified via features, Pysa automatically adds some via features with special meaning, such as via:obscure:model, via:obscure:unknown-callee, via:format-string, and via:tito. via:obscure:model means that the flow passed through code that Pysa does not have access to analyze, and thus some taint flow assumptions were made. This can be a useful feature for filtering out flows that may be noisier. via:obscure:unknown-callee means that a call cannot be resolved because the callee is unknown (most likely because of missing type information). via:format-string means that a flow passed through a Python f-string (f"Variable: {variable_name}") or a call to str.format. Tito stands for taint-in-taint-out, which refers to taint flows that enter a function via a parameter and then exit it in some form via the return value. The via:tito feature is attached automatically to all such flows.
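As a rough illustration of filtering on these features, the sketch below drops issues that carry one of the obscure features. The issue records are a hypothetical shape (a dict with a "features" list) chosen for the example; they are not Pysa's actual output format, and drop_noisy_issues is not a Pysa API.

```python
# Features from this section that tend to mark noisier flows.
NOISY_FEATURES = frozenset({"via:obscure:model", "via:obscure:unknown-callee"})


def drop_noisy_issues(issues, noisy=NOISY_FEATURES):
    """Keep only issues whose feature list contains none of the noisy features.

    Each issue is assumed to be a dict with a "features" list of strings
    (hypothetical shape, for illustration only).
    """
    return [issue for issue in issues if not noisy & set(issue["features"])]


example_issues = [
    {"code": 5001, "features": ["via:tito", "via:format-string"]},
    {"code": 5001, "features": ["via:obscure:model", "via:tito"]},
]
kept = drop_noisy_issues(example_issues)
```

Here only the first example issue is kept, since the second carries via:obscure:model.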
type Feature
The type feature is an automatically added feature which indicates that the flow passes through a conversion to the specified type. This feature currently only tracks conversion to numeric values (i.e., type:scalar). This can be useful for filtering out flows when numeric values are highly unlikely to result in an exploitable flow, such as SQL injection or RCE.
first-field Feature
The first-field feature is automatically added to flows for the first field access on the flow. For example, if request is a source and the flow starts with request.f, then first-field:f should be attached to the flow.
first-index Feature
The first-index feature is an automatically added feature which indicates that a flow starts with a dictionary access using the specified constant as the key. This is useful in cases such as Django's GET/POST/META dictionaries on the HttpRequest object. A flow that started with an access of the HTTP_REFERER header from the META object would result in the first-index:HTTP_REFERER feature being added.
has Feature
The has feature is a summary feature for first-field and first-index. Thus, has:first-index simply indicates that there is at least one first-index:<name> feature present, and similarly for has:first-field.
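The summary relationship can be expressed in a few lines. This is a minimal sketch of the rule as described above, using plain strings for features; add_has_features is a hypothetical helper, not something Pysa exposes.

```python
def add_has_features(features):
    """Return the feature set extended with has:first-field / has:first-index.

    A has:<kind> feature is added when at least one <kind>:<name> feature
    is present. Hypothetical helper for illustration.
    """
    result = set(features)
    for kind in ("first-field", "first-index"):
        if any(feature.startswith(kind + ":") for feature in features):
            result.add("has:" + kind)
    return result
```

For example, a flow carrying first-index:HTTP_REFERER would also carry has:first-index, which allows filtering on "any first index" without naming the key.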
always- Modifier on Features
The always- modifier will automatically be added to any of the above features when every single flow within an issue has the feature. For example, if an issue captures flows from three different sources of user input into a SQL sink, the always-type:scalar modifier would be added if all three of those flows pass through a conversion to int before reaching the sink. Note that the always- version of a feature is exclusive with the non-always- version; if always-type:scalar is present, type:scalar will not be present.
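The always- rule amounts to comparing the intersection and the union of the per-flow feature sets. The sketch below illustrates that reading of the paragraph above; summarize_features is a hypothetical helper and the per-flow sets are made-up example data, not Pysa's internal representation.

```python
def summarize_features(flows):
    """Combine per-flow feature sets into issue-level features.

    Features present on every flow get the always- prefix; features present
    on only some flows keep their plain name. The two forms never coexist.
    """
    if not flows:
        return set()
    on_every_flow = set.intersection(*flows)
    on_some_flow = set.union(*flows) - on_every_flow
    return {"always-" + feature for feature in on_every_flow} | on_some_flow


# Three flows into the same sink; all pass through a numeric conversion,
# but only one passes through a taint-in-taint-out function.
example_flows = [
    {"type:scalar"},
    {"type:scalar", "via:tito"},
    {"type:scalar"},
]
issue_features = summarize_features(example_flows)
```

In this example the issue ends up with always-type:scalar and via:tito, and no plain type:scalar.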
broadening Features
Pysa automatically adds broadening features when taint broadening is applied during the analysis.
Broadening features are:
tito-broadening
model-broadening
model-source-broadening
model-sink-broadening
model-tito-broadening
model-shaping
model-source-shaping
model-sink-shaping
model-tito-shaping
widen-broadening
issue-broadening
See taint broadening for more information.