Running Pysa
This page covers the basics of running Pysa. For exercises that walk you through Pysa's more advanced features, check out this tutorial.
Setup
The setup requires the following 4 types of files.
- Source Code (*.py): This is your application's code.
- Taint Config (taint.config): This file declares sources, sinks, features, and rules.
- Taint Models (.pysa): These files link together the information in your source code and taint.config. They tell Pysa where in our code there exist sources and sinks.
- Pysa Configuration (.pyre_configuration): Parts of this file are critical to using Pysa. source_directories tells Pysa the directory containing the source code you want to analyze. taint_models_path tells Pysa where to find the config and model files.
Example
Let's look at a simple taint analysis example. To follow along, create a directory static_analysis_example and navigate to it. Paste the code snippets into the appropriately named files.
1. Source Code
# static_analysis_example/source.py
import os

def get_image(url):
    command = "wget -q https:{}".format(url)
    return os.system(command)

def convert():
    image_link = input("image link: ")
    image = get_image(image_link)
Notice the following:
- The input function is a taint source, since it gets input directly from the user.
- The os.system function is a taint sink, since we do not want user-controlled values to flow into it.
- The return value of input is used as the URL for a wget call, which is executed by os.system. Because the user controls part of the command string, the resulting command can do anything, outside the programmer's control.
- This data flow should be identified as a potential security issue.
2. Taint Config
# static_analysis_example/stubs/taint/core_privacy_security/taint.config
{
  "sources": [
    {
      "name": "UserControlled",
      "comment": "use to annotate user input"
    }
  ],
  "sinks": [
    {
      "name": "RemoteCodeExecution",
      "comment": "use to annotate execution of code"
    }
  ],
  "features": [],
  "rules": [
    {
      "name": "Possible shell injection",
      "code": 5001,
      "sources": [ "UserControlled" ],
      "sinks": [ "RemoteCodeExecution" ],
      "message_format": "Data from [{$sources}] source(s) may reach [{$sinks}] sink(s)"
    }
  ]
}
This declares the valid sources and sinks that Pysa should recognize. We also tell Pysa that data flowing from a UserControlled source to a RemoteCodeExecution sink is a possible shell injection.
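Since each rule refers to sources and sinks by name, a typo in a rule is easy to make. The following is a rough sketch of a sanity check you could run yourself; it is not part of Pysa and says nothing about how Pysa validates the file internally. The helper undeclared_names is hypothetical, and the embedded JSON is a trimmed copy of the taint.config from this example.

```python
import json

# Trimmed copy of the example taint.config, embedded so the sketch is self-contained.
TAINT_CONFIG = """
{
  "sources": [{"name": "UserControlled"}],
  "sinks": [{"name": "RemoteCodeExecution"}],
  "features": [],
  "rules": [
    {
      "name": "Possible shell injection",
      "code": 5001,
      "sources": ["UserControlled"],
      "sinks": ["RemoteCodeExecution"],
      "message_format": "Data from [{$sources}] source(s) may reach [{$sinks}] sink(s)"
    }
  ]
}
"""


def undeclared_names(config):
    """Hypothetical helper: list (rule code, kind, name) for every source or
    sink that a rule mentions but the top-level lists do not declare."""
    declared = {
        "sources": {entry["name"] for entry in config.get("sources", [])},
        "sinks": {entry["name"] for entry in config.get("sinks", [])},
    }
    problems = []
    for rule in config.get("rules", []):
        for kind in ("sources", "sinks"):
            for name in rule.get(kind, []):
                if name not in declared[kind]:
                    problems.append((rule["code"], kind, name))
    return problems


config = json.loads(TAINT_CONFIG)
problems = undeclared_names(config)
print(problems)
```

For the example config the list is empty, because rule 5001 only mentions names that are declared at the top level.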
3. Taint Models
# static_analysis_example/stubs/taint/core_privacy_security/general.pysa
# model for input
def input(__prompt) -> TaintSource[UserControlled]: ...
# model for os.system
def os.system(command: TaintSink[RemoteCodeExecution]): ...
This file links together the information in source.py and taint.config. We use it to tell Pysa where in our code there exist sources and sinks.
4. Pysa Configuration
# static_analysis_example/.pyre_configuration
{
  "source_directories": ["."],
  "taint_models_path": "stubs/taint"
}
Pysa needs to know what directory to analyze, as well as where to find the config and model files.
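Before running the analysis, it can help to confirm that the paths in .pyre_configuration point at real files. The sketch below is a hypothetical pre-flight check, not a Pysa feature. It assumes the simple configuration shape used in this example (a list of directory strings and a single string for taint_models_path), and it searches below taint_models_path recursively because the example keeps its files in a subdirectory of stubs/taint.

```python
import json
import tempfile
from pathlib import Path


def check_layout(project_root):
    """Hypothetical helper: return a list of problems found in the layout."""
    root = Path(project_root)
    config_path = root / ".pyre_configuration"
    if not config_path.is_file():
        return ["missing .pyre_configuration"]

    config = json.loads(config_path.read_text())
    problems = []

    # Every listed source directory should exist.
    for directory in config.get("source_directories", []):
        if not (root / directory).is_dir():
            problems.append("source directory not found: {}".format(directory))

    # The models path should contain a taint.config and at least one .pysa file.
    models_path = config.get("taint_models_path")
    if models_path is None:
        problems.append("taint_models_path is not set")
    elif not (root / models_path).is_dir():
        problems.append("taint models directory not found: {}".format(models_path))
    else:
        models_root = root / models_path
        if not list(models_root.rglob("taint.config")):
            problems.append("no taint.config under {}".format(models_path))
        if not list(models_root.rglob("*.pysa")):
            problems.append("no .pysa model files under {}".format(models_path))
    return problems


# Demonstration in a throwaway directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    models = root / "stubs" / "taint" / "core_privacy_security"
    models.mkdir(parents=True)
    (root / ".pyre_configuration").write_text(
        json.dumps({"source_directories": ["."], "taint_models_path": "stubs/taint"})
    )
    before = check_layout(root)  # config and model files not written yet
    (models / "taint.config").write_text("{}")
    (models / "general.pysa").write_text("")
    after = check_layout(root)

print(before)
print(after)
```

The first call reports the two missing files; once they are written, the second call returns an empty list.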
Analysis
Now let's run the static analysis:
[~/static_analysis_example] $ pyre analyze
ƛ Fixpoint iterations: 2
[
  {
    "line": 9,
    "column": 22,
    "path": "source.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "define": "source.convert"
  }
]
Looking at the output, we see that pyre surfaces the tainted data flow that we expected.
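If you want a one-line summary per issue, the JSON array is straightforward to post-process. This is a small sketch, assuming you have already captured the array into a string; the summarize helper is hypothetical, and the embedded sample is an abbreviated copy of the issue reported for this example.

```python
import json

# Abbreviated copy of the issue reported for this example.
ANALYZE_OUTPUT = """
[
  {
    "line": 9,
    "column": 22,
    "path": "source.py",
    "code": 5001,
    "name": "Possible shell injection",
    "define": "source.convert"
  }
]
"""


def summarize(issues):
    """Hypothetical helper: format each issue as path:line:column [code] name."""
    return [
        "{path}:{line}:{column} [{code}] {name} (in {define})".format(**issue)
        for issue in issues
    ]


summary = summarize(json.loads(ANALYZE_OUTPUT))
for entry in summary:
    print(entry)
```

The path, line, and column fields locate the flagged expression, and define names the function in which the issue was found.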
Let's run it again and save the results:
[~/static_analysis_example] $ pyre analyze --save-results-to ./
The --save-results-to option will save more detailed results to ./taint-output.json.