Running Pysa
This page walks you through the basics of running Pysa. If you want exercises to walk you through using Pysa's more advanced features, check out this tutorial.
#
SetupThe setup requires the following 4 types of files.
- Source Code (
*.py
): This is your application's code. - Taint Config (
taint.config
): This file declares sources, sinks, features, and rules. - Taint Models (
.pysa
): These files link together the information in your source code andtaint.config
. They tell Pysa where in our code there exist sources and sinks. - Pysa Configuration (
.pyre_configuration
): Parts of this file are critical to using Pysa.source_directories
tells Pysa the directory containing the source code you want to analyze.taint_models_path
tells Pysa where to find the config and model files.
#
ExampleLet's look at a simple taint analysis example. To follow along, create a
directory static_analysis_example
and navigate to it. Paste the code snippets
into the appropriately named files.
#
1. Source Code# static_analysis_example/source.py
import os
def get_image(url): command = "wget -q https:{}".format(url) return os.system(command)
def convert(): image_link = input("image link: ") image = get_image(image_link)
Notice the following:
- The
input
function is a taint source since it gets input directly from the user. - The
os.system
function is a taint sink, since we do not want user-controlled values to flow into it. - The return value of
input
is used as the URL for awget
call, which is executed byos.system
. Thewget
can therefore be doing anything, out of the programmer's control. - This data flow should be identified as a potential security issue.
#
2. Taint Config# static_analysis_example/stubs/taint/taint.config
{ sources: [ { name: "UserControlled", comment: "use to annotate user input" } ],
sinks: [ { name: "RemoteCodeExecution", comment: "use to annotate execution of code" } ],
features: [],
rules: [ { name: "Possible shell injection", code: 5001, sources: [ "UserControlled" ], sinks: [ "RemoteCodeExecution" ], message_format: "Data from [{$sources}] source(s) may reach [{$sinks}] sink(s)" } ]}
This declares the valid sources and sinks that Pysa should recognize. We
also tell Pysa that data flowing from a UserControlled
source to a
RemoteCodeExecution
sink is a possible shell injection.
#
3. Taint Models# static_analysis_example/stubs/taint/general.pysa
# model for raw_inputdef input(__prompt) -> TaintSource[UserControlled]: ...
# model for os.systemdef os.system(command: TaintSink[RemoteCodeExecution]): ...
This file links together the information in source.py
and taint.config
. We
use it to tell Pysa where in our code there exist sources and sinks.
#
4. Pysa Configuration# static_analysis_example/.pyre_configuration
{ "source_directories": ["."], "taint_models_path": "stubs/taint"}
Pysa needs to know what directory to analyze, as well as where to find the config and model files.
#
AnalysisNow let's run the static analysis:
[~/static_analysis_example] $ pyre analyze Æ› Fixpoint iterations: 2[ { "line": 9, "column": 22, "path": "source.py", "code": 5001, "name": "Possible shell injection", "description": "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)", "long_description": "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)", "concise_description": "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)", "define": "source.convert" }]
Looking at the output, we see that pyre surfaces the tainted data flow that we expected.
Let's run it again and save the results:
[~/static_analysis_example] $ pyre analyze --save-results-to ./
The --save-results-to
option will save more detailed results to
./taint-output.json
.