The following text was submitted as my proposal for Google's Summer of Code 2006: Project Title Using static analysis to find vulnerabilities Synopsis Applications which are written in PHP usually deal with users and other external sources of data. This external data should always be processed in such a way that it cannot do any harm to the application itself or the system it is running on. Since programmers are usually just like normal people they sometimes forget to process input properly. A beginning programmer doesn't even know that input can be dangerous. With the use of a statical analysis of the source code of an application, 'dangerous' data can be tracked down and the unsafe use can be reported to the programmer. Tools for this analysis can be built with the help of Stratego/XT [1]. Benefits to the PHP Community The main benefit will be that programmers are able to find vulnerabilities in an automated way. Testing applications will be easier and this can decrease the amount of bugs in applications. The community will also get a basis for building more programs that can be used to improve their code. The PHC [2] has some ideas for this [3]. Deliverables 1: A parser for the latest PHP 4 version validated against the test suite of the distribution. 2: A parser for the latest PHP 5 version validated against the test suite of the distribution. 3: A tool that can analyse a PHP script to find possible vulnerabilities. The percentage of false positives should not be above 40%. 4: A description about the method used and problems encountered. Project Details This project will be built with the help of Stratego/XT. There is already an (incomplete) syntax definition in SDF of PHP that is made by Eelco Dolstra [6]. This is done in the context of the StringBorg [5] framework and based on the Bison/Flex definition of PHP itself. This SDF is not yet complete but provides a very good structure to make the tool to parse scripts to an Abstract Syntax Tree. The project will start with the development of a SDF that can parse all the test files in the current releases. This only includes the real code of the test-files, not the specific declaration of the environment. This code should be parsed and pretty-printed. After this transformation the output should be the same as the input. The second part of the project will consist of making a tool that statically analyzes the source code of an application for vulnerabilities. This tool will be able to see if the programmer uses variables that are not safe. For example the printing of a GET-variable that is not escaped. To detect this the tool will use the concepts of data-flow analysis. Project Schedule May 23, 2006: Start of the project. Starting to work on the SDF grammars. June 26, 2006: The SDF should be finished and the test-files should be parsed correctly. June 27, 2006: Starting to work on the static analysis. July 8-15, 2006: No progress. Student is away with the scouts on camp. August 1, 2006: The tool should be able to give useful feedback when parsing an open source PHP project. August 21, 2006: End of the project. All tools are finished. Project references During the development of this proposal I stumbled upon Pixy[4]. A Java tool that is based on the idea of data flow analysis. This project will do something similar. It will extend the analysis with the support of the object-oriented features of PHP. Apart from that it will provide a solid basis to create other tools. Another source of inspiration is PHC [2]. The problem with this is that one should use c++ to work with it. I think that it is easier to develop programs that transform/analyse source code in Stratego/XT instead of c++ because Stratego/XT is specifically made for this purpose. Bio I am currently following the Master Program Software Technology at the Utrecht University. Before that I have completed the Bachelor program at the same university. I also followed the teacher training for primary education at the Marnix Academie [8]. Apart from my study I work 1.5 days as a teacher in the first grade of a primary school. I'm also active in the scouts movement of The Netherlands [9]. An activity is the work with 'Team Internet'. We develop and maintain the system that is used to record and manage all information related to all members of the scouts organization in the Netherlands. This system is completely written in PHP. This project can help us in the search for vulnerabilities and provides a basis to make more tools that support our development. Apart from this practical aspect there is another motivation for this project. By giving the right feedback to people that use this tool, they can learn from their mistakes. My teacher-part really likes that idea. If there are any questions please contact me by e-mail. [1] http://www.stratego-language.org/Stratego/WebHome [2] http://www.phpcompiler.org/index.html [3] http://www.phpcompiler.org/spinoffs/index.html [4] Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities http://www.seclab.tuwien.ac.at/projects/pixy/ [5] http://www.stratego-language.org/Stratego/StringBorg [6] https://svn.cs.uu.nl:12443/repos/StrategoXT/stringborg/trunk/grammars/php/syntax/ [7] http://www.cs.uu.nl/ [8] http://www.hsmarnix.nl/english/english.htm [9] http://www.scouting.nl/frontend/sol/index.php?task=rs_static&action=news -- Main.EricBouwers - 08 Sep 2006