= Google Summer of Code 2022 =
'''[[https://summerofcode.withgoogle.com/archive/2022/organizations/strace|strace took part in GSoC 2022 as a mentor organization]].'''
== About strace project ==
strace is a diagnostic, debugging and instructional userspace tracer for Linux. It is used to monitor and tamper with interactions between processes and the Linux kernel, which include system calls, signal deliveries, and changes of process state. The operation of strace is made possible by the kernel feature known as [[http://man7.org/linux/man-pages/man2/ptrace.2.html|ptrace]].
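To make the role of ptrace more concrete, here is a minimal sketch of the mechanism, not strace's actual implementation: a tracer forks a child, the child asks to be traced, and the tracer counts the syscall-stops it observes. The function name `count_syscall_stops` is made up for this illustration, and the exact number of stops depends on the C library in use, so no particular value is claimed.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that makes a getpid() system call
 * under ptrace and return the number of syscall-stops (entries and exits)
 * seen by the parent, or -1 on error.
 */
long count_syscall_stops(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Tracee: request tracing, then stop until the tracer is ready. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		syscall(SYS_getpid);
		_exit(0);
	}

	int status;
	long stops = 0;

	/* Wait for the initial stop caused by raise(SIGSTOP). */
	if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
		goto fail;

	/* Mark syscall-stops with bit 0x80 so they differ from signal stops. */
	if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
		   (void *) (unsigned long) PTRACE_O_TRACESYSGOOD) < 0)
		goto fail;

	for (;;) {
		/* Resume the tracee until the next syscall entry or exit.
		 * Any pending signal is suppressed in this sketch. */
		if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
			goto fail;
		if (waitpid(pid, &status, 0) < 0)
			goto fail;
		if (WIFEXITED(status) || WIFSIGNALED(status))
			return stops;
		if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
			++stops;
	}

fail:
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}
```

A real tracer would additionally read the syscall number and arguments at each stop and forward signals to the tracee instead of dropping them.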
strace is one of the longest-running open source projects; it was started even before Linux was.
strace is an important debugging and tracing tool that is deployed on all Linux distributions and maintained by a small community of active contributors.
While strace is a small project, the strace tool is essential for many developers, system administrators and open source projects. Its maintainers and contributors are experienced developers.
The project organization is simple: the community discusses proposed patches and a few core maintainers eventually accept or reject contributions. All contributions are submitted as git patches to the mailing list, which is the single point of communication, in a mode very similar to the ways of the Linux kernel.
The strace release cycle is currently synchronized with the release cycle of the Linux kernel.
Note that we are pretty laid back and cool compared to larger and professional projects like the Linux kernel, but our standards are high. The people involved in strace are die-hard system coders who often contribute to or maintain major C libraries such as Glibc, Glib or Bionic, and who contribute to the Linux kernel and other major free and open source projects.
So we expect you to make the effort to learn our mailing list and patch workflow, ask good questions, and do your homework, so that your participation is as productive and efficient as possible.
== What to do as a prospective student ==
We want to engage with students who are interested in system programming and want to help make strace a better tool. We hope to gain you as a new long-term contributor and that you will contribute interesting new features.
You need to grok C and have an interest in system programming and debugging. The codebase is not huge, but the domain is not simple and requires meticulous attention to many details.
All communication goes through a single mailing list: https://lists.strace.io/mailman/listinfo/strace-devel
Subscribe to the list, introduce yourself and start the discussion!
Please prefix your email subjects with GSOC.
{{{
Please be kind enough to follow these simple guidelines when posting to the list:
1. only send text emails. No HTML
2. do not top post
3. use and abuse the mailing list archive to see how proper discussions are handled
4. be patient, a reply may need a week to come by
5. use git tools to create and submit patches to the list
6. apply to your code the same code style and indentation used overall in strace
Thank you!
}}}
— https://lists.strace.io/pipermail/strace-devel/2016-March/004704.html
After introducing yourself on the list, you can join our IRC channel, #strace @ oftc. '''Introducing yourself only on IRC is not enough; the mailing list is the primary means of communication.'''
Check our list of project ideas below or submit new ideas to the list for consideration.
It is required that students who want to apply to the strace project for the GSoC 2022 complete a relatively small code-related "microproject" as part of their application. Please refer to our guidelines and suggestions for MicroProjects for more information. Completing a microproject is not only an important way for us to get experience with applicants, but it will also help applicants become familiar with strace's development and submission process.
== General proposal requirements ==
You will need to submit your official proposal via https://summerofcode.withgoogle.com and plain text is the way to go.
Please subscribe to the [[https://lists.strace.io/mailman/listinfo/strace-devel|strace-devel]] mailing list and post your proposal there too.
We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:
* Your name
* Title of your proposal
* Abstract of your proposal
 * Detailed description of your idea, including an explanation of why it is innovative and what it will contribute
* Description of previous work, existing solutions (links to prototypes, bibliography are more than welcome)
* Mention the details of your academic studies, any previous work, internships
 * Any relevant skills that will help you to achieve the goal (programming languages, frameworks)
* Any previous open-source projects (or even previous GSoC) you have contributed to?
* Any open-source code of yours that we can check out?
 * Do you plan to have any other commitments during SoC that may affect your work? Any vacations/holidays planned? Will you be available full time to work on your project? (Hint: do not bother applying if this is not a serious full time commitment)
Beyond your proposal, you obviously need to be familiar with C and Git (or be willing to learn both very quickly).
== List of project ideas for students ==
=== Comprehensive test suite ===
The test suite we have today is still far from covering all branches of all parsers. According to [[https://codecov.io/github/strace/strace|Codecov]], current test coverage is just under 90%, but that number says very little about the actual coverage of various corner cases (checks for type sizes, signedness, handling of pointers to invalid memory, etc.). Some trivial bugs ([[https://bugzilla.redhat.com/show_bug.cgi?id=2028146|1]], [[https://bugzilla.redhat.com/show_bug.cgi?id=1660759|2]]) that pop up from time to time only confirm this.
The goal of this project is to improve the test suite to a level that makes strace more reliable.
On the one hand, it would be educational for any student who is interested in syscall internals because writing syscall parsers and tests for them is the second best way to find out how syscalls work.
On the other hand, a comprehensive test suite is a prerequisite for any major change in strace source code. This test suite project does not have to start from scratch: there are already existing tests (e.g. strace/tests, [[https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/syscalls|ltp/testcases/kernel/syscalls]], and [[https://github.com/gentoo/sandbox/tree/master/tests|sandbox/tests]]) that could be used as a starting point.
There are several ideas that may help with significant coverage increase in some parts of the code base:
* More elaborate use of syscall injection. Specifically, `--inject=...:poke_enter=` and `--inject=...:poke_exit=` should help with testing of printing of values that change between entering and exiting.
 * Perform syscall injection for strace itself, to check various code paths that rely on the information returned by the kernel (like getxattr() and reads from procfs/sysfs).
<<BR>>''Expected outcomes:'' increased test coverage
<<BR>>''Skills required/preferred:'' C, shell, obsessiveness, pedantry
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]], [[DmitryLevin|Dmitry V. Levin]]
<<BR>>''Expected size of project:'' 175 to 350 hours
<<BR>>''Difficulty rating:'' easy to medium
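As an illustration of one corner case named in this project description, handling of pointers to invalid memory, a test can place a buffer right before an inaccessible page so that a system call overrunning the buffer fails with EFAULT. The sketch below is only an assumption about how such a check could look, not code taken from the strace test suite; `alloc_before_guard_page` and `uname_errno_at_offset` are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/utsname.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a pointer to `size` writable bytes placed so
 * that the first byte past the buffer lies in a PROT_NONE page.
 * Returns NULL on failure.  The mapping is never freed in this sketch.
 */
void *alloc_before_guard_page(size_t size)
{
	const size_t page = (size_t) sysconf(_SC_PAGESIZE);
	/* Round the usable area up to a whole number of pages. */
	const size_t len = (size + page - 1) / page * page;

	char *p = mmap(NULL, len + page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Make the trailing page inaccessible. */
	if (mprotect(p + len, page, PROT_NONE) < 0) {
		munmap(p, len + page);
		return NULL;
	}
	return p + len - size;
}

/*
 * Call uname(2) directly with a buffer shifted by `offset` bytes towards
 * the guard page.  Returns 0 on success, the errno value on failure,
 * or -1 if the buffer could not be allocated.
 */
int uname_errno_at_offset(size_t offset)
{
	char *buf = alloc_before_guard_page(sizeof(struct utsname));
	if (!buf)
		return -1;

	errno = 0;
	long rc = syscall(SYS_uname, buf + offset);
	return rc < 0 ? errno : 0;
}
```

With an offset of 0 the structure fits exactly and the call is expected to succeed; with an offset of 1 the last byte falls into the guard page and the kernel is expected to report EFAULT. A decoder test would then compare what strace prints in each case.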
=== Support for BTF and other BPF decoding improvements ===
BTF (BPF Type Format) is a binary format (a mutilated CTF, Compact C Type Format) for describing type information for the data used in various places connected to eBPF. One such place is eBPF maps, where BTF information for the key and value types may be supplied. The goal of this project is to retrieve the BTF information for the eBPF maps and use it to enhance decoding of keys and values of map elements passed to/from the kernel in map-manipulation-related `bpf` syscalls.
## If BPF assembly dumping patch[[https://lists.strace.io/pipermail/strace-devel/2018-June/008220.html|[1]]][[https://lists.strace.io/pipermail/strace-devel/2018-June/008222.html|[2]]] will be merged, this also can be used in the disassembly output, similarly to the way bpftool does this.
<<BR>>''Expected outcomes:'' enhanced decoding for the map manipulation sub-calls of the bpf() syscall
<<BR>>''Skills required/preferred:'' C, some eBPF knowledge is preferred
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 hours
<<BR>>''Difficulty rating:'' medium
=== Implement an ioctl decoder ===
ioctl commands are an endless field for strace improvement due to their vast diversity. There are many ioctl commands whose decoding is not properly (and/or fully) implemented, including several classes of frequently requested commands, such as:
* !Video4Linux (`V4L2_*`): there is some decoder present, but it is in distinct need of renewal and update. See also a [[https://github.com/strace/strace/commits/esyr/v4l2|WIP branch]] and [[GhIssue:63|this issue]].
 * socket (`SIOC*`): this one is somewhat fuzzy, as the issue is exacerbated by the fact that these ioctl commands are address family/protocol-specific (and there are currently no facilities implemented to distinguish ioctl commands based on socket FD information), which likely increases the project's size and difficulty in case one decides to implement it in full. See also [[GhIssue:64|this issue]].
* USB (`USBDEVFS_*`). See also [[GhIssue:52|this issue]].
* ALSA (`SNDCTL_*`). See also [[GhIssue:44|this issue]].
* Binder. See also [[GhIssue:29|this issue]].
Note that there is a [[https://github.com/strace/strace/commit/5cad0ff0f30c2729fca215c733dac83e19752040|decoder generator]] now available ([[https://github.com/strace/strace/commit/757b775ba74a5e76927266774d983960eae3c20a|generated HDIO_* decoder example]]), which may help with writing a decoder (but not with tests so far).
<<BR>>''Expected outcomes:'' enhanced decoding of specific category of ioctl commands
<<BR>>''Skills required/preferred:'' C, possibly flex/bison, some ioctl field-specific knowledge is preferred
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 to 350 hours
<<BR>>''Difficulty rating:'' easy to medium
=== Implement a netlink decoder ===
The room for improvement in netlink protocol decoding is as vast and endless as in the ioctl case. Possible candidates for improving netlink decoding include:
* netfilter. It is vast and definitely requires some scoping (and is probably a 350-hour project).
* 802.11. There is a caveat with vendor-specific commands.
* ethtool.
* taskstats. Requires implementation of genl decoder/dispatcher.
<<BR>>''Expected outcomes:'' enhanced decoding of specific classes of netlink messages
<<BR>>''Skills required/preferred:'' C, some netlink field-specific knowledge is preferred
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 to 350 hours
<<BR>>''Difficulty rating:'' easy
=== More flexible output to file ===
There are several disparate github issues that may form a cohesive strace output handling improvement project: an implementation of more flexible (for example, format-string-based, [[GhIssue:54]]) output file name specification (along with some additional rules for output file handling, like renaming/rotating/stopping) may solve the following issues:
 * Improving process identification by the file name ([[GhIssue:25]], [[GhIssue:99]], [[GhIssue:153]])
 * Avoiding uncertainty about the owner of a trace that stems from PID reuse ([[GhIssue:153]])
 * Limiting uncontrolled growth in log size for long-running traces ([[GhIssue:139]], [[GhIssue:179]])
As a bonus, the following issues may also be tackled:
* Extend the flexibility of output piping with -o| syntax by enabling pipe output per-tracee
<<BR>>''Expected outcomes:'' improved managing of strace's output
<<BR>>''Skills required/preferred:'' C
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 hours
<<BR>>''Difficulty rating:'' easy to medium
=== Improve statistics handling ===
While strace does have a facility that provides some syscall statistics for the traced processes, it is still of rather limited use. Some things that could be improved to bring its usability on par with other tracing tools:
* Provide per-tracee statistics
* Provide statistics periodically and/or on demand
* Provide histograms of syscall time
* Collect signal statistics
* Improve statistics output formatting
For this project, design and features of other tracing tools, like lttng and perf, are likely to be considered.
<<BR>>''Expected outcomes:'' enhanced decoding
<<BR>>''Skills required/preferred:'' C, some eBPF knowledge is preferred
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 hours
<<BR>>''Difficulty rating:'' easy
=== Implement the features requested on Github ===
There are several issues present on the [[https://github.com/strace/strace/issues|GitHub issues page]]. A couple of them could be enough for a 175-hour project, depending on the size of the changes required. You'll have to estimate how many hours you'll need for each task that you pick. Adding your own enhancement ideas to the plan is fine. See also: FeatureRequests
There is an assortment of small enhancements that can be picked from that list, for example:
* Separate the test running system from the build system to make it possible to run pre-built tests on embedded devices (without gcc or even make); also [[GhIssue:151]]
* Enhance `-e inject=` condition specification [[GhIssue:86]], [[GhIssue:104]], [[GhIssue:173]], also [[GhIssue:125]]
* Fork/thread-following-related [[GhIssue:141]], [[GhIssue:175]]
* Autostop conditions: [[GhIssue:139]], [[GhIssue:179]]. Also, we need an ability to split logs during runtime into files of X bytes or less (imagine vfat file size limitations).
## * Naming the output file: [[GhIssue:54]], [[GhIssue:153]]
## * Related to syscall statistics: [[GhIssue:47]], [[GhIssue:48]], also [[GhIssue:156]]
## * Adding new ioctls is a never-ending resource of possible enhancements: [[GhIssue:28]], [[GhIssue:29]], [[GhIssue:30]], [[https://github.com/strace/strace/issues?q=is%3Aissue+is%3Aopen+label%3Aioctl|etc.]]
<<BR>>''Expected outcomes:'' enhanced decoding
<<BR>>''Skills required/preferred:'' C, shell, possibly m4 and awk
<<BR>>''Possible mentors:'' [[lineprinter|Elvira Khabirova]], [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 175 hours
<<BR>>''Difficulty rating:'' medium
=== Support for alternative tracing backends ===
There is a long-standing gdbserver backend proposal that would enable running strace on tracees that are under the control of gdb (which, in turn, acts as a ptrace request multiplexer, enabling simultaneous connection of various tracers and debuggers), but it is still not finished:
* [[https://github.com/cuviper/strace/|Original work by Josh Stone]]
* [[https://github.com/stanfordcox/strace/commits/gdbserver0|Current state by Stanford Cox]]
* [[https://github.com/esyr-rh/strace/commits/gdbserver-prep|Preparational patches that include initial backend support]]
* [[https://lists.strace.io/pipermail/strace-devel/2017-January/005915.html|gdbserver backend proposal letter]]
The project would require updating the backend to the current state of strace (which has gained new calls that are supposed to be made on the tracee's side, including some calls inside libdw/libselinux), implementing a reasonable test suite for it, and sketching out the set of missing gdb stub calls that need to be performed on the tracee's side.
## There is also an idea that uprobes/kprobes/ftrace/perf can be utilized for tracing syscalls as a more modern way of tracing processes, which makes the possible support for various tracing backend more useful.
##
## * [[https://github.com/pmem/vltrace|vltrace]]
## * [[https://devconfcz2018.sched.com/event/DJYj/stracing-using-perf-and-ebpf|"stracing using perf and eBPF" talk by Arnaldo Carvalho de Melo]]
## * [[http://vger.kernel.org/~acme/perf/linuxdev-br-2017.pdf|"News from tools/perf land: What has been brewing in the Linux observability tools" by Arnaldo Carvalho de Melo]]
## * [[https://linuxplumbersconf.org/event/2/contributions/78/|Discussion at LPC 2018]] ([[https://linuxplumbersconf.org/event/2/contributions/78/attachments/63/74/lpc_2018-what_could_be_done_in_the_kernel_to_make_strace_happy.pdf\#page=20|"Problem 8: strace is slow, perf can lose data"]])
## * [[https://lore.kernel.org/lkml/20181128134700.212ed035@gandalf.local.home/|RFC patch by Steven Rostedt]]
<<BR>>''Expected outcomes:'' new strace tracing backend, based on gdb stub protocol
<<BR>>''Skills required/preferred:'' C, gdb stub protocol knowledge is preferred
<<BR>>''Possible mentors:'' [[eSyr|Eugene Syromyatnikov]]
<<BR>>''Expected size of project:'' 350 hours
<<BR>>''Difficulty rating:'' hard
##=== Improving documentation of the internal APIs ===
##''Suggested by:'' [[eSyr|Eugene Syromyatnikov]]
##
##''For GSoC 2022, the fate of this project idea is undecided, since the program rules urge not to include documentation-only projects.''
##
##Over the years, strace's internal APIs, that are used for various purposes (like printing various entities), have been grown significantly, to the point it leads to duplication of the code (for example, printing of hexadecimal strings used to be duplicated now in `v4l2.c`, `btrfs.c` and `util.c` for quite some time). The other issue with the vast internal API (which is usually the result of long history of handling various issues with various architectures and version of the Linux kernel) is that it's not self-evident how things should be done properly. It's believed that documenting current APIs could lower the learning curve and increase overall quality of the code. Some things that could be done here include, but are not limited to:
##
## * Adding Doxygen documentation for the existing APIs
## * Writing overviews for some parts of the API ([[XlatDocumentation|an incomplete example of an unfinished xlat API overview]])
=== Other ideas ===
We are also open to any suggestions not listed on this page.
Some existing ideas are present on a [[FeatureRequests|separate page]]. Note, however, that they may not be adequately sized for a GSoC project or may require specific qualifications.