A Survey of Most Common Errors in Linux Kernel

This document summarizes an algorithm to analyze common errors in the Linux Kernel source code using Git. The algorithm identifies: 1) The most frequent error messages by finding commit messages containing "Fix" or "Fixed" and comparing them using Levenshtein distance. 2) The most buggy source files by counting how often each file is mentioned in commit messages containing a "fix". 3) The most buggy lines of code by finding which lines are most often mentioned in commit messages for the most buggy source files. The algorithm uses the Java library JGit to access Git commits and diffs to perform the analysis on a local Linux Kernel Git repository.

A survey of most common errors in Linux Kernel

Sergey Staroletov
Polzunov Altai State Technical University,
Lenin avenue 46, Barnaul, 656038, Russia
Email: serg [email protected]

Abstract: This paper presents a technique for analysing the most frequent bugs in the Linux Kernel by examining commit messages and diffs with a Git library. It examines the most relevant error messages, the most buggy files and the most buggy lines of code. Some results of experiments on real Linux Kernel Git repositories are given.

Keywords: Linux Kernel, Git, Bugs, Algorithm, Tool, Levenshtein Distance

I. INTRODUCTION

The Linux Kernel [1] is a very good example of how people can collaborate to solve difficult problems. The Kernel was initially written by Linus Torvalds, and it now has more than 16,000 contributors (a value obtained by counting the committers of the main Linux repository). A kernel is a set of functions and data structures that do the low-level system work for an operating system, especially memory management, process scheduling, device support, networking, etc. The Kernel is written in C and uses its own C library.

One of the main advantages of releasing open source software is free feedback and error reports. Many users test the upcoming kernel and report bugs. There is no warranty for the software; people use it at their own risk, test it for free and provide feedback. A software engineer can collect the reports, find the problems and fix the software; later it can be released as a stable version, because it has already been tested by thousands or millions of users, and then be used in commercial systems.

Software development teams now use version control systems (VCS); Linux Kernel development uses the tool Git [2]. Because a VCS tracks every change of the project state, including new features, improvements and bug fixes, it is possible to track the fixes and build statistics for analysis. This paper is devoted to that task.

II. GIT AND JGIT

The Git VCS [2] is the second well-known project by Linus Torvalds after the Kernel. To develop the Kernel he needed a VCS that is distributed and, at the same time, local: one that can be used offline to track changes, create patches, view diffs (differences between commits) and merge requests from other developers, for example while flying from Europe to America or working in a quiet place without Internet access. In 2005 there was no such system, so he decided to write one [3]. Git is now used by a large number of software companies that want to share code within a team, track changes and control the development process.

A commit is the action by which a software developer records the changes he has made. The developer takes the code from a common repository, works on it and, when he thinks he has made the requested progress and has reviewed the changes, saves them in the repository by making a commit. The commit comes with a natural-language message describing the changes.

JGit [4] is an open source library written in Java for working with Git repositories and walking their internal commit trees. It was chosen for this work because of its good documentation and the large number of code snippets available on the Internet. JGit gives access to commits and diffs, so we can write our own analysis in Java while iterating over the desired Git nodes.

III. FORMULATION OF THE PROBLEMS

Suppose we have a local Git repository of the Kernel or of Kernel-related sources (the approach can be applied to any Git repository). Our goal is to:

1) Find the most frequent errors. This can be reformulated as: find the most common commit messages of fixes, and then as: find the most relevant commit messages of fixes, which are not necessarily identical but similar.
2) Find the most buggy sources. This can be reformulated as: find the files that appear most often in commits whose message contains "fix".
3) Find the most buggy lines of code. This can be reformulated as: find the lines that appear most often in fixing commits within the most buggy source files.

Solving the first problem reveals the most common classes of errors in the C code of the Kernel. Developers and teachers should be aware of them so that they can learn C techniques and kernel programming that avoid them.

Solving the second problem shows the most unstable portions of the Kernel, which may be so difficult that they are very hard to write correctly the first time. A high count could also mean that the component corresponding to the file is under active development and fixing because of its significance.

Solving the third problem can be useful for analysing the errors with respect to the source code.

IV. FINDING THE CLOSEST MESSAGES

It is easy to find identical messages in the message list, but to find the closest messages we have to use special algorithms that compute a similarity metric for two given strings.
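As an illustration of such a string-similarity metric, here is a minimal sketch in Python (not the Java implementation used in this work) of the Levenshtein edit distance and of a nearest-message lookup. The function names `levenshtein` and `closest_message` are assumptions made for this example only.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest_message(message: str, known: List[str]) -> Optional[str]:
    """Return the known message with the smallest distance to message,
    or None when nothing is known yet (illustrative helper)."""
    if not known:
        return None
    return min(known, key=lambda other: levenshtein(message, other))
```

The lookup compares the new message with every known message, so it costs one distance computation per stored message; this sketch makes no attempt to avoid that.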
The strings here are the parts of commit messages that contain "Fix" or "Fixed:", taken from that marker to the end of the line or to the end of the string, whichever comes first. The markers "Fix" and "Fixed" were chosen after examining the existing commits in the Linux Kernel repository.

For this work, we use the Levenshtein distance [5] algorithm and the implementation of it given in [6].

The Levenshtein distance is a value that measures the distance between two strings. To find the string in a list that is nearest to a given string, we must calculate the Levenshtein distance between the given string and every string in the list and return the string with the minimal distance.

This solution is neither elegant nor fast, but it works.

V. FINDING THE MOST FREQUENT ERRORS

Here is the proposed algorithm:

algorithm FindMostFrequentErrors (repository, count)
    for branch ∈ Branches (repository)
        for commit ∈ Commits (repository, branch, path)
            commitMessage ← Message (commit)
            if commitMessage ∈ ("Fix", "Fixed:")
                minDst ← ∞
                for otherMessage ∈ messages
                    minDst ← min (minDst, LevenshteinDistance (commitMessage, otherMessage))
                    closestMessage ← (message ∈ messages | LevenshteinDistance (commitMessage, message) == minDst)
                endfor
                messages[] ← commitMessage
                mapMsgRelevance [closestMessage] ++
                mapMsgRelevance [commitMessage] ++
                FindMostBuggySources (commit)
            endif
        endfor
    endfor
    return GetFirst (count, SortMap (mapMsgRelevance))

Here repository is a JGit object that abstracts a locally cloned Git repository; count is the maximum number of errors to find; Branches returns all the branches of the given Git repository; Commits returns all the commits for the given repository, branch and starting path (we can start not from the root of the repository but from some folder within it); Message returns the commit message of the given commit object; LevenshteinDistance calculates the Levenshtein distance between two given strings; messages is the list of already known commit messages; mapMsgRelevance is a hash table that maps a message to its relevance value in the commits; FindMostBuggySources is the algorithm that solves the most buggy sources problem and is described in the next section; GetFirst is a bounded iterative algorithm that returns the first count values from the given collection; SortMap sorts a hash table by value in descending order.

VI. FINDING THE MOST BUGGY SOURCES

This algorithm is invoked from the previous one, so that the possibly huge number of commits is traversed only once. Here is the proposed algorithm:

algorithm FindMostBuggySources (commit, count)
    diffs ← DiffsScan (commit, Parent (commit))
    for diff ∈ diffs
        fileName ← ChangedFileInDiff (diff)
        for edit ∈ Edits (diff)
            mapFileNameChanges [fileName] ++
            FindMostBuggySourceLines (fileName, edit)
        endfor
    endfor
    return GetFirst (count, SortMap (mapFileNameChanges))

Here commit is a JGit object that represents the commit; count is the maximum number of sources to find, specified at program startup; DiffsScan retrieves the diff objects between two commits and is part of the JGit library; Parent returns the parent commit of the given commit object, that is, the previous state of the repository, which the given commit changes; ChangedFileInDiff returns the changed filename from a given diff object; mapFileNameChanges is a hash table that maps a changed filename to its count of changes; FindMostBuggySourceLines is the algorithm that solves the most buggy lines of code problem and is described in the next section; GetFirst is a bounded iterative algorithm that returns the first count values from the given collection; SortMap sorts a hash table by value in descending order. Note that the return statement is given here for consistency; it actually runs after FindMostFrequentErrors ends.

VII. FINDING THE MOST BUGGY SOURCE LINES

This algorithm is invoked from the previous one. Here is the proposed algorithm:

algorithm FindMostBuggySourceLines (fileName, edit, count)
    for line ← FirstInsertLine (edit)...LastInsertLine (edit)
        mapFileChanges [fileName][line] ++
    endfor
    return for fileName ∈ MostBuggySources
        GetFirstLines (count, SortMapByChanges (mapFileChanges [fileName]))
    endfor
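The file-level and line-level counting in the two algorithms above can be sketched as follows. This is a simplified Python illustration, not the Java/JGit implementation: it assumes the commits have already been read into plain tuples of the hypothetical form (message, [(file_name, [(first_line, last_line), ...]), ...]), with inclusive line ranges, and the names `count_fix_changes` and `most_buggy` are invented for the example.

```python
from collections import Counter, defaultdict


def count_fix_changes(commits):
    """Count edits per file and insertions per line over fixing commits.

    commits: iterable of (message, diffs), where diffs is a list of
    (file_name, edits) and edits is a list of inclusive
    (first_line, last_line) ranges of inserted lines (assumed layout).
    """
    file_changes = Counter()             # plays the role of mapFileNameChanges
    line_changes = defaultdict(Counter)  # plays the role of mapFileChanges
    for message, diffs in commits:
        if "Fix" not in message:         # also matches "Fixed:"
            continue
        for file_name, edits in diffs:
            for first_line, last_line in edits:
                file_changes[file_name] += 1
                for line in range(first_line, last_line + 1):
                    line_changes[file_name][line] += 1
    return file_changes, line_changes


def most_buggy(file_changes, line_changes, count):
    """Return up to count files with the most edits, each with its
    most frequently changed lines, in descending order."""
    return [
        (file_name, fixes, line_changes[file_name].most_common(count))
        for file_name, fixes in file_changes.most_common(count)
    ]


if __name__ == "__main__":
    sample = [
        ("Fix memory leak", [("a.c", [(10, 12)])]),
        ("Add feature", [("b.c", [(1, 5)])]),
        ("Fixed: typo", [("a.c", [(12, 12)]), ("c.c", [(3, 3)])]),
    ]
    files, lines = count_fix_changes(sample)
    print(most_buggy(files, lines, 2))
```

As in the pseudocode, a file is counted once per edit, so one commit with several separate edits to the same file raises that file's count several times.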
Here fileName is the name of the file whose buggy lines are checked; count is the maximum number of buggy lines per source to find, specified at program startup; edit is a JGit object that represents an insertion edit in the source file (edits are supplied by the FindMostBuggySources run); FirstInsertLine extracts the first inserted line from the edit object, using the JGit library; LastInsertLine extracts the last inserted line from the edit object, using the JGit library; mapFileChanges is a two-key hash table that maps a filename and a line in the file to a count of changes; GetFirstLines is a bounded iterative algorithm that returns the first lines from the given two-key hash table; SortMapByChanges sorts the given hash table for the desired filename by the count of changes. Note that the return statement is given here for consistency; it actually runs after the algorithms FindMostFrequentErrors and FindMostBuggySources end.

Creating a hash table keyed by changed line could be costly in memory, but we are working not with ordinary commits but only with fixing commits, where the number of inserted lines cannot be very large.

VIII. EXPERIMENTS AND ANALYSIS

During the R&D work, all the given algorithms were implemented as a Java program. The algorithms are neither well optimized nor parallelized or distributed (this is a possible subject of further work), so the analysis of the current master Linux Kernel repository [7] (more than 600,000 commits) takes about 10 hours on a regular Intel Core i5 laptop. This analysis gives us overall statistics.

Because the developed software lets the user choose the repository or the initial path within the repository, it is better to analyse different related repositories from [8] or parts of the main repository (memory management, networking and so on). Analysing a repository of 20,000 commits takes about 10 minutes.

Here are some results of analysing the whole main Linux repository, the Bluetooth protocol stack for Linux (Bluez) [9], the memory management part [10], the kernel scheduling part and the networking part of the main repository.

The results were corrected to merge some duplicates (plurals, punctuation, etc.).

The main repository. Most common error fixes. The format here is: fixing commit/relevance.

it/135
typos/102
the checkpatch.pl issues:/102
some error handling/78
this/70
that/53
dtc warnings/49
checkpatch warning:/45
module autoload/44
modular build/44
unaligned accesses in VC code/29
camel case/29
missing interrupts/28

The most buggy files and lines: filename/fixes count

drivers/gpu/drm/i915/intel_display.c/9165
Line:3166, changes -> 7
drivers/staging/skein/threefish_block.c/6481
Line:1048, changes -> 3
drivers/gpu/drm/i915/i915_gem.c/5390
Line:1213, changes -> 7
drivers/gpu/drm/i915/intel_ringbuffer.c/4396
Line:2363, changes -> 7
drivers/gpu/drm/i915/intel_pm.c/3960
Line:2276, changes -> 9
/dev/null/3911
Line:0, changes -> 3911
include/linux/mlx5/mlx5_ifc.h/3745
Line:922, changes -> 4
drivers/gpu/drm/i915/intel_lrc.c/3547
Line:774, changes -> 7
drivers/gpu/drm/i915/i915_debugfs.c/3381
Line:5092, changes -> 7
drivers/gpu/drm/i915/i915_gem_gtt.c/3206
Line:2615, changes -> 5
drivers/scsi/scsi_debug.c/2619
Line:4882, changes -> 5
drivers/net/ethernet/mellanox/mlx5/core/en_main.c/2373
Line:297, changes -> 6
drivers/gpu/drm/i915/i915_drv.h/2252
Line:3244, changes -> 8
drivers/net/dsa/mv88e6xxx.c/2247
Line:1238, changes -> 5
arch/powerpc/xmon/ppc-opc.c/2138
Line:2964, changes -> 2
drivers/gpu/drm/i915/intel_dp.c/2060
Line:1227, changes -> 6
drivers/media/pci/saa7134/saa7134-cards.c/1982
Line:5713, changes -> 2

Comments

The overall analysis tells us that the most relevant issues are not code issues but integration issues.

First, we observe non-informative commit messages (it, this, that), extracted from commit messages such as "Fixed this". They spoil the statistics, and it is strange to find them in a project of this level. "Typos" is close behind.

Checkpatch warnings and issues are the most frequent. Checkpatch.pl [11] is a script that checks code against the Linux Kernel coding guideline, which is given in [12]. It is very strange that developers who want to create a patch for the Kernel ignore that document. The Kernel consists of a large number of mostly C files, and if each of the 16,000 developers used a personal coding style, the project would be a big mess. Before starting to code for any project, every developer should ask for the project's code style.

Error handling is a way to improve system stability and performance after the principal logic has been implemented. The developer can first write code without paying attention to special cases. Later, during testing of the module or kernel part, some errors will occur and the code must be fixed. Error handling can be a set of special ifs and gotos to the end of the code that free the buffers and return a negative error code.
DTC warnings are about device trees [13]. The core reason for the existence of Device Tree in Linux is to provide a way to describe non-discoverable hardware. This information was previously hard-coded in the source code. To deal with a device tree, a developer must know low-level information from chipset and motherboard vendors, so testing it is not simple.

Module autoload is the process of loading a module at boot time. Developers typically test loadable Kernel modules by installing them into the live kernel, but at load time the module can conflict with other modules in the Kernel, so this is an integration issue.

Modular build is possibly the process of creating correct makefiles so that the module is built together with the kernel. It is a process of integrating the module into the whole kernel and the kernel config.

Unaligned memory access occurs when the developer tries to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0) [14]. It can cause a performance penalty on some architectures or even a processor exception. VC means PCI Express bus virtual channel.

Camel case is a code style issue. It usually means naming variables and functions as one long word in which each sub-word starts with a capital letter and no separators are used, like SomeFunction. The fixes mean that the developer initially followed another code style, like "some_function", or did not follow any style.

Missing interrupts is a specific OS kernel or driver issue. A driver can set up and start an I/O operation, then wait for an interrupt indicating completion. If that interrupt never shows up, the driver can end up waiting for a very long time [15]. We think this issue resembles a problem with waiting on a semaphore, and that it could be solved with static checking methods.

The overall diagram of the most common error fixes is given in Figure 1. If we get rid of typos and uninformative fixes and group the results, we get Figure 2.

Fig. 1. Top error fixes after analysis of the main repository of Linux Kernel

Fig. 2. The most common error fixes types

If we look at the most buggy files (files with a large count of fixing commits), we find the strange /dev/null path (possibly a bug of Git or JGit), many fixes for Intel GPU drivers, the Skein hash function driver and the SCSI driver, fixes for the Philips SAA7134 TV card, fixes for the Mellanox Ethernet adapter, etc. (see Figure 3).

Fig. 3. Top types of Linux buggy drivers

We see that the Intel display and GPU drivers have the most numerous fixes in the Linux Kernel. It could mean that the drivers are completely unstable; we even see many fixes in an interface .h file, which is absurd. But it also indicates good support by Intel for its drivers. It is no secret that a large amount of free Linux code is written during working hours by developers from hardware and software corporations (Intel, Oracle, IBM, ...). They are paid for patches to the Linux kernel or drivers; they have the software tested by the significant number of free Linux users to find possible bugs, and then use the well-tested software in their own projects, create commercial products and sell them to vendors.

There are many fixes in the Skein hash function [16] implementation. Skein was a finalist, but not the winner, of the competition to become the SHA-3 function. It is a cryptographic function, so there have been some attacks on it, and performance matters, so there are many fixes here.

Bluez. This is a part of Linux (the Linux Bluetooth protocol stack). It was selected because it has a moderate number of commits (about 23,000) and is independent of most Linux components.

Most common error fixes: fixing commit/relevance

memory leaks/153
coding style/60
typos/47
possible invalid reads/25
double free/24
setting connecting state/24
not reseting sink source/22
whitespace issues/15
not handling notification/21
missing file/21
use after free/20
passing wrong error code/20
possible NULL pointer dereference/19
warnings/18
sending command responses/18
invalid reference counting/18
incorrect error check/17
includes for gobex.h header/17
not setting scope properly/16
device found tests/16
memory leak in gap/16

The most buggy files and lines: filename/fixes count

Filename: src/adapter.c/935
Line:422, changes -> 2
Line:2280, changes -> 2
Filename: src/device.c/753
Line:1972, changes -> 2
Filename: obexd/plugins/phonebook-tracker.c/515
Filename: android/gatt.c/386
Filename: obexd/client/session.c/309
Filename: audio/headset.c/251
Filename: tools/mgmt-tester.c/250
Filename: audio/a2dp.c/238
Line:246, changes -> 2
Filename: audio/avdtp.c/230
Filename: lib/sdp.c/212
Filename: android/bluetooth.c/195

Comments

We see many types of C-related errors here (see Figure 4). The biggest problem for Bluez is memory leaks, but there are also invalid reads, double free, use after free and NULL pointer dereferences (all of them pointer-related). We now know that even in a Bluetooth implementation the memory leak problem is more frequent than the others, and that it is still easy to run into pointer trouble.

Fig. 4. Pointer related errors in Bluez

We also see coding style fixes, typos and whitespace issues. Consider this diff:

author Johan Hedberg
<[email protected]>
committer Jaikumar Ganesh
<[email protected]>
Fix minor whitespace issue
--- a/audio/a2dp.c
+++ b/audio/a2dp.c

-for (l = s->cb; l != NULL; ){
+ for (l = s->cb; l != NULL; ) {

We can see email addresses from well-known companies and a trivial missing space. Why not follow the code guidelines? Why not review the code before committing?

There are also some warning fixes. Why commit code with warnings to the Linux Kernel repository? Especially in an OS kernel, a compiler warning can easily turn into an error when the code runs, and an error in a kernel part or in a kernel module may cost a lot.

The overall diagram of the most common error types in Bluez is given in Figure 5.

Fig. 5. Top error types in Bluez

Memory management. This is a very important Kernel part, which allocates and manages memory pages. The code for the analysis is taken from the /mm path in the Linux Kernel Git repository.

Most common error fixes: fixing commit/relevance

it/58
typo in comment/50
this by marking the pmd_t clean and write protected in/26
__rcu annotations/25
by adding the locking/20
this by checking for/16
this by removing the page/15
the condition by checking for the head as well/14
use-after-free in wb_congested_put()/14
follow_huge_pmd()")/13
this by always using the fmt string and only/12
kernel-doc warnings in mm/filemap.c/12
address line detection on x86/11
it by using irq_save/restore instead/11
reset/remove race/11
front merge check/11
compile warnings/10
build failure in __kmem_cache_create()/10
memory leak on isolation failure]/10
condition for filling of PMD holes/10
this by changing the delta argument to long type./10
up conversion to hotplug state machine/9
wrong gcc code generation with 64 bit variables/9

The most buggy files and lines: filename/fixes count

Filename: mm/slub.c/1445
Line:2596, changes -> 2
Filename: mm/slab.c/888
Line:2752, changes -> 2
Filename: mm/memcontrol.c/613
Line:3354, changes -> 2
Filename: mm/page_alloc.c/520
Line:120, changes -> 2
Filename: mm/memory.c/351
Filename: mm/filemap.c/316
Line:2409, changes -> 2
Filename: mm/hugetlb.c/285
Filename: mm/shmem.c/283
Filename: mm/page-writeback.c/259
Filename: mm/backing-dev.c/258
Line:277, changes -> 2
Filename: mm/vmscan.c/244

Comments

The memory management fixes differ from those in Bluez. The fixes are related to memory management problems (see Figure 6). We still see the common problems (race conditions, memory leaks, use of a pointer after free), and we also see concurrency problems (locking; RCU is also an algorithm for simultaneous access), and the data race seems to be the major issue here. As always, there are problems with uninformative commit messages, typos and warnings.

Fig. 6. Top types of memory management code issues

The most fixed files are the SLUB and SLAB allocators (SLUB is more recent and has more fixes than SLAB). We can also see that it is rare here for a fix to follow another fix on the same line.

Kernel scheduling code. The scheduler is a set of functions that schedule a task onto a processor for execution and then switch to another task. Like the memory manager, the scheduler is a critical part of a kernel. The code for the analysis is taken from the /kernel/sched path in the Linux Kernel Git repository.

Most common error fixes here: fixing commit/relevance

typo in a comment/20
comment in pick_next_task_dl()/15
pick_next_task() for RT,DL/14
fairness issue on migration/12
oops in sched_show_task()/12
hotplug crash/11
task group initialization/11
min_vruntime tracking/11
one typo/10
remote wakeups/10
per-CPU structure initialization in sugov_start()/9
build warning./9
show_stack() task pointer regression/9
crash in sched_init_numa()/8
cleanup cgroup teardown/init")/8
this to be milliseconds all around./7
steal time accounting/7
set_user_nice()/7

The most buggy files and lines: filename/fixes count

Filename: kernel/sched/fair.c/2058
Line:801, changes -> 3
Line:3254, changes -> 2
Line:4243, changes -> 2
Filename: kernel/sched/core.c/1660
Line:6697, changes -> 2
Filename: kernel/sched/cputime.c/295
Filename: kernel/sched/deadline.c/289
Line:1376, changes -> 2
Filename: kernel/sched/sched.h/254
Line:1305, changes -> 2
Filename: kernel/sched/rt.c/217
Line:20, changes -> 2
Filename: kernel/sched/debug.c/125
Line:534, changes -> 2
Filename: kernel/sched/cpufreq_schedutil.c/89
Line:55, changes -> 2
Line:489, changes -> 2
Line:485, changes -> 2
Filename: kernel/sched/idle.c/86
Line:99, changes -> 3
Line:149, changes -> 2
Line:151, changes -> 2
Filename: kernel/sched/cpuacct.c/61
Filename: kernel/sched/wait.c/49

Comments

Almost all the commit messages here are specific to the scheduling process, so they are hard to classify, and understanding them is a subject for additional research. We see the most problems in pick_next_task() and in the fair scheduling algorithms (see Figure 7). We do not see C-related errors here; it could mean that the scheduling code has been written by highly professional software engineers. We can also see that the Fair scheduling algorithm (in fair.c) could be 40 times more complicated than the implementation of waiting primitives (located in wait.c).

Fig. 7. Top types of Kernel scheduling issues

Kernel networking code. Linux is a network operating system, so it must have good network code. It contains implementations of IPv4/IPv6 and other protocols, routing and other networking functionality. One of the most useful pieces of Kernel networking code is Netfilter [17]. The Netfilter framework lets users create firewall extensions that process and filter every network packet, so it must be very fast and stable, and every bug must be fixed immediately.

Most common error fixes here: fixing commit/relevance

that(+this)./93
whitespace errors./52
net: use a per task frag allocator/30
mpls: support for dead routes/28
sending netlink message/23
ipv4: Delete routing cache/21
genetlink: use idr to track families/20
error path in init/20
sctp: add dst_pending_confirm flag/20
an assertion in rxrpc_read()/20
netfilter: nft_ct: add zone id set support/20
sctp: Add GSO support/20
sctp: implement prsctp PRIO policy/19
reconnection timeouts/19
net: sched: Introduce connmark action/19
sparse warning/18
races between socket add and release./17
__rcu annotations/17
error path in nbp_vlan_init/17
the putting of client connections/16
configuration race/16
more use after free/16

The most buggy files and lines: filename/fixes count

Filename: net/bluetooth/mgmt.c/3495
Line:2232, changes -> 4
Filename: net/wireless/nl80211.c/3205
Line:456, changes -> 3
Filename: net/core/dev.c/2895
Line:2797, changes -> 3
Filename: net/mac80211/mlme.c/2731
Line:1488, changes -> 3
Filename: net/ipv4/route.c/1837
Line:1464, changes -> 3
Filename: net/bluetooth/hci_core.c/1827
Line:631, changes -> 3
Filename: net/batman-adv/translation-table.c/1763
Line:58, changes -> 2
Filename: net/bluetooth/hci_event.c/1717
Line:3740, changes -> 3
Filename: net/ipv6/addrconf.c/1708
Line:2468, changes -> 2
Filename: net/tipc/link.c/1647
Line:2145, changes -> 3
Filename: net/mac80211/rx.c/1588
Line:392, changes -> 2
Filename: net/ipv4/tcp_input.c/1560
Line:83, changes -> 2
Filename: net/ipv6/route.c/1502
Line:2148, changes -> 3
Filename: net/netfilter/nf_tables_api.c/1490
Line:2697, changes -> 4
Filename: net/mac80211/tx.c/1440
Line:1821, changes -> 3
Filename: net/tipc/socket.c/1356
Line:432, changes -> 3
Filename: net/ipv4/udp.c/1243
Line:562, changes -> 3
Filename: net/sctp/socket.c/1199
Line:2858, changes -> 2
Filename: net/netfilter/nf_conntrack_core.c/1156
Line:766, changes -> 3
Filename: net/ipv4/tcp_ipv4.c/1123
Line:321, changes -> 2

Comments

Here we see different types of errors. Some of them are the same as in the other modules (we have already seen use after free, races and rcu annotations), while others are network and protocol related (see Figure 8). Interestingly, the most buggy source is the Bluetooth management module (see Figure 9), so this protocol may be well tested or may be completely unstable. The IPv4 family is now used by billions of people, yet its number of errors is not the highest; its main problem module is the routing module (which has more fixes than the IPv6 routing module). If we compare the count of fixes in the network protocol implementations with the number of fixes in the Intel drivers, we see a smaller number of errors here, which could be because the protocols had a standardized specification from the start.
R EFERENCES
[1] The Linux Kernel Archives. https://www.kernel.org/
[2] Git –distributed-is-the-new-centralized. https://git-scm.com/
[3] 10 Years of Git: An Interview with Git Creator Linus Torvalds.
https://www.linux.com/blog/10-years-git-interview-git-creator-linus-
torvalds/
[4] JGit -Eclipse. https://eclipse.org/jgit/
[5] Vladimir I. Levenshtein, Binary codes capable of correcting deletions,
insertions, and reversals, Doklady Akademii Nauk SSSR, 163(4):845-
848, 1965 (Russian). English translation in Soviet Physics Doklady,
10(8):707-710, 1966.
[6] Black, Paul E. Levenshtein distance, Dictionary of Algorithms and Data
Structures [online], U.S. National Institute of Standards and Technology.
https://xlinux.nist.gov/dads/HTML/Levenshtein.html
Fig. 8. Top types of Kernel networking issues [7] Linux kernel source tree. https://github.com/torvalds/linux
[8] Kernel.org git repositories. https://git.kernel.org/
[9] Bluetooth protocol stack for Linux.
https://git.kernel.org/pub/scm/bluetooth/bluez.git/
[10] linux/mm. https://github.com/torvalds/linux/tree/master/mm
[11] The newbie’s guide to hacking the Linux kernel.
http://www.tuxradar.com/content/newbies-guide-hacking-linux-kernel
[12] Linux kernel coding style. https://www.kernel.org/doc/Documentation/
process/coding-style.rst
[13] Device Tree Reference. http://elinux.org/Device Tree Reference
[14] Unaligned memory access. https://www.kernel.org/doc/Documentation/unaligned-
memory-access.txt
[15] Improving lost and spurious IRQ handling.
https://lwn.net/Articles/392136/
[16] The Skein Hash Function Family. http://www.skein-
hash.info/sites/default/files/skein1.3.pdf
[17] The netfilter.org project. https://www.netfilter.org/
Fig. 9. Top buggy files in network modules
[18] S.Staroletov. Linux Kernel Analysis.
https://github.com/SergeyStaroletov/LinuxKernelAnalysis

IX. C ONCLUSION
The described algorithms could be well used to study the
most frequent errors in the Linux Kernel and to analyze them.
People study information by the troubles well, and the existed
errors are a real point to learn about them.
During the research, the software for the analysis has been
developed, and the sources can be free downloaded from the
Git repository [18].
Some Linux Kernel repositories have been analysed by
using the software.
The results of the study show that errors of integration and
of development process organization play a huge role on the
scale of the whole Linux Kernel. There are also many fixes of
C language related errors, but they do not make up a
significant share of the total number of fixes.
We found that the most frequently fixed part is the Intel
drivers. They could be either unstable or simply well tested
and fixed; the fix counts alone do not let us tell which.
We also see many fixes of typos, uninformative commits and
code style issues; these fixes suggest that developers
should pay attention to the Linux code style guide and
review their changes before committing.
The research can be continued by looking at the source
code of the diffs to understand the large number of refixes
(fixes of the same line of a file) we found.
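
The notion of a refix can be made concrete with a small sketch. It assumes that the changed lines of each fix commit have already been extracted from the diffs and encoded as "path:lineNumber" strings. The class name RefixCounter, the key format and the input shape are illustrative assumptions and are not taken from the software in [18].

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts repeated fixes ("refixes") of the same line. */
public class RefixCounter {

    /**
     * @param fixedLines one entry per line touched by a fix commit, encoded
     *                   as "path:lineNumber" (an assumed format for this sketch)
     * @return only the lines that were fixed more than once, with their counts
     */
    public static Map<String, Integer> countRefixes(List<String> fixedLines) {
        // Count every occurrence of each "path:lineNumber" key.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : fixedLines) {
            counts.merge(key, 1, Integer::sum);
        }
        // A refix means the same line was fixed at least twice.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, not data from the study.
        List<String> fixedLines = List.of(
                "mm/slab.c:120", "mm/slab.c:120",
                "net/core/dev.c:42", "mm/slab.c:120");
        System.out.println(countRefixes(fixedLines));
    }
}
```

Line numbers shift as the surrounding code changes, so a real analysis would have to track a line across revisions; this sketch ignores that and compares raw line numbers only.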
