[Thunar-dev] Thunar performance improvements

André Gillibert metaentropy at gmail.com
Fri Nov 18 12:42:00 CET 2011


2011/11/17, Jannis Pohlmann <jannis at xfce.org>:
> Hi André,
>
>> Secondly, I patched Thunar, as follows:
>> 1) Changed the ThunarFile implementation to only compute information
>> that's "required", and, as more information is needed, complete this
>> information.
>
> The "required" bit is a bit complicated, I guess, so I can only judge
> about this improvement based on real code.
>

Indeed. I'm now commenting the code and splitting it into separate patches.

>> 3) When the file is actually viewed, a background thread computes
>> more info (and so, follow symlinks), and, after a few milliseconds,
>> the icon is changed. If a symlink is discovered to point to a
>> directory, it's inserted as subdirectory in the tree view.
>
> Doesn't that mean things will jump up and down in the directory as you
> browse it?
>

Good question. Whenever a file changed, thunar_list_model_file_changed
was invoked and the file was moved to keep the directory sorted. A
symlink turning out to be a directory would therefore have jumped to
the start of the folder when directories are sorted before files. To
avoid that side effect, plus a re-entrancy issue with
thunar_standard_view_selection_changed (although that was fixable), I
simply don't move files when they change, so the folder is not always
sorted properly.
Currently, leaving a folder and re-entering it sorts it again, so
symlinks to directories that have already been seen are moved to the
top of the folder, while symlinks to directories that have not been
seen yet are still sorted together with files.
This only affects symlinks to directories. Regular directories and
symlinks to files are not affected.
The inconsistency disappears if the "sort folders before files"
option is unset.

>> 4) When a file or set of files is selected, their real content-type
>> is computed in order to show a correct context menu, although, not
>> everything is computed in some cases (e.g. If there's a file + a
>> directory in the selection, it knows that the only verb is "open").
>
> I wouldn't want to add too many special cases where we load additional
> information. A first quick pass and then lazy loading additional
> information all at once sounds more simple to me.
>

The main code change is in thunar/thunar-file.c. A few "fast"
functions, such as thunar_file_is_directory_fast, are added; they
don't follow symlinks unless they have already been followed.
The lazy-loading logic lives in thunar-file.c, in one central
function:

thunar_file_complete_info(ThunarFile *file, guint flags)
where flags selects one or more of these categories:
THUNAR_INFO_BASIC (zero-cost information every ThunarFile gets on creation)
THUNAR_INFO_XSTAT (accurate target file type, content type, and UNIX
attributes, which is pretty much all the info you can get with
lstat(2)/stat(2))
THUNAR_INFO_ACCESS (info you get with access(2))
THUNAR_INFO_SYMLINK_TARGET (target of a symlink, from readlink(2))
THUNAR_INFO_TRASH (info related to trash items)
The idea was that mass operations (viewing, selecting) need only
THUNAR_INFO_XSTAT, and other info would rarely be requested.
But I discovered yesterday that, to stay compatible with plugins such
as UCA, complete info must be computed whenever files are selected,
unless the plugin interface is changed to let plugins request only
the information they need.

Consequently, it makes sense to limit this info to two levels:
THUNAR_INFO_BASIC
THUNAR_INFO_ALL

There is not much code to change. :)

>> Consequently, the behavior of double-click or context menu is not
>> changed. Only icons may be "incorrect" a short amount of time.
>>
>> 5) The side tree view was extremely slow in some cases. It could
>> freeze Thunar for several minutes. This is because Thunar wanted to
>> know if each directory visible in the tree view had any subdirectory
>> (following symlinks) in order to display a little cross to be able to
>> expand the directory and view the subdirectories. This was performed
>> in a background thread, but, on I/O bound systems, could slow down
>> extremely all other I/O operations.
>>
>> Actually, this was the "bug" that made me initially write this set of
>> patches.
>>
>> I changed that to make it behave like Nautilus: Don't enter
>> subdirectories until the cross is clicked, in that case, if no
>> subdirectory is found, just make the cross disappear... This is a
>> "feature regression", but I may update the patch to make something
>> fast and correct most of the time: Seek a subdirectory, parsing a
>> limited number (e.g. one hundred) of sub-files at most, and stop as
>> soon as a true subdirectory (not symlink) is found. In doubt, assume
>> the directory may have subdirectories.
>
> That sounds a little better and not too complicated either. Although I
> wonder if seeking a subdirectory won't be much faster if we query less
> information. Maybe that is enough optimization already?
>

If symlinks are not followed, performance can be acceptable.
readdir(3) is much cheaper than stat(2).

For example (target machine = K6-2 550 MHz, 448 MB RAM, samba, folder
of 7500 symlinks, client machine = Core 2 Duo, 100 Mbps ethernet
network):
time ls -f /huge_cifs_folder > /dev/null
-> 0.89 seconds
time ls -l /huge_cifs_folder > /dev/null
-> 14 seconds (even though everything is in server cache)

BTW, NFS is much faster than CIFS. :)

Basically, we want to know whether a directory has regular
sub-directories. Symlink sub-directories are not significant as long
as they are not viewed.
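
A minimal sketch of such a check, assuming a system where readdir(3) fills in d_type (Linux and the BSDs do, POSIX does not require it). The function name, the three-way result and the entry budget are made up for illustration. Symlinks are never followed, and any doubt is reported as "maybe".

```c
#define _DEFAULT_SOURCE   /* exposes d_type and DT_* on glibc */
#include <dirent.h>
#include <string.h>

typedef enum { SUBDIR_NO, SUBDIR_YES, SUBDIR_MAYBE } SubdirHint;

/* Scan at most max_entries entries of path without calling stat(2).
 * Only true directories count; symlinks (DT_LNK) are ignored. */
static SubdirHint has_real_subdir(const char *path, int max_entries)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return SUBDIR_MAYBE;              /* in doubt, keep the expander */

    SubdirHint result = SUBDIR_NO;
    struct dirent *e;
    int seen = 0;

    while ((e = readdir(dir)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (e->d_type == DT_DIR) {        /* found one, stop right away */
            result = SUBDIR_YES;
            break;
        }
        if (e->d_type == DT_UNKNOWN)      /* file system gave no type */
            result = SUBDIR_MAYBE;
        if (++seen >= max_entries) {      /* budget exhausted */
            result = SUBDIR_MAYBE;
            break;
        }
    }
    closedir(dir);
    return result;
}
```

With a budget of one hundred entries, has_real_subdir(path, 100) never costs more than a short readdir(3) loop, at the price of sometimes showing an expander that turns out to be empty.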

POSSIBLE optimization, but hard to make portable: Use the st_nlink
field on the few well-known file systems where its behavior is
reliable. If st_nlink > 2, then, most probably, there's a regular
sub-directory.
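
As a rough sketch of that idea: on traditional UNIX file systems a directory's link count is 2 plus its number of sub-directories, while some file systems (Btrfs, for one) always report 1. The helper name is hypothetical, and the sketch does not check the file system type, which the caller would have to do before trusting the answer.

```c
#include <sys/stat.h>

/* Returns 1 if path probably has a regular sub-directory, 0 if it
 * probably has none, and -1 if the link count says nothing useful. */
static int nlink_subdir_hint(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;               /* missing, or not a true directory */
    if (st.st_nlink > 2)
        return 1;                /* "." + parent entry + sub-directories */
    if (st.st_nlink == 2)
        return 0;
    return -1;                   /* e.g. 1: count not maintained here */
}
```

A result of -1 would simply fall back to scanning the directory.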

Even with readdir(3), the worst case can be very poor: many symlinks
to the same huge CIFS directory in view, or a CIFS directory with many
crossed symlinks. Each one would be scanned independently, so it would
take 0.89 seconds times the number of symlinks. This is because we
cannot assume that the graph of symlinks is a tree. It's an arbitrary
directed graph.
PROPOSED SOLUTION: It may be possible to save info on each scanned
symlink target folder, in order to avoid recomputing whether it has
subfolders.
This doesn't solve the problem of multiple mounts of the same
directory (as can be obtained with mount --bind), or of indirection
through a symlink-unaware system (e.g. Thunar on Linux viewing a CIFS
share of a Linux SAMBA directory shared through a Windows SMB share,
which masks directory symlinks as if they were regular directories),
but these are corner cases that may not be significant.
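
One way the saved info could be keyed, as a sketch only: identify the target folder by its (st_dev, st_ino) pair from stat(2), so that every symlink resolving to the same folder shares one entry. This costs one stat(2) per symlink instead of a full scan. The table size, the names and the overwrite-on-collision policy are all arbitrary choices for the example.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define CACHE_SLOTS 256          /* arbitrary size for the sketch */

typedef struct {
    dev_t dev;
    ino_t ino;
    int   has_subdir;
    int   used;
} TargetEntry;

static TargetEntry target_cache[CACHE_SLOTS];

static TargetEntry *cache_slot(dev_t dev, ino_t ino)
{
    size_t i = ((size_t) dev * 31u + (size_t) ino) % CACHE_SLOTS;
    return &target_cache[i];
}

/* Returns 1 and fills *has_subdir on a hit, 0 on a miss. */
static int cache_lookup(const char *path, int *has_subdir)
{
    struct stat st;

    if (stat(path, &st) != 0)    /* stat(2) resolves the symlink */
        return 0;
    TargetEntry *e = cache_slot(st.st_dev, st.st_ino);
    if (!e->used || e->dev != st.st_dev || e->ino != st.st_ino)
        return 0;
    *has_subdir = e->has_subdir;
    return 1;
}

/* Remember the scan result for the folder that path resolves to. */
static void cache_store(const char *path, int has_subdir)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return;
    TargetEntry *e = cache_slot(st.st_dev, st.st_ino);
    e->dev = st.st_dev;          /* a colliding entry is overwritten */
    e->ino = st.st_ino;
    e->has_subdir = has_subdir;
    e->used = 1;
}
```

Entries would also have to be dropped when the target folder changes, which the sketch ignores.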

Anyway, I wouldn't like Thunar to be one of those applications that
slow down the system's CPU, disk or network for extended periods of
time just because they are running, even without any user
interaction.

>> I may also provide a user preference to balance between performances
>> and correctness. At the highest level of correctness, it would behave
>> as the old Thunar (although twice as fast because, a "bug" in the
>> folder listing function would make everything listed/stat-ed twice).
>
> Please don't. No option for technical feature sets like this.
>

Okay. It's always better to let the machine make this decision... The
computer can find its own balance. For example, whether symlinks are
followed could depend on the number of symlinks or on a time limit.

>> 6) I fixed a few performance bugs. For example, when viewing a
>> directory, it was sorted with a O(n^2) algorithm because the dir was
>> initially listed as empty, and files, after having been listed in a
>> background job, were seen as dynamically added files.
>
> That itself doesn't imply O(n²), does it? My guess would be that it
> depends on how you do the online sorting.
>

Yes, it could be made O(n*log(n)) by sorting the list of new files and
then merging it into the already-sorted list with a linear sorted-list
merge, but it was done through iterative insertions.
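
The sort-then-merge approach can be sketched as follows, with plain strings standing in for files and strcmp for the real comparison function. merge_batch is a made-up name, not something from Thunar.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Merge the already-sorted array old with the unsorted array batch.
 * batch is sorted in place first: O(m log m) for the m new names,
 * followed by one linear pass over both arrays.
 * Returns a malloc'd array of n_old + n_new pointers, or NULL. */
static const char **merge_batch(const char **old, size_t n_old,
                                const char **batch, size_t n_new)
{
    const char **out = malloc((n_old + n_new) * sizeof *out);
    size_t i = 0, j = 0, k = 0;

    if (out == NULL)
        return NULL;
    qsort(batch, n_new, sizeof *batch, cmp_names);

    while (i < n_old && j < n_new)
        out[k++] = (strcmp(old[i], batch[j]) <= 0) ? old[i++] : batch[j++];
    while (i < n_old)
        out[k++] = old[i++];
    while (j < n_new)
        out[k++] = batch[j++];
    return out;
}
```

When the folder starts out empty and the whole listing arrives as one batch, this reduces to a single qsort call.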

This raises a question about events that create new files in unsorted
folders (e.g. new files created by a shell script and reported via
GAMIN).
Symlinks that "become" directories are currently not re-sorted (not
moved), and so make any sorted insertion algorithm non-functional in
one way or another.
Moreover, this orthogonal idea of keeping the folder always sorted is
not so nice in the real world. The Windows Explorer behavior of
putting new files at the end of the folder makes it much easier to
keep track of the file you just created with a separate command line
tool.

-- 
André Gillibert

