[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match
thunar-workers at xfce.org
thunar-workers at xfce.org
Sat Feb 19 19:53:49 CET 2005
A page in your DokuWiki was added or changed. Here are the details:
Date : 2005/02/19 18:53
Browser : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050122 Firefox/1.0
IP-Address : 217.85.191.78
Hostname : pD955BF4E.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1107905661
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary:
User : benny
@@ -1,5 +1,117 @@
====== Implementation - MIME Glob Match ======
+ The **MIME Glob Match** implementation should follow the suggestions from the
+ [[requirements:standards#shared_mime-info_database|Shared MIME Database
+ specification]] as closely as possible (unlike other projects)
+ to make sure the user will be presented with usable results (although the
+ user won't know where these results come from, since the MIME System will
+ be hidden from the UI).
+
+
+ ===== Ideas for the new Implementation =====
+ The implementation should be very fast, since the MIME type detection is
+ done on every single file, that is touched by the file manager (and can
+ thereby be considered a //common operation//).
+
+
+ ==== Steps ====
+ The required steps for the Glob Match detection are as follows
+
+ - Check __Literal__ patterns (case sensitive)
+ - Check __Simple__ patterns (case sensitive)
+ - Check __Complex__ patterns (case sensitive)
+ - Check __Literal__ patterns (case insensitive)
+ - Check __Simple__ patterns (case insensitive)
+ - Check __Complex__ patterns (case insensitive)
+
+ where a __Literal__ pattern is a pattern that does not include any
+ wildcards (such as ''*?[''), __Simple__ patterns are patterns that
+ start with a ''*'' and do not include any other wildcards (e.g.
+ ''*.txt'' or ''*.ps''), and __Complex__ patterns are all other
+ patterns. It is obvious, that __Simple__ patterns are the most common
+ patterns in the ''glob'' database, and should therefore be optimized
+ the most.
+
+ In general, the pattern data (as loaded from the ''glob'' database)
+ should be stored //cache-friendly// way (read the
+ [[infrastructure:related-links#alphasort|article about AlphaSort]] for
+ an explanation). This is especially important for the __Simple__
+ patterns. So instead of using singly-linked lists, like in the
+ filer sample implementation below, Thunar should store all patterns
+ in arrays (probably one array per pattern type), and make sure that
+ the array items are properly aligned in memory.
+
+
+ === Literal patterns ===
+ The __Literal__ patterns, like ''Makefile'' or
+ ''README'' can easily be checked using string comparisons. To further
+ speed up the //__Literal__ patterns//-steps, the patterns should be
+ sorted by length, where the longest patterns should appear first, and
+ in addition to the string value, the lookup database should also store
+ the length of the string.
+
+ When comparing the filename to a literal pattern entry, the code
+ should first check if the lengths match, then if both are of the
+ same length, it should perform a string comparison. If the code
+ detects that the filename length is less than the length of the
+ current literal pattern, then it should stop the lookup and
+ continue with the next step. This optimization is based on the
+ proposition that the literal patterns are sorted by their string
+ length (longest first), and so no other literal can match the
+ given filename, because all other literals are shorter than the
+ filename.
+
+ The following pseudo code demonstrate this step:
+
+ <code c>
+ for (all literal patterns)
+ {
+ if (length (literal) < length (filename))
+ break;
+ else if (length (literal) == length (filename))
+ {
+ if (literal == filename)
+ return mime type of literal;
+ }
+ }
+ </code>
+
+
+
+ === Simple Patterns ===
+ This is the most common case of a pattern, and it should therefore
+ be optimized the most. When loading the glob database, the __Simple__
+ patterns (or atleast a few of them) contained within should be compiled
+ into some form that can be easily used for pattern matching. For example,
+ all simple patterns that look like ''*ext'' (where ''ext'' has 4 or
+ less bytes**!!**) can be compiled into a 32bit integer (using two
+ ints, one to store the case-sensitive version and one to store
+ the case-insensitive (lower-cased) version), just like the Filer
+ example below does. This would turn a byte-wise string comparison
+ into a simple integer comparison. We could even use some
+ machine-specific optimizations here.
+
+ :!: More research required!
+
+
+
+ === Complex Patterns ===
+ This is the least common pattern. It needs to be checked using
+ ''fnmatch()'' (the glob-style pattern matching in GLib does not
+ support ''[...]'' character ranges). Due to the nature of the
+ patterns there's not much that can be done here to optimize the
+ pattern matching, and since its the least common case, it may
+ not be worth to spent much effort on this.
+
+ :!: For filenames that would match only in steps 3-6 (e.g.
+ ''IMAGE.GIF'', which will normally match the simple pattern
+ ''*.gif'' in step 4), all the complex patterns will need to
+ be tested first, so it may actually be a good idea to waste
+ some more thoughts about this.
+
+
+
+ ===== Old Implementation (from Filer) =====
A possible implementation of the MIME-type detection //Glob match step//.
This implementation is not well tested, and it's basicly just a testcase.
It includes the **integer comparison**-optimization for the common case
@@ -228,8 +340,9 @@
gboolean case_sensitive)
{
for (; pattern != NULL; pattern = pattern->next)
if (g_pattern_match (pattern->pspec, name_length, name, name_reversed))
+
return pattern->type;
return NULL;
}
--
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/
More information about the Thunar-workers
mailing list