[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match

thunar-workers at xfce.org thunar-workers at xfce.org
Wed Mar 9 00:24:06 CET 2005


A page in your DokuWiki was added or changed. Here are the details:

Date        : 2005/03/08 23:24
Browser     : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050122 Firefox/1.0
IP-Address  : 217.85.185.87
Hostname    : pD955B957.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1110312986
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary: 
User        : benny

@@ -37,10 +37,8 @@
  | Jasper Huijsmans                | 15.1931                 |
  | Jens Luedicke                   | 10.9706                 |
  | Yves-Alexis Perez (Home folder) | 16.0649                 |
  | Yves-Alexis Perez (MP3 folder)  | 34.6277                 |
- 
- 
  
  ==== Types of Patterns ====
  The Shared MIME Database specification mentions three different types of
  patterns ([[implementation:mime-glob-match#Literal_Patterns|Literal
@@ -116,28 +114,50 @@
  :!: More research required.
  
  
  === Simple Patterns ===
- This is the most common case of a pattern (**400** simple patterns in the
- ''shared-mime-info-0.15'' package), and it should therefore
- be optimized the most. When loading the glob database, the __Simple__
- patterns (or atleast a few of them) contained within should be compiled
- into some form that can be easily used for pattern matching. For example,
- all simple patterns that look like ''*ext'' (where ''ext'' has 4 or
- less bytes**!!**) can be compiled into a 32bit integer (using two
- ints, one to store the case-sensitive version and one to store
- the case-insensitive (lower-cased) version), just like the Filer
- example below does. This would turn a byte-wise string comparison
- into a simple integer comparison. We could even use some
- machine-specific optimizations here.
+ This is the most common case of a pattern in the Shared MIME Database
+ (version ''0.15'' has about 400 Simple Patterns), and it is basicly
+ a pattern in the form ''*.ext'', where ''ext'' does not contain any
+ of the wildcards mentioned above (unlikely with earlier attempts, a
+ Simple Pattern starts with ''*.'' instead of just ''*''). Due to the
+ huge number of Simple Patterns we'll first concentrate on reducing
+ the number of required comparisons - for both the case where a pattern
+ matches and the case where no pattern matches - and afterwards
+ concentrate on improving the comparison speed (if necessary).
  
- Since the number of simple patterns is that huge, it may be worth to
- investigate in reducing the number of comparisons instead of trying
- to speed up the comparisons. There are several ways to achieve this
- goal:
+ Since we reduced the Simple Patterns to patterns in the form ''*.ext'',
+ our implementation will first search for dots in the input file name.
+ If the input file name does not contain any dots, none of the Simple
+ Patterns can match and we're done with the Simple Patterns. Else, if
+ the file name contains one or more dots, we'll perform a tree lookup
+ for every substring that follows a dot, until we find a match or reach
+ the end of the file name. The following simple pseudo-code illustrates this
+ procedure:
  
-   * An n-ary search tree
-   * Some kind of hash lookup
+ <code c>
+ for (f = filename; *f != '\0';)
+   {
+     if (*f++ != '.')
+       continue;
+ 
+     /* perform the tree lookup on the
+      * substring f ...
+      */
+   }
+ </code>
+ 
+ Now, let's see to this obscure //tree lookup//. Most probably the best
+ way to match an extension-based file name pattern to a file name is to
+ use a n-ary tree and perform a lookup character by character. 
+ 
+ The following figure illustrates the //search tree// built from the
+ 7 Simple Patterns ''*.C'', ''*.htm'', ''*.html'', ''*.xm'', ''*.xml'',
+ ''*.xsl'' and ''*.xslt''.
+ 
+ {{implementation:mime:20050308-simple_patterns_tree_lookup.png}}
+ 
+ FIXME: Put everything online concerning this tree idea.
  
  While reducing the number of comparisons, it's important to keep an
  eye on data locality; else, if we stall the CPU 50 times waiting
  for main memory, we'll surely experience a huge performance problem
@@ -156,12 +176,8 @@
  
  (plus the magic rules > 80 checks first). Performing these actions
  on 3000 files will take time; it's therefore important to optimize
  the Glob Pattern Matching as much as possible.
- 
- Possible n-ary tree based design:
- 
- {{implementation:mime:20050308-simple_patterns_tree_lookup.png}}
  
  :!: More research required!
  
  



-- 
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/



More information about the Thunar-workers mailing list