[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match

Sun Feb 20 00:07:39 CET 2005

A page in your DokuWiki was added or changed. Here are the details:

Date        : 2005/02/19 23:07
Browser     : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050122 Firefox/1.0
IP-Address  : 217.85.191.78
Hostname    : pD955BF4E.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1108839229
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary: 
User        : benny

@@ -73,8 +73,65 @@
            return mime type of literal;
        }
    }
  </code>
+ 
+ As mentioned earlier, the storage should be an array of Pattern
+ structures. The ''PatternLiteral'' structure should look like
+ this:
+ 
+ <code c>
+ struct PatternLiteral
+ {
+   union
+   {
+     gchar *ptr;
+     gchar  buf[8];
+   } value;
+ 
+   gsize length;
+ 
+   const gchar *type;
+ };
+ </code>
+ 
+ For the ia32 architecture (and most other 32bit architectures), this
+ will most probably result in a 16 byte struct (if not, your compiler
+ is on crack!), which divides evenly into the cacheline of almost every
+ (all?) ia32 based CPU (typically 32 or 64 bytes). In addition, care
+ must be taken that the allocated array of structs is also aligned on a
+ 16 byte boundary, so that no struct straddles two cachelines (IIRC, this
+ is default on 32bit BSDs already; Glibc?). To further speed up things,
+ the implementation can use prefetching for CPUs that support it. The
+ size of buf should probably be detected at configure time somehow,
+ so other architectures can benefit from this optimization as well.
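+ 
+ A minimal sketch of the aligned allocation described above, assuming a
+ POSIX system. It swaps the GLib types for plain C ones to stay
+ self-contained; ''LITERAL_ALIGNMENT'' and ''pattern_literal_array_new()''
+ are hypothetical names, not part of Thunar.
+ 

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed boundary, taken from the 16 byte figure in the text. */
#define LITERAL_ALIGNMENT 16

/* Plain C stand-in for the wiki's struct (gchar -> char, gsize -> size_t). */
struct PatternLiteral
{
  union
  {
    char *ptr;
    char  buf[8];
  } value;

  size_t length;

  const char *type;
};

/* Allocate a zeroed array of n literals that starts on a
 * LITERAL_ALIGNMENT boundary. Returns NULL on failure; the result is
 * released with free(). */
struct PatternLiteral *
pattern_literal_array_new (size_t n)
{
  void *mem = NULL;

  assert (n > 0);

  /* Guard against overflow in the size computation below. */
  if (n > SIZE_MAX / sizeof (struct PatternLiteral))
    return NULL;

  if (posix_memalign (&mem, LITERAL_ALIGNMENT,
                      n * sizeof (struct PatternLiteral)) != 0)
    return NULL;

  memset (mem, 0, n * sizeof (struct PatternLiteral));
  return mem;
}
```

+ On a typical 64bit build the same struct is 24 bytes rather than 16,
+ which is the kind of difference a configure time check for the size
+ of ''buf'' would have to account for.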
+ 
+ As an explanation for the struct: ''length'' holds the length
+ of the literal in bytes, and ''type'' is a pointer to the MIME-type
+ which is located in the hashtable somewhere in the ''ThunarMimeDatabase''.
+ The interesting optimization here is the storage of the literal
+ string itself; if the literal string is no more than 8 bytes in
+ size, it is //embedded// into the ''PatternLiteral'' memory, and
+ the string is stored outside only if the literal is 9 or more bytes
+ in size. A quick ''grep'' on the existing ''glob'' files shows
+ that all currently existing literal patterns are less than
+ 9 bytes in size, so our optimization works; excellent **data
+ locality**!
+ 
+ If some literal patterns are 9 or more bytes, they should be collected
+ and stored together in a single string chunk to avoid polluting
+ the cache with random memory accesses.
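+ 
+ The embed-or-spill rule could be sketched roughly as follows. This is
+ only an illustration under assumptions: ''StringChunk'' is a hypothetical
+ hand-rolled bump allocator (GLib's ''GStringChunk'' would be a natural
+ fit, but is avoided here to keep the example dependency-free), and
+ ''pattern_literal_set()'' and ''pattern_literal_bytes()'' are invented
+ names. Embedded literals are stored without a terminating NUL, since
+ ''length'' already records their size.
+ 

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LITERAL_EMBED_MAX 8   /* size of buf in the struct */

/* Plain C stand-in for the wiki's struct (gchar -> char, gsize -> size_t). */
struct PatternLiteral
{
  union
  {
    char *ptr;
    char  buf[LITERAL_EMBED_MAX];
  } value;

  size_t length;

  const char *type;
};

/* Hypothetical shared chunk: long literals are packed back to back
 * into one caller-provided buffer instead of separate allocations. */
struct StringChunk
{
  char  *data;
  size_t used;
  size_t capacity;
};

/* Store text in lit: embedded if it fits into buf, otherwise appended
 * to the shared chunk. Returns 0 on success, -1 if the chunk is full. */
int
pattern_literal_set (struct PatternLiteral *lit,
                     struct StringChunk    *chunk,
                     const char            *text,
                     const char            *type)
{
  size_t length = strlen (text);

  if (length <= LITERAL_EMBED_MAX)
    {
      /* Short literal: lives inside the struct itself. */
      memcpy (lit->value.buf, text, length);
    }
  else
    {
      /* Long literal: lives next to the other long literals. */
      if (length > chunk->capacity - chunk->used)
        return -1;

      lit->value.ptr = chunk->data + chunk->used;
      memcpy (lit->value.ptr, text, length);
      chunk->used += length;
    }

  lit->length = length;
  lit->type = type;
  return 0;
}

/* Return the literal's bytes, wherever they are stored (not NUL-terminated). */
const char *
pattern_literal_bytes (const struct PatternLiteral *lit)
{
  return lit->length <= LITERAL_EMBED_MAX ? lit->value.buf : lit->value.ptr;
}
```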
+ 
+ Since the sizes of the literal pattern and the input filename are
+ known and compared first, the string comparison could be done using
+ a plain ''memcmp()'' instead of using a real string comparison. But
+ this will only work if both strings are normalized first (remember,
+ this is UTF-8!).
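+ 
+ A rough sketch of the length-first comparison, assuming both the
+ stored literal and the filename have already been brought into the
+ same normalized form (for example with GLib's ''g_utf8_normalize()'';
+ that step is not shown). ''pattern_literal_matches()'' is a
+ hypothetical name.
+ 

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LITERAL_EMBED_MAX 8   /* size of buf in the struct */

/* Plain C stand-in for the wiki's struct (gchar -> char, gsize -> size_t). */
struct PatternLiteral
{
  union
  {
    char *ptr;
    char  buf[LITERAL_EMBED_MAX];
  } value;

  size_t length;

  const char *type;
};

/* Return 1 if name (name_length bytes, assumed to be normalized UTF-8)
 * equals the literal byte for byte, 0 otherwise. */
int
pattern_literal_matches (const struct PatternLiteral *lit,
                         const char                  *name,
                         size_t                       name_length)
{
  const char *bytes;

  /* Cheap rejection first: different lengths can never match. */
  if (lit->length != name_length)
    return 0;

  bytes = lit->length <= LITERAL_EMBED_MAX ? lit->value.buf : lit->value.ptr;

  /* Lengths are equal, so a raw byte comparison is sufficient. */
  return memcmp (bytes, name, name_length) == 0;
}
```

+ Note that this comparison is case-sensitive by construction.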
+ 
+ The above optimizations should do the job, especially since, as
+ mentioned already, the literal pattern is not the common case.
+ 
+ Open issues:
+   * What about **case-insensitive** comparisons!?
  
  
  
  === Simple Patterns ===
@@ -392,8 +449,9 @@
    gboolean     case_sensitive;
    gchar       *rname;
    gsize        length;
    gint         n;
+ 
  
    length = strlen (filename);
    rname = g_utf8_strreverse (name, -1);
  



-- 
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/