[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match
thunar-workers at xfce.org
Sun Feb 20 00:07:39 CET 2005
A page in your DokuWiki was added or changed. Here are the details:
Date : 2005/02/19 23:07
Browser : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050122 Firefox/1.0
IP-Address : 217.85.191.78
Hostname : pD955BF4E.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1108839229
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary:
User : benny
@@ -73,8 +73,65 @@
return mime type of literal;
}
}
</code>
+
+ As mentioned earlier, the storage should be an array of Pattern
+ structures. The ''PatternLiteral'' structure should look like
+ this:
+
+ <code c>
+ struct PatternLiteral
+ {
+ union
+ {
+ gchar *ptr;
+ gchar buf[8];
+ } value;
+
+ gsize length;
+
+ const gchar *type;
+ };
+ </code>
+
+ For the ia32 architecture (and most other 32bit architectures), this
+ will most probably result in a 16 byte struct (if not, your compiler
+ is on crack!), which divides evenly into the cacheline size of
+ practically every ia32 based CPU (16, 32 or 64 bytes). In addition,
+ care must be taken that the allocated array of structs is also aligned
+ on a 16 byte boundary, so that no struct straddles two cachelines (IIRC,
+ this is default on 32bit BSDs already; Glibc?). To further speed up
+ things, the implementation can use prefetching for CPUs that support it.
+ The size of buf should probably be detected at configure time somehow,
+ so other architectures can benefit from this optimization as well.
+
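A rough sketch of the alignment requirement, not the actual Thunar code: it uses plain ''char''/''size_t'' in place of ''gchar''/''gsize'' so it builds without GLib, and ''pattern_literal_array_new()'' is a hypothetical helper name. It assumes a POSIX system that provides ''posix_memalign()''.

```c
#define _POSIX_C_SOURCE 200112L   /* makes posix_memalign() visible */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as PatternLiteral, with char/size_t standing in for
 * gchar/gsize. On ia32 this is 8 + 4 + 4 = 16 bytes; on typical
 * 64bit targets it grows to 8 + 8 + 8 = 24 bytes. */
struct PatternLiteral
{
  union
  {
    char *ptr;
    char  buf[8];
  } value;

  size_t length;

  const char *type;
};

/* Hypothetical helper: allocate an array of n structs whose first
 * element starts on a 16 byte boundary. Returns NULL on failure.
 * Release the result with free(). */
static struct PatternLiteral *
pattern_literal_array_new (size_t n)
{
  void *mem = NULL;

  /* refuse sizes whose byte count would overflow size_t */
  if (n > SIZE_MAX / sizeof (struct PatternLiteral))
    return NULL;

  if (posix_memalign (&mem, 16, n * sizeof (struct PatternLiteral)) != 0)
    return NULL;

  return mem;
}
```

With 16 byte structs and a 16 byte aligned base, every element of the array starts on a 16 byte boundary as well; on targets where the struct is larger, the size of ''buf'' would have to be adjusted first, as noted above.
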
+ As an explanation for the struct: ''length'' holds the length
+ of the literal in bytes, and ''type'' is a pointer to the MIME-type
+ which is located in the hashtable somewhere in the ''ThunarMimeDatabase''.
+ The interesting optimization here is the storage of the literal
+ string itself; if the literal string is no more than 8 bytes in
+ size, it is //embedded// into the ''PatternLiteral'' memory, and
+ the string is stored outside only if the literal is 9 or more bytes
+ in size. A quick ''grep'' on the existing ''glob'' files shows
+ that all currently existing literal patterns are shorter than
+ 9 bytes, so our optimization works; excellent **data
+ locality**!
+
+ If some literal patterns are 9 or more bytes, they should be collected
+ and stored together in a single string chunk to avoid polluting
+ the cache with random memory accesses.
+
+ Since the sizes of the literal pattern and the input filename are known
+ and compared first, the string comparison could be done using
+ a plain ''memcmp()'' instead of using a real string comparison. But
+ this will only work if both strings are normalized first (remember,
+ this is UTF-8!).
+
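The embedding and the length-first comparison could be sketched roughly as follows. This is only an illustration under a few assumptions: plain ''char''/''size_t'' replace ''gchar''/''gsize'', the helper names ''pattern_literal_init()'' and ''pattern_literal_match()'' are made up, long literals are simply ''malloc()''ed instead of being collected in a string chunk, and both strings are assumed to be normalized already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LITERAL_EMBED_MAX 8   /* size of buf in PatternLiteral */

struct PatternLiteral
{
  union
  {
    char *ptr;                     /* used if length > LITERAL_EMBED_MAX */
    char  buf[LITERAL_EMBED_MAX];  /* used otherwise, not NUL-terminated */
  } value;

  size_t length;

  const char *type;
};

/* Hypothetical helper: store a literal, embedding it if it fits into
 * buf. Returns 0 on success and -1 if the allocation fails. */
static int
pattern_literal_init (struct PatternLiteral *pl,
                      const char            *literal,
                      const char            *type)
{
  assert (pl != NULL && literal != NULL);

  memset (&pl->value, 0, sizeof (pl->value));
  pl->length = strlen (literal);
  pl->type = type;

  if (pl->length <= LITERAL_EMBED_MAX)
    {
      memcpy (pl->value.buf, literal, pl->length);
    }
  else
    {
      /* a real implementation would use one shared string chunk here */
      pl->value.ptr = malloc (pl->length);
      if (pl->value.ptr == NULL)
        return -1;
      memcpy (pl->value.ptr, literal, pl->length);
    }

  return 0;
}

/* Hypothetical helper: return the MIME type if filename equals the
 * literal, NULL otherwise. The lengths are compared first, so
 * memcmp() only runs on strings of equal size. */
static const char *
pattern_literal_match (const struct PatternLiteral *pl,
                       const char                  *filename,
                       size_t                       filename_length)
{
  const char *literal;

  if (pl->length != filename_length)
    return NULL;

  literal = (pl->length <= LITERAL_EMBED_MAX) ? pl->value.buf : pl->value.ptr;

  return (memcmp (literal, filename, filename_length) == 0) ? pl->type : NULL;
}
```

For example, after ''pattern_literal_init (&pl, "Makefile", "text/x-makefile")'' (an illustrative pattern/type pair), the 8 byte literal lives entirely inside the struct, and ''pattern_literal_match (&pl, "makefile", 8)'' returns NULL because the byte-wise comparison is case-sensitive.
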
+ The above optimizations should do the job, especially since, as
+ mentioned already, the literal pattern is not the common case.
+
+ Open issues:
+ * What about **case-insensitive** comparisons!?
=== Simple Patterns ===
@@ -392,8 +449,9 @@
gboolean case_sensitive;
gchar *rname;
gsize length;
gint n;
+
length = strlen (filename);
rname = g_utf8_strreverse (name, -1);
--
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/