[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match

Sun Mar 27 22:26:29 CEST 2005

A page in your DokuWiki was added or changed. Here are the details:

Date        : 2005/03/27 20:26
Browser     : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.6) Gecko/20050313 Firefox/1.0.1
IP-Address  : 217.85.190.105
Hostname    : pD955BE69.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1110904629
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary: fix description of the Literal Pattern handling
User        : benny

@@ -85,15 +85,45 @@
  
  {{implementation:mime:20050306-literal_patterns_list.png}}
  
  The literal patterns are arranged in a list, like shown on the picture
- above. With such a sorted list, a possible implementation of the pattern
+ above. A literal pattern is represented by a C structure, like this:
+ 
+ <code c>
+ struct LiteralPattern
+ {
+   LiteralPattern *next;
+   MimeType       *mime_type; // this could also point to the mime type name
+   char            pattern[1]; // use variable-length array if available
+ };
+ </code>
+ 
+ The literal patterns should be stored in one memory chunk (for the reasons
+ stated above); on ia32, for each literal pattern, there'll be two 4 byte
+ pointers and the pattern string (zero-terminated), where the string should
+ be padded with zero to the next multiple of 4. The memory chunk should have
+ a base alignment of 16 byte (''mmap()'' does this and ''malloc()'' on BSD and
+ Solaris, dunno about glibc). For example, if you have two literal patterns
+ - ''Makefile'' and ''core'' - the memory layout will look like this:
+ 
+ {{implementation:mime:20050327-literal_patterns_memory.png}}
+ 
+ This approach consumes about ''168'' bytes of memory for ''shared-mime-info-0.15''
+ on ia32.
+ 
+ Other platforms may benefit from a different memory-layout. The best layout
+ should be determined at configure time (this would probably break cross-compiling).
+ 
+ With such a sorted list, a possible implementation of the pattern
  matching function will look like this:
  
  <code c>
- for (pattern in literal patterns)
+ for (LiteralPattern *lp = literal_patterns; lp != NULL; lp = lp->next)
    {
-     for (f = filename, p = pattern; *f != '\0'; ++f, ++p)
+     const gchar *f;
+     const gchar *p;
+ 
+     for (f = filename, p = lp->pattern; *f != '\0'; ++f, ++p)
        {
          if (*p == '\0')
            goto no_match;
          else if (*p != *f)
@@ -110,35 +140,8 @@
  version ''0.15'') has only 8 characters, this implementation
  will stop with ''no_match'' after the first comparison in many
  cases, and we won't waste CPU cycles while comparing patterns
  that cannot match.
- 
- The //list view// above is just to illustrate the concept. The
- implementation should use an array to handle the patterns. Since
- the pattern strings are of variable length, we cannot simple use
- a C array here; but instead we'll use a large memory chunk (which
- will also include the Simple Patterns) to reduce **heap fragmentation**
- and increase **data locality**, and store the pattern data depending
- on the platform (data alignment!).
- 
- FIXME From the top of my head (ia32): ''malloc()'' or ''mmap()'' a
- block of memory (together with the other pattern types) with a base
- alignment of 16 byte (''mmap()'' does this and ''malloc()'' on BSD and
- Solaris, dunno about glibc); the size of the memory block can easily
- be calculated from the parsed ''globs'' data. For each literal pattern,
- have a pointer to the next or ''NULL'' for the last element, a pointer
- to the MIME type (or something else that identifies a MIME type uniquely,
- like an index into the MIME type table) and the pattern string data. On ia32:
- The former two are 4 byte each. The size of the pattern string data
- should be rounded up to the next multiple of 4 then. While the focus is
- on ia32 currently (as that's the common case), the implementation should
- at best automatically figure a good value for other platforms here (at
- best w/o depending on autoconf checks, which would break cross-compiling).
- 
- FIXME Calculate the size required for this approach in the Shared MIME
- Database analysis tool, and try to evaluate worst-case scenarios.
- 
- :!: More research required.
  
  
  === Simple Patterns ===
  This is the most common case of a pattern in the Shared MIME Database



-- 
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/