[Thunar-workers] [DokuWiki] page changed: implementation:mime-glob-match
thunar-workers at xfce.org
thunar-workers at xfce.org
Sun Mar 27 22:26:29 CEST 2005
A page in your DokuWiki was added or changed. Here are the details:
Date : 2005/03/27 20:26
Browser : Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.6) Gecko/20050313 Firefox/1.0.1
IP-Address : 217.85.190.105
Hostname : pD955BE69.dip.t-dialin.net
Old Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match?rev=1110904629
New Revision: http://thunar.xfce.org/wiki/implementation:mime-glob-match
Edit Summary: fix description of the Literal Pattern handling
User : benny
@@ -85,15 +85,45 @@
{{implementation:mime:20050306-literal_patterns_list.png}}
The literal patterns are arranged in a list, like shown on the picture
- above. With such a sorted list, a possible implementation of the pattern
+ above. A literal pattern is represented by a C structure, like this:
+
+ <code c>
+ struct LiteralPattern
+ {
+ LiteralPattern *next;
+ MimeType *mime_type; // this could also point to the mime type name
+ char pattern[1]; // use variable-length array if available
+ };
+ </code>
+
+ The literal patterns should be stored in one memory chunk (for the reasons
+ stated above); on ia32, for each literal pattern, there'll be two 4 byte
+ pointers and the pattern string (zero-terminated), where the string should
+ be padded with zero to the next multiple of 4. The memory chunk should have
+ a base alignment of 16 byte (''mmap()'' does this and ''malloc()'' on BSD and
+ Solaris, dunno about glibc). For example, if you have two literal patterns
+ - ''Makefile'' and ''core'' - the memory layout will look like this:
+
+ {{implementation:mime:20050327-literal_patterns_memory.png}}
+
+ This approach consumes about ''168'' bytes of memory for ''shared-mime-info-0.15''
+ on ia32.
+
+ Other platforms may benefit from a different memory-layout. The best layout
+ should be determined at configure time (this would probably break cross-compiling).
+
+ With such a sorted list, a possible implementation of the pattern
matching function will look like this:
<code c>
- for (pattern in literal patterns)
+ for (LiteralPattern *lp = literal_patterns; lp != NULL; lp = lp->next)
{
- for (f = filename, p = pattern; *f != '\0'; ++f, ++p)
+ const gchar *f;
+ const gchar *p;
+
+ for (f = filename, p = lp->pattern; *f != '\0'; ++f, ++p)
{
if (*p == '\0')
goto no_match;
else if (*p != *f)
@@ -110,35 +140,8 @@
version ''0.15'') has only 8 characters, this implementation
will stop with ''no_match'' after the first comparison in many
cases, and we won't waste CPU cycles while comparing patterns
that cannot match.
-
- The //list view// above is just to illustrate the concept. The
- implementation should use an array to handle the patterns. Since
- the pattern strings are of variable length, we cannot simple use
- a C array here; but instead we'll use a large memory chunk (which
- will also include the Simple Patterns) to reduce **heap fragmentation**
- and increase **data locality**, and store the pattern data depending
- on the platform (data alignment!).
-
- FIXME From the top of my head (ia32): ''malloc()'' or ''mmap()'' a
- block of memory (together with the other pattern types) with a base
- alignment of 16 byte (''mmap()'' does this and ''malloc()'' on BSD and
- Solaris, dunno about glibc); the size of the memory block can easily
- be calculated from the parsed ''globs'' data. For each literal pattern,
- have a pointer to the next or ''NULL'' for the last element, a pointer
- to the MIME type (or something else that identifies a MIME type uniquely,
- like an index into the MIME type table) and the pattern string data. On ia32:
- The former two are 4 byte each. The size of the pattern string data
- should be rounded up to the next multiple of 4 then. While the focus is
- on ia32 currently (as that's the common case), the implementation should
- at best automatically figure a good value for other platforms here (at
- best w/o depending on autoconf checks, which would break cross-compiling).
-
- FIXME Calculate the size required for this approach in the Shared MIME
- Database analysis tool, and try to evaluate worst-case scenarios.
-
- :!: More research required.
=== Simple Patterns ===
This is the most common case of a pattern in the Shared MIME Database
--
This mail was generated by DokuWiki at
http://thunar.xfce.org/wiki/
More information about the Thunar-workers
mailing list