summaryrefslogtreecommitdiff
path: root/posts/2014-11-20-opening-a-file.org
diff options
context:
space:
mode:
authorFranck Cuny <franckcuny@gmail.com>2016-08-04 11:12:37 -0700
committerFranck Cuny <franckcuny@gmail.com>2016-08-04 11:12:37 -0700
commit2d2a43f200b88627253f2906fbae87cef7c1e8ce (patch)
treec65377350d12bd1e62e0bdd58458c1044541c27b /posts/2014-11-20-opening-a-file.org
parentUse Bullet list for the index. (diff)
downloadlumberjaph-2d2a43f200b88627253f2906fbae87cef7c1e8ce.tar.gz
Mass convert all posts from markdown to org.
Diffstat (limited to 'posts/2014-11-20-opening-a-file.org')
-rw-r--r--posts/2014-11-20-opening-a-file.org176
1 files changed, 176 insertions, 0 deletions
diff --git a/posts/2014-11-20-opening-a-file.org b/posts/2014-11-20-opening-a-file.org
new file mode 100644
index 0000000..63e4f6a
--- /dev/null
+++ b/posts/2014-11-20-opening-a-file.org
@@ -0,0 +1,176 @@
+A very common task for a programmer is to open a file. This seems to be
+a trivial operation, and we don't think twice about it. But what is
+really happening when we're opening that file ?
+
+** A simple C program
+
+For this exercise, I'm going to use this very simple C program:
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/d208c34a0b8397f3e4ca.js"></script>
+#+END_HTML
+
+The code does the following things:
+
+- opens a file in read-only mode
+- checks that we got a file descriptor
+- if we don't have the file descriptor, we print an error and exit
+- we close the file descriptor
+- we exit
+
+This is really simple and not much is going on, right ? Let's take a
+better look at it.
+
+The =fopen()= function that we use is provided by the libc. It's
+documentation is pretty straight forward (=man 3 fopen=): /"The fopen()
+function opens the file whose name is the string pointed to by path and
+associates a stream with it."/.
+
+** Run the program
+
+We're going to compile the source code first, so we can run the program:
+
+#+BEGIN_EXAMPLE
+ gcc -o test test.c
+#+END_EXAMPLE
+
+** Overview
+
+First I want to have an overview of the execution of this program. For
+this we will use =strace=.
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/7b9b9ab4fdccab364674.js"></script>
+#+END_HTML
+
+We can ignore most of that output, only the last few lines interest us.
+We can see two functions related to the code we wrote:
+
+- a call to open, with */etc/issue* as the first argument
+- a call to close, again, with 3 as the first argument
+
+The first function is the system call =open()=, and we see that it
+returns 3, which is our file descriptor. When =close()= is called, it's
+only argument is again 3, which is the file descriptor returned by
+=open()=, and then we exit.
+
+** Deeper
+
+Now let's invoke the program with gdb:
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/5ab16ac3a075200aafa1.js"></script>
+#+END_HTML
+
+We can see the calls (the =callq= instructions) to our three functions:
+=fopen()=, =perror()= and =fclose()=, but we want to take a look at what
+exactly is behind this functions. Let's try to dig the =fopen=
+instruction a little bit more (I've removed all the lines that are not
+the =callq= instructions):
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/1d7883696306611e9bd3.js"></script>
+#+END_HTML
+
+OK, so here we can see that we're calling the function
+=_IO_new_file_fopen()=.
+
+** libc
+
+In our program, we're using functions provided by the libc. We're going
+to take a look at =_IO_new_file_fopen=, and we can read the source
+[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L252][here]].
+
+Most of the function is to set a bunch of flags, and then the next call
+we're interested in is
+[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L335][=_IO_file_open=]].
+The function is defined
+[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L217][here]].
+As you can see, here we end up calling =open()=.
+
+** system call
+
+The =open()= function is one of the linux system calls. If we look at
+[[http://lxr.free-electrons.com/source/include/linux/syscalls.h][the
+list of syscalls]], we can see that it is mapped to
+[[http://lxr.free-electrons.com/source/include/linux/syscalls.h#L512][=sys_open=]].
+
+The function is defined in
+[[http://lxr.free-electrons.com/source/fs/open.c#L992][fs/open.c]], and
+do a call to
+[[http://lxr.free-electrons.com/source/fs/open.c#L964][do\_sys\_open]].
+
+The interesting part of the function starts with the call to
+=get_unused_fd_flags()=, where we get a file descriptor. Then we do the
+call to =do_filp_open()=, where we end up (via more functions call):
+
+- geetting a file struct
+- find the inode
+- populate the file struct
+
+To finish, we do a call to =fsnotify()=, which will notify the watchers
+on this file, and add the file descriptor with the other struct files.
+
+** inodes
+
+To open a file, you need to locate it on the disk. A file is associated
+with an inode, which contains meta data about your file, and they are
+stored on your disk. When you want to reach a file, the kernel will find
+the inode and from that the location on the disk. You can read more
+about inodes on [[https://en.wikipedia.org/wiki/Inode][wikipedia]], and
+this [[https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout][great
+page about ext4]].
+
+You can run =man 1 stat= in your shell on the file to see the
+information we can find.
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/0104bdea0e515f809ad4.js"></script>
+#+END_HTML
+
+An inode is a data structure to represent an object on the filesystem.
+If you look at the previous output, you can see information like the
+size, the number of blocks, how many references exists to this file
+(links), etc.
+
+Here, we can see that the inode is 679618. Now let's take a look with
+the FS debugger:
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/016e6fc5be47a1fd4b4b.js"></script>
+#+END_HTML
+
+There's many cools things you can do with inode, like using =man 1 find=
+to find a file based on it's inode instead of file name.
+
+** Deeper!
+
+Valgrind is another amazing tool to do analysis of a program. Let's
+recompile our binary with the =-g= option, to embed debugging
+information in our binary:
+
+#+BEGIN_EXAMPLE
+ gcc -g -o test test.c
+#+END_EXAMPLE
+
+=valgrind= has an option =--tool= to use specific tool. Let's run
+valgrind with the *callgrind* tool, followed by =callgrind_annotate= to
+get a more readable output:
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/313fb41e150dfb28a2f7.js"></script>
+#+END_HTML
+
+With the =--cache-sim=yes= option, we count all the instructions for
+read access, cache misses, etc. Another nifty tool is *cachegrind*,
+which shows the cache misses for different level of caches.
+
+#+BEGIN_HTML
+ <script src="https://gist.github.com/franckcuny/71c1ae266b26aa8bf6e1.js"></script>
+#+END_HTML
+
+** The end
+
+As you can see, using various tools (and there's more tools available!),
+you can see that opening a file involves a lot of operations behind the
+scene.