SimGrid
Frequently Asked Questions
Table of contents
  1. I'm new to SimGrid. I have some questions. Where should I start?
    1. What is the difference between MSG, SimDag, and GRAS? Do they serve the same purpose?
    2. First steps with SimGrid
    3. Visualizing and analyzing the results
    4. Argh! Do I really have to code in C?
  2. Installing the SimGrid library with CMake (since V3.4)
    1. Some generalities
      1. What is CMake?
      2. Why CMake?
      3. What does CMake need?
      4. List of options
      5. Options explanation
      6. Initialisation
      7. The option cache and how to reset it
    2. CMake compilation
      1. With command line.
      2. With ccmake tool.
      3. Build out of source.
      4. Summary of the command line
    3. How to install with cmake?
      1. From svn.
      2. From a distrib
    4. What is installed by cmake?
      1. CMAKE_INSTALL_PREFIX/bin
      2. CMAKE_INSTALL_PREFIX/doc
      3. CMAKE_INSTALL_PREFIX/include
      4. CMAKE_INSTALL_PREFIX/lib
    5. How to modify source files (for developers)
      1. Add an executable or examples.
      2. Delete/add sources to lib.
      3. Add test
    6. Pipol-remote
  3. Installing the SimGrid library with Autotools (valid until V3.3.4)
    1. Compiling SimGrid from a stable archive
    2. Java bindings don't get compiled
    3. SimGrid development snapshots
    4. Compiling SimGrid from the SVN
    5. Setting up your own MSG code
    6. Setting up your own GRAS code
  4. Feature related questions
    1. "Could you please add (your favorite feature here) to SimGrid?"
    2. MSG features
      1. I want some more complex MSG examples!
      2. Missing in action: MSG Task duplication/replication
      3. I want to do asynchronous communications in MSG
      4. I need to synchronize my MSG processes
      5. Where is the get_host_load function hidden in MSG?
      6. How can I get the *real* communication time?
    3. SimDag related questions
      1. Implementing communication delays between tasks.
      2. How to implement a distributed dynamic scheduler of DAGs.
    4. Generic features
      1. Increasing the amount of simulated processes
      2. Is there a native support for batch schedulers in SimGrid?
      3. I need a checkpointing thing
    5. Platform building and Dynamic resources
      1. Where can I find SimGrid platform files?
      2. How can I automatically map an existing platform?
      3. Generating synthetic but realistic platforms
      4. Modeling multi-core resources
      5. Modeling dynamic resource availability
      6. How to express multipath routing in platform files?
      7. Bypassing the XML parser with your own C functions
    6. Changing SimGrid's behavior
      1. Using Fullduplex
      2. Using GTNetS
      3. Using alternative flow models
    7. Tracing Simulations for Visualization
      1. How it works
      2. Enabling using CMake
      3. Tracing Functions
      4. Tracing configuration Options
      5. Example of Instrumentation
      6. Analyzing the SimGrid Traces
    8. Model-Checking
      1. How to use it
    9. Lua Binding
      1. What is lua ?
      2. Why lua ?
      3. How to use lua in Simgrid ?
      4. Master/Slave Example
      5. Exchanging Data
      6. Bypass XML
    10. Ruby Binding
      1. Use Ruby in Simgrid
      2. Master/Slave Ruby Application
      3. Exchanging data
  5. Troubleshooting
    1. SimGrid compilation and installation problems
      1. cmake fails!
      2. Dude! "ctest" fails on my machine!
    2. User code compilation problems
      1. "gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"
      2. "gcc: undefined reference to pthread_key_create"
    3. Runtime error messages
      1. "surf_parse_lex: Assertion `next limit' failed."
      2. GRAS spits networking error messages
      3. I'm told that my XML files are too old.
    4. Valgrind-related and other debugger issues
      1. longjmp madness in valgrind
      2. Valgrind spits tons of errors about backtraces!
      3. Truncated backtraces
    5. There is a deadlock in my code!!!
    6. I get weird timings when I play with the latencies.
    7. So I've found a bug in SimGrid. How to report it?
    8. Cross-compiling a Windows DLL of SimGrid from linux

I'm new to SimGrid. I have some questions. Where should I start?

You are at the right place... Having a look at the slides of the HPCS'10 tutorial (or at these ancient slides, or at these "obsolete" slides) may give you some insight into what SimGrid can help you do and what its limitations are. Then you should definitely read the Examples of MSG. The GRAS Tutorial can also help you.

If you are stuck at any point and if this FAQ cannot help you, please drop us a mail to the user mailing list: <simgrid-user@lists.gforge.inria.fr>.

What is the difference between MSG, SimDag, and GRAS? Do they serve the same purpose?

It depends on how you define "purpose", I guess ;)

They all allow you to build a prototype of your application, which you can then run within the simulator. They all share the same simulation kernel, which is the core of the SimGrid project. They differ in the way you express your application.

With SimDag, you express your code as a collection of interdependent parallel tasks. So, in this model, applications can be seen as a DAG of tasks. This is the interface of choice for people wanting to port old code designed for SimGrid v1 or v2 to the current version of the framework.

With both GRAS and MSG, your application is seen as a set of communicating processes, exchanging data by means of messages and performing computation on their own.

The difference between the two is that MSG is somewhat easier to use, but GRAS is not limited to the simulator. Once you're done writing your GRAS code, you can run it either in the simulator or on a real platform. For this, there are two implementations of the GRAS interface, one for simulation and one for real execution. So, you just have to relink your code to choose one of the two worlds.

First steps with SimGrid

If you decide to go for the MSG interface, please read the Examples of MSG carefully. You'll find in Master/slave application a very simple application consisting of a master (that owns a bunch of tasks and distributes them), some slaves (that process tasks whenever they receive one) and some forwarder agents (that simply pass the tasks they receive to some slaves).

If you decide to go for the GRAS interface, you should definitely read the GRAS Tutorial. The first section is an introduction to the tool and presents the model we use. The second section is a complete step-by-step tutorial building a distributed application from scratch and exemplifying most of the GRAS features in the process. The last section groups some HOWTOs, each highlighting a given feature of the framework in a more concise way.

If you decide to go for another interface, I'm afraid your only sources of information will be the source code and the mailing lists...

Visualizing and analyzing the results

It is sometimes convenient to "see" how the agents are behaving. If you like colors, you can use tools/MSG_visualization/colorize.pl as a filter on your MSG outputs. It works directly with INFO. Beware, INFO() prints on stderr. Do not forget to redirect it if you want to filter (e.g. with bash):

./msg_test small_platform.xml small_deployment.xml 2>&1 | ../../tools/MSG_visualization/colorize.pl

We also have a more graphical output. Have a look at section Tracing Simulations for Visualization.

Argh! Do I really have to code in C?

Up until now, there is no binding for other languages. If you use C++, you should be able to use the SimGrid library as a standard C library and everything should work fine (simply link against this library; recompiling SimGrid with a C++ compiler won't work and it wouldn't help if you could).

In fact, we are currently working on Java bindings of MSG to allow all the undergrad students of the world to use this tool. This is a little trickier than I would have expected, but the work is moving forward quickly [2006/05/13]. More languages are being evaluated, but for now, we do not feel a real demand for any other language. Please speak up!

Installing the SimGrid library with Cmake (since V3.4)

Some generalities

What is CMake?

CMake is a family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files. CMake generates native makefiles and workspaces that can be used in the compiler environment of your choice. For more information, see the official web site here.

Why CMake?

CMake allows developers to compile projects on different platforms. It also embeds many tools, such as ctest for running tests and a link to CDash for visualizing the results, as well as test coverage and bug reports.

What does CMake need?

CMake needs some prerequisites:

For Unix and MacOS:

For Windows :

List of options

"cmake -D[name]=[value] ... ./"

[name]					[value]
	enable_gtnets			ON/OFF or TRUE/FALSE or 1/0
	enable_lua			ON/OFF or TRUE/FALSE or 1/0
	enable_compile_optimizations	ON/OFF or TRUE/FALSE or 1/0
	enable_compile_warnings		ON/OFF or TRUE/FALSE or 1/0
	enable_smpi			ON/OFF or TRUE/FALSE or 1/0
	enable_maintainer_mode		ON/OFF or TRUE/FALSE or 1/0
	enable_supernovae		ON/OFF or TRUE/FALSE or 1/0
	enable_tracing 			ON/OFF or TRUE/FALSE or 1/0
	enable_coverage 		ON/OFF or TRUE/FALSE or 1/0
	enable_memcheck 		ON/OFF or TRUE/FALSE or 1/0
	enable_model-checking		ON/OFF or TRUE/FALSE or 1/0
	enable_debug			ON/OFF or TRUE/FALSE or 1/0
	enable_jedule 	 		ON/OFF or TRUE/FALSE or 1/0
	enable_latency_bound_tracking 	ON/OFF or TRUE/FALSE or 1/0
	enable_lib_static		ON/OFF or TRUE/FALSE or 1/0
	enable_pcre			ON/OFF or TRUE/FALSE or 1/0
	custom_flags 			<flags>
	gtnets_path			<path_to_gtnets_directory>
	CMAKE_INSTALL_PREFIX		<path_to_install_directory>
	pipol_user			<pipol_username>

Options explanation

Initialisation

These options are initialized the first time you launch "cmake ." without specifying any option.

enable_gtnets			on
enable_lua			on
enable_smpi			on
enable_supernovae		on
enable_tracing			on
enable_compile_optimizations	on
enable_debug			on
enable_pcre			on
enable_compile_warnings		off
enable_maintainer_mode		off
enable_coverage 		off
enable_memcheck 		off
enable_model-checking		off
enable_jedule 	 		off
enable_latency_bound_tracking 	off 
enable_lib_static		off
CMAKE_INSTALL_PREFIX		/usr/local
custom_flags			null
gtnets_path			null
pipol_user			null

The option cache and how to reset it

Once options have been set, they are kept in a cache file named "CMakeCache.txt". So, if you want to reset the values, just delete this file, which is located in the project directory (or in your build directory if you build out of source).

CMake compilation

With command line.

cmake -D[name]=[value] ... ./
make

On Windows

cmake -G"Unix Makefiles" -D[name]=[value] ... ./
gmake

With ccmake tool.

"ccmake ./"

Then follow the instructions.

Build out of source.

As cmake generates many files used for the compilation, we recommend using a separate build directory. For example:

"navarrop@caraja:~/Developments$ cd simgrid/"
"navarrop@caraja:~/Developments/simgrid$ mkdir build_directory"
"navarrop@caraja:~/Developments/simgrid$ cd build_directory/"
"navarrop@caraja:~/Developments/simgrid/build_directory$ cmake ../"
"navarrop@caraja:~/Developments/simgrid/build_directory$ make"

Or completely out of the source tree:

"navarrop@caraja:~/Developments$ mkdir build_dir"
"navarrop@caraja:~/Developments$ cd build_dir/"
"navarrop@caraja:~/Developments/build_dir$ cmake ../simgrid/"
"navarrop@caraja:~/Developments/build_dir$ make"

Both kinds of out-of-source builds make it easier to delete the files created by the compilation.

Summary of the command line

Once the project has been successfully compiled and built, you can run the tests.

If you want to test before making a commit, you can simply run "ctest -D Experimental" and then visualize the results submitted to CDash. (Go to Cdash site).

How to install with cmake?

From svn.

For Unix and MacOS:

cmake -Denable_maintainer_mode=on -DCMAKE_INSTALL_PREFIX=/home/navarrop/Bureau/install_simgrid ./
make 
make install

For Windows:

cmake -G"Unix Makefiles" -DCMAKE_INSTALL_PREFIX=C:\simgrid_install ./
gmake
gmake install

From a distrib

For version 3.4.1 and 3.4
	cmake -Dprefix=/home/navarrop/Bureau/install_simgrid ./
	make
	make install-simgrid
Since version 3.5
	cmake -DCMAKE_INSTALL_PREFIX=/home/navarrop/Bureau/install_simgrid ./
	make
	make install

What is installed by cmake?

CMAKE_INSTALL_PREFIX/bin

tesh
graphicator
gras_stub_generator
simgrid_update_xml
simgrid-colorizer
smpicc
smpiff
smpif2c
smpirun

CMAKE_INSTALL_PREFIX/doc

simgrid/examples/
simgrid/html/

CMAKE_INSTALL_PREFIX/include

amok/
gras/
instr/
mc/
msg/
simdag/
simix/
smpi/
surf/
xbt/
gras.h
simgrid_config.h
xbt.h

CMAKE_INSTALL_PREFIX/lib

libgras.so.3.5
libsimgrid.so.3.5
libsmpi.so.3.5
libsimgrid.so -> libsimgrid.so.3.5
libgras.so -> libgras.so.3.5
libsmpi.so -> libsmpi.so.3.5
lua/5.1/simgrid.so -> ../../libsimgrid.so
ruby/1.9.0/x86_64-linux/libsimgrid.so -> ../../../libsimgrid.so
ruby/1.9.0/x86_64-linux/simgrid.rb

How to modify source files (for developers)

Add an executable or examples.

If you want to build an executable, you have to create a CMakeLists.txt in its source directory. It must specify where to create the executable, the list of sources, the dependencies and the name of the binary.

cmake_minimum_required(VERSION 2.6)

set(EXECUTABLE_OUTPUT_PATH "./")			
set(LIBRARY_OUTPUT_PATH "${CMAKE_HOME_DIRECTORY}/lib")

add_executable(get_sender get_sender.c)					#add_executable(<name_of_target> <src list>)

### Add definitions for compile
target_link_libraries(get_sender simgrid m pthread) 	#target_link_libraries(<name_of_target> <dependencies>)

Then you have to modify <project/directory>/buildtools/Cmake/MakeExeLib.cmake and add this line:

add_subdirectory(${CMAKE_HOME_DIRECTORY}/<path_where_is_CMakeLists.txt>)

Delete/add sources to lib.

If you want to modify, add or delete source files of a library, you have to edit <project/directory>/buildtools/Cmake/DefinePackages.cmake

set(JMSG_JAVA_SRC
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/MsgException.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/JniException.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/NativeException.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/HostNotFoundException.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/ProcessNotFoundException.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/Msg.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/Process.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/Host.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/Task.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/MsgNative.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/ApplicationHandler.java
	${CMAKE_HOME_DIRECTORY}/src/java/simgrid/msg/Sem.java
)

Add test

If you want to modify, add or delete tests, you have to edit <project/directory>/buildtools/Cmake/AddTests.cmake using this function: ADD_TEST(<name> <bin> <ARGS>)

add_test(test-simdag-1 ${CMAKE_HOME_DIRECTORY}/testsuite/simdag/sd_test --cfg=path:${CMAKE_HOME_DIRECTORY}/testsuite/simdag small_platform_variable.xml)

Pipol-remote

We now offer the possibility to test your local sources on pipol platforms before a commit. Of course, you have to be a pipol user (Account request), because you need to give your pipol_username to cmake. Here is the list of available systems:

    amd64_kvm-linux-debian-lenny
    amd64_kvm-linux-debian-testing
    amd64_kvm-windows-7
    amd64-linux-centos-5.dd.gz
    amd64-linux-debian-etch.dd.gz
    amd64-linux-debian-lenny.dd.gz
    amd64-linux-debian-testing.dd.gz
    amd64-linux-fedora-core10.dd.gz
    amd64-linux-fedora-core11.dd.gz
    amd64-linux-fedora-core12.dd.gz
    amd64-linux-fedora-core13.dd.gz
    amd64-linux-fedora-core7.dd.gz
    amd64-linux-fedora-core8.dd.gz
    amd64-linux-fedora-core9.dd.gz
    amd64-linux-mandriva-2007_springs_powerpack.dd.gz
    amd64-linux-mandriva-2009_powerpack.dd.gz
    amd64-linux-opensuse-11.dd.gz
    amd64-linux-redhatEL-5.0.dd.gz
    amd64-linux-suse-LES10.dd.gz
    amd64-linux-ubuntu-feisty.dd.gz
    amd64-linux-ubuntu-hardy.dd.gz
    amd64-linux-ubuntu-intrepid.dd.gz
    amd64-linux-ubuntu-jaunty.dd.gz
    amd64-linux-ubuntu-karmic.dd.gz
    amd64-linux-ubuntu-lucid.dd.gz
    amd64-unix-freebsd-7.dd.gz
    amd64-windows-server-2003-64bits.dd.gz
    amd64-windows-server-2008-64bits.dd.gz
    i386_kvm-linux-debian-lenny
    i386_kvm-linux-debian-testing
    i386_kvm-linux-fedora-core13
    i386_kvm-windows-xp-pro-sp3
    i386-linux-centos-5.dd.gz
    i386-linux-debian-etch.dd.gz
    i386-linux-debian-lenny.dd.gz
    i386-linux-debian-testing.dd.gz
    i386-linux-fedora-core10.dd.gz
    i386-linux-fedora-core11.dd.gz
    i386-linux-fedora-core12.dd.gz
    i386-linux-fedora-core13.dd.gz
    i386-linux-fedora-core7.dd.gz
    i386-linux-fedora-core8.dd.gz
    i386-linux-fedora-core9.dd.gz
    i386-linux-mandriva-2007_springs_powerpack.dd.gz
    i386-linux-mandriva-2009_powerpack.dd.gz
    i386-linux-opensuse-11.dd.gz
    i386-linux-redhatEL-5.0.dd.gz
    i386-linux-suse-LES10.dd.gz
    i386-linux-ubuntu-feisty.dd.gz
    i386-linux-ubuntu-hardy.dd.gz
    i386-linux-ubuntu-intrepid.dd.gz
    i386-linux-ubuntu-jaunty.dd.gz
    i386-linux-ubuntu-karmic.dd.gz
    i386-linux-ubuntu-lucid.dd.gz
    i386_mac-mac-osx-server-leopard.dd.gz
    i386-unix-freebsd-7.dd.gz
    i386-unix-opensolaris-10.dd.gz
    i386-unix-opensolaris-11.dd.gz
    i386-unix-solaris-10.dd.gz
    ia64-linux-debian-lenny.dd
    ia64-linux-fedora-core9.dd
    ia64-linux-redhatEL-5.0.dd
    x86_64_mac-mac-osx-server-snow-leopard.dd.gz
    x86_mac-mac-osx-server-snow-leopard.dd.gz

Two kinds of use are possible:

This command copies your sources, then runs a configure, a build and finally the tests.
	bob@caraja:~/Developments/simgrid/tmp_build$ make <name_of_image> 

This command copies your sources, runs "ctest -D Experimental" and submits the result to CDash.
	bob@caraja:~/Developments/simgrid/tmp_build$ make <name_of_image>_experimental 

All available commands are listed with:

bob@caraja:~/Developments/simgrid/tmp_build$ make pipol_experimental_list_images
bob@caraja:~/Developments/simgrid/tmp_build$ make pipol_test_list_images

Installing the SimGrid library with Autotools (valid until V3.3.4)

Many people have been asking me questions on how to use SimGrid. Quite often, the questions were not really about SimGrid but about the installation process. This section is intended to help people who are not familiar with compiling C files under UNIX. If you follow these instructions and still have trouble, drop an e-mail to <simgrid-user@lists.gforge.inria.fr>.

Compiling SimGrid from a stable archive

First of all, you need to download the latest version of SimGrid from here. Suppose you have uncompressed SimGrid in some temporary location of your home directory (say /home/joe/tmp/simgrid-3.0.1 ). The simplest way to use SimGrid is to install it in your home directory. Change your directory to /home/joe/tmp/simgrid-3.0.1 and type

./configure --prefix=$HOME
make
make install

If something fails at some point, check the section User code compilation problems. If it does not help, you can report the problem to the list but, please, avoid sending a laconic mail like "There is a problem. Is it okay?". Send the config.log file, which is automatically generated by configure. Try to capture both the standard output and the error output of the make command with script. There is no way for us to help you without the relevant bits of information.

Now, the following directory should have been created :

SimGrid is not a binary, it is a library. Both a static and a dynamic version are available. Here is what you can find if you try a ls /home/joe/lib:

libsimgrid.a libsimgrid.la libsimgrid.so libsimgrid.so.0 libsimgrid.so.0.0.1

Thus, there are two ways to link your program with SimGrid:

Java bindings don't get compiled

The configure script automatically detects whether you have the software needed to use the Java bindings. At the end of the configure, you can see the configuration picked by the script, which should look similar to

Configuration of package `simgrid' (version 3.3.4-svn) on
little64 (=4):

	 Compiler:	 gcc (version: )
	 
	 CFlags:       	  -O3 -finline-functions -funroll-loops -fno-strict-aliasing -Wall -Wunused -Wmissing-prototypes -Wmissing-declarations -Wpointer-arith -Wchar-subscripts -Wcomment -Wformat -Wwrite-strings -Wno-unused-function -Wno-unused-parameter -Wno-strict-aliasing -Wno-format-nonliteral -Werror -g3
	 CPPFlags:   
         LDFlags:	 
				   
         Context backend: ucontext
         Compile Java: no
							 
         Maintainer mode: no
         Supernovae mode: yes

In this example, Java backends won't be compiled.

On Debian-like systems (which include Ubuntu), you need the following packages: sun-java6-jdk libgcj10-dev. If you cannot find libgcj10-dev, try another version, like libgcj9-dev (on Ubuntu before 9.10) or libgcj11-dev (not released yet, but certainly one day). Please note that you need to activate the contrib and non-free repositories in Debian, and the universe ones in Ubuntu. Java comes at this price...

SimGrid development snapshots

We have very high standards on software quality, and we are reluctant to publish a stable release as long as known bugs remain in the code base. In addition, we added quite an extensive test base, making sure that we correctly test the most important parts of the tool.

As an unfortunate consequence, there may be some time between the stable releases. If you want to benefit from the most recent features we introduced, but don't want to take the risk of an untested version from the SVN, then development snapshots are made for you.

These are pre-releases of SimGrid that still fail some tests about features that almost nobody uses, or on platforms that are not in our core target (which is Linux, Mac, other Unixes and Windows, from the most important to the least important). That means that using these development releases should be safe for most users.

These archives can be found on this web page. Once you have the latest archive, you can compile it just like any archive (see above).

Compiling SimGrid from the SVN

The project development takes place in the SVN, where all changes are committed when they happen. Then, every once in a while, we make sure that the code quality meets our standards and release an archive from the code in the SVN. Afterward, we go back to the development in the SVN. So, if you need a recently added feature and can afford a few small stability problems with the latest features, you may want to use the SVN version instead of a released one.

For that, you first need to get the "simgrid" module from here.

You won't find any configure and a few other things (Makefile.in's, documentation, ...) will be missing as well. The reason for that is that all these files have to be regenerated using the latest versions of autoconf, libtool, automake (>1.9) and doxygen (>1.4). To generate the configure and the Makefile.in's, you just have to launch the bootstrap command that resides in the top of the source tree. Then just follow the instructions of Section Compiling SimGrid from a stable archive.

We insist on the fact that you really need the latest versions of autoconf, automake and libtool. Doing this step on exotic architectures/systems (i.e. anything different from a recent linux distribution) may be ... uncertain. If you need to compile the SVN version on a machine where all these dependencies are not met, the easiest is to do make dist in the SVN directory of another machine where all dependencies are met. It will create an archive you may deploy on other sites just as a regular stable release.

In summary, the following commands will checkout the SVN, regenerate the configure script and friends, configure SimGrid and build it.

svn checkout svn://scm.gforge.inria.fr/svn/simgrid/simgrid/trunk simgrid
cd simgrid
./bootstrap
./configure --enable-maintainer-mode --prefix=<where to install SimGrid>
make 

Then, if you want to install SimGrid on the current box, just do:

make install 

If you want to build a snapshot of the SVN to deploy it on another box (for example because the other machine doesn't have the autotools), do:

make dist 

Moreover, you should never call the autotools manually since you must run them in a specific order with specific arguments. Most of the time, the makefiles will automatically call the tools for you. When it's not possible (such as the first time you check out the SVN), use the ./bootstrap command to call them explicitly.

Setting up your own MSG code

Do not build your simulator by modifying the SimGrid examples. Go outside the SimGrid source tree and create your own working directory (say /home/joe/SimGrid/MyFirstScheduler/).

Suppose your simulation has the following structure (remember it is just an example to illustrate a possible way to compile everything; feel free to organize it as you want).

To compile such a program, we suggest using the following Makefile. It is a generic Makefile that we have used many times with our students when we teach the C language.

all: masterslave 
masterslave: masterslave.o sched.o

INSTALL_PATH = $$HOME
CC = gcc
PEDANTIC_PARANOID_FREAK =       -O0 -Wshadow -Wcast-align \
				-Waggregate-return -Wmissing-prototypes -Wmissing-declarations \
				-Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations \
				-Wmissing-noreturn -Wredundant-decls -Wnested-externs \
				-Wpointer-arith -Wwrite-strings -finline-functions
REASONABLY_CAREFUL_DUDE =	-Wall
NO_PRAYER_FOR_THE_WICKED =	-w -O2 
WARNINGS = 			$(REASONABLY_CAREFUL_DUDE)
CFLAGS = -g $(WARNINGS)

INCLUDES = -I$(INSTALL_PATH)/include
DEFS = -L$(INSTALL_PATH)/lib/
LDADD = -lm -lsimgrid 
LIBS = 

%: %.o
	$(CC) $(INCLUDES) $(DEFS) $(CFLAGS) $^ $(LIBS) $(LDADD) -o $@ 

%.o: %.c
	$(CC) $(INCLUDES) $(DEFS) $(CFLAGS) -c -o $@ $<

clean:
	rm -f masterslave *.o *~
.SUFFIXES:
.PHONY : clean

The first two lines indicate what should be built when typing make (masterslave) and which files it is made of (masterslave.o and sched.o). This makefile assumes that you have correctly set up your LD_LIBRARY_PATH variable (look, there is a LDADD = -lm -lsimgrid). If you prefer using the static version, remove the -lsimgrid and add a /lib/libsimgrid.a on the next line, right after the LIBS = .

More generally, if you have never written a Makefile by yourself, type in a terminal : info make and read the introduction. The previous example should be enough for a first try but you may want to perform some more complex compilations...

Setting up your own GRAS code

If you use the GRAS interface instead of the MSG one, then the previous section is not the best source of information. Instead, you should check the GRAS tutorial in general, and the Lesson 1: Setting up your own project in particular.

Feature related questions

"Could you please add (your favorite feature here) to SimGrid?"

Here is the deal. The whole SimGrid project (MSG, SURF, GRAS, ...) is meant to be kept as simple and generic as possible. We cannot add functions for everybody's needs when these functions can easily be built from the ones already in the API. Most of the time this is possible, and whenever it was not, we upgraded the API accordingly. When somebody asks us a question like "How to do that? Is there a function in the API to simply do this?", we're always glad to answer and help. However, if we don't need this code for our own purposes, there is no chance we're going to write it... it's your job! :) The counterpart to our answers is that once you come up with a neat implementation of this feature (task duplication, RPC, thread synchronization, ...), you should send it to us and we will be glad to add it to the distribution. That way, other people will take advantage of it (and we won't have to answer this question again and again ;).

You'll find in this section a few "Missing In Action" features. Many people have asked about them and we have given hints on how to simply implement them with MSG. Feel free to contribute...

MSG features

I want some more complex MSG examples!

Many people have come to ask me for a more complex example and each time, they have realized afterward that the basics were in the previous three examples.

Of course they have often been needing more complex functions like MSG_process_suspend(), MSG_process_resume() and MSG_process_isSuspended() (to perform synchronization), or MSG_task_Iprobe() and MSG_process_sleep() (to avoid blocking receptions), or even MSG_process_create() (to design asynchronous communications or computations). But the examples are sufficient to start.

We know. We should add some more examples, but not really some more complex ones... We should add some examples that illustrate some other functionalities (like how to simply encode asynchronous communications, RPC, process migrations, thread synchronization, ...) and we will do it when we have a little bit more time. We have tried to document the examples so that they are understandable. Tell us if something is not clear and once again feel free to participate! :)

Missing in action: MSG Task duplication/replication

There is no task duplication in MSG. When you create a task, you can process it or send it somewhere else. As soon as a process has sent this task, it doesn't have this task anymore. It's gone. The receiver process has got the task. However, you could decide upon receiving to create a "copy" of a task, but you have to handle by yourself the semantics associated with this "duplication".

As we already said, we prefer keeping the API as simple as possible. This kind of feature is rather easy to implement by users and the semantics you associate with it really depends on people. Having a generic task duplication mechanism is not that trivial (in particular because of the data field). That is why I would recommend that you write it by yourself, even if I can give you advice on how to do it.

You have the following functions to get information about a task: MSG_task_get_name(), MSG_task_get_compute_duration(), MSG_task_get_remaining_computation(), MSG_task_get_data_size(), and MSG_task_get_data().

You could use a dictionary (xbt_dict_t) of dynars (xbt_dynar_t). If you still don't see how to do it, please come back to us...
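
To make the point about the data field more concrete, here is a minimal sketch of what such a user-written duplication could look like. It is deliberately independent from MSG so that it can be compiled on its own: toy_task_t, toy_task_new(), toy_task_dup(), toy_task_free() and copy_double() are hypothetical names invented for this illustration, not part of the SimGrid API. The only thing it tries to show is that the copier cannot guess how to clone the payload, so the user has to provide that policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a task: this is NOT the real MSG m_task_t. */
typedef struct {
  char  *name;              /* owned by the task */
  double compute_duration;  /* amount of computation */
  double data_size;         /* amount of data to transfer */
  void  *data;              /* user payload, opaque to the copier */
} toy_task_t;

/* User-supplied policy telling how to duplicate a payload. */
typedef void *(*payload_copy_f)(const void *data);

toy_task_t *toy_task_new(const char *name, double compute_duration,
                         double data_size, void *data)
{
  toy_task_t *task;

  assert(name != NULL);
  task = malloc(sizeof *task);
  if (task == NULL)
    return NULL;
  task->name = malloc(strlen(name) + 1);
  if (task->name == NULL) {
    free(task);
    return NULL;
  }
  strcpy(task->name, name);
  task->compute_duration = compute_duration;
  task->data_size = data_size;
  task->data = data;
  return task;
}

/* Duplicate a task. With copy_payload == NULL, the copy shares the payload
 * of the original (shallow copy); otherwise the payload is cloned. */
toy_task_t *toy_task_dup(const toy_task_t *orig, payload_copy_f copy_payload)
{
  void *data;

  assert(orig != NULL);
  data = copy_payload ? copy_payload(orig->data) : orig->data;
  return toy_task_new(orig->name, orig->compute_duration,
                      orig->data_size, data);
}

/* Release a task. The payload is left to the caller, who is the only one
 * to know whether it is shared with another task. */
void toy_task_free(toy_task_t *task)
{
  if (task != NULL) {
    free(task->name);
    free(task);
  }
}

/* Example policy for a payload made of a single double. */
void *copy_double(const void *data)
{
  double *clone = malloc(sizeof *clone);
  if (clone != NULL)
    *clone = *(const double *) data;
  return clone;
}
```

With MSG, the same idea would presumably read the fields of the received task through the getters listed above and pass them to MSG_task_create(). Whether the data pointer is shared or cloned remains your decision, which is exactly the part of the semantics we cannot choose for you.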

I want to do asynchronous communications in MSG

In the past (version <= 3.4), there was no function to perform asynchronous communications. It could easily be implemented by creating a new process when needed, though. Since version 3.5, we have introduced the following functions:

We refer you to the description of these functions for more details on their usage, as well as to the example section on Asynchronous communication applications.

I need to synchronize my MSG processes

You obviously cannot use pthread_mutexes or pthread_conds since we handle every scheduling related decision within SimGrid.

In the past (version <=3.3.4) you could do it by playing with MSG_process_suspend() and MSG_process_resume() or with fake communications (using MSG_task_get(), MSG_task_put() and MSG_task_Iprobe()).

Since version 3.4, you can use classical synchronization structures. See page XBT_synchro or simply check in include/xbt/synchro_core.h.

Where is the get_host_load function hidden in MSG?

There is no such thing because its semantics wouldn't be really clear. Of course, it is something about the amount of host throughput, but there are as many definitions of "host load" as people asking for this function. First, you have to remember that resource availability may vary over time, which makes any notion of load harder to define.

It may be an instantaneous value or an average one. Moreover, it may be only the power of the computer, or may take the background load into account, or may even take the currently running tasks into account. In some SURF models, communications have an influence on computational power. Should it be taken into account too?

First of all, it's nearly impossible to predict the load beforehand in the simulator since it depends on too many parameters (background load variation, bandwidth sharing algorithmic complexity), some of them not even being known beforehand (other tasks starting at the same time). So, getting this information is really hard (just like in real life). It's not just that we want MSG to be as painful as real life. But as it is in some way realistic, we face some of the same problems as we would face in real life.

How would you do it for real? The most common option is to use something like NWS that performs active probes. The best solution is probably to do the same within MSG, as in the following code snippet. It is very close to what you would have to do outside of the simulator, and thus gives you information that you could also get in real settings, so that it does not hinder the realism of your simulation.

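double get_host_load() {
   /* Run a tiny probe task and time it */
   m_task_t task = MSG_task_create("test", 0.001, 0, NULL);
   double date = MSG_get_clock();

   MSG_task_execute(task);
   date = MSG_get_clock() - date;
   MSG_task_destroy(task);
   /* Amount of computation divided by elapsed time: the speed actually obtained */
   return (0.001/date);
}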

Of course, it may not match your personal definition of "host load". In this case, please detail what you mean on the mailing list, and we will extend this FAQ section to fit your taste if possible.

How can I get the *real* communication time?

Communications are synchronous, so if you simply get the time before and after a communication, you do not get the transmission time only: the measure also includes the time spent waiting for the other party to be ready. However, getting the *real* communication time is not really hard either. The following solution is a good starting point.

/* slaves, slaves_count, i, task_comp_size, task_comm_size and PORT_22 are
 * assumed to be defined elsewhere in your code. */
int sender()
{
  /* The payload carries the date at which the send was initiated. */
  m_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size, 
                                  calloc(1,sizeof(double)));
  *((double*) task->data) = MSG_get_clock();
  MSG_task_put(task, slaves[i % slaves_count], PORT_22);
  XBT_INFO("Send completed");
  return 0;
}
int receiver()
{
  m_task_t task = NULL;
  double time1,time2;

  time1 = MSG_get_clock();
  MSG_task_get(&(task), PORT_22);
  time2 = MSG_get_clock();
  /* The transfer cannot start before both parties are ready. */
  if(time1<*((double *)task->data))
     time1 = *((double *) task->data);
  XBT_INFO("Communication time :  \"%f\" ", time2-time1);
  free(task->data);
  MSG_task_destroy(task);
  return 0;
}
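
The arithmetic done by the receiver can be isolated in a small function, which makes the reasoning easier to check. This is only a sketch: real_comm_time() is a hypothetical helper, not an MSG function. The transfer starts when both parties are ready, that is at the later of the two start dates.

```c
/* send_start: date stored by the sender in the task payload.
 * recv_start: date at which the receiver called the receive function.
 * recv_end:   date at which the receive function returned. */
double real_comm_time(double send_start, double recv_start, double recv_end)
{
  /* The communication really begins once both sides are ready. */
  double start = (send_start > recv_start) ? send_start : recv_start;
  return recv_end - start;
}
```

For example, if the receiver starts waiting at date 1.0, the sender posts the task at date 3.0 and the reception completes at date 5.0, the real communication time is 2.0 and not 4.0.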

SimDag related questions

Implementing communication delays between tasks.

A classic question of SimDag newcomers is about how to express a communication delay between tasks. The thing is that in SimDag, both computation and communication are seen as tasks. So, if you want to model a data dependency between two DAG tasks t1 and t2, you have to create 3 SD_tasks: t1, t2 and c and add dependencies in the following way:

SD_task_dependency_add(NULL, NULL, t1, c);
SD_task_dependency_add(NULL, NULL, c, t2);

This way task t2 cannot start before the termination of communication c which in turn cannot start before t1 ends.

When creating task c, you have to associate an amount of data (in bytes) corresponding to what has to be sent by t1 to t2.

Finally to schedule the communication task c, you have to build a list comprising the workstations on which t1 and t2 are scheduled (w1 and w2 for example) and build a communication matrix that should look like [0;amount ; 0; 0].
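
As a sketch of how such a matrix can be built, the helper below fills a flat array for a list of n workstations. fill_comm_matrix() is a hypothetical name, and the layout is an assumption deduced from the [0;amount ; 0; 0] example above: row-major, with the row being the index of the sending workstation in the list and the column the index of the receiving one.

```c
#include <stddef.h>

/* Fills an n x n row-major matrix with zeros, except for the entry that
 * describes the transfer from workstation src to workstation dst. */
void fill_comm_matrix(double *matrix, size_t n,
                      size_t src, size_t dst, double bytes)
{
  size_t i;

  for (i = 0; i < n * n; i++)
    matrix[i] = 0.0;
  matrix[src * n + dst] = bytes;
}
```

With the list (w1, w2), calling fill_comm_matrix(m, 2, 0, 1, amount) gives the matrix of the text: only the entry from w1 to w2 is non-zero.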

How to implement a distributed dynamic scheduler of DAGs.

Distribution is somehow "contagious". If you start making distributed decisions, there is no way to handle DAGs directly anymore (unless I am missing something). You have to encode your DAGs in terms of communicating processes to make the whole scheduling process distributed. Here is an example of how you could do that. Assume T1 has to be done before T2.

 int your_agent(int argc, char *argv[]) {
   ...
   T1 = MSG_task_create(...);
   T2 = MSG_task_create(...);
   ...
   while(1) {
     ...
     if(cond) MSG_task_execute(T1);
     ...
     if((MSG_task_get_remaining_computation(T1)==0.0) && (you_re_in_a_good_mood))
        MSG_task_execute(T2);
     else {
        /* do something else */
     }
   }
 }

If you decide that the distributed part is not that important and that the DAG is really the level of abstraction you want to work with, then you should give SimDag a try.

Generic features

Increasing the amount of simulated processes

Here are a few tricks you can apply if you want to increase the amount of processes in your simulations.

Is there a native support for batch schedulers in SimGrid?

No, there is no native support for batch schedulers and none is planned, because this is a very specific need (and doing it in a generic way is thus very hard). However, some people have implemented their own batch schedulers. Vincent Garonne wrote one during his PhD and put his code in the contrib directory of our SVN so that others can keep working on it. You may find inspiring ideas in it.

I need a checkpointing thing

Actually, it depends on whether you want to checkpoint the simulation, or to simulate checkpoints.

The first one could help if your simulation is a long-running process that you want to keep running even when hardware issues occur. It could also help to rewind the simulation by jumping back to an old checkpoint to cancel recent calculations.
Unfortunately, such a thing will probably never exist in SimGrid. One would have to duplicate all data structures, because doing a rewind at the simulator level is very hard (not to mention the malloc and free operations that might have been done in between). Instead, you may be interested in the Libckpt library (http://www.cs.utk.edu/~plank/plank/www/libckpt.html). This is the checkpointing solution used in the Condor project, for example. It makes it easy to create checkpoints (at the OS level, creating something like core files) and to rerun them when needed.

If you want to simulate checkpoints instead, it means that you want the state of an executing task (in particular, the progress made towards completion) to be saved somewhere. So if a host (and the task executing on it) fails (cf. MSG_HOST_FAILURE), then the task can be restarted from the last checkpoint.

Actually, such a thing does not exist in SimGrid either, but it's just because we don't think it is fundamental and it may be done in the user code at relatively low cost. You could for example use a watcher that periodically gets the remaining amount of things to do (using MSG_task_get_remaining_computation()), or fragment the task into smaller subtasks.
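
The bookkeeping behind the "smaller subtasks" approach can be sketched as follows. This is only an illustration with hypothetical helper names (last_checkpoint() and remaining_after_failure() are not part of SimGrid), assuming that a checkpoint is taken each time a fixed amount of work has been completed.

```c
/* Amount of work saved by the last completed checkpoint, when a
 * checkpoint is taken every checkpoint_interval units of work. */
double last_checkpoint(double work_done, double checkpoint_interval)
{
  long completed = (long)(work_done / checkpoint_interval);
  return completed * checkpoint_interval;
}

/* Work that remains to be done when the task is restarted from its
 * last checkpoint after a failure. */
double remaining_after_failure(double total_work, double work_done,
                               double checkpoint_interval)
{
  return total_work - last_checkpoint(work_done, checkpoint_interval);
}
```

In user code, the value returned by remaining_after_failure() would be the computation amount of the task you create to resume the work after the failure.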

Platform building and Dynamic resources

Where can I find SimGrid platform files?

There are several small examples in the archive, in the examples/msg directory. From time to time, we are asked for other files, but we don't have much at hand right now.

You should refer to the Platform Description Archive (http://pda.gforge.inria.fr) project to see the other platform files we have available, as well as the Simulacrum simulator, meant to generate SimGrid platforms using all classical generation algorithms.

How can I automatically map an existing platform?

We are working on a project called ALNeM (Application-Level Network Mapper), whose goal is to automatically discover the topology of an existing network. Its output will be a platform description file following the SimGrid syntax, so everybody will be able to map their own lab network (and contribute it to the catalog project). This tool is not ready yet, but it is moving forward quite fast. Just stay tuned.

Generating synthetic but realistic platforms

The third possibility to get a platform file (after manual or automatic mapping of real platforms) is to generate synthetic platforms. Getting a realistic result is not a trivial task, and moreover, nobody is really able to define what "realistic" means when speaking of topology files. You can find some more thoughts on this topic in these slides.

If you are looking for an actual tool, we have a little tool to annotate Tiers-generated topologies. This Perl script is in the tools/platform_generation/ directory of the SVN. Dinda et al. released a very comparable tool, and called it GridG.

Modeling multi-core resources

There is currently no native support for multi-core or SMP machines in SimGrid. We are currently working on it, but coming up with the right model is very hard: cores share caches and the bus used to access memory, and thus interfere with each other. Memory contention is a crucial component of multi-core modeling.

In the meantime, some user-level tricks may be sufficient for you. For example, you may model each core as a CPU and add some very high speed links between them. This complicates the user code a bit, since you have to remember that when you assign something to a (real) host, it can be any of the (fake) hosts representing the cores of a given machine. For that, you can use the prop tag of the XML files as follows. Your code should then look at the 'machine' property associated with each workstation, and run parallel tasks over all cores of the machine.

  <host id="machine0/core0" power="91500E6">
    <prop id="machine" value="machine0"/>
    <prop id="core" value="0"/>
  </host>
  <host id="machine0/core1" power="91500E6">
    <prop id="machine" value="machine0"/>
    <prop id="core" value="1"/>
  </host>
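
On the user-code side, grouping the fake hosts of a machine could look like the sketch below. It is deliberately independent from the SimGrid API: host_entry_t and cores_of_machine() are hypothetical names, and the table is assumed to have been filled beforehand with the id and the 'machine' property of each host of the platform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one <host> tag and its 'machine' property. */
typedef struct {
  const char *id;
  const char *machine;
} host_entry_t;

/* Stores in cores[] the indices of the hosts whose 'machine' property is
 * equal to machine (at most max_cores of them) and returns how many
 * indices were stored. */
size_t cores_of_machine(const host_entry_t *hosts, size_t host_count,
                        const char *machine,
                        size_t *cores, size_t max_cores)
{
  size_t i, found = 0;

  for (i = 0; i < host_count && found < max_cores; i++)
    if (strcmp(hosts[i].machine, machine) == 0)
      cores[found++] = i;
  return found;
}
```

The resulting list of indices is what you would use to pick the workstations over which a parallel task of that machine is spread.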


Modeling dynamic resource availability

A nice feature of SimGrid is that it enables you to seamlessly have resources whose availability changes over time. When you build a platform, you generally declare hosts like this:

  <host id="host A" power="100.00"/>

If you want the availability of "host A" to change over time, the only thing you have to do is change this definition like this:

  <host id="host A" power="100.00" availability_file="trace_A.txt" state_file="trace_A_failure.txt"/>

For hosts, availability files are expressed in fraction of available power. Let's have a look at what "trace_A.txt" may look like:

PERIODICITY 1.0
0.0 1.0
11.0 0.5
20.0 0.9

At time 0, our host will deliver 100 flop/s. At time 11.0, it will deliver only 50 flop/s until time 20.0, where it will start delivering 90 flop/s. Finally, at time 21.0 (20.0 plus the periodicity 1.0), we'll be back to the beginning and it will deliver 100 flop/s.

Now let's look at the state file:

PERIODICITY 10.0
1.0 -1.0
2.0 1.0

A negative value means "off" while a positive one means "on". The host is initially on. At time 1.0, it is turned off, and at time 2.0 it is turned on again until time 12 (2.0 plus the periodicity 10.0), when the trace starts over and turns it off. It will be turned on again at time 13.0 until time 23.0, and so on.
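
The way both traces are replayed can be sketched with a small function. This is not SimGrid code: trace_event_t and trace_value_at() are hypothetical names, and the reading of PERIODICITY is an assumption inferred from the two examples of this section: once the last event has been played, the trace waits for the periodicity and then starts over from its first event (at 21.0 for the availability trace, at 12 for the state trace). The periodicity is assumed to be strictly positive.

```c
#include <stddef.h>

typedef struct {
  double date;
  double value;
} trace_event_t;

/* Value of a periodic trace at date t (t >= 0).
 * events must be sorted by date and contain at least one event.
 * before_first is the value in effect before the first event. */
double trace_value_at(const trace_event_t *events, size_t count,
                      double periodicity, double before_first, double t)
{
  double first = events[0].date;
  /* Duration between two successive replays of the first event. */
  double cycle = events[count - 1].date - first + periodicity;
  double offset, local, value;
  size_t i;

  if (t < first)
    return before_first;

  /* Fold t back into the first cycle (manual modulo). */
  offset = t - first;
  offset -= cycle * (double)(long)(offset / cycle);
  local = first + offset;

  /* Keep the value of the last event that is not in the future. */
  value = events[0].value;
  for (i = 1; i < count && events[i].date <= local; i++)
    value = events[i].value;
  return value;
}
```

For the availability trace, the returned fraction is to be multiplied by the power declared in the platform file (100.00 for "host A"). For the state trace, only the sign of the returned value matters.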

Now, let's look how the same kind of thing can be done for network links. A usual declaration looks like:

  <link id="LinkA" bandwidth="10.0" latency="0.2"/>

You have at your disposal the following options: bandwidth_file, latency_file and state_file. The only difference with hosts is that bandwidth_file and latency_file do not express a fraction of the available power but are expressed directly in bytes per second and in seconds, respectively.

How to express multipath routing in platform files?

It is unfortunately impossible to express the fact that there is more than one routing path between two given hosts. Let's consider the following platform file:

<route src="A" dst="B">
   <link:ctn id="1"/>
</route>
<route src="B" dst="C">
  <link:ctn id="2"/>
</route>
<route src="A" dst="C">
  <link:ctn id="3"/>
</route>

Although it is perfectly valid, it does not mean that data traveling from A to C can either go directly (using link 3) or through B (using links 1 and 2). It simply means that the routing on the graph is not trivial, and that data do not necessarily follow the shortest path in number of hops on this graph. Another way to say it is that nothing is implicit in these routing descriptions. The system will only use the routes you declare (such as <route src="A" dst="C"><link:ctn id="3"/></route>), without trying to build new routes by aggregating the provided ones.

You are also free to declare a platform where the routing is not symmetric. For example, add the following to the previous file:

<route src="C" dst="A">
  <link:ctn id="2"/>
  <link:ctn id="1"/>
</route>

This makes sure that data from C to A go through B whereas data from A to C go directly. Don't worry about the realism of such settings, since we have seen far weirder situations in real settings (in fact, it is the realism of very regular platforms which is questionable, but that's another story).
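
The "declared routes only" rule can be pictured as a plain table lookup. The sketch below is not the SimGrid routing code: route_t, routes and find_route() are hypothetical names, and the table simply transcribes the four routes of the example (including the asymmetric one from C to A).

```c
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 4

typedef struct {
  const char *src;
  const char *dst;
  const char *links[MAX_LINKS];
  size_t link_count;
} route_t;

/* The routes declared in the example platform file. */
static const route_t routes[] = {
  { "A", "B", { "1" }, 1 },
  { "B", "C", { "2" }, 1 },
  { "A", "C", { "3" }, 1 },
  { "C", "A", { "2", "1" }, 2 },
};

/* Returns the route declared from src to dst, or NULL if none was
 * declared. Exact lookup only: routes are neither composed nor reversed. */
const route_t *find_route(const char *src, const char *dst)
{
  size_t i;

  for (i = 0; i < sizeof routes / sizeof routes[0]; i++)
    if (strcmp(routes[i].src, src) == 0 && strcmp(routes[i].dst, dst) == 0)
      return &routes[i];
  return NULL;
}
```

With this table, the lookup from A to C only ever returns link 3, the one from C to A returns links 2 and 1, and there is no route from B to A since none was declared.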

Bypassing the XML parser with your own C functions

So you want to bypass the XML file parser, uh? Maybe to do some parameter sweep experiments on your simulations or so? This is possible, and it's not even really difficult (well, such a brutal idea could have been harder to implement). Here is how it goes.

For this, you have to first remember that the XML parsing in SimGrid is done using a tool called FleXML. Given a DTD, this gives a flex-based parser. If you want to bypass the parser, you need to provide some code mimicking what it does and replacing it in its interactions with the SURF code. So, let's have a look at these interactions.

FleXML parsers are close to classical SAX parsers. It means that a well-formed SimGrid platform XML file might result in the following "events":

The communication from the parser to the SURF code uses two means: attributes get copied into some global variables, and a SURF-provided function gets called by the parser for each event. For example, the event raised when the parser opens a host tag whose id is "host1" and whose power is 1.0 makes the parser do something roughly equivalent to:

  strcpy(A_host_id,"host1");
  A_host_power = 1.0;
  STag_host();

In SURF, we attach callbacks to the different events by making the function pointers point to the right SURF functions. Since there can be more than one callback attached to the same event (if more than one model is in use, for example), they are stored in a dynar. Example in workstation_ptask_L07.c:

  /* Adding callback functions */
  surf_parse_reset_parser();
  surfxml_add_callback(STag_surfxml_host_cb_list, &parse_cpu_init);
  surfxml_add_callback(STag_surfxml_prop_cb_list, &parse_properties);
  surfxml_add_callback(STag_surfxml_link_cb_list, &parse_link_init);
  surfxml_add_callback(STag_surfxml_route_cb_list, &parse_route_set_endpoints);
  surfxml_add_callback(ETag_surfxml_link_c_ctn_cb_list, &parse_route_elem);
  surfxml_add_callback(ETag_surfxml_route_cb_list, &parse_route_set_route);
                
  /* Parse the file */
  surf_parse_open(file);
  xbt_assert(!surf_parse(), "Parse error in %s", file);
  surf_parse_close();
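
The mechanism of several callbacks attached to the same event can be sketched in plain C as follows. This is only an illustration: a fixed-size array stands in for the dynar, and all the names used here (parse_callback_t, callback_list_t, callback_list_add(), callback_list_fire() and the two example callbacks) are hypothetical, not the surfxml ones.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*parse_callback_t)(void);

/* One list per parser event (fixed-size array instead of a dynar). */
typedef struct {
  parse_callback_t callbacks[MAX_CALLBACKS];
  size_t count;
} callback_list_t;

/* Attaches cb to the event; returns 0 on success, -1 if the list is full. */
int callback_list_add(callback_list_t *list, parse_callback_t cb)
{
  if (list->count >= MAX_CALLBACKS)
    return -1;
  list->callbacks[list->count++] = cb;
  return 0;
}

/* What the parser does when the event occurs: call every callback,
 * in the order in which they were attached. */
void callback_list_fire(const callback_list_t *list)
{
  size_t i;

  for (i = 0; i < list->count; i++)
    list->callbacks[i]();
}

/* Two example callbacks, as if two models were interested in hosts. */
static int cpu_hosts_seen = 0;
static int workstation_hosts_seen = 0;

void cpu_on_host(void)         { cpu_hosts_seen++; }
void workstation_on_host(void) { workstation_hosts_seen++; }
```

A bypass function then plays the role of the parser: it sets the global variables holding the attributes and fires the list of the corresponding event, once per element it wants to declare.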

So, to bypass the FleXML parser, you need to write your own version of the surf_parse function, which should do the following:

Then, tell SimGrid that you want to use your own "parser" instead of the stock one:

  surf_parse = surf_parse_bypass_environment;
  MSG_create_environment(NULL);
  surf_parse = surf_parse_bypass_application;
  MSG_launch_application(NULL);

A set of macros is provided at the end of include/surf/surfxml_parse.h to ease the writing of the bypass functions. An example of this trick is distributed in the file examples/msg/masterslave/masterslave_bypass.c

Changing SimGrid's behavior

A number of options can be given at runtime to change the default SimGrid behavior. In particular, you can change the default cpu and network models...

Using Fullduplex

Experimental fullduplex support is now available on the svn branch. In order for fullduplex to work, your platform must have two links for each pair of interconnected hosts. See an example here:

	simgrid_svn_sources/examples/msg/gtnets/fullduplex-p.xml

With fullduplex support, outgoing and incoming communication flows are treated independently for most models. The exception is the LV08 model, which adds 0.05 of usage in the opposite direction for each newly created flow. This can be useful to simulate some important TCP phenomena such as ack compression.

Running a fullduplex example:

	cd simgrid_svn_sources/examples/msg/gtnets
	./gtnets fullduplex-p.xml fullduplex-d.xml --cfg=fullduplex:1

Using GTNetS

It is possible to use a packet-level network simulator instead of the default flow-based simulation. You may want to use such an approach if you have doubts about the validity of the default model or if you want to perform some validation experiments. At the moment, we support the GTNetS simulator (it is still rather experimental though, so leave us a message if you play with it).

To enable the GTNetS model inside SimGrid, you need to patch the GTNetS simulator source code and build and install it from scratch:

 svn checkout svn://scm.gforge.inria.fr/svn/simgrid/contrib/trunk/GTNetS/
 cd GTNetS
 
 unzip gtnets-current.zip
 tar zxvf gtnets-current-patch.tgz 
 cd gtnets-current
 cat ../00*.patch | patch -p1
 
  cat ../AMD64-FATAL-Removed-DUL_SIZE_DIFF-Added-fPIC-compillin.patch | patch -p1
  

Due to portability issues, it is possible that GTNetS does not compile on your architecture. The patches provided in the SimGrid SVN repository are intended for use on Linux only. Unfortunately, we have neither the time, the money, nor the manpower to guarantee GTNetS portability. We advise you to use one of the GTNetS communication channels to get more help in compiling GTNetS.

 ln -sf Makefile.linux Makefile
 make depend
 make debug
 
 make opt
 

It is important to put the full path of your libgtsim-xxxx.so file when creating the symbolic link. Replace <userhome> by some path you have write access to.

 ln -sf /<absolute_path>/gtnets-current/libgtsim-debug.so /<userhome>/usr/lib/libgtnets.so
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<userhome>/usr/lib
 mkdir /<userhome>/usr/include/gtnets
 cp -fr SRC/*.h /<userhome>/usr/include/gtnets
 

In order to enable GTNetS in SimGrid, you have to tell the build system where GTNetS is installed (the path containing <gtnets_path>/lib and <gtnets_path>/include).

   Since v3.4 (with cmake)
   cmake . -Dgtnets_path=/<userhome>/usr
   
   Until v3.4 (with autotools)
   ./configure --with-gtnets=/<userhome>/usr
   
   Since v3.4 (with cmake)
   cd simgrid
   make
   ctest -R gtnets
   
   Until v3.4 (with autotools)
   cd simgrid/example/msg/
   make
   make check
   
 gtnets/gtnets gtnets/onelink-p.xml gtnets/onelink-d.xml --cfg=network_model:GTNets
 

A long version of this HowTo is available.

More about GTNetS simulator at GTNetS Website

Using alternative flow models

The default simgrid network model uses a max-min based approach as explained in the research report A Network Model for Simulation of Grid Application. Other models have been proposed and implemented since then (see for example Accuracy Study and Improvement of Network Simulation in the SimGrid Framework) and can be activated at runtime. For example:

./mycode platform.xml deployment.xml --cfg=workstation/model:compound --cfg=network/model:LV08 --cfg=cpu/model:Cas01

Possible models for the network are currently "Constant", "CM02", "LegrandVelho", "GTNets", "Reno", "Reno2", "Vegas". Others will probably be added in the future, and many of the previous ones are experimental and are likely to disappear without notice. To get the list of the currently implemented models, you should use the --help-models command line option.

./masterslave_forwarder ../small_platform.xml deployment_masterslave.xml  --help-models
Long description of the workstation models accepted by this simulator:
  CLM03: Default workstation model, using LV08 and CM02 as network and CPU
  compound: Workstation model allowing you to use other network and CPU models
  ptask_L07: Workstation model with better parallel task modeling
Long description of the CPU models accepted by this simulator:
  Cas01_fullupdate: CPU classical model time=size/power
  Cas01: Variation of Cas01_fullupdate with partial invalidation optimization of lmm system. Should produce the same values, only faster
  CpuTI: Variation of Cas01 with also trace integration. Should produce the same values, only faster if you use availability traces
Long description of the network models accepted by this simulator:
  Constant: Simplistic network model where all communication take a constant time (one second)
  CM02: Realistic network model with lmm_solve and no correction factors
  LV08: Realistic network model with lmm_solve and these correction factors: latency*=10.4, bandwidth*=.92, S=8775
  Reno: Model using lagrange_solve instead of lmm_solve (experts only)
  Reno2: Model using lagrange_solve instead of lmm_solve (experts only)
  Vegas: Model using lagrange_solve instead of lmm_solve (experts only)

Tracing Simulations for Visualization

Trace visualization is widely used to observe and understand the behavior of parallel applications and distributed algorithms. Usually, this is done in a two-step fashion: the user instruments the application, and the traces are analyzed after the end of the execution. The visualization itself can highlight unexpected behaviors and bottlenecks, and can sometimes be used to correct distributed algorithms. The SimGrid team has instrumented the library in order to let users trace their simulations and analyze them. This part of the user manual explains how the tracing-related features can be enabled and used during the development of simulators using the SimGrid library.

How it works

For now, the SimGrid library is instrumented so that users can trace the platform utilization using the MSG, SimDAG and SMPI interfaces. This means that the tracing will register how much power is used for each host and how much bandwidth is used for each link of the platform. The idea with this type of tracing is to first get an overall view of resource utilization, especially to identify bottlenecks, load-balancing among hosts, and so on.

The idea of the tracing facilities is to give SimGrid users the possibility to classify MSG and SimDAG tasks by category, tracing the platform utilization (hosts and links) for each of the categories. For that, the tracing interface enables the declaration of categories and provides a function to mark a task with a previously declared category. The tasks that are not classified according to a category are not traced. Even if the user does not specify any category, the simulations can still be traced in terms of resource utilization by using a special parameter that is detailed below.

Enabling using CMake

With the sources of SimGrid, it is possible to enable the tracing using the parameter -Denable_tracing=ON when cmake is executed. The section Tracing Functions describes all the functions available when this CMake option is activated. These functions will have no effect if SimGrid is configured without this option (they are wiped out by the C preprocessor).

$ cmake -Denable_tracing=ON .
$ make

Tracing Functions

Tracing configuration Options

These are the options accepted by the tracing system of SimGrid:

Example of Instrumentation

A simplified example using the tracing mandatory functions.

int main (int argc, char **argv)
{
  MSG_global_init (&argc, &argv);

  //(... after deployment ...)

  //note that category declaration must be called after MSG_create_environment
  TRACE_category_with_color ("request", "1 0 0");
  TRACE_category_with_color ("computation", "0.3 1 0.4");
  TRACE_category ("finalize");

  m_task_t req1 = MSG_task_create("1st_request_task", 10, 10, NULL);
  m_task_t req2 = MSG_task_create("2nd_request_task", 10, 10, NULL);
  m_task_t req3 = MSG_task_create("3rd_request_task", 10, 10, NULL);
  m_task_t req4 = MSG_task_create("4th_request_task", 10, 10, NULL);
  TRACE_msg_set_task_category (req1, "request");
  TRACE_msg_set_task_category (req2, "request");
  TRACE_msg_set_task_category (req3, "request");
  TRACE_msg_set_task_category (req4, "request");

  m_task_t comp = MSG_task_create ("comp_task", 100, 100, NULL);
  TRACE_msg_set_task_category (comp, "computation");

  m_task_t finalize = MSG_task_create ("finalize", 0, 0, NULL);
  TRACE_msg_set_task_category (finalize, "finalize");

  //(...)

  MSG_clean();
  return 0;
}

Analyzing the SimGrid Traces

The SimGrid library, during an instrumented simulation, creates a trace file in the Paje file format that contains the platform utilization for the simulation that was executed. The visual analysis of this file is performed with the visualization tool Triva, with special configurations tuned to SimGrid needs. This part of the documentation explains how to configure and use Triva to analyze a SimGrid trace file.

Model-Checking

How to use it

To enable the experimental SimGrid model-checking support, the program should be executed with the command line argument

--cfg=model-check:1 

Properties are expressed as assertions using the function

void MC_assert(int prop);

Lua Binding

Most SimGrid modules require a good level in C programming, since SimGrid is designed to be used as a standard C library. Sometimes users prefer using some kind of « easy scripts » or a language that is easier to code with, which avoids dealing with C errors and can save a significant amount of time. Besides the Java binding, Lua and Ruby bindings have been available since version 3.4 of SimGrid for the MSG module, and we are currently working on bindings for other modules.

What is lua ?

Lua is a lightweight, reflective, imperative and functional programming language, designed as a scripting language with extensible semantics as a primary goal (see official web site here).

Why lua ?

Lua is a fast, portable and powerful scripting language, quite simple to use for developers. It combines procedural features with powerful data description facilities, by using a simple, yet powerful, mechanism of tables. Lua has a relatively simple C API compared to other scripting languages, and accordingly provides a robust, easy-to-use interface.

How to use lua in Simgrid ?

Actually, the use of Lua in SimGrid is quite simple. You just have to follow the same steps as when coding with C in SimGrid:

Master/Slave Example

Exchanging Data

You can also exchange data between processes using Lua. For that, you have to handle a Lua task as a table, since Lua itself is based on a mechanism of tables. You can thus exchange any kind of data (tables, matrices, strings, ...) between processes via tasks.

Bypass XML

Maybe you wonder whether there is a way to bypass the XML files and describe your platform directly from the code. With the Lua bindings, it is possible! How? We provide some additional (tricky?) functions in Lua that allow you to set up your own platform without using the XML files (this can be useful for large platforms, where a simple for loop spares you from dealing with an annoying XML file ;) ).

The full example is distributed in the file examples/lua/master_slave_bypass.lua

Ruby Binding

Use Ruby in Simgrid

Since v3.4, Ruby can be used in SimGrid for the MSG module. You can find almost all MSG functionalities in Ruby code, which allows you to set up your environment, manage tasks between hosts and run the simulation.

Master/Slave Ruby Application

For each process method (master and slave in this example), you have to associate a Ruby class that inherits from the MSG::Process Ruby class, with a 'main' function that describes the behaviour of the process during the simulation.

The class MSG::Task contains methods that allow the management of the native MSG tasks. In the master Ruby code, we used:

Exchanging data

The Ruby bindings provide two ways to exchange data between Ruby processes.

The MSG::Task class contains 2 methods that allow a data exchange between 2 processes.

-MSG::Task.join : makes it possible to attach any kind of Ruby data to a task.

   ...
   myTable = Array.new
   myTable <<1<<-2<<45<<67<<87<<76<<89<<56<<78<<3<<-4<<99
   # Creates and send Task With the Table inside
   task = MSG::Task.new("quicksort_task",taskComputeSize, taskCommunicationSize);
   task.join(myTable);
   ...
   task.send(mailbox);
   

-MSG::Task.data : gives access to the data contained in the task.

   ...
   task = MSG::Task.receive(recv_mailbox.to_s)
   table = task.data
   quicksort(table,0,table.size-1)
   ...
   

You can find a complete example illustrating the use of those methods in the file /example/ruby/Quicksort.rb

Another, more 'object-oriented' way to do it is to make your own 'task' class that inherits from MSG::Task and contains the data you want to deal with. The only 'tricky' thing is that the "initialize" method has no effect!

Using some getter/setter methods is the simplest way to manage your data :)

class PingPongTask < MSG::Task
  # The initialize method has no effect 
  @time 
  def setTime(t)
    @time = t
  end
  def getTime()
    return @time
  end
end
 

You can find an example of use in the file example/ruby/PingPong.rb

Troubleshooting

SimGrid compilation and installation problems

cmake fails!

We know only one reason for the configure to fail:

If you experience other kinds of issues, please get in touch with us. We are always interested in improving our portability to new systems.

Dude! "ctest" fails on my machine!

Don't assume we never run this target, because we do. Check http://cdash.inria.fr/CDash/index.php?project=Simgrid (click on previous if there is no result for today: results are produced only by 11am, French time) and https://buildd.debian.org/status/logs.php?pkg=simgrid if you don't believe us.

If it's failing on your machine in a way not experienced by the autobuilders above, please drop us a mail on the mailing list so that we can check it out. Make sure to read So I've found a bug in SimGrid. How to report it? before you do so.

User code compilation problems

"gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"

This is because you are using the log mechanism, but you didn't create any default category in this file. You should refer to Logging support for all the details, but you simply forgot to call one of XBT_LOG_NEW_DEFAULT_CATEGORY() or XBT_LOG_NEW_DEFAULT_SUBCATEGORY().

"gcc: undefined reference to pthread_key_create"

This indicates that one of the libraries SimGrid depends on (libpthread here) was missing on the linking command line. Dependencies of libsimgrid are expressed directly in the dynamic library, so it is very unlikely that you see this message when doing dynamic linking.

If you compile your code statically (and if you use a pthread version of SimGrid -- see Increasing the amount of simulated processes), you must absolutely specify -lpthread on the linker command line. As usual, this should come after -lsimgrid on this command line.

Runtime error messages

"surf_parse_lex: Assertion `next limit' failed."

This is because your platform file is too big for the parser.

Actually, the message comes directly from FleXML, the technology on top of which the parser is built. FleXML has the bad idea of fetching the whole document in memory before parsing it. And moreover, the memory buffer size must be determined at compilation time.

We use a value which seems big enough for our needs without bloating the simulators' footprints. But of course your mileage may vary. In this case, just edit src/surf/surfxml.l and modify the definition of FLEXML_BUFFERSTACKSIZE. E.g.

#define FLEXML_BUFFERSTACKSIZE 1000000000

Then recompile and everything should be fine, provided that your version of Flex is recent enough (>= 2.5.31). If not, the compilation process should warn you.

A while ago, we worked on FleXML to reduce its memory consumption a bit, but these issues remain. There are two things we should do:

These are changes to FleXML itself, not SimGrid. But since we kinda hijacked the development of FleXML, I can assure you that any patches would be really welcome and quickly integrated.

Update: A new version of FleXML (1.7) was released. Most of the work was done by William Dowling, who uses it in his own work. The good point is that it now uses a dynamic buffer, and that the memory usage was greatly improved. The downside is that William also changed some things internally, and it breaks the hack we devised to bypass the parser, as explained in Bypassing the XML parser with your own C functions. Indeed, this is not a classical usage of the parser, and Will didn't imagine that we may have used (and even documented) such a crude usage of FleXML. So, we now have to repair the bypassing functionality to use the latest FleXML version and fix the memory usage in SimGrid.

GRAS spits networking error messages

GRAS, on real platforms, naturally uses regular sockets to communicate. They are deeply hidden in the GRAS abstraction, but when things go wrong, you may get some weird error messages. Here are some examples, with the probable reason:

I'm told that my XML files are too old.

The format of the XML platform description files is sometimes improved. For example, we decided to change the units used in SimGrid from MBytes, MFlops and seconds to Bytes, Flops and seconds, to make life easier for people exchanging small messages. We also reworked the route descriptions to allow more compact descriptions.

That is why the XML files are versioned using the 'version' attribute of the root tag. Currently, it should read:

  <platform version="2">

If your files are too old, you can use the simgrid_update_xml.pl script which can be found in the tools directory of the archive.

Valgrind-related and other debugger issues

If you don't already, you really should use valgrind to debug your code: it's almost magic.

longjmp madness in valgrind

This happens when valgrind starts complaining about longjmp, like this:

==21434== Conditional jump or move depends on uninitialised value(s)
==21434==    at 0x420DBE5: longjmp (longjmp.c:33)
==21434==
==21434== Use of uninitialised value of size 4
==21434==    at 0x420DC3A: __longjmp (__longjmp.S:48)

This is the sign that you didn't use the exception mechanism properly. Most probably, you have a return; somewhere within a TRY{} block. This is evil, and you must not do this. Did you read the section about Exception support?

Valgrind spits tons of errors about backtraces!

It may happen that valgrind, the memory debugger beloved by any decent C programmer, spits tons of warnings like the following:

==8414== Conditional jump or move depends on uninitialised value(s)
==8414==    at 0x400882D: (within /lib/ld-2.3.6.so)
==8414==    by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x400B105: (within /lib/ld-2.3.6.so)
==8414==    by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x400B105: (within /lib/ld-2.3.6.so)
==8414==    by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
==8414==    by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
==8414==    by 0x8079010: xbt_cfg_register (config.c:208)
==8414==    by 0x806821B: MSG_config (msg_config.c:42)

This problem is somewhere in the libc when using the backtraces, and there are very few things we can do ourselves to fix it. Instead, here is how to tell valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or create this file if needed). Make sure to adapt the obj line to your own setup (change 2.3.6 to the actual version you are using, which you can retrieve with a simple "ls /lib/ld*.so").

{
   name: Backtrace madness
   Memcheck:Cond
   obj:/lib/ld-2.3.6.so
   fun:dl_open_worker
   fun:_dl_open
   fun:do_dlopen
   fun:dlerror_run
   fun:__libc_dlopen_mode
}

Then, you have to tell valgrind to use this suppression file by passing the --suppressions=$HOME/.valgrind.supp option on the command line. You can also add the following to your ~/.bashrc so that it gets passed automatically. Actually, it passes a few more options to valgrind; these happen to be my personal settings. Check the valgrind documentation for more information.

export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp" 

Truncated backtraces

When debugging SimGrid, it's easier to pass the --disable-compiler-optimization flag to configure if valgrind or gdb get fooled by the optimizations done by the compiler. But you should remove this flag once everything works, before going into production (before launching your 1252135 experiments), or everything will run at only half of the true SimGrid potential.

There is a deadlock in my code!!!

Unfortunately, we cannot debug every code written in SimGrid. We furthermore believe that the framework provides enough information for you to debug such problems yourself. If the textual output is not enough, make sure to check the Visualizing and analyzing the results FAQ entry to see how to get a graphical one.

Now, if you come up with a really simple example that deadlocks and you're absolutely convinced that it should not, you can ask on the list. Just be aware that you'll be severely punished if the mistake is on your side... We have plenty of FAQ entries to write and new features to implement for the impenitents! ;)

I get weird timings when I play with the latencies.

OK, first of all, remember that units should be Bytes, Flops and Seconds. If you don't use such units, some SimGrid constants (e.g. the SG_TCP_CTE_GAMMA constant used in most network models) won't have the right unit and you'll end up with weird results.

Here is what happens with a single transfer of size L on a link (bw,lat) when nothing else happens.

0-----lat--------------------------------------------------t
|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|

In more complex situations, this min is the solution of a complex max-min linear system. Have a look here and read the two threads "Bug in SURF?" and "Surf bug not fixed?". You'll find a few other examples of such computations. You can also read "A Network Model for Simulation of Grid Application" by Henri Casanova and Loris Marchal for all the details. The fact that real_bw is smaller than bw is easy to understand. The fact that real_bw is smaller than SG_TCP_CTE_GAMMA/(2*lat) is due to the window-based congestion mechanism of TCP. With TCP, you can't exploit your huge network capacity if you don't have a good round-trip time, because of the acks...

Anyway, what you get is t=lat + L/min(bw,SG_TCP_CTE_GAMMA/(2*lat)).

If you set (bw,lat)=(100 000 000, 0.00001), you get t = 1.00001 (you fully use your link).
If you set (bw,lat)=(100 000 000, 0.0001), you get t = 1.0001 (you're on the limit).
If you set (bw,lat)=(100 000 000, 0.001), you get t = 10.001 (ouch!).

This bound on the effective bandwidth of a flow is not the only thing that may make your results unexpected. For example, two flows competing on a saturated link each receive an amount of bandwidth inversely proportional to their round-trip time.

So I've found a bug in SimGrid. How to report it?

We do our best to hammer away any bugs in SimGrid, but this is still an academic project, so please be patient if/when you find bugs in it. If you do, the best solution is to drop an email on either the simgrid-user or the simgrid-devel mailing list and explain the issue to us. You can also decide to open a formal bug report using the relevant interface. You need to log in on the server to be able to submit bugs.

We will do our best to solve any problem reported, but you need to help us find the issue. Just telling "it segfaults" isn't enough. Telling "It segfaults when running the attached simulator" doesn't really help either. You may find the following article interesting to see how to write informative bug reports: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (it is not SimGrid specific at all, but it's full of good advice).

Author:
Arnaud Legrand (arnaud.legrand::imag.fr)
Martin Quinson (martin.quinson::loria.fr)


The version of SimGrid documented here is v3.6.1.
Documentation of other versions can be found in their respective archive files (directory doc/html).