Download traces
The traces are available for download here: traces-20080623.tar.bz2 (168 MB file; 624 MB uncompressed).
Description
This page provides an initial draft of TCP traffic traces for use in the test suite described in the paper "Towards a common TCP evaluation suite".
The traces are for use with the Tmix [1] traffic generator. The format used is that described in [2], which differs in minor details from that in [1]. (In particular, the global-id and subset-id attributes are present and all times are given in microseconds.)
These files are based on a 60-minute capture of campus traffic at the University of North Carolina, provided by Jay Aikat, which consists of two traces: one contains connections initiated inside the campus network, and the other contains those initiated from external sites.
The traces available here were produced by the following process:
- The two original traces, one with received and one with initiated connections, were merged. This reverses the direction of one of the traces, but since the traces are intended to be used in a symmetric configuration, this is acceptable.
- Connections starting within the last 100 seconds were deleted. This reduces the dip in rate caused by the end of the original capture, since only fully captured connections remain.
- A cyclic permutation (rotation in time) of the trace was chosen such that the average load during the first 100 seconds is approximately the same as the overall average load.
- The trace was sorted by RTT and split into nine sub-traces, each with approximately the same average load.
- The sub-traces were truncated to different lengths, from about 50 to about 58 minutes in one-minute steps. This way, experiments can be made longer than the trace duration without repeating the same traffic: when each sub-trace is replayed in a loop, the different lengths keep the sub-traces out of step with each other.
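Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not the code of tmix_utils.py: the Conn record is a hypothetical simplification of a Tmix connection vector, load is measured as bytes charged to each connection's start time, and the rotation search is a plain brute-force scan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conn:
    """Hypothetical simplified connection record, not the Tmix file format."""
    start: float  # start time in seconds
    rtt: float    # round-trip time in milliseconds
    nbytes: int   # total bytes carried by the connection


def best_cyclic_shift(conns: List[Conn], duration: float,
                      window: float = 100.0, step: float = 1.0) -> float:
    """Brute-force search for a rotation offset s such that, after mapping every
    start time t to (t - s) mod duration, the load in [0, window) is as close as
    possible to the average load. Bytes are charged to the start time only."""
    avg_rate = sum(c.nbytes for c in conns) / duration
    best_shift, best_err = 0.0, float("inf")
    shift = 0.0
    while shift < duration:
        in_window = sum(c.nbytes for c in conns
                        if (c.start - shift) % duration < window)
        err = abs(in_window / window - avg_rate)
        if err < best_err:
            best_shift, best_err = shift, err
        shift += step
    return best_shift


def split_by_rtt(conns: List[Conn], k: int) -> List[List[Conn]]:
    """Sort by RTT and cut into k contiguous RTT ranges carrying roughly equal
    total bytes (assumes the trace carries at least one byte)."""
    ordered = sorted(conns, key=lambda c: c.rtt)
    total = sum(c.nbytes for c in ordered)
    groups: List[List[Conn]] = [[] for _ in range(k)]
    carried = 0  # bytes in all connections already assigned to a group
    for c in ordered:
        groups[min(carried * k // total, k - 1)].append(c)
        carried += c.nbytes
    return groups
```

Because the groups are contiguous in RTT order, adjacent sub-traces share a boundary RTT, much like the Min/Max RTT columns in the statistics table. For millions of connections the brute-force scan is slow; binning bytes per second and using a running sum would be the natural optimization.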
tmix_utils script
Here is the Python script that was used to generate these traces:
tmix_utils.py
It can perform various manipulations of Tmix connection vector traces:
- merging multiple traces into one
- scaling inter-connection times to match a desired load
- random thinning to match a desired load
- truncation
- sub-sampling by RTT
- shifting to match initial rate to average rate
- block resampling and Poisson resampling (as defined by [3])
It can also display basic information about a trace.
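The two load-matching operations in the list work differently: scaling stretches or compresses every start time by the same factor, while thinning drops connections at random. A minimal sketch, assuming connections are reduced to hypothetical (start time, bytes) pairs and load is total bits divided by trace duration; it is not taken from tmix_utils.py.

```python
import random
from typing import List, Optional, Tuple

Conn = Tuple[float, int]  # hypothetical: (start time in seconds, bytes)


def offered_rate(conns: List[Conn], duration: float) -> float:
    """Average offered load in bit/s: total bits divided by trace duration."""
    return 8 * sum(nbytes for _, nbytes in conns) / duration


def scale_times(conns: List[Conn], duration: float,
                target_bps: float) -> Tuple[List[Conn], float]:
    """Multiply every start time (and so every inter-connection time) by one
    factor; the same bytes over a longer or shorter trace give target_bps."""
    factor = offered_rate(conns, duration) / target_bps
    return [(t * factor, n) for t, n in conns], duration * factor


def thin(conns: List[Conn], duration: float, target_bps: float,
         seed: Optional[int] = None) -> List[Conn]:
    """Keep each connection independently with probability target/current, so
    the expected load is target_bps. Thinning can only lower the load."""
    p = target_bps / offered_rate(conns, duration)
    if p > 1:
        raise ValueError("thinning cannot increase the load")
    rng = random.Random(seed)
    return [c for c in conns if rng.random() < p]
```

Scaling keeps every connection but changes the arrival process and the trace length; thinning keeps the original timing and duration but removes connections, so the load of any single run only approximates the target.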
The block and Poisson resampling algorithms may well be buggy, as they have not been tested extensively. In particular, they create connections with duplicate global-id and subset-id values; since I don't know how Tmix works internally, I don't know whether this is a serious bug.
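The duplicate identifiers follow naturally from resampling with replacement. A minimal sketch of Poisson resampling in its general form (start times from a Poisson process, connection vectors drawn with replacement from the original trace) makes this visible. It is not the tmix_utils.py implementation and may differ in detail from the definition in [3]; the tuple layout is a hypothetical stand-in for a Tmix connection vector.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Tmix connection vector:
# (global_id, subset_id, rest of the vector), copied unchanged when resampled.
Vector = Tuple[int, int, str]


def poisson_resample(vectors: List[Vector], rate: float, duration: float,
                     seed=None) -> List[Tuple[float, Vector]]:
    """Start times from a Poisson process with `rate` connections per second;
    each start time gets a vector drawn uniformly, with replacement."""
    rng = random.Random(seed)
    out: List[Tuple[float, Vector]] = []
    t = rng.expovariate(rate)  # exponential inter-arrival times
    while t < duration:
        out.append((t, rng.choice(vectors)))
        t += rng.expovariate(rate)
    return out


def duplicate_ids(resampled: List[Tuple[float, Vector]]) -> Dict[Tuple[int, int], int]:
    """(global_id, subset_id) pairs that appear more than once, with counts."""
    counts = Counter((v[0], v[1]) for _, v in resampled)
    return {pair: n for pair, n in counts.items() if n > 1}
```

Whenever the resampled trace has more arrivals than there are distinct vectors, some (global-id, subset-id) pairs must repeat, which matches the behavior described above.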
Run python tmix_utils.py --help for a description of the script parameters, or see the source code for more detailed information.
If the Psyco specializing compiler is available, tmix_utils.py uses it to speed up computation significantly. The script also works without Psyco, but much more slowly.
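Optional accelerators like Psyco are usually enabled with a guarded import, so the script runs either way. The snippet shows the common pattern (psyco.full() is Psyco's compile-everything mode); whether tmix_utils.py uses exactly this form is an assumption, and the HAVE_PSYCO flag is a hypothetical name.

```python
# Use Psyco if it is installed; otherwise fall back to plain Python.
try:
    import psyco  # Psyco only supports old 32-bit x86 CPython 2.x releases
    psyco.full()  # ask Psyco to compile all functions
    HAVE_PSYCO = True
except ImportError:
    HAVE_PSYCO = False  # slower, but functionally identical
```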
There is also a small script, plotrate.py, which generates a listing of the offered rate vs. time in a format suitable for gnuplot. Its rate estimates are a bit more accurate than those of tmix_utils.py, because it takes into account the delay times in the connection vectors, which tmix_utils.py mostly ignores. (It still does not account for the time needed to transfer data; in effect, it assumes an infinite transfer rate.)
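The rate calculation described above can be sketched as binning bytes by the time at which they would be sent if transfers were instantaneous. This is not plotrate.py itself: the (start time, list of (delay, bytes)) representation is a hypothetical simplification of a Tmix connection vector, and delays are simply accumulated within each connection.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical simplified connection: start time in seconds, followed by a list
# of (delay in seconds before this data unit, bytes in this data unit).
Connection = Tuple[float, List[Tuple[float, int]]]


def offered_rate_series(conns: List[Connection],
                        bin_s: float = 1.0) -> Dict[int, float]:
    """Offered rate in Mbit/s per time bin. Each data unit is assumed to be
    sent instantly once its delays have elapsed (infinite transfer rate)."""
    bins: Dict[int, int] = defaultdict(int)
    for start, units in conns:
        t = start
        for delay, nbytes in units:
            t += delay  # delays accumulate within a connection
            bins[int(t // bin_s)] += nbytes
    return {b: 8 * n / bin_s / 1e6 for b, n in bins.items()}


def gnuplot_listing(series: Dict[int, float], bin_s: float = 1.0) -> str:
    """Two whitespace-separated columns (time in s, rate in Mbit/s), with empty
    bins written as 0 so the plotted line does not skip gaps."""
    if not series:
        return ""
    rows = [f"{b * bin_s:g} {series.get(b, 0.0):g}"
            for b in range(min(series), max(series) + 1)]
    return "\n".join(rows)
```

If the listing is saved to a file such as rate.dat (a hypothetical name), gnuplot can plot it with `plot 'rate.dat' with lines`.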
Trace statistics
This table shows the RTT range and average load of the nine traces:
| Trace | Min RTT (ms) | Max RTT (ms) | Rate (Mbps) | Rate for first 100 s (Mbps) | Reverse rate (Mbps) | Duration (s) | Number of connections |
|---|---|---|---|---|---|---|---|
| unc_20080110_1400_1hr-r4s1 | 0.182 | 10.166 | 41.6142 | 26.1552 | 3.18458 | 3499.84 | 455805 |
| unc_20080110_1400_1hr-r4s2 | 10.166 | 13.651 | 42.3947 | 27.0697 | 2.27467 | 3439.83 | 364212 |
| unc_20080110_1400_1hr-r4s3 | 13.651 | 20.271 | 41.4223 | 25.6275 | 3.1699 | 3379.83 | 480683 |
| unc_20080110_1400_1hr-r5s1 | 20.271 | 23.367 | 41.8169 | 102.182 | 3.12442 | 3319.82 | 323561 |
| unc_20080110_1400_1hr-r5s2 | 23.367 | 36.764 | 37.4286 | 31.2716 | 6.31383 | 3259.82 | 544653 |
| unc_20080110_1400_1hr-r5s3 | 36.764 | 52.194 | 38.6125 | 27.2360 | 6.2782 | 3199.84 | 456672 |
| unc_20080110_1400_1hr-r6s1 | 52.194 | 81.953 | 39.8383 | 42.5377 | 5.27905 | 3139.82 | 455387 |
| unc_20080110_1400_1hr-r6s2 | 81.954 | 100.355 | 39.7479 | 45.7803 | 4.84254 | 3079.84 | 445810 |
| unc_20080110_1400_1hr-r6s3 | 100.356 | 29787.7 | 33.6108 | 28.0837 | 10.5373 | 3019.84 | 577518 |
| original trace | 0.182 | 29787.7 | 357.085 | 355.942 | 46.0998 | 3499.84 | 4443454 |
More detailed statistics are included in the trace_info file in the tarball.
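As a quick consistency check on the trace statistics table, the nine sub-trace rates should add up to roughly the rate of the original trace, since the split partitions its connections. The snippet below only does that arithmetic on the published numbers; the variable names are illustrative.

```python
# Values copied from the trace statistics table.
SUBTRACE_RATES_MBPS = [41.6142, 42.3947, 41.4223, 41.8169, 37.4286,
                       38.6125, 39.8383, 39.7479, 33.6108]
SUBTRACE_CONNECTIONS = [455805, 364212, 480683, 323561, 544653,
                        456672, 455387, 445810, 577518]
ORIGINAL_RATE_MBPS = 357.085
ORIGINAL_CONNECTIONS = 4443454

total_rate = sum(SUBTRACE_RATES_MBPS)
total_conns = sum(SUBTRACE_CONNECTIONS)
print(f"sum of sub-trace rates: {total_rate:.2f} Mbps "
      f"({100 * total_rate / ORIGINAL_RATE_MBPS:.1f}% of the original)")
print(f"sum of sub-trace connections: {total_conns} "
      f"({100 * total_conns / ORIGINAL_CONNECTIONS:.1f}% of the original)")
```

The rates add up to about 356.49 Mbps, within 0.2% of the original 357.085 Mbps. The connection counts add up to 4,104,301, about 92% of the original 4,443,454, which is consistent with the truncation step discarding connections that start after each sub-trace's cut-off.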
References
[1] M. C. Weigle, P. Adurthi, F. Hernandez-Campos, K. Jeffay, and F. D. Smith, "Tmix: a tool for generating realistic TCP application workloads in ns-2," ACM SIGCOMM Computer Communication Review, vol. 36, no. 3, pp. 65-76, July 2006.
[2] P. Adurthi, "Generating Tmix-based TCP application workloads in ns-2 and GTNetS," M.S. thesis, 2006. [Online]. Available: http://www.cs.odu.edu/~mweigle/papers/adurthi-thesis06.pdf
[3] F. Hernandez-Campos, "Generation and validation of empirically-derived TCP application workloads," doctoral dissertation, 2006. [Online]. Available: http://www.cs.unc.edu/~fhernand/diss-html/
Last updated 2008-04-29 15:59:32 PDT (-0700)