Traffic Traces for TCP evaluation

 

Download traces

The traces are available for download here: traces-20080623.tar.bz2 (168 MB file; 624 MB uncompressed).

Description

This page provides an initial draft of TCP traffic traces for use in the test suite described in the paper "Towards a common TCP evaluation suite".

The traces are for use with the Tmix [1] traffic generator. The format used is that described in [2], which differs in minor details from that in [1]. (In particular, the global-id and subset-id attributes are present and all times are given in microseconds.)

Here are the traces (168 MB file; 625 MB uncompressed).

These files are based on a 60-minute trace of campus traffic at the University of North Carolina, provided by Jay Aikat. One trace contains connections initiated inside the campus network; the other contains those initiated from external sites.

The traces available here were produced by the following process:

  1. The two original traces, one with received connections and one with initiated connections, were merged. This reverses the direction of one of the traces, but since the traces are intended to be used in a symmetric configuration, this is acceptable.
  2. Connections starting within the last 100 seconds were deleted. This reduces the dip in rate near the end of the original trace, a dip that occurs because only fully captured connections are present.
  3. A cyclic permutation of the original trace was chosen such that the average load over the first 100 seconds is approximately the same as the overall average load.
  4. The trace was sorted by RTT and split into nine sub-traces, such that each sub-trace has approximately the same average load.
  5. The sub-traces were truncated to different lengths, from 52 minutes to 60 minutes. This way, experiments can run longer than the trace duration without repeating the same traffic.
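Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in tmix_utils.py: each connection is reduced to a hypothetical (start_s, rtt_ms, size_bytes) tuple, offered bytes stand in for load, and the function names, the 10-second search step, and the greedy split rule are all choices made for this example.

```python
from bisect import bisect_left

# Hypothetical representation: each connection is (start_s, rtt_ms, size_bytes).
# Real Tmix connection vectors carry far more detail than this.


def _bytes_between(starts, prefix, lo, hi):
    """Total bytes of connections whose start time t satisfies lo <= t < hi."""
    return prefix[bisect_left(starts, hi)] - prefix[bisect_left(starts, lo)]


def choose_cyclic_offset(conns, duration, window=100.0, step=10.0):
    """Pick the offset whose following `window` seconds best match the mean load.

    Assumes every start time lies in [0, duration). The window wraps around
    the end of the trace, as it would after a cyclic permutation.
    """
    ordered = sorted(conns)
    starts = [c[0] for c in ordered]
    prefix = [0]  # prefix[i] = bytes of the first i connections
    for c in ordered:
        prefix.append(prefix[-1] + c[2])
    average = prefix[-1] / duration
    best_offset, best_error = 0.0, float("inf")
    for i in range(int(duration // step)):
        offset = i * step
        end = offset + window
        if end <= duration:
            total = _bytes_between(starts, prefix, offset, end)
        else:
            total = (_bytes_between(starts, prefix, offset, duration)
                     + _bytes_between(starts, prefix, 0.0, end - duration))
        error = abs(total / window - average)
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset


def cyclic_shift(conns, duration, offset):
    """Rotate start times so that `offset` becomes time zero."""
    return sorted(((t - offset) % duration, rtt, size) for t, rtt, size in conns)


def split_by_rtt(conns, k):
    """Sort by RTT and cut into k groups carrying roughly equal bytes."""
    ordered = sorted(conns, key=lambda c: c[1])
    total = sum(c[2] for c in ordered)
    groups = [[] for _ in range(k)]
    carried = 0  # bytes assigned so far
    index = 0    # group currently being filled
    for c in ordered:
        # Move to the next group once the current one has reached its share.
        while index < k - 1 and carried >= total * (index + 1) / k:
            index += 1
        groups[index].append(c)
        carried += c[2]
    return groups
```

Because the split is made on sorted RTT values, each group covers a contiguous RTT range, which matches the adjoining Min RTT and Max RTT ranges reported for the nine traces.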

tmix_utils script

Here is a Python script that was used to generate these traces:

tmix_utils.py

It can perform various manipulations of Tmix connection vector traces:

  • merging multiple traces into one
  • scaling inter-connection times to match a desired load
  • random thinning to match a desired load
  • truncation
  • sub-sampling by RTT
  • shifting to match initial rate to average rate
  • block resampling and Poisson resampling (as defined by [3])

It can also display basic information about a trace.
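Two of the load-matching operations listed above, random thinning and scaling of inter-connection times, are simple enough to sketch. The code below is an illustration only and does not reproduce tmix_utils.py: the (start_s, rtt_ms, size_bytes) tuple, the function names, and the parameters are assumptions made for this example.

```python
import random

# Hypothetical representation: each connection is (start_s, rtt_ms, size_bytes).


def thin(conns, keep_fraction, seed=0):
    """Keep each connection independently with probability keep_fraction.

    On average the load drops to keep_fraction times the original load.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return [c for c in conns if rng.random() < keep_fraction]


def scale_times(conns, current_load, target_load):
    """Stretch or compress start times so the load moves toward target_load.

    Multiplying every start time by current_load / target_load spreads the
    same bytes over a proportionally longer (or shorter) period.
    """
    factor = current_load / target_load
    return [(t * factor, rtt, size) for t, rtt, size in conns]
```

Thinning keeps the original timing but drops connections; scaling keeps every connection but changes the spacing between them. Which one is preferable depends on whether the arrival pattern or the connection population matters more for the experiment.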

The block and Poisson resampling algorithms are likely buggy, as they have not been tested extensively. In particular, they will create connections with the same global-id and subset-id; since I don't know how Tmix works, I don't know whether this is a serious bug or not.
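If duplicate identifiers turn out to matter, they are easy to detect after resampling. The sketch below assumes the (global-id, subset-id) pairs have already been extracted from the trace into a list; the function name and that input form are hypothetical, and parsing the trace file itself is not shown.

```python
from collections import Counter


def duplicate_ids(conn_ids):
    """Return {(global_id, subset_id): count} for pairs that occur more than once."""
    counts = Counter(conn_ids)
    return {pair: n for pair, n in counts.items() if n > 1}
```

An empty result means every connection in the resampled trace has a unique pair.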

Run python tmix_utils.py --help for a description of the script parameters, or see the source code for more detailed information.

tmix_utils.py uses the Psyco specializing compiler, if it is available, to significantly speed up computation. It will work without Psyco, but much more slowly.

There is also a small script, plotrate.py, which generates a listing of the offered rate vs. time. The output of this script is suitable for use with gnuplot. The plotrate.py script is a bit more accurate than tmix_utils.py, as it takes into account the delay times in the connection vectors, whereas these are mostly ignored by tmix_utils.py. (But it still doesn't take into account the time needed to transfer data; in effect, it assumes an infinite transfer rate.)
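The infinite-transfer-rate assumption can be made concrete with a small sketch: every byte of a transfer is counted at the instant the transfer starts. This is not the code in plotrate.py. The (time_s, size_bytes) input, the function names, the one-second bin, and the convention of 10^6 bits per megabit are assumptions made for this example.

```python
def offered_rate(transfers, bin_s=1.0):
    """Return [(bin_start_s, rate_mbps), ...] for (time_s, size_bytes) transfers.

    Each transfer is charged entirely to the bin containing its start time,
    which is what assuming an infinite transfer rate amounts to.
    """
    bins = {}
    for t, size in transfers:
        index = int(t // bin_s)
        bins[index] = bins.get(index, 0) + size
    n_bins = max(bins) + 1 if bins else 0
    return [(i * bin_s, bins.get(i, 0) * 8 / bin_s / 1e6) for i in range(n_bins)]


def format_for_gnuplot(rows):
    """Render rows as whitespace-separated 'time rate' lines, one per bin."""
    return "\n".join(f"{t:g} {rate:g}" for t, rate in rows)
```

Writing the formatted text to a file gives a two-column listing that gnuplot can plot with time in the first column and rate in the second.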

Trace statistics

This table shows the RTT range and average load of the nine traces:

Trace                       Min RTT (ms)  Max RTT (ms)  Rate (Mbps)  Rate, first 100 s (Mbps)  Reverse rate (Mbps)  Duration (s)  Connections
unc_20080110_1400_1hr-r4s1  0.182         10.166        41.6142      26.1552                   3.18458              3499.84       455805
unc_20080110_1400_1hr-r4s2  10.166        13.651        42.3947      27.0697                   2.27467              3439.83       364212
unc_20080110_1400_1hr-r4s3  13.651        20.271        41.4223      25.6275                   3.1699               3379.83       480683
unc_20080110_1400_1hr-r5s1  20.271        23.367        41.8169      102.182                   3.12442              3319.82       323561
unc_20080110_1400_1hr-r5s2  23.367        36.764        37.4286      31.2716                   6.31383              3259.82       544653
unc_20080110_1400_1hr-r5s3  36.764        52.194        38.6125      27.2360                   6.2782               3199.84       456672
unc_20080110_1400_1hr-r6s1  52.194        81.953        39.8383      42.5377                   5.27905              3139.82       455387
unc_20080110_1400_1hr-r6s2  81.954        100.355       39.7479      45.7803                   4.84254              3079.84       445810
unc_20080110_1400_1hr-r6s3  100.356       29787.7       33.6108      28.0837                   10.5373              3019.84       577518
original trace              0.182         29787.7       357.085      355.942                   46.0998              3499.84       4443454

More detailed statistics are included in the trace_info file in the tarball.
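As a quick check of how closely the nine sub-traces match in average load, the Rate (Mbps) values from the table can be compared with their mean. The short script below only restates numbers from the table; the dictionary and function names are made up for this example.

```python
# Rate (Mbps) column of the table above, keyed by the trace name suffix.
RATES_MBPS = {
    "r4s1": 41.6142, "r4s2": 42.3947, "r4s3": 41.4223,
    "r5s1": 41.8169, "r5s2": 37.4286, "r5s3": 38.6125,
    "r6s1": 39.8383, "r6s2": 39.7479, "r6s3": 33.6108,
}


def load_spread(rates):
    """Return (mean, name of the trace furthest from the mean, its relative deviation)."""
    mean = sum(rates.values()) / len(rates)
    name = max(rates, key=lambda k: abs(rates[k] - mean))
    return mean, name, (rates[name] - mean) / mean
```

Calling load_spread(RATES_MBPS) gives a mean of roughly 39.6 Mbps, with r6s3 furthest from it at about 15% below the mean.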

References

  1. M. C. Weigle, P. Adurthi, F. Hernandez-Campos, K. Jeffay, F. D. Smith, Tmix: a tool for generating realistic TCP application workloads in ns-2, ACM SIGCOMM Computer Communication Review, vol. 36, no. 3, pp. 65-76, July 2006.
  2. P. Adurthi, Generating Tmix-based TCP application workloads in ns-2 and GTNetS (M.S. thesis), 2006. [Online]. Available: http://www.cs.odu.edu/~mweigle/papers/adurthi-thesis06.pdf.
  3. F. Hernandez-Campos, Generation and validation of empirically-derived TCP application workloads (doctoral dissertation), 2006. [Online]. Available: http://www.cs.unc.edu/~fhernand/diss-html/.

Last updated 2008-04-29 15:59:32 PDT (-0700)

©2008 California Institute of Technology - Networking Lab.
This material is based upon work supported by the National Science Foundation under Grant No. EIA-0303620.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.