Swift - STARDUST


Swift

STARDUST stores raw traffic traces and flow-level traces in cloud-based OpenStack Swift object storage. This tutorial shows how to access and process objects in Swift.

Intro

“OpenStack Object Storage (swift) is used for redundant, scalable data storage using clusters of standardized servers to store petabytes of accessible data. It is a long-term storage system for large amounts of static data which can be retrieved and updated. Object Storage uses a distributed architecture with no central point of control, providing greater scalability, redundancy, and permanence. Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster. Storage clusters scale horizontally by adding new nodes. Should a node fail, OpenStack works to replicate its content from other active nodes.” (from: https://docs.openstack.org/swift/latest/admin/objectstorage-intro.html)

Basic concepts

For the basic concepts of accounts (also called projects), containers, and objects, see the overview in the OpenStack documentation.

Authenticating with Swift

Log into your VM.

Find your credential file (its name and contents differ from VM to VM).

user@vm001:~$ ls -a
user@vm001:~$ cat .limbo_cred

export OS_PROJECT_NAME=telescope
export OS_USERNAME=userxxx
export OS_PASSWORD=xxxxx
export OS_AUTH_URL=https://hermes-auth.caida.org
export OS_IDENTITY_API_VERSION=3
  • This file sets the environment variables that the Swift client uses to authenticate with the Swift cluster.

Source the file to load the variables into your shell environment (note the leading dot followed by a space):

user@vm001:~$ . .limbo_cred

To check that the variables are set:

user@vm001:~$ echo $OS_PROJECT_NAME
telescope
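If you script against Swift, it can help to fail early when the credential file has not been sourced. A minimal Python sketch, assuming the five variables in the example credential file above are the ones your setup needs (missing_auth_vars is a hypothetical name, not part of any Swift tool):

```python
import os
from typing import Mapping

# Assumption: these are the variables set by the example credential file above.
REQUIRED_VARS = (
    "OS_PROJECT_NAME",
    "OS_USERNAME",
    "OS_PASSWORD",
    "OS_AUTH_URL",
    "OS_IDENTITY_API_VERSION",
)


def missing_auth_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required OS_* variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A script could call missing_auth_vars() at start-up and stop with a message naming any variables that are missing, which usually means the credential file was not sourced in that shell.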

When you run a swift command such as:

user@vm001:~$ swift auth

the client reads these environment variables to authenticate with Swift.

Accessing objects in containers

Running swift with no arguments prints a list of the available commands.

user@vm001:~$ swift

swift list with no arguments lists all the containers in your account:

user@vm001:~$ swift list

One of the containers, telescope-ucsdnt-pcap-live, contains the raw pcap files for the last 5 weeks.

To list the contents of the container:

user@vm001:~$ swift list <container name>
  • Note: Because a Swift container can hold a very large number of objects, listing every object in a container is not recommended.

Example

user@vm001:~$ swift list telescope-ucsdnt-pcap-live | head
datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=08/ucsd-nt.1598947200.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=09/ucsd-nt.1598950800.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=10/ucsd-nt.1598954400.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=11/ucsd-nt.1598958000.pcap.gz
...
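Each object name follows a key=value layout (datasource, year, month, day, hour) and ends with a file name that embeds a Unix timestamp. A minimal Python sketch that splits such a name into its fields, assuming every name matches the pattern seen in the listing above (NAME_RE and parse_object_name are illustrative names, not part of any STARDUST tool):

```python
import re
from datetime import datetime, timezone

# Assumption: pattern inferred from the example listing; other containers may differ.
NAME_RE = re.compile(
    r"datasource=(?P<datasource>[^/]+)/"
    r"year=(?P<year>\d{4})/month=(?P<month>\d{2})/"
    r"day=(?P<day>\d{2})/hour=(?P<hour>\d{2})/"
    r"(?P<filename>[^/]+\.(?P<ts>\d+)\.pcap\.gz)"
)


def parse_object_name(name: str) -> dict:
    """Split an object name into its path fields plus the start time from the file name."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected object name: {name}")
    fields = m.groupdict()
    # The number before .pcap.gz is a Unix timestamp; turn it into an aware UTC datetime.
    fields["start"] = datetime.fromtimestamp(int(fields.pop("ts")), tz=timezone.utc)
    return fields
```

For the first name in the listing, the timestamp 1598943600 decodes to 2020-09-01 07:00 UTC, which lines up with the hour=07 component of the path.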

Swift has no hierarchical namespace like a UNIX file system, but you can make it present a pseudo-hierarchy by setting a delimiter:

user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d /
datasource=ucsd-nt/
  • -d / sets the delimiter to a slash (/). Each object name is shown only up to its first slash, and identical results are merged into one entry.
  • The listing therefore looks like a directory tree in which each slash (/) separates one level of the hierarchy.

Navigate down the hierarchy by including a prefix:

user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/
datasource=ucsd-nt/year=2020/
  • -p restricts the listing to names that start with _datasource=ucsd-nt/_, and the delimiter groups them at the next /
  • the next level down is _year=2020/_
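The delimiter and prefix behaviour described above can be emulated on a plain list of names, which may make it easier to predict what a given swift list command will return. This is a Python sketch of the listing semantics as described here, not Swift's server-side implementation; list_with_delimiter is a hypothetical name:

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    """Emulate a delimited listing: return the sorted, de-duplicated entries
    that a listing with this prefix and delimiter would show."""
    entries = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # like -p: ignore names outside the prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            entries.add(name)  # no further delimiter: a plain object
        else:
            # like -d: keep everything up to and including the next delimiter
            entries.add(prefix + rest[: cut + len(delimiter)])
    return sorted(entries)
```

With the names from the listing above, an empty prefix yields only datasource=ucsd-nt/, and prefix="datasource=ucsd-nt/" yields datasource=ucsd-nt/year=2020/, mirroring the two commands shown.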

Continue down the hierarchy to list the months:

user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/
datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=10/

Now list the days:

user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=09/day=01/
datasource=ucsd-nt/year=2020/month=09/day=02/
datasource=ucsd-nt/year=2020/month=09/day=03/
datasource=ucsd-nt/year=2020/month=09/day=04/
datasource=ucsd-nt/year=2020/month=09/day=05/
...

To find the file for 09:00 UTC on 27 September 2020:

user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/
datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz

This pseudo-hierarchy is useful because the container holds so many objects: you can go straight to the hour you need instead of listing everything.
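Because each hour has its own prefix, you can compute the prefix for a given time instead of walking the hierarchy level by level. A Python sketch, assuming the layout shown above and that the hour in the path is UTC (the example's timestamp 1601197200 is 2020-09-27 09:00 UTC, matching hour=09); hour_prefix and expected_object_name are hypothetical helpers:

```python
from datetime import datetime, timezone


def hour_prefix(when: datetime, datasource: str = "ucsd-nt") -> str:
    """Return the listing prefix for the UTC hour containing `when`."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    t = when.astimezone(timezone.utc)
    return (
        f"datasource={datasource}/year={t.year:04d}/month={t.month:02d}/"
        f"day={t.day:02d}/hour={t.hour:02d}/"
    )


def expected_object_name(when: datetime, datasource: str = "ucsd-nt") -> str:
    """Guess the full object name, assuming the file name embeds the hour's start time."""
    prefix = hour_prefix(when, datasource)  # also rejects naive datetimes
    start = when.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return f"{prefix}{datasource}.{int(start.timestamp())}.pcap.gz"
```

The prefix can be passed to swift list -d / -p and the guessed name to swift stat; if the guessed name does not exist, listing the hour prefix is the safer fallback.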

To check how many objects are in the container:

user@vm001:~$ swift list <container name> | wc -l

Example

user@vm001:~$ swift list telescope-ucsdnt-pcap-live | wc -l
838
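A single total from wc -l can hide gaps. If the container keeps one object per hour for five weeks, you would expect about 5 × 7 × 24 = 840 objects, close to the 838 above; per-day counts show which days are incomplete. A Python sketch that counts names per day, assuming the year=/month=/day= layout shown above and hourly files (the expected count of 24 and all function names are assumptions):

```python
import re
import sys
from collections import Counter
from typing import TextIO

DAY_RE = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})/")


def objects_per_day(names) -> Counter:
    """Count object names per YYYY-MM-DD, skipping names without a day component."""
    counts = Counter()
    for name in names:
        m = DAY_RE.search(name)
        if m:
            counts["-".join(m.groups())] += 1
    return counts


def format_report(counts: Counter, expected: int = 24) -> list[str]:
    """One line per day; `expected` assumes one object per hour."""
    rows = []
    for day, n in sorted(counts.items()):
        note = "" if n == expected else f"  (expected {expected})"
        rows.append(f"{day} {n}{note}")
    return rows


def main(stream: TextIO = sys.stdin) -> None:
    """Print the per-day report for object names read one per line from stream."""
    for row in format_report(objects_per_day(raw.strip() for raw in stream)):
        print(row)
```

Saved as a module (count_per_day.py is a hypothetical name), it could be fed directly from the listing: swift list telescope-ucsdnt-pcap-live | python3 -c 'import count_per_day; count_per_day.main()'. Days at the start and end of the retention window are likely to be partial.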

Once you have found the object you want to process, use the stat command to get information about it:

user@vm001:~$ swift stat <container name> <object name>

Example

user@vm001:~$ swift stat telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
               Account: 40914e7833c441859d1f0d08866bb74c
             Container: telescope-ucsdnt-pcap-live
                Object: datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
          Content Type: application/vnd.tcpdump.pcap
        Content Length: 114818289707
         Last Modified: Sun, 27 Sep 2020 11:14:51 GMT
                  ETag: "98212d49cf5e8b02f694b5d551c8ac29"
              Manifest: .telescope-ucsdnt-pcap-live-segments/datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz/1601205047.078826/114818289707/1073741824/
            Meta Mtime: 1601205047.078826
         Accept-Ranges: bytes
                Server: nginx
            Connection: keep-alive
           X-Timestamp: 1601205290.08723
            X-Trans-Id: txaf4ce74e394048e2be429-005f7c15af
X-Openstack-Request-Id: txaf4ce74e394048e2be429-005f7c15af
  • This shows the object's metadata, such as its size, content type, and last-modified time.
  • Content Length is the size of the object in bytes.
  • Use swift stat --lh to also show the size in a human-readable format. Example:
    user@vm001:~$ swift stat --lh telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
                    Account: 40914e7833c441859d1f0d08866bb74c
                        ...
              Content Type: application/vnd.tcpdump.pcap
                        ...
    
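The Manifest field suggests that this object is stored as a segmented (dynamic) large object whose pieces live under the listed prefix in the .telescope-ucsdnt-pcap-live-segments container. In the example, the last two components of that prefix, 114818289707 and 1073741824, appear to be the total size (it matches Content Length) and the segment size (1 GiB). Assuming that reading holds, a short Python sketch recovers the number of segments; segment_info is a hypothetical helper and the path layout is inferred from this one example:

```python
def segment_info(manifest: str) -> tuple[int, int, int]:
    """Return (total_size, segment_size, segment_count) from a manifest prefix.

    Assumption: the prefix ends in .../<total size>/<segment size>/, as in the
    example stat output; this is inferred, not taken from documentation.
    """
    parts = manifest.rstrip("/").split("/")
    total_size, segment_size = int(parts[-2]), int(parts[-1])
    count = -(-total_size // segment_size)  # ceiling division without floats
    return total_size, segment_size, count
```

For the example manifest this gives 107 segments of up to 1 GiB each.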

The object is stored on the Swift cluster, not on your VM, so to process it you need to get the data onto your VM.

A file this large (114818289707 bytes, roughly 107 GiB) will not fit in the free disk space on your VM, so you cannot download it in full. You can check your VM's disk size and free space with:

user@vm001:~$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        97G  3.0G   94G   4% /
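The same comparison can be made from a script before attempting a download. A Python sketch using shutil.disk_usage from the standard library, with an arbitrary 10% safety margin (the margin and the function names are assumptions). Note that df -h reports sizes in powers of 1024, so 94G means 94 × 2^30 bytes, which is less than the example object's Content Length of 114818289707 bytes:

```python
import shutil


def fits(object_bytes: int, free_bytes: int, headroom: float = 0.1) -> bool:
    """True if the object plus a safety margin fits in free_bytes."""
    return object_bytes * (1 + headroom) <= free_bytes


def fits_on_disk(object_bytes: int, path: str = ".", headroom: float = 0.1) -> bool:
    """Same check against the free space the OS reports for the filesystem at path."""
    return fits(object_bytes, shutil.disk_usage(path).free, headroom)
```

For the example, fits(114818289707, 94 * 2**30) is False, which is why the next section processes the data as a stream instead.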

Processing files stored on Swift storage

Because the VM does not have enough free space to hold a whole file, process the data as it streams from the cluster rather than downloading the entire file first and processing it afterwards.
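As one illustration of processing while streaming (not one of the STARDUST tools), the swift client can write an object to standard output with swift download <container name> <object name> -o -, and the bytes can be decompressed and parsed on the fly so nothing is written to disk. The Python sketch below counts packets and captured bytes in a gzip-compressed pcap stream. It assumes the classic libpcap file format (not pcapng), and count_packets is a hypothetical name:

```python
import gzip
import struct
from typing import BinaryIO

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16
# Classic pcap magic numbers (microsecond and nanosecond timestamp variants).
MAGICS = {0xA1B2C3D4, 0xA1B23C4D}


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read n bytes, returning fewer only if the stream ends first."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def count_packets(compressed: BinaryIO) -> tuple[int, int]:
    """Stream a gzip-compressed classic pcap; return (packets, captured bytes)."""
    with gzip.open(compressed, "rb") as pcap:  # decompresses incrementally
        header = _read_exact(pcap, PCAP_GLOBAL_HEADER_LEN)
        if len(header) < PCAP_GLOBAL_HEADER_LEN:
            raise ValueError("truncated pcap global header")
        # The magic number tells us the byte order of all following headers.
        if struct.unpack("<I", header[:4])[0] in MAGICS:
            endian = "<"
        elif struct.unpack(">I", header[:4])[0] in MAGICS:
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        packets = captured = 0
        while True:
            rec = _read_exact(pcap, PCAP_RECORD_HEADER_LEN)
            if len(rec) < PCAP_RECORD_HEADER_LEN:
                break  # end of stream
            # Record header fields: ts_sec, ts_frac, incl_len, orig_len
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            if len(_read_exact(pcap, incl_len)) < incl_len:
                break  # truncated final packet
            packets += 1
            captured += incl_len
        return packets, captured
```

Saved as a module (pcapcount.py is a hypothetical name), it could be driven as: swift download telescope-ucsdnt-pcap-live <object name> -o - | python3 -c 'import sys, pcapcount; print(pcapcount.count_packets(sys.stdin.buffer))'. Only one packet is held in memory at a time, so disk and memory use stay small regardless of the object's size.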

These are the tools to do so:
