November 19, 2008, Wednesday, 323

Starfish Bugs and Design Issues Wiki

From DBWiki


Design Issues

Bulk Data Transport

Problem:

NFS and Samba both write faster than Starfish in a single-client, single-storage-target setup. NFS is heavily UDP-based, but might be using some other tricks to write the data. Samba is slower than NFS but still writes data quickly. Much of the gap comes from the other solutions being implemented as kernel extensions or highly optimized C/C++ programs.

Solution:

  • The bulk data transport mechanism isn't very efficient at the moment. We should move to a UDP-based mechanism that checkpoints writes.

127.x.x.x Issue When Unicasting

Problem:

Several people have hit an issue on Ubuntu where their machine name resolves to 127.0.1.1. Starfish uses the first device that resolves against the machine name as the primary unicast interface. As a result, when another machine tries to connect to a remote machine's Starfish unicast port, it attempts to connect to 127.0.1.1, which is itself instead of the other machine.

Solution:

  • By default, Starfish shouldn't allow an address in the 127.x.x.x loopback range to be used as the unicast address since, in most cases, it is wrong.
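The check proposed above could look roughly like the sketch below. It uses Python's standard ipaddress module. The function names are made up for illustration and are not part of Starfish.

```python
import ipaddress


def is_usable_unicast(address):
    """Return False for loopback addresses such as 127.0.1.1."""
    return not ipaddress.ip_address(address).is_loopback


def pick_unicast_address(candidates):
    """Return the first candidate that is not a loopback address, or None."""
    for address in candidates:
        if is_usable_unicast(address):
            return address
    return None
```

The candidate list would come from whatever the hostname resolves to (for example via socket.gethostbyname_ex). On an affected Ubuntu machine that list may contain only 127.0.1.1, in which case the function returns None and Starfish would have to fall back to scanning the network interfaces.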

Bugs

Data Mirroring

Problem:

Data is currently not mirrored. This is a key feature of Starfish and needs to be implemented ASAP.

Solution:

  • Most of the back-end code for mirroring is there. The mirroring functionality itself just needs to be added.

Networking Issues

Problem:

Hosts won't find each other if they use an interface besides "eth0".

Solution:

  • Transmit multicast data on all interfaces, not just "eth0". Poll all interfaces, not just the one associated with the system's hostname.

SCP Issues

Problem:

SCP copies succeed initially, but are then overwritten due to a caching violation caused by an unusual syscall that scp performs when doing a remote copy.

Solution:

  • The problem has been identified, but a fix has yet to be implemented.

Write Speed Issues

Problem:

The latest release introduced a severe write-speed regression: writes slowed from 53.8Mbps to 23.9Mbps.

Solution:

  • The issue has to do with how long a Python thread sleeps before being woken up, and more specifically with Python's implementation of threads and the Global Interpreter Lock. We're considering going to a tmpfile-based write stream and a different thread notification mechanism for asynchronous unicast engine write-streams.
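As an illustration of what a "different thread notification mechanism" might mean, here is a minimal sketch that replaces sleep-based polling with a threading.Condition. The WriteQueue class is hypothetical and not taken from the Starfish code. It does not remove the Global Interpreter Lock. It only shows a writer thread being woken as soon as data arrives instead of after a fixed sleep.

```python
import threading
from collections import deque


class WriteQueue:
    """Hand write chunks to a worker thread without sleep-based polling."""

    def __init__(self):
        self._chunks = deque()
        self._cond = threading.Condition()
        self._closed = False

    def put(self, chunk):
        with self._cond:
            self._chunks.append(chunk)
            self._cond.notify()  # wake one waiting writer right away

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()  # release every waiting writer

    def get(self):
        """Block until a chunk is available.

        Returns None once the queue is closed and fully drained.
        """
        with self._cond:
            while not self._chunks and not self._closed:
                self._cond.wait()
            return self._chunks.popleft() if self._chunks else None
```

A writer thread would loop on get() and stop when it returns None. The standard queue.Queue class offers the same blocking behaviour and may be the simpler choice in practice.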

Storage Client Disconnects Sometimes Lead To Full CPU Utilization

Problem:

When a client or storage node disconnects for any reason, the remaining peers sometimes spike to 100% CPU usage.

Solution:

  • This is a known bug in the network driver of each Starfish peer, introduced in the latest release. There is a fix, but it has not yet been tested for release. The fix will be in the next release of Starfish.

There is no method to specify multicast address to use

Enhancement:

Starfish currently publishes its existence via multicast. This can be an issue for routers that need to keep close tabs on the multicast traffic that they're allowing and the traffic that they're denying. It would be nice if one could restrict the range of multicast addresses that Starfish selects from by giving it a more restrictive netmask. For example, you could tell Starfish to pick its multicast addresses like so:

starfishd --multicast-network 232.40.50.0/28

The command above would instruct Starfish to select from the 232.40.50.1-232.40.50.14 IP address range.

Solution:

  • We should add a flag that allows the administrator to set a particular multicast address netmask if they want to do so.
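To show how a network given in CIDR form expands to the address range in the example above, here is a small sketch using Python's standard ipaddress module. The function name multicast_candidates is an assumption for illustration, as is the --multicast-network flag itself, which does not exist yet.

```python
import ipaddress


def multicast_candidates(network):
    """List the host addresses of a multicast network given in CIDR form."""
    net = ipaddress.ip_network(network)
    if not net.is_multicast:
        raise ValueError(f"{network} is not a multicast network")
    # hosts() skips the first and last address of the block, which
    # matches the .1 to .14 range described for a /28.
    return [str(host) for host in net.hosts()]


candidates = multicast_candidates("232.40.50.0/28")
print(candidates[0], candidates[-1], len(candidates))
# prints: 232.40.50.1 232.40.50.14 14
```

Rejecting non-multicast networks up front would catch a mistyped value before Starfish starts announcing itself on the wrong range.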

No configuration option for binding to particular unicast network address

Enhancement:

There is currently no method to specify which unicast network address should be bound to for the Unicast UDP and TCP/IP ports.

Solution:

  • Add a command line option to specify which Unicast UDP and TCP/IP addresses should be bound to on startup.
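A rough sketch of such an option is shown below. The flag names --unicast-address and --unicast-port are hypothetical, as are the function names. Only the argparse and socket calls are standard Python.

```python
import argparse
import socket


def build_parser():
    parser = argparse.ArgumentParser(prog="starfishd")
    # Hypothetical flags: starfishd does not provide these today.
    parser.add_argument("--unicast-address", default="0.0.0.0",
                        help="local address to bind the unicast ports to")
    parser.add_argument("--unicast-port", type=int, default=0,
                        help="port to bind (0 lets the OS choose)")
    return parser


def open_unicast_sockets(address, port):
    """Bind one UDP and one TCP socket to the given local address."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        udp.bind((address, port))
        tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        tcp.bind((address, port))
        tcp.listen()
    except OSError:
        # Do not leak half-configured sockets if either bind fails.
        udp.close()
        tcp.close()
        raise
    return udp, tcp
```

The default of 0.0.0.0 binds to every interface. Passing a specific address restricts both sockets to that interface.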

Read re-transmits fail over ADSL VPN links

Problem:

Read request re-transmits seem to fail over ADSL VPN links.

Potential Solution:

We should eventually support ultra-high latency applications. We might want to think about how we handle UDP requests for non-volatile operations. Basically, a non-volatile read operation should be re-transmitted until a response is received, for up to 15 seconds.
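The re-transmit rule described above can be sketched as follows. The function name, the one-second re-transmit interval, and the receive buffer size are assumptions for illustration. Only the 15-second limit comes from the text.

```python
import socket
import time


def read_with_retransmit(sock, request, server, deadline=15.0, interval=1.0):
    """Resend `request` every `interval` seconds until a reply arrives.

    Returns the reply bytes, or None if `deadline` seconds pass without
    one. Only suitable for requests that are safe to repeat.
    """
    give_up_at = time.monotonic() + deadline
    while True:
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            return None
        sock.sendto(request, server)
        # Never wait past the overall deadline.
        sock.settimeout(min(interval, remaining))
        try:
            reply, _ = sock.recvfrom(65535)
            return reply
        except socket.timeout:
            continue  # no answer yet, so re-transmit
```

A real implementation would also need to match replies to requests (for example with a request ID), since a late reply to an earlier transmission could otherwise be mistaken for the answer to a different read.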

Reads are not cached

Enhancement:

Currently, reads are not cached, which can cause speed issues when re-reading files that are accessed often.

Potential Solution:

Implementation of a size-tunable disk-based read-cache should fix this issue.
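A minimal sketch of what a size-tunable disk-based read cache could look like is given below. The DiskReadCache class, its least-recently-used eviction policy, and the SHA-1 file naming are assumptions for illustration. The text only asks for a cache that lives on disk and has a tunable size.

```python
import hashlib
import os
from collections import OrderedDict


class DiskReadCache:
    """Least-recently-used read cache stored as files in one directory."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes      # the tunable size limit
        self._sizes = OrderedDict()     # key -> size in bytes, oldest first
        self._total = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def _discard(self, key):
        size = self._sizes.pop(key, None)
        if size is not None:
            self._total -= size
            os.remove(self._path(key))

    def get(self, key):
        """Return the cached bytes for `key`, or None on a miss."""
        if key not in self._sizes:
            return None
        self._sizes.move_to_end(key)    # mark as recently used
        with open(self._path(key), "rb") as handle:
            return handle.read()

    def put(self, key, data):
        """Store `data`, evicting the least recently used entries if needed."""
        self._discard(key)              # drop any stale copy first
        if len(data) > self.max_bytes:
            return                      # too large to cache at all
        with open(self._path(key), "wb") as handle:
            handle.write(data)
        self._sizes[key] = len(data)
        self._total += len(data)
        while self._total > self.max_bytes:
            self._discard(next(iter(self._sizes)))
```

The sketch keeps its index in memory, so the cache starts empty after a restart. It also does not invalidate entries when a file is written elsewhere in the cluster, which a real Starfish cache would have to handle.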