Plusnet Usergroup » All Users - The Open Forum » Plusnet Network and Technical Issues » Broadband Platform Traffic Management
Topic: Broadband Platform Traffic Management
bpullen (Plusnet Staff)
« on: November 16, 2006, 08:29:52 pm »

With all the recent performance issues being reported across the platform, it seems appropriate to pull the separate issues together in one thread, which will allow discussion to focus on the actions we are taking and the plan going forward.

I suppose the biggest concern that customers have at the moment is that we do not have enough capacity and that we are trying to do too much with too little. Our answer to this is that our product team has done the math. We know how much bandwidth each customer is paying for, we know the design of our products and what they should be capable of, and finally we know how much bandwidth we have to offer: these figures add up. Until about 3-4 weeks ago the system was working much better than it currently is, and we have added no more customers in that time, which strongly suggests that something else is going on here. With that in mind, what follows is an account of what we have been doing to get to the bottom of the problems and restore the service customers expect...

On the 10th of October we upgraded the software that runs the Ellacoya switches to version 6.4. That upgrade was problematic for a number of reasons. Firstly, IRC signature detection didn't work. In addition, we encountered a hardware problem causing high CPU load on the switches. Our network engineers worked hard alongside Ellacoya to ensure that this and a number of other configuration issues were resolved or otherwise mitigated. We now believe all of these problems to be resolved.

As part of working through the issues following the upgrade to 6.4, we discovered that 760 customers had been assigned an incorrect profile. That profile had the same effect as giving them the PAYG experience, so a lot more traffic was being prioritised than should have been. More PAYG P2P means less room for non-PAYG P2P, Usenet, etc. A few days ago these customers were returned to the profile they are paying for, and we have since been monitoring the platform to see whether the impact is as large as we expect.

When we did the last product refresh, a new set of profiles was created to reflect the new products. The old profiles were left on the ERXs to support some legacy products that were going to remain. We have decided to move all customers onto one set of profiles to remove the possibility of these differences causing us any issues. The developers are carrying out that work right now as a priority 1 problem, and it is expected to be completed next Monday, although it will take a fortnight or so before we see the benefit.

We are also aware that some customers have had issues with the performance of their VPN traffic. VPN traffic is treated as gold to ensure that it gets the right level of service across the platform. Before 6.4 there was a signature enabled to identify SSL-based VPN traffic, but this changed following the upgrade. Ellacoya offers a feature to prevent Skype logins which is closely tied to the SSL signature. We do not use this feature, so it was not active in the config, which meant the SSL signature wasn't active either. Once that was enabled, customers' SSL-based VPN traffic was detected again and given the right level of priority.

One of the reasons we wanted to roll out 6.4 was its improved ability to identify specific traffic types. We at PlusNet have been talking about its detection of encrypted P2P traffic, but it also includes specific signatures for other BitTorrent/P2P apps, as well as a variety of streaming applications. From that point of view 6.4 has been a great success: before the roll-out there was about 200MB of unclassified traffic out of 3GB of total traffic, and that has now dropped to 80MB.
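To put those figures in proportion, here is a quick Python sketch of the arithmetic. It assumes that "3GB" means 3000MB and that the same total applies to both measurements; the post gives only one total, so treat the percentages as approximate. The relative drop in unclassified volume does not depend on either assumption.

```python
def unclassified_share(unclassified_mb: float, total_mb: float) -> float:
    """Fraction of the sampled traffic that no signature matched."""
    return unclassified_mb / total_mb


# Figures quoted in the post; "3GB" is assumed to mean 3000MB (decimal units).
TOTAL_MB = 3000
BEFORE_MB = 200  # unclassified traffic before the 6.4 roll-out
AFTER_MB = 80    # unclassified traffic after the 6.4 roll-out

before = unclassified_share(BEFORE_MB, TOTAL_MB)
after = unclassified_share(AFTER_MB, TOTAL_MB)
# Relative drop in unclassified volume; independent of the GB/MB convention.
reduction = (BEFORE_MB - AFTER_MB) / BEFORE_MB

print(f"unclassified before: {before:.1%}, after: {after:.1%}, relative drop: {reduction:.0%}")
# prints "unclassified before: 6.7%, after: 2.7%, relative drop: 60%"
```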

The Networks Team have also been working hard to identify any games that are being used by the customers and building signatures for them so that they are correctly identified and appropriately treated. That work is really just business as usual for us as new games come out all the time that need this work doing on them.

Another issue is customers passing traffic to another customer on the same GW. As this traffic does not pass through the Ellacoya, it is not being identified and correctly prioritised. We have identified a solution and are currently testing it on a subset of volunteer customers before rolling the fix out to the rest of the customer base.

There have been a lot of questions raised via tickets, as well as on the forums, about different performance on different ERXs, and concerns about high ping times on the first hop of customers' traceroutes. Taking the different performance on different GWs first: to some extent it will always be the case that customers on different ERXs are doing different things, resulting in one ERX's Gold queue (or any other queue, for that matter) being busier than another's. This does mean that performance can differ between boxes at different times. Another factor that could influence the situation is the fact that we have implemented a new Core network in our newer Telehouse North facility (PTN). This new Core network has definitely improved performance for the GW that connects to it. The rest of the Core network is being upgraded at the moment: the new Cisco routers are racked and connected, and work continues on getting them configured and migrating traffic onto them.

It has also been suggested that perhaps there is a problem on a specific BT 622 Central, or even a specific device within a specific 622 Central. We are in a very lucky position on this one, in that our Senior Network Architect has an in-depth knowledge of how the BT products and network are designed and has categorically stated that this cannot be the case.

Now, with regard to the high ping times on the first hop of a traceroute but not on any of the later hops, part of the answer is easier to follow if you understand the hardware architecture of the ERX, so I'll try to cover this as simply as I can. Each ERX has 2 main processors, 1 master and 1 for redundancy, as well as individual line cards that the BT Centrals are connected to. Each of those line cards is capable of making a routing decision, and traffic that is not intended for that ERX never touches the main processor. However, all traffic that is destined for the ERX itself is dealt with by the main processor. In a traceroute the first hop is the ERX, so that traffic goes to the main processor, which then issues the reply; for all the other hops, the traffic is simply forwarded through the line cards and never has to touch the main processor.
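As a rough illustration of that split, here is a toy Python model (not ERX code) of which part of the box handles an arriving packet. The address `192.0.2.1`, the `Packet` type and the `handling_path` function are hypothetical; the model simply assumes that anything addressed to the ERX, or whose TTL runs out at the ERX (as a traceroute's first probe does), is handled by the main processor, and everything else is forwarded on a line card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst: str  # destination IP address
    ttl: int  # IP time-to-live as the packet arrives at the ERX


# Hypothetical address owned by the ERX; real addressing is not given in the post.
ERX_OWN_ADDRESSES = frozenset({"192.0.2.1"})


def handling_path(pkt: Packet) -> str:
    """Toy model of where an arriving packet is dealt with on the ERX."""
    if pkt.dst in ERX_OWN_ADDRESSES:
        return "main processor"  # traffic for the box itself, e.g. a ping to the ERX
    if pkt.ttl <= 1:
        return "main processor"  # TTL expires here, so the ERX itself must send the ICMP reply
    return "line card"  # transit traffic: routed on the line card, CPU never involved


# A traceroute towards a distant host sends probes with TTL 1, 2, 3, ...
for ttl in (1, 2, 3):
    print(ttl, handling_path(Packet(dst="203.0.113.10", ttl=ttl)))
```

Only the TTL-1 probe lands on the ERX's main processor, which is why a busy main processor shows up on the first hop alone; probes with a higher TTL pass through on the line cards and are answered by later hops, which this model does not cover.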

The rest of the answer lies in the fact that the Ellacoya sits behind the ERX from the customers' point of view. As everyone knows, the Ellacoya marks traffic for prioritisation by the ERX. Traffic from the customer destined for the ERX never passes through the Ellacoya, so it never gets a priority marking. We have since implemented a fix that makes the ERX mark this type of traffic for the right level of prioritisation.
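A similarly hedged toy model of the marking gap follows. The table contents, the placeholder marks `"standard"`, `"unmarked"` and `"erx_priority"`, and the `marking` function are all assumptions for illustration, not the Ellacoya or ERX configuration; only VPN being treated as gold comes from the post.

```python
# Hypothetical marking table: VPN as gold is from the post; "standard" is a placeholder.
ELLACOYA_MARKS = {"vpn": "gold", "web": "standard"}
UNMARKED = "unmarked"            # placeholder for traffic carrying no priority marking
ERX_LOCAL_MARK = "erx_priority"  # placeholder for the marking the ERX-side fix applies


def marking(traffic_type: str, passes_ellacoya: bool, erx_marks_local: bool) -> str:
    """Toy model of the priority marking a packet carries when the ERX queues it."""
    if passes_ellacoya:
        # Normal customer traffic is classified and marked by the Ellacoya.
        return ELLACOYA_MARKS.get(traffic_type, UNMARKED)
    # Traffic exchanged with the ERX itself never reaches the Ellacoya,
    # so only the ERX-side fix can give it a marking.
    return ERX_LOCAL_MARK if erx_marks_local else UNMARKED


print(marking("vpn", passes_ellacoya=True, erx_marks_local=False))    # gold
print(marking("icmp", passes_ellacoya=False, erx_marks_local=False))  # unmarked (before the fix)
print(marking("icmp", passes_ellacoya=False, erx_marks_local=True))   # erx_priority (after the fix)
```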

A number of customers have told us that FTP traffic has been performing very badly. We believe one of the reasons is that it suffered as a result of the incorrect profile applied to the 760 customers mentioned above. In addition, the product team has taken the decision to move Business FTP to the Gold queue. Since this was implemented we have seen few, if any, reports of slow FTP transfers on business accounts. Text Usenet for all customers has also been moved into gold, and again the feedback from this change has been positive.
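The queue changes described above can be sketched as a small policy table in Python. The table layout, the account/traffic keys, the `apply_changes` and `diff` helpers, and the placeholder queue name `"previous"` are hypothetical, since the post names only the destination queue (gold).

```python
# Hypothetical queue table keyed by (account type, traffic type). "previous" is a
# placeholder for whichever queue the traffic used before; the post does not name it.
BEFORE = {
    ("business", "ftp"): "previous",
    ("residential", "ftp"): "previous",
    ("business", "text_usenet"): "previous",
    ("residential", "text_usenet"): "previous",
}


def apply_changes(table: dict) -> dict:
    """Return a copy of the table with the two moves described in the post applied."""
    updated = dict(table)
    updated[("business", "ftp")] = "gold"  # Business FTP moved to the Gold queue
    for account in ("business", "residential"):
        updated[(account, "text_usenet")] = "gold"  # text Usenet moved to gold for all customers
    return updated


def diff(before: dict, after: dict) -> dict:
    """Entries whose queue changed, as (old, new) pairs."""
    return {key: (before[key], after[key]) for key in before if before[key] != after[key]}


AFTER = apply_changes(BEFORE)
for key, (old, new) in sorted(diff(BEFORE, AFTER).items()):
    print(key, old, "->", new)
```

Running it lists the three entries that moved to gold; residential FTP is left where it was, because the post only mentions Business FTP.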

If you've made it this far then well done!!

We will continue to work at providing the level of service we think our customers deserve and will provide further details over the coming week.

Kind Regards,

bob_cat
« Reply #1 on: November 16, 2006, 09:27:17 pm »


Thanks Bob for the extensive reply. I think I have actually reconciled the poor P2P performance as being the price of the present economies of bandwidth for providers. Having spent some years at a telco (although in their broadcast arm), I can appreciate the difficulties you face compared to customer expectations.

I think BTw needs some incentive to invest in their infrastructure to see if they can make a big impact on the economies of connectivity. The telcos need to look at technologies like Raman ring laser DWDM terabit connectivity, as I know Energis/C&W are doing for one of their big clients. The problem is that with competition keeping margins tight, there is little interest in making such an extensive investment.

Bob

--
Don't do the cheese if you can't do the dreams.

Matt Norris
« Reply #2 on: November 16, 2006, 11:46:27 pm »

Dear Mr Pullen,

Thank you for two informative, considered and useful information releases this evening. I am aware you have followed my own personal tickets and I have read your report with interest. It seems I am not alone with the issues you describe above, although I am perhaps one of the few customers who can tick off almost every one of them as "affected by"! I sincerely hope this marks a turning point and a return to the previously excellent service PlusNet has provided to me and other customers over the years I have been a subscriber.

I am hearing reports from industry colleagues that PlusNet is far from alone on this one, although other ISPs seem less open about their activities than PlusNet, somewhat adding fuel to the fire perhaps?

You are correct in the dates you mention, at least from my point of view. I have logged most circumstances since early September and, while it is far from remedied at the present time, some issues have shown an improvement. I appreciate your reasoning on prioritising Business FTP/VPN as gold traffic, and this is understandable, but I would not wish it to be forgotten that a number of subscribers with residential connections now work from home during the day and early evening, using VPN to connect into their own remote networks and using FTP to transfer resources to/from their servers. Clearly if we must upgrade to a business package, so be it, but I would hope there is room in the system to deliver a reliable QoS for this residential traffic at these times, given its smaller volume?

I am currently nowhere near where I expect to be in terms of reliability, responsiveness and actual real-world bandwidth, but I hope I can join with other subscribers in thanking you and your team for your efforts in attempting to resolve this clearly complicated set of circumstances.

Matt Norris

mrmojo
« Reply #3 on: November 17, 2006, 01:02:00 am »


Quote from: bob_cat
> Thanks Bob for the extensive reply. I think I have actually reconciled the poor P2P performance as being the price of the present economies of bandwidth for providers. Having spent some years at a telco (although in their broadcast arm), I can appreciate the difficulties you face compared to customer expectations.

This is absolutely ridiculous.

I've never understood why PlusNet does not properly use the peak-time bandwidth. It seems to me that if you are on normal Premier, you should get 13GB of traffic with ZERO shaping, and only after that should it be degraded. However, this is not what is happening. Instead, everyone on PlusNet suffers horrific P2P speeds (which even normal Tiscali seems to beat).

If the current peak allowances are not 'sustainable', why doesn't PlusNet reduce them until they are? I can remember all the spin back when the new FUPs came into play, and the idea was that if you had unused peak-time allowance, PlusNet would not screw with your connection. This has simply not happened.

I think there has been way too much apathy from the PUG on this one. Perhaps no one in the PUG uses P2P, because it seems people like Tam have been doing all the pushing on fixing PlusNet's traffic shaping instead of the PUG. If anything, the PUG has been busy publishing 'propaganda' in the form of slagging BT off a bit more for PlusNet's failings (the report on red exchanges, which is a complete 'red' herring (no pun intended)).

dhookham (Administrator)
« Reply #4 on: November 17, 2006, 02:07:36 am »

Quote from: mrmojo
> I think there has been way too much apathy from the PUG on this one. Perhaps no one in the PUG uses P2P, because it seems people like Tam have been doing all the pushing on fixing PlusNet's traffic shaping instead of the PUG. If anything, the PUG has been busy publishing 'propaganda' in the form of slagging BT off a bit more for PlusNet's failings (the report on red exchanges, which is a complete 'red' herring (no pun intended)).

I think you're being a bit harsh there... many in PUG use P2P (or try to, any road!), and there is a lot of discussion going on about improving the experience. Our preference is to have reasoned debates with PN and then present the findings/outcomes. Would it help anyone if we all just stood here and whinged about the situation, rather than making efforts to find a way forward through dialogue?

It's the PlusNet Way

mrmojo
« Reply #5 on: November 17, 2006, 03:02:11 am »

Well, if you don't publish what you're doing, then it looks like apathy. Especially when you put effort into publishing front-page news on how many exchanges are red (which is very misleading, since many exchanges have many VPs and only one of them being red causes the whole exchange to be shown as red; and as the number of VPs rises with broadband growth, more would be expected to hit red even if the proportion is the same as months or years ago).

LC100
« Reply #6 on: November 17, 2006, 07:49:16 am »

Hi

It's very evident there is way too little spare bandwidth if it only takes 760 customers with an incorrect PAYG profile to negatively affect the remaining 160,000 customers! So if over the next couple of weeks 700 or so new PAYG customers join PlusNet, we are back in the same situation again, with more playing about with bandwidth management to squeeze them in.

I am a PAYG customer and have no speed problems and don't use P2P, however how long before PAYG customers get extra management to help support this overstretched creaking network?


bob_cat
« Reply #7 on: November 17, 2006, 08:24:06 am »


I have gone from expecting to be able to achieve what is outlined to expecting what I see as presently possible. I wouldn't mind going back to capped allowances (because we have clearly moved on from those days), but I am not sure the BTw pricing model can sustain it. Unless, on the other hand, when the team did their calculations about how much bandwidth we get for the price we pay, they did it on a broad sweep, allocating for everyone on the basis of an average account rather than actually calculating the cost of us Premier customers.

Who knows, but I am just resigned. I know if I go elsewhere I will have a month of grief as I sort everything out, and the way the market is growing it won't be long before the network I end up on gets saturated as well. May 2007, the new BTw prices.... :(

Bob

--
Don't do the cheese if you can't do the dreams.

godsell4
« Reply #8 on: November 17, 2006, 08:59:20 am »

Quote from: bpullen
> ... we discovered that there were 760 customers who had been assigned an incorrect profile ... giving them the PAYG experience. We have since been monitoring the platform to see if the impact is as high as we expect it to be.

And were those 760 people heavy users? What sort of monthly and peak-time usage did they have?

SW.

BBYW1/10GB

bpullen (Plusnet Staff)
« Reply #9 on: November 17, 2006, 10:29:18 am »

Quote from: godsell4
> And were those 760 people heavy users? What sort of monthly and peak-time usage did they have?

Hi,

I will try and get hold of this information for you.

Kind Regards,

godsell4
« Reply #10 on: November 17, 2006, 12:00:00 pm »

Quote from: bpullen
> I suppose the biggest concern ... is that we do not have enough capacity ... our product team has done the math. We know how much bandwidth each customer is paying for, we know the design of our products and what they should be capable of and finally we know how much bandwidth we have to offer - These figures add up.

So does this add up? How do we explain the dropped Gold and Silver traffic as described here, which we are still patiently waiting for a reply on?

Quote from: bpullen
> A number of days ago these customers on incorrect profiles were returned to the profile that they are paying for. We have since been monitoring the platform to see if the impact is as high as we expect it to be.

Now that these people have been moved off the PAYG profile, why, as a PAYG customer, am I seeing ANY traffic dropped, causing me problems with gaming? I refer you to ticket 20240771, which started on 7th September.

regards,
SW.

BBYW1/10GB

Simon Day
« Reply #11 on: November 17, 2006, 03:54:26 pm »

Hi Guys,

I have posted a follow up to this post in the Portal Forums http://portal.plus.net/central/forums/viewtopic.php?p=376524#376524

Thanks

Simon

Simon Day
Network Improvement Consultant
PlusNet Plc

LC100
« Reply #12 on: November 17, 2006, 05:39:03 pm »

Hi Simon

Thank you for the updates.

It is very evident that the whole system is pushed to the absolute limit.  Are you not at the point where soon gold and titanium queues are demanding more bandwidth than you have available even if everything else is traffic shaped so much it stops?

If you take on 760 new PlusNet PAYG customers, what will happen then?

If it is clear that some customers have deliberately been faking VoIP traffic signatures to get an unfair share of bandwidth, are those customers now ex-customers? They obviously know what they are doing, and if they remain customers they will of course just find another way to steal from the rest of us. These customers have just shown that the traffic management system has given them a way to get more bandwidth rather than a lesser but fairer share, while honest customers have suffered.

I'd also like to say: if money is that tight, why not consider putting the prices up? I am sure most people would be happy paying an extra £1.00 to get a better service, and the price-conscious customers who would leave because you then seemed less competitive will already be leaving to join the likes of Sky and Talk Talk. Knowing someone with Sky, I have to say they are nowhere near as traffic managed as PlusNet customers, and it costs them just £5.00 a month for 40Gbyte.

psycho99
« Reply #13 on: November 17, 2006, 05:40:53 pm »

Grateful for the announcement that slow P2P is being investigated.
I do use P2P (BitTorrent) in the off-peak hours and can confirm the miserable performance of late.
I have put up with it hoping it would improve but I think I would have put in a concern eventually.
So, grateful for those who raised this issue in the first place .......

Yours ever, one of the silent (and suffering !) majority

Simon Day
« Reply #14 on: November 17, 2006, 11:22:03 pm »

Guys, happy to have the discussion, but could we have it in the portal forums? It's hard to maintain two forum threads, even with half a bottle of nice merlot inside me :mrgreen:

Simon Day
Network Improvement Consultant
PlusNet Plc