MongoDB Best Practices

As we focus more on MongoDB, we have assisted several customers with custom MongoDB environments. During this process we discovered a variety of potentially problematic settings. So, we wanted to take this opportunity to share Engine Yard’s best practices for MongoDB.

If you have a custom installation of MongoDB, please make sure to check your installation against this post. We recommend that you make changes as necessary. If you need help (until we offer Mongo in our product), our Professional Services organization can lend you a hand.


General NoSQL best practices

Many articles have been written to address the NoSQL selection process. Factors that influence your choice of database include your application’s needs for read/write throughput, durability, consistency of data, latency, and so on. These criteria are nicely summarized by Nathan Hurst in his “Visual Guide to NoSQL Systems”.

Selecting the right NoSQL database is beyond the scope of this post, but please do your research. It will pay off in the end, as no single solution fits all scenarios. This article assumes that your research has led you to choose MongoDB for your application. We at Engine Yard recommend that you:

Test exhaustively

Test within the context of your application and against traffic patterns that are representative of your production system. A test environment that does not resemble your production traffic will block you from discovering performance bottlenecks and architectural design flaws. Examine your queries closely and always collect metrics.

Don’t assume that what worked for your RDBMS will translate.

Whatever worked on your SQL database may not work on MongoDB, so make sure that your expectations are realistic and aligned with the features of the database. For better performance, design your documents and queries according to what 10gen recommends. Understand that your application might need to be re-architected in order to migrate to a non-relational data store. Read “The Cost of Migration” for more information on migrating to NoSQL.

Think about the consistency and durability needs of your data.

Think about your durability and consistency needs. We cannot emphasize this enough. During your research you will find that MongoDB offers durability through replication. Running a standalone MongoDB instance in production is never recommended; make sure you understand why.
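
For illustration, durability in MongoDB typically comes from a replica set combined with an explicit write concern. Below is a minimal pymongo sketch; the host names, replica set name, database and collection names are placeholders, not recommendations for any particular topology.

    # A minimal sketch, assuming a three-member replica set named "rs0";
    # hosts, database and collection names are hypothetical.
    from pymongo import MongoClient, WriteConcern

    client = MongoClient(
        "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0"
    )
    db = client["myapp"]

    # Require a majority of replica set members to acknowledge the write and
    # commit it to the journal before it is considered durable.
    orders = db.get_collection(
        "orders", write_concern=WriteConcern(w="majority", j=True)
    )
    orders.insert_one({"sku": "A-100", "qty": 2})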

Optimizing Costs for S3 (Amazon Simple Storage Service)

Amazon Simple Storage Service (Amazon S3) is one of the most popular Amazon Web Services (AWS) offerings, with flexible pricing: you pay only for the storage you use. Many bloggers, including Werner Vogels, CTO of AWS, host their blogs for less than a couple of dollars a month. At the other end of the spectrum, companies such as Sumo Logic use S3 to store petabytes of data. From our experience using S3 and other AWS services, we are convinced that for most enterprises S3 is one of the top five spends among all AWS offerings. In this article we will discuss different approaches for reducing Amazon S3 costs and improving your margin.

There are three major costs associated with S3:

  1. Storage cost: charged per GB per month, ~$0.03 / GB / month, billed hourly
  2. API cost for operations on files: ~$0.005 per 10,000 read requests; write requests are 10 times more expensive
  3. Data transfer outside of the AWS region: ~$0.02 / GB to a different AWS region, ~$0.06 / GB to the internet

Based on volume and region the actual prices differ a bit, but the optimization techniques stay the same. I will use the above prices in the following cost estimates.


Basics of S3 Costs

One of the most important aspects of Amazon S3 is that you pay only for the storage you use, not for storage you provision. For example, if you store a 1 GB file on S3, you are billed for 1 GB only. In many other services, such as Amazon EC2, Amazon Elastic Block Store (Amazon EBS) and Amazon DynamoDB, you pay for provisioned capacity; with an Amazon EBS disk, you pay for the full 1 TB of provisioned disk even if you only save a 1 GB file on it. This makes S3 costs easier to manage than many other services, including Amazon EBS and Amazon EC2: on S3 there is no risk of over-provisioning and no need to manage disk utilization.

Given this, most S3 users don’t need to worry about cost optimization right away. Engineering time is not free; the best bet is to start simple and worry about the monthly S3 bill after it has crossed a certain threshold. However, there are a few basics that are worth getting right, as they can be costly to fix down the line:

  • Pick the right AWS region for your S3 bucket and ensure your EC2 instances and S3 buckets are in the same region. The main benefits of keeping S3 and EC2 together are performance and lower transfer cost: data transfer between EC2 and S3 in the same region is free, while downloading a file from another AWS region costs $0.02/GB.
    • For example, Sumo Logic processes data within the same region, which mostly eliminates S3-to-EC2 inter-region data transfer costs. If the S3 bucket were in a different region, then assuming each file is downloaded on average 3 times per month (3 * 0.02 ≈ $0.06 / GB), our S3 costs would triple.
  • Pick the right naming scheme (AWS guide). Though this doesn’t directly impact S3 cost, a poor scheme can make S3 so much slower that you need an additional caching layer that could otherwise be avoided.
  • Don’t share Amazon S3 credentials, and monitor credential usage. A lot of developers bake IAM access keys and/or secret keys into their applications. While this may be required for users to perform operations directly on S3 and may simplify your architecture, it also means that any user can potentially run up a lot of additional cost, whether maliciously or by simple accident. At a minimum:
    • Use temporary credentials that can be revoked, and give them need-to-know (minimum-rights) access to complete the task.
    • Monitor access keys and credential usage on a regular basis to avoid surprises.
      • A good example: on any S3 bucket where a third party can upload objects, set up a CloudWatch alarm on “BucketSizeBytes” so you find out quickly if someone uploads terabytes of data into your bucket (a sketch follows this list).
  • Never start with Amazon Glacier right away. If you don’t understand Glacier or your application requirements well, or worse, if your application requirements change, you can end up paying a lot later on (there are many horror stories). Keep it simple. Likewise, don’t start with the Infrequent Access storage class unless you rarely plan to read these objects.
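
The bucket-size alarm mentioned above might look like the following with boto3. This is only a sketch: the bucket name, SNS topic ARN and the 500 GB threshold are placeholders, not values from this post.

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName="uploads-bucket-size",
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": "my-uploads-bucket"},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        Statistic="Average",
        Period=86400,                     # the storage metric is reported daily
        EvaluationPeriods=1,
        Threshold=500 * 1024 ** 3,        # alarm above ~500 GB (placeholder)
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:s3-billing-alerts"],
    )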

Analyze Your Bill

The best way to start your cost optimization effort is to review your AWS bill:

  • On your AWS console, review your aggregated S3 spend (link to AWS Console).
  • To get a more granular per-bucket view, enable Cost Explorer or enable reporting to an S3 bucket.
    • Cost Explorer is the easiest to start with.
    • Downloading data from “S3 reports” into a spreadsheet gives you more flexibility.
    • Once you reach a certain scale (e.g. the Sumo Logic bill is over 1 GB / month), using a dedicated cost-monitoring SaaS such as CloudHealth is the best bet.
    • Keep in mind that the AWS bill is updated every 24 hours for storage charges, even though S3 storage is charged by the hour.
  • Getting per-object data can be handy, but beware of the cost if you need it on a regular basis.
    • You can enable the S3 Access Log, which gives you an entry for each API access. Keep in mind that this access log can grow very quickly and cost a lot to store.
    • You can list all objects using the API. Either write your own script (see the sketch after this list) or use a third-party GUI such as S3 Browser.
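
If you do want a quick per-prefix breakdown without third-party tools, a short boto3 script over the LIST API is enough. This is only a sketch; the bucket name is a placeholder, and remember that LIST calls are themselves billed, so don’t run it in a tight loop.

    from collections import defaultdict
    import boto3

    s3 = boto3.client("s3")
    totals = defaultdict(lambda: [0, 0])          # prefix -> [object count, bytes]

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-uploads-bucket"):
        for obj in page.get("Contents", []):
            prefix = obj["Key"].split("/", 1)[0]
            totals[prefix][0] += 1
            totals[prefix][1] += obj["Size"]

    for prefix, (count, size) in sorted(totals.items()):
        print(f"{prefix}: {count} objects, {size / 1024 ** 3:.2f} GB")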

For example, 85%+ of Sumo Logic’s S3 costs are related to storage. The second-largest group is API calls, at around 10% of the S3 cost, though there are some S3 buckets where API calls are responsible for 50% of costs. We used to pay for data transfers, but that cost is now negligible.

Cost Optimizations for S3

It usually makes sense to focus on the area where you spend the most: storage, API calls or data transfer. Some optimizations improve your overall efficiency, while others automate waste reduction.

– Saving money on storage fees

Don’t store files that you don’t need! Here are some ideas to consider for reducing your storage costs.

  • Delete files that are no longer relevant after a certain date.

    A lot of deployments use S3 for log collection but later send the logs to Sumo Logic. You can automate deletion with S3 lifecycle rules, for example deleting objects 7 days after their creation time; if you use S3 for backups, it may make sense to delete them after a year instead (see the lifecycle sketch after this list).

  • Delete unused files that can be recreated.

    For example, the same image stored in many resolutions for thumbnails or galleries that are rarely accessed: it may make sense to keep only the original image and recreate the other resolutions on the fly. Another example is Sumo Logic binaries: we can rebuild them from git, and we need them to deploy a new version of the software, but there is little point in keeping binaries older than one year, so we use lifecycle rules to delete them.

  • When using an S3 versioned bucket, use the lifecycle feature to delete old versions.

    By default, deletes and overwrites in a versioned S3 bucket keep all old data forever, and you will pay for it forever. In most use cases you only want to keep older versions for a certain time; you can set up a lifecycle rule for that.

  • Clean up incomplete multipart uploads.

    Especially if you upload a lot of large S3 objects, any interrupted upload may leave behind partial objects that are not visible but that you still pay to store. It almost always makes sense to clean up incomplete uploads after 7 days. If you have a petabyte-scale S3 bucket, even 1% of incomplete uploads can waste terabytes of space.
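
The clean-up ideas above can all be expressed in a single lifecycle configuration. Here is a minimal boto3 sketch; the bucket name, prefixes and day counts are placeholders to tune to your own retention requirements.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-uploads-bucket",
        LifecycleConfiguration={
            "Rules": [
                {   # delete raw logs a week after they are written
                    "ID": "expire-old-logs",
                    "Filter": {"Prefix": "logs/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                },
                {   # keep only 30 days of old versions in a versioned bucket
                    "ID": "expire-noncurrent-versions",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                },
                {   # clean up interrupted multipart uploads
                    "ID": "abort-incomplete-uploads",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                },
            ]
        },
    )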

– Compress Data Before You Send Them to S3

Almost always use some fast compression such as LZ4, which improves performance and at the same time reduces your storage requirement and hence the cost. In many use cases it also makes sense to use more compute-intensive compression such as GZIP or ZSTD.

You usually trade CPU time for better network IO and less spend on S3. For example, most Sumo Logic objects are compressed with GZIP, but we are investigating better compression and will most likely migrate to ZSTD, which gives us better performance while using less space.
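
Here is a minimal sketch of the idea using Python’s built-in gzip; the file name, bucket and key are placeholders, and you could swap in an LZ4 or ZSTD library for faster or tighter compression.

    import gzip
    import boto3

    s3 = boto3.client("s3")

    with open("events.json", "rb") as f:
        payload = f.read()
    compressed = gzip.compress(payload)

    s3.put_object(
        Bucket="my-uploads-bucket",
        Key="events/2017-10-15/events.json.gz",
        Body=compressed,
        ContentEncoding="gzip",          # so consumers know how to decode it
    )
    print(f"{len(payload)} bytes raw -> {len(compressed)} bytes compressed")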

– In Big Data Applications, Your Data Format Matters

Using better data structures can have an enormous impact on your application’s performance and storage size. The biggest choices:

  • Use a binary format (e.g. AVRO) vs. a human-readable format (e.g. JSON). Especially if you store a lot of numbers, a binary format such as AVRO can store large numbers in far less space than JSON. For instance, “1073741007” takes 10 bytes as JSON text versus 4 bytes as an AVRO integer (see the small illustration after this list).
  • Row-based vs. column-based storage. The general rule of thumb is to use columnar storage for analytics batch processing, since it provides better compression and storage optimization. However, this topic deserves its own article.
  • Decide what you should index, what metadata to store and what to calculate on the fly. A Bloom filter may remove the need to access some files at all, while some indexes waste storage for little performance gain, especially if you have to download the whole file from S3 anyway.
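
A tiny illustration of the JSON-vs-binary point, using Python’s struct module to stand in for a fixed-width binary encoding (real AVRO uses a variable-length encoding, but the size advantage is similar):

    import json
    import struct

    value = 1073741007
    as_json = json.dumps(value).encode()     # b'1073741007' -> 10 bytes of text
    as_binary = struct.pack(">i", value)     # 4 bytes as a fixed-width integer

    print(len(as_json), len(as_binary))      # 10 4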

– Use Infrequent Access Storage Class

The Infrequent Access (IA) storage class provides the same API and performance as regular S3 storage. IA is approximately four times cheaper than S3 standard storage ($0.007 GB/month vs $0.03 GB/month), but the catch is that you pay for retrieval ($0.01/GB); retrieval is free on the standard S3 storage class.

If you download objects less than about two times a month, then you save money using IA. Let’s consider three scenarios where IA can considerably reduce the cost.

  • Scenario 1: Using IA for disaster recovery

At Sumo Logic we use IA for backups. These backup files are used only for disaster recovery, so it makes sense for us to upload any object over 128KB directly to IA and save 60% on storage for a year without losing availability or durability of the data.

  • Scenario 2: Using automation to move unwanted files to IA

At Sumo Logic we also use S3 to distribute binaries. We usually use them only for up to a month after the initial upload to S3; however, in rare cases we still need the ability to quickly roll back to an older version. We use an S3 lifecycle rule to automatically move binaries to IA after 30 days (a sketch follows). This approach reduces cost without compromising data availability.
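
A minimal sketch of such a rule with boto3; the bucket name and prefix are placeholders.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-binaries-bucket",
        LifecycleConfiguration={
            "Rules": [
                {   # move binaries to Infrequent Access 30 days after upload
                    "ID": "binaries-to-ia",
                    "Filter": {"Prefix": "binaries/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                }
            ]
        },
    )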

  • Scenario 3: Use IA for infrequently accessed data

Sumo Logic’s system is designed so that in a few places we use S3 as the final source of truth for reliability reasons, but we only access it when an EC2 machine goes down or during data migration, which is infrequent. Given that this class of S3 objects is downloaded on average 0.2 times per month, it makes sense to keep them in IA. For every 1 GB we save about $0.021 per month: S3 Standard storage cost minus IA storage cost minus IA access cost = 0.03 − 0.007 − 0.2 * 0.01 = $0.021 / GB / month. Multiply that by a petabyte, and that’s just the monthly savings.

IA is great, but when is it not?

IA has restrictions such as a minimum billable object size and a minimum storage retention period: IA charges for at least 128KB per object and a minimum of 30 days of storage. In addition, each migration to or from S3 Standard costs one API call.

However, IA is significantly easier to use than Glacier. Recovery from Glacier can take a very long time, and any increase in speed increases your cost. If you store 1TB of data on Glacier, you can retrieve it for free only at a rate of roughly 1.7 GB / day. Recovering 1TB in an hour requires a peak retrieval rate of about 998 GB / h, which costs 0.01 * 998 * 24 * 30 = $7,186! If you decide to recover that 1TB over 2 hours, you will still pay $3,592.

How to Save on API Access

Here are some tips on how you can reduce the costs of your API access.

– API calls cost the same irrespective of the data size

API calls are charged per object, regardless of its size: uploading 1 byte costs the same as uploading 1GB. Small objects can therefore cause API costs to soar.

PUT calls cost $0.005 per 1,000 calls.

For instance, the API cost is negligible if you upload 10GB as a single file. If the file is divided into 5MB chunks, it costs ~$0.01; with 10KB chunks it will cost you ~$5.00. You can see how quickly the cost grows as the chunks get smaller (a quick calculation follows).
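
A quick back-of-the-envelope check of those figures, using the prices quoted in this post (actual prices vary by region and over time):

    PUT_COST_PER_1000 = 0.005                       # $ per 1,000 PUT requests
    TOTAL_BYTES = 10 * 1024 ** 3                    # a 10 GB upload

    for chunk in (TOTAL_BYTES, 5 * 1024 ** 2, 10 * 1024):   # one file, 5 MB, 10 KB
        calls = TOTAL_BYTES // chunk
        cost = calls / 1000 * PUT_COST_PER_1000
        print(f"{calls:>9} PUT calls -> ${cost:.4f}")
    # 1 call -> ~$0.000005, 2,048 calls -> ~$0.01, 1,048,576 calls -> ~$5.24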

– Batch objects whenever it makes sense to do so

Usually, a lot of tiny objects can get very expensive very quickly, so it makes sense to batch them. If you always upload and download a set of objects at the same time, it is a no-brainer to store them as a single file (e.g. using tar). At Sumo Logic we usually combine this with compression.

You should design your system to avoid a huge number of small files; it is usually a good pattern to have some clustering that prevents them.

For example, instead of creating a new file for every record, group the data into the same file until 15 seconds have elapsed or the file reaches 10MB, whichever comes first (a sketch of this rule follows).
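
A minimal sketch of that batching rule; the bucket name, key prefix and flush policy are placeholders, and a production version would also need locking and error handling.

    import time
    import boto3

    s3 = boto3.client("s3")

    MAX_AGE_SECONDS = 15
    MAX_BATCH_BYTES = 10 * 1024 ** 2

    buffer = bytearray()
    batch_started = time.monotonic()

    def flush():
        """Upload the current batch as a single object and start a new one."""
        global buffer, batch_started
        if buffer:
            key = f"events/batch-{int(time.time())}"
            s3.put_object(Bucket="my-uploads-bucket", Key=key, Body=bytes(buffer))
            buffer = bytearray()
        batch_started = time.monotonic()

    def append(record: bytes):
        """Add a record; flush when the batch is 10MB or 15 seconds old."""
        buffer.extend(record)
        too_big = len(buffer) >= MAX_BATCH_BYTES
        too_old = time.monotonic() - batch_started >= MAX_AGE_SECONDS
        if too_big or too_old:
            flush()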

If you have tiny files, it usually makes sense to use a database like DynamoDB or MySQL instead of S3. You can also use a database to group objects and later upload them to S3. 10 writes per second in DynamoDB cost $0.0065 / hour, i.e. $(0.0065/3600) per second. Assuming 80% utilization, DynamoDB comes to about $0.000226 per 1,000 calls ([{0.0065/3600} * 1000] / (10 * 0.8)), vs. an S3 PUT at $0.005 per 1,000 calls. That makes DynamoDB roughly 95% cheaper than S3 for this use case.

S3 file names are not a database. Relying too heavily on S3 LIST calls is not the right design; using a proper database can typically be 10-20 times cheaper.

How to Save on Data Transfer

If you do a lot of cross-region S3 transfers, it may be cheaper to replicate your S3 bucket to the other region than to download each object across regions every time.

Suppose 1GB of data in us-west-2 will be transferred 20 times to EC2 in us-east-1. If you initiate an inter-region transfer each time, you will pay $0.40 for data transfer (20 * 0.02). However, if you first copy it to a mirror S3 bucket in us-east-1, you just pay $0.02 for the transfer and $0.03 for a month of storage, which is roughly 87% cheaper. This capability is built into S3 as cross-region replication (a sketch follows), and you also get better performance along with the cost benefit.
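
A minimal sketch of turning on cross-region replication with boto3; both buckets must already have versioning enabled, and the bucket names and IAM role ARN here are placeholders.

    import boto3

    s3 = boto3.client("s3", region_name="us-west-2")
    s3.put_bucket_replication(
        Bucket="source-bucket-us-west-2",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [
                {
                    "ID": "mirror-to-us-east-1",
                    "Prefix": "",               # replicate every object
                    "Status": "Enabled",
                    "Destination": {"Bucket": "arn:aws:s3:::mirror-bucket-us-east-1"},
                }
            ],
        },
    )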

If you serve a lot of downloads from S3 (e.g. images on a consumer site), consider using AWS’s content delivery network (CDN), Amazon CloudFront. CloudFront can in some cases be cheaper (or more expensive) than serving directly from S3, but you gain a lot of performance.

There are also CDN providers such as CloudFlare that charge a flat fee. If you have a lot of static assets, a CDN can yield huge savings over S3, since only a tiny percentage of the original requests will hit your S3 bucket.

You can also use S3 to save on data transfer between EC2 instances in different availability zones (AZs). Data transfer between two EC2 instances in different AZs costs $0.02/GB, but downloading from S3 is free from any AZ.

Consider a scenario where 1 GB of data is transferred 20 times from one EC2 server to another in a different availability zone. That costs $0.40 (20 * 0.02). However, if you upload it to S3 instead, you just pay for storage ($0.03 / GB / month), and data transfer between S3 and EC2 is free. S3 charges per GB per hour, so assuming the data is deleted from S3 after a day, the S3 cost will be about $0.001: roughly a 99% saving on that data transfer.

Final Thoughts

There are a lot of opportunities for S3-specific optimizations, and the effort can be run as a completely data-driven exercise: you can estimate the savings and the effort required to realize them. However, understanding what’s going on and managing the complexity can be challenging. On Sumo Logic’s journey we were able to achieve 70%+ savings over our initial implementation. Optimizing costs can be as much fun as playing computer games: you tweak a few things and your AWS bill goes down.

Written by: Jacek Migdal

Post Reference From: https://www.sumologic.com/aws/s3/s3-cost-optimization/

Scheduled maintenance on some of our LS1 & LS8 UK servers

Reason: Power maintenance in the Data Center

Maintenance Window:

Day and Date: Tuesday, 15th October 2017
Start Time: 20:30 IST
Duration: 3 hours

Affected Services:

All applications hosted on the Mccoy LS1 & LS8 UK servers will be down. Also, if your application uses one of the above-mentioned servers, it will be inaccessible during the maintenance.

Please feel free to contact our support helpdesk in case you have any queries.

 

Intermittent network issues while connecting to our servers

Dear Customer,

We are facing intermittent network issues when accessing the websites and mail hosted on our servers via the Airtel ISP. Our network team is already investigating, as the problem appears to be specific to the Airtel ISP. We have not observed any alerts or data loss when we access our servers from various geographic locations.

We have contacted Airtel concerning this issue to figure out if there is any issue from their end.

If you face issues while connecting to our servers, please get in touch with our support help desk with the following results:

1. Ping
2. Traceroute
3. ISP details (ISP name and IP address). You can check your IP address by visiting http://whatismyip.com.
4. Results of trying to access your websites via a different ISP
5. Traceroute results for another website (e.g. Google.com) for comparison

These results will help us, as well as Airtel, narrow down the issue and find a fix for it.

 

Feel free to contact our support desk.

Mccoy partners with Tally, world’s leading computer company

Mccoy Global Links Pvt Ltd and TALLY (India) Pvt. Ltd., the India sales & marketing arm of the Tally Solutions group, signed an MOU for RFID-enabled Tally products on September 25th, 2017. As Tally’s preferred RFID partner, Mccoy will provide RFID enablement for the TallyShoper Retail POS (point-of-sale) product of India’s largest product software company, targeting the high-growth retail industry. It will also be deployed in an upcoming industry RFID pilot. RFID is a data collection technology that provides suppliers, retailers, manufacturers and distributors with up-to-the-minute supply chain visibility, from inventory and logistics to freshness dates. RFID is powerful because it assigns a unique identifying serial number to each and every item, unlike bar codes, which typically use one number for a set of similar items. According to Tally Solutions, “Tally’s alliance with Mccoy in RFID is a key element in our strategy to provide our customers RFID support in our products as part of RFID solutions for small, medium and large enterprises.

RFID is gaining momentum in a variety of sectors, especially Retail and Supply Chain, where we have a key focus with our TallyShoper and Tally Ascent products. We believe that the combination of our leading products and Mccoy’s expertise in RFID offers an edge in addressing the requirements of our customers in this rapidly emerging segment.” “This relationship between Mccoy’s RFID expertise and intellectual property and Tally’s leading products will bring an exciting value proposition for customers in the Retail and Supply Chain industries.” He added, “Mccoy’s RFID technology will eventually help customers reduce supply chain costs while speeding up the distribution process, ultimately providing customers with better product availability.”

Strategy

“Mccoy’s strategy: to become a capable provider of strategic research solutions for large enterprises and our esteemed customers.”

Our vision: By continually delivering excellence, we inspire trust and loyalty in our customers.

With smart teamwork and partnerships, we deliver cost-competitive results with measurable impact through excellence in delivery and expertise in software and project management.
High agility, low inertia and a flat hierarchy lead to efficient project completion.
Our products and solutions are highly scalable and customizable to meet every need while keeping in line with the basic theme.
The Mccoy team has a rich and diverse background and brings its varied experience to every project it works on.
Compliance with technology standards, business ground rules and government rules & regulations is always at the top of the team’s agenda.

Insight

Banking & Insurance

“Banking needs are more complex than most. Mccoy has 10-plus years of corporate experience and expert staff to design, implement and deliver banking products for small and medium-sized banks and nonprofit organizations. Through powerful cash-management tools, creative, specifically tailored lending solutions, and holistic asset-management strategies, Mccoy creates financial solutions to help you achieve your specific business goals.”

Mccoy’s Banking & Financial Services team provides a broad range of transactional, regulatory, advisory, and dispute resolution solutions to banks, insurance companies and other financial services providers, their trade associations, and vendors.

Mccoy also has a few products for audit, compensation, and other committees, as well as investors, outside directors and officers. We provide custom solutions to all types and sizes of banks, finance companies, and other regulated businesses in the financial sector regarding banking, corporate, securities, antitrust, and financial service provider laws and regulations, focusing on federal and state law.

Telecommunications

“Our customers are typically those that want depth in the telecom domain, high levels of service, cost efficiency through global delivery, and flexibility. Our long-standing client relationships with leaders in the telecom industry give us a strong, differentiated position as a leading global software services company serving the communications industry.”

Mccoy offers various technology- and domain-specific solutions across the telecommunications industry, spanning telecom operators, equipment providers, software providers, content providers and allied service providers.

Healthcare

“Our approach is to provide integrated business solutions that benefit the entire organization using a combination of IT & BPO solutions and our goal is to deliver significant business value to our clients.”

Mccoy offers services including medical and legal transcription, application development, program management, technical customer support, network and server management, database administration and infrastructure maintenance support in the healthcare space. Mccoy’s experts have deep domain knowledge of the healthcare industry and a range of IT solutions, and our team consists of professionals with extensive experience with major healthcare vendor products such as MEDIPRO and HealthAssist in the areas of product implementation and integration.

Media & Entertainment

“Mccoy draws on its experience and technology partnerships to provide a diverse range of value-added service offerings that help drive value and reduce cost for our media clients. We develop and implement sustainable domestic and international applications that span numerous aspects of the media value chain, including television, theatrical, broadcast, digital distribution, and home video.”

The Media, Entertainment & Lifestyle industry is uniquely characterized by the growth of an immense variety of infotainment delivery channels, each with different characteristics, market requirements and customer preferences. Over the years, the need for technical solutions in this industry has changed in both quality and quantity. With the progress of technology, the Media and Entertainment industry has reached new horizons. Media conglomerates need technological solutions to cross-leverage their assets, enable innovative business processes and enhance the transparency of revenue streams.

Manufacturing & Retail

“At Mccoy, we strive to provide our manufacturing clients with all the tools that help them achieve greater productivity. Our solutions traverse procurement, production and sales boundaries to deliver across the enterprise, including finance, human resources, and costing.”

With choosy consumers, changing market trends, and globalizing market footprints, the retail world is in a constant state of change. To succeed, retailers must stay alert in order to meet the needs of culturally and demographically diverse populations.

With Mccoy’s extensive experience in servicing the IT needs of all major retail formats, including grocery chains, discount stores, department stores, specialty retailers and Internet/catalog stores, we are well versed in the requirements of the industry. We have adopted an integrated, business-centric approach and ensure that our clients are offered integrated services ranging from consulting to application development and maintenance.

Logistics & Transport

“The Mccoy Manufacturing suite can help you get real-time foresight into events and supply chain disruptions, and gives you the ability to dynamically orchestrate equipment, people, partners, and shipments to meet and exceed your customers’ expectations.”

Supply chain providers are under pressure to maintain revenue, market share and shareholder value – all in the face of demand declines, new regulations, shorter product lifecycles, and increasing competition. It doesn’t help matters that today’s supply chains are complex, with information constantly streaming in from all directions. Without actionable insight or the ability to sense & respond immediately to the events that impact your business day in, day out, your business is operating with blinders on.
