Friday 29 December 2017

Python sets with examples

Introduction:

Python is a powerful programming language that has regained popularity through its use in data science, alongside languages such as R. With that in mind, let us work through some of the smaller building blocks of the language to strengthen our grasp of it.
In this article, we will look at the concept of sets in Python. The following sections explain what sets can do and how to use them.

Python sets:

Just like dictionaries, sets are unordered collections of items or objects. Sets require that the items stored in them are hashable, and that the elements are unique and immutable – just the opposite of what we have seen with lists. The set itself, however, is mutable: we can add elements to it or remove elements from it. To sum up, we use a set when we have an unordered collection of values that are unique, immutable and hashable.
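As a quick illustration of these rules (the variable and element values below are only examples), an immutable tuple can be added to a set, while a mutable list cannot, because a list is not hashable:
demoSet = {1, 2, 3}
demoSet.add((4, 5))      # a tuple is immutable and hashable, so this works
print(demoSet)           # {1, 2, 3, (4, 5)} – display order may vary
# demoSet.add([6, 7])    # would raise TypeError: unhashable type: 'list'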
A set can be created by placing the elements inside curly braces, or by passing an iterable to the built-in set() function. Please take a look at the following example for a better understanding.
veryFirstSet = {1, 3, 5, 7, 11, 13}
veryFirstSet = set([1, 3, 5, 7, 11, 13])
Creating a set with no elements in it is a bit tricky. If we try the following, you will see why: empty curly braces create a dictionary rather than a set, whereas the built-in set() function creates an empty set without any trouble.
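For example (the variable names below are only illustrative):
emptyBraces = {}
print(type(emptyBraces))   # <class 'dict'> – empty curly braces give a dictionary
emptySet = set()
print(type(emptySet))      # <class 'set'> – the set() built-in gives a real empty set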
As discussed earlier, sets are mutable – but since they are unordered, indexes make no sense for them. We cannot access or update an element by its position, because sets support neither indexing nor slicing. To add a single element to a set we use the add() method, and to add more than one element we use the update() method – in either case, duplicates are never added to the set.
The following example shows these methods in use.
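A small sketch (primeSet is just an example name):
primeSet = {2, 3, 5}
primeSet.add(7)               # add a single element
primeSet.update([11, 13, 5])  # add several elements; the duplicate 5 is simply ignored
print(primeSet)               # {2, 3, 5, 7, 11, 13} – display order may vary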
To remove elements from a set, we can use either the discard() method or the remove() method. The difference between the two is that discard() does not raise an error when asked to remove an element that is not in the set, whereas remove() does. The following example shows both methods applied to a set of elements.
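A minimal sketch (oddSet is an example name):
oddSet = {1, 3, 5, 7}
oddSet.discard(9)   # 9 is not in the set; discard() quietly does nothing
oddSet.remove(7)    # 7 is in the set, so it is removed
print(oddSet)       # {1, 3, 5} – display order may vary
oddSet.remove(9)    # raises KeyError: 9, since remove() errors out on a missing element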
Similar to the methods we have used on lists, we can use the pop() method and the clear() method on a set. The only difference with a set is that no order of elements is maintained, so we cannot predict which element pop() will remove; with a list, pop() removes the last element. Let us now take a look at an example.
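Here is a short sketch (sampleSet is an example name); note that which element pop() returns is not predictable:
sampleSet = {10, 20, 30, 40}
popped = sampleSet.pop()   # removes and returns an arbitrary element
print(popped)              # one of 10, 20, 30 or 40 – we cannot say which in advance
sampleSet.clear()          # removes every remaining element
print(sampleSet)           # set()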
Let us now concentrate on the set operations: union, intersection, difference and symmetric difference. Sets can be used in Python to carry out these mathematical set operations, and we can perform them either with operators or with methods.
Let us now consider an example of two sets with the following values:
setA = {1, 2, 3, 4, 5, 6}
setB = {4, 5, 6, 7, 8, 9}
Now, with these set values, let us carry out the four operations discussed above.
A union of two sets contains all the elements from both sets, with any duplicates appearing only once. Union is performed with the | operator or with the union() method on the sets in question. A union of the sets defined above gives the result shown below.
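Using setA and setB from above, the union can be taken in either form:
setA = {1, 2, 3, 4, 5, 6}
setB = {4, 5, 6, 7, 8, 9}
print(setA | setB)       # {1, 2, 3, 4, 5, 6, 7, 8, 9} – display order may vary
print(setA.union(setB))  # same result using the method form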
An intersection of two sets contains the elements common to both sets. Intersection is performed with the & operator or with the intersection() method. An intersection of the sets defined above gives the result shown below.
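Again with setA and setB as defined above:
setA = {1, 2, 3, 4, 5, 6}
setB = {4, 5, 6, 7, 8, 9}
print(setA & setB)              # {4, 5, 6}
print(setA.intersection(setB))  # {4, 5, 6}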
A set difference contains the elements present in the first set but not in the second. Difference is performed with the - operator in Python or with the difference() method. A difference of the sets defined above gives the result shown below.
setA - setB yields {1, 2, 3}, whereas setB - setA yields {7, 8, 9}.
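The same results, written out as a runnable snippet:
setA = {1, 2, 3, 4, 5, 6}
setB = {4, 5, 6, 7, 8, 9}
print(setA - setB)            # {1, 2, 3}
print(setB.difference(setA))  # {7, 8, 9}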
A symmetric difference of two sets contains the elements present in either set but not in both. Symmetric difference is performed with the ^ operator or with the symmetric_difference() method. A symmetric difference of the sets defined above gives the result shown below.
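With the same two sets:
setA = {1, 2, 3, 4, 5, 6}
setB = {4, 5, 6, 7, 8, 9}
print(setA ^ setB)                      # {1, 2, 3, 7, 8, 9} – display order may vary
print(setA.symmetric_difference(setB))  # same result using the method form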
Conclusion:
In this article, we have seen what sets are and why they are used in Python. We have also taken a closer look at the concept through various examples.
I hope the concepts are clear after going through this article; please do comment if you have any suggestions.

If you want to learn more about Python, visit Mindmajix.

Thursday 28 December 2017

JIRA Tutorial

This tutorial gives you an overview and talks about the fundamentals of Atlassian JIRA.

What is JIRA?

At its very core JIRA is a software program that helps organizations manage their issues, tasks, processes, and projects. However, it is ‘smart software’ because much of the tedious stuff related to issues, or tasks, or processes, or project management, can be automated fairly easily. 
JIRA started life in Atlassian (the parent company) as a way to track issues related to software ‘bugs’. It wasn’t long before additional features were added and it grew into a project management program with the ability to ‘automate’ a lot of the work that is necessary but provides little value, such as emailing or phoning a work colleague when a task is done and ready to be handed off.

The Jira Architecture

Installing JIRA is simple and straightforward. However, it is important for you to understand the components that make up the overall architecture of JIRA and the installation options available. This will help you make an informed decision and be better prepared for future maintenance and troubleshooting.
However, for day-to-day administration and usage of JIRA, we do not need to go into details; the information provided can be overwhelming at first glance. For this reason, we have summarized a high-level overview that highlights the most important components in the architecture, as shown in the following figure:
 
Web Browsers
JIRA is a web application, so there is no need for users to install anything on their machines. All they need is a web browser that is compatible with JIRA. The following table summarizes the browser requirements for JIRA:
 Browsers and their compatibility:
 Internet Explorer: 8.0 (not supported with JIRA 6.3), 9.0, 10.0, 11.0
 Mozilla Firefox: latest stable versions
 Safari: latest stable versions on Mac OS X
 Google Chrome: latest stable versions
 Mobile: Mobile Safari, Mobile Chrome

Application Services

The application services layer contains all the functions and services provided by JIRA. These services include various business functions, such as workflow and notification, which will be discussed in depth in JIRA training under Workflows and Business Processes and E-mails and Notifications, respectively. Other services, such as REST/web services, provide integration points to other applications. The OSGi service provides the base add-on framework to extend JIRA’s functionalities.
Learn Atlassian JIRA from Scratch and start using it effectively for Software Development. 
 
Data Storage
The data storage layer stores persistent data in several places within JIRA. Most business data, such as projects and issues, is stored in a relational database. Content such as uploaded attachments and search indexes is stored in the filesystem in the JIRA_HOME directory, which we will talk about in the next section. The underlying relational database is transparent to users, and you can migrate from one database to another.
Installation of JIRA is covered as part of JIRA Training.

1. Groups Versus Roles

OVERVIEW
The difference between JIRA groups and JIRA project roles seems to confuse many JIRA administrators. This chapter explains the differences and what each one is good for.
JIRA originally just had users and groups of users, and no project roles. Groups were pretty powerful — wherever you could do something with a user, you could generally use a group instead.
For instance, if you wanted to allow a specific user john.smith to change the Reporter field in a project’s issues, you could:
1. Create a new permission scheme with a description something like john.smith can change Reporter.
2. Next, add the john.smith user to the appropriate Modify Reporter permission entry in the new permission scheme.
3. Change the appropriate JIRA project to use the new permission scheme.
You could also do the same thing with a group:
1. Define a new JIRA group named Can Modify Reporters.
2. Add the user john.smith to the new group.
3. Create a new permission scheme with a description something like Added an extra group of users that can change Reporter.
4. Add the group (instead of the user) to the appropriate Modify Reporter permission entry in the new permission scheme.
5. Just as before, change the appropriate JIRA project to use the new permission scheme.
Both of these approaches now allow john.smith to change the Reporter field. So far so good, but there are two main problems with using JIRA groups like this: scaling and updating.
Scaling
If you want john.smith to be able to edit the Reporter field in some projects, and also allow a different user, jane.bloggs, to do the same thing in other projects, then you have to create two permission schemes, one for each user being granted this permission. If you then decide that they are both allowed to edit the Reporter in some shared projects, then you need a third permission scheme. With lots of users, this leads to an explosion in the number of permission schemes (and any other JIRA scheme that supports groups).
Keeping track of the difference between each of these permission schemes is tedious and error-prone, even with the scheme comparison tools (Administration→Scheme Tools), which are themselves deprecated in JIRA 6.4.
Updating
As time passes, users will likely need to be part of different JIRA groups. Only JIRA administrators can change the membership of JIRA groups. However, project leads are allowed to make changes to project roles, and project leads usually know which project roles a user should currently be part of. Using project roles means fewer tasks for JIRA administrators.


Source: Mindmajix
Explore more courses at Mindmajix.

Tuesday 26 December 2017

PeopleSoft HRMS

Different kinds of organizational relationships in HRMS

The different organizational relationships a person can have, as per the person model, are: employee, contingent worker, or person of interest. An employee is a permanent staff member of the firm who is paid through the company payroll and has a hire row in job data with all the associated information, such as company, department, location and pay group. An example is a permanent position such as a chief executive or a sales manager in the firm.
A contingent worker is a contractor in the organization, usually on an assignment. Personal information and job information are stored for a contractor in the system, but the person is not on the company payroll. An example is a person from a third-party company acting as a vendor, brought in to carry out a project for the firm. This person is part of the project and works in the firm until the end of the project.
A person of interest (POI) comes in many flavors. There are various types of POI – pension payee, external trainee, external instructor or board member. They are people connected to the firm for some business purpose. There are two kinds of POI: those who have a row in job data and those who do not. POIs who do not have job information have a separate mechanism in PeopleSoft to maintain their data security.

Global assignments

Global assignments enable the organization to assign employees to assignments abroad and to monitor, compensate and track the education and qualifications of the employee and their dependents as they move between locations in an organization with a presence in different geographies.
Employees are based in a Home location. The employee's data, benefits and payroll information are maintained at the home location.
The new location the employee moves to is the Host location.
PeopleSoft delivers three options for International Security:
i) Home can see Host
ii) Host can see Home
iii) Both
International Security can be set up by navigating to: Setup HRMS -> Security -> Core Row Level Security -> Security Installation Settings

PeopleSoft Update Manager

Navigation:  support.oracle.com > login > PeopleSoft Update Manager (PUM) Home Page (Doc ID 1641843.2)
Oracle Support delivers periodic content updates to entities that license their PeopleSoft HCM product. (These updates are called Bundles in Campus 9.0, but they are called Images in HCM 9.2 and Finance 9.2).
The PUM Home Page provides a lot of information, and typically only ERP and other technical staff will spend much time examining the Oracle delivered pages.
HCM Update Image Home Page
Only a PeopleSoft Admin can apply an Image to an environment, and they will always apply Images to test environments first. PeopleSoft Administrators work for ERP in the SBCTC office, and no college employee will be expected to apply Images to the PeopleSoft environment.
It takes a lot of people a lot of time to complete testing for one Image, so patience is always needed while testing is performed for the pillar.
Customizations are always at risk of breaking or being made redundant in an update, so extra care will always need to be taken in testing HCM CEMLIs.

Source: Mindmajix

Explore more courses at Mindmajix.





Friday 22 December 2017

Splunk Education

Splunk is a multinational company that produces software products for enterprises that handle big data. Its products help turn machine-generated data into information that makes sense to departments such as IT, business, marketing, HR, finance, audit and security.
Splunk provides platforms and solutions for the communications, healthcare, education, non-profit, financial services, retail, public sector, and energy and utilities industries.

Why are enterprises considering Splunk?

Let me ask you a question: what are log files?
“ A file that records either events that occur in an operating system or other software runs, or messages between different users of a communication software.” 
When you want to know the state of your mind, you sit in one place, close your eyes and just observe the body and mind. Similarly, to know the state of any system or piece of software, you look at its log files. In the same way, when you want to know the state of all the devices in a particular data center, reading the log files of each device one by one is time-consuming. This is where Splunk comes in.

Splunk Education

Let us assume a company has bought Splunk Enterprise software for its business data processing. The employees have to gain at least a working knowledge of the product in order to deploy, customize and design its modules to suit the company's needs.
Splunk Education is the online platform that trains these professionals to gain a broader understanding of the product.

Mode of Training Offered by Splunk Education

As these courses are designed for adults with a technology background who are learning a new technology, the training model keeps track of progress, provides alerts and stays focused on achieving goals.
There are different training delivery models provided by Splunk Education. They are:
1. Virtual, instructor-led:
a. Public Class
b. Private Class
2. Classroom
3. E-learning
4. Custom-based
Virtual instructor-led training is a mode of training that allows you to take the course from your own location over the internet.
You can choose a time slot and join a group of professionals taking the same course online (called a Public Class), or take the course individually (called a Private Class).
Custom-based training is provided to a specific group of professionals in a company who want to be trained on a specific software module.
Since the training is mainly offered to people working from similar backgrounds, they constantly relate it to their real-time environment, so the training needs methods that speak to that experience. Splunk Education provides not only tutorials that teach the basics of the concepts, methods and processes, but also has professionals implement them practically. Participants are given the maximum amount of information in the minimum possible time through hands-on exercises and questionnaires.
Who can attend Splunk Education? 
In a company that uses a common software platform, professionals with different levels of expertise work on similar data, from those who record or enter the details to those who maintain the data on the server. In addition, some of that data is reported on and analysed for making business decisions. Across this process, some of the important roles that work on the data are:
* End users
* Enterprise Administrators
* Customers
* Security Administrators
* Architects
* App Developers
* IT Surveillance
* And many other
For all these categories, Splunk provides different courses with different levels of expertise and experience.

Source: Mindmajix Splunkadmin

Explore more courses at Mindmajix.

Thursday 21 December 2017

The Adobe CQ5 Content Management System

Over the past four months, I have been involved in a project using Day's (now owned by Adobe) Communique 5 (CQ5) Content Management System. In the past, I've used several off-the-shelf and custom Content Management Systems. CQ5 is most comparable to Magnolia, a product that I used last year. CQ5 uses the Java Content Repository and Apache Sling to create a powerful tool. This Content Management System is being used by General Motors, McDonald's, Volkswagen, and Audi.

When installed, the application comes with a default project to base ideas on. CQ5 also supports marketing campaigns and the ability to serve up content across multiple channels, such as mobile. It also comes with social networking aspects, including blogs and forums, and the ability to set up custom workflows for publishing and approval.

Here are a few of my thoughts about CQ5 over the past three months:

1) I think the tool could be a powerful one, but there is a steep learning curve involved, and I've had to pick this up myself, without any training or guidance. (The same happened last year when I learned Magnolia.) Unfortunately, there's little documentation available and a lack of a support group; there is a Google forum and an API.

2) Lack of consistency with the code.

3) Stability issues with the environment.

4) Lack of up-to-date and correct documentation available. On the Day website, there are tutorials to help you get started, but these tutorials were out-dated and simply did not work when followed. This was not due to user error as it was also reported by all of my colleagues. (At least there is an API to help guide you, but a lot of the comments are out-of-date or non-descriptive.)

5) The development environment that you need to use to develop is CRXDE, which is based on Eclipse, but it is buggy and adding a file manager (Vault) to the process causes even more complications. I was also getting many crashes using this, and a lot of Java "Out of Memory" errors. (This mainly seems to have been solved with a new machine, however.)

6) Ability to make content editing easier. Despite the product's downfalls, I think that the finished product can be customized enough to give more freedom to the content editors. They will still need training, but the ability to drag and drop components around a page and copy and paste them to a new area is more flexible and quicker. However, there are areas where it can be just as slow; for example, I am not quite happy with the table management aspect. It does not give the content editor enough freedom to copy and paste multiple rows/columns and apply styles across multiple rows/columns.

7) Extending components is fiddly.

8) Incomplete and incorrect code. For example, I wanted to create an Accordion-style layout by using a Multi field component that takes a Composite Field, consisting of a 'rich text' component and a 'text' component. Although this is meant to work, it didn't. A quick look into Day's code showed that this feature had areas commented out with "//TODO" comments to get the multi field working with other combinations. (I decided to find another way to accomplish the task, and I must have tried three other ways before brainstorming with a colleague to come up with a completely different solution that wasn't as user-friendly for the content editor, but it worked.)

9) A lot of patience is needed as well as a lot of fiddling around and trial and error.

10) There's generally been a lack of support from Adobe or a lack of training/consultants available from Adobe to get started or fix issues initially.


Source: Jenikya's

Explore more courses at Mindmajix.com






DWH Life cycle

The DWH life cycle is the process of building a data warehouse.
1) Requirement gathering
  • It is done by business analysts, the onsite technical lead and the client
  • In this phase a business analyst prepares a business requirement specification (BRS) document
  • About 80% of requirement collection takes place at the client's site, and it takes 3-4 months to collect the requirements
2) Analysis:
  • After collecting the requirements, the data modeler starts identifying dimensions, facts and aggregations depending on the requirements
  • An ETL lead and a BA create the ETL specification document, which describes how each target table is to be populated from the source
3) System Requirement Specification (SRS)
  • After the onsite knowledge transfer, the offshore team prepares the SRS
  • An SRS document includes the software, hardware and operating system requirements
4) Data Modeling
  • It is the process of designing the database to fulfill the user requirements
  • A data modeler is responsible for creating the DWH/data marts with the following kinds of schema:

  • Star schema
  • Snowflake schema
5) ETL Development
  • Designing ETL applications to fulfill the specification documents prepared in the analysis phase
6) ETL Code review:
Code review is done by the developer.
The following activities take place
  • Check the naming standards
  • Check the business logic
  • Check the mapping of source to target
7) Peer Review:
Code is reviewed by a team member
  • Validation of the code, but not the data
8) ETL Testing:
The following tests are carried out for each ETL application:
  • Unit testing
  • Business Functionality testing
  • Performance testing
  • User acceptance testing
9) Report development environment:
  • Design the reports to fulfill the report requirement templates/report data workbook (RDW)
10) Deployment:
  • The process of migrating the ETL code and reports to a pre-production environment for stabilization
  • It is also known as the pilot phase/stabilization phase


Source: Mindmajix.com


Tuesday 19 December 2017

Why now is the time to learn R

We’ve all heard about big data; over the past few years, many companies have invested in Hadoop, NoSQL, and data warehouses, to collect and store massive volumes of new data. Even when based on open source platforms like Hadoop, these investments can easily measure in the millions of dollars for large companies with new hardware, new staff, and untold person-hours spent implementing new systems and procedures.

Now it’s time for that investment to pay off.

The way to do that is with data science, the extraction of knowledge from data. It’s more than just tabulating and reporting on the data; data science combines computer science, statistical analysis, and a keen understanding of business needs to separate correlation from causation, and to forecast future outcomes and risk. According to TheNextWeb, Data Scientists are “changing the face of business intelligence." And, the increased availability of data has made data science crucial to product development, and creating and managing innovations that are too complex for automated systems, especially in a world where privacy concerns are paramount.

As a result, companies are hiring data scientists at a massive rate. Job postings for data scientists have skyrocketed since early 2011, according to data from job-tracker Indeed.com, although in recent months much of the growth has been in data science skills generally, as data scientists take on specialized job titles. Meanwhile, data scientists still command impressive salaries: a median of $98,000 worldwide and $144,000 in the US, according to the latest Data Science Salary Survey by O’Reilly Media.

[Figure: Job Trends chart of data scientist job postings (Indeed.com)]
With such strong demand and such high salaries to offer, it’s no surprise that competition for hiring data scientists is intense. As a result, companies who previously relied on legacy proprietary platforms for statistical analysis are now adopting a new alternative, open source R. So far, it has been chosen by more than two million data scientists and statisticians around the world.

R is an open source software platform for statistical data analysis. The R project began in 1993 as a project by two statisticians in New Zealand, Ross Ihaka and Robert Gentleman, to create a new platform for research in statistical computing. Since then the project leadership has grown to include more than 20 leading statisticians and computer scientists from around the world.

Largely because of its open source nature, R was rapidly adopted by statistics departments in universities around the world, attracted by its extensible nature as a platform for academic research. Being free in cost certainly played a role as well. And it wasn’t long before researchers in statistics, data science, and machine learning started to publish papers in academic journals along with R code implementing their new methods. R makes this process very easy: anyone can publish an R package to CRAN (the “Comprehensive R Archive Network”) and make it available to everyone. As of this writing, thousands of R users have contributed more than 6,100 packages to CRAN, extending R’s capabilities in fields as diverse as econometrics, clinical trials analysis, social sciences, and web-based data. And one can easily search for R applications by topic or keyword at MRAN.

While the core R project is maintained by the R Foundation (a non-profit based in Vienna, Austria), other companies and organizations are extending R as well. The BioConductor Project has created an additional 900+ packages making R the leading software for genomic and genetic data analysis. RStudio has created an excellent open-source interactive development environment for the R language, further boosting the productivity of R users everywhere. And Revolution Analytics has boosted the performance of R with Revolution R Open and made it easy to embed R into other applications with DeployR.

With R’s widespread use in the academic sector, it wasn’t long before it started being used in the commercial sector as well. A front-page article in The New York Times technology section in January 2009 spurred a lot of new interest, and Revolution Analytics has been very active, offering technical support, services, and big-data capabilities. Today, R is ranked as the 9th most popular language by IEEE Spectrum, and it is consistently ranked the most popular language for data science and thousands of companies are using R for data science applications.



Here are just a few examples:

Google uses R to calculate the ROI on advertising campaigns.
Ford uses R to improve the design of its vehicles.
Twitter uses R to monitor user experience.
The US National Weather Service uses R to predict severe flooding.
The Rockefeller Institute of Government uses R to develop models for simulating the finances of public pension funds.
The Human Rights Data Analysis Group uses R to quantify the impact of war.
R is used frequently by The New York Times to create infographics and interactive data journalism applications.
These companies have adopted R because it’s the platform their data scientists prefer to use. And, crucially, given that data scientists are a limited resource, it’s also the platform that makes data scientists the most productive. Unlike proprietary systems which provide only constrained point-and-click tools or black-box procedures, R is a fully-fledged programming language. All of the functions needed for a typical data science application are included in the base language: functions for data access and preparation, data visualization, statistical modeling, and forecasting. Complete data analyses can often be represented in just a few lines of code. And because data scientists using R produce code, not just reports, it’s easier for them to collaborate, to replicate results (particularly in automated production environments), and to reuse code from other projects to get tasks done faster.

R’s open source nature also gives companies a boost when it comes to innovation. This is incredibly important in today’s data-centric world, where even a tiny edge in being able to predict customer needs or financial returns better than your competitors can mean the difference between success and failure. Because most cutting-edge research in statistics and machine learning is done in R, the latest techniques are usually available first as a package for R, years and sometimes decades before they appear in proprietary systems.

So, with data science as a top business priority according to Gartner, the popularity of R is set to grow even further. And if you’re looking to expand your career potential, and you have data analysis skills, you could do a lot worse than getting to know the R language.



Source: David Smith


Explore more courses at Mindmajix.com

Monday 18 December 2017

Informatica Data Quality tutorial

This tutorial gives you an overview and talks about the fundamentals of Informatica Data Quality (IDQ).
  • Informatica Data Quality is a suite of applications and components that you can integrate with Informatica Power Center to deliver enterprise-strength data quality capability in a wide range of scenarios.

  • The core components are: Data Quality Workbench, Data Quality Server.
  • Data Quality Workbench. Use to design, test, and deploy data quality processes, called plans. Workbench allows you to test and execute plans as needed, enabling rapid data investigation and testing of data quality methodologies.

  • Data Quality Server. Use to enable plan and file sharing and to run plans in a networked environment. Data Quality Server supports networking through service domains and communicates with Workbench over TCP/IP.

  • Both Workbench and Server install with a Data Quality engine and a Data Quality repository. Users cannot create or edit plans with Server, although users can run a plan on any Data Quality engine independently of Workbench, using runtime commands or from Power Center.

  • Users can apply parameter files, which modify plan operations, to run time commands when running data quality plans to a Data Quality engine.

  • Informatica also provides a Data Quality Integration plug-in for Power Center. This plug-in enables Power Center users to add data quality plan instructions to a Power Center transformation and to run the plan to the Data Quality engine from a Power Center session.

  • In Data Quality, a plan is a self-contained set of data analysis or data enhancement processes. A plan is composed of one or more of the following types of component:
  1. Data sources provide the input data for the plan.
  2. Data sinks collect the data output from the plan.
  3. Operational components perform the data analysis or data enhancement actions on the data they receive.
Role of Dictionaries
Plans can make use of reference dictionaries to identify, repair, or remove inaccurate or duplicate data values. Informatica Data Quality plans can make use of three types of reference data.
Standard dictionary files. These files are installed with Informatica Data Quality and can be used by several types of component in Workbench. All dictionaries installed with Data Quality are text dictionaries. These are plain-text files saved in .DIC file format. They can be created and edited manually.
Database dictionaries. Informatica Data Quality users with database expertise can create and specify dictionaries that are linked to database tables, and that can therefore be updated dynamically when the underlying data is updated.
Third-party reference data. These data files are provided by third parties and are offered to Informatica customers as premium product options. The reference data provided by third-party vendors is typically in database format.

Friday 15 December 2017

Micro-Services with AWS Lambda and API Gateway

This is just a little tutorial I’ve started putting together. I’ve struggled to find many good resources for Lambda and the latest AWS offering, API Gateway, so my overall aim is to help other developers look at what’s a great step forward for ‘back-end’ developers.

The real big news about AWS API Gateway is that you can marry these two services together to create ‘micro-services’, which means having an HTTP endpoint without creating servers or the headache of provisioning them. The concept is for those who want to try out small ideas where you only pay for when your code is in use, instead of running a server 24/7 that might only be used for less than half of that time.








Getting started

If you haven’t tried Amazon Web Services’ Lambda service yet, it’s really good, but one of the real pains is testing locally and then deploying remotely in the manner we’re used to. A great little tool for simplifying this is grunt-aws-lambda, which gives us a workflow for testing code with events. In this guide I’m assuming that you already have node.js installed and know a tiny amount of JavaScript, enough to understand what we’re working with.

So let’s start. First make a folder for your project, then open up a terminal in that folder and run:
npm init

You’ll get asked some questions, the default answers should be fine. There’s nothing particularly important you need to do when generating your package.json file but you can always edit it later if needed.

Then run:
npm install -g grunt-cli
npm install grunt-aws-lambda grunt-pack --save-dev
This will install the grunt command line tool and the packages which will help make running our lambda function locally a lot easier.
Now create the file Gruntfile.js with the contents:
module.exports = function(grunt) {
 grunt.loadNpmTasks('grunt-aws-lambda');
 grunt.initConfig({
  lambda_invoke: {
   default: {
    options: {
    }
   }
  },
 });
};
Create your basic index.js file:
console.log('Loading function');
exports.handler = function(event, context) {
 context.done(null, {"Hello": "World"}); // SUCCESS with message
};
And create a blank event.json file (for now):
{}

Now, if you’ve done things right, you can run the following command and you should get a successful output as shown below.
grunt lambda_invoke
Command Output:
Running “lambda_invoke:default” (lambda_invoke) task
Loading function
Success! Message:
 — — — — — — — — —
[object Object]
Done, without errors.

The Results so far…
So now we’ve created the simplest of Lambda functions for handling API requests and returning a JSON response. As you can see, you can’t see the body of the response when running the test, just that an object is returned (we can add more verbose logging later on). The important thing is that we now have a good workflow for testing what we develop without constantly playing with AWS and incurring the costs that go with it.


If you want to learn more, visit AWS Lambda.


Explore more training courses here: Mindmajix technologies


Resources
Grunt AWS Lambda NPM Page
AWS Lambda Docs — Node.js
AWS API Gateway