ASP.NET Sql Connections over VPN w/ Trust

September 29, 2009

Overview

In a follow up to my previous post on VPN connections using SQL management studio I should say that the previous workaround still presented issues if you were doing home development.

Suffice to say, if you were trying to use trusted sql connections and were running from your home PC you likely got an error about trusted connections when connected through VPN.

I had thought there was no real work around for this but a couple moments of sheer frustration have helped put more importance on me getting this resolved. So I present the simple (although frankly idiotic) steps to get trusted connections with a .NET application working over VPN.

Steps To Follow

  1. Change your computer name to match the domain name you are connecting to; for example if your domain at work is somecompany\yourusername – rename your computer somecompany.
  2. Create a user account on your home PC that matches the username you have at work, following the above example we create a username called “yourusername”. Also use the exact same password you use at work.
  3. After restarting your computer sign into your local PC using the newly created username.
  4. Connect to VPN as per normal

You should now be able to just open SQL management studio and/or visual studio and have it “just work” like it does at work.

SSIS / ETL Example – Yahoo Equity & Mutual Fund Price History

September 9, 2009

Download Example SSIS Here

The Concept

Been working with generating trailing return history for portfolios composed of various financial products from annuities to equities to mutual funds (open-ended, ETF, close-ended, etc). Problems usually come down to where to get this information and how to store it. And that’s where this concept comes into play.

The core focus here is to get and store effectively the entire range of historical prices (volume, open, close) for a rather large set of equities and mutual funds. To do this we need a feed of data. Where I work this doesn’t exactly exist so I had to do a quick scan on the ‘internets’ and I found the Yahoo price history feed.

Yahoo Price History Feed

The yahoo price history feed takes in a symbol and the date range you would like to get back and returns a nice CSV formatted download that can be used to extract the price history. An example of this URL is listed below.

http://ichart.finance.yahoo.com/table.csv?s={symbol}&a={startMM}&b={startDD}&c={startYYYY}&d={endMM}&e={endDD}&f={endYYYY}&g={res}&ignore=.csv

* startMM = The month with zero based index. (00 = Jan, 01 = Feb, 02 = Mar … 11 = Dec)
* startDD = The start date with padding to two digits (01, 02, 31)
* startYYYY = The start year (2009, 2008, etc)
* endMM = Usually today’s month with zero based index. (00 = Jan, 01 = Feb, 02 = Mar … 11 = Dec)
* endDD = Usually today’s with padding two digits. (00 = Jan, 01 = Feb, 02 = Mar … 11 = Dec)
* endMM = Using . (00 = Jan, 01 = Feb, 02 = Mar … 11 = Dec)
* res = is the resolution you want to return, this can be either “d” for day, “w” for weekly, “m” for monthly, and “y” for yearly.

Example:
Download GOOG

The Database Implementation

Product Dimension Table

To perform our daily download of feeds we need to have a couple things – a table that can be used to get all symbols that can be and should be priced every day. Currently I read from a quick table like this…

CREATE TABLE [dbo].[product_dim](
	[product_id] [int] IDENTITY(1,1) NOT NULL,
	[cusip] [varchar](20) NULL,
	[symbol] [varchar](8) NULL,
	[isin] [varchar](20) NULL,
 CONSTRAINT [pk_product_dim] PRIMARY KEY CLUSTERED
(
	[product_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Granted this is a very very basic product dimension table. You should certainly have more information than this in your data-warehouse for a financial product.

So after we have our table to read from to get the symbols to pass to YahooTM we need a place to store the results of call YahooTM.

SQL Table to Store Price History

This is the table we will be inserting into – basic again. In a bit we’ll talk about SQL partitioning when we get to performance we’ll talk about partitioning the table based on the date id.

CREATE TABLE [dbo].[product_price_fact](
	[product_id] [int] NOT NULL,
	[price_dt_id] [int] NOT NULL, -- simple date id like 20090901
	[price_open] [decimal](14, 2) NULL,
	[price_low] [decimal](14, 2) NULL,
	[price_high] [decimal](14, 2) NULL,
	[price_close] [decimal](14, 2) NULL,
	[price_adj_close] [decimal](14, 2) NULL,
	[price_volume] [bigint] NULL,
 CONSTRAINT [pk_product_price_fact] PRIMARY KEY CLUSTERED
(
	[product_id] ASC,
	[price_dt_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
)

The SSIS Package

The SSIS package is a 2008 SSIS package built using VS2008. It has been uploaded to my skydrive account. An overview is as follows.

SSIS Yahoo Package

SSIS Yahoo Package

The following components make up this SSIS package.

Product DIM Cache

This data flow grabs all the products that need to be priced along with the date that they were last priced.

Download History

A C# SSIS script that will call each symbol and cusip to download the price history for each one using the last price date as the start and todays date as the end. It will also augment the CSV file written to include the product id from the product_dim table as well as the CUSIP used to identify the symbol. This makes our SSIS package more performant when doing the inserts.

It relies on a stored procedure to get all products and their last update price history date.

-- ================================================
-- Template generated from Template Explorer using:
-- Create Procedure (New Menu).SQL
--
-- Use the Specify Values for Template Parameters
-- command (Ctrl-Shift-M) to fill in the parameter
-- values below.
--
-- This block of comments will not be included in
-- the definition of the procedure.
-- ================================================
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:		Terrance A. Snyder
-- Create date: 2009/09/09
-- Description:	gets product price history with the last price date.
-- =============================================
CREATE PROCEDURE SSIS_GetProductPriceHistory
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

	select
		product_dim.product_id
	,	product_dim.cusip
	,	product_dim.symbol
	,	MAX(product_price_fact.price_dt_id) [last_price_dt_id]
	from product_dim with (nolock)
		left join product_price_fact with (nolock) on
			product_price_fact.product_id = product_dim.product_id
	where
		product_dim.cusip is not null and product_dim.symbol is not null
	group by
		product_dim.product_id, product_dim.cusip, product_dim.symbol
	order by MAX(product_price_fact.price_dt_id) desc

END
GO

For-Each Enumerator

This is a file for-each enumerator for every *.csv file found in the specified output directory. This walks the directory and runs the “Update” data flow task. And once the update data flow task completes it deletes the CSV file.

Update Data Flow Task

This is the heart and sole as it contains most of the SSIS logic. We grab each file and take the CSV and fiddle with it to ensure data integrity and only insert those rows which are new (in case we are working on old files or got terminated mid-stream).


SSIS_Yahoo1

Performance Notes

So there are two performance improvements you can make here – the first is to get the product_price_fact table into a partitioned view. I cover this is the SSIS Tips/Tricks post from before.

The second is in the SSIS package itself – rather than using the for-each enumerator we could use a MULTIFLATFILE connection. This will make SSIS do a bulk operation against all files – however – my current machine lacks the RAM for this.

Please note that if you download the entire history the first time you will get roughly 4GB of data or more depending on how many symbols you are quoting.

SSIS Tips / Tricks

July 31, 2009

This is just a placeholder for an article about BI / Data warehouse information. Just a snippet to talk about partitioned tables in SQL server to cover parallel execution and how to tailor that for speedier apps. Been working late every day now for the past 4 months trying to get this concept of a client dashboard ready for management review. Meeting with the head-honchos this week – hope to impress.

I do like the below snippet for use in SSIS to transform date/time to an int for my partitioned views and a date_dim table. Useful and used a lot.

declare @dt datetime
set @dt = GETDATE()
select cast(convert(char(8), @dt, 112) as int)

SSIS DateTime Tips/Tricks

yyyy-mm-dd

The common yyyy-mm-dd format is often used in file names, for example:

C:\Temp\ErrorCodes\2005-11-18.txt

A sample expression to achieve this is:

 "C:\\Temp\\ErrorCodes\\" + (DT_WSTR,4)YEAR(GETDATE()) + "-"
    + RIGHT("0" + (DT_WSTR,2)MONTH(GETDATE()), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DAY( GETDATE()), 2) + ".txt"

A similar expression, but this time deriving the file name based on yesterday’s date, useful for loading the previous day’s data:

 "C:\\Temp\\ErrorCodes\\" + (DT_WSTR,4)YEAR(DATEADD("dd", -1, GETDATE())) + "-"
    + RIGHT("0" + (DT_WSTR,2)MONTH(DATEADD("dd", -1, GETDATE())), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DAY(DATEADD("dd", -1, GETDATE())), 2) + ".txt"

yyyymmdd

A simple yyyymmdd formatted string from a DateTime type variable

 (DT_WSTR,4)YEAR(@[User::DateTimeVar])
    + RIGHT("0" + (DT_WSTR,2)MONTH(@[User::DateTimeVar]), 2)
    + RIGHT("0" + (DT_WSTR,2)DAY(@[User::DateTimeVar]), 2)

An alternative yyyymmdd formatted string from a DateTime type variable.

 (DT_WSTR,8) (
   (YEAR(@[User::DateTimeVar]) * 10000) +
    (MONTH(@[User::DateTimeVar]) * 100) +
    DAY(@[User::DateTimeVar])
    )

An alternative yyyymmdd INT from a string variable or column in the format yyyy-mm-dd.

(DT_I4)(SUBSTRING(Date,1,4) + SUBSTRING(Date,6,2) + SUBSTRING(Date,9,2))

Calculate the Beginning of a Previous Month

This expression starts from today, moves back three months (as an example), subtracts the day-count from the current day-of-month to get the first day, then converts the expression to a DT_DBDATE type (which does not support a time component) then converts it back to a regular DT_DATE, which does have a time component – but now it’s truncated the time to 00:00 AM.

 (DT_DATE)(DT_DBDATE)DATEADD("dd",-1 * (DAY(GETDATE())-1),DATEADD("month",-3,GETDATE()))

And for SQL I often get asked how to get the start of the current year, start of current month.

-- get start of year
SELECT DATEADD(year, DATEDIFF(year, 0, GETDATE()), 0)
-- get start of month
SELECT DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)
-- end of month
SELECT DATEADD(mm, 1 + DATEDIFF(mm, 0, GETDATE()), 0) - 1
-- end of year
SELECT DATEADD(yy, 1 + DATEDIFF(yy, 0, GETDATE()), 0) - 1

Calculate the End of a Previous Month

To get Midnight on the last day of the previous month, we back up to the first day of the immediately following month, truncate the time to 00:00 (as in the previous example) and then subtract 1 minute to get the ending time of the previous day. (Note, in SQL Server 2005 SP2, subtracting 3ms to get the absolute last time-slice of the previous day did not work properly with the MONTH() function.)

 DATEADD("mi",-1,(DT_DATE)(DT_DBDATE) DATEADD("dd",0,DATEADD("dd",-1 * (DAY(GETDATE())-1),DATEADD("month",-2,GETDATE()))))

Getting the Fiscal Year for a Date

To get the fiscal year for a given date, use the conditional operator to check the month part of the date, and return either the year part of the date, or the year part of the date plus one depending on the cutoff of the fiscal year definition. This sample assumes a fiscal year that ends June 30th:

 MONTH( @[User::InputDate]  ) <= 6 ? YEAR ( @[User::InputDate]  )  : YEAR ( @[User::InputDate]  )  + 1

Performance and Distribution / Scaling

Database Table Partitioning Tutorial – How to Horizontal Partition Database Table

http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/

http://sqlhero.wordpress.com/2009/06/13/partition-table-in-sql-server-2008/

http://msdn.microsoft.com/en-us/library/ms191174.aspx

Switching Partitions and Concurrency (Option 2 and Option 3)

ALTER TABLE … SWITCH … PARTITION is a very convenient way to move bulk data in and out of a partitioned table. However, the locking requirements of the ALTER TABLE statement might cause problems in situations when there are ongoing queries/DML on the table. Table 2 shows the summary of the current behavior:

http://technet.microsoft.com/en-us/library/cc966380.aspx

Data Warehouse Performance

Relational Data Warehouse or Reporting work loads are characterized by low volumes of very large transactions. These applications are often identified as having predominately read workloads (e.g. Decision Support, Analysis, and Reporting) with periodic feeds or batch loads. It is important to keep these characteristics in mind as we examine the significance of database design, resource utilization and system performance. The top performance bottlenecks or gotchas to avoid for Data Warehouse or Reporting applications are outlined below.

http://technet.microsoft.com/en-us/library/cc917690.aspx

Partitioned Tables, Parallelism & Performance considerations

Q: When querying SS2005 Partitioned Tables, what kind of parallelism should I expect to see and how will that affect performance?

http://blogs.msdn.com/sqlcat/archive/2005/11/30/498415.aspx